
Scraping News Websites like CNN & NBC using Python

News websites contain a lot of data. Every day, new content is posted on the hottest topics around the world. They are a great source not only of news but also of material on health, fashion, finance, tech, gadgets, and more. By scraping news websites, one can find new articles on almost any topic.

The main advantage of scraping news websites, and web data in general, is that it works with virtually any website: as long as the content is online, you can scrape it, from weather forecasts to government spending, even if the site does not offer an API for raw data access. Want only news articles about “health”? Blog posts in a certain language, or from a specific country? All of that is possible. Done “sustainably”, scraping is a simple and cost-effective way to obtain data from the web that saves time and money, so you can focus on what to do with the data you collect.
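One possible reading of scraping “sustainably” is to respect a site's robots.txt rules before fetching pages. The sketch below uses Python's standard `urllib.robotparser` for that check. The rules in `SAMPLE_ROBOTS_TXT`, the `example.com` URLs, and the `make_parser` helper are hypothetical and are inlined so the example runs without network access; a real script would download the target site's own robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def make_parser(robots_txt):
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a generic crawler ("*") may fetch each URL.
allowed = parser.can_fetch("*", "https://example.com/health/coronavirus")
blocked = parser.can_fetch("*", "https://example.com/private/report")

# Seconds the site asks crawlers to wait between requests (None if unset).
delay = parser.crawl_delay("*")

print(allowed, blocked, delay)
```

If `can_fetch` returns False for a page, the polite choice is to skip it; if a crawl delay is set, `time.sleep(delay)` between requests is one simple way to honor it.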

In this tutorial we will scrape two news websites: CNN (https://edition.cnn.com/) and NBC News (https://www.nbcnews.com/). We will go to these two websites and scrape the news articles related to COVID-19.

See the complete code below:

from bs4 import BeautifulSoup as soup

import requests

CNN:

from datetime import date

# CNN's live-news URL embeds the date as MM-DD-YY
today = date.today()
d = today.strftime("%m-%d-%y")
cnn_url = "https://edition.cnn.com/world/live-news/coronavirus-pandemic-{}-intl/index.html".format(d)
html = requests.get(cnn_url)
bsobj = soup(html.content, 'lxml')
# Print the text of every <h2> tag on the page as a headline
for link in bsobj.find_all("h2"):
    print("Headline : {}".format(link.text))


# Print the body of each live-update post; these class names look auto-generated and may change
for news in bsobj.find_all('article', {'class': 'sc-jqCOkK sc-kfGgVZ hQCVkd'}):
    print(news.text.strip())
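The CNN script builds its URL from today's date, which makes the date formatting hard to check in isolation. As a minimal sketch, the same pattern can be wrapped in a small function that accepts any date. The names `CNN_LIVE_TEMPLATE` and `build_cnn_live_url` are hypothetical, and the function only formats the URL: whether CNN actually published a live page for a given day is not something it can tell you.

```python
from datetime import date

# Same URL pattern as the CNN script, with a placeholder for the date.
CNN_LIVE_TEMPLATE = (
    "https://edition.cnn.com/world/live-news/"
    "coronavirus-pandemic-{}-intl/index.html"
)


def build_cnn_live_url(day):
    """Format the live-news URL for the given day using the MM-DD-YY pattern."""
    return CNN_LIVE_TEMPLATE.format(day.strftime("%m-%d-%y"))


example_url = build_cnn_live_url(date(2020, 4, 20))
print(example_url)
# https://edition.cnn.com/world/live-news/coronavirus-pandemic-04-20-20-intl/index.html
```

Passing `date.today()` reproduces the behavior of the original script, while a fixed date makes it easy to re-run the scraper for an earlier day.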

NBC News:

nbc_url = 'https://www.nbcnews.com/health/coronavirus'
r = requests.get(nbc_url)
b = soup(r.content, 'lxml')
# Print the text of every <h2> tag on the coronavirus section page
for news in b.find_all('h2'):
    print(news.text)
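Printing every `h2` returns all headlines on the page, on any subject. To keep only those about a given topic, such as the “health” example mentioned earlier, a simple keyword filter is one option. The sketch below is an assumption about how that might look: `filter_headlines` is a hypothetical helper, and the sample headlines are made up to stand in for the scraped text.

```python
def filter_headlines(headlines, keywords):
    """Keep headlines containing at least one keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [h for h in headlines if any(k in h.lower() for k in lowered)]


# Hypothetical headlines standing in for the text of the scraped <h2> tags.
sample = [
    "COVID-19 vaccine trials expand",
    "Markets rally after jobs report",
    "New coronavirus guidance for schools",
]

matches = filter_headlines(sample, ["covid", "coronavirus"])
print(matches)
# ['COVID-19 vaccine trials expand', 'New coronavirus guidance for schools']
```

In the NBC script, the input list could be built with `[news.text for news in b.find_all('h2')]`.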


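# Collect the article links from the headline cards
links = []
for news in b.find_all('h2', {'class': 'teaseCard__headline'}):
    if news.a and news.a.get('href'):
        links.append(news.a['href'])
# Fetch each linked article and print its body text
for link in links:
    page = requests.get(link)
    bsobj = soup(page.content, 'lxml')
    for news in bsobj.find_all('div', {'class': 'article-body__section article-body__last-section'}):
        print(news.text.strip())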
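Links collected from a page are not always ready to fetch: some may be relative, some repeated, and some tags may have no `href` at all. The sketch below shows one way to clean such a list with the standard `urllib.parse.urljoin` before requesting each article. `normalize_links` is a hypothetical helper and the sample `hrefs` are invented for illustration.

```python
from urllib.parse import urljoin


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # skip tags that had no href
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical hrefs: one relative, one absolute duplicate, one missing, one new.
hrefs = [
    "/health/story-one",
    "https://www.nbcnews.com/health/story-one",
    None,
    "/health/story-two",
]
clean_links = normalize_links("https://www.nbcnews.com/health/coronavirus", hrefs)
print(clean_links)
```

Fetching each article once avoids unnecessary requests against the site, which also keeps the scraper faster.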


CNN and NBC are only two examples; news data can be scraped from other websites as well. If you need news content aggregation, our scraping services can meet your requirements.