How to Scrape USA Real Estate Data from Kugli.com Using Python
Kugli.com is a popular free classified-ad website that operates in multiple countries, including the USA, Spain, Italy, and Greece. Anyone can post an ad for free to reach potential customers or buyers. Ads cover categories such as Cars, Real Estate, Pets, Jobs, and Sell and Buy. Most ads come from local businesses and individuals, so it is a good place to look for things in your locality: a house to rent or buy, a used car or a mechanic, pets to buy or adopt, or local jobs. USA real estate listings in particular are a common scraping target on Kugli.
This website is relatively easy to scrape. All the data is in tabular form, so we navigate the page through its table elements.
In this tutorial we will go to Kugli and search for real estate in the USA:
https://www.kugli.com/Real_Estate/country/United_States-US/
We will scrape details such as price, date posted, description, and location. Then we will scrape the same details from all the pages by changing the URL dynamically. Watch the video for a detailed description.
See complete code below:
Import Libraries:
import requests
from bs4 import BeautifulSoup as soup
Send GET request:
page = requests.get('https://www.kugli.com/Real_Estate/country/United_States-US/')
bsobj = soup(page.content,'lxml')
Scrape all details from table:
for row in bsobj.find_all('tbody'):
    # Direct children of <tbody> are the table rows
    cols = row.find_all(recursive=False)
    # Split each row's text into a list of fields
    cols = [element.text.strip().split('\n') for element in cols]
cols
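To see what the row-splitting step produces, here is a minimal sketch that runs on a small hand-written table instead of the live site. The HTML snippet and its values are made up for illustration and are not Kugli's actual markup. `html.parser` is used here only so that the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (NOT copied from Kugli), for illustration only.
sample_html = """
<table><tbody>
<tr><th>Price</th>
<th>Date</th></tr>
<tr><td>$1,200</td>
<td>2021-03-01</td></tr>
</tbody></table>
"""

sample = BeautifulSoup(sample_html, "html.parser")

rows = []
for body in sample.find_all("tbody"):
    # Direct children of <tbody> are the <tr> rows.
    for tr in body.find_all(recursive=False):
        # Cell texts are separated by the newlines between the cells.
        rows.append(tr.text.strip().split("\n"))

print(rows)
```

Each inner list holds the cell texts of one table row, with the header row first.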
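Create pandas data frame:
import pandas as pd
# Skip the first row (assumed to be the table header); 'D' is an unused column
df = pd.DataFrame(data=cols[1:], columns=['Price', 'Date', 'D', 'Description', 'Location']).dropna()
# drop() returns a new data frame, so assign the result back
df = df.drop(columns=['D'])
df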
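Scraped rows do not always have the same number of fields, and pandas can raise a `ValueError` when row widths do not match the column list, for example when a row has more fields than there are column names. One possible guard, sketched below, is to keep only rows with the expected number of fields before building the data frame. The helper name `keep_full_rows` and the sample rows are hypothetical.

```python
def keep_full_rows(rows, width):
    """Return only the rows that have exactly `width` fields."""
    return [row for row in rows if len(row) == width]


# Made-up rows for illustration: one complete, one too short, one too long.
sample_rows = [
    ["$1,200", "2021-03-01", "", "2 bed apartment", "Austin"],
    ["Sponsored"],
    ["$900", "2021-03-02", "", "Studio", "Dallas", "extra"],
]

clean_rows = keep_full_rows(sample_rows, 5)
print(clean_rows)
```

The filtered list can then be passed as `data=` to `pd.DataFrame` together with the five column names used above.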
Get links of all the pages:
links = []
# Each results page is addressed by an offset: 0, 10, 20, ..., 90
for a in range(0, 100, 10):
    link = 'https://www.kugli.com/Real_Estate/country/United_States-US/from/' + str(a)
    links.append(link)
links
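The same list can be built with a small helper, which makes it easier to change the number of pages. This is a sketch only: the function name `page_links` is hypothetical, and the step of 10 listings per page is taken from the loop above rather than verified against the site.

```python
BASE_URL = "https://www.kugli.com/Real_Estate/country/United_States-US/from/"


def page_links(n_pages, step=10, base_url=BASE_URL):
    """Build one URL per results page using the offset pattern from the tutorial."""
    return [f"{base_url}{offset}" for offset in range(0, n_pages * step, step)]


links = page_links(10)
print(links[0])   # first page, offset 0
print(links[-1])  # last page, offset 90
```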
Scraping data from multiple pages:
col = []
for link in links:
    # Download and parse each results page
    pages = requests.get(link)
    bs = soup(pages.content, 'lxml')
    for row in bs.find_all('tbody'):
        cols = row.find_all(recursive=False)
        for element in cols:
            # One list of fields per table row
            a = element.text.strip().split('\n')
            col.append(a)
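When looping over many pages it is common to pause between requests and to stop once a page yields no rows. The sketch below shows that control flow only, using stand-in data so it runs offline. `scrape_pages`, `fetch`, and `parse_rows` are hypothetical names: `fetch` stands for whatever downloads a page (for example a wrapper around `requests.get`), and `parse_rows` for the table-parsing step above. Treating an empty page as the end of the results is an assumption, not something confirmed for Kugli.

```python
import time


def scrape_pages(links, fetch, parse_rows, delay=1.0):
    """Collect rows from each link in order, pausing between requests."""
    all_rows = []
    for index, link in enumerate(links):
        if index > 0:
            time.sleep(delay)  # pause before every request except the first
        rows = parse_rows(fetch(link))
        if not rows:
            break  # assumption: an empty page means there are no more results
        all_rows.extend(rows)
    return all_rows


# Offline demonstration with stand-in data instead of real HTTP requests.
fake_pages = {"page0": "a,b\nc,d", "page1": "e,f", "page2": ""}


def fake_fetch(link):
    return fake_pages[link]


def fake_parse(text):
    return [line.split(",") for line in text.splitlines()]


result = scrape_pages(["page0", "page1", "page2"], fake_fetch, fake_parse, delay=0)
print(result)
```

Passing the download and parsing steps in as arguments keeps the loop easy to test without touching the network.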
Create pandas data frame:
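import pandas as pd
# Skip the first row (assumed to be the table header); 'D' is an unused column
df = pd.DataFrame(data=col[1:], columns=['Price', 'Date', 'D', 'Description', 'Location']).dropna()
# drop() returns a new data frame, so assign the result back
df = df.drop(columns=['D'])
df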
Use this code to scrape USA real estate data and other business data. You can also use our Real Estate website scraping service for more USA property listings.