Web Scraping Property Listings from Booking.com with Python in 2023

In this article, we will learn how to scrape property listings from Booking.com using Python. We will use the requests and Beautiful Soup libraries to fetch the HTML content and then extract key information like property name, location, ratings, etc.

Prerequisites

To follow along, you will need:

Python 3 installed on your system

Basic knowledge of Python programming

pip installed to install Python packages

Installing Dependencies

We will use two Python packages - requests and BeautifulSoup for scraping.

To install them:

pip install requests beautifulsoup4

This will download and install the latest versions of these libraries.

Importing Modules

At the top of your script, import the necessary modules:

import requests
from bs4 import BeautifulSoup

requests will be used to send HTTP requests to fetch webpages.

BeautifulSoup will help parse and extract information from the HTML content.

Defining the Target URL

We will scrape property listings from this URL on Booking.com:

url = "<https://www.booking.com/searchresults.en-gb.html?ss=Manhattan%2C+New+York%2C+New+York+State%2C+United+States&map=1&efdco=1&label=gen173nr-1BCAEoggI46AdIM1gEaGyIAQGYAQm4AQfIAQzYAQHoAQGIAgGoAgO4AoH47qgGwAIB0gIkNTU1NTYzMjUtMzFhMS00NzZhLTg5YTQtZGFhOTNiZDRlMDI32AIF4AIB&sid=10b122fa3c8cf51d65c8a9e1ceb2fbcf&aid=304142&lang=en-gb&sb=1&src_elem=sb&src=index&dest_id=929&dest_type=district&ac_position=1&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=7ae531418d840184&ac_meta=GhA3YWU1MzE0MThkODQwMTg0IAEoATICZW46Bm5ldyB5b0AASgBQAA%3D%3D&checkin=2023-11-08&checkout=2023-11-11&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure#map_closed>"

This URL fetches property listings for Manhattan, New York for given check-in and check-out dates.

You can modify the parameters like location, dates, number of people etc to fetch listings as per your need.

Setting User-Agent Header

Many websites block requests without a valid User-Agent string. So we need to set it to mimic a real browser:

headers = {
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
}

Replace this with any modern browser's user agent string.

Fetching the HTML Page

With the URL and headers defined, we can use the requests.get() method to fetch the HTML content:

response = requests.get(url, headers=headers)

This sends a GET request to the URL and stores the response in the response variable.

We should check that the request was successful before proceeding:

if response.status_code == 200:
  # Success!
  # Parse HTML here
else:
  print("Failed to fetch webpage")

A status code of 200 means the request was successful. Any other code means an error occurred.

Parsing the HTML

With the HTML content fetched, we can use Beautiful Soup to parse and extract information from it:

soup = BeautifulSoup(response.text, "html.parser")

This parses the response.text using the HTML parser.

We now have a BeautifulSoup object that represents the document structure.

Extracting Property Cards

The property listings on Booking.com are contained in div elements with a data-testid attribute value of property-card.

We can use Beautiful Soup's find_all() method to find all such elements:

property_cards = soup.find_all("div", {"data-testid": "property-card"})

This gives us a list of all property cards on the page.

Looping Through Property Cards

With the list of cards, we can loop through each one to extract information:

for card in property_cards:

  # Extract information from card here

Inside this loop, we will locate specific elements and extract their text.

Saving the Raw HTML

It's useful to save the raw HTML of each card before we start extracting text. This serves as a backup if we ever need to re-extract information.

filename = f"property_card.html"
card_html = str(card)

with open(filename, "w", encoding="utf-8") as file:
  file.write(card_html)

This converts the BeautifulSoup card object to a string and writes it to a file.

The filename is dynamically generated for each card.

Extracting Property Name

To extract the property name, we find the div with data-testid set to title:

title_element = card.find("div", {"data-testid": "title"})

if title_element is not None:
  property_name = title_element.text.strip()
else:
  property_name = "Title not found"

We first try to find the element. If found, we take its .text property and call .strip() to remove extra whitespace.

If the title element doesn't exist, we set a default value.

Extracting Location

The location string is stored in a span with data-testid set to address:

property_location = card.find("span", {"data-testid": "address"}).text.strip()

Similar to before, we find the element and extract the text inside it.

Extracting Star Rating

The star rating element has a class name we can search for:

star_rating_div = card.find("div", {"class": "e4755bbd60"})

if star_rating_div:
  star_rating = star_rating_div["aria-label"]
else:
  star_rating = "Star rating not found"

If the div is found, we extract the aria-label attribute which contains the rating value.

Extracting Review Count

The review count is stored in a div with class abf093bdfe:

review_count_div = card.find("div", {"class": "abf093bdfe"})

if review_count_div:
  review_count = review_count_div.text.strip()
else:
  review_count = "Review count not available"

Again, we try to extract the text if the element is found, else set a default value.

Extracting Description

The property description is contained in a div with class d7449d770c:

description = card.find("div", {"class": "d7449d770c"}).text.strip()

The rest of the logic is similar to what we've seen before.

Printing the Extracted Data

Finally, we can print the extracted information from each card:

print("Name:", property_name)
print("Location:", property_location)
print("Rating:", star_rating)
print("Review Count:", review_count)
print("Description:", description)
print("-" * 40) # separator

This will nicely print out the key details for each property listing.

You can also store this data in a dictionary or other data structure instead of printing.

Full Code

For your reference, here is the full web scraping script:

import requests
from bs4 import BeautifulSoup

url = "<https://www.booking.com/searchresults.en-gb.html?ss=Manhattan%2C+New+York%2C+New+York+State%2C+United+States&map=1&efdco=1&label=gen173nr-1BCAEoggI46AdIM1gEaGyIAQGYAQm4AQfIAQzYAQHoAQGIAgGoAgO4AoH47qgGwAIB0gIkNTU1NTYzMjUtMzFhMS00NzZhLTg5YTQtZGFhOTNiZDRlMDI32AIF4AIB&sid=10b122fa3c8cf51d65c8a9e1ceb2fbcf&aid=304142&lang=en-gb&sb=1&src_elem=sb&src=index&dest_id=929&dest_type=district&ac_position=1&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=7ae531418d840184&ac_meta=GhA3YWU1MzE0MThkODQwMTg0IAEoATICZW46Bm5ldyB5b0AASgBQAA%3D%3D&checkin=2023-11-08&checkout=2023-11-11&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure#map_closed>"

headers = {
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
}

response = requests.get(url, headers=headers)

if response.status_code == 200:

  soup = BeautifulSoup(response.text, "html.parser")

  property_cards = soup.find_all("div", {"data-testid": "property-card"})

  for card in property_cards:

    filename = f"property_card.html"
    card_html = str(card)

    with open(filename, "w", encoding="utf-8") as file:
      file.write(card_html)

    title_element = card.find("div", {"data-testid": "title"})
    if title_element is not None:
      property_name = title_element.text.strip()
    else:
      property_name = "Title not found"

    property_location = card.find("span", {"data-testid": "address"}).text.strip()

    star_rating_div = card.find("div", {"class": "e4755bbd60"})
    if star_rating_div:
      star_rating = star_rating_div["aria-label"]
    else:
      star_rating = "Star rating not found"

    review_count_div = card.find("div", {"class": "abf093bdfe"})
    if review_count_div:
      review_count = review_count_div.text.strip()
    else:
      review_count = "Review count not available"

    description = card.find("div", {"class": "d7449d770c"}).text.strip()

    print("Name:", property_name)
    print("Location:", property_location)
    print("Rating:", star_rating)
    print("Review Count:", review_count)
    print("Description:", description)
    print("-" * 40)

else:
  print("Failed to fetch webpage")

And that's it! We've written a complete web scraper to extract hotel listings from Booking.com using Python. The same technique can be applied to scrape any site.

Hope this gives you a solid understanding of how to use requests and BeautifulSoup for web scraping.

While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.

Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

This allows scraping at scale without headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

With the power of Proxies API combined with Python libraries like Beautiful Soup, you can scrape data at scale without getting blocked.

Web Scraping Property Listings from Booking.com with Python in 2023

Prerequisites

Installing Dependencies

Importing Modules

Defining the Target URL

Setting User-Agent Header

Fetching the HTML Page

Parsing the HTML

Extracting Property Cards

Looping Through Property Cards

Saving the Raw HTML

Extracting Property Name

Extracting Location

Extracting Star Rating

Extracting Review Count

Extracting Description

Printing the Extracted Data

Full Code

Browse by language:

The easiest way to do Web Scraping

Web Scraping Property Listings from Booking.com with Python in 2023

Prerequisites

Installing Dependencies

Importing Modules

Defining the Target URL

Setting User-Agent Header

Fetching the HTML Page

Parsing the HTML

Extracting Property Cards

Looping Through Property Cards

Saving the Raw HTML

Extracting Property Name

Extracting Location

Extracting Star Rating

Extracting Review Count

Extracting Description

Printing the Extracted Data

Full Code

The easiest way to do Web Scraping

Don't leave just yet!