Scraping Booking.com Property Listings with C# in 2023

Oct 15, 2023 · 4 min read

In this article, we will see how to use C# and HtmlAgilityPack to scrape and extract data from Booking.com property listings.

Prerequisites

You will need:

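  • Visual Studio and .NET 6 or later
  • HtmlAgilityPack NuGet package

Installing HtmlAgilityPack

Install the HtmlAgilityPack NuGet package in your project. You can use the NuGet Package Manager in Visual Studio, or run dotnet add package HtmlAgilityPack in the project folder.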

Adding Namespaces

Add these namespaces:

    using HtmlAgilityPack;
    using System.Net;
    

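Defining the URL

Define the target search URL. In this URL, ss is the destination, checkin and checkout are the stay dates in yyyy-MM-dd format, and group_adults is the number of adults. Booking.com does not show availability for dates in the past, so replace the example dates with upcoming ones: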

    string url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";
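If you run the scraper regularly, you may prefer to build this URL from parameters instead of editing the string by hand. The following is a minimal sketch. BookingUrl and BuildSearchUrl are hypothetical names. The sketch assumes Booking.com keeps accepting the same query parameters (ss, checkin, checkout, group_adults) shown above:

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of HtmlAgilityPack or any Booking.com API.
// It reproduces the query-string format of the search URL above.
public static class BookingUrl
{
    public static string BuildSearchUrl(string destination, DateTime checkin, DateTime checkout, int adults)
    {
        if (checkout.Date <= checkin.Date)
            throw new ArgumentException("Checkout must be after check-in.", nameof(checkout));
        if (adults < 1)
            throw new ArgumentOutOfRangeException(nameof(adults));

        // Percent-encode the destination, then use '+' for spaces as the original URL does.
        string ss = Uri.EscapeDataString(destination).Replace("%20", "+");

        // Format dates as yyyy-MM-dd regardless of the machine's culture settings.
        string inDate = checkin.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        string outDate = checkout.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        return "https://www.booking.com/searchresults.en-gb.html"
            + $"?ss={ss}&checkin={inDate}&checkout={outDate}&group_adults={adults}";
    }
}
```

For example, BookingUrl.BuildSearchUrl("New York", DateTime.Today.AddDays(30), DateTime.Today.AddDays(34), 2) produces a search for a four-night stay a month from today.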
    

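Downloading the Page HTML

Use HttpClient to download the page HTML. WebClient is marked obsolete as of .NET 6. Booking.com may also reject requests that lack a browser-like User-Agent header, so set one explicitly. HttpClient lives in System.Net.Http, which the default .NET 6 project templates import implicitly:

    using var client = new HttpClient();
    client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0 Safari/537.36");
    string html = await client.GetStringAsync(url);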
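A single download can fail because of a transient network error or rate limiting. One common mitigation is to retry with exponential backoff. Below is a minimal sketch. Retry.WithBackoffAsync is a hypothetical helper, and the attempt count and delays are assumptions you should tune:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: runs an async operation and retries it when it throws
// HttpRequestException, waiting baseDelay, 2 x baseDelay, 4 x baseDelay, ...
public static class Retry
{
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan baseDelay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // Exponential backoff: baseDelay * 2^(attempt - 1).
                TimeSpan delay = TimeSpan.FromMilliseconds(
                    baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1));
                await Task.Delay(delay);
            }
            // On the last attempt the filter is false, so the exception propagates.
        }
    }
}
```

With an HttpClient named client, you could wrap the download as Retry.WithBackoffAsync(() => client.GetStringAsync(url), 3, TimeSpan.FromSeconds(2)). GetStringAsync throws HttpRequestException for non-success status codes, so responses such as 403 are retried too. If the site is actively blocking your IP, retrying from the same address is unlikely to help.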
    

Loading the HTML

Load the HTML into an HtmlDocument:

    HtmlDocument document = new HtmlDocument();
    document.LoadHtml(html);
    

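Selecting Property Cards

Use XPath to select the property cards. When nothing matches, SelectNodes returns null rather than an empty collection. This can happen if the request was blocked or the markup changed, so check for null before looping:

    var cards = document.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
    if (cards == null)
    {
      Console.WriteLine("No property cards found.");
      return;
    }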
    

Looping Through Cards

Loop through the cards:

    foreach (var card in cards)
    {
      // Extract data from card
    }
    

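Extracting Title

Get the title element and its inner text. SelectSingleNode returns null when a card lacks the element. Use the null-conditional operator and fall back to an empty string:

    var titleElement = card.SelectSingleNode(".//div[@data-testid='title']");
    string title = titleElement?.InnerText.Trim() ?? "";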
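InnerText returns the raw text of the node. That text can still contain HTML entities such as &amp; and line breaks from the page's formatting. HtmlAgilityPack provides HtmlEntity.DeEntitize for decoding entities. The sketch below shows a standard-library alternative that also collapses whitespace. TextCleaner is a hypothetical helper name:

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for tidying text taken from HtmlNode.InnerText.
public static class TextCleaner
{
    public static string Clean(string? raw)
    {
        if (string.IsNullOrEmpty(raw))
            return string.Empty;

        // Decode entities such as &amp; and &nbsp; into their characters.
        string decoded = WebUtility.HtmlDecode(raw);

        // Collapse runs of whitespace (spaces, newlines, tabs, non-breaking
        // spaces) into single spaces and strip both ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Inside the loop, string title = TextCleaner.Clean(titleElement?.InnerText); works whether or not the element was found. The same call applies to the location, review count, and description fields.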
    

Extracting Location

Get the address span and its text:

    var locationElement = card.SelectSingleNode(".//span[@data-testid='address']");
    string location = locationElement?.InnerText.Trim() ?? "";
    

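Extracting Rating

The rating is stored in the aria-label attribute of a div. Class names such as e4755bbd60 appear to be auto-generated by Booking.com and change over time, so expect to update these selectors when the scraper stops finding data:

    var ratingElement = card.SelectSingleNode(".//div[contains(@class, 'e4755bbd60')]");
    string rating = ratingElement?.GetAttributeValue("aria-label", "") ?? "";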
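The aria-label holds the score as text. If you need it as a number, you can pull out the first decimal value. This sketch assumes the label contains the score in a form like 8.7, with a dot as the decimal separator. RatingParser is a hypothetical name, and the sample labels in the test are illustrative rather than captured from the site:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper. The aria-label wording is an assumption; check a live
// page and adjust the pattern if Booking.com formats it differently.
public static class RatingParser
{
    // Returns the first number such as "8" or "8.7" found in the label, or null.
    public static double? ParseScore(string? label)
    {
        if (string.IsNullOrEmpty(label))
            return null;

        Match match = Regex.Match(label, @"\d+(?:\.\d+)?");
        if (!match.Success)
            return null;

        // Invariant culture, so "." is always treated as the decimal point.
        return double.Parse(match.Value, CultureInfo.InvariantCulture);
    }
}
```

Calling RatingParser.ParseScore(rating) returns null for listings without a score. You can then skip those listings instead of storing an empty string.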
    

Extracting Review Count

Get the text of the review count div. Properties without reviews may not have one:

    var reviewCountElement = card.SelectSingleNode(".//div[contains(@class, 'abf093bdfe')]");
    string reviewCount = reviewCountElement?.InnerText.Trim() ?? "";
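The review count comes back as display text, which may include a thousands separator and a word such as "reviews". Here is a minimal sketch for turning it into an integer. It assumes a comma is the only separator used, and ReviewCountParser is a hypothetical name:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; assumes digits grouped with commas, e.g. "1,234 reviews".
public static class ReviewCountParser
{
    public static int? ParseCount(string? text)
    {
        if (string.IsNullOrEmpty(text))
            return null;

        // Match a run of digits that may contain comma group separators.
        Match match = Regex.Match(text, @"\d[\d,]*");
        if (!match.Success)
            return null;

        string digits = match.Value.Replace(",", "");
        return int.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out int count)
            ? count
            : (int?)null;
    }
}
```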
    

Extracting Description

Get the description div text:

    var descriptionElement = card.SelectSingleNode(".//div[contains(@class, 'd7449d770c')]");
    string description = descriptionElement?.InnerText.Trim() ?? "";
    

Printing the Data

Print out the extracted information:

    Console.WriteLine("Title: " + title);
    Console.WriteLine("Location: " + location);
    Console.WriteLine("Rating: " + rating);
    Console.WriteLine("Review Count: " + reviewCount);
    Console.WriteLine("Description: " + description);
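Printing is fine for a quick check, but you will often want to save the results. One option is to write each card as a CSV row. The sketch below follows the usual CSV quoting convention (RFC 4180): fields that contain commas, quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled. CsvRow.Format is a hypothetical helper:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that formats one CSV record with RFC 4180 quoting.
public static class CsvRow
{
    public static string Format(IEnumerable<string?> fields) =>
        string.Join(",", fields.Select(Escape));

    private static string Escape(string? field)
    {
        string value = field ?? "";
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;

        // Double any embedded quotes and wrap the field only when required.
        return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
    }
}
```

Inside the loop, you could append CsvRow.Format(new[] { title, location, rating, reviewCount, description }) plus Environment.NewLine to a file with File.AppendAllText. The file name is up to you.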
    

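And that's how you can scrape data from Booking.com listings using C# and HtmlAgilityPack!

The same approach works on other sites that serve their content as static HTML, as long as you adapt the URL and the XPath selectors to the target markup. Pages that build their content with JavaScript need a headless browser or a rendering service instead.

Full code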

    using HtmlAgilityPack;
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    
    class Program
    {
      static async Task Main(string[] args)
      {
        // Replace the example dates with future ones; past dates show no availability
        string url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2";
    
        // HttpClient replaces the obsolete WebClient; send a browser-like User-Agent
        using var client = new HttpClient();
        client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0 Safari/537.36");
        string html = await client.GetStringAsync(url);
    
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);
    
        // SelectNodes returns null when no nodes match
        var cards = document.DocumentNode.SelectNodes("//div[@data-testid='property-card']");
        if (cards == null)
        {
          Console.WriteLine("No property cards found. The markup may have changed or the request was blocked.");
          return;
        }
    
        foreach (var card in cards)
        {
          string title = TextOf(card, ".//div[@data-testid='title']");
          string location = TextOf(card, ".//span[@data-testid='address']");
    
          // These class names look auto-generated and may need updating
          var ratingElement = card.SelectSingleNode(".//div[contains(@class, 'e4755bbd60')]");
          string rating = ratingElement?.GetAttributeValue("aria-label", "") ?? "";
          string reviewCount = TextOf(card, ".//div[contains(@class, 'abf093bdfe')]");
          string description = TextOf(card, ".//div[contains(@class, 'd7449d770c')]");
    
          Console.WriteLine("Title: " + title);
          Console.WriteLine("Location: " + location);
          Console.WriteLine("Rating: " + rating);
          Console.WriteLine("Review Count: " + reviewCount);
          Console.WriteLine("Description: " + description);
          Console.WriteLine();
        }
      }
    
      // Returns the decoded, trimmed text of the first node matching xpath, or "" if none exists
      static string TextOf(HtmlNode card, string xpath)
      {
        var node = card.SelectSingleNode(xpath);
        return node == null ? "" : HtmlEntity.DeEntitize(node.InnerText).Trim();
      }
    }

These examples are fine for learning, but scraping production sites at scale runs into challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
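To illustrate proxy rotation, here is a minimal round-robin sketch built on HttpClientHandler and WebProxy from the .NET base class library. ProxyRotator is a hypothetical class, and the proxy addresses in the test are placeholders. You would substitute proxies you actually have access to:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;

// Hypothetical round-robin proxy rotator; the proxy list is supplied by you.
public sealed class ProxyRotator
{
    private readonly IReadOnlyList<Uri> _proxies;
    private int _next = -1;

    public ProxyRotator(IReadOnlyList<Uri> proxies)
    {
        if (proxies == null || proxies.Count == 0)
            throw new ArgumentException("At least one proxy is required.", nameof(proxies));
        _proxies = proxies;
    }

    // Returns proxies in round-robin order; Interlocked makes it thread-safe.
    public Uri NextProxy()
    {
        int i = Interlocked.Increment(ref _next);
        return _proxies[(int)((uint)i % (uint)_proxies.Count)];
    }

    // Builds an HttpClient whose requests go through the next proxy in the list.
    public HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(NextProxy()),
            UseProxy = true,
        };
        return new HttpClient(handler);
    }
}
```

Each call to CreateClient returns a new HttpClient bound to the next proxy. Dispose each client when you are done with it, or cache one client per proxy, so that connections are not left open.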

Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages from any language without configuring browsers or proxies yourself.

This lets you scrape at scale without the headache of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

With the power of Proxies API combined with C# libraries like HtmlAgilityPack, you can scrape data at scale without getting blocked.
