Scraping Booking.com Property Listings in Visual Basic in 2023

Oct 15, 2023 · 4 min read

In this article, we will learn how to scrape property listings from Booking.com using Visual Basic. We will use HttpClient to fetch the HTML and HtmlAgilityPack to parse it and extract details such as property name, location, and rating.

Prerequisites

To follow along, you will need:

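  • Visual Studio
  • Basic Visual Basic and HTML knowledge

Adding References

We need to add references to the following libraries:

  • HtmlAgilityPack
  • System.Net.Http

HtmlAgilityPack is distributed as a NuGet package: in Visual Studio, right-click the project, choose Manage NuGet Packages, and install HtmlAgilityPack. In .NET Framework projects, System.Net.Http can be added by right-clicking References > Add Reference; in .NET Core and later it is available by default.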

    Importing Namespaces

    Import the required namespaces:

    Imports HtmlAgilityPack
    Imports System.Net.Http
    

    Defining URL

    Define the target URL:

    Dim url As String = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"
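
If the destination, dates, or guest count need to vary, the query string can be assembled from parameters instead of being hard-coded. The following is a minimal sketch under that assumption; BuildSearchUrl is a hypothetical helper name, and the parameter names (ss, checkin, checkout, group_adults) are simply the ones visible in the URL above. Note that Uri.EscapeDataString encodes a space as %20 rather than +.

```vb
Imports System
Imports System.Globalization

Module UrlSketch
    ' Hypothetical helper: builds a search URL using the query parameters
    ' that appear in the article's example URL.
    Function BuildSearchUrl(city As String, checkin As Date, checkout As Date, adults As Integer) As String
        Return "https://www.booking.com/searchresults.en-gb.html" &
            "?ss=" & Uri.EscapeDataString(city) &
            "&checkin=" & checkin.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) &
            "&checkout=" & checkout.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) &
            "&group_adults=" & adults.ToString(CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim url As String = BuildSearchUrl("New York", New Date(2023, 3, 1), New Date(2023, 3, 5), 2)
        Console.WriteLine(url)
    End Sub
End Module
```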
    

    Setting User Agent

    Set the user agent string:

    Dim userAgent As String = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
    

    Fetching HTML Page

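    Use HttpClient to send the request:

    Dim client As New HttpClient()
    client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", userAgent)
    
    ' .Result blocks until the request completes, which is acceptable in a simple console app
    Dim response As HttpResponseMessage = client.GetAsync(url).Result
    response.EnsureSuccessStatusCode() ' throws if the status code is not 2xx
    Dim html As String = response.Content.ReadAsStringAsync().Result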
    

    We add the user agent header and fetch the HTML.
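
In applications where blocking the calling thread is undesirable, the same request can be written with Async/Await. The sketch below is one possible shape, not the article's method: FetchHtmlAsync is a hypothetical helper name, and https://example.com/ is a placeholder URL. It also checks the status code so that an error page is not parsed as if it were search results.

```vb
Imports System
Imports System.Net.Http
Imports System.Threading.Tasks

Module FetchSketch
    ' A single shared HttpClient is reused across requests.
    Private ReadOnly Client As New HttpClient()

    ' Hypothetical helper: fetches a page and returns its HTML as a string.
    Async Function FetchHtmlAsync(url As String, userAgent As String) As Task(Of String)
        Using request As New HttpRequestMessage(HttpMethod.Get, url)
            request.Headers.TryAddWithoutValidation("User-Agent", userAgent)
            Using response As HttpResponseMessage = Await Client.SendAsync(request)
                ' Throws HttpRequestException when the status code is not 2xx.
                response.EnsureSuccessStatusCode()
                Return Await response.Content.ReadAsStringAsync()
            End Using
        End Using
    End Function

    Sub Main()
        ' Placeholder URL and user agent; a console Main has to wait for the task itself.
        Dim html As String = FetchHtmlAsync("https://example.com/", "Mozilla/5.0").GetAwaiter().GetResult()
        Console.WriteLine("Fetched " & html.Length.ToString() & " characters")
    End Sub
End Module
```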

    Parsing HTML

    Load the HTML into an HtmlDocument:

    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    

    Extracting Cards

    Select the div elements whose data-testid attribute is property-card:

    Dim cards = htmlDoc.DocumentNode.SelectNodes("//div[@data-testid='property-card']")
    

    This extracts the property cards using XPath.
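
One detail worth knowing is that HtmlAgilityPack's SelectNodes returns Nothing, not an empty collection, when the XPath matches nothing (for instance when the site serves a block page instead of results). A minimal sketch of guarding against that follows; FindCards is a hypothetical helper name and both HTML strings are made-up samples.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module CardLookupSketch
    ' Hypothetical helper: returns the matched cards, or an empty list
    ' when the XPath matches nothing.
    Function FindCards(html As String) As IList(Of HtmlNode)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@data-testid='property-card']")
        If nodes Is Nothing Then Return New List(Of HtmlNode)()
        Return nodes
    End Function

    Sub Main()
        Dim withCards As String =
            "<div data-testid='property-card'><h3>Hotel A</h3></div>" &
            "<div data-testid='property-card'><h3>Hotel B</h3></div>"
        Dim blocked As String = "<html><body>Access denied</body></html>"

        Console.WriteLine(FindCards(withCards).Count.ToString())
        Console.WriteLine(FindCards(blocked).Count.ToString())
    End Sub
End Module
```

Returning an empty list lets callers use For Each without a separate check.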

    Processing Each Card

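    Loop through the cards. SelectNodes returns Nothing when nothing matches, so check for that first:

    If cards Is Nothing Then Return
    
    For Each card As HtmlNode In cards
    
        ' Extract data from card
    
    Next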
    

    Inside the loop, we extract details from each card node.

    Extracting Title

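    Get the text of the card's h3 element. The leading .// makes the search cover all descendants of the card, not just its direct children, and ?. yields Nothing instead of throwing when the element is missing:

    Dim title As String = card.SelectSingleNode(".//h3")?.InnerText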
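
The difference between a child-only XPath such as h3 and a descendant XPath such as .//h3 is easy to miss, so here is a small sketch that shows it. The markup is a made-up sample in which the h3 sits two levels below the card; the real page's nesting may differ.

```vb
Imports System
Imports HtmlAgilityPack

Module RelativeXPathSketch
    Sub Main()
        ' Hypothetical markup: the h3 is nested inside a div and a link.
        Dim sampleHtml As String =
            "<div data-testid='property-card'><div><a><h3>Sample Hotel</h3></a></div></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(sampleHtml)
        Dim card As HtmlNode =
            doc.DocumentNode.SelectSingleNode("//div[@data-testid='property-card']")

        ' "h3" only looks at direct children of the card, so nothing matches here.
        Dim direct As HtmlNode = card.SelectSingleNode("h3")
        Console.WriteLine(direct Is Nothing)

        ' ".//h3" searches every descendant of the card.
        Dim nested As HtmlNode = card.SelectSingleNode(".//h3")
        Console.WriteLine(nested.InnerText)
    End Sub
End Module
```

Note that //h3 without the leading dot would search the whole document rather than the current card.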
    

    Extracting Location

    Get the text of the address span:

    Dim location As String = card.SelectSingleNode(".//span[@data-testid='address']")?.InnerText
    

    Extracting Rating

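    Get the value of the aria-label attribute on the rating div:

    Dim rating As String = card.SelectSingleNode(".//div[@class='e4755bbd60']")?.GetAttributeValue("aria-label", "")
    

    The div is selected by its class name. Class names such as e4755bbd60 look auto-generated and may change whenever the site is redeployed, so check them against the live page's markup before running the scraper.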

    Extracting Review Count

    Get the text of the review count div:

    Dim reviewCount As String = card.SelectSingleNode(".//div[@class='abf093bdfe']")?.InnerText
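
The rating and review count arrive as display strings, so sorting or filtering on them requires converting them to numbers. The sketch below assumes strings shaped like "Scored 8.5" and "1,234 reviews"; those formats are illustrative guesses, not confirmed against the live page, and ParseRating and ParseReviewCount are hypothetical helper names.

```vb
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module NumberParsingSketch
    ' Returns the first decimal number found in text, or Nothing if there is none.
    Function ParseRating(text As String) As Double?
        If String.IsNullOrEmpty(text) Then Return Nothing
        Dim m As Match = Regex.Match(text, "[0-9]+(\.[0-9]+)?")
        If Not m.Success Then Return Nothing
        Return Double.Parse(m.Value, CultureInfo.InvariantCulture)
    End Function

    ' Strips everything except digits (including thousands separators)
    ' and parses what remains; returns Nothing if no digits are left.
    Function ParseReviewCount(text As String) As Integer?
        If String.IsNullOrEmpty(text) Then Return Nothing
        Dim digits As String = Regex.Replace(text, "[^0-9]", "")
        Dim count As Integer
        If Integer.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, count) Then Return count
        Return Nothing
    End Function

    Sub Main()
        ' Assumed sample strings; adjust to whatever the page actually returns.
        Console.WriteLine(ParseRating("Scored 8.5"))
        Console.WriteLine(ParseReviewCount("1,234 reviews"))
    End Sub
End Module
```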
    

    Extracting Description

    Get the text of the description div:

    Dim description As String = card.SelectSingleNode(".//div[@class='d7449d770c']")?.InnerText
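
A card that lacks one of these elements (for example, a listing with no reviews) makes SelectSingleNode return Nothing. One way to handle that in a single place is a small helper that also trims whitespace and decodes HTML entities. This is a sketch; GetText is a hypothetical name and the markup is a made-up sample.

```vb
Imports System
Imports HtmlAgilityPack

Module SafeTextSketch
    ' Hypothetical helper: returns the trimmed, entity-decoded text of the first
    ' descendant matching xpath, or fallback when nothing matches.
    Function GetText(parent As HtmlNode, xpath As String, Optional fallback As String = "") As String
        Dim node As HtmlNode = parent.SelectSingleNode(xpath)
        If node Is Nothing Then Return fallback
        Return HtmlEntity.DeEntitize(node.InnerText).Trim()
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div data-testid='property-card'><h3> Bed &amp; Breakfast </h3></div>")
        Dim card As HtmlNode =
            doc.DocumentNode.SelectSingleNode("//div[@data-testid='property-card']")

        ' The h3 exists, so its cleaned-up text is returned.
        Console.WriteLine(GetText(card, ".//h3"))
        ' The sample card has no address span, so the fallback is returned.
        Console.WriteLine(GetText(card, ".//span[@data-testid='address']", "(no address)"))
    End Sub
End Module
```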
    

    Printing Output

    Print the extracted data:

    Console.WriteLine("Title: " & title)
    Console.WriteLine("Location: " & location)
    Console.WriteLine("Rating: " & rating)
    Console.WriteLine("Review Count: " & reviewCount)
    Console.WriteLine("Description: " & description)
    

    Full Code

    Here is the complete VB.NET scraping code:

    Imports HtmlAgilityPack
    Imports System.Net.Http
    
    Module Module1
    
        Sub Main()
          Dim url As String = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"
    
          Dim userAgent As String = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
    
          Dim client As New HttpClient()
          client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", userAgent)
    
          Dim response As HttpResponseMessage = client.GetAsync(url).Result
          response.EnsureSuccessStatusCode() ' throws if the status code is not 2xx
          Dim html As String = response.Content.ReadAsStringAsync().Result
    
          Dim htmlDoc As New HtmlDocument()
          htmlDoc.LoadHtml(html)
    
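          Dim cards = htmlDoc.DocumentNode.SelectNodes("//div[@data-testid='property-card']")
    
          ' SelectNodes returns Nothing when no node matches (e.g. a block page or changed markup)
          If cards Is Nothing Then
              Console.WriteLine("No property cards found.")
              Return
          End If
    
          For Each card As HtmlNode In cards
    
              ' .// searches all descendants of the card; ?. yields Nothing when an element is missing
              Dim title As String = card.SelectSingleNode(".//h3")?.InnerText
              Dim location As String = card.SelectSingleNode(".//span[@data-testid='address']")?.InnerText
              Dim rating As String = card.SelectSingleNode(".//div[@class='e4755bbd60']")?.GetAttributeValue("aria-label", "")
              Dim reviewCount As String = card.SelectSingleNode(".//div[@class='abf093bdfe']")?.InnerText
              Dim description As String = card.SelectSingleNode(".//div[@class='d7449d770c']")?.InnerText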
    
              Console.WriteLine("Title: " & title)
              Console.WriteLine("Location: " & location)
              Console.WriteLine("Rating: " & rating)
              Console.WriteLine("Review Count: " & reviewCount)
              Console.WriteLine("Description: " & description)
    
          Next
    
        End Sub
    
    End Module
    

    This extracts key data from Booking.com listings using VB.NET. The same approach works for other sites that serve their listings as static HTML; only the URL and the XPath expressions change.

    While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.

    Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

    This allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

    With Proxies API combined with .NET libraries like HtmlAgilityPack, you can scrape data at scale without getting blocked.
