Scraping Booking.com Property Listings in Objective-C in 2023

Oct 15, 2023 · 4 min read

In this article, we will scrape property listings from Booking.com using Objective-C. We will fetch the page with NSURLSession, parse the HTML with the HTML Parser framework, and extract key fields such as property name, location, rating, and review count.

Prerequisites

To follow along, you will need:

  • Xcode installed
  • Basic knowledge of Objective-C programming
  • HTML Parser framework added to your project

    Import Frameworks

    At the top of your .m file, import the required frameworks:

    #import <Foundation/Foundation.h>
    #import <HTMLParser/HTMLParser.h>
    

    Define URL

    Let's define the URL we want to scrape:

    NSURL* url = [NSURL URLWithString:@"https://www.booking.com/searchresults.html?ss=New+York&..."];
    

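    The query string is truncated here. The full URL, including check-in and check-out dates, appears in the complete code at the end of the article.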
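
Hand-writing a query string gets error-prone once the search terms vary. As a minimal sketch, the URL can be assembled with Foundation's NSURLComponents instead. The helper name BuildSearchURL is hypothetical; the parameter names (ss, checkin, checkout, group_adults) are taken from the full URL used later in this article. Note that NSURLComponents percent-encodes the space in "New York" as %20 rather than +.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the search URL from individual parameters
// so that values are percent-encoded by Foundation rather than by hand.
static NSURL *BuildSearchURL(NSString *city, NSString *checkin,
                             NSString *checkout, NSInteger adults) {
    NSURLComponents *components =
        [NSURLComponents componentsWithString:@"https://www.booking.com/searchresults.html"];
    NSString *adultCount = [NSString stringWithFormat:@"%ld", (long)adults];
    components.queryItems = @[
        [NSURLQueryItem queryItemWithName:@"ss" value:city],
        [NSURLQueryItem queryItemWithName:@"checkin" value:checkin],
        [NSURLQueryItem queryItemWithName:@"checkout" value:checkout],
        [NSURLQueryItem queryItemWithName:@"group_adults" value:adultCount]
    ];
    return components.URL;
}
```

Calling BuildSearchURL(@"New York", @"2023-03-01", @"2023-03-05", 2) returns an NSURL that can be passed straight to the request in the next step.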

    Set User Agent

    We need to set a valid user agent header:

    NSMutableURLRequest* request = [NSMutableURLRequest requestWithURL:url];
    [request setValue:@"Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..." forHTTPHeaderField:@"User-Agent"];
    

    This will make the request appear to come from a real browser.
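
If several requests are sent, it may be convenient to build them in one place. The sketch below wraps the header setup in a hypothetical helper, BuildRequest. The Accept-Language value and the 30-second timeout are assumptions for illustration, not requirements of the site; the user agent string is passed in by the caller.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: creates a request with browser-like headers.
// The Accept-Language value and timeout are illustrative assumptions.
static NSMutableURLRequest *BuildRequest(NSURL *url, NSString *userAgent) {
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    request.timeoutInterval = 30.0;  // give up on a stalled connection
    [request setValue:userAgent forHTTPHeaderField:@"User-Agent"];
    [request setValue:@"en-GB,en;q=0.9" forHTTPHeaderField:@"Accept-Language"];
    return request;
}
```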

    Fetch HTML Page

    We can use NSURLSession to get the page HTML:

    NSURLSessionDataTask* task = [[NSURLSession sharedSession] dataTaskWithRequest:request completionHandler:^(NSData * _Nullable data, NSURLResponse * _Nullable response, NSError * _Nullable error) {
    
      if(data) {
    
        // Parse HTML
    
      }
    
    }];
    
    [task resume];
    

    We make sure data was returned before parsing.
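
Checking only for data can let error pages through, since a blocked request may still return a body. A slightly stricter check, sketched here with a hypothetical helper named IsUsableResponse, also looks at the error object and the HTTP status code before parsing.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES only when the request succeeded,
// returned a non-empty body, and the server answered with a 2xx status.
static BOOL IsUsableResponse(NSData *data, NSURLResponse *response, NSError *error) {
    if (error != nil || data.length == 0) {
        return NO;
    }
    if (![response isKindOfClass:[NSHTTPURLResponse class]]) {
        return NO;
    }
    NSInteger status = ((NSHTTPURLResponse *)response).statusCode;
    return status >= 200 && status < 300;
}
```

Inside the completion handler, if(data) could then be replaced with if(IsUsableResponse(data, response, error)).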

    Parse HTML

    To parse the HTML, we use HTML Parser's parseHTML:error: method:

    // Named parseError so it does not shadow the completion handler's error parameter
    NSError* parseError = nil;
    HTMLDocument* doc = [HTMLDocument parseHTML:data error:&parseError];
    

    This loads the HTML into an HTMLDocument object.

    Extract Property Cards

    The property cards have a data-testid attribute we can search for:

    NSArray* cards = [doc findElementsWithAttribute:@"data-testid" match:@"property-card"];
    

    This finds all matching elements.

    Loop Through Cards

    We can iterate through the cards:

    for(HTMLElement* card in cards) {
    
      // Extract data from card
    
    }
    

    Inside this loop we will extract information from each card element.

    Extract Title

    To get the title, we search for the data-testid="title" element:

    NSString* title = [[card findElementWithAttribute:@"data-testid" match:@"title"] text];
    

    We grab the text content.

    Extract Location

    Similarly, the address is under a data-testid="address" element:

    NSString* location = [[card findElementWithAttribute:@"data-testid" match:@"address"] text];
    

    The pattern is the same for other fields.
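
Text pulled out of HTML often carries leading spaces, line breaks, or runs of whitespace from the page markup. A small Foundation-only cleanup step, sketched here as a hypothetical NormalizeWhitespace helper, can be applied to each extracted string before it is stored or printed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: trims the string and collapses every run of
// spaces, tabs, and newlines into a single space.
static NSString *NormalizeWhitespace(NSString *text) {
    if (text == nil) {
        return nil;
    }
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    for (NSString *part in [text componentsSeparatedByCharactersInSet:whitespace]) {
        if (part.length > 0) {
            [words addObject:part];  // skip the empty pieces between separators
        }
    }
    return [words componentsJoinedByString:@" "];
}
```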

    Extract Rating

    The star rating aria-label contains the score:

    NSString* rating = [[card findElementWithClass:@"e4755bbd60"] attribute:@"aria-label"];
    

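    Here we read the aria-label attribute from the div. Class names such as e4755bbd60 look auto-generated, so check them against the live page before relying on them.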
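
The aria-label is a sentence rather than a bare number, so a numeric score has to be pulled out of it. The sketch below uses NSRegularExpression to take the first number in the string. The helper name FirstNumberInString and the sample label "Scored 8.5" are assumptions for illustration; the actual wording of the label should be checked on the live page.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first integer or decimal number found
// in the string, or nil when the string contains no digits.
static NSNumber *FirstNumberInString(NSString *text) {
    if (text.length == 0) {
        return nil;
    }
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"[0-9]+(\\.[0-9]+)?"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    return @([[text substringWithRange:match.range] doubleValue]);
}
```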

    Extract Review Count

    The review count text is inside a class="abf093bdfe" element:

    NSString* reviewCount = [[card findElementWithClass:@"abf093bdfe"] text];
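
The review count arrives as display text, so it needs converting before it can be sorted or compared. As a rough sketch, assuming the text looks something like "1,234 reviews" (a hypothetical sample), the digits can be kept and everything else dropped. The helper name ParseReviewCount is also hypothetical. If the text ever contained more than one number, their digits would be joined together, so this only suits strings with a single count.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips everything except ASCII digits and
// converts what remains to an integer. Returns nil if no digits exist.
static NSNumber *ParseReviewCount(NSString *text) {
    NSCharacterSet *nonDigits =
        [[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet];
    NSString *digits =
        [[text componentsSeparatedByCharactersInSet:nonDigits] componentsJoinedByString:@""];
    if (digits.length == 0) {
        return nil;
    }
    return @([digits integerValue]);
}
```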
    

    Extract Description

    The description is in a class="d7449d770c" element:

    NSString* description = [[card findElementWithClass:@"d7449d770c"] text];
    

    Print Extracted Data

    We can print out the extracted data:

    NSLog(@"Name: %@", title);
    NSLog(@"Location: %@", location);
    NSLog(@"Rating: %@", rating);
    // etc...
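
Logging is fine for a quick check, but scraped data is usually saved for later use. One option, sketched here with Foundation's NSJSONSerialization, is to turn each card into a JSON string. The helper name CardToJSON and the key names are hypothetical; missing fields are stored as JSON null because a dictionary literal cannot hold nil.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: serializes one card's fields to a JSON string.
// Fields that could not be extracted (nil) become JSON null.
static NSString *CardToJSON(NSString *title, NSString *location, NSString *rating) {
    NSDictionary *record = @{
        @"name": (id)title ?: [NSNull null],
        @"location": (id)location ?: [NSNull null],
        @"rating": (id)rating ?: [NSNull null]
    };
    NSData *json = [NSJSONSerialization dataWithJSONObject:record
                                                   options:NSJSONWritingSortedKeys
                                                     error:NULL];
    if (json == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:json encoding:NSUTF8StringEncoding];
}
```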
    

    And that covers scraping Booking.com property listings in Objective-C!

    Full Code

    Here is the complete Objective-C code. Because a command-line program would otherwise exit before the asynchronous request finishes, main waits on a semaphore until the completion handler has run:

    #import <Foundation/Foundation.h>
    #import <HTMLParser/HTMLParser.h>
    
    int main() {
    
      @autoreleasepool {
    
        NSURL* url = [NSURL URLWithString:@"https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"];
    
        NSMutableURLRequest* request = [NSMutableURLRequest requestWithURL:url];
        [request setValue:@"Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..." forHTTPHeaderField:@"User-Agent"];
    
        // Signalled once the completion handler has finished
        dispatch_semaphore_t done = dispatch_semaphore_create(0);
    
        NSURLSessionDataTask* task = [[NSURLSession sharedSession] dataTaskWithRequest:request completionHandler:^(NSData * _Nullable data, NSURLResponse * _Nullable response, NSError * _Nullable error) {
    
          if(data) {
    
            // Named parseError so it does not shadow the handler's error parameter
            NSError* parseError = nil;
            HTMLDocument* doc = [HTMLDocument parseHTML:data error:&parseError];
    
            NSArray* cards = [doc findElementsWithAttribute:@"data-testid" match:@"property-card"];
    
            for(HTMLElement* card in cards) {
    
              NSString* title = [[card findElementWithAttribute:@"data-testid" match:@"title"] text];
              NSString* location = [[card findElementWithAttribute:@"data-testid" match:@"address"] text];
              NSString* rating = [[card findElementWithClass:@"e4755bbd60"] attribute:@"aria-label"];
              NSString* reviewCount = [[card findElementWithClass:@"abf093bdfe"] text];
              NSString* description = [[card findElementWithClass:@"d7449d770c"] text];
    
              NSLog(@"Name: %@", title);
              NSLog(@"Location: %@", location);
              NSLog(@"Rating: %@", rating);
              NSLog(@"Review Count: %@", reviewCount);
              NSLog(@"Description: %@", description);
    
            }
    
          } else {
    
            NSLog(@"Request failed: %@", error);
    
          }
    
          dispatch_semaphore_signal(done);
    
        }];
    
        [task resume];
    
        // Keep the process alive until the request has completed
        dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
    
      }
    
      return 0;
    
    }
    

    While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.

    Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

    This allows scraping at scale without headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

    With Proxies API handling the fetching, you can parse the returned HTML in Objective-C as shown above and scrape data at scale without getting blocked.
