Scraping eBay Listings with Ruby and Nokogiri in 2023

Oct 5, 2023 · 4 min read

eBay is one of the largest online marketplaces with millions of active listings at any given time. In this tutorial, we'll walk through how to scrape and extract key data from eBay listings using Ruby and the Nokogiri library.

Setup

We'll need the following gems installed:

  • httparty - for sending HTTP requests to the eBay website
  • nokogiri - for parsing and extracting data from the HTML of the listings

Install them by adding this to your Gemfile:

    gem 'httparty'
    gem 'nokogiri'
    

Then run bundle install.

We'll also define the starting eBay URL to scrape and a hash of headers to spoof a browser visit:

    require 'httparty'
    require 'nokogiri'
    
    url = "https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=baseball&_sacat=64482&LH_TitleDesc=0"
    
    headers = {
      "User-Agent" => "Mozilla/5.0..."
    }
    

Replace the User-Agent value with your own browser's user agent string.
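The long query string can also be assembled from a hash, which makes the search term easier to change. This is a minimal sketch using only Ruby's standard library. build_search_url is a hypothetical helper, not part of any gem, and it sets only _nkw and _sacat, which appear in the URL above to carry the search keyword and a category ID:

```ruby
require 'uri'

EBAY_SEARCH_BASE = "https://www.ebay.com/sch/i.html"

# Hypothetical helper: builds a search URL from a keyword and an optional
# category ID. The parameter names are copied from the tutorial's URL.
def build_search_url(keyword, category: nil)
  params = { "_nkw" => keyword }
  params["_sacat"] = category if category
  # encode_www_form escapes spaces and other unsafe characters
  "#{EBAY_SEARCH_BASE}?#{URI.encode_www_form(params)}"
end

puts build_search_url("baseball cards", category: 64482)
```

The tracking parameters (_from, _trksid) are left out here on the assumption that the search works without them, as the shorter URL in the full code suggests.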

Fetch the Listings Page

We'll use the HTTParty gem to fetch the HTML content from the eBay URL:

    response = HTTParty.get(url, headers: headers)
    html = response.body
    doc = Nokogiri::HTML(html)
    

The headers hash is passed so that the request looks like a browser visit. Nokogiri::HTML then parses the response body into a document we can query with CSS selectors.
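The fetch step assumes the request succeeded. If eBay returns an error or a block page, it may be worth checking the status before parsing. A minimal sketch: body_or_raise is a hypothetical helper that relies only on the code and body methods that HTTParty responses provide, and FakeResponse is a stand-in so the example runs without network access:

```ruby
# Hypothetical helper: returns the body for a 2xx response, raises otherwise.
def body_or_raise(response)
  code = response.code.to_i
  unless (200..299).cover?(code)
    raise "Request failed with HTTP status #{code}"
  end
  response.body
end

# Stand-in for a real HTTParty response (same #code and #body interface).
FakeResponse = Struct.new(:code, :body)

html = body_or_raise(FakeResponse.new(200, "<html></html>"))
puts html.length
```

In the scraper itself, the result of HTTParty.get(url, headers: headers) would be passed in place of the FakeResponse.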

Extract Listing Data

Now we can extract the key data points from each listing. eBay encloses each listing in a <div> with class "s-item__info". We'll find all of these:

    listing_items = doc.css("div.s-item__info")
    

Then we can loop through each listing div and extract the relevant info using the class names:

    listing_items.each do |item|

      # &. returns nil instead of raising when a listing lacks an element
      title = item.at_css("div.s-item__title")&.text&.strip

      url = item.at_css("a.s-item__link")&.[]("href")

      price = item.at_css("span.s-item__price")&.text&.strip

      # And so on for other fields like seller, shipping, location etc

    end
    

We use .text.strip to extract the text within tags like <div> and <span>. For the item URL, we extract the href attribute from the <a> tag.

Print Results

Finally, still inside the loop, we can print the extracted info for each listing:

    puts "Title: #{title}"
    puts "URL: #{url}"
    puts "Price: #{price}"
    
    puts "="*50 # Separator between listings
    

This will output each listing's title, URL, price and other data in a readable format.
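If the results are meant for another program rather than a terminal, each listing could be collected into a hash and serialized instead of printed. A minimal sketch with Ruby's standard json library follows. listing_to_json is a hypothetical helper and the sample values are invented:

```ruby
require 'json'

# Hypothetical helper: serializes one listing as JSON, dropping any fields
# that were missing from the page (nil values).
def listing_to_json(listing)
  JSON.generate(listing.compact)
end

listing = {
  title: "Signed baseball",
  url: "https://example.com/itm/1",
  price: "$12.50",
  shipping: nil
}

puts listing_to_json(listing)
```

Printing one JSON object per line keeps the output easy to append to a file and to parse back line by line.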

Full Code

Here is the full code to scrape and extract eBay listing data:

    require 'httparty'
    require 'nokogiri'
    
    url = "https://www.ebay.com/sch/i.html?_nkw=baseball"
    
    headers = {
      "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
    }
    
    response = HTTParty.get(url, headers: headers)
    html = response.body
    doc = Nokogiri::HTML(html)
    
    listing_items = doc.css("div.s-item__info")
    
    listing_items.each do |item|
    
      # &. returns nil instead of raising when a listing lacks an element
      title = item.at_css("div.s-item__title")&.text&.strip

      url = item.at_css("a.s-item__link")&.[]("href")

      price = item.at_css("span.s-item__price")&.text&.strip

      details = item.at_css("div.s-item__subtitle")&.text&.strip

      seller_info = item.at_css("span.s-item__seller-info-text")&.text&.strip

      shipping_cost = item.at_css("span.s-item__shipping")&.text&.strip

      location = item.at_css("span.s-item__location")&.text&.strip

      sold = item.at_css("span.s-item__quantity-sold")&.text&.strip
    
      puts "Title: #{title}"
      puts "URL: #{url}"
      puts "Price: #{price}"
      puts "Details: #{details}"
      puts "Seller: #{seller_info}"
      puts "Shipping: #{shipping_cost}"
      puts "Location: #{location}"
      puts "Sold: #{sold}"
    
      puts "="*50
    
    end
    

This gives an overview of how to use Ruby and Nokogiri to extract data from eBay listings. The key steps are sending a request, parsing the HTML, finding the listing divs, and extracting the text from tags based on their class names.
