Scraping eBay Listings with Elixir and HTTPoison in 2023

Oct 5, 2023 · 3 min read

eBay is one of the largest online marketplaces with millions of active listings at any given time. In this tutorial, we'll walk through how to scrape and extract key data from eBay listings using Elixir and the HTTPoison library.

Setup

We'll need to add HTTPoison and Floki (used later to parse the HTML) to our mix.exs dependencies:

def deps do
  [
    {:httpoison, "~> 1.8"},
    {:floki, "~> 0.34"}
  ]
end
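If you would rather run the scraper as a single .exs script than inside a Mix project, one option is Mix.install, available since Elixir 1.12. The fragment below is a sketch under that assumption; the version requirements simply mirror the ones above, and Mix.install cannot be called from inside an existing Mix project.

```elixir
# Place at the top of a standalone script, e.g. a hypothetical scrape.exs,
# then run it with: elixir scrape.exs
Mix.install([
  {:httpoison, "~> 1.8"},
  {:floki, "~> 0.34"}
])
```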

Then fetch the dependencies. No import is needed in our code, because we call HTTPoison's functions by their full module name:

mix deps.get

We'll also define the eBay URL and a header for the user agent:

url = "https://www.ebay.com/sch/i.html?_nkw=baseball"

user_agent = {"User-Agent", "Mozilla/5.0..."}

Replace the user agent string with your browser's user agent.
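To search for something other than "baseball", the query string can be built rather than hard-coded. The sketch below uses URI.encode_query from the standard library. EbaySearch and build_url are hypothetical names, and the `_pgn` page parameter is an assumption about eBay's URL scheme that you should verify in your browser.

```elixir
defmodule EbaySearch do
  # Hypothetical helper, not part of the original tutorial.
  @base_url "https://www.ebay.com/sch/i.html"

  # Builds a search URL for a keyword.
  # `_nkw` is the keyword parameter from the tutorial's URL; `_pgn` is an
  # ASSUMED page-number parameter and may not match eBay's current behavior.
  def build_url(keyword, page \\ 1) do
    query = URI.encode_query([{"_nkw", keyword}, {"_pgn", page}])
    @base_url <> "?" <> query
  end
end

# Spaces and other special characters in the keyword are escaped for us.
IO.puts(EbaySearch.build_url("baseball cards", 2))
```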

Fetch the Listings Page

We can use HTTPoison to make the GET request:

resp = HTTPoison.get!(url, [user_agent])

html = resp.body

Request headers are the second argument to get!, given as a list of {name, value} tuples, so the user agent tuple goes there.
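Note that get! raises on transport errors but returns normally for any HTTP status, so a block page or a 503 would be parsed as if it were a listings page. A minimal sketch of a more defensive fetch, using the non-raising HTTPoison.get/3, might look like this. EbayFetch, fetch, and handle_response are hypothetical names, and the timeout value is an arbitrary choice.

```elixir
defmodule EbayFetch do
  # Hypothetical helper module; names are illustrative only.

  # Fetches a page and returns {:ok, html} or {:error, reason}.
  def fetch(url, headers) do
    url
    |> HTTPoison.get(headers, follow_redirect: true, recv_timeout: 15_000)
    |> handle_response()
  end

  # Plain map patterns also match the HTTPoison.Response and HTTPoison.Error
  # structs, because structs are maps underneath.
  def handle_response({:ok, %{status_code: 200, body: body}}), do: {:ok, body}

  def handle_response({:ok, %{status_code: status}}),
    do: {:error, {:unexpected_status, status}}

  def handle_response({:error, %{reason: reason}}), do: {:error, reason}
end
```

With this in place, the caller can match on the result of EbayFetch.fetch(url, [user_agent]) and only parse the HTML when it gets {:ok, html}.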

Extract Listing Data

To parse the HTML, we can use Floki:

{:ok, document} = Floki.parse_document(html)

listing_nodes = Floki.find(document, "div.s-item__info")

for node <- listing_nodes do
  title = Floki.find(node, "div.s-item__title") |> Floki.text()
  # Floki.attribute/2 returns a list of values, so take the first match
  url = Floki.find(node, "a.s-item__link") |> Floki.attribute("href") |> List.first()
  price = Floki.find(node, "span.s-item__price") |> Floki.text()

  # Get other fields like seller, shipping, etc
end

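Floki.find returns the nodes matching a CSS selector. We use it first to find the listing divs, then within each listing to locate a field, and Floki.text or Floki.attribute pulls out that node's text or attribute values. These class names reflect eBay's markup at the time of writing and may change.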
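The price comes back as display text, not a number. If you want to sort or filter by price, it needs converting. The sketch below is one way to do that with the standard library only. PriceParser is a hypothetical helper, and it assumes prices shaped like "$1,234.56"; for a range such as "$5.00 to $10.00" it returns only the first amount.

```elixir
defmodule PriceParser do
  # Hypothetical helper, not part of the original tutorial.
  # ASSUMES US-style formatting: "," as thousands separator, "." as decimal.

  # Returns {:ok, float} for the first amount found in the text, or :error.
  def parse(text) do
    case Regex.run(~r/\d[\d,]*(?:\.\d+)?/, text) do
      [match] ->
        {value, _rest} = match |> String.replace(",", "") |> Float.parse()
        {:ok, value}

      nil ->
        :error
    end
  end
end

IO.inspect(PriceParser.parse("$1,234.56"))
IO.inspect(PriceParser.parse("Price unavailable"))
```

Floats are convenient for a quick comparison; for anything involving arithmetic on money, converting to integer cents would be a safer choice.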

Print Results

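We can print the extracted values. These lines belong inside the body of the for comprehension, where title, url, and price are in scope: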

IO.puts("Title: #{title}")
IO.puts("URL: #{url}")
IO.puts("Price: #{price}")

IO.puts(String.duplicate("=", 50)) # Separator

This outputs each listing's data.

Full Code

Here is the full code to scrape eBay listings:

url = "https://www.ebay.com/sch/i.html?_nkw=baseball"

user_agent = {"User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"}

# Request headers go in the second argument as a list of {name, value} tuples
resp = HTTPoison.get!(url, [user_agent])

html = resp.body

{:ok, document} = Floki.parse_document(html)

listing_nodes = Floki.find(document, "div.s-item__info")

for node <- listing_nodes do
  title = Floki.find(node, "div.s-item__title") |> Floki.text()
  # Floki.attribute/2 returns a list of values, so take the first match
  url = Floki.find(node, "a.s-item__link") |> Floki.attribute("href") |> List.first()
  price = Floki.find(node, "span.s-item__price") |> Floki.text()

  details = Floki.find(node, "div.s-item__subtitle") |> Floki.text()
  seller_info = Floki.find(node, "span.s-item__seller-info-text") |> Floki.text()
  shipping_cost = Floki.find(node, "span.s-item__shipping") |> Floki.text()
  location = Floki.find(node, "span.s-item__location") |> Floki.text()
  sold = Floki.find(node, "span.s-item__quantity-sold") |> Floki.text()

  IO.puts("Title: #{title}")
  IO.puts("URL: #{url}")
  IO.puts("Price: #{price}")
  IO.puts("Details: #{details}")
  IO.puts("Seller: #{seller_info}")
  IO.puts("Shipping: #{shipping_cost}")
  IO.puts("Location: #{location}")
  IO.puts("Sold: #{sold}")

  IO.puts(String.duplicate("=", 50))
end
