Scraping eBay Listings with Scala and HTTP4S in 2023

Oct 5, 2023 · 3 min read

eBay is one of the largest online marketplaces with millions of active listings at any given time. In this tutorial, we'll walk through how to scrape and extract key data from eBay listings using Scala and the HTTP4S library.

Setup

We'll need the following dependencies added to our build.sbt:

// Assumed version; substitute the http4s release you are targeting
val http4sVersion = "0.23.15"

libraryDependencies ++= Seq(
  "org.http4s" %% "http4s-blaze-client" % http4sVersion,
  "org.jsoup" % "jsoup" % "1.14.3"
)

This will pull in HTTP4S for making requests, and Jsoup for parsing HTML.

We'll also define the starting eBay URL and a header for the user agent:

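import org.http4s._
import org.http4s.implicits._ // provides the uri"..." interpolator
import org.typelevel.ci._     // provides the ci"..." interpolator for header names

val url = uri"https://www.ebay.com/sch/i.html?_nkw=baseball"

// A raw header keeps the full browser string intact
val userAgent = Header.Raw(ci"User-Agent", "Mozilla/5.0...")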

Replace the user agent string with your own browser's user agent.
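
To search for something other than "baseball", the URL can be built from a keyword instead of being hard-coded. The following is a minimal sketch, assuming http4s 0.23; SearchUri and forKeyword are hypothetical names introduced here, and only the _nkw parameter from the URL above is used.

```scala
import org.http4s.Uri
import org.http4s.implicits._

object SearchUri {
  // Base search endpoint taken from the tutorial's starting URL
  private val base: Uri = uri"https://www.ebay.com/sch/i.html"

  // Hypothetical helper: attaches the keyword as the _nkw query parameter
  def forKeyword(keyword: String): Uri =
    base.withQueryParam("_nkw", keyword)
}
```

Calling SearchUri.forKeyword("baseball") should give the same URI as the literal above, and http4s takes care of encoding keywords that contain spaces.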

Fetch the Listings Page

We'll use the HTTP4S client to fetch the HTML content:

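import cats.effect.IO
import cats.effect.unsafe.implicits.global // runtime needed by unsafeRunSync()
import org.http4s.blaze.client.BlazeClientBuilder

val req = Request[IO](Method.GET, url).putHeaders(userAgent)

// The client is a Resource: it is acquired, used once, then shut down
val html: String =
  BlazeClientBuilder[IO].resource.use(_.expect[String](req)).unsafeRunSync()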

The user agent header is added to the request. expect[String] decodes the response body as a String and fails if the status is not 2xx.
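
While developing, it can help to exercise the fetch step without hitting eBay at all. The sketch below assumes http4s 0.23 and uses Client.fromHttpApp, which wraps an in-memory HttpApp rather than opening a network connection. StubFetch, sampleHtml, and fetch are hypothetical names, and the canned HTML is made up for illustration.

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import org.http4s._
import org.http4s.client.Client

object StubFetch {
  // Canned HTML standing in for a real eBay response (made up for this sketch)
  val sampleHtml: String =
    "<div class=\"s-item__info\"><span class=\"s-item__price\">$9.99</span></div>"

  // An in-memory "server": answers every request with 200 OK and the canned body
  val stubApp: HttpApp[IO] =
    HttpApp[IO](_ => IO.pure(Response[IO](Status.Ok).withEntity(sampleHtml)))

  // Client.fromHttpApp routes requests to stubApp instead of the network
  val client: Client[IO] = Client.fromHttpApp(stubApp)

  // Same expect[String] call as the real fetch, but served from memory
  def fetch(uri: Uri): String =
    client.expect[String](Request[IO](Method.GET, uri)).unsafeRunSync()
}
```

Because the stub exposes the same Client[IO] interface as a real client, the parsing code can be developed against it and later pointed at a network-backed client unchanged.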

Extract Listing Data

Now we can use Jsoup to parse the HTML and extract the data:

import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import scala.jdk.CollectionConverters._ // .asScala for Jsoup's Java collections

val doc: Document = Jsoup.parse(html)

val listings = doc.select("div.s-item__info")

// Elements is a java.util.List, so convert it before using a for loop
for (listing <- listings.asScala) {

  val title = listing.select("div.s-item__title").text()

  val url = listing.select("a.s-item__link").attr("href")

  val price = listing.select("span.s-item__price").text()

  // Extract other fields like seller, shipping, etc.

}

We select elements by CSS class and extract the text or attributes.
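
Rather than leaving the values as loop-local variables, the same selectors can be collected into a list of records. This is a sketch only: the Listing case class, ListingParser, and the sample HTML fragment are hypothetical and made up for illustration, and the selectors are the ones used above, which eBay may change at any time.

```scala
import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

// Hypothetical record type for this sketch; the field names are our own
final case class Listing(title: String, url: String, price: String)

object ListingParser {
  // Made-up fragment that mimics the class names used in the tutorial
  val sampleHtml: String =
    """<div class="s-item__info">
      |  <a class="s-item__link" href="https://www.ebay.com/itm/123">
      |    <div class="s-item__title">Signed Baseball</div>
      |  </a>
      |  <span class="s-item__price">$24.99</span>
      |</div>""".stripMargin

  // Turns every div.s-item__info block into one Listing
  def parse(html: String): List[Listing] =
    Jsoup.parse(html)
      .select("div.s-item__info")
      .asScala
      .toList
      .map { item =>
        Listing(
          title = item.select("div.s-item__title").text(),
          url   = item.select("a.s-item__link").attr("href"),
          price = item.select("span.s-item__price").text()
        )
      }
}
```

When a selector matches nothing, Jsoup's text() and attr() return an empty string instead of throwing, so a missing field shows up as "" in the resulting Listing.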

Print Results

Inside the loop body, we can print the extracted info:

println(s"Title: $title")
println(s"URL: $url")
println(s"Price: $price")

println("=" * 50) // Separator between listings

This will output each listing's data.

Full Code

Here is the full code to scrape eBay listings:

import cats.effect.IO
import cats.effect.unsafe.implicits.global
import org.http4s._
import org.http4s.blaze.client.BlazeClientBuilder
import org.http4s.implicits._
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.typelevel.ci._
import scala.jdk.CollectionConverters._

object EbayScraper {

  def main(args: Array[String]): Unit = {

    val url = uri"https://www.ebay.com/sch/i.html?_nkw=baseball"

    val userAgent = Header.Raw(ci"User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")

    val req = Request[IO](Method.GET, url).putHeaders(userAgent)

    // Acquire the client, fetch the page as a String, then release the client
    val html: String =
      BlazeClientBuilder[IO].resource.use(_.expect[String](req)).unsafeRunSync()

    val doc: Document = Jsoup.parse(html)

    val listings = doc.select("div.s-item__info")

    // Elements is a java.util.List, so convert it before iterating
    for (listing <- listings.asScala) {

      val title = listing.select("div.s-item__title").text()

      val url = listing.select("a.s-item__link").attr("href")

      val price = listing.select("span.s-item__price").text()

      val details = listing.select("div.s-item__subtitle").text()

      val sellerInfo = listing.select("span.s-item__seller-info-text").text()

      val shippingCost = listing.select("span.s-item__shipping").text()

      val location = listing.select("span.s-item__location").text()

      val sold = listing.select("span.s-item__quantity-sold").text()

      println(s"Title: $title")
      println(s"URL: $url")
      println(s"Price: $price")
      println(s"Details: $details")
      println(s"Seller: $sellerInfo")
      println(s"Shipping: $shippingCost")
      println(s"Location: $location")
      println(s"Sold: $sold")

      println("=" * 50)

    }
  }
}
