Stories from the Web Crawling trenches in Scala

Web Scraping with Scala & ChatGPT

Author: Mohan Ganesan

Date: Sep 25, 2023

Scala is well suited to web scraping, and ChatGPT can speed up the work. Use the scalaj-http and Jsoup libraries for HTTP requests and HTML parsing, and ask ChatGPT to explain code or generate snippets for scraping tasks.

How to Build a Super Simple HTTP Proxy in Scala in Just 20 Lines of Code

Author: Mohan Ganesan

Date: Oct 1, 2023

Scala makes it easy to build networked applications with concise syntax and strong libraries. Here is an HTTP proxy server in Scala using Akka in just 20 lines of code. Because it sends every request from a single IP address, it is prone to getting blocked; a rotating proxy service like Proxies API can solve that problem.

Scrape Any Website with OpenAI Function Calling in Scala

Author: Mohan Ganesan

Date: Sep 25, 2023

Use OpenAI function calling from Scala to extract product data from HTML.

Downloading Images from a Website with Scala and rucola

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Scala and libraries like scalaj-http and rucola to download images of dog breeds from a Wikipedia page.

Building a Simple Proxy Rotator with Scala and Scraping

Author: Mohan Ganesan

Date: Oct 2, 2023

A simple proxy rotator in Scala using ScalaJS for web scraping. It periodically fetches and parses a list of proxies from a proxy site.
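
The rotation idea can be sketched with the standard library alone. This is a minimal illustration, not the article's code: the names `ProxyRotatorSketch`, `Proxy`, `parseProxies`, and `Rotator` are hypothetical, the proxy list is assumed to arrive as plain `host:port` lines, and the periodic fetch is left to the caller, who would pass each fresh list to `refresh`.

```scala
object ProxyRotatorSketch {
  // Hypothetical representation of one proxy endpoint.
  final case class Proxy(host: String, port: Int)

  // Parse "host:port" lines, silently skipping anything malformed.
  // Assumes the proxy site returns one proxy per line (an assumption, not from the article).
  def parseProxies(raw: String): Vector[Proxy] =
    raw.linesIterator.map(_.trim).flatMap { line =>
      line.split(":") match {
        case Array(host, port)
            if host.nonEmpty && port.nonEmpty && port.length <= 5 && port.forall(_.isDigit) =>
          Some(Proxy(host, port.toInt))
        case _ => None
      }
    }.toVector

  // Round-robin rotator; synchronized so several scraper threads can share it.
  final class Rotator(initial: Vector[Proxy]) {
    private var pool: Vector[Proxy] = initial
    private var index: Int = 0

    // Swap in a freshly fetched list; an empty list is ignored so the old pool stays usable.
    def refresh(fresh: Vector[Proxy]): Unit = synchronized {
      if (fresh.nonEmpty) { pool = fresh; index = 0 }
    }

    // Return the next proxy in order, wrapping around at the end of the pool.
    def next(): Option[Proxy] = synchronized {
      if (pool.isEmpty) None
      else {
        val proxy = pool(index % pool.size)
        index = (index + 1) % pool.size
        Some(proxy)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Addresses from the 192.0.2.0/24 documentation range, used here as placeholders.
    val rotator = new Rotator(parseProxies("192.0.2.10:8080\nnot a proxy\n192.0.2.11:3128"))
    (1 to 3).foreach(_ => println(rotator.next()))
  }
}
```

Keeping the rotator separate from the fetching code means the refresh interval and the proxy source can change without touching the rotation logic.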

Web Scraping Wikipedia in Scala

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape Wikipedia with Scala and Jsoup to extract structured data from tables. The steps are: import the libraries, define the URL, set a user agent, send the HTTP request, parse the HTML, extract the data, and print the results.
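
Those steps might look roughly like the sketch below. It is an illustration under assumptions rather than the article's code: the object and method names, the user-agent string, the `table.wikitable` selector, and the Jsoup version in the scala-cli directive are all choices made here, and the sample table is made-up inline HTML so the example runs without a network call.

```scala
//> using dep "org.jsoup:jsoup:1.17.2"
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import scala.jdk.CollectionConverters._

object WikiTableSketch {
  // Steps 2-4: define the URL, set a user agent, send the request.
  // The user-agent string is a placeholder; use one that identifies your scraper.
  def fetch(url: String): Document =
    Jsoup.connect(url).userAgent("example-scraper/0.1").get()

  // Steps 5-6: parse the HTML and pull each row of the first "wikitable"
  // out as a list of cell texts (header cells included).
  def extractRows(doc: Document): List[List[String]] = {
    val table = doc.select("table.wikitable").first() // null when no table matches
    if (table == null) Nil
    else
      table.select("tr").asScala.toList
        .map(row => row.select("th, td").asScala.toList.map(_.text()))
        .filter(_.nonEmpty)
  }

  def main(args: Array[String]): Unit = {
    // Inline sample standing in for a fetched page; swap in fetch(url) for live data.
    val sample =
      """<table class="wikitable">
        |  <tr><th>Language</th><th>First appeared</th></tr>
        |  <tr><td>Scala</td><td>2004</td></tr>
        |  <tr><td>Java</td><td>1995</td></tr>
        |</table>""".stripMargin

    // Step 7: print the scraped rows.
    extractRows(Jsoup.parse(sample)).foreach(row => println(row.mkString(" | ")))
  }
}
```

Separating `fetch` from `extractRows` lets the extraction logic be checked against saved HTML before it is pointed at a live page.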

Scraping Multiple Pages in Scala with HTTP Client and XML Libraries

Author: Mohan Ganesan

Date: Oct 15, 2023

Scrape data from multiple pages in Scala using an HTTP client and XML libraries. Use XPath expressions to select the data and proxies to scale the crawl.

Introduction to Scraping Reddit Posts in Scala

Author: Mohan Ganesan

Date: Jan 9, 2024

A beginner-friendly guide to scraping content from Reddit using Scala and Play Framework's WS library. Extract key information such as post titles, permalinks, authors, and scores from the posts on a Reddit page.

Scraping All Images from a Website with Scala

Author: Mohan Ganesan

Date: Dec 13, 2023

Learn how to use Scala and Jsoup to scrape images from a website. Make HTTP requests, extract data from HTML, and download images.

Scraping Craigslist Listings with Scala

Author: Mohan Ganesan

Date: Oct 1, 2023

Learn how to scrape Craigslist apartment listings using Scala and the play-ws library. Use XML parsing and a rotating proxy server to avoid IP blocking.

Scraping Yelp Business Listings with Scala

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to extract data from Yelp business listings using Scala and web scraping techniques.

Scraping eBay Listings with Scala and HTTP4S in 2023

Author: Mohan Ganesan

Date: Oct 5, 2023

eBay is a large online marketplace. This tutorial explains how to scrape and extract data from eBay listings using Scala and the HTTP4S library.
