Stories from the Web Crawling trenches in JSoup

The Ultimate Jsoup Cheatsheet in Java

Author: Mohan Ganesan

Date: Oct 31, 2023

Jsoup is a Java library for parsing and manipulating HTML using DOM, CSS, and jquery-like methods.

The Ultimate KSoup Cheatsheet for Kotlin

Author: Mohan Ganesan

Date: Oct 31, 2023

KSoup is an HTML parser for Kotlin that provides a convenient DSL for extracting and manipulating data from HTML documents.

Downloading Images from a Website with Java and JSoup

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Java and JSoup to download images from a Wikipedia page, extract data from HTML tables, and overcome challenges in web scraping using proxies.

The Ultimate JSoup Kotlin Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

JSoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data from HTML documents.

Scraping Multiple Pages in Java with JSoup

Author: Mohan Ganesan

Date: Oct 15, 2023

Web scraping in Java using JSoup to extract data from multiple pages. Use base URL pattern, loop through pages, send request, parse HTML, and extract data using selectors.

The Ultimate JSoup Scala Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

JSoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data from HTML documents.

Scraping Wikipedia in Java for Beginners

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites. This article provides a code example using Jsoup to scrape Wikipedia for data on US presidents. It also discusses handling IP blocking with a rotating proxy service.

Scraping All Images from a Website with Java

Author: Mohan Ganesan

Date: Dec 13, 2023

Web scraping is the process of extracting data from websites automatically. This article explains how to scrape dog breed images from a Wikipedia page using Java and Jsoup library. It also discusses the use of CSS selectors and overcoming IP blocking.

Scraping eBay Listings with Java and JSoup in 2023

Author: Mohan Ganesan

Date: Oct 5, 2023

eBay is a large online marketplace. This tutorial explains how to scrape and extract data from eBay listings using Java and the JSoup library.

Scraping New York Times News Headlines with Java

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape New York Times articles using Java and Jsoup library, extract headlines and links, and simulate a browser's user agent string.

Extracting Structured Data by Scraping Wikipedia with Kotlin

Author: Mohan Ganesan

Date: Dec 6, 2023

Scraping Wikipedia using Jsoup to extract structured data on US presidents.

Downloading Images from a Website with Kotlin and Jsoup

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Kotlin and Jsoup to download images from a Wikipedia page, extract data from HTML tables, and scrape websites. Use Proxies API for scaling web scraping.

Scraping New York Times News Headlines using Kotlin

Author: Mohan Ganesan

Date: Dec 6, 2023

The New York Times homepage can be scraped programmatically using Python and JSoup to extract article titles and links.

Scraping Real Estate Listings From Realtor in Java

Author: Mohan Ganesan

Date: Jan 9, 2024

Scrape real estate listing data from Realtor.com using Jsoup, a Java library. Analyze trends or build applications with large-scale housing data.

Scraping Yelp Business Listings in Java

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape Yelp business listings using Jsoup and Java with proxies for stable data extraction.

Scraping Real Estate Listings From Realtor in Kotlin

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to use Jsoup for web scraping to extract key details from real estate listings on Realtor.com. This comprehensive guide covers crafting GET requests, selecting HTML elements with CSS selectors, extracting and transforming text, and dealing with missing data. By the end, you'll be able to scrape details like broker name, status, price, beds, baths, square footage, lot size, and full address from any Realtor.com search page.

Scraping New York Times News Headlines in Scala

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is a technique for extracting data from websites automatically. This article explains how to scrape article titles and links from The New York Times homepage using Scala and the Jsoup library.

Scraping All Images from a Website with Scala

Author: Mohan Ganesan

Date: Dec 13, 2023

Learn how to use Scala and Jsoup to scrape images from a website. Make HTTP requests, extract data from HTML, and download images.

Scraping Real Estate Listings From Realtor in Scala

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to extract key details from real estate listings on Realtor.com using Jsoup, a Java library for web scraping.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!