Stories from the web crawling trenches: user agent

How to Set and Change User Agent when using curl

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to change cURL's user agent to avoid blocks and mimic real browsers for web scraping and API testing.

Scraping New York Times News Headlines with Java

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape New York Times articles using Java and the Jsoup library: extract headlines and links while sending a browser-like user agent string.

Web Scraping Wikipedia in Scala

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape Wikipedia using Scala and Jsoup to extract structured data from tables. The steps are: import the libraries, define the URL, set the user agent, send the HTTP request, parse the HTML, extract the data, and print the results.
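
The steps listed above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the Scala and Jsoup code from the article itself. The URL, the user agent string, the sample HTML, and the names `TableParser`, `build_request`, and `parse_table` are all assumptions made for this example. To keep the sketch runnable offline, it parses an inline HTML sample rather than fetching a live page.

```python
from html.parser import HTMLParser
from urllib.request import Request

# Hypothetical values for illustration; not taken from the article.
URL = "https://en.wikipedia.org/wiki/Example"
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleScraper/0.1"


class TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def build_request(url=URL, user_agent=USER_AGENT):
    # Setting this header replaces urllib's default "Python-urllib/x.y" value.
    return Request(url, headers={"User-Agent": user_agent})


def parse_table(html):
    parser = TableParser()
    parser.feed(html)
    return parser.rows


# To fetch a live page instead of the sample below, one could use:
#   from urllib.request import urlopen
#   html = urlopen(build_request()).read().decode("utf-8")
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Name</th><th>Value</th></tr>"
    "<tr><td>alpha</td><td>1</td></tr>"
    "</table>"
)

for row in parse_table(SAMPLE_HTML):
    print(row)
```

The request is built separately from the parsing so that the user agent can be changed without touching the extraction logic.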

Scraping New York Times News Headlines in VB

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites automatically through code. This article provides a step-by-step guide on how to scrape article titles and links from The New York Times website using HTML parsing and XPath queries.
