Stories from the Web Crawling trenches in web scraper

How to Tell if a Website is Scrapable

Author: Mohan Ganesan

Date: Feb 20, 2024

Determine if a website can be scraped by checking the robots.txt file, analyzing the page source, checking for CAPTCHAs, and testing scraping a page.

Dealing with 403 Forbidden Errors in BeautifulSoup

Author: Mohan Ganesan

Date: Oct 6, 2023

Ways to handle and bypass 403 Forbidden errors in web scraping: checking error codes, using user agents, authenticating with login credentials, waiting and retrying, using proxies.

Is VPN good for scraping?

Author: Mohan Ganesan

Date: Feb 20, 2024

Using a VPN for web scraping can provide privacy and access benefits, but it may also slow down page load times and have usage limits.

ParseHub Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ParseHub is a visual web scraper with complex configuration and slow scraping speed. ProxiesAPI simplifies scraping with one API call, providing proxy rotation, browser identities, CAPTCHA solving, and javascript rendering.

Is a web scraper a bot?

Author: Mohan Ganesan

Date: Feb 20, 2024

Web scrapers extract specific data from sites, while web bots interact with full site contents and flows. The program specifics depend on your particular needs and constraints.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!