Stories from the Web Crawling trenches in browser automation

Tips for Handling JavaScript Content with BeautifulSoup

Author: Mohan Ganesan

Date: Oct 6, 2023

Dealing with heavy JavaScript sites takes specialized tools like browser automation or APIs. BeautifulSoup can still effectively access and parse content.

Overcoming CAPTCHAs When Web Scraping with PHP

Author: Mohan Ganesan

Date: Feb 20, 2024

Web scraping guide: handling CAPTCHAs with PHP. Use CAPTCHA solving service, browser automation, or proxy service. Consider ethical concerns.

The Complete Guide to JavaScript Scraping with Python: Tips, Tricks, and Gotchas

Author: Mohan Ganesan

Date: Nov 17, 2023

Scraping JavaScript-heavy sites in Python can be tricky. With the right tools like Selenium and Requests-HTML, you can conquer complex JS pages and handle async JS rendering.

Web Scraping in Python: A Comparison of Beautiful Soup, Selenium, and Scrapy

Author: Mohan Ganesan

Date: Oct 4, 2023

Web scraping with Python using Beautiful Soup, Selenium, and Scrapy. Each tool serves a different niche, from simple extraction to browser automation and large-scale scraping.

Using Proxies with Pyppeteer for Web Scraping

Author: Mohan Ganesan

Date: Jan 9, 2024

Pyppeteer allows browser automation with proxies, including static IPs, rotating proxies, and residential proxies. Proxy management is important for successful web scraping, including refreshing IP pools, having backup options, and monitoring proxy statuses. Proxies API offers a managed proxy solution for easier integration. Pyppeteer also provides advanced proxy usage options like setting proxies in page routes and creating proxy middleware. Following proxy best practices, such as mixing different proxy types and adding random page delays, can help avoid bot protections.

Is BeautifulSoup faster than selenium?

Author: Mohan Ganesan

Date: Feb 5, 2024

Web scraping involves extracting data from websites. BeautifulSoup is lightweight and efficient for scraping static content, while Selenium is necessary for dynamically loaded content. Together, they provide a comprehensive solution for web scraping.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!