Stories from the Web Crawling trenches: automation

The Complete Puppeteer Cheatsheet

Author: Mohan Ganesan

Date: Dec 6, 2023

Puppeteer is a Node.js library for automating UI testing, scraping, and screenshot testing using headless Chrome.

Scraping Leads using ChatGPT: A How-To Guide

Author: Mohan Ganesan

Date: Sep 25, 2023

ChatGPT can drive lead generation: it suggests targeted domains, helps extract email addresses from the web, and automates much of the process. It can produce 500-1000 leads in a niche, but it has limitations and needs workarounds for the web scraping itself. Overall, it is a powerful starting point for lead generation.
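The email-extraction step usually comes down to a regular expression run over page text. Below is a minimal sketch of that step. It assumes you already have the text of each page, and the pattern is an illustrative assumption rather than the code from the guide. It matches most everyday addresses but is not a full RFC 5322 validator.

```python
import re

# Illustrative pattern (assumption): local part, "@", domain, dot, TLD of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique email addresses found in text, in first-seen order."""
    # dict.fromkeys drops duplicates while preserving the original order.
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


if __name__ == "__main__":
    page_text = "Contact sales@example.com or info@acme.io. Sales: sales@example.com"
    print(extract_emails(page_text))
```

De-duplicating per page keeps the lead list clean before you merge results from many domains.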

Using Python and Wget for Web Scraping

Author: Mohan Ganesan

Date: Jan 9, 2024

Wget is a powerful command-line utility for downloading content from the web. This article explores two ways to use Wget in Python scripts: the wget module, or calling the Wget command through subprocess. Wget offers recursive downloading, resuming of broken downloads, custom user agent strings, download-speed throttling, and flexible filtering. Calling Wget through Python's subprocess module gives you more configurability, but it adds complexity. Overall, Python and Wget are a great combination for web scraping and automation tasks.
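Below is a minimal sketch of the subprocess approach. It assumes GNU Wget is installed and on PATH. The options used are standard GNU Wget long options:

- --continue resumes partial downloads.
- --user-agent sets a custom user agent.
- --limit-rate throttles download speed.
- --recursive and --level control recursive downloading.
- --accept filters files by suffix.

The helper names build_wget_command and run_wget are hypothetical.

```python
import shutil
import subprocess


def build_wget_command(url, dest_dir=".", user_agent=None, limit_rate=None,
                       recursive=False, max_depth=1, accept=None):
    """Build (but do not run) an argument list for the wget CLI."""
    # --continue resumes a broken download instead of starting over.
    cmd = ["wget", "--continue", f"--directory-prefix={dest_dir}"]
    if user_agent:
        cmd.append(f"--user-agent={user_agent}")
    if limit_rate:
        # e.g. "200k" caps the download speed at about 200 KB/s
        cmd.append(f"--limit-rate={limit_rate}")
    if recursive:
        cmd += ["--recursive", f"--level={max_depth}"]
    if accept:
        # comma-separated suffixes or patterns to keep, e.g. "pdf,jpg"
        cmd.append(f"--accept={accept}")
    cmd.append(url)
    return cmd


def run_wget(cmd):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    if shutil.which("wget") is None:
        raise FileNotFoundError("wget is not installed or not on PATH")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(build_wget_command("https://example.com/files/", dest_dir="downloads",
                             limit_rate="200k", recursive=True, accept="pdf"))
```

Passing an argument list rather than one shell string avoids quoting problems with URLs and user agent strings.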

Web Scraping with PHP & ChatGPT

Author: Mohan Ganesan

Date: Sep 25, 2023

Learn web scraping in PHP, with ChatGPT generating code and explaining it. The Goutte library and PHP's built-in DOMDocument class are popular for data extraction, and ChatGPT helps by generating code snippets and refining prompts for better results.

Capturing Screenshots with Puppeteer - An advanced guide

Author: Mohan Ganesan

Date: Jan 9, 2024

Puppeteer is a Node.js library for controlling headless Chrome, ideal for web scraping and automation tasks. It allows you to automate browser actions, capture screenshots, and perform advanced tasks like emulating mobile devices and simulating network conditions.

How to Set and Change User Agent when using curl

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to change cURL's user agent to avoid blocks and mimic real browsers for web scraping and API testing.
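The core of the technique is curl's -A option (long form --user-agent), which replaces curl's default "curl/x.y.z" user agent. Below is a minimal sketch that builds that command from Python, for use in the same kind of subprocess workflow as the Wget example. The browser string is only an illustrative assumption; copy a current one from a real browser. The helper names are hypothetical. httpbin.org/user-agent is a public endpoint that echoes back the user agent it receives, which makes it handy for checking the header.

```python
import shlex
import shutil
import subprocess

# Illustrative desktop Chrome user agent (assumption; refresh it as browsers update).
CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")


def curl_command(url, user_agent=CHROME_UA):
    """Build a curl argument list that sends a custom User-Agent header."""
    # -s hides the progress meter, -L follows redirects, -A sets the user agent.
    return ["curl", "-s", "-L", "-A", user_agent, url]


def fetch(url, user_agent=CHROME_UA):
    """Run curl and return the response body as text."""
    if shutil.which("curl") is None:
        raise FileNotFoundError("curl is not installed or not on PATH")
    result = subprocess.run(curl_command(url, user_agent),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Show the equivalent shell command, quoted correctly for a POSIX shell.
    print(shlex.join(curl_command("https://httpbin.org/user-agent")))
```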

Chromedriver Executable Needs to be in Path? - Solved

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to install and configure Chromedriver for Selenium automation in Python, and avoid the 'chromedriver executable needs to be in PATH' error.
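Before starting Selenium, it can help to check whether chromedriver is actually discoverable on PATH, which is what the error message complains about. Below is a minimal sketch using the standard library's shutil.which. The helper names are hypothetical.

If the check fails, there are two fixes. You can add the driver's folder to PATH, or you can pass the driver's location to Selenium explicitly, for example through the Service class in selenium.webdriver.chrome.service. Recent Selenium 4 releases also ship Selenium Manager, which can download a matching driver automatically.

```python
import os
import shutil


def find_chromedriver(search_path=None):
    """Return the full path to chromedriver, or None if it is not found.

    search_path defaults to the PATH environment variable. On Windows,
    shutil.which also tries extensions such as .exe from PATHEXT.
    """
    return shutil.which("chromedriver", path=search_path)


def require_chromedriver(search_path=None):
    """Like find_chromedriver, but fail with an actionable message."""
    path = find_chromedriver(search_path)
    if path is None:
        raise FileNotFoundError(
            "chromedriver not found on PATH. Install it and add its folder to "
            "PATH, or pass its location to Selenium explicitly."
        )
    return path


if __name__ == "__main__":
    print("PATH entries:", os.environ.get("PATH", "").split(os.pathsep))
    print("chromedriver:", find_chromedriver())
```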

Bypassing CAPTCHAs with Puppeteer

Author: Mohan Ganesan

Date: Oct 4, 2023

Automate CAPTCHA solving with Puppeteer and headless Chrome, with the help of a CAPTCHA-solving service like 2Captcha.

Solving CAPTCHAs with OpenAI's Whisper Using Selenium

Author: Mohan Ganesan

Date: Oct 4, 2023

Automate solving audio CAPTCHAs with OpenAI's Whisper and Selenium. Whisper's powerful speech recognition, paired with Selenium's browser automation, provides an end-to-end pipeline for defeating CAPTCHAs programmatically.

Web Scraping Wikipedia with CSharp

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape data from Wikipedia using C# and the HtmlAgilityPack library. Extract information from websites for data collection, analysis, and automation.

How long does web scraping take?

Author: Mohan Ganesan

Date: Feb 20, 2024

Web scraping involves extracting data from websites. The time it takes depends on the website's size and complexity, the type of data, how much of the work is automated, your experience, and how hard the site is to scrape. Start with a small pilot scrape to get an accurate estimate.
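One way to turn a small pilot run into an estimate is simple extrapolation. Measure the time per page on a sample, add any politeness delay, and divide the work across parallel workers. Below is a minimal sketch of that arithmetic. The function name and the assumption that every page behaves like the pilot pages are illustrative.

```python
import math


def estimate_scrape_hours(total_pages, pilot_pages, pilot_seconds,
                          concurrency=1, delay_seconds=0.0):
    """Extrapolate total scraping time, in hours, from a small pilot run.

    Assumes every page takes about as long as the pilot pages did and that
    `concurrency` workers run in parallel, each pausing `delay_seconds`
    between requests.
    """
    if pilot_pages <= 0 or concurrency <= 0:
        raise ValueError("pilot_pages and concurrency must be positive")
    seconds_per_page = pilot_seconds / pilot_pages + delay_seconds
    # Each worker handles roughly an equal share of the pages.
    pages_per_worker = math.ceil(total_pages / concurrency)
    return pages_per_worker * seconds_per_page / 3600


if __name__ == "__main__":
    # Pilot: 50 pages in 100 s (2 s/page), plus a 1 s delay, 4 workers.
    print(estimate_scrape_hours(10_000, 50, 100, concurrency=4, delay_seconds=1.0))
```

In that example each worker handles 2,500 pages at 3 s each, so the run takes 7,500 s, or about 2.1 hours. Treat the result as a lower bound, because retries, blocks and CAPTCHAs are not modeled.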

Using Python Requests to Populate Date Fields in Web Forms

Author: Mohan Ganesan

Date: Feb 3, 2024

Use the Python Requests library and the right headers to populate date fields in web forms that use date pickers, so you can automate the submission.
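A date picker is usually just a widget that writes a formatted string into an ordinary form field. That string is what the server receives. Below is a minimal sketch that builds such a payload with the standard library.

The field names (checkin, checkout) and the MM/DD/YYYY format are hypothetical. To find the real ones, inspect the form or watch the browser's Network tab to see what the picker actually submits. The resulting dict is what you would pass as data= to requests.post, along with any headers the site expects.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical format: many US-style pickers submit MM/DD/YYYY, others YYYY-MM-DD.
DATE_FORMAT = "%m/%d/%Y"


def build_date_payload(start, nights, start_field="checkin",
                       end_field="checkout", fmt=DATE_FORMAT):
    """Return form data with a start date and an end date `nights` days later."""
    end = start + timedelta(days=nights)
    return {start_field: start.strftime(fmt), end_field: end.strftime(fmt)}


if __name__ == "__main__":
    payload = build_date_payload(date(2024, 3, 1), nights=3)
    # With Requests: requests.post(form_url, data=payload, headers=headers)
    # urlencode shows the body a browser would send for a urlencoded form.
    print(urlencode(payload))
```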

What Are Static Residential Proxies? An Insider's Perspective

Author: Mohan Ganesan

Date: Jan 9, 2024

Static residential proxies provide anonymity and legitimacy by using real residential IPs, while keeping the speed of datacenter proxies. They are ideal for web scraping and automation because they help avoid blocks and CAPTCHAs.

Automate Search Form Submission with Python Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

Submitting forms is a common task when scraping the web or automating workflows, and the Python Requests library makes it easy to submit forms programmatically.
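A search form is submitted either as a GET request, with the fields in the query string, or as a POST request, with the fields in a urlencoded body. The form's method attribute tells you which. Below is a minimal sketch using Requests' Request and PreparedRequest classes, so you can inspect exactly what would be sent before sending it.

The URL and the field name q are hypothetical; copy the real names from the form's input elements. The helper name build_search_request is also hypothetical.

```python
import requests


def build_search_request(search_url, query, method="GET", extra_fields=None):
    """Prepare (but do not send) a search-form submission."""
    fields = {"q": query, **(extra_fields or {})}
    if method.upper() == "GET":
        # GET forms encode their fields into the URL query string.
        req = requests.Request("GET", search_url, params=fields)
    else:
        # POST forms send their fields as an application/x-www-form-urlencoded body.
        req = requests.Request("POST", search_url, data=fields)
    return req.prepare()


if __name__ == "__main__":
    prepared = build_search_request("https://example.com/search", "web scraping")
    print(prepared.method, prepared.url)
    # To actually submit it:
    # with requests.Session() as session:
    #     response = session.send(prepared, timeout=10)
```

For a one-off call, requests.get(url, params=fields) or requests.post(url, data=fields) does the same preparation and sends in one step.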

How to Scrape Dynamic Websites with Selenium

Author: Mohan Ganesan

Date: Oct 4, 2023

Scrape dynamic websites with Selenium for automation and data extraction. Consider using ProxiesAPI for robust and scalable commercial scraping projects.

How do I scrape a difficult website?

Author: Mohan Ganesan

Date: Feb 20, 2024

Web scraping can be tricky, but with persistence and technical knowledge, obstacles like dynamic content and CAPTCHAs can be overcome.
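One concrete form of persistence against flaky or rate-limited sites is retrying failed requests with exponential backoff. This is a common general tactic, offered as an illustrative assumption rather than a technique the article necessarily covers. Below is a minimal standard-library sketch. The function name is hypothetical, and in real code you would narrow retry_on to the network errors your HTTP client raises.

```python
import random
import time


def with_retries(fetch, attempts=4, base_delay=1.0, retry_on=(Exception,),
                 sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially between tries.

    After the i-th failure (counting from 0), waits base_delay * 2**i seconds
    plus up to 10% random jitter. Re-raises the last error if all attempts fail.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1 * delay))


if __name__ == "__main__":
    outcomes = iter([ConnectionError("temporary"), "page html"])

    def flaky_fetch():
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(with_retries(flaky_fetch, base_delay=0.1))
```

Jitter spreads out retries from parallel workers, so they do not all hit the site again at the same moment.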

Scraping New York Times News Headlines in Elixir

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to use Elixir libraries like HTTPoison and Floki to automate web scraping and extract data from the New York Times homepage.

What is Requests Used For in Python?

Author: Mohan Ganesan

Date: Oct 22, 2023

The Requests library simplifies working with HTTP APIs and web services in Python. Common uses include web scraping, API testing, building web clients, fetching data, and automation.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!