Stories from the Web Crawling trenches in Javascript

The Complete Playwright Cheatsheet

Author: Mohan Ganesan

Date: Dec 21, 2023

Playwright is a Node.js library for cross-browser end-to-end testing across Chromium, Firefox, and WebKit.

The Complete Puppeteer Cheatsheet

Author: Mohan Ganesan

Date: Dec 6, 2023

Puppeteer is a Node.js library for automating UI testing, scraping, and screenshot testing using headless Chrome.

How to Build a Super Simple HTTP Proxy in JavaScript in just 20 lines of code

Author: Mohan Ganesan

Date: Oct 1, 2023

Build a basic proxy server with JavaScript using Node.js http and request modules. Avoid IP blocking with a rotating proxy service.

How to Build a Super Simple HTTP Proxy in JavaScript in just 20 lines of code

Author: Mohan Ganesan

Date: Oct 1, 2023

Build a basic proxy server with JavaScript using Node.js http and request modules. Avoid IP blocking with a rotating proxy service.

Capturing Screenshots with Puppeteer - An advanced guide

Author: Mohan Ganesan

Date: Jan 9, 2024

Puppeteer is a Node.js library for controlling headless Chrome, ideal for web scraping and automation tasks. It allows you to automate browser actions, capture screenshots, and perform advanced tasks like emulating mobile devices and simulating network conditions.

Web Scraping All The Images From a Website in Node.js

Author: Mohan Ganesan

Date: Dec 13, 2023

Automate data collection from websites using web scraping with Node.js, axios, and cheerio. Extract dog breed information and images from a Wikipedia page.

Scraping Multiple Pages in Javascript with Cheerio

Author: Mohan Ganesan

Date: Oct 15, 2023

Web scraping in Javascript using cheerio library to extract data from multiple pages. Fetch pages with request() and parse HTML using cheerio. Scrape and extract information at scale with Proxies API.

The Ultimate Cheerio Web Scraping Cheat Sheet

Author: Mohan Ganesan

Date: Oct 31, 2023

Cheerio is a fast, flexible web scraping library for Node.js. This cheat sheet provides a comprehensive reference of its syntax and capabilities.

Downloading Images from a Website with Javascript and cheerio

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Javascript and the cheerio library to download all the images from a Wikipedia page and extract data about dog breeds listed on the page.

Scraping Wikipedia Pages with Node.js

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape Wikipedia using Node.js with axios and cheerio to extract structured data for various use cases.

Tips for Handling JavaScript Content with BeautifulSoup

Author: Mohan Ganesan

Date: Oct 6, 2023

Dealing with heavy JavaScript sites takes specialized tools like browser automation or APIs. BeautifulSoup can still effectively access and parse content.

Fetching Data in JavaScript with urllib

Author: Mohan Ganesan

Date: Feb 6, 2024

JavaScript uses urllib library to fetch data from URLs, including JSON APIs, in web browsers and Node.js environments.

Fetching Data in JavaScript with urllib

Author: Mohan Ganesan

Date: Feb 6, 2024

JavaScript uses urllib library to fetch data from URLs, including JSON APIs, in web browsers and Node.js environments.

Scraping Hidden Emails with Python Web Scraping

Author: Mohan Ganesan

Date: Feb 3, 2024

Email addresses are often hidden on websites. Python web scraping with BeautifulSoup and re module can help uncover hidden emails.

Web Scraping New York Times News Headlines with Node.js

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape New York Times articles using Node.js modules like request and cheerio to extract structured data for various applications.

What are the limitations of BeautifulSoup?

Author: Mohan Ganesan

Date: Feb 5, 2024

BeautifulSoup is a Python library for parsing and extracting data from HTML and XML documents. It struggles with modern JavaScript sites and cannot bypass most bot protections. CSS selectors and navigation logic can get complex. Consider alternatives like Scrapy, Puppeteer, or Playwright for professional web scraping.

Scrape Websites with OpenAI Function Calling in JavaScript

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI allows for resilient data extraction from websites using JavaScript. It leverages natural language processing to handle changes in HTML structure. This article provides a code example for scraping product data from an ecommerce website.

Scraping Reddit Posts in Node.js

Author: Mohan Ganesan

Date: Jan 9, 2024

Guide to scraping image URLs from a Reddit page using Node.js, focusing on identifying and extracting post blocks with images and metadata.

Convert Object to JSON String in JavaScript

Author: Mohan Ganesan

Date: Oct 4, 2023

Converting a JavaScript object to a JSON string requires handling types like objects, arrays, and primitives. Recursively stringify nested values. Use valid JSON syntax.

Web Scraping with Javascript & ChatGPT

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping in JavaScript with ChatGPT for code generation and explanations. Libraries like Request and Cheerio are used for data extraction. Consider using a dedicated web scraping API like Proxies API for robust scraping.

Using Proxies in Axios in Node.js for Web Scraping in 2024

Author: Mohan Ganesan

Date: Jan 9, 2024

Configure proxies for Node.js web scraping using Axios library. Learn about proxy options, authentication, rotating proxies, environment variables, custom logic, and proxy services like Proxies API.

Making Asynchronous HTTP Requests with request.post() in Node.js

Author: Mohan Ganesan

Date: Feb 3, 2024

The request.post() method in Node.js can be made asynchronous and non-blocking by using callbacks, promises, or the async library.

What is the difference between async and await?

Author: Mohan Ganesan

Date: Mar 24, 2024

Asynchronous programming in JavaScript can be achieved using async/await and promises. Async/await provides syntax that makes code easier to read and maintain, while promises lay the foundation for async/await.

Which scraping language is best?

Author: Mohan Ganesan

Date: Feb 5, 2024

When it comes to web scraping, the programming language you use matters. Python and JavaScript are popular choices, but consider factors like performance, complexity, and available libraries.

Which language is best for web scraping?

Author: Mohan Ganesan

Date: Feb 5, 2024

Best languages for web scraping: Python, JavaScript, and R. They provide the best libraries and balance for most web scraping needs.

Scraping Yelp Business Listings in NodeJS

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape business listings from Yelp using web scraping techniques and premium proxies with Node.js and Axios.

how long does it take to learn web scraping

Author: Mohan Ganesan

Date: Feb 20, 2024

Learn web scraping in 0-3 months with Python or JavaScript. Master advanced techniques in 4-12 months. Keep leveling up your skills!

What are the three types of scrapers?

Author: Mohan Ganesan

Date: Feb 22, 2024

Web scraping refers to automatically extracting data from websites using DOM parsing, headless browser automation, or web scraping services.

Is web scraping good for freelancing?

Author: Mohan Ganesan

Date: Feb 20, 2024

Web scraping freelancing offers great income potential in a niche with lots of demand. Be ready to continually monitor scripts and adapt to site changes.

Is web scraping for beginners?

Author: Mohan Ganesan

Date: Feb 20, 2024

Web scraping is the process of extracting data from websites. Beginners can learn it with programming knowledge in HTML/CSS, Python, and JavaScript.

Is it easy to learn web scraping?

Author: Mohan Ganesan

Date: Feb 5, 2024

Web scraping made easy with Python or JavaScript. Understand website structure, leverage libraries, and problem solve for rewarding data extraction.

Is BeautifulSoup good for web scraping?

Author: Mohan Ganesan

Date: Feb 5, 2024

BeautifulSoup is the leading Python web scraping library, with an intuitive API for parsing HTML. It struggles with JavaScript-heavy sites, so use proxies and mimic humans. Try it for your next project!

How to create an API?

Author: Mohan Ganesan

Date: May 7, 2024

APIs allow software applications to communicate. This guide shows how to create a REST API using Node.js and Express.

Is BeautifulSoup or Selenium better?

Author: Mohan Ganesan

Date: Feb 5, 2024

Selenium vs BeautifulSoup: choose the right tool for web scraping based on the complexity of the site and the presence of dynamic content.

Scraping Booking.com Property Listings with JavaScript in 2023

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to scrape property listings from Booking.com using JavaScript. Use Axios and Cheerio to fetch HTML content and extract details like property name, location, ratings, etc.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!