Stories from the Web Crawling trenches in library

The Ultimate Nokogiri Cheat Sheet for Ruby

Author: Mohan Ganesan

Date: Oct 31, 2023

Nokogiri is a powerful HTML/XML parsing and scraping library for Ruby. This cheat sheet covers its extensive capabilities.

The Ultimate Select.rs Cheat Sheet for Rust

Author: Mohan Ganesan

Date: Oct 31, 2023

select.rs (the `select` crate) is a robust HTML scraping library for Rust. This cheat sheet covers installation, loading documents, selecting and traversing nodes, extracting and modifying nodes, creating, inserting, and removing nodes, output formats, caching, persistence and performance, headless browsers, validation, encoding, advanced selectors, common recipes, troubleshooting, and ecosystem libraries.

The Ultimate Gumbo C++ Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

Gumbo is an HTML5 parsing library written in C that can be used from C++. It parses HTML into a DOM-like tree of nodes that you can traverse to select and extract content.

Are Python requests deprecated?

Author: Mohan Ganesan

Date: Oct 22, 2023

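Python Requests is a popular library for making HTTP requests. There was some confusion when AWS deprecated the copy of Requests bundled inside its botocore library (`botocore.vendored.requests`). Requests itself was never deprecated: it remains actively maintained and supports current Python versions.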
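If code previously reached Requests through `botocore.vendored.requests`, the usual fix is to install Requests as its own dependency and import it directly. The minimal sketch below assumes Requests is installed (`pip install requests`). It builds, but does not send, a GET request, so it runs without network access. The helper name `build_search_request` and the example URL are hypothetical.

```python
# pip install requests
# Import Requests directly rather than through botocore.vendored.requests.
import requests


def build_search_request(base_url, query):
    """Build (but do not send) a GET request with a URL-encoded query string."""
    req = requests.Request("GET", base_url, params={"q": query})
    # prepare() encodes the params into the final URL without any network I/O.
    return req.prepare()


if __name__ == "__main__":
    prepared = build_search_request("https://example.com/search", "nokogiri")
    print(prepared.method, prepared.url)
```

To actually send the prepared request, pass it to a session, for example `requests.Session().send(prepared, timeout=10)`.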

Who wrote BeautifulSoup?

Author: Mohan Ganesan

Date: Feb 5, 2024

The Origins of BeautifulSoup: Leonard Richardson's Powerful Web Scraping Library. First released in 2004, BeautifulSoup is a popular and powerful library for web scraping and handling HTML/XML in Python.

Is socket a Python library?

Author: Mohan Ganesan

Date: Feb 20, 2024

The socket module is part of Python's standard library and provides a low-level interface for networking and inter-process communication. It is not a third-party library, so it can be imported without any extra installation steps.
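The sketch below shows that `socket` works with nothing beyond the standard library installed. It assumes a platform where `socket.socketpair()` is available (Unix, and Windows since Python 3.5). It passes bytes between two already-connected sockets, a primitive commonly used for inter-process communication. The function name `echo_roundtrip` is hypothetical.

```python
import socket  # standard library: no pip install needed


def echo_roundtrip(message: bytes) -> bytes:
    """Send bytes through one end of a socket pair and read them from the other."""
    # socketpair() returns two sockets that are already connected to each other.
    left, right = socket.socketpair()
    with left, right:
        left.sendall(message)
        # Close the write side so the reader sees end-of-stream after the data.
        left.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = right.recv(4096)  # recv may return fewer bytes than were sent
            if not chunk:             # empty bytes means the peer finished writing
                break
            chunks.append(chunk)
        return b"".join(chunks)


if __name__ == "__main__":
    print(echo_roundtrip(b"ping"))
```

Reading in a loop until `recv` returns empty bytes is the safe pattern, because a single `recv` call is not guaranteed to return the whole message.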

Is BeautifulSoup a library or module?

Author: Mohan Ganesan

Date: Feb 5, 2024

BeautifulSoup is a Python library for parsing, navigating, and searching HTML and XML documents. It is installed as the `beautifulsoup4` package and imported from the `bs4` module.
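A minimal sketch of the package/module distinction in practice. It assumes `beautifulsoup4` is installed and uses Python's built-in `"html.parser"` backend, so no extra parser is required. The sample markup and the helper name `extract_links` are hypothetical and exist only for illustration.

```python
# Installed under one name, imported under another:
#   pip install beautifulsoup4
from typing import List, Tuple

from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Posts</h1>
  <ul>
    <li><a href="/nokogiri-cheat-sheet">Nokogiri cheat sheet</a></li>
    <li><a href="/gumbo-cheat-sheet">Gumbo cheat sheet</a></li>
  </ul>
</body></html>
"""


def extract_links(markup: str) -> List[Tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href."""
    # "html.parser" ships with Python, so no lxml/html5lib install is needed.
    soup = BeautifulSoup(markup, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(f"{text}: {href}")
```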
