Stories from the Web Crawling trenches in API

The Complete BeautifulSoup Cheatsheet with Examples

Author: Mohan Ganesan

Date: Oct 4, 2023

This cheatsheet covers the full BeautifulSoup 4 API with practical examples. It provides a comprehensive guide to web scraping and HTML parsing using Python's BeautifulSoup library.

Downloading Files with Python Requests - Tips, Tricks and Code Example

Author: Mohan Ganesan

Date: Oct 31, 2023

Learn how to use Python Requests to download files from the web with ease. Requests provides a simple API for making HTTP calls, supports advanced features like streaming downloads and authentication, and is actively maintained. Use Requests to download files like a pro!
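
As a rough illustration of the streaming downloads mentioned above, here is a minimal sketch. The function name `download_file`, the optional `session` argument, and the 30-second timeout are assumptions made for this example, not taken from the article.

```python
import requests


def download_file(url, dest_path, session=None, chunk_size=8192):
    """Stream the body at `url` into `dest_path` and return the bytes written."""
    http = session or requests.Session()
    written = 0
    # stream=True defers reading the body, so a large file is never held fully in memory.
    with http.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
        with open(dest_path, "wb") as out:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
                written += len(chunk)
    return written
```

A call such as `download_file("https://example.com/big.zip", "big.zip")` would write the file in 8 KB chunks; the URL here is a placeholder.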

Scrape Any Website with OpenAI Function Calling in PHP

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI in PHP allows for resilient data extraction from websites, adapting to changes in HTML structure. Extracted product data can be processed and stored.

How to Authenticate with Bearer Tokens in Python Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

Bearer tokens are used for authentication in APIs. This article explains how to make authenticated requests with bearer tokens in Python using the Requests module.
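
One way to attach a bearer token with Requests is a small auth class, sketched below. The class name `BearerAuth`, the helper `fetch_profile`, and the `https://api.example.com/me` URL are hypothetical and used only for illustration.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Adds an 'Authorization: Bearer <token>' header to each request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request


def fetch_profile(token, url="https://api.example.com/me"):
    # The URL is a placeholder; substitute the API endpoint you are calling.
    response = requests.get(url, auth=BearerAuth(token), timeout=10)
    response.raise_for_status()
    return response.json()
```

Passing `headers={"Authorization": f"Bearer {token}"}` directly also works; the auth class mainly helps when the same token is reused across many calls.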

The Complete HTTPBin CheatSheet in Python

Author: Mohan Ganesan

Date: Dec 6, 2023

Httpbin is a popular online service for testing and debugging HTTP libraries and clients. It is useful for testing HTTP client code, experimenting with APIs, learning HTTP concepts, debugging issues, and more.

How to Build a Super Simple HTTP Proxy in C++ in just 30 lines of code

Author: Mohan Ganesan

Date: Oct 1, 2023

Build a basic HTTP proxy in C++ in 30 lines of code. Use a rotating proxy service to avoid IP blocking with an API.

Accessing Your Local Web Server from Python Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

Accessing a development server on localhost is easy with Python Requests: use http://localhost or http://127.0.0.1, add the port your server uses (such as :8000), disable SSL warnings for HTTPS, then import requests and call get/post as usual.

Scrape Any Website with OpenAI Function Calling in C++

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI in C++ allows for resilient data extraction from websites using function calling.

Accessing OAuth2 APIs with Python Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

Python's Requests library provides an easy way to handle OAuth2 authentication and access protected resources from an API. This article covers obtaining and refreshing access tokens programmatically.

Building a Simple Proxy Rotator with Ruby and Nokogiri

Author: Mohan Ganesan

Date: Oct 2, 2023

Fetch and use public proxies in Ruby projects using Nokogiri and free proxy lists. Scale to thousands of links with a rotating proxy service like Proxies API.

How to Build a Super Simple HTTP Proxy in Elixir in just 20 lines of code

Author: Mohan Ganesan

Date: Oct 1, 2023

Elixir makes it easy to build fast and scalable network applications. Here is a basic HTTP proxy server in fewer than 20 lines of Elixir code.

Scrape Websites with OpenAI Function Calling in JavaScript

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI allows for resilient data extraction from websites using JavaScript. It leverages natural language processing to handle changes in HTML structure. This article provides a code example for scraping product data from an ecommerce website.

Scrape Any Website with OpenAI Function Calling in Ruby

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI in Ruby allows for resilient data extraction from HTML using function calling.

Scraping Wikipedia in Java for Beginners

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites. This article provides a code example using Jsoup to scrape Wikipedia for data on US presidents. It also discusses handling IP blocking with a rotating proxy service.

Scrape Any Website with OpenAI Function Calling in CSharp

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI in C# allows for resilient data extraction from websites using natural language processing.

Web Scraping with Javascript & ChatGPT

Author: Mohan Ganesan

Date: Sep 25, 2023

This article covers web scraping in JavaScript, using ChatGPT for code generation and explanations. Libraries like Request and Cheerio are used for data extraction. Consider using a dedicated web scraping API like Proxies API for robust scraping.

How to Build a Super Simple HTTP proxy in Go in just 20 lines of code

Author: Mohan Ganesan

Date: Oct 1, 2023

Go is a great language for writing simple and efficient network applications. Learn how to build a basic HTTP proxy in Go in under 20 lines of code. To handle IP blocking, consider using a rotating proxy service like Proxies API.

Making Python Requests Without Timeout

Author: Mohan Ganesan

Date: Feb 3, 2024

The requests library applies no timeout unless you pass one, so a request can wait indefinitely for a response. This article covers how to control or remove the timeout so that long requests can run to completion.
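
A minimal sketch of one way to let a slow response run to completion while still bounding the connection attempt. The helper name `get_with_timeout`, its default values, and the optional `session` argument are assumptions for this example.

```python
import requests


def get_with_timeout(url, connect=3.05, read=None, session=None):
    """GET `url` with a bounded connect phase and an optionally unbounded read phase.

    read=None means "wait as long as the server takes to send data".
    """
    http = session or requests.Session()
    # A (connect, read) tuple sets the two limits separately;
    # None for either value means no limit for that phase.
    return http.get(url, timeout=(connect, read))
```

Keeping a connect limit while dropping the read limit avoids hanging on an unreachable host, yet still allows a long-running response to finish.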

Scrape Any Website with OpenAI Function Calling in Perl

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI in Perl extracts product data from HTML using function calling.

Dodging CAPTCHAs with Python for Web Scraping

Author: Mohan Ganesan

Date: Oct 4, 2023

CAPTCHAs are a major annoyance when scraping the web. This article explains how to automatically solve CAPTCHAs using Python libraries and services like 2Captcha and Proxies API.

Speed Up Your Python Web Requests: Requests vs. Urllib

Author: Mohan Ganesan

Date: Feb 3, 2024

Python's requests library provides a fast and simple interface for making HTTP requests, offering better performance than urllib for most use cases.

ZenRows Alternative - Why Proxies API is Simpler & More Affordable

Author: Mohan Ganesan

Date: Sep 30, 2023

ZenRows is a popular web scraping API, but Proxies API offers a simpler and cheaper alternative. Proxies API provides a simple and affordable solution with easy API integration, pay per API call pricing, and no vendor lock-in.

Simplify OAuth Authentication in Python with httpx-oauth

Author: Mohan Ganesan

Date: Feb 5, 2024

Authenticating with OAuth in Python can be tedious. httpx-oauth simplifies the process by providing a unified API for different OAuth providers and handling token management, refreshing, and storage.

Troubleshooting Stale Data in Python Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

Check for client-side caching in Requests and disable it. Ensure the server is not caching responses. Use sessions for APIs that require statefulness.
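
A rough sketch of asking caches for a fresh copy, assuming the stale responses come from a server or intermediary cache that honors request headers. The helper names and the `_` cache-busting query parameter are hypothetical.

```python
import time

import requests

# Headers that ask caches along the path to revalidate before answering.
NO_CACHE_HEADERS = {"Cache-Control": "no-cache", "Pragma": "no-cache"}


def build_fresh_request(url, params=None):
    """Prepare a GET request that discourages cached responses."""
    query = dict(params or {})
    # Hypothetical cache-busting parameter: a changing value makes the URL unique.
    query["_"] = str(int(time.time() * 1000))
    request = requests.Request("GET", url, params=query, headers=NO_CACHE_HEADERS)
    return request.prepare()


def fetch_fresh(url, params=None):
    with requests.Session() as session:
        return session.send(build_fresh_request(url, params), timeout=10)
```

Whether a given server respects these headers or ignores unknown query parameters varies, so treat this as a diagnostic step rather than a guaranteed fix.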

Zenscrape Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProxiesAPI simplifies web scraping with easy pricing and automatic proxy rotation, CAPTCHA solving, and JavaScript rendering.

Scraping YouTube Data: What's Allowed and Best Practices

Author: Mohan Ganesan

Date: Feb 20, 2024

YouTube allows limited web scraping for non-commercial personal use cases like academic research, but with significant restrictions and best practices to follow.

Customizing HTTPX User Agents for Effective API Requests

Author: Mohan Ganesan

Date: Feb 5, 2024

Customize the User-Agent header in the HTTPX Python library for API analytics, compatibility checks, and access control.

ProxyScrape Residential Proxies Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProxiesAPI simplifies web scraping with a single API call and unlimited bandwidth, beating ProxyScrape's manual proxy rotation and per GB usage fees.

Does YouTube allow scraping?

Author: Mohan Ganesan

Date: Feb 20, 2024

YouTube restricts data scraping to protect its platform and users. Exceptions include limited personal use and research purposes.

WebScrapingAPI Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

WebScrapingAPI offers robust web scraping via API, but ProxiesAPI is a simpler alternative with unlimited requests and bandwidth.

Scraping New York Times News Headlines in VB

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites automatically through code. This article provides a step-by-step guide on how to scrape article titles and links from The New York Times website using HTML parsing and XPath queries.

Overcoming SSL Certificate Errors with Python Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

Dealing with SSL certificates in Python Requests can be a pain. Here are some tips to overcome certificate errors and ensure validation.
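
One common way to resolve certificate errors without turning validation off is to point Requests at the right CA bundle. The sketch below assumes a hypothetical helper `make_session` and a caller-supplied bundle path.

```python
import requests


def make_session(ca_bundle=None):
    """Return a Session that validates certificates.

    ca_bundle: optional path to a PEM file containing the CA certificates
    to trust, for example an internal corporate CA.
    """
    session = requests.Session()
    if ca_bundle:
        # A string value for `verify` is treated as a path to a CA bundle.
        session.verify = ca_bundle
    return session
```

Setting `verify=False` silences the error but also removes protection against impersonated servers, so it is best limited to local testing.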

Scraping Yelp Business Listings with Scala

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to extract data from Yelp business listings using Scala and web scraping techniques.

ScrapingBee Alternative - Why Proxies API is Simpler & Cheaper

Author: Mohan Ganesan

Date: Sep 30, 2023

ScrapingBee and Proxies API are web scraping APIs, but Proxies API offers a simpler and more affordable approach. Proxies API provides an easy API, pay per call pricing, no lock-in, and simple integration. It is a cost-effective alternative to ScrapingBee.

Why use Python requests?

Author: Mohan Ganesan

Date: Feb 20, 2024

The Requests library is a popular tool that makes it easier for Python developers to send HTTP requests and work with APIs. It saves time compared to the urllib module and provides features like JSON decoding and SSL verification. Requests is recommended for web API calls, web scraping, and more.

ScraperAPI Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProxiesAPI simplifies web scraping with easy pricing and delivers clean HTML from any webpage with one API call.

Making PUT Requests with aiohttp in Python

Author: Mohan Ganesan

Date: Mar 3, 2024

The aiohttp library in Python provides a simple way to make asynchronous PUT requests, allowing for easy resource creation and updates.

ProWebScraper Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProWebScraper is a visual web scraping tool, but ProxiesAPI simplifies scraping with an API, offering features like proxy rotation and CAPTCHA solving.

Datahut Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

Datahut offers web scraping as a service, but ProxiesAPI simplifies scraping with easy pricing and delivers clean HTML from any webpage with one API call.

Getting Started with the HTTPX Python Library

Author: Mohan Ganesan

Date: Feb 5, 2024

The HTTPX library is a powerful and user-friendly HTTP client for Python. Install it with pip and make requests easily with its elegant API.

How do I scrape Google cache?

Author: Mohan Ganesan

Date: Feb 20, 2024

Search engine caches like Google Cache provide a useful way to access web pages. Web scraping can help access and preserve these cached copies.

SOAX Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProxiesAPI simplifies web scraping with easy pricing and built-in features, offering unlimited bandwidth and automatic proxy rotation. It beats SOAX with its simplicity and lower cost. Get started with 1,000 free API requests at ProxiesAPI.com.

Scrapfly Alternative - Why Proxies API is Simpler & More Affordable

Author: Mohan Ganesan

Date: Sep 30, 2023

Proxies API offers a simpler and cost-effective alternative to Scrapfly for web scraping, with a simple API, pay-per-call pricing, and no lock-in.

ScrapingRobot Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProxiesAPI simplifies web scraping with easy pricing and delivers clean HTML from any webpage with one API call.

Scrapingdog Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

Scrapingdog provides a robust web scraping API with flexible credits-based plans. ProxiesAPI offers a simpler scraping API with features like proxy rotation and JavaScript rendering.

WebScraper.io Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

WebScraper.io is a visual web scraping tool, but ProxiesAPI simplifies scraping with easy pricing and delivers clean HTML from any webpage with one API call.

Smartproxy Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProxiesAPI simplifies web scraping with easy pricing, unlimited bandwidth, and built-in features like CAPTCHA solving. No need for complex proxy plans or integrations.

ParseHub Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ParseHub is a visual web scraper with complex configuration and slow scraping speed. ProxiesAPI simplifies scraping with one API call, providing proxy rotation, browser identities, CAPTCHA solving, and JavaScript rendering.

Proxyrack Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ProxiesAPI simplifies web scraping with easy pricing and built-in features like CAPTCHA solving and proxy rotation. It offers unlimited bandwidth and a lower cost compared to Proxyrack.

ScrapingAnt Alternative - Simplify Web Scraping with ProxiesAPI

Author: Mohan Ganesan

Date: Sep 30, 2023

ScrapingAnt offers a robust web scraping API, but it can be expensive. ProxiesAPI simplifies scraping with easy pricing and delivers clean HTML from any webpage with one API call.

Does Python requests use urllib3?

Author: Mohan Ganesan

Date: Feb 20, 2024

The Python Requests library provides a high-level interface for making HTTP requests, while urllib3 handles the low-level details such as connection pooling.
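
The relationship shows up when configuring retries: the policy object comes from urllib3 and is handed to a Requests transport adapter. This is a minimal sketch; the helper name `session_with_retries` and the chosen status codes are assumptions for this example.

```python
import requests
import urllib3
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def session_with_retries(total=3, backoff_factor=0.5):
    """Build a Session whose adapters use a urllib3 Retry policy."""
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=(502, 503, 504),  # retry on these gateway errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = session_with_retries()
adapter = session.get_adapter("https://example.com/")
# The adapter's connection pool manager is a urllib3 object.
print(isinstance(adapter.poolmanager, urllib3.PoolManager))
```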
