Stories from the Web Crawling trenches in HTTP

Using Proxies with Python Requests

Author: Mohan Ganesan

Date: Oct 22, 2023

The Python requests library simplifies HTTP requests and API calls, and routing those requests through proxies helps avoid IP blocking. Configure proxies with a dictionary or environment variables, and supply credentials when the proxy requires authentication. Use sessions to persist settings across requests. Disable SSL verification only for endpoints you trust. Adjust timeouts and retries to make requests more robust.
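As a rough illustration of the dictionary-based setup, the sketch below builds a requests session that sends traffic through one proxy and retries transient failures. The helper name make_proxied_session, the retry settings, and the proxy address in the usage note are assumptions made for this example, not details taken from the article.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_proxied_session(proxy_url, retries=3):
    """Return a Session that routes HTTP and HTTPS traffic through proxy_url.

    make_proxied_session is a hypothetical helper written for this sketch.
    """
    session = requests.Session()
    # The proxies mapping is keyed by URL scheme. Credentials, if needed,
    # can be embedded in the URL as http://user:password@host:port.
    session.proxies.update({"http": proxy_url, "https": proxy_url})

    # Retry transient gateway errors with a growing delay between attempts.
    retry = Retry(total=retries, backoff_factor=0.5,
                  status_forcelist=[502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A call such as make_proxied_session("http://user:password@proxy.example.com:8080").get(url, timeout=10) would then go through the placeholder proxy. Setting the HTTP_PROXY and HTTPS_PROXY environment variables is the alternative that needs no code changes.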

Python's URL Handling Libraries compared - urllib vs requests

Author: Mohan Ganesan

Date: Nov 17, 2023

Python's URL handling libraries have evolved over time, from urllib to urllib2 to urllib3 and finally to requests. Each library offers different features and capabilities, making it important to choose the right one for your needs.

Retrieving and Parsing Text from URLs with Python's urllib

Author: Mohan Ganesan

Date: Feb 8, 2024

The urllib module in Python provides tools for retrieving and parsing content from URLs. It can fetch text content, parse HTML and JSON, and handle errors.
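A minimal sketch of that fetch-then-parse flow, using only the standard library, might look like the following. The function names fetch_text and fetch_json are invented for this example, and the error handling covers only URL errors and invalid JSON.

```python
import json
import urllib.error
import urllib.request


def fetch_text(url, timeout=10):
    """Fetch url and return the response body decoded to str."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def fetch_json(url, timeout=10):
    """Fetch url and parse the body as JSON.

    Returns None on a URL error (HTTPError is a subclass of URLError)
    or when the body is not valid JSON.
    """
    try:
        return json.loads(fetch_text(url, timeout=timeout))
    except (urllib.error.URLError, ValueError) as exc:
        print(f"Could not load {url}: {exc}")
        return None
```

Because urlopen also accepts data: URLs, the helpers can be tried without a network connection, for example fetch_text("data:text/plain;charset=utf-8,hello%20world").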

Properly Closing aiohttp Clients and Sessions

Author: Mohan Ganesan

Date: Mar 3, 2024

Close aiohttp ClientSession objects and their connections properly to avoid leaking resources, such as open TCP connections, over time.
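One common way to get this cleanup is to open the session in an async with block, so that it is closed even if a request raises. The sketch below assumes aiohttp is installed. The function name demo_session_lifecycle is made up for illustration, and the request is left as a comment so the example runs offline.

```python
import asyncio

import aiohttp


async def demo_session_lifecycle():
    """Open a ClientSession, then report whether it is closed afterwards."""
    async with aiohttp.ClientSession() as session:
        # Requests would go here, for example:
        #   async with session.get("https://example.com") as response:
        #       body = await response.text()
        # The inner async with releases the connection when it exits.
        still_open = not session.closed
    # Leaving the outer block awaits session.close() for us.
    return still_open and session.closed
```

Running asyncio.run(demo_session_lifecycle()) should return True. When a context manager does not fit the code structure, awaiting session.close() in a finally block gives the same guarantee.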

Accessing Websites in Python with urllib.request.urlopen

Author: Mohan Ganesan

Date: Feb 6, 2024

The urllib.request module in Python 3 provides a simple way to access and download data from websites via HTTP and HTTPS.

Fetching Web Resources with urllib in MicroPython

Author: Mohan Ganesan

Date: Feb 6, 2024

The urllib module in MicroPython provides a simple interface for fetching resources from the web. It can handle HTTP requests and responses, making it easy to fetch JSON data, download images, and more.

Fetching Data in JavaScript with urllib

Author: Mohan Ganesan

Date: Feb 6, 2024

JavaScript code can use the urllib library to fetch data from URLs, including JSON APIs, in web browsers and Node.js environments.

Benchmarking aiohttp Web Performance

Author: Mohan Ganesan

Date: Feb 22, 2024

The Python aiohttp library provides powerful async HTTP client/server functionality. Benchmarking quantifies metrics like requests per second, latency distributions, and resource usage to guide optimization and capacity planning.

Making PUT Requests with aiohttp in Python

Author: Mohan Ganesan

Date: Mar 3, 2024

The aiohttp library in Python provides a simple way to make asynchronous PUT requests, which are typically used to create or update resources.

Using AFNetworking Proxies for Web Scraping in 2024

Author: Mohan Ganesan

Date: Jan 9, 2024

Covers setting up a basic AFNetworking proxy, working with different proxy protocols, using advanced proxy functionality, and troubleshooting common AFNetworking proxy problems.

Serving HTTP Requests Efficiently with aiohttp's TCPServer

Author: Mohan Ganesan

Date: Mar 3, 2024

The aiohttp Python library provides powerful tools for building asynchronous HTTP services. TCPServer is a key component that handles details like accepting connections, reading/writing data, and closing connections. It supports HTTPS, handles concurrent connections efficiently, and is useful for microservices and API backends.

How to Use cURL with a Proxy

Author: Mohan Ganesan

Date: Apr 30, 2024

cURL is a powerful command-line tool for transferring data, and it can route requests through proxies. It supports HTTP, HTTPS, and SOCKS proxies, and allows for data extraction and proxy authentication.

Accessing Resources in Python Without HTTP: Alternatives to the Requests Library

Author: Mohan Ganesan

Date: Feb 3, 2024

The Python Requests library is popular for accessing resources over HTTP, but Python also offers options for working with local files, databases, and alternative protocols through the standard library and add-on modules.

urllib get

Author: Mohan Ganesan

Date: Feb 8, 2024

The urllib module in Python provides a simple interface for fetching data over HTTP. With just a few lines of code, you can easily make GET and POST requests to access web pages and APIs.
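To make the GET and POST distinction concrete, here is a small sketch built on urllib.request.Request. The helper names build_get, build_json_post, and send are assumptions for this example, and the example.com URLs in the usage note are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_get(url, params):
    """Return a GET Request with params encoded into the query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(f"{url}?{query}")


def build_json_post(url, payload):
    """Return a POST Request carrying payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def send(request, timeout=10):
    """Send a prepared Request and return (status code, body bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

For instance, send(build_get("https://example.com/search", {"q": "web scraping"})) would perform the GET. Keeping request construction separate from sending makes the method, URL, and body easy to inspect before anything goes over the network.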

Async IO for Python: aiohttp 3.7.4

Author: Mohan Ganesan

Date: Mar 3, 2024

The aiohttp library provides asynchronous HTTP client/server functionality for Python based on the asyncio event loop. Version 3.7.4 contains useful updates that make aiohttp even more powerful and developer-friendly.

Is Urllib a standard Python package?

Author: Mohan Ganesan

Date: Feb 8, 2024

Urllib is a standard Python package for working with HTTP resources. It provides tools for fetching URLs, handling redirects, parsing response data, encoding requests, and more.

How to call API URL?

Author: Mohan Ganesan

Date: May 7, 2024

When building an application, you'll often need to retrieve data from an API or send data to it. This guide covers the basics of making API requests in your code.
