Stories from the Web Crawling trenches in cURL

How do I Make cURL Ignore the Proxy?

Author: Mohan Ganesan

Date: Jan 9, 2024

Unset the http_proxy and HTTPS_PROXY environment variables (curl reads http_proxy in lowercase only). Set NO_PROXY to exclude specific hosts or domains from the proxy. Use --noproxy to bypass the proxy for a single request.
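
The three approaches in this summary can be sketched in shell roughly as follows. This is a minimal illustration, not the article's own code: the host list in NO_PROXY and the fetch_direct helper name are assumptions made up for the example. Note that curl only honors http_proxy in lowercase, so that spelling is cleared too.

```shell
# Option 1: clear the proxy variables for the current shell session.
# curl reads http_proxy in lowercase only; the others work in either case.
unset http_proxy HTTP_PROXY https_proxy HTTPS_PROXY all_proxy ALL_PROXY

# Option 2: keep a proxy configured but exempt some hosts from it.
# The host list below is a placeholder; a leading dot matches subdomains.
export NO_PROXY="localhost,127.0.0.1,.internal.example.com"

# Option 3: bypass the proxy for one request only.
# fetch_direct is a hypothetical helper; '*' tells curl to skip the proxy for every host.
fetch_direct() {
  curl --noproxy '*' --silent --show-error "$1"
}

# Usage: fetch_direct "https://example.com/"
```

Options 1 and 2 change the environment for everything started from that shell, while option 3 affects a single curl call, which is usually the safer choice inside a script.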

Downloading Images from URLs in PHP

Author: Mohan Ganesan

Date: May 5, 2024

Learn different methods to download images from URLs using PHP, including file_get_contents, cURL, fopen, fwrite, Guzzle, and Imagick.

Using Proxies in file_get_contents in PHP in 2024

Author: Mohan Ganesan

Date: Jan 9, 2024

Proxying web requests in PHP using stream_context_create and file_get_contents. Adding authentication for secure proxies. Advanced HTTP options through stream contexts. Debugging common PHP proxy problems. Scraping via cURL. Leveraging Proxy-as-a-Service for robust web scraping with Proxies API.

How to Set and Change User Agent when using curl

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to change cURL's user agent to avoid blocks and mimic real browsers for web scraping and API testing.

Building a Simple Proxy Rotator with PHP and SimpleHTMLDOM

Author: Mohan Ganesan

Date: Oct 2, 2023

Implement a rotating proxy in PHP using free proxies from sslproxies.org. Use SimpleHTMLDOM and cURL to fetch and parse the proxies. Rotate IPs and User-Agent strings to avoid IP blocking with Proxies API.

Scraping Data from Wikipedia with PHP

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites automatically. This article demonstrates how to scrape Wikipedia using PHP and cURL to get data on the Presidents of the United States.

Making HTTP Requests in PHP: Alternatives to Python's Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

The Python requests module is beloved by Python developers for its simplicity in making HTTP requests. PHP developers looking for that same simplicity have several solid options to choose from, including Guzzle, Symfony HTTP Client, and cURL.

Managing cURL HTTP Redirects

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to manage HTTP redirects with cURL for effective web scraping, avoiding redirect loops, lost credentials, and changed request methods.
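
As a rough sketch of how those three pitfalls map onto curl options: --max-redirs caps the number of hops, --post301 and --post302 keep a POST from being rewritten to GET, and credentials are only forwarded to a different host if --location-trusted is added. The helper names and the limit of 5 hops are assumptions chosen for illustration.

```shell
# Follow redirects with guard rails. The function names are hypothetical.
fetch_following_redirects() {
  # --location            follow Location: headers
  # --max-redirs 5        give up after 5 hops instead of looping (curl's default limit is 50)
  # --post301 / --post302 keep POST as POST across 301/302 responses
  curl --location --max-redirs 5 --post301 --post302 --silent --show-error "$@"
}

# Print the number of redirects followed and the final URL, discarding the body.
redirect_summary() {
  curl --location --max-redirs 5 --silent --output /dev/null \
       --write-out '%{num_redirects} %{url_effective}\n' "$1"
}

# Usage:
#   fetch_following_redirects --data 'q=test' "https://example.com/search"
#   redirect_summary "https://example.com/old-page"
```

--location-trusted is deliberately left out here: it sends the user name and password to every host in the chain, so it is best added only when all of those hosts are trusted.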

Scraping Reddit Posts with PHP

Author: Mohan Ganesan

Date: Jan 9, 2024

Web scraping with PHP to extract data from Reddit using DOM parsing, CSS selectors, and cURL.

Scraping Real Estate Listings From Realtor with PHP

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to scrape real estate listings from Realtor.com using PHP and cURL. Extract data using DOMDocument and XPath.

Scraping Data from Wikipedia in C++

Author: Mohan Ganesan

Date: Dec 6, 2023

Scraping Wikipedia using cURL and Gumbo to extract details on US presidents from a table.

Scraping Yelp Business Listings with PHP

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping guide for extracting data from Yelp business listings using PHP and XPath.

How to Use cURL with a Proxy

Author: Mohan Ganesan

Date: Apr 30, 2024

cURL is a command-line tool for transferring data, and it can route requests through a proxy. It supports HTTP, HTTPS, and SOCKS proxies, including proxy authentication, which makes it useful for data extraction.
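
A minimal sketch of what this looks like on the command line, assuming a proxy at proxy.example.com:8080 (a placeholder, as are the helper names and the credentials in the usage lines). The proxy type is selected by the scheme in the --proxy URL, and --proxy-user supplies the proxy credentials.

```shell
# Hypothetical proxy endpoint; replace with a real host and port.
PROXY_HOST="proxy.example.com"
PROXY_PORT="8080"

# Build the proxy URL for a given scheme: http, https, socks5, or socks5h
# (socks5h also resolves the target hostname through the proxy).
proxy_url() {
  printf '%s://%s:%s' "$1" "$PROXY_HOST" "$PROXY_PORT"
}

# Fetch a URL through the proxy.
#   $1 = proxy scheme, $2 = target URL, $3 = optional "user:password" for the proxy
fetch_via_proxy() {
  if [ -n "${3:-}" ]; then
    curl --proxy "$(proxy_url "$1")" --proxy-user "$3" --silent --show-error "$2"
  else
    curl --proxy "$(proxy_url "$1")" --silent --show-error "$2"
  fi
}

# Usage:
#   fetch_via_proxy http    "https://example.com/"
#   fetch_via_proxy socks5h "https://example.com/" "alice:secret"
```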
