How to Find Free Proxies & Rotate Them with Python

Oct 4, 2023 · 4 min read

Web scraping can often lead to getting blocked from websites due to too many requests coming from a single IP address. One technique to avoid this is using proxies to rotate your IP address with each request.

This article will cover how to find free proxies and successfully rotate through them in your Python code to avoid getting blocked while web scraping.

What is a Proxy Server?

A proxy server acts as an intermediary between your computer and the website you are accessing. When you make a request through a proxy server, the website will see the proxy's IP address rather than your own.

This allows you to hide your real IP address and distribute requests across multiple IP addresses to avoid getting blocked.
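For example, with the requests library you can route a single request through a proxy via the proxies argument. This is a minimal sketch; the proxy address below is a placeholder, not a live proxy:

import requests

# Placeholder proxy address (substitute a real one)
proxy = "203.0.113.10:8080"

# httpbin.org/ip echoes back the IP address it sees, so the response
# should show the proxy's IP rather than your own
r = requests.get(
    "https://httpbin.org/ip",
    proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
    timeout=5,
)
print(r.json())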

Finding Free Proxies

There are many websites that provide free proxy lists that you can use. However, most of these proxies are well known and likely already blocked by sites trying to prevent scraping.

A better approach is to scrape these free proxy sites yourself to get fresh proxies that are more likely to work. Here's how to scrape and collect free proxies in Python:

import requests
from bs4 import BeautifulSoup

def getProxies():
  r = requests.get('https://free-proxy-list.net/')
  soup = BeautifulSoup(r.content, 'html.parser')
  table = soup.find('tbody')

  proxies = []
  for row in table.find_all('tr'):
    cells = row.find_all('td')
    # Column 4 holds the anonymity level; keep only "elite proxy" entries
    if cells[4].text == 'elite proxy':
      proxy = ':'.join([cells[0].text, cells[1].text])  # IP:PORT
      proxies.append(proxy)

  return proxies

This scrapes the free proxy table on free-proxy-list.net, checks for "elite" proxies, and returns a list of proxies in the format IP:PORT.
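A quick way to try it, assuming the site's table layout has not changed:

proxies = getProxies()
print(f"Collected {len(proxies)} elite proxies")
print(proxies[:5])  # peek at the first few IP:PORT strings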

Testing Proxies with Python

Once you have a list of potential proxies, you'll want to test them to verify they are working. Here is a simple way to test proxies:

import requests

def testProxy(proxy):
  try:
    r = requests.get(
      "https://httpbin.org/ip",
      proxies={"http": proxy, "https": proxy},
      timeout=1,
    )
    print(r.json())
    print("Working!")
  except requests.exceptions.RequestException:
    print("Not working")

This makes a request to httpbin.org/ip, which echoes back the originating IP address. If the proxy is working, the response will show the proxy's IP rather than your own; the one-second timeout also filters out proxies that are too slow to be useful.

You can test a list of proxies quickly using Python's concurrent.futures module:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
  executor.map(testProxy, proxies)

This tests all of the proxies concurrently and prints the result for each one.
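The testProxy function above only prints its results. If you want to keep the working proxies for the next step, a small variation (called checkProxy here, not part of the original snippet) can return the proxy on success so you can collect the survivors from executor.map:

import concurrent.futures
import requests

def checkProxy(proxy):
  # Return the proxy string if it responds in time, otherwise None
  try:
    requests.get(
      "https://httpbin.org/ip",
      proxies={"http": proxy, "https": proxy},
      timeout=1,
    )
    return proxy
  except requests.exceptions.RequestException:
    return None

with concurrent.futures.ThreadPoolExecutor() as executor:
  results = executor.map(checkProxy, proxies)

working_proxies = [p for p in results if p]
print(f"{len(working_proxies)} proxies passed the test")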

Rotating Proxies in Your Code

Once you have a list of working proxies, you can rotate through them in your scraper code to avoid getting blocked.

Here is an example using the requests module:

import requests
from random import choice

proxies = []  # fill with your list of working proxies

proxy = choice(proxies)

headers = {"User-Agent": "Mozilla/5.0"}

for page in range(1, 11):
  r = requests.get(
    "https://example.com/" + str(page),
    headers=headers,
    proxies={"http": proxy, "https": proxy},
  )

  # scraping logic here

  proxy = choice(proxies)  # rotate to a new proxy for the next request

This picks a random proxy from the list for each request, so consecutive requests come from different IP addresses.
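If you prefer an even, round-robin rotation instead of random selection, itertools.cycle is a simple alternative:

from itertools import cycle

proxy_pool = cycle(proxies)  # endlessly loops over the list in order

proxy = next(proxy_pool)  # take the next proxy before each request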

The key points are:

  • Use a list of verified working proxies
  • Randomize which proxy is used for each request
  • Make sure to rotate to a new proxy regularly

Following these points should allow you to scrape sites successfully without getting blocked. For something a little more robust, see the sketch below.
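Here is one way to make the loop more defensive: drop a proxy from a local pool when it stops responding and retry the same page with another. The helper name fetch_with_rotation and its parameters are illustrative, not part of the original article.

import requests
from random import choice

def fetch_with_rotation(url, proxies, headers, max_attempts=5):
  pool = list(proxies)  # work on a local copy of the proxy list
  for _ in range(max_attempts):
    if not pool:
      raise RuntimeError("Ran out of working proxies")
    proxy = choice(pool)
    try:
      return requests.get(
        url,
        headers=headers,
        proxies={"http": proxy, "https": proxy},
        timeout=10,
      )
    except requests.exceptions.RequestException:
      pool.remove(proxy)  # discard the dead proxy and try another
  raise RuntimeError(f"Could not fetch {url} after {max_attempts} attempts")

# usage: r = fetch_with_rotation("https://example.com/1", proxies, headers)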

Conclusion

Proxy rotation is important for avoiding blocks while web scraping. This article covered:

  • How to find free proxies through scraping
  • Testing proxies to identify ones that work
  • Rotating through the working proxies in your scraper code

While these tools are great for learning, scraping production-level sites brings extra challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.

Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

This allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

With the power of Proxies API combined with Python libraries like Beautiful Soup, you can scrape data at scale without getting blocked.
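As an illustration, a call from Python might look like the sketch below, based on the service's sample curl format (http://api.proxiesapi.com/?key=API_KEY&url=...). API_KEY is a placeholder for your own key and the target URL is just an example.

import requests
from bs4 import BeautifulSoup

api_url = "http://api.proxiesapi.com/"
params = {"key": "API_KEY", "url": "https://example.com"}

# The API returns the rendered HTML of the target page
r = requests.get(api_url, params=params)
soup = BeautifulSoup(r.text, "html.parser")
print(soup.title.text)  # for example.com this prints "Example Domain"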
