Building a Simple Proxy Rotator with Elixir and Floki

Oct 2, 2023 · 4 min read

When you are starting a web crawling project, or only need to scale to a few hundred requests, a simple proxy rotator that populates itself from free proxy lists can be useful.

We can use the sslproxies.org website to fetch public proxies every few minutes and use them in our Elixir projects.

This is what the site looks like:

If you inspect the page's HTML with your browser's developer tools, you will see that the proxy list is contained in a table with the id proxylisttable.

The IP address and port are the first and second cells of each row.

We'll select the table's rows, iterate over them, and pull the first and second cells out of each one.

Let's walk through a simple implementation. This assumes you have the Floki library (for HTML parsing) and HTTPoison (for HTTP requests) installed.
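If you are starting from a fresh Mix project, a minimal sketch of the dependency declaration in mix.exs looks like this. The version requirements are assumptions based on recent releases; check Hex for the current ones.

```elixir
# In mix.exs: the deps/0 function of a Mix project (version requirements are assumptions)
defp deps do
  [
    # HTML parsing and CSS-selector queries
    {:floki, "~> 0.34"},
    # HTTP client used to download the proxy list
    {:httpoison, "~> 2.0"}
  ]
end
```

Run mix deps.get afterwards to fetch both libraries.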

First we'll fetch the HTML from sslproxies:

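```elixir
url = "https://sslproxies.org/"

# Browser-like User-Agent so the request looks like it comes from Safari
headers = [
  {"User-Agent",
   "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"}
]

# get!/2 raises on connection errors, but returns the response for any status code
response = HTTPoison.get!(url, headers)
```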

This sends a GET request to the site, using the User-Agent header to pretend to be a regular web browser.

The full HTML content is now in response.body.
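Note that HTTPoison.get!/2 only raises on connection-level failures: a 403 or 503 error page would still land in response.body and be parsed as if it were the proxy list. A minimal sketch of a stricter check, assuming you switch to the non-raising HTTPoison.get/2 (which returns {:ok, response} or {:error, reason}); the ProxyFetch module and its extract_body/1 function are hypothetical names.

```elixir
defmodule ProxyFetch do
  # Hypothetical helper: turns the result of HTTPoison.get/2 into {:ok, body}
  # or {:error, reason}. HTTPoison.Response structs match these plain-map
  # patterns, so this module needs no compile-time dependency on HTTPoison.
  def extract_body({:ok, %{status_code: 200, body: body}}), do: {:ok, body}

  # The site answered, but not with the page we expected (blocked, rate limited, ...)
  def extract_body({:ok, %{status_code: status}}), do: {:error, {:unexpected_status, status}}

  # Transport-level failure, e.g. an %HTTPoison.Error{} struct
  def extract_body({:error, reason}), do: {:error, reason}
end
```

In use, the request becomes url |> HTTPoison.get(headers) |> ProxyFetch.extract_body(), and only an {:ok, body} result is handed on to Floki.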


We can use Floki to select the table rows and extract the IP and port:

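```elixir
# Parse the raw HTML into a document tree that Floki can query
{:ok, document} = Floki.parse_document(response.body)

# Every row of the proxy table, including the header row
rows = Floki.find(document, "#proxylisttable tr")

proxies =
  for row <- rows,
      # Keep the first two td cells; rows with fewer than two (e.g. the header
      # row, which uses th cells) do not match and are skipped
      [ip, port | _rest] <- [row |> Floki.find("td") |> Enum.map(&Floki.text/1)] do
    %{ip: ip, port: port}
  end
```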

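This parses the HTML, finds the table rows, collects the text of each row's td cells, and keeps the first two as a map with the ip and port. Matching [ip, port] exactly would raise a MatchError, because each row also contains other columns (country, anonymity, and so on) and the header row has no td cells at all; the [ip, port | _rest] pattern inside the for comprehension handles both cases.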
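Free proxy lists occasionally contain malformed entries, and the port comes back as a string. A minimal sketch of an optional cleanup step that keeps only entries with a parseable IP address and a port in 1..65535, converting the port to an integer. ProxyClean and valid_proxies/1 are hypothetical names; the checks use the standard :inet.parse_address/1 and Integer.parse/1.

```elixir
defmodule ProxyClean do
  # Hypothetical helper: keeps only %{ip: _, port: _} maps whose IP parses as an
  # IPv4/IPv6 address and whose port is a whole integer in 1..65535.
  def valid_proxies(proxies) do
    for %{ip: raw_ip, port: raw_port} <- proxies,
        ip <- [String.trim(raw_ip)],
        # :inet.parse_address/1 takes a charlist and returns {:ok, tuple} or {:error, :einval}
        {:ok, _address} <- [:inet.parse_address(String.to_charlist(ip))],
        # Integer.parse/1 returns {int, rest}; require the whole string to be digits
        {port, ""} <- [Integer.parse(String.trim(raw_port))],
        port in 1..65_535 do
      %{ip: ip, port: port}
    end
  end
end
```

Running proxies through ProxyClean.valid_proxies/1 before picking one means a garbled row is filtered out instead of being handed to an HTTP request.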

Let's wrap it in a function we can call whenever we want the latest proxies:

```elixir
# Must be defined inside a module (see the full code below)
def fetch_proxies do
  url = "https://sslproxies.org/"

  headers = [
    {"User-Agent",
     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"}
  ]

  response = HTTPoison.get!(url, headers)
  {:ok, document} = Floki.parse_document(response.body)

  # The for comprehension is the function's return value
  for row <- Floki.find(document, "#proxylisttable tr"),
      # Skip rows without at least two td cells, such as the header row
      [ip, port | _rest] <- [row |> Floki.find("td") |> Enum.map(&Floki.text/1)] do
    %{ip: ip, port: port}
  end
end
```

To get a random proxy:

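```elixir
# Each call downloads and parses a fresh list
proxies = fetch_proxies()

# Pick one proxy at random (raises Enum.EmptyError if the list is empty)
proxy = Enum.random(proxies)

IO.inspect(proxy)
```

This prints the IP and port of a randomly chosen proxy.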
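Choosing a proxy is only half the job; the request itself also has to go through it. HTTPoison's :proxy option accepts a proxy URL, so a minimal sketch is to turn the chosen map into an http://ip:port string. ProxyConfig.proxy_url/1 is a hypothetical helper, and the http:// scheme assumes the listed proxies are plain HTTP proxies.

```elixir
defmodule ProxyConfig do
  # Hypothetical helper: builds an "http://ip:port" URL from one of the
  # %{ip: ..., port: ...} maps. Interpolation works whether the port is
  # still a string (as scraped) or has already been converted to an integer.
  def proxy_url(%{ip: ip, port: port}), do: "http://#{ip}:#{port}"
end
```

The URL can then be passed with the request, for example HTTPoison.get(target_url, headers, proxy: ProxyConfig.proxy_url(proxy), recv_timeout: 10_000). The ten-second receive timeout is an assumption, chosen because free proxies are often slow.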

You'll want to call fetch_proxies every few minutes to have the latest proxy IPs.
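Rather than calling fetch_proxies by hand, one idiomatic way to do this in Elixir is a GenServer that refreshes the list on a timer with Process.send_after/3. The sketch below is an illustration, not the author's implementation: ProxyPool and its options (:fetch, :interval, :name) are hypothetical names, and the five-minute interval is an assumption.

```elixir
defmodule ProxyPool do
  # A minimal sketch: caches the proxy list, refreshes it on a timer, and hands
  # out random proxies. The fetch function is injected (for example
  # fetch: &Proxies.fetch_proxies/0) so it can be swapped out in tests.
  use GenServer

  # Refresh interval; five minutes is an assumption, tune it to your needs
  @default_interval :timer.minutes(5)

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.get(opts, :name, __MODULE__))
  end

  # Returns a random cached proxy, or nil when no proxies are available
  def random(server \\ __MODULE__), do: GenServer.call(server, :random)

  @impl true
  def init(opts) do
    fetch = Keyword.fetch!(opts, :fetch)
    interval = Keyword.get(opts, :interval, @default_interval)
    schedule_refresh(interval)
    {:ok, %{fetch: fetch, interval: interval, proxies: safe_fetch(fetch, [])}}
  end

  @impl true
  def handle_call(:random, _from, %{proxies: []} = state), do: {:reply, nil, state}
  def handle_call(:random, _from, state), do: {:reply, Enum.random(state.proxies), state}

  @impl true
  def handle_info(:refresh, state) do
    schedule_refresh(state.interval)
    {:noreply, %{state | proxies: safe_fetch(state.fetch, state.proxies)}}
  end

  defp schedule_refresh(interval), do: Process.send_after(self(), :refresh, interval)

  # Falls back to the previous list when fetching raises or returns no proxies,
  # so a temporary outage of the source site does not empty the pool
  defp safe_fetch(fetch, fallback) do
    case fetch.() do
      [] -> fallback
      proxies when is_list(proxies) -> proxies
    end
  rescue
    _error -> fallback
  end
end
```

In an application, the pool would typically be started under a supervisor, e.g. children = [{ProxyPool, fetch: &Proxies.fetch_proxies/0}], after which ProxyPool.random() returns a proxy without downloading the list again.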

The full code:

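```elixir
# proxies.exs
# Run as a standalone script with `elixir proxies.exs`; Mix.install/1 (Elixir 1.12+)
# fetches the dependencies. Inside a Mix project, declare them in mix.exs instead.
Mix.install([:floki, :httpoison])

defmodule Proxies do
  @url "https://sslproxies.org/"

  # Browser-like User-Agent so the request looks like it comes from Safari
  @headers [
    {"User-Agent",
     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"}
  ]

  def fetch_proxies do
    response = HTTPoison.get!(@url, @headers)
    {:ok, document} = Floki.parse_document(response.body)

    for row <- Floki.find(document, "#proxylisttable tr"),
        # Skip rows without at least two td cells, such as the header row
        [ip, port | _rest] <- [row |> Floki.find("td") |> Enum.map(&Floki.text/1)] do
      %{ip: ip, port: port}
    end
  end
end

# Usage
proxies = Proxies.fetch_proxies()
# Enum.random/1 raises if the list is empty (e.g. if the page layout changed)
proxy = Enum.random(proxies)
IO.inspect(proxy)
```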

This gives you a simple proxy rotator in Elixir using HTTPoison and Floki.

If you want to use this in production and scale to thousands of links, you will find that many free proxies can't meet the speed and reliability requirements. In that case, a rotating proxy service is almost a must.

Otherwise, your IPs tend to get blocked frequently by automated location, usage, and bot-detection systems.
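Even at small scale, a proxy that worked a minute ago may time out or be blocked, so a rotator usually retries through a different proxy. A minimal sketch, assuming the request is wrapped in a function that returns {:ok, _} or {:error, _} (for example a call to HTTPoison.get/3 with the :proxy option); ProxyRetry.with_rotation/3 is a hypothetical name and three attempts is an arbitrary default.

```elixir
defmodule ProxyRetry do
  # Hypothetical helper: tries the request through up to max_attempts different
  # proxies, in random order, and returns the first {:ok, result}. If every
  # attempt fails, the last {:error, reason} is returned; with no proxies at
  # all, {:error, :no_proxies}.
  def with_rotation(proxies, request_fun, max_attempts \\ 3) do
    proxies
    |> Enum.shuffle()
    |> Enum.take(max_attempts)
    |> Enum.reduce_while({:error, :no_proxies}, fn proxy, _last_result ->
      case request_fun.(proxy) do
        {:ok, _} = success -> {:halt, success}
        {:error, _} = failure -> {:cont, failure}
      end
    end)
  end
end
```

Shuffling before taking the first few proxies spreads the load across the list instead of always hammering the same entries first.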

Our rotating proxy server, Proxies API, provides a simple API that can solve all IP blocking problems instantly.

  • With millions of high-speed rotating proxies located all over the world
  • With our automatic IP rotation
  • With our automatic User-Agent string rotation (which simulates requests from different, valid web browsers and web browser versions)
  • With our automatic CAPTCHA-solving technology

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole service can be accessed with a simple API call like the one below, from any programming language.

```shell
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
```

We have a running offer of 1000 API calls completely free. Register and get your free API Key here.
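From Elixir, the same request URL can be built with the standard library. A minimal sketch, assuming the API decodes query parameters in the usual way: URI.encode_query/1 percent-encodes the target URL, so a target that carries its own query string cannot be confused with the API's key and url parameters. ProxiesApiUrl.build/2 is a hypothetical helper name.

```elixir
defmodule ProxiesApiUrl do
  # Hypothetical helper: builds the same request URL as the curl example,
  # with the target URL percent-encoded as a query parameter value.
  def build(api_key, target_url) do
    "http://api.proxiesapi.com/?" <> URI.encode_query(key: api_key, url: target_url)
  end
end
```

The resulting string can be fetched like any other page, e.g. with HTTPoison.get/1.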
