Building a Simple Proxy Rotator with JavaScript and Puppeteer

Oct 2, 2023 · 5 min read

In the early stages of a web crawling project, or when you only need to scale to a few hundred requests, you might want a simple proxy rotator that periodically repopulates itself from the free proxy pools available on the internet.

We can use a website like https://sslproxies.org/ to fetch public proxies every few minutes and use them in our JavaScript projects.


If you inspect the HTML with your browser's developer tools, you will see that (at least at the time of writing) the list is contained in a table with the id proxylisttable.

The IP and port are the first and second elements in each row.

The plan is to select that table, iterate over its rows, and pull out the first two cells (the IP and the port) of each row.

Fetching the Proxies

First, we'll need to install Puppeteer, which loads the page in a headless browser so we can grab its HTML:

npm install puppeteer

Then we can use Puppeteer to fetch the proxy list page:

const puppeteer = require('puppeteer');

async function getProxies() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://sslproxies.org/');

  const content = await page.content();

  await browser.close();

  return content;
}

This will fetch the full HTML content of the proxies page that we can then parse.

Parsing the HTML

To parse the HTML, we'll use cheerio, which provides jQuery-style selectors:

npm install cheerio

Then we can use it to parse the content:

const cheerio = require('cheerio');

function parseProxies(content) {
  const $ = cheerio.load(content);
  const proxies = [];

  $('#proxylisttable tr').each((i, elem) => {
    // trim() strips stray whitespace around the cell text
    const ip = $(elem).find('td:nth-child(1)').text().trim();
    const port = $(elem).find('td:nth-child(2)').text().trim();

    if (ip && port) {
      proxies.push({ ip, port });
    }
  });

  return proxies;
}

This loops through each table row, grabs the IP and port, and adds them to the proxies array if they exist.
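Free lists sometimes contain malformed cells or repeated entries. As a hedged sketch, the parsed array could be cleaned before use; cleanProxies and isValidProxy are hypothetical helpers (not part of cheerio), and the check only accepts dotted IPv4 addresses with a port between 1 and 65535.

```javascript
// Hypothetical helpers: drop malformed rows and duplicate ip:port pairs.
// Assumes entries shaped like { ip, port }, as produced by parseProxies().
const IPV4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

function isValidProxy({ ip, port }) {
  const match = IPV4.exec(String(ip).trim());
  if (!match) return false;
  // Every octet must be in the range 0..255
  if (!match.slice(1).every((octet) => Number(octet) <= 255)) return false;
  const portNum = Number(String(port).trim());
  return Number.isInteger(portNum) && portNum >= 1 && portNum <= 65535;
}

function cleanProxies(proxies) {
  const seen = new Set();
  const result = [];
  for (const proxy of proxies) {
    const ip = String(proxy.ip).trim();
    const port = String(proxy.port).trim();
    const key = `${ip}:${port}`;
    // Keep only the first valid occurrence of each ip:port pair
    if (isValidProxy({ ip, port }) && !seen.has(key)) {
      seen.add(key);
      result.push({ ip, port });
    }
  }
  return result;
}
```

For example, cleanProxies(parseProxies(content)) returns the same list minus header rows, invalid addresses, and duplicates.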

Putting It Together

Now we can put it all together into a getProxies function:

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function getProxies() {

  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://sslproxies.org/');

  const content = await page.content();

  await browser.close();

  return parseProxies(content);

}

function parseProxies(content) {

  const $ = cheerio.load(content);
  const proxies = [];

  $('#proxylisttable tr').each((i, elem) => {

    const ip = $(elem).find('td:nth-child(1)').text().trim();
    const port = $(elem).find('td:nth-child(2)').text().trim();

    if (ip && port) {
      proxies.push({ ip, port });
    }

  });

  return proxies;

}

// Usage:

getProxies()
  .then(proxies => {
    // use proxies
  })
  .catch(err => console.error(err));

This allows us to fetch and parse the proxies simply by calling getProxies().

Using a Random Proxy

To use a random proxy from the list, we can pick one by a random index:

function getRandomProxy(proxies) {
  return proxies[Math.floor(Math.random() * proxies.length)];
}

getProxies()
  .then(proxies => {
    const randomProxy = getRandomProxy(proxies);

    // use randomProxy
  })
  .catch(err => console.error(err));

This picks a proxy uniformly at random on each call, so you will usually, but not always, get a different proxy each time.
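The code above leaves "// use randomProxy" as a placeholder. One way to actually route Puppeteer through the chosen proxy is Chromium's --proxy-server launch flag. This is a sketch that assumes an unauthenticated HTTP proxy shaped like { ip, port }; proxyServerArg and fetchThroughProxy are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```javascript
// Build the Chromium flag that routes all browser traffic through one proxy.
// Assumes proxy = { ip, port } and a plain HTTP proxy without authentication.
function proxyServerArg(proxy) {
  return `--proxy-server=http://${proxy.ip}:${proxy.port}`;
}

// Load a page through the given proxy and return its HTML.
async function fetchThroughProxy(proxy, url) {
  // Required lazily so proxyServerArg() can be used without Puppeteer installed
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ args: [proxyServerArg(proxy)] });
  try {
    const page = await browser.newPage();
    // Free proxies are often slow; give up after 30 seconds instead of hanging
    await page.goto(url, { timeout: 30000 });
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails
    await browser.close();
  }
}
```

Inside the .then callback above, fetchThroughProxy(randomProxy, 'https://example.com') would resolve to the page HTML fetched through that proxy, or reject if the proxy is dead or too slow.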

You could call getProxies() every few minutes to keep the proxy list fresh.
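To automate that refresh, and to cycle through the list instead of picking at random (random picks can repeat), one option is a small class like the hypothetical ProxyRotator below. It is a sketch that assumes loadProxies is any async function returning [{ ip, port }], such as getProxies(); the five-minute default interval is an arbitrary choice.

```javascript
// Hypothetical rotator: hands out proxies round-robin and reloads the list
// every refreshMs milliseconds using the supplied async loadProxies function.
class ProxyRotator {
  constructor(loadProxies, refreshMs = 5 * 60 * 1000) {
    this.loadProxies = loadProxies;
    this.refreshMs = refreshMs;
    this.proxies = [];
    this.index = 0;
    this.timer = null;
  }

  // Fetch a fresh list; keep the old one if the fetch fails or comes back empty
  async refresh() {
    try {
      const fresh = await this.loadProxies();
      if (fresh.length > 0) {
        this.proxies = fresh;
        this.index = 0;
      }
    } catch (err) {
      console.error('Proxy refresh failed, keeping old list:', err.message);
    }
  }

  // Load once, then reload on a fixed interval
  async start() {
    await this.refresh();
    this.timer = setInterval(() => this.refresh(), this.refreshMs);
    // Don't keep the Node.js process alive just for this timer
    this.timer.unref();
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }

  // Return the next proxy in order, wrapping around at the end of the list
  next() {
    if (this.proxies.length === 0) {
      throw new Error('No proxies loaded; call start() first');
    }
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}
```

Usage would look like new ProxyRotator(getProxies), awaiting start() once, then calling next() before each request. Round-robin spreads requests evenly across the list, and keeping the previous list when a refresh fails means a temporary outage of the proxy site doesn't leave the crawler with nothing.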

Summary

In summary, here is the full code to fetch and parse proxies as well as selecting a random one:

// Fetch proxies
const puppeteer = require('puppeteer');

async function getProxies() {

  // Launch browser
  const browser = await puppeteer.launch();

  // Navigate to url
  const page = await browser.newPage();
  await page.goto('https://sslproxies.org/');

  // Get content
  const content = await page.content();

  // Close browser
  await browser.close();

  return parseProxies(content);

}

// Parse proxies
const cheerio = require('cheerio');

function parseProxies(content) {

  const $ = cheerio.load(content);
  const proxies = [];

  $('#proxylisttable tr').each((i, elem) => {

    const ip = $(elem).find('td:nth-child(1)').text().trim();
    const port = $(elem).find('td:nth-child(2)').text().trim();

    if (ip && port) {
      proxies.push({ ip, port });
    }

  });

  return proxies;

}


// Get random proxy
function getRandomProxy(proxies) {
  return proxies[Math.floor(Math.random() * proxies.length)];
}


// Usage:

getProxies()
  .then(proxies => {
    const randomProxy = getRandomProxy(proxies);

    // Use randomProxy

  })
  .catch(err => console.error(err));

This provides a simple way to fetch and use free proxies in your JavaScript projects!

If you want to use this in production and scale to thousands of links, you will find that many free proxies can't keep up with the speed and reliability requirements. In this scenario, using a rotating proxy service to rotate IPs is almost a must.

Otherwise, you tend to get IP-blocked frequently by automated location, usage, and bot-detection algorithms.
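Even at small scale, it helps to drop dead proxies before using them. The sketch below assumes you supply a check(proxy) function (hypothetical; for instance, one that loads a small known page through the proxy) that resolves truthy on success. Any proxy whose check throws, resolves falsy, or takes longer than the timeout is discarded; withTimeout and filterResponsive are also hypothetical names.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever promise settles first
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Keep only proxies whose check passes within the time limit.
async function filterResponsive(proxies, check, ms = 5000) {
  // Run all checks in parallel; wrapping in .then() turns synchronous
  // throws from check() into rejections, which count as failures
  const results = await Promise.allSettled(
    proxies.map((proxy) =>
      withTimeout(Promise.resolve().then(() => check(proxy)), ms)
    )
  );
  return proxies.filter(
    (_, i) => results[i].status === 'fulfilled' && Boolean(results[i].value)
  );
}
```

Running the checks in parallel keeps the total wait close to a single timeout rather than one timeout per proxy.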

Our rotating proxy server, Proxies API, provides a simple API that can solve all IP-blocking problems instantly.

  • With millions of high-speed rotating proxies located all over the world
  • With our automatic IP rotation
  • With our automatic User-Agent string rotation (which simulates requests from different, valid web browsers and web browser versions)
  • With our automatic CAPTCHA-solving technology

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole thing can be accessed with a simple API call like the one below, from any programming language.

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"

We have a running offer of 1,000 API calls completely free. Register to get your free API key.
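For JavaScript projects, the same request could be made with fetch. This is a sketch that assumes Node.js 18 or later (where fetch is built in) and uses only the key and url query parameters shown in the curl example; API_KEY remains a placeholder, and buildProxiesApiUrl and fetchViaProxiesApi are hypothetical helper names.

```javascript
// Build the request URL for the endpoint shown in the curl example.
function buildProxiesApiUrl(apiKey, targetUrl) {
  const endpoint = new URL('http://api.proxiesapi.com/');
  // searchParams percent-encodes the target URL so its own ? and & survive
  endpoint.searchParams.set('key', apiKey);
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}

// Fetch a page through the API; assumes Node.js 18+ with global fetch.
async function fetchViaProxiesApi(apiKey, targetUrl) {
  const response = await fetch(buildProxiesApiUrl(apiKey, targetUrl));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

Encoding the url parameter matters when the target page has its own query string; otherwise its & characters would be read as extra parameters to the API.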
