Building a Simple Proxy Rotator with Visual Basic and HTML Agility Pack

Oct 2, 2023 · 5 min read

In the early stages of a web crawling project, or when you only need to scale to a few hundred requests, you might want a simple proxy rotator that periodically refills itself from the free proxy pools available on the internet.

We can use a website like https://sslproxies.org/ to fetch public proxies every few minutes and use them in our VB projects.

This is what the site looks like:

If you check the HTML using the inspect tool, you will see that the proxy list is contained in a table with the id proxylisttable.

The IP and port are the first and second elements in each row.

The code below selects the table, iterates over its rows, and pulls out the first and second cells of each row.

Fetching the Proxies

First, we need to add the HTML Agility Pack NuGet package to our project. This will allow us to parse and query HTML documents.

PM> Install-Package HtmlAgilityPack

Next, we can fetch and parse the HTML from the proxy site:

Imports System.Net
Imports HtmlAgilityPack

Module Module1

  Sub Main()

    Dim url As String = "https://sslproxies.org/"
    Dim webClient As New WebClient()
    Dim html As String = webClient.DownloadString(url)

    Dim doc As New HtmlDocument()
    doc.LoadHtml(html)

  End Sub

End Module

The doc object now contains the full HTML document.

Extracting the Proxies

The proxies are contained in a table with id "proxylisttable". We can use LINQ to select the table rows and extract the IP and port:

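' Requires Imports System.Linq. Rows without at least two <td> cells
' (such as the header row, which uses <th>) are skipped.
Dim proxies = doc.DocumentNode.Descendants("table") _
                .Where(Function(t) t.Id = "proxylisttable") _
                .SelectMany(Function(t) t.Descendants("tr")) _
                .Where(Function(tr) tr.Elements("td").Count() >= 2) _
                .Select(Function(tr) New With { _
                   .IP = tr.Elements("td")(0).InnerText.Trim(), _
                   .Port = tr.Elements("td")(1).InnerText.Trim() _
                })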

This queries the descendant table elements, filters by the "proxylisttable" id, gets the rows, and creates anonymous types containing the IP and port.
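To try the extraction step without hitting the network, the same query can be run against a small inline table. This is only a sketch: the ExtractProxies helper, the sample markup, and the addresses (taken from the 192.0.2.0/24 documentation range) are made up for illustration, and the real page's markup may differ.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module ExtractionSketch

  ' Hypothetical helper: returns "ip:port" strings for every data row
  ' of the table with the given id.
  Function ExtractProxies(html As String, tableId As String) As List(Of String)
    Dim doc As New HtmlDocument()
    doc.LoadHtml(html)

    Return doc.DocumentNode.Descendants("table") _
              .Where(Function(t) t.Id = tableId) _
              .SelectMany(Function(t) t.Descendants("tr")) _
              .Select(Function(tr) tr.Elements("td").ToList()) _
              .Where(Function(cells) cells.Count >= 2) _
              .Select(Function(cells) cells(0).InnerText.Trim() & ":" & cells(1).InnerText.Trim()) _
              .ToList()
  End Function

  Sub Main()
    ' Made-up sample markup standing in for the downloaded page.
    Dim sample As String =
      "<table id=""proxylisttable"">" &
      "<thead><tr><th>IP Address</th><th>Port</th></tr></thead>" &
      "<tbody>" &
      "<tr><td>192.0.2.10</td><td>8080</td></tr>" &
      "<tr><td>192.0.2.11</td><td>3128</td></tr>" &
      "</tbody></table>"

    For Each entry As String In ExtractProxies(sample, "proxylisttable")
      Console.WriteLine(entry)
    Next
  End Sub

End Module
```

The header row has th cells rather than td cells, so the cell-count filter drops it and only the two data rows are printed.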

Storing the Proxies

Let's add the proxies to a List(Of Proxy) structure we can use later:

Structure Proxy
  Public IP As String
  Public Port As Integer
End Structure

Dim proxyList As New List(Of Proxy)

For Each p In proxies
  Dim proxy As New Proxy()
  proxy.IP = p.IP
  proxy.Port = Integer.Parse(p.Port)

  proxyList.Add(proxy)
Next

Selecting a Random Proxy

List(Of T) has no built-in Random method, so to select a random proxy we can index into the list using System.Random:

Dim rng As New Random()
Dim randomProxy As Proxy = proxyList(rng.Next(proxyList.Count))

Console.WriteLine(randomProxy.IP)
Console.WriteLine(randomProxy.Port)

This will print out a random IP and port each time.
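Once a proxy has been picked, it still has to be attached to a request. One way to do that with the WebClient class used earlier is to set its Proxy property to a WebProxy. The sketch below assumes this approach; the BuildProxy and FetchThroughProxy helpers and the placeholder address are hypothetical and not part of the original code.

```vbnet
Imports System
Imports System.Net

Module ProxyUsageSketch

  ' Hypothetical helper: wraps an IP and port in a WebProxy.
  Function BuildProxy(ip As String, port As Integer) As WebProxy
    Return New WebProxy(ip, port)
  End Function

  ' Hypothetical helper: downloads a page through the given proxy.
  Function FetchThroughProxy(targetUrl As String, ip As String, port As Integer) As String
    Using client As New WebClient()
      client.Proxy = BuildProxy(ip, port)
      Return client.DownloadString(targetUrl)
    End Using
  End Function

  Sub Main()
    Try
      ' Placeholder address; substitute an entry from proxyList.
      Dim html As String = FetchThroughProxy("https://example.com/", "192.0.2.10", 8080)
      Console.WriteLine("Downloaded " & html.Length & " characters")
    Catch ex As WebException
      ' Free proxies fail often, so a failure here is the expected case.
      Console.WriteLine("Proxy request failed: " & ex.Message)
    End Try
  End Sub

End Module
```

A dead proxy can take a long time to fail, so in practice you would probably catch the WebException and retry with a different entry from the list.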

Refreshing the Proxies

To keep the proxy list up to date, we should call the fetch code every few minutes. We can use a Timer for this:

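Dim rng As New Random()
Dim timer As New System.Timers.Timer()
timer.Interval = 1000 * 60 * 5 ' 5 minutes

AddHandler timer.Elapsed, Sub()
                             ' Fetch proxies
                             proxyList.Clear()
                             ' ...code to fetch proxies...

                             ' Select new random proxy (skipped if the fetch returned nothing)
                             If proxyList.Count > 0 Then
                               randomProxy = proxyList(rng.Next(proxyList.Count))
                             End If
                           End Sub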

timer.Start()

Now we have a constantly updating list of proxies to use in our VB application!
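One thing to keep in mind is that System.Timers.Timer normally raises Elapsed on a thread pool thread, so the list can be cleared while other code is reading it, and a failed fetch leaves it empty. A possible way around both issues, sketched below under those assumptions, is to build the new list separately and swap it in under a lock. The ReplaceProxies and PickRandom helpers are hypothetical, and plain "ip:port" strings stand in for the Proxy structure to keep the example short.

```vbnet
Imports System
Imports System.Collections.Generic

Module SafeRefreshSketch

  Private ReadOnly listLock As New Object()
  Private ReadOnly rng As New Random()
  Private proxyList As New List(Of String)

  ' Swaps in a freshly fetched list. An empty or missing result
  ' (for example after a failed download) keeps the old list.
  Sub ReplaceProxies(fresh As List(Of String))
    If fresh Is Nothing OrElse fresh.Count = 0 Then Return
    SyncLock listLock
      proxyList = fresh
    End SyncLock
  End Sub

  ' Returns a random entry, or Nothing when no proxies are loaded yet.
  Function PickRandom() As String
    SyncLock listLock
      If proxyList.Count = 0 Then Return Nothing
      Return proxyList(rng.Next(proxyList.Count))
    End SyncLock
  End Function

  Sub Main()
    ReplaceProxies(New List(Of String) From {"192.0.2.10:8080", "192.0.2.11:3128"})
    ReplaceProxies(New List(Of String)()) ' simulated failed fetch, ignored
    Console.WriteLine(PickRandom())
  End Sub

End Module
```

The timer handler would then call ReplaceProxies with the result of the fetch instead of clearing the shared list first.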

Full Code

Here is the full code to easily copy and paste:

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Net
Imports System.Timers
Imports HtmlAgilityPack

Module Module1

  Structure Proxy
    Public IP As String
    Public Port As Integer
  End Structure

  Dim proxyList As New List(Of Proxy)
  Dim randomProxy As Proxy
  Dim rng As New Random()

  Sub Main()

    ' Fill the list once at startup; the timer first fires after 5 minutes
    RefreshProxies()

    Dim timer As New System.Timers.Timer()
    timer.Interval = 1000 * 60 * 5 ' 5 minutes
    AddHandler timer.Elapsed, AddressOf RefreshProxies
    timer.Start()

    Console.ReadLine()

  End Sub

  Sub RefreshProxies()

    proxyList.Clear()

    Dim url As String = "https://sslproxies.org/"
    Dim webClient As New WebClient()
    Dim html As String = webClient.DownloadString(url)

    Dim doc As New HtmlDocument()
    doc.LoadHtml(html)

    ' Rows without at least two <td> cells (such as the header row) are skipped
    Dim proxies = doc.DocumentNode.Descendants("table") _
                    .Where(Function(t) t.Id = "proxylisttable") _
                    .SelectMany(Function(t) t.Descendants("tr")) _
                    .Where(Function(tr) tr.Elements("td").Count() >= 2) _
                    .Select(Function(tr) New With { _
                        .IP = tr.Elements("td")(0).InnerText.Trim(), _
                        .Port = tr.Elements("td")(1).InnerText.Trim() _
                     })

    For Each p In proxies
      Dim proxy As New Proxy()
      proxy.IP = p.IP
      proxy.Port = Integer.Parse(p.Port)

      proxyList.Add(proxy)
    Next

    ' Pick a random proxy; List(Of T) has no Random method, so index with System.Random
    If proxyList.Count > 0 Then
      randomProxy = proxyList(rng.Next(proxyList.Count))
    End If

  End Sub

End Module

This provides a simple way to keep rotating proxies in a Visual Basic application.

If you want to use this in production and want to scale to thousands of links, then you will find that many free proxies won't hold up under the speed and reliability requirements. In this scenario, using a rotating proxy service to rotate IPs is almost a must.

Otherwise, you tend to get IP blocked a lot by automatic location, usage, and bot detection algorithms.

Our rotating proxy server Proxies API provides a simple API that can solve all IP Blocking problems instantly.

  • With millions of high speed rotating proxies located all over the world
  • With our automatic IP rotation
  • With our automatic User-Agent-String rotation (which simulates requests from different, valid web browsers and web browser versions)
  • With our automatic CAPTCHA solving technology

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole thing can be accessed through a simple API call, as shown below, from any programming language.

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"

We have a running offer of 1000 API calls completely free. Register and get your free API Key here.
