Building a Simple Proxy Rotator with Kotlin and Jsoup

Oct 2, 2023 · 4 min read

In the early stages of a web crawling project, or when you only need to scale to a few hundred requests, you might want a simple proxy rotator that periodically repopulates itself from the free proxy pools available on the internet.

We can use a website like https://sslproxies.org/ to fetch public proxies every few minutes and use them in our Kotlin projects.

This is what the site looks like:

If you check the HTML using the inspect tool, you will see that the full content is encapsulated in a table with the id proxylisttable.

The IP and port are the first and second cells in each row.

We can select the table, iterate over its rows, and pull out the first and second cells of each row, as the code later in this tutorial does.

Setup

To follow along with this tutorial, you'll need:

  • Kotlin installed
  • A build tool like Gradle or Maven
  • The Jsoup library dependency, declared in Gradle as:

    implementation 'org.jsoup:jsoup:1.14.3'
    

Fetching Proxies

Let's start by fetching proxies from https://sslproxies.org/.

We'll use Jsoup to parse the HTML:

    import org.jsoup.Jsoup
    import org.jsoup.nodes.Document

    fun fetchProxies(): Document {
      val url = "https://sslproxies.org/"

      // Identify as a regular browser through the User-Agent header
      val userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

      // Send the GET request and parse the response body into a Document
      return Jsoup.connect(url).userAgent(userAgent).get()
    }
    

This sends a GET request to sslproxies.org using a browser User-Agent string and returns a Jsoup Document containing the parsed HTML.
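Free proxy sites can be slow or briefly unavailable, so it may help to set a timeout and handle failures instead of letting the exception propagate. The following is a minimal sketch rather than part of the original code: the helper name `fetchProxiesOrNull`, the shortened User-Agent, and the 10-second timeout are all illustrative assumptions.

```kotlin
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import java.io.IOException

// Hypothetical helper: returns null instead of throwing when the fetch fails.
fun fetchProxiesOrNull(url: String = "https://sslproxies.org/"): Document? =
    try {
        Jsoup.connect(url)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
            .timeout(10_000) // milliseconds; an assumed value, tune as needed
            .get()
    } catch (e: IOException) {
        // Covers timeouts, refused connections, and HTTP error statuses
        println("Could not fetch proxy list: ${e.message}")
        null
    }
```

A caller can then skip a refresh cycle when the result is null and keep using the previous list.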

Parsing the Proxies

Next, we need to parse the HTML to extract the proxies.

The proxies are contained in a table with id "proxylisttable". Each row contains an IP and port in the first and second columns.

We can use a Jsoup selector to find the table, then loop through the rows:

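    // Simple holder for one proxy entry
    data class Proxy(val ip: String, val port: String)

    val doc = fetchProxies()

    val proxies = mutableListOf<Proxy>()

    doc.select("#proxylisttable tr").forEach { row ->
      val cells = row.select("td")

      // Header rows use <th> cells, so skip any row without an IP and a port
      if (cells.size >= 2) {
        proxies.add(Proxy(cells[0].text(), cells[1].text()))
      }
    }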
    

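This selects the table rows, reads the IP and port text from the first two cells, and adds a new Proxy object to the list. Proxy is a small data class holding the two values; it is defined in the full code below.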
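To try the parsing logic without hitting the network, we can run it against an inline HTML string. This is a sketch under a few assumptions: `ProxyEntry` and `parseProxyTable` are hypothetical names (chosen to avoid clashing with the Proxy class above), the port is stored as an Int rather than a String, and the sample markup is made up to mirror the table structure described above.

```kotlin
import org.jsoup.Jsoup

data class ProxyEntry(val ip: String, val port: Int)

// Hypothetical helper: extracts proxy rows from an HTML string.
fun parseProxyTable(html: String): List<ProxyEntry> =
    Jsoup.parse(html)
        .select("#proxylisttable tr")
        .mapNotNull { row ->
            val cells = row.select("td")
            // Header rows use <th>, so they have no <td> cells and are skipped
            val ip = cells.getOrNull(0)?.text()?.takeIf { it.isNotBlank() }
                ?: return@mapNotNull null
            // Rows whose port is missing or not numeric are skipped too
            val port = cells.getOrNull(1)?.text()?.toIntOrNull()
                ?: return@mapNotNull null
            ProxyEntry(ip, port)
        }

fun main() {
    // Made-up sample markup; the addresses are documentation-only IPs
    val sample = """
        <table id="proxylisttable">
          <thead><tr><th>IP Address</th><th>Port</th></tr></thead>
          <tbody>
            <tr><td>203.0.113.10</td><td>8080</td></tr>
            <tr><td>198.51.100.7</td><td>3128</td></tr>
          </tbody>
        </table>
    """.trimIndent()
    parseProxyTable(sample).forEach(::println)
}
```

Returning a fresh list instead of mutating a shared one also makes the function easier to test.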

Using a Random Proxy

To use a random proxy from the list, we can pick a random index:

    import kotlin.random.Random

    // Assumes `proxies` was filled by the parsing step above;
    // Random.nextInt throws if the list is empty
    val randomIndex = Random.nextInt(proxies.size)
    val randomProxy = proxies[randomIndex]

    println(randomProxy)
    

This will print out a random proxy each time.
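Printing the proxy is only the first step; to route a request through it, the entry has to become something an HTTP client understands. The sketch below is one possible approach, not part of the original code: `ProxyEntry`, `pickRandom`, and `toNetProxy` are hypothetical names, and the port is assumed to be stored as an Int. It picks an entry safely from a possibly empty list and converts it to a standard `java.net.Proxy`.

```kotlin
import java.net.InetSocketAddress
import java.net.Proxy as NetProxy
import kotlin.random.Random

data class ProxyEntry(val ip: String, val port: Int)

// Hypothetical helper: returns a random entry, or null when the list is empty.
fun pickRandom(entries: List<ProxyEntry>, rng: Random = Random.Default): ProxyEntry? =
    entries.randomOrNull(rng)

// Converts an entry into a java.net.Proxy of type HTTP.
// createUnresolved avoids a DNS lookup at construction time.
fun ProxyEntry.toNetProxy(): NetProxy =
    NetProxy(NetProxy.Type.HTTP, InetSocketAddress.createUnresolved(ip, port))

fun main() {
    // Documentation-only sample addresses
    val entries = listOf(ProxyEntry("203.0.113.10", 8080), ProxyEntry("198.51.100.7", 3128))
    val chosen = pickRandom(entries)?.toNetProxy()
    println(chosen ?: "no usable proxy")
}
```

Jsoup accepts such an object through `Jsoup.connect(url).proxy(...)`, so the chosen proxy can be attached to the next request.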

Refreshing the Proxy List

To keep the list up to date, we need to refresh it every few minutes. We can wrap the logic in a function:

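    fun refreshProxies() {
      // Fetch first, so a failed request throws before the old list is cleared
      val doc = fetchProxies()

      proxies.clear()

      // Re-run the parsing step; parseProxies is listed in the full code below
      parseProxies(doc)
    }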
    

Then call this on a timer:

    import kotlin.concurrent.timer

    val proxies = mutableListOf<Proxy>()

    fun main() {
      // period is a Long in milliseconds; the first run happens immediately
      timer(period = 5 * 60 * 1000L) {
        refreshProxies()
      }
    }
    

This will refresh the proxy list every 5 minutes.
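Because the timer runs on its own thread, code that reads the list while it is being cleared and refilled can see it empty or half-populated. One way around this, sketched below under assumptions, is to build each new list separately and swap it in atomically. `ProxyPool`, `ProxyEntry`, and the `loader` parameter are hypothetical names; `loader` stands in for the fetch-and-parse step.

```kotlin
import java.util.concurrent.atomic.AtomicReference

data class ProxyEntry(val ip: String, val port: Int)

// Hypothetical holder: readers always see a complete, immutable snapshot.
class ProxyPool(private val loader: () -> List<ProxyEntry>) {
    private val current = AtomicReference<List<ProxyEntry>>(emptyList())

    fun snapshot(): List<ProxyEntry> = current.get()

    // Returns true if the list was replaced.
    // The previous list is kept when loading fails or yields nothing.
    fun refresh(): Boolean =
        try {
            val fresh = loader()
            if (fresh.isEmpty()) false else { current.set(fresh); true }
        } catch (e: Exception) {
            println("Refresh failed, keeping previous list: ${e.message}")
            false
        }
}

fun main() {
    // Stub loader standing in for the real fetch-and-parse step
    val pool = ProxyPool { listOf(ProxyEntry("203.0.113.10", 8080)) }
    pool.refresh()
    println(pool.snapshot())
}
```

In the timer version, the block would call `pool.refresh()` instead of `refreshProxies()`. Catching the exception inside `refresh` also keeps one failed fetch from stopping the timer.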

Full Code

Here is the full code:

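    import org.jsoup.Jsoup
    import org.jsoup.nodes.Document
    import kotlin.random.Random
    import kotlin.concurrent.timer

    data class Proxy(val ip: String, val port: String)

    val proxies = mutableListOf<Proxy>()

    fun fetchProxies(): Document {
      val url = "https://sslproxies.org/"
      val userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

      return Jsoup.connect(url).userAgent(userAgent).get()
    }

    fun parseProxies(doc: Document) {
      doc.select("#proxylisttable tr").forEach { row ->
        val cells = row.select("td")

        // Header rows use <th> cells, so skip any row without an IP and a port
        if (cells.size >= 2) {
          proxies.add(Proxy(cells[0].text(), cells[1].text()))
        }
      }
    }

    fun refreshProxies() {
      val doc = fetchProxies()

      proxies.clear()

      parseProxies(doc)
    }

    fun main() {
      // Refresh immediately, then every 5 minutes (period is a Long in milliseconds)
      timer(period = 5 * 60 * 1000L) {
        refreshProxies()
      }
    }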
    

This provides a simple proxy rotator in Kotlin using free proxy lists.
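Since individual free proxies fail often, a rotator is usually paired with retry logic that moves on to another proxy when a request fails. The following is a rough sketch of that idea, not the article's original code: `fetchWithRotation`, `ProxyEntry`, and the default of three attempts are assumptions, and the `request` lambda stands in for whatever HTTP call you make through the proxy.

```kotlin
data class ProxyEntry(val ip: String, val port: Int)

// Hypothetical helper: tries up to maxAttempts different proxies
// and returns the first successful result, or null if all fail.
fun <T> fetchWithRotation(
    proxies: List<ProxyEntry>,
    maxAttempts: Int = 3,
    request: (ProxyEntry) -> T
): T? {
    // Shuffle once so that each attempt uses a different proxy
    for (proxy in proxies.shuffled().take(maxAttempts)) {
        try {
            return request(proxy)
        } catch (e: Exception) {
            println("Proxy ${proxy.ip}:${proxy.port} failed: ${e.message}")
        }
    }
    return null // every attempt failed, or the list was empty
}

fun main() {
    val proxies = listOf(ProxyEntry("203.0.113.10", 8080), ProxyEntry("198.51.100.7", 3128))
    // Stub request: pretends that only the proxy on port 3128 works
    val result = fetchWithRotation(proxies) { p ->
        if (p.port == 3128) "ok via ${p.ip}" else throw IllegalStateException("timed out")
    }
    println(result)
}
```

With Jsoup, the lambda could be something like `{ p -> Jsoup.connect(url).proxy(p.ip, p.port).timeout(5_000).get() }`, where `url` is the page you want to fetch.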

If you want to use this in production and scale to thousands of links, you will find that many free proxies won't hold up under the speed and reliability requirements. In this scenario, using a rotating proxy service to rotate IPs is almost a must.

Otherwise, you tend to get IP blocked a lot by automatic location, usage, and bot detection algorithms.

Our rotating proxy server Proxies API provides a simple API that can solve all IP Blocking problems instantly.

  • With millions of high-speed rotating proxies located all over the world
  • With our automatic IP rotation
  • With our automatic User-Agent-String rotation (which simulates requests from different, valid web browsers and web browser versions)
  • With our automatic CAPTCHA solving technology

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole thing can be accessed through a simple API call, like the one below, from any programming language.

    curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
    

We have a running offer of 1000 API calls completely free. Register and get your free API Key here.
