Effective Strategies for Rate Limiting Asynchronous Requests in Python

Feb 3, 2024 · 4 min read

Making asynchronous HTTP requests is very common in Python applications. Whether fetching data from an API or submitting forms, we often need to send multiple concurrent requests to maximize performance.

However, many web services enforce rate limits to prevent abuse and protect availability. Exceeding these limits can lead to errors or even get your API access revoked entirely!

In this guide, I'll share some effective strategies for respectfully rate limiting your asynchronous requests in Python. We'll cover:

  • Understanding common rate limiting approaches
  • Using queues to throttle concurrency
  • Retrying failed requests with backoff
  • Monitoring usage to stay under limits

    Common Types of Rate Limiting

    There are a few popular patterns for how services rate limit API access:

  • Requests per second - The simplest approach, restricting the number of requests allowed per second. Common limits are around 5, 10, or 60 requests per second.
  • Concurrent requests - Restricts the number of requests that can be in-flight at any given time, like allowing only 15 concurrent calls.
  • Token buckets - More advanced, allocating a set number of tokens over time that each request consumes. Buckets refill at a defined rate to allow short bursts.
  • Leaky buckets - Incoming requests enter a fixed-size queue that drains at a constant rate. Bursts are smoothed into a steady flow, and requests that arrive while the queue is full are rejected.

    Understanding how an API rate limits its clients allows us to shape traffic appropriately on the client side.
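
To make the token bucket idea concrete, here is a minimal client-side sketch. The class name TokenBucket, its parameters (rate, capacity), and the injectable clock are illustrative assumptions, not something a particular API prescribes; a real service may refill or count tokens differently.

```python
import time


class TokenBucket:
    """Client-side token bucket: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the bucket can be tested
        self.tokens = capacity        # start full so an initial burst is allowed
        self.updated = clock()

    def _refill(self):
        # Add the tokens earned since the last update, never exceeding capacity
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return True on success."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self):
        """Seconds until the next token is available (0 if one is ready)."""
        self._refill()
        if self.tokens >= 1:
            return 0.0
        return (1 - self.tokens) / self.rate
```

In async code, one option is to call try_acquire() before each request and, when it returns False, await asyncio.sleep(bucket.wait_time()) before trying again.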

    Using Queues to Control Concurrency

    Since most Python async frameworks like asyncio and trio allow unrestrained concurrency, we need another solution to limit how many requests execute simultaneously.

    Queues make an excellent built-in tool for controlling concurrency. Here is an example using asyncio and the asyncio.Queue class:

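    import asyncio
    import httpx
    async def fetch(url):
        async with httpx.AsyncClient() as client:
            return await client.get(url)
    async def worker(queue):
        while True:
            url = await queue.get()
            try:
                await fetch(url)
            except httpx.HTTPError as error:
                print(f'Error fetching {url}: {error}')
            finally:
                queue.task_done()  # tells queue.join() this item is finished
    async def main():
        queue = asyncio.Queue(maxsize=10)
        # Start the workers first so they can drain the queue while we fill it
        tasks = [asyncio.create_task(worker(queue)) for _ in range(3)]
        for i in range(100):
            # Placeholder URL; put() waits whenever the queue is full
            await queue.put(f'https://example.com/items/{i}')
        await queue.join()  # wait until every queued URL has been processed
        for task in tasks:
            task.cancel()  # the workers loop forever, so stop them explicitly
        await asyncio.gather(*tasks, return_exceptions=True)
    asyncio.run(main())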

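    Here three worker tasks pull URLs from the queue, so at most three requests are in flight at any time. The queue's max size of 10 only bounds how many URLs can wait in the queue: when it is full, queue.put() pauses the producer until a worker frees a slot.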

    We could also use a Semaphore for similar throttling.
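
As a rough sketch of the semaphore approach, the example below caps concurrency with asyncio.Semaphore. The helper names (limited_fetch, fetch_all), the stats dictionary, and the example.com URLs are made up for illustration, and asyncio.sleep stands in for the real HTTP call so the snippet runs without network access.

```python
import asyncio


async def limited_fetch(semaphore, url, stats):
    """Run one simulated request while holding a semaphore slot."""
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            # Stand-in for a real call such as `await client.get(url)`
            await asyncio.sleep(0.01)
            return url
        finally:
            stats["active"] -= 1


async def fetch_all(urls, max_concurrent=3):
    """Fetch every URL with at most `max_concurrent` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(limited_fetch(semaphore, url, stats) for url in urls)
    )
    return results, stats["peak"]


if __name__ == "__main__":
    urls = [f"https://example.com/items/{i}" for i in range(10)]  # placeholders
    results, peak = asyncio.run(fetch_all(urls))
    print(f"fetched {len(results)} URLs, peak concurrency {peak}")
```

Compared with the queue, a semaphore needs no long-lived worker tasks: every request gets its own task, and the semaphore decides how many of them may proceed at once.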

    Retrying Failed Requests with Exponential Backoff

    APIs often use rate limiting to maintain availability under high load. So if we start hitting limits, it's best to back off and retry requests later.

    We can use an exponential backoff algorithm that progressively waits longer between retries, reducing pressure on the API:

    import asyncio
    import random
    import httpx
    async def fetch(url):
        for tries in range(5):
            try:
                async with httpx.AsyncClient() as client:
                    response = await client.get(url)
                    response.raise_for_status()  # raises HTTPStatusError on 4xx/5xx
                    return response
            except httpx.HTTPStatusError as error:
                # Random delay whose average doubles with each attempt
                seconds = random.expovariate(1) * 2 ** tries
                print(f'Error fetching {url}, retrying in {seconds:.2f}s')
                await asyncio.sleep(seconds)
        raise RuntimeError(f'Failed to fetch {url} after 5 tries')

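    This retries on HTTP error responses. The delay before each retry is drawn from an exponential distribution whose mean doubles with every attempt (1, 2, 4, 8, and 16 seconds on average), which gives exponential backoff with randomized jitter.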
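
Delays drawn from an exponential distribution have no upper bound, so an occasional retry can wait far longer than intended. One possible refinement, sketched below, picks each delay uniformly between zero and a capped exponential bound, and retries only on status codes that are likely to succeed later. The function names and the base and cap defaults are assumptions chosen for illustration.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Return a randomized delay in seconds for a 0-based retry attempt.

    The delay is uniform between 0 and min(cap, base * 2 ** attempt),
    so it grows exponentially on average but never exceeds `cap`.
    """
    bound = min(cap, base * 2 ** attempt)
    return rng.uniform(0, bound)


def should_retry(status_code):
    """Retry on 429 (Too Many Requests) and 5xx server errors only."""
    return status_code == 429 or 500 <= status_code < 600
```

A 404 or 401 will not fix itself on a second attempt, so skipping those keeps the retry loop from adding pointless load.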

    Monitoring Usage to Stay Under Limits

    To stay under rate limits and avoid failures, we should monitor how close our application is trending toward request thresholds.

    Many APIs provide usage metadata in responses that we can track. For example, GitHub's API reports the remaining quota in response headers:

    X-RateLimit-Limit: 60
    X-RateLimit-Remaining: 56
    X-RateLimit-Reset: 1602132167

    We can capture and log this data with middleware and raise alerts if we cross certain thresholds, like remaining requests falling under 20% of the limit.
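
A small helper along these lines could turn those headers into something we can log or alert on. This is only a sketch: the function name rate_limit_status, the warn_fraction parameter, and the shape of the returned dictionary are assumptions, and other APIs may use different header names.

```python
import time


def rate_limit_status(headers, warn_fraction=0.2, now=None):
    """Summarize X-RateLimit-* headers; return None if they are absent."""
    try:
        limit = int(headers["X-RateLimit-Limit"])
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = int(headers["X-RateLimit-Reset"])  # Unix timestamp
    except (KeyError, ValueError):
        return None
    now = time.time() if now is None else now
    return {
        "limit": limit,
        "remaining": remaining,
        "seconds_until_reset": max(0, reset_at - now),
        # True once fewer than `warn_fraction` of the quota is left
        "low": remaining < limit * warn_fraction,
    }
```

With httpx, response.headers can be passed in directly because its lookups are case-insensitive; a plain dict must use the exact header spelling shown here.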

    Advanced options like token buckets also allow pre-emptively modeling expected usage to predict limit breaches before they trigger failures.
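
Modeling usage locally does not have to be elaborate. The sketch below uses a sliding-window log rather than a token bucket: it records when each request was sent and reports how much headroom remains in the current window. The class name UsageWindow and its parameters are hypothetical, and the window only approximates whatever accounting the server actually performs.

```python
import time
from collections import deque


class UsageWindow:
    """Track request timestamps in a rolling window to predict limit breaches."""

    def __init__(self, limit, period, clock=time.monotonic):
        self.limit = limit      # requests allowed per window
        self.period = period    # window length in seconds
        self.clock = clock      # injectable so the window can be tested
        self.sent = deque()     # timestamps of recent requests, oldest first

    def _prune(self):
        # Drop timestamps that have aged out of the window
        cutoff = self.clock() - self.period
        while self.sent and self.sent[0] <= cutoff:
            self.sent.popleft()

    def record(self):
        """Call once for every request that is sent."""
        self._prune()
        self.sent.append(self.clock())

    def headroom(self):
        """Number of requests that can still be sent in the current window."""
        self._prune()
        return self.limit - len(self.sent)
```

Checking headroom() before sending a request lets the client slow down or raise an alert before the server starts rejecting calls.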

    Carefully rate limiting asynchronous requests helps avoid disruptions from exceeding limits, while still allowing reasonable use. Following these patterns, we can build robust applications that use APIs responsibly.
