Effective Strategies for Rate Limiting Asynchronous Requests in Python

Making asynchronous HTTP requests is very common in Python applications. Whether fetching data from an API or submitting forms, we often need to send multiple concurrent requests to maximize performance.

However, many web services enforce rate limits to prevent abuse and protect availability. Exceeding these limits can lead to errors or even get your API access revoked entirely!

In this guide, I'll share some effective strategies for respectfully rate limiting your asynchronous requests in Python. We'll cover:

Understanding common rate limiting approaches

Using queues to throttle concurrency

Retrying failed requests with backoff

Monitoring usage to stay under limits

Common Types of Rate Limiting

There are a few popular patterns for how services rate limit API access:

Requests per second - The simplest approach, restricting the number of requests allowed per second. Common limits are around 5, 10, or 60 requests per second.

Concurrent requests - Restricts the number of requests that can be in-flight at any given time, like allowing only 15 concurrent calls.

Token buckets - More advanced. Each request consumes a token from a bucket that refills at a fixed rate up to a maximum capacity, so short bursts are allowed as long as tokens remain.

Leaky buckets - Incoming requests enter a queue (the bucket) that drains at a constant rate. Bursts are smoothed into a steady flow, and requests that arrive while the bucket is full are rejected.

Understanding how an API rate limits its clients allows us to shape traffic appropriately on the client side.
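To make the token bucket pattern concrete, here is a minimal client-side sketch. The TokenBucket class, its method names, and the injectable clock parameter are illustrative assumptions, not part of any library or of a particular API's server-side implementation.

```python
import time


class TokenBucket:
    """Client-side token bucket: `rate` tokens are added per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the bucket can be tested without sleeping
        self.tokens = capacity      # start full, which permits an initial burst
        self.updated = clock()

    def _refill(self):
        # Add the tokens earned since the last update, never exceeding capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return True when a request may be sent."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self):
        """Seconds until the next token becomes available (0.0 if one is ready)."""
        self._refill()
        if self.tokens >= 1:
            return 0.0
        return (1 - self.tokens) / self.rate
```

In an async worker, one way to use this would be to sleep for bucket.wait_time() with asyncio.sleep whenever try_acquire() returns False, then try again before sending the request.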

Using Queues to Control Concurrency

Python async frameworks like asyncio and trio place no limit of their own on how many tasks run at once, so we need our own mechanism to limit how many requests execute simultaneously.

A queue drained by a fixed pool of worker tasks is a simple built-in way to control concurrency. Here is an example using asyncio and the asyncio.Queue class:

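import asyncio
import httpx

async def fetch(client, url):
    # Reuse one client so connections are pooled across requests
    return await client.get(url)

async def worker(queue, client):
    while True:
        url = await queue.get()
        try:
            await fetch(client, url)
        except httpx.HTTPError as error:
            print(f'Error fetching {url}: {error}')
        finally:
            # Always mark the item done so queue.join() can return
            queue.task_done()

async def main():
    queue = asyncio.Queue(maxsize=10)  # bounds the backlog, not the concurrency

    async with httpx.AsyncClient() as client:
        # Three workers means at most three requests in flight
        tasks = [asyncio.create_task(worker(queue, client)) for _ in range(3)]

        for i in range(100):
            # put() waits while the queue is full; put_nowait() would raise QueueFull
            await queue.put(f'https://api.example.com/data?id={i}')

        await queue.join()
        for task in tasks:
            task.cancel()
        # Wait for the cancelled workers to finish shutting down
        await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())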

Here the three worker tasks cap us at three requests in flight at any time. The queue's max size of 10 only bounds how many URLs can wait in the backlog; it does not set the concurrency limit.

We could also use a Semaphore for similar throttling.
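A minimal sketch of the semaphore approach might look like the following. To keep it runnable without network access, a hypothetical fake_fetch coroutine stands in for the real HTTP call and records the peak concurrency; the names fake_fetch, limited_fetch, run_all, and MAX_CONCURRENCY are assumptions made for illustration.

```python
import asyncio

MAX_CONCURRENCY = 3  # assumed limit; set this from the API's documentation


async def fake_fetch(url, stats):
    """Stand-in for an HTTP call so the sketch runs without network access."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate network latency
    stats["in_flight"] -= 1
    return url


async def limited_fetch(semaphore, url, stats):
    # At most MAX_CONCURRENCY coroutines can be inside this block at once.
    async with semaphore:
        return await fake_fetch(url, stats)


async def run_all(urls):
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(limited_fetch(semaphore, url, stats) for url in urls)
    )
    return results, stats["peak"]


urls = [f"https://api.example.com/data?id={i}" for i in range(20)]
results, peak = asyncio.run(run_all(urls))
print(f"fetched {len(results)} URLs, peak concurrency {peak}")
```

Compared with the worker pool, this starts one task per URL up front and lets the semaphore decide which of them may proceed. That is simpler for small batches, while the queue keeps memory bounded for very large ones.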

Retrying Failed Requests with Exponential Backoff

APIs often use rate limiting to maintain availability under high load. So if we start hitting limits, typically signaled by an HTTP 429 Too Many Requests response, it's best to back off and retry requests later.

We can use an exponential backoff algorithm that progressively waits longer between retries, reducing pressure on the API:

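import asyncio
import random

import httpx

async def fetch(url, max_tries=5):
    async with httpx.AsyncClient() as client:
        for tries in range(max_tries):
            try:
                response = await client.get(url)
                # Raise httpx.HTTPStatusError on 4xx/5xx responses such as 429
                response.raise_for_status()
                return response
            except httpx.HTTPStatusError:
                if tries == max_tries - 1:
                    break  # no point sleeping after the final attempt
                # Mean delay doubles each attempt: 1, 2, 4, 8 seconds
                seconds = random.expovariate(1) * 2 ** tries
                print(f'Error fetching {url}, retrying in {seconds:.2f}s')
                await asyncio.sleep(seconds)
    raise RuntimeError(f'Failed to fetch {url} after {max_tries} tries')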

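This retries when a request fails with an HTTP error status. Each delay is drawn from an exponential distribution whose mean starts at 1 second and doubles after every failed attempt, which gives exponential backoff with randomized jitter.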
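Because an exponentially distributed delay has no upper bound, an occasional wait can be far longer than its mean. One common alternative, sketched below, draws the delay uniformly between zero and a capped exponential ceiling (often called "full jitter"). This is a different scheme from the one in the code above, and the backoff_delay function with its base and cap parameters is a hypothetical helper, not a library API.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Return a randomized delay in seconds for a zero-based retry attempt."""
    ceiling = min(cap, base * 2 ** attempt)  # exponential growth, bounded by cap
    return rng.uniform(0, ceiling)           # "full jitter": anywhere from 0 to the ceiling


# Upper bounds for the first six attempts with the default base and cap.
ceilings = [min(30.0, 1.0 * 2 ** attempt) for attempt in range(6)]
print(ceilings)
```

The cap keeps the worst-case wait predictable, and the uniform draw spreads retries from many clients across the whole interval instead of clustering them.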

Monitoring Usage to Stay Under Limits

To stay under rate limits and avoid failures, we should monitor how close our application is trending toward request thresholds.

Many APIs report usage metadata in response headers that we can track. For example, GitHub's API reports the limit, the requests remaining, and the reset time (as a Unix timestamp) in headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 56
X-RateLimit-Reset: 1602132167

We can capture and log this data with middleware and raise alerts if we cross certain thresholds, like remaining requests falling under 20% of the limit.
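As a rough sketch of that check, the helpers below parse the three headers shown above and flag when the remaining quota drops under 20% of the limit. The function names parse_rate_limit and is_running_low are hypothetical, and the header names are assumed to match the GitHub-style example; other APIs may use different names.

```python
def parse_rate_limit(headers):
    """Extract limit, remaining, and reset (Unix seconds) from response headers."""
    # Normalize names because HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "limit": int(lowered["x-ratelimit-limit"]),
        "remaining": int(lowered["x-ratelimit-remaining"]),
        "reset": int(lowered["x-ratelimit-reset"]),
    }


def is_running_low(usage, threshold=0.2):
    """True when the remaining quota is below `threshold` of the limit."""
    return usage["remaining"] < usage["limit"] * threshold


usage = parse_rate_limit({
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "56",
    "X-RateLimit-Reset": "1602132167",
})
if is_running_low(usage):
    print("Warning: fewer than 20% of requests remain in this window")
```

With httpx, one place to call these helpers would be a response event hook registered on the client, passing in response.headers.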

Going further, a client-side model of the limit, such as a token bucket, lets us estimate expected usage ahead of time and predict limit breaches before they trigger failures.
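A simpler form of this prediction needs only the remaining count and the reset time. The sketch below spreads the remaining quota evenly over the time left in the window and checks whether a planned batch would fit. Both functions are hypothetical helpers, and they assume the reset value is a Unix timestamp as in the GitHub-style headers.

```python
def seconds_between_requests(remaining, reset_at, now):
    """Spread the remaining quota evenly over the time left in the window."""
    window_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return window_left  # quota exhausted: wait for the window to reset
    return window_left / remaining


def will_exceed(planned_requests, remaining):
    """Predict whether a planned batch would exhaust the quota before the reset."""
    return planned_requests > remaining


# 56 requests left with 560 seconds until the reset.
pace = seconds_between_requests(remaining=56, reset_at=1602132167, now=1602132167 - 560)
print(f"send at most one request every {pace:.1f}s")
```

Sleeping for the returned interval between requests keeps the client under the limit for the rest of the window, at the cost of not using any burst allowance the API might offer.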

Carefully rate limiting asynchronous requests helps avoid disruptions from exceeding limits, while still allowing reasonable use. Following these patterns, we can build robust applications that use APIs responsibly.
