Python Requests: Retry Failed Requests in 2023

Oct 22, 2023 · 5 min read

Introduction

Handling failed requests is a critical aspect of building robust applications in Python. Requests may fail due to network issues, server errors, or other transient problems. Having a solid retry strategy helps make applications more fault-tolerant and improves overall reliability.

In this comprehensive guide, we'll cover everything you need to know about retrying failed requests in Python using the powerful Requests library.

Overview of Retrying Failed Requests

Retrying means sending a failed request again so that a transient problem has a chance to clear before the application gives up. This keeps a single failed call from bringing down the entire application.

Some common cases where retrying requests helps include:

  • Network connection issues or timeouts
  • Server errors like 500 or 503 status codes
  • Throttling and rate limiting (429 status code)
  • Authentication issues, such as an expired token that can be refreshed before the next attempt

By retrying after a short delay, the application gives the underlying cause of the failure time to resolve before the next attempt.
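
The idea can be sketched without any library support. The snippet below is a minimal illustration, not part of the Requests API: `call_with_retries` and `flaky` are hypothetical names, and the `sleep` parameter exists only so the delay can be swapped out when experimenting.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, pausing `delay` seconds between tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:  # a real client would catch narrower exceptions
            last_error = error
            if attempt < attempts:
                sleep(delay)
    raise last_error


# Hypothetical operation that fails twice before succeeding.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, attempts=3, delay=0.0)
```

Here the third attempt succeeds, so the caller never sees the two earlier failures. If every attempt fails, the last error is raised to the caller.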

    Common Reasons for Request Failures

    Some typical reasons requests fail:

  • Network errors - DNS failures, refused connections, connection timeouts, etc.
  • Server errors - Internal server issues, overload, maintenance mode.
  • Client errors - Incorrect requests, authentication problems, rate limiting.
  • Transient errors - Short-lived problems, race conditions, unexpected load.

    Having proper retry logic helps handle many of these cases gracefully.
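
Not every failure is worth retrying. As a rough sketch, assuming the common rule of thumb that rate limiting and server-side errors are transient while most other client errors are not, a status check could look like this. The set of codes and the name `is_worth_retrying` are illustrative choices, not something the Requests library defines.

```python
from http import HTTPStatus

# Assumed rule of thumb: these statuses often clear up on their own.
RETRYABLE_STATUSES = {
    HTTPStatus.TOO_MANY_REQUESTS,      # 429
    HTTPStatus.INTERNAL_SERVER_ERROR,  # 500
    HTTPStatus.BAD_GATEWAY,            # 502
    HTTPStatus.SERVICE_UNAVAILABLE,    # 503
    HTTPStatus.GATEWAY_TIMEOUT,        # 504
}


def is_worth_retrying(status_code):
    """Return True when a retry has a reasonable chance of succeeding."""
    return status_code in RETRYABLE_STATUSES


checks = {code: is_worth_retrying(code) for code in (200, 400, 401, 404, 429, 503)}
```

A 400 or 404 will usually fail the same way on every attempt, so repeating it only adds load.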

    Implementing Retry Logic

    The Requests library makes it easy to retry failed requests with some handy components.

    Core Components

    Retry Object

    The urllib3.Retry object defines the retry strategy including number of retries, backoff, and conditions to trigger a retry.

    HTTPAdapter

    The requests.adapters.HTTPAdapter sends requests over HTTP and HTTPS and accepts options such as max_retries.

    Session Object

    The requests.Session persists settings across requests and is used to send all requests.

    Number of Retries

    Deciding the number of retry attempts involves balancing reliability and performance. Some factors to consider:

  • More retries improve fault tolerance but increase request time.
  • Three to five retries usually cover the most common transient failures.
  • Certain requests may need higher retries than others.
  • Infinite retries with backoff can be used for critical requests. But safeguards should prevent endless futile retries.
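
One way to give some requests a larger retry budget than others is to mount separate adapters on different URL prefixes. The sketch below assumes a hypothetical `https://api.example.com/payments/` endpoint that deserves more attempts than the rest. The specific counts are placeholders.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

# Default policy for every HTTPS request: three retries.
session.mount("https://", HTTPAdapter(max_retries=Retry(total=3, backoff_factor=1)))

# Hypothetical critical endpoint that gets a larger budget of five retries.
session.mount(
    "https://api.example.com/payments/",
    HTTPAdapter(max_retries=Retry(total=5, backoff_factor=1)),
)

# The session picks the adapter with the longest matching URL prefix.
default_adapter = session.get_adapter("https://api.example.com/users")
critical_adapter = session.get_adapter("https://api.example.com/payments/42")
```

No request is sent here. `get_adapter` only shows which retry policy a given URL would use.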

    Retry Conditions

    The Retry object allows controlling exactly when a retry is triggered via:

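  • status_forcelist - HTTP status codes that should trigger a retry, such as 500 or 503.
  • allowed_methods - HTTP methods that may be retried, such as 'GET'. This parameter was called method_whitelist before urllib3 1.26, and the old name was removed in urllib3 2.0.
  • connect and read - Separate limits for retries after connection errors and read errors.
  • backoff_factor - Controls the sleep between retries rather than whether a retry happens.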
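
To see how these conditions combine, the Retry object can be queried directly without sending anything. This is a small sketch that assumes urllib3 1.26 or newer, where the method filter is named `allowed_methods`. The status codes and methods are example values.

```python
from urllib3.util.retry import Retry

# allowed_methods needs urllib3 1.26 or newer; older releases called it method_whitelist.
retry = Retry(
    total=3,
    status_forcelist=[500, 502, 503],
    allowed_methods=["GET", "HEAD"],
)

# is_retry() reports whether a response with this method and status would be retried.
get_503 = retry.is_retry("GET", 503)    # listed status, allowed method
get_404 = retry.is_retry("GET", 404)    # status not in status_forcelist
post_503 = retry.is_retry("POST", 503)  # method not in allowed_methods
```

Only the first combination qualifies for a retry, because both the method and the status code have to match.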

    Delay and Backoff

    Adding a delay between retries can improve reliability:

  • A fixed delay is simple, but it waits the same amount of time before every retry.
  • Exponential backoff increases the delay with each retry, which avoids overloading a struggling server.
  • The Retry-After response header can be checked to respect the retry time the server advises.
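
The three delay strategies can be compared with a few lines of plain Python. This is a generic sketch, not necessarily the exact schedule urllib3 computes from `backoff_factor`. The function names and the 120-second cap are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def fixed_delays(retries, delay):
    """Same pause before every retry."""
    return [float(delay)] * retries


def exponential_delays(retries, backoff_factor, cap=120.0):
    """Pause grows as backoff_factor * 2 ** (n - 1) for the n-th retry, up to cap."""
    return [min(cap, backoff_factor * 2 ** (n - 1)) for n in range(1, retries + 1)]


def retry_after_seconds(header_value, now=None):
    """Turn a Retry-After value (seconds or an HTTP date) into seconds to wait."""
    if header_value.strip().isdigit():
        return float(header_value)
    now = now or datetime.now(timezone.utc)
    wait = (parsedate_to_datetime(header_value) - now).total_seconds()
    return max(0.0, wait)


fixed = fixed_delays(3, 2)
growing = exponential_delays(4, 1)
```

The cap matters for long retry sequences: without it, the tenth retry in an exponential schedule would wait far longer than most callers are willing to.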

    Putting It Together

    Basic Retry Pattern

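    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Up to three retries, with a growing pause between attempts
    retries = Retry(total=3, backoff_factor=1)
    adapter = HTTPAdapter(max_retries=retries)
    
    session = requests.Session()
    session.mount('https://', adapter)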
    

    Full Configuration Example

    More advanced configuration:

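    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    retries = Retry(
       total=10,
       backoff_factor=2,
       status_forcelist=[500, 502, 503],
       # allowed_methods replaces method_whitelist, which urllib3 2.0 removed.
       # POST is not idempotent, so retry it only if the API tolerates duplicates.
       allowed_methods=['GET', 'POST']
    )
    
    adapter = HTTPAdapter(max_retries=retries)
    session = requests.Session()
    
    # Only URLs that start with this prefix use the adapter.
    session.mount('https://api.example.com', adapter)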
    

    This allows up to ten retries with exponential backoff. Retries are triggered by 500, 502, and 503 responses to GET and POST requests sent to https://api.example.com.

    Advanced Usage

    Timeouts

    Setting timeouts prevents hanging requests. They can be combined with retries.

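    # HTTPAdapter takes no timeout argument, so pass the timeout with each request.
    # The tuple is (connect timeout, read timeout) in seconds.
    adapter = HTTPAdapter(max_retries=retries)
    session.mount('https://', adapter)
    response = session.get('https://api.example.com/users', timeout=(3.0, 3.0))
    

    A timed-out attempt is treated as a connection or read error, so the retry settings still apply to it.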
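
Because the timeout has to be passed on every call, a common pattern is a small adapter subclass that fills in a default. The sketch below is one way to do that. `TimeoutHTTPAdapter` is a name made up for this example, not a class that Requests ships.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller gives none."""

    def __init__(self, *args, timeout=5.0, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session passes timeout=None when the caller did not set one.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


session = requests.Session()
adapter = TimeoutHTTPAdapter(
    timeout=(3.0, 3.0),  # (connect, read) in seconds
    max_retries=Retry(total=3, backoff_factor=1),
)
session.mount("https://", adapter)
session.mount("http://", adapter)
```

A timeout passed explicitly, as in `session.get(url, timeout=1.0)`, still takes precedence over the adapter default.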

    Logging

    Debugging failed requests is easier with logging enabled:

    # Log details for all requests
    import logging
    logging.basicConfig(level=logging.DEBUG)
    

    With DEBUG enabled, urllib3 (the HTTP layer underneath Requests) logs each new connection, every request line with its response status code, and retry attempts.
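
Setting the root logger to DEBUG also turns on debug output from every other library. A narrower variant, sketched here with an example log format, raises the level only for the `urllib3` logger.

```python
import logging

# Send log records to stderr with a timestamp; the root level is left unchanged.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Turn on DEBUG only for urllib3, the HTTP layer underneath Requests.
urllib3_logger = logging.getLogger("urllib3")
urllib3_logger.setLevel(logging.DEBUG)
```

Child loggers such as `urllib3.connectionpool` inherit this level, so connection and retry messages show up while the rest of the application stays quiet.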

    Testing

    Mocking API responses allows testing retry logic during development without hitting real endpoints:

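    import requests
    import responses
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry
    
    @responses.activate
    def test_retries():
       responses.add(responses.GET, 'https://api.example.com/users', status=500)
    
       session = requests.Session()
       retries = Retry(total=2, status_forcelist=[500], raise_on_status=False)
       session.mount('https://', HTTPAdapter(max_retries=retries))
    
       resp = session.get('https://api.example.com/users')
       # Assert retries happened as expected, e.g. by inspecting responses.calls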
    

    Responses provides a simple way to simulate API responses when testing.
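
Another option that avoids real endpoints is a throwaway local server from the standard library, so the retries travel through the full HTTP stack. The sketch below assumes the machine allows binding to 127.0.0.1. `FlakyHandler` and `run_flaky_server_check` are hypothetical helpers written for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class FlakyHandler(BaseHTTPRequestHandler):
    """Answers 503 to the first two requests, then 200."""

    hits = 0

    def do_GET(self):
        FlakyHandler.hits += 1
        status = 503 if FlakyHandler.hits <= 2 else 200
        body = b"ok" if status == 200 else b"busy"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the test output quiet


def run_flaky_server_check():
    server = HTTPServer(("127.0.0.1", 0), FlakyHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy settings from the environment
        retries = Retry(total=3, backoff_factor=0, status_forcelist=[503])
        session.mount("http://", HTTPAdapter(max_retries=retries))
        url = f"http://127.0.0.1:{server.server_port}/users"
        response = session.get(url, timeout=5)
        return response.status_code, FlakyHandler.hits
    finally:
        server.shutdown()
        server.server_close()


status, hits = run_flaky_server_check()
```

The handler counts incoming requests, so the check can compare the final status code with the number of attempts the server actually saw.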

    Common Issues

    Avoiding Infinite Loops

    Infinite retry loops can happen if there are no limits and the issue persists. Some ways to prevent this:

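  • Limit the total number of retries, for example Retry(total=3).
  • Use exponential backoff so that repeated attempts slow down instead of hammering the server.
  • Reset any per-attempt state before retrying after an exception.
  • Set timeouts on requests so that a single attempt cannot hang forever.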
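
For hand-written retry loops, these safeguards can be combined into an attempt limit plus an overall time budget. The following is a minimal sketch under those assumptions. `retry_with_budget` and its parameters are illustrative names, and the `clock` and `sleep` arguments are injectable only to make experimenting easier.

```python
import time


def retry_with_budget(func, max_attempts=3, budget_seconds=10.0,
                      base_delay=0.5, clock=time.monotonic, sleep=time.sleep):
    """Retry func() until it succeeds, attempts run out, or the time budget is spent."""
    deadline = clock() + budget_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            delay = base_delay * 2 ** (attempt - 1)  # exponential backoff
            out_of_attempts = attempt == max_attempts
            out_of_time = clock() + delay > deadline
            if out_of_attempts or out_of_time:
                raise  # give up and surface the last error
            sleep(delay)


# Hypothetical operation that never recovers.
calls = []
slept = []


def always_fails():
    calls.append(1)
    raise TimeoutError("still down")


try:
    retry_with_budget(always_fails, max_attempts=3, base_delay=1.0,
                      budget_seconds=100.0, sleep=slept.append)
    gave_up = False
except TimeoutError:
    gave_up = True
```

Whichever limit is reached first ends the loop, so a persistent outage cannot keep the caller waiting indefinitely.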

    Handling Side Effects

    For non-idempotent requests like POST, retries can duplicate side effects. Ways to address this:

  • Retry only idempotent requests like GET by default.
  • Add logic to check for duplicate side effects.
  • Design APIs to be idempotent where possible.
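
One widely used way to make a create operation safe to retry is an idempotency key: the client generates a unique key once and sends the same key with every retry, and the server ignores repeats. The sketch below simulates both sides in memory. `OrderService` is a hypothetical stand-in for a server, and whether a real API supports such a key (often sent in an `Idempotency-Key` header) is an assumption to check against its documentation.

```python
import uuid


class OrderService:
    """Hypothetical server side that remembers which keys it has already handled."""

    def __init__(self):
        self.orders_by_key = {}

    def create_order(self, idempotency_key, item):
        # A repeated key returns the stored order instead of creating a new one.
        if idempotency_key not in self.orders_by_key:
            order = {"id": len(self.orders_by_key) + 1, "item": item}
            self.orders_by_key[idempotency_key] = order
        return self.orders_by_key[idempotency_key]


service = OrderService()

# The client generates the key once and reuses it on every retry of this request.
key = str(uuid.uuid4())
first = service.create_order(key, "book")
retried = service.create_order(key, "book")
```

The retried call returns the order created by the first call, so the side effect happens only once.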

    Watching for Stale Data

    Retrying means multiple requests for the same resource. To avoid acting on stale data:

  • Check that APIs return the latest data on each call.
  • Add request headers like Cache-Control.
  • Make sure application logic handles potentially stale data.
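
Setting such a header once on the session applies it to every request, including retried ones. This short sketch uses `Cache-Control: no-cache` as an example value. Whether intermediaries honor it depends on the API and any caches in between.

```python
import requests

session = requests.Session()

# Ask caches along the way to revalidate with the origin server before answering.
session.headers.update({"Cache-Control": "no-cache"})

# Build the request without sending it, to inspect the headers that would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://api.example.com/users")
)
```

Inspecting `prepared.headers` confirms the header is attached before any traffic is sent.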

    Conclusion

    That covers the key aspects around retrying failed requests in Python! Here are some key takeaways:

  • Use the Retry and HTTPAdapter objects for configured retries.
  • Pay attention to the number of retries, delays, and triggering criteria.
  • Combine retries and timeouts for comprehensive failure handling.
  • Address side effects, infinite loops, and stale data with request and app logic.
  • Test retry logic thoroughly using mocking libraries like Responses.
  • Enable debug logging to understand failures and fine-tune retries.

    Overall, solid retry capabilities make applications much more reliable, and the Python Requests library provides excellent tools to build them in a customizable way.

    However, even with meticulous retry logic, scrapers and crawlers can run into issues like getting blocked by targets. By rotating IPs and simulating real browsers, services like Proxies API can provide an easy alternative to complex self-managed scraping infrastructure.

    Proxies API handles proxy rotation, browser simulation, and automatic CAPTCHA solving through a simple API. Some key benefits:

  • Millions of rotating proxies to avoid IP blocks
  • Auto user-agent rotation to mimic real browsers
  • Built-in headless browser rendering
  • CAPTCHA solving for difficult sites
  • Easy integration with Python Requests

    Rather than worrying about the nuances of proxies, browsers, and CAPTCHAs, Proxies API can handle it all behind the scenes:

    import requests
    
    PROXY_URL = "http://api.proxiesapi.com/?url=example.com&render=1"
    
    proxies = {
      'http': PROXY_URL,
      'https': PROXY_URL
    }
    
    requests.get('http://example.com', proxies=proxies)
    

    For complex scraping and automation tasks, Proxies API is worth considering to have a battle-tested service handle the heavy lifting. The API abstracts away the hard parts, while providing the key capabilities needed to scrape and crawl at scale.

    So check out Proxies API to make scraping and automation more effective!
