Simplifying HTTP Requests in Python: urllib2 vs urllib vs requests

Feb 3, 2024 · 4 min read

Making HTTP requests is a fundamental task in most Python applications. However, with the default Python installation, it can be more complicated than needed. In this article, we'll explore the progression of HTTP client libraries in Python - from the low-level urllib2 to the simplified requests. We'll compare their APIs and use cases to help you pick the right tool for your next Python project.

urllib2 - Python's Default HTTP Client

In Python 2, urllib2 was the built-in HTTP client. It provides a low-level interface, handling things like:

  • Opening and closing connections
  • Headers, redirection, cookies
  • SSL verification
  • Proxy support
  • Encoding/decoding content

For example, here's how to make a simple GET request with urllib2:

    import urllib2
    
    response = urllib2.urlopen('http://example.com')
    html = response.read()
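
One of the features listed above, proxy support, shows how low-level the interface is: you compose handler objects into an opener by hand. A minimal sketch (Python 2; the proxy address is a hypothetical placeholder):

    import urllib2

    # Proxy support means wiring handler objects into an opener manually.
    # The proxy address here is a placeholder for illustration.
    proxy_handler = urllib2.ProxyHandler({'http': 'http://127.0.0.1:8080'})
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)  # all later urlopen() calls use the proxy

    response = urllib2.urlopen('http://example.com')
    print(response.read())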

While full-featured, the API is clunky and involves error-prone boilerplate code even for basic operations:

    import urllib2
    request = urllib2.Request('http://example.com')
    request.add_header('User-Agent', 'My Python App')
    
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError as e:
        print(e.code)
    except urllib2.URLError as e: 
        print(e.reason)
    else:
        print(response.read())

So what are some downsides to using urllib2?

  • Verbose API with lots of boilerplate code
  • Raises exceptions for 4xx/5xx responses, forcing try/except blocks even for expected status codes
  • Manual handling of headers, redirection, cookies
  • No built-in connection pooling or retries

Overall, it requires significant effort to use urllib2 effectively for anything beyond trivial requests.
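
To make that concrete: even a simple form POST means encoding the body yourself and setting headers by hand. A sketch (Python 2; the URL and payload are placeholders):

    import urllib
    import urllib2

    # urllib2 has no high-level POST helper: the body must be encoded
    # manually, and passing data switches the request method to POST.
    data = urllib.urlencode({'name': 'My App', 'version': '1.0'})
    request = urllib2.Request('http://example.com/api', data)
    request.add_header('Content-Type', 'application/x-www-form-urlencoded')

    try:
        response = urllib2.urlopen(request, timeout=10)
        print(response.read())
    except urllib2.URLError as e:  # HTTPError is a subclass of URLError
        print(e)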

Enter urllib - A Minor Improvement

In Python 3, urllib2 was split into several modules like urllib.request for issuing requests. The API is mostly the same, but with some minor improvements:

  • Separate classes for Request and Response objects
  • Supports custom HTTP handlers
  • Timeout support
  • Access to low-level socket object

Here's an example HTTP GET request with urllib:

    from urllib import request
    
    response = request.urlopen('http://example.com')
    html = response.read()
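
Adding the improvements from that list - a custom header, a timeout, explicit error handling - brings the boilerplate right back. A sketch (the URL and header value are placeholders):

    import urllib.request
    import urllib.error

    # Headers still have to be attached to a Request object by hand.
    req = urllib.request.Request(
        'http://example.com',
        headers={'User-Agent': 'My Python App'},
    )

    try:
        # Timeout support is built in, but errors must still be caught manually.
        response = urllib.request.urlopen(req, timeout=10)
    except urllib.error.HTTPError as e:
        print(e.code)    # 4xx/5xx responses arrive as exceptions
    except urllib.error.URLError as e:
        print(e.reason)  # network-level failures (DNS, connection refused)
    else:
        print(response.read().decode('utf-8'))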

While the API is slightly better, it still shares many of the same problems as urllib2:

  • Lots of boilerplate code for simple operations
  • Manual handling of HTTP exceptions
  • No built-in connection pooling, sessions, or retries

So urllib is only a small, incremental improvement over urllib2.

requests - A Simple Yet Powerful Library

Released in 2011, the requests library aimed to make HTTP requests in Python more user-friendly. With requests, you can make HTTP API calls with minimal code:

    import requests
    
    response = requests.get('http://example.com')
    print(response.text)

Compared to urllib and urllib2, some key advantages of requests include:

Simplified API - Intuitive functions like .get() and .post() instead of Request/urlopen boilerplate.

Built-in JSON support - Responses decode with the .json() method, and request bodies can be serialized automatically via the json= argument.

Connection pooling/sessions - Session objects reuse underlying connections and persist cookies and headers across requests.

Explicit error handling - Calling response.raise_for_status() turns 4xx/5xx responses into clean Python exceptions when you want them.

Helper features - Easy access to response headers and body, cookies, per-request timeouts, and opt-in retries via transport adapters (see the sketch below).

Third-party support - Plays nicely with extensions for caching, proxies, and authentication.
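
To show sessions, pooling, and opt-in retries together, here's a minimal sketch; the URL is a placeholder, and the retry policy comes from urllib3, the library requests is built on:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Retries are opt-in: configure a urllib3 Retry policy and mount it
    # on the session through a transport adapter.
    retry = Retry(total=3, backoff_factor=0.5,
                  status_forcelist=[500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)

    with requests.Session() as session:
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        session.headers.update({'User-Agent': 'My App'})  # default header

        # Each call reuses the pooled TCP connection and session cookies.
        for page in range(1, 4):
            response = session.get('http://example.com/api',
                                   params={'page': page}, timeout=5)
            print(response.status_code)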

While requests doesn't expose as many low-level knobs as urllib, it's perfect for most API usage. Thanks to its simplicity and features, requests has become the de facto standard for HTTP in Python, even though it is a third-party package rather than part of the standard library.

Here's a more in-depth example with error handling, headers, and JSON data:

    import requests

    url = 'http://example.com/api'
    headers = {'User-Agent': 'My App'}

    try:
        # A timeout keeps the request from hanging indefinitely
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # raise for 4xx/5xx status codes
    except requests.exceptions.RequestException as e:
        # Base class covering HTTP errors, timeouts, and connection failures
        print(e)
    else:
        data = response.json()
        print(data)
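
requests also encodes JSON request bodies for you via the json= argument, complementing the .json() decoding shown above. A sketch (URL and payload are placeholders):

    import requests

    # json= serializes the dict and sets the Content-Type header automatically.
    payload = {'name': 'My App', 'version': '1.0'}
    response = requests.post('http://example.com/api', json=payload, timeout=10)

    print(response.status_code)
    print(response.json())  # decode the JSON response body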

So in summary, what are some good use cases for each library?

  • urllib - Low-level HTTP handling, advanced customization
  • urllib2 - Legacy Python 2.x applications
  • requests - Simplified usage for most HTTP APIs

Requests covers the vast majority of HTTP usage in Python. urllib still has niche uses where low-level control matters, and urllib2 survives only in legacy Python 2 codebases.

Key Takeaways

  • urllib2 - Low-level HTTP client in Python 2.x
  • urllib - Minor improvements over urllib2, split into modules
  • requests - Simplified third-party library; the de facto standard for HTTP APIs
  • Requests is perfect for most applications, while urllib is better for low-level control