Simplifying HTTP Requests in Python: urllib2 vs urllib vs requests

Feb 3, 2024 · 4 min read

Making HTTP requests is a fundamental task in most Python applications. However, with the default Python installation, it can be more complicated than needed. In this article, we'll explore the progression of HTTP client libraries in Python - from the low-level urllib2 to the simplified requests. We'll compare their APIs and use cases to help you pick the right tool for your next Python project.

urllib2 - Python's Default HTTP Client

In Python 2, urllib2 was the built-in HTTP client. It provides a low-level interface, handling things like:

  • Opening and closing connections
  • Headers, redirection, cookies
  • SSL verification
  • Proxy support
  • Encoding/decoding content

For example, here's how to make a simple GET request with urllib2:

    import urllib2
    
    response = urllib2.urlopen('http://example.com')
    html = response.read()
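
One of the features listed above, proxy support, shows how low-level the interface is: you compose handler objects into an opener by hand. A minimal sketch (Python 2; the proxy address is a hypothetical placeholder):

    import urllib2

    # Proxy support means wiring handler objects into an opener manually.
    # The proxy address here is a placeholder for illustration.
    proxy_handler = urllib2.ProxyHandler({'http': 'http://127.0.0.1:8080'})
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)  # all later urlopen() calls use the proxy

    response = urllib2.urlopen('http://example.com')
    print(response.read())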

While full-featured, the API is clunky and involves error-prone boilerplate code even for basic operations:

    import urllib2
    request = urllib2.Request('http://example.com')
    request.add_header('User-Agent', 'My Python App')
    
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError as e:
        print(e.code)
    except urllib2.URLError as e: 
        print(e.reason)
    else:
        print(response.read())

So what are some downsides to using urllib2?

  • Verbose API with lots of boilerplate code
  • Raises exceptions for 4xx/5xx responses, forcing try/except blocks even for expected status codes
  • Manual handling of headers, redirection, cookies
  • No built-in connection pooling or retries

Overall, it requires significant effort to use urllib2 effectively for anything beyond trivial requests.
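
To make that concrete: even a simple form POST means encoding the body yourself and setting headers by hand. A sketch (Python 2; the URL and payload are placeholders):

    import urllib
    import urllib2

    # urllib2 has no high-level POST helper: the body must be encoded
    # manually, and passing data switches the request method to POST.
    data = urllib.urlencode({'name': 'My App', 'version': '1.0'})
    request = urllib2.Request('http://example.com/api', data)
    request.add_header('Content-Type', 'application/x-www-form-urlencoded')

    try:
        response = urllib2.urlopen(request, timeout=10)
        print(response.read())
    except urllib2.URLError as e:  # HTTPError is a subclass of URLError
        print(e)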

Enter urllib - A Minor Improvement

In Python 3, urllib2 was split into several modules like urllib.request for issuing requests. The API is mostly the same, but with some minor improvements:

  • Separate classes for Request and Response objects
  • Supports custom HTTP handlers
  • Timeout support
  • Access to low-level socket object

Here's an example HTTP GET request with urllib:

    from urllib import request
    
    response = request.urlopen('http://example.com')
    html = response.read()
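
Adding the improvements from that list - a custom header, a timeout, explicit error handling - brings the boilerplate right back. A sketch (the URL and header value are placeholders):

    import urllib.request
    import urllib.error

    # Headers still have to be attached to a Request object by hand.
    req = urllib.request.Request(
        'http://example.com',
        headers={'User-Agent': 'My Python App'},
    )

    try:
        # Timeout support is built in, but errors must still be caught manually.
        response = urllib.request.urlopen(req, timeout=10)
    except urllib.error.HTTPError as e:
        print(e.code)    # 4xx/5xx responses arrive as exceptions
    except urllib.error.URLError as e:
        print(e.reason)  # network-level failures (DNS, connection refused)
    else:
        print(response.read().decode('utf-8'))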

While the API is slightly better, it still shares many of the same problems as urllib2:

  • Lots of boilerplate code for simple operations
  • Manual handling of HTTP exceptions
  • No built-in connection pooling, sessions, or retries

So urllib is only a small, incremental improvement over urllib2.

requests - A Simple Yet Powerful Library

Released in 2011, the requests library aimed to make HTTP requests in Python more user-friendly. With requests, you can make HTTP API calls with minimal code:

    import requests
    
    response = requests.get('http://example.com')
    print(response.text)

Compared to urllib and urllib2, some key advantages of requests include:

Simplified API - Intuitive functions like .get() and .post() instead of Request/urlopen boilerplate.

Built-in JSON support - Responses decode with the .json() method, and request bodies can be serialized automatically via the json= argument.

Connection pooling/sessions - Session objects reuse underlying connections and persist cookies and headers across requests.

Explicit error handling - Calling response.raise_for_status() turns 4xx/5xx responses into clean Python exceptions when you want them.

Helper features - Easy access to response headers and body, cookies, per-request timeouts, and opt-in retries via transport adapters (see the sketch below).

Third-party support - Plays nicely with extensions for caching, proxies, and authentication.
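
To show sessions, pooling, and opt-in retries together, here's a minimal sketch; the URL is a placeholder, and the retry policy comes from urllib3, the library requests is built on:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Retries are opt-in: configure a urllib3 Retry policy and mount it
    # on the session through a transport adapter.
    retry = Retry(total=3, backoff_factor=0.5,
                  status_forcelist=[500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)

    with requests.Session() as session:
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        session.headers.update({'User-Agent': 'My App'})  # default header

        # Each call reuses the pooled TCP connection and session cookies.
        for page in range(1, 4):
            response = session.get('http://example.com/api',
                                   params={'page': page}, timeout=5)
            print(response.status_code)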

While requests doesn't expose as many low-level knobs as urllib, it's perfect for most API usage. Thanks to its simplicity and features, requests has become the de facto standard for HTTP in Python, even though it is a third-party package rather than part of the standard library.

Here's a more in-depth example with error handling, headers, and JSON data:

    import requests

    url = 'http://example.com/api'
    headers = {'User-Agent': 'My App'}

    try:
        # A timeout keeps the request from hanging indefinitely
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # raise for 4xx/5xx status codes
    except requests.exceptions.RequestException as e:
        # Base class covering HTTP errors, timeouts, and connection failures
        print(e)
    else:
        data = response.json()
        print(data)
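
requests also encodes JSON request bodies for you via the json= argument, complementing the .json() decoding shown above. A sketch (URL and payload are placeholders):

    import requests

    # json= serializes the dict and sets the Content-Type header automatically.
    payload = {'name': 'My App', 'version': '1.0'}
    response = requests.post('http://example.com/api', json=payload, timeout=10)

    print(response.status_code)
    print(response.json())  # decode the JSON response body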

So in summary, what are some good use cases for each library?

  • urllib - Low-level HTTP handling, advanced customization
  • urllib2 - Legacy Python 2.x applications
  • requests - Simplified usage for most HTTP APIs

Requests covers the vast majority of HTTP usage in Python. urllib still has niche uses where low-level control matters, and urllib2 survives only in legacy Python 2 codebases.

Key Takeaways

  • urllib2 - Low-level HTTP client in Python 2.x
  • urllib - Minor improvements over urllib2, split into modules
  • requests - Simplified third-party library; the de facto standard for HTTP APIs
  • Requests is perfect for most applications, while urllib is better for low-level control