Fetching resources over HTTP is a ubiquitous task in Python programming. But the landscape of libraries available for this has evolved considerably over the years. In this comprehensive guide, we'll look at the origins of Python's URL handling libraries and how they have changed over time.
We'll cover the history of urllib, the addition of urllib2, the transition to urllib in Python 3, and the emergence of powerful third-party options like urllib3 and requests. You'll learn the key differences between these libraries and walk away with clear guidelines on when to use each one.
An Era Before urllib
It may be hard to imagine a time before Python had a built-in way to fetch URLs. But in the early days of Python 1.2, developers had to use lower-level socket programming to request web pages and other network resources. There were no out-of-the-box libraries tailored for this common task.
That all changed with the introduction of urllib in Python 1.4. This new module provided a simple interface for fetching URLs through a single urlopen() function:
import urllib

response = urllib.urlopen('http://www.example.com')
html = response.read()
While primitive by today's standards, urllib was a big leap forward in making network requests more accessible to Python programmers. No longer did you need to worry about the details of sockets or HTTP.
urllib2 - A Revamped Library
By the time Python 1.6 came around, it was clear urllib needed improvements. So that release introduced urllib2, with an updated API and new capabilities like redirect handling and richer error handling.
Here's an example of using urllib2:
import urllib2

request = urllib2.Request('http://www.example.com')
response = urllib2.urlopen(request)
urllib2 became the preferred way to fetch URLs for over a decade of Python 2.x releases. Under the hood, it uses the httplib library to handle HTTP requests.
One downside of urllib2 was its more complex API for simple use cases. Developers had to create Request objects even for basic GET requests. But urllib2 enabled more advanced scenarios like custom headers, which urllib lacked.
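For example, here's a sketch of attaching a custom header in Python 2 with urllib2 (the User-Agent value is just illustrative):

import urllib2

# A Request object is needed even just to set a header - something urllib couldn't do at all.
request = urllib2.Request('http://www.example.com',
                          headers={'User-Agent': 'my-script/0.1'})
response = urllib2.urlopen(request)
print response.getcode(), len(response.read())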
Python 3 - Out with urllib2, In with urllib
When Python 3 came around, the core developers took the opportunity to rethink the API for fetching URLs. The verdict: urllib2's API was too cumbersome for common usage.
So urllib2 was dropped in Python 3, and its functionality was folded into a reorganized urllib package (with submodules such as urllib.request, urllib.error, and urllib.parse). This new urllib learned from urllib2's clunky interface and aimed for simplicity and elegance.
Here's a Python 3 example using the modern urllib:
from urllib import request
with request.urlopen('http://python.org') as response:
    html = response.read()
This combines the ease of use of urllib with some of the advanced capabilities like custom headers that were only present in urllib2. Under the hood, the new urllib is implemented in terms of http.client rather than httplib.
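For instance, here's a small Python 3 sketch that sets a custom header with the reorganized urllib (the header value is illustrative):

from urllib import request

# Request objects still exist in Python 3 for cases like custom headers.
req = request.Request('http://python.org',
                      headers={'User-Agent': 'my-script/0.1'})
with request.urlopen(req) as response:
    html = response.read().decode('utf-8')
    print(response.status, len(html))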
So in summary:
- Python 2 shipped with both urllib and urllib2 in the standard library.
- Python 3 dropped urllib2 and folded its functionality into the reorganized urllib package.
- urllib3, which we'll cover next, is a separate third-party library rather than a successor to either standard module.
This helps explain the confusing versioning of Python's URL libraries.
urllib3 - A Fresh New Option
The built-in urllib and urllib2 libraries served Python well for many years. But developers eventually wanted more: better security, connection pooling, and better performance.
To meet these needs, a third-party library called urllib3 was created in 2008. Despite the name, urllib3 is not just a numbered upgrade of urllib, but a standalone module with a much different implementation.
Some key features of urllib3 include:
- Connection pooling and thread safety
- Client-side TLS/SSL verification
- Helpers for retries and redirects
- File uploads with multipart encoding
- Proxy support for HTTP and SOCKS
- Gzip and deflate decoding
Overall, urllib3 provides a powerful alternative to the built-in urllib, with a focus on security, efficiency, and functionality. Many Python developers will choose urllib3 over urllib for its additional features and robustness.
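To make that concrete, here's a minimal sketch of urllib3 in action (the URL is just an example):

import urllib3

# The PoolManager handles connection pooling and certificate verification for us.
http = urllib3.PoolManager()
response = http.request('GET', 'https://www.example.com')
print(response.status)
print(response.data[:100])  # response body as bytes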
requests - The Runaway Favorite
By far the most popular library for making HTTP requests in Python today is requests. First released in 2011, requests shot to stardom in the Python community and became ubiquitous in Python code within just a few years.
So what explains the meteoric rise of requests? A few key factors:
Simple, Pythonic API - requests makes it extremely easy to make basic HTTP requests in a natural way:
import requests

response = requests.get('https://api.github.com')
Much nicer than urllib and urllib2!
Batteries Included - requests comes ready "out of the box" with support for JSON, file uploads, connection pooling, and more. No extra work needed.
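For example, here's a quick sketch of fetching and decoding JSON with requests (the GitHub endpoint is just an illustrative public API):

import requests

# Status checking and JSON decoding are built in.
response = requests.get('https://api.github.com/repos/psf/requests')
response.raise_for_status()   # raises an exception for 4xx/5xx responses
data = response.json()        # parsed JSON as a Python dict
print(data['full_name'])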
Great Documentation - requests has fantastic documentation with examples for virtually every use case. This makes it easy for newcomers to learn.
Under the hood, requests is actually built on urllib3. But it wraps urllib3 with a simpler, more Pythonic API and handles many of the details for you.
So in summary, requests combines urllib3's power with an incredibly clean and easy-to-use interface on top. That's why so many Python developers have flocked to requests for their HTTP needs. It just takes away much of urllib's frustration and boilerplate.
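As a rough sketch of that layering, reusing a single Session object keeps one urllib3 connection pool (plus shared headers and cookies) alive across several calls; the header and endpoints below are illustrative:

import requests

# One Session reuses the underlying urllib3 connection pool across requests.
session = requests.Session()
session.headers.update({'User-Agent': 'my-script/0.1'})

for path in ('/emojis', '/events'):
    response = session.get('https://api.github.com' + path)
    print(path, response.status_code)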
Key Differences At a Glance
Now that we've covered the history and context of each major URL handling library in Python, let's directly compare their key differences:
| | urllib | urllib2 | urllib3 | requests |
|---|---|---|---|---|
| Availability | Built-in to Python 3 | Removed in Python 3 | Third-party (pip install urllib3) | Third-party (pip install requests) |
| HTTP verbs | urlopen() only supports GET/POST | More explicit support for verbs | Explicit support for verbs | GET, POST, etc. methods |
| API | Simple basic usage | More complex API | Lower-level, more configuration | Very simple, Pythonic API |
| Features | Bare-bones HTTP requests | Advanced capabilities like redirects and auth | Very featured and configurable | "Batteries included" functionality like JSON handling |
| Performance | Minimal connection handling | No connection pooling (a new httplib connection per request) | High performance thanks to connection pooling and thread safety | Comparable to urllib3, which it wraps |
| Security | Vulnerable to bugs and attacks | More secure but still potential issues | Modern security practices like SSL verification | Modern security practices like SSL verification (via urllib3) |
As you can see, there is an overall trend towards simpler APIs and more built-in functionality as Python's URL libraries evolved over time. The latest iterations like urllib3 and especially requests aim to remove boilerplate and provide a secure, robust, high-level interface for HTTP requests.
Guidelines for Choosing a Library
Based on our exploration of Python's major URL handling libraries, here are some guidelines on when you should use each one:
- Use requests for most day-to-day HTTP work: APIs, scraping, and integrations where a simple, high-level interface is all you need.
- Use urllib3 when you want lower-level control over things like connection pooling, retries, and timeouts.
- Use urllib when you can't install third-party packages, or when you need other protocols such as FTP that the standard library handles.
The requests library really shines for common HTTP-based scraping and integration tasks. It's quick to learn and use. urllib3 offers lower-level control, while urllib provides protocol flexibility. Pick the right tool for the job!
We've covered a lot of ground tracing the evolution of Python's URL handling libraries over the past 20+ years. While the history involves a lot of version confusion, the end result is a robust ecosystem with urllib meeting basic needs, urllib3 providing lower-level control, and requests serving as an easy-to-use solution for most use cases.
Hopefully this guide has shed light on the strengths of each option and how they fit together. The next time you need to fetch a URL in Python, you'll know exactly which library to turn to!
Frequently Asked Questions
Here are some common questions about Python's URL handling libraries:
What is the difference between urllib and requests?
The main difference is that requests has a simpler and easier to use API while still handling many common scenarios like JSON handling out of the box. urllib provides lower-level control but typically requires more code.
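As a rough illustration, here's the same JSON fetch written both ways (the GitHub URL is just an example endpoint):

import json
from urllib import request

import requests

URL = 'https://api.github.com'  # illustrative endpoint

# urllib: lower-level, a little more code
with request.urlopen(URL) as resp:
    data_urllib = json.loads(resp.read().decode('utf-8'))

# requests: the same fetch in one line
data_requests = requests.get(URL).json()

print(sorted(data_urllib) == sorted(data_requests))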
Do I need to install urllib?
No, urllib is built into Python 3 as part of the standard library. You can start using it without any installation needed.
Is requests faster than urllib3?
Generally they have similar performance for common use cases. Under the hood requests actually uses urllib3 so the performance difference should be negligible.
Can I use requests for non-HTTP protocols like FTP?
No, requests only speaks HTTP and HTTPS. For FTP and other network protocols supported by the standard library, you would need to use urllib.
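For example, urlopen() will happily fetch an FTP URL; the host and path below are placeholders, not a real server:

from urllib import request

# urlopen() understands ftp:// URLs as well as http(s):// ones.
with request.urlopen('ftp://ftp.example.com/pub/README') as resp:
    print(resp.read()[:200])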
Is it ok to use urllib2 in Python 3?
No, urllib2 has been removed in Python 3 so you must migrate to urllib or requests.
How do I choose between urllib3 and requests?
Use requests for simpler cases where you want an easy API. Use urllib3 for lower-level control and advanced configuration of HTTP behavior.
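For instance, here's a sketch of the kind of per-pool tuning urllib3 exposes (the retry and timeout values are arbitrary):

import urllib3
from urllib3.util import Retry, Timeout

# Configure the retry and timeout policy once, on the pool itself.
http = urllib3.PoolManager(
    retries=Retry(total=3, backoff_factor=0.5),
    timeout=Timeout(connect=2.0, read=5.0),
)
response = http.request('GET', 'https://www.example.com')
print(response.status)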
What's the minimum Python version to use requests?
Current releases of requests require Python 3; older versions also supported Python 2.7, but that support has been dropped. Always use the latest Python version where possible.