Python's URL Handling Libraries compared - urllib vs requests

Nov 17, 2023 · 8 min read

Fetching resources over HTTP is a ubiquitous task in Python programming. But the landscape of libraries available for this has evolved considerably over the years. In this comprehensive guide, we'll look at the origins of Python's URL handling libraries and how they have changed over time.

We'll cover the history of urllib, the addition of urllib2, the transition to urllib in Python 3, and the emergence of powerful third-party options like urllib3 and requests. You'll learn the key differences between these libraries and walk away with clear guidelines on when to use each one.

An Era Before urllib

It may be hard to imagine a time before Python had a built-in way to fetch URLs. But in the early days of Python 1.2, developers had to use lower-level socket programming to request web pages and other network resources. There were no out-of-the-box libraries tailored for this common task.

That all changed with the introduction of urllib in Python 1.4. This new module provided a simple interface for fetching URLs, using the urlopen() function. For example:

# Python 2 only: urllib.urlopen() does not exist in Python 3
import urllib

response = urllib.urlopen('http://www.example.com')
html = response.read()

While primitive by today's standards, urllib was a big leap forward in making network requests more accessible to Python programmers. No longer did you need to worry about the details of sockets or HTTP.

urllib2 - A Revamped Library

By the time Python 1.6 came around, it was clear urllib needed improvements. So Python 1.6 introduced urllib2, with an updated API and new capabilities like handling redirects and exceptions.

Here's an example of using urllib2:

# Python 2 only: urllib2 was removed in Python 3
import urllib2

request = urllib2.Request('http://www.example.com')
response = urllib2.urlopen(request)

print response.code     # HTTP status code
print response.headers  # response headers

urllib2 became the preferred way to fetch URLs for over a decade of Python 2.x releases. Under the hood, it uses the httplib library to handle HTTP requests.

One downside of urllib2 was its more verbose API. A basic GET still worked by passing a URL string straight to urlopen(), but anything beyond that, such as setting custom headers, meant building a Request object first. In exchange, urllib2 made those more advanced scenarios much easier than they were with urllib.

Python 3 - Out with urllib2, In with urllib

When Python 3 came around, the core developers took the opportunity to clean up the standard library, and the overlapping urllib and urllib2 modules were an obvious target.

So urllib2 was removed as a separate module in Python 3, and its functionality was merged into a reorganized urllib package. That package is split into submodules: urllib.request (opening URLs, largely urllib2's API), urllib.parse (parsing and building URLs), urllib.error (exceptions), and urllib.robotparser.

Here's a Python 3 example using the modern urllib:

from urllib import request

with request.urlopen('http://python.org') as response:
    html = response.read()

This combines the ease of use of urllib with the advanced capabilities, like custom headers, that previously required urllib2. Under the hood, the new urllib is implemented on top of http.client, which is the Python 3 name for httplib.
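As a rough illustration of the custom-header support mentioned above, the sketch below builds a urllib.request.Request with a header and an optional explicit HTTP method. The build_request and fetch helpers, the URL, and the User-Agent string are assumptions made up for this example. Nothing is sent over the network until the request is handed to urlopen().

```python
from urllib import request


def build_request(url, user_agent="my-app/1.0", method=None):
    """Build (but do not send) a request carrying a custom User-Agent header.

    `build_request` and the default `user_agent` value are hypothetical,
    chosen only for this illustration.
    """
    return request.Request(url, headers={"User-Agent": user_agent}, method=method)


def fetch(req, timeout=10):
    """Send a prepared Request and return the response body as bytes."""
    with request.urlopen(req, timeout=timeout) as response:
        return response.read()


req = build_request("http://www.example.com")
print(req.get_method())              # verb chosen because no body is attached
print(req.get_header("User-agent"))  # Request stores header names capitalized
```

Calling fetch(req) would perform the actual request. Passing method="HEAD" (or another verb) to build_request overrides the default.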

So in summary:

  • Python 2: urllib and urllib2
  • Python 3: urllib2 removed, urllib revamped

This helps explain the confusing versioning of Python's URL libraries.

urllib3 - A Fresh New Option

The built-in urllib and urllib2 libraries served Python well for many years. But developers eventually wanted more: better security, connection pooling, and better performance.

To meet these needs, a third-party library called urllib3 was created in 2008. Despite the name, urllib3 is not a numbered upgrade of urllib but a standalone package with a very different implementation.

Some key features of urllib3 include:

  • Thread safety and connection pooling for better concurrency and performance
  • Automatic handling of gzip and deflate encoding
  • More secure HTTPS handling with SSL/TLS verification
  • Support for proxies, retries, timeouts, and more

Overall, urllib3 provides a powerful alternative to the built-in urllib, with a focus on security, efficiency, and functionality. Many Python developers will choose urllib3 over urllib for its additional features and robustness.
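To make the pooling and retry features listed above more concrete, here is a minimal sketch of a shared urllib3 PoolManager. The retry counts, timeout values, and the fetch_status helper are illustrative assumptions, not recommended settings.

```python
import urllib3
from urllib3.util import Retry, Timeout

# Illustrative values only: retry up to 3 times, backing off between attempts,
# and also retry on a few gateway-style status codes.
RETRY = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
TIMEOUT = Timeout(connect=2.0, read=5.0)

# One PoolManager is meant to be shared: it keeps connections open per host
# and hands them out again on later requests.
http = urllib3.PoolManager(retries=RETRY, timeout=TIMEOUT)


def fetch_status(pool, url):
    """Issue a GET through the shared pool and return the HTTP status code."""
    response = pool.request("GET", url)
    return response.status


# fetch_status(http, "https://www.example.com") would perform a real request.
```

Creating the manager once and passing it around is what lets repeated requests to the same host reuse a connection.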

requests - The Runaway Favorite

By far the most popular library for making HTTP requests in Python today is requests. First released in 2011, requests shot to stardom in the Python community and became ubiquitous in Python code within just a few years.

So what explains the meteoric rise of requests? A few key factors:

Simple, Pythonic API - requests makes it extremely easy to make basic HTTP requests in a natural way:

import requests

response = requests.get('https://api.github.com')

Much nicer than urllib and urllib2!

Batteries Included - requests comes ready "out of the box" with support for JSON, file uploads, connection pooling (through Session objects), and more. No extra work needed.

Great Documentation - requests has fantastic documentation with examples for virtually every use case. This makes it easy for newcomers to learn.

Under the hood, requests is actually built on urllib3. But it wraps urllib3 with a simpler, more Pythonic API and handles many of the details for you.

So in summary, requests combines urllib3's power with an incredibly clean and easy-to-use interface on top. That's why so many Python developers have flocked to requests for their HTTP needs. It takes away much of urllib's frustration and boilerplate.
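A small sketch of what "handles many of the details for you" can look like in practice: query-string encoding, status checking, JSON decoding, and connection reuse through a Session. The API_URL endpoint and the build_search and send helpers are hypothetical names used only for illustration.

```python
import requests

API_URL = "https://api.example.com/search"  # hypothetical endpoint


def build_search(query):
    """Prepare (but do not send) a GET request with an encoded query string."""
    return requests.Request("GET", API_URL, params={"q": query}).prepare()


def send(session, prepared, timeout=10):
    """Send a prepared request and decode the JSON response body."""
    response = session.send(prepared, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.json()


prepared = build_search("urllib3")
print(prepared.url)  # requests has appended the encoded query string

# Reusing one Session across calls is what enables connection pooling:
# with requests.Session() as session:
#     data = send(session, prepared)
```

For one-off calls, requests.get(API_URL, params={"q": "urllib3"}) does the same preparation and sending in a single step.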

Key Differences At a Glance

Now that we've covered the history and context of each major URL handling library in Python, let's directly compare their key differences:

| Feature | urllib | urllib2 | urllib3 | requests |
|---|---|---|---|---|
| Status | Built into Python 3 | Python 2 only; removed in Python 3 | Third party | Third party |
| HTTP verbs | GET and POST via urlopen(); other verbs via Request(method=...) | GET and POST; other verbs need workarounds | Explicit support for all verbs | Helper functions such as get() and post() |
| Simplicity | Simple basic usage | More complex API | Low-level | Very simple, Pythonic API |
| Functionality | Bare-bones HTTP requests | Advanced capabilities like redirects and auth | Feature-rich and configurable | "Batteries included" functionality like JSON handling |
| Performance | No connection pooling | No connection pooling | High performance thanks to thread-safe connection pooling | Leverages urllib3 |
| Security | HTTPS certificates verified by default since Python 3.4.3 | HTTPS certificates verified by default only since Python 2.7.9 | Modern security practices like SSL verification | Leverages urllib3 |

As you can see, there is an overall trend towards simpler APIs and more built-in functionality as Python's URL libraries evolved over time. The latest iterations like urllib3 and especially requests aim to remove boilerplate and provide a secure, robust, high-level interface for HTTP requests.

Guidelines for Choosing a Library

Based on our exploration of Python's major URL handling libraries, here are some guidelines on when you should use each one:

  • urllib - Use for simple HTTP requests when you want to stick to built-ins. It also handles non-HTTP URL schemes like FTP.
  • urllib2 - No need to use this outdated module that's been removed in Python 3.
  • urllib3 - Use when you need fine-grained control over HTTP requests and response handling. Great for performance and security.
  • requests - Suitable for most HTTP use cases. Easiest to use and most productive for common tasks.

More specifically:

  • For scraping and consuming web APIs, choose requests for its simple syntax and built-in handlers.
  • For building HTTP clients or libraries that need low-level control over pooling, retries, and timeouts, consider urllib3.
  • For non-HTTP protocols like FTP, use urllib since requests is HTTP only.
  • For legacy Python 2 code, stick with urllib2 until you can upgrade to Python 3.
  • When in doubt, start with requests! It's suitable for most use cases, and it is easy to tweak should you need more control later.

The requests library really shines for common HTTP-based scraping and integration tasks. It's quick to learn and use. urllib3 offers lower-level control, while urllib provides protocol flexibility. Pick the right tool for the job!
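The protocol flexibility of urllib comes from urlopen() dispatching on the URL scheme. A minimal sketch, assuming a hypothetical fetch helper: it uses a data: URL so that it runs without a network connection, and the FTP host in the final comment is made up.

```python
from urllib import request


def fetch(url, timeout=10):
    """Fetch any URL whose scheme urllib has a handler for (http, https, ftp, file, data)."""
    with request.urlopen(url, timeout=timeout) as response:
        return response.read()


# A data: URL keeps this example offline; urllib decodes the payload itself.
body = fetch("data:text/plain;charset=utf-8,hello%20world")
print(body)

# The same helper would accept an FTP URL (hypothetical host and path):
# fetch("ftp://ftp.example.com/pub/README.txt")
```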

Wrapping Up

We've covered a lot of ground tracing the evolution of Python's URL handling libraries over the past 20+ years. While the history involves a lot of version confusion, the end result is a robust ecosystem with urllib meeting basic needs, urllib3 providing lower-level control, and requests serving as an easy-to-use solution for most use cases.

Key takeaways:

  • urllib pioneered basic URL handling in Python
  • urllib2 iterated with improvements like redirect handling
  • Modern urllib revamped the API in Python 3
  • urllib3 delivers performance and configurability
  • requests simplified common HTTP tasks with an elegant API

Hopefully this guide has shed light on the strengths of each option and how they fit together. The next time you need to fetch a URL in Python, you'll know exactly which library to turn to!

Frequently Asked Questions

Here are some common questions about Python's URL handling libraries:

What is the difference between urllib and requests?

The main difference is that requests has a simpler and easier to use API while still handling many common scenarios like JSON handling out of the box. urllib provides lower-level control but typically requires more code.
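To give a feel for the "more code" point, the sketch below prepares the same JSON POST with both libraries without sending anything. The URL and payload are hypothetical. With urllib the body has to be serialized and the Content-Type header set by hand, while the json= argument in requests does both.

```python
import json
import urllib.request

import requests

URL = "https://api.example.com/items"   # hypothetical endpoint
PAYLOAD = {"name": "widget", "qty": 3}  # hypothetical payload

# urllib: serialize the body and set the header yourself.
urllib_req = urllib.request.Request(
    URL,
    data=json.dumps(PAYLOAD).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# requests: json= serializes the payload and sets Content-Type for you.
requests_req = requests.Request("POST", URL, json=PAYLOAD).prepare()

print(urllib_req.get_header("Content-type"))
print(requests_req.headers["Content-Type"])
```

Sending would be urllib.request.urlopen(urllib_req) on one side and requests.Session().send(requests_req) (or simply requests.post(URL, json=PAYLOAD)) on the other.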

Do I need to install urllib?

No, urllib is built into Python 3 as part of the standard library. You can start using it without any installation needed.

Is requests faster than urllib3?

Generally they have similar performance for common use cases. Under the hood requests actually uses urllib3, so the performance difference should be negligible.

Can I use requests for non-HTTP protocols like FTP?

No, requests only works with HTTP and HTTPS. For FTP and other URL schemes you would need to use urllib.

Is it ok to use urllib2 in Python 3?

No, urllib2 has been removed in Python 3, so you must migrate to urllib (its API now lives in urllib.request) or to requests.
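For code being ported, the Python 2 names map fairly directly onto the Python 3 urllib submodules. A minimal sketch, assuming a hypothetical fetch_or_none helper, with the old names noted in comments:

```python
import urllib.error
import urllib.parse
import urllib.request


def fetch_or_none(url, params=None, timeout=10):
    """Return the response body as bytes, or None if the URL cannot be fetched."""
    if params:
        # Python 2: urllib.urlencode
        url = url + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url)  # Python 2: urllib2.Request
    try:
        # Python 2: urllib2.urlopen
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError:
        # Python 2: urllib2.URLError (HTTPError is a subclass of it)
        return None


print(urllib.parse.urlencode({"q": "python", "page": 2}))
```

Returning None on failure is just one possible design for the sketch. Real code may prefer to let the exception propagate or to log it.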

How do I choose between urllib3 and requests?

Use requests for simpler cases where you want an easy API. Use urllib3 for lower-level control and advanced configuration of HTTP behavior.

What's the minimum Python version to use requests?

Recent releases of requests run on Python 3 only; support for Python 2.7 and 3.6 was dropped in requests 2.28. Check the project's documentation for the current minimum version, and use the latest Python version where possible.
