Accessing Protected Resources with urllib and Realm Authentication

Feb 8, 2024 · 2 min read

When you access protected web resources with Python's urllib module, you may get a "401 Unauthorized" error, which means the server requires HTTP authentication for a named realm. A realm is a label for a protected area of a web application; the server sends it in the WWW-Authenticate header of the 401 response so the client knows which credentials to supply.
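As a rough illustration of that challenge, the following sketch pulls the realm name out of a 401 error raised by urllib. The helper name realm_from_error is hypothetical, and the regular expression only covers the simple quoted realm="..." form, so treat it as a sketch under those assumptions rather than a complete header parser.

```python
import re
import urllib.error


def realm_from_error(err):
    """Return the realm named in a 401 response, or None if there is none.

    realm_from_error is a hypothetical helper, not part of urllib.
    """
    if err.code != 401:
        return None
    headers = err.headers
    if headers is None:
        return None
    # The server announces the realm in the WWW-Authenticate header,
    # for example: Basic realm="Admin Area"
    challenge = headers.get("WWW-Authenticate", "")
    match = re.search(r'realm="([^"]*)"', challenge, re.IGNORECASE)
    return match.group(1) if match else None
```

One possible use is to call it from an except urllib.error.HTTPError block around urlopen() to see which realm the server expects.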

To gain access, urllib provides the HTTPPasswordMgrWithDefaultRealm class to store credentials and the HTTPBasicAuthHandler class to send them when the server asks. Here is an example that fetches a protected resource:

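import urllib.request

username = 'myusername'
password = 'mypassword'

# Store the credentials; a realm of None matches whatever realm the server announces
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://example.com/api", username, password)

# The handler answers 401 challenges with credentials from the password manager
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

# Make this opener the default used by urllib.request.urlopen()
urllib.request.install_opener(opener)

with urllib.request.urlopen('https://example.com/api/protected-resource') as response:
    print(response.read())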

The key steps are:

  1. Create an HTTPPasswordMgrWithDefaultRealm to store credentials
  2. Add the username and password for a URL prefix using add_password(), with None as the realm to match any realm
  3. Create an HTTPBasicAuthHandler using the password manager
  4. Build an opener using the handler, which resends a request with credentials after the server replies with a 401 challenge (it does not send them preemptively)
  5. Install the opener as the default opener in urllib

Now any request made through urllib.request.urlopen() to a URL under the registered URI is retried with the stored credentials when the server responds with a 401 challenge.
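If installing a global opener is not desirable, one alternative is to keep the opener local and call its open() method directly. The sketch below wraps the same steps in a function; build_auth_opener is a hypothetical name, and the URL and credentials are the placeholder values from the example.

```python
import urllib.request


def build_auth_opener(base_uri, username, password):
    """Build an opener that answers Basic auth challenges under base_uri.

    build_auth_opener is a hypothetical helper name, not a urllib function.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means "use these credentials for any realm"
    password_mgr.add_password(None, base_uri, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


opener = build_auth_opener("https://example.com/api", "myusername", "mypassword")

# No network traffic happens until open() is called, for example:
# with opener.open("https://example.com/api/protected-resource") as response:
#     print(response.read())
```

Keeping the opener local means other code that calls urllib.request.urlopen() is unaffected, which may be preferable in larger programs.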

Some tips:

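  • The realm is a name chosen by the server and sent in the WWW-Authenticate header of the 401 response; it is not derived from the URL path
  • You can specify the exact realm if known using add_password(realm, uri, user, passwd); passing None as the realm matches any realm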
  • Using an opener allows transparent handling of authentication

To summarize, urllib provides the capability to access protected resources using realm-based authentication via the HTTPPasswordMgrWithDefaultRealm and HTTPBasicAuthHandler classes. Configuring these correctly takes a bit of trial-and-error to match the expected realm behavior of the server.
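
To see how the realm and URI arguments of add_password() interact, the password manager can be queried directly with find_user_password(), without contacting any server. The realm name "Admin Area" and the admin credentials below are made-up values for illustration only.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Default credentials: a realm of None applies to any realm under this URI
mgr.add_password(None, "https://example.com/api", "myusername", "mypassword")
# Credentials for one specific realm (hypothetical realm name)
mgr.add_password("Admin Area", "https://example.com/api", "admin", "adminpass")

url = "https://example.com/api/protected-resource"

# Exact realm match: the realm-specific entry is used
specific = mgr.find_user_password("Admin Area", url)
# Unknown realm: the lookup falls back to the None (default) entry
fallback = mgr.find_user_password("Some Other Realm", url)
# URL outside the registered prefix: nothing matches
outside = mgr.find_user_password("Admin Area", "https://example.com/other")

print(specific)
print(fallback)
print(outside)
```

Here the realm-specific entry wins when the realm matches, the None entry acts as a fallback for any other realm, and a URL outside the registered prefix yields (None, None).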
