Accessing Protected Resources with urllib and Realm Authentication

Feb 8, 2024 · 2 min read

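When accessing protected web resources with Python's urllib module, you may encounter a "401 Unauthorized" error indicating that the server requires authentication for a realm. A realm is a named protection space covering part of a web application: the server answers unauthenticated requests with a 401 status and a WWW-Authenticate header such as Basic realm="...", and the client is expected to retry the request with credentials for that realm.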
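To make the exchange concrete, here is a minimal sketch of the two headers involved in HTTP Basic authentication. The realm name and credentials are placeholders, not values from any real server. The client's answer is just the base64 encoding of "username:password", which is an encoding, not encryption, so Basic authentication should only be used over HTTPS.

```python
import base64

# Hypothetical realm and credentials, for illustration only.
realm = "Example API"
username, password = "myusername", "mypassword"

# The challenge a protected server typically sends along with its 401 response.
challenge = f'WWW-Authenticate: Basic realm="{realm}"'

# The header the client sends on its retry: "Basic " + base64("username:password").
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
authorization = f"Authorization: Basic {token}"

print(challenge)
print(authorization)
```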

To supply credentials, urllib.request provides the HTTPPasswordMgrWithDefaultRealm class, which stores them, and the HTTPBasicAuthHandler class, which sends them when the server asks. Here is an example fetching a protected resource:

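import urllib.request

username = 'myusername'
password = 'mypassword'

# Store the credentials for every URL under https://example.com/api;
# realm=None means "use these for whatever realm the server names".
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://example.com/api", username, password)

# The handler answers 401 Basic challenges using the password manager.
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

# Route plain urlopen() calls through this opener.
urllib.request.install_opener(opener)

with urllib.request.urlopen('https://example.com/api/protected-resource') as response:
    print(response.read())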
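HTTPPasswordMgrWithDefaultRealm matches credentials by URI prefix, not by exact URL. The offline check below calls find_user_password(realm, uri), the lookup the handler performs after a 401, to show which URLs the registration above covers. "Some Realm" is a placeholder for whatever realm name a server might send.

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None registers these credentials as the default for any realm under this prefix.
password_mgr.add_password(None, "https://example.com/api", "myusername", "mypassword")

# A URL below the registered prefix: the default-realm credentials are found.
print(password_mgr.find_user_password("Some Realm", "https://example.com/api/protected-resource"))
# → ('myusername', 'mypassword')

# A URL outside the prefix: no credentials, so the handler would not retry.
print(password_mgr.find_user_password("Some Realm", "https://example.com/other"))
# → (None, None)
```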

The key steps are:

  1. Create an HTTPPasswordMgrWithDefaultRealm to store credentials
  2. Register the username and password for a URI prefix using add_password(); passing None as the realm makes them the default for any realm
  3. Create an HTTPBasicAuthHandler that uses the password manager
  4. Build an opener with build_opener() so the handler can answer 401 challenges with the stored credentials
  5. Install the opener as the default opener in urllib
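The steps can be checked end to end without touching a real server. The sketch below uses only the standard library: it starts a throwaway local HTTP server that demands Basic credentials for a made-up realm ("Example API"), then fetches from it using steps 1 to 4. It calls opener.open() directly instead of install_opener(), so the global default opener is left untouched, and it adds ProxyHandler({}) so proxy environment variables cannot reroute the local request. The server records the Authorization header of every request it receives.

```python
import base64
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical realm and credentials for a throwaway local test server.
REALM = "Example API"
USERNAME, PASSWORD = "myusername", "mypassword"
EXPECTED = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode("ascii")

seen_auth_headers = []  # Authorization header of each request, in arrival order


class BasicAuthServerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        auth = self.headers.get("Authorization")
        seen_auth_headers.append(auth)
        if auth != EXPECTED:
            # Challenge the client: 401 plus the realm it should authenticate against.
            self.send_response(401)
            self.send_header("WWW-Authenticate", f'Basic realm="{REALM}"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"secret data"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), BasicAuthServerHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}/api"

# Steps 1-4: password manager, credentials, handler, opener.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, base_url, USERNAME, PASSWORD)
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore any proxy settings for this local test
    urllib.request.HTTPBasicAuthHandler(password_mgr),
)

with opener.open(base_url + "/protected-resource") as response:
    data = response.read()

server.shutdown()
server.server_close()
print(data)
print(seen_auth_headers)
```

The recorded headers show that the first request carries no credentials; HTTPBasicAuthHandler only attaches them when it retries after the 401 challenge. If credentials must go out on the very first request, urllib.request also provides HTTPPasswordMgrWithPriorAuth, whose add_password() accepts is_authenticated=True for that purpose.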

Now any request made through urlopen() to a URL under the registered prefix is automatically retried with the credentials when the server answers with a 401 Basic challenge.

Some tips:

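  • The realm is a name chosen by the server and sent in the WWW-Authenticate header (for example Basic realm="Admin Area"); it need not correspond to any URL path. Passing None as the realm makes the credentials apply to any realm for that URI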
  • You can specify the exact realm if known using add_password(realm, uri,...)
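  • Installing the opener makes authentication transparent for every plain urlopen() call; if you prefer not to change global state, call opener.open() directly instead

To summarize, urllib can access resources protected by realm-based (HTTP Basic) authentication through the HTTPPasswordMgrWithDefaultRealm and HTTPBasicAuthHandler classes. Getting the configuration right mostly means matching what the server expects: if requests still fail with 401, check the WWW-Authenticate header of the server's response, which names the authentication scheme and the realm. HTTPBasicAuthHandler only answers Basic challenges; servers using Digest authentication need HTTPDigestAuthHandler instead.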
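To take the guesswork out of matching the realm, you can make one request without credentials and read the realm from the 401 response. The sketch below is a hypothetical pair of helpers (get_basic_realm and parse_basic_realm are illustrative names, not urllib APIs). It relies on urllib.error.HTTPError exposing the response headers, and its regex only handles a quoted realm in a Basic challenge.

```python
import re
import urllib.error
import urllib.request


def parse_basic_realm(header_value):
    """Return the quoted realm from a Basic WWW-Authenticate value, or None."""
    match = re.search(r'Basic\s+realm="([^"]*)"', header_value, re.IGNORECASE)
    return match.group(1) if match else None


def get_basic_realm(url):
    """Request url without credentials and return the realm it challenges for, or None."""
    try:
        with urllib.request.urlopen(url):
            return None  # the resource is not protected
    except urllib.error.HTTPError as err:
        if err.code != 401:
            raise
        # Only the first WWW-Authenticate header is inspected here.
        return parse_basic_realm(err.headers.get("WWW-Authenticate", ""))
```

With the realm in hand you can register the credentials for it explicitly, for example password_mgr.add_password(realm, "https://example.com/api", username, password); HTTPPasswordMgrWithDefaultRealm still falls back to anything registered with realm None.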
