Accessing URLs Requiring Authentication with Python's urllib

Feb 6, 2024 · 2 min read

When accessing web URLs that require HTTP Basic authentication, Python's urllib module provides a simple way to supply credentials and access protected resources. Whether you need to pull data from a website or an API endpoint, all it takes is one extra request header.

Here's a quick example accessing a protected URL:

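import base64
import urllib.request

username = 'myusername'
password = 'mypassword'

url = 'https://api.example.com/data'

# Basic auth expects base64("username:password"), not a URL-encoded form
credentials = f'{username}:{password}'.encode('utf-8')
token = base64.b64encode(credentials).decode('ascii')

request = urllib.request.Request(url)
request.add_header('Authorization', f'Basic {token}')

response = urllib.request.urlopen(request)
data = response.read()

We join the username and password with a colon, base64-encode the result, and send it in an Authorization header using the Basic scheme (RFC 7617). Note that urllib.parse.urlencode is the wrong tool here: it builds a query string such as username=myusername&password=mypassword, which a server expecting Basic auth will reject.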
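To see exactly what ends up in the header, here is a minimal sketch of the encoding step on its own, plus a decoder that shows the round trip. The helper names basic_auth_header and decode_basic_auth_header are hypothetical, and the sample credentials are the example pair used in RFC 7617.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    # Join the credentials with a colon, then base64-encode the UTF-8 bytes.
    credentials = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


def decode_basic_auth_header(value: str) -> Tuple[str, str]:
    """Reverse the encoding, as a server would when checking credentials."""
    scheme, _, token = value.partition(' ')
    if scheme != 'Basic':
        raise ValueError(f'not a Basic auth header: {value!r}')
    # Split on the first colon only; the password itself may contain colons.
    username, _, password = base64.b64decode(token).decode('utf-8').partition(':')
    return username, password


header = basic_auth_header('Aladdin', 'open sesame')
print(header)                            # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(decode_basic_auth_header(header))  # ('Aladdin', 'open sesame')
```

Anyone holding the header value can recover the password this way, which is worth keeping in mind when logging requests.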

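Because the credentials travel with every request, there are no cookies or sessions to manage. Only send them over HTTPS, though: base64 is an encoding, not encryption, so over plain HTTP anyone watching the traffic can read them.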
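The example above builds the header itself, so nothing is negotiated with the server. If you would rather have urllib answer the server's 401 challenge for you, the standard library's HTTPBasicAuthHandler can do that. Below is a minimal sketch; the function name fetch_with_basic_auth is hypothetical, and it assumes the server replies to an unauthenticated request with a `WWW-Authenticate: Basic realm="..."` header (without that challenge, the handler never sends the credentials).

```python
import urllib.request


def fetch_with_basic_auth(url: str, username: str, password: str) -> bytes:
    """Fetch url, letting urllib respond to a Basic auth challenge (401)."""
    # Register the credentials for this URL; realm None means "any realm".
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)

    # On a 401 with a Basic challenge, the handler retries the request
    # with an Authorization header built from the stored credentials.
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(auth_handler)

    with opener.open(url, timeout=10) as response:
        return response.read()


# Usage with the placeholder URL and credentials from the example above:
# data = fetch_with_basic_auth('https://api.example.com/data',
#                              'myusername', 'mypassword')
```

This costs one extra round trip, because the first request goes out without credentials. If that matters, urllib.request.HTTPPasswordMgrWithPriorAuth is a drop-in alternative whose add_password accepts is_authenticated=True, which tells the handler to send the header up front.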

Tips

  • Use a context manager to close the response automatically:

        with urllib.request.urlopen(request) as response:
            data = response.read()

  • If authentication fails, urlopen raises urllib.error.HTTPError (usually with status code 401); catch it to handle invalid credentials.
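Combining the two tips, a minimal sketch of handling rejected credentials might look like this. The function name fetch_or_none is hypothetical; it treats only 401 as a credentials problem and re-raises every other HTTP error so that, for example, a 404 is not mistaken for a bad password.

```python
import urllib.error
import urllib.request
from typing import Optional, Union


def fetch_or_none(request: Union[str, urllib.request.Request]) -> Optional[bytes]:
    """Return the response body, or None if the server rejects the credentials."""
    try:
        # The context manager closes the response even if read() fails.
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # 401 Unauthorized: credentials were missing or not accepted
            print(f'Authentication failed: HTTP {err.code} {err.reason}')
            return None
        raise  # other HTTP errors (404, 500, ...) are not credential problems


# Usage, assuming `request` is built as in the article's example:
# data = fetch_or_none(request)
# if data is None:
#     ...  # e.g. prompt for new credentials or log the failure
```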

Scenarios

  • Accessing APIs that accept an API key or token via Basic auth
  • Pulling reports from a web app that requires login
  • Scraping data from a website that uses Basic auth protection

Using urllib for HTTP Basic authentication is a simple way to supply credentials for restricted URLs. With a few lines of code, you can access protected resources and data.