Accessing Websites in Python with urllib.request.urlopen

Feb 6, 2024 · 2 min read

The urllib.request module in Python 3 provides a simple way to access and download data from websites via HTTP and HTTPS. The key function is urllib.request.urlopen(), which opens a URL and returns a file-like response object that you can read from. It can also send data to the server as part of the request.

Opening and Reading URL Contents

To open a URL, just pass the URL string to urlopen():

import urllib.request

with urllib.request.urlopen('http://example.com') as response:
    html = response.read()

This opens the URL, returns a file-like response object, reads the body into html as a bytes object, and automatically closes the connection when the with block exits.
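
Because read() returns bytes, text usually has to be decoded before you can work with it as a string. The following is a minimal sketch of one way to do that; the helper name fetch_text, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration and are not part of urllib.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Fetch a URL and return its body decoded to str.

    fetch_text is a hypothetical helper; the timeout and the UTF-8
    fallback are assumptions chosen for illustration.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the body, as bytes
        # Use the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or 'utf-8'
    # Undecodable bytes are replaced rather than raising an error.
    return raw.decode(charset, errors='replace')
```

Calling fetch_text('http://example.com') would then return the page as a string rather than as bytes.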

The response object also exposes the HTTP status code, the reason phrase, and the response headers:

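import urllib.request

# The with block makes sure the connection is closed afterwards
with urllib.request.urlopen('http://python.org') as response:
    print(response.status)        # HTTP status code; 200 means success
    print(response.reason)        # reason phrase, e.g. 'OK'
    print(response.getheaders())  # list of (name, value) header tuples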
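
Individual headers can also be looked up by name through response.headers, which behaves like a case-insensitive mapping. The following is a minimal sketch, assuming a hypothetical helper named header_summary; the choice of headers it collects is arbitrary.

```python
import urllib.request


def header_summary(response):
    """Collect a few commonly used header values from a response.

    header_summary is a hypothetical helper, not part of urllib.
    """
    headers = response.headers
    return {
        # Media type without parameters, e.g. 'text/html'
        'content_type': headers.get_content_type(),
        # Lookups are case-insensitive; the value is None if the header is absent
        'content_length': headers.get('content-length'),
        'server': headers.get('Server'),
    }
```

Passing an open response from urllib.request.urlopen() to header_summary() returns a plain dictionary that is easy to print or log.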

Handling Errors

If a URL doesn't exist or there's another error, urlopen() will raise an exception that you need to handle:

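import urllib.request
import urllib.error

try:
    with urllib.request.urlopen('http://wrong.url') as response:
        html = response.read()
except urllib.error.HTTPError as e:
    # The server answered, but with an error status such as 404 or 500
    print(e.code)    # numeric HTTP status code
    print(e.read())  # body of the error response, as bytes
except urllib.error.URLError as e:
    # The server could not be reached (e.g. DNS failure, refused connection)
    print(e.reason)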

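For an HTTP error, this prints the status code and the body of the error response. HTTPError covers only the cases where the server actually replied. A host that cannot be reached at all raises urllib.error.URLError instead, and since HTTPError is a subclass of URLError, an except clause for HTTPError must come before one for URLError.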
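
If the same handling is needed in several places, it can be wrapped in a function. The following is a rough sketch rather than a recommended pattern: the name try_fetch, the 10-second timeout, and the (ok, detail) return convention are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def try_fetch(url, timeout=10):
    """Return (True, body_bytes) on success or (False, message) on failure.

    try_fetch and its return convention are hypothetical, for illustration.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, response.read()
    except urllib.error.HTTPError as e:
        # The server replied with an error status (4xx or 5xx).
        return False, f'HTTP {e.code}: {e.reason}'
    except urllib.error.URLError as e:
        # No usable reply: DNS failure, refused connection, unknown scheme, ...
        return False, f'request failed: {e.reason}'
```

The caller can then branch on the first element of the result instead of writing a try block around every request.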

Writing to Websites

You can also use urlopen() to post data to web servers and APIs. For example, to submit a simple search form:

import urllib.parse
import urllib.request

url = 'http://www.example.com/search'
data = urllib.parse.urlencode({'q': 'python'})  # form-encoded string: 'q=python'
data = data.encode('utf-8')  # the request body must be bytes

req = urllib.request.Request(url, data=data)  # supplying data makes this a POST
with urllib.request.urlopen(req) as response:
    print(response.read())

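urllib.parse.urlencode() turns the dictionary into a form-encoded string, which must then be encoded to bytes. Passing those bytes as data to Request makes urlopen() send a POST request instead of a GET.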
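
You can confirm which HTTP method a Request will use before sending anything. The following is a minimal sketch; build_form_request is a hypothetical helper name, and the URL is the same placeholder used above.

```python
import urllib.parse
import urllib.request


def build_form_request(url, fields):
    """Build a request carrying form-encoded fields in its body.

    build_form_request is a hypothetical helper, not part of urllib.
    """
    body = urllib.parse.urlencode(fields).encode('utf-8')
    return urllib.request.Request(url, data=body)


# Nothing is sent over the network until a request is passed to urlopen().
post_req = build_form_request('http://www.example.com/search', {'q': 'python'})
get_req = urllib.request.Request('http://www.example.com/search')

print(post_req.get_method())  # POST, because data was supplied
print(get_req.get_method())   # GET, because there is no data
print(post_req.data)          # b'q=python'
```

Building the request separately from sending it also makes it easy to inspect or adjust the request first.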

In summary, urllib.request.urlopen() is a versatile function for accessing web resources from Python. It handles the HTTP protocol, URL parsing, and headers for you and reports failures as exceptions, so you can focus on working with the website's contents.
