urllib retrieve

Feb 8, 2024 · 2 min read

The urllib module in Python provides useful tools for retrieving data from the web. Whether you want to pull down HTML pages, images, or API data, urllib makes it straightforward to make web requests in your Python code.

Making Basic Requests

To retrieve a web page, you can use urllib.request.urlopen():

import urllib.request

with urllib.request.urlopen('http://example.com') as response:
    html = response.read()

This opens the URL, downloads the response body, and stores it (as bytes) in the html variable. The with block ensures the connection is closed automatically when we're done.
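Since response.read() returns bytes, you'll usually want to decode it to a string before working with the HTML. A minimal sketch, using the charset the server declares (falling back to UTF-8 when none is given):

```python
import urllib.request

with urllib.request.urlopen('http://example.com') as response:
    raw = response.read()                                  # bytes
    # Use the charset from the Content-Type header, or fall back to UTF-8
    charset = response.headers.get_content_charset() or 'utf-8'
    text = raw.decode(charset)                             # str

print(text[:200])
```

The fallback matters because not every server includes a charset in its Content-Type header.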

Handling Errors

It's good practice to wrap calls to urlopen() in try/except blocks to handle errors gracefully:

import urllib.request
import urllib.error

try:
    with urllib.request.urlopen("http://example.com") as response:
        html = response.read()
except urllib.error.HTTPError as e:
    print(e.code)    # HTTP status code, e.g. 404
    print(e.read())  # error response body

Now, instead of your code crashing on an error, you can catch issues like a 404 or 500 status code and decide what to do.
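HTTPError only covers error status codes from the server. Its parent class, urllib.error.URLError, also catches network-level failures such as DNS errors or refused connections. A small sketch (the hostname below is deliberately nonexistent; the reserved .invalid TLD never resolves):

```python
import urllib.request
import urllib.error

# URLError covers network-level failures (DNS, refused connections);
# the .invalid hostname below is guaranteed not to resolve.
try:
    urllib.request.urlopen("http://nonexistent.invalid/")
except urllib.error.URLError as e:
    print("Request failed:", e.reason)
```

Catching URLError (rather than only HTTPError) is the safer default when you can't assume the network is reliable.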

Sending Data in Requests

To send form data in a POST request, URL-encode it and pass it as the data argument. Note that data must be bytes, so the encoded string needs a further .encode() call:

import urllib.parse
import urllib.request

# urlopen's data argument must be bytes, not str
data = urllib.parse.urlencode({'query': 'hello world'}).encode('utf-8')
html = urllib.request.urlopen("http://example.com/search", data=data)

This URL-encodes the data and sends it as the body of a POST request. Passing any data argument switches urlopen() from GET to POST.
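To send query parameters with a GET request instead, append the encoded string to the URL itself rather than using the data argument. A short sketch (the /search endpoint is a hypothetical example):

```python
import urllib.parse

# Build a GET URL by appending encoded query parameters;
# urlencode handles escaping (spaces become '+', etc.)
params = urllib.parse.urlencode({'query': 'hello world', 'page': 2})
url = "http://example.com/search?" + params
print(url)  # http://example.com/search?query=hello+world&page=2
```

The same urlencode() call serves both cases; only where the result goes (URL vs. request body) differs.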

In summary, urllib makes it easy to fetch resources from the internet using only Python's standard library. With some simple try/except handling and URL encoding, you can build robust scripts to scrape data or interact with APIs.
