Handling URL Errors Gracefully in Python urllib

Feb 6, 2024 · 2 min read

When working with URLs in Python, you may occasionally run into errors when opening or reading a web resource. The urllib.error module defines exception classes that let you handle these cases gracefully.

When the server responds with an error status such as HTTP 404 (page not found) or 500 (server error), urlopen raises an HTTPError. Catching it keeps your script from crashing when a page doesn't exist or the server fails:

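import urllib.request
import urllib.error

try:
    # Placeholder URL: assume the host exists but the page does not (404)
    response = urllib.request.urlopen("http://example.com/missing-page")
except urllib.error.HTTPError as e:
    # e.code is the numeric status, e.reason the accompanying phrase
    print(f"HTTP error: {e.code} - {e.reason}")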
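To try this handling without depending on a live server, the sketch below wraps the call in a small helper. The names fetch_status and fake_opener are hypothetical, and the fake opener only simulates a 404 by raising HTTPError directly, so treat this as an illustration rather than a definitive pattern.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch_status(url, opener=urllib.request.urlopen):
    """Return (status_code, reason) for url, treating HTTP error statuses as data."""
    try:
        with opener(url, timeout=10) as response:
            return response.status, response.reason
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return e.code, e.reason


def fake_opener(url, timeout=None):
    # Hypothetical stand-in for urlopen that always simulates a 404 response.
    raise urllib.error.HTTPError(url, 404, "Not Found", Message(), io.BytesIO())


print(fetch_status("http://example.com/missing", opener=fake_opener))
```

Passing the opener as a parameter is only a convenience for testing; in a real script the default urllib.request.urlopen would be used.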

URLError is raised when no valid response arrives at all, for example when the host cannot be resolved or the connection is refused. Because such failures are often temporary, you may want to retry the request a few times before giving up:

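import urllib.request
import urllib.error

max_attempts = 3
for attempt in range(max_attempts):
    try:
        response = urllib.request.urlopen("http://flaky.site")
        break  # success, stop retrying
    except urllib.error.URLError:
        # Also catches HTTPError, which is a subclass of URLError.
        # Fall through to the next attempt, if any remain.
        pass
else:
    # Runs only when the loop never hit break, i.e. every attempt failed
    print("Site appears to be down")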
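Retrying immediately tends to hit the same failure again, so a pause between attempts is a common refinement. The following is a minimal sketch, assuming an exponential backoff is acceptable; open_with_retries, base_delay, and the injectable opener and sleep parameters are hypothetical names introduced here for illustration.

```python
import time
import urllib.error
import urllib.request


def open_with_retries(url, max_attempts=3, base_delay=1.0,
                      opener=urllib.request.urlopen, sleep=time.sleep):
    """Try to open url up to max_attempts times, doubling the wait between tries."""
    for attempt in range(max_attempts):
        try:
            return opener(url)
        except urllib.error.HTTPError:
            # The server did respond. This sketch chooses not to retry that case.
            raise
        except urllib.error.URLError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

Whether an HTTP error deserves a retry depends on the status: a 503 may well be temporary, while a 404 almost never is. The sketch re-raises all of them to keep the example short.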

Pay attention to the error's reason attribute - an unsupported protocol or an SSL certificate problem also surfaces as a URLError, and retrying will not fix either. Handle those cases directly instead of retrying pointlessly.
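One way to act on this is to inspect the reason attribute, which holds either a string or the underlying exception object. The helper below is a rough sketch: is_worth_retrying is a hypothetical name, and the classification of which failures count as transient is an assumption you should adjust to your own situation.

```python
import socket
import ssl
import urllib.error


def is_worth_retrying(error):
    """Guess whether a URLError looks transient, based on its reason attribute."""
    reason = error.reason
    if isinstance(reason, ssl.SSLError):
        return False  # certificate or TLS problems will not fix themselves
    if isinstance(reason, (socket.timeout, ConnectionError)):
        return True   # timeouts and refused or reset connections may be temporary
    return False      # plain-string reasons, unknown protocols, and anything else
```

Inside an except urllib.error.URLError as e block, a check such as if not is_worth_retrying(e): raise would stop the retry loop early for permanent problems.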

URLError is also the base class of HTTPError, so a single except urllib.error.URLError clause acts as a catch-all for both:

import urllib.request
import urllib.error

try:
    # "http://" has no host, so urlopen raises URLError before connecting
    urllib.request.urlopen("http://")
except urllib.error.URLError as e:
    print(f"URL error: {e}")
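Because HTTPError is a subclass of URLError, the order of except clauses matters when you want to treat the two differently: the more specific class has to come first. The sketch below illustrates that ordering; describe_failure is a hypothetical helper, and the extra ValueError clause covers URLs that urllib rejects before sending anything, such as one with no scheme.

```python
import urllib.error
import urllib.request


def describe_failure(url, opener=urllib.request.urlopen):
    """Open url and return a short description of the outcome."""
    try:
        with opener(url) as response:
            return f"OK: {response.status}"
    except urllib.error.HTTPError as e:
        # Most specific first: the server replied with an error status.
        return f"HTTP error: {e.code}"
    except urllib.error.URLError as e:
        # No usable response: DNS failure, refused connection, missing host...
        return f"URL error: {e.reason}"
    except ValueError as e:
        # Raised before any request is sent, e.g. for a URL with no scheme.
        return f"Invalid URL: {e}"


# Neither call below reaches the network: both URLs are rejected up front.
print(describe_failure("not-a-url"))
print(describe_failure("http://"))
```

If the URLError clause came first, the HTTPError clause would never run, since every HTTPError is also a URLError.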

Proper exception handling avoids abrupt crashes in your code when web resources misbehave. The urllib module categorizes errors into useful subclasses to identify the root cause instead of just failing on all exceptions. Pay attention to error messages and apply targeted handling and retries where applicable.
