URL Parsing in Python with urllib.parse

Understanding and manipulating URLs is crucial for many Python programs that work with the web. The urllib.parse module provides useful functions for parsing, composing, and manipulating URLs in your Python code.

The Pieces of a URL

A URL like https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere may look complicated, but it breaks down into distinct components:

Scheme - The protocol, such as https

Netloc - The network location: usually the domain name, such as www.example.com, optionally with a port or credentials

Path - The path to a resource, such as /path/to/page

Query Parameters - Extra key-value data, such as key1=value1&key2=value2

Fragment - An identifier referencing part of the page, such as Somewhere

The urllib.parse module helps you easily break a URL string down and access these components.
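The query component in the list above is a single string. As a small sketch of how it could be split into individual key-value pairs, the standard library's parse_qs() function (also part of urllib.parse) can be used; the query string here is the one from the example URL.

```python
from urllib.parse import parse_qs

# Query string taken from the example URL above
query = 'key1=value1&key2=value2'

# parse_qs returns a dict that maps each key to a list of values,
# because a key may legally appear more than once in a query string
params = parse_qs(query)

print(params)             # {'key1': ['value1'], 'key2': ['value2']}
print(params['key1'][0])  # value1
```

Each value is a list because a query such as key1=a&key1=b is valid, so a single string per key would lose data.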

Parsing URLs

The urllib.parse.urlparse() function takes a URL string and returns a ParseResult, a named tuple whose fields hold the different components:

from urllib.parse import urlparse

url = 'https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere'
parsed = urlparse(url)

print(parsed.scheme)    # https
print(parsed.netloc)    # www.example.com
print(parsed.path)      # /path/to/page
print(parsed.query)     # key1=value1&key2=value2
print(parsed.fragment)  # Somewhere

There are also convenience attributes like parsed.hostname and parsed.port, which are derived from the netloc.
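To illustrate these attributes, here is a minimal sketch using a hypothetical URL that carries credentials and an explicit port (neither appears in the example URL above, so they are assumptions made for demonstration only).

```python
from urllib.parse import urlparse

# Hypothetical URL with a username, password and port, for illustration only
parsed = urlparse('https://user:secret@www.example.com:8443/path/to/page')

print(parsed.netloc)    # user:secret@www.example.com:8443
print(parsed.hostname)  # www.example.com
print(parsed.port)      # 8443 (an int, or None if the URL has no port)
print(parsed.username)  # user
```

Note that netloc keeps the whole authority section as written, while hostname and port pull out just the pieces you usually need.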

Composing and Joining URLs

You can also compose or reconstruct a URL from its components using urllib.parse.urlunparse(), which takes a sequence of six items (scheme, netloc, path, params, query, fragment):

from urllib.parse import urlunparse

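# Order: scheme, netloc, path, params, query, fragment.
# The params slot is rarely used, so it is left as an empty string.
data = ['https', 'www.example.com', '/path/to/page', '', 'key1=value1&key2=value2', 'Somewhere']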
print(urlunparse(data)) 
# https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere

This lets you modify a URL piece by piece in code instead of editing the string by hand.
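One way to do this, sketched here as an illustration, is to parse the URL, swap out individual fields with the named tuple's _replace() method, and reassemble the result with urlunparse(). The replacement query value is made up for the example.

```python
from urllib.parse import urlparse, urlunparse

url = 'https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere'
parsed = urlparse(url)

# _replace returns a new ParseResult; the original is left unchanged.
# Here the query is swapped for a made-up value and the fragment is dropped.
updated = parsed._replace(query='key1=changed', fragment='')

new_url = urlunparse(updated)
print(new_url)  # https://www.example.com/path/to/page?key1=changed
```

Calling updated.geturl() gives the same string without importing urlunparse.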

The urllib.parse module contains other useful functions, such as urljoin(), which resolves a relative URL against a base URL. Together, these functions cover much of the URL handling that Python web programs need.
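As a brief sketch of urljoin(), the following resolves a few relative references against the example page URL. The relative paths and the second domain are hypothetical values chosen for illustration.

```python
from urllib.parse import urljoin

base = 'https://www.example.com/path/to/page'

# A bare name replaces the last path segment of the base
sibling = urljoin(base, 'other')
print(sibling)   # https://www.example.com/path/to/other

# '..' climbs one directory before appending
parent = urljoin(base, '../up')
print(parent)    # https://www.example.com/path/up

# A leading slash replaces the whole path
rooted = urljoin(base, '/root')
print(rooted)    # https://www.example.com/root

# An absolute URL ignores the base entirely
absolute = urljoin(base, 'https://other.example.org/x')
print(absolute)  # https://other.example.org/x
```

This mirrors how a browser resolves links found on a page, which makes it handy when collecting links from HTML.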
