URL Parsing in Python with urllib.parse

Feb 6, 2024 · 2 min read

Understanding and manipulating URLs is crucial for many Python programs that work with the web. The urllib.parse module provides useful functions for parsing, composing, and manipulating URLs in your Python code.

The Pieces of a URL

A URL like https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere may look complicated, but it breaks down into distinct components:

  • Scheme - The protocol, such as https
  • Netloc - The network location: the host name, such as www.example.com, plus any port or user info
  • Path - The path to a resource, such as /path/to/page
  • Query Parameters - Extra key-value data, such as key1=value1&key2=value2
  • Fragment - An identifier referencing part of the page, such as Somewhere

The urllib.parse module helps you break a URL string down and access these components.

    Parsing URLs

    The urllib.parse.urlparse() function takes a URL string and returns a ParseResult, a named tuple whose fields hold the different components:

    from urllib.parse import urlparse
    
    url = 'https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere'
    parsed = urlparse(url)
    
    print(parsed.scheme) # https 
    print(parsed.netloc) # www.example.com
    print(parsed.path) # /path/to/page
    print(parsed.query) # key1=value1&key2=value2
    print(parsed.fragment) # Somewhere

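    There are also convenience attributes such as parsed.hostname (the lower-cased host name without any port) and parsed.port (the port number as an integer, or None if the URL does not specify one).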
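As a minimal sketch of those attributes, the snippet below parses a URL that carries an explicit port (the URL is made up for illustration). It also decodes the query string with parse_qs(), another function from the same module, since the raw parsed.query value is only a string.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL with an explicit port, chosen for illustration.
url = 'https://www.example.com:8443/path/to/page?key1=value1&key2=value2'
parsed = urlparse(url)

print(parsed.netloc)    # www.example.com:8443
print(parsed.hostname)  # www.example.com
print(parsed.port)      # 8443

# parse_qs maps each key to a list of values, because a key may repeat.
params = parse_qs(parsed.query)
print(params)           # {'key1': ['value1'], 'key2': ['value2']}
```

Each value is a list because a query string may repeat a key, so a single value is read as params['key1'][0].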

    Composing and Joining URLs

    You can also compose or reconstruct a URL from its parsed components using urllib.parse.urlunparse():

    from urllib.parse import urlunparse
    
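    # Six components in order: scheme, netloc, path, params, query, fragment.
    # params is rarely used, so it is left as an empty string here.
    data = ['https', 'www.example.com', '/path/to/page', '', 'key1=value1&key2=value2', 'Somewhere']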
    print(urlunparse(data)) 
    # https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere

    This lets you modify a URL programmatically: parse it, change individual components, and reassemble the result.
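One way to sketch that parse, change, and reassemble cycle is shown below. It relies on the fact that urlparse() returns a named tuple, so _replace() produces a modified copy. The new parameter values ('changed' and 'page') are arbitrary choices for illustration.

```python
from urllib.parse import urlparse, urlunparse, urlencode, parse_qs

url = 'https://www.example.com/path/to/page?key1=value1&key2=value2#Somewhere'
parsed = urlparse(url)

# Decode the query string, edit it, and encode it again.
params = parse_qs(parsed.query)
params['key2'] = ['changed']   # overwrite an existing parameter
params['page'] = ['2']         # add a new, illustrative parameter
new_query = urlencode(params, doseq=True)

# _replace returns a copy with the given fields swapped out.
updated = parsed._replace(query=new_query, fragment='')
new_url = urlunparse(updated)
print(new_url)
# https://www.example.com/path/to/page?key1=value1&key2=changed&page=2
```

The doseq=True argument tells urlencode() to expand each list of values into separate key=value pairs, which matches the list-valued dictionary that parse_qs() produces.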

    The urllib.parse module contains other useful functions, such as urljoin(), which resolves a relative URL against a base URL. Mastering URL manipulation unlocks many possibilities for Python web programming.
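A short sketch of urljoin() follows, reusing the example URL from earlier as the base. The relative references ('other', '../up', '/root') are made up for illustration.

```python
from urllib.parse import urljoin

base = 'https://www.example.com/path/to/page'

# A bare name replaces the last path segment of the base.
sibling = urljoin(base, 'other')
print(sibling)   # https://www.example.com/path/to/other

# '..' climbs one directory before appending.
parent = urljoin(base, '../up')
print(parent)    # https://www.example.com/path/up

# A leading slash replaces the whole path.
rooted = urljoin(base, '/root')
print(rooted)    # https://www.example.com/root

# With a trailing slash on the base, the last segment is kept as a directory.
nested = urljoin(base + '/', 'other')
print(nested)    # https://www.example.com/path/to/page/other
```

The trailing slash matters: without it, the last segment of the base is treated as a file name and is replaced instead of extended.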
