Encoding URLs in Python with urllib

Feb 8, 2024 · 2 min read

When building web applications in Python, you'll often need to encode URLs and their components so that they are valid and can be transmitted properly between the client and server. The urllib.parse module in Python's standard library provides functions for encoding whole URLs and their parts, such as the path and query parameters.

The most common reason to encode URLs is that they may contain characters with special meaning in URLs, such as & and =, or characters that are not allowed at all, such as spaces. If left unencoded, these can change how the URL is parsed or make it invalid. Additionally, URLs are limited to a subset of ASCII, so text in other alphabets must be percent-encoded (as UTF-8 bytes by default) before it can be transmitted.
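To make the problem concrete, here is a small sketch of what happens when a value containing & and = is placed in a query string without encoding. The value "salt & pepper=tasty" and the parameter name q are made up for illustration; parse_qs() stands in for whatever parsing the receiving server does.

```python
from urllib.parse import parse_qs, quote_plus

# Hypothetical query value containing characters that are special in URLs.
value = "salt & pepper=tasty"

# Unencoded: the "&" is read as a parameter separator.
raw_query = "q=" + value
raw_parsed = parse_qs(raw_query)

# Encoded: the whole value survives as a single parameter.
encoded_query = "q=" + quote_plus(value)
encoded_parsed = parse_qs(encoded_query)

print(raw_parsed)      # the value is split across two keys
print(encoded_parsed)  # {'q': ['salt & pepper=tasty']}
```

In the unencoded case the receiver sees two parameters instead of one, which is the kind of issue encoding prevents.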

Why Encode URL Components

Let's look at an example URL:

https://www.example.com/path with spaces?query=foo bar

The path contains spaces and the query value contains a space as well. These spaces need encoding before sending this URL to a server.

We can encode the full URL, the path, and the query parameters separately using urllib:

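from urllib.parse import quote, quote_plus

url = "https://www.example.com/path with spaces?query=foo bar"

# Keep the URL delimiters (: / ? =) intact. With safe='/' alone, quote()
# would also encode them and produce "https%3A//www.example.com/..."
full_url = quote(url, safe=':/?=')  # https://www.example.com/path%20with%20spaces?query=foo%20bar
path = quote("/path with spaces")   # /path%20with%20spaces
query = quote_plus("foo bar")       # foo+bar

The safe parameter lists characters that should not be encoded. It defaults to '/' for quote() and to an empty string for quote_plus(). For the full URL, the delimiters ':', '/', '?', and '=' must stay unencoded, so we pass all of them in safe. quote_plus() also replaces spaces with '+', which is the form commonly used in query strings.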
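The effect of the safe parameter, and the difference between quote() and quote_plus(), is easiest to see side by side. The following is a minimal sketch; the sample string "a b/c&d" is arbitrary.

```python
from urllib.parse import quote, quote_plus

text = "a b/c&d"

# quote() treats "/" as safe by default and encodes spaces as %20.
default_quote = quote(text)

# Passing safe="" encodes "/" as well, which suits a single path segment.
strict_quote = quote(text, safe="")

# quote_plus() encodes "/" by default and turns spaces into "+".
plus_quote = quote_plus(text)

print(default_quote)  # a%20b/c%26d
print(strict_quote)   # a%20b%2Fc%26d
print(plus_quote)     # a+b%2Fc%26d
```

As a rule of thumb, quote() fits path components and quote_plus() fits query-string values.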

When to Encode Parts

When building URLs in code, it's best to encode each component as you construct the full URL. For example:

from urllib.parse import quote, urlencode

base = "https://www.example.com"
path = "/my path"
query = {"foo": "bar"}

path = quote(path)        # /my%20path
query = urlencode(query)  # foo=bar

full_url = base + path + "?" + query  # https://www.example.com/my%20path?foo=bar

This way each piece gets properly encoded before placing it into the full URL string.
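The same pattern can be wrapped in a small function. This is only a sketch: build_url is a hypothetical helper name, not part of urllib, and it assumes the base is already a valid scheme and host.

```python
from urllib.parse import quote, urlencode


def build_url(base, path, params=None):
    """Hypothetical helper: encode the path and query, then join them to base."""
    url = base + quote(path)  # "/" stays unencoded, spaces become %20
    if params:
        # urlencode() applies quote_plus() to every key and value.
        url += "?" + urlencode(params)
    return url


simple = build_url("https://www.example.com", "/my path", {"foo": "bar"})
spaced = build_url("https://www.example.com", "/my path", {"q": "foo bar"})
no_query = build_url("https://www.example.com", "/my path")

print(simple)    # https://www.example.com/my%20path?foo=bar
print(spaced)    # https://www.example.com/my%20path?q=foo+bar
print(no_query)  # https://www.example.com/my%20path
```

If the receiving system expects %20 rather than + in query values, urlencode() accepts quote_via=quote to switch the encoding function.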

Takeaways

  • Use quote() and quote_plus() from urllib.parse to encode URLs and components, and urlencode() to build query strings from a dict
  • Encode parts like path and query as you construct URLs
  • Pass a safe parameter to leave certain characters like / unencoded if needed
  • Properly encoding the URLs you generate in your Python code will ensure they can be transmitted successfully to other systems.
