When fetching data from a URL using Python's urllib module, the response body is returned as bytes. Often, we want to work with this data as a text string instead. Converting between the two takes one method call in each direction: decode() turns bytes into a string, and encode() turns a string into bytes.
import urllib.request
response = urllib.request.urlopen("http://example.com")
html_bytes = response.read()  # Read response body as bytes
To decode bytes to a string, we need to know the character encoding that was used to encode the bytes. Common encodings are UTF-8, ASCII, and Latin-1.
We can usually find the encoding in the response headers:
encoding = response.headers.get_content_charset() or "utf-8"  # Get encoding from headers, defaulting to UTF-8
If the headers specify no encoding, get_content_charset() returns None, so we fall back to UTF-8, which is a safe bet.
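To see what the header lookup returns without making a network request, here is a minimal sketch that builds header objects by hand. It assumes `email.message.Message` as a stand-in for `response.headers` (urllib's header object is a subclass of it), and the helper name `charset_or_default` is made up for illustration.

```python
from email.message import Message


def charset_or_default(headers, default="utf-8"):
    """Return the charset declared in Content-Type, or `default` if none is declared."""
    return headers.get_content_charset() or default


# Hand-built headers standing in for response.headers (an assumption of this sketch).
with_charset = Message()
with_charset["Content-Type"] = "text/html; charset=ISO-8859-1"

without_charset = Message()
without_charset["Content-Type"] = "text/html"

print(charset_or_default(with_charset))     # the charset name comes back lower-cased
print(charset_or_default(without_charset))  # no charset declared, so the default is used
```

The same helper can be called with a real `response.headers` object, since it only relies on `get_content_charset()`.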
Once we have the encoding, we can decode the bytes:
html_string = html_bytes.decode(encoding) # Decode bytes
print(html_string)
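Servers sometimes declare a charset that does not match the body, or one Python does not recognise, and a strict decode then raises an exception. The following is one possible way to handle that, not the only one: a hypothetical helper, `decode_body`, tries the declared charset first and otherwise falls back to UTF-8 with undecodable bytes replaced.

```python
def decode_body(body, charset=None, fallback="utf-8"):
    """Decode `body` with `charset` if possible, else with `fallback`, replacing bad bytes."""
    if charset:
        try:
            return body.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown encoding name, or bytes that are invalid in that encoding.
            pass
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return body.decode(fallback, errors="replace")


latin1_bytes = "café".encode("latin-1")      # the é becomes the single byte 0xE9
print(decode_body(latin1_bytes, "latin-1"))  # decoded with the declared charset
print(decode_body(latin1_bytes))             # no charset given: UTF-8 with a replacement character
```

Replacing bad bytes keeps the program running at the cost of losing some characters, so it suits display and logging better than cases where the exact text matters.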
We may also encode strings into bytes:
data = "hello world"
data_bytes = data.encode(encoding)  # Encode string to bytes
When posting data to a URL, it often needs to be URL encoded into bytes before sending:
from urllib.parse import quote_plus
data = "hello world"
url_encoded_data = quote_plus(data) # URL encode string
data_bytes = url_encoded_data.encode(encoding)  # Encode to bytes
So in Python's urllib, we can easily convert between bytes and strings for request/response bodies using the decode() and encode() methods.
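For form-style POST bodies with more than one field, `urllib.parse.urlencode` quotes each key and value and joins them. The sketch below builds a request object without sending it; the URL `http://example.com/submit` and the field names are placeholders chosen for illustration.

```python
import urllib.request
from urllib.parse import urlencode

fields = {"q": "hello world", "lang": "en"}  # illustrative form fields

# urlencode quotes each key and value (spaces become "+") and joins pairs with "&".
body = urlencode(fields).encode("ascii")     # the quoted output contains only ASCII

# Build the request object only; nothing is sent until urlopen(request) is called.
request = urllib.request.Request(
    "http://example.com/submit",              # placeholder URL
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(body)
print(request.get_method())
```

Passing `request` to `urllib.request.urlopen` would send the body, and the response could then be read and decoded in the same way as before.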