One of the most common tasks in programming is retrieving data from the internet. The Python standard library provides the urllib package to handle everyday web functionality like opening URLs. A handy method on the response object returned by urllib.request.urlopen() is read(), which lets you easily download the content of a web page.
How read() Works
The read() method is called on the file-like response object that urllib.request.urlopen() returns:

import urllib.request

with urllib.request.urlopen('http://example.com') as f:
    page_data = f.read()
The data returned by read() is a bytes object, not a string. To work with it as text, decode it with the appropriate encoding:

text = page_data.decode('utf-8')
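Hard-coding 'utf-8' works for many pages, but the server usually declares its encoding in the Content-Type header, which you can query via the response's headers attribute. A minimal sketch (it uses a data: URL purely so the example runs without network access; in practice you'd pass a normal http(s) URL):

```python
import urllib.request

# data: URL stands in for a real page so the example is self-contained.
url = 'data:text/html;charset=utf-8,caf%C3%A9'

with urllib.request.urlopen(url) as f:
    # get_content_charset() reads the charset from the Content-Type
    # header; fall back to utf-8 if the server did not declare one.
    encoding = f.headers.get_content_charset() or 'utf-8'
    text = f.read().decode(encoding)

print(text)  # -> café
```

Falling back to a default matters because not every server sends a charset, and decode() with the wrong encoding raises UnicodeDecodeError.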
By default, read() returns the entire response body at once. You can also pass an optional size argument to read at most that many bytes:

partial_data = f.read(100)  # Reads at most 100 bytes
This allows you to retrieve data in chunks if the page is very large.
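The usual pattern for chunked reading is a loop that calls read(size) until it returns an empty bytes object, which signals the end of the stream. A minimal sketch (again using a data: URL so it runs offline; substitute any http(s) URL):

```python
import urllib.request

# data: URL keeps the example self-contained; use a real URL in practice.
url = 'data:text/plain;charset=utf-8,hello%20world'

chunks = []
with urllib.request.urlopen(url) as f:
    while True:
        chunk = f.read(1024)   # read at most 1024 bytes per iteration
        if not chunk:          # b'' means the stream is exhausted
            break
        chunks.append(chunk)

body = b''.join(chunks)
print(body.decode('utf-8'))  # -> hello world
```

This keeps memory usage bounded by the chunk size, which matters when downloading large files.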
Handling Errors
One pitfall to watch out for is the errors that can occur when opening a URL. It's best to wrap the call in a try/except (URLError lives in urllib.error):

from urllib.error import URLError

try:
    with urllib.request.urlopen(url) as f:
        data = f.read()
except URLError as e:
    print('URL failed:', e.reason)
This will gracefully handle issues like invalid URLs or network errors.
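If you want to treat HTTP error responses (404, 500, and so on) differently from connection-level failures, you can catch urllib.error.HTTPError separately. One possible sketch, wrapping the pattern in a small helper function (the name fetch is just for illustration):

```python
import urllib.request
from urllib.error import HTTPError, URLError


def fetch(url):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url) as f:
            return f.read()
    except HTTPError as e:    # server replied, but with an error status
        print('Server error:', e.code, e.reason)
    except URLError as e:     # DNS failure, refused connection, bad scheme...
        print('URL failed:', e.reason)
    return None


# An unsupported URL scheme raises URLError, so this prints a message
# and returns None:
print(fetch('bogus://nope'))
```

Note that HTTPError is a subclass of URLError, so it must be listed first; the other way around, the URLError clause would swallow both.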
In Summary
The read() method on the response returned by urllib.request.urlopen() is a simple, dependency-free way to download web content. Just remember that it returns bytes rather than text, that you can pass a size argument to retrieve data in chunks, and that calls should be wrapped in error handling for robustness.