Reading CSV Files with Python's urllib

Feb 8, 2024 · 3 min read

Comma-separated values (CSV) files are a common format for storing tabular data such as spreadsheet or database exports. You often need to read a CSV file into a Python program for analysis or reporting. Python's built-in urllib package can download the file, and the built-in csv module can parse it.

In this article, we'll cover the key things you need to know to read CSV data from a URL using Python.

Downloading the CSV File

The first step is to download the raw CSV file contents from a URL. Here's an example using urllib.request:

import urllib.request

url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"  

response = urllib.request.urlopen(url)
csv_data = response.read()

This opens the URL, gets the response, and reads the raw bytes into the csv_data variable.
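As a hedged variation on the download step, the sketch below wraps it in a small helper that closes the connection when done and decodes the bytes into text. The function name fetch_text, the default_encoding fallback, and the 10-second timeout are assumptions chosen for illustration, not part of the original example.

```python
import urllib.request


def fetch_text(url, default_encoding="utf-8", timeout=10):
    """Download a URL and return its body as a str (hypothetical helper)."""
    # The with-block closes the connection even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset from the Content-Type header when the server
        # sends one; otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset)
```

Calling fetch_text(url) with the URL from the example above would return the file contents as a string, ready to be split into lines and parsed.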

Parsing as CSV

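Next we need to parse the CSV data. The built-in csv module handles parsing. Because response.read() returns bytes and csv.reader expects text, decode the data first:

import csv

rows = csv.reader(csv_data.decode("utf-8").splitlines())

for row in rows:
    print(row)

Here we:

  • Decode the raw bytes into text (assuming the file is UTF-8 encoded)
  • Split the text into lines
  • Create a csv.reader object
  • Iterate over the rows

By default, csv.reader expects commas as delimiters. You can change the delimiter or the dialect when the file uses a different format.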
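To illustrate a custom delimiter, here is a minimal sketch that parses semicolon-separated text. The sample data is made up for this example and stands in for a downloaded file.

```python
import csv

# Made-up sample data using ";" instead of "," as the separator.
sample = "name;city\nAda;London\nLinus;Helsinki"

# The delimiter keyword tells csv.reader which character separates fields.
rows = list(csv.reader(sample.splitlines(), delimiter=";"))

for row in rows:
    print(row)
```

Other common choices are delimiter="\t" for tab-separated files and delimiter="|" for pipe-separated files.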

Putting It Together

Here is the full code to download a CSV file from a URL and parse it with csv.reader:

import urllib.request
import csv

url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"

response = urllib.request.urlopen(url)
csv_data = response.read()  # raw bytes

# csv.reader expects text, so decode the bytes first (assuming UTF-8)
rows = csv.reader(csv_data.decode("utf-8").splitlines())

for row in rows:
    print(row)

This is the complete workflow for going from a URL to parsed CSV rows.

Dealing with Headers

CSV files often start with a header row that names each column. We can skip it easily:

next(rows)  # skip header row
for row in rows:
    print(row)

Calling next(rows) advances the reader past the first row, so the loop starts at the first data row.
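If you would rather use the header than discard it, csv.DictReader is one option: it reads the first row as column names and yields each later row as a dictionary. The sketch below uses made-up sample text in place of a downloaded file.

```python
import csv
import io

# Made-up sample data; the first line is the header row.
sample = "name,city\nAda,London\nLinus,Helsinki\n"

# DictReader consumes the header and maps each value to its column name.
reader = csv.DictReader(io.StringIO(sample, newline=""))
records = list(reader)

for record in records:
    print(record["name"], record["city"])
```

Accessing fields by name keeps the code working if the column order in the file changes.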

Practical Considerations

Here are some things to keep in mind when using urllib to read CSV data:

  • Handling large files - For very large CSVs, you may want to stream the rows instead of loading the entire file contents into memory at once.
  • Data types - The CSV reader returns string values by default. You'll need to cast them to numbers, dates, etc.
  • Delimiter handling - Make sure the delimiter matches what the CSV file is actually using.
  • Encoding - response.read() returns bytes, so decode them with the encoding the file actually uses (often UTF-8) before parsing.
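The first two points in the list above can be sketched as follows. The helper names iter_rows and to_number are hypothetical, and the approach assumes the source is a binary file-like object, such as the response returned by urllib.request.urlopen.

```python
import csv
import io


def iter_rows(binary_stream, encoding="utf-8", delimiter=","):
    """Yield CSV rows one at a time from a binary file-like object."""
    # TextIOWrapper decodes the bytes incrementally, so the whole file
    # never has to be held in memory at once.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    yield from csv.reader(text_stream, delimiter=delimiter)


def to_number(value):
    """Convert a CSV string to int or float when possible, else return it unchanged."""
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value
```

With a real download, you would pass the open response to iter_rows inside a with urllib.request.urlopen(url) block and loop over the rows there, before the connection is closed.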

Use Cases

Some examples of cases where using urllib to download and parse CSV data is helpful:

  • Loading data from public CSV APIs for analysis
  • Ingesting CSV reports from another internal system
  • Populating data into applications like dashboards or ML models

Direct URL access makes it easy to integrate remote CSV data into Python programs.

Next Steps

To take your CSV skills further:

  • Learn how to write CSV data to files
  • Look at the pandas library for more advanced data analysis
  • Handle streaming large CSV files in chunks for better memory utilization

I hope this overview gives you a solid starting point for using Python and urllib to access CSV file data!
