Leveraging Asynchronous I/O with Asyncio for Faster File Operations

Mar 25, 2024 · 4 min read

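Asynchronous I/O, or async IO for short, lets a Python program start an I/O operation and keep working on other tasks instead of blocking until that operation finishes. The asyncio module, part of the standard library since Python 3.4, provides an event loop and utilities for writing non-blocking, concurrent code. Regular files have no native non-blocking interface in asyncio, so file operations are typically handed off to worker threads, as the examples below do.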

Learning how to properly utilize asyncio can help optimize Python applications that work with lots of file I/O. This guide will walk through the basics of async IO in Python and demonstrate how to speed up common file operations like reading, writing, and copying files.

Why Asynchronous I/O Matters

Blocking I/O happens when a program must stop and wait for a file operation to complete before continuing. While it waits, the program does no useful work, which slows down applications that perform many I/O operations. Asynchronous I/O lets a program keep executing other tasks while an I/O operation completes in the background.

For example, a blocking approach to reading two files would look like:

def read_files_blocking():
    contents = read_file1()        # execution pauses until file 1 is read
    other_contents = read_file2()  # ...then pauses again for file 2
    return contents, other_contents

The async approach allows both reads to be kicked off concurrently:

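import asyncio

async def read_files_async():
    coroutine1 = read_file1_async()  # creates the file 1 read; nothing runs yet
    coroutine2 = read_file2_async()  # creates the file 2 read
    # gather schedules both coroutines as tasks and waits until both finish
    contents, other_contents = await asyncio.gather(coroutine1, coroutine2)
    return contents, other_contents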

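By using asyncio, the program overlaps the two waits instead of performing them back to back, so the total time approaches that of the slower read rather than the sum of both. The same pattern scales to many concurrent operations, which makes applications faster and more scalable.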
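To see the effect without touching the disk, here is a minimal sketch that stands in for file reads with asyncio.sleep. The helper fake_read and the 0.2-second delay are illustrative assumptions, not real file I/O. Awaiting the two calls one after another takes roughly the sum of the delays, while asyncio.gather overlaps them.

```python
import asyncio
import time

async def fake_read(name, delay=0.2):
    # Hypothetical stand-in for a slow read: sleeping hands control back to the loop
    await asyncio.sleep(delay)
    return f"contents of {name}"

async def sequential():
    # Each await finishes before the next "read" starts
    a = await fake_read("file1")
    b = await fake_read("file2")
    return a, b

async def concurrent():
    # gather wraps both coroutines in tasks so their waits overlap
    return tuple(await asyncio.gather(fake_read("file1"), fake_read("file2")))

def timed(coro_fn):
    # Run a coroutine function on a fresh event loop and measure wall-clock time
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start

seq_result, seq_time = timed(sequential)
con_result, con_time = timed(concurrent)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Exact timings vary by machine, but the concurrent version should take about half as long as the sequential one.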

Asyncio Event Loop

The asyncio module centers around an event loop, which runs asynchronous tasks and callbacks. Coroutines are wrapped in tasks and scheduled on this loop, which lets them run concurrently. When a coroutine reaches an await on an operation that is not ready yet, it suspends and the loop runs other ready tasks; once the operation completes, the loop resumes the coroutine where it left off.

We can start an event loop and run a simple coroutine on it:

import asyncio

async def my_callback():
    print('Hello async world!')

# asyncio.run() creates a new event loop, runs the coroutine to completion, then closes the loop
asyncio.run(my_callback())

The coroutine my_callback() runs on the loop until it finishes. Because it contains no await, it never gives up control; other tasks only get a turn when a coroutine suspends at an await.
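A minimal sketch of two coroutines actually sharing the loop, using illustrative names worker and log: each await asyncio.sleep(0) hands control back to the event loop, which then resumes the other ready task, so the two workers take turns.

```python
import asyncio

log = []

async def worker(name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")
        # Yield to the event loop so other ready tasks can run
        await asyncio.sleep(0)

async def main():
    # Both workers run as tasks on the same loop and interleave at each await
    await asyncio.gather(worker("A", 3), worker("B", 3))

asyncio.run(main())
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Without the await inside the loop body, worker A would append all of its entries before worker B ever started.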

Async File Operations with Asyncio

Let's look at some common file operations and how asyncio lets us run them without blocking the event loop.

Reading Files Asynchronously

To read a file asynchronously:

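import asyncio

async def read_file_async(filename):
    loop = asyncio.get_running_loop()
    with open(filename) as f:  # open() itself still runs on the event loop thread
        # Run the blocking f.read in the loop's default thread pool
        contents = await loop.run_in_executor(None, f.read)
    return contents

# A script cannot use await at the top level, so start the loop with asyncio.run
contents = asyncio.run(read_file_async('example.txt'))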

The key ingredients here:

  • Coroutine function declared with async def
  • loop.run_in_executor runs the blocking read in a worker thread (the loop's default executor when the first argument is None), so the event loop does not freeze
  • await the returned future to retrieve the result
  • While the read is in progress, the event loop is free to run other tasks
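Passing None to run_in_executor uses the loop's default thread pool. The sketch below instead assumes you want to cap the number of worker threads yourself, so it passes a concurrent.futures.ThreadPoolExecutor and reads several files at once. The helper read_text and the temporary file names are hypothetical. Here the whole open-and-read step runs in the worker thread, so even open() stays off the event loop.

```python
import asyncio
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_text(path):
    # Runs in a worker thread: both open() and read() block there, not on the loop
    with open(path, encoding="utf-8") as f:
        return f.read()

async def read_many(paths, max_workers=4):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [loop.run_in_executor(pool, read_text, p) for p in paths]
        # gather returns results in the same order as paths
        return await asyncio.gather(*futures)

# Demo: create two temporary files, then read them concurrently
tmpdir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(["first file", "second file"]):
    path = os.path.join(tmpdir, f"example{i}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    paths.append(path)

results = asyncio.run(read_many(paths))
print(results)  # ['first file', 'second file']
```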

Writing Files Asynchronously

Similarly, we can write data asynchronously:

import asyncio

async def write_file_async(filename, data):
    loop = asyncio.get_running_loop()
    with open(filename, 'w') as f:
        # Run the blocking f.write in the loop's default thread pool
        await loop.run_in_executor(None, f.write, data)

asyncio.run(write_file_async('results.txt', 'some data'))

This allows us to avoid blocking while writing files as well.
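On Python 3.9 and later, asyncio.to_thread wraps the same idea (running a blocking function in a worker thread) without fetching the loop yourself. A minimal sketch, assuming a hypothetical helper write_text and temporary file names, writes several files concurrently:

```python
import asyncio
import os
import tempfile

def write_text(path, data):
    # Blocking write, executed in a worker thread by asyncio.to_thread
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)

async def write_many(items):
    # items maps file path -> text; all writes are started together
    await asyncio.gather(
        *(asyncio.to_thread(write_text, path, data) for path, data in items.items())
    )

tmpdir = tempfile.mkdtemp()
items = {
    os.path.join(tmpdir, "results1.txt"): "some data",
    os.path.join(tmpdir, "results2.txt"): "more data",
}
asyncio.run(write_many(items))

for path in items:
    with open(path, encoding="utf-8") as f:
        print(os.path.basename(path), "->", f.read())
```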

Copying Files Asynchronously

We can combine the async read and write approaches to copy a file without blocking the event loop:

import asyncio

async def copy_file_async(src, dest):
    loop = asyncio.get_running_loop()
    with open(src) as source, open(dest, 'w') as destination:
        # The write needs the read's result, so these two steps run one after the other
        contents = await loop.run_in_executor(None, source.read)
        await loop.run_in_executor(None, destination.write, contents)

asyncio.run(copy_file_async('source.txt', 'destination.txt'))

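Within a single copy, the read and write cannot overlap, because the write needs the data the read returns. The speedup comes from running several independent copies concurrently, for example by passing multiple copy_file_async() coroutines to asyncio.gather. Note also that source.read() loads the whole file into memory, which can be a problem for very large files.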
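A sketch of copying several independent files at once, assuming the standard library's shutil.copyfile run through asyncio.to_thread (Python 3.9+) and hypothetical temporary file names. shutil.copyfile copies in chunks (or via a platform fast-copy call), so it avoids reading each whole file into memory the way source.read() does.

```python
import asyncio
import os
import shutil
import tempfile

async def copy_many(pairs):
    # pairs is a list of (src, dest) tuples; each copy runs in its own worker thread
    return await asyncio.gather(
        *(asyncio.to_thread(shutil.copyfile, src, dest) for src, dest in pairs)
    )

tmpdir = tempfile.mkdtemp()
pairs = []
for i in range(3):
    src = os.path.join(tmpdir, f"source{i}.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write(f"payload {i}")
    pairs.append((src, os.path.join(tmpdir, f"destination{i}.txt")))

copied = asyncio.run(copy_many(pairs))
print(copied)  # destination paths, in the same order as pairs
```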

Key Takeaways

  • Asyncio provides infrastructure for asynchronous I/O in Python
  • Use async/await syntax to define coroutines that suspend at await points instead of blocking
  • Coroutines run as tasks on an event loop, which interleaves them to allow concurrent execution
  • run_in_executor offloads blocking file I/O to worker threads, keeping the event loop responsive
  • Overlapping independent async operations, for example with asyncio.gather, leads to faster file processing

Adopting async I/O opens the door to more performant and scalable Python programs that deal with many file operations or networked resources. It takes rethinking blocking assumptions, and file I/O in particular still runs in threads behind the scenes, but overlapping independent operations can noticeably reduce total run time.
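Programs that touch many files usually need to limit how many operations are in flight at once. One common pattern, sketched here with asyncio.Semaphore, a hypothetical count_chars helper that just measures file length, and an assumed limit of 2, keeps at most that many reads running concurrently and records the peak it observed:

```python
import asyncio
import os
import tempfile

def count_chars(path):
    # Hypothetical blocking work: read a file and return its length
    with open(path, encoding="utf-8") as f:
        return len(f.read())

async def process_all(paths, limit=2):
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def process(path):
        nonlocal active, peak
        async with sem:  # at most `limit` coroutines are inside this block at once
            active += 1
            peak = max(peak, active)
            try:
                return await asyncio.to_thread(count_chars, path)
            finally:
                active -= 1

    sizes = await asyncio.gather(*(process(p) for p in paths))
    return sizes, peak

tmpdir = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmpdir, f"f{i}.txt")
    with open(p, "w", encoding="utf-8") as f:
        f.write("x" * (i + 1))
    paths.append(p)

sizes, peak = asyncio.run(process_all(paths))
print(sizes, peak)
```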
