Capturing Screenshots with Puppeteer - An advanced guide

Jan 9, 2024 · 9 min read

What is Puppeteer?

In a nutshell, Puppeteer is a Node.js library that provides an API for controlling headless Chrome.

I know that sounds kinda geeky, but here's what it really means:

  1. It lets you automate everything that you'd normally do manually in a browser - click buttons, fill forms, scroll pages, press keys etc.
  2. It runs everything inside a headless Chrome instance, i.e. there's no visible browser UI. It all happens silently in the background.

This makes it super fast and ideal for web scraping/automation tasks!

And as I discovered in my early trials and tribulations, Puppeteer also makes capturing web screenshots simple, flexible and reliable.

Let me walk you through it step-by-step...

Getting Set Up with Puppeteer

I'll assume you already have Node.js installed on your machine. If not, grab the latest LTS release for your OS from nodejs.org first.

The good news is installing Puppeteer takes just one command! 💪

npm install puppeteer

This downloads the Puppeteer package along with a compatible browser binary (Chrome for Testing in recent releases, Chromium in older ones) that it will use under the hood.

You can also install Puppeteer globally with the -g flag, but local installs are considered a best practice. If a global install runs into permission issues on Mac/Linux, you may need elevated privileges:

sudo npm install -g puppeteer

Once the install finishes, you're all set up for screenshot magic!

Let's start with a basic example...

Taking Your First Puppeteer Screenshot

Create a new file and import Puppeteer:

const puppeteer = require('puppeteer');

Then add the following async script:

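(async () => {
  // Launch a headless browser and open a blank tab
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Load the page, then save a screenshot of the visible viewport
  await page.goto('https://www.example.com');
  await page.screenshot({path: 'example.png'});

  // Wait for the browser to shut down before the script exits
  await browser.close();
})();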

Breaking this down:

  • puppeteer.launch() opens a headless Chrome instance.
  • browser.newPage() opens a new blank tab.
  • page.goto() navigates to the URL we want to screenshot.
  • page.screenshot() captures and saves the screenshot.
  • browser.close() shuts the browser down once we're done.

Finally, run the script:

    node script.js
    

    And voila! You should now have a basic screenshot saved to your filesystem. 🥳

    Now while this works, the output is less than stellar. The screenshot only captures the visible viewport, which Puppeteer sets to 800×600 pixels by default.

    Let's fix that next...

    Capturing Full Page Screenshots with Puppeteer

    To take complete screenshots that cover the full page height, we need to add fullPage: true to the screenshot options:

    await page.screenshot({path: 'example.png', fullPage: true});
    

    This captures the entire scrollable height of the page as one long continuous screenshot, not just the part that fits in the viewport.

    Depending on page height and content, full page screenshots take slightly longer than regular ones as they have more work to do behind the scenes!

    We can also set the explicit viewport size upfront using page.setViewport():

    await page.setViewport({width: 1280, height: 1600})
    

    This makes sure the page renders correctly on bigger screens before capturing.

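    Pro Tip: Use wider viewports for readability. With fullPage: true the capture covers the whole scrollable height regardless of viewport height, but a taller viewport can help content that only loads once it scrolls into view.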

    Here's our updated script:

    const puppeteer = require('puppeteer');
    
    (async () => {
    
      const browser = await puppeteer.launch();
    
      const page = await browser.newPage();
    
      await page.setViewport({width: 1280, height: 1600});
    
      await page.goto('https://www.example.com');
    
      await page.screenshot({path: 'example.png', fullPage: true});
    
      await browser.close();
    
    })();
    

    Give this a whirl on a long web page and you'll now get a nice, clean full page vertical screenshot!

    Let's keep exploring what else we can do...

    Taking Screenshots of Specific HTML Elements

    Sometimes you may not need the entire page screenshot. Instead, you may want to capture only a particular section, div or element on the page.

    Puppeteer makes this easy with element selectors.

    For this example, let's say we want to screenshot just the navigation bar. The selector copied from the inspect element tool is:

    #main-nav
    

    We can use page.$() to grab this element and elementHandle.screenshot() to capture it:

    const navbar = await page.$('#main-nav');
    
    await navbar.screenshot({path: 'navbar.png'});
    

    The same selector concepts apply as with DOM manipulation in the browser or by libraries like jQuery.

    You can use classes, ids, attribute selectors, or even advanced CSS combinators to isolate specific components on the page!
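One detail worth guarding against: page.$() resolves to null when nothing matches the selector, so calling screenshot() on the result would throw. Here is a minimal sketch of a defensive wrapper. The name screenshotElement is made up for illustration and is not part of Puppeteer.

```javascript
// Hypothetical helper: screenshot one element, or report that it is missing.
// `page` is expected to be a Puppeteer Page (anything with a compatible $() works).
async function screenshotElement(page, selector, path) {
  const element = await page.$(selector);
  if (element === null) {
    // page.$() resolves to null when nothing matches the selector
    return false;
  }
  await element.screenshot({ path });
  return true;
}
```

Calling it as await screenshotElement(page, '#main-nav', 'navbar.png') gives back true when the file was written and false when the selector matched nothing, so the caller decides how to react.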

    Pro Tip: Combine full page and element screenshots to create partial "slices" of pages!
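One way to read the "slices" tip is through the clip option of page.screenshot(), which limits the capture to a rectangle. The sketch below is a hypothetical helper (sliceClips is not a Puppeteer function) that works out the rectangles for cutting a page of known size into fixed-height slices.

```javascript
// Hypothetical helper: split a page of `pageHeight` px into clip rectangles
// that are at most `sliceHeight` px tall. Not a Puppeteer API.
function sliceClips(pageWidth, pageHeight, sliceHeight) {
  if (sliceHeight <= 0) throw new RangeError('sliceHeight must be positive');
  const clips = [];
  for (let y = 0; y < pageHeight; y += sliceHeight) {
    clips.push({
      x: 0,
      y,
      width: pageWidth,
      // The last slice may be shorter than the rest
      height: Math.min(sliceHeight, pageHeight - y),
    });
  }
  return clips;
}
```

Each rectangle could then be passed along as page.screenshot({path, clip}), with the page height read beforehand via page.evaluate(() => document.documentElement.scrollHeight). I'm assuming a recent Puppeteer version here, where clips that extend past the viewport are captured by default.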

    Let's see another useful screenshot scenario...

    Capturing Multiple Screenshots in a Loop

    A common real-world requirement is needing to screenshot numerous links or pages in one go.

    Say for example, you want to capture screenshots of the first 10 search results for a Google query.

    Here's how to loop through the links and save screenshots with dynamic file names:

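    for (let i = 0; i < 10; i++) {
      // Re-query on every pass: handles grabbed before a navigation go stale
      const links = await page.$$('.search-results a');
      if (!links[i]) break; // Fewer than 10 results

      // Start waiting for the navigation before the click that triggers it
      await Promise.all([page.waitForNavigation(), links[i].click()]);

      await page.screenshot({path: `result-${i}.png`});

      await page.goBack();
    }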
    

    We use page.$$() here to grab all the specific links in an array. This uses document.querySelectorAll() internally.

    We iterate through the first 10, clicking each link, taking a screenshot, then navigating back to return to the results.

    This saves the screenshots programmatically with file names like result-0.png, result-1.png etc.

    The same approach allows batch screenshotting categories, filtered image grids, product listings, galleries, menus and almost anything!

    Pro Tip: Wrap this in an async function to make the script reusable. Pass the base URL, selector and filenames as arguments.
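Here is one possible shape for such a reusable function. It's a sketch under a few assumptions: the name screenshotLinks and its option names are invented for illustration, and instead of clicking it reads each link's href up front and visits the URLs directly, which sidesteps stale element handles.

```javascript
// Hypothetical reusable wrapper for batch screenshots. All option names are
// illustrative; `page` is a Puppeteer Page that the caller has already created.
async function screenshotLinks(page, { url, selector, limit = 10, prefix = 'result' }) {
  await page.goto(url);
  // Read the link targets up front so nothing depends on stale element handles
  const hrefs = await page.$$eval(selector, anchors => anchors.map(a => a.href));
  const saved = [];
  for (const [i, href] of hrefs.slice(0, limit).entries()) {
    const path = `${prefix}-${i}.png`;
    await page.goto(href);
    await page.screenshot({ path });
    saved.push(path);
  }
  return saved;
}
```

The function resolves to the list of file paths it wrote, which is handy for logging or for uploading the images afterwards.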

    This covers the main screenshot use cases, but we've still barely scratched the surface of Puppeteer's true powers.

    Stick with me...things are about to get even more exciting! 😎

    Unleashing Puppeteer's Advanced Superpowers

    While taking basic screenshots is incredibly useful, Puppeteer's real magic lies in more advanced browser automation scenarios.

    Let's explore some of these advanced tricks that'll make you a certified Puppeteer power user!

    Automating Login and Auth Workflows

    Sites these days have gotten really restrictive with login requirements to access certain pages and content.

    Puppeteer helps you automate filling forms, entering data and bypassing login screens seamlessly.

    For example, logging into a site can be broken down into easy steps like:

    // Navigate to the login URL (page.goto() needs an absolute URL)
    await page.goto('https://www.example.com/login');

    // Enter username from credentials file
    await page.type('#username', username);

    // Enter password
    await page.type('#password', password);

    // Click submit and wait for the resulting navigation to finish
    await Promise.all([
      page.waitForNavigation(),
      page.click('.submit'),
    ]);

    // Confirm we got past auth wall
    if (page.url().includes('/dashboard')) {
      // We're in! 🎉
    } else {
      // Invalid login, handle error
    }
    

    Pro Tip: Save logins in a .env file and load them using require('dotenv').config(). Never hardcode credentials!
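As a rough sketch of that tip, the helper below reads the two values from the environment and fails early if either is missing. The variable names SITE_USERNAME and SITE_PASSWORD are my own placeholders, so use whatever your .env file actually defines.

```javascript
// Hypothetical helper: read login details from environment variables.
// SITE_USERNAME and SITE_PASSWORD are made-up names; match them to your .env file.
function readCredentials(env = process.env) {
  const username = env.SITE_USERNAME;
  const password = env.SITE_PASSWORD;
  if (!username || !password) {
    throw new Error('Set SITE_USERNAME and SITE_PASSWORD before running the script');
  }
  return { username, password };
}
```

With dotenv installed, you would call require('dotenv').config() once at the top of the script and then const { username, password } = readCredentials(); before the login steps.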

    Emulate Mobile Devices and Screen Sizes

    Modern responsive sites adapt UI and layouts for mobile vs desktop experiences.

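    To test these, we can use page.emulate() to mimic known devices and page.setViewport() for custom screen sizes:

    // Emulate iPhone 12 (descriptor from Puppeteer's built-in device list)
    const {KnownDevices} = require('puppeteer');
    await page.emulate(KnownDevices['iPhone 12']);

    // Custom sizes
    await page.setViewport({width: 400, height: 800});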
    

    We can then take screenshots to confirm UI behavior on varying screen real estate.

    This helps catch missing elements, overflow issues, tiny tap targets etc.
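To compare several screen sizes in one run, something along these lines might work. It's only a sketch: captureBreakpoints and the breakpoint names are hypothetical, and it relies on nothing beyond page.setViewport(), page.goto() and page.screenshot().

```javascript
// Hypothetical helper: capture the same URL at several viewport sizes.
// The breakpoint names and sizes passed in are up to the caller.
async function captureBreakpoints(page, url, breakpoints) {
  const saved = [];
  for (const { name, width, height } of breakpoints) {
    await page.setViewport({ width, height });
    // Reload after resizing so layout that is decided at load time is re-evaluated
    await page.goto(url);
    const path = `${name}.png`;
    await page.screenshot({ path });
    saved.push(path);
  }
  return saved;
}
```

A call such as captureBreakpoints(page, url, [{name: 'mobile', width: 400, height: 800}, {name: 'desktop', width: 1280, height: 800}]) would leave mobile.png and desktop.png side by side for a quick visual diff.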

    Simulate Network Speeds and Conditions

    To build truly resilient apps, we need to test bad network scenarios.

    Puppeteer allows throttling CPU, network and even going offline:

    // Simulate a slow 3G-like connection (throughput in bytes per second)
    await page.emulateNetworkConditions({
      download: 750 * 1024 / 8,
      upload: 250 * 1024 / 8,
      latency: 400 // ms
    });

    // Slow the CPU down by a factor of 4
    await page.emulateCPUThrottling(4);

    // Test offline behavior
    await page.setOfflineMode(true);
    

    We can take screenshots after introducing these constraints to judge impacts on page load speeds, image optimisations, fallbacks etc.
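The throughput arithmetic is easy to get wrong, so here is a small sketch that converts kilobits per second into bytes per second, the unit current Puppeteer expects. The helper name networkProfile is hypothetical. The object it returns uses the download, upload and latency keys that page.emulateNetworkConditions() accepts.

```javascript
// Hypothetical helper: build a conditions object from kilobits per second.
// Uses the same 1024-based conversion as the snippet above; results are bytes per second.
function networkProfile({ downKbps, upKbps, latencyMs }) {
  const toBytesPerSecond = kbps => (kbps * 1024) / 8;
  return {
    download: toBytesPerSecond(downKbps),
    upload: toBytesPerSecond(upKbps),
    latency: latencyMs,
  };
}
```

For the numbers used above, networkProfile({downKbps: 750, upKbps: 250, latencyMs: 400}) works out to 96,000 bytes per second down and 32,000 up.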

    Battle-hardening our apps for the real world! 💪

    ...and this merely scratches the surface of next-level techniques I've picked up from thousands of hours scraping with Puppeteer!

    But before I got here, I first had to overcome a series of roadblocks. Let's talk troubleshooting...

    Debugging Common Puppeteer Issues

    As a grizzled veteran now, I can proudly say I've made every silly Puppeteer mistake imaginable!

    Let me quickly share solutions to some frequent gotchas I see users run into:

    Problem: Blank/Empty Screenshots

    Ah, the dreaded blank screenshot. This is most often caused by...

  • Page not fully loaded before taking the screenshot
  • Browser instance closing before the screenshot finishes

    Solutions:

    // Wait for document body (page.waitFor() was removed from recent Puppeteer versions)
    await page.waitForSelector('body');
    
    // Network idle event
    await page.goto(url, {waitUntil: 'networkidle0'});
    
    // Close browser after screenshot promise resolves
    const data = await page.screenshot();
    await browser.close();
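To make the "close only after the screenshot" rule hard to break, the work can be wrapped in try/finally. This is a sketch, not an official pattern: withPage is a hypothetical name, and launch stands for whatever starts your browser, for example () => puppeteer.launch().

```javascript
// Hypothetical wrapper: run `task` with a fresh page and always close the browser,
// but only after the task (including any screenshot) has settled.
// `launch` is whatever starts the browser, e.g. () => puppeteer.launch().
async function withPage(launch, task) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    await browser.close();
  }
}
```

Because the task is awaited inside the try block, the finally branch cannot close the browser while a screenshot is still in flight, and the browser is still cleaned up when the task throws.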
    

    Problem: Timeouts and Hangs

    Asynchronous Puppeteer scripts involve lots of waiting. So if one step lags, the whole script may time out or hang.

    Solution:

    // Bump the default timeout from 30s if needed (0 disables it entirely)
    page.setDefaultTimeout(60000);

    // Catch errors to handle timeouts gracefully
    try {
      await page.click('.submit');
    } catch (err) {
      // React to timeouts
    }
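One way to "react to timeouts" is simply to try the flaky step again. The sketch below is a generic retry helper. The name withRetry and its options are made up for illustration, and it makes no assumptions about Puppeteer beyond the step being an async function.

```javascript
// Hypothetical helper: retry an async step a few times before giving up.
async function withRetry(step, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await step(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Brief pause before the next try
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Usage could look like await withRetry(() => page.goto(url, {waitUntil: 'networkidle0'})), which gives a slow page three chances before the error reaches your catch block.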
    

    Problem: Page Loads Partially/Incompletely

    If shots miss elements or pages look half loaded, we likely have...

  • Network delays affecting page load
  • Dynamic content still updating on load
    Solutions:

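    // Wait for a navigation to finish (start waiting before the click or submit that triggers it)
    await page.waitForNavigation();

    // Network idle + DOMContentLoaded
    await page.goto(url, {waitUntil: ['domcontentloaded', 'networkidle0']});

    // Delay after page loads fully (page.waitFor() was removed from recent Puppeteer versions)
    await new Promise(resolve => setTimeout(resolve, 500)); // 0.5 sec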
    

    These are just a sampling of common issues that once plagued me endlessly!

    I learned the root causes and fixes the hard way so you don't have to. 😅

    Let's round things off with some final expert level tricks...

    Expert Puppeteer Tips and Tricks 💡

    Over the years, I've compiled a handy bag of Puppeteer pro tips that I frequently use in my web scraping projects.

    Let me share some of my secret weapons to take your Puppeteer skills to the next level:

    Speed Up Slow Page Loads

    Sites overloaded with ads/trackers bring browsers to a crawl. We can intercept requests and abort the ones we don't need. The example here drops images; the same hook can block ad and tracker domains by checking request.url():

    await page.setRequestInterception(true);
    
    page.on('request', request => {
    
      if (request.resourceType() === 'image') {
        request.abort();
      } else {
        request.continue();
      }
    });
    

    This keeps page loads lightning fast by stripping heavyweight elements before they even start downloading! Just remember that blocked images will be missing from your screenshots too.
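To block by domain as well as by resource type, the decision can be pulled out into a small predicate. This is a sketch with placeholder data: the hostnames in BLOCKED_HOSTS are invented, and shouldBlock is a hypothetical name rather than anything Puppeteer provides.

```javascript
// Hypothetical blocklist; these hostnames are placeholders, not real trackers.
const BLOCKED_HOSTS = ['ads.example.com', 'tracker.example.net'];
const BLOCKED_TYPES = ['image', 'media', 'font'];

// Decide whether a request should be aborted, given its resource type and URL.
function shouldBlock(resourceType, url) {
  if (BLOCKED_TYPES.includes(resourceType)) return true;
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // Not a parseable URL, so let it through
  }
  // Match the listed host itself and any of its subdomains
  return BLOCKED_HOSTS.some(host => hostname === host || hostname.endsWith(`.${host}`));
}
```

Inside the request handler this would become shouldBlock(request.resourceType(), request.url()) ? request.abort() : request.continue(). Keeping the rule in a plain function also makes it easy to unit test without launching a browser.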

    Automatically Scroll Pages

    Scrolling long pages with dynamic content can be clunky. Puppeteer has no dedicated scroll method, but we can auto-scroll by running window.scrollBy() inside the page via page.evaluate():

    await page.evaluate(() => {
    
      window.scrollBy(0, window.innerHeight);
    
    });
    

    Just wrap this in a loop to scroll several page lengths at once!
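A looped version might look like the sketch below. The name autoScroll matches the call used in the utils example later in this guide, but the author's actual helper isn't shown, so treat this as one possible implementation. The maxSteps and pauseMs options are my own additions.

```javascript
// Hypothetical auto-scroll helper: scroll one viewport at a time until the
// bottom is reached or `maxSteps` is hit. Resolves to the number of scroll steps.
async function autoScroll(page, { maxSteps = 50, pauseMs = 100 } = {}) {
  for (let step = 1; step <= maxSteps; step++) {
    // This callback runs inside the browser page, not in Node
    const atBottom = await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight);
      return window.innerHeight + window.scrollY >= document.documentElement.scrollHeight;
    });
    if (atBottom) return step;
    // Give lazy-loaded content a moment to arrive before scrolling again
    await new Promise(resolve => setTimeout(resolve, pauseMs));
  }
  return maxSteps;
}
```

The maxSteps cap is there for infinite-scroll pages, which would otherwise keep the loop going for as long as new content keeps loading.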

    Craft Flexible Reusable Utils

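    Repeating the same steps across scripts? Modularize common logic into separate utils files that can be reused or published as packages:

    // utils.js
    // autoScroll() is assumed to be defined in this file, e.g. the scroll snippet above wrapped in a loop
    async function getScreenshot(page, path) {
      await autoScroll(page);
      return page.screenshot({path});
    }

    module.exports = { getScreenshot };

    // index.js
    const { getScreenshot } = require('./utils');

    // Inside your async script, once page exists
    await getScreenshot(page, 'file.png');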
    

    Build your own Puppeteer toolkit over time! 🧰

    CI Integrations and Automation

    Puppeteer shines when integrated into CI/CD pipelines for automation. Capture screenshots on a schedule with cron, or on demand using services like AWS Lambda:

    ┌───────────── minute (0 - 59)
    │ ┌───────────── hour (0 - 23)
    │ │ ┌───────────── day of month (1 - 31)
    │ │ │ ┌───────────── month (1 - 12)
    │ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
    │ │ │ │ │                                   7 is also Sunday on some systems)
    │ │ │ │ │
    │ │ │ │ │
    0 * * * *  /opt/screenshot

    This runs the script at the top of every hour, fetching fresh screenshots! ⏰
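One thing to watch with scheduled runs is that a fixed file name gets overwritten every time. A small sketch of a timestamped name generator follows. The helper name timestampedName is hypothetical, and it uses UTC so the names sort the same way on every machine.

```javascript
// Hypothetical helper: build a file name that is unique per run, using UTC time.
function timestampedName(prefix, date = new Date()) {
  // toISOString() looks like 2024-01-09T14:00:00.000Z; keep up to the seconds
  // and swap the colons, which are not allowed in file names on some systems
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, '-');
  return `${prefix}-${stamp}.png`;
}
```

Passing the result to page.screenshot({path: timestampedName('home')}) keeps one image per run, so the hourly captures build up a history instead of replacing each other.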
