Capturing Screenshots with Puppeteer - An advanced guide

What is Puppeteer?

In a nutshell, Puppeteer is a Node.js library that provides an API for controlling headless Chrome.

I know that sounds kinda geeky, but here's what it really means:

It lets you automate everything that you'd normally do manually in a browser: clicking buttons, filling in forms, scrolling pages, pressing keys and so on.
It runs everything inside a headless Chrome instance, i.e. there's no visible browser UI. It all happens silently in the background.

This makes it super fast and ideal for web scraping/automation tasks!

And as I discovered in my early trials and tribulations, Puppeteer also makes capturing web screenshots simple, flexible and reliable.

Let me walk you through it step-by-step...

Getting Setup with Puppeteer

I'll assume you already have Node.js installed on your machine. If not, grab the latest LTS release for your OS here first.

The good news is installing Puppeteer takes just one command! 💪

npm install puppeteer

This downloads the Puppeteer package and a bundled Chromium binary that it will use under the hood.

You can also install Puppeteer globally with the -g flag, but local installs are considered a best practice. If you run into permission issues though on Mac/Linux, you may need elevated privileges:

sudo npm install -g puppeteer

Once the install finishes, you're all set up for screenshot magic!

Let's start with a basic example...

Taking Your First Puppeteer Screenshot

Create a new file and import Puppeteer:

const puppeteer = require('puppeteer');

Then add the following async script:

(async () => {

  const browser = await puppeteer.launch();

  const page = await browser.newPage();

  await page.goto('https://www.example.com');

  await page.screenshot({path: 'example.png'});

  // Await the close so the script finishes only after Chrome has shut down
  await browser.close();

})();

Breaking this down:

puppeteer.launch() opens a headless Chrome instance.

browser.newPage() opens a new blank tab.

page.goto() navigates to the URL we want to screenshot.

page.screenshot() captures and saves the screenshot.

Finally run the script:

node script.js

And voila! You should now have a basic screenshot saved to your filesystem. 🥳
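One thing the basic script doesn't handle is a failure partway through: if page.goto() throws, the browser is never closed. Here is a minimal sketch of one way to guard against that. The helper name captureScreenshot and its launcher parameter are assumptions made for this example, not part of Puppeteer; passing the launcher in also makes the function easy to test with a stand-in.

```javascript
// Hypothetical helper: takes one screenshot and always closes the browser.
// `launcher` is anything with a launch() method, e.g. require('puppeteer').
async function captureScreenshot(launcher, url, path) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
    return path;
  } finally {
    // Runs on success and on failure, so Chrome never lingers.
    await browser.close();
  }
}
```

Inside an async function, usage might look like: await captureScreenshot(require('puppeteer'), 'https://www.example.com', 'example.png').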

Now while this works, the output is less than stellar. The screenshot only captures the visible part of the page, and that is determined by Puppeteer's default viewport (800×600 pixels) rather than your actual screen size.

Let's fix that next...

Capturing Full Page Screenshots with Puppeteer

To take complete screenshots that scroll the full page height, we need to add fullPage: true in the screenshot options:

await page.screenshot({path: 'example.png', fullPage: true});

This captures the full scrollable height of the page as one long, continuous screenshot.

Depending on page height and content, full page screenshots take slightly longer than regular ones as they have more work to do behind the scenes!

We can also set the explicit viewport size upfront using page.setViewport():

await page.setViewport({width: 1280, height: 1600})

This makes sure the page renders correctly on bigger screens before capturing.

Pro Tip: Use wider viewports for readability. With fullPage: true the viewport height does not limit the capture, though a taller viewport can help lazy-loaded content render before the shot is taken.

Here's our updated script:

const puppeteer = require('puppeteer');

(async () => {

  const browser = await puppeteer.launch();

  const page = await browser.newPage();

  await page.setViewport({width: 1280, height: 1600});

  await page.goto('https://www.example.com');

  await page.screenshot({path: 'example.png', fullPage: true});

  await browser.close();

})();

Give this a whirl on a long web page and you'll now get a nice, clean full page vertical screenshot!
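Besides path and fullPage, page.screenshot() also accepts a type ('png', 'jpeg' or 'webp') and, for the lossy formats, a quality between 0 and 100. The sketch below picks the type from the file extension. The helper name buildScreenshotOptions and the default quality of 80 are assumptions made for this example.

```javascript
// Hypothetical helper: derives page.screenshot() options from the file name.
function buildScreenshotOptions(path, { fullPage = true, quality = 80 } = {}) {
  const ext = path.split('.').pop().toLowerCase();
  let type = 'png';
  if (ext === 'jpg' || ext === 'jpeg') type = 'jpeg';
  if (ext === 'webp') type = 'webp';

  const options = { path, type, fullPage };
  // `quality` does not apply to PNG, so only set it for lossy formats.
  if (type !== 'png') options.quality = quality;
  return options;
}
```

It would be used as await page.screenshot(buildScreenshotOptions('example.jpg')), which trades some sharpness for a smaller file on long pages.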

Let's keep exploring what else we can do...

Taking Screenshots of Specific HTML Elements

Sometimes you may not need the entire page screenshot. Instead, you may want to capture only a particular section, div or element on the page.

Puppeteer makes this easy with element selectors.

For this example, let's say we want to screenshot just the navigation bar. The selector copied from the inspect element tool is:

#main-nav

We can use page.$() to grab this element and elementHandle.screenshot() to capture it:

const navbar = await page.$('#main-nav');

await navbar.screenshot({path: 'navbar.png'});

The same selector concepts apply as with DOM manipulation in the browser or in libraries like jQuery.

You can use classes, ids, attribute selectors, or even advanced CSS combinators to isolate specific components on the page!

Pro Tip: Combine full page and element screenshots to create partial "slices" of pages!
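Keep in mind that page.$() resolves to null when nothing matches the selector, so calling screenshot() on the result can throw. A minimal sketch of a guarded version follows; the helper name screenshotElement is made up for this example.

```javascript
// Hypothetical helper: screenshots one element, or returns false if absent.
async function screenshotElement(page, selector, path) {
  const handle = await page.$(selector); // null when nothing matches
  if (!handle) return false;
  await handle.screenshot({ path });
  return true;
}
```

For the navigation bar above, that would be await screenshotElement(page, '#main-nav', 'navbar.png').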

Let's see another useful screenshot scenario...

Capturing Multiple Screenshots in a Loop

A common real-world requirement is needing to screenshot numerous links or pages in one go.

Say for example, you want to capture screenshots of the first 10 search results for a Google query.

Here's how to loop through the links and save screenshots with dynamic file names:

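for (let i = 0; i < 10; i++) {

  // Re-query on every pass: element handles go stale once the page navigates
  const links = await page.$$('.search-results a');

  if (!links[i]) break; // fewer than 10 results

  // Start waiting for the navigation before the click triggers it
  await Promise.all([page.waitForNavigation(), links[i].click()]);

  await page.screenshot({path: `result-${i}.png`});

  await page.goBack();

}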

We use page.$$() here to grab all the specific links in an array. This uses document.querySelectorAll() internally.

We iterate through the first 10, clicking each link, taking a screenshot, then navigating back to return to the results.

This saves the screenshots with file names like result-0.png, result-1.png etc in a programmatic way.

The same approach allows batch screenshotting categories, filtered image grids, product listings, galleries, menus and almost anything!

Pro Tip: Wrap this in an async function to make the script reusable. Pass the base URL, selector and filenames as arguments.
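As a rough sketch of that reusable function: the version below collects the link URLs first and then visits each one with page.goto(). That is a different technique from clicking and going back, chosen here because plain URL strings stay valid after navigation. The name screenshotLinks and the limit parameter are assumptions made for this example.

```javascript
// Hypothetical helper: visits up to `limit` links found on `baseUrl`
// and saves one screenshot per link. Returns the file names written.
async function screenshotLinks(page, baseUrl, selector, limit = 10) {
  await page.goto(baseUrl);

  // Collect the URLs up front; strings don't go stale after navigation.
  const hrefs = await page.$$eval(selector, (anchors) => anchors.map((a) => a.href));

  const saved = [];
  for (const [i, href] of hrefs.slice(0, limit).entries()) {
    await page.goto(href);
    const path = `result-${i}.png`;
    await page.screenshot({ path });
    saved.push(path);
  }
  return saved;
}
```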

This covers the main screenshot use cases, but we've still barely scratched the surface of Puppeteer's true powers.

Stick with me...things are about to get even more exciting! 😎

Unleashing Puppeteer’s Advanced Superpowers

While taking basic screenshots is incredibly useful, Puppeteer’s real magic lies in more advanced browser automation scenarios.

Let’s explore some of these advanced tricks that’ll make you a certified Puppeteer power user!

Automating Login and Auth Workflows

Sites these days have gotten really restrictive with login requirements to access certain pages and content.

Puppeteer helps you automate filling forms, entering data and bypassing login screens seamlessly.

For example, logging into a site can be broken down into easy steps like:

// Navigate to login URL (page.goto() needs an absolute URL)
await page.goto('https://www.example.com/login');

// Enter username from credentials file
await page.type('#username', username);

// Enter password
await page.type('#password', password);

// Click submit button and wait for the navigation it triggers
await Promise.all([
  page.waitForNavigation(),
  page.click('.submit'),
]);

// Confirm we got past auth wall
if (page.url().includes('/dashboard')) {
  // We're in! 🎉
} else {
  // Invalid login, handle error
}

Pro Tip: Save logins in a .env file and load them using require('dotenv').config(). Never hardcode credentials!
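Here is a small sketch of what reading those credentials could look like once dotenv has populated process.env. The variable names SITE_USERNAME and SITE_PASSWORD and the helper name loadCredentials are made up for this example; use whatever names your .env file defines.

```javascript
// Hypothetical helper: reads login details from environment variables.
// SITE_USERNAME and SITE_PASSWORD are made-up names for this sketch.
function loadCredentials(env = process.env) {
  const username = env.SITE_USERNAME;
  const password = env.SITE_PASSWORD;
  if (!username || !password) {
    // Fail early with a clear message instead of typing "undefined" into a form.
    throw new Error('Set SITE_USERNAME and SITE_PASSWORD in your .env file');
  }
  return { username, password };
}
```

Call require('dotenv').config() once at the top of the script, then const { username, password } = loadCredentials() before the login steps.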

Emulate Mobile Devices and Screen Sizes

Modern responsive sites adapt UI and layouts for mobile vs desktop experiences.

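To test these, we can use page.emulate() to mimic known devices and page.setViewport() for custom screen sizes:

// Emulate iPhone 12 (KnownDevices ships with recent Puppeteer releases;
// older releases expose the same list as puppeteer.devices)
const { KnownDevices } = require('puppeteer');
const iPhone = KnownDevices['iPhone 12'];
await page.emulate(iPhone);

// Custom sizes
await page.setViewport({width: 400, height: 800});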

We can then take screenshots to confirm UI behavior on varying screen real estate.

This helps catch missing elements, overflow issues, tiny tap targets etc.
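To compare several screen sizes in one run, the same page can be resized and captured in a loop. This is a minimal sketch using page.setViewport(); the helper name screenshotAtViewports and the file naming scheme are assumptions made for this example.

```javascript
// Hypothetical helper: captures the current page at several viewport sizes.
// Each entry looks like { name: 'mobile', width: 400, height: 800 }.
async function screenshotAtViewports(page, viewports) {
  const saved = [];
  for (const { name, width, height } of viewports) {
    await page.setViewport({ width, height });
    const path = `${name}-${width}x${height}.png`;
    await page.screenshot({ path });
    saved.push(path);
  }
  return saved;
}
```

Some layouts only settle after a reload, so it may be worth calling page.reload() after each resize if the captures look off.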

Simulate Network Speeds and Conditions

To build truly resilient apps, we need to test bad network scenarios.

Puppeteer allows throttling CPU, network and even going offline:

// Simulate Slow 3G (throughput values are in bytes per second)
await page.emulateNetworkConditions({
  download: 750 * 1024 / 8,
  upload: 250 * 1024 / 8,
  latency: 400 // ms
});

// Slow the CPU down by a factor of 4
await page.emulateCPUThrottling(4);

// Test offline behavior
await page.setOfflineMode(true);

We can take screenshots after introducing these constraints to judge impacts on page load speeds, image optimisations, fallbacks etc.
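The throughput figures in the Slow 3G example are kilobits per second converted to bytes per second. Here is a small sketch of that arithmetic, so other profiles can be written in friendlier units. The helper names are made up for this example; the object it returns has the download, upload and latency fields that page.emulateNetworkConditions() takes.

```javascript
// Convert kilobits per second to bytes per second.
function kbpsToBytesPerSecond(kbps) {
  return (kbps * 1024) / 8;
}

// Hypothetical helper: builds a network conditions object from kbps and ms.
function networkProfile({ downKbps, upKbps, latencyMs }) {
  return {
    download: kbpsToBytesPerSecond(downKbps),
    upload: kbpsToBytesPerSecond(upKbps),
    latency: latencyMs,
  };
}

const slow3G = networkProfile({ downKbps: 750, upKbps: 250, latencyMs: 400 });
console.log(slow3G); // { download: 96000, upload: 32000, latency: 400 }
```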

Battle-hardening our apps for the real-world! 💪

...and this merely scratches the surface of next-level techniques I've picked up from thousands of hours scraping with Puppeteer!

But before I got here, I first had to overcome a series of roadblocks. Let's talk troubleshooting...

Debugging Common Puppeteer Issues

As a grizzled veteran now, I can proudly say I've made every silly Puppeteer mistake imaginable!

Let me quickly share solutions to some frequent gotchas I see users run into:

Problem: Blank/Empty Screenshots

Ah, the dreaded blank screenshot. This is most often caused by...

Page not fully loaded before taking screenshot

Browser instance closing before screenshot success

Solutions:

// Wait for document body (page.waitFor() is deprecated; use waitForSelector)
await page.waitForSelector('body');

// Network idle event
await page.goto(url, {waitUntil: 'networkidle0'});

// Close browser after screenshot promise resolves
const data = await page.screenshot();
await browser.close();

Problem: Timeouts and Hangs

Asynchronous Puppeteer scripts involve lots of waiting. So if one step lags, the whole script may time out or hang.

Solution:

// Bump the default timeout from 30s if needed (0 disables it entirely)
page.setDefaultTimeout(60000);

// Catch errors to handle timeouts gracefully
try {
  await page.waitForSelector('.submit');
  await page.click('.submit');
} catch (err) {
  // React to timeouts
}
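Puppeteer's timeouts apply to its own navigation and wait calls. For any other promise in a script, a generic guard can be built with Promise.race(). The helper below, withTimeout, is a hypothetical name for this sketch and not a Puppeteer API.

```javascript
// Hypothetical helper: rejects if `promise` takes longer than `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

For example, await withTimeout(page.screenshot({path: 'example.png'}), 10000, 'screenshot') would give up after ten seconds. Note that the underlying operation is not cancelled; the script merely stops waiting for it.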

Problem: Page Loads Partially/Incompletely

If shots miss elements or pages look half loaded, we likely have...

Network delays affecting page load

Dynamic content still updating on load

Solutions:

// Wait for the load event during navigation
await page.goto(url, {waitUntil: 'load'});

// Network idle + DOMContentLoaded
await page.goto(url, {waitUntil: ['domcontentloaded', 'networkidle0']});

// Delay after page loads fully (page.waitFor() is deprecated)
await new Promise(resolve => setTimeout(resolve, 500)); // 0.5 sec

These are just a sampling of common issues that once plagued me endlessly!

I learned the root causes and fixes the hard way so you don't have to. 😅

Let's round things off with some final expert level tricks...

Expert Puppeteer Tips and Tricks 💡

Over the years, I've compiled a handy bag of Puppeteer pro tips that I frequently use in my web scraping projects.

Let me share some of my secret weapons to take your Puppeteer skills to the next level:

Speed Up Slow Page Loads

Sites overloaded with ads/trackers bring browsers to a crawl. We can intercept requests and clean these up by blocking unnecessary resources, such as images:

await page.setRequestInterception(true);

page.on('request', request => {

  if (request.resourceType() === 'image') {
    request.abort();
  } else {
    request.continue();
  }
});

This keeps page loads lightning fast by stripping heavyweight elements before they even start downloading!
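The same handler can also filter by domain, which is what cutting out ads and trackers usually needs. Below is a minimal sketch of the decision logic as a plain function. The host names in BLOCKED_HOSTS are made up for this example, and shouldBlock is a hypothetical helper, not a Puppeteer API.

```javascript
// Made-up blocklist for this sketch; swap in the hosts you actually see.
const BLOCKED_HOSTS = ['ads.example.com', 'tracker.example.net'];
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Decide whether a request should be aborted.
function shouldBlock(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  const host = new URL(url).hostname;
  // Match the host itself or any of its subdomains.
  return BLOCKED_HOSTS.some((blocked) => host === blocked || host.endsWith(`.${blocked}`));
}
```

Inside the request handler this becomes: shouldBlock(request.resourceType(), request.url()) ? request.abort() : request.continue().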

Automatically Scroll Pages

Scrolling long pages with dynamic content can be clunky. We can auto-scroll instead by running the browser's own window.scrollBy() inside the page via page.evaluate():

await page.evaluate(() => {

  window.scrollBy(0, window.innerHeight);

});

Just wrap this in a loop to scroll several page lengths at once!
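One possible shape for that loop is sketched below. It scrolls one viewport at a time and stops once the bottom of the page is reached or a cap is hit. The function name, the cap of 20 scrolls and the 250 ms pause are all assumptions made for this example.

```javascript
// Hypothetical helper: scrolls one viewport at a time until the bottom
// is reached or `maxScrolls` is hit. Returns the number of scrolls made.
async function autoScroll(page, maxScrolls = 20, pauseMs = 250) {
  for (let i = 0; i < maxScrolls; i++) {
    // Runs inside the page: scroll, then report whether we hit the bottom.
    const atBottom = await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight);
      return window.innerHeight + window.scrollY >= document.body.scrollHeight;
    });
    if (atBottom) return i + 1;
    // Give lazy-loaded content a moment to arrive before scrolling again.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return maxScrolls;
}
```

The cap matters on infinite-scroll pages, where the bottom keeps moving away and the loop would otherwise never end.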

Craft Flexible Reusable Utils

Repeating the same steps across scripts? Modularize common logic into separate utils files that can be reused or published as packages:

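// utils.js
// Scroll down by one viewport height inside the page
async function autoScroll(page) {
  await page.evaluate(() => window.scrollBy(0, window.innerHeight));
}

async function getScreenshot(page, path) {

  await autoScroll(page);

  return page.screenshot({path});

}

module.exports = { getScreenshot };

// index.js (inside an async function, with `page` already created)
const { getScreenshot } = require('./utils');

await getScreenshot(page, 'file.png');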

Build your own Puppeteer toolkit over time! 🧰

CI Integrations and Automation

Puppeteer shines when integrated into CI/CD pipelines for automation. Capture screenshots on a schedule with a cron entry like the one below, or on demand using services like AWS Lambda:

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
│ │ │ │ │                                   7 is also Sunday on some systems)
│ │ │ │ │
│ │ │ │ │
0 * * * *  /opt/screenshot

This runs the script every hour fetching fresh screenshots! ⏰
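A scheduled job that always writes to the same path overwrites the previous capture. One way around that is to put a timestamp in the file name, as in the small sketch below. The helper name timestampedName is made up for this example.

```javascript
// Hypothetical helper: builds a file name that is unique per run.
function timestampedName(prefix, date = new Date()) {
  // ISO 8601 timestamps contain ':' and '.', which are awkward in file names.
  const stamp = date.toISOString().replace(/[:.]/g, '-');
  return `${prefix}-${stamp}.png`;
}
```

In the scheduled script this would be used as await page.screenshot({path: timestampedName('example')}).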
