The Ultimate Cheerio Web Scraping Cheat Sheet

Oct 31, 2023 · 4 min read

Cheerio is a fast, flexible library for parsing and manipulating HTML in Node.js, modeled on jQuery's API and widely used for web scraping. This cheat sheet provides a comprehensive reference to its syntax and capabilities.

Capabilities Covered

  • Installation
  • Loading HTML
  • Selectors
  • DOM Traversal
  • DOM Manipulation
  • Information
  • Looping
  • Output
  • Plugins
  • Debugging
  • Rate Limiting
  • Caching
  • Best Practices
  • Real World Examples

    Installation

    Install via npm:

    npm install cheerio
    

    Or Yarn:

    yarn add cheerio
    

    Loading HTML

    Load markup into Cheerio for parsing:

    From String:

    const cheerio = require('cheerio');
    const $ = cheerio.load('<h2 class="title">Hello</h2>');
    

    From File:

    const fs = require('fs');
    const $ = cheerio.load(fs.readFileSync('index.html'));
    

    From URL:

    const axios = require('axios');
    const resp = await axios.get('https://example.com');
    const $ = cheerio.load(resp.data);
    

    From JSON:

    const data = {foo: 'bar'};
    // Note: Cheerio parses markup, so a JSON string is loaded as plain text
    const $ = cheerio.load(JSON.stringify(data));
    

    Selectors

    Query DOM elements using CSS selector syntax:

    IDs:

    $('#my-id');
    

    Classes:

    $('.my-class');
    

    Tags:

    $('ul'); // <ul>
    $('li'); // <li>
    

    Attributes:

    $('a[target=_blank]');
    

    Multiple Classes:

    $('.class1.class2');
    

    Wildcards:

    $('*'); // All elements
    

    Chained:

    $('.outer').find('.inner');
    

    Pseudo Selectors:

    $('a:first');
    $('div:last');
    $('li:nth-child(3)');
    $('a:contains("text")');
    

    DOM Traversal

    Navigate between nodes:

    Parents:

    $('.child').parent();
    

    Children:

    $('.parent').children();
    

    Siblings:

    $('.first-child').next();
    $('.last-child').prev();
    

    Filtering:

    $('.parent').filter('.special').text();
    

    Traverse Up:

    $('.child').closest('.ancestor');
    $('.child').parentsUntil('.grandparent');
    

    Traverse Down:

    $('.parent').find('.child');
    

    DOM Manipulation

    Modify elements and content:

    Set Text:

    $('h1').text('New Text');
    

    Set HTML:

    $('button').html('<b>Save</b>');
    

    Add Class:

    $('.box').addClass('blue');
    

    Remove Class:

    $('.box').removeClass('blue');
    

    Toggle Class:

    $('.box').toggleClass('highlighted');
    

    Set Attributes:

    $('input[type="text"]').attr('name', 'username');
    

    Append:

    $('ul').append('<li class="new">New</li>');
    

    Prepend:

    $('ul').prepend('<li class="new">New</li>');
    

    Before:

    $('li.third').before('<li class="second">Second</li>');
    

    After:

    $('li.third').after('<li class="fourth">Fourth</li>');
    

    Remove:

    $('.deleted').remove();
    

    Wrap Inner:

    $('.message').wrapInner('<b></b>');
    

    Unwrap:

    $('b').unwrap();
    

    Information

    Extract info from elements:

    Text:

    $('h1').text();
    

    HTML:

    $('div').html();
    

    Value:

    $('input[name=first_name]').val();
    

    Attribute:

    $('a').attr('href');
    

    Data Attribute:

    $('.user').data('id');
    

    Looping

    Iterate through elements:

    Each:

    $('li').each((i, el) => {
      // element logic
    });
    

    Map:

    const urls = $('li a').map((i, el) => $(el).attr('href')).get();
    

    Reduce:

    // Cheerio selections have no .reduce() of their own; convert to an array first
    const total = $('.product').toArray().reduce((sum, el) => {
      const price = $(el).data('price');
      return sum + price;
    }, 0);
    

    Filter:

    const special = $('.product').filter((i, el) => {
      return $(el).data('special');
    }).get();
    

    Output

    Render final output:

    Full HTML:

    $.html();
    

    Outer HTML:

    $.html($('.box')); // .html() alone returns only the inner HTML
    

    Text:

    $('.message').text();
    

    JSON:

    JSON.stringify($('.box').map((i, el) => {
      // map to object
    }).get());
    

    Save File:

    fs.writeFileSync('page.html', $.html());
    

    HTTP Response:

    res.send($.html());
    

    Plugins

    Extend functionality:

    Images:

    const images = require('cheerio-image-loader')
    
    images($, '.product img')
      .then(/* ... */)
    

    Videos:

    const videos = require('cheerio-video')
    
    videos($).attr('src', 'https://example.com/trailer.mp4')
    

    SVG:

    const svg = require('cheerio-svg-parser')
    
    svg.parse($.html()).svg() // SVG DOM
    

    Debugging

    Log and inspect output:

    Elements:

    console.log($('.item'));
    

    HTML:

    console.log($.html());
    

    JSON:

    console.log(JSON.stringify($('.item').map((i, el) => {
      return $(el).text();
    }).get()));
    

    Node REPL:

    const repl = require('repl');
    repl.start('> ').context.$_ = $;
    

    Rate Limiting

    Control request speed:

    Simple Delay:

    await new Promise(resolve => setTimeout(resolve, 1000));
    

    Queue:

    const PQueue = require('p-queue').default; // p-queue v6; newer versions are ESM-only
    const queue = new PQueue({ concurrency: 2 });
    
    queue.add(() => {
      // Request code
    })
    

    Bottleneck:

    const Bottleneck = require('bottleneck');
    const limiter = new Bottleneck({
      minTime: 1000
    });
    
    limiter.schedule(() => {
      // Request code
    });
    

    Caching

    Save responses:

    In-Memory:

    let cache = {};
    
    const url = 'https://example.com';
    if (cache[url]) {
      return cache[url];
    } else {
      const resp = await fetch(url);
      const body = await resp.text(); // cache the body, not the Response object
      cache[url] = body;
      return body;
    }
    

    Redis:

    const redis = require('redis');
    const client = redis.createClient();
    await client.connect(); // required in node-redis v4+
    
    const key = `cache:${url}`;
    const cached = await client.get(key);
    
    if (cached) {
      return cached;
    } else {
      const resp = await fetch(url);
      const body = await resp.text();
      await client.set(key, body, { EX: 3600 }); // expire after 1 hour
      return body;
    }
    

    Best Practices

    Tips for effective web scraping:

  • Use CSS/DOM selectors over regex for parsing HTML
  • Validate schemas for consistency
  • Rotate proxies/headers to prevent blocking
  • Cache duplicate requests
  • Limit request rate to avoid flooding servers
  • Use asynchronous logic to maximize throughput
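    Several of these tips compose naturally. The sketch below combines an in-memory cache with a fixed delay between live requests; `politeGet` and `fetchPage` are hypothetical names, and in real code you would pass `fetch` or an axios wrapper in as the fetcher.

```javascript
// Minimal sketch: cache responses and rate-limit live requests.
// `fetchPage` is a stand-in for a real HTTP call (fetch, axios, ...).
const cache = new Map();
const DELAY_MS = 1000;

async function politeGet(url, fetchPage, delayMs = DELAY_MS) {
  if (cache.has(url)) return cache.get(url);      // cached: no request made
  await new Promise(r => setTimeout(r, delayMs)); // rate limit live requests
  const body = await fetchPage(url);
  cache.set(url, body);
  return body;
}
```

    Because the cache is checked first, repeated URLs cost nothing and the delay applies only to real network requests.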

    Real World Examples

    Common use cases:

  • Scrape pricing data from ecommerce sites
  • Build aggregated feeds from multiple news sources
  • Compile research datasets from public websites
  • Monitor website changes for broken link checking
  • Archive old versions of web pages for historical records
  • Extract structured data from HTML tables
  • Populate headless CMS with imported content
  • Run SEO audits by extracting on-page content
  • Train ML classifiers on HTML data
  • Process files of markup for analysis
    And that covers the full range of Cheerio's syntax and capabilities. With this handy reference, you can scrape the web more effectively!
