Jan 16th, 2021




Websites with infinite scrolling load their content using AJAX. The page calls back to the server for more content as the user scrolls down.

One way to scrape data from such pages is to drive a real browser, let the page's JavaScript fire its AJAX requests, and simulate the scrolling.

Puppeteer is well suited to this. It is a Node.js library that controls a Chromium browser behind the scenes.

Let’s take a Quora answers page as an example of an infinite scroll page. In this example, we will load the page, scroll down until we reach the end of the content, and then save a screenshot of the page to our local disk.

Let’s install Puppeteer first.

mkdir quora_scraper
cd quora_scraper
npm install --save puppeteer

Then create a file like this and save it in the quora_scraper folder. Call it quora_scroll.js.

const puppeteer = require('puppeteer');

(async () => {
  // headless: false opens a visible browser window so you can watch the scroll
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  // Set the viewport before navigating so the page lays out at this size
  await page.setViewport({
    width: 1200,
    height: 800
  });
  await page.goto('https://www.quora.com/Which-one-is-the-best-data-scraping-services');

  // Keep scrolling until autoScroll resolves
  await autoScroll(page);

  // Capture the whole scrolled page, not just the visible viewport
  await page.screenshot({
    path: 'quora.png',
    fullPage: true
  });

  await browser.close();
})();

async function autoScroll(page) {
  // Everything inside page.evaluate runs in the browser, not in Node
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;    // total distance scrolled so far
      const distance = 100;   // pixels to scroll on each tick
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;

        // The last few scroll attempts loaded no new content, so the
        // distance scrolled has caught up with the page height itself.
        if (totalHeight >= scrollHeight) {
          clearInterval(timer); // stop scrolling
          resolve();
        }
      }, 100);
    });
  });
}
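
The timer-based loop stops as soon as the scrolled distance catches up with the page height, so on a slow connection it may finish before the next batch of content arrives, and on a very long feed it may run for a long time. One possible variant, sketched here under assumptions, drives the scroll from the Node side and stops when the page height has not grown for a few rounds or when a cap is reached. The helper name scrollUntilStable and its options maxRounds, idleRounds and delayMs are hypothetical names chosen for this illustration; the only Puppeteer call it relies on is page.evaluate.

```javascript
// Sketch only: scrollUntilStable and its options are illustrative names,
// not part of Puppeteer. It assumes `page` exposes page.evaluate(fn).
async function scrollUntilStable(page, options = {}) {
  const {
    maxRounds = 50,   // hard cap on scroll attempts
    idleRounds = 3,   // stop after this many rounds with no height growth
    delayMs = 500     // pause after each scroll so AJAX content can load
  } = options;

  let lastHeight = await page.evaluate(() => document.body.scrollHeight);
  let idle = 0;
  let rounds = 0;

  while (rounds < maxRounds && idle < idleRounds) {
    // Jump to the current bottom of the page to trigger the next load
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise((resolve) => setTimeout(resolve, delayMs));

    const height = await page.evaluate(() => document.body.scrollHeight);
    if (height > lastHeight) {
      lastHeight = height; // new content arrived, reset the idle counter
      idle = 0;
    } else {
      idle += 1;           // nothing new this round
    }
    rounds += 1;
  }

  return { rounds, finalHeight: lastHeight };
}
```

In quora_scroll.js this could stand in for the autoScroll(page) call, for example await scrollUntilStable(page, { maxRounds: 30 }). The cap also keeps the final screenshot from growing without bound on a very long page.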

Now run it with the following command:

node quora_scroll.js

It should open the Chromium browser, and you should be able to see the page scroll in action.

Once done, you will find a rather large file called quora.png in your folder.

For further reading, see the article How To Scrape Quora Using Puppeteer.

The author is the founder of Proxies API, a rotating proxies service.
