Feb 16th, 2021

How To Scrape Quora Using Puppeteer

In this example, we will load a Quora answers page, scroll down until we reach the end of the content, and then save a screenshot of the page to the local disk. We will also scrape all the answers and save them as a JSON file on your drive.

We are going to scrape this page: https://www.quora.com/Which-one-is-the-best-data-scraping-services

This question has more than 20 answers, enough to fill several screens, but not all of them load until you scroll down.

Quora uses infinite scrolling. Pages with infinite scroll are usually rendered using AJAX: as the user scrolls down, the page calls back to the server for more content and appends it.

One way to scrape data like this is to drive a real browser, let its JavaScript fire the AJAX requests, and simulate scrolling down the page.

Puppeteer is well suited to this. It is a Node.js library that controls a Chromium browser behind the scenes.

Let’s install Puppeteer first.

mkdir quora_scraper
cd quora_scraper
npm install --save puppeteer

Then create a file named quora_scroll.js in the quora_scraper folder with the following code:

const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false // show the browser window so you can watch it scroll
  });
  const page = await browser.newPage();
  // Set the viewport before navigating so the page is laid out at this size from the start
  await page.setViewport({
    width: 1200,
    height: 800
  });
  await page.goto('https://www.quora.com/Which-one-is-the-best-data-scraping-services');

  await autoScroll(page); // keep scrolling until the end of the content

  await page.screenshot({
    path: 'quora.png',
    fullPage: true
  });

  await browser.close();
})();

async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 100;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance; // running total of how far we have scrolled

        // The last few scrolls have brought no new data, so the distance
        // we have scrolled is now greater than the page height itself
        if (totalHeight >= scrollHeight) {
          clearInterval(timer); // stop scrolling
          resolve();
        }
      }, 100);
    });
  });
}

Now run it with this command:

node quora_scroll.js

It should open the Chromium browser, and you should be able to see the page scroll in action.

Once done, you will find a rather large file called quora.png in your folder.
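The autoScroll function above stops as soon as the distance scrolled catches up with the current page height. Once it reaches the bottom, that leaves the page roughly one viewport height's worth of 100 ms ticks (under a second with the 800 px viewport used here) to load more answers, so a slow AJAX response can make it finish early. Below is a sketch of a more patient variant, under our own assumptions: it jumps to the bottom, waits, and only stops after the height has stopped growing for a few rounds in a row. The name scrollUntilStable and its delayMs, maxIdleRounds and maxRounds options are our own, the default timings are guesses you may need to tune, and page is assumed to be the Puppeteer Page from the script.

```javascript
// Sketch: keep jumping to the bottom of the page until its height stops
// growing. `page` is assumed to be a Puppeteer Page (anything with an
// async evaluate(fn) method works). Option names and defaults are our own.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrollUntilStable(page, { delayMs = 1500, maxIdleRounds = 3, maxRounds = 200 } = {}) {
  // Current height of the document, measured inside the browser
  const getHeight = () => page.evaluate(() => document.body.scrollHeight);

  let lastHeight = await getHeight();
  let idleRounds = 0;

  for (let round = 0; round < maxRounds && idleRounds < maxIdleRounds; round++) {
    // Jump to the bottom so the page's infinite-scroll code requests more answers
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give the AJAX request time to come back and render
    await sleep(delayMs);

    const newHeight = await getHeight();
    if (newHeight > lastHeight) {
      lastHeight = newHeight; // new content arrived: reset the idle counter
      idleRounds = 0;
    } else {
      idleRounds++; // nothing new this round
    }
  }
  return lastHeight; // final page height in pixels
}
```

To try it, replace `await autoScroll(page);` in the script with `await scrollUntilStable(page);`. The maxRounds cap keeps the loop from running forever on a page that never stops growing.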

Now let’s add some more code to scrape the HTML that has loaded after all the scrolling, to get the answers and the names of the users who posted them.

We need to find the elements containing the user’s name and the answer. If you inspect the HTML with Chrome’s Inspect tool, you will find that the elements with the class names user and ui_qtext_rendered_qtext contain the user’s name and their answer, respectively.

Inside page.evaluate(), which runs a function in the context of the page, you can use the DOM’s querySelectorAll method with these CSS selectors to extract the data, like this:

const answers = await page.evaluate(() => {
  const Answerrers = document.querySelectorAll('.user'); // the users' names
  const Answers = document.querySelectorAll('.ui_qtext_rendered_qtext'); // the answer texts

  const results = [];
  // Pair names and answers by position, stopping at the shorter list
  // so we never read past the end of either one
  const count = Math.min(Answerrers.length, Answers.length);
  for (let i = 0; i < count; i++) {
    results[i] = {
      Answerrer: Answerrers[i].innerText.trim(),
      Answer: Answers[i].innerText.trim()
    };
  }
  return results;
});

We can put this code right after the scrolling finishes, so the whole script now looks like this:

const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.setViewport({
    width: 1200,
    height: 800
  });
  await page.goto('https://www.quora.com/Which-one-is-the-best-data-scraping-services');
  // await page.goto('https://www.quora.com/Is-data-scraping-easy'); // another question to try

  await autoScroll(page); // keep scrolling until the end of the content

  const answers = await page.evaluate(() => {
    const Answerrers = document.querySelectorAll('.user');
    const Answers = document.querySelectorAll('.ui_qtext_rendered_qtext');

    const results = [];
    // Pair names and answers by position, stopping at the shorter list
    const count = Math.min(Answerrers.length, Answers.length);
    for (let i = 0; i < count; i++) {
      results[i] = {
        Answerrer: Answerrers[i].innerText.trim(),
        Answer: Answers[i].innerText.trim()
      };
    }
    return results;
  });
  console.log(answers);

  await page.screenshot({
    path: 'quora.png',
    fullPage: true
  });
  console.log("The screenshot has been saved!");

  await browser.close();
})();

async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 100;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance; // running total of how far we have scrolled

        // The last few scrolls have brought no new data, so the distance
        // we have scrolled is now greater than the page height itself
        if (totalHeight >= scrollHeight) {
          clearInterval(timer); // stop scrolling
          resolve();
        }
      }, 100);
    });
  });
}

Now run it with:

node quora_scroll.js

When you run it, it will print the scraped answers to the console.
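The loop inside page.evaluate() can also be written with Puppeteer's page.$$eval(selector, pageFunction), which runs pageFunction inside the page with an array of every element matching selector. The sketch below collects both lists that way and warns when their lengths differ, since pairing names and texts by position only works if each selector matches exactly one element per answer. The function name collectAnswers is our own; the default selectors are the ones used in this article.

```javascript
// Sketch: collect answerer names and answer texts with page.$$eval().
// `page` is assumed to be a Puppeteer Page; collectAnswers is our own name.
async function collectAnswers(page, nameSelector = '.user', textSelector = '.ui_qtext_rendered_qtext') {
  // Each callback runs in the browser and receives an array of matching elements
  const names = await page.$$eval(nameSelector, (els) => els.map((el) => el.innerText.trim()));
  const texts = await page.$$eval(textSelector, (els) => els.map((el) => el.innerText.trim()));

  if (names.length !== texts.length) {
    // Positional pairing is only reliable when both lists line up one-to-one
    console.warn(`Selector counts differ: ${names.length} names vs ${texts.length} texts`);
  }

  // Pair by position, stopping at the shorter list
  const count = Math.min(names.length, texts.length);
  return Array.from({ length: count }, (_, i) => ({ Answerrer: names[i], Answer: texts[i] }));
}

// Usage inside the main script, after scrolling:
// const answers = await collectAnswers(page);
```

The output objects keep the same Answerrer and Answer keys as the article's version, so the rest of the script does not need to change.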

Now let’s go further and save the answers to a JSON file:

fs.writeFile("quora_answers.json", JSON.stringify(answers), function(err) {
  if (err) throw err;
  console.log("The answers have been saved!");
});
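Because the script already runs inside an async function, another option is the promise-based fs.promises.writeFile. It can be awaited, so the file is fully written before the script moves on, and it reports errors by rejecting the promise instead of throwing from inside a callback. Here is a minimal sketch; the helper name saveJson is our own, and unlike the article's version it indents the JSON for readability.

```javascript
const fs = require('fs');

// Sketch: write `data` to `filePath` as indented JSON and wait for the write
// to finish. saveJson is our own helper name.
async function saveJson(filePath, data) {
  // fs.promises.writeFile returns a promise that rejects on error
  await fs.promises.writeFile(filePath, JSON.stringify(data, null, 2));
  console.log(`The answers have been saved to ${filePath}!`);
}

// Usage inside the main async function, in place of the fs.writeFile call:
// await saveJson('quora_answers.json', answers);
```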

Putting it all together:

const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.setViewport({
    width: 1200,
    height: 800
  });
  await page.goto('https://www.quora.com/Which-one-is-the-best-data-scraping-services');
  // await page.goto('https://www.quora.com/Is-data-scraping-easy'); // another question to try

  await autoScroll(page); // keep scrolling until the end of the content

  const answers = await page.evaluate(() => {
    const Answerrers = document.querySelectorAll('.user');
    const Answers = document.querySelectorAll('.ui_qtext_rendered_qtext');

    const results = [];
    // Pair names and answers by position, stopping at the shorter list
    const count = Math.min(Answerrers.length, Answers.length);
    for (let i = 0; i < count; i++) {
      results[i] = {
        Answerrer: Answerrers[i].innerText.trim(),
        Answer: Answers[i].innerText.trim()
      };
    }
    return results;
  });
  console.log(answers);

  fs.writeFile("quora_answers.json", JSON.stringify(answers), function(err) {
    if (err) throw err;
    console.log("The answers have been saved!");
  });

  await page.screenshot({
    path: 'quora.png',
    fullPage: true
  });
  console.log("The screenshot has been saved!");

  await browser.close();
})();

async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 100;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance; // running total of how far we have scrolled

        // The last few scrolls have brought no new data, so the distance
        // we have scrolled is now greater than the page height itself
        if (totalHeight >= scrollHeight) {
          clearInterval(timer); // stop scrolling
          resolve();
        }
      }, 100);
    });
  });
}

Now run it.

node quora_scroll.js

Once it finishes, you will find the file quora_answers.json in your folder with the following results in it:

[{"Answerrer":"Cahide Gunes","Answer":"Which one is the best data scraping services?"},{"Answerrer":"Ethan Connor","Answer":"There are many data scraping services available on the market nowadays and you can see some of them below:\n\nImport io\nWebhose io\nScrapinghub\nVisual Scraper and etc.\n\n8 years ago, when Price2Spy was launched back in springtime 2011, there weren't many data scraping services.\n\nEver since Price2Spy was born, we have gained huge experience in data scraping that could help your business gather any valuable data in bulk.\n\nSo, if you are looking for data scraping service that tends to provide 'best' data scraping experience, please feel free to check out Price2Spy.\n\nBelow, you can see some of the key technical..."},{"Answerrer":"Ashley Ng","Answer":"Who is best at data scraping?"},{"Answerrer":"Ricky Arnold","Answer":"Is data scraping easy?"},{"Answerrer":"Eric Frazer","Answer":"What is the best way to scrape data from a website?"},{"Answerrer":"State Bank of India (SBI)","Answer":"I need to extract data from a website. What web scraping tool is the best?"},{"Answerrer":"Suresh Sharma","Answer":"What are some of the best web data scraping tools?"},{"Answerrer":"Ankitha Sumit","Answer":"I have tried several data scraping services like companies that provide data scraping & data mining services, Freelancers & data scraping tools also. But the best I found is Projjal K’s data scraping services on Fiverr. He really does amazing job. Fast, reliable & really friendly to his clients. I have recommeded him to many of my clients, colleagues and friends who were looking for a good data scraping service.\n\nYou guys can contact him from the links below:\n\nData Scraping & data mining service by Projjal K. on Fiverr.\nData Extraction service from any website & directory by Projjal.\nData Scrapin..."},{"Answerrer":"Priyanka Rathee","Answer":"Octoparse, the mission is to make data accessible to anyone, in particular, those without a technical background. In the long run, we aim to automate the web data collection process as easy as any everyday software, such as Excel and Word. In essence, we believe that data is becoming so important that everyone should be equipped with the ability to source the data they need without having to own a degree in Computer Science. So we would like to take the job from a Tech's hand and put it into a typical non-tech person's hand.\n\nThe Octoparse Data Extraction Solution provides is a highly flexibl..."},{"Answerrer":"Joel Daly","Answer":"First, let’s understand what makes a data scraping service good.\n\nThere are a number of things to consider before choosing a service:\n\nQuality of the data provided: The quality of the data depends both on the accuracy and freshness of it.\nAbility to meet the deadlines: One of the reasons we need data scraping services is that they can collect data very fast and deliver it to us. The timeframe also depends on the quantity of data needed. However a good service should be able to provide data in the set timeframe.\nTechnology/skilled staff: both the technology used and the skills of the staff will im..."},{"Answerrer":"Sam Degeran","Answer":"What are some of the best web data scraping tools?"},{"Answerrer":"Keerthi","Answer":"What is the best and most efficient way to scrape a website?"},{"Answerrer":"Skone Cosmetics","Answer":"If scraping data from Amazon, Alibaba and other websites is illegal, then how do dropshipping sites like Pricematik, ProfitScraper and others ..."},{"Answerrer":"Henry Obinna","Answer":"Is it worth paying for a web scraping service?"},{"Answerrer":"Mason Harlin","Answer":"What are the easiest and best web scraping tools for non-technical people? I want to collect data from a few sites and have it dumped into Exc..."},{"Answerrer":"Timothy Lewis","Answer":"Content scraping (also referred to as web scraping or data scraping) is nothing but lifting off unique/original content from other websites and publishing it elsewhere. ...Content scrapers typically copy the entire content and pass it off as their own content.\n\nThe Web Scraping Services:\n\nScrape Financial and Stock Market Data\nWeb Scraping for Sales Lead Generation\nWeb Scraping Sentiment Analysis\nScraping Product Prices and Reviews\nWeb Scraping Mobile Application\nScrape Airline Websites\nGoogle Scraper\nWeb Screen Scraping\nDoctors and Lawyers Data Scraping\nWeb Scraping Car Data\n\nInfovium web scraping compan..."},{"Answerrer":"Pritpal Singh","Answer":"What are the details of the Global Ed-Vantage loan offered by the State Bank of India? What is the interest rate offered by SBI..."},{"Answerrer":"Jacob Martirosyan","Answer":"According to an article published in Live Mint on 17.08.2018, while the government has stepped up its efforts to improve the quality of higher education institutions in India, more and more Indian students seem to be preferring to study abroad. According to the Reserve Bank of India (RB..."},{"Answerrer":"Data iSolutions","Answer":"Hello there,\n\nData scraping is all about extracting data from the World Wide Web for various reasons. Extractive data from the web can be extremely beneficial for businesses. It can help you gain insight into the current market trend, your potential customers, your competitors and that’s just the beginning.\n\nThere are two ways to scrape data from a website, either you can write code and scrape data on your own, or you can scrape data with the help of certain scraping tools. But, know that automated tools would not be able to scrape data from each and every website. And, writing scraping codes r..."},{"Answerrer":"Ankita Banerjee","Answer":"We can list N number of data scraping service providers, but the thing is we have to opt for the best service provider in terms of cost, time and the excellence in scraping tricky website with quality output.\n\nSearch for a web creeping specialist organization with transparent and straightforward evaluating. Evaluating models that are exceptionally mind-boggling are regularly irritating and May even imply that they have shady concealed expenses.\n\nEven I had tried many data scraping services but the one which I found the best is outsourcebigdata, they are so much willing in providing quality outp..."},{"Answerrer":"Alessander Conossel","Answer":"Checkout Agenty!\n\nAgenty is SaaS(Software as a service) based scraping tool. You can run from any browser by accessing through the url. It provides a complete toolkit for data extractions.\n\nIt offers the services according to your needs and requirement.\n\nIt provides cloud hosted web scraping agent to scrape data according to your choice.\nManaged services: Feel free to build, maintain and host your data scraping project. Expert team do all for you.\nExpert Agent Setup: If you don’t know how to setup your agent. Request a setup quote by expert team.\n\nIt has most of the advance feature required in a data..."},{"Answerrer":"Olga Bamatter","Answer":"Divinfosys. COM top web-scraping company in India.They can do amazon and all ecommerce scraping application. if you are looking for a fully managed web scraping service with most affordable web scraping solutions compare to other service provider. There’s are many great web scraping tools out there. These are currently popular tools for collecting web data.\n\nDivinfosys is the right place. They can deliver the data in various popular document formats like XML, excel and CSV and also the websites which are login or PDF\n\nbased too. It is located in India.About my knowledge company in my mind which..."},{"Answerrer":"Halina Makeeva","Answer":"When choosing a crawler services, you need to keep an eye on few things - speed, quality, and price for the product you are about to get. Obviously, would recommend do do your own research on all the available tools and compare them.\nPersonally, from all the tools i have seen available online, Real-Time Crawler by Oxylabs, seems very interesting to me.\nFirst of all, you need to have minimal knowledge of scraping itself, as it offers graphical UI for simple users. Person doesnt need to combine proxies and scraping bot itself, as its just an app, that you install, and are ready to use.\nAs pe..."},{"Answerrer":"Brandon Dotson","Answer":"Divinfosys Software company in India, Best Web Design and Development Company.\n\none of the top web-scraping companies in India. if you are looking for a fully managed web scraping service with most affordable web scraping solutions compare to other service provider.\n\nDivinfosys is the right place. They can deliver the data in various popular document formats like XML, excel and CSV and also the websites which are login or PDF\n\nbased too. It is located in India.About my knowledge company in my mind which has been done 2000  projects done in web scraping.\n\nAlso they can do Ecommerce Based Scraping,Pr..."},{"Answerrer":"Ishtiaq","Answer":"Greetings of the day,\n\nWell, as per my experience, Botscraper is the best provider of data scraping services. I know, a lot of people would have told you that hiring professional data scraping experts will be very costly for you and you should get automated data scraping tools to get the job done.\n\nIf that’s true, know that automated data scraping tools would not be able to scrape data from certain websites, but, data scraping experts can develop custom scrappers to scrape any websites to get the information that you require. That being said, I would also like to let you know that web scraping ..."},{"Answerrer":"X-Byte Enterprise Crawling","Answer":"Best data scraping service? Best is a subjective term to use here, most people would recommend the web scraping service they represent or work for without highlighting what makes a good web scraping service or tool.\n\nTo the Op, what you need is a good web scraper that’s able to handle all your web scraping tasks and requests while delivering optimum quality at the fastest rate possible.\n\nHere are things you should consider when selecting your ‘best’ web scraper:\n\nSpeed.\nSupported webpage types.\nQaulity of scrapped data.\nPrice.\nAbility to handle web scraping challenges.\nSupport for Anonymity?\nHow fast c..."},{"Answerrer":"Rajesh Rajonia","Answer":"Octoparse is a good bet if you are not that well rounded when it comes to scraping. The software is quite self explanatory and user friendly (there’s loads of tutorials and such if something is unclear).\n\nAnother thing that you will need is some kind of pool of residential proxies, as sending a more than a few automated requests will result in your IP address being blocked (data center ones can work on some websites too, but most pages have a somewhat sophisticated security measures which are reacting to IP addresses which does not look like unique human visitors)."},{"Answerrer":"Prithwi Mondal","Answer":"Data scraping is the same as web harvesting and web data extraction. It essentially means gathering data from websites. This type of software can access the internet through a web browser or a HTP. There are many different kinds of data scraping but it really just means accumulating information from other sites, some can be significantly sketchier than others.\n\nThe site I use to get intensive eCommerce data is Algopix. Algopix analyzes over 16 global marketplaces. Their data includes real-time transactions from all the site and is run through proprietary algorithms to provide you with product..."},{"Answerrer":"Norman Dicerto","Answer":"You guys can check ( Webnyze: Fully Managed Web Scraping Service ).\nThey are best of best and quick response and clean data delivery which keep them on top.\n\nFollowing are some Features that webnyze Provides:\n\nAlso in website their is a live price tracking Graph/Visualisation is present for Uk and German ( Iphone x and Iphone x Max )\n\nTo check this live visualisation click on following link: Iphone x and Max Live Price Change Graph\n\nTo Download Free Sample Data refer following link : Data Visualization | Webnyze\n\nFor more Details you can submit form\nHarry Bajwa\nClick to submit"},{"Answerrer":"Preetish Panda","Answer":"There are many great data scraping services out there.\n\nIn order to choose the best you need to follow these criteria:\n\nReliability\nWell-established companies tend to have years of successful experience behind their back. Naturally, they are the best fit.\n\nSpeed\nThis one explains itself. The faster- the better.\n\nPrice\nThe most sensitive part is the cost of the service. Price is very important, especially for a small company or a startup, who can't go around and splash the cash.\n\nI strongly recommend DataHen. You can request a Free Quote, and they will tell you how much the scraping service cost and..."},{"Answerrer":"Aisha Danna","Answer":"In today's era, as all knows data is oil. For better market analysis , Market research and expand your business growth you must review data statics.\n\nYou have to build your database for perfect analysis\n\nWeb scraping technique best for devlope your database.\n\nData iSolutions is a one of top companies in India who has been serving IT - BPO services (Web Scraping Service) for last 15 years.\n\nData iSolutions India's well known outsourcing company who is providing web scraping service in affordable price with excellent quality of work.\n\nWith use of, latest technology(Python, VBA, Ruby and Aws Cloud) Data..."},{"Answerrer":"Rajendra Parouha","Answer":"Data is the core of any business. No great business decisions have ever been made on qualitative data. Every business in some way or the other depends on data to help them make decisions. This is a data-driven world and businesses needs to be constantly vigilant and updated with the data. And Binaryfolks knows to scrape any data, any size, anywhere - Automatically!\n\nLet me at this very initial stage clarify that there is no magic web scraping tool available that will extract data from each and every website on the web. Every website is different in terms of structure, navigation, coding and h..."},{"Answerrer":"Igor Savinkin","Answer":"I believe that it really depends on what are your needs and what kind of web scraper you’re looking for - what features it should have, for what reason you going to use it, the price and much more.\n\nFro my own experience I would recommend checking Scrapy and/or BeautifulSoup. I found both of these tools easy to work with, user-friendly (so you don’t need to have an advanced level of knowledge on how to program the tool), fast and the best part is that both of the tools can work with complexed websites without any problems the same way like with the easy-ones.\n\nThey also work perfectly fine with..."},{"Answerrer":"Brandon Dotson","Answer":"As a university student, I am working with a huge amount of data every day. Occasionally, I found such statistics of Seagate: “The global datasphere will grow from 33 zettabytes in 2018 to 175 by 2025”. That makes me scary and motivational at the same time, so I thought about using web scraping services.\n\nWeb scraping service provides customers with actual, accurate, structured data. Companies who offer web scraping services would help you to find data sources with the most up-to-date, clear information, rank them for quality and as a result receive data in various popular formats like CSV, X..."},{"Answerrer":"Niko Nikolai","Answer":"Parsers is a web scraper does not require programming skills. Parsers is an extension in Chrome web store. With web scraper you can download more than 1000 pages for free! You only need to install the extension, go to the desired site (be sure to go to the product card page, not the catalog!) select the necessary data and run the web scraper."},{"Answerrer":"Mikhail Sisin","Answer":"“Best” depends on many things.\n\nThere are so many companies offering web scraping and every one claims to be the best, so no one actually knows here who is the best service, Its better to discuss your project with few providers before hiring some one.\n\nYou can search on google to get some companies who offer web scraping, contact some of them, give them details of what you need and ask for price quote. Web scraping is usually expensive because it requires custom software development.\n\nDisclosure: I work at BotSol | Ultimate Web Scraping Service"},{"Answerrer":"Camila Guimaraes","Answer":"X-Byte Enterprise Crawling is the Best Data Scraping Services provider that scrapes or extracts data from different websites. They provide professional web scraping services by extracting various kinds of data from websites quickly and effectively. There are many procedures of web scrapping which work automatically through data scraping. All the methods involve HTTP (Hyper Text Transfer Protocol) or entrenching the browser from which the users search the internet.\n\nThe Benefit of Web Scraping Services\n\nThe key benefits of hiring X-Byte Enterprise Crawling for Web Scraping Services are as under:\n\n..."},{"Answerrer":"Tuan Nguyen","Answer":"Hello there,\n\nAs per my experience, I would personally state that Botscraper is the best web scraping service provider. They are highly talented and the most professional web scraping experts. Their quality of work is also impeccable. Noteworthy to mention, the scrapped data that Botscraper delivered me was accurate and relevant to my industry to the utmost level.\n\nHaving said that, I would also like to inform you that hiring web scraping services is much more cost-efficient and fast as compared to hiring a researcher or using any automated tools. Some business owners prefer to use automated to..."},{"Answerrer":"Eugene K","Answer":"PromptCloud’s web scraping service geared towards large-scale and recurring (daily/weekly/monthly) requirements. We have in business for more than 10 years and have become adept at web scraping techniques with a talented team of developers along with robust infrastructure.\n\nHere are the solutions:\n\nSite-specific web data extraction\nDataStock (instant web dataset downloads)\nLive crawling\nJobsPikr (job data feeds from thousands of sites updated on daily basis)"},{"Answerrer":"Stepan Aslanyan","Answer":"From my experience, I would recommend you to take a look at Scrapy and BeautifulSoup. Both are pretty easy to understand and work with, has great features and I found them very useful while scraping data from various websites.\nYou can also check Octopares or Scrapinghub that offers lots of features but sometimes might be a bit difficult, especially for those who doesn’t have much knowledge about web scraping and programming.\n\nAnd don’t forget about proxies and how useful they can be while working with web scraping tools. Even if it isn’t a necessity but using proxy services while web scraping..."},{"Answerrer":"Nikolai Kekish","Answer":"Disclosure: I work at PromptCloud — a Data as a Service provider.\n\nIf you are looking for enterprise-grade data extraction service, then it is better to go with a managed service provider like PromptCloud. We have been in the web data extraction business for close to 8 years now and have developed domain-specific knowledge to acquire clean data from various complex sites.\n\nReach out to us in case of recurring and large-scale web data requirement."}]
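Looking at these results, several entries pair a name with text that is really a question title rather than an answer. The first entry's "Answer" is the page's own question, and entries such as "Is data scraping easy?" appear to be related questions. This is likely a consequence of querying the two selectors separately and pairing the results by position: any extra match in either list shifts every pair after it. One way around this, sketched below under our own assumptions, is to select one container element per answer first and then look up the name and the text inside it, so the two cannot drift apart. The function name extractAnswerPairs is our own, and the container selector in the usage comment is a hypothetical placeholder; you would need to find Quora's actual answer-container class with the Inspect tool.

```javascript
// Sketch: pair each answerer with their own answer by looking both up inside
// one container element per answer. Intended for page.evaluate(), so it only
// uses its arguments and the browser's `document`.
function extractAnswerPairs(selectors, root = document) {
  const { container, name, text } = selectors;
  const pairs = [];
  for (const block of root.querySelectorAll(container)) {
    const nameEl = block.querySelector(name);
    const textEl = block.querySelector(text);
    // Skip blocks that lack either part, since they are not complete answers
    if (!nameEl || !textEl) continue;
    pairs.push({ Answerrer: nameEl.innerText.trim(), Answer: textEl.innerText.trim() });
  }
  return pairs;
}

// Usage in the main script (".answer-container" is a hypothetical placeholder):
// const answers = await page.evaluate(extractAnswerPairs, {
//   container: '.answer-container',
//   name: '.user',
//   text: '.ui_qtext_rendered_qtext',
// });
```

Because Puppeteer calls the function with only the selectors argument, root falls back to the page's own document inside the browser; passing a root explicitly is only useful for testing.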

The author is the founder of Proxies API, a proxy rotation API service.


