In this beginner web scraping tutorial, we'll walk through code that scrapes search result data from Google Scholar.
This is the Google Scholar result page we are talking about…

Overview
We'll be using Node.js for web scraping, with the following key packages:
- request-promise – makes HTTP requests and returns promises
- cheerio – parses HTML and provides a jQuery-like API for querying it
First we require these packages:
const rp = require('request-promise');
const cheerio = require('cheerio');
Then we set up the initial scraper configuration:
// Define the URL of the Google Scholar search page
const url = "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=transformers&btnG=";

// Define a User-Agent header
const headers = {
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
};

// Configure the request options
const options = {
  uri: url,
  headers: headers,
  transform: function (body) {
    return cheerio.load(body);
  }
};
This sets up the Google Scholar URL we want to scrape, adds a browser User-Agent string, and configures the request to use cheerio for HTML parsing.
Making the Request
With the configuration complete, we can now make the GET request:
// Send a GET request to the URL with the User-Agent header
rp(options)
  .then(($) => {
    // ... extract data here
  })
  .catch((error) => {
    console.error("Failed to retrieve the page:", error);
  });
We pass the options to rp(), which performs the GET request and returns a promise.
The transform function means the .then() callback receives a cheerio object ($) already loaded with the page's HTML, rather than the raw response body.
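As a quick sanity check before extracting results, a minimal sketch like this (reusing the requires and options object from above) confirms that $ behaves like a jQuery-style selector:

rp(options)
  .then(($) => {
    // If the request succeeded, the page <title> should mention the query
    console.log("Page title:", $("title").text());
  })
  .catch((error) => {
    console.error("Failed to retrieve the page:", error);
  });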
Extracting Search Result Data
Inspecting the page with your browser's developer tools, you can see that each result item is enclosed in a div with the class gs_ri.

Inside the .then() callback, we first select all of those result blocks:
// Find all the search result blocks with class "gs_ri"
const search_results = $(".gs_ri");

// Loop through each search result block
search_results.each((index, element) => {
  // Extract data from each result...
});
We grab all the elements with the gs_ri class using a standard CSS class selector.
Then we iterate through them with cheerio's .each() method, which calls our callback with the index and the raw DOM element of each result.
Title and URL
Let's get the title and URL of the search result:
// Extract the title and URL
const title_elem = $(element).find(".gs_rt");
const title = title_elem.text() || "N/A";
const url = title_elem.find("a").attr("href") || "N/A";
We use .find() to locate the .gs_rt element inside the result block and read its text for the title, falling back to "N/A" when it is empty.
The linked URL is extracted directly from the anchor tag's href attribute.
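Some Scholar results prefix the title text with a tag like [PDF] or [BOOK]. If you see those in your output, a small cleanup like the sketch below can strip them; the regex is an assumption about the tag format, not part of the original script:

// Strip a leading bracketed tag such as "[PDF] " (assumed format) from the title
const cleanTitle = title.replace(/^\[[A-Z]+\]\s*/, "").trim();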
Authors
Next up is author data:
// Extract the authors
const authors_elem = $(element).find(".gs_a");
const authors = authors_elem.text() || "N/A";
Simply grab the inner text of the .gs_a element, which holds the authors along with the publication venue and year.
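The .gs_a text usually reads like "Authors - Venue, Year - Publisher", so if you only want the author names, a rough split like the sketch below can work. The " - " separator is an assumption about Scholar's formatting, which varies between results, so treat it as best-effort:

// Best-effort split of the ".gs_a" line; the " - " separator is an
// assumption about Google Scholar's formatting and may not hold everywhere
const parts = authors.split(" - ");
const authorNames = parts[0] ? parts[0].trim() : "N/A";
console.log("Author names only:", authorNames);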
Abstract
Finally, we extract the abstract text:
// Extract the abstract or description
const abstract_elem = $(element).find(".gs_rs");
const abstract = abstract_elem.text() || "N/A";
The .gs_rs element contains the snippet of the abstract that Google Scholar shows for each result.
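Snippet text often contains stray newlines and extra whitespace, so a small normalization step (not in the original script) can tidy it up before printing:

// Collapse runs of whitespace in the snippet into single spaces
const cleanAbstract = abstract.replace(/\s+/g, " ").trim();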
Printing the Results
To finish, we log out all the information extracted from each search result:
console.log("Title:", title);
console.log("URL:", url);
console.log("Authors:", authors);
console.log("Abstract:", abstract);
console.log("-".repeat(50)); // Separating search results
This prints the title, URL, authors and abstract for inspection. The separating line keeps each result organized in the terminal output.
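If you'd rather collect the results as data instead of printing them, a small variation (a sketch, not part of the original script) pushes each result into an array inside the same .then() callback and dumps everything as JSON at the end:

const results = [];
search_results.each((index, element) => {
  const title_elem = $(element).find(".gs_rt");
  results.push({
    title: title_elem.text() || "N/A",
    url: title_elem.find("a").attr("href") || "N/A",
    authors: $(element).find(".gs_a").text() || "N/A",
    abstract: $(element).find(".gs_rs").text() || "N/A"
  });
});
console.log(JSON.stringify(results, null, 2));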
And that covers scraping key data fields from Google Scholar search results! The full code is included below to run as a complete scraper.
Running the Scraper
To run the web scraper code, you need:
- Node.js installed
- The request, request-promise, and cheerio packages, which you can install with npm install request request-promise cheerio
Here is the complete Google Scholar scraping script:
const rp = require('request-promise');
const cheerio = require('cheerio');

// Define the URL of the Google Scholar search page
const url = "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=transformers&btnG=";

// Define a User-Agent header
const headers = {
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" // Replace with your User-Agent string
};

// Configure the request options
const options = {
  uri: url,
  headers: headers,
  transform: function (body) {
    return cheerio.load(body);
  }
};

// Send a GET request to the URL with the User-Agent header
rp(options)
  .then(($) => {
    // Find all the search result blocks with class "gs_ri"
    const search_results = $(".gs_ri");

    // Loop through each search result block and extract information
    search_results.each((index, element) => {
      // Extract the title and URL
      const title_elem = $(element).find(".gs_rt");
      const title = title_elem.text() || "N/A";
      const url = title_elem.find("a").attr("href") || "N/A";

      // Extract the authors and publication details
      const authors_elem = $(element).find(".gs_a");
      const authors = authors_elem.text() || "N/A";

      // Extract the abstract or description
      const abstract_elem = $(element).find(".gs_rs");
      const abstract = abstract_elem.text() || "N/A";

      // Print the extracted information
      console.log("Title:", title);
      console.log("URL:", url);
      console.log("Authors:", authors);
      console.log("Abstract:", abstract);
      console.log("-".repeat(50)); // Separating search results
    });
  })
  .catch((error) => {
    console.error("Failed to retrieve the page:", error);
  });
The output will show extracted data from search results for the query "transformers". Feel free to customize the search URL for other queries.
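For example, to search for a different topic, you can build the URL from the query string. The sketch below uses a hypothetical helper (not part of the original script) with encodeURIComponent to keep special characters URL-safe:

// Hypothetical helper for building a Scholar search URL from a query string
function scholarUrl(query) {
  // encodeURIComponent keeps spaces and special characters URL-safe
  return "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=" +
    encodeURIComponent(query) + "&btnG=";
}

const searchUrl = scholarUrl("attention is all you need");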
This is great as a learning exercise, but it is easy to see that any scraper running from a single IP is prone to getting blocked. In a scenario where you need to handle thousands of fetches every day, using a professional rotating proxy service to rotate IPs is almost a must.
Otherwise, you tend to get IP blocked a lot by automatic location, usage, and bot detection algorithms.
Our rotating proxy server Proxies API provides a simple API that can solve all IP Blocking problems instantly.
Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.
The whole thing can be accessed through a simple API, as shown below, from any programming language.
In fact, you don't even have to take the pain of loading Puppeteer, as we render JavaScript behind the scenes; you can just fetch the data and parse it in any language like Node or PHP, or with any framework like Scrapy or Nutch. In all these cases, you can simply call the URL with render support like so:
curl "<http://api.proxiesapi.com/?key=API_KEY&render=true&url=https://example.com>"
We have a running offer of 1000 API calls completely free. Register and get your free API Key.