Web scraping is a technique for automatically extracting information from websites. In this comprehensive tutorial, we'll walk through an example Java program that scrapes search results data from Google Scholar.
[Screenshot: the Google Scholar search results page]

Specifically, we'll learn how to use the popular Jsoup Java library to connect to Google Scholar, send a search query, and scrape key pieces of data (title, URL, authors, and abstract text) from the search results page.
Prerequisites
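To follow along with the code examples below, you'll need:
- A Java Development Kit (JDK 11 or later, since the code uses String.repeat())
- The Jsoup library on your project's classpath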
That's it! Jsoup handles most of the heavy lifting, so we can focus on the fun data extraction parts.
Walkthrough of the Web Scraper Code
Let's break it down section by section.
Imports
We import Jsoup classes that allow connecting to web pages and selecting elements:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
Define URL and User-Agent
Next we define the Google Scholar URL we want to scrape along with a common User-Agent header:
// Define the URL of the Google Scholar search page
String url = "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=transformers&btnG=";
// Define a User-Agent header
String userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36";
Quick web scraping tip: sending a real browser's User-Agent string makes the request look like ordinary browser traffic, which helps avoid basic bot detection.
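The URL above hard-codes the query "transformers". As a minimal sketch of how the same URL could be built for any search term, the helper below percent-encodes the query before placing it in the q parameter. The class name ScholarUrlBuilder and the method build are illustrative, not part of the original program; the other query parameters are copied unchanged from the URL above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ScholarUrlBuilder {
    // Base search endpoint, taken from the URL used in this tutorial
    private static final String BASE = "https://scholar.google.com/scholar";

    // Builds a search URL for an arbitrary query (hypothetical helper).
    // URLEncoder escapes characters such as spaces and '&' so they
    // cannot break the query string.
    static String build(String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return BASE + "?hl=en&as_sdt=0%2C5&q=" + encoded + "&btnG=";
    }

    public static void main(String[] args) {
        System.out.println(build("transformers"));
        System.out.println(build("attention is all you need"));
    }
}
```

The result of build(...) can be passed to Jsoup.connect() in place of the fixed url string.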
Connect to URL and Select Elements
Inspecting the code
You can see that each search result is enclosed in a <div> element with the class gs_ri.

The magic happens in this section where we:
- Use Jsoup to connect to the Google Scholar URL
- Select all search result elements on the page with select("div.gs_ri")
// Send a GET request to the URL with the User-Agent header
Document document = Jsoup.connect(url).userAgent(userAgent).get();
// Find all the search result blocks with class "gs_ri"
Elements searchResults = document.select("div.gs_ri");
Let's break this down...
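The Jsoup.connect(url) call opens a connection to the page, userAgent(userAgent) sets the User-Agent header, and get() sends the GET request and parses the HTML that comes back.
This HTML is stored in a Document object.
The select() method then takes the CSS selector div.gs_ri and finds every div element with the class gs_ri.
All matching elements get stored in an Elements collection.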
Pro tip: Use your browser's developer tools to inspect elements and test selectors before putting them in code.
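Selectors can also be checked offline, without sending any request. The sketch below assumes Jsoup is on the classpath and runs select() against a small hand-written HTML string. The class SelectorCheck, the method countMatches, and the sample markup are made up for illustration; real Google Scholar pages are more complex.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class SelectorCheck {
    // Hand-written stand-in for a results page (assumed structure, not real Scholar markup)
    static final String SAMPLE_HTML =
        "<div class=\"gs_r\"><div class=\"gs_ri\"><h3 class=\"gs_rt\">"
      + "<a href=\"https://example.com/a\">Paper A</a></h3></div></div>"
      + "<div class=\"gs_r\"><div class=\"gs_ri\"><h3 class=\"gs_rt\">"
      + "<a href=\"https://example.com/b\">Paper B</a></h3></div></div>"
      + "<div class=\"other\">Not a result</div>";

    // Parses an HTML string and returns how many elements match the CSS selector
    static int countMatches(String html, String cssSelector) {
        Document document = Jsoup.parse(html);
        Elements matches = document.select(cssSelector);
        return matches.size();
    }

    public static void main(String[] args) {
        System.out.println(countMatches(SAMPLE_HTML, "div.gs_ri"));      // result blocks
        System.out.println(countMatches(SAMPLE_HTML, "div.gs_ri h3 a")); // title links
    }
}
```

If a selector that works here returns nothing on the live page, the page structure probably differs from what the sample assumes.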
Extract Data from Search Results
With search result elements selected, we can traverse each one and extract the inner text and attributes:
// Loop through each search result block and extract information
for (Element result : searchResults) {
    // Extract the title and URL, guarding against a title with no link
    Element titleElement = result.selectFirst("h3.gs_rt");
    String title = titleElement != null ? titleElement.text() : "N/A";
    Element linkElement = titleElement != null ? titleElement.selectFirst("a") : null;
    String resultUrl = linkElement != null ? linkElement.attr("href") : "N/A";
    // Extract the authors and publication details
    Element authorsElement = result.selectFirst("div.gs_a");
    String authors = authorsElement != null ? authorsElement.text() : "N/A";
    // Extract the abstract or description
    Element abstractElement = result.selectFirst("div.gs_rs");
    String abstractText = abstractElement != null ? abstractElement.text() : "N/A";
    // Print the extracted information
    System.out.println("Title: " + title);
    System.out.println("URL: " + resultUrl);
    System.out.println("Authors: " + authors);
    System.out.println("Abstract: " + abstractText);
    System.out.println("-".repeat(50)); // Separating search results
}
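We loop through each previously selected search result element. For the title, we select the h3 tag with class gs_rt, get .text() for the title string, and read the href attribute of the link inside it for the URL. The authors and abstract come from the div.gs_a and div.gs_rs elements in the same way, and any element that is missing falls back to "N/A".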
The scraped pieces of data are printed, with each search result separated by dashes.
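If the results are needed later in the program instead of only being printed, the four fields can be kept together in a small value class. This is a sketch using only the standard library; the class ScholarResult and its methods are hypothetical names, and the values in main are placeholders standing in for fields extracted inside the loop.

```java
import java.util.ArrayList;
import java.util.List;

class ScholarResult {
    final String title;
    final String url;
    final String authors;
    final String abstractText;

    ScholarResult(String title, String url, String authors, String abstractText) {
        // Normalize missing values to the same "N/A" fallback the loop uses
        this.title = orNa(title);
        this.url = orNa(url);
        this.authors = orNa(authors);
        this.abstractText = orNa(abstractText);
    }

    // Returns "N/A" for null or blank input, otherwise the value itself
    static String orNa(String value) {
        return (value == null || value.isBlank()) ? "N/A" : value;
    }

    // Produces the same labeled block the loop prints, ending with 50 dashes
    String format() {
        return "Title: " + title + "\n"
             + "URL: " + url + "\n"
             + "Authors: " + authors + "\n"
             + "Abstract: " + abstractText + "\n"
             + "-".repeat(50);
    }

    public static void main(String[] args) {
        List<ScholarResult> results = new ArrayList<>();
        // Placeholder values, not real scraped data
        results.add(new ScholarResult("Example title", "https://example.com/paper",
                "A Author, B Author", null));
        for (ScholarResult r : results) {
            System.out.println(r.format());
        }
    }
}
```

Inside the scraping loop, each iteration would add one ScholarResult to the list instead of calling System.out.println directly.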
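Let's quickly summarize the key concepts:
- Jsoup.connect() fetches the page, with a browser-like User-Agent set on the request
- select() and selectFirst() locate elements using CSS selectors
- text() and attr() pull out the text and attribute values we want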
This core scraper recipe can be adapted to pull data from almost any site.
Full Java Code for Scraping Google Scholar
Here is the complete code example for scraping search results data from Google Scholar:
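A sketch of how the snippets from the walkthrough fit together into one program, assuming Jsoup is on the classpath. The class name ScholarScraper and the helpers textOrNa and describeResults are illustrative names rather than part of the original snippets; the extraction is split out from main so it can be run against any parsed Document, not only a live page.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ScholarScraper {
    // URL and User-Agent from the walkthrough
    static final String SEARCH_URL =
        "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=transformers&btnG=";
    static final String USER_AGENT =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      + "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36";

    // Returns the element's text, or "N/A" when the element is missing
    static String textOrNa(Element element) {
        return element != null ? element.text() : "N/A";
    }

    // Builds the printable report for every div.gs_ri block in the document
    static String describeResults(Document document) {
        StringBuilder out = new StringBuilder();
        Elements searchResults = document.select("div.gs_ri");
        for (Element result : searchResults) {
            Element titleElement = result.selectFirst("h3.gs_rt");
            // The title may have no link, so check before reading href
            Element linkElement = titleElement != null ? titleElement.selectFirst("a") : null;
            String resultUrl = linkElement != null ? linkElement.attr("href") : "N/A";

            out.append("Title: ").append(textOrNa(titleElement)).append('\n');
            out.append("URL: ").append(resultUrl).append('\n');
            out.append("Authors: ").append(textOrNa(result.selectFirst("div.gs_a"))).append('\n');
            out.append("Abstract: ").append(textOrNa(result.selectFirst("div.gs_rs"))).append('\n');
            out.append("-".repeat(50)).append('\n'); // Separating search results
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Send a GET request to the URL with the User-Agent header
        Document document = Jsoup.connect(SEARCH_URL).userAgent(USER_AGENT).get();
        System.out.print(describeResults(document));
    }
}
```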