Stories from the Web Crawling trenches in XPath

The Complete Libxml2 C++ Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

Libxml2 is an XML processing library written in C for use in C/C++ applications. It provides DOM, SAX, XMLReader, XPath and XPointer support.

A Guide to Using XPath with BeautifulSoup for Powerful Web Scraping

Author: Mohan Ganesan

Date: Oct 6, 2023

XPath is a powerful querying language for selecting elements in XML and HTML documents, making web scraping with BeautifulSoup more robust and flexible.

Web Scraping in CSharp - The Ultimate Guide

Author: Mohan Ganesan

Date: Mar 24, 2024

Learn web scraping with C# using powerful libraries like HtmlAgilityPack and AngleSharp. Understand the importance of XPath and CSS selectors for extracting data from HTML. Overcome challenges like dynamic content and anti-scraping measures. Rotate user agents and headers to mimic human behavior and avoid detection.

The Complete HTML Agility Pack Cheat Sheet in VB

Author: Mohan Ganesan

Date: Oct 31, 2023

HTML Agility Pack is an HTML parser for .NET that allows easy manipulation and data extraction from HTML documents.

Web Scraping Wikipedia with CSharp

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape data from Wikipedia using C# and the HtmlAgilityPack library. Extract information from websites for data collection, analysis, and automation.

Downloading Images from a Website with CSharp and HtmlAgilityPack

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use C# and HtmlAgilityPack to download images from a Wikipedia page and extract data from HTML tables.

Scraping New York Times News Headlines in CSharp

Author: Mohan Ganesan

Date: Dec 6, 2023

Automate data extraction from websites using C# and HTML Agility Pack for web scraping. Use an HTTP client for making requests and XPath for selecting HTML elements.

Can BeautifulSoup use XPath?

Author: Mohan Ganesan

Date: Feb 5, 2024

BeautifulSoup and XPath can complement each other to create powerful web scrapers, but be mindful of the performance tradeoff.

Scraping Real Estate Listings From Realtor with C++

Author: Mohan Ganesan

Date: Jan 9, 2024

Web scraping tutorial in C++ using libcurl and libxml2 to extract data from Realtor.com listings.

Scraping Real Estate Listings From Realtor with PHP

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to scrape real estate listings from Realtor.com using PHP and cURL. Extract data using DOMDocument and XPath.

Scraping Yelp Business Listings with PHP

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping guide for extracting data from Yelp business listings using PHP and XPath.

Scraping Real Estate Listings from Realtor with CSharp

Author: Mohan Ganesan

Date: Jan 9, 2024

Scrape real estate listing data from Realtor.com using C# and the HtmlAgilityPack library. Extract information like broker name, price, beds, baths, sqft, lot size, and address.

Scraping Wikipedia Tables in Objective-C for Beginners

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape a website in just 34 lines of Objective-C using the TFHpple library. Learn how to make HTTP requests, parse HTML content, extract data from a table, and clean and process the scraped content.

Scraping Business Listings from Yelp with Objective C

Author: Mohan Ganesan

Date: Dec 6, 2023

Scraping business listings from Yelp using Objective-C and proxies for data extraction.

Mastering XPath Locators for Reliable Selenium Tests

Author: Mohan Ganesan

Date: Jan 9, 2024

Locators in test automation allow for the identification of elements on a web page. XPath locators are robust and flexible, making them ideal for scalable test automation. By mastering XPath syntax and operators, test engineers can construct dynamic locators to handle complex scenarios. Integrating XPath locators into Selenium scripts requires understanding the difference between finding a single element and multiple elements. Best practices include reusing locators through the Page Object Model pattern and handling exceptions carefully. Troubleshooting XPath issues involves verifying locator accuracy, outputting attribute values, and using more resilient variations. Overall, mastering XPath locators is crucial for successful UI test automation using Selenium.

How to Select Elements by Text in XPath

Author: Mohan Ganesan

Date: Jan 9, 2024

XPath is used for navigating XML and HTML documents in web scraping. It can select elements based on their text content, using either the contains() function for a substring match or an equality test for an exact match.
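
The difference between an exact and a substring text match can be sketched with Python's standard library. This is a minimal illustration, not the linked article's code: the markup and variable names are hypothetical, and it assumes well-formed input. ElementTree implements only a subset of XPath, so the exact match uses its `[.='text']` form and the contains() case is emulated with a Python filter; a full XPath engine would accept the expressions shown in the comments directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for illustration.
HTML = """
<html>
  <body>
    <a href="/page/2">Next</a>
    <a href="/page/3">Next page</a>
    <a href="/about">About</a>
  </body>
</html>
"""

root = ET.fromstring(HTML)

# Exact match. In full XPath: //a[text()='Next']
# ElementTree's XPath subset spells this as [.='Next'] (Python 3.7+).
exact = root.findall(".//a[.='Next']")

# Substring match. In full XPath: //a[contains(text(), 'Next')]
# ElementTree does not implement contains(), so the filter runs in Python.
partial = [a for a in root.iter("a") if "Next" in (a.text or "")]

exact_hrefs = [a.get("href") for a in exact]
partial_hrefs = [a.get("href") for a in partial]

print(exact_hrefs)
print(partial_hrefs)
```

The exact match returns only the link whose text is precisely "Next", while the substring match also picks up "Next page". That is why contains() is usually the more forgiving choice when link text varies slightly between pages.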

Scraping Yelp Business Listings using CSharp

Author: Mohan Ganesan

Date: Dec 6, 2023

Yelp is a popular review site with over 200 million reviews. This article explains how to scrape Yelp using proxies and HTML parsing with XPath.

Do I need to learn HTML for web scraping?

Author: Mohan Ganesan

Date: Feb 20, 2024

HTML knowledge is useful but not necessary for web scraping. Tools like BeautifulSoup and selector gadgets can be used to extract data without deep HTML knowledge.
