The Complete HTML Agility Pack Cheat Sheet in VB

HTML Agility Pack is an HTML parser for .NET. It allows easy manipulation and data extraction from HTML documents.

Getting Started

Install NuGet package:

Install-Package HtmlAgilityPack

Load HTML:

Dim html As String = "<html>...</html>"
Dim doc As HtmlDocument = New HtmlDocument()
doc.LoadHtml(html)

Select nodes:

Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div")

Get text:

Dim text As String = doc.DocumentNode.InnerText
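
The steps above can be combined into one small console program. This is a minimal sketch, assuming a console project that references the HtmlAgilityPack package; the sample markup and the module name are made up for illustration. It also shows a common pitfall: SelectNodes returns Nothing, not an empty collection, when nothing matches.

```vb
Imports System
Imports HtmlAgilityPack

Module GettingStartedSketch
    Sub Main()
        ' Hypothetical sample markup standing in for a real page.
        Dim html As String = "<html><body><div>One</div><div>Two</div></body></html>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div")
        If nodes Is Nothing Then
            Console.WriteLine("No <div> elements found")
            Return
        End If

        ' Print the text of every matched <div>.
        For Each node As HtmlNode In nodes
            Console.WriteLine(node.InnerText)
        Next
    End Sub
End Module
```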

Selecting Nodes

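By class (HTML Agility Pack has no built-in CSS selectors; use XPath, or an add-on package such as Fizzler):

doc.DocumentNode.SelectNodes("//*[contains(concat(' ', normalize-space(@class), ' '), ' header ')]")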

By XPath:

doc.DocumentNode.SelectNodes("//table")

By tag name:

doc.DocumentNode.Descendants("img")

By id:

doc.GetElementbyId("header")

Lazy collections (LINQ, evaluated as you enumerate):

Dim lazyCol = doc.DocumentNode.Descendants().Where(Function(n) n.HasClass("header"))

Querying & Extracting

Get attribute:

Dim href As String = node.GetAttributeValue("href", "")

Get text:

Dim text As String = node.InnerText

Get HTML:

Dim html As String = node.OuterHtml

Find parent and ancestors:

Dim parent As HtmlNode = node.ParentNode
Dim ancestors As IEnumerable(Of HtmlNode) = node.Ancestors()

Evaluate an XPath expression (via an XPathNavigator):

Dim linkCount As Double = CDbl(doc.DocumentNode.CreateNavigator().Evaluate("count(//a)"))
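
A short sketch of extraction in practice, assuming made-up markup: it lists each link's text and href. Two details are worth noting. GetAttributeValue returns the supplied default when the attribute is missing, and InnerText leaves entities such as &amp; encoded, so HtmlEntity.DeEntitize is used to decode them.

```vb
Imports System
Imports HtmlAgilityPack

Module ExtractLinksSketch
    Sub Main()
        ' Hypothetical sample markup: one link with an href, one without.
        Dim html As String =
            "<ul><li><a href=""/a"">First &amp; best</a></li><li><a>No href</a></li></ul>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Descendants returns an empty sequence when nothing matches, so no Nothing check is needed.
        For Each link As HtmlNode In doc.DocumentNode.Descendants("a")
            ' The second argument is the fallback used when the attribute is absent.
            Dim href As String = link.GetAttributeValue("href", "(none)")

            ' InnerText keeps entities encoded; decode them for display.
            Dim label As String = HtmlEntity.DeEntitize(link.InnerText)

            Console.WriteLine(label & " -> " & href)
        Next
    End Sub
End Module
```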

Manipulation

Add node:

doc.DocumentNode.AppendChild(HtmlNode.CreateNode("<p>Hello</p>"))

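Update text (InnerText is read-only; set InnerHtml with encoded text):

node.InnerHtml = System.Net.WebUtility.HtmlEncode("New text")

Update HTML (OuterHtml is read-only; set InnerHtml, or replace the whole node):

node.InnerHtml = "<b>New HTML</b>"
node.ParentNode.ReplaceChild(HtmlNode.CreateNode("<div>New HTML</div>"), node)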

Remove node:

node.Remove()

Add class (SetAttributeValue would overwrite existing classes):

node.AddClass("blue")
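
The manipulation calls can be seen together in a minimal sketch. The markup and ids here are invented for illustration. Text is written through InnerHtml after encoding, because InnerText and OuterHtml only have getters.

```vb
Imports System
Imports System.Net
Imports HtmlAgilityPack

Module ManipulationSketch
    Sub Main()
        ' Hypothetical sample markup.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div id=""box""><p>Old</p><span>drop me</span></div>")

        Dim box As HtmlNode = doc.GetElementbyId("box")
        Dim para As HtmlNode = box.SelectSingleNode("p")

        ' Replace the paragraph text; encode it so "&" is stored as an entity.
        para.InnerHtml = WebUtility.HtmlEncode("New & improved")

        ' Add a class without touching any classes already present.
        para.AddClass("blue")

        ' Remove the <span> and append a new paragraph to the <div>.
        box.SelectSingleNode("span").Remove()
        box.AppendChild(HtmlNode.CreateNode("<p>Hello</p>"))

        ' OuterHtml serializes the modified tree.
        Console.WriteLine(doc.DocumentNode.OuterHtml)
    End Sub
End Module
```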

Parsing HTML

From string:

doc.LoadHtml(htmlString)

From URL (HtmlDocument.Load takes a file path, so use HtmlWeb):

Dim doc As HtmlDocument = New HtmlWeb().Load(url)

From file:

doc.Load(filename)

Auto detect encoding (from the byte order mark):

doc.Load(filename, True)
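
When the bytes come from somewhere other than a file, loading from a stream with an explicit encoding is one option. The sketch below assumes UTF-8 input and uses an in-memory stream as a stand-in for a real source; the sample markup is hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports HtmlAgilityPack

Module ParsingSketch
    Sub Main()
        ' Hypothetical UTF-8 bytes standing in for a downloaded page or a file.
        Dim bytes As Byte() = Encoding.UTF8.GetBytes("<ul><li>Café</li><li>Thé</li></ul>")

        Dim doc As New HtmlDocument()

        ' HtmlDocument is not disposable, but the stream is: wrap it in Using.
        Using stream As New MemoryStream(bytes)
            doc.Load(stream, Encoding.UTF8)
        End Using

        ' The document stays usable after the stream is closed.
        For Each item As HtmlNode In doc.DocumentNode.Descendants("li")
            Console.WriteLine(item.InnerText)
        Next
    End Sub
End Module
```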

Tips

Select nodes with XPath or LINQ (CSS selectors need an add-on package)

Use InnerText for text

OuterHtml for full HTML

AppendChild to add nodes

Set OptionAutoCloseOnEnd to close unclosed tags at the end of the document instead of inline

Example

' An XML literal is an XElement; LoadHtml needs a String, so call ToString()
Dim html As String = <html>
              <body>
                <h1>Title</h1>
                <p>Hello World!</p>
              </body>
            </html>.ToString()

Dim doc As HtmlDocument = New HtmlDocument()
doc.LoadHtml(html)

Dim title As String = doc.DocumentNode.SelectSingleNode("//h1").InnerText
' Title

Dim text As String = doc.DocumentNode.SelectSingleNode("//p").InnerText
' Hello World!

Advanced Querying

XPath Axes

ancestor:: - selects all ancestors (parent, grandparent, etc)

descendant:: - selects all descendants (children, grandchildren, etc)

following-sibling:: - selects all siblings after the current node
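
A brief sketch of the three axes, run from a context node instead of the document root. The markup and ids are invented for illustration.

```vb
Imports System
Imports HtmlAgilityPack

Module XPathAxesSketch
    Sub Main()
        ' Hypothetical sample markup.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div id=""outer""><ul><li id=""first"">A</li><li>B</li><li>C</li></ul></div>")

        Dim first As HtmlNode = doc.GetElementbyId("first")

        ' ancestor:: walks up from the context node; [1] picks the nearest match.
        Dim nearestDiv As HtmlNode = first.SelectSingleNode("ancestor::div[1]")
        Console.WriteLine(nearestDiv.Id)

        ' following-sibling:: selects later siblings under the same parent.
        Dim later As HtmlNodeCollection = first.SelectNodes("following-sibling::li")
        If later IsNot Nothing Then
            For Each item As HtmlNode In later
                Console.WriteLine(item.InnerText)
            Next
        End If

        ' descendant:: selects children, grandchildren, and so on.
        Dim allItems As HtmlNodeCollection = nearestDiv.SelectNodes("descendant::li")
        If allItems IsNot Nothing Then
            Console.WriteLine(allItems.Count)
        End If
    End Sub
End Module
```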

Query by Node Type

doc.DocumentNode.SelectNodes("//*[self::p or self::div]")

Predicates

//div[@class='header']
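
Predicates can test attributes, positions, and function results. The sketch below tries three common forms against made-up markup; it assumes each query matches, so real code should check for Nothing before using the result.

```vb
Imports System
Imports HtmlAgilityPack

Module PredicatesSketch
    Sub Main()
        ' Hypothetical sample markup.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div class=""header"">Top</div>" &
                     "<div class=""item"">1</div><div class=""item"">2</div>" &
                     "<a href=""https://example.com"">secure</a><a href=""/local"">local</a>")

        Dim root As HtmlNode = doc.DocumentNode

        ' Attribute equality: the class attribute must be exactly "header".
        Dim header As HtmlNode = root.SelectSingleNode("//div[@class='header']")
        Console.WriteLine(header.InnerText)

        ' Position: parentheses apply [2] to the whole result set, not per parent.
        Dim secondItem As HtmlNode = root.SelectSingleNode("(//div[@class='item'])[2]")
        Console.WriteLine(secondItem.InnerText)

        ' Function: keep only links whose href starts with "https://".
        Dim secure As HtmlNode = root.SelectSingleNode("//a[starts-with(@href, 'https://')]")
        Console.WriteLine(secure.GetAttributeValue("href", ""))
    End Sub
End Module
```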

Advanced Manipulation

Insert Nodes

' Call these on the parent of refNode; refNode must be one of its children
refNode.ParentNode.InsertBefore(newNode, refNode)
refNode.ParentNode.InsertAfter(newNode, refNode)

Clone Nodes

Dim clone As HtmlNode = node.Clone()

Move & Remove Nodes

node.Remove()
refNode.ParentNode.InsertBefore(node, refNode)
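
Moving a node is a remove followed by an insert under the target parent. A minimal sketch, assuming a made-up list whose last item should become the first:

```vb
Imports System
Imports HtmlAgilityPack

Module MoveNodeSketch
    Sub Main()
        ' Hypothetical sample markup.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<ul><li id=""a"">A</li><li id=""b"">B</li><li id=""c"">C</li></ul>")

        Dim list As HtmlNode = doc.DocumentNode.SelectSingleNode("//ul")

        ' Look up both nodes before changing the tree.
        Dim itemA As HtmlNode = doc.GetElementbyId("a")
        Dim itemC As HtmlNode = doc.GetElementbyId("c")

        ' Detach the node from its current position...
        itemC.Remove()

        ' ...and reinsert it before the reference node, calling InsertBefore on the parent.
        list.InsertBefore(itemC, itemA)

        Console.WriteLine(list.OuterHtml)
    End Sub
End Module
```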

Handling Documents

Loading

doc = New HtmlWeb().Load(url)   ' URLs go through HtmlWeb
doc.Load(filename)
doc.LoadHtml(htmlString)
doc.Load(stream)
doc.Load(textReader)

Saving

doc.Save(filename)

Options

doc.OptionOutputAsXml = True
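
Save also accepts a TextWriter, which is handy for inspecting the serialized output without touching the disk. The sketch below assumes made-up markup and sets OptionOutputAsXml so the output is written as XML instead of HTML.

```vb
Imports System
Imports System.IO
Imports HtmlAgilityPack

Module SavingSketch
    Sub Main()
        Dim doc As New HtmlDocument()

        ' Ask the serializer for XML-style output.
        doc.OptionOutputAsXml = True

        ' Hypothetical sample markup with a void <br> element.
        doc.LoadHtml("<p>First<br>Second</p>")

        ' Save into an in-memory writer and print the result.
        Using writer As New StringWriter()
            doc.Save(writer)
            Console.WriteLine(writer.ToString())
        End Using
    End Sub
End Module
```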

Working with Fragments

doc.LoadHtml(htmlFragment)
Dim div As HtmlNode = doc.CreateElement("div")

Best Practices

Reuse HtmlDocument instances if possible for better performance

HtmlDocument is not disposable; dispose the streams and readers you load from instead

Avoid excessive XPath queries - cache result sets
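
One way to apply the caching advice: run the document-wide query once, keep the result, and use relative XPath inside the loop. A minimal sketch with a made-up table:

```vb
Imports System
Imports HtmlAgilityPack

Module CachedQuerySketch
    Sub Main()
        ' Hypothetical sample markup.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<table><tr><td>Ann</td><td>30</td></tr><tr><td>Bob</td><td>25</td></tr></table>")

        ' Run the document-wide query once and keep the result.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//tr")
        If rows Is Nothing Then Return

        For Each row As HtmlNode In rows
            ' Relative XPath ("td", not "//td") searches only this row's children.
            Dim cells As HtmlNodeCollection = row.SelectNodes("td")
            If cells IsNot Nothing AndAlso cells.Count >= 2 Then
                Console.WriteLine(cells(0).InnerText & ": " & cells(1).InnerText)
            End If
        Next
    End Sub
End Module
```

Writing "//td" inside the loop would search the whole document again on every iteration, which is both slower and wrong for per-row extraction.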

Use it for scraping static HTML, where it avoids the overhead of loading a full browser (it does not run JavaScript)

Additional Tips

doc.OptionFixNestedTags = True
Dim detected As System.Text.Encoding = doc.DetectEncoding(stream)
' Integrate AngleSharp
' Support .NET Framework + Core
