The Ultimate Cheat Sheet for HtmlAgilityPack in C#

Oct 31, 2023 · 4 min read

HtmlAgilityPack is a .NET library for fast, tolerant parsing and manipulation of HTML documents. This cheat sheet is a compact reference for the most common tasks: loading, querying, traversing, and modifying HTML in C#.


PM> Install-Package HtmlAgilityPack
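
With the package installed, a minimal end-to-end sketch (parse a string, query it with XPath, read the text) looks like this:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><p>One</p><p>Two</p></body></html>");

        // SelectNodes returns null when nothing matches, so guard before looping
        var paras = doc.DocumentNode.SelectNodes("//p");
        if (paras != null)
        {
            foreach (var p in paras)
                Console.WriteLine(p.InnerText); // prints "One" then "Two"
        }
    }
}
```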

Loading HTML

From string:

var doc = new HtmlDocument();
doc.LoadHtml(html);

From file:

var doc = new HtmlDocument();
doc.Load("page.html");

From stream:

using (var fs = File.OpenRead("page.html"))
{
  var doc = new HtmlDocument();
  doc.Load(fs);
}

From web:

var web = new HtmlWeb();
var doc = web.Load("https://example.com/");

Custom options:

var doc = new HtmlDocument();
doc.OptionFixNestedTags = true;   // repair badly nested tags
doc.OptionAutoCloseOnEnd = true;  // auto-close unclosed tags at end of input
doc.LoadHtml(html);

Helper method:

private static HtmlDocument LoadHtml(string html) {
  var doc = new HtmlDocument();
  doc.LoadHtml(html);
  return doc;
}

Selecting Nodes

By CSS selector (requires the HtmlAgilityPack.CssSelectors NuGet package; the core library only supports XPath):

var paras = doc.DocumentNode.QuerySelectorAll("p.intro");

By XPath:

var items = doc.DocumentNode.SelectNodes("//ul/li");

Get single element by ID:

var content = doc.GetElementbyId("content");

Get elements by tag name:

var divs = doc.DocumentNode.Descendants("div");

Evaluate XPath:

var nav = doc.CreateNavigator();
var result = nav.Evaluate("count(//div/p)");

Looping Nodes

For each loop:

foreach (var item in items) {
  // ...
}

For loop:

for (int i = 0; i < items.Count; i++) {
  var item = items[i];
  // ...
}

While loop:

int i = 0;
while (i < nodes.Count) {
  var node = nodes[i++];
  // ...
}

Modifying Nodes

Get attribute value:

var cls = el.GetAttributeValue("class", null);

Set attribute value:

el.SetAttributeValue("class", "blue");

Get inner text:

var text = el.InnerText;

Set inner text (HtmlNode.InnerText is read-only, so replace the children with a text node):

el.RemoveAllChildren();
el.AppendChild(doc.CreateTextNode("Hello World"));

Get inner HTML:

var html = el.InnerHtml;

Set inner HTML:

el.InnerHtml = "<strong>Hello</strong>";

Creating Nodes

Create element:

var el = doc.CreateElement("p");

Create text node:

var text = doc.CreateTextNode("Hello");

Create node from an HTML string (HtmlAgilityPack has no CreateDocumentFragment or ParseFragment; the static HtmlNode.CreateNode parses markup into a node):

var frag = HtmlNode.CreateNode("<b>Hi!</b>");

Inserting Nodes

Append child element:

parent.AppendChild(el);

Insert before element:

parent.InsertBefore(newEl, el);

Insert after element:

parent.InsertAfter(newEl, el);

Prepend child element:

parent.PrependChild(el);

Insert HTML adjacent to an element (there is no InsertAdjacentHtml in HtmlAgilityPack; create a node and insert it relative to the target):

var newNode = HtmlNode.CreateNode("<p>Hello</p>");
el.ParentNode.InsertBefore(newNode, el);

Removing Nodes

Remove single element:

el.Remove();

Remove all children:

el.RemoveAllChildren();

Remove nodes by ID:

doc.DocumentNode.Descendants("p")
   .Where(p => p.Id == "intro")
   .ToList()
   .ForEach(p => p.Remove());

Remove all nodes:

doc.DocumentNode.RemoveAllChildren();

Loading Sub-Documents

Parse HTML fragment:

var frag = HtmlNode.CreateNode("<b>Hi!</b>");

Append parsed fragment:

parent.AppendChild(frag);

Load partial document:

var newDoc = new HtmlDocument();
newDoc.LoadHtml(partialHtml);


Namespaces

HtmlAgilityPack has no RegisterNamespace API; its XPath engine does not resolve namespace prefixes. Select namespaced nodes by local name instead:

var nodes = doc.DocumentNode
    .SelectNodes("//*[local-name()='svg']");

DOM Traversal

Parent node:

var parent = node.ParentNode;

Child nodes:

var children = parent.ChildNodes;

Next sibling:

var nextSibling = node.NextSibling;

Previous sibling:

var prevSibling = node.PreviousSibling;
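
The properties above compose into a full tree walk. As a sketch (this walker is my own, not a library utility), here is a recursive traversal that prints the element tree with indentation:

```csharp
using System;
using HtmlAgilityPack;

class Walker
{
    // Recursively print each element node, indented by depth;
    // text and comment nodes are skipped.
    static void Walk(HtmlNode node, int depth)
    {
        if (node.NodeType == HtmlNodeType.Element)
            Console.WriteLine(new string(' ', depth * 2) + node.Name);

        foreach (var child in node.ChildNodes)
            Walk(child, depth + 1);
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><p><b>Hi</b></p><p>Bye</p></div>");
        Walk(doc.DocumentNode, 0);
    }
}
```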

Caching XPath Queries

Define frequently used queries in one place and reuse them instead of scattering string literals:

// Reusable query
private static readonly string ParasXpath = "//p";

var nodes = doc.DocumentNode.SelectNodes(ParasXpath);

// Later...

var moreNodes = doc.DocumentNode.SelectNodes(ParasXpath);


Checking Parse Errors

HtmlAgilityPack does not validate against a DTD or XSD schema; instead, inspect the parse errors recorded while loading:

doc.OptionCheckSyntax = true; // on by default
doc.LoadHtml(html);

foreach (var error in doc.ParseErrors) {
  Console.WriteLine($"Line {error.Line}: {error.Reason}");
}


Encoding

Load as UTF-8:

doc.OptionDefaultStreamEncoding = Encoding.UTF8;
doc.Load("page.html");

Special characters:

doc.DocumentNode.SelectNodes("//p/text()[contains(., 'en dash –')]");

LINQ Integration

LINQ query:

var paras = from p in doc.DocumentNode.Descendants("p")
            where !p.HasClass("intro")
            select p.InnerText;

Extension methods:

var texts = doc.DocumentNode.Descendants("p")
   .Where(p => !p.HasClass("intro"))
   .Select(p => p.InnerText);

Real World Use Cases

  • Web scraping scripts
  • Parsers, converters, transformers
  • Automated testing bots
  • Site scrapers and crawlers
  • Architect headless sites
  • Data extraction from reports
  • PDF generation
  • Web automation scripts
  • Comparing HTML documents
  • Building HTML editors
  • Feed readers
  • Web screenshot tools
  • Archiving sites
  • Analyzing SEO metadata
  • Processing HTML datasets

This covers the core techniques for parsing, traversing, and modifying HTML documents with HtmlAgilityPack in C#!
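
As a closing sketch, here is how several of the pieces above combine in a typical scraping script. The URL is a placeholder and the code performs a live HTTP request, so treat it as an illustration rather than a ready-made tool:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Scraper
{
    static void Main()
    {
        // HtmlWeb fetches and parses the page in one step.
        var web = new HtmlWeb();
        var doc = web.Load("https://example.com/");

        // Collect link text and href attributes via LINQ.
        var links = doc.DocumentNode
            .Descendants("a")
            .Select(a => new
            {
                Text = a.InnerText.Trim(),
                Href = a.GetAttributeValue("href", "")
            });

        foreach (var link in links)
            Console.WriteLine($"{link.Text} -> {link.Href}");
    }
}
```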
