The Ultimate HTML::TreeBuilder Cheatsheet in Perl

Oct 31, 2023 · 4 min read

HTML::TreeBuilder is a Perl module that parses HTML documents into a tree of HTML::Element objects. The tree can then be searched, modified, and serialized back to HTML.


To install HTML::TreeBuilder:

perl -MCPAN -e 'install HTML::TreeBuilder'

Or add it to your Perl project's cpanfile and run cpanm --installdeps . in the project directory:

requires 'HTML::TreeBuilder';

Basic Usage

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse_file('page.html');  # placeholder path

my $root = $tree->root;

This parses the HTML file and stores the document tree in $tree. The tree object is itself the root <html> element, so $tree->root returns that same object.
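
The same steps can be tried without a file on disk. The following is a minimal sketch that parses a string with new_from_content; the sample markup is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(refaddr);

# Hypothetical sample markup, used only for illustration
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p class="intro">Hello</p></body></html>';

# new_from_content() creates the tree, parses the string, and finishes parsing
my $tree = HTML::TreeBuilder->new_from_content($html);

# The tree object is itself the root <html> element
my $root_tag  = $tree->tag;
my $same_node = refaddr($tree->root) == refaddr($tree) ? 1 : 0;
print "root tag: $root_tag\n";

$tree->delete;  # release the tree when finished
```

Calling delete at the end is optional on recent HTML-Tree releases but harmless, and it frees memory promptly in long-running scripts.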

Walking the Tree

To access child nodes:

my @children = $root->content_list;

To get a specific child by index (content_list returns a list, not an array reference):

my $child2 = ($root->content_list)[1];

Loop through children:

foreach my $child ($root->content_list) {
  # do something with $child
}

Navigate to parent:

my $parent = $node->parent;
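
One detail worth knowing when walking the tree is that text children are plain strings, not objects, so calling methods on them fails. A small sketch, using made-up markup, that tells the two apart:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, used only for illustration
my $tree = HTML::TreeBuilder->new_from_content(
  '<html><head><title>T</title></head>'
  . '<body><p>one <b>two</b> three</p></body></html>'
);

my $p = $tree->look_down(_tag => 'p');

# Element children are HTML::Element objects; text children are plain strings
my @kinds;
foreach my $child ($p->content_list) {
  push @kinds, ref $child ? 'element:' . $child->tag : 'text';
}

# Index into the list, then step back up to the parent
my $second     = ($p->content_list)[1];
my $parent_tag = $second->parent->tag;

print join(', ', @kinds), "\n";
$tree->delete;
```

Checking ref $child before calling methods such as tag or attr avoids errors on text nodes.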

Common Node Methods


Get node's tag name:

my $tag = $node->tag;


Get node's inner text:

my $text = $node->as_text;


Get attribute value by name:

my $class = $node->attr('class');


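Add child node:

$node->push_content($child);


Insert child at beginning:

$node->unshift_content($child);


Remove node:

$node->delete;


Replace node with new node:

$node->replace_with($new_node);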

Searching the Tree


Find node recursively:

my $img = $root->look_down(_tag => 'img');


Find all nodes by tag name:

my @divs = $root->find_by_tag_name('div');


Find nodes by attribute value:

my @figs = $root->find_by_attribute('class', 'figure');
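
The search methods behave differently in scalar and list context, which is easiest to see side by side. A brief sketch with made-up markup:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, used only for illustration
my $tree = HTML::TreeBuilder->new_from_content(
  '<html><body>'
  . '<div class="figure"><img src="a.png"></div>'
  . '<div class="note"><img src="b.png"></div>'
  . '</body></html>'
);

# Scalar context: look_down returns the first match in document order
my $first_img = $tree->look_down(_tag => 'img');
my $first_src = $first_img->attr('src');

# List context: all matches
my @divs = $tree->find_by_tag_name('div');

# look_down can combine a tag name with attribute criteria
my @figs = $tree->look_down(_tag => 'div', class => 'figure');

# find_by_attribute takes the attribute name and value as two arguments
my @by_attr = $tree->find_by_attribute('class', 'figure');

print "first image: $first_src\n";
$tree->delete;
```

For most lookups look_down is the more flexible choice, since several criteria can be combined in one call.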

Modifying the Tree


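Change node's tag:

$node->tag('span');  # 'span' is only an example tag


Set node's text content:

$node->delete_content;
$node->push_content("New text");


Set attribute value:

$node->attr(class => 'blue');


Add child to end:

$node->push_content($child);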

Outputting HTML


Serialize tree back to HTML:

print $tree->as_HTML;


Output text content only:

print $tree->as_text;
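
To see modification and output working together, here is a short round-trip sketch. The markup and the id value are invented for the example; the calls are standard HTML::Element methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, used only for illustration
my $tree = HTML::TreeBuilder->new_from_content(
  '<html><body><p id="msg" class="red">Old text</p></body></html>'
);

my $node = $tree->look_down(id => 'msg');

$node->tag('div');                # change the tag name
$node->attr(class => 'blue');     # set an attribute
$node->delete_content;            # drop the existing children
$node->push_content('New text');  # add a text child at the end

my $new_tag = $node->tag;
my $html    = $tree->as_HTML;     # serialized markup
my $text    = $tree->as_text;     # text content only

print $html;
$tree->delete;
```

Passing arguments to as_HTML, for example as_HTML(undef, '  ') for indented output, changes how the markup is serialized.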

Full Example

Here is an example script that loads HTML, finds all <img> tags, and sets their width to 100:

use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse_file('page.html');  # placeholder path

my @imgs = $tree->find_by_tag_name('img');

foreach my $img (@imgs) {
  $img->attr(width => 100);
}

print $tree->as_HTML;

Complex Tree Manipulation

More complex traversal and modification of the tree:

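# Recursively find all <td> elements
my @cells = $root->look_down(sub {
  my $node = shift;
  return $node->tag eq 'td';
});

# Prune a subtree: detach() unlinks it from the tree but keeps it usable
my $pruned = ($root->content_list)[2];
$pruned->detach if ref $pruned;

# Swap two node positions using a temporary placeholder element
my $tmp = HTML::Element->new('span');
$n1->replace_with($tmp);
$n2->replace_with($n1);
$tmp->replace_with($n2);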
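
A runnable sketch of swapping and pruning on a small table follows. The placeholder-element approach to swapping is one possible technique, not necessarily the one the original snippet used, and the markup is invented.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, used only for illustration
my $tree = HTML::TreeBuilder->new_from_content(
  '<html><body><table><tr><td>A</td><td>B</td><td>C</td></tr></table></body></html>'
);

# A code reference as criterion receives each element as its first argument
my @cells = $tree->look_down(sub { $_[0]->tag eq 'td' });
my ($n1, $middle, $n2) = @cells;

# Swap the first and last cell via a temporary placeholder
my $placeholder = HTML::Element->new('span');
$n1->replace_with($placeholder);
$n2->replace_with($n1);
$placeholder->replace_with($n2);
$placeholder->delete;

my $after_swap = join '', map { $_->as_text } $tree->look_down(_tag => 'td');

# Prune the middle cell: detach() unlinks it, delete() frees it
$middle->detach;
$middle->delete;

my $after_prune = join '', map { $_->as_text } $tree->look_down(_tag => 'td');

print "$after_swap $after_prune\n";
$tree->delete;
```

The swap assumes that neither node is an ancestor of the other; replace_with refuses to replace a node with its own parent.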

Custom Parsers and Handlers

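HTML::TreeBuilder is itself a subclass of HTML::Parser, so no separate parser object is needed. Parsing behavior is tuned with options set before parsing:

# Keep constructs that are dropped by default when parsing messy markup
my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);  # keep unrecognized tags in the tree
$tree->store_comments(1);  # keep comments as ~comment nodes
$tree->parse($html);
$tree->eof;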
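
The effect of a parser option is easiest to see by parsing the same input twice. This sketch compares the default behavior with store_comments enabled; the markup is made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, used only for illustration
my $html = '<html><body><!-- note --><p>Hi</p></body></html>';

# Default settings: comments are not stored in the tree
my $plain       = HTML::TreeBuilder->new_from_content($html);
my @none        = $plain->look_down(_tag => '~comment');
my $plain_count = scalar @none;

# Options must be set before any content is parsed
my $tree = HTML::TreeBuilder->new;
$tree->store_comments(1);
$tree->parse($html);
$tree->eof;

# Stored comments appear as pseudo-elements tagged ~comment,
# with the comment body in their 'text' attribute
my @comments     = $tree->look_down(_tag => '~comment');
my $comment_text = @comments ? $comments[0]->attr('text') : undef;

print "comments kept: ", scalar(@comments), "\n";
$_->delete for $plain, $tree;
```

Because options only affect content parsed afterwards, new_from_content cannot be used when non-default options are needed; use new, set the options, then parse and eof.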

Performance and Memory Optimization

Avoid retaining the entire tree in memory:

# Discard the tree after extracting info
my $div = $tree->look_down(sub {
  my $node = shift;
  return $node->tag eq 'div';
});
my $info = $div ? $div->as_text : undef;
$tree->delete;  # frees the whole tree
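
One way to keep trees short-lived is to wrap parsing in a function that returns only plain data. The helper name div_texts below is hypothetical, as is the sample markup.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns the trimmed text of every <div>,
# and frees the tree before returning so only strings survive
sub div_texts {
  my ($html) = @_;
  my $tree  = HTML::TreeBuilder->new_from_content($html);
  my @texts = map { $_->as_trimmed_text } $tree->look_down(_tag => 'div');
  $tree->delete;
  return @texts;
}

# Hypothetical sample markup, used only for illustration
my @texts = div_texts(
  '<html><body><div> first </div><div>second</div></body></html>'
);

print join('|', @texts), "\n";
```

Returning strings rather than element objects matters here, because elements become unusable once their tree has been deleted.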


Real-World Use Cases

Scraping content from HTML:

# Extract article content
# Note: HTML5 tags such as <article> may be skipped unless
# $tree->ignore_unknown(0) was set before parsing
my $article = $root->look_down(_tag => 'article');

my $text = $article ? $article->as_text : '';
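
A slightly fuller scraping sketch is shown below. It assumes the content lives in a div with id "content" rather than an HTML5 article element, and the page markup is invented for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page markup, used only for illustration
my $html = <<'HTML';
<html><head><title>News</title></head>
<body>
<div id="nav"><a href="/">Home</a></div>
<div id="content"><h1>Headline</h1><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Narrow the search to the content container first
my $content = $tree->look_down(_tag => 'div', id => 'content')
  or die "content container not found\n";

my $headline   = $content->look_down(_tag => 'h1')->as_trimmed_text;
my @paragraphs = map { $_->as_trimmed_text } $content->look_down(_tag => 'p');
my @links      = map { $_->attr('href') } $tree->look_down(_tag => 'a');

print "$headline\n";
print "  $_\n" for @paragraphs;
$tree->delete;
```

Searching from the container instead of the root keeps navigation and footer text out of the results.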

Using HTML::TreeBuilder for templating:

# Template system

my $template = HTML::TreeBuilder->new;

# ... logic to fill template ...

my $main = $template->find_by_attribute(id => 'main');

print $template->as_HTML;
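
As a rough illustration of the templating idea, the sketch below fills placeholder elements from a hash. The template markup, the ids, and the %data structure are all assumptions made for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical template with two placeholder elements
my $template = HTML::TreeBuilder->new_from_content(
  '<html><body><h1 id="title"></h1><ul id="main"></ul></body></html>'
);

# Hypothetical data to render
my %data = (title => 'Fruit', items => ['apple', 'pear & plum']);

# Fill the heading with plain text
$template->look_down(id => 'title')->push_content($data{title});

# Build one <li> per item and append it to the list
my $main = $template->look_down(id => 'main');
foreach my $item (@{ $data{items} }) {
  my $li = HTML::Element->new('li');
  $li->push_content($item);
  $main->push_content($li);
}

# as_HTML escapes special characters such as & in text nodes
my $out = $template->as_HTML;
print $out;
$template->delete;
```

Because text is inserted as nodes rather than concatenated into markup, escaping is handled at serialization time.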

Tips and Tricks

  • Check if a node has children:
    if ($node->content_list) {
      # has children
    }
  • Remove all children:
    $node->delete_content;
  • Get first/last child:
    my $first = ($node->content_list)[0];
    my $last  = ($node->content_list)[-1];

Comparison with Mojo::DOM

HTML::TreeBuilder                      | Mojo::DOM
Maintains parent/child relationships   | No persistent structure
Modifying original tree                | Parsed copy, original unchanged
Heavier memory usage                   | Lower memory footprint
Straightforward DOM interface          | CSS selector-based methods

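Error Handling

# Wrap in eval block
my $tree = HTML::TreeBuilder->new;
eval {
  # parse_file returns undef if the file cannot be opened; $file is a placeholder
  defined $tree->parse_file($file) or die "$!\n";
};
if ($@) {
  die "Parse error: $@";
}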
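
If the caller should decide what to do about a failure, the error can be returned instead of rethrown. In the sketch below, the function name parse_html_file and the missing path are hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns ($tree, undef) on success
# or (undef, $error_message) on failure
sub parse_html_file {
  my ($path) = @_;
  my $tree = HTML::TreeBuilder->new;
  my $ok = eval {
    # parse_file returns undef when the file cannot be opened
    defined $tree->parse_file($path) or die "cannot read '$path': $!\n";
    1;
  };
  return ($tree, undef) if $ok;

  my $err = $@ || 'unknown error';
  $tree->delete;
  return (undef, $err);
}

# A path that is assumed not to exist
my ($tree, $err) = parse_html_file('/no/such/dir/missing.html');
print defined $tree ? "parsed\n" : "failed: $err";
```

Capturing the result of eval in $ok, rather than testing $@ afterwards, avoids cases where $@ is cleared before it is inspected.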
