The Ultimate DOMDocument Cheat Sheet for PHP

DOMDocument lets you parse, query, and modify HTML and XML documents in PHP. This cheat sheet is a quick reference for the most common DOMDocument tasks.

Capabilities Covered

Loading Documents

Selecting Nodes

Looping Elements

Creating Elements

Inserting Elements

Removing Elements

Modifying Elements

Namespaces

DOM Events

Cloning Nodes

Outputting HTML

Validation

Optimization

Real World Use Cases

Loading Documents

Initialize DOMDocument and load markup:

From string:

$dom = new DOMDocument;
$dom->loadHTML('<html><body/></html>');

From file:

$dom->loadHTMLFile('page.html');

From URL (requires allow_url_fopen; use load() instead when the target is XML):

$dom->loadHTMLFile('http://example.com');

Helper function:

function loadHTML(string $html) : DOMDocument {
  $dom = new DOMDocument;
  $dom->loadHTML($html);
  return $dom;
}

Load XML:

$dom->loadXML($xml);

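Force UTF-8 (without a declared charset, loadHTML() may misread multibyte text):

$dom->loadHTML('<?xml encoding="UTF-8">' . $html);

Skip the implied <html>/<body> wrappers and the default doctype:

$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

Suppress parse warnings (call before loading; read them later with libxml_get_errors()):

libxml_use_internal_errors(true);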
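
As a rough sketch of how the loading calls above fit together, the helper below buffers parser messages, loads a string, and hands back both the document and the messages. The name loadHtmlQuietly is hypothetical and not part of PHP; only the libxml_* functions and DOMDocument methods are real.

```php
<?php
// Hypothetical helper: parse HTML and collect libxml messages instead of emitting warnings.
function loadHtmlQuietly(string $html): array {
    $previous = libxml_use_internal_errors(true); // buffer parser errors
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    return [$dom, $messages];
}

[$dom, $messages] = loadHtmlQuietly(
    '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>'
);
$title = $dom->getElementsByTagName('title')->item(0)->textContent;
echo $title, PHP_EOL;
```

Restoring the previous error setting keeps the helper from changing behavior elsewhere in the script.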

Selecting Nodes

Query DOM nodes using XPath expressions (DOMXPath does not accept CSS selectors, so translate them first):

CSS selector h2 written as XPath:

$selector = new DOMXPath($dom);
$headings = $selector->query('//h2');

All paragraphs:

$paras = $selector->query('//p');

Get element by ID:

$el = $dom->getElementById('header');

By tag name:

$items = $dom->getElementsByTagName('li');

Document node:

$doc = $selector->document;

Document root:

$root = $selector->document->documentElement;
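
To illustrate the selection calls above, here is a small sketch that scopes an XPath query to one element and reads an attribute from each match. The markup and variable names are made up for the example.

```php
<?php
$html = '<html><body><div id="nav"><a href="/a">A</a><a href="/b">B</a></div>'
      . '<a href="/c">C</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Find the container, then search only inside it by passing it as the context node.
$nav = $xpath->query('//div[@id="nav"]')->item(0);

$hrefs = [];
foreach ($xpath->query('.//a[@href]', $nav) as $link) {
    $hrefs[] = $link->getAttribute('href');
}

// evaluate() returns scalar XPath results; count() comes back as a float.
$total = $xpath->evaluate('count(//a)');

print_r($hrefs);
```

The leading dot in './/a[@href]' is what keeps the second query relative to the context node; without it the search would start from the document root again.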

Looping Elements

Iterate through node lists:

foreach($headings as $heading) {
  // ...
}

Indexed loop:

for($i = 0; $i < $items->length; $i++) {
  $item = $items->item($i);
}

While loop:

$i = 0;
while($node = $nodelist->item($i++)) {
  // ...
}

Convert to array:

$itemsArray = iterator_to_array($items);
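
The looping styles above can be compared on a tiny list. This sketch, with made-up markup, collects the text of each item once with foreach and once through iterator_to_array().

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul><li>One</li><li>Two</li><li>Three</li></ul></body></html>');
$items = $dom->getElementsByTagName('li');

// foreach works directly on a DOMNodeList.
$texts = [];
foreach ($items as $item) {
    $texts[] = $item->textContent;
}

// Converting to an array makes the array_* functions available.
$viaMap = array_map(
    fn (DOMNode $node): string => $node->textContent,
    iterator_to_array($items)
);

print_r($texts);
```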

Creating Elements

Generate new DOM nodes:

Create element:

$para = $dom->createElement('p');

Create text node:

$text = $dom->createTextNode('Hello World');

From markup (the string must be well-formed XML):

$frag = $dom->createDocumentFragment();
$frag->appendXML('<b>Hello</b>');

Helper function:

function createTag(DOMDocument $dom, string $name) : DOMElement {
  return $dom->createElement($name);
}
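
As a sketch of combining the creation calls above, the example below builds a link from an element, an attribute, and a text node. Using createTextNode() for the label is a deliberate choice here, since it escapes special characters such as the ampersand.

```php
<?php
$dom = new DOMDocument();

$link = $dom->createElement('a');
$link->setAttribute('href', '/docs');
// A text node escapes "&" safely when the document is serialized.
$link->appendChild($dom->createTextNode('Docs & guides'));
$dom->appendChild($link);

$markup = $dom->saveHTML($link);
echo $markup, PHP_EOL;
```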

Inserting Elements

Insert nodes into the document:

Append child:

$el->appendChild($new);

Prepend child:

$el->insertBefore($new, $el->firstChild);

Insert after:

$el->parentNode->insertBefore($new, $el->nextSibling);

Insert before:

$el->parentNode->insertBefore($new, $el);

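Append markup (DOMElement has no appendHTML(); build a fragment from well-formed XML instead):

$frag = $dom->createDocumentFragment();
$frag->appendXML('<span>Text</span>');
$el->appendChild($frag);

Insert markup right after an element (DOMElement has no insertAdjacentHTML()):

$frag = $dom->createDocumentFragment();
$frag->appendXML('<span>Text</span>');
$el->parentNode->insertBefore($frag, $el->nextSibling);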
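
The "insert after" pattern is easy to wrap in a function. The insertAfter helper below is hypothetical (PHP does not ship one); it relies only on insertBefore() and nextSibling.

```php
<?php
// Hypothetical helper: place $new directly after $ref.
function insertAfter(DOMNode $new, DOMNode $ref): DOMNode {
    // If $ref is the last child, nextSibling is null and insertBefore() appends.
    return $ref->parentNode->insertBefore($new, $ref->nextSibling);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul><li>first</li><li>last</li></ul></body></html>');
$first = $dom->getElementsByTagName('li')->item(0);

$middle = $dom->createElement('li');
$middle->appendChild($dom->createTextNode('middle'));
insertAfter($middle, $first);

$order = [];
foreach ($dom->getElementsByTagName('li') as $li) {
    $order[] = $li->textContent;
}
print_r($order);
```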

Removing Elements

Detach nodes from the document:

Remove child:

$el->removeChild($child);

Remove node:

$el->parentNode->removeChild($el);

Replace node:

$el->parentNode->replaceChild($new, $el);

Clear children:

while ($el->firstChild) {
  $el->removeChild($el->firstChild);
}
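
One way to empty an element with the classic DOM classes is to remove its first child until none remain. The removeChildren function in this sketch is a hypothetical name, and the markup is invented for the example.

```php
<?php
// Hypothetical helper: detach every child of $node.
function removeChildren(DOMNode $node): void {
    while ($node->firstChild !== null) {
        $node->removeChild($node->firstChild);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div><p>a</p><p>b</p></div></body></html>');
$box = $dom->getElementsByTagName('div')->item(0);

removeChildren($box);

$isEmpty = !$box->hasChildNodes();
$paragraphsLeft = $dom->getElementsByTagName('p')->length;
```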

Modifying Elements

Edit nodes and their content:

Get attribute:

echo $el->getAttribute('class');

Set attribute:

$el->setAttribute('class', 'bold');

Set namespaced attribute:

$el->setAttributeNS('http://ns.example.com', 'ex:attr', 'value');

Remove attribute:

$el->removeAttribute('class');

Get text value:

echo $el->textContent;

Set text value:

$el->textContent = 'New text';

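Get inner HTML (DOMElement has no innerHTML property; serialize each child instead):

$html = '';
foreach ($el->childNodes as $child) {
  $html .= $dom->saveHTML($child);
}

Set inner HTML (the string must be well-formed XML):

while ($el->firstChild) {
  $el->removeChild($el->firstChild);
}
$frag = $dom->createDocumentFragment();
$frag->appendXML('New <strong>HTML</strong>');
$el->appendChild($frag);

Get outer HTML:

echo $dom->saveHTML($el);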
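
A portable way to read an element's inner and outer markup with the classic DOM classes is to serialize nodes through saveHTML(). The innerHtml function below is a hypothetical helper written for this sketch.

```php
<?php
// Hypothetical helper: concatenate the serialized children of $node.
function innerHtml(DOMNode $node): string {
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div class="note">Hello <strong>world</strong></div></body></html>');
$div = $dom->getElementsByTagName('div')->item(0);

$div->setAttribute('class', 'note bold');

$inner = innerHtml($div);        // children only
$outer = $dom->saveHTML($div);   // the element itself plus its children

echo $inner, PHP_EOL, $outer, PHP_EOL;
```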

Namespaces

Work with XML namespaces. Register a prefix for XPath queries:

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('ns', 'http://example.com/ns');

Create namespaced node:

$node = $dom->createElementNS('http://example.com/ns', 'ns:element');

Get namespaced elements:

$elements = $dom->getElementsByTagNameNS('http://example.com/ns', 'element');
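
To show namespace-aware lookups end to end, this sketch queries a small made-up XML feed both through DOMXPath::registerNamespace() and through getElementsByTagNameNS(). The prefix registered for XPath only has to map to the same URI; it does not need to match the prefix used in the document.

```php
<?php
$xml = '<feed xmlns:ex="http://example.com/ns">'
     . '<ex:item>One</ex:item><ex:item>Two</ex:item><item>Plain</item>'
     . '</feed>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// "n" is an arbitrary prefix chosen for this query.
$xpath->registerNamespace('n', 'http://example.com/ns');

$values = [];
foreach ($xpath->query('//n:item') as $node) {
    $values[] = $node->textContent;
}

$count = $dom->getElementsByTagNameNS('http://example.com/ns', 'item')->length;
print_r($values);
```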

DOM Events

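DOMDocument runs on the server and has no event model: addEventListener(), dispatchEvent(), and a DOMEvent class do not exist in PHP. To make a node react to events, write the handler into the markup and let the browser's JavaScript run it:

$el->setAttribute('onclick', "console.log('Clicked')");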

Cloning Nodes

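Import a node from another document (required before it can be inserted):

$imported = $dom->importNode($el); // Shallow copy: no descendants

$imported = $dom->importNode($el, true); // Deep copy: includes descendants

Clone a node within the same document:

$cloned = $el->cloneNode(); // Shallow

$cloned = $el->cloneNode(true); // Deep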
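
The difference between importing and cloning shows up when two documents are involved. In this sketch, with invented XML, a node is copied from one document into another with importNode(), and a shallow clone is taken for comparison.

```php
<?php
$source = new DOMDocument();
$source->loadXML('<catalog><book id="1"><title>PHP</title></book></catalog>');

$target = new DOMDocument();
$target->loadXML('<shelf/>');

$book = $source->getElementsByTagName('book')->item(0);

// A node belongs to one document; importNode() creates a copy owned by $target.
$copy = $target->importNode($book, true);
$target->documentElement->appendChild($copy);

// A shallow clone keeps the element and its attributes but drops the children.
$shallow = $book->cloneNode();

echo $target->saveXML();
```

The original node stays in the source document, because importNode() copies rather than moves.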

Outputting HTML

Render and save DOM documents:

Get full HTML:

$html = $dom->saveHTML();

Get outer HTML:

$html = $dom->saveHTML($element);

Save to file:

$dom->saveHTMLFile('page.html');

Send in response:

// Headers
echo $dom->saveHTML();

Output text:

echo $dom->documentElement->textContent;

Pretty print XML:

$dom->formatOutput = true;
echo $dom->saveXML();
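
As a sketch of the XML output options, the example below builds a minimal sitemap-like document (element names chosen for illustration), then serializes the whole document and a single node.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true; // indent nested elements when saving

$root = $dom->appendChild($dom->createElement('urlset'));
$url  = $root->appendChild($dom->createElement('url'));
$url->appendChild($dom->createElement('loc', 'https://example.com/'));

$xml      = $dom->saveXML();      // full document, including the XML declaration
$fragment = $dom->saveXML($url);  // only the <url> element and its children

echo $xml;
```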

Validation

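Validate against the document's DTD:

$valid = $dom->validate(); // Returns false and emits warnings on failure; it does not throw

Validate against an XSD schema:

libxml_use_internal_errors(true);
libxml_clear_errors();

if(!$dom->schemaValidate('schema.xsd')) {
  foreach(libxml_get_errors() as $error) {
    // Validation error(s)
  }
}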
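
Schema validation can be tried without any files by passing the schema as a string to schemaValidateSource(). The validateXml function and the tiny schema in this sketch are made up for illustration.

```php
<?php
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: return a list of parse/validation messages (empty when valid).
function validateXml(string $xml, string $xsd): array {
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $errors = [];
    if (!$dom->loadXML($xml) || !$dom->schemaValidateSource($xsd)) {
        foreach (libxml_get_errors() as $error) {
            $errors[] = trim($error->message);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $errors;
}

$ok  = validateXml('<note><to>Ann</to></note>', $xsd);
$bad = validateXml('<note><from>Bob</from></note>', $xsd);

print_r($bad);
```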

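Block external entity loading (XXE hardening; deprecated since PHP 8.0, where external entities are no longer loaded by default):

libxml_disable_entity_loader(true);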

Optimization

Improve performance of DOM scraping:

Cache XPath queries:

$xpath = new DOMXPath($dom);

// Reusable query
$query = '//div/p';

for($i = 0; $i < $loopCount; $i++) {
  $results = $xpath->query($query);
  // ...
}

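Avoid stale lists:

// getElementsByTagName() returns a live list, so removing nodes
// while looping over it skips items. Take a snapshot first.
foreach(iterator_to_array($nodelist) as $node) {
  // Modify or remove $node
}

// Read-only index loops are safe on the live list
for($i = 0; $i < $nodelist->length; $i++) {
  $node = $nodelist->item($i);
}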
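
The effect of a live node list is easiest to see by removing nodes in an index loop. This sketch, with invented markup, first shows the loop leaving nodes behind and then finishes the job from an array snapshot.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul><li>a</li><li>b</li><li>c</li><li>d</li></ul></body></html>');
$live = $dom->getElementsByTagName('li');

// Each removal shrinks the live list, so the index jumps over every other node.
for ($i = 0; $i < $live->length; $i++) {
    $node = $live->item($i);
    $node->parentNode->removeChild($node);
}
$leftAfterLiveLoop = $live->length;

// An array snapshot is not affected by removals, so every remaining node is visited.
foreach (iterator_to_array($live) as $node) {
    $node->parentNode->removeChild($node);
}
$leftAfterSnapshot = $live->length;

echo $leftAfterLiveLoop, ' ', $leftAfterSnapshot, PHP_EOL;
```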

Real World Use Cases

Example applications:

Scrape data from HTML

Parse XML feeds

Script bots to fill forms

Modify CMS content

Populate headless CMSes

Build dynamic sites

Create PDF documents

Convert document formats

Crawl websites

Build Chrome extensions

Automate screenshots

Create XML sitemaps

Build report aggregators

Migrate content between systems

Create localized versions

Monitor for changes

Analyze SEO metadata

Process markup-heavy datasets

This covers the core DOMDocument operations for loading, querying, editing, and serializing documents in PHP. Keep it as a quick reference when you traverse, edit, or scrape markup.
