The Ultimate DOMDocument Cheat Sheet for PHP

Oct 31, 2023 · 5 min read

DOMDocument lets you parse, query, and modify HTML and XML documents in PHP. This cheat sheet collects the most common DOMDocument operations in one place.

Capabilities Covered

  • Loading Documents
  • Selecting Nodes
  • Looping Elements
  • Creating Elements
  • Inserting Elements
  • Removing Elements
  • Modifying Elements
  • Namespaces
  • DOM Events
  • Cloning Nodes
  • Outputting HTML
  • Validation
  • Optimization
  • Real World Use Cases
    Loading Documents

    Initialize DOMDocument and load markup:

    From string:

    $dom = new DOMDocument;
    $dom->loadHTML('<html><body/></html>');
    

    From file:

    $dom->loadHTMLFile('page.html');
    

    From URL (requires allow_url_fopen; use load() instead when the URL returns XML):

    $dom->loadHTMLFile('http://example.com');
    

    Helper function:

    function loadHTML(string $html) : DOMDocument {
      $dom = new DOMDocument;
      $dom->loadHTML($html);
      return $dom;
    }
    

    Load XML:

    $dom->loadXML($xml);
    

    Omit the implied <html>/<body> wrappers and the default doctype (these flags do not change the encoding):

    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    

    Suppress errors:

    libxml_use_internal_errors(true);
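
    Suppressing warnings is more useful when the buffered errors are read back afterwards. The following is a minimal sketch of that pattern; the helper name loadHtmlCollectingErrors and the sample markup are illustrative assumptions, not part of the DOM extension.

```php
<?php
// Hypothetical helper: parse an HTML string and return the document
// together with whatever parse errors libxml buffered.
function loadHtmlCollectingErrors(string $html): array
{
    // Buffer errors instead of emitting PHP warnings; remember the old setting
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $ok = $dom->loadHTML($html);

    $errors = libxml_get_errors(); // array of LibXMLError objects
    libxml_clear_errors();         // empty the buffer for the next parse
    libxml_use_internal_errors($previous); // restore the caller's setting

    return [$ok ? $dom : null, $errors];
}

[$dom, $errors] = loadHtmlCollectingErrors('<html><body><p>Hi</p><foo></foo></body></html>');

foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

    Whether the unknown <foo> tag is reported depends on the libxml2 version in use, so treat the error list as diagnostic output rather than a strict validity check.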
    

    Selecting Nodes

    Query DOM nodes using XPath expressions. DOMXPath does not understand CSS selectors, so translate them to XPath first:

    CSS selector "h2" written as XPath:

    $selector = new DOMXPath($dom);
    $headings = $selector->query('//h2');
    

    XPath:

    $paras = $selector->query('//p');
    

    Get element by ID:

    $el = $dom->getElementById('header');
    

    By tag name:

    $items = $dom->getElementsByTagName('li');
    

    Document node:

    $doc = $selector->document;
    

    Document root:

    $root = $selector->document->documentElement;
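
    A few common CSS selectors have direct XPath equivalents. The sketch below shows three of them against an assumed sample document; the markup, ids, and class names are made up for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="main"><p class="intro lead">One</p><p>Two</p></div></body></html>');
$xpath = new DOMXPath($dom);

// CSS "div#main > p": direct <p> children of the div with id="main"
$children = $xpath->query('//div[@id="main"]/p');

// CSS "p.intro": match one token inside a space-separated class attribute
$intro = $xpath->query('//p[contains(concat(" ", normalize-space(@class), " "), " intro ")]');

// A context node as second argument scopes the query (note the leading ".")
$main = $dom->getElementById('main');
$scoped = $xpath->query('./p', $main);

echo $children->length, ' ', $intro->length, ' ', $scoped->length, "\n";
```

    The concat/normalize-space idiom is needed because a plain @class="intro" comparison fails as soon as the element carries more than one class.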
    

    Looping Elements

    Iterate through node lists:

    foreach($headings as $heading) {
      // ...
    }
    

    Indexed loop:

    for($i = 0; $i < $items->length; $i++) {
      $item = $items->item($i);
    }
    

    While loop:

    $i = 0;
    while($node = $nodelist->item($i++)) {
      // ...
    }
    

    Convert to array:

    $itemsArray = iterator_to_array($items);
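
    As a worked example of iterating a node list, this sketch collects link targets and their text into an array. The sample markup is an assumption chosen to include one anchor without an href.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="/a">A</a><a>no href</a><a href="/b">B</a></body></html>');

$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    // Skip anchors that have no href attribute at all
    if ($anchor->hasAttribute('href')) {
        $links[$anchor->getAttribute('href')] = trim($anchor->textContent);
    }
}

print_r($links);
```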
    

    Creating Elements

    Generate new DOM nodes:

    Create element:

    $para = $dom->createElement('p');
    

    Create text node:

    $text = $dom->createTextNode('Hello World');
    

    From markup (DOMDocumentFragment has appendXML(), not appendHTML(); the string must be well-formed XML):

    $frag = $dom->createDocumentFragment();
    $frag->appendXML('<b>Hello</b>');
    

    Helper function:

    function createTag(DOMDocument $dom, string $name) : DOMElement {
      return $dom->createElement($name);
    }
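
    Creating an element usually goes together with setting attributes and text. A slightly fuller helper might look like the sketch below; makeElement is a hypothetical name, and the attribute and text values are sample data.

```php
<?php
// Hypothetical helper: create an element with attributes and optional text.
function makeElement(DOMDocument $dom, string $name, array $attributes = [], string $text = ''): DOMElement
{
    $element = $dom->createElement($name);
    foreach ($attributes as $key => $value) {
        $element->setAttribute($key, $value);
    }
    if ($text !== '') {
        // Text nodes are escaped on output, so "&" and "<" are safe here
        $element->appendChild($dom->createTextNode($text));
    }
    return $element;
}

$dom = new DOMDocument();
$link = makeElement($dom, 'a', ['href' => '/docs?a=1&b=2'], 'Docs & guides');
$dom->appendChild($link);

echo $dom->saveHTML($link), "\n";
```

    Passing text through createTextNode() rather than as the second argument of createElement() avoids warnings when the text contains a bare ampersand.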
    

    Inserting Elements

    Insert nodes into the document:

    Append child:

    $el->appendChild($new);
    

    Prepend child:

    $el->insertBefore($new, $el->firstChild);
    

    Insert after:

    $el->parentNode->insertBefore($new, $el->nextSibling);
    

    Insert before:

    $el->parentNode->insertBefore($new, $el);
    

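    Append markup (DOMElement has no appendHTML(); parse the string into a fragment first):

    $frag = $dom->createDocumentFragment();
    $frag->appendXML('<span>Text</span>');
    $el->appendChild($frag);
    

    Insert adjacent markup (DOMElement has no insertAdjacentHTML(); this is the 'afterend' position):

    $frag = $dom->createDocumentFragment();
    $frag->appendXML('<span>Text</span>');
    $el->parentNode->insertBefore($frag, $el->nextSibling);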
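
    To see how the insertion positions combine, the sketch below builds a four-item list around one existing item. The list markup and its id are assumptions for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul id="list"><li>b</li></ul></body></html>');

$list = $dom->getElementById('list');
$b = $list->firstChild; // the existing <li>b</li>

// Before the reference node
$list->insertBefore($dom->createElement('li', 'a'), $b);

// After the reference node; a null nextSibling makes insertBefore() append
$list->insertBefore($dom->createElement('li', 'c'), $b->nextSibling);

// At the end of the parent
$list->appendChild($dom->createElement('li', 'd'));

echo $dom->saveHTML($list), "\n";
```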
    

    Removing Elements

    Detach nodes from the document:

    Remove child:

    $el->removeChild($child);
    

    Remove node:

    $el->parentNode->removeChild($el);
    

    Replace node:

    $el->parentNode->replaceChild($new, $el);
    

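    Clear children (DOMElement has no innerHTML property, so remove the child nodes directly):

    while($el->firstChild) {
      $el->removeChild($el->firstChild);
    }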
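
    A common removal task is stripping every element of one type. The sketch below removes all script elements from an assumed sample page; it copies the node list first because the list returned by getElementsByTagName() is live and shrinks as nodes are removed.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><head><script>var a;</script></head><body><p>Keep</p><script>var b;</script></body></html>');

// Snapshot the live list so removals do not disturb the iteration
$scripts = iterator_to_array($dom->getElementsByTagName('script'));

foreach ($scripts as $script) {
    $script->parentNode->removeChild($script);
}

echo $dom->saveHTML();
```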
    

    Modifying Elements

    Edit nodes and their content:

    Get attribute:

    echo $el->getAttribute('class');
    

    Set attribute:

    $el->setAttribute('class', 'bold');
    

    Set namespaced attribute:

    $el->setAttributeNS('http://ns.example.com', 'ex:attr', 'value');
    

    Remove attribute:

    $el->removeAttribute('class');
    

    Get text value:

    echo $el->textContent;
    

    Set text value:

    $el->textContent = 'New text';
    

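    Get inner HTML (DOMElement has no innerHTML property; serialize each child instead):

    $inner = '';
    foreach($el->childNodes as $child) {
      $inner .= $dom->saveHTML($child);
    }
    echo $inner;
    

    Set inner HTML (clear the children, then append a parsed fragment; the markup must be well-formed XML):

    while($el->firstChild) {
      $el->removeChild($el->firstChild);
    }
    $frag = $dom->createDocumentFragment();
    $frag->appendXML('New <strong>HTML</strong>');
    $el->appendChild($frag);
    

    Get outer HTML:

    echo $dom->saveHTML($el); // use $el->C14N() for canonical XML instead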
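
    Attribute edits often involve the space-separated class list, which setAttribute() would overwrite wholesale. One possible helper for adding a single class is sketched below; addClass is a hypothetical name, not a DOMElement method.

```php
<?php
// Hypothetical helper: add a class token without duplicating existing ones.
function addClass(DOMElement $el, string $class): void
{
    // getAttribute() returns '' when the attribute is missing
    $classes = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    if (!in_array($class, $classes, true)) {
        $classes[] = $class;
    }
    $el->setAttribute('class', implode(' ', $classes));
}

$dom = new DOMDocument();
$el = $dom->createElement('div');
$el->setAttribute('class', 'a  b');

addClass($el, 'c'); // appended
addClass($el, 'a'); // already present, no change

echo $el->getAttribute('class'), "\n";
```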
    

    Namespaces

    Work with XML namespaces:

    Register namespace (prefixes are registered on DOMXPath, not on the document):

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('ns', 'http://example.com/ns');
    

    Create namespaced node:

    $node = $dom->createElementNS('http://example.com/ns', 'ns:element');
    

    Get namespaced elements:

    $elements = $dom->getElementsByTagNameNS('http://example.com/ns', 'element');
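
    Namespaces matter most when querying XML that declares a default namespace, because unprefixed XPath names then match nothing. The sketch below queries a small Atom-style feed; the feed content and the "atom" prefix are assumptions for illustration.

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// The prefix is our own choice; only the URI has to match the document
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$titles = [];
foreach ($xpath->query('//atom:entry/atom:title') as $title) {
    $titles[] = $title->textContent;
}

// Without the prefix, the same path finds nothing
$unprefixed = $xpath->query('//entry/title')->length;

print_r($titles);
```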
    

    DOM Events

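    PHP's DOM extension runs on the server and has no event model: DOMElement has no addEventListener() or dispatchEvent(), and there is no DOMEvent class. To react to a click, write the handler into the markup so the browser runs it:

    $el->setAttribute('onclick', "console.log('Clicked')");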
    

    Cloning Nodes

    Import node (copy a node that belongs to another document into $dom):

    $imported = $dom->importNode($el); // Shallow copy
    
    $imported = $dom->importNode($el, true); // Deep copy
    

    Clone node:

    $cloned = $el->cloneNode(); // Shallow
    
    $cloned = $el->cloneNode(true); // Deep
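
    A node can only be inserted into the document that owns it, which is why importNode() exists. The sketch below copies an element from one document into another; both XML strings are sample data.

```php
<?php
$source = new DOMDocument();
$source->loadXML('<catalog><item id="1"><name>Pen</name></item></catalog>');

$target = new DOMDocument();
$target->loadXML('<order/>');

$item = $source->getElementsByTagName('item')->item(0);

// Deep copy that is owned by $target; the original stays in $source
$copy = $target->importNode($item, true);
$target->documentElement->appendChild($copy);

echo $target->saveXML($target->documentElement), "\n";
```

    Appending $item directly to $target would raise a "Wrong Document" DOMException.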
    

    Outputting HTML

    Render and save DOM documents:

    Get full HTML:

    $html = $dom->saveHTML();
    

    Get outer HTML:

    $html = $dom->saveHTML($element); // use $element->C14N() for canonical XML
    

    Save to file:

    $dom->saveHTMLFile('page.html');
    

    Send in response:

    // Headers
    echo $dom->saveHTML();
    

    Output text:

    echo $dom->documentElement->textContent;
    

    Pretty print XML:

    $dom->formatOutput = true;
    echo $dom->saveXML();
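
    When pretty printing XML that was loaded rather than built in code, formatOutput alone often has no visible effect, because existing whitespace text nodes are preserved. A minimal sketch of the usual workaround, using an assumed sample document:

```php
<?php
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // drop whitespace-only text nodes; set before loading
$dom->formatOutput = true;        // re-indent on output
$dom->loadXML("<root>\n<a><b>1</b></a>\n</root>");

$pretty = $dom->saveXML();
echo $pretty;
```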
    

    Validation

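    Validate against DTD/XSD schema:

    libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    
    $valid = $dom->validate(); // DTD check; returns false on failure, does not throw
    
    libxml_clear_errors();
    $valid = $dom->schemaValidate('schema.xsd'); // XSD check; also returns a bool
    
    if(!$valid) {
      $errors = libxml_get_errors(); // Validation error(s)
    }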
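
    For a self-contained illustration of schema validation, the sketch below validates two small documents against an inline XSD using schemaValidateSource(). The schema, the sample XML, and the helper name validateXml are all assumptions made for the example.

```php
<?php
$xsd = <<<XSD
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

libxml_use_internal_errors(true); // buffer errors instead of emitting warnings

// Hypothetical helper: returns [bool $valid, string[] $messages]
function validateXml(string $xml, string $xsd): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    libxml_clear_errors(); // start with an empty buffer
    $valid = $dom->schemaValidateSource($xsd);
    $messages = array_map(fn (LibXMLError $e) => trim($e->message), libxml_get_errors());
    libxml_clear_errors();

    return [$valid, $messages];
}

[$ok, $errs] = validateXml('<note><to>Ann</to></note>', $xsd);
[$bad, $badErrs] = validateXml('<note><from>Bob</from></note>', $xsd);

var_dump($ok, $bad);
print_r($badErrs);
```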
    

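    Block external entity loading (XXE protection, not a validation switch):

    // Deprecated as of PHP 8.0; libxml 2.9.0+ disables external entity loading by default
    libxml_disable_entity_loader(true);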
    

    Optimization

    Improve performance of DOM scraping:

    Run loop-invariant XPath queries once:

    $xpath = new DOMXPath($dom);
    
    // The result does not change between iterations, so query once and reuse it
    $results = $xpath->query('//div/p');
    
    for($i = 0; $i < $loopCount; $i++) {
      foreach($results as $node) {
        // ...
      }
    }
    

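    Avoid stale lists:

    // getElementsByTagName() returns a live list: removing nodes while
    // iterating shifts the indexes and skips items. Snapshot it first.
    foreach(iterator_to_array($nodelist) as $node) {
      // Modify or remove $node
    }
    
    // Read-only access can index the live list directly
    for($i = 0; $i < $nodelist->length; $i++) {
      $node = $nodelist->item($i);
    }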
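
    Another way to cut down repeated work when scraping is to query relative to a context node instead of re-scanning the whole document for every field. The sketch below does this with DOMXPath::evaluate(); the card markup and field names are sample assumptions.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div class="card"><h2>A</h2><p>1</p></div><div class="card"><h2>B</h2><p>2</p></div></body></html>');
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//div[@class="card"]') as $card) {
    // string(...) makes evaluate() return a PHP string; "./" keeps the
    // lookup inside the current card
    $rows[] = [
        'title' => $xpath->evaluate('string(./h2)', $card),
        'body'  => $xpath->evaluate('string(./p)', $card),
    ];
}

// count(...) is computed by libxml and returned as a float
$count = $xpath->evaluate('count(//div[@class="card"])');

print_r($rows);
```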
    

    Real World Use Cases

    Example applications:

  • Scrape data from HTML
  • Parse XML feeds
  • Script bots to fill forms
  • Modify CMS content
  • Populate headless CMSes
  • Build dynamic sites
  • Prepare HTML for PDF generators
  • Convert document formats
  • Crawl websites
  • Pre-process markup for browser-based tools (DOMDocument itself does not render pages or run JavaScript)
  • Create XML sitemaps
  • Build report aggregators
  • Migrate content between systems
  • Create localized versions
  • Monitor for changes
  • Analyze SEO metadata
  • Process markup-heavy datasets
    This covers the full range of capabilities and best practices for DOM manipulation in PHP. With this handy reference, you can traverse, edit, and scrape documents with ease!
