How to SCRAPE DYNAMIC Websites with Selenium

Oct 4, 2023 · 3 min read

Web scraping dynamic websites that load content dynamically can be challenging. However, the Selenium browser automation tool makes it possible!

Overview

Traditional scraping tools like Beautiful Soup are designed for static HTML pages. When websites rely heavily on JavaScript to load content, those tools fail.

Selenium provides a way to automate web browsers programmatically. This allows you to:

  • Load dynamic web pages
  • Interact with elements on the page
  • Extract data after content has rendered
  • In this guide, we'll cover the key steps to scrape dynamic sites with Selenium:

  • Setting up Selenium
  • Locating elements
  • Extracting text from elements
  • Storing scraped data
  • Setting Up Selenium

    To start, install Selenium and import it:

    from selenium import webdriver
    

    Then launch a browser (we'll use Chrome here):

    driver = webdriver.Chrome()
    

    You can now use this driver object to automate the browser.

    Locating Page Elements

    To extract data, first identify the elements that contain what you need.

    Use the browser inspector to view the HTML and locate elements by:

  • ID
  • Class name
  • XPath
  • For example, to get video elements on YouTube:

    from selenium.webdriver.common.by import By
    
    videos = driver.find_elements(By.CLASS_NAME, 'style-scope ytd-rich-item-renderer')
    

    Make sure to import By to use these locators.

    Extracting Data from Elements

    Once you've located an element, extract text or other attributes using:

    title = title_element.text
    

    For nested elements, first find the parent element:

    title_element = video.find_element(By.XPATH, './/h3/a')
    

    Then extract text from the nested element.

    Storing Scraped Data

    To save scraped data, store in lists or dicts and convert to DataFrame:

    data = []
    
    for video in videos:
    
      title = ...
      views = ...
    
      vid = {'title': title, 'views': views}
    
      data.append(vid)
    
    df = pd.DataFrame(data)
    

    You can then export to CSV, JSON, databases, etc.

    Putting It All Together

    Let's walk through a full scraping script:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import pandas as pd
    
    url = '<https://www.youtube.com/@MicrosoftDeveloper/videos>'
    
    driver = webdriver.Chrome()
    driver.get(url)
    
    videos = driver.find_elements(By.CLASS_NAME, 'style-scope ytd-rich-item-renderer')
    
    data = []
    
    for video in videos:
    
      title_element = video.find_element(By.XPATH, './/h3/a')
      title = title_element.text
    
      views_element = video.find_element(By.XPATH, './/span[contains(@class,"inline-metadata-item")]')
      views = views_element.text
    
      age_elements = video.find_elements(By.XPATH, './/span[contains(@class, "inline-metadata-item")]')
      second_age = age_elements[1]
      age = second_age.text
    
      vid = {'title': title, 'views': views, 'age': age}
    
      data.append(vid)
    
    df = pd.DataFrame(data)
    print(df)
    
    driver.quit()
    

    You should see the results printed like this…

    While Selenium provides a great way to learn web scraping fundamentals, it has some limitations in robustness and scale when it comes to commercial scraping projects.

    For production-level web scraping, consider using a specialized service like ProxiesAPI.com. It handles all the complexities of automation and scraping behind a rotating proxy network.

    With features like:

  • Millions of residential IPs
  • Automatic IP and user-agent rotation
  • Built-in headless browser rendering
  • CAPTCHA solving
  • You can scrape any site with a simple API call, without worrying about blocks.

    New users get 1000 free API calls. Get your API key here to turbo-charge your web scraping project!

    Focus on data extraction, and let Proxies API take care of the scraping infrastructure.

    Browse by tags:

    Browse by language:

    Tired of getting blocked while scraping the web?

    ProxiesAPI handles headless browsers and rotates proxies for you.
    Get access to 1,000 free API credits, no credit card required!