Stripping HTML Tags from Text with BeautifulSoup

Oct 6, 2023 · 2 min read

When scraping web pages, you'll often want to extract just the text content without all the surrounding HTML tags. Here's how to use BeautifulSoup to cleanly strip out tags and isolate the text.

The get_text() Method

The simplest way is using the get_text() method on either a BeautifulSoup object or an individual tag element:

from bs4 import BeautifulSoup

html = "<p>Example text <b>with</b> <i>some</i> tags</p>"
soup = BeautifulSoup(html, 'html.parser')

# Extract the text of the whole document
text = soup.get_text()
print(text)
# "Example text with some tags"

This strips out all tags and returns just the text.
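
Because get_text() works on individual tags as well as on the whole document, a minimal sketch of the difference might look like the following. The markup and variable names here are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
html = "<div><h1>Title</h1><p>Body <b>text</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

# Text of every element in the document, concatenated with no separator
whole_document = soup.get_text()

# Text of the first <p> element only, including its nested <b>
paragraph_only = soup.p.get_text()

print(whole_document)  # TitleBody text
print(paragraph_only)  # Body text
```

Note that text from adjacent elements is joined with nothing in between, which is why the heading and paragraph run together in the first result.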

Stripping Tags from Strings

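For a tag that contains a single piece of text, the .string attribute returns that text directly as a NavigableString:

text = soup.b.string
# "with"

Use this when dealing with a single text element. If the tag has more than one child, as the <p> in the example does, .string is None and get_text() is the better choice.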
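
One caveat is worth checking before relying on .string: it only returns text when the tag has exactly one child. The sketch below reuses the example markup from earlier to compare it with get_text(); the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = "<p>Example text <b>with</b> <i>some</i> tags</p>"
soup = BeautifulSoup(html, "html.parser")

# <b> holds a single text node, so .string returns it
single = soup.b.string

# <p> holds several children (text, <b>, <i>), so .string is None
mixed = soup.p.string

# get_text() gathers the text of all descendants regardless
fallback = soup.p.get_text()

print(single)    # with
print(mixed)     # None
print(fallback)  # Example text with some tags
```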

Removing Whitespace

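To also strip excess whitespace and newline characters, pass strip=True together with a separator:

text = soup.get_text(" ", strip=True)
# "Example text with some tags"

The strip parameter trims whitespace from each piece of text, and the separator (here a single space) is placed between the pieces so that words do not run together.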
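
The effect is easier to see on indented, multi-line markup. The following is a small sketch using a made-up snippet; it also shows stripped_strings, which yields the same trimmed pieces one at a time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with indentation and newlines, for illustration only
html = """
<div>
  <h1> Title </h1>
  <p>
    First paragraph.
  </p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Default call keeps the newlines and indentation from the source
raw = soup.get_text()

# Trim each piece and join the non-empty ones with a single space
clean = soup.get_text(separator=" ", strip=True)

# The same trimmed pieces as a list
pieces = list(soup.stripped_strings)

print(repr(raw))
print(clean)   # Title First paragraph.
print(pieces)  # ['Title', 'First paragraph.']
```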

Extracting HTML Attributes

To extract specific HTML attributes from tags:

for link in soup.find_all('a'):
    print(link.get('href'))  # Prints the href value, or None if it is missing

This loops through every <a> tag in the document and prints its href attribute.
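
Text and attributes can be collected together. As a rough sketch, assuming a made-up snippet with one link that has an href and one that does not:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
html = '<p><a href="https://example.com/a">First</a> <a>No link</a></p>'
soup = BeautifulSoup(html, "html.parser")

links = []
for link in soup.find_all("a"):
    href = link.get("href")           # None when the attribute is absent
    text = link.get_text(strip=True)  # visible link text, trimmed
    links.append((text, href))

print(links)
# [('First', 'https://example.com/a'), ('No link', None)]
```

Using get() rather than link["href"] avoids a KeyError on tags that lack the attribute.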

In summary, get_text() is the primary tool for cleanly extracting just the text content from HTML with BeautifulSoup. Pair it with attribute extraction to pull both text and attributes from tags.
