Stories from the Web Crawling trenches in regular expressions

Extracting URLs from Text in Python

Author: Mohan Ganesan

Date: Feb 20, 2024

When working with text data in Python, you can use regular expressions and the urllib module to detect and validate URLs. This article provides examples and tips for effectively detecting links in text.

What is the alternative to BeautifulSoup in Python?

Author: Mohan Ganesan

Date: Feb 5, 2024

BeautifulSoup is a popular Python library for parsing HTML, but there are alternatives like XML parsing, html.parser, and regular expressions.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!