Web Scraping with Visual Basic & ChatGPT

Sep 25, 2023 · 4 min read

Visual Basic provides a straightforward way to build web scrapers on Windows. ChatGPT is an AI assistant that can explain concepts and generate VB code for scraping. This article covers web scraping in VB with ChatGPT's help.

Setting Up Visual Basic for Web Scraping

You'll need VB installed along with these libraries:

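' System.Net and System.IO for sending requests and reading responses
Imports System.Net
Imports System.IO

' HTML Agility Pack (NuGet package HtmlAgilityPack) for parsing HTML
Imports HtmlAgilityPack

' System.Xml for parsing XML responses (it is not an HTML parser)
Imports System.Xml

' JSON.NET for JSON parsing
Imports Newtonsoft.Json

' CsvHelper for CSV output
Imports CsvHelper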

Introduction to Web Scraping in VB

Web scraping involves sending requests to websites and extracting data from the HTML, JSON or XML responses. Useful VB options:

  • HttpWebRequest - Sending HTTP requests
  • HTML Agility Pack - HTML parsing/scraping
  • Regex - Parsing with regular expressions

Typical web scraping workflow:

  • Send HTTP request to download a page
  • Parse response and extract relevant data
  • Store scraped data
  • Repeat for other pages
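
The workflow above can be sketched as a small console program. This is a minimal sketch rather than a definitive implementation: the URLs, the titles.txt file name, the user agent string, and the one-second delay are all assumptions chosen for illustration, and the title-only Regex extraction stands in for whatever data you actually need.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Threading

Module WorkflowSketch
    ' Step 1: download a page and return its HTML as a string.
    Function DownloadPage(pageUrl As String) As String
        Dim request = DirectCast(WebRequest.Create(pageUrl), HttpWebRequest)
        request.UserAgent = "VBScraperSketch/1.0" ' hypothetical identifier
        Using response = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Step 2: extract the relevant data; here only the <title> text, via Regex.
    Function ExtractTitle(html As String) As String
        Dim m = Regex.Match(html, "<title[^>]*>(.*?)</title>",
                            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Return If(m.Success, WebUtility.HtmlDecode(m.Groups(1).Value).Trim(), String.Empty)
    End Function

    Sub Main()
        ' Hypothetical list of pages to visit.
        Dim urls = {"https://example.com/", "https://example.org/"}
        Dim results As New List(Of String)

        For Each pageUrl In urls
            Dim html = DownloadPage(pageUrl)
            ' Step 3: store the scraped data (kept in memory until the loop ends).
            results.Add(pageUrl & "," & ExtractTitle(html))
            ' Step 4: pause before the next page so requests are throttled.
            Thread.Sleep(1000)
        Next

        File.WriteAllLines("titles.txt", results)
    End Sub
End Module
```

Keeping the download and extraction steps in separate functions makes it possible to test the extraction against saved HTML without touching the network.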

Using ChatGPT for Web Scraping Help

    ChatGPT is an AI assistant created by OpenAI. It can provide explanations and generate VB code snippets for web scraping:

    Getting Explanations

    Ask ChatGPT to explain web scraping concepts or specifics:

  • How to use HTML Agility Pack to extract text from paragraph tags
  • Strategies for scraping content spread across pagination
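
As an illustration of the first question, an answer might look roughly like the following. It is a sketch under stated assumptions: the module and function names and the sample markup are hypothetical, and only HTML Agility Pack calls that exist in the library (HtmlDocument.LoadHtml, SelectNodes, InnerText, HtmlEntity.DeEntitize) are used.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module ParagraphSketch
    ' Returns the trimmed text of every non-empty <p> element in the given HTML.
    Function ExtractParagraphs(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes = doc.DocumentNode.SelectNodes("//p")
        If nodes Is Nothing Then Return New List(Of String)()

        Return nodes.
            Select(Function(p) HtmlEntity.DeEntitize(p.InnerText).Trim()).
            Where(Function(s) s.Length > 0).
            ToList()
    End Function

    Sub Main()
        ' Hypothetical sample markup, so the sketch runs without a network call.
        Dim sample = "<html><body><p>First &amp; foremost</p><div><p> Second </p></div><p></p></body></html>"
        For Each paragraph In ExtractParagraphs(sample)
            Console.WriteLine(paragraph)
        Next
    End Sub
End Module
```

The Nothing check matters in practice: calling Select directly on the result of SelectNodes throws a NullReferenceException on pages that contain no matching elements.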

    Generating Code Snippets

    Give a description of what you want to scrape and have ChatGPT provide starter VB code:

  • Scrape product listings into a CSV file
  • Parse date strings into DateTime when extracting

    Validate any generated code before using it.
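
For the date-parsing request, generated starter code might resemble the sketch below. The list of formats is an assumption made for illustration (real sites vary), and the function name is hypothetical; DateTime.TryParseExact is the standard .NET call for strict, format-driven parsing.

```vbnet
Imports System
Imports System.Globalization

Module DateParsingSketch
    ' Formats assumed for illustration; adjust them to match the target site.
    Private ReadOnly Formats As String() = {"MMM d, yyyy", "yyyy-MM-dd", "d MMMM yyyy"}

    ' Returns the parsed date, or Nothing when the text matches none of the formats.
    Function ParseScrapedDate(text As String) As DateTime?
        Dim parsed As DateTime
        If DateTime.TryParseExact(text.Trim(), Formats, CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, parsed) Then
            Return parsed
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim d = ParseScrapedDate(" Sep 25, 2023 ")
        Console.WriteLine(If(d.HasValue,
                             d.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture),
                             "unparsed"))
    End Sub
End Module
```

Returning a nullable DateTime instead of throwing lets the scraper log or skip rows whose dates do not match, which is one way to validate generated code against real data.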

    Improving Prompts

    Ask ChatGPT to suggest ways to improve your prompt if it doesn't provide helpful responses.

    Asking Follow-up Questions

    Keep the conversation going and ask ChatGPT about anything that is still unclear.

    Explaining Errors

    Share any errors and ask ChatGPT to debug and explain the problem.

    Web Scraping Example Using ChatGPT

    Let's walk through scraping a Wikipedia page with ChatGPT's assistance.

    Goal

    Extract the chronology table from: https://en.wikipedia.org/wiki/Chronology_of_the_universe

    Step 1: Download page

    ChatGPT: VB code to download this page:
    https://en.wikipedia.org/wiki/Chronology_of_the_universe
    
    ' ChatGPT provides this code (requires Imports System.Net and System.IO)
    Dim request = DirectCast(WebRequest.Create("https://en.wikipedia.org/wiki/Chronology_of_the_universe"), HttpWebRequest)
    request.UserAgent = "VBScraperTutorial/1.0" ' example value; identifies the client to the server
    Dim html As String
    Using response = DirectCast(request.GetResponse(), HttpWebResponse)
        Using reader As New StreamReader(response.GetResponseStream())
            html = reader.ReadToEnd() ' the whole page as one string
        End Using
    End Using
    

    Step 2: Inspect the HTML. The table has the class wikitable.

    Step 3: Extract table data to CSV

    ChatGPT: VB code to extract wikitable table to CSV
    
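    ' ChatGPT provides this code (null checks added; needs Imports for HtmlAgilityPack, CsvHelper, System.IO, System.Linq, System.Globalization)
    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    
    Dim table = htmlDoc.DocumentNode.SelectSingleNode("//table[contains(@class, 'wikitable')]")
    If table Is Nothing Then Throw New InvalidOperationException("No wikitable found")
    
    ' Extract headers from the first row that contains th cells (the markup may have no thead element)
    Dim headerRow = table.SelectSingleNode(".//tr[th]")
    Dim headers = If(headerRow Is Nothing, New List(Of String), headerRow.SelectNodes("./th").Select(Function(x) x.InnerText.Trim()).ToList())
    
    ' Extract rows that contain td cells; SelectNodes returns Nothing when nothing matches
    Dim rowNodes = table.SelectNodes(".//tr[td]")
    If rowNodes Is Nothing Then Throw New InvalidOperationException("Table has no data rows")
    Dim rows = rowNodes.Select(Function(x) x.SelectNodes("./th|./td").Select(Function(y) y.InnerText.Trim()).ToList()).ToList()
    rows.Insert(0, headers) ' the header record goes first
    
    ' Write rows to CSV file (the file name is only an example)
    Using writer As New StreamWriter("chronology.csv"), csv As New CsvWriter(writer, CultureInfo.InvariantCulture)
        For Each record In rows
            For Each field In record
                csv.WriteField(field)
            Next
            csv.NextRecord()
        Next
    End Using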
    

    This shows how ChatGPT can quickly produce starter VB scraping code, which you should still test against the live page.

    Conclusion

    Key points:

  • VB provides a straightforward way to build Windows scrapers
  • ChatGPT can explain concepts and provide VB code
  • Inspect HTML to understand how to extract data
  • Follow best practices such as throttling requests and randomizing user agents
  • Web scraping allows gathering data from websites at scale with VB
  • ChatGPT + VB is great for creating web scrapers.

    However, there are some limitations:

  • Handling anti-scraping measures like CAPTCHAs
  • Avoiding IP blocks when running locally
  • Rendering complex JavaScript pages

    A more robust solution is to use a web scraping API like Proxies API.

    Proxies API provides:

  • Millions of proxy IPs to prevent blocks
  • Automated solving of CAPTCHAs
  • JavaScript rendering with headless browsing
  • Simple API instead of running your own scrapers

    Easily scrape any site:

    ' Send request to Proxies API endpoint
    Dim request As HttpWebRequest = WebRequest.Create("https://api.proxiesapi.com/?url=example.com&key=XXX")
    Dim response As HttpWebResponse = request.GetResponse()
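
To actually read the page returned by that request, the call could be wrapped as in the sketch below. It uses only the endpoint and the key and url query parameters shown above; URL-encoding the target address is an assumption about what the service expects, and the module and function names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module ProxiesApiSketch
    ' Builds the request URL from the key and url query parameters.
    Function BuildRequestUrl(apiKey As String, targetUrl As String) As String
        Return "https://api.proxiesapi.com/?key=" & Uri.EscapeDataString(apiKey) &
               "&url=" & Uri.EscapeDataString(targetUrl)
    End Function

    ' Sends the request and returns the response body as a string.
    Function Fetch(apiKey As String, targetUrl As String) As String
        Dim request = DirectCast(WebRequest.Create(BuildRequestUrl(apiKey, targetUrl)), HttpWebRequest)
        Using response = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        ' "XXX" is the placeholder key used above; replace it with a real key.
        Console.WriteLine(Fetch("XXX", "https://example.com"))
    End Sub
End Module
```

Escaping the target URL keeps any query string it contains from being mistaken for parameters of the API request itself.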

    Get started now with 1000 free API calls to supercharge your web scraping!


