Bypassing CAPTCHAs with Puppeteer

Oct 4, 2023 · 5 min read

Captchas are a necessary evil on many websites. They help prevent bots from abusing services, but also create headaches for legitimate automation. Thankfully, there are options for bypassing captchas programmatically. This article will cover solving captchas using Puppeteer and headless Chrome.

Overview

Puppeteer provides a Node API for controlling headless Chrome. It allows you to navigate pages, interact with elements, run JavaScript, and more.

To bypass captchas, we'll use Puppeteer to:

  • Navigate to a page with a captcha
  • Extract necessary info to solve the captcha
  • Send that info to a captcha solving service
  • Poll the service for the captcha solution
  • Submit the solution to the page

This will allow us to automate pages that are normally protected by captchas.

    Extracting Captcha Info

    The first step is navigating to the page and extracting the info needed to solve the captcha. For example, with Google's reCAPTCHA we need the site key.

    To get the site key, we can search the page's inline reCAPTCHA script for it from inside the browser context:

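    const siteKey = await page.evaluate(() => {
      // Assumes the key sits in an inline script inside #recaptcha; adjust the selector for your page.
      const script = document.querySelector('#recaptcha script[type="text/javascript"]');
      const match = script && script.innerHTML.match(/sitekey: '(.*?)'/);
      return match ? match[1] : null; // null when no site key was found
    });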
    

    This grabs the reCAPTCHA script from the page and extracts the site key using a regex.
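
Many reCAPTCHA pages expose the key as a data-sitekey attribute on the widget element rather than in an inline script. As a minimal sketch under that assumption, the hypothetical extractSiteKey helper below (not part of Puppeteer) pulls the key out of an HTML string:

```javascript
// Hypothetical helper: find a reCAPTCHA site key in raw HTML.
// Assumes the key appears either as a data-sitekey attribute or as a
// sitekey: '...' entry in an inline script; other layouts are not covered.
function extractSiteKey(html) {
  const patterns = [
    /data-sitekey=["']([^"']+)["']/,       // <div class="g-recaptcha" data-sitekey="...">
    /sitekey['"]?\s*:\s*["']([^"']+)["']/  // grecaptcha.render(el, { sitekey: '...' })
  ];
  for (const pattern of patterns) {
    const match = html.match(pattern);
    if (match) return match[1];
  }
  return null; // no key found; the caller decides how to handle this
}
```

With Puppeteer, this could be called as extractSiteKey(await page.content()).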

    Sending Captcha Request

    Next we need to send the site key to a captcha solving service to process the challenge. For this example we'll use 2Captcha.

    We send a request containing the site key, our API key, and other details:

    const formData = {
      method: 'userrecaptcha',
      googlekey: siteKey,
      key: apiKey,
      pageurl: pageUrl,
      json: 1
    };
    
    const response = await request.post('http://2captcha.com/in.php', {form: formData});
    const requestId = JSON.parse(response).request;
    

    This initializes the captcha solving request and returns a request ID we can use to poll for the solution.
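
The snippet above assumes the submission always succeeds. As a sketch, a small hypothetical helper can check the response first. It assumes that, with json: 1, 2Captcha replies with an object of the form {"status": 1, "request": "<id>"} on success, and with status 0 and an error code in request on failure:

```javascript
// Hypothetical helper: turn the raw in.php response body into a request ID.
// Assumes the {status, request} JSON shape described above.
function parseSubmitResponse(body) {
  const json = JSON.parse(body);
  if (json.status !== 1) {
    // On failure, `request` carries an error code instead of an ID.
    throw new Error(`captcha submission failed: ${json.request}`);
  }
  return json.request;
}
```

Failing early here avoids polling for a request ID that was never issued.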

    Polling for Solution

    Now we need to continually poll the service to check if the captcha is solved:

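    const response = await poll({
      // taskFn must be a function that returns a promise;
      // requestCaptchaResults builds and returns that function.
      taskFn: requestCaptchaResults(apiKey, requestId),
      interval: 1500, // wait 1.5 seconds between attempts
      retries: 30
    });
    
    function requestCaptchaResults(apiKey, requestId) {
      return async function() {
        //...make request and return the solution, or throw if it isn't ready
      };
    }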
    

    We use the promise-poller library to handle the polling. Once resolved, response will contain the captcha solution.
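
To show what the polling amounts to, here is a minimal dependency-free sketch. The pollUntil name and its options are hypothetical and only mirror the interval and retries settings used above; this is not how promise-poller itself is implemented:

```javascript
// Hypothetical stand-in for a polling library: call taskFn until it
// resolves, waiting `interval` ms between failed attempts.
async function pollUntil(taskFn, { interval = 1500, retries = 30 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await taskFn(); // resolved: the solution is ready
    } catch (err) {
      lastError = err; // rejected: not ready yet (or a real failure)
    }
    if (attempt < retries) {
      // Pause before the next attempt, but not after the last one.
      await new Promise(resolve => setTimeout(resolve, interval));
    }
  }
  throw lastError; // every attempt failed
}
```

The task function signals "not ready" by rejecting, which is the same contract the example relies on with promise-poller.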

    Submitting the Solution

    Finally, we need to submit the captcha solution to the page. For reCAPTCHA, this involves writing it into the hidden textarea with the id g-recaptcha-response:

    await page.evaluate(solution => {
      document.getElementById('g-recaptcha-response').innerHTML = solution;
    }, response);
    

    And that's it! The captcha is now solved. We can submit the form and continue automation.

    Here is a full code example for solving reCAPTCHAs with Puppeteer using the 2Captcha service:

    const puppeteer = require('puppeteer');
    const request = require('request-promise-native');
    const poll = require('promise-poller').default;
    
    const apiKey = 'YOUR_API_KEY';
    
    const siteDetails = {
      sitekey: 'SITE_KEY',
      pageurl: 'https://www.example.com'
    };
    
    (async () => {
    
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
    
      await page.goto(siteDetails.pageurl);
    
      const requestId = await initiateCaptchaRequest(apiKey, siteDetails);
    
      const response = await pollForRequestResults(apiKey, requestId);
    
      await page.evaluate(resp => {
        document.getElementById('g-recaptcha-response').innerHTML = resp;
      }, response);
    
      await page.click('#submit');
    
      await browser.close();
    
    })();
    
    async function initiateCaptchaRequest(apiKey, siteDetails) {
    
      const formData = {
        method: 'userrecaptcha',
        googlekey: siteDetails.sitekey,
        key: apiKey,
        pageurl: siteDetails.pageurl,
        json: 1
      };
    
      const response = await request.post('http://2captcha.com/in.php', {form: formData});
      return JSON.parse(response).request;
    
    }
    
    async function pollForRequestResults(apiKey, requestId, retries=30, delay=15) {
    
      await timeout(delay * 1000);
    
      return poll({
        taskFn: requestCaptchaResults(apiKey, requestId),
        interval: 1500,
        retries
      });
    
    }
    
    function requestCaptchaResults(apiKey, requestId) {
    
      const url = `http://2captcha.com/res.php?key=${apiKey}&action=get&id=${requestId}&json=1`;
    
      return async function() {
        const resp = await request.get(url);
        const json = JSON.parse(resp);
        if(json.status === 0) throw Error('captcha not ready');
        return json.request;
      }
    
    }
    
    function timeout(ms) {
      return new Promise(resolve => setTimeout(resolve, ms));
    }
    

    This implements the overall flow:

  • Launch Puppeteer
  • Navigate to page
  • Initialize captcha request
  • Poll for solution
  • Submit solution
  • Click submit button

    The key aspects are extracting the site key, sending it to 2Captcha, polling for the response, and submitting that back to the page.

    You would need to update the API key, site key, and page URL to match your specific use case. But this provides a full working example of solving CAPTCHAs with Puppeteer.
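
One gap in the example is that requestCaptchaResults treats every status of 0 as "not ready", so a permanent error is retried until the retries run out. A sketch of a stricter check follows. It assumes 2Captcha signals a pending solution with the code CAPCHA_NOT_READY in the request field, and the interpretResult helper is hypothetical:

```javascript
// Hypothetical helper: classify a parsed res.php response.
// Assumes {status: 1, request: <token>} when solved and
// {status: 0, request: <code>} otherwise.
function interpretResult(json) {
  if (json.status === 1) return { state: 'solved', token: json.request };
  if (json.request === 'CAPCHA_NOT_READY') return { state: 'pending' };
  return { state: 'failed', code: json.request }; // e.g. an invalid API key
}
```

Only the pending state needs another poll; a failed state can stop the run immediately and report the code.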

    Conclusion

    Bypassing captchas with Puppeteer provides a programmatic way to automate pages behind these protections. It does require an external solving service, which has associated costs, but overall it's a simple and effective technique for controlling pages that use captcha defenses.

    Rather than building and managing your own captcha solving infrastructure, services like Proxies API handle all of this complexity for you.

    With Proxies API, you make a simple API request with the target URL. It will handle:

  • Rotating proxies and IP addresses
  • Rotating user agents
  • Solving captchas
  • Running JavaScript

    It then returns the rendered HTML, so there is no need to orchestrate the numerous steps required for reliable captcha solving.

    For example:

    curl "http://api.proxiesapi.com/?key=API_KEY&render=true&url=https://targetpage.com"
    

    This takes care of all the headaches of automation. No proxies, browsers, or captcha solving services to manage.
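
When calling the API from code rather than curl, the target URL should be percent-encoded so that its own query string is not mistaken for API parameters. Here is a sketch using Node's built-in URL class. It uses only the key, render, and url parameters from the curl example above, and buildProxiesApiUrl is a hypothetical helper:

```javascript
// Hypothetical helper: build the request URL shown in the curl example.
// searchParams.set percent-encodes each value, including the target URL.
function buildProxiesApiUrl(apiKey, targetUrl) {
  const url = new URL('http://api.proxiesapi.com/');
  url.searchParams.set('key', apiKey);
  url.searchParams.set('render', 'true');
  url.searchParams.set('url', targetUrl);
  return url.toString();
}
```

The resulting string can be passed to any HTTP client, for example the request.get call used earlier.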

    Proxies API offers 1000 free API calls to get started. Check it out if you need to integrate robust captcha solving and proxy rotation in your projects.

