Bypassing CAPTCHAs with Puppeteer

Captchas are a necessary evil on many websites. They help prevent bots from abusing services, but also create headaches for legitimate automation. Thankfully, there are options for bypassing captchas programmatically. This article will cover solving captchas using Puppeteer and headless Chrome.

Overview

Puppeteer provides a Node API for controlling headless Chrome. It allows you to navigate pages, interact with elements, run JavaScript, and more.

To bypass captchas, we'll use Puppeteer to:

Navigate to a page with a captcha

Extract necessary info to solve the captcha

Send that info to a captcha solving service

Poll the service for the captcha solution

Submit the solution to the page

This will allow us to automate pages that are normally protected by captchas.

Extracting Captcha Info

The first step is navigating to the page and extracting the info needed to solve the captcha. For example, with Google's reCAPTCHA we need the site key.

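To get the site key, we can run a small script inside the page that finds the reCAPTCHA configuration and pulls the key out of it:

const siteKey = await page.evaluate(() => {
  // Site-specific selector: adjust it to wherever the target page configures reCAPTCHA
  const script = document.querySelector('#recaptcha script[type="text/javascript"]');
  const match = script && script.innerHTML.match(/sitekey: '(.*?)'/);
  return match ? match[1] : null;
});

This grabs the reCAPTCHA script from the page and extracts the site key using a regex, returning null if the script or the key is not found.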
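The selector above matches one particular page layout. Many reCAPTCHA integrations instead expose the key as a data-sitekey attribute on the widget container, so a fallback can help. The sketch below is one possible approach, not a Puppeteer feature: extractSiteKey is a hypothetical helper that works on the HTML string returned by page.content() and tries both patterns.

```javascript
// Hypothetical helper: pull a reCAPTCHA site key out of raw page HTML.
// It checks a data-sitekey attribute first, then an inline
// "sitekey: '...'" script option. Returns null when neither is present.
function extractSiteKey(html) {
  const attrMatch = html.match(/data-sitekey=["']([^"']+)["']/);
  if (attrMatch) return attrMatch[1];

  const inlineMatch = html.match(/sitekey:\s*["']([^"']+)["']/);
  return inlineMatch ? inlineMatch[1] : null;
}
```

With Puppeteer this could be called as extractSiteKey(await page.content()). A null result is a signal to inspect the page by hand rather than to send an empty key to the solving service.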

Sending Captcha Request

Next we need to send the site key to a captcha solving service to process the challenge. For this example we'll use 2Captcha.

We send a request containing the site key, our API key, and other details:

const formData = {
  method: 'userrecaptcha',
  googlekey: siteKey,
  key: apiKey,
  pageurl: pageUrl,
  json: 1
}

const response = await request.post('http://2captcha.com/in.php', {form: formData});
const requestId = JSON.parse(response).request;

This initializes the captcha solving request and returns a request ID we can use to poll for the solution.
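The snippet above assumes the submission always succeeds. With json: 1 the reply is an object with status and request fields. As far as we understand the 2Captcha API, status 1 means request holds the ID, and any other status means it holds an error code. Under that assumption, a minimal sketch of validating the reply before polling could look like this (parseSubmitResponse is a hypothetical name):

```javascript
// Hypothetical helper: validate the body returned by in.php.
// Assumes the { status, request } shape described above.
function parseSubmitResponse(body) {
  let json;
  try {
    json = JSON.parse(body);
  } catch (err) {
    throw new Error(`Unexpected non-JSON reply from solver: ${body}`);
  }
  if (json.status !== 1) {
    // On failure, "request" carries the error code instead of an ID
    throw new Error(`Captcha submission rejected: ${json.request}`);
  }
  return json.request;
}
```

Failing here, with the error code in the message, is usually easier to debug than polling for a request ID that was never issued.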

Polling for Solution

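Now we need to repeatedly poll the service until the captcha is solved:

const response = await poll({
  // taskFn must be a function that returns a promise, so we call the
  // factory below and pass the function it returns
  taskFn: requestCaptchaResults(apiKey, requestId),
  interval: 1500,
  retries: 30
});

function requestCaptchaResults(apiKey, requestId) {
  // ...return an async function that makes the request and resolves with the solution
}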

We use the promise-poller library to handle the polling. Once resolved, response will contain the captcha solution.
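If you would rather not add a dependency, the retry loop can be approximated in a few lines. This is a rough sketch and not a drop-in replacement for promise-poller: pollUntilResolved and its option names are made up here to mirror the call above.

```javascript
// Hypothetical stand-in for a polling library: call taskFn until it resolves,
// waiting `interval` ms between attempts, and give up after `retries` attempts.
async function pollUntilResolved({ taskFn, interval = 1500, retries = 30 }) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await taskFn();
    } catch (err) {
      lastError = err; // treat any rejection as "not ready yet"
    }
    if (attempt < retries) {
      await new Promise(resolve => setTimeout(resolve, interval));
    }
  }
  throw new Error(`Gave up after ${retries} attempts: ${lastError && lastError.message}`);
}
```

As in the call above, taskFn has to be a function that returns a promise, so the task is started afresh on every attempt.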

Submitting the Solution

Finally, we need to submit the captcha solution to the page. For reCAPTCHA, this means writing it into the hidden g-recaptcha-response textarea:

await page.evaluate(solution => {
  document.getElementById('g-recaptcha-response').innerHTML = solution;
}, response);

And that's it! The captcha is now solved. We can submit the form and continue automation.
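The snippet above fails with an unhelpful error if the textarea is missing. It also sets only innerHTML, while a form submission reads the textarea's value. A slightly more defensive variant is sketched below. applyRecaptchaToken is a hypothetical helper, and its second parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper, meant to run inside the page (browser context).
// Writes the token into the hidden response textarea and reports whether
// the element was found, so the caller can fail loudly instead of silently.
function applyRecaptchaToken(token, doc = document) {
  const field = doc.getElementById('g-recaptcha-response');
  if (!field) return false;
  field.value = token;     // the value is what a form submission sends
  field.innerHTML = token; // kept for parity with the snippet above
  return true;
}
```

With Puppeteer this could be called as const ok = await page.evaluate(applyRecaptchaToken, response); because the second parameter then defaults to the page's own document. Some sites also expect a JavaScript callback to fire after solving, which this sketch does not cover.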

Here is a full code example for solving reCAPTCHAs with Puppeteer using the 2Captcha service:

const puppeteer = require('puppeteer');
const request = require('request-promise-native');
const poll = require('promise-poller').default;

const apiKey = 'YOUR_API_KEY';

const siteDetails = {
  sitekey: 'SITE_KEY',
  pageurl: 'https://www.example.com'
};

(async () => {

  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto(siteDetails.pageurl);

  const requestId = await initiateCaptchaRequest(apiKey, siteDetails);

  const response = await pollForRequestResults(apiKey, requestId);

  await page.evaluate(resp => {
    document.getElementById('g-recaptcha-response').innerHTML = resp;
  }, response);

  await page.click('#submit');

  await browser.close();

})();

async function initiateCaptchaRequest(apiKey, siteDetails) {

  const formData = {
    method: 'userrecaptcha',
    googlekey: siteDetails.sitekey,
    key: apiKey,
    pageurl: siteDetails.pageurl,
    json: 1
  };

  const response = await request.post('http://2captcha.com/in.php', {form: formData});
  return JSON.parse(response).request;

}

async function pollForRequestResults(apiKey, requestId, retries=30, delay=15) {

  await timeout(delay * 1000);

  return poll({
    taskFn: requestCaptchaResults(apiKey, requestId),
    interval: 1500,
    retries
  });

}

function requestCaptchaResults(apiKey, requestId) {

  const url = `http://2captcha.com/res.php?key=${apiKey}&action=get&id=${requestId}&json=1`;

  return async function() {
    const resp = await request.get(url);
    const json = JSON.parse(resp);
    if(json.status === 0) throw Error('captcha not ready');
    return json.request;
  }

}

function timeout(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

This implements the overall flow:

Launch Puppeteer

Navigate to page

Initialize captcha request

Poll for solution

Submit solution

Click submit button

The key aspects are extracting the site key, sending it to 2Captcha, polling for the response, and submitting that back to the page.

You would need to update the API key, site key, and page URL to match your specific use case. But this provides a full working example of solving CAPTCHAs with Puppeteer.
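One caveat: the example treats every status 0 reply as "captcha not ready", so a permanent failure such as a bad API key is retried until the attempts run out. A possible refinement is to separate retryable replies from fatal ones. The sketch below assumes the service reports a pending solve with the code CAPCHA_NOT_READY. That is the spelling we believe 2Captcha uses, but check it against the current documentation. interpretResult is a hypothetical helper.

```javascript
// Hypothetical helper: classify a parsed res.php reply.
// Assumption: a pending solve looks like { status: 0, request: 'CAPCHA_NOT_READY' }.
const NOT_READY = 'CAPCHA_NOT_READY';

function interpretResult(json) {
  if (json.status === 1) {
    return { done: true, token: json.request };
  }
  if (json.request === NOT_READY) {
    return { done: false }; // keep polling
  }
  // Any other code (bad key, unsolvable captcha, ...) will not fix itself
  const err = new Error(`Solver error: ${json.request}`);
  err.fatal = true;
  throw err;
}
```

Inside the polling task, a done: false result can be turned into a retry, while an error marked fatal can stop the loop early instead of using up the remaining attempts.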

Conclusion

Bypassing captchas with Puppeteer provides a programmatic way to automate pages that sit behind these protections. It does require an external solving service, which has associated costs, but overall it's a simple and effective technique for controlling pages that use captcha defenses.

If you would rather not build and manage your own captcha solving infrastructure, services like Proxies API handle this complexity for you.

With Proxies API, you make a simple API request with the target URL. It will handle:

Rotating proxies and IP addresses

Rotating user agents

Solving captchas

Running JavaScript

It then returns the rendered HTML, so there is no need to orchestrate the numerous steps required for reliable captcha solving.

For example:

curl "http://api.proxiesapi.com/?key=API_KEY&render=true&url=https://targetpage.com"

This takes care of all the headaches of automation. No proxies, browsers, or captcha solving services to manage.

Proxies API offers 1000 free API calls to get started. Check it out if you need to integrate robust captcha solving and proxy rotation in your projects.
