Avoiding bot detection: How to scrape the web without getting blocked?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • browser-fingerprinting

    Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

  • Try your technique on a few of these fingerprint testing sites: https://github.com/niespodd/browser-fingerprinting#fingerpri... I'm pretty sure it's quite detectable.

  • duo-bypass

    Stuff after reverse engineering DUO's mobile app.

  • There are myriad ways of extracting the TOTP seed from these apps... Or you just reverse engineer the setup/confirmation process and then you can generate/trigger your own tokens from your automation workflow.

    2FA is a good security feature, but it does not help against web scraping. Credential stuffing and other third-party attacks? Yes, it _can_ help, but not always. There's a phishing group that has seemingly specialised in getting people to click the green confirm button in their Duo app... ¯\_(ツ)_/¯

    Check https://github.com/revalo/duo-bypass for a Python script that can be used to automate Duo tokens... It includes some code from me. There are similar scripts for all the other well-known OTP apps...
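
To make "generate your own tokens" concrete, here is a minimal sketch of the standard one-time-password algorithms (HOTP from RFC 4226 and TOTP from RFC 6238) using only the Python standard library. It is not the code from duo-bypass. The function names `hotp` and `totp` are illustrative, the seed is assumed to be available as raw bytes (many apps store it base32-encoded, so it may need decoding first), and whether a given app uses time-based or counter-based codes is something to verify for that app.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based OTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP with the counter derived from Unix time."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


# Example with the RFC test seed (an assumption for illustration, not a real secret):
# print(totp(b"12345678901234567890"))
```

Passing `now` explicitly keeps the function easy to check against the published RFC test vectors.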

  • quaid

    A single-page webapp that decrypts text using only client-side JavaScript

  • This utility will help with that, assuming the services that use 2FA have a backup-code feature: https://github.com/sowbug/quaid

  • undetected-chromedriver

    Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)

  • I've had a lot of success just with Selenium and this custom version of Chromedriver: https://github.com/ultrafunkamsterdam/undetected-chromedrive...
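
A minimal sketch of what that setup might look like in Python, assuming the `undetected-chromedriver` package and a local Chrome installation are available. The helper name `fetch_title` is hypothetical, and nothing here guarantees that a particular site will not detect the browser.

```python
def fetch_title(url: str) -> str:
    """Open `url` in a Chrome instance driven by undetected-chromedriver
    and return the page title."""
    # Imported lazily so this module still loads where the package is missing.
    # Install with: pip install undetected-chromedriver
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if navigation fails.
        driver.quit()


# Example call (the URL is a placeholder):
# print(fetch_title("https://example.com"))
```

The returned driver behaves like a regular Selenium WebDriver, so existing Selenium code should mostly carry over unchanged.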

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • Playwright is easy to get started with. There are even tools that allow you to record your browser actions and convert them into code ( https://playwright.dev/ ).
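
As a rough illustration of how little code a first script needs, here is a sketch using Playwright's Python sync API. It assumes the `playwright` package and its browsers are installed (`pip install playwright`, then `playwright install`). The helper name `page_title` is hypothetical. The recording tool mentioned above is exposed as the `playwright codegen <url>` command, which produces scripts in a similar style.

```python
def page_title(url: str) -> str:
    """Load `url` in headless Chromium via Playwright and return its title."""
    # Imported lazily so this module still loads where Playwright is missing.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            # Release the browser even if navigation raises.
            browser.close()


# Example call (the URL is a placeholder):
# print(page_title("https://playwright.dev/"))
```

Swapping `p.chromium` for `p.firefox` or `p.webkit` runs the same script against the other supported engines.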

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
