Stealth Web Scraping in Python: Avoid Blocking Like a Ninja

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • puppeteer

    Node.js API for Chrome

  • While avoiding them - for performance reasons - would be preferable, sometimes there is no other choice. Selenium, Puppeteer, and Playwright are the most used and known libraries. The snippet below shows only the User-Agent, but since it is a real browser, the headers will include the entire set (Accept, Accept-Encoding, etcetera)

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • While avoiding them - for performance reasons - would be preferable, sometimes there is no other choice. Selenium, Puppeteer, and Playwright are the most used and known libraries. The snippet below shows only the User-Agent, but since it is a real browser, the headers will include the entire set (Accept, Accept-Encoding, etcetera)

  • SurveyJS

    Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.

    SurveyJS logo
  • node-crawler

    Web Crawler/Spider for NodeJS + server-side jQuery ;-)

  • We could write some snippet mixing all of these, but the best option in real life is to use a tool with it all like Scrapy, pyspider, node-crawler (Node.js), or Colly (Go). The idea being the snippets is to understand each problem on its own. But for large-scale, real-life projects, handling everything on our own would be too complicated.

  • colly

    Elegant Scraper and Crawler Framework for Golang

  • We could write some snippet mixing all of these, but the best option in real life is to use a tool with it all like Scrapy, pyspider, node-crawler (Node.js), or Colly (Go). The idea being the snippets is to understand each problem on its own. But for large-scale, real-life projects, handling everything on our own would be too complicated.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts