Show HN: Crawlee for Python – a web scraping and browser automation library

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

CodeRabbit: AI Code Reviews for Developers
Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
coderabbit.ai
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  1. crawlee

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

    The main advantage (for now) is that the library has a single interface for both HTTP and headless browsers, and bundled auto scaling. You can write your crawlers using the same base abstraction, and the framework takes care of this heavy lifting. Developers of scrapers shouldn't need to reinvent the wheel, and just focus on building the "business" logic of their scrapers. Having said that, if you wrote your own crawling library, the motivation to use Crawlee might be lower, and that's fair enough.

    Please note that this is the first release, and we'll keep adding many more features as we go, including anti-blocking, adaptive crawling, etc. To see where this might go, check https://github.com/apify/crawlee

  2. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

    CodeRabbit logo
  3. crawlee-python

    Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Hey all,

    This is Jan, the founder of [Apify](https://apify.com/)—a full-stack web scraping platform. After the success of [Crawlee for JavaScript](https://github.com/apify/crawlee/) and the demand from the Python community, we're launching [Crawlee for Python](https://github.com/apify/crawlee-python) today!

    The main features are:

    - A unified programming interface for both HTTP (HTTPX with BeautifulSoup) & headless browser crawling (Playwright)

  4. scrapy-proxy-pool

    > Crawlee isn’t any less configurable than Scrapy.

    Oh, then I have obviously overlooked how one would be able to determined if a proxy has been blocked and evict it from the pool <https://github.com/rejoiceinhope/scrapy-proxy-pool/blob/b833...> . Or how to use an HTTP cache independent of the "browser" cache (e.g. to allow short-circuiting the actual request if I can prove it is not stale for my needs, which enables recrawls to fix logic bugs or even downloading the actual request-response payloads for making better tests) https://docs.scrapy.org/en/2.11/topics/downloader-middleware...

    Unless you meant what I said about "pip install -e && echo glhf" in which case, yes, it's a "simple matter of programming" into a framework that was not designed to be extended

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • Inside implementing SuperScraper with Crawlee.

    3 projects | dev.to | 5 Mar 2025
  • How to scrape Bluesky with Python

    6 projects | dev.to | 21 Mar 2025
  • How to scrape Google Maps data using Python and Crawlee

    2 projects | dev.to | 30 Dec 2024
  • How to scrape Google search results with Python

    2 projects | dev.to | 1 Dec 2024
  • How to scrape infinite scrolling webpages with Python

    2 projects | dev.to | 27 Aug 2024

Did you know that Python is
the 2nd most popular programming language
based on number of references?