Python Crawler

Open-source Python projects categorized as Crawler

Top 23 Python Crawler Projects

  1. Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: Top 10 Tools for Efficient Web Scraping in 2025 | dev.to | 2025-01-16

    Scrapy is a robust and scalable open-source web crawling framework. It is highly efficient for large-scale projects and supports asynchronous scraping.

  2. newspaper

    newspaper3k: news, full-text, and article metadata extraction in Python 3.

  3. Photon

    Incredibly fast crawler designed for OSINT. (by s0md3v)

  4. Douyin_TikTok_Download_API

    🚀 Douyin_TikTok_Download_API is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.

  5. autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  6. scrapy-redis

    Redis-based components for Scrapy.

  7. crawlee-python

    Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Project mention: How to scrape Crunchbase using Python in 2024 (Easy Guide) | dev.to | 2025-01-15

    By the end of this blog, we'll explore three different ways to extract data from Crunchbase using Crawlee for Python. We'll fully implement two of them and discuss the specifics and challenges of the third. This will help us better understand how important it is to properly choose the right data source.

  8. myGPTReader

    A community-driven way to read and chat with AI bots, powered by ChatGPT.

  9. trafilatura

    Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

    Project mention: Claude is now available in Europe | news.ycombinator.com | 2024-05-14

  10. ProxyBroker

    Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

  11. weibo-crawler

    Sina Weibo crawler: uses Python to scrape Sina Weibo data and download Weibo images and videos.

  12. toapi

    Every web site provides APIs.

  13. TorBot

    Dark Web OSINT Tool

  14. Grab

    Web Scraping Framework

  15. news-please

    news-please - an integrated web crawler and information extractor for news that just works

  16. Leaked-GPTs

    Leaked GPTs prompts. Bypass the 25-message limit or try out GPTs without a Plus subscription.

  17. PSpider

    A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

  18. grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

    Project mention: ArchiveBox is evolving: the future of self-hosted internet archives | news.ycombinator.com | 2024-10-16

    https://github.com/ArchiveTeam/grab-site might be helpful. I'm a fan of the ability to create WARC archives, put them in object storage (whether that is IA, S3, Backblaze B2, etc), and then keep them in cold storage or serve them up via HTTPS or a torrent (mutable, preferred).

  19. OpenWPM

    A web privacy measurement framework

  20. mlscraper

    🤖 Scrape data from HTML websites automatically by just providing examples

  21. XSRFProbe

    The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.

  22. scrapyrt

    HTTP API for Scrapy spiders

  23. google-play-scraper

    Google Play scraper for Python inspired by <facundoolano/google-play-scraper> (by JoMingyu)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Crawler related posts

  • How to scrape Google Maps data using Python and Crawlee

    2 projects | dev.to | 30 Dec 2024
  • How to scrape Google search results with Python

    2 projects | dev.to | 1 Dec 2024
  • We're losing our digital history. Can the Internet Archive save it?

    1 project | news.ycombinator.com | 18 Sep 2024
  • How to scrape infinite scrolling webpages with Python

    2 projects | dev.to | 27 Aug 2024
  • Current problems and mistakes of web scraping in Python and tricks to solve them!

    21 projects | dev.to | 22 Aug 2024
  • Show HN: Crawlee for Python – a web scraping and browser automation library

    5 projects | news.ycombinator.com | 9 Jul 2024
  • How to download a copy of a website using Wget

    1 project | news.ycombinator.com | 7 Jun 2024

Index

What are some of the best open-source Crawler projects in Python? This list will help you:

# Project Stars
1 Scrapy 54,144
2 newspaper 14,353
3 Photon 11,315
4 Douyin_TikTok_Download_API 10,936
5 autoscraper 6,634
6 scrapy-redis 5,570
7 crawlee-python 5,287
8 myGPTReader 4,438
9 trafilatura 3,933
10 ProxyBroker 3,920
11 weibo-crawler 3,642
12 toapi 3,511
13 TorBot 3,113
14 Grab 2,400
15 news-please 2,150
16 Leaked-GPTs 2,129
17 PSpider 1,829
18 grab-site 1,443
19 OpenWPM 1,352
20 mlscraper 1,339
21 XSRFProbe 1,175
22 scrapyrt 849
23 google-play-scraper 818
