Python crawl

Open-source Python projects categorized as crawl

Top 3 Python crawl Projects

  1. InfoSpider

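    INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together in one place. It aims to help users retrieve their own data safely and quickly; the tool's code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments photo album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.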

  2. grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

    Project mention: ArchiveBox is evolving: the future of self-hosted internet archives | news.ycombinator.com | 2024-10-16

    https://github.com/ArchiveTeam/grab-site might be helpful. I'm a fan of the ability to create WARC archives, put them in object storage (whether that is IA, S3, Backblaze B2, etc), and then keep them in cold storage or serve them up via HTTPS or a torrent (mutable, preferred).

  3. stweet

    Advanced Python library to scrape Twitter (tweets, users) via an unofficial API
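grab-site's headline features include WARC output and ignore patterns that can be changed while a crawl is running. The Python sketch below illustrates both ideas with the standard library only; it is not grab-site's code or API. It assumes that ignore patterns are regular expressions stored one per line in a plain file (the file name `ignores` is hypothetical here) that is re-read before each URL is checked, and it writes minimal WARC/1.0 `response` records by hand. The helper names `load_ignore_patterns`, `is_ignored`, and `warc_response_record` are made up for illustration, and the response body in the demo is fabricated rather than fetched.

```python
from __future__ import annotations

import re
import uuid
from datetime import datetime, timezone
from pathlib import Path


def load_ignore_patterns(path: Path) -> list[re.Pattern[str]]:
    """Read one regex per line, skipping blank lines and '#' comments.

    Calling this before every URL check means edits to the file take effect
    mid-crawl (a hypothetical take on "dynamic ignore patterns").
    """
    if not path.exists():
        return []
    patterns = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def is_ignored(url: str, patterns: list[re.Pattern[str]]) -> bool:
    # Skip the URL if any pattern matches anywhere in it.
    return any(p.search(url) for p in patterns)


def warc_response_record(url: str, status_line: str,
                         headers: dict[str, str], body: bytes) -> bytes:
    """Build one WARC/1.0 'response' record wrapping a raw HTTP response."""
    # The record block is the HTTP response: status line, headers, blank line, body.
    http_head = status_line + "\r\n" + "".join(
        f"{k}: {v}\r\n" for k, v in headers.items()) + "\r\n"
    block = http_head.encode("iso-8859-1") + body
    warc_headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {url}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(block)}",
    ]
    # Header lines end in CRLF, an empty line separates header and block,
    # and each record is terminated by two CRLFs.
    head = ("\r\n".join(warc_headers) + "\r\n\r\n").encode("utf-8")
    return head + block + b"\r\n\r\n"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        ignores = Path(tmp) / "ignores"
        ignores.write_text(r"/calendar\?" + "\n", encoding="utf-8")
        urls = ["https://example.com/", "https://example.com/calendar?day=1"]
        with open(Path(tmp) / "crawl.warc", "wb") as out:
            for url in urls:
                if is_ignored(url, load_ignore_patterns(ignores)):
                    print("skip", url)
                    continue
                # A real crawler would fetch the URL here; this body is made up.
                body = b"<html>hello</html>"
                out.write(warc_response_record(
                    url, "HTTP/1.1 200 OK",
                    {"Content-Type": "text/html", "Content-Length": str(len(body))},
                    body,
                ))
```

Re-reading the pattern file on every check keeps the sketch simple; a real crawler would more likely watch the file's modification time, record the matching request as well, and use a dedicated WARC library with compression.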

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python crawl related posts

  • We're losing our digital history. Can the Internet Archive save it?

    1 project | news.ycombinator.com | 18 Sep 2024
  • How to download a copy of a website using Wget

    1 project | news.ycombinator.com | 7 Jun 2024
  • struggling to download websites

    1 project | /r/DataHoarder | 15 May 2023
  • Internet Archive Down, will be up and running soon (i hope).

    1 project | /r/DataHoarder | 22 Mar 2023
  • best tool for downloading forum posts in real-time?

    1 project | /r/DataHoarder | 22 Mar 2023
  • Best way to back up entire website on a schedule

    2 projects | /r/DataHoarder | 29 Jan 2023
  • Help building or mirroring docs.microsoft.com

    2 projects | /r/DataHoarder | 9 Aug 2022

Index

What are some of the best open-source crawl projects in Python? This list will help you:

Rank  Project     GitHub stars
1     InfoSpider  7,919
2     grab-site   1,436
3     stweet      593

