Top 23 Python Spider Projects
-
Douyin_TikTok_Download_API
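🚀 Douyin_TikTok_Download_API is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.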
-
InfoSpider
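INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.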
-
scrapydweb
Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.
-
grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Project mention: ArchiveBox is evolving: the future of self-hosted internet archives | news.ycombinator.com | 2024-10-16
https://github.com/ArchiveTeam/grab-site might be helpful. I'm a fan of the ability to create WARC archives, put them in object storage (whether that is IA, S3, Backblaze B2, etc), and then keep them in cold storage or serve them up via HTTPS or a torrent (mutable, preferred).
-
Project mention: HN Summary: Let ChatGPT Summarize Hacker News for You | news.ycombinator.com | 2024-09-02
-
alltheplaces
A set of spiders and scrapers to extract location information from places that post their location on the internet.
An open dataset of CC0-licensed POI data scraped from the web, written in Python with the Scrapy framework.
https://github.com/alltheplaces/alltheplaces
-
freshonions-torscraper
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
-
linkedIn-scraper
A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel
-
graphinder
🕸️ Blazing fast GraphQL endpoints finder using subdomain enumeration, scripts analysis and bruteforce. 🕸️
-
telegram-groups-crawler
A Telegram crawler made in Python to automatically search groups and channels and collect any type of data from them (+ dataset included).
-
scrapeops-scrapy-sdk
Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the box.
Python Spider discussion
Python Spider related posts
-
We're losing our digital history. Can the Internet Archive save it?
-
AllThePlaces.xyz
-
How to download a copy of a website using Wget
-
Differentiating between hypermarkets and supermarkets.
-
Meta, Microsoft and Amazon team up on maps project
-
Distribution of gross and net salaries on r/BESalary [OC]
-
struggling to download websites
Index
What are some of the best open-source Spider projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Douyin_TikTok_Download_API | 12,303 |
| 2 | Photon | 11,566 |
| 3 | InfoSpider | 7,988 |
| 4 | toapi | 3,524 |
| 5 | Gerapy | 3,442 |
| 6 | scrapydweb | 3,276 |
| 7 | TorBot | 3,266 |
| 8 | SpiderKeeper | 2,766 |
| 9 | Grab | 2,402 |
| 10 | PSpider | 1,833 |
| 11 | grab-site | 1,481 |
| 12 | XSRFProbe | 1,205 |
| 13 | hacker-news-digest | 712 |
| 14 | alltheplaces | 696 |
| 15 | freshonions-torscraper | 515 |
| 16 | LinkedInDumper | 431 |
| 17 | linkedIn-scraper | 237 |
| 18 | graphinder | 207 |
| 19 | estela | 180 |
| 20 | telegram-groups-crawler | 156 |
| 21 | XingDumper | 38 |
| 22 | scrapeops-scrapy-sdk | 37 |
| 23 | amazon_price_tracker | 7 |