Python Spider

Open-source Python projects categorized as Spider

Top 23 Python Spider Projects

  1. Douyin_TikTok_Download_API

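    🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.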

  3. Photon

    Incredibly fast crawler designed for OSINT. (by s0md3v)

  4. InfoSpider

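    INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, aiming to help users retrieve their own data safely and quickly. The tool's code is open source and its workflow is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.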

  5. toapi

    Every web site provides APIs.

  6. Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  7. scrapydweb

    Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

  8. TorBot

    Dark Web OSINT Tool

  10. SpiderKeeper

    Admin UI for Scrapy; an open-source alternative to Scrapinghub

  11. Grab

    Web Scraping Framework

  12. PSpider

    A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

  13. grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

    Project mention: ArchiveBox is evolving: the future of self-hosted internet archives | news.ycombinator.com | 2024-10-16

    https://github.com/ArchiveTeam/grab-site might be helpful. I'm a fan of the ability to create WARC archives, put them in object storage (whether that is IA, S3, Backblaze B2, etc), and then keep them in cold storage or serve them up via HTTPS or a torrent (mutable, preferred).

  14. XSRFProbe

    The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.

  15. hacker-news-digest

    :newspaper: Let ChatGPT Summarize Hacker News for You

    Project mention: HN Summary: Let ChatGPT Summarize Hacker News for You | news.ycombinator.com | 2024-09-02
  16. alltheplaces

    A set of spiders and scrapers to extract location information from places that post their location on the internet.

    Project mention: AllThePlaces.xyz | news.ycombinator.com | 2024-08-19

    An open web data scraping dataset of CC 0 licenced POI, written in python with the scrapy framework.

    https://github.com/alltheplaces/alltheplaces

  17. freshonions-torscraper

    Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

  18. LinkedInDumper

    Python 3 script to dump/scrape/extract company employees from LinkedIn API

  19. linkedIn-scraper

    A Playwright bot that scrapes LinkedIn and stores advertisement data in a database and a Telegram channel

  20. graphinder

    🕸️ Blazing fast GraphQL endpoints finder using subdomain enumeration, scripts analysis and bruteforce. 🕸️

  21. estela

    estela, an elastic web scraping cluster 🕸

  22. telegram-groups-crawler

    A Telegram crawler made in Python to automatically search groups and channels and collect any type of data from them (+ dataset included).

  23. XingDumper

    Python 3 script to dump/scrape/extract company employees from XING API

  24. scrapeops-scrapy-sdk

    Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the box.

  25. amazon_price_tracker

    A cool Scrapy spider that notifies you of price drops on a product you want to buy!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Spider related posts

  • We're losing our digital history. Can the Internet Archive save it?

    1 project | news.ycombinator.com | 18 Sep 2024
  • AllThePlaces.xyz

    2 projects | news.ycombinator.com | 19 Aug 2024
  • How to download a copy of a website using Wget

    1 project | news.ycombinator.com | 7 Jun 2024
  • Differentiating between hypermarkets and supermarkets.

    1 project | /r/openstreetmap | 9 Dec 2023
  • Meta, Microsoft and Amazon team up on maps project

    1 project | news.ycombinator.com | 26 Jul 2023
  • Distribution of gross and net salaries on r/BESalary [OC]

    1 project | /r/BESalary | 1 Jul 2023
  • struggling to download websites

    1 project | /r/DataHoarder | 15 May 2023

Index

What are some of the best open-source Spider projects in Python? This list will help you:

#   Project                      Stars
1   Douyin_TikTok_Download_API   12,303
2   Photon                       11,566
3   InfoSpider                    7,988
4   toapi                         3,524
5   Gerapy                        3,442
6   scrapydweb                    3,276
7   TorBot                        3,266
8   SpiderKeeper                  2,766
9   Grab                          2,402
10  PSpider                       1,833
11  grab-site                     1,481
12  XSRFProbe                     1,205
13  hacker-news-digest              712
14  alltheplaces                    696
15  freshonions-torscraper          515
16  LinkedInDumper                  431
17  linkedIn-scraper                237
18  graphinder                      207
19  estela                          180
20  telegram-groups-crawler         156
21  XingDumper                       38
22  scrapeops-scrapy-sdk             37
23  amazon_price_tracker              7
