Spider

Open-source projects categorized as Spider

Top 23 Spider Open-Source Projects

  • colly

    Elegant Scraper and Crawler Framework for Golang

  • Project mention: Scraping the full snippet from Google search result | dev.to | 2024-01-01

    SerpApi focuses on scraping search results. That's why we need extra help to scrape individual sites. We'll use the GoColly package.

  • crawlab

    Distributed web crawler admin platform for managing spiders, regardless of language or framework.

  • Photon

    Incredibly fast crawler designed for OSINT. (by s0md3v)

  • Pholcus

    Pholcus is a distributed high-concurrency crawler software written in pure golang

  • InfoSpider

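    INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, aiming to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.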

  • Douyin_TikTok_Download_API

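    🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.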

  • Project mention: TikTok video scraper | /r/webscraping | 2023-05-23

    I am currently working on a web scraper for TikTok. At the moment, I am able to retrieve data about the first 16 videos from a channel. I achieved this by making requests to an unofficial API, https://github.com/Evil0ctal/Douyin_TikTok_Download_API. My problem is that the requirements for this project do not allow me to use any package that extracts data from TikTok. How should I go about this task? I already tried getting data from the HTML, but that is not sufficient, since most of the content is missing from the response when I use requests.get(URL). Could you please recommend some repositories that could help, or some way of extracting the data? Thank you!

  • node-crawler

    Web Crawler/Spider for NodeJS + server-side jQuery ;-)

  • awesome-web-scraping

    List of libraries, tools and APIs for web scraping and data processing.

  • awesome-crawler

    A collection of awesome web crawlers and spiders in different languages

  • browser-fingerprinting

    Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

  • Project mention: A site that tracks the price of a Big Mac in every US McDonald's | news.ycombinator.com | 2024-01-13

    Yes, there is a lot written about it. Here is one link I have saved:

    https://github.com/niespodd/browser-fingerprinting

  • toapi

    Every web site provides APIs.

  • Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  • scrapydweb

    Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

  • SpiderKeeper

    Admin UI for Scrapy / open-source Scrapinghub

  • DHT

    BitTorrent DHT protocol and DHT spider.

  • TorBot

    Dark Web OSINT Tool

  • Geziyor

    Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

  • Project mention: Show HN: I scraped 25M Shopify products to build a search engine | news.ycombinator.com | 2023-12-13

    As someone who has scraped millions of items myself, I had success using Geziyor (https://github.com/geziyor/geziyor) built in Go. Shopify sites are especially easy to scrape because they tend to share the same product data formatting and don't hide it behind JS rendering.

  • Grab

    Web Scraping Framework

  • abot

    Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

  • PSpider

    A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

  • cariddi

    Take a list of domains, crawl URLs, and scan for endpoints, secrets, API keys, file extensions, tokens, and more

  • grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  • Project mention: Ask HN: How can I back up an old vBulletin forum without admin access? | news.ycombinator.com | 2024-01-29

    The format you want is WARC. Even the Library of Congress uses it. There are many many WARC scrapers. I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are) but I found that in less than 15 seconds of searching so do your own diligence.

  • x-crawl

    x-crawl is a flexible Node.js multifunctional crawler library. Flexible usage and numerous functions can help you quickly, safely, and stably crawl pages, interfaces, and files.

  • Project mention: Flexible Node.js AI-assisted crawler library | news.ycombinator.com | 2024-04-24
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Spider related posts

  • Flexible Node.js AI-assisted crawler library

    3 projects | news.ycombinator.com | 24 Apr 2024
  • Traditional crawler or AI-assisted crawler? How to choose?

    1 project | dev.to | 22 Apr 2024
  • AI+Node.js x-crawl crawler: Why are traditional crawlers no longer the first choice for data crawling?

    1 project | dev.to | 16 Apr 2024
  • AI combined with Node.js x-crawl crawler

    1 project | dev.to | 10 Apr 2024
  • Recommend a flexible Node.js multi-functional crawler library —— x-crawl

    1 project | dev.to | 20 Mar 2024
  • Differentiating between hypermarkets and supermarkets.

    1 project | /r/openstreetmap | 9 Dec 2023
  • Meta, Microsoft and Amazon team up on maps project

    1 project | news.ycombinator.com | 26 Jul 2023

Index

What are some of the best open-source Spider projects? This list will help you:

Rank Project Stars
1 colly 22,165
2 crawlab 10,803
3 Photon 10,513
4 Pholcus 7,504
5 InfoSpider 7,134
6 Douyin_TikTok_Download_API 6,925
7 node-crawler 6,615
8 awesome-web-scraping 6,323
9 awesome-crawler 6,078
10 browser-fingerprinting 3,830
11 toapi 3,462
12 Gerapy 3,211
13 scrapydweb 3,004
14 SpiderKeeper 2,705
15 DHT 2,668
16 TorBot 2,618
17 Geziyor 2,480
18 Grab 2,355
19 abot 2,205
20 PSpider 1,811
21 cariddi 1,352
22 grab-site 1,261
23 x-crawl 1,176
