crawl

Open-source projects categorized as crawl

Top 10 crawl Open-Source Projects

  • InfoSpider

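    INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together in one tool. It is designed to help users retrieve their own data safely and quickly, and its code is open source with a transparent process. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments photo-album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.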

  • grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  • Project mention: Ask HN: How can I back up an old vBulletin forum without admin access? | news.ycombinator.com | 2024-01-29

    The format you want is WARC. Even the Library of Congress uses it. There are many many WARC scrapers. I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are) but I found that in less than 15 seconds of searching so do your own diligence.

  • x-crawl

    x-crawl is a flexible, multifunctional Node.js crawler library. Its flexible usage and wide range of features help you crawl pages, APIs, and files quickly, safely, and reliably.

  • Project mention: Flexible Node.js AI-assisted crawler library | news.ycombinator.com | 2024-04-24
  • stweet

    Advanced Python library to scrape Twitter (tweets, users) from the unofficial API

  • Project mention: Failed using the new twitter API or alternatives | /r/learnpython | 2023-05-11
  • clipper.js

    HTML to Markdown converter and crawler.

  • Project mention: Mozilla: Readability.js | news.ycombinator.com | 2024-02-25

    Clipper.js is built on top of Mozilla's Readability library and uses Turndown to convert HTML to Markdown: https://github.com/philschmid/clipper.js

  • gospider

    ⚡ Lightweight Golang spider framework

  • fetchurls

    A bash script to spider a site, follow links, and fetch URLs (with built-in filtering) into a generated text file.

  • wget-lua

    Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

  • squint

    Makes visual reviews of web app releases easy. (by kimmobrunfeldt)

  • splatstats

    DCSS tournament TeamSplat statistics
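fetchurls is described above as a bash script that spiders a site, follows links, and writes the URLs it finds (with filtering) to a text file. The Python sketch below illustrates that general technique; it is not fetchurls' own implementation. It performs a breadth-first crawl restricted to the starting host, and a hypothetical `exclude` regex stands in for the built-in filtering. The names `crawl_urls`, `write_url_list`, `exclude`, and `max_pages` are illustrative assumptions. The page fetcher is injectable so the crawl logic can be exercised without network access.

```python
import re
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def default_fetch(url: str) -> str:
    # Real network fetch; replace it for tests, rate limiting, robots.txt, etc.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_urls(start: str,
               fetch: Callable[[str], str] = default_fetch,
               exclude: str = r"\.(jpe?g|png|gif|css|js|pdf)$",
               max_pages: int = 500) -> list[str]:
    """Breadth-first crawl of a single host; returns URLs in visit order."""
    host = urlparse(start).netloc
    skip = re.compile(exclude, re.IGNORECASE)
    seen = {start}
    queue = deque([start])
    found: list[str] = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        found.append(url)
        try:
            html = fetch(url)
        except OSError:  # URLError/HTTPError are OSError subclasses
            continue  # keep the URL in the list, but skip its links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https") or parsed.netloc != host:
                continue  # stay on the starting host
            if skip.search(parsed.path) or link in seen:
                continue  # filtered out or already queued
            seen.add(link)
            queue.append(link)
    return found


def write_url_list(urls: Iterable[str], path: str) -> None:
    """Writes one URL per line to a text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")
```

Typical use would be `write_url_list(crawl_urls("https://example.com/"), "urls.txt")`. A breadth-first queue with a `seen` set visits each page once and lists shallow pages before deep ones; a production tool would also need politeness delays and robots.txt handling, which this sketch omits.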

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
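wget-lua (Wget-AT) is listed above with "URL-agnostic deduplication". The general idea is to key stored payloads on a content digest instead of on the URL. That way, identical bytes served from two different URLs are stored once, and each later occurrence is recorded as a reference (a `revisit` record, in WARC terms). The Python sketch below shows that idea under our own assumptions; it is not Wget-AT's code. The `sha1:` plus Base32 digest format mirrors the form commonly found in `WARC-Payload-Digest` headers. The `Deduplicator` class and its record dictionaries are hypothetical.

```python
import base64
import hashlib
from dataclasses import dataclass, field


def payload_digest(payload: bytes) -> str:
    """SHA-1 digest in the Base32 'sha1:...' form often seen in WARC headers."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


@dataclass
class Deduplicator:
    """Hypothetical URL-agnostic dedup index: payload digest -> first URL stored."""
    index: dict[str, str] = field(default_factory=dict)

    def record(self, url: str, payload: bytes) -> dict:
        digest = payload_digest(payload)
        original = self.index.get(digest)
        if original is None:
            # First time these bytes are seen from any URL: store them in full.
            self.index[digest] = url
            return {"type": "response", "uri": url, "digest": digest,
                    "payload": payload}
        # Same bytes already stored, possibly under a different URL:
        # emit a lightweight reference instead of a second copy.
        return {"type": "revisit", "uri": url, "digest": digest,
                "refers_to": original}
```

For example, an empty payload hashes to `sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ`. Keying the index on the digest alone is what makes the scheme URL-agnostic: a CDN copy of an already-archived file becomes a revisit even though its URL was never seen before.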

crawl related posts

  • Flexible Node.js AI-assisted crawler library

    3 projects | news.ycombinator.com | 24 Apr 2024
  • Traditional crawler or AI-assisted crawler? How to choose?

    1 project | dev.to | 22 Apr 2024
  • AI+Node.js x-crawl crawler: Why are traditional crawlers no longer the first choice for data crawling?

    1 project | dev.to | 16 Apr 2024
  • AI combined with Node.js x-crawl crawler

    1 project | dev.to | 10 Apr 2024
  • Recommend a flexible Node.js multi-functional crawler library —— x-crawl

    1 project | dev.to | 20 Mar 2024
  • struggling to download websites

    1 project | /r/DataHoarder | 15 May 2023
  • Internet Archive Down, will be up and running soon (i hope).

    1 project | /r/DataHoarder | 22 Mar 2023

Index

What are some of the best open-source crawl projects? This list will help you:

Rank  Project     Stars
1     InfoSpider  7,134
2     grab-site   1,261
3     x-crawl     1,176
4     stweet        571
5     clipper.js    432
6     gospider      203
7     fetchurls     123
8     wget-lua       81
9     squint         27
10    splatstats      2
