Go Spider

Open-source Go projects categorized as Spider

Top 12 Go Spider Projects

  • colly

    Elegant Scraper and Crawler Framework for Golang

    Project mention: Show HN: Flyscrape – A standalone and scriptable web scraper in Go | news.ycombinator.com | 2023-11-11

    Interesting. Can you compare it to colly? [0]

    Last time I looked it was the most popular choice for scraping in Go and I have some projects using it.

    Is it similar? Does it have more/less features or is it more suited for a different use case? (Which one?)

    [0] https://github.com/gocolly/colly

  • crawlab

    Distributed web crawler admin platform for managing spiders, regardless of language or framework.

    Project mention: Self-hosted web scraper? | /r/selfhosted | 2023-01-03

    Haven't tried but this project https://github.com/crawlab-team/crawlab looks promising.

  • Pholcus

    Pholcus is a distributed high-concurrency crawler software written in pure golang

  • DHT

    BitTorrent DHT Protocol && DHT Spider.

  • Geziyor

    Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

    Project mention: Show HN: Flyscrape – A standalone and scriptable web scraper in Go | news.ycombinator.com | 2023-11-11

    It's been 8+ years since I started scraping. I even wrote a popular Go web scraping framework previously: (https://github.com/geziyor/geziyor).

    These days, I'm not even using Go for scraping, as webpage changes drive me crazy, so I moved to TypeScript+Playwright. (The Crawlee framework is cool, though not strictly necessary.)

    My favorite stack as of 2023: TypeScript+Playwright+Crawlee(Optional)

  • cariddi

    Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

    Project mention: cariddi v1.3.1 is out🥳 | /r/opensource | 2023-03-24

    cariddi is an open source (https://github.com/edoardottt/cariddi) web security tool. It takes a list of domains as input, crawls URLs, and scans for endpoints, secrets, API keys, file extensions, tokens, and more.

  • webpalm

    WebPalm is a powerful command-line tool for website mapping and web scraping. With its recursive approach, it can generate a complete tree of all webpages and their links on a website. It can also extract data from the body of each page using regular expressions, making it an ideal tool for web scraping and data extraction.

    Project mention: New webcrawler for bug-hunters and data-miners | news.ycombinator.com | 2023-10-18

  • ant

    A web crawler for Go (by yields)

  • gospider

    ⚡ Lightweight Golang spider framework

  • spidy

    Domain names collector - Crawl websites and collect domain names along with their availability status. (by twiny)

  • scrapemate

    Golang Crawling and scraping framework (by gosom)

    Project mention: colly VS scrapemate - a user suggested alternative | libhunt.com/r/colly | 2023-04-15

  • turbo-tor-crawl

    Recursive hostnames crawler

    Project mention: Event Horizon | /r/candeltreow | 2023-03-05

    Event Horizon is a project that describes interesting and safe places on the darknet, with the aim of dispelling society's stereotypes about the immoral horrors of the dark web. It lists many interesting sites with screenshots, short descriptions, and sometimes files. The project also has its own Telegram bot for taking screenshots of onion resources and an onion crawler for finding them, whose sources can be found on GitHub. Horizon was previously removed and resumed last year; publication activity has since fallen, and the project is presumably on pause. Links: Telegram, Telegram-bot, Tor-Crawler.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-11-11.

Index

What are some of the best open-source Spider projects in Go? This list will help you:

#    Project           Stars
1    colly             21,202
2    crawlab           10,401
3    Pholcus           7,471
4    DHT               2,653
5    Geziyor           2,294
6    cariddi           1,191
7    webpalm           307
8    ant               274
9    gospider          197
10   spidy             128
11   scrapemate        13
12   turbo-tor-crawl   6