Top 12 Go Spider Projects
colly
Project mention: Show HN: Flyscrape – A standalone and scriptable web scraper in Go | news.ycombinator.com | 2023-11-11
Interesting. Can you compare it to colly? [0]
Last time I looked, it was the most popular choice for scraping in Go, and I have some projects using it.
Is it similar? Does it have more or fewer features, or is it better suited to a different use case? (Which one?)
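For context on colly itself, its draw is how compact a crawl can be: a collector plus a couple of callbacks. A minimal sketch using colly's collector and OnHTML hooks (the target domain example.com is a placeholder):

```go
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Restrict the crawl to a single (placeholder) domain.
	c := colly.NewCollector(
		colly.AllowedDomains("example.com"),
	)

	// Print every link found on a visited page and queue it for crawling.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Request.AbsoluteURL(e.Attr("href"))
		fmt.Println("found:", link)
		_ = e.Request.Visit(link)
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("visiting:", r.URL)
	})

	if err := c.Visit("https://example.com/"); err != nil {
		fmt.Println("visit error:", err)
	}
}
```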
crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Haven't tried it, but this project https://github.com/crawlab-team/crawlab looks promising.
Geziyor
Project mention: Show HN: Flyscrape – A standalone and scriptable web scraper in Go | news.ycombinator.com | 2023-11-11
It's been 8+ years since I started scraping. I even wrote a popular Go web scraping framework: https://github.com/geziyor/geziyor.
These days I'm not even using Go for scraping, as the webpage changes drove me crazy, so I moved to TypeScript + Playwright. (The Crawlee framework is cool, though not strictly necessary.)
My favorite stack as of 2023: TypeScript + Playwright + Crawlee (optional)
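For reference, a geziyor crawl roughly follows the framework's quickstart: you pass start URLs and a parse callback that receives a goquery document. The sketch below assumes the public quotes.toscrape.com demo site and the API as shown in geziyor's README; details may differ between versions.

```go
package main

import (
	"fmt"

	"github.com/PuerkitoBio/goquery"
	"github.com/geziyor/geziyor"
	"github.com/geziyor/geziyor/client"
)

func main() {
	geziyor.NewGeziyor(&geziyor.Options{
		StartURLs: []string{"http://quotes.toscrape.com/"},
		ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
			// r.HTMLDoc is a goquery document of the fetched page.
			r.HTMLDoc.Find("div.quote").Each(func(_ int, s *goquery.Selection) {
				fmt.Printf("%s: %s\n",
					s.Find("small.author").Text(),
					s.Find("span.text").Text())
			})
		},
	}).Start()
}
```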
cariddi
Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more
cariddi (https://github.com/edoardottt/cariddi) is an open-source web security tool. It takes a list of domains as input, crawls their URLs, and scans for endpoints, secrets, API keys, file extensions, tokens, and more.
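This is not cariddi's implementation, just a minimal Go sketch of the underlying idea: fetch a page and run a set of regexes over the body. The two patterns and the target URL are illustrative placeholders; cariddi ships a much larger, curated rule set and a proper crawler in front of this step.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// Illustrative patterns only; real tools use far larger rule sets.
var patterns = map[string]*regexp.Regexp{
	"aws-access-key":  regexp.MustCompile(`AKIA[0-9A-Z]{16}`),
	"generic-api-key": regexp.MustCompile(`(?i)api[_-]?key["'\s:=]+[0-9A-Za-z_\-]{16,}`),
}

// scan fetches a URL and reports every pattern match in the response body.
func scan(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	for name, re := range patterns {
		for _, m := range re.FindAllString(string(body), -1) {
			fmt.Printf("%s  %s  %s\n", url, name, m)
		}
	}
	return nil
}

func main() {
	// Placeholder target; cariddi itself reads a domain list from stdin.
	if err := scan("https://example.com/"); err != nil {
		fmt.Println("error:", err)
	}
}
```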
webpalm
WebPalm is a powerful command-line tool for website mapping and web scraping. With its recursive approach, it can generate a complete tree of all webpages and their links on a website. It can also extract data from the body of each page using regular expressions, making it an ideal tool for web scraping and data extraction.
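As an illustration of the recursive-tree idea (not WebPalm's actual code), here is a depth-limited crawl that prints each page's links indented under their parent. The start URL, depth limit, and the naive href regex are placeholders.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
	"strings"
)

var hrefRe = regexp.MustCompile(`href="(https?://[^"]+)"`)

// crawl prints url, then recurses into its links, indenting each level
// so the output reads as a tree. seen prevents revisiting pages.
func crawl(url string, depth, maxDepth int, seen map[string]bool) {
	if depth > maxDepth || seen[url] {
		return
	}
	seen[url] = true
	fmt.Printf("%s%s\n", strings.Repeat("  ", depth), url)

	resp, err := http.Get(url)
	if err != nil {
		return
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()

	for _, m := range hrefRe.FindAllStringSubmatch(string(body), -1) {
		crawl(m[1], depth+1, maxDepth, seen)
	}
}

func main() {
	// Placeholder start page and depth; webpalm is driven from the command line.
	crawl("https://example.com/", 0, 2, map[string]bool{})
}
```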
spidy
Domain names collector - Crawl websites and collect domain names along with their availability status. (by twiny)
scrapemate
Project mention: colly VS scrapemate - a user suggested alternative | libhunt.com/r/colly | 2023-04-15
turbo-tor-crawl
Event Horizon is a project that catalogs interesting and safe places in the darknet, with the aim of dispelling society's stereotypes about the immoral horrors of the dark web. It features many interesting sites with attached screenshots, short descriptions, and sometimes files. The project also has its own Telegram bot for screenshots of onion resources, plus an onion crawler for finding them, whose sources can be found on GitHub. Horizon was previously taken down and resumed last year; publication activity has since dropped, so the project is presumably on pause. Links: Telegram, Telegram bot, Tor-Crawler.
Go Spider related posts
- New modern web crawling tool
- DHT VS dht - a user suggested alternative | 2 projects | 13 Jan 2022
- Show HN: A fast, feature-rich crawler for Go
- Fast, feature-rich web crawler for Go
- Feature rich crawler for Go.
- Create a tiny crawler/scraper for Go
Index
What are some of the best open-source Spider projects in Go? This list will help you:
# | Project | Stars
---|---|---
1 | colly | 21,202
2 | crawlab | 10,401
3 | Pholcus | 7,471
4 | DHT | 2,653
5 | Geziyor | 2,294
6 | cariddi | 1,191
7 | webpalm | 307
8 | ant | 274
9 | gospider | 197
10 | spidy | 128
11 | scrapemate | 13
12 | turbo-tor-crawl | 6