Top 23 Go Crawler Projects
-
Not a fix, but I tend to use lux when downloading from bilibili. It is faster too.
-
Sounds cool, but how is this different from Colly: https://github.com/gocolly/colly?
-
-
crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Haven't tried but this project https://github.com/crawlab-team/crawlab looks promising.
-
-
I'm using a few different methods. To pull the sites, I use Puppeteer and Katana (https://github.com/projectdiscovery/katana). Processing and extracting the information is tricky, but most websites selling things put time into their metadata, which makes it easier. Additionally, a lot of the larger stores share common patterns. Failing all of this, I trained a TensorFlow model to understand how to read product pages. However, it's far from perfect and a journey of continual improvement.
-
-
Rendora
Dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern JavaScript websites
-
-
-
cariddi
Take a list of domains, crawl URLs, and scan for endpoints, secrets, API keys, file extensions, tokens, and more
cariddi is an open source (https://github.com/edoardottt/cariddi) web security tool. It takes a list of domains as input, crawls their URLs, and scans for endpoints, secrets, API keys, file extensions, tokens, and more.
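To make the "scan for endpoints, secrets, API keys" idea concrete, here is a minimal sketch of regex-based scanning over a fetched page body, using only the Go standard library. It does not reproduce cariddi's code or rule set: the `Finding` type, the `scan` function, and both patterns are hypothetical names and rules chosen for illustration, and the key in the sample is the placeholder value from AWS documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// Finding records which rule matched and the text that matched it.
// (Hypothetical type, not part of cariddi.)
type Finding struct {
	Rule  string
	Match string
}

// rules are illustrative patterns only, not cariddi's actual rule set.
var rules = []struct {
	name string
	re   *regexp.Regexp
}{
	{"aws-access-key-id", regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
	{"api-endpoint", regexp.MustCompile(`/api/v[0-9]+/[A-Za-z0-9/_-]+`)},
}

// scan applies every rule to the body and collects all matches,
// in rule order and then in order of appearance.
func scan(body string) []Finding {
	var out []Finding
	for _, r := range rules {
		for _, m := range r.re.FindAllString(body, -1) {
			out = append(out, Finding{Rule: r.name, Match: m})
		}
	}
	return out
}

func main() {
	// Sample body standing in for a crawled page or script.
	body := `fetch("/api/v1/users/list"); const key = "AKIAIOSFODNN7EXAMPLE";`
	for _, f := range scan(body) {
		fmt.Printf("%s: %s\n", f.Rule, f.Match)
	}
}
```

A real scanner would need many more rules and some way to suppress false positives; the sketch only shows the shape of the loop.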
-
-
till
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
-
-
dorkscout
DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets
-
-
crawley project: https://github.com/s0rg/crawley
-
spidy
Domain names collector - Crawl websites and collect domain names along with their availability status. (by twiny)
Project mention: Share Your Code.. Share your most unique piece of Go code. | /r/golang | 2022-10-15
1 - Expired domain scrapper => https://github.com/twiny/spidy
2 - A sample & efficient web crawler => https://github.com/twiny/wbot
3 - A mini blockchain scanner => https://github.com/twiny/blockscan
4 - A Snake Game => https://github.com/twiny/snaky
-
pagser
Pagser is a simple, extensible, configurable tool that parses and deserializes an HTML page into a struct, based on goquery and struct tags, for Go crawlers
-
webpalm
WebPalm is a powerful command-line tool for website mapping and web scraping. With its recursive approach, it can generate a complete tree of all webpages and their links on a website. It can also extract data from the body of each page using regular expressions, making it an ideal tool for web scraping and data extraction.
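The recursive approach described above can be sketched roughly as follows. This is not WebPalm's implementation: to stay runnable offline it crawls a hypothetical in-memory `pages` map instead of making HTTP requests, and the `Node` type, the `crawl` and `printTree` functions, and both regular expressions are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pages is a hypothetical in-memory "website": URL path -> HTML body.
var pages = map[string]string{
	"/":        `<a href="/about">About</a> <a href="/contact">Contact</a>`,
	"/about":   `<a href="/">Home</a> Team: alice@example.com`,
	"/contact": `Write to bob@example.com or <a href="/about">read more</a>`,
}

var (
	hrefRe  = regexp.MustCompile(`href="([^"]+)"`)           // link extraction
	emailRe = regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.-]+`) // data extraction
)

// Node is one page in the link tree, with the regex matches found in its body.
type Node struct {
	URL      string
	Matches  []string
	Children []*Node
}

// crawl visits url depth-first, recursing into links that exist in pages
// and have not been visited yet, so cycles between pages terminate.
func crawl(url string, visited map[string]bool) *Node {
	visited[url] = true
	body := pages[url]
	node := &Node{URL: url, Matches: emailRe.FindAllString(body, -1)}
	for _, m := range hrefRe.FindAllStringSubmatch(body, -1) {
		link := m[1]
		if _, ok := pages[link]; !ok || visited[link] {
			continue
		}
		node.Children = append(node.Children, crawl(link, visited))
	}
	return node
}

// printTree prints the tree with two spaces of indentation per level.
func printTree(n *Node, depth int) {
	fmt.Printf("%s%s %v\n", strings.Repeat("  ", depth), n.URL, n.Matches)
	for _, c := range n.Children {
		printTree(c, depth+1)
	}
}

func main() {
	printTree(crawl("/", map[string]bool{}), 0)
}
```

The `visited` map is what keeps the recursion finite when pages link back to each other; a real crawler would also need URL normalization, a depth limit, and an HTML parser rather than a regex for links.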
-
Go Crawler related posts
- webpalm
- Bilibili download stalls at around 30-60%
- New Modern Crawling tool written with go
- New Modern Fast Crawler
- Webpalm - Modern fast web crawling tool in go
- Modern fast web crawling tool in go
www.saashub.com | 10 Jun 2023
Index
What are some of the best open-source Crawler projects in Go? This list will help you:
# | Project | Stars |
---|---|---|
1 | lux | 21,187 |
2 | colly | 19,693 |
3 | crawlab | 9,857 |
4 | Pholcus | 7,392 |
5 | katana | 6,578 |
6 | Ferret | 5,393 |
7 | Rendora | 1,962 |
8 | Geziyor | 1,933 |
9 | cariddi | 919 |
10 | go-dork | 799 |
11 | till | 799 |
12 | antch | 248 |
13 | dorkscout | 199 |
14 | ChainWalker | 166 |
15 | crawley | 143 |
16 | spidy | 116 |
17 | pagser | 82 |
18 | slrp | 82 |
19 | bathyscaphe | 82 |
20 | skweez | 56 |
21 | google-search-results-golang | 46 |
22 | seonaut | 40 |
23 | webpalm | 38 |