turbo-tor-crawl
Geziyor
 | turbo-tor-crawl | Geziyor
---|---|---
Mentions | 1 | 2
Stars | 6 | 2,487
Growth | - | 0.9%
Activity | 10.0 | 0.6
Last commit | over 1 year ago | 7 months ago
Language | Go | Go
License | - | Mozilla Public License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
turbo-tor-crawl
- Event Horizon
Event Horizon is a project that highlights interesting and safe corners of the darknet, with the aim of dispelling the widespread stereotype that the dark web holds nothing but immoral horrors. It lists many interesting sites with attached screenshots, short descriptions, and sometimes files. The project also has its own Telegram bot for taking screenshots of onion resources and an onion crawler for finding them, the sources of which can be found on GitHub. Horizon was previously taken down and resumed last year; publication activity has since dropped, and the project is presumably on pause. Links: Telegram, Telegram-bot, Tor-Crawler.
Geziyor
- Show HN: I scraped 25M Shopify products to build a search engine
As someone who has scraped millions of items myself, I had success using Geziyor (https://github.com/geziyor/geziyor) built in Go. Shopify sites are especially easy to scrape because they tend to share the same product data formatting and don't hide it behind JS rendering.
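The point about shared product data formatting can be illustrated with a minimal Go sketch that uses only the standard library. It is not Geziyor's API and not the commenter's code. The `Product` and `Variant` types, the field names, the `parseProducts` helper, and the sample payload are all assumptions made up for illustration; a real store's payload may differ. The idea is that when many sites serve product data in one plain JSON shape, a single decoder covers all of them without any JS rendering step.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Variant and Product describe a hypothetical product layout that
// many stores are assumed to share. The field names are illustrative.
type Variant struct {
	Title string `json:"title"`
	Price string `json:"price"`
}

type Product struct {
	Title    string    `json:"title"`
	Handle   string    `json:"handle"`
	Variants []Variant `json:"variants"`
}

// productPage is the assumed top-level wrapper around a product list.
type productPage struct {
	Products []Product `json:"products"`
}

// samplePayload is made-up data standing in for a fetched response body.
const samplePayload = `{
  "products": [
    {"title": "Canvas Tote", "handle": "canvas-tote",
     "variants": [{"title": "Small", "price": "19.00"},
                  {"title": "Large", "price": "24.00"}]},
    {"title": "Enamel Mug", "handle": "enamel-mug",
     "variants": [{"title": "Default", "price": "12.50"}]}
  ]
}`

// parseProducts decodes one page of products from raw JSON bytes.
func parseProducts(data []byte) ([]Product, error) {
	var page productPage
	if err := json.Unmarshal(data, &page); err != nil {
		return nil, fmt.Errorf("decode product page: %w", err)
	}
	return page.Products, nil
}

func main() {
	products, err := parseProducts([]byte(samplePayload))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range products {
		fmt.Printf("%s (%d variants)\n", p.Title, len(p.Variants))
	}
}
```

In a real crawler the bytes would come from an HTTP response rather than a constant, and the same `parseProducts` call could be reused for every site that follows the assumed layout.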
- Show HN: Flyscrape – A standalone and scriptable web scraper in Go
It's been 8+ years since I started scraping. I even wrote a popular Go web scraping framework previously (https://github.com/geziyor/geziyor).
These days, I'm not even using Go for scraping, because constant webpage changes drive me crazy, so I moved to TypeScript+Playwright. (The Crawlee framework is cool, though not strictly necessary.)
My favorite stack as of 2023: TypeScript+Playwright+Crawlee(Optional)
What are some alternatives?
Pholcus - Pholcus is a distributed high-concurrency crawler software written in pure golang
colly - Elegant Scraper and Crawler Framework for Golang
crawlab - Distributed web crawler admin platform for spiders management regardless of languages and frameworks.
DHT - BitTorrent DHT Protocol && DHT Spider.
jsonrpconion - Library for building JSON RPC services on Tor network
Ferret - Declarative web scraping
cariddi - Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more
google-search-results-golang - Google Search Results GoLang API
gichidan - Gichidan - CLI wrapper for Ichidan deep-web search engine.
gopixabay