Top 10 Ruby Scraper Projects
-
Project mention: Ask HN: What is the correct way to deal with pipelines? | news.ycombinator.com | 2023-09-21
"correct" is a value judgement that depends on lots of different things. Only you can decide which tool is correct. Here are some ideas:
- https://github.com/huginn/huginn
Your idea about a queue (in redis, or postgres, or sqlite, etc) is also totally valid. These off-the-shelf tools I listed probably wouldn't give you a huge advantage IMO.
-
Wombat
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
-
kimuraframework
Kimurai is a modern web scraping framework written in Ruby that works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests and lets you scrape and interact with JavaScript-rendered websites.
Project mention: Tanakai 1.6.0 (web scraping gem) has been released with support to Ruby 3+ | /r/ruby | 2023-02-16
Tanakai intends to be a maintained fork of Kimurai, a modern web scraping framework written in Ruby that works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests and lets you scrape and interact with JavaScript-rendered websites.
-
spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use. (by postmodern)
-
Project mention: Tanakai: Modern web scraping framework written in Ruby | news.ycombinator.com | 2023-10-25
-
-
html2rss-web
🕸 Generates and delivers RSS feeds via HTTP. Docker image available! Create your own feeds or get started quickly with the included configs.
-
nhkore
🇯🇵 📰 🗻 NHK News Web (Easy) word frequency (core list) scraper for Japanese language learners.
Ruby Scraper related posts
- Tanakai: Modern web scraping framework written in Ruby
- Are you using Huginn? If so do you have any latest documentation?
- Generate RSS feed for any website using CSS selectors
- What web scrapers do you recommend.
- Any recommendations for a open source replacement for If This Then That?
- Looking for a web scrapper to detect changes to a webpage on a schedule
- Where can I see Hokusai's Great Wave today?
Index
What are some of the best open-source Scraper projects in Ruby? This list will help you:
# | Project | Stars |
---|---|---|
1 | Huginn | 39,866 |
2 | Wombat | 1,298 |
3 | kimuraframework | 990 |
4 | spidr | 770 |
5 | tanakai | 250 |
6 | html2rss | 102 |
7 | html2rss-web | 70 |
8 | nhkore | 13 |
9 | rails-urltohtml | 5 |
10 | chanCrawler | 4 |