beam
Ruby on Rails
Metric | beam | Ruby on Rails
---|---|---
Mentions | 30 | 467
Stars | 7,477 | 54,865
Growth | 1.0% | 0.6%
Activity | 10.0 | 10.0
Last commit | 6 days ago | about 11 hours ago
Language | Java | Ruby
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
beam
-
Ask HN: Does (or why does) anyone use MapReduce anymore?
The "streaming systems" book answers your question and more: https://www.oreilly.com/library/view/streaming-systems/97814.... It gives you a history of how batch processing started with MapReduce, and how attempts at scaling by moving towards streaming systems gave us all the subsequent frameworks (Spark, Beam, etc.).
As for the framework called MapReduce, it isn't used much, but its descendant https://beam.apache.org very much is. Nowadays people often use "map reduce" as a shorthand for whatever batch processing system they're building on top of.
-
beam VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
How do Streaming Aggregation Pipelines work?
Apache Beam is one of many tools that you can use.
-
Releasing Temporian, a Python library for processing temporal data, built together with Google
Flexible runtime ☁️: Temporian programs can run seamlessly in-process in Python, or on large datasets using Apache Beam.
-
Kafka cluster loses or duplicates messages
To perform the tests I'm using a Kafka cluster on Kubernetes from the Beam repo.
- Apache Beam
-
Real Time Data Infra Stack
Apache Beam: Streaming framework that can run on several runners, such as Apache Flink and GCP Dataflow
-
Google Cloud Reference
Apache Beam: Batch/streaming data processing
-
Composer out of resources - "INFO Task exited with return code Negsignal.SIGKILL"
What you are looking for is Dataflow. It can be a bit tricky to wrap your head around at first, but I highly suggest leaning into this technology for most of your data engineering needs. It's based on the open source Apache Beam framework that originated at Google. We use an internal version of this system at Google for virtually all of our pipeline tasks, from a few GB to exabyte-scale systems -- it can do it all.
-
Pub/Sub parallel processing best practices
That being said, there is a learning curve in understanding how Apache Beam works. Take a look at the Beam website for more information.
Ruby on Rails
-
GitHub Incident with Issues, API Requests and Pull Requests
[0] is my favorite demonstration of it.
[0]: https://github.com/rails/rails/commit/b83965785db1eec019edf1...
-
Client side Git hooks 101
Here's a real life example: Imagine a Ruby on Rails app on which a team of developers are working. The code is hosted on GitLab and all the work is coordinated using GitLab issues. In other words: For every commit, there's an associated issue and the issue number acts as a sort of primary key for documentation, time reporting and so forth. This convention has a few advantages, most notably the ability to easily learn more about how, when and by whom features were implemented as well as how this implementation came to be.
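A convention like this can be enforced with a client-side commit-msg hook. The sketch below is a minimal, hypothetical example rather than anything taken from the original post: the "#123" issue-reference format and the names has_issue_reference and main are assumptions made for illustration. Git calls a commit-msg hook with the path of the file that holds the commit message, and a non-zero exit status aborts the commit.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: reject commits that reference no issue."""
import re
import sys
from typing import Sequence

# Assumed convention: an issue is referenced as "#" followed by digits, e.g. "#123".
ISSUE_PATTERN = re.compile(r"#\d+")


def has_issue_reference(message: str) -> bool:
    """Return True if any non-comment line of the message mentions an issue."""
    # Lines starting with "#" are Git's own comment lines and are ignored.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    return any(ISSUE_PATTERN.search(line) for line in lines)


def main(argv: Sequence[str]) -> int:
    """Entry point: argv[1] is the commit message file path supplied by Git."""
    with open(argv[1], encoding="utf-8") as handle:
        message = handle.read()
    if has_issue_reference(message):
        return 0
    print("Commit rejected: reference an issue, e.g. '#123'.", file=sys.stderr)
    return 1  # a non-zero exit status makes Git abort the commit
```

To try it, the script would be saved as .git/hooks/commit-msg, made executable, and ended with a call to sys.exit(main(sys.argv)).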
-
16 Best Ruby Frameworks For Web Development [2024]
Ruby on Rails is regarded as one of the best Ruby frameworks. It was the primary framework used to develop big projects such as Twitter, and it helped the language grow its community. Often referred to as “Rails,” Ruby on Rails is a web development framework with an MVC architecture, currently at version 6.1. The 16-year-old framework has dramatically influenced how web applications are structured and how they manage databases, web pages, and other components.
-
More control over enum in Rails 7.1
In Rails 7.1, a new option _instance_methods is introduced, allowing developers to opt-out of the automatic generation of instance methods for enums. When enum is defined with _instance_methods: false, Rails will no longer generate methods like pending?, processed?, etc.
-
Ruby on Rails load testing habits
Rails isn't super opinionated about database writes; it's mostly left up to developers to discover that for relational DBs you do not want to be doing a bunch of small writes all at once.
That said, it specifically has tools to address this, which started appearing a few years ago: https://github.com/rails/rails/pull/35077
The way my team handles it is to stick Kafka in between what's generating the records (for us, a bunch of web scraping workers) and a consumer that pulls off the Kafka queue and runs an insert when its internal buffer reaches around 50k rows.
Rails is also looking to add more direct support for background work with https://github.com/basecamp/solid_queue, but this is still very new. Most larger Rails shops are going to be running a second system and a gem called Sidekiq that pulls jobs out of Redis.
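The buffering pattern described above, where a consumer accumulates records and issues one bulk insert once its buffer fills, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the poster's implementation: the BatchBuffer class, the flush_fn callback, and the consume function are hypothetical, a plain iterable stands in for the Kafka consumer loop, and a list stands in for the database.

```python
from typing import Callable, Iterable, List


class BatchBuffer:
    """Collects rows and hands them to flush_fn in batches (hypothetical helper)."""

    def __init__(self, flush_fn: Callable[[List[dict]], None], max_rows: int = 50_000) -> None:
        self.flush_fn = flush_fn  # stands in for a single bulk INSERT
        self.max_rows = max_rows  # threshold from the post: around 50k rows
        self.rows: List[dict] = []

    def add(self, row: dict) -> None:
        """Buffer one row and flush as soon as the threshold is reached."""
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        """Write out whatever is buffered; safe to call when empty."""
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []


def consume(records: Iterable[dict], buffer: BatchBuffer) -> None:
    """Stand-in for a Kafka consumer loop: feed each record into the buffer."""
    for record in records:
        buffer.add(record)
    buffer.flush()  # do not lose a partial final batch


# Small demonstration with a threshold of 3 instead of 50k.
batch_sizes: List[int] = []
demo_buffer = BatchBuffer(lambda rows: batch_sizes.append(len(rows)), max_rows=3)
consume(({"id": i} for i in range(7)), demo_buffer)
# batch_sizes is now [3, 3, 1]: two full batches plus the final partial one.
```

The point of the design is that the database sees a few large writes instead of many small ones; the trade-off is that buffered rows are lost if the consumer dies before flushing, so a real consumer would commit Kafka offsets only after a successful flush.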
-
DHH installing Campfire (37s ONCE #1) [video]
I'm looking forward to seeing what extractions from this will land in Rails. For example: https://github.com/rails/rails/issues/50454
-
First commits in a Ruby on Rails app
Here is what strict_loading does (source):
-
Continuous Deployment with GitHub Actions and Kamal
Kamal is a wonderfully simple way to deploy your applications anywhere. It will also be included by default in Rails 8. Kamal is trivial, but I don’t recommend using it on your development machine.
-
What's Coming in Rails 8
Here's the GitHub milestone I've based this article on — https://github.com/rails/rails/milestone/87
- Rails 8 Plan
What are some alternatives?
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
Roda - Routing Tree Web Toolkit
Apache Hadoop - Apache Hadoop
Hanami - The web, with simplicity.
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
Sinatra - Classy web-development dressed in a DSL (official / canonical repo)
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
Cuba - Rum based microframework for web development.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
CodeBehind Framework - CodeBehind library is a modern backend framework. This library is a programming model based on the MVC structure, which provides the possibility of creating dynamic aspx files in .NET Core and has high serverside independence.
Apache Hive - Apache Hive
Padrino - Padrino is a full-stack ruby framework built upon Sinatra.