elasticsearch_elixir_bulk_processor VS broadway

Compare elasticsearch_elixir_bulk_processor vs broadway and see how they differ.

elasticsearch_elixir_bulk_processor

Elasticsearch Elixir Bulk Processor is a configurable manager for efficiently inserting data into Elasticsearch. This processor uses GenStages (data-exchange steps) for handling backpressure, and various settings to control the bulk payloads being uploaded to Elasticsearch. (by sashman)
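
The idea of "settings to control the bulk payloads" can be illustrated with a minimal, language-neutral sketch. It is written in Python rather than Elixir and is not the library's API: the function name `chunk_bulk_payloads` and the parameters `max_events` and `max_bytes` are hypothetical, and the sketch serializes documents only, leaving out the action metadata lines that a real Elasticsearch bulk request needs.

```python
import json
from typing import Dict, Iterable, Iterator, List


def chunk_bulk_payloads(
    docs: Iterable[Dict],
    max_events: int = 500,        # hypothetical setting: max documents per batch
    max_bytes: int = 1_000_000,   # hypothetical setting: max serialized bytes per batch
) -> Iterator[List[str]]:
    """Group serialized documents into batches bounded by count and size."""
    batch: List[str] = []
    batch_bytes = 0
    for doc in docs:
        line = json.dumps(doc)
        size = len(line.encode("utf-8")) + 1  # +1 for the newline separator
        # Flush the current batch if adding this document would exceed a limit.
        if batch and (len(batch) >= max_events or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(line)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch could then be joined with newlines and sent as one upload; a document larger than `max_bytes` still goes out alone rather than being dropped.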

broadway

Concurrent and multi-stage data ingestion and data processing with Elixir (by dashbitco)

                elasticsearch_elixir_bulk_processor   broadway
Mentions        0                                     11
Stars           12                                    2,275
Growth          -                                     1.5%
Activity        0.0                                   6.0
Last commit     almost 3 years ago                    23 days ago
Language        Elixir                                Elixir
License         Apache License 2.0                    Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

elasticsearch_elixir_bulk_processor

Posts with mentions or reviews of elasticsearch_elixir_bulk_processor. We have used some of these posts to build our list of alternatives and similar projects.

We haven't tracked posts mentioning elasticsearch_elixir_bulk_processor yet.
Tracking mentions began in Dec 2020.

broadway

Posts with mentions or reviews of broadway. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-09.
  • Switching to Elixir
    11 projects | news.ycombinator.com | 9 Nov 2023
    You can actually have "background jobs" in very different ways in Elixir.

    > I want background work to live on different compute capacity than http requests, both because they have very different resource usage

    In Elixir, because of the way the BEAM works (the unit of parallelism is much cheaper and consumes little memory), "incoming http requests" and related "workers" are a lot less expensive than in other stacks (for instance Ruby and Python), where it is quite critical to release "http workers" and not hold the connection (which is what led to the creation of background job tools like Resque, DelayedJob, Sidekiq, Celery...).

    This means that you can actually hold incoming HTTP connections a lot longer without trouble.

    A consequence of this is that implementing "reverse proxies", or anything calling third party servers _right in the middle_ of your own HTTP call, is usually perfectly acceptable (something I've done more than a couple of times, the latest one powering the reverse proxy behind https://transport.data.gouv.fr - code available at https://github.com/etalab/transport-site/tree/master/apps/un...).

    As a consequence, what would be a bad pattern in Python or Ruby (holding the incoming HTTP connection) is not a problem with Elixir.

    > because I want to have state or queues in front of background work so there's a well-defined process for retry, error handling, and back-pressure.

    Unless you deal with immediate stuff like reverse proxying or cheap "one off async tasks" (like recording a metric), there are also solutions for more "stateful" background work in Elixir.

    A popular background job queue is https://github.com/sorentwo/oban (roughly similar to Sidekiq et al.), which uses Postgres.

    It handles retries, errors etc.

    But it's not the only solution, as you have other tools dedicated to processing, such as Broadway (https://github.com/dashbitco/broadway), which handles back-pressure, fault-tolerance, batching etc natively.

    You also have simpler options, such as flow (https://github.com/dashbitco/flow), gen_stage (https://github.com/elixir-lang/gen_stage), Task.async_stream (https://hexdocs.pm/elixir/1.12/Task.html#async_stream/5) etc.

    This makes it quite easy to use the "right tool for the job".

    It is also interesting to note there is no need to "go evented" if you need to fetch data from multiple HTTP servers: it can happen in the exact same process (even: in a background task attached to your HTTP server), as done here https://transport.data.gouv.fr/explore (if you zoom you will see vehicles moving in realtime, and ~80 data sources are being polled every 10 seconds & broadcast to the visitors via pubsub & websockets).

  • My Love Letter to Rails (and Ruby) – Or, Why RoR Isn't Dead Yet
    6 projects | news.ycombinator.com | 24 Oct 2023
    While in general you are right, I'd strongly say "it depends".

    The raw BEAM ecosystem (things that come ootb) is huge in itself, many things that would require additional machinery/libs/infra/... in other tech stacks are simply covered right away with stuff like OTP.

    The machine-learning ecosystem is kinda thriving, not fully at Python levels yet but catching up rapidly and already outshining most other tech stacks - especially factoring in again the BEAM underpinnings that allow stuff like https://elixir-broadway.org/ for data pipelines (which requires a lot of additional Python machinery to even replicate), and I'd argue that e.g. Livebook already is a much better story than Jupyter notebooks.

    The web framework story is already excellent, as you mentioned with Phoenix/Ecto/LiveView/Oban/... which are kinda best-of-breed in the industry right now. Not only for the first few days into a project (lots of tech stacks are compelling here for one reason or another), but the scaling up capabilities are flat out amazing, you can get _so much_ mileage out of the stack before even looking into anything like k8s or whatever and can focus on iterating on features instead of spending time on optimizations/infra/... even when traffic peaks occur.

    What may be missing are some adjacent libs or QoL in many smaller places. But it has been getting better for a while... we now have a great storybook reimplementation that doesn't suck ass with nodejs ecosystem craziness/provisioning/slowness like the real storybook. We have usable solutions for i18n or auth that may not be fancy but do the job. And since it's Elixir, many missing things are just a few macros away if you need them. Especially the last 2 years have been quite a ride, and I have been a Phoenix user since probably 1.2, years back, but recently things are stepping up.

    Right now I am eagerly waiting for BeaconCMS to mature enough, that's an absolute pain point to get solved since nearly every web platform at some point needs some CMS style free-style pages and right now I always have to implement an integration with some external system... can't wait to have this as a lib mounted in my app like everything else. Oh, and types of course, one of the few points that keep people from trying Elixir and that I (coming from Rust, Go, Typescript) learned to love. But it looks like we're getting there.

  • Unpacking Elixir: Concurrency
    9 projects | news.ycombinator.com | 25 Aug 2023
    > In other words, there is a subset of distributed problems that Distributed Erlang solves very well out of the box: homogeneous systems working on ephemeral data. And some of the scenarios above are very common.

    Speaking of which, I'm looking forward to using Broadway [1] in a new project here in my company. Here, people are using an enterprise integration engine specialized in the healthcare space [2], with built-in single-branch version control and all actions going through the UI.

    As I come from a background of several years with Ruby on Rails, I really hope to convince people to use this great library your company created, since RoR is severely lacking when handling heavy concurrency like when gluing multiple APIs in complex workflows. Software engineers are going to love it, but integration analysts are used to IDEs with GUIs, so we'll need to create a pretty admin dashboard to convince them to switch.

    [1] https://elixir-broadway.org/

  • Event Based System with Localstack (Elixir Edition): Notifing to SQS when a file its uploaded
    2 projects | dev.to | 23 Aug 2023
    To listen to a message broker, the most used library is Broadway; it helps create GenServers that listen to a specific queue and process messages one by one (or in chunks).
  • Do I need to use Elixir from Go perspective?
    5 projects | /r/elixir | 9 Jan 2023
    Outside of that, Elixir can be used for data pipelines, audio-video processing, and it is making inroads on Machine Learning with projects like Livebook, Nx, and Bumblebee.
  • How we automated project evaluation with GitHub Actions and Elixir's Broadway.
    2 projects | dev.to | 8 Sep 2022
  • Controlling Elixir supervisors at runtime with feature flags
    4 projects | dev.to | 22 Jun 2022
    Like many applications, our infrastructure relies on queues to decouple various components. In our system we use AWS Kinesis as a data stream, consumed by Broadway consumers for some critical parts of our infrastructure. We have found that sometimes our Broadway consumers for AWS Kinesis fail in ways that do not gracefully recover when they crash. For example, each Kinesis shard has its own supervision tree managed by the Kinesis Broadway consumer. We found that if a shard consumer experienced a crash-inducing error, the shard would not restart and the crash would not cascade up to the Broadway producer. While we have worked on contributing to this consumer library, we decided that it would be important to have runtime control over stopping and starting consumers to respond to such failures just in case.
  • A Guide to Event-Driven Architecture in Elixir
    2 projects | dev.to | 17 May 2022
    If you are looking for an even higher-level abstraction, Broadway is a good starting point. It is built on top of GenStage and offers several additional features, including consuming data from external queues like Amazon SQS, Apache Kafka, and RabbitMQ.
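
The demand-driven back-pressure that several of the posts above attribute to GenStage and Broadway can be sketched in a few lines. This is a simplified, single-threaded illustration in Python, not Broadway's or GenStage's API: the method name `handle_demand` is borrowed from the GenStage callback of the same name, while `Producer`, `run_pipeline`, and the `max_demand` argument as used here are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


class Producer:
    """Emits events only when a consumer asks for them."""

    def __init__(self, source: Iterable[int]) -> None:
        self._events: Iterator[int] = iter(source)

    def handle_demand(self, demand: int) -> List[int]:
        # Return at most `demand` events; never push more than was requested.
        events: List[int] = []
        for _ in range(demand):
            try:
                events.append(next(self._events))
            except StopIteration:
                break
        return events


def run_pipeline(
    source: Iterable[int],
    max_demand: int,
    handle_batch: Callable[[List[int]], None],
) -> int:
    """Pull events in chunks of at most `max_demand` and return the total processed."""
    producer = Producer(source)
    processed = 0
    while True:
        events = producer.handle_demand(max_demand)
        if not events:
            break  # the source is exhausted
        handle_batch(events)  # the next request is only sent after this returns
        processed += len(events)
    return processed
```

Because the consumer requests more work only after finishing the previous chunk, a slow `handle_batch` slows the producer down instead of letting unprocessed events pile up.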

What are some alternatives?

When comparing elasticsearch_elixir_bulk_processor and broadway you can also consider the following projects:

oban - 💎 Robust job processing in Elixir, backed by modern PostgreSQL and SQLite3

kafka_ex - Kafka client library for Elixir

exq - Job processing library for Elixir - compatible with Resque / Sidekiq

kaffe - An opinionated Elixir wrapper around brod, the Erlang Kafka client, that supports encrypted connections to Heroku Kafka out of the box.

conduit - A message queue framework, with support for middleware and multiple adapters.

amqp - Idiomatic Elixir client for RabbitMQ

flume - A blazing fast job processing system backed by GenStage & Redis.

Rihanna - Rihanna is a high performance postgres-backed job queue for Elixir

honeydew - Job Queue for Elixir. Clustered or Local. Straight BEAM. Optional Ecto. 💪🍈

verk - A job processing system that just verks! 🧛‍

gen_rmq - Elixir AMQP consumer and publisher behaviours

hulaaki - DEPRECATED: An Elixir library (driver) for clients communicating with MQTT brokers (via the MQTT 3.1.1 protocol).