Arc Alternatives
Similar projects and alternatives to arc
- Apache Spark: a unified analytics engine for large-scale data processing.
- Apache Arrow: a multi-language toolbox for accelerated data interchange and in-memory processing.
- anarki: a community-managed fork of the Arc dialect of Lisp; for commit privileges, submit a pull request.
- docker: the official Dockerfiles for https://github.com/orgs/tripl-ai/packages (by tripl-ai).
arc reviews and mentions
-
Show HN: Box – Data Transformation Pipelines in Rust DataFusion
A while ago I posted a link to [Arc](https://news.ycombinator.com/item?id=26573930), a declarative method for defining repeatable data pipelines that execute against [Apache Spark](https://spark.apache.org/).
Today I would like to present a proof-of-concept implementation of the [Arc declarative ETL framework](https://arc.tripl.ai) against [Apache DataFusion](https://arrow.apache.org/datafusion/), an ANSI SQL (Postgres) execution engine based upon Apache Arrow and built with Rust.
The idea of providing a declarative 'configuration' language for defining data pipelines was planned from the beginning of the Arc project, to allow changing execution engines without having to rewrite the base business logic (the part that is valuable to your business). By defining an abstraction layer, we can swap the execution engine and run the same logic with different execution characteristics.
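To make the idea concrete, here is a minimal sketch of what such a declarative pipeline looks like. It is written from memory in the style of Arc's JSON job format; the stage types (DelimitedExtract, SQLTransform, ParquetLoad) are real Arc stages, but the URIs and view names are placeholders and exact field names may differ from the current docs:

```json
{
  "stages": [
    {
      "type": "DelimitedExtract",
      "name": "read raw customer orders",
      "environments": ["production", "test"],
      "inputURI": "s3a://datalake/raw/orders.csv",
      "outputView": "orders"
    },
    {
      "type": "SQLTransform",
      "name": "aggregate order totals per customer",
      "environments": ["production", "test"],
      "inputURI": "s3a://jobs/order_totals.sql",
      "outputView": "order_totals"
    },
    {
      "type": "ParquetLoad",
      "name": "write the result to the curated zone",
      "environments": ["production", "test"],
      "inputView": "order_totals",
      "outputURI": "s3a://datalake/curated/order_totals.parquet"
    }
  ]
}
```

Because nothing in this definition is Spark-specific, the same job can in principle be handed to a different engine (such as DataFusion) that implements the same stage semantics.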
The benefit of DataFusion over Apache Spark is a significant increase in speed and a reduction in execution resource requirements. Even through a Docker-for-Mac inefficiency layer, the same job completes in ~4 seconds with DataFusion vs ~24 seconds with Apache Spark (including JVM startup time). Without the Docker-for-Mac layer, end-to-end execution times of ~0.5 seconds for the same example job (TPC-H) are possible. (The aim is not to start a benchmarking flame war but to provide some indicative data.)
The purpose of this post is to gather feedback from the community: whether you would use a tool like this, what features would be required for you to use it (an MVP), or whether you would be interested in contributing to the project. I would also like to highlight the excellent work being done by the DataFusion/Arrow (and Apache) community in providing such amazing tools to us all as open source projects.
-
Apache Arrow DataFusion 5.0.0 release
Disclosure: I am a contributor to DataFusion.
I have done a lot of work in the ETL space in Apache Spark to build Arc (https://arc.tripl.ai/) and have ported a lot of the basic functionality of Arc to DataFusion as a proof of concept. The appeal to me of the Apache Spark and DataFusion engines is the ability to a) separate compute and storage and b) express transformation logic in SQL.
Performance: in those early experiments, DataFusion would frequently finish processing an entire job _before_ the SparkContext could be started, even on a local Spark instance. Obviously this is at smaller data sizes, but in my experience a lot of ETL is about repeatable processes, not necessarily huge datasets.
Compatibility: those experiments were done a few months ago, and the SQL compatibility of the DataFusion engine has improved extremely rapidly (WINDOW functions were recently added). There is still some missing SQL functionality (for example, to run all the TPC-H queries: https://github.com/apache/arrow-datafusion/tree/master/bench...), but it is moving quickly.
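For readers unfamiliar with the term, window functions are the `OVER (...)` clause family of SQL. A small illustrative query against the TPC-H orders table (an example of the feature class, not one of the benchmark queries themselves):

```sql
-- Rank each customer's orders by price; requires window-function support.
SELECT o_orderkey,
       o_custkey,
       o_totalprice,
       RANK() OVER (PARTITION BY o_custkey ORDER BY o_totalprice DESC) AS price_rank
FROM orders;
```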
- Arc: an opinionated framework for defining data pipelines that are predictable, repeatable, and manageable.
-
Hacker News top posts: Mar 25, 2021
Show HN: Arc, an open-source Databricks alternative (30 comments)
-
Show HN: Arc, an Open Source Databricks Alternative
Yes. Most of the simple stages just invoke the Spark Scala API. For example, MLTransform invokes a pretrained Spark ML model against a DataFrame and returns a new one. You can see the standard Spark ML call: https://github.com/tripl-ai/arc/blob/master/src/main/scala/a.... You can add any plugin you want via the interface: https://arc.tripl.ai/plugins/
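As a rough sketch of what such a stage boils down to (this is illustrative Scala, not the actual Arc source; the function name and parameters are placeholders):

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.{DataFrame, SparkSession}

object MLTransformSketch {
  // Apply a pretrained Spark ML model to a registered view and
  // return the transformed DataFrame (placeholder names throughout).
  def mlTransform(spark: SparkSession, inputView: String, modelURI: String): DataFrame = {
    val df: DataFrame = spark.table(inputView) // resolve the registered input view
    val model = PipelineModel.load(modelURI)   // load the pretrained pipeline model
    model.transform(df)                        // the standard Spark ML call
  }
}
```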
This really defines a dialect that is simpler for technical business analysts to consume, safer than raw code, and paired with a notebook environment for building pipelines interactively.
Stats
tripl-ai/arc is an open source project licensed under the MIT License, which is an OSI-approved license.