ballista VS roapi

Compare ballista vs roapi and see what their differences are.

ballista

Distributed compute platform implemented in Rust, and powered by Apache Arrow. (by ballista-compute)
                 ballista              roapi
Mentions         20                    24
Stars            2,238                 3,080
Growth           -                     1.7%
Activity         9.3                   6.9
Latest commit    about 3 years ago     about 1 month ago
Language         Rust                  Rust
License          Apache License 2.0    Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
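The page does not publish its exact formulas. As a hypothetical sketch only, the metrics described above might be computed roughly like this: growth as the month-over-month percentage change in stars, and activity as a recency-weighted commit score mapped onto a 0-10 scale by percentile rank, so that a 9.0 lands a project in about the top 10% of those tracked. The exponential decay and the 30-day half-life are assumptions made for illustration, not the site's actual weighting.

```python
def mom_growth_pct(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month growth in stars, as a percentage."""
    if stars_month_ago <= 0:
        raise ValueError("need a positive star count for the previous month")
    return (stars_now - stars_month_ago) / stars_month_ago * 100.0


def recency_weighted_score(commit_ages_days, half_life_days: float = 30.0) -> float:
    """Hypothetical activity score: each commit's weight halves every half_life_days,
    so recent commits count more than older ones."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_rating(score: float, all_scores) -> float:
    """Hypothetical 0-10 rating: the share of tracked projects whose score is
    at or below this one, scaled by 10 (9.0 ~ top 10%)."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return round(10.0 * at_or_below / len(all_scores), 1)


if __name__ == "__main__":
    print(mom_growth_pct(150, 100))  # 50.0
    print(recency_weighted_score([0, 30, 90]))
    print(activity_rating(9, list(range(1, 11))))
```

Percentile ranking makes the rating relative: a project's activity number can drop without any change in its own commit history if other tracked projects become more active.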

ballista

Posts with mentions or reviews of ballista. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-04-16.
  • Ballista: Distributed compute platform implemented in Rust using Apache Arrow.
    1 project | /r/compsci | 11 Jun 2022
  • Open source contributions for a Data Engineer?
    17 projects | /r/dataengineering | 16 Apr 2021
    His newer project, Ballista, was also donated to Apache Arrow. I hope to get the Rust skills to collaborate with him on open source work someday too. He's also doing really cool work on spark-rapids FYI.
  • Best format to use for DataFrames in Rust and Python?
    3 projects | /r/rust | 16 Mar 2021
    https://github.com/ballista-compute/ballista/blob/main/rust/executor/src/flight_service.rs#L193-L228
  • I wrote one of the fastest DataFrame libraries
    6 projects | news.ycombinator.com | 13 Mar 2021
    I'm guessing Polars and Ballista (https://github.com/ballista-compute/ballista) have different goals, but I don't know enough about either to say what those might be. Does anyone know enough about either to explain the differences?
  • Introducing Kamu - World's first global collaborative data pipeline
    3 projects | /r/rust | 12 Mar 2021
    In your article you mention looking for a faster data engine, have you looked at Ballista https://github.com/ballista-compute/ballista? It’s pretty young but it uses the Apache Arrow memory model and the maintainer did a bunch of work on Apache Spark I believe.
  • Rust for DE?
    6 projects | /r/dataengineering | 11 Mar 2021
    https://github.com/ballista-compute/ballista is also a cool project worth checking out.
  • Julia: A Post-Mortem
    4 projects | news.ycombinator.com | 8 Mar 2021
    It’s mostly a personal favourite, but once Ballista [1] gets a bit more developed, I expect we’ll tear out our Java/Spark pipelines and replace them with that.

    The ML ecosystem in Rust is a bit underdeveloped at the moment, but work is ticking along on packages like Linfa and SmartCore, so maybe it’ll get there? In my field I’m mostly about its potential for correct, high-performance data pipelines that are straightforward to write in reasonable time, and hopefully a model-serving framework: I hate that so many of the current tools require annotating and shipping Python when model-serving shouldn’t really need any Python code.

    [1] https://github.com/ballista-compute/ballista

  • Ballista 0.4.0
    1 project | /r/rust | 20 Feb 2021
  • Why isn't differential dataflow more popular?
    13 projects | news.ycombinator.com | 22 Jan 2021
    I've looked at this and thought it looked amazing, but also haven't used it for anything. Some thoughts...

    Rust is a blessing and a curse. It seems like the obvious choice for data pipelines, but everything big currently exists in Java and the small stuff is in JavaScript, Python or R. Maybe this will slowly change, but it's a big ship to turn. I'm hopeful that tools like this and Ballista [1] will eventually get things moving.

    Since the Rust community is relatively small, language bindings would be very helpful. Being able to configure pipelines from Java or Typescript(!) would be great.

    Or maybe it's just that this form of computation is too foreign. By the time you need it, the project is so large that it's too late to redesign it to use it. I'm also unclear on how it would handle changing requirements and recomputing new aggregations over old data. Better docs with more convincing examples would be helpful here. The GitHub page showing counting isn't very compelling.

    [1] https://github.com/ballista-compute/ballista

  • ballista-compute/ballista proof-of-concept distributed compute platform primarily implemented in Rust, using Apache Arrow as the memory model.
    1 project | /r/rust | 20 Jan 2021
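The differential dataflow thread above finds the project's counting example unconvincing and wonders how it would handle recomputing aggregations. The core idea can be shown with a toy: a collection is a stream of (record, diff) changes, and a count operator turns input changes into output changes (retract the old count, assert the new one) instead of recounting everything. This is a hypothetical Python illustration of that model only; it is not the API of the differential-dataflow Rust crate, which also tracks timestamps on every update and runs on timely dataflow.

```python
class IncrementalCount:
    """Toy count-by-key operator over (key, diff) updates.

    diff is +1 for an insertion and -1 for a deletion. Output is a list of
    ((key, count), diff) changes: the old count is retracted with -1 and the
    new count asserted with +1, so downstream consumers also see only changes.
    """

    def __init__(self):
        self.counts = {}  # current count per key; keys with count 0 are dropped

    def update(self, changes):
        # Net the diffs per key first, so one batch touching a key several
        # times yields at most one retraction/assertion pair for that key.
        net = {}
        for key, diff in changes:
            net[key] = net.get(key, 0) + diff

        output = []
        for key, delta in net.items():
            if delta == 0:
                continue  # changes cancelled out: nothing to emit
            old = self.counts.get(key, 0)
            new = old + delta
            if old != 0:
                output.append(((key, old), -1))
            if new != 0:
                output.append(((key, new), +1))
                self.counts[key] = new
            else:
                self.counts.pop(key, None)
        return output


if __name__ == "__main__":
    op = IncrementalCount()
    print(op.update([("a", 1), ("b", 1), ("a", 1)]))
    print(op.update([("a", -1)]))  # only "a" is touched; "b" is not recounted
```

The work per batch is proportional to the keys that changed, not to the size of the whole collection, which is the property that makes recomputation over old data cheap in this model.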

roapi

Posts with mentions or reviews of roapi. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-25.
  • Full-fledged APIs for slowly moving datasets without writing code
    1 project | news.ycombinator.com | 25 Oct 2023
  • Tuql: Automatically create a GraphQL server from a SQLite database
    6 projects | news.ycombinator.com | 25 Apr 2023
    If your use case is read-only I suggest taking a look at roapi[1]. It supports multiple read frontends (GraphQL, SQL, REST) and many backends like SQLite, JSON, google sheets, MySQL, etc.

    [1] https://github.com/roapi/roapi

  • Who is using AXUM in production?
    18 projects | /r/rust | 21 Apr 2023
  • Ask HN: Best way to provide access to large data sets
    2 projects | news.ycombinator.com | 11 Apr 2023
    Smaller datasets are anywhere up to a few MB, which is reasonable with an API, but in theory historic data could be up to several GB. I've not seen datasette go that high (IIRC it's a 1000 row return limit by default).

    That's what got me intrigued with Atlassian's offering, as data lakes tend to be something internal to a company, not something I've ever seen offered as an interaction point to users.

    I've also tested out roapi [1] which is nice if the data is in some structured format already (Parquet/JSON)

    [1] https://github.com/roapi/roapi

  • "thread 'main' panicked at 'no CA certificates found'", when running application in docker container
    3 projects | /r/rust | 4 Apr 2023
    https://github.com/roapi/roapi/issues/103?
  • Roapi 0.9 release adds support for all cloud storage providers
    1 project | news.ycombinator.com | 29 Jan 2023
  • SQLite-based databases on the Postgres protocol? Yes we can
    11 projects | news.ycombinator.com | 25 Jan 2023
    Very cool and well executed project. Love the sprinkle of Rust in all the other companion projects as well :)

    The ROAPI (https://github.com/roapi/roapi) project I built also happened to support a similar feature set, i.e. to expose SQLite through a variety of remote query interfaces, including the Postgres wire protocol, REST APIs and GraphQL.

  • Using Rust to write a Data Pipeline. Thoughts. Musings.
    5 projects | /r/rust | 14 Jan 2023
  • PostgREST – Serve a RESTful API from Any Postgres Database
    22 projects | news.ycombinator.com | 29 Dec 2022
    > why not just accept SQL and cut out all the unnecessary mapping?

    You might be interested in what we're building: Seafowl, a database designed for running analytical SQL queries straight from the user's browser, with HTTP CDN-friendly caching [0]. It's a second iteration of the Splitgraph DDN [1] which we built on top of PostgreSQL (Seafowl is much faster for this use case, since it's based on Apache DataFusion + Parquet).

    The tradeoff for allowing the client to run any SQL vs a limited API is that PostgREST-style queries have a fairly predictable and low overhead, but aren't as powerful as fully-fledged SQL with aggregations, joins, window functions and CTEs, which have their uses in interactive dashboards to reduce the amount of data that has to be processed on the client.

    There's also ROAPI [2] which is a read-only SQL API that you can deploy in front of a database / other data source (though in case of using databases as a data source, it's only for tables that fit in memory).

    [0] https://seafowl.io/

    [1] https://www.splitgraph.com/connect

    [2] https://github.com/roapi/roapi

  • Command-line data analytics made easy
    6 projects | news.ycombinator.com | 3 Nov 2022
    It could be the NDJSON parser (DF source: [0]) or could be a variety of other factors. Looking at the ROAPI release archive [1], it doesn't ship with the definitive `columnq` binary from your comment, so it could also have something to do with compilation-time flags.

    FWIW, we use the Parquet format with DataFusion and get very good speeds similar to DuckDB [2], e.g. 1.5s to run a more complex aggregation query `SELECT date_trunc('month', tpep_pickup_datetime) AS month, COUNT(*) AS total_trips, SUM(total_amount) FROM tripdata GROUP BY 1 ORDER BY 1 ASC` on a 55M-row subset of NY Taxi trip data.

    [0]: https://github.com/apache/arrow-datafusion/blob/master/dataf...

    [1]: https://github.com/roapi/roapi/releases/tag/roapi-v0.8.0

    [2]: https://observablehq.com/@seafowl/benchmarks
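The Tuql thread describes roapi as serving one dataset through several read frontends (SQL, REST, GraphQL). As a rough sketch, the Python below only builds HTTP requests with the standard library and sends nothing. It assumes a roapi server at http://localhost:8080, a SQL endpoint at /api/sql that takes the raw query as the POST body, and a REST endpoint at /api/tables/<table> that accepts columns and limit query parameters; these paths are assumptions based on the roapi README and should be checked against the version you deploy. The table name tripdata is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: roapi listening locally on its HTTP port; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def sql_request(query: str) -> urllib.request.Request:
    """Build a POST request carrying raw SQL (assumed endpoint: /api/sql)."""
    return urllib.request.Request(
        f"{BASE_URL}/api/sql",
        data=query.encode("utf-8"),
        method="POST",
    )


def rest_request(table: str, columns: list[str], limit: int) -> urllib.request.Request:
    """Build a GET request for the REST frontend (assumed endpoint: /api/tables/<table>)."""
    params = urllib.parse.urlencode({"columns": ",".join(columns), "limit": limit})
    return urllib.request.Request(f"{BASE_URL}/api/tables/{table}?{params}", method="GET")


if __name__ == "__main__":
    req = sql_request("SELECT COUNT(*) FROM tripdata")  # "tripdata" is hypothetical
    print(req.get_method(), req.full_url)
    # Only uncomment if a roapi server is actually running:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect or test the exact URL and body before pointing the client at a live server.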
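The monthly aggregation quoted in the DataFusion/NY Taxi comment, and the earlier point that server-side aggregation cuts down the data a client must process, can be tried at toy scale with Python's built-in sqlite3. This is a minimal sketch, not the benchmark. The five rows are made up, the table keeps only the two columns the query needs, and because SQLite has no date_trunc, strftime('%Y-%m', ...) stands in for it to group by calendar month.

```python
import sqlite3

# Made-up sample rows: (tpep_pickup_datetime, total_amount).
ROWS = [
    ("2021-01-03 08:15:00", 12.5),
    ("2021-01-17 22:40:00", 30.0),
    ("2021-02-01 07:05:00", 8.0),
    ("2021-02-14 19:30:00", 22.5),
    ("2021-02-28 23:59:00", 15.0),
]


def monthly_totals(conn: sqlite3.Connection):
    # strftime('%Y-%m', ...) replaces date_trunc('month', ...), which SQLite lacks.
    return conn.execute(
        """
        SELECT strftime('%Y-%m', tpep_pickup_datetime) AS month,
               COUNT(*) AS total_trips,
               SUM(total_amount) AS total_amount
        FROM tripdata
        GROUP BY 1
        ORDER BY 1 ASC
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tripdata (tpep_pickup_datetime TEXT, total_amount REAL)")
conn.executemany("INSERT INTO tripdata VALUES (?, ?)", ROWS)
result = monthly_totals(conn)

# The client receives one row per month instead of one row per trip.
print(len(ROWS), "raw rows ->", len(result), "aggregated rows")
print(result)
```

On the real 55M-row subset the same shape of query would still return one row per month, which is the reduction in client-side work the Seafowl comment attributes to full SQL over a limited PostgREST-style API.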

What are some alternatives?

When comparing ballista and roapi you can also consider the following projects:

spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs

php-parquet - PHP implementation for reading and writing Apache Parquet files/streams. NOTICE: Please migrate to https://github.com/codename-hub/php-parquet.

differential-dataflow - An implementation of differential dataflow using timely dataflow on Rust.

qframe - Immutable data frame for Go

delta-rs - A native Rust library for Delta Lake, with bindings into Python

materialize - The data warehouse for operational workloads.

dagster - An orchestration platform for the development, production, and observation of data assets.

Prefect - The easiest way to build, run, and monitor data pipelines at scale.

fluvio - Lean and mean distributed stream processing system written in rust and web assembly.

airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

datasette - An open source multi-tool for exploring and publishing data