SaaSHub helps you find the best software and product alternatives Learn more →
Top 16 apache-arrow Open-Source Projects
-
AWS Data Wrangler
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming..
-
functime
Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
-
ustore
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Project mention: Grafana Beyla: OSS eBPF auto-instrumentation for application observability | news.ycombinator.com | 2023-09-13
Project mention: Read files from s3 using Pandas/s3fs or AWS Data Wrangler? | /r/dataengineering | 2023-12-06I had no problem with awswrangler (https://github.com/aws/aws-sdk-pandas) and it supports reading and writing partitions which was really helpful and a few other optimizations that made it a great tool
> In addition to that we built a custom columnar database
I did some digging in your blog history and it seems that is referencing https://www.polarsignals.com/blog/posts/2022/07/22/frostdb-i... and digging into the "but why?" section <https://github.com/polarsignals/frostdb#why-you-should-use-f...> seems to imply you favored the embedded feature over having something standalone, but I would enjoy hearing (or reading a blog post!) about why you felt it was a better use of your engineering to make your own columar DB versus using one of the existing columanr dbs that I have seen referenced a ton in other Show HN announcements around both logging and metrics services
there's a whole ecosystem in Python originally developed for high energy physics data processing: https://github.com/scikit-hep/awkward all because Numpy demands square N-dimensional array
Same technique used everywhere, here's a simple Julia pkg for the same thing: https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/3a6f5b...
But Julia at least has the decency to just support ragged Vector{Vector} out of the box, and it's not that slow
Project mention: Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data | news.ycombinator.com | 2024-04-22I'll let Kyle chime in but I tested it a few months ago with millions of polygons on an M2 16GB of RAM laptop and it worked very well.
There is a library by the same author called lonboard that provides the JS bits inside JupyterLab. https://github.com/developmentseed/lonboard
I think it is based on the Kepler.gl / Deck.gl data loaders that go straight to GPU from network.
Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
Project mention: Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data | news.ycombinator.com | 2024-04-22Arrow JS is just ArrayBuffers underneath. You do want to amortize some operations to avoid unnecessary conversions. I.e. Arrow JS stores strings as UTF-8, but native JS strings are UTF-16 I believe.
Arrow is especially powerful across the WASM <--> JS boundary! In fact, I wrote a library to interpret Arrow from Wasm memory into JS without any copies [0]. (Motivating blog post [1])
[0]: https://github.com/kylebarron/arrow-js-ffi
[1]: https://observablehq.com/@kylebarron/zero-copy-apache-arrow-...
apache-arrow related posts
-
Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data
-
Polar Signals Cloud Is Generally Available
-
I agree that Arrow Tables are great, but we decided to keep the library focused on the Pandas interface. [wont implement]
-
Benchmarking Pandas, CuDF, Modin, Apache Arrow and Spark on a Billion Taxi Rides dataset
-
Rust 1.63.0
-
arcticDB: embedded columnar database written in Go
-
How to adapt Arrow.Table columns (naturally per record batch basis) into CuArrays for GPU processing?
-
A note from our sponsor - SaaSHub
www.saashub.com | 10 May 2024
Index
What are some of the best open-source apache-arrow projects? This list will help you:
Project | Stars | |
---|---|---|
1 | pixie | 5,305 |
2 | AWS Data Wrangler | 3,811 |
3 | lance | 3,296 |
4 | frostdb | 1,216 |
5 | functime | 914 |
6 | awkward | 796 |
7 | ustore | 489 |
8 | geopolars | 493 |
9 | parquet-wasm | 466 |
10 | lonboard | 416 |
11 | arrow-julia | 277 |
12 | space | 136 |
13 | arrow-js-ffi | 91 |
14 | red_amber | 62 |
15 | awesome-pandas-alternatives | 29 |
16 | udsb | 8 |
Sponsored