apache-arrow

Top 16 apache-arrow Open-Source Projects

  • pixie

    Instant Kubernetes-Native Application Observability

  • Project mention: Grafana Beyla: OSS eBPF auto-instrumentation for application observability | news.ycombinator.com | 2023-09-13
  • AWS Data Wrangler

    pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

  • Project mention: Read files from s3 using Pandas/s3fs or AWS Data Wrangler? | /r/dataengineering | 2023-12-06

    I had no problem with awswrangler (https://github.com/aws/aws-sdk-pandas). It supports reading and writing partitions, which was really helpful, and a few other optimizations made it a great tool.

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.

  • Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25
  • frostdb

    ❄️ Coolest database around 🧊 Embeddable column database written in Go.

  • Project mention: Polar Signals Cloud Is Generally Available | news.ycombinator.com | 2023-10-10

    > In addition to that we built a custom columnar database

    I did some digging in your blog history and it seems that is referencing https://www.polarsignals.com/blog/posts/2022/07/22/frostdb-i... and digging into the "but why?" section <https://github.com/polarsignals/frostdb#why-you-should-use-f...> seems to imply you favored the embedded feature over having something standalone, but I would enjoy hearing (or reading a blog post!) about why you felt it was a better use of your engineering to make your own columnar DB versus using one of the existing columnar DBs that I have seen referenced a ton in other Show HN announcements around both logging and metrics services.

  • functime

    Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.

  • Project mention: functime: NEW Data - star count:616.0 | /r/algoprojects | 2023-11-08
  • awkward

    Manipulate JSON-like data with NumPy-like idioms.

  • Project mention: Efficient Jagged Arrays | news.ycombinator.com | 2023-07-03

    There's a whole ecosystem in Python originally developed for high-energy physics data processing, https://github.com/scikit-hep/awkward, all because NumPy demands square N-dimensional arrays.

    Same technique used everywhere, here's a simple Julia pkg for the same thing: https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/3a6f5b...

    But Julia at least has the decency to just support ragged Vector{Vector} out of the box, and it's not that slow.

  • ustore

    Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

  • geopolars

    Geospatial extensions for Polars

  • parquet-wasm

    Rust-based WebAssembly bindings to read and write Apache Parquet data

  • Project mention: FLaNK AI Weekly for 29 April 2024 | dev.to | 2024-04-29
  • lonboard

    A Python library for fast, interactive geospatial vector data visualization in Jupyter.

  • Project mention: Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data | news.ycombinator.com | 2024-04-22

    I'll let Kyle chime in but I tested it a few months ago with millions of polygons on an M2 16GB of RAM laptop and it worked very well.

    There is a library by the same author called lonboard that provides the JS bits inside JupyterLab. https://github.com/developmentseed/lonboard

    I think it is based on the Kepler.gl / Deck.gl data loaders that go straight to GPU from network.

  • arrow-julia

    Official Julia implementation of Apache Arrow

  • space

    Unified storage framework for the entire machine learning lifecycle (by google)

  • Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
  • arrow-js-ffi

    Zero-copy reading of Arrow data from WebAssembly

  • Project mention: Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data | news.ycombinator.com | 2024-04-22

    Arrow JS is just ArrayBuffers underneath. You do want to amortize some operations to avoid unnecessary conversions. I.e. Arrow JS stores strings as UTF-8, but native JS strings are UTF-16 I believe.

    Arrow is especially powerful across the WASM <--> JS boundary! In fact, I wrote a library to interpret Arrow from Wasm memory into JS without any copies [0]. (Motivating blog post [1])

    [0]: https://github.com/kylebarron/arrow-js-ffi

    [1]: https://observablehq.com/@kylebarron/zero-copy-apache-arrow-...

  • red_amber

    A dataframe library for Rubyists.

  • awesome-pandas-alternatives

    Awesome list of alternative dataframe libraries in Python.

  • udsb

    Unlimited Data-Science Benchmarks for Numeric, Tabular and Graph Workloads

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
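
In the awkward entry above, a commenter points out that NumPy requires rectangular arrays and that the "same technique [is] used everywhere" for ragged data. That technique is to keep every row's values in one flat content buffer and record where each row starts in an offsets buffer, which is also how Arrow lays out its variable-size list columns. The sketch below illustrates the idea in plain Python under that assumption. It is not the awkward, Arrow, or ArraysOfArrays.jl API, and the class name `JaggedArray` is hypothetical.

```python
from typing import List, Sequence


class JaggedArray:
    """Hypothetical ragged 2-D array: a flat `content` list plus an `offsets` list.

    Row i occupies content[offsets[i]:offsets[i + 1]], an offsets-plus-values
    layout similar to Arrow's variable-size list columns.
    """

    def __init__(self, rows: Sequence[Sequence[float]]) -> None:
        self.content: List[float] = []
        self.offsets: List[int] = [0]
        for row in rows:
            self.content.extend(row)
            # Each new offset marks where the next row will begin.
            self.offsets.append(len(self.content))

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, i: int) -> List[float]:
        # Support negative indices like a normal Python sequence.
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("row index out of range")
        return self.content[self.offsets[i]:self.offsets[i + 1]]

    def row_lengths(self) -> List[int]:
        # A row's length is the difference between consecutive offsets.
        return [b - a for a, b in zip(self.offsets, self.offsets[1:])]

    def row_sums(self) -> List[float]:
        # Reduce each row by slicing the shared flat buffer.
        return [sum(self.content[a:b]) for a, b in zip(self.offsets, self.offsets[1:])]


jagged = JaggedArray([[1.0, 2.0, 3.0], [], [4.0, 5.0]])
print(jagged.offsets)        # [0, 3, 3, 5]
print(jagged[-1])            # [4.0, 5.0]
print(jagged.row_lengths())  # [3, 0, 2]
```

Because the values stay contiguous, whole-array operations can run over `content` in one pass, and only the small `offsets` list needs to know about row boundaries.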

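The arrow-js-ffi comment above notes that Arrow stores strings as UTF-8 while native JavaScript strings are UTF-16, so conversions are worth amortizing. Arrow's string columns keep all UTF-8 bytes in one values buffer addressed by offsets. The following is a minimal plain-Python sketch that mimics that layout, assuming nothing beyond the standard library. It is not the Arrow JS or pyarrow API, and the function names are hypothetical. It shows that byte offsets, Python string lengths, and the UTF-16 code-unit counts reported by JavaScript's `String.prototype.length` can all differ for the same value.

```python
from typing import List, Sequence, Tuple


def to_arrow_style_utf8(strings: Sequence[str]) -> Tuple[bytes, List[int]]:
    """Hypothetical helper: pack strings into one UTF-8 buffer plus byte offsets."""
    data = bytearray()
    offsets = [0]
    for s in strings:
        data.extend(s.encode("utf-8"))
        offsets.append(len(data))
    return bytes(data), offsets


def decode_at(data: bytes, offsets: List[int], i: int) -> str:
    # Getting a native string back means decoding the UTF-8 slice. A JS runtime
    # would also have to produce UTF-16, so doing this once per value (instead of
    # on every access) is where amortizing the conversion pays off.
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


def utf16_code_units(s: str) -> int:
    # Mirrors what JavaScript's String.prototype.length counts.
    return len(s.encode("utf-16-le")) // 2


values = ["abc", "\u00e9", "\U0001F600"]  # "abc", "é", an emoji outside the BMP
data, offsets = to_arrow_style_utf8(values)
print(offsets)                                 # [0, 3, 5, 9]
print([len(s) for s in values])                # [3, 1, 1]
print([utf16_code_units(s) for s in values])   # [3, 1, 2]
```

The three length measures disagree as soon as non-ASCII text appears, which is why code crossing the Wasm/JS boundary should index by byte offsets and decode lazily rather than assume one character per byte.
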
apache-arrow related posts

  • Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data

    5 projects | news.ycombinator.com | 22 Apr 2024
  • Polar Signals Cloud Is Generally Available

    1 project | news.ycombinator.com | 10 Oct 2023
  • I agree that Arrow Tables are great, but we decided to keep the library focused on the Pandas interface. [wont implement]

    1 project | /r/programmingcirclejerk | 21 Sep 2022
  • Benchmarking Pandas, CuDF, Modin, Apache Arrow and Spark on a Billion Taxi Rides dataset

    2 projects | /r/Python | 21 Sep 2022
  • Rust 1.63.0

    14 projects | news.ycombinator.com | 11 Aug 2022
  • arcticDB: embedded columnar database written in Go

    2 projects | /r/golang | 4 May 2022
  • How to adapt Arrow.Table columns (naturally per record batch basis) into CuArrays for GPU processing?

    1 project | /r/Julia | 2 Mar 2022

Index

What are some of the best open-source apache-arrow projects? This list will help you:

Rank  Project                      Stars
1     pixie                        5,305
2     AWS Data Wrangler            3,811
3     lance                        3,296
4     frostdb                      1,216
5     functime                     914
6     awkward                      796
7     ustore                       489
8     geopolars                    493
9     parquet-wasm                 466
10    lonboard                     416
11    arrow-julia                  277
12    space                        136
13    arrow-js-ffi                 91
14    red_amber                    62
15    awesome-pandas-alternatives  29
16    udsb                         8
