Arrow

Open-source projects categorized as Arrow

Top 23 Arrow Open-Source Projects

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • Project mention: Why Python's Integer Division Floors (2010) | news.ycombinator.com | 2024-02-28

    This is because 0.1 is in actuality the floating point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.

    I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
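The difference described in this comment can be reproduced in plain Python. The following is a minimal sketch using only the standard library; the variable names are illustrative, and it mirrors the divide-then-floor idea from the comment rather than Polars' actual implementation.

```python
import math
from decimal import Decimal

# Decimal(float) shows the exact binary value stored for the literal 0.1,
# which is slightly larger than one tenth.
exact_tenth = Decimal(0.1)

# Python's float floor division behaves here as if it floored the exact
# quotient, which is just below 10.
python_floor_div = 1 // 0.1

# Dividing first rounds the quotient to the nearest double (10.0);
# flooring afterwards keeps 10.
divide_then_floor = math.floor(1 / 0.1)

print(exact_tenth)
print(python_floor_div)
print(divide_then_floor)
```

The two approaches only disagree when the true quotient sits just below an integer and rounds up to it, as it does for 1 and 0.1.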

  • Apache Arrow

    Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

  • Project mention: How moving from Pandas to Polars made me write better code without writing better code | dev.to | 2024-03-05

    In comes Polars: a brand new dataframe library or, as its author Ritchie Vink describes it, a query engine with a dataframe frontend. Polars is built on top of the Arrow memory format and is written in Rust, a modern, performant, and memory-safe systems programming language similar to C/C++.

  • arrow

    🏹 Better dates & times for Python (by arrow-py)

  • cudf

    cuDF - GPU DataFrame Library

  • Project mention: A Polars exploration into Kedro | dev.to | 2023-05-17

    The interesting thing about Polars is that it does not try to be a drop-in replacement for pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.

  • Kategory

    Λrrow - Functional companion to Kotlin's Standard Library (by arrow-kt)

  • Project mention: Java 21 makes me like Java again | news.ycombinator.com | 2023-09-16

    Yeah, it has nice functional capabilities and libraries (like Arrow[0]).

    [0]: https://arrow-kt.io

  • arrow-datafusion

    Apache DataFusion SQL Query Engine

  • Project mention: Velox: Meta's Unified Execution Engine [pdf] | news.ycombinator.com | 2024-03-25

    Python's Substrait seems like the biggest/most-used competitor-ish out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition, and more wants to be a language for talking about execution rather than a full on execution engine. https://github.com/substrait-io/substrait

    We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441

  • roapi

    Create full-fledged APIs for slowly moving datasets without writing a single line of code.

  • Project mention: Full-fledged APIs for slowly moving datasets without writing code | news.ycombinator.com | 2023-10-25
  • LakeSoul

    LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

  • arrow-ballista

    Apache Arrow Ballista Distributed Query Engine

  • Project mention: Polars | news.ycombinator.com | 2024-01-08

    Not super on topic because this is all immature and not integrated with one another yet, but there is a scaled-out rust data-frames-on-arrow implementation called ballista that could maybe? form the backend of a polars scale out approach: https://github.com/apache/arrow-ballista

  • react-archer

    🏹 Draw arrows between React elements 🖋

  • vscode-data-preview

    Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

  • ustore

    Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

  • r-polars

    Bring polars to R

  • Project mention: Polars R Package | news.ycombinator.com | 2024-02-08
  • Arrow 🏹

    🏹 Parse JSON with style (by freshOS)

  • arrow-datafusion-comet

    Apache Arrow DataFusion Comet Spark Accelerator

  • Project mention: Apache Arrow DataFusion Comet Spark Accelerator | news.ycombinator.com | 2024-03-07
  • duckdb-rs

    Ergonomic bindings to duckdb for Rust

  • puffin

    Serverless HTAP cloud data platform powered by Arrow × DuckDB × Iceberg (by sutoiku)

  • Project mention: Throwing lots of data at DuckDB and Athena | news.ycombinator.com | 2023-04-23

    [3] https://github.com/sutoiku/puffin

    One possible thing to look into would be whether this dataset is partitioned too much. My understanding is that the recommended file size for individual parquet files is 512MB to 1GB, whereas here they are 50MB. It would be interesting to see the impact of the partitioning strategy on these benchmarks.

    [4] https://parquet.apache.org/docs/file-format/configurations/

  • pqrs

    Command line tool for inspecting Parquet files

  • parquet-wasm

    Rust-based WebAssembly bindings to read and write Apache Parquet data

  • Project mention: Goodbye, Node.js Buffer | news.ycombinator.com | 2023-10-24

    nodejs-polars is node-specific and uses native FFI. polars can be compiled to Wasm but doesn't yet have a js API out of the box.

    As for the fastest way to serialize Pandas data to the browser, you should use Parquet; it's the fastest to write on the Python side and read on the JS side, while also being compressed. See https://github.com/kylebarron/parquet-wasm (full disclosure, I wrote this)

  • spark-clickhouse-connector

    Spark ClickHouse Connector built on DataSourceV2 API

  • s2protocol-rs

    Starcraft 2 Protocol Replay Reader

  • Project mention: New version of s2protocol-rs SC2Replay parsing crate | /r/starcraft2 | 2023-10-06
  • ordered-arrowverse

    A listing of all shows in the Arrowverse in watch order to ensure continuity and sensible ordering for crossover episodes

  • vinum

    Vinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-25.

Index

What are some of the best open-source Arrow projects? This list will help you:

Project Stars
1 polars 25,837
2 Apache Arrow 13,442
3 arrow 8,546
4 cudf 7,257
5 Kategory 5,954
6 arrow-datafusion 4,924
7 roapi 3,069
8 LakeSoul 2,294
9 arrow-ballista 1,259
10 react-archer 1,063
11 vscode-data-preview 522
12 ustore 485
13 r-polars 385
14 Arrow 🏹 384
15 arrow-datafusion-comet 365
16 duckdb-rs 357
17 puffin 277
18 pqrs 245
19 parquet-wasm 223
20 spark-clickhouse-connector 167
21 s2protocol-rs 102
22 ordered-arrowverse 96
23 vinum 65