|6 days ago||2 months ago|
|Apache License 2.0||-|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
How to use Spark and Pandas to prepare big data
3 projects | dev.to | 10 May 2022
Pandas user-defined functions (UDFs) are built on top of Apache Arrow. They improve performance by letting developers scale their workloads and use pandas APIs in Apache Spark: the function body works with pandas APIs, while Apache Arrow handles the data exchange.
Spice.ai v0.6-alpha is now available!
1 project | reddit.com/r/spiceai | 21 Apr 2022
Building upon the Apache Arrow support in v0.6-alpha, Spice.ai now includes new Apache Arrow data processor and Apache Arrow Flight data connector components! Together, these create a high-performance bulk-data transport directly into the Spice.ai ML engine. Coupled with big data systems from the Apache Arrow ecosystem like Hive, Drill, Spark, Snowflake, and BigQuery, it's now easier than ever to combine big data with Spice.ai.
Arrowdantic 0.1.0 released
3 projects | reddit.com/r/Python | 16 Apr 2022
Arrowdantic is a small Python library backed by a mature Rust implementation of Apache Arrow that can interoperate with Parquet, Apache Arrow, and ODBC (databases).
Introducing Spice.xyz - Data and AI infrastructure for web3
1 project | reddit.com/r/ethfinance | 14 Apr 2022
🔥 Some cool things for eth/finance. We have per-block pool reserve data for Uniswap and Sushiswap and a Python SDK which lets you get data into Pandas or NumPy in 4 lines of code, so you can use all the Python ecosystem of finance libraries you are used to. It uses Apache Arrow as the transport, so it is much faster than JSON. Here's an example Kaggle notebook: https://www.kaggle.com/code/spiceluke/spice-xyz-ethereum-blocks
C++ Jobs - Q2 2022
4 projects | reddit.com/r/cpp | 3 Apr 2022
Technologies: Apache Arrow, Flatbuffers, C++ Actor Framework, Linux, Docker, Kubernetes, Wireguard
What are the differences between feather and parquet?
1 project | reddit.com/r/codehunter | 1 Apr 2022
Both are columnar (disk-)storage formats for use in data analysis systems. Both are integrated within Apache Arrow (pyarrow package for Python) and are designed to correspond with Arrow as a columnar in-memory analytics layer.
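As a rough illustration of what "columnar" means in the answer above, the sketch below pivots a few row-oriented records into one contiguous typed buffer per column. It uses only Python's standard library and does not reproduce the Feather, Parquet, or Arrow encodings; the sample records and the `to_columns` helper are hypothetical names made up for this example.

```python
from array import array

# Hypothetical sample records, stored row by row (one dict per record).
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]


def to_columns(records):
    """Pivot row-oriented records into one contiguous typed array per column.

    Illustrative helper only: real columnar formats also carry schemas,
    null bitmaps, and (for on-disk formats) compression and metadata.
    """
    return {
        "id": array("q", (r["id"] for r in records)),        # 64-bit signed ints
        "price": array("d", (r["price"] for r in records)),  # 64-bit floats
    }


columns = to_columns(rows)

# An aggregate over one column reads only that column's buffer,
# instead of walking every field of every record.
total_price = sum(columns["price"])
```

Because each column lives in its own homogeneous buffer, an analytic query that needs only `price` never has to touch `id`, which is the property the columnar formats discussed here are built around.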
Intro to Apache Arrow
1 project | dev.to | 27 Feb 2022
More information about Apache Arrow can be found at https://arrow.apache.org/ Leave a comment if you have any questions or feedback.
Apache Arrow Feature Parity Timeline?
2 projects | reddit.com/r/rust | 21 Feb 2022
Apache Arrow Feature Matrix
Apache Arrow Flight SQL: Accelerating Database Access
5 projects | news.ycombinator.com | 16 Feb 2022
I'm not tuned into Arrow all that much. I've read some of the about and stuff, but the code examples (to my eye) look really complex and complicated.
Could someone point me to a more glossy "arrow flight sql for dummies" examples? What I'm gleaning from this (or am I wrong?) is you could use a JDBC driver + arrow jdbc client and write... SQL? Or is it something a lot different?
Is this the sort of thing where you could just add a plugin to postgres and be arrowified or something?
Comparing SQLite, DuckDB and Arrow
5 projects | news.ycombinator.com | 27 Oct 2021
I enjoyed this comparison, thanks! Here is a related, generally R-centric comparison that I wrote up a few months ago and that you might enjoy, covering DuckDB, dplyr, data.table, etc. applied to five data-sciency problems: https://github.com/bwlewis/duckdb_and_r
What are some alternatives?
h5py - HDF5 for Python -- The h5py package is a Pythonic interface to the HDF5 binary data format.
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
polars - Fast multi-threaded DataFrame library in Rust | Python | Node.js
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ta-lib - Python wrapper for TA-Lib (http://ta-lib.org/).
arquero - Query processing and transformation of array-backed data tables.
Apache HBase - Apache HBase
ClickHouse - ClickHouse® is a free analytics DBMS for big data
spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs
beam - Apache Beam is a unified programming model for Batch and Streaming data processing.
arrow-julia - Official Julia implementation of Apache Arrow
Apache Hive - Apache Hive