bacon
Apache Arrow
| | bacon | Apache Arrow |
|---|---|---|
| Mentions | 2 | 70 |
| Stars | 175 | 12,498 |
| Growth | - | 3.4% |
| Activity | 0.0 | 9.8 |
| Latest commit | 21 days ago | about 23 hours ago |
| Language | Rust | C++ |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
bacon
- Any role that Rust could have in the Data world (Big Data, Data Science, Machine learning, etc.)?
- Scientific Computing in Rust
See the github repo here https://github.com/aftix/bacon
Apache Arrow
- Show HN: Udsv.js – A faster CSV parser in 5KB (min)
- Interacting with Amazon S3 using AWS Data Wrangler (awswrangler) SDK for Pandas: A Comprehensive Guide
AWS Data Wrangler is a Python library that simplifies the process of interacting with various AWS services, built on top of some useful data tools and open-source projects such as Pandas, Apache Arrow and Boto3. It offers streamlined functions to connect to, retrieve, transform, and load data from AWS services, with a strong focus on Amazon S3.
- Cap'n Proto 1.0
Worker should really adopt Apache Arrow, which has a much bigger ecosystem.
- C++ Jobs - Q3 2023
Apache Arrow
- CSV or Parquet File Format
In fact I have asked Apache Github how to read select column of particular row group of a parquet file. https://github.com/apache/arrow/issues/35688
- A Polars exploration into Kedro
Polars is an open-source library for Python, Rust, and NodeJS that provides in-memory dataframes, out-of-core processing capabilities, and more. It is based on the Rust implementation of the Apache Arrow columnar data format (you can read more about Arrow on my earlier blog post “Demystifying Apache Arrow”), and it is optimised to be blazing fast.
- What do you do to achieve this catastrophy?
it's actually not that uncommon for tools that are built to support many languages. Just look at apache arrow https://github.com/apache/arrow
- Making Python 100x faster with less than 100 lines of Rust
Apache Arrow (https://arrow.apache.org/) is built exactly around this idea: it's a library for managing the in-memory representation of large datasets.
- Show HN: Up to 100x Faster FastAPI with simdjson and io_uring on Linux 5.19
If anything you'd probably want to send it in Arrow[1] format. CSV's don't even preserve data types.
- Google Python Style Guide
What are some alternatives?
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
h5py - HDF5 for Python -- The h5py package is a Pythonic interface to the HDF5 binary data format.
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
FlatBuffers - FlatBuffers: Memory Efficient Serialization Library
polars - Fast multi-threaded, hybrid-out-of-core query engine focussing on DataFrame front-ends
ClickHouse - ClickHouse® is a free analytics DBMS for big data
beam - Apache Beam is a unified programming model for Batch and Streaming data processing.
duckdb_and_r - My thoughts and examples on DuckDB and R
ta-lib-python - Python wrapper for TA-Lib (http://ta-lib.org/).
Apache HBase - Apache HBase
Redis - Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
Apache Hive - Apache Hive