Top 5 Python apache-arrow Projects
- AWS Data Wrangler
  pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
- functime
  Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
Project mention: Read files from s3 using Pandas/s3fs or AWS Data Wrangler? | /r/dataengineering | 2023-12-06
I had no problem with awswrangler (https://github.com/aws/aws-sdk-pandas). It supports reading and writing partitions, which was really helpful, along with a few other optimizations that made it a great tool.
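The partition support mentioned in that comment might look roughly like the sketch below. It assumes AWS credentials are configured and that `awswrangler` and `pandas` are installed. The S3 path, the `year` and `value` columns, and the helper names `only_2023` and `write_and_read` are hypothetical and chosen only for illustration.

```python
from typing import Dict


def only_2023(partitions: Dict[str, str]) -> bool:
    # awswrangler hands the filter a dict of partition column -> value,
    # with the values as strings, so compare against "2023", not 2023.
    return partitions.get("year") == "2023"


def write_and_read(dataset_path: str):
    # Imported lazily so the filter above can be used and tested
    # without AWS access or these packages installed.
    import awswrangler as wr
    import pandas as pd

    # Hypothetical sample data.
    df = pd.DataFrame({"year": [2022, 2023], "value": [1.0, 2.0]})

    # Write a Parquet dataset partitioned by the "year" column.
    wr.s3.to_parquet(
        df=df,
        path=dataset_path,
        dataset=True,
        partition_cols=["year"],
    )

    # Read back only the partitions accepted by the filter.
    return wr.s3.read_parquet(
        path=dataset_path,
        dataset=True,
        partition_filter=only_2023,
    )
```

Calling `write_and_read("s3://my-bucket/my-prefix/")` (a placeholder path) would write two partitions and load only the `year=2023` one back into a DataFrame.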
There's a whole ecosystem in Python, originally developed for high-energy physics data processing: https://github.com/scikit-hep/awkward, all because NumPy demands rectangular N-dimensional arrays.
The same technique is used everywhere; here's a simple Julia package for the same thing: https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/3a6f5b...
But Julia at least has the decency to support ragged Vector{Vector} out of the box, and it's not that slow.
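The technique this comment alludes to is commonly implemented by storing a ragged list of lists as one flat content buffer plus an offsets array. Here is a minimal plain-Python sketch of that layout. It is not Awkward Array's or ArraysOfArrays.jl's actual code, and the names `to_offsets` and `get_sublist` are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_offsets(ragged: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Flatten variable-length sublists into (content, offsets).

    Sublist i lives at content[offsets[i]:offsets[i + 1]].
    """
    content: List[float] = []
    offsets: List[int] = [0]
    for sub in ragged:
        content.extend(sub)
        # Record where the next sublist will start.
        offsets.append(len(content))
    return content, offsets


def get_sublist(content: List[float], offsets: List[int], i: int) -> List[float]:
    """Recover sublist i from the flat layout."""
    return content[offsets[i]:offsets[i + 1]]


ragged = [[1.0, 2.0, 3.0], [], [4.0, 5.0]]
content, offsets = to_offsets(ragged)
print(content)   # [1.0, 2.0, 3.0, 4.0, 5.0]
print(offsets)   # [0, 3, 3, 5]
print(get_sublist(content, offsets, 2))  # [4.0, 5.0]
```

Because the content buffer is contiguous, it can be held in a regular one-dimensional array even though the logical structure is ragged.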
Project mention: Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data | news.ycombinator.com | 2024-04-22
I'll let Kyle chime in, but I tested it a few months ago with millions of polygons on an M2 laptop with 16 GB of RAM, and it worked very well.
There is a library by the same author called lonboard that provides the JS bits inside JupyterLab. https://github.com/developmentseed/lonboard
I think it is based on the Kepler.gl / Deck.gl data loaders that go straight to GPU from network.
Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
Python apache-arrow related posts
- I agree that Arrow Tables are great, but we decided to keep the library focused on the Pandas interface. [wont implement]
- Automate some wrangling and data visualization in Python
- Redshift API vs. other ways to connect?
- Parquet files
- Reading s3 file data with Python lambda function
- A guide to load (almost) anything into a DataFrame
- Best way to install pandas and numpy to AWS Lambda
Index
What are some of the best open-source apache-arrow projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | AWS Data Wrangler | 3,797 |
2 | functime | 891 |
3 | awkward | 792 |
4 | lonboard | 385 |
5 | space | 134 |