polars vs DataFrames.jl
Metric | polars | DataFrames.jl
---|---|---
Mentions | 125 | 9
Stars | 17,362 | 1,596
Growth | 8.7% | 2.2%
Activity | 10.0 | 6.6
Latest commit | 4 days ago | 8 days ago
Language | Rust | Julia
License | MIT License | GNU General Public License v3.0 or later
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
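The page does not spell out how Growth is computed. Assuming it is the relative month-over-month change in star count, with $S_t$ the number of stars at the end of month $t$ (both the symbol and the formula are assumptions, not the site's documented method), it would be:

$$\text{Growth}_t = \frac{S_t - S_{t-1}}{S_{t-1}} \times 100\%$$

Under that assumption, a project going from 1,000 to 1,087 stars in one month would show 8.7% growth.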
polars
- Benchmarking for Pandas and Polars Using CSV and Parquet File
  I have updated this issue, https://github.com/pola-rs/polars/issues/8533; please kindly help to solve it. I have also filed a similar issue with pandas: https://github.com/pandas-dev/pandas/issues/53249
- Polars CLI is now available!
  Could you open an issue on GitHub?
- Data Engineering with Rust
  https://github.com/jorgecarleitao/arrow2, https://github.com/apache/arrow-datafusion, https://github.com/apache/arrow-ballista, https://github.com/pola-rs/polars, https://github.com/duckdb/duckdb
- Polars query engine 0.29.0 released
  A new release of the polars query engine/DataFrame library: https://github.com/pola-rs/polars/releases/tag/rs-0.29.0
- Test On 4 Concurrent Jobs Using Python-Polars 0.17.11 to GroupBy Billion Rows
  Yesterday I successfully ran four concurrent billion-row jobs as part of step-by-step, progressive testing of trillions of rows across more than a million files with Polars and Peaks. Polars previously failed on a single job, but after several bug fixes it can now handle the workload; see https://github.com/pola-rs/polars/issues/7774
- Serverless Speed: Rust vs. Go, Java, and Python in AWS Lambda Functions
  Over in polars we are using some of these tricks to greatly speed up ndjson parsing. While no official benchmarks have been done, the polars ndjson reader does seem to be faster than simdjson in many scenarios.
- Polars [Query Engine/DataFrame] 0.28.0 released :)
- Daft: The Distributed Python Dataframe
  There are also several mentions of polars.
- Any job processing framework like Spark but in Rust?
  For data frames built on Apache Arrow: https://github.com/pola-rs/polars/
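One excerpt above mentions that polars uses tricks to speed up ndjson (newline-delimited JSON) parsing. Those tricks are not described here, so what follows is only a hedged sketch of one general idea the format enables: because every record ends at a newline, the input can be cut into chunks at newline boundaries and each chunk parsed independently. The function names are hypothetical and this is not polars' reader. In CPython the GIL limits how much a thread pool speeds up pure-Python `json.loads`, so the sketch shows the structure rather than the performance.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def split_at_newlines(data: bytes, n_chunks: int) -> list[bytes]:
    """Cut `data` into roughly equal pieces, moving each cut forward to the
    next newline so that no JSON record is split across two pieces."""
    size = max(1, len(data) // n_chunks)
    chunks, start = [], 0
    while start < len(data):
        end = min(start + size, len(data))
        nl = data.find(b"\n", end)
        # No newline left: the final piece runs to the end of the input.
        end = len(data) if nl == -1 else nl + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def parse_chunk(chunk: bytes) -> list[dict]:
    """Parse one newline-delimited chunk; blank lines are skipped."""
    return [json.loads(line) for line in chunk.splitlines() if line.strip()]


def read_ndjson(data: bytes, n_chunks: int = 4) -> list[dict]:
    """Parse the chunks concurrently, then concatenate them in input order."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(parse_chunk, split_at_newlines(data, n_chunks)))
    return [record for part in parts for record in part]
```

For example, `read_ndjson(b'{"a": 1}\n{"a": 2}\n\n{"a": 3}\n', n_chunks=2)` yields the three records in their original order. A native reader can hand the same independent chunks to real parallel workers, which is where this layout pays off.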
DataFrames.jl
- IJulia: The Julia Notebook
  IJulia also supports viewing and manipulating tables. To create a table, first install the DataFrames.jl package by running the following command in a new cell:
- Machine learning with Julia - Solve Titanic competition on Kaggle and deploy trained AI model as a web service
  That was only a small fraction of all the data manipulations you can do with the DataFrames.jl library. Read more about it in the documentation.
- Unleashing the Power of Julia: Top 5 Must-Have Packages
  DataFrames.jl
- Automate the boring stuff with Julia?
  DataFrames.jl and XLSX.jl for JSON, CSV, and XLSX files
- What would it take to recreate dplyr in Python?
- Teaching Python
  Julia also has the CSV.jl library for reading/writing CSV files, the DataFrames.jl library for manipulating data like pandas, and Images.jl for image processing/analysis. However, since Julia is much newer than Python, the Julia libraries are almost never as feature-rich as their Python counterparts.
- Polars (Rust DataFrame library) join algorithm fastest in db-benchmark
  It looks like it's single-threaded, according to this open issue: https://github.com/JuliaData/DataFrames.jl/issues/2626
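The db-benchmark thread above concerns join performance. Neither polars' nor DataFrames.jl's join implementation is shown on this page, so the following is a hedged illustration of the general technique only: a textbook hash inner join that builds a hash table on one input and probes it with the other. The name `hash_inner_join` and the list-of-dicts row layout are hypothetical choices made for the example.

```python
from collections import defaultdict


def hash_inner_join(left: list[dict], right: list[dict], on: str) -> list[dict]:
    """Inner-join two lists of row dicts on the column `on`."""
    # Build phase: bucket every right-side row by its join key.
    table: dict = defaultdict(list)
    for row in right:
        table[row[on]].append(row)

    # Probe phase: look up each left row's key; .get() avoids inserting
    # empty buckets for keys that have no match.
    out = []
    for lrow in left:
        for rrow in table.get(lrow[on], []):
            # On duplicate column names, the right-side value wins.
            out.append({**lrow, **rrow})
    return out
```

With `left = [{"id": 2, "x": "b"}]` and `right = [{"id": 2, "y": 20}, {"id": 2, "y": 21}]`, the result is one output row per matching right row. The probe loop is the part that parallelizes most naturally, since chunks of the left input can probe the same read-only table concurrently; whether a given library does so is the kind of question the excerpt above raises.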
What are some alternatives?
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
arrow-datafusion - Apache Arrow DataFusion SQL Query Engine
modin - Modin: Scale your Pandas workflows by changing a single line of code
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
rust-numpy - PyO3-based Rust bindings of the NumPy C-API
db-benchmark - reproducible benchmark of database-like ops
tidypolars - Tidy interface to polars
datatable - A Python package for manipulating 2-dimensional tabular data structures
arrow2 - Transmute-free Rust library to work with the Arrow format
hdf5-rust - HDF5 for Rust
rust-csv - A CSV parser for Rust, with Serde support.
evcxr