polars vs Daft

polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust (by ritchie46)

docs.pola.rs

Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust (by Eventual-Inc)

Tags: Machine Learning, Python, data-engineering, Data Science, Dataframe, Distributed Computing, Rust, Big Data

getdaft.io

Project	polars	Daft
Mentions	144	7
Stars	26,218	1,684
Growth	2.9%	3.7%
Activity	10.0	9.8
Latest Commit	4 days ago	5 days ago
Language	Rust	Rust
License	MIT License	Apache License 2.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

polars

Posts with mentions or reviews of polars. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-08.

Why Python's Integer Division Floors (2010)
1 project | news.ycombinator.com | 28 Feb 2024

This is because 0.1 is in actuality the floating-point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.
I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
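
The difference described in this comment can be reproduced in plain Python. The sketch below is only an illustration using the standard library; the helper `floor_div_via_division` is a hypothetical name that mirrors the quoted `(a / b).floor()` definition and is not taken from the Polars code base.

```python
import math
from decimal import Decimal

# Exact binary value stored for the literal 0.1 (slightly above one tenth).
exact_tenth = Decimal(0.1)

# True division rounds the exact quotient to the nearest float, which is 10.0.
true_quotient = 1 / 0.1

# Python's floor division floors the exact quotient, which is just below 10.
python_floor_div = 1 // 0.1

# Flooring the already-rounded quotient gives 10 instead.
floor_of_quotient = math.floor(1 / 0.1)


def floor_div_via_division(a: float, b: float) -> float:
    """Hypothetical helper: divide first, then floor the rounded result."""
    return float(math.floor(a / b))


print(exact_tenth)
print(true_quotient, python_floor_div, floor_of_quotient)
print(floor_div_via_division(1, 0.1))
```

The two approaches disagree only when the rounded quotient lands exactly on an integer that the exact quotient does not reach, as it does here.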
Polars
11 projects | news.ycombinator.com | 8 Jan 2024

https://github.com/pola-rs/polars/releases/tag/py-0.19.0

1 project | /r/programming | 30 Aug 2023
Stuff I Learned during Hanukkah of Data 2023
5 projects | dev.to | 18 Dec 2023

That turned out to be related to pola-rs/polars#11912, and this linked comment provided a deceptively simple solution - use PARSE_DECLTYPES when creating the connection:
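
The original post's code is not reproduced in this excerpt. As a rough sketch of what passing PARSE_DECLTYPES looks like with Python's standard `sqlite3` module, the example below uses a hypothetical `orders` table and a hypothetical `fetch_dates` helper, and registers its own converter for columns declared as DATE so that it does not depend on the module's default converters.

```python
import datetime
import sqlite3

# Explicit converter for columns whose declared type is DATE.
# sqlite3 hands converters the raw bytes stored in the column.
sqlite3.register_converter(
    "date", lambda raw: datetime.date.fromisoformat(raw.decode())
)


def fetch_dates(detect_types: int):
    """Hypothetical helper: build a tiny in-memory table and read it back."""
    conn = sqlite3.connect(":memory:", detect_types=detect_types)
    conn.execute("CREATE TABLE orders (id INTEGER, ordered DATE)")
    conn.execute("INSERT INTO orders VALUES (1, '2023-12-18')")
    rows = conn.execute("SELECT id, ordered FROM orders").fetchall()
    conn.close()
    return rows


plain = fetch_dates(0)                        # declared types ignored
typed = fetch_dates(sqlite3.PARSE_DECLTYPES)  # declared types drive conversion

print(type(plain[0][1]).__name__)
print(type(typed[0][1]).__name__)
```

Without the flag the date column comes back as a plain string; with it, the registered converter runs and the value arrives as a `datetime.date`.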
Polars 0.20 Released
1 project | news.ycombinator.com | 16 Dec 2023
Segunda linguagem (Second language)
3 projects | /r/brdev | 10 Dec 2023
Polars: Dataframes powered by a multithreaded query engine, written in Rust
1 project | news.ycombinator.com | 7 Dec 2023
Summing columns in remote Parquet files using DuckDB
4 projects | news.ycombinator.com | 16 Nov 2023
Polars 0.34 is released. (A query engine focussing on DataFrame front ends)
1 project | /r/u_Dazzling_Finger_8120 | 26 Oct 2023

1 project | /r/rust | 26 Oct 2023

Daft

Posts with mentions or reviews of Daft. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-29.

Daft: Distributed DataFrame for Python
2 projects | news.ycombinator.com | 29 Feb 2024

There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.
Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
4 projects | news.ycombinator.com | 7 Jun 2023

Hi (one of the maintainers here), that is a good suggestion! I wasn't aware of that project. I went ahead and made an issue to add `export DO_NOT_TRACK=1` as one of the variables we track! https://github.com/Eventual-Inc/Daft/issues/1015
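
For readers unfamiliar with the convention, honoring `DO_NOT_TRACK=1` usually amounts to checking an environment variable before sending any telemetry. The sketch below is a generic illustration only; `telemetry_enabled` is a hypothetical function and does not describe how Daft actually implements its opt-out.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical opt-out check: telemetry is off when DO_NOT_TRACK is "1"."""
    env = os.environ if env is None else env
    return env.get("DO_NOT_TRACK", "").strip() != "1"


print(telemetry_enabled({"DO_NOT_TRACK": "1"}))
print(telemetry_enabled({}))
```

Accepting the environment as a parameter keeps the check easy to test without mutating `os.environ`.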

1 project | news.ycombinator.com | 6 Jun 2023
Daft: The Distributed Python Dataframe
4 projects | /r/Python | 23 Mar 2023

We are looking at supporting other distributed backends as well - please drop by our discussion forums (https://github.com/Eventual-Inc/Daft/discussions) and drop us a message if you have any suggestions! We’d love to hear from you :)

What are some alternatives?

When comparing polars and Daft you can also consider the following projects:

vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

xvc - A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

modin - Modin: Scale your Pandas workflows by changing a single line of code

hamilton - A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

datafusion - Apache DataFusion SQL Query Engine

deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

DataFrames.jl - In-memory tabular data in Julia

quokka - Making data lake work for time series

datatable - A Python package for manipulating 2-dimensional tabular data structures

lightflus - A Lightweight, Cloud-Native Stateful Distributed Dataflow Engine

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

hamilton - Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage and metadata. Runs and scales everywhere python does.
