lance vs polars
Metric | lance | polars
---|---|---
Mentions | 10 | 144
Stars | 3,275 | 26,218
Growth | 2.2% | 2.9%
Activity | 9.8 | 10.0
Latest commit | about 9 hours ago | 5 days ago
Language | Rust | Rust
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
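The exact activity formula is not given here. As a rough illustration only, the sketch below assumes that commits are weighted with an exponential decay (the 30-day half-life is a hypothetical choice) and that the score is a percentile rank scaled to 0-10, so a score of 9.0 means a project outranks 90% of the tracked projects, i.e. it sits in the top 10%. Growth is computed as the percentage change in stars over one month. All function names and parameters are hypothetical.

```python
from datetime import date


def decayed_commit_weight(commit_dates, today, half_life_days=30.0):
    """Sum per-commit weights; a commit loses half its weight every half_life_days.

    The exponential decay and the 30-day default are hypothetical choices.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_scores(weights):
    """Map raw weights to 0-10 by the share of tracked projects each one outranks."""
    n = len(weights)
    return {
        name: round(10 * sum(other < w for other in weights.values()) / n, 1)
        for name, w in weights.items()
    }


def month_over_month_growth(stars_last_month, stars_now):
    """Percentage change in stars since last month."""
    return 100.0 * (stars_now - stars_last_month) / stars_last_month


# Ten hypothetical projects; the most active one outranks the other nine.
today = date(2024, 1, 31)
weights = {f"project{i}": decayed_commit_weight([today] * i, today) for i in range(10)}
print(activity_scores(weights)["project9"])  # 9.0
print(month_over_month_growth(1000, 1022))   # 2.2
```

Ranking relative to other projects, rather than using raw commit counts, is what makes the score "relative": the same commit history can score differently depending on how active the rest of the tracked projects are.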
lance
- The Nimble File Format by Meta
- Supabase Storage: now supports the S3 protocol
You should look at Lance (https://lancedb.github.io/lance/).
- Understanding Parquet, Iceberg and Data Lakehouses
Parquet has been the lakehouse file format of choice for nearly half a decade. But we are starting to see other contenders that are more optimized for lower latency, like Lance (https://github.com/lancedb/lance).
- FLaNK Stack Weekly for 12 June 2023
- FLaNK Stack 5-June-2023
- [Show HN] Lance is a Rust-based alternative to Parquet for ML data
- Show HN: Lance is a Rust-based alternative to Parquet for ML data
Getting a bunch of 404s on the docs, for example https://eto-ai.github.io/lance/format.html (but this works: https://lancedb.github.io/lance/*).
Did you guys just pivot from eto-ai to lancedb?
- Any job processing framework like Spark but in Rust?
For feature stores, check out: https://github.com/eto-ai/lance
- Show HN: Lance – Deep Learning with DuckDB and Arrow
polars
- Why Python's Integer Division Floors (2010)
This is because 0.1 is in actuality the floating point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.
I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
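The arithmetic in this comment can be checked with Python's standard library alone (no Polars involved). In this sketch, `math.floor(a / b)` stands in for the `(a / b).floor()` definition described in the comment, while `a // b` is Python's own float floor division.

```python
import math
from decimal import Decimal
from fractions import Fraction

a, b = 1.0, 0.1

# The double nearest to 0.1 is slightly larger than 1/10 ...
exact_b = Decimal(b)
print(exact_b)  # 0.1000000000000000055511151231257827021181583404541015625

# ... so the exact (unrounded) quotient 1 / b is slightly below 10.
exact_quotient = Fraction(a) / Fraction(b)
print(exact_quotient < 10)  # True

# Python's float floor division floors that exact quotient.
print(a // b)  # 9.0

# Rounding a / b to the nearest double first yields exactly 10.0,
# so flooring afterwards gives 10.
print(a / b, math.floor(a / b))  # 10.0 10
```

The two results differ only because of where rounding happens: `a // b` floors the exact quotient, while `math.floor(a / b)` floors a quotient that has already been rounded up to 10.0.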
- Polars
https://github.com/pola-rs/polars/releases/tag/py-0.19.0
- Stuff I Learned during Hanukkah of Data 2023
That turned out to be related to pola-rs/polars#11912, and this linked comment provided a deceptively simple solution: use PARSE_DECLTYPES when creating the connection.
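The details of the linked issue are not reproduced here, so this is a general illustration only. `sqlite3.PARSE_DECLTYPES` tells Python's `sqlite3` module to read each column's declared type and apply any converter registered for that type name. The table, column, and converter below are hypothetical; the converter is registered explicitly because the module's built-in date and timestamp converters are deprecated as of Python 3.12.

```python
import datetime
import sqlite3

# Hypothetical converter: turn stored "YYYY-MM-DD" bytes back into datetime.date.
sqlite3.register_converter("date", lambda raw: datetime.date.fromisoformat(raw.decode()))


def fetch_order_dates(detect_types):
    """Create a throwaway in-memory table and read its DATE column back."""
    con = sqlite3.connect(":memory:", detect_types=detect_types)
    try:
        con.execute("CREATE TABLE orders (id INTEGER, order_date DATE)")
        con.execute("INSERT INTO orders VALUES (1, '2023-12-07')")
        return [row[0] for row in con.execute("SELECT order_date FROM orders")]
    finally:
        con.close()


without_decltypes = fetch_order_dates(0)
with_decltypes = fetch_order_dates(sqlite3.PARSE_DECLTYPES)
print(without_decltypes)  # ['2023-12-07']  (plain str)
print(with_decltypes)     # [datetime.date(2023, 12, 7)]
```

Without the flag, SQLite hands back the stored text as-is; with it, the declared `DATE` type selects the registered converter, so downstream code receives real date objects.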
- Polars 0.20 Released
- Segunda linguagem ("Second language")
- Polars: Dataframes powered by a multithreaded query engine, written in Rust
- Summing columns in remote Parquet files using DuckDB
- Polars 0.34 is released. (A query engine focussing on DataFrame front ends)
What are some alternatives?
roop - one-click face swap
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
modin - Modin: Scale your Pandas workflows by changing a single line of code
Lixur - Lixur is an open-source project that seeks to build a scalable, feeless, decentralized, quantum-secure, and easy-to-use blockchain with smart and intelligent (AI) contract functionality.
datafusion - Apache DataFusion SQL Query Engine
Rio - A hardware-accelerated GPU terminal emulator focused on running in desktops and browsers.
DataFrames.jl - In-memory tabular data in Julia
chatdocs - Chat with your documents offline using AI.
datatable - A Python package for manipulating 2-dimensional tabular data structures
scratch-pdf-bot - Prototyping a question and answer bot over PDFs
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing