roxmltree
polars
Metric | roxmltree | polars
---|---|---
Mentions | 4 | 144
Stars | 403 | 26,218
Growth | - | 2.9%
Activity | 7.3 | 10.0
Last commit | 4 months ago | 5 days ago
Language | Rust | Rust
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
roxmltree
- What are the scenarios where "Rewrite it in Rust" didn't meet your expectations or couldn't be successfully implemented?
  This is exactly what I needed when implementing xml-mut :D I used roxmltree instead and manipulated the text directly. I will try to rewrite it using Xot.
- Surprises in the Rust JSON Ecosystem
  Regarding the benchmarks, it makes sense to measure serializing/deserializing for parser crates. But since we are talking about DOM implementations, metrics like traversal/iteration speed or insert/modification performance would be useful. A good example is the roxmltree crate (a read-only XML DOM), which benchmarks traversal/iteration performance and shows that focusing only on read-only use cases yields substantial performance gains.
- What are some less popular but well-made crates you'd like others to know about?
  For XML parsing, I find https://github.com/RazrFalcon/roxmltree to be a really good crate. It’s fast, light, and well documented and maintained. I have so much respect for the maintainer’s approach to merging PRs and the way they consider what’s important for the crate.
- fast-float - a super-fast float parser in Rust
  I understand. But I've also written enough parsers and performance-sensitive code in Rust (ttf-parser, tiny-skia, roxmltree). And in my experience, unsafe is not needed in 99% of the cases. Even something as performance-sensitive as tiny-skia is unsafe-free (with some nuances).
polars
- Why Python's Integer Division Floors (2010)
  This is because 0.1 is in actuality the floating-point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.
  I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
- Polars
  https://github.com/pola-rs/polars/releases/tag/py-0.19.0
- Stuff I Learned during Hanukkah of Data 2023
  That turned out to be related to pola-rs/polars#11912, and this linked comment provided a deceptively simple solution - use PARSE_DECLTYPES when creating the connection:
- Polars 0.20 Released
- Segunda linguagem
- Polars: Dataframes powered by a multithreaded query engine, written in Rust
- Summing columns in remote Parquet files using DuckDB
- Polars 0.34 is released. (A query engine focussing on DataFrame front ends)
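
The floor-division behaviour described in the "Why Python's Integer Division Floors" excerpt above can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library, not Polars itself; the variable names are illustrative. It contrasts Python's built-in `//` on floats with the divide-then-floor definition the commenter says they chose.

```python
from decimal import Decimal
import math

# Decimal(float) shows the exact binary value stored for the literal 0.1,
# which is slightly larger than one tenth.
exact_tenth = Decimal(0.1)

# Python's float floor division works from the exact operands, so the
# true quotient (just under 10) floors to 9.
python_floor_div = 1 // 0.1

# Dividing first rounds the quotient to the nearest float (exactly 10.0);
# flooring afterwards therefore gives 10.
divide_then_floor = math.floor(1 / 0.1)

print(exact_tenth)
print(python_floor_div)    # 9.0
print(divide_then_floor)   # 10
```

The two results differ only when the exact quotient lies just below an integer and rounds up to it during division, which is the case the excerpt describes.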
What are some alternatives?
fast-float-rust - Super-fast float parser in Rust (now part of Rust core)
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
json - Strongly typed JSON library for Rust
modin - Modin: Scale your Pandas workflows by changing a single line of code
Clipper2 - Polygon Clipping and Offsetting - C++, C# and Delphi
datafusion - Apache DataFusion SQL Query Engine
quick-xml - Rust high performance xml reader and writer
DataFrames.jl - In-memory tabular data in Julia
log4rs - A highly configurable logging framework for Rust
datatable - A Python package for manipulating 2-dimensional tabular data structures
rust - Empowering everyone to build reliable and efficient software.
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing