orjson vs polars

| | orjson | polars |
|---|---|---|
| Mentions | 22 | 151 |
| Stars | 7,137 | 34,403 |
| Growth | 2.7% | 1.6% |
| Activity | 8.0 | 10.0 |
| Latest commit | 5 days ago | 3 days ago |
| Language | Python | Rust |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.
orjson
- Web scraping of a dynamic website using Python with HTTP Client
The library already has support for an HTTP client that allows bypassing Cloudflare: CurlImpersonateHttpClient. Since we have to work with JSON responses, we could use parsel_crawler, added in version 0.3.0, but I think that's excessive for such tasks; besides, I like the high speed of orjson. Therefore, we'll need to implement our own crawler rather than use one of the ready-made ones.
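A minimal sketch of that combination, using curl_cffi (the library CurlImpersonateHttpClient is built on) directly together with orjson; the endpoint URL here is hypothetical:

```python
import orjson
from curl_cffi import requests

# Impersonate a real browser's TLS fingerprint to get past Cloudflare,
# then decode the JSON body with orjson instead of the stdlib json module.
response = requests.get(
    "https://example.com/api/items",  # hypothetical JSON endpoint
    impersonate="chrome",
)
data = orjson.loads(response.content)
print(data)
```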
- orjson: Fast, correct Python JSON lib supporting dataclasses, datetimes, NumPy
- JSON extra uses orjson instead of ujson
(https://github.com/ijl/orjson). In this implementation, the same JSON
- This Week In Python
orjson – Fast, correct Python JSON library
- Orjson: Fast, correct Python JSON library
- JSON in data science projects: tips & tricks
orjson is the fastest JSON library available for python. It natively manages dataclass objects, datetime, numpy and UUID objects.
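A short sketch of that native support (the Event dataclass here is hypothetical; note that NumPy arrays need an explicit option flag):

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

import numpy as np
import orjson

@dataclass
class Event:  # hypothetical example type
    id: uuid.UUID
    created: datetime
    scores: np.ndarray

event = Event(uuid.uuid4(), datetime.now(timezone.utc), np.array([1.0, 2.5]))

# Dataclasses, datetimes, and UUIDs serialize natively;
# NumPy arrays require the OPT_SERIALIZE_NUMPY flag.
payload = orjson.dumps(event, option=orjson.OPT_SERIALIZE_NUMPY)
print(payload)
```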
- Second language
- Litestar 2.0
As we began venturing down that road, a few things emerged that would constitute significant changes to some of the core parts of Litestar. But two things in particular started a chain reaction of changes by opening up further possibilities: the new DTOs and our switch from orjson to msgspec.
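For contrast, a minimal sketch of the msgspec style Litestar moved to (the User struct is a made-up example, not Litestar's actual code):

```python
import msgspec

class User(msgspec.Struct):  # hypothetical model
    name: str
    active: bool = True

# msgspec couples fast JSON encoding with schema validation on decode:
# decode() checks the payload against the Struct's types as it parses.
raw = msgspec.json.encode(User(name="alice"))
user = msgspec.json.decode(raw, type=User)
print(user)
```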
- orjson: Fast, correct Python JSON lib (supports dataclasses, datetimes, numpy)
polars
- ClickHouse raises $350M Series C
Thanks for creating this issue; it is worth investigating!
I see you also created similar issues in Polars: https://github.com/pola-rs/polars/issues/17932 and DuckDB: https://github.com/duckdb/duckdb/issues/17066
ClickHouse has a built-in memory tracker, so even if there is not enough memory, it will stop the query and send an exception to the client, instead of crashing. It also allows fair sharing of memory between different workloads.
You need to provide more info on the issue for reproduction, e.g., how to fill the tables. 16 GB of memory should be enough even for a CROSS JOIN between a 10 billion-row and a 100-row table, because it is processed in a streaming fashion without accumulating a large amount of data in memory. The same should be true for a merge join.
However, there are places where a large buffer might be needed. For example, inserting data into a table backed by S3 storage requires a buffer that can be on the order of 500 MB.
There is a possibility that your machine has 16 GB of memory, but most of it is consumed by Chrome, Slack, or Safari, and not much is left for the ClickHouse server.
- Debugging Data Pipelines: From Memory to File with WebDAV
(* There might be no file or even file-like thing. You may be working with data frames (Pandas or Polars), event streams, and whatnot.)
- Using Polars in Rust for high-performance data analysis
If you want to get into Polars, the library is very well documented: I'd recommend their getting-started tutorial and API docs, and once you're all set up, their Cookbooks cover many of the standard operations within Polars.
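Though that post covers the Rust API, here is a minimal sketch of the same getting-started pattern in Polars' Python API (the data is made up for illustration):

```python
import polars as pl

df = pl.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Bergen"],
        "temp": [3.1, 4.2, 6.0],
    }
)

# Expressions compose inside each context; group_by/agg is the
# bread-and-butter aggregation pattern in Polars.
result = df.group_by("city").agg(pl.col("temp").mean().alias("mean_temp"))
print(result)
```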
- Why Polars rewrote its Arrow string data type
This is false. The Polars API has used smartstring for a long time.
https://github.com/pola-rs/polars/blob/32a2325b55f9bce81d019...
- Polars releases v1.0.0 – a Pandas alternative
- Polars Releases v1.0.0
- Big Data Is Dead
- Why Python's Integer Division Floors (2010)
This is because 0.1 is in actuality the floating-point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.
I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
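A quick demonstration of the two behaviors described above, runnable in plain Python:

```python
import math

# Python floors the *exact* quotient: 0.1 is really slightly more than
# 1/10, so 1 divided by it is just under 10, and // floors that to 9.
print(1 // 0.1)             # 9.0

# Rounding the division first gives exactly 10.0, so floor(a / b) --
# the definition Polars chose for floats -- returns 10.
print(math.floor(1 / 0.1))  # 10
```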
- Polars
https://github.com/pola-rs/polars/releases/tag/py-0.19.0
- Stuff I Learned during Hanukkah of Data 2023
That turned out to be related to pola-rs/polars#11912, and the linked comment provided a deceptively simple solution: use PARSE_DECLTYPES when creating the connection.
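A minimal sketch of that fix using the standard-library sqlite3 module (the database filename here is hypothetical):

```python
import sqlite3

# PARSE_DECLTYPES tells sqlite3 to convert columns whose declared type
# has a registered converter (e.g. DATE, TIMESTAMP) into Python objects,
# so date columns come back as datetime.date instead of str.
conn = sqlite3.connect("data.db", detect_types=sqlite3.PARSE_DECLTYPES)
```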
What are some alternatives?
msgspec - A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
Daft - Distributed query engine providing simple and reliable data processing for any modality and scale
ujson - Ultra fast JSON decoder and encoder written in C with Python bindings
DataFrames.jl - In-memory tabular data in Julia
ormsgpack - Msgpack serialization/deserialization library for Python, written in Rust using PyO3. A reboot of orjson.
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀