Experience with heap bloat

loadtxt

Mentions: 2 | Stars: 5 | Activity: 0.0 | Language: Rust

~60-300x faster than numpy.loadtxt

Amdahl's Law will catch up with you really fast as you add threads with this strategy, but it's simple and it's amenable to formats where a delimiter may appear in the middle of a record.

For situations where you need maximum scaling and delimiters can't show up inside records, you can use the strategy I used to implement a faster numpy.loadtxt: https://github.com/saethlin/loadtxt/blob/master/src/inner.rs#L84

The general idea is that you divide the file among threads by splitting it at byte offsets, then seeking forward from each offset to the end of the next record. This gives you non-overlapping sections, so no record is parsed twice.
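Here is a minimal sketch of that chunking idea. It is not the code from loadtxt itself. It assumes records end in `\n`, the whole input is already in memory as bytes, and fields are whitespace-separated floats. The names `chunk_boundaries`, `parse_chunk`, and `parallel_parse` are made up for illustration.

```rust
use std::thread;

/// Split `data` into at most `n_chunks` contiguous byte ranges, each ending
/// on a record boundary (assumed here to be `\n`) or at the end of the input.
fn chunk_boundaries(data: &[u8], n_chunks: usize) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    if data.is_empty() || n_chunks == 0 {
        return ranges;
    }
    // Approximate chunk size in bytes (ceiling division, so it is at least 1).
    let approx = (data.len() + n_chunks - 1) / n_chunks;
    let mut start = 0;
    while start < data.len() {
        // Tentative end at a raw byte offset, which may fall mid-record.
        let tentative = (start + approx).min(data.len());
        // Seek forward to the end of the record containing the last byte
        // of the tentative chunk.
        let end = match data[tentative - 1..].iter().position(|&b| b == b'\n') {
            Some(i) => tentative + i,
            None => data.len(),
        };
        ranges.push((start, end));
        start = end;
    }
    ranges
}

/// Parse one chunk of whitespace-separated floats.
/// Panics on malformed input; a real parser would return a Result.
fn parse_chunk(chunk: &[u8]) -> Vec<f64> {
    std::str::from_utf8(chunk)
        .expect("input assumed to be UTF-8")
        .split_whitespace()
        .map(|tok| tok.parse::<f64>().expect("input assumed to be numeric"))
        .collect()
}

/// Parse every chunk on its own thread, then concatenate in file order.
fn parallel_parse(data: &[u8], n_threads: usize) -> Vec<f64> {
    let ranges = chunk_boundaries(data, n_threads);
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(start, end)| s.spawn(move || parse_chunk(&data[start..end])))
            .collect();
        // Join in spawn order to preserve the original record order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `parallel_parse(b"1 2\n3 4\n5 6\n", 2)` should split the input after the second record and return the six values in their original order. Because each range starts exactly where the previous one ended, the chunks cover the input with no gaps and no record is handed to two threads.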

polars

Mentions: 144 | Stars: 26,218 | Activity: 10.0 | Language: Rust

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

I don't use Arrow's CSV parser. This is the code I'm talking about: https://github.com/ritchie46/polars/blob/master/polars/polars-io/src/fork/csv.rs

jemalloc

Mentions: 34 | Stars: 9,046 | Activity: 8.3 | Language: C

It looks like jemalloc will use madvise where appropriate to tell the OS that it doesn't need pages kept resident in memory. Ctrl-F for MADV_DONTNEED in: https://github.com/jemalloc/jemalloc/blob/a943172b732e65da34a19469f31cd3ec70cf05b0/src/pages.c

This page summarizes the projects mentioned and recommended in the original post on /r/rust.
Tags: dataframe-library, Dataframe, Dataframes, Rust
Post date: 22 Jan 2021
