fasteR VS db-benchmark

Compare fasteR vs db-benchmark and see what their differences are.

fasteR

Fast Lane to Learning R! (by matloff)

db-benchmark

reproducible benchmark of database-like ops (by h2oai)
                fasteR          db-benchmark
Mentions        11              79
Stars           661             248
Growth          -               0.8%
Activity        5.6             0.0
Last commit     2 months ago    3 months ago
Language        R               R
License         -               Mozilla Public License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

fasteR

Posts with mentions or reviews of fasteR. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-09-08.

db-benchmark

Posts with mentions or reviews of db-benchmark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-25.
  • Tutorial on Intro to Rust Programming
    5 projects | dev.to | 25 Nov 2022
    There has been an upward trend in open-source tools written in Rust with Python interfaces, e.g. pydantic (moved to Rust in the recent release) and polars, which is very fast as indicated in the H2O.ai benchmarks.
  • How do I work with GIGANTIC csv files (20-100 gigabytes)?
    3 projects | reddit.com/r/bioinformatics | 15 Nov 2022
  • PostgresML is 8-40x faster than Python HTTP microservices
    5 projects | news.ycombinator.com | 19 Oct 2022
  • Polars vs ndarray performance
    2 projects | reddit.com/r/rust | 16 Oct 2022
    I've been playing with data analytics and ML in Rust for the last couple of weeks. A typical ML job requires transforming some data to feed the ML model and then training the model. For ML I've been using linfa (https://github.com/rust-ml/linfa), which is surprisingly nice. I've been experimenting with ndarray and polars for data transformation (linfa uses ndarray) from a UX standpoint. I'm pretty surprised by polars' performance (https://h2oai.github.io/db-benchmark/), which sits on top of arrow2, and it's definitely a great candidate for OLAP tasks. But I couldn't find any comparison between ndarray and polars. Has anyone had any meaningful experience with the two and/or can point me to a benchmark comparison?
  • Rust is showing a lot of promise in the DataFrame / tabular data space
    9 projects | reddit.com/r/rust | 4 Oct 2022
    Polars (https://github.com/pola-rs/polars) is a blazing fast DataFrame library with a beautiful user interface and an awesome getting started guide. The impressive h2o benchmark results have gotten Polars a lot of users.
  • Benchmarking Pandas, CuDF, Modin, Apache Arrow and Spark on a Billion Taxi Rides dataset
    2 projects | reddit.com/r/Python | 21 Sep 2022
    And more benchmarks: https://h2oai.github.io/db-benchmark/. If you are looking for performant dataframes, idiomatic polars typically tops the benchmarks.
  • Hiring an R coder to improve efficiency of code?
    3 projects | reddit.com/r/rstats | 14 Sep 2022
    Base R is not particularly fast. Use data.table and its fast assignment/grouping/aggregation.
  • Does anyone else feel in a tricky spot about their use of R?
    3 projects | reddit.com/r/rstats | 7 Sep 2022
    Performance efficiency and capacity (e.g. RAM and speed), from the stats coder perspective, is not dependent on the language, but it's dependent on the packages. As /u/Farther_father mentioned, tidytable is identical to dplyr from coding perspective, but the efficiency and capacity are far better. This means that what you said about R's design or S4, Python, Julia, etc. is a fundamental misunderstanding of what is going on in the back-end, especially because Julia is known to be performant, when in fact it is the worst of the three (pandas runs out of memory while polars/tidypolars does not, dplyr runs out of memory while data.table/tidytable does not, etc. -- same language, different packages, different performance).
  • Does anyone else feel in a tricky spot about their use of R?
    3 projects | reddit.com/r/rstats | 7 Sep 2022
    Yeah, `data.table` is a huge help for me as well. Most of what I do (my largest data sets are ~3-5 GB) is much faster than when I tried it with pandas/Python, which agrees well with https://h2oai.github.io/db-benchmark/
  • Don't waste your time on Julia
    2 projects | reddit.com/r/rstats | 14 Aug 2022
    Certainly not compared to Python's polars and R's data.table https://h2oai.github.io/db-benchmark (where DuckDB, cuDF, ClickHouse, and DataFrames.jl are Julia) -- and the two are even tidy, with tidypolars and tidytable. So this necessitates Julia to catch up on two layers to Python and R.

What are some alternatives?

When comparing fasteR and db-benchmark you can also consider the following projects:

arrow-datafusion - Apache Arrow DataFusion SQL Query Engine

polars - Fast multi-threaded, hybrid-streaming DataFrame library in Rust | Python | Node.js

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

databend - A powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com

DataFramesMeta.jl - Metaprogramming tools for DataFrames

sktime - A unified framework for machine learning with time series

disk.frame - Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

arrow2 - Transmute-free Rust library to work with the Arrow format

DataFrame - C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage

Preql - An interpreted relational query language that compiles to SQL.

julia - The Julia Programming Language

datatable - A Python package for manipulating 2-dimensional tabular data structures