Does anyone else feel in a tricky spot about their use of R?

This page summarizes the projects mentioned and recommended in the original post on /r/rstats.

  • db-benchmark

    reproducible benchmark of database-like ops

  • Performance (speed and RAM capacity), from a stats coder's perspective, depends not on the language but on the packages. As /u/Farther_father mentioned, tidytable is identical to dplyr from a coding perspective, but its speed and capacity are far better. This means that what you said about R's design or S4, Python, Julia, etc. reflects a fundamental misunderstanding of what is going on in the back end, especially since Julia is known for being performant when in fact it is the worst of the three. Same language, different packages, different performance: pandas runs out of memory while polars/tidypolars does not, and dplyr runs out of memory while data.table/tidytable does not.

  • targets

    Function-oriented Make-like declarative workflows for R

  • I'll chime in with others to say that targets can also help with memory load. If you partition your data sensibly (e.g. by subject), you can take advantage of the way targets maps over data (dynamic branching), so each step only loads what it needs. Moreover, if you set the memory = "transient" option, targets unloads objects between steps, adding a little time overhead but saving memory. targets and tidytable together have let me work on fairly sizeable datasets while rarely running into memory issues. In fact, the only time I ran into a memory hog was when I didn't adequately partition the data across worker nodes.

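The claim that tidytable matches dplyr at the code level can be checked with a small example. This is a minimal sketch, assuming dplyr >= 1.1.0 (for the per-call `.by` argument) and a recent tidytable release; the toy data frame and its column names are made up for illustration. It only shows that the calling code is the same; it does not measure speed or memory.

```r
# Toy data, made up for illustration.
df <- data.frame(
  group = c("a", "a", "b"),
  x = c(1, 2, 3)
)

# dplyr: grouped mean using per-call grouping via .by (dplyr >= 1.1.0).
res_dplyr <- dplyr::summarize(df, mean_x = mean(x), .by = group)

# tidytable: same verb and arguments, with data.table doing the work underneath.
res_tidytable <- tidytable::summarize(df, mean_x = mean(x), .by = group)

print(res_dplyr)
print(res_tidytable)
```

The values match; the main visible difference is the returned class, since tidytable returns a tidytable (a data.table subclass).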
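The targets workflow described in the thread (partition by subject, then set memory = "transient") might look roughly like the following `_targets.R` script. It is a sketch under stated assumptions: the per-subject CSV layout under `data/`, the column names `condition` and `value`, and the helper functions `load_subject()` and `summarise_subject()` are all hypothetical. `pattern = map(...)` is targets' dynamic branching, which creates one branch per subject so that each branch loads only its own slice of the data.

```r
# _targets.R: minimal sketch. File layout, column names, and helper
# functions are hypothetical.
library(targets)

# Unload target values from memory between steps and re-read them from the
# data store when needed: a little extra time, less RAM.
tar_option_set(memory = "transient")

# Hypothetical helper: read one subject's data (assumes data/<subject>.csv).
load_subject <- function(subject) {
  path <- file.path("data", paste0(subject, ".csv"))
  tidytable::as_tidytable(utils::read.csv(path))
}

# Hypothetical helper: per-condition mean for one subject
# (assumes columns `condition` and `value`).
summarise_subject <- function(df) {
  tidytable::summarize(df, mean_value = mean(value), .by = condition)
}

list(
  # One ID per subject; each becomes its own dynamic branch below.
  tar_target(subjects, c("s01", "s02", "s03")),
  # map() creates one branch per subject, so each branch loads only its slice.
  tar_target(subject_data, load_subject(subjects), pattern = map(subjects)),
  # Summaries are also computed branch by branch.
  tar_target(
    subject_summary,
    summarise_subject(subject_data),
    pattern = map(subject_data)
  )
)
```

Running `targets::tar_make()` in the project directory would then build each subject's branches separately, and `targets::tar_read(subject_summary)` returns the branch results combined.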

