From a stats coder's perspective, performance and capacity (i.e. speed and RAM) depend less on the language than on the packages. As /u/Farther_father mentioned, tidytable is identical to dplyr from a coding perspective, but its efficiency and capacity are far better. So what you said about R's design, S4, Python, Julia, etc. reflects a fundamental misunderstanding of what is going on in the back-end -- Julia is widely assumed to be performant, yet in my experience it was the worst of the three (pandas runs out of memory while polars/tidypolars does not; dplyr runs out of memory while data.table/tidytable does not; and so on -- same language, different packages, different performance).
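To make the "same syntax, different back-end" point concrete, here is a minimal sketch. The data and grouping column are made up for illustration; the claim is only that tidytable accepts the same tidyverse-style verbs as dplyr while delegating the work to data.table:

```r
library(dplyr)      # data-frame back-end
library(tidytable)  # same verbs, data.table back-end

# toy data (hypothetical example, not from the thread)
df <- data.frame(g = rep(letters, each = 1e5), x = rnorm(26e5))

# dplyr version
dplyr_result <- df |>
  dplyr::group_by(g) |>
  dplyr::summarise(m = mean(x))

# tidytable version: identical verbs, evaluated by data.table under the hood
tidytable_result <- tidytable::as_tidytable(df) |>
  tidytable::summarise(m = mean(x), .by = g)
```

The code you write barely changes; only the engine executing it does, which is why memory behavior differs so much between packages within the same language.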
I'll chime in with others to say that using targets can help with the memory load as well. If you partition your data adequately (e.g. grouping by subject), you can take advantage of how targets maps over data so it only loads what it needs. Moreover, if you use the memory = "transient" option, it will unload objects between steps -- adding a little time overhead but saving memory. targets and tidytable together have enabled me to work on pretty sizeable datasets while rarely running into memory issues. In fact, the only time I hit a memory bottleneck was because I hadn't adequately partitioned the data across worker nodes.
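A rough sketch of the pattern described above, using the real targets API (tar_option_set(memory = "transient"), tar_group() with iteration = "group", and pattern = map() for dynamic branching); the data-loading and analysis functions are hypothetical placeholders:

```r
# _targets.R (sketch)
library(targets)

# unload each target's return value from memory between pipeline steps
tar_option_set(memory = "transient")

list(
  # split the data into one branch per subject so downstream targets
  # only ever load one partition at a time
  tar_target(
    by_subject,
    load_my_data() |>            # hypothetical loader
      dplyr::group_by(subject) |>
      tar_group(),
    iteration = "group"
  ),
  # map over the subject partitions; each branch loads only its slice
  tar_target(
    fits,
    fit_model(by_subject),       # hypothetical per-subject analysis
    pattern = map(by_subject)
  )
)
```

With this layout, targets materializes one subject's partition per branch and, with transient memory, drops it again once the step finishes, which is what keeps peak RAM low.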