disk.frame vs targets

disk.frame	Project	targets
5	Mentions	10
592	Stars	866
0.5%	Growth	2.5%
0.0	Activity	9.7
3 months ago	Latest Commit	7 days ago
R	Language	R
GNU General Public License v3.0 or later	License	GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

disk.frame

Posts with mentions or reviews of disk.frame. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-12.

Do you code from memory? Or do you reference things?
1 project | /r/rstats | 31 Mar 2022

Say hello to disk.frame.
How can I read in only two columns from a massive 10+ GB tab file?
1 project | /r/rstats | 20 Jan 2022
Data cleaning/ analysis 100-200 million rows of data. Is this doable in R, or is there another program I should try instead?
2 projects | /r/rstats | 12 Oct 2021

It depends on your hardware, but it should not be a problem. You might look into disk.frame (https://diskframe.com) or similar packages.
Is it possible to have my environment objects and work with them on my local drive instead of RAM?
1 project | /r/Rlanguage | 3 Jul 2021

If that doesn't work, the disk.frame package might help. It is new-ish and not common, but does seem to work with data on disk rather than in memory
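
As a rough illustration of working with data on disk rather than in memory, here is a minimal sketch using disk.frame. The CSV file, its column names (`id`, `group`, `value`) and the worker count are all made up for the example; a real dataset would be far larger than this toy file.

```r
library(disk.frame)
library(dplyr)

# Start background workers; 2 is an arbitrary choice for this sketch.
setup_disk.frame(workers = 2)

# Hypothetical input: write a small CSV so the example is self-contained.
set.seed(1)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:1000, group = rep(c("a", "b"), 500), value = runif(1000)),
  csv_path,
  row.names = FALSE
)

# Convert the CSV into a chunked disk.frame stored in a folder on disk.
df <- csv_to_disk.frame(
  csv_path,
  outdir = file.path(tempdir(), "example.df"),
  overwrite = TRUE
)

# srckeep() restricts which columns are read from disk;
# collect() brings the filtered result into memory as a regular data frame.
result <- df %>%
  srckeep(c("group", "value")) %>%
  filter(value > 0.5) %>%
  collect()
```

Only the two columns named in `srckeep()` are read from the on-disk chunks, which is also one way to approach the "read in only two columns" question listed above.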
We Test PCIe 4.0 Storage: The AnandTech 2021 SSD Benchmark Suite
1 project | news.ycombinator.com | 2 Feb 2021

> The speeds were just stunning to say the least at 15GB/s.
That is amazing. That is around DDR4-1866 speeds, and not far from DDR4-2666 (~21 GB/s). At those speeds I would happily work with dataframes sitting on the disk rather than in memory [1, 2]. Did you benchmark RAID 0 with less than four disks?
[1] R: https://github.com/xiaodaigh/disk.frame

targets

Posts with mentions or reviews of targets. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-09-07.

Advice on Best Practices
1 project | /r/RStudio | 27 Sep 2022

Is this it https://github.com/ropensci/targets?
Does anyone else feel in a tricky spot about their use of R?
3 projects | /r/rstats | 7 Sep 2022

I'll chime in with others to say that using targets can help with the memory load as well. If you partition your data adequately (e.g. grouping by subjects), you can take advantage of the way targets maps data so it only loads what it needs to. Moreover, if you use the memory = "transient" option, it will unload objects between steps -- adding a little bit of time overhead but saving you on memory. targets and tidytable together have enabled me to work on pretty sizeable datasets while rarely running into memory issues. In fact, the only time I ran into a data memory hog was because I didn't adequately partition the data across worker nodes.
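
A pipeline along the lines this comment describes might look roughly like the sketch below. The subject IDs, the simulated data and the target names are hypothetical; only `tar_option_set(memory = "transient")`, `tar_target()` and `pattern = map()` come from targets itself.

```r
# _targets.R: a hypothetical pipeline file; run it with targets::tar_make().
library(targets)

# Unload each target from memory once the current step no longer needs it.
tar_option_set(memory = "transient")

list(
  # Hypothetical partition key: one entry per subject.
  tar_target(subjects, c("s1", "s2", "s3")),

  # One branch per subject; a real pipeline would load that subject's data here.
  tar_target(
    subject_data,
    data.frame(subject = subjects, score = rnorm(10)),
    pattern = map(subjects)
  ),

  # Each branch depends on a single subject's data, not the whole dataset.
  tar_target(
    subject_mean,
    data.frame(
      subject = subject_data$subject[1],
      mean_score = mean(subject_data$score)
    ),
    pattern = map(subject_data)
  )
)
```

Because `subject_mean` maps over the branches of `subject_data`, each branch only needs the data for its own subject, which is the kind of partitioning the comment credits for keeping memory use down.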
What are your favorite R Libraries?
1 project | /r/rstats | 1 Aug 2022

targets
Is there a better way to update an entire series of scripts?
1 project | /r/rstats | 11 Apr 2022

I highly recommend the holy grail of workflow orchestrators / executors in the R ecosystem: targets.
The new Drake ropensci targets: Function-oriented Make-like declarative workflows for R {R}
2 projects | /r/Sciatro | 15 Nov 2021
How do you manage, distribute and schedule jobs written in R?
1 project | /r/dataengineering | 7 Oct 2021

That said, you might want to check out the ‘targets’ package, which provides a DSL for specifying complex workflow descriptions in R. When repeatedly running the same jobs on changing data, this package helps ensure that only necessary work is performed (suitable intermediate results are reused), and scripts are run reproducibly. This might help with scheduling.
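
To make the "only necessary work is performed" point concrete, here is a small sketch that builds a two-step pipeline twice in a throwaway directory. The target names (`raw`, `total`) and the toy data are invented for illustration.

```r
library(targets)

# Work in a temporary directory so the sketch does not touch a real project.
project_dir <- tempfile("targets-demo-")
dir.create(project_dir)
old_wd <- setwd(project_dir)

# Write a minimal _targets.R file with two dependent targets.
tar_script({
  list(
    tar_target(raw, data.frame(x = 1:10)),
    tar_target(total, sum(raw$x))
  )
}, ask = FALSE)

tar_make()                        # first run: builds both targets
outdated_after <- tar_outdated()  # names of targets that still need to run
tar_make()                        # second run: up-to-date targets are skipped
total_value <- tar_read(total)    # read the stored result back from the data store

setwd(old_wd)
```

After the first `tar_make()`, nothing is reported as outdated, so the second call has no work to do; changing the command of `raw` would invalidate both `raw` and `total`.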
How do I do something like this as a parallel programming in R?
1 project | /r/rstats | 29 Sep 2021

It may be worth it to put these individual steps into a targets pipeline. targets is designed to support parallelization with future and make it easier to visualize downstream dependencies.
Tips re: workflow, organization, file hygiene and similar?
1 project | /r/rstats | 19 Aug 2021

Given your requirements, I recommend you check out ‘targets’, which specifically addresses the needs of reusable workflows in R, and it seems like it fits your requirements to a T.
Your impression of {targets}? (r package)
3 projects | /r/Rlanguage | 2 May 2021

The targets package is the official successor to Drake, and has the same primary author (Will Landau). He has explained why he created targets, which includes stronger guardrails for users and better UX.
Data engineering with R?
2 projects | /r/rstats | 18 Apr 2021

I use it for ETL. I use targets as the workflow management software, and, like others, have a cron job set up to run nightly builds.
