R Data Science

Open-source R projects categorized as Data Science

Top 8 R Data Science Projects

  • GitHub repo awesome-R

    A curated list of awesome R packages, frameworks and software.

  • GitHub repo drake

    An R-focused pipeline toolkit for reproducibility and high-performance computing (by ropensci)

    Project mention: Your impression of {targets}? (r package) | reddit.com/r/Rlanguage | 2021-05-02

    The targets package is the official successor to drake and has the same primary author (Will Landau). He has explained why he created targets; the reasons include stronger guardrails for users and better UX.

  • GitHub repo tidyverse

    Easily install and load packages from the tidyverse

    Project mention: R packages installation error | reddit.com/r/RStudio | 2021-07-15

  • GitHub repo janitor

    Simple tools for data cleaning in R

  • GitHub repo disk.frame

    Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

    Project mention: Data cleaning/ analysis 100-200 million rows of data. Is this doable in R, or is there another program I should try instead? | reddit.com/r/rstats | 2021-10-12

    It depends on your hardware, but it should not be a problem. You might look into disk.frame (https://diskframe.com) or similar packages.

  • GitHub repo targets

    Function-oriented Make-like declarative workflows for R

    Project mention: How do you manage, distribute and schedule jobs written in R? | reddit.com/r/dataengineering | 2021-10-07

    That said, you might want to check out the ‘targets’ package, which provides a DSL for specifying complex workflows in R. When repeatedly running the same jobs on changing data, this package helps ensure that only necessary work is performed (suitable intermediate results are reused) and that scripts are run reproducibly. This might help with scheduling.

  • GitHub repo causalglm

    Interpretable and model-robust causal inference for heterogeneous treatment effects using generalized linear working models with targeted machine-learning

    Project mention: [Q] Sensitivity of (Causal) Inference to Nonlinear Functional Form | reddit.com/r/statistics | 2021-09-28

    Why not both? https://tlverse.org/causalglm/ (Will replace this with a more informative comment when I have free time later today)

  • GitHub repo COVID19Algeria

    This repository contains datasets about Coronavirus COVID-19 in Algeria, with daily updates and the evolution of the virus in the country by province, date, and other criteria that are lacking in official resources. The data may help researchers or doctors analyse the disease and keep track of its changes.
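
The mention under the targets entry above describes rerunning only the work that is out of date. As a rough sketch of what that could look like, the following hypothetical `_targets.R` file declares a small pipeline. The file path `data/raw.csv` and the helper `summarise_rows()` are assumptions made for illustration and do not come from any of the listed projects.

```r
# _targets.R (hypothetical pipeline file; paths and helper names are illustrative)
library(targets)

# Hypothetical helper: report the dimensions of a data frame.
summarise_rows <- function(data) {
  data.frame(n_rows = nrow(data), n_cols = ncol(data))
}

# The pipeline is a list of targets. Each target names a result and the
# command that produces it; dependencies are inferred from the commands.
list(
  # Track the input file itself, so a change to its contents
  # marks the downstream targets as outdated.
  tar_target(raw_file, "data/raw.csv", format = "file"),
  tar_target(raw_data, read.csv(raw_file)),
  # clean_names() from janitor standardises the column names.
  tar_target(clean_data, janitor::clean_names(raw_data)),
  tar_target(summary_table, summarise_rows(clean_data))
)
```

Assuming this file sits in the project root, calling `targets::tar_make()` builds each target, and calling it again with unchanged inputs should skip the targets that are already up to date. `targets::tar_outdated()` lists the targets that would rerun, and `targets::tar_read(summary_table)` retrieves a stored result.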

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-10-12.


What are some of the best open-source Data Science projects in R? This list will help you:

   Project          Stars
1  awesome-R        4,791
2  drake            1,321
3  tidyverse        1,111
4  janitor          1,060
5  disk.frame         559
6  targets            490
7  causalglm            6
8  COVID19Algeria       1