R Data Science

Open-source R projects categorized as Data Science

Top 23 R Data Science Projects

  • awesome-R

    A curated list of awesome R packages, frameworks and software.

    Project mention: Where to learn R? | /r/rprogramming | 2023-05-07

  • r4ds

    R for data science: a book

    Project mention: Learning R & statistics | /r/Rlanguage | 2023-07-11

    One of the best free resources is the R4DS book by Hadley Wickham. Make sure you start with the in-progress second edition: https://r4ds.hadley.nz/

  • DataScienceR

    a curated list of R tutorials for Data Science, NLP and Machine Learning

  • tidyverse

    Easily install and load packages from the tidyverse

    Project mention: Discrimination of R in companies | /r/datascience | 2022-11-20

    What’s the original? Tidyverse, as it exists now, had its initial release in 2016. Pandas' initial release was in 2009. AFAIK, ggplot2 and reshape are the only individual Tidyverse packages older than that.

  • drake

    An R-focused pipeline toolkit for reproducibility and high-performance computing (by ropensci)

  • janitor

    simple tools for data cleaning in R

    Project mention: Working with columns names that are numbers (in this case, years) | /r/RStudio | 2022-10-15

    I would just clean the names and work with those. Then there is no need to use backticks. Read about the function clean_names in the janitor vignette: https://github.com/sfirke/janitor

  • mlr3

    mlr3: Machine Learning in R - next generation

    Project mention: Trying to create a KNN model, takes too long!! | /r/rstats | 2023-04-12

    mlr3 would be a competing modern framework to tidymodels that is also used. I know little about it except that it exists.

  • targets

    Function-oriented Make-like declarative workflows for R

  • engsoccerdata

    English and European soccer results 1871-2022

  • disk.frame

    Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

  • modeltime

    Modeltime unlocks time series forecast models and machine learning in one framework

    Project mention: Cross Validating Time Series Models in R | /r/rstats | 2023-03-16

    Check out the ModelTime package: https://business-science.github.io/modeltime/

  • voice-gender

    Gender recognition by voice and speech analysis

    Project mention: I need help for a project, Trans-voice database or library (vocal training/voice recognition) | /r/transvoice | 2022-10-17

  • fastverse

    An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R

  • bruceR

    📦 BRoadly Useful Convenient and Efficient R functions that BRing Users Concise and Elegant R data analyses.

    Project mention: Help with Lavaan moderated mediation analysis in R | /r/rstats | 2023-04-11

  • targets-tutorial

    Short course on the targets R package

  • tweetbotornot2

    🔍🐦🤖 Detect Twitter Bots!

  • gittargets

    Data version control for reproducible analysis pipelines in R with {targets}.

    Project mention: Feedback needed: building Git for data that commits only diffs (for storage efficiency on large repositories), even without full checkouts of the datasets | /r/datascience | 2023-05-27

    This was attempted in an R package called gittargets

  • targets-minimal

    A minimal example data analysis project with the targets R package

  • priceR

    Economics and Pricing in R

  • aorsf

    Accelerated Oblique Random Survival Forests

    Project mention: Peer-Reviewing Statistical R Packages | /r/CompSocial | 2022-11-30

    aorsf: Accelerated Oblique Random Survival Forests, by Byron Jaeger, Nicholas Pajewski, and Sawyer Welden, reviewed by Lukas Burk, Marvin N. Wright, edited by Toby Dylan Hocking

  • causalglm

    Interpretable and model-robust causal inference for heterogeneous treatment effects using generalized linear working models with targeted machine-learning

  • R-Fundamentals

    D-Lab's 4 part, 8 hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

  • R-Guide

    R Guide

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-07-11.

What are some of the best open-source Data Science projects in R? This list will help you:

Rank Project Stars
1 awesome-R 5,580
2 r4ds 4,119
3 DataScienceR 1,931
4 tidyverse 1,492
5 drake 1,329
6 janitor 1,293
7 mlr3 814
8 targets 799
9 engsoccerdata 729
10 disk.frame 590
11 modeltime 471
12 voice-gender 315
13 fastverse 199
14 bruceR 131
15 targets-tutorial 89
16 tweetbotornot2 87
17 gittargets 67
18 targets-minimal 57
19 priceR 50
20 aorsf 22
21 causalglm 13
22 R-Fundamentals 6
23 R-Guide 4