Top 23 R Open-Source Projects

  • ML-For-Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13

    - https://github.com/microsoft/ML-For-Beginners

    Also check out this list Pitt puts out every year:

  • Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    Project mention: "xAI will open source Grok" | news.ycombinator.com | 2024-03-11
  • dash

    Data Apps & Dashboards for Python. No JavaScript Required.

    Project mention: dash VS solara - a user suggested alternative | libhunt.com/r/dash | 2023-10-13
  • Prophet

    Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

    Project mention: Moirai: A Time Series Foundation Model for Universal Forecasting | news.ycombinator.com | 2024-03-25


    "Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well."
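    The quoted description amounts to an additive model, y(t) = g(t) + s(t) + h(t), where g is the trend, s is the seasonal terms and h is the holiday effects. Prophet's documentation describes the seasonal terms as partial Fourier sums, meaning sums of sines and cosines over a fixed period. The sketch below only evaluates such a model for fixed, made-up coefficients, to show how the pieces add up. It does no fitting, and it uses one straight-line trend instead of Prophet's piecewise one. It is not Prophet's API, and the function names and all numbers are hypothetical.

```python
import math

def fourier_seasonality(t, period, coeffs):
    """Sum of a_n*cos(2*pi*n*t/P) + b_n*sin(2*pi*n*t/P) for n = 1..N.

    `coeffs` holds one (a_n, b_n) pair per harmonic, so N = len(coeffs).
    """
    total = 0.0
    for n, (a, b) in enumerate(coeffs, start=1):
        angle = 2 * math.pi * n * t / period
        total += a * math.cos(angle) + b * math.sin(angle)
    return total

def additive_forecast(t, k, m, yearly, weekly, holidays):
    """y(t) = g(t) + s(t) + h(t) with a single linear trend g(t) = k*t + m.

    `t` is a day index; `holidays` maps day indices to additive effects.
    """
    trend = k * t + m
    seasonal = (fourier_seasonality(t, 365.25, yearly)
                + fourier_seasonality(t, 7.0, weekly))
    holiday = holidays.get(t, 0.0)
    return trend + seasonal + holiday

# Hypothetical coefficients, chosen only to illustrate the arithmetic.
yearly = [(10.0, 0.0)]              # one yearly harmonic
weekly = [(0.0, 2.0), (1.0, 0.0)]   # two weekly harmonics
holidays = {359: 25.0}              # a one-day holiday bump
print(additive_forecast(0, 0.5, 100.0, yearly, weekly, holidays))  # → 111.0
```

    Because the components are simply summed, each one can be inspected on its own. This is the main practical appeal of an additive model.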

  • LightGBM

    A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

    Project mention: SIRUS.jl: Interpretable Machine Learning via Rule Extraction | /r/Julia | 2023-06-29

    SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. The algorithm does this by first fitting a random forest and then converting this forest to rules. Furthermore, the algorithm is stable and achieves a predictive performance that is comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.

  • ds-cheatsheets

    List of Data Science Cheatsheets to rule the world

  • mal

    mal - Make a Lisp

    Project mention: Ask HN: Is Lisp Simple? | news.ycombinator.com | 2023-08-21

    >Would be interesting to see how the interpreter works actually...

    It's quite easy to see, there are interpreters for Lisp in like 20 lines or so.

    Here's a good one:


    (It has the full code in a link towards the bottom)

    There's also this:


  • hugo-blox-builder

    😍 EASILY BUILD THE WEBSITE YOU WANT - NO CODE, JUST MARKDOWN BLOCKS! Easily create any kind of website with blocks, no code needed. One app, no dependencies, no JS

  • catboost

    A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

    Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
  • metaflow

    🚀 Build and manage real-life ML, AI, and data science projects with ease!

    Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05
  • H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

    Project mention: Really struggling with open source models | /r/LocalLLaMA | 2023-07-12

    I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/

  • ggplot2

    An implementation of the Grammar of Graphics in R

    Project mention: ggplot2 | news.ycombinator.com | 2024-03-01
  • awesome-R

    A curated list of awesome R packages, frameworks and software.

    Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13
  • FriendsDontLetFriends

    Friends don't let friends make certain types of data visualization - What are they and why are they bad.

    Project mention: Friends don't let friends make certain types of data visualizations | news.ycombinator.com | 2023-11-19
  • papermill

    📚 Parameterize, execute, and analyze notebooks

    Project mention: Spreadsheet errors can have disastrous consequences – yet we keep making them | news.ycombinator.com | 2024-01-25

    Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...

    Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...

    nteract/papermill: https://github.com/nteract/papermill :

    > papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]

    > This opens up new opportunities for how notebooks can be used. For example:

    > - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.

    "The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :

    > Computational notebook speedrun ideas:
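    The papermill quote describes running one notebook with different parameter values, as in the month-end report example. According to papermill's documentation, it first finds the cell tagged `parameters` and treats it as defaults. It then inserts a new cell tagged `injected-parameters` right after that cell, or at the top if no cell is tagged, and executes the notebook. The stdlib-only sketch below mimics just the injection step on an in-memory notebook in nbformat v4 JSON layout, and nothing is executed. `inject_parameters` is a hypothetical name, not papermill's API; papermill's real entry point is `papermill.execute_notebook`.

```python
import copy

def inject_parameters(notebook, params):
    """Return a copy of `notebook` with an "injected-parameters" cell added.

    The new cell goes right after the first cell tagged "parameters",
    or at the top of the notebook when no cell carries that tag.
    """
    nb = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    # repr() writes Python literals, so this assumes a Python kernel.
    source = "".join(f"{name} = {value!r}\n" for name, value in params.items())
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    insert_at = 0
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            insert_at = i + 1
            break
    nb["cells"].insert(insert_at, new_cell)
    return nb

# A hypothetical two-cell report notebook whose first cell holds defaults.
report = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "report_date = '2024-01-01'\n",
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(report_date)\n",
         "outputs": [], "execution_count": None},
    ],
}

month_end = inject_parameters(report, {"report_date": "2024-01-31"})
print(month_end["cells"][1]["source"], end="")  # → report_date = '2024-01-31'
```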

  • dplyr

    dplyr: A grammar of data manipulation

    Project mention: Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL | news.ycombinator.com | 2024-03-15

    That's great feedback, thanks!

    This tool definitely comes from a place of personal need - beyond just handling large files, I've also never really gelled well with the Excel/Google Sheet model of changing data in place as if you were editing text. I'm a Data Scientist and always preferred the chained data transforms you see in things like dplyr (https://dplyr.tidyverse.org/) or Polars (https://pola.rs/) and I feel this tool maps very closely to the chained model.

    Also, thank you for the feature requests! Those would all be very useful - we'll put them on the roadmap.
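    The commenter contrasts editing spreadsheet cells in place with the chained transforms of dplyr and Polars, where each step takes a table and returns a new one. The sketch below imitates that style with a tiny, hypothetical `Table` class over a list of dicts. The method names echo dplyr verbs (`filter`, `mutate`, `arrange`, `select`), but this is not the API of either library.

```python
class Table:
    """A list-of-dicts table whose verbs return new Tables (no in-place edits)."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]  # copy rows so inputs never change

    def filter(self, predicate):
        """Keep rows for which predicate(row) is true."""
        return Table(r for r in self.rows if predicate(r))

    def mutate(self, **columns):
        """Add or replace columns; each keyword maps a name to fn(row).

        Unlike dplyr, a column created here cannot be referenced by another
        column in the same call, because every fn sees the original row.
        """
        return Table({**r, **{name: fn(r) for name, fn in columns.items()}}
                     for r in self.rows)

    def arrange(self, key, descending=False):
        """Sort rows by one column."""
        return Table(sorted(self.rows, key=lambda r: r[key], reverse=descending))

    def select(self, *names):
        """Keep only the named columns, in the given order."""
        return Table({n: r[n] for n in names} for r in self.rows)

sales = Table([
    {"region": "north", "units": 12, "price": 2.5},
    {"region": "south", "units": 3, "price": 4.0},
    {"region": "east", "units": 8, "price": 3.0},
])

result = (
    sales
    .filter(lambda r: r["units"] >= 5)
    .mutate(revenue=lambda r: r["units"] * r["price"])
    .arrange("revenue", descending=True)
    .select("region", "revenue")
)
print(result.rows)
# → [{'region': 'north', 'revenue': 30.0}, {'region': 'east', 'revenue': 24.0}]
```

    Because no step modifies `sales`, every intermediate result can be inspected or reused. This is the property that makes the chained model easier to reason about than in-place edits.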

  • r4ds

    R for data science: a book

    Project mention: Ask HN: Learning Maths from the Ground Up | news.ycombinator.com | 2024-03-24
  • wave

    Realtime Web Apps and Dashboards for Python and R (by h2oai)

    Project mention: Streamlit alternatives but for Rust? | /r/rust | 2023-10-01

    https://streamlit.io/ https://wave.h2o.ai/ https://reflex.dev/

  • awesome-conformal-prediction

    A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.

    Project mention: Forecasts need to have error bars | news.ycombinator.com | 2023-12-04

    Let me suggest a solution https://github.com/valeman/awesome-conformal-prediction
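    Split conformal prediction is one of the simplest methods covered by resources like the linked list, and it directly answers "forecasts need error bars". Hold out a calibration set of size n and take the absolute residuals of any point model on it. Then widen each new prediction by the ceil((n + 1)(1 - α))-th smallest residual. If calibration and test points are exchangeable, the resulting interval covers the true value with probability at least 1 - α. The sketch below is stdlib-only and uses a hypothetical stand-in model with made-up integer residuals. Time series often violate exchangeability, so forecasting applications usually need adapted variants.

```python
import math

def conformal_radius(residuals, alpha):
    """Half-width for split conformal prediction at miscoverage level alpha.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest absolute residual,
    or infinity when that rank exceeds n (too few calibration points).
    """
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]

def predict_interval(model, x, calib_x, calib_y, alpha=0.1):
    """Return (lower, upper) around model(x) using held-out calibration data."""
    residuals = [y - model(xi) for xi, y in zip(calib_x, calib_y)]
    q = conformal_radius(residuals, alpha)
    y_hat = model(x)
    return y_hat - q, y_hat + q

# Hypothetical already-fitted model and 20-point calibration set.
def model(x):
    return 2.0 * x

noise = [-3, 1, 0, 2, -1, 4, -2, 1, 0, 5, -1, 2, 3, -4, 1, 0, -2, 6, 1, -1]
calib_x = list(range(20))
calib_y = [2.0 * x + e for x, e in zip(calib_x, noise)]

print(predict_interval(model, 10, calib_x, calib_y, alpha=0.1))  # → (15.0, 25.0)
```

    Here n = 20 and α = 0.1, so the radius is the 19th smallest absolute residual, which is 5.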

  • ML-Workspace

    🛠 All-in-one web-based IDE specialized for machine learning and data science.

  • rmarkdown

    Dynamic Documents for R

    Project mention: Pandoc | news.ycombinator.com | 2024-01-28

    I'm surprised to see no one has pointed out [RMarkdown + RStudio](https://rmarkdown.rstudio.com) as one way to immediately interface with Pandoc.

    I used to write papers and slides in LaTeX (using vim, because who needs render previews), then eventually switched to Pandoc (also vim). I eventually discovered RMarkdown+RStudio. I was looking for a nice way to format a simple table and discovered that rmarkdown had nice extensions of basic markdown (this was many years ago so maybe that is incorporated into vanilla markdown/pandoc).

    The RMarkdown page claims:

    > R Markdown supports dozens of static and dynamic output formats including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more.

    ...which I think is largely due to using pandoc as the core generator.

    RStudio shows you the pandoc command it runs to generate your document, which I've used to figure out the pandoc command I want to run when I've switched to using pandoc directly.

    This is a bit of a "lazy" way to interact with pandoc. Maybe the "laziest" aspect: when I get a new computer, I can install the entire stack by installing RStudio, then opening a new rmarkdown document. RStudio asks whether I'd like to install all the necessary libraries -- click "yes" and that's it. Maybe that sounds silly, but it used to be a lot of work to manage your LaTeX install. These days I greatly favor things that save me time, which seems to get more precious every year.

  • Data-science-best-resources

    Carefully curated resource links for data science in one place

  • DifferentialEquations.jl

    Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and more in Julia.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-25.

What are some of the best open-source R projects? This list will help you:

Rank  Project                       GitHub stars
1     ML-For-Beginners              66,722
2     Apache Spark                  38,249
3     dash                          20,404
4     Prophet                       17,702
5     LightGBM                      16,006
6     ds-cheatsheets                12,570
7     mal                           9,792
8     hugo-blox-builder             7,750
9     catboost                      7,716
10    metaflow                      7,530
11    H2O                           6,705
12    ggplot2                       6,302
13    awesome-R                     5,774
14    FriendsDontLetFriends         5,655
15    papermill                     5,607
16    dplyr                         4,645
17    r4ds                          4,333
18    wave                          3,848
19    awesome-conformal-prediction  3,334
20    ML-Workspace                  3,315
21    rmarkdown                     2,790
22    Data-science-best-resources   2,750
23    DifferentialEquations.jl      2,737