Top 23 R Open-Source Projects

ML-For-Beginners

28 66,806 8.0 HTML

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13

- https://github.com/microsoft/ML-For-Beginners
Also check out this list Pitt puts out every year:

Apache Spark

101 38,249 10.0 Scala

Apache Spark - A unified analytics engine for large-scale data processing

Project mention: "xAI will open source Grok" | news.ycombinator.com | 2024-03-11

dash

56 20,434 9.6 Python

Data Apps & Dashboards for Python. No JavaScript Required.

Project mention: dash VS solara - a user suggested alternative | libhunt.com/r/dash | 2023-10-13

Prophet

221 17,720 6.2 Python

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Project mention: Moirai: A Time Series Foundation Model for Universal Forecasting | news.ycombinator.com | 2024-03-25

https://facebook.github.io/prophet/
"Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well."
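The additive structure described in the quote above can be sketched in a few lines. This is only an illustration of summing a trend, a seasonal term, and a holiday effect. It is not Prophet's API or its fitting procedure, and every function name and parameter value here is a made-up assumption.

```python
import math

# Hypothetical components; names and default values are illustrative only.
def trend(t, slope=0.5, intercept=10.0):
    """Linear growth component g(t)."""
    return intercept + slope * t

def weekly_seasonality(t, amplitude=2.0):
    """Seasonal component s(t) with a 7-day period."""
    return amplitude * math.sin(2 * math.pi * t / 7)

def holiday_effect(t, holidays, bump=5.0):
    """Holiday component h(t): a fixed bump on listed days."""
    return bump if t in holidays else 0.0

def additive_forecast(t, holidays=frozenset()):
    """Additive model: y(t) = g(t) + s(t) + h(t)."""
    return trend(t) + weekly_seasonality(t) + holiday_effect(t, holidays)

if __name__ == "__main__":
    for day in range(0, 15, 7):
        print(day, round(additive_forecast(day, holidays={7}), 3))
```

Because the components are simply added, each one can be inspected or swapped on its own, which is the main appeal of an additive decomposition.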

LightGBM

11 16,025 9.2 C++

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Project mention: SIRUS.jl: Interpretable Machine Learning via Rule Extraction | /r/Julia | 2023-06-29

SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. It achieves this by first fitting a random forest and then converting this forest to rules. Furthermore, the algorithm is stable and achieves a predictive performance that is comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.

ds-cheatsheets

2 12,570 0.0

List of Data Science Cheatsheets to rule the world

mal

94 9,792 0.0 Assembly

mal - Make a Lisp

Project mention: Ask HN: Is Lisp Simple? | news.ycombinator.com | 2023-08-21

>Would be interesting to see how the interpreter works actually...
It's quite easy to see, there are interpreters for Lisp in like 20 lines or so.
Here's a good one:
https://norvig.com/lispy.html
(It has the full code in a link towards the bottom)
There's also this:
https://github.com/kanaka/mal
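To give a feel for how small such an interpreter can be, here is a minimal sketch in the spirit of the linked examples. It is not the code from either link: the function names and the tiny set of built-ins are assumptions chosen for illustration, and it omits error handling, strings, and quoting.

```python
import operator as op

def tokenize(src):
    """Split source text into parentheses and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build a nested list (the syntax tree) from a token list."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return token  # anything else is a symbol

# Illustrative built-ins only; a real Lisp has many more.
GLOBAL_ENV = {"+": op.add, "-": op.sub, "*": op.mul, "<": op.lt, "=": op.eq}

def evaluate(x, env=GLOBAL_ENV):
    """Evaluate a parsed expression in an environment (a dict)."""
    if isinstance(x, str):          # symbol lookup
        return env[x]
    if not isinstance(x, list):     # number literal
        return x
    head = x[0]
    if head == "if":
        _, test, then, alt = x
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":
        _, name, expr = x
        env[name] = evaluate(expr, env)
        return None
    if head == "lambda":
        _, params, body = x
        # Bind arguments in a copy of the defining environment at call time.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    fn = evaluate(head, env)
    return fn(*[evaluate(arg, env) for arg in x[1:]])

def run(src, env=GLOBAL_ENV):
    return evaluate(parse(tokenize(src)), env)

if __name__ == "__main__":
    run("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))")
    print(run("(fact 5)"))
```

The whole interpreter is three steps: tokenize, parse into nested lists, and evaluate recursively. The mal project walks through a much fuller version of the same idea.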

hugo-blox-builder

4 7,766 9.3 HTML

😍 EASILY BUILD THE WEBSITE YOU WANT - NO CODE, JUST MARKDOWN BLOCKS! Easily create any kind of website with blocks, no code required. One app, no dependencies, no JS.

catboost

8 7,731 9.9 Python

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05

metaflow

24 7,559 9.2 Python

🚀 Build and manage real-life ML, AI, and data science projects with ease!

Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05

H2O

10 6,721 9.7 Jupyter Notebook

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Project mention: Really struggling with open source models | /r/LocalLLaMA | 2023-07-12

I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/

ggplot2

62 6,311 9.4 R

An implementation of the Grammar of Graphics in R

Project mention: ggplot2 | news.ycombinator.com | 2024-03-01

awesome-R

6 5,780 4.0 R

A curated list of awesome R packages, frameworks and software.

Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13

FriendsDontLetFriends

4 5,655 7.3 R

Friends don't let friends make certain types of data visualization - What are they and why are they bad.

Project mention: Friends don't let friends make certain types of data visualizations | news.ycombinator.com | 2023-11-19

papermill

26 5,615 7.9 Python

📚 Parameterize, execute, and analyze notebooks

Project mention: Spreadsheet errors can have disastrous consequences – yet we keep making them | news.ycombinator.com | 2024-01-25

Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...
Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...
nteract/papermill: https://github.com/nteract/papermill :
> papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]
> This opens up new opportunities for how notebooks can be used. For example:
> - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.
"The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :
> Computational notebook speedrun ideas:

dplyr

40 4,652 7.4 R

dplyr: A grammar of data manipulation

Project mention: Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL | news.ycombinator.com | 2024-03-15

That's great feedback, thanks!
This tool definitely comes from a place of personal need - beyond just handling large files, I've also never really gelled well with the Excel/Google Sheet model of changing data in place as if you were editing text. I'm a Data Scientist and always preferred the chained data transforms you see in things like dplyr (https://dplyr.tidyverse.org/) or Polars (https://pola.rs/) and I feel this tool maps very closely to the chained model.
Also, thank you for the feature requests! Those would all be very useful - we'll put them on the roadmap.
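The "chained model" mentioned in the comment above can be illustrated without any library. The sketch below is not dplyr's or Polars's API. The `Table` class, its verbs, and the sample data are hypothetical, and the only point is that each step returns a new table instead of editing data in place.

```python
from collections import defaultdict

class Table:
    """Hypothetical wrapper around a list of row dicts; every verb returns a new Table."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]  # copy so the input is never mutated

    def filter(self, predicate):
        """Keep only the rows for which predicate(row) is true."""
        return Table(r for r in self.rows if predicate(r))

    def mutate(self, **new_columns):
        """Add columns computed from each row."""
        return Table(
            {**r, **{name: fn(r) for name, fn in new_columns.items()}}
            for r in self.rows
        )

    def summarise(self, by, **aggregates):
        """Group rows by one column and reduce each group to a single row."""
        groups = defaultdict(list)
        for r in self.rows:
            groups[r[by]].append(r)
        return Table(
            {by: key, **{name: fn(group) for name, fn in aggregates.items()}}
            for key, group in groups.items()
        )

# Made-up sample data for the demonstration.
sales = Table([
    {"region": "north", "units": 3, "price": 2.0},
    {"region": "south", "units": 5, "price": 1.5},
    {"region": "north", "units": 4, "price": 2.5},
    {"region": "south", "units": 0, "price": 9.0},
])

result = (
    sales
    .filter(lambda r: r["units"] > 0)
    .mutate(revenue=lambda r: r["units"] * r["price"])
    .summarise("region", total=lambda g: sum(r["revenue"] for r in g))
)

if __name__ == "__main__":
    print(result.rows)
```

The original `sales` table is left untouched, so any intermediate step can be inspected or rerun, which is the contrast with editing cells in a spreadsheet.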

r4ds

165 4,339 8.7 R

R for data science: a book

Project mention: Ask HN: Learning Maths from the Ground Up | news.ycombinator.com | 2024-03-24

wave

21 3,852 9.2 Python

Realtime Web Apps and Dashboards for Python and R (by h2oai)

Project mention: Streamlit alternatives but for Rust? | /r/rust | 2023-10-01

https://streamlit.io/ https://wave.h2o.ai/ https://reflex.dev/

awesome-conformal-prediction

6 3,358 9.5

A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.

Project mention: Dive Deep into Conformal Prediction with This Ultimate Resource Compilation | news.ycombinator.com | 2024-04-15

ML-Workspace

7 3,317 2.7 Jupyter Notebook

🛠 All-in-one web-based IDE specialized for machine learning and data science.

rmarkdown

38 2,795 7.6 R

Dynamic Documents for R

Project mention: Pandoc | news.ycombinator.com | 2024-01-28

I'm surprised to see no one has pointed out [RMarkdown + RStudio](https://rmarkdown.rstudio.com) as one way to immediately interface with Pandoc.
I used to write papers and slides in LaTeX (using vim, because who needs render previews), then eventually switched to Pandoc (also vim). I eventually discovered RMarkdown+RStudio. I was looking for a nice way to format a simple table and discovered that rmarkdown had nice extensions of basic markdown (this was many years ago so maybe that is incorporated into vanilla markdown/pandoc).
The RMarkdown page claims:
> R Markdown supports dozens of static and dynamic output formats including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more.
...which I think is largely due to using pandoc as the core generator.
RStudio shows you the pandoc command it runs to generate your document, which I've used to figure out the pandoc command I want to run when I've switched to using pandoc directly.
This is a bit of a "lazy" way to interact with pandoc. Maybe the "laziest" aspect: when I get a new computer, I can install the entire stack by installing RStudio, then opening a new rmarkdown document. RStudio asks whether I'd like to install all the necessary libraries -- click "yes" and that's it. Maybe that sounds silly but it used to be a lot of work to manage your LaTeX install. These days I greatly favor things that save me time, which seems to get more precious every year.

Data-science-best-resources

2 2,750 0.0

Carefully curated resource links for data science in one place

DifferentialEquations.jl

6 2,746 7.3 Julia

Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and more in Julia.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-15.

R related posts

Dive Deep into Conformal Prediction with This Ultimate Resource Compilation
1 project | news.ycombinator.com | 15 Apr 2024
How to generate a great website and reference manual for your R package
1 project | dev.to | 10 Apr 2024
Fortran on WebAssembly
2 projects | news.ycombinator.com | 5 Apr 2024
Moirai: A Time Series Foundation Model for Universal Forecasting
2 projects | news.ycombinator.com | 25 Mar 2024
Ask HN: Learning Maths from the Ground Up
3 projects | news.ycombinator.com | 24 Mar 2024
RStudio: Integrated development environment (IDE) for R
6 projects | news.ycombinator.com | 20 Mar 2024
"xAI will open source Grok"
3 projects | news.ycombinator.com | 11 Mar 2024

Index

What are some of the best open-source R projects? This list will help you:

#	Project	Stars
1	ML-For-Beginners	66,806
2	Apache Spark	38,249
3	dash	20,434
4	Prophet	17,720
5	LightGBM	16,025
6	ds-cheatsheets	12,570
7	mal	9,792
8	hugo-blox-builder	7,766
9	catboost	7,731
10	metaflow	7,559
11	H2O	6,721
12	ggplot2	6,311
13	awesome-R	5,780
14	FriendsDontLetFriends	5,655
15	papermill	5,615
16	dplyr	4,652
17	r4ds	4,339
18	wave	3,852
19	awesome-conformal-prediction	3,358
20	ML-Workspace	3,317
21	rmarkdown	2,795
22	Data-science-best-resources	2,750
23	DifferentialEquations.jl	2,746