Top 23 R Open-Source Projects
-
Project mention: Learn Machine Learning with these GitHub repositories | news.ycombinator.com | 2025-01-15
*Learn Machine Learning with these amazing GitHub repositories!*
1⃣ [ML for Beginners](https://github.com/microsoft/ML-For-Beginners) by Microsoft
-
Project mention: Unveiling the Apache License 2.0: A Deep Dive into Open Source Freedom | dev.to | 2025-03-11
One of the key attributes of Apache License 2.0 is its flexible nature. Permitting use in both proprietary and open source environments, it has become the go-to choice for innovative projects ranging from the Apache HTTP Server to large-scale initiatives like Apache Spark and Hadoop. This flexibility is not solely legal; it is also philosophical. The license is designed to encourage transparency and maintain a healthy balance between freedom and accountability, ultimately making it easier for developers to adapt and contribute without restrictive legal barriers. Another modern twist discussed in the article is the concept of dual licensing. Dual licensing can offer an attractive method for additional commercial exploitation while still upholding open source principles. However, as the article cautions, dual licensing involves legal intricacy and demands rigor in managing Contributor License Agreements (CLAs), a challenge that the open source community navigates with ongoing debates. For developers looking to understand similar innovative approaches to licensing, further information can be explored at License Token.
-
Prophet
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Project mention: TimesFM (Time Series Foundation Model) for time-series forecasting | news.ycombinator.com | 2024-05-08
-
LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
-
So this time I needed to tokenize and write the lexer on my own. If I only deal with numbers, everything is easy, but when it comes to strings, things get more complicated. I followed another tutorial and rediscovered the make-a-lisp project. Eventually I gave up and used the lexer provided by hy-lang.
-
Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02
Interesting! I feel like this is a cross between https://github.com/dagworks-inc/burr (switch state for context) and https://github.com/Netflix/metaflow because the output of the "task" declares its next hop...
-
hugo-blox-builder
🚨 GROW YOUR AUDIENCE WITH HUGOBLOX! 🚀 HugoBlox is an easy, fast no-code website builder for researchers, entrepreneurs, data scientists, and developers. Build stunning sites in minutes with drag-and-drop, customizable templates, and built-in SEO tools.
-
catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Project mention: 🚀 Why Your ML Service Needs Rust + CatBoost: A Setup Guide That Actually Works | dev.to | 2025-01-19

    [package]
    name = "MLApp"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    catboost = { git = "https://github.com/catboost/catboost", rev = "0bfdc35"}
-
H2O
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
-
Pardon me for shooting from the hip here, but IMO if you're using R for something radically different than statistical analysis and data visualization, there might be another tool/language that's more purpose-suited.
> As someone who basically uses R as a nice LISP-y scripting language to orchestrate calling low-level compiled code from other languages
When I read this, I think, would `bash` or something equally portable/universally installed work?
R is a beautiful thing when limited to its core uses (I use it every day [0]). But in my experience, the more we build away from those core uses, the more brittleness we introduce. I wish the Posit team would focus on the core R experience, resolve some of the hundreds of open issues on its core packages in a timely way [1,2], and just generally play to R's strengths.
[0] https://github.com/hsflabstanford/vegan-meta
[1] https://github.com/rstudio/rmarkdown/issues
[2] https://github.com/tidyverse/ggplot2/issues
-
FriendsDontLetFriends
Friends don't let friends make certain types of data visualization - What are they and why are they bad.
-
dplyr
-
Project mention: Visualizing Data on a Mesh with Displacement Mapping in R | news.ycombinator.com | 2024-06-18
My personal favorite resource is "R for Data Science" by Hadley Wickham. It covers lots of nice data manipulation and visualization examples, and provides a good introduction to the tidyverse, which is a particular dialect of R that's well-suited for data analysis. It's available for free at:
https://r4ds.hadley.nz/
For more specialized analytical methods there are lots of textbooks out there that provide a deep dive into packages for a specific field (e.g. survival analysis, machine learning, time series), but for general data manipulation and visualization it's hard to beat R4DS.
-
wave – Realtime Web Apps and Dashboards for Python and R
-
Project mention: Reinventing notebooks as reusable Python programs | news.ycombinator.com | 2025-03-19
I am surprised they didn't mention RMarkdown (https://rmarkdown.rstudio.com/), which was developed in parallel to Jupyter Notebooks, with lots of convergent evolution.
RMarkdown is essentially Markdown with executable code blocks. While it comes from an R background, code blocks can be written in any language (and you can mix multiple languages).
The biggest difference (and, I would say, advantage) is that it separates code from output, making it work well with version control.
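As a rough illustration of the "Markdown with executable code blocks" idea described above, the Python sketch below pulls the language and body of each chunk out of an R Markdown-style document. The sample document, the `extract_chunks` helper, and the simplified chunk-header pattern are all assumptions made for illustration; this is not how the rmarkdown package itself parses files.

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown

# Simplified chunk pattern (an assumption): an opening fence followed by
# {lang options}, the chunk body, then a closing fence on its own line.
CHUNK_RE = re.compile(
    r"^" + FENCE + r"\{(\w+)[^}]*\}[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_chunks(text):
    """Return (language, code) pairs for every executable chunk in text."""
    return [
        (match.group(1), match.group(2).rstrip("\n"))
        for match in CHUNK_RE.finditer(text)
    ]


# Hypothetical document mixing prose with an R chunk and a Python chunk.
SAMPLE = "\n".join([
    "# Report",
    "",
    "Some prose in plain Markdown.",
    "",
    FENCE + "{r summary, echo=FALSE}",
    "summary(cars)",
    FENCE,
    "",
    FENCE + "{python}",
    "print(1 + 1)",
    FENCE,
    "",
])

if __name__ == "__main__":
    for lang, code in extract_chunks(SAMPLE):
        print(lang, "->", code)
```

Because the source file holds only prose and code, with output produced at render time, a diff of such a file shows just the lines an author changed, which is the version-control advantage the comment points to.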
-
DifferentialEquations.jl
Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and more in Julia.
Another up-and-coming solution is Julia's simulation ecosystem [1]. It is powered by the commercial organization behind the Julia programming language, which has received DARPA funding [2] to build out these tools. This ecosystem unifies researchers in numerical methods [3], scalable compute, and domain experts in modeling engineering systems (electrical, mechanical, etc.). I believe this is where simulation is headed.
[1] https://juliahub.com/products/juliasim
[2] https://news.ycombinator.com/item?id=26425659
[3] https://docs.sciml.ai/DiffEqDocs/stable/
-
m2cgen
Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
-
AIF360
A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Project mention: The Cornerstones of Ethical Software Development: Privacy, Transparency, Fairness, Security, and Accountability | dev.to | 2024-06-26
IBM AI Fairness 360 Toolkit
R discussion
R related posts
-
The Application of Java Programming In Data Analysis and Artificial Intelligence
-
Apache Spark: Revolutionizing Big Data with Sustainable Open Source Funding
-
echarts4r VS echarty - a user suggested alternative
2 projects | 3 Feb 2025
-
R package for collaborative writing of R Markdown documents in Google Docs
-
Run PySpark Local Python Windows Notebook
-
Infrastructure for data analysis with Jupyter, Cassandra, PySpark and Docker
-
Introducing Rlinguo, a native mobile app that runs R
Index
What are some of the best open-source R projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ML-For-Beginners | 71,493 |
2 | Apache Spark | 40,785 |
3 | Prophet | 18,997 |
4 | LightGBM | 17,057 |
5 | ds-cheatsheets | 14,981 |
6 | mal | 10,220 |
7 | metaflow | 8,645 |
8 | hugo-blox-builder | 8,535 |
9 | catboost | 8,300 |
10 | H2O | 7,072 |
11 | ggplot2 | 6,639 |
12 | FriendsDontLetFriends | 6,632 |
13 | awesome-R | 6,119 |
14 | papermill | 6,116 |
15 | dplyr | 4,846 |
16 | r4ds | 4,723 |
17 | wave | 4,069 |
18 | ML-Workspace | 3,478 |
19 | Data-science-best-resources | 3,024 |
20 | rmarkdown | 2,918 |
21 | DifferentialEquations.jl | 2,921 |
22 | m2cgen | 2,865 |
23 | AIF360 | 2,546 |