Top 23 R Open-Source Projects
- Prophet: Tool for producing high-quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
- LightGBM: A fast, distributed, high-performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
- hugo-blox-builder: 😍 EASILY BUILD THE WEBSITE YOU WANT - NO CODE, JUST MARKDOWN BLOCKS! Easily create any type of website using blocks, no code required. One app, no dependencies, no JS.
- catboost: A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
- H2O: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
- FriendsDontLetFriends: Friends don't let friends make certain types of data visualization - What are they and why are they bad.
- awesome-conformal-prediction: A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.
- DifferentialEquations.jl: Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and more in Julia.
- https://github.com/microsoft/ML-For-Beginners
Also check out this list Pitt puts out every year:
Project mention: Moirai: A Time Series Foundation Model for Universal Forecasting | news.ycombinator.com | 2024-03-25
https://facebook.github.io/prophet/
"Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well."
Project mention: SIRUS.jl: Interpretable Machine Learning via Rule Extraction | /r/Julia | 2023-06-29
SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. It works by first fitting a random forest and then converting this forest to rules. Furthermore, the algorithm is stable and achieves a predictive performance that is comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.
>Would be interesting to see how the interpreter works actually...
It's quite easy to see, there are interpreters for Lisp in like 20 lines or so.
Here's a good one:
(It has the full code in a link towards the bottom)
There's also this:
Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/
Project mention: Friends don't let friends make certain types of data visualizations | news.ycombinator.com | 2023-11-19
Project mention: Spreadsheet errors can have disastrous consequences – yet we keep making them | news.ycombinator.com | 2024-01-25
Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...
Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...
nteract/papermill: https://github.com/nteract/papermill :
> papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]
> This opens up new opportunities for how notebooks can be used. For example:
> - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.
"The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :
> Computational notebook speedrun ideas:
Project mention: Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL | news.ycombinator.com | 2024-03-15
That's great feedback, thanks!
This tool definitely comes from a place of personal need - beyond just handling large files, I've also never really gelled well with the Excel/Google Sheet model of changing data in place as if you were editing text. I'm a Data Scientist and always preferred the chained data transforms you see in things like dplyr (https://dplyr.tidyverse.org/) or Polars (https://pola.rs/) and I feel this tool maps very closely to the chained model.
Also, thank you for the feature requests! Those would all be very useful - we'll put them on the roadmap.
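The chained-transform model mentioned in the comment above, as opposed to editing cells in place, can be sketched in plain Python. This is only an illustration under stated assumptions: the `Table` class, its method names, and the sample rows are hypothetical and are not the API of dplyr, Polars, or the tool being discussed.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict


@dataclass(frozen=True)
class Table:
    """Hypothetical immutable table: every step returns a new Table."""
    rows: tuple

    def filter(self, predicate: Callable[[Row], bool]) -> "Table":
        # Keep only the rows matching the predicate; the source table is untouched.
        return Table(tuple(r for r in self.rows if predicate(r)))

    def mutate(self, **columns: Callable[[Row], object]) -> "Table":
        # Add or replace columns computed from each row, building new row dicts.
        return Table(tuple(
            {**r, **{name: fn(r) for name, fn in columns.items()}}
            for r in self.rows
        ))

    def sort_by(self, key: str, descending: bool = False) -> "Table":
        # Order rows by one column.
        return Table(tuple(sorted(self.rows, key=lambda r: r[key], reverse=descending)))

    def select(self, *names: str) -> "Table":
        # Keep only the named columns.
        return Table(tuple({n: r[n] for n in names} for r in self.rows))


# Made-up sample data, for illustration only.
sales = Table((
    {"region": "north", "units": 12, "price": 2.5},
    {"region": "south", "units": 3, "price": 4.0},
    {"region": "east", "units": 8, "price": 1.0},
))

# Each step reads top to bottom, like a dplyr pipe or a Polars method chain.
result = (
    sales
    .filter(lambda r: r["units"] >= 5)
    .mutate(revenue=lambda r: r["units"] * r["price"])
    .sort_by("revenue", descending=True)
    .select("region", "revenue")
)
```

Because every step returns a new table, `sales` still holds the original three rows after the chain runs, which is the main contrast with changing data in place in a spreadsheet.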
https://streamlit.io/ https://wave.h2o.ai/ https://reflex.dev/
Project mention: Dive Deep into Conformal Prediction with This Ultimate Resource Compilation | news.ycombinator.com | 2024-04-15
I'm surprised to see no one has pointed out [RMarkdown + RStudio](https://rmarkdown.rstudio.com) as one way to immediately interface with Pandoc.
I used to write papers and slides in LaTeX (using vim, because who needs render previews), then eventually switched to Pandoc (also vim). I eventually discovered RMarkdown+RStudio. I was looking for a nice way to format a simple table and discovered that rmarkdown had nice extensions of basic markdown (this was many years ago so maybe that is incorporated into vanilla markdown/pandoc).
The RMarkdown page claims:
> R Markdown supports dozens of static and dynamic output formats including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more.
...which I think is largely due to using pandoc as the core generator.
RStudio shows you the pandoc command it runs to generate your document, which I've used to figure out the pandoc command I want to run when I've switched to using pandoc directly.
This is a bit of a "lazy" way to interact with pandoc. Maybe the "laziest" aspect: when I get a new computer, I can install the entire stack by installing RStudio, then opening a new R Markdown document. RStudio asks whether I'd like to install all the necessary libraries -- click "yes" and that's it. Maybe that sounds silly but it used to be a lot of work to manage your LaTeX install. These days I greatly favor things that save me time, which seems to get more precious every year.
Index
What are some of the best open-source R projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | ML-For-Beginners | 66,806 |
2 | Apache Spark | 38,249 |
3 | dash | 20,434 |
4 | Prophet | 17,720 |
5 | LightGBM | 16,025 |
6 | ds-cheatsheets | 12,570 |
7 | mal | 9,792 |
8 | hugo-blox-builder | 7,766 |
9 | catboost | 7,731 |
10 | metaflow | 7,559 |
11 | H2O | 6,721 |
12 | ggplot2 | 6,311 |
13 | awesome-R | 5,780 |
14 | FriendsDontLetFriends | 5,655 |
15 | papermill | 5,615 |
16 | dplyr | 4,652 |
17 | r4ds | 4,339 |
18 | wave | 3,852 |
19 | awesome-conformal-prediction | 3,358 |
20 | ML-Workspace | 3,317 |
21 | rmarkdown | 2,795 |
22 | Data-science-best-resources | 2,750 |
23 | DifferentialEquations.jl | 2,746 |