Top 23 Distributed Open-Source Projects
-
tensorflow
Project mention: Non-determinism in GPT-4 is caused by Sparse MoE | news.ycombinator.com | 2023-08-04
Right but that's not an inherent GPU determinism issue. It's a software issue.
The comment at https://github.com/tensorflow/tensorflow/issues/3103#issueco... is correct: it's not necessary, it's a choice.
Your line of reasoning appears to be "GPUs are inherently non-deterministic, so don't be quick to judge someone's code", which, as far as I can tell, is dead wrong.
Admittedly, there are some cases and instructions that may result in non-determinism, but they are not inherently necessary. The author should think carefully before introducing non-determinism. There are many scenarios where it is irrelevant, but ultimately the issue we are discussing here isn't the GPU's fault.
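To illustrate why this kind of non-determinism is a software choice rather than a property of the hardware, here is a minimal sketch in plain Python on the CPU (not GPU code). Floating-point addition is not associative, so a reduction whose accumulation order varies between runs, for example when parallel threads add into a shared total in whatever order they finish, can return different results. Choosing an order-independent reduction such as `math.fsum` removes the variation. The input values are arbitrary and were picked only to make the rounding visible.

```python
import itertools
import math


def naive_sum(values):
    """Left-to-right float accumulation: the result depends on the order of `values`."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [1e16, -1e16, 1.0]

# Each permutation stands in for a different accumulation order,
# such as the order in which parallel workers happen to finish.
naive_results = {naive_sum(p) for p in itertools.permutations(values)}

# math.fsum returns the correctly rounded sum, so it does not depend on order.
fsum_results = {math.fsum(p) for p in itertools.permutations(values)}

print(sorted(naive_results))  # [0.0, 1.0]
print(sorted(fsum_results))   # [1.0]
```

Fixing the reduction order (always combining partial sums in the same tree shape) is another way to get reproducible results, usually at some cost in speed.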
-
Ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Project mention: Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Custom Models | news.ycombinator.com | 2023-08-11
Training times for GSM8K are mentioned here: https://github.com/ray-project/ray/tree/master/doc/source/te...
-
Nextcloud
Nextcloud: Once I have my Unraid NAS up and running, I will set up Nextcloud for the whole family. This way I can get my unencrypted files and photos off services such as Dropbox and iCloud.
-
surrealdb
Project mention: How to Design a SurrealDB schema and create a basic client for TypeScript | dev.to | 2023-09-17
In the midst of a dynamic landscape of exciting new projects, one name shines bright: SurrealDB.
-
Redisson
Redisson - Easy Redis Java client with features of In-Memory Data Grid. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...
Project mention: Kotlin Spring WebFlux, R2DBC and Redisson microservice in k8s 👋✨💫 | dev.to | 2022-10-17
You can find the source code in the GitHub repository. The main idea of this project is the implementation of a microservice using Kotlin, Spring WebFlux, PostgreSQL, and Redis, with metrics and monitoring, deployed to k8s. For interacting with PostgreSQL we will use reactive Spring Data R2DBC, and for Redis caching, Redisson.
-
TDengine
TDengine is an open source, high-performance, cloud native time-series database optimized for Internet of Things (IoT), Connected Cars, Industrial IoT and DevOps.
-
Phoenix
Yes! I love Elixir :) [Phoenix LiveView](https://www.phoenixframework.org/) is really amazing. I feel so fast working in it. I got hooked after watching Chris McCord's ['Build a real-time Twitter clone in 15 minutes'](https://www.youtube.com/watch?v=MZvmYaFkNJI&embeds_referring...), and things have improved a lot since then.
-
Bit
How can I address these issues? I already looked at tools like Nx or Bit, but they don't match our needs with closed-source libs.
-
LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
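The core idea behind gradient boosting can be sketched in a few lines of plain Python: start from a constant prediction, then repeatedly fit a small tree to the current residuals (the negative gradient of squared-error loss) and add a damped copy of it to the model. This is a generic, single-feature illustration using depth-one trees (stumps), not LightGBM's implementation, which adds techniques such as histogram-based split finding and leaf-wise tree growth. The names `fit_stump` and `gradient_boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return the depth-one tree (one threshold split) that best fits
    `residuals` under squared error, as a callable x -> prediction."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:  # the largest threshold leaves the right side empty
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    if best is None:  # all xs identical: no split possible, predict the mean
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model of stumps to (xs, ys) by boosting on squared loss."""
    base = sum(ys) / len(ys)  # initial constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each new tree by the learning rate before adding it.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(5), 2))  # 1.01 4.99
```

A smaller learning rate needs more rounds but tends to generalize better; that trade-off is the same one exposed by production boosting libraries.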
Project mention: SIRUS.jl: Interpretable Machine Learning via Rule Extraction | /r/Julia | 2023-06-29
SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. It achieves this by first fitting a random forest and then converting this forest into rules. Furthermore, the algorithm is stable and achieves predictive performance comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.
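The forest-to-rules step can be illustrated with a simplified sketch of SIRUS-style frequency-based selection, written in Python rather than Julia: every path from a tree's root to one of its nodes is read as a rule (a conjunction of split conditions), and only rules that appear in at least a fraction `p0` of the trees are kept. The toy forest is hand-written and hypothetical; it assumes split thresholds have already been restricted to a small set of values (SIRUS uses empirical quantiles for this) so that identical rules can recur across trees. Fitting the forest and weighting the selected rules are omitted.

```python
from collections import Counter

# Hypothetical tree encoding: an internal node is
# (feature, threshold, left_subtree, right_subtree); a leaf is None.


def extract_rules(node, path=()):
    """Yield the conjunction of conditions leading to every non-root node."""
    if node is None:
        return
    feature, threshold, left, right = node
    left_path = path + ((feature, "<", threshold),)
    right_path = path + ((feature, ">=", threshold),)
    yield left_path
    yield right_path
    yield from extract_rules(left, left_path)
    yield from extract_rules(right, right_path)


def select_rules(forest, p0):
    """Keep rules that occur in at least a fraction p0 of the trees."""
    counts = Counter()
    for tree in forest:
        counts.update(set(extract_rules(tree)))  # count each rule once per tree
    n_trees = len(forest)
    return sorted(rule for rule, c in counts.items() if c / n_trees >= p0)


forest = [
    ("age", 30.0, None, None),
    ("age", 30.0, ("income", 50.0, None, None), None),
    ("income", 50.0, None, None),
]
for rule in select_rules(forest, p0=0.5):
    print(rule)
# (('age', '<', 30.0),)
# (('age', '>=', 30.0),)
```

Selecting by frequency is what gives the rule set its stability: a rule that shows up in most trees is unlikely to vanish when the data is slightly perturbed.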
-
diaspora*
Project mention: We need a Facebook groups style decentralized alternative. Does one exist? | /r/selfhosted | 2023-07-06
-
nni
An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
-
NebulaGraph Database
A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)
A NoSQL graph database is a type of non-relational, distributed database that employs a graph model. NoSQL stands for "Not only SQL" and refers to a new breed of databases that differ from traditional relational databases in their data model and performance characteristics. Graph databases are especially useful for data defined by relationships: everything from friendships on social networks to equipment supply chains or business processes. They can quickly traverse vast amounts of linked data points to discover insights and hidden connections between entities, which makes them well suited to network analysis such as financial fraud detection and recommendation engines, among many other use cases, all while performing at scale.
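The multi-hop traversal described above can be pictured as a breadth-first search over an adjacency map. This is a plain-Python, in-memory sketch of the idea, not NebulaGraph's query language or client API. The account/device graph is hypothetical, loosely modeled on the fraud-detection use case, where two accounts linked through shared devices may deserve a closer look.

```python
from collections import deque


def shortest_connection(graph, start, goal):
    """Breadth-first search over an adjacency map.

    Returns the shortest chain of vertices linking start to goal, or None.
    """
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk parent pointers back to the start, then reverse.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical graph: accounts linked to the devices they logged in from.
graph = {
    "acct:alice": ["device:d1"],
    "device:d1": ["acct:alice", "acct:bob"],
    "acct:bob": ["device:d1", "device:d2"],
    "device:d2": ["acct:bob", "acct:mallory"],
    "acct:mallory": ["device:d2"],
}
print(shortest_connection(graph, "acct:alice", "acct:mallory"))
# ['acct:alice', 'device:d1', 'acct:bob', 'device:d2', 'acct:mallory']
```

A graph database performs this kind of traversal against data stored and partitioned across machines, which is where its indexing and storage layout matter.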
-
optuna
Project mention: FOSS hyperparameter optimization framework to automate hyperparameter search | news.ycombinator.com | 2023-08-10
-
orbitdb
Project mention: OrbitDB reaches version 1.0 after 8 years of development | news.ycombinator.com | 2023-09-19
-
H2O
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/
-
oceanbase
OceanBase is an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.
Project mention: Show HN: OceanBase – An open-source distributed SQL database written in C++ | news.ycombinator.com | 2023-05-23
-
Hazelcast
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
Project mention: Does anyone know any good Java implementations of a distributed key-value store? | /r/ExperiencedDevs | 2023-06-08
You're probably looking for Hazelcast here. Note that it does much more than just a distributed k/v store, but it will get you where you need to go.
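Part of what a distributed key/value store does is decide which cluster member owns each key. A common scheme, and roughly the one Hazelcast documents, is to hash every key into a fixed number of partitions and assign partitions to members. The sketch below is a single-process toy under that assumption: `PARTITION_COUNT = 271` mirrors Hazelcast's documented default, while the MD5-based hash, the round-robin ownership table and the `PartitionedStore` class are hypothetical simplifications, not Hazelcast's algorithm or API.

```python
import hashlib

PARTITION_COUNT = 271  # assumption: mirrors Hazelcast's default partition count


def partition_for(key: str) -> int:
    """Map a key to a fixed partition with a stable hash (Python's str hash is salted)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITION_COUNT


def assign_partitions(members):
    """Hypothetical round-robin ownership table: partition id -> member name."""
    return {p: members[p % len(members)] for p in range(PARTITION_COUNT)}


class PartitionedStore:
    """Toy distributed map: each member holds only the partitions it owns."""

    def __init__(self, members):
        self.owners = assign_partitions(members)
        self.data = {m: {} for m in members}

    def put(self, key, value):
        owner = self.owners[partition_for(key)]
        self.data[owner][key] = value

    def get(self, key):
        owner = self.owners[partition_for(key)]
        return self.data[owner].get(key)


store = PartitionedStore(["node-a", "node-b", "node-c"])
store.put("user:42", "Ada")
print(store.get("user:42"))  # Ada
```

Because the key-to-partition mapping never changes, a membership change only requires updating the partition-to-member table and moving the affected partitions, rather than rehashing every key.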
-
Distributed related posts
- Corrosion: Gossip-based service discovery for large distributed systems
- OrbitDB reaches version 1.0 after 8 years of development
- Is Dgraph dead? (should I continue using it)
- I built a distributed workflow engine
- SurrealDB 1.0 Live
- SurrealDB the Scalable Rust SQL/NoSQL/Graph DB Released v1.0.0 Today
- SurrealDB 1.0.0
-
Index
What are some of the best open-source Distributed projects? This list will help you:
# | Project | Stars
---|---|---
1 | tensorflow | 177,728
2 | Ray | 27,697
3 | handson-ml | 25,038
4 | Nextcloud | 23,799
5 | surrealdb | 22,480
6 | Redisson | 21,760
7 | TDengine | 21,751
8 | Phoenix | 19,935
9 | dgraph | 19,611
10 | CNTK | 17,405
11 | Bit | 16,997
12 | LightGBM | 15,464
13 | diaspora* | 13,288
14 | nni | 13,270
15 | NebulaGraph Database | 9,474
16 | modin | 8,967
17 | optuna | 8,639
18 | orbitdb | 7,811
19 | H2O | 6,484
20 | oceanbase | 6,162
21 | PowerJob | 5,811
22 | Hazelcast | 5,556
23 | scrapy-redis | 5,338