Distributed

Top 23 Distributed Open-Source Projects

  1. tensorflow

    An Open Source Machine Learning Framework for Everyone

    Project mention: None of the top 10 projects in GitHub is actually a software project 🤯 | dev.to | 2025-05-10

    We see an addition to the AI community with AutoGPT. Along with Tensorflow they represent the AI community in the software category, which is getting relevant (2 out of 8). We can expect in the future to have new AI projects in the top 25 such as Transformers or Ollama (currently top 34 and 36, respectively).

  3. ClickHouse

    ClickHouse® is a real-time analytics database management system

    Project mention: Show HN: Hacker News historic upvote and score data | news.ycombinator.com | 2025-06-03
  4. Ray

    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    Project mention: My personal favorite MCP server which has became part of my life | dev.to | 2025-05-27

    GitHub: github.com/ray-project/ray (Ray Serve is part of Ray)

  5. Milvus

    Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

    Project mention: Milvus and Late Chunking: What I Learned About Context-Aware Embedding in RAG | dev.to | 2025-06-11

    After embedding, I stored the chunk vectors in Milvus. I compared its native ANN search against a brute-force cosine similarity scan. Both approaches returned identical top-3 matches for queries like "What are new features in milvus 2.4.13". This gave me high confidence in Milvus’s indexing fidelity.

  6. LocalAI

    🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference

    Project mention: Nvidia on NixOS WSL – Ollama up 24/7 on your gaming PC | news.ycombinator.com | 2025-04-10

    If you're going to run Ollama in Windows anyway, why not use the native build? And if you want to use WSL, then I'd suggest using something like LocalAI, which gives you a lot more control and support for additional formats (GGML, GGUF, GPTQ, ONNX, etc).

    https://github.com/mudler/LocalAI

  7. Nextcloud

    ☁️ Nextcloud server, a safe home for all your data

    Project mention: Indie Hacking with Open Source Tools: Innovating on a Budget | dev.to | 2025-05-04

    Indie hackers also leverage collaboration tools like Nextcloud for file sharing and team projects, and Mattermost or Rocket.Chat as self-hosted alternatives to Slack. These tools empower remote teams and foster efficient communication across diverse development projects.

  8. surrealdb

    A scalable, distributed, collaborative, document-graph database, for the realtime web

    Project mention: SurrealDB 2.2: Benchmarking, graph path algorithms and foreign key constraints | dev.to | 2025-03-17

    To make this better, we've created a language testing suite similar to the ECMAscript conformance testing suite test262.

  10. handson-ml

    ⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

    Project mention: 🚀 20 Must-Know GitHub Repositories for Developers in 2025! | dev.to | 2025-03-08

    1️⃣1️⃣ Practical Machine Learning 📈 📌 https://github.com/ageron/handson-ml Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow.

  11. TDengine

    High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios

    Project mention: Why SSDLC needs static analysis: a case study of 190 bugs in TDengine | dev.to | 2025-05-12

    We'll continue examining the TDengine project, which we've covered in three small notes on code refactoring:

  12. Redisson

    Redisson - Valkey & Redis Java client. Real-Time Data Platform. Sync/Async/RxJava/Reactive API. Over 50 Valkey and Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache..

    Project mention: Feature Comparison: Reliable Queue vs. Valkey and Redis Stream | dev.to | 2025-05-15

    In the final verdict, Reliable Queue is the more durable and feature-rich option. Standard Valkey/Redis streams will suffice for smaller applications, but Reliable Queue provides the enterprise-grade reliability that businesses depend on. To learn more, visit the Redisson PRO website today.

  13. Phoenix

    Peace of mind from prototype to production

    Project mention: Invisible Threads: Group email without the exposure | dev.to | 2025-06-08

    Invisible Threads is built with Elixir, Phoenix, and most importantly, Postmark. Data lives on disk instead of a traditional database to keep the demo light. Authentication uses Postmark API tokens, mapping each application user directly to a Postmark server. The whole thing is deployed to Fly.io. A minimal setup let me focus on Postmark's offerings.

  14. dgraph

    high-performance graph database for real-time use cases

    Project mention: Automatically Generate REST and GraphQL APIs From Your Database | dev.to | 2024-12-19

    Dgraph

  15. Bit

    AI-powered development workspaces with reusable components, architectural clarity and zero overhead.

    Project mention: Understanding how Vite deals with your node_modules | dev.to | 2025-04-20

    As part of my job, recently I'm working on integrating Vite (also Vitest) into a dev tool called Bit, which originally uses webpack in most of the cases. Basically, Bit is a component-driven development tool for various frontend frameworks and Node.js. In Bit, everything is a component and eventually consumed as an npm package. So technically, you would deal with all kinds of components as packages in your node_modules folder, whatever they are in CJS or ESM, need to be further transformed or not.

  16. CNTK

    Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

    Project mention: Top 8 AI Open Source Software Libraries | dev.to | 2024-07-24

    Github Source Code: CNTK

  17. LightGBM

    A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

  18. diaspora*

    A privacy-aware, distributed, open source social network.

    Project mention: Ask HN: Organize local communities without Facebook? | news.ycombinator.com | 2025-01-21

    * Look into Diaspora. (https://diasporafoundation.org/). Upside: It's basically a self-hosted facebook. Really cool project. Downside: Unlike facebook, there's no fake/pushed content so it tended to feel stale.

    * Look into hosting a forum (e.g. phpBB). Forums are excellent because they don't lose old information like facebook does. When someone says "Hey what's the policy on dogs?" three years later I can search "dogs" and find the answer. Downside: They're not pretty, not full of pictures and no infinite scrollingz. sadge alfababies.

    * IRC chat. I hosted an IRC group for several years at work and it worked great. We only killed it when we decided to move to an enterprise communication app.

  19. optuna

    A hyperparameter optimization framework

    Project mention: Intro to Machine Learning: A Practical Guide for Curious Coders | dev.to | 2025-05-12

    Hyper‑parameter tuning: GridSearchCV / RandomizedSearchCV or advanced tools like Optuna.

  20. NebulaGraph Database

    A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)

  21. modin

    Modin: Scale your Pandas workflows by changing a single line of code

  22. oneflow

    OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

  23. orbitdb

    Peer-to-Peer Databases for the Decentralized Web

    Project mention: Every System is a Log: Avoiding coordination in distributed applications | news.ycombinator.com | 2025-01-24

    There’s also OrbitDB https://github.com/orbitdb/orbitdb which to my understanding has been a pioneer for p2p logs, databases and CRDTs.

  24. PowerJob

    Enterprise job scheduling middleware with distributed computing ability.

  25. H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

    Project mention: H2O: Your New Best Friend for Scalable Machine Learning | dev.to | 2025-05-05

    View the Project on GitHub

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
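
The Milvus mention in the list above describes checking ANN search results against a brute-force cosine similarity scan. A minimal sketch of such a baseline scan follows, in plain Python with no Milvus client involved. The chunk ids, the toy vectors, and the helper names (`cosine_similarity`, `brute_force_top_k`, `overlap`) are hypothetical and chosen only for illustration; they are not taken from the quoted post.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k=3):
    """Score every stored vector against the query and return the k best ids."""
    scored = [(cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vec_id for _, vec_id in scored[:k]]


def overlap(ann_ids, exact_ids):
    """Fraction of the exact top-k ids that the ANN result also returned."""
    if not exact_ids:
        return 1.0
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy data standing in for embedded document chunks.
chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
    "chunk-d": [0.7, 0.7, 0.0],
}
query = [1.0, 0.05, 0.0]

exact = brute_force_top_k(query, chunks, k=3)
print(exact)
```

In practice the ids returned by the vector database for the same query would be passed to `overlap` as `ann_ids`; a value of 1.0 corresponds to the "identical top-3 matches" outcome the post reports. The scan is linear in the number of stored vectors, so it is only practical as a correctness check on small collections.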

Distributed related posts

  • Invisible Threads: Group email without the exposure

    4 projects | dev.to | 8 Jun 2025
  • Hatchet

    1 project | news.ycombinator.com | 21 May 2025
  • How to Build a Streaming Deduplication Pipeline with Kafka, GlassFlow, and ClickHouse

    4 projects | dev.to | 14 May 2025
  • Optimizing ML Training with Metagradient Descent

    1 project | news.ycombinator.com | 26 Mar 2025
  • Show HN: SuperMassive – Distributed scalable key-value database in 100% GO

    1 project | news.ycombinator.com | 21 Mar 2025
  • SurrealDB 2.2: Benchmarking, graph path algorithms and foreign key constraints

    2 projects | dev.to | 17 Mar 2025
  • Show HN: Sample NCSA Log Generator

    2 projects | news.ycombinator.com | 15 Mar 2025

Index

What are some of the best open-source Distributed projects? This list will help you:

#   Project                Stars
1   tensorflow             190,341
2   ClickHouse             41,164
3   Ray                    37,508
4   Milvus                 35,322
5   LocalAI                33,172
6   Nextcloud              29,788
7   surrealdb              29,384
8   handson-ml             25,329
9   TDengine               23,948
10  Redisson               23,843
11  Phoenix                22,159
12  dgraph                 20,923
13  Bit                    18,104
14  CNTK                   17,579
15  LightGBM               17,286
16  diaspora*              13,541
17  optuna                 12,106
18  NebulaGraph Database   11,391
19  modin                  10,190
20  oneflow                8,906
21  orbitdb                8,538
22  PowerJob               7,489
23  H2O                    7,191
