Distributed

Top 23 Distributed Open-Source Projects

  • tensorflow

    An Open Source Machine Learning Framework for Everyone

    Project mention: Las 10 Mejores Herramientas de Inteligencia Artificial de Código Abierto (The 10 Best Open-Source Artificial Intelligence Tools) | dev.to | 2024-08-21


  • Ray

    Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    Project mention: Amazon's Exabyte-Scale Migration from Apache Spark to Ray on Amazon EC2 | news.ycombinator.com | 2024-07-29

    Yeah, mmap, I think this is the relevant line [1].

    Fun fact, very early on, we used to create one mmapped file per serialized object, but that very quickly broke down.

    Then we switched to mmapping one large file at the start and storing all of the serialized objects in that file. But then as objects get allocated and deallocated, you need to manage the memory inside of that mmapped file, and we just repurposed a malloc implementation to handle that.

    [1] https://github.com/ray-project/ray/blob/21202f6ddc3ceaf74fbc...
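The approach described in the comment above (one large mmapped file, with an allocator managing space inside it) can be sketched roughly as follows. This is a minimal illustration, not Ray's actual code: the `MmapArena` class and its method names are hypothetical, and a toy first-fit free list stands in for the repurposed malloc implementation the comment mentions.

```python
import mmap
import tempfile


class MmapArena:
    """Toy first-fit allocator over a single memory-mapped file (hypothetical)."""

    def __init__(self, size: int) -> None:
        # Map one large file up front instead of one file per object.
        self._file = tempfile.TemporaryFile()
        self._file.truncate(size)
        self._map = mmap.mmap(self._file.fileno(), size)
        self._free = [(0, size)]  # sorted (offset, length) free blocks
        self._used = {}           # offset -> length of live objects

    def put(self, payload: bytes) -> int:
        """Copy a serialized object into the arena and return its offset."""
        n = len(payload)
        if n == 0:
            raise ValueError("empty payloads are not supported in this sketch")
        for i, (off, length) in enumerate(self._free):
            if length >= n:
                # First fit: carve the object out of the front of this block.
                if length == n:
                    del self._free[i]
                else:
                    self._free[i] = (off + n, length - n)
                self._map[off:off + n] = payload
                self._used[off] = n
                return off
        raise MemoryError("no free block large enough")

    def get(self, off: int) -> bytes:
        """Read back the object stored at the given offset."""
        return bytes(self._map[off:off + self._used[off]])

    def free(self, off: int) -> None:
        """Release an object and merge adjacent free blocks."""
        self._free.append((off, self._used.pop(off)))
        self._free.sort()
        merged = []
        for o, length in self._free:
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((o, length))
        self._free = merged

    def close(self) -> None:
        self._map.close()
        self._file.close()
```

Storing an object returns an offset into the shared mapping, which plays the role a pointer plays for an ordinary malloc. A production allocator would also handle alignment, fragmentation, and concurrent access, none of which this sketch attempts.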

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

    Project mention: AIM Weekly for 23 September 2024 | dev.to | 2024-09-23
  • surrealdb

    A scalable, distributed, collaborative, document-graph database, for the realtime web

    Project mention: Ask HN: Lesser-known/underrated cool new web-oriented tech? | news.ycombinator.com | 2024-07-23

    I've been surveying the space lately and I re/discovered some really powerful new-ish tech which woke up my tech taste buds and am now looking for more such "tasty" tech (sorry I guess I'm due for a meal soon :P)

    Examples for starters:

    - Qwik and resumable web apps (https://qwik.dev/)

    - SurrealDB, maximally flexible multi-model DB (https://surrealdb.com/)

    There are others, but I'm trying to keep to the starkest examples and not to influence the discussion too much.

    I do think this is the best place to ask such questions - I'm explicitly interested in cutting-edge tech, but the edge doesn't have to be excessively sharp ;).

  • Nextcloud

    ☁️ Nextcloud server, a safe home for all your data

    Project mention: Ask HN: Is Nextcloud a Great Alternative to Dropbox/Google Drive for Startups? | news.ycombinator.com | 2024-09-22

    In my opinion it’s not a good alternative if you or your team members expect exactly the same quality of service. When you switch to Nextcloud you’ll have to expect more bugs, less reliability, less performance and obviously more maintenance (since it's typically self-hosted) compared to Google Drive, Dropbox or OneDrive. So you'll have to go into this with a different kind of mindset. What you gain is independence and extensibility due to a rather big platform ecosystem.

    E.g. here are some specific things and examples of things you'll have to deal with, in no specific order. These are just some things I've had to deal with recently.

    - You'll have to educate people in your group that there are at least 3 different ways to share files among each other and that they can all coexist in parallel (Individual Shares vs. Group Shares vs. Group folders vs. Circles/Teams) (I did a German blog post on this: https://bitbetter.de/blog/nextcloud-freigabe-chaos/)

    - Handling of file/folder names with special characters is a mess e.g. if you have Windows and Linux clients there will most certainly be conflicts. (Luckily this has been fixed recently by the `forbidden_filename_characters` config option – which is not enforced yet via the Web UI) see https://github.com/nextcloud/ios/issues/2802

    - Creating Nextcloud users with spaces in their names will break CalDAV on iOS devices (https://github.com/nextcloud/server/issues/15641)

    - Nextcloud (aka Collabora) Office is very slow if you want to actually work collaboratively with it (no matter the power of your Collabora server) – unfortunately it's no match for Google Docs or Office 365
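To illustrate the file-name conflicts mentioned in the list above, here is a small sketch that flags names a Linux client accepts but a Windows client cannot store. It is independent of Nextcloud and does not read the `forbidden_filename_characters` option; the function name `windows_conflicts` is hypothetical, and the check covers only forbidden characters and trailing spaces or dots, not reserved device names.

```python
# Characters Windows rejects in file names; Linux allows all of them except "/".
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def windows_conflicts(name: str) -> list[str]:
    """Return the reasons a file name would be rejected on Windows."""
    problems = []
    # Control characters (code points below 32) are also rejected by Windows.
    bad = sorted(ch for ch in set(name) if ch in WINDOWS_FORBIDDEN or ord(ch) < 32)
    if bad:
        problems.append(f"forbidden characters: {bad}")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    return problems


for candidate in ["report.txt", "meeting 10:30?.md", "notes."]:
    print(candidate, "->", windows_conflicts(candidate) or "ok")
```

Running a check like this before a mixed Windows/Linux team starts syncing can surface problem names early, whatever the server enforces.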

  • handson-ml

    ⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

  • TDengine

    High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios

    Project mention: TDengine: Open-Source, High-Performance Time-Series DB for IoT and Cloud | news.ycombinator.com | 2024-08-14
  • Redisson

    Redisson - Easy Redis Java client and Real-Time Data Platform. Valkey compatible. Sync/Async/RxJava/Reactive API. Over 50 Redis or Valkey based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache...

  • Phoenix

    Peace of mind from prototype to production

    Project mention: Running Elixir Phoenix on Windows | dev.to | 2024-09-20

    You've miraculously managed to install elixir, erlang, and friends on your Windows machine and you're ready to try out Phoenix. At some point in your tutorial you will be asked to run this command:

  • dgraph

    The high-performance database for modern applications

    Project mention: List of 45 databases in the world | dev.to | 2024-07-09

    Dgraph — Distributed, fast graph database.

  • Bit

    A build system for development of composable software.

    Project mention: Tools and libraries widely used in micro frontend architectures! | dev.to | 2024-08-09


  • CNTK

    Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

    Project mention: Top 8 AI Open Source Software Libraries | dev.to | 2024-07-24

    GitHub Source Code: CNTK

  • LightGBM

    A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

  • diaspora*

    A privacy-aware, distributed, open source social network.

    Project mention: Diaspora is a decentralized, federated alternative to Facebook that anyone can join and contribute to | /r/InnerNet | 2023-12-07
  • NebulaGraph Database

    A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)

  • optuna

    A hyperparameter optimization framework

    Project mention: Optuna – A Hyperparameter Optimization Framework | news.ycombinator.com | 2024-04-06

    I didn’t even know WandB did hyperparameter optimization, I figured it was a neural network visualizer based on 2 minute papers. Didn’t seem like many alternatives out there to Optuna with TPE + persistence in conditional continuous & discrete spaces.

    Anyway, it’s doable to make a multi objective decide_to_prune function with Optuna, here’s an example https://github.com/optuna/optuna/issues/3450#issuecomment-19...

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

  • orbitdb

    Peer-to-Peer Databases for the Decentralized Web

  • H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

  • PowerJob

    Enterprise job scheduling middleware with distributed computing ability.

  • Apache Storm

    Apache Storm (by apache)

  • toydb

    Distributed SQL database in Rust, written as an educational project

  • Hazelcast

    Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Distributed related posts

  • Multimodal Madness! Create a Product Recommender for Smart Shopping

    6 projects | dev.to | 7 Aug 2024
  • Genetically synthesized supergain broadband wire-bundle antenna

    2 projects | news.ycombinator.com | 31 Jul 2024
  • Unified time series database for metrics, logs, and events written in Rust

    1 project | news.ycombinator.com | 26 Jul 2024
  • EchoVault: Embeddable Redis Alternative in Go

    1 project | dev.to | 23 Jul 2024
  • Go Embeddable Redis Alternative

    1 project | news.ycombinator.com | 15 Jul 2024
  • Enhancing the SQL Interval syntax: A story of Open Source contribution

    1 project | dev.to | 9 Jul 2024
  • OpenAI Acquires Rockset

    2 projects | news.ycombinator.com | 21 Jun 2024

Index

What are some of the best open-source Distributed projects? This list will help you:

 #  Project                 Stars
 1  tensorflow            185,741
 2  Ray                    33,194
 3  Milvus                 29,607
 4  surrealdb              27,001
 5  Nextcloud              26,844
 6  handson-ml             25,166
 7  TDengine               23,262
 8  Redisson               23,238
 9  Phoenix                21,255
10  dgraph                 20,338
11  Bit                    17,839
12  CNTK                   17,500
13  LightGBM               16,562
14  diaspora*              13,389
15  NebulaGraph Database   10,675
16  optuna                 10,633
17  modin                   9,766
18  orbitdb                 8,278
19  H2O                     6,871
20  PowerJob                6,782
21  Apache Storm            6,589
22  toydb                   6,122
23  Hazelcast               6,102

