Fault Tolerance in Distributed Systems: Strategies and Case Studies

This page summarizes the projects mentioned and recommended in the original post on dev.to

InfluxDB – Built for High-Performance Time Series Workloads
InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  1. Apache ZooKeeper

    Apache ZooKeeper

    Failure Detection and Recovery It’s not enough to have backup systems. It’s also crucial to detect failures quickly. Modern systems employ monitoring tools and rely on distributed coordination systems such as Zookeeper or etcd to identify faults in real-time: once detected, recovery mechanisms are triggered to restore the service.

  2. InfluxDB

    InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.

    InfluxDB logo
  3. etcd

    Distributed reliable key-value store for the most critical data of a distributed system

    Failure Detection and Recovery It’s not enough to have backup systems. It’s also crucial to detect failures quickly. Modern systems employ monitoring tools and rely on distributed coordination systems such as Zookeeper or etcd to identify faults in real-time: once detected, recovery mechanisms are triggered to restore the service.

  4. jaeger

    CNCF Jaeger, a Distributed Tracing Platform

    However, ensuring fault tolerance in distributed systems is not at all easy. These systems are complex, with multiple nodes or components working together. A failure in one node can cascade across the system if not addressed timely. Moreover, the inherently distributed nature of these systems can make it challenging to pinpoint the exact location and cause of fault - that is why modern systems rely heavily on distributed tracing solutions pioneered by Google Dapper and widely available now in Jaeger and OpenTracing. But still, understanding and implementing fault tolerance becomes not just about addressing the failure but predicting and mitigating potential risks before they escalate.

  5. opentracing-cpp

    Discontinued OpenTracing API for C++. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163

    However, ensuring fault tolerance in distributed systems is not at all easy. These systems are complex, with multiple nodes or components working together. A failure in one node can cascade across the system if not addressed timely. Moreover, the inherently distributed nature of these systems can make it challenging to pinpoint the exact location and cause of fault - that is why modern systems rely heavily on distributed tracing solutions pioneered by Google Dapper and widely available now in Jaeger and OpenTracing. But still, understanding and implementing fault tolerance becomes not just about addressing the failure but predicting and mitigating potential risks before they escalate.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • Implementing a distributed key-value store on top of implementing Raft in Go

    5 projects | news.ycombinator.com | 25 May 2023
  • Apache ShardingSphere Enterprise Applications — Bilibili

    4 projects | dev.to | 9 Jun 2022
  • The Double-Edged Sword of Microservices: Balancing Abstraction and Complexity

    5 projects | dev.to | 26 Nov 2024
  • On Implementation of Distributed Protocols

    23 projects | dev.to | 5 Apr 2024
  • Running 2 web apps in one application using Go Routines

    2 projects | /r/golang | 30 Jan 2023

Did you know that Go is
the 4th most popular programming language
based on number of references?