System Design: The complete course

system-design-primer

380 253,398 0.0 Python

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

System Design Primer

Apache ZooKeeper

36 11,919 8.3 Java

Apache ZooKeeper

To solve this issue, we can use a distributed system manager such as ZooKeeper, which provides distributed synchronization. ZooKeeper can maintain multiple ranges for our servers.
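
A minimal sketch of the range idea, assuming the ranges are disjoint blocks of numeric IDs handed out to each server. The `RangeAllocator` and `AppServer` classes are hypothetical in-memory stand-ins for illustration; they are not ZooKeeper's API, and a real coordinator would synchronize across machines rather than with a local lock.

```python
import threading


class RangeAllocator:
    """Hypothetical stand-in for a coordinator that leases ID ranges."""

    def __init__(self, range_size=1_000_000):
        self._range_size = range_size
        self._next_start = 0
        self._lock = threading.Lock()  # local lock only; assumption for the sketch

    def lease_range(self):
        """Return a half-open (start, end) block that no other caller has received."""
        with self._lock:
            start = self._next_start
            self._next_start += self._range_size
            return start, start + self._range_size


class AppServer:
    """Draws IDs from its leased range and leases a new one when it runs out."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._current, self._end = allocator.lease_range()

    def next_id(self):
        if self._current >= self._end:
            self._current, self._end = self._allocator.lease_range()
        value = self._current
        self._current += 1
        return value
```

Because every server works inside its own block, two servers never produce the same ID, and they only contact the coordinator when a block is exhausted.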

RabbitMQ

92 11,590 10.0 Starlark

Open source RabbitMQ: core server and tier 1 (built-in) plugins

While this seems like a classic publish-subscribe use case, it is actually not, because mobile devices and browsers each have their own way of handling push notifications. Unlike the message fan-out we commonly see in backend services, notifications are usually handled externally via Firebase Cloud Messaging (FCM) or Apple Push Notification Service (APNS). We can use something like Amazon SQS or RabbitMQ to support this functionality.

PostgreSQL

404 14,673 10.0 C

Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

We already have access to the latitude and longitude of our customers, and with databases like PostgreSQL and MySQL we can perform a query to find nearby driver locations given a latitude and longitude (X, Y) within a radius (R).
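
As a rough illustration of the radius query described here, the sketch below filters driver locations by great-circle distance in plain Python. The function names and the `(driver_id, lat, lon)` tuple layout are assumptions made for the example; in PostgreSQL or MySQL the same filter would be written in SQL, usually backed by a spatial index rather than a full scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_drivers(drivers, lat, lon, radius_km):
    """Return driver IDs within radius_km of (lat, lon), nearest first.

    drivers: iterable of (driver_id, lat, lon) tuples (hypothetical layout).
    """
    hits = [(haversine_km(lat, lon, dlat, dlon), did) for did, dlat, dlon in drivers]
    return [did for dist, did in sorted(hits) if dist <= radius_km]
```

Scanning every driver like this costs time linear in the number of drivers, which is why the approach is only a starting point at scale.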

MySQL

146 10,208 9.8 C++

MySQL Server, the world's most popular open source database, and MySQL Cluster, a real-time, open source transactional database.

We already have access to the latitude and longitude of our customers, and with databases like PostgreSQL and MySQL we can perform a query to find nearby driver locations given a latitude and longitude (X, Y) within a radius (R).

MongoDB

248 25,418 10.0 C++

The MongoDB Database

Since the data is not strongly relational, NoSQL databases such as Amazon DynamoDB, Apache Cassandra, or MongoDB will be a better choice here. If we do decide to use an SQL database, we can use something like Azure SQL Database or Amazon RDS.

GlusterFS

19 4,478 6.4 C

Gluster Filesystem : Build your distributed storage in minutes

But where can we store files at scale? Well, object storage is what we're looking for. Object stores break data files up into pieces called objects. They then store those objects in a single repository, which can be spread out across multiple networked systems. We can also use distributed file storage such as HDFS or GlusterFS.

envoy

67 23,886 10.0 C++

Cloud-native high-performance edge/middle/service proxy

Service-to-service communication is essential in a distributed application, but routing this communication, both within and across application clusters, becomes increasingly complex as the number of services grows. A service mesh enables managed, observable, and secure communication between individual services. It works with a service discovery protocol to detect services. Istio and Envoy are some of the most commonly used service mesh technologies.

elasticsearch-mapper-attachments

102 503 0.0 Java

Discontinued Mapper Attachments Type plugin for Elasticsearch

Sometimes a traditional DBMS is not performant enough. We need something that lets us store, search, and analyze huge volumes of data in near real-time and return results within milliseconds. Elasticsearch can help us with this use case.

consul

57 27,774 9.9 Go

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.

Consul

ArangoDB

17 13,340 9.9 C++

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.

For mutual friends, we can build a social graph for every user. Each node in the graph will represent a user, and a directional edge will represent followers and followees. After that, we can traverse the followers of a user to find and suggest a mutual friend. This would require a graph database such as Neo4j or ArangoDB.
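
The traversal can be sketched with an in-memory adjacency map instead of a graph database. This is an illustration under stated assumptions: `suggest_by_mutuals` is a hypothetical helper, the graph is a dict of sets, and candidates are ranked by how many of the user's connections also point to them.

```python
from collections import Counter


def suggest_by_mutuals(follows, user):
    """Rank users two hops away by the number of shared connections.

    follows: dict mapping a user to the set of users they follow (directed edges).
    Returns a list of (candidate, mutual_count), highest count first.
    """
    direct = follows.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in follows.get(friend, set()):
            # Skip the user themselves and anyone they already follow.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A graph database runs the same two-hop walk as a query, which avoids loading the whole graph into application memory.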

murder

1,344 11 10.0 Ruby

Large scale server deploys using BitTorrent and the BitTornado library (by ervinb)

Let's design a Twitter-like social media service, similar to services like Facebook, Instagram, etc.

Stripe

300 3,596 8.9 PHP

PHP library for the Stripe API.

Handling payments at scale is challenging. To simplify our system, we can use a third-party payment processor like Stripe or PayPal. Once the payment is complete, the payment processor will redirect the user back to our application, and we can set up a webhook to capture all the payment-related data.

Apache Spark

101 38,320 10.0 Scala

Apache Spark - A unified analytics engine for large-scale data processing

Recording analytics and metrics is one of our extended requirements. We can capture the data from different services and run analytics on it using Apache Spark, an open-source unified analytics engine for large-scale data processing. Additionally, we can store critical metadata in the views table to increase data points within our data.

Redis

318 64,705 9.7 C

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.

A quadtree seems perfect for our use case. We can update the quadtree every time we receive a new location update from the driver. To reduce the load on the quadtree servers, we can use an in-memory datastore such as Redis to cache the latest updates. And by applying a mapping algorithm such as the Hilbert curve, we can perform efficient range queries to find nearby drivers for the customer.
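
To make the Hilbert curve idea concrete, here is a small sketch that maps a grid cell to its position along the curve, so that cells close together on the map tend to receive close one-dimensional indices. The name `hilbert_index` and the assumption that locations have already been bucketed into an `n` by `n` grid (with `n` a power of two) are choices made for this example.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) in an n x n grid to its distance along the Hilbert curve.

    n must be a power of two; x and y must be in range(n).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Storing drivers under this index lets a nearby-driver lookup scan a contiguous range of keys, at the cost of occasionally missing neighbors that sit across a curve boundary and need a second range.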

Neo4j

49 12,430 9.9 Java

Graphs for Everyone

For mutual friends, we can build a social graph for every user. Each node in the graph will represent a user, and a directional edge will represent followers and followees. After that, we can traverse the followers of a user to find and suggest a mutual friend. This would require a graph database such as Neo4j or ArangoDB.

NATS

106 14,720 9.8 Go

High-Performance server for NATS.io, the cloud and edge native messaging system.

Exactly-once delivery and message ordering are challenging in a distributed system. We can use a dedicated message broker such as Apache Kafka or NATS to make our notification system more robust.

Memcached

55 13,178 8.5 C

memcached development tree

In a location services-based platform, caching is important. We have to be able to cache the recent locations of the customers and drivers for fast retrieval. We can use solutions like Redis or Memcached, but what kind of cache eviction policy would best fit our needs?
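
One common candidate for that policy is least recently used (LRU) eviction, which keeps the locations that were read or written most recently. The sketch below is an illustrative in-process version; the `LRUCache` class is hypothetical and is not how Redis or Memcached implement eviction internally.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

For example, with a capacity of two, writing locations for drivers `d1` and `d2`, reading `d1`, and then writing `d3` evicts `d2`, because it is the entry that was touched least recently.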

Apache Solr

31 4,365 0.0 Java

Apache Lucene and Solr open-source search software

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It is built on top of Apache Lucene.

kubernetes

657 106,611 10.0 Go

Production-Grade Container Scheduling and Management

Containers (e.g., Kubernetes, Amazon ECS)

ApacheKafka

104 28 0.0

A curated re-sources list for awesome Apache Kafka

Push notifications are an integral part of any social media platform. We can use a message queue or a message broker such as Apache Kafka with the notification service to dispatch requests to Firebase Cloud Messaging (FCM) or Apple Push Notification Service (APNS) which will handle the delivery of the push notifications to user devices.

gRPC

201 40,685 9.9 C++

The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)

gRPC is a modern open-source high-performance Remote Procedure Call (RPC) framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking, authentication and much more.

foundation

210 86 0.0

GraphQL Foundation Charter and Legal Documents (by graphql)

GraphQL is a query language and server-side runtime for APIs that prioritizes giving clients exactly the data they request and no more. It was developed by Facebook and later open-sourced in 2015.

FFmpeg

485 42,250 10.0 C

Mirror of https://git.ffmpeg.org/ffmpeg.git

This results in a smaller size file and a much more optimized format for the target devices. Standalone solutions such as FFmpeg or cloud-based solutions like AWS Elemental MediaConvert can be used to implement this step of the pipeline.

etcd

61 46,292 9.9 Go

Distributed reliable key-value store for the most critical data of a distributed system

etcd

CouchDB

27 6,009 9.5 Erlang

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability

Example: Apache Cassandra, CouchDB.

nodejs-storage

20 878 8.5 TypeScript

Node.js client for Google Cloud Storage: unified object storage for developers and enterprises, from live data serving to data analytics/ML to data archiving.

We can use object stores like Amazon S3, Azure Blob Storage, or Google Cloud Storage for this use case.

nodejs-pubsub

24 512 8.6 TypeScript

Node.js client for Google Cloud Pub/Sub: Ingest event streams from anywhere, at any scale, for simple, reliable, real-time stream analytics.

Google Pub/Sub

Apache Cassandra

35 8,507 9.9 Java

Mirror of Apache Cassandra

Data partitioning in Apache Cassandra.

auth0-java

129 279 8.2 Java

Java client library for the Auth0 platform

Auth0

Aerospike

15 968 8.8 C

Aerospike Database Server – flash-optimized, in-memory, nosql database

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on dev.to
Database NoSQL Java Distributed Systems Go
Post date: 16 Aug 2022
