cockroach
vitess
Metric | cockroach | vitess
---|---|---
Mentions | 87 | 51
Stars | 26,855 | 15,824
Growth | 0.8% | 1.3%
Activity | 10.0 | 9.9
Last commit | 3 days ago | 4 days ago
Language | Go | Go
License | GNU General Public License v3.0 or later | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
cockroach
- Good database solution
-
Does Go work well as a systems language?
You absolutely can write very high performance software in Go, that's kind of the point. You can efficiently interface with C libraries. You can create the sort of software everyone says should be done in Rust, like databases and web servers and system orchestration and games and every other goddamn thing that people will say isn't the right choice for Go.
-
Embed hard-coded SQL into binaries for a cleaner look!
PostgreSQL Parser separated from CockroachDB, a distributed DB.
- Any self hostable postgres, clustering, replication and fail over system?
-
Analysing Github Stars - Extracting and analyzing data from Github using Apache NiFi®, Apache Kafka® and Apache Druid®
Spencer Kimball (now CEO at CockroachDB) wrote an interesting article on this topic in 2021 where they created spencerkimball/stargazers based on a Python script. So I started thinking: could I create a data pipeline using NiFi and Kafka (two OSS tools often used with Druid) to get the API data into Druid - and then use SQL to do the analytics? The answer was yes! And I have documented the outcome below. Here’s my analytical pipeline for GitHub stars data using NiFi, Kafka and Druid.
- Ask HN: What is your distributed and fault-tolerant PostgreSQL setup?
-
Anyone had a success story of replacing C++ with Go?
One of the most popular distributed DBs is built in Go: https://www.cockroachlabs.com/
-
Display CockroachDB metrics in Splunk Dashboards
Recently, I worked on such an integration with Splunk. The Splunk dashboard files that emulate the DB Console are now available in our repo for everyone's benefit.
-
How do I implement an HA PostgreSQL setup in k8s/k3s?
Technically not postgres, but could be worth checking out CockroachDB; it can use any postgres driver from programming languages and is built for distribution. I recently moved from postgres to cockroach (with golang postgres driver) and didn't need to change a single query
vitess
-
Want to avoid MySQL but find PlanetScale really appealing
A lot of this is possible thanks to the magic of Vitess.
-
YouTube System Design
### YouTube

The popular implementations of an on-demand video streaming service are the following:

- YouTube
- Netflix
- Vimeo
- TikTok

---

#### Requirements

- The user (**client**) can upload video files
- The user can stream video content
- The user can search for videos based on the video title

---

#### Data storage

##### Database schema

- The primary entities are the videos, the users, and the comments tables
- The relationship between the users and the videos is 1-to-many
- The relationship between the users and the comments table is 1-to-many
- The relationship between the videos and the comments table is 1-to-many

---

##### Type of data store

- The wide-column data store ([LSM](https://en.wikipedia.org/wiki/Log-structured_merge-tree) tree-based) such as [Apache HBase](https://hbase.apache.org/) is used to persist thumbnail images for clumping the files together, fault-tolerance, and replication
- A cache server such as Redis is used to store the metadata of popular video content
- Message queue such as Apache Kafka is used for the asynchronous processing (encoding) of videos
- A relational database such as MySQL stores the metadata of the users and the videos
- The video files are stored in a managed object storage such as AWS S3
- Lucene-based inverted-index data store such as Apache Solr is used to persist the video index data to provide search functionality

---

#### High-level design

- Popular video content is streamed from CDN
- Video encoding (**transcoding**) is the process of converting a video format to other formats (MPEG, HLS) to provide the best stream possible on multiple devices and bandwidth
- A message queue can be configured between services for parallelism and improved fault tolerance
- Codecs (H.264, VP9, HEVC) are compression and decompression algorithms used to reduce video file size while preserving video quality
- The popular video streaming protocols (data transfer standard) are **MPEG-DASH** (Moving Picture Experts Group - Dynamic Adaptive Streaming over HTTP), **Apple HLS** (HTTP Live Streaming), **Microsoft Smooth Streaming**, and **Adobe HDS** (HTTP Dynamic Streaming)

---

#### Video upload workflow

1. The user (**client**) executes a DNS query to identify the server
2. The client makes an HTTP connection to the load balancer
3. The video upload requests are rate limited to prevent malicious clients
4. The load balancer delegates the client's request to an API server (**web server**) with free capacity
5. The web server delegates the client's request to an app server that handles the API endpoint
6. The ID of the uploaded video is stored on the message queue for asynchronous processing of the video file
7. The title and description (**metadata**) of the video are stored in the metadata database
8. The app server queries the object store service to generate a pre-signed URL for storing the raw video file
9. The client uploads the raw video file directly to the object store using the pre-signed URL to save the system network bandwidth
10. The transcoding servers query the message queue using the publish-subscribe pattern to get notified on uploaded videos
11. The transcoding server fetches the raw video file by querying the raw object store
12. The transcoding server transcodes the raw video file into multiple codecs and stores the transcoded content on the transcoded object store
13. The thumbnail server generates on average five thumbnail images for each video file and stores the generated images on the thumbnail store
14. The transcoding server persists the ID of the transcoded video on the message queue for further processing
15. The upload handler service queries the message queue through the publish-subscribe pattern to get notified on transcoded video files
16. The upload handler service updates the metadata database with metadata of transcoded video files
17. The upload handler service queries the notification service to notify the client of the video processing status
18. The database can be partitioned through [consistent hashing](https://systemdesign.one/consistent-hashing-explained/) (key = user ID or video ID)
19. [Block matching](https://en.wikipedia.org/wiki/Block-matching_algorithm) or [Phase correlation](https://en.wikipedia.org/wiki/Phase_correlation) algorithms can be used to detect the duplicate video content
20. The web server (API server) must be kept stateless for scaling out through replication
21. The video file is stored in multiple resolutions and formats in order to support multiple devices and bandwidth
22. The video can be split into smaller chunks by the client before upload to support the resume of broken uploads
23. Watermarking and encryption can be used to protect video content
24. The data centers are added to improve latency and data recovery at the expense of increased maintenance workflows
25. Dead letter queue can be used to improve fault tolerance and error handling
26. Chaos engineering is used to identify the failures on networks, servers, and applications
27. Load testing and chaos engineering are used to improve fault tolerance
28. [RAID](https://en.wikipedia.org/wiki/RAID) configuration improves the hardware throughput
29. The data store is partitioned to spread the writes and reads at the expense of difficult joins, transactions, and fat client
30. Federation and sharding are used to scale out the database
31. The write requests are redirected to the leader and the read requests are redirected to the followers of the database
32. [Vitess](https://vitess.io/) is a storage middleware for scaling out MySQL
33. Vitess redirects the read requests that require fresh data to the leader (For example, update user profile operation)
34. Vitess uses a lock server (Apache Zookeeper) for automatic sharding and leader election on the database layer
35. Vitess supports RPC-based joins, indexing, and transactions on SQL database
36. Vitess allows offloading the partitioning logic from the application and improves database queries by caching
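
The consistent-hashing partitioning mentioned in the list above (key = user ID or video ID) can be sketched roughly as follows. This is a minimal illustration and not how Vitess or any particular database implements it: the `Ring` type, the shard names, the FNV hash, and the choice of 100 virtual points per shard are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// Ring is a minimal consistent-hash ring (hypothetical type for illustration).
// Each shard is placed on the ring at several virtual points so that keys
// spread more evenly across shards.
type Ring struct {
	points []uint32          // sorted hash positions on the ring
	owner  map[uint32]string // hash position -> shard name
}

// hashKey maps a string to a position on the ring using 32-bit FNV-1a.
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places every shard on the ring at `replicas` virtual points.
func NewRing(shards []string, replicas int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, s := range shards {
		for i := 0; i < replicas; i++ {
			p := hashKey(s + "#" + strconv.Itoa(i))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare hash collision; the earlier shard keeps the point
			}
			r.owner[p] = s
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the shard that owns the first point at or after the key's
// hash, wrapping around to the first point when the key hashes past the end.
func (r *Ring) Lookup(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"shard-a", "shard-b", "shard-c"}, 100)
	for _, id := range []string{"user:42", "video:abc123"} {
		fmt.Printf("%s -> %s\n", id, ring.Lookup(id))
	}
}
```

The property that makes this attractive for partitioning is that adding a shard only moves the keys that now fall on the new shard's points; every other key keeps its previous owner.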
-
Typesafe Database Queries on the Edge
PlanetScale is a serverless MySQL database provider which is based on Vitess. You get the scaling benefits of Vitess without the need to manage it yourself.
- YouTube confirms that it has removed the “sort by oldest/newest” option
-
Ask HN: Real-world anecdotes of MySQL at scale?
Are you referring to distributed MySQL such as Vitess? It is the backend for Slack and GitHub; it was also the backend for YouTube in the past.
There’s Vitess that’s been mentioned on HN a lot recently. https://vitess.io/
-
One million queries per second with MySQL
> A relational database without relations is an oxymoron.
OK. You're the only one talking to this straw man though. :-) Every Vitess user that I'm aware of has a pretty typical 2NF/3NF schema design. A small sampling of them is listed here: https://vitess.io
You set up your data distribution/partitioning/sharding scheme so that you have data locality for 99.9999+% of your queries -- meaning that the query executes against a data subset that lives on a single shard/node (e.g. sharding by customer) -- and you live with the performance hit and consistency tradeoffs for those very rare cases where cross-shard queries cannot be avoided (Vitess does support this). You should do this even if the solution you're using claims to have distributed SQL with ACID and MVCC guarantees/properties. There's no magic that improves the speed of light and removes other resource constraints. In practice most people say they want perfect security/consistency but then realize that the costs (perf, resources, $$, etc) are simply so high that it is not practical for their business/use case.
I know MySQL fairly well (I started working at MySQL AB in 2003) and you can certainly claim that "MySQL-compatible" is dishonest but I would offer a counter claim that either you don't know this space very well or you're not operating in good faith here.
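
The data-locality idea in the comment above (shard by customer so that almost every query touches one shard, and accept a scatter across all shards for the rare query that cannot be scoped) might look roughly like this. It is a sketch under stated assumptions, not Vitess's actual routing: the `Query` type, `shardFor`, `Route`, and the hash-modulo scheme are all hypothetical names and choices made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Query is a hypothetical, simplified request. CustomerID is empty when the
// query is not scoped to a single customer.
type Query struct {
	SQL        string
	CustomerID string
}

// shardFor maps a customer to one of n shards with a simple hash-modulo scheme.
func shardFor(customerID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(customerID))
	return int(h.Sum32() % uint32(n))
}

// Route returns the shards a query must touch: exactly one when the query
// carries the sharding key, all of them (scatter-gather) otherwise.
func Route(q Query, n int) []int {
	if q.CustomerID != "" {
		return []int{shardFor(q.CustomerID, n)}
	}
	all := make([]int, n)
	for i := range all {
		all[i] = i
	}
	return all
}

func main() {
	const shards = 4
	scoped := Query{SQL: "SELECT * FROM orders WHERE customer_id = ?", CustomerID: "acme"}
	global := Query{SQL: "SELECT COUNT(*) FROM orders"}
	fmt.Println("scoped query touches shards:", Route(scoped, shards))
	fmt.Println("global query touches shards:", Route(global, shards))
}
```

The scoped query pays for one shard; the unscoped one pays for every shard plus the cost of merging results, which is the performance hit the comment describes.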
-
Need ideas for dealing with networking for high-throughput IoT platform
Rather than do this sharding manually, you can use something like Vitess.
-
Databases inside or outside k8s cluster?
Examples: Vitess, MySQL cluster, YugabyteDB, ScyllaDB, Couchbase, ArangoDB
-
How I made a really fast Link Shortener that runs on the edge
The frontend is built with Next.js which is a full stack React framework. I'm using tRPC as my API layer for that sweet type-safety. I wrote a blog about tRPC if you're not familiar with it. The database is a MySQL database (Vitess to be precise) provided by PlanetScale.
What are some alternatives?
supabase - The open source Firebase alternative. Follow to stay updated about our public Beta.
tidb - TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://tidbcloud.com/free-trial
citus - Distributed PostgreSQL as an extension
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
go-mysql-elasticsearch - Sync MySQL data into elasticsearch
yugabyte-db - YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
rqlite - The lightweight, distributed relational database built on SQLite
InfluxDB - Scalable datastore for metrics, events, and real-time analytics
dgraph - Native GraphQL Database with graph backend
Redis - Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
Tile38 - Real-time Geospatial and Geofencing