tidb
vitess
| | tidb | vitess |
|---|---|---|
| Mentions | 27 | 60 |
| Stars | 36,046 | 17,777 |
| Growth | 0.8% | 1.3% |
| Activity | 10.0 | 9.9 |
| Latest commit | 6 days ago | 7 days ago |
| Language | Go | Go |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tidb
-
A MySQL compatible database engine written in pure Go
TiDB has been around for a while; it is distributed, written in Go and Rust, and MySQL-compatible. https://github.com/pingcap/tidb
Somewhat relatedly, StarRocks is also MySQL-compatible and written in Java and C++, but it targets OLAP use cases. https://github.com/StarRocks/starrocks
- Embed hard-coded SQL into binaries for a cleaner look!
-
Ask HN: Who is hiring? (January 2023)
PingCAP | https://www.pingcap.com | Database Engineer, Product Manager, Developer Advocate and more | Remote in California | Full-time
We work on a MySQL-compatible distributed database called TiDB https://github.com/pingcap/tidb/ and a key-value store called TiKV.
TiDB is written in Go and TiKV is written in Rust.
More roles and locations are available on https://www.pingcap.com/careers/
-
Database written purely in Go
Search for CockroachDB or TiDB
- MySQL-mimic - Python implementation of the MySQL server wire protocol.
- Apache Pegasus – A distributed key-value storage system
- Gitlab is splitting their main and ci Postgres databases
-
Open Source Databases in Go
tidb - TiDB is a distributed SQL database. Inspired by the design of Google F1.
-
Gitea – a painless self-hosted Git service
Gitea is very easy to use, but I find the Activity feature a little slow.
I tried the "Try Gitea" service and migrated our TiDB repo https://github.com/pingcap/tidb to it. When I clicked the Activity tab and selected the "1 year" period, the page took nearly 90 seconds to load. I also found that the Activity page is not cached: when I re-selected "1 year", the page took almost as long to load again.
I guess Gitea runs git commands to traverse all the logs for the period on every request. Maybe it could use a database to speed this up, or, like GitHub, only offer a period of at most "1 month".
-
Best language for database kernel development?
One of the founders of TiDB/TiKV from [PingCAP](https://pingcap.com) here.
My peers and I thought about this problem when we started to build [TiDB](https://github.com/pingcap/tidb) seven years ago. At that time, nearly all of us were familiar with Go, so we decided to use Go to build the SQL layer of TiDB. Thanks to Go, we could develop TiDB very quickly and released the first MVP in half a year. I clearly remember the feeling when we first ran TPC-C successfully: although the tpmC was just 1 at that time, it was a good start for us.
But Go had some problems: for example, the GC was not good back then, fair scheduling could cause latency problems, and data races could happen. So when we decided to build distributed storage (aha, [TiKV](https://github.com/tikv/tikv)), we wanted to use another language to guarantee safety. I really admire our courage: we chose Rust, which had only just released 1.0 and lacked lots of libraries at that time. Now it looks like an awesome choice. TiKV has graduated from the CNCF and is used as a building block not only for TiDB but also for other distributed systems. Thanks, Rust.
When TiDB started being used at many companies, we found that our customers not only ran lots of online transactions in TiDB but also wanted to run real-time analytic queries directly, because the data was already in TiDB. So we decided to build an HTAP database by introducing a column store alongside TiKV: this is [TiFlash](https://github.com/pingcap/tiflash). We built TiFlash based on ClickHouse, so of course we use C++.
As you can see, to build just one integrated database, TiDB, we use at least three languages, and each was introduced for its own reason. We can treat the distributed database as a service system: each service can be built with your favorite language, and the services are linked by gRPC, as TiDB does now. You may object: “hey, guys, you are building a database, performance is very important”. Yes, this is true, but we are also building a complex distributed system, especially on the cloud. Scale-out, elasticity, and user experience must be important too. This is the trade-off an engineer has to make :-)
vitess
-
A MySQL compatible database engine written in pure Go
With Vitess likely merging a lot of its binaries into a single unified binary: https://github.com/vitessio/vitess/issues/7471#issuecomment-...
... it would be a wild future if Vitess replaced the underlying MySQL engine with this as long as the performance is good enough.
-
Vitess 18
Why would it be a Google project? https://github.com/vitessio/vitess
-
PlanetScale Scaler Pro
This is great news. I browsed through https://github.com/vitessio/vitess/issues/12967.
Are there any public discussions of the other trade-offs Vitess has to make to enable foreign keys?
- Scaling Databases at Activision [pdf]
-
Want to avoid MySQL but find PlanetScale really appealing
A lot of this is possible thanks to the magic of Vitess.
-
YouTube System Design
### YouTube

The popular implementations of an on-demand video streaming service are the following:

- YouTube
- Netflix
- Vimeo
- TikTok

---

#### Requirements

- The user (**client**) can upload video files
- The user can stream video content
- The user can search for videos based on the video title

---

#### Data storage

##### Database schema

- The primary entities are the videos, the users, and the comments tables
- The relationship between the users and the videos is 1-to-many
- The relationship between the users and the comments table is 1-to-many
- The relationship between the videos and the comments table is 1-to-many

---

##### Type of data store

- A wide-column data store ([LSM](https://en.wikipedia.org/wiki/Log-structured_merge-tree) tree-based) such as [Apache HBase](https://hbase.apache.org/) is used to persist thumbnail images, clumping the files together and providing fault tolerance and replication
- A cache server such as Redis is used to store the metadata of popular video content
- A message queue such as Apache Kafka is used for the asynchronous processing (encoding) of videos
- A relational database such as MySQL stores the metadata of the users and the videos
- The video files are stored in a managed object store such as AWS S3
- A Lucene-based inverted-index data store such as Apache Solr is used to persist the video index data to provide search functionality

---

#### High-level design

- Popular video content is streamed from a CDN
- Video encoding (**transcoding**) is the process of converting a video into other formats (MPEG, HLS) to provide the best possible stream on multiple devices and bandwidths
- A message queue can be configured between services for parallelism and improved fault tolerance
- Codecs (H.264, VP9, HEVC) are compression and decompression algorithms used to reduce video file size while preserving video quality
- The popular video streaming protocols (data transfer standards) are **MPEG-DASH** (Moving Picture Experts Group - Dynamic Adaptive Streaming over HTTP), **Apple HLS** (HTTP Live Streaming), **Microsoft Smooth Streaming**, and **Adobe HDS** (HTTP Dynamic Streaming)

---

#### Video upload workflow

1. The user (**client**) executes a DNS query to identify the server
2. The client makes an HTTP connection to the load balancer
3. The video upload requests are rate limited to prevent malicious clients
4. The load balancer delegates the client's request to an API server (**web server**) with free capacity
5. The web server delegates the client's request to an app server that handles the API endpoint
6. The ID of the uploaded video is stored on the message queue for asynchronous processing of the video file
7. The title and description (**metadata**) of the video are stored in the metadata database
8. The app server queries the object store service to generate a pre-signed URL for storing the raw video file
9. The client uploads the raw video file directly to the object store using the pre-signed URL to save system network bandwidth
10. The transcoding servers query the message queue using the publish-subscribe pattern to get notified of uploaded videos
11. The transcoding server fetches the raw video file by querying the raw object store
12. The transcoding server transcodes the raw video file into multiple codecs and stores the transcoded content on the transcoded object store
13. The thumbnail server generates on average five thumbnail images for each video file and stores the generated images on the thumbnail store
14. The transcoding server persists the ID of the transcoded video on the message queue for further processing
15. The upload handler service queries the message queue through the publish-subscribe pattern to get notified of transcoded video files
16. The upload handler service updates the metadata database with metadata of transcoded video files
17. The upload handler service queries the notification service to notify the client of the video processing status
18. The database can be partitioned through [consistent hashing](https://systemdesign.one/consistent-hashing-explained/) (key = user ID or video ID)
19. [Block matching](https://en.wikipedia.org/wiki/Block-matching_algorithm) or [Phase correlation](https://en.wikipedia.org/wiki/Phase_correlation) algorithms can be used to detect duplicate video content
20. The web server (API server) must be kept stateless for scaling out through replication
21. The video file is stored in multiple resolutions and formats in order to support multiple devices and bandwidths
22. The video can be split into smaller chunks by the client before upload to support resuming broken uploads
23. Watermarking and encryption can be used to protect video content
24. Additional data centers improve latency and data recovery at the expense of increased maintenance workflows
25. A dead letter queue can be used to improve fault tolerance and error handling
26. Chaos engineering is used to identify failures in networks, servers, and applications
27. Load testing and chaos engineering are used to improve fault tolerance
28. [RAID](https://en.wikipedia.org/wiki/RAID) configuration improves the hardware throughput
29. The data store is partitioned to spread the writes and reads at the expense of difficult joins, transactions, and fat clients
30. Federation and sharding are used to scale out the database
31. The write requests are redirected to the leader and the read requests are redirected to the followers of the database
32. [Vitess](https://vitess.io/) is a storage middleware for scaling out MySQL
33. Vitess redirects the read requests that require fresh data to the leader (for example, the update user profile operation)
34. Vitess uses a lock server (Apache ZooKeeper) for automatic sharding and leader election on the database layer
35. Vitess supports RPC-based joins, indexing, and transactions on SQL databases
36. Vitess allows offloading partitioning logic from the application and improves database queries through caching
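The title-search requirement is served in this design by a Lucene-based inverted index such as Apache Solr. To show only the underlying data structure (this is not Solr's or Lucene's API), here is a minimal Go sketch; the naive lowercase-and-whitespace tokenizer and the AND semantics for multi-word queries are assumptions, and `titleIndex` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// titleIndex maps each lowercase title term to the set of video IDs containing it.
type titleIndex struct {
	postings map[string]map[int]struct{}
}

func newTitleIndex() *titleIndex {
	return &titleIndex{postings: make(map[string]map[int]struct{})}
}

// tokenize lowercases a title and splits it on whitespace (a deliberately naive analyzer).
func tokenize(s string) []string {
	return strings.Fields(strings.ToLower(s))
}

// Add records every term of the title in the postings list for videoID.
func (ix *titleIndex) Add(videoID int, title string) {
	for _, term := range tokenize(title) {
		if ix.postings[term] == nil {
			ix.postings[term] = make(map[int]struct{})
		}
		ix.postings[term][videoID] = struct{}{}
	}
}

// Search returns, in ascending order, the IDs of videos whose titles contain every query term.
func (ix *titleIndex) Search(query string) []int {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var result []int
	// Start from the first term's postings and keep IDs present in all the others.
	for id := range ix.postings[terms[0]] {
		matchesAll := true
		for _, t := range terms[1:] {
			if _, ok := ix.postings[t][id]; !ok {
				matchesAll = false
				break
			}
		}
		if matchesAll {
			result = append(result, id)
		}
	}
	sort.Ints(result)
	return result
}

func main() {
	ix := newTitleIndex()
	ix.Add(1, "Intro to Go")
	ix.Add(2, "Go concurrency patterns")
	ix.Add(3, "Cooking pasta")
	fmt.Println(ix.Search("go"))          // [1 2]
	fmt.Println(ix.Search("Go patterns")) // [2]
}
```

A production search engine adds stemming, relevance ranking, and on-disk segments on top of this basic term-to-postings mapping.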
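Items 31 and 33 describe leader/follower routing: writes go to the leader, ordinary reads go to followers, and reads that need fresh data (such as right after a profile update) go back to the leader. A minimal Go sketch of that policy follows; the `router` type, the three query kinds, and round-robin follower selection are assumptions for illustration, and the code does not model how Vitess itself exposes this choice.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// queryKind distinguishes the routing cases described in the list above.
type queryKind int

const (
	write     queryKind = iota // must go to the leader
	freshRead                  // needs the latest data, e.g. right after a profile update
	staleRead                  // tolerates replication lag, so a follower can serve it
)

// router is a hypothetical leader/follower routing policy, not Vitess's API.
type router struct {
	next      uint64 // round-robin counter; first field keeps it 64-bit aligned for atomics
	leader    string
	followers []string
}

// Route picks the database node for a query of kind k.
func (r *router) Route(k queryKind) string {
	// Writes and fresh reads go to the leader; so do stale reads when there are no followers.
	if k != staleRead || len(r.followers) == 0 {
		return r.leader
	}
	n := atomic.AddUint64(&r.next, 1) - 1
	return r.followers[n%uint64(len(r.followers))]
}

func main() {
	r := &router{leader: "primary", followers: []string{"replica-1", "replica-2"}}
	fmt.Println(r.Route(write))     // primary
	fmt.Println(r.Route(freshRead)) // primary
	fmt.Println(r.Route(staleRead)) // replica-1
	fmt.Println(r.Route(staleRead)) // replica-2
}
```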
-
Typesafe Database Queries on the Edge
PlanetScale is a serverless MySQL database provider which is based on Vitess. You get the scaling benefits of Vitess without the need to manage it yourself.
- YouTube confirms that it has removed the “sort by oldest/newest” option
-
Ask HN: Real-world anecdotes of MySQL at scale?
Are you referring to distributed MySQL such as Vitess? It is the backend for Slack and GitHub, and it was also the backend for YouTube in the past.
What are some alternatives?
supabase - The open source Firebase alternative.
cockroach - CockroachDB - the open source, cloud-native distributed SQL database.
citus - Distributed PostgreSQL as an extension
oceanbase - OceanBase is an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.
go-mysql-elasticsearch - Sync MySQL data into Elasticsearch
InfluxDB - Scalable datastore for metrics, events, and real-time analytics
kingshard - A high-performance MySQL proxy
Tile38 - Real-time Geospatial and Geofencing
go-mysql - a powerful MySQL toolset with Go
migrate - Database migrations. CLI and Golang library.