tiflash
version2
Metric | tiflash | version2 |
---|---|---|
Mentions | 5 | 6 |
Stars | 929 | 1,216 |
Growth | 1.3% | 2.4% |
Activity | 9.7 | 5.8 |
Last commit | 2 days ago | 3 months ago |
Language | C++ | C++ |
License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tiflash
- Significantly faster quicksort using SIMD
This is great, and can definitely help a lot of database and big data projects. I can immediately imagine this being a perfect match for an open source HTAP system (https://github.com/tigraph/tidb) which uses SIMD in its columnar processing engine, TiFlash (https://github.com/pingcap/tiflash).
- Best language for database kernel development?
One of the founders of TiDB/TiKV here, from [PingCAP](https://pingcap.com)
I thought about this problem with my peers when I started to build [TiDB](https://github.com/pingcap/tidb) seven years ago. At that time, nearly all of us were familiar with Go, so we decided to use Go to build the SQL layer of TiDB. Thanks to Go, we could develop TiDB very quickly and released the first MVP in half a year. I clearly remember the feeling when we ran TPC-C successfully; although the tpmC was just 1 at that time, it was a good start for us.
But Go had some problems: the GC was not good back then, the fair scheduling might cause latency problems, and data races could sometimes happen. So when we decided to build a distributed storage layer (aha, [TiKV](https://github.com/tikv/tikv)), we wanted to use another language to guarantee safety. I really admire our courage: we chose Rust, which had just released 1.0 and lacked lots of libraries at that time. Now it seems that this was an awesome choice. TiKV has graduated from CNCF and is used as a building block not only for TiDB but also for other distributed systems. Thanks, Rust.
When TiDB started being used in many companies, we found that our customers not only ran lots of online transactions in TiDB but also wanted to run some real-time analytic queries directly, because the data was already in TiDB. So we decided to build an HTAP database by introducing a column store beside TiKV: this is [TiFlash](https://github.com/pingcap/tiflash). We built TiFlash based on ClickHouse, so of course we use C++.
As you can see, to build just one integrated database, TiDB, we use at least three languages, and each language was introduced for its own reason. We can treat the distributed database as a service system: each service can be built with your favorite language, and the services are linked by gRPC, as TiDB does now. You may object: "hey, guys, you are building a database, performance is very important". Yes, this is true, but we are also building a complex distributed system, especially on the cloud. Scale-out, elasticity, and user experience are important too. This is a trade-off for an engineer :-)
- TiFlash: The columnar storage engine of TiDB, is now open sourced
- Tiflash, Yet another columnar storage engine based on ClickHouse
- TiFlash: Analytical Engine for TiDB
version2
- SIMD intrinsics and the possibility of a standard library solution
Vector class library - 938 GH stars
- Checking for the absence of a string, naive AVX-512 edition
- -🎄- 2022 Day 4 Solutions -🎄-
Most of the time is spent parsing, but this problem lends itself nicely to a SIMD formulation, which, using vectorclass, doesn't even require detailed knowledge of the intrinsics. Hot runs take ~14 µs on a Core i9-12900K, including I/O. The full code is [here](https://github.com/ahans/aoc2022/blob/main/cpp/day04.cc); the interesting part is where we process 32 elements at once.
- Significantly faster quicksort using SIMD
- Parsing JSON faster with Intel AVX-512
- What do you think is faster for batch-processing a lot of "double-type" arithmetic?
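The Day 4 mention above describes comparing range pairs 32 at a time. The commenter's snippet is not reproduced on this page, so the following is only a rough, independent sketch of the same idea in portable C++, not the linked solution. It uses neither vectorclass nor intrinsics. All names (`RangePairs`, `Counts`, `count_pairs`, `kLanes`) are hypothetical, and it assumes that every range endpoint fits in one byte and that the input is already parsed into a struct-of-arrays layout. Whether the fixed-width inner loop becomes SIMD instructions depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct-of-arrays layout: one byte per range endpoint,
// so 32 pairs occupy 32 contiguous bytes in each array.
struct RangePairs {
    std::vector<std::uint8_t> a_lo, a_hi, b_lo, b_hi;
};

struct Counts {
    std::size_t contained = 0;    // one range fully contains the other
    std::size_t overlapping = 0;  // the ranges share at least one value
};

constexpr std::size_t kLanes = 32;  // assumed block width (32 x 8-bit lanes)

// Branch-free comparison of pairs [begin, end); results are added to `out`.
inline void accumulate(const RangePairs& p, std::size_t begin,
                       std::size_t end, Counts& out) {
    unsigned contained = 0, overlapping = 0;
    for (std::size_t i = begin; i < end; ++i) {
        const std::uint8_t al = p.a_lo[i], ah = p.a_hi[i];
        const std::uint8_t bl = p.b_lo[i], bh = p.b_hi[i];
        // Bitwise & and | on comparison results avoid short-circuit branches.
        contained += ((al <= bl) & (bh <= ah)) | ((bl <= al) & (ah <= bh));
        overlapping += (al <= bh) & (bl <= ah);
    }
    out.contained += contained;
    out.overlapping += overlapping;
}

Counts count_pairs(const RangePairs& p) {
    const std::size_t n = p.a_lo.size();
    // Precondition: all four endpoint arrays have the same length.
    assert(p.a_hi.size() == n && p.b_lo.size() == n && p.b_hi.size() == n);

    Counts c;
    std::size_t i = 0;
    // Full blocks of 32 pairs: a constant trip count is easy to vectorize.
    for (; i + kLanes <= n; i += kLanes) {
        accumulate(p, i, i + kLanes, c);
    }
    // Remaining pairs (fewer than 32) are handled by the same scalar code.
    accumulate(p, i, n, c);
    return c;
}
```

Keeping each endpoint in its own byte array is the design choice that matters here: it lets one 256-bit register hold the same endpoint for 32 pairs, which is presumably where the "32 elements at once" figure comes from.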
What are some alternatives?
vops
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
aoc22 - Advent of Code solutions for 2022 (in Python)
advent2022
adventOfCode2022
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
Day4 - My (messy) Python3 solution for day4's puzzle.
advent-of-code
adventOfCode2022 - For tracking my advent of code participation 2022
advent-of-code-2022-rust - My Rust advent of code 2022 solutions
simd_decimal - vectorized decimal parsing
aoc22 - aoc22