Online courses to learn more about databases and the concepts taught in Week 7?
1 project | reddit.com/r/cs50 | 10 May 2021
Check out this course from CMU.
C++ Project Ideas
2 projects | reddit.com/r/cpp_questions | 28 Jan 2021
Are there any minimal relational DBs with understandable sources for learning internals?
2 projects | reddit.com/r/Database | 20 Jan 2021
Andy Pavlo's courses at CMU teach database internals (e.g., https://15445.courses.cs.cmu.edu/fall2020/). In the project coursework, you build features for a teaching relational database called BusTub (https://github.com/cmu-db/bustub).
C++ use cases for a backend developer
1 project | reddit.com/r/cpp_questions | 6 Jan 2021
What about writing a database engine? I've also been learning C++ and have been studying this project to learn how DBMSs work internally: https://github.com/cmu-db/bustub
Distributed SQL Essentials: Sharding and Partitioning in YugabyteDB
1 project | dev.to | 21 Nov 2021
The SST files store the key-value pairs for tables and indexes. Sharding is the right term here because each tablet is a database in its own right (based on RocksDB), with its own protection. This looks like the sharded databases described above, except that the shards are not SQL databases but key-value document stores. They have all the required features for a reliable datastore, with transactions and strong consistency. However, they don't carry the burden of being managed as multiple databases, because the SQL layer sits above them. Joins and secondary indexes are not processed at this level, because doing so would prevent cross-shard transactions.
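The tablet layout described above can be sketched in a few lines: rows are routed to a tablet by hashing their key, and each tablet acts as an independent key-value store (RocksDB, in YugabyteDB's case). This is an illustrative toy, not YugabyteDB's actual routing code; names and the tablet count are assumptions.

```python
import hashlib

NUM_TABLETS = 4  # illustrative; real deployments pick this per table

def tablet_for(key: str) -> int:
    """Map a key to a tablet by hashing, so rows spread evenly across shards."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:2], "big") % NUM_TABLETS

# Each tablet behaves like its own small key-value database.
tablets = [dict() for _ in range(NUM_TABLETS)]

def put(key: str, value: str) -> None:
    tablets[tablet_for(key)][key] = value

def get(key: str):
    return tablets[tablet_for(key)].get(key)

put("user#1", "alice")
put("user#2", "bob")
assert get("user#1") == "alice"
```

Because the routing function is deterministic, any node can compute which tablet owns a key without consulting the others; cross-shard concerns (joins, secondary indexes, distributed transactions) are left to the SQL layer above, as the excerpt notes.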
Hello guys, needed help for building a key-value data store
1 project | reddit.com/r/Database | 9 Oct 2021
- RocksDB - kv store that uses LSM tree;
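The LSM-tree idea behind RocksDB can be sketched minimally: writes land in an in-memory memtable, which is flushed as an immutable sorted run (the analogue of an SST file) once it fills up; reads check the memtable first, then runs from newest to oldest so later writes win. This is a toy illustration of the concept, not RocksDB's implementation.

```python
from bisect import bisect_left

class ToyLSM:
    """Toy LSM tree: a memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}
        self.runs = []  # newest run last; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable into a sorted, immutable run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest first: later writes shadow older ones
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # triggers a flush
db.put("a", 3)   # newer value shadows the flushed one
assert db.get("a") == 3
```

A real LSM store adds write-ahead logging, background compaction of runs, and filters (Bloom, or the Ribbon filter mentioned below) to skip runs that cannot contain a key.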
We built an open-source SQL DB for Intel SGX enclaves
3 projects | reddit.com/r/cybersecurity | 7 Aug 2021
Hi everyone! Our team just released EdgelessDB, an open-source database built on MariaDB that runs completely inside Intel SGX enclaves. As its storage engine, it uses RocksDB with a custom encryption engine. The engine uses AES-GCM and is optimized for RocksDB's specific SST file layout and the enclave environment. It has some nice properties like global confidentiality and verifiability, and it considers strong attackers like malicious admins or rootkits. It also delivers rather low overhead (<10% for the TPC-C benchmark on Azure). In short: all data is only ever decrypted inside the enclave. This is different from other databases, where data and corresponding keys are processed in the clear in memory. We believe this is useful because (1) it's very secure and (2) it enables some interesting use cases, like secure data pooling between parties. If you're interested in trying it out: here's a quickstart guide. In essence, you can run the Docker image with a single command on any recent Intel Xeon with SGX. Code and more info can be found on GitHub. Would be great to get your feedback on this :-)
Apache Hudi - The Streaming Data Lake Platform
8 projects | dev.to | 27 Jul 2021
Hudi tables can be used as sinks for Spark/Flink pipelines, and the Hudi writing path provides several enhanced capabilities over the file writing done by vanilla Parquet/Avro sinks. Hudi carefully classifies write operations into incremental (insert, upsert, delete) and batch/bulk operations (insert_overwrite, insert_overwrite_table, delete_partition, bulk_insert) and provides relevant functionality for each operation in a performant and cohesive way. Both upsert and delete operations automatically handle merging of records with the same key in the input stream (say, a CDC stream obtained from an upstream table), then look up the index, and finally invoke a bin-packing algorithm to pack data into files while respecting a pre-configured target file size. An insert operation, on the other hand, is intelligent enough to avoid the precombining and index lookup while retaining the benefits of the rest of the pipeline. Similarly, the bulk_insert operation provides several sort modes for controlling initial file sizes and file counts when importing data from an external table into Hudi. The other batch write operations provide MVCC-based implementations of the typical overwrite semantics used in batch data pipelines, while retaining all the transactional and incremental processing capabilities, making it seamless to switch between incremental pipelines for regular runs and batch pipelines for backfilling or dropping older partitions. The write pipeline also contains lower-layer optimizations for handling large merges by spilling to RocksDB or an external spillable map, and multi-threaded/concurrent I/O to improve write performance.
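The "precombine" step that upsert performs before the index lookup can be sketched as follows: when an input batch (say, a CDC stream) contains several records with the same key, only the record with the highest precombine field (typically an event timestamp) survives. This is an illustrative sketch, not Hudi's implementation; the `id` and `ts` field names are assumptions.

```python
def precombine(records, key_field="id", precombine_field="ts"):
    """Keep only the latest record per key, ordered by the precombine field."""
    latest = {}
    for rec in records:
        key = rec[key_field]
        if key not in latest or rec[precombine_field] > latest[key][precombine_field]:
            latest[key] = rec
    return list(latest.values())

batch = [
    {"id": 1, "ts": 10, "val": "a"},
    {"id": 1, "ts": 12, "val": "b"},  # newer version of key 1 wins
    {"id": 2, "ts": 11, "val": "c"},
]
merged = precombine(batch)
assert {r["val"] for r in merged} == {"b", "c"}
```

After this deduplication, Hudi's write path would look up where each surviving key lives (the index) and bin-pack the records into appropriately sized files, as the paragraph above describes.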
There is a fundamental tradeoff today in data lakes between faster writing and great query performance. Faster writing typically involves writing smaller files (and later clustering them) or logging deltas (and later merging on read). While this already provides good performance, the pursuit of great query performance often warrants opening fewer files/objects on lake storage and maybe pre-materializing the merges between base and delta logs. After all, most databases employ a buffer pool or block cache to amortize the cost of accessing storage. Hudi already contains several design elements that are conducive to building a caching tier (write-through, or even just populated by an incremental query) that will be multi-tenant and can cache pre-merged images of the latest file slices, consistent with the timeline. The Hudi timeline can be used to simply communicate caching policies, just like how we perform inter-table service co-ordination. Historically, caching has been done closer to the query engines or via intermediate in-memory file systems. By placing a caching tier closer to and more tightly integrated with a transactional lake storage like Hudi, all query engines would be able to share and amortize the cost of the cache, while supporting updates/deletes as well. We look forward to building a buffer pool for the lake that works across all major engines, with contributions from the rest of the community.
Ribbon filter: Practically smaller than Bloom and Xor
2 projects | news.ycombinator.com | 11 Jul 2021
How to configure Kafka so that it picks up and uses the latest RocksDB for Kafka Streams
2 projects | reddit.com/r/Kafka | 13 Jun 2021
As per this PR (https://github.com/facebook/rocksdb/pull/7714), I'm seeing that RocksDB has fixed this issue on their end. Can anyone please tell me how to update, or how to let Kafka Streams use a later version of RocksDB? Please let me know if this is the correct approach or whether I should do something different.
Just finished a Biff rewrite (batteries-included web framework)
3 projects | reddit.com/r/Clojure | 18 May 2021
This is an issue in RocksDB's RocksJava lib; they sorted out M1 support quickly in the main RocksDB libs, but the Java lib still has an open issue. https://github.com/facebook/rocksdb/issues/7720
Has anyone managed to upgrade to v16 with Rook?
1 project | reddit.com/r/ceph | 7 May 2021
Fix RocksDB SIGILL error on Raspberry PI 4
Nano full node on Mac M1
2 projects | reddit.com/r/nanocurrency | 26 Feb 2021
There's an issue with RocksDB, and some other irritating CMake things I forgot about in the last week. This needs to get merged, for one: https://github.com/facebook/rocksdb/pull/7714
What are some alternatives?
LMDB - Read-only mirror of official repo on openldap.org. Issues and pull requests here are ignored. Use OpenLDAP ITS for issues.
LevelDB - LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
SQLite - Unofficial git mirror of SQLite sources (see link for build instructions)
sled - the champagne of beta embedded databases
ClickHouse - ClickHouse® is a free analytics DBMS for big data
TileDB - The Universal Storage Engine
Sophia - Modern transactional key-value/row storage library.
upscaledb - A very fast lightweight embedded database engine with a built-in query language.
Bedrock - Rock solid distributed database specializing in active/active automatic failover and WAN replication
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Hiredis - Minimalistic C client for Redis >= 1.2