Composing generic data structures in go
3 projects | dev.to | 30 Nov 2021
Recently a colleague, Nathan, reflecting on CockroachDB, remarked (paraphrased from memory) that the key data structure is the interval btree. The story of Nathan's addition of the first interval btree to cockroach, and the power of copy-on-write data structures, is worthy of its own blog post for another day. It's Nathan's hand-specialization of that data structure that provided the basis (and tests) for the generalization I'll be presenting here. The reason for this specialization was largely the performance win: avoiding excessive allocations, pointer chasing, and the cost of type assertions that come with interface boxing.
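As a rough illustration of why generics help here, consider storing items inline in a type-parameterized slice instead of boxing each one behind an interface. This sketch is mine, not the article's or CockroachDB's code; the `interval` type and `insertSorted` helper are illustrative stand-ins for the real btree internals.

```go
package main

import "fmt"

// interval is a simple [start, end) pair; the name is illustrative,
// not CockroachDB's actual type.
type interval struct {
	start, end int
}

// node is a generic tree node. With a type parameter T, items are stored
// inline in the slice: no interface boxing, no per-item heap allocation,
// and no type assertions on the way back out.
type node[T any] struct {
	items    []T
	children []*node[T]
}

// intervalLess orders intervals by start, then end, mirroring the kind of
// comparison an interval btree needs.
func intervalLess(a, b interval) bool {
	if a.start != b.start {
		return a.start < b.start
	}
	return a.end < b.end
}

// insertSorted keeps items ordered using a caller-supplied comparison,
// which is how one generic structure can serve many key types.
func insertSorted[T any](items []T, item T, less func(a, b T) bool) []T {
	i := 0
	for i < len(items) && less(items[i], item) {
		i++
	}
	items = append(items, item) // grow by one
	copy(items[i+1:], items[i:]) // shift the tail right
	items[i] = item
	return items
}

func main() {
	var items []interval
	for _, iv := range []interval{{5, 9}, {1, 4}, {3, 8}} {
		items = insertSorted(items, iv, intervalLess)
	}
	fmt.Println(items) // sorted by start, then end
}
```

The same `less` function could be a type parameter or a struct field on `node`; either way the compiler specializes the layout, which is the allocation and type-assertion win the article is after.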
Stacked changes: how FB and Google engineers stay unblocked and ship faster
12 projects | news.ycombinator.com | 17 Nov 2021
I'm surprised Reviewable hasn't come up in this discussion. It does a great job of allowing stacked code reviews and even handles rebases nicely; the reviewer sees the diff between commit #1 and commit #1' (prime = after rebase).
CockroachDB has been using it since very early in the project.
1 project | reddit.com/r/facepalm | 6 Nov 2021
And even if you did want to run your database on a bunch of untrusted machines, a blockchain, being a linked list, is not a particularly efficient implementation. Its size increases linearly with the number of operations, which, for any rapid-fire application such as banking, means you have a tremendously inefficient marginal computational and storage cost per operation. You’d be considerably better off running something like Cockroach, or FoundationDB, or more ‘out-there’ offerings like Hypercore.
CockroachDB Grants and Schemas explained
1 project | dev.to | 28 Aug 2021
And here: https://github.com/cockroachdb/cockroach/issues/16790
Design to Duty: How we make architecture decisions at Adyen
1 project | dev.to | 28 Jul 2021
As you now know, we do not want to achieve this by restricting payments of some merchants to certain machines, as this would mean the machines are no longer linearly scalable. The information needs to be available locally, so we eventually decided on integrating Cockroach, a distributed database, with our PALs.
8 projects | dev.to | 15 Jul 2021
CockroachDB (label: E-easy) The Scalable, Survivable, Strongly-Consistent SQL Database
The start of my journey learning Go. Any tips/suggestions would be greatly appreciated!
6 projects | reddit.com/r/golang | 29 Jun 2021
What is Cost-based Optimization?
4 projects | dev.to | 2 Jun 2021
In CockroachDB, the cost is an abstract 64-bit floating-point scalar value.
#30DaysofAppwrite : Appwrite’s building blocks
3 projects | dev.to | 3 May 2021
Appwrite uses MariaDB as the default database for project collections, documents, and all other metadata. Appwrite is agnostic to the database you use under the hood, and support for more databases like Postgres, CockroachDB, MySQL, and MongoDB is currently under active development! 😊
I am building a Serverless version of Redis - written in Rust
7 projects | reddit.com/r/rust | 2 May 2021
For me, if you look back to when Redis was designed, 11 years ago, it was before the Cloud was a thing. Since then, Cloud alternatives have appeared that are mostly proprietary. The idea of RedisLess is not to compete with a product that has existed for 11 years, but to show a new path for how we can build a system on top of an existing one. You can see RedisLess as an experiment: how do you build Cloud-native databases by taking advantage of existing solutions? TiDB, Yugabyte, and CockroachDB are great examples of staying wire-protocol compatible with an existing database (MySQL for TiDB, Postgres for Yugabyte and CockroachDB) while providing a Cloud way of managing data.
1 project | news.ycombinator.com | 3 Aug 2021
You might find https://trino.io/ interesting. It lets you bolt an MPP SQL execution engine on top of any data source, with pre-built connectors for Druid and Kafka.
It's all ANSI SQL, and the best part is you can combine data from heterogeneous sources: you can join a topic in Kafka with a table in Druid, or even across Kafka, S3, and your RDBMS.
Disclaimer: I'm a maintainer of the project.
What even is data mesh
2 projects | news.ycombinator.com | 29 Jul 2021
Not central to the main ideas of this article, but if you want to have a data mesh that is self-service, why force folks to use a particular storage medium like a data warehouse? That still requires centralization of the data.
Why not instead have a tool like Trino (https://trino.io) that allows you to let different domains use whatever datastore they happen to use. You still would need to enforce schema, but this can be done in tools like schema registry as mentioned in the article along with a data cataloging tool.
These tools facilitate the distributed nature of the problem nicely and encourage healthy standards to be discussed and then formalized in schema definitions and catalogs, removing the ambiguity of discourse and documentation.
A nice example of how Trino can accomplish data mesh principles 1 and 3 is laid out in this repo: https://github.com/findinpath/trino_data_mesh.
What is Cost-based Optimization?
4 projects | dev.to | 2 Jun 2021
In Presto/Trino, the cost is a vector of estimated CPU, memory, and network usage. The vector is also converted into a scalar value during comparison.
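A common way to collapse such a vector into a single comparable number is a weighted sum. The sketch below is the shape of the idea only; the field names and weights are made up, not taken from Trino's actual cost model.

```go
package main

import "fmt"

// cost mirrors the idea of Trino's cost vector: estimated CPU, memory,
// and network usage for a candidate plan. Fields are illustrative.
type cost struct {
	cpu, memory, network float64
}

// toScalar collapses the vector into one comparable number via a weighted
// sum, so two candidate plans can be ordered. The weights are assumptions.
func toScalar(c cost) float64 {
	const wCPU, wMem, wNet = 0.75, 0.1, 0.15
	return wCPU*c.cpu + wMem*c.memory + wNet*c.network
}

// cheaper returns whichever plan's collapsed cost is lower.
func cheaper(a, b cost) cost {
	if toScalar(a) <= toScalar(b) {
		return a
	}
	return b
}

func main() {
	hashJoin := cost{cpu: 120, memory: 300, network: 10}
	broadcastJoin := cost{cpu: 100, memory: 50, network: 400}
	fmt.Printf("chosen plan cost: %+v\n", cheaper(hashJoin, broadcastJoin))
}
```

The interesting design point is that the vector is kept around for estimation and only flattened at comparison time, so the optimizer can re-weight (say, penalizing network on a congested cluster) without re-deriving the estimates.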
Looking for Feedback: Open Source SQL-in-Markdown Reporting tool
2 projects | reddit.com/r/SQL | 1 Jun 2021
Love it! I'd like it to be able to talk to Trino. I'm not sure if there's a driver for node but I could help build it.
ClickHouse: An open-source column-oriented database management system
5 projects | news.ycombinator.com | 27 May 2021
Take a look at query engines like Trino (formerly PrestoSQL) [https://trino.io/]. (Disclaimer: I'm a contributor to Trino).
I used it at a previous job to combine data from MongoDB, Kafka, S3, and Postgres to great effect. It tries to push down as many operations as possible to the source to improve performance.
It has full ANSI SQL support over a range of backends (Kafka, Cassandra, Postgres, ClickHouse, S3, and many more).
The best part is its plugin ecosystem: you can easily implement your own connectors, and all the heavy lifting is done by the core engine, while your plugin only has to map your backend onto concepts the engine understands.
Why hasn't Presto become industry standard?
1 project | news.ycombinator.com | 1 Apr 2021
* Active-active HA is not really necessary IMO, as Trino is designed for low-latency interactive queries in general. It can handle longer-running batch queries, but it gives up fault tolerance to fail fast: you just resubmit the query. Predecessors like Hive, Spark, etc. handle ETL and long-running batch processes efficiently, but that adds complexity, since the query has to checkpoint its work. I could see the need for an active-passive HA setup to have on deck during a failure. Setting up your own active-passive HA is as simple as putting two coordinators behind a proxy and pointing your workers at the proxy address. Then you have the proxy run health checks and flip over in the event of an outage. Here's the issue tracking native HA: https://github.com/trinodb/trino/issues/391.
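The proxy-failover recipe described above can be sketched in a few lines of Go. This is a minimal sketch, not a production proxy: the coordinator addresses and the 5-second probe interval are placeholders, and the coordinator's `/v1/info` endpoint is assumed as the health probe.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

// coordinators lists the active and standby coordinators; placeholder addresses.
var coordinators = []string{
	"http://coordinator-a:8080",
	"http://coordinator-b:8080",
}

// active is the index of the coordinator currently receiving traffic.
var active atomic.Int32

// nextCoordinator picks the standby to flip to after a failed health check.
func nextCoordinator(idx int32) int32 {
	return (idx + 1) % int32(len(coordinators))
}

// healthCheck probes the active coordinator and flips to the standby
// when it stops answering.
func healthCheck() {
	for {
		idx := active.Load()
		resp, err := http.Get(coordinators[idx] + "/v1/info")
		if err != nil || resp.StatusCode != http.StatusOK {
			active.Store(nextCoordinator(idx))
		}
		if resp != nil {
			resp.Body.Close()
		}
		time.Sleep(5 * time.Second)
	}
}

func main() {
	// The proxy re-targets each request at whichever coordinator is active.
	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			target, _ := url.Parse(coordinators[active.Load()])
			req.URL.Scheme = target.Scheme
			req.URL.Host = target.Host
		},
	}
	go healthCheck()
	// Workers and clients point at this proxy address instead of a coordinator.
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```

In practice you would use an off-the-shelf load balancer (HAProxy, an ELB) for this, but the moving parts are exactly these two: a health probe and a target flip.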
Speeding up SQL queries by orders of magnitude using UNION
1 project | news.ycombinator.com | 20 Mar 2021
How does AWS Athena manage to load 10GB/s from s3? I've managed 230 MB/s from a c6gn.16xlarge
1 project | reddit.com/r/aws | 16 Mar 2021
Check out https://trino.io (formerly PrestoSQL), which is what Athena is based on. Essentially, parallelism allows for this: there are many worker nodes all reading from S3. You can also run Presto on EMR, which is sort of fun, because the admin UI will show you how it breaks the query into parts and fans the work out to worker nodes. Pretty cool, because if allowed to (from a resource management perspective), Presto will try to saturate the entire cluster's CPU resources to complete the query as fast as possible.
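The fan-out described above boils down to splitting an object into byte ranges and fetching them concurrently. In this sketch, `fetch` is a stand-in for an S3 ranged GET (not the AWS SDK), and the chunk size is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk describes one byte range [start, end) of an object.
type chunk struct{ start, end int64 }

// splitRanges carves an object of the given size into fixed-size byte ranges,
// the unit of work that gets handed to individual workers.
func splitRanges(size, chunkSize int64) []chunk {
	var out []chunk
	for off := int64(0); off < size; off += chunkSize {
		end := off + chunkSize
		if end > size {
			end = size
		}
		out = append(out, chunk{off, end})
	}
	return out
}

// fetchAll runs one goroutine per chunk, each reading its own byte range,
// and returns the total bytes read. Aggregate throughput scales with the
// number of concurrent range reads, which is the trick behind the numbers
// quoted in the thread.
func fetchAll(chunks []chunk, fetch func(chunk) int64) int64 {
	var wg sync.WaitGroup
	totals := make([]int64, len(chunks))
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c chunk) {
			defer wg.Done()
			totals[i] = fetch(c) // each worker reads only its own range
		}(i, c)
	}
	wg.Wait()
	var sum int64
	for _, n := range totals {
		sum += n
	}
	return sum
}

func main() {
	chunks := splitRanges(10_000, 4096)
	// A fake fetch that just reports the range length stands in for S3.
	total := fetchAll(chunks, func(c chunk) int64 { return c.end - c.start })
	fmt.Println(len(chunks), total)
}
```

Trino goes further by spreading the ranges across many worker machines rather than goroutines on one host, but the decomposition is the same.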
Looking for paid contributors to open-source project
1 project | reddit.com/r/golang | 17 Feb 2021
To a purpose-built in-memory database based on Trino (https://trino.io/)
What are some alternatives?
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
dremio-oss - Dremio - the missing link in modern data
Apache Drill - Apache Drill is a distributed MPP query layer for self describing data
tidb - TiDB is an open source distributed HTAP database compatible with the MySQL protocol
Apache Calcite - Dynamic data management framework
vitess - Vitess is a database clustering system for horizontal scaling of MySQL.
ClickHouse - ClickHouse® is a free analytics DBMS for big data
yugabyte-db - The high-performance distributed SQL database for global, internet-scale apps.
dgraph - Native GraphQL Database with graph backend
hudi - Upserts, Deletes And Incremental Processing on Big Data.
InfluxDB - Scalable datastore for metrics, events, and real-time analytics
rqlite - The lightweight, distributed relational database built on SQLite