nessie
noms
Metric | nessie | noms |
---|---|---|
Mentions | 13 | 11 |
Stars | 834 | 7,502 |
Growth | 3.6% | - |
Activity | 9.9 | 1.9 |
Last commit | 4 days ago | over 2 years ago |
Language | Java | Go |
License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
nessie
- A deep dive into the concept and world of Apache Iceberg Catalogs
Nessie is an innovative open-source catalog that extends beyond the traditional catalog capabilities in the Apache Iceberg ecosystem, introducing git-like features to data management. This catalog not only tracks table metadata but also allows users to capture commits at a holistic level, enabling advanced operations such as multi-table transactions, rollbacks, branching, and tagging. These features provide a new layer of flexibility and control over data changes, resembling version control systems in software development.
- FLaNK Stack Weekly 22 January 2024
- Why is Hive Metastore everywhere? (Especially Iceberg)
Try Nessie https://github.com/projectnessie/nessie - it recently got Trino support as well.
- What are the main things I need to know to be hired as a Java developer?
- Is learning and mastering Spring & Spring boot worth it in 2023 ?
- Which lakehouse table format do you expect your organization will be using by the end of 2023?
Project Nessie (https://projectnessie.org/) will be the catalog that eventually decouples Iceberg from Hive. At that point, I think it will be a no-brainer to go Iceberg over Delta.
- 5 Reasons Your Data Lakehouse should Embrace Dremio Cloud
The Dremio Sonar query engine can query your data where it lives, whether that is AWS Glue, S3, Nessie catalogs, MySQL, Postgres, Redshift, or any of an ever-growing list of sources.
- Project Nessie: Transactional Catalog for Data Lakes with Git-Like Semantics
- Introduction to The World of Data - (OLTP, OLAP, Data Warehouses, Data Lakes and more)
We will also need a catalog to track all of these tables. The open source Project Nessie does just that, and it adds Git-like versioning features that let data engineers practice "data as code" and "write-audit-publish" patterns on their data.
- DoltLab v0.2.0
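Several of the excerpts above describe Nessie as a catalog with Git-like semantics: commits that span multiple tables, plus branches, tags, and rollbacks. The following is a minimal Python sketch of that idea, assuming a catalog commit is simply a snapshot that maps table names to metadata pointers. `ToyCatalog`, `Commit`, and the file names are hypothetical and are not part of Nessie's actual API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    """One catalog-wide snapshot: maps each table name to a metadata pointer."""
    tables: Dict[str, str]
    message: str
    parent: Optional["Commit"] = None


class ToyCatalog:
    """Hypothetical in-memory catalog for illustration; not the Nessie API."""

    def __init__(self) -> None:
        # Branches and tags are both just names pointing at a commit.
        self.refs: Dict[str, Commit] = {"main": Commit({}, "root")}

    def create_ref(self, name: str, source: str) -> None:
        """Create a branch or tag at the current head of `source`."""
        self.refs[name] = self.refs[source]

    def commit(self, branch: str, changes: Dict[str, str], message: str) -> Commit:
        """Record changes to any number of tables as one atomic commit."""
        head = self.refs[branch]
        new_head = Commit({**head.tables, **changes}, message, parent=head)
        self.refs[branch] = new_head
        return new_head

    def rollback(self, branch: str, target: str) -> None:
        """Point `branch` back at the commit that `target` names."""
        self.refs[branch] = self.refs[target]


catalog = ToyCatalog()
catalog.commit("main", {"orders": "orders-v1.json"}, "create orders")
catalog.create_ref("release-1", "main")  # tag the known-good state
catalog.create_ref("etl", "main")        # branch for isolated work

# Two tables change together in a single commit on the branch.
catalog.commit(
    "etl",
    {"orders": "orders-v2.json", "customers": "customers-v1.json"},
    "load nightly batch",
)

main_tables = catalog.refs["main"].tables  # unchanged by work on "etl"
etl_tables = catalog.refs["etl"].tables    # sees both table changes

catalog.rollback("etl", "release-1")       # discard the batch
rolled_back_tables = catalog.refs["etl"].tables
```

Because every commit captures the state of all tables at once, moving a branch pointer is enough to publish or undo changes to several tables together. A real catalog would also need conflict detection and durable storage, which this sketch leaves out.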
noms
- How Dolt Stores Table Data
This is from 2022. It is based on Noms [1], which is no longer maintained (they forked it).
I think the Noms doc linked from this article [2] is clearer than the article itself. That said, I still cannot wrap my head around how this entire thing works, tbh. I hope they write a peer-reviewed paper to serve the audience better.
[1] https://github.com/attic-labs/
[2] https://github.com/attic-labs/noms/blob/master/doc/intro.md#...
- I was wrong. CRDTs are the future
I am. But I know very little about CRDTs lol, so we'll see how that goes. I'm interested in converting some immutable, local-first data warehouse tooling I enjoy to a CRDT version. Previously it was more Git-like: basically just Git with data structures massively inspired by Noms[1].
The thing I've found most interesting is that it appears[2] CRDT backends need to expose CRDT-flavored types to users. Which is to say, the way I'm writing this combines the notion of a type, say `[i32]`, with how you want the merges to work. CRDTs work great, but based on my amateur-hour research on the subject I don't feel you can write a single CRDT merge strategy for a single data type à la `[i32]` and have it always be correct. Applications need to provide enough context about what makes sense for a given data type.
So yeah, I agree with you. I'm interested in making a database-like thing backed by CRDTs, but I have also seen very few general-purpose CRDT implementations. It feels like I'm breaking "new ground" while having no idea what I'm doing and no intention of being an actual researcher here. I'm just making apps I enjoy heh.
[1]: https://github.com/attic-labs/noms
- Building a decentralized database
- Picking low-hanging memory usage bugs of an open source database
Most of the changes are in the noms package which used to live in a separate repo (https://github.com/attic-labs/noms), but Dolt has since adopted them.
- Downsides of Offline First
Not much more to say other than Noms was my favorite project (https://github.com/attic-labs/noms) for a while until the acquisition, and the engineers are now the ones behind Replicache (https://replicache.dev/).
I think this is going to be the next "Realm" that works everywhere.
- calling Format() on a time struct in a golang program changes the default Location's timezone information in the rest of the program
- Steps to build Database System from scratch?
The storage layer based on Noms: https://github.com/attic-labs/noms
- Noms: The versioned, forkable, syncable database
- Dolt is Git for Data: a SQL database that you can fork, clone, branch, merge
Noms might be what you’re looking for (https://github.com/attic-labs/noms). Dolt is actually a fork of Noms.
- CondensationDB: Build secure and collaborative apps [open-source]
People that are interested in a similar feature set should check out https://github.com/attic-labs/noms and the SQL fork of Noms, https://github.com/dolthub/dolt
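The CRDT comment quoted in the list above argues that a single merge strategy cannot be correct for every use of one data type such as `[i32]`. A small Python sketch can illustrate the point, under the assumption that the same list of integers is interpreted either as a grow-only set or as a per-replica counter vector. The function names are hypothetical and are not taken from Noms or any CRDT library.

```python
from typing import List


def merge_as_gset(a: List[int], b: List[int]) -> List[int]:
    """Treat the list as a grow-only set: merging is set union."""
    return sorted(set(a) | set(b))


def merge_as_gcounter(a: List[int], b: List[int]) -> List[int]:
    """Treat slot i as replica i's counter: merging is element-wise max."""
    if len(a) != len(b):
        raise ValueError("counter vectors must have one slot per replica")
    return [max(x, y) for x, y in zip(a, b)]


# The same two replica states, merged under each interpretation.
replica_a = [2, 5]
replica_b = [3, 5]

gset_result = merge_as_gset(replica_a, replica_b)
gcounter_result = merge_as_gcounter(replica_a, replica_b)

print(gset_result)      # [2, 3, 5]
print(gcounter_result)  # [3, 5]
```

Both functions are commutative, associative, and idempotent, so each is a valid merge on its own. They still disagree on the result, which is why the application has to say which meaning the data carries.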
What are some alternatives?
git-bug - Distributed, offline-first bug tracker embedded in git, with bridges
rqlite - The lightweight, distributed relational database built on SQLite.
dvc - 🦉 ML Experiments and Data Management with Git
dat - Go Postgres Data Access Toolkit
hiveberg - Demonstration of a Hive Input Format for Iceberg
dolt - Dolt – Git for Data
dremio-oss - Dremio - the missing link in modern data
sql-migrate - SQL schema migration tool for Go.
skeema - Declarative pure-SQL schema management for MySQL and MariaDB
Flyway - Flyway by Redgate • Database Migrations Made Easy.
cockroach - CockroachDB - the open source, cloud-native distributed SQL database.