Metric | Pravega | streaming-consistency
---|---|---
Mentions | 2 | 3
Stars | 1,966 | 19
Growth | 0.1% | -
Activity | 8.3 | 1.8
Last commit | about 1 month ago | about 3 years ago
Language | Java | Java
License | Apache License 2.0 | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Pravega
- Building a Real-Time Data Warehouse with TiDB and Pravega
Open sourced by Dell EMC, Pravega is a stream storage system and a Cloud Native Computing Foundation (CNCF) sandbox project. It is similar to Kafka and Apache Pulsar and provides streams and a schema registry. But Pravega offers more functionality.
- An opinionated map of incremental and streaming systems (2018)
streaming-consistency
- The Query Your Database Can’t Answer
Anyone thinking about using Confluent as some kind of alternative to a database should read this blog post outlining the myriad correctness problems with ksqlDB: https://scattered-thoughts.net/writing/internal-consistency-...
- An opinionated map of incremental and streaming systems (2018)
Spark structured streaming is in there under structured, high temporal locality.
It didn't make it into https://scattered-thoughts.net/writing/internal-consistency-... because it has severe limitations for low temporal locality operations:
> * As of Spark 2.4, you can use joins only when the query is in Append output mode. Other output modes are not yet supported.
- Internal Consistency in Streaming Systems
> And then try to join credits and debits together by updating_tx.
You can't join on updating_tx because the credits and debits per account are disjoint sets of transactions - that join will never produce output.
I did try something similar with timestamps - https://github.com/jamii/streaming-consistency/blob/main/fli.... This is also wrong (because the timestamps don't have to match between credits and debits) but it at least produces output. It had a very similar error distribution to the original.
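To make the disjointness argument concrete, here is a minimal sketch in plain Java (no streaming engine involved). It assumes a transfer schema where each transaction moves money from one account to another and there are no self-transfers; the `Transfer` class, the account names, and the `account|txId` key encoding are all hypothetical and are not taken from the streaming-consistency repository. Under those assumptions, a given transaction only ever appears among the credits of the receiving account and the debits of the sending account, so an inner join on (account, updating_tx) matches nothing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DisjointJoinSketch {

    // Hypothetical transfer record: money moves from one account to another.
    static final class Transfer {
        final int txId;
        final String from;
        final String to;
        final long amount;

        Transfer(int txId, String from, String to, long amount) {
            this.txId = txId;
            this.from = from;
            this.to = to;
            this.amount = amount;
        }
    }

    // Keys of the per-account credit rows: (receiving account, updating_tx).
    static Set<String> creditKeys(List<Transfer> transfers) {
        Set<String> keys = new HashSet<>();
        for (Transfer t : transfers) {
            keys.add(t.to + "|" + t.txId);
        }
        return keys;
    }

    // Keys of the per-account debit rows: (sending account, updating_tx).
    static Set<String> debitKeys(List<Transfer> transfers) {
        Set<String> keys = new HashSet<>();
        for (Transfer t : transfers) {
            keys.add(t.from + "|" + t.txId);
        }
        return keys;
    }

    // Inner join on (account, updating_tx): keep only keys present on both sides.
    static Set<String> joinOnUpdatingTx(List<Transfer> transfers) {
        Set<String> joined = new HashSet<>(creditKeys(transfers));
        joined.retainAll(debitKeys(transfers));
        return joined;
    }

    public static void main(String[] args) {
        List<Transfer> transfers = Arrays.asList(
                new Transfer(1, "alice", "bob", 10),
                new Transfer(2, "bob", "carol", 5),
                new Transfer(3, "carol", "alice", 7));
        // Every account has both credits and debits, yet no key matches.
        System.out.println("joined rows: " + joinOnUpdatingTx(transfers).size());
    }
}
```

With the three sample transfers, every account has one credit row and one debit row, but they come from different transactions, so the join result is empty. Only a self-transfer (same account on both sides) would produce a matching key in this sketch.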
What are some alternatives?
kafka-streams-in-action - Source code for the Kafka Streams in Action Book
lasp - Prototype implementation of Lasp in Erlang.
Alluxio (formerly Tachyon) - Alluxio, data orchestration for analytics and machine learning in the cloud
differential-datalog - DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.
Seaweed File System - SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding. [Moved to: https://github.com/seaweedfs/seaweedfs]
flow - Computational parallel flows on top of GenStage
GlusterFS - Gluster Filesystem : Build your distributed storage in minutes
Camlistore - Perkeep (née Camlistore) is your personal storage system for life: a way of storing, syncing, sharing, modelling and backing up content.
Ceph - Ceph is a distributed object, block, and file storage platform
Tahoe-LAFS - The Tahoe-LAFS decentralized secure filesystem.
Go IPFS - IPFS implementation in Go [Moved to: https://github.com/ipfs/kubo]
huststore - High-performance Distributed Storage