lambdo
blog
lambdo
-
Why isn't differential dataflow more popular?
It will return the sum of all values in column A. For large tables it will take some time to compute the result. Now assume we append a new record and want to get the new result. The traditional approach is to execute the query again. A better approach is to process only the new record, adding its value in A to the result of the previous query. This matters in (stateful) stream processing.
Something similar is implemented in these libraries, which, however, rely on a different data processing model (an alternative to map-reduce):
https://github.com/asavinov/prosto - Functions matter! No join-groupby, No map-reduce.
https://github.com/asavinov/lambdo - Feature engineering and machine learning: together at last!
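The incremental idea described above (update the previous result instead of re-running the query) can be sketched in a few lines of Python. This is only an illustration: `IncrementalSum` and the sample values are hypothetical and are not part of the prosto or lambdo APIs.

```python
class IncrementalSum:
    """Keeps a running sum so that each append costs O(1) instead of a full rescan."""

    def __init__(self, values=()):
        self.total = 0
        for v in values:
            self.append(v)

    def append(self, value):
        # Only the new record is processed; earlier rows are never re-read.
        self.total += value
        return self.total


column_a = [3, 5, 7]                 # hypothetical contents of column A
agg = IncrementalSum(column_a)

full_rescan = sum(column_a + [10])   # traditional approach: recompute everything
incremental = agg.append(10)         # incremental approach: add the new value only
```

Both approaches give the same answer; the difference is that the incremental version keeps a small piece of state (the previous sum) and does constant work per new record.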
-
Feature Processing in Go
I find this project quite interesting because sklearn has a good general design including data transformations and it does make sense to provide compatible functionality for Go.
Feature engineering in general is a hot topic, especially when features are not simple hard-coded transformations but can be learned from data. For example, I developed a toolkit intended for combining feature engineering and ML:
https://github.com/asavinov/lambdo - Feature engineering and machine learning: together at last!
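To make the distinction between hard-coded and learned features concrete, here is a minimal Python sketch. The function names and sample values are hypothetical and are not taken from lambdo or sklearn; the learned feature here is a plain standardization whose parameters are estimated from training data.

```python
from statistics import mean, pstdev


def hard_coded_feature(x):
    """A fixed transformation: the scaling constant is chosen by hand."""
    return x / 100.0


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from data and return a transformation."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against constant columns

    def transform(x):
        return (x - mu) / sigma

    return transform


# The same raw value is mapped differently depending on what was learned.
learned_feature = fit_standardizer([10.0, 20.0, 30.0])
centered = learned_feature(20.0)
```

The hard-coded feature always behaves the same way, while the learned one changes whenever it is re-fitted on different data, which is why it makes sense to treat feature definition and model training as one workflow.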
blog
- Advent of Code 2023 in Recursive SQL
-
Big Data Is Dead
This reminds me of a great blog post by Frank McSherry (Materialize, timely dataflow, etc.) about how using the right tools on a laptop could beat a bunch of these JVM distributed querying tools because... data locality, basically.
https://github.com/frankmcsherry/blog/blob/master/posts/2015...
- Quokka and Spark/Databricks
- Rust for Data-Intensive Computation (2020)
- Cost in the Land of Databases (2017)
-
Show HN: Cozo – new Graph DB with Datalog, embedded like SQLite, written in Rust
Oh, cool!
And yeah, licenses can be challenging and frustrating, especially the first time you release a major project.
I am really super excited by the idea of embedded Datalog in Rust. I run into a lot of situations where I need something that fits in that awkward gap between SQL and Prolog. I want more expressiveness, better composability, and better graph support than SQL. But I also want finite-sized results that I can materialize in bounded time.
There has been some very neat work with incrementally-updated Datalog in the Rust community. For example, I think Datafrog is really neat: https://github.com/frankmcsherry/blog/blob/master/posts/2018... But it's great to see more neat projects in this space, so thank you.
- [AskJS] JavaScript for data processing
-
Differential Dataflow for Mere Mortals
They used to, but Frank McSherry (author of differential dataflow) wrote them a specialized version without all the dataflow infrastructure [1]. It's part of the rust-lang nursery [2] now but hasn't been updated in a while, so I'm not sure what happened to it.
[1] https://github.com/frankmcsherry/blog/blob/master/posts/2018...
[2] https://github.com/rust-lang/datafrog
-
Why isn't differential dataflow more popular?
Importantly, this doesn't just use memoization (it actually avoids spending memory on that); instead it uses operators (nodes in the dataflow graph) that work directly with `(time, data, delta)` tuples. The `time` is a general lattice, so it is fairly flexible (e.g. for expressing loop nesting/recursive computations, but also for handling multiple input sources with their own timestamps). The `delta` type can be anything from a (potentially commutative) semigroup to an abelian group (don't be confused: the operation is written as addition). For example, collections that are iteratively refined in loops often need an abelian-group `delta` type, while monoids (a semigroup plus an explicit zero element) allow for efficient append-only computations [0].
[0]: https://github.com/frankmcsherry/blog/blob/master/posts/2019...
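A rough Python sketch of the `(time, data, delta)` idea follows. It makes two simplifying assumptions that go beyond the comment above: times are plain integers (a total order, which is only a special case of a lattice) and deltas are integers under addition (an abelian group). The `accumulate` function and the sample updates are hypothetical and are not the API of the differential-dataflow crate.

```python
from collections import defaultdict


def accumulate(updates, as_of):
    """Sum the deltas of all updates with time <= as_of into a multiset of records."""
    counts = defaultdict(int)
    for time, data, delta in updates:
        if time <= as_of:
            counts[data] += delta
    # Records whose deltas cancel out are no longer part of the collection.
    return {data: count for data, count in counts.items() if count != 0}


updates = [
    (0, "apple", +1),
    (0, "pear", +1),
    (1, "apple", +1),
    (2, "pear", -1),  # a retraction: this is where negative deltas are needed
]

at_time_0 = accumulate(updates, 0)
at_time_1 = accumulate(updates, 1)
at_time_2 = accumulate(updates, 2)
```

An append-only collection would never contain the `-1` update, so its deltas would not need inverses; retractions like the last one are what call for the full group structure.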
What are some alternatives?
differential-dataflow - An implementation of differential dataflow using timely dataflow on Rust.
ballista - Distributed compute platform implemented in Rust, and powered by Apache Arrow.
btree-typescript - A reasonably fast in-memory B+ tree with a powerful API based on the standard Map. Small minified. Well documented.
rslint - A (WIP) Extremely fast JavaScript and TypeScript linter and Rust crate
Hydra - Functional hybrid modelling (FHM) language for modelling and simulation of physical systems using implicitly formulated (undirected) Differential Algebraic Equations (DAEs)
tablespoon - 🥄✨Time-series Benchmark methods that are Simple and Probabilistic
openHistorian - The Open Source Time-Series Data Historian
differential-datalog - DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.
sliding-window-aggregators - Reference implementations of sliding window aggregation algorithms
pond - Immutable timeseries data structures built with Typescript