SaaSHub helps you find the best software and product alternatives
Top 23 Rust Data Projects
-
prql
PRQL is a modern language for transforming data – a simple, powerful, pipelined SQL replacement
Project mention: Show HN: Trilogy – A Reusable, Composable SQL Experiment | news.ycombinator.com | 2024-11-25
-
-
Project mention: Quadratic – native JavaScript support in a spreadsheet | news.ycombinator.com | 2024-09-28
We're working on it. Currently we support exporting to CSV. Here's the open issue around it https://github.com/quadratichq/quadratic/issues/1154 if you want to follow progress.
-
spiceai
A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets across databases, data warehouses, and data lakes.
We chose Apache 2.0 for the Spice OSS runtime.
TL;DR: Data-plane Apache 2.0, control-plane BSL.
Being such a core component, we want developers to be completely comfortable integrating and deploying the Spice runtime in their applications and services, as well as running Spice in their own infrastructure.
In addition, Spice OSS is built on other great open-source projects like DataFusion and Arrow, both Apache 2.0, and DuckDB (MIT), so being permissively licensed aligns with the fundamental technologies and communities it's built upon.
We expect to release specific enterprise control-plane services, such as our Kubernetes Operator, under a license such as BSL.
[1] https://github.com/spiceai/spiceai
-
dozer
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks. (by getdozer)
Project mention: Pg_flo – Stream, transform, and route PostgreSQL data in real-time | news.ycombinator.com | 2024-11-03
I'll evaluate this during my next CDC endeavor. Also on my list is Dozer: https://github.com/getdozer/dozer
-
-
-
-
-
sail
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads. (by lakehq)
-
hypergraph
Hypergraph is a data structure library to create a directed hypergraph in which a hyperedge can join any number of vertices.
Project mention: Show HN: HypergraphZ – A Hypergraph Implementation in Zig | news.ycombinator.com | 2024-09-09
I see that this is a second implementation, the first being in Rust: https://github.com/yamafaktory/hypergraph
I've found that Zig is an excellent tool for implementing data-structure-oriented libraries. Comptime genericity is simple to understand and use, providing a C interface is very easy, and libraries take an allocator, so any memory-safety issues are the consumer's problem. If you want to use it from a memory-safe language, well, all of those have C FFIs so far as I know, Rust very much included, so you can.
A hypergraph is clearly a data structure which demands a lot of cyclic references, no getting around that, so I'm curious: can you compare and contrast the experience of implementing this in Rust vs. Zig?
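To make the description above concrete, here is a minimal sketch of a directed hypergraph in Rust. It is not the API of the `hypergraph` crate: the names `Hypergraph`, `VertexId`, `HyperedgeId`, and `successors` are hypothetical, and treating a hyperedge as an ordered list whose direction follows vertex order is an assumption made for illustration. Vertices and hyperedges live in plain vectors and refer to each other by index, which is one common way to sidestep cyclic references in Rust.

```rust
/// Index of a vertex inside the graph (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VertexId(usize);

/// Index of a hyperedge inside the graph (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct HyperedgeId(usize);

/// Index-based storage: no pointers between nodes, so no reference cycles.
#[derive(Debug)]
pub struct Hypergraph<V, E> {
    vertices: Vec<V>,
    // Each hyperedge keeps its weight and the ordered vertices it joins.
    hyperedges: Vec<(E, Vec<VertexId>)>,
}

impl<V, E> Hypergraph<V, E> {
    pub fn new() -> Self {
        Self { vertices: Vec::new(), hyperedges: Vec::new() }
    }

    /// Stores a vertex weight and returns its stable index.
    pub fn add_vertex(&mut self, weight: V) -> VertexId {
        self.vertices.push(weight);
        VertexId(self.vertices.len() - 1)
    }

    /// Joins any number of vertices (at least one) with a single hyperedge.
    /// Returns `None` if the list is empty or contains an unknown vertex.
    pub fn add_hyperedge(&mut self, weight: E, vertices: &[VertexId]) -> Option<HyperedgeId> {
        if vertices.is_empty() || vertices.iter().any(|v| v.0 >= self.vertices.len()) {
            return None;
        }
        self.hyperedges.push((weight, vertices.to_vec()));
        Some(HyperedgeId(self.hyperedges.len() - 1))
    }

    /// The ordered vertices joined by a hyperedge.
    pub fn hyperedge_vertices(&self, id: HyperedgeId) -> Option<&[VertexId]> {
        self.hyperedges.get(id.0).map(|(_, vs)| vs.as_slice())
    }

    /// Vertices directly reachable from `from`: in every hyperedge, the
    /// vertex that immediately follows an occurrence of `from`.
    pub fn successors(&self, from: VertexId) -> Vec<VertexId> {
        let mut out = Vec::new();
        for (_, vs) in &self.hyperedges {
            for pair in vs.windows(2) {
                if pair[0] == from && !out.contains(&pair[1]) {
                    out.push(pair[1]);
                }
            }
        }
        out
    }
}
```

Because a hyperedge may list the same vertex more than once (for example `[a, c, a]`), cycles are expressed purely through indices and ownership stays with the two vectors.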
-
Project mention: Should you use Rust in LLM based tools for performance? | news.ycombinator.com | 2024-10-01
I do wonder though if assuming the reader is lazy is the best. Especially in technical posts. I think there is a difficulty in balancing forcing the person to digest what you say and making it approachable (especially when you consider a noisy audience). It is a natural filter, is that good or bad? Guess depends.
Agreed about the microbenchmarks and scale. Things don't always scale as expected. But I think there are a lot of variables here so it might be difficult to portray an accurate expected result. Though I can see this being worthwhile for anyone wanting to build RAGs or process lots of text into some embeddings. Also looks like the project is still under active development and started 6 months ago (single dev?) so I'm not sure we should expect to see too big of scale: https://github.com/bosun-ai/swiftide
So idk, that seems like exactly the kinda thing HN should be well suited for: new projects where people are hacking together useful frameworks. But idk, I guess if YC is funding companies whose business model is to fork an OSS then the bar might be lower than I think. But I thought we were supposed to be hackers (not necessarily crackers) ¯\_(ツ)_/¯
-
> Very curious if anyone knows how to pull this off.
I work in this space (small/mid-size).
The good news is that there are several "obvious" ways to pull this off because an ERP is the culmination of everything a company needs and does. So almost anything you can imagine in software is part of it.
The bad news, and the reason everyone wants a solution, is that it is truly a big space, and then you need E.V.E.R.Y.T.H.I.N.G.
---
My take is to start from the bottom, and build a much better version of Access/FoxPro (https://tablam.org).
Any medium/big ERP ends up being a specialized computing platform that needs:
- A programming language
- A database engine
- An orchestration engine
- ELT engine
- Auth
- UI/Report builders
And to be clear: NONE of the "programming language", "database engine", etc are a good fit today.
NONE.
This is the big thing. This is the reason (from a tech POV only) that most attempts fail.
This is the secret of why COBOL rule(d): it is all of this, but it is too old! (Also, this is why SQL is still best: it is almost this.)
---
So, to pull this off, you need a team that knows what is "missing" from our current tools, makes a well-integrated package, and adds a "user-friendly" interface in a way that is palatable for the kind of user that uses Excel (powerfully).
It is not that impossible. FoxPro was the best example of this kind of integrated solution.
P.S.: This is my life's dream, to make this come true!
-
rsql
Command line interface for CockroachDB, DuckDB, LibSQL, MariaDB, MySQL, PostgreSQL, Redshift, Snowflake, SQLite3 and SQL Server
-
Envio
The Modular Data Stack. The fastest, most flexible way to get on-chain data. Any EVM L1, L2, L3 & Fuel. ⚡
Wildcard indexing is one of Envio's latest features, designed to simplify how you index events. With this feature, you can capture all events matching a specified signature, without needing to specify the contract address from which the event was emitted. Here's how it works.
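The idea in the excerpt above can be sketched conceptually in Rust. This is not Envio's API or configuration format: `Log`, `EventFilter`, `wildcard`, `for_contract`, and `select` are hypothetical names used only to illustrate the difference between matching on signature alone and matching on signature plus contract address.

```rust
/// Simplified view of an emitted event (hypothetical type for this sketch).
/// On EVM chains the signature is usually matched through its keccak256
/// hash in the first log topic; a plain string keeps the example small.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Log {
    pub address: String,   // contract that emitted the event
    pub signature: String, // e.g. "Transfer(address,address,uint256)"
}

/// A filter that always checks the signature and optionally the address.
pub struct EventFilter {
    pub signature: String,
    pub address: Option<String>, // `None` means "any contract" (wildcard)
}

impl EventFilter {
    /// Matches the signature regardless of which contract emitted it.
    pub fn wildcard(signature: &str) -> Self {
        Self { signature: signature.to_string(), address: None }
    }

    /// Matches the signature only when emitted by one specific contract.
    pub fn for_contract(signature: &str, address: &str) -> Self {
        Self { signature: signature.to_string(), address: Some(address.to_string()) }
    }

    pub fn matches(&self, log: &Log) -> bool {
        log.signature == self.signature
            && self
                .address
                .as_ref()
                // Hex addresses are compared case-insensitively.
                .map_or(true, |a| a.eq_ignore_ascii_case(&log.address))
    }
}

/// Returns references to every log accepted by the filter.
pub fn select<'a>(filter: &EventFilter, logs: &'a [Log]) -> Vec<&'a Log> {
    logs.iter().filter(|log| filter.matches(log)).collect()
}
```

With a wildcard filter for `Transfer(address,address,uint256)`, `select` returns transfers from every contract in the input, while the `for_contract` variant narrows the result to a single emitter.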
-
Project mention: Rust-pgdatadiff: A re-write of pgdatadiff in Rust | news.ycombinator.com | 2024-03-15
-
Project mention: Xvc: Manage your binary data with Git repositories (Rust) | news.ycombinator.com | 2024-10-19
-
transparency-data
U.S. Healthcare Transparency Data. Supplemental data for the CMS/HHS price transparency rules.
-
-
-
rusqttbom
RusQTTbom takes weather data from the Bureau of Meteorology (BOM) and publishes that data via MQTT messages.
-
-
Rust Data discussion
Rust Data related posts
-
Streamline Event Indexing with Wildcard Indexing
-
Xvc: Manage your binary data with Git repositories (Rust)
-
PGVector's Missing Features
-
Swiftide 0.12 - Hybrid Search, search filters, parquet loader, and a giant speed bump
-
Xvc: A CLI tool to manage data and ML pipelines in Rust (GPL3)
-
Show HN: Xvc – CLI tool to manage data and pipelines in Rust (+Python bindings)
-
Pg_lakehouse: Query Any Data Lake from Postgres
-
A note from our sponsor - SaaSHub
www.saashub.com | 2 Dec 2024
Index
What are some of the best open-source Data projects in Rust? This list will help you:
# | Project | Stars |
---|---|---|
1 | prql | 9,978 |
2 | arroyo | 3,810 |
3 | quadratic | 3,043 |
4 | spiceai | 1,933 |
5 | dozer | 1,514 |
6 | tensorbase | 1,439 |
7 | nutype | 1,421 |
8 | orz | 810 |
9 | FnckSQL | 566 |
10 | sail | 521 |
11 | hypergraph | 287 |
12 | swiftide | 265 |
13 | TablaM | 191 |
14 | rsql | 131 |
15 | Envio | 80 |
16 | rust-pgdatadiff | 70 |
17 | xvc | 38 |
18 | transparency-data | 31 |
19 | system-info-collector | 17 |
20 | csvsource | 8 |
21 | rusqttbom | 5 |
22 | raven | 2 |
23 | server | 1 |