Rust Data

Open-source Rust projects categorized as Data

Top 23 Rust Data Projects

  • prql

    PRQL is a modern language for transforming data: a simple, powerful, pipelined SQL replacement

    Project mention: Show HN: Trilogy – A Reusable, Composable SQL Experiment | news.ycombinator.com | 2024-11-25
  • arroyo

    Distributed stream processing engine in Rust

    Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • quadratic

    Quadratic | Spreadsheet with Python, SQL, and AI

    Project mention: Quadratic – native JavaScript support in a spreadsheet | news.ycombinator.com | 2024-09-28

    We're working on it. Currently we support exporting to CSV. Here's the open issue around it https://github.com/quadratichq/quadratic/issues/1154 if you want to follow progress.

  • spiceai

    A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets across databases, data warehouses, and data lakes.

    Project mention: We Picked AGPL | news.ycombinator.com | 2024-08-13

    We chose Apache 2.0 for the Spice OSS runtime.

    TL;DR: Data-plane Apache 2.0, control-plane BSL.

    Being such a core component, we want developers to be completely comfortable integrating and deploying the Spice runtime in their applications and services, as well as running Spice in their own infrastructure.

    In addition, Spice OSS is built on other great open-source projects like DataFusion and Arrow, both Apache 2.0, and DuckDB (MIT), so being permissively licensed aligns with the fundamental technologies and communities it's built upon.

    We expect to release specific enterprise control-plane services, such as our Kubernetes Operator under a license such as BSL.

    [1] https://github.com/spiceai/spiceai

  • dozer

    Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks. (by getdozer)

    Project mention: Pg_flo – Stream, transform, and route PostgreSQL data in real-time | news.ycombinator.com | 2024-11-03

    I'll evaluate this during my next CDC endeavor. Also on my list is Dozer: https://github.com/getdozer/dozer

  • tensorbase

    TensorBase is a new big data warehousing with modern efforts.

  • nutype

    Rust newtype with guarantees 🇺🇦 🦀

  • orz

    a high performance, general purpose data compressor written in the crab-lang

  • FnckSQL

    SQL as a Function for Rust

  • sail

    LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads. (by lakehq)

    Project mention: AI and All Data Weekly - 02 December 2024 | dev.to | 2024-12-02
  • hypergraph

    Hypergraph is a data structure library for creating a directed hypergraph in which a hyperedge can join any number of vertices.

    Project mention: Show HN: HypergraphZ – A Hypergraph Implementation in Zig | news.ycombinator.com | 2024-09-09

    I see that this is a second implementation, the first being in Rust: https://github.com/yamafaktory/hypergraph

    I've found that Zig is an excellent tool for implementing data-structure-oriented libraries. Comptime genericity is simple to understand and use, providing a C interface is very easy, and libraries take an allocator, so any memory-safety issues are the consumer's problem. If you want to use it from a memory-safe language, well, all of those have C FFIs so far as I know, Rust very much included, so you can.

    A hypergraph is clearly a data structure which demands a lot of cyclic references, no getting around that, so I'm curious: can you compare and contrast the experience of implementing this in Rust vs. Zig?
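On the cyclic-reference question raised above, one common way to sidestep reference cycles in Rust is to store vertices and hyperedges in flat vectors and link them by index. The following is a minimal sketch of that idea only; the `Hypergraph` type, its methods, and the choice to model a directed hyperedge as an ordered list of vertex indices are assumptions for illustration, not the API of the yamafaktory/hypergraph crate.

```rust
/// Index-based directed hypergraph (illustrative, hypothetical type).
/// Vertices and hyperedges refer to each other by `usize` index, so no
/// `Rc`/`RefCell` cycles are needed.
struct Hypergraph<V, E> {
    vertices: Vec<V>,
    /// Each hyperedge has a weight and an ordered list of vertex indices;
    /// direction is assumed to follow the order of that list.
    hyperedges: Vec<(E, Vec<usize>)>,
}

impl<V, E> Hypergraph<V, E> {
    fn new() -> Self {
        Self { vertices: Vec::new(), hyperedges: Vec::new() }
    }

    /// Adds a vertex and returns its index.
    fn add_vertex(&mut self, weight: V) -> usize {
        self.vertices.push(weight);
        self.vertices.len() - 1
    }

    /// Adds a hyperedge joining any number of vertices.
    /// Returns `None` if any vertex index is out of range.
    fn add_hyperedge(&mut self, weight: E, vertices: Vec<usize>) -> Option<usize> {
        if vertices.iter().any(|&v| v >= self.vertices.len()) {
            return None;
        }
        self.hyperedges.push((weight, vertices));
        Some(self.hyperedges.len() - 1)
    }

    /// Vertices that directly follow `v` in any hyperedge, without duplicates.
    fn successors(&self, v: usize) -> Vec<usize> {
        let mut out = Vec::new();
        for (_, vs) in &self.hyperedges {
            for pair in vs.windows(2) {
                if pair[0] == v && !out.contains(&pair[1]) {
                    out.push(pair[1]);
                }
            }
        }
        out
    }
}
```

Because a hyperedge here may list the same vertex more than once, cycles such as a → b → c → a are expressed purely through indices, and ownership stays with the two vectors.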

  • swiftide

    Fast, streaming indexing and query library for AI (RAG) applications, written in Rust

    Project mention: Should you use Rust in LLM based tools for performance? | news.ycombinator.com | 2024-10-01

    I do wonder though if assuming the reader is lazy is the best. Especially in technical posts. I think there is a difficulty in balancing forcing the person to digest what you say and making it approachable (especially when you consider a noisy audience). It is a natural filter, is that good or bad? Guess depends.

    Agreed about the microbenchmarks and scale. Things don't always scale as expected. But I think there are a lot of variables here so it might be difficult to portray an accurate expected result. Though I can see this being worthwhile for anyone wanting to build RAGs or process lots of text into some embeddings. Also looks like the project is still under active development and started 6 months ago (single dev?) so I'm not sure we should expect to see too big of scale: https://github.com/bosun-ai/swiftide

    So idk, that seems like exactly the kinda thing HN should be well suited for: new projects where people are hacking together useful frameworks. But idk, I guess if YC is funding companies whose business model is to fork an OSS then the bar might be lower than I think. But I thought we were supposed to be hackers (not necessarily crackers) ¯\_(ツ)_/¯

  • TablaM

    The practical relational programming language for data-oriented applications

    Project mention: YC's Latest Request for Startups | news.ycombinator.com | 2024-02-14

    > Very curious if anyone knows how to pull this off.

    I work in this space (small/mid-size).

    The good news is that there are several "obvious" ways to pull this off because an ERP is the culmination of everything a company needs and does. So almost anything you can imagine on the software is part of it.

    The bad news, and the reason everyone wants a solution, is that it is truly a big space, and then you need E.V.E.R.Y.T.H.I.N.G.

    ---

    My take is to start from the bottom, and build a much better version of Access/FoxPro (https://tablam.org).

    Any medium/big ERP ends up being a specialized computing platform that needs:

    - A programming language

    - A database engine

    - An orchestration engine

    - ELT engine

    - Auth

    - UI/Report builders

    And to be clear: NONE of the "programming language", "database engine", etc are a good fit today.

    NONE.

    This is the big thing. This is the reason (from a tech POV only) that most attempts fail.

    This is the secret of why COBOL rule(d): it is all of this, but it is too old! (Also, this is why SQL is still the best: it is almost this.)

    ---

    So, to pull this off, you need a team that knows what is "missing" from our current tools, makes a well-integrated package, and adds a "user-friendly" interface in a way that is palatable for the kind of user that uses excel (powerfully).

    It is not that impossible. FoxPro was the best example of this kind of integrated solution.

    P.S.: This is my life's dream, to make this a reality!

  • rsql

    Command line interface for CockroachDB, DuckDB, LibSQL, MariaDB, MySQL, PostgreSQL, Redshift, Snowflake, SQLite3 and SQL Server

    Project mention: Rsql: Multi-Database CLI (Rust) | news.ycombinator.com | 2024-08-29
  • Envio

    The Modular Data Stack. The fastest, most flexible way to get on-chain data. Any EVM L1, L2, L3 & Fuel. ⚡

    Project mention: Streamline Event Indexing with Wildcard Indexing | dev.to | 2024-11-22

    Wildcard indexing is one of Envio's latest features, designed to simplify how you index events. With this feature, you can capture all events matching a specified signature, without needing to specify the contract address from which the event was emitted. Here's how it works.
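The idea described above, matching on an event signature while leaving the contract address unspecified, can be sketched as a filter whose address field is optional. This is a rough illustration only: `Log`, `EventFilter`, `select`, and `sample_logs` are hypothetical names, not Envio's API or configuration format.

```rust
/// Simplified on-chain event log (illustrative fields only).
#[derive(Debug, Clone, PartialEq)]
struct Log {
    /// Contract that emitted the event.
    address: &'static str,
    /// Event signature, e.g. "Transfer(address,address,uint256)".
    signature: &'static str,
}

/// Describes which events an indexer wants to receive.
struct EventFilter {
    signature: &'static str,
    /// `None` means wildcard: accept the event from any contract.
    address: Option<&'static str>,
}

impl EventFilter {
    /// True when the signature matches and the address is either
    /// unspecified (wildcard) or equal to the log's address.
    fn matches(&self, log: &Log) -> bool {
        log.signature == self.signature
            && self.address.map_or(true, |a| a == log.address)
    }
}

/// Returns the logs accepted by `filter`, in their original order.
fn select<'a>(filter: &EventFilter, logs: &'a [Log]) -> Vec<&'a Log> {
    logs.iter().filter(|l| filter.matches(l)).collect()
}

/// Made-up sample data for trying the filter.
fn sample_logs() -> Vec<Log> {
    vec![
        Log { address: "0xAAA", signature: "Transfer(address,address,uint256)" },
        Log { address: "0xBBB", signature: "Transfer(address,address,uint256)" },
        Log { address: "0xAAA", signature: "Approval(address,address,uint256)" },
    ]
}
```

With the sample data, a filter on the `Transfer` signature with `address: None` selects the logs from both contracts, while the same filter pinned to `Some("0xAAA")` selects only one.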

  • rust-pgdatadiff

    Sequence & table data comparison between 2 PostgreSQL databases

    Project mention: Rust-pgdatadiff: A re-write of pgdatadiff in Rust | news.ycombinator.com | 2024-03-15
  • xvc

    A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

    Project mention: Xvc: Manage your binary data with Git repositories (Rust) | news.ycombinator.com | 2024-10-19
  • transparency-data

    U.S. Healthcare Transparency Data. Supplemental data for the CMS/HHS price transparency rules.

  • system-info-collector

    App to collect ram/cpu usage from OS and show it in pretty graphs

  • csvsource

    Converts a CSV file to SQL Insert Statements.

  • rusqttbom

    RusQTTbom takes weather data from the Bureau of Meteorology (BOM) and publishes that data via MQTT messages.

  • raven

    RavenCol, Tabular data manipulation in Rust (by irvingfisica)

  • server

    REST API for Gico application. It's part of Database class project (by gico-net)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Rust Data related posts

  • Streamline Event Indexing with Wildcard Indexing

    1 project | dev.to | 22 Nov 2024
  • Xvc: Manage your binary data with Git repositories (Rust)

    1 project | news.ycombinator.com | 19 Oct 2024
  • PGVector's Missing Features

    4 projects | news.ycombinator.com | 13 Sep 2024
  • Swiftide 0.12 - Hybrid Search, search filters, parquet loader, and a giant speed bump

    1 project | dev.to | 13 Sep 2024
  • Xvc: A CLI tool to manage data and ML pipelines in Rust (GPL3)

    1 project | news.ycombinator.com | 11 Sep 2024
  • Show HN: Xvc – CLI tool to manage data and pipelines in Rust (+Python bindings)

    2 projects | news.ycombinator.com | 15 Jul 2024
  • Pg_lakehouse: Query Any Data Lake from Postgres

    1 project | news.ycombinator.com | 12 May 2024

Index

What are some of the best open-source Data projects in Rust? This list will help you:

#   Project                 Stars
1   prql                    9,978
2   arroyo                  3,810
3   quadratic               3,043
4   spiceai                 1,933
5   dozer                   1,514
6   tensorbase              1,439
7   nutype                  1,421
8   orz                       810
9   FnckSQL                   566
10  sail                      521
11  hypergraph                287
12  swiftide                  265
13  TablaM                    191
14  rsql                      131
15  Envio                      80
16  rust-pgdatadiff            70
17  xvc                        38
18  transparency-data          31
19  system-info-collector      17
20  csvsource                   8
21  rusqttbom                   5
22  raven                       2
23  server                      1
