Rust Data Science

Open-source Rust projects categorized as Data Science

Top 13 Rust Data Science Projects

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and Pyarrow, with more integrations coming.

    Project mention: Understanding Parquet, Iceberg and Data Lakehouses | news.ycombinator.com | 2023-12-29

    Parquet has been the lakehouse file format of choice for nearly half a decade. But we are starting to see other contenders that are optimized more for lower latency like lance https://github.com/lancedb/lance

  • tidy-viewer

    📺(tv) Tidy Viewer is a cross-platform CLI CSV pretty printer that uses column styling to maximize viewer enjoyment.

    Project mention: Csvlens: Command line CSV file viewer. Like less but made for CSV | news.ycombinator.com | 2024-01-06

  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

    Project mention: Daft: Distributed DataFrame for Python | news.ycombinator.com | 2024-02-29

    There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.

  • charming

    A visualization library for Rust

    Project mention: Charming: A Visualization Library for Rust | news.ycombinator.com | 2023-07-17

  • ciphercore

    User-friendly secure computation engine based on secure multi-party computation

  • kaskada

    Modern, open-source event-processing

    Project mention: Need feedback from the folks here on efficiency for streaming a parquet file | /r/dataengineering | 2023-06-03

    You might try using a query engine for this - duckdb is really great for SQL analysis and I’m part of the team behind a tool called Kaskada that’s focused on time-based analysis.

  • kamu-cli

    New generation decentralized data lake and a streaming data pipeline

  • hypergraph

    Hypergraph is a data structure library for creating directed hypergraphs in which a hyperedge can join any number of vertices.

  • Oxen

    Oxen.ai's core Rust library, server, and CLI

  • bhtsne

    Parallel Barnes-Hut t-SNE implementation written in Rust.

  • hyperparameter

    Hyperparameter: make configurable AI applications. Built for Python hackers.

  • xvc

    A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

  • rexcel

    A lightweight CSV viewer/editor

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-29.

Index

What are some of the best open-source Data Science projects in Rust? This list will help you:

#    Project          Stars
1    lance            3,216
2    tidy-viewer      2,017
3    Daft             1,653
4    charming         1,504
5    ciphercore         371
6    kaskada            340
7    kamu-cli           273
8    hypergraph         267
9    Oxen               145
10   bhtsne              57
11   hyperparameter      23
12   xvc                 22
13   rexcel              17