Rust Data Science

Open-source Rust projects categorized as Data Science

Top 13 Rust Data Science Projects

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.

  • Project mention: Supabase Storage: now supports the S3 protocol | news.ycombinator.com | 2024-04-18

    you should look at lance(https://lancedb.github.io/lance/)

  • tidy-viewer

    📺(tv) Tidy Viewer is a cross-platform CLI CSV pretty printer that uses column styling to maximize viewer enjoyment.

  • Project mention: Csvlens: Command line CSV file viewer. Like less but made for CSV | news.ycombinator.com | 2024-01-06
  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

  • Project mention: Daft: Distributed DataFrame for Python | news.ycombinator.com | 2024-02-29

    There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.

  • charming

    A visualization library for Rust

  • Project mention: Charming: A Visualization Library for Rust | news.ycombinator.com | 2023-07-17
  • ciphercore

    User-friendly secure computation engine based on secure multi-party computation

  • kaskada

    Modern, open-source event-processing

  • Project mention: Need feedback from the folks here on efficiency for streaming a parquet file | /r/dataengineering | 2023-06-03

    You might try using a query engine for this - duckdb is really great for SQL analysis and I'm part of the team behind a tool called Kaskada that's focused on time-based analysis.

  • kamu-cli

    New generation decentralized data lake and a streaming data pipeline

  • hypergraph

    Hypergraph is a data structure library for creating a directed hypergraph in which a hyperedge can join any number of vertices.

  • Oxen

    Oxen.ai's core rust library, server, and CLI

  • bhtsne

    Parallel Barnes-Hut t-SNE implementation written in Rust.

  • hyperparameter

    Hyperparameter: make configurable AI applications. Built for Python hackers.

  • xvc

    A robust (🐒) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

  • rexcel

    A lightweight CSV viewer/editor

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Data Science projects in Rust? This list will help you:

Rank  Project         Stars
1     lance           3,232
2     tidy-viewer     2,020
3     Daft            1,666
4     charming        1,529
5     ciphercore      370
6     kaskada         339
7     kamu-cli        275
8     hypergraph      267
9     Oxen            149
10    bhtsne          57
11    hyperparameter  23
12    xvc             22
13    rexcel          17
