Rust llms

Open-source Rust projects categorized as llms

Top 8 Rust llm Projects

  • tabby

    Self-hosted AI coding assistant

    Project mention: Google CodeGemma: Open Code Models Based on Gemma [pdf] | news.ycombinator.com | 2024-04-09

  • paradedb

    Postgres for Search and Analytics

    Project mention: Using ClickHouse to scale an events engine | news.ycombinator.com | 2024-04-11

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.

    Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25

  • bionic-gpt

    BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality

    Project mention: Ask HN: What's wrong/right with Postgres migrations? | news.ycombinator.com | 2024-03-11

    dbmate uses .sql files, which also makes things a lot easier.

    You can see my setup here: https://github.com/bionic-gpt/bionic-gpt

    Have a look at CONTRIBUTING.md to get an idea of the dev workflow.

  • chidori

    A reactive runtime for building durable AI agents

    Project mention: Chidori – Declarative Framework for AI Agents (Rust, Python, and Node.js) | news.ycombinator.com | 2023-07-26

  • motorhead

    🧠 Motorhead is a memory and information retrieval server for LLMs.

    Project mention: Motorhead is a memory and information retrieval server for LLMs | news.ycombinator.com | 2023-10-22

  • anansi

    open source tooling for AI search and understanding (by infrawhispers)

    Project mention: DiskANN Implementation in Rust + Easy NN Search | /r/rust | 2023-05-20

    Hi! I have been noodling away at a re-implementation of the original C++ DiskANN project as well as packaging the latest advances in embedding generation. The rough repo is here and will remain licensed as Apache-2.0!
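
    DiskANN and similar approximate nearest-neighbour (ANN) indexes exist to avoid scoring every stored embedding against the query. For reference, here is a minimal sketch of the exact brute-force search they approximate, assuming embeddings live in memory as `Vec<f32>` and cosine similarity is the metric. It is not DiskANN's graph-based algorithm and not anansi's API; the function names `cosine` and `knn` are hypothetical.

```rust
// Exact (brute-force) k-nearest-neighbour search by cosine similarity.
// Hypothetical helper names; ANN indexes such as DiskANN trade some recall
// to avoid this full scan over the corpus.

/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Indices of the `k` corpus vectors most similar to `query`, best first.
fn knn(corpus: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    // Score every vector: this linear scan is exactly what an ANN index avoids.
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let corpus = vec![
        vec![1.0f32, 0.0, 0.0],
        vec![0.9, 0.1, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.0, 0.0, 1.0],
    ];
    let hits = knn(&corpus, &[1.0, 0.05, 0.0], 2);
    println!("{:?}", hits);
}
```

    Sorting the full score list costs O(n log n) per query; a size-k heap would trim that, but at millions of vectors the scan itself is the bottleneck, which is the problem graph indexes like DiskANN address.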

  • clerk

    LLM based file organizer (by blankenshipz)

    Project mention: Show HN: Out-of-the-box text classification models | news.ycombinator.com | 2024-03-18

    This is fantastic, but as you note on your launch page, people are going to need custom topic taxonomies. We use several custom ones, maintained as YAML that non-technical users can edit.

    Having looked for a project like yours for a decade now, my guess is that this custom taxonomy problem is why most out-of-the-box (OOTB) classifiers don't work for people: they build their own, and don't open-source it because they ended up ... tailoring ... a topic text classifier for themselves.

    The only thing I've found close to this "OOTB" is:

    https://cloud.google.com/natural-language/docs/classifying-t...

    https://cloud.google.com/natural-language/docs/categories#ca...

    And, to be frank, I can't see why I'd send my confidential information to you when I can send it to Google. (Ahem!)

    But the problem with theirs and yours is that the OOTB categories are for a global topic set, something like the Yahoo directory, rather than for a given discipline.

    I've found the general lists, like LCC[^1] (what you really want is LCSH subject headings[^2], not LCC), too broad for my business or personal content, while something like ACM[^3] is closer to what's needed for, say, computing-related content.

    For a firmwide knowledge base at a {field}-tech firm, you have a mix of the firm's focus field, computing, and a broad-scope fallback like the one you're starting with. Even libraries have their own topic hierarchy![^4] Plenty of fields have controlled vocabularies[^6], and if you can't find one for a field, you can usually generate one by finding someone who already classifies that field and looking at their TOC. All of which is to say: to be generally useful, you have to let people BYOT (bring your own topics).

    For instance, we built our topic list based on combining a reference taxonomy for our field, a reference taxonomy for computing, a reference taxonomy for business books, and the Google NLP tool mentioned above.

    There are occasional tools that try to match arbitrary documents to arbitrary hierarchies such as clerk [^5] but they are challenging for various reasons.

    You have a note to contact you for different topics, but I'm raising this here since so far (6 hours) you've had no feedback, and I'm a big fan of what you're doing; the niche is underserved.

    A couple other thoughts:

    Aside from a topic taxonomy or hierarchy, we've recently found that something like properties-based classification becomes necessary once the knowledge base holds 10K+ to 100K+ short- and long-form documents. For instance, https://en.wikipedia.org/wiki/Colon_classification adds "facets" such as a time dimension. This is incredibly helpful for relevance while still letting you drill in and just browse a topic/branch/leaf.

    I really like your "intent" classification, far more interesting than sentiment, since it could help separate blog posts from news articles, self-guided tutorials from reviews, and so on: maybe Problem Solving, News, Informational? Sifting these to focus a robust KB can be tremendously valuable.

    Your privacy policy is by and large useless, since the information being classified is unlikely to be personal (PII) class and more likely confidential or non-public (NPI) class.

    You are, effectively, saying "let us have a copy of all the info you're classifying", yet nowhere on your main site or docs site do you explain how you actively prevent yourselves from seeing an API user's information.

    Ideally your "architecture" would explain how you built it to be able to do the work without you being able to see the content, not just a "pinky swear we won't look" sort of promise. Many businesses have their own confidentiality and privacy policies. Those require looping in subprocessors, which is you, and right now you can't be used.

    [^1]: https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...

    [^2]: https://id.loc.gov/authorities/subjects.html

    [^3]: https://en.wikipedia.org/wiki/ACM_Computing_Classification_S...

    [^4]: https://www.ala.org/tools/topics/atoz

    [^5]: https://github.com/blankenshipz/clerk/tree/main

    [^6]: https://pitt.libguides.com/metadatadiscovery/controlledvocab...
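
    The comment's two ideas, a bring-your-own topic hierarchy and facets such as a time dimension, can be combined at query time: browse a branch of the hierarchy, then narrow by facet values. A minimal sketch, assuming documents have already been assigned a topic path and facet values (the classification step itself is not shown). The `Doc` type, the `query` function, and the facet keys `year` and `intent` are hypothetical and not part of clerk or any other project listed here.

```rust
use std::collections::HashMap;

/// A document tagged with a path in a user-supplied ("bring your own") topic
/// hierarchy, plus independent facets such as a time period or intent label.
struct Doc {
    title: String,
    topic: Vec<String>,              // hypothetical path, e.g. ["computing", "search"]
    facets: HashMap<String, String>, // hypothetical facets, e.g. {"year": "2024"}
}

/// Convenience constructor for the sample data.
fn doc(title: &str, topic: &[&str], facets: &[(&str, &str)]) -> Doc {
    Doc {
        title: title.to_string(),
        topic: topic.iter().map(|s| s.to_string()).collect(),
        facets: facets.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

/// True if `topic` is the `branch` node itself or lies anywhere beneath it.
fn under_branch(topic: &[String], branch: &[&str]) -> bool {
    topic.len() >= branch.len() && topic.iter().zip(branch).all(|(t, b)| t.as_str() == *b)
}

/// Browse one branch of the hierarchy, then keep only documents matching every facet filter.
fn query<'a>(docs: &'a [Doc], branch: &[&str], filters: &[(&str, &str)]) -> Vec<&'a str> {
    docs.iter()
        .filter(|d| under_branch(&d.topic, branch))
        .filter(|d| {
            filters
                .iter()
                .all(|(k, v)| d.facets.get(*k).map(String::as_str) == Some(*v))
        })
        .map(|d| d.title.as_str())
        .collect()
}

/// Hypothetical sample corpus.
fn sample_docs() -> Vec<Doc> {
    vec![
        doc("Intro to BM25", &["computing", "search"], &[("year", "2024"), ("intent", "tutorial")]),
        doc("Postgres release notes", &["computing", "databases"], &[("year", "2024"), ("intent", "news")]),
        doc("Vector search review", &["computing", "search", "vector"], &[("year", "2023"), ("intent", "review")]),
        doc("Quarterly outlook", &["business"], &[("year", "2024"), ("intent", "news")]),
    ]
}

fn main() {
    let docs = sample_docs();
    // Everything under computing > search, any year or intent.
    println!("{:?}", query(&docs, &["computing", "search"], &[]));
    // The whole computing branch, narrowed to the 2024 facet.
    println!("{:?}", query(&docs, &["computing"], &[("year", "2024")]));
    // No branch restriction: all news items across the hierarchy.
    println!("{:?}", query(&docs, &[], &[("intent", "news")]));
}
```

    Keeping the topic path and the facets as separate fields is what lets a reader still browse a branch or leaf while filtering on time or intent, rather than multiplying every topic by every facet value.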

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

What are some of the best open-source llm projects in Rust? This list will help you:

#  Project     GitHub stars
1  tabby             17,315
2  paradedb           3,863
3  lance              3,256
4  bionic-gpt         1,590
5  chidori            1,193
6  motorhead            828
7  anansi                47
8  clerk                 10
