Top 8 Rust llm Projects
-
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.
-
bionic-gpt
BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality.
Project mention: Google CodeGemma: Open Code Models Based on Gemma [pdf] | news.ycombinator.com | 2024-04-09
Project mention: Ask HN: What's wrong/right with Postgres migrations? | news.ycombinator.com | 2024-03-11
dbmate uses .sql files which also makes things a lot easier.
You can see my setup here https://github.com/bionic-gpt/bionic-gpt
Have a look at the CONTRIBUTING.md to get an idea of the dev workflow.
Project mention: Chidori – Declarative Framework for AI Agents (Rust, Python, and Node.js) | news.ycombinator.com | 2023-07-26
Project mention: Motorhead is a memory and information retrieval server for LLMs | news.ycombinator.com | 2023-10-22
Hi! I have been noodling away at a re-implementation of the original C++ DiskANN project as well as packaging the latest advances in embedding generation. The rough repo is here and will remain licensed as Apache-2.0!
Project mention: Show HN: Out-of-the-box text classification models | news.ycombinator.com | 2024-03-18
This is fantastic, but as you note on your launch page, people are going to need custom topic taxonomies. We use several custom ones, maintained as YAML that non-technical users can edit.
I'm guessing from having been looking for a project like yours for a decade now, that it's that custom taxonomy problem that means most OOTB don't work for people, so they make their own which they don't open source because they ended up ... tailoring ... a topic text classifier for themselves.
The only thing I've found close to this "OOTB" is:
https://cloud.google.com/natural-language/docs/classifying-t...
https://cloud.google.com/natural-language/docs/categories#ca...
And, to be frank, I can't see why I'd send my confidential information to you when I can send it to Google. (Ahem!)
But the problem with theirs and yours is the OOTB categories are for a global topic set, something like Yahoo directory, rather than for a given discipline.
I've found the general lists, like LCM[^1] (what you really want is LCSH subject headings, not LCM[^2]), too broad for my business or personal content, while something like ACM[^3] is more what's needed for, say, computing related content.
For a firmwide knowledge base at a {field}-tech firm, you have a mix of the firm's focus field, computing, and a broad-scope fallback like the one you're starting with. Even libraries have their own topic hierarchy! [^4]. Plenty of fields have controlled vocabularies[^6], and if you can't find one for a field, you can usually generate one by finding someone who is already classifying that field and looking at their TOC. All of which is to say, to be generally useful, you have to let people BYOT (bring your own topics) for this.
For instance, we built our topic list based on combining a reference taxonomy for our field, a reference taxonomy for computing, a reference taxonomy for business books, and the Google NLP tool mentioned above.
There are occasional tools that try to match arbitrary documents to arbitrary hierarchies such as clerk [^5] but they are challenging for various reasons.
You have a note to contact you for different topics, but I'm raising this here since so far (6 hours) you've had no feedback, and I'm a big fan of what you're doing and the niche is underserved.
A couple other thoughts:
Aside from a topic taxonomy or hierarchy, we've recently found that something like property-based classification becomes necessary once the knowledge base holds 10K+ to 100K+ short- and long-form documents. For instance, https://en.wikipedia.org/wiki/Colon_classification adds "facets" such as a time dimension. This is incredibly helpful for relevance while still being able to drill in and just browse a topic/branch/leaf.
I really like your "intent" classification, which is far more interesting than sentiment, since it could help separate blog posts from news articles, self-guided tutorials from reviews, and so on: Problem Solving, News, Informational, maybe? Sifting these to focus a robust KB can be tremendously valuable.
Your privacy policy is by and large useless, since the information being classified is unlikely to be personal (PII) and more likely to be confidential or non-public (NPI).
You are, effectively, saying "let us have a copy of all info you're classifying", yet nowhere on your main site nor docs site do you explain how you actively prevent yourselves from seeing an API user's information.
Ideally your "architecture" would explain how you built it to be able to do the work without you being able to see the content, not just a "pinky swear we won't look" sort of promise. Many businesses have their own confidentiality and privacy policies. Those require looping in subprocessors, which is you, and right now you can't be used.
[^1]: https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...
[^2]: https://id.loc.gov/authorities/subjects.html
[^3]: https://en.wikipedia.org/wiki/ACM_Computing_Classification_S...
[^4]: https://www.ala.org/tools/topics/atoz
[^5]: https://github.com/blankenshipz/clerk/tree/main
[^6]: https://pitt.libguides.com/metadatadiscovery/controlledvocab...
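The comment above describes building a topic list by combining several reference taxonomies (the "bring your own topics" idea). A minimal sketch of that merge step follows, assuming each taxonomy is a nested mapping of topic names such as one loaded from a user-editable YAML file. The function names and the sample topics are hypothetical placeholders, not taken from any of the projects listed here.

```python
from typing import Dict, List

# Assumption: a taxonomy is a nested dict, topic name -> sub-taxonomy.
# An empty dict marks a leaf topic.
Taxonomy = Dict[str, "Taxonomy"]


def merge_taxonomies(*taxonomies: Taxonomy) -> Taxonomy:
    """Recursively merge topic trees; branches with the same name are combined."""
    merged: Taxonomy = {}
    for taxonomy in taxonomies:
        for topic, children in taxonomy.items():
            # Merging into a fresh dict leaves the input taxonomies unmodified.
            merged[topic] = merge_taxonomies(merged.get(topic, {}), children)
    return merged


def leaf_paths(taxonomy: Taxonomy, prefix: tuple = ()) -> List[str]:
    """List every leaf topic as a 'Branch/Sub-branch/Leaf' path, sorted by name."""
    paths: List[str] = []
    for topic, children in sorted(taxonomy.items()):
        path = prefix + (topic,)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append("/".join(path))
    return paths


# Hypothetical stand-ins for a field taxonomy, a computing taxonomy,
# and a business-books taxonomy.
field_topics = {"Field": {"Regulation": {}, "Products": {}}}
computing_topics = {"Computing": {"Databases": {}, "Security": {}}}
business_topics = {
    "Business": {"Management": {}},
    "Computing": {"IT Strategy": {}},
}

combined = merge_taxonomies(field_topics, computing_topics, business_topics)
for path in leaf_paths(combined):
    print(path)
```

The resulting leaf paths could then serve as the label set handed to whichever classifier is in use, so that each team supplies its own topics instead of relying on a global list.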
Rust llms related posts
- Ask HN: What's wrong/right with Postgres migrations?
- When will "local" LLMs hit?
- ChatGPT clone in 30 minutes on AWS Kubernetes
- Bionic-GPT: LocalLLM for Team
- Bionic GPT
- Ask HN: Has anyone successfully implemented a LLM for a knowledge base?
Index
What are some of the best open-source llm projects in Rust? This list will help you:
# | Project | Stars |
---|---|---|
1 | tabby | 17,315 |
2 | paradedb | 3,863 |
3 | lance | 3,256 |
4 | bionic-gpt | 1,590 |
5 | chidori | 1,193 |
6 | motorhead | 828 |
7 | anansi | 47 |
8 | clerk | 10 |