Weaviate
sonic
 | Weaviate | sonic
---|---|---
Mentions | 76 | 48
Stars | 9,237 | 19,317
Growth | 5.5% | -
Activity | 10.0 | 7.5
Last commit | about 3 hours ago | 3 months ago
Language | Go | Rust
License | BSD 3-clause "New" or "Revised" License | Mozilla Public License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Weaviate
-
pgvecto.rs alternatives - qdrant and Weaviate
3 projects | 13 Mar 2024
- FLaNK Stack 29 Jan 2024
- Qdrant, the Vector Search Database, raised $28M in a Series A round
-
How to use Weaviate to store and query vector embeddings
In this tutorial, I introduce Weaviate, an open-source vector database, with the thenlper/gte-base embedding model from Alibaba, through Hugging Face's transformers library.
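Conceptually, storing and querying embeddings comes down to keeping (object, vector) pairs and ranking them by similarity to a query vector. The toy sketch below illustrates that idea only. It is not the Weaviate client API, `TinyVectorStore` is a hypothetical name, and the hand-written two-dimensional vectors stand in for real gte-base embeddings. It also scans every item, whereas Weaviate relies on an approximate HNSW index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: brute-force search, no persistence."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self._items.append((text, list(vector)))

    def query(self, vector, k=3):
        # Score every stored item, then return the texts of the k most similar.
        scored = [(cosine(vector, v), text) for text, v in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
store.add("cat", [1.0, 0.0])
store.add("dog", [0.9, 0.1])
store.add("car", [0.0, 1.0])
nearest = store.query([1.0, 0.0], k=2)
```

In a real setup the vectors would come from the embedding model and the store would be a Weaviate collection; only the add-then-rank flow carries over.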
-
Choosing vector database: a side-by-side comparison
This will be solved in Weaviate https://github.com/weaviate/weaviate/issues/2424
-
Who's hiring developer advocates? (October 2023)
-
Do we think about vector dbs wrong?
Hey @rvrs, I work on Weaviate and we are doing some improvements around increasing write throughput:
1. gRPC. Using gRPC to write vectors has given a really nice performance boost. It is released in Weaviate core, but there is still some work to do on the clients. Feel free to get in contact if you would like to try it out.
2. Parameter tuning. Lowering `efConstruction` can speed up imports.
3. We are also working on async indexing https://github.com/weaviate/weaviate/issues/3463 which will further speed things up.
In comparison with pgvector, Weaviate has more flexible query options such as hybrid search and quantization to save memory on larger datasets.
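Hybrid search, mentioned above, combines a keyword score with a vector-similarity score for each result. A minimal sketch of one common way to fuse the two follows: normalize each score set to the range 0 to 1, then take a weighted sum. The function names are hypothetical, and the convention that `alpha=1` means vector-only and `alpha=0` means keyword-only is an assumption of this sketch. It is not Weaviate's implementation.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Fuse keyword and vector scores; return doc ids ordered best first.

    alpha weights the vector side (assumed convention for this sketch).
    Documents missing from one result set contribute 0 for that side.
    """
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Sort by fused score descending, breaking ties by doc id for stable output.
    return [doc_id for doc_id, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


keyword = {"a": 3.0, "b": 1.0}   # e.g. BM25-style scores
vector = {"b": 0.9, "c": 0.1}    # e.g. cosine similarities
vector_only = hybrid_rank(keyword, vector, alpha=1.0)
keyword_only = hybrid_rank(keyword, vector, alpha=0.0)
```

Normalizing first matters because keyword and vector scores live on different scales; without it, whichever side has the larger raw numbers dominates the sum.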
-
Pros and cons of vector search in elastic?
Highly opinionated as I'm working for Weaviate, so take my comment with a large portion of salt.
My highly opinionated view is that Elastic is not really open source, and the Lucene ecosystem's dependency on Java is a big disadvantage. As you already said, speed is an issue. They're getting better at this, but if you need to scale, this problem scales with you.
So if you already have an ELK stack and don't need to scale, sure, go for it. Otherwise, Weaviate offers real open source, so you can use it for free on your own infrastructure: https://github.com/weaviate/weaviate
-
Lost on LangChain: Can someone help with the Question Answer concept?
If you do not wish to store your private data on Pinecone, you can use open-source alternatives like Weaviate, where you can spin up your own instance. Another option could be to use Agents. You'll need to find a suitable agent for your database, which will allow LLMs to query data directly from your private database.
-
Questions about memory, tree-of-thought, planning
I tried chromadb but had terrible performance and could not pin down the cause (likely a problem on my end). Weaviate was easy to set up and had excellent performance; this is probably what I will use in the future. Next on my list is txtinstruct. Finetuning a model with data that does not change and using a vector db for everything else seems promising.
sonic
-
What is Hybrid Search?
Sonic - a project written in Rust that uses a custom network protocol for fast communication between the client and the server.
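The custom protocol referred to here is, to the best of our knowledge, Sonic Channel, a line-based text protocol spoken over a plain TCP connection. The sketch below only formats two command lines and opens no connection. The `PUSH` and `QUERY` shapes and the escaping rules are assumptions based on our reading of Sonic's protocol documentation and should be checked against it; the helper names are hypothetical.

```python
def quote(text):
    """Wrap text in double quotes, escaping backslashes, quotes and newlines.

    The escaping rules are an assumption for this sketch, not a verified spec.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def push_command(collection, bucket, obj, text):
    """Format an ingest line: PUSH <collection> <bucket> <object> "<text>"."""
    return f"PUSH {collection} {bucket} {obj} {quote(text)}"


def query_command(collection, bucket, terms, limit=None):
    """Format a search line: QUERY <collection> <bucket> "<terms>" [LIMIT(<n>)]."""
    command = f"QUERY {collection} {bucket} {quote(terms)}"
    if limit is not None:
        command += f" LIMIT({limit})"
    return command


push_line = push_command("messages", "default", "conversation:1", 'hello "world"')
query_line = query_command("messages", "default", "hello", limit=5)
```

A real client would send each line, terminated by a newline, over the socket after the session handshake, and then parse the server's replies.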
-
ArchiveBox: Open-source self-hosted web archiving
This is uncanny, I just discovered ArchiveBox earlier today and set up a self-hosted instance on some home hardware for a collection of bookmarks of useful guides, tutorials, and references I've collected over the years.
Setting it up on K8s with sonic [1] as the search backend and importing a few hundred URLs only took ~an hour or so, and the cached pages look great for the most part.
- Seeking a free full text search solution for large data with progress display
- Show HN: CozoDB, Hybrid Relational-Graph-Vector DB, the Hippocampus for LLMs
- FLiP Stack Weekly for 15-Jan-2023
-
Building an Internet Scale Meme Search Engine
If you don't need advanced search features, you can use Sonic (https://github.com/valeriansaliou/sonic). It's blazing fast and you can save a lot of money on servers.
-
Any Full Text Search library for json data?
What about Sonic? Maybe it requires a bit of integration, but it's simple and blazing fast.
-
10 Trending Github repositories / October, 27 2022
git clone https://github.com/valeriansaliou/sonic.git
-
An alternative to Elasticsearch that runs on a few MBs of RAM
- Sonic (https://github.com/valeriansaliou/sonic)
There isn't enough out there comparing all these for the simple, typical fuzzy search/search box use case, which I think is 80% of people doing search today.
Like other people are pointing out, most of these engines won't have all the features of ES (or more accurately Lucene) but I am pretty convinced that most of the time it doesn't actually matter and if someone is searching on your site excessively maybe there's a problem with your UX (unless you're a search engine or repository of information).
[0]: https://supabase.com/blog/postgres-full-text-search-vs-the-r...
What are some alternatives?
Milvus - A cloud-native vector database, storage for next generation AI applications
faiss - A library for efficient similarity search and clustering of dense vectors.
pgvector - Open-source vector similarity search for Postgres
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
jina - ☁️ Build multimodal AI applications with cloud-native stack
fastapi - FastAPI framework, high performance, easy to learn, fast to code, ready for production
Typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
tantivy - Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
vald - Vald. A Highly Scalable Distributed Vector Search Engine
ChatterBot - ChatterBot is a machine learning, conversational dialog engine for creating chat bots
marqo - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai