| | hnswlib | finetuner |
|---|---|---|
| Mentions | 12 | 36 |
| Stars | 4,015 | 1,427 |
| Growth | 1.5% | 1.0% |
| Activity | 6.2 | 5.5 |
| Latest commit | 19 days ago | about 2 months ago |
| Language | C++ | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hnswlib
-
Show HN: A fast HNSW implementation in Rust
How does this compare to hnswlib - is it faster? https://github.com/nmslib/hnswlib
-
Show HN: Moodflix – a movie recommendation engine based on your mood
Last week I released Moodflix (https://moodflix.streamlit.app), a movie recommendation engine that finds movies based on your mood.
Moodflix was created on top of a movie dataset of 10k movies from The Movie Database. I vectorised the films using Hugging Face's T5 model (https://huggingface.co/docs/transformers/model_doc/t5) on each film's plot synopsis, genres and languages. Then I indexed the vectors using hnswlib (https://github.com/nmslib/hnswlib). LLMs can understand a movie's plot pretty well and distill the similarities between a user's query (mood) and the movie's plot and genres.
I've gotten feedback from close friends about linking movies to review sites like IMDB or Rotten Tomatoes, linking to sites where you can stream the movie, and adding movie posters. I would also love to hear from the community: what do you like, what do you want to see, and what do you think can be improved?
-
Hierarchical Navigable Small Worlds
Actually the "ef" is not epsilon. It is a parameter of the HNSW index: https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md...
-
Vector Databases 101
If you want to go larger you could still use some simple setup in conjunction with faiss, annoy or hnsw.
-
[P] Compose a vector database
Many vector databases are using Hnswlib and that is a supported vector index alongside Faiss and Annoy.
-
Faiss: A library for efficient similarity search
hnswlib (https://github.com/nmslib/hnswlib) is a strong alternative to faiss that I have enjoyed using for multiple projects. It is simple and has great performance on CPU.
After working through several projects that utilized local hnswlib and different databases for text and vector persistence, I integrated hnswlib with sqlite to create an embedded vector search engine that can easily scale up to millions of embeddings. For self-hosted situations of under 10M embeddings and less than insane throughput I think this combo is hard to beat.
https://github.com/jiggy-ai/hnsqlite
-
Storing OpenAI embeddings in Postgres with pgvector
https://github.com/nmslib/hnswlib
Used it to index 40M text snippets in the legal domain. Allows incremental adding.
I love how it just works. You know, it doesn't ANNOY me or make a FAISS. ;-)
-
Seeking advice on improving NLP search results
3,000 texts doesn't sound like too many, so a brute-force cosine calculation to find the most similar vector might work. If that's taking too much time, maybe look at KNN or ANN modules to speed up finding the most similar vector. I use hnswlib in KNN mode for this: it sorts through about 350,000 vectors in about 30-50 msec.
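For a few thousand vectors, the brute-force cosine approach suggested above is only a few numpy lines. This is a generic sketch (random vectors stand in for real text embeddings):

```python
import numpy as np

# 3,000 text embeddings (random stand-ins here).
vectors = np.random.rand(3000, 128).astype(np.float32)
query = np.random.rand(128).astype(np.float32)

# Cosine similarity = dot product of L2-normalized vectors.
norm_vecs = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
norm_query = query / np.linalg.norm(query)
sims = norm_vecs @ norm_query

best = int(np.argmax(sims))    # index of the most similar text
top5 = np.argsort(-sims)[:5]   # top 5 indices, most similar first
```

At this scale a single matrix-vector product is usually fast enough that an ANN index only pays off once the collection grows much larger.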
-
How to Build a Semantic Search Engine in Rust
hnswlib is in C++ and has Python bindings (you should be able to make your own for other languages).
https://github.com/nmslib/hnswlib
-
Anatomy of a txtai index
embeddings - The embeddings index file. This is an Approximate Nearest Neighbor (ANN) index with either Faiss (default), Hnswlib or Annoy, depending on the settings.
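The backend choice described above is driven by the embeddings configuration; a minimal sketch of such a config as a Python dict (assuming txtai's `backend` setting; the model path is illustrative):

```python
# Hypothetical txtai embeddings configuration: "backend" selects the
# ANN implementation (faiss is the default; "hnsw" and "annoy" are
# the other options mentioned above).
config = {
    "path": "sentence-transformers/all-MiniLM-L6-v2",  # illustrative model
    "backend": "hnsw",
}
```

Leaving `backend` out keeps the Faiss default.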
finetuner
-
How do you think search will change with technology like ChatGPT, Bing’s new AI search engine and the upcoming Google Bard?
And all of that has something to do with finetuners, which basically fine-tune AI models for specific use cases. With one, you can create a custom search experience tailored to your specific needs. I also wonder how this will be integrated into SEO tools soon, since those tools are catered to traditional search engines.
-
Combining multiple lists into one, meaningfully
Combining multiple lists into one is tough, but it's doable if you have the right approach. Fine-tuning GPT-3 might help, but finding enough examples is tough. You could use existing text data or manually label a set of training examples. A finetuner could help too. It's a platform-agnostic toolkit that can fine-tune pre-trained models, and it's customizable for lots of tasks.
-
speech_recognition not able to convert the full live audio to text. Please help me to fine-tune it.
You can set the pause threshold a little longer to allow for pauses between phrases. You can also use the phrase detection mode, which sets a time limit for the entire phrase instead of ending the transcription prematurely. If your microphone sensitivity is low, you can also try adjusting the energy threshold. If you want, you can use finetuners.
-
Questions about fine-tuned results. Should the completion results be identical to fine-tune examples?
It's possible that completion results may be identical to fine-tuned examples, but it's not guaranteed. Even with the same prompt, slight variations in output are expected due to the probabilistic nature of language models. You can experiment with different settings and parameters, including with finetuners like these.
-
How can I create a dataset to refine Whisper AI from old videos with subtitles?
You can try creating your own dataset. Get some audio data that you want, preprocess it, and then create a custom dataset you can use to fine-tune. You could also use finetuners like these if you want.
-
A Guide to Using OpenTelemetry in Jina for Monitoring and Tracing Applications
We derived the dataset by pre-processing the deepfashion dataset using Finetuner. The image label generated by Finetuner is extracted and formatted to produce the text attribute of each product.
-
[D] Looking for an open source Downloadable model to run on my local device.
You can either use Hugging Face Transformers, as they have a lot of pre-trained models that you can customize, or a Finetuner like this one, which is a toolkit for fine-tuning multiple models.
-
Improving Search Quality for Non-English Queries with Fine-tuned Multilingual CLIP Models
Very recently, a few non-English and multilingual CLIP models have appeared, using various sources of training data. In this article, we’ll evaluate a multilingual CLIP model’s performance in a language other than English, and show how you can improve it even further using Jina AI’s Finetuner.
-
Is there a way I can feed the gpt3 model database object like tables? I know we can create fine tune model but not sure about the completion part. Please help!
I think you can convert your data into text and fine-tune the model on it. But that might not be the ideal way to go, since you're relying on the model to absorb the data itself. Try transfer learning or fine-tuning with a finetuner.
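Converting table rows to text for fine-tuning usually means serializing each row into a prompt/completion pair. A generic sketch, where the fields and the phrasing are illustrative assumptions, using the JSONL format of classic GPT-3 fine-tuning:

```python
import json

# Hypothetical table rows to serialize as training data.
rows = [
    {"product": "widget", "price": 9.99, "category": "tools"},
    {"product": "gadget", "price": 24.50, "category": "electronics"},
]

# One JSONL record per row, in prompt/completion form.
with open("train.jsonl", "w") as f:
    for row in rows:
        record = {
            "prompt": f"Describe the product {row['product']}:",
            "completion": (
                f" {row['product']} costs {row['price']}"
                f" and belongs to {row['category']}."
            ),
        }
        f.write(json.dumps(record) + "\n")
```

Each line of `train.jsonl` is then one independent training example.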
-
Classification using prompt or fine tuning?
You can try prompt-based classification or fine-tuning with a Finetuner. Prompts work well for simple tasks, but fine-tuning may give better results for complex ones, although it needs more resources. Try both and see what works best for you.
What are some alternatives?
faiss - A library for efficient similarity search and clustering of dense vectors.
gpt_index - LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data. [Moved to: https://github.com/jerryjliu/llama_index]
annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Jina AI examples - Jina examples and demos to help you get started
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
RWKV-LM - RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
awesome-vector-search - Collections of vector search related libraries, service and research papers
jina - ☁️ Build multimodal AI applications with cloud-native stack
semantic-search-through-wikipedia-with-weaviate - Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine
Promptify - Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
pysot - SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and SiamMask.