retake vs code-indexer-loop

| | retake | code-indexer-loop |
|---|---|---|
| Mentions | 4 | 3 |
| Stars | 757 | 165 |
| Growth | - | 3.0% |
| Activity | 10.0 | 6.3 |
| Last Commit | 9 months ago | 2 months ago |
| Language | Rust | Python |
| License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
retake
Show HN: Retake – Open-Source Hybrid Search for Postgres
https://github.com/getretake/retake/pull/198 is a refreshing change given the recent rug pulls, so thank you for that
We created an open-source semantic search Python package on top of Postgres
We found it difficult to do well with standard vector databases and so we ended up making a nice open-source package to layer semantic search on top of Postgres with just a few lines of code. It supports Python backends right now, always stays in sync with Postgres via Kafka, doubles as a vector store, and can be deployed anywhere.
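The core idea of layering semantic search on top of an existing keyword-searchable store can be sketched generically. The snippet below is not Retake's actual API; all names, scores, and the toy two-dimensional embeddings are illustrative. It shows a common hybrid-ranking approach: blend a lexical (full-text) score with a vector-similarity score and sort by the blend.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(keyword_score, semantic_score, alpha=0.5):
    """Weighted blend of a lexical score and a vector-similarity score.

    alpha=1.0 is pure keyword ranking; alpha=0.0 is pure semantic ranking.
    """
    return alpha * keyword_score + (1 - alpha) * semantic_score

# Toy candidate set: (doc_id, keyword score from full-text search, embedding).
# In a Postgres-backed setup the keyword score might come from ts_rank and
# the embedding from a vector column; here both are hard-coded for clarity.
candidates = [
    ("doc1", 0.9, [1.0, 0.0]),
    ("doc2", 0.2, [0.0, 1.0]),
]
query_embedding = [0.0, 1.0]

ranked = sorted(
    candidates,
    key=lambda d: hybrid_score(d[1], cosine_similarity(d[2], query_embedding)),
    reverse=True,
)
```

With the default `alpha=0.5`, `doc2` outranks `doc1` despite its weaker keyword score, because its embedding matches the query embedding exactly; tuning `alpha` is the usual knob for trading lexical precision against semantic recall.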
- Show HN: Open-Source Infrastructure for Vector Data Streams
code-indexer-loop
- Python library for indexing and retrieving source code files through an integrated vector database (not mine)
Show HN: Code Indexer Loop
Sweep is mentioned as attribution in multiple places: a) https://github.com/definitive-io/code-indexer-loop#attributi... b) https://github.com/definitive-io/code-indexer-loop/blob/fd9d...
The difference is packaging it as a consumable PyPI package that can easily be used in a project (they even call for splitting this out into a standalone project, but note that they lack the time to do so): https://docs.sweep.dev/blogs/chunking-2m-files#future-
In addition, we expanded and fixed the implementation: for example, it now supports limiting chunks by token count instead of character count, and we fixed some whitespace inconsistencies in parsing and chunk reconstruction.
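Limiting by token count rather than character count matters because LLM context windows are measured in tokens. A minimal sketch of the idea follows; it is not code-indexer-loop's implementation, and the whitespace-split tokenizer is a stand-in for a real model tokenizer (e.g. tiktoken).

```python
def count_tokens(text):
    # Stand-in tokenizer: whitespace split. A real implementation would use
    # a model tokenizer so limits match the LLM's actual context budget.
    return len(text.split())

def chunk_by_tokens(lines, max_tokens):
    """Greedily pack whole lines into chunks without exceeding max_tokens.

    A single line longer than max_tokens still becomes its own (oversized)
    chunk, since lines are never split here.
    """
    chunks, current, current_tokens = [], [], 0
    for line in lines:
        t = count_tokens(line)
        if current and current_tokens + t > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += t
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A character-count limit would cut mid-token and drift from the model's budget; counting tokens keeps each chunk safely inside the embedding model's window.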
What are some alternatives?
bionicgpt - BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality [Moved to: https://github.com/bionic-gpt/bionic-gpt]
bor - User-friendly, tiny source code searcher written in pure Python.
nfcompose - Build REST APIs/integrations in minutes instead of hours - NF Compose is a (data) integration platform that lets developers define REST APIs quickly. Generated REST APIs are backed by Postgres and support automatic consumer webhook notifications on data changes out of the box.
flit - Simplified packaging of Python modules
vectorflow - VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
Resume-Matcher - Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.
embedditor - ⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.
tinyvector - A tiny embedding database in pure Rust.
prism - Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
pgsync - Postgres to Elasticsearch/OpenSearch sync
paradedb - Postgres for Search and Analytics
qdrant - High-performance, massive-scale vector database for the next generation of AI. Also available in the cloud: https://cloud.qdrant.io/