| | code-indexer-loop | retake |
|---|---|---|
| Mentions | 3 | 4 |
| Stars | 161 | 757 |
| Growth | 5.6% | - |
| Activity | 6.3 | 10.0 |
| Last commit | about 1 month ago | 8 months ago |
| Language | Python | Rust |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
code-indexer-loop
- Python library for indexing and retrieving source code files through an integrated vector database (not mine)
- Show HN: Code Indexer Loop
Sweep is credited in multiple places: a) https://github.com/definitive-io/code-indexer-loop#attributi... b) https://github.com/definitive-io/code-indexer-loop/blob/fd9d...
The difference is that it is packaged as a consumable PyPI package that can easily be used in a project (they even call for separating this out into a standalone project, but say they lack the time to do so): https://docs.sweep.dev/blogs/chunking-2m-files#future-
In addition, we expand and fix the implementation: for example, it now supports limiting on token count instead of character count, and we fix some whitespace inconsistencies in parsing/chunk reconstruction.
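To illustrate the difference between a character limit and a token limit mentioned above, here is a minimal sketch. It is not the code-indexer-loop implementation: the greedy line-based packing, the `chunk_lines` helper, and the whitespace tokenizer are all hypothetical stand-ins (a real setup would likely use a model tokenizer and syntax-aware splitting). The chunks are built from unmodified lines, so joining them reproduces the source exactly, whitespace included.

```python
from typing import Callable, List


def count_chars(text: str) -> int:
    """Measure a chunk by its character count."""
    return len(text)


def count_tokens(text: str) -> int:
    """Measure a chunk by token count.

    Assumption: whitespace splitting stands in for a real model tokenizer.
    """
    return len(text.split())


def chunk_lines(source: str, limit: int, measure: Callable[[str], int]) -> List[str]:
    """Greedily pack whole lines into chunks whose measured size stays within `limit`.

    A single line that exceeds the limit on its own becomes its own chunk.
    Lines are kept verbatim (including newlines), so "".join(chunks) == source.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in source.splitlines(keepends=True):
        candidate = "".join(current) + line
        if current and measure(candidate) > limit:
            # Adding this line would overflow the chunk: close it and start a new one.
            chunks.append("".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


SOURCE = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "def very_long_identifier_name_for_demo(x):\n"
    "    return x\n"
)

by_chars = chunk_lines(SOURCE, limit=40, measure=count_chars)
by_tokens = chunk_lines(SOURCE, limit=8, measure=count_tokens)

for label, chunks in (("chars", by_chars), ("tokens", by_tokens)):
    print(label, len(chunks), "chunks")
```

The long identifier counts as 34 characters but only one token here, so the two limits split the same source at different points. A token limit tracks what an embedding model or LLM context window actually constrains.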
retake
- Show HN: Retake – Open-Source Hybrid Search for Postgres
https://github.com/getretake/retake/pull/198 is a refreshing change given the recent rug pulls, so thank you for that
- We created an open-source semantic search Python package on top of Postgres
We found it difficult to do well with standard vector databases and so we ended up making a nice open-source package to layer semantic search on top of Postgres with just a few lines of code. It supports Python backends right now, always stays in sync with Postgres via Kafka, doubles as a vector store, and can be deployed anywhere.
- Show HN: Open-Source Infrastructure for Vector Data Streams
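As a rough illustration of what "hybrid search" in the posts above usually means, the sketch below blends a keyword score with a vector-similarity score. This is not retake's implementation, which the posts do not describe: the `hybrid_search` function, the `alpha` weight, the term-overlap keyword score, and the toy two-dimensional embeddings are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (a crude keyword match)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(
    query: str,
    query_vec: List[float],
    docs: Dict[str, Tuple[str, List[float]]],
    alpha: float = 0.5,
) -> List[str]:
    """Rank document ids by alpha * keyword score + (1 - alpha) * cosine similarity."""
    scored = [
        (alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec), doc_id)
        for doc_id, (text, vec) in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy corpus: (text, hypothetical 2-d embedding) per document id.
DOCS = {
    "a": ("postgres full text search", [1.0, 0.0]),
    "b": ("vector embeddings for semantic retrieval", [0.0, 1.0]),
    "c": ("semantic search on postgres", [0.6, 0.8]),
}

ranking = hybrid_search("postgres search", [0.0, 1.0], DOCS)
print(ranking[0])
```

Document "c" ranks first because it matches both query terms and is also close to the query vector. "a" matches only on keywords and "b" only on the embedding.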
What are some alternatives?
bor - User-friendly, tiny source code searcher written in pure Python.
bionicgpt - BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality [Moved to: https://github.com/bionic-gpt/bionic-gpt]
flit - Simplified packaging of Python modules
nfcompose - Build REST APIs/Integrations in minutes instead of hours - NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by postgres and support automatic consumer webhook notifications on data changes out of the box.
Resume-Matcher - Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.
embedditor - ⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.
vectorflow - VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
tinyvector - A tiny embedding database in pure Rust.
pgsync - Postgres to Elasticsearch/OpenSearch sync
prism - Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
paradedb - Postgres for Search and Analytics
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/