RAG Using Unstructured Data and Role of Knowledge Graphs

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

txtai

354 6,953 9.3 Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

If you're interested in graphs + RAG and want an alternate approach, txtai has a semantic graph component.
https://neuml.hashnode.dev/introducing-the-semantic-graph
https://github.com/neuml/txtai
Disclaimer: I'm the primary author of txtai

NaLLM

2 931 7.4 TypeScript

Repository for the NaLLM project

The article is a good summary of RAG in the enterprise. It shed some light for me on the quality of building KG using LLMs, as recently, it is an approach that Neo4j was proposing [0].
According to the article, it is either costly (if using OpenAI), or slow using open source AI models. In both cases, predicting the quality of generated KG using LLMs is hard.
[0] https://github.com/neo4j/NaLLM

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
OpenNRE

3 4,241 6.5 Python

An Open-Source Package for Neural Relation Extraction (NRE)

OpenNRE (https://github.com/thunlp/OpenNRE) is another good approach to neural relation extraction, though it's slightly dated. What would be particularly interesting is to combine models like OpenNRE or SpanMarker with entity-linking models to construct KG triples. And a solid, scalable graph database underneath would make for a great knowledge base that can be constructed from unstructured text.

tantivy

48 9,896 9.1 Rust

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

By this I presume you mean build a search index that can retrieve results based on keywords? I know certain databases use Lucene to build a keyword-based index on top of unstructured blobs of data. Another alternative is to use Tantivy (https://github.com/quickwit-oss/tantivy), a Rust version of Lucene, if building search indices via Java isn't your cup of tea :)
Both libraries offer multilingual support for keywords, I believe, so that's a benefit to vector search where multilingual embedding models are rather expensive.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project