RAG Using Unstructured Data and Role of Knowledge Graphs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • If you're interested in graphs + RAG and want an alternate approach, txtai has a semantic graph component.

    https://neuml.hashnode.dev/introducing-the-semantic-graph

    https://github.com/neuml/txtai

    Disclaimer: I'm the primary author of txtai

  • NaLLM

    Repository for the NaLLM project

  • The article is a good summary of RAG in the enterprise. It shed some light for me on the quality of KGs built with LLMs, an approach that Neo4j proposed recently [0].

    According to the article, building a KG this way is either costly (if using OpenAI) or slow (if using open-source models). In both cases, the quality of an LLM-generated KG is hard to predict.

    [0] https://github.com/neo4j/NaLLM

  • OpenNRE

    An Open-Source Package for Neural Relation Extraction (NRE)

  • OpenNRE (https://github.com/thunlp/OpenNRE) is another good approach to neural relation extraction, though it's slightly dated. What would be particularly interesting is to combine models like OpenNRE or SpanMarker with entity-linking models to construct KG triples. And a solid, scalable graph database underneath would make for a great knowledge base that can be constructed from unstructured text.
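    The combination described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Relation` records stand in for the output of a relation-extraction model, the `entity_links` dictionary stands in for the output of an entity-linking model, and the `ent:` identifiers are made up. None of it uses the actual OpenNRE or SpanMarker APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical relation-extraction output for one mention pair."""
    head: str       # surface mention of the subject
    relation: str   # predicted relation label
    tail: str       # surface mention of the object
    score: float    # model confidence in [0, 1]


def build_triples(relations, entity_links, min_score=0.5):
    """Resolve mentions to canonical IDs and keep confident, fully linked relations."""
    triples = set()
    for rel in relations:
        head_id = entity_links.get(rel.head)
        tail_id = entity_links.get(rel.tail)
        # Skip relations with an unlinked mention or a low confidence score.
        if head_id is None or tail_id is None or rel.score < min_score:
            continue
        triples.add((head_id, rel.relation, tail_id))
    # Using a set removes duplicates that arise from different surface forms.
    return sorted(triples)


# Made-up model outputs, for illustration only.
relations = [
    Relation("Bill Gates", "founder_of", "Microsoft", 0.93),
    Relation("Gates", "founder_of", "Microsoft Corp.", 0.88),
    Relation("Bill Gates", "born_in", "Seattle", 0.31),
]
entity_links = {
    "Bill Gates": "ent:bill_gates",
    "Gates": "ent:bill_gates",
    "Microsoft": "ent:microsoft",
    "Microsoft Corp.": "ent:microsoft",
    "Seattle": "ent:seattle",
}

triples = build_triples(relations, entity_links)
print(triples)
```

    Entity linking is what lets the two differently worded mentions collapse into a single triple, which could then be loaded into a graph database.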

  • tantivy

    Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

  • By this I presume you mean building a search index that can retrieve results based on keywords? I know certain databases use Lucene to build a keyword-based index on top of unstructured blobs of data. Another alternative is Tantivy (https://github.com/quickwit-oss/tantivy), a Lucene-inspired library written in Rust, if building search indices via Java isn't your cup of tea :)

    Both libraries offer multilingual support for keywords, I believe, so that's an advantage over vector search, where multilingual embedding models are rather expensive.
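
For readers unfamiliar with keyword-based indexing, the core idea behind engines like Lucene and Tantivy is an inverted index that maps each term to the documents containing it. The toy sketch below illustrates that idea only. It does not use either library's API, and the function names and sample documents are invented for this example. Real engines add ranking, language-specific analyzers, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Invented sample documents, for illustration only.
docs = {
    1: "Tantivy is a search engine library written in Rust",
    2: "Lucene is a search library written in Java",
    3: "Vector search uses embedding models",
}
index = build_index(docs)
print(search(index, "search library"))
print(search(index, "rust"))
```

Because lookups are plain term matches, no embedding model is involved at query time, which is the cost difference the comment above points to.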
