unstructured-data

Top 14 unstructured-data Open-Source Projects

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  • Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14
  • bootcamp

    Working with all kinds of unstructured data, including reverse image search, audio search, molecular search, video analysis, question-answering systems, and NLP. (by milvus-io)

  • Project mention: FLaNK AI - 01 April 2024 | dev.to | 2024-04-01
  • awesome-document-understanding

    A curated list of resources on the topic of Document Understanding (DU)

  • spotlight

    Interactively explore unstructured datasets from your dataframe. (by Renumics)

  • Project mention: Renumics/spotlight: Interactively explore unstructured datasets from dataframes | news.ycombinator.com | 2024-03-10
  • Nuclia DB

    NucliaDB, The AI Search database for RAG

  • Project mention: Tantivy 0.20 is released: Schemaless column store, Schemaless aggregations, Phrase prefix queries, Percentiles, and more... | /r/rust | 2023-06-20

    There is also NucliaDB, which is built on top of tantivy and handles vector search for documents as well as video search.

  • cursusdb

    CursusDB is an open-source, distributed, in-memory yet persisted document-oriented database system with real-time capabilities.

  • Project mention: CursusDB: Fast, open-source document oriented database with SQL like query | news.ycombinator.com | 2024-01-08
  • trex

    Enforce structured output from LLMs 100% of the time (by automorphic-ai)

  • Project mention: Show HN: Generate JSON mock data for testing/initial app development | news.ycombinator.com | 2023-10-03

    A friend of mine built a tool called Trex that you might find helpful, check it out here: https://github.com/automorphic-ai/trex

    It's very consistent at generating templated data.

  • relevanceai

    Home of the AI workforce - Multi-agent system, AI agents & tools

  • dkm

    Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features

  • unstract

    No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

  • Project mention: Ask HN: Is RAG the Future of LLMs? | news.ycombinator.com | 2024-04-14

    Fast-changing libraries are a huge pain. That's why a no-code approach like Unstract (https://github.com/zipstack/unstract) makes sense.

  • base

    Adansons Base is a data programming tool for error-analysis of training results. It organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps to find low-quality data by using the training results and improves AI performance. (by adansons)

  • deprecated-core

    🔮 Instill Core contains components for supporting Instill VDP and Instill Model

  • Project mention: Building an Instill AI Pipeline in 5 minutes | dev.to | 2023-10-22

    Step 1: Log in to your InstillAI Cloud account. If you don't have an account yet, you can create one for free using your email, Google, or GitHub ID.

  • html_tag_annotator

    A machine learning tool for creating training datasets quickly and easily using a smart Chrome extension

  • etl-texts

    ETL-Texts aims to be a simple and efficient pipeline designed for extracting, translating, cleaning, and transforming text files.

  • Project mention: ETL Texts | news.ycombinator.com | 2024-01-14
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source unstructured-data projects? This list will help you:

#   Project                         Stars
1   towhee                          2,989
2   bootcamp                        1,619
3   awesome-document-understanding  1,115
4   spotlight                       1,010
5   Nuclia DB                         571
6   cursusdb                          411
7   trex                              238
8   relevanceai                        97
9   dkm                                95
10  unstract                           90
11  base                               28
12  deprecated-core                    13
13  html_tag_annotator                 12
14  etl-texts                           5