Top 14 unstructured-data Open-Source Projects
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
bootcamp
Deals with all kinds of unstructured data, with applications such as reverse image search, audio search, molecular search, video analysis, question-answering systems, and NLP. (by milvus-io)
-
cursusdb
CursusDB is an open-source, distributed, in-memory yet persisted document-oriented database system with real-time capabilities.
-
base
Adansons Base is a data programming tool for error analysis of training results. It organizes the metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective, helps find low-quality data by using the training results, and improves AI performance. (by adansons)
-
html_tag_annotator
A machine learning tool for creating training datasets quickly and easily using a smart Chrome extension.
-
etl-texts
ETL-Texts aims to be a simple and efficient pipeline designed for extracting, translating, cleaning, and transforming text files.
Project mention: Renumics/spotlight: Interactively explore unstructured datasets from dataframes | news.ycombinator.com | 2024-03-10
Project mention: Tantivy 0.20 is released: Schemaless column store, Schemaless aggregations, Phrase prefix queries, Percentiles, and more... | /r/rust | 2023-06-20
You also have NucliaDB, which is built on top of tantivy and addresses vector search for documents and video search.
Project mention: CursusDB: Fast, open-source document oriented database with SQL like query | news.ycombinator.com | 2024-01-08
Project mention: Show HN: Generate JSON mock data for testing/initial app development | news.ycombinator.com | 2023-10-03
A friend of mine built a tool called Trex that you might find helpful, check it out here: https://github.com/automorphic-ai/trex
It's very consistent at generating templated data.
Fast-changing libraries are a huge pain. That's why a no-code approach like Unstract (https://github.com/zipstack/unstract) makes sense.
Step 1: Log in to your InstillAI Cloud account. If you don't have an account yet, you can create one here for free using your Email or Google or GitHub ID.
unstructured-data related posts
- Show HN: LLMWhisperer – Prep complex documents ready for use in LLMs
- RAGFlow is an open-source RAG engine based on deep document understanding
- CursusDB: Fast, open-source document oriented database with SQL like query
- Milvus Adventures Jan 5, 2023
- A new open-source distributed in-memory and persisted document oriented DBMS
- CursusDB – Distributed document oriented DBMS with an SQL like query language
- How to approach databases inside Next.js?
-
A note from our sponsor - InfluxDB
www.influxdata.com | 27 Apr 2024
Index
What are some of the best open-source unstructured-data projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | towhee | 2,989 |
2 | bootcamp | 1,619 |
3 | awesome-document-understanding | 1,115 |
4 | spotlight | 1,010 |
5 | Nuclia DB | 571 |
6 | cursusdb | 411 |
7 | trex | 238 |
8 | relevanceai | 97 |
9 | dkm | 95 |
10 | unstract | 90 |
11 | base | 28 |
12 | deprecated-core | 13 |
13 | html_tag_annotator | 12 |
14 | etl-texts | 5 |