Top 23 Document Open-Source Projects
-
docx
Easily generate and modify .docx files with JS/TS with a nice declarative API. Works for Node and on the Browser.
-
Teedy
Lightweight document management system packed with all the features you can expect from big expensive solutions (by sismics)
-
Docspell
Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.
-
react-native-document-scanner
Document scanner featuring live border detection, perspective correction, image filters, and more! 📲📸
-
awesome-flutter
💗 A curated list of awesome Flutter libraries, tools, tutorials, articles and more. All you should know about Flutter development! (by nepaul)
-
arcadedb
ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
-
hms-ml-demo
HMS ML Demo provides an example of integrating the Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, ASR, and TTS.
-
EveryDocs
A simple Document Management System for private use with basic functionality to organize your documents digitally
My main authoring tool is then Emacs Markdown Mode (https://jblevins.org/projects/markdown-mode/). For data entry, it comes with some bells and whistles similar to org-mode, like C-c C-l for inserting links etc.
I seldom export my notes for external use, but when I do, I use lowdown (https://kristaps.bsd.lv/lowdown/), which also comes with some nice output targets (among the more unusual are Groff and Terminal). Of course, pandoc (https://pandoc.org/) does a very good job here, too.
if you have postgres, just use https://github.com/FerretDB/FerretDB
Project mention: Launch HN: Onedoc (YC W24) – A better way to create PDFs | news.ycombinator.com | 2024-03-11
I'm facing that same pain point of programmatic PDF filling. I noodled around in the PDF format and learned it's a bit difficult to deal with fonts and formatting. But I think this client-side library works well enough, as a start: https://pdf-lib.js.org/#:~:text=a%20single%20document.-,Fill...
I've also heard of one paid API whose name I forgot but which seemed to work well, and of this related service, https://www.jotform.com/. I also considered porting some server-side libraries to WASM. One day I'll collect all the libraries and findings in a blog post.
Are you looking to programmatically fill any PDF form by detecting the fields? Or are you filling one known PDF template?
I'm part of the team that built LlamaParse. It's a net improvement compared to other PDF->structured-text extractors (I have built several in the past, including https://github.com/axa-group/Parsr).
For character extraction, LlamaParse uses a mixture of OCR and character extraction from the PDF (it's the only parser I'm aware of that addresses some of the buggy PDF font issues; check the 'text' mode to see the raw document before reconstruction), then uses a mixture of heuristics and machine learning models to reconstruct the document.
Once plugged into a recursive retrieval strategy, it allows you to get SOTA results on question answering over complex text (see notebook: https://github.com/run-llama/llama_parse/blob/main/examples/...).
AMA
Project mention: Read-only document management recommendations (no paperless-*) | /r/selfhosted | 2023-06-04
Maybe Docspell?
Project mention: Looking for a plugin to convert scantron forms/multiple choice grids into CSV... | /r/OpenAI | 2023-09-22
Project mention: Svelte Native: The Svelte Mobile Development Experience | news.ycombinator.com | 2024-01-29
It's being used here: https://github.com/Akylas/OSS-DocumentScanner
Project mention: ArcadeDB: Multi-Model Database Supporting Graphs, KV, Documents, TS, and Vectors | news.ycombinator.com | 2024-01-04
Document related posts
- ArcadeDB: Multi-Model Database Supporting Graphs, KV, Documents, TS, and Vectors
- OSS Document Scanner
- Show HN: PrivatePDF – minimal PDF editor that runs in the browser
- OSS Document Scanner (nice app! Check it!)
- Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents
- Show HN: Nebra – Type-Safe NoSQL with Node and SQLite
- [PHPxLaravel] DocKing: Your shared-microservice that takes over the document templates management & render/export PDF
Index
What are some of the best open-source Document projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | pandoc | 32,396 |
2 | Etherpad | 15,824 |
3 | FerretDB | 8,509 |
4 | pdf-lib | 6,238 |
5 | Parsr | 5,645 |
6 | docx | 3,926 |
7 | PSD.rb | 3,123 |
8 | Teedy | 1,772 |
9 | zathura | 1,705 |
10 | Docspell | 1,442 |
11 | javascript-sdk-design | 1,406 |
12 | OpenScan | 1,396 |
13 | react-native-document-scanner | 822 |
14 | OMRChecker | 660 |
15 | awesome-flutter | 600 |
16 | rust-library-i18n | 531 |
17 | OSS-DocumentScanner | 488 |
18 | arcadedb | 440 |
19 | hms-ml-demo | 347 |
20 | DB3 | 340 |
21 | PHPMongo | 243 |
22 | Lassi | 185 |
23 | EveryDocs | 179 |