Top 23 Document Open-Source Projects

pandoc

420 32,396 9.8 Haskell

Universal markup converter

Project mention: Beautifying Org Mode in Emacs (2018) | news.ycombinator.com | 2024-04-15

My main authoring tool is then Emacs Markdown Mode (https://jblevins.org/projects/markdown-mode/). For data entry, it comes with some bells and whistles similar to org-mode, like C-c C-l for inserting links etc.
I seldom export my notes for external usage, but when I do, I use lowdown (https://kristaps.bsd.lv/lowdown/), which also comes with some nice output targets (among the more unusual are Groff and Terminal). Of course pandoc (https://pandoc.org/) does a very good job here, too.

Etherpad

45 15,824 9.8 JavaScript

Etherpad: A modern really-real-time collaborative document editor.

Project mention: Edit This Blog Post | news.ycombinator.com | 2024-02-06

FerretDB

43 8,509 9.8 Go

A truly Open Source MongoDB alternative

Project mention: Figma's Databases team lived to tell the scale | news.ycombinator.com | 2024-03-14

if you have postgres, just use https://github.com/FerretDB/FerretDB

pdf-lib

23 6,238 0.0 TypeScript

Create and modify PDF documents in any JavaScript environment

Project mention: Launch HN: Onedoc (YC W24) – A better way to create PDFs | news.ycombinator.com | 2024-03-11

I'm facing that same pain point of programmatic PDF filling. I noodled around in the PDF format and learned it's a bit difficult to deal with fonts and formatting. But I think this client-side library works well enough, as a start: https://pdf-lib.js.org/#:~:text=a%20single%20document.-,Fill...
I've also heard of one paid API that I forgot but seemed to work well, and this related service https://www.jotform.com/, and I also considered porting some server-side libraries to WASM. One day I'll collect all the libraries and findings in a blog post.
Are you looking to programmatically fill any PDF form by detecting the fields? Or are you filling one known PDF template?

Parsr

7 5,645 4.6 JavaScript

Transforms PDF, Documents and Images into Enriched Structured Data

Project mention: LlamaCloud and LlamaParse | news.ycombinator.com | 2024-02-20

I'm part of the team that built LlamaParse. It's a net improvement compared to other PDF->Structured Text extractors (I have built several in the past, including https://github.com/axa-group/Parsr).
For character extraction, LlamaParse uses a mixture of OCR and character extraction from the PDF (it's the only parser I'm aware of that addresses some of the buggy PDF font issues; check the 'text' mode to see the raw document before reconstruction), then uses a mixture of heuristics and machine learning models to reconstruct the document.
Once plugged into a recursive retrieval strategy, it allows you to get SOTA results on question answering over complex text (see notebook: https://github.com/run-llama/llama_parse/blob/main/examples/...).
AMA

docx

11 3,926 9.0 TypeScript

Easily generate and modify .docx files with JS/TS with a nice declarative API. Works for Node and on the Browser.

Project mention: Ajuda docx (Help with docx) | /r/programacao | 2023-06-19

PSD.rb

0 3,123 0.0 Ruby

Parse Photoshop files in Ruby with ease
Teedy

23 1,772 6.6 JavaScript

Lightweight document management system packed with all the features you can expect from big expensive solutions (by sismics)
zathura

11 1,705 8.7 C

a document viewer
Docspell

33 1,442 9.5 Elm

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.

Project mention: Read-only document management recommendations (no paperless-*) | /r/selfhosted | 2023-06-04

Maybe Docspell?

javascript-sdk-design

1 1,406 0.0 JavaScript

JavaScript SDK Design Guide extracted from work and personal experience
OpenScan

27 1,396 6.5 C++

A privacy-friendly Document Scanner app
react-native-document-scanner

1 822 0.0 Java

Document scanner featuring live border detection, perspective correction, image filters and more! 📲📸
OMRChecker

5 660 4.3 Python

Evaluate OMR sheets fast and accurately using a scanner 🖨 or your phone 🤳.

Project mention: Looking for a plugin to convert scantron forms/multiple choice grids into CSV... | /r/OpenAI | 2023-09-22

awesome-flutter

6 600 6.5

💗 A curated list of awesome Flutter libraries, tools, tutorials, articles and more. All you should know about Flutter development! (by nepaul)
rust-library-i18n

1 531 3.4 Rust

Chinese translation of the Rust core and standard libraries; can be used for smart hints in IDE tools and to generate local API documentation
OSS-DocumentScanner

4 488 9.9 C++

Android document scanning app

Project mention: Svelte Native: The Svelte Mobile Development Experience | news.ycombinator.com | 2024-01-29

It's being used here: https://github.com/Akylas/OSS-DocumentScanner

arcadedb

4 440 9.7 JavaScript

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.

Project mention: ArcadeDB: Multi-Model Database Supporting Graphs, KV, Documents, TS, and Vectors | news.ycombinator.com | 2024-01-04

hms-ml-demo

4 347 4.5 Java

HMS ML Demo provides an example of integrating the Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, ASR, and TTS.
DB3

20 340 7.6 Rust

a Lightweight, Permanent JSON document database
PHPMongo

0 243 0.0 PHP

MongoDB ODM. Part of @PHPMongoKit
Lassi

0 185 7.5 Kotlin

All-in-one picker library for Android.
EveryDocs

0 179 8.1 Ruby

A simple Document Management System for private use with basic functionality to organize your documents digitally

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Document related posts

ArcadeDB: Multi-Model Database Supporting Graphs, KV, Documents, TS, and Vectors
1 project | news.ycombinator.com | 4 Jan 2024
OSS Document Scanner
1 project | news.ycombinator.com | 2 Jan 2024
Show HN: PrivatePDF – minimal PDF editor that runs in the browser
2 projects | news.ycombinator.com | 27 Dec 2023
OSS Document Scanner (nice app! Check it!)
1 project | /r/fossdroid | 8 Dec 2023
Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents
1 project | /r/Python | 24 Oct 2023
Show HN: Nebra – Type-Safe NoSQL with Node and SQLite
1 project | news.ycombinator.com | 18 Aug 2023
[PHPxLaravel] DocKing: Your shared-microservice that takes over the document templates management & render/export PDF
1 project | /r/opensource | 27 Jul 2023

Index

What are some of the best open-source Document projects? This list will help you:

	Project	Stars
1	pandoc	32,396
2	Etherpad	15,824
3	FerretDB	8,509
4	pdf-lib	6,238
5	Parsr	5,645
6	docx	3,926
7	PSD.rb	3,123
8	Teedy	1,772
9	zathura	1,705
10	Docspell	1,442
11	javascript-sdk-design	1,406
12	OpenScan	1,396
13	react-native-document-scanner	822
14	OMRChecker	660
15	awesome-flutter	600
16	rust-library-i18n	531
17	OSS-DocumentScanner	488
18	arcadedb	440
19	hms-ml-demo	347
20	DB3	340
21	PHPMongo	243
22	Lassi	185
23	EveryDocs	179