Document

Top 23 Document Open-Source Projects

  • pandoc

    Universal markup converter

  • Project mention: Beautifying Org Mode in Emacs (2018) | news.ycombinator.com | 2024-04-15

    My main authoring tool is then Emacs Markdown Mode (https://jblevins.org/projects/markdown-mode/). For data entry, it comes with some bells and whistles similar to org-mode, like C-c C-l for inserting links etc.

    I seldom export my notes for external usage, but when I do, I use lowdown (https://kristaps.bsd.lv/lowdown/), which also comes with some nice output targets (among the more unusual are Groff and Terminal). Of course pandoc (https://pandoc.org/) does a very good job here, too.

  • Etherpad

    Etherpad: A modern really-real-time collaborative document editor.

  • Project mention: Edit This Blog Post | news.ycombinator.com | 2024-02-06
  • FerretDB

    A truly Open Source MongoDB alternative

  • Project mention: Figma's Databases team lived to tell the scale | news.ycombinator.com | 2024-03-14

    If you have Postgres, just use https://github.com/FerretDB/FerretDB

  • pdf-lib

    Create and modify PDF documents in any JavaScript environment

  • Project mention: Launch HN: Onedoc (YC W24) – A better way to create PDFs | news.ycombinator.com | 2024-03-11

    I'm facing that same pain point of programmatic PDF filling. I noodled around in the PDF format and learned it's a bit difficult to deal with fonts and formatting. But I think this client-side library works well enough, as a start: https://pdf-lib.js.org/#:~:text=a%20single%20document.-,Fill...

    I've also heard of one paid API, whose name I forgot, that seemed to work well, and this related service https://www.jotform.com/, and I also considered porting some server-side libraries to WASM. One day I'll collect all the libraries and findings in a blog post.

    Are you looking to programmatically fill any PDF form by detecting the fields? Or are you filling one known PDF template?

  • Parsr

    Transforms PDF, Documents and Images into Enriched Structured Data

  • Project mention: LlamaCloud and LlamaParse | news.ycombinator.com | 2024-02-20

    I'm part of the team that built LlamaParse. It's a net improvement compared to other PDF->Structured Text extractors (I built several in the past, including https://github.com/axa-group/Parsr).

    For character extraction, LlamaParse uses a mixture of OCR and character extraction from the PDF (it's the only parser I'm aware of that addresses some of the buggy PDF font issues; check the 'text' mode to see the raw document before reconstruction), and it uses a mixture of heuristics and machine learning models to reconstruct the document.

    Once plugged into a recursive retrieval strategy, it allows you to get SOTA results on question answering over complex text (see notebook: https://github.com/run-llama/llama_parse/blob/main/examples/...).

    AMA

  • docx

    Easily generate and modify .docx files with JS/TS with a nice declarative API. Works for Node and on the Browser.

  • Project mention: Ajuda docx | /r/programacao | 2023-06-19
  • PSD.rb

    Parse Photoshop files in Ruby with ease

  • Teedy

    Lightweight document management system packed with all the features you can expect from big expensive solutions (by sismics)

  • zathura

    a document viewer

  • Docspell

    Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.

  • Project mention: Read-only document management recommendations (no paperless-*) | /r/selfhosted | 2023-06-04

    Maybe Docspell?

  • javascript-sdk-design

    JavaScript SDK Design Guide extracted from work and personal experience

  • OpenScan

    A privacy-friendly Document Scanner app

  • react-native-document-scanner

    Document scanner featuring live border detection, perspective correction, image filters and more! 📲📸

  • OMRChecker

    Evaluate OMR sheets fast and accurately using a scanner 🖨 or your phone 🤳.

  • Project mention: Looking for a plugin to convert scantron forms/multiple choice grids into CSV... | /r/OpenAI | 2023-09-22
  • awesome-flutter

    💗 A curated list of awesome Flutter libraries, tools, tutorials, articles and more.. All you should know about Flutter development! (by nepaul)

  • rust-library-i18n

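    Chinese translation of the Rust core library and standard library; it can serve as IDE code-completion hints and be used to generate local API documentation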

  • OSS-DocumentScanner

    Android document scanning app

  • Project mention: Svelte Native: The Svelte Mobile Development Experience | news.ycombinator.com | 2024-01-29

    It's being used here: https://github.com/Akylas/OSS-DocumentScanner

  • arcadedb

    ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.

  • Project mention: ArcadeDB: Multi-Model Database Supporting Graphs, KV, Documents, TS, and Vectors | news.ycombinator.com | 2024-01-04
  • hms-ml-demo

    HMS ML Demo provides an example of integrating the Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, ASR, and TTS.

  • DB3

    A lightweight, permanent JSON document database

  • PHPMongo

    MongoDB ODM. Part of @PHPMongoKit

  • Lassi

    All-in-one picker library for Android.

  • EveryDocs

    A simple Document Management System for private use with basic functionality to organize your documents digitally

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Document projects? This list will help you:

Rank Project Stars
1 pandoc 32,396
2 Etherpad 15,824
3 FerretDB 8,509
4 pdf-lib 6,238
5 Parsr 5,645
6 docx 3,926
7 PSD.rb 3,123
8 Teedy 1,772
9 zathura 1,705
10 Docspell 1,442
11 javascript-sdk-design 1,406
12 OpenScan 1,396
13 react-native-document-scanner 822
14 OMRChecker 660
15 awesome-flutter 600
16 rust-library-i18n 531
17 OSS-DocumentScanner 488
18 arcadedb 440
19 hms-ml-demo 347
20 DB3 340
21 PHPMongo 243
22 Lassi 185
23 EveryDocs 179
