Document Processing

Open-source projects categorized as Document Processing

Top 8 Document Processing Open-Source Projects

  • docx4j

    JAXB-based Java library for Word docx, PowerPoint pptx, and Excel xlsx files

  • Apache POI

    Mirror of Apache POI, the Java API for Microsoft Office file formats

  • fastexcel

    Generate and read big Excel files quickly

  • documents4j

    documents4j is a Java library for converting documents into another document format

  • formkiq-core

    A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud.

    Project mention: A Clutter-Free Life: Going Paperless with Paperless-Ngx | news.ycombinator.com | 2023-10-07

    We may want to get in touch with each other. We have an Open Core document management platform that runs in AWS; I'm not sure about your roadmap, but there may be something there that's of use: https://github.com/formkiq/formkiq-core

  • zerocell

    Simple, efficient Excel-to-POJO library for Java

  • pandoc-include

    An include filter for Pandoc

  • parsee-core

    Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Document Processing related posts

  • It seems like almost everyone here is working on a SaaS for other SaaS bootstrappers; is anyone building a product for a vertical outside of email/marketing/forms/dev tools/productivity?

    1 project | /r/SaaS | 6 Jun 2023
  • Anyone using AI for enterprise content management?

    1 project | /r/managers | 31 May 2023
  • [D] Is there any way to filter searches by metadata over current vector DBs like Pinecone?

    2 projects | /r/MachineLearning | 30 May 2023
  • Does anyone have ideas on how to reach out to other startups to pitch our startup program?

    1 project | /r/startups | 19 Apr 2023
  • Show HN: Build your perfect document management system using Open Core software

    1 project | news.ycombinator.com | 19 Apr 2023
  • Email filing & automation methods & systems

    1 project | /r/paralegal | 12 Apr 2023
  • Building SaaS for document management for smb and corporates

    1 project | /r/SaaS | 8 Mar 2023

Index

What are some of the best open-source Document Processing projects? The table ranks them by GitHub stars:

Rank  Project         Stars
1     docx4j          2,027
2     Apache POI      1,835
3     fastexcel       603
4     documents4j     524
5     formkiq-core    91
6     zerocell        78
7     pandoc-include  60
8     parsee-core     21
