Python document-intelligence

Open-source Python projects categorized as document-intelligence

Python document-intelligence Projects

  1. PaddleNLP

    Easy-to-use and powerful LLM and SLM library with awesome model zoo.

  2. contextgem

    ContextGem: Effortless LLM extraction from documents

    Project mention: Transform DOCX into LLM-ready data | news.ycombinator.com | 2025-05-04

    As part of work on my open-source project ContextGem, I've built a native, zero-dependency DOCX converter that transforms Word documents into LLM-ready data.

    This custom-built converter processes Word XML directly, provides comprehensive content extraction, and covers what other open-source tools often miss or lack support for:

    - Rich paragraph and sentence metadata for enhanced context

    - Misaligned tables

    - Comments, footnotes, and textboxes

    - Embedded images

    The converted document can then be easily used in ContextGem's LLM extraction workflows.

    Perfect for developers building contract intelligence applications where precision matters. The converter preserves document structure and relationships, empowering LLMs to better understand and analyze document content.

    Try it / share with your dev team today and see the difference in your document processing pipeline!

    GitHub: https://github.com/shcherbak-ai/contextgem

    All DocxConverter features: https://contextgem.dev/converters/docx.html
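
To illustrate what "directly processes Word XML" can mean in practice, here is a minimal standard-library sketch. It is not ContextGem's implementation or API. It only relies on the fact that a DOCX file is a ZIP package whose main text lives in `word/document.xml`, with footnotes and comments in `word/footnotes.xml` and `word/comments.xml`. The helper names (`read_part`, `paragraph_texts`, `extract_docx`, `build_sample`) are hypothetical, and the in-memory sample is a stripped-down package meant for the demo, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by w:p (paragraph) and w:t (text run) elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_part(docx, name):
    """Parse one XML part of the package, or return None if it is absent."""
    if name not in docx.namelist():
        return None
    return ET.fromstring(docx.read(name))


def paragraph_texts(root):
    """Return the text of every non-empty w:p element found under root.

    Paragraphs nested in tables or textboxes are included too, because
    iter() walks the whole subtree.
    """
    if root is None:
        return []
    texts = []
    for p in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        if text:
            texts.append(text)
    return texts


def extract_docx(source):
    """Collect body, footnote, and comment paragraphs from a DOCX path or file object."""
    with zipfile.ZipFile(source) as docx:
        return {
            "body": paragraph_texts(read_part(docx, "word/document.xml")),
            "footnotes": paragraph_texts(read_part(docx, "word/footnotes.xml")),
            "comments": paragraph_texts(read_part(docx, "word/comments.xml")),
        }


def build_sample():
    """Build a tiny in-memory package with one body paragraph and one footnote (demo only)."""
    def part(tag, inner):
        return f'<w:{tag} xmlns:w="{W_NS}">{inner}</w:{tag}>'

    body = "<w:body><w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p></w:body>"
    note = '<w:footnote w:id="1"><w:p><w:r><w:t>A note</w:t></w:r></w:p></w:footnote>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", part("document", body))
        z.writestr("word/footnotes.xml", part("footnotes", note))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(extract_docx(build_sample()))
```

A real converter has to handle much more (tables, images, styles, paragraph metadata), which is the gap the post says ContextGem's DocxConverter is meant to cover.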

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python document-intelligence related posts

  • Transform DOCX into LLM-ready data

    2 projects | news.ycombinator.com | 4 May 2025
  • I Built an Open-Source Framework to Make LLM Data Extraction Dead Simple

    1 project | dev.to | 2 May 2025

Index

# Project Stars
1 PaddleNLP 12,690
2 contextgem 1,248
