Python llm-extraction

Open-source Python projects categorized as llm-extraction

Top 3 Python llm-extraction Projects

  1. contextgem

    ContextGem: Effortless LLM extraction from documents

    Project mention: Transform DOCX into LLM-ready data | news.ycombinator.com | 2025-05-04

    As part of work on my open-source project ContextGem, I've built a native, zero-dependency DOCX converter that transforms Word documents into LLM-ready data.

    This custom-built converter processes Word XML directly, provides comprehensive content extraction, and covers what other open-source tools often miss or lack support for:

    - Rich paragraph and sentence metadata for enhanced context

    - Misaligned tables

    - Comments, footnotes, and textboxes

    - Embedded images
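To make "directly processes Word XML" concrete, here is a minimal sketch using only the Python standard library. It is not ContextGem's implementation or API: the function names `extract_paragraphs` and `build_sample_docx` are hypothetical, and the sketch only reads plain paragraph text from `word/document.xml`, ignoring the comments, footnotes, images, and metadata that the post describes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_file):
    """Return the text of each non-empty paragraph in word/document.xml.

    Hypothetical helper for illustration. Paragraphs nested inside other
    paragraphs (for example, in text boxes) are not de-duplicated here.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def build_sample_docx(paragraph_texts):
    """Build a minimal in-memory archive containing only word/document.xml.

    Demonstration input only: it is enough for extract_paragraphs, but it is
    not a complete file that Word would open.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>"
        for text in paragraph_texts
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sample = build_sample_docx(["First paragraph", "Second paragraph"])
    for paragraph in extract_paragraphs(sample):
        print(paragraph)
```

A real converter has to read several more parts of the archive (comments, footnotes, media) and resolve the relationships between them, which is where the features listed above come in.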

    The converted document can then be easily used in ContextGem's LLM extraction workflows.

    Perfect for developers building contract intelligence applications where precision matters. The converter preserves document structure and relationships, empowering LLMs to better understand and analyze document content.

    Try it / share with your dev team today and see the difference in your document processing pipeline!

    GitHub: https://github.com/shcherbak-ai/contextgem

    All DocxConverter features: https://contextgem.dev/converters/docx.html

  2. validex

    Simplifies the retrieval, extraction, and training of structured data from various unstructured sources.

    Project mention: ValidEx – Structured Data Extraction Library | news.ycombinator.com | 2025-03-27
  3. sdk

    Lightfeed SDK to search and filter web data (by lightfeed)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Index

What are some of the best open-source llm-extraction projects in Python? This list will help you:

# Project Stars
1 contextgem 1,237
2 validex 143
3 sdk 5

