docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) (by NanoNets)

Docext Alternatives

Similar projects and alternatives to docext

  1. marker

    34 docext VS marker

    Convert PDF to markdown + JSON quickly with high accuracy

  3. docling

    30 docext VS docling

    Get your documents ready for gen AI

  4. doctor

    1 docext VS doctor

    A microservice for document conversion at scale (by freelawproject)

  5. table-transformer

    🔍 Table Extraction Tool: A powerful open-source solution combining OCR and computer vision for extracting structured tabular data from images. Ideal for LLM preprocessing, data analysis, and automation. 🚀 (by Sudhanshu1304)

  6. contextgem

    3 docext VS contextgem

    ContextGem: Effortless LLM extraction from documents

  7. awesome-document-understanding

    A curated list of resources for Document Understanding (DU) topic

  8. extractous

    Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

  10. pdf-craft

    1 docext VS pdf-craft

    PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

  11. MindsDB

    90 docext VS MindsDB

    AI's query engine - Platform for building AI that can answer questions over large scale federated data. - The only MCP Server you'll ever need

  12. pydoxtools

    Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

NOTE: The number shown for each project counts its mentions in common posts plus user-suggested alternatives. A higher number therefore indicates a better docext alternative or a more similar project.

docext reviews and mentions

Posts with mentions or reviews of docext. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2025-06-15.

Stats

Basic docext repo stats
Mentions: 5
Stars: 1,448
Activity: 9.5
Last commit: 15 days ago

NanoNets/docext is an open-source project licensed under the MIT License, which is an OSI-approved license.

The primary programming language of docext is Python.

