MarkItDown: Python tool for converting files and office documents to Markdown

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. markitdown

    Python tool for converting files and office documents to Markdown.

    For PDFs, it's entirely a wrapper around pdfminer.six (https://pdfminersix.readthedocs.io/en/latest/tutorial/highle...); see https://github.com/microsoft/markitdown/blob/main/src/markit...

    So if that's your use case, it might be better to integrate with PDFMiner directly!

  3. docling

    Get your documents ready for gen AI

    Quite curious how this compares to docling - https://github.com/DS4SD/docling

    docling uses an LLM IIRC, so that's already a difference in approach

  4. pandoc

    Universal markup converter

    Pandoc (https://pandoc.org) can convert a .docx file to Markdown and other formats such as djot and Typst. I don't think pandoc can convert PowerPoint and Excel files.

  5. Awesome-Tabular-LLMs

    We collect papers about "large language models (LLM) for table-related tasks", e.g., using an LLM for the Table QA task. A curated list of "table + LLM" papers.

    This is an active area of research: https://github.com/SpursGoZmy/Awesome-Tabular-LLMs is a good starting point!

  6. vim-office

    read common binary files, such as PDFs and those of Microsoft Office or LibreOffice, in Vim

    Looking at its [source], it indeed seems to be a wrapper around Python variants of those. Making the pool smaller can hardly improve the output.

    [here] https://github.com/Konfekt/vim-office

  7. python-mammoth

    Convert Word documents (.docx files) to HTML

    And the core code mostly calls other libraries for the heavy lifting -- e.g. `mammoth`: https://github.com/mwilliamson/python-mammoth
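
Several comments above describe the same pattern: a thin converter that picks a backend by file type and lets an existing library do the work. The following is a minimal sketch of that idea, not markitdown's actual code. The names `BACKENDS`, `choose_backend`, `pandoc_command`, and `convert` are hypothetical, the routing table is an assumption made for illustration, and the pandoc fallback assumes a `pandoc` executable on the PATH.

```python
import subprocess
from pathlib import Path

# Hypothetical routing table: file suffix -> backend named in the thread.
BACKENDS = {
    ".pdf": "pdfminer",
    ".docx": "mammoth",
}


def choose_backend(path):
    """Return the backend name for a path, falling back to pandoc."""
    return BACKENDS.get(Path(path).suffix.lower(), "pandoc")


def pandoc_command(path):
    """Build a pandoc invocation that writes Markdown to stdout."""
    return ["pandoc", str(path), "-t", "markdown"]


def convert(path):
    """Convert one file by delegating to the chosen backend."""
    backend = choose_backend(path)
    if backend == "pdfminer":
        # Requires: pip install pdfminer.six
        from pdfminer.high_level import extract_text
        return extract_text(str(path))  # plain text, not structured Markdown
    if backend == "mammoth":
        # Requires: pip install mammoth
        import mammoth
        with open(path, "rb") as docx_file:
            return mammoth.convert_to_html(docx_file).value  # an HTML string
    # Assumption: pandoc is installed and can read the input format.
    result = subprocess.run(
        pandoc_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

The third-party imports are deferred until a backend is actually used, so the routing can be tried without installing everything. If only one format matters, calling the matching library directly may well be simpler than a dispatcher like this one.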


