Python Extract

Open-source Python projects categorized as Extract

Top 15 Python Extract Projects

  1. video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. The deep-learning-based extraction framework covers subtitle region detection and subtitle content extraction.

  2. dlt

    data load tool (dlt) is an open-source Python library that makes data loading easy 🛠️

    Project mention: Data Loading Tool | news.ycombinator.com | 2024-12-14
  3. text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.

    Project mention: PDF Extract API Using Ollama with Anonymization and PII Removal | news.ycombinator.com | 2025-01-07
  4. excalibur

    A web interface to extract tabular data from PDFs (by camelot-dev)

  5. extrakto

    extrakto for tmux - quickly select, copy/insert/complete text without a mouse

    Project mention: Extrakto for tmux – fuzzy find your text instead of selecting it by hand | news.ycombinator.com | 2025-03-14
  6. URLExtract

    URLExtract is a Python class for collecting (extracting) URLs from a given text by locating TLDs.

  7. open-mcr

    Exam bubble sheet scorer. Created with OpenCV and Python.

  8. icoextract

    Extract icons from Windows PE files (.exe/.dll)

  9. pdf2doi

    A Python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a PDF file.

  10. grablinks

  11. NewPipePlaylistExtractor

    Download your NewPipe-created playlists as mp3, wav, or another codec and listen to them offline. Playlists can also be exported as CSV, M3U8, or other text formats.

  12. docxlatex

    A Python library for extracting equations, text, and images from .docx files

  13. Spooq

  14. MPKExtractor

    Simple extractor script for Diablo Immortal's .MPK files

  15. AutomaticDemuxer

    Automatically demux tracks from media files

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Extract related posts

  • Extrakto for tmux – fuzzy find your text instead of selecting it by hand

    1 project | news.ycombinator.com | 14 Mar 2025
  • PDF Extract API Using Ollama with Anonymization and PII Removal

    1 project | news.ycombinator.com | 7 Jan 2025
  • How can I scrape every .sensorpanel attachment from this thread?

    1 project | /r/DataHoarder | 5 Dec 2023
  • Ratarmount: Random Access Tar Mount

    1 project | news.ycombinator.com | 14 May 2023
  • Figured out how to combine Google Earth tiles into a single glTF, load it into Blender or any game engine like PlayCanvas

    2 projects | /r/computergraphics | 13 May 2023
  • How to UV unwrap a large object (1 million verts) using Smart UV Project? Would love to have it all unwrapped on one object if possible...

    2 projects | /r/blenderhelp | 26 Nov 2022
  • Ratarmount – Fast transparent access to archives through FUSE

    2 projects | news.ycombinator.com | 10 Mar 2022

Index

What are some of the best open-source Extract projects in Python? This list will help you:

#   Project                    Stars
1   video-subtitle-extractor   7,109
2   dlt                        3,516
3   text-extract-api           2,557
4   excalibur                  1,651
5   extrakto                     952
6   URLExtract                   253
7   open-mcr                     173
8   icoextract                   123
9   pdf2doi                      113
10  grablinks                     24
11  NewPipePlaylistExtractor      22
12  docxlatex                     14
13  MPKExtractor                  11
14  Spooq                          8
15  AutomaticDemuxer               2
