Extract

Open-source projects categorized as Extract

Top 23 Extract Open-Source Projects

  • video-subtitle-extractor

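    Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files.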

  • SwiftSoup

    SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)

  • Project mention: Does iOS application development platform support HTML rendering? | /r/iOSProgramming | 2023-05-30

    For parsing there is this amazing library, but again, that's only for parsing HTML, not rendering anything: https://github.com/scinfu/SwiftSoup

  • archiver

    Easily create & extract archives, and compress & decompress files of various formats

  • camelot

    Camelot: PDF Table Extraction for Humans (by atlanhq)

  • Project mention: How do you parse tables in PDF with langchain? Especially, the context which is few lines above and below the table. | /r/LangChain | 2023-06-23
  • pdfsam

    PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages

  • Project mention: pdfsam VS cpdf-binaries - a user suggested alternative | libhunt.com/r/pdfsam | 2023-08-18
  • Jailer

    Database Subsetting and Relational Data Browsing Tool.

  • Project mention: Jailer – open-source database client | news.ycombinator.com | 2024-04-30
  • UtinyRipper

    GUI and API library to work with Engine assets, serialized and bundle files

  • Project mention: How to make mods? | /r/SunHaven | 2023-06-09
  • Python

    This extension is now maintained in the Microsoft fork.

  • dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

  • Project mention: Ask HN: Freelancer? Seeking freelancer? (December 2023) | news.ycombinator.com | 2023-12-03

    SEEKING FREELANCER | REMOTE | GERMANY

    dltHub is looking for freelance help in the following repos:

    - https://github.com/dlt-hub/dlt

  • excalibur

    A web interface to extract tabular data from PDFs (by camelot-dev)

  • Project mention: Ask HN: What's a good library/command line tool to extract tables from PDFs? | news.ycombinator.com | 2023-06-10

    have not tried it, but this has been in my bookmarks a while: https://github.com/camelot-dev/excalibur

  • vscode-glean

    The extension provides refactoring tools for your React codebase

  • article-extractor

    To extract main article from given URL with Node.js

  • Project mention: ScrapeGraphAI: Web scraping using LLM and direct graph logic | news.ycombinator.com | 2024-05-07

    Agreed!

    Apify's Website Content Crawler[0] does a decent job of this for most websites in my experience. It allows you to "extract" content via different built-in methods (e.g. Extractus [1]).

    We currently use this at Magic Loops[2] and it works _most_ of the time.

    The long-tail is difficult though, and it's not uncommon for users to back out to raw HTML, and then have our tool write some custom logic to parse the content they want from the scraped results (fun fact: before GPT-4 Turbo, the HTML page was often too large for the context window... and sometimes it still is!).

    Would love a dedicated tool for this. I know the folks at Reworkd[3] are working on something similar, but not sure how much is public yet.

    [0] https://apify.com/apify/website-content-crawler

    [1] https://github.com/extractus/article-extractor

    [2] https://magicloops.dev/

    [3] https://reworkd.ai/

  • download

    Download and extract files (by kevva)

  • lessmsi

    A tool to view and extract the contents of a Windows Installer (.msi) file.

  • extrakto

    extrakto for tmux - quickly select, copy/insert/complete text without a mouse

  • extract-xiso

    Xbox ISO Creation/Extraction utility. Imported from SourceForge.

  • Project mention: Playing SWBFII mods with XEMU (Tutorial) | /r/xemu | 2023-12-07
  • datashare

    A self-hosted search engine for documents.

  • decompress

    Extracting archives made easy

  • extract-loader

    webpack loader to extract HTML and CSS from the bundle

  • color.js

    Extract colors from an image (0.75 KB) 🎨 (by luukdv)

  • libarchivejs

    Archive library for browsers

  • MFT_Browser

    $MFT directory tree reconstruction & FILE record info

  • sling-cli

    Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.

  • Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Extract related posts

  • Improve your productivity with regexp-it-cli

    1 project | dev.to | 5 May 2024
  • Jailer – open-source database client

    1 project | news.ycombinator.com | 30 Apr 2024
  • Show HN: Jailer is a unique open-source database client tool

    1 project | news.ycombinator.com | 14 Mar 2024
  • Database browsing tool with sophisticated and animated Java Swing UI

    1 project | news.ycombinator.com | 10 Feb 2024
  • Jailer is a unique open-source database client tool

    1 project | news.ycombinator.com | 18 Jan 2024
  • How can I scrape every .sensorpanel attachment from this thread?

    1 project | /r/DataHoarder | 5 Dec 2023
  • Jailer, a unique open-source database tool

    1 project | news.ycombinator.com | 6 Sep 2023

Index

What are some of the best open-source Extract projects? This list will help you:

Rank  Project                   Stars
1     video-subtitle-extractor  4,943
2     SwiftSoup                 4,341
3     archiver                  4,244
4     camelot                   3,553
5     pdfsam                    3,121
6     Jailer                    2,715
7     UtinyRipper               2,703
8     Python                    2,069
9     dlt                       1,792
10    excalibur                 1,483
11    vscode-glean              1,455
12    article-extractor         1,416
13    download                  1,273
14    lessmsi                   1,221
15    extrakto                  814
16    extract-xiso              607
17    datashare                 549
18    decompress                409
19    extract-loader            316
20    color.js                  278
21    libarchivejs              277
22    MFT_Browser               280
23    sling-cli                 273
