Python text-extraction

Open-source Python projects categorized as text-extraction

Top 3 Python text-extraction Projects

  • sumy

    Module for automatic summarization of text documents and HTML pages.

  • tika-python

    Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

    Project mention: Document Parsing - an unsolved problem? | reddit.com/r/LanguageTechnology | 2022-07-19

    At my previous job we had the same problem which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding.

  • trafilatura

    Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

    Project mention: Testing fast installation in tear-down environment | reddit.com/r/learnpython | 2022-07-06

    I want to test how easy it is to install a package plus special extra dependencies to run a certain script in that package: https://github.com/adbar/trafilatura
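
The install test described in the mention above could be sketched roughly as follows, using only the standard library. This is a minimal sketch, not the poster's method: the helper names `pip_install_command` and `try_install` are hypothetical, and any extras name passed in (for example `trafilatura[all]`) is an assumption to check against the project's own documentation.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def pip_install_command(env_dir, requirement):
    """Build the pip command that installs `requirement` inside the venv at env_dir.

    Hypothetical helper: it only assembles the argument list and runs nothing.
    """
    on_windows = sys.platform == "win32"
    bin_dir = "Scripts" if on_windows else "bin"
    exe = "python.exe" if on_windows else "python"
    python = Path(env_dir) / bin_dir / exe
    return [str(python), "-m", "pip", "install", requirement]


def try_install(requirement):
    """Create a throwaway virtual environment, install into it, report success.

    The environment lives in a temporary directory that is deleted on exit,
    so nothing is left behind on the machine. Needs network access to run.
    """
    with tempfile.TemporaryDirectory() as env_dir:
        venv.create(env_dir, with_pip=True)  # fresh environment with its own pip
        result = subprocess.run(
            pip_install_command(env_dir, requirement),
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
```

Calling `try_install("trafilatura")` returns `True` when pip exits cleanly in the fresh environment. Passing a requirement with extras in square brackets exercises the optional dependencies as well.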

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-07-19.

Index

What are some of the best open-source text-extraction projects in Python? This list will help you:

#  Project       Stars
1  sumy          2,970
2  tika-python   1,210
3  trafilatura   673