Python Text processing

Open-source Python projects categorized as Text processing

Top 23 Python Text processing Projects

  • pydantic

    Data validation using Python type hints

    Project mention: popularity behind pydantic | 2023-03-24

    I did read this ... Pydantic Docs.

  • fuzzywuzzy

    Fuzzy String Matching in Python

    Project mention: Need help solving a subtitles problem. The logic seems complex | 2023-01-19

    Do fuzzy matching (something like fuzzywuzzy maybe) to see if the words line up (allowing for wrong words). You'll need to work out how to use scoring to judge how well aligned the two lists are.
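
The scoring idea in the comment above can be sketched with the standard library alone. This is a minimal illustration, not fuzzywuzzy's API: it swaps in `difflib.SequenceMatcher` for the fuzzy comparison, and the `alignment_score` rule (pairing words by position and averaging) is a hypothetical choice made only for this example.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two words, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def alignment_score(expected: list[str], heard: list[str]) -> float:
    """Average similarity of words compared position by position.

    Hypothetical scoring rule: words are paired by index, and unpaired
    words in the longer list contribute a similarity of 0.
    """
    longest = max(len(expected), len(heard))
    if longest == 0:
        return 1.0
    total = sum(word_similarity(a, b) for a, b in zip(expected, heard))
    return total / longest


script = "the quick brown fox jumps".split()
subtitle = "the quik brown fix jumps".split()        # a few wrong words
unrelated = "completely different words appear here".split()

print(round(alignment_score(script, subtitle), 2))
print(round(alignment_score(script, unrelated), 2))
```

A near-match should score close to 1 and an unrelated line much lower, so a threshold between the two could decide whether two lists "line up". Pairing by index breaks down when words are inserted or dropped; a real solution would probably need a proper sequence alignment.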

  • diff-match-patch

    Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

    Project mention: Form editing, changelogs, and progressive diffing - am I reinventing the wheel? | 2022-08-06

    Outside of that, to get the diffs there is a library called diff-match-patch that has implementations in most languages. Your data model / state tracking sounds like it matches the internal constraints.

  • 汉字拼音转换工具(Python 版)

    Hanzi-to-pinyin conversion tool (Python version)

  • Lark

    Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

    Project mention: can you create your own program language in python, if yes how? | 2023-03-12

    Lark is a good library to assist with this.

  • ftfy

    Fixes mojibake and other glitches in Unicode text, after the fact.

    Project mention: 7 Useful Python Libraries You Should Use in Your Next Project | 2022-11-23
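
To make "mojibake" concrete, here is a small standard-library sketch of how one common form of it arises and how that single case can be reversed. It does not use ftfy and makes no claim about how ftfy works internally; the helper name `undo_latin1_mojibake` is hypothetical and covers only UTF-8 text that was wrongly decoded as Latin-1.

```python
original = "café"

# Simulate the glitch: UTF-8 bytes wrongly decoded as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")


def undo_latin1_mojibake(text: str) -> str:
    """Reverse a UTF-8-read-as-Latin-1 mix-up; return text unchanged otherwise."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8, so this particular mix-up did not happen.
        return text


print(garbled)
print(undo_latin1_mojibake(garbled))
```

Real-world text can be garbled through other encodings or several times over, which is the harder problem a dedicated library is meant to handle.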


  • phonenumbers

    Python port of Google's libphonenumber

    Project mention: Python: Data validation | 2023-01-20

    Validating a phone number using phonenumbers

  • sqlparse

    A non-validating SQL parser module for Python

    Project mention: Data Load Diagram | 2023-03-06

    Gotcha. Since we haven't actually written all of this yet, I don't have any useful code snippets to share, but we've discussed tackling the problem internally using something like sqlparse. You'd need to identify the relevant SQL chunks, parse them for table dependency information, and then create the relevant entities in whichever data lineage tool you are using.
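
The "parse for table dependency information" step described above might look roughly like the following. This is a deliberately naive sketch that uses regular expressions from the standard library instead of sqlparse, so it will miss subtleties a real SQL parser handles (quoted identifiers, CTEs, semicolons inside string literals). The function name `table_dependencies` and the sample schema are invented for illustration.

```python
import re

# Naive patterns: a table name is whatever identifier follows the keyword.
SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
TARGET = re.compile(r"\bINSERT\s+INTO\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_dependencies(sql: str) -> dict[str, set[str]]:
    """Map each INSERT target table to the tables its statement reads from."""
    deps: dict[str, set[str]] = {}
    for statement in sql.split(";"):  # assumption: no ';' inside literals
        target = TARGET.search(statement)
        if target is None:
            continue
        sources = {m.group(1).lower() for m in SOURCE.finditer(statement)}
        deps.setdefault(target.group(1).lower(), set()).update(sources)
    return deps


etl = """
INSERT INTO reporting.daily_sales
SELECT o.day, SUM(o.amount)
FROM staging.orders o
JOIN staging.customers c ON c.id = o.customer_id
GROUP BY o.day;

INSERT INTO reporting.top_customers
SELECT customer_id FROM reporting.daily_sales;
"""

for table, sources in table_dependencies(etl).items():
    print(table, "<-", sorted(sources))
```

The resulting mapping is the kind of edge list a lineage tool could ingest: each target table points back at the tables it was built from.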

  • TextDistance

    Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

    Project mention: textdistance: Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. | 2022-08-04

  • PLY

    Python Lex-Yacc

    Project mention: Why is the grammar that I defined does not use tokens? (LEX/YACC/python) | 2022-05-20

    you can find yacc and lex files here

  • chardet

    Python character encoding detector

    Project mention: After almost a year, Ben Eater is back | 2022-11-05

  • shortuuid

    A generator library for concise, unambiguous and URL-safe UUIDs.

    Project mention: Short, friendly base32 slugs from timestamps | 2023-01-18

    I use shortuuid[0] for that stuff, which also omits the capital letter I, and has some other niceties (I wrote the library). It works really well, and I like how small the IDs are.
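
The general idea of a short, unambiguous ID can be sketched as re-encoding a UUID's integer value in a larger alphabet that leaves out look-alike characters. The code below is an illustration in plain Python, not shortuuid's implementation or API; the `ALPHABET` constant and the `encode`/`decode` helpers are assumptions made for this example.

```python
import uuid

# Hypothetical alphabet: digits and letters without the look-alikes
# 0, 1, I, O and l. Every character is URL-safe.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode(value: uuid.UUID, alphabet: str = ALPHABET) -> str:
    """Write the UUID's 128-bit integer in base len(alphabet)."""
    number = value.int
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits)) or alphabet[0]


def decode(text: str, alphabet: str = ALPHABET) -> uuid.UUID:
    """Invert encode(): rebuild the integer, then the UUID."""
    base = len(alphabet)
    number = 0
    for char in text:
        number = number * base + alphabet.index(char)
    return uuid.UUID(int=number)


full = uuid.uuid4()
short = encode(full)
print(full, "->", short)
```

Because the encoding is reversible, the short form can be stored or shown to users while the full UUID is recovered with `decode` when needed.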


  • jellyfish

    🪼 a python library for doing approximate and phonetic matching of strings.

  • pyparsing

    Python library for creating PEG parsers

    Project mention: Need help developing an interpreter | 2023-03-07

    Look into "parser combinators" for building an interpreter. There are a few out there, but PyParsing is one I've seen around that looks pretty nifty.
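
For readers unfamiliar with the term, a parser combinator is a function that builds a bigger parser out of smaller ones. The sketch below shows the idea in plain Python for a toy grammar of integer sums. It is not PyParsing's API; every name in it (`token`, `seq`, `many`, `fmap`, `evaluate`) is made up for this example.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position),
# or None when it does not match.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str) -> Parser:
    """Match a regex at the current position, skipping leading whitespace."""
    compiled = re.compile(r"\s*(" + pattern + ")")

    def parse(text: str, pos: int):
        match = compiled.match(text, pos)
        return (match.group(1), match.end()) if match else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def many(parser: Parser) -> Parser:
    """Apply a parser zero or more times and collect the results."""
    def parse(text: str, pos: int):
        values = []
        while (result := parser(text, pos)) is not None:
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def fmap(parser: Parser, func: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by a successful parse."""
    def parse(text: str, pos: int):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return func(value), pos

    return parse


# Grammar: total := number ("+" number)*
number = fmap(token(r"\d+"), int)
plus = token(r"\+")
total = fmap(
    seq(number, many(seq(plus, number))),
    lambda parts: parts[0] + sum(n for _, n in parts[1]),
)


def evaluate(text: str) -> int:
    """Parse and evaluate a sum, rejecting trailing input."""
    result = total(text, 0)
    if result is None or text[result[1]:].strip():
        raise ValueError(f"cannot parse: {text!r}")
    return result[0]


print(evaluate("1 + 2 + 39"))
```

An interpreter built this way grows by adding more small parsers (identifiers, parentheses, operators) and combining them, which is the appeal the comment above points to.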

  • python-user-agents

    A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

  • python-slugify

    Returns unicode slugs

  • pyfiglet

    An implementation of figlet written in Python

    Project mention: CS50P WEEK 4 FRANK, IAN AND GLENS LETTERS PSET. | 2022-11-17

    Sorry, just noticed you said you went to GitHub. The way I found it was by going to the intro page (I don't really consider what you're quoting to be docs), then going to GitHub and exploring that. You yourself slightly missed it on GitHub; it is in the file: . There's also another docs file in that project, but that is mostly for developers who are working on the library itself, not users.

  • Construct

    Construct: Declarative data structures for python that allow symmetric parsing and building

    Project mention: MPK mini MK3 not working | 2023-01-18

  • xpinyin

    Translate Chinese hanzi to pinyin (拼音) in Python

  • python-nameparser

    A simple Python module for parsing human names into their individual components

    Project mention: Least expensive way to find a partial match in database query | 2022-08-08

    Something like ?

  • awesome-slugify

    Python flexible slugify function

  • Charset Normalizer

    Truly universal encoding detector in pure Python

  • unicode-slugify

    A slugifier that works in unicode

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-03-24.

What are some of the best open-source Text processing projects in Python? This list will help you:

Rank Project Stars
1 pydantic 12,974
2 fuzzywuzzy 8,820
3 diff-match-patch 6,089
4 汉字拼音转换工具(Python 版) 4,252
5 Lark 3,654
6 ftfy 3,465
7 phonenumbers 3,194
8 sqlparse 3,180
9 TextDistance 3,077
10 PLY 2,446
11 chardet 1,870
12 shortuuid 1,828
13 jellyfish 1,809
14 pyparsing 1,789
15 python-user-agents 1,329
16 python-slugify 1,316
17 pyfiglet 1,123
18 Construct 804
19 xpinyin 785
20 python-nameparser 583
21 awesome-slugify 471
22 Charset Normalizer 362
23 unicode-slugify 315