Top 23 Python Text processing Projects
-
I did read this ... Pydantic Docs.
-
Project mention: Need help solving a subtitles problem. The logic seems complex | reddit.com/r/learnpython | 2023-01-19
Do fuzzy matching (something like fuzzywuzzy, maybe) to see if the words line up (allowing for wrong words). You'll need to figure out how to use the match scores to judge how well aligned the two lists are.
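A minimal sketch of that scoring idea, using the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy (which is not used here). The function names `word_similarity` and `alignment_score` are hypothetical, and comparing the two lists position by position is an assumption about how the subtitles line up.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two words, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def alignment_score(expected: list[str], heard: list[str]) -> float:
    """Average the per-position similarity of two word lists.

    Positions missing from the shorter list count as 0, so a length
    mismatch lowers the score.
    """
    longest = max(len(expected), len(heard))
    if longest == 0:
        return 1.0  # two empty lists are trivially aligned
    total = sum(word_similarity(a, b) for a, b in zip(expected, heard))
    return total / longest
```

A score near 1.0 suggests the lists line up with only minor word errors; a low score suggests the lists are offset or unrelated, in which case shifting one list and rescoring may be worth trying.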
-
diff-match-patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Project mention: Form editing, changelogs, and progressive diffing - am I reinventing the wheel? | reddit.com/r/AskProgramming | 2022-08-06
Outside of that, to get the diffs there is a library called diff-match-patch that has implementations in most languages. Your data model / state tracking sounds like it matches the internal constraints.
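To illustrate what "getting the diffs" for a changelog can look like, here is a small sketch built on the standard library's `difflib` rather than diff-match-patch itself. The helper name `text_changes` is hypothetical, and character-level diffing is an assumption; a real form-editing changelog might diff by word or by field instead.

```python
from difflib import SequenceMatcher


def text_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """List the edits that turn `old` into `new`.

    Each entry is (operation, old_text, new_text), where operation is
    "insert", "delete", or "replace".
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # skip unchanged runs, keep only real edits
            changes.append((op, old[i1:i2], new[j1:j2]))
    return changes
```

Storing only the non-equal operations keeps each changelog entry small, at the cost of needing the previous version of the text to replay them.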
-
-
Lark
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
Project mention: can you create your own program language in python, if yes how? | reddit.com/r/Python | 2023-03-12
Lark is a good library to assist with this.
-
Project mention: 7 Useful Python Libraries You Should Use in Your Next Project | reddit.com/r/Python | 2022-11-23
ftfy
-
Validating a phone number using phonenumbers
-
Gotcha. Since we haven't actually written all of this yet, I don't have any useful code snippets to share, but we've discussed tackling the problem internally using something like sqlparse. You'd need to identify the relevant SQL chunks, parse them for table dependency information, and then create the relevant entities in whichever data lineage tool you were using.
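The "parse for table dependency information" step might look roughly like the sketch below. It is a deliberately naive regular-expression version, not sqlparse, and it assumes simple statements separated by semicolons; it will misread comments, string literals, subqueries, and `DELETE FROM`. The name `table_dependencies` is hypothetical.

```python
import re

# Matches the identifier that follows FROM, JOIN, or INTO
# (optionally schema-qualified, e.g. analytics.orders).
_TABLE_REF = re.compile(r"\b(from|join|into)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_dependencies(sql: str) -> dict[str, set[str]]:
    """Map each written table to the tables it reads from, per statement."""
    lineage: dict[str, set[str]] = {}
    for statement in sql.split(";"):
        targets, sources = set(), set()
        for keyword, name in _TABLE_REF.findall(statement):
            # INTO marks a table being written; FROM/JOIN mark tables being read.
            (targets if keyword.lower() == "into" else sources).add(name)
        for target in targets:
            lineage.setdefault(target, set()).update(sources)
    return lineage
```

Each key/value pair in the result corresponds to the edges you would then create in the data lineage tool. A proper SQL parser is the safer choice for anything beyond simple `INSERT ... SELECT` statements.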
-
TextDistance
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Project mention: textdistance: Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. | reddit.com/r/coding | 2022-08-04
-
Project mention: Why is the grammar that I defined does not use tokens? (LEX/YACC/python) | reddit.com/r/AskProgramming | 2022-05-20
You can find the yacc and lex files here: https://github.com/dabeaz/ply/tree/master/ply
-
-
I use shortuuid[0] for that stuff, which also omits the capital letter I, and has some other niceties (I wrote the library). It works really well, and I like how small the IDs are.
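The general idea behind short IDs like these can be sketched with the standard library alone: take a UUID's 128-bit integer and re-encode it in a larger alphabet that leaves out look-alike characters. This is not shortuuid's code; the alphabet below and the names `encode_uuid` and `decode_uuid` are assumptions made for illustration.

```python
import uuid

# Assumed alphabet: digits and letters minus look-alikes (0, 1, I, O, l).
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_uuid(value: uuid.UUID, alphabet: str = ALPHABET) -> str:
    """Re-encode a UUID's integer value using the given alphabet."""
    number = value.int
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Digits were collected least-significant first; zero encodes as one symbol.
    return "".join(reversed(digits)) or alphabet[0]


def decode_uuid(text: str, alphabet: str = ALPHABET) -> uuid.UUID:
    """Invert encode_uuid and rebuild the original UUID."""
    base = len(alphabet)
    number = 0
    for char in text:
        number = number * base + alphabet.index(char)
    return uuid.UUID(int=number)
```

With a 57-character alphabet, a 128-bit UUID fits in at most 22 characters, compared with 36 for the usual hyphenated hex form.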
-
-
Look into "parser combinators" for building an interpreter. There are a few out there, but PyParsing is one I've seen around that looks pretty nifty.
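For readers new to the term, here is a tiny stdlib-only sketch of the parser combinator idea. It is not PyParsing's API. Each parser is assumed to be a function that takes `(text, pos)` and returns `(value, new_pos)` on success or `None` on failure, and all names (`char_in`, `many1`, `seq`, `fmap`) are hypothetical.

```python
from typing import Any, Callable, Optional, Tuple

Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def char_in(chars: str) -> Parser:
    """Match a single character that appears in `chars`."""
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return parse


def many1(parser: Parser) -> Parser:
    """Apply `parser` one or more times and collect the values in a list."""
    def parse(text, pos):
        values = []
        result = parser(text, pos)
        while result is not None:
            value, pos = result
            values.append(value)
            result = parser(text, pos)
        return (values, pos) if values else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Apply parsers in order; fail as soon as one of them fails."""
    def parse(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def fmap(parser: Parser, func: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by a successful parse."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return func(value), new_pos
    return parse


# Small parsers combined into a larger one that evaluates "<int>+<int>".
number = fmap(many1(char_in("0123456789")), lambda ds: int("".join(ds)))
addition = fmap(seq(number, char_in("+"), number), lambda parts: parts[0] + parts[2])

print(addition("12+30", 0))  # (42, 5)
```

The appeal for interpreter writing is that the grammar and the evaluation live side by side: each rule is an ordinary value that can be reused and composed into bigger rules.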
-
python-user-agents
A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.
-
-
Sorry, just noticed you said you went to GitHub. The way I found it was by going to the intro page (I don't really consider what you're quoting to be docs), then going to GitHub and exploring that. You yourself slightly missed it on GitHub; it is in the __init__.py file: https://github.com/pwaller/pyfiglet/blob/master/pyfiglet/__init__.py . There's also another docs file in that project, but that is mostly for developers who are working on the library itself, not users.
-
Construct
Construct: Declarative data structures for python that allow symmetric parsing and building
-
-
Project mention: Least expensive way to find a partial match in database query | reddit.com/r/django | 2022-08-08
Something like https://github.com/derek73/python-nameparser ?
-
-
-
Python Text processing related posts
- data structures & algorithms resources available with python ?
- can you create your own program language in python, if yes how?
- what is colon (:) operator?
- Data Load Diagram
- Lark a Python lexer/parser library
- Ask HN: Will we see a TypeScript for Python?
- Create your own scripting language in Python with Sly
A note from our sponsor - InfluxDB
www.influxdata.com | 26 Mar 2023
Index
What are some of the best open-source Text processing projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | pydantic | 12,974 |
2 | fuzzywuzzy | 8,820 |
3 | diff-match-patch | 6,089 |
4 | Chinese character to pinyin conversion tool (Python version) | 4,252 |
5 | Lark | 3,654 |
6 | ftfy | 3,465 |
7 | phonenumbers | 3,194 |
8 | sqlparse | 3,180 |
9 | TextDistance | 3,077 |
10 | PLY | 2,446 |
11 | chardet | 1,870 |
12 | shortuuid | 1,828 |
13 | jellyfish | 1,809 |
14 | pyparsing | 1,789 |
15 | python-user-agents | 1,329 |
16 | python-slugify | 1,316 |
17 | pyfiglet | 1,123 |
18 | Construct | 804 |
19 | xpinyin | 785 |
20 | python-nameparser | 583 |
21 | awesome-slugify | 471 |
22 | Charset Normalizer | 362 |
23 | unicode-slugify | 315 |