Python Parser

Open-source Python projects categorized as Parser

Top 23 Python Parser Projects

  • pydantic

    Data parsing and validation using Python type hints

    Project mention: Strict Python Function Parameters | 2022-01-23

    Slightly off-topic, but everyone writing modern Python should be familiar with Pydantic and similar libraries that use type hints for validation and parsing:

    We're using Pydantic for Robusta and absolutely love it. You get the best of traditional Python (rapid prototyping and no boilerplate) while still being able to scale your codebase and keep it maintainable. Robusta is the first large project I've written in Python where I'm not encountering type errors at runtime left and right.

  • pdfminer.six

    Community maintained fork of pdfminer - we fathom PDF

    Project mention: How should I go about extracting a dataframe from a PDF? | 2021-11-09

  • Lark

    Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

    Project mention: Made a Programing language using python | 2021-11-29

    There's also lark, which is used by a plethora of projects (I haven't used it, but I heard about PreQL on a podcast where they talk for a bit about what it's like to develop a new language in lark)

  • phonenumbers

    Python port of Google's libphonenumber

    Project mention: Does anyone know where I can find official docs for python-phonenumbers package? | 2022-01-12

    This is the GitHub repo for the package.

  • sqlparse

    A non-validating SQL parser module for Python

    Project mention: Open Source SQL Parsers | 2021-10-08

    Regular expressions are a popular way to extract information from SQL statements. However, they quickly become too complex to handle common features like WITH, sub-queries, window clauses, aliases and quotes. sqlparse is a popular Python package that uses regular expressions to parse SQL.
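
To illustrate why quotes alone already trip up a regular-expression approach, here is a minimal sketch using only Python's standard `re` module. The pattern and the helper name `naive_tables` are hypothetical and chosen for illustration; this is not how sqlparse itself works.

```python
import re

# Hypothetical naive pattern: capture the identifier that follows FROM.
NAIVE_FROM = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)


def naive_tables(sql):
    """Return every identifier that directly follows the keyword FROM."""
    return NAIVE_FROM.findall(sql)


simple = "SELECT id FROM orders"
quoted = "SELECT 'copied from users' AS note FROM orders"

print(naive_tables(simple))  # ['orders']
# The pattern cannot tell a string literal from SQL syntax, so the word
# "users" inside the quoted text is reported as if it were a table.
print(naive_tables(quoted))  # ['users', 'orders']
```

Handling the quoted case correctly means tracking whether the scanner is inside a string literal, which is the kind of state a tokenizer keeps and a single pattern does not.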

  • PLY

    Python Lex-Yacc

  • rdflib

    RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

    Project mention: RDFLib equivalent in JavaScript? | 2021-09-27

    I was wondering if there was an equivalent to RDFLib for Javascript.


  • m3u8

    Python m3u8 Parser for HTTP Live Streaming (HLS) Transmissions

    Project mention: check if m3u8 link is active | 2021-03-12
  • snoop

    Snoop: an open-source intelligence reconnaissance tool (OSINT world) (by snooppr)

    Project mention: FOSS News International #2: November 8-14, 2021 | 2021-11-15

    Snoop 1.3.1

  • python-user-agents

    A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

  • strictyaml

    Type-safe YAML parser and validator.

    Project mention: Typed Config Languages | 2022-01-21

    I like the approach of strictyaml: a parser that concentrates on a restricted subset of YAML and lets you use a schema to get a type-safe validator.

  • imdbpy

    IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies

    Project mention: [OC]IMDB Top 30 movies: cast death rate | 2022-01-17
  • ViperMonkey

    A VBA parser and emulation engine to analyze malicious macros.

    Project mention: De-obfuscation | 2021-06-02
  • Construct

    Construct: Declarative data structures for python that allow symmetric parsing and building

    Project mention: Binary serialization library for at least C++17? | 2021-10-10

    I myself am looking for a binary serializer/deserializer that's like construct in python or construct-js, but obviously I wouldn't need some of the types that they have, since C++ already has them.

  • guessit

    GuessIt is a python library that extracts as much information as possible from a video filename.

    Project mention: Small but fast open directory with movies, and Zappa documentary (personal interest) | 2021-01-31

    Extract movie details (title, year) from the filename with guessit

  • textX

    Domain-Specific Languages and parsers in Python made easy

  • python-nameparser

    A simple Python module for parsing human names into their individual components

  • mwparserfromhell

    A Python parser for MediaWiki wikicode

    Project mention: [Python] How can I clean up Wikipedia's XML backup dump to create dictionaries of commonly used words for multiple languages? | 2021-10-12

    In particular, what you're looking at is not XML but wikitext. I found a discussion on Stack Overflow about solving the same problem of getting text from wikitext. Since you already have the dump, the most promising solution in Python seems to be running each page through mwparserfromhell. According to the top Stack Overflow answer, you could use something like

  • astroid

    A common base representation of python source code for pylint and other projects (by PyCQA)

    Project mention: Klara: Python automatic test generations and static analysis library | 2021-09-13

    It also provides inference for static analysis purposes, similar to astroid, with SMT support. E.g.

  • bad_json_parsers

    Exposing problems in json parsers of several programming languages.

    Project mention: Parsing JSON is a Minefield 💣 (2018) | 2021-10-11

    The nginx default is 1MB, which gets you 512 uncompressed nested arrays. That's already beyond the nesting limit of many parsers (see the Results section of that repository README, which documents the limit on many different language libraries).
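
The nesting limit described above can be probed for Python's standard-library `json` module with a small sketch. The helper name `parses_nested_arrays` and the two depths are chosen here for illustration; the exact depth at which parsing starts to fail depends on the interpreter and its recursion limit, so it is not hard-coded.

```python
import json


def parses_nested_arrays(depth):
    """Return True if json.loads accepts `depth` nested empty arrays."""
    # Each nesting level costs two bytes: one "[" and one "]".
    payload = "[" * depth + "]" * depth
    try:
        json.loads(payload)
    except RecursionError:
        # The decoder recurses once per nesting level and gives up
        # when the interpreter's recursion limit is reached.
        return False
    return True


print(parses_nested_arrays(10))       # shallow input parses fine
print(parses_nested_arrays(200_000))  # a 400 kB payload, far too deep
```

On CPython the first call succeeds and the second is rejected with a `RecursionError`, even though the payload is well under 1 MB.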

  • Pyverilog

    Python-based Hardware Design Processing Toolkit for Verilog HDL

    Project mention: How to compare HDL simulation/implementation results to Matlab? | 2021-06-01


  • whispers

    Identify hardcoded secrets in static structured text (by Skyscanner)

    Project mention: Skyscanner/whispers - Identify hardcoded secrets and dangerous behaviours | 2021-10-07
  • wiktextract

    Wiktionary dump file parser and multilingual data extractor

    Project mention: What are some of the best digital free dictionaries available online (even for commercial use)? | 2022-01-02

    Many parsers are available.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-23.

What are some of the best open-source Parser projects in Python? This list will help you:

Rank  Project             Stars
1     pydantic            8,781
2     pdfminer.six        3,325
3     Lark                2,969
4     phonenumbers        2,916
5     sqlparse            2,696
6     PLY                 2,090
7     rdflib              1,606
8     m3u8                1,314
9     snoop               1,261
10    python-user-agents  1,228
11    strictyaml            969
12    imdbpy                926
13    ViperMonkey           824
14    Construct             705
15    guessit               683
16    textX                 564
17    python-nameparser     516
18    mwparserfromhell      496
19    astroid               374
20    bad_json_parsers      360
21    Pyverilog             344
22    whispers              328
23    wiktextract           295