Python Parsing

Open-source Python projects categorized as Parsing

Top 18 Python Parsing Projects

  • pydantic

    Data parsing and validation using Python type hints

    Project mention: Strict Python Function Parameters | news.ycombinator.com | 2022-01-23

    Slightly off-topic, but everyone writing modern Python should be familiar with Pydantic and similar libraries that use type hints for validation and parsing:

    https://pydantic-docs.helpmanual.io/

    https://fastapi.tiangolo.com/

    https://github.com/tiangolo/typer

    We're using Pydantic for Robusta (https://github.com/robusta-dev/robusta) and absolutely love it. You get the best of traditional Python (rapid prototyping and no boilerplate) while still being able to scale your codebase and keep it maintainable. Robusta is the first large project I've written in Python where I'm not encountering type errors at runtime left and right.
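    To make the idea in the comment above concrete, here is a minimal standard-library sketch of validating and parsing input from type hints. It is an illustration of the general technique only, not Pydantic's implementation or API; the `Alert` model and its fields are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Alert:
    # Hypothetical model for illustration; not taken from Robusta or Pydantic.
    name: str
    severity: int

    def __post_init__(self):
        hints = get_type_hints(type(self))
        for field in fields(self):
            expected = hints[field.name]
            value = getattr(self, field.name)
            if isinstance(value, expected):
                continue  # already the annotated type
            try:
                # Parse: coerce compatible input (e.g. "3" -> 3).
                setattr(self, field.name, expected(value))
            except (TypeError, ValueError) as exc:
                # Validate: reject input that cannot be coerced.
                raise TypeError(
                    f"{field.name}: expected {expected.__name__}, got {value!r}"
                ) from exc


alert = Alert(name="disk-full", severity="3")
print(alert)  # severity is now the int 3
```

    This sketch only handles plain classes such as `int` and `str`; libraries like Pydantic also cover nested models, generics, and detailed error reporting.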

  • Maya

    Datetimes for Humans™

  • dateutil

    Useful extensions to the standard Python datetime features

    Project mention: [python] dateutil exception handling making zero sense | reddit.com/r/learnprogramming | 2021-06-10

    The parser module gives direct access to the exception class, so it can be referenced as parser.ParserError.
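    A short sketch of catching that exception, assuming python-dateutil 2.8.1 or later is installed (the release that added `ParserError`). The helper name `parse_or_none` is hypothetical.

```python
from dateutil import parser


def parse_or_none(text):
    """Return a datetime, or None if dateutil cannot parse the text."""
    try:
        return parser.parse(text)
    except parser.ParserError:
        # ParserError subclasses ValueError, so `except ValueError` also works.
        return None


print(parse_or_none("2021-06-10"))
print(parse_or_none("not a date"))
```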

  • pyparsing

    Python library for creating PEG parsers

    Project mention: Parser Combinators in Haskell | news.ycombinator.com | 2021-12-22

    Since it is not mentioned in the article: Python users may also want to check out pyparsing [0]. It is slightly different from Parsec/FParsec (for instance, it ignores all whitespace by default), but I think it is a really good project.

    [0]: https://github.com/pyparsing/pyparsing/
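    The whitespace difference mentioned in the comment can be sketched with a tiny standard-library combinator. This is not pyparsing's or Parsec's API; `token`, `sequence`, and the `skip_whitespace` flag are hypothetical names used only to contrast the two defaults.

```python
import re

WHITESPACE = re.compile(r"\s*")


def token(pattern, skip_whitespace=True):
    """Build a parser that matches `pattern` at a position in the text.

    With skip_whitespace=True, leading whitespace is consumed first
    (the pyparsing-like default in this sketch); with False, the match
    must start exactly at the given position (Parsec-like).
    """
    regex = re.compile(pattern)

    def parse(text, pos=0):
        if skip_whitespace:
            pos = WHITESPACE.match(text, pos).end()
        match = regex.match(text, pos)
        if match is None:
            return None
        return match.group(), match.end()

    return parse


def sequence(*parsers):
    """Run parsers one after another, collecting their results."""

    def parse(text, pos=0):
        results = []
        for step in parsers:
            outcome = step(text, pos)
            if outcome is None:
                return None
            value, pos = outcome
            results.append(value)
        return results, pos

    return parse


patterns = (r"\w+", r"=", r"\d+")
lenient = sequence(*(token(p) for p in patterns))
strict = sequence(*(token(p, skip_whitespace=False) for p in patterns))

print(lenient("x = 42"))  # spaces between tokens are skipped
print(strict("x = 42"))   # fails: the space is not consumed
print(strict("x=42"))     # succeeds without any whitespace
```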

  • plaso

    Super timeline all the things

    Project mention: Solving a child porn case (student environment) | reddit.com/r/computerforensics | 2021-10-23

    My advice would be to go through a timeline to establish the activity before and after these files "appeared". This can be done in log2timeline/plaso: it can parse the raw image (or E01, or whatever you have), build a timeline, and sort it. Also look for LNK files and shellbags to see whether the files were opened, used, etc.

  • maigret

    🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

    Project mention: How would I go about removing as much information as you can from the internet? | reddit.com/r/opsec | 2021-07-30

    For finding accounts that are not visible in Google search I recommend running https://github.com/soxoj/maigret

  • FormatFuzzer

    FormatFuzzer is a framework for high-efficiency, high-quality generation and parsing of binary inputs.

    Project mention: FormatFuzzer: A framework for efficient and quality generation of binary inputs | news.ycombinator.com | 2021-10-31

  • funcparserlib

    Recursive descent parsing library for Python based on functional combinators

    Project mention: Zig, Parser Combinators – and Why They're | news.ycombinator.com | 2021-03-10

  • wikitextparser

    A simple WikiText parsing library for MediaWiki

    Project mention: Updated: I've saved all of Wikipedia into a SQLITE database! | reddit.com/r/DataHoarder | 2021-04-06

    The use of regex seems inefficient. Is there any reason why you didn't start with lxml or a purpose-built parser like wikitextparser?

  • py-pdf-parser

    A Python tool to help extract information from structured PDFs.

    Project mention: Extract text from PDF | reddit.com/r/Python | 2021-11-02

    I'd recommend trying py-pdf-parser [0]. It allows you to fetch data from documents based on text "markers". E.g. you can easily find data located to the right of the "EMAIL FROM:" text.

    [0]: https://github.com/jstockwin/py-pdf-parser

  • yacv

    Yet Another Compiler Visualizer

    Project mention: I made a parser visualizer using manim | news.ycombinator.com | 2021-03-07

    The README file will link you to the demo video on YouTube [1]. There is also a working example section [2] on the landing page of the documentation, which shows the visualizations yacv can produce given a CFG and a string.

    [1]: https://www.youtube.com/watch?v=BozB0O0__Qg

    [2]: https://ashutoshbsathe.github.io/yacv/#working-example

  • OpenSIEM-Logstash-Parsing

    SIEM Logstash parsing for more than a hundred technologies

    Project mention: The Cargill SIEM team has published this new project with a collection of logstash parser configs developed in house for multiple technologies. Logstash parsers are usually scattered around in gists and repos but this is a very comprehensive library in a single project! | reddit.com/r/logstash | 2021-03-25

  • tree-hugger

    A light-weight, extendable, high level, universal code parser built on top of tree-sitter

    Project mention: Tree Sitter and the Complications of Parsing Languages | news.ycombinator.com | 2021-11-24

    tree-sitter is a great framework. I have used it quite a bit in the past. I even created a small library on top of it, called tree-hugger (https://github.com/autosoft-dev/tree-hugger). Really enjoyed their playground as well.

  • arxiv-miner

    arxiv_miner is a toolkit for mining research papers on CS ArXiv.

    Project mention: ArXiv_miner: A toolkit for mining research papers on CS ArXiv | reddit.com/r/CKsTechNews | 2021-05-29

  • python-hslog

    Python module to parse Hearthstone Power.log files

    Project mention: Full Battle Simulator | reddit.com/r/BobsTavern | 2021-09-13

    If you want to stick with Python, you can have a look at https://github.com/HearthSim/python-hslog (not owned by me, by the way). It matches the internal game logs pretty closely, so you'll probably have to first understand how these work.

  • dataconf

    Simple dataclasses configuration management for Python with hocon/json/yaml/properties/env-vars/dict support.

    Project mention: Show HN: Dataconf, Python dataclasses config (hocon/JSON/YAML/env-vars/dict) | news.ycombinator.com | 2021-11-07

  • tokenstream

    A versatile token stream for handwritten parsers.

    Project mention: vberlier/tokenstream: A versatile token stream for handwritten parsers | reddit.com/r/Python | 2021-06-17

    The repo has examples with some of the generated error messages. https://github.com/vberlier/tokenstream/blob/main/examples/json.py

  • TernaryTerminator

    Changes nearby conditionals into one-line ternary statements

    Project mention: no comment | reddit.com/r/ProgrammerHumor | 2022-01-19

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-23.

Index

What are some of the best open-source Parsing projects in Python? This list will help you:

 #  Project                     Stars
 1  pydantic                    8,781
 2  Maya                        3,295
 3  dateutil                    1,726
 4  pyparsing                   1,370
 5  plaso                       1,207
 6  maigret                       906
 7  FormatFuzzer                  286
 8  funcparserlib                 284
 9  wikitextparser                181
10  py-pdf-parser                 173
11  yacv                          119
12  OpenSIEM-Logstash-Parsing     109
13  tree-hugger                    88
14  arxiv-miner                    87
15  python-hslog                   37
16  dataconf                       34
17  tokenstream                     6
18  TernaryTerminator               0