Python Parsing

Open-source Python projects categorized as Parsing

Top 23 Python Parsing Projects

  1. pydantic

    Data validation using Python type hints

    Project mention: Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks | dev.to | 2025-05-05

    Across this five-post series, we’ve journeyed from Pydantic’s basics—type validation and nested models—to advanced integrations with FastAPI, SQLAlchemy, and scalable techniques. You’ve learned how to build declarative, type-safe models, handle complex APIs, and optimize performance. To deepen your knowledge, explore the Pydantic documentation, contribute to the open-source project, or experiment with real-world use cases. Check out our GitHub repo for code samples and a Pydantic cheat sheet. Thank you for joining us—happy coding!

  3. maigret

    🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

    Project mention: Maigret collects a dossier on a person by username only | news.ycombinator.com | 2024-12-11
  4. llmware

    Unified framework for building enterprise RAG pipelines with small, specialized models

    Project mention: How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts & Models | dev.to | 2025-05-14

    Notebook for example 3: prompts and models

  5. llama_cloud_services

    Knowledge Agents and Management in the Cloud

    Project mention: Parsing PDFs (and more) in Elixir using Rust | news.ycombinator.com | 2025-01-29

    For instance, LlamaParse (https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse...) uses LLMs for PDF text extraction, but the problem is hallucination, e.g. https://github.com/run-llama/llama_parse/issues/420

    There is also LLMWhisperer, which preserves the layout (tables, checkboxes, forms) and hence the context. https://pg.llmwhisperer.unstract.com/

  6. Maya

    Datetimes for Humans™

  7. dateutil

    Useful extensions to the standard Python datetime features

  8. pyparsing

    Python library for creating PEG parsers

  10. plaso

    Super timeline all the things

  11. pydantic-core

    Core validation logic for pydantic written in rust

  12. facexlib

    FaceXlib aims at providing ready-to-use face-related functions based on current SOTA open-source methods.

  13. socid-extractor

    ⛏️ Extract accounts info from personal pages on various sites for OSINT purpose

  14. WhatsApp-Chat-Exporter

    A cross-platform tool for parsing WhatsApp chat databases from Android and iOS/iPadOS backups. Supports Android .crypt12, .crypt14, .crypt15, and the latest database formats. Outputs chat history in readable HTML or structured JSON.

  15. FormatFuzzer

    FormatFuzzer is a framework for high-efficiency, high-quality generation and parsing of binary inputs.

  16. py-pdf-parser

    A Python tool to help extract information from structured PDFs.

  17. pytago

    A source-to-source transpiler for Python to Go translation

  18. funcparserlib

    Recursive descent parsing library for Python based on functional combinators

  19. wikitextparser

    A Python library to parse MediaWiki WikiText

  20. OpenSIEM-Logstash-Parsing

    SIEM Logstash parsing for more than a hundred technologies

  21. yacv

    Yet Another Compiler Visualizer

    Project mention: Yacv (Yet Another Compiler Visualizer): LL and LR Parser Animations | news.ycombinator.com | 2024-06-21
  22. parglare

    A pure Python LR/GLR parser - http://www.igordejanovic.net/parglare/

  23. arxiv-miner

    arxiv_miner is a toolkit for mining research papers on CS ArXiv.

  24. tree-hugger

    A light-weight, extendable, high level, universal code parser built on top of tree-sitter

    Project mention: The History of Machine Learning in Trackmania | news.ycombinator.com | 2024-07-03

    > Nienders concluded that this was due to the difference in the information available. Sophy had information about the track curvature of the upcoming 6 seconds of track, based on the current speed. TMRL, however, only had distance measurements from the LIDAR. While the TMRL program could plan for the next turn, it could not plan two turns ahead, and this fundamentally limited the program to mere safe driving, avoiding walls and crashes, but never optimizing.

    I think that point is an important one. ML algorithms work better when they are given better context. Especially in programming, it is clear the models are trained on code rather than on repositories. They know about files and repositories, but I always get the impression that they are totally clueless about whole programs.

    What could be done better for code is to provide more data in training about where each function is located in the project, which other files define or call similar functions, and so on. In general, before code is fed into training, a little data mining could be done on the project, as the tree-hugger project [1] enables. Tree-hugger is somewhat older code, and tree-sitter has advanced a lot in the last 4 years.

    In my opinion, a 5x to 10x improvement in code is within reach, with no need to increase GPU compute or electricity.

    [1] https://github.com/autosoft-dev/tree-hugger

  25. dataconf

    Simple dataclasses configuration management for Python with hocon/json/yaml/properties/env-vars/dict/cli support.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Parsing related posts

  • How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts & Models

    1 project | dev.to | 14 May 2025
  • Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks

    1 project | dev.to | 5 May 2025
  • Replace OCR with Vision Language Models

    7 projects | news.ycombinator.com | 26 Feb 2025
  • Build your next AI Tech Startup with DeepSeek

    6 projects | dev.to | 3 Feb 2025
  • Maigret collects a dossier on a person by username only

    1 project | news.ycombinator.com | 11 Dec 2024
  • Firefox will consider a Rust implementation of JPEG-XL

    2 projects | news.ycombinator.com | 4 Sep 2024
  • Checkbox Extraction from PDFs - A Tutorial

    3 projects | dev.to | 16 Jul 2024

Index

What are some of the best open-source Parsing projects in Python? This list will help you:

# Project Stars
1 pydantic 23,837
2 maigret 15,232
3 llmware 13,227
4 llama_cloud_services 3,964
5 Maya 3,414
6 dateutil 2,455
7 pyparsing 2,319
8 plaso 1,836
9 pydantic-core 1,581
10 facexlib 890
11 socid-extractor 804
12 WhatsApp-Chat-Exporter 730
13 FormatFuzzer 414
14 py-pdf-parser 403
15 pytago 390
16 funcparserlib 351
17 wikitextparser 309
18 OpenSIEM-Logstash-Parsing 184
19 yacv 159
20 parglare 140
21 arxiv-miner 133
22 tree-hugger 126
23 dataconf 81
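
The index rows above follow a simple "rank, project, stars" layout, so they are easy to load into a script. The following is a minimal sketch using only the Python standard library; the `ROWS` sample is a subset copied from the index, and the names `ROW_RE` and `parse_index` are hypothetical helpers introduced here for illustration, not part of any listed project.

```python
import re

# A few rows copied from the index above: rank, project name, star count.
ROWS = """\
1 pydantic 23,837
2 maigret 15,232
12 WhatsApp-Chat-Exporter 730
23 dataconf 81
"""

# Assumed row layout: integer rank, a name without spaces, stars with optional commas.
ROW_RE = re.compile(r"^(\d+)\s+(\S+)\s+([\d,]+)$")


def parse_index(text):
    """Return a list of (rank, project, stars) tuples, skipping lines that do not match."""
    parsed = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue  # header rows and blank lines are ignored
        rank, project, stars = match.groups()
        parsed.append((int(rank), project, int(stars.replace(",", ""))))
    return parsed


projects = parse_index(ROWS)
```

Lines that do not fit the assumed layout, such as the `# Project Stars` header, are skipped rather than raising an error, which keeps the sketch tolerant of stray text around the table.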
