Python Parser

Open-source Python projects categorized as Parser

Top 23 Python Parser Projects

  1. MinerU

    A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

    Project mention: AIM Weekly for 23 September 2024 | dev.to | 2024-09-23

    🚀 P by @ Tim Spann 🚀 Air by @ Tim Spann 🤖 Milvus by @ Tim Spann 💰 AIM ADS-B 👽 Milvus RAG 🍿PDF Processing

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. pydantic

    Data validation using Python type hints

    Project mention: FastAPI, Pydantic, Psycopg3: the holy trinity for Python web APIs | dev.to | 2024-10-24

    Pydantic is bundled with FastAPI and is excellent for modelling, validating, and serialising API responses.

  4. sqlglot

    Python SQL Parser and Transpiler

    Project mention: Writing Composable SQL Using Knex and Pipelines | news.ycombinator.com | 2024-11-28

    You can compose SQL with https://ibis-project.org/tutorials/ibis-for-sql-users, which is using https://github.com/tobymao/sqlglot to parse the SQL under the hood.

    As an alternative to parsing the SQL yourself, DuckDB's [relational API](https://duckdb.org/docs/api/python/relational_api) allows you to compose SQL expressions efficiently and lazily, which I've used when playing around with things like https://gist.github.com/ajfriend/eea0795546c7c44f1c24ab0560a...

  5. pdfminer.six

    Community maintained fork of pdfminer - we fathom PDF

  6. Lark

    Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

  7. MegaParse

    File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

    Project mention: AI and All Data Weekly for 09 Dec 2024 | dev.to | 2024-12-09

    ❄️ Apache Polaris + Iceberg Quickstart ⚡️ How to extract tables from pdfs 🚀 Microsoft 1bit LLM BitNet 🐿️ Verifying Kafka Transactions Entry 2 🐿️ FLUSS: Streaming Storage 🐿️ Fluss -> Flow for Flink Real Time Analytics 🌐 TableFlow - iceberg / kafka ❄️ Snowflake Cortex AI + Slack 🐿️❄️ Door dash flink, kafka, snowflake 🧠 Prompt Stack -- all in one 🔌 SpaCY Layout for PDF 📱 Responsible AI Pathways 📼 Megaparse documents python 🔌 Time Series LLM ❄️ Generate Synthetic Data in Snowflake 🐿️ LLMs and GenAI - When to use them 🐿️ Flink Observability with Prometheus 📡 New SQL GUI 🍫 TDD for GenAI 🕵️ 🎁 Open Source Agent Framework for Production 💻 Cedit command line editor 🏭 ServiceNow AgentLab 🎤 Snowflake Lessons Learned in Replication 🎄 Privastead 🔌 Backup Icloud with nodejs on linux 🔌 Backup Google with nodejs on linux 🎄 HuggingFace macos chat source code 🎁 Ollama working with structured output 🎁 dspy ai how to 🔌 Piazza updater 🔌 Building a financial report with langgraph ColPali Notebook with QWEN 2 VL

  8. sqlparse

    A non-validating SQL parser module for Python

  9. phonenumbers

    Python port of Google's libphonenumber

  10. snoop

    Snoop: a reconnaissance tool based on open-source data (OSINT world)

  11. oletools

    oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.

    Project mention: Paper sospechoso - CTF - PWNEDCR0x7 | dev.to | 2024-11-10
  12. PLY

    Python Lex-Yacc

    Project mention: Adding Syntax to the CPython Interpreter | news.ycombinator.com | 2024-10-19

    I don't think PLY can handle the newest Python grammar, but I haven't looked into it.

    For what it's worth, my Python grammar was based on an older project I did called GardenSnake, which is still available as a PLY example, at https://github.com/dabeaz/ply/tree/master/example/GardenSnak... .

    I've been told it was one of the few examples of how to handle an indentation-based grammar using a yacc-esque parser generator.

  13. msgspec

    A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

    Project mention: Don't let dicts spoil your code | news.ycombinator.com | 2024-10-09

    gjson [1] and a few other go packages offer a way to parse arbitrary JSON without requiring structs to hold them.

    re: Python. I like PyRight/PyLance for Python typing, it seems to "just work" afaict. I also like msgspec for dataclass like behavior [2].

    ---

    1: https://github.com/tidwall/gjson

    2: https://jcristharif.com/msgspec/

  14. rdflib

    RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

  15. m3u8

    Python m3u8 Parser for HTTP Live Streaming (HLS) Transmissions

  16. typeguard

    Run-time type checker for Python

    Project mention: Python type hints may not be not for me in practice | news.ycombinator.com | 2024-11-27

    You want runtime typechecking.

    See either beartype [1] or typeguard [2]. And if you're doing any kind of array-based programming (JAX or not), then jaxtyping [3].

    [1] https://github.com/beartype/beartype/

    [2] https://github.com/agronholm/typeguard

    [3] https://github.com/patrick-kidger/jaxtyping

  17. strictyaml

    Type-safe YAML parser and validator.

    Project mention: Unit Tests as Documentation | news.ycombinator.com | 2024-10-17

    I don't think it is about discipline. At its core, a good test will take an example and do something with it to demonstrate an outcome.

    That's exactly what how-to docs do, probably with the same examples.

    Logically, they should be the same thing.

    For example:

    https://github.com/crdoconnor/strictyaml/blob/master/hitch/s...

    And:

    https://hitchdev.com/strictyaml/using/alpha/scalar/email-and...

    You just need a (non-Turing-complete) language that is dual use: it generates docs and runs tests.

  18. python-user-agents

    A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

  19. json_repair

    A Python module to repair invalid JSON, commonly used to parse the output of LLMs

    Project mention: Crafting Structured {JSON} Responses: Ensuring Consistent Output from any LLM 🦙🤖 | dev.to | 2024-09-15

    Generating perfectly formatted JSON from LLMs is a common yet challenging task. By guiding the JSON syntax, communicating its usage, and using validation tools like json-fixer, we can significantly improve the consistency and reliability of the output. By combining clear instructions, strategic prompting, and robust validation, we can transform our LLM interactions from a gamble into a reliable pipeline for structured data. That's all for the day folks, Stay informed, iterate, and refine your approach to master the art of JSON generation from any LLM.

  20. cinemagoer

    Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies

  21. ViperMonkey

    A VBA parser and emulation engine to analyze malicious macros.

  22. godot-gdscript-toolkit

    Independent set of GDScript tools - parser, linter, formatter, and more

    Project mention: Godot Rust CI: Handy GDScript & Rust GitHub Actions | dev.to | 2024-08-15

    The project uses a mix of Rust and GDScript, and for linting the GDScript code in GitHub actions, I use godot-gdscript-toolkit.

  23. Construct

    Construct: Declarative data structures for Python that allow symmetric parsing and building

  24. wiktextract

    Wiktionary dump file parser and multilingual data extractor
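
The comment quoted in the PLY entry mentions handling an indentation-based grammar with a yacc-style parser generator. One common approach, sketched here with the standard library only, is to have a pre-pass turn changes in leading whitespace into explicit INDENT and DEDENT tokens, so that the grammar itself can stay context-free. The function name `indent_tokens` and the token kinds are hypothetical and chosen for illustration; this is not the GardenSnake code, and it assumes that only spaces count as indentation and that blank lines are ignored.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, text)


def indent_tokens(source: str) -> Iterator[Token]:
    """Yield LINE tokens interleaved with INDENT/DEDENT markers.

    Hypothetical helper: only spaces count as indentation, blank lines are skipped.
    """
    stack: List[int] = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines never open or close a block
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # deeper than the current block: open a new one
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # shallower: close blocks until the widths match again
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            raise IndentationError(f"inconsistent dedent: {raw!r}")
        yield ("LINE", raw.strip())
    # close every block still open at end of input
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


if __name__ == "__main__":
    sample = "if x:\n    y = 1\nz = 2\n"
    for kind, text in indent_tokens(sample):
        print(kind, text)
```

A parser generator can then treat INDENT and DEDENT like opening and closing braces. A real lexer would also split each line into finer tokens and decide how to treat tabs, which this sketch leaves out.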

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
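
The comment quoted in the typeguard entry recommends runtime type checking. As a rough illustration of the idea, the sketch below uses only the standard library to compare call arguments and the return value against a function's annotations. The decorator name `check_types` is hypothetical, and the sketch assumes annotations are plain classes; it is not how typeguard or beartype are implemented, and it silently skips parameterized generics such as `list[int]`.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def _is_plain_class(annotation) -> bool:
    """True for annotations like int or str, False for list[int], Optional[...], etc."""
    return get_origin(annotation) is None and isinstance(annotation, type)


def check_types(func):
    """Check plain-class annotations on every call (hypothetical helper)."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Unannotated parameters and non-class annotations are skipped.
            if _is_plain_class(expected) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        result = func(*args, **kwargs)
        expected = hints.get("return")
        if _is_plain_class(expected) and not isinstance(result, expected):
            raise TypeError(
                f"return value must be {expected.__name__}, got {type(result).__name__}"
            )
        return result

    return wrapper


@check_types
def scale(value: float, factor: int) -> float:
    return value * factor
```

Calling `scale(1.5, 2)` passes the checks, while `scale(1.5, "2")` raises `TypeError` before the function body runs. The check is strict `isinstance`, so an `int` passed where `float` is annotated is also rejected; the libraries named in the comment cover many more cases than this sketch attempts.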


Python Parser related posts

  • HOCON - secret behind .conf files

    1 project | dev.to | 19 Dec 2024
  • QuivrHQ/MegaParse: File Parser Optimised for LLM Ingestion with No Loss

    1 project | news.ycombinator.com | 7 Dec 2024
  • MegaParse – File Parser Optimised for LLM Ingestion

    1 project | news.ycombinator.com | 5 Dec 2024
  • Show HN: RasperDucky, an Implementation of DuckyScript3 for Raspberry Pico

    3 projects | news.ycombinator.com | 2 Nov 2024
  • Adding Syntax to the CPython Interpreter

    4 projects | news.ycombinator.com | 19 Oct 2024
  • Checkbox Extraction from PDFs - A Tutorial

    3 projects | dev.to | 16 Jul 2024
  • Show HN: Griffe, load Python APIs, find breaking changes

    1 project | news.ycombinator.com | 15 Jul 2024

Index

What are some of the best open-source Parser projects in Python? This list will help you:

 #  Project                  Stars
 1  MinerU                  24,613
 2  pydantic                22,019
 3  sqlglot                  6,998
 4  pdfminer.six             6,139
 5  Lark                     5,045
 6  MegaParse                4,950
 7  sqlparse                 3,796
 8  phonenumbers             3,537
 9  snoop                    3,125
10  oletools                 2,975
11  PLY                      2,815
12  msgspec                  2,539
13  rdflib                   2,189
14  m3u8                     2,094
15  typeguard                1,576
16  strictyaml               1,503
17  python-user-agents       1,449
18  json_repair              1,382
19  cinemagoer               1,243
20  ViperMonkey              1,065
21  godot-gdscript-toolkit   1,052
22  Construct                  934
23  wiktextract                847


Did you know that Python is the 2nd most popular programming language based on number of references?