Top 23 Python Parsing Projects
-
Project mention: Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks | dev.to | 2025-05-05
Across this five-post series, we’ve journeyed from Pydantic’s basics—type validation and nested models—to advanced integrations with FastAPI, SQLAlchemy, and scalable techniques. You’ve learned how to build declarative, type-safe models, handle complex APIs, and optimize performance. To deepen your knowledge, explore the Pydantic documentation, contribute to the open-source project, or experiment with real-world use cases. Check out our GitHub repo for code samples and a Pydantic cheat sheet. Thank you for joining us—happy coding!
-
Project mention: Maigret collects a dossier on a person by username only | news.ycombinator.com | 2024-12-11
-
Project mention: How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts & Models | dev.to | 2025-05-14
Notebook for example 3: prompts and models
-
For instance, LlamaParse (https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse...) uses LLMs for PDF text extraction, but the problem is hallucination, e.g. https://github.com/run-llama/llama_parse/issues/420
There is also LLMWhisperer, which preserves the layout (tables, checkboxes, forms) and hence the context: https://pg.llmwhisperer.unstract.com/
-
facexlib
FaceXlib aims at providing ready-to-use face-related functions based on current SOTA open-source methods.
-
WhatsApp-Chat-Exporter
A cross-platform tool for parsing WhatsApp chat databases from Android and iOS/iPadOS backups. Supports Android .crypt12, .crypt14, .crypt15, and the latest database formats. Outputs chat history in readable HTML or structured JSON.
-
FormatFuzzer
FormatFuzzer is a framework for high-efficiency, high-quality generation and parsing of binary inputs.
-
Project mention: Yacv (Yet Another Compiler Visualizer): LL and LR Parser Animations | news.ycombinator.com | 2024-06-21
-
tree-hugger
A light-weight, extendable, high level, universal code parser built on top of tree-sitter
> Nienders concluded that this was due to the difference in the information available. Sophy had information about the track curvature of the upcoming 6 seconds of track, based on the current speed. TMRL, however, only had distance measurements from the LIDAR. While the TMRL program could plan for the next turn, it could not plan two turns ahead, and this fundamentally limited the program to mere safe driving, avoiding walls and crashes, but never optimizing.
I think that point is an important one. ML algorithms work better when they are given better context. Especially in programming, it is clear that the models are trained on code rather than on repositories. They know about files and repositories, but I always get the impression that they are totally clueless about whole programs.
What could be done better for code is to provide more data in training about where each function is located in the project, which other files define or call similar functions, and so on. In general, before any code is fed into training, it would help to do a little data mining on the project, as the tree-hugger project [1] enables. Tree-hugger is somewhat older code, and tree-sitter has advanced a lot in the last 4 years.
In my opinion, a 5x to 10x improvement in code is within reach, with no need to increase GPU compute or electricity.
[1] https://github.com/autosoft-dev/tree-hugger
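The kind of project-level data mining described in the comment above can be sketched roughly as follows. This is only an illustration, not tree-hugger's API: it swaps in Python's standard-library `ast` module in place of tree-sitter, so it handles Python source only. The names `index_source`, `context_header`, and the two-file `project` are hypothetical, introduced purely for the example.

```python
import ast
from collections import defaultdict


def index_source(files):
    """Map each function name to where it is defined and where it is called.

    `files` is a dict of {path: source_text}; a real pipeline would read
    these from a repository checkout instead.
    """
    definitions = defaultdict(list)  # name -> [(path, line)]
    calls = defaultdict(list)        # name -> [(path, line)]

    for path, source in files.items():
        tree = ast.parse(source, filename=path)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                definitions[node.name].append((path, node.lineno))
            elif isinstance(node, ast.Call):
                func = node.func
                # Record plain calls like f(x) and attribute calls like mod.f(x).
                if isinstance(func, ast.Name):
                    calls[func.id].append((path, node.lineno))
                elif isinstance(func, ast.Attribute):
                    calls[func.attr].append((path, node.lineno))
    return definitions, calls


def context_header(name, definitions, calls):
    """Build a short comment block describing where `name` lives in the project."""
    lines = [f"# function: {name}"]
    for path, line in definitions.get(name, []):
        lines.append(f"# defined in {path}:{line}")
    for path, line in calls.get(name, []):
        lines.append(f"# called from {path}:{line}")
    return "\n".join(lines)


# Hypothetical two-file project, used only for illustration.
project = {
    "pkg/dates.py": "def parse_date(text):\n    return text.strip()\n",
    "pkg/cli.py": (
        "from pkg.dates import parse_date\n"
        "\n"
        "def main():\n"
        "    return parse_date(' 2024-01-01 ')\n"
    ),
}

definitions, calls = index_source(project)
print(context_header("parse_date", definitions, calls))
```

A header like this could be prepended to each function before it is used as a training or prompting sample. Matching calls by bare name is a deliberate simplification: it cannot tell apart two functions that share a name, which a fuller tool would need to resolve.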
-
dataconf
Simple dataclasses configuration management for Python with hocon/json/yaml/properties/env-vars/dict/cli support.
Python Parsing discussion
Python Parsing related posts
-
How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts & Models
-
Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks
-
Replace OCR with Vision Language Models
-
Build your next AI Tech Startup with DeepSeek
-
Maigret collects a dossier on a person by username only
-
Firefox will consider a Rust implementation of JPEG-XL
-
Checkbox Extraction from PDFs - A Tutorial
-
A note from our sponsor - SaaSHub
www.saashub.com | 16 May 2025
Index
What are some of the best open-source Parsing projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | pydantic | 23,837 |
| 2 | maigret | 15,232 |
| 3 | llmware | 13,227 |
| 4 | llama_cloud_services | 3,964 |
| 5 | Maya | 3,414 |
| 6 | dateutil | 2,455 |
| 7 | pyparsing | 2,319 |
| 8 | plaso | 1,836 |
| 9 | pydantic-core | 1,581 |
| 10 | facexlib | 890 |
| 11 | socid-extractor | 804 |
| 12 | WhatsApp-Chat-Exporter | 730 |
| 13 | FormatFuzzer | 414 |
| 14 | py-pdf-parser | 403 |
| 15 | pytago | 390 |
| 16 | funcparserlib | 351 |
| 17 | wikitextparser | 309 |
| 18 | OpenSIEM-Logstash-Parsing | 184 |
| 19 | yacv | 159 |
| 20 | parglare | 140 |
| 21 | arxiv-miner | 133 |
| 22 | tree-hugger | 126 |
| 23 | dataconf | 81 |