Top 23 Python Parsing Projects
- facexlib: FaceXlib aims to provide ready-to-use face-related functions based on current state-of-the-art (SOTA) open-source methods.
- WhatsApp-Chat-Exporter: A customizable Android and iOS/iPadOS WhatsApp database parser that gives you the history of your WhatsApp conversations in HTML and JSON. Android Crypt12, Crypt14, and Crypt15 backups, as well as the new schema, are supported.
- FormatFuzzer: A framework for high-efficiency, high-quality generation and parsing of binary inputs.
- tree-hugger: A lightweight, extensible, high-level, universal code parser built on top of tree-sitter.
- dataconf: Simple dataclass-based configuration management for Python with HOCON/JSON/YAML/properties/env-vars/dict/CLI support.
First, note the method prefix_allowed_tokens_fn. This method applies a Pydantic model to constrain (guide) how the LLM generates tokens. Next, see how that constraint can be applied to txtai's LLM pipeline.
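The idea behind prefix_allowed_tokens_fn can be sketched without a real model. In Hugging Face transformers, generate() accepts a prefix_allowed_tokens_fn callback that receives the batch index and the token IDs so far (including the prompt) and returns the IDs allowed next. The toy below is only an illustration under loud assumptions: a hypothetical eight-token vocabulary, a fixed scoring function standing in for the LLM, and a hard-coded list of valid outputs standing in for a Pydantic schema. Libraries that derive the allowed set from a Pydantic model do something far more general than this prefix check.

```python
from typing import Callable, List, Sequence

# Hypothetical toy vocabulary: token ID = list index.
VOCAB = ["{", "}", '"name"', ":", '"Ada"', '"Bob"', ",", "<eos>"]

# Stand-in for a Pydantic/JSON schema: the only token sequences we accept.
VALID_OUTPUTS = [
    ["{", '"name"', ":", '"Ada"', "}", "<eos>"],
    ["{", '"name"', ":", '"Bob"', "}", "<eos>"],
]


def prefix_allowed_tokens_fn(batch_id: int, prefix: Sequence[int]) -> List[int]:
    """Return the IDs of tokens that keep `prefix` on the way to a valid output."""
    words = [VOCAB[i] for i in prefix]
    allowed = set()
    for target in VALID_OUTPUTS:
        if len(words) < len(target) and target[: len(words)] == words:
            allowed.add(VOCAB.index(target[len(words)]))
    return sorted(allowed)


def toy_scores(prefix: Sequence[int]) -> List[float]:
    """Hypothetical unconstrained 'model': always prefers <eos>, then '"Bob"'."""
    return [0.1, 0.2, 0.1, 0.1, 0.3, 0.5, 0.4, 0.9]


def constrained_greedy_decode(
    scores_fn: Callable[[Sequence[int]], List[float]],
    allowed_fn: Callable[[int, Sequence[int]], List[int]],
    max_len: int = 16,
) -> List[str]:
    """Greedy decoding where only tokens returned by allowed_fn may be chosen."""
    prefix: List[int] = []
    for _ in range(max_len):
        allowed = allowed_fn(0, prefix)
        if not allowed:  # no valid continuation: stop
            break
        scores = scores_fn(prefix)
        # Pick the best-scoring token among the allowed IDs; all others are masked out.
        best = max(allowed, key=lambda i: scores[i])
        prefix.append(best)
        if VOCAB[best] == "<eos>":
            break
    return [VOCAB[i] for i in prefix]


if __name__ == "__main__":
    tokens = constrained_greedy_decode(toy_scores, prefix_allowed_tokens_fn)
    print("".join(tokens[:-1]))  # {"name":"Bob"}
```

Unconstrained, this toy "model" would emit <eos> first (its highest score); the callback forces it through the only valid shapes and lets the scores decide just where a real choice exists ("Ada" vs "Bob").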
Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06
I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.
A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect), depending on what the model is intended for. The LLMWare team did a good two-part video on this as well (https://www.youtube.com/watch?v=7oMTGhSKuNY).
I think dedicated miniature LLMs are the way forward.
Disclaimer - Not affiliated with them in any way, just think it's a really cool project.
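What the comment calls deterministic sampling roughly corresponds to greedy decoding (always take the highest-scoring token), or to sampling with a fixed random seed so that runs are reproducible. A minimal standard-library sketch over a hypothetical three-token distribution; real LLM runtimes expose this through their own temperature and seed settings, which are not shown here.

```python
import math
import random
from typing import Dict

# Hypothetical next-token logits from a model.
LOGITS: Dict[str, float] = {"yes": 2.0, "no": 1.5, "maybe": 0.5}


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits: Dict[str, float]) -> str:
    """Deterministic: always return the highest-scoring token."""
    return max(logits, key=logits.get)


def sample(logits: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Stochastic: draw a token in proportion to its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    print(greedy(LOGITS))  # yes
    # Two generators with the same seed produce the same sequence of draws.
    rng_a, rng_b = random.Random(42), random.Random(42)
    run_a = [sample(LOGITS, 0.8, rng_a) for _ in range(5)]
    run_b = [sample(LOGITS, 0.8, rng_b) for _ in range(5)]
    print(run_a == run_b)  # True
```

Greedy output never varies; seeded sampling varies across seeds but repeats exactly for the same seed, which is often enough for testing and auditing.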
Project mention: Using Openpyxl - keep min date, handle line breaks, handle duplicates | /r/learnpython | 2023-05-01
Here is an example for a single cell (I'm using the dateutil package to parse the strings):
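The original answer's code isn't reproduced here, but the task in the post's title (keep the earliest date in a cell, handle line breaks and duplicates) can be sketched with dateutil's parser.parse. This assumes the cell value is a string with one date per line; the function name earliest_date is hypothetical. With openpyxl you would pass it something like ws["A1"].value; cells that Excel already stores as dates typically come back as datetime objects and need no parsing.

```python
from datetime import datetime
from typing import Optional

from dateutil import parser  # third-party: pip install python-dateutil


def earliest_date(cell_value: Optional[str]) -> Optional[datetime]:
    """Return the earliest date in a multi-line cell, or None if it has none.

    Each non-blank line is treated as one date string; exact duplicates are
    dropped before parsing.
    """
    if not cell_value:
        return None
    # splitlines() handles "\n", "\r\n", and other line-break styles.
    lines = {line.strip() for line in cell_value.splitlines() if line.strip()}
    dates = [parser.parse(line) for line in lines]
    return min(dates) if dates else None


if __name__ == "__main__":
    cell = "2023-05-01\n2022-12-31\r\n2023-05-01"
    print(earliest_date(cell))  # 2022-12-31 00:00:00
```

Ambiguous strings such as "05/01/2023" are read month-first by default; parser.parse accepts dayfirst=True if the sheet uses day-first dates.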
More than a year after the last release of pyparsing, I've bundled up all the bug fixes and changes, and they are now released as pyparsing 3.1.0. Visit this link for the details.
Funny that you ask... https://github.com/pydantic/pydantic-core Unfortunately it seems that the functionality you ask for is not (yet) part of this ...
Project mention: stable diffusion downloads something from github when making a image | /r/StableDiffusion | 2023-07-22
"https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth"
Project mention: Autogenerating a Book Series from Three Years of iMessages | news.ycombinator.com | 2024-03-07
https://github.com/KnugiHK/WhatsApp-Chat-Exporter
Just in case you missed my other comment.
Not my repo.
These are not new, but my takeaways from https://tratt.net/laurie/blog/2020/which_parsing_approach.ht... and https://rust-analyzer.github.io/blog/2020/09/16/challeging-L... are to embrace various forms of LR parsing. https://github.com/igordejanovic/parglare is a very capable GLR parser, and I've been keeping a close eye on it for use in my projects.
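To make "LR parsing" concrete: an LR parser is a table-driven shift-reduce machine. The sketch below hand-codes the table for a hypothetical two-rule grammar (E -> E + n | n) and evaluates sums as it reduces. The left-recursive rule, which would recurse forever in a naive recursive-descent parser, poses no problem here. Generators such as parglare build these tables (and, for GLR, handle conflicts) from a grammar automatically; this hand-built table is for illustration only and is not parglare's API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy grammar:
#   rule 1: E -> E + n
#   rule 2: E -> n
# ACTION[(state, terminal)] -> ("shift", next_state) | ("reduce", rule) | ("accept", 0)
ACTION: Dict[Tuple[int, str], Tuple[str, int]] = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", 0),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
# GOTO[(state, nonterminal)] -> state to enter after a reduction.
GOTO: Dict[Tuple[int, str], int] = {(0, "E"): 1}
# rule -> (left-hand side, number of right-hand-side symbols)
RULES: Dict[int, Tuple[str, int]] = {1: ("E", 3), 2: ("E", 1)}

Token = Tuple[str, Optional[int]]


def tokenize(text: str) -> List[Token]:
    """Split input into ("n", value) and ("+", None) tokens, ending with ("$", None)."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif "0" <= ch <= "9":
            j = i
            while j < len(text) and "0" <= text[j] <= "9":
                j += 1
            tokens.append(("n", int(text[i:j])))
            i = j
        elif ch == "+":
            tokens.append(("+", None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    tokens.append(("$", None))
    return tokens


def parse(text: str) -> int:
    """Run the shift-reduce loop, computing the sum during reductions."""
    tokens = tokenize(text)
    states: List[int] = [0]            # LR state stack
    values: List[Optional[int]] = []   # semantic values, one per symbol on the stack
    pos = 0
    while True:
        kind, value = tokens[pos]
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected {kind!r} at token {pos}")
        op, arg = action
        if op == "shift":
            states.append(arg)
            values.append(value)
            pos += 1
        elif op == "reduce":
            lhs, size = RULES[arg]
            rhs = values[-size:]
            del states[-size:]
            del values[-size:]
            # Semantic action: rule 1 adds the two operands; rule 2 passes n through.
            result = rhs[0] + rhs[2] if arg == 1 else rhs[0]
            states.append(GOTO[(states[-1], lhs)])
            values.append(result)
        else:  # accept: the whole input has been reduced to a single E
            return values[-1]
```

For example, parse("1 + 2 + 3") reduces left to right, i.e. (1 + 2) + 3, and returns 6, while "1 +" raises SyntaxError because the table has no action for end-of-input in state 3.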
Python Parsing related posts
- Advanced RAG with guided generation
- Pydantic v2 ruined the elegance of Pydantic v1
- The Story Continues: Announcing Version 14 of Wolfram Language and Mathematica
- Ask HN: Pydantic has too much deprecation. Why is it popular?
- OpenAI uses Pydantic for their ChatCompletions API
- What If OpenDocument Used SQLite?
- Why my favourite API is a zipfile on the European Central Bank's website
Index
What are some of the best open-source parsing projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | pydantic | 18,617 |
2 | maigret | 9,606 |
3 | Maya | 3,402 |
4 | llmware | 3,086 |
5 | dateutil | 2,247 |
6 | pyparsing | 2,086 |
7 | plaso | 1,618 |
8 | pydantic-core | 1,263 |
9 | facexlib | 741 |
10 | socid-extractor | 581 |
11 | WhatsApp-Chat-Exporter | 446 |
12 | FormatFuzzer | 384 |
13 | pytago | 371 |
14 | funcparserlib | 336 |
15 | py-pdf-parser | 335 |
16 | wikitextparser | 268 |
17 | OpenSIEM-Logstash-Parsing | 174 |
18 | yacv | 132 |
19 | parglare | 133 |
20 | tree-hugger | 121 |
21 | arxiv-miner | 111 |
22 | htmldate | 106 |
23 | dataconf | 79 |