Top 23 Parser Open-Source Projects
-
ADR-003 documented the choice of markdown-it over alternatives like marked, based on careful evaluation of edge cases and built-in features like URL linking.
-
MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs to Markdown and JSON.
The systems they tested are mostly used as part of a larger system. A fairer comparison would use something like MinerU [1] and a proper benchmark such as OHR Bench and Reducto's table bench. This paper is really bad...
[1]: https://github.com/opendatalab/MinerU
-
I once wanted to contribute to the popular swc project (https://github.com/swc-project/swc). I cloned the repo, ran the build, and a whopping 20GB was gone from my disk. The parser itself (https://github.com/swc-project/swc/blob/main/crates/swc_ecma...) has over a dozen dependencies, including serde.
Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.
I decided that I should leave this project alone and spend my time elsewhere.
-
Scraping the Academy Award winners listed on Wikipedia with cheerio and saving them to a CSV file.
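The "saving them to a CSV file" half of that workflow can be sketched without any dependencies. This is a minimal illustration, not the linked post's code: the `WinnerRow` shape and its column names are hypothetical, and the rows are assumed to have already been extracted (for example with cheerio selectors) before they reach `toCsv`.

```typescript
// Hypothetical row shape; the real columns depend on the scraped table.
interface WinnerRow {
  year: string;
  category: string;
  winner: string;
}

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (RFC 4180 style).
function escapeCsvField(field: string): string {
  return /[",\r\n]/.test(field)
    ? '"' + field.replace(/"/g, '""') + '"'
    : field;
}

// Serialize rows to CSV text with a header line and CRLF line endings.
function toCsv(rows: WinnerRow[]): string {
  const header = ["year", "category", "winner"].join(",");
  const lines = rows.map((row: WinnerRow): string =>
    [row.year, row.category, row.winner].map(escapeCsvField).join(",")
  );
  return [header].concat(lines).join("\r\n") + "\r\n";
}
```

In a Node script the resulting string could then be written out with `fs.writeFileSync`. Quoting only the fields that need it keeps the file readable while still surviving titles that contain commas or quotes.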
-
Website: postcss.org
-
Project mention: Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks | dev.to | 2025-05-05
Across this five-post series, we’ve journeyed from Pydantic’s basics—type validation and nested models—to advanced integrations with FastAPI, SQLAlchemy, and scalable techniques. You’ve learned how to build declarative, type-safe models, handle complex APIs, and optimize performance. To deepen your knowledge, explore the Pydantic documentation, contribute to the open-source project, or experiment with real-world use cases. Check out our GitHub repo for code samples and a Pydantic cheat sheet. Thank you for joining us—happy coding!
-
Paste this into the Playground (try it here). You’ll get something like:
-
Project mention: Show HN: Using eBPF to see through encryption without a proxy | news.ycombinator.com | 2025-05-08
-
Once rector gets 8.4 rules out, this will be pretty awesome:
https://github.com/rectorphp/rector/issues/8701
https://github.com/nikic/PHP-Parser/commit/7b0384cdbe03431c4...
-
-
The React team is also collaborating with the oxc team to eventually add native support for the compiler. Once Rolldown, a Rust-based bundler for JavaScript and TypeScript, is released and supported in Vite, developers should be able to integrate the compiler without relying on Babel.
-
jsoniter
A high-performance 100% compatible drop-in replacement of "encoding/json" (by json-iterator)
Project mention: Go Performance: Small changes that help improve your app's performance | dev.to | 2024-07-30
-
On top of this, because it scrapes HTML content, I also needed to use the Jsoup and Ksoup libraries to parse the HTML content and extract the necessary values. Because it's built on top of Ktor, it's available on multiple platforms, meaning I needed to find different libraries to parse the HTML content on different platforms, and use different Ktor engines. For example, on Android, it uses the OkHttp engine, while on iOS, it uses the Darwin engine, and on the JVM, it uses the Java engine introduced in JDK 11. You can look at the full build.gradle.kts file to see the dependencies and how they are set up.
-
Project mention: Ask HN: What Are You Working On? (February 2025) | news.ycombinator.com | 2025-02-23
-
Cool! I did not know about that book. Added to [1]. :-)
--
1: https://github.com/munificent/craftinginterpreters/issues/92...
-
Use free tools like Terser for JavaScript and CSSNano for CSS:
-
-
Project mention: Show HN: SQL-tString a t-string SQL builder in Python | news.ycombinator.com | 2025-05-16
https://github.com/tobymao/sqlglot :
> SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine [written in Python] . It can be used to format SQL or translate between 24 different dialects like DuckDB, Presto / Trino, Spark / Databricks, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.
-
dasel
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Project mention: TomWright/dasel: Select, put and delete data from JSON, TOML, YAML, XML and CSV | news.ycombinator.com | 2024-08-18
No HCL support [0] though.
[0]: https://github.com/TomWright/dasel/issues/98
-
I'm particularly fond of Firecracker, but it's a bit of work to set up, so if you cannot afford the time yet want to be on the safe side, combine static analysis with time-boxed execution. You can use esprima to parse the code and check for anything malicious.
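The static-analysis half of that advice might look roughly like the sketch below. It assumes the code has already been parsed into an ESTree-shaped tree (the format esprima's `parseScript` produces). The tree here is written by hand so the example stays self-contained. The deny list and the names `AstNode`, `DENIED_IDENTIFIERS`, and `findDeniedIdentifiers` are illustrative assumptions, and a name check like this is only a first filter, not a sandbox.

```typescript
// Minimal ESTree-like node shape; real parser output carries more fields.
interface AstNode {
  type: string;
  [key: string]: unknown;
}

// Hypothetical deny list, chosen only for illustration.
const DENIED_IDENTIFIERS: string[] = ["eval", "Function", "require", "process"];

function isAstNode(value: unknown): value is AstNode {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Recursively walk the tree and collect every denied identifier name found.
function findDeniedIdentifiers(node: AstNode, found: string[] = []): string[] {
  const name: unknown = node.name;
  if (
    node.type === "Identifier" &&
    typeof name === "string" &&
    DENIED_IDENTIFIERS.indexOf(name) !== -1
  ) {
    found.push(name);
  }
  Object.keys(node).forEach((key: string): void => {
    const child: unknown = node[key];
    const children: unknown[] = Array.isArray(child) ? child : [child];
    children.forEach((item: unknown): void => {
      if (isAstNode(item)) findDeniedIdentifiers(item, found);
    });
  });
  return found;
}

// Hand-written tree for the source text `eval("1 + 1")`.
const sampleAst: AstNode = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "eval" },
        arguments: [{ type: "Literal", value: "1 + 1" }],
      },
    },
  ],
};
```

Calling `findDeniedIdentifiers(sampleAst)` reports the `eval` call. The walk flags any identifier with a denied name, including harmless property names, so it errs toward false positives. Time-boxing the actual execution is a separate step that this sketch does not cover.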
-
Project mention: Rusty Cascading Style Sheets – Another CSS Preprocessor | news.ycombinator.com | 2025-04-09
CSS Custom Properties have a cost. If you’re using them as global variables, and don’t need to look them up from JavaScript, or change them according to media queries, it’s good to flatten them out of existence: your bundle will be smaller, your execution faster, and your memory usage reduced. Same with mixins.
It would be good if Lightning CSS supported that use case: https://github.com/parcel-bundler/lightningcss/issues/69.
Compile-time variables are still a useful feature. CSS Custom Properties get used in a lot of places where compile-time variables would be better.
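To make "flatten them out of existence" concrete, here is a deliberately naive sketch of the idea. It is not how Lightning CSS or any other tool implements it: the function name `flattenCustomProperties` is hypothetical, and the regex approach assumes every custom property is a global constant declared once, with no fallbacks, nested `var()` references, or media-query overrides.

```typescript
// Naive sketch: inline custom properties that act as global constants.
// Assumes each property is declared once and never overridden.
function flattenCustomProperties(css: string): string {
  const values: Record<string, string> = {};

  // Collect `--name: value;` declarations and drop them from the output.
  const withoutDeclarations = css.replace(
    /\s*(--[\w-]+)\s*:\s*([^;}]+);?/g,
    (_match: string, name: string, value: string): string => {
      values[name] = value.trim();
      return "";
    }
  );

  // Replace each `var(--name)` with its collected value;
  // references to unknown names are left untouched.
  const inlined = withoutDeclarations.replace(
    /var\(\s*(--[\w-]+)\s*\)/g,
    (match: string, name: string): string =>
      Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );

  // Remove a :root rule that is now empty.
  return inlined.replace(/:root\s*\{\s*\}\s*/g, "");
}

const flattened = flattenCustomProperties(
  ":root { --brand: #336699; --gap: 8px; }\n" +
    ".button { color: var(--brand); margin: var(--gap); }"
);
// flattened === ".button { color: #336699; margin: 8px; }"
```

The output carries no `:root` block and no `var()` lookups, which is where the smaller bundle and reduced runtime work come from. A real implementation would need a proper CSS parser to handle scoping and the cascade correctly.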
-
-
MegaParse
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
Project mention: MegaParse: Your One-Stop Solution for Effortless Document Parsing | dev.to | 2025-02-23
-
Parser discussion
Parser related posts
-
Evolution of Rust Compiler Errors
-
Rethinking How I Deal With CLI Arguments (replacing getopt)
-
Tree-Sitter: From Code to Syntax-Tree
-
Nnd – a TUI debugger alternative to GDB, LLDB
-
React Compiler RC: What it means for React devs
-
Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks
-
Detailed Guide to go-doudou CLI Commands
-
A note from our sponsor - SaaSHub
www.saashub.com | 21 May 2025
Index
What are some of the best open-source Parser projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | marked | 34,640 |
2 | MinerU | 33,591 |
3 | swc | 32,168 |
4 | cheerio | 29,436 |
5 | PostCSS | 28,775 |
6 | pydantic | 23,837 |
7 | tree-sitter | 20,605 |
8 | vector | 19,529 |
9 | PHP Parser | 17,238 |
10 | Parsedown | 14,895 |
11 | oxc | 14,449 |
12 | jsoniter | 13,741 |
13 | jsoup | 11,160 |
14 | nom | 9,888 |
15 | Crafting Interpreters | 9,750 |
16 | terser | 8,964 |
17 | sh | 7,724 |
18 | sqlglot | 7,719 |
19 | dasel | 7,451 |
20 | esprima | 7,093 |
21 | lightningcss | 7,016 |
22 | pdfminer.six | 6,455 |
23 | MegaParse | 6,408 |