Parser

Open-source projects categorized as Parser

Top 23 Parser Open-Source Projects

  1. marked

    A markdown parser and compiler. Built for speed.

    Project mention: Building PicoSSG: 'Just Enough Code' | dev.to | 2025-05-16

    ADR-003 documented the choice of markdown-it over alternatives like marked, based on careful evaluation of edge cases and built-in features like URL linking.

  3. MinerU

    A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.

    Project mention: Gemini beats everyone on new OCR benchmark | news.ycombinator.com | 2025-02-14

    The systems they tested are mostly used as part of a larger system. A fairer comparison would be to use something like MinerU [1] and a proper benchmark like OHR Bench and Reducto's table bench. This paper is really bad...

    [1]: https://github.com/opendatalab/MinerU

  4. swc

    Rust-based platform for the Web

    Project mention: Rust Dependencies Scare Me | news.ycombinator.com | 2025-05-09

    I once wanted to contribute to the popular swc project (https://github.com/swc-project/swc). I cloned the repo, ran build, and a whopping 20 GB was gone from my disk. The parser itself (https://github.com/swc-project/swc/blob/main/crates/swc_ecma...) has over a dozen dependencies, including serde.

    Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.

    I decided that I should leave this project alone and spend my time elsewhere.

  5. cheerio

    The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

    Project mention: A JavaScript scraper for the Wikipedia Academy Award List. | dev.to | 2025-01-23

    Scraping the Academy Award winners listed on Wikipedia with cheerio and saving them to a CSV file.

  6. PostCSS

    Transforming styles with JS plugins

    Project mention: 30 Best Free Tools for Frontend Developers in 2025 | dev.to | 2025-03-01

    Website: postcss.org

  7. pydantic

    Data validation using Python type hints

    Project mention: Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks | dev.to | 2025-05-05

    Across this five-post series, we’ve journeyed from Pydantic’s basics—type validation and nested models—to advanced integrations with FastAPI, SQLAlchemy, and scalable techniques. You’ve learned how to build declarative, type-safe models, handle complex APIs, and optimize performance. To deepen your knowledge, explore the Pydantic documentation, contribute to the open-source project, or experiment with real-world use cases. Check out our GitHub repo for code samples and a Pydantic cheat sheet. Thank you for joining us—happy coding!

  8. tree-sitter

    An incremental parsing system for programming tools

    Project mention: Decoding Tree-sitter Playground Output For Fun | dev.to | 2025-05-09

    Paste this into the Playground (try it here). You’ll get something like:

  10. vector

    A high-performance observability data pipeline.

    Project mention: Show HN: Using eBPF to see through encryption without a proxy | news.ycombinator.com | 2025-05-08

  11. PHP Parser

    A PHP parser written in PHP

    Project mention: PHP 8.4 Released | news.ycombinator.com | 2024-11-21

    Once rector gets 8.4 rules out, this will be pretty awesome:

    https://github.com/rectorphp/rector/issues/8701

    https://github.com/nikic/PHP-Parser/commit/7b0384cdbe03431c4...

  12. Parsedown

    Better Markdown Parser in PHP

  13. oxc

    ⚓ A collection of JavaScript tools written in Rust.

    Project mention: React Compiler RC: What it means for React devs | dev.to | 2025-05-05

    The React team is also collaborating with the oxc team to eventually add native support for the compiler. Once Rolldown, a Rust-based bundler for JavaScript and TypeScript, is released and supported in Vite, developers should be able to integrate the compiler without relying on Babel.

  14. jsoniter

    A high-performance 100% compatible drop-in replacement of "encoding/json" (by json-iterator)

    Project mention: Go Performance: Small changes that help improve your app's performance | dev.to | 2024-07-30

  15. jsoup

    jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

    Project mention: Reverse-Engineering Cookies with Ktor and Ksoup | dev.to | 2025-04-06

    On top of this, because it scrapes HTML content, I also needed to use the Jsoup and Ksoup libraries to parse the HTML content and extract the necessary values. Because it's built on top of Ktor, it's available on multiple platforms, meaning I needed to find different libraries to parse the HTML content on different platforms, and use different Ktor engines. For example, on Android, it uses the OkHttp engine, while on iOS, it uses the Darwin engine, and on the JVM, it uses the Java engine introduced in JDK 11. You can look at the full build.gradle.kts file to see the dependencies and how they are set up.

  16. nom

    Rust parser combinator framework

    Project mention: Ask HN: What Are You Working On? (February 2025) | news.ycombinator.com | 2025-02-23

  17. Crafting Interpreters

    Repository for the book "Crafting Interpreters"

    Project mention: Nnd – a TUI debugger alternative to GDB, LLDB | news.ycombinator.com | 2025-05-06

    Cool! I did not know about that book. Added to [1]. :-)

    --

    1: https://github.com/munificent/craftinginterpreters/issues/92...

  18. terser

    🗜 JavaScript parser, mangler and compressor toolkit for ES6+

    Project mention: How to Optimize Website Performance on a Budget | dev.to | 2025-03-05

    Use free tools like Terser for JavaScript and CSSNano for CSS:

  19. sh

    A shell parser, formatter, and interpreter with bash support; includes shfmt (by mvdan)

  20. sqlglot

    Python SQL Parser and Transpiler

    Project mention: Show HN: SQL-tString a t-string SQL builder in Python | news.ycombinator.com | 2025-05-16

    https://github.com/tobymao/sqlglot :

    > SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine [written in Python]. It can be used to format SQL or translate between 24 different dialects like DuckDB, Presto / Trino, Spark / Databricks, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.

  21. dasel

    Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

    Project mention: TomWright/dasel: Select, put and delete data from JSON, TOML, YAML, XML and CSV | news.ycombinator.com | 2024-08-18

    No HCL support [0] though.

    [0]: https://github.com/TomWright/dasel/issues/98

  22. esprima

    ECMAScript parsing infrastructure for multipurpose analysis

    Project mention: Running Untrusted JavaScript Code | dev.to | 2024-07-21

    I'm particularly fond of Firecracker, but it’s a bit of work to set up. If you cannot afford the time yet but still want to be on the safe side, combine static analysis with time-boxed execution. You can use esprima to parse the code and check it for malicious behavior.

  23. lightningcss

    An extremely fast CSS parser, transformer, bundler, and minifier written in Rust.

    Project mention: Rusty Cascading Style Sheets – Another CSS Preprocessor | news.ycombinator.com | 2025-04-09

    CSS Custom Properties have a cost. If you’re using them as global variables, and don’t need to look them up from JavaScript, or change them according to media queries, it’s good to flatten them out of existence: your bundle will be smaller, your execution faster, and your memory usage reduced. Same with mixins.

    It would be good if Lightning CSS supported that use case: https://github.com/parcel-bundler/lightningcss/issues/69.

    Compile-time variables are still a useful feature. CSS Custom Properties get used in a lot of places where compile-time variables would be better.

  24. pdfminer.six

    Community maintained fork of pdfminer - we fathom PDF

  25. MegaParse

    File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

    Project mention: MegaParse: Your One-Stop Solution for Effortless Document Parsing | dev.to | 2025-02-23

    View the Project on GitHub

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Parser related posts

  • Evolution of Rust Compiler Errors

    1 project | news.ycombinator.com | 16 May 2025
  • Rethinking How I Deal With CLI Arguments (replacing getopt)

    1 project | news.ycombinator.com | 16 May 2025
  • Tree-Sitter: From Code to Syntax-Tree

    1 project | dev.to | 11 May 2025
  • Nnd – a TUI debugger alternative to GDB, LLDB

    5 projects | news.ycombinator.com | 6 May 2025
  • React Compiler RC: What it means for React devs

    4 projects | dev.to | 5 May 2025
  • Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks

    1 project | dev.to | 5 May 2025
  • Detailed Guide to go-doudou CLI Commands

    1 project | dev.to | 4 May 2025

Index

What are some of the best open-source Parser projects? This list will help you:

 #  Project                Stars
 1  marked                 34,640
 2  MinerU                 33,591
 3  swc                    32,168
 4  cheerio                29,436
 5  PostCSS                28,775
 6  pydantic               23,837
 7  tree-sitter            20,605
 8  vector                 19,529
 9  PHP Parser             17,238
10  Parsedown              14,895
11  oxc                    14,449
12  jsoniter               13,741
13  jsoup                  11,160
14  nom                     9,888
15  Crafting Interpreters   9,750
16  terser                  8,964
17  sh                      7,724
18  sqlglot                 7,719
19  dasel                   7,451
20  esprima                 7,093
21  lightningcss            7,016
22  pdfminer.six            6,455
23  MegaParse               6,408
