Top 23 Parser Open-Source Projects

Each entry's stats line gives, in order: mentions, GitHub stars, activity score, and primary language.

marked

60 31,926 9.5 JavaScript

A markdown parser and compiler. Built for speed.

Project mention: Eleventy vs. Next.js for static site generation | dev.to | 2023-12-14

Next, install gray-matter to extract metadata from the front matter of markdown files, and marked to convert the markdown files to HTML.
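
To make the front matter step concrete, here is a minimal Python sketch of the idea only. It is not gray-matter's implementation or API: the function name `split_front_matter` is hypothetical, and it assumes the front matter is a leading block fenced by `---` lines holding flat `key: value` pairs, whereas real front matter is usually full YAML.

```python
from typing import Dict, Tuple


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (metadata, markdown body).

    Assumption: front matter is a leading block fenced by lines that contain
    only `---`, holding flat `key: value` pairs.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # opening fence never closed: treat as plain body
    metadata: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not key: value pairs
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return metadata, body


doc = """---
title: Hello
date: 2023-12-14
---
# Heading

Body text.
"""
meta, body = split_front_matter(doc)
print(meta)
print(body)
```

The returned body is what would then be handed to a markdown-to-HTML converter such as marked.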

swc

139 30,053 9.9 Rust

Rust-based platform for the Web

Project mention: Storybook 8 Beta | dev.to | 2024-02-06

First, we switched the default compiler for new projects from Babel to SWC (Speedy Web Compiler). SWC is dramatically faster than Babel and requires zero configuration. We’ll continue to support Babel in any project currently using it.

PostCSS

86 28,210 8.8 TypeScript

Transforming styles with JS plugins

Project mention: PostCSS - my initial experience | dev.to | 2024-01-11

... the plugins on the official PostCSS website were old, like IE6 or the marquee tag, and ...

cheerio

50 27,801 9.7 TypeScript

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

Project mention: 8 NPM Packages for JavaScript Beginners [2024][+tutorials] | dev.to | 2024-04-02

Cheerio is your ticket to the world of server-side magic, allowing you to manipulate HTML and XML documents with jQuery-like syntax. It’s perfect for web scraping, data extraction, or just making sense of the mess that is web content. With Cheerio, you get to play around with the DOM, use CSS selectors, and basically do all the cool things you'd do in the browser, but server-side.

pydantic

167 18,733 9.8 Python

Data validation using Python type hints

Project mention: Advanced RAG with guided generation | dev.to | 2024-04-18

First, note the method prefix_allowed_tokens_fn. This method applies a Pydantic model to constrain/guide how the LLM generates tokens. Next, see how that constraint can be applied to txtai's LLM pipeline.

PHP Parser

11 16,846 8.3 PHP

A PHP parser written in PHP

Project mention: PHP-Parser: A PHP parser written in PHP | news.ycombinator.com | 2024-03-06

vector

97 16,561 9.9 Rust

A high-performance observability data pipeline.

Project mention: What is a low/reasonable cost solution for service log storage and querying? | news.ycombinator.com | 2024-05-05

I am thinking about using https://vector.dev/ but would also love opinions on the best deal for lower or reasonable cost storage/querying of logs. Thanks!

tree-sitter

62 16,625 9.8 Rust

An incremental parsing system for programming tools

Project mention: Lezer: A Parsing System for CodeMirror, Inspired by Tree-Sitter | news.ycombinator.com | 2024-03-24

I learned from a Google search that these days upstream tree-sitter provides WebAssembly bindings.
Source: https://github.com/tree-sitter/tree-sitter/tree/master/lib/b...
NPM: https://www.npmjs.com/package/web-tree-sitter
Download from the latest GitHub release: js file (https://github.com/tree-sitter/tree-sitter/releases/download...) and wasm file (https://github.com/tree-sitter/tree-sitter/releases/download...)

Parsedown

7 14,650 4.8 PHP

Better Markdown Parser in PHP

Project mention: Parsedown: Better Markdown Parser in PHP | news.ycombinator.com | 2024-01-05

jsoniter

12 13,085 0.0 Go

A high-performance 100% compatible drop-in replacement of "encoding/json" (by json-iterator)

Project mention: Handling high-traffic HTTP requests with JSON payloads | /r/golang | 2023-12-07

Since most of the time would be spent decoding JSON, you could try to cut this time using https://github.com/bytedance/sonic or https://github.com/json-iterator/go. Both are drop-in replacements for the stdlib; sonic is faster.

jsoup

27 10,645 9.1 Java

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20

nom

85 9,020 7.4 Rust

Rust parser combinator framework

Project mention: Planespotting with Rust: using nom to parse ADS-B messages | dev.to | 2023-10-28

Just in case you are not familiar with nom, it is a parser combinator written in Rust. The most basic thing you can do with it is import one of its parsing functions, give it some byte or string input, and then get a Result as output with the parsed value and the rest of the input, or an error if the parser failed. tag, for example, is used to recognize literal character/byte sequences.
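
The parser combinator idea described above can be sketched in a few lines of Python. This is an illustration of the pattern, not nom's API: `tag` is modelled loosely on the nom function of the same name, `pair` is a hypothetical helper, `None` stands in for the error case of a Rust Result, and the `(value, rest)` tuple layout is a choice made for this sketch.

```python
from typing import Callable, Optional, Tuple

# A parser takes input text and returns (parsed value, remaining input),
# or None when it fails. None stands in for an error in this sketch.
ParseResult = Optional[Tuple[str, str]]
Parser = Callable[[str], ParseResult]


def tag(literal: str) -> Parser:
    """Build a parser that recognizes an exact literal at the start of the input."""
    def parse(text: str) -> ParseResult:
        if text.startswith(literal):
            return literal, text[len(literal):]
        return None
    return parse


def pair(first: Parser, second: Parser):
    """Combine two parsers: run `first`, then run `second` on what is left."""
    def parse(text: str):
        result1 = first(text)
        if result1 is None:
            return None
        value1, rest = result1
        result2 = second(rest)
        if result2 is None:
            return None
        value2, rest = result2
        return (value1, value2), rest
    return parse


hello = tag("hello")
print(hello("hello world"))      # ('hello', ' world')
print(hello("goodbye"))          # None

greeting = pair(tag("hello"), tag(" world"))
print(greeting("hello world!"))  # (('hello', ' world'), '!')
```

The point of the pattern is that `pair` returns a function with the same shape as its inputs, so small parsers compose into larger ones without any shared state.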

oxc

10 8,927 10.0 Rust

⚓ A collection of JavaScript tools written in Rust.

Project mention: The JavaScript Oxidation Compiler | news.ycombinator.com | 2023-12-16

terser

28 8,432 8.9 JavaScript

🗜 JavaScript parser, mangler and compressor toolkit for ES6+

Project mention: How I use Devbox in my Elm projects | dev.to | 2024-05-02

These projects use Caddy as my local development server, Dart Sass for converting my Sass files to CSS, elm, elm-format, elm-optimize-level-2, elm-review, elm-test (only in Calculator), ShellCheck to find bugs in my shell scripts, and Terser to mangle and compress JavaScript code.

Crafting Interpreters

45 8,166 0.0 HTML

Repository for the book "Crafting Interpreters"

Project mention: Crafting Interpreters | news.ycombinator.com | 2023-12-26

esprima

8 6,962 0.0 TypeScript

ECMAScript parsing infrastructure for multipurpose analysis

Project mention: ESLint: under the hood | dev.to | 2023-11-07

Focusing again on ESLint, the parser used by the linter is called Espree. This is an in-house parser built by the ESLint folks to fully support ECMAScript 6 and JSX on top of the already existing Esprima. The Espree module provides APIs for both tokenization and parsing that you can easily test out.

sh

21 6,790 7.6 Go

A shell parser, formatter, and interpreter with bash support; includes shfmt (by mvdan)

Project mention: Show HN: Hucksh – A Shell with a Good Memory | news.ycombinator.com | 2023-12-21

* The shell itself is https://github.com/mvdan/sh, a bash-like command interpreter

lightningcss

11 5,966 8.7 Rust

An extremely fast CSS parser, transformer, bundler, and minifier written in Rust.

Project mention: I'm fed up with it, so I'm writing a browser | news.ycombinator.com | 2023-09-22

Would you consider using some libraries in your project? There are lots of good ones in the Rust ecosystem, and many of them are not part of any existing browsers.
For example:
- https://github.com/servo/html5ever (HTML parsing - note: this is used in Servo)
- https://github.com/parcel-bundler/lightningcss (CSS parsing)
- https://github.com/DioxusLabs/taffy (web layout)
- https://github.com/pop-os/cosmic-text (text layout and rendering)
Obviously you should be free to work on whatever you like, but just as a benchmark on the scope of your project: I spent ~6 months implementing just the CSS Grid algorithm in Taffy last year. An entire browser from literal scratch is probably a 10 year project for one person.

astexplorer

43 5,953 6.0 JavaScript

A web tool to explore the ASTs generated by various parsers.

Project mention: Understanding Code Structure: A Beginner's Guide to Tree-sitter | dev.to | 2024-04-06

You can play with your code in AST Explorer and visualise its AST.

remarkable

5 5,671 3.9 JavaScript

Markdown parser, done right. Commonmark support, extensions, syntax plugins, high speed - all in one. Gulp and metalsmith plugins available. Used by Facebook, Docusaurus and many others! Use https://github.com/breakdance/breakdance for HTML-to-markdown conversion. Use https://github.com/jonschlinkert/markdown-toc to generate a table of contents.
sqlglot

56 5,573 9.9 Python

Python SQL Parser and Transpiler

Project mention: The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol | news.ycombinator.com | 2024-04-26

This is probably referring to "zero changes to your driver code" and not "zero changes to the SQL you send over this driver".
Translating between SQL dialects is notoriously hard, and attempts to translate [1] work in 95% of cases, but the last 5% would require 5x the amount of work. That's because "SQL dialect" also includes weird edge cases of type inference of things like COALESCE(5, FALSE) and emulation of system catalogs (pg_catalog, information_schema).
[1] https://github.com/tobymao/sqlglot

pdfminer.six

14 5,469 6.8 Python

Community maintained fork of pdfminer - we fathom PDF

Project mention: Code to extract text from pdf to excel | /r/Python | 2023-06-02

I love to use PDFMiner and PDFQuery for this: https://github.com/pdfminer/pdfminer.six and https://towardsdatascience.com/scrape-data-from-pdf-files-using-python-and-pdfquery-d033721c3b28

body-parser

7 5,380 0.0 JavaScript

Node.js body parsing middleware

Project mention: NodeJS Security Best Practices | dev.to | 2024-02-19

Using body-parser, you can set the limit on the size of the payload.
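
The idea behind a payload size limit can be sketched in Python as follows. This is not body-parser's code: body-parser exposes the cap through its `limit` option, while the names `read_json_body`, `PayloadTooLarge`, and `limit_bytes` below are hypothetical and exist only for this illustration.

```python
import io
import json


class PayloadTooLarge(Exception):
    """Raised when a request body exceeds the configured limit."""


def read_json_body(stream, limit_bytes: int = 100 * 1024):
    """Read and decode a JSON body, refusing anything above limit_bytes.

    Reading at most limit_bytes + 1 bytes means an oversized body is
    detected without ever loading all of it into memory.
    """
    data = stream.read(limit_bytes + 1)
    if len(data) > limit_bytes:
        raise PayloadTooLarge(f"body exceeds {limit_bytes} bytes")
    return json.loads(data)


small = io.BytesIO(b'{"user": "alice"}')
print(read_json_body(small, limit_bytes=64))

oversized = io.BytesIO(b"x" * 65)
try:
    read_json_body(oversized, limit_bytes=64)
except PayloadTooLarge as exc:
    print("rejected:", exc)
```

Rejecting the body before decoding it is what makes the limit useful as a defence: the expensive parse step never runs on an oversized payload.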

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Parser related posts

The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol
1 project | news.ycombinator.com | 26 Apr 2024

Advanced RAG with guided generation
2 projects | dev.to | 18 Apr 2024

Understanding Code Structure: A Beginner's Guide to Tree-sitter
2 projects | dev.to | 6 Apr 2024

How to create your own Eslint rule with tests, boosting the DX, and code-review
2 projects | dev.to | 27 Mar 2024

Lezer: A Parsing System for CodeMirror, Inspired by Tree-Sitter
9 projects | news.ycombinator.com | 24 Mar 2024

Difftastic, a structural diff tool that understands syntax
17 projects | news.ycombinator.com | 21 Mar 2024

Programming from Top to Bottom - Parsing
2 projects | dev.to | 18 Mar 2024

Index

What are some of the best open-source Parser projects? This list will help you:

#	Project	Stars
1	marked	31,926
2	swc	30,053
3	PostCSS	28,210
4	cheerio	27,801
5	pydantic	18,733
6	PHP Parser	16,846
7	vector	16,561
8	tree-sitter	16,625
9	Parsedown	14,650
10	jsoniter	13,085
11	jsoup	10,645
12	nom	9,020
13	oxc	8,927
14	terser	8,432
15	Crafting Interpreters	8,166
16	esprima	6,962
17	sh	6,790
18	lightningcss	5,966
19	astexplorer	5,953
20	remarkable	5,671
21	sqlglot	5,573
22	pdfminer.six	5,469
23	body-parser	5,380
