Top 23 Parser Open-Source Projects
-
jsoniter
A high-performance 100% compatible drop-in replacement of "encoding/json" (by json-iterator)
-
remarkable
Markdown parser, done right. Commonmark support, extensions, syntax plugins, high speed - all in one. Gulp and metalsmith plugins available. Used by Facebook, Docusaurus and many others! Use https://github.com/breakdance/breakdance for HTML-to-markdown conversion. Use https://github.com/jonschlinkert/markdown-toc to generate a table of contents.
Next, install gray-matter to extract metadata from the front matter of markdown files, and marked to convert the markdown files to HTML:
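To make the "extract metadata from the front matter" step concrete, here is a minimal Python sketch of the idea. It is not gray-matter's API: `split_front_matter` is a hypothetical helper that assumes simple `key: value` lines between two `---` fences, whereas real front matter is usually full YAML.

```python
def split_front_matter(text):
    """Split a markdown document into (metadata dict, body string)."""
    lines = text.splitlines()
    # Front matter must start on the very first line with a '---' fence.
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing fence is the markdown body.
            body = "\n".join(lines[i + 1:])
            return meta, body
        if ":" in line:
            # Assumption: flat 'key: value' pairs only, no nested YAML.
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    # No closing fence found: treat the whole text as body.
    return {}, text


doc = """---
title: Hello
date: 2024-05-08
---
# Heading

Body text."""

meta, body = split_front_matter(doc)
print(meta)
print(body)
```

The body returned here is what would then be handed to a markdown-to-HTML converter such as marked.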
First, we switched the default compiler for new projects from Babel to SWC (Speedy Web Compiler). SWC is dramatically faster than Babel and requires zero configuration. We’ll continue to support Babel in any project currently using it.
the plugins in the official PostCSS website were old like IE6 or the marquee tag, and
Cheerio is your ticket to the world of server-side magic, allowing you to manipulate HTML and XML documents with jQuery-like syntax. It’s perfect for web scraping, data extraction, or just making sense of the mess that is web content. With Cheerio, you get to play around with the DOM, use CSS selectors, and basically do all the cool things you'd do in the browser, but server-side.
First, note the method prefix_allowed_tokens_fn. This method applies a Pydantic model to constrain/guide how the LLM generates tokens. Next, see how that constraint can be applied to txtai's LLM pipeline.
Project mention: What is a low/reasonable cost solution for service log storage and querying? | news.ycombinator.com | 2024-05-05
I am thinking about using https://vector.dev/ but would also love opinions on the best deal for lower or reasonable cost storage/querying of logs. Thanks!
Project mention: Lezer: A Parsing System for CodeMirror, Inspired by Tree-Sitter | news.ycombinator.com | 2024-03-24
I learned from a Google search that these days upstream tree-sitter provides WebAssembly bindings.
Source: https://github.com/tree-sitter/tree-sitter/tree/master/lib/b...
NPM: https://www.npmjs.com/package/web-tree-sitter
Download from the latest Github release: js file (https://github.com/tree-sitter/tree-sitter/releases/download...) and wasm file (https://github.com/tree-sitter/tree-sitter/releases/download...)
Since most of the time would be spent decoding JSON, you could try to cut this time using https://github.com/bytedance/sonic or https://github.com/json-iterator/go. Both are drop-in replacements for the stdlib; sonic is faster.
Just in case you are not familiar with nom, it is a parser combinator written in Rust. The most basic thing you can do with it is import one of its parsing functions, give it some byte or string input and then get a Result as output with the parsed value and the rest of the input or an error if the parser failed. tag for example is used to recognize literal character/byte sequences.
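The shape described above (a function that takes input and returns the parsed value plus the remaining input, or an error) can be sketched in a few lines of Python. This is an illustration of the parser-combinator idea only, not nom's Rust API; `tag` and `pair` here are hypothetical Python helpers that borrow nom's names.

```python
def tag(expected):
    """Build a parser that recognizes the literal string `expected`."""
    def parser(text):
        if text.startswith(expected):
            # Success: return the unconsumed rest and the matched value.
            return text[len(expected):], expected
        # Failure: the Python stand-in for an Err result.
        raise ValueError(f"expected {expected!r} at {text[:10]!r}")
    return parser


def pair(first, second):
    """Combine two parsers: run them in sequence, return both values."""
    def parser(text):
        rest, a = first(text)
        rest, b = second(rest)
        return rest, (a, b)
    return parser


hello = tag("hello")
rest, value = hello("hello world")
print(repr(rest), repr(value))

greeting = pair(tag("hello"), tag(" world"))
print(greeting("hello world!"))
```

Larger parsers are built the same way: small functions with one shared signature, glued together by combinators like `pair`.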
These projects use Caddy as my local development server, Dart Sass for converting my Sass files to CSS, elm, elm-format, elm-optimize-level-2, elm-review, elm-test (only in Calculator), ShellCheck to find bugs in my shell scripts, and Terser to mangle and compress JavaScript code.
Focusing again on ESLint, the parser used by the linter is called Espree. This is an in-house parser built by the ESLint folks to fully support ECMAScript 6 and JSX on top of the already existing Esprima. The Espree module provides APIs for both tokenization and parsing that you can easily test out.
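Espree is a JavaScript library, so the following is only an analogy: Python's standard library exposes the same split between tokenization and parsing through its `tokenize` and `ast` modules. The sketch shows how a flat token stream differs from a syntax tree; it does not use or imitate Espree's API.

```python
import ast
import io
import tokenize

source = "total = price * 2\n"

# Tokenization: a flat stream of (type, text) pairs, in source order.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
]
print(tokens)

# Parsing: a tree that records structure, not just the order of tokens.
tree = ast.parse(source)
assign = tree.body[0]
print(ast.dump(tree))
```

A linter rule typically walks the tree rather than the tokens, because the tree already knows that `price * 2` is the right-hand side of an assignment.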
* The shell itself is https://github.com/mvdan/sh, a bash-like command interpreter
Would you consider using some libraries in your project? There are lots of good ones in the Rust ecosystem, and many of them are not part of any existing browsers.
For example:
- https://github.com/servo/html5ever (HTML parsing - note: this is used in Servo)
- https://github.com/parcel-bundler/lightningcss (CSS parsing)
- https://github.com/DioxusLabs/taffy (web layout)
- https://github.com/pop-os/cosmic-text (text layout and rendering)
Obviously you should be free to work on whatever you like, but just as a benchmark on the scope of your project: I spent ~6 months implementing just the CSS Grid algorithm in Taffy last year. An entire browser from literal scratch is probably a 10 year project for one person.
Project mention: Understanding Code Structure: A Beginner's Guide to Tree-sitter | dev.to | 2024-04-06
You can play with your code here, and visualise ASTs for the same.
Project mention: The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol | news.ycombinator.com | 2024-04-26
This is probably referring to "zero changes to your driver code" and not "zero changes to the SQL you send over this driver".
Translating between SQL dialects is notoriously hard. Attempts to translate [1] work in 95% of cases, but the last 5% would require 5x the amount of work. That's because "SQL dialect" also includes weird edge cases such as type inference for expressions like COALESCE(5, FALSE) and emulation of system catalogs (pg_catalog, information_schema).
[1] https://github.com/tobymao/sqlglot
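To see why an expression like COALESCE(5, FALSE) is an edge case, it helps to run it on one concrete engine. The sketch below uses SQLite through Python's standard `sqlite3` module, assuming a SQLite build recent enough to recognize the FALSE keyword (3.23 or later). SQLite is used only because it ships with Python; it is not one of the dialects discussed above, and a stricter engine may resolve or reject the mixed integer/boolean arguments differently, which is exactly what a translator has to account for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ask the engine both for the value and for the type it inferred.
value, type_name = conn.execute(
    "SELECT COALESCE(5, FALSE), typeof(COALESCE(5, FALSE))"
).fetchone()

print(value, type_name)
conn.close()
```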
I love to use PDFMiner and PDFQuery for this https://github.com/pdfminer/pdfminer.six https://towardsdatascience.com/scrape-data-from-pdf-files-using-python-and-pdfquery-d033721c3b28
Using body-parser you can set the limit on the size of the payload
Parser related posts
-
The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol
-
Advanced RAG with guided generation
-
Understanding Code Structure: A Beginner's Guide to Tree-sitter
-
How to create your own Eslint rule with tests, boosting the DX, and code-review
-
Lezer: A Parsing System for CodeMirror, Inspired by Tree-Sitter
-
Difftastic, a structural diff tool that understands syntax
-
Programming from Top to Bottom - Parsing
Index
What are some of the best open-source Parser projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | marked | 31,926 |
2 | swc | 30,053 |
3 | PostCSS | 28,210 |
4 | cheerio | 27,801 |
5 | pydantic | 18,733 |
6 | PHP Parser | 16,846 |
7 | vector | 16,561 |
8 | tree-sitter | 16,625 |
9 | Parsedown | 14,650 |
10 | jsoniter | 13,085 |
11 | jsoup | 10,645 |
12 | nom | 9,020 |
13 | oxc | 8,927 |
14 | terser | 8,432 |
15 | Crafting Interpreters | 8,166 |
16 | esprima | 6,962 |
17 | sh | 6,790 |
18 | lightningcss | 5,966 |
19 | astexplorer | 5,953 |
20 | remarkable | 5,671 |
21 | sqlglot | 5,573 |
22 | pdfminer.six | 5,469 |
23 | body-parser | 5,380 |