peg-bootstrap VS 500lines

Compare peg-bootstrap vs 500lines and see what their differences are.

                peg-bootstrap          500lines
Mentions        6                      2
Stars           74                     28
Growth          -                      -
Activity        0.0                    10.0
Last Commit     over 13 years ago      almost 10 years ago
Language        JavaScript             Python
License         -                      GNU General Public License v3.0 or later
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

peg-bootstrap

Posts with mentions or reviews of peg-bootstrap. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-07.
  • Writing a Compiler is Surprisingly Easy (part 1)
    5 projects | news.ycombinator.com | 7 Nov 2023
    A problem that a lot of these series run into is that the author runs out of steam before they finish writing them. Crenshaw's otherwise excellent series suffers from this, for example.

    So far the author of this one has only written the first chapter.

    I've written a few didactic compilers that are complete enough to compile themselves, though nothing else:

    https://github.com/kragen/stoneknifeforth (from a Forth-like language to an i386 Linux ELF executable)

    https://github.com/kragen/peg-bootstrap/blob/master/peg.md (from a PEG language description with semantic actions to JavaScript)

    http://canonical.org/~kragen/sw/urscheme (from a subset of Scheme to AT&T-syntax i386 assembly)

  • PEGs in a PEG
    1 project | news.ycombinator.com | 28 Jun 2023
  • Pegs in a Peg
    1 project | news.ycombinator.com | 5 May 2023
  • A complete compiler and VM in 150 lines of code
    4 projects | news.ycombinator.com | 16 Jul 2022
    That's powerful enough to conveniently write, for example, a numerical root finding program for an arbitrary arithmetic expression.

    But I think that within a complexity budget of 150 lines of code you can maybe be even more ambitious than that.

    The example compiler in https://github.com/darius/parson/blob/master/eg_calc_compile... is a bit more stripped down than that, but in its 32 lines of code it compiles arithmetic assignment statements to a three-address RISC-like code (though using an unbounded number of registers). https://github.com/darius/parson/blob/master/eg_calc_to_rpn.... is a 16-line version that compiles the same language to a stack machine like your tutorial example.

    In 66 lines of code in https://github.com/kragen/peg-bootstrap/blob/master/peg.md I wrote an example compiler which compiles a PEG grammar into a JavaScript parser for that grammar. Admittedly those 66 lines do not include an implementation of JavaScript to run the code on. It compiles the language it's written in.

    In 132 lines of code in https://github.com/kragen/stoneknifeforth/blob/master/tinybo... I wrote an example compiler which compiles a crippled Forth dialect into i386 machine code, including an ELF header so you can run the result. It also compiles the language it's written in. It also doesn't include an i386 emulator to run it on.

    In 83 lines of code in http://canonical.org/~kragen/sw/dev3/neelcompiler.ml Neel Krishnaswami wrote a compiler from the untyped λ-calculus to a simple assembly language for a register machine. It also doesn't include an implementation of the assembly language.

    In 18 lines of code in http://canonical.org/~kragen/sw/dev3/meta5ix.m5, a simplification of META-II, I wrote a compiler from grammar descriptions to an assembly code for a parsing-oriented virtual machine. It compiles the language it's written in. A Python interpreter for the machine is in http://canonical.org/~kragen/sw/dev3/meta5ixrun.py (109 lines of code) and a precompiled version of the compiler-compiler for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm.

    A slightly incompatible variant of Meta5ix which instead compiles itself to C is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (133 lines of code, depending on how you count). (No C compiler is included.) The precompiled C output for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.c.

    Meta5ix is extremely weak and limited, really only enough for a compiler front-end; it can't, for example, do the kinds of RPN tricks we're talking about above.
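
    To make the "RPN tricks" above concrete, here is a minimal Python sketch (illustrative only, not code from any of the projects linked here): a recursive-descent parser that compiles arithmetic expressions into postfix code for a tiny stack machine, plus an interpreter for that code.

        # Minimal sketch: compile +, -, *, / and parentheses into
        # stack-machine code, then run it.  Illustrative only.
        import re

        def tokenize(src):
            return re.findall(r'\d+|[-+*/()]', src)

        def compile_expr(tokens):
            code, pos = [], 0

            def peek():
                return tokens[pos] if pos < len(tokens) else None

            def eat():
                nonlocal pos
                tok = tokens[pos]
                pos += 1
                return tok

            def atom():
                if peek() == '(':
                    eat()
                    expr()
                    assert eat() == ')'
                else:
                    code.append(('push', int(eat())))

            def term():
                atom()
                while peek() in ('*', '/'):
                    op = eat()
                    atom()
                    code.append(('mul' if op == '*' else 'div',))

            def expr():
                term()
                while peek() in ('+', '-'):
                    op = eat()
                    term()
                    code.append(('add' if op == '+' else 'sub',))

            expr()
            return code

        def run(code):
            stack = []
            ops = {'add': lambda a, b: a + b, 'sub': lambda a, b: a - b,
                   'mul': lambda a, b: a * b, 'div': lambda a, b: a // b}
            for ins in code:
                if ins[0] == 'push':
                    stack.append(ins[1])
                else:
                    b, a = stack.pop(), stack.pop()
                    stack.append(ops[ins[0]](a, b))
            return stack.pop()

        print(run(compile_expr(tokenize('(1+2)*3-4'))))   # prints 5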

  • Understanding LSM Trees: What Powers Write-Heavy Databases
    3 projects | news.ycombinator.com | 9 Feb 2022
    Oh goodness, I never thought about using LuaLaTeX for that! I wrote handaxeweb.lua originally for https://github.com/kragen/peg-bootstrap/peg.md but have used it for various things since then.
  • First C compiler ported to GCC
    2 projects | news.ycombinator.com | 5 Apr 2021
    I did pretty much the same thing with https://github.com/kragen/peg-bootstrap/blob/master/peg.md, although admittedly hand-compiling to JS was noticeably less work than hand-compiling to assembly language would have taken. My friend Dave did something similar with Val Schorre's META-II: https://queue.acm.org/detail.cfm?id=2724586

500lines

Posts with mentions or reviews of 500lines. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-07-21.
  • Ask HN: What are some 'cool' but obscure data structures you know about?
    54 projects | news.ycombinator.com | 21 Jul 2022
    I'm thinking that the bar for "obscure" here is maybe pretty low if bloom filters pass?

    Here's a few fun ones.

    — ⁂ —

    The Burrows-Wheeler transform is the second character of each suffix in a (cyclic) suffix array, and there turns out to be a simple algorithm to recover the original text from this transform; live demo: http://canonical.org/~kragen/sw/bwt. But the new text is very easily compressible, which is the basis for bzip2. As mentioned by Blahah and dahart, this is only one of many interesting things you can do with suffix arrays; they can also do efficient regular expression search, for example. This is more interesting since three practical linear-time and -space suffix array construction algorithms were discovered in the early 02000s.
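
    A minimal Python sketch of the round trip, using the textbook rotation-sorting formulation with a '\0' sentinel (which is not necessarily how the linked demo does it):

        # Naive quadratic BWT and inverse, for illustration only.
        def bwt(s):
            s += '\0'                                     # unique smallest sentinel
            rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
            return ''.join(rot[-1] for rot in rotations)  # last column

        def inverse_bwt(last_column):
            # Repeatedly prepend the last column and re-sort; after len(s)
            # rounds the rows are exactly the sorted rotations of the original.
            table = [''] * len(last_column)
            for _ in range(len(last_column)):
                table = sorted(c + row for c, row in zip(last_column, table))
            return next(row for row in table if row.endswith('\0'))[:-1]

        text = 'banana'
        assert inverse_bwt(bwt(text)) == text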

    — ⁂ —

    Pentti Kanerva devised a "fully distributed representation" in the 01990s, which has some features in common with bloom filters — every association stored in an FDR record is stored to an equal extent (probabilistically) in every bit of the record, so erasing some of the bits erases none of the associations but increases the error probability for all of them. You assign a random or pseudorandom bitvector (of, say, 16384 bits) to every atomic value to be stored, represent an association of two (or more) atoms as their XOR, and then take the bitwise majority rule of all the associations to be stored as the record. To retrieve an association, you XOR the record with an atom, then compute the correlation of the XOR result with all possible atoms; the correlation with the correct atoms will be many standard deviations away from the mean, unless the number of associations stored in the record is high enough that conventional data structures wouldn't have been able to store it either.

    This and some related work has blossomed into a field called "hyperdimensional computing" and "vector symbolic architectures". But the problem of finding the nearest atom or atoms seems, on its face, to make this impractical: if you have 16384 possible atoms and 16384 bits in your bitvector, you would seem to need at least 16777216 bit operations to compute all the associations.

    I realized the other night that if you use the rows of a Walsh–Hadamard matrix as the "pseudorandom" atom representations, you can compute all the correlations in linearithmic time with the fast Walsh–Hadamard transform. Moreover, Cohn and Lempel's result from 01977 "on fast M-sequence transforms" is that after a simple transposition you can also use the FWHT to compute all the correlations with the complete output of a maximal LFSR (a so-called "M-sequence") in linearithmic time, so you can also use the cyclic shifts of an LFSR output for your atoms without losing efficiency. Probably someone else has already figured this out previously, but I hadn't found it in the literature.
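
    The bind/bundle/probe cycle is easy to play with in a few lines of numpy. This sketch uses random ±1 vectors and brute-force correlation (none of the Walsh–Hadamard or LFSR cleverness above), and all the names in it are made up for the example:

        # Kanerva-style fully distributed record: bind with the elementwise
        # product (the XOR analogue for +/-1 vectors), bundle with majority
        # rule, retrieve by correlating against every known atom.
        import numpy as np

        rng = np.random.default_rng(0)
        D = 16384
        atoms = {name: rng.choice([-1, 1], size=D)
                 for name in ['name', 'alice', 'age', 'forty-two']}

        def bind(a, b):
            return a * b

        record = np.sign(bind(atoms['name'], atoms['alice']) +
                         bind(atoms['age'], atoms['forty-two']))
        record[record == 0] = 1                 # break majority ties arbitrarily

        def probe(record, key):
            query = bind(record, atoms[key])    # unbind, leaving a noisy value vector
            return max(atoms, key=lambda name: np.dot(query, atoms[name]))

        print(probe(record, 'name'), probe(record, 'age'))   # alice forty-two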

    — ⁂ —

    I'm not sure if the quantities used in reduced affine arithmetic count as a "data structure"? They're a data type, at least, but they're not a container. The idea is that you execute some arbitrary numerical algorithm, not merely on floating-point numbers, or even on dual numbers as in forward-mode automatic differentiation, nor even yet on (min, max) intervals as in ordinary interval arithmetic, but on a vector of coefficients [a₀, a₁, a₂...aₙ], which represents a linear function a₀x₀ + a₁x₁ + a₂x₂ + ... + aₙxₙ, where x₀ = 1 and each of the other xᵢ ∈ [-1, 1], so this function is really a representation of an interval. Then you can assign each of the xᵢ (except x₀ and xₙ) to be the unknown error in one of the input parameters to your algorithm, whose magnitude is determined by the aᵢ value in the value of that parameter; and when you introduce rounding error, you increase aₙ enough to account for it. (The non-reduced kind of affine arithmetic just increases n without limit to account for new roundoff errors.)

    — ⁂ —

    Reduced affine arithmetic has two advantages over the usual kind of interval arithmetic. First, it cancels errors correctly; if you calculate (y+1)(y+2)/(y+3) with interval arithmetic, with some uncertainty in the value of y, you usually get an unnecessarily loose bound on the result because interval arithmetic assumes that the uncertainties in the numerator and the denominator are uncorrelated, when in fact for many values of y they partly cancel out. (You can't apply this cancelation to aₙ, the error stemming from all the roundoffs in your algorithm, just the other aᵢ.) But the more interesting result is that RAA gives you a linear function that approximates the calculation result as a linear function of the inputs, which is pretty similar to calculating the gradient — but with error bounds on the approximation!
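
    A toy version of the arithmetic itself (just addition and multiplication, with every nonlinear and rounding contribution lumped into one trailing term; the class and field names are invented for the example):

        # Toy reduced affine form: mid is a0, terms holds the named ai
        # coefficients, err is the magnitude of the lumped final term an.
        class Affine:
            def __init__(self, mid, terms=None, err=0.0):
                self.mid = mid
                self.terms = dict(terms or {})
                self.err = err

            def __add__(self, other):
                terms = dict(self.terms)
                for k, v in other.terms.items():
                    terms[k] = terms.get(k, 0.0) + v
                return Affine(self.mid + other.mid, terms, self.err + other.err)

            def __mul__(self, other):
                # Linear parts multiply out exactly; the product of the two
                # deviation parts is bounded by radius*radius and lumped into err.
                terms = {k: self.mid * v for k, v in other.terms.items()}
                for k, v in self.terms.items():
                    terms[k] = terms.get(k, 0.0) + other.mid * v
                err = (abs(self.mid) * other.err + abs(other.mid) * self.err
                       + self.radius() * other.radius())
                return Affine(self.mid * other.mid, terms, err)

            def radius(self):
                return sum(abs(v) for v in self.terms.values()) + self.err

            def interval(self):
                return (self.mid - self.radius(), self.mid + self.radius())

        y = Affine(2.0, {'y': 0.1})                    # y = 2 +/- 0.1
        print(((y + Affine(1.0)) * (y + Affine(2.0))).interval())

    Because both factors share the 'y' term, the product's uncertainty tracks their correlation instead of treating the two factors as independent intervals.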

    — ⁂ —

    LSM trees are only about as "obscure" as bloom filters, since they underlie several prominent filesystems, LevelDB, and most of the world's search engines, but if you don't know about bloom filters, you might not know about LSM trees either. I wrote a tutorial explanation of LSM trees in https://github.com/kragen/500lines/tree/master/search-engine, though the name "LSM tree" hadn't yet gone mainstream.

    Binary array sets, as described in https://www.nayuki.io/page/binary-array-set, are closely related to LSM trees, to the point that you could describe them as an application of LSM trees. They are a simple data structure for the exact version of the problem bloom filters solve approximately: maintaining a set S of items supporting the operations S.add(item) and S.contains?(item). Insertion (.add) is amortized constant time, and membership testing (.contains?) is O(log² N).
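
    A short sketch of the structure, following the description on that page (the names are mine):

        # Binary array set: sorted runs whose lengths are distinct powers of
        # two; adding an element "carries" like binary addition, and lookup
        # binary-searches each run.
        import bisect, heapq

        class BinaryArraySet:
            def __init__(self):
                self.levels = []        # levels[k]: None or a sorted list of length 2**k

            def add(self, item):
                carry = [item]
                for k in range(len(self.levels)):
                    if self.levels[k] is None:
                        self.levels[k] = carry
                        return
                    # two full runs of length 2**k: merge them and keep carrying
                    carry = list(heapq.merge(carry, self.levels[k]))
                    self.levels[k] = None
                self.levels.append(carry)

            def __contains__(self, item):
                for level in self.levels:
                    if level is not None:
                        i = bisect.bisect_left(level, item)
                        if i < len(level) and level[i] == item:
                            return True
                return False

        s = BinaryArraySet()
        for x in [5, 3, 9, 7, 1]:
            s.add(x)
        print(7 in s, 4 in s)    # True False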

    — ⁂ —

    A different and more efficient data structure for the exact set membership problem, limited to the case where the items to be stored are up to M integers in the range [0, N), supports constant-time creation, clearing, insertion, size measurement, and membership testing, and linear-time enumeration, but I think cannot be implemented in modern standard C because of its rules about reading uninitialized data. (By allowing creation to be linear time you can implement it in standard C.) It uses two arrays, A[M] and B[N], and a variable n, and i is a member of the set iff B[i] < n and A[B[i]] == i, from which it is straightforward to work out the other methods.

    I don't remember what this data structure is called, but I definitely didn't invent it. I described it and some uses for it in https://dercuano.github.io/notes/constant-time-sets-for-pixe.... You can associate additional values with this structure to use it as a sparse array.
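
    In Python it looks something like this (the class name is made up, M = N = capacity for brevity, and the arrays have to be preallocated, so only the C version really gets constant-time creation):

        # Dense array A of members, sparse array B mapping a value to its slot
        # in A, and a count n; stale entries in A and B never need initializing.
        class SparseSet:
            def __init__(self, capacity):
                self.A = [0] * capacity
                self.B = [0] * capacity
                self.n = 0

            def __contains__(self, i):
                return self.B[i] < self.n and self.A[self.B[i]] == i

            def add(self, i):
                if i not in self:
                    self.A[self.n] = i
                    self.B[i] = self.n
                    self.n += 1

            def clear(self):
                self.n = 0              # constant time; stale entries are harmless

            def __iter__(self):
                return iter(self.A[:self.n])

        s = SparseSet(100)
        s.add(42); s.add(7)
        print(42 in s, 8 in s, list(s))   # True False [42, 7]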

  • Understanding LSM Trees: What Powers Write-Heavy Databases
    3 projects | news.ycombinator.com | 9 Feb 2022
    Groom seems to have forgotten to mention the "merge" part of "log-structured merge" trees, and consequently the "log-structured" part too. He does talk about a "compaction process," but he sort of just doesn't mention the process of selecting when and what to compact, or why the compaction process can use purely sequential I/O, which are the crucial aspects of LSM-tree performance.

    My unfinished attempt from 02014 to explain LSM-trees (and, in particular, full-text search engines using them) with an implementation in 250 lines of Python 2 is at https://github.com/kragen/500lines/tree/master/search-engine. I think it's a more complete explanation (except for tombstones), but it's longer (half an hour of reading) and there aren't as many illustrations.

    On the plus side, you can actually run it:

        $ make
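
    The merge step that gives "log-structured merge" its name boils down to merging sorted runs with purely sequential reads and writes, the newer run shadowing the older one on duplicate keys. A minimal sketch (not code from the repository above):

        # Compact two sorted runs of (key, value) pairs into one, sequentially.
        def compact(newer, older):
            out, i, j = [], 0, 0
            while i < len(newer) and j < len(older):
                if newer[i][0] < older[j][0]:
                    out.append(newer[i]); i += 1
                elif newer[i][0] > older[j][0]:
                    out.append(older[j]); j += 1
                else:                      # same key: the newer value wins
                    out.append(newer[i]); i += 1; j += 1
            out.extend(newer[i:])
            out.extend(older[j:])
            return out

        print(compact([('a', 2), ('c', 5)], [('a', 1), ('b', 3), ('d', 4)]))
        # [('a', 2), ('b', 3), ('c', 5), ('d', 4)]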

What are some alternatives?

When comparing peg-bootstrap and 500lines you can also consider the following projects:

stoneknifeforth - a tiny self-hosted Forth implementation

multiversion-concurrency-contro

go-sstables - Go library for protobuf compatible sstables, a skiplist, a recordio format and other database building blocks like a write-ahead log. Ships now with an embedded key-value store.

proofs - My personal repository of formally verified mathematics.

parson - Yet another PEG parser combinator library and DSL

first-cc-gcc - The first C compiler made to work under modern GCC