Professor Donald Knuth on Writing and More, an Interview [pdf]

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • nestor

    A parallel implementation of NESTOR in Python 3

  • (In fact his earlier non-literate TeX78 in the SAIL language was written as a bunch of separate modules rather than a single 25000-line file like the literate/"better" tex.web.)

    So that's a bit about the genesis of literate programming. Now about some of your other comments:

    - Your quote about top-down programming from literateprogramming.com is from the author of fweb, but actually Knuth himself doesn't advocate top-down programming. One of the things he likes about literate programming is that the program can be written in a mixture of bottom-up and top-down programming without being forced to commit to either; from studying many of his programs it's clear that his preference is to build things mostly bottom-up, though each small "layer" he may write top-down. (For example, in his TeX program he starts with "chapters" on string handling, arithmetic, memory allocation, data structures etc, building up to the main program as the climax of the book. Each "chapter", which is sometimes just a single function, is written top-down, more or less.)

    - You linked to how the PBRT book introduces literate programming, and said it shows noweb/LP is "still too mired by constraints of C", but it's just that the example they used for the idea of rearranging sections is one of overcoming some constraints of C using LP. (It's not really specific to C though; the same thing would apply whenever one had a lot of initialization to do in different places, like the giant object constructors one can see in many C++ codebases.)

    - Tastes are subjective, but IMO the program he wrote for Bentley's question of "top k most common words" (not his choice of problem!) is actually very interesting: it's an exposition of an ingenious data structure called a (packed) hash trie that has, to the best of my knowledge, never been described anywhere else, and which he probably made up on the spot for this problem. (Packed tries are described in an exercise in TAOCP and the TeX program uses them to store hyphenation patterns—his student Liang's thesis was about this—but packing them randomly using an open-addressing hash table seems to be a twist particular to this program.) I was planning to write a blog post last year illustrating this with pictures and all that, but never got around to it. I did write a translation into (non-literate) C++ here: https://codegolf.stackexchange.com/a/197870

    - Returning to a couple of your other comments and the phrase "top-down style associated with literate programming": I think another way of looking at this is historical. When Knuth learned to program, in machine code on an IBM 650 and later writing compilers in machine code (or with his own assemblers) other machines, even the idea of subroutines (with standard calling conventions and all that) was hardly well-established, which is why TAOCP Chapter 1 has, under 1.4 "Some Fundamental Programming Techniques" an explanation of what subroutines are. When "Structured Programming" took over in the 1970s (just the idea of using if/while/for and subroutines instead of arbitrary control-flow jumps), and even that Knuth is not still on board with. (If anyone is interested I'll write a blog post on Knuth's opinion on/defence of "goto", which he still uses.) Since then, the programming / software-engineering world at large has settled on certain ideas of structuring programs, in order to keep complexity manageable: the use of abstraction, encapsulation, modules, information-hiding, objects, avoiding global variable,s etc. In some sense what Knuth came up with for himself with WEB/CWEB is a wholesale alternative to all these ways of structuring programs. This seems to suffice for him, but for the rest of us it's a highly unfamiliar/unpalatable way of writing programs. This doesn't mean "Knuth is doing it wrong" in the sense of Kartik's post, because what you should take away as "Literate Programming" from reading his programs is not the diff w.r.t. how a modern programmer would it, but the diff w.r.t. how he would write it without LP. That is, if you take for granted that the program is going to have very few functions, lots of global variables, etc., the question is whether LP is a better way of writing or presenting that program than non-LP. (It seems yes, though even this is debatable. Besides, Knuth does say that without LP he'd probably write more levels of functions but with LP he doesn't "have to", so you may disagree with my assertion too.) You should instead look at examples of programs that a modern programmer inculturated in modern dogma wrote with and without LP. I imagine these days one might have better results searching for people using org-babel (rather than cweb/noweb), or the https://github.com/AndrewOwenMartin/nestor mentioned in another thread above, or codebase of/written in nbdev that came up recently (https://www.fast.ai/2019/12/02/nbdev/ , https://github.blog/2020-11-20-nbdev-a-literate-programming-...).

    - But that apart, returning once again to Knuth's own literate programs. I have struggled a lot with them. I'm sure there are many people who have understood more than me, but I suspect I have struggled more than anyone :D I had some grand plans to make it more readable etc (https://shreevatsa.net/tex/program), but eventually they started making more sense as they are. (I should update those pages…) The point is, literate programming does not necessarily mean writing code in a vacuum for readers who know nothing. Knuth does not imagine explaining language features (as you pointed out with the INFORM example in another comment here), and moreover he also assumes familiarity with his LP conventions (the overall structure of the program, how to use the index and flip back-and-forth) and even I think his usual programming conventions. So the fact that the ADVENTURE program has the source file's outline (with #includes and everything at the top) on the second page, with a typedef of the "boolean" type and a couple of general-purpose variables declared at the top of "main" (which will probably be (ab)used for lots of different purposes, I expect)—all of these are not a problem, for the "literate" reader. To Knuth, literate programming does not mean belabouring and explaining everything; it's just a way of structuring your program. And though he mentions reading the program like a book, he doesn't think of books as things to be read strictly word-for-word in the sequence they appear; he imagines reading any book non-linearly (looking up the index, flipping back-and-forth, skimming, etc) and that's how the programs are to be read too: the order is just more or less the "right" one, not strictly.

    About the ADVENTURE program in particular, let me post a separate comment as this is already very long.

  • TypeScript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • > This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation or Java Javadoc systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.[12][13]

    I was then curious if any tooling has been created for literate programming in TypeScript. I haven't given a close look yet, but here are some relevant links:

    - https://github.com/microsoft/TypeScript/issues/31364

  • SurveyJS

    Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.

    SurveyJS logo
  • knuth-literate-programs

    Examples of literate programming by Knuth

  • So with that understanding in the comment above, I think one is supposed to glean that the following is the structure of the ADVENT program (http://literateprogramming.com/adventure.pdf or download from https://github.com/shreevatsa/knuth-literate-programs/blob/m... for working hyperlinks):

    - In "Introduction" (sections 1–3), the program's overall outline is given,

    - In "The Vocabulary" (sections 4–17), we have data structures for storing words (the hash table), how they map to motions and objects and other responses from the program,

    - In "Cave Data" (sections 18–20) there are the 100+ locations one can be in during the game,

    - In "Cave Connections" (sections 21–62) there's the game's "map" data: how you get around, what messages you get, etc,

    - In "Data structures for objects" (sections 63–68), there's data structures for the list of (game) objects at a given location, for groups of objects, and object properties (like its name and in-game note),

    - In "Object data" (sections 69–70) these are populated,

    - "Low-level input" (sections 71–73), is just asking yes/no questions and taking in commands,

    - The next chapter, "The main control loop" (sections 74–91) begins with some comments very revealing about the author's programming style / thinking:

    > Now we’ve got enough low-level mechanisms in place to start thinking of the program from the top down, and to specify the high-level control. […] Here is our overall strategy for administering the game. […] The execution consists of two nested loops: There are “minor cycles” inside of “major cycles.” Actions define minor cycles in which you stay in the same place and we tell you the result of your action. Motions define major cycles in which you move and we tell you what you can see at the new place.

    etc. And this chapter, in turn using many of the following ones, is what populates one of the main sections that were mentioned in the outline of the program on Page 2. There's parsing of the user's input, dealing with edge cases, until we figure out where to `goto`.

    - The next chapter, "Simple verbs" (sections 92–103) starts with “Let’s get experience implementing the actions by dispensing with the easy cases first” i.e. it's the start of several chapters implementing the action procedures we invoke (or to which we "goto") from the previously mentioned main control loop.

    - The next chapter, "Liquid assets" (sections 104–115) starts with something that I also find revealing: "Readers of this program will already have noticed that the BOTTLE is a rather complicated object, since it can be empty or filled with either water or oil" (so he expects that your reading will have noticed this already?)

    - Then "The other actions" (sections 116–139) is what you'd expect: "Now that we understand how to write action routines, we’re ready to complete the set" (must say, this whole game looks a lot of fun! Thanks Crowther and Woods!)

    - Then "Motions" (sections 140–153) (filling out a section referred to in the main control loop above), "Random Numbers" (sections 154–158, half a page), "Dwarf stuff" (sections 159–176), "Closing the Cave" (177–182), and "Death and Resurrection" (183–192), "Scoring" (193–199), and "Launching the program" (section 200).

    Well, with my hard-won familiarity with WEB/CWEB conventions (I bet you didn't expect the table of contents to be on the last page) and with Knuth's style (there's often a main control loop branching off to various action procedures), the above was what I understand of the overall structure of the program, and it took me about a second or two per page, maybe a bit more here and there, less than five minutes overall. But now I feel oriented enough that I can now understand the program as a whole, and I think literate programming (though I haven't even started actually reading it!) makes the program more helpful than if it was written in "for the compiler" order with the same soup of functions and gotos. But to reiterate, this is about better presentation of programs written in straightforward "just tell the computer what to do" style. It is conceivable that with modern programming conventions, with just the right abstractions and objects and whatnot, one could produce something more readable. But that's not relevant here, I think (and besides maybe that program too could be reordered, written literately?)

    Now, one could ask: why doesn't Knuth just write out this outline at the beginning? Why is the heart of how the program works, the main control loop, mentioned neither at beginning nor end but in the middle of the book, section 74 of 200? Is it fair to the reader to have so many conventions that are not explained in the program itself? Well, I have no end of frustrations with trying to read Knuth's literate programs (see my complaints and jokes in the page I linked above and in the slides), but still I think the answer would be: (1) Every house style has its conventions that must be learned (like: "variables ending with an underscore are private member variables in this codebase"), (2) It would be extra work to write this all out when the reader can just skim and find out, and for Knuth, literate programming (and programming in general) is supposed to be fun, not drudgery. So while I think Knuth-style literate programming has problems, I think the more interesting problems are not the surface-level ones.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts