Hyperlink Maximalism (2022)

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • archivebox-browser-extension

    Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

  • https://chromewebstore.google.com/detail/habonpimjphpdnmcfka... (or https://github.com/tjhorner/archivebox-exporter for source)

    Pushes your history to ArchiveBox, which does the heavy lifting storing/processing the content.

    Alas, might not work with Epiphany because there's no complete extension support.

    But IIRC, it stores its URLs in $XDG_DATA_HOME/epiphany/ephy-history.db, so a bit of SQLite and ArchiveBox might do the trick for you.

    Note: I'm running something similar, but I find that I'd rather not rely on my history, since I tend to click on a lot of garbage ;) You might want to curate a bit.
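The "bit of sqlite" suggested above could look roughly like the following sketch. It assumes the history database has a table named `urls` with a text column `url`; that schema is an assumption, not something stated in the comment, so inspect the file (for example with `.schema` in the `sqlite3` shell) before relying on it. The function names and the `skip_substrings` curation filter are hypothetical.

```python
import os
import sqlite3
import sys
from pathlib import Path


def default_history_path() -> Path:
    """Path named in the comment above; falls back to ~/.local/share if XDG_DATA_HOME is unset."""
    data_home = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(data_home) / "epiphany" / "ephy-history.db"


def history_urls(db_path, skip_substrings=()):
    """Return distinct http(s) URLs from the history database, sorted, minus unwanted ones.

    ASSUMPTION: the database has a table `urls` with a text column `url`.
    """
    # Open read-only so the browser's database is never modified or created by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT DISTINCT url FROM urls ORDER BY url").fetchall()
    finally:
        conn.close()
    urls = [r[0] for r in rows if r[0] and r[0].startswith(("http://", "https://"))]
    # Crude curation step: drop anything containing a blocked substring.
    return [u for u in urls if not any(s in u for s in skip_substrings)]


def main():
    path = default_history_path()
    if not path.exists():
        print(f"no history database at {path}", file=sys.stderr)
        return
    for url in history_urls(path, skip_substrings=("doubleclick",)):
        print(url)


if __name__ == "__main__":
    main()
```

The script prints one URL per line, so the output can be reviewed by hand first and then piped to ArchiveBox (for instance `python epiphany_urls.py | archivebox add`, assuming your ArchiveBox version reads URLs from standard input).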

  • gwern.net

    Site infrastructure for gwern.net (CSS/JS/HS/images/icons). Custom Hakyll website with unique automatic link archiving, recursive tooltip popup UX, dark mode, and typography (sidenotes+dropcaps+admonitions+inflation-adjuster).

  • How to add hyperlinks is something I've thought a bit about for Gwern.net: there's no point having all these fancy popups if there are no hyperlinks exploiting them, right?

    The way I currently do it is that first, I make hyperlinks stable by automatically snapshotting & making local archives of pages (https://gwern.net/archiving#preemptive-local-archiving). There is no point in adding links if linkrot discourages anyone from using them, of course, and I found that manual linkrot fixing did not scale to the amount of writing & hyperlinking I want to do.

    The next step is adding links automatically. Particularly in the STEM topic I write most about these days, AI, there are many acronyms & named systems which mean specific things but which are easy to get lost in. Fortunately, that makes them easy to write automatic link rules for: https://github.com/gwern/gwern.net/blob/master/build/Config/... These run automatically on essay bodies when compiling the site, and on annotations when created. If a rule's URL is already present, the rule doesn't run; if it's not, only the first instance of the term gets linked and the rest are skipped. (This is important: there are some approaches which take the lazy approach of hyperlinking every instance. This is bad and discredits linking.) This code is very slow but fast enough for static site building, anyway.
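The rule behaviour described above (skip a rule whose URL is already present, otherwise link only the first instance) can be sketched in a few lines. This is an illustrative text-level Python version over Markdown, not the site's actual code; the names `auto_link` and `_inside_link`, the example URL, and the choice to skip matches that fall inside an existing link are all assumptions.

```python
import re

# Matches existing inline Markdown links such as [text](url).
LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")


def _inside_link(text, start, end):
    """True if the span [start, end) lies within an existing Markdown link."""
    return any(m.start() <= start and end <= m.end() for m in LINK.finditer(text))


def auto_link(text, rules):
    """Apply (term, url) rules: at most one new link per rule, none if the URL is already used."""
    for term, url in rules:
        if url in text:
            continue  # the URL is already linked somewhere, so the rule does not run
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        for m in pattern.finditer(text):
            if _inside_link(text, m.start(), m.end()):
                continue  # assumption: never nest a link inside another link
            text = text[:m.start()] + f"[{term}]({url})" + text[m.end():]
            break  # only the first eligible instance is linked
    return text


rules = [("GPT-3", "https://example.org/gpt-3")]  # hypothetical rule
print(auto_link("GPT-3 is large. GPT-3 was followed by GPT-4.", rules))
```

Rescanning the text for every rule is quadratic in the worst case, which matches the comment's remark that this kind of pass can be slow yet acceptable for a static build.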

    Sometimes terms are too ambiguous or too rare or too much work to write an explicit rewrite rule for. But they will still exist on-site. In fact, you can say that the site corpus defines a set of rewrite rules: every time I write by hand `[foo](http://bar)`, am I not implicitly saying that there ought to be a rewrite rule for the string `foo` which ought to hyperlink `http://bar`? So there is a script (https://github.com/gwern/gwern.net/blob/master/build/link-su...) which will parse the site corpus, compile all the text/link pairs, add/remove a bunch of them per whitelists/blacklists and a frequency/length threshold, and then generate a bunch of Emacs Lisp pairs. This master list of rewrites then gets read by an Elisp snippet in my Emacs and turned into several thousand interactive search-and-replace commands when I run my generic formatting command on a buffer.

    The effect of this second script is that after I have linked `Foo et al 2023` to `/doc/2023-foo.pdf` a few times (perhaps I went back and hyperlinked all instances of it after realizing it's an important paper), any future instances of 'Foo et al 2023' will pop up a search-and-replace asking to hyperlink it to `/doc/2023-foo.pdf`, and so on.
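A minimal sketch of the corpus-derived suggestion idea, assuming the corpus is available as Markdown strings. It collects `[text](url)` pairs, applies a frequency threshold, a length threshold, and a blacklist, and returns candidate pairs. The function name, the default thresholds, and the rule of dropping anchor texts that point to more than one URL are assumptions for illustration; the real script emits Emacs Lisp pairs, which this sketch does not attempt.

```python
import re
from collections import Counter

# Captures (anchor text, URL) from inline Markdown links.
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def suggest_rules(documents, min_count=2, min_length=4, blacklist=frozenset()):
    """Return (anchor, url) pairs that recur often enough to be worth suggesting."""
    counts = Counter()
    for doc in documents:
        counts.update(LINK.findall(doc))

    # Record every URL each anchor text points to, so ambiguous anchors can be dropped.
    targets = {}
    for anchor, url in counts:
        targets.setdefault(anchor, set()).add(url)

    rules = []
    for (anchor, url), n in counts.items():
        if n < min_count or len(anchor) < min_length:
            continue  # too rare, or too short to be a safe search string
        if anchor in blacklist:
            continue
        if len(targets[anchor]) > 1:
            continue  # assumption: an anchor with several targets is too ambiguous
        rules.append((anchor, url))
    # Longest anchors first, so a long phrase is offered before any shorter substring of it.
    return sorted(rules, key=lambda r: (-len(r[0]), r[0]))


corpus = [
    "[Foo et al 2023](/doc/2023-foo.pdf) shows X. See [here](/a).",
    "As [Foo et al 2023](/doc/2023-foo.pdf) found; [here](/b), [here](/b).",
]
print(suggest_rules(corpus))
```

Each returned pair would then drive one interactive search-and-replace prompt, leaving the final decision to the writer.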

    Third, I exploit my link-recommendations for manually-curated 'see also' sections appended to annotations. I have a fairly standard link-recommender approach where each annotation is embedded by a neural network (OA API for now), and one does nearest-neighbor lookups to find _n_ 'similar' annotations, and shows them to the reader in case any are relevant. So far so good. But I also run that after editing each annotation: it embeds the annotation, looks up recommendations, and spits out an HTML list of the top 20 or so similar links, appended to the annotation. I can look at that and delete the irrelevant entries, or the entire list. This means that they'll be included in the final embedded version of the annotation, will show up in any fulltext searches I run, are more visible to the reader, can be edited into the main body if I want to, etc.
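The nearest-neighbor step can be illustrated with a small sketch. It assumes the embeddings have already been computed elsewhere and are held in a dict mapping each annotation's URL to a vector; cosine similarity, the function names, and the HTML shape are assumptions, and the toy two-dimensional vectors stand in for real embeddings.

```python
import html
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similar(target_id, embeddings, n=20):
    """Return the ids of the n annotations most similar to target_id, best first."""
    target = embeddings[target_id]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != target_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id for stable output
    return [other for _, other in scored[:n]]


def see_also_html(ids):
    """Render the candidates as an HTML list that can be pruned by hand afterwards."""
    items = "\n".join(
        f'  <li><a href="{html.escape(i, quote=True)}">{html.escape(i)}</a></li>' for i in ids
    )
    return f"<ul>\n{items}\n</ul>"


toy = {"/a": [1.0, 0.0], "/b": [0.9, 0.1], "/c": [0.0, 1.0], "/d": [-1.0, 0.0]}
print(see_also_html(similar("/a", toy, n=2)))
```

A brute-force scan like this is linear in the number of annotations per query; a larger corpus would probably want an approximate nearest-neighbor index instead.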

    Fourth and most recently, I've been experimenting with GPT-4 for auto-formatting & auto-linking (https://github.com/gwern/gwern.net/blob/master/build/paragra...). GPT-4 has memorized many URLs, and where it hasn't, it still makes pretty good guesses. So, as part of the standard formatting passes, I pass annotations through GPT-4, with a bit added to its prompt, 'try to add useful hyperlinks to Wikipedia and other sources'. It often does, and it's quite convenient when that works. GPT-4 still confabulates URLs more often than I'd like, and it sometimes adds too-obvious WP links which I have to delete. So, still some adjustments required there.

    And these work well with the other site features like recursive popups, or bidirectional backlinks (https://gwern.net/design#backlink).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Show HN: My related-posts finder script (with LLM and GPT4 enhancement)

    1 project | news.ycombinator.com | 8 Dec 2023
  • [Media] Nested browsing the Rust docs

    1 project | /r/rust | 4 Jan 2023
  • 本网站的设计(2021年) (Design of This Website (2021))

    1 project | /r/hnzh | 6 Apr 2022
  • Design of This Website (2021)

    1 project | /r/WhileTrueCode | 6 Apr 2022
  • Design of This Website

    1 project | /r/patient_hackernews | 6 Apr 2022