jupyterlab-lsp VS gwern.net

Compare jupyterlab-lsp vs gwern.net and see how they differ.

jupyterlab-lsp

Coding assistance for JupyterLab (code navigation + hover suggestions + linters + autocompletion + rename) using Language Server Protocol (by jupyter-lsp)

gwern.net

Site infrastructure for gwern.net (CSS/JS/HS/images/icons). Custom Hakyll website with unique automatic link archiving, recursive tooltip popup UX, dark mode, and typography (sidenotes+dropcaps+admonitions+inflation-adjuster). (by gwern)
                 jupyterlab-lsp                           gwern.net
Mentions         17                                       16
Stars            1,730                                    434
Growth           2.6%                                     -
Activity         9.4                                      9.9
Latest commit    2 days ago                               3 days ago
Language         TypeScript                               Haskell
License          BSD 3-clause "New" or "Revised" License  MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

jupyterlab-lsp

Posts with mentions or reviews of jupyterlab-lsp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-07-30.

gwern.net

Posts with mentions or reviews of gwern.net. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-25.
  • Show HN: My related-posts finder script (with LLM and GPT4 enhancement)
    1 project | news.ycombinator.com | 8 Dec 2023
    I do something similar on my website ( https://www.gwern.net ; crummy code at https://github.com/gwern/gwern.net/ ) for the 'similar' feature: call OA API with embedding, nearest-neighbor via cosine, list of links for suggested further reading.

    Because it's a static site, managing the similar links poses the difficulties OP mentions: where do you store & update it? In the raw original Markdown? We solve it by transclusion: the list of 'similar' links is stored in a separate HTML snippet, which is just transcluded into the web page on demand. The snippets can be arbitrarily updated without affecting the Markdown essay source. We do this for other things too; it's a handy design pattern for static sites, making things more compositional (allowing one HTML snippet to be reused in arbitrarily many places, or allowing 'extremely large' pages) at the cost of some client-side work doing the transclusion.

    I refine it in a couple of ways. I don't need to call GPT-4 for summarization because the links all have abstracts/excerpts; I usually write abstracts for my own essays/posts (which everyone should do, and if the summaries are good enough to embed, why not just use them yourself for your posts? That would also help with your cache & cost issues, and be more useful than the 'explanation'). Then I also throw in the table of contents (which is implicitly an abstract), available metadata like tags & authors, and I further throw into the embeddings a list of the parsed links as well as reverse citations/backlinks. My assumption is that these improve the embedding by explicitly listing the URLs/titles of references, and which other pages find a given thing worth linking to.

    Parsing the links means I can improve the list of suggestions by deleting anything already linked in the article (a sketch of this embed/nearest-neighbor/filter pipeline appears after this comment). OP has so few posts that this may not be a problem for him, but if you are heavily hyperlinking and also have good embeddings (like I do), this will happen a lot, and it is annoying to a reader to be suggested links he has already seen and either looked at or ignored. This also means that it's easy to provide a curated 'see also' list: simply dump the similar list at the beginning, and keep the ones you like. They will be filtered out of the generated list automatically, so you can present known-good ones upfront and then the similars provide a regularly updated list of more. (Which helps handle the tension he notes between making a static list up front while new links regularly enter the system.)

    One neat thing you can do with a list of hits, that I haven't seen anyone else do, is change how you sort them. The default presentation everyone uses is to simply present them in order of distance to the target. This is sorta sensible because you at least see the 'closest' first, but the more links you have, the smaller the differences are, and the more that sorting looks completely arbitrary. What you can do instead is sort them by their distance to each other: if you do that, even in a simple greedy way, you get a list which automatically clusters by topic (a greedy version is sketched after this comment). (Imagine there are two 'clusters' of topics equidistant to the current article; the default distance sort would give you something random-looking like A/B/B/A/B/A/A/A/B/B/A, which is painful to read, but if you sort by distance to each other to minimize the total distance, you'd get something more like B/B/B/B/B/B/A/A/A/A/A/A.) I call this 'sort by magic' or 'sort by semantic similarity': https://gwern.net/design#future-tag-features

    Additional notes: I would not present 'Similarity score: 79% match', because I assume this is just the cosine distance, which is equal for both suggestions (and therefore not helpful) and also is completely embedding-dependent and basically arbitrary. (A good heuristic is: would it mean anything to the reader if the number were smaller, larger, or had one less digit? A 'similarity score' of 89%, or 7.9, or 70%, would all mean the same thing to the reader - nothing.)

    > Complex or not, calculating cosine similarity is a lot less work than creating a fully-fledged search algorithm, and the results will be of similar quality. In fact, I'd be willing to bet that the embedding-based search would win a head-to-head comparison most of the time.

    You are probably wrong. A full search algorithm, using exact word-count indexes of everything, is highly competitive with embedding search. If you are interested, the baseline you're looking for in research papers on retrieval is 'BM25' (a minimal scorer is sketched after this comment).

    > For each post, the script then finds the top two most-similar posts based on the cosine similarity of the embedding vectors.

    Why only the top two? It's at the bottom of the page; you're hardly hurting for space.
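
The pipeline described in the comment above (embed each page's metadata-rich text, find nearest neighbors by cosine similarity, drop anything the page already links) can be sketched in a few lines. This is an illustrative Python sketch with hypothetical field names and a placeholder embedding call, not gwern.net's actual (Haskell) implementation:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for an embedding API call (e.g. the OpenAI embeddings endpoint)."""
    raise NotImplementedError

def page_text(page: dict) -> str:
    # Concatenate abstract, table of contents, tags/authors, outgoing link titles,
    # and backlink titles, so that references also inform the embedding.
    return "\n".join([
        page["abstract"],
        page["toc"],
        " ".join(page["tags"] + page["authors"]),
        " ".join(page["outgoing_link_titles"]),
        " ".join(page["backlink_titles"]),
    ])

def similar_links(target: dict, corpus: list[dict], n: int = 20) -> list[dict]:
    tv = embed(page_text(target))
    scored = []
    for page in corpus:
        if page["url"] == target["url"] or page["url"] in target["outgoing_links"]:
            continue  # skip the page itself and anything the article already links
        v = embed(page_text(page))
        cos = float(tv @ v / (np.linalg.norm(tv) * np.linalg.norm(v)))
        scored.append((cos, page))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in scored[:n]]
```

In practice the embeddings would be computed once and cached rather than recomputed for every comparison.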
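
And a minimal sketch of the greedy 'sort by magic' ordering: start from the hit closest to the target, then repeatedly append the remaining hit nearest to the last one picked, so hits from the same topic end up adjacent. Function and variable names are illustrative:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sort_by_magic(hits: list[np.ndarray]) -> list[int]:
    """Return indices of `hits` (embedding vectors, already ordered by distance
    to the target) re-ordered greedily by distance to the previous pick."""
    if not hits:
        return []
    remaining = list(range(len(hits)))
    order = [remaining.pop(0)]           # start from the hit closest to the target
    while remaining:
        last = hits[order[-1]]
        nearest = min(remaining, key=lambda i: cosine_distance(last, hits[i]))
        remaining.remove(nearest)
        order.append(nearest)
    return order
```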
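
For reference, the BM25 baseline mentioned above is simple enough to sketch directly. This is the standard Okapi BM25 scoring formula over tokenized documents, a sketch rather than a production search engine:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document in `docs` against the tokenized `query`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(term for d in docs for term in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```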

  • Hyperlink Maximalism (2022)
    2 projects | news.ycombinator.com | 25 Jul 2023
    How to add hyperlinks is something I've thought a bit about for Gwern.net: there's no point having all these fancy popups if there are no hyperlinks exploiting them, right?

    The way I currently do it is that first, I make hyperlinks stable by automatically snapshotting & making local archives of pages (https://gwern.net/archiving#preemptive-local-archiving). There is no point in adding links if linkrot discourages anyone from using them, of course, and I found that manual linkrot fixing did not scale to the amount of writing & hyperlinking I want to do.

    The next step is adding links automatically. Particularly in the STEM topics I write most about these days, such as AI, there are many acronyms & named systems which mean specific things but are easy to get lost among. Fortunately, that makes them easy to write automatic link rules for: https://github.com/gwern/gwern.net/blob/master/build/Config/... These run automatically on essay bodies when the site is compiled, and on annotations when they are created. If a URL is already present, its rule doesn't run; and if it's not, only the first instance gets linked and the rest are skipped (a sketch of this behavior follows this comment). (This is important: some tools take the lazy approach of hyperlinking every instance, which is bad and discredits linking.) This code is very slow, but fast enough for static site building anyway.

    Sometimes terms are too ambiguous, too rare, or too much work to write an explicit rewrite rule for. But they will still exist on-site. In fact, you can say that the site corpus defines a set of rewrite rules: every time I write by hand `[foo](http://bar)`, am I not implicitly saying that there ought to be a rewrite rule which hyperlinks the string `foo` to `http://bar`? So there is a script (https://github.com/gwern/gwern.net/blob/master/build/link-su...) which will parse the site corpus, compile all the text/link pairs, create/remove a bunch of them per whitelists/blacklists and a frequency/length threshold, and then generate a bunch of Emacs Lisp pairs (the mining step is sketched after this comment). This master list of rewrites then gets read by an Elisp snippet in my Emacs and turned into several thousand interactive search-and-replace commands when I run my generic formatting command on a buffer.

    The effect of this second script is that after I have linked `Foo et al 2023` to `/doc/2023-foo.pdf` a few times (perhaps I went back and hyperlinked all instances of it after realizing it's an important paper), any future instances of 'Foo et al 2023' will pop up a search-and-replace asking to hyperlink it to `/doc/2023-foo.pdf`, and so on.

    Third, I exploit my link-recommendations for manually-curated 'see also' sections appended to annotations. I have a fairly standard link-recommender approach where each annotation is embedded by a neural network (OA API for now), and one does nearest-neighbor lookups to find _n_ 'similar' annotations and shows them to the reader in case any are relevant. So far so good. But I also run that embed/recommend/list step after editing each annotation, and it spits out an HTML list of the top 20 or so similar links appended to the annotation. I can look at that and delete the irrelevant entries, or the entire list. This means that they'll be included in the final embedded version of the annotation, will show up in any full-text search I run, are more visible to the reader, can be edited into the main body if I want to, etc.

    Fourth and most recently, I've been experimenting with GPT-4 for auto-formatting & auto-linking (https://github.com/gwern/gwern.net/blob/master/build/paragra...). GPT-4 has memorized many URLs, and where it hasn't, it still makes pretty good guesses. So, as part of the standard formatting passes, I pass annotations through GPT-4, with a bit added to its prompt: 'try to add useful hyperlinks to Wikipedia and other sources'. It often does, and it's quite convenient when that works. GPT-4 still confabulates URLs more often than I'd like, and sometimes adds too-obvious WP links that I have to delete. So, still some adjustments required there.

    And these work well with the other site features like recursive popups, or bidirectional backlinks (https://gwern.net/design#backlink).
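
A rough illustration of the 'first unlinked instance only' behavior described in the comment above. The real rules live in gwern.net's Haskell build code; this Python regex version, with made-up rule data, just shows the two guards (skip the rule if its URL is already present, and link only the first match):

```python
import re

def auto_link(html: str, rules: dict[str, str]) -> str:
    """Apply pattern -> URL rules, linking only the first unlinked instance."""
    for pattern, url in rules.items():
        if url in html:
            continue  # the URL is already linked somewhere: don't add a duplicate
        # Replace only the first occurrence; a real implementation would walk the
        # HTML tree to avoid matching inside existing <a> tags or attributes.
        html = re.sub(pattern,
                      lambda m: f'<a href="{url}">{m.group(0)}</a>',
                      html, count=1)
    return html

example_rules = {r"\bGPT-4\b": "https://en.wikipedia.org/wiki/GPT-4"}
print(auto_link("<p>GPT-4 and GPT-4 again.</p>", example_rules))
# only the first "GPT-4" becomes a hyperlink
```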
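
And a sketch of mining the corpus for implicit rewrite rules: every hand-written `[foo](http://bar)` becomes a candidate "foo maps to http://bar" pair, filtered by frequency and length thresholds. Paths, thresholds, and the output shape here are illustrative assumptions; the real script also applies whitelists/blacklists and emits Emacs Lisp pairs for interactive search-and-replace:

```python
import re
from collections import Counter
from pathlib import Path

LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")  # matches [text](url)

def mine_link_pairs(markdown_dir: str, min_count: int = 2, min_len: int = 4) -> dict[str, str]:
    """Collect text/URL pairs from Markdown sources, keeping frequent, non-trivial ones."""
    counts = Counter()
    for path in Path(markdown_dir).glob("**/*.md"):
        for text, url in LINK.findall(path.read_text(encoding="utf-8")):
            counts[(text, url)] += 1
    return {text: url
            for (text, url), n in counts.items()
            if n >= min_count and len(text) >= min_len}
```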

  • [Media] Nested browsing the Rust docs
    1 project | /r/rust | 4 Jan 2023
    Gwern's site has a great implementation of that. Hover over any link: https://www.gwern.net/Design
  • Ask HN: Good resources for programmers to learn about UX/design?
    2 projects | news.ycombinator.com | 18 Jun 2022
  • Design of This Website (2021) (in Chinese: 本网站的设计)
    1 project | /r/hnzh | 6 Apr 2022
  • Design of This Website (2021)
    1 project | /r/WhileTrueCode | 6 Apr 2022
  • Hacker News top posts: Apr 6, 2022
    4 projects | /r/hackerdigest | 6 Apr 2022
    Design of This Website (134 comments)
  • Design of This Website
    1 project | /r/patient_hackernews | 6 Apr 2022
    1 project | /r/hackernews | 6 Apr 2022
    9 projects | news.ycombinator.com | 5 Apr 2022
    That page is a bit outdated because I am still fine-tuning the on-site archive system before I do a write-up.

    I still use archiver-bot etc.; they're just not how I do the on-site archives. See https://github.com/gwern/gwern.net/blob/master/build/LinkArc... https://github.com/gwern/gwern.net/blob/master/build/linkArc... for that.

    The quick summary is that PDFs are automatically downloaded, hosted locally, and links rewritten to the local PDF; for other URLs, after a delay, the CLI version of https://github.com/gildas-lormeau/SingleFile runs headless Chrome to dump a snapshot, which I manually review & improve as necessary, and then links get rewritten to the snapshot HTML (a rough sketch of this flow follows this comment). The snapshots get some no-crawl HTTP headers and robots.txt exclusions to try to reduce copyright trouble.
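
That archiving flow can be sketched roughly as follows. The output paths and the exact single-file CLI invocation are assumptions for illustration, not the actual gwern.net build code (which is Haskell and adds delays, manual review, and link rewriting around this core step):

```python
import subprocess
import urllib.request
from pathlib import Path

def make_local_archive(url: str, archive_dir: str = "doc/www") -> Path:
    """Mirror a PDF directly, or snapshot any other URL with the SingleFile CLI."""
    out_dir = Path(archive_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if url.lower().endswith(".pdf"):
        local = out_dir / Path(url).name
        urllib.request.urlretrieve(url, local)   # host the PDF locally
    else:
        local = out_dir / "snapshot.html"
        # Assumed invocation: single-file <url> <output>, driving headless Chrome.
        subprocess.run(["single-file", url, str(local)], check=True)
    return local  # the original link is then rewritten to point at this path
```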

What are some alternatives?

When comparing jupyterlab-lsp and gwern.net you can also consider the following projects:

polynote - A better notebook for Scala (and more)

Tufte CSS - Style your webpage like Edward Tufte’s handouts.

Spyder - Official repository for Spyder - The Scientific Python Development Environment

SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file

ansible-language-server - 🚧 Ansible Language Server codebase is now included in vscode-ansible repository

org-protocol-capture-html - Capture HTML from the browser selection into Emacs as org-mode content

ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

commento - A fast, bloat-free comments platform (Github mirror)

julia-snail - An Emacs development environment for Julia

manuel.kiessling.net - The Hugo-based code from which https://manuel.kiessling.net is generated.

jupyter-black - Black formatter for Jupyter Notebook

breckyunits.com - Breck Yunits' Blog