A search engine that favors text-heavy sites and punishes modern web design

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • zr

    🌩 offline and serverless stackoverflow/man/etc.. search with low memory footprint

  • Amazing!

    I have only one recommendation that might make the search a bit more relevant, e.g. when searching for things like 'linux locking' or 'kernel locking'.

    Try to upsort things that match near the top of the content, like the top of the man page vs middle vs bottom.

    One easy way to do it, without having to store the positions, is to index the ngrams with min(sqrt(line), 8) of their line number. The cap of 8 is reached at line 64, so this distinguishes the first 64 lines. You can also use log(), or just decide ad hoc (top, middle, bottom of the document) so you only need 3 values.

    e.g. https://www.kernel.org/doc/html/v5.0/kernel-hacking/locking.... would do unreliable_1 guide_1 locking_1 ... then at line 4 kernel_2 locking_2 ... after line 50 ... then_7 ... and after that everything will be _8.

    Then just rewrite the query "kernel locking" to "dismax(kernel_1 OR kernel_2 OR kernel_3...) AND dismax(locking_1 OR locking_2 ...)" with a tiebreaker of 0.1 or so. You can also say "I want to upsort things on the same line, or a few lines apart" by modifying the query a bit.

    It works really well and costs very little in terms of space. I tried it at https://github.com/jackdoe/zr while searching all of stackoverflow, man pages, etc. and was pretty surprised by the result.

    This approach is a bit cheaper than storing the positions, because positions are (let's say) 4 bytes per term per doc, while this approach has a fixed upper bound cost of 8*4 bytes per document (assuming 4-byte document ids).
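
The idea in the comment above can be sketched roughly as follows. This is a minimal in-memory illustration and not the code from zr. It makes three assumptions the comment does not state: it indexes whole lowercase words rather than ngrams, it weights a match in bucket b as 1/b, and it keeps the index in a plain dict. All function names are hypothetical.

```python
import math
import re
from collections import defaultdict

MAX_BUCKET = 8  # line 64 and everything after it share bucket 8


def bucket(line_no: int) -> int:
    """Map a 1-based line number to a position bucket in 1..8."""
    return min(math.isqrt(line_no), MAX_BUCKET)


def index_document(doc_id, text, index):
    """Add bucket-suffixed terms (e.g. 'kernel_1') to an inverted index."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        b = bucket(line_no)
        # Assumption: whole words stand in for the ngrams of the comment.
        for term in re.findall(r"[a-z0-9]+", line.lower()):
            index[f"{term}_{b}"].add(doc_id)


def dismax(scores, tie_breaker=0.1):
    """Best score plus tie_breaker times the sum of the other scores."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def search(query, index, tie_breaker=0.1):
    """AND over query terms; each term is a dismax over its 8 buckets."""
    per_term = []
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        doc_scores = defaultdict(list)
        for b in range(1, MAX_BUCKET + 1):
            weight = 1.0 / b  # assumption: earlier buckets score higher
            for doc_id in index.get(f"{term}_{b}", ()):
                doc_scores[doc_id].append(weight)
        per_term.append(
            {d: dismax(s, tie_breaker) for d, s in doc_scores.items()}
        )
    if not per_term:
        return []
    # AND semantics: keep only documents that matched every query term.
    common = set(per_term[0])
    for scores in per_term[1:]:
        common &= set(scores)
    ranked = sorted(
        ((sum(t[d] for t in per_term), d) for d in common), reverse=True
    )
    return [d for _, d in ranked]
```

With this sketch, a document that mentions "kernel locking" on its first line ranks above one that only mentions it around line 20, and a document missing either term is dropped by the AND. Any decreasing weight per bucket would do; 1/b is just the simplest choice.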

  • proposal-pipeline-operator

    A proposal for adding a useful pipe operator to JavaScript.

  • TIL about https://github.com/tc39/proposal-pipeline-operator, which I am immediately looking forward to playing with once it gains traction Some Time From Now™

    (I have no earnest reason to transpile)

  • based.cooking

    A simple culinary website.

  • Wow this is immediately useful

    Already discovered this recipe site: https://based.cooking/

    I love how adding recipes is through pull requests: https://github.com/LukeSmithxyz/based.cooking/pulls

  • Gigablast

    Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.

  • You could look at the source code for Gigablast. https://github.com/gigablast/open-source-search-engine

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
