Gigablast
zr
Metric | Gigablast | zr
---|---|---
Mentions | 6 | 2
Stars | 1,518 | 25
Growth | - | -
Activity | 3.6 | 0.6
Last commit | 4 months ago | 5 months ago
Language | C++ | Go
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Gigablast
- What is going on with search engines these days?
Gigablast - A search engine whose source code is available on GitHub. To my knowledge, it doesn't depend on any existing indexes and uses its own crawler. It would be what I'd recommend to everyone, but they recently collaborated with a company that has a questionable past to create a privacy-focused search engine: the company behind the hostile takeover of Freenode IRC.
- EU Open Web Search Project Started
Any new open-source search option is good, but I also wish more attention was given to prior open projects like GigaBlast[0]/KBlast[1] crawlers, etc.
It hasn't escaped the wider world that quality open-source search is desirable, and it's hard to think what this new EU project brings to the table that isn't already available if others want to contribute to existing efforts. I wish the EU project the best of luck of course!
[0] https://github.com/gigablast/open-source-search-engine
[1] https://github.com/fossabot/kblast
- Gigablast – open-source-search-engine
- What are your favorite open-source search engines?
Gigablast Source Apache-2.0
- A search engine that favors text-heavy sites and punishes modern web design
You could look at the source code for Gigablast. https://github.com/gigablast/open-source-search-engine
zr
- ZR – offline and serverless stackoverflow/man/etc. low memory search
- A search engine that favors text-heavy sites and punishes modern web design
Amazing!
I have only one recommendation that might make the search a bit more relevant, e.g. when searching for 'linux locking' or 'kernel locking' kinds of things.
Try to upsort things that match near the top of the content, like the top of the man page vs. the middle vs. the bottom.
One easy way to do it without having to store the positions is to index the ngrams with the square root of their line number, capped at 8, i.e. min(floor(sqrt(line)), 8). This distinguishes the first 64 lines. You can also use log(), or just decide ad hoc on top, middle, and bottom of the document, so you only need 3 values.
For example, https://www.kernel.org/doc/html/v5.0/kernel-hacking/locking.... would index unreliable_1 guide_1 locking_1 ..., then at line 4 kernel_2 locking_2 ..., after line 50 ... then_7 ..., and after that everything will be _8.
Then just turn the query "kernel locking" into "dismax(kernel_1 OR kernel_2 OR kernel_3 ...) AND dismax(locking_1 OR locking_2 ...)" with a tiebreaker of 0.1 or so. You can also say "I want to upsort things on the same line, or a few lines apart" by modifying the query a bit.
It works really well and costs very little in terms of space. I tried it at https://github.com/jackdoe/zr while searching all of Stack Overflow, man pages, etc., and was pretty surprised by the result.
This approach is a bit cheaper than storing the positions, because positions are (let's say) 4 bytes per term per doc, while this approach has a fixed upper bound cost of 8*4 per document (assuming 4-byte document ids).
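The bucketing idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in zr: the function names (`bucket`, `index_document`, `dismax`, `search`), the whitespace tokenizer, and the per-bucket weight of `1 / bucket` are all hypothetical choices made for the example. The dismax score follows the usual definition of the best matching clause plus the tiebreaker times the remaining clauses.

```python
import math
from collections import defaultdict

MAX_BUCKET = 8  # cap from the comment: sqrt(64) == 8, so lines >= 64 share a bucket


def bucket(line_no: int) -> int:
    """Map a 1-based line number to a position bucket in 1..MAX_BUCKET."""
    return min(math.isqrt(line_no), MAX_BUCKET)


def index_document(index, doc_id, text):
    """Add every term to the index, suffixed with the bucket of its line."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        b = bucket(line_no)
        for term in line.lower().split():  # assumption: naive whitespace tokenizer
            index[f"{term}_{b}"].add(doc_id)


def dismax(index, term, doc_id, tiebreaker=0.1):
    """Best bucket score for the term, plus tiebreaker * the other bucket scores."""
    # Assumption: a match in bucket b is worth 1 / b, so earlier lines score higher.
    scores = [
        1.0 / b
        for b in range(1, MAX_BUCKET + 1)
        if doc_id in index.get(f"{term}_{b}", ())
    ]
    if not scores:
        return 0.0
    best = max(scores)
    return best + tiebreaker * (sum(scores) - best)


def search(index, query, tiebreaker=0.1):
    """AND the per-term dismax clauses and return doc ids, best first."""
    terms = query.lower().split()
    candidates = None
    for term in terms:
        docs = set()
        for b in range(1, MAX_BUCKET + 1):
            docs |= index.get(f"{term}_{b}", set())
        candidates = docs if candidates is None else candidates & docs
    ranked = [
        (sum(dismax(index, t, d, tiebreaker) for t in terms), d)
        for d in (candidates or ())
    ]
    return [d for _, d in sorted(ranked, key=lambda pair: (-pair[0], pair[1]))]
```

To try it, create `index = defaultdict(set)`, call `index_document` once per document, and then call `search(index, "kernel locking")`. A document that mentions both terms on its first lines should rank ahead of one that only mentions them far down the page.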
What are some alternatives?
Searx - Privacy-respecting metasearch engine
based.cooking - A simple culinary website.
Yacy - Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
sist2 - Lightning-fast file system indexer and search tool
Seeks - Seeks is a decentralized p2p websearch and collaborative tool.
multiSearchHome - Local standalone HTML homepage to search in 175 search engines (duckduckgo, youtube, twitter, wikipedia, etc.)
proposal-pipeline-operator - A proposal for adding a useful pipe operator to JavaScript.
Typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
Ambar - Document Search Engine
whoogle-search - A self-hosted, ad-free, privacy-respecting metasearch engine
kblast - A web search engine forked from Gigablast