Our great sponsors
-
rum
RUM access method - inverted index with additional information in posting lists (by postgrespro)
-
SurveyJS
Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.
-
tatoeba2
Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
I suggest to have a look at https://github.com/postgrespro/rum if you haven’t yet. It solves the issue of slow ranking in PostgreSQL FTS.
[2] https://github.com/bvaughn/react-virtualized
For the character mappings, it might be useful to have a look at the config for https://tatoeba.org (or rather, the PHP script that generates the config): https://github.com/Tatoeba/tatoeba2/blob/dev/src/Shell/Sphin...
There's one big list of mappings for almost every script under the sun, including Greek. (With mappings like 'U+1F08..U+1F0F->U+1F00..U+1F07' turning U+1F08 Ἀ [CAPITAL ALPHA WITH PSILI] into U+1F00 ἀ [SMALL ALPHA WITH PSILI], and the same for seven other accented alphas. I've considered turning them all into unaccented alpha instead, but I don't know enough about Greek orthography to decide that.) https://github.com/Tatoeba/tatoeba2/blob/3170f7326ad2939c691...
For Latin, there are some special exceptions so that "GAIVS IVLIVS CAESAR" and "Gaius Julius Caesar" are treated the same: https://github.com/Tatoeba/tatoeba2/blob/3170f7326ad2939c691...
It's not beautiful, but it's used in production. People who don't need to support quite as many languages as Tatoeba will probably want a simpler config, but it might still be useful as a reference.
This is really cool. Something like this should exist.
It seems like you could do it more easily, and have faster search responses, with the following steps:
1. Mirror the current gutenberg archive (e.g. rsync -av --del aleph.gutenberg.org::gutenberg gutenberg
2. Install recoll-webui from https://www.lesbonscomptes.com/recoll/pages/recoll-webui-ins... or using docker-recoll-webui: https://github.com/sunde41/recoll
Thanks! I had the exact same problem and eventually it got me to do something about it. It is particularly bad with writers from antiquity or with a lot of popular appeal.
I've begun adding to this repository, it'll come in piece by piece as I clean up the code: https://github.com/cordb/gutensearch
Related posts
- The Secret Weapon of Top Developers: 7 React JS Libraries You Can't Afford to Ignore
- Memoizing table rows in a table that can be filtered
- Faster re-rending of table when only inserts are needed
- 5 Tips for Optimizing ReactJS Performance and Building Lightning-Fast Applications
- Phoenix Dev Blog - Streams