Elasticsearch
whoosh
Metric | Elasticsearch | whoosh
---|---|---
Mentions | 91 | 5
Stars | 67,531 | 524
Growth | 1.1% | -
Activity | 10.0 | 0.0
Last commit | 4 days ago | 4 months ago
Language | Java | Python
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Elasticsearch
- Elasticsearch Version 9
  You could check out their GitHub and see what is going on: https://github.com/elastic/elasticsearch/issues
- One .gitignore to rule them all
- Who's hiring developer advocates? (October 2023)
- Do we think about vector dbs wrong?
  I believe the 1024 limit has been upped in recent versions of Elasticsearch: https://github.com/elastic/elasticsearch/issues/92458
- Elasticsearch VS openobserve - a user suggested alternative
  2 projects | 30 Aug 2023
- A dedicated Elasticsearch query language (ES|QL)
- Fleet datastreams: custom index templates
- Integrating Elasticsearch with Node.js Applications
  Elasticsearch is written in Java and its source code is available on GitHub.
- Murmur3 hash plugin for nested objects?
  I don't think the murmur3 hash implementation has changed since it was added as the default in version 2.0 (see the [changes](https://github.com/elastic/elasticsearch/commits/main/server/src/main/java/org/elasticsearch/cluster/routing/Murmur3HashFunction.java)). The plugin itself has seen [more changes](https://github.com/elastic/elasticsearch/commits/main/plugins/mapper-murmur3), but that's IMO because of internals and not visible changes in the calculations.
- Mongo or Mysql for 10tb of JSON documents, I'm questioning my previous choice.
  MySQL is not as open source as Postgres (long story). And you can see how open Elasticsearch is just by having access to its bug database: https://github.com/elastic/elasticsearch/issue
whoosh
- Milli-py: Python bindings for Milli, an embeddable high-performance search engine
  The only other embeddable search engine I'm aware of, Whoosh, is brilliant, but building the index was quite slow, and search performance degraded quite a lot as the number of documents increased (performance is strictly a non-goal). Meilisearch was comparatively faster, but I didn't like managing a server to get "just search" in my scripts and applications. However, their underlying engine, Milli, solves both issues I had, and all that was needed was creating bindings for it.
- Meilisearch v1.0 – the open-source Rust alternative to Algolia and Elasticsearch
  Is it really "just a single statically linked binary"?
  I'd love to use Meilisearch as you describe, but their so-called SDKs only cover the search client; you still need the HTTP server listening on localhost.
  I would love to see something like SQLite based off Meilisearch (i.e. a fully self-contained library like https://github.com/mchaput/whoosh). Do you know if such a thing exists?
- Faster Full Text Search
  For our full text search, we used whoosh, which works pretty well for moderately large amounts of data.
- We upgraded an old, 3PB large, Elasticsearch cluster without downtime
  Nearly a decade ago (oh god) I converted some overdesigned five node ES mess to https://github.com/mchaput/whoosh. It's (obviously) not the fastest or anything, but it was more than good enough for low-dozens of GBs of mostly static data.
- Starting a KF Discord Bot
  Your best bet is to start using a proper search library rather than the simple loop with 'in' checks that you have now. A search lib will handle things like Unicode/ASCII similarities, removal of stop words, stemming, TF-IDF (and other) weighting, etc. and will be massively faster as well. Quite a few pages come up if you Google "python search engine", also Whoosh looks promising.
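
To make the last comment concrete, here is a minimal sketch of what a search library does beyond a loop of 'in' checks: it tokenizes text, drops stop words, builds an inverted index, and ranks hits by a TF-IDF weight. This is not Whoosh's API or implementation. `TinyIndex`, `tokenize`, and the small stop-word list are hypothetical names made up for illustration, and stemming and Unicode folding are left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real libraries ship larger ones).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


class TinyIndex:
    """Hypothetical in-memory inverted index with TF-IDF scoring."""

    def __init__(self):
        self.postings = {}  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        # Assumes each doc_id is added only once.
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings.setdefault(term, {})[doc_id] = tf

    def search(self, query):
        scores = Counter()
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rarer terms get a higher weight (smoothed inverse document frequency).
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Best-scoring documents first.
        return [doc_id for doc_id, _ in scores.most_common()]


index = TinyIndex()
index.add("a", "The quick brown fox")
index.add("b", "A lazy brown dog")
index.add("c", "Fox hunting is banned; the fox is safe")
results = index.search("the fox")  # "the" is a stop word, so only "fox" is scored
```

Because lookups go through the `postings` dictionary, a query only touches documents that contain its terms instead of scanning every document, which is where most of the speedup over a plain loop comes from.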
What are some alternatives?
OpenSearch - 🔎 Open source distributed and RESTful search engine.
Search Engine Parser - Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]
pysolr - Pysolr — Python Solr client
bleve - A modern text/numeric/geo-spatial/vector indexing library for go
elasticsearch-dsl-py - High level Python client for Elasticsearch
pgvector - Open-source vector similarity search for Postgres
query-builder - sql query builder library for crystal-lang
Whoosh
query.cr - Query abstraction for Crystal Language. Used by active_record.cr library.
MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
lunr.js - A bit like Solr, but much smaller and not as bright