searcharray vs www.mechaelephant.com
| | searcharray | www.mechaelephant.com |
|---|---|---|
| Mentions | 4 | 3 |
| Stars | 162 | 1 |
| Growth | - | - |
| Activity | 9.7 | 8.8 |
| Latest commit | 5 days ago | 10 days ago |
| Language | Python | JavaScript |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
searcharray
- A search engine in 80 lines of Python
This is really cool. I have a pretty fast BM25 search engine in Pandas I've been working on for local testing.
https://github.com/softwaredoug/searcharray
Why Pandas? Because BM25 is one thing, but you also want to combine it with other factors (recency, popularity, etc.) that are easily computed in pandas / numpy...
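A minimal sketch of that kind of blend, written in plain Python so it runs without pandas: each document's BM25 score is combined with a recency decay and a damped popularity count. The weights, the 30-day half-life, and the field names (`bm25`, `age_days`, `popularity`) are hypothetical; in pandas the same expression would be written once over whole columns.

```python
import math


def blended_score(bm25, age_days, popularity,
                  w_text=1.0, w_recency=0.5, w_pop=0.2, half_life_days=30.0):
    """Combine a text relevance score with non-text signals.

    All weights and the half-life are hypothetical; tune them on labeled data.
    """
    # Exponential decay: a document half_life_days old gets half the recency boost.
    recency = 0.5 ** (age_days / half_life_days)
    # log1p damps very large popularity counts (and is 0 for popularity 0).
    pop = math.log1p(popularity)
    return w_text * bm25 + w_recency * recency + w_pop * pop


# Hypothetical documents with a precomputed BM25 score.
docs = [
    {"id": "a", "bm25": 3.1, "age_days": 400, "popularity": 10},
    {"id": "b", "bm25": 2.8, "age_days": 2, "popularity": 500},
]
ranked = sorted(
    docs,
    key=lambda d: blended_score(d["bm25"], d["age_days"], d["popularity"]),
    reverse=True,
)
print([d["id"] for d in ranked])
```

Here the slightly weaker text match wins because it is recent and popular; whether that is desirable is exactly what a labeled dataset should decide.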
- Are we at peak vector database?
You might be interested in
https://github.com/softwaredoug/searcharray
- SearchArray turns Pandas string columns into a term index
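Roughly, a term index maps each term to the rows that contain it, with a count per row. Below is a minimal sketch over a plain list of strings standing in for a string column, assuming naive lowercase/whitespace tokenization. It is an illustration only, not SearchArray's internal representation.

```python
from collections import Counter, defaultdict


def build_term_index(docs):
    """Return (postings, doc_lengths) for a list of strings.

    postings maps term -> {row_position: term_frequency}.
    Tokenization is a naive lowercase + whitespace split (an assumption).
    """
    postings = defaultdict(dict)
    doc_lens = []
    for pos, text in enumerate(docs):
        tokens = text.lower().split()
        doc_lens.append(len(tokens))
        for term, tf in Counter(tokens).items():
            postings[term][pos] = tf
    return dict(postings), doc_lens


titles = ["The Cat in the Hat", "A cat and a dog", "Hats for dogs"]
index, lengths = build_term_index(titles)
print(index["cat"], lengths)
```

Document lengths are kept alongside the postings because BM25 needs them for length normalization.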
- Show HN: SearchArray - Text Search in Pandas
I've long worked with Lucene-based search engines like Solr and Elasticsearch. Any time I need to experiment with relevance ranking in these systems, I'm exhausted by needing to set them up and work with something so disjoint from normal data tooling.
Further, the underlying ranking is buried in needless mystique (you know a boolean should query sums the scores, right?). You shouldn't need to read a book (like Relevant Search ;) ) to unpack mystique that's really basic math.
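To make the point concrete: once per-clause scores are computed, an OR of "should" clauses is an element-wise sum over documents, where a clause that doesn't match a document contributes 0. A minimal sketch, assuming each clause's scores arrive as one list per clause (a hypothetical layout) and ignoring options such as minimum-should-match:

```python
def should_query(clause_scores):
    """Score documents for an OR ("should") combination of clauses.

    clause_scores: one list per clause, each holding one score per document;
    0.0 means that clause did not match that document.
    """
    if not clause_scores:
        return []
    n_docs = len(clause_scores[0])
    totals = [0.0] * n_docs
    for scores in clause_scores:
        for i, s in enumerate(scores):
            totals[i] += s  # matching clauses simply add up
    return totals


# Hypothetical per-document scores for a title clause and a body clause.
title_scores = [1.2, 0.0, 0.4]
body_scores = [0.5, 0.9, 0.0]
print(should_query([title_scores, body_scores]))
```

In numpy this is just `title_scores + body_scores` on arrays, which is the "basic math" being referred to.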
Why not just let people build ranking systems with vectorized math in a numpy/pandas stack?
SearchArray lets anyone build a search prototype in Pandas, typically by building and experimenting with a smaller labeled dataset. If it works out, you can transfer it relatively easily to Elasticsearch or Solr for implementation.
SearchArray is a pandas extension array that creates an underlying search index for BM25 term- and phrase-based searching.
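For reference, here is a minimal BM25 term-scoring sketch in plain Python. It is not SearchArray's implementation. It assumes whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, and the Lucene-style idf ln(1 + (N - df + 0.5) / (df + 0.5)). Phrase matching is not covered.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n_docs
    tfs = [Counter(t) for t in tokenized]
    scores = [0.0] * n_docs
    for term in query.lower().split():
        df = sum(1 for tf in tfs if term in tf)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for i, tf in enumerate(tfs):
            f = tf[term]  # Counter returns 0 for missing terms
            if f == 0:
                continue
            # Length normalization: longer-than-average docs are penalized.
            norm = k1 * (1 - b + b * len(tokenized[i]) / avgdl)
            scores[i] += idf * f * (k1 + 1) / (f + norm)
    return scores


titles = ["the cat in the hat", "a cat and a dog", "hats for dogs"]
print(bm25_scores(titles, "cat hat"))
```

Multi-term queries simply sum the per-term contributions, the same "should" summing described in the post.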
It's not quite done (will it ever be?), but it's getting far enough along to be useful, so feedback is very welcome.
https://github.com/softwaredoug/searcharray
www.mechaelephant.com
- Ask HN: Tips to get started on my own server
- A search engine in 80 lines of Python
- My Second Brain - Zettelkasten
For me, the idea is sound, but the implementation always seems so cumbersome. I want something that separates the data from the display as much as possible, makes note taking easy, and is easy to install. One problem I always encounter is that if the interface for adding notes has too much friction, I stop using it pretty quickly.
Anyway, I created something over the weekend called 'notenox' [0]. It creates a JSON file of relevant information, one JSON file per note, with keywords and a "special" keyword prefix called a 'title' that mimics how I've actually been taking notes (by email, so the 'title' mimics an email thread). For display, I consolidate all JSON files into a single JSON file and load it into the browser with some JavaScript that groups by title or keyword and does all cross-referencing and counting on the client end.
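A rough sketch of the one-file-per-note, then consolidate, layout described above, in Python rather than the JavaScript the display side uses. The JSON schema (`id`, `title`, `keywords`, `body`) and the function names are hypothetical, not notenox's actual format; `title` is kept as its own field here for simplicity.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_note(notes_dir, note_id, title, keywords, body):
    """Write one note as its own JSON file (hypothetical schema)."""
    note = {"id": note_id, "title": title, "keywords": sorted(keywords), "body": body}
    path = Path(notes_dir) / f"{note_id}.json"
    path.write_text(json.dumps(note, indent=2), encoding="utf-8")
    return path


def consolidate(notes_dir, out_path):
    """Merge every per-note JSON file into one JSON array for the browser."""
    notes = [json.loads(p.read_text(encoding="utf-8"))
             for p in sorted(Path(notes_dir).glob("*.json"))]
    Path(out_path).write_text(json.dumps(notes), encoding="utf-8")
    return notes


def group_by_keyword(notes):
    """Group note ids by keyword, as the display side might do client-side."""
    groups = defaultdict(list)
    for note in notes:
        for kw in note["keywords"]:
            groups[kw].append(note["id"])
    return dict(groups)


with tempfile.TemporaryDirectory() as tmp:
    notes_dir = Path(tmp) / "notes"
    notes_dir.mkdir()
    write_note(notes_dir, "n1", "search", ["bm25", "pandas"], "SearchArray idea")
    write_note(notes_dir, "n2", "search", ["bm25"], "Lucene scoring")
    # Write the combined file outside notes_dir so it isn't re-read as a note.
    notes = consolidate(notes_dir, Path(tmp) / "all_notes.json")
    print(group_by_keyword(notes))
```

Keeping the per-note files as the source of truth and regenerating the combined file is what keeps the data separate from the display.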
Creating notes is done through the command line, because that's a common way I interact with my computer, with different options to create titles, links, keywords, etc. I'm sure there are many different Zettelkasten implementations out there, but they always seem so clunky and cumbersome. It's not hard: the simple use case should be simple, and it shouldn't be proprietary or locked behind a SaaS.
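A sketch of what such a command-line entry point could look like with Python's `argparse`. The flags (`--title`, `--keyword`, `--link`) and helper names are hypothetical stand-ins for the options mentioned, not notenox's real interface.

```python
import argparse


def build_parser():
    """Hypothetical CLI: one positional note body plus repeatable options."""
    p = argparse.ArgumentParser(description="Create a note (hypothetical CLI).")
    p.add_argument("body", help="note text")
    p.add_argument("-t", "--title", default="", help="thread-like title")
    p.add_argument("-k", "--keyword", action="append", default=[],
                   help="keyword (repeatable)")
    p.add_argument("-l", "--link", action="append", default=[],
                   help="link to another note or URL (repeatable)")
    return p


def note_from_args(args):
    """Turn parsed arguments into a note dict ready to be dumped as JSON."""
    return {
        "title": args.title,
        "keywords": list(args.keyword),
        "links": list(args.link),
        "body": args.body,
    }


example = note_from_args(build_parser().parse_args(["-t", "search", "-k", "bm25", "idea"]))
print(example)
```

Repeatable flags use `action="append"`, so `-k bm25 -k pandas` yields `["bm25", "pandas"]`; a script would call `note_from_args(build_parser().parse_args())` and write the result out as one JSON file.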
You can see my personal notes in action, if you like [1] (sorry, not mobile friendly!).
[0] https://github.com/abetusk/www.mechaelephant.com/tree/releas...
[1] https://mechaelephant.com/notenox
What are some alternatives?
searx - Privacy-respecting metasearch engine [Moved to: https://github.com/searx/searx]
anystyle - Fast citation reference parsing
PaddleNLP - Easy-to-use and powerful NLP and LLM library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including Text Classification, Neural Search, Question Answering, Information Extraction, Document Intelligence, Sentiment Analysis, etc.