dictomaton
sdsl-lite
 | dictomaton | sdsl-lite
---|---|---
Mentions | 2 | 5
Stars | 129 | 2,174
Growth | - | -
Activity | 1.8 | 0.0
Last commit | about 2 years ago | 11 months ago
Language | Java | C++
License | Apache License 2.0 | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dictomaton
- Calculate the difference and intersection of any two regexes
Say you want to compute all strings of length 5 that the automaton can generate. Conceptually the nicest way is to create an automaton that matches any five characters and then compute the intersection between that automaton and the regex automaton. Then you can generate all the strings in the intersection automaton. Of course, IRL, you wouldn't actually materialize the intersection automaton (you can easily compute it on the fly), but you get the idea.
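A minimal Python sketch of that idea, under stated assumptions: the DFA below is a hypothetical hand-written automaton for the regex `a(b|c)*d` (it does not come from the comment), and the "any five characters" automaton is represented by a simple countdown. Each `(state, remaining)` pair is a state of the intersection, which is explored lazily and never built.

```python
from typing import Dict, Iterator, Set, Tuple

# Hypothetical DFA for the regex a(b|c)*d, written out by hand.
# States: 0 = start, 1 = after 'a', 2 = accepting (after 'd').
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
    (1, "d"): 2,
}
START = 0
ACCEPTING: Set[int] = {2}


def strings_of_length(n: int) -> Iterator[str]:
    """Yield every string of length n accepted by the DFA.

    `state` tracks the regex DFA; `remaining` tracks the automaton that
    accepts any n characters. Together they form one product state.
    """
    stack = [(START, n, "")]
    while stack:
        state, remaining, prefix = stack.pop()
        if remaining == 0:
            # Both automata must accept: length used up and DFA accepting.
            if state in ACCEPTING:
                yield prefix
            continue
        for (src, symbol), dst in TRANSITIONS.items():
            if src == state:
                stack.append((dst, remaining - 1, prefix + symbol))


result = sorted(strings_of_length(5))
print(result)
```

For this particular DFA, every result has the shape `a`, then three characters from `b`/`c`, then `d`.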
Automata are really a lost art in modern natural language processing. We used to do things like store a large vocabulary in a deterministic acyclic minimized automaton (nice and compact, a so-called dictionary automaton). Then, to find, say, all words within Levenshtein distance 2 of hacker, you create a Levenshtein automaton for hacker and compute (on the fly) the intersection between the Levenshtein automaton and the dictionary automaton. The language of the intersection automaton is then exactly the set of vocabulary words within distance 2 of hacker.
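A rough Python sketch of the on-the-fly intersection, with two simplifications that are assumptions of this example rather than anything the comment describes: the dictionary is a plain trie instead of a minimized automaton, and the Levenshtein automaton is simulated by carrying one row of the edit-distance table along each trie path (each row plays the role of a Levenshtein automaton state). The vocabulary is made up for illustration.

```python
from typing import List

END = "$"  # end-of-word marker; assumes no vocabulary word contains "$"


def build_trie(words: List[str]) -> dict:
    """Store the vocabulary as nested dicts, one level per character."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def within_distance(trie: dict, query: str, max_dist: int) -> List[str]:
    """Return all trie words within Levenshtein distance max_dist of query."""
    results: List[str] = []

    def walk(node: dict, prefix: str, row: List[int]) -> None:
        # row[i] = edit distance between prefix and query[:i]
        if END in node and row[-1] <= max_dist:
            results.append(prefix)
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                new_row.append(min(new_row[i - 1] + 1,   # insertion
                                   row[i] + 1,           # deletion
                                   row[i - 1] + cost))   # match or substitution
            # Prune: if every entry exceeds max_dist, no extension can match.
            if min(new_row) <= max_dist:
                walk(child, prefix + ch, new_row)

    walk(trie, "", list(range(len(query) + 1)))
    return sorted(results)


VOCABULARY = ["hacker", "hackers", "backer", "tracker", "bakery", "python"]
matches = within_distance(build_trie(VOCABULARY), "hacker", 2)
print(matches)
```

The pruning step is what makes this an intersection rather than a scan: whole subtrees of the dictionary are skipped as soon as the simulated Levenshtein state can no longer accept.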
I wrote a Java package a decade ago that implements some of this stuff:
https://github.com/danieldk/dictomaton
- Ask HN: What are some 'cool' but obscure data structures you know about?
Also related: Levenshtein automata, which accept every word within a given Levenshtein distance of a given word. The intersection of a Levenshtein automaton of a word and a DAWG gives you an automaton of all dictionary words within that edit distance.
I haven't done any Java in years, but I made a Java package in 2013 that supports: DAWGs, Levenshtein automata and perfect hash automata:
https://github.com/danieldk/dictomaton
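To illustrate why a DAWG is compact, here is a small Python sketch that builds a trie and then merges equivalent subtrees so that shared suffixes are stored once. It is only an illustration under assumptions: the word list is made up, the node representation (nested dicts with a `"$"` end marker) is hypothetical, and this is not how dictomaton itself is implemented.

```python
from typing import Dict, List, Tuple

END = "$"  # end-of-word marker; assumes no word contains "$"


def build_trie(words: List[str]) -> dict:
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def minimize(node: dict, registry: Dict[Tuple, dict]) -> dict:
    """Return the canonical node for `node`, merging equivalent subtrees."""
    for ch, child in list(node.items()):
        if ch != END:
            node[ch] = minimize(child, registry)
    # Children are already canonical, so comparing their ids is enough
    # to decide whether two nodes accept the same set of suffixes.
    signature = tuple(sorted(
        (ch, True if ch == END else id(child)) for ch, child in node.items()
    ))
    return registry.setdefault(signature, node)


def count_states(root: dict) -> int:
    """Count distinct nodes reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(child for ch, child in node.items() if ch != END)
    return len(seen)


trie = build_trie(["tap", "taps", "top", "tops"])
states_before = count_states(trie)
dawg = minimize(trie, {})
states_after = count_states(dawg)
print(states_before, states_after)
```

In this toy vocabulary the `ap(s)` and `op(s)` branches end up sharing the same `p` and `s` nodes, which is the suffix sharing that keeps large vocabularies small.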
sdsl-lite
- SDSL – Succinct Data Structure Library
- Ask HN: What are some 'cool' but obscure data structures you know about?
Succinct Data Structures [0] [1]. The term encompasses many different underlying data structure types, but the overarching idea is that you want a small data size while still keeping "big O" run time.
In other words, data structures that effectively reach a 'practical' entropy lower bound while keeping the asymptotic run time of their uncompressed counterparts.
[0] https://en.wikipedia.org/wiki/Succinct_data_structure
[1] https://github.com/simongog/sdsl-lite
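One common building block in this area is a bit vector with rank support: a small table of precomputed counts lets you answer "how many 1 bits occur before position i" without scanning the whole vector. The Python sketch below only illustrates the query structure. The class name and block size are made up, it is not sdsl-lite's API, and a Python list of ints is of course not space-efficient.

```python
from typing import Iterable, List


class RankBitVector:
    """Bit vector with per-block prefix counts for fast rank queries."""

    BLOCK = 8  # bits per block; an arbitrary choice for this sketch

    def __init__(self, bits: Iterable[int]) -> None:
        self.bits: List[int] = list(bits)
        # block_ranks[b] = number of 1 bits before block b
        self.block_ranks: List[int] = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            ones = sum(self.bits[start:start + self.BLOCK])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i: int) -> int:
        """Return the number of 1 bits in bits[0:i], for 0 <= i <= len(bits)."""
        block, offset = divmod(i, self.BLOCK)
        start = block * self.BLOCK
        # Table lookup plus a scan of at most BLOCK - 1 bits.
        return self.block_ranks[block] + sum(self.bits[start:start + offset])


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0,
                    1, 1, 0, 0, 0, 0, 0, 1,
                    0, 1])
print(bv.rank1(10))
```

The extra table grows with the number of blocks rather than the number of bits, which is the kind of trade-off the comment describes: a little auxiliary space in exchange for fast queries on compact data.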
- SDSL-RS: A Rust interface for the C++ Succinct Data Structure Library.
The book mentioned in another comment is probably the best way to go. But FYI, the documentation for some data structures includes references. An SDSL-lite example can be found here. And its equivalent in SDSL-RS can be found here.
What are some alternatives?
ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python
plurid-data-structures-typescript - Utility Data Structures Implemented in TypeScript
RVS_Generic_Swift_Toolbox - A Collection Of Various Swift Tools, Like Extensions and Utilities
sdsl-lite - Succinct Data Structure Library 3.0
multiversion-concurrency-contro
minisketch - Minisketch: an optimized library for BCH-based set reconciliation
gring - Golang circular linked list with array backend
TablaM - The practical relational programing language for data-oriented applications
pyroscope - Continuous Profiling Platform. Debug performance issues down to a single line of code [Moved to: https://github.com/grafana/pyroscope]
RVS_Generic_Swift_Tool
ctrie-java - Java implementation of a concurrent trie