dictomaton
asami
| | dictomaton | asami |
|---|---|---|
| Mentions | 2 | 6 |
| Stars | 129 | 626 |
| Growth | - | 0.6% |
| Activity | 1.8 | 0.0 |
| Last commit | about 2 years ago | about 2 years ago |
| Language | Java | Clojure |
| License | Apache License 2.0 | Eclipse Public License 1.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dictomaton
- Calculate the difference and intersection of any two regexes
Say you want to compute all strings of length 5 that the automaton can generate. Conceptually the nicest way is to create an automaton that matches any five characters and then compute the intersection between that automaton and the regex automaton. Then you can generate all the strings in the intersection automaton. Of course, IRL, you wouldn't actually generate the intersection (you can easily do this on the fly), but you get the idea.
Automata are really a lost art in modern natural language processing. We used to do things like store a large vocabulary in a deterministic acyclic minimized automaton (nice and compact, a so-called dictionary automaton). Then, to find, say, all words within Levenshtein distance 2 of "hacker", you create a Levenshtein automaton for "hacker" and compute (on the fly) the intersection between the Levenshtein automaton and the dictionary automaton. The language of the intersection automaton is then exactly the set of vocabulary words within that distance.
I wrote a Java package a decade ago that implements some of this stuff:
https://github.com/danieldk/dictomaton
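The length-5 example in the comment above can be sketched in Python. This is only an illustration, not dictomaton's API: the dict-based DFA encoding, the name `strings_of_length`, and the toy automaton for `a(b|c)*d` are all assumptions made here. The "any n characters" automaton is a simple chain, so its state is just a counter, and pairing that counter with the DFA state walks the intersection lazily without ever building it.

```python
from typing import Dict, Iterator, Set

# Hypothetical DFA encoding: transitions[state][symbol] -> next state.
Transitions = Dict[int, Dict[str, int]]


def strings_of_length(transitions: Transitions, start: int,
                      accepting: Set[int], n: int) -> Iterator[str]:
    """Yield every string of exactly n symbols accepted by the DFA.

    Each stack entry is a state of the product automaton:
    (DFA state, symbols consumed so far), plus the prefix read to get there.
    """
    stack = [(start, 0, "")]
    while stack:
        state, consumed, prefix = stack.pop()
        if consumed == n:
            # The length automaton accepts here; the DFA must accept too.
            if state in accepting:
                yield prefix
            continue
        for symbol, target in transitions.get(state, {}).items():
            stack.append((target, consumed + 1, prefix + symbol))


# Toy DFA for the regex a(b|c)*d (state 2 is accepting).
TRANSITIONS: Transitions = {
    0: {"a": 1},
    1: {"b": 1, "c": 1, "d": 2},
}

print(sorted(strings_of_length(TRANSITIONS, 0, {2}, 4)))
# ['abbd', 'abcd', 'acbd', 'accd']
```

A production version would also prune states from which no accepting state is reachable, which this sketch skips.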
- Ask HN: What are some 'cool' but obscure data structures you know about?
Also related: Levenshtein automata, automata that accept every string within a given Levenshtein distance of a word. The intersection of a Levenshtein automaton of a word and a DAWG gives you an automaton of all words in the DAWG within the given edit distance.
I haven't done any Java in years, but I made a Java package in 2013 that supports DAWGs, Levenshtein automata, and perfect hash automata:
https://github.com/danieldk/dictomaton
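A rough Python sketch of the dictionary lookup described above, under two simplifications that are assumptions of this example rather than how dictomaton works: the dictionary is a plain trie instead of a minimized DAWG, and the Levenshtein automaton is simulated with one dynamic-programming row per trie node instead of being compiled up front. The names `build_trie` and `within_distance` are hypothetical. The pruning step plays the role of the on-the-fly intersection: a branch is abandoned as soon as no suffix could bring the distance back under the limit.

```python
from typing import List

END = ""  # marker key for "a word ends here"; never collides with a character


def build_trie(words: List[str]) -> dict:
    """Build a nested-dict trie; a stand-in for a minimized dictionary automaton."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def within_distance(trie: dict, query: str, max_dist: int) -> List[str]:
    """Return all trie words whose edit distance to query is <= max_dist."""
    results: List[str] = []

    def walk(node: dict, prefix: str, row: List[int]) -> None:
        # row[i] is the edit distance between prefix and query[:i].
        if END in node and row[-1] <= max_dist:
            results.append(prefix)
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                new_row.append(min(new_row[i - 1] + 1,   # skip a query character
                                   row[i] + 1,           # skip the trie character
                                   row[i - 1] + cost))   # match or substitute
            # Prune: if every entry exceeds the limit, no extension can recover.
            if min(new_row) <= max_dist:
                walk(child, prefix + ch, new_row)

    walk(trie, "", list(range(len(query) + 1)))
    return sorted(results)


VOCAB = ["hacker", "hackers", "backer", "hatter", "packet", "hack", "banana"]
print(within_distance(build_trie(VOCAB), "hacker", 2))
# ['backer', 'hack', 'hacker', 'hackers', 'hatter', 'packet']
```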
asami
- Ask HN: What are some 'cool' but obscure data structures you know about?
- Ask HN: Why are relational DBs are the standard instead of graph-based DBs?
Unlike some other commenters, I agree that graph models are usually a better fit for most data than relational models. There's been some interesting work in recent years developing this idea: in the Clojure world there's Datomic, XTDB, and a host of competitors, all of which build on work from Semantic Web/SPARQL/triplestores and logic programming. Some are even intended to be used as primary datastores: they support some amount of schema and constraints, have well-defined consistency and ACID guarantees, etc. This makes them unlike graph databases like Neo4J and others, which fill an architectural role more like Elasticsearch as a read-optimization tool. Here's an interesting talk making a case for triple-based databases.
- Introduction to the Asami Graph Database
- How to query Datomic, Datascript, Asami, or other graph databases
Despite the documentation that exists, I've heard from many people who are confused about how to query Datomic, Datascript, Asami, or other graph databases. So I've made an attempt at explaining it: https://github.com/threatgrid/asami/wiki/Introduction
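To give a feel for what such queries do, here is a minimal Python sketch of triple-pattern matching with variable bindings. It is not the API of Asami, Datomic, or Datascript; the function names, the `?`-prefixed variable convention as used here, and the sample data are all assumptions for illustration. The sample query roughly corresponds to the Datalog form `[:find ?n :where [?a :name "Alice"] [?a :friend ?f] [?f :name ?n]]`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[object, object, object]
Bindings = Dict[str, object]


def is_var(term: object) -> bool:
    """Variables are strings starting with '?', as in Datalog-style queries."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple,
                  bindings: Bindings) -> Optional[Bindings]:
    """Extend bindings so that pattern equals triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if term in result and result[term] != value:
                return None  # variable already bound to something else
            result[term] = value
        elif term != value:
            return None  # constant in the pattern does not match
    return result


def query(find: Sequence[str], where: Sequence[Triple],
          triples: Sequence[Triple]) -> List[tuple]:
    """Join the patterns in order, keeping every consistent set of bindings."""
    solutions: List[Bindings] = [{}]
    for pattern in where:
        next_solutions: List[Bindings] = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(b[var] for var in find) for b in solutions]


TRIPLES: List[Triple] = [
    ("alice", "name", "Alice"),
    ("alice", "friend", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "friend", "carol"),
    ("carol", "name", "Carol"),
]

# Names of Alice's friends.
print(query(["?n"],
            [("?a", "name", "Alice"), ("?a", "friend", "?f"), ("?f", "name", "?n")],
            TRIPLES))
# [('Bob',)]
```

Real engines use indexes and query planning instead of scanning every triple for every pattern, but the binding-and-join idea is the same.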
- Introduction (To Graph Databases)
- Asami
The first Graph implementation for Asami was a simple in-memory data structure, described in my ClojureD talk. The code for this appears in asami.index. This file started much smaller (as referenced above), but has since expanded to meet the need for extended functionality, such as transactions and transitive closure operations.
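As a rough idea of what a simple in-memory graph index with a transitive closure operation can look like, here is a small Python sketch. It is not a port of asami.index: the three nested-map orderings, the class name `TripleIndex`, and the method names are assumptions chosen for this example.

```python
from collections import defaultdict, deque
from typing import Set


class TripleIndex:
    """In-memory triple store indexed three ways (illustrative layout only)."""

    def __init__(self) -> None:
        self.spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
        self.pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects
        self.osp = defaultdict(lambda: defaultdict(set))  # object -> subject -> predicates

    def add(self, s: str, p: str, o: str) -> None:
        """Record one triple in all three indexes."""
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def objects(self, s: str, p: str) -> Set[str]:
        """All o such that (s, p, o) is stored; does not create empty entries."""
        return set(self.spo.get(s, {}).get(p, set()))

    def subjects(self, p: str, o: str) -> Set[str]:
        """All s such that (s, p, o) is stored."""
        return set(self.pos.get(p, {}).get(o, set()))

    def reachable(self, start: str, p: str) -> Set[str]:
        """Transitive closure: every node reachable from start via one or more p edges."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.objects(node, p):
                if nxt not in seen:  # the seen set also guards against cycles
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


index = TripleIndex()
index.add("a", "parent", "b")
index.add("b", "parent", "c")
index.add("c", "parent", "d")
print(sorted(index.reachable("a", "parent")))
# ['b', 'c', 'd']
```

Keeping several orderings of the same triples trades memory for lookups that are direct regardless of which positions in a pattern are known.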
What are some alternatives?
ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python
datascript - Immutable database and Datalog query engine for Clojure, ClojureScript and JS
sdsl-lite - Succinct Data Structure Library 2.0
crux - General purpose bitemporal database for SQL, Datalog & graph queries. Backed by @juxt [Moved to: https://github.com/xtdb/xtdb]
RVS_Generic_Swift_Toolbox - A Collection Of Various Swift Tools, Like Extensions and Utilities
datahike - A durable Datalog implementation adaptable for distribution.
multiversion-concurrency-contro
datalevin - A simple, fast and versatile Datalog database
minisketch - Minisketch: an optimized library for BCH-based set reconciliation
Apache AGE - Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL. [Moved to: https://github.com/apache/age]
TablaM - The practical relational programming language for data-oriented applications
naga - Datalog based rules engine