There are a couple of starting points you could take. I spent a weekend hacking together a program that generates fake word/definition pairs using a transformer model trained on a dictionary: https://youtu.be/XnJ2TKAn-Vk?t=1547. If you swap in real words for the generated fake ones and have a sufficiently accurate model, you could quickly generate reasonable, novel definitions.
There are more complete versions of this publicly available: https://github.com/turtlesoupy/this-word-does-not-exist
> This would be amazing, for example, to run on a large corpus, generate the dictionary, and then run it again to find words that are used but not defined - not just in the original corpus but in the definitions too.
I think this is how you would gauge the model's success. That is, you would evaluate model accuracy on a set of held-out words whose definitions never appeared in your dictionary training set but which do appear in context in your corpus. You would then have to manually annotate whether the generated definition for each held-out word was acceptable.
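A minimal Python sketch of the bookkeeping around that evaluation. It assumes the dictionary is a plain `dict` of lowercase headword to definition and the corpus is a list of strings. The helper names (`tokenize`, `undefined_words`, `held_out_split`, `acceptance_rate`) and the crude regex tokenizer are hypothetical. The definition-generating model is not shown, and the manual annotations are assumed to arrive as a word-to-bool mapping.

```python
import random
import re


def tokenize(text):
    # Hypothetical, deliberately crude tokenizer: lowercase alphabetic runs,
    # optionally with one apostrophe suffix ("don't", "dog's").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def undefined_words(corpus_texts, dictionary):
    """Words used in the corpus OR inside any definition that have no entry."""
    used = set()
    for text in corpus_texts:
        used.update(tokenize(text))
    for definition in dictionary.values():
        used.update(tokenize(definition))
    return used - set(dictionary)


def held_out_split(dictionary, corpus_words, fraction=0.1, seed=0):
    """Hold out dictionary words that also occur in the corpus.

    The held-out definitions are removed from the training dictionary, so the
    model only ever sees those words in context in the corpus.
    """
    candidates = sorted(w for w in dictionary if w in corpus_words)
    if not candidates:
        return dict(dictionary), set()
    n = max(1, int(len(candidates) * fraction))
    held_out = set(random.Random(seed).sample(candidates, n))
    train = {w: d for w, d in dictionary.items() if w not in held_out}
    return train, held_out


def acceptance_rate(annotations):
    """Fraction of held-out words whose generated definition a human accepted."""
    if not annotations:
        return 0.0
    return sum(annotations.values()) / len(annotations)
```

Usage: build `corpus_words = set(tokenize(...))` over the whole corpus, train on the `train` dictionary, generate definitions for each word in `held_out`, annotate them by hand, and pass the annotations to `acceptance_rate`. Running `undefined_words` again on the generated dictionary covers the "used but not defined" check from the quoted comment, including words that only show up inside definitions.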