Reverse-Engineering Apple Dictionary

binwalk

29 10,144 0.0 Python

Firmware Analysis Tool

For anyone else needing to tackle something like this, it's definitely worth checking out [Binwalk](https://github.com/ReFirmLabs/binwalk). It is meant for extracting firmware, but it works decently well on most data formats that embed files inside other files.
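To make the "files-in-files" idea concrete, here is a minimal sketch of one kind of scan such a tool can perform: walking a binary blob, looking for bytes that resemble a zlib header, and attempting to decompress from each candidate offset. This is not Binwalk's implementation; the function name `find_zlib_streams` is hypothetical, and the sketch assumes the embedded payloads are plain zlib streams using the common 32 KiB window (first byte `0x78`).

```python
import zlib


def find_zlib_streams(blob: bytes, min_size: int = 1):
    """Return a list of (offset, decompressed_bytes) for zlib streams in blob.

    Hypothetical helper for illustration only; assumes streams start with
    a 0x78 CMF byte (deflate, 32 KiB window).
    """
    results = []
    offset = 0
    while offset < len(blob) - 1:
        # Per RFC 1950, the two header bytes read as a big-endian integer
        # must be a multiple of 31.
        header = (blob[offset] << 8) | blob[offset + 1]
        if blob[offset] == 0x78 and header % 31 == 0:
            decomp = zlib.decompressobj()
            try:
                data = decomp.decompress(blob[offset:])
            except zlib.error:
                # Looked like a header but was not a valid stream.
                offset += 1
                continue
            # eof is True only if a complete stream was decoded.
            if decomp.eof and len(data) >= min_size:
                results.append((offset, data))
                # Skip past the bytes this stream consumed.
                offset = len(blob) - len(decomp.unused_data)
                continue
        offset += 1
    return results


if __name__ == "__main__":
    # Synthetic container: a made-up header, two compressed payloads, padding.
    sample = (
        b"HEADER"
        + zlib.compress(b"<entry>hello</entry>")
        + b"\x00\x00"
        + zlib.compress(b"<entry>world</entry>")
    )
    for off, payload in find_zlib_streams(sample):
        print(off, payload)
```

Slicing the blob at every candidate offset is wasteful on large files; a real tool would work on a memory-mapped buffer and recognise many more signatures than this one.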

osx-dictionary

1 46 0.0 Objective-C++

CLI for OSX Dictionary.app

Thank you for posting this code on GitHub! There has been some reverse-engineering done on the language dictionaries bundled with Mac OS, and it's nice to know that the same model is being used on the Apple Watch!
https://josephg.com/blog/reverse-engineering-apple-dictionar...
There's also a command-line tool that can query the dictionary:
https://github.com/takumakei/osx-dictionary

MacOSX-SDKs

5 2,485 0.0

A collection of those pesky SDK folders: MacOSX10.1.5.sdk thru MacOSX11.3.sdk

Another approach is to explore the format through Apple's own tools for building dictionaries: Apple still provides a "Dictionary Development Kit" in Xcode's downloadable "Additional Tools" package, which includes documentation for the XML format.
It turns out that dictionary bundles are entirely supported by system APIs in CoreServices! The APIs are private, but Apple accidentally shipped a header file with documentation for them in the 10.7 SDK [1].
[1] https://github.com/phracker/MacOSX-SDKs/blob/master/MacOSX10...

icu

15 2,503 9.7 C++

The home of the ICU project source code.

No, the ICU dictionaries can be found at: https://github.com/unicode-org/icu/tree/main/icu4c/source/da...
I have no idea where the corresponding files are on OS X.

declensions

1 5 0.0 HTML

Russian Declension-o-matic - search for declension tables on Wiktionary

> Otherwise I think it lacks structure and can't be harvested automatically easily
Indeed, it depends on the language and your goals - I had a very high success rate plucking out Russian grammatical tables from English Wiktionary with a few hours of scripting the data cleaning (https://github.com/thombles/declensions). I have a theory that you could get better results using an offline archive of the page sources but haven't tried this yet.
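As a rough illustration of the kind of scripting involved, the sketch below pulls the cells out of HTML tables using only the Python standard library. It is not the code from the linked repository; the class and function names are hypothetical, and it assumes flat (non-nested) tables, which real Wiktionary pages do not always have.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace left over from the page markup.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html: str):
    """Hypothetical convenience wrapper around TableExtractor."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Most of the data-cleaning effort would then go into deciding which of the returned tables is the declension table and normalising its header cells, which depends on the language and the page layout.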

dictionary-api

1 13 0.0 JavaScript

I also found this Dictionary API, which imports the dictionaries into Node.js using a utility called "dedict".
https://github.com/nikvdp/dictionary-api/blob/master/convert...

apple-peeler

2 25 0.0 Python

Extract XML from the OS X dictionaries.

Thank you, I didn't know about Binwalk.
I used it and, thanks to you and the other tips in this thread, was able to figure out the remaining bits of the file format.
https://github.com/solarmist/apple-peeler


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
