21 Apr 2015

I’ve discovered that my personalized dictionary files are specific to Android 4.4 KitKat and can only be decoded by following the source code for that version. Confusingly, binary dictionaries come in multiple versions and encodings. I’ve determined that the one I took from my phone is a “version 4” dictionary, which is split across a series of files.

Words and frequencies are stored as nodes in an array of bytes. Each node contains the address of its child in the trie, the address of its parent, the characters of the word, and the word’s frequency. The difficult work that remains is to write the code that loops through the byte array and parses node data.
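To make the shape of that loop concrete, here is a minimal Python sketch. It does not use the real version 4 layout, which I still have to work out from the source. The field order, the field sizes, the `NO_ADDRESS` sentinel, and all function names are assumptions made up for illustration.

```python
import struct
from typing import Dict, List, NamedTuple

# Hypothetical sentinel meaning "no parent" or "no child".
NO_ADDRESS = 0xFFFFFFFF

# Assumed node header: parent address (uint32), child address (uint32),
# frequency (uint8), number of character bytes (uint8), all big-endian.
HEADER = struct.Struct(">IIBB")


class Node(NamedTuple):
    offset: int      # where this node starts in the byte array
    parent: int      # address of the parent node, or NO_ADDRESS
    child: int       # address of the child node, or NO_ADDRESS
    frequency: int
    chars: str       # the characters stored in this node


def encode_node(parent: int, child: int, frequency: int, chars: str) -> bytes:
    """Build one node in the assumed layout (used here only to make test data)."""
    raw = chars.encode("utf-8")
    return HEADER.pack(parent, child, frequency, len(raw)) + raw


def parse_nodes(buf: bytes) -> List[Node]:
    """Walk the byte array from start to end, decoding one node at a time."""
    nodes = []
    pos = 0
    while pos < len(buf):
        parent, child, frequency, n = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        chars = buf[start:start + n].decode("utf-8")
        nodes.append(Node(pos, parent, child, frequency, chars))
        pos = start + n  # the next node begins right after the characters
    return nodes


def word_at(nodes_by_offset: Dict[int, Node], offset: int) -> str:
    """Rebuild a word by following parent addresses back to the root."""
    parts = []
    while offset != NO_ADDRESS:
        node = nodes_by_offset[offset]
        parts.append(node.chars)
        offset = node.parent
    return "".join(reversed(parts))
```

With two hand-built nodes, "th" at offset 0 and its child "e" at offset 12, `parse_nodes` returns both nodes and `word_at` joins them into "the" by following the parent address. The real parser will need different field sizes and flags, but the overall pattern of reading a header, reading the characters, and advancing the position should carry over.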

I’m working in a sandbox folder which has all the Android source files I need to decode the binary dictionary.