The dictionary table is held in static memory and its byte address is stored in the word at $08 in the header.
The table begins with a short header:
      n     list of keyboard input codes    entry-length   number-of-entries
     byte   ------n bytes-----------------      byte          2-byte word

The keyboard input codes are "word-separators": typically (and under Inform mandatorily) these are the ZSCII codes for full stop, comma and double-quote. Note that a space character (32) should never be a word-separator. The "entry length" is the length of each word's entry in the dictionary table. (It must be at least 4 in Versions 1 to 3, and at least 6 in later Versions.)
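The header layout above can be read with a few lines of code. The following is only a sketch: the names `DictHeader` and `read_dict_header` are hypothetical, the story file is assumed to be loaded whole into a `bytes` object, 2-byte words are read most-significant byte first, and the number of entries is treated as unsigned.

```python
from typing import NamedTuple


class DictHeader(NamedTuple):
    """Hypothetical container for the fields of the dictionary header."""
    separators: bytes   # ZSCII codes of the word-separators
    entry_length: int   # length in bytes of each word's entry
    num_entries: int    # number of entries (assumed unsigned here)
    entries_addr: int   # byte address of the first word entry


def read_dict_header(memory: bytes, header_word_addr: int = 0x08) -> DictHeader:
    # The word at $08 in the story file header holds the byte address
    # of the dictionary table.
    dict_addr = int.from_bytes(memory[header_word_addr:header_word_addr + 2], "big")
    n = memory[dict_addr]                               # count of separators
    separators = memory[dict_addr + 1:dict_addr + 1 + n]
    entry_length = memory[dict_addr + 1 + n]
    num_entries = int.from_bytes(memory[dict_addr + 2 + n:dict_addr + 4 + n], "big")
    # Word entries follow immediately after the dictionary header.
    return DictHeader(separators, entry_length, num_entries, dict_addr + 4 + n)
```

A caller would pass the whole of memory and then step through the entries from `entries_addr` in units of `entry_length`.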
2.1
Note that the word-separators table can only contain codes which are defined in ZSCII for both input and output.
In Versions 1 to 3, each word has an entry in the form
     encoded text of word       bytes of data
     ------- 4 bytes ------     (entry length-4) bytes

The interpreter ignores the bytes of data (presumably the game's parser will use them). The encoded text contains 6 Z-characters (it is always padded out with Z-character 5's to make up 4 bytes: see S 3). The text may include spaces or other word-separators (though, if so, the interpreter will never match any text to the dictionary word in question: surprisingly, this can be useful and is a trick used in the Inform library).
In Versions 4 and later, the encoded text has 6 bytes and always contains 9 Z-characters.
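As an illustration of the two entry sizes, here is a minimal sketch of dictionary-form encoding. It is deliberately incomplete: it handles only the lower-case letters a to z (alphabet A0, Z-characters 6 to 31) and rejects anything else, since digits and punctuation need the shift sequences described in the text-encoding section. The function name `encode_dictionary_word` is hypothetical. Three 5-bit Z-characters are packed into each 2-byte word, and the sketch assumes the usual string convention that the top bit is set on the final word.

```python
def encode_dictionary_word(word: str, version: int) -> bytes:
    """Sketch: encode a word of letters a-z into dictionary form."""
    num_zchars = 6 if version <= 3 else 9   # 4 bytes in V1-3, 6 bytes in V4+
    zchars = []
    for ch in word.lower()[:num_zchars]:    # longer words are truncated
        if not "a" <= ch <= "z":
            raise ValueError("this sketch only handles the letters a-z")
        zchars.append(ord(ch) - ord("a") + 6)
    # Pad out with Z-character 5 to the full length.
    zchars += [5] * (num_zchars - len(zchars))

    out = bytearray()
    for i in range(0, num_zchars, 3):
        packed = (zchars[i] << 10) | (zchars[i + 1] << 5) | zchars[i + 2]
        if i + 3 == num_zchars:
            packed |= 0x8000                # end-of-string bit on the last word
        out += packed.to_bytes(2, "big")
    return bytes(out)
```

Under these assumptions, a short word and a long word beginning with the same letters differ first at the point where the pad character 5 meets a letter value of 6 or more.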
fred / , / go / fishing
6.2
Each word is then encoded as a Z-machine string in dictionary form, and searched for in the dictionary.
6.3
A "parse table" is then written, recording the number of words, the length and position of each word and the dictionary address of each word which is recognised. For the format, see the read opcode.
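The division into words can be sketched as follows. This is an assumption-laden illustration, not the format written by the read opcode: the function name `split_words` is hypothetical, positions are 0-based offsets into a Python string rather than text-buffer positions, and the separators default to full stop, comma and double-quote. Spaces divide words but are not words themselves, whereas each word-separator also counts as a word in its own right.

```python
def split_words(text: str, separators: str = '.,"'):
    """Sketch: return a list of (word, position) pairs for the input text."""
    words = []
    start = None                      # offset where the current word began
    for i, ch in enumerate(text):
        if ch == " " or ch in separators:
            if start is not None:     # close the word in progress
                words.append((text[start:i], start))
                start = None
            if ch != " ":             # a separator is a word by itself
                words.append((ch, i))
        elif start is None:
            start = i
    if start is not None:             # word running up to the end of the text
        words.append((text[start:], start))
    return words
```

The number of words is then `len(words)`, and each word's length is the length of its text. Each word would next be encoded and looked up in the dictionary.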
It is essential that dictionary entries are in numerical order of the bytes of encoded text so that interpreters can search the dictionary efficiently (e.g. by a binary-chop algorithm). Because the letters in A0 are in alphabetical order, because the bits are ordered in the right way and because the pad character 5 is less than the values for the letters, the numerical ordering corresponds to normal English alphabetical order for ordinary words. (For instance "an" comes before "anaconda".)
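A binary-chop search over the sorted entries might look like the sketch below. The function name `find_entry` and its parameters are hypothetical, and the search key is assumed to be already encoded (4 bytes in Versions 1 to 3, 6 bytes later). Comparing Python `bytes` objects of equal length byte by byte gives the same result as comparing them as binary numbers with the most significant byte first, which is the ordering the entries are assumed to follow.

```python
def find_entry(memory: bytes, entries_addr: int, entry_length: int,
               num_entries: int, encoded: bytes):
    """Sketch: return the byte address of the matching entry, or None."""
    key_len = len(encoded)
    lo, hi = 0, num_entries - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        addr = entries_addr + mid * entry_length
        candidate = memory[addr:addr + key_len]   # encoded text of this entry
        if candidate == encoded:
            return addr
        if candidate < encoded:
            lo = mid + 1                          # key lies in the upper half
        else:
            hi = mid - 1                          # key lies in the lower half
    return None                                   # word not in the dictionary
```

The bytes of data after the encoded text are skipped over by the `entry_length` stride and never examined, in keeping with the interpreter ignoring them.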
Both Infocom and Inform-compiled games contain words whose initial character is not a letter (for instance, "#record").
Linards Ticmanis reports that some of Infocom's interpreters convert question marks to spaces before lexical analysis. This is not Standard behaviour. (Thus, typing "What is a grue?" into 'Zork I' no longer works: the player must type "What is a grue" instead.)