8   Assembly of code, text and data structures
        ------------------------------------------

    8.1   Assembly of initial code
          ------------------------

The front end of "asm.c" consists of three routines to assemble routine
start and finish, and to place labels; plus many routines for assembling
individual instructions.  These all fill out data in the
assembly_instruction AI and then call a single assemble-instruction routine.
Thus, the back end of "asm.c" contains four operations to place data into
the Z-machine's code area:

    Routine start           assemble_routine_header()
    Routine end             assemble_routine_end()
    Label                   assemble_label_no()
    Instruction             assemble_instruction()

For the format of Z-code instructions, see the Z-Machine Standard Document,
which goes into the Z-machine's format with exhaustive thoroughness.  Note
that Inform 6 is much more compliant to the standard (so far as I know,
perfectly compliant) than earlier releases: for example, the Flags 2 bits
for "UNDO is needed" and so on are set correctly for the first time.

One of the few pieces of information fed back to Inform's higher levels by
"asm.c" is a flag indicating whether execution can possibly reach the next
instruction or not.  The assembly of certain instructions (returning from
a routine, terminating the Z-machine, unconditionally jumping) sets the
flag

    execution_never_reaches_here

while the placing of a label removes it again (because by branching to that
label it may be possible to reach this point after all).  This is used to
issue dead code warnings (and also to remove some unnecessary branches
when code-generating "if"..."else"...).

Assembling a routine header is straightforward: note that an option exists
here to assemble some code along with it to print out tracing information.
For this reason, and also to print out assembly trace at compile time,
"asm.c" accesses the variable names in the symbols table.

All labels are referred to in Inform's higher layers by number, rather than
by address; only "asm.c" has any access to what their addresses are.  Labels
are numbered from 0 upwards, but the assembler also recognises

    -2    branch only to the next instruction (this converts a branch
          instruction such as @get_child to a non-branch one, since it
          causes the instruction to behave identically whether the branch
          is made or not)
    -3    return false rather than branch
    -4    return true rather than branch

The code -1, used higher up for "no branch intended", should never descend
this far.

Note that the Z-machine has no formal concept of "routine end".  Therefore,
assemble_routine_end() only generates code to cause a return (and only then
if the present instruction position can be reached).  By long-standing
(if unfortunate) Inform convention, the value returned in this case will
be 0 for a routine in an object definition, or 1 for any other routine.

The routine-start/end routines also keep track of the usage of local
variables and labels, and issue warnings about those which were declared
but not used.  Since the scope of these symbols is restricted to the routine
in which they are defined, label names are unassigned (i.e. reset to
UNKNOWN_SFLAG CONSTANT_T's).  Local variable names are not held in the
symbols table and are automatically cleared when the next routine starts.
Finally, we can now detect the use of a label name not defined in the routine,
so we issue an error message if this has happened.


What happens in this first phase of code assembly is that a form of Z-code
called "initial code" is stored in a temporary holding area of memory until
a routine has been completed.  It is then treated by the branch optimiser
(see the next section).

"Initial code" is the same as Z-code, except that:

    (a)  all branches to labels have long form, and instead of an offset
         the branch value is the label number within this routine
         (0, 1, 2, ...);
    (b)  LABEL operands (to "jump" instructions) are, similarly, label
         numbers of the label to branch to;
    (c)  parallel to the holding area is a uchar[] buffer of the same size,
         holding a "marker value" for each byte.

(Note that branches to -2, -3 and -4 are assembled as short-form branches
and not marked as BRANCH_MV.)  The marker values are as follows:

    BRANCH_MV        This byte is the first of two in a branch operand
    LABEL_MV         Ditto but in a LABEL operand
    DELETED_MV       (Inserted by the optimiser, rather than at initial
                     code assembly.)  This byte has been deleted and should
                     be skipped over when the code is transferred to
                     long-term storage.
    a "module value" marker value
                     This byte is the first of two in a LONG_CONSTANT_OT
                     operand which should be marked with this value in
                     module compilation
    ditto + 128      Ditto but a one-byte SHORT_CONSTANT_OT or VARIABLE_OT
                     operand


    8.2   Branch offset backpatching and optimisation
          -------------------------------------------

There are three phases of this process:

    (i)   Deciding whether each branch should or should not be optimised
          to the 1-byte short form, using the label positions in the
          initial code;
    (ii)  Recalculating the label positions to take account of this
          (which will tend to move the labels closer together as second
          bytes of branch operands are deleted);
    (iii) Transferring the initial code to long-term storage, replacing
          branch and LABEL operand values with address offsets and
          issuing module markers as indicated (if in module mode).

A branch can be "short form" if it is a forward branch of between 0 and 29
bytes from the PC position of the next instruction.          

To record a branch as "short form" during (i), the second byte of the branch
operand is marked as DELETED_MV.

The long-term storage alluded to is either temporary file 2 or else the
zcode_area memory block.  Note that the routines moved there are still not
finally correct, but are "raw Z-code": Z-code which has incorrect operands
because value backpatching has not yet occurred.


    8.3   Text translation: ISO and Unicode resolution
          --------------------------------------------

The algorithm encoding ZSCII text to a compressed sequence of 5-bit
"Z-characters" has been documented many times: see the Z-machine standards
document.  

The section of Inform responsible, "text.c", keeps text for three Z-machine
areas: the static strings area, the dictionary and the "low strings pool".
(In addition, as requested by "asm.c", it encodes text into the Z-code area
as part of the assembly instructions @print and @print_ret.)

Static strings are written to a temporary holding area of memory, manipulated
until correct and then transferred to long-term storage: either temporary
file 1 or the static_strings_area memory block.

The low strings pool holds a table of strings in low Z-machine memory used
for abbreviations.  Such strings need to be accessible in an unusual way
(they are addressed by halved byte addresses, not packed addresses) owing
to an anomaly in the Z-machine architecture, and therefore must reside in
the bottom 64K of memory: they cannot be written to the static strings area.


Source text arrives at the text translation routines still encoded in an
ISO extension to the ASCII character set (depending on the current value
of the -C switch).  A character value of $ca might mean Latin E-circumflex,
E-ogonek, Cyrillic capital hard-sign, Arabic Teh or Greek capital Kappa
according to this context.  It would be illegal in plain ASCII or ISO
Hebrew, but since the Inform lexer has already filtered out any illegal
ISO characters, this possibility no longer arises.

Furthermore, characters can be specified by string escapes beginning with
the @ character.  The sequence @^E would specify Latin E-circumflex, and
the sequence @{39A} would specify Greek capital Kappa (by its Unicode
value, $039A) regardless of the current -C setting.

The work of sorting out character codes is handled by "chars.c".  It may
be useful to summarise the different meanings which character codes can
hold within Inform:

  (1)  "Source".  Raw source files can contain any character values
       between $00 and $ff.  The lexer filters these characters to reduce
       them to:

  (2)  "ISO".  An unsigned 8-bit character code within which the values
       $20 to $7e always have their usual ASCII meanings, and $7f to $9f
       are always undefined, and some values in the range $a0 to $ff
       may have meanings as specified in ISO 8859-1 to -9 if the current
       Inform -C setting is 1 to 9.  If -C is set to 0, "ISO" means the
       same as plain ASCII.

  (3)  "Unicode".  Unicode, or ISO 10646-1, is an unsigned 16-bit
       character code which includes essentially every human script.
       Inform uses it as an intermediate state between the diverse
       ISO possibilities and the final ZSCII result.

  (4)  "Text".  A string of ISO characters to specify a sequence of one
       or more Unicode characters.  Usually, one ISO character is
       translated to one Unicode character: the exception is for string
       escapes.  For instance, the text "a@{39a}c" specifies a sequence
       of 3 Unicode characters.

  (5)  "ZSCII".  An unsigned 10-bit character code used within the
       Z-machine.  See the Standards document for its definition: note
       that between $20 and $7e, it agrees with ASCII.

  
There are several translation routines in "chars.c" to move between these
forms: for instance, "zscii_to_text" is used when printing out the
contents of the dictionary into a transcript file.  (Transcript files
use the same ISO setting as the original source code.)  However, the
main sequence of translation is:

       Source -------> Text -------> Unicode -------> ZSCII
              lexer         ISO-to-          Unicode-to-
              filtration    Unicode;         ZSCII
                            string escape
                            parsing

Note that the first two stages (lexer filtration and ISO to Unicode)
depend only on the current -C setting, and are always possible.  The
third stage (Unicode to ZSCII) is more complex because it is programmable
and may fail, depending on what the text is needed for.

Unicode already specifies over 30000 different characters and
compromise is inevitable in trying to squash this into the 1024 possible
ZSCII values.  The Z-machine's stock of "extra characters"
(see Standard Document 1.0) is configured by the story file to
correspond to a block of up to 97 Unicode characters, and these
are the only ones usable in Inform strings, dictionary words or
as quoted values.

Inform sets up this block to a sensible default set given the
current ISO setting.  For example, if Inform reads in source code under
-C6 (ISO Arabic) then it will choose all the Arabic letters from
ISO Arabic.  (This default setting can be overridden using the
"Zcharacter" directive.)  Note that if the designer wants, say,
Unicode $274b ("heavy eight teardrop-spoked propeller asterisk")
as well as plain old $0621 (Arabic letter Hamza), this can simply
be added to the stock accumulated so far.

The stock of extra characters is defined by a list of Unicode values
written out as the "Unicode translation table" in "tables.c".


    8.4   Dictionary management
          ---------------------

As dictionary words are found in the source code, for instance in the value
of "name" properties, or in grammar, or in constants like 'this', they are
added to the dictionary by a routine called dictionary_add(), which numbers
each different word in "accession order": word 0 is the first which was
entered, and so on.

References to dictionary words are actually stored in the Z-machine as
byte addresses within the dictionary table, which cannot be known until after
the compilation pass (when the full alphabetical ordering is known).  Such
references are therefore always marked (with DICT_MV) for backpatching.

During compilation a dictionary table similar to the Z-machine format is
kept: there is a 7-byte header (left blank here to be filled in at the
construct_storyfile() stage in "tables.c") and then a sequence of records,
one per word, in the form

        <Z-coded text>    <dict_par1>  <dict_par2>  <dict_par3>
        4 or 6 bytes       byte         byte         byte

The difference between this and the final Z-machine dictionary table is
that the records occur in accession order, not alphabetical order.  (The
construct_storyfile() routine in "tables.c" rearranges them.)

The Z-machine does not specify what use is made of the three bytes of
"dictionary parameters" (it doesn't even specify that there have to be only
three): for this, see the next section.


The type "dict_word" is used for a structure containing the (4 or) 6 bytes of
Z-coded text of a word.  Usefully, because the PAD character 5 is < all
alphabetic characters, alphabetic order corresponds to numeric order of
Z-encoding (regarding the Z-encoded bytes as a 32 or 48 bit big-end-first
number; sign is unimportant as the initial bit is never set).  For this
reason, the dict_word is called the "sort code" of the original text word.

A special text-translation routine is used to "prepare" such a sort code:
as dictionary words do not use every feature possible in ordinary text
translation (abbreviations, for instance, or two of the string escape
forms) it's worth having a separate routine, optimised for speed.  Note
that this routine makes use of "text_to_unicode" and "unicode_to_zscii"
in "chars.c" when, but only when, it hits a string escape character @.
(For ordinary ISO characters, when no string escape is hit, these routines
would be too slow.)


Since dictionaries typically contain 500 to 2000 words, speed is extremely
important when searching and sorting: a simple O(n^2) algorithm to search the
dictionary, for example, greatly increases Inform's total compilation time on
medium-size games.  (Here "n" is the number of dictionary words.)

Inform 1 to 3 used a crude O(n^2) mass comparison algorithm, which was
unacceptably slow.  Inform 4 to 6.05 used a form of hash table, with the
first character of a word as hash function (there were 27 hash values, for
A to Z plus "other"): this behaved very acceptably most of the time, but was
ultimately still O(n^2) and the uneven distribution of initial letters in
English meant that the hashing function was not ideal.

Inform 6.1 instead stores the dictionary as a "red-black tree".  (See, e.g.,
Robert Sedgewick, "Algorithms" (1983, second ed. 1988).)  The tree is binary
and always has the property that at node X, anything hanging from the left
branch of X is alphabetically before what's at X, and anything hanging from
the right is alphabetically after.  For instance, a legal configuration is:

                             fieldmouse
                               /    \
                          abacus   patrician
                          /   \
                         a    constantinople

This is simple enough to build and to search through.  The problem is that
one usually ends up with a haphazardly arranged or "unbalanced" tree in which
some branches are very much longer than others, so that searching times can be
unpredictable, and in worst case the construction process is still O(n^2).

Thus one wants a method of building the tree which attempts to restore the
balance as it does so.  The algorithm here involves "colouring" each branch
either red or black (that is, storing one bit of information for each branch).
Roughly speaking, suppose X is added as a branch of Y.  Then the branch
leading down from Y to X -- the new branch -- is coloured black.  But the
branch leading down to Y -- which was already there -- is recoloured red.

We do not permit the tree to contain two consecutive red branches: whenever
we notice this happening, we perform a "rotation".  A typical rotation is:

                  a                        a
                   \                        \
                    bag                      cat
                   / \(red)    --->     (red)/ \(red)
                 baa cat                  bag   dog
                     / \(red)            /  \
                   bun   dog           baa  bun

It's like lifting the "bag" subtree and re-hanging it from the word "cat".
(There is another case, when the red branches slope in opposite directions.)

We also try to percolate redness upwards (redness being the possibility for
change in the tree structure).  So, when searching, if we ever find

                 \(black)
                   node
             (red)/    \(red)

then we replace it with

                 \(red)
                   node
           (black)/    \(black)

and check the "rotate if you get two consecutive red branches" rule again.

It is known that:
(i)  if there are N words in the tree, then any search takes less than
     2 log-to-base-2(N) + 2 comparisons (i.e., this is the maximum depth
     of the tree);
(ii) at worst you need to perform only one rotation for every four
     comparisons.
In practice, the observed values are even better: the average seems to
be log-to-base-2(N) comparisons and 1 rotation (though this is unproven).
Trees are almost optimally balanced most of the time.

For an Inform game with a dictionary of 2000 words, then, we expect to perform
about 12 comparisons per search.  This compares with a typical figure of 75
using the Inform 6.05 hashing algorithm.  Comparisons are not terribly
expensive, but searches are quite frequent.


    8.5   Format of tables unspecified by the Z-machine
          ---------------------------------------------

The Z-machine standards document requires many tables to have particular
formats and locations: but Inform makes several other tables, and this
section is to document their formats.  (See the construct_storyfile()
routine in "tables.c" for exactly how the byte-addressable Z-machine memory
is put together.)

Details of differences in modules as compared with story files are given
later.

    (i)  Header

Following standard conventions, Inform: leaves bits 2 and 3 of "Flags 1"
clear; places the release number in the word at bytes 2 and 3; the grammar
table address at bytes 14 and 15; and the serial number in bytes 18 to 23. 
The default release number is 1 and the default serial number is either
970000 (if no access to the current date is available) or the compilation
date in the form YYMMDD.

In addition, Inform writes its version number in ASCII form into bytes
60 to 63 (the last four bytes of the header) in the form "6.01".  This
makes story files produced by Inform 6 distinguishable from all other
story files, something interpreter writers have long wanted.

    (ii)  Low strings pool and abbreviations table
    
By default, the 96 entries are all "   " (this Z-encodes to the $80 $00,
which makes it a convenient default value).  The first two banks of 32
are used for strings declared by Abbreviate; the third is set at run time
by the "string" statement, to values which have previously been declared
by Low_string.

The pool of low strings is always placed at $0040, immediately after the
header and before the abbreviations table.

    (iii)  Header extension table

The Z-machine standard allows games not to provide a header extension
table (formally called the "mouse data table" after the only information
which Infocom ever stored in it) and Inform 6.01 to 6.11 never do.
Inform 6.12 and later always generate this extension, which always
contains at least 3 word entries.

    (iv)  Character set table

This is a table of 78 = 3*26 bytes, giving the Z-character numbers of
the characters to appear in the three translation alphabets.  The table
is only written if the "Zcharacter" directive has been used to alter
the standard settings.  (See section 12.2.)

    (v)  Unicode translation table

This is generated (by Inform 6.12 and later) only when needed, i.e., only
when a game has asked to use a stock of "extended characters" which
differs from the usual ISO Latin1 set.  Games compiled under the ISO
setting -C0, -C1 (the default setting) and -C9 will usually never
have a Unicode translation table, though they can have.

    (vi)  Property default values table

Note that Inform defines properties 1, 2 and 3 automatically: 1 is "name",
while 2 and 3 are used to support the veneer routines' implementation of
object orientation.

    (vii)  Class-number-to-object-number table
    (viii)  Individual property values table

See the notes on object orientation; these tables are used by the veneer
and are unspecified by the Z-machine.

    (ix)  Grammar table
    (x)  Actions table
    (xi)  "Preactions" table
    (xii)   "Adjectives" table

See the next section.  The Z-machine specifies none of these tables.

    (xiii)   Dictionary table

The format of this table is largely specified by the Z-machine.  Although
the Z-machine allows some flexibility in setting up the dictionary, Inform
sticks to the usual Infocom configuration:

    the separating characters are full stop, comma and space;
    the word-entry length is 7 (in version 3) or 9 (otherwise).

Thus the dictionary begins with seven header bytes:

    3  '.'  ','  ' '  7-or-9  <word containing the number of entries>

Each word is encoded with an entry giving 4 (in version 3) or 6 (otherwise)
bytes of textual representation, fully specified by the Z-machine, and
then three bytes of data, which Inform can do with as it pleases.

These are the so-called "dictionary parameters" dict_par1, 2 and 3.  Inform's
pleasure is to write into them as follows:

    dict_par1: flags
    dict_par2: verb number (counting downwards from 255)
    dict_par3: preposition number (counting downwards from 255) in
               grammar version 1; not used in grammar version 2

The flags are given as follows:

    bit:    7      6   5   4   3     2        1      0
            <noun>             <adj> <plural> <meta> <verb>

The bits <verb>, <noun> and <adj> are set if the word can be used in the
context of a verb, noun and/or preposition (all three can be simultaneously
set).  The <meta> bit indicates that the English verb is "meta", that is,
is a command to the program and not a request in the game.

The <plural> bit is set using the '...//p' notation, like so:

    'egg'  'eggs//p'

Library 6/3's parser uses this to automatically detect commands referring
to multiple game objects.

    (xiv)  Module map

This is an extension to the header, only used in module files (see later).

    (xv)  Linking data table

A table placed after the end of static strings, but only in module files
(see later).

To summarise, then, an Inform game has the following memory map:

      -----------------------------------------------------------
      Dynamic      Header
       memory      Low strings pool
                   Abbreviations table
                   Header extension                   
                   Character set table (if differs from normal)
                   Unicode table (if differs from normal)
                   Common property default values                  (*)
                   Object tree
                   Common property value tables                    (*)
                   Class number to object number conversion table
                   Property names table                            (+)
                   Attribute names table                           (+)
                   Array names table                               (+)
                   Individual property value tables                (*)
                   Global variable values, part of...              (*)
                   ...Dynamic array space                          (*)
      -----------------------------------------------------------
      Static       Table of grammar table addresses
      byte         The grammar tables (one for each Inform verb)   (-)
      addressed    Table of action routine addresses               (+)
      memory       Table of parsing routine addresses              (+)
                   Table of "adjectives", i.e., prepositions       (+)
                   Dictionary
                   (Module map)
                   Synchronising gap of up to 7 null bytes
      -----------------------------------------------------------
      Static       Z-code area                                     (*)
      "virtual"    Static strings area
      memory       (Link data table)
      -----------------------------------------------------------
      (*)  This area requires full value backpatching
      (+)  This area requires a simple automatic correction
           (e.g. adding on the static strings or code offsets),
           which is done without the full backpatching machinery
      (-)  In grammar version 1, this area requires no backpatching,
           but in GV2 each grammar token's data is checked to see
           if a dictionary word or routine address, and backpatched
           if so: so in GV2 it comes under category (+)


    8.6   Grammar version numbers GV1 and GV2
          -----------------------------------

The "grammar tables", used by the parser in an adventure game, are not
specified by the Z-machine at all (contrary to popular opinion) and must
therefore be fully specified here.  There are four such tables:

    Grammar table
    Actions table
    "Preactions" table
    "Adjectives" table

Inform can generate two different formats for these tables, known as
grammar version 1 (henceforth GV1) and grammar version 2 (GV2).

Inform makes the GV1 or GV2 decision according to the value of the symbol

    Grammar__Version

which Inform usually defines as 1, but allows programs to redefine.  Library
6/3 and later, in particular, contain the line

    Constant Grammar__Version = 2;

Note that it is essential for the library's parser to understand the
grammar tables being generated by Inform, so attempting to use GV2 with
library 6/2 or earlier would fail.


The designs of GV1 and GV2 have some points in common.  The Grammar table
is a --> array containing addresses of grammars, one for each Inform verb.
The Actions table is a --> array containing addresses of action routines
(such as "TakeSub"), one for each action.  (Fake actions have no action
routines and aren't in the table.)

(The term "Inform verb" means essentially a common parsing grammar, such as
that shared by "take" and "hold", and is used to distinguish this from an
"English verb", such as the word "take".)


GV1 is at heart an imitation of middle-period Infocom table formats, and
was used in order that common debugging tools would still work on Inform
games (an important consideration in the early days of debugging Inform 1).
In GV1, an individual grammar table has the format:

    <number of grammar lines>          1 byte

followed by that many 8-byte lines in the form:

    <parameter count>  <token1> ... <token6>  <action number>
    -- byte ---------  --byte--     --byte--  -- byte -------

The parameter count is the number of tokens which are not prepositions.
This is needed because the run of tokens is terminated by null bytes (00),
which is ambiguous: "noun" tokens are also encoded as 00.

The numerical token values are as follows:

    "noun"            0  
    "held"            1  
    "multi"           2  
    "multiheld"       3  
    "multiexcept"     4  
    "multiinside"     5  
    "creature"        6  
    "special"         7  
    "number"          8
    noun=Routine      16 + parsing-routine-number
    Routine           48 + parsing-routine-number
    scope=Routine     80 + parsing-routine-number
    attribute         128 + attribute number
    'preposition'     adjective number

    illegal values:   9-15, 112-127

This one-byte value has to identify particular prepositions and routines,
which is only possible using a numbering system for each.  GV1 numbers
parsing-routines upwards from 0 to 31, in order of first use.  A separate
table translates these into routine packed addresses: the "preactions"
table.  (As usual, the term is traditional: Inform has no concept of
preaction, but the Infocom games from which it inherits GV1 do have such a
concept.)  The preactions table is a simple --> array.

(Note that in Infocom's games the preactions table always has the same
length as the actions table: this is not true in either GV1 or GV2 Inform
games.)

Prepositions are also identified by their "adjective number".  (An early
misunderstanding in Z-machine decipherment led to the traditional use of
the word "adjective" for dictionary words explicitly written into grammar
lines, which are mainly prepositions like 'into' or 'against'.)  Adjective
numbers count downwards from 255 in order of first use.  They are translated
back into dictionary words using the "adjectives table".

The adjectives table contains 4-byte entries:

    <dictionary address of word>  00  <adjective number>
    ----2 bytes-----------------  ----2 bytes----------- 

To make life more interesting, these entries are stored in reverse order
(i.e., lowest adjective number first).  The address of this table is
rather difficult to deduce from the file header information, so the constant
#adjectives_table is set up by Inform to refer to it.


GV2 is a much more suitable data structure, easier to read and write,
less limiting and marginally faster and more economical of Z-machine and
Inform memory.  In GV2 an individual grammar table has the format:

    <number of grammar lines>          1 byte

followed by that many grammar lines.  Individual lines are no longer always
8 bytes long, as in GV1.  Instead they have the form:

    <action number>  <token 1> ... <token N>  <ENDIT>
    ----2 bytes----  -3 bytes-     -3 bytes-  -byte--

The action number is actually contained in the bottom 10 bits of the word
given first: the top five are reserved for future use, which leaves

    action_number & $400

as a bit meaning "reverse the order of the first and second parameters
if this action is to be chosen".

The ENDIT marker is the number 15.  There can be anything from 0 to 30
tokens, and each occupies three bytes, arranged as:

    <token type>   <token data>
    -- byte ----   --2 bytes---

Token type bytes are divided into the top two bits, the next two and the
bottom four.

The "next two bits" are used to indicate alternatives.  In a sequence of
tokens

    T1 / T2 / T3 / ... / Tn

then T1 will have $$10 in its "next two bits", and each of T2 to Tn will
have $$01.  Tokens not inside lists of alternatives always have $00.  (Note
that at present only prepositions are allowed as alternatives, but the
format is designed to open the possibility of extending this to all tokens.)

The bottom four are the "type" of the token.  The top two indicate what kind
of data is contained in the token data.  Strictly speaking this could be
deduced from the bottom six bits, but it's convenient for making backpatching
GV2 tables a simple matter within the compiler.

    Type  Means                       Data contains              Top bits
    0     illegal (never compiled)
    1     elementary token            0   "noun"                 00
                                      1   "held"
                                      2   "multi"
                                      3   "multiheld"
                                      4   "multiexcept"
                                      5   "multiinside"
                                      6   "creature"
                                      7   "special"
                                      8   "number"
                                      9   "topic"
    2     'preposition'               dictionary address         01
    3     noun = Routine              routine packed address     10
    4     attribute                   attribute number           00
    5     scope = Routine             routine packed address     10
    6     Routine                     routine packed address     10

GV2 makes no use of "adjective numbers" (so that dict_par3 is always zero
in GV2's dictionary words) and leaves both the adjectives table and the
preactions table empty.

There is one further difference between GV1 and GV2: in GV1, fake actions
are numbered from 256 upwards; in GV2, from 4096 upwards.


Note that although GV2 takes three bytes for a token as compared with GV1's
one byte, omission of the redundant null tokens and adjective table means
that when compiling a small "shell" library game, GV2 actually produces
more economical tables: 1920 bytes as opposed to 2337.

The first two entries in the following table are the real reason for GV2:

                                        Limit in GV1        Limit in GV2

    Prepositions per game               76                  unlimited
    Parsing routines (general ones,
       noun= filters, scope= routines
       all put together) per game       32                  unlimited
    Tokens per grammar line             6                   unlimited
    Actions per game                    256                 4096
    Inform verbs per game               256                 256

In practice the Inform compiler restrains the number of verbs (but that's
an adjustable memory setting) and lines per verb: in Inform 6.05 and earlier,
the maximum number of lines per verb is 20.  Inform 6.10 internally stores
grammar roughly as GV2 even when it's going to compile GV1 at the end, and
this allows a more economical use of Inform's memory: as a bonus, then, the
maximum number of lines per verb is raised to 32 in Inform 6.10.


    8.7   Value backpatching
          ------------------

Value backpatching is the final translation phase in Inform.  It is the
process of correcting temporary values which were written into the Z-machine
at a time when final values could not be known.

In addition to the Z-machine regions marked in the memory map above, the
symbols table also undergoes value backpatching: defined Constant symbols
flagged as CHANGE_SFLAG are backpatched as needed, before first use of their
values in backpatching something else.

The positions of such values have been "marked" with a "marker value"
indicating the type of value needed.  The backpatchable marker values are:

    Marker value       Significance of temporary value

    STRING_MV          Scaled address within static strings area
    ARRAY_MV           Byte address within dynamic array area
    IROUTINE_MV        Scaled address within Z-code area
    VROUTINE_MV        Veneer routine number (a *_VR value)
    NO_OBJS_MV         None
    INCON_MV           "Inform constant": the index in the #constants
                       keyword group (a *_SC value)
    DWORD_MV           Accession number of dictionary word
    INHERIT_MV         Byte address within common property values table
                       of the value which is being inherited here
    INHERIT_INDIV_MV   Ditto, but for individual property values table
    INDIVPT_MV         Offset in individual property values table
    MAIN_MV            None
    SYMBOL_MV          Index within symbols table

and these are backpatched as follows:

    Marker value       Final value

    STRING_MV          Packed address of static string
    ARRAY_MV           Byte address
    IROUTINE_MV        Packed address of routine
    VROUTINE_MV        Packed address of veneer routine
    NO_OBJS_MV         Number of objects in object tree
    INCON_MV           Value of constant (typically an address of some
                       Z-machine table)
    DWORD_MV           Byte address of word's entry in dictionary table
    INHERIT_MV         The value to inherit (after it has been backpatched)
    INHERIT_INDIV_MV   The value to inherit (after it has been backpatched)
    INDIVPT_MV         The byte address of this point in the individual
                       property valyes address
    MAIN_MV            Packed address of "Main" routine
    SYMBOL_MV          Value of symbol (after it has been backpatched)

The code NULL_MV is sometimes used to indicate "no marker", i.e., that a
value is correct as written.  Additional marker values also exist for
operands held within modules:

    IDENT_MV           Property identifier number
    ACTION_MV          Action number
    OBJECT_MV          Object number
    VARIABLE_MV        Variable number

(see chapter 10).  Note that modules are not value-backpatched at compilation
time, but at Link time.

Two different memory blocks are used to store backpatch markers: one for the
"Z-machine image", the byte-addressable memory at the bottom of the memory
map; and another for the Z-code area.  In the interests of economy, these use
different formats to hold marker data: see "bpatch.c".


    8.8   Packed address decoding
          -----------------------

Recall that the formula relating a packed address P to its byte address B is:

    B = 2P            version 3
        4P            versions 4 and 5
        4P + 8R       versions 6 and 7: routines
        4P + 8S       versions 6 and 7: static strings
        8P            version 8

In versions 6 and 7, R and S can be chosen fairly freely in the range 0 to
M/8, where M is the minimum address of a routine or static string as
appropriate.  The point is to expand the address space by making higher B
values available (whereas P has to lie within 0 and 65535): so one wants to
make R, S larger rather than smaller.  On the other hand, it's necessary to
the Inform language that the following are all different:

    (a) the number 0;
    (b) valid object numbers;
    (c) packed addresses of routines;
    (d) packed addresses of strings.

To be sure that (c) and (d) differ, Inform sets S = R.  To keep (c) from
(a) and (b), we must ensure that the packed address of every routine is at
least X, where X is slightly more than the largest object number.  Inform
also knows the byte address M of the lowest routine in memory ("Main__").
Note that

    M  =  4X + 8R

Inform therefore nudges up M and X to ensure that M and 4X are both divisible
by 8, and sets

    R  =  (M - 4X)/8
    S  =  R.

To compare this with the code: Write_Code_At is M; extend_offset is X;
scale_factor and length_scale_factor hold the magic constants 4 and 8.
code_offset and strings_offset are the scaled addresses of the first
routine and the first static string respectively, worked out by

    code_offset = 4X
    strings_offset = 4X + size of code area

The reason for having these is that when compilation was actually happening,
Inform did not know enough to calculate packed addresses, and instead
wrote scaled addresses starting from 0 for each area.  In backpatching,
then, Inform adds the above offsets to each packed address, and then they
come right.