6   Syntax analysis 2: the bottom-up expression parser
--------------------------------------------------

6.1  Map and structure
-----------------

It would be possible to continue the top-down parser down into the level
of expressions: for instance, to have a routine parse_plus() which would
parse_expression(), then match a plus sign, then parse_expression()
again, and so on.  This would however be highly inefficient in terms of
function calls, difficult to recover from when an error occurs, and
require either much greater lookahead or rather complicated mechanisms
for storing parse trees.

Expressions (including assignments, conditions and constant values) are
therefore parsed by a different syntax analyser, which is "bottom up" (it
begins by reading in the tokens, and only works out what they mean
afterward) rather than "top down" (starting from an assumption about the
meaning and then trying to find the evidence to justify it).  The
algorithm used is an operator precedence parser, a simple form of
shift-reduce parser.  There is alas no space to describe this beautiful
algorithm here: see ASU, pp. 195-215.  In this account we treat it as a
"black box".

The expression parser "expressp.c" has the structure:

          --------> "Level 4" of      --------> Operator
    tokens          lexical analysis   etokens  precedence
                                                parser
                                                   |
                                                   |  etokens re-ordered
                                                  \|/   into RPN
                                                Emitter
                                                   |
                                                   |  plain parse tree
                                                  \|/
                                          Lvalue and condition
                                                checking
                                                   |
                                                   |  more detailed
                                                  \|/   parse tree
                                                  out

(This is effectively a magnification of the part of the source map in
section 1.2 above which reads just "expressp.c".)  Note that there is a
single input and a single output channel.

RPN is "reverse Polish notation": for example

      (2 + 4)*45 + 34    becomes    2 4 + 45 * 34 +

the idea being that one reads it left to right: each number is added to a
stack of numbers, and each operation takes some numbers off the stack and
puts an answer back.  (E.g., + takes 2 and 4 off and replaces them with
6.  * takes 6 and 45 off, replacing them with 270.
+ takes 34 and 270 off, replacing them with 304, which is the answer.)
The advantage of this is that it is unambiguous which operator acts on
which operands, and the brackets have gone.

The emitter builds a (plain) parse tree out of this sequence, as follows.
It stacks 2 and then 4:

      Stack:  2  4

It reads +, which it knows has two operands, and takes the top two items
off the stack, joining them into a tree segment like so:

        +
       / \
      2   4

It then makes a special value, say E1, meaning "the sub-expression in
this tree", and stacks that:

      Stack:  E1

It then stacks up 45, and reaches *: it takes off the top two values,
which are E1 and 45, and makes

        *
       / \
      E1  45

which it calls E2, and so on.  When the process is finished, we have the
tree:

                          +     <--- this is E3
                         / \
       this is E2 --->  *   34
                       / \
     this is E1 --->  +   45
                     / \
                    2   4

and the stack contains just E3, which is the "answer".

6.2  The operator precedence grammar
-------------------------------

A list of productions for this grammar would be tiresomely long.  It can
be roughly summarised as:

   expression -> ( expression )
                 a single operator acting on some operands in some way
   operand    -> token representing a constant
                 ( expression )
                 expression ( arguments )
   arguments  -> expression , arguments

The tokens representing a constant are given in the next section.  The
operators have tokens as follows:

--------------------------------------------------------------------------
  Level  Lexeme    Usage               Associativity  Purpose
--------------------------------------------------------------------------
    0    ,         binary              left           separating values to
                                                        work out
    1    =         binary              right          set equal to
    2    &&        binary              left           logical AND
    2    ||        binary              left           logical OR
    2    ~~        unary (prefix)                     logical NOT
    3    ==        binary              none           equal to?
    3    ~=        binary              none           not equal to?
    3    >         binary              none           greater than?
    3    >=        binary              none           greater than or
                                                        equal to?
    3    <         binary              none           less than?
    3    <=        binary              none           less than or
                                                        equal to?
    3    has       binary              none           object has this
                                                        attribute?
    3    hasnt     binary              none           object hasn't this
                                                        attribute?
    3    in        binary              none           first obj a child of
                                                        second?
    3    notin     binary              none           first obj not a child
                                                        of second?
    3    ofclass   binary              none           obj inherits from
                                                        class?
    3    provides  binary              none           obj provides this
                                                        property?
    4    or        binary              left           separating
                                                        alternative values
    5    +         binary              left           16-bit signed addition
    5    -         binary              left           16-bit signed
                                                        subtraction
    6    *         binary              left           16-bit signed
                                                        multiplication
    6    /         binary              left           16-bit signed integer
                                                        division
    6    %         binary              left           16-bit signed
                                                        remainder
    6    &         binary              left           bitwise AND
    6    |         binary              left           bitwise OR
    6    ~         unary (prefix)                     bitwise NOT
    7    ->        binary              left           byte array entry
    7    -->       binary              left           word array entry
    8    -         unary (prefix)                     16-bit (signed!)
                                                        negation
    9    ++        unary (pre/postfix)                read/increment or
                                                        increment/read
    9    --        unary (pre/postfix)                read/decrement or
                                                        decrement/read
   10    .&        binary              left           property address
   10    .#        binary              left           property length
   10    ..& (**)  binary              left           individual property
                                                        address
   10    ..# (**)  binary              left           individual property
                                                        length
  11/14  ( )       binary              left           function call/message
                                                        send
   12    .         binary              left           property value
   12    ..  (**)  binary              left           individual property
                                                        value
   13    ::        binary              left           "superclass" operator
--------------------------------------------------------------------------
  (**): Illegal except internally, in Inform's own source for the
        run-time veneer
--------------------------------------------------------------------------

Note that, unlike C, Inform does not provide unary plus.

A binary infix operator is used in the form "X op Y"; a unary prefix
operator in the form "op X"; a unary postfix operator in the form
"X op".

The "level" is the precedence level: the higher this number, the more
tightly an operator is glued to the operands adjacent to it.  Cases
where two binary infix operators have equal precedence are resolved
according to whether that level is left or right associative.  For
instance,

      a / b / c    means    (a/b)/c

since /, on level 6, is left associative, but

      a = b = c    means    a = (b = c)

since =, on level 1, is right associative.
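The interaction of precedence level and associativity in the shift-reduce parser can be sketched with a toy fragment of the table above. This is a minimal Python sketch with illustrative names, not Inform's own code (the compiler itself is written in C):

```python
# Toy fragment of the precedence table, showing how level plus
# associativity drives the shift/reduce decision.

PREC = {",": (0, "left"), "=": (1, "right"), "+": (5, "left"),
        "-": (5, "left"), "*": (6, "left"), "/": (6, "left")}

def reduces_before(on_stack, incoming):
    """Should the operator on the stack be reduced before the incoming
    operator is shifted?"""
    stack_level, stack_assoc = PREC[on_stack]
    incoming_level, _ = PREC[incoming]
    if stack_level != incoming_level:
        return stack_level > incoming_level   # tighter binding reduces first
    return stack_assoc == "left"              # equal level: associativity decides

# a / b / c means (a/b)/c: the first "/" reduces before the second shifts
assert reduces_before("/", "/") is True
# a = b = c means a = (b = c): the first "=" waits for the second
assert reduces_before("=", "=") is False
```

Resolving equal-precedence clashes by associativity is exactly what lets a single table entry per operator stand in for a long list of grammar productions.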
Cases in the above table where the associativity is "none" mean that an
attempt to silently associate the operator will cause an error.  For
example,

      a == b == c

generates an error as being ambiguous, but (a == b) == c and
a == (b == c) are both permitted.

The function-call brackets have an asymmetric precedence level according
to whether they're being compared with something on the left or on the
right: the point is that

      object.message(...)
      function_returning_an_object(...).property

must be recognised as

      (object.message)(...)                            .  12  >  (...) 11
      (function_returning_an_object(...)).property   (...) 14  >  .  12

respectively.

6.3  Lexical level 4: tokens to etokens
----------------------------------

The shift-reduce parser expects tokens sorted out into operators, such
as "+" or "ofclass", operands such as "34" or "score" (assuming that's a
variable) and three tokens of special significance:

      (    )    end-of-expression

"Level 4 of the lexer" does just this.  It converts the stream output
from "lexer.c" (i.e., from level 3) into a stream of tokens from the
following table (and it uses a good deal of context information to do
so).
  Token type           Meaning
-------------------------------------------------------------------------
  OP_TT                Operator: value is a *_OP constant
  LARGE_NUMBER_TT      Operand (number): value is the number
  SMALL_NUMBER_TT      Operand (number guaranteed in the range 0 to
                         255): value is the number
  VARIABLE_TT          Operand: value is Z-machine variable number
  DQ_TT                Operand (text used as static string): text in text
  DICTWORD_TT          Operand (text used as dictionary word): text in
                         text
  ACTION_TT            Operand (##action number): name of action in text
  SYSFUN_TT            Operand (system fn name): value is a *_SYSF
                         constant
  SYSTEM_CONSTANT_TT   Operand (#system_constant): value is a *_SC
                         constant
  SUBOPEN_TT           Open bracket used to open a sub-expression
  SUBCLOSE_TT          Close bracket used to close a sub-expression
  ENDEXP_TT            End of expression
-------------------------------------------------------------------------

The tokens in this output stream are called "etokens", short for
"expression tokens": they are the raw material from which parse trees
are built up.  (Although formally different from ordinary tokens, they
have the same storage type, "token_data", in the Inform source code.)

At first sight this seems an unnecessarily large stock: why not, for
example, compile static strings and dictionary words and convert them to
their addresses here and now?  Why not evaluate system constants and
action numbers now?  Because: (i) these tokens may not be accepted by
the parser, so we can't compile text on the strength of them (and
indeed, the parse tree which is being parsed may not ever be
code-generated from); and (ii) we can't allow ourselves to lose sight of
where numbers are coming from so soon, or backpatching will be
impossible.

The presence of two different tokens for numbers - one for numbers
guaranteed to be unsigned 8 bit, one for numbers with no such guarantee
and whose value may be any signed 16 bit quantity - is due to the
architecture of the target Z-machine.
In Z-code considerable space savings are made by storing small numbers
in 1, rather than 2, bytes.  We are anxious to take advantage of this.
Unfortunately, what seems to be a small number now may not be a small
number later (after backpatching).  Dictionary word values (such as
'word') are internally stored as the accession number of that word into
the dictionary (say 23, if it is the 23rd different word to be entered)
during the compilation pass, and then backpatched later to the byte
address of this word's dictionary entry: but this will be a large
number.  Most operands with non-NULL backpatch markers are forced into
long form, to be on the safe side.  (This includes all operands which
are forward references to constants not yet defined.)  The exceptions
are that variable and action numbers (though not fake action numbers)
are guaranteed always to fit into short form.

There are two tasks involved in translating tokens to etokens:
evaluating tokens representing quantities into one of the "operand"
etokens, and sorting other tokens into the right operator and special
etokens.  The second process is not as easy as it looks because of
ambiguities, and is discussed in the next section.  This section
discusses the evaluation of quantity tokens.

Tokens in expressions are read in a context where the keywords are local
variables, textual operators, system function names and system constant
names.  Other identifiers are symbol names.  So the token types which
must represent quantities are: DQ_TT, SQ_TT, LOCAL_VARIABLE_TT,
NUMBER_TT, SYMBOL_TT, SYSFUN_TT and SYSTEM_CONSTANT_TT.  (Actually, this
omits a detail: the largely obsolete constant forms #a$..., #r$...,
#n$... are SEP_TT tokens converted into the above; and the very much
still with us constant form ##... is a SEP_TT token converted
straightforwardly into the ACTION_TT etoken.)  These all translate
directly into etokens except for SYMBOL_TT and SQ_TT.
SQ_TT requires a little work because of the Inform syntax for
single-quoted strings:

   'x'      means the character code (i.e. ASCII value) for "x"
   '@ss'    means the character code for the German letter "sz" (i.e.
              the code specified in the accented set in the Z-Machine
              standard document)
   'xyzzy'  means the byte address of the dictionary word "xyzzy"

The first two become SMALL_NUMBER_TT etokens.  The third becomes a
DICTWORD_TT etoken.

This leaves SYMBOL_TT tokens: names for constants.  There are two
possibilities: the symbol is so far unknown, or else it has been
assigned a value in earlier syntax analysis.  In most languages it would
be an error to use an unknown symbol name as a value: in C, for example,
a function name cannot be used in an expression unless it has been
declared beforehand.  Inform is more tolerant: as far as expressions are
concerned, any symbol can be used earlier than the point in the source
code where its value is assigned, excepting only a local or global
variable name.  (This exception is mainly for Z-machine architecture
reasons, but it also makes Inform code more legible.)

When an unknown symbol is reached, it is converted to a LARGE_NUMBER_TT
and marked as SYMBOL_MV, with the value being its index in the symbol
table.  (This allows the backpatcher to know where to find the true
value later.)  When a known symbol is reached which is flagged as
CHANGE_SFLAG, this is treated in the same way.  (CHANGE_SFLAG is given
to symbols defined by Constant definitions whose values themselves
require backpatching: for instance, symbols defined equal to dictionary
words or routine addresses.)  Otherwise, the symbol is not only known,
but its current value is known to be correct, and so: if it's a
variable, it's translated to etoken VARIABLE_TT; otherwise the
LARGE_NUMBER_TT/SMALL_NUMBER_TT decision is now made: if it is marked
with a marker other than VARIABLE_MV, it becomes "long"; otherwise the
value is compared with the range 0 to 255.
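The long/short decision just described can be summarised in a few lines. The token-type and marker names below are the manual's own; the function itself is an illustrative Python sketch, not Inform's C code:

```python
# Sketch of the SMALL_NUMBER_TT / LARGE_NUMBER_TT decision.

def number_etoken_type(value, marker=None):
    """A backpatch-marked value may change later, so it must stay in
    long form; only a value known now, and in 0..255, can be short."""
    if marker is not None:        # e.g. SYMBOL_MV: true value known only later
        return "LARGE_NUMBER_TT"
    if 0 <= value <= 255:
        return "SMALL_NUMBER_TT"
    return "LARGE_NUMBER_TT"

assert number_etoken_type(200) == "SMALL_NUMBER_TT"
assert number_etoken_type(-5) == "LARGE_NUMBER_TT"
# a marked value may look small now but be large after backpatching:
assert number_etoken_type(23, marker="SYMBOL_MV") == "LARGE_NUMBER_TT"
```

The conservatism here (forcing marked operands long) trades a little space for the guarantee that backpatching can never overflow a 1-byte operand.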
Any symbol name fed into the shift-reduce parser is flagged with
USED_SFLAG to indicate that its value has at some point been used in an
expression.  (This information enables "This ... was declared but not
used" warnings to be made at the end of compilation.)

6.4  Resolution of ambiguous tokens
------------------------------

It remains to translate to the etokens OP_TT, SUBOPEN_TT, SUBCLOSE_TT
and ENDEXP_TT.  The token types which represent operators are all from
SEP_TT and COND_TT; open and close bracket are also SEP_TT tokens.

There are two main difficulties here.  Firstly, shift-reduce parsers are
unsuitable for resolving ambiguities, and so it is not possible to map
tokens directly into operators.  The ambiguities in the expression
grammar for Inform are:

   (a)  ++ and -- each refer to either a prefix or a postfix operator

   (b)  -  refers to either a unary prefix operator or a binary infix
        one

   (c)  (  and  )  are used both to indicate function calls and
        subexpressions

   (d)  ,  is used both to indicate a sequence of values, each to be
        evaluated and thrown away with only the last one kept, and to
        separate function arguments

   (e)  -> does not represent an operator when parsing constants as
        operand values in a line of "@" assembly language (because it
        needs to mean "the store value goes in")

These are resolved as follows:

   (a), (b)  The "level 4 lexer" watches for context: for instance,
        MINUS_SEP translates to UNARY_MINUS_OP if there was no previous
        token, or it was an open bracket, or an infix or prefix
        operator, or a comma; otherwise it translates to MINUS_OP.

   (c)  Similarly.  If an open bracket token is read, and the previous
        token was an operand of some kind, then it must represent a
        function call (unless we are in constant context); an extra
        etoken is generated for an imaginary infix operator called
        FCALL_OP.  The open bracket token is then translated to
        SUBOPEN_TT as usual; a close bracket is always SUBCLOSE_TT.
Thus, the stream

      Eroica ( Beethoven , 3 )

is translated into the etokens

      Eroica  FCALL_OP  (  Beethoven  ,  3  )

so that a tree will come out of the emitter looking like

          FCALL_OP
           /    \
     Eroica      ,
                / \
        Beethoven  3

although in fact the emitter recognises what is going on and
automatically simplifies this to:

          FCALL_OP
         /    |    \
        /     |     \
   Eroica Beethoven  3

   (d)  A comma at the top level is either disallowed (in constant
        context) or treated as the level-0 comma operator in the table
        above (just as in C).  Otherwise, comma is a separator of
        arguments.  "Top level" means "top bracket level", which the
        "level 4 lexer" keeps track of by counting the bracket tokens
        going through it.

   (e)  There is a special expression context called "assembly
        language", in which "->" is translated to ENDEXP_TT.

The second main difficulty is entirely of the language designer's own
making.  How is the end of an expression to be detected?  In C, it is
always possible when parsing an expression to say "the end will have
been reached when such-and-such a token is reached": for example, ";" or
"}".  This is not possible in Inform.  Even if one knows in advance what
the terminating token ought to be, it might be impossible to recognise
for context reasons.  For example, the statement

      move <expression> to <expression>;

appears to be no problem: the first expression is terminated by the
keyword "to", the second by ";".  But suppose "to" has another meaning,
as the name of a constant or variable?  In this case,

      move 2 + to to to;

is a legal statement.  For all these reasons, Inform actually detects
the end of an expression at lexical level 4 by watching the context
carefully: for instance, in

      move 4 * banana to ...

when "4 * banana" has been passed, the next token is expected to be an
open bracket, an infix or postfix operator, or (perhaps) a comma.  Since
the next token is none of these, the expression has ended.  Note that
this decision can be taken without knowing whether "to" is a keyword or
a symbol.
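The context-watching rule for ending an expression can be sketched as follows. This is an illustrative Python fragment, not Inform's own code, and the token sets are a small subset of the real ones:

```python
# Sketch of detecting the end of an expression by context, as in
# "move 4 * banana to ...": after a complete operand, only an infix or
# postfix operator, or an open bracket (a function call), can continue
# the expression.

INFIX = {"+", "-", "*", "/", "%", "=", ",", "has", "ofclass"}
POSTFIX = {"++", "--"}

def expression_ends_before(token, previous_was_operand):
    if not previous_was_operand:
        return False              # still expecting an operand: not the end
    return not (token in INFIX or token in POSTFIX or token == "(")

# "to" cannot continue "4 * banana", whether it is a keyword or a symbol:
assert expression_ends_before("to", previous_was_operand=True) is True
# "*" can, so the expression carries on:
assert expression_ends_before("*", previous_was_operand=True) is False
```

Note that the decision never asks what "to" means, only what kind of token could legally follow an operand: this is precisely why the rule works whether or not "to" has been redefined as a constant.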
From a grammatical point of view the worst design flaw in the Inform
language is that array initialisations (either in Array directives or
object property values) consist of sequences of expressions given
without any separator in between.  For example,

      Array X --> 2 4 6 * MAX_FROGS 45 ;

This is arguably a convenient shorthand.  Unfortunately, it falls prey
to that predator of all computing grammars, unary minus.  The directive

      Array X --> 10 -5;

is ambiguous.  Is it an array of 10-5 = 5 entries, or an array of two
entries, 10 and -5?  In Inform 5 there was no problem since constants
were not allowed to contain operators anyway (and negative numbers were
represented by single tokens, which is not true in Inform 6): the
directive can only mean that there are two entries, 10 and -5.  In
Inform 6 the other answer is true, because the parser always makes the
longest match it can.  Since this may mean that some Inform 5 code needs
changing in a few places, a warning that "this is ambiguous" is issued
for unbracketed minus signs used in array initialisations.
(Illustrating the problem further, if the user tries to clarify things
with

      Array A --> 10 (-5);

then this is parsed under the normal rules as the function call 10(-5);
therefore another constant-context rule is used to interpret brackets as
always referring to subexpressions, not function calls.)

Similar difficulties at first seem to occur with < and >, and with <<
and >>.  Does

      <<Action 3 -4>>

refer to the pair (Action, -1) or the triple (Action, 3, -4)?  (It is
now obvious that the correct syntax design would be to insist on a comma
in between 3 and -4.)  Fortunately, though, both Informs 5 and 6 apply a
"make the longest expression possible" principle to this case, so they
agree that the answer is (Action, -1).  On the same grounds, the
statement

      <<Action x++ y>>

is understood as applying ++ to x, not to y, because the "longest
expression possible" when parsing the first expression is "x ++".
A similar slight exception is made to the rules for parsing
open-brackets in action statements, so that

      <<Action firstobj (4*counter)>>

is not parsed as having one argument, the result of the function call
firstobj(4*counter).

6.5  Constant folding
----------------

Recall that the emitter takes as input a stream of etokens from the
shift-reduce parser, consisting of the expression re-ordered into RPN.
When the whole stream has been received (i.e., when it has received
ENDEXP_TT), the emitter has to produce an operand as "answer".  For
instance, if the emitter receives just the operand 56, then the answer
is 56.  More often, the answer may be a special value (written above as
E1, E2, ...) meaning "the numerical value has to be worked out by
working through this tree".  (E1 is actually a new type of etoken, of
type TREE_NODE_TT.)

If this were done strictly, then the original text

      4 + 7

would result in the emitter producing the answer E1, where E1 refers to
the expression tree

        +
       / \
      4   7

and this is not a helpful answer, for two reasons: it means the value
couldn't be used as a constant (since it seems to need Z-code to be
compiled to work it out), and even if it's used in code, there's not
much point in compiling the Z-code instruction

      @add 4 7 -> sp;

just to put the value 11 on the stack.  Therefore, if the emitter finds
an operator that it knows how to calculate (such as +), whose operands
are all constants (and which are not unknown for linking reasons), it
eliminates the sub-expression in favour of the result.  This is called
"constant folding": the constant value 4+7 is folded up into 11.

Two issues arise: firstly, arithmetic must be done exactly as it would
be done in the Z-machine, that is, as signed 16-bit arithmetic.
(Otherwise constants may fold differently in one port of Inform or
another.)  Secondly, constant folding cannot safely be performed on a
marked value.  (For otherwise: consider what might happen to the
expression SODS - LAW, at a time when neither SODS nor LAW has been
defined.
Both are evaluated as constants with value their symbol numbers and
marker SYMBOL_MV: the result is folded to the difference between their
symbol numbers, and the marker is lost: i.e., the result is nonsense.)

The emitter is able to fold the following operators:

      +  -  (unary or binary)  *  /  %  &  |  ~
      ==  ~=  >  <  >=  <=  &&  ||  ~~

The inclusion of the conditionals here is quite important: it enables
directives like

      Iftrue #version_number == 3;

to work, since the expression "#version_number == 3" can be parsed in a
constant context.  The inclusion of unary minus is also important, as it
is only by folding this that the text "-45" can be parsed in constant
context.

6.6  Checking lvalues and simplifying the parse tree
-----------------------------------------------

If the emitter produces a parse tree, then it's now modified according
to several rules.  The aim is to remove any ambiguities in the tree (for
instance, "=" does several different things according to what its left
operand is) and to make a final correctness check (for instance, this is
where "45 = 12" results in an error message).

Firstly, the tree is traversed to find instances where a value is used
where a condition was expected.  (If the expression is in a condition
context, the top node is expected to be a condition; the children of a
logical operator &&, || or ~~ are expected to be conditions.)  The
configuration

      <value>

is replaced by

         ~=        (the "not equal to" condition operator)
        /  \
  <value>   0

If the <value> has as root node the "=" operator, a warning is printed
out, since it seems likely that

      if (x = 4) ...

(which would invariably be found true, since the value of "x=4" is 4,
which is non-zero) is a mistype for

      if (x == 4) ...

A second, depth-first, traversal is then made to remove usages of ~~,
using de Morgan's laws:

      ~~(x && y) = ~~x || ~~y        ~~(x || y) = ~~x && ~~y
      ~~(x == y) = x ~= y            ~~(x >= y) = x < y

and so on.  (Note that, internally, every operator etoken has a
corresponding negated operator etoken.)
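The ~~-removal traversal can be sketched as follows. Tree nodes here are Python tuples, and the negated-operator table mirrors the pairing of each operator etoken with its negation; this is an illustrative sketch, not Inform's implementation:

```python
# Sketch of the ~~-removal traversal.  A condition tree is a tuple
# (op, left, right); "~~" is stored as ("~~", child, None).

NEGATED = {"==": "~=", "~=": "==", ">": "<=", "<=": ">",
           "<": ">=", ">=": "<", "&&": "||", "||": "&&"}

def push_not(tree, negate=False):
    if not isinstance(tree, tuple):      # leaf operand
        return tree
    op, left, right = tree
    if op == "~~":                       # absorb the negation and recurse
        return push_not(left, not negate)
    if negate:
        if op in ("&&", "||"):           # de Morgan: negate both children too
            return (NEGATED[op], push_not(left, True), push_not(right, True))
        return (NEGATED[op], left, right)  # comparison: just flip the operator
    return (op, push_not(left), push_not(right))

# ~~(x == y)  becomes  x ~= y
assert push_not(("~~", ("==", "x", "y"), None)) == ("~=", "x", "y")
# ~~(a && b)  becomes  ~~a || ~~b, with the inner negations absorbed:
assert push_not(("~~", ("&&", ("==", "x", "y"), (">", "a", "b")), None)) == \
       ("||", ("~=", "x", "y"), ("<=", "a", "b"))
```

Because every operator has a negated counterpart, the negation flag always disappears into an operator flip, leaving a tree with no ~~ nodes at all.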
One ambiguity remaining in the tree is that the operators ".", ".#" and
".&" act on both common and individual properties (that is, properties
handled by the Z-machine's architecture directly, and properties handled
by the run-time veneer routines).  The decision about which is intended
is made here, by looking at the right operands of the operators.  (If
it's a common property number, or an individual property number, the
decision is obvious; if it's a variable or some compound expression,
then we decide in favour of individual properties (the run-time veneer
routines can patch things up if this is a wrong decision).)  So a
traversal is made to carry out the following transformation:

        .                <common prop value>     <individual prop value>
       / \    becomes           / \         or           / \
      a   b                    a   b                    a   b

(Since the veneer routines are themselves written in Inform code, a
slightly different translation rule applies to them: "." always means
<common prop value> and so on, while three "secret" operators are
provided, "..", "..&" and "..#", to mean <individual prop value> and so
on.  Outside of the veneer, use of these operators is illegal.)

In the same traversal, function calls are distinguished from sent
messages:

      <fn call>                 <message send>
       / \  \ ...                 / | | \ ...
      .   c  d     becomes       a  b c  d     or  <left as a fn call>
     / \
    a   b

The final traversal performs "lvalue checking".  Five operators act
directly on storage objects (such as variables) rather than values:
these are

      <lvalue> = <value>     ++ <lvalue>     -- <lvalue>
      <lvalue> ++            <lvalue> --

where <lvalue> has to be a reference to a storage object.  There are up
to five kinds of storage object in the tree:

      local/global variable          VARIABLE_TT
      common property value          of an object subexpression
      individual property value      of an object subexpression
      entry in byte -> array         -> subexpression
      entry in word --> array        --> subexpression

This makes 25 combinations.  As well as checking that what ought to be
an lvalue is indeed one of these five possibilities, the traversal also
rewrites the tree explicitly with one of 25 operators.  For instance:

        =                <write into byte array>
       / \                       / | \
     ->   v     becomes         a  b  v
     / \
    a   b

and so on.
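The lvalue rewrite can be sketched as a simple table lookup. The store-operator names below are illustrative placeholders, not Inform's internal ones, and only the "=" row of the 5-by-5 grid is shown:

```python
# Sketch of the lvalue-checking rewrite: "=" is replaced by a specialised
# store operator chosen by the kind of storage on its left.

STORE_OP = {"variable": "SET_VARIABLE",
            ".":        "WRITE_COMMON_PROP",
            "..":       "WRITE_INDIV_PROP",
            "->":       "WRITE_BYTE_ARRAY",
            "-->":      "WRITE_WORD_ARRAY"}

def rewrite_assignment(lhs_kind, lhs_children, value):
    if lhs_kind not in STORE_OP:
        raise SyntaxError("not an lvalue")     # e.g. "45 = 12"
    # the lhs node dissolves: its children become direct operands
    return (STORE_OP[lhs_kind], *lhs_children, value)

# a -> b = v  becomes a single three-operand byte-array write:
assert rewrite_assignment("->", ("a", "b"), "v") == \
       ("WRITE_BYTE_ARRAY", "a", "b", "v")
```

After this pass, code generation for each node no longer needs to inspect the node's children to know what to emit, which is exactly the property the next paragraph claims for the finished tree.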
Thus, the end of this process is a tree representing a syntactically
correct expression (if necessary having mended any errors in the source
code's description of the expression), which has a minimal number of
logical operators and no negation operators, and in which the action to
be taken by any node does not depend on what its children are.

6.7  Summary of parse tree output
----------------------------

It is time to formally specify the input and output of the main routine
in "expressp.c":

      assembly_operand parse_expression(int context)

takes, as argument, one of the seven *_CONTEXT constants:

   QUANTITY_CONTEXT    the default: the result may be any quantity

   VOID_CONTEXT        the expression is used as a statement, so that
                       its value will be thrown away and it only needs
                       to exist for any resulting side-effects (used in
                       function calls and assignments)

   CONDITION_CONTEXT   the result must be a condition

   CONSTANT_CONTEXT    the result must be known now (at compile time)

   ASSEMBLY_CONTEXT    like QUANTITY_CONTEXT, but with the '->' operator
                       forbidden (this is needed for assembly language
                       to indicate store destinations)

   FORINIT_CONTEXT     like VOID_CONTEXT, but with the '::' operator
                       forbidden at the top level (needed to resolve
                       ambiguity in the grammar)

   ARRAY_CONTEXT       like CONSTANT_CONTEXT, but with different rules
                       to resolve whether a minus sign represents unary
                       or binary minus (needed to resolve ambiguity in
                       the grammar)

The return value is the result of the expression.  This is an assembly
operand, the type of which indicates what has happened:

   LONG_CONSTANT_OT, SHORT_CONSTANT_OT
                       the result is this number

   VARIABLE_OT         the result is in this global/local variable

   OMITTED_OT          there is no resulting value (for instance, this
                       happens if the expression was successfully parsed
                       in void context)

   EXPRESSION_OT       the result can only be determined by running the
                       code generated from this parse tree

If an error has occurred in the expression, from which recovery was not
possible, then the return is (short constant) 0.
This should minimise the chance of a cascade of further error messages.

The type EXPRESSION_OT is not a valid Z-machine assembly operand type,
and it's only used here and in the code generator "expressc.c".  The
value of such an operand is an index N, indicating that the root node of
the tree is in Nth position in the expression-tree-array ET.

The expression-tree-array ET[0], ET[1], ... is an array of "nodes", each
of which is a structure as follows:

   {  /* Data used in tree construction */

      int up, down, right;
      int operator_number;        /* Only meaningful for non-leaves */
      assembly_operand value;     /* Only meaningful for leaves    */

      /* Attributes synthesised during code generation */
      ...
   }

The numbers ET[n].up, ET[n].down and ET[n].right are node numbers for
the node above, the leftmost node below, and the next child to the right
of the same parent: they hold -1 if there is no node in that position.
A "leaf" is a node for which ET[n].down == -1.  Leaves hold operands,
and other nodes hold operators.  (Note that leaf operands must be of
operand type SHORT_CONSTANT_OT, LONG_CONSTANT_OT or VARIABLE_OT: they
must be honest Z-machine operands.)  Nothing distinguishes the root node
except that its number happens to be held in an EXPRESSION_OT operand
value somewhere.

A brief word on deallocation: clearly these parse trees need to survive
intact until code generation occurs for them.  No elaborate system is
provided here, since the syntax analyser always generates code for any
expressions it reads while it's still parsing the same syntax line as
the expression was parsed on.  Each time a new syntax line is reached,
"expressp.c" is notified and it then wipes the ET[n] array empty again.
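The up/down/right representation can be exercised with a small sketch. This is a Python model of the node links (the real ET[] is the C array above), holding the tree built in section 6.1 for (2 + 4) * 45 + 34:

```python
# Sketch of walking the ET[] array using up/down/right links, with -1
# meaning "no node".  Node fields: (up, down, right, label); a leaf is a
# node whose down field is -1.

ET = [
    (-1,  1, -1, "+"),     # 0: root (E3)
    ( 0,  3,  2, "*"),     # 1: E2
    ( 0, -1, -1, "34"),    # 2
    ( 1,  5,  4, "+"),     # 3: E1
    ( 1, -1, -1, "45"),    # 4
    ( 3, -1,  6, "2"),     # 5
    ( 3, -1, -1, "4"),     # 6
]

def leaves(n):
    """Yield the operand labels below node n, left to right."""
    up, down, right, label = ET[n]
    if down == -1:
        yield label
    else:
        child = down
        while child != -1:           # walk the sibling chain via .right
            yield from leaves(child)
            child = ET[child][2]

assert list(leaves(0)) == ["2", "4", "45", "34"]
```

The first-child/next-sibling scheme lets a node have any number of children (useful once FCALL_OP nodes have been flattened) while keeping each node a fixed size, which is why a single flat array suffices.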