DM4 §31: Tokens of grammar

The complete list of grammar tokens is given in the table below. These tokens are all described in this section except for scope = ‹Routine›, which is postponed to the next.

`'`‹word›`'`	that literal word only
`noun`	any object in scope
`held`	object held by the actor
`multi`	one or more objects in scope
`multiheld`	one or more held objects
`multiexcept`	one or more in scope, except the other object
`multiinside`	one or more in scope, inside the other object
‹attribute›	any object in scope which has the attribute
`creature`	an object in scope which is `animate`
`noun =` ‹Routine›	any object in scope passing the given test
`scope =` ‹Routine›	an object in this definition of scope
`number`	a number only
‹Routine›	any text accepted by the given routine
`topic`	any text at all

To recap, the parser goes through a line of grammar tokens trying to match each against some text from the player's input. Each token that matches must produce one of the following five results:

(a) a single object;
(b) a “multiple object”, that is, a set of objects;
(c) a number;
(d) a “consultation topic”, that is, a collection of words left unparsed to be looked through later;
(e) no information at all.

Ordinarily, a single line, though it may contain many tokens, can produce at most two substantial results ((a) to (d)), at most one of which can be multiple (b). (See the exercises below if this is a problem.) For instance, suppose the text “green apple on the table” is parsed against the grammar line:

* multi 'on' noun -> Insert

The multi token matches “green apple” (result: a single object, since although multi can match a multiple object, it doesn't have to), 'on' matches “on” (result: nothing) and the noun token matches “the table” (result: a single object again). There are two substantial results, both objects, so the action that comes out is <Insert apple table>. If the text had been “all the fruit on the table”, the multi token might have resulted in a list: perhaps of an apple, an orange and a pear. The parser would then have generated and run through three actions in turn: <Insert apple table>, then <Insert orange table> and finally <Insert pear table>, printing out the name of each item and a colon before running the action:

>put all the fruit on the table
Cox's pippin: Done.
orange: Done.
Conference pear: Done.

The library's routine InsertSub, which actually handles the action, only deals with single objects at a time, and in each case it printed “Done.”

· · · · ·

'‹word›' This matches only the literal word given, sometimes called a preposition because it usually is one, and produces no resulting information. (There can therefore be as many or as few of them on a grammar line as desired.) It often happens that several prepositions really mean the same thing for a given verb: for instance “in”, “into” and “inside” are often synonymous. As a convenient shorthand, then, you can write a series of prepositions (only) with slashes / in between, to mean “one of these words”. For example:

* noun 'in'/'into'/'inside' noun -> Insert

noun Matches any single object “in scope”, a term defined in the next section and which roughly means “visible to the player at the moment”.

held Matches any single object which is an immediate possession of the actor. (Thus, if a key is inside a box being carried by the actor, the box might match but the key cannot.) This is convenient for two reasons. Firstly, many actions, such as Eat or Wear, only sensibly apply to things being held. Secondly, suppose we have grammar

Verb 'eat' * held -> Eat;

and the player types “eat the banana” while the banana is, say, in plain view on a shelf. It would be petty of the game to refuse on the grounds that the banana is not being held. So the parser will generate a Take action for the banana and then, if the Take action succeeds, an Eat action. Notice that the parser does not just pick up the object, but issues an action in the proper way – so if the banana had rules making it too slippery to pick up, it won't be picked up. This is called “implicit taking”, and happens only for the player, not for other actors.
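As a minimal sketch of that last point, assume a hypothetical banana object (the object and its message are invented here for illustration). Its before rule intercepts Take, so the implicit take fails in the ordinary way and no Eat action follows:

```inform6
! Hypothetical object: the Take rule below also stops implicit taking,
! because the parser issues a genuine Take action before trying Eat.
Object banana "banana"
  with name 'banana',
       before [;
         Take: "The banana slips through your fingers.";
       ],
  has  edible;
```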

multi Matches one or more objects in scope. The multi- tokens indicate that a list of one or more objects can go here. The parser works out all the things the player has asked for, sorting out plural nouns and words like “except” in the process. For instance, “all the apples” and “the duck and the drake” could match a multi token but not a noun token.

multiexcept Matches one or more objects in scope, except that it does not match the other single object parsed in the same grammar line. This is provided to make commands like “put everything in the rucksack” come out right: the “everything” is matched by all of the player's possessions except the rucksack, which stops the parser from generating an action to put the rucksack inside itself.

multiinside Similarly, this matches anything inside the other single object parsed on the same grammar line, which is good for parsing commands like “remove everything from the cupboard”.
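A short sketch of both tokens in grammar lines follows. The verbs 'stow' and 'unpack' are hypothetical, chosen only to avoid clashing with verbs the library already defines; Insert and Remove are the library's usual actions.

```inform6
! "stow all in rucksack": everything except the rucksack itself.
Verb 'stow'
    * multiexcept 'in'/'into'/'inside' noun -> Insert;

! "unpack all from cupboard": only things inside the cupboard.
Verb 'unpack'
    * multiinside 'from' noun -> Remove;
```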

‹attribute› Matches any object in scope which has the given attribute. This is useful for sorting out actions according to context, and perhaps the ultimate example might be an old-fashioned “use” verb:

Verb 'use' 'employ' 'utilise'
    * edible    -> Eat
    * clothing  -> Wear
    ...
    * enterable -> Enter;

creature Matches any object in scope which behaves as if living. This normally means having animate: but, as an exceptional rule, if the action on the grammar line is Ask, Answer, Tell or AskFor then having talkable is also acceptable.
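For illustration, here is a sketch of a verb restricted to living things. The verb 'greet', the action Greet and its reply are assumptions made up for this example, not part of the library:

```inform6
! Hypothetical action: only objects matched by creature reach this routine.
[ GreetSub;
  print_ret (The) noun, " nods back at you.";
];

Verb 'greet'
    * creature -> Greet;
```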

noun = ‹Routine› “Any single object in scope satisfying some condition”. When determining whether an object passes this test, the parser sets the variable noun to the object in question and calls the routine. If it returns true, the parser accepts the object, and otherwise it rejects it. For example, the following should only apply to animals kept in a cage:

[ CagedCreature;
  if (noun in wicker_cage) rtrue; rfalse;
];
Verb 'free' 'release'
     * noun=CagedCreature -> FreeAnimal;

Only nouns which pass the CagedCreature test are allowed. The CagedCreature routine can appear anywhere in the source code, though it's tidier to keep it nearby.

scope = ‹Routine› An even more powerful token, which means “an object in scope” where scope is redefined specially. You can also choose whether or not it can accept a multiple object. See §32.

number Matches any decimal number from 0 upwards (though it rounds off large numbers to 10,000), and also matches the numbers “one” to “twenty” written in English. For example:

Verb 'type' * number -> TypeNum;

causes actions like <TypeNum 504> when the player types “type 504”. Note that noun is set to 504, not to an object. (Meanwhile inp1 is set to 1, indicating that this “first input” is intended as a number. If noun had instead been the object which happened to have number 504, then inp1 would have been set to that object, the same as noun.) If you need more exact number parsing, without rounding off, and including negative numbers, see the exercise below.
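The action routine for such a grammar line can then treat noun as a plain number. A minimal sketch, in which the keypad wording is invented for the example:

```inform6
[ TypeNumSub;
  ! Here noun holds the number typed, not an object.
  print "You type ", noun, " on the keypad.^";
];
```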

• EXERCISE 83
Some games, such as David M. Baggett's game ‘The Legend Lives!’ produce footnotes every now and then. Arrange matters so that these are numbered [1], [2] and so on in order of appearance, to be read by the player when “footnote 1” is typed.

▲ The entry point ParseNumber allows you to provide your own number-parsing routine, which opens up many sneaky possibilities – Roman numerals, coordinates like “J4”, very long telephone numbers and so on. This takes the form

[ ParseNumber buffer length;
  ...returning false if no match is made, or the number otherwise...
];

and examines the supposed ‘number’ held at the byte address buffer, a row of characters of the given length. If you provide a ParseNumber routine but return false from it, then the parser falls back on its usual number-parsing mechanism to see if that does any better.

▲▲ Note that ParseNumber can't return 0 to mean the number zero, because 0 is the same as false. Probably “zero” won't be needed too often, but if it is you can always return some value like 1000 and code the verb in question to understand this as 0. (Sorry: this was a poor design decision made too long ago to change now.)
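As a rough sketch of a ParseNumber routine, suppose (purely as an assumption for this example) that text such as “3k” should mean 3000. The routine declines anything else by returning false, leaving the parser to try its usual mechanism. It makes no attempt to guard against overflow, and “0k” yields 0, which is indistinguishable from false as described above.

```inform6
[ ParseNumber buffer length   i c n;
  ! Hypothetical convention: digits followed by a final "k" mean thousands.
  if (length < 2 || buffer->(length-1) ~= 'k') rfalse;
  for (i=0 : i<length-1 : i++) {
      c = buffer->i;
      if (c < '0' || c > '9') rfalse;   ! not a digit: decline
      n = n*10 + (c - '0');
  }
  return n*1000;
];
```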

topic This token matches as much text as possible, regardless of what it says, producing no result. As much text as possible means “until the end of the typing, or, if the next token is a preposition, until that preposition is reached”. The only way this can fail is if it finds no text at all. Otherwise, the variable consult_from is set to the number of the first word of the matched text and consult_words to the number of words. See §16 and §18 for examples of topics being used.
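The two variables can be used to walk back over the matched words inside the action routine. The following is only a sketch: the verb 'chant', the action Chant and the magic word are all invented for the example.

```inform6
[ ChantSub   w i;
  wn = consult_from;                  ! rewind to the first word of the topic
  for (i=0 : i<consult_words : i++) {
      w = NextWord();
      if (w == 'xyzzy') "The air shimmers.";
  }
  "Nothing happens.";
];

Verb 'chant'
    * topic -> Chant;
```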

‹Routine› The most flexible token is simply the name of a “general parsing routine”. As the name suggests, it is a routine to do some parsing which can have any outcome you choose, and many of the interesting things you can do with the parser involve writing one. A general parsing routine looks at the word stream using NextWord and wn (see §28) to make its decisions, and should return one of the following. Note that the values beginning GPR_ are constants defined by the library.

`GPR_FAIL`	if there is no match;
`GPR_MULTIPLE`	if the result is a multiple object;
`GPR_NUMBER`	if the result is a number;
`GPR_PREPOSITION`	if there is a match but no result;
`GPR_REPARSE`	to reparse the whole command from scratch; or
O	if the result is a single object O.

On an unsuccessful match, returning GPR_FAIL, it doesn't matter what the final value of wn is. On a successful match it should be left pointing to the next thing after what the routine understood. Since NextWord moves wn on by one each time it is called, this happens automatically unless the routine has read too far. For example:

[ OnAtorIn;
  if (NextWord() == 'on' or 'at' or 'in') return GPR_PREPOSITION;
  return GPR_FAIL;
];

duplicates the effect of 'on'/'at'/'in', that is, it makes a token which accepts any of the words “on”, “at” or “in” as prepositions. Similarly,

[ Anything;
  ! Read and discard words until the word stream runs out.
  while (NextWordStopped() ~= -1) ;
  return GPR_PREPOSITION;
];

accepts the entire rest of the line (even an empty text, if there are no more words on the line), ignoring it. NextWordStopped is a form of NextWord which returns the special value −1 once the original word stream has run out.

If you return GPR_NUMBER, the number which you want to be the result should be put into the library's variable parsed_number.
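For instance, a token that understands two quantity words might be sketched as follows. The routine name and the choice of words are assumptions for illustration only; such a routine could stand in a grammar line wherever number might otherwise appear.

```inform6
[ DozenOrScore   w;
  w = NextWord();
  if (w == 'dozen') { parsed_number = 12; return GPR_NUMBER; }
  if (w == 'score') { parsed_number = 20; return GPR_NUMBER; }
  return GPR_FAIL;
];
```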

If you return GPR_MULTIPLE, place your chosen objects in the table multiple_object: that is, place the number of objects in multiple_object-->0 and the objects themselves in -->1, …
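A minimal sketch of filling in the table by hand, assuming two hypothetical objects left_shoe and right_shoe defined elsewhere in the game:

```inform6
[ BothShoes;
  if (NextWord() ~= 'shoes') return GPR_FAIL;
  multiple_object-->0 = 2;            ! how many objects follow
  multiple_object-->1 = left_shoe;    ! hypothetical object
  multiple_object-->2 = right_shoe;   ! hypothetical object
  return GPR_MULTIPLE;
];
```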

The value GPR_REPARSE should only be returned if you have actually altered the text you were supposed to be parsing. This is a feature used internally by the parser when it asks “Which do you mean…?” questions, and you can use it too, but be wary of loops in which the parser eternally changes and reparses the same text.

· · · · ·

▲ To parse a token, the parser uses a routine called ParseToken. This behaves almost exactly like a general parsing routine, and returns the same range of values. For instance,

ParseToken(ELEMENTARY_TT, NUMBER_TOKEN)

parses exactly as number does: similarly for NOUN_TOKEN, HELD_TOKEN, MULTI_TOKEN, MULTIHELD_TOKEN, MULTIEXCEPT_TOKEN, MULTIINSIDE_TOKEN and CREATURE_TOKEN. The call

ParseToken(SCOPE_TT, MyRoutine)

does what scope=MyRoutine does. In fact ParseToken can parse any kind of token, but these are the only cases which are both useful enough to mention and safe enough to use. It means you can conveniently write a token which matches, say, either the word “kit” or any named set of items in scope:

[ KitOrStuff;
  if (NextWord() == 'kit') return GPR_PREPOSITION;
  ! Not "kit": step back one word and parse as a multi token instead.
  wn--; return ParseToken(ELEMENTARY_TT, MULTI_TOKEN);
];
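The same pattern works with the other elementary tokens. As a further sketch, here is a token accepting either the word “none” (an assumption made for this example) or anything the ordinary number token would accept:

```inform6
[ NumberOrNone;
  if (NextWord() == 'none') { parsed_number = 0; return GPR_NUMBER; }
  ! Not "none": step back one word and defer to the number token.
  wn--;
  return ParseToken(ELEMENTARY_TT, NUMBER_TOKEN);
];
```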

· · · · ·

• EXERCISE 84
Write a token to detect small numbers in French, “un” to “cinq”.

• EXERCISE 85
Write a token called Team, which matches only against the word “team” and results in a multiple object containing each member of a team of adventurers in a game.

•▲ EXERCISE 86
Write a token to detect non-negative floating-point numbers like “21”, “5.4623”, “two point oh eight” or “0.01”, rounding off to two decimal places.

•▲ EXERCISE 87
Write a token to match a phone number, of any length from 1 to 30 digits, possibly broken up with spaces or hyphens (such as “01245 666 737” or “123-4567”).

•▲▲ EXERCISE 88
(Adapted from code in "timewait.h": see the references below.) Write a token to match any description of a time of day, such as “quarter past five”, “12:13 pm”, “14:03”, “six fifteen” or “seven o'clock”.

•▲ EXERCISE 89
Code a spaceship control panel with five sliding controls, each set to a numerical value, so that the game looks like:

>look
Machine Room
There is a control panel here, with five slides, each of which can be set to a numerical value.
>push slide one to 5
You set slide one to the value 5.
>examine the first slide
Slide one currently stands at 5.
>set four to six
You set slide four to the value 6.

•▲ EXERCISE 90
Write a general parsing routine accepting any amount of text, including spaces, full stops and commas, between double-quotes as a single token.

• EXERCISE 91
On the face of it, the parser only allows two parameters to an action, noun and second. Write a general parsing routine to accept a third. (This is easier than it looks: see the specification of the NounDomain library routine in §A3.)

• EXERCISE 92
Write a token to match any legal Inform decimal, binary or hexadecimal constant (such as -321, $4a7 or $$1011001), producing the correct numerical value in all cases, while not matching any number which overflows or underflows the legal Inform range of −32,768 to 32,767.

• EXERCISE 93
Add the ability to match the names of the built-in Inform constants true, false, nothing and NULL.

• EXERCISE 94
Now add the ability to match character constants like '7', producing the correct character value (in this case 55, the ZSCII value for the character ‘7’).

•▲▲ EXERCISE 95
Next add the ability to match the names of attributes, such as edible, or negated attributes with a tilde in front, such as ~edible. An ordinary attribute should parse to its number, a negated one should parse to its number plus 100. (Hint: the library has a printing rule called DebugAttribute which prints the name of an attribute.)

•▲▲ EXERCISE 96
And now add the names of properties.

• REFERENCES
Once upon a time, Andrew Clover wrote a neat library extension called "timewait.h" for parsing times of day, and allowing commands such as “wait until quarter to three”. L. Ross Raszewski, Nicholas Daley and Kevin Forchione each tinkered with and modernised this, so that there are now also "waittime.h" and "timesys.h". Each has its merits.