8 Assembly of code, text and data structures ------------------------------------------ 8.1 Assembly of initial code ------------------------ The front end of "asm.c" consists of three routines to assemble routine start and finish, and to place labels; plus many routines for assembling individual instructions. These all fill out data in the assembly_instruction AI and then call a single assemble-instruction routine. Thus, the back end of "asm.c" contains four operations to place data into the Z-machine's code area: Routine start assemble_routine_header() Routine end assemble_routine_end() Label assemble_label_no() Instruction assemble_instruction() For the format of Z-code instructions, see the Z-Machine Standard Document, which goes into the Z-machine's format with exhaustive thoroughness. Note that Inform 6 is much more compliant to the standard (so far as I know, perfectly compliant) than earlier releases: for example, the Flags 2 bits for "UNDO is needed" and so on are set correctly for the first time. One of the few pieces of information fed back to Inform's higher levels by "asm.c" is a flag indicating whether execution can possibly reach the next instruction or not. The assembly of certain instructions (returning from a routine, terminating the Z-machine, unconditionally jumping) sets the flag execution_never_reaches_here while the placing of a label removes it again (because by branching to that label it may be possible to reach this point after all). This is used to issue dead code warnings (and also to remove some unnecessary branches when code-generating "if"..."else"...). Assembling a routine header is straightforward: note that an option exists here to assemble some code along with it to print out tracing information. For this reason, and also to print out assembly trace at compile time, "asm.c" accesses the variable names in the symbols table. All labels are referred to in Inform's higher layers by number, rather than by address; only "asm.c" has any access to what their addresses are. Labels are numbered from 0 upwards, but the assembler also recognises -2 branch only to the next instruction (this converts a branch instruction such as @get_child to a non-branch one, since it causes the instruction to behave identically whether the branch is made or not) -3 return false rather than branch -4 return true rather than branch The code -1, used higher up for "no branch intended", should never descend this far. Note that the Z-machine has no formal concept of "routine end". Therefore, assemble_routine_end() only generates code to cause a return (and only then if the present instruction position can be reached). By long-standing (if unfortunate) Inform convention, the value returned in this case will be 0 for a routine in an object definition, or 1 for any other routine. The routine-start/end routines also keep track of the usage of local variables and labels, and issue warnings about those which were declared but not used. Since the scope of these symbols is restricted to the routine in which they are defined, label names are unassigned (i.e. reset to UNKNOWN_SFLAG CONSTANT_T's). Local variable names are not held in the symbols table and are automatically cleared when the next routine starts. Finally, we can now detect the use of a label name not defined in the routine, so we issue an error message if this has happened. What happens in this first phase of code assembly is that a form of Z-code called "initial code" is stored in a temporary holding area of memory until a routine has been completed. It is then treated by the branch optimiser (see the next section). "Initial code" is the same as Z-code, except that: (a) all branches to labels have long form, and instead of an offset the branch value is the label number within this routine (0, 1, 2, ...); (b) LABEL operands (to "jump" instructions) are, similarly, label numbers of the label to branch to; (c) parallel to the holding area is a uchar[] buffer of the same size, holding a "marker value" for each byte. (Note that branches to -2, -3 and -4 are assembled as short-form branches and not marked as BRANCH_MV.) The marker values are as follows: BRANCH_MV This byte is the first of two in a branch operand LABEL_MV Ditto but in a LABEL operand DELETED_MV (Inserted by the optimiser, rather than at initial code assembly.) This byte has been deleted and should be skipped over when the code is transferred to long-term storage. a "module value" marker value This byte is the first of two in a LONG_CONSTANT_OT operand which should be marked with this value in module compilation ditto + 128 Ditto but a one-byte SHORT_CONSTANT_OT or VARIABLE_OT operand 8.2 Branch offset backpatching and optimisation ------------------------------------------- There are three phases of this process: (i) Deciding whether each branch should or should not be optimised to the 1-byte short form, using the label positions in the initial code; (ii) Recalculating the label positions to take account of this (which will tend to move the labels closer together as second bytes of branch operands are deleted); (iii) Transferring the initial code to long-term storage, replacing branch and LABEL operand values with address offsets and issuing module markers as indicated (if in module mode). A branch can be "short form" if it is a forward branch of between 0 and 29 bytes from the PC position of the next instruction. To record a branch as "short form" during (i), the second byte of the branch operand is marked as DELETED_MV. The long-term storage alluded to is either temporary file 2 or else the zcode_area memory block. Note that the routines moved there are still not finally correct, but are "raw Z-code": Z-code which has incorrect operands because value backpatching has not yet occurred. 8.3 Text translation: ISO and Unicode resolution -------------------------------------------- The algorithm encoding ZSCII text to a compressed sequence of 5-bit "Z-characters" has been documented many times: see the Z-machine standards document. The section of Inform responsible, "text.c", keeps text for three Z-machine areas: the static strings area, the dictionary and the "low strings pool". (In addition, as requested by "asm.c", it encodes text into the Z-code area as part of the assembly instructions @print and @print_ret.) Static strings are written to a temporary holding area of memory, manipulated until correct and then transferred to long-term storage: either temporary file 1 or the static_strings_area memory block. The low strings pool holds a table of strings in low Z-machine memory used for abbreviations. Such strings need to be accessible in an unusual way (they are addressed by halved byte addresses, not packed addresses) owing to an anomaly in the Z-machine architecture, and therefore must reside in the bottom 64K of memory: they cannot be written to the static strings area. Source text arrives at the text translation routines still encoded in an ISO extension to the ASCII character set (depending on the current value of the -C switch). A character value of $ca might mean Latin E-circumflex, E-ogonek, Cyrillic capital hard-sign, Arabic Teh or Greek capital Kappa according to this context. It would be illegal in plain ASCII or ISO Hebrew, but since the Inform lexer has already filtered out any illegal ISO characters, this possibility no longer arises. Furthermore, characters can be specified by string escapes beginning with the @ character. The sequence @^E would specify Latin E-circumflex, and the sequence @{39A} would specify Greek capital Kappa (by its Unicode value, $039A) regardless of the current -C setting. The work of sorting out character codes is handled by "chars.c". It may be useful to summarise the different meanings which character codes can hold within Inform: (1) "Source". Raw source files can contain any character values between $00 and $ff. The lexer filters these characters to reduce them to: (2) "ISO". An unsigned 8-bit character code within which the values $20 to $7e always have their usual ASCII meanings, and $7f to $9f are always undefined, and some values in the range $a0 to $ff may have meanings as specified in ISO 8859-1 to -9 if the current Inform -C setting is 1 to 9. If -C is set to 0, "ISO" means the same as plain ASCII. (3) "Unicode". Unicode, or ISO 10646-1, is an unsigned 16-bit character code which includes essentially every human script. Inform uses it as an intermediate state between the diverse ISO possibilities and the final ZSCII result. (4) "Text". A string of ISO characters to specify a sequence of one or more Unicode characters. Usually, one ISO character is translated to one Unicode character: the exception is for string escapes. For instance, the text "a@{39a}c" specifies a sequence of 3 Unicode characters. (5) "ZSCII". An unsigned 10-bit character code used within the Z-machine. See the Standards document for its definition: note that between $20 and $7e, it agrees with ASCII. There are several translation routines in "chars.c" to move between these forms: for instance, "zscii_to_text" is used when printing out the contents of the dictionary into a transcript file. (Transcript files use the same ISO setting as the original source code.) However, the main sequence of translation is: Source -------> Text -------> Unicode -------> ZSCII lexer ISO-to- Unicode-to- filtration Unicode; ZSCII string escape parsing Note that the first two stages (lexer filtration and ISO to Unicode) depend only on the current -C setting, and are always possible. The third stage (Unicode to ZSCII) is more complex because it is programmable and may fail, depending on what the text is needed for. Unicode already specifies over 30000 different characters and compromise is inevitable in trying to squash this into the 1024 possible ZSCII values. The Z-machine's stock of "extra characters" (see Standard Document 1.0) is configured by the story file to correspond to a block of up to 97 Unicode characters, and these are the only ones usable in Inform strings, dictionary words or as quoted values. Inform sets up this block to a sensible default set given the current ISO setting. For example, if Inform reads in source code under -C6 (ISO Arabic) then it will choose all the Arabic letters from ISO Arabic. (This default setting can be overridden using the "Zcharacter" directive.) Note that if the designer wants, say, Unicode $274b ("heavy eight teardrop-spoked propeller asterisk") as well as plain old $0621 (Arabic letter Hamza), this can simply be added to the stock accumulated so far. The stock of extra characters is defined by a list of Unicode values written out as the "Unicode translation table" in "tables.c". 8.4 Dictionary management --------------------- As dictionary words are found in the source code, for instance in the value of "name" properties, or in grammar, or in constants like 'this', they are added to the dictionary by a routine called dictionary_add(), which numbers each different word in "accession order": word 0 is the first which was entered, and so on. References to dictionary words are actually stored in the Z-machine as byte addresses within the dictionary table, which cannot be known until after the compilation pass (when the full alphabetical ordering is known). Such references are therefore always marked (with DICT_MV) for backpatching. During compilation a dictionary table similar to the Z-machine format is kept: there is a 7-byte header (left blank here to be filled in at the construct_storyfile() stage in "tables.c") and then a sequence of records, one per word, in the form 4 or 6 bytes byte byte byte The difference between this and the final Z-machine dictionary table is that the records occur in accession order, not alphabetical order. (The construct_storyfile() routine in "tables.c" rearranges them.) The Z-machine does not specify what use is made of the three bytes of "dictionary parameters" (it doesn't even specify that there have to be only three): for this, see the next section. The type "dict_word" is used for a structure containing the (4 or) 6 bytes of Z-coded text of a word. Usefully, because the PAD character 5 is < all alphabetic characters, alphabetic order corresponds to numeric order of Z-encoding (regarding the Z-encoded bytes as a 32 or 48 bit big-end-first number; sign is unimportant as the initial bit is never set). For this reason, the dict_word is called the "sort code" of the original text word. A special text-translation routine is used to "prepare" such a sort code: as dictionary words do not use every feature possible in ordinary text translation (abbreviations, for instance, or two of the string escape forms) it's worth having a separate routine, optimised for speed. Note that this routine makes use of "text_to_unicode" and "unicode_to_zscii" in "chars.c" when, but only when, it hits a string escape character @. (For ordinary ISO characters, when no string escape is hit, these routines would be too slow.) Since dictionaries typically contain 500 to 2000 words, speed is extremely important when searching and sorting: a simple O(n^2) algorithm to search the dictionary, for example, greatly increases Inform's total compilation time on medium-size games. (Here "n" is the number of dictionary words.) Inform 1 to 3 used a crude O(n^2) mass comparison algorithm, which was unacceptably slow. Inform 4 to 6.05 used a form of hash table, with the first character of a word as hash function (there were 27 hash values, for A to Z plus "other"): this behaved very acceptably most of the time, but was ultimately still O(n^2) and the uneven distribution of initial letters in English meant that the hashing function was not ideal. Inform 6.1 instead stores the dictionary as a "red-black tree". (See, e.g., Robert Sedgewick, "Algorithms" (1983, second ed. 1988).) The tree is binary and always has the property that at node X, anything hanging from the left branch of X is alphabetically before what's at X, and anything hanging from the right is alphabetically after. For instance, a legal configuration is: fieldmouse / \ abacus patrician / \ a constantinople This is simple enough to build and to search through. The problem is that one usually ends up with a haphazardly arranged or "unbalanced" tree in which some branches are very much longer than others, so that searching times can be unpredictable, and in worst case the construction process is still O(n^2). Thus one wants a method of building the tree which attempts to restore the balance as it does so. The algorithm here involves "colouring" each branch either red or black (that is, storing one bit of information for each branch). Roughly speaking, suppose X is added as a branch of Y. Then the branch leading down from Y to X -- the new branch -- is coloured black. But the branch leading down to Y -- which was already there -- is recoloured red. We do not permit the tree to contain two consecutive red branches: whenever we notice this happening, we perform a "rotation". A typical rotation is: a a \ \ bag cat / \(red) ---> (red)/ \(red) baa cat bag dog / \(red) / \ bun dog baa bun It's like lifting the "bag" subtree and re-hanging it from the word "cat". (There is another case, when the red branches slope in opposite directions.) We also try to percolate redness upwards (redness being the possibility for change in the tree structure). So, when searching, if we ever find \(black) node (red)/ \(red) then we replace it with \(red) node (black)/ \(black) and check the "rotate if you get two consecutive red branches" rule again. It is known that: (i) if there are N words in the tree, then any search takes less than 2 log-to-base-2(N) + 2 comparisons (i.e., this is the maximum depth of the tree); (ii) at worst you need to perform only one rotation for every four comparisons. In practice, the observed values are even better: the average seems to be log-to-base-2(N) comparisons and 1 rotation (though this is unproven). Trees are almost optimally balanced most of the time. For an Inform game with a dictionary of 2000 words, then, we expect to perform about 12 comparisons per search. This compares with a typical figure of 75 using the Inform 6.05 hashing algorithm. Comparisons are not terribly expensive, but searches are quite frequent. 8.5 Format of tables unspecified by the Z-machine --------------------------------------------- The Z-machine standards document requires many tables to have particular formats and locations: but Inform makes several other tables, and this section is to document their formats. (See the construct_storyfile() routine in "tables.c" for exactly how the byte-addressable Z-machine memory is put together.) Details of differences in modules as compared with story files are given later. (i) Header Following standard conventions, Inform: leaves bits 2 and 3 of "Flags 1" clear; places the release number in the word at bytes 2 and 3; the grammar table address at bytes 14 and 15; and the serial number in bytes 18 to 23. The default release number is 1 and the default serial number is either 970000 (if no access to the current date is available) or the compilation date in the form YYMMDD. In addition, Inform writes its version number in ASCII form into bytes 60 to 63 (the last four bytes of the header) in the form "6.01". This makes story files produced by Inform 6 distinguishable from all other story files, something interpreter writers have long wanted. (ii) Low strings pool and abbreviations table By default, the 96 entries are all " " (this Z-encodes to the $80 $00, which makes it a convenient default value). The first two banks of 32 are used for strings declared by Abbreviate; the third is set at run time by the "string" statement, to values which have previously been declared by Low_string. The pool of low strings is always placed at $0040, immediately after the header and before the abbreviations table. (iii) Header extension table The Z-machine standard allows games not to provide a header extension table (formally called the "mouse data table" after the only information which Infocom ever stored in it) and Inform 6.01 to 6.11 never do. Inform 6.12 and later always generate this extension, which always contains at least 3 word entries. (iv) Character set table This is a table of 78 = 3*26 bytes, giving the Z-character numbers of the characters to appear in the three translation alphabets. The table is only written if the "Zcharacter" directive has been used to alter the standard settings. (See section 12.2.) (v) Unicode translation table This is generated (by Inform 6.12 and later) only when needed, i.e., only when a game has asked to use a stock of "extended characters" which differs from the usual ISO Latin1 set. Games compiled under the ISO setting -C0, -C1 (the default setting) and -C9 will usually never have a Unicode translation table, though they can have. (vi) Property default values table Note that Inform defines properties 1, 2 and 3 automatically: 1 is "name", while 2 and 3 are used to support the veneer routines' implementation of object orientation. (vii) Class-number-to-object-number table (viii) Individual property values table See the notes on object orientation; these tables are used by the veneer and are unspecified by the Z-machine. (ix) Grammar table (x) Actions table (xi) "Preactions" table (xii) "Adjectives" table See the next section. The Z-machine specifies none of these tables. (xiii) Dictionary table The format of this table is largely specified by the Z-machine. Although the Z-machine allows some flexibility in setting up the dictionary, Inform sticks to the usual Infocom configuration: the separating characters are full stop, comma and space; the word-entry length is 7 (in version 3) or 9 (otherwise). Thus the dictionary begins with seven header bytes: 3 '.' ',' ' ' 7-or-9 Each word is encoded with an entry giving 4 (in version 3) or 6 (otherwise) bytes of textual representation, fully specified by the Z-machine, and then three bytes of data, which Inform can do with as it pleases. These are the so-called "dictionary parameters" dict_par1, 2 and 3. Inform's pleasure is to write into them as follows: dict_par1: flags dict_par2: verb number (counting downwards from 255) dict_par3: preposition number (counting downwards from 255) in grammar version 1; not used in grammar version 2 The flags are given as follows: bit: 7 6 5 4 3 2 1 0 The bits , and are set if the word can be used in the context of a verb, noun and/or preposition (all three can be simultaneously set). The bit indicates that the English verb is "meta", that is, is a command to the program and not a request in the game. The bit is set using the '...//p' notation, like so: 'egg' 'eggs//p' Library 6/3's parser uses this to automatically detect commands referring to multiple game objects. (xiv) Module map This is an extension to the header, only used in module files (see later). (xv) Linking data table A table placed after the end of static strings, but only in module files (see later). To summarise, then, an Inform game has the following memory map: ----------------------------------------------------------- Dynamic Header memory Low strings pool Abbreviations table Header extension Character set table (if differs from normal) Unicode table (if differs from normal) Common property default values (*) Object tree Common property value tables (*) Class number to object number conversion table Property names table (+) Attribute names table (+) Array names table (+) Individual property value tables (*) Global variable values, part of... (*) ...Dynamic array space (*) ----------------------------------------------------------- Static Table of grammar table addresses byte The grammar tables (one for each Inform verb) (-) addressed Table of action routine addresses (+) memory Table of parsing routine addresses (+) Table of "adjectives", i.e., prepositions (+) Dictionary (Module map) Synchronising gap of up to 7 null bytes ----------------------------------------------------------- Static Z-code area (*) "virtual" Static strings area memory (Link data table) ----------------------------------------------------------- (*) This area requires full value backpatching (+) This area requires a simple automatic correction (e.g. adding on the static strings or code offsets), which is done without the full backpatching machinery (-) In grammar version 1, this area requires no backpatching, but in GV2 each grammar token's data is checked to see if a dictionary word or routine address, and backpatched if so: so in GV2 it comes under category (+) 8.6 Grammar version numbers GV1 and GV2 ----------------------------------- The "grammar tables", used by the parser in an adventure game, are not specified by the Z-machine at all (contrary to popular opinion) and must therefore be fully specified here. There are four such tables: Grammar table Actions table "Preactions" table "Adjectives" table Inform can generate two different formats for these tables, known as grammar version 1 (henceforth GV1) and grammar version 2 (GV2). Inform makes the GV1 or GV2 decision according to the value of the symbol Grammar__Version which Inform usually defines as 1, but allows programs to redefine. Library 6/3 and later, in particular, contain the line Constant Grammar__Version = 2; Note that it is essential for the library's parser to understand the grammar tables being generated by Inform, so attempting to use GV2 with library 6/2 or earlier would fail. The designs of GV1 and GV2 have some points in common. The Grammar table is a --> array containing addresses of grammars, one for each Inform verb. The Actions table is a --> array containing addresses of action routines (such as "TakeSub"), one for each action. (Fake actions have no action routines and aren't in the table.) (The term "Inform verb" means essentially a common parsing grammar, such as that shared by "take" and "hold", and is used to distinguish this from an "English verb", such as the word "take".) GV1 is at heart an imitation of middle-period Infocom table formats, and was used in order that common debugging tools would still work on Inform games (an important consideration in the early days of debugging Inform 1). In GV1, an individual grammar table has the format: 1 byte followed by that many 8-byte lines in the form: ... -- byte --------- --byte-- --byte-- -- byte ------- The parameter count is the number of tokens which are not prepositions. This is needed because the run of tokens is terminated by null bytes (00), which is ambiguous: "noun" tokens are also encoded as 00. The numerical token values are as follows: "noun" 0 "held" 1 "multi" 2 "multiheld" 3 "multiexcept" 4 "multiinside" 5 "creature" 6 "special" 7 "number" 8 noun=Routine 16 + parsing-routine-number Routine 48 + parsing-routine-number scope=Routine 80 + parsing-routine-number attribute 128 + attribute number 'preposition' adjective number illegal values: 9-15, 112-127 This one-byte value has to identify particular prepositions and routines, which is only possible using a numbering system for each. GV1 numbers parsing-routines upwards from 0 to 31, in order of first use. A separate table translates these into routine packed addresses: the "preactions" table. (As usual, the term is traditional: Inform has no concept of preaction, but the Infocom games from which it inherits GV1 do have such a concept.) The preactions table is a simple --> array. (Note that in Infocom's games the preactions table always has the same length as the actions table: this is not true in either GV1 or GV2 Inform games.) Prepositions are also identified by their "adjective number". (An early misunderstanding in Z-machine decipherment led to the traditional use of the word "adjective" for dictionary words explicitly written into grammar lines, which are mainly prepositions like 'into' or 'against'.) Adjective numbers count downwards from 255 in order of first use. They are translated back into dictionary words using the "adjectives table". The adjectives table contains 4-byte entries: 00 ----2 bytes----------------- ----2 bytes----------- To make life more interesting, these entries are stored in reverse order (i.e., lowest adjective number first). The address of this table is rather difficult to deduce from the file header information, so the constant #adjectives_table is set up by Inform to refer to it. GV2 is a much more suitable data structure, easier to read and write, less limiting and marginally faster and more economical of Z-machine and Inform memory. In GV2 an individual grammar table has the format: 1 byte followed by that many grammar lines. Individual lines are no longer always 8 bytes long, as in GV1. Instead they have the form: ... ----2 bytes---- -3 bytes- -3 bytes- -byte-- The action number is actually contained in the bottom 10 bits of the word given first: the top five are reserved for future use, which leaves action_number & $400 as a bit meaning "reverse the order of the first and second parameters if this action is to be chosen". The ENDIT marker is the number 15. There can be anything from 0 to 30 tokens, and each occupies three bytes, arranged as: -- byte ---- --2 bytes--- Token type bytes are divided into the top two bits, the next two and the bottom four. The "next two bits" are used to indicate alternatives. In a sequence of tokens T1 / T2 / T3 / ... / Tn then T1 will have $$10 in its "next two bits", and each of T2 to Tn will have $$01. Tokens not inside lists of alternatives always have $00. (Note that at present only prepositions are allowed as alternatives, but the format is designed to open the possibility of extending this to all tokens.) The bottom four are the "type" of the token. The top two indicate what kind of data is contained in the token data. Strictly speaking this could be deduced from the bottom six bits, but it's convenient for making backpatching GV2 tables a simple matter within the compiler. Type Means Data contains Top bits 0 illegal (never compiled) 1 elementary token 0 "noun" 00 1 "held" 2 "multi" 3 "multiheld" 4 "multiexcept" 5 "multiinside" 6 "creature" 7 "special" 8 "number" 9 "topic" 2 'preposition' dictionary address 01 3 noun = Routine routine packed address 10 4 attribute attribute number 00 5 scope = Routine routine packed address 10 6 Routine routine packed address 10 GV2 makes no use of "adjective numbers" (so that dict_par3 is always zero in GV2's dictionary words) and leaves both the adjectives table and the preactions table empty. There is one further difference between GV1 and GV2: in GV1, fake actions are numbered from 256 upwards; in GV2, from 4096 upwards. Note that although GV2 takes three bytes for a token as compared with GV1's one byte, omission of the redundant null tokens and adjective table means that when compiling a small "shell" library game, GV2 actually produces more economical tables: 1920 bytes as opposed to 2337. The first two entries in the following table are the real reason for GV2: Limit in GV1 Limit in GV2 Prepositions per game 76 unlimited Parsing routines (general ones, noun= filters, scope= routines all put together) per game 32 unlimited Tokens per grammar line 6 unlimited Actions per game 256 4096 Inform verbs per game 256 256 In practice the Inform compiler restrains the number of verbs (but that's an adjustable memory setting) and lines per verb: in Inform 6.05 and earlier, the maximum number of lines per verb is 20. Inform 6.10 internally stores grammar roughly as GV2 even when it's going to compile GV1 at the end, and this allows a more economical use of Inform's memory: as a bonus, then, the maximum number of lines per verb is raised to 32 in Inform 6.10. 8.7 Value backpatching ------------------ Value backpatching is the final translation phase in Inform. It is the process of correcting temporary values which were written into the Z-machine at a time when final values could not be known. In addition to the Z-machine regions marked in the memory map above, the symbols table also undergoes value backpatching: defined Constant symbols flagged as CHANGE_SFLAG are backpatched as needed, before first use of their values in backpatching something else. The positions of such values have been "marked" with a "marker value" indicating the type of value needed. The backpatchable marker values are: Marker value Significance of temporary value STRING_MV Scaled address within static strings area ARRAY_MV Byte address within dynamic array area IROUTINE_MV Scaled address within Z-code area VROUTINE_MV Veneer routine number (a *_VR value) NO_OBJS_MV None INCON_MV "Inform constant": the index in the #constants keyword group (a *_SC value) DWORD_MV Accession number of dictionary word INHERIT_MV Byte address within common property values table of the value which is being inherited here INHERIT_INDIV_MV Ditto, but for individual property values table INDIVPT_MV Offset in individual property values table MAIN_MV None SYMBOL_MV Index within symbols table and these are backpatched as follows: Marker value Final value STRING_MV Packed address of static string ARRAY_MV Byte address IROUTINE_MV Packed address of routine VROUTINE_MV Packed address of veneer routine NO_OBJS_MV Number of objects in object tree INCON_MV Value of constant (typically an address of some Z-machine table) DWORD_MV Byte address of word's entry in dictionary table INHERIT_MV The value to inherit (after it has been backpatched) INHERIT_INDIV_MV The value to inherit (after it has been backpatched) INDIVPT_MV The byte address of this point in the individual property valyes address MAIN_MV Packed address of "Main" routine SYMBOL_MV Value of symbol (after it has been backpatched) The code NULL_MV is sometimes used to indicate "no marker", i.e., that a value is correct as written. Additional marker values also exist for operands held within modules: IDENT_MV Property identifier number ACTION_MV Action number OBJECT_MV Object number VARIABLE_MV Variable number (see chapter 10). Note that modules are not value-backpatched at compilation time, but at Link time. Two different memory blocks are used to store backpatch markers: one for the "Z-machine image", the byte-addressable memory at the bottom of the memory map; and another for the Z-code area. In the interests of economy, these use different formats to hold marker data: see "bpatch.c". 8.8 Packed address decoding ----------------------- Recall that the formula relating a packed address P to its byte address B is: B = 2P version 3 4P versions 4 and 5 4P + 8R versions 6 and 7: routines 4P + 8S versions 6 and 7: static strings 8P version 8 In versions 6 and 7, R and S can be chosen fairly freely in the range 0 to M/8, where M is the minimum address of a routine or static string as appropriate. The point is to expand the address space by making higher B values available (whereas P has to lie within 0 and 65535): so one wants to make R, S larger rather than smaller. On the other hand, it's necessary to the Inform language that the following are all different: (a) the number 0; (b) valid object numbers; (c) packed addresses of routines; (d) packed addresses of strings. To be sure that (c) and (d) differ, Inform sets S = R. To keep (c) from (a) and (b), we must ensure that the packed address of every routine is at least X, where X is slightly more than the largest object number. Inform also knows the byte address M of the lowest routine in memory ("Main__"). Note that M = 4X + 8R Inform therefore nudges up M and X to ensure that M and 4X are both divisible by 8, and sets R = (M - 4X)/8 S = R. To compare this with the code: Write_Code_At is M; extend_offset is X; scale_factor and length_scale_factor hold the magic constants 4 and 8. code_offset and strings_offset are the scaled addresses of the first routine and the first static string respectively, worked out by code_offset = 4X strings_offset = 4X + size of code area The reason for having these is that when compilation was actually happening, Inform did not know enough to calculate packed addresses, and instead wrote scaled addresses starting from 0 for each area. In backpatching, then, Inform adds the above offsets to each packed address, and then they come right.