The following symbolic terms are used throughout this manual:
Symbol | Meaning (used as a placeholder for...) |
---|---|
sym | any symbol name. |
expr | any expression (see expressions, operators). |
cexpr | any constant expression (see constant expressions). |
lvalue | any left-hand side of an assignment. |
stmt | any statement (see statements). |
Each program is a set of declarations. The last declaration in each program must be a compound statement which forms the initial entry point of the program.
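As an illustration of this layout, the following minimal sketch shows a program consisting of one global declaration followed by the compound statement that serves as the entry point. The variable name `count` is purely illustrative.

```t3x
! A minimal program: one declaration, then the main compound statement.
VAR count;

DO
    count := 3;
    WRITES("Hello, World!");
    NEWLINE();
END
```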
The translator will accept the following characters:
A comment may be placed between any two atomic parts (tokens) of a program or at the beginning of the program. It is introduced by an exclamation point (!) and extends up to the end of the line. The compiler interprets it like a single blank. The following objects are tokens:
Declaration | Description |
---|---|
CONST sym = cexpr, ... ; | Define constants and initialize them with the given values. |
DECL sym(cexpr), ... ; | Declare -- but do not define -- the given procedures. Cexpr is the type (arity) of the respective procedure. Used to create forward-references. |
EXTERN DECL sym(cexpr), ... ; | Like DECL, but declare externally defined procedures. The specified procedures will be exported as unresolved names. They have to be resolved by a linker. (Experimental) |
INTERFACE sym(cexpr) = slot, ... ; INTERFACE sym(cexpr), ... ; | Define a procedure-style interface to a routine of the runtime environment located at the given `slot'. The slot numbers may be obtained from the description of the respective runtime library or VM interpreter. If no slot number is specified, the last assigned slot plus 1 will be used. Each slot number may be assigned only to a single interface name. |
PUBLIC sym(a0, ..., aN) stmt | Define procedure sym like in an ordinary definition, but also export its name for external linkage. (Experimental) |
STRUCT s = m0, ..., mN; | Define the layout of the compound data type s with the members m0 ... mN. This statement is equal to CONST s=N+1, m0=0, ..., mN=N; To create an s-object, use VAR sym[s]; |
sym(a0, ..., aN) stmt | Define procedure sym with the optional formal arguments a0...aN. Stmt is a single statement forming the body of the procedure (which may, of course, be a compound statement). Each procedure returns the value specified in a RETURN statement, if any. When the end of a procedure is reached without encountering a RETURN statement, its return value defaults to zero. |
VAR sym, ... ; | Define variables. (*) |
VAR sym [cexpr], ... ; | Define vectors. Valid vector members are sym[0]...sym[cexpr-1]. (*) Each member has the size of a machine word. The vector symbol itself is a constant holding the address of the actual vector. Therefore, assignments to vector symbols are not allowed. |
VAR sym :: cexpr, ... ; | Define byte vectors. Valid members are sym::0...sym::(cexpr-1). (*) The byte vector differs from the vector (above) only by the way its size is computed. A byte vector provides enough space to hold cexpr characters instead of machine words. |
(*) Definitions of ordinal variables, vectors and byte-vectors may be intermixed in a single VAR statement.
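The declarations above can be combined as in the following sketch. All names (`MAXLEN`, `POINT`, `origin`, `twice`, `sum` and so on) are hypothetical and chosen only for illustration.

```t3x
CONST MAXLEN = 64, NONE = %1;     ! %1 is the literal -1
STRUCT POINT = PX, PY;            ! layout of a two-member compound type
VAR origin[POINT];                ! a vector holding one POINT object
VAR total, buffer[MAXLEN], name::MAXLEN;   ! variable, vector, byte vector

DECL sum(2);                      ! forward declaration: sum takes 2 arguments

twice(x) RETURN sum(x, x);        ! uses sum before it is defined

sum(a, b) RETURN a + b;

DO
    origin[PX] := 0;
    origin[PY] := 0;
    total := twice(21);
END
```

The single VAR statement on the fourth line mixes all three kinds of definitions, and DECL allows `twice` to call `sum` before its definition appears.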
A compound statement (aka statement block) is used to group statements. All statements which are part of a compound statement are executed in sequence. Each compound statement is itself an ordinary statement and therefore, statement blocks may be nested. Compound statements are delimited by the keywords DO and END. There is no terminating semicolon. Compound statements may be empty.
Each compound statement may define local symbols at its beginning immediately after the keyword DO. Only the following objects may be declared in local scopes:
All objects which have not been declared inside of compound statements or formal argument lists are called global objects. Objects which are defined in local scopes are called local objects. Global objects become valid at the point of their declaration and they remain existent up to the end of the program. Local objects will be created when the flow of the program passes their declaration. The required storage (if any) will also be allocated at this time. The scope of a local object is the compound statement (or procedure) in which it has been declared. When this statement block is left, all of its local symbols get destroyed and the associated storage will be released.
No name may be redefined ever -- neither at the same level nor in embedded scopes. This means that a local object may not have the same name as a global one, and a local object in scope B which is embedded in A may not have a name which already has been used in A. Symbol names may be used in different subsequent scopes, though. In this context, procedure arguments are considered local symbols whose scope is the procedure they belong to.
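The scoping rules can be sketched as follows. The names `show`, `width` and `inner` are hypothetical, and the declaration that would be rejected is left inside a comment.

```t3x
show(n) DO VAR i;                 ! n and i are local to show
    i := n;
    WHILE (i > 0) DO
        WRITES("*");
        i := i - 1;
    END
    NEWLINE();
END

DO VAR width;                     ! local to the main compound statement
    width := 5;
    show(width);
    DO VAR inner;                 ! nested scope with a fresh name
        inner := width + 1;
        show(inner);
    END
    DO VAR inner;                 ! reuse in a subsequent scope is allowed
        inner := 2;
        show(inner);
    END
    ! DO VAR width; END   would be rejected: width exists in the enclosing scope
END
```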
Statement | Description |
---|---|
lvalue := expr; | Evaluate expr and assign its value to lvalue. Lvalue may be any object which has an address. This includes variables, vector members, byte vector members, struct members, but NOT: constants, structs, vectors, procedures. A vector object in lvalue may be multiply subscripted to assign a value to an embedded vector: v[i1][i2]...[iN] := expr; Lvalue will be evaluated before expr. |
sym(expr1, ..., exprN); | Call the procedure sym passing the values of the expressions expr1...exprN as actual arguments to it. The number of parameters passed to a procedure must exactly match the number of its formal arguments as specified in a previous definition, declaration, or external declaration. The return value of the procedure will be discarded. |
CALL sym(expr1, ..., exprN); | Call the procedure whose address is stored in the (ordinal) variable sym. Arguments will be passed as described above, but no type checking will be performed. A valid procedure address may be obtained using sym := @procedure; If sym names a procedure rather than a variable, the CALL prefix will be ignored. |
FOR (sym=expr, expr2, cexpr) stmt FOR (sym=expr, expr2) stmt | 1) Initialize sym with expr. 2) If sym >= expr2 /\ cexpr > 0 \/ sym <= expr2 /\ cexpr < 0, leave the loop. 3) Execute stmt. 4) Add cexpr to sym. 5) Go to 2). If cexpr is omitted, it defaults to 1. |
HALT; | Branch to the end of the entire program, thereby terminating it. |
IF (expr) stmt | Execute stmt, if expr evaluates to logical truth (a non-zero value). |
IE (expr) stmt-T ELSE stmt-F | Execute stmt-T, if expr evaluates to logical truth (any non-zero value). Otherwise execute stmt-F. |
LEAVE; | Immediately branch to the end of the innermost WHILE or FOR loop, thereby leaving it. (*) |
LOOP; | Immediately branch to the beginning of the innermost WHILE or FOR loop. In FOR loops, branch to the increment part where cexpr is added to sym and in WHILE loops branch to the point where the exit condition is checked. (*) |
RETURN expr; | Evaluate expr and prepare the result for passing it back to the calling procedure. Then, jump to the end of the procedure where local storage will be released and the procedure returns (*). RETURN may not be used in the main procedure. |
WHILE (expr) stmt | Execute the loop body stmt as long as expr evaluates to a true (non-zero) value. If expr is false before the first pass, skip the loop entirely. |
; DO END | An empty statement does nothing. It may be used in places where the language expects a statement, but nothing is to be done. |
(*) If a branch command is used inside a scope which defines local symbols, the symbols will be destroyed and the associated storage will be released before the branch takes place.
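The following sketch combines several of the statements described above. It assumes that a FOR loop ends as soon as its variable reaches the limit, so that `FOR (i=0, 5)` visits the members 0 through 4. The names `v`, `i` and `sum` are illustrative.

```t3x
VAR v[5];

DO VAR i, sum;
    FOR (i=0, 5) v[i] := i * i;   ! step defaults to 1

    sum := 0;
    i := 0;
    WHILE (1) DO
        IF (i >= 5) LEAVE;        ! leave the WHILE loop
        i := i + 1;
        IF (v[i-1] MOD 2) LOOP;   ! odd square: back to the condition check
        sum := sum + v[i-1];      ! add up the even squares
    END

    IE (sum > 10)
        WRITES("large");
    ELSE
        WRITES("small");
    NEWLINE();
END
```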
The notation .symbol is used in this section to denote the unsigned value of a symbol. For example, if A = -1, then .A = 65535 on a 16-bit machine like the Tcode machine.
Operator | Prec | Assoc | Description |
---|---|---|---|
( expr ) | 0 | - | Override precedence and associativity rules. A parenthesized expression is treated as a factor. Evaluate to the value of expr. |
P (...) CALL P() | 0 | - | Call the procedure P with some optional arguments. The value of this operation is the return value of P. When the procedure call is prefixed with the keyword CALL, the procedure whose address is stored in the variable P is called. For details on procedure calls, see the sections about statements and declarations. |
A [ B ] | 0 | left | Evaluate to the B'th member of the vector A or of the vector which the variable A points to. Since vectors may be nested, multiple subscripts may be applied to a single symbol to provide access to embedded vectors: A [B1] [B2] ... |
A :: B | 0 | right | Evaluate to the B'th byte of the byte vector A or the byte vector pointed to by the variable A. :: associates to the right, because its results are limited to 8-bit patterns: A :: B :: C equals A :: (B::C) but no parentheses may be used to override this rule, because :: may be applied only to vectors, but not to other subexpressions. |
- A | 1 | right | Evaluate to the negative value of A. |
~ A | 1 | right | Evaluate to the bitwise inverted value of A (bitwise NOT). |
\ A | 1 | right | Evaluate to the logical complement of A (logical NOT): A=0 => -1; A\=0 => 0. Each non-zero value is considered true, and only zero denotes false. |
@ A | 1 | right | Evaluate to the address of A. A may be any lvalue as described in the section covering statements (assignments). This includes variables and members of any kind of vector objects: @A[5][7] computes the address of the 7th member of the 5th embedded vector in A. @A::4 evaluates to the address of the 4th byte in A. |
A * B | 2 | left | Evaluate to the product of A and B. If A*B does not fit in a machine word, the result is undefined. |
A .* B | 2 | left | Evaluate to the (unsigned) product of .A and .B. If A.*B does not fit in a machine word, the result is undefined. |
A / B | 2 | left | Evaluate to the integer part of the quotient A/B. A/B is undefined, if B=0. |
A ./ B | 2 | left | Evaluate to the integer part of the unsigned quotient .A/.B. A./B is undefined, if B=0. |
A MOD B | 2 | left | Evaluate to the division remainder of A./B where `./' denotes an unsigned integer division: A MOD B = A - A ./ B * B. If B=0, A MOD B is undefined. |
A + B | 3 | left | Evaluate to the sum of A and B. |
A - B | 3 | left | Evaluate to the difference A-B. |
A & B | 4 | left | Evaluate to the result of performing a bitwise AND on A and B. |
A | B | 4 | left | Evaluate to the result of performing a bitwise OR on A and B. |
A ^ B | 4 | left | Evaluate to the result of performing a bitwise exclusive OR on A and B. |
A << B | 4 | left | Evaluate to the result of shifting all bits of the value of A to the left by B positions (bitwise left shift). The sign bit is undefined after this operation. |
A >> B | 4 | left | Evaluate to the result of shifting all bits of the value of A to the right by B positions (bitwise right shift). The sign bit is undefined after this operation. |
A < B | 5 | left | Evaluate to true, if A is less than B. (*) |
A > B | 5 | left | Evaluate to true, if A is greater than B. (*) |
A <= B | 5 | left | Evaluate to true, if A is less than or equal to B. (*) |
A >= B | 5 | left | Evaluate to true, if A is greater than or equal to B. (*) |
A .< B | 5 | left | Evaluate to true, if .A is less than .B. (*) |
A .> B | 5 | left | Evaluate to true, if .A is greater than .B. (*) |
A .<= B | 5 | left | Evaluate to true, if .A is less than or equal to .B. (*) |
A .>= B | 5 | left | Evaluate to true, if .A is greater than or equal to .B. (*) |
A = B | 6 | left | Evaluate to true, if A is equal to B. (*) |
A \= B | 6 | left | Evaluate to true, if A is not equal to B. (*) |
A /\ B | 7 | left | First, evaluate A. If A is true, evaluate B. If A is false, do not evaluate B. The result is the last evaluated subexpression. This is a generalization of the logical AND: A /\ B gives true, if A AND B have a true value and false, otherwise. (**) |
A \/ B | 7 | left | First, evaluate A. If A is false, evaluate B. If A is true, do not evaluate B. The result is the last evaluated subexpression. This is a generalization of the logical OR: A \/ B gives true, if either A OR B -- or both -- have a true value and false, otherwise. (**) |
A -> B : C | 8 | left | First, evaluate A. If A is true, evaluate B, else evaluate C. If B is evaluated, do not evaluate C and vice versa. The result is the last evaluated subexpression. This is the expression form of the IE statement: X := A -> B: C is equal to IE (A) X:=B; ELSE X:=C; |
(*) All relational operations evaluate to false, if the respective condition does not apply.
(**) Technically speaking, these are `short circuit boolean operators'.
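A short sketch may help to contrast the signed and unsigned operators and to show the short circuit and conditional forms. The variables `a`, `b` and `r` are illustrative, and the comments assume a 16-bit machine.

```t3x
DO VAR a, b, r;
    a := %1;                      ! -1, so .a = 65535 on a 16-bit machine
    b := 1;

    r := a < b;                   ! true: -1 is less than 1
    r := a .< b;                  ! false: 65535 is not less than 1

    ! Short circuit: the division is skipped because b is zero.
    b := 0;
    r := b \= 0 /\ a / b > 0;

    ! Conditional expression: the absolute value of a.
    r := a < 0 -> -a : a;
END
```

No parentheses are needed here because the relational operators bind more tightly than /\ and ->.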
To form valid expressions, the above operators may be used to modify or combine the factors described in this section. The minimum form of an expression is a single factor. The following table summarizes all available types of factors.
Factor Type | Description |
---|---|
Symbols | The sort of value a symbol name evaluates to depends on the type of the symbol. Basically, every symbol evaluates to its value. For variables, this is the value stored in it, and for constants, structs and struct members, this is the value which has been assigned to them at declaration time. The value of a vector symbol is the address of the associated vector. Therefore, vector symbols actually evaluate to vector addresses. |
Numeric Literals | Numeric literals are written in decimal notation and they evaluate to the values they represent. A leading percent sign may be used to indicate that the literal be negative: %123 = -123 Note: %123 is an atomic factor while -123 is the operator `-' applied to the literal `123'. See the section on constant expressions for more details. |
String Literals | A string literal is a sequence of characters delimited by double quotes ("): "Hello, World" Each character occupies a full machine word unless the literal is prefixed with the keyword PACKED. In packed strings, each character requires only a single byte. Unpacked strings are terminated with a NUL word, packed strings are padded with NUL characters to the next word boundary. Special characters may be included in strings using escape sequences (which will be described in the following section). String literals evaluate to the addresses of their first characters. They may be considered a special form of the table (see below). |
Character Literals | A character literal is a single character enclosed in single quotes (apostrophes): 'a', '0', '\s', ''', 'X'. It evaluates to the ASCII code of the enclosed character. Like in string literals, escape sequences may be used to represent special characters. |
Tables | A table is a static initialized vector denoted by a comma-separated sequence of `table members' delimited by square brackets: [ "IF", 0, @if_stmt ] The type of each table member may be any out of the following list: For example, after v := [ [2,4,9], [7,5,3], [6,1,8] ]; v[0][2]=9 applies. To embed a dynamic expression (whose value is not known at compile time), place it in parentheses: v := [ "x*x gives", (x*x) ]; Each time the table is passed, the value of the embedded expression (x*x) will be re-computed. |
Procedure Calls and Subscripts | Since procedure calls and subscripts may be considered either operations or factors, see the section about operators for their explanations. |
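The factor types described above might be used as in the following sketch. The variable names are illustrative only.

```t3x
DO VAR v, s, p, x;
    s := "Hello";                 ! one machine word per character
    p := PACKED "Hello";          ! one byte per character
    x := s[0] = 'H' /\ p::0 = 'H';   ! both forms start with the same character

    v := [ [2,4,9], [7,5,3], [6,1,8] ];
    x := v[0][2];                 ! 9

    x := 7;
    v := [ "x*x gives", (x*x) ];  ! (x*x) is computed when this line is passed
    x := v[1];                    ! 49
END
```

Since string literals and tables evaluate to addresses, they are assigned to ordinary variables and then accessed with subscripts.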
Only a subset of the available operators is allowed in constant expressions. There are no precedence rules and all operations evaluate from the left to the right. Since all operations can be explicitly specified by ordering, there are no parentheses.
Constant expressions are expected wherever a value must be known at compile time, like the types in procedure declarations, vector sizes, and the values of constants.
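The strict left-to-right rule can produce results that differ from runtime expressions. The sketch below assumes that `+` and `*` are among the operators permitted in constant expressions. The names `A`, `B` and `buf` are illustrative.

```t3x
CONST A = 2 + 3 * 4;              ! evaluated as (2+3)*4 = 20, not 14
CONST B = 3 * 4 + 2;              ! evaluated as (3*4)+2 = 14
VAR buf[A];                       ! a vector of 20 machine words

DO END
```

Reordering the operands, as in the second line, takes the place of parentheses.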
All operators have the same meanings as in ordinary runtime expressions. The following operators are allowed:
In string and character literals, each escape sequence consisting of a backslash (\) and the following character will be replaced with the associated non-printable or special character.
Escape Sequence | ASCII Code | ASCII Name | Description |
---|---|---|---|
\a \A | 7 | BEL | Bell -- ring the terminal bell |
\b \B | 8 | BS | Backspace -- move over previously printed character |
\e \E | 27 | ESC | Escape -- introduce a control sequence |
\f \F | 12 | FF | Form Feed -- eject paper on printer |
\n \N | 10 | LF | Line Feed -- move to next line |
\q \Q | 34 | " | Quote -- used for inclusion in strings |
\" | 34 | " | The same as \Q |
\r \R | 13 | CR | Carriage Return -- move to column 1 |
\s \S | 32 | blank | A visual form of the blank character |
\t \T | 9 | HT | Horizontal TAB -- move to next horizontal TAB stop (*) |
\v \V | 11 | VT | Vertical TAB -- move to next vertical TAB stop (**) |
\\ | 92 | \ | Backslash -- used to escape \ itself |
(*) Horizontal TAB stops are mostly located at every 8th position.
(**) VT frequently moves to the same column in the next line.
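A few of the escape sequences from the table are used in the following sketch. The text of the strings is arbitrary.

```t3x
DO
    WRITES("Name\tValue\n");      ! HT between the columns, LF at the end
    WRITES("say \qhello\q\n");    ! \q embeds a double quote
    WRITES("C:\\T3X\n");          ! \\ yields a single backslash
END
```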
Meta commands are commands which do not actually belong to the T3X language, but will be evaluated by the compiler. Some of them affect the compiler itself and some affect the generated code. Normally, meta commands are generated by preprocessors or front ends and there should never be any need to include one of these commands manually. (Exception: #DEBUG;.)
All meta commands begin with a # sign and like all other statements, they are terminated with a semicolon. They might occur at any place where a statement or a declaration (either local or global) is expected, but not inside of statements or declarations.
Command | Description |
---|---|
#DEBUG; | Turn on the emission of debug information, like source code line numbers and variable names and addresses. When the debug switch is turned on, the T3X translator will generate a LINE instruction at the beginning of each statement and an LSYM instruction for each local and a GSYM instruction for each global variable. Debug information is intended to be used by a source level debugger. |
#L line "name" ; | Re-set input line number and file name. This command should be generated by preprocessors when changing the order of input lines (for example when inserting code by including a file). #L sets the internal line counter of TXTRN to 'line' (which must be specified as a decimal number) and the input file name to 'name'. When reporting errors, TXTRN will use the provided line number and file name. The command #L line "" ; should be used to indicate that the following text belongs to the main program and has not been included from some other file. |
These procedures are available in all T3X programs. They do not require an explicit INTERFACE declaration.
Procedure | Description |
---|---|
ATON(S) String S | Compute the value of the decimal number whose ASCII representation is stored in the string S: |
CLOSE(FD) Descriptor FD | Close the file descriptor FD. Return zero upon success and -1 in case of an error (invalid file descriptor). |
ERASE(F) String F | Erase the file whose path name is specified in the string F. Return 0 upon success, -1 in case of failure. |
NEWLINE() | Write a system-dependent newline sequence to the currently selected output stream. Return null. |
NTOA(N,W) Number N,W | Create a string representing the value N in an internal buffer. If N is negative, prefix it with a minus sign (-). If W is greater than the number of characters required by the string, pad the string with blanks to the given length. W must be less than 256. |
OPEN(F,M) String F Number M | Open the file whose path name is stored in the string F in mode M. M may have one of the following values: |
PACK(S,P) String S Pstring P | Pack the string S into P by storing the least significant 8 bits of each machine word of S into a byte of P. P must provide enough space to store the packed string. S and P may denote the same storage location. In this case, S will get overwritten. PACK() returns the number of machine words required to store P (including the terminating NUL). |
READS(S,N) String S Number N | Read up to N characters from the currently selected input port into an internal buffer, unpack them into S (see UNPACK), and return the number of characters read. A zero return value indicates that the EOF has been reached, a negative value indicates general failure. N must be <= 1024. |
SELECT(P,FD) Number P Descriptor FD | If P=0, select a new input port and if P\=0 select a new output port. The new port will be the file descriptor FD. A valid descriptor may be obtained from OPEN(). There are also three predefined standard descriptors: |
UNPACK(P,S) Pstring P String S | Unpack the packed string P into S by storing each byte of P in a separate machine word in S. Each word in S will be zero-extended. P and S may not point to the same storage location. S must provide enough space to hold the unpacked string. UNPACK() returns the number of machine words required to store S (including the terminating NUL word). |
WRITES(S) String S | Pack the string S into an internal buffer (see PACK) and write it to the currently selected output port. Return the number of characters actually written. A return value which is not equal to the length of S indicates failure. The length of S may not exceed 1024 characters. |
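The following sketch uses some of the built-in procedures without any INTERFACE declaration. It assumes that NTOA returns the address of its internal buffer, so that the result can be passed directly to WRITES. The buffer names and sizes are illustrative.

```t3x
VAR pbuf[16], ubuf[16];           ! ample space for a short string

DO VAR s, n;
    s := "Hello";
    n := PACK(s, pbuf);           ! words needed for the packed string
    UNPACK(pbuf, ubuf);           ! back to one word per character
    WRITES(ubuf);
    NEWLINE();
    WRITES(NTOA(n, 0));           ! assumption: NTOA returns its buffer address
    NEWLINE();
END
```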
These procedures must be declared explicitly using INTERFACE statements. The slot numbers of the procedures are given in the table below. Notice that the number of arguments in each INTERFACE declaration must exactly match the number of arguments expected by the respective procedure.
The preprocessor TXPP provides some includable files which contain the required declarations. See the documentation of TXPP for details.
Procedure | Slot | Description |
---|---|---|
MEMCOMP(R1,R2,L) Vector R1,R2 Number L | 16 | Compare each byte in memory region R1 with the byte at the same position in R2. If the first L bytes of both regions are equal, return zero. If two bytes at equal positions differ, return their difference R1::P - R2::P where P is the position of the mismatch. |
MEMCOPY(D,S,L) Vector D,S Number L | 15 | Copy L bytes from S to D. Return nothing. The two vectors D and S may overlap. |
READPACKED(FD,B,L) Descriptor FD Vector B Number L | 11 | Read up to L bytes from the file descriptor FD into the buffer B. Return the number of bytes read. A zero return value indicates that the end of the input file has been reached. A negative value indicates general failure. |
RENAME(OLD,NEW) String OLD,NEW | 14 | Rename the file whose path name is stored in the string OLD to NEW. Return zero upon success and -1 in case of failure. |
REPOSITION(FD,PH,PL,O) Descriptor FD Number PH,PL Number O | 13 | Move the file pointer of the descriptor FD to the position specified by the values PH and PL. PH and PL are both machine words while file offsets are usually at least 32 bits wide. Therefore, the position is computed using the formula Offset = PH * 65536 + PL . O specifies the origin of the move: |
WRITEPACKED(FD,B,L) Descriptor FD Vector B Number L | 12 | Write L bytes from the buffer B to the file descriptor FD. Return the number of bytes actually written. Any number which is not equal to L indicates failure. |
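A sketch of how these procedures might be declared and used is given below. The slot numbers and argument counts are taken from the table above. The byte vectors `a` and `b` and the variable `r` are illustrative.

```t3x
INTERFACE READPACKED(3) = 11, WRITEPACKED(3) = 12, REPOSITION(4) = 13,
          RENAME(2) = 14, MEMCOPY(3) = 15, MEMCOMP(3) = 16;

VAR a::8, b::8;

DO VAR r;
    MEMCOPY(a, PACKED "T3X", 4);  ! 3 characters plus one NUL byte
    MEMCOPY(b, a, 4);
    r := MEMCOMP(a, b, 4);        ! zero: the first 4 bytes are equal
    b::0 := 'X';
    r := MEMCOMP(a, b, 4);        ! 'T' - 'X', a negative difference
END
```

Each interface name is followed by its argument count in parentheses, which must match the number of arguments the runtime routine expects.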
This document is part of the T3X compiler package which is subject to the following terms.
T3X -- A Compiler for the Procedural Language T, version 3X
Copyright (C) 1996-2000 Nils M Holm. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.