Language Description
Introduction
The S06 language will be similar to standard algorithmic languages, but will not follow any one in particular. You will not have an extensive library for input, output, system calls, or memory allocation.
The lexical structure of the language is defined as follows:
- Ignore white space (blanks, tabs, new lines, and form feeds).
- A file name is a string of characters made up of anything that is not white space. A file name comes after the key word include and is a special case for the input stream to lex. (Token name is FILENAME.)
- An identifier is a string made up of letters, digits, and underscores that does not begin with a digit. Only the first 32 characters are significant; ignore everything after that. (Token name is IDENTIFIER.)
- A string constant is enclosed in double quotes. (Token name is SCONSTANT.)
- Special characters in a string constant are escaped with a backslash, as in C.
- An integer is a sequence of digits 0, ..., 9. (Token name is ICONSTANT.)
- A floating point number has 64 bits (i.e., a C or C++ double) and the following forms:
  - Mantissa only: 123.123.30.3.3 4.
  - Mantissa and exponent: 0.123d421.23d-30.001d+10123.456d000
- Key words should be processed specially. Unless noted, the token name is the capitalized key word with T_ prepended (e.g., token name T_DO for do):
  Key word        Token name
  allocate        T_ALLOCATE
  class           T_CLASS
  do              T_DO
  double          T_DOUBLE
  dynamic         T_DYNAMIC
  else            T_ELSE
  Finalize        T_FINALIZE
  free            T_FREE
  function        T_FUNCTION
  get_double      T_GET_DOUBLE
  get_int         T_GET_INT
  get_string      T_GET_STRING
  include         T_INCLUDE
  if              T_IF
  Initialize      T_INITIALIZE
  int             T_INT
  main            T_MAIN
  pointer         T_POINTER
  print_double    T_PRINT_DOUBLE
  print_int       T_PRINT_INT
  print_string    T_PRINT_STRING
  procedure       T_PROCEDURE
  return          T_RETURN
  sqrt            T_SQRT
  string          T_STRING
  update          T_UPDATE
  while           T_WHILE
- Some of the suggested key words to add to the base language include the following:
  Key word        Token name
  case            T_CASE
  switch          T_SWITCH
  boolean         T_BOOLEAN
  true            T_TRUE
  false           T_FALSE
  exp             T_EXP
  cos             T_COS
  sin             T_SIN
  abs             T_ABS
- Operators that need to be recognized are the following:
  Operator    Token name
  &           O_AND
  =           O_ASSIGN
  /=          O_ASSIGN_DIVIDE
  -=          O_ASSIGN_MINUS
  %=          O_ASSIGN_MOD
  *=          O_ASSIGN_MULTIPLY
  +=          O_ASSIGN_PLUS
  ,           O_COMMA
  //          O_COMMENT
  /           O_DIVIDE
  --          O_DECREMENT
  ==          O_DEQ
  >=          O_GEQ
  >           O_GT
  [           O_LBRACKET
  {           O_LCURLY
  <=          O_LEQ
  (           O_LPAREN
  <           O_LT
  #           O_MEMBER
  -           O_MINUS
  %           O_MOD
  *           O_MULTIPLY
  !=          O_NEQ
  |           O_OR
  !           O_NOT
  +           O_PLUS
  ^           O_POWER
  ++          O_INCREMENT
  ]           O_RBRACKET
  }           O_RCURLY
  )           O_RPAREN
  ;           O_SEMI
- Some of the suggested operators to add to the base language include the following:
  Operator    Token name
  ^=          O_ASSIGN_POWER
  */          O_COMMENT_CLOSE
  /*          O_COMMENT_OPEN
  &&          O_DAND
  ||          O_DOR
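
As a rough illustration of the identifier and key word rules above, the following C sketch scans an identifier, keeps only its first 32 characters, and builds the token name for a key word by prepending T_ and capitalizing. The function names scan_identifier and token_name, the shortened key word list, and the case-sensitive comparison are assumptions made for this example only. A lex specification would normally express the same rules as patterns.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDENT 32  /* only the first 32 characters are significant */

/* A few base key words for illustration; the full table would go here. */
static const char *const keywords[] = {
    "do", "else", "if", "int", "print_int", "while"
};

/* Scan an identifier starting at src. Copies at most MAX_IDENT characters
   into out (NUL-terminated) and returns the number of characters consumed,
   or 0 if src does not start an identifier. */
size_t scan_identifier(const char *src, char out[MAX_IDENT + 1])
{
    size_t n = 0, kept = 0;

    if (!(isalpha((unsigned char)src[0]) || src[0] == '_'))
        return 0;                      /* must not begin with a digit */
    while (isalnum((unsigned char)src[n]) || src[n] == '_') {
        if (kept < MAX_IDENT)
            out[kept++] = src[n];      /* keep the significant prefix */
        n++;                           /* but consume the whole lexeme */
    }
    out[kept] = '\0';
    return n;
}

/* Write the token name for an identifier-shaped lexeme into tok:
   "T_" plus the upper-cased key word, or "IDENTIFIER" otherwise.
   Assumes lexeme holds at most MAX_IDENT characters. */
void token_name(const char *lexeme, char tok[MAX_IDENT + 3])
{
    size_t i, j;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(lexeme, keywords[i]) == 0) {
            tok[0] = 'T';
            tok[1] = '_';
            for (j = 0; lexeme[j] != '\0'; j++)
                tok[j + 2] = (char)toupper((unsigned char)lexeme[j]);
            tok[j + 2] = '\0';
            return;
        }
    }
    strcpy(tok, "IDENTIFIER");
}
```

Calling scan_identifier on the text "while(" consumes five characters, and token_name then yields T_WHILE for that lexeme. Any lexeme not in the key word list yields IDENTIFIER.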
Tokens do not cross new lines. If lex cannot produce a string or number, you do not have to either, although an error message is appropriate if you truncate something. An underflow (a floating point number that rounds to 0 instead of its exact value in infinite precision arithmetic) is not an error and should be treated as 0.
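
One possible way to handle the underflow rule is sketched below in C. It assumes the scanner has already matched a well-formed floating point lexeme, replaces the d exponent marker with e so that the standard strtod function can parse it, and reports an error only on overflow. The function name convert_fconstant, the 128-character buffer limit, and the return codes are assumptions made for this example.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Convert an S06 floating point lexeme (e.g. "1.5d3") to a double.
   Returns 0 on success and -1 if the lexeme is too long, malformed,
   or overflows. Underflow is not an error: strtod yields 0 (or a
   value very close to it) and that value is accepted as is. */
int convert_fconstant(const char *lexeme, double *value)
{
    char buf[128];                 /* assumed upper bound on lexeme length */
    char *end, *d;
    size_t len = strlen(lexeme);

    if (len >= sizeof buf)
        return -1;
    memcpy(buf, lexeme, len + 1);

    d = strchr(buf, 'd');
    if (d != NULL)
        *d = 'e';                  /* strtod expects 'e' as the exponent marker */

    errno = 0;
    *value = strtod(buf, &end);
    if (end == buf || *end != '\0')
        return -1;                 /* not a complete, well-formed constant */
    if (errno == ERANGE && (*value == HUGE_VAL || *value == -HUGE_VAL))
        return -1;                 /* overflow: report an error */
    return 0;                      /* includes underflow, treated as 0 */
}
```

With this sketch, "1.0d-400" converts successfully to 0, while "1.0d400" is rejected as an overflow.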
There are several special functions for reading and printing data types (double, integer, and string). These will be provided as C code for use by your compiled programs. However, you will have to pass information to these functions correctly. This will be much clearer when you get to the code generation part of the project.
Defining It by Example
The easiest way to understand the language is to study one or more examples. The base example code is contained in two files: mg-s06.h and mg.s06. The target machine is defined in s06.c. A sample of what your compiler should produce is in yourmain.h.
Class examples are also a nice way to spark discussions since none of your languages will be quite the same. Below are samples submitted as part of the parsing portion of the class project.
Name                Short Examples (< 20 Lines)   Long Examples (20-40 Lines)   Longer Example (41+ Lines)
Eli Arnaudova       18 LOC
Divya Bansal                                      35 LOC
Cindy Burklow                                                                   67 LOC
Pete Carey                                        36 LOC
Soham Chakraborty                                 35 LOC
Aarthi Jayaram                                    21 LOC
Wei Li                                            34 LOC
Nathan Liang                                      23 LOC
Mark Maynard        14 LOC
Derrick Spell                                     28 LOC
Jianzhong Wang      12 LOC
Li Wang             18 LOC