Tunnel Grammar Studio examples

In this page are listed grammar examples and generated parsing machines usages examples. The order is from the easiest to the hardest.

Content

Use cases
Using the parsing machine

Use cases

The following grammars are defining different use cases are ordered from the more simple to to the more complex.

Text Splitter

To recognize text expressions from the input one may write a lexer as:

Lexer grammar

word   = 1* ('A'-'Z' / 'a'-'z')
number = 1* '0'-'9'

The lexer grammar reads as: a 'word' is one or more case insensitive 'a'-'z' letters and a number is one or more digits. The parser grammar may be as:

Parser grammar

text       = 1*expression
expression = ({word} / {number})
             *(0*1 "," " " ({word} / {number}))
             "."

The parser grammar reads as: a text is one or more expressions that each is 'one or more words that in between have an OPTIONAL comma and a REQUIRED space' and MUST finish with a full stop. A valid input for a PM generated from this two grammars will be: Use case 1: Hello World, from ExperaSoft. Additionally one may use only a parser grammar to achieve the same:

Stand alone parser grammar

text       = 1*expression
expression = (word / number)
             *(0*1 "," " " (word / number))
             "."
word       = 1* ('A'-'Z' / 'a'-'z')
number     = 1* '0'-'9'

However the two approaches will have different syntax trees. The first way with the 2 grammars, is expected to run faster in the general case, because the Lexer operates faster then the Parser, because no context information is recorded, as a consequence each token 'word' and 'number' will be represented by a single string type of object. In the second approach with single grammar, the context information is recorded, that will take more memory and time to be recognized and constructed. As a general rule, everything that represents single tokens SHOULD be moved into the Lexer grammar.

Mathematical Expression

In many programming languages mathematical expressions are build in to match the most popular infix notation. Such recognition can be made with this grammars. The lexer grammar is:

Lexer grammar

varname    = 1* ('A'-'Z' / 'a'-'z')
integer    = 1* '0'-'9'
float      = 1* '0'-'9' "." * '0'-'9'

And the parser grammar:

Parser grammar

expression     = addition
addition       = multiplication 
                 *(("+" / "-") multiplication)
multiplication = number *(("*" / "/") number)
number         = {integer} /
                 {float} / 
                 {varname} / 
                 "(" addition ")"

As defined these grammars will generate PM which will be LL(1) and accept the following expressions:

Input examples

2*2
or
5*(10+60*2)
or
2*(21-(x/2)*2)
and so on...

If one wants to introduce space in between the digits and signs, one may change the parser grammar as in this example:

Parser grammar with white space

expression     = *WSP addition
addition       = multiplication 
                 *(("+" / "-") *WSP multiplication)
multiplication = number *(("*" / "/") *WSP number)
number         = ({integer} /
                  {float} /
                  {varname} /
                  "(" *WSP addition ")") 
                 *WSP
WSP            = %x20 / %x9

The new introduced WSP rule is short for white space (a space or a horizontal tab) and will be OPTIONAL at the beginning of the math expression and after each symbol and a phrase token - all currently in the number rule. The priority of the multiplication is higher and comes naturally from the rules relation. These grammars form a LL(1) parser that can parse linearly input as:

Input examples with white space

24 / 4.5 +   11 * (offset   - 2)
or
5644*(b + 1233.21 / 899.004)
and so on...

Log File Reader

Many computer programs are saving some kind of log files for the started or completed operations, for the errors and warnings occurred during the program runtime. This logs are often specifically formated for the company needs. One may define grammars that read this logs without the need to create a specific parser by hand, if one have a need to make a statistics on the log or another automatic analysis. Additionally when the log structure changes, one have to just update the grammars defined and recompile the PM. That is expected to save a lot of time from hand coding the parser at every change.

Lexer grammar

string = '"' 
         *(%x20-%x21 /
           %x23-%x5B / 
           %x5D-10FFFF / 
           '\' ('\' / '"')) 
         '"'
word   = ('A'-'Z' / 'a'-'z' )
         *('A'-'Z' / 'a'-'z' / '0'-'9' / '-' / '_')
number = 1* '0'-'9' ["." * '0'-'9']

Parser grammar

log  = *line
line = item *("," item) CRLF
item = {string} / {word} / {number}
CRLF = %xD %xA

In words, the example lexer grammar is recognizing simple tokens as:

string: starting and ending with double quotes with a possible escaping '\' char followed by '\' or double quote
number: an integer or a float
words: starting with a letter and optionally continuing with a letter, digit, dash or underscore

The parser grammar is recognizing an input of lines terminated with the Internet standard line terminator. Each line MUST have one or more items separated by a comma, and each item is one of the lexer defined tokens: string, word or a number. No white space is permitted inside the log file, but it can be added in the same way as in the mathematical expression use case.

Simple Markup Language

Many companies have option files that have to be human and machine readable. To define simple markup language for this purpose one may use as a base the Extended Markup Language (XML):

Lexer grammar

comment-start = "<!--"
comment-final = "-->"		
tag           = "<" 0*1 "/"	
word          = 1* ('A'-'Z' / 'a'-'z')

Parser grammar

document  = *WSP tag *WSP
tag       = tag-start content tag-final
tag-start = {tag, "<"}  *WSP {word} *WSP '>'
tag-final = {tag, "</"} *WSP {word} *WSP '>'
content   = *(tag / %x21-3B / %x3D-10FFFF / {word} / WSP)
WSP       = %x9       / ; tab
            %x13 %x10 / ; new line
            %x20      / ; space
            {comment-start} 
            *(%x0-10FFFF / {word} / {comment-start} / {tag})
            {comment-final}

Input example

<options>
<!-- this is an option file with one string -->
<string>hello word!</string>
</options>

Simple C/C++

If one wants to create a script language, one may base it on the C/C++ programming language. On the first phase of the parsing the syntax is checked with the PM, on a second phase the semantical meaning of the parsed code could be checked and finally an interpreter or a generator to another language (possibly Assembler) could execute the syntactically and semantically valid parsed input. Those are grammars that produce PM for C/C++ similar language.

Lexer grammar

comment = "/*" / "*/" / "//"	
CRLF    = %x13 %x10
keyword = %s"const" / %s"if"

Parser grammar

document       = *WSP *function
function       = %s"function" 1*WSP name *WSP 
                 func-args scope
func-args      = '(' *WSP [argument-list] ')'
func-call      = '(' *WSP [operation-list] ')'
argument-list  = argument 1*WSP *(',' 1*WSP argument 1*WSP)
argument       = [{keyword, "const"} 1*WSP] type-and-name
type-and-name  = name 1*WSP name
operation-list = operation *(',' 1*WSP operation) 
scope          = '{' WSP *declaration "}" 1*WSP 
declaration    = decl-var / decl-if
decl-var       = type-and-name *WSP ['=' *WSP operation] ';' *WSP
decl-if        = {keyword, "if"} *WSP '(' *WSP operation ')' *WSP
                 scope [{keyword, "else"} *WSP scope]
operation      = addition
addition       = multiplication 
                 *(("+" / "-") *WSP multiplication)
multiplication = value *WSP 
                 *(("*" / "/") *WSP value *WSP)
value          = name 0*1 func-call / 
                 number /
                 "(" *WSP operation ")" 
name           = ('a'-'z' / 'A'-'Z')
                 *('A'-'Z' / 'a'-'z' / '0'-'9' / '_')
number         = 1* '0'-'9' ['.' * '0'-'9']
WSP            = %x9 / %x20 / {CRLF} / 
                 line-comment / text-comment
line-comment   = {comment, "//"} *(%x0-10FFFF / {comment}) {CRLF}
text-comment   = {comment, "/*"} 
                 *(
                   %x0-10FFFF / 
                   {CRLF} / 
                   {comment, "/*"} / 
                   {comment, "//"}
                  )
                 {comment, "*/"}

The semantic analysis is not defined into the grammar, as for example is it an error if one calls a function that is not defined, or calls a function that is defined later in the source code. Because the parsing phase is completed fully before the semantic analysis is started, one may simply permit calling of functions that are defined later in the code, that is one of the advantages of having the parsing phrase separate of any other phase.

Using the parsing machine

This section contains C++ examples of how to use the generated parsing machine.

Beginner

This is 'just give me the Syntax Tree' example. It parses the grammar listed bellow. This example demonstrates parsing machine prepare, run, check result and a syntax tree retrieval.

Stand alone parser grammar

list    = integer *("," integer)
integer = 1* '0'-'9'

Requirements	Download	File Name	Hash SHA-256
Requirements ZIP File Support, C++ 98	Download Demo Beginner	File Name Demo Beginner.zip	Hash SHA-256 70f8a963 6bee14ab 03fa3105 a39a9fe6 cc991cd1 c9409672 43b67c0f 465928d8

Advanced

To report properly syntax errors found in the input functions in the PM events class must be overwritten by extending the class. This example adds CustomEvents class in the Basic example and uses it instead of default event handler events to receive and print the runtime parsing events. The grammar used is the same as in the beginner example.

Requirements	Download	File Name	Hash SHA-256
Requirements ZIP File Support, C++ 98	Download Demo Advanced	File Name Demo Advanced.zip	Hash SHA-256 0680332e 3e69c40a d046fcb0 d23fc0b3 43265df9 50686549 45064557 18caafd5

Expert

After a syntax tree element is available, it can be iterated as much as need before its released. The following glue code can be added to the advanced example to view the resulted syntax tree. The used grammar is the same as in the beginner example.

Requirements	Download	File Name	Hash SHA-256
Requirements ZIP File Support, C++ 98	Download Demo Expert	File Name Demo Expert.zip	Hash SHA-256 ba847903 1cc79fd0 89376a37 9cf4e5b6 ea888f78 55daba28 d3390654 07ba34ee