In this page are listed grammar examples and generated parsing machines usages examples. The order is from the easiest to the hardest.

Use cases


The following grammars are defining different use cases are ordered from the more simple to to the more complex.

Text Splitter


To recognize text expressions from the input one may write a lexer as:

Lexer grammar
word   = 1* ('A'-'Z' / 'a'-'z')
number = 1* '0'-'9'

The lexer grammar reads as: a 'word' is one or more case insensitive 'a'-'z' letters and a number is one or more digits. The parser grammar may be as:

Parser grammar
text       = 1*expression
expression = ({word} / {number})
             *(0*1 "," " " ({word} / {number}))
             "."

The parser grammar reads as: a text is one or more expressions that each is 'one or more words that in between have an OPTIONAL comma and a REQUIRED space' and MUST finish with a full stop. A valid input for a PM generated from this two grammars will be: Use case 1: Hello World, from ExperaSoft. Additionally one may use only a parser grammar to achieve the same:

Stand alone parser grammar
text       = 1*expression
expression = (word / number)
             *(0*1 "," " " (word / number))
             "."
word       = 1* ('A'-'Z' / 'a'-'z')
number     = 1* '0'-'9'

However the two approaches will have different syntax trees. The first way with the 2 grammars, is expected to run faster in the general case, because the Lexer operates faster then the Parser, because no context information is recorded, as a consequence each token 'word' and 'number' will be represented by a single string type of object. In the second approach with single grammar, the context information is recorded, that will take more memory and time to be recognized and constructed. As a general rule, everything that represents single tokens SHOULD be moved into the Lexer grammar.

Mathematical Expression


In many programming languages mathematical expressions are build in to match the most popular infix notation. Such recognition can be made with this grammars. The lexer grammar is:

Lexer grammar
varname    = 1* ('A'-'Z' / 'a'-'z')
integer    = 1* '0'-'9'
float      = 1* '0'-'9' "." * '0'-'9'

And the parser grammar:

Parser grammar
expression     = addition
addition       = multiplication 
                 *(("+" / "-") multiplication)
multiplication = number *(("*" / "/") number)
number         = {integer} /
                 {float} / 
                 {varname} / 
                 "(" addition ")"

As defined these grammars will generate PM which will be LL(1) and accept the following expressions:

Input examples
2*2
or
5*(10+60*2)
or
2*(21-(x/2)*2)
and so on...

If one wants to introduce space in between the digits and signs, one may change the parser grammar as in this example:

Parser grammar with white space
expression     = *WSP addition
addition       = multiplication 
                 *(("+" / "-") *WSP multiplication)
multiplication = number *(("*" / "/") *WSP number)
number         = ({integer} /
                  {float} /
                  {varname} /
                  "(" *WSP addition ")") 
                 *WSP
WSP            = %x20 / %x9

The new introduced WSP rule is short for white space (a space or a horizontal tab) and will be OPTIONAL at the beginning of the math expression and after each symbol and a phrase token - all currently in the number rule. The priority of the multiplication is higher and comes naturally from the rules relation. These grammars form a LL(1) parser that can parse linearly input as:

Input examples with white space
24 / 4.5 +   11 * (offset   - 2)
or
5644*(b + 1233.21 / 899.004)
and so on...

Log File Reader


Many computer programs are saving some kind of log files for the started or completed operations, for the errors and warnings occurred during the program runtime. This logs are often specifically formated for the company needs. One may define grammars that read this logs without the need to create a specific parser by hand, if one have a need to make a statistics on the log or another automatic analysis. Additionally when the log structure changes, one have to just update the grammars defined and recompile the PM. That is expected to save a lot of time from hand coding the parser at every change.

Lexer grammar
string = '"' 
         *(%x20-%x21 /
           %x23-%x5B / 
           %x5D-10FFFF / 
           '\' ('\' / '"')) 
         '"'
word   = ('A'-'Z' / 'a'-'z' )
         *('A'-'Z' / 'a'-'z' / '0'-'9' / '-' / '_')
number = 1* '0'-'9' ["." * '0'-'9']
Parser grammar
log  = *line
line = item *("," item) CRLF
item = {string} / {word} / {number}
CRLF = %xD %xA

In words, the example lexer grammar is recognizing simple tokens as:

  • string: starting and ending with double quotes with a possible escaping '\' char followed by '\' or double quote
  • number: an integer or a float
  • words: starting with a letter and optionally continuing with a letter, digit, dash or underscore
The parser grammar is recognizing an input of lines terminated with the Internet standard line terminator. Each line MUST have one or more items separated by a comma, and each item is one of the lexer defined tokens: string, word or a number. No white space is permitted inside the log file, but it can be added in the same way as in the mathematical expression use case.

Simple Markup Language


Many companies have option files that have to be human and machine readable. To define simple markup language for this purpose one may use as a base the Extended Markup Language (XML):

Lexer grammar
comment-start = "<!--"
comment-final = "-->"		
tag           = "<" 0*1 "/"	
word          = 1* ('A'-'Z' / 'a'-'z')
Parser grammar
document  = *WSP tag *WSP
tag       = tag-start content tag-final
tag-start = {tag, "<"}  *WSP {word} *WSP '>'
tag-final = {tag, "</"} *WSP {word} *WSP '>'
content   = *(tag / %x21-3B / %x3D-10FFFF / {word} / WSP)
WSP       = %x9       / ; tab
            %x13 %x10 / ; new line
            %x20      / ; space
            {comment-start} 
            *(%x0-10FFFF / {word} / {comment-start} / {tag})
            {comment-final}
Input example
<options>
<!-- this is an option file with one string -->
<string>hello word!</string>
</options>

Simple C/C++


If one wants to create a script language, one may base it on the C/C++ programming language. On the first phase of the parsing the syntax is checked with the PM, on a second phase the semantical meaning of the parsed code could be checked and finally an interpreter or a generator to another language (possibly Assembler) could execute the syntactically and semantically valid parsed input. Those are grammars that produce PM for C/C++ similar language.

Lexer grammar
comment = "/*" / "*/" / "//"	
CRLF    = %x13 %x10
keyword = %s"const" / %s"if"
Parser grammar
document       = *WSP *function
function       = %s"function" 1*WSP name *WSP 
                 func-args scope
func-args      = '(' *WSP [argument-list] ')'
func-call      = '(' *WSP [operation-list] ')'
argument-list  = argument 1*WSP *(',' 1*WSP argument 1*WSP)
argument       = [{keyword, "const"} 1*WSP] type-and-name
type-and-name  = name 1*WSP name
operation-list = operation *(',' 1*WSP operation) 
scope          = '{' WSP *declaration "}" 1*WSP 
declaration    = decl-var / decl-if
decl-var       = type-and-name *WSP ['=' *WSP operation] ';' *WSP
decl-if        = {keyword, "if"} *WSP '(' *WSP operation ')' *WSP
                 scope [{keyword, "else"} *WSP scope]
operation      = addition
addition       = multiplication 
                 *(("+" / "-") *WSP multiplication)
multiplication = value *WSP 
                 *(("*" / "/") *WSP value *WSP)
value          = name 0*1 func-call / 
                 number /
                 "(" *WSP operation ")" 
name           = ('a'-'z' / 'A'-'Z')
                 *('A'-'Z' / 'a'-'z' / '0'-'9' / '_')
number         = 1* '0'-'9' ['.' * '0'-'9']
WSP            = %x9 / %x20 / {CRLF} / 
                 line-comment / text-comment
line-comment   = {comment, "//"} *(%x0-10FFFF / {comment}) {CRLF}
text-comment   = {comment, "/*"} 
                 *(
                   %x0-10FFFF / 
                   {CRLF} / 
                   {comment, "/*"} / 
                   {comment, "//"}
                  )
                 {comment, "*/"}

The semantic analysis is not defined into the grammar, as for example is it an error if one calls a function that is not defined, or calls a function that is defined later in the source code. Because the parsing phase is completed fully before the semantic analysis is started, one may simply permit calling of functions that are defined later in the code, that is one of the advantages of having the parsing phrase separate of any other phase.

Using the parsing machine


This section contains C++ examples of how to use the generated parsing machine.

Beginner


This is 'just give me the Syntax Tree' example. It parses the grammar listed bellow. This example demonstrates parsing machine prepare, run, check result and a syntax tree retrieval.

Stand alone parser grammar
list    = integer *("," integer)
integer = 1* '0'-'9'
RequirementsDownloadFile NameHash SHA-256
Requirements
ZIP File Support, C++ 98
Download
Demo Beginner
File Name
Demo Beginner.zip
Hash SHA-256
70f8a963 6bee14ab 03fa3105 a39a9fe6 cc991cd1 c9409672 43b67c0f 465928d8

Advanced


To report properly syntax errors found in the input functions in the PM events class must be overwritten by extending the class. This example adds CustomEvents class in the Basic example and uses it instead of default event handler events to receive and print the runtime parsing events. The grammar used is the same as in the beginner example.

RequirementsDownloadFile NameHash SHA-256
Requirements
ZIP File Support, C++ 98
Download
Demo Advanced
File Name
Demo Advanced.zip
Hash SHA-256
0680332e 3e69c40a d046fcb0 d23fc0b3 43265df9 50686549 45064557 18caafd5

Expert


After a syntax tree element is available, it can be iterated as much as need before its released. The following glue code can be added to the advanced example to view the resulted syntax tree. The used grammar is the same as in the beginner example.

RequirementsDownloadFile NameHash SHA-256
Requirements
ZIP File Support, C++ 98
Download
Demo Expert
File Name
Demo Expert.zip
Hash SHA-256
ba847903 1cc79fd0 89376a37 9cf4e5b6 ea888f78 55daba28 d3390654 07ba34ee