C/C++ KAST syntax reference

In this topic:
Drafting KAST expressions Syntax Qualifiers Conditions Variables Other extensions Built-in functions C/C++ custom functions Checker performance

Drafting KAST expressions

A KAST expression describes one or more locations in the AST (abstract syntax tree) that you want your checker to target for syntax issues.

The starting point is a test case (the simpler the better) that contains the issue of interest.

The Checker Studio simplifies the drafting of KAST expressions by providing:

a visual representation of your code snippet
detailed KAST node type and dependency definitions in the form of context-sensitive help
an easy way to test your KAST expression

For examples, see C/C++ KAST examples.

Syntax

Each KAST expression starts with a double slash (//) followed by a sequence of steps separated by slash symbols. Each step specifies a subpattern to be applied to specific AST nodes.

You can use '*' instead of any type name.

When a checker is matched, the message is reported on the node that was matched with the last node in the described sequence.

For example:

//IfStmt

This KAST expression contains one step, IfStmt that matches any AST node of type IfStmt. The full KAST expression can be read as "find all if statements".

//IfStmt[Then::ExprStmt]

The above KAST expression breaks down into two steps, IfStmt and Then::ExprStmt. The second step specifies a child name (Then) and a type of child node (ExprStmt). The KAST expression above can be read as "find all if statements, with expression statement and then branch".

When a KAST expression contains more than one step, every subsequent step has to specify a child name. Specifying the child name is required because a single AST node may have multiple children of the same type. An example of this is, an if statement in which both branches are compound statements. Consequently the type of child node cannot be used to uniquely identify the path to follow. Child names are unique and unambiguously specify direction of traversal.

//IfStmt[Else::Null]

There are two steps in this KAST expression: IfStmt and Else::Null. It can be read as "find all if statements without an else branch".

Conditions

A step in a KAST expression can be more complex than just a restriction on the type of child node. The name of the child type can be followed by a sequence of predicates, which are conditions to be checked against the current AST node. A condition is enclosed in square brackets and can be either a boolean expression or a KAST subexpression:

//BinaryExpr [@Op = KTC_OPCODE_ADD]

There are two steps: BinaryExpr and [@Op = KTC_OPCODE_ADD].

The KAST expression is read as "find all binary expressions whose operation is +".

The second step specifies a condition ([@Op = KTC_OPCODE_ADD]) on an attribute of the BinaryExpr node -- the "at" symbol (@) is used to refer to the current node's attributes.

//CallExpr/Func::IdExpr[Name::Name[@Id='gets']]

This expression has four steps to "find all calls to gets() function". The second step includes a nested KAST expression, with two additional steps as a condition.

Qualifiers

There are several axis qualifiers:

parent

Type1/parent::Type2

means that node of type Type1 has a parent of type Type2

ancestor

Type1/ancestor::Type2

means that node of type Type1 has an ancestor of type Type2

descendant

Type1/descendant::Type2

means that node of type Type1 has a descendant of type Type2

descendant-or-self

Type1/descendant-or-self::Type2

means that 1) Type1 is a supertype for type Type2, or 2) node of type Type1 has a descendant of type Type2. In particular, this axis can be useful if '*' is used in place of Type1.

following-sibling

Type1/following-sibling::Type2

means that two nodes belong to the same list child, and the node whose type is Type2 follows the node of type Type1 in this list (immediately or transitively)

next-sibling

Type1/next-sibling::Type2

means that two nodes belong to the same list child, and the node whose type is Type2 immediately follows the node of type Type1 in this list

child_name

Type1/<child_name>::Type2

means that the child child_name of Type1 node is of type Type2. If there are no AST types with such a child, an error is reported

sibling_indexafter a child name

Type1/<child name>[<sibling_index>]::Type2

means that node of type Type1 is the sibling_index-th element of list child of node whose type is Type2

sibling qualifier

//CallExpr/Exprs[*]::BinaryExpr

If an asterisk is specified between square brackets (instead of a number), the first matching element of the corresponding list child is taken. The expression above is matched if the function call has at least one argument of type 'ExprBinary'

Null

//CallExpr/Args::Null

To represent an absent separate or list child, the Null keyword should be used

Conditions

Each element of a KAST expression can have zero or more conditions.

Each condition is a boolean expression and must be surrounded by square brackets. A path element is matched with an AST node if all the corresponding conditions are met.

Expressions can be of one of the following types:

BOOL
INT
DOUBLE
TREE (AST nodes)
SEMA (semantic information associated with AST nodes)
STRING

BOOL, INT and DOUBLE values can be converted into each other; TREE and SEMA values can be converted to the BOOL type.

The following types of expressions are supported:

Path expression: Starts with a child qualifier: [Expr::BinaryExpr/Left::ConstExpr]

When a path expression constitutes the whole KAST pattern, it starts with '//' followed by a node type name. This means that corresponding subtrees are matched anywhere in the AST. On the other hand, when a path expression is one of the conditions or a part of a condition in a KAST expression, it starts with a child name or any other axis qualifier instead. This means that the path must start from the node this expression is related to.

Or expression: Expression1 | Expression2

Not expression: not Expression

Function expression: Func(Expr1, Expr2, ...)

You can use both built-in functions and custom functions.

See C/C++ custom functions.

Dot expression: Expression.Func(...)

The right part of such expression must be a function call. The expression in the left part of the dot expression is treated as the first argument of the function in the right part: a.b(c) is equivalent to b(a, c).

Attribute expression: @attribute_name

Arithmetic expression: Expression1 [+, -, /, *] Expression2

Relational expression: Expression1 [>, <, >=, <=, =, !=] Expression2

Bracket expression: (Expression)

Constant expression: 'abc' , 2.0

There are a number of literal representations for integer constants representing operation types, class specifiers, etc.

Variable expression: $variable

See Variables.

Supported operations

The operations below are ordered by priority, with the highest priority at the top.

(), . not +, - >, >=, <, <= =, != |

Variables

Variables can make your KAST expressions more understandable and effective. Before using a variable, you should assign it a value:

[ $<varname> := <expression> ]

Here 'expression' can be any expression described in Conditions.

Now you can use this variable in conditions and other assignments associated with this expression element.

Example:

[$size1 := getTypeSize(Expr1::ExprIdent)] [$size2 := getTypeSize(Expr2::ConstExpr)] [$size1 + 1 > $size2 - 1]

We assigned a value to variables size1 and size2 and then used the variables in the expression.

Other extensions

KAST expressions and path conditions can contain iterative sequences. An iterative sequence is a subpath starting with a node type and ending with a qualifier.

Iterative sequences are surrounded by braces and can be placed between qualifiers and node type names:

//BinaryExpr/Left::{BinaryExpr/Right::}ConstExpr

An iterative sequence can be matched at the corresponding place zero or more times. The above example is equivalent to infinite number of expressions:

//BinaryExpr/Left::ConstExpr //BinaryExpr/Left::ConstExpr/Right::ConstExpr //BinaryExpr/Left::ConstExpr/Right::ConstExpr/Right::ConstExpr

When you are not interested in the actual type of a node, you may use the '*' symbol instead of a type name. But sometimes it's better to use the 'short anonymous children' notation:

<Child name>

is equivalent to

<Child name>::*

For example, the following conditions are equivalent:

[isConstant(Expr1::*)], [isConstant(Expr1)], [(Expr1::*).isConstant()] and [Expr1.isConstant()]

Note that '*' cannot be omitted in all cases. You should use the full syntax in iterative sequences and for non-final elements of KAST expressions. Consider the following example.

This expression:

//BinaryExpr/Right::*[isConstant()]

cannot be replaced with

//BinaryExpr/Right[isConstant()]

because square brackets after names are used for specifying sibling indices.

Built-in functions

Checker Studio provides a list of built-in functions for C/C++. To view the list:

Launch Checker Studio.
Go to Help > Help topics > KAST Reference > C/C++ KAST builtin functions.
The built-in functions are listed alphabetically.

The first argument of a function can be specified using 'dot' notation. See Conditions.

Related concepts

See Conditions.

C/C++ custom functions

The process for creating KAST checkers with custom functions is outlined in Tutorial 3 - Creating a C/C++ KAST checker with custom functions.

Checker performance

KAST checkers that use a combination of ascendant/descendant (or parent/child) specifiers in a single KAST expression dramatically slow analysis speed.

The ascendant node triggers the "climb" up to the root of the AST, trying to find all definitions for a given name, while the descendant node triggers a full traversal of the subtree.

For fast analysis results, it's best to avoid using combination of ascendant/descendant or parent/child combinations.

For C/C++ KAST checkers, try writing custom functions that work with semantic information, because it's more effective than traversing the AST. For example, you can use the function 'ktc_sema_findAllByName(scope, name)' to get a list of all functions/variables/types with a given name that are visible at a certain point of a C/C++ program.