C/C++ KAST syntax reference
|In this topic:|
Drafting KAST expressions
A KAST expression describes one or more locations in the AST (abstract syntax tree) that you want your checker to target for syntax issues.
The starting point is a test case (the simpler the better) that contains the issue of interest.
The Checker Studio simplifies the drafting of KAST expressions by providing:
- a visual representation of your code snippet
- detailed KAST node type and dependency definitions in the form of context-sensitive help
- an easy way to test your KAST expression
For examples, see C/C++ KAST examples.
Each KAST expression starts with a double slash (//) followed by a sequence of steps separated by slash symbols. Each step specifies a subpattern to be applied to specific AST nodes.
You can use '*' instead of any type name.
When a checker is matched, the message is reported on the node that was matched with the last node in the described sequence.
This KAST expression contains one step, IfStmt that matches any AST node of type IfStmt. The full KAST expression can be read as "find all if statements".
The above KAST expression breaks down into two steps, IfStmt and Then::ExprStmt. The second step specifies a child name (Then) and a type of child node (ExprStmt). The KAST expression above can be read as "find all if statements, with expression statement and then branch".
When a KAST expression contains more than one step, every subsequent step has to specify a child name. Specifying the child name is required because a single AST node may have multiple children of the same type. An example of this is, an if statement in which both branches are compound statements. Consequently the type of child node cannot be used to uniquely identify the path to follow. Child names are unique and unambiguously specify direction of traversal.
There are two steps in this KAST expression: IfStmt and Else::Null. It can be read as "find all if statements without an else branch".
A step in a KAST expression can be more complex than just a restriction on the type of child node. The name of the child type can be followed by a sequence of predicates, which are conditions to be checked against the current AST node. A condition is enclosed in square brackets and can be either a boolean expression or a KAST subexpression:
//BinaryExpr [@Op = KTC_OPCODE_ADD]
There are two steps: BinaryExpr and [@Op = KTC_OPCODE_ADD].
The KAST expression is read as "find all binary expressions whose operation is +".
The second step specifies a condition ([@Op = KTC_OPCODE_ADD]) on an attribute of the BinaryExpr node -- the "at" symbol (@) is used to refer to the current node's attributes.
This expression has four steps to "find all calls to gets() function". The second step includes a nested KAST expression, with two additional steps as a condition.
See also Conditions.
There are several axis qualifiers:
means that node of type Type1 has a parent of type Type2
means that node of type Type1 has an ancestor of type Type2
means that node of type Type1 has a descendant of type Type2
means that 1) Type1 is a supertype for type Type2, or 2) node of type Type1 has a descendant of type Type2. In particular, this axis can be useful if '*' is used in place of Type1.
means that two nodes belong to the same list child, and the node whose type is Type2 follows the node of type Type1 in this list (immediately or transitively)
means that two nodes belong to the same list child, and the node whose type is Type2 immediately follows the node of type Type1 in this list
means that the child child_name of Type1 node is of type Type2. If there are no AST types with such a child, an error is reported
sibling_indexafter a child name
means that node of type Type1 is the sibling_index-th element of list child of node whose type is Type2
If an asterisk is specified between square brackets (instead of a number), the first matching element of the corresponding list child is taken. The expression above is matched if the function call has at least one argument of type 'ExprBinary'
To represent an absent separate or list child, the Null keyword should be used
Each element of a KAST expression can have zero or more conditions.
Each condition is a boolean expression and must be surrounded by square brackets. A path element is matched with an AST node if all the corresponding conditions are met.
Expressions can be of one of the following types:
- TREE (AST nodes)
- SEMA (semantic information associated with AST nodes)
BOOL, INT and DOUBLE values can be converted into each other; TREE and SEMA values can be converted to the BOOL type.
The following types of expressions are supported:
Path expression: Starts with a child qualifier: [Expr::BinaryExpr/Left::ConstExpr]
Or expression: Expression1 | Expression2
Not expression: not Expression
Function expression: Func(Expr1, Expr2, ...)
You can use both built-in functions and custom functions.
Dot expression: Expression.Func(...)
The right part of such expression must be a function call. The expression in the left part of the dot expression is treated as the first argument of the function in the right part: a.b(c) is equivalent to b(a, c).
Attribute expression: @attribute_name
Arithmetic expression: Expression1 [+, -, /, *] Expression2
Relational expression: Expression1 [>, <, >=, <=, =, !=] Expression2
Bracket expression: (Expression)
Constant expression: 'abc' , 2.0
There are a number of literal representations for integer constants representing operation types, class specifiers, etc.
Variable expression: $variable
The operations below are ordered by priority, with the highest priority at the top.
(), . not +, - >, >=, <, <= =, != |
Variables can make your KAST expressions more understandable and effective. Before using a variable, you should assign it a value:
[ $<varname> := <expression> ]
Here 'expression' can be any expression described in Conditions.
Now you can use this variable in conditions and other assignments associated with this expression element.
[$size1 := getTypeSize(Expr1::ExprIdent)] [$size2 := getTypeSize(Expr2::ConstExpr)] [$size1 + 1 > $size2 - 1]
We assigned a value to variables size1 and size2 and then used the variables in the expression.
KAST expressions and path conditions can contain iterative sequences. An iterative sequence is a subpath starting with a node type and ending with a qualifier.
Iterative sequences are surrounded by braces and can be placed between qualifiers and node type names:
An iterative sequence can be matched at the corresponding place zero or more times. The above example is equivalent to infinite number of expressions:
When you are not interested in the actual type of a node, you may use the '*' symbol instead of a type name. But sometimes it's better to use the 'short anonymous children' notation:
is equivalent to
For example, the following conditions are equivalent:
[isConstant(Expr1::*)], [isConstant(Expr1)], [(Expr1::*).isConstant()] and [Expr1.isConstant()]
Note that '*' cannot be omitted in all cases. You should use the full syntax in iterative sequences and for non-final elements of KAST expressions. Consider the following example.
cannot be replaced with
because square brackets after names are used for specifying sibling indices.
Checker Studio provides a list of built-in functions for C/C++. To view the list:
- Launch Checker Studio.
- Go to Help > Help topics > KAST Reference > C/C++ KAST builtin functions.
The built-in functions are listed alphabetically.
C/C++ custom functions
The process for creating KAST checkers with custom functions is outlined in Tutorial 3 - Creating a C/C++ KAST checker with custom functions.
KAST checkers that use a combination of ascendant/descendant (or parent/child) specifiers in a single KAST expression dramatically slow analysis speed.
The ascendant node triggers the "climb" up to the root of the AST, trying to find all definitions for a given name, while the descendant node triggers a full traversal of the subtree.
For fast analysis results, it's best to avoid using combination of ascendant/descendant or parent/child combinations.
For C/C++ KAST checkers, try writing custom functions that work with semantic information, because it's more effective than traversing the AST. For example, you can use the function 'ktc_sema_findAllByName(scope, name)' to get a list of all functions/variables/types with a given name that are visible at a certain point of a C/C++ program.