Regular Expression Syntax
A regular expression defines a pattern to be matched. A regular expression can be a simple string: for example, to find the word unhandled, you would use the regular expression unhandled.
Regular expressions can also match far more complex patterns. For example, you may want to match error messages that contain particular error codes. Suppose a program issues error messages formatted as "Error code: XX12345E", where XX12345 is the error code and the E suffix indicates a severe error, and you want to find all error messages with this severe-error suffix. The regular expression to do this would be:
Error code: .{7}E
This pattern would match any message containing "Error code: " followed by any seven characters and an E.
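The behavior of this pattern can be checked with a minimal sketch. It uses Python's re module purely for illustration (an assumption; adTempus itself uses the .NET engine, which treats this pattern the same way), and the sample messages are invented. The IGNORECASE flag mirrors the case-insensitive matching that adTempus applies.

```python
import re

# Pattern from the text: "Error code: ", any seven characters, then an E.
pattern = re.compile(r"Error code: .{7}E", re.IGNORECASE)

# Hypothetical log lines, invented for this example.
messages = [
    "Error code: XX12345E",  # severe-error suffix E
    "Error code: XX12345W",  # same code, different suffix
    "Job finished normally",
]

# Keep only the messages that contain a match anywhere in the line.
severe = [m for m in messages if pattern.search(m)]
print(severe)  # ['Error code: XX12345E']
```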
The following sections provide an overview of the syntax of regular expressions.
adTempus uses the Microsoft .NET regular expression syntax. Matching is always case-insensitive.
General Rules
Regular expressions use the following special characters:
. [ ] - ^ * ? + $ | ( ) { } \
Any characters other than these special characters match themselves. For example, shakespeare matches shakespeare.
To match a special character, you must precede the special character with a backslash (\). For example, if you want to match a string that contains the sentence "Continue with deletion?", you cannot use Continue with deletion? as your regular expression, because the question mark has a special meaning in regular expressions. Instead, you would have to use Continue with deletion\?. The backslash indicates that the following character should be treated as a plain character, not a special character.
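The difference the backslash makes can be seen in a short sketch. Python's re module is used here only as a stand-in for the .NET engine (an assumption), and the variable names are hypothetical.

```python
import re

text = "Continue with deletion?"

# Unescaped: "n?" means "an optional n", so the literal "?" is never matched.
unescaped = re.compile(r"Continue with deletion?", re.IGNORECASE)
# Escaped: "\?" matches a literal question mark.
escaped = re.compile(r"Continue with deletion\?", re.IGNORECASE)

unescaped_match = unescaped.search(text).group()
escaped_match = escaped.search(text).group()

print(unescaped_match)  # Continue with deletion
print(escaped_match)    # Continue with deletion?
```

The unescaped pattern would also match the truncated text "Continue with deletio", which is rarely what is intended.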
Regular expressions used in adTempus are not case sensitive.
Single-Character Pattern Matching
The . (period) character matches any single character. For example, c.r would match car or cur.
Lists
The [ and ] characters (left and right square brackets) are used to denote a list of valid characters. For example, d[ou]g would match dog or dug but not dig.
The - character is used within brackets to specify a range of valid values. For example, d[a-z]g would match any three-character sequence that begins with d, ends with g, and has a single letter from a through z (inclusive) in between.
The ^ character is used within brackets to specify characters that should not be matched. For example, d[^oi]g would match any three-character sequence beginning with d and ending with g except dog and dig.
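The three list forms can be compared side by side in a small sketch. It assumes Python's re module as a stand-in for the .NET engine, and the helper whole_word_matches is a hypothetical name. It uses fullmatch so that each sample word is tested in isolation rather than searched for inside longer text.

```python
import re

def whole_word_matches(pattern, word):
    """Return True if the entire word matches the pattern, ignoring case."""
    return re.fullmatch(pattern, word, re.IGNORECASE) is not None

words = ["dog", "dug", "dig", "d9g"]

in_list = [w for w in words if whole_word_matches(r"d[ou]g", w)]
in_range = [w for w in words if whole_word_matches(r"d[a-z]g", w)]
negated = [w for w in words if whole_word_matches(r"d[^oi]g", w)]

print(in_list)   # ['dog', 'dug']
print(in_range)  # ['dog', 'dug', 'dig']
print(negated)   # ['dug', 'd9g']
```

Note that the negated list accepts any character other than o and i, including digits.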
Repetition
The * (asterisk) character is used to match zero or more occurrences of the previous regular expression. For example, b[a-z]*e would match the letters b and e, with zero or more letters in between (be, bare, bore, etc.).
The + (plus) character is the same as *, but requires at least one occurrence of the pattern. For example, the expression b[a-z]+e would match the letters b and e, with one or more letters in between (bare, bore, etc.).
The ? (question mark) character indicates that the preceding regular expression is optional. For example, ca?r would match cr or car but nothing else.
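A brief sketch of the *, + and ? operators, using the examples above. Python's re module is assumed here as an illustration only (adTempus uses the .NET engine), and whole_match is a hypothetical helper that tests an entire word.

```python
import re

def whole_match(pattern, word):
    """Return True if the entire word matches the pattern, ignoring case."""
    return re.fullmatch(pattern, word, re.IGNORECASE) is not None

b_words = ["be", "bare", "bore", "b2e"]

# * allows zero letters between b and e, so "be" qualifies.
star = [w for w in b_words if whole_match(r"b[a-z]*e", w)]
# + needs at least one letter between b and e, so "be" does not.
plus = [w for w in b_words if whole_match(r"b[a-z]+e", w)]
# ? makes the single preceding "a" optional.
optional = [w for w in ["cr", "car", "caar"] if whole_match(r"ca?r", w)]

print(star)      # ['be', 'bare', 'bore']
print(plus)      # ['bare', 'bore']
print(optional)  # ['cr', 'car']
```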
The {} characters can be used to specify repetition of the preceding regular expression. The repetition can be specified using one of the following syntaxes:
- {count}: matches exactly count occurrences of the preceding expression
- {min,}: matches min or more occurrences of the preceding expression
- {min,max}: matches at least min but no more than max occurrences of the preceding expression
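The three brace forms can be sketched as follows, applied to runs of the letter a. This is an illustration in Python's re module (an assumption; the brace syntax is the same in the .NET engine), with hypothetical sample strings and helper name.

```python
import re

def whole_match(pattern, text):
    """Return True if the entire text matches the pattern, ignoring case."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None

samples = ["a", "aa", "aaa", "aaaa", "aaaaa"]

exactly_three = [s for s in samples if whole_match(r"a{3}", s)]
two_or_more = [s for s in samples if whole_match(r"a{2,}", s)]
two_to_four = [s for s in samples if whole_match(r"a{2,4}", s)]

print(exactly_three)  # ['aaa']
print(two_or_more)    # ['aa', 'aaa', 'aaaa', 'aaaaa']
print(two_to_four)    # ['aa', 'aaa', 'aaaa']
```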
Grouping
Parentheses () can be used to group characters together into a subexpression, so that an operator such as + applies to the whole group rather than to a single character. For example, the regular expression X12+ matches "X1" followed by one or more 2s. X(12)+, on the other hand, matches "X" followed by one or more "12"s.
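The effect of grouping on the + operator can be checked with a short sketch. Python's re module is assumed as a stand-in for the .NET engine, and the sample strings are invented.

```python
import re

def whole_match(pattern, text):
    """Return True if the entire text matches the pattern, ignoring case."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None

samples = ["X12", "X1222", "X1212"]

# Without parentheses, + repeats only the final "2".
ungrouped = [s for s in samples if whole_match(r"X12+", s)]
# With parentheses, + repeats the whole "12" group.
grouped = [s for s in samples if whole_match(r"X(12)+", s)]

print(ungrouped)  # ['X12', 'X1222']
print(grouped)    # ['X12', 'X1212']
```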
Alternation
The | (vertical bar) character denotes alternatives between regular expressions. When placed between two regular expressions, the alternation character means that the pattern will match any string that matches either of the expressions. For example, (Error|Warning) Message would match the strings "Error Message" or "Warning Message".
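The parentheses in this example matter, as the following sketch suggests. It uses Python's re module for illustration only (an assumption; alternation has the same low precedence in the .NET engine), and the sample strings are invented.

```python
import re

# Grouped: "Error" or "Warning", followed by " Message".
grouped = re.compile(r"(Error|Warning) Message", re.IGNORECASE)
# Ungrouped: either "Error" alone, or the full text "Warning Message".
ungrouped = re.compile(r"Error|Warning Message", re.IGNORECASE)

samples = ["Error Message", "warning message", "Info Message", "Error report"]

grouped_hits = [s for s in samples if grouped.search(s)]
ungrouped_hits = [s for s in samples if ungrouped.search(s)]

print(grouped_hits)    # ['Error Message', 'warning message']
print(ungrouped_hits)  # ['Error Message', 'warning message', 'Error report']
```

Without the parentheses, the alternation splits the entire expression in two, so any text containing "Error" is matched.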