A regular expression ("regex") is used to define lexer patterns in token,
pattern, and drop statements.
A regular expression begins and ends with a / character.
Example:
/#.*$/
Regular expressions can include many special characters:
- . character matches any input character other than a newline.
- * character matches zero or more of the previous regex element.
- + character matches one or more of the previous regex element.
- ? character matches 0 or 1 of the previous regex element.
- [ character begins a character class.
- ( character begins a matching group.
- { character begins a count qualifier.
- \ character escapes the following character and changes its meaning:
  - \a sequence matches an ASCII bell character (0x07).
  - \b sequence matches an ASCII backspace character (0x08).
  - \d sequence matches any character 0 through 9.
  - \f sequence matches an ASCII form feed character (0x0C).
  - \n sequence matches an ASCII new line character (0x0A).
  - \r sequence matches an ASCII carriage return character (0x0D).
  - \s sequence matches a space, a horizontal tab \t, a carriage return \r, a form feed \f, or a vertical tab \v character.
  - \t sequence matches an ASCII tab character (0x09).
  - \v sequence matches an ASCII vertical tab character (0x0B).
- | character creates an alternate match.
- Any other character just matches itself in the input stream.
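The basic operators in the list above can be tried out with any conventional regex engine. The sketch below uses Python's `re` module purely as a stand-in; this is an assumption for illustration, not the engine this tool uses, and some escapes differ (for instance, Python treats `\b` as a word boundary and its `\s` also matches a newline), so only the operators that behave the same way are shown. The helper name `matches_whole` is hypothetical.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# "." matches any single character except a newline.
dot_ok = matches_whole(r"a.c", "abc")
dot_newline = matches_whole(r"a.c", "a\nc")

# "*" allows zero or more repeats, "+" requires at least one,
# and "?" allows zero or one.
star_zero = matches_whole(r"ab*", "a")
plus_zero = matches_whole(r"ab+", "a")
optional_u = matches_whole(r"colou?r", "color")

# "\d" matches a single digit 0 through 9.
digits = matches_whole(r"\d+", "2024")

print(dot_ok, dot_newline, star_zero, plus_zero, optional_u, digits)
# prints: True False True False True True
```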
A character class consists of a list of character alternates or character
ranges that can be matched by the character class.
For example, [a-zA-Z_] matches any lowercase letter from a to z, any
uppercase letter from A to Z, or the underscore _ character.
A character class is negated if the first character after the [ is a ^
character.
In this case, the class matches exactly those characters that it would
otherwise not have matched.
For example, [^0-9] matches any character other than 0 through 9.
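As a quick check of the two character classes above, here is a minimal sketch using Python's `re` module. Python is only a stand-in chosen for illustration (an assumption, not the tool's own engine); the class syntax shown behaves the same way in most regex dialects. The variable names are hypothetical.

```python
import re

# A positive class: letters and underscore.
identifier_char = re.compile(r"[a-zA-Z_]")
# A negated class: anything other than 0 through 9.
non_digit = re.compile(r"[^0-9]")

# Test each character on its own against the class.
accepted = [c for c in "x9_Q-" if identifier_char.fullmatch(c)]
not_digits = [c for c in "a1b2" if non_digit.fullmatch(c)]

print(accepted)    # ['x', '_', 'Q']
print(not_digits)  # ['a', 'b']
```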
A matching group changes which part of the pattern a multiplicity specifier
applies to: the specifier applies to the whole group rather than only to the
element immediately before it.
For example, the pattern /foo+/ matches "foo" or "foooo", while the pattern
/(foo)+/ matches "foo" or "foofoofoo", but not "foooo".
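The difference between the two patterns can be verified with a short sketch. It uses Python's `re` module as a stand-in (an assumption for illustration; the tool's own engine is not being invoked), and the names `plus_on_char` and `plus_on_group` are hypothetical.

```python
import re

plus_on_char = re.compile(r"foo+")     # "+" applies to the last "o" only
plus_on_group = re.compile(r"(foo)+")  # "+" applies to the whole group

inputs = ["foo", "foooo", "foofoofoo"]

# Record whether each pattern matches the entire input string.
char_results = [plus_on_char.fullmatch(s) is not None for s in inputs]
group_results = [plus_on_group.fullmatch(s) is not None for s in inputs]

print(char_results)   # [True, True, False]
print(group_results)  # [True, False, True]
```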
A count qualifier in curly braces can be used to restrict the number of matches
of the preceding atom to an explicit minimum and maximum range.
For example, the pattern /\d{3}/ matches exactly 3 digits 0-9.
Both a minimum and maximum multiplicity count can be specified and separated by
a comma.
For example, /a{1,5}/ matches between 1 and 5 a characters.
Either the minimum or the maximum count can be omitted to remove the
corresponding restriction on the number of matches allowed.
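The count qualifier forms described above can be sketched as follows. Python's `re` module is used as a stand-in (an assumption for illustration, not the tool's own engine); it accepts the same `{n}`, `{min,max}`, `{min,}` and `{,max}` forms. The helper `full` and the pattern names are hypothetical.

```python
import re

exactly_three = re.compile(r"\d{3}")  # exactly 3 digits
one_to_five = re.compile(r"a{1,5}")   # between 1 and 5 "a" characters
at_least_two = re.compile(r"a{2,}")   # maximum omitted: 2 or more
at_most_two = re.compile(r"a{,2}")    # minimum omitted: 0 to 2

def full(pattern: re.Pattern, text: str) -> bool:
    """Return True if the compiled pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

print(full(exactly_three, "123"), full(exactly_three, "1234"))  # True False
print(full(one_to_five, "aaaaa"), full(one_to_five, "aaaaaa"))  # True False
print(full(at_least_two, "a"), full(at_least_two, "aaaaaaa"))   # False True
print(full(at_most_two, ""), full(at_most_two, "aaa"))          # True False
```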
An alternate match is created with the | character.
For example, the pattern /foo|bar/ matches either the sequence "foo" or the
sequence "bar".
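A short sketch of alternation, again using Python's `re` module as a stand-in (an assumption for illustration only). It also shows that | separates the whole sequences on either side of it, so the pattern means "foo" or "bar" rather than "fo", then "o" or "b", then "ar". The name `foo_or_bar` is hypothetical.

```python
import re

foo_or_bar = re.compile(r"foo|bar")

inputs = ["foo", "bar", "foobar", "fobar"]

# Record whether the pattern matches each entire input string.
results = [foo_or_bar.fullmatch(s) is not None for s in inputs]

print(results)  # [True, True, False, False]
```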