Qore Programming Language Reference Manual 2.0.0
Regular Expressions

Qore Regular Expression Introduction

Regular expression functionality in Qore is provided by PCRE, the Perl-Compatible Regular Expressions library.

Using this library, Qore implements regular expression pattern matching with a simple syntax and semantics similar to those of Perl 5. One difference to keep in mind is that backreferences in Qore are written as $1, $2, $3, etc., whereas Perl uses numbered backslashes (\1, \2, \3, etc.).

Examples:
# call process() if the string starts with an alphanumeric character
if (str =~ /^[[:alnum:]]/)
    process(str);

# example of using regular expressions in a switch statement
switch (str) {
    case /^[^[:alnum:]]/: return True;
    case /^[0-9]/: return False;
    default: throw "ERROR", sprintf("invalid string %y", str);
}

# regular expression substitution + ignore case & global options
str =~ s/abc/xyz/gi;

# prefix all non-alphanumeric characters with a backslash
str =~ s/([^[:alnum:]])/\\$1/g;

# regular expression substring extraction
*list l = (str =~ x/(?:(\w+):(\w+):)(\w+)/);

Qore Regular Expression Operators

The following is a list of operators based on regular expressions (or similar to regular expressions in the case of the transliteration operator).

Regular Expression Operators

Operator                                         Description
Regular Expression Match Operator (=~)           Returns True if the regular expression matches a string
Regular Expression No Match Operator (!~)        Returns True if the regular expression does not match a string
Regular Expression Substitution Operator         Substitutes text in a string based on matching a regular expression
Regular Expression Pattern Extraction Operator   Returns a list of substrings in a string based on matching patterns defined by a regular expression
Transliteration Operator                         Not a regular expression operator; transliterates one or more characters to other characters in a string
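
As a brief illustration of the operators listed above, the following sketch applies each of them to one string. It assumes the %new-style parse directive (as the other examples in this section appear to), and the variable names are purely illustrative.

```qore
%new-style

string str = "Hello, World";

# match operator: True because the string starts with an uppercase letter
bool starts_upper = (str =~ /^[A-Z]/);

# no-match operator: True because the string contains no digits
bool no_digits = (str !~ /[0-9]/);

# substitution operator: modifies str in place
str =~ s/World/Qore/;

# extraction operator: returns the captured substrings as a list,
# or no value if the pattern does not match
*list words = (str =~ x/(\w+), (\w+)/);

# transliteration operator: maps lowercase ASCII letters to uppercase in place
str =~ tr/a-z/A-Z/;

printf("%y %y %y %y\n", starts_upper, no_digits, words, str);
```

Note that the substitution and transliteration operators change the string on the left-hand side, while the match, no-match, and extraction operators leave it untouched.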

See the table below for valid regular expression options.

Qore Regular Expression Operator Options

Regular Expression Options

Option  Description
i       Ignores case when matching
m       Makes start-of-line (^) and end-of-line ($) match after and before any newline in the subject string, respectively
s       Makes a dot (.) match a newline character
x       Ignores whitespace characters in the pattern and enables comments prefixed by #
u       Extends POSIX character matching to Unicode characters
g       Makes substitutions or extractions global (only applicable with the substitution and extraction operators)
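
The effect of several of these options can be seen in the following minimal sketch. It assumes the %new-style parse directive; the subject string and variable names are made up for illustration.

```qore
%new-style

# subject string with an embedded newline
string text = "first line\nSecond Line";

# i: "second" matches "Second" regardless of case
bool with_i = (text =~ /second/i);

# m: ^ also matches directly after the embedded newline
bool with_m = (text =~ /^Second/m);

# without m, ^ matches only at the very start of the subject string
bool without_m = (text =~ /^Second/);

# s: the dot is allowed to match the newline between the two lines
bool with_s = (text =~ /line.Second/s);

# g and i combined: replace every occurrence of "line" in any case
text =~ s/line/row/gi;

printf("%y %y %y %y %y\n", with_i, with_m, without_m, with_s, text);
```

Options are written after the closing delimiter and can be combined, as in the last substitution.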

Qore Regular Expression Functions

The following is a list of functions providing regular expression functionality where the pattern may be given at run-time:

Regular Expression Functions

Function         Description
regex()          Returns True if the regular expression matches a string
regex_subst()    Substitutes a pattern in a string based on regular expressions and returns the new string
regex_extract()  Returns a list of substrings in a string based on matching patterns defined by a regular expression
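
The functions above take the pattern as an ordinary string, so it can be assembled at run time. The sketch below assumes the %new-style parse directive and uses the RE_Caseless and RE_Global option constants as the function-call counterparts of the i and g options; the pattern and input values are hypothetical.

```qore
%new-style

# a pattern held in a variable rather than written as a literal
string pattern = "^([[:alnum:]]+)@([[:alnum:]]+)";
string input = "User@example";

# regex(): True if the pattern matches
bool matched = regex(input, pattern);

# options are passed as an integer argument instead of trailing letters
bool matched_ci = regex(input, "^user@", RE_Caseless);

# regex_subst(): returns a new string; the input is not modified
string masked = regex_subst(input, "[aeiou]", "*", RE_Global | RE_Caseless);

# regex_extract(): returns the captured substrings, or no value if there is no match
*list parts = regex_extract(input, pattern);

printf("%y %y %y %y\n", matched, matched_ci, masked, parts);
```

Unlike the substitution operator, regex_subst() leaves its first argument unchanged and returns the result.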

Qore Regular Expression Escape Codes

Escape characters in the pattern string are processed by the PCRE library in much the same way that Perl 5 handles them.

Qore Regular Expression Replacement String Escape Codes

Regular expression substitution expressions have the following pattern:

  • s/<pattern>/<replacement>/

The escape codes in the following table are supported in the replacement string.

Regular Expression Replacement String Escape Codes

Escape            ASCII  Decimal  Octal  Hex  Description
\a                BEL    7        007    07   alarm or bell
\b                BS     8        010    08   backspace
\e                ESC    27       033    1B   escape character
\f                FF     12       014    0C   form feed
\n                LF     10       012    0A   line feed
\r                CR     13       015    0D   carriage return
\t                HT     9        011    09   horizontal tab
\v                VT     11       013    0B   vertical tab
\$                $      36       044    24   a literal dollar sign character
\\                \      92       134    5C   a literal backslash character
\[0-7][0-7][0-7]  -      -        -      -    the ASCII character represented by the octal code

Otherwise, any backslashes in the replacement string are copied literally to the output string.
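
A small sketch of two of these escape codes in a replacement string follows. It assumes the %new-style parse directive, and the sample strings are invented for illustration.

```qore
%new-style

# \t in the replacement string inserts a horizontal tab
string row = "name,price";
row =~ s/,/\t/g;

# \$ in the replacement string inserts a literal dollar sign
# rather than starting a backreference
string cost = "10 USD";
cost =~ s/USD/\$/;

printf("%y\n%y\n", row, cost);
```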

Qore Regular Expression Backreferences

Qore uses $num for backreferences in regular expression substitution expressions. The first backreference is $1, the second $2, and so on.

Examples:
# prefix all non-alphanumeric characters with a backslash
str =~ s/([^[:alnum:]])/\\$1/g;
# remove parentheses from string at the beginning of the line
str =~ s/^\((.*)\)/$1/;
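
Backreferences can also reorder captured text. The following sketch assumes the %new-style parse directive and that regex_subst() accepts the same $num syntax in its replacement argument; the sample values are hypothetical.

```qore
%new-style

# swap two captured words: $1 is the first group, $2 the second
string name = "Doe, John";
name =~ s/^(\w+), (\w+)$/$2 $1/;

# the same idea with the pattern and replacement given as strings
string date = regex_subst("2024-01-15", "([0-9]+)-([0-9]+)-([0-9]+)", "$3.$2.$1");

printf("%y %y\n", name, date);
```

Groups are numbered by the position of their opening parenthesis in the pattern, counting from left to right.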