Before you can actually match a regular expression, you must compile it. This is not true compilation—it produces a special data structure, not machine instructions. But it is like ordinary compilation in that its purpose is to enable you to “execute” the pattern fast. (See Matching POSIX Regexps, for how to use the compiled regular expression for matching.)
There is a special data type for compiled regular expressions, regex_t:
This type of object holds a compiled regular expression. It is actually a structure. It has just one field that your programs should look at:
re_nsub
- This field holds the number of parenthetical subexpressions in the regular expression that was compiled.
There are several other fields, but we don't describe them here, because only the functions in the library should use them.
After you create a regex_t object, you can compile a regular expression into it by calling regcomp.
The function regcomp “compiles” a regular expression into a data structure that you can use with regexec to match against a string. The compiled regular expression format is designed for efficient matching. regcomp stores it into *compiled.
It's up to you to allocate an object of type regex_t and pass its address to regcomp.
The argument cflags lets you specify various options that control the syntax and semantics of regular expressions. See Flags for POSIX Regexps.
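As a minimal sketch of the calling convention described above: POSIX declares the function as int regcomp (regex_t *restrict compiled, const char *restrict pattern, int cflags). The helper name pattern_compiles is hypothetical, and the choice of REG_EXTENDED is only an assumption made for illustration.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether `pattern` compiles as a POSIX
   extended regular expression. */
bool pattern_compiles(const char *pattern)
{
    regex_t compiled;  /* storage allocated by the caller, here on the stack */

    /* regcomp fills in `compiled`; REG_EXTENDED is an illustrative choice. */
    int rc = regcomp(&compiled, pattern, REG_EXTENDED);
    if (rc != 0)
        return false;  /* on failure the contents of `compiled` must not be used */

    regfree(&compiled);  /* release the memory regcomp allocated */
    return true;
}
```

A call such as pattern_compiles("a(b|c)*d") should succeed, while an unbalanced pattern such as "a(b" should be rejected.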
If you use the flag REG_NOSUB, then regcomp omits from the compiled regular expression the information necessary to record how subexpressions actually match. In this case, you might as well pass 0 for the matchptr and nmatch arguments when you call regexec.
If you don't use REG_NOSUB, then the compiled regular expression does have the capacity to record how subexpressions match. Also, regcomp tells you how many subexpressions pattern has, by storing the number in compiled->re_nsub. You can use that value to decide how long an array to allocate to hold information about subexpression matches.
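The two cases above might be sketched as follows. Both helper names (alloc_match_array and matches_nosub) are hypothetical, and REG_EXTENDED is an assumed flag choice. The array is sized as re_nsub + 1 because regexec reports the match of the whole pattern in element 0 and the subexpressions after it.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: compile `pattern` and allocate a match array sized
   from re_nsub. Returns NULL on failure. On success the caller must call
   regfree on *compiled and free on the returned array. */
regmatch_t *alloc_match_array(regex_t *compiled, const char *pattern,
                              size_t *nmatch)
{
    if (regcomp(compiled, pattern, REG_EXTENDED) != 0)
        return NULL;

    /* One slot for the whole match plus one per parenthetical subexpression. */
    *nmatch = compiled->re_nsub + 1;

    regmatch_t *matches = malloc(*nmatch * sizeof *matches);
    if (matches == NULL)
        regfree(compiled);  /* do not leak the compiled pattern */
    return matches;
}

/* Hypothetical helper: with REG_NOSUB only a yes/no answer is needed, so
   nmatch is 0 and matchptr is NULL. Returns 1 on a match, 0 on no match,
   and -1 if the pattern does not compile. */
int matches_nosub(const char *pattern, const char *text)
{
    regex_t compiled;
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&compiled, text, 0, NULL, 0);
    regfree(&compiled);
    return rc == 0;
}
```

For the pattern "(a)(b(c))", re_nsub is 3, so the first helper allocates four regmatch_t elements.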
regcomp returns 0 if it succeeds in compiling the regular expression; otherwise, it returns a nonzero error code (see the table below). You can use regerror to produce an error message string describing the reason for a nonzero value; see Regexp Cleanup.
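One way to combine the return value check with regerror is sketched below. The helper name compile_or_report and the 256-byte buffer size are assumptions for illustration; regerror truncates the message if the buffer is too small.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` and, if that fails, print the
   message produced by regerror to stderr. Returns regcomp's result. */
int compile_or_report(regex_t *compiled, const char *pattern, int cflags)
{
    int rc = regcomp(compiled, pattern, cflags);
    if (rc != 0) {
        char message[256];  /* assumed size; longer messages are truncated */
        regerror(rc, compiled, message, sizeof message);
        fprintf(stderr, "regcomp failed for \"%s\": %s\n", pattern, message);
    }
    return rc;
}
```

The exact wording of the message depends on the C library and the locale, so programs should test the numeric code rather than the text.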
Here are the possible nonzero values that regcomp can return:
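REG_BADBR
- There was an invalid \{...\} construct in the regular expression. A valid construct contains either a single number, or two numbers in increasing order separated by a comma.
REG_BADPAT
- There was a syntax error in the regular expression.
REG_BADRPT
- A repetition operator such as ? or * appeared in a bad position, with no preceding subexpression to act on.
REG_ECOLLATE
- The regular expression referred to an invalid collating element, one not defined in the current locale for string collation.
REG_ECTYPE
- The regular expression referred to an invalid character class name.
REG_EESCAPE
- The regular expression ended with \.
REG_ESUBREG
- There was an invalid number in the \digit construct.
REG_EBRACK
- There were unbalanced square brackets in the regular expression.
REG_EPAREN
- An extended regular expression had unbalanced parentheses, or a basic regular expression had unbalanced \( and \).
REG_EBRACE
- The regular expression had unbalanced \{ and \}.
REG_ERANGE
- One of the endpoints in a range expression was invalid.
REG_ESPACE
- regcomp ran out of memory.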
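For logging or tests it can be handy to turn these codes into their symbolic names. The following is only a sketch: the function regcomp_error_name is hypothetical and not part of the library, and any code outside the list above falls through to "unknown".

```c
#include <regex.h>

/* Hypothetical helper: map a regcomp return value to its symbolic name. */
const char *regcomp_error_name(int code)
{
    switch (code) {
    case 0:            return "success";
    case REG_BADBR:    return "REG_BADBR";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_BADRPT:   return "REG_BADRPT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    default:           return "unknown";  /* not one of the codes listed above */
    }
}
```

Which code a given malformed pattern produces can differ between C library implementations, so the mapping is best used for diagnostics rather than for control flow.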