IPB
>  Man Pages > Linux > openSUSE 10.2 > Section 3p > regexec man page

regexec man page

Section 3p - openSUSE 10.2 Man Pages

Other operating system man pages available here


Advanced Search

Hopefully, this page is exactly what you are looking for, but if not, you can always find further assistance on Unix/Linux Forum!


REGCOMP(P)                 POSIX Programmer's Manual                REGCOMP(P)



NAME
       regcomp, regerror, regexec, regfree - regular expression matching

SYNOPSIS
       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict pattern,
              int cflags);
       size_t regerror(int errcode, const regex_t *restrict preg,
              char *restrict errbuf, size_t errbuf_size);
       int regexec(const regex_t *restrict preg, const char *restrict string,
              size_t nmatch, regmatch_t pmatch[restrict], int eflags);
       void regfree(regex_t *preg);


DESCRIPTION
       These  functions  interpret  basic  and extended regular expressions as
       described in the Base Definitions volume of IEEE Std 1003.1-2001, Chap-
       ter 9, Regular Expressions.

       The regex_t structure is defined in <regex.h> and contains at least the
       following member:

          Member Type  Member Name  Description
          size_t       re_nsub      Number of parenthesized subexpressions.

       The regmatch_t structure is defined in <regex.h> and contains at  least
       the following members:

           Member Type Member Name Description
           regoff_t    rm_so       Byte offset from start of string to
                                   start of substring.
           regoff_t    rm_eo       Byte offset from start of string of the
                                   first character after the end of sub-
                                   string.

       The regcomp() function shall compile the regular  expression  contained
       in  the string pointed to by the pattern argument and place the results
       in the structure pointed to by preg.  The cflags argument is  the  bit-
       wise-inclusive  OR  of  zero  or more of the following flags, which are
       defined in the <regex.h> header:

       REG_EXTENDED
              Use Extended Regular Expressions.

       REG_ICASE
              Ignore case in  match.  (See  the  Base  Definitions  volume  of
              IEEE Std 1003.1-2001, Chapter 9, Regular Expressions.)

       REG_NOSUB
              Report only success/fail in regexec().

       REG_NEWLINE
              Change the handling of <newline>s, as described in the text.


       The  default  regular  expression  type  for pattern is a Basic Regular
       Expression. The application can specify  Extended  Regular  Expressions
       using the REG_EXTENDED cflags flag.

       If  the  REG_NOSUB flag was not set in cflags, then regcomp() shall set
       re_nsub to the number of  parenthesized  subexpressions  (delimited  by
       "\(\)" in basic regular expressions or "()" in extended regular expres-
       sions) found in pattern.

       The regexec() function compares the null-terminated string specified by
       string  with the compiled regular expression preg initialized by a pre-
       vious call to regcomp().  If it finds a match, regexec()  shall  return
       0; otherwise, it shall return non-zero indicating either no match or an
       error. The eflags argument is the bitwise-inclusive OR of zero or  more
       of the following flags, which are defined in the <regex.h> header:

       REG_NOTBOL
              The  first  character  of the string pointed to by string is not
              the beginning of the line. Therefore, the circumflex character (
              '^'  ),  when  taken as a special character, shall not match the
              beginning of string.

       REG_NOTEOL
              The last character of the string pointed to by string is not the
              end  of the line. Therefore, the dollar sign ( '$' ), when taken
              as a special character, shall not match the end of string.


       If nmatch is 0 or REG_NOSUB was set in  the  cflags  argument  to  reg-
       comp(), then regexec() shall ignore the pmatch argument. Otherwise, the
       application shall ensure that the pmatch argument points  to  an  array
       with at least nmatch elements, and regexec() shall fill in the elements
       of that array with offsets of the substrings of string that  correspond
       to the parenthesized subexpressions of pattern: pmatch[ i]. rm_so shall
       be the byte offset of the beginning and pmatch[ i]. rm_eo shall be  one
       greater  than the byte offset of the end of substring i. (Subexpression
       i begins at the ith matched open parenthesis, counting from 1.) Offsets
       in pmatch[0] identify the substring that corresponds to the entire reg-
       ular expression. Unused elements of  pmatch  up  to  pmatch[  nmatch-1]
       shall  be  filled with -1. If there are more than nmatch subexpressions
       in pattern ( pattern itself counts as a subexpression), then  regexec()
       shall  still  do the match, but shall record only the first nmatch sub-
       strings.

       When matching a basic or extended regular expression, any given  paren-
       thesized  subexpression  of  pattern  might participate in the match of
       several different substrings of string, or it might not match any  sub-
       string  even  though  the  pattern  as a whole did match. The following
       rules shall be used to determine which substrings to report  in  pmatch
       when matching regular expressions:

        1. If  subexpression i in a regular expression is not contained within
           another subexpression, and it participated  in  the  match  several
           times,  then  the byte offsets in pmatch[ i] shall delimit the last
           such match.


        2. If subexpression i is not contained within  another  subexpression,
           and  it  did  not participate in an otherwise successful match, the
           byte offsets in pmatch[ i] shall be -1. A  subexpression  does  not
           participate  in  the  match when: '*' or "\{\}" appears immediately
           after the subexpression in a basic regular expression, or '*' , '?'
           ,  or  "{}"  appears  immediately  after  the  subexpression  in an
           extended regular expression, and the subexpression  did  not  match
           (matched 0 times)

       or: '|' is used in an extended regular expression to select this subex-
       pression or another, and the other subexpression matched.


        3. If subexpression i is contained within another subexpression j, and
           i is not contained within any other subexpression that is contained
           within j, and a match of subexpression j is reported in pmatch[ j],
           then  the match or non-match of subexpression i reported in pmatch[
           i] shall be as described in 1. and 2.  above, but within  the  sub-
           string  reported  in  pmatch[  j] rather than the whole string. The
           offsets in pmatch[ i] are still relative to the start of string.


        4. If subexpression i is contained in subexpression j,  and  the  byte
           offsets in pmatch[ j] are -1, then the pointers in pmatch[ i] shall
           also be -1.


        5. If subexpression i matched a zero-length  string,  then  both  byte
           offsets  in pmatch[ i] shall be the byte offset of the character or
           null terminator immediately following the zero-length string.


       If, when regexec() is called, the locale is  different  from  when  the
       regular expression was compiled, the result is undefined.

       If  REG_NEWLINE  is  not  set in cflags, then a <newline> in pattern or
       string shall be treated as an ordinary  character.  If  REG_NEWLINE  is
       set, then <newline> shall be treated as an ordinary character except as
       follows:

        1. A <newline> in string shall not be matched by a  period  outside  a
           bracket  expression  or by any form of a non-matching list (see the
           Base Definitions volume of IEEE Std 1003.1-2001, Chapter 9, Regular
           Expressions).


        2. A  circumflex  (  '^' ) in pattern, when used to specify expression
           anchoring (see the Base Definitions volume of IEEE Std 1003.1-2001,
           Section  9.3.8,  BRE  Expression  Anchoring), shall match the zero-
           length string immediately after a <newline> in  string,  regardless
           of the setting of REG_NOTBOL.


        3. A  dollar  sign ( '$' ) in pattern, when used to specify expression
           anchoring, shall match the zero-length string immediately before  a
           <newline> in string, regardless of the setting of REG_NOTEOL.


       The  regfree() function frees any memory allocated by regcomp() associ-
       ated with preg.

       The following constants are defined as error return values:

       REG_NOMATCH
              regexec() failed to match.

       REG_BADPAT
              Invalid regular expression.

       REG_ECOLLATE
              Invalid collating element referenced.

       REG_ECTYPE
              Invalid character class type referenced.

       REG_EESCAPE
              Trailing '\' in pattern.

       REG_ESUBREG
              Number in "\digit" invalid or in error.

       REG_EBRACK
              "[]" imbalance.

       REG_EPAREN
              "\(\)" or "()" imbalance.

       REG_EBRACE
              "\{\}" imbalance.

       REG_BADBR
              Content of "\{\}" invalid: not a number, number too large,  more
              than two numbers, first larger than second.

       REG_ERANGE
              Invalid endpoint in range expression.

       REG_ESPACE
              Out of memory.

       REG_BADRPT
              '?' , '*' , or '+' not preceded by valid regular expression.


       The regerror() function provides a mapping from error codes returned by
       regcomp() and regexec() to unspecified printable strings. It  generates
       a  string corresponding to the value of the errcode argument, which the
       application shall ensure is the last non-zero value  returned  by  reg-
       comp()  or  regexec()  with  the given value of preg. If errcode is not
       such a value, the content of the generated string is unspecified.

       If preg is a null pointer, but errcode is a value returned by a  previ-
       ous  call  to regexec() or regcomp(), the regerror() still generates an
       error string corresponding to the value of errcode, but it might not be
       as detailed under some implementations.

       If the errbuf_size argument is not 0, regerror() shall place the gener-
       ated string into the buffer of size errbuf_size  bytes  pointed  to  by
       errbuf.  If  the  string (including the terminating null) cannot fit in
       the buffer, regerror() shall truncate the string and null-terminate the
       result.

       If  errbuf_size  is 0, regerror() shall ignore the errbuf argument, and
       return the size of the buffer needed to hold the generated string.

       If the preg argument to regexec() or regfree() is not a compiled  regu-
       lar  expression  returned by regcomp(), the result is undefined. A preg
       is no longer treated as a compiled regular expression after it is given
       to regfree().

RETURN VALUE
       Upon successful completion, the regcomp() function shall return 0. Oth-
       erwise, it shall  return  an  integer  value  indicating  an  error  as
       described in <regex.h>, and the content of preg is undefined. If a code
       is returned, the interpretation shall be as given in <regex.h>.

       If regcomp() detects an invalid RE, it may return REG_BADPAT, or it may
       return  one of the error codes that more precisely describes the error.

       Upon successful completion, the regexec() function shall return 0. Oth-
       erwise, it shall return REG_NOMATCH to indicate no match.

       Upon  successful  completion,  the regerror() function shall return the
       number of bytes needed to hold the entire generated  string,  including
       the  null termination. If the return value is greater than errbuf_size,
       the string returned in the buffer pointed to by errbuf has  been  trun-
       cated.

       The regfree() function shall not return a value.

ERRORS
       No errors are defined.

       The following sections are informative.

EXAMPLES
              #include <regex.h>


              /*
               * Match string against the extended regular expression in
               * pattern, treating errors as no match.
               *
               * Return 1 for match, 0 for no match.
               */


              int
              match(const char *string, char *pattern)
              {
                  int    status;
                  regex_t    re;


                  if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                      return(0);      /* Report error. */
                  }
                  status = regexec(&re, string, (size_t) 0, NULL, 0);
                  regfree(&re);
                  if (status != 0) {
                      return(0);      /* Report error. */
                  }
                  return(1);
              }

       The  following  demonstrates how the REG_NOTBOL flag could be used with
       regexec() to find all substrings in a line that match  a  pattern  sup-
       plied  by  a  user.  (For  simplicity of the example, very little error
       checking is done.)


              (void) regcomp (&re, pattern, 0);
              /* This call to regexec() finds the first match on the line. */
              error = regexec (&re, &buffer[0], 1, &pm, 0);
              while (error == 0) {  /* While matches found. */
                  /* Substring found between pm.rm_so and pm.rm_eo. */
                  /* This call to regexec() finds the next match. */
                  error = regexec (&re, buffer + pm.rm_eo, 1, &pm, REG_NOTBOL);
              }

APPLICATION USAGE
       An application could use:


              regerror(code,preg,(char *)NULL,(size_t)0)

       to find out how big a buffer is needed for the generated  string,  mal-
       loc()  a  buffer  to hold the string, and then call regerror() again to
       get the string. Alternatively, it could allocate a fixed, static buffer
       that is big enough to hold most strings, and then use malloc() to allo-
       cate a larger buffer if it finds that this is too small.

       To match a pattern as described in the Shell and  Utilities  volume  of
       IEEE Std 1003.1-2001,  Section 2.13, Pattern Matching Notation, use the
       fnmatch() function.

RATIONALE
       The regexec() function must fill in  all  nmatch  elements  of  pmatch,
       where  nmatch  and pmatch are supplied by the application, even if some
       elements of pmatch do not correspond to subexpressions in pattern.  The
       application  writer  should  note  that there is probably no reason for
       using a value of nmatch that is larger than preg-> re_nsub+1.

       The REG_NEWLINE flag supports a use of RE matching that  is  needed  in
       some  applications  like  text  editors. In such applications, the user
       supplies an RE asking the application to find a line that  matches  the
       given  expression.  An anchor in such an RE anchors at the beginning or
       end of any line. Such an application  can  pass  a  sequence  of  <new-
       line>-separated  lines to regexec() as a single long string and specify
       REG_NEWLINE to regcomp() to get the desired behavior.  The  application
       must  ensure  that  there  are  no explicit <newline>s in pattern if it
       wants to ensure that any match occurs entirely within a single line.

       The REG_NEWLINE flag affects the behavior of regexec(), but  it  is  in
       the  cflags  parameter to regcomp() to allow flexibility of implementa-
       tion. Some implementations will want to generate the same  compiled  RE
       in  regcomp()  regardless  of  the  setting  of  REG_NEWLINE  and  have
       regexec() handle anchors differently based on the setting of the  flag.
       Other implementations will generate different compiled REs based on the
       REG_NEWLINE.

       The REG_ICASE flag supports the operations taken by the grep -i  option
       and  the  historical implementations of ex and vi.  Including this flag
       will make it easier for application code to be written  that  does  the
       same thing as these utilities.

       The  substrings reported in pmatch[] are defined using offsets from the
       start of the string rather than pointers. Since this is  a  new  inter-
       face, there should be no impact on historical implementations or appli-
       cations, and offsets should be just as easy to  use  as  pointers.  The
       change to offsets was made to facilitate future extensions in which the
       string to be searched is presented to regexec() in blocks,  allowing  a
       string to be searched that is not all in memory at once.

       The  type  regoff_t is used for the elements of pmatch[] to ensure that
       the application can represent either the largest possible array in mem-
       ory (important for an application conforming to the Shell and Utilities
       volume of IEEE Std 1003.1-2001) or the largest possible file (important
       for  an  application  using  the  extension where a file is searched in
       chunks).

       The standard developers rejected the inclusion of a  regsub()  function
       that  would  be used to do substitutions for a matched RE. While such a
       routine would be useful to some applications, its utility would be much
       more limited than the matching function described here. Both RE parsing
       and substitution are possible to implement without support  other  than
       that  required by the ISO C standard, but matching is much more complex
       than substituting.  The only difficult part of substitution, given  the
       information  supplied  by regexec(), is finding the next character in a
       string when there can be multi-byte characters. That is a  much  larger
       issue, and one that needs a more general solution.

       The errno variable has not been used for error returns to avoid filling
       the errno name space for this feature.

       The interface is defined so that the matched substrings rm_sp and rm_ep
       are  in  a  separate  regmatch_t  structure instead of in regex_t. This
       allows a single compiled RE to be used simultaneously in  several  con-
       texts;  in main() and a signal handler, perhaps, or in multiple threads
       of lightweight processes. (The preg argument to regexec()  is  declared
       with  type  const,  so  the  implementation is not permitted to use the
       structure to store intermediate results.) It also allows an application
       to  request an arbitrary number of substrings from an RE. The number of
       subexpressions in the RE is reported in re_nsub  in  preg.   With  this
       change  to regexec(), consideration was given to dropping the REG_NOSUB
       flag since the user can now specify this with a zero nmatch argument to
       regexec().   However, keeping REG_NOSUB allows an implementation to use
       a different (perhaps more efficient) algorithm if it knows in regcomp()
       that  no  subexpressions  need  be reported. The implementation is only
       required to fill in pmatch if nmatch is not zero and  if  REG_NOSUB  is
       not specified. Note that the size_t type, as defined in the ISO C stan-
       dard, is unsigned, so the description of regexec()  does  not  need  to
       address negative values of nmatch.

       REG_NOTBOL  was  added  to allow an application to do repeated searches
       for the same pattern in a line. If the pattern  contains  a  circumflex
       character  that  should match the beginning of a line, then the pattern
       should only match when matched against the beginning of the line. With-
       out  the  REG_NOTBOL flag, the application could rewrite the expression
       for subsequent matches, but in the  general  case  this  would  require
       parsing the expression. The need for REG_NOTEOL is not as clear; it was
       added for symmetry.

       The addition of the regerror() function addresses the  historical  need
       for conforming application programs to have access to error information
       more than "Function failed to compile/match your RE  for  unknown  rea-
       sons".

       This interface provides for two different methods of dealing with error
       conditions. The specific error codes (REG_EBRACE, for example), defined
       in <regex.h>, allow an application to recover from an error if it is so
       able. Many applications, especially those that use patterns supplied by
       a  user,  will not try to deal with specific error cases, but will just
       use regerror() to obtain a human-readable error message to  present  to
       the user.

       The regerror() function uses a scheme similar to confstr() to deal with
       the problem of allocating memory to  hold  the  generated  string.  The
       scheme  used  by  strerror() in the ISO C standard was considered unac-
       ceptable since it creates difficulties for multi-threaded applications.

       The  preg argument is provided to regerror() to allow an implementation
       to generate a more descriptive message  than  would  be  possible  with
       errcode alone. An implementation might, for example, save the character
       offset of the offending character of the pattern in a  field  of  preg,
       and  then include that in the generated message string. The implementa-
       tion may also ignore preg.

       A REG_FILENAME flag was  considered,  but  omitted.  This  flag  caused
       regexec()  to  match  patterns  as described in the Shell and Utilities
       volume of IEEE Std 1003.1-2001, Section 2.13, Pattern Matching Notation
       instead of REs. This service is now provided by the fnmatch() function.

       Notice  that  there  is  a  difference  in   philosophy   between   the
       ISO POSIX-2:1993  standard  and IEEE Std 1003.1-2001 in how to handle a
       "bad" regular expression. The ISO POSIX-2:1993 standard says that  many
       bad constructs "produce undefined results", or that "the interpretation
       is undefined". IEEE Std 1003.1-2001, however, says that the interpreta-
       tion  of  such  REs is unspecified. The term "undefined" means that the
       action by the application is an error, of similar severity to passing a
       bad pointer to a function.

       The  regcomp() and regexec() functions are required to accept any null-
       terminated string as the pattern argument. If the meaning of the string
       is   "undefined",  the  behavior  of  the  function  is  "unspecified".
       IEEE Std 1003.1-2001 does not specify how the functions will  interpret
       the  pattern;  they  might return error codes, or they might do pattern
       matching in some completely unexpected way,  but  they  should  not  do
       something like abort the process.

FUTURE DIRECTIONS
       None.

SEE ALSO
       fnmatch()    ,    glob()    ,    Shell    and   Utilities   volume   of
       IEEE Std 1003.1-2001, Section 2.13,  Pattern  Matching  Notation,  Base
       Definitions  volume of IEEE Std 1003.1-2001, Chapter 9, Regular Expres-
       sions, <regex.h>, <sys/types.h>

COPYRIGHT
       Portions of this text are reprinted and reproduced in  electronic  form
       from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
       -- Portable Operating System Interface (POSIX),  The  Open  Group  Base
       Specifications  Issue  6,  Copyright  (C) 2001-2003 by the Institute of
       Electrical and Electronics Engineers, Inc and The Open  Group.  In  the
       event of any discrepancy between this version and the original IEEE and
       The Open Group Standard, the original IEEE and The Open Group  Standard
       is  the  referee document. The original Standard can be obtained online
       at http://www.opengroup.org/unix/online.html .



IEEE/The Open Group                  2003                           REGCOMP(P)


Man(1) output converted with man2html and wrapped by fishsponge

This page was generated on Sat Sep 8 16:41:51 GMT 2007

Your favourite pages:

No pages logged yet.
Trying to save cookie...

Top 10 most popular pages:

svn man page (6167 hits)
(FreeBSD 6.2)

sqlite3 man page (5598 hits)
(openSUSE 10.2)

adv_cap_autoneg man page (5045 hits)
(Solaris 10 11_06)

CPAN man page (4791 hits)
(Suse Linux 10.1)

ssh man page (4439 hits)
(Suse Linux 10.1)

ssh-socks5-proxy-connect man page (3525 hits)
(Solaris 10 11_06)

signal man page (3395 hits)
(Suse Linux 10.1)

netcat man page (3385 hits)
(Suse Linux 10.1)

pprosetup man page (2893 hits)
(Solaris 10 11_06)

startproc man page (2739 hits)
(Suse Linux 10.1)

Useful Links

Go Back

Visitor Statistics


Valid XHTML 1.0 Transitional     Valid CSS!

Partners: Cambridge Plus :: Pyrenees Places of Interest and Areas of Natural Beauty :: USB Temperature Monitor :: <Link Available>
Unix Man Pages / Linux Man Pages :: HiFi Forum :: SIP VoIP Phone & Provider Reviews :: UNIX/Linux Forum Archives

More info on advertising on Unix/Linux Forum