Sortix 1.1dev ports manual
PCRECOMPAT(3) | Library Functions Manual | PCRECOMPAT(3) |
NAME
PCRE - Perl-compatible regular expressions
DIFFERENCES BETWEEN PCRE AND PERL
This document describes the differences in the ways that PCRE and Perl handle regular expressions. The differences described here are with respect to Perl versions 5.10 and above.

1. PCRE has only a subset of Perl's Unicode support. Details of what it does have are given in the pcreunicode page.

2. PCRE allows repeat quantifiers only on parenthesized assertions, but they do not mean what you might think. For example, (?!a){3} does not assert that the next three characters are not "a". It just asserts that the next character is not "a" three times (in principle: PCRE optimizes this to run the assertion just once). Perl allows repeat quantifiers on other assertions such as \b, but these do not seem to have any use.

3. Capturing subpatterns that occur inside negative lookahead assertions are counted, but their entries in the offsets vector are never set. Perl sometimes (but not always) sets its numerical variables from inside negative assertions.

4. Though binary zero characters are supported in the subject string, they are not allowed in a pattern string because it is passed as a normal C string, terminated by zero. The escape sequence \0 can be used in the pattern to represent a binary zero.

5. The following Perl escape sequences are not supported: \l, \u, \L, \U, and \N when followed by a character name or Unicode value. (\N on its own, matching a non-newline character, is supported.) In fact these are implemented by Perl's general string-handling and are not part of its pattern matching engine. If any of these are encountered by PCRE, an error is generated by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set, \U and \u are interpreted as JavaScript interprets them.

6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is built with Unicode character property support. The properties that can be tested with \p and \P are limited to the general category properties such as Lu and Nd, script names such as Greek or Han, and the derived properties Any and L&. PCRE does support the Cs (surrogate) property, which Perl does not; the Perl documentation says "Because Perl hides the need for the user to understand the internal representation of Unicode characters, there is no need to implement the somewhat messy concept of surrogates."

7. PCRE does support the \Q...\E escape for quoting substrings. Characters in between are treated as literals. This is slightly different from Perl in that $ and @ are also handled as literals inside the quotes. In Perl, they cause variable interpolation (but of course PCRE does not have variables). Note the following examples:

    Pattern            PCRE matches   Perl matches

    \Qabc$xyz\E        abc$xyz        abc followed by the
                                        contents of $xyz
    \Qabc\$xyz\E       abc\$xyz       abc\$xyz
    \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
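
Because PCRE treats $ and @ as literals inside \Q...\E (item 7 above), a caller can wrap arbitrary literal text in that escape before embedding it in a pattern. The following C fragment is a minimal sketch of that idea, not part of the PCRE API: quote_literal is a hypothetical helper name, and the handling of a literal \E inside the input (close the quoted run, emit an escaped backslash and E, reopen the run) is an assumption about how one might keep such input from ending the quoting early.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not provided by PCRE): return a newly allocated
 * pattern fragment that matches `text` literally by wrapping it in \Q...\E.
 * Each literal "\E" in the input is emitted as \E\\E\Q so that it does not
 * terminate the quoted run. The caller must free() the result.
 * Returns NULL if memory cannot be allocated.
 */
char *quote_literal(const char *text)
{
    assert(text != NULL);

    size_t len = strlen(text);
    /* Worst case: each 2-byte "\E" grows to 7 bytes (under 4x), plus
       the opening \Q, the closing \E, and the terminating zero. */
    char *out = malloc(4 * len + 5);
    if (out == NULL)
        return NULL;

    char *p = out;
    memcpy(p, "\\Q", 2);                 /* open the quoted run */
    p += 2;

    while (*text != '\0') {
        if (text[0] == '\\' && text[1] == 'E') {
            memcpy(p, "\\E\\\\E\\Q", 7); /* \E  \\  E  \Q */
            p += 7;
            text += 2;
        } else {
            *p++ = *text++;              /* everything else is copied as-is */
        }
    }

    memcpy(p, "\\E", 3);                 /* close the run; copies the zero too */
    return out;
}
```

For example, quote_literal("abc$xyz") yields the string \Qabc$xyz\E, which corresponds to the first row of the table above. Binary zeros cannot be quoted this way, since the input is itself a zero-terminated C string (see item 4).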
AUTHOR
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
REVISION
Last updated: 10 November 2013
Copyright (c) 1997-2013 University of Cambridge.
10 November 2013 | PCRE 8.34 |