Initial commit
This commit is contained in:
@@ -0,0 +1,462 @@
|
||||
RE2 regular expression syntax reference
|
||||
-------------------------------------
|
||||
|
||||
Single characters:
|
||||
. any character, possibly including newline (s=true)
|
||||
[xyz] character class
|
||||
[^xyz] negated character class
|
||||
\d Perl character class
|
||||
\D negated Perl character class
|
||||
[[:alpha:]] ASCII character class
|
||||
[[:^alpha:]] negated ASCII character class
|
||||
\pN Unicode character class (one-letter name)
|
||||
\p{Greek} Unicode character class
|
||||
\PN negated Unicode character class (one-letter name)
|
||||
\P{Greek} negated Unicode character class
|
||||
|
||||
Composites:
|
||||
xy «x» followed by «y»
|
||||
x|y «x» or «y» (prefer «x»)
|
||||
|
||||
Repetitions:
|
||||
x* zero or more «x», prefer more
|
||||
x+ one or more «x», prefer more
|
||||
x? zero or one «x», prefer one
|
||||
x{n,m} «n» or «n»+1 or ... or «m» «x», prefer more
|
||||
x{n,} «n» or more «x», prefer more
|
||||
x{n} exactly «n» «x»
|
||||
x*? zero or more «x», prefer fewer
|
||||
x+? one or more «x», prefer fewer
|
||||
x?? zero or one «x», prefer zero
|
||||
x{n,m}? «n» or «n»+1 or ... or «m» «x», prefer fewer
|
||||
x{n,}? «n» or more «x», prefer fewer
|
||||
x{n}? exactly «n» «x»
|
||||
x{} (== x*) NOT SUPPORTED vim
|
||||
x{-} (== x*?) NOT SUPPORTED vim
|
||||
x{-n} (== x{n}?) NOT SUPPORTED vim
|
||||
x= (== x?) NOT SUPPORTED vim
|
||||
|
||||
Implementation restriction: The counting forms «x{n,m}», «x{n,}», and «x{n}»
|
||||
reject forms that create a minimum or maximum repetition count above 1000.
|
||||
Unlimited repetitions are not subject to this restriction.
|
||||
|
||||
Possessive repetitions:
|
||||
x*+ zero or more «x», possessive NOT SUPPORTED
|
||||
x++ one or more «x», possessive NOT SUPPORTED
|
||||
x?+ zero or one «x», possessive NOT SUPPORTED
|
||||
x{n,m}+ «n» or ... or «m» «x», possessive NOT SUPPORTED
|
||||
x{n,}+ «n» or more «x», possessive NOT SUPPORTED
|
||||
x{n}+ exactly «n» «x», possessive NOT SUPPORTED
|
||||
|
||||
Grouping:
|
||||
(re) numbered capturing group (submatch)
|
||||
(?P<name>re) named & numbered capturing group (submatch)
|
||||
(?<name>re) named & numbered capturing group (submatch)
|
||||
(?'name're) named & numbered capturing group (submatch) NOT SUPPORTED
|
||||
(?:re) non-capturing group
|
||||
(?flags) set flags within current group; non-capturing
|
||||
(?flags:re) set flags during re; non-capturing
|
||||
(?#text) comment NOT SUPPORTED
|
||||
(?|x|y|z) branch numbering reset NOT SUPPORTED
|
||||
(?>re) possessive match of «re» NOT SUPPORTED
|
||||
re@> possessive match of «re» NOT SUPPORTED vim
|
||||
%(re) non-capturing group NOT SUPPORTED vim
|
||||
|
||||
Flags:
|
||||
i case-insensitive (default false)
|
||||
m multi-line mode: «^» and «$» match begin/end line in addition to begin/end text (default false)
|
||||
s let «.» match «\n» (default false)
|
||||
U ungreedy: swap meaning of «x*» and «x*?», «x+» and «x+?», etc (default false)
|
||||
Flag syntax is «xyz» (set) or «-xyz» (clear) or «xy-z» (set «xy», clear «z»).
|
||||
|
||||
Empty strings:
|
||||
^ at beginning of text or line («m»=true)
|
||||
$ at end of text (like «\z» not «\Z») or line («m»=true)
|
||||
\A at beginning of text
|
||||
\b at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
|
||||
\B not at ASCII word boundary
|
||||
\G at beginning of subtext being searched NOT SUPPORTED pcre
|
||||
\G at end of last match NOT SUPPORTED perl
|
||||
\Z at end of text, or before newline at end of text NOT SUPPORTED
|
||||
\z at end of text
|
||||
(?=re) before text matching «re» NOT SUPPORTED
|
||||
(?!re) before text not matching «re» NOT SUPPORTED
|
||||
(?<=re) after text matching «re» NOT SUPPORTED
|
||||
(?<!re) after text not matching «re» NOT SUPPORTED
|
||||
re& before text matching «re» NOT SUPPORTED vim
|
||||
re@= before text matching «re» NOT SUPPORTED vim
|
||||
re@! before text not matching «re» NOT SUPPORTED vim
|
||||
re@<= after text matching «re» NOT SUPPORTED vim
|
||||
re@<! after text not matching «re» NOT SUPPORTED vim
|
||||
\zs sets start of match (= \K) NOT SUPPORTED vim
|
||||
\ze sets end of match NOT SUPPORTED vim
|
||||
\%^ beginning of file NOT SUPPORTED vim
|
||||
\%$ end of file NOT SUPPORTED vim
|
||||
\%V on screen NOT SUPPORTED vim
|
||||
\%# cursor position NOT SUPPORTED vim
|
||||
\%'m mark «m» position NOT SUPPORTED vim
|
||||
\%23l in line 23 NOT SUPPORTED vim
|
||||
\%23c in column 23 NOT SUPPORTED vim
|
||||
\%23v in virtual column 23 NOT SUPPORTED vim
|
||||
|
||||
Escape sequences:
|
||||
\a bell (== \007)
|
||||
\f form feed (== \014)
|
||||
\t horizontal tab (== \011)
|
||||
\n newline (== \012)
|
||||
\r carriage return (== \015)
|
||||
\v vertical tab character (== \013)
|
||||
\* literal «*», for any punctuation character «*»
|
||||
\123 octal character code (up to three digits)
|
||||
\x7F hex character code (exactly two digits)
|
||||
\x{10FFFF} hex character code
|
||||
\C match a single byte even in UTF-8 mode
|
||||
\Q...\E literal text «...» even if «...» has punctuation
|
||||
|
||||
\1 backreference NOT SUPPORTED
|
||||
\b backspace NOT SUPPORTED (use «\010»)
|
||||
\cK control char ^K NOT SUPPORTED (use «\001» etc)
|
||||
\e escape NOT SUPPORTED (use «\033»)
|
||||
\g1 backreference NOT SUPPORTED
|
||||
\g{1} backreference NOT SUPPORTED
|
||||
\g{+1} backreference NOT SUPPORTED
|
||||
\g{-1} backreference NOT SUPPORTED
|
||||
\g{name} named backreference NOT SUPPORTED
|
||||
\g<name> subroutine call NOT SUPPORTED
|
||||
\g'name' subroutine call NOT SUPPORTED
|
||||
\k<name> named backreference NOT SUPPORTED
|
||||
\k'name' named backreference NOT SUPPORTED
|
||||
\lX lowercase «X» NOT SUPPORTED
|
||||
\ux uppercase «x» NOT SUPPORTED
|
||||
\L...\E lowercase text «...» NOT SUPPORTED
|
||||
\K reset beginning of «$0» NOT SUPPORTED
|
||||
\N{name} named Unicode character NOT SUPPORTED
|
||||
\R line break NOT SUPPORTED
|
||||
\U...\E upper case text «...» NOT SUPPORTED
|
||||
\X extended Unicode sequence NOT SUPPORTED
|
||||
|
||||
\%d123 decimal character 123 NOT SUPPORTED vim
|
||||
\%xFF hex character FF NOT SUPPORTED vim
|
||||
\%o123 octal character 123 NOT SUPPORTED vim
|
||||
\%u1234 Unicode character 0x1234 NOT SUPPORTED vim
|
||||
\%U12345678 Unicode character 0x12345678 NOT SUPPORTED vim
|
||||
|
||||
Character class elements:
|
||||
x single character
|
||||
A-Z character range (inclusive)
|
||||
\d Perl character class
|
||||
[:foo:] ASCII character class «foo»
|
||||
\p{Foo} Unicode character class «Foo»
|
||||
\pF Unicode character class «F» (one-letter name)
|
||||
|
||||
Named character classes as character class elements:
|
||||
[\d] digits (== \d)
|
||||
[^\d] not digits (== \D)
|
||||
[\D] not digits (== \D)
|
||||
[^\D] not not digits (== \d)
|
||||
[[:name:]] named ASCII class inside character class (== [:name:])
|
||||
[^[:name:]] named ASCII class inside negated character class (== [:^name:])
|
||||
[\p{Name}] named Unicode property inside character class (== \p{Name})
|
||||
[^\p{Name}] named Unicode property inside negated character class (== \P{Name})
|
||||
|
||||
Perl character classes (all ASCII-only):
|
||||
\d digits (== [0-9])
|
||||
\D not digits (== [^0-9])
|
||||
\s whitespace (== [\t\n\f\r ])
|
||||
\S not whitespace (== [^\t\n\f\r ])
|
||||
\w word characters (== [0-9A-Za-z_])
|
||||
\W not word characters (== [^0-9A-Za-z_])
|
||||
|
||||
\h horizontal space NOT SUPPORTED
|
||||
\H not horizontal space NOT SUPPORTED
|
||||
\v vertical space NOT SUPPORTED
|
||||
\V not vertical space NOT SUPPORTED
|
||||
|
||||
ASCII character classes:
|
||||
[[:alnum:]] alphanumeric (== [0-9A-Za-z])
|
||||
[[:alpha:]] alphabetic (== [A-Za-z])
|
||||
[[:ascii:]] ASCII (== [\x00-\x7F])
|
||||
[[:blank:]] blank (== [\t ])
|
||||
[[:cntrl:]] control (== [\x00-\x1F\x7F])
|
||||
[[:digit:]] digits (== [0-9])
|
||||
[[:graph:]] graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
|
||||
[[:lower:]] lower case (== [a-z])
|
||||
[[:print:]] printable (== [ -~] == [ [:graph:]])
|
||||
[[:punct:]] punctuation (== [!-/:-@[-`{-~])
|
||||
[[:space:]] whitespace (== [\t\n\v\f\r ])
|
||||
[[:upper:]] upper case (== [A-Z])
|
||||
[[:word:]] word characters (== [0-9A-Za-z_])
|
||||
[[:xdigit:]] hex digit (== [0-9A-Fa-f])
|
||||
|
||||
Unicode character class names--general category:
|
||||
C other
|
||||
Cc control
|
||||
Cf format
|
||||
Cn unassigned code points NOT SUPPORTED
|
||||
Co private use
|
||||
Cs surrogate
|
||||
L letter
|
||||
LC cased letter NOT SUPPORTED
|
||||
L& cased letter NOT SUPPORTED
|
||||
Ll lowercase letter
|
||||
Lm modifier letter
|
||||
Lo other letter
|
||||
Lt titlecase letter
|
||||
Lu uppercase letter
|
||||
M mark
|
||||
Mc spacing mark
|
||||
Me enclosing mark
|
||||
Mn non-spacing mark
|
||||
N number
|
||||
Nd decimal number
|
||||
Nl letter number
|
||||
No other number
|
||||
P punctuation
|
||||
Pc connector punctuation
|
||||
Pd dash punctuation
|
||||
Pe close punctuation
|
||||
Pf final punctuation
|
||||
Pi initial punctuation
|
||||
Po other punctuation
|
||||
Ps open punctuation
|
||||
S symbol
|
||||
Sc currency symbol
|
||||
Sk modifier symbol
|
||||
Sm math symbol
|
||||
So other symbol
|
||||
Z separator
|
||||
Zl line separator
|
||||
Zp paragraph separator
|
||||
Zs space separator
|
||||
|
||||
Unicode character class names--scripts:
|
||||
Adlam
|
||||
Ahom
|
||||
Anatolian_Hieroglyphs
|
||||
Arabic
|
||||
Armenian
|
||||
Avestan
|
||||
Balinese
|
||||
Bamum
|
||||
Bassa_Vah
|
||||
Batak
|
||||
Bengali
|
||||
Bhaiksuki
|
||||
Bopomofo
|
||||
Brahmi
|
||||
Braille
|
||||
Buginese
|
||||
Buhid
|
||||
Canadian_Aboriginal
|
||||
Carian
|
||||
Caucasian_Albanian
|
||||
Chakma
|
||||
Cham
|
||||
Cherokee
|
||||
Chorasmian
|
||||
Common
|
||||
Coptic
|
||||
Cuneiform
|
||||
Cypriot
|
||||
Cypro_Minoan
|
||||
Cyrillic
|
||||
Deseret
|
||||
Devanagari
|
||||
Dives_Akuru
|
||||
Dogra
|
||||
Duployan
|
||||
Egyptian_Hieroglyphs
|
||||
Elbasan
|
||||
Elymaic
|
||||
Ethiopic
|
||||
Georgian
|
||||
Glagolitic
|
||||
Gothic
|
||||
Grantha
|
||||
Greek
|
||||
Gujarati
|
||||
Gunjala_Gondi
|
||||
Gurmukhi
|
||||
Han
|
||||
Hangul
|
||||
Hanifi_Rohingya
|
||||
Hanunoo
|
||||
Hatran
|
||||
Hebrew
|
||||
Hiragana
|
||||
Imperial_Aramaic
|
||||
Inherited
|
||||
Inscriptional_Pahlavi
|
||||
Inscriptional_Parthian
|
||||
Javanese
|
||||
Kaithi
|
||||
Kannada
|
||||
Katakana
|
||||
Kawi
|
||||
Kayah_Li
|
||||
Kharoshthi
|
||||
Khitan_Small_Script
|
||||
Khmer
|
||||
Khojki
|
||||
Khudawadi
|
||||
Lao
|
||||
Latin
|
||||
Lepcha
|
||||
Limbu
|
||||
Linear_A
|
||||
Linear_B
|
||||
Lisu
|
||||
Lycian
|
||||
Lydian
|
||||
Mahajani
|
||||
Makasar
|
||||
Malayalam
|
||||
Mandaic
|
||||
Manichaean
|
||||
Marchen
|
||||
Masaram_Gondi
|
||||
Medefaidrin
|
||||
Meetei_Mayek
|
||||
Mende_Kikakui
|
||||
Meroitic_Cursive
|
||||
Meroitic_Hieroglyphs
|
||||
Miao
|
||||
Modi
|
||||
Mongolian
|
||||
Mro
|
||||
Multani
|
||||
Myanmar
|
||||
Nabataean
|
||||
Nag_Mundari
|
||||
Nandinagari
|
||||
New_Tai_Lue
|
||||
Newa
|
||||
Nko
|
||||
Nushu
|
||||
Nyiakeng_Puachue_Hmong
|
||||
Ogham
|
||||
Ol_Chiki
|
||||
Old_Hungarian
|
||||
Old_Italic
|
||||
Old_North_Arabian
|
||||
Old_Permic
|
||||
Old_Persian
|
||||
Old_Sogdian
|
||||
Old_South_Arabian
|
||||
Old_Turkic
|
||||
Old_Uyghur
|
||||
Oriya
|
||||
Osage
|
||||
Osmanya
|
||||
Pahawh_Hmong
|
||||
Palmyrene
|
||||
Pau_Cin_Hau
|
||||
Phags_Pa
|
||||
Phoenician
|
||||
Psalter_Pahlavi
|
||||
Rejang
|
||||
Runic
|
||||
Samaritan
|
||||
Saurashtra
|
||||
Sharada
|
||||
Shavian
|
||||
Siddham
|
||||
SignWriting
|
||||
Sinhala
|
||||
Sogdian
|
||||
Sora_Sompeng
|
||||
Soyombo
|
||||
Sundanese
|
||||
Syloti_Nagri
|
||||
Syriac
|
||||
Tagalog
|
||||
Tagbanwa
|
||||
Tai_Le
|
||||
Tai_Tham
|
||||
Tai_Viet
|
||||
Takri
|
||||
Tamil
|
||||
Tangsa
|
||||
Tangut
|
||||
Telugu
|
||||
Thaana
|
||||
Thai
|
||||
Tibetan
|
||||
Tifinagh
|
||||
Tirhuta
|
||||
Toto
|
||||
Ugaritic
|
||||
Vai
|
||||
Vithkuqi
|
||||
Wancho
|
||||
Warang_Citi
|
||||
Yezidi
|
||||
Yi
|
||||
Zanabazar_Square
|
||||
|
||||
Vim character classes:
|
||||
\i identifier character NOT SUPPORTED vim
|
||||
\I «\i» except digits NOT SUPPORTED vim
|
||||
\k keyword character NOT SUPPORTED vim
|
||||
\K «\k» except digits NOT SUPPORTED vim
|
||||
\f file name character NOT SUPPORTED vim
|
||||
\F «\f» except digits NOT SUPPORTED vim
|
||||
\p printable character NOT SUPPORTED vim
|
||||
\P «\p» except digits NOT SUPPORTED vim
|
||||
\s whitespace character (== [ \t]) NOT SUPPORTED vim
|
||||
\S non-white space character (== [^ \t]) NOT SUPPORTED vim
|
||||
\d digits (== [0-9]) vim
|
||||
\D not «\d» vim
|
||||
\x hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim
|
||||
\X not «\x» NOT SUPPORTED vim
|
||||
\o octal digits (== [0-7]) NOT SUPPORTED vim
|
||||
\O not «\o» NOT SUPPORTED vim
|
||||
\w word character vim
|
||||
\W not «\w» vim
|
||||
\h head of word character NOT SUPPORTED vim
|
||||
\H not «\h» NOT SUPPORTED vim
|
||||
\a alphabetic NOT SUPPORTED vim
|
||||
\A not «\a» NOT SUPPORTED vim
|
||||
\l lowercase NOT SUPPORTED vim
|
||||
\L not lowercase NOT SUPPORTED vim
|
||||
\u uppercase NOT SUPPORTED vim
|
||||
\U not uppercase NOT SUPPORTED vim
|
||||
\_x «\x» plus newline, for any «x» NOT SUPPORTED vim
|
||||
|
||||
Vim flags:
|
||||
\c ignore case NOT SUPPORTED vim
|
||||
\C match case NOT SUPPORTED vim
|
||||
\m magic NOT SUPPORTED vim
|
||||
\M nomagic NOT SUPPORTED vim
|
||||
\v verymagic NOT SUPPORTED vim
|
||||
\V verynomagic NOT SUPPORTED vim
|
||||
\Z ignore differences in Unicode combining characters NOT SUPPORTED vim
|
||||
|
||||
Magic:
|
||||
(?{code}) arbitrary Perl code NOT SUPPORTED perl
|
||||
(??{code}) postponed arbitrary Perl code NOT SUPPORTED perl
|
||||
(?n) recursive call to regexp capturing group «n» NOT SUPPORTED
|
||||
(?+n) recursive call to relative group «+n» NOT SUPPORTED
|
||||
(?-n) recursive call to relative group «-n» NOT SUPPORTED
|
||||
(?C) PCRE callout NOT SUPPORTED pcre
|
||||
(?R) recursive call to entire regexp (== (?0)) NOT SUPPORTED
|
||||
(?&name) recursive call to named group NOT SUPPORTED
|
||||
(?P=name) named backreference NOT SUPPORTED
|
||||
(?P>name) recursive call to named group NOT SUPPORTED
|
||||
(?(cond)true|false) conditional branch NOT SUPPORTED
|
||||
(?(cond)true) conditional branch NOT SUPPORTED
|
||||
(*ACCEPT) make regexps more like Prolog NOT SUPPORTED
|
||||
(*COMMIT) NOT SUPPORTED
|
||||
(*F) NOT SUPPORTED
|
||||
(*FAIL) NOT SUPPORTED
|
||||
(*MARK) NOT SUPPORTED
|
||||
(*PRUNE) NOT SUPPORTED
|
||||
(*SKIP) NOT SUPPORTED
|
||||
(*THEN) NOT SUPPORTED
|
||||
(*ANY) set newline convention NOT SUPPORTED
|
||||
(*ANYCRLF) NOT SUPPORTED
|
||||
(*CR) NOT SUPPORTED
|
||||
(*CRLF) NOT SUPPORTED
|
||||
(*LF) NOT SUPPORTED
|
||||
(*BSR_ANYCRLF) set \R convention NOT SUPPORTED pcre
|
||||
(*BSR_UNICODE) NOT SUPPORTED pcre
|
||||
Reference in New Issue
Block a user