In regular expressions (Perl notation), a pattern match is denoted by /pattern/ or m/pattern/.
Characters
Meta characters
- * matches 0 or more of the previous expression.
- + matches 1 or more of the previous expression.
- ? matches 0 or 1 of the previous expression; placed after another quantifier, it forces minimal matching when an expression might match several strings within a search string.
- . matches any character (except the newline \n).
- ( ) groups part of an expression.
- [ ] defines an explicit set of characters to match.
- { } gives an explicit quantifier.
- \ placed before one of the above makes it a literal instead of a special character; placed before certain other characters, it forms a special matching sequence (see below).
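- / delimits the pattern, as in /pattern/.
- | separates alternatives: either the expression before it or the one after it may match.
- ^ matches the beginning of a string.
- $ matches the end of a string.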
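The metacharacters above can be tried out with a short script. The following is a minimal sketch in Python, whose `re` module accepts the same syntax for these operators; the sample strings are made up for illustration.

```python
import re

# "*" allows zero or more of the previous expression, "+" requires at least one.
star = [bool(re.fullmatch(r"ab*c", s)) for s in ("ac", "abc", "abbbc")]
plus = [bool(re.fullmatch(r"ab+c", s)) for s in ("ac", "abc", "abbbc")]

# "?" makes the previous expression optional.
optional = [bool(re.fullmatch(r"colou?r", s)) for s in ("color", "colour", "colouur")]

# "." matches any single character except a newline.
dot = [bool(re.fullmatch(r"a.c", s)) for s in ("abc", "a c", "a\nc")]

# A backslash turns a metacharacter into a literal: "\." matches only a dot.
literal_dot = [bool(re.fullmatch(r"a\.c", s)) for s in ("a.c", "abc")]

# "^" and "$" anchor the match to the beginning and end of the string.
anchored = bool(re.search(r"^foo$", "foo"))
unanchored = bool(re.search(r"^foo$", "a foo b"))

print(star, plus, optional, dot, literal_dot, anchored, unanchored)
```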
Literal characters
Character classes
- . matches any character except new line
- [aeiou] matches any character in the specified set
- [^aeiou] matches any character not in the specified set
- [0-9a-eA-E] matches any character in the range from the character before the hyphen to the character after it. In this example it would match any character from 0 through 9, lowercase a through e, or uppercase A through E (all inclusive). Equivalent to [0123456789abcdeABCDE].
- \p{name} matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
- \P{name} matches text not included in groups and block ranges specified in {name}.
- \w matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
- \W matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
- \s matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]
- \S matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]
- \d matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
- \D matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
- \b matches a word boundary, the spot between word (\w) and non-word (\W) characters. For example, /\bfred\b/i matches Fred but not Alfred or Frederick.
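A few of the character classes above in action. This is a small sketch in Python, which supports the bracket sets and the \w, \s, \d and \b shorthands but, as far as its standard `re` module goes, not \p{name}; the sample text is invented for the example.

```python
import re

text = "Fred, Alfred and Frederick paid 42 euros."

# [aeiou] matches one character from the set; [^aeiou] matches one that is not in it.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# Ranges: [0-9a-eA-E] accepts digits plus the letters a through e in either case.
in_range = re.findall(r"[0-9a-eA-E]", "f0Ez9b")

# Shorthand classes: \d for digits, \w for word characters, \s for white space.
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", "a_b c-d")
pieces = re.split(r"\s+", "one  two\tthree\nfour")

# \b restricts the match to whole words, so only the standalone "Fred" is found.
whole_word = re.findall(r"\bfred\b", text, flags=re.IGNORECASE)

print(vowels, non_vowels, in_range, digits, words, pieces, whole_word)
```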
POSIX Character Classes
A POSIX class name is written inside a bracket expression, so the full form is, for example, [[:alnum:]].
- [:alnum:] matches an alphanumeric character. Example: [[:alnum:]]{3} matches any three letters or digits, like 7Ds.
- [:alpha:] matches an alphabetic character of either case. Example: [[:alpha:]]{5} matches five alphabetic characters, like aBcDe.
- [:blank:] matches a space or a tab. Example: [[:blank:]]{3,5} matches any three, four, or five spaces and tabs.
- [:digit:] matches a digit. Example: [[:digit:]]{3,5} matches any three, four, or five digits, like 489, 0512, or 31415.
- [:lower:] matches a lowercase letter. Example: [[:lower:]] matches a but not A.
- [:punct:] matches a punctuation character. Example: [[:punct:]] matches ! or . or , but not a or 3.
- [:space:] matches any whitespace character, including newline and carriage return. Example: [[:space:]] matches a space, tab, newline, or carriage return.
- [:upper:] matches an uppercase letter. Example: [[:upper:]] matches A but not a.
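Not every engine understands POSIX class names. Python's standard `re` module, for instance, does not, so the sketch below rewrites each class into an explicit ASCII range before compiling. The `POSIX_ASCII` table and the `expand_posix` helper are hypothetical names made up for this example, and the ASCII ranges are an assumption: real POSIX classes depend on the locale. The patterns use the full bracket form, such as [[:alnum:]].

```python
import re

# Assumed ASCII equivalents of the POSIX classes listed above.
POSIX_ASCII = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "punct": "!-/:-@\\[-`{-~",
    "space": " \\t\\n\\r\\f\\v",
    "upper": "A-Z",
}

def expand_posix(pattern):
    """Replace each [:name:] with its ASCII range; unknown names raise KeyError."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

three_alnum = expand_posix(r"[[:alnum:]]{3}")
digits_3_to_5 = expand_posix(r"[[:digit:]]{3,5}")
punct = expand_posix(r"[[:punct:]]")

print(three_alnum)                                  # [a-zA-Z0-9]{3}
print(bool(re.fullmatch(three_alnum, "7Ds")))       # True
print(bool(re.fullmatch(digits_3_to_5, "05")))      # False: only two digits
print(bool(re.fullmatch(punct, "!")))               # True
```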
Meta characters
- \t matches tab (HT, TAB)
- \n matches newline (LF, NL)
- \r matches return (CR)
- \f matches form feed (FF)
- \a matches alarm (bell) (BEL)
- \e matches escape (think troff) (ESC)
- \033 matches an octal character code (think of a PDP-11)
- \x1B matches a hex character code
- \x{263a} matches wide hex characters (Unicode SMILEY)
- \c[ matches control characters
- \N{name} matches named characters
- \l lowercases the next character (think vi)
- \u uppercases the next character (think vi)
- \L lowercases everything up to \E (think vi)
- \U uppercases everything up to \E (think vi)
- \E ends case modification (think vi)
- \Q quotes (disables) pattern meta characters up to \E
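Support for these escapes varies between engines. As a rough sketch, Python's `re` module understands \t, octal codes like \033 and hex codes like \x1B, but has no \Q...\E; the standard `re.escape()` function is used here as an assumed stand-in for quoting. The sample strings are invented.

```python
import re

# \t, \033 and \x1B in a raw pattern are interpreted by the regex engine itself.
has_tab = bool(re.search(r"\t", "a\tb"))
esc_hex = bool(re.fullmatch(r"\x1B", "\x1b"))
esc_octal = bool(re.fullmatch(r"\033", "\x1b"))

# Unquoted, "+" and "?" act as quantifiers, so the pattern does not match itself.
unquoted_match = bool(re.fullmatch(r"1+1=2?", "1+1=2?"))

# re.escape() backslash-quotes the meta characters, much like \Q...\E would.
literal = re.escape("1+1=2?")
quoted_match = bool(re.fullmatch(literal, "1+1=2?"))

print(has_tab, esc_hex, esc_octal, unquoted_match, quoted_match)
```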
Repetition Operators
- * matches 0 or more times
- + matches 1 or more times
- ? matches 0 or 1 time
- {n} matches exactly n times
- {n,} matches at least n times
- {n,m} matches at least n but not more than m times
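The brace quantifiers are easiest to see side by side. A minimal sketch in Python follows; the `accepted` helper is a made-up name that lists which sample strings a pattern matches in full.

```python
import re

samples = ["a", "aa", "aaa", "aaaa", "aaaaa"]

def accepted(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

exactly_three = accepted(r"a{3}")     # exactly 3 times
at_least_three = accepted(r"a{3,}")   # 3 or more times
two_to_four = accepted(r"a{2,4}")     # between 2 and 4 times

print(exactly_three, at_least_three, two_to_four)
```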
Anchoring Operators
- ^ anchors the match at the beginning of the line. Example: ^foo
- $ anchors the match at the end of the line. Example: foo$
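Whether the anchors work per line or per string depends on the engine and its flags. In Python, taken here as an example, ^ and $ anchor to the whole string unless the `re.MULTILINE` flag is given. The sample text is invented.

```python
import re

text = "foo starts here\nbar then foo\nends with foo"

# Default mode: ^ and $ anchor to the start and end of the whole string.
first_word = re.findall(r"^\w+", text)
last_word = re.findall(r"\w+$", text)

# With re.MULTILINE they anchor to the start and end of every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(first_word, last_word, first_words, last_words)
```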
Word Operators
- \b matches string at either the beginning or the end of a word. For example, `\brat\b' matches the separate word `rat'.
- \B matches string within a word. For example, `c\Brat\Be' matches `crate', but `dirty \Brat' doesn't match `dirty rat'.
- \< matches string at the beginning of a word
- \> matches string at the end of a word.
- \w matches any word-constituent character
- \W matches any character that is not word-constituent.
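The word operators can be checked against the examples given above. This sketch uses Python, which supports \b and \B but not \< or \>; the `WORD_START` and `WORD_END` patterns are assumed stand-ins built from \b and a lookaround, not part of any library.

```python
import re

# \b: "rat" only as a separate word.
whole = re.findall(r"\brat\b", "a rat ate the crate")

# \B: the match must lie inside a word.
inside = bool(re.search(r"c\Brat\Be", "crate"))
not_inside = bool(re.search(r"dirty \Brat", "dirty rat"))

# Assumed equivalents of \< and \> for engines that lack them.
WORD_START = r"\b(?=\w)"
WORD_END = r"\b(?<=\w)"
starting_with_cat = re.findall(WORD_START + r"cat\w*", "cat concat catalog")
ending_with_cat = re.findall(r"\w*cat" + WORD_END, "cat concat catalog")

print(whole, inside, not_inside, starting_with_cat, ending_with_cat)
```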
Buffer Operators
Following are operators which work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, Regex considers the entire string to be matched as the buffer.
- \` matches a string at the beginning of the buffer
- \' matches a string at the end of the buffer
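The \` and \' operators belong to the Emacs and GNU syntax. Other engines spell them differently: Python, used here as an assumed example, writes \A for the beginning of the string and \Z for the end. The sketch shows how they differ from the line anchors ^ and $.

```python
import re

buffer = "first line\nsecond line\nlast line"

# ^ with re.MULTILINE matches at the start of every line.
line_starts = re.findall(r"^\w+", buffer, flags=re.MULTILINE)

# \A matches only at the very beginning, even with re.MULTILINE.
buffer_start = re.findall(r"\A\w+", buffer, flags=re.MULTILINE)

# \Z matches only at the very end.
buffer_end = re.findall(r"\w+\Z", buffer, flags=re.MULTILINE)

print(line_starts, buffer_start, buffer_end)
```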
Non-greedy Wildcards and Repetitions
Appending ? to a quantifier makes it non-greedy: it matches as few characters as possible.
- *? matches 0 or more times
- +? matches 1 or more times
- ?? matches 0 or 1 time
- {n}? matches exactly n times
- {n,}? matches at least n times
- {n,m}? matches at least n but not more than m times
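A quantifier followed by ? takes as little text as it can, while the plain quantifier takes as much as it can. The difference shows up clearly in a short Python sketch; the sample strings are invented.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# The plain quantifier runs to the last ">", the ?-suffixed one stops at the first.
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

# The same applies to brace quantifiers and to "?" itself.
greedy_digits = re.match(r"\d{2,4}", "123456").group()
lazy_digits = re.match(r"\d{2,4}?", "123456").group()
optional_greedy = re.match(r"ab?", "ab").group()
optional_lazy = re.match(r"ab??", "ab").group()

print(greedy, lazy, greedy_digits, lazy_digits, optional_greedy, optional_lazy)
```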
Groups, Lists
- ( ) group operator. Example: (cat|hat) matches cat or hat.
- [ ] class operator. Example: [jfet] matches j, f, e, or t.
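Both operators can be tried with the examples above. In this Python sketch the sample sentence is invented; it also shows that a group captures the text it matched, which is how most Perl-style engines behave.

```python
import re

sentence = "The cat sat on the hat near a bat."

# ( ) with | : either alternative may match.
alternatives = re.findall(r"\b(cat|hat)\b", sentence)

# [ ] : exactly one character out of j, f, e, t per match.
single_chars = re.findall(r"[jfet]", "jet fuel")

# Each group also captures what it matched.
captured = re.search(r"(\w+) on the (\w+)", sentence).groups()

print(alternatives, single_chars, captured)
```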
From HowTo Wiki, a Wikia wiki.