How to use regular expressions (regex) for pattern matching


In Perl-style regular expressions, a pattern match is written /pattern/ or m/pattern/.

Characters

Meta characters

  • * matches 0 or more of the previous expression.
  • + matches 1 or more of the previous expression.
  • ? matches 0 or 1 of the previous expression; placed after another quantifier, it forces minimal matching when an expression might match several strings within a search string.
  • . matches any character (except the newline \n).
  • ( ) groups part of an expression logically.
  • [ ] gives an explicit set of characters to match.
  • { } is explicit quantifier notation.
  • \ placed before one of the above makes it a literal instead of a special character; placed before certain ordinary characters, it forms a special matching sequence (see below).
  • / delimits the pattern, as in /pattern/.
  • | separates alternatives; for example, cat|hat matches cat or hat.
  • ^ matches the beginning of a string.
  • $ matches the end of a string.
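
The meta characters above can be tried out in most regex engines. The following is a minimal sketch that assumes Python's re module as the engine (the article itself uses Perl-style /pattern/ notation); the helper name matches is made up for illustration.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

star = matches(r"ab*c", "ac")            # * allows zero occurrences of "b"
plus = matches(r"ab+c", "ac")            # + needs at least one "b", so no match
optional = matches(r"colou?r", "color")  # ? makes the "u" optional
dot = matches(r"a.c", "a\nc")            # . does not match a newline by default
escaped = matches(r"3\.14", "3x14")      # \. is a literal dot, so "x" does not fit
start = matches(r"^foo", "foobar")       # ^ anchors at the beginning
end = matches(r"bar$", "foobar")         # $ anchors at the end

print(star, plus, optional, dot, escaped, start, end)
```

Python has no /.../ delimiters, so the pattern is passed as a raw string instead.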

Literal characters

Character Classes

  • . matches any character except new line
  • [aeiou] matches any character in the specified set
  • [^aeiou] matches any character not in the specified set
  • [0-9a-eA-E] matches any character in the ranges given by the characters before and after each hyphen. In this example it matches any character from 0 through 9, lowercase a through e, or uppercase A through E (inclusive). Equivalent to [0123456789abcdeABCDE].
  • \p{name} matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
  • \P{name} matches any character not included in the groups and block ranges specified in {name}.
  • \w matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
  • \W matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
  • \s matches any white-space character. Equivalent to [\f\n\r\t\v\x85\p{Z}].
  • \S matches any non-white-space character. Equivalent to [^\f\n\r\t\v\x85\p{Z}].
  • \d matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
  • \D matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
  • \b matches a word boundary, the spot between word (\w) and non-word (\W) characters. For example, /\bfred\b/i matches Fred but not Alfred or Frederick.
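
As a rough illustration of the classes above, here is a sketch that assumes Python's re module. Python's built-in module does not accept \p{name}, so only the bracket sets and the \d, \w, and \b shorthands are shown; the sample strings are invented.

```python
import re

# Bracket sets: listed characters, negated set, and ranges.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
in_range = re.findall(r"[0-9a-eA-E]", "f0Ez9b")   # f and z fall outside the ranges

# Shorthand classes.
digits = re.findall(r"\d+", "room 101, floor 7")
words = re.findall(r"\w+", "snake_case, kebab-case")  # "_" is a word character, "-" is not

# Word boundary with case-insensitive matching, like /\bfred\b/i.
names = ["Fred", "Alfred", "Frederick"]
fred = [n for n in names if re.search(r"\bfred\b", n, re.IGNORECASE)]

print(vowels, non_vowels, in_range, digits, words, fred)
```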

POSIX Character Classes

  • [:alnum:] matches alphanumeric characters; [[:alnum:]]{3} matches any three letters or digits, like 7Ds
  • [:alpha:] matches alphabetic characters, any case; [[:alpha:]]{5} matches five alphabetic characters, any case, like aBcDe
  • [:blank:] matches space and tab; [[:blank:]]{3,5} matches any three, four, or five spaces and tabs
  • [:digit:] matches digits; [[:digit:]]{3,5} matches any three, four, or five digits, like 489, 0512, or 31415
  • [:lower:] matches lowercase alphabetics; [[:lower:]] matches a but not A
  • [:punct:] matches punctuation characters; [[:punct:]] matches ! or . or , but not a or 3
  • [:space:] matches all whitespace characters; [[:space:]] matches any space, tab, newline, or carriage return
  • [:upper:] matches uppercase alphabetics; [[:upper:]] matches A but not a
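
POSIX classes are written inside a bracket expression in tools such as grep. Python's built-in re module does not support them, so the sketch below substitutes ASCII-only bracket sets that behave roughly the same for plain English text. The POSIX_ASCII table and the posix helper are hypothetical names introduced only for this example.

```python
import re

# Hypothetical ASCII-only stand-ins for a few POSIX classes.
POSIX_ASCII = {
    "alnum": "A-Za-z0-9",
    "alpha": "A-Za-z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
}

def posix(name, quantifier=""):
    """Build a bracket expression standing in for [[:name:]] plus a quantifier."""
    return "[" + POSIX_ASCII[name] + "]" + quantifier

three_alnum = re.fullmatch(posix("alnum", "{3}"), "7Ds") is not None
five_alpha = re.fullmatch(posix("alpha", "{5}"), "aBcDe") is not None
blanks = re.fullmatch(posix("blank", "{3,5}"), " \t ") is not None
digits_ok = re.fullmatch(posix("digit", "{3,5}"), "489") is not None
digits_short = re.fullmatch(posix("digit", "{3,5}"), "05") is not None  # only two digits

print(three_alnum, five_alpha, blanks, digits_ok, digits_short)
```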


Meta characters (escape sequences)

  • \t matches tab (HT, TAB)
  • \n matches newline (LF, NL)
  • \r matches return (CR)
  • \f matches form feed (FF)
  • \a matches alarm (bell) (BEL)
  • \e matches escape (think troff) (ESC)
  • \033 matches an octal character (think of a PDP-11)
  • \x1B matches a hex character
  • \x{263a} matches wide hex characters (Unicode SMILEY)
  • \c[ matches control characters
  • \N{name} matches named characters
  • \l lowercases the next character (think vi)
  • \u uppercases the next character (think vi)
  • \L lowercases everything until \E (think vi)
  • \U uppercases everything until \E (think vi)
  • \E ends case modification (think vi)
  • \Q quotes (disables) pattern meta characters until \E
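
Support for these escapes varies between engines. The sketch below assumes Python's re module, which accepts \t, \033, and \x1B but, to my knowledge, not the Perl forms \e, \x{263a}, or \Q...\E. It uses \u263a for the wide character and re.escape as the nearest counterpart of \Q...\E.

```python
import re

tab_split = re.split(r"\t", "name\tvalue")              # split on a tab
esc_hex = re.search(r"\x1B", "\x1b[0m") is not None     # ESC written in hex
esc_octal = re.search(r"\033", "\x1b[0m") is not None   # ESC written in octal
smiley = re.search(r"\u263a", "hi \u263a") is not None  # Python spelling of \x{263a}

# re.escape disables meta characters, similar in spirit to \Q...\E.
literal = re.escape("a.b*c")
quoted_hit = re.search(literal, "x a.b*c y") is not None
quoted_miss = re.search(literal, "aXbbc") is not None   # would match if unescaped
unquoted = re.search(r"a.b*c", "aXbbc") is not None

print(tab_split, esc_hex, esc_octal, smiley, quoted_hit, quoted_miss, unquoted)
```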

Repetition Operators

  • * matches 0 or more times
  • + matches 1 or more times
  • ? matches 0 or 1 time
  • {n} matches exactly n times
  • {n,} matches at least n times
  • {n,m} matches at least n but not more than m times
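
A small sketch of the brace quantifiers, assuming Python's re module; the sample strings are invented for illustration.

```python
import re

exact = re.fullmatch(r"\d{4}", "2024") is not None      # exactly four digits
too_short = re.fullmatch(r"\d{4}", "202") is not None   # three digits fail {4}
at_least = re.fullmatch(r"a{2,}", "aaaaa") is not None  # two or more "a"

# Whole words that are three to five characters long.
bounded = re.findall(r"\b\w{3,5}\b", "a tiny regex example")

print(exact, too_short, at_least, bounded)
```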

Anchoring Operators

  • ^ matches at the beginning of the line; for example, ^foo.
  • $ matches at the end of the line; for example, foo$.
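
Whether ^ and $ work per line or per string depends on the engine and its flags. In Python's re module, assumed here, they anchor to the whole string unless re.MULTILINE is passed. The sketch records the start offset of each match in an invented three-line string.

```python
import re

text = "foo one\ntwo foo\nfoo"

# Without MULTILINE, ^ only matches at the start of the whole string.
string_start = [m.start() for m in re.finditer(r"^foo", text)]

# With MULTILINE, ^ and $ match at the start and end of every line.
line_starts = [m.start() for m in re.finditer(r"^foo", text, re.MULTILINE)]
line_ends = [m.start() for m in re.finditer(r"foo$", text, re.MULTILINE)]

print(string_start, line_starts, line_ends)
```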

Word Operators

  • \b matches string at either the beginning or the end of a word. For example, `\brat\b' matches the separate word `rat'.
  • \B matches string within a word. For example, `c\Brat\Be' matches `crate', but `dirty \Brat' doesn't match `dirty rat'.
  • \< matches string at the beginning of a word
  • \> matches string at the end of a word.
  • \w matches any word-constituent character
  • \W matches any character that is not word-constituent.
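
The rat examples above can be checked with the sketch below, which assumes Python's re module. Python supports \b and \B but not \< or \>, so the last two patterns approximate them with a word boundary plus a lookaround; that approximation is my own assumption, not something the list above prescribes.

```python
import re

whole_word = re.findall(r"\brat\b", "a rat, a crate, ratify")  # only the separate word
inside = re.search(r"c\Brat\Be", "crate") is not None          # rat inside a word
not_inside = re.search(r"dirty \Brat", "dirty rat") is not None

# Approximations of \< (start of word) and \> (end of word).
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", "big rat")]
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", "big rat")]

print(whole_word, inside, not_inside, word_starts, word_ends)
```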

Buffer Operators

Following are operators which work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, Regex considers the entire string to be matched as the buffer.

  • \` matches a string at the beginning of the buffer
  • \' matches a string at the end of the buffer
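
The \` and \' operators belong to GNU-style regex syntax. Python's re module, assumed here, does not accept them, but \A and \Z play a similar role: they match only at the very start and end of the string, even in multi-line mode. The sketch contrasts them with ^ and $ by recording match offsets.

```python
import re

text = "first line\nsecond line"

# \A matches once, at the start of the whole string.
buffer_start = [m.start() for m in re.finditer(r"\A\w+", text, re.MULTILINE)]
# ^ with MULTILINE matches at the start of each line.
line_start = [m.start() for m in re.finditer(r"^\w+", text, re.MULTILINE)]

# \Z matches once, at the end of the whole string.
buffer_end = [m.start() for m in re.finditer(r"\w+\Z", text, re.MULTILINE)]
# $ with MULTILINE matches at the end of each line.
line_end = [m.start() for m in re.finditer(r"\w+$", text, re.MULTILINE)]

print(buffer_start, line_start, buffer_end, line_end)
```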


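Non-Greedy Wildcards and Repetitions

Adding ? after a quantifier makes it non-greedy: it matches as few repetitions as possible instead of as many as possible.

  • *? matches 0 or more times
  • +? matches 1 or more times
  • ?? matches 0 or 1 time
  • {n}? matches exactly n times
  • {n,}? matches at least n times
  • {n,m}? matches at least n but not more than m times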
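
The difference between a quantifier with and without the trailing ? is easiest to see side by side. This is a minimal sketch assuming Python's re module; the sample markup string is invented.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Without ?, .+ grabs as much as it can: one match spanning the whole string.
greedy = re.findall(r"<.+>", html)
# With ?, .+? stops at the first possible ">": one match per tag.
lazy = re.findall(r"<.+?>", html)

greedy_range = re.match(r"a{2,4}", "aaaaa").group()   # takes the maximum, four
lazy_range = re.match(r"a{2,4}?", "aaaaa").group()    # takes the minimum, two
lazy_optional = re.match(r"ab??", "ab").group()       # ?? prefers zero "b"

print(greedy, lazy, greedy_range, lazy_range, lazy_optional)
```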


Groups, Lists

  • ( ) group operator
    • example (cat|hat) matches cat or hat
  • [ ] class operator
    • example [jfet] matches j or f or e or t
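
The two examples above, sketched with Python's re module as an assumed engine. A group also captures the text it matched, which the last lines show.

```python
import re

# Group with alternation: either whole word.
alternation = re.findall(r"(cat|hat)", "the cat in the hat sat")

# Character class: any single listed character.
char_class = re.findall(r"[jfet]", "jest")   # "s" is not in the set

# The group captures only the part inside the parentheses.
m = re.search(r"(c|h)at", "a hat")
whole, captured = m.group(0), m.group(1)

print(alternation, char_class, whole, captured)
```
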
From HowTo Wiki, a Wikia wiki.