How to use regular expressions (regex) for pattern matching


In Perl-style regular expression notation, a pattern match is written /pattern/ or m/pattern/.

Characters

Meta characters

  • * matches 0 or more of the previous expression.
  • + matches 1 or more of the previous expression.
  • ? matches 0 or 1 of the previous expression; placed after another quantifier, it forces minimal matching when an expression might match several strings within a search string.
  • . matches any character (except \n newline).
  • ( ) groups part of an expression.
  • [ ] encloses an explicit set of characters to match.
  • { } encloses explicit quantifier notation.
  • \ placed before one of the above, makes it a literal instead of a special character; placed before certain ordinary characters, it forms a special matching sequence (see below).
  • / delimits the pattern, as in /pattern/.
  • | separates alternatives, as in (cat|hat).
  • ^ matches the beginning of a string.
  • $ matches the end of a string.
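
The meta characters above can be tried out in most regex engines. The following is a minimal sketch using Python's standard re module, which is an assumption of this example (the page itself uses /pattern/ notation); the sample strings are hypothetical.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "color colour colouur"

# 'u?' makes the u optional, 'u*' allows any number of u's (including none),
# and 'u+' requires at least one u.
optional_u = re.findall(r"colou?r", text)
any_u = re.findall(r"colou*r", text)
one_or_more_u = re.findall(r"colou+r", text)

# '^' and '$' anchor the pattern to the start and end of the string.
starts_with_color = re.search(r"^color", text) is not None
ends_with_color = re.search(r"color$", text) is not None

# A backslash turns a meta character into a literal: '\.' matches only a dot,
# while an unescaped '.' matches any character.
literal_dot = re.findall(r"a\.b", "a.b axb")
any_char = re.findall(r"a.b", "a.b axb")

print(optional_u, any_u, one_or_more_u)
print(starts_with_color, ends_with_color)
print(literal_dot, any_char)
```

In Python the pattern is passed as a string rather than between slashes, so the / delimiter has no special meaning there.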

Literal characters

Character classes

  • . matches any character except newline.
  • [aeiou] matches any character in the specified set.
  • [^aeiou] matches any character not in the specified set.
  • [0-9a-eA-E] matches any character in the range from the character before the hyphen to the character after it. In this example it matches any character between (and including) 0 through 9, lowercase a through e, or uppercase A through E. Equivalent to [0123456789abcdeABCDE].
  • \p{name} matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
  • \P{name} matches text not included in the groups and block ranges specified in {name}.
  • \w matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
  • \W matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
  • \s matches any white-space character. Equivalent to the escape sequences and Unicode character categories [\f\n\r\t\v\x85\p{Z}].
  • \S matches any non-white-space character. Equivalent to [^\f\n\r\t\v\x85\p{Z}].
  • \d matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
  • \D matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
  • \b matches a word boundary, the spot between word (\w) and non-word (\W) characters. /\bfred\b/i matches Fred but not Alfred or Frederick.
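
As a rough illustration of the classes above, here is a short sketch in Python's re module (an assumption of this example, with hypothetical sample strings). Note that Python's standard re module does not support \p{name}, so only the sets and shorthand classes are shown.

```python
import re

# Hypothetical sample sentence for illustration.
text = "Fred met Alfred and Frederick in room 42B."

# Explicit sets, negated sets, and ranges.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
in_range = re.findall(r"[0-9a-eA-E]", "f0E9z")

# Shorthand classes: \d for digits, \w for word characters.
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", "room 42B.")

# \b word boundaries; re.IGNORECASE plays the role of the /i flag.
whole_word = re.findall(r"\bfred\b", text, flags=re.IGNORECASE)

print(vowels, non_vowels, in_range)
print(digits, words, whole_word)
```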

POSIX character classes

POSIX classes are written inside a bracket expression, so the class [:alnum:] appears in a pattern as [[:alnum:]].

  • [:alnum:] matches alphanumeric characters. [[:alnum:]]{3} matches any three letters or digits, like 7Ds.
  • [:alpha:] matches alphabetic characters, any case. [[:alpha:]]{5} matches five alphabetic characters, any case, like aBcDe.
  • [:blank:] matches space and tab. [[:blank:]]{3,5} matches any three, four, or five spaces and tabs.
  • [:digit:] matches digits. [[:digit:]]{3,5} matches any three, four, or five digits, like 489, 0512, or 31415.
  • [:lower:] matches lowercase alphabetics. [[:lower:]] matches a but not A.
  • [:punct:] matches punctuation characters. [[:punct:]] matches ! or . or , but not a or 3.
  • [:space:] matches all whitespace characters, including newline and carriage return. [[:space:]] matches any space, tab, newline, or carriage return.
  • [:upper:] matches uppercase alphabetics. [[:upper:]] matches A but not a.
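
Not every engine understands POSIX class names; Python's standard re module, for one, does not. The sketch below shows one way to approximate them with explicit sets. The POSIX_EQUIVALENTS table and the posix_match helper are hypothetical names invented for this example, and the sets cover ASCII only.

```python
import re
import string

# Hand-written ASCII stand-ins for the POSIX classes listed above.
# This table is an assumption of the example, not part of any library.
POSIX_EQUIVALENTS = {
    "alnum": "[A-Za-z0-9]",
    "alpha": "[A-Za-z]",
    "blank": "[ \\t]",
    "digit": "[0-9]",
    "lower": "[a-z]",
    "upper": "[A-Z]",
    "space": "[ \\t\\n\\r\\f\\v]",
    "punct": "[" + re.escape(string.punctuation) + "]",
}


def posix_match(name, quantifier, text):
    """Return True if the whole text matches the named class plus quantifier."""
    pattern = POSIX_EQUIVALENTS[name] + quantifier
    return re.fullmatch(pattern, text) is not None


print(posix_match("alnum", "{3}", "7Ds"))
print(posix_match("digit", "{3,5}", "05"))
```

An empty string can be passed as the quantifier to test a single character, for example posix_match("punct", "", "!").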


Escape sequences

  • \t matches tab (HT, TAB)
  • \n matches newline (LF, NL)
  • \r matches return (CR)
  • \f matches form feed (FF)
  • \a matches alarm (bell) (BEL)
  • \e matches escape (think troff) (ESC)
  • \033 matches a character given in octal (think of a PDP-11)
  • \x1B matches a character given in hex
  • \x{263a} matches a character given in wide hex (Unicode SMILEY)
  • \c[ matches a control character
  • \N{name} matches a named character
  • \l lowercases the next character (think vi)
  • \u uppercases the next character (think vi)
  • \L lowercases until \E (think vi)
  • \U uppercases until \E (think vi)
  • \E ends case modification (think vi)
  • \Q quotes (disables) pattern meta characters until \E
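
Support for these sequences varies between engines. As a hedged sketch, the Python re module (an assumption of this example) accepts \t and \x1B directly, uses \u263a where the list above writes \x{263a}, and has no \Q...\E; its re.escape() function plays a similar quoting role. The sample strings are hypothetical.

```python
import re

# \t and \x1B work in a Python pattern as described in the list above.
tab_found = re.search(r"\t", "a\tb") is not None
esc_found = re.search(r"\x1B", "\x1b[0m") is not None

# Python writes a wide hex character as \uXXXX rather than \x{XXXX}.
smiley_found = re.search(r"\u263a", "hi \u263a") is not None

# re.escape() disables every meta character in a string, much as \Q...\E would.
user_text = "1+1=2 (really?)"
haystack = "proof: 1+1=2 (really?)"
quoted_hit = re.search(re.escape(user_text), haystack) is not None

# Without quoting, '+', '(', ')' and '?' are treated as operators,
# so the same text no longer matches itself.
unquoted_hit = re.search(user_text, haystack) is not None

print(tab_found, esc_found, smiley_found)
print(quoted_hit, unquoted_hit)
```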

Repetition operators

  • * matches 0 or more times
  • + matches 1 or more times
  • ? matches 0 or 1 times
  • {n} matches exactly n times
  • {n,} matches at least n times
  • {n,m} matches at least n but not more than m times
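
To make the counts concrete, the sketch below runs each repetition operator against a small set of hypothetical strings, using Python's re module (an assumption of this example). The matching helper is invented for illustration.

```python
import re

# Hypothetical samples: zero to four a's followed by a b.
samples = ["b", "ab", "aab", "aaab", "aaaab"]


def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]


zero_or_more = matching(r"a*b")
one_or_more = matching(r"a+b")
zero_or_one = matching(r"a?b")
exactly_two = matching(r"a{2}b")
at_least_two = matching(r"a{2,}b")
two_to_three = matching(r"a{2,3}b")

print(zero_or_more)
print(one_or_more)
print(zero_or_one)
print(exactly_two, at_least_two, two_to_three)
```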

Anchoring operators

  • ^ the match must start at the beginning of the line. Example: ^foo
  • $ the match must end at the end of the line. Example: foo$
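
Whether ^ and $ anchor at every line or only at the ends of the whole string depends on the engine and its flags. A minimal sketch in Python's re module (an assumption of this example), where the re.MULTILINE flag selects per-line anchoring; the sample text is hypothetical:

```python
import re

# Hypothetical three-line input.
log = "foo starts here\nno match: foo\nfoo"

# With re.MULTILINE, '^' and '$' anchor at the start and end of each line.
line_starts = re.findall(r"^foo", log, flags=re.MULTILINE)
line_ends = re.findall(r"foo$", log, flags=re.MULTILINE)

# Without the flag, they anchor only at the start and end of the whole string.
string_start = re.findall(r"^foo", log)
string_end = re.findall(r"foo$", log)

print(len(line_starts), len(line_ends))
print(len(string_start), len(string_end))
```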

Word operators

  • \b matches the empty string at either the beginning or the end of a word. For example, `\brat\b' matches the separate word `rat'.
  • \B matches the empty string within a word. For example, `c\Brat\Be' matches `crate', but `dirty \Brat' doesn't match `dirty rat'.
  • \< matches the empty string at the beginning of a word.
  • \> matches the empty string at the end of a word.
  • \w matches any word-constituent character.
  • \W matches any character that is not word-constituent.
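
The examples above can be checked in Python's re module (an assumption of this example), which supports \b and \B but not \< or \>. The two lookaround patterns below are one possible emulation of \< and \>, offered as a sketch rather than an exact equivalent; the sample strings are hypothetical.

```python
import re

# \b on both sides selects the separate word only.
whole_word = re.findall(r"\brat\b", "a rat, a crate, ratify")

# \B requires the position to be inside a word.
inside_word = re.search(r"c\Brat\Be", "crate") is not None
after_space = re.search(r"dirty \Brat", "dirty rat") is not None

# Emulating \< (word start) and \> (word end) with a boundary plus a lookaround.
WORD_START = r"\b(?=\w)"
WORD_END = r"\b(?<=\w)"
starts = re.findall(WORD_START + r"rat", "rat brat ratify")
ends = re.findall(r"rat" + WORD_END, "rat brat ratify")

print(whole_word, inside_word, after_space)
print(len(starts), len(ends))
```

Here the word-start pattern finds rat in "rat" and "ratify", while the word-end pattern finds it in "rat" and "brat".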

Buffer operators

The following operators work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, Regex considers the entire string to be matched as the buffer.

  • \` matches the empty string at the beginning of the buffer
  • \' matches the empty string at the end of the buffer
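
Python's re module (an assumption of this example) does not provide the \` and \' operators, but its \A and \Z anchors serve a comparable purpose: they match only at the start and end of the whole string, even when ^ and $ are switched to per-line mode. A short sketch with hypothetical input:

```python
import re

# Hypothetical two-line input, treated as the whole "buffer".
text = "first line\nsecond line"

# \A and \Z ignore re.MULTILINE and anchor to the whole string.
buffer_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)
buffer_end = re.findall(r"\w+\Z", text, flags=re.MULTILINE)

# For comparison, '^' and '$' match at every line in the same mode.
line_start = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_end = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(buffer_start, buffer_end)
print(line_start, line_end)
```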


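Non-greedy wildcards and repetitions

Adding ? after a repetition operator makes it non-greedy: it matches as few times as possible instead of as many as possible.

  • *? matches 0 or more times, as few as possible
  • +? matches 1 or more times, as few as possible
  • ?? matches 0 or 1 time, preferring 0
  • {n}? matches exactly n times
  • {n,}? matches at least n times, as few as possible
  • {n,m}? matches at least n but not more than m times, as few as possible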
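
The difference between the plain operators and their ? forms shows up when more than one match length is possible. A minimal sketch in Python's re module (an assumption of this example), with hypothetical input strings:

```python
import re

# Hypothetical markup with two bold spans.
html = "<b>one</b> and <b>two</b>"

# '.*' takes as much as it can, so it runs to the last closing tag.
greedy = re.findall(r"<b>.*</b>", html)

# '.*?' takes as little as it can, so each span is matched separately.
minimal = re.findall(r"<b>.*?</b>", html)

# The same applies to counted repetitions and to '+' and '?'.
greedy_count = re.match(r"\d{2,4}", "12345").group()
minimal_count = re.match(r"\d{2,4}?", "12345").group()
minimal_plus = re.match(r"a+?", "aaa").group()
minimal_optional = re.match(r"a??", "a").group()

print(greedy)
print(minimal)
print(greedy_count, minimal_count, minimal_plus, repr(minimal_optional))
```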


Groups and lists

  • ( ) group operator
    • Example: (cat|hat) matches cat or hat.
  • [ ] class operator
    • Example: [jfet] matches j or f or e or t.
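
Both operators can be seen side by side in a short sketch using Python's re module (an assumption of this example; the input strings are hypothetical). In Python, a parenthesized group also captures the text it matched, which can be read back with group(1).

```python
import re

# The group limits the alternation to cat or hat; 's?' applies to either one.
match = re.search(r"(cat|hat)s?", "three hats")
whole = match.group(0)
captured = match.group(1)

# The class matches exactly one character out of j, f, e, t.
class_hits = re.findall(r"[jfet]", "jet fuel")

print(whole, captured)
print(class_hits)
```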
From HowTo Wiki, a Wikia wiki.