Perl 5 Regex Cheat sheet
When learning regexes, or when you need to use a feature you have not used yet or don't use often, it can be quite useful to have a place for quick look-up. I hope this Regex Cheat-sheet will provide such aid for you.
This document presents a tabular summary of the regular expression (regexp) syntax in Perl, then illustrates it with a collection of annotated examples.
Metacharacters
To present a metacharacter as a data character standing for itself, precede it with \ (a backslash). Note that the physical appearance (glyph) of a character may vary from one device or program or font to another.
Repetition
Read the notation a’s as “occurrences of strings, each of which matches the pattern a”. Read repetition as any of the repetition expressions listed above it. Shortest match means that the shortest string matching the pattern is taken. The default is “greedy matching”, which finds the longest match.
Special notations with \
notation | matches... |
---|---|
\w | matches any single character classified as a “word” character (alphanumeric or “_”) |
\W | matches any non-“word” character |
\s | matches any whitespace character (space, tab, newline) |
\S | matches any non-whitespace character |
\d | matches any digit character, equiv. to [0-9] |
\D | matches any non-digit character |
Character sets: specialities inside [...]
Different meanings apply inside a character set (“character class”) denoted by [...] so that, instead of the normal rules given here, the following apply:
[characters] | matches any of the characters in the sequence |
[x-y] | matches any of the characters from x to y (inclusively) in the ASCII code |
[\-] | matches the hyphen character “-” |
[\n] | matches the newline; other single character denotations with \ apply normally, too |
[^something] | matches any character except those that [something] denotes; that is, immediately after the leading “[”, the circumflex “^” means “not” applied to all of the rest |
Examples
expression | matches... |
---|---|
abc | abc (that exact character sequence, but anywhere in the string) |
^abc | abc at the beginning of the string |
abc$ | abc at the end of the string |
a|b | either of a and b |
^abc|abc$ | the string abc at the beginning or at the end of the string |
ab{2,4}c | an a followed by two, three or four b’s followed by a c |
ab{2,}c | an a followed by at least two b’s followed by a c |
ab*c | an a followed by any number (zero or more) of b’s followed by a c |
ab+c | an a followed by one or more b’s followed by a c |
ab?c | an a followed by an optional b followed by a c; that is, either abc or ac |
a.c | an a followed by any single character (not newline) followed by a c |
a\.c | a.c exactly |
[abc] | any one of a, b and c |
[Aa]bc | either of Abc and abc |
[abc]+ | any (nonempty) string of a’s, b’s and c’s (such as a, abba, acbabcacaa) |
[^abc]+ | any (nonempty) string which does not contain any of a, b and c (such as defg) |
\d\d | any two decimal digits, such as 42; same as \d{2} |
\w+ | a “word”: a nonempty sequence of alphanumeric characters and low lines (underscores), such as foo and 12bar8 and foo_1 |
100\s*mk | the strings 100 and mk optionally separated by any amount of white space (spaces, tabs, newlines) |
abc\b | abc when followed by a word boundary (e.g. in abc! but not in abcd) |
perl\B | perl when not followed by a word boundary (e.g. in perlert but not in perl stuff) |
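Several rows of the table can be checked directly in Perl. The following is only a minimal sketch: the sample strings and the names @checks and @results are made up for illustration and are not part of the original examples.

```perl
use strict;
use warnings;

# Each entry: [pattern, sample string, 1 if a match is expected, else 0]
my @checks = (
    [ qr/ab{2,4}c/, 'xabbbcx',    1 ],  # two to four b's between a and c
    [ qr/ab{2,4}c/, 'abc',        0 ],  # only one b
    [ qr/ab?c/,     'ac',         1 ],  # the b is optional
    [ qr/a.c/,      'a-c',        1 ],  # . matches any character but newline
    [ qr/a\.c/,     'a-c',        0 ],  # \. matches a literal dot only
    [ qr/\d\d/,     'room 42',    1 ],  # two consecutive digits
    [ qr/100\s*mk/, "100 \t mk",  1 ],  # any amount of white space
    [ qr/abc\b/,    'abc!',       1 ],  # word boundary after abc
    [ qr/abc\b/,    'abcd',       0 ],  # no boundary between c and d
    [ qr/perl\B/,   'perlert',    1 ],  # no boundary after perl
    [ qr/perl\B/,   'perl stuff', 0 ],  # boundary after perl
);

my @results;
for my $check (@checks) {
    my ($re, $string, $expected) = @$check;
    my $matched = $string =~ $re ? 1 : 0;
    push @results, $matched;
    print "$re on '$string': ", ($matched ? 'match' : 'no match'), "\n";
}
```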
Examples of simple use in Perl statements
These examples use very simple regexps only. The intent is just to show contexts where regexps might be used, as well as the effect of some “flags” to matching and replacements. Note in particular that matching is by default case-sensitive (Abc does not match abc unless specified otherwise).
s/foo/bar/;
replaces the first occurrence of the exact character sequence foo in the “current string” (in special variable $_) by the character sequence bar; for example, foolish bigfoot would become barlish bigfoot
s/foo/bar/g;
replaces any occurrence of the exact character sequence foo in the “current string” by the character sequence bar; for example, foolish bigfoot would become barlish bigbart
s/foo/bar/gi;
replaces any occurrence of foo case-insensitively in the “current string” by the character sequence bar (e.g. Foo and FOO get replaced by bar too)
if(m/foo/) ...
tests whether the current string contains the string foo
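The statements above can be tried out with a short script. This is just a sketch; the variable names are invented for the illustration, and the strings are bound with =~ instead of relying on $_ everywhere so that each case is kept apart.

```perl
use strict;
use warnings;

my $first = 'foolish bigfoot';
$first =~ s/foo/bar/;            # first occurrence only

my $all = 'foolish bigfoot';
$all =~ s/foo/bar/g;             # every occurrence

my $any_case = 'Foo, FOO and foo';
$any_case =~ s/foo/bar/gi;       # every occurrence, ignoring case

$_ = 'a foolish idea';           # the "current string"
my $found = m/foo/ ? 'yes' : 'no';

print "$first\n";      # barlish bigfoot
print "$all\n";        # barlish bigbart
print "$any_case\n";   # bar, bar and bar
print "$found\n";      # yes
```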
Date of creation: 2000-01-28. Last revision: 2007-04-16. Last modification: 2007-05-28.
Finnish translation – suomennos: Säännölliset lausekkeet Perlissä.
The inspiration for my writing this document was Appendix : A Summary of Perl Regular Expressions in Pankaj Kamthan’s CGI Security : Better Safe than Sorry, and my own repeated failures to memorize the syntax.
A Perl regular expression usually comes in a form something like this:
$var =~ m/PATTERN/i
Here we divide the expression into 4 parts:
=~ | Match Operators, the operator between the variable and the expression |
m// | Quote-like Operators, appears after match operator |
/i | Options, the modifiers after the expression |
PATTERN | the Expression |
=~
This operator appears between the string var you are comparing, and the regular expression you’re looking for (note that in selection or substitution a regular expression operates on the string var rather than comparing).
!~
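Just like =~, except that the return value is logically negated. With matching, it returns true if the pattern DOESN’T match. With a substitution or transliteration, the operation is still carried out and only the returned value is negated, which is rarely what you want.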
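A small sketch of the two binding operators follows. The sample string and variable names are assumptions made for the example only.

```perl
use strict;
use warnings;

my $line = 'error: disk full';

# =~ is true when the pattern matches, !~ when it does not
my $is_error    = ($line =~ /^error/)   ? 'yes' : 'no';
my $not_warning = ($line !~ /^warning/) ? 'yes' : 'no';

# With a substitution, =~ names the string to operate on.
# The parentheses copy $line first, so the original stays untouched.
(my $fixed = $line) =~ s/^error/warning/;

print "$is_error $not_warning\n";   # yes yes
print "$fixed\n";                   # warning: disk full
print "$line\n";                    # error: disk full
```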
qr/STRING/
This operator quotes (and possibly compiles) its STRING as a regular expression. STRING is interpolated the same way as PATTERN in m/PATTERN/. If ' (a single quote) is used as the delimiter, no interpolation is done. Returns a Perl value which may be used instead of the corresponding /STRING/msixpodual expression. The returned value is a normalized version of the original pattern. It magically differs from a string containing the same characters: ref(qr/x/) returns “Regexp”; however, dereferencing it is not well defined (you currently get the normalized version of the original pattern, but this may change).
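As a sketch of how qr// is typically used, the example below builds a small pattern once and then interpolates it into a larger one. The names $word and $pair and the sample string are made up for the illustration.

```perl
use strict;
use warnings;

my $word = qr/[a-z]+/i;            # quoted (and compiled) with its own /i flag
my $pair = qr/($word)=($word)/;    # interpolated into a larger pattern

# A qr// value can stand in for the whole pattern after =~
my ($key, $value) = 'Colour=Blue' =~ $pair;

my $type = ref $word;              # "Regexp"

print "$key $value $type\n";       # Colour Blue Regexp
```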
m/PATTERN/
/PATTERN/
Searches a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails. If no string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with =~ need not be an lvalue; it may be the result of an expression evaluation, but remember that =~ binds rather tightly.)
* The empty pattern //
If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead.
* Matching in list context
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, that is, ($1, $2, $3…).
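For instance, a date can be split into its parts in one assignment. This is a minimal sketch with an invented sample string; a failed match yields an empty list, which the last lines show.

```perl
use strict;
use warnings;

my $date_re = qr/^(\d{4})-(\d\d)-(\d\d)$/;

# List context, no /g: the result is ($1, $2, $3)
my ($year, $month, $day) = '2007-05-28' =~ $date_re;

# A failed match returns the empty list
my @none  = 'not a date' =~ $date_re;
my $count = @none;

print "$year $month $day\n";   # 2007 05 28
print "$count\n";              # 0
```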
m?PATTERN?
?PATTERN?
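This is just like the m/PATTERN/ search, except that it matches only once between calls to the reset() operator. Note that the form without the leading m (?PATTERN?) was removed in Perl 5.22, so current code has to use m?PATTERN?.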
s/PATTERN/REPLACEMENT/
Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string).
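The return value can be captured like any other. A short sketch, with made-up variable names:

```perl
use strict;
use warnings;

my $text = 'a-b-c';

# With /g, the return value counts every replacement made
my $count = ($text =~ s/-/+/g);

# No match: the string is left alone and the result is the empty string
my $none = ($text =~ s/x/y/);

print "count=$count text=$text none='$none'\n";   # count=2 text=a+b+c none=''
```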
\G assertion
You can intermix m//g matches with m/\G…/g, where \G is a zero-width assertion that matches the exact position where the previous m//g, if any, left off. Without the /g modifier, the \G assertion still anchors at pos() as it was at the start of the operation (see “pos” in perlfunc), but the match is of course only attempted once.
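A common use of \G together with the /g and /c modifiers is a small scanner that tries one pattern after another at the current position. The sketch below is only an illustration under assumed token rules (identifiers, numbers, and the = sign); the token names are hypothetical.

```perl
use strict;
use warnings;

my $input = 'x1 = 42';
my @tokens;

pos($input) = 0;                       # start scanning at the beginning
while (pos($input) < length $input) {
    # /c keeps pos() where it was when an attempt fails,
    # so the next alternative starts at the same place.
    if    ($input =~ /\G\s+/gc)            { next }
    elsif ($input =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "IDENT($1)" }
    elsif ($input =~ /\G(\d+)/gc)          { push @tokens, "NUM($1)" }
    elsif ($input =~ /\G(=)/gc)            { push @tokens, "OP($1)" }
    else  { die "Unexpected character at position " . pos($input) . "\n" }
}

print join(' ', @tokens), "\n";        # IDENT(x1) OP(=) NUM(42)
```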
Options (specified by the following modifiers) are:
- m Treat string as multiple lines.
- s Treat string as single line. (Make . match a newline)
- i Do case-insensitive pattern matching.
- x Use extended regular expressions.
- p Preserve a copy of the matched string in ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH}
- o Compile pattern only once.
- a ASCII-restrict
- l Use the locale
- u Use Unicode rules
- d Use Unicode or native charset, as in 5.12 and earlier
- g Match globally, i.e., find all occurrences
- c Do not reset search position on a failed match when /g is in effect
- e Evaluate ‘replacement’ as an expression
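The effect of some of these modifiers is easiest to see side by side. The following sketch uses invented sample text and variable names; it covers /m, /s, /x and /e only.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# /m: ^ matches at the start of every line, not just of the string
my @starts   = $text =~ /^(\w+)/g;     # ('first')
my @starts_m = $text =~ /^(\w+)/mg;    # ('first', 'second')

# /s: . also matches a newline
my $dot_plain = $text =~ /first.*second/  ? 1 : 0;   # 0
my $dot_s     = $text =~ /first.*second/s ? 1 : 0;   # 1

# /x: white space and comments inside the pattern are ignored
my $time = qr/
    (\d{1,2})   # hours
    :
    (\d{2})     # minutes
/x;
my ($hours, $minutes) = '9:05' =~ $time;

# /e: the replacement is evaluated as Perl code
(my $doubled = 'a3 b10') =~ s/(\d+)/$1 * 2/ge;       # 'a6 b20'

print "@starts | @starts_m | $dot_plain $dot_s | $hours $minutes | $doubled\n";
```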
Basic Metacharacters
- . Match any single character except \n (unless /s)
- | OR; (ab|ac) matches ab or ac
- [abc] Match one out of a set of characters
- [^abc] Match one character not in set
- [a-z] Match one character from range, often [a-zA-Z]
- \ Escape next character, such as \/ or \( or \)
Quantifiers
- * Match zero or more of previous character/subexpression
- + Match one or more of previous character/subexpression
- ? Match 0 or 1 of previous character/subexpression
- {n} Match exactly n of previous character/subexpression
- {m,n} Match m to n (inclusive) of previous character/subexp.
- {n,} Match n or more of previous character/subexpression
- *?, ?? Lazy version of same (works for any quantifier)
- *+, ?+ Possessive version (works for any quantifier)
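The difference between the greedy, lazy and possessive forms shows up in what they match on the same input. A minimal sketch with made-up strings follows; the possessive form assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .+ takes as much as it can, so the match runs to the last >
my ($greedy) = $html =~ /(<.+>)/;

# Lazy: .+? takes as little as it can, so the match stops at the first >
my ($lazy) = $html =~ /(<.+?>)/;

# Possessive: a++ never gives characters back, so the final a cannot match
my $backtracks = 'aaa' =~ /a+a/  ? 1 : 0;
my $possessive = 'aaa' =~ /a++a/ ? 1 : 0;

print "$greedy\n";                    # <b>bold</b> and <i>italic</i>
print "$lazy\n";                      # <b>
print "$backtracks $possessive\n";    # 1 0
```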
Specific Characters
- \w Word character (alphanumeric, underscore)
- \W Opposite of \w
- \s Whitespace character (space, tab, etc.)
- \S Opposite of \s
- \d Digit
- \D Opposite of \d
- [\b] Backspace (any use of \b in a character set)
- \n Newline
- \c Control character
- \f Form feed
- \r Carriage return
- \t Tab
- \v Vertical tab
- \x Hexadecimal number; \xf0 matches hex f0
- \0 Octal number; \021 matches octal 21
Anchors
- ^ Start of string (equivalent: \A unless /m is used)
- $ End of string (equivalent: \Z unless /m is used)
- \b Word boundary, similar to: (\w\W|\W\w), but zero-width
- \B Any position that is not a word boundary
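The anchors can be compared on a two-line string. This is a sketch only; the sample text is invented for the purpose.

```perl
use strict;
use warnings;

my $text = "cat concat\ncatalog";

# \b on both sides: only the stand-alone word counts
my @whole_words = $text =~ /\bcat\b/g;

# With /m, ^ matches at each line start; \A still means start of string
my @line_starts   = $text =~ /^cat/mg;
my @string_starts = $text =~ /\Acat/mg;

# $ and \Z both tolerate one trailing newline at the end of the string
my $dollar  = "abc\n" =~ /abc$/  ? 1 : 0;
my $big_z   = "abc\n" =~ /abc\Z/ ? 1 : 0;

print scalar(@whole_words), ' ', scalar(@line_starts), ' ',
      scalar(@string_starts), " $dollar $big_z\n";   # 1 2 1 1 1
```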
Subexpressions
- ( ) Define a subexpression
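- $1, $2, ... The first, second, ... subexpression, for use in the replacement or after the match
- \1, \2, ... The first, second, ... subexpression, for use inside the pattern itself (backreference)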
- (?:a) Non-capturing parentheses (match a)
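A brief sketch of the three forms follows; the sample strings and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# \1 inside the pattern: find a word that is immediately repeated
my ($repeated) = 'this is is a test' =~ /\b(\w+)\s+\1\b/;

# $1 and $2 in the replacement: swap the two captured words
(my $swapped = 'Doe, John') =~ s/(\w+), (\w+)/$2 $1/;

# (?:...) groups without capturing, so only the outer group is returned
my @spellings = 'color colour' =~ /(col(?:o|ou)r)/g;

print "$repeated\n";     # is
print "$swapped\n";      # John Doe
print "@spellings\n";    # color colour
```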
Case Conversion
- \l Make next character lowercase
- \u Make next character uppercase
- \L Make entire string (up to \E) lowercase
- \U Make entire string (up to \E) uppercase
- \E End \L or \U (so they only apply before \E)
- \u\L Capitalize first char, lowercase rest (sentence)
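These escapes work in double-quoted strings and in the replacement part of s///. A minimal sketch with invented values:

```perl
use strict;
use warnings;

my $name = 'pERL';

my $lower   = "\L$name";        # whole string lowercased
my $upper   = "\U$name";        # whole string uppercased
my $capital = "\u\L$name";      # first character upper, rest lower
my $partial = "\Uabc\E-def";    # \E ends the effect of \U

# In a replacement: capitalize the first letter of every word
(my $title = 'hello world') =~ s/(\w+)/\u$1/g;

print "$lower $upper $capital $partial\n";   # perl PERL Perl ABC-def
print "$title\n";                            # Hello World
```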
Look-around
- (?=a) Look-ahead
- (?<=a) Look-behind
- (?!a) Negative look-ahead
- (?<!a) Negative look-behind
- (?(a)b) Conditional; if a then b
- (?(a)b|c) Conditional; if a then b else c
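The look-around assertions test the surrounding text without making it part of the match. The sketch below uses an invented price list; the conditional at the end uses a capture group number as its condition, which is one of several kinds of condition Perl accepts.

```perl
use strict;
use warnings;

my $prices = 'apples $5, pears 7 EUR, plums $12';

# Look-behind: digits preceded by a dollar sign
my @dollars = $prices =~ /(?<=\$)\d+/g;

# Look-ahead: digits followed by " EUR"
my @euros = $prices =~ /\d+(?= EUR)/g;

# Negative look-behind: whole numbers not preceded by a dollar sign
my @no_dollar = $prices =~ /(?<!\$)\b\d+/g;

# Negative look-ahead: words starting with foo, unless bar follows
my @not_bar = 'foobar foobaz' =~ /foo(?!bar)\w*/g;

# Conditional: require ")" only if group 1 matched an opening "("
my $balanced = qr/^(\()?\d+(?(1)\))$/;
my @ok = grep { $_ =~ $balanced } ('(12)', '12', '(12', '12)');

print "@dollars | @euros | @no_dollar | @not_bar | @ok\n";
# 5 12 | 7 | 7 | foobaz | (12) 12
```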