Matching Same Text Again & Again
\group_number
A backreference (\1 refers to the first capturing group) matches the same text that the referenced capturing group matched earlier.
For Example
(\d)\1
It can match 00, 11, 22, 33, 44, 55, 66, 77, 88 or 99.
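As a quick check of (\d)\1, here is a minimal sketch using Python's built-in re module. The choice of Python and the sample strings are assumptions made for illustration; they are not part of the original notes.

```python
import re

# (\d) captures one digit; \1 must then match that same digit again.
pattern = re.compile(r"(\d)\1")

# Sample inputs chosen for illustration only.
for text in ["00", "99", "12", "a77b"]:
    m = pattern.search(text)
    print(text, "->", m.group(0) if m else None)
```

The string 12 is rejected because \1 has to repeat the captured digit 1, not merely match another digit.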
Task
String that should match:
ab #1?AZa$ab #1?AZa$
Regex:
^([a-z])(\w)(\s)(\W)(\d)(\D)([A-Z])([a-zA-Z])([aeiouAEIOU])(\S)\1\2\3\4\5\6\7\8\9\10$
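The task regex can be sanity-checked with a short Python sketch. This assumes Python's re module, where \10 after ten capturing groups is read as a reference to group 10; the negative test string is made up for illustration.

```python
import re

# Ten capturing groups followed by ten backreferences, as in the task.
task_pattern = re.compile(
    r"^([a-z])(\w)(\s)(\W)(\d)(\D)([A-Z])([a-zA-Z])([aeiouAEIOU])(\S)"
    r"\1\2\3\4\5\6\7\8\9\10$"
)

match = task_pattern.match("ab #1?AZa$ab #1?AZa$")
print(match.groups() if match else None)

# Changing one character in the repeated half breaks the backreferences.
print(task_pattern.match("ab #1?AZa$ac #1?AZa$"))
```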
Backreferences To Failed Groups
A backreference to a capturing group that matched nothing is different from a backreference to a capturing group that did not participate in the match at all.
Capturing group that matches nothing
(b?)o\1
matches o.
Here, b? is optional and matches nothing. Thus, (b?) matches successfully and captures an empty string. Next, o matches o, and \1 successfully matches the empty string captured by the group.
Capturing group that didn’t participate in the match at all
(b)?o\1
In most regex flavors (excluding JavaScript), (b)?o\1 fails to match o.
Here, (b) fails to match at all. Since the whole group is optional, the regex engine still proceeds to match o. The engine then arrives at \1, which references a group that did not participate in the match attempt at all. Thus, the backreference fails, and so does the overall match.
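The difference between the two patterns can be observed directly. The sketch below assumes Python's re module, which behaves like the majority of flavors here; the variable names are illustrative.

```python
import re

# Group 1 always participates; it may capture the empty string.
empty_capture = re.compile(r"(b?)o\1")
# Group 1 is optional as a whole; it may not participate at all.
skipped_group = re.compile(r"(b)?o\1")

m = empty_capture.fullmatch("o")
print(repr(m.group(1)))              # the group captured an empty string

print(skipped_group.fullmatch("o"))  # no match: \1 refers to an unset group

# Both patterns match when b is actually present on both sides.
print(bool(empty_capture.fullmatch("bob")), bool(skipped_group.fullmatch("bob")))
```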
Task
Strings that should match:
12-34-56-78
12345678
Solution 1
^\d{2}(-?)\d{2}\1\d{2}\1\d{2}$
Solution 2
^\d{2}(-?)(\d{2}\1){2}\d{2}$
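Both solutions can be compared side by side with a small Python sketch. The two mixed-separator inputs are assumptions added to show that (-?) forces the separator choice to be consistent.

```python
import re

solution_1 = re.compile(r"^\d{2}(-?)\d{2}\1\d{2}\1\d{2}$")
solution_2 = re.compile(r"^\d{2}(-?)(\d{2}\1){2}\d{2}$")

# The first two strings come from the task; the last two are
# illustrative inputs with inconsistent separators.
cases = ["12-34-56-78", "12345678", "12-345678", "1234-5678"]

for text in cases:
    print(text, bool(solution_1.match(text)), bool(solution_2.match(text)))
```

Because (-?) captures either a hyphen or an empty string, every later \1 must repeat exactly that choice.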
Branch Reset Groups
NOTE - Branch reset group is supported by Perl, PHP, Delphi and R.
(?|regex)
A branch reset group consists of alternations and capturing groups. (?|(regex1)|(regex2))
The alternatives in a branch reset group share the same capturing group numbers.
(?|(Haa)|(Hee)|(bye)|(k))\1
matches HaaHaa and kk
Task
Strings that should match:
12-34-56-78
12:34:56:78
12---34---56---78
12.34.56.78
Regex:
/^\d{2}(?|(-)|(:)|(---)|(\.)|){1}(\d{2}\1){2}\d{2}$/
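Python's built-in re module does not accept the (?|...) syntax. As a substitute technique (not a branch reset group), the sketch below wraps the whole alternation in one ordinary capturing group, which works here because each alternative contains exactly one group. The helper name compiles and the mixed-separator test string are assumptions for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The branch reset syntax itself is rejected by re.
branch_reset_ok = compiles(r"(?|(Haa)|(Hee)|(bye)|(k))\1")
print(branch_reset_ok)

# Substitute: a single capturing group around the alternation, so \1
# always refers to whichever alternative matched.
simple = re.compile(r"(Haa|Hee|bye|k)\1")
task = re.compile(r"^\d{2}(-|:|---|\.)(\d{2}\1){2}\d{2}$")

for text in ["12-34-56-78", "12:34:56:78", "12---34---56---78",
             "12.34.56.78", "12-34:56.78"]:
    print(text, bool(task.match(text)))
```

A true branch reset group is still needed when the alternatives contain several groups each, or when the groups sit at different nesting depths.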
Forward References
NOTE - Forward reference is supported by JGsoft, .NET, Java, Perl, PCRE, PHP, Delphi and Ruby regex flavors.
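A forward reference is a backreference to a capturing group that appears later in the regex.
Forward references are only useful if they’re inside a repeated group, because on a later iteration the regex engine can evaluate the backreference after the group has already matched.
(\2amigo|(go!))+
matches go!go!amigo
On the first iteration, \2 fails because group 2 has not captured anything yet, so the second alternative matches go!. On the second iteration, \2 matches the captured go! and amigo follows.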
Task
Strings that should match:
tactactic
tactactictactic
Regex:
^(\2tic|(tac))+$
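Python's re module is not among the flavors listed in the note above and rejects forward references at compile time. The sketch below shows that, then uses a hand-written equivalent for this particular task: since group 2 can only ever capture tac, the pattern accepts tac followed by any mix of tactic and tac. The rewritten pattern and the helper name compiles are assumptions, not the original solution.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# re refuses \2 because group 2 is not yet defined at that point.
forward_ref_ok = compiles(r"^(\2tic|(tac))+$")
print(forward_ref_ok)

# Hand-written equivalent for this task only: the first unit must be
# "tac"; each later unit is "tactic" (\2tic) or "tac".
rewritten = re.compile(r"^tac(?:tactic|tac)*$")

for text in ["tactactic", "tactactictactic", "tactic", "tictac"]:
    print(text, bool(rewritten.match(text)))
```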