Regex for Alternating Numbers - python

I am trying to write a regex pattern for phone numbers consisting of 9 fixed digits.
I want to identify numbers in which two digits alternate four times, such as 5XYXYXYXY.
I used the below sample
number = 561616161
I tried the below pattern but it is not accurate
^5(\d)(?=\d\1).+
Can someone point out what I am doing wrong?

I would use:
^(?=\d{9}$)\d*(\d)(\d)(?:\1\2){3}\d*$
Here is an explanation of the pattern:
^ from the start of the number
(?=\d{9}$) assert exactly 9 digits
\d* match optional leading digits
(\d) capture a digit in \1
(\d) capture another digit in \2
(?:\1\2){3} match the XY combination 3 more times
\d* more optional digits
$ end of the number
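As a quick check, the pattern above can be exercised with Python's re module. This is only a minimal sketch: the helper name has_alternating_pair and every sample number other than 561616161 are made up for illustration.

```python
import re

# Pattern from the answer above: exactly 9 digits that contain XYXYXYXY somewhere.
ALTERNATING = re.compile(r'^(?=\d{9}$)\d*(\d)(\d)(?:\1\2){3}\d*$')

def has_alternating_pair(number):
    """Return True if the 9-digit number contains a digit pair repeated 4 times in a row."""
    return ALTERNATING.search(str(number)) is not None

for sample in (561616161, 561616162, 123456789):
    print(sample, has_alternating_pair(sample))
```

Note that X and Y are not required to differ here, so a number such as 111111111 is accepted as well.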

If you want to match the same pair of digits repeated 4 times, with 9 digits in total:
^(?=\d{9}$)\d*(\d\d)\1{3}\d*$
Explanation
^ Start of string
(?=\d{9}$) Positive lookahead, assert 9 digits till the end of the string
\d* Match optional digits
(\d\d)\1{3} Capture group 1, match 2 digits and then repeat what is captured in group 1 3 times
\d* Match optional digits
$ End of string
If you want to match a pair of digits repeated 4 times where the 2 digits of the pair are not the same:
^(?=\d{9}$)\d*((\d)(?!\2)\d)\1{3}\d*$
Explanation
^ Start of string
(?=\d{9}$) Positive lookahead, assert 9 digits till the end of the string
\d* Match optional digits
( Capture group 1
(\d) Capture group 2, match a single digit
(?!\2)\d Negative lookahead, assert not the same char as captured in group 2. If that is the case, then match a single digit
) Close group 1
\1{3} Repeat the captured value of capture group 1 3 times
\d* Match optional digits
$ End of string
Regex demo
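The difference between the two patterns in this answer shows up on a number whose digits are all the same. A small sketch, assuming Python's re module; the names ANY_PAIR and DISTINCT_PAIR and the sample strings are illustrative only.

```python
import re

# Pair repeated 4 times; the two digits of the pair may be equal.
ANY_PAIR = re.compile(r'^(?=\d{9}$)\d*(\d\d)\1{3}\d*$')
# Same, but the negative lookahead forces the two digits of the pair to differ.
DISTINCT_PAIR = re.compile(r'^(?=\d{9}$)\d*((\d)(?!\2)\d)\1{3}\d*$')

for s in ("561616161", "511111111", "561616162"):
    print(s, bool(ANY_PAIR.search(s)), bool(DISTINCT_PAIR.search(s)))
```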

My first guess, going only by the OP's own attempt ^5(\d)(?=\d\1).+ and adding nothing of my own, was that a regex is needed to verify numbers that start with 5 and are followed by 4 repetitions of the same two digits.
^5(\d\d)\1{3}$
The same idea, with the added guess that numbers whose digits are all the same (e.g. 511111111) should be disallowed:
^5((\d)(?!\2)\d)\1{3}$
Guessing further that the leading 5 is variable, and assuming that the one free digit sits at either the start or the end, the lookahead below rejects invalid values early (this was written after seeing the other answers already posted):
^(?=\d?(\d\d)\1{3})\d{9}$
Solution 3 combined with solution 2's assumption that the two digits of the pair differ:
^(?=\d?((\d)(?!\2)\d)\1{3})\d{9}$
Solutions 3 and 4 are straightforward variations on #4thBird's nice answer, with the parts in a different order.
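To compare the four solutions side by side, they can be run over a few samples. This is a hedged sketch using Python's re module; the dictionary SOLUTIONS, the helper matching_solutions and the sample numbers are assumptions made for illustration.

```python
import re

# The four patterns from this answer, keyed by solution number.
SOLUTIONS = {
    1: re.compile(r'^5(\d\d)\1{3}$'),
    2: re.compile(r'^5((\d)(?!\2)\d)\1{3}$'),
    3: re.compile(r'^(?=\d?(\d\d)\1{3})\d{9}$'),
    4: re.compile(r'^(?=\d?((\d)(?!\2)\d)\1{3})\d{9}$'),
}

def matching_solutions(number):
    """Return the solution numbers whose pattern accepts the given number."""
    return [n for n, rx in SOLUTIONS.items() if rx.search(str(number))]

for sample in ("561616161", "511111111", "616161615"):
    print(sample, matching_solutions(sample))
```

The last sample has its free digit at the end rather than the start, which only solutions 3 and 4 allow.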

Related

Regex to match phone number 5ABXXYYZZ

I am using regex to match 9 digits phone numbers.
I have this pattern 5ABXXYYZZ that I want to match.
What I tried
I have this regex that matches two repetitions only 5ABCDYYZZ
S_P_2 = 541982277
S_P_2_pattern = re.sub(r"(?!.*(\d)\1(\d)\2(\d)\3).{4}\d(\d)\4(\d)\5", "Special", str(S_P_2))
print(S_P_2_pattern)
What I want to achieve
I would like to update it to match three repetitions 5ABXXYYZZ sample 541882277
Try:
^5\d\d(?:(\d)\1(?!.*\1)){3}$
^5\d\d - Start-line anchor and literal 5 before two random digits;
(?:(\d)\1(?!.*\1)){3} - Non-capture group matched three times, containing a nested capture group that must be followed directly by the same digit but (due to the negative lookahead) must not occur again after 0+ chars;
$ - End-line anchor.
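Plugged into the asker's re-based labelling, the pattern can be tried as follows. This is a minimal sketch; the function name label is hypothetical, and 541882288 is an extra sample added to show the effect of the negative lookahead.

```python
import re

# 5, two arbitrary digits, then three doubled digits that do not recur later on.
TRIPLE_PAIRS = re.compile(r'^5\d\d(?:(\d)\1(?!.*\1)){3}$')

def label(number):
    """Return "Special" for numbers shaped like 5ABXXYYZZ, else the number as a string."""
    text = str(number)
    return "Special" if TRIPLE_PAIRS.search(text) else text

for sample in (541882277, 541982277, 541882288):
    print(sample, label(sample))
```

Because the lookahead only inspects what follows each pair, the digits A and B are still free to coincide with X, Y or Z.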

Regex match 7 Consecutive Numbers and Ignore first and last Characters

I want to test a number consisting of 9 fixed digits.
The number consists of 7 consecutive numbers in the middle. I want to ignore the first and last character. The pattern is 5YYYYYYYX
I am testing my regex using the below sample
577777773
I was able to write a regex that catches the middle 7 numbers. But i want to exclude the first and last character.
(?<!^)([0-9])\1{7}(?!$)
Any advice on how to do this
You could write the pattern as:
(?<=^\d)(\d)\1{6}(?=\d$)
Explanation
(?<=^\d) Assert a digit at the start of the string to the left
(\d) Capture a digit in group 1
\1{6} Repeat the captured value in group 1 six times
(?=\d$) Assert a digit at the end of the string to the right
Or a capture group variant instead of lookarounds:
^\d((\d)\2{6})\d$
If the patterns should not be anchored to the start and the end of the string, you can use word boundaries \b on the left and right instead of ^ and $.
To match 7 consecutive digits in the middle, and the first and last char can not be the same as the consecutive ones:
^(?!(\d)\1)\d((\d)\3{6})(?!\3)\d$
Explanation
^ Start of string
(?!(\d)\1) Negative lookahead, assert not 2 of the same numbers at the start by capturing a single digit in group 1 and matching the same digit directly after it
\d Match a single digit (the first one)
( Capture group 2
(\d)\3{6} Capture a digit in group 3, and repeat that 6 times after it
) Close group 2
(?!\3)\d Match the last digit when it is not the same as the 7 preceding digits
$ End of string
The value of the 7 consecutive digits is in group 2.
You may use this alternative solution using \B (not a word boundary):
\B(\d)\1{6}\B
RegEx Breakup:
\B: Inverse of word boundary
(\d): Match a digit and capture in group #1
\1{6}: Match 6 more occurrences of same digit captured in group #1
\B: Inverse of word boundary
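The capture-group variants from the answers above can be wrapped in a small helper that returns the run of seven digits. This is a sketch under assumptions: the function middle_run and its strict flag are illustrative names, not part of any answer.

```python
import re

# Any first and last digit; the run of 7 equal digits is in group 1.
MIDDLE_RUN = re.compile(r'^\d((\d)\2{6})\d$')
# First and last digit must differ from the run; the run is in group 2.
STRICT_RUN = re.compile(r'^(?!(\d)\1)\d((\d)\3{6})(?!\3)\d$')

def middle_run(number, strict=False):
    """Return the 7 repeated middle digits, or None if the number does not fit."""
    pattern = STRICT_RUN if strict else MIDDLE_RUN
    m = pattern.search(str(number))
    if m is None:
        return None
    return m.group(2) if strict else m.group(1)

print(middle_run("577777773"))
print(middle_run("777777777", strict=True))
```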

regex to find a pair of adjacent digits with different digits around them

I'm a beginner to regex and I am trying to make an expression to find if there are two of the same digits next to each other, and the digit behind and in front of the pair is different.
For example,
123456678 should match as there is a double 6,
1234566678 should not match as there is no double with different surrounding numbers.
12334566 should match because there are two 3s.
So far I have this, which works only with 1, and only as long as the double is not at the start or end of the string. However, I can deal with that by adding a letter at the start and end.
^.*([^1]11[^1]).*$
I know I can use [0-9] instead of the 1s, but the problem is having them all be the same digit.
Thank you!
I have divided my answer into four sections.
The first section contains my solution to the problem. Readers interested in nothing else may skip the other sections.
The remaining three sections are concerned with identifying the pairs of equal digits that are preceded by a different digit and are followed by a different digit. The first of the three sections matches them; the other two capture them in a group.
I've included the last section because I wanted to share The Greatest Regex Trick Ever with those unfamiliar with it, because I find it so very cool and clever, yet simple. It is documented here. Be forewarned that, to build suspense, the author at that link has included a lengthy preamble before the drum-roll reveal.
Determine if a string contains two consecutive equal digits that are preceded by a different digit and are followed by a different digit
You can test the string as follows:
import re
r = r'(\d)(?!\1)(\d)\2(?!\2)\d'
arr = ["123456678", "1123455a666788"]
for s in arr:
    print(s, bool(re.search(r, s)))
displays
123456678 True
1123455a666788 False
The regex engine performs the following operations.
(\d) : match a digit and save to capture group 1 (preceding digit)
(?!\1) : next character cannot equal content of capture group 1
(\d) : match a digit in capture group 2 (first digit of pair)
\2 : match content of capture group 2 (second digit of pair)
(?!\2) : next character cannot equal content of capture group 2
\d : match a digit
(?!\1) and (?!\2) are negative lookaheads.
Use Python's regex module to match pairs of consecutive digits that have the desired property
You can use the following regular expression with Python’s regex module to obtain the matching pairs of digits.
r'(\d)(?!\1)\K(\d)\2(?=\d)(?!\2)'
The regex engine performs the following operations.
(\d) : match a digit and save to capture group 1 (preceding digit)
(?!\1) : next character cannot equal content of capture group 1
\K : forget everything matched so far and reset start of match
(\d) : match a digit in capture group 2 (first digit of pair)
\2 : match content of capture group 2 (second digit of pair)
(?=\d) : next character must be a digit
(?!\2) : next character cannot equal content of capture group 2
(?=\d) is a positive lookahead. (?=\d)(?!\2) could be replaced with (?!\2|$|\D).
Save pairs of consecutive digits that have the desired property to a capture group
Another way to obtain the matching pairs of digits, which does not require the regex module, is to extract the contents of capture group 2 from matches of the following regular expression.
r'(\d)(?!\1)((\d)\3)(?!\3)(?=\d)'
The following operations are performed.
(\d) : match a digit in capture group 1
(?!\1) : next character does not equal last character
( : begin capture group 2
(\d) : match a digit in capture group 3
\3 : match the content of capture group 3
) : end capture group 2
(?!\3) : next character does not equal last character
(?=\d) : next character is a digit
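Extracting group 2 with the standard re module might look like the following. This is a minimal sketch; the function name isolated_pairs is an assumption made for illustration.

```python
import re

# Preceding digit, then a pair of equal digits (group 2), then a different digit.
PAIR = re.compile(r'(\d)(?!\1)((\d)\3)(?!\3)(?=\d)')

def isolated_pairs(s):
    """Return every doubled digit that has a different digit on each side."""
    return [m.group(2) for m in PAIR.finditer(s)]

for s in ("123456678", "1234566678", "12334566"):
    print(s, isolated_pairs(s))
```

Since each match consumes the preceding digit together with the pair, and matches cannot overlap, two qualifying pairs that sit directly next to each other (as in 1223345) may be under-reported by this approach.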
Use The Greatest Regex Trick Ever to identify pairs of consecutive digits that have the desired property
We use the following regular expression to match the string.
r'(\d)(?=\1)|\d(?=(\d)(?!\2))|\d(?=\d(\d)\3)|\d(?=(\d{2})\d)'
When there is a match, we pay no attention to which character was matched, but examine the content of capture group 4 ((\d{2})), as I will explain below.
The Trick in action
The first three components of the alternation correspond to the ways that a string of four digits can fail to have the property that the second and third digits are equal, the first and second are unequal and the third and fourth are unequal. They are:
(\d)(?=\1) : assert first and second digits are equal
\d(?=(\d)(?!\2)) : assert second and third digits are not equal
\d(?=\d(\d)\3) : assert third and fourth digits are equal
It follows that if a digit is matched and the first three parts of the alternation fail, the last part (\d(?=(\d{2})\d)) must succeed, and the capture group it contains (#4) must contain the two equal digits that have the required properties. (The final \d is needed to assert that the pair of digits of interest is followed by a digit.)
If there is a match, how do we determine whether the last part of the alternation is the one that matched?
When this regex matches a digit we have no interest in what digit that was. Instead, we look to capture group 4 ((\d{2})). If that group is empty we conclude that one of the first three components of the alternation matched the digit, meaning that the two digits following the matched digit do not have the properties that they are equal and are unequal to the digits that precede and follow them.
If, however, capture group 4 is not empty, it means that none of the first three parts of the alternation matched the digit, so the last part of the alternation must have matched and the two digits following the matched digit, which are held in capture group 4, have the desired properties.
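In Python, the check on capture group 4 described above could be sketched like this with re.finditer. The function name pairs_via_trick is hypothetical; a group that did not take part in a match is reported by re as None, which is what "empty" corresponds to here.

```python
import re

# The first three alternatives swallow the failing cases; only the fourth fills group 4.
TRICK = re.compile(r'(\d)(?=\1)|\d(?=(\d)(?!\2))|\d(?=\d(\d)\3)|\d(?=(\d{2})\d)')

def pairs_via_trick(s):
    """Collect group 4 from every match in which the last alternative succeeded."""
    return [m.group(4) for m in TRICK.finditer(s) if m.group(4)]

for s in ("123456678", "1234566678", "12334566", "1223345"):
    print(s, pairs_via_trick(s))
```

Each match consumes a single digit only, so pairs that sit directly next to each other, as in 1223345, are both reported.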
With regex, it is much more convenient to use the PyPI regex module with a (*SKIP)(*FAIL) based pattern:
import regex
rx = r'(\d)\1{2,}(*SKIP)(*F)|(\d)\2'
l = ["123456678", "1234566678"]
for s in l:
    print(s, bool(regex.search(rx, s)))
Output:
123456678 True
1234566678 False
Regex details
(\d)\1{2,}(*SKIP)(*F) - a digit and then two or more occurrences of the same digit
| - or
(\d)\2 - a digit and then the same digit.
The point is to match all chunks of identical 3 or more digits and skip them, and then match a chunk of two identical digits.
Inspired by the answer of Wiktor Stribiżew, another variation that uses an alternation with re is to check for the existence of the capturing group that contains a positive match for 2 of the same digits not surrounded by the same digit.
In this case, check for group 3.
((\d)\2{2,})|\d(\d)\3(?!\3)\d
( Capture group 1
(\d)\2{2,} Capture group 2, match 1 digit and repeat that same digit 2+ times
) Close group
| Or
\d(\d) Match a digit, capture a digit in group 3
\3(?!\3)\d Match the same digit as in group 3. Match the 4th digit, but it should not be the same as the group 3 digit
For example
import re
pattern = r"((\d)\2{2,})|\d(\d)\3(?!\3)\d"
strings = ["123456678", "12334566", "12345654554888", "1221", "1234566678", "1222", "2221", "66", "122", "221", "111"]
for s in strings:
    match = re.search(pattern, s)
    if match and match.group(3):
        print("Match: " + match.string)
    else:
        print("No match: " + s)
Output
Match: 123456678
Match: 12334566
Match: 12345654554888
Match: 1221
No match: 1234566678
No match: 1222
No match: 2221
No match: 66
No match: 122
No match: 221
No match: 111
If, for example, a string of only 2 or 3 digits is also OK to match, you could check for group 2:
(\d)\1{2,}|(\d)\2
You can also use a simpler approach.
import re

l = ["123456678",
     "1234566678",
     "12334566 "]
for i in l:
    # Collect every run of two or more identical characters.
    matches = re.findall(r"((.)\2+)", i)
    # Any run whose length is not exactly 2 makes the string fail.
    if any(len(x[0]) != 2 for x in matches):
        print("{}-->{}".format(i, False))
    else:
        print("{}-->{}".format(i, True))
You can customize this based on your rules.
Output:
123456678-->True
1234566678-->False
12334566 -->True

Regex pattern to pass tests

I want to write a function, say is_number(string), that uses a regex pattern to check whether the string is an integer in the range -49 to 49. The number should also not contain insignificant zeros.
So I want to pass these tests:
self.assertTrue(is_number("50"))
self.assertTrue(is_number("-50"))
self.assertTrue(is_number("-9"))
self.assertFalse(is_number("7"))
self.assertFalse(is_number("-200"))
self.assertTrue(is_number("-21"))
self.assertTrue(is_number("18"))
self.assertTrue(is_number("0"))
self.assertTrue(is_number("49"))
self.assertFalse(is_number("100"))
self.assertTrue(is_number("-49"))
I tried something like this, but it doesn't work:
def is_number(string):
    pattern = r'[-]?\d[1,4]{1,2}*'
    return re.search(pattern, string)
You might use
^-?(?:[0-9]|[1-4][0-9])$
That will match
^ Start of string
-? Optional -
(?: Non capturing group
[0-9] Match a digit 0-9
| Or
[1-4][0-9] Match a digit 1-4 and a digit 0-9 to match a range 10 - 49
) Close group
$ End of string
If you also want to match 50 and -50, and 7 should not match, you could add 50 to the alternation and match the digits 0-6, 8 and 9 using:
^-?(?:[0-689]|[1-4][0-9]|50)$
Regex demo
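Wrapped into the function from the question, the first pattern of this answer could be used as sketched below. The sketch assumes re.fullmatch in place of the ^ and $ anchors, which is a swapped-in choice and not something the answer prescribes; the compiled name IN_RANGE is illustrative.

```python
import re

# Optional minus sign, then a single digit or a two-digit number from 10 to 49.
IN_RANGE = re.compile(r'-?(?:[0-9]|[1-4][0-9])')

def is_number(string):
    """Return True if the string is an integer from -49 to 49 without leading zeros."""
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return IN_RANGE.fullmatch(string) is not None

for s in ("49", "-49", "0", "7", "50", "-200", "07"):
    print(s, is_number(s))
```

One caveat: the pattern also accepts "-0", which may or may not be wanted.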
The pattern matches either
double digit numbers with leading 1, 2, 3 or 4 (positive and negative)
or any single digit number (positive and negative)
Regex:
^(-?[1-4]\d|-?\d)$
To fulfill the range of 49 to negative 49 your tests should actually look like this:
self.assertFalse(is_number("50")) # 50 must be assertFalse
self.assertFalse(is_number("-50")) # -50 must be assertFalse
self.assertTrue(is_number("-9"))
self.assertTrue(is_number("7")) # 7 must be assertTrue
self.assertFalse(is_number("-200"))
self.assertTrue(is_number("-21"))
self.assertTrue(is_number("18"))
self.assertTrue(is_number("0"))
self.assertTrue(is_number("49"))
self.assertFalse(is_number("100"))
self.assertTrue(is_number("-49"))
Try this regex pattern,
^[-]?[0-4]?\d$
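The optional first digit must be one of the five digits 0, 1, 2, 3 and 4, and the last digit can be any digit. Note that, because 0 is allowed in the first position, this pattern also accepts values with a leading zero such as 07.
pattern = r'^-?[0-4]?\d$'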

Regular expression for three numbers

I created a regular expression that would match 2 numbers in any order from a four digit number. I am trying to create a regular expression that can match 3 numbers out of a four digit number in any order. Below is what I currently use to match two numbers:
regEx01 = re.compile(r'\b[0-9]*(?:0[0-9]*[0-9]?1|1[0-9]*[0-9]?0)[0-9]*\b')
Matches 0 and 1 in 7019, 8019, 2160.
Future regular expression must match 0, 1 and 2.
7012 positive
0190 negative
9201 positive
1226 negative
Any direction will be greatly appreciated.
You may use a regex based on positive lookaheads to make it concise:
\b(?=\d*0)(?=\d*1)(?=\d*2)\d+\b
Details
\b - word boundary
(?=\d*0) - a positive lookahead that requires a 0 after zero or more digits
(?=\d*1) - requires 1
(?=\d*2) - requires 2
\d+ - 1+ digits
\b - word boundary
Or, to improve performance, replace each \d* with a "subtracted" character class that excludes the digit being looked for:
r'\b(?=[1-9]*0)(?=[02-9]*1)(?=[013-9]*2)\d+\b'
Here, (?=[1-9]*0) quickly checks if there is a 0 after 0+ digits from 1 to 9, (?=[02-9]*1) checks for 1 and (?=[013-9]*2) checks for 2 in a similar way.
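Applied to the sample values from the question, the lookahead-based pattern could be used like this. A minimal sketch with Python's re module; the names HAS_012 and find_numbers are made up for illustration.

```python
import re

# Three independent lookaheads: the digit run must contain a 0, a 1 and a 2.
HAS_012 = re.compile(r'\b(?=\d*0)(?=\d*1)(?=\d*2)\d+\b')

def find_numbers(text):
    """Return all digit runs in text that contain 0, 1 and 2 in any order."""
    return HAS_012.findall(text)

print(find_numbers("7012 0190 9201 1226"))
```

Each lookahead starts from the same position, which is why the order of the three digits inside the number does not matter.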
