Regex matching for special characters between numbers - python

Condition:
1 to 2 digit number followed by optional [ . , ” ’] + 1 to 4 digit number
Examples:
7 M
13.6 M
8.205m
9.,56m
Expected Results:
7
13.6
8.205
9.,56
Regex Pattern I've tried:
(?:^\d{1,2})(?:[\.\,\’\"]{0,2})\d{0,4}
This doesn't work as expected. Any suggestions?

You don't need the spaces in the character class, and you don't have to escape the characters either.
Because \d{0,4} makes the trailing digits optional, the pattern could also match a value such as 13. with nothing after the separator.
You could make the whole second part optional, including the character class and the digits, and use the quantifier + on the character class:
^\d{1,2}(?:[.,”’]+\d{1,4})?
If the M should be present, you could use a capturing group
^(\d{1,2}(?:[.,”’]+\d{1,4})?) ?[Mm]$
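A minimal Python sketch of the two patterns above, run against the examples from the question. The helper names extract_number and extract_number_with_unit are made up for illustration.

```python
import re

# Pattern from the answer: 1-2 digits, then optionally one or more
# separator characters followed by 1-4 digits.
NUMBER = re.compile(r"^\d{1,2}(?:[.,”’]+\d{1,4})?")

# Stricter variant that also requires the trailing M/m; group 1 holds the number.
NUMBER_WITH_UNIT = re.compile(r"^(\d{1,2}(?:[.,”’]+\d{1,4})?) ?[Mm]$")


def extract_number(text):
    """Return the leading number of `text`, or None when there is none."""
    match = NUMBER.match(text)
    return match.group(0) if match else None


def extract_number_with_unit(text):
    """Return the number only when the whole string is '<number>[ ]M'."""
    match = NUMBER_WITH_UNIT.match(text)
    return match.group(1) if match else None


for sample in ["7 M", "13.6 M", "8.205m", "9.,56m"]:
    print(sample, "->", extract_number(sample))
```

With the optional group, a dangling separator is left out of the match: extract_number("13.") gives 13 rather than 13. as the original attempt would.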


Regex for Alternating Numbers

I am trying to write a regex pattern for phone numbers consisting of 9 fixed digits.
I want to identify numbers that have two numbers alternating for four times such as 5XYXYXYXY
I used the below sample
number = 561616161
I tried the below pattern but it is not accurate
^5(\d)(?=\d\1).+
Can someone point out what I am doing wrong?
I would use:
^(?=\d{9}$)\d*(\d)(\d)(?:\1\2){3}\d*$
Here is an explanation of the pattern:
^ from the start of the number
(?=\d{9}$) assert exactly 9 digits
\d* match optional leading digits
(\d) capture a digit in \1
(\d) capture another digit in \2
(?:\1\2){3} match the XY combination 3 more times
\d* more optional digits
$ end of the number
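As a rough sketch, the pattern explained above can be wrapped in Python like this. The function name alternating_pair is hypothetical; it returns the two captured digits so you can see what ended up in \1 and \2.

```python
import re

# Pattern from the answer: exactly 9 digits containing X Y repeated 4 times.
ALTERNATING = re.compile(r"^(?=\d{9}$)\d*(\d)(\d)(?:\1\2){3}\d*$")


def alternating_pair(number):
    """Return the (X, Y) digits of the alternating run, or None if absent."""
    match = ALTERNATING.match(str(number))
    return match.groups() if match else None


print(alternating_pair(561616161))    # the sample number from the question
print(alternating_pair("123456789"))  # no alternating run
```

Note that this version does not require X and Y to differ, so a number like 511111111 is accepted as well.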
If you want to match a pair of digits repeated 4 times, with 9 digits in total:
^(?=\d{9}$)\d*(\d\d)\1{3}\d*$
Explanation
^ Start of string
(?=\d{9}$) Positive lookahead, assert 9 digits till the end of the string
\d* Match optional digits
(\d\d)\1{3} Capture group 1, match 2 digits and then repeat what is captured in group 1 3 times
\d* Match optional digits
$ End of string
If you want to match a pair of digits repeated 4 times, where the 2 digits of the pair are not the same:
^(?=\d{9}$)\d*((\d)(?!\2)\d)\1{3}\d*$
Explanation
^ Start of string
(?=\d{9}$) Positive lookahead, assert 9 digits till the end of the string
\d* Match optional digits
( Capture group 1
(\d) Capture group 2, match a single digit
(?!\2)\d Negative lookahead, assert not the same char as captured in group 2. If that is the case, then match a single digit
) Close group 1
\1{3} Repeat the captured value of capture group 1 3 times
\d* Match optional digits
$ End of string
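To see the effect of the (?!\2) lookahead, here is a small Python comparison of the two patterns from this answer. The constant names are just labels chosen for the example.

```python
import re

# Any 2-digit pair repeated 4 times inside a 9-digit number.
SAME_PAIR = re.compile(r"^(?=\d{9}$)\d*(\d\d)\1{3}\d*$")

# As above, but the two digits of the pair must differ.
DISTINCT_PAIR = re.compile(r"^(?=\d{9}$)\d*((\d)(?!\2)\d)\1{3}\d*$")

for number in ["561616161", "511111111", "561616162"]:
    print(
        number,
        "same pair:", bool(SAME_PAIR.match(number)),
        "distinct pair:", bool(DISTINCT_PAIR.match(number)),
    )
```

Only the second pattern rejects 511111111, because the pair 11 consists of the same digit twice.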
My first guess, going only by the OP's own attempt ^5(\d)(?=\d\1).+ and without adding any assumptions of my own, was that a regex is needed to verify numbers that start with 5 and are followed by 4 repetitions of the same two-digit pair.
^5(\d\d)\1{3}$
The same idea with the "added guess" that numbers made of all the same digit, such as 511111111, should be disallowed:
^5((\d)(?!\2)\d)\1{3}$
Guessing further that the 5 is a variable digit, and assuming there is one such variable digit at either the start or the end, with the idea of rejecting invalid values early (having already seen the other nice answers):
^(?=\d?(\d\d)\1{3})\d{9}$
Solution 3 combined with solution 2's assumption that the two digits of the pair differ:
^(?=\d?((\d)(?!\2)\d)\1{3})\d{9}$
Solutions 3 and 4 are fairly obvious variations on #4thBird's nice answer, with the order changed.
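A quick way to compare the four variants in Python is sketched below. The dictionary SOLUTIONS and the helper accepted_by are names invented for this example.

```python
import re

# The four anchored variants from this answer, keyed by solution number.
SOLUTIONS = {
    1: re.compile(r"^5(\d\d)\1{3}$"),
    2: re.compile(r"^5((\d)(?!\2)\d)\1{3}$"),
    3: re.compile(r"^(?=\d?(\d\d)\1{3})\d{9}$"),
    4: re.compile(r"^(?=\d?((\d)(?!\2)\d)\1{3})\d{9}$"),
}


def accepted_by(number):
    """Return the solution numbers whose pattern accepts `number`."""
    return [key for key, pattern in sorted(SOLUTIONS.items()) if pattern.match(number)]


for number in ["561616161", "511111111", "616161615", "123456789"]:
    print(number, accepted_by(number))
```

Solutions 1 and 2 insist on a leading 5, while 3 and 4 accept any single extra digit before or after the repeated pairs.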

Python regular expression for height?

I am trying to create a regular expression that works for the different types of height inputs, it should work for the following examples below:
5-10
5-09
5-9
6'
6'0
5'9"
5'09"
5'9
5'09
I don't need to consider values below 4'0 or above 6'11.
Here's my regular expression so far:
[456][-']\d{1,2}"?
I need to make the " not work if there is a - between feet and inches.
Also, for the inches part, I am currently allowing either 1 or 2 digits, when I really only want to allow two digits when the first digit is a 0 or 1, and if it is 1, the second digit can only be 0 or 1.
For example, 00-09, 10 and 11 should work, but not 12 or any other two-digit number.
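You might use an alternation: either match ' followed by an optional inches part and an optional ", or match - followed by the inches part, so that the " can never follow a -. Alternatively, you could capture the ' in a group and use a conditional to allow the " only when that group matched.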
\b(?<![-'"])(?:1[01]|0?\d)(?:'(?:(?:1[01]|0?\d)\b"?)?|-(?:1[01]|0?\d\b))(?![-'"])
The pattern matches:
\b A word boundary to prevent a partial word match
(?<![-'"]) Negative lookbehind, assert not ' or - or " directly to the left
(?:1[01]|0?\d) Match from 0-9 with optional leading 0 and 10 and 11
(?: Non capture group
' Match literally
(?: Non capture group
(?:1[01]|0?\d)\b Match 10 or 11, or 0-9 with an optional leading 0, followed by a word boundary
"? Match optional "
)? Close non capture group and make it optional
| Or
- Match literally
(?:1[01]|0?\d\b) Match 0-9 10 or 11 followed by a word boundary
) Close the outer group
(?![-'"]) Negative lookahead, assert not - or ' or " to the right
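Here is a hedged Python sketch that checks the pattern against the inputs from the question. It uses re.fullmatch so the whole string has to be a height; that choice, and the helper name is_height, are assumptions made for the example.

```python
import re

# Pattern from the answer; the triple-quoted raw string lets it contain both ' and ".
HEIGHT = re.compile(
    r"""\b(?<![-'"])(?:1[01]|0?\d)(?:'(?:(?:1[01]|0?\d)\b"?)?|-(?:1[01]|0?\d\b))(?![-'"])"""
)


def is_height(text):
    """Return True when the whole string looks like feet-inches or feet'inches"."""
    return HEIGHT.fullmatch(text) is not None


valid = ["5-10", "5-09", "5-9", "6'", "6'0", "5'9\"", "5'09\"", "5'9", "5'09"]
invalid = ["5-9\"", "5'12", "5-12"]

for text in valid + invalid:
    print(repr(text), is_height(text))
```

As written, the pattern accepts 0 to 11 for the feet part too, so something like 9'5 also passes; if only 4 to 6 feet should be allowed, the feet part needs a tighter class or a separate check.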

How to match a particular URL pattern in Python regex

I am having trouble matching a pattern of this format: p#.g.com where # is not a 1 or a 2. For instance, if the pattern is p1.g.com, I don't need to match. If it is p2.g.com, I don't need to match.
But if it is any other number, such as p3.g.com or p29.g.com, then I need to match.
My current pattern is r"(?P<url>p([^1,2])\.g\.com)", but this fails if the pattern is p##.g.com, basically any two digit number it fails on. There is no upper limit on the #, so it could be a 3 or 999 or anything in between.
I also tried r"(?P<url>p([^1,2])\d+\.g\.com)" but that does not match any number beginning with a 1 or a 2. For instance 11 or 23 are not matched, which I do want matched.
Try this regex:
p(?:[03-9]|\d{2,})\.g\.com
Explanation:
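p matches the literal character p
(?: starts a non-capturing group that matches one of:
[03-9] a single digit 0 or 3-9
\d{2,} any number with two or more digits (10 or higher, but also values with a leading zero such as 01)
) closes the group
\.g\.com matches the literal text .g.com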
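A minimal Python check of this pattern, assuming the whole string should be the host name (hence re.fullmatch). The function name is_wanted_host is made up for the example.

```python
import re

# Pattern from the answer: p, then a single digit other than 1 or 2,
# or any run of two or more digits, then the literal .g.com
HOST = re.compile(r"p(?:[03-9]|\d{2,})\.g\.com")


def is_wanted_host(host):
    """Return True for p#.g.com where # is any number except 1 and 2."""
    return HOST.fullmatch(host) is not None


for host in ["p1.g.com", "p2.g.com", "p3.g.com", "p11.g.com", "p23.g.com", "p999.g.com"]:
    print(host, is_wanted_host(host))
```

If you need the named group from the question, wrapping the pattern as (?P<url>...) should work the same way.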

Regex pattern to pass tests

I want to write a regex pattern inside a function, let's call it is_number(string), which checks whether the string is an integer in the range -49 to 49. The number should also not contain insignificant zeros.
So I want to pass this test:
self.assertTrue(is_number("50"))
self.assertTrue(is_number("-50"))
self.assertTrue(is_number("-9"))
self.assertFalse(is_number("7"))
self.assertFalse(is_number("-200"))
self.assertTrue(is_number("-21"))
self.assertTrue(is_number("18"))
self.assertTrue(is_number("0"))
self.assertTrue(is_number("49"))
self.assertFalse(is_number("100"))
self.assertTrue(is_number("-49"))
I tried something like this, but it doesn't work:
def is_number(string):
    pattern = r'[-]?\d[1,4]{1,2}*'
    return re.search(pattern, string)
You might use
^-?(?:[0-9]|[1-4][0-9])$
That will match
^ Start of string
-? Optional -
(?: Non capturing group
[0-9] Match a digit 0-9
| Or
[1-4][0-9] Match a digit 1-4 and a digit 0-9 to match a range 10 - 49
) Close group
$ End of string
If you also want to match 50 and -50, and 7 should not match, you could add 50 to the alternation and match the digits 0-6, 8 and 9 using
^-?(?:[0-689]|[1-4][0-9]|50)$
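Both patterns dropped into the function from the question might look like the sketch below. Returning a bool and using re.fullmatch are choices made for this example, and is_number_as_posted is a hypothetical name for the variant that follows the tests exactly as they were posted.

```python
import re

# Integers from -49 to 49 without insignificant zeros.
IN_RANGE = re.compile(r"^-?(?:[0-9]|[1-4][0-9])$")

# Variant that follows the posted tests: 50 and -50 allowed, 7 rejected.
AS_POSTED = re.compile(r"^-?(?:[0-689]|[1-4][0-9]|50)$")


def is_number(string):
    """Return True when `string` is an integer in the range -49..49."""
    return IN_RANGE.fullmatch(string) is not None


def is_number_as_posted(string):
    """Return True according to the assertions as written in the question."""
    return AS_POSTED.fullmatch(string) is not None


for value in ["50", "-50", "-9", "7", "-200", "-21", "18", "0", "49", "100", "-49"]:
    print(value, is_number(value), is_number_as_posted(value))
```

One thing to be aware of: both patterns also accept -0, which you may or may not want.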
The pattern matches either
double digit numbers with leading 1, 2, 3 or 4 (positive and negative)
or any single digit number (positive and negative)
Regex:
^(-?[1-4]\d|-?\d)$
To fulfill the range of 49 to negative 49 your tests should actually look like this:
self.assertFalse(is_number("50")) # 50 must be assertFalse
self.assertFalse(is_number("-50")) # -50 must be assertFalse
self.assertTrue(is_number("-9"))
self.assertTrue(is_number("7")) # 7 must be assertTrue
self.assertFalse(is_number("-200"))
self.assertTrue(is_number("-21"))
self.assertTrue(is_number("18"))
self.assertTrue(is_number("0"))
self.assertTrue(is_number("49"))
self.assertFalse(is_number("100"))
self.assertTrue(is_number("-49"))
Try this regex pattern:
^[-]?[0-4]?\d$
The optional first digit must be one of the five digits 0, 1, 2, 3 or 4, and the last digit can be any digit.
pattern = r'^-?[0-4]?\d$'
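One caveat with the shorter pattern: since the first digit may be 0, it also accepts values with an insignificant zero. The following sketch compares it with the alternation-based pattern from the earlier answer; the names LOOSE, STRICT and loose_only are just labels for this example.

```python
import re

# Pattern from this answer: the optional first digit may be 0.
LOOSE = re.compile(r"^-?[0-4]?\d$")

# Alternation-based pattern from the earlier answer, for comparison.
STRICT = re.compile(r"^-?(?:[0-9]|[1-4][0-9])$")

candidates = ["7", "07", "00", "49", "-49", "50"]

# Strings accepted by the loose pattern but rejected by the strict one.
loose_only = [s for s in candidates if LOOSE.fullmatch(s) and not STRICT.fullmatch(s)]
print(loose_only)
```

So if leading zeros must be rejected, the alternation version is probably the safer choice.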

Regular Expression Matching with Carriage Returns in Python

I have the following data and want to match certain strings as commented below.
FTUS80 KWBC 081454 AAA\r\r TAF AMD #should match 'AAA'
LTUS41 KCTP 082111 RR3\r\r TMLLNS\r #should match 'RR3' and 'TMLLNS'
SRUS55 KSLC 082010\r\r HM5SLC\r\r #should match 'HM5SLC'
SRUS55 KSLC 082010\r\r SIGC \r\r #should match 'SIGC ' including whitespace
I need the following conditions met. But it doesn't work when I put it all together so I know I have mistakes. Thanks in advance.
Start match after 6 digit string: (?<=\d{6})
match if 3 character mixed uppercase/digits and before first 2 carriage returns: ([A-Z0-9]{3})(?=\r)
match if 6 characters mixed uppercase/digits after carriage returns: (?<=\r\r[A-Z0-9]{6})
match if 4 characters and two spaces: ([A-Z0-9]{4} )
There is probably a more elegant way, but you could do something like the following:
(?:\d{6}\s?)([A-Z\d]{3})?(?:[\r\n]{2}\s)([A-Z\d]{6}|[A-Z\d]{4}\s{2})?
(?:\d{6}\s?) non capture group of 6 digits followed by an optional space
([A-Z\d]{3})? optional capture group of 3 uppercase letters / digits
(?:[\r\n]{2}\s) non capture group of two line endings followed by 1 space
([A-Z\d]{6}|[A-Z\d]{4}\s{2})? optional capture group of either 6 uppercase letters / digits OR 4 uppercase letters / digits followed by 2 spaces
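A small Python sketch of how the two capture groups could be read out. The sample strings are taken from the question, with the assumption that \r stands for a real carriage return and that SIGC is followed by two spaces (as the "4 characters and two spaces" condition suggests); extract_fields is a hypothetical helper.

```python
import re

# Pattern from the answer: 6 digits, optional 3-character code,
# two line-ending characters plus a space, optional 6-character
# code or 4-character code followed by two whitespace characters.
HEADER = re.compile(
    r"(?:\d{6}\s?)([A-Z\d]{3})?(?:[\r\n]{2}\s)([A-Z\d]{6}|[A-Z\d]{4}\s{2})?"
)

samples = [
    "FTUS80 KWBC 081454 AAA\r\r TAF AMD",
    "LTUS41 KCTP 082111 RR3\r\r TMLLNS\r",
    "SRUS55 KSLC 082010\r\r HM5SLC\r\r",
    "SRUS55 KSLC 082010\r\r SIGC  \r\r",  # assumed: two spaces after SIGC
]


def extract_fields(line):
    """Return the captured codes of `line`, skipping groups that did not match."""
    match = HEADER.search(line)
    if match is None:
        return []
    return [group for group in match.groups() if group is not None]


for line in samples:
    print(repr(line), "->", extract_fields(line))
```

Because \s also matches \r, the {4}\s{2} branch would pick up a carriage return if only one space followed the code, so it may be worth spelling out a literal space there if that matters.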
It's not clear what the line ending is here, but assuming it's the Unix \n, the following expression captures the strings as requested (double quotes added to show whitespace):
sed -rne 's/^.{18} ?([A-Z0-9]{3,3})?\r{2}?([^\r]+)?\r.*$/"\1\2"/p' text.txt
Result
"AAA"
"RR3 TMLLNS"
" HM5SLC"
" SIGC "
.{18} first 18 characters
An optional space followed by ([A-Z0-9]{3,3})? matches AAA or RR3 without the leading space
\r{2}?([^\r]+)?\r matches TMLLNS, HM5SLC or SIGC preceded by 2 \r and followed by 1 \r characters.
