Regular expression to match a phone number in string - python

I am trying to match a phone number in a specific format, such as 021-768-4444. I wrote a program that recognizes a valid phone number when a string is passed to the regular expression, and it does that successfully. However, when I pass a number that does not fit this format, it is also recognized instead of giving me None.
Here is the code:
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-42424854-201.')
if mo is not None:
    print('Phone number found: ' + mo.group())
else:
    print("Pattern is not matched")
The above code gives me this output:
Phone number found: 415-555-4242
whereas I expected None, because I know that the search() method returns None if the regex pattern is not found in the string.
But if I pass a correct phone number, it works as expected:
mo = phoneNumRegex.search('My number is 415-555-4242.')
This behavior seems very strange to me. Can someone point out where I am going wrong?
Any help would be really appreciated. Thanks.

Use lookarounds so that the number is neither preceded nor followed by another digit:
(?<!\d)\d{3}\-\d{3}\-\d{4}(?!\d)
https://regex101.com/r/VAaA7k/1
phoneNumRegex = re.compile(r'(?<!\d)\d{3}\-\d{3}\-\d{4}(?!\d)')
https://www.ideone.com/4Ajfqe
To prevent other unwanted matches, use a stricter regex pattern:
(?:^|:|(?<=\s))(\d{3}\-\d{3}\-\d{4})(?=[\s,;.]|$)
https://regex101.com/r/VAaA7k/5
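A minimal sketch of how the lookaround version behaves on the two strings from the question. The name PHONE_RE is only illustrative, and the hyphens are left unescaped because they do not need escaping outside a character class:

```python
import re

# The lookbehind and lookahead reject a match that touches another digit.
PHONE_RE = re.compile(r'(?<!\d)\d{3}-\d{3}-\d{4}(?!\d)')

too_long = PHONE_RE.search('My number is 415-555-42424854-201.')
valid = PHONE_RE.search('My number is 415-555-4242.')

print(too_long)        # None
print(valid.group())   # 415-555-4242
```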

search checks whether the regex matches anywhere in the string, i.e. it searches for the regex within the string.
match checks whether the regex matches at the beginning of the string (use fullmatch to require that it matches the entire string).
mo = phoneNumRegex.match('My number is 415-555-42424854-201.')
mo is None  # True
Another option is to anchor the pattern to the start and end of the string; note that it then matches only when the string contains nothing but the phone number:
phoneNumRegex = re.compile(r'^\d\d\d-\d\d\d-\d\d\d\d$')
mo = phoneNumRegex.search('My number is 415-555-42424854-201.')
mo is None  # True
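The difference between the three methods can be checked with a short sketch. The variable names are illustrative only; search scans the whole string, match only tries the beginning, and fullmatch requires the whole string to fit the pattern:

```python
import re

phone = re.compile(r'\d{3}-\d{3}-\d{4}')
text = 'My number is 415-555-4242.'

found_anywhere = phone.search(text)                   # scans the whole string
at_start = phone.match(text)                          # only tries position 0
whole_string = phone.fullmatch('415-555-4242')        # must consume everything
too_long = phone.fullmatch('415-555-42424854-201')    # extra characters remain

print(found_anywhere.group())  # 415-555-4242
print(at_start)                # None
print(whole_string.group())    # 415-555-4242
print(too_long)                # None
```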

As @deceze replied, your regular expression is working as written: nothing in it says that the number shouldn't be longer.
Adding \b at the end of the regex will work for you,
as shown below:
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d\b')
mo = phoneNumRegex.search('My number is 415-555-42424854-201.')
if mo is not None:
    print('Phone number found: ' + mo.group())
else:
    print("Pattern is not matched")
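One caveat that may be worth checking: a trailing \b only guards the end of the number. Assuming numbers with extra leading digits should be rejected too, a \b on both sides is one option. The sample string here is made up for illustration:

```python
import re

trailing_only = re.compile(r'\d{3}-\d{3}-\d{4}\b')
both_sides = re.compile(r'\b\d{3}-\d{3}-\d{4}\b')

# Hypothetical input with one extra digit in front of the number.
text = 'My number is 1415-555-4242.'

print(trailing_only.search(text).group())  # 415-555-4242
print(both_sides.search(text))             # None
```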

Related

Regex to exclude words followed by space

I tried a lot of solutions but can't get this Regex to work.
The string-
"Flow Control None"
I want to exclude "Flow Control" plus the blank space, and only return whatever is on the right.
Since you have tagged your question with #python and #regex, I'll outline a simple solution to your problem using these tools. Furthermore, the other two answers don't really tackle the exact problem of matching "whatever is on the right" of your "Flow Control " prefix.
First, start by importing the re builtin module (read the docs).
import re
Define the pattern you want to match. Here, we're matching "whatever is on the right" ((?P<suffix>.+)$) of ^Flow Control .
pattern = re.compile(r"^Flow Control (?P<suffix>.+)$")
Grab the match for a given string (e.g. "Flow Control None")
suffix = pattern.search("Flow Control None").group("suffix")
print(suffix) # Out: None
Hopefully, this complete working example will also help you
import re
def get_suffix(text: str):
    pattern = re.compile(r"^Flow Control (?P<suffix>.+)$")
    matches = pattern.search(text)
    return matches.group("suffix") if matches else None
examples = [
    "Flow Control None",
    "Flow Control None None",
    "Flow Control None",
    "Flow Control ",
]
for example in examples:
    suffix = get_suffix(text=example)
    if suffix:
        print(f"Matched: {repr(suffix)}")
    else:
        print(f"No matches for: {repr(example)}")
Use split like so:
my_str = 'Flow Control None'
out_str = my_str.split()[-1]
# 'None'
Or use re.findall:
import re
out_str = re.findall(r'^.*\s(\S+)$', my_str)[0]
If you really want a purely regex solution try this: (?<= )[a-zA-Z]*$
The (?<= ) matches a single ' ' but doesn't include it in the match. [a-zA-Z]* matches anything from a to z or A to Z any number of times. $ matches the end of the line.
You could also try replacing the * with a + if you want to ensure that your match has at least one letter (* will produce a 0-length match if your string ends in a space, + will match nothing).
But it may be clearer to do something like
data = "Flow Control None"
split = data.split(' ')
split[len(split) - 1] # returns "None"
EDIT data.split(' ')[-1] also returns "None"
or
data[data.rfind(' ') + 1:] # returns "None"
that don't involve regexes at all.
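The difference between * and + in the regex variant above can be checked with a short sketch (the names star and plus are illustrative only):

```python
import re

star = re.compile(r'(?<= )[a-zA-Z]*$')
plus = re.compile(r'(?<= )[a-zA-Z]+$')

print(repr(star.search('Flow Control None').group()))  # 'None'
print(repr(star.search('Flow Control ').group()))      # '' (zero-length match)
print(plus.search('Flow Control '))                    # None (no match at all)
```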

Python Regex Search Returns Positive Non-Integer Number <1 As Empty String

I am using Python Regex module to search a string, an example of string of interest is "*MBps 2.57".
I am using the following code:
temp_string = re.search('MBps, \d*\.?\d*', line)
if (temp_string != None):
    temp_number = re.split(' ', temp_string.group(), 1)
I want to find instances where MBps is > 0, then take that number and process it.
The code works fine as long as the number after MBps is > 1. For example, if it's 'MBps 182.57', the RegEx object when converted to string shows 'MBps, 182.57'.
However, when the number after MBps is <1, for example, if it's 'MBps 0.31', then RegEx object returned shows 'MBps' but no number. It's just an empty string following the first match.
I have tried different regex matching strategies (re.match, re.findall), but none seemed to work correctly. In the regex101 testing site, it showed the regex expression working but I can't get Python regex module to match the behavior.
Any ideas on why it's happening and how to correct it?
Thanks
I would use re.findall here:
import re
inp = "The first speed is 3.14 MBps and the second is 5.43 MBps"
matches = re.findall(r'\b(\d+(?:\.\d+)?) MBps\b', inp)
print(matches)
This prints:
['3.14', '5.43']
OK, I found a way to make this work.
I changed the code to:
temp_string = re.search('MBps, [0-9\.]+', line)
if (temp_string != None):
    temp_number = re.split(' ', temp_string.group(), 1)
That worked to capture all the numbers. I think being explicit in Regex matching rather than just \d+ or \d* makes this work better.
Thanks
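One thing that may contribute to empty results: \d*\.?\d* is allowed to match zero characters, so a pattern built from it can succeed without capturing any digits. Below is a sketch that requires at least one digit and captures the number in a group. It assumes the line looks like the "*MBps 2.57" sample from the question; the names MBPS_RE and extract_mbps are hypothetical:

```python
import re

# One or more digits, optionally followed by a fractional part.
MBPS_RE = re.compile(r'MBps\s+(\d+(?:\.\d+)?)')

def extract_mbps(line):
    """Return the rate after 'MBps' as a float, or None if absent or not > 0."""
    m = MBPS_RE.search(line)
    if m is None:
        return None
    value = float(m.group(1))
    return value if value > 0 else None

print(extract_mbps('*MBps 2.57'))   # 2.57
print(extract_mbps('*MBps 0.31'))   # 0.31
print(extract_mbps('*MBps 0'))      # None
```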

remove all characters aside from the decimal numbers immediately followed by the £ sign

I have text with values like:
this is a value £28.99 (0.28/ml)
I want to remove everything to return the price only so it returns:
£28.99
there could be any number of digits between the £ and .
I think
r"£[0-9]*\.[0-9]{2}"
matches the pattern I want to keep but i'm unsure on how to remove everything else and keep the pattern instead of replacing the pattern like in usual re.sub() cases.
Quoting the question: "I want to remove everything to return the price only".
Why not extract the relevant information instead?
import re
s = "this is a value £28.99 (0.28/ml)"
m = re.search(r"£\d*(\.\d+)?", s)
if m:
    print(m.group(0))
To find several occurrences, use findall or finditer instead of search.
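A sketch of the multi-occurrence case with findall and finditer. The sample text and the name PRICE_RE are made up for illustration, and the pattern assumes prices always have exactly two decimal places:

```python
import re

# No capturing group, so findall returns the full matched strings.
PRICE_RE = re.compile(r'£\d+\.\d{2}')

text = "was £28.99 (0.28/ml), now £5.00 or 2 for £9.50"
print(PRICE_RE.findall(text))  # ['£28.99', '£5.00', '£9.50']

# finditer yields match objects, which also carry positions.
for m in PRICE_RE.finditer(text):
    print(m.start(), m.group())
```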
You don't care how many digits are before the decimal, so using the zero-or-more matcher was correct. However, you could just rely on the digit class (\d) to provide that more succinctly.
The same is true after the decimal point: you only need two digits, so limiting the match to exactly two is correct.
The issue then comes in with how you actually capture the value. You can use a capturing group to be sure that you only ever get the value you care about.
Complete regex:
(£\d*\.\d{2})
Sample code:
import re
r = re.compile(r"(£\d*\.\d{2})")
match = r.findall("this is a value £28.99 (0.28/ml)")
if match:  # findall may return an empty list; check for that here
    print(match[0])  # first match (the contents of the capturing group); prints £28.99
If it's a string, you can do something like this:
x = "this is a value £28.99 (0.28/ml)"
x_list = x.split()
for i in x_list:
    if "£" in i:  # or: if i.startswith("£") (credit: Jean-François Fabre)
        value = i
print(value)
Output:
£28.99
You can try:
import re
t = "this is a value £28.99 (0.28/ml)"
r = re.sub(r".*(£[\d.]+).*", r"\1", t)
print(r)
Output:
£28.99

simple regex pattern not matching [duplicate]

>>> import re
>>> s = 'this is a test'
>>> reg1 = re.compile('test$')
>>> match1 = reg1.match(s)
>>> print match1
None
In Kiki, that pattern matches the "test" at the end of s. What am I missing? (I tried re.compile(r'test$') as well.)
Use
match1 = reg1.search(s)
instead. The match function only matches at the start of the string ... see the documentation here:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
Your regex does not match the full string. You can use search instead as Useless mentioned, or you can change your regex to match the full string:
'^this is a test$'
Or somewhat harder to read but somewhat less useless:
'^t[^t]*test$'
It depends on what you're trying to do.
This is because the match method returns None if it cannot find the expected pattern; if it does find the pattern, it returns a match object (shown as _sre.SRE_Match in older Python versions and as re.Match in recent ones).
So, if you want a Boolean (True or False) result from match, you must check whether the result is None.
You could check whether a text matches like this:
import re
string_to_evaluate = "Your text that needs to be examined"
expected_pattern = "pattern"
if re.match(expected_pattern, string_to_evaluate) is not None:
    print("The text is as you expected!")
else:
    print("The text is not as you expected!")
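Since None is falsy and a match object is always truthy, the result can also be used directly in a condition or passed to bool(). A short sketch using the string from the question:

```python
import re

s = 'this is a test'

print(bool(re.match('test$', s)))   # False: match only tries the start of s
print(bool(re.search('test$', s)))  # True: search scans the whole string

if re.search('test$', s):
    print("The text ends with 'test'")
```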

python regex for repeating string

I want to verify and then parse this string:
string = "start: c12354, c3456, 34526; other stuff that I don't care about"
# Note that some codes begin with 'c'
I would like to verify that the string starts with 'start:' and ends with ';'
Afterward, I would like to have a regex parse out the strings. I tried the following python re code:
regx = r"start: (c?[0-9]+,?)+;"
reg = re.compile(regx)
matched = reg.search(string)
print(' matched.groups()', matched.groups())
I have tried different variations but I can either get the first or the last code but not a list of all three.
Or should I abandon using a regex?
EDIT: updated to reflect part of the problem space I neglected and fixed string difference.
Thanks for all the suggestions - in such a short time.
In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).
Your easiest solution is to first extract the part between start: and ; and then use a regular expression that returns all matches, not just a single one, via re.findall('c?[0-9]+', text).
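A minimal sketch of both points: a repeated group keeps only its last capture, while the two-step approach returns every code. The helper name parse_codes and the exact patterns are illustrative assumptions, not the only way to write this:

```python
import re

s = "start: c12354, c3456, 34526; other stuff that I don't care about"

# A repeated capturing group keeps only the last thing it captured.
repeated = re.search(r"start: (?:(c?[0-9]+),? ?)+;", s)
print(repeated.groups())  # ('34526',)

def parse_codes(text):
    """Return the codes between 'start:' and ';', or None if the format is wrong."""
    m = re.match(r"start:([^;]*);", text)
    if m is None:
        return None
    # Step two: collect every code from the extracted part.
    return re.findall(r"c?[0-9]+", m.group(1))

print(parse_codes(s))  # ['c12354', 'c3456', '34526']
```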
You could use the standard string tools, which are pretty much always more readable.
s = "start: c12354, c3456, 34526;"
s.startswith("start:")  # returns True if the string starts with this prefix
s.endswith(";")  # returns True if the string ends with this suffix
s[6:-1].strip().split(', ')  # gives a list of tokens separated by ", " (strip drops the space after "start:")
This can be done (pretty elegantly) with a tool like Pyparsing:
from pyparsing import Group, Literal, OneOrMore, Optional, ParseException, Word
import string
code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:
with open('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
            codes = [c[1] for c in result[1:-1]]
            # Do something with teh codez...
        except ParseException:
            # Oh noes: string doesn't match!
            continue
Cleaner than a regular expression, returns a list of codes (no need to string.split), and ignores any extra characters in the line, just like your example.
import re
sstr = re.compile(r'start:([^;]*);')
slst = re.compile(r'(?:c?)(\d+)')
mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = re.match(sstr, mystr)
if match:
    res = re.findall(slst, match.group(0))
results in
['12354', '3456', '34526']
