Python REGEX search returns None as answer - python

I'm getting None as the result of this code. When I match either the email or the phone alone, the code works, but matching both returns None. Please help!
import re
string = 'Love, Kenneth, kenneth+challenge#teamtreehouse.com, 555-555-5555, #kennethlove Chalkley, Andrew, andrew#teamtreehouse.co.uk, 555-555-5556, #chalkers McFarland, Dave, dave.mcfarland#teamtreehouse.com, 555-555-5557, #davemcfarland Kesten, Joy, joy#teamtreehouse.com, 555-555-5558, #joykesten'
contacts = re.search(r'''
^(?P<email>[-\w\d.+]+#[-\w\d.]+) # Email
(?P<phone>\d{3}-\d{3}-\d{4})$ # Phone
''', string, re.X|re.M)
print(contacts.groupdict)

Perhaps you want:
(?P<email>[-\w\d.+]+#[-\w\d.]+), (?P<phone>\d{3}-\d{3}-\d{4})
This matches the parts:
kenneth+challenge#teamtreehouse.com, 555-555-5555
andrew#teamtreehouse.co.uk, 555-555-5556
dave.mcfarland#teamtreehouse.com, 555-555-5557
joy#teamtreehouse.com, 555-555-5558

You are using ^ and $, which (with re.M) force the match to span an entire line. Your regexp seems designed to match only a substring, and nothing in it accounts for the ", " between the email and the phone.
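To make the difference concrete, here is a minimal sketch that runs an anchored pattern and the unanchored one against the sample data. It assumes the data exactly as shown in the question, where '#' appears in the position an email address would normally have an at sign; the variable names are made up for illustration.

```python
import re

# Sample data as shown in the question ('#' sits where an email
# address would normally have an at sign).
string = ('Love, Kenneth, kenneth+challenge#teamtreehouse.com, 555-555-5555, #kennethlove '
          'Chalkley, Andrew, andrew#teamtreehouse.co.uk, 555-555-5556, #chalkers '
          'McFarland, Dave, dave.mcfarland#teamtreehouse.com, 555-555-5557, #davemcfarland '
          'Kesten, Joy, joy#teamtreehouse.com, 555-555-5558, #joykesten')

# Anchored, and with nothing between the two groups: no line consists of
# an email immediately followed by a phone number, so search() finds nothing.
anchored = re.search(
    r'^(?P<email>[-\w.+]+#[-\w.]+)(?P<phone>\d{3}-\d{3}-\d{4})$',
    string, re.M)
print(anchored)  # None

# Unanchored, with the ", " separator spelled out.
pattern = re.compile(r'(?P<email>[-\w.+]+#[-\w.]+), (?P<phone>\d{3}-\d{3}-\d{4})')
contacts = [m.groupdict() for m in pattern.finditer(string)]
for contact in contacts:
    print(contact['email'], contact['phone'])
```

Note that groupdict is a method, so it has to be called (groupdict()) to get the dictionary; finditer is used here because search stops at the first match.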

Related

Regular expression to match a phone number in string

I am trying to match a phone number in a specific format like 021-768-4444. To do that, I wrote a program that recognizes a valid phone number when a string is passed to the regular expression, and it does this successfully. But when I pass a phone number that is not in this format, it also reports a match instead of giving me None:
Here is the code:
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-42424854-201.')
if mo is not None:
    print('Phone number found: ' + mo.group())
else:
    print("Pattern is not matched")
The above code gives me this output:
Phone number found: 415-555-4242
while I was expecting None, because I know that the search() method returns None if the regex pattern is not found in the string.
But if I pass a correct phone number, it works as expected:
mo = phoneNumRegex.search('My number is 415-555-4242.')
This is very strange behavior to me. Can someone show me where I am going wrong?
Any help would be really appreciated. Thanks
Use lookarounds to require that no digit comes directly before or after the number:
(?<!\d)\d{3}\-\d{3}\-\d{4}(?!\d)
https://regex101.com/r/VAaA7k/1
phoneNumRegex = re.compile(r'(?<!\d)\d{3}\-\d{3}\-\d{4}(?!\d)')
https://www.ideone.com/4Ajfqe
To prevent other unwanted matches, use a stricter regex pattern:
(?:^|:|(?<=\s))(\d{3}\-\d{3}\-\d{4})(?=[\s,;.]|$)
https://regex101.com/r/VAaA7k/5
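A quick check of the lookaround pattern against both inputs from the question might look like this. It is only a sketch; the find_phone helper name is made up for illustration.

```python
import re

# (?<!\d) : the match must not be preceded by a digit
# (?!\d)  : the match must not be followed by a digit
phone_re = re.compile(r'(?<!\d)\d{3}-\d{3}-\d{4}(?!\d)')

def find_phone(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phone_re.search(text)
    return mo.group() if mo else None

print(find_phone('My number is 415-555-4242.'))          # 415-555-4242
print(find_phone('My number is 415-555-42424854-201.'))  # None
```

The second call returns None because the only 3-3-4 digit run in that string is immediately followed by another digit.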
search checks whether the regex matches anywhere in the string, i.e. it searches for the regex in the string.
match checks whether the regex matches at the beginning of the string (use fullmatch to require that it matches the entire string).
mo = phoneNumRegex.match('My number is 415-555-42424854-201.')
mo is None  # True
Another option is to anchor the pattern to the start and end of the string (it then only accepts strings that consist of the phone number and nothing else):
phoneNumRegex = re.compile(r'^\d\d\d-\d\d\d-\d\d\d\d$')
mo = phoneNumRegex.search('My number is 415-555-42424854-201.')
mo is None  # True
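The difference between the three matching methods can be seen side by side in a short sketch. The variable names are illustrative only.

```python
import re

phone = re.compile(r'\d{3}-\d{3}-\d{4}')
too_long = '415-555-42424854-201'

# search() scans the whole string for the first place the pattern fits.
found_anywhere = phone.search('My number is ' + too_long)
# match() only tries at position 0, but does not require the match
# to reach the end of the string.
at_start = phone.match(too_long)
# fullmatch() requires the pattern to consume the entire string.
whole_string = phone.fullmatch(too_long)

print(found_anywhere.group())  # 415-555-4242
print(at_start.group())        # 415-555-4242
print(whole_string)            # None
```

So match alone would still accept the overlong number when the string starts with it; fullmatch (or the ^...$ anchors) is what rejects it.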
As @deceze replied, your regular expression is working as written: you never told it that the number shouldn't be longer.
Adding \b at the end of the regex will work for you, like below:
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d\b')
mo = phoneNumRegex.search('My number is 415-555-42424854-201.')
if mo is not None:
    print('Phone number found: ' + mo.group())
else:
    print("Pattern is not matched")

How can I select this specific part of a string using regex?

Hi and thank you for your time.
I have the following example string: "Hola Luis," but the string template will always be "Hola {{name}},".
What regex would match any name? You can assume the name always follows "Hola" and a blank space, and has a comma right after it.
Thank you!
You can use the following regular expression, assuming that as you mention, the format is always the same:
import re
s = "Hola Luis,"
re.search(r'Hola (\w+),', s).group(1)
# 'Luis'
s = 'Hola test'
re.match(r'Hola (\w+)', s).groups()[0]
results:
'test'
Continuing from @yatu's answer, without regex:
print("Hola Luis,".split(" ")[1].strip(","))
Explanation:
split(" ") # split the string on spaces
[1] # take the second piece, i.e. the name
strip(",") # strip off any ','
OUTPUT:
Luis
According to Falsehoods Programmers Believe About Names and your requirements, I'll use the following regex: (?<=Hola )[^,]+(?=,).
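As a rough sketch of how the lookaround version behaves, including a name that \w+ alone would cut short (the extract_name helper and the second sample name are invented for illustration):

```python
import re

# (?<=Hola ) : must be preceded by "Hola "
# [^,]+      : one or more characters that are not a comma
# (?=,)      : must be followed by a comma
name_re = re.compile(r'(?<=Hola )[^,]+(?=,)')

def extract_name(greeting):
    """Return the text between 'Hola ' and the next comma, or None."""
    m = name_re.search(greeting)
    return m.group() if m else None

print(extract_name("Hola Luis,"))        # Luis
print(extract_name("Hola María José,"))  # María José
```

Because the pattern accepts anything except a comma, names containing spaces, hyphens or apostrophes are kept whole.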

How to extract function name python regex

Hello, I am trying to extract a function name in Python using regex. However, I am new to Python and nothing seems to be working for me. For example, if I have the string "def myFunction(s): ...." I want to return just myFunction.
import re
def extractName(s):
    string = []
    regexp = re.compile(r"\s*(def)\s+\([^\)]*\)\s*{?\s*")
    for m in regexp.finditer(s):
        string += [m.group()]
    return string
Assumption: You want the name myFunction from "...def myFunction(s):..."
Something is missing from your regex and the way it is structured.
\s*(def)\s+\([^\)]*\)\s*{?\s*
Let's look at it step by step:
\s*: matches zero or more whitespace characters.
(def): matches the word def.
\s+: matches one or more whitespace characters.
\([^\)]*\): matches a parenthesized group with no nested parentheses, e.g. (s).
\s*: matches zero or more whitespace characters.
What comes after that doesn't much matter if you only want the name of the function. The problem is that nothing between \s+ and \( matches the function name, so the regex never matches the thing you want out of it.
You can try this regex if you want to do it with a regex:
\s*(def)\s([a-zA-Z]*)\([a-zA-Z]*\)
With the regex structured this way, you get def myFunction(s) in group 0, def in group 1 and myFunction in group 2. So you can use the following code to get your result:
import re
def extractName(s):
    string = ""
    # group(1) is the keyword 'def', group(2) is the function name
    regexp = re.compile(r"(def)\s([a-zA-Z]*)\([a-zA-Z]*\)")
    for m in regexp.finditer(s):
        string += m.group(2)
    return string
You can check your regex live by going on this site.
Hope it helps!
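The pattern above only accepts letters in the name and in the argument list. A slightly more permissive sketch, which is not taken from the answer, could look like the following; DEF_RE and extract_names are hypothetical names, and the pattern deliberately ignores what is inside the parentheses.

```python
import re

# Hypothetical, more permissive pattern: the name may contain digits and
# underscores (but not start with a digit), and the argument list is not
# inspected at all.
DEF_RE = re.compile(r'\bdef\s+([A-Za-z_]\w*)\s*\(')

def extract_names(source):
    """Return every function name that follows a 'def' keyword in source."""
    return DEF_RE.findall(source)

print(extract_names("def myFunction(s): ...."))  # ['myFunction']
print(extract_names("def foo_bar(a, b=1):\n    pass\n\ndef _x1():\n    pass"))
```

A regex like this has no notion of strings or comments, so a "def" inside a docstring would also be picked up; for real source files, parsing with the standard ast module is likely the more robust route.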

regex to extract data between quotes

As the title says, the string is '="24digit number"' and I want to extract the number between the "" (example: ="000021484123647598423458" should get me '000021484123647598423458').
There are answers that explain how to get data between " characters, but in my case I also need to confirm that =" exists, without capturing it (there are also other "\d{24}" strings, but they are for other stuff).
I couldn't modify these answers to get what I need.
My latest regex was ((?<=\")\d{24}(?=\")) and the string is ="000021484123647598423458".
UPDATE: I think I will settle on the pattern r'^(?:\=\")(\d{24})(?:\")' because I just want to capture the digit characters.
import re
word = '="000021484123647598423458"'
pattern = r'^(?:\=\")(\d{24})(?:\")'
match = re.findall(pattern, word)[0]
Thank you all for suggestions.
You could have it like:
=(['"])(\d{24})\1
See a demo on regex101.com.
In Python:
import re
string = '="000021484123647598423458"'
rx = re.compile(r'''=(['"])(\d{24})\1''')
print(rx.search(string).group(2))
# 000021484123647598423458
Any one of the following matches the quoted number (note that each result includes the surrounding quotes):
>>> st = '="000021484123647598423458"'
>>> import re
>>> re.findall(r'".*\d+.*"',st)
['"000021484123647598423458"']
or
>>> re.findall(r'".*\d{24}.*"',st)
['"000021484123647598423458"']
or
>>> re.findall(r'"\d{24}"',st)
['"000021484123647598423458"']
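For the requirement stated in the question (confirm that =" is present, but return only the digits), one possible sketch uses a lookbehind and a lookahead. The extract_number helper is a made-up name, and this is just one way to do it.

```python
import re

# (?<==") : the digits must be directly preceded by ="
# (?=")   : and directly followed by a closing quote
# Neither lookaround is part of the match, so only the digits are returned.
number_re = re.compile(r'(?<==")\d{24}(?=")')

def extract_number(cell):
    """Return the 24-digit number from ="...", or None if the form does not match."""
    m = number_re.search(cell)
    return m.group() if m else None

print(extract_number('="000021484123647598423458"'))  # 000021484123647598423458
print(extract_number('"000021484123647598423458"'))   # None (no leading =)
```

Since the whole match is already the number, no capture group or indexing into findall results is needed.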

Capturing emails with regex in Python

I will be gathering scattered emails from a larger CSV file. I am just now learning regex. I am trying to extract the emails from this example sentence. However, emails is populating with only the # symbol and the letter immediately before that. Can you help me see what's going wrong?
import re
String = "'Jessica's email is jessica#gmail.com, and Daniel's email is daniel123#gmail.com. Edward's is edwardfountain#gmail.com, and his grandfather, Oscar's, is odawg#gmail.com.'"
emails = re.findall(r'.[#]', String)
names = re.findall(r'[A-Z][a-z]*',String)
print(emails)
print(names)
Your email regex is not working at all: emails = re.findall(r'.[#]', String) matches any single character followed by #.
I would try a different approach: match the sentences and extract (name, email) pairs, with the following empirical assumptions (if your text changes too much, the logic will break):
all names are followed by 's, with " is " somewhere after (using non-greedy .*? to match everything in between)
\w matches any alphanumeric character (or underscore), and the domain has only one dot (otherwise the pattern would also match the final dot of the sentence)
code:
import re
String = "'Jessica's email is jessica#gmail.com, and Daniel's email is daniel123#gmail.com. Edward's is edwardfountain#gmail.com, and his grandfather, Oscar's, is odawg#gmail.com.'"
print(re.findall(r"(\w+)'s.*? is (\w+#\w+\.\w+)",String))
result:
[('Jessica', 'jessica#gmail.com'), ('Daniel', 'daniel123#gmail.com'), ('Edward', 'edwardfountain#gmail.com'), ('Oscar', 'odawg#gmail.com')]
Converting the result to a dict would even give you a name => address mapping:
{'Oscar': 'odawg#gmail.com', 'Jessica': 'jessica#gmail.com', 'Daniel': 'daniel123#gmail.com', 'Edward': 'edwardfountain#gmail.com'}
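The conversion itself is a one-liner, since findall with two groups returns a list of 2-tuples. A minimal sketch, using the sample sentence as shown in the question (with '#' in place of the at sign) and illustrative variable names:

```python
import re

text = ("'Jessica's email is jessica#gmail.com, and Daniel's email is daniel123#gmail.com. "
        "Edward's is edwardfountain#gmail.com, and his grandfather, Oscar's, is odawg#gmail.com.'")

# Each match yields a (name, email) tuple, which dict() accepts directly.
pairs = re.findall(r"(\w+)'s.*? is (\w+#\w+\.\w+)", text)
name_to_email = dict(pairs)

print(name_to_email['Oscar'])  # odawg#gmail.com
```

If the same name appeared twice, the later address would overwrite the earlier one.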
The general case needs more chars (not sure I'm exhaustive):
String = "'Jessica's email is jessica_123#gmail.com, and Daniel's email is daniel-123#gmail.com. Edward's is edward.fountain#gmail.com, and his grandfather, Oscar's, is odawg#gmail.com.'"
print(re.findall(r"(\w+)'s.*? is ([\w\-.]+#[\w\-.]+\.[\w\-]+)",String))
result:
[('Jessica', 'jessica_123#gmail.com'), ('Daniel', 'daniel-123#gmail.com'), ('Edward', 'edward.fountain#gmail.com'), ('Oscar', 'odawg#gmail.com')]
1. Emails
In [1382]: re.findall(r'\S+#\w+\.\w+', text)
Out[1382]:
['jessica#gmail.com',
'daniel123#gmail.com',
'edwardfountain#gmail.com',
'odawg#gmail.com']
How it works: all emails have the form xxx#xxx.xxx. The things to note are the run of characters around the # and the single dot. So we use \S to match anything that is not whitespace, and + to require one or more such characters. \w+\.\w+ matches two runs of word characters separated by exactly one dot.
2. Names
In [1375]: re.findall(r"[A-Z][\S]+(?=')", text)
Out[1375]: ['Jessica', 'Daniel', 'Edward', 'Oscar']
How it works: it matches any word starting with an upper-case letter. The final (?=...) part is a lookahead. As you can see, all names follow the pattern Name's. We want everything before the apostrophe, hence the lookahead, which is not included in the match.
Now, if you want to map names to emails by capturing them together with one massive regex, you can. Jean-François Fabre's answer is a good start. But I recommend getting the basics down pat first.
You need to find anchors, patterns to match. An improved pattern could be:
import re
String = "'Jessica's email is jessica#gmail.com, and Daniel's email is daniel123#gmail.com. Edward's is edwardfountain#gmail.com, and his grandfather, Oscar's, is odawg#gmail.com.'"
# Caveat: the last character class includes '.', so a sentence-ending
# period directly after an address is captured as part of it.
emails = re.findall(r'[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+', String)
names = re.findall(r'[A-Z][a-z]*', String)
print(emails)
print(names)
\w+ is missing '-', which is allowed in email addresses.
This is because you are not using the repeat operator. The code below uses the + operator, which means the character or sub-pattern just before it can repeat one or more times.
import re
s = '''Jessica's email is jessica#gmail.com, and Daniel's email is daniel123#gmail.com. Edward's is edwardfountain#gmail.com, and his grandfather, Oscar's, is odawg#gmail.com.'''
p = r'[a-z0-9]+#[a-z]+\.[a-z]+'
ans = re.findall(p, s)
print(ans)
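To see the effect of the repeat operator directly, the question's pattern and a repeated one can be run on the same text. This is a small sketch on a shortened version of the sample sentence, with illustrative variable names:

```python
import re

s = "Jessica's email is jessica#gmail.com, and Daniel's email is daniel123#gmail.com."

# '.' matches exactly one character, so each hit is just the character
# before '#' plus the '#' itself.
one_char = re.findall(r'.[#]', s)
# '+' lets each character class repeat, so whole addresses are matched.
repeated = re.findall(r'[a-z0-9]+#[a-z]+\.[a-z]+', s)

print(one_char)  # ['a#', '3#']
print(repeated)  # ['jessica#gmail.com', 'daniel123#gmail.com']
```

This reproduces the symptom described in the question: only the # symbol and the character immediately before it.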
