Why is my regex to match a project tag not working? - python

Can anyone provide guidance on why the following regex is not working?
import re
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:'
if re.findall(r'%s-(\d+):' % project, line):
    print(line)
EXPECTED OUTPUT:
AirPortFamily-1425.9

You should match the optional groups of digits preceded by a dot:
for tag in re.findall(r'(%s-\d+(?:\.\d+)*):' % project, line):
    print(tag)
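A minimal sketch of the suggestion above, assuming a tag is the project name, a hyphen, a number, and any number of ".number" suffixes, followed by a colon. The helper name find_tags is hypothetical, and re.escape is an addition so that a project name containing regex metacharacters is matched literally.

```python
import re

def find_tags(project, line):
    """Return every '<project>-<digits>(.<digits>)*' tag that is followed by a colon."""
    # The capturing group stops before the colon, so findall returns the bare tag.
    pattern = r'(%s-\d+(?:\.\d+)*):' % re.escape(project)
    return re.findall(pattern, line)

print(find_tags('AirPortFamily', 'AirPortFamily-1425.9:'))  # ['AirPortFamily-1425.9']
print(find_tags('AirPortFamily', 'AirPortFamily-1425:'))    # ['AirPortFamily-1425']
print(find_tags('AirPortFamily', 'AirPortFamily-1425.9'))   # [] (no trailing colon)
```

Because the pattern contains exactly one capturing group, re.findall returns the group contents rather than the whole match, which is why the colon does not appear in the results.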

Brandon's answer looks good.
But suppose there is an extra condition: for a tag to be valid, it must end with a colon (:).
To cover that condition, I modified Brandon's answer a little:
import re
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:'
matches = re.findall(r'%s-\d+(?:\.\d+)*:$' % project, line)
if matches:
    for elem in matches:
        print(elem.split(':')[0])
Here it is in action:
# Matching lines with a colon (:) at the end
>>> import re
>>> project = 'AirPortFamily'
>>> line = 'AirPortFamily-1425.9:'
>>> matches = re.findall(r'%s-\d+(?:\.\d+)*:$' % project, line)
>>> if matches:
...     for elem in matches:
...         print(elem.split(':')[0])
...
AirPortFamily-1425.9
# The output above is what you want.
# The same regex with a line that has no trailing colon does not match:
>>> line = 'AirPortFamily-1425.9'
>>> matches = re.findall(r'%s-\d+(?:\.\d+)*:$' % project, line)
>>> if matches:
...     for elem in matches:
...         print(elem.split(':')[0])
...
>>> # No output here means no match

Your regex is missing the . after the first set of digits. Here's a working example:
import re
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:'
matches = re.findall(r'%s-\d+\.\d+' % project, line)
if matches:
    print(matches)

To allow for versions like 1.1.1.1.1, I match any run of digits and dots. The match still includes the trailing :, so I just strip it from the result:
if re.findall(r'%s-[\d.]+:' % project, line):
    print(line.strip(':'))
(xenial)vash#localhost:~/python/stack_overflow$ python3.7 formats.py
AirPortFamily-1425.9.1.1.1

This is my regex; I tested it on regex101:
import re
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:AirPortfamily-14.5.9AirPortFamily-14.2.5.9:'
# Each result includes the trailing ':'. To drop it, use:
# [s[:-1] for s in re.findall('%s[0-9.-]+:' % project, line)]
result = re.findall('%s[0-9.-]+:' % project, line)
if result:
    for each in result:
        print(each)
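As a sketch of the two ways to drop the trailing colon mentioned in the comment above: slice it off each match, or wrap everything except the colon in a capturing group so that re.findall returns only that part. The variable names are illustrative, and re.escape is an added precaution rather than part of the original answer.

```python
import re

project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:AirPortfamily-14.5.9AirPortFamily-14.2.5.9:'

# Option 1: match including the colon, then slice the last character off.
with_colon = re.findall(r'%s[0-9.-]+:' % re.escape(project), line)
stripped = [s[:-1] for s in with_colon]

# Option 2: capture everything except the colon.
captured = re.findall(r'(%s[0-9.-]+):' % re.escape(project), line)

print(stripped)  # ['AirPortFamily-1425.9', 'AirPortFamily-14.2.5.9']
print(captured)  # ['AirPortFamily-1425.9', 'AirPortFamily-14.2.5.9']
```

The middle tag is skipped in both cases because it is spelled AirPortfamily (lowercase f) and has no colon after it.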

Related

Regular expression for YYYY-MM-DDTHH:MM:SS is not detecting the presence of .00Z [duplicate]

Suppose I have a string like test-123.
I want to test whether it matches a pattern like test-<number>, where <number> means one or more digit symbols.
I tried this code:
import re
correct_string = 'test-251'
wrong_string = 'test-123x'
regex = re.compile(r'test-\d+')
if regex.match(correct_string):
    print('Matching correct string.')
if regex.match(wrong_string):
    print('Matching wrong_string.')
How can I make it so that only the correct_string matches, and the wrong_string doesn't? I tried using .search instead of .match but it didn't help.
Try specifying the start and end anchors in your regex:
re.compile(r'^test-\d+$')
For an exact match, use regex = r'^(some-regex-here)$'
^ : Start of string
$ : End of string
Since Python 3.4 you can use re.fullmatch to avoid adding ^ and $ to your pattern.
>>> import re
>>> p = re.compile(r'\d{3}')
>>> bool(p.match('1234'))
True
>>> bool(p.fullmatch('1234'))
False
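To make the difference concrete, here is a small sketch comparing an unanchored match, the ^...$ version, and fullmatch on the strings from the question. The helper name is_exact is hypothetical. One detail worth knowing: without re.MULTILINE, $ also matches just before a trailing newline, whereas fullmatch requires the pattern to cover the entire string.

```python
import re

unanchored = re.compile(r'test-\d+')
anchored = re.compile(r'^test-\d+$')

def is_exact(s):
    """True only if the whole string is 'test-' followed by digits."""
    return unanchored.fullmatch(s) is not None

for candidate in ('test-251', 'test-123x', 'test-251\n'):
    print(repr(candidate),
          bool(unanchored.match(candidate)),  # prefix match only
          bool(anchored.match(candidate)),    # ^...$ version
          is_exact(candidate))                # fullmatch
# 'test-251'   True True  True
# 'test-123x'  True False False
# 'test-251\n' True True  False
```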
I think this may help you:
import re
pattern = r"test-[0-9]+$"
s = input()
if re.match(pattern, s):
    print('matched')
else:
    print('not matched')
You can try re.findall():
import re
correct_string = 'test-251'
if len(re.findall(r"test-\d+", correct_string)) > 0:
    print("Match found")
A pattern such as \btest-\d+\b should work:
matches = re.search(r'\btest-\d+\b', search_string)
This requires word boundaries on both sides of the match, so it prevents letters, digits, or underscores from occurring immediately after the digits.
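A short sketch of how the word-boundary version behaves, using made-up input strings. Note that \b is not the same as anchoring to the whole string: the pattern still finds the tag inside a longer sentence, and a following hyphen counts as a boundary.

```python
import re

bounded = re.compile(r'\btest-\d+\b')

def first_tag(s):
    """Return the first word-bounded 'test-<digits>' in s, or None."""
    m = bounded.search(s)
    return m.group(0) if m else None

print(first_tag('test-251'))          # test-251
print(first_tag('test-123x'))         # None ('x' directly follows the digits)
print(first_tag('run test-251 now'))  # test-251 (found inside a longer string)
print(first_tag('test-251-beta'))     # test-251 ('-' is not a word character)
```

If the requirement is that the whole string must be the tag, fullmatch or explicit anchors are the safer choice.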

Python Regex find all matches after specific word

I have a string as below
"Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
I'm currently struggling to write a function in python that will find all aliases and put them in a list. So basically, I need a list that will be ['alias1.myserver.mysite.com', 'myserver.mysite.com']
I tried the following code
pattern = '(?<=Aliases: )([\S*]+)'
name = re.findall(pattern, mystring)
but it only matches the first alias and not both of them.
Any ideas on this?
Greatly appreciated!
Try the following:
import re
s = "Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
l = re.findall(r'\S+', s.split('Aliases: ')[1])
print(l)
Prints:
['alias1.myserver.mysite.com', 'myserver.mysite.com']
Explanation
First we split the string into two pieces and keep the second piece with s.split('Aliases: ')[1]. This evaluates to the part of the string that follows 'Aliases: '.
Next we use findall with the regular expression:
\S+
This matches every run of one or more consecutive non-whitespace characters.
But this can be more simply done in this case without using a regex:
s = "Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
l = s.split('Aliases: ')[1].split()
print(l)
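Both versions above assume that the aliases are the last section of the output. As a hedged sketch for the case where more text might follow, the hypothetical helper extract_aliases below captures only the block between 'Aliases:' and the next blank line (or the end of the text) and then splits it on whitespace. The assumption that a blank line ends the alias block comes from the sample string, not from any documented format.

```python
import re

def extract_aliases(text):
    """Return the whitespace-separated names in the 'Aliases:' block, or []."""
    # Lazily capture up to the first blank line (\r\n\r\n or \n\n) or end of text.
    m = re.search(r'Aliases:(.*?)(?:\r?\n\r?\n|\Z)', text, re.DOTALL)
    return m.group(1).split() if m else []

s = ("Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\n"
     "Name: myserver.mysite.com\r\nAddress: 123.144.412.111\r\n"
     "Aliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n")

print(extract_aliases(s))
# ['alias1.myserver.mysite.com', 'myserver.mysite.com']
print(extract_aliases("Server: myserver.mysite.com\r\n"))
# []
```

Unlike s.split('Aliases: ')[1], this returns an empty list instead of raising IndexError when there is no Aliases section.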
Try this:
import re
regex = re.compile(r'[\n\r\t]')
t="Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
t = regex.sub(" ", t)
t = t.split("Aliases:")[1].strip().split()
print(t)

When using "re.search", how do I search for the second instance rather than the first?

re.search looks for the first instance of something. In the following code, "\t" appears twice. Is there a way to make it skip forward to the second instance?
import re
code = '69.22\t82.62\t134.549\n'
results = []
text = code
m = re.search('\t(.+?)\n', text)
if m:
    found = m.group(1)
    results.append(found)
result:
results = ['82.62\t134.549']
expected:
results = ['134.549']
This modified version of your expression does return the desired output:
import re
code = '69.22\t82.62\t134.549\n'
print(re.findall(r'.*\t(.+?)\n', code))
Output
['134.549']
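Alternatively, you may be looking for an expression built on lookarounds, something like the one below. Note that with re.search it still starts at the first tab, so on its own it returns the same text as your original pattern:
(?<=[\t])(.+?)(?=[\n])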
To capture what follows the second tab specifically (or a later one, by changing the count), you can count the tabs explicitly.
You can do it like this:
^(?:[^\t]*\t){2}(.*?)\n
Explained
^              # Beginning of string
(?:            # Non-capturing group
  [^\t]*       #   Any number of non-tab characters
  \t           #   A tab
){2}           # End of group, repeated 2 times
( .*? )        # Group 1: anything, as little as possible, up to
\n             # the first newline
Python code
>>> import re
>>> text = '69.22\t82.62\t134.549\n'
>>> m = re.search('^(?:[^\t]*\t){2}(.*?)\n', text)
>>> if m:
...     print(m.group(1))
...
134.549
>>>
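The same idea can be parameterized. This is only a sketch: the helper name text_after_nth_tab is hypothetical, and it simply substitutes the tab count into the pattern shown above.

```python
import re

def text_after_nth_tab(text, n):
    """Return the text between the n-th tab and the first newline, or None."""
    # Skip exactly n "non-tabs followed by a tab" chunks, then capture lazily.
    pattern = r'^(?:[^\t]*\t){%d}(.*?)\n' % n
    m = re.search(pattern, text)
    return m.group(1) if m else None

text = '69.22\t82.62\t134.549\n'
print(repr(text_after_nth_tab(text, 1)))  # '82.62\t134.549'
print(repr(text_after_nth_tab(text, 2)))  # '134.549'
print(repr(text_after_nth_tab(text, 3)))  # None (there are only two tabs)
```

For plain tab-separated data, text.rstrip('\n').split('\t')[2] gives the third field without a regex at all.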

Brackets in Python regex

Here's a snippet of my code:
import re
prog2 = re.compile(r'\[\w\]')
activity = "[CS150]"
if prog2.match(activity):
    print('matched')
else:
    print('unmatched')
I don't know why it prints unmatched; I think I wrote the pattern correctly and gave it valid input.
You need to match more than one character:
prog2 = re.compile(r'\[\w+\]')
Note the + quantifier. Without it, the \w character class matches exactly one character; with it, the pattern matches one or more characters.
Demo:
>>> import re
>>> prog2 = re.compile(r'\[\w+\]')
>>> activity = "[CS150]"
>>> prog2.match(activity)
<_sre.SRE_Match object at 0x106b2f6b0>
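A small sketch comparing the pattern with no quantifier, with +, and with * on a few bracketed strings. The variable names and the extra test inputs are made up for illustration.

```python
import re

one = re.compile(r'\[\w\]')    # exactly one word character between the brackets
plus = re.compile(r'\[\w+\]')  # one or more word characters
star = re.compile(r'\[\w*\]')  # zero or more word characters

for activity in ('[CS150]', '[C]', '[]'):
    print(activity,
          bool(one.match(activity)),
          bool(plus.match(activity)),
          bool(star.match(activity)))
# [CS150] False True  True
# [C]     True  True  True
# []      False False True
```

The choice between + and * only matters for empty brackets: + rejects them, * accepts them.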
You have to add a quantifier such as + (or *) after \w. Otherwise it matches only a single character.
The code below gives the expected result:
import re
prog2 = re.compile(r'\[\w+\]')
activity = "[CS150]"
if prog2.match(activity):
    print('matched')
else:
    print('unmatched')
>>matched
Hope that was helpful!
