This is one of those things where I'm sure I'm missing something simple. In the sample program below, I'm trying to use Python's re module to parse the string "line" and extract the floating-point number just before the percent sign, i.e. "90.31". But the code always prints "no match".
I've tried a couple of other regular expressions as well, all with the same result. What am I missing?
#!/usr/bin/python
import re
line = ' 0 repaired, 90.31% done'
pct_re = re.compile(' (\d+\.\d+)% done$')
#pct_re = re.compile(', (.+)% done$')
#pct_re = re.compile(' (\d+.*)% done$')
match = pct_re.match(line)
if match: print 'got match, pct=' + match.group(1)
else: print 'no match'
match only matches from the beginning of the string. Your code works fine if you do pct_re.search(line) instead.
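A minimal sketch of the difference, using the same line and pattern as in the question. It is written with Python 3 syntax (print as a function) and a raw-string pattern, both of which are assumptions that differ from the original program:

```python
import re

line = ' 0 repaired, 90.31% done'
pct_re = re.compile(r' (\d+\.\d+)% done$')

# match() only tries the pattern at position 0, where the text is ' 0 repaired...'
at_start = pct_re.match(line)
print(at_start)            # None

# search() scans forward until the pattern fits somewhere in the string
found = pct_re.search(line)
print(found.group(1))      # 90.31
```

The pattern itself was fine all along; only the method used to apply it needed to change.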
You could also use re.findall, which returns every match as a list:
>>> import re
>>> line = ' 0 repaired, 90.31% done'
>>> pattern = re.compile(r"\d+[.]\d+(?=%)")
>>> re.findall(pattern, line)
['90.31']
re.match only matches at the start of the string, so with match the regex has to cover the string from its first character.
Try this if you really want to use match:
re.match(r'.*?(\d+\.\d+)% done$', line)
The leading .*? is non-greedy on purpose: a greedy .* would swallow the "9" and leave only "0.31" in group 1.
r'...' is a "raw" string, in which backslashes are not treated as escape characters; using it for regex patterns in Python is good practice. – kratenko
Related
Suppose I have a string like test-123.
I want to test whether it matches a pattern like test-<number>, where <number> means one or more digit symbols.
I tried this code:
import re
correct_string = 'test-251'
wrong_string = 'test-123x'
regex = re.compile(r'test-\d+')
if regex.match(correct_string):
    print 'Matching correct string.'
if regex.match(wrong_string):
    print 'Matching wrong_string.'
How can I make it so that only the correct_string matches, and the wrong_string doesn't? I tried using .search instead of .match but it didn't help.
Try specifying the start and end anchors in your regex:
re.compile(r'^test-\d+$')
For exact match regex = r'^(some-regex-here)$'
^ : Start of string
$ : End of string
Since Python 3.4 you can use re.fullmatch to avoid adding ^ and $ to your pattern.
>>> import re
>>> p = re.compile(r'\d{3}')
>>> bool(p.match('1234'))
True
>>> bool(p.fullmatch('1234'))
False
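As a rough comparison, here are the three approaches applied to the strings from the question (Python 3 syntax). The trailing-newline sample is an extra case added purely for illustration; it is where `$` and fullmatch differ, because `$` also matches just before a newline at the very end of the string.

```python
import re

unanchored = re.compile(r'test-\d+')
anchored = re.compile(r'^test-\d+$')

# The last sample is a made-up edge case, not from the question.
samples = ['test-251', 'test-123x', 'test-123\n']

results = {}
for s in samples:
    results[s] = (
        bool(unanchored.match(s)),      # prefix match only
        bool(anchored.match(s)),        # whole string, but tolerates a final '\n'
        bool(unanchored.fullmatch(s)),  # whole string, strictly
    )
    print(repr(s), results[s])
```

If the input may carry a trailing newline and that should count as a mismatch, fullmatch (or `\Z` in place of `$`) is the stricter choice.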
I think this may help you:
import re
pattern = r"test-[0-9]+$"
s = input()
if re.match(pattern, s):
    print('matched')
else:
    print('not matched')
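You can try re.findall(), though note that it looks for the pattern anywhere in the string, so it would also report a match for 'test-123x':
import re
correct_string = 'test-251'
if len(re.findall(r"test-\d+", correct_string)) > 0:
    print "Match found"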
A pattern such as \btest-\d+\b should work:
matches = re.search(r'\btest-\d+\b', search_string)
This requires word boundaries on both sides of the match, which prevents other word characters from occurring directly after it.
I am new to Python and have a question about using regex on strings. Currently I have:
def find_ips(ip):
    ip_str = '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
    p = re.compile(ip_str)
    m = p.match(ip)
    if m:
        print 'match found'
    else:
        print 'no match'
    global find_addr
    find_addr = p.match(ip)
    return find_addr
find_ips('this is an ip 127.0.0.1 10.0.10.5')
print find_addr
This returns 'no match'. I'm not seeing what I'm missing so far. I am trying to extract the IP addresses out of this string, but first I have to find them. In a regex editor, that same pattern does find those IPs. Any help is appreciated.
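re.match only finds a match if it is at the beginning of the string. re.search will look through the entire string for a match.
Also, use a raw string for the pattern. In a normal string literal, '\b' is a backspace character rather than a word boundary, so this pattern would not match even with re.search: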
ip_str = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
# ^
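To see why the r prefix matters for this particular pattern, here is a small sketch (Python 3 syntax). The variable names are made up for illustration, and the non-raw pattern is assembled from pieces only so that the example runs without escape-sequence warnings; the resulting string is the same one the original non-raw literal produces.

```python
import re

text = 'this is an ip 127.0.0.1 10.0.10.5'

# In an ordinary string literal, '\b' is a single backspace character (0x08).
print('\b' == '\x08', len('\b'), len(r'\b'))   # True 1 2

ip_body = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
without_raw = '\b' + ip_body + '\b'     # backspace characters around the pattern
with_raw = r'\b' + ip_body + r'\b'      # real word-boundary assertions

print(re.search(without_raw, text))          # None: the text contains no backspaces
print(re.search(with_raw, text).group())     # 127.0.0.1
```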
On a slightly unrelated note:
find_ips('this is an ip 127.0.0.1 10.0.10.5')
print find_addr
is a bit kludgy. Making use of the return value in the caller is much better than doing funky stuff with globals:
print find_ips('...')
re.match() matches only from the beginning of the string. I would use re.findall() here if you want to find all of the addresses. It's also good practice to use raw string notation for your pattern.
>>> import re
>>> def find_ips(text):
...     m = re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', text)
...     return ', '.join(m)
...
>>> print find_ips('this is an ip 127.0.0.1 10.0.10.5')
127.0.0.1, 10.0.10.5
from re import findall
# The string to be checked.
string = 'this is a string 126.32.13.1 with ips in 132.31.3.1 it'
# Print the matches of the regex in the string.
print findall(r'\d+\.\d+\.\d+\.\d+', string)
# Output
# ['126.32.13.1', '132.31.3.1']
I am wanting to verify and then parse this string (in quotes):
string = "start: c12354, c3456, 34526; other stuff that I don't care about"
//Note that some codes begin with 'c'
I would like to verify that the string starts with 'start:' and ends with ';'
Afterward, I would like to have a regex parse out the strings. I tried the following python re code:
regx = r"start: (c?[0-9]+,?)+;"
reg = re.compile(regx)
matched = reg.search(string)
print ' matched.groups()', matched.groups()
I have tried different variations but I can either get the first or the last code but not a list of all three.
Or should I abandon using a regex?
EDIT: updated to reflect part of the problem space I neglected and fixed string difference.
Thanks for all the suggestions - in such a short time.
In Python, this isn't possible with a single regular expression: each new capture of a group overwrites the previous capture of that same group (in .NET this would actually be possible, since the engine distinguishes between captures and groups).
The easiest solution is to first extract the part between start: and ; and then use re.findall('c?[0-9]+', text) on that part to return all matches rather than a single one.
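The two steps could look roughly like this (a sketch in Python 3 syntax; the variable names framed and codes are only illustrative):

```python
import re

text = "start: c12354, c3456, 34526; other stuff that I don't care about"

# Step 1: verify the framing and capture everything between 'start:' and ';'
framed = re.match(r'start:([^;]*);', text)

# Step 2: pull every code out of the captured part
codes = re.findall(r'c?[0-9]+', framed.group(1)) if framed else []
print(codes)   # ['c12354', 'c3456', '34526']
```

An empty list here means either that the framing did not match or that no codes were found between the markers; check framed directly if those cases need to be told apart.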
You could use the standard string tools, which are pretty much always more readable.
s = "start: c12354, c3456, 34526;"
s.startswith("start:")  # True if s starts with this string
s.endswith(";")  # True if s ends with this string
s[6:-1].strip().split(', ')  # drops "start:" and ";", strips the leading space, then splits on ", "
This can be done (pretty elegantly) with a tool like Pyparsing:
from pyparsing import Group, Literal, OneOrMore, Optional, ParseException, Word
import string
code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:
with open('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
            # Each group is [prefix, digits, separator]; keep the digits.
            codes = [c[1] for c in result[1:-1]]
            # Do something with teh codez...
        except ParseException:
            # Oh noes: string doesn't match!
            continue
Cleaner than a regular expression, returns a list of codes (no need to string.split), and ignores any extra characters in the line, just like your example.
import re
sstr = re.compile(r'start:([^;]*);')
slst = re.compile(r'(?:c?)(\d+)')
mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = re.match(sstr, mystr)
if match:
    # group(1) is the text between 'start:' and ';'
    res = re.findall(slst, match.group(1))
results in
['12354', '3456', '34526']
In Perl it is possible to do something like this (I hope the syntax is right...):
$string =~ m/lalala(I want this part)lalala/;
$whatIWant = $1;
I want to do the same in Python and get the text inside the parentheses as a string, like $1.
If you want to get parts by name you can also do this:
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcom Reynolds")
>>> m.groupdict()
{'first_name': 'Malcom', 'last_name': 'Reynolds'}
The example was taken from the re docs
See: Python regex match objects
>>> import re
>>> p = re.compile("lalala(I want this part)lalala")
>>> p.match("lalalaI want this partlalala").group(1)
'I want this part'
import re
astr = 'lalalabeeplalala'
match = re.search('lalala(.*)lalala', astr)
whatIWant = match.group(1) if match else None
print(whatIWant)
A small note: in Perl, when you write
$string =~ m/lalala(.*)lalala/;
the regexp can match anywhere in the string. The equivalent is accomplished with the re.search() function, not the re.match() function, which requires that the pattern match starting at the beginning of the string.
import re
data = "some input data"
m = re.search("some (input) data", data)
if m:  # "if match was successful" / "if matched"
    print m.group(1)
Check the docs for more.
There's no need for a regex. Think simple.
>>> "lalala(I want this part)lalala".split("lalala")
['', '(I want this part)', '']
>>> "lalala(I want this part)lalala".split("lalala")[1]
'(I want this part)'
>>>
import re
match = re.match('lalala(I want this part)lalala', 'lalalaI want this partlalala')
print match.group(1)
import re
string_to_check = "other_text...lalalaI want this partlalala...other_text"
p = re.compile("lalala(I want this part)lalala") # regex pattern
m = p.search(string_to_check) # use p.match if what you want is always at beginning of string
if m:
    print m.group(1)
While converting a Perl program that parses function names out of modules to Python, I ran into this problem: I received an error saying that "group" was undefined. I soon realized that the exception was being thrown because p.match / p.search returns None if there is no match.
None has no group method, so to avoid the exception, check that a match object was returned before calling group on it.
import re
filename = './file_to_parse.py'
p = re.compile(r'def (\w*)')  # \w* greedily matches characters from [a-zA-Z0-9_]
with open(filename, 'r') as f:
    for each_line in f:
        m = p.match(each_line)  # tries to match the pattern at the start of the line
        if m:
            m = m.group(1)
            print m
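A small self-contained variant of the same guard, using an in-memory list of lines instead of a file. The sample lines and the helper name function_name are made up for illustration, and the sketch uses Python 3 syntax:

```python
import re

DEF_RE = re.compile(r'def (\w+)')

def function_name(line):
    """Return the function name if the line starts with 'def ', else None."""
    m = DEF_RE.match(line)
    # match() returns None when nothing matched, so guard before calling group()
    return m.group(1) if m else None

lines = ['def parse(args):', 'x = 1', 'def main():']
names = [function_name(l) for l in lines]
print(names)   # ['parse', None, 'main']
```

Returning None from the helper pushes the same check onto the caller, which keeps the "no match" case explicit rather than raising an AttributeError deep inside the loop.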