Regex issue in python

Regex issue in python - python

I have a regex "value=4020a345-f646-4984-a848-3f7f5cb51f21"
if re.search( "value=\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*", x ):
x = re.search( "value=\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*", x )
m = x.group(1)
m only gives me 4020a345, not sure why it does not give me the entire "4020a345-f646-4984-a848-3f7f5cb51f21"
Can anyone tell me what i am doing wrong?

try out this regex, looks like you are trying to match a GUID
value=[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}

This should match what you want, if all the strings are of the form you've shown:
value=((\w*\d*\-?)*)
You can also use this website to validate your regular expressions:
http://regex101.com/

The below regex works as you expect.
value=([\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*]+)

You are trying to match on some hex numbers, that is why this regex is more correct than using [\w\d]
pattern = "value=([0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})"
data = "value=4020a345-f646-4984-a848-3f7f5cb51f21"
res = re.search(pattern, data)
print(res.group(1))
If you dont care about the regex safety, aka checking that it is correct hex, there is no reason not to use simple string manipulation like shown below.
>>> data = "value=4020a345-f646-4984-a848-3f7f5cb51f21"
>>> print(data[7:])
020a345-f646-4984-a848-3f7f5cb51f21
>>> # or maybe
...
>>> print(data[7:].replace('-',''))
020a345f6464984a8483f7f5cb51f21

You can get the subparts of the value as a list
txt = "value=4020a345-f646-4984-a848-3f7f5cb51f21"
parts = re.findall('\w+', txt)[1:]
parts is ['4020a345', 'f646', '4984', 'a848', '3f7f5cb51f21']
if you really want the entire string
full = "-".join(parts)
A simple way
full = re.findall("[\w-]+", txt)[-1]
full is 4020a345-f646-4984-a848-3f7f5cb51f21

value=([\w\d]*\-[\w\d]*\-[\w\d]*\-[\w\d]*\-[\w\d]*)
Try this.Grab the capture.Your regex was not giving the whole as you had used | operator.So if regex on left side of | get satisfied it will not try the latter part.
See demo.
http://regex101.com/r/hQ1rP0/45

Related

Regex Expression not matching correctly

I'm tackling a python challenge problem to find a block of text in the format xXXXxXXXx (lower vs upper case, not all X's) in a chunk like this:
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
I have tested the following RegEx and found it correctly matches what I am looking for from this site (http://www.regexr.com/):
'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])'
However, when I try to match this expression to the block of text, it just returns the entire string:
In [1]: import re
In [2]: example = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
In [3]: expression = re.compile(r'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])')
In [4]: found = expression.search(example)
In [5]: print found.string
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
Any ideas? Is my expression incorrect? Also, if there is a simpler way to represent that expression, feel free to let me know. I'm fairly new to RegEx.

You need to return the match group instead of the string attribute.
>>> import re
>>> s = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
>>> rgx = re.compile(r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]')
>>> found = rgx.search(s).group()
>>> print found
nJDKoJIWh

The string attribute always returns the string passed as input to the match. This is clearly documented:
string
The string passed to match() or search().
The problem has nothing to do with the matching, you're just grabbing the wrong thing from the match object. Use match.group(0) (or match.group()).

Based on xXXXxXXXx if you want upper letters with len 3 and lower with len 1 between them this is what you want :
([a-z])(([A-Z]){3}([a-z]))+
also you can get your search function with group()
print expression.search(example).group(0)

Python regex to match multiple times

I'm trying to match a pattern against strings that could have multiple instances of the pattern. I need every instance separately. re.findall() should do it but I don't know what I'm doing wrong.
pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
I need 'http://url.com/123', http://url.com/456 and the two numbers 123 & 456 to be different elements of the match list.
I have also tried '/review: ((http://url.com/(\d+)\s?)+)/' as the pattern, but no luck.

Use this. You need to place 'review' outside the capturing group to achieve the desired result.
pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)
This gives output
>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]

You've got extra /'s in the regex. In python the pattern should just be a string. e.g. instead of this:
pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
It should be:
pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
Also typically in python you'd actually use a "raw" string like this:
pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
The extra r on the front of the string saves you from having to do lots of backslash escaping etc.

Use a two-step approach: First get everything from "review:" to EOL, then tokenize that.
msg = 'this is the message. review: http://url.com/123 http://url.com/456'
review_pattern = re.compile('.*review: (.*)$')
urls = review_pattern.findall(msg)[0]
url_pattern = re.compile("(http://url.com/(\d+))")
url_pattern.findall(urls)

What should I use the Non-greedy match in this case

Assume I have a string which includes some data fields that are separated by "|", like
|1|2|3|4|5|6|7|8|
My purpose is to get the 8th field. This is what I'm doing:
pattern = re.compile(r'^\s+(\|.*?\|){8}')
match = pattern.match(test_line)
if match:
print:match.group(8)
But looks like it can not match. I know in this case I need to use ? for non-greedy match, but why I can not get the 8th field?
Thanks

Regex might be complicating this problem rather than simplifying it. A simple way to get an eighth item from a | delimited string is using split():
a = '|here|is|some|data|separated|by|bars|hooray!|'
print a.split('|')[8]
RETURNS
hooray!
Using regex, one way to get it would be:
import re
a = '|here|is|some|data|separated|by|bars|hooray!|'
pattern = re.compile(r'([^\|]+)')
match = pattern.findall(a)
print match[7]
RETURNS
hooray!

Simple python regex, match after colon

I have a simple regex question that's driving me crazy.
I have a variable x = "field1: XXXX field2: YYYY".
I want to retrieve YYYY (note that this is an example value).
My approach was as follows:
values = re.match('field2:\s(.*)', x)
print values.groups()
It's not matching anything. Can I get some help with this? Thanks!

Your regex is good
field2:\s(.*)
Try this code
match = re.search(r"field2:\s(.*)", subject)
if match:
result = match.group(1)
else:
result = ""

re.match() only matches at the start of the string. You want to use re.search() instead.
Also, you should use a verbatim string:
>>> values = re.search(r'field2:\s(.*)', x)
>>> print values.groups()
('YYYY',)

regex in python 2.4

I have a string in python as below:
"\\B1\\B1xxA1xxMdl1zzInoAEROzzMofIN"
I want to get the string as
"B1xxA1xxMdl1zzInoAEROzzMofIN"
I think this can be done using regex but could not achieve it yet. Please give me an idea.

st = "\B1\B1xxA1xxMdl1zzInoAEROzzMofIN"
s = re.sub(r"\\","",st)
idx = s.rindex("B1")
print s[idx:]
output = 'B1xxA1xxMdl1zzInoAEROzzMofIN'
OR
st = "\B1\B1xxA1xxMdl1zzInoAEROzzMofIN"
idx = st.rindex("\\")
print st[idx+1:]
output = 'B1xxA1xxMdl1zzInoAEROzzMofIN'

Here is a try:
import re
s = "\\B1\\B1xxA1xxMdl1zzInoAEROzzMofIN"
s = re.sub(r"\\[^\\]+\\","", s)
print s
Tested on http://py-ide-online.appspot.com (couldn't find a way to share though)
[EDIT] For some explanation, have a look at the Python regex documentation page and the first comment of this SO question:
How to remove symbols from a string with Python?
because using brackets [] can be tricky (IMHO)
In this case, [^\\] means anything but two backslashes \\.
So [^\\]+ means one or more character that matches anything but two backslashes \\.

If the desired section of the string is always on the RHS of a \ char then you could use:
string = "\\B1\\B1xxA1xxMdl1zzInoAEROzzMofIN"
string.rpartition("\\")[2]
output = 'B1xxA1xxMdl1zzInoAEROzzMofIN'

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Regex issue in python - python

try out this regex, looks like you are trying to match a GUID value=[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}

This should match what you want, if all the strings are of the form you've shown: value=((\w\d\-?)*) You can also use this website to validate your regular expressions: http://regex101.com/

The below regex works as you expect. value=([\w|\d\-\w|\d\-\w|\d\-\w|\d\-\w|\d]+)

value=([\w\d]\-[\w\d]\-[\w\d]\-[\w\d]\-[\w\d]*) Try this.Grab the capture.Your regex was not giving the whole as you had used | operator.So if regex on left side of | get satisfied it will not try the latter part. See demo. http://regex101.com/r/hQ1rP0/45

Related

Regex Expression not matching correctly

Python regex to match multiple times

What should I use the Non-greedy match in this case

Simple python regex, match after colon

regex in python 2.4

Categories

Resources

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Regex issue in python - python

try out this regex, looks like you are trying to match a GUID value=[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}

This should match what you want, if all the strings are of the form you've shown: value=((\w*\d*\-?)*) You can also use this website to validate your regular expressions: http://regex101.com/

The below regex works as you expect. value=([\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*]+)

value=([\w\d]*\-[\w\d]*\-[\w\d]*\-[\w\d]*\-[\w\d]*) Try this.Grab the capture.Your regex was not giving the whole as you had used | operator.So if regex on left side of | get satisfied it will not try the latter part. See demo. http://regex101.com/r/hQ1rP0/45

Related

Regex Expression not matching correctly

Python regex to match multiple times

What should I use the Non-greedy match in this case

Simple python regex, match after colon

regex in python 2.4

Categories

Resources

This should match what you want, if all the strings are of the form you've shown: value=((\w\d\-?)*) You can also use this website to validate your regular expressions: http://regex101.com/

The below regex works as you expect. value=([\w|\d\-\w|\d\-\w|\d\-\w|\d\-\w|\d]+)

value=([\w\d]\-[\w\d]\-[\w\d]\-[\w\d]\-[\w\d]*) Try this.Grab the capture.Your regex was not giving the whole as you had used | operator.So if regex on left side of | get satisfied it will not try the latter part. See demo. http://regex101.com/r/hQ1rP0/45