Regular Expression with python - python

I have a tricky regular expression and I can't get it to work.
I need a regular expression that matches this:
AEBE52E7-03EE-455A-B3C4-E57283966239
I use it for identification, like this:
url(r'^user/(?P<identification>\<regular expression>)$', 'view_add')
I tried some expressions like these ones:
\[A-Za-z0-9]{8}^-{1}[A-Za-z0-9]{4}^-{1}[A-Za-z0-9]{4}^-{1}[A-Za-z0-9]{4}^-{1}[A-Za-z0-9]{12}
\........^-....^-....^-....^-............
Can someone help me?
Thanks.

Just remove all the ^ symbols from your regex. ^ anchors the match at the start of the string, so it can never succeed in the middle of the pattern.
>>> s = 'AEBE52E7-03EE-455A-B3C4-E57283966239'
>>> re.match(r'[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$', s)
<_sre.SRE_Match object; span=(0, 36), match='AEBE52E7-03EE-455A-B3C4-E57283966239'>
>>> re.match(r'[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$', s).group()
'AEBE52E7-03EE-455A-B3C4-E57283966239'
-{1} can be written simply as -. It also looks like all the delimited groups are hex digits, so you could use [0-9a-fA-F] instead of [A-Za-z0-9].
>>> re.match(r'[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$', s).group()
'AEBE52E7-03EE-455A-B3C4-E57283966239'

You don't need ^, and - doesn't need {1}. You can use the following pattern:
\w{8}-\w{4}-\w{4}-\w{4}-\w{12}
Note that \w matches any word character ([A-Za-z0-9_], plus other Unicode word characters for str patterns in Python 3), so it is broader than letters and digits.
Or:
\w{8}-(\w{4}-){3}\w{12}
And, as mentioned in the comments, if you are matching a UUID, a more precise pattern is:
[a-fA-F\d]{8}(-[a-fA-F\d]{4}){3}-[a-fA-F\d]{12}
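To show how the hex-only pattern could be used outside of a URL conf, here is a minimal sketch. The names UUID_RE, looks_like_uuid and parse_uuid are made up for illustration; it assumes Python 3.4+ (for re.fullmatch) and uses the standard uuid module only to turn an accepted string into a UUID object.

```python
import re
import uuid

# Hex digits in the 8-4-4-4-12 layout; fullmatch() anchors both ends.
UUID_RE = re.compile(r'[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}')

def looks_like_uuid(text):
    """Return True if text has the 8-4-4-4-12 hex layout (hypothetical helper)."""
    return UUID_RE.fullmatch(text) is not None

def parse_uuid(text):
    """Return a uuid.UUID if text passes the regex check, else None."""
    if not looks_like_uuid(text):
        return None
    return uuid.UUID(text)

print(looks_like_uuid('AEBE52E7-03EE-455A-B3C4-E57283966239'))  # True
print(looks_like_uuid('AEBE52E7-03EE-455A-B3C4-E5728396623'))   # False (last group has 11 digits)
print(looks_like_uuid('ZEBE52E7-03EE-455A-B3C4-E57283966239'))  # False ('Z' is not a hex digit)
```

Inside a URL pattern the same expression would go in the named group, with the surrounding ^ and $ doing the anchoring that fullmatch() does here.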


working through regex expression to print specific word [duplicate]

Say I have a string
"3434.35353"
and another string
"3593"
How do I make a single regular expression that matches both, without having to switch to a different pattern if the first one fails? I know \d+ would match 3593 but would only match the 3434 part of 3434.35353, while (\d+\.\d+) would match only the one with the decimal point and find no match for 3593.
I expect m.group(1) to return:
"3434.35353"
or
"3593"
You can put a ? after a group of characters to make it optional.
You want a dot followed by one or more digits, \.\d+; grouped together, (\.\d+); and made optional, (\.\d+)?. Stick that in your pattern:
import re
print(re.match(r"(\d+(\.\d+)?)", "3434.35353").group(1))
3434.35353
print(re.match(r"(\d+(\.\d+)?)", "3434").group(1))
3434
This regex should work:
\d+(\.\d+)?
It matches one or more digits (\d+) optionally followed by a dot and one or more digits ((\.\d+)?).
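As a rough sketch of how that pattern might be wrapped up, the snippet below uses a non-capturing group (?:...) for the decimal part so that group(1) is always the whole number. NUMBER_RE and first_number are illustrative names, and re.search is used instead of re.match on the assumption that the number may appear anywhere in the text.

```python
import re

# (?:...) groups without capturing, so group(1) is the whole number.
NUMBER_RE = re.compile(r'(\d+(?:\.\d+)?)')

def first_number(text):
    """Return the first integer or decimal found in text, or None."""
    m = NUMBER_RE.search(text)
    return m.group(1) if m else None

print(first_number('3434.35353'))  # 3434.35353
print(first_number('3593'))        # 3593
print(first_number('12.'))         # 12 (a trailing dot without digits is not consumed)
print(first_number('abc'))         # None
```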
Use the "one or zero" quantifier, ?. Your regex becomes: (\d+(\.\d+)?).
See Chapter 8 of the TextWrangler manual for more details about the different quantifiers available, and how to use them.
Use (?:<characters>|), replacing <characters> with the string to make optional. I tested this in a Python shell and got the following result:
>>> s = re.compile('python(?:3|)')
>>> s
re.compile('python(?:3|)')
>>> re.match(s, 'python')
<re.Match object; span=(0, 6), match='python'>
>>> re.match(s, 'python3')
<re.Match object; span=(0, 7), match='python3'>
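For comparison, here is a small check, under the assumption that only a single character is being made optional, that the empty-alternative form behaves the same as the ? quantifier used in the other answers. The variable names and sample strings are only for illustration.

```python
import re

alternation = re.compile(r'python(?:3|)')  # "3" or the empty string
optional = re.compile(r'python3?')         # zero or one "3"

for text in ('python', 'python3', 'python33', 'pytho'):
    a = alternation.match(text)
    b = optional.match(text)
    # Both patterns agree on whether they match and on the matched text.
    assert (a and a.group()) == (b and b.group())
    print(text, '->', a.group() if a else None)
```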
Read up on the Python RegEx library. The link answers your question and explains why.
However, to match one or more digits with an optional decimal part, you can use
re.compile(r"(\d+(\.\d+)?)")
In this example, the ? after the (\.\d+) capture group specifies that this portion is optional.

How can Python's regular expressions work with patterns that have escaped special characters?

Is there a way to get Python's regular expressions to work with patterns that have escaped special characters? As far as my limited understanding can tell, the following example should work, but the pattern fails to match.
import re
string = r'This a string with ^g\.$s' # A string to search
pattern = r'^g\.$s' # The pattern to use
string = re.escape(string) # Escape special characters
pattern = re.escape(pattern)
print(re.search(pattern, string)) # This prints "None"
Note:
Yes, this question has been asked elsewhere (like here). But as you can see, I'm already implementing the solution described in the answers and it's still not working.
Why on earth are you applying re.escape to the string?! You want to find the "special" characters in that! If you just apply it to the pattern, you'll get a match:
>>> import re
>>> string = r'This a string with ^g\.$s'
>>> pattern = r'^g\.$s'
>>> re.search(re.escape(pattern), re.escape(string)) # nope
>>> re.search(re.escape(pattern), string) # yep
<_sre.SRE_Match object at 0x025089F8>
For bonus points, notice that you just need to re.escape the pattern one more time than the string:
>>> re.search(re.escape(re.escape(pattern)), re.escape(string))
<_sre.SRE_Match object at 0x025D8DE8>
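To make it easier to see what is going on, this sketch prints what re.escape actually produces. The variable names text, literal and escaped are chosen for illustration; the strings are the ones from the question.

```python
import re

text = r'This a string with ^g\.$s'   # the text to search (left untouched)
literal = r'^g\.$s'                   # the exact characters to find

# re.escape puts a backslash in front of every regex metacharacter.
escaped = re.escape(literal)
print(escaped)                        # \^g\\\.\$s

m = re.search(escaped, text)
print(m.group() if m else None)       # ^g\.$s

# Escaping the text as well changes it, so the same pattern no longer matches.
print(re.search(escaped, re.escape(text)))  # None
```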

Python RegEx or-ing problems

Hey there, I'm just trying to do some simple regex. What I want is anything between a ? and a &, or a & and a &, or a & and the end of the string. I've been reading the docs, and I feel like I should at least be getting close with patterns such as:
p = re.compile('(\?.*?&)|(&.*?&)|(&.*?$)')
or
re.compile('[&\?](.*?)&')
but all the variants I try are a little wonky. An explanation of what you did would also be nice. An example:
?k=091910918&ack=901828312&p=999998
and it should yield:
k=091910918, ack=901828312, and p=999998
as answers. Thanks !
You can use the following regular expression:
>>> import re
>>> re.findall(r'[?&]([^?&]+)', '?k=091910918&ack=901828312&p=999998')
['k=091910918', 'ack=901828312', 'p=999998']
Regular expression:
[?&] # any character of: '?', '&'
( # group and capture to \1:
[^?&]+ # any character except: '?', '&' (1 or more times)
) # end of \1
You could just split here as well... assuming your string looks like this:
>>> list(filter(None, re.split('[?&]', '?k=091910918&ack=901828312&p=999998')))
['k=091910918', 'ack=901828312', 'p=999998']
If you don't mind only having one matched group, use this:
[\?&](\w+\=\d+)
If you want to have two matched groups for each one, use this:
[\?&](\w+)\=(\d+)
Pretty much the main problem you were having was that you were giving the regex too much flexibility on what to capture by using .*. If you restrict what each group can be a little, it ends up being much more cooperative.
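As a sketch of the two-group variant in use, re.findall returns (key, value) tuples that can be fed straight into dict. The variable names are illustrative. The last line shows a different technique, not a regex at all: the standard library's urllib.parse.parse_qsl, which is worth considering if the input really is a URL query string.

```python
import re
from urllib.parse import parse_qsl

query = '?k=091910918&ack=901828312&p=999998'

# Two capture groups per match: findall returns (key, value) tuples.
pairs = re.findall(r'[?&](\w+)=(\d+)', query)
print(pairs)        # [('k', '091910918'), ('ack', '901828312'), ('p', '999998')]
print(dict(pairs))  # {'k': '091910918', 'ack': '901828312', 'p': '999998'}

# Alternative without a regex: strip the leading '?' and let the
# standard library split the pairs.
print(parse_qsl(query.lstrip('?')))
```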

Python re.match doesn't match the same regexp

I'm facing a weird problem; I hope nobody has asked this question before.
I need to match two regexps containing "(" and ")".
Here is the kind of tests I made to see why it's not working:
>>> import re
>>> re.match("a","a")
<_sre.SRE_Match object at 0xb7467218>
>>> re.match(re.escape("a"),re.escape("a"))
<_sre.SRE_Match object at 0xb7467410>
>>> re.escape("a(b)")
'a\\(b\\)'
>>> re.match(re.escape("a(b)"),re.escape("a(b)"))
=> No match
Can someone explain to me why the regexp doesn't match itself?
Thanks a lot
You've escaped special characters, so your regex will match the string "a(b)", not the string 'a\(b\)' which is the result of re.escape('a(b)').
The first argument is the pattern; the second is the actual string you are matching against. You shouldn't escape the string itself. Remember, re.escape escapes the special characters in a regexp.
>>> help(re.match)
Help on function match in module re:
match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.
>>> re.match(re.escape('a(b)'), 'a(b)')
<_sre.SRE_Match object at 0x10119ad30>
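Here is a short sketch of why the unescaped pattern behaves differently, and of how an escaped literal can be combined with ordinary regex syntax. The trailing \d+ and the sample string 'a(b)42' are assumptions added purely for illustration.

```python
import re

literal = 'a(b)'

# Without escaping, the parentheses form a group, so the pattern means "ab".
print(re.match(literal, 'a(b)'))             # None
print(re.match(literal, 'ab') is not None)   # True

# With escaping, the parentheses are matched as literal characters.
print(re.match(re.escape(literal), 'a(b)') is not None)  # True

# An escaped literal can be embedded in a larger pattern.
pattern = re.escape(literal) + r'\d+'
print(re.match(pattern, 'a(b)42').group())   # a(b)42
```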

Python regular expression question - sub string but not prepended with

:)
I'm trying to substitute bar for foo, but only if it isn't preceded by a particular character, e.g. /. So...
foobar should change to barbar, but /foobar should not.
I've tried adding [^/] at the beginning of my regex, but that doesn't work if foo is at the beginning of the string.
I hate regular expressions! :P
Use a negative lookbehind assertion.
>>> re.search('(?<!/)foo', 'foo')
<_sre.SRE_Match object at 0x7f44891518b8>
>>> re.search('(?<!/)foo', '/foo')
>>> re.search('(?<!/)foo', 'barfoo')
<_sre.SRE_Match object at 0x7f4489151850>
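Since the question is about substitution, here is a minimal sketch of the same lookbehind used with re.sub. NOT_AFTER_SLASH and replace_foo are hypothetical names chosen for the example.

```python
import re

# (?<!/) asserts that the character before "foo" is not "/".
# It consumes nothing, so it also succeeds at the start of the string.
NOT_AFTER_SLASH = re.compile(r'(?<!/)foo')

def replace_foo(text):
    """Replace foo with bar unless it directly follows a slash."""
    return NOT_AFTER_SLASH.sub('bar', text)

print(replace_foo('foobar'))          # barbar
print(replace_foo('/foobar'))         # /foobar
print(replace_foo('foo /foo xfoo'))   # bar /foo xbar
```

This also explains why [^/] fails at the start of the string: a character class has to consume a character, while a lookbehind only checks what is (or is not) there.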
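Try using
\bfoo\b
\b is a word boundary; it deals with a lot of common cases like the beginning of a line, whitespace, etc. Note, however, that / is not a word character, so there is also a boundary between / and foo, and the trailing \b stops the pattern from matching the foo in foobar.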
