Matching numbers in Python

I'm trying to match numbers with a regex in Python 3.5.
re.match() works well, like this:
re.match(r"\d+(\.\d+)?", "12323.3 + 232131.2")
>>><_sre.SRE_Match object; span=(0, 7), match='12323.3'>
re.findall(), however, did not return what I expected (I want ["12323.3", "232131.2"]):
re.findall(r"\d+(\.\d+)?", "12323.3 + 232131.2")
>>>['.3', '.2']
Could someone please tell me why? Thanks.

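If the pattern contains capturing parentheses, findall returns what the groups captured instead of the whole match (a list of strings for one group, a list of tuples for several). Your only group captures the part beginning with the ., so you get ['.3', '.2'].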
Try: r"(\d+(?:\.\d+)?)"
or capture nothing:
r"\d+(?:\.\d+)?"

Related

working through regex expression to print specific word [duplicate]

Say I have a string
"3434.35353"
and another string
"3593"
How do I write a single regular expression that matches both, without switching to a different pattern when the first one fails? I know \d+ matches 3593, but on 3434.35353 it only matches the 3434 part, while (\d+\.\d+) only matches the number with the decimal point and finds no match in 3593.
I expect m.group(1) to return:
"3434.35353"
or
"3593"
You can put a ? after a group of characters to make it optional.
You want a dot followed by one or more digits, \.\d+, grouped together, (\.\d+), and made optional, (\.\d+)?. Put that in your pattern:
import re
print(re.match(r"(\d+(\.\d+)?)", "3434.35353").group(1))
3434.35353
print(re.match(r"(\d+(\.\d+)?)", "3434").group(1))
3434
This regex should work:
\d+(\.\d+)?
It matches one or more digits (\d+), optionally followed by a dot and one or more digits ((\.\d+)?).
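One caveat: re.match only anchors at the start of the string, so on an input like "3434." it still returns "3434". If the goal is to accept a string only when the whole thing is a number, a sketch using Pattern.fullmatch (available since Python 3.4) could look like this; the sample strings are made up for illustration.

```python
import re

NUMBER = re.compile(r"\d+(\.\d+)?")

samples = ["3434.35353", "3593", "3434.", ".5", "12a"]

# fullmatch succeeds only when the entire string fits the pattern;
# map each sample to the matched text, or None if it is rejected.
results = {}
for s in samples:
    m = NUMBER.fullmatch(s)
    results[s] = m.group(0) if m else None

print(results)
# re.match anchors only at the start, so it accepts a prefix:
print(NUMBER.match("3434.").group(0))  # 3434
```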
Use the "one or zero" quantifier, ?. Your regex becomes: (\d+(\.\d+)?).
See Chapter 8 of the TextWrangler manual for more details about the different quantifiers available, and how to use them.
Another option is (?:<characters>|): replace <characters> with the string you want to make optional. The empty alternative after | lets the group match nothing. I tested it in the Python shell and got the following result:
>>> import re
>>> s = re.compile('python(?:3|)')
>>> s
re.compile('python(?:3|)')
>>> re.match(s, 'python')
<re.Match object; span=(0, 6), match='python'>
>>> re.match(s, 'python3')
<re.Match object; span=(0, 7), match='python3'>
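The empty alternative behaves the same as applying the ? quantifier to a non-capturing group. A small comparison (the word list is arbitrary); note that match only anchors at the start, so "python2" still gives a prefix match on "python".

```python
import re

# Two spellings of "optional 3": an empty alternative, or the ? quantifier.
alt = re.compile(r"python(?:3|)")
opt = re.compile(r"python(?:3)?")  # for a single character, r"python3?" is enough

words = ["python", "python3", "python2"]
for word in words:
    a = alt.match(word).group(0)
    b = opt.match(word).group(0)
    print(word, a, b)
# python python python
# python3 python3 python3
# python2 python python
```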
Read up on the Python regex (re) library documentation; it answers your question and explains why.
However, to match one or more digits followed by an optional decimal part, you can use
re.compile(r"(\d+(\.\d+)?)")
In this example, the ? after the (\.\d+) capture group makes that portion optional.
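One detail worth knowing about optional groups: when the optional part is absent, its group is None rather than an empty string. A short check with the same pattern, written as a raw string so the backslashes reach the regex engine unchanged:

```python
import re

pattern = re.compile(r"(\d+(\.\d+)?)")

for s in ["3434.35353", "3593"]:
    m = pattern.match(s)
    # group(1) is the whole number; group(2) is the optional decimal part,
    # which is None when that group did not take part in the match.
    print(m.group(1), m.group(2))
# 3434.35353 .35353
# 3593 None
```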

Need a specific explanation of part of a regex code

I'm developing a calculator program in Python and need to remove leading zeros from numbers so that calculations work as expected (in Python 3, eval("02+03") raises a SyntaxError because decimal integer literals can't have leading zeros). For example, if the user enters "02+03" into the calculator, the result should be 5. To remove these leading zeros in front of digits, I asked a question here and got the following answer.
self.answer = eval(re.sub(r"((?<=^)|(?<=[^\.\d]))0+(\d+)", r"\1\2", self.equation.get()))
I understand how the positive lookbehind for the start of the string and the lookbehind for a character that is neither a digit nor a period work. What confuses me is where in this code the replacement for the matched pattern is specified.
I found this online when researching regex expressions.
result = re.sub(pattern, repl, string, count=0, flags=0)
Where is repl in the code above? And could somebody explain what r"\1\2" does in this regex?
Thanks for your help! :)
The repl argument is this part of the call:
r"\1\2"
In the "find" part (the pattern), a group is captured wherever content is wrapped in parentheses "()", unless the group is written as (?:...), which does not capture.
In Python replacement strings, a reference to a numbered captured group (sometimes called a "backreference") is written "\n", where n is the number of the group, counting opening parentheses from the left of the pattern.
So each match is replaced by whatever groups 1 and 2 captured, which drops the run of leading zeros (the 0+) in between.
Note: the "\1" part of repl isn't actually required. Group 1 contains only lookbehind assertions, which are zero-width, so it always captures an empty string. This:
r"\2"
...works just as well.
Further reading: https://www.regular-expressions.info/brackets.html
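To check both claims, here is a sketch that runs the pattern from the question with r"\1\2" and with r"\2". The helper name strip_leading_zeros and the extra input strings are hypothetical, chosen to show that zeros inside numbers such as 100 and after a decimal point such as 0.05 are left alone.

```python
import re

LEADING_ZEROS = r"((?<=^)|(?<=[^\.\d]))0+(\d+)"

def strip_leading_zeros(expr, repl=r"\1\2"):
    # Group 1 holds only zero-width lookbehinds, so it always captures "".
    # Group 2 holds the digits that follow the run of zeros.
    return re.sub(LEADING_ZEROS, repl, expr)

for expr in ["02+03", "100+007", "0.05*010"]:
    print(expr, "->", strip_leading_zeros(expr), strip_leading_zeros(expr, r"\2"))
# 02+03 -> 2+3 2+3
# 100+007 -> 100+7 100+7
# 0.05*010 -> 0.05*10 0.05*10
```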
First, repl is the text that each match is replaced with.
To understand \1\2 you need to know what capture grouping is.
Check this video out for basics of Group capturing.
Here, each match your regex finds is split into numbered groups (1, 2, and so on) because of the parentheses () you placed in the pattern.
In Python's re.sub, \1, \2 (or \g<1>, \g<2>) refer to them; $1, $2 is the equivalent syntax in some other regex flavors, but it is not special in Python.
In this case, each match (the leading zeros plus the digits after them) is replaced with just the digits captured by group 2, so the leading zeros are dropped.
Note: \1 is not necessary; the substitution works fine without it.
See example:
>>> import re
>>> s='awd232frr2cr23'
>>> re.sub(r'\d', ' ', s)
'awd   frr cr  '
Explanation:
r'\d' matches a single digit, so each digit is replaced with repl (in this case ' ').

Match everything except a specific string

I am using Python 2.7 and have a question about regular expressions. My strings look something like this...
"SecurityGroup:Pub HDP SG"
"SecurityGroup:Group-Name"
"SecurityGroup:TestName"
My regular expression looks something like below
[^S^e^c^r^i^t^y^G^r^o^u^p^:].*
The above seems to work, but I have a feeling it is not very efficient, and if the name contains the word "group" it will fail as well...
What I want is everything after the colon (:). I also thought about using group 2 as my match, but the problem is that if there are spaces in the name, I won't get the correct name.
(SecurityGroup):(\w{1,})
Why not just use
security_string.split(':')[1]
to grab the part of the string after the colon?
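If a name could itself contain a colon, split(':')[1] would cut it short. A sketch with maxsplit=1 (and str.partition as an alternative) handles that; the third value is a hypothetical name with an embedded colon. The code runs the same on Python 2.7 and 3.

```python
values = [
    "SecurityGroup:Pub HDP SG",
    "SecurityGroup:Group-Name",
    "SecurityGroup:Odd:Name",  # hypothetical: a name that itself contains a colon
]

# maxsplit=1 splits only at the first colon, so later colons stay in the name
names = [v.split(":", 1)[1] for v in values]

# str.partition also splits at the first colon; it returns "" (instead of
# raising IndexError) when the string has no colon at all
names_via_partition = [v.partition(":")[2] for v in values]

print(names)  # ['Pub HDP SG', 'Group-Name', 'Odd:Name']
print(names_via_partition)
```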
You could use lookbehind:
pattern = re.compile(r"(?<=SecurityGroup:)(.*)")
matches = re.findall(pattern, your_string)
Breaking it down:
(?<= # positive lookbehind. Matches things preceded by the following group
SecurityGroup: # pattern you want your matches preceded by
) # end positive lookbehind
( # start matching group
.* # any number of characters
) # end matching group
When tested on the string "something something SecurityGroup:stuff and stuff" it returns matches = ['stuff and stuff'].
Edit:
As mentioned in a comment, pattern = re.compile(r"SecurityGroup:(.*)") accomplishes the same thing. In this case you are matching the string "SecurityGroup:" followed by anything, but only returning the stuff that follows. This is probably more clear than my original example using lookbehind.
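Both versions can be compared side by side on the strings from the question. With the lookbehind the prefix is not part of the match, so group(0) is the name; with the plain prefix the name is in group(1).

```python
import re

lines = ["SecurityGroup:Pub HDP SG", "SecurityGroup:Group-Name", "SecurityGroup:TestName"]

# Lookbehind: "SecurityGroup:" must precede the match but is not part of it.
lookbehind = re.compile(r"(?<=SecurityGroup:).*")
# Plain prefix plus one capturing group: the whole match includes the prefix,
# so the name is read from group(1).
plain = re.compile(r"SecurityGroup:(.*)")

from_lookbehind = [lookbehind.search(line).group(0) for line in lines]
from_group = [plain.search(line).group(1) for line in lines]

print(from_lookbehind)  # ['Pub HDP SG', 'Group-Name', 'TestName']
print(from_group)       # ['Pub HDP SG', 'Group-Name', 'TestName']
```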
Maybe this (it assumes the double quotes are part of the string being searched):
([^:"]+[^\s](?="))

Regular Expression with python

I have a tricky regular expression and I can't get it to work.
I need a regular expression that matches this:
AEBE52E7-03EE-455A-B3C4-E57283966239
I use it to capture an identifier in a URL pattern like this:
url(r'^user/(?P<identification>\<regular expression>)$', 'view_add')
I tried some expressions like these ones:
\[A-Za-z0-9]{8}^-{1}[A-Za-z0-9]{4}^-{1}[A-Za-z0-9]{4}^-{1}[A-Za-z0-9]{4}^-{1}[A-Za-z0-9]{12}
\........^-....^-....^-....^-............
Can someone help me?
Thanks.
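Just remove all the ^ symbols from your regex. ^ anchors the match to the start of the string, so in the middle of the pattern it can never match. (The leading \[ also turns [ into a literal bracket, so drop that backslash too.)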
>>> s = 'AEBE52E7-03EE-455A-B3C4-E57283966239'
>>> re.match(r'[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$', s)
<_sre.SRE_Match object; span=(0, 36), match='AEBE52E7-03EE-455A-B3C4-E57283966239'>
>>> re.match(r'[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$', s).group()
'AEBE52E7-03EE-455A-B3C4-E57283966239'
-{1} can simply be written as -. Also, all the dash-delimited parts look like hex codes, so you could use [0-9a-fA-F] instead of [A-Za-z0-9].
>>> re.match(r'[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$', s).group()
'AEBE52E7-03EE-455A-B3C4-E57283966239'
You don't need the ^ symbols, and for - you don't need {1}; you can use the following pattern:
\w{8}-\w{4}-\w{4}-\w{4}-\w{12}
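Note that \w matches any word character: [A-Za-z0-9_] for ASCII, plus other Unicode letters and digits for str patterns in Python 3. So it also accepts underscores.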
Or :
\w{8}-(\w{4}-){3}\w{12}
And, as mentioned in a comment, if the value is a UUID you can use a stricter pattern that only accepts hex digits:
[a-fA-F\d]{8}(-[a-fA-F\d]{4}){3}-[a-fA-F\d]{12}
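The difference between \w and an explicit hex class shows up on input that has the right shape but is not hex. Here is a small comparison using fullmatch, so the whole string must fit (as the ^...$ in the URL pattern requires). The underscore-and-Z string is a made-up example, and the repeated groups are written as (?:...) so they don't capture.

```python
import re

s = "AEBE52E7-03EE-455A-B3C4-E57283966239"
not_hex = "_" * 8 + "-" + "ZZZZ-" * 3 + "Z" * 12  # hypothetical: right shape, no hex digits

word_pat = re.compile(r"\w{8}-(?:\w{4}-){3}\w{12}")
hex_pat = re.compile(r"[a-fA-F\d]{8}(?:-[a-fA-F\d]{4}){3}-[a-fA-F\d]{12}")

for value in (s, not_hex):
    # fullmatch requires the whole string to fit the pattern
    print(value, bool(word_pat.fullmatch(value)), bool(hex_pat.fullmatch(value)))
```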

Can't make regex work with Python

I need to extract the date in format of: dd Month yyyy (20 August 2013).
I tried the following regex:
\d{2} (January|February|March|April|May|June|July|August|September|October|November|December) \d{4}
It works in regex testers (checked with several texts, e.g. "Monday, 19 August 2013"), but Python doesn't seem to understand it. The output I get is:
>>>
['August']
>>>
Can somebody please help me understand why this is happening?
Thank you !
Did you use re.findall? By default, if there's at least one capture group in the pattern, re.findall will return only the captured parts of the expression.
You can avoid this by removing every capture group, causing re.findall to return the entire match:
\d{2} (?:January|February|...|December) \d{4}
or by making a single big capture group:
(\d{2} (?:January|February|...|December) \d{4})
or, possibly more conveniently, by making every component a capture group:
(\d{2}) (January|February|...|December) (\d{4})
This latter form is more useful if you will need to process the individual day/month/year components.
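The three return shapes can be seen side by side on the text from the question; the full month list is spelled out here instead of the "..." shorthand above.

```python
import re

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
text = "Monday, 19 August 2013"

one_group = re.findall(r"\d{2} (" + MONTHS + r") \d{4}", text)       # original pattern
no_groups = re.findall(r"\d{2} (?:" + MONTHS + r") \d{4}", text)     # non-capturing month
three_groups = re.findall(r"(\d{2}) (" + MONTHS + r") (\d{4})", text)

print(one_group)     # ['August']
print(no_groups)     # ['19 August 2013']
print(three_groups)  # [('19', 'August', '2013')]
```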
It looks like you are only getting the data from the capture group, try this:
(\d{2} (?:January|February|March|April|May|June|July|August|September|October|November|December) \d{4})
I put a capture group around the entire thing and made the month a non-capture group. Now whatever was giving you "August" should give you the entire thing.
I just looked at some Python regex examples:
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
Seeing this, I'm guessing (since you didn't show how you were actually using this regex) that you were doing group(1) which will now work with the regex I supplied above.
It also looks like you could have used group(0) to get the whole thing (if I am correct in the assumption that this is what you were doing). This would work in your original regex as well as my modified version.
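With the original pattern unchanged, group(0) does return the full date. A sketch assuming the text from the question; note re.search rather than re.match, since re.match only tries position 0 and this string starts with "Monday". The second input string is made up to show finditer collecting several dates.

```python
import re

pattern = (r"\d{2} (January|February|March|April|May|June|July|August"
           r"|September|October|November|December) \d{4}")
text = "Monday, 19 August 2013"

m = re.search(pattern, text)  # search scans the whole string
print(m.group(0))  # 19 August 2013
print(m.group(1))  # August

# finditer yields match objects, so group(0) gives every full date
dates = [mm.group(0) for mm in re.finditer(pattern, "19 August 2013 and 20 August 2013")]
print(dates)  # ['19 August 2013', '20 August 2013']
```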
