Match integer values from end of string until second dot - python

I have the following string, GA1.2.4451363243.9414195136, and I want to match 4451363243.9414195136 using a regular expression in Python.
I tried ([\d].[\d])$, but it does not work.
Where am I going wrong?

A few ideas (string operations or regex):
s = 'GA1.2.4451363243.9414195136'
out = '.'.join(s.rsplit('.', 2)[-2:])
# '4451363243.9414195136'
import re
out = re.search(r'[^.]*\.[^.]*$', s)
# <re.Match object; span=(6, 27), match='4451363243.9414195136'>
NB. to ensure matching digits, you can replace [^.] (any character but .) with \d.
For an arbitrary N:
N = 3
out = '.'.join(s.rsplit('.', N)[-N:])
# '2.4451363243.9414195136'
out = re.search(fr'[^.]*(?:\.[^.]*){{{N-1}}}$', s)
# <re.Match object; span=(4, 27), match='2.4451363243.9414195136'>
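Following the note about \d, here is a minimal sketch of a digits-only variant wrapped in a helper. The function name last_numeric_fields and the lookbehind that pins the match to a field boundary are illustrative assumptions, not part of the answer above:

```python
import re

def last_numeric_fields(s, n=2):
    """Return the last n dot-separated fields of s if they are all digits, else None."""
    # (?<![^.])          : the match must start at the beginning of s or right after a dot
    # \d+(?:\.\d+){n-1}$ : n runs of digits joined by dots, anchored at the end
    pattern = rf'(?<![^.])\d+(?:\.\d+){{{n - 1}}}$'
    m = re.search(pattern, s)
    return m.group(0) if m else None

s = 'GA1.2.4451363243.9414195136'
print(last_numeric_fields(s))      # 4451363243.9414195136
print(last_numeric_fields(s, 3))   # 2.4451363243.9414195136
print(last_numeric_fields(s, 4))   # None ('GA1' is not all digits)
```

Returning None instead of raising keeps the helper easy to use in a filter over many strings.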

This can be done in pure Python, but if you want to use a regex, here is the code:
regex:
(?:\w*\.){2}(.*)
python:
import re
s = 'GA1.2.4451363243.9414195136'
re.match(r'(?:\w*\.){2}(.*)', s).group(1) # output: '4451363243.9414195136'
Or just use Python:
s.split('.', 2)[-1] # output: '4451363243.9414195136'
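Note that counting dots from the start and counting them from the end only agree when the string has exactly four fields. A small sketch of the difference, using a made-up five-field string (the extra field is an assumption for illustration):

```python
s = 'GA1.2.3.4451363243.9414195136'  # hypothetical input with one extra field

# Everything after the first two dots
from_start = s.split('.', 2)[-1]
# The last two dot-separated fields
from_end = '.'.join(s.rsplit('.', 2)[-2:])

print(from_start)  # 3.4451363243.9414195136
print(from_end)    # 4451363243.9414195136
```

If the requirement really is "from the end of the string until the second dot", the rsplit form is the one that follows it.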

The regex ([0-9]+\.[0-9]+)$ matches the expected part of the example. More specific solutions may be possible if you provide more details or restrictions on the part to be matched:
>>> import re
>>> data = "GA1.2.4451363243.941419513"
>>> re.findall(r"([0-9]+\.[0-9]+)$", data)
['4451363243.941419513']
It requires the matched part to consist of:
digit(s)
dot
digit(s)
end of line.
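The dot in this description is a literal dot, which in a regex has to be written as \. because a bare . matches any character. A short sketch comparing the two forms; the extra sample inputs are made up for the demonstration:

```python
import re

unescaped = re.compile(r'([0-9]+.[0-9]+)$')   # . matches any character
escaped = re.compile(r'([0-9]+\.[0-9]+)$')    # \. matches only a literal dot

for sample in ['GA1.2.4451363243.941419513', '12345', '2018-12']:
    print(sample, unescaped.findall(sample), escaped.findall(sample))
# GA1.2.4451363243.941419513 ['4451363243.941419513'] ['4451363243.941419513']
# 12345 ['12345'] []
# 2018-12 ['2018-12'] []
```

Both forms agree on the example string, but only the escaped one rejects inputs that contain no dot at all.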

Related

Python Regex: OR statement does not work in regex module

Hi, I want to apply the following expression to count substitutions, insertions and deletions. However, the OR alternation does not seem to work: the regex only checks the first alternative in the parentheses.
For example:
correct_string = "20181201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
regex.fullmatch(regex_pattern, correct_string)
Output:
<regex.Match object; span=(0, 8), match='20181201', fuzzy_counts=(1, 0, 0)>
It says there is one substitution because of the 5th digit, even though that digit is allowed by another alternative in the OR group.
Another example:
correct_string = "20180201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
regex.fullmatch(regex_pattern, correct_string)
Output:
<regex.Match object; span=(0, 8), match='20180201'>
In this case it says there are no substitutions, which is correct according to the first alternative in the OR group.
How can I solve this? Thank you.
You need to use regex.ENHANCEMATCH:
By default, fuzzy matching searches for the first match that meets the given constraints. The ENHANCEMATCH flag will cause it to attempt to improve the fit (i.e. reduce the number of errors) of the match that it has found.
Python demo:
import regex
correct_string = "20181201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
print(regex.fullmatch(regex_pattern, correct_string, regex.ENHANCEMATCH))
# => <regex.Match object; span=(0, 8), match='20181201'>

Python Regular Expression Extracting 'name= ....'

I'm using a Python script to read data from our corporate instance of JIRA. There is a value that is returned as a string and I need to figure out how to extract one bit of info from it. What I need is the 'name= ....' and I just need the numbers from that result.
<class 'list'>: ['com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa[id=30943,rapidViewId=10468,state=CLOSED,name=2016.2.4 - XXXXXXXXXX,startDate=2016-05-26T08:50:57.273-07:00,endDate=2016-06-08T20:59:00.000-07:00,completeDate=2016-06-09T07:34:41.899-07:00,sequence=30943]']
I just need the 2016.2.4 portion of it. This number will not always be the same either.
Any thoughts on how to do this with RE? I'm new to regular expressions and would appreciate any help.
A simple regular expression can do the trick: name=([0-9.]+).
The primary part of the regex is ([0-9.]+) which will search for any digit (0-9) or period (.) in succession (+).
Now, to use this:
import re
pattern = re.compile('name=([0-9.]+)')
string = '''<class 'list'>: ['com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa[id=30943,rapidViewId=10468,state=CLOSED,name=2016.2.4 - XXXXXXXXXX,startDate=2016-05-26T08:50:57.273-07:00,endDate=2016-06-08T20:59:00.000-07:00,completeDate=2016-06-09T07:34:41.899-07:00,sequence=30943]']'''
matches = pattern.search(string)
# Only assign the value if a match is found
name_value = '' if not matches else matches.group(1)
Use a capturing group to extract the version name:
>>> import re
>>> s = 'com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa[id=30943,rapidViewId=10468,state=CLOSED,name=2016.2.4 - XXXXXXXXXX,startDate=2016-05-26T08:50:57.273-07:00,endDate=2016-06-08T20:59:00.000-07:00,completeDate=2016-06-09T07:34:41.899-07:00,sequence=30943]'
>>> re.search(r"name=([0-9.]+)", s).group(1)
'2016.2.4'
where ([0-9.]+) matches one or more digits or dots, and the parentheses define a capturing group.
A non-regex option would involve some splitting by ,, = and -:
>>> l = [item.split("=") for item in s.split(",")]
>>> next(value[1] for value in l if value[0] == "name").split(" - ")[0]
'2016.2.4'
This, of course, needs testing and error handling.
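If more than one field is needed, the whole string can be turned into a dictionary first. The following is a minimal sketch, not JIRA's own API: the helper name parse_sprint is hypothetical, and it assumes that no value contains a comma.

```python
import re

raw = ('com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa[id=30943,'
       'rapidViewId=10468,state=CLOSED,name=2016.2.4 - XXXXXXXXXX,'
       'startDate=2016-05-26T08:50:57.273-07:00,endDate=2016-06-08T20:59:00.000-07:00,'
       'completeDate=2016-06-09T07:34:41.899-07:00,sequence=30943]')

def parse_sprint(text):
    """Parse the key=value pairs between the square brackets into a dict."""
    inner = re.search(r'\[(.*)\]', text)
    if inner is None:
        return {}
    # Assumption: values never contain a comma
    return dict(re.findall(r'(\w+)=([^,]*)', inner.group(1)))

fields = parse_sprint(raw)
# Keep only the leading digits and dots of the name
version = re.match(r'[0-9.]+', fields.get('name', ''))
print(version.group(0) if version else None)   # 2016.2.4
print(fields['state'])                         # CLOSED
```

A sprint name that itself contains a comma would be cut short by this sketch, so the name=([0-9.]+) search remains the safer choice when only the version is needed.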

Python Regex expression

Trying to write a Regex expression in Python to match strings.
I want to match input that starts with first, such as first or first?21313, but not first. (ending in a period).
So basically, I don't want to match anything that contains the period character (.).
I've tried word.startswith(('first[^.]?+')) but that doesn't work. I've also tried word.startswith(('first.?+')) but that hasn't worked either. I'm pretty stumped here.
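import re
def check(word):
    # 'first' followed by one or more characters, none of which is a dot
    regexp = re.compile(r'^first[^.]+$')
    return regexp.match(word)
Inside a character class the dot is already literal, so [^.] is enough; [^\..] and [^..] mean the same thing.
Note that + requires at least one character after first, so first on its own does not match. Use ^first[^.]*$ if it should.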
You really don't need regex for this at all.
word.startswith('first') and word.find('.') == -1
But if you really want to take the regex route:
>>> import re
>>> re.match(r'first[^.]*$', 'first')
<_sre.SRE_Match object; span=(0, 5), match='first'>
>>> re.match(r'first[^.]*$', 'first.') is None
True
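The attempts in the question fail because str.startswith compares against a literal prefix and does not interpret regex syntax. As a rough check that the two approaches above agree, here is a small sketch; the function names and the extra test words are made up for illustration:

```python
import re

PATTERN = re.compile(r'first[^.]*')

def matches_regex(word):
    # fullmatch requires the whole string to fit the pattern
    return PATTERN.fullmatch(word) is not None

def matches_plain(word):
    return word.startswith('first') and '.' not in word

for word in ['first', 'first?21313', 'first.', 'first.txt', 'second']:
    print(repr(word), matches_regex(word), matches_plain(word))
# 'first' True True
# 'first?21313' True True
# 'first.' False False
# 'first.txt' False False
# 'second' False False
```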

Python Regular Expression not match

I want to use Python's re module to match this kind of input: 12,13,45,23.
The input consists of four non-negative integers, separated by ",".
However, my regex does not match...
print re.match(u'^([1−9]\d*|0),([1−9]\d*|0),([1−9]\d*|0),([1−9]\d*|0)$',u"0,1001,13,2")
#output is None
However, the next re works well.
print re.match(u'^([1−9]\d*|0),([1−9]\d*|0),([1−9]\d*|0)$',u"0,1001,13")
#<_sre.SRE_Match object at 0x024151B0>
I am totally confused.
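You don't need to repeat the sub-pattern for each integer; you can use the {x} quantifier, where x is the number of times the group should appear. Note also that the character classes in your pattern contain a Unicode minus sign (−) rather than an ASCII hyphen (-), so [1−9] matches only the characters 1, − and 9; that is why 2 is rejected while 0,1001,13 passes. With an ASCII hyphen: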
import re
matcher = re.compile(u"([1-9]\d*|0)(,([1-9]\d*|0)){3}$")
print matcher.match(u"12,45")
# None
print matcher.match(u"0,1001,13,578")
# <_sre.SRE_Match object at 0x7fb0e911ca48>
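A pattern copied from a web page or a document can contain look-alike characters, such as a Unicode minus sign in place of the ASCII hyphen in [1-9]. Here is a minimal sketch for spotting them, written for Python 3; the helper name non_ascii_chars is hypothetical, and the code assumes the dash in question is U+2212 MINUS SIGN:

```python
import re
import unicodedata

def non_ascii_chars(pattern):
    """List (index, character, Unicode name) for every non-ASCII character."""
    return [(i, ch, unicodedata.name(ch, 'UNKNOWN'))
            for i, ch in enumerate(pattern) if ord(ch) > 127]

# Assumption: the dash is U+2212 MINUS SIGN, written here as an escape
suspect = '^([1\u22129]\\d*|0)$'
fixed = r'^([1-9]\d*|0)$'

print(non_ascii_chars(suspect))               # [(4, '−', 'MINUS SIGN')]
print(re.match(suspect, '2'))                 # None: the class holds only 1, − and 9
print(re.match(suspect, '13') is not None)    # True
print(re.match(fixed, '2') is not None)       # True
```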

Python regular expression; why do the search & match appear to find alpha chars in a number string?

I'm running the search below in IDLE, with Python 2.7, in a Windows Bus. 64-bit environment.
According to RegexBuddy, the search pattern ('patternalphaonly') should not produce a match against a string of digits.
I looked at "http://docs.python.org/howto/regex.html", but did not see anything there that would explain why the search and match appear to be successful in finding something matching the pattern.
Does anyone know what I'm doing wrong, or misunderstanding?
>>> import re
>>> numberstring = '3534543234543'
>>> patternalphaonly = re.compile('[a-zA-Z]*')
>>> result = patternalphaonly.search(numberstring)
>>> print result
<_sre.SRE_Match object at 0x02CEAD40>
>>> result = patternalphaonly.match(numberstring)
>>> print result
<_sre.SRE_Match object at 0x02CEAD40>
Thanks
The star operator (*) indicates zero or more repetitions. Your string has zero repetitions of an English alphabet letter because it is entirely numbers, which is perfectly valid when using the star (repeat zero times). Instead use the + operator, which signifies one or more repetitions. Example:
>>> n = "3534543234543"
>>> r1 = re.compile("[a-zA-Z]*")
>>> r1.match(n)
<_sre.SRE_Match object at 0x07D85720>
>>> r2 = re.compile("[a-zA-Z]+") #using the + operator to make sure we have at least one letter
>>> r2.match(n)
Helpful link on repetition operators.
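To see what the star pattern actually matched, inspect the match object rather than just printing it. A short sketch (Python 3 syntax, same string as in the question):

```python
import re

numberstring = '3534543234543'

star = re.compile('[a-zA-Z]*').search(numberstring)
plus = re.compile('[a-zA-Z]+').search(numberstring)

print(star.span(), repr(star.group()))   # (0, 0) ''
print(plus)                              # None
print(bool(star.group()))                # False: the match exists but is empty
```

The match object returned for the star pattern is truthy even though the matched text is the empty string, which is why the search looks successful.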
Everything eldarerathis says is true. However, with a variable named 'patternalphaonly', I would assume that the author wants to verify that a string is composed of alpha chars only. If so, I would add start- and end-of-string anchors to the regex, like so:
patternalphaonly = re.compile('^[a-zA-Z]+$')
result = patternalphaonly.search(numberstring)
Or, better yet, since this will only ever match at the beginning of the string, use the preferred match method:
patternalphaonly = re.compile('[a-zA-Z]+$')
result = patternalphaonly.match(numberstring)
(Which, as John Machin has pointed out, is evidently faster for some as-yet unexplained reason.)
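In Python 3.4 and later, re.fullmatch expresses the "whole string" requirement without explicit anchors. A minimal sketch follows; the helper name is_alpha_only is made up for illustration, and fullmatch is not available in the Python 2.7 used in the question:

```python
import re

def is_alpha_only(text):
    """True if text consists of one or more ASCII letters and nothing else."""
    return re.fullmatch(r'[a-zA-Z]+', text) is not None

print(is_alpha_only('abc'))             # True
print(is_alpha_only('3534543234543'))   # False
print(is_alpha_only('abc\n'))           # False
# $ also matches just before a final newline, so match + $ accepts 'abc\n'
print(re.match(r'[a-zA-Z]+$', 'abc\n') is not None)   # True
```

The last line shows one difference from the anchored version: use \Z instead of $ if a trailing newline should be rejected with match.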
