I'm trying to write a regular expression in Python to match strings.
I want to match input that starts with first, such as first or first?21313, but not first. (with a trailing period).
So basically, I don't want to match anything that contains the period character (.).
I've tried word.startswith(('first[^.]?+')) but that doesn't work. I've also tried word.startswith(('first.?+')), but that hasn't worked either. I'm pretty stumped here.
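import re

def check(word):
    # 'first' followed by one or more characters, none of them a dot
    regexp = re.compile(r'^first[^.]+$')
    return regexp.match(word)
The pattern is:
^first[^.]+$
(first followed by one or more characters other than a dot, so first on its own does not match; use * instead of + if it should).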
You really don't need regex for this at all.
word.startswith('first') and '.' not in word
But if you really want to take the regex route:
>>> import re
>>> re.match(r'first[^.]*$', 'first')
<_sre.SRE_Match object; span=(0, 5), match='first'>
>>> re.match(r'first[^.]*$', 'first.') is None
True
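Both approaches can be wrapped in small helpers and checked against the inputs from the question. This is only a minimal sketch; the helper names are made up for illustration and are not from the answers above.

```python
import re

# Hypothetical helper names, introduced only for this example.
_FIRST_NO_DOT = re.compile(r'first[^.]*$')

def matches_plain(word):
    """String-method version: starts with 'first' and contains no dot."""
    return word.startswith('first') and '.' not in word

def matches_regex(word):
    """Regex version: 'first' followed by zero or more non-dot characters."""
    return _FIRST_NO_DOT.match(word) is not None

for candidate in ['first', 'first?21313', 'first.', 'second']:
    print(candidate, matches_plain(candidate), matches_regex(candidate))
```

Note that str.startswith takes literal prefixes, not patterns, which is why the attempts in the question never matched.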
I have the following string, GA1.2.4451363243.9414195136, and I want to match 4451363243.9414195136 using a regular expression in Python.
I have tried the following, which is not working: ([\d].[\d])$
Where am I going wrong here?
A few ideas (string operations or regex):
s = 'GA1.2.4451363243.9414195136'
out = '.'.join(s.rsplit('.', 2)[-2:])
# '4451363243.9414195136'
import re
out = re.search(r'[^.]*\.[^.]*$', s)
# <re.Match object; span=(6, 27), match='4451363243.9414195136'>
NB. to ensure matching digits, you can replace [^.] (any character but .) with \d.
For an arbitrary N:
N = 3
out = '.'.join(s.rsplit('.', N)[-N:])
# '2.4451363243.9414195136'
out = re.search(fr'[^.]*(?:\.[^.]*){{{N-1}}}$', s)
# <re.Match object; span=(4, 27), match='2.4451363243.9414195136'>
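The two ideas above could be packaged as functions taking N as a parameter. This is a sketch under the assumption that N is at least 1; the function names are hypothetical.

```python
import re

def last_fields_split(s, n):
    """Return the last n dot-separated fields using rsplit."""
    return '.'.join(s.rsplit('.', n)[-n:])

def last_fields_regex(s, n):
    """Return the last n dot-separated fields using a regex, or None."""
    # {n-1} repeats the ".field" group; n is assumed to be >= 1
    m = re.search(r'[^.]*(?:\.[^.]*){%d}$' % (n - 1), s)
    return m.group() if m else None

s = 'GA1.2.4451363243.9414195136'
print(last_fields_split(s, 2))
print(last_fields_regex(s, 3))
```

The two differ when the string has fewer than N fields: the rsplit version returns the whole string, while the regex version returns None.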
This can be done in pure Python, but if you want to use a regex, here is the code.
regex:
(?:\w*\.){2}(.*)
python:
import re
s = 'GA1.2.4451363243.9414195136'
# the dot is escaped so it matches a literal '.'; \w already covers digits
re.match(r'(?:\w*\.){2}(.*)', s).group(1) # output: '4451363243.9414195136'
Or just use Python:
s.split('.', 2)[-1] # output: '4451363243.9414195136'
The following regex, ([0-9]+\.[0-9]+)$, matches the expected part of the example (the dot is escaped so that it matches only a literal period). Note that more specific solutions may arise as you provide more details, restrictions, etc. regarding the part to be matched:
>>> import re
>>> data = "GA1.2.4451363243.941419513"
>>> re.findall(r"([0-9]+\.[0-9]+)$", data)
['4451363243.941419513']
It requests the matched part to be made of:
digit(s)
dot
digit(s)
end of line.
I'm using a Python script to read data from our corporate instance of JIRA. There is a value that is returned as a string and I need to figure out how to extract one bit of info from it. What I need is the 'name= ....' and I just need the numbers from that result.
<class 'list'>: ['com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa[id=30943,rapidViewId=10468,state=CLOSED,name=2016.2.4 - XXXXXXXXXX,startDate=2016-05-26T08:50:57.273-07:00,endDate=2016-06-08T20:59:00.000-07:00,completeDate=2016-06-09T07:34:41.899-07:00,sequence=30943]']
I just need the 2016.2.4 portion of it. This number will not always be the same either.
Any thoughts as to how to do this with a regular expression? I'm new to regular expressions and would appreciate any help.
A simple regular expression can do the trick: name=([0-9.]+).
The primary part of the regex is ([0-9.]+) which will search for any digit (0-9) or period (.) in succession (+).
Now, to use this:
import re
pattern = re.compile('name=([0-9.]+)')
string = '''<class 'list'>: ['com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa[id=30943,rapidViewId=10468,state=CLOSED,name=2016.2.4 - XXXXXXXXXX,startDate=2016-05-26T08:50:57.273-07:00,endDate=2016-06-08T20:59:00.000-07:00,completeDate=2016-06-09T07:34:41.899-07:00,sequence=30943]']'''
matches = pattern.search(string)
# Only assign the value if a match is found
name_value = '' if not matches else matches.group(1)
Use a capturing group to extract the version name:
>>> import re
>>> s = 'com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa[id=30943,rapidViewId=10468,state=CLOSED,name=2016.2.4 - XXXXXXXXXX,startDate=2016-05-26T08:50:57.273-07:00,endDate=2016-06-08T20:59:00.000-07:00,completeDate=2016-06-09T07:34:41.899-07:00,sequence=30943]'
>>> re.search(r"name=([0-9.]+)", s).group(1)
'2016.2.4'
Here ([0-9.]+) matches one or more digits or dots, and the parentheses make it a capturing group, so group(1) returns just that part.
A non-regex option would involve some splitting by ,, = and -:
>>> l = [item.split("=") for item in s.split(",")]
>>> next(value[1] for value in l if value[0] == "name").split(" - ")[0]
'2016.2.4'
This, of course, needs testing and error handling.
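As a rough idea of what that error handling might look like for the regex route, here is a small sketch. The function name is hypothetical, and the sample string is a shortened copy of the one in the question.

```python
import re

# Assumed format: ...,name=<digits and dots> - <label>,...
_NAME_VERSION = re.compile(r'name=([0-9.]+)')

def sprint_version(raw):
    """Return the numeric part of the name= field, or None if absent."""
    match = _NAME_VERSION.search(raw)
    return match.group(1) if match else None

# Shortened sample based on the string from the question.
sample = ('com.atlassian.greenhopper.service.sprint.Sprint#6f68eefa['
          'id=30943,rapidViewId=10468,state=CLOSED,'
          'name=2016.2.4 - XXXXXXXXXX,sequence=30943]')
print(sprint_version(sample))
print(sprint_version('state=OPEN'))
```

Returning None instead of calling group(1) directly avoids an AttributeError when the field is missing or does not start with a digit.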
I'm tackling a python challenge problem to find a block of text in the format xXXXxXXXx (lower vs upper case, not all X's) in a chunk like this:
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
I have tested the following RegEx and found it correctly matches what I am looking for from this site (http://www.regexr.com/):
'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])'
However, when I try to match this expression to the block of text, it just returns the entire string:
In [1]: import re
In [2]: example = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
In [3]: expression = re.compile(r'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])')
In [4]: found = expression.search(example)
In [5]: print found.string
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
Any ideas? Is my expression incorrect? Also, if there is a simpler way to represent that expression, feel free to let me know. I'm fairly new to RegEx.
You need to return the match group instead of the string attribute.
>>> import re
>>> s = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
>>> rgx = re.compile(r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]')
>>> found = rgx.search(s).group()
>>> print found
nJDKoJIWh
The string attribute always returns the string passed as input to the match. This is clearly documented:
string
The string passed to match() or search().
The problem has nothing to do with the matching, you're just grabbing the wrong thing from the match object. Use match.group(0) (or match.group()).
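To make the difference concrete, here is a small sketch that prints the relevant attributes of the match object side by side, using the string from the question.

```python
import re

text = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
match = re.search(r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]', text)

if match:
    print(match.string)   # the whole input that was searched
    print(match.group())  # only the text that matched
    print(match.span())   # (start, end) indices of the match in the input
```

Checking that match is not None first also guards against the AttributeError you would get when nothing matches.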
Based on xXXXxXXXx, if you want runs of three uppercase letters with a single lowercase letter between them, repeated any number of times, this is what you want:
([a-z])(([A-Z]){3}([a-z]))+
You can also get the matched text from your search call with group():
print(expression.search(example).group(0))
How can I match 'suck' only if not part of 'honeysuckle'?
Using lookbehind and lookahead I can match suck if not 'honeysuck' or 'suckle', but it also fails to catch something like 'honeysucker'; here the expression should match, because it doesn't end in le:
re.search(r'(?<!honey)suck(?!le)', 'honeysucker')
You need to nest the lookaround assertions:
>>> import re
>>> regex = re.compile(r"(?<!honey(?=suckle))suck")
>>> regex.search("honeysuckle")
>>> regex.search("honeysucker")
<_sre.SRE_Match object at 0x00000000029B6370>
>>> regex.search("suckle")
<_sre.SRE_Match object at 0x00000000029B63D8>
>>> regex.search("suck")
<_sre.SRE_Match object at 0x00000000029B6370>
An equivalent solution would be suck(?!(?<=honeysuck)le).
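Both forms can be checked against the words from the question. This is a sketch; the variable names are just for illustration.

```python
import re

# Lookbehind with a nested lookahead: reject 'suck' only when it is
# preceded by 'honey' AND that 'honey' is followed by 'suckle'.
nested_lookbehind = re.compile(r'(?<!honey(?=suckle))suck')
# Lookahead with a nested lookbehind: reject 'suck' only when 'le'
# follows AND the text up to here ends in 'honeysuck'.
nested_lookahead = re.compile(r'suck(?!(?<=honeysuck)le)')

for word in ['honeysuckle', 'honeysucker', 'suckle', 'suck']:
    a = nested_lookbehind.search(word) is not None
    b = nested_lookahead.search(word) is not None
    print(word, a, b)
```

Only honeysuckle should be rejected by both patterns; the other three words contain a suck that is not part of honeysuckle.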
Here is a solution that avoids lookarounds: remove honeysuckle from the string first:
s = s.replace('honeysuckle','')
and then search what is left:
re.search('suck',s)
This works for any of these strings: 'honeysuckle sucks', 'this sucks', and even 'regular expressions suck'.
I believe you should keep your exceptions in a separate Array, in case you want to add another rule in the future. This is easier to read and quicker to change later if needed.
My suggestion in Ruby is:
words = ['honeysuck', 'suckle', 'HONEYSUCKER', 'honeysuckle']
EXCEPTIONS = ['honeysuckle']

def match_suck word
  if (word =~ /suck/i) != nil
    # should not match any of the exceptions
    return true unless EXCEPTIONS.include? word.downcase
  end
  false
end

words.each{ |w|
  puts "Testing match of '#{w}' : #{match_suck(w)}"
}
>>>string = 'honeysucker'
>>>print 'suck' in string
True
I want to check whether given words contain a special character.
Below is my Python code.
The literal 'a#bcd' contains '#', so it is matched, which is correct.
But 'a1bcd' has no special character, and it was matched too!
import re
regexp = re.compile('[~`!##$%^&*()-_=+\[\]{}\\|;:\'\",.<>/?]+')
if regexp.search('a#bcd'):
    print 'matched!! nich catch!!'
if regexp.search('a1bcd'):
    print 'something is wrong here!!!'
result :
python ../special_char.py
matched!! nich catch!!
something is wrong here!!!
I have no idea why it behaves this way. Can someone help me?
Thanks.
Move the dash in your regular expression to the start of the [] group, like this:
regexp = re.compile('[-~`!##$%^&*()_=+\[\]{}\\|;:\'\",.<>/?]+')
Where you had the dash, it was read with the surrounding characters as )-_ and since it is inside [] it is interpreted as asking to match a range from ) to _. If you move the dash to just after the [ it has no special meaning and instead matches itself.
Here's an interactive session showing the specific problem there was in your regular expression:
>>> import re
>>> print re.search('[)-_]', 'abcd')
None
>>> print re.search('[)-_]', 'a1b')
<_sre.SRE_Match object at 0x7f71082247e8>
>>> print re.search('[)-_]', 'a1b').group(0)
1
After fixing it:
>>> print re.search('[-)_]', 'a1b')
None
Unless there's some reason not visible in your question, I'd also say that the final + is not needed.
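To see why the range swallows digits, it can help to look at the character codes involved. A small sketch (variable names are illustrative only):

```python
import re

# Inside [], ")-_" is a range from ')' (code 41) to '_' (code 95),
# which covers digits, uppercase letters and several punctuation marks.
as_range = re.compile(r'[)-_]')
# With the dash first, the class holds just three literal characters.
as_literals = re.compile(r'[-)_]')

print(ord(')'), ord('1'), ord('_'))
print(bool(as_range.search('a1b')))     # digit falls inside the range
print(bool(as_literals.search('a1b')))  # no dash, paren or underscore here
print(bool(as_literals.search('a-b')))  # the dash now matches itself
```

Placing the dash last in the class or escaping it as \- has the same effect as moving it to the front.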
re will be relatively slow for this.
I'd suggest trying
specialchars = '''-~`!##$%^&*()_=+[]{}\\|;:'",.<>/?'''
# Python 2 only; in Python 3 use word.translate(str.maketrans('', '', specialchars))
len(word) != len(word.translate(None, specialchars))
or
set(word) & set(specialchars)
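For Python 3, the same two ideas might look like the sketch below. The function names are hypothetical, and the character list is copied as-is from the snippet above.

```python
# Character list copied verbatim from the answer above.
specialchars = '''-~`!##$%^&*()_=+[]{}\\|;:'",.<>/?'''
SPECIAL_SET = set(specialchars)
# Translation table that deletes every special character (Python 3 API).
DELETE_SPECIAL = str.maketrans('', '', specialchars)

def has_special_set(word):
    """Set-based check: does word share any character with the list?"""
    return not SPECIAL_SET.isdisjoint(word)

def has_special_translate(word):
    """translate-based check: did deleting special characters shorten word?"""
    return len(word) != len(word.translate(DELETE_SPECIAL))

for word in ['a#bcd', 'a1bcd', 'a-b']:
    print(word, has_special_set(word), has_special_translate(word))
```

Whether either version is actually faster than a compiled regex depends on the input, so it is worth timing with timeit before relying on it.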