python find digits with leading '_v' - python

Is there a better way of finding digits in a string which starts with '_v' which stands for version number? What I want is just '001'
filename = 'greatv02_v001_jam.mb'
parts = re.split('_v|\_',filename)
>>['greatv02', '001', 'jam.mb']
b = re.findall(r'\d+', filename)
>>['02', '001']
Is there a way to split a string with something along these lines?
parts = re.split('_v###_',filename)
or
parts = re.split('_v*_',filename)

You could use lookarounds:
>>> filename = 'greatv02_v001_jam.mb'
>>> import re
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>>
>>> filename = 'greatv02_v001_av456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>> filename = 'greatv02_v001_v456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001', '456']
>>>

Ugly, but you could partition the file name twice
>>> filename.partition('_v')[2].partition('_')[0]
'001'

Use regex's grouping like this:
.*_v(\d+).*
Demo:
>>> filename = 'greatv02_v001_jam.mb'
>>> pattern = re.compile(r'.*_v(\d+).*')
>>> re.search(pattern, filename).group(1)
'001'

How about the regex _v(?P<version>\d+).*:
>>> regex = re.compile("_v(?P<version>\d+).*")
>>> r = regex.search(string)
# List the groups found
>>> r.groups()
(u'001',)
# List the named dictionary objects found
>>> r.groupdict()
{u'version': u'001'}
# Run findall
>>> regex.findall(string)
[u'001']
# Run timeit test
>>> setup = ur"import re; regex =re.compile("_v(?P<version>\d+).*");string="""greatv02_v00 ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
0.005126953125

Related

Match a string and replace remaning characters

I would like to extract the server type from the hostname and replace the remaining characters with underscores so I can then use it with the LIKE pattern match in SQLite
My initial approach was something like this (this is the excepted output):
>>> host = 'webus01'
>>> location = 'us'
>>> parts = list(host.partition(location))
>>> parts
['web', 'us', '01']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'web____.%'
But this won't work if the hostname starts with or contains the location name:
>>> host = 'utilityit01'
>>> pod = 'it'
>>> parts = list(host.partition(location))
>>> parts
['utilityit01', '', '']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'utilityit01.%'
Then I though it will be better to use RegEx for this to match only the location before digits.
The re.sub function seems to be a good candidate for this task but I'm not sure how to replace all characters instead of the match group as a whole:
>>> import re
>>> re.sub(r'it\d+.*', '_', 'utilityit01a')
'utility_'
The output in this case should be: utility_____.%.
Looks like you need.
import re
print(re.sub(r'(it\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'utilityit01a'))
print(re.sub(r'(us\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'webus01'))
Output:
utility_____.%
web____.%
You can find the substring that you want to replace first, and then replace it based on its length:
>>> import re
>>> host = 'utilityit01'
>>> substr = re.findall(r'it\d+.*', host)
>>> substr
['it01a']
>>> host.replace(substr[0], len(substr)*'_')
'utility_____'

how to write a regex to parse refs/changes

I have the following as input. I am trying to write a regular expression which yields the below output. Can anyone provide
input on how to do this?
INPUT:-
refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2
OUTPUT:-
1025744/3
1025962/5
913745/2
If that is the actual import format, a regex is not needed:
>>> source = """\
... refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2
... """
>>> output = [line.split('/', 3)[-1] for line in source.splitlines()]
>>> output[0]
'1025744/3'
>>> output[1]
'1025962/5'
You can also have them all in one string, like this:
>>> ' '.join(line.split('/', 3)[-1] for line in source.splitlines())
'1025744/3 1025962/5 913745/2'
If you are feeding the input line by line, you could do this:
import re
instr = 'refs/changes/44/1025744/3'
print get_match(instr)
def get_match():
match = re.match("^(refs/changes/[0-9]*/)([0-9/]*)", instr)
if match:
return match.group(2)
>>> import re
>>> input="""refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2"""
>>> res=re.findall(r'.*/.*/.*/(.*/.*)',input)
>>> for i in res:
... print i
...
1025744/3
1025962/5
913745/2

Find a regular extension of domains on a string python Regex

I would like to find extension "COM" from a sentence using regex in python.
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\s.\s\:]+\.[^\.\s\:]*)')
>>>
Result:
domain : 'domain.com' ### notes: not domain.coms
url : 'http://domain.coms/index/page/2'
may be you are looking for this:
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\/]*\.(?:com|en|org))')
>>> m = pattern.search(str)
>>> print m.group(1)
domain.com
((?:https?:\/\/)?(?:([^\s.\s\:]+\.[^\/]*)(?:\/|$)[^\.\s\:]*))
Try this.Group 1 will be the url.Group 2 will be the domain.
See demo.
http://regex101.com/r/sK8oK9/1
You could try the below.
>>> s = "finding exstention from string on http://domain.coms/index/page/2"
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(1)
>>> m
'http://domain.coms/index/page/2'
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(2)
>>> m
'domain.coms'

How to search and export value with python and re?

I'm trying to export some value from the text to a txt file.
my text has this form:
"a='one' b='2' c='3' a='two' b='8' c='3'"
I want to export all the value of the key "a"
The result must be like
one
two
The other answers are correct for your particular case, but I think a regex with lookbehind/lookahead is a more general solution, i.e.:
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
expr = r"(?<=a=')[^']*(?=')"
matches = re.findall(expr,text)
for m in matches:
print m ##or whatever
This will match for any expression between single quotes preceded by a=, i.e. a='xyz', a='my#1.abcd' and a='a=5%' will all match
This regex is very easy to understand:
pattern = r"a='(.*?)'"
It doesn't use lookarounds (like (?<=a=')[^']*(?=') ) - so it's very simple ..
Whole program:
#!/usr/bin/python
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
pattern = r"a='(.*?)'"
for m in re.findall( pattern, text ):
print m
you can use something like this:
import re
r = re.compile(r"'([a-z]+)'")
f = open('input')
text = f.read()
m = r.finditer(text)
for mm in m:
print mm.group(1)
thought i would give a solution without re:
>>> text = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> step1 = text.split(" ")
>>> step1
["a='one'", "b='2'", "c='3'", "a='two'", "b='8'", "c='3'"]
>>> step2 = []
>>> for pair in step1:
split_pair = pair.split("=")
step2.append([split_pair[0],split_pair[1]])
>>> print step2
[['a', "'one'"], ['b', "'2'"], ['c', "'3'"], ['a', "'two'"], ['b', "'8'"], ['c', "'3'"]]
>>> results = []
>>> for split_pair in step2:
if split_pair[0] == "a":
results.append(split_pair[1])
>>> results
["'one'", "'two'"]
not the most elegant method, but it works.
Another non-regex solution: you could use the shlex module and the .partition method (or .split() with maxsplit=1):
>>> import shlex
>>> s = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> shlex.split(s)
['a=one', 'b=2', 'c=3', 'a=two', 'b=8', 'c=3']
>>> shlex.split(s)[0].partition("=")
('a', '=', 'one')
and so it's simply
>>> for group in shlex.split(s):
... key, eq, val = group.partition("=")
... if key == 'a':
... print val
...
one
two
with lots of variations of the same.

Python - using variable in regular expression

I'm trying to change width value in the string using regex.
Width can have multiple formats, such as width="500px" or width:500px or width=500px (without quotes), etc..
Currently, I'm searching and replacing individually like this
p = re.compile('width="\w{3}')
embed_url = p.sub('width="555', embed_url)
# width:"555
p = re.compile('width:"\w{3}')
embed_url = p.sub('width:"555', embed_url)
Is there any way to use one regular expression and replace strings : or = accordingly?
EDIT
Changed the above code, so : and = is changed accordingly, instead of replacing all of them with "="
Try this regex:
width(:|=)"?\w{3}
it matches:
width="300px
width=300px
width:"300px
width:300px
You can use | and grouping like this:
>>> p = re.compile('width(:|=)"\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
If you include quotes / apostrophes in a group, you can do this:
>>> p = re.compile('width(:|=)("|\')\w{3}')
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:\'500px"'))
width="555px"
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=\'500px"'))
width="555px"
Adding a ? will make the previous element optional:
>>> p = re.compile('width(:|=)("|\')?\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=500px'))
width="555px
Hope this helps.

Categories