I have the following as input. I am trying to write a regular expression which yields the below output. Can anyone provide
input on how to do this?
INPUT:-
refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2
OUTPUT:-
1025744/3
1025962/5
913745/2
If that is the actual input format, a regex is not needed:
>>> source = """\
... refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2
... """
>>> output = [line.split('/', 3)[-1] for line in source.splitlines()]
>>> output[0]
'1025744/3'
>>> output[1]
'1025962/5'
You can also have them all in one string, like this:
>>> ' '.join(line.split('/', 3)[-1] for line in source.splitlines())
'1025744/3 1025962/5 913745/2'
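If the number of leading path components might vary, one variation is to split from the right instead of the left. This is only a sketch, written for Python 3; the helper name `change_id` is made up for illustration and does not come from the answer above.

```python
def change_id(ref):
    """Return the last two '/'-separated components of a ref string."""
    # rsplit with maxsplit=2 cuts only at the last two slashes
    parts = ref.rsplit('/', 2)
    if len(parts) < 3:
        raise ValueError('unexpected ref format: %r' % ref)
    return '/'.join(parts[-2:])


source = """\
refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2
"""

output = [change_id(line) for line in source.splitlines()]
print(output)
```

Splitting from the right keeps working if the prefix gains or loses a component, at the cost of assuming the wanted part is always the final two components.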
If you are feeding the input line by line, you could do this:
import re

def get_match(instr):
    # Group 1 is the "refs/changes/NN/" prefix; group 2 is the part we want.
    match = re.match("^(refs/changes/[0-9]*/)([0-9/]*)", instr)
    if match:
        return match.group(2)

instr = 'refs/changes/44/1025744/3'
print(get_match(instr))
>>> import re
>>> input="""refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2"""
>>> res=re.findall(r'.*/.*/.*/(.*/.*)',input)
>>> for i in res:
...     print i
...
1025744/3
1025962/5
913745/2
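A stricter variant, as a sketch for Python 3: anchoring the pattern to each line with `re.MULTILINE` and matching digits explicitly avoids accidental matches on lines that merely contain a few slashes. The variable name `refs` is an illustrative choice that avoids shadowing the built-in `input`.

```python
import re

refs = """refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2"""

# With re.MULTILINE, ^ and $ match at the start and end of every line
pattern = re.compile(r'^refs/changes/\d+/(\d+/\d+)$', re.MULTILINE)
res = pattern.findall(refs)
for item in res:
    print(item)
```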
I would like to find extension "COM" from a sentence using regex in python.
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\s.\s\:]+\.[^\.\s\:]*)')
>>>
Result:
domain : 'domain.com' ### notes: not domain.coms
url : 'http://domain.coms/index/page/2'
Maybe you are looking for this:
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\/]*\.(?:com|en|org))')
>>> m = pattern.search(str)
>>> print m.group(1)
domain.com
((?:https?:\/\/)?(?:([^\s.\s\:]+\.[^\/]*)(?:\/|$)[^\.\s\:]*))
Try this. Group 1 will be the URL; group 2 will be the domain.
See demo.
http://regex101.com/r/sK8oK9/1
You could try the below.
>>> s = "finding exstention from string on http://domain.coms/index/page/2"
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(1)
>>> m
'http://domain.coms/index/page/2'
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(2)
>>> m
'domain.coms'
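If the goal is mainly to pull the host out of the URL, a non-regex route is the standard library's URL parser. The following is a sketch for Python 3 that assumes the URL is the first token starting with `http://` or `https://`. Note that it returns the host exactly as written (`domain.coms`); trimming it down to a known suffix such as `com` would still be a separate step.

```python
import re
from urllib.parse import urlsplit

s = "finding exstention from string on http://domain.coms/index/page/2"

# Assumption: the URL runs from the scheme to the next whitespace
match = re.search(r'https?://\S+', s)
url = match.group(0) if match else None

# urlsplit does the structural parsing; hostname is the lower-cased host part
host = urlsplit(url).hostname if url else None

print(url)
print(host)
```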
Is there a better way of finding digits in a string which starts with '_v' which stands for version number? What I want is just '001'
filename = 'greatv02_v001_jam.mb'
parts = re.split('_v|\_',filename)
>>['greatv02', '001', 'jam.mb']
b = re.findall(r'\d+', filename)
>>['02', '001']
Is there a way to split a string with something along these lines?
parts = re.split('_v###_',filename)
or
parts = re.split('_v*_',filename)
You could use lookarounds:
>>> filename = 'greatv02_v001_jam.mb'
>>> import re
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>>
>>> filename = 'greatv02_v001_av456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>> filename = 'greatv02_v001_v456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001', '456']
>>>
Ugly, but you could partition the file name twice
>>> filename.partition('_v')[2].partition('_')[0]
'001'
Use regex's grouping like this:
.*_v(\d+).*
Demo:
>>> filename = 'greatv02_v001_jam.mb'
>>> pattern = re.compile(r'.*_v(\d+).*')
>>> re.search(pattern, filename).group(1)
'001'
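The approaches above all assume that the filename contains a version tag. A small sketch (Python 3) that wraps the search in a function and returns `None` when there is no `_v<digits>` part; the names `VERSION_RE` and `extract_version` are hypothetical.

```python
import re

# "_v" followed by one or more digits; the digits are captured
VERSION_RE = re.compile(r'_v(\d+)')


def extract_version(filename):
    """Return the digits after the first '_v', or None if there are none."""
    match = VERSION_RE.search(filename)
    return match.group(1) if match else None


print(extract_version('greatv02_v001_jam.mb'))
print(extract_version('no_version_here.mb'))
```

Using `search` with a short pattern avoids the leading and trailing `.*`, which add nothing when only the captured group is needed.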
How about the regex _v(?P<version>\d+).*:
>>> regex = re.compile(r"_v(?P<version>\d+).*")
>>> r = regex.search(string)
# List the groups found
>>> r.groups()
(u'001',)
# List the named dictionary objects found
>>> r.groupdict()
{u'version': u'001'}
# Run findall
>>> regex.findall(string)
[u'001']
# Run timeit test
>>> setup = ur"import re; regex =re.compile("_v(?P<version>\d+).*");string="""greatv02_v00 ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
0.005126953125
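For anyone who wants to repeat the timing, here is a self-contained sketch for Python 3. The test string is an assumption (the filename from the question), since the setup line above is cut off, and the number printed will differ from machine to machine.

```python
import timeit

# Setup code runs once before timing; the filename is an assumed test value
setup = (
    "import re\n"
    "regex = re.compile(r'_v(?P<version>\\d+)')\n"
    "string = 'greatv02_v001_jam.mb'"
)

timer = timeit.Timer('regex.search(string)', setup)
elapsed = timer.timeit(10000)  # total seconds for 10000 searches
print(elapsed)
```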
I have this code:
while i<len(line):
    if re.findall(pattern, line[i]):
        k,v = line[i].split('=')
        print k
        token = dict(k=v)
        print token
        break
and the result I'm getting is:
ptk
{'k': 'ptk_first'}
How can I make these few lines of code nicer and get a dictionary that looks like this:
{'ptk': 'ptk_first'}
for line in lines:
    if re.match(pattern, line):
        k,v = line.split('=')
        # {k: v} uses the value of k as the key; dict(k=v) uses the literal name 'k'
        token = {k:v}
        print token
Something like this:
lines="""\
key1=data on the rest of line 1
key2=data on the rest of line 2
key3=data on line 3"""
d={}
for line in lines.splitlines():
    k,v=line.split('=')
    d[k]=v
print d
In [112]: line="ptk=ptk_first"
In [113]: dict([line.split("=")])
Out[113]: {'ptk': 'ptk_first'}
For your code:
for line in lines:
    if re.findall(pattern, line):
        token = dict([line.split("=")])
        print token
with regex you can try this:
>>> import re
>>> lines="""
... ptk=ptk_first
... ptk1=ptk_second
... """
>>> dict(re.findall('(\w+)=(\w+)',lines,re.M))
{'ptk1': 'ptk_second', 'ptk': 'ptk_first'}
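One caveat with `line.split('=')` is that it raises an error on unpacking when the value itself contains `=`, or when a line has no `=` at all. A sketch (Python 3) using `str.partition`, which always splits at the first `=` only; the sample lines, including the `example.com` URL, are made up for illustration.

```python
lines = """\
ptk=ptk_first
ptk1=ptk_second
url=http://example.com/?q=1
not a pair
"""

tokens = {}
for line in lines.splitlines():
    # partition returns (before, separator, after); separator is '' if absent
    key, sep, value = line.partition('=')
    if sep:  # skip lines that contain no '='
        tokens[key] = value

print(tokens)
```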
I'm trying to export some values from a text to a txt file.
My text has this form:
"a='one' b='2' c='3' a='two' b='8' c='3'"
I want to export all the values of the key "a".
The result must look like this:
one
two
The other answers are correct for your particular case, but I think a regex with lookbehind/lookahead is a more general solution, i.e.:
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
expr = r"(?<=a=')[^']*(?=')"
matches = re.findall(expr,text)
for m in matches:
    print m  # or whatever
This matches any text between single quotes that is preceded by a=. For example, a='xyz', a='my#1.abcd' and a='a=5%' will all match.
This regex is very easy to understand:
pattern = r"a='(.*?)'"
It doesn't use lookarounds (such as (?<=a=')[^']*(?=')), so it stays simple.
Whole program:
#!/usr/bin/python
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
pattern = r"a='(.*?)'"
for m in re.findall( pattern, text ):
    print m
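Since the question asks for the values to end up in a txt file, the same pattern can be combined with a file write. This is a sketch for Python 3; the function name `export_values`, the output file name and the use of a temporary directory are assumptions made for the example.

```python
import os
import re
import tempfile


def export_values(text, out_path, key='a'):
    """Write every value of key='...' in text to out_path, one per line."""
    # \b keeps e.g. "ba='x'" from being mistaken for the key "a"
    pattern = re.compile(r"\b%s='(.*?)'" % re.escape(key))
    values = pattern.findall(text)
    with open(out_path, 'w') as out:
        for value in values:
            out.write(value + '\n')
    return values


text = "a='one' b='2' c='3' a='two' b='8' c='3'"
out_path = os.path.join(tempfile.mkdtemp(), 'a_values.txt')
values = export_values(text, out_path)
print(values)
```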
You can use something like this:
import re
# note: this matches every quoted lowercase value, not only those of key "a"
r = re.compile(r"'([a-z]+)'")
with open('input') as f:
    text = f.read()
for mm in r.finditer(text):
    print mm.group(1)
I thought I would give a solution without re:
>>> text = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> step1 = text.split(" ")
>>> step1
["a='one'", "b='2'", "c='3'", "a='two'", "b='8'", "c='3'"]
>>> step2 = []
>>> for pair in step1:
...     split_pair = pair.split("=")
...     step2.append([split_pair[0],split_pair[1]])
...
>>> print step2
[['a', "'one'"], ['b', "'2'"], ['c', "'3'"], ['a', "'two'"], ['b', "'8'"], ['c', "'3'"]]
>>> results = []
>>> for split_pair in step2:
...     if split_pair[0] == "a":
...         results.append(split_pair[1])
...
>>> results
["'one'", "'two'"]
Not the most elegant method, but it works (the values keep their surrounding quotes).
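The same idea can be condensed into a single loop that also strips the quotes, so the result matches the output the question asks for. A sketch for Python 3, assuming that no value contains a space.

```python
text = "a='one' b='2' c='3' a='two' b='8' c='3'"

results = []
for pair in text.split(" "):
    # partition splits at the first '=' only
    key, _, value = pair.partition("=")
    if key == "a":
        # remove the single quotes around the value
        results.append(value.strip("'"))

print(results)
```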
Another non-regex solution: you could use the shlex module and the .partition method (or .split() with maxsplit=1):
>>> import shlex
>>> s = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> shlex.split(s)
['a=one', 'b=2', 'c=3', 'a=two', 'b=8', 'c=3']
>>> shlex.split(s)[0].partition("=")
('a', '=', 'one')
and so it's simply
>>> for group in shlex.split(s):
...     key, eq, val = group.partition("=")
...     if key == 'a':
...         print val
...
one
two
with lots of variations of the same.
I'm pretty new to Python, and I'm trying to parse a file. Only certain lines in the file contain data of interest, and I want to end up with a dictionary of the stuff parsed from valid matching lines in the file.
The code below works, but it's a bit ugly and I'm trying to learn how it should be done, perhaps with a comprehension, or else with a multiline regex. I'm using Python 3.2.
file_data = open('x:\\path\\to\\file','r').readlines()
my_list = []
for line in file_data:
    # discard lines which don't match at all
    if re.search(pattern, line):
        # icky, repeating search!!
        one_tuple = re.search(pattern, line).group(3,2)
        my_list.append(one_tuple)
my_dict = dict(my_list)
Can you suggest a better implementation?
Thanks for the replies. After putting them together I got
file_data = open('x:\\path\\to\\file','r').read()
my_list = re.findall(pattern, file_data, re.MULTILINE)
my_dict = {c:b for a,b,c in my_list}
but I don't think I could have gotten there today without the help.
Here are some quick'n'dirty optimisations to your code:
my_dict = dict()
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        match = re.search(pattern, line)
        if match:
            one_tuple = match.group(3, 2)
            my_dict[one_tuple[0]] = one_tuple[1]
In the spirit of EAFP I'd suggest
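my_dict = {}
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        try:
            m = re.search(pattern, line)
            # key is group 3, value is group 2, as in the question
            my_dict[m.group(3)] = m.group(2)
        except AttributeError:
            # re.search returned None: the line did not match
            pass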
Another way is to keep using lists, but redesign the pattern so that it contains only two groups (key, value). Then you could simply do:
matches = [re.findall(pattern, line) for line in data]
mydict = dict(x[0] for x in matches if x)
matchRes = pattern.match(line)
if matchRes:
    # groupdict() only returns named groups, i.e. (?P<name>...)
    my_dict = matchRes.groupdict()
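`groupdict()` only returns groups that were given a name in the pattern, so this approach needs `(?P<name>...)` groups. A sketch for Python 3 with made-up sample data and hypothetical group names (`index`, `value`, `key`), since the real pattern is not shown in the question.

```python
import re

# Named groups make the roles of the three captures explicit
pattern = re.compile(r"(?P<index>.)(?P<value>\w+)\s(?P<key>\w+)")

data = """1foo bar
2bing baz
3spam eggs
nomatch
"""

my_dict = {}
for line in data.splitlines():
    match_res = pattern.match(line)
    if match_res:
        groups = match_res.groupdict()
        my_dict[groups['key']] = groups['value']

print(my_dict)
```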
I'm not sure I'd recommend it, but here's a way you could try to use a comprehension instead (I substituted a string for the file for simplicity):
>>> import re
>>> data = """1foo bar
... 2bing baz
... 3spam eggs
... nomatch
... """
>>> pattern = r"(.)(\w+)\s(\w+)"
>>> {x[0]: x[1] for x in (m.group(3, 2) for m in (re.search(pattern, line) for line in data.splitlines()) if m)}
{'baz': 'bing', 'eggs': 'spam', 'bar': 'foo'}
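A somewhat flatter alternative to the nested generators, as a sketch for Python 3: compile the pattern with `re.MULTILINE`, run `finditer` over the whole text and build the dictionary from the match objects. The separator is narrowed to `[ \t]` here (an assumption about the data) so that a match cannot run across a line break.

```python
import re

data = """1foo bar
2bing baz
3spam eggs
nomatch
"""

# ^ and $ anchor each match to a single line because of re.MULTILINE
pattern = re.compile(r"^(.)(\w+)[ \t](\w+)$", re.MULTILINE)

# group 3 becomes the key and group 2 the value, as in the question
result = {m.group(3): m.group(2) for m in pattern.finditer(data)}
print(result)
```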