How to search and export value with python and re? - python

I'm trying to export some values from a text to a txt file.
My text has this form:
"a='one' b='2' c='3' a='two' b='8' c='3'"
I want to export all the values of the key "a".
The result must look like:
one
two

The other answers are correct for your particular case, but I think a regex with lookbehind/lookahead is a more general solution, i.e.:
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
expr = r"(?<=a=')[^']*(?=')"
matches = re.findall(expr,text)
for m in matches:
    print(m)  # or whatever
This matches any text between single quotes that is preceded by a=; for example, the values in a='xyz', a='my#1.abcd' and a='a=5%' will all match.
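The same lookbehind pattern can be built for any key. A minimal sketch, assuming a hypothetical helper named extract_values (it is not part of the answer above) and that values never contain a single quote:

```python
import re

def extract_values(text, key):
    """Return every value assigned to `key` in text shaped like key='value'."""
    # re.escape guards against keys that contain regex metacharacters.
    # For any given key the lookbehind has a fixed width, as Python's re requires.
    pattern = r"(?<=%s=')[^']*(?=')" % re.escape(key)
    return re.findall(pattern, text)

text = "a='one' b='2' c='3' a='two' b='8' c='3'"
print(extract_values(text, "a"))
print(extract_values(text, "b"))
```

Note that the pattern does not check what comes before the key, so a key such as ba='x' would also be picked up when searching for "a".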

This regex is very easy to understand:
pattern = r"a='(.*?)'"
It doesn't use lookarounds (like (?<=a=')[^']*(?=')), so it's very simple.
Whole program:
#!/usr/bin/python
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
pattern = r"a='(.*?)'"
for m in re.findall(pattern, text):
    print(m)
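The ? after .* matters here. A small sketch comparing the lazy pattern from this answer with its greedy counterpart on the same text (the variable names lazy and greedy are just for illustration):

```python
import re

text = "a='one' b='2' c='3' a='two' b='8' c='3'"

# Lazy: .*? stops at the first closing quote after a='
lazy = re.findall(r"a='(.*?)'", text)

# Greedy: .* runs on to the last quote in the whole text
greedy = re.findall(r"a='(.*)'", text)

print(lazy)
print(greedy)
```

The greedy version returns a single long match that swallows everything up to the final quote, which is why the lazy form is the one to use.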

You can use something like this:
import re
# matches any lowercase word in single quotes, whatever its key
r = re.compile(r"'([a-z]+)'")
with open('input') as f:
    text = f.read()
for mm in r.finditer(text):
    print(mm.group(1))
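Since the question asks for the values to end up in a txt file, here is a minimal sketch of the export step. The helper name export_values and the file name a_values.txt are assumptions for illustration, and a temporary directory is used so the example does not touch your working directory:

```python
import os
import re
import tempfile

def export_values(text, key, out_path):
    """Write each value of `key` on its own line in out_path; return the values."""
    values = re.findall(r"%s='(.*?)'" % re.escape(key), text)
    with open(out_path, "w") as out:
        for value in values:
            out.write(value + "\n")
    return values

text = "a='one' b='2' c='3' a='two' b='8' c='3'"
out_path = os.path.join(tempfile.mkdtemp(), "a_values.txt")  # hypothetical file name
exported = export_values(text, "a", out_path)

# Read the file back to confirm what was written
with open(out_path) as f:
    written = f.read()
print(written)
```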

Thought I would give a solution without re:
>>> text = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> step1 = text.split(" ")
>>> step1
["a='one'", "b='2'", "c='3'", "a='two'", "b='8'", "c='3'"]
>>> step2 = []
>>> for pair in step1:
...     split_pair = pair.split("=")
...     step2.append([split_pair[0], split_pair[1]])
...
>>> print(step2)
[['a', "'one'"], ['b', "'2'"], ['c', "'3'"], ['a', "'two'"], ['b', "'8'"], ['c', "'3'"]]
>>> results = []
>>> for split_pair in step2:
...     if split_pair[0] == "a":
...         results.append(split_pair[1])
...
>>> results
["'one'", "'two'"]
Not the most elegant method, but it works. Note that the values keep their quotes; call .strip("'") on each one if you need them bare.
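The three steps above can be folded into a single loop that also strips the quotes. A sketch, assuming every space-separated token contains an = and that values contain no spaces:

```python
text = "a='one' b='2' c='3' a='two' b='8' c='3'"

values = []
for pair in text.split(" "):
    # maxsplit=1 keeps any later "=" inside the value intact
    key, raw = pair.split("=", 1)
    if key == "a":
        values.append(raw.strip("'"))

print(values)
```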

Another non-regex solution: you could use the shlex module and the .partition method (or .split() with maxsplit=1):
>>> import shlex
>>> s = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> shlex.split(s)
['a=one', 'b=2', 'c=3', 'a=two', 'b=8', 'c=3']
>>> shlex.split(s)[0].partition("=")
('a', '=', 'one')
and so it's simply
>>> for group in shlex.split(s):
...     key, eq, val = group.partition("=")
...     if key == 'a':
...         print(val)
...
one
two
with lots of variations of the same.
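One such variation, as a sketch: collect the values of every key at once instead of only "a". The name grouped is an assumption for illustration; shlex.split and collections.defaultdict are standard library:

```python
import shlex
from collections import defaultdict

s = "a='one' b='2' c='3' a='two' b='8' c='3'"

grouped = defaultdict(list)
for token in shlex.split(s):          # shlex removes the quotes for us
    key, _, val = token.partition("=")
    grouped[key].append(val)

print(grouped["a"])
print(dict(grouped))
```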

Related

How to replace text between parentheses in Python?

I have a dictionary containing the following key-value pairs: d={'Alice':'x','Bob':'y','Chloe':'z'}
I want to replace the lowercase variables (values) with the constants (keys) in any given string.
For example, if my string is:
A(x)B(y)C(x,z)
how do I replace the characters to get the resulting string:
A(Alice)B(Bob)C(Alice,Chloe)
Should I use regular expressions?
re.sub() solution with replacement function:
import re
d = {'Alice':'x','Bob':'y','Chloe':'z'}
flipped = dict(zip(d.values(), d.keys()))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'')
for k in m.group().strip('()').split(','))), s)
print(result)
The output:
A(Alice)B(Bob)C(Alice,Chloe)
Extended version:
import re
def repl(m):
    val = m.group().strip('()')
    d = {'Alice':'x','Bob':'y','Chloe':'z'}
    flipped = dict(zip(d.values(), d.keys()))
    if ',' in val:
        return '({})'.format(','.join(flipped.get(k,'') for k in val.split(',')))
    else:
        return '({})'.format(flipped.get(val,''))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', repl, s)
print(result)
Bonus approach for particular input case A(x)B(y)C(Alice,z):
...
s = 'A(x)B(y)C(Alice,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'') or k
for k in m.group().strip('()').split(','))), s)
print(result)
I assume you want to replace the values in a string with the respective keys of the dictionary. If my assumption is correct you can try this without using regex.
First, swap the keys and values using a dictionary comprehension.
my_dict = {'Alice':'x','Bob':'y','Chloe':'z'}
my_dict = { y:x for x,y in my_dict.items()}
Then, using a list comprehension, replace the values (this walks the string character by character, so it only works for single-character values):
str_ = 'A(x)B(y)C(x,z)'
output = ''.join([i if i not in my_dict.keys() else my_dict[i] for i in str_])
Hope this is what you need ;)
Code
import re
d={'Alice':'x','Bob':'y','Chloe':'z'}
keys = list(d.keys())
values = list(d.values())
s = "A(x)B(y)C(x,z)"
for i in range(len(keys)):
    rx = re.escape(values[i])
    s = re.sub(rx, keys[i], s)
print(s)
Output
A(Alice)B(Bob)C(Alice,Chloe)
Also, you could use the replace method in Python like this:
d = {'x': 'Alice', 'y': 'Bob', 'z': 'Chloe'}
s = "A(x)B(y)C(x,z)"
for key in d:
    s = s.replace(key, d[key])
print(s)
But yes, you should swap your dictionary's keys and values as Kishore suggested.
This is the way that I would do it:
import re
def sub_args(text, tosub):
    ops = '|'.join(tosub.keys())
    for argstr, _ in re.findall(r'(\(([%s]+?,?)+\))' % ops, text):
        args = argstr[1:-1].split(',')
        args = [tosub[a] for a in args]
        subbed = '(%s)' % ','.join(map(str, args))
        text = re.sub(re.escape(argstr), subbed, text)
    return text
text = 'A(x)B(y)C(x,z)'
tosub = {
'x': 'Alice',
'y': 'Bob',
'z': 'Chloe'
}
print(sub_args(text, tosub))
Basically you just use the regex pattern to find all of the argument groups and substitute in the proper values--the nice thing about this approach is that you don't have to worry about subbing where you don't want to (for example, if you had a string like 'Fn(F,n)'). You can also have multi-character keys, like 'F(arg1,arg2)'.
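To see the 'Fn(F,n)' pitfall concretely, here is a sketch that compares a plain str.replace loop with a replacement limited to the text inside parentheses. It uses a re.sub callback rather than the sub_args function above, and the mapping and the name replace_args are made up for illustration:

```python
import re

tosub = {'F': 'Frank', 'n': 'Nina'}  # hypothetical mapping
text = 'Fn(F,n)'

# Naive approach: str.replace touches every occurrence, including the
# function name and the text inserted by earlier replacements.
naive = text
for key, value in tosub.items():
    naive = naive.replace(key, value)

# Scoped approach: only rewrite the comma-separated names inside parentheses.
def replace_args(match):
    args = match.group(1).split(',')
    # Names without a mapping are left unchanged
    return '(%s)' % ','.join(tosub.get(a, a) for a in args)

scoped = re.sub(r'\(([^()]*)\)', replace_args, text)

print(naive)
print(scoped)
```

Only the scoped version leaves the function name Fn alone.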

how to write a regex to parse refs/changes

I have the following as input. I am trying to write a regular expression which yields the below output. Can anyone provide input on how to do this?
INPUT:-
refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2
OUTPUT:-
1025744/3
1025962/5
913745/2
If that is the actual input format, a regex is not needed:
>>> source = """\
... refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2
... """
>>> output = [line.split('/', 3)[-1] for line in source.splitlines()]
>>> output[0]
'1025744/3'
>>> output[1]
'1025962/5'
You can also have them all in one string, like this:
>>> ' '.join(line.split('/', 3)[-1] for line in source.splitlines())
'1025744/3 1025962/5 913745/2'
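A variation on the same idea, as a sketch: split from the right instead, so the number of leading path components does not matter. The helper name last_two_fields is an assumption for illustration:

```python
def last_two_fields(ref):
    """Return the last two slash-separated fields of ref, joined by a slash."""
    # rsplit with maxsplit=2 cuts at the last two slashes only
    parts = ref.rsplit('/', 2)
    return '/'.join(parts[-2:])

refs = [
    'refs/changes/44/1025744/3',
    'refs/changes/62/1025962/5',
    'refs/changes/45/913745/2',
]
output = [last_two_fields(ref) for ref in refs]
print(output)
```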
If you are feeding the input line by line, you could do this:
import re
def get_match(instr):
    match = re.match(r"^(refs/changes/[0-9]*/)([0-9/]*)", instr)
    if match:
        return match.group(2)
instr = 'refs/changes/44/1025744/3'
print(get_match(instr))
>>> import re
>>> input="""refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2"""
>>> res=re.findall(r'.*/.*/.*/(.*/.*)',input)
>>> for i in res:
...     print(i)
...
1025744/3
1025962/5
913745/2
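The .*/.*/.*/(.*/.*) pattern accepts any line with four slashes. If the input may contain other paths, a stricter pattern can help. A sketch, assuming every ref of interest looks like refs/changes/<digits>/<digits>/<digits>; the extra line in the sample input is made up to show the difference:

```python
import re

source = """refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2
some/other/path/with/slashes
"""

# Loose: any four-slash line matches
loose = re.findall(r'.*/.*/.*/(.*/.*)', source)

# Strict: anchored per line (re.MULTILINE) and limited to digits
strict = re.findall(r'^refs/changes/\d+/(\d+/\d+)$', source, re.MULTILINE)

print(loose)
print(strict)
```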

python find digits with leading '_v'

Is there a better way of finding digits in a string which starts with '_v' which stands for version number? What I want is just '001'
filename = 'greatv02_v001_jam.mb'
parts = re.split('_v|\_',filename)
>>['greatv02', '001', 'jam.mb']
b = re.findall(r'\d+', filename)
>>['02', '001']
Is there a way to split a string with something along these lines?
parts = re.split('_v###_',filename)
or
parts = re.split('_v*_',filename)
You could use lookarounds:
>>> filename = 'greatv02_v001_jam.mb'
>>> import re
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>>
>>> filename = 'greatv02_v001_av456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>> filename = 'greatv02_v001_v456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001', '456']
>>>
Ugly, but you could partition the file name twice
>>> filename.partition('_v')[2].partition('_')[0]
'001'
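As for the '_v###_' idea from the question: re.split accepts \d+ for "one or more digits", and wrapping it in a capturing group keeps the digits in the result. A short sketch (the variable names are just for illustration):

```python
import re

filename = 'greatv02_v001_jam.mb'

# Without a group, the separator (including the digits) is dropped
without_version = re.split(r'_v\d+_', filename)

# With a capturing group, the captured digits are kept in the list
with_version = re.split(r'_v(\d+)_', filename)

print(without_version)
print(with_version)
```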
Use regex's grouping like this:
.*_v(\d+).*
Demo:
>>> filename = 'greatv02_v001_jam.mb'
>>> pattern = re.compile(r'.*_v(\d+).*')
>>> re.search(pattern, filename).group(1)
'001'
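Because re.search scans the whole string, the leading and trailing .* are optional. A sketch that also handles file names without a version, using a hypothetical helper find_version:

```python
import re

VERSION_RE = re.compile(r'_v(\d+)')

def find_version(filename):
    """Return the digits after the first '_v', or None if there are none."""
    match = VERSION_RE.search(filename)
    return match.group(1) if match else None

print(find_version('greatv02_v001_jam.mb'))
print(find_version('greatv02_jam.mb'))
```

Checking the match before calling .group(1) avoids an AttributeError when the name has no '_v' followed by digits.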
How about the regex _v(?P<version>\d+).*:
>>> import re
>>> string = u'greatv02_v001_jam.mb'  # the filename from the question
>>> regex = re.compile(r"_v(?P<version>\d+).*")
>>> r = regex.search(string)
# List the groups found
>>> r.groups()
(u'001',)
# List the named dictionary objects found
>>> r.groupdict()
{u'version': u'001'}
# Run findall
>>> regex.findall(string)
[u'001']
# Run timeit test
>>> setup = ur"import re; regex =re.compile("_v(?P<version>\d+).*");string="""greatv02_v00 ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
0.005126953125

Build a dictionary from successful regex matches in python

I'm pretty new to Python, and I'm trying to parse a file. Only certain lines in the file contain data of interest, and I want to end up with a dictionary of the stuff parsed from valid matching lines in the file.
The code below works, but it's a bit ugly and I'm trying to learn how it should be done, perhaps with a comprehension, or else with a multiline regex. I'm using Python 3.2.
file_data = open('x:\\path\\to\\file','r').readlines()
my_list = []
for line in file_data:
    # discard lines which don't match at all
    if re.search(pattern, line):
        # icky, repeating search!!
        one_tuple = re.search(pattern, line).group(3,2)
        my_list.append(one_tuple)
my_dict = dict(my_list)
Can you suggest a better implementation?
Thanks for the replies. After putting them together I got
file_data = open('x:\\path\\to\\file','r').read()
my_list = re.findall(pattern, file_data, re.MULTILINE)
my_dict = {c:b for a,b,c in my_list}
but I don't think I could have gotten there today without the help.
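For anyone who wants to try the findall approach without the original file, here is a self-contained sketch. The real pattern is not shown in the question, so the three-group pattern and the sample data below are assumptions:

```python
import re

# Stand-in for the file contents
file_data = """1foo bar
2bing baz
3spam eggs
nomatch
"""

# ^ and $ match at line boundaries because of re.MULTILINE
pattern = r"^(.)(\w+) (\w+)$"

my_list = re.findall(pattern, file_data, re.MULTILINE)
my_dict = {c: b for a, b, c in my_list}

print(my_list)
print(my_dict)
```

Anchoring with ^ and $ keeps a match from running across a line break, which an unanchored \s could otherwise allow.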
Here are some quick'n'dirty optimisations to your code:
my_dict = dict()
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        match = re.search(pattern, line)
        if match:
            one_tuple = match.group(3, 2)
            my_dict[one_tuple[0]] = one_tuple[1]
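In the spirit of EAFP I'd suggest
my_dict = {}
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        try:
            m = re.search(pattern, line)
            # group 3 is the key and group 2 the value, as in the question
            my_dict[m.group(3)] = m.group(2)
        except AttributeError:
            # re.search returned None: the line did not match
            pass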
Another way is to keep using lists, but redesign the pattern so that it contains only two groups (key, value). Then you could simply do:
matches = [re.findall(pattern, line) for line in data]
mydict = dict(x[0] for x in matches if x)
# assumes pattern is compiled and uses named groups, e.g. (?P<name>...)
matchRes = pattern.match(line)
if matchRes:
    my_dict = matchRes.groupdict()
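groupdict() returns the named groups of a single match, so it describes one line rather than the whole file. A sketch of how it could feed the dictionary, assuming a pattern with made-up group names index, value and key:

```python
import re

pattern = re.compile(r"(?P<index>.)(?P<value>\w+)\s(?P<key>\w+)")
lines = ["1foo bar", "2bing baz", "nomatch"]  # sample data for illustration

my_dict = {}
first_fields = None
for line in lines:
    m = pattern.match(line)
    if m:
        fields = m.groupdict()       # maps group name to matched text
        if first_fields is None:
            first_fields = fields
        my_dict[fields["key"]] = fields["value"]

print(first_fields)
print(my_dict)
```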
I'm not sure I'd recommend it, but here's a way you could try to use a comprehension instead (I substituted a string for the file for simplicity):
>>> import re
>>> data = """1foo bar
... 2bing baz
... 3spam eggs
... nomatch
... """
>>> pattern = r"(.)(\w+)\s(\w+)"
>>> {x[0]: x[1] for x in (m.group(3, 2) for m in (re.search(pattern, line) for line in data.splitlines()) if m)}
{'baz': 'bing', 'eggs': 'spam', 'bar': 'foo'}

Python - using variable in regular expression

I'm trying to change the width value in a string using regex.
Width can have multiple formats, such as width="500px" or width:500px or width=500px (without quotes), etc.
Currently, I'm searching and replacing each format individually, like this:
p = re.compile('width="\w{3}')
embed_url = p.sub('width="555', embed_url)
# width:"555
p = re.compile('width:"\w{3}')
embed_url = p.sub('width:"555', embed_url)
Is there any way to use one regular expression and replace strings : or = accordingly?
EDIT
Changed the above code, so : and = is changed accordingly, instead of replacing all of them with "="
Try this regex:
width(:|=)"?\w{3}
it matches:
width="300px
width=300px
width:"300px
width:300px
You can use | and grouping like this:
>>> p = re.compile('width(:|=)"\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
If you include quotes / apostrophes in a group, you can do this:
>>> p = re.compile('width(:|=)("|\')\w{3}')
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:\'500px"'))
width="555px"
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=\'500px"'))
width="555px"
Adding a ? will make the previous element optional:
>>> p = re.compile('width(:|=)("|\')?\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=500px'))
width="555px
Hope this helps.
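The examples above always write width=" back, while the question's edit asks for : and = to be kept as they were. A sketch that preserves both the separator and the optional quote by reusing the captured groups in a replacement function. The helper set_width and its new_width parameter are assumptions for illustration, and \d+ is used in place of \w{3} so that widths of any length match:

```python
import re

# Group 1: the separator (: or =); group 2: an optional opening quote
WIDTH_RE = re.compile(r'width([:=])(["\']?)\d+')

def set_width(text, new_width):
    """Replace the number after width: or width=, keeping separator and quote."""
    def repl(match):
        return 'width%s%s%d' % (match.group(1), match.group(2), new_width)
    return WIDTH_RE.sub(repl, text)

for sample in ['width="500px"', 'width:500px', "width='500px'", 'width=500px']:
    print(set_width(sample, 555))
```

A replacement function sidesteps the ambiguity of writing a backreference directly before digits in a replacement string.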
