python re.compile() and re.findall() - python

I'm trying to print only the month. When I use:
regex = r'([a-z]+) \d+'
re.findall(regex, 'june 15')
it returns ['june'].
But when I try to do the same for a list like this:
regex = re.compile(r'([a-z]+) \d+')
l = ['june 15', 'march 10', 'july 4']
filter(regex.findall, l)
I get the same list back, as if the pattern were ignored and the numbers were kept.

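Use map instead of filter. filter only decides whether to keep each original element (here, whether findall returned a non-empty list); it never replaces the element with the match. For example: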
import re
a = ['june 15', 'march 10', 'july 4']
regex = re.compile(r'([a-z]+) \d+')
# Or with a list comprehension
# output = [regex.findall(k) for k in a]
output = list(map(lambda x: regex.findall(x), a))
print(output)
Output:
[['june'], ['march'], ['july']]
Bonus:
In order to flatten the list of lists you can do:
output = [elm for k in a for elm in regex.findall(k)]
# Or:
# output = list(elm for k in map(lambda x: regex.findall(x), a) for elm in k)
print(output)
Output:
['june', 'march', 'july']
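To see the difference between the two built-ins side by side, here is a small sketch. The extra item 'no digits here' is not from the question; it is added only to show an element that filter actually drops.

```python
import re

regex = re.compile(r'([a-z]+) \d+')
# 'no digits here' is an illustrative extra item, not part of the question
items = ['june 15', 'march 10', 'july 4', 'no digits here']

# filter() calls regex.findall on each item and keeps the ORIGINAL item
# whenever the returned list is non-empty (truthy).
kept = list(filter(regex.findall, items))

# A comprehension (or map) keeps the RETURN VALUE of findall instead.
mapped = [regex.findall(s) for s in items]

print(kept)    # ['june 15', 'march 10', 'july 4']
print(mapped)  # [['june'], ['march'], ['july'], []]
```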

Related

How do I remove a specific string from some strings in a list?

I've been trying to figure out a more universal fix for my code and having a hard time with it. This is what I have:
lst = ['Thursday, June ##', 'some string', 'another string', 'etc', 'Friday, June ##', 'more strings', 'etc']
I'm trying to remove everything after the comma in the strings that contain commas (which can only be the day of the week strings).
My current fix that works is:
new_lst = [x[:-9] if ',' in x else x for x in lst]
But this won't work for every month since they're not always going to be a 4 letter string ('June'). I've tried splitting at the commas and then removing any string that starts with a space but it wasn't working properly so I'm not sure what I'm doing wrong.
We can use a list comprehension along with split() here:
lst = ['Thursday, June ##', 'some string', 'another string', 'etc', 'Friday, June ##', 'more strings', 'etc']
output = [x.split(',', 1)[0] for x in lst]
print(output)
# ['Thursday', 'some string', 'another string', 'etc', 'Friday', 'more strings', 'etc']
With regex:
>>> import re
>>> lst = [re.sub(r',.*', '', x) for x in lst]
>>> lst
['Thursday', 'some string', 'another string', 'etc', 'Friday', 'more strings', 'etc']
However, this is slower than the split answer.
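If you want to check the speed claim on your own machine, a rough timeit sketch could look like this. The helper names with_split and with_regex are made up for the comparison, and the timings will vary with Python version and input, so no numbers are given here.

```python
import re
import timeit

lst = ['Thursday, June ##', 'some string', 'another string', 'etc',
       'Friday, June ##', 'more strings', 'etc']

def with_split(items):
    # Keep everything before the first comma (or the whole string if none)
    return [x.split(',', 1)[0] for x in items]

def with_regex(items):
    # Delete the first comma and everything after it
    return [re.sub(r',.*', '', x) for x in items]

# Time 1000 runs of each approach on the same input
split_time = timeit.timeit(lambda: with_split(lst), number=1000)
regex_time = timeit.timeit(lambda: with_regex(lst), number=1000)
print(f"split: {split_time:.4f}s  regex: {regex_time:.4f}s")
```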
You can use re.search in the following way:
import re
lst = ['Thursday, June ##', 'some string', 'another string', 'etc', 'Friday, June ##', 'more strings', 'etc']
for i, msg in enumerate(lst):
    match = re.search(",", msg)
    if match is not None:
        # Keep only the part before the first comma
        lst[i] = msg[:match.span()[0]]
print(lst)
Output:
['Thursday', 'some string', 'another string', 'etc', 'Friday', 'more strings', 'etc']

parse parenthesized numbers to negative numbers

How can I parse parenthesized numbers in a list of strings into negative numbers (or strings with a negative sign)?
Example input:
list1= ['abcd','(1,234)','Level-2 (2):','(31)%', 'others','(3,102.2)%']
Expected output:
['abcd',-1234,'Level-2 (2):','-31%', 'others','-3102.2%']
Only strings that consist entirely of a number in parentheses (optionally containing commas or a dot), optionally followed by a percent sign (%), should be parsed. Other strings, such as 'Level-2 (2):', should be left unchanged.
I have tried
translator = str.maketrans(dict.fromkeys('(),'))
['-'+(x.translate(translator)) for x in list1]
but every element gets a - prepended:
['-abcd', '-1234', '-Level-2 2:', '-31%', '-others', '-3102.2%']
You can try using re.sub, e.g. (note that this keeps the commas inside the number):
import re
list1 = ['abcd','(1,234)','Level-2 (2):','(31)%', 'others','(3,102.2)%']
res = [re.sub(r'^\(([\d.,]+)\)(%?)$', r'-\1\2', el) for el in list1]
# ['abcd', '-1,234', 'Level-2 (2):', '-31%', 'others', '-3,102.2%']
Try using re.match
Ex:
import re
list1= ['abcd','(1,234)','Level-2 (2):','(31)%', 'others','(31.2)%']
result = []
for i in list1:
    m = re.match(r"\((\d+[.,]?\d*)\)(%?)", i)
    if m:
        result.append("-" + m.group(1)+m.group(2))
    else:
        result.append(i)
print(result)
Output:
['abcd', '-1,234', 'Level-2 (2):', '-31%', 'others', '-31.2%']
Update as per comment
import re
list1 = ['abcd','(1,234)','Level-2 (2):','(31)%', 'others','(3,102.2)%']
result = []
for i in list1:
    m = re.match(r"\((\d+(?:,\d+)*(?:\.\d+)?)\)(%?)", i)
    if m:
        result.append("-" + m.group(1).replace(",", "")+m.group(2))
    else:
        result.append(i)
print(result)
Output:
['abcd', '-1234', 'Level-2 (2):', '-31%', 'others', '-3102.2%']
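If you do want real numbers for the plain parenthesized values, as in the expected output from the question, one possible sketch follows. The names PAREN_NUMBER and parse_item are hypothetical, and it assumes percentages should stay strings while everything else becomes an int or float.

```python
import re

# Hypothetical helper names; the pattern is the one from the update above.
PAREN_NUMBER = re.compile(r"\((\d+(?:,\d+)*(?:\.\d+)?)\)(%?)")

def parse_item(item):
    # fullmatch() requires the WHOLE string to fit the pattern
    m = PAREN_NUMBER.fullmatch(item)
    if m is None:
        return item  # not a parenthesized number: leave unchanged
    digits = m.group(1).replace(",", "")
    if m.group(2):
        # Assumption: percentages stay strings, e.g. '-31%'
        return "-" + digits + "%"
    # Plain numbers become int or float depending on the decimal point
    return -float(digits) if "." in digits else -int(digits)

list1 = ['abcd', '(1,234)', 'Level-2 (2):', '(31)%', 'others', '(3,102.2)%']
result = [parse_item(x) for x in list1]
print(result)
# ['abcd', -1234, 'Level-2 (2):', '-31%', 'others', '-3102.2%']
```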
If you do not need to convert the value to int or float, re.match and str.translate should do the trick:
import re
rx = re.compile(r'\([\d,.]+\)%?$')
tab = str.maketrans({i: None for i in '(),'})
output = ['-' + i.translate(tab) if rx.match(i) else i for i in list1]
It gives:
['abcd', '-1234', 'Level-2 (2):', '-31%', 'others', '-3102.2%']
for idx, item in enumerate(list1):
    list1[idx] = '-' + item.replace('(','').replace(')','').replace(',','')
print(list1)
Output (note that every element gets a '-' prefix, including the ones that should stay unchanged):
['-abcd', '-1234', '-Level-2 2:', '-31%', '-others', '-3102.2%']
or just:
list1= ['abcd','(1,234)','Level-2 (2):','(31)%', 'others','(3,102.2)%']
print (['-' + item.replace('(','').replace(')','').replace(',','') for item in list1])
output:
['-abcd', '-1234', '-Level-2 2:', '-31%', '-others', '-3102.2%']

Convert each item of list of tuples to string

How to convert
[('a',1),('b',3)]
to
[('a','1'),('b','3')]
My end goal is to get:
['a 1','b 3']
I tried:
[' '.join(col).strip() for col in [('a',1),('b',3)]]
and
[' '.join(str(col)).strip() for col in [('a',1),('b',3)]]
This ought to do it:
>>> x = [('a',1),('b',3)]
>>> [' '.join(str(y) for y in pair) for pair in x]
['a 1', 'b 3']
If you want to avoid the list comprehensions in the answer above (in Python 3, map returns an iterator, so wrap it in list to get a list):
mylist = [('a',1),('b',3)]
list(map(lambda xs: ' '.join(map(str, xs)), mylist))
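For completeness, here is a sketch that also produces the intermediate list of string tuples the question asks about. It also shows why the two attempts fail: ' '.join(col) raises a TypeError because 1 is not a string, and ' '.join(str(col)) joins the individual characters of the tuple's repr. The variable names are illustrative.

```python
pairs = [('a', 1), ('b', 3)]

# Step 1: convert every item of every tuple to a string
as_strings = [tuple(str(item) for item in pair) for pair in pairs]
print(as_strings)  # [('a', '1'), ('b', '3')]

# Step 2: join each tuple of strings with a space
joined = [' '.join(pair) for pair in as_strings]
print(joined)      # ['a 1', 'b 3']
```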

combine list of lists in python (similar to string.join but as a list comprehension?) [duplicate]

This question already has answers here:
How do I make a flat list out of a list of lists?
If you had a long list of lists in the format [['A',1,2],['B',3,4]] and you wanted to combine it into ['A, 1, 2', 'B, 3, 4'], is there an easy list comprehension way to do so?
I do it like this:
this_list = [['A',1,2],['B',3,4]]
final = list()
for x in this_list:
    final.append(', '.join([str(item) for item in x]))
But is this possible to be done as a one-liner?
Thanks for the answers. I like the map()-based one. I have a follow-up question: if the sublists were instead of the format ['A',0.111,0.123456], would it be possible to include string formatting in the list comprehension to truncate the values, so as to get 'A, 0.1, 0.12'?
Once again with my ugly code it would be like:
this_list = [['A',0.111,0.12345],['B',0.1,0.2]]
final = list()
for x in this_list:
    x = '{}, {:.1f}, {:.2f}'.format(x[0], x[1], x[2])
    final.append(x)
I solved my own question:
values = ['{}, {:.2f}, {:.3f}'.format(c,i,f) for c,i,f in values]
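As a runnable sketch of that one-liner, applied to the this_list from the loop version and using its .1f/.2f precisions (the unpacking names label, first and second are illustrative):

```python
this_list = [['A', 0.111, 0.12345], ['B', 0.1, 0.2]]

# Unpack each three-element sublist and format it in a single expression
final = ['{}, {:.1f}, {:.2f}'.format(label, first, second)
         for label, first, second in this_list]
print(final)  # ['A, 0.1, 0.12', 'B, 0.1, 0.20']
```

This assumes every sublist has exactly three elements; a sublist of another length would raise a ValueError during unpacking.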
>>> lis = [['A',1,2],['B',3,4]]
>>> [', '.join(map(str, x)) for x in lis ]
['A, 1, 2', 'B, 3, 4']
You can use nested list comprehensions with str.join:
>>> lst = [['A',1,2],['B',3,4]]
>>> [", ".join([str(y) for y in x]) for x in lst]
['A, 1, 2', 'B, 3, 4']
li = [['A',1,2],['B',3,4],['A',0.111,0.123456]]
print([', '.join(map(str, sli)) for sli in li])

def func(x):
    # Render integers as-is, floats with two decimals, anything else via str()
    try:
        return str(int(str(x)))
    except ValueError:
        try:
            return '%.2f' % float(str(x))
        except ValueError:
            return str(x)

print(list(map(lambda subli: ', '.join(map(func, subli)), li)))
This prints:
['A, 1, 2', 'B, 3, 4', 'A, 0.111, 0.123456']
['A, 1, 2', 'B, 3, 4', 'A, 0.11, 0.12']

Pythonic way of sorting a list

For example, I have a list:
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
and I need to sort it by the number, so that the result is:
['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
Is it possible to do this with split() somehow?
Use a key function:
lists.sort(key=lambda s: int(s.rsplit(None, 1)[-1]))
The key callable is passed each and every element in lists and that element is sorted according to the return value. In this case we split once on whitespace, starting on the right; take the last element of the split; and turn that into an integer.
The argument to key can be any callable, but a lambda is just more compact. You can try it out in the command prompt:
>>> key_function = lambda s: int(s.rsplit(None, 1)[-1])
>>> key_function('ben 10')
10
>>> key_function('Guido van Rossum 42')
42
In effect, when sorting the values are augmented with the return value of that function, and what is sorted is:
[(20, 0, 'jack 20'), (10, 1, 'ben 10'), (50, 2, 'alisdar 50'), (35, 3, 'ollie 35')]
instead (with the second value, the index of the element, added to keep the original order in case of equal sort keys).
Result:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> lists.sort(key=lambda s: int(s.rsplit(None, 1)[-1]))
>>> lists
['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
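The augmentation described above can be written out by hand as a sketch. This is only a conceptual model of what key= does, not how the sort is implemented internally; the name key_function matches the interactive example, while decorated is illustrative.

```python
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']

def key_function(s):
    # Split once from the right and convert the last piece to an integer
    return int(s.rsplit(None, 1)[-1])

# Decorate: pair each element with its sort key and original index
decorated = [(key_function(s), i, s) for i, s in enumerate(lists)]
# Sort the tuples: key first, then index, so equal keys keep their order
decorated.sort()
# Undecorate: drop the key and index again
result = [s for _, _, s in decorated]

print(result)  # ['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
```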
Use a key function that does what you want:
lists.sort(key=lambda e: int(e.split()[1]))
If some of your items don't follow that format, you'll have to write something a little more elaborate.
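One possible shape for such a "more elaborate" key is sketched below. The policy of pushing malformed items to the end is an assumption, not something from the question, and the name sort_key and the sample entries are made up.

```python
def sort_key(entry):
    # Hypothetical policy: well-formed items sort by number,
    # malformed items go to the end in their original order.
    parts = entry.rsplit(None, 1)
    try:
        return (0, int(parts[-1]))
    except (ValueError, IndexError):
        # ValueError: last piece is not a number; IndexError: empty string
        return (1, 0)

entries = ['jack 20', 'ben 10', 'unknown', 'alisdar 50', '']
entries.sort(key=sort_key)
print(entries)  # ['ben 10', 'jack 20', 'alisdar 50', 'unknown', '']
```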
It would be better if you had a more appropriate data type than a string to represent, say, a person's name and age. One way would be a dictionary:
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
d = dict(item.split(' ') for item in lists)
This constructs a dictionary from a stream of two-element lists.
Then you can sort like this:
print(sorted((v, k) for k, v in d.items()))
and get this:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> d = dict(item.split(' ') for item in lists)
>>> print(sorted((v, k) for k, v in d.items()))
[('10', 'ben'), ('20', 'jack'), ('35', 'ollie'), ('50', 'alisdar')]
Here the ages are still strings, so they compare lexicographically ('100' would sort before '20'). To compare them as numbers, convert the age to an integer:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> person_iter = (item.split(' ') for item in lists)
>>> d = {k: int(v) for k, v in person_iter}
>>> print(sorted((v, k) for k, v in d.items()))
[(10, 'ben'), (20, 'jack'), (35, 'ollie'), (50, 'alisdar')]
person_iter is a generator that produces pairs of name-age. You feed that to the dictionary comprehension and convert the second argument to an integer.
The basic idea, though, is that you will have an easier time if you use more precise data types for your purposes.
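As one way to take that advice further than a dictionary, here is a sketch using typing.NamedTuple. The Person class and its field names are illustrative and not part of the answer above.

```python
from typing import NamedTuple

class Person(NamedTuple):
    # Illustrative record type: a name plus an integer age
    name: str
    age: int

lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']

# Parse each 'name age' string once, converting the age to an int
people = [Person(name, int(age))
          for name, age in (item.split(' ') for item in lists)]

# Sorting is now a plain attribute lookup
by_age = sorted(people, key=lambda p: p.age)
print([p.name for p in by_age])  # ['ben', 'jack', 'ollie', 'alisdar']
```

Unlike the dictionary, this keeps two people with the same name as separate records.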
