This question already has answers here:
Python list comprehension - want to avoid repeated evaluation
(12 answers)
Closed 5 years ago.
I want to transform a string such as following:
' 1 , 2 , , , 3 '
into a list of non-empty elements:
['1', '2', '3']
My solution is this list comprehension:
print [el.strip() for el in mystring.split(",") if el.strip()]
I just wonder: is there a nice, Pythonic way to write this comprehension without calling el.strip() twice?
You can use a generator inside the list comprehension:
[x for x in (el.strip() for el in mystring.split(",")) if x]
#           \___________________ ____________________/
#                               v
#                       internal generator
The generator yields the stripped elements; the outer comprehension iterates over it and only checks truthiness, so el.strip() is called once per element.
You can also use map(..) for this (making it more functional):
[x for x in map(str.strip, mystring.split(",")) if x]
#           \________________ ________________/
#                            v
#                           map
But this is basically the same (although the logic of the generator is - in my opinion - better encapsulated).
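For comparison, here is a minimal sketch of the two variants above next to a third option that is an addition here, not part of the answer: an assignment expression (:=), which assumes Python 3.8 or later. The name mystring follows the question.

```python
mystring = ' 1 , 2 , , , 3 '

# Inner generator: each element is stripped once, then tested.
via_generator = [x for x in (el.strip() for el in mystring.split(",")) if x]

# map() performs the same stripping lazily.
via_map = [x for x in map(str.strip, mystring.split(",")) if x]

# Assumption: Python 3.8+. The walrus operator binds the stripped value
# to x inside the condition, so strip() still runs once per element.
via_walrus = [x for el in mystring.split(",") if (x := el.strip())]

print(via_generator, via_map, via_walrus)
```

All three build the same list; which one reads best is largely a matter of taste.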
As a simple alternative to get a list of non-empty elements (in addition to previous good answers):
import re
s = ' 1 , 2 , , , 3 '
print(re.findall(r'[^\s,]+', s))
The output:
['1', '2', '3']
How about some regex to extract all the numbers from the string
import re
a = ' 1 , 2 , , , 3 '
print(re.findall(r'\d+', a))
Output:
['1', '2', '3']
In just one line of code, that's about as terse as you're going to get. Of course, if you want to get fancy you can try the functional approach:
filter(lambda x: x, map(lambda x: x.strip(), mystring.split(',')))
But this trades readability for terseness.
Go full functional with map and filter by using:
s = ' 1 , 2 , , , 3 '
res = filter(None, map(str.strip, s.split(',')))
Though similar to @omu_negru's answer, this avoids the lambdas, which are arguably pretty ugly and also slow things down.
Passing None to filter means: filter on truthiness, essentially x for x in iterable if x, while map just applies the method str.strip (which strips whitespace by default) to each element of the iterable obtained from s.split(',').
On Python 2, where filter still returns a list, this approach should easily edge out the other approaches in speed.
In Python 3 one would have to use:
res = [*filter(None, map(str.strip, s.split(',')))]
in order to get the list back.
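As a small illustration of why the conversion is needed on Python 3, the sketch below shows that filter returns a one-shot iterator. The names lazy and leftover are only for this example.

```python
s = ' 1 , 2 , , , 3 '

# On Python 3, nothing is computed yet: filter and map are both lazy.
lazy = filter(None, map(str.strip, s.split(',')))

res = list(lazy)       # consumes the iterator and builds the list
leftover = list(lazy)  # the iterator is now exhausted

print(res)       # the non-empty, stripped elements
print(leftover)  # an empty list
```

list(...) and [*...] do the same job here; list(...) is simply the more familiar spelling.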
If you have imported "re", then re.split() will work:
import re
s=' 1 , 2 , , , 3 '
print ([el for el in re.split(r"[, ]+",s) if el])
['1', '2', '3']
If strings separated by only spaces (with no intervening comma) should not be separated, then this will work:
import re
s=' ,,,,, ,,,, 1 , 2 , , , 3,,,,,4 5, 6 '
print ([el for el in re.split(r"\s*,\s*",s.strip()) if el])
['1', '2', '3', '4 5', '6']
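To make the difference between the two patterns concrete, here is a short sketch that runs both on the same input. The variable names are made up for the example.

```python
import re

s = ' 1 , 2 , , , 3,,,,,4 5, 6 '

# Commas and spaces are both separators, so '4 5' is split in two.
split_on_both = [el for el in re.split(r"[, ]+", s) if el]

# Only commas (with optional surrounding whitespace) separate items,
# so '4 5' survives as a single element.
split_on_commas = [el for el in re.split(r"\s*,\s*", s.strip()) if el]

print(split_on_both)
print(split_on_commas)
```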
List comprehensions are wonderful, but it's not illegal to use more than one line of code! You could even - heaven forbid - use a for loop!
result = []
for el in mystring.split(","):
    x = el.strip()
    if x:
        result.append(x)
Here's a two-line version. It's actually the same as the accepted answer by Willem Van Onsem, but with a name given to a subexpression (and a generator changed to a list but it makes essentially no difference for a problem this small). In my view, this makes it a lot easier to read, despite taking fractionally more code.
all_terms = [el.strip() for el in mystring.split(",")]
non_empty_terms = [x for x in all_terms if x]
Some of the other answers are certainly shorter, but I'm not convinced any of them are simpler/easier to understand. Actually, I think the best answer is just the one in your question, because the repetition in this case is quite minor.
TL;DR: How can I best use map to filter a list based on logical indexing?
Given a list:
values = ['1', '2', '3', '5', 'N/A', '5']
I would like to map the following function and use the result to filter my list. I could do this with filter and other methods but mostly looking to learn if this can be done solely using map.
The function:
def is_int(val):
    try:
        x = int(val)
        return True
    except ValueError:
        return False
Attempted solution:
[x for x in list(map(is_int, values)) if x is False]
The above gives me the values I need. However, it does not return the index or allow logical indexing. I have tried to do other ridiculous things like:
[values[x] for x in list(map(is_int, values)) if x is False]
and many others that obviously don't work.
What I thought I could do:
values[[x for x in list(map(is_int, values)) if x is False]]
Expected outcome:
['N/A']
[v for v in values if not is_int(v)]
If you have a parallel list of booleans:
[v for v, b in zip(values, [is_int(x) for x in values]) if not b]
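A minimal sketch of the "parallel list of booleans" idea, assuming the is_int function from the question. The names mask, non_ints and non_int_indices are illustrative only.

```python
def is_int(val):
    try:
        int(val)
        return True
    except ValueError:
        return False

values = ['1', '2', '3', '5', 'N/A', '5']

# map() produces one boolean per value: the "logical index".
mask = list(map(is_int, values))

# zip() pairs each value with its flag, so the mask can drive the selection.
non_ints = [v for v, ok in zip(values, mask) if not ok]

# enumerate() recovers the positions if those are needed as well.
non_int_indices = [i for i, ok in enumerate(mask) if not ok]

print(non_ints, non_int_indices)
```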
You can get the expected outcome with the simple snippet below, which does not involve map at all:
[x for x in values if is_int(x) is False]
And if you strictly want to use map, the snippet below will help:
[values[i] for i, y in enumerate(list(map(is_int, values))) if y is False]
map is just not the right tool for the job, as that would transform the values, whereas you just want to check them. If anything, you are looking for filter, but you have to invert the filter function first:
>>> values = ['1', '2', "foo", '3', '5', 'N/A', '5']
>>> not_an_int = lambda x: not is_int(x)
>>> list(filter(not_an_int, values))
['foo', 'N/A']
In practice, however, I would rather use a list comprehension with a condition.
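As an aside that is not part of the answer above: the standard library's itertools.filterfalse keeps the items for which the predicate is false, so the inverted lambda can be skipped. A short sketch, assuming the is_int function from the question:

```python
from itertools import filterfalse

def is_int(val):
    try:
        int(val)
        return True
    except ValueError:
        return False

values = ['1', '2', 'foo', '3', '5', 'N/A', '5']

# filterfalse yields the items for which is_int returns a falsy value.
not_ints = list(filterfalse(is_int, values))
print(not_ints)
```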
You can do this using a bit of help from itertools and by negating the output of your original function since we want it to return True where it is not an int.
from itertools import compress
from operator import not_
list(compress(values, map(not_, map(is_int, values))))
['N/A']
You cannot use map() alone to drop items. By its very definition, map() preserves the number of items.
Filtering operations, on the other hand, do what you want. In Python these are normally implemented with a generator expression or, for programmers inclined to a more functional style, with filter(). Other, non-primitive approaches may exist, but they essentially boil down to one of the two, e.g.:
values = ['1', '2', '3', '5', 'N/A', '5']
list(filter(lambda x: not is_int(x), values))
# ['N/A']
Yet, if what you want is to use the result of map() as a boolean index for slicing, this cannot be done with plain Python lists.
However, NumPy supports precisely what you want except that the result will not be a list:
import numpy as np
np.array(values)[list(map(lambda x: not is_int(x), values))]
# array(['N/A'], dtype='<U3')
(Or you could have your own container defined in such a way as to implement this behavior).
That being said, it is quite common to use generator expressions in Python in place of map() / filter().
filter(func, items)
is roughly equivalent to:
item for item in items if func(item)
while
map(func, items)
is roughly equivalent to:
func(item) for item in items
and their combination:
filter(f_func, map(m_func, items))
is roughly equivalent to:
m_func(item) for item in items if f_func(m_func(item))
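One detail is easy to get wrong in the combined form: filter sees the mapped values, not the original items. The sketch below checks this with two hypothetical helper functions, m_func and f_func, chosen only so that the two readings give different results.

```python
items = [-2, -1, 0, 1, 2]

def m_func(x):
    # Hypothetical mapping function for the example.
    return x * x - 1

def f_func(x):
    # Hypothetical predicate for the example.
    return x > 0

combined = list(filter(f_func, map(m_func, items)))

# Equivalent comprehension: the predicate is applied to the mapped value.
on_mapped = [m_func(item) for item in items if f_func(m_func(item))]

# Not equivalent: here the predicate is applied to the original item.
on_original = [m_func(item) for item in items if f_func(item)]

print(combined, on_mapped, on_original)
```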
Not exactly what I had in mind, but something I learnt from this problem: we could do the following (which might be computationally less efficient). This is almost the same as @aws_apprentice's answer. Clearly one is better off using filter and/or a list comprehension:
from itertools import compress
list(compress(values, list(map(lambda x: not is_int(x), values))))
Or, as suggested by @aws_apprentice, simply:
from itertools import compress
list(compress(values, map(lambda x: not is_int(x), values)))
I'm trying to extract numbers that are mixed in sentences. I am doing this by splitting the sentence into elements of a list, and then I will iterate through each character of each element to find the numbers. For example:
String = "is2 Thi1s T4est 3a"
LP = String.split()
result = []
for e in LP:
    for i in e:
        if i in ('123456789'):
            result += i
This can give me the result I want, which is ['2', '1', '4', '3']. Now I want to write this as a list comprehension. After reading the post "List comprehension on a nested list?" I understood that the right code should be:
[i for e in LP for i in e if i in ('123456789') ]
My original code for the list comprehension approach was wrong, but I'm trying to wrap my head around the result I get from it.
My original incorrect code, which reversed the order:
[i for i in e for e in LP if i in ('123456789') ]
The result I get from that is:
['3', '3', '3', '3']
Could anyone explain the process that leads to this result please?
Just reverse the same process you found in the other post. Nest the loops in the same order:
for i in e:
    for e in LP:
        if i in ('123456789'):
            print(i)
The code requires both e and LP to be set beforehand, so the outcome you see depends entirely on other code run before your list comprehension.
If we presume that e was set to '3a' (the last element in LP, left over from your code that ran the full loops), then for i in e will run twice, first with i set to '3'. We then get a nested loop, for e in LP, and given your output, LP is 4 elements long. So that iterates 4 times, and on each iteration i == '3', so the if test passes and '3' is added to the output. The next iteration of for i in e: sets i = 'a', and the inner loop runs 4 times again, but now the if test fails.
However, we can't know for certain, because we don't know what code was run last in your environment that set e and LP to begin with.
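The scenario described above can be reproduced with a short sketch. It assumes, as the explanation does, that e was left at '3a' by an earlier loop. It is written for Python 3, where the comprehension's own e does not leak out.

```python
LP = "is2 Thi1s T4est 3a".split()

# Assumption: an earlier loop over LP left e bound to its last element.
e = '3a'

# The outer iterable `e` is looked up before the inner `for e in LP` runs,
# so i takes the values '3' and 'a', each repeated len(LP) times.
reversed_order = [i for i in e for e in LP if i in '123456789']

# Loops written in nesting order: words first, then their characters.
nesting_order = [i for e in LP for i in e if i in '123456789']

print(reversed_order)
print(nesting_order)
```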
I'm not sure why your original code uses str.split(), then iterates over all the characters of each word. Whitespace would never pass your if filter anyway, so you could just loop directly over the full String value. The if test can be replaced with a str.isdigit() test:
digits = [char for char in String if char.isdigit()]
or even a regular expression:
digits = re.findall(r'\d', String)
and finally, if this is a reordering puzzle, you'd want to split out your strings into a number (for ordering) and the remainder (for joining); sort the words on the extracted number, and extract the remainder after sorting:
# to sort on numbers, extract the digits and turn to an integer
sortkey = lambda w: int(re.search(r'\d+', w).group())
# 'is2' -> 2, 'Thi1s' -> 1, etc.
# sort the words by sort key
reordered = sorted(String.split(), key=sortkey)
# -> ['Thi1s', 'is2', '3a', 'T4est']
# replace digits in the words and join again
rejoined = ' '.join(re.sub(r'\d+', '', w) for w in reordered)
# -> 'This is a Test'
From the question you asked in a comment ("how would you proceed to reorder the words using the list that we got as index?"):
We can use custom sorting to accomplish this. (Note that regex is not required, but makes it slightly simpler. Use any method to extract the number out of the string.)
import re
test_string = 'is2 Thi1s T4est 3a'
words = test_string.split()
words.sort(key=lambda s: int(re.search(r'\d+', s).group()))
print(words) # ['Thi1s', 'is2', '3a', 'T4est']
To remove the numbers:
words = [re.sub(r'\d', '', w) for w in words]
Final output is:
['This', 'is', 'a', 'Test']
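The sort-then-strip steps can be wrapped into one helper. This is only a sketch: the function name reorder is made up here, and it assumes every word contains at least one digit (otherwise re.search returns None and the call fails).

```python
import re

def reorder(sentence):
    """Sort words by the number embedded in each, then drop the digits."""
    def position(word):
        # Assumption: every word contains a run of digits.
        return int(re.search(r'\d+', word).group())

    ordered = sorted(sentence.split(), key=position)
    return ' '.join(re.sub(r'\d+', '', word) for word in ordered)

print(reorder('is2 Thi1s T4est 3a'))
```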
I have strings of this shape:
d="M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z"
which are (x, y) coordinates of a pentagon (the first and last letters are metadata and to be ignored). What I want is a list of 2-tuples that would represent the coordinates in floating points without all the cruft:
d = [(997.14282, 452.3622), (877.54125, 539.83678), (757.38907, 453.12006), (802.7325,312.0516), (950.90847, 311.58322)]
Trimming the string was easy:
>>> d.split()[1:-2]
['997.14282,452.3622', '877.54125,539.83678', '757.38907,453.12006', '802.7325,312.0516']
but now I want to create the tuples in a succinct way. This obviously didn't work:
>>> tuple('997.14282,452.3622')
('9', '9', '7', '.', '1', '4', '2', '8', '2', ',', '4', '5', '2', '.', '3', '6', '2', '2')
Taking the original string, I could write something like this:
def coordinates(d):
    list_of_coordinates = []
    d = d.split()[1:-2]
    for elem in d:
        l = elem.split(',')
        list_of_coordinates.append((float(l[0]), float(l[1])))
    return list_of_coordinates
which works fine:
>>> coordinates("M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z")
[(997.14282, 452.3622), (877.54125, 539.83678), (757.38907, 453.12006), (802.7325, 312.0516)]
However this processing is a small and trivial part of a bigger program and I'd rather keep it as short and succinct as possible. Can anyone please show me a less verbose way to convert the string to the list of 2-tuples?
A note, as I'm not sure whether this is intended: when you do d.split()[1:-2], you lose the last coordinate. Assuming that is not intentional, a one-liner for this would be:
def coordinates1(d):
    return [tuple(map(float, coords.split(','))) for coords in d.split()[1:-1]]
If losing the last coordinate is intentional, use [1:-2] in the above code.
You can do this in one line using list comprehension.
x = [tuple(float(j) for j in i.split(",")) for i in d.split()[1:-2]]
This goes through d.split()[1:-2], takes each pair that should be grouped together, splits it on the comma, converts each item to a float, and groups the results in a tuple.
Also, you might want to use d.split()[1:-1] because using -2 cuts out the last pair of coordinates.
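As a sketch of a variant that sidesteps the slice question altogether, the version below keeps only the tokens that contain a comma. This filtering is an assumption added here, not something from the answers above, and the name coordinates is reused from the question.

```python
def coordinates(d):
    # Keep only "x,y" tokens; the lone 'M' and 'Z' markers have no comma.
    return [tuple(map(float, token.split(',')))
            for token in d.split() if ',' in token]

d = ("M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 "
     "802.7325,312.0516 950.90847,311.58322 Z")
points = coordinates(d)
print(points)
```

With this input all five coordinate pairs are returned, including the last one that [1:-2] drops.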
Your version works fine, but it could be compressed somewhat using a list comprehension or some functional tools (I mean map):
def coordinates(d):
    d = d[2:-2].split()                  # drop the "M " and " Z" markers, split into pairs
    d = map(str.split, d, "," * len(d))  # another split, into tokens; the string is
                                         # multiplied to get an iterable of the right size
    d = (map(float, pair) for pair in d) # convert the tokens to floats
    return list(map(tuple, d))           # last map, then build a list
                                         # from the "map object"
Of course it can be reduced to a one-liner with a list comprehension, but readability would suffer too (and the code is already hard to read as it is). And although Guido hates functional programming, I find this more logical... after some practice. Good luck!
Before I asked, I did some googling, and was unable to find an answer.
The scenario I have is this:
A list of numbers are passed to the script, either \n-delimited via a file, or comma-delimited via a command line arg. The numbers can be singular, or in blocks, like so:
File:
1
2
3
7-10
15
20-25
Command Line Arg:
1, 2, 3, 7-10, 15, 20-25
Both end up in the same list[]. I would like to expand the 7-10 or 20-25 blocks (obviously in the actual script these numbers will vary) and append them onto a new list with the final list looking like this:
['1','2','3','7','8','9','10','15','20','21','22','23','24','25']
I understand that something like .append(range(7,10)) could help me here, but I can't seem to be able to find out which elements of the original list[] have the need for expansion.
So, my question is this:
Given a list[]:
['1','2','3','7-10','15','20-25'],
how can I get a list[]:
['1','2','3','7','8','9','10','15','20','21','22','23','24','25']
So let's say you're given the list:
L = ['1','2','3','7-10','15','20-25']
and you want to expand out all the ranges contained therein:
answer = []
for elem in L:
    if '-' not in elem:
        answer.append(elem)
        continue
    start, end = elem.split('-')
    answer.extend(map(str, range(int(start), int(end)+1)))
Of course, there's a handy one-liner for this:
answer = list(itertools.chain.from_iterable([[e] if '-' not in e else map(str, range(*[int(i) for i in e.split('-')]) + [int(i)]) for e in L]))
But this exploits the nature of leaky variables in python2.7, which I don't think will work in python3. Also, it's not exactly the most readable line of code. So I wouldn't really use it in production, if I were you... unless you really hate your manager.
Input:
arg = ['1','2','3','7-10','15','20-25']
Output:
out = []
for s in arg:
    a, b, *_ = map(int, s.split('-') * 2)
    out.extend(map(str, range(a, b+1)))
Or (in Python 2):
out = []
for s in arg:
    r = map(int, s.split('-'))
    out.extend(map(str, range(r[0], r[-1]+1)))
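The same loop can be packaged as a function that also copes with the comma-delimited command-line form from the question. A minimal sketch, assuming non-negative integers only (a leading minus sign would be mistaken for a range separator); the name expand_ranges is hypothetical.

```python
def expand_ranges(items):
    """Expand entries such as '7-10' into '7', '8', '9', '10'."""
    out = []
    for item in items:
        # partition() returns (start, '', '') when there is no '-'.
        start, _, end = item.strip().partition('-')
        end = end or start  # a single number is a range of length one
        out.extend(str(n) for n in range(int(start), int(end) + 1))
    return out

from_file = expand_ranges(['1', '2', '3', '7-10', '15', '20-25'])
from_arg = expand_ranges("1, 2, 3, 7-10, 15, 20-25".split(','))
print(from_file)
```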
Good old map + reduce will come handy:
>>> elements = ['1','2','3','7-10','15','20-25']
>>> reduce(lambda original_list, element_list: original_list + map(str, element_list), [[element] if '-' not in element else range(*map(int, element.split('-'))) for element in elements])
['1', '2', '3', '7', '8', '9', '15', '20', '21', '22', '23', '24']
Well that would do the trick except that you want 20-25 to also contain 25... so here comes even more soup:
reduce(
    lambda original_list, element_list: original_list + map(str, element_list),
    [[element] if '-' not in element
     else range(int(element.split('-')[0]), int(element.split('-')[1]) + 1)
     for element in elements])
Now even though this works, you are probably better off with a plain for loop. That is one reason reduce was moved out of the builtins and into functools in Python 3.
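For reference, a Python 3 rendering of the same idea might look like the sketch below. Here reduce comes from functools, and the helper expand is introduced only for this example so that every chunk is already a list of strings.

```python
from functools import reduce

elements = ['1', '2', '3', '7-10', '15', '20-25']

def expand(element):
    # Hypothetical helper: turn '7-10' into ['7', '8', '9', '10'].
    if '-' not in element:
        return [element]
    start, end = map(int, element.split('-'))
    return [str(n) for n in range(start, end + 1)]

# The explicit [] initialiser keeps reduce well defined for empty input.
result = reduce(lambda acc, chunk: acc + chunk, map(expand, elements), [])
print(result)
```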
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Python and Line Breaks
In Python I know that when you're handling strings you can use "replace" to replace a certain character in a string. For example:
h = "2,3,45,6"
h = h.replace(',','\n')
print h
Returns:
2
3
45
6
Is there any way to do this with a list? For example, replace all the "," in a list with "\n"?
A list like:
h = ["hello","goodbye","how are you"]
And the script should output something like this:
"hello"
"goodbye"
"how are you"
Any suggestions would be helpful!
Looking at your example and what you want, str.join is probably what you are after:
>>> h
['2', '3', '45', '6']
>>> print '\n'.join(str(i) for i in h)
2
3
45
6
similarly for your second example
>>> h = ["hello","goodbye","how are you"]
>>> print '\n'.join(str(i) for i in h)
hello
goodbye
how are you
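The sessions above use the Python 2 print statement. On Python 3 the same idea could be sketched as follows; the variable joined exists only for this example.

```python
h = ["hello", "goodbye", "how are you"]

# Build the newline-separated string first, then print it.
joined = '\n'.join(str(i) for i in h)
print(joined)

# Alternatively, let print() insert the separator itself.
print(*h, sep='\n')
```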
If you really want the quotation marks around strings, you can use the following:
>>> h = ["hello","goodbye","how are you"]
>>> print '\n'.join('"{0}"'.format(i) if isinstance(i,str) else str(i) for i in h)
"hello"
"goodbye"
"how are you"
>>>
You could use list comprehension for that:
>>> search = 'foo'
>>> replace = 'bar'
>>> lst = ['my foo', 'foo', 'bip']
>>> print [x.replace(search, replace) for x in lst]
['my bar', 'bar', 'bip']
In a list like your h = [2,5,6,8,9], there really are no commas to replace in the list itself. The list contains the items 2, 5 and so on, the commas are merely part of the external representation to make it easier to separate the items visually.
So, to generate some output form from the list but without the commas, you can use any number of techniques. For instance, to join them all up into a single string without commas, use:
"".join([str(x) for x in h])
This will evaluate to 25689.
for each in h: print each
In 3.x:
for each in h: print(each)
A list is simply a representation of data. You can only affect the way it looks in the output.
You can replace the ',' in the string because the ',' is part of the string itself but you cannot replace the ',' in a list because it is not an item in the list rather it is what is used by python for delineating different items in such a list together with the opening and closing square brackets. It is just like asking if you could replace the '"' used in creating the string. On the other hand if the ',' is an item in the list and you want to replace it with a newline item then you could use list comprehensions like:
['\n' if x=="," else x for x in yourlist]
or if you want to print each item on a single line you could use:
for item in yourlist:
    print item