I am reading an article about removing duplicate elements from a list in Python.
There is a function defined as:
def f8(seq):  # Dave Kirby
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]
However, I don't really understand the syntax of
[x for x in seq if x not in seen and not seen.add(x)]
What is this syntax? How do I read it?
Thank you.
Firstly, list comprehensions are usually easy to read. Here is a simple example:
[x for x in seq if x != 2]
translates to:
result = []
for x in seq:
    if x != 2:
        result.append(x)
The reason you can't read this code is that it is unreadable, hacky code, as I stated in this question:
def f8(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]
translates to:
def f8(seq):
    seen = set()
    result = []
    for x in seq:
        if x not in seen and not seen.add(x):  # not seen.add(...) is always True
            result.append(x)
    return result
It relies on the fact that set.add is an in-place method that always returns None, so not None evaluates to True.
>>> s = set()
>>> y = s.add(1) # methods usually return None
>>> print s, y
set([1]) None
The reason why the code has been written this way is to sneakily take advantage of Python's list comprehension speed optimizations.
Python methods usually return None if they modify the data structure in place (pop is one of the exceptions).
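For example (a quick interactive check, in the same Python 2 style as above):
>>> lst = [3, 1, 2]
>>> print lst.sort()      # sort() mutates the list in place and returns None
None
>>> print lst.pop()       # pop() also mutates, but returns the removed item
3
>>> lst
[1, 2]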
I also noted that the currently accepted way of doing this (Python 2.7+), which is more readable and doesn't rely on a hack, is as follows:
>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]
Dictionary keys must be unique, therefore the duplicates are filtered out.
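On Python 3.7 and later, where plain dicts are guaranteed to preserve insertion order, the same idea works without OrderedDict; a minimal sketch:
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(dict.fromkeys(items))
[1, 2, 0, 3]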
It is called a list comprehension. List comprehensions provide a syntactically more compact and often more efficient way of writing a normal for-loop based solution.
def f8(seq):  # Dave Kirby
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]
The above list comprehension is roughly equivalent to:
def f8(seq):
    seen = set()
    lis = []
    for x in seq:
        if x not in seen:
            lis.append(x)
            seen.add(x)
    return lis
The construct is called a list comprehension:
[x for x in seq if some_condition]. In this case the condition is that x isn't already in the resulting list. You can't inspect the result of a list comprehension while it is being built, so the code keeps track of the items already in it using a set called seen.
The condition here is a bit tricky because it relies on a side effect:
x not in seen and not seen.add(x)
seen.add() always returns None. If the item is in seen,
x not in seen is False, so the and short-circuits.
If the item is not in seen,
x not in seen is True and not seen.add(x) is also True, so the item is included and, as a side effect, it is added to the seen set.
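A tiny illustration of that short-circuiting (interactive session, values chosen only for demonstration):
>>> seen = set()
>>> 1 not in seen and not seen.add(1)   # 1 is new: both sides run, so 1 gets added
True
>>> seen
set([1])
>>> 1 not in seen and not seen.add(1)   # 1 already seen: left side is False, add() never runs
False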
While this type of thing can be fun, it's not a particularly clear way to express the intent.
I think the less tricky version is much more readable:
def f8(seq):
    seen = set()
    result = []
    for x in seq:
        if x not in seen:
            result.append(x)
            seen.add(x)
    return result
If I have a list in Python with a lot of numbers, for example:
list = [1,2,3,12,4,5,23,5,6,56,8,57,8678,345,234,13, .....]
so a list of integers that can all have different values (in my case this list could be very long, and is actually not a list of integers but a list of objects, all of the same type, which have an integer attribute age):
what is the most efficient way to increase all those values by 1?
Is it
new_list = [x+1 for x in list]
or maybe
new_list = []
for x in list:
    new_list.append(x+1)
(I guess it makes no difference.)
Or is there maybe a more efficient way, or another data structure, to make this more performant?
I really wonder whether there is some method, maybe using something different than a list, with which I can do this operation more efficiently, because the +1 seems so simple ...
If you actually have a list of objects with an integer attribute, as you say, you can update the attribute in-place:
class A(object):
    def __init__(self, x):
        self.x = x

my_list = [A(2), A(5), A(1)]
for obj in my_list:
    obj.x += 1
assert my_list[0].x == 3
assert my_list[1].x == 6
assert my_list[2].x == 2
There is no single fastest way, but I think this is an interesting attempt, since you never use more RAM than actually needed (you delete the entries once you have copied them):
list1 = []
while len(list) != 0:
    list1.append(list.pop(0) + 1)
After receiving a comment I did some testing and found the results very interesting!
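If you want to run such a comparison yourself, a timeit sketch along these lines is one way to do it (the list size and repetition count are arbitrary, chosen only for illustration):
import timeit

setup = "data = list(range(100000))"

# list comprehension
print(timeit.timeit("[x + 1 for x in data]", setup=setup, number=100))
# map (wrapped in list() so Python 3's lazy map is actually evaluated)
print(timeit.timeit("list(map(lambda x: x + 1, data))", setup=setup, number=100))
# plain for loop with append
loop = "new = []\nfor x in data:\n    new.append(x + 1)"
print(timeit.timeit(loop, setup=setup, number=100))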
Try using map. It may be fast due to its built-in implementation:
t = [1, 2, 3, 4, 5, 6, 7, 8]
print (map(lambda x:x + 1, t))
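Note that on Python 3, map() returns a lazy iterator rather than a list, so to actually see the values you would wrap the call in list(); a small sketch:
t = [1, 2, 3, 4, 5, 6, 7, 8]
print(list(map(lambda x: x + 1, t)))   # [2, 3, 4, 5, 6, 7, 8, 9]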
I am implementing a few list methods manually, like append(), insert(), etc. I was trying to add an element at the end of a list (like the append method). This is the working code I am using:
arr = [4,5,6]
def push(x, item):
    x += [item]
    return x

push(arr,7)
print(arr)  # Output: [4,5,6,7]
But when I implement the same code with a small difference, I get different output:
arr = [4,5,6]
def push(x, item):
    x = x + [item]
    return x

push(arr,7)
print(arr)  # Output: [4,5,6]
And I am facing the same issue with the insert method. Here is the code for it:
arr = [4,5,7,8]
def insert(x, index, item):
    x = x[:index] + [item] + x[index:]
    return x

insert(arr,2,6)
print(arr)  # Output: [4,5,7,8]
I know I can store the return value back into the list with arr = insert(arr,2,6), but I want an alternative solution where the list is automatically updated after calling the function, as in my first code sample.
Edit 1:
I think x[index:index] = [item] is a better solution for the problem.
x += [item] and x = x + [item] are not just a little different. In the first case, you are asking to make a change to the list referenced by x; this is why the result reflects the change. In the second, you are asking to have x reference a new list, the one made by combining x's original value and [item]. Note that this does not change the original list, which is why your result is unchanged.
Also note that your return statements are irrelevant, since the values being returned are ignored.
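One way to see the difference is to compare object identities before and after (a quick illustrative check):
>>> x = [4, 5, 6]
>>> before = id(x)
>>> x += [7]             # in-place: still the same list object
>>> id(x) == before
True
>>> x = x + [8]          # rebinding: x now names a brand new list
>>> id(x) == before
False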
In your first example you mutated (i.e. changed) the list object referred to by x. When Python sees x += [item], it translates it to:
x = x.__iadd__([item])
As you can see, we are mutating the list object referred to by x by calling its overloaded in-place operator method __iadd__. As already said, __iadd__() mutates the existing list object (and returns it, so x is re-bound to the same list):
>>> lst = [1, 2]
>>> lst.__iadd__([3])
[1, 2, 3]
>>> lst
[1, 2, 3]
>>>
In your second example, you asked Python to bind x to a new object. The name x now refers to a new list made by combining (not mutating) the x and [item] lists. Thus, the original list was never changed.
When Python sees x = x + [item], it can be translated to:
x = x.__add__([item])
The __add__ method of lists does not mutate the existing list object. Rather, it returns a new list made by combining the value of the existing list and the argument passed into __add__():
>>> lst = [1, 2]
>>> lst.__add__([3]) # lst is not changed. A new list is returned.
[1, 2, 3]
>>>
You need to assign the result returned by your version of push back to arr. The same goes for insert.
You can assign to a slice of the list to implement your insert without using list.insert:
def insert(x, index, item):
    x[:] = x[:index] + [item] + x[index:]
This replaces the contents of the object referenced by x with the new list. There is no need to return it, since the change is performed in place.
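A quick check of that behaviour, reusing the insert() defined just above with the example list from the question:
arr = [4, 5, 7, 8]
insert(arr, 2, 6)
print(arr)  # [4, 5, 6, 7, 8]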
The problem is that you haven't captured the result you return. Some operations (such as +=) modify the original list in place; others (such as x = x + [item]) evaluate a new list and reassign the local variable x.
In particular, note that x is not bound to arr; x is merely a local variable. To get the returned value into arr, you have to assign it:
arr = push(arr, 7)
or
arr = insert(arr, 2, 6)
class DerivedList(list):
    def insertAtLastLocation(self, obj):
        self.__iadd__([obj])

parameter = [1, 1, 1]
lst = DerivedList(parameter)
print(lst)  # output: [1, 1, 1]
lst.insertAtLastLocation(5)
print(lst)  # output: [1, 1, 1, 5]
lst.insertAtLastLocation(6)
print(lst)  # output: [1, 1, 1, 5, 6]
You can use this code to add one element at the last position of a list.
class DerivedList(list):
    def insertAtLastLocation(self, *obj):
        self.__iadd__([*obj])

parameter = [1, 1, 1]
lst = DerivedList(parameter)
print(lst)  # output: [1, 1, 1]
lst.insertAtLastLocation(5)
print(lst)  # output: [1, 1, 1, 5]
lst.insertAtLastLocation(6, 7)
print(lst)  # output: [1, 1, 1, 5, 6, 7]
lst.insertAtLastLocation(6, 7, 8, 9, 10)
print(lst)  # output: [1, 1, 1, 5, 6, 7, 8, 9, 10]
This code can add multiple items at the last position.
I am refreshing my Python (2.7) and I am discovering iterators and generators.
As I understand it, they are an efficient way of iterating over values without consuming too much memory.
The following code does some kind of logical indexing on a list:
removing the values of a list L that trigger a False conditional statement, represented here by the function f.
I am not satisfied with my code because I feel it is not optimal, for three reasons:
I read somewhere that it is better to use a for loop than a while loop. However, in the usual for i in range(10), I can't modify the value of i because the iteration doesn't seem to care.
Logical indexing is pretty strong in matrix-oriented languages, and there should be a way to do the same in Python (by hand, granted, but maybe better than my code).
The third reason is simply that I want to use a generator/iterator on this example to help me understand them.
TL;DR: Is this code a good, Pythonic way to do logical indexing?
# f: string -> bool
def f(s):
    return 'c' in s

L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']  # example

length = len(L)
i = 0
while i < length:
    if not f(L[i]):  # f is a conditional statement (string in, bool out)
        del L[i]
        length -= 1  # cut and push leftwise
    else:
        i += 1

print 'Updated list is :', L
print length
This code has a few problems, but the main one is that you must never modify a list you're iterating over. Rather, you create a new list from the elements that match your condition. This can be done simply in a for loop:
newlist = []
for item in L:
    if f(item):
        newlist.append(item)
which can be shortened to a simple list comprehension:
newlist = [item for item in L if f(item)]
It looks like filter() is what you're after:
newlist = filter(f, L)
filter() filters (...) an iterable and only keeps the items for which a predicate returns True. In your case f(...) is already that predicate: your loop deletes an item precisely when not f(item), so filter(f, L) keeps the same items your loop keeps.
If you wanted the opposite, dropping the items for which f returns True, you would invert the condition:
def f(s):
    return 'c' not in s

newlist = filter(f, L)
See: https://docs.python.org/2/library/functions.html#filter
Never modify a list with del, pop or other methods that mutate the length of the list while iterating over it. Read this for more information.
The "pythonic" way to filter a list is to use reassignment and either a list comprehension or the built-in filter function:
List comprehension:
>>> [item for item in L if f(item)]
['abc', 'abcd', 'abcde']
I want to use a generator/iterator on this example to help me understand
The for item in L part is implicitly making use of the iterator protocol. Python lists are iterable, and iter(somelist) returns an iterator.
>>> from collections import Iterable, Iterator
>>> isinstance([], Iterable)
True
>>> isinstance([], Iterator)
False
>>> isinstance(iter([]), Iterator)
True
__iter__ is not only being called when using a traditional for-loop, but also when you use a list comprehension:
>>> class mylist(list):
... def __iter__(self):
... print('iter has been called')
... return super(mylist, self).__iter__()
...
>>> m = mylist([1,2,3])
>>> [x for x in m]
iter has been called
[1, 2, 3]
Filtering:
>>> filter(f, L)
['abc', 'abcd', 'abcde']
In Python3, use list(filter(f, L)) to get a list.
Of course, to filter a list, Python needs to iterate over it, too:
>>> filter(None, mylist())
iter has been called
[]
"The python way" to do it would be to use a generator expression:
# list comprehension
L = [l for l in L if f(l)]
# alternative generator comprehension
L = (l for l in L if f(l))
It depends on your context if a list or a generator is "better" (see e.g. this so question). Because your source data is coming from a list, there is no real benefit of using a generator here.
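For completeness, a small sketch of the difference: a generator expression produces values lazily, one at a time, instead of building the whole list up front (using the f from the question, which tests for 'c'):
>>> L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']
>>> gen = (l for l in L if f(l))
>>> next(gen)        # nothing is computed until a value is requested
'abc'
>>> list(gen)        # consume the rest
['abcd', 'abcde']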
For simply deleting elements, especially if the original list is no longer needed, just iterate backwards:
Python 2.x:
for i in xrange(len(L) - 1, -1, -1):
    if not f(L[i]):
        del L[i]
Python 3.x:
for i in range(len(L) - 1, -1, -1):
    if not f(L[i]):
        del L[i]
By iterating from the end, the "next" index does not change after a deletion, so a for loop is possible. Note that you should use the lazy xrange object in Python 2, or range in Python 3, to save memory*.
In cases where you must iterate forward, use your given solution above.
*Note that Python 2's xrange will break if there are >= 2 ** 32 - 1 elements. Python 3's range, as well as Python 2's less efficient range, does not have this limitation.
Groovy has a very handy operator, ?., which checks that the object is not null and, if it is not, accesses a method or a property. Can I do the same thing in Python?
The closest I have found is the ternary conditional operator. Right now I am doing
l = u.find('loc')
l = l.string if l else None
whereas it would be nice to write
l = u.find('loc')?.string
Update: in addition to getattr mentioned below, I found a relatively nice way to do it with a list:
[x.string if x else None for x in [u.find('loc'), u.find('priority'), ...]]
Another alternative, if you want to exclude None:
[x.string for x in [u.find('loc'), u.find('priority'), ...] if x]
You could write something like this:
L = L and L.string
It is important to note that, as in your ternary example, this will take the "else" branch for any falsy value of L.
If you need to check specifically for None, it's clearer to write:
if L is not None:
    L = L.string
or, for the any-falsy version:
if L:
    L = L.string
I think using getattr is kind of awkward for this, too:
L = getattr(L, 'string', None)
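It does handle the None case in one line, though; a minimal illustration, with l standing in for the result of u.find('loc'):
>>> l = None                          # as if u.find('loc') found nothing
>>> print getattr(l, 'string', None)  # no AttributeError, just the fallback
None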
I created a line that appends an object to a list in the following manner
>>> foo = list()
>>> def sum(a, b):
... c = a+b; return c
...
>>> bar_list = [9,8,7,6,5,4,3,2,1,0]
>>> [foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
[None, None, None, None, None, None, None, None, None, None]
>>> foo
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>>
The line
[foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
would give a pylint W1060 Expression is assigned to nothing, but since I am already using the foo list to append the values, I don't need to assign the result of the list comprehension to anything.
My question is more a matter of programming correctness:
Should I drop the list comprehension and just use a simple for loop?
>>> for i, x in enumerate(bar_list):
... foo.append(sum(i,x))
Or is there a correct way to use a list comprehension and assign it to nothing?
Answer
Thank you #user2387370, #kindall and #Martijn Pieters. To the rest of the comments: I use append because I'm not using a list(), and I'm not using i+x because this is just a simplified example.
I left it as the following:
histogramsCtr = hist_impl.HistogramsContainer()
for index, tupl in enumerate(local_ranges_per_histogram_list):
    histogramsCtr.append(doSubHistogramData(index, tupl))
return histogramsCtr
Yes, this is bad style. A list comprehension is for building a list. You're building a list full of Nones and then throwing it away; your actual desired result is merely a side effect of this effort.
Why not define foo using the list comprehension in the first place?
foo = [sum(i,x) for i, x in enumerate(bar_list)]
If it is not to be a list but some other container class, as you mentioned in a comment on another answer, write that class to accept an iterable in its constructor (or, if it's not your code, subclass it to do so), then pass it a generator expression:
foo = MyContainer(sum(i, x) for i, x in enumerate(bar_list))
If foo already has some value and you wish to append new items:
foo.extend(sum(i,x) for i, x in enumerate(bar_list))
If you really want to use append() and don't want to use a for loop for some reason then you can use this construction; the generator expression will at least avoid wasting memory and CPU cycles on a list you don't want:
any(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
But this is a good deal less clear than a regular for loop, and there's still some extra work being done: any is testing the return value of foo.append() on each iteration. You can write a function to consume the iterator and eliminate that check; the fastest way uses a zero-length collections.deque:
from collections import deque
do = deque([], maxlen=0).extend
do(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
This is actually fairly readable, but I believe it's not actually any faster than any() and requires an extra import. However, either do() or any() is a little faster than a for loop, if that is a concern.
I think it's generally frowned upon to use list comprehensions just for side-effects, so I would say a for loop is better in this case.
But in any case, couldn't you just do foo = [sum(i,x) for i, x in enumerate(bar_list)]?
You should definitely drop the list comprehension. End of.
You are confusing anyone reading your code. You are building a list for the side-effects.
You are paying CPU cycles and memory for building a list you are discarding again.
In your simplified case, you are overlooking the fact you could have used a list comprehension directly:
[sum(i,x) for i, x in enumerate(bar_list)]