I have a list from which some items should be moved into a separate list, as decided by a comparator function. The elements are plain dicts. The question is how I should iterate over such a list.
When iterating the simplest way, for element in mylist, I don't know the index of the element. Lists have no .iteritems() method, which would be useful here. So I tried for index in range(len(mylist)):, which [1] seems over-complicated for Python and [2] does not satisfy me, since range(len()) is calculated once at the beginning, and if I remove an element from the list during iteration I get IndexError: list index out of range.
So my question is: how should I iterate over a Python list so that I can remove elements from it (selected by a comparator function) and put them in another list?
You can use enumerate function and make a temporary copy of the list:
for i, value in enumerate(old_list[:]):
    # i == index (in the copy)
    # value == dictionary
    # you can safely remove from old_list because we are iterating over a copy
    ...
Creating a new list really isn't much of a problem compared to removing items from the old one. Similarly, iterating twice is a very minor performance hit, probably swamped by other factors. Unless you have a very good reason to do otherwise, backed by profiling your code, I'd recommend iterating twice and building two new lists:
from itertools import ifilter, ifilterfalse
l1 = list(ifilter(condition, l))
l2 = list(ifilterfalse(condition, l))
You can slice-assign the contents of one of the new lists into the original if you want:
l[:] = l1
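In Python 3, ifilter and ifilterfalse no longer exist in itertools; the built-in filter and itertools.filterfalse play the same roles. A minimal sketch of the two-pass approach, where the condition function and the sample dicts are made up for illustration:

```python
from itertools import filterfalse

def condition(item):
    # Hypothetical predicate for illustration: keep dicts flagged as active.
    return item["active"]

l = [
    {"id": 1, "active": True},
    {"id": 2, "active": False},
    {"id": 3, "active": True},
]

l1 = list(filter(condition, l))       # items that satisfy the condition
l2 = list(filterfalse(condition, l))  # items that do not

l[:] = l1  # write the kept items back into the original list object
```

After this, l holds the dicts with ids 1 and 3, and l2 holds the dict with id 2.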
If you're absolutely sure you want a 1-pass solution, and you're absolutely sure you want to modify the original list in place instead of creating a copy, the following avoids quadratic performance hits from popping from the middle of a list:
j = 0
l2 = []
for i in range(len(l)):
    if condition(l[i]):
        l[j] = l[i]
        j += 1
    else:
        l2.append(l[i])
del l[j:]
We move each element of the list directly to its final position without wasting time shifting elements that don't really need to be shifted. We could use for item in l if we wanted, and it'd probably be a bit faster, but when the algorithm involves modifying the thing we're iterating over, I prefer the explicit index.
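For comparison, here is a sketch of the for item in l variant mentioned above, wrapped in a function. The name partition_in_place and the sample data are assumptions for illustration only. It relies on the write position j never running ahead of the read position, so no unread element is overwritten.

```python
def partition_in_place(items, condition):
    """Keep the items satisfying condition in `items`; return the rest as a new list."""
    removed = []
    j = 0  # next write position for a kept item
    for item in items:
        if condition(item):
            items[j] = item  # j <= read position, so nothing unread is overwritten
            j += 1
        else:
            removed.append(item)
    del items[j:]  # drop the leftover tail in one step
    return removed

numbers = [5, -1, 3, 0, -7, 8]
non_positive = partition_in_place(numbers, lambda n: n > 0)
```

Here numbers ends up as [5, 3, 8] and non_positive as [-1, 0, -7], both in their original relative order.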
I prefer not to touch the original list and do as @Martol1ni suggests, but one way to do it in place and not be affected by the removal of elements would be to iterate backwards:
for i in reversed(range(len(mylist))):
    # do the filtering...
That will affect only the indices of elements that you have tested/removed already.
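A small sketch of the backwards iteration, moving the matching dicts into a second list. The comparator is_expired and the sample records are hypothetical.

```python
def is_expired(record):
    # Hypothetical comparator for illustration.
    return record["ttl"] <= 0

records = [
    {"name": "a", "ttl": 3},
    {"name": "b", "ttl": 0},
    {"name": "c", "ttl": -1},
    {"name": "d", "ttl": 5},
]

expired = []
for i in reversed(range(len(records))):
    if is_expired(records[i]):
        # pop(i) only shifts items at higher indices, which were already visited
        expired.append(records.pop(i))
expired.reverse()  # items were collected back to front; restore the original order
```

Each pop is still linear in the number of items after it, so this is simple rather than fast.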
Try the filter function; you can also overwrite the original list with its result if you no longer need the original.
def cmp(i):  # Comparator function returning a boolean for a given item
    ...
# mylist is the initial list
mylist = filter(cmp, mylist)
In Python 3, mylist is now a lazy iterator over the suitable items (in Python 2, filter returns a list). Use list(filter(cmp, mylist)) instead if you need to use the result more than once.
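To see why the result of filter needs wrapping in list when it is reused, here is a short Python 3 sketch. The predicate is_even and the numbers are made up for illustration.

```python
def is_even(n):
    # Hypothetical predicate for illustration.
    return n % 2 == 0

mylist = [1, 2, 3, 4, 5, 6]

lazy = filter(is_even, mylist)  # Python 3: a lazy filter object, nothing computed yet
first_pass = list(lazy)         # consumes the iterator
second_pass = list(lazy)        # the iterator is already exhausted
```

The first pass yields the even numbers; the second pass yields an empty list.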
I haven't tested this, but here's a quick attempt:
new_list = [old.pop(i) for i, x in reversed(list(enumerate(old))) if comparator(x)]
You can do this, though it might be one line too many:
new_list1 = [x for x in old_list if your_comparator(x)]
new_list2 = [x for x in old_list if x not in new_list1]
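Note that x not in new_list1 does a linear membership test for every item, so the second comprehension is quadratic overall. A single explicit loop avoids that and calls the comparator once per item. This is only a sketch, with a hypothetical comparator and sample data:

```python
def your_comparator(x):
    # Hypothetical comparator for illustration: select even numbers.
    return x % 2 == 0

old_list = [1, 2, 2, 3, 4]

new_list1, new_list2 = [], []
for x in old_list:
    # Choose the destination list once per item.
    (new_list1 if your_comparator(x) else new_list2).append(x)
```

old_list itself is left untouched.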
Related
What is the best/fastest way to delete objects from a list?
Deleting some objects:
[objects.remove(o) for o in objects if o.x <= 0]
or creating a new list:
new_objects = [o for o in objects if o.x > 0]
For starters, don't use list comprehensions for side effects. It needlessly creates a list of None's here, which is simply ignored and garbage collected, and is just bad style. List comprehensions are for functional mapping/filtering operations to create new lists.
However, even converted to an equivalent loop there is a classic bug:
>>> objects = [1,1,2,2,1,1]
>>> for obj in objects:
...     if obj == 2:
...         objects.remove(obj)
...
>>> objects
[1, 1, 2, 1, 1]
This is because the internal iterator essentially keeps an index which it simply increments. Since the list changes size when an item is removed, every later item shifts down by one index, and the item that moves into the current position is skipped. So when there are two matching items to be removed in a row, the second one is skipped.
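The skipping can be made visible by recording which positions the loop actually visits. This is just an instrumented version of the loop above; the visited list is added for illustration.

```python
objects = [1, 1, 2, 2, 1, 1]
visited = []

for position, obj in enumerate(objects):
    visited.append((position, obj))
    if obj == 2:
        objects.remove(obj)  # shifts every later item down by one

# The second 2 slid into position 2 after the removal, but the loop had
# already moved on to position 3, so it was never examined.
```

The loop visits only five positions, and the value 2 is seen once.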
But more to the point, removing from a list in a loop is inefficient even if you do it correctly.
It is quadratic time, whereas creating the new list is linear. So really, I think those are the clear advantages of creating a new list.
As pointed out by @Selcuk in the comments, the advantage of modifying the list is that you don't use auxiliary space.
There is a potential issue with modifying a list this way while iterating over it. When you remove an item from the list, the indices of the remaining items shift down by one, which might be an issue if you are trying to access it by index.
A better option is the del statement with an index. It is faster than the remove() method because it avoids searching the list for the item to remove, although each deletion still shifts the items after it.
i = 0
while i < len(objects):
    if objects[i].x <= 0:
        del objects[i]
    else:
        i += 1
The only cost is that the code is now less readable.
As for recreating versus updating: recreating is generally faster, because every in-place deletion has to shift the remaining items down by one, whereas building a new list avoids that shifting.
But it needs extra memory for the new list, which can be significant if the list is large.
If memory is the concern, consider a generator expression instead of a list comprehension. A generator expression is similar to a list comprehension, but it produces a generator object that can be iterated over lazily, rather than creating a new list in memory.
new_objects = (o for o in objects if o.x > 0)
For more information about generator expressions, see: Generator expressions vs. list comprehensions
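As a rough illustration, a generator expression lets you aggregate over the kept objects without ever building the filtered list. The sample objects here are hypothetical stand-ins built with types.SimpleNamespace.

```python
from types import SimpleNamespace

# Hypothetical sample objects with an x attribute.
objects = [SimpleNamespace(x=v) for v in (3, -1, 0, 7)]

new_objects = (o for o in objects if o.x > 0)  # nothing is filtered yet
total = sum(o.x for o in new_objects)          # items are produced one at a time
```

The original objects list is not modified; if you need the filtered items more than once, a list comprehension is the simpler choice.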
You can safely modify your list in place by iterating over it in reverse.
For example:
objects = [1, 0, 2, -1, 1, -2]
for i in range(len(objects)-1, -1, -1):
    if objects[i] <= 0:
        objects.pop(i)
print(objects)
Output:
[1, 2, 1]
I have tried this. It gives the output it should ([1,3,2]), but the problem is that it keeps printing the output forever without stopping. Is there a solution that doesn't change the idea of the code?
a= [1,2,2,2,1,3,2]
def rem_dup(L):
    while len(L):
        for i in L:
            y= L.count(i)
            if y>1:
                L.remove(i)
        print L
rem_dup(a)
Unless the point of this function is to exercise your python skills, it sounds like you want a set. A set is like a list but does not allow duplicate values. If you want your final data structure to be a list, you could do something like this:
final_list = list(set(original_list))
One way to safely do this is to loop over the list in reverse and remove only from the back:
>>> for i in range(len(a) - 1, -1, -1):
...     if a.count(a[i]) > 1:
...         del a[i]
...
>>> a
[1, 2, 3]
But this takes quadratic time, since a.count is linear and so is del a[i].
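If the items are hashable, a linear-time alternative is to remember what has already been seen. This is a sketch rather than a drop-in fix for the original function; it keeps the first occurrence of each value and preserves order.

```python
def rem_dup(items):
    """Remove duplicates in place, keeping the first occurrence of each value."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:  # set membership is constant time on average
            seen.add(item)
            unique.append(item)
    items[:] = unique  # replace the contents of the caller's list

a = [1, 2, 2, 2, 1, 3, 2]
rem_dup(a)
```

This assumes every item can be put in a set; unhashable items such as dicts would need a different key.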
while len(L) will always be true as long as L had something in it to begin with, because the loop body never removes the last copy of a value.
Modifying L while using it with the for loop can cause items to be skipped, so you have a bug for some inputs.
If you fix that problem, you shouldn't need the while loop.
As long as the items in a are hashable and you don't mind that the remaining items aren't in the same order as when you started, you can create an intermediate set and replace the original contents in-place.
a[:] = set(a)
For quite a bit of time now I have been trying to figure out a way to loop through a list and remove the current item that I'm at. I can't seem to get this working as I would like it to. It loops just 1 time through, but I wanted it to loop 2 times. When I remove the removal line - it loops 2 times.
a = [0, 1]
for i in a:
    z = a
    print z.remove(i)
The output:
[1]
The output that I was expecting:
[1]
[0]
You're changing the list while iterating over it -- z = a doesn't make a copy, it just points z at the same place a points.
Try
for i in a[:]:  # slicing a list makes a copy
    print(i)  # remove doesn't return the item so print it here
    a.remove(i)  # remove the item from the original list
or
while a:  # while the list is not empty
    print(a.pop(0))  # remove the first item from the list
If you don't need an explicit loop, you can remove items that match a condition with a list comprehension:
a = [i for i in a if i] # remove all items that evaluate to false
a = [i for i in a if condition(i)] # remove items where the condition is False
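On the point that z = a doesn't make a copy, a quick check with the is operator shows the difference between binding a second name and slicing. The variable names are only for illustration.

```python
a = [0, 1]
z = a        # binds a second name to the same list object
c = a[:]     # slicing builds a new list with the same items

z.remove(0)  # mutates the one list that both a and z refer to
same_object = z is a
```

After the removal, a and z both show the change, while c still holds the original two items.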
It is bad practice to modify a list while you're looping through it†. Create a copy of the list:
oldlist = ['a', 'b', 'spam', 'c']
newlist = [x for x in oldlist if x != 'spam']
To modify the original list, write the copy back in-place with a slice assignment:
oldlist[:] = [x for x in oldlist if x != 'spam']
† For a gist of why this might be bad practice, consider the implementation details of what goes on with the iterator over the sequence when the sequence changes during iteration. If you've removed the current item, should the iterator point to the next item in the original list or to the next item in the modified list? What if your decision procedure instead removes the previous (or next) item to the current?
The problem is that you're modifying a with remove so the loop exits because the index is now past the end of it.
Don't try to remove multiple items from a list while looping over it. I think it's a general rule you should follow, not only in Python but in other programming languages as well.
You could add the items to be removed to a separate list, and then remove all objects in that new list from the original list.
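A minimal sketch of that two-step approach. The removal criterion (even numbers) is an assumption chosen for illustration.

```python
a = [0, 1, 2, 3, 4, 5]

# Step 1: decide what to remove without touching the list.
to_remove = [item for item in a if item % 2 == 0]

# Step 2: remove those items; the list being looped over is to_remove, not a.
for item in to_remove:
    a.remove(item)  # removes the first element equal to item
```

Since each remove call searches the list, this is fine for short lists but slower than building a new list for long ones.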
A multidimensional list like l=[[1,2],[3,4]] could be converted to a 1D one by doing sum(l,[]). How does this happen?
(This doesn't work directly for higher multidimensional lists, but it can be repeated to handle those cases. For example if A is a 3D-list, then sum(sum(A,[]),[]) will flatten A to a 1D list.)
If your list nested is, as you say, "2D" (meaning that you only want to go one level down, and all 1-level-down items of nested are lists), a simple list comprehension:
flat = [x for sublist in nested for x in sublist]
is the approach I'd recommend -- much more efficient than summing would be (sum is intended for numbers -- it was just too much of a bother to somehow make it block all attempts to "sum" non-numbers... I was the original proposer and first implementer of sum in the Python standard library, so I guess I should know;-).
If you want to go down "as deep as it takes" (for deeply nested lists), recursion is the simplest way, although by eliminating the recursion you can get higher performance (at the price of higher complication).
This recipe suggests a recursive solution, a recursion elimination, and other approaches
(all instructive, though none as simple as the one-liner I suggested earlier in this answer).
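For the "as deep as it takes" case, a recursive generator is one straightforward sketch. The name flatten_deep is made up here, and the code treats only list instances as containers.

```python
def flatten_deep(nested):
    """Yield the leaves of an arbitrarily nested list, depth first."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten_deep(item)  # recurse into sublists
        else:
            yield item

nested = [1, [2, [3, [4, 5]], 6], [[7]], 8]
flat = list(flatten_deep(nested))
```

Very deep nesting can hit Python's recursion limit, which is where the non-recursive variants come in.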
sum adds a sequence together using the + operator, e.g. sum([1,2,3]) == 6. The 2nd parameter is an optional start value which defaults to 0, e.g. sum([1,2,3], 10) == 16.
In your example it does [] + [1,2] + [3,4], where + on 2 lists concatenates them together. Therefore the result is [1,2,3,4].
The empty list is required as the 2nd parameter to sum because, as mentioned above, the default is for sum to add to 0 (i.e. 0 + [1,2] + [3,4]), which would result in unsupported operand type(s) for +: 'int' and 'list'.
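Both behaviours can be checked directly. This sketch captures the error message instead of letting the exception propagate.

```python
l = [[1, 2], [3, 4]]

flat = sum(l, [])  # evaluated as [] + [1, 2] + [3, 4]

try:
    sum(l)  # start defaults to 0, so this attempts 0 + [1, 2]
except TypeError as exc:
    error_message = str(exc)
```

Keep in mind that each + builds a new list, so this gets slow for many sublists.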
This is the relevant section of the help for sum:
sum(sequence[, start]) -> value
Returns the sum of a sequence of
numbers (NOT strings) plus the value
of parameter 'start' (which defaults
to 0).
Note
As wallacoloo commented, this is not a general solution for flattening any multi-dimensional list. It just works for a list of 1D lists due to the behavior described above.
Update
For a way to flatten 1 level of nesting see this recipe from the itertools page:
from itertools import chain
def flatten(listOfLists):
    "Flatten one level of nesting"
    return chain.from_iterable(listOfLists)
To flatten more deeply nested lists (including irregularly nested lists) see the accepted answer to this question (there are also some other questions linked to from that question itself.)
Note that the recipe returns an itertools.chain object (which is iterable) and the other question's answer returns a generator object so you need to wrap either of these in a call to list if you want the full list rather than iterating over it. e.g. list(flatten(my_list_of_lists)).
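A short usage sketch, with made-up sample data, showing both the list wrapping and the fact that only one level is removed:

```python
from itertools import chain

my_list_of_lists = [[1, 2], [3, 4], [5]]
flat = list(chain.from_iterable(my_list_of_lists))

# Only one level is flattened: inner lists survive as elements.
partly_flat = list(chain.from_iterable([[1, [2, 3]], [4]]))
```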
For any kind of multidimensional array, this code will flatten it to one dimension:
def flatten(l):
    try:
        return flatten(l[0]) + (flatten(l[1:]) if len(l) > 1 else []) if type(l) is list else [l]
    except IndexError:
        return []
It looks to me more like you're looking for a final answer of:
[3, 7]
For that you're best off with a list comprehension
>>> l=[[1,2],[3,4]]
>>> [x+y for x,y in l]
[3, 7]
I wrote a program to do multi-dimensional flattening using recursion. If anyone has comments on making the program better, you can always see me smiling:
def flatten(l):
    lf=[]
    li=[]
    ll=[]
    p=0
    for i in l:
        if type(i).__name__=='list':
            li.append(i)
        else:
            lf.append(i)
    ll=[x for i in li for x in i]
    lf.extend(ll)
    for i in lf:
        if type(i).__name__ =='list':
            #not completely flattened
            flatten(lf)
        else:
            p=p+1
            continue
    if p==len(lf):
        print(lf)
I've written this function:
def make_array_single_dimension(l):
    l2 = []
    for x in l:
        if type(x).__name__ == "list":
            l2 += make_array_single_dimension(x)
        else:
            l2.append(x)
    return l2
It works as well!
The + operator concatenates lists, and the starting value is [], an empty list.
I have a list L.
I can delete element i by doing:
del L[i]
But what if I have a set of non-contiguous indexes to delete?
I=set([i1, i2, i3,...])
Doing:
for i in I:
    del L[i]
won't work.
Any ideas?
Eine Minuten bitte, Ich hap eine
kleine Problemo avec diese Religione.
-- Eddie Izzard (doing his impression
of Martin Luther)
Deleting by reverse-iterating over a list to preserve the iterator is a common solution to this problem. But another solution is to change this into a different problem. Instead of deleting items from the list using some criteria (in your case, the index exists in a list of indexes to be deleted), create a new list that leaves out the offending items.
L[:] = [ item for i,item in enumerate(L) if i not in I ]
For that matter, where did you come up with the indexes in I in the first place? You could combine the logic of getting the indexes to be removed and building the new list. Assuming this is a list of objects and you only want to keep those that pass an isValid test:
L[:] = [ item for item in L if item.isValid() ]
This is much more straightforward than:
I = set()
for i in range(len(L)):
    if not L[i].isValid():
        I.add(i)
for i in sorted(I, reverse=True):
    del L[i]
For the most part, I turn any question about "how to delete from a list the items that I don't want" into "how to create a new list containing just the items I want".
EDITED: changed "L = ..." to "L[:] = ..." per Alex Martelli's answer to this question.
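for i in I:
    del L[i]
won't work, because (depending on the order) each deletion shifts the items after it down by one, so the remaining indices in I no longer refer to the elements you meant -- this will usually show up as some items which you intended to delete remaining in the list, or as an IndexError.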
It's always safe to delete items from the list in the reverse order of their indices. The easiest way to do this is with sorted():
for i in sorted(I, reverse=True):
    del L[i]
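To make the difference concrete, here is a small sketch with made-up data that deletes the same indices in ascending and in descending order:

```python
L = ['a', 'b', 'c', 'd', 'e']
I = {1, 3}  # intended: remove 'b' and 'd'

forward = list(L)
for i in sorted(I):
    del forward[i]  # after deleting index 1, 'd' has moved from index 3 to 2

backward = list(L)
for i in sorted(I, reverse=True):
    del backward[i]  # deleting from the back leaves lower indices untouched
```

In the ascending case 'e' is deleted instead of 'd'; the descending case removes exactly the intended items.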
You can use numpy.delete as follows:
import numpy as np
a = ['a', 'l', 3.14, 42, 'u']
I = [1, 3, 4]
np.delete(a, I).tolist()
# Returns: ['a', '3.14']
If you don't mind ending up with a numpy array at the end, you can leave out the .tolist(). Note that numpy converts a mixed list like this one to an array of strings, which is why 3.14 comes back as '3.14'. For large, homogeneous numeric data this may be considerably faster and more scalable, since numpy operations run in compiled code written in C or Fortran, but I haven't benchmarked it.
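If your original list data can safely be turned into a set (i.e. all unique values and doesn't need to maintain order), you could also use set operations. Note that this removes the values contained in I rather than the items at those positions, so it only applies when I holds the values to drop: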
Lset = set(L)
newset = Lset.difference(I)
You could also maybe do something with a Bag/Multiset, though it probably isn't worth the effort. Paul McGuire's second listcomp solution is certainly best for most cases.
L = [ item for item in L if L.index(item) not in I ]
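One caveat with L.index(item): it returns the position of the first equal item, so this only behaves as intended when all items are unique (and it is quadratic in any case). A small sketch with made-up data comparing it to the enumerate version:

```python
L = ['x', 'y', 'x', 'z']
I = {2}  # intended: remove the second 'x'

# L.index('x') is always 0, so the item at index 2 is never matched.
by_first_match = [item for item in L if L.index(item) not in I]

# enumerate supplies the true position of each item.
by_position = [item for i, item in enumerate(L) if i not in I]
```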