Deleting elements from a list or creating a new list (python) - python

What is the best/fastest way to delete objects from a list?
Deleting some objects:
[objects.remove(o) for o in objects if o.x <= 0]
or creating a new list:
new_objects = [o for o in objects if o.x > 0]

For starters, don't use list comprehensions for side effects. It needlessly creates a list of None's here, which is simply ignored and garbage collected, and is just bad style. List comprehensions are for functional mapping/filtering operations to create new lists.
However, even converted to an equivalent loop there is a classic bug:
>>> objects = [1,1,2,2,1,1]
>>> for obj in objects:
...     if obj == 2:
...         objects.remove(obj)
...
>>> objects
[1, 1, 2, 1, 1]
This is because the internal iterator essentially keeps an index which it simply increments. Since the list shrinks when an item is removed, every later item shifts down by one position, and the item that moves into the freed slot is skipped. So when there are two matching items to be removed in a row, the second one is skipped.
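To make the skipped element visible, here is a small sketch that records every (index, value) pair the loop actually visits. The helper name `trace_removal` is made up for illustration.

```python
def trace_removal(items, target):
    """Remove `target` while iterating (the buggy pattern) and log each visit."""
    items = list(items)          # work on a copy so the caller's list is untouched
    visited = []
    for index, value in enumerate(items):
        visited.append((index, value))
        if value == target:
            items.remove(value)  # shifts every later element one slot to the left
    return visited, items


visited, leftover = trace_removal([1, 1, 2, 2, 1, 1], 2)
print(visited)   # [(0, 1), (1, 1), (2, 2), (3, 1), (4, 1)]
print(leftover)  # [1, 1, 2, 1, 1]
```

The second 2 slides into index 2 right after the first one is removed, but the loop has already moved on to index 3, so it is never examined.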
But more to the point, removing from a list in a loop is inefficient even if you do it correctly.
It is quadratic time, whereas creating the new list is linear. So really, I think those are the clear advantages of creating a new list.
As pointed out by @Selcuk in the comments, the advantage of modifying the list is that you don't use auxiliary space.
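As a rough illustration of the new-list approach, assuming objects with a numeric attribute x as in the question (the `Item` class here is hypothetical):

```python
class Item:
    """Hypothetical stand-in for the question's objects; only `x` matters."""

    def __init__(self, x):
        self.x = x


objects = [Item(3), Item(0), Item(-2), Item(5)]

# One linear pass; the original list is left untouched.
new_objects = [o for o in objects if o.x > 0]

print([o.x for o in new_objects])     # [3, 5]
print(len(objects))                   # 4
print(new_objects[0] is objects[0])   # True
```

The new list is a shallow copy: it holds references to the same objects, so the auxiliary space is one reference per kept element, not a copy of the objects themselves.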

There is a potential issue with modifying a list this way while iterating over it. When you remove an item from the list, the indices of the remaining items shift down by one, so a loop that advances its index unconditionally skips the element that moved into the freed slot.
If you do delete in place, a del statement by index is faster than the remove() method, because it avoids searching the list for the item to remove (each del still shifts the later elements, so the loop as a whole remains quadratic in the worst case).
i = 0
while i < len(objects):
    if objects[i].x <= 0:
        del objects[i]
    else:
        i += 1
The only drawback is that the code is now less readable.
As for recreating versus updating, recreating is generally faster: every in-place deletion shifts all the later elements down by one, whereas building a new list touches each element only once.
But it needs extra memory for the second list, which can be significant if the list is large.
If memory is the concern, you can consider a generator expression instead of a list comprehension. A generator expression is similar to a list comprehension, but it produces a generator object that is iterated over lazily, rather than creating a new list in memory.
new_objects = (o for o in objects if o.x > 0)
For more information about generator expressions, see: Generator expressions vs. list comprehensions
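One caveat worth seeing in code: a generator object can be consumed only once. A minimal sketch with plain numbers instead of objects:

```python
numbers = [3, -1, 4, 0, 5]

# Nothing is filtered yet; values are produced on demand.
positives = (n for n in numbers if n > 0)

first_pass = list(positives)    # consumes the generator
second_pass = list(positives)   # already exhausted

print(first_pass)   # [3, 4, 5]
print(second_pass)  # []
```

So a generator expression fits when the filtered items are processed once in a single pass; if you need to index the result or loop over it again, build the list.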

You can safely modify your list in place by iterating over it in reverse.
For example:
objects = [1, 0, 2, -1, 1, -2]
for i in range(len(objects)-1, -1, -1):
    if objects[i] <= 0:
        objects.pop(i)
print(objects)
Output:
[1, 2, 1]

Related

Generating list of iterators from list of lists

I have a list of lists and my goal is to create the list of iterators corresponding to the slices of these lists.
My original attempt is as follows (this is minimal "working" example without most of the unrelated logic):
my_lists = [1,2,3], [6,7,8]
iterators = []
for my_list in my_lists:
    it = (my_list[i] for i in range(1,3))
    iterators.append(it)
print(next(iterators[0]))
Unfortunately this does not work as intended: the scope is shared between the generators, so the my_list variable is identical in all of them. This means that print outputs 7 (from the last list) and not 2 as desired.
I am struggling to come up with a clean solution. All my attempts to find one fail. They either create unnecessary lists, like:
it = iter(my_list[1:3])
and
it = iter([my_list[i] for i in range(1,3)])
or are unnecessarily cluttered:
def apply_range(l, start, stop):
    return (l[i] for i in range(start, stop))
...
it = apply_range(my_list, 1, 3)
or outright hack-ish:
it = (lambda l: (l[i] for i in range(1,3)))(my_list)
or iterate from the starts of the lists:
from itertools import islice
...
it = islice(my_list, 1, 3)
Please note that the code I need is a bit more complicated than the first snippet: the start and stop of the range are computed rather than constant, and I need more than just the first element of the first iterator (in particular, I combine them into one iterator with inner logic that picks the correct iterator to take the next element from).
The issue with your original code is that the generator expressions need to use the my_list variable from the outside scope. Because that variable changes as the loop continues to run, all the iterators end up yielding values from the last list in the input.
You can fix this by encapsulating the creation of the generator in a function. The function will have its own namespace where the specific list to be iterated on will be bound to a local variable (that won't change based on outside code). The function can either be a generator function, or it can be a normal function that returns a generator expression (the two approaches are almost identical).
def make_iter(lst, start, stop):
    # this loop could be replaced by `return (lst[i] for i in range(start, stop))`
    for i in range(start, stop):
        yield lst[i]
my_lists = [1,2,3], [6,7,8]
start = 1
stop = 3
iterators = [make_iter(lst, start, stop) for lst in my_lists]
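To see the difference side by side, here is a sketch contrasting generator expressions created directly in the loop with ones created through a factory function. The function name `late_bound_iterators` is made up for this comparison.

```python
def late_bound_iterators(lists, start, stop):
    """The problematic pattern: every generator looks up `my_list` lazily."""
    iterators = []
    for my_list in lists:
        iterators.append(my_list[i] for i in range(start, stop))
    return iterators


def make_iter(lst, start, stop):
    """Each call gets its own `lst`, so later loop iterations cannot change it."""
    return (lst[i] for i in range(start, stop))


my_lists = [1, 2, 3], [6, 7, 8]

shared = [list(it) for it in late_bound_iterators(my_lists, 1, 3)]
separate = [list(make_iter(lst, 1, 3)) for lst in my_lists]

print(shared)    # [[7, 8], [7, 8]]
print(separate)  # [[2, 3], [7, 8]]
```

Only the outermost iterable of a generator expression (here `range(start, stop)`) is evaluated when the expression is created; `my_list` is looked up when values are requested, by which time the loop has finished.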
This is simple, neat code that I think satisfies your desire to make some iterators (note that it iterates over each whole list rather than a slice):
my_lists = [1,2,3], [6,7,8]
iterators = []
for my_list in my_lists:
    iterators.append(iter(my_list))
This allows you to access each iterator and its values by using next:
>>> next(iterators[1])
6
>>> next(iterators[1])
7

How to remove an item in list once used from a large list in python to save the memory?

If I have a large list with millions of items, I want to iterate through each of them. Once I have used an item it will never be used again, so how do I delete the item from the list once it has been used? What is the best approach?
I know numpy is fast and efficient, but I want to know how it can be done using a normal list.
mylst = [item1, item2, ...]  # millions of items
for each_item in mylst:
    # use the item
    # delete the item to free that memory
You cannot delete an object directly in Python - an object's memory is automatically reclaimed, by garbage collection, when it's no longer possible to reference the object. So long as an object is in a list, it may be referenced again later (via the list).
So you need to destroy the list too. For example, like so:
while mylst:
    each_item = mylst.pop()  # removes an object from the end of the list
    # use the item
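Note that pop() hands the items back last-to-first. If the original order matters, one possible variation (a sketch, not part of the answer above) is to reverse the list in place first and then keep popping from the end. The names `consume_in_order` and `handle` are hypothetical.

```python
def consume_in_order(items, handle):
    """Process items first-to-last while emptying the list as we go."""
    items.reverse()            # in place, no second list is created
    while items:
        handle(items.pop())    # popping from the end does not shift other elements


seen = []
data = ["a", "b", "c"]
consume_in_order(data, seen.append)

print(seen)  # ['a', 'b', 'c']
print(data)  # []
```

In this demo `seen.append` keeps every item alive, of course; a real handler that does not store the item lets it be reclaimed once nothing else references it.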
Assuming you can copy a list (memory constraints might cause issues here) and only need to remove specific elements from it, you can create a shallow copy of the list and remove elements from it while iterating through the original list:
a_list = [1, 2, 3, 4, 5]
b_list = a_list.copy()
removal_key = 0
for element in a_list:
    if element % 2 == 0:
        b_list.pop(removal_key)
        # push the removal key back after every deletion, because b_list
        # becomes one element shorter than the original each time
        removal_key -= 1
    removal_key += 1
print(b_list)  # [1, 3, 5]
If creating the second list is not an option, you can store the indices of the elements to be removed and then use a second loop to remove them:
a_list = [1, 2, 3, 4, 5]
elements_to_remove = []
for key, element in enumerate(a_list):
    if element % 2 == 0:
        elements_to_remove.append(key)
removed_element_count = 0
for element in elements_to_remove:
    # every earlier removal shifted the remaining indices down by one
    a_list.pop(element - removed_element_count)
    removed_element_count += 1
print(a_list)  # [1, 3, 5]
Note that the first solution is more time efficient (especially when removing a lot of elements), while the second solution is more memory efficient, especially when removing a small number of elements from the list.
This is probably the case in which you should use generators.
A generator is a function that returns an object which we can iterate over, one value at a time, using the special keyword yield instead of return.
Generators allow you to have a smaller memory footprint, by keeping only one element per iteration.
In Python 3.x, range is lazy in the same way (strictly speaking it is a lazy sequence rather than a generator; the Python 2.x equivalent is xrange).
Overly simple example:
>>> def range(start, end):
...     current = start
...     while current < end:
...         yield current
...         current += 1
...
>>> for i in range(0, 2):
...     print(i)
...
0
1
How is this million-entry list created in the first place?

I want to write a function that takes a list and returns it with all duplicates removed, without creating another list or string

I have tried this. It gives the output it should give ([1,3,2]); however, the problem is that it keeps printing the output endlessly. Is there a solution that doesn't change the idea of the code?
a = [1,2,2,2,1,3,2]
def rem_dup(L):
    while len(L):
        for i in L:
            y = L.count(i)
            if y > 1:
                L.remove(i)
        print L
rem_dup(a)
Unless the point of this function is to exercise your python skills, it sounds like you want a set. A set is like a list but does not allow duplicate values. If you want your final data structure to be a list, you could do something like this:
final_list = list(set(original_list))
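A set does not keep the original order. If order matters and the items are hashable, one common alternative (a sketch; like the line above, it does build a new list) relies on dict keys, which preserve insertion order since Python 3.7:

```python
original_list = [1, 2, 2, 2, 1, 3, 2]

# dict.fromkeys keeps the first occurrence of each key, in order of appearance.
final_list = list(dict.fromkeys(original_list))

print(final_list)  # [1, 2, 3]
```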
One way to safely do this is to loop over the list in reverse and remove only from the back:
>>> for i in range(len(a) - 1, -1, -1):
...     if a.count(a[i]) > 1:
...         del a[i]
...
>>> a
[1, 2, 3]
But this will be quadratic time, since a.count is linear and so is del a[i].
while len(L) will always be true as long as L had something in it to begin with, which is why the loop never ends.
Modifying L while iterating over it with the for loop can cause items to be skipped, so you have a bug for some inputs.
If you fix that problem, you shouldn't need the while loop.
As long as the items in a are hashable and you don't mind that the remaining items aren't in the same order as when you started, you can create an intermediate set and replace the original contents in place.
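A minimal sketch of what such a fix could look like while keeping the count/remove idea from the question: an explicit index replaces both loops and only advances when nothing was removed. This is one possible repair, not the only one, and it is still quadratic.

```python
def rem_dup(items):
    """Remove duplicates in place, keeping the last occurrence of each value."""
    i = 0
    while i < len(items):
        if items.count(items[i]) > 1:
            # Everything before index i is already unique, so the first
            # occurrence of items[i] is at index i itself.
            items.remove(items[i])
        else:
            i += 1   # advance only when the current element was kept
    return items


a = [1, 2, 2, 2, 1, 3, 2]
result = rem_dup(a)
print(result)  # [1, 3, 2]
```

The loop terminates because every step either shortens the list or moves the index forward.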
a[:] = set(a)

I can't delete a list of used numbers from another list of lists [duplicate]

Given a list of numbers:
L = [1, 2, 3, 4, 5]
How do I delete an element, let's say 3, from the list while I iterate over it?
I tried the following code but it didn't do it:
for el in L:
    if el == 3:
        del el
Best is usually to proceed constructively -- build the new list of the items you want instead of removing those you don't. E.g.:
L[:] = [el for el in L if el != 3]
The list comprehension builds the desired list, and the assignment to the "whole-list slice", L[:], ensures you're not just rebinding a name but fully replacing the contents, so the effects are identical to the "removals" you wanted to perform. This is also fast.
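The difference between slice assignment and plain rebinding shows up as soon as a second name refers to the same list. A small sketch (the names `alias`, `M` and `other` are only for illustration):

```python
L = [1, 2, 3, 4, 5]
alias = L                              # second reference to the same list object
L[:] = [el for el in L if el != 3]     # replaces the contents in place
print(alias)                           # [1, 2, 4, 5]
print(alias is L)                      # True

M = [1, 2, 3, 4, 5]
other = M
M = [el for el in M if el != 3]        # rebinds the name M to a new list
print(other)                           # [1, 2, 3, 4, 5]
print(other is M)                      # False
```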
If you absolutely, at any cost, must do deletions instead, a subtle approach might work:
>>> ndel = 0
>>> for i, el in enumerate(list(L)):
...     if el==3:
...         del L[i-ndel]
...         ndel += 1
nowhere as elegant, clean, simple, or well-performing as the listcomp approach, but it does do the job (though its correctness is not obvious at first glance and in fact I had it wrong before an edit!-). "at any cost" applies here;-).
Looping on indices in lieu of items is another inferior but workable approach for the "must do deletions" case -- but remember to reverse the indices in this case...:
for i in reversed(range(len(L))):
    if L[i] == 3: del L[i]
indeed this was a primary use case for reversed back when we were debating on whether to add that built-in -- reversed(range(... isn't trivial to obtain without reversed, and looping on the list in reversed order is sometimes useful. The alternative
for i in range(len(L) - 1, -1, -1):
is really easy to get wrong;-).
Still, the listcomp I recommended at the start of this answer looks better and better as alternatives are examined, doesn't it?-).
for el in L:
    if el == 2:
        # works only for this particular list: the value 2 happens to be the index of 3
        del L[el]

python list.iteritems replacement

I've got a list in which some items shall be moved into a separate list (by a comparator function). Those elements are pure dicts. The question is how should I iterate over such list.
When iterating the simplest way, for element in mylist, then I don't know the index of the element. There's no .iteritems() methods for lists, which could be useful here. So I've tried to use for index in range(len(mylist)):, which [1] seems over-complicated as for python and [2] does not satisfy me, since range(len()) is calculated once in the beginning and if I remove an element from the list during iteration, I'll get IndexError: list index out of range.
Finally, my question is - how should I iterate over a python list, to be able to remove elements from the list (using a comparator function and put them in another list)?
You can use enumerate function and make a temporary copy of the list:
for i, value in enumerate(old_list[:]):
    # i == index
    # value == dictionary
    # you can safely remove from old_list because we are iterating over a copy
Creating a new list really isn't much of a problem compared to removing items from the old one. Similarly, iterating twice is a very minor performance hit, probably swamped by other factors. Unless you have a very good reason to do otherwise, backed by profiling your code, I'd recommend iterating twice and building two new lists:
# Python 2 names; on Python 3 use the built-in filter and itertools.filterfalse
from itertools import ifilter, ifilterfalse
l1 = list(ifilter(condition, l))
l2 = list(ifilterfalse(condition, l))
You can slice-assign the contents of one of the new lists into the original if you want:
l[:] = l1
If you're absolutely sure you want a 1-pass solution, and you're absolutely sure you want to modify the original list in place instead of creating a copy, the following avoids quadratic performance hits from popping from the middle of a list:
j = 0
l2 = []
for i in range(len(l)):
    if condition(l[i]):
        l[j] = l[i]
        j += 1
    else:
        l2.append(l[i])
del l[j:]
We move each element of the list directly to its final position without wasting time shifting elements that don't really need to be shifted. We could use for item in l if we wanted, and it'd probably be a bit faster, but when the algorithm involves modifying the thing we're iterating over, I prefer the explicit index.
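Wrapped in a function and applied to a list of dicts like the one in the question, the same idea might look like this. The function name `partition_in_place` and the sample records are assumptions made for the example.

```python
def partition_in_place(items, condition):
    """Keep the items satisfying `condition` in `items`; return the rest as a new list."""
    removed = []
    write = 0                          # next slot for an element we keep
    for read in range(len(items)):
        item = items[read]
        if condition(item):
            items[write] = item        # move directly to its final position
            write += 1
        else:
            removed.append(item)
    del items[write:]                  # drop the leftover tail in one step
    return removed


records = [{"id": 1, "ok": True}, {"id": 2, "ok": False}, {"id": 3, "ok": True}]
rejected = partition_in_place(records, lambda r: r["ok"])

print([r["id"] for r in records])    # [1, 3]
print([r["id"] for r in rejected])   # [2]
```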
I prefer not to touch the original list and do as @Martol1ni suggests, but one way to do it in place and not be affected by the removal of elements would be to iterate backwards:
for i in reversed(range(len(mylist))):
    # do the filtering...
That way a removal only affects the indices of elements that you have already tested/removed.
Try the filter function; you can overwrite the original list with its result if you don't need the original anymore.
def cmp(i):  # comparator function returning a boolean for a given item
    ...
# mylist is the initial list
mylist = filter(cmp, mylist)
On Python 3, mylist is now a lazy iterator over the suitable items (on Python 2, filter returns a list). Use list(mylist) if you need to use it more than once.
Haven't tried this yet, but I'll give it a quick shot:
new_list = [old.pop(i) for i, x in reversed(list(enumerate(old))) if comparator(x)]
You can do this, might be one line too much though.
new_list1 = [x for x in old_list if your_comparator(x)]
new_list2 = [x for x in old_list if x not in new_list1]
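The second comprehension searches new_list1 once per element and compares by equality, which gets slow for long lists. A single-pass alternative could look like the sketch below; the names `split_by` and `predicate` are hypothetical.

```python
def split_by(items, predicate):
    """Return (matching, rest), calling `predicate` exactly once per item."""
    matching, rest = [], []
    for item in items:
        (matching if predicate(item) else rest).append(item)
    return matching, rest


old_list = [4, 7, 10, 3, 8]
evens, odds = split_by(old_list, lambda n: n % 2 == 0)

print(evens)  # [4, 10, 8]
print(odds)   # [7, 3]
```

Both result lists keep the original relative order, and old_list itself is not modified.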
