"list index out of range" - Python - python

I was trying to write a program that removes any repeated items from a list, but I get a "list index out of range" error.
Here's the code:
a_list = [1, 4, 3, 2, 3]
def repeating(any_list):
    list_item, comparable = any_list, any_list
    for x in any_list:
        list_item[x]
        comparable[x]
        if list_item == comparable:
            any_list.remove(x)
    print(any_list)
repeating(a_list)
So my question is, what's wrong?

Your code does not do what you think it does.
First you are creating additional references to the same list here:
list_item, comparable = any_list, any_list
list_item and comparable are just additional names to access the same list object.
You then loop over the values contained in any_list:
for x in any_list:
This assigns first 1, then 4, then 3, then 2, then 3 again to x.
Next, you use those values as indexes into the other two references to the list, but ignore the result of those expressions:
list_item[x]
comparable[x]
This doesn't do anything, other than test if those indexes exist.
The following line then is always true:
if list_item == comparable:
because the two variables reference the same list object.
Because that is always true, the following line is always executed:
any_list.remove(x)
This removes the first occurrence of x from the list, making the list shorter while you are still iterating over it. The for loop then skips items, because its internal index advances to the next position even though the remaining elements have shifted left. See Loop "Forgets" to Remove Some Items for why that is.
All in all, you end up with 4, then 3 items in the list, so list_item[3] then fails and throws the exception.
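To make the failure concrete, here is a minimal sketch that replays the question's loop on a copy and records what happens at each step. The helper name trace_repeating is hypothetical and only used for illustration.

```python
def trace_repeating(values):
    """Replay the question's loop on a copy and record each step."""
    items = list(values)  # work on a copy so the caller's list is untouched
    steps = []
    try:
        for x in items:
            items[x]         # same index test as list_item[x] in the question
            items.remove(x)  # shrinks the list while it is being iterated
            steps.append((x, list(items)))
    except IndexError:
        # items[x] failed because x is no longer a valid index
        steps.append(("IndexError", list(items)))
    return steps


for step in trace_repeating([1, 4, 3, 2, 3]):
    print(step)
```

The loop visits 1, then skips 4 and visits 3, and finally fails on the last 3 because only three items remain.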
The proper way to remove duplicates is to use a set object:
def repeating(any_list):
    return list(set(any_list))
because a set can only hold unique items. It'll alter the order however. If the order is important, you can use a collections.OrderedDict() object:
from collections import OrderedDict
def repeating(any_list):
    return list(OrderedDict.fromkeys(any_list))
Like a set, a dictionary can only hold unique keys, but an OrderedDict also keeps track of insertion order. The dict.fromkeys() method gives each element in any_list a value of None; an element that is already present as a key is not added again. Turning that back into a list gives you the unique elements in first-come, first-served order:
>>> from collections import OrderedDict
>>> a_list = [1, 4, 3, 2, 3]
>>> list(set(a_list))
[1, 2, 3, 4]
>>> list(OrderedDict.fromkeys(a_list))
[1, 4, 3, 2]
See How do you remove duplicates from a list whilst preserving order? for more options still.
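On Python 3.7 and later, plain dictionaries also preserve insertion order, so the same idea should work without importing anything. A minimal sketch, assuming a recent Python version (the function name unique_in_order is made up for this example):

```python
def unique_in_order(items):
    """Return the unique items, keeping the first occurrence of each."""
    # dict keys are unique, and plain dicts keep insertion order in Python 3.7+
    return list(dict.fromkeys(items))


print(unique_in_order([1, 4, 3, 2, 3]))
```

As with set and OrderedDict, this only works when the items are hashable.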

The easiest way to solve your issue is to convert the list to a set and then back to a list:
def repeating(any_list):
    print(list(set(any_list)))
The error most likely occurs because you're removing items from the list while iterating over it.

If you want to remove duplicates from a list but don't care about the order of the elements, you can use:
def removeDuplicate(numlist):
    return list(set(numlist))
If you want to preserve the order, then use:
def removeDuplicate(numlist):
    return sorted(list(set(numlist)), key=numlist.index)
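Sorting by numlist.index searches the list once per unique element, so it can get slow for long lists. As a sketch of an alternative that makes a single pass, you could track the items already seen in a set (the function name below is hypothetical, and the items are assumed to be hashable):

```python
def remove_duplicates_keep_order(numlist):
    """Single pass: keep an item only the first time it appears."""
    seen = set()   # items encountered so far
    result = []    # unique items in their original order
    for item in numlist:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(remove_duplicates_keep_order([3, 1, 3, 2, 1]))
```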

Related

I want to remove duplicates from a list using a for loop. I am getting an index out of bound error here: if lst[i]==lst[j]:

I have declared a list and sorted it. While iterating in the second for loop I am getting an "index out of range" error.
lst=[]
for i in range(5):
    a=int(input())
    lst.append(a)
lst.sort()
print(lst)
for i in range(0,len(lst)):
    j=i+1
    for j in range(len(lst)):
        if lst[i]==lst[j]:
            print("hii")
            lst.pop(j)
print(lst)
As you remove elements from lst, its length changes, but range(len(lst)) was computed from the original length. Your iteration therefore breaks when it tries to access an index that no longer exists: "index out of range".
If you want to keep your structure, you could catch that and exit the loop using a try/except on IndexError, but it's really ugly.
A more pythonic solution is to simply convert your list to a set and then back to a list. This works because a set cannot contain duplicate elements:
>>> list(set([1, 1, 2, 2, 3, 4,]))
[1, 2, 3, 4]
You shouldn't change the length of your list while iterating over it. I am assuming you want to delete all duplicates in your original list. You can convert the list to a set with
    set(lst)
before you sort or after you sort.
You're getting this error because you are changing the length of a list whilst iterating over it.
A simpler approach to removing all duplicates from a list is to create a set from the list and then convert it back to a list. This should be done before you sort your list, as a set (by definition) does not preserve the ordering of elements.
Using your example:
lst = []
for i in range(5):
    a=int(input())
    lst.append(a)
lst = list(set(lst)) # Remove duplicates from list
lst.sort() # Sort list
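If you would rather keep an explicit for loop, as in the question, one option is to build a new list instead of popping from the one being iterated. Because the list is already sorted, duplicates sit next to each other, so comparing with the last kept value is enough. This is only a sketch, and dedupe_sorted is a made-up name:

```python
def dedupe_sorted(sorted_values):
    """Remove duplicates from an already sorted list without mutating it."""
    result = []
    for value in sorted_values:
        # In sorted input, equal values are adjacent, so only the
        # most recently kept value needs to be checked.
        if not result or result[-1] != value:
            result.append(value)
    return result


print(dedupe_sorted([1, 1, 2, 3, 3, 3]))
```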

How do I remove the first element in a list within a list?

I would like to know if there's any way to remove an index in a list within a list. For example, given [[1,2,3],[4,5,6],[7,8,9]], I would like to remove the first element of each list so that it becomes [[2,3],[5,6],[8,9]]. I can do a for loop to slowly remove them but I was wondering if there's a more efficient manner to do so?
You can do it by iterating over the outer list and calling the pop() function on each inner list. This removes the element at the specified index.
test_list = [[1,2,3],[4,5,6],[7,8,9]]
for i in test_list:
    i.pop(0)
print(test_list)
output: [[2, 3], [5, 6], [8, 9]]
I would say you have two options.
You can use pop():
    for item in list:
        item.pop(0)
Or create a new list:
    list2 = [item[1:] for item in list]
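The two options differ in one practical way: pop() changes the existing inner lists, while slicing builds new ones and leaves the originals alone. A small sketch of that difference, using hypothetical helper names:

```python
def drop_first_copy(nested):
    """Build new inner lists; the originals are left untouched."""
    return [inner[1:] for inner in nested]


def drop_first_in_place(nested):
    """Mutate each inner list; every reference to it sees the change."""
    for inner in nested:
        del inner[0]  # same effect as inner.pop(0), without returning the item


original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
copied = drop_first_copy(original)
unchanged = original == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # still True here

drop_first_in_place(original)  # now the original itself is shortened
print(copied, original, unchanged)
```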
For small lists as in your example, doing something like list.pop(0) is OK:
nested = [[1,2,3], [4,5,6], [7,8,9]]
for inner in nested:
    inner.pop(0) # Remove first element of each inner list
However, as lists are basically implemented as arrays (of pointers), removing the first element means that all other elements have to be shifted back by one, resulting in a lot of (pointer) copying if your lists are large. Also, using inner[1:] rather than inner.pop(0) does not solve the problem, as inner[1:] creates a new list rather than returning a view.
One way out is to make the redundant elements appear as the last element instead of the first, so that you can do inner.pop() instead, which removes the last element. No further shifting/copying is then required.
Another solution is to switch out your data structure from a list to a deque ("double-ended queue"). This has a popleft() method, which is equivalent to pop(0) but fast, as deques support fast popping from both ends:
import collections
nested = [[1,2,3], [4,5,6], [7,8,9]]
nested = [collections.deque(inner) for inner in nested] # list of deques
for inner in nested:
    inner.popleft() # Remove first element of each inner list, fast!

How to remove an item in list once used from a large list in python to save the memory?

If I have a large list that runs to millions of items, and I want to iterate through each of them. Once I use an item it will never be used again, so how do I delete the item from the list once it has been used? What is the best approach?
I know numpy is fast and efficient but want to know how it can be done using normal list.
mylst = [item1, item2,............millions of items]
for each_item in mylst:
    #use the item
    #delete the item to free that memory
You cannot delete an object directly in Python - an object's memory is automatically reclaimed, by garbage collection, when it's no longer possible to reference the object. So long as an object is in a list, it may be referenced again later (via the list).
So you need to destroy the list too. For example, like so:
while mylst:
    each_item = mylst.pop() # removes an object from the end of the list
    # use the item
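Note that pop() takes items from the end, so the loop above processes the list back to front. If the original order matters, one possible approach is to reverse the list in place first and then pop from the end. The sketch below assumes a caller-supplied handle function; both names are hypothetical:

```python
def consume_in_order(items, handle):
    """Process items front to back, releasing each one from the list."""
    items.reverse()          # in place, so no second list is created
    while items:
        handle(items.pop())  # pop() from the end is cheap; the list shrinks


processed = []
data = [1, 2, 3]
consume_in_order(data, processed.append)
print(processed, data)
```

An object's memory can only be reclaimed once nothing else refers to it, so in real code handle should not keep the item around the way processed.append does in this demonstration.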
Assuming you can copy a list (memory constraints might cause issues here) and only need to remove specific elements from it, you can create a shallow copy of the list and remove elements from it while iterating through the original list:
a_list = [1, 2, 3, 4, 5]
b_list = a_list.copy()
removal_key = 0
for element in a_list:
    if element % 2 == 0:
        b_list.pop(removal_key)
        removal_key -= 1 # push the removal key back after every deletion, as b_list becomes shorter than the original each time
    removal_key += 1
print(b_list) #[1, 3, 5]
If creating the 2nd list is not an option, you can store the indices of the elements to be removed and then use a second loop to remove them:
a_list = [1, 2, 3, 4, 5]
elements_to_remove = []
for key, element in enumerate(a_list):
    if element % 2 == 0:
        elements_to_remove.append(key)
removed_element_count = 0
for element in elements_to_remove:
    a_list.pop(element - removed_element_count)
    removed_element_count += 1
print(a_list) #[1, 3, 5]
Note that the 1st solution is more time efficient (especially when removing a lot of elements), while the 2nd solution is more memory efficient, especially when removing a small number of elements from the list.
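As a possible third variant, a list comprehension combined with slice assignment filters the list in one step while keeping the same list object. It still builds a temporary list of the kept items, so it is not a memory saver; this is a sketch, and the function name is invented for the example:

```python
def remove_even_in_place(values):
    """Keep only the odd numbers, replacing the contents of the same list."""
    # Slice assignment swaps the contents but keeps the list object,
    # so other references to the list see the filtered result too.
    values[:] = [v for v in values if v % 2 != 0]


a_list = [1, 2, 3, 4, 5]
alias = a_list               # a second name for the same list
remove_even_in_place(a_list)
print(a_list, alias is a_list)
```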
This is probably the case in which you should use generators.
A generator is a function that returns an object which we can iterate over, one value at a time, using the special keyword yield instead of return.
Generators allow you to have a smaller memory footprint, by keeping only one element in memory per iteration.
In Python 3.x, range behaves similarly: it is not strictly a generator, but a lazy sequence object that computes its values on demand (the Python 2.x equivalent is xrange).
Overly simple example:
>>> def range(start, end):
...     current = start
...     while current < end:
...         yield current
...         current += 1
...
>>> for i in range(0, 2):
...     print(i)
...
0
1
How is this list of a million entries created in the first place?
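If the items can be produced one at a time, the big list may not be needed at all. The following sketch assumes the data can be generated on demand; squares is a made-up stand-in for whatever actually produces the items:

```python
def squares(limit):
    """Yield one value at a time instead of building a full list."""
    for n in range(limit):
        yield n * n


# Each value is created, used by sum(), and then discarded,
# so only one item needs to be alive at any moment.
total = sum(squares(4))
print(total)
```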

Modify a list while iterating [duplicate]

I know you should not add/remove items while iterating over a list. But can I modify an item in a list I'm iterating over if I do not change the list length?
class Car(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return type(self).__name__ + "_" + self.name
my_cars = [Car("Ferrari"), Car("Mercedes"), Car("BMW")]
print(my_cars) # [Car_Ferrari, Car_Mercedes, Car_BMW]
for car in my_cars:
    car.name = "Moskvich"
print(my_cars) # [Car_Moskvich, Car_Moskvich, Car_Moskvich]
Or should I iterate over the list indices instead? Like that:
for car_id in range(len(my_cars)):
    my_cars[car_id].name = "Moskvich"
The question is: are both of the ways above allowed, or is only the second one error-free?
If the answer is yes, will the following snippet be valid?
lovely_numbers = [[41, 32, 17], [26, 55]]
for numbers_pair in lovely_numbers:
    numbers_pair.pop()
print(lovely_numbers) # [[41, 32], [26]]
UPD. I'd like to see the python documentation where it says "these operations are allowed" rather than someone's assumptions.
You are not modifying the list, so to speak. You are simply modifying the elements in the list. I don't believe this is a problem.
To answer your second question, both ways are indeed allowed (as you know, since you ran the code), but it would depend on the situation. Are the contents mutable or immutable?
For example, if you want to add one to every element in a list of integers, this would not work:
>>> x = [1, 2, 3, 4, 5]
>>> for i in x:
...     i += 1
...
>>> x
[1, 2, 3, 4, 5]
Indeed, ints are immutable objects. Instead, you'd need to iterate over the indices and change the element at each index, like this:
>>> for i in range(len(x)):
...     x[i] += 1
...
>>> x
[2, 3, 4, 5, 6]
If your items are mutable, then the first method (iterating directly over the elements rather than the indices) is the better choice: it avoids the extra indexing step, which is unnecessary overhead when the elements can be changed in place.
I know you should not add/remove items while iterating over a list. But can I modify an item in a list I'm iterating over if I do not change the list length?
You're not modifying the list in any way at all. What you are modifying is the elements in the list, and that is perfectly fine. As long as you don't directly change the actual list, you're fine.
There's no need to iterate over the indices. In fact, that's unidiomatic. Unless you are actually trying to change the list itself, simply iterate over the list by value.
If the answer is yes, will the following snippet be valid?
lovely_numbers = [[41, 32, 17], [26, 55]]
for numbers_pair in lovely_numbers:
    numbers_pair.pop()
print(lovely_numbers) # [[41, 32], [26]]
Absolutely, for the exact same reasons as I said above. You're not modifying lovely_numbers itself. Rather, you're only modifying the elements in lovely_numbers.
Examples where the list is, and is not, modified while iterating over its elements:
list_modified_during_iteration.py
a = [1,2]
i = 0
for item in a:
    if i<5:
        print('append')
        a.append(i+2)
    print(a)
    i += 1
list_not_modified_during_iteration.py (Changed item to i)
a = [1,2]
i = 0
for i in range(len(a)):
    if i<5:
        print('append')
        a.append(i+2)
    print(a)
    i += 1
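The difference between the two scripts is what the loop iterates over: the live list, which keeps growing, or something fixed when the loop starts. A minimal sketch of the same effect, using a snapshot copy instead of range(len(a)); count_visits is a hypothetical helper:

```python
def count_visits(values, snapshot):
    """Append up to five items while looping; return the number of iterations."""
    # With snapshot=True the loop walks a copy, so appends cannot extend it.
    source = list(values) if snapshot else values
    visits = 0
    for item in source:
        if visits < 5:
            values.append(item)
        visits += 1
    return visits


print(count_visits([1, 2], snapshot=False))  # loop also sees the appended items
print(count_visits([1, 2], snapshot=True))   # loop sees only the two originals
```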
Of course, you can. The first way is normal, but in some cases you can also use list comprehensions or map().

Assigning values to elements of list

When I have a for loop:
for row in list:
    row = something_or_other
It seems that sometimes I can assign a value (or append/extend etc.) directly to row and the list changes accordingly, and sometimes I have to do something roundabout like:
for row in list:
    list[list.index(row)] = something_or_other
What gives?!?
You can never change the list by reassigning row (or, in general, whatever your iterating variable is) like this:
lst = [1, 2, 3]
for x in lst:
    x = ...  # code
because this rebinds the variable x entirely (it's saying "forget that x was a member of a list").
However, if x is mutable, for example if it's a list, you can do:
lst = [[1, 2], [3, 4]]
for x in lst:
    x.append(10)
and it will actually change the values (to [[1, 2, 10], [3, 4, 10]]). In technical terms, this is the difference between rebinding and mutating operations.
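The contrast can be seen side by side in the following sketch. The two function names are invented for illustration; the only difference between them is x = x + [10] (rebinding) versus x += [10] (in-place mutation of a list):

```python
def rebind_each(lst):
    """Rebinding: x = x + [10] builds a new list and drops it again."""
    for x in lst:
        x = x + [10]  # x now names a brand-new list; lst is not affected
    return lst


def mutate_each(lst):
    """Mutating: x += [10] extends the very list object stored in lst."""
    for x in lst:
        x += [10]     # for lists this works in place, like x.extend([10])
    return lst


print(rebind_each([[1, 2], [3, 4]]))
print(mutate_each([[1, 2], [3, 4]]))
```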
Assigning to lst[lst.index(row)] results in O(n²) performance instead of O(n), and may cause errors if the list contains multiple identical items.
Instead, assign a new list, constructed with a list comprehension or map:
lst = [1,2,3,4]
doubled = [n*2 for n in lst]
Alternatively, you can use enumerate if you really want to modify the original list:
for i,n in enumerate(lst):
    lst[i] = n*2
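To illustrate the duplicate-item problem with lst.index() mentioned above, here is a small sketch comparing it with enumerate on a list that contains the same value twice. Both helper names are hypothetical:

```python
def double_with_index(values):
    """Buggy on duplicates: index() always finds the first matching value."""
    rows = list(values)
    for row in rows:
        rows[rows.index(row)] = row * 2
    return rows


def double_with_enumerate(values):
    """Correct: enumerate supplies the position of the current item."""
    rows = list(values)
    for i, row in enumerate(rows):
        rows[i] = row * 2
    return rows


print(double_with_index([1, 2, 1]))      # wrong positions get overwritten
print(double_with_enumerate([1, 2, 1]))  # every item doubled exactly once
```

In the index() version, the second item (2) is looked up after the first item has already become 2, so the wrong slot is overwritten.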
row in the for loop is just a name for the original element, and reassigning it inside the loop effectively breaks that link. If the element is mutable, you can call methods on it (such as append, add, extend, etc.), and the changes will be reflected in the underlying object.
The correct idiom is to use:
for rowno, row in enumerate(some_list):
    some_list[rowno] = ...  # new value for this row
