Problems removing element while iterating over list [duplicate] - python

This question already has answers here:
Modify a list while iterating [duplicate]
(4 answers)
Closed 2 years ago.
As a beginner, I am writing a simple script to better acquaint myself with Python. I ran the code below and I am not getting the expected output. I think the for loop ends before the last iteration, and I don't know why.
letters = ['a', 'b', 'c', 'c', 'c']
print(letters)
for item in letters:
    if item != 'c':
        print('not c')
    else:
        letters.remove(item)
        continue
print(letters)
output returned:
['a', 'b', 'c', 'c', 'c']
not c
not c
['a', 'b', 'c']
Expected Output:
['a', 'b', 'c', 'c', 'c']
not c
not c
['a', 'b']
Basically, I am not expecting to have 'c' within my list anymore.
If you have a better way to write the code that would be appreciated as well.

WARNING: This is an inefficient solution that I will provide to answer your question. I'll post a more concise and faster solution in answer #2.
Answer #1
When you remove items like this, the list changes length underneath the loop, so it is safer not to iterate over the list you are modifying. letters[::-1] builds a reversed copy, so the loop walks the copy while remove() changes the original:
letters = ['a', 'b', 'c', 'c', 'c']
print(letters)
for item in letters[::-1]:
    if item != 'c':
        print('not c')
    else:
        letters.remove(item)
        continue
print(letters)
output:
['a', 'b', 'c', 'c', 'c']
not c
not c
['a', 'b']
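If the list really has to be modified in place, a reverse loop over indexes is one way to avoid the skipping problem, because deleting position i only shifts elements the loop has already looked at. A minimal sketch (the helper name remove_all_in_place is made up for illustration, it is not from the question):

```python
def remove_all_in_place(items, target):
    """Delete every occurrence of target from items, mutating the list."""
    # Walk from the last index down to 0; deleting items[i] only shifts
    # elements at higher indexes, which have already been checked.
    for i in range(len(items) - 1, -1, -1):
        if items[i] == target:
            del items[i]
    return items


letters = ['a', 'b', 'c', 'c', 'c']
remove_all_in_place(letters, 'c')
print(letters)  # ['a', 'b']
```

Unlike remove(), del items[i] does not search the list again for each deletion.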
Answer #2 - Use list comprehension instead of looping (more detail: Is there a simple way to delete a list element by value?):
letters = ['a', 'b', 'c', 'c', 'c']
letters = [x for x in letters if x != 'c']
print(letters)
output:
['a', 'b']

letters.remove(item) removes only a single instance of the element, but it also shrinks the list while you are iterating over it. Modifying the list you are iterating over is something you generally want to avoid. The for loop tracks its position by index, so once the list becomes shorter the loop reaches the end early and stops, even though the last 'c' is still in the list. This can be seen in the output of:
letters = ['a', 'b', 'c', 'c', 'c']
print(letters)
for idx, item in enumerate(letters):
    print("Index: {} Len: {}".format(idx, len(letters)))
    if item != 'c':
        print('not c')
    else:
        letters.remove(item)
        continue
print(letters)
"""Index: 0 Len: 5
not c
Index: 1 Len: 5
not c
Index: 2 Len: 5
Index: 3 Len: 4"""
You never reach the last element: after the second removal the list has only three items (indexes 0-2), so the next index the loop would use (4) is past the end and the loop stops.
If you want to filter a list, you can also use the built-in filter function (in Python 3 it returns an iterator, so wrap it in list()):
letters = list(filter(lambda x: x != 'c', letters))
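If other parts of the program hold a reference to the same list, rebinding the name to a new list will not update them. One option, sketched here under that assumption, is slice assignment, which replaces the contents of the existing list object (the name alias is only for illustration):

```python
letters = ['a', 'b', 'c', 'c', 'c']
alias = letters  # a second reference to the same list object

# Build the filtered list first, then replace the contents in one step.
letters[:] = [x for x in letters if x != 'c']

print(letters)           # ['a', 'b']
print(alias is letters)  # True
```

The comprehension finishes before the assignment happens, so nothing is removed while the list is being iterated.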

Related

How to efficiently get common items from two lists that may have duplicates?

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
The common items are:
c = ['a', 'b', 'a']
The code:
for e in my_list:
    if e in my_list_2:
        c.append(e)
...
If the my_list is long, this would be very inefficient. If I convert both lists into two sets, then use set's intersection() function to get the common items, I will lose the duplicates in my_list.
How to deal with this efficiently?
A dict is already a hash map, so lookups are practically as efficient as with a set, and you may not need to do any extra work collecting the values. If it weren't, you could pack the values into a set and check against that instead.
However, a large improvement may be to write a generator for the values rather than creating a new intermediate list, and to iterate over it where you actually want the values:
def foo(src_dict, check_list):
    for value in check_list:
        if value in src_dict:
            yield value
With the edit, you may find you're better off packing all the inputs into a set
def foo(src_list, check_list):
    hashmap = set(src_list)
    for value in check_list:
        if value in hashmap:
            yield value
If you know a lot about the inputs, you can do better, but that's an unusual case. For example, if the lists are sorted you could bisect, or if you have a huge list to verify against or very few values to check, you may find some efficiency in the ordering and in whether you build a set.
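For the lists in the question, the set-based generator could be used roughly like this. It is a sketch only: the name common_items is invented here, and note the argument order, since the list that gets turned into a set comes first.

```python
def common_items(src_list, check_list):
    """Yield each value of check_list that also occurs in src_list."""
    lookup = set(src_list)  # built once; membership tests are O(1) on average
    for value in check_list:
        if value in lookup:
            yield value


my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

# Iterate over my_list (duplicates and order are kept), look up in my_list_2.
result = list(common_items(my_list_2, my_list))
print(result)  # ['a', 'b', 'a']
```

Wrapping the call in list() is only needed when you want all the values at once; a for loop can consume the generator directly.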
I am not sure about time efficiency, but, personally speaking, list comprehension would always be more of interest to me:
[x for x in my_list if x in my_list_2]
Output
['a', 'b', 'a']
First, utilize the set.intersection() method to get the intersecting values in the list. Then, use a nested list comprehension to include the duplicates based on the original list's count on each value:
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = [x for x in set(my_list).intersection(set(my_list_2)) for _ in range(my_list.count(x))]
print(c)
The above may be slower than just
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = []
for e in my_list:
    if e in my_list_2:
        c.append(e)
print(c)
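But when my_list_2 is significantly larger, the set-based version can be faster, because membership tests against a set take constant time on average. Note that my_list.count(x) still scans my_list once per common value, and the result is grouped by value in set order rather than in the order of my_list.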
Sorry, I did not read the post carefully, and it is no longer possible to delete this answer. However, here is an attempt at a solution (note that it drops the duplicates):
c = lambda my_list, my_list_2: (my_list, my_list_2, list(set(my_list).intersection(set(my_list_2))))
print("(list_1,list_2,duplicate_items) ->", c(my_list, my_list_2))
Output:
(list_1,list_2,duplicate_items) -> (['a', 'b', 'a', 'd', 'e', 'f'], ['a', 'b', 'c'], ['b', 'a'])
or can be
[i for i in my_list if i in my_list_2]
output:
['a', 'b', 'a']

How to remove elements from a list that appear less than k = 2?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list with the letters that appear at least twice, to get the following list:
letters_appear_twice = ['a', 'b'].
But since this is part of a bigger code, I don't know exactly what my lists looks like, only that I want to keep the letters that are repeated at least twice. But for the sake of understanding, we can assume I know what the list looks like!
I have tried the following:
'''
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
'''
But this doesn't quite work like I want it to...
Thank you in advance for any help!
letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints:
['b', 'a']
Building on your code above, you can make a new list and append to it:
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output
['b', 'a']
Easier to create a new list instead of manipulating the source list
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result

def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result)  # prints ['a', 'b']

if __name__ == "__main__":
    main()
If you don't mind shuffling the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort later.
This solution is efficient for lists with more than ~10 unique elements.
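If the order of first appearance matters, a variation on the Counter idea might look like the sketch below. The function name keep_at_least_k is made up for illustration, and it assumes Python 3.7+, where dict preserves insertion order.

```python
from collections import Counter


def keep_at_least_k(items, k):
    """Return the distinct values that occur at least k times, in first-seen order."""
    counts = Counter(items)  # one pass to count every value
    # dict.fromkeys keeps a single copy of each value in the order first seen.
    return [x for x in dict.fromkeys(items) if counts[x] >= k]


letters = ['a', 'a', 'b', 'b', 'b', 'c']
print(keep_at_least_k(letters, 2))  # ['a', 'b']
```

This counts each value once instead of calling letters.count(x) for every unique element.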

Creating an irregular list of lists from a single list

I'm trying to create a list of lists from a single list. I'm able to do this if the new sublists all have the same number of elements; however, this will not always be the case.
As said earlier, the function below works when the sublists have the same number of elements.
I've tried using regular expressions to determine if an element matches a pattern, using
pattern2=re.compile(r'\d\d\d\d\d\d'), because the first value in each new sublist will always be 6 digits and it will be the only one that follows that format. However, I'm not sure of the syntax for getting it to stop at the next match and create another list.
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]
The code above works if the list of lists will contain the same number of elements
Below is what I expect.
OldList=[111111,a,b,c,d,222222,a,b,c,333333,a,d,e,f]
DesiredList=[[111111,a,b,c,d],[222222,a,b,c],[333333,a,d,e,f]]
Many thanks indeed.
Cheers
There is likely a much more efficient way to do this (with fewer loops), but here is one approach: find the indexes of the breakpoints, then slice the list from index to index, appending None to the end of the index list to capture the remaining items. If your 6-digit numbers are really strings, you could drop the str() inside re.match().
import re
d = [111111,'a','b','c','d',222222,'a','b','c',333333,'a','d','e','f']
indexes = [i for i, x in enumerate(d) if re.match(r'\d{6}', str(x))]
groups = [d[s:e] for s, e in zip(indexes, indexes[1:] + [None])]
print(groups)
# [[111111, 'a', 'b', 'c', 'd'], [222222, 'a', 'b', 'c'], [333333, 'a', 'd', 'e', 'f']]
You can use a fold.
First, define a function to locate the start flag:
>>> def is_start_flag(v):
...     return len(v) == 6 and v.isdigit()
That will be useful if the flags are not exactly what you expected them to be, or to exclude some false positives, or even if you need a regex.
Then use functools.reduce:
>>> L = d = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c', '333333', 'a', 'd', 'e', 'f']
>>> import functools
>>> functools.reduce(lambda acc, x: acc+[[x]] if is_start_flag(x) else acc[:-1]+[acc[-1]+[x]], L, [])
[['111111', 'a', 'b', 'c', 'd'], ['222222', 'a', 'b', 'c'], ['333333', 'a', 'd', 'e', 'f']]
If the next element x is the start flag, then append a new list [x] to the accumulator. Else, add the element to the current list, ie the last list of the accumulator.
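The same idea can also be written as a plain loop that extends the last group in place instead of rebuilding the accumulator on every step. This is only a sketch: split_on_flags is a hypothetical name, and this version of is_start_flag converts values with str() so it accepts both ints and strings.

```python
def is_start_flag(value):
    """True for values that look like a 6-digit marker (int or str)."""
    text = str(value)
    return len(text) == 6 and text.isdigit()


def split_on_flags(items, is_start=is_start_flag):
    """Split items into sublists, starting a new one at every start flag."""
    groups = []
    for item in items:
        # Start a new group on a flag, or if nothing has been started yet.
        if is_start(item) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


d = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c', '333333', 'a', 'd', 'e', 'f']
print(split_on_flags(d))
# [['111111', 'a', 'b', 'c', 'd'], ['222222', 'a', 'b', 'c'], ['333333', 'a', 'd', 'e', 'f']]
```

Items that appear before the first flag end up in a group of their own rather than raising an error.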

Check list item against multiple lists and remove if present in any of them. Python [duplicate]

This question already has answers here:
remove elements in one list present in another list [duplicate]
(2 answers)
Closed 8 years ago.
I have a main list such as:
mainlst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
and I want to search each item in this mainlst against multiple other search lists and if it's present in any of them to remove it from the main list, so for example:
searchlst1 = ['a', 'b', 'c']
searchlst2 = ['a', 'd', 'f']
searchlst3 = ['e', 'f', 'g']
The issue I'm having is that I can't work out how to make the loop go through each statement. If I use an if/elif chain, it exits the loop as soon as it has found a match:
for item in mainlst:
    if item in searchlst1:
        mainlst.remove(item)
    elif item in searchlst2:
        mainlst.remove(item)
    elif item in searchlst3:
        mainlst.remove(item)
but obviously this exits the loop as soon as one condition is true, how do I make the loop go through all the conditions?
set objects are great for stuff like this -- the in operator takes O(1) time compared to O(N) time for a list -- And it's easy to create a set from a bunch of existing lists using set.union:
search_set = set().union(searchlst1, searchlst2, searchlst3)
mainlst = [x for x in mainlst if x not in search_set]
Example:
>>> search_set = set().union(searchlst1, searchlst2, searchlst3)
>>> search_set
set(['a', 'c', 'b', 'e', 'd', 'g', 'f'])
>>> mainlst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> mainlst = [x for x in mainlst if x not in search_set]
>>> mainlst
['h']
How about using a list comprehension and a set:
[i for i in mainlst if i not in set(searchlst1 + searchlst2 + searchlst3)]
returns ['h']
set() takes an iterable (in this case a group of lists) and returns a set containing the unique values. Tests for membership in a set always take the same amount of time, whereas testing for membership in a list scales linearly with the length of the list.
The list comprehension goes through each element of mainlst and constructs a new list whose members are not in the set:
>>> mainlst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> search = set(searchlst1 + searchlst2 + searchlst3)
>>> search
set(['a', 'c', 'b', 'e', 'd', 'g', 'f'])
>>> [i for i in mainlst if i not in search]
['h']
Replacing the elif chain with a single combined condition, and looping over a copy of the list, will fix your problem. (Three separate if statements would call remove() a second time for an item such as 'a' that appears in two search lists, which raises ValueError.)
for item in mainlst[:]:  # iterate over a copy so removals don't skip items
    if item in searchlst1 or item in searchlst2 or item in searchlst3:
        mainlst.remove(item)
The problem now is that you're doing up to three searches through the search lists for every item. This will become more time-consuming as the list or the search lists grow. Also, in your example there are duplicates across your search lists.
Combining the search lists will reduce the number of comparisons.
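Combining the search lists could be wrapped in a small helper that accepts any number of them. This is a sketch only: remove_present is an invented name, and it assumes the items are hashable so they can go into a set.

```python
def remove_present(main_list, *search_lists):
    """Return the items of main_list that occur in none of the search lists."""
    # Merge all search lists into one set so each membership test is O(1).
    excluded = set().union(*search_lists)
    return [item for item in main_list if item not in excluded]


mainlst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
searchlst1 = ['a', 'b', 'c']
searchlst2 = ['a', 'd', 'f']
searchlst3 = ['e', 'f', 'g']

print(remove_present(mainlst, searchlst1, searchlst2, searchlst3))  # ['h']
```

The function returns a new list and leaves mainlst unchanged, so nothing is removed during iteration.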

Removing element messes up the index [duplicate]

This question already has answers here:
Loop "Forgets" to Remove Some Items [duplicate]
(10 answers)
Closed 8 years ago.
I have a simple question about lists
Suppose that I want to delete all 'a's from a list:
list = ['a', 'a', 'b', 'b', 'c', 'c']
for element in list:
    if element == 'a':
        list.remove('a')
print(list)
==> result:
['a', 'b', 'b', 'c', 'c']
I know this is happening because, after I remove the first 'a', the list index gets incremented while all the elements get pushed left by 1.
In other languages, I guess one way to solve this is to iterate backwards from the end of the list..
However, iterating through reversed(list) returns the same error.
Is there a pythonic way to solve this problem??
Thanks
One of the more Pythonic ways:
>>> list(filter(lambda x: x != 'a', ['a', 'a', 'b', 'b', 'c', 'c']))
['b', 'b', 'c', 'c']
You should never modify a list while iterating over it.
A better approach would be to use a list comprehension to exclude an item:
list1 = ['a', 'a', 'b', 'b', 'c', 'c']
list2 = [x for x in list1 if x != 'a']
Note: Don't use list as a variable name in Python - it masks the built-in list type.
You are correct, when you remove an item from a list while iterating over it, the list index gets out of sync. What both the other existing answers are hinting at is that you need to create a new list and copy over only the items you want.
For example:
existing_list = ['a', 'a', 'b', 'c', 'd', 'e']
new_list = []
for element in existing_list:
    if element != 'a':
        new_list.append(element)
existing_list = new_list
print(existing_list)
outputs: ['b', 'c', 'd', 'e']
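To see the skipping described in the question, it can help to record which elements the loop actually visits. A small sketch (the names items and visited are just for illustration):

```python
items = ['a', 'a', 'b', 'b', 'c', 'c']
visited = []

for element in items:
    visited.append(element)  # record what the loop actually sees
    if element == 'a':
        items.remove('a')    # shifts every later element one slot left

print(visited)  # ['a', 'b', 'b', 'c', 'c']
print(items)    # ['a', 'b', 'b', 'c', 'c']
```

Only one 'a' is ever visited: after the first removal the second 'a' slides into index 0, which the loop has already passed.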
