How to remove duplicates from an arraylist using Python? - python

Q1:
I have an arraylist:
x = [[1,2,-1],[1,-1,0],[-1,0,1]]
In the end I want to get x = [[1,2,-1],[1,-1,0]], because [1,-1,0] and [-1,0,1] contain the same elements, just in a different order.
Q2:
For
temp = [[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
The same idea: I want to get temp = [[0,0,0]], which means dropping all the other duplicates in the arraylist, just like in Q1.
My code does not work. It says list index out of range, even though I use temp2 so that len(temp1) does not change. Why?
temp1 = [[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
temp2 = temp1
for i in range(0, len(temp1)):
    for j in range(i+1, len(temp1)):
        if(set(temp1[i]) == set(temp1[j])):
            temp2.remove(temp2[i])

You shouldn't change the list you're iterating over! Also, temp2 = temp1 doesn't make a copy; afterwards you only have two names that refer to the same list. If you want a (shallow) copy, you could use temp2 = temp1.copy(), temp2 = temp1[:] or temp2 = list(temp1).
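The difference between a second name and a shallow copy can be checked with `is`. A minimal sketch; the names `alias` and `shallow` are illustrative and not from the question:

```python
temp1 = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

alias = temp1        # second name for the same list object
shallow = temp1[:]   # new outer list that holds the same inner lists

alias.pop()          # removing through the alias also shortens temp1

alias_is_same = alias is temp1       # True
shallow_is_same = shallow is temp1   # False
lengths = (len(temp1), len(alias), len(shallow))  # (3, 3, 4)
```

Only the shallow copy keeps its original length, which is why `temp2 = temp1` does not protect `len(temp1)` from changing.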
A general note: two nested loops have quadratic runtime behaviour. It would be faster to keep the already processed items in a set, which has O(1) lookup (most of the time):
temp1 = [[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
temp2 = []  # simply creating a new list is probably easier.
seen = set()
for item in temp1:
    # lists are not hashable so convert it to a frozenset (takes care of the order as well)
    item_as_frozenset = frozenset(item)
    if item_as_frozenset not in seen:
        temp2.append(item)
        seen.add(item_as_frozenset)
If you can and want to use a third-party package, I have one that contains an iterator that does exactly that, iteration_utilities.unique_everseen:
>>> from iteration_utilities import unique_everseen
>>> temp1 = [[0,0,0], [0,0,0], [0,0,0], [0,0,0]]
>>> list(unique_everseen(temp1, key=frozenset))
[[0, 0, 0]]
>>> x = [[1,2,-1], [1,-1,0], [-1,0,1]]
>>> list(unique_everseen(x, key=frozenset))
[[1, 2, -1], [1, -1, 0]]
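One caveat about the frozenset key used above: a frozenset ignores how often a value occurs, so [1, 1, 2] and [1, 2, 2] would count as duplicates. If repeated values should matter, a key built from collections.Counter is one option. This is only a sketch; the function name `unique_by_multiset` is made up for illustration:

```python
from collections import Counter

def unique_by_multiset(list_of_lists):
    """Keep the first list seen for each distinct multiset of elements."""
    seen = set()
    result = []
    for item in list_of_lists:
        # Counter records how often each value occurs; the frozenset of its
        # (value, count) pairs is hashable and independent of element order.
        key = frozenset(Counter(item).items())
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result

data = [[1, 1, 2], [1, 2, 2], [2, 1, 1]]
same_as_sets = frozenset(data[0]) == frozenset(data[1])  # True: counts are ignored
deduped = unique_by_multiset(data)  # [[1, 1, 2], [1, 2, 2]]
```

For the lists in the question both keys give the same result; the difference only shows up when values repeat inside a sublist.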

Q1. If you want to consider lists equal when they contain the same elements, one way to do this is to sort them before comparison, like here:
def return_unique(list_of_lists):
    unique = []
    already_added = set()
    for item in list_of_lists:
        # Convert to tuple, because lists are not hashable.
        # We consider two things to be the same regardless of the order
        # so before converting to tuple, we also sort the list.
        # This way [1, -1, 0] and [-1, 0, 1] both become (-1, 0, 1)
        sorted_tuple = tuple(sorted(item))
        # Check if we've already seen this tuple.
        # If we haven't seen it yet, add the original list (in its
        # original order) to the list of unique items
        if sorted_tuple not in already_added:
            already_added.add(sorted_tuple)
            unique.append(item)
    return unique
temp1 = [[1, 2, -1], [1, -1, 0], [-1, 0, 1]]
temp2 = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(return_unique(temp1))
print(return_unique(temp2))
Q2. Just assigning temp2 = temp1 does not create a new independent copy -- they both still refer to the same list. In this case, it would be possible to create an independent copy using copy.deepcopy:
import copy
temp2 = copy.deepcopy(temp1)
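How copy.deepcopy differs from a shallow copy such as temp1[:] only becomes visible when an inner list is mutated. A small sketch with illustrative names (`shallow`, `deep`):

```python
import copy

temp1 = [[0, 0, 0], [1, 1, 1]]
shallow = temp1[:]            # new outer list, shared inner lists
deep = copy.deepcopy(temp1)   # new outer list and new inner lists

temp1[0].append(99)           # mutate an inner list in place

shallow_sees_change = shallow[0] == [0, 0, 0, 99]   # True
deep_sees_change = deep[0] == [0, 0, 0, 99]         # False
```

For the duplicate removal in this question the inner lists are never modified, so a shallow copy would probably be enough; deepcopy is the safer choice when they might be.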

To remove duplicates, we can first sort each sublist:
lsts = [[1,2,-1],[1,-1,0],[-1,0,1]]
lsts = [sorted(x) for x in lsts]
Then convert the lists to tuples and add them to a set, which eliminates duplicates (we cannot add lists to a set since they're not hashable, so we have to convert them to tuples first):
res = set()
for x in lsts:
    res.add(tuple(x))
Then we can convert the tuples and the set back to lists:
lsts = list(list(x) for x in res)
print(lsts) # e.g. [[-1, 1, 2], [-1, 0, 1]] (the order of a set is not guaranteed)
Your code fails because you're modifying the list you're iterating over: removing items makes the list shorter, and then you try to access an index which no longer exists. You can avoid the index error by iterating over the list without using indexes:
temp1 = [[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
for x in temp1:
    temp2 = temp1[:] # create a real copy of temp1
    temp2.remove(x) # remove x so we won't consider it as dup of itself
    for y in temp2:
        if set(x) == set(y):
            temp1.remove(x)
print(temp1) # [[0, 0, 0]]
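The index error from the question can be reproduced in isolation. The sketch below runs the question's loop inside a try/except (the `error` variable is added here only to capture the exception):

```python
temp1 = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
temp2 = temp1  # same list object, as in the question

error = None
try:
    for i in range(0, len(temp1)):           # range(0, 4) is fixed at this point
        for j in range(i + 1, len(temp1)):   # evaluated once per outer pass
            if set(temp1[i]) == set(temp1[j]):
                temp2.remove(temp2[i])       # shrinks temp1 as well
except IndexError as exc:
    error = exc

remaining = len(temp1)
```

The inner range is computed while the list still has four items, so after two removals the loop asks for temp1[3], which no longer exists.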

A set will do:
lsts = [[1,2,-1],[1,-1,0],[-1,0,1]]
result = {tuple(sorted(x)) for x in lsts}

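You may use groupby. Note that it only merges equal keys that are adjacent, so in general the list should first be sorted by the same key:
from itertools import groupby
[key for key, group in groupby(x, key=sorted)]
Output:
[[-1, 1, 2], [-1, 0, 1]]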
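itertools.groupby only collapses runs of consecutive items with equal keys, so the result depends on the input order. A sketch with a reordered version of the list from Q1 (the reordering is an assumption made to show the effect):

```python
from itertools import groupby

# The two equivalent lists are deliberately not adjacent here.
x = [[1, -1, 0], [1, 2, -1], [-1, 0, 1]]

# Without sorting, groupby sees three separate runs.
unsorted_keys = [key for key, _group in groupby(x, key=sorted)]

# Sorting by the same key first puts equivalent lists next to each other.
sorted_keys = [key for key, _group in groupby(sorted(x, key=sorted), key=sorted)]
```

Here `unsorted_keys` still contains the duplicate, while `sorted_keys` does not; the price is that the original order and the original element order inside each sublist are lost.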

This works for Q2.
temp1 = [[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
temp2 = []
for element in temp1:
    if element not in temp2:
        temp2.append(element)
print(temp2) # [[0, 0, 0]]

Related

Remove list from list of lists if condition is met

I have a list of lists containing an index and two coordinates, [i,x,y] eg:
L = [[1,0,0],[2,0,1],[3,1,2]]
I want to check if L[i][1] is repeated (as is the case in the example for i=0 and i=1) and keep in the list only the sublist with the smallest i. In the example [2,0,1] would be removed and L would be:
L = [[1,0,0],[3,1,2]]
Is there a simple way to do such a thing?
Keep a set of the x coordinates we've already seen, traverse the input list sorted by ascending i, and build an output list adding only the sublists whose x we haven't seen yet:
L = [[1, 0, 0], [2, 0, 1], [3, 1, 2]]
ans = []
seen = set()
for sl in sorted(L):
    if sl[1] not in seen:
        ans.append(sl)
        seen.add(sl[1])
L = ans
It works as required:
L
=> [[1, 0, 0], [3, 1, 2]]
There are probably better solutions, but you can do it with:
i1_list = []
result_list = []
for i in L:
    if i[1] not in i1_list:
        result_list.append(i)
        i1_list.append(i[1])
print(result_list)
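The second snippet keeps the first occurrence of each x rather than the one with the smallest i, so it matches the requirement only when the input is already ordered by i. A possible variant that does not depend on the input order is sketched below; the function name `keep_smallest_index_per_x` is hypothetical:

```python
def keep_smallest_index_per_x(rows):
    """For rows of the form [i, x, y], keep one row per x: the one with the smallest i."""
    best = {}
    for row in rows:
        x = row[1]
        # Remember a row if its x is new or if it has a smaller i than the stored one.
        if x not in best or row[0] < best[x][0]:
            best[x] = row
    # Return the surviving rows ordered by i.
    return sorted(best.values())

L = [[3, 1, 2], [2, 0, 1], [1, 0, 0]]   # deliberately not sorted by i
result = keep_smallest_index_per_x(L)   # [[1, 0, 0], [3, 1, 2]]
```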

Trying to compare two lists, find the equal values and replace them with the index of the other list, but it is not working

Basically, I have a list that contains all the possibles values of the second list. For example:
First list (Possible values):
list1 = ['cat','dog','pig']
Second list:
list2 = ['dog','cat','cat','cat','dog','pig','cat','pig']
I want to compare those lists and replace every string in the second list with its index in the first one.
So expect something like this:
list2 = [1,0,0,0,1,2,0,2]
I've tried it in several different ways. The first one worked, but it was not a smart method: if the first list had a huge variety of possible values, this strategy would not be practical to code.
That was the first solution:
list3 = []
for i in list2:
    if i == 'cat':
        i = 0
        list3.append(i)
    elif i == 'dog':
        i = 1
        list3.append(i)
    elif i == 'pig':
        i = 2
        list3.append(i)
list2 = list3
print(list2)
output
[1, 0, 0, 0, 1, 2, 0, 2]
But I want a solution that works in a huge variety of possible values without having to code each test.
So I tried this (and other failed attempts), but it isn't working
for i in list2:
    for j in list1:
        if i == j:
            i = list1.index(j)
The problem with your code is that you are simply replacing i on each iteration. You want to create a list and append the result from list1.index(j) to it on each iteration:
l = []
for i in list2:
    for j in list1:
        if i == j:
            l.append(list1.index(j))
Note that this can be simplified with a list comprehension:
[list1.index(i) for i in list2]
# [1, 0, 0, 0, 1, 2, 0, 2]
Note that for a lower complexity solution, you can create a dictionary mapping strings to indices and build the list by looking up the strings of list2 in it, as in @blhshing's answer.
Some reads you might find useful:
Data Structures
List comprehensions
string — Common string operations
The i you get by iterating over the list is just a name for the current element; assigning to it does not write anything back into the list. Hence no change.
Create a dictionary that maps each animal to its index position.
Use a list comprehension to create a new list by replacing each animal in list2 with its lookup index:
list1 = ['cat','dog','pig']
lookup = {animal:index for index,animal in enumerate(list1)}
list2 = ['dog','cat','cat','cat','dog','pig','cat','pig']
result = [lookup.get(what) for what in list2]
print(result) # [1, 0, 0, 0, 1, 2, 0, 2]
Documentation:
enumerate(iterable)
Create a dictionary with list comprehension in Python (creation of the lookup)
list comprehension
Why dict.get(key) instead of dict[key]?
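The practical difference between dict.get(key) and dict[key] shows up when list2 contains a value that is missing from list1. A short sketch; the extra value 'cow' and the default -1 are assumptions made for the example:

```python
list1 = ['cat', 'dog', 'pig']
list2 = ['dog', 'cat', 'cow']   # 'cow' is not in list1

lookup = {animal: index for index, animal in enumerate(list1)}

with_get = [lookup.get(name) for name in list2]          # [1, 0, None]
with_default = [lookup.get(name, -1) for name in list2]  # [1, 0, -1]

missing = None
try:
    strict = [lookup[name] for name in list2]
except KeyError as exc:
    missing = exc.args[0]   # the key that was not found
```

Which behaviour is preferable depends on whether an unknown value should be tolerated or treated as an error.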
You can use a dict comprehension to create a mapping dict that maps keys to indices using the enumerate function, and then map items in list2 to the values of the mapping dict:
mapping = {k: i for i, k in enumerate(list1)}
list(map(mapping.get, list2))
This returns:
[1, 0, 0, 0, 1, 2, 0, 2]
You can use only one for loop and one if-statement that is easy to understand.
list1 = ['cat','dog','pig']
list2 = ['dog','cat','cat','cat','dog','pig','cat','pig']
for i,item in enumerate(list2):
    if item in list1:
        list2[i] = list1.index(item)
# list2 = [1, 0, 0, 0, 1, 2, 0, 2]

Create array containing list of lists of n repeated items in Python

I am trying to find a faster way to create such a list:
import numpy as np
values = [0,1,2]
repeat = [3,4,2]
list = np.empty(0, dtype=int)
for i in range(len(values)):
    list = np.append(list, np.full(repeat[i], values[i]))
print(list)
returns
[0 0 0 1 1 1 1 2 2]
Any idea? Thanks
You can save a lot of time using native python lists instead of numpy arrays. When I ran your code using the timeit module, it took 16.87 seconds. The following code took 0.87.
list = []
for val, rep in zip(values, repeat):
    list.extend([val]*rep)
If you then convert list to a numpy array using list = np.array(list), that time goes up to 2.09 seconds.
Of course, because numpy is optimized for large amounts of data, this may not hold for very long lists of values with large numbers of repeats. In that case, one alternative would be to do your memory allocation all at once, instead of continually lengthening the array (which I believe covertly causes a copy to be made, which is slow). The example below completes in 4.44 seconds.
list = np.empty(sum(repeat), dtype=int) # allocate the full length
i = 0 # start the index at 0
for val, rep in zip(values, repeat):
    list[i:i+rep] = [val]*rep # replace the slice
    i += rep # update the index
You can try this: for each pair of value and repeat count, multiply a one-element list holding the value by the count.
You will get a list of lists:
L = [[i]*j for i, j in zip(values, repeat)]
print(L)
returns
[[0, 0, 0], [1, 1, 1, 1], [2, 2]]
Then make a flat list:
flat_L = [item for sublist in L for item in sublist]
print(flat_L)
[0, 0, 0, 1, 1, 1, 1, 2, 2]
I would do it like this:
a=[1,2,3]
b=[2,4,3]
x=[[y]*cnt_b for cnt_b,y in zip(b,a)]
Output:
[[1,1],[2,2,2,2],[3,3,3]]
In [8]: [i for i, j in zip(values, repeat) for _ in range(j)]
Out[8]: [0, 0, 0, 1, 1, 1, 1, 2, 2]
Here, we zip values and repeat together to get a one-to-one correspondence between them (like [(0, 3), (1, 4), (2, 2)]). Then, in the list comprehension, each value i is emitted once for every element of range(j), so it is repeated j times.
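The same pairing of values and counts can also be written with the standard library's itertools. This is a sketch, not a benchmarked alternative; the question's `repeat` list is renamed to `counts` here to avoid clashing with itertools.repeat:

```python
from itertools import chain, repeat

values = [0, 1, 2]
counts = [3, 4, 2]   # called `repeat` in the question

# map(repeat, values, counts) yields repeat(0, 3), repeat(1, 4), repeat(2, 2);
# chain.from_iterable concatenates them lazily.
flat = list(chain.from_iterable(map(repeat, values, counts)))
```

If the result has to be a NumPy array anyway, `np.repeat(values, counts)` should produce the same sequence directly.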

Assigning elements to a lists within a list

I am running Python3.6 and am working with lists which contain other lists within it.
list_array = [[1,0,1,0,2,2],
[1,1,2,0,1,2],
[2,2,2,1,0,1]]
I would like to modify the list called list_array by deleting all the entries with value 2 within the sublists.
The code I used for this is
for k in list_array:
    k = [x for x in k if x!=2]
However, this code doesn't modify list_array.
Why isn't it possible to replace the elements in the lists within list_array this way?
You are creating a new list instead of assigning to the old one. You can fix this by adding an assignment using k[:] =, like this:
for k in list_array:
    k[:] = [x for x in k if x!=2]
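The difference between rebinding the name k and assigning to the slice k[:] can be observed on a single sublist. A minimal sketch; the helper name `first` is only illustrative:

```python
list_array = [[1, 0, 1, 0, 2, 2], [1, 1, 2, 0, 1, 2]]

first = list_array[0]

k = first
k = [x for x in k if x != 2]     # rebinds k to a brand-new list
rebinding_changed_original = first == [1, 0, 1, 0]          # False

k = first
k[:] = [x for x in k if x != 2]  # replaces the contents of the existing list
slice_changed_original = list_array[0] == [1, 0, 1, 0]      # True
still_same_object = k is list_array[0]                      # True
```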
Your code creates a new list on every iteration, binds it to k, and discards the previous one; list_array itself is never touched.
After the last iteration, k is:
k = [1, 0, 1]
Instead, a list comprehension works fine:
list_array = [[x for x in sublist if x != 2] for sublist in list_array]
Output:
[[1, 0, 1, 0], [1, 1, 0, 1], [1, 0, 1]]
If you want to write it with an explicit for loop, it could be done like this:
new_list_array = list()
for sublist in list_array:
    new_list_array.append([x for x in sublist if x != 2])
You are not modifying the old list. You can modify it like this:
list_array = [[1,0,1,0,2,2],[1,1,2,0,1,2],[2,2,2,1,0,1]]
for k in list_array:
    k[:] = [x for x in k if x!=2]
print(list_array)

How to remove list of list at specific index

I'm trying to figure out how to delete an entire inner list at a specific index if its first element meets a certain condition. Below I've shown an example of what I want done, but when I run the code I get a list index out of range error in Python. If list[i][0] meets a certain condition, I want that entire inner list deleted from the overall list.
list = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
for i in range(0, len(list)):
    if list[i][0] == 0:
        del list[i]
return list
Below I've shown a picture of what happens when I run the sample code in IDLE. The first time I run the loop it gives an error, but the second time I run the code (copied and pasted both times) it doesn't, and it does what I'm asking.
[Image: Weird Python Error]
Deleting an element from the middle of a list moves everything down, breaking indexing. You need to either remove elements from the end:
for i in range(len(lst)-1, -1, -1):
    if lst[i][0] == 0:
        del lst[i]
or build a new list and assign it back to the variable, which would also be much more efficient:
lst = [x for x in lst if x[0] != 0]
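This also suggests why the question's loop failed on the first run but not on the second. The sketch below assumes that only the loop was re-run and the list was not reassigned in between; the function name `delete_zero_rows` is made up so the loop can be called twice:

```python
def delete_zero_rows(rows):
    """The question's loop, wrapped in a function so it can be run twice."""
    for i in range(0, len(rows)):   # the length is read once, before any deletion
        if rows[i][0] == 0:
            del rows[i]
    return rows

data = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]

try:
    delete_zero_rows(data)
    first_run = "ok"
except IndexError:
    first_run = "IndexError"         # rows[2] no longer exists after one deletion

second_run = delete_zero_rows(data)  # nothing is left to delete, so no error
```

The first call deletes one row and then fails on the stale last index; by then the list already has the desired content, so the second call passes without deleting anything.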
Try changing your code to this. Also, 'list' is not a good variable name, since it shadows the built-in list type:
my_list = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
my_list = [l for l in my_list if l[0] != 0]
print(my_list)
Usually it's not a good idea to remove elements from list while iterating through it like that because the first time you remove an element the list will no longer be the same length.
1) Create completely new list which will contain elements you want to keep:
listname = [element for element in listname if element[0] != 0]
2) Modify the list you already have (you can do this since lists are mutable):
listname[:] = [element for element in listname if element[0] != 0]
I would recommend using the second approach in case you have references to the same list somewhere else in your program.
Also try not to name your variables list. It's really not good practice: list is not a keyword, so the assignment is allowed, but it shadows the built-in list type.
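Whether `list` is a reserved word can be checked with the standard keyword and builtins modules. A small sketch; the function `shadow_demo` is a made-up name that shows what shadowing does inside one scope:

```python
import builtins
import keyword

list_is_keyword = keyword.iskeyword("list")   # False: `list` is an ordinary name
list_is_builtin = hasattr(builtins, "list")   # True: it names the built-in type

def shadow_demo():
    list = [1, 2, 3]      # legal, but hides the built-in inside this function
    try:
        list("abc")       # now tries to call a list object
    except TypeError:
        return "shadowed"
    return "not shadowed"

outcome = shadow_demo()
```

So the assignment itself works; the trouble starts later, when code in the same scope tries to use `list` as the type.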
First, do not ever call your variables list.
Second, a while loop may be a slightly better solution:
stop = len(mylist)
i = 0
while i < stop:
    if mylist[i][0] == 0:
        del mylist[i]
        stop -= 1
    else:
        i += 1
Third, a list comprehension is even better:
[item for item in mylist if item[0] != 0]
It's usually a bad idea to remove elements from a container while you are iterating over it.
A better approach is, instead of removing bad elements from the list, to copy the good ones to a new list:
original = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
new_list = []
for l in original:
    if l[0] != 0:
        new_list.append(l)
return new_list
And by the way, "list" is the name of a Python built-in type; it is not a keyword, but using it as a variable name shadows the built-in.
Or better, use the built-in filter function (in Python 3 it returns an iterator, so wrap it in list() if you need a list):
return list(filter(lambda l: l[0] != 0, original))
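In Python 3, filter returns a lazy iterator rather than a list, which matters if the result is used more than once. A brief sketch with illustrative variable names:

```python
original = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]

lazy = filter(lambda row: row[0] != 0, original)   # a filter object in Python 3
kept = list(lazy)                                  # consumes the iterator
exhausted = list(lazy)                             # the iterator is now used up
```

Here `kept` holds the two remaining rows and `exhausted` is empty, so converting to a list once and reusing that list is usually the safer pattern.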
