find unique lists inside another list in an efficient way - python

solution = [[1,0,0],[0,1,0], [1,0,0], [1,0,0]]
I have the nested list above, which contains other lists. How can I get the unique lists inside solution?
output = [[1,0,0],[0,1,0]]
Note: each list is the same size.
Things I have tried:
Take each list and compare it with all the other lists to see whether it is duplicated, but that is very slow.
How can I check, before inserting a list, whether a duplicate of it already exists, so that I avoid inserting duplicates?

If you don't care about the order, you can use set:
solution = [[1,0,0],[0,1,0],[1,0,0],[1,0,0]]
output = set(map(tuple, solution))
print(output) # {(1, 0, 0), (0, 1, 0)}
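If you need lists back rather than a set of tuples, and want to keep the order in which each list first appears, one option is a sketch built on dict.fromkeys. It relies on dicts preserving insertion order, which Python guarantees since 3.7:

```python
solution = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each tuple wins.
unique_tuples = dict.fromkeys(map(tuple, solution))

# Convert each tuple key back to a list.
output = [list(t) for t in unique_tuples]
print(output)  # [[1, 0, 0], [0, 1, 0]]
```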

Since lists are mutable, they aren't hashable, so you can't put them in a set to check for duplicates quickly. You could convert each list to a tuple, however, and store the tuple-ized view of each list in a set.
Tuples are heterogeneous immutable containers, unlike lists, which are mutable and idiomatically homogeneous.
from typing import List, Any
def de_dupe(lst: List[List[Any]]) -> List[List[Any]]:
    seen = set()
    output = []
    for element in lst:
        tup = tuple(element)
        if tup in seen:
            continue  # we've already added this one
        seen.add(tup)
        output.append(element)
    return output
solution = [[1,0,0],[0,1,0], [1,0,0], [1,0,0]]
assert de_dupe(solution) == [[1, 0, 0], [0, 1, 0]]
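The same seen-set idea also answers the "check before inserting" part of the question: keep a set of tuples next to the output list and consult it on every insert. A minimal sketch; the helper name insert_if_new is hypothetical, not part of any library:

```python
from typing import Any, List, Set, Tuple


def insert_if_new(output: List[List[Any]], seen: Set[Tuple[Any, ...]],
                  candidate: List[Any]) -> bool:
    """Hypothetical helper: append candidate unless an equal list was already added.

    Returns True if the list was inserted, False if it was a duplicate.
    """
    key = tuple(candidate)  # tuples are hashable, lists are not
    if key in seen:         # average O(1) set lookup instead of comparing every list
        return False
    seen.add(key)
    output.append(candidate)
    return True


output: List[List[int]] = []
seen: Set[Tuple[int, ...]] = set()
for row in [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]:
    insert_if_new(output, seen, row)
print(output)  # [[1, 0, 0], [0, 1, 0]]
```

One caveat of this design: if a list is mutated after it has been inserted, the tuple stored in seen no longer matches it.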

pandas' DataFrame.duplicated might be of help.
import pandas as pd
df=pd.DataFrame([[1,0,0],[0,1,0], [1,0,0], [1,0,0]])
d =df[~df.duplicated()].values.tolist()
Output
[[1, 0, 0], [0, 1, 0]]
Or, since you tagged multidimensional-array, you can use a NumPy approach.
import numpy as np
def unique_rows(a):
    a = np.ascontiguousarray(a)
    unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))
arr=np.array([[1,0,0],[0,1,0], [1,0,0], [1,0,0]])
output=unique_rows(arr).tolist()
Based on the suggestion in this OP

Try this solution:
x = [[1,0,0],[0,1,0], [1,0,0], [1,0,0]]
Import NumPy and convert the nested list into a NumPy array:
import numpy as np
a1 = np.array(x)
Find the unique rows:
a2 = np.unique(a1, axis=0)
Convert the result back to a nested list:
a2.tolist()
Hope this helps
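Note that np.unique sorts the rows, so the approach above returns [[0, 1, 0], [1, 0, 0]] for this input. If the original order matters, here is a sketch using the return_index parameter of np.unique, which reports where each unique row first occurs:

```python
import numpy as np

x = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]
a1 = np.array(x)

# np.unique returns the unique rows sorted lexicographically;
# return_index gives the position of each row's first occurrence.
_, first_idx = np.unique(a1, axis=0, return_index=True)

# Sorting those positions restores the original (first-seen) order.
a2 = a1[np.sort(first_idx)]
print(a2.tolist())  # [[1, 0, 0], [0, 1, 0]]
```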

While lists are not hashable and therefore can't be deduplicated with a set directly, tuples are hashable. So one way is to transform your lists into tuples and deduplicate those:
>>> solution_tuples = [(1,0,0), (0,1,0), (1,0,0), (1,0,0)]
>>> set(solution_tuples)
{(1, 0, 0), (0, 1, 0)}

Related

Get unique values in a list of numpy arrays

I have a list made up of arrays. All have shape (2,).
Minimum example: mylist = [np.array([1,2]),np.array([1,2]),np.array([3,4])]
I would like to get a unique list, e.g.
[np.array([1,2]),np.array([3,4])]
or perhaps even better, a dict with counts, e.g. {np.array([1,2]) : 2, np.array([3,4]) : 1}
So far I tried list(set(mylist)), but the error is TypeError: unhashable type: 'numpy.ndarray'
As the error indicates, NumPy arrays aren't hashable. You can turn them into tuples, which are hashable, and build a collections.Counter from the result:
from collections import Counter
Counter(map(tuple,mylist))
# Counter({(1, 2): 2, (3, 4): 1})
If you wanted a list of unique tuples, you could construct a set:
set(map(tuple,mylist))
# {(1, 2), (3, 4)}
In general, a good option is np.unique with the axis and return_* parameters (X here is your list of arrays):
u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)
Then, according to the documentation:
u is an array of the unique arrays
idx is the indices of X that give the unique values
counts is the number of times each unique item appears in X
If you need a dictionary, you can't use unhashable values such as arrays as its keys, so you might store them as tuples like in @yatu's answer, or like this:
dict(zip([tuple(n) for n in u], counts))
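Putting those pieces together with the mylist from the question, a small end-to-end sketch. The .tolist() calls are an addition here so that the dict holds plain Python ints rather than NumPy scalars:

```python
import numpy as np

mylist = [np.array([1, 2]), np.array([1, 2]), np.array([3, 4])]

# np.unique stacks the list into a 2-D array and treats each row as one item.
u, idx, counts = np.unique(mylist, axis=0, return_index=True, return_counts=True)

# Rows become tuples (hashable) so they can be dict keys;
# .tolist() converts NumPy scalars to plain Python ints.
count_dict = dict(zip(map(tuple, u.tolist()), counts.tolist()))
print(count_dict)  # {(1, 2): 2, (3, 4): 1}
```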
Pure numpy approach:
numpy.unique(mylist, axis=0)
which produces a 2d array with your unique arrays in rows:
array([[1, 2],
       [3, 4]])
Works if all your arrays have the same length (as in your example).
This solution can be useful depending on what you do earlier in your code: perhaps you would not need to get into plain Python at all, but stick to numpy instead, which should be faster.
Use the following:
import numpy as np
mylist = [np.array([1,2]),np.array([1,2]),np.array([3,4])]
np.unique(mylist, axis=0)
This gives a 2-D array of the unique arrays, one per row:
array([[1, 2],
[3, 4]])
Source: https://numpy.org/devdocs/user/absolute_beginners.html#how-to-get-unique-items-and-counts

List of strings to array of integers

From a list of strings, like this one:
example_list = ['010','101']
I need to get an array of integers, where each row corresponds to one of the strings and each character is in its own column, like this one:
example_array = np.array([[0,1,0],[1,0,1]])
I have tried with this code, but it isn't working:
example_array = np.empty([2,3],dtype=int)
i = 0 ; j = 0
for string in example_list:
    for bit in string:
        example_array[i,j] = int(bit)
        j+=1
    i+=1
Can anyone help me? I am using Python 3.6.
Thank you in advance for your help!
If all strings are the same length (this is crucial to building a contiguous array), then use view to efficiently separate the characters.
r = np.array(example_list)
r = r.view('<U1').reshape(*r.shape, -1).astype(int)
print(r)
[[0 1 0]
 [1 0 1]]
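To see what the view trick does, here is the same pipeline split into steps. It is a sketch that, as noted above, relies on every string having the same length:

```python
import numpy as np

example_list = ['010', '101']
r = np.array(example_list)          # dtype '<U3': one 3-character string per element
chars = r.view('<U1')               # same memory reinterpreted as 1-character strings
print(chars)                        # ['0' '1' '0' '1' '0' '1']
grid = chars.reshape(*r.shape, -1)  # shape (2, 3): one row per original string
result = grid.astype(int)           # each '0'/'1' string parsed as an integer
print(result.tolist())              # [[0, 1, 0], [1, 0, 1]]
```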
You could also go the list comprehension route.
r = np.array([[*map(int, list(l))] for l in example_list])
print(r)
[[0 1 0]
 [1 0 1]]
The simplest way is to use a list comprehension, because it automatically generates the output list for you, which can be easily converted to a NumPy array. You could do this using multiple for loops, but then you are stuck creating your list and sublists and appending to them. While not difficult, the code looks more elegant with list comprehensions.
Try this:
newList = np.array([[int(b) for b in a] for a in example_list])
newList now looks like this:
>>> newList
array([[0, 1, 0],
       [1, 0, 1]])
Note: there is no need to invoke map here, though that certainly works.
So what is going on here? We are iterating through your original list of strings (example_list) item-by-item, then iterating through each character within the current item. Functionally, this is equivalent to...
newList = []
for a in example_list:
    tmpList = []
    for b in a:
        tmpList.append(int(b))
    newList.append(tmpList)
newList = np.array(newList)
Personally, I find the multiple for loops to be easier to understand for beginners. However, once you grasp the list comprehensions you probably won't want to go back.
You could do this with map. In Python 3, map returns a lazy iterator, so each level needs to be wrapped in list() before building the array:
example_array = np.array(list(map(lambda x: list(map(int, x)), example_list)))
The outer lambda is applied to each string in example_list, for example '010'. The inner map converts the individual characters of that string to integers, for example '010' => [0, 1, 0].
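For completeness, the loop in the question fails because j is never reset to 0 when moving on to the next string, so the second string starts writing at column 3 and NumPy raises an IndexError. A minimal fix of that loop, using enumerate so both counters are managed automatically:

```python
import numpy as np

example_list = ['010', '101']
example_array = np.empty([2, 3], dtype=int)

# enumerate restarts j at 0 for every string, which the original loop did not do.
for i, string in enumerate(example_list):
    for j, bit in enumerate(string):
        example_array[i, j] = int(bit)

print(example_array.tolist())  # [[0, 1, 0], [1, 0, 1]]
```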

Shortest way to linearize a list in Python

I want to make a list with linearly increasing values from a list with non-linearly increasing values in Python. For example
input =[10,10,10,6,6,4,1,1,1,10,10]
should be transformed to:
output=[0,0,0,1,1,2,3,3,3,0,0]
My code uses a Python dictionary:
def linearize(input):
    """
    Remap an input list of non-linearly increasing values to linear indices,
    i.e.
    input = [10,10,10,6,6,3,1,1]
    output= [0,0,0,1,1,2,3,3]
    """
    remap={}
    i=0
    output=[0]*len(input)
    for x in input:
        if x not in remap.keys():
            remap[x]=i
            i=i+1
    for i in range(0,len(input)):
        output[i]=remap[input[i]]
    return output
But I know this code can be more efficient. Are there ideas for doing this task better and in a more Pythonic way? Is NumPy an option?
This function has to be called very frequently on big lists.
As per your comment in the question, you are looking for something like this
data = [8,8,6,6,3,8]
from itertools import count
from collections import defaultdict
counter = defaultdict(lambda x=count(): next(x))
print([counter[item] for item in data])
# [0, 0, 1, 1, 2, 0]
Thanks to poke,
list(map(lambda i, c=defaultdict(lambda c=count(): next(c)): c[i], data))
It's just a one-liner now :)
Use collections.OrderedDict:
In [802]: from collections import OrderedDict
...: odk=OrderedDict.fromkeys(l).keys()
...: odk={k:i for i, k in enumerate(odk)}
...: [odk[i] for i in l]
Out[802]: [0, 0, 0, 1, 1, 2, 3, 3, 3]
A simpler solution without imports:
input =[10,10,10,6,6,4,1,1,1,10,10]
d = {}
result = [d.setdefault(x, len(d)) for x in input]
I came up with this function using NumPy, which in my tests was faster than yours when the input list was very big (around 2,000,000 elements).
import numpy as np
def linearize(input):
    unique, inverse = np.unique(input, return_inverse=True)
    output = (len(unique)-1) - inverse
    return output
Note that this function only works if the distinct values first appear in descending order, as in your example.
Let me know if it helps.
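If the distinct values do not necessarily first appear in descending order, one way to extend the same np.unique idea is to rank the unique values by where they first occur. This is a sketch assuming a 1-D input; the name linearize_any_order is hypothetical and the function has not been benchmarked here:

```python
import numpy as np


def linearize_any_order(values):
    """Map each value to the order in which it first appears (0, 1, 2, ...)."""
    arr = np.asarray(values)
    # uniq is sorted; first_idx[k] is where uniq[k] first occurs;
    # inverse[n] is the position of arr[n] within uniq.
    uniq, first_idx, inverse = np.unique(arr, return_index=True, return_inverse=True)
    # order lists the positions in uniq sorted by first appearance.
    order = np.argsort(first_idx)
    # rank[k] is the label for uniq[k]: 0 for the first value seen, 1 for the next, ...
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    return rank[inverse]


print(linearize_any_order([10, 10, 10, 6, 6, 4, 1, 1, 1, 10, 10]).tolist())
# [0, 0, 0, 1, 1, 2, 3, 3, 3, 0, 0]
```

Unlike the descending-order version, this also handles inputs such as [1, 5, 1, 3], which map to [0, 1, 0, 2].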

How to replace values at specific indexes of a python list?

If I have a list:
to_modify = [5,4,3,2,1,0]
And then declare two other lists:
indexes = [0,1,3,5]
replacements = [0,0,0,0]
How can I use the values in indexes as positions in to_modify, and set those elements of to_modify to the corresponding values in replacements? That is, after running, to_modify should be [0,0,3,0,1,0].
Apparently, I can do this through a for loop:
for ind in to_modify:
    indexes[to_modify[ind]] = replacements[ind]
But is there other way to do this?
Could I use operator.itemgetter somehow?
The biggest problem with your code is that it's unreadable. Python code rule number one: if it's not readable, no one's gonna look at it for long enough to get any useful information out of it. Always use descriptive variable names. I almost didn't catch the bug in your code, so let's see it again with good names, slow-motion replay style:
to_modify = [5,4,3,2,1,0]
indexes = [0,1,3,5]
replacements = [0,0,0,0]
for index in indexes:
    to_modify[indexes[index]] = replacements[index]
# to_modify[indexes[index]]
# indexes[index]
# Yo dawg, I heard you liked indexes, so I put an index inside your indexes
# so you can go out of bounds while you go out of bounds.
As is obvious when you use descriptive variable names, you're indexing the list of indexes with values from itself, which doesn't make sense in this case.
Also, when iterating through 2 lists in parallel I like to use the zip function (or itertools.izip in Python 2 if you're worried about memory consumption, but I'm not one of those iteration purists; in Python 3, zip itself is lazy). So try this instead:
for (index, replacement) in zip(indexes, replacements):
    to_modify[index] = replacement
If your problem only involves lists of numbers, then I'd say that @steabert has the answer you were looking for with the NumPy approach. However, NumPy arrays are built for fixed-size elements such as numbers; if to_modify contains sequences or other variable-sized data, you would need an object-dtype array, so you're probably best off doing it with a for loop.
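NumPy has arrays that allow you to use other lists/arrays as indices (here s, a and m stand for to_modify, indexes and replacements):
import numpy
s = [5,4,3,2,1,0]
a = [0,1,3,5]
m = [0,0,0,0]
S = numpy.array(s)
S[a] = m
Note that this modifies the array S; the list s itself is unchanged.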
Why not just:
list(map(s.__setitem__, a, m))
In Python 3, map is lazy, so the list() call is needed to actually run the assignments.
You can use operator.setitem.
from operator import setitem
a = [5, 4, 3, 2, 1, 0]
ell = [0, 1, 3, 5]
m = [0, 0, 0, 0]
for b, c in zip(ell, m):
    setitem(a, b, c)
>>> a
[0, 0, 3, 0, 1, 0]
Is it any more readable or efficient than your solution? I am not sure!
A little slower, but readable I think:
>>> s, l, m
([5, 4, 3, 2, 1, 0], [0, 1, 3, 5], [0, 0, 0, 0])
>>> d = dict(zip(l, m))
>>> d  # a dict is better than using two lists, I think
{0: 0, 1: 0, 3: 0, 5: 0}
>>> [d.get(i, j) for i, j in enumerate(s)]
[0, 0, 3, 0, 1, 0]
for index in a:
This will cause index to take on the values of the elements of a, so using them as indices is not what you want. In Python, we iterate over a container by actually iterating over it.
"But wait", you say, "For each of those elements of a, I need to work with the corresponding element of m. How am I supposed to do that without indices?"
Simple. We transform a and m into a list of pairs (element from a, element from m), and iterate over the pairs. Which is easy to do - just use the built-in library function zip, as follows:
for a_element, m_element in zip(a, m):
    s[a_element] = m_element
To make it work the way you were trying to do it, you would have to get a list of indices to iterate over. This is doable: we can use range(len(a)) for example. But don't do that! That's not how we do things in Python. Actually directly iterating over what you want to iterate over is a beautiful, mind-liberating idea.
What about operator.itemgetter?
Not really relevant here. The purpose of operator.itemgetter is to turn the act of indexing into something, into a function-like thing (what we call "a callable"), so that it can be used as a callback (for example, a 'key' for sorting or min/max operations). If we used it here, we'd have to re-call it every time through the loop to create a new itemgetter, just so that we could immediately use it once and throw it away. In context, that's just busy-work.
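To illustrate the point: itemgetter builds a callable that reads items, and only reads them, so it can fetch the current values at several indexes at once but cannot assign replacements. A short sketch using the question's lists:

```python
from operator import itemgetter

to_modify = [5, 4, 3, 2, 1, 0]
indexes = [0, 1, 3, 5]

# Equivalent to: lambda seq: (seq[0], seq[1], seq[3], seq[5])
get_items = itemgetter(*indexes)
print(get_items(to_modify))  # (5, 4, 2, 0)

# Typical use as a callback: a key function, e.g. sorting pairs by their second element.
pairs = [(1, 'b'), (2, 'a')]
print(sorted(pairs, key=itemgetter(1)))  # [(2, 'a'), (1, 'b')]
```

With a single index, itemgetter returns the bare item rather than a 1-tuple.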
You can solve it using a dictionary:
to_modify = [5,4,3,2,1,0]
indexes = [0,1,3,5]
replacements = [0,0,0,0]
dic = {}
for i in range(len(indexes)):
    dic[indexes[i]] = replacements[i]
print(dic)
# one pass over the indexes is enough
for i in indexes:
    to_modify[i] = dic[i]
print(to_modify)
The output will be
{0: 0, 1: 0, 3: 0, 5: 0}
[0, 0, 3, 0, 1, 0]
elif menu.lower() == "edit":
    print("Your games are: " + str(games))
    remove = input("Which one do you want to edit: ")
    add = input("What do you want to change it to: ")
    for i in range(len(games)):
        if str(games[i]) == str(remove):
            games[i] = str(add)
            break
Why not do it like this? Replace the item directly at the position where the old value was found; you can still add items to the list and call .sort() or .reverse() afterwards if needed.

Python: Get item from list based on input

I appreciate this may not be directly possible so I would be interested how you would go about solving this problem for a general case.
I have a list item that looks like this, [(array,time),(array,time)...] the array is a numpy array which can have any n by m dimensions. This will look like array[[derivatives dimension1],[derivatives dimension 2] ...]
From the list I want a function to create two lists which would contain all the values at the position passed to it. These could then be used for plotting.
I can think of ways to do this with alternative data structures, but unfortunately that is not an option.
Essentially what I want is
def f(list, pos1, pos2):
    xs = []
    ys = []
    for i in list:
        ys.append(i pos1)
        xs.append(i pos2)
    return xs, ys
Where i pos1 is equivalent to i[n][m]
The real problem is the 1 by 1 case, where I can't just pass integers.
Any advice would be great; sorry the post is a bit long, I wanted to be clear.
Thanks
If I'm understanding your question correctly, you essentially want to select indexes from a list of lists, and create new lists from that selection.
Selecting indexes from a list of lists is fairly simple, particularly if you have a fixed number of selections (items is your list; naming it list would shadow the built-in):
parts = [(item[pos1], item[pos2]) for item in items]
Creating new lists from those selections is also fairly easy, using the built-in zip() function:
separated = list(zip(*parts))  # list() is needed in Python 3, where zip is lazy
You can further reduce memory usage by using a generator expression instead of a list comprehension in the final function:
def f(items, pos1, pos2):
    partsgen = ((item[pos1], item[pos2]) for item in items)
    return list(zip(*partsgen))
Here's how it looks in action:
>>> f( [['ignore', 'a', 1], ['ignore', 'b', 2],['ignore', 'c', 3]], 1, 2 )
[('a', 'b', 'c'), (1, 2, 3)]
Update: After re-reading the question and comments, I'm realizing this is a bit over-simplified. However, the general idea should still work when you exchange pos1 and pos2 for appropriate indexing into the contained array.
If I understand your question, something like the following should be easy and fast, particularly if you need to do this multiple times:
z = np.dstack([ arr for arr, time in lst ])
x, y = z[pos1], z[pos2]
for example:
In [42]: a = np.arange(9).reshape(3,3)
In [43]: z = np.dstack([a, a*2, a*3])
In [44]: z[0,0]
Out[44]: array([0, 0, 0])
In [45]: z[1,1]
Out[45]: array([ 4, 8, 12])
In [46]: z[0,1]
Out[46]: array([1, 2, 3])
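Another option, sketched under the assumption that each element is an (array, time) pair as described in the question: NumPy arrays accept a tuple as an index, so pos1 and pos2 can be passed as tuples such as (n, m). This also covers the 1 by 1 case, where the position is simply (0, 0). The function name extract is hypothetical:

```python
import numpy as np


def extract(items, pos1, pos2):
    """Collect arr[pos1] into ys and arr[pos2] into xs for each (arr, time) pair.

    pos1 and pos2 are index tuples, e.g. (n, m) for a 2-D array.
    """
    xs, ys = [], []
    for arr, _time in items:
        ys.append(arr[pos1])  # arr[(n, m)] is the same as arr[n, m]
        xs.append(arr[pos2])
    return xs, ys


data = [(np.array([[1, 2], [3, 4]]), 0.0),
        (np.array([[5, 6], [7, 8]]), 1.0)]
xs, ys = extract(data, (0, 1), (1, 0))  # xs holds 3 and 7, ys holds 2 and 6

# The 1 by 1 case works the same way with the position (0, 0).
xs1, ys1 = extract([(np.array([[9]]), 0.0)], (0, 0), (0, 0))
```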
