Indices of duplicate lists in a nested list - python

I am trying to solve a problem that is a part of my genome alignment project. The problem goes as follows:
if given a nested list
y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]
extract indices of unique lists into a nested list again.
For example, the output for the above nested list should be
[[0,1,7],[2],[3],[4,5],[6]].
This is because list [1,2,3] is present in 0,1,7th index positions, [3,4,5] in 2nd index position and so on.
Since I will be dealing with large lists, what could be the most optimal way of achieving this in Python?

You could create a dictionary (or an OrderedDict on Python versions older than 3.7, where plain dicts do not guarantee insertion order). The keys of the dict are tuples built from the sub-lists, and the values are lists of indexes. After looping through, the dictionary values hold your answer:
from collections import OrderedDict
y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]
lookup = OrderedDict()
for idx, l in enumerate(y):
    lookup.setdefault(tuple(l), []).append(idx)
list(lookup.values())
# [[0, 1, 7], [2], [3], [4, 5], [6]]
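On Python 3.7 and later a plain dict also keeps insertion order, so the same idea can be sketched with collections.defaultdict. The function name group_duplicate_indices is made up for illustration; this is a minimal sketch, not the only way to write it.

```python
from collections import defaultdict

def group_duplicate_indices(rows):
    """Group the positions of equal sub-lists, in order of first appearance."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        # Lists are not hashable, so use a tuple of the row as the key.
        groups[tuple(row)].append(idx)
    return list(groups.values())

y = [[1, 2, 3], [1, 2, 3], [3, 4, 5], [6, 5, 4], [4, 2, 5], [4, 2, 5], [1, 2, 8], [1, 2, 3]]
print(group_duplicate_indices(y))
# [[0, 1, 7], [2], [3], [4, 5], [6]]
```

Each sub-list is hashed once in a single pass, so the work grows roughly linearly with the total number of elements, which matters for large inputs.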

You could use a list comprehension and range to collect the indexes of each sub-list's duplicates and append them to result. Note that this rescans the whole list for every element, so it gets slow on large inputs.
result = []
for num in range(len(y)):
    occurances = [i for i, x in enumerate(y) if x == y[num]]
    if occurances not in result:
        result.append(occurances)
result
#[[0, 1, 7], [2], [3], [4, 5], [6]]

Consider numpy to solve this:
import numpy as np
y = [
[1, 2, 3],
[1, 2, 3],
[3, 4, 5],
[6, 5, 4],
[4, 2, 5],
[4, 2, 5],
[1, 2, 8],
[1, 2, 3]
]
# Returns the unique rows, the index of each row's first occurrence,
# and the indices that rebuild the original array from the unique rows
unique, indices, inverse = np.unique(y, axis=0, return_index=True, return_inverse=True)
Here's a print out of each variable:
unique = [
[1 2 3]
[1 2 8]
[3 4 5]
[4 2 5]
[6 5 4]]
indices = [0 6 2 4 3]
inverse = [0 0 2 4 3 3 1 0]
If we look at inverse, we can see that the value 0 appears at positions 0, 1 and 7, which are exactly the index positions of our first unique element [1, 2, 3]. All we need to do now is group the positions by value, in order of first appearance.
new_list = []
for i in np.argsort(indices):
    new_list.append(np.where(inverse == i)[0].tolist())
Output:
new_list = [[0, 1, 7], [2], [3], [4, 5], [6]]
Finally, refs for the code above:
Numpy - unique, where, argsort

One more solution:
y = [[1, 2, 3], [1, 2, 3], [3, 4, 5], [6, 5, 4], [4, 2, 5], [4, 2, 5], [1, 2, 8], [1, 2, 3]]
occurrences = {}
for i, v in enumerate(y):
    v = tuple(v)  # lists are not hashable, tuples are
    if v not in occurrences:
        occurrences[v] = []
    occurrences[v].append(i)
print(list(occurrences.values()))
# [[0, 1, 7], [2], [3], [4, 5], [6]]

Related

How can I sum up all values with the same index in a dictionary where each key has a nested list as its value?

I have a dictionary in which each key has a list of lists (a nested list) as its value. For example, imagine we have:
x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}
My question is: how can I access each element of the dictionary and concatenate those with the same index? For example, the first list from all keys:
[1,2] from the first key +
[2,1] from the second and
[1,5] from the third one
How can I do this?
You can access the nested lists while iterating through the dictionary, append them to a new list, and then apply the sum function.
Code:
x={1: [[1,2],[3,5]] , 2:[[2,1],[2,6]], 3:[[1,5],[5,4]]}
ans=[]
for key in x:
    ans += x[key][0]
print(sum(ans))
Output:
12
Assuming you want a list of the first elements, you can do:
>>> x={1: [[1,2],[3,5]] , 2:[[2,1],[2,6]], 3:[[1,5],[5,4]]}
>>> y = [a[0] for a in x.values()]
>>> y
[[1, 2], [2, 1], [1, 5]]
If you want the second element, you can use a[1], etc.
The output you expect is not entirely clear (do you want to sum? concatenate?), but what seems clear is that you want to handle the values as matrices.
You can use numpy for that:
summing the values
import numpy as np
sum(map(np.array, x.values())).tolist()
output:
[[4, 8], [10, 15]] # [[1+2+1, 2+1+5], [3+2+5, 5+6+4]]
concatenating the matrices (horizontally)
import numpy as np
np.hstack(list(map(np.array, x.values()))).tolist()
output:
[[1, 2, 2, 1, 1, 5], [3, 5, 2, 6, 5, 4]]
As explained in How to iterate through two lists in parallel?, zip does exactly that: iterates over a few iterables at the same time and generates tuples of matching-index items from all iterables.
In your case, the iterables are the values of the dict. So just unpack the values to zip:
x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}
for y in zip(*x.values()):
    print(y)
Gives:
([1, 2], [2, 1], [1, 5])
([3, 5], [2, 6], [5, 4])
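Building on zip, one way to either concatenate or sum the matching-index sub-lists without numpy could look like the sketch below. It assumes every key holds the same number of sub-lists and, for the sum, that the sub-lists have equal length; the variable names are illustrative.

```python
from itertools import chain

x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}

# Concatenate the sub-lists that share an index across all keys.
concatenated = [list(chain.from_iterable(group)) for group in zip(*x.values())]
print(concatenated)
# [[1, 2, 2, 1, 1, 5], [3, 5, 2, 6, 5, 4]]

# Element-wise sum of the sub-lists that share an index.
summed = [[sum(column) for column in zip(*group)] for group in zip(*x.values())]
print(summed)
# [[4, 8], [10, 15]]
```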

Can't index multiple elements in list of lists in Python (using the : operator)

I found it strange that indexing with the slice (:) operator does not seem to be supported for a list of lists.
Sometimes it gives unexpected values:
a = [[1, 2], [3, 4], [5, 6], [7, 8]]
>>> a
[[1, 2], [3, 4], [5, 6], [7, 8]]
>>> a[0][1]
2
>>> a[1][1]
4
>>> a[2][1]
6
However,
>>> a[0:3][1]
[3, 4]
I was expecting [2,4,6]. What am I missing here ?
I tried this on NumPy arrays as well.
>>> a
[[1, 2], [3, 4], [5, 6], [7, 8]]
>>> a[0][1]
2
>>> a[1][1]
4
>>> a[2][1]
6
>>> a[0:3][1]
[3, 4]
I know I can use list comprehension, but my question is whether ":" is supported for list of lists?
numpy arrays do support slicing, but you're not considering the shape of the array. In numpy, this array has shape:
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(a.shape)
>>>(4, 2)
meaning it's 4x2. If you slice [0:3] you're returning the first three elements of the 1st dimension. i.e.:
print(a[0:3])
>>>[[1 2]
[3 4]
[5 6]]
this output has shape:
print(a[0:3].shape)
>>>(3, 2)
if you do:
print(a[0:3][1])
>>>[3 4]
You are again indexing the first dimension: this takes the element at index 1 (the second row) of the array that has shape (3, 2).
Instead you want to call:
print(a[0:3][:,1])
>>>[2 4 6]
Which gives you all of the rows (i.e. all three elements of the first dimension) at column index 1 (the second dimension has two valid indexes, 0 and 1).
even cleaner (recommended):
print(a[0:3, 1])
>>>[2 4 6]
Using : is totally supported. Explained below...
So we start with:
a = [[1, 2], [3, 4], [5, 6], [7, 8]]
You asked about:
a[0:3][1]
We want the items from list a from position 0 up to, but not including, position 3 ([0:3]). The items returned are
[1, 2] --- position 0
[3, 4] --- position 1
[5, 6] --- position 2
Then we request from that list the item in position 1, which returns:
[3, 4]
If you want to access items inside that smaller list you need to add another index, like this:
a[0:3][1][1]
would return:
4
Breaking the expression down bracket by bracket:
Your first bracket is saying "give me the elements in list a from position 0 up to, but not including, position 3", which in this case is the first three of them.
Your second bracket is saying "of the results of my first bracket, give me the element that is in position 1", which is the entire sub-list [3, 4].
In this specific case
a[0:3][1]
could have simply been written as
a[1]
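The two steps can be checked directly by storing the intermediate slice. A small sketch (the names first_three and second_item are just for illustration):

```python
a = [[1, 2], [3, 4], [5, 6], [7, 8]]

first_three = a[0:3]          # slicing returns a new list of sub-lists
print(first_three)            # [[1, 2], [3, 4], [5, 6]]

second_item = first_three[1]  # indexing that list picks one sub-list
print(second_item)            # [3, 4]

# Chained indexing is evaluated left to right, so these are equivalent here.
print(a[0:3][1] == a[1])      # True
```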
Let us assume a list of lists:
list=[[1,2],[3,4],[5,6],[7,8]]
Then
list[0:3]
will return a list with the elements (which are also lists) from index 0 to 2:
[[1, 2], [3, 4], [5, 6]]
Accordingly, list[0:3][1] will return the second element ([3, 4]), whose index is 1.
a[0:3][1] will not return [2, 4, 6]; it takes the list of three sub-lists and chooses the second one.
When you call a[0:3] the result of that is a list with the first three elements of a. You then call a[0:3][1] which returns the 2nd element of that list which is the list [3,4].
Ordinary Python lists do not support this kind of slicing.
You can get [2, 4, 6] with Numpy:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> a[0:3, 1]
array([2, 4, 6])
a = [[1, 2], [3, 4], [5, 6], [7, 8]]
a[0:3]
the output of this is a list:
>>> [[1, 2], [3, 4], [5, 6]]
Therefore:
a[0:3][1]
Accesses the element at index 1, which is [3, 4]
To get the desired output from your list, use list comprehension:
[x[1] for x in a[0:3]]
>>> [2, 4, 6]

How to compare two lists row wise (1st row with 1st row,2nd row with 2nd row likewise)?

I have three list of index values
indexval
0 [3, 2, 7, 5]
1 [1]
2 [4, 1, 6, 2]
3 [2,3,1]
Then value list
value
0 [1]
1 [0]
2 [3]
3 [2]
What I want is to randomly select "n" values from each row of indexval, where "n" is given by the corresponding row of the value list, so that I get output like this:
0 [3]
1 [ ]
2 [1,2,4]
3 [1,2]
Can anyone help me with this...
You can do:
import random
LoL=[[3, 2, 7, 5],
[1],
[4, 1, 6, 2],
[2,3,1]]
vals=[1,0,3,2]
for x,sl in zip(vals,LoL):
    print([random.choice(sl) for _ in range(x)])
Or more concisely:
>>> ([[random.choice(sl) for _ in range(x)] for x,sl in zip(vals,LoL)])
[[3], [], [4, 6, 6], [3, 1]]
As stated in the comments, you can also use random.sample if you do not want any values from the list reused, or random.choices (Python 3.6+) if you want to add weightings to the selection from the list.
Example with random.sample:
>>> [random.sample(sl, k) for k,sl in zip(vals,LoL)]
[[7], [], [6, 4, 2], [1, 2]]
Given the two lists, you can use random.sample.
This randomly picks k unique items from a population sequence (i.e. random sampling without replacement in stat. terms):
import random
indexval = [[3, 2, 7, 5], [1], [4, 1, 6, 2], [2, 3, 1]]
value = [1, 0, 3, 2]
for i in range(len(indexval)):
    print(random.sample(population = indexval[i], k = value[i]))
[2]
[]
[1, 6, 4]
[1, 2]
To perform random sampling with replacement (i.e. repeated items are allowed), use random.choices; see dawg's solution.
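random.sample raises ValueError when asked for more items than the row contains. If that can happen in your data, one option is to cap the count, as in this sketch. The helper name sample_rows and the choice to sort each result (to match the ordered output shown in the question) are assumptions.

```python
import random

def sample_rows(rows, counts):
    """Pick counts[i] distinct items from rows[i], capped at the row length."""
    picked = []
    for row, k in zip(rows, counts):
        k = min(k, len(row))  # avoid ValueError when k > len(row)
        picked.append(sorted(random.sample(row, k)))
    return picked

indexval = [[3, 2, 7, 5], [1], [4, 1, 6, 2], [2, 3, 1]]
value = [1, 0, 3, 2]
print(sample_rows(indexval, value))  # output varies between runs
```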

Get unique elements from a 2D list

I have a 2D list which I create like so:
Z1 = [[0 for x in range(3)] for y in range(4)]
I then proceed to populate this list, such that Z1 looks like this:
[[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
I need to extract the unique 1x3 elements of Z1, without regard to order:
Z2 = makeUnique(Z1) # The solution
The contents of Z2 should look like this:
[[4, 5, 6], [2, 5, 1]]
As you can see, I consider [1, 2, 3] and [2, 3, 1] to be duplicates because I don't care about the order.
Also note that single numeric values may appear more than once across elements (e.g. [2, 3, 1] and [2, 5, 1]); it's only when all three values appear together more than once (in the same or different order) that I consider them to be duplicates.
I have searched dozens of similar problems, but none of them seems to address my exact issue. I'm a complete Python beginner so I just need a push in the right direction.
I have already tried :
Z2= dict((x[0], x) for x in Z1).values()
Z2= set(i for j in Z2 for i in j)
But this does not produce the desired behaviour.
Thank you very much for your help!
Louis Vallance
If the order of the elements inside the sublists does not matter, you could use the following:
from collections import Counter
z1 = [[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
temp = Counter([tuple(sorted(x)) for x in z1])
z2 = [list(k) for k, v in temp.items() if v == 1]
print(z2) # [[4, 5, 6], [1, 2, 5]]
Some remarks:
sorting makes lists [1, 2, 3] and [2, 3, 1] from the example equal so they get grouped by the Counter
casting to tuple converts the lists to something that is hashable and can therefore be used as a dictionary key.
the Counter creates a dict with the tuples created above as keys and a value equal to the number of times they appear in the original list
the final list-comprehension takes all those keys from the Counter dictionary that have a count of 1.
If the elements of each sublist should keep their original order in the output, you can use the following instead:
z1 = [[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
def test(sublist, list_):
    for sub in list_:
        if all(x in sub for x in sublist):
            return False
    return True
z2 = [x for i, x in enumerate(z1) if test(x, z1[:i] + z1[i+1:])]
print(z2) # [[4, 5, 6], [2, 5, 1]]
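For larger inputs, the two ideas can be combined: count order-insensitive keys once, then filter the original sub-lists so their element order is kept. This is a sketch, with make_unique as a hypothetical name echoing the question's makeUnique. Note that it compares sorted tuples, so it treats [1, 1, 2] and [1, 2, 2] as different, unlike the membership test above.

```python
from collections import Counter

def make_unique(rows):
    """Keep rows whose multiset of values occurs exactly once."""
    counts = Counter(tuple(sorted(row)) for row in rows)
    return [row for row in rows if counts[tuple(sorted(row))] == 1]

z1 = [[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
print(make_unique(z1))
# [[4, 5, 6], [2, 5, 1]]
```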

finding a list in a list of list based on one element

I have a list of lists representing a connectivity graph in Python. This list looks like an n*2 matrix
example = [[1, 2], [1, 5], [1, 8], [2, 1], [2, 9], [2,5] ]
what I want to do is to find the value of the first elements of the lists where the second element is equal to a user defined value. For instance :
input 1 returns [2] (because [2,1])
input 5 returns [1,2] (because [1,5] and [2,5])
input 7 returns []
in Matlab, I could use
output = example(example(:,2)==input, 1);
but I would like to do this in Python (in the most pythonic and efficient way)
You can use list comprehension as a filter, like this
>>> example = [[1, 2], [1, 5], [1, 8], [2, 1], [2, 9], [2,5]]
>>> n = 5
>>> [first for first, second in example if second == n]
[1, 2]
You can also work comfortably with the Python functions map and filter:
>>> example = [[1, 2], [1, 5], [1, 8], [2, 1], [2, 9], [2,5] ]
>>> n = 5
>>> list(map(lambda x: x[0], filter(lambda x: x[1] == n, example)))
[1, 2]
With lambda you can define anonymous functions.
Syntax:
lambda arg0,arg1...: e
arg0,arg1... are the parameters of the function, and e is the expression it returns.
Lambda functions are mostly used as arguments to functions like map, reduce, filter etc.
exemple = [[1, 2], [1, 5], [1, 8], [2, 1], [2, 9], [2,5] ]
foundElements = []
inputs = [...]  # list of inputs
for item in exemple:
    if item[1] in inputs:
        foundElements.append(item[0])
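If many lookups are made against the same list, it may be worth indexing it once by the second element. A sketch under that assumption (build_lookup is a hypothetical helper name):

```python
from collections import defaultdict

def build_lookup(pairs):
    """Map each second element to the first elements it is paired with."""
    lookup = defaultdict(list)
    for first, second in pairs:
        lookup[second].append(first)
    return lookup

example = [[1, 2], [1, 5], [1, 8], [2, 1], [2, 9], [2, 5]]
lookup = build_lookup(example)
print(lookup.get(1, []))  # [2]
print(lookup.get(5, []))  # [1, 2]
print(lookup.get(7, []))  # []
```

Using get with a default avoids adding empty entries to the lookup for values that never appear.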
