Multidimensional list match in Python

This has caused me a serious headache today.
Suppose I have two instances of my object, instance A and instance B. These come with properties in the form of a list. Say the two properties for A are
a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
and those for B:
b1 = [5, 7, 3, 1]
b2 = [50, 20, 30, 20]
What I want is to find the indices i at which the pair (b1[i], b2[i]) also occurs as a pair (a1[j], a2[j]) for some j. In this example these are indices 0 and 2, since for those we have
b1[0] = 5 and b2[0] = 50
which we find in a1 and a2 as the last entries. The same goes for index 2, where (3, 30) appears in (b1, b2) and also in (a1, a2).
Note that a1 and a2 always have the same length, as do b1 and b2.
Any help? 😊

You can use a combination of zip, set and enumerate:
>>> a1 = [1, 2, 3, 4, 5]
>>> a2 = [10, 20, 30, 40, 50]
>>> b1 = [5, 7, 3, 1]
>>> b2 = [50, 20, 30, 20]
>>> a12 = set(zip(a1, a2))
>>> [i for i, e in enumerate(zip(b1, b2)) if e in a12]
[0, 2]
With zip you group the values into pairs, and with set you put the (a1, a2) pairs into a set, since order does not matter and sets have faster lookup. Then enumerate gives you pairs of indices and elements, and the list comprehension keeps the indices of those pairs from zip(b1, b2) whose elements are in a12.
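Since the title asks about the multidimensional case, here is a sketch of the same zip/set/enumerate idea for any number of parallel lists: zip(*lists) builds one tuple per position, however many lists there are. The function name matching_indices is hypothetical, and the lists within each group are assumed to have equal length (zip stops at the shortest one).

```python
def matching_indices(a_lists, b_lists):
    """Return every index i for which the tuple taken from b_lists at i
    also occurs as a tuple taken from a_lists (at any position)."""
    # One tuple per position of the "a" lists, stored in a set for fast lookup
    a_keys = set(zip(*a_lists))
    # Keep the index of every "b" tuple that is found in that set
    return [i for i, key in enumerate(zip(*b_lists)) if key in a_keys]

a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
b1 = [5, 7, 3, 1]
b2 = [50, 20, 30, 20]
print(matching_indices([a1, a2], [b1, b2]))  # [0, 2]
```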

I think another structure would be better:
a list of tuples, or a set of keys...
a = [(1,10),(2,20)] and so on
Edit:
well... tobias_k shows you how :)
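A sketch of what that structure could look like, assuming each instance keeps its data as a list of (x, y) tuples instead of two separate lists (the names a, b and a_set are illustrative only):

```python
# Each instance stores its pairs directly as tuples
a = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
b = [(5, 50), (7, 20), (3, 30), (1, 20)]

# Tuples are hashable, so they can go straight into a set for fast lookup
a_set = set(a)
matches = [i for i, pair in enumerate(b) if pair in a_set]
print(matches)  # [0, 2]

# Going back to the two-list form is a single zip away
x_values, y_values = map(list, zip(*a))
```

With this layout the zip step disappears from the matching code, and the two-list form is still available whenever it is needed.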

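Try this (enumerate yields the position of each matching pair directly; b1.index(i[0]) would return the first occurrence of that value in b1, which gives a wrong index if b1 contains duplicates):
In [38]: [k for i in zip(a1,a2) for k, j in enumerate(zip(b1,b2)) if i==j]
Out[38]: [2, 0]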

You can also check, for each pair in (a1, a2), where it occurs in (b1, b2). This returns all matches as a list of lists and takes care of duplicates:
a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
b1 = [5, 7, 3, 1, 5]
b2 = [50, 20, 30, 20, 50]
# Construct list of tuples for easier matching
pair_a = [(i, k) for i, k in zip(a1, a2)]
pair_b = [(i, k) for i, k in zip(b1, b2)]
# Get matching indices (for each entry in pair_a get the indices in pair_b)
indices = [[i for i, j in enumerate(pair_b) if j == k] for k in pair_a]
gives
[[], [], [2], [], [0, 4]]
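The nested comprehension above rescans pair_b once for every entry of pair_a, so it does roughly len(a1) * len(b1) comparisons. A sketch of a dictionary-based variant (not part of the original answer) that indexes the (b1, b2) pairs once and produces the same result:

```python
from collections import defaultdict

a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
b1 = [5, 7, 3, 1, 5]
b2 = [50, 20, 30, 20, 50]

# One pass over b: map every (b1, b2) pair to all positions where it occurs
positions_in_b = defaultdict(list)
for i, pair in enumerate(zip(b1, b2)):
    positions_in_b[pair].append(i)

# One lookup per (a1, a2) pair; .get returns [] without inserting a new key
indices = [positions_in_b.get(pair, []) for pair in zip(a1, a2)]
print(indices)  # [[], [], [2], [], [0, 4]]
```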

Related

Mapping issue with multiple lists in Python

I have two lists J1 and A1. I have another list J2 with some elements from J1. I want to build a list A2 with the values of A1 that correspond to the elements of J2, and print it. Here are the current and expected outputs.
J1 = [1, 7, 9, 11]
A1 = [2.1,6.9,7.3,5.4]
J2 = [1, 9]
J2,A2=map(list, zip(*((a, b) for a, b in zip(J2,A1))))
print(A2)
The current output is
[2.1, 6.9]
The expected output is
[2.1, 7.3]
Another variation, closer to the original:
A2 = [a for a,j in zip(A1,J1) if j in J2]
Define a dict using J1 as keys and A1 as values, then use the values in J2 as keys to look up in the new dict. operator.itemgetter is useful here.
>>> from operator import itemgetter
>>> d = dict(zip(J1, A1))
>>> A2 = list(itemgetter(*J2)(d))
>>> A2
[2.1, 7.3]
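One caveat with itemgetter: when it is given a single key, it returns the bare value instead of a 1-tuple, so list(itemgetter(*J2)(d)) raises a TypeError if J2 holds exactly one key. A minimal sketch of a list comprehension over the same dict that avoids this (the helper name lookup is hypothetical):

```python
from operator import itemgetter

J1 = [1, 7, 9, 11]
A1 = [2.1, 6.9, 7.3, 5.4]
d = dict(zip(J1, A1))

def lookup(keys):
    # Always returns a list, for any number of keys (including one)
    return [d[k] for k in keys]

print(lookup([1, 9]))  # [2.1, 7.3]
print(lookup([9]))     # [7.3]

# With a single key, itemgetter returns 7.3 itself, not (7.3,)
print(itemgetter(*[9])(d))  # 7.3
```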
Or look up each position with list.index (a linear search for every element of J2):
J1 = [1, 7, 9, 11]
A1 = [2.1,6.9,7.3,5.4]
J2 = [1, 9]
A2 = [A1[J1.index(a)] for a in J2]
print(A2)

Retrieving initial lists used for creating a Numpy array

Let's say one has a NumPy array generated from lists
import numpy as np
a1 = [1,2,3,4]
a2 = [11,22,33,44]
a3 = [111,222,333,444]
a4 = [1111,2222,3333,4444]
a = []
for x in a1:
    for y in a2:
        for k in a3:
            for l in a4:
                a.append((x, y, k, l))
na = np.array(a)
Now the goal is to retrieve these initial lists from this 2D numpy array. One solution is
na.shape = (4,4,4,4,4)
a1 = na[:,0,0,0,0]
a2 = na[0,:,0,0,1]
a3 = na[0,0,:,0,2]
a4 = na[0,0,0,:,3]
print(a1)
print(a2)
print(a3)
print(a4)
[1 2 3 4]
[11 22 33 44]
[111 222 333 444]
[1111 2222 3333 4444]
This is perfectly fine and my first choice. I'm simply wondering if there's also a fancy way of doing this, thanks
If the values in each original list are always unique (and in ascending order, since np.unique returns sorted values), you could use NumPy's unique to find the unique values in each column like this:
#--- your code
import numpy as np
a1 = [1,2,3,4]
a2 = [11,22,33,44]
a3 = [111,222,333,444]
a4 = [1111,2222,3333,4444]
a = []
for x in a1:
    for y in a2:
        for k in a3:
            for l in a4:
                a.append((x, y, k, l))
na = np.array(a)
#--- suggested solution
original_arrays = [np.unique(column) for column in na.T]
>>> original_arrays
[array([1, 2, 3, 4]),
array([11, 22, 33, 44]),
array([111, 222, 333, 444]),
array([1111, 2222, 3333, 4444])]
Details of the solution:
First, we loop through the columns of the array (the rows of na.T) using a list comprehension to build the list of outputs, instead of creating an empty list and appending to it in a for loop:
columns = [column for column in na.T]
Now, instead of just collecting the columns, we find the unique values in each column with NumPy's unique function:
original_arrays = [np.unique(column) for column in na.T]
And the result is a list of NumPy arrays containing the unique values in each column:
>>> original_arrays
[array([1, 2, 3, 4]),
array([11, 22, 33, 44]),
array([111, 222, 333, 444]),
array([1111, 2222, 3333, 4444])]
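If the original lists were not in ascending order, the sorted output of np.unique no longer matches them. A sketch that keeps first-appearance order instead uses return_index=True and sorts those first-occurrence indices; the helper name unique_in_order is hypothetical, and itertools.product stands in for the nested loops (it yields the same row order):

```python
import itertools
import numpy as np

def unique_in_order(column):
    # return_index gives the position of the first occurrence of each unique value
    _, first_idx = np.unique(column, return_index=True)
    # Sorting those positions restores the order in which values first appeared
    return column[np.sort(first_idx)]

# Lists that are NOT in ascending order
a1 = [3, 1, 2]
a2 = [30, 10, 20]
na = np.array(list(itertools.product(a1, a2)))

print([np.unique(col) for col in na.T])        # sorted, original order lost
print([unique_in_order(col) for col in na.T])  # [array([3, 1, 2]), array([30, 10, 20])]
```

Like the np.unique answer, this still assumes each original list has no repeated values.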
The initial na and shape:
In [117]: na
Out[117]:
array([[ 1, 11, 111, 1111],
[ 1, 11, 111, 2222],
[ 1, 11, 111, 3333],
...,
[ 4, 44, 444, 2222],
[ 4, 44, 444, 3333],
[ 4, 44, 444, 4444]])
In [118]: na.shape
Out[118]: (256, 4)
Your indexing works with
naa=na.reshape(4,4,4,4,4)
Initially I missed the fact that you were using
na.shape = (4,4,4,4,4)
to do this reshape. (I use reshape far more often than the in-place reshape.)
The a# values appear in the respective columns, but with many repeats. You can skip those with the right slicing.
In [119]: na[:4,3]
Out[119]: array([1111, 2222, 3333, 4444])
In [122]: na[:16:4,2]
Out[122]: array([111, 222, 333, 444])
In [123]: na[:16*4:16,1]
Out[123]: array([11, 22, 33, 44])
In [124]: na[:16*4*4:16*4,0]
Out[124]: array([1, 2, 3, 4])
On the 5d version, your solution is probably as good as any. It's not a common arrangement of values, so it's unlikely that there will be a built-in shortcut.
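The slicing pattern above generalizes if you know the length of every original list: in the nested-loop (row-major) layout, each value in column c repeats for prod(sizes[c+1:]) consecutive rows, so a slice with that step recovers the list. A sketch under that assumption; recover_lists and sizes are hypothetical names, and math.prod needs Python 3.8+:

```python
import itertools
import math
import numpy as np

def recover_lists(na, sizes):
    """Recover the original lists from rows generated by nested loops
    (equivalently itertools.product) over lists with the given lengths."""
    result = []
    for c, n in enumerate(sizes):
        # Number of rows between consecutive new values in column c
        step = math.prod(sizes[c + 1:])  # prod of an empty slice is 1
        result.append(na[:step * n:step, c])
    return result

a1 = [1, 2, 3, 4]
a2 = [11, 22, 33, 44]
a3 = [111, 222, 333, 444]
a4 = [1111, 2222, 3333, 4444]
na = np.array(list(itertools.product(a1, a2, a3, a4)))

for arr in recover_lists(na, [len(a1), len(a2), len(a3), len(a4)]):
    print(arr)
```

Unlike np.unique, this keeps the original order, tolerates repeated values, and works when the lists have different lengths.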

Inserting elements of one list into another list at different positions in python

Consider the two lists:
a=[1,2,3]
and
b=[10,20,30],
and a list of positions
pos=[p1,p2,p3]
giving the positions that the elements of b should take in the final list of 6 elements formed by merging a and b, where p1 is the position of b[0]=10, p2 is the position of b[1]=20 and p3 is the position of b[2]=30.
What is the best python approach to this problem?
You could create the output list by extending it with slices of a and appending the next item of b where needed:
def insert(a, b, positions):
    # reorder b and positions so that positions are in increasing order
    positions, b = zip(*sorted(zip(positions, b)))
    out = []
    a_idx = 0
    it_b = iter(b)
    for pos in positions:
        slice_length = pos - len(out)
        out.extend(a[a_idx:a_idx + slice_length])
        out.append(next(it_b))
        a_idx += slice_length
    out.extend(a[a_idx:])
    return out
An example:
a=[1,2,3]
b=[10,20,30]
pos=[0, 1, 5]
insert(a, b, pos)
# [10, 20, 1, 2, 3, 30]
pos = [0, 2, 4]
insert(a, b, pos)
# [10, 1, 20, 2, 30, 3]
pos=[5, 3, 0]
insert(a, b, pos)
# [30, 1, 2, 20, 3, 10]
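A different approach (not from the answers here) is to allocate the final list first, drop each element of b at its target position, and fill the remaining slots with a in order. A sketch, assuming the positions are distinct and lie in range(len(a) + len(b)); the name insert_at and the sentinel object are illustrative choices:

```python
def insert_at(a, b, positions):
    size = len(a) + len(b)
    empty = object()  # unique sentinel, so None can still appear in the data
    out = [empty] * size
    # Place every element of b at its requested position
    for value, p in zip(b, positions):
        out[p] = value
    # Fill the remaining slots with the elements of a, in order
    it_a = iter(a)
    return [next(it_a) if x is empty else x for x in out]

print(insert_at([1, 2, 3], [10, 20, 30], [5, 3, 0]))  # [30, 1, 2, 20, 3, 10]
```

No sorting is needed, because each element of b is written directly to its slot.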
If you make the indices and values into a dictionary, you can then loop over the range of the combined lengths. If the index is in the dict, use the value, otherwise take the next value from a:
a = [1,2,3]
b = [10,20,30]
pos =[2,0,5]
p_b = dict(zip(pos, b))
it_a = iter(a)
[p_b[i] if i in p_b else next(it_a) for i in range(len(a) + len(b))]
# [20, 1, 10, 2, 3, 30]
You will need to ensure that the lengths of the lists and the positions all make sense. If they don't, you can run out of a values, which raises a StopIteration exception.
You can use a defaultdict for a similar approach, which simplifies the list comprehension at the expense of a slightly more complicated setup:
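One way to make those requirements explicit is to validate the positions up front and raise a clearer ValueError. A minimal sketch of the dictionary approach with such checks; merge_by_positions is a hypothetical name:

```python
def merge_by_positions(a, b, pos):
    total = len(a) + len(b)
    # Each element of b needs its own distinct slot inside the final list
    if len(pos) != len(b):
        raise ValueError("need exactly one position per element of b")
    if len(set(pos)) != len(pos):
        raise ValueError("positions must be distinct")
    if any(not 0 <= p < total for p in pos):
        raise ValueError(f"positions must lie in range(0, {total})")
    p_b = dict(zip(pos, b))
    it_a = iter(a)
    # After the checks, exactly len(a) slots are left for the values of a
    return [p_b[i] if i in p_b else next(it_a) for i in range(total)]

print(merge_by_positions([1, 2, 3], [10, 20, 30], [2, 0, 5]))  # [20, 1, 10, 2, 3, 30]
```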
from collections import defaultdict
a = [1,2,3]
b = [10,20,30]
pos =[4,0,2]
it_a = iter(a)
d = defaultdict(lambda: next(it_a))
d.update(dict(zip(pos, b)))
[d[i] for i in range(len(a) + len(b))]
# [20, 1, 30, 2, 10, 3]

Getting unique values in python using List Comprehension technique

I want to get the values that appear in one of the lists but not in the other. I also tried using '<>', but it says invalid syntax (the <> operator was removed in Python 3; != is the replacement). I am trying to do this with a list comprehension.
com_list = []
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
come_list = [a for a in a1 for b in b1 if a != b ]
Output:
[1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]
My expected output would be [3, 5, 6]
What you want is called the symmetric difference; you can do:
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
set(a1).symmetric_difference(b1)
# {3, 5, 6}
which you can also write as:
set(a1) ^ set(b1)
If you really want a list in the end, just convert it (use sorted(...) instead of list(...) if you need a guaranteed order, since sets are unordered):
list(set(a1) ^ set(b1))
# [3, 5, 6]
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
If you really want to do that using list comprehensions, well, here it is, but it's really not the right thing to do here.
A totally inefficient version:
# Don't do that !
sym_diff = [x for x in a1+b1 if x in a1 and x not in b1 or x in b1 and x not in a1]
print(sym_diff)
# [3, 5, 6]
It would be a bit better using sets to test membership efficiently:
# Don't do that either
a1 = set([1,2,3,4,5])
b1 = set([6,4,2,1])
sym_diff = [x for x in a1|b1 if x in a1 and x not in b1 or x in b1 and x not in a1]
print(sym_diff)
# [3, 5, 6]
But if you start using sets, which is the right thing to do here, use them properly all the way and call symmetric_difference.
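If you do want a list comprehension, for example to keep values in the order they appear in the inputs, a middle ground is to build each set once and use it only for membership tests. A sketch (not from the original answer):

```python
a1 = [1, 2, 3, 4, 5]
b1 = [6, 4, 2, 1]

set_a, set_b = set(a1), set(b1)  # built once; average O(1) membership tests
# Values only in a1 (in a1's order), then values only in b1 (in b1's order)
sym_diff = [x for x in a1 if x not in set_b] + [x for x in b1 if x not in set_a]
print(sym_diff)  # [3, 5, 6]
```

If the inputs can contain repeated values, those repeats survive here; wrapping the result in list(dict.fromkeys(...)) removes them while keeping the order.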
You can do
come_list =[i for i in list((set(a1) - set(b1))) + list((set(b1) - set(a1)))]
print(come_list)
Output
[3, 5, 6]
This list contains the numbers that appear in exactly one of the two lists.
The problem with the line come_list = [a for a in a1 for b in b1 if a != b] is that it compares each item of the first list with every item of the second list and keeps a once for every b it differs from. That produces many repeats and never checks whether a is missing from b1 altogether, so it does not give the values unique to either list.
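For comparison, a sketch of how the original nested comprehension could be repaired: keep a value only if it differs from every value of the other list, using the built-in all(). It is still O(len(a1) * len(b1)), so it only illustrates the logic:

```python
a1 = [1, 2, 3, 4, 5]
b1 = [6, 4, 2, 1]

# Keep a value only if it differs from every element of the other list
only_in_a1 = [a for a in a1 if all(a != b for b in b1)]
only_in_b1 = [b for b in b1 if all(b != a for a in a1)]
come_list = only_in_a1 + only_in_b1
print(come_list)  # [3, 5, 6]
```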

Splitting arrays depending on unique values in an array

I currently have two arrays, one of which has several repeated values and another with unique values.
Eg array 1 : a = [1, 1, 2, 2, 3, 3]
Eg array 2 : b = [10, 11, 12, 13, 14, 15]
I am developing code in Python that looks at the first array, finds the elements that are equal to each other and remembers their indices. A new array is then created with the elements of array b at those indices.
E.g., as array a has three unique values, at positions 1,2 ... 3,4 ... 5,6, three new arrays would be created containing the elements of array b at positions 1,2 ... 3,4 ... 5,6. Thus, the result would be three new arrays:
b1 = [10, 11]
b2 = [12, 13]
b3 = [14, 15]
I have managed to write code, but it only works when there are exactly three unique values in array a. If there are more or fewer unique values, the code has to be modified by hand.
import itertools
import numpy as np
import matplotlib.tri as tri
import sys
a = [1, 1, 2, 2, 3, 3]
b = [10, 10, 20, 20, 30, 30]
b_1 = []
b_2 = []
b_3 = []
unique = []
# collect the distinct values of a in order of first appearance
for vals in a:
    if vals not in unique:
        unique.append(vals)
if len(unique) != 3:
    sys.exit("More than 3 'a' values - check dimension")
# route each b value to the list for its a value (c was undefined; b is meant)
for j in range(0,len(a)):
    if a[j] == unique[0]:
        b_1.append(b[j])
    elif a[j] == unique[1]:
        b_2.append(b[j])
    elif a[j] == unique[2]:
        b_3.append(b[j])
    else:
        sys.exit("More than 3 'a' values - check dimension")
print (b_1)
print (b_2)
print (b_3)
I was wondering if there is perhaps a more elegant way to perform this task, so that the code can cope with any number n of unique values.
Well, given that you are also using NumPy, here's one way using np.unique. Setting return_index=True gives the index of the first occurrence of each unique value, which you can use to split the array b with np.split (this assumes that equal values in a are contiguous and in ascending order, as in your example):
import numpy as np
a = np.array([1, 1, 2, 2, 3, 3])
b = np.array([10, 11, 12, 13, 14, 15])
u, s = np.unique(a, return_index=True)
np.split(b,s[1:])
Output
[array([10, 11]), array([12, 13]), array([14, 15])]
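If equal values of a are not guaranteed to be contiguous and sorted, a sketch that selects the matching entries of b with one boolean mask per unique value works for any order (it scans a once per unique value):

```python
import numpy as np

a = np.array([1, 2, 1, 3, 2, 3])
b = np.array([10, 11, 12, 13, 14, 15])

# One boolean mask per unique value of a selects the matching entries of b
groups = [b[a == value] for value in np.unique(a)]
print(groups)  # [array([10, 12]), array([11, 14]), array([13, 15])]
```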
You can use itertools.groupby(), which groups consecutive items that share the same key:
from itertools import groupby
from operator import itemgetter
a = [1, 1, 2, 2, 3, 3]
b = [10, 11, 12, 13, 14, 15]
[[i[1] for i in g] for _, g in groupby(zip(a, b), key=itemgetter(0))]
# [[10, 11], [12, 13], [14, 15]]
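groupby only merges consecutive runs, so an a like [1, 1, 2, 1] would produce three groups. If equal values can be scattered, a plain-Python sketch with collections.defaultdict collects them regardless of position and keeps the a values as keys:

```python
from collections import defaultdict

a = [1, 2, 1, 3, 2, 3]
b = [10, 11, 12, 13, 14, 15]

groups = defaultdict(list)
for key, value in zip(a, b):
    groups[key].append(value)  # add b's entry to the bucket for its a value

print(dict(groups))           # {1: [10, 12], 2: [11, 14], 3: [13, 15]}
print(list(groups.values()))  # [[10, 12], [11, 14], [13, 15]]
```

Dictionaries keep insertion order (Python 3.7+), so the groups come out in order of first appearance in a.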
