Retrieving initial lists used for creating a Numpy array - python

Let's say one has a NumPy array generated from lists:
import numpy as np
a1 = [1,2,3,4]
a2 = [11,22,33,44]
a3 = [111,222,333,444]
a4 = [1111,2222,3333,4444]
a = []
for x in a1:
    for y in a2:
        for k in a3:
            for l in a4:
                a.append((x, y, k, l))
na = np.array(a)
Now the goal is to retrieve these initial lists from this 2D numpy array. One solution is
na.shape = (4,4,4,4,4)
a1 = na[:,0,0,0,0]
a2 = na[0,:,0,0,1]
a3 = na[0,0,:,0,2]
a4 = na[0,0,0,:,3]
print(a1)
print(a2)
print(a3)
print(a4)
[1 2 3 4]
[11 22 33 44]
[111 222 333 444]
[1111 2222 3333 4444]
This is perfectly fine and my first choice. I'm simply wondering if there's also a fancier way of doing this. Thanks!

If the values in each original array are always unique you could use numpy's "unique" to find unique values in each column like this:
#--- your code
import numpy as np
a1 = [1,2,3,4]
a2 = [11,22,33,44]
a3 = [111,222,333,444]
a4 = [1111,2222,3333,4444]
a = []
for x in a1:
    for y in a2:
        for k in a3:
            for l in a4:
                a.append((x, y, k, l))
na = np.array(a)
#--- suggested solution
original_arrays = [np.unique(column) for column in na.T]
>>> original_arrays
[array([1, 2, 3, 4]),
array([11, 22, 33, 44]),
array([111, 222, 333, 444]),
array([1111, 2222, 3333, 4444])]
Details of the solution:
First, we loop through the columns of the array with a list comprehension to build a list of outputs (instead of creating an empty list and appending to it in a for loop):
columns = [column for column in na.T]
Now instead of just looping through the columns we find the unique values in each column using the numpy "unique" function.
original_arrays = [np.unique(column) for column in na.T]
And the result is a list of NumPy arrays containing the unique values in each column:
>>> original_arrays
[array([1, 2, 3, 4]),
array([11, 22, 33, 44]),
array([111, 222, 333, 444]),
array([1111, 2222, 3333, 4444])]
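One caveat worth noting (my addition, not part of the answer above): np.unique returns its results sorted, so this only recovers each original list exactly if that list was already sorted. If the original order matters, return_index can restore first-occurrence order, as in this sketch:

```python
import numpy as np

# An unsorted column: np.unique alone would return [1, 2, 3]
col = np.array([3, 1, 2, 3, 1, 2])
vals, first = np.unique(col, return_index=True)
# Reorder the unique values by where they first appeared
in_order = vals[np.argsort(first)]
print(in_order)  # [3 1 2]
```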

The initial na and shape:
In [117]: na
Out[117]:
array([[ 1, 11, 111, 1111],
[ 1, 11, 111, 2222],
[ 1, 11, 111, 3333],
...,
[ 4, 44, 444, 2222],
[ 4, 44, 444, 3333],
[ 4, 44, 444, 4444]])
In [118]: na.shape
Out[118]: (256, 4)
Your indexing works with
naa=na.reshape(4,4,4,4,4)
Initially I missed the fact that you were using
na.shape = (4,4,4,4,4)
to do this reshape. (I use reshape far more often than the in-place reshape.)
The a# values appear in the respective columns, but with many repeats. You can skip those with the right slicing.
In [119]: na[:4,3]
Out[119]: array([1111, 2222, 3333, 4444])
In [122]: na[:16:4,2]
Out[122]: array([111, 222, 333, 444])
In [123]: na[:16*4:16,1]
Out[123]: array([11, 22, 33, 44])
In [124]: na[:16*4*4:16*4,0]
Out[124]: array([1, 2, 3, 4])
On the 5d version, your solution is probably as good as any. It's not a common arrangement of values, so it's unlikely that there will be a built-in shortcut.
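The strided slices above can be generalized to k lists of length n without hard-coding the steps; here is a hedged sketch (itertools.product stands in for the nested loops):

```python
import itertools
import numpy as np

lists = [[1, 2, 3, 4], [11, 22, 33, 44],
         [111, 222, 333, 444], [1111, 2222, 3333, 4444]]
na = np.array(list(itertools.product(*lists)))  # shape (256, 4)

k = len(lists)     # number of lists
n = len(lists[0])  # length of each list
# Column i repeats each value for n**(k-1-i) consecutive rows,
# so a stride of that size picks each distinct value exactly once.
recovered = [na[::n**(k - 1 - i), i][:n] for i in range(k)]
print(recovered)
```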

Form an numpy array from indices and fill with zeroes

I have a numpy array a = np.array([483, 39, 18, 999, 20, 48])
I have an array of indices indices = np.array([2, 3])
I would like to keep the values at those indices and fill the rest of the array with 0, so I get as a result:
np.array([0, 0, 18, 999, 0, 0])
Thank you for your answer.
Create an all zeros array and copy the values at the desired indices:
import numpy as np
a = np.array([483, 39, 18, 999, 20, 48])
indices = np.array([2, 3])
b = np.zeros_like(a)
b[indices] = a[indices]
# a = b # if needed
print(a)
print(indices)
print(b)
Output:
[483 39 18 999 20 48]
[2 3]
[ 0 0 18 999 0 0]
Hope that helps!
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.8.1
NumPy: 1.18.1
----------------------------------------
EDIT: Even better, use np.setdiff1d:
import numpy as np
a = np.array([483, 39, 18, 999, 20, 48])
indices = np.array([2, 3])
print(a)
print(indices)
a[np.setdiff1d(np.arange(a.shape[0]), indices, assume_unique=True)] = 0
print(a)
Output:
[483 39 18 999 20 48]
[2 3]
[ 0 0 18 999 0 0]
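A third variant (my addition, not from the answers above) builds a boolean mask instead of computing the index complement:

```python
import numpy as np

a = np.array([483, 39, 18, 999, 20, 48])
indices = np.array([2, 3])
# Mark the positions to keep, then zero everything else in one shot
mask = np.zeros(a.shape[0], dtype=bool)
mask[indices] = True
a[~mask] = 0
print(a)  # [  0   0  18 999   0   0]
```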
What about using list comprehension?
a = np.array([n if i in indices else 0 for i, n in enumerate(a)])
print(a) #array([ 0, 0, 18, 999, 0, 0])
You can create a function that uses the input array and the index array to do this, as in the following:
import numpy as np
def remove_by_index(input_array, indexes):
    for i, _ in enumerate(input_array):
        if i not in indexes:
            input_array[i] = 0
    return input_array
input_array = np.array([483, 39, 18, 999, 20, 48])
indexes = np.array([2, 3])
new_out = remove_by_index(input_array, indexes)
expected_out = np.array([0, 0, 18, 999, 0, 0])
print(new_out == expected_out) # to check if it's correct
Edit
You can also use list comprehension inside the function, which would be better, as:
def remove_by_index(input_array, indexes):
    return [input_array[i] if (i in indexes) else 0 for i, _ in enumerate(input_array)]
It is not, as pointed out in the comments, the most efficient way of doing it, since it iterates at Python level instead of C level, but it does work, and for casual use it will do.

Splitting arrays depending on unique values in an array

I currently have two arrays, one of which has several repeated values and another with unique values.
Eg array 1 : a = [1, 1, 2, 2, 3, 3]
Eg array 2 : b = [10, 11, 12, 13, 14, 15]
I am developing code in Python that looks at the first array, identifies the elements that are the same, and remembers their indices. A new array is then created that contains the elements of array b at those indices.
Eg: As array 'a' has three unique values at positions 1,2... 3,4... 5,6, then three new arrays would be created such that it contains the elements of array b at positions 1,2... 3,4... 5,6. Thus, the result would be three new arrays:
b1 = [10, 11]
b2 = [12, 13]
b3 = [14, 15]
I have managed to develop a code, however, it only works when there are three unique values in array 'a'. If there are more or fewer unique values in array 'a', the code has to be manually modified.
import sys
a = [1, 1, 2, 2, 3, 3]
b = [10, 11, 12, 13, 14, 15]
b_1 = []
b_2 = []
b_3 = []
unique = []
for vals in a:
    if vals not in unique:
        unique.append(vals)
if len(unique) != 3:
    sys.exit("More than 3 'a' values - check dimension")
for j in range(0, len(a)):
    if a[j] == unique[0]:
        b_1.append(b[j])
    elif a[j] == unique[1]:
        b_2.append(b[j])
    elif a[j] == unique[2]:
        b_3.append(b[j])
    else:
        sys.exit("More than 3 'a' values - check dimension")
print(b_1)
print(b_2)
print(b_3)
I was wondering if there is perhaps a more elegant way to perform this task such that the code is able to cope with an n number of unique values.
Well given that you are also using numpy, here's one way using np.unique. You can set return_index=True to get the indices of the unique values, and use them to split the array b with np.split:
a = np.array([1, 1, 2, 2, 3, 3])
b = np.array([10, 11, 12, 13, 14, 15])
u, s = np.unique(a, return_index=True)
np.split(b,s[1:])
Output
[array([10, 11]), array([12, 13]), array([14, 15])]
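One caveat (my note, not part of the answer): np.split with return_index assumes equal values in a are contiguous, as in the example. If they may be scattered, return_inverse still lets you group, as in this sketch:

```python
import numpy as np

a = np.array([1, 3, 2, 2, 1, 3])  # equal values no longer contiguous
b = np.array([10, 15, 12, 13, 11, 14])
u, inv = np.unique(a, return_inverse=True)
# Collect the b values belonging to each unique a value
groups = [b[inv == i] for i in range(len(u))]
print(groups)  # [array([10, 11]), array([12, 13]), array([15, 14])]
```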
You can use the function groupby():
from itertools import groupby
from operator import itemgetter
a = [1, 1, 2, 2, 3, 3]
b = [10, 11, 12, 13, 14, 15]
[[i[1] for i in g] for _, g in groupby(zip(a, b), key=itemgetter(0))]
# [[10, 11], [12, 13], [14, 15]]

Multidimensional list match in python

This has caused some serious headache today.
Suppose I have two instances of my object, instance A and instance B. These come with properties in the form of lists. Say the two properties for A are
a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
and those for B:
b1 = [5, 7, 3, 1]
b2 = [50, 20, 30, 20]
What I want is to simply find the indices in b1 and b2, where a pair equals the values in a1 and a2. So in this example this would be the indices 0 and 2 since for those we have
b1[0] = 5 and b2[0] = 50
which we find in a1 and a2 as the last entries. Same for index 2 for which we find (3, 30) in (b1, b2) which is also in (a1, a2).
Note here, that the lists a1 and a2 have always the same length as well as b1 and b2.
Any help? 😊
You can use a combination of zip, set and enumerate:
>>> a1 = [1, 2, 3, 4, 5]
>>> a2 = [10, 20, 30, 40, 50]
>>> b1 = [5, 7, 3, 1]
>>> b2 = [50, 20, 30, 20]
>>> a12 = set(zip(a1, a2))
>>> [i for i, e in enumerate(zip(b1, b2)) if e in a12]
[0, 2]
With zip, you group the pairs together, and with set you turn them into a set, since order does not matter and sets have faster lookup. Then, enumerate gives you pairs of indices and elements, and the list comprehension collects the indices of those pairs from zip(b1, b2) whose elements are in a12.
I think another structure would be better?
a tuple, or a key set ...
a = [(1,10),(2,20)] and so on
edit
well... tobias_k shows you how :)
Try this
In [38]: [b1.index(i[0]) for i in zip(a1,a2) for j in zip(b1,b2) if i==j]
Out[38]: [2, 0]
There is also the possibility to check for each element in (a1, a2) whether it is in (b1, b2) and it will return all matches in a list and will take care of duplicates:
a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
b1 = [5, 7, 3, 1, 5]
b2 = [50, 20, 30, 20, 50]
# Construct list of tuples for easier matching
pair_a = [(i, k) for i, k in zip(a1, a2)]
pair_b = [(i, k) for i, k in zip(b1, b2)]
# Get matching indices (for each entry in pair_a get the indices in pair_b)
indices = [[i for i, j in enumerate(pair_b) if j == k] for k in pair_a]
gives
[[], [], [2], [], [0, 4]]

Numpy: find index of elements in one array that occur in another array

I have two 1D arrays and I want to find out if an element in one array occurs in another array or not.
For example:
import numpy as np
A = np.array([ 1, 48, 50, 78, 85, 97])
B = np.array([38, 43, 50, 62, 78, 85])
I want:
C = [2,3,4] # since 50 in second array occurs in first array at index 2,
# similarly 78 in second array occurs in first array in index 3,
# similarly for 85, it is index 4
I tried:
accuracy = np.searchsorted(A, B)
But it gives me undesired results.
You could use np.where and np.in1d:
>>> np.where(np.in1d(A, B))[0]
array([2, 3, 4])
np.in1d(A, B) returns a boolean array indicating whether each value of A is found in B. np.where returns the indices of the True values. (This will also work if your arrays are not sorted.)
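As a side note (my addition): in newer NumPy releases, np.isin is the documented successor to np.in1d and reads the same way:

```python
import numpy as np

A = np.array([1, 48, 50, 78, 85, 97])
B = np.array([38, 43, 50, 62, 78, 85])
# Indices in A whose values also occur in B
idx = np.where(np.isin(A, B))[0]
print(idx)  # [2 3 4]
```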
You should start with np.intersect1d, which finds the set intersection (common elements) between arrays.
In [5]: np.intersect1d(A, B)
Out[5]: array([50, 78, 85])
To get the desired output from your question, you can then use np.searchsorted with just those items:
In [7]: np.searchsorted(A, np.intersect1d(A, B))
Out[7]: array([2, 3, 4])

How to select the maximum value over all rows from a list of scipy.sparse.arrays?

Consider a list of n scipy.sparse arrays with entries of type float. I am using the Compressed Sparse Row (CSR) format.
my_list = [sparse_array_1, sparse_array_2, ... , sparse_array_n]
Each sparse_array_i has the same length.
What I want to generate is a list of the maximum values per row. So this example
my_list = [np.array([0, 3, 99, 3]), np.array([4, 2, 1234, 0]), np.array([88, 287, 0, 77])]
would result in
[88, 287, 1234, 77]
Is this possible in a pythonic way?
I'm not familiar with scipy sparse arrays, but if they behave like other python iterables then a combination of map and zip will achieve what you want:
>>> arr
[[0, 3, 99, 3], [4, 2, 1234, 0], [88, 287, 0, 77]]
>>> list(zip(*arr))
[(0, 4, 88), (3, 2, 287), (99, 1234, 0), (3, 0, 77)]
>>> list(map(max, zip(*arr)))
[88, 287, 1234, 77]
Here's the answer for two sparse matrices: just repeat this n-1 times.
import numpy as np
def spmax(X, Y):
    # X, Y: two CSR sparse matrices
    sX = X.copy(); sX.data[:] = 1
    sY = Y.copy(); sY.data[:] = 1
    sXY = sX + sY; sXY.data[:] = 1
    X = X + sXY; X.data = X.data - 1
    Y = Y + sXY; Y.data = Y.data - 1
    maxXY = X.copy()
    maxXY.data = np.amax(np.c_[X.data, Y.data], axis=1)
    return maxXY
This is pretty slow though. Hopefully, they'll implement this in scipy.sparse at some point. This is a pretty basic operation.
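For what it's worth (my addition, assuming a reasonably recent SciPy): sparse matrices expose an elementwise .maximum method, so the pairwise idea above can simply be folded over the whole list:

```python
from functools import reduce

from scipy.sparse import csr_matrix

my_list = [csr_matrix([[0, 3, 99, 3]]),
           csr_matrix([[4, 2, 1234, 0]]),
           csr_matrix([[88, 287, 0, 77]])]
# Fold the elementwise maximum across all sparse rows
row_max = reduce(lambda X, Y: X.maximum(Y), my_list)
print(row_max.toarray()[0])  # [  88  287 1234   77]
```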
