concatenate numpy columns in different positions - python

I have an array x = np.empty([2,3]). Assume I have two sets of logical (boolean) indices, indx1 and indx2, each paired with a different block of columns, set1 and set2:
indx1 = [False,False,True]
set1 = np.array([[-1],[-1]])
indx2 = [True,True,False]
set2 = np.array([[1,2],[1,2]])
# need to join these two write operations into one.
x[:,indx1] = set1
x[:,indx2] = set2
>>> x
array([[1., 2., -1.],
[1., 2., -1.]])
How can I use indx1 and indx2 at the same time? For instance, I am looking for something like this (which does not work):
x[:,[indx1,indx2]] = [set1,set2]

In your case the arrays have different shapes, so the axis you concatenate along matters (axis=0 stacks arrays that have the same number of columns; axis=1 joins them side by side as extra columns, which is what applies here).
For the simplest concatenation:
import numpy as np
set1 = np.array([[3],[3]])
set2 = np.array([[1,2],[1,2]])
indx1 = [False,False,True]
indx2 = [True,True,False]
sets = np.concatenate((set1, set2), axis=1)
np.concatenate((indx1, indx2), axis=0)
sets.sort()
Output sets (after the sort):
array([[1, 2, 3],
       [1, 2, 3]])
Output index:
array([False, False,  True,  True,  True, False])
If you want to keep the sets correlated with the indices, please show the exact output you expect.
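Building on the concatenation idea, here is a hedged sketch (not part of the original answer) that turns the boolean masks into explicit column positions and performs both writes in a single statement:
import numpy as np
x = np.empty([2, 3])
indx1 = [False, False, True]
indx2 = [True, True, False]
set1 = np.array([[-1], [-1]])
set2 = np.array([[1, 2], [1, 2]])
cols = np.concatenate([np.flatnonzero(indx1), np.flatnonzero(indx2)])  # column positions: [2, 0, 1]
x[:, cols] = np.hstack([set1, set2])  # one combined write of both column blocks
# x is now [[ 1.,  2., -1.], [ 1.,  2., -1.]]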

I did not manage to find an exact solution to the problem, but maybe (depending on how you generate the sets and indices), this will lead you in the right direction.
Let's suppose that, instead of the sparse definition of set1 and set2, you have dense arrays, each with the same size as x:
indx1 = [False,False,True]
indx2 = [True,True,False]
fullset1 = np.array([[0, 0, -1],
[0, 0, -1]])
fullset2 = np.array([[1, 2, 0],
[1, 2, 0]])
x = np.select( [indx1, indx2], [fullset1, fullset2] )
print(x)
#[[1 2 -1]
# [1 2 -1]]
It works with one command and can be easily extended if you have indx3, indx4, etc. However, I see several drawbacks. First, it creates a new variable that satisfies the conditions, which may not be your use case. Also, if there is an index that is set to false for all indx variables, the result might be unexpected:
indx1 = [False,False,True,False]
indx2 = [True,True,False,False]
fullset1 = np.array([[0, 0, -1, 0],
[0, 0, -1, 0]])
fullset2 = np.array([[1, 2, 0, 0],
[1, 2, 0, 0]])
x = np.select( [indx1, indx2], [fullset1, fullset2], default=None )
print(x)
#[[1 2 -1 None]
# [1 2 -1 None]]
In that case, my proposal (though I haven't tested the performance) would be to use an intermediate variable and np.where to fill the final variable:
x = np.array([[11, 12, 13, 14],
[15, 16, 17, 18]])
#....
intermediate_x = np.select( [indx1, indx2], [fullset1, fullset2], default=None )
indx_final = np.where(intermediate_x != None)
x[indx_final] = intermediate_x[indx_final]
print(x)
#[[ 1 2 -1 14]
# [ 1 2 -1 18]]
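A hedged variant of the same idea (not from the original answer) avoids the object-dtype intermediate altogether: build a combined mask of the columns covered by any indx and let np.where keep x's original values everywhere else:
covered = np.logical_or(indx1, indx2)  # columns touched by either mask
x = np.where(covered, np.select([indx1, indx2], [fullset1, fullset2]), x)
print(x)
#[[ 1  2 -1 14]
# [ 1  2 -1 18]]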


Dynamic way to compute linear constraints with multiple operators

Imagine a matrix A whose first column holds inequality/equality operators (≥, =, ≤) and a vector b, where the number of rows in A equals the number of elements in b. One row, in my setting, would then be computed by, e.g.,
dot(A[0, 1:], x) ≥ b[0]
where x is some vector, column A[:, 0] holds all the operators, and we know that row 0 is supposed to be evaluated with the ≥ operator (i.e. A[0,0] == "≥" is true). Now, is there a way to dynamically evaluate all rows in the following, so far imaginary, way:
dot(A[:, 1:], x)   A[:, 0]   b
My hope was for a dynamic evaluation of each row where we evaluate which operator is used for each row.
Example, let
A = [
[">=", -2, 1, 1],
[">=", 0, 1, 0],
["==", 0, 1, 1]
]
b = [0, 1, 1]
and x be some given vector, e.g. x = [1,1,0] we wish to compute as following
    A[:, 1:]         x       A[:, 0]  b
dot([-2, 1, 1], [1, 1, 0]) >= 0
dot([0, 1, 0], [1, 1, 0]) >= 1
dot([0, 1, 1], [1, 1, 0]) == 1
The output would be [False, True, True]
If I understand correctly, this is a way to do that operation:
import numpy as np
# Input data
a = [
[">=", -2, 1, 1],
[">=", 0, 1, 0],
["==", 0, 1, 1]
]
b = np.array([0, 1, 1])
x = np.array([1, 1, 0])
# Split in comparison and data
a0 = np.array([lst[0] for lst in a])
a1 = np.array([lst[1:] for lst in a])
# Compute dot product
c = a1 @ x
# Compute comparisons
leq = c <= b
eq = c == b
geq = c >= b
# Find comparison index for each row
cmps = np.array(["<=", "==", ">="]) # This array is lex sorted
cmp_idx = np.searchsorted(cmps, a0)
# Select the right result for each row
result = np.choose(cmp_idx, [leq, eq, geq])
# Convert to numeric type if preferred
result = result.astype(np.int32)
print(result)
# [0 1 1]
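For smaller inputs, a row-by-row variant using the standard operator module may be easier to read (a hedged sketch reusing the same a, b and x as above; it trades vectorisation for simplicity):
import operator
ops = {"<=": operator.le, "==": operator.eq, ">=": operator.ge}  # map operator strings to functions
result = np.array([ops[row[0]](np.dot(row[1:], x), rhs) for row, rhs in zip(a, b)])
print(result)
# [False  True  True]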

classify np.arrays as duplicates

My goal is to take a list of np.arrays and create an associated list or array that classifies each as having a duplicate or not. Here's what I thought would work:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
count_dict = dict(zip(uniques, counts))
[count_dict[i] for i in www]
The desired output for this case would be :
[1, 1, 0]
because the first and second elements have another copy within the original list. It seems that the problem is that I cannot use an np.array as a key for a dictionary.
Suggestions?
First convert www to a 2D NumPy array (www = np.array(www)), then do the following:
In [18]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[18]: array([1, 1, 0])
Here we use broadcasting to check the equality of all rows of www against the uniques array, and then use all() on the last axis to find out which rows are completely equal to rows of uniques.
Here's the elaborated results:
In [20]: (www[:,None] == uniques).all(2)
Out[20]:
array([[ True, False],
[ True, False],
[False, True]])
# Respective indices in `counts` array
In [21]: np.where((www[:,None] == uniques).all(2))[1]
Out[21]: array([0, 0, 1])
In [22]: counts[np.where((www[:,None] == uniques).all(2))[1]] > 1
Out[22]: array([ True, True, False])
In [23]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[23]: array([1, 1, 0])
In Python, lists (and numpy arrays) cannot be hashed, so they can't be used as dictionary keys. But tuples can! So one option would be to convert the arrays in your original list to tuples, and to convert the rows of uniques to tuples as well. The following works for me:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
www_tuples = [tuple(l) for l in www] # list of tuples
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
# convert uniques to tuples
uniques_tuples = [tuple(l) for l in uniques]
count_dict = dict(zip(uniques_tuples, counts))
[count_dict[i] for i in www_tuples]
Just a heads-up: this will double your memory consumption, so it may not be the best solution if www is large.
You can mitigate the extra memory consumption by ingesting your data as tuples instead of numpy arrays if possible.
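A more compact variant (a hedged sketch, assuming a reasonably recent NumPy) sidesteps the dictionary entirely by asking np.unique for the inverse mapping together with the counts:
import numpy as np
www = np.array([[1, 1, 1], [1, 1, 1], [2, 1, 1]])
uniques, inverse, counts = np.unique(www, axis=0, return_inverse=True, return_counts=True)
result = (counts[inverse.ravel()] > 1).astype(int)  # ravel(): the shape of inverse varies across NumPy versions
print(result)
# [1 1 0]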

Replace values in subarray based upon dynamic condition in Numpy

I have a NumPy array that is 2D, where each row is a subarray of 3 integers. For example:
[ [2, 3, 4], [9, 8, 7], ... [15, 14, 16] ]
For each subarray I want to replace the lowest number with a 1 and all other numbers with a 0. So the desired output from the above example would be:
[ [1, 0, 0], [0, 0, 1], ... [0, 1, 0] ]
This is a large array, so I want to exploit Numpy performance. I know about using conditions to operate on array elements, but how do I do this when the condition is dynamic? In this instance the condition needs to be something like:
newarray = (a == min(a)).astype(int)
But how do I do this across each subarray?
You can specify the axis parameter to calculate a 2D array of per-row minimums (keeping the dimensions of the result via keepdims); then, when you compare a against those row minimums, you get True at the minimum position of each subarray:
(a == a.min(1, keepdims=True)).astype(int)
#array([[1, 0, 0],
# [0, 0, 1],
# [0, 1, 0]])
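One behavioural detail worth noting, in case ties can occur in your data: this comparison marks every occurrence of a tied row minimum with 1, whereas the argmin-based approach below marks only the first one. For example:
a = np.array([[5, 3, 3]])  # row with a tied minimum
(a == a.min(1, keepdims=True)).astype(int)  # array([[0, 1, 1]]) -- both minima flagged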
How about this?
import numpy as np
a = np.random.random((4,3))
i = np.argmin(a, axis=-1)
out = np.zeros(a.shape, int)
out[np.arange(out.shape[0]), i] = 1
print(a)
print(out)
Sample output:
# [[ 0.58321885 0.18757452 0.92700724]
# [ 0.58082897 0.12929637 0.96686648]
# [ 0.26037634 0.55997658 0.29486454]
# [ 0.60398426 0.72253012 0.22812904]]
# [[0 1 0]
# [0 1 0]
# [1 0 0]
# [0 0 1]]
It appears to be marginally faster than the direct approach:
from timeit import timeit
def dense():
    return (a == a.min(1, keepdims=True)).astype(int)
def sparse():
    i = np.argmin(a, axis=-1)
    out = np.zeros(a.shape, int)
    out[np.arange(out.shape[0]), i] = 1
    return out
for shp in ((4,3), (10000,3), (100,10), (100000,1000)):
    a = np.random.random(shp)
    d = timeit(dense, number=40)/40
    s = timeit(sparse, number=40)/40
    print('shape, dense, sparse, ratio', '({:6d},{:6d}) {:9.6g} {:9.6g} {:9.6g}'.format(*shp, d, s, d/s))
Sample run:
# shape, dense, sparse, ratio ( 4, 3) 4.22172e-06 3.1274e-06 1.34992
# shape, dense, sparse, ratio ( 10000, 3) 0.000332396 0.000245348 1.35479
# shape, dense, sparse, ratio ( 100, 10) 9.8944e-06 5.63165e-06 1.75693
# shape, dense, sparse, ratio (100000, 1000) 0.344177 0.189913 1.81229

Output fractional amount of "incorrect" values in an array with python

I have a method that will predict some data and output it to a numpy array, called Y_predict. I then have a numpy array called Y_real which stores the real values of Y that should have been predicted.
For example:
Y_predict = [1, 0, 2, 1]
Y_real = [1, 0, 1, 1]
I then want an array called errRate[] which will check whether Y_predict[i] == Y_real[i]. Any value that does not match Y_real should be noted. Finally, the output should be the fraction of correct predictions. In the case above, this would be 0.75, since Y_predict[2] = 2 while Y_real[2] = 1.
Is there some way either in numpy or python to quickly compute this rate?
Since they're numpy arrays, this is relatively straightforward:
>>> p
array([1, 0, 2, 1])
>>> r
array([1, 0, 1, 1])
>>> p == r
array([ True, True, False, True], dtype=bool)
>>> (p == r).mean()
0.75
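If Y_predict and Y_real start out as plain lists, np.asarray makes the same one-liner work; wrapped as a tiny hedged helper (the name accuracy is just illustrative):
import numpy as np
def accuracy(y_predict, y_real):
    # fraction of positions where the prediction matches the real value
    return (np.asarray(y_predict) == np.asarray(y_real)).mean()
accuracy([1, 0, 2, 1], [1, 0, 1, 1])  # 0.75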
Given these lists:
Y_predict = [1, 0, 2, 1]
Y_real = [1, 0, 1, 1]
The easiest way I can think of is using zip() within a list comp:
Y_rate = [int(x == y) for x, y in zip(Y_predict, Y_real)] # 1 if correct, 0 if incorrect
Y_rate_correct = sum(Y_rate) / len(Y_rate)
print( Y_rate_correct ) # this will print 0.75

python - increase array size and initialize new elements to zero

I have an array of a size 2 x 2 and I want to change the size to 3 x 4.
A = [[1 2 ],[2 3]]
A_new = [[1 2 0 0],[2 3 0 0],[0 0 0 0]]
I tried changing .shape but it didn't work, and append can only append rows, not columns. I don't want to iterate through each row to add the columns.
Is there any vectorized way to do this, like in MATLAB, where A(:,3:4) = 0; and A(3,:) = 0; convert A from 2 x 2 to 3 x 4? Is there a similar way in Python?
In Python, if the input is a numpy array, you can use np.lib.pad to pad zeros around it -
import numpy as np
A = np.array([[1, 2 ],[2, 3]]) # Input
A_new = np.lib.pad(A, ((0,1),(0,2)), 'constant', constant_values=(0)) # Output
Sample run -
In [7]: A # Input: A numpy array
Out[7]:
array([[1, 2],
[2, 3]])
In [8]: np.lib.pad(A, ((0,1),(0,2)), 'constant', constant_values=(0))
Out[8]:
array([[1, 2, 0, 0],
[2, 3, 0, 0],
[0, 0, 0, 0]]) # Zero padded numpy array
If you don't want to do the math of how many zeros to pad, you can let the code do it for you given the output array size -
In [29]: A
Out[29]:
array([[1, 2],
[2, 3]])
In [30]: new_shape = (3,4)
In [31]: shape_diff = np.array(new_shape) - np.array(A.shape)
In [32]: np.lib.pad(A, ((0,shape_diff[0]),(0,shape_diff[1])),
'constant', constant_values=(0))
Out[32]:
array([[1, 2, 0, 0],
[2, 3, 0, 0],
[0, 0, 0, 0]])
Or, you can start off with a zero initialized output array and then put back those input elements from A -
In [38]: A
Out[38]:
array([[1, 2],
[2, 3]])
In [39]: A_new = np.zeros(new_shape,dtype = A.dtype)
In [40]: A_new[0:A.shape[0],0:A.shape[1]] = A
In [41]: A_new
Out[41]:
array([[1, 2, 0, 0],
[2, 3, 0, 0],
[0, 0, 0, 0]])
In MATLAB, you can use padarray -
A_new = padarray(A,[1 2],'post')
Sample run -
>> A
A =
1 2
2 3
>> A_new = padarray(A,[1 2],'post')
A_new =
1 2 0 0
2 3 0 0
0 0 0 0
Pure Python way to achieve this:
row = 3
column = 4
A = [[1, 2],[2, 3]]
A_new = list(map(lambda x: x + ([0] * (column - len(x))), A + ([[0] * column] * (row - len(A)))))  # list(...) needed on Python 3
then A_new is [[1, 2, 0, 0], [2, 3, 0, 0], [0, 0, 0, 0]].
Good to know:
[x] * n will repeat x n-times
Lists can be concatenated using the + operator
Explanation:
map(function, list) will iterate over each item in list, pass it to function, and replace that item with the return value
A + ([[0] * column] * (row - len(A))): A is being extended with the remaining "zeroed" lists:
repeat the single-item list [0] column times
repeat that list by the remaining row count
([0] * (column - len(x))): for each row item x, pad it with a list of zeros for the remaining columns
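For what it's worth, the same construction can also be written as a nested list comprehension (a hedged, equivalent sketch, reusing row, column and A from above):
A_new = [[A[r][c] if r < len(A) and c < len(A[r]) else 0 for c in range(column)] for r in range(row)]
# [[1, 2, 0, 0], [2, 3, 0, 0], [0, 0, 0, 0]]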
Q: Is there a vectorised way to ...
A: Yes, there is
A = np.ones( (2,2) ) # numpy create/assign 1-s
B = np.zeros( (4,5) ) # numpy create/assign 0-s "padding" mat
B[:A.shape[0],:A.shape[1]] += A[:,:] # numpy vectorised .ADD at a cost of ~270 us
B[:A.shape[0],:A.shape[1]] = A[:,:] # numpy vectorised .STO at a cost of ~180 us
B[:A.shape[0],:A.shape[1]] = A # numpy high-level .STO at a cost of ~450 us
B
Out[4]:
array([[ 1., 1., 0., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
Q: Is it resource-efficient to "extend" A's data structure in a smart way "behind the curtain"?
A: No, not much, unfortunately. Try bigger, big or huge sizes to get a feel for the resource-allocation/processing costs...
NumPy has a genuine data structure "behind the curtain" that allows a lot of smart tricks like strided (re-)mapping, view-based operations and fast vectorised/broadcast operations; however, changing the memory layout "across the strided smart-mapping" is rather expensive.
For this reason, NumPy has included since 1.7.0 a built-in padding routine, np.lib.pad(), that is aware of and optimised for these "behind-the-curtain" structures, so it can handle them both smartly and fast.
B = np.lib.pad( A,
                ( ( 0, 2 ), ( 0, 3 ) ),
                'constant',
                constant_values = ( 0, 0 )
                ) # .pad() at a cost of ~ 270 us
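For reference, the same routine is also exposed directly as np.pad in current NumPy versions, so an equivalent call reproducing the (4,5) result above would be:
B = np.pad(A, ((0, 2), (0, 3)), 'constant', constant_values=0)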
