Equations containing outer products of vectors

Suppose A = (x)(x)^T, where x is a column vector.
From the diagonal elements of A we know the magnitudes of the entries of x, but their signs remain unknown. For example:
import numpy as np
A = np.array([[ 1.562, -0.833, -0.833, -0.031, -0.031,  0.167],
              [-0.833,  0.795,  0.167, -0.149,  0.167, -0.146],
              [-0.833,  0.167,  0.795,  0.167, -0.149, -0.146],
              [-0.031, -0.149,  0.167,  1.68 , -0.833, -0.833],
              [-0.031,  0.167, -0.149, -0.833,  1.68 , -0.833],
              [ 0.167, -0.146, -0.146, -0.833, -0.833,  1.792]])
np.sqrt(A.diagonal())
>>> array([ 1.24979998,  0.89162773,  0.89162773,  1.29614814,  1.29614814,
            1.33865604])
But we still don't know the signs. With a mask we have the signs of the products:
A > 0
>>> array([[ True, False, False, False, False,  True],
           [False,  True,  True, False,  True, False],
           [False,  True,  True,  True, False, False],
           [False, False,  True,  True, False, False],
           [False,  True, False, False,  True, False],
           [ True, False, False, False, False,  True]], dtype=bool)
How can I find the signs of the elements of x?

Note that (-x)(-x)^T = (x)(x)^T, so you can't distinguish x from -x. Given that, you can determine the sign pattern (i.e. you can determine whether two elements have the same or opposite signs). In fact, since each row of A is a scalar multiple of x, each row gives you the sign pattern (unless the row is all 0, which is possible if an element of x is 0). The same holds for the columns.
Note that your example A cannot be a product of the form (x)(x)^T: it has full rank, while the maximum possible rank of (x)(x)^T is 1.
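You can check that requirement directly (np.linalg.matrix_rank is standard NumPy):
np.linalg.matrix_rank(A)   # 6 for the example A above: full rank, so A cannot equal np.outer(x, x)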
For an example of the sign pattern in a genuine rank-1 product:
In [14]: x = np.array([1.0, -2.0, -3.0, 4.0])

In [15]: np.outer(x, x)
Out[15]:
array([[  1.,  -2.,  -3.,   4.],
       [ -2.,   4.,   6.,  -8.],
       [ -3.,   6.,   9., -12.],
       [  4.,  -8., -12.,  16.]])
Note the sign pattern in the product. Each row (and each column) is either (+, -, -, +) or (-, +, +, -).
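When A really is an outer product, that observation recovers x up to the global sign. A minimal sketch, assuming A = np.outer(x, x) exactly (pick a different row if the first one is all zeros):
def recover_up_to_sign(A):
    mags = np.sqrt(A.diagonal())   # |x_i| from the diagonal
    row = A[0]                     # each row is a scalar multiple of x
    return np.sign(row) * mags     # x itself if x[0] > 0, else -x

x = np.array([1.0, -2.0, -3.0, 4.0])
print(recover_up_to_sign(np.outer(x, x)))   # [ 1. -2. -3.  4.]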

In general, you can't.
For example, imagine the matrix A == [1].
How should anyone know whether x is [1] or [-1]?

Related

Appending to a multidimensional array Python

I am filtering the arrays a and b for matching values and then appending the matches to a new array difference; however, I get the error: ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 2. How would I be able to fix this?
import numpy as np
a = np.array([[0,12],[1,40],[0,55],[1,23],[0,123.5],[1,4]])
b = np.array([[0,3],[1,10],[0,55],[1,34],[1,122],[0,123]])
difference = np.array([[]])
for i in a:
    for j in b:
        if np.allclose(i, j, atol=0.5):
            difference = np.concatenate((difference, [i]))
Expected Output:
[[ 0. 55.],[ 0. 123.5]]
The problem is that you are trying to concatenate an array whose rows have size 0
difference = np.array([[]]) # Specifically [ [<no elements>] ]
with an array whose rows have size 2:
np.concatenate((difference, [i])) # Specifically [i], which is [ [ 0., 55.] ]
Instead of initializing difference as np.array([[]]), which has zero columns, create an empty array with the right trailing dimension via .reshape():
# difference = np.array([[]]) # Old code
difference = np.array([]).reshape(0, 2) # Updated code
Output
[[  0.   55. ]
 [  0.  123.5]]
np.array([[]]) shape is (1, 0). To make it work, it should be (0, 2):
difference= np.zeros((0, *a.shape[1:]))
In [22]: a = np.array([[0,12],[1,40],[0,55],[1,23],[0,123.5],[1,4]])
    ...: b = np.array([[0,3],[1,10],[0,55],[1,34],[1,122],[0,123]])
Using a straight forward list comprehension:
In [23]: [i for i in a for j in b if np.allclose(i,j,atol=0.5)]
Out[23]: [array([ 0., 55.]), array([ 0. , 123.5])]
But as for your concatenate, look at the shapes of the arrays:
In [24]: np.array([[]]).shape
Out[24]: (1, 0)
In [25]: np.array([i]).shape
Out[25]: (1, 2)
Those can only be joined on axis 1; the default is axis 0, which gives you the error. As I wrote in the comment, you have to understand array shapes to use concatenate.
In [26]: difference = np.array([[]])
    ...: for i in a:
    ...:     for j in b:
    ...:         if np.allclose(i, j, atol=0.5):
    ...:             difference = np.concatenate((difference, [i]), axis=1)
    ...:
In [27]: difference
Out[27]: array([[  0. ,  55. ,   0. , 123.5]])
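If you do join on axis 1 like this, a final reshape recovers the row pairs:
In [28]: difference.reshape(-1, 2)
Out[28]:
array([[  0. ,  55. ],
       [  0. , 123.5]])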
vectorized
A whole-array approach:
Broadcast a against b, producing a (6,6,2) closeness array:
In [37]: np.isclose(a[:,None,:], b[None,:,:], atol=0.5)
Out[37]:
array([[[ True, False],
        [False, False],
        [ True, False],
        [False, False],
        [False, False],
        [ True, False]],

       [[False, False],
        [ True, False],
        [False, False],
        [ True, False],
        [ True, False],
        [False, False]],

       [[ True, False],
        [False, False],
        [ True,  True],
        [False, False],
        [False, False],
        [ True, False]],

       [[False, False],
        [ True, False],
        [False, False],
        [ True, False],
        [ True, False],
        [False, False]],

       [[ True, False],
        [False, False],
        [ True, False],
        [False, False],
        [False, False],
        [ True,  True]],

       [[False, False],
        [ True, False],
        [False, False],
        [ True, False],
        [ True, False],
        [False, False]]])
Find where both columns are true, and where at least one "row" is:
In [38]: _.all(axis=2)
Out[38]:
array([[False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False,  True, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False, False, False,  True],
       [False, False, False, False, False, False]])

In [39]: _.any(axis=1)
Out[39]: array([False, False,  True, False,  True, False])

In [40]: a[_]
Out[40]:
array([[  0. ,  55. ],
       [  0. , 123.5]])
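Putting the steps together, the whole-array approach condenses to two lines (same a and b as above):
close = np.isclose(a[:,None,:], b[None,:,:], atol=0.5)   # (6,6,2)
difference = a[close.all(axis=2).any(axis=1)]
# array([[  0. ,  55. ],
#        [  0. , 123.5]])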

NumPy masked operation?

Say there's a np.float32 matrix A of shape (N, M). Together with A, I possess another matrix B, of type np.bool, of the exact same shape (elements from A can be mapped 1:1 to B). Example:
A =
[
    [0.1, 0.2, 0.3],
    [4.02, 123.4, 534.65],
    [2.32, 22.0, 754.01],
    [5.41, 23.1, 1245.5],
    [6.07, 0.65, 22.12],
]
B =
[
    [True, False, True],
    [False, False, True],
    [True, True, False],
    [True, True, True],
    [True, False, True],
]
Now, I'd like to perform np.max, np.min, np.argmax and np.argmin on axis=1 of A, but only considering elements A[i,j] for which B[i,j] == True. Is it possible to do something like this in NumPy? The for-loop version is trivial, but I'm wondering whether I can get some of that juicy NumPy speed.
The result for A, B and np.max (for example) would be:
[ 0.3, 534.65, 22.0, 1245.5, 22.12 ]
I've avoided ma because I've heard that the computation gets very slow and I don't feel like specifying fill_value makes sense in this context. I just want the numbers to be ignored.
Also, if it matters at all in my case, N is in the thousands and M is in the single digits.
This is a textbook application for masked arrays. But as always there are other ways to do it.
import numpy as np
A = np.array([[ 0.1 ,   0.2 ,    0.3 ],
              [ 4.02, 123.4 ,  534.65],
              [ 2.32,  22.0 ,  754.01],
              [ 5.41,  23.1 , 1245.5 ],
              [ 6.07,   0.65,   22.12]])
B = np.array([[ True, False,  True],
              [False, False,  True],
              [ True,  True, False],
              [ True,  True,  True],
              [ True, False,  True]])
With nanmax etc.
You could cast the 'invalid' values to NaN (say), then use NumPy's special NaN-ignoring functions:
>>> A[~B] = np.nan # <-- Note this mutates A
>>> np.nanmax(A, axis=1)
array([3.0000e-01, 5.3465e+02, 2.2000e+01, 1.2455e+03, 2.2120e+01])
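If mutating A is a problem, np.where can build the NaN-filled copy instead (a small variant, starting from the original A):
>>> masked = np.where(B, A, np.nan)   # A itself stays untouched
>>> np.nanmax(masked, axis=1)
array([3.0000e-01, 5.3465e+02, 2.2000e+01, 1.2455e+03, 2.2120e+01])
>>> np.nanargmax(masked, axis=1)
array([2, 2, 1, 2, 2])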
The catch is that, while np.nanmax, np.nanmin, np.nanargmax, and np.nanargmin all exist, lots of functions don't have a non-NaN twin, so you might have to come up with something else eventually.
With ma
It seems weird not to mention masked arrays, which are straightforward. Notice that the mask is (to my mind anyway) 'backwards'. That is, True means the value is 'masked' or invalid and will be ignored. Hence having to negate B with the tilde. Then you can do what you want with the masked array:
>>> X = np.ma.masked_array(A, mask=~B) # <--- Note the tilde.
>>> np.max(X, axis=1)
masked_array(data=[0.3, 534.65, 22.0, 1245.5, 22.12],
             mask=[False, False, False, False, False],
       fill_value=1e+20)
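The arg-variants respect the mask in the same way:
>>> X.argmax(axis=1)
array([2, 2, 1, 2, 2])
>>> X.argmin(axis=1)
array([0, 2, 0, 0, 0])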

Identification of rows containing column median in numpy matrix of cum percentiles

Consider the matrix quantiles that's a subset [:8,:3,0] of a 3D matrix with shape (10,355,8).
quantiles = np.array([
    [ 1.        ,  1.        ,  1.        ],
    [ 0.63763978,  0.61848863,  0.75348137],
    [ 0.43439645,  0.42485407,  0.5341457 ],
    [ 0.22682343,  0.18878366,  0.25253915],
    [ 0.16229408,  0.12541476,  0.15263742],
    [ 0.12306046,  0.10372971,  0.09832783],
    [ 0.09271845,  0.08209844,  0.05982584],
    [ 0.06363636,  0.05471266,  0.03855727]])
I want a boolean output of the same shape as the quantiles matrix where True marks the row in which the median is located:
In [21]: medians
Out[21]:
array([[False, False, False],
       [ True,  True, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
To achieve this, I have the following algorithm in mind:
1) Identify the entries that are greater than .5:
In [22]: quantiles>.5
Out[22]:
array([[ True,  True,  True],
       [ True,  True,  True],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
2) Considering only the values subset by the quantiles>.5 operation, mark the row that minimizes the np.abs distance between the entry and .5. Torturing the terminology a bit, I wish to intersect the two matrices np.argmin(np.abs(quantiles-.5),axis=0) and quantiles>.5 to get the above result. However, I cannot for the life of me figure out a way to perform the np.argmin on the subset while retaining the shape of the quantiles matrix.
PS. Yes, there is a similar question here but it doesn't implement my algorithm which could be, I think, more efficient on a larger scale
Bumping into the old mask operation in NumPy, I found the following solution:
import numpy.ma as ma
# mask quantiles that are less than .5
masked_quantiles = ma.masked_where(quantiles < .5, quantiles)
# identify the minimum in each column of the masked array
median_idx = np.where(masked_quantiles == masked_quantiles.min(axis=0))
# make a matrix of all False values
median_mat = np.zeros(quantiles.shape, dtype=bool)
# assign True to the corresponding rows
In [86]: median_mat[median_idx] = True
In [87]: median_mat
In [87]: median_mat
Out[87]:
array([[False, False, False],
       [ True,  True, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
Update: comparison of my answer to that of Divakar's:
I ran two comparisons, one on the sample 2D matrix provided for this question and one on my 3D (10,380,8) dataset (not large data by any means).
Sample dataset:
My code
%%timeit
masked_quantiles = ma.masked_where(quantiles<=.5,quantiles)
median_idx = masked_quantiles.argmin(0)
10000 loops, best of 3: 65.1 µs per loop
Divakar's code
%%timeit
mask1 = quantiles<=0.5
min_idx = (quantiles+mask1).argmin(0)
The slowest run took 17.49 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.92 µs per loop
Full dataset
My code:
%%timeit
masked_quantiles = ma.masked_where(quantiles<=.5,quantiles)
median_idx = masked_quantiles.argmin(0)
1000 loops, best of 3: 490 µs per loop
Divakar's code:
%%timeit
mask1 = quantiles<=0.5
min_idx = (quantiles+mask1).argmin(0)
10000 loops, best of 3: 172 µs per loop
Conclusion:
Divakar's answer is about 3-12 times faster than mine. I presume that the ma.masked_where operation takes longer than a plain matrix addition. However, the addition result needs to be stored, whereas masking may be more memory-efficient on larger datasets; I wonder how the two would compare on data that doesn't (or barely does) fit into memory.
Approach #1
Here's an approach using broadcasting and some masking trick -
# Mask of quantiles lesser than or equal to 0.5 to select the invalid ones
mask1 = quantiles<=0.5
# Since we are dealing with quantiles, the elems won't be > 1,
# which can be leveraged here as we will add 1s to invalid elems, and
# then look for argmin across each col
min_idx = (np.abs(quantiles-0.5)+mask1).argmin(0)
# Let some broadcasting magic happen here!
out = min_idx == np.arange(quantiles.shape[0])[:,None]
Step-by-step run
1) Input :
In [37]: quantiles
Out[37]:
array([[ 1.        ,  1.        ,  1.        ],
       [ 0.63763978,  0.61848863,  0.75348137],
       [ 0.43439645,  0.42485407,  0.5341457 ],
       [ 0.22682343,  0.18878366,  0.25253915],
       [ 0.16229408,  0.12541476,  0.15263742],
       [ 0.12306046,  0.10372971,  0.09832783],
       [ 0.09271845,  0.08209844,  0.05982584],
       [ 0.06363636,  0.05471266,  0.03855727]])
2) Run the code :
In [38]: mask1 = quantiles<=0.5
    ...: min_idx = (np.abs(quantiles-0.5)+mask1).argmin(0)
    ...: out = min_idx == np.arange(quantiles.shape[0])[:,None]
    ...:
3) Analyze output at each step :
In [39]: mask1
Out[39]:
array([[False, False, False],
       [False, False, False],
       [ True,  True, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

In [40]: np.abs(quantiles-0.5)+mask1
Out[40]:
array([[ 0.5       ,  0.5       ,  0.5       ],
       [ 0.13763978,  0.11848863,  0.25348137],
       [ 1.06560355,  1.07514593,  0.0341457 ],
       [ 1.27317657,  1.31121634,  1.24746085],
       [ 1.33770592,  1.37458524,  1.34736258],
       [ 1.37693954,  1.39627029,  1.40167217],
       [ 1.40728155,  1.41790156,  1.44017416],
       [ 1.43636364,  1.44528734,  1.46144273]])

In [41]: (np.abs(quantiles-0.5)+mask1).argmin(0)
Out[41]: array([1, 1, 2])

In [42]: min_idx == np.arange(quantiles.shape[0])[:,None]
Out[42]:
array([[False, False, False],
       [ True,  True, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
Performance boost: following the comments, min_idx can be obtained more cheaply. Since every valid quantile exceeds 0.5, the valid value closest to 0.5 is simply the smallest valid value, so the np.abs term can be dropped:
min_idx = (quantiles+mask1).argmin(0)
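For reuse, the whole of Approach #1 collapses into a small function (a sketch assuming the quantiles lie in [0, 1] and every column has at least one entry above 0.5):
def median_row_mask(quantiles):
    mask1 = quantiles <= 0.5                  # invalid entries
    min_idx = (quantiles + mask1).argmin(0)   # +1 pushes invalid entries past any valid one
    return min_idx == np.arange(quantiles.shape[0])[:,None]

median_row_mask(quantiles)   # reproduces the `medians` array shown earlier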
Approach #2
This is focused on memory efficiency.
# Mask of quantiles greater than 0.5 to select the valid ones
mask = quantiles>0.5
# Select valid elems
vals = quantiles.T[mask.T]
# Get valid count per col
count = mask.sum(0)
# Get the min val per col given the mask
minval = np.minimum.reduceat(vals,np.append(0,count[:-1].cumsum()))
# Get final boolean array by just comparing the min vals across each col
out = np.isclose(quantiles,minval)

compare tuple with tuples in numpy array

I have an array (dtype=object) with the first column containing tuples of arrays and the second column containing scalars. I want all scalars from the second column where the tuples in the first column equal a certain tuple.
Say
>>> X
array([[(array([ 21.]), array([ 13.])), 0.29452519286647716],
       [(array([ 25.]), array([ 9.])), 0.9106600600510809],
       [(array([ 25.]), array([ 13.])), 0.8137344043493814],
       [(array([ 25.]), array([ 14.])), 0.8143093864975313],
       [(array([ 25.]), array([ 15.])), 0.6004337591112664],
       [(array([ 25.]), array([ 16.])), 0.6239450452872853],
       [(array([ 21.]), array([ 13.])), 0.32082105959687424]], dtype=object)
and I want all rows where the 1st column equals X[0,0].
ar = X[0,0]
>>> ar
(array([ 21.]), array([ 13.]))
I thought checking X[:,0]==ar would find those rows; I would then retrieve the final result with X[X[:,0]==ar, 1].
What seems to happen, however, is that ar is interpreted as a 2-dimensional array, and each single element of ar is compared to the tuples in X[:,0]. This yields, in this case, a 2x7 array with all entries equal to False. In contrast, the comparison X[0,0]==ar works just as I would want, giving True.
Why is that happening and how can I fix it to obtain the desired result?
Comparison using list comprehension works:
In [176]: [x==ar for x in X[:,0]]
Out[176]: [True, False, False, False, False, False, True]
This is comparing tuples with tuples.
Comparing tuple ids gives a different result
In [175]: [id(x)==id(ar) for x in X[:,0]]
Out[175]: [True, False, False, False, False, False, False]
since the 2nd match has a different id.
In [177]: X[:,0]==ar
Out[177]:
array([[False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False]], dtype=bool)
returns a (2,7) result because it is, in effect, comparing a (7,) array with a (2,1) array (np.array(ar)).
But this works like the comprehension:
In [190]: ar1=np.zeros(1,dtype=object)
In [191]: ar1[0]=ar
In [192]: ar1
Out[192]: array([(array([ 21.]), array([ 13.]))], dtype=object)
In [193]: X[:,0]==ar1
Out[193]: array([ True, False, False, False, False, False, True], dtype=bool)
ar1 is a 1-element array containing the ar tuple. Now the comparison with the elements of X[:,0] proceeds as expected.
np.array(...) tries to create as high-dimensional an array as the input data allows. That is why it turns a 2-element tuple into a 2-element array. I had to do a 2-step assignment to get around that default.
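With ar1 in hand, the selection the question asked for falls out directly:
In [194]: X[X[:,0]==ar1, 1]
Out[194]: array([0.29452519286647716, 0.32082105959687424], dtype=object)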

How to efficiently use an index array as a mask to turn a numpy array into a boolean array?

I have a numpy array like this:
>>> I
array([[ 1.,  0.,  2.,  1.,  0.],
       [ 0.,  2.,  1.,  0.,  2.]])
And an array A like this:
>>> A = np.ones((2,5,3))
I'd like to obtain the following matrix:
>>> result
array([[[False, False,  True],
        [False,  True,  True],
        [False, False, False],
        [False, False,  True],
        [False,  True,  True]],

       [[False,  True,  True],
        [False, False, False],
        [False, False,  True],
        [False,  True,  True],
        [False, False, False]]], dtype=bool)
It is better explained with an example: I[0,0] = 1 -> result[0,0,:2] = False and result[0,0,2:] = True; I[1,0] = 0 -> result[1,0,0] = False and result[1,0,1:] = True.
Here is my current implementation (correct):
result = np.empty((A.shape[0], A.shape[1], A.shape[2]))
r = np.arange(A.shape[2])
for i in xrange(A.shape[0]):
    result[i] = r > np.vstack(I[i])
print result.astype(np.bool)
Is there a way to implement in a faster way (avoiding the for loop)?
Thanks!
You just need to add another dimension on to I, such that you can broadcast r properly:
result = r > I.reshape(I.shape[0],I.shape[1],1)
e.g.
In [41]: r > I.reshape(2,5,1)
Out[41]:
array([[[False, False,  True],
        [False,  True,  True],
        [False, False, False],
        [False, False,  True],
        [False,  True,  True]],

       [[False,  True,  True],
        [False, False, False],
        [False, False,  True],
        [False,  True,  True],
        [False, False, False]]], dtype=bool)
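Equivalently, indexing with None (np.newaxis) adds the trailing axis without spelling out the shape:
result = r > I[:,:,None]   # same as r > I.reshape(I.shape[0], I.shape[1], 1)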
