Use numpy array to do conditional operations on another array - python

Let's say I have 2 arrays:
a = np.array([2, 2, 0, 0, 2, 1, 0, 0, 0, 0, 3, 0, 1, 0, 0, 2])
b = np.array([0, 0.5, 0.25, 0.9])
What I would like to do is take each value in array a, use it as an index into array b, and multiply the two together.
So the first value in array a is 2. I want the value in array b at that index position to be multiplied by it. So in array b, index position 2's value is 0.25, so multiply that value (2) in array a by 0.25.
I know it can be done with iteration, but I'm trying to figure out how it's done with elementwise operations.
Here's the iteration way that I've done:
result = np.array([])
for idx in a:
    result = np.append(result, (b[idx] * idx))
To get the result:
print(result)
[0.5 0.5 0. 0. 0.5 0.5 0. 0. 0. 0. 2.7 0. 0.5 0. 0. 0.5]
What's an elementwise equivalent?

Integer arrays can be used as indices in numpy. As a consequence, you can simply do something like this
b[a] * a
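For the arrays in the question, a quick check:

import numpy as np

a = np.array([2, 2, 0, 0, 2, 1, 0, 0, 0, 0, 3, 0, 1, 0, 0, 2])
b = np.array([0, 0.5, 0.25, 0.9])
# b[a] gathers b at each index stored in a; multiplying by a itself
# reproduces the iterative result in one vectorized expression.
print(b[a] * a)
# [0.5 0.5 0.  0.  0.5 0.5 0.  0.  0.  0.  2.7 0.  0.5 0.  0.  0.5]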
EDIT:
Just for completeness: your iterative solution triggers a new memory allocation every time append is called (see the Returns section of the np.append documentation). Since you already know the shape of your output (i.e. a.shape), it's much better to allocate the output array in advance, e.g. result = np.empty(a.shape), and then fill it in the loop.
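A minimal sketch of that pre-allocation pattern (the enumerate loop is illustrative, not the original code):

import numpy as np

a = np.array([2, 2, 0, 0, 2, 1, 0, 0, 0, 0, 3, 0, 1, 0, 0, 2])
b = np.array([0, 0.5, 0.25, 0.9])
result = np.empty(a.shape)   # allocate once instead of growing with np.append
for i, idx in enumerate(a):
    result[i] = b[idx] * idx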

So there are a few ways to do this, but if you want purely element-wise operations you could do the following:
Before getting the result, each element of b is multiplied by its index. So create another vector n.
n = np.arange(len(b)) * b
# In the example, n now equals [0. , 0.5, 0.5, 2.7]
# then the result is just n indexed by a
result = n[a]
# result = [0.5, 0.5, 0. , 0. , 0.5, 0.5, 0. , 0. , 0. , 0. , 2.7, 0. , 0.5, 0. , 0. , 0.5]


Combining two multi-dimensional numpy arrays where one encodes the index information and the other encodes the array content

Here is the toy version of the problem I am facing:
Given the following two numpy arrays:
img = np.array([[0,1,1,2], [0,2,1,1]])
number = np.array([[0,0.1,0.1,0.2], [0.1,0,0.2,0.2]])
both img and number are 2 by 4 NumPy arrays. You can think of it as 2 participants in a study and 4 trials per participant. img encodes which image is presented at each trial, so its element is always an integer (0, 1, or 2) representing an image ID (image #0, #1, or #2), and there are in total 3 candidate images. Each image may occur more than once for each participant as shown in the example.
number is also a 2 by 4 NumPy array which encodes some numeric quantity corresponding to each image. You can think of it as a number presented to the participant above the image.
Within each participant, the number and image are uniquely paired. For example, img[0,1]=img[0,2]=1 means the first participant sees the same image (image #1) in the second and the third trial. Then it must follow that number[0,1]=number[0,2]. However, for the second participant, the pairing may change. While image #1 is paired with 0.1 for the first participant, it is instead paired with 0.2 for the second. The end product I want is something like the following:
goal = np.array([[[0,0],[0.1, 0.1],[0.2, 0.2]], [[0.1,0.1],[0.2, 0.2],[0, 0]]])
The goal is a 2x3x2 NumPy array. 2 again means the 2 participants, and 3 is the number of unique images used. In this example, the 3 unique images are indexed by 0, 1, and 2. The third dimension of size 2 just repeats the same number twice, which I do need. Can someone think of a way of doing this in a purely vectorized fashion?
Here is how I would do it using for loop (not exactly syntactically correct):
goal = np.empty((2,3,2))
img = extract_first_occurance_of_each_element(img)
number = extract_first_occurance_of_each_element(number)
for subj in range(subjects):
    for trial in range(3):
        img_idx = img[subj, trial]
        goal[subj, img_idx, :] = [number[subj, trial], number[subj, trial]]
Your iteration - cleaned up a bit:
In [7]: goal = np.zeros((2,3,2))
   ...: for subj in range(2):
   ...:     for trial in range(3):
   ...:         img_idx = img[subj, trial]
   ...:         goal[subj, img_idx,:] = [number[subj, trial], number[subj, trial]]
In [8]: goal
Out[8]:
array([[[0. , 0. ],
        [0.1, 0.1],
        [0. , 0. ]],

       [[0.1, 0.1],
        [0.2, 0.2],
        [0. , 0. ]]])
Not quite the same as the stated target, but close enough:
In [9]: np.array([[[0,0],[0.1, 0.1],[0.2, 0.2]], [[0.1,0.1],[0.2, 0.2],[0, 0]]])
Out[9]:
array([[[0. , 0. ],
        [0.1, 0.1],
        [0.2, 0.2]],

       [[0.1, 0.1],
        [0.2, 0.2],
        [0. , 0. ]]])
With multidimensional indexing:
In [26]: subj=np.arange(2)[:,None]; trial=np.arange(3)
In [27]: img_idx = img[subj, trial]; img_idx
Out[27]:
array([[0, 1, 1],
       [0, 2, 1]])
In [28]: goal = np.zeros((2,3,2))
In [29]: goal[subj,img_idx]=np.stack([number[subj,trial], number[subj, trial]], axis=2)
In [30]: goal
Out[30]:
array([[[0. , 0. ],
        [0.1, 0.1],
        [0. , 0. ]],

       [[0.1, 0.1],
        [0.2, 0.2],
        [0. , 0. ]]])
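As a footnote, the np.stack call can be replaced by broadcasting, since the last axis just repeats the same value twice; a sketch equivalent to the In [29] assignment:

goal = np.zeros((2,3,2))
# number[subj, trial] has shape (2, 3); the trailing axis of length 1
# broadcasts across goal's last dimension of size 2.
goal[subj, img_idx] = number[subj, trial][..., None]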

Create lower triangular matrix from a vector in python

I want to create a python program which computes a matrix from a vector with some coefficients. So let's say we have the following vector of coefficients c = [c0, c1, c2] = [0, 1, 0]; then I want to compute the lower triangular matrix

A = [[c0, 0,  0 ],
     [c1, c0, 0 ],
     [c2, c1, c0]]

So how do I go from the vector c to creating a lower triangular matrix A? I know how to index it manually, but I need a program that can do it. I was maybe thinking about a for-loop inside another for-loop, but I struggle with how it is done practically. What do you guys think should be done here?
One way (assuming you're using plain arrays and not numpy or anything):
src = [0, 1, 0]
dst = [
    [src[i-j] if i >= j else 0 for j in range(len(src))]
    for i in range(len(src))
]
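For c = [0, 1, 0] this gives the lower triangular layout from the question; a quick check:

print(dst)  # [[0, 0, 0], [1, 0, 0], [0, 1, 0]]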
You can try the following:
import numpy as np

c = [1, 2, 3, 4, 5]
n = len(c)
a = np.zeros((n, n))
for i in range(n):
    np.fill_diagonal(a[i:, :], c[i])
print(a)
It gives:
[[1. 0. 0. 0. 0.]
 [2. 1. 0. 0. 0.]
 [3. 2. 1. 0. 0.]
 [4. 3. 2. 1. 0.]
 [5. 4. 3. 2. 1.]]
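For what it's worth, scipy offers the same construction without an explicit loop; a sketch using scipy.linalg.toeplitz, where the zero first row is what clears the upper triangle:

import numpy as np
from scipy.linalg import toeplitz

c = [1, 2, 3, 4, 5]
# First column is c; the first row is zeros (its first entry is taken
# from c), so everything above the diagonal stays zero.
a = toeplitz(c, np.zeros(len(c)))
print(a)  # same lower triangular matrix as above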

Truncating a 2D array for a given tolerance [Python]

An old question on Singular Value Decomposition led me to ask this question:
How could I truncate a 2-Dimensional array, to a number of columns dictated by a certain tolerance?
Specifically, please consider the following code snippet, which defines an accepted tolerance of 1e-4 and applies Singular Value Decomposition to a matrix 'A'.
#Python
tol=1e-4
U,Sa,V=np.linalg.svd(A)
S=np.diag(Sa)
The resulting singular value diagonal matrix 'S' holds non-negative singular values in decreasing order of magnitude.
What I want to obtain is a truncated 'S' matrix, such that the columns holding singular values lower than 1e-4 are removed. Then, apply this truncation to the matrix 'U'.
Is there a simple way of doing this? I have been looking around, and found some solutions to the problem for Matlab, but didn't find anything similar for Python.
For Matlab, the code would look something like:
%Matlab
tol=1e-4
mask=any(Sigma>=tol,2);
sigRB=Sigma(:,mask);
mask2=any(U>=tol,2);
B=U(:,mask);
Thanks in advance. I hope my post was not too messy to understand.
I am not sure if I understand you correctly. If my solution is not what you ask for, please consider adding an example to your question.
The following code drops all columns from array s that consist only of values smaller than tol.
s = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, .9, 0, 0, 0, 0],
    [0, 0, .5, 0, 0, 0],
    [0, 0, 0, .4, 0, 0],
    [0, 0, 0, 0, .3, 0],
    [0, 0, 0, 0, 0, .2],
])
print(s)

tol = .4
ind = np.argwhere(s.max(axis=0) < tol)  # columns whose largest value is below tol
s = np.delete(s, ind, 1)
print(s)
Output:
[[1.  0.  0.  0.  0.  0. ]
 [0.  0.9 0.  0.  0.  0. ]
 [0.  0.  0.5 0.  0.  0. ]
 [0.  0.  0.  0.4 0.  0. ]
 [0.  0.  0.  0.  0.3 0. ]
 [0.  0.  0.  0.  0.  0.2]]
[[1.  0.  0.  0. ]
 [0.  0.9 0.  0. ]
 [0.  0.  0.5 0. ]
 [0.  0.  0.  0.4]
 [0.  0.  0.  0. ]
 [0.  0.  0.  0. ]]
I am applying max along axis 0 and then using np.argwhere to get the indices of the columns whose maximum value is smaller than tol.
Edit: In order to truncate the columns of matrix 'U', so it coincides in size with the reduced matrix 'S', the following code works:
k = len(S[0])
Ured = U[:,0:k]
Uredsize = np.shape(Ured) # To check it has worked
print(Uredsize)
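Putting both pieces together, a minimal end-to-end sketch (A is assumed to be the matrix being decomposed in the question):

import numpy as np

tol = 1e-4
U, Sa, V = np.linalg.svd(A)
# Singular values come back in decreasing order, so keeping those >= tol
# amounts to keeping a leading block of columns.
k = np.count_nonzero(Sa >= tol)  # number of singular values to keep
S = np.diag(Sa[:k])              # truncated singular value matrix
Ured = U[:, :k]                  # matching truncation of U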

Replacing non zero values in a matrix with the marginals

I am trying to do some math with my matrix; I can write it down, but I am not sure how to code it. This involves getting a column of row marginal values, then making a new matrix in which all non-zero row values are replaced with those marginals. After that, the column marginals should be the sum of the non-zero new values divided by the number of non-zero occurrences in each column.
I can get to the row marginals, but I can't seem to think of a way to repopulate.
Example of what I want:
import numpy as np
matrix = np.matrix([[1,3,0],[0,1,2],[1,0,4]])
matrix([[1, 3, 0],
        [0, 1, 2],
        [1, 0, 4]])
marginals = ((matrix != 0).sum(1) / matrix.sum(1))
matrix([[0.5       ],
        [0.66666667],
        [0.4       ]])
What I want done next is a filling of the matrix based on the non zero locations of the first.
matrix([[0.5, 0.5, 0],
        [0, 0.667, 0.667],
        [0.4, 0, 0.4]])
Final wanted result is the new matrix column sum divided by the number of non zero occurrences in that column.
matrix([[(0.5+0.4)/2, (0.5+0.667)/2, (0.667+0.4)/2]])
To get the final matrix we can use matrix-multiplication for efficiency -
In [84]: mask = matrix!=0
In [100]: (mask.T*marginals).T/mask.sum(0)
Out[100]: matrix([[0.45 , 0.58333334, 0.53333334]])
Or simpler -
In [110]: (marginals.T*mask)/mask.sum(0)
Out[110]: matrix([[0.45 , 0.58333334, 0.53333334]])
If you need that intermediate filled output too, use np.multiply for broadcasted elementwise multiplication -
In [88]: np.multiply(mask, marginals)
Out[88]:
matrix([[0.5       , 0.5       , 0.        ],
        [0.        , 0.66666667, 0.66666667],
        [0.4       , 0.        , 0.4       ]])
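If you'd rather avoid np.matrix, the same computation works with plain ndarrays and broadcasting; a sketch:

import numpy as np

A = np.array([[1, 3, 0], [0, 1, 2], [1, 0, 4]])
mask = A != 0
marginals = mask.sum(1) / A.sum(1)    # row marginals: [0.5, 0.667, 0.4]
filled = mask * marginals[:, None]    # non-zero spots replaced by row marginals
print(filled.sum(0) / mask.sum(0))    # [0.45       0.58333333 0.53333333]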

Error in scipy sparse diags matrix construction

When using scipy.sparse.spdiags or scipy.sparse.diags, I have noticed what I consider to be a bug in the routines, e.g.
scipy.sparse.spdiags([1.1,1.2,1.3],1,4,4).toarray()
returns
array([[ 0. ,  1.2,  0. ,  0. ],
       [ 0. ,  0. ,  1.3,  0. ],
       [ 0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ]])
That is, for positive diagonals it drops the first k data values. One might argue that there is some grand programming reason for this and that I just need to pad with zeros. OK, annoying as that may be, one can use scipy.sparse.diags, which gives the correct result. However this routine has a bug that can't be worked around:
scipy.sparse.diags([1.1,1.2],0,(4,2)).toarray()
gives
array([[ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ],
       [ 0. ,  0. ]])
nice, and
scipy.sparse.diags([1.1,1.2],-2,(4,2)).toarray()
gives
array([[ 0. ,  0. ],
       [ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2]])
but
scipy.sparse.diags([1.1,1.2],-1,(4,2)).toarray()
gives an error saying ValueError: Diagonal length (index 0: 2 at offset -1) does not agree with matrix size (4, 2). Obviously the answer is
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ]])
and for extra random behaviour we have
scipy.sparse.diags([1.1],-1,(4,2)).toarray()
giving
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.1],
       [ 0. ,  0. ]])
Anyone know if there is a function for constructing diagonal sparse matrices that actually works?
Executive summary: spdiags works correctly, even if the matrix input isn't the most intuitive. diags has a bug that affects some offsets in rectangular matrices. There is a bug fix on scipy github.
The example for spdiags is:
>>> data = array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
>>> diags = array([0,-1,2])
>>> spdiags(data, diags, 4, 4).todense()
matrix([[1, 0, 3, 0],
        [1, 2, 0, 4],
        [0, 2, 3, 0],
        [0, 0, 3, 4]])
Note that the 3rd column of data always appears in the 3rd column of the sparse matrix. The other columns also line up. But they are omitted where they 'fall off the edge'.
The input to this function is a matrix, while the input to diags is a ragged list. The diagonals of the sparse matrix all have different numbers of values, so the specification has to accommodate this in one way or the other. spdiags does this by ignoring some values, diags by taking a list input.
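For example, diags takes one list per diagonal, each with its natural length (an illustrative call, not from the question):

from scipy import sparse

# Three ragged diagonals at offsets -1, 0 and 2; the shape is inferred.
M = sparse.diags([[1, 2, 3], [4, 5, 6, 7], [8, 9]], [-1, 0, 2])
print(M.toarray())
# [[4. 0. 8. 0.]
#  [1. 5. 0. 9.]
#  [0. 2. 6. 0.]
#  [0. 0. 3. 7.]]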
The sparse.diags([1.1,1.2],-1,(4,2)) error is puzzling.
the spdiags equivalent does work:
In [421]: sparse.spdiags([[1.1,1.2]],-1,4,2).A
Out[421]:
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ]])
The error is raised in this block of code:
for j, diagonal in enumerate(diagonals):
    offset = offsets[j]
    k = max(0, offset)
    length = min(m + offset, n - offset)
    if length <= 0:
        raise ValueError("Offset %d (index %d) out of bounds" % (offset, j))
    try:
        data_arr[j, k:k+length] = diagonal
    except ValueError:
        if len(diagonal) != length and len(diagonal) != 1:
            raise ValueError(
                "Diagonal length (index %d: %d at offset %d) does not "
                "agree with matrix size (%d, %d)." % (
                    j, len(diagonal), offset, m, n))
        raise
The actual matrix constructor in the diags is:
dia_matrix((data_arr, offsets), shape=(m, n))
This is the same constructor that spdiags uses, but without any manipulation.
In [434]: sparse.dia_matrix(([[1.1,1.2]],-1),shape=(4,2)).A
Out[434]:
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ]])
In dia format, the inputs are stored exactly as given by spdiags (complete with that matrix with extra values):
In [436]: M.data
Out[436]: array([[ 1.1, 1.2]])
In [437]: M.offsets
Out[437]: array([-1], dtype=int32)
As @user2357112 points out, length = min(m + offset, n - offset) is wrong, producing 3 in the test case. Changing it to length = min(m + k, n - k) makes all cases for this (4,2) matrix work, but it fails with the transpose: diags([1.1,1.2], 1, (2, 4)).
The correction, as of Oct 5, for this issue is:
https://github.com/pv/scipy-work/commit/529cbde47121c8ed87f74fa6445c05d71353eb6c
length = min(m + offset, n - offset, min(m,n))
With this fix, diags([1.1,1.2], 1, (2, 4)) works.
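Until that fix is in a release, a workaround sketch for the failing case is to construct the dia_matrix directly, as In [434] above already demonstrates:

from scipy import sparse

# Bypasses diags' faulty length check; the dia format handles the offset fine.
M = sparse.dia_matrix(([[1.1, 1.2]], [-1]), shape=(4, 2))
print(M.toarray())
# [[0.  0. ]
#  [1.1 0. ]
#  [0.  1.2]
#  [0.  0. ]]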
