What is the Python numpy equivalent of the IDL # operator?

I am looking for the Python numpy equivalent of the IDL # operator.
Here is what the # operator does:
Computes array elements by multiplying the columns of the first array
by the rows of the second array. The second array must have the same
number of columns as the first array has rows. The resulting array has
the same number of columns as the first array and the same number of
rows as the second array.
Here are the numpy arrays I am dealing with:
A = [[ 0.9826128   0.          0.18566662]
     [ 0.          1.          0.        ]
     [-0.18566662  0.          0.9826128 ]]
and
B = [[ 1.          0.          0.        ]
     [ 0.62692564  0.77418869  0.08715574]]
Also, numpy.dot(A,B) results in ValueError: matrices are not aligned.

Reading the notes on IDL's definition of matrix multiplication, it seems they use the opposite notation to everyone else:
IDL’s convention is to consider the first dimension to be the column
and the second dimension to be the row
So # can be achieved by the rather strange looking:
numpy.dot(A.T, B.T).T
from their example values:
import numpy as np
A = np.array([[0, 1, 2], [3, 4, 5]])
B = np.array([[0, 1], [2, 3], [4, 5]])
C = np.dot(A.T, B.T).T
print(C)
gives
[[ 3  4  5]
 [ 9 14 19]
 [15 24 33]]
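Since transposing both operands and the result is the same as multiplying in the reverse order, an arguably cleaner equivalent is np.dot(B, A). A minimal sketch checking that both forms agree on the example above:
import numpy as np
A = np.array([[0, 1, 2], [3, 4, 5]])
B = np.array([[0, 1], [2, 3], [4, 5]])
# (A.T @ B.T).T equals B @ A, because transposition reverses the order of a product
C1 = np.dot(A.T, B.T).T
C2 = np.dot(B, A)  # same result, no transposes needed
assert np.array_equal(C1, C2)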

If I understand correctly, you want matrix multiplication.


Shift a numpy array by an increasing value with each row

I have a numpy array; I'll use np.ones((3,3)) as a MWE:
arr = [[1,1,1],
       [1,1,1],
       [1,1,1]]
I wish to shift each row by a set integer that increases with the row: shift the 1st row by 0, the 2nd row by 1, and so on up to shifting the 5th row by 4. I imagine the row length will have to be equal for all rows, giving something like this:
arr = [[1,1,1,0,0],
       [0,1,1,1,0],
       [0,0,1,1,1]]
This is a MWE; the actual arrays are read from txt files and are up to (1000x96). The important values are not just 1 but can be any float from 0 to inf.
Is there a way of doing this?
(Extra information: these data are for 2D heatmap plotting)
Assuming an array with arbitrary values, you could use:
# add enough "0" columns for the shift
arr2 = np.c_[arr, np.zeros((arr.shape[0], arr.shape[0]-1), dtype=arr.dtype)]
# get the indices as ogrid
r, c = np.ogrid[:arr2.shape[0], :arr2.shape[1]]
# roll the values
arr2 = arr2[r, c-r]
used input:
arr = np.arange(1,10).reshape(3,3)
# array([[1, 2, 3],
#        [4, 5, 6],
#        [7, 8, 9]])
output:
array([[1, 2, 3, 0, 0],
       [0, 4, 5, 6, 0],
       [0, 0, 7, 8, 9]])
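If the per-row offset needs a step other than 1, the same trick should generalize; a minimal sketch, assuming a uniform step shift (the variable name is mine):
import numpy as np
arr = np.arange(1,10).reshape(3,3)
shift = 2  # hypothetical step: row i moves right by shift*i
# pad with enough zero columns for the largest offset
pad = shift * (arr.shape[0] - 1)
arr2 = np.c_[arr, np.zeros((arr.shape[0], pad), dtype=arr.dtype)]
r, c = np.ogrid[:arr2.shape[0], :arr2.shape[1]]
arr2 = arr2[r, c - shift*r]  # negative indices wrap into the zero padding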
I have the following solution:
import numpy as np
arr = np.array([[1,1,1],
                [1,1,1],
                [1,1,1],
                [1,1,1]])
shift = 1
# pad with enough zero columns for the largest shift
extend = shift*(arr.shape[0]-1)
arr2 = np.zeros((arr.shape[0], extend + arr.shape[1]))
for i,row in enumerate(arr):
    arr2[i, (i*shift):(i*shift)+arr.shape[1]] = row
print(arr2)
[[1. 1. 1. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 0.]
 [0. 0. 0. 1. 1. 1.]]

Replacing non zero values in a matrix with the marginals

I am trying to do some math with my matrix. I can write it down, but I am not sure how to code it. It involves getting a column of row marginal values, then making a new matrix that has all non-zero row values replaced with those marginals; after that, I would like the sum of the non-zero new values in each column, divided by their count, to be the column marginals.
I can get to the row marginals, but I can't seem to think of a way to repopulate.
Example of what I want:
import numpy as np
matrix = np.matrix([[1,3,0],[0,1,2],[1,0,4]])
matrix([[1, 3, 0],
        [0, 1, 2],
        [1, 0, 4]])
marginals = ((matrix != 0).sum(1) / matrix.sum(1))
matrix([[0.5       ],
        [0.66666667],
        [0.4       ]])
What I want done next is a filling of the matrix based on the non-zero locations of the first:
matrix([[0.5, 0.5  , 0    ],
        [0  , 0.667, 0.667],
        [0.4, 0    , 0.4  ]])
Final wanted result is the new matrix column sum divided by the number of non zero occurrences in that column.
matrix([[(0.5+0.4)/2, (0.5+0.667)/2, (0.667+0.4)/2]])
To get the final matrix we can use matrix-multiplication for efficiency -
In [84]: mask = matrix!=0
In [100]: (mask.T*marginals).T/mask.sum(0)
Out[100]: matrix([[0.45 , 0.58333334, 0.53333334]])
Or simpler -
In [110]: (marginals.T*mask)/mask.sum(0)
Out[110]: matrix([[0.45 , 0.58333334, 0.53333334]])
If you need that intermediate filled output too, use np.multiply for broadcasted elementwise multiplication -
In [88]: np.multiply(mask,marginals)
Out[88]:
matrix([[0.5       , 0.5       , 0.        ],
        [0.        , 0.66666667, 0.66666667],
        [0.4       , 0.        , 0.4       ]])
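As a side note, np.matrix is discouraged in modern NumPy; a minimal sketch of the same computation with plain ndarrays and broadcasting (variable names are mine):
import numpy as np
matrix = np.array([[1,3,0],[0,1,2],[1,0,4]])
mask = matrix != 0
marginals = mask.sum(1) / matrix.sum(1)  # row marginals, shape (3,)
filled = mask * marginals[:, None]       # broadcast each row marginal over its non-zero entries
result = filled.sum(0) / mask.sum(0)     # column sums divided by non-zero counts
# result -> array([0.45, 0.58333333, 0.53333333])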

New array from existing one, first 2 columns being the row/column indexes in the existing array, third being the values [duplicate]

This question already has answers here:
Generalise slicing operation in a NumPy array
(4 answers)
Closed 5 years ago.
Here is some code I'm struggling with.
My goal is to create an array (db) from an existing one (t); each line of db will represent one value of t. db will have 3 columns: one for the line index in t, one for the column index in t, and one for the value in t.
In my case, t was a distance matrix, so the diagonal was 0 and it was symmetric; I replaced the lower triangular values with 0. I don't need the 0 values in the new array, but I can just delete them in another step.
import numpy as np
t = np.array([[0, 2.5],
              [0, 0]])
My goal is to obtain a new array such as :
db = np.array([[0, 0, 0],
               [0, 1, 2.5],
               [1, 0, 0],
               [1, 1, 0]])
Thanks for your time.
You can create a meshgrid of 2D coordinates for the rows and columns, then unroll these into 1D arrays. You can then concatenate these two arrays as well as the unrolled version of t into one final matrix:
import numpy as np
(Y, X) = np.meshgrid(np.arange(t.shape[1]), np.arange(t.shape[0]))
db = np.column_stack((X.ravel(), Y.ravel(), t.ravel()))
Example run
In [9]: import numpy as np
In [10]: t = np.array([[0, 2.5],
    ...:               [0, 0]])
In [11]: (Y, X) = np.meshgrid(np.arange(t.shape[1]), np.arange(t.shape[0]))
In [12]: db = np.column_stack((X.ravel(), Y.ravel(), t.ravel()))
In [13]: db
Out[13]:
array([[ 0. ,  0. ,  0. ],
       [ 0. ,  1. ,  2.5],
       [ 1. ,  0. ,  0. ],
       [ 1. ,  1. ,  0. ]])
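A possibly simpler equivalent is np.indices, which builds both index grids at once; a minimal sketch:
import numpy as np
t = np.array([[0, 2.5], [0, 0]])
rows, cols = np.indices(t.shape)  # row and column index of every element
db = np.column_stack((rows.ravel(), cols.ravel(), t.ravel()))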

Keep the n highest values of each row of a numpy array and zero everything else [duplicate]

This question already has answers here:
numpy matrix, setting 0 to values by sorting each row
(2 answers)
Closed 5 years ago.
I have a numpy array of data where I need to keep only n highest values, and zero everything else.
My current solution:
import numpy as np
np.random.seed(30)
# keep only the n highest values
n = 3
# Simple 2x5 data field for this example; the real-life application will be extremely large
data = np.random.random((2,5))
#[[ 0.64414354  0.38074849  0.66304791  0.16365073  0.96260781]
# [ 0.34666184  0.99175099  0.2350579   0.58569427  0.4066901 ]]
# find indices of the n highest values per row
idx = np.argsort(data)[:,-n:]
#[[0 2 4]
# [4 3 1]]
# put those values back in a blank array
data_ = np.zeros(data.shape) # blank slate
for i in range(data.shape[0]):
    data_[i, idx[i]] = data[i, idx[i]]
# Each row now contains only the 3 highest values of the original data
#[[ 0.64414354  0.          0.66304791  0.          0.96260781]
# [ 0.          0.99175099  0.          0.58569427  0.4066901 ]]
In the code above, data_ has the n highest values and everything else is zeroed out. This works out nicely even if data.shape[1] is smaller than n. But the only issue is the for loop, which is slow because my actual use case is on very very large arrays.
Is it possible to get rid of the for loop?
You can apply np.argsort twice in a vectorized fashion: the first call gives the index order, the second gives the ranks. Then use either np.where or simple multiplication to zero out everything else:
In [116]: np.argsort(data)
Out[116]:
array([[3, 1, 0, 2, 4],
       [2, 0, 4, 3, 1]])
In [117]: np.argsort(np.argsort(data))  # these are the ranks
Out[117]:
array([[2, 1, 3, 0, 4],
       [1, 4, 0, 3, 2]])
In [118]: np.argsort(np.argsort(data)) >= data.shape[1] - 3
Out[118]:
array([[ True, False,  True, False,  True],
       [False,  True, False,  True,  True]], dtype=bool)
In [119]: data * (np.argsort(np.argsort(data)) >= data.shape[1] - 3)
Out[119]:
array([[ 0.64414354,  0.        ,  0.66304791,  0.        ,  0.96260781],
       [ 0.        ,  0.99175099,  0.        ,  0.58569427,  0.4066901 ]])
In [120]: np.where(np.argsort(np.argsort(data)) >= data.shape[1] - 3, data, 0)
Out[120]:
array([[ 0.64414354,  0.        ,  0.66304791,  0.        ,  0.96260781],
       [ 0.        ,  0.99175099,  0.        ,  0.58569427,  0.4066901 ]])
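For very large arrays, a full double argsort may be wasteful; np.argpartition performs only a partial sort. A minimal sketch of that alternative (not from the original answer, and assuming n is smaller than the number of columns):
import numpy as np
np.random.seed(30)
data = np.random.random((2,5))  # same toy data as above
n = 3
# column indices of the n largest values per row (their order among themselves is arbitrary)
idx = np.argpartition(data, -n, axis=1)[:, -n:]
out = np.zeros_like(data)
rows = np.arange(data.shape[0])[:, None]
out[rows, idx] = data[rows, idx]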

Performing grouped average and standard deviation with NumPy arrays

I have a set of data (X, Y). My independent variable values X are not unique, so there are multiple repeated values. I want to output a new array containing: X_unique, a list of the unique values of X; Y_mean, the mean of all the Y values corresponding to each X_unique; and Y_std, the standard deviation of all the Y values corresponding to each X_unique.
x = data[:,0]
y = data[:,1]
You can use binned_statistic from scipy.stats that supports various statistic functions to be applied in chunks across a 1D array. To get the chunks, we need to sort and get positions of the shifts (where chunks change), for which np.unique would be useful. Putting all those, here's an implementation -
from scipy.stats import binned_statistic as bstat
# Sort data corresponding to argsort of first column
sdata = data[data[:,0].argsort()]
# Unique first-column elements and the positions where each new element starts
unq_x,breaks = np.unique(sdata[:,0],return_index=True)
breaks = np.append(breaks,data.shape[0])
# Use binned statistic to get grouped average and std deviation values
idx_range = np.arange(data.shape[0])
avg_y,_,_ = bstat(x=idx_range, values=sdata[:,1], statistic='mean', bins=breaks)
std_y,_,_ = bstat(x=idx_range, values=sdata[:,1], statistic='std', bins=breaks)
From the docs of binned_statistic, one can also use a custom statistic function :
function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.
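For example, a custom statistic (here the peak-to-peak range per bin; the data below is made up for illustration) can be passed as a callable:
import numpy as np
from scipy.stats import binned_statistic as bstat
vals = np.array([5., 2., 5., 8., 8.])
edges = np.array([0, 2, 3, 5])  # three bins: [0,2), [2,3), [3,5]
rng, _, _ = bstat(x=np.arange(5), values=vals,
                  statistic=lambda v: v.max() - v.min(), bins=edges)
# rng -> array([3., 0., 0.])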
Sample input, output -
In [121]: data
Out[121]:
array([[2, 5],
       [2, 2],
       [1, 5],
       [3, 8],
       [0, 8],
       [6, 7],
       [8, 1],
       [2, 5],
       [6, 8],
       [1, 8]])
In [122]: np.column_stack((unq_x,avg_y,std_y))
Out[122]:
array([[ 0.        ,  8.        ,  0.        ],
       [ 1.        ,  6.5       ,  1.5       ],
       [ 2.        ,  4.        ,  1.41421356],
       [ 3.        ,  8.        ,  0.        ],
       [ 6.        ,  7.5       ,  0.5       ],
       [ 8.        ,  1.        ,  0.        ]])
Alternatively, a simple (if slower) approach with comprehensions:
x_unique = np.unique(x)
y_means = np.array([np.mean(y[x == u]) for u in x_unique])
y_stds = np.array([np.std(y[x == u]) for u in x_unique])
Pandas is made for such tasks:
import pandas
data = np.random.randint(1,5,20).reshape(10,2)
pandas.DataFrame(data).groupby(0).mean()
gives
          1
0
1  2.666667
2  3.000000
3  2.000000
4  1.500000
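To also get the standard deviation the question asks for, groupby aggregation should work; a minimal sketch (the column names are mine, and note that pandas' std uses ddof=1 by default, unlike np.std):
import numpy as np
import pandas as pd
data = np.random.randint(1, 5, 20).reshape(10, 2)
df = pd.DataFrame(data, columns=['x', 'y'])
# mean and standard deviation of y per unique x, in one pass
stats = df.groupby('x')['y'].agg(['mean', 'std'])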
