What is the most efficient way to normalize a 4D array? - python

I have a 4D array with shape (4, 320, 528, 279), which is in fact a data set of four 3D image stacks.
What I am trying to achieve is to normalize each pixel of each 3D image across the four samples. So let's say the first pixel values, at coordinates (0,0,0) in the four images, are [140., 20., 10., 220.]. I would like to change those values so that they become: [0.61904762, 0.04761905, 0., 1.].
I wrote a script that supposedly achieves this:
def NormalizeMatrix(mat):
    mat = np.array(mat)
    sink = mat.copy()
    for i in np.arange(mat.shape[1]):
        for j in np.arange(mat.shape[2]):
            for k in np.arange(mat.shape[3]):
                PixelValues = mat[:, i, j, k]
                Min = float(PixelValues.min())
                Max = float(PixelValues.max())
                if Max - Min != 0.:
                    sink[:, i, j, k] = (PixelValues - Min) / (Max - Min)
                else:
                    sink[:, i, j, k] = np.full_like(PixelValues, 0.)
    return sink
But this is really REALLY slow!
How can I make this faster?
Any ideas?
Tom

I think I found a pretty fast way in the end, which actually goes along the lines of user3483203's suggestion:
def NormalizeMatrix(mat):
    mat = np.array(mat)
    minMat = np.min(mat, axis=0, keepdims=True)
    maxMat = np.max(mat, axis=0, keepdims=True)
    sink = (mat - minMat) / (maxMat - minMat)
    return sink
This takes 5-10s instead of hours on my machine :)
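One caveat with the vectorized version: wherever a pixel has the same value in all four stacks, maxMat - minMat is zero and the division yields NaN, whereas the loop version returned 0. there. Below is a minimal sketch of a guarded variant using np.divide's where argument; the guard is my addition, not part of the original answer.

import numpy as np

def NormalizeMatrixSafe(mat):
    mat = np.asarray(mat, dtype=float)
    minMat = np.min(mat, axis=0, keepdims=True)
    maxMat = np.max(mat, axis=0, keepdims=True)
    rng = maxMat - minMat
    sink = np.zeros_like(mat)
    # divide only where the range is nonzero; elsewhere keep 0., like the loop version
    np.divide(mat - minMat, rng, out=sink, where=(rng != 0))
    return sink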

Related

Matrices help - index 3 is out of bounds for axis 1 with size 3 BUT I'm pretty sure I have a (3,n) matrix

I keep getting the error 'index 3 is out of bounds for axis 1 with size 3', but I'm sure that I'm using a (3,n) matrix rather than an (n,3) one. I'm not very familiar with matrices in Python, so I have been using a kind of hacky way of getting them into the shape I want so I can multiply or add them. Can anyone see where I've gone wrong, or suggest some better practice?
I'm trying to perform a rotational transform on A, generated via:
A = array(random.rand(3, 9));
where A contains a set of x, y, z coordinates in every column. E.g.:
Matrix A:
[[0.70799333 0.77123425 0.07271538 0.52498025 0.84353825 0.78331767
0.06428417 0.25629863 0.6654734 0.77562903]
[0.34179928 0.83233168 0.3920859 0.19819796 0.22486337 0.09274312
0.49057914 0.69716143 0.613912 0.04940198]
[0.98522559 0.71273242 0.70784866 0.61589377 0.34007973 0.34492078
0.44491238 0.37423906 0.37427018 0.13558728]]
The translated matrix is calculated via A_translated = ret_R . (each column of A) + ret_t, where
ret_R:
[[ 0.1928724 0.90776212 0.372516 ]
[ 0.27931303 -0.41473028 0.8660156 ]
[ 0.94062983 -0.06298194 -0.33353981]]
and
ret_t:
[[0.93445859]
[0.59949888]
[0.77385835]]
My attempt was as follows:
count = 0
num_rows, num_cols = A.shape
translated_A = pd.DataFrame(zeros((num_rows, num_cols)))
print('Translated A: \n', translated_A)
for i in range(0, num_cols):
    multiply = ret_R.A[:, i]  # works up until (not including) i = 3
    # IndexError: index 3 is out of bounds for axis 1 with size 3
    print('Multiply: \n', multiply)
    multiply2 = np.matrix(pd.DataFrame(multiply))
    matrix = multiply2 + ret_t  # works
    matrix2 = pd.DataFrame(matrix)  # np.matrix(pd.DataFrame(matrix)) # not working?
    print('Matrix:', matrix2)
    translated_A[i] = matrix2[0]
print(translated_A)
The line multiply = ret_R.A[:,i] only works up until, and not including, i = 3, which suggests that my A matrix is (n,3), but I'm sure it's (3,n). I kept switching between matrices and data frames as this seemed to work, but it doesn't work past i = 2.
I've realised that I should be using '@' to compute the dot product of the matrices properly rather than '.': written as ret_R.A[:,i], the '.' is parsed as attribute access (the .A array view of a NumPy matrix) followed by indexing into ret_R's own three columns, which is why it fails at i = 3. I also had to transpose multiply2 to get a matrix in the form [ [] [] [] ]. I no longer have to keep switching between a data frame and a matrix.
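For reference, the whole transform can be done without any loop: ret_R @ A rotates every column at once, and ret_t broadcasts across the columns. A minimal sketch with stand-in random matrices (the names mirror the question; the real ret_R and ret_t come from elsewhere in the asker's code):

import numpy as np

A = np.random.rand(3, 9)      # columns are x, y, z coordinates
ret_R = np.random.rand(3, 3)  # stand-in for the rotation matrix
ret_t = np.random.rand(3, 1)  # translation as a column vector

# one matrix product rotates all 9 columns; ret_t broadcasts over columns
translated_A = ret_R @ A + ret_t
print(translated_A.shape)     # (3, 9)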

MATLAB to Python conversion: vectors, arrays, index elements

Good day to everyone! I'm currently converting a MATLAB project to Python 2.7. I am trying to convert the line
h = [ im(:,2:cols) zeros(rows,1) ] - [ zeros(rows,1) im(:,1:cols-1) ];
When I try to convert it
h = np.concatenate((im[1, range(2, cols)], np.zeros((rows, 1)))) - \
    np.concatenate((np.zeros((rows, 1)), im[1, range(2, cols - 1)]))
IDLE returns different errors like
ValueError: all the input arrays must have same number of dimensions
I'm very new to Python and I would appreciate it if you would suggest other methods. Thank you so much! Here's the function I am trying to convert.
function [gradient, or] = canny(im, sigma, scaling, vert, horz)
xscaling = vert; yscaling = horz;
hsize = [6*sigma+1, 6*sigma+1]; % The filter size.
gaussian = fspecial('gaussian',hsize,sigma);
im = filter2(gaussian,im); % Smoothed image.
im = imresize(im, scaling, 'AntiAliasing',false);
[rows, cols] = size(im);
h = [ im(:,2:cols) zeros(rows,1) ] - [ zeros(rows,1) im(:,1:cols-1) ];
I would also like to ask about the equivalent of the ':' operator, which is used mainly in indices and arrays in MATLAB. Is there any equivalent for the ':' operator in Python?
The Python converted code I started:
def canny(im=None, sigma=None, scaling=None, vert=None, horz=None):
    xscaling = vert
    yscaling = horz
    hsize = (6 * sigma + 1), (6 * sigma + 1)  # The filter size.
    gaussian = gauss2D(hsize, sigma)
    im = filter2(gaussian, im)  # Smoothed image.
    print("This is im")
    print(im)
    print("This is hsize")
    print(hsize)
    print("This is scaling")
    print(scaling)
    #scaling = 0.4
    #scaling = tuple(scaling)
    im = cv2.resize(im, None, fx=scaling, fy=scaling)
    [rows, cols] = np.shape(im)
Say your data is in a list of lists. Try this:
a = [[2, 9, 4], [7, 5, 3], [6, 1, 8]]
im = np.array(a, dtype=float)
rows = 3
cols = 3
h = (np.hstack([im[:, 1:cols], np.zeros((rows, 1))])
     - np.hstack([np.zeros((rows, 1)), im[:, :cols-1]]))
The equivalent of MATLAB's horzcat (that is, [A B]) is np.hstack and the equivalent of vertcat ([A; B]) is np.vstack.
Array indexing in numpy is very close to MATLAB, except that indexes start at 0 in numpy, and the range p:q means "p to q-1".
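To make the index mapping concrete, here is a small illustrative check (the 3x4 array is just a stand-in):

import numpy as np

im = np.arange(12).reshape(3, 4)
rows, cols = im.shape

# MATLAB im(:, 2:cols)   ->  numpy im[:, 1:cols]   (same columns: MATLAB's 2..cols
# is numpy's 1..cols-1, since indices start at 0 and the stop index is exclusive)
print(im[:, 1:cols])
# MATLAB im(:, 1:cols-1) ->  numpy im[:, 0:cols-1] (equivalently im[:, :-1])
print(im[:, 0:cols-1])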
Also, the storage order of arrays is row-major by default, and you can use column-major order if you want (see this). In MATLAB, arrays are stored in column-major order. To check in Python, type for instance np.isfortran(im). If it returns True, the array has the same order as MATLAB (Fortran order); otherwise it's row-major (C order). This matters when you want to optimize loops, or when you pass an array to a C or Fortran routine.
Ideally, try to put everything in an np.array as soon as possible, and don't use lists (they take much more space and processing is much slower). There are also some quirks: for instance, 1.0 / 0.0 throws an exception, but np.float64(1.0) / np.float64(0.0) returns inf, like in MATLAB.
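A quick sketch of both points (the array here is arbitrary):

import numpy as np

a = np.zeros((3, 4))            # C (row-major) order by default
print(np.isfortran(a))          # False
b = np.asfortranarray(a)        # column-major copy, as in MATLAB
print(np.isfortran(b))          # True

# plain Python floats raise ZeroDivisionError; numpy scalars return inf
print(np.float64(1.0) / np.float64(0.0))  # inf (with a RuntimeWarning)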
Another example from the comments:
d1 = [ im(2:rows,2:cols) zeros(rows-1,1); zeros(1,cols) ] - ...
[ zeros(1,cols); zeros(rows-1,1) im(1:rows-1,1:cols-1) ];
d2 = [ zeros(1,cols); im(1:rows-1,2:cols) zeros(rows-1,1); ] - ...
[ zeros(rows-1,1) im(2:rows,1:cols-1); zeros(1,cols) ];
For this one, rather than np.vstack and np.hstack, you can use np.block.
im = np.ones((10, 15))
rows, cols = im.shape
d1 = (np.block([[im[1:rows, 1:cols], np.zeros((rows-1, 1))],
                [np.zeros((1, cols))]]) -
      np.block([[np.zeros((1, cols))],
                [np.zeros((rows-1, 1)), im[:rows-1, :cols-1]]]))
d2 = (np.block([[np.zeros((1, cols))],
                [im[:rows-1, 1:cols], np.zeros((rows-1, 1))]]) -
      np.block([[np.zeros((rows-1, 1)), im[1:rows, :cols-1]],
                [np.zeros((1, cols))]]))
With np.zeros((Nrows,1)) you are generating a 2D array containing Nrows 1D arrays of one element each. Then, with im[1,2:cols] you are getting a 1D array of cols-2 elements. You should replace np.zeros((rows,1)) with np.zeros(rows).
Moreover, in the second np.concatenate, when you take a subarray from im you should take the same number of elements as in the first concatenate. Note that you are taking one element less: range(2,cols) vs range(2,cols-1).
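A minimal sketch of the shape mismatch (the 3x4 array is a stand-in for im):

import numpy as np

rows, cols = 3, 4
im = np.arange(12.0).reshape(rows, cols)

print(im[1, 2:cols].shape)        # (2,)   -> 1D slice with cols-2 elements
print(np.zeros((rows, 1)).shape)  # (3, 1) -> 2D: concatenating these raises ValueError
print(np.zeros(rows).shape)       # (3,)   -> 1D: concatenates cleanly with the slice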

Python: slice array uniformly with respect to dataset

I have a data set that has time t and a data d. Unfortunately, I changed the rate of exporting the data after some time (the rate was too high initially). I would like to sample the data so that I effectively remove the high-frequency exported data but maintain the low-frequency exported data near the end.
Consider the following code:
arr = np.loadtxt(file_name,skiprows=3)
where t = arr[:,0] and d = arr[:,1].
Here is a function to get a uniform slicing:
import math as m  # needed for m.ceil below

def get_uniform_slices(arr, N_desired_points):
    s = arr.shape
    if s[0] > N_desired_points:
        n_skip = m.ceil(s[0] / N_desired_points)
    else:
        n_skip = 1
    return arr[0::n_skip, :]  # Sample output
However, the result then looks fine for the high-frequency exported portion but is too sparse for the low-frequency exported portion.
Is there some way to slice such that indexes are uniformly spaced with respect to t?
Any help is greatly appreciated.
This is the function I used to find the indexes, based on the accepted answer:
def get_uniform_index(t, N_desired_points):
    t_uniform = np.linspace(np.amin(t), np.amax(t), N_desired_points)
    t_desired = [nearest(t_d, t) for t_d in t_uniform]
    i = np.in1d(t, t_desired)
    return i
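For illustration, a hypothetical usage with dense early samples and sparse late ones (this assumes the nearest helper defined in the accepted answer below):

import numpy as np

t = np.concatenate((np.arange(0., 10., 0.01),    # high-rate export
                    np.arange(10., 100., 1.0)))  # low-rate export
d = np.random.rand(len(t))

i = get_uniform_index(t, 200)   # boolean mask over t
t_sampled, d_sampled = t[i], d[i]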
You have 2D data, e.g.,
t = np.arange(0., 100., 0.5)
d = np.random.rand(len(t))
You want to keep only particular values of data at uniformly spaced times, e.g.
t_desired = np.arange(0., 100., 1.)
Let's pick out the desired data points at the desired times using the in1d function:
d_pruned = d[np.in1d(t, t_desired)]
Of course, you must pick the t_desired values yourself, and they should match values in t. If that's a problem, you could pick approximately uniform times using e.g.,
def nearest(x, arr):
    index = (np.abs(arr - x)).argmin()
    return arr[index]
t_uniform = np.arange(0., 100., 1.)
t_desired = [nearest(t_d, t) for t_d in t_uniform]
Here is the complete code:
import numpy as np
t = np.arange(0., 100., 0.5)
d = np.random.rand(len(t))
def nearest(x, arr):
    index = (np.abs(arr - x)).argmin()
    return arr[index]
t_uniform = np.arange(0., 100., 1.)
t_desired = [nearest(t_d, t) for t_d in t_uniform]
d_pruned = d[np.in1d(t, t_desired)]

Python: Replacing every imaginary value in an array by a random one

I have the following array:
array([[ 0.01454911+0.j, 0.01392502+0.00095922j,
0.00343284+0.00036535j, 0.00094982+0.0019255j ,
0.00204887+0.0039264j , 0.00112154+0.00133549j, 0.00060697+0.j],
[ 0.02179418+0.j, 0.01010125-0.00062646j,
0.00086327+0.00495717j, 0.00204473-0.00584213j,
0.00159394-0.00678094j, 0.00121372-0.0043044j , 0.00040639+0.j]])
I need a solution that lets me replace just the imaginary components with random values generated by:
numpy.random.vonmises(mu, kappa, size=size)
The resulting array needs to be in the same form as the first one.
Loop over the numbers and just set them to a value you like. The parameters mu and kappa for the numpy.random.vonmises function need to be defined, since they are undefined in the example below.
import numpy as np

data = np.array([[ 0.01454911+0.j, 0.01392502+0.00095922j,
                   0.00343284+0.00036535j, 0.00094982+0.0019255j ,
                   0.00204887+0.0039264j , 0.00112154+0.00133549j, 0.00060697+0.j],
                 [ 0.02179418+0.j, 0.01010125-0.00062646j,
                   0.00086327+0.00495717j, 0.00204473-0.00584213j,
                   0.00159394-0.00678094j, 0.00121372-0.0043044j , 0.00040639+0.j]])

def setRandomImag(c):
    c.imag = np.random.vonmises(mu, kappa, size=size)
    return c

data = [setRandomImag(i) for i in data]
n_epochs = 2
n_freqs = 7
# shape parameters for the array
data2 = np.zeros((n_epochs, n_freqs), dtype=complex)
for i in range(0, n_epochs):
    data2[i] = np.real(data[i]) + np.random.vonmises(mu, kappa) * complex(0, 1)
It gives every frequency within an epoch the same imaginary value. Not exactly what I was asking for, but it solves my problem.
Try using this approach:
Store your numbers in a 2-D array: real part and imaginary part.
Then replace the imaginary part with the randomly chosen numbers.
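A minimal sketch of that idea, done in place on the complex array (mu and kappa are hypothetical parameters, as above; the .imag attribute of a complex array is writable):

import numpy as np

mu, kappa = 0.0, 4.0  # hypothetical von Mises parameters

data = np.array([[0.0145 + 0.j, 0.0139 + 0.0010j],
                 [0.0218 + 0.j, 0.0101 - 0.0006j]])  # small stand-in array

# draw one sample per element and overwrite only the imaginary parts
data.imag = np.random.vonmises(mu, kappa, size=data.shape)
print(data)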

Why does numpy.random.dirichlet() not accept multidimensional arrays?

On the numpy page they give the example of
s = np.random.dirichlet((10, 5, 3), 20)
which is all fine and great; but what if you want to generate random samples from a 2D array of alphas?
alphas = np.random.randint(10, size=(20, 3))
If you try np.random.dirichlet(alphas), np.random.dirichlet([x for x in alphas]), or np.random.dirichlet((x for x in alphas)), it results in a
ValueError: object too deep for desired array. The only thing that seems to work is:
y = np.empty(alphas.shape)
for i in xrange(np.alen(alphas)):
    y[i] = np.random.dirichlet(alphas[i])
print y
...which is far from ideal for my code structure. Why is this the case, and can anyone think of a more "numpy-like" way of doing this?
Thanks in advance.
np.random.dirichlet is written to generate samples for a single Dirichlet distribution. That code is implemented in terms of the Gamma distribution, and that implementation can be used as the basis for a vectorized code to generate samples from different distributions. In the following, dirichlet_sample takes an array alphas with shape (n, k), where each row is an alpha vector for a Dirichlet distribution. It returns an array also with shape (n, k), each row being a sample of the corresponding distribution from alphas. When run as a script, it generates samples using dirichlet_sample and np.random.dirichlet to verify that they are generating the same samples (up to normal floating point differences).
import numpy as np

def dirichlet_sample(alphas):
    """
    Generate samples from an array of alpha distributions.
    """
    r = np.random.standard_gamma(alphas)
    return r / r.sum(-1, keepdims=True)

if __name__ == "__main__":
    alphas = 2 ** np.random.randint(0, 4, size=(6, 3))

    np.random.seed(1234)
    d1 = dirichlet_sample(alphas)
    print "dirichlet_sample:"
    print d1

    np.random.seed(1234)
    d2 = np.empty(alphas.shape)
    for k in range(len(alphas)):
        d2[k] = np.random.dirichlet(alphas[k])
    print "np.random.dirichlet:"
    print d2

    # Compare d1 and d2:
    err = np.abs(d1 - d2).max()
    print "max difference:", err
Sample run:
dirichlet_sample:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
np.random.dirichlet:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
max difference: 5.55111512313e-17
I think you're looking for
y = np.array([np.random.dirichlet(x) for x in alphas])
for your list comprehension. Otherwise you're simply passing a Python list or tuple. I imagine the reason numpy.random.dirichlet does not accept your list of alpha values is simply that it's not set up to; it already accepts an array, which it expects to be one-dimensional with k entries, as per the documentation.
