Variable amount of dimensions in slice

I have a multidimensional array called resultsten, with the following shape:
print(np.shape(resultsten))
(3, 3, 6, 10, 1, 9)
On some occasions, I pass part of this array to a function called cleanup, which then splits it further into x, y, and z arrays:
x,y,z = cleanup(resultsten[0,:,:,:,:,:])
def cleanup(resultsmat):
    x = resultsmat[:,:,:,:,2]
    y = resultsmat[:,:,:,:,1]
    z = resultsmat[:,:,:,:,4]
    return x,y,z
However, sometimes I want to pass a smaller part of resultsten to cleanup, for example:
x,y,z = cleanup(resultsten[0,0,:,:,:,:])
This, of course, gives an error, because the array passed to cleanup has fewer dimensions than the indices inside cleanup expect.
I was wondering whether it is possible to have a variable number of dimensions in a slice.
I would like a command that takes all entries along every dimension except the last, where it takes only one index.
I've seen that it is possible to do this for all dimensions except the first, e.g.
resultsten[1,:,:,:,:,:]
gives the same result as:
resultsten[1,:]
I tried this:
resultsten[:,1]
but it does not give the required result; NumPy interprets it like this:
resultsten[:,1,:,:,:,:]
MWE:
import numpy as np
def cleanup(resultsmat):
    x = resultsmat[:,:,:,0,2]
    y = resultsmat[:,:,:,0,1]
    z = resultsmat[:,:,:,0,4]
    return x,y,z
resultsten=np.arange(3*3*6*10*1*9).reshape(3,3,6,10,1,9)
x0,y0,z0 = cleanup(resultsten[0,:,:,:,:,:]) #works
x0,y0,z0 = cleanup(resultsten[0,0,:,:,:,:]) #does not work

I would use a list of slice objects, converted to a tuple before indexing (recent NumPy versions no longer accept a plain list of slices as an index):
import numpy as np
A = np.arange(2*3*4*5).reshape(2,3,4,5)
# [:] <-> [slice(None, None, None)]
sliceList = [slice(None, None, None)]*(A.ndim-1)
a,b,c,d,e = [A[tuple(sliceList+[i])] for i in range(A.shape[-1])]
Output:
>>> A[:,:,:,0]
array([[[  0,   5,  10,  15],
        [ 20,  25,  30,  35],
        [ 40,  45,  50,  55]],

       [[ 60,  65,  70,  75],
        [ 80,  85,  90,  95],
        [100, 105, 110, 115]]])
>>> a
array([[[  0,   5,  10,  15],
        [ 20,  25,  30,  35],
        [ 40,  45,  50,  55]],

       [[ 60,  65,  70,  75],
        [ 80,  85,  90,  95],
        [100, 105, 110, 115]]])
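For the specific case in the question (everything along all leading dimensions, one index along the last), NumPy's Ellipsis (`...`) may be simpler than building the slice list by hand. The following is only a minimal sketch, assuming cleanup never needs to index anything but the last axis; the array shape is taken from the question:

```python
import numpy as np

def cleanup(resultsmat):
    # `...` expands to as many full slices as needed, so the same
    # function works for any number of leading dimensions.
    x = resultsmat[..., 2]
    y = resultsmat[..., 1]
    z = resultsmat[..., 4]
    return x, y, z

resultsten = np.arange(3*3*6*10*1*9).reshape(3, 3, 6, 10, 1, 9)

x5, y5, z5 = cleanup(resultsten[0])      # 5-D input
x4, y4, z4 = cleanup(resultsten[0, 0])   # 4-D input

print(x5.shape)  # (3, 6, 10, 1)
print(x4.shape)  # (6, 10, 1)
```

The same idea covers the leading side as well: resultsten[0, ...] is equivalent to resultsten[0,:,:,:,:,:].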

Related

scipy.stats.binned_statistic_dd() bin numbering has lots of extra bins

I'm struggling to deal with a scipy.stats.binned_statistic_dd() result. I have an array of positions and another array of ids that I'm binning in 3 directions. I'm providing a list of the bin edges as input rather than a number of bins in each direction coupled with a range option. I have 3 bins in x, 2 in y, and 3 in z, or 18 bins.
However, when I check the binnumbers listed, they are all in a range greater than 20. How do I get the bin numbers to reflect the number of bins provided and get rid of all the extra bins?
I've tried to follow what was suggested in this post (Output in scipy.stats.binned_statistic_dd()) which deals with something similar, but I can't understand how to apply this to my case. As usual, the documentation is as cryptic as ever.
Any help on getting my binnumbers between 1 and 18 in this example would be greatly appreciated!
pos = np.array([[-0.02042167, -0.0223282 , 0.00123734],
[-0.0420364 , 0.01196078, 0.00694259],
[-0.09625651, -0.00311446, 0.06125461],
[-0.07693234, -0.02749618, 0.03617278],
[-0.07578646, 0.01199925, 0.02991888],
[-0.03258293, -0.00371765, 0.04245596],
[-0.06765955, 0.02798434, 0.07075846],
[-0.02431445, 0.02774102, 0.06719837],
[ 0.02798265, -0.01096739, -0.01658691],
[-0.00584252, 0.02043389, -0.00827088],
[ 0.00623063, -0.02642285, 0.03232817],
[ 0.00884222, 0.01498996, 0.02912483],
[ 0.07189474, -0.01541584, 0.01916607],
[ 0.07239394, 0.0059483 , 0.0740187 ],
[-0.08519159, -0.02894125, 0.10923724],
[-0.10803509, 0.01365444, 0.09555333],
[-0.0442866 , -0.00845725, 0.10361843],
[-0.04246779, 0.00396127, 0.1418258 ],
[-0.08975861, 0.02999023, 0.12713186],
[ 0.01772454, -0.0020405 , 0.08824418]])
ids = np.array([16, 9, 6, 19, 1, 4, 10, 5, 18, 11, 2, 12, 13, 8, 3, 17, 14,
15, 20, 7])
xbinEdges = np.array([-0.15298488, -0.05108961, 0.05080566, 0.15270093])
ybinEdges = np.array([-0.051, 0. , 0.051])
zbinEdges = np.array([-0.053, 0.049, 0.151, 0.253])
ret = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
statistic='count', expand_binnumbers=False)
bincounts = ret.statistic
binnumber = ret.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)
ranges = [[-0.15298488071, 0.15270092971],
[-0.051000000000000004, 0.051000000000000004],
[-0.0530000000000001, 0.25300000000000006]]
ret3 = stats.binned_statistic_dd(pos, ids, bins=(3,2,3), statistic='count', expand_binnumbers=False, range=ranges)
bincounts = ret3.statistic
binnumber = ret3.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)

OK, after several days of background thinking and a quick look through the binned_statistic_dd() source code, I think I've found the answer, and it's pretty simple.
It seems binned_statistic_dd() adds an extra set of outlier bins on each side of every dimension during the binning phase and removes them when returning the histogram results, but leaves the bin numbers untouched (I think this is in case you want to reuse the result for further stats outputs).
So if you export the expanded binnumbers (expand_binnumbers=True) and subtract 1 from each one to re-adjust the bin indices, you can calculate the "correct" bin ids. These are zero-based (0-17 here); add 1 if you want them to run from 1 to 18.
numX, numY, numZ = len(xbinEdges)-1, len(ybinEdges)-1, len(zbinEdges)-1
ret2 = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                 statistic='count', expand_binnumbers=True)
bincounts2 = ret2.statistic
binnumber2 = ret2.binnumber
indxnum2 = binnumber2-1
corrected_bin_ids = np.ravel_multi_index(tuple(indxnum2), (numX, numY, numZ))
Quick and simple in the end!
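The same correction can also be applied to flat bin numbers that have already been computed, without calling SciPy again. The sketch below assumes the flat number is the row-major index into a grid padded with one outlier bin on each side of every dimension, which is consistent with the numbers posted in the question (for example 46 = 2*20 + 1*5 + 1 on a 5x4x5 grid); the variable names are illustrative only:

```python
import numpy as np

# Flat bin numbers copied from the question (expand_binnumbers=False).
binnumber = np.array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51,
                      46, 51, 66, 72, 27, 32, 47, 52, 32, 47])

n_bins = (3, 2, 3)                      # real bins in x, y, z
padded = tuple(n + 2 for n in n_bins)   # assumed: one outlier bin per side

# Recover per-dimension indices in the padded grid, then drop the padding.
idx = np.array(np.unravel_index(binnumber, padded)) - 1

# Re-flatten against the real grid; the ids are zero-based (0 to 17).
corrected = np.ravel_multi_index(tuple(idx), n_bins)
print(corrected)
```

If any point fell into an outlier bin, its shifted index would be out of bounds and np.ravel_multi_index would raise a ValueError, so such points would need to be filtered out first.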

Randomly pick index with constant interval

Is there any way to randomly pick indices of a NumPy array at a constant interval?
For example, I have an array with shape (1, 150), that is 5*30 elements. I want to randomly pick x indices, where x<=30, from each group of 30 elements. So in total I will have x*5 randomly picked indices from the array.
First I tried np.random.choice on the whole array, but that doesn't work because it doesn't respect the constant interval.
I could loop over each group, but I feel that's not an efficient way.
Is there any way to do this in NumPy?
I tried this, and it gives the required result, but I want to improve the code:
frames_picker = np.zeros(30*5)
samples=[]
for i in range(5):
    sample = (np.random.choice(frames_picker[0: 30].shape[0], 5, replace=False))+(i*30)
    samples.append(sample)
samples=np.array(samples)
frames_picker[np.sort(samples)]=1

>>> np.random.choice(30, size=(5,5), replace=False) + np.arange(0, 150, 30)[:,None]
array([[ 18, 28, 13, 6, 8],
[ 40, 56, 44, 57, 32],
[ 83, 71, 65, 81, 64],
[114, 115, 97, 90, 106],
[137, 121, 129, 142, 149]])
You can then flatten and sort them. This gives you 5 random indices from each interval [0, 29], [30, 59], ..., [120, 149].
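One caveat with the one-liner above: with replace=False, all 25 values are drawn from the same pool of 30, so the rows are not independent and no more than 6 indices per interval can be requested. A possible alternative, sketched here with illustrative variable names, is to build an independent random permutation per interval by sorting random numbers:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable

n_blocks, block_len, x = 5, 30, 5

# Argsorting a row of random numbers yields a random permutation of 0..29;
# keeping the first x entries gives x distinct positions per block.
within = np.argsort(rng.random((n_blocks, block_len)), axis=1)[:, :x]

# Shift each row into its own interval: 0-29, 30-59, ..., 120-149.
indices = np.sort(within + block_len * np.arange(n_blocks)[:, None], axis=1)

frames_picker = np.zeros(n_blocks * block_len)
frames_picker[indices.ravel()] = 1
print(indices.shape)  # (5, 5)
```

This works for any x up to block_len, at the cost of sorting every block in full.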

You can convert the 1D array to a more suitable 2D matrix and then apply sampling. For example:
data = np.array(...) # 1D array with 5*30 elements
m = np.asmatrix(np.split(data, 5)) # chunk and convert to a matrix
sample = m[np.random.randint(m.shape[0], size=30), :] # get random 30 slices from the matrix
np.ravel(sample) # convert back to 1D array

how do i repeat my function but instead use the next three values in the list?

If I type:
microcar(np.array([[45, 10, 10], [110, 10, 8], [60, 10, 5], [170, 10, 4]]), np.array([[47, 10, 15], [112, 9, 8.5], [50, 10, 8], [160, 8.5, 5]]))
it returns:
(52.53219888177297, 85.09035245341184, -148.85032037263932, 18.5359684117836, 100, 150.0)
which is good. However, I want it to repeat this calculation for the next set of 3 values and so on, e.g. [110,10,8] for the expected and [50,10,8] for the actual.
I can't figure out how to incorporate a loop that treats the next set of 3 values as the new one to look at.
Also, cos(45) = 0.707106 (45 degrees), but the code treats 45 as radians and gives cos(45) = 0.5253. Is there a way to make it work in degrees?
Below is my code:
import numpy as np
def microcar(expected, actual):
    horizontal_expected = expected[0,1]*expected[0,2]*np.cos(expected[0,0])
    vertical_expected = expected[0,1]*expected[0,2]*np.sin(expected[0,0])
    horizontal_actual = actual[0,1]*actual[0,2]*np.cos(actual[0,0])
    vertical_actual = actual[0,1]*actual[0,2]*np.sin(actual[0,0])
    distance_expected = expected[0,1]*expected[0,2]
    distance_actual = actual[0,1]*actual[0,2]
    return horizontal_expected, vertical_expected, horizontal_actual, vertical_actual, distance_expected, distance_actual

You can zip the inputs and loop over them like this:
import numpy as np
def microcar(expected, actual):
    l = zip(expected, actual)
    res = []
    for e in l:
        horizontal_expected = e[0][1]*e[0][2]*np.cos(e[0][0])
        vertical_expected = e[0][1]*e[0][2]*np.sin(e[0][0])
        horizontal_actual = e[1][1]*e[1][2]*np.cos(e[1][0])
        vertical_actual = e[1][1]*e[1][2]*np.sin(e[1][0])
        distance_expected = e[0][1]*e[0][2]
        distance_actual = e[1][1]*e[1][2]
        res.append([
            horizontal_expected,
            vertical_expected,
            horizontal_actual,
            vertical_actual,
            distance_expected,
            distance_actual
        ])
    return res
x = microcar(
    np.array([[45, 10, 10], [110, 10, 8], [60, 10, 5], [170, 10, 4]]),
    np.array([[47, 10, 15], [112, 9, 8.5], [50, 10, 8], [160, 8.5, 5]])
)
print(x)
The output:
[
[52.53219888177297, 85.09035245341184, -148.85032037263932, 18.5359684117836, 100, 150.0],
[-79.92166506517184, -3.539414246805677, 34.88163648998712, -68.08466373406274, 80, 76.5],
[-47.62064902075782, -15.240531055110834, 77.19728227936906, -20.9899882963143, 50, 80.0],
[37.51979008477766, 13.865978219881212, -41.46424579379759, 9.325573481107702, 40, 42.5]
]
I don't know what kind of output you expected, so this simply returns a list of lists with the results.
As for your question about np.cos, it expects input in radians, so you could convert the degrees to radians with np.deg2rad:
import numpy as np
print(np.cos(np.deg2rad(45)))
# 0.7071067811865476
Without using zip, you can create a range equal to the length of one of the arrays and loop over that, using the values (in this case i) to index into the arrays, in the following way:
import numpy as np
def microcar(expected, actual):
    res = []
    for i in range(len(expected)):
        horizontal_expected = expected[i,1]*expected[i,2]*np.cos(expected[i,0])
        vertical_expected = expected[i,1]*expected[i,2]*np.sin(expected[i,0])
        horizontal_actual = actual[i,1]*actual[i,2]*np.cos(actual[i,0])
        vertical_actual = actual[i,1]*actual[i,2]*np.sin(actual[i,0])
        distance_expected = expected[i,1]*expected[i,2]
        distance_actual = actual[i,1]*actual[i,2]
        res.append([
            horizontal_expected,
            vertical_expected,
            horizontal_actual,
            vertical_actual,
            distance_expected,
            distance_actual
        ])
    return res
x = microcar(
    np.array([[45, 10, 10], [110, 10, 8], [60, 10, 5], [170, 10, 4]]),
    np.array([[47, 10, 15], [112, 9, 8.5], [50, 10, 8], [160, 8.5, 5]])
)
print(x)
Output:
[
[52.53219888177297, 85.09035245341184, -148.85032037263932, 18.5359684117836, 100, 150.0],
[-79.92166506517184, -3.539414246805677, 34.88163648998712, -68.08466373406274, 80, 76.5],
[-47.62064902075782, -15.240531055110834, 77.19728227936906, -20.9899882963143, 50, 80.0],
[37.51979008477766, 13.865978219881212, -41.46424579379759, 9.325573481107702, 40, 42.5]
]
Note that this assumes that both inputs are of equal length. If they are not, you will likely encounter an IndexError exception. This assumption holds for zip as well, but there, you would "lose" the surplus entries in the longer array.
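Since the inputs are already NumPy arrays, the loop can also be dropped altogether by operating on whole columns. The sketch below is one possible way to do that; it assumes, as the question suggests, that the first column holds angles in degrees, so its numbers differ from the radian-based outputs listed above. The inner helper components is a hypothetical name:

```python
import numpy as np

def microcar(expected, actual):
    """Return one row of six values per pair of input rows."""
    def components(rows):
        angle = np.deg2rad(rows[:, 0])       # assumption: column 0 is in degrees
        distance = rows[:, 1] * rows[:, 2]
        return distance * np.cos(angle), distance * np.sin(angle), distance

    h_exp, v_exp, d_exp = components(np.asarray(expected, dtype=float))
    h_act, v_act, d_act = components(np.asarray(actual, dtype=float))
    # Same column order as the original return value.
    return np.column_stack([h_exp, v_exp, h_act, v_act, d_exp, d_act])

result = microcar(
    np.array([[45, 10, 10], [110, 10, 8], [60, 10, 5], [170, 10, 4]]),
    np.array([[47, 10, 15], [112, 9, 8.5], [50, 10, 8], [160, 8.5, 5]]),
)
print(result.shape)  # (4, 6)
```

Each row of result corresponds to one pair of rows from the two inputs.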

Finding the closest to value in two datasets using a for loop

In MATLAB, I am able to identify the values in data_b that come closest to the values in data_a, along with the indices that indicate where in the matrix they occur, with the following code:
clear all; close all; clc;
data_a = [0; 15; 30; 45; 60; 75; 90];
data_b = randi([0, 90], [180, 101]);
[rows_a,cols_a] = size(data_a);
[rows_b,cols_b] = size(data_b);
val1 = zeros(rows_a,cols_b);
ind1 = zeros(rows_a,cols_b);
for i = 1:cols_b
    for j = 1:rows_a
        [val1(j,i),ind1(j,i)] = min(abs(data_b(:,i) - data_a(j)));
    end
end
Since I would like to phase out MATLAB (I will be out of a license eventually), I decided to try the same in python, without any luck:
import numpy as np
data_a = np.array([[0],[15],[30],[45],[60],[75],[90]])
data_b = np.random.randint(91, size=(180, 101))
[rows_a,cols_a] = data_a.shape
[rows_b,cols_b] = data_b.shape
val1 = np.zeros((rows_a,cols_b))
ind1 = np.zeros((rows_a,cols_b))
for i in range(cols_b):
    for j in range(rows_a):
        [val1[j][i],ind1[j][i]] = np.amin(np.abs(data_b[:][i] - data_a[j]))
The code also produced an error that made me none the wiser:
TypeError: cannot unpack non-iterable numpy.int32 object
If anyone could find time to explain why I am an ignorant fool by indicating what I did wrong, and what I could do to fix it, I would be grateful as this has proven to become a major obstacle for my progress.
Thank you.

I think you are facing two problems:
Incorrect use of slicing for multidimensional arrays: use [i, j] instead of [i][j]
Improper translation of min() from MATLAB to NumPy: you have to use both argmin() and min().
Your fixed code would look like:
import numpy as np
# just to make it reproducible in testing, can be commented out for production
np.random.seed(0)
data_a = np.array([[0],[15],[30],[45],[60],[75],[90]])
data_b = np.random.randint(91, size=(180, 101))
[rows_a,cols_a] = data_a.shape
[rows_b,cols_b] = data_b.shape
val1 = np.zeros((rows_a,cols_b), dtype=int)
ind1 = np.zeros((rows_a,cols_b), dtype=int)
for i in range(cols_b):
    for j in range(rows_a):
        # absolute differences between column i of data_b and the j-th target value
        diff = np.abs(data_b[:, i] - data_a[j])
        ind1[j, i] = np.argmin(diff)
        val1[j, i] = diff[ind1[j, i]]
However, I would avoid direct looping here and I would make good use of broadcasting:
import numpy as np
# just to make it reproducible in testing, can be commented for production
np.random.seed(0)
data_a = np.arange(0, 90 + 1, 15).reshape((-1, 1, 1))
data_b = np.random.randint(90 + 1, size=(1, 180, 101))
tmp_arr = np.abs(data_a.reshape(-1, 1, 1) - data_b.reshape(1, 180, -1), dtype=int)
min_idxs = np.argmin(tmp_arr, axis=1)
min_vals = np.min(tmp_arr, axis=1)
del tmp_arr # you can delete this if you no longer need it
where now ind1 == min_idxs and val1 == min_vals, i.e.:
print(np.all(min_idxs == ind1))
# True
print(np.all(min_vals == val1))
# True

Your error has to do with "[val1[j][i],ind1[j][i]] = (a single number)". You are trying to unpack a single value into two targets, which doesn't work in Python. What about this?
import numpy as np
data_a = np.array([[0],[15],[30],[45],[60],[75],[90]])
data_b = np.random.randint(91, size=(180,101))
[rows_a,cols_a] = data_a.shape
[rows_b,cols_b] = data_b.shape
val1 = np.zeros((rows_a,cols_b))
ind1 = np.zeros((rows_a,cols_b))
for i in range(cols_b):
    for j in range(rows_a):
        array = np.abs(data_b[:][i] - data_a[j])
        val = np.amin(array)
        val1[j][i] = val
        ind1[j][i] = np.where(val == array)[0][0]
NumPy's amin does not return an index, so you need to find it with np.where. This example does not store the full index, only the index of the first occurrence in the row. Note that data_b[:][i] is the same as data_b[i], i.e. row i of data_b; to select column i as in the MATLAB code, use data_b[:, i]. For instance, on the first iteration:
In [2]: np.abs(data_b[:][0] - data_a[j0])
Out[2]:
array([ 3, 31, 19, 53, 28, 81, 10, 11, 89, 15, 50, 22, 40, 81, 43, 29, 63,
72, 22, 37, 54, 12, 19, 78, 85, 78, 37, 81, 41, 24, 29, 56, 37, 86,
67, 7, 38, 27, 83, 81, 66, 32, 68, 29, 71, 26, 12, 27, 45, 58, 17,
57, 54, 55, 23, 21, 46, 58, 75, 10, 25, 85, 70, 76, 0, 11, 19, 83,
81, 68, 8, 63, 72, 48, 18, 29, 0, 47, 85, 79, 72, 85, 28, 28, 7,
41, 80, 56, 59, 44, 82, 33, 42, 23, 42, 89, 58, 52, 44, 65, 65])
In [3]: np.amin(array)
Out[3]: 0
In [4]: val
Out[4]: 0
In [5]: np.where(val == array)[0][0]
Out[5]: 69
In [6]: data_b[0,69]
Out[6]: 0

Blockproc like function for Python image processing

Edit: it's an image, so the suggested duplicate (How can I efficiently process a numpy array in blocks similar to Matlab's blkproc (blockproc) function) isn't really working for me.
I have the following MATLAB code:
fun = @(block_struct) ...
    std2(block_struct.data) * ones(size(block_struct.data));
B = blockproc(im2double(Icorrected), [4 4], fun);
I want to rewrite this in Python. I have installed Scikit and I'm trying to work around it like this:
b = np.std(a, axis = 2)
The problem, of course, is that I'm not applying the std per block, as in the code above.
How can I do something like this? Start a loop and call the function for each X*X block? Then I wouldn't keep the original size.
Is there a more efficient way?

If there is no overlap in the windows you can reshape the data to suit your needs:
Find the mean of 3x3 windows of a 9x9 array.
import numpy as np
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80]])
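Find the new shape: (blocks along rows, rows per block, blocks along columns, columns per block).
>>> window_size = (3,3)
>>> (a.shape[0] // window_size[0], window_size[0], a.shape[1] // window_size[1], window_size[1])
(3, 3, 3, 3)
>>> b = a.reshape(3,3,3,3)
Find the mean along axes 1 and 3, i.e. over the rows and columns inside each block.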
>>> b.mean(axis = (1,3))
array([[ 10., 13., 16.],
[ 37., 40., 43.],
[ 64., 67., 70.]])
>>>
2x2 windows of a 4x4 array:
>>> a = np.arange(16).reshape((4,4))
>>> window_size = (2,2)
>>> (a.shape[0] // window_size[0], window_size[0], a.shape[1] // window_size[1], window_size[1])
(2, 2, 2, 2)
>>> b = a.reshape(2,2,2,2)
>>> b.mean(axis = (1,3))
array([[ 2.5, 4.5],
[ 10.5, 12.5]])
>>>
It won't work if the window size doesn't divide the array size evenly. In that case, or if you want overlapping windows, numpy.lib.stride_tricks.as_strided is the way to go; a generic N-D function can be found at Efficient Overlapping Windows with Numpy.
Another option for 2-D arrays is sklearn.feature_extraction.image.extract_patches_2d, and for N-D arrays sklearn.feature_extraction.image.extract_patches. Both manipulate the array's strides to produce the patches/windows.
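To tie the reshape trick back to the question (a per-block standard deviation that keeps the original image size), the steps above could be wrapped up roughly as follows. This is only a sketch: block_std is a hypothetical helper, it handles 2-D arrays whose shape is an exact multiple of the block size, and ddof=0 is NumPy's default rather than a claim about what the MATLAB code computes (pass ddof=1 for the sample standard deviation).

```python
import numpy as np

def block_std(a, block, ddof=0):
    """Standard deviation of each non-overlapping block, expanded back
    to the input shape (hypothetical helper, 2-D input only)."""
    h, w = a.shape
    m, n = block
    if h % m or w % n:
        raise ValueError("block size must divide the array shape evenly")
    # Axes: (block row, row inside block, block column, column inside block)
    blocks = a.reshape(h // m, m, w // n, n)
    per_block = blocks.std(axis=(1, 3), ddof=ddof)
    # Repeat each block's value so the output has the input's size.
    return np.repeat(np.repeat(per_block, m, axis=0), n, axis=1)

a = np.arange(16, dtype=float).reshape(4, 4)
out = block_std(a, (2, 2))
print(out.shape)  # (4, 4)
```

Every pixel in the output holds the standard deviation of the block it belongs to, which mirrors the std2(...) * ones(...) pattern in the question.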

I did the following:
io.use_plugin('pil', 'imread')
a = io.imread(r'C:\Users\Dimitrios\Desktop\polimesa\arizona.jpg')
B = np.zeros((len(a)//2 + 1, len(a[0])//2 + 1))
x = []
for i in range(0, len(a), 2):
    for j in range(0, len(a[0]), 2):
        # collect the 2x2 neighbourhood, skipping neighbours outside the image
        x.append(a[i][j])
        if i+1 < len(a):
            x.append(a[i+1][j])
        if j+1 < len(a[0]):
            x.append(a[i][j+1])
        if i+1 < len(a) and j+1 < len(a[0]):
            x.append(a[i+1][j+1])
        B[i//2][j//2] = np.std(x)
        x[:] = []
and I think it's correct. It iterates over the image in steps of 2, takes each neighbouring pixel, adds them to a list and calculates the std.
Edit: later changed to use 4x4 blocks.

We can implement blockproc() in Python the following way:
def blockproc(im, block_sz, func):
    h, w = im.shape
    m, n = block_sz
    for x in range(0, h, m):
        for y in range(0, w, n):
            block = im[x:x+m, y:y+n]
            block[:,:] = func(block)
    return im
Now, let's apply it to implement contrast enhancement with local histogram equalization, using the low-contrast moon image (of size 512x512) as input and 64x64 blocks, as set by m and n in the code:
from skimage import data, exposure
img = data.moon()
img = img / img.max()
m, n = 64, 64
img_eq = blockproc(img.copy(), (m, n), exposure.equalize_hist)
Note that the function does in-place modification to the image, hence a copy of the input image is passed instead.
