Cannot unflat numpy arrays properly - python

I am trying to 'unflatten' NumPy arrays for a TensorFlow project. I need to take an NxN matrix, for example 27x27, and, row by row, take all of the row's elements (27 each time) and reshape them into a 3x3x3 map (so with 27 rows I end up with 27 maps of 3x3x3). I wrote the following function:
import numpy as np

def unflat_pca(flated_patches, depth=3, verbose=False):
    # tensor with shape [components_width, components_height]
    p_width = flated_patches.shape[0]
    p_height = flated_patches.shape[1]
    # I use 3x3 because of the convolution window
    res = np.empty((3,3, depth, p_width))
    for one_map in range(p_width):
        map_unflat = np.empty((3,3, depth))
        current_indx = 0
        for d in range(depth):
            # flated_patches: square PCA matrix (PxP)
            map_unflat[:,:,d] = flated_patches[one_map, current_indx:(current_indx+(3*3))].reshape(3,3)
            current_indx += 3*3
            res[:,:, d, one_map] = map_unflat[:,:,d]
    if verbose:
        print("\n-- unflat_pca function --")
        print("The initial shape was: " + str(flated_patches.shape))
        print("The output shape is: " + str(res.shape) + "\n")
    return res #[width, height, depth, pca_maps]
Then, to test the function, I pass an easy-to-follow array (0, 1, 2, ...) to check whether it works properly:
utest = unflat_pca(np.arange(0, 27*27).reshape(27,27), verbose=True)
And I get
-- unflat_pca function --
The initial shape was: (27, 27)
The output shape is: (3, 3, 3, 27)
Perfect! But when I inspect the result, for example with utest[:,:,:,0], I expect the numbers in each inner array to run consecutively (0, 1, 2, ...), but I get:
array([[[ 0., 9., 18.],
[ 1., 10., 19.],
[ 2., 11., 20.]],
[[ 3., 12., 21.],
[ 4., 13., 22.],
[ 5., 14., 23.]],
[[ 6., 15., 24.],
[ 7., 16., 25.],
[ 8., 17., 26.]]])
But if I inspect only the first channel, utest[:,:,0,0], I get what I expected:
array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]])
I am confused because I later use the unflattened maps and get bad results. I think it's due to the first result, where the numbers come out in the wrong order (by columns?!). Could you help me? Sorry about my English :P
PS: Expected value of utest[:,:,:,0], i.e. the 3x3x3 maps ordered (width, height, depth):
array([[[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]],
[[ 9., 10., 11.],
[ 12., 13., 14.],
[ 15., 16., 17.]],
[[ 18., 19., 20.],
[ 21., 22., 23.],
[ 24., 25., 26.]]])
PS2: [Image: expected result for the first row, worked out on paper]

Reshape & permute axes -
a.reshape(-1,3,3,3).transpose(1,2,3,0)
Sample run -
In [482]: a = np.arange(27*27).reshape(27,27)
In [483]: out = a.reshape(-1,3,3,3).transpose(1,2,3,0)
# Verify output shape
In [484]: out.shape
Out[484]: (3, 3, 3, 27)
# Verify output values for the first slice
In [485]: out[...,0]
Out[485]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
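For comparison, here is a minimal sketch (the helper names unflat_pca_loop and unflat_vectorized are hypothetical) checking that the question's loop and the answer produce the same numbers, only with the axes in a different order. The loop builds res[row, col, depth, map], so printing utest[:,:,:,0] shows depth as the innermost axis, which is why the values seem to run down the columns; the answer instead puts the 9-element block first. If the maps are meant as TensorFlow conv2d filters, which expect [filter_height, filter_width, in_channels, out_channels], the loop's layout may already be the one needed, so it is worth checking which order the consuming code expects.

```python
import numpy as np

def unflat_pca_loop(flat, depth=3):
    # Same indexing as the question's unflat_pca:
    # res[row, col, d, m] = flat[m, 9*d + 3*row + col]
    res = np.empty((3, 3, depth, flat.shape[0]))
    for m in range(flat.shape[0]):
        for d in range(depth):
            res[:, :, d, m] = flat[m, 9 * d:9 * (d + 1)].reshape(3, 3)
    return res

def unflat_vectorized(flat, depth=3):
    # (maps, depth, 3, 3) -> (3, 3, depth, maps): the loop above without Python loops
    return flat.reshape(-1, depth, 3, 3).transpose(2, 3, 1, 0)

a = np.arange(27 * 27).reshape(27, 27)
res = unflat_pca_loop(a)
out = a.reshape(-1, 3, 3, 3).transpose(1, 2, 3, 0)  # the answer's layout

# Same values, different axis order
assert np.array_equal(res, unflat_vectorized(a))
assert np.array_equal(res, out.transpose(1, 2, 0, 3))
```

The last assertion shows that moving the answer's first axis to third position recovers the loop's layout exactly.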

Related

Clean way to return default when taking minimum of empty NumPy array

I have two arrays, one holding a series of years and another holding some quantities. I want to study for each year how long it takes for the quantity to double.
For this I wrote this code:
import numpy as np
years = np.arange(2020, 2060)
qma = np.array([8.00000000e+13, 8.14928049e+13, 8.30370113e+13, 8.46353044e+13,
8.62905581e+13, 8.80058517e+13, 8.97844887e+13, 9.16300175e+13,
9.35462542e+13, 9.55373083e+13, 9.76076116e+13, 9.97619497e+13,
1.02005499e+14, 1.04343864e+14, 1.06783128e+14, 1.09329900e+14,
1.11991375e+14, 1.14775397e+14, 1.17690539e+14, 1.20746183e+14,
1.23952624e+14, 1.27321176e+14, 1.30864305e+14, 1.34595778e+14,
1.38530838e+14, 1.74048570e+14, 1.92205500e+14, 2.14405932e+14,
2.42128686e+14, 2.77655470e+14, 3.24688168e+14, 3.89624819e+14,
4.84468500e+14, 6.34373436e+14, 9.74364148e+14, 2.33901669e+15,
1.78934647e+16, 4.85081278e+20, 8.63469750e+21, 2.08204297e+22])
def doubling_year(idx):
    try:
        return years[qma >= 2*qma[idx]].min()
    except ValueError:
        return np.nan
years_until_doubling = [doubling_year(idx) - years[idx]
                        for idx in range(len(years))]
This works as I expect, but having to define a named function for what is essentially a one-liner feels wrong. Is there a cleaner and more succinct way of replicating this behaviour?
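For each year in the series, the number of years in which the quantity is at least double the original quantity can be computed through broadcasting. Because qma is monotonically increasing, those years form a contiguous block at the end of the series, so the number of years until doubling is the number of years remaining in the series minus that count. Finally, replace the entries where the count is 0 (the quantity never doubles) with np.nan.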
In [426]: n_doubled = np.sum(qma[None, :] >= 2*qma[:, None], axis=1)
In [427]: n_doubled
Out[427]:
array([15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 12, 12,
12, 12, 12, 11, 11, 11, 11, 11, 9, 9, 8, 8, 7, 6, 6, 6, 5,
5, 4, 3, 2, 1, 0])
In [428]: np.where(n_doubled, np.arange(len(years), 0, -1) - n_doubled, np.nan)
Out[428]:
array([25., 24., 23., 22., 21., 21., 20., 19., 18., 17., 17., 16., 15.,
14., 13., 13., 12., 11., 10., 9., 9., 8., 7., 6., 5., 6.,
5., 5., 4., 4., 4., 3., 2., 2., 1., 1., 1., 1., 1.,
nan])
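Since qma is sorted, another option, sketched here as a hypothetical helper steps_to_double, is np.searchsorted: it finds, for every year at once, the first position where the quantity reaches twice the current value. This costs O(n log n) instead of the O(n²) broadcasted comparison, but like the answer above it relies on the series being increasing (and, here, positive).

```python
import numpy as np

def steps_to_double(q):
    """Steps from each index until q first reaches at least twice its value (NaN if never).

    Assumes q is positive and sorted in non-decreasing order, as qma is in the question.
    """
    q = np.asarray(q, dtype=float)
    # For every i at once: index of the first element >= 2*q[i]
    first = np.searchsorted(q, 2 * q, side='left')
    steps = (first - np.arange(len(q))).astype(float)
    # searchsorted returns len(q) when no element is large enough
    steps[first == len(q)] = np.nan
    return steps

print(steps_to_double([1, 1.5, 2, 3, 5]))
```

The small example gives 2, 2, 2, NaN, NaN. With the question's data, steps_to_double(qma) should match years_until_doubling, because the years are consecutive integers.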

How to stack uneven numpy arrays?

How can I stack the elements at the same index from each array in a list of arrays?
arrays = [np.array([1,2,3,4,5]),
np.array([6,7,8,9]),
np.array([11,22,33,44,55]),
np.array([2,4])]
output = [[1,6,11,2],
[2,7,22,4],
[3,8,33],
[4,9,44],
[5,55]]
arrays is a list of arrays of uneven lengths. The first element of output (it's fine if it is a list too) contains the element at index 0 of every array, the next one contains every element at index 1, and so on.
The closest thing I have found is the following, but it requires arrays of the same shape:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b), axis=-1)
# which gives
array([[1, 2],
[2, 3],
[3, 4]])
Thanks.
This gets you close. You can't really have a ragged 2D array like the one in your example output, because NumPy arrays must be rectangular, so here the shorter arrays are padded with zeros.
import numpy as np
arrays = [np.array([1,2,3,4,5]),
np.array([6,7,8,9]),
np.array([11,22,33,44,55]),
np.array([2,4])]
maxx = max(x.shape[0] for x in arrays)
for x in arrays:
    # resize in place; new entries are filled with zeros (this modifies the arrays in the list)
    x.resize(maxx,refcheck=False)
output = np.stack(arrays, axis=1)
print(output)
C:\tmp>python x.py
[[ 1 6 11 2]
[ 2 7 22 4]
[ 3 8 33 0]
[ 4 9 44 0]
[ 5 0 55 0]]
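The resize approach modifies the input arrays in place, and its zero padding can be confused with real data. A minimal non-mutating sketch (pad_stack is a hypothetical helper) preallocates a NaN-filled float array and copies each input into its own column:

```python
import numpy as np

def pad_stack(arrays, fill=np.nan):
    # One column per input array, one row per index; float so NaN can be stored
    longest = max(len(a) for a in arrays)
    out = np.full((longest, len(arrays)), fill, dtype=float)
    for col, a in enumerate(arrays):
        out[:len(a), col] = a  # copy; the inputs are left untouched
    return out

arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]
print(pad_stack(arrays))
```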
You could just wrap it in a DataFrame first:
import pandas as pd
arr = pd.DataFrame(arrays).values.T
Output:
array([[ 1., 6., 11., 2.],
[ 2., 7., 22., 4.],
[ 3., 8., 33., nan],
[ 4., 9., 44., nan],
[ 5., nan, 55., nan]])
Though if you really want rows of different sizes, go with:
arr = [x.dropna().values for _, x in pd.DataFrame(arrays).items()]
Output:
[array([ 1, 6, 11, 2]),
array([ 2, 7, 22, 4]),
array([ 3., 8., 33.]),
array([ 4., 9., 44.]),
array([ 5., 55.])]
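If ragged output is what you really need, a plain-Python sketch (stack_ragged is a hypothetical name) avoids pandas and the int-to-float conversion: for each index i, collect the i-th element of every array long enough to have one.

```python
import numpy as np

def stack_ragged(arrays):
    # For each index i, gather arrays[k][i] from every array that has an index i
    longest = max(len(a) for a in arrays)
    return [np.array([a[i] for a in arrays if i < len(a)]) for i in range(longest)]

arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]
for row in stack_ragged(arrays):
    print(row)
```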

Take N first values from every row in NumPy matrix that fulfill condition

I have a NumPy vector and a NumPy matrix (2D array).
I need to take, from every row of the matrix, the first N (let's say 3) values that are smaller than or equal to the corresponding element of the vector.
So if this is my vector:
[7, 9, 22, 38, 6, 15]
and this is my matrix:
[[ 20., 9., 7., 5., None, None],
[ 33., 21., 18., 9., 8., 7.],
[ 31., 21., 13., 12., 4., 0.],
[ 36., 18., 11., 7., 7., 2.],
[ 20., 14., 10., 6., 6., 3.],
[ 14., 14., 13., 11., 5., 5.]]
the output should be:
[[7,5,None],
[9,8,7],
[21,13,12],
[36,18,11],
[6,6,3],
[14,14,13]]
Is there any efficient way to do that with masks or something, without an ugly for loop?
Any help will be appreciated!
Approach #1
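Here's one with broadcasting. It takes the N consecutive values starting at the first qualifying column of each row, which gives your expected output because every row is sorted in descending order. Use np.nan rather than None for missing entries, since in Python 3 None cannot be compared with numbers -
import numpy as np

def takeN_le_per_row_broadcasting(a, b, N=3): # a, b : 1D, 2D arrays respectively
    # First col indices in each row of b with <= corresponding one in a
    idx = (b <= a[:,None]).argmax(1)
    # Get all N ranged column indices
    all_idx = idx[:,None] + np.arange(N)
    # Finally advanced-index with those indices into b for desired output
    return b[np.arange(len(all_idx))[:,None], all_idx]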
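Approach #1 takes N consecutive columns starting at the first match, so it relies on the rows being sorted in descending order. If that may not hold, a different technique can be used: the sketch below (first_n_matching is a hypothetical name; missing values are assumed to be np.nan rather than None) gathers the first N qualifying columns of each row with a stable argsort of the mask, and fills any slot without a match with NaN.

```python
import numpy as np

def first_n_matching(a, b, n=3, fill=np.nan):
    b = np.asarray(b, dtype=float)
    mask = b <= np.asarray(a, dtype=float)[:, None]
    # Sorting ~mask stably puts matching columns (False) first, in their original order
    cols = np.argsort(~mask, axis=1, kind='stable')[:, :n]
    rows = np.arange(b.shape[0])[:, None]
    out = b[rows, cols]  # fancy indexing returns a copy
    # Blank out slots where the row had fewer than n matches
    out[~mask[rows, cols]] = fill
    return out

a = np.array([7, 9, 22, 38, 6, 15])
b = np.array([[20, 9, 7, 5, np.nan, np.nan],
              [33, 21, 18, 9, 8, 7],
              [31, 21, 13, 12, 4, 0],
              [36, 18, 11, 7, 7, 2],
              [20, 14, 10, 6, 6, 3],
              [14, 14, 13, 11, 5, 5]])
print(first_n_matching(a, b))
```

On the example matrix this reproduces the expected output, with NaN in place of None in the first row.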
Approach #2
Inspired by the solution to NumPy Fancy Indexing - Crop different ROIs from different channels, we can leverage np.lib.stride_tricks.as_strided-based patch extraction (here through scikit-image's view_as_windows, which wraps it) for efficient sliding windows, like so -
import numpy as np
from skimage.util.shape import view_as_windows

def takeN_le_per_row_strides(a, b, N=3): # a, b : 1D, 2D arrays respectively
    # First col indices in each row of b with <= corresponding one in a
    idx = (b <= a[:,None]).argmax(1)
    # Get 1D sliding windows of length N along each row of b
    w = view_as_windows(b, (1,N))[:,:,0]
    # Use fancy/advanced indexing to select the required ones
    return w[np.arange(len(idx)), idx]
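On NumPy 1.20 or later, the same sliding-window idea is available without scikit-image through numpy.lib.stride_tricks.sliding_window_view. A sketch under the same assumption as both approaches (each row sorted in descending order), with np.nan in place of None; the function name is hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def takeN_le_per_row_swv(a, b, N=3):
    b = np.asarray(b, dtype=float)
    # First column in each row of b that is <= the corresponding element of a
    idx = (b <= np.asarray(a)[:, None]).argmax(1)
    # Read-only view of shape (rows, cols - N + 1, N): every length-N window of every row
    w = sliding_window_view(b, N, axis=1)
    # Pick, for each row, the window that starts at idx (fancy indexing copies)
    return w[np.arange(len(idx)), idx]

a = np.array([7, 9, 22, 38, 6, 15])
b = np.array([[20, 9, 7, 5, np.nan, np.nan],
              [33, 21, 18, 9, 8, 7],
              [31, 21, 13, 12, 4, 0],
              [36, 18, 11, 7, 7, 2],
              [20, 14, 10, 6, 6, 3],
              [14, 14, 13, 11, 5, 5]])
print(takeN_le_per_row_swv(a, b))
```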

Simple way of stacking arrays with index offset

I have a number of time series, each containing measurements across weeks of the year, but not all of them start and end in the same week. I know the offsets, that is, I know in which week each one starts and ends. Now I would like to combine them into a matrix that respects these offsets, so that all values align with the correct week numbers.
If each series is laid out horizontally (one row per series, one column per week), given two series a and b, where the values correspond to week numbers:
a = np.array([[1,2,3,4,5,6]])
b = np.array([[0,1,2,3,4,5]])
I want to know whether it is possible to combine them, e.g. using some method that takes an offset argument, like combine((a, b), axis=0, offset=-1), such that the resulting array (let's call it c) looks like this:
print(c)
[[NaN 1 2 3 4 5 6 ]
[0 1 2 3 4 5 NaN]]
What's more, since the time series are enormous, I must stream them through my program and therefore cannot know all offsets at the same time. I thought of using pandas because it has nice indexing, but I felt there had to be a simpler way, since the essence of what I'm trying to do is very simple.
Update:
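This seems to work:
import numpy as np

def offset_stack(a, b, offset=0):
    # NaN needs a float dtype (np.insert would otherwise cast it into an int array)
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if offset < 0:
        a = np.insert(a, [0] * abs(offset), np.nan)
        b = np.append(b, [np.nan] * abs(offset))
    if offset > 0:
        a = np.append(a, [np.nan] * abs(offset))
        b = np.insert(b, [0] * abs(offset), np.nan)
    return np.concatenate(([a],[b]), axis=0)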
You can do it in NumPy:
import numpy as np

def f(a, b, n):
    v = np.full(abs(n), np.nan)  # padding of length |n|
    if n < 0:
        return np.vstack((np.append(a,v), np.append(v,b)))
    elif n > 0:
        return np.vstack((np.append(v,a), np.append(b,v)))
    else:
        return np.vstack((a,b))
#In [148]: a = np.array([23, 13, 4, 12, 4, 4])
#In [149]: b = np.array([4, 12, 3, 41, 45, 6])
#In [150]: f(a,b,-2)
#Out[150]:
#array([[ 23., 13., 4., 12., 4., 4., nan, nan],
# [ nan, nan, 4., 12., 3., 41., 45., 6.]])
#In [151]: f(a,b,2)
#Out[151]:
#array([[ nan, nan, 23., 13., 4., 12., 4., 4.],
# [ 4., 12., 3., 41., 45., 6., nan, nan]])
#In [152]: f(a,b,0)
#Out[152]:
#array([[23, 13, 4, 12, 4, 4],
# [ 4, 12, 3, 41, 45, 6]])
There is a really simple way to accomplish this.
You basically want to pad and then stack your arrays, and NumPy has functions for both:
numpy.pad() aka offset
import numpy as np
a = np.array([[1,2,3,4,5,6]], dtype=np.float64) # float because NaN is a float value!
b = np.array([[0,1,2,3,4,5]], dtype=np.float64)
print(np.pad(a, ((0,0),(1,0)), mode='constant', constant_values=np.nan))
# [[ nan 1. 2. 3. 4. 5. 6.]]
print(np.pad(b, ((0,0),(0,1)), mode='constant', constant_values=np.nan))
# [[ 0. 1. 2. 3. 4. 5. nan]]
The pad width ((0,0),(1,0)) means no padding along the first axis (top/bottom) and, along the second axis, one element on the left and none on the right. Tweak these numbers for a larger or smaller shift.
numpy.vstack() aka stack along axis=0
import numpy as np
a_padded = np.pad(a, ((0,0),(1,0)), mode='constant', constant_values=np.nan)
b_padded = np.pad(b, ((0,0),(0,1)), mode='constant', constant_values=np.nan)
np.vstack([a_padded, b_padded])
# array([[ nan, 1., 2., 3., 4., 5., 6.],
#        [ 0., 1., 2., 3., 4., 5., nan]])
Your function:
Combining the two is straightforward and easy to extend:
import numpy as np

def offset_stack(a, b, axis=0, offsets=(0, 1)):
    if (len(offsets) != a.ndim) or (a.ndim != b.ndim):
        raise ValueError('Offsets and dimensions of the arrays do not match.')
    # Negative offset: pad a after and b before; positive offset: the reverse
    offset1 = [(0, -offset) if offset < 0 else (offset, 0) for offset in offsets]
    offset2 = [(-offset, 0) if offset < 0 else (0, offset) for offset in offsets]
    a_padded = np.pad(a, offset1, mode='constant', constant_values=np.nan)
    b_padded = np.pad(b, offset2, mode='constant', constant_values=np.nan)
    return np.concatenate([a_padded, b_padded], axis=axis)

offset_stack(a, b)
This function works for arbitrary offsets in any number of dimensions and can stack along any axis. It doesn't work exactly like the original: you pass one offset per dimension, so to shift along the second dimension you need offsets=(0, 1) rather than a single offset=1. As long as you keep track of the dimensions of your arrays, it should work fine.
For example:
offset_stack(a, b, offsets=(1,2))
array([[ nan, nan, nan, nan, nan, nan, nan, nan],
[ nan, nan, 1., 2., 3., 4., 5., 6.],
[ 0., 1., 2., 3., 4., 5., nan, nan],
[ nan, nan, nan, nan, nan, nan, nan, nan]])
or for 3d arrays:
a = np.array([1,2,3], dtype=np.float64)[None, :, None] # makes it 3d
b = np.array([0,1,2], dtype=np.float64)[None, :, None] # makes it 3d
offset_stack(a, b, offsets=(0,1,0), axis=2)
array([[[ nan, 0.],
[ 1., 1.],
[ 2., 2.],
[ 3., nan]]])
pad and concatenate (and the various stack and insert functions) create a target array of the right size and fill it with values from the input arrays. So we can do the same ourselves, and potentially do it faster.
For example, using your two arrays and the one-step offset:
In [283]: a = np.array([[1,2,3,4,5,6]])
In [284]: b = np.array([[0,1,2,3,4,5]])
Create the target array and fill it with the pad value. np.nan is a float, so the target must be float (even though a is int):
In [285]: m=a.shape[0]+b.shape[0]
In [286]: n=a.shape[1]+1
In [287]: c=np.zeros((m,n),float)
In [288]: c.fill(np.nan)
Now just copy values into the right places on the target. More arrays and offsets will require some generalization here.
In [289]: c[:a.shape[0],1:]=a
In [290]: c[-b.shape[0]:,:-1]=b
In [291]: c
Out[291]:
array([[ nan, 1., 2., 3., 4., 5., 6.],
[ 0., 1., 2., 3., 4., 5., nan]])
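As a sketch of that generalization (align_series is a hypothetical helper, and each series is assumed to arrive as a (start_week, values) pair), the target size can be computed from the offsets and each series copied into its slot. This assumes the full week range is known before allocating; with streamed data you would need to know or bound that range in advance.

```python
import numpy as np

def align_series(series):
    """series: iterable of (start, values) pairs; start is the week of values[0]."""
    series = [(start, np.asarray(v, dtype=float)) for start, v in series]
    first = min(start for start, _ in series)
    last = max(start + len(v) for start, v in series)
    # One row per series, one column per week in [first, last)
    out = np.full((len(series), last - first), np.nan)
    for row, (start, v) in enumerate(series):
        out[row, start - first:start - first + len(v)] = v
    return out

print(align_series([(1, [1, 2, 3, 4, 5, 6]), (0, [0, 1, 2, 3, 4, 5])]))
```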

Conditional indexing with Numpy ndarray

I have a NumPy ndarray matrix of float values, and I need to select specific rows where certain columns have values satisfying certain criteria. For example, let's say I have the following NumPy matrix:
import numpy as np
matrix = np.empty((4, 5))
matrix[0,:] = range(1,6)
matrix[1,:] = range(6,11)
matrix[2,:] = range(11,16)
matrix[3,:] = range(16,21)
Let's say I want to select the rows where the first column's value is between 1 and 6 and the second column's value is between 2 and 7.
How can I get the row-indexes of the matrix where these conditions are satisfied? What about if I want to delete the rows that satisfy the conditional criterion?
For a NumPy-based solution, you can use numpy.where to get the row indexes and then use them to index your matrix. Example -
matrix[np.where((1 <= matrix[:,0]) & (matrix[:,0] <= 6)
& (2 <= matrix[:,1]) & (matrix[:,1] <= 7))]
Demo -
In [169]: matrix
Out[169]:
array([[ 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10.],
[ 11., 12., 13., 14., 15.],
[ 16., 17., 18., 19., 20.]])
In [170]: matrix[np.where((1 <= matrix[:,0]) & (matrix[:,0] <= 6)
.....: & (2 <= matrix[:,1]) & (matrix[:,1] <= 7))]
Out[170]:
array([[ 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10.]])
Another method, as indicated in the comments, is to use a boolean mask. Example -
mask = ((1 <= matrix[:,0]) & (matrix[:,0] <= 6)
& (2 <= matrix[:,1]) & (matrix[:,1] <= 7))
matrix[mask,:]
Demo -
In [41]: matrix
Out[41]:
array([[ 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10.],
[ 11., 12., 13., 14., 15.],
[ 16., 17., 18., 19., 20.]])
In [42]: mask = ((1 <= matrix[:,0]) & (matrix[:,0] <= 6)
....: & (2 <= matrix[:,1]) & (matrix[:,1] <= 7))
In [43]: matrix[mask,:]
Out[43]:
array([[ 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10.]])
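To answer both parts of the question with such a mask, a short sketch: np.flatnonzero turns the mask into row indexes, and deleting the matching rows amounts to keeping the complement ~mask (np.delete with the indexes gives the same result).

```python
import numpy as np

matrix = np.arange(1, 21, dtype=float).reshape(4, 5)
mask = ((1 <= matrix[:, 0]) & (matrix[:, 0] <= 6)
        & (2 <= matrix[:, 1]) & (matrix[:, 1] <= 7))

row_indices = np.flatnonzero(mask)           # integer indexes of the matching rows
selected = matrix[mask]                      # the matching rows
remaining = matrix[~mask]                    # "delete" them by keeping the complement
remaining_too = np.delete(matrix, row_indices, axis=0)  # same result via np.delete
print(row_indices)
print(remaining)
```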
You can get the matching rows as a boolean mask with:
rows = np.logical_and(1 <= matrix[:, 0], matrix[:, 0] <= 6) & np.logical_and(2 <= matrix[:, 1], matrix[:, 1] <= 7)
Then newMatrix = np.delete(matrix, np.flatnonzero(rows), axis = 0)
You mentioned MATLAB. Here's the equivalent to the accepted answer using Octave
octave:17> ma=reshape(1:20,5,4)
ma =
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20
octave:18> mask=(1<=ma(1,:))&(ma(1,:)<=6)&(2<=ma(2,:))&(ma(2,:)<=7)
mask =
1 1 0 0
octave:19> ma(:,mask)
ans =
1 6
2 7
3 8
4 9
5 10
The accepted answer without where is:
In [592]: mask=(1 <= matrix[:,0]) & (matrix[:,0] <= 6) &(2 <= matrix[:,1]) & (matrix[:,1] <= 7)
In [593]: matrix[mask,:]
Out[593]:
array([[ 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10.]])
I switched rows and columns in the Octave version because that is its natural way of generating the same numbers (MATLAB/Octave use the equivalent of NumPy's 'F' order; see below).
The other differences are 0- vs 1-based indexing, and () vs [] for indexing. Otherwise the two notations are similar.
A simpler way to generate the matrix in numpy:
In [594]: np.arange(1,21).reshape(4,5)
Out[594]:
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]])
Or with the MATLAB layout:
In [595]: np.arange(1,21).reshape(5,4,order='F')
Out[595]:
array([[ 1, 6, 11, 16],
[ 2, 7, 12, 17],
[ 3, 8, 13, 18],
[ 4, 9, 14, 19],
[ 5, 10, 15, 20]])
Get row indices (chained comparisons, unlike "in range(...)", also handle non-integer floats such as 1.5):
row_indices = [x for x in range(len(matrix)) if 1 <= matrix[x][0] <= 6 and 2 <= matrix[x][1] <= 7]
Delete rows:
indices = [x for x in range(len(matrix)) if not (1 <= matrix[x][0] <= 6 and 2 <= matrix[x][1] <= 7)]
new_matrix = matrix[indices]
