I want to undo a NumPy reshape by calling reshape again to restore the array's original dimensions.
I have an array train_X with dimensions (x, y, z), which I reshape:
train_X_1 = train_X.reshape(train_X.shape[0], train_X.shape[1] * train_X.shape[2])
Then I want to reverse the reshape:
train_X_2 = train_X_1.reshape((train_X.shape[0], train_X.shape[1], train_X.shape[2]))
When I compare
print((train_X_2 == train_X).all())
I get False.
What's wrong with my code? Thanks.
Are you just trying this:
In [184]: x = np.arange(24).reshape(2,3,4)
In [185]: x1 = x.reshape(2,12)
In [186]: x2 = x1.reshape(2,3,4)
In [187]: np.allclose(x,x2)
Out[187]: True
What's your dtype? allclose is better for floats.
In [218]: data = np.load('../Downloads/train_X.npy')
In [219]: data.shape
Out[219]: (97848, 20, 2)
In [220]: data.dtype
Out[220]: dtype('float64')
In [221]: data1 = data.reshape(data.shape[0], data.shape[1]*data.shape[2])
In [222]: data1.shape
Out[222]: (97848, 40)
In [223]: data2 = data1.reshape(data.shape)
In [224]: data2.shape
Out[224]: (97848, 20, 2)
In [225]: np.allclose(data, data2)
Out[225]: False
In [226]: np.max(np.abs(data - data2))
Out[226]: nan
In [247]: np.isnan(data).sum()
Out[247]: 2514
In [248]: np.isnan(data2).sum()
Out[248]: 2514
There's your problem: the array contains nan values, and nan never compares equal with ==, not even to itself. Let's compare with the nan values replaced by numbers:
In [251]: np.allclose(np.nan_to_num(data),np.nan_to_num(data2))
Out[251]: True
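If the goal is only to confirm that the round trip preserved the data, the nan positions can also be treated as equal instead of being replaced. A minimal sketch, assuming a small made-up array in place of train_X.npy; equal_nan is an existing keyword of np.allclose:

```python
import numpy as np

# Small stand-in for the real data (values are made up), with two nan entries.
data = np.arange(24, dtype=float).reshape(2, 3, 4)
data[0, 1, 2] = np.nan
data[1, 0, 0] = np.nan

flat = data.reshape(data.shape[0], -1)   # shape (2, 12)
restored = flat.reshape(data.shape)      # back to shape (2, 3, 4)

# Elementwise == is False wherever nan appears, so .all() fails.
plain_equal = bool((restored == data).all())
# equal_nan=True treats nan in the same position as a match.
nan_aware = bool(np.allclose(restored, data, equal_nan=True))
print(plain_equal, nan_aware)
```

The reshape itself is lossless here; only the comparison needs to be nan-aware.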
It sounds like you want to flatten, then reverse, then reshape.
starting with an array:
import numpy as np
arr = np.arange(6).reshape((2,3)) #[[0, 1, 2,], [3, 4, 5]]
We can flatten into a 1D array using ravel
arr = arr.ravel() #[0,1,2,3,4,5]
We can then reverse the order
arr = arr[::-1] #[5,4,3,2,1,0]
Then we reshape it
arr.reshape(2,3) #[[5, 4, 3], [2, 1, 0]]
Altogether:
import numpy as np
arr = np.arange(6).reshape((2,3))
arr = arr.ravel()[::-1].reshape(2,3)
print(arr)
I am trying to create permutations of size 4 from a group of real numbers. After that, I'd like to know the position of the first element in a permutation after I sort it. Here is what I have tried so far. What's the best way to do this?
import numpy as np
from itertools import chain, permutations
N_PLAYERS = 4
N_STATES = 60
np.random.seed(0)
state_space = np.linspace(0.0, 1.0, num=N_STATES, retstep=True)[0].tolist()
perms = permutations(state_space, N_PLAYERS)
perms_arr = np.fromiter(chain(*perms),dtype=np.float16)
def loc(row):
    return np.where(np.argsort(row) == 0)[0].tolist()[0]
locs = np.apply_along_axis(loc, 0, perms)
In [153]: N_PLAYERS = 4
...: N_STATES = 60
...: np.random.seed(0)
...: state_space = np.linspace(0.0, 1.0, num=N_STATES, retstep=True)[0].tolist()
...: perms = itertools.permutations(state_space, N_PLAYERS)
In [154]: alist = list(perms)
In [155]: len(alist)
Out[155]: 11703240
Simply making a list from the permutations produces a list of tuples, each of length N_PLAYERS.
Making an array from that with chain flattens it:
In [156]: perms = itertools.permutations(state_space, N_PLAYERS)
In [158]: perms_arr = np.fromiter(itertools.chain(*perms),dtype=np.float16)
In [159]: perms_arr.shape
Out[159]: (46812960,)
In [160]: alist[0]
Which could be reshaped to (11703240,4).
Using apply on that 1d array doesn't work (or make sense):
In [170]: perms_arr.shape
Out[170]: (46812960,)
In [171]: locs = np.apply_along_axis(loc, 0, perms_arr)
In [172]: locs.shape
Out[172]: ()
Reshape to 4 columns:
In [173]: locs = np.apply_along_axis(loc, 0, perms_arr.reshape(-1,4))
In [174]: locs.shape
Out[174]: (4,)
In [175]: locs
Out[175]: array([ 0, 195054, 578037, 769366])
This applies loc to each column, returning one value for each. But loc has a row variable. Is that supposed to be significant?
I could switch the axis, though this takes much longer:
In [176]: locs = np.apply_along_axis(loc, 1, perms_arr.reshape(-1,4))
In [177]: locs.shape
Out[177]: (11703240,)
list comprehension
This iteration does the same thing as your apply_along_axis, and I expect it is faster (though I haven't timed it; it's too slow).
In [188]: locs1 = np.array([loc(row) for row in perms_arr.reshape(-1,4)])
In [189]: np.allclose(locs, locs1)
Out[189]: True
whole array sort
But argsort takes an axis, so I can sort all rows at once (instead of iterating):
In [185]: np.nonzero(np.argsort(perms_arr.reshape(-1,4), axis=1)==0)
Out[185]:
(array([ 0, 1, 2, ..., 11703237, 11703238, 11703239]),
array([0, 0, 0, ..., 3, 3, 3]))
In [186]: np.allclose(_[1],locs)
Out[186]: True
Or going the other direction (compare with Out[175]):
In [187]: np.nonzero(np.argsort(perms_arr.reshape(-1,4), axis=0)==0)
Out[187]: (array([ 0, 195054, 578037, 769366]), array([0, 1, 2, 3]))
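The equivalence between the row-by-row loc and the whole-array argsort can be checked on a much smaller problem. This is only a sketch: the 5 values and permutation size 3 are assumptions chosen to keep the array tiny, and rank_count is an extra cross-check that is not part of the answer above.

```python
import numpy as np
from itertools import permutations

# Small stand-in for state_space: 5 distinct values, permutations of size 3.
values = np.linspace(0.0, 1.0, num=5)
perms_arr = np.array(list(permutations(values, 3)))   # shape (60, 3)

def loc(row):
    # Position of the row's first element after sorting the row.
    return int(np.where(np.argsort(row) == 0)[0][0])

# Python-level loop over rows.
looped = np.array([loc(row) for row in perms_arr])

# One argsort over all rows; each row has exactly one 0, so the
# column indices returned by nonzero line up with the rows.
vectorized = np.nonzero(np.argsort(perms_arr, axis=1) == 0)[1]

# Cross-check: with distinct values, the sorted position of the first
# element equals the number of elements smaller than it.
rank_count = (perms_arr < perms_arr[:, :1]).sum(axis=1)

print(np.array_equal(looped, vectorized), np.array_equal(looped, rank_count))
```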
I'm trying to insert elements to an empty 2d numpy array. However, I am not getting what I want.
I tried np.hstack but it is giving me a normal array only. Then I tried using append but it is giving me an error.
Error:
ValueError: all the input arrays must have same number of dimensions
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray =np.concatenate((randomReleaseAngle1,randomVelocity1),axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
    data = [[170, 68.2, i[0], i[1]]]
    df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
    test_y_predictions = model.predict(df)
    print(test_y_predictions)
    if (np.any(test_y_predictions == 1)):
        arr = np.hstack((arr, np.array([i[0], i[1]])))
        arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)
print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
[3.3,4.3],
[7.1,7.3],
[3.3,4.3],
[3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]
The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
    ...:     alist.append([i, i+10])
    ...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
If we start with an empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952],
[ 1, 10],
[ 2, 11]])
Note that empty does not mean the same thing as the list []. It's a real 2x2 array, with 'unspecified' values. And those values remain when we add other arrays to it.
I could start with an array with a 0 dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
[ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate makes a new array. So if done iteratively it is more expensive than the list append.
np.append just takes 2 arrays and does a concatenate. So doesn't add much. hstack tweaks shapes and joins on the 2nd (horizontal) dimension. vstack is another variant. But they all end up using concatenate.
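Applied to the loop in the question, the list-append approach might look like the sketch below. The keep function is a hypothetical stand-in for the model.predict(df) == 1 test (the real model is not available here), and the 50.0 threshold is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(20.0, 77.0, size=5)
velocities = rng.uniform(40.0, 60.0, size=5)

def keep(angle, velocity):
    # Hypothetical stand-in for `np.any(model.predict(df) == 1)`.
    return velocity > 50.0

rows = []                       # plain Python list, cheap to append to
for angle, velocity in zip(angles, velocities):
    if keep(angle, velocity):
        rows.append([angle, velocity])

# Build the 2d array once at the end; reshape(-1, 2) keeps the result
# two-dimensional even when no row was kept.
result = np.array(rows, dtype=float).reshape(-1, 2)
print(result.shape)
```

Building the array once avoids the repeated copying that an np.append or np.concatenate call inside the loop would incur.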
With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
The other method can be more easily done in a similar way:
arr1 = np.append(arr1, np.array([i[0], i[1]])) # in the loop
arr1 = arr1.reshape(-1, 2)
print(arr1)
I have a numpy array (an n x n matrix), and I would like to modify only the columns whose sum is 0, assigning the same value to all of these columns.
To do that, I have first taken the index of the columns that sum to 0:
sum_lines = np.sum(mat_trans, axis = 0)
indices = np.where(sum_lines == 0)[0]
then I did a loop on those indices:
for i in indices:
    mat_trans[:, i] = rank_vect
so that each of these columns now has the value of the rank_vect column vector.
I was wondering if there was a way to do this without a loop, something that would look like:
mat_trans[:, np.where(sum_lines == 0)[0]] = rank_vect
Thanks!
In [114]: arr = np.array([[0,1,2,3],[1,0,2,-3],[-1,2,0,0]])
In [115]: sumlines = np.sum(arr, axis=0)
In [116]: sumlines
Out[116]: array([0, 3, 4, 0])
In [117]: idx = np.where(sumlines==0)[0]
In [118]: idx
Out[118]: array([0, 3])
So the columns that we want to modify are:
In [119]: arr[:,idx]
Out[119]:
array([[ 0, 3],
[ 1, -3],
[-1, 0]])
In [120]: rv = np.array([10,11,12])
If rv is 1d, we get a shape error:
In [121]: arr[:,idx] = rv
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (2,3)
But if it is a column vector (shape (3,1)) it can be broadcast to the (3,2) target:
In [122]: arr[:,idx] = rv[:,None]
In [123]: arr
Out[123]:
array([[10, 1, 2, 10],
[11, 0, 2, 11],
[12, 2, 0, 12]])
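The same assignment also works with a boolean column mask, which skips the np.where step. A short sketch using the arrays from the session above; the name mask is introduced here for illustration:

```python
import numpy as np

arr = np.array([[0, 1, 2, 3], [1, 0, 2, -3], [-1, 2, 0, 0]])
rv = np.array([10, 11, 12])

# True for every column whose sum is 0.
mask = arr.sum(axis=0) == 0

# rv[:, None] has shape (3, 1) and broadcasts across the selected columns.
arr[:, mask] = rv[:, None]
print(arr)
```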
This should do the trick
mat_trans[:,indices] = np.stack((rank_vect,)*indices.size,-1)
Please test and let me know if it does what you want. It just stacks the rank_vect repeatedly to match the shape of the LHS on the RHS.
I believe this is equivalent to
for i in indices:
    mat_trans[:, i] = rank_vect
I'd be interested to know the speed difference
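Before timing anything, it is worth confirming that the two versions agree. A minimal check on a made-up 3x4 matrix (the data and the names looped and stacked are assumptions for illustration); the guard on indices.size is there because np.stack raises when given an empty sequence, whereas the loop simply does nothing:

```python
import numpy as np

mat = np.array([[0, 1, 2, 3], [1, 0, 2, -3], [-1, 2, 0, 0]])
rank_vect = np.array([10, 11, 12])
indices = np.where(mat.sum(axis=0) == 0)[0]

# Loop version from the question.
looped = mat.copy()
for i in indices:
    looped[:, i] = rank_vect

# Stacked version: one copy of rank_vect per selected column.
stacked = mat.copy()
if indices.size:
    stacked[:, indices] = np.stack((rank_vect,) * indices.size, -1)

same = np.array_equal(looped, stacked)
print(same)
```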
I have following problem when using NumPy:
Code:
import numpy as np
get_label = lambda x: 'SMALL' if x.sum() <= 10 else 'BIG'
arr = np.array([[1, 2], [30, 40]])
print(np.apply_along_axis(get_label, 1, arr))
arr = np.array([[30, 40], [1, 2]])
print(np.apply_along_axis(get_label, 1, arr))
Output:
['SMALL' 'BIG']
['BIG' 'SMA'] # String 'SMALL' is stripped!
I can see that NumPy in some way infers datatype from first value returned by function. I came up with following workaround - return NumPy array from function with explicitly stated dtype instead of string, and reshape the result:
def get_label_2(x):
    if x.sum() <= 10:
        return np.array(['SMALL'], dtype='|S5')
    else:
        return np.array(['BIG'], dtype='|S5')
arr = np.array([[30, 40], [1, 2]])
print(np.apply_along_axis(get_label_2, 1, arr).reshape(arr.shape[0]))
Do you know more elegant solutions for this problem?
You can use np.where:
arr1 = np.array([[1, 2], [30, 40]])
arr2 = np.array([[30, 40], [1, 2]])
print(np.where(arr1.sum(axis=1)<=10,'SMALL','BIG'))
print(np.where(arr2.sum(axis=1)<=10,'SMALL','BIG'))
['SMALL' 'BIG']
['BIG' 'SMALL']
In a function:
def get_label(x, threshold, axis=1, label1='SMALL', label2='BIG'):
    return np.where(x.sum(axis=axis) <= threshold, label1, label2)
apply_along_axis is not an elegant solution; it's convenient, but not fast. Essentially it does
In [277]: get_label = lambda x: 'SMALL' if x.sum() <= 10 else 'BIG'
In [280]: res = np.zeros((2,),dtype='S5')
In [281]: arr = np.array([[30,40],[1,2]])
In [282]: for i in range(2):
     ...:     res[i] = get_label(arr[i,:])
     ...:
In [283]: res
Out[283]:
array([b'BIG', b'SMALL'],
dtype='|S5')
except it generalizes the shape and deduces the res dtype.
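That dtype deduction is where the truncation in the question comes from. The sketch below imitates it by hand, assuming (as the question observes) that the result array takes its dtype from the first value returned; it is an illustration, not the actual apply_along_axis source:

```python
import numpy as np

# Dtype inferred from the first returned label: a 3-character string.
first = np.asanyarray('BIG')
res = np.zeros((2,), dtype=first.dtype)

res[0] = 'BIG'
res[1] = 'SMALL'     # does not fit in 3 characters, silently cut short

# With an explicit 5-character dtype nothing is lost.
fixed = np.zeros((2,), dtype='U5')
fixed[0] = 'BIG'
fixed[1] = 'SMALL'

print(res, fixed)
```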
With a simple 'iterate over rows' case like this you could just as well do:
In [278]: np.array([get_label(row) for row in np.array([[1,2],[30,40]])])
Out[278]:
array(['SMALL', 'BIG'],
dtype='<U5')
In [279]: np.array([get_label(row) for row in np.array([[30,40],[1,2]])])
Out[279]:
array(['BIG', 'SMALL'],
dtype='<U5')
The elegant solution is to avoid the Python level loops, explicit or hidden, using instead compiled array methods like giving sum an axis:
In [284]: arr.sum(axis=1)
Out[284]: array([70, 3])
Given a numpy array of size (n,), how do you transform it to a numpy array of size (n,1)?
The reason is that I am trying to matrix multiply two numpy arrays of size (n,) and (,n) to get an (n,n) result, but when I do:
numpy.dot(a,b.T)
it says that you can't do it. I know for a fact that transposing a (n,) array does nothing, so it would be nice to change the (n,) arrays to (n,1) and avoid this problem altogether.
Use reshape(-1,1) to reshape (n,) to (n,1); see the detailed example:
In [1]:
import numpy as np
A=np.random.random(10)
In [2]:
A.shape
Out[2]:
(10,)
In [3]:
A1=A.reshape(-1,1)
In [4]:
A1.shape
Out[4]:
(10, 1)
In [5]:
A.T
Out[5]:
array([ 0.6014423 , 0.51400033, 0.95006413, 0.54321892, 0.2150995 ,
0.09486603, 0.54560678, 0.58036358, 0.99914564, 0.09245124])
In [6]:
A1.T
Out[6]:
array([[ 0.6014423 , 0.51400033, 0.95006413, 0.54321892, 0.2150995 ,
0.09486603, 0.54560678, 0.58036358, 0.99914564, 0.09245124]])
You can use None for dimensions that you want to be treated as degenerate.
a = np.asarray([1,2,3])
a[:]
a[:, None]
In [48]: a
Out[48]: array([1, 2, 3])
In [49]: a[:]
Out[49]: array([1, 2, 3])
In [50]: a[:, None]
Out[50]:
array([[1],
[2],
[3]])
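Putting this together for the (n,n) product the question asks about: one array becomes a column, the other a row, and np.dot multiplies them. A small sketch with made-up values; np.outer is shown only as a cross-check and is not mentioned in the answers above:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

col = a.reshape(-1, 1)        # shape (3, 1)
row = b[None, :]              # shape (1, 3)
product = np.dot(col, row)    # shape (3, 3)

# np.outer builds the same matrix directly from the two 1d arrays.
matches_outer = np.array_equal(product, np.outer(a, b))
print(product.shape, matches_outer)
```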