numpy.append how to use - python

I would like to append to my numpy array in a loop. In the beginning my numpy array is empty:
x = np.array([])
I would like to append a 3-element array to x in order to get an Mx3 matrix, but my array only grows in one dimension... What's wrong?
In [166]: x = np.array([])
In [167]: a
Out[167]: array([248, 249, 250])
In [168]: x = np.append(x,a, axis=0)
In [169]: x
Out[169]: array([ 248., 249., 250.])
In [170]: x = np.append(x,a, axis=0)
In [171]: x
Out[171]: array([ 248., 249., 250., 248., 249., 250.])

Use vstack:
In [51]: x = np.array([])
In [52]: a= np.array([248, 249, 250])
In [53]: x = np.append(x,a, axis=0)
In [54]: np.vstack((x,a))
Out[54]:
array([[ 248., 249., 250.],
[ 248., 249., 250.]])
I'm not sure how you are using this, but I doubt you need np.append(x, a, axis=0) at all. Just set x = a, then vstack.

What's wrong is that your initial x is one-dimensional. See:
z = np.array([])
z.shape
# (0,)
np.ndim(z)
# 1
So if you np.append to x, you will always end up with a one-dimensional array, i.e. a vector. Note that a one-dimensional NumPy array is neither a row nor a column vector, although it broadcasts like a row against a 2-D array.
To use np.append, you could start with a 2-D array, as below. Also, the array you append must have the same number of dimensions as the array you append to.
import numpy as np
z = np.array([]).reshape((0,3))
a = np.array([248, 249, 250])
a2d = a.reshape(1, 3)
# equivalent ways to get a (1, 3) array:
# a2d = np.atleast_2d(a)
# a2d = a[None, :]
# a2d = a[np.newaxis, :]
z = np.append(z, a2d, axis=0)
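A minimal sketch of the loop from the question, assuming each iteration produces a 3-element row (the rows below are placeholder data). Starting from a (0, 3) array, np.append with axis=0 grows the result row by row; collecting rows in a Python list and converting once at the end gives the same matrix and is usually cheaper, because every np.append copies the whole array.

```python
import numpy as np

# Placeholder rows standing in for whatever the loop produces.
rows = [np.array([248, 249, 250]), np.array([251, 252, 253])]

# Approach 1: start with a 2-D array that has zero rows and 3 columns.
z = np.empty((0, 3))
for a in rows:
    # a[None, :] turns the (3,) row into a (1, 3) array so ndim matches z.
    z = np.append(z, a[None, :], axis=0)

# Approach 2: collect rows in a Python list, then build the array once.
collected = []
for a in rows:
    collected.append(a)
m = np.array(collected)

print(z.shape)  # (2, 3)
print(m.shape)  # (2, 3)
```

Note that z is float64 (the dtype of np.empty) while m keeps the integer dtype of the rows.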

Related

Best way to expand_dim and repeat a numpy array

I have an array that is of shape (1,5).
arr = np.arange(5)
arr = np.expand_dims(arr, axis=0)
I want to make an array of shape (1,4,5) with each value divided by 4. Is there another way to do this besides the solution below?
arr2 = np.expand_dims(arr, axis=1)
arr2 = np.repeat(arr2, 4, axis=1)
arr2 = arr2*0.25
You can use np.broadcast_to:
arr2 = np.broadcast_to(arr, shape=(1, 4, 5))
arr2 = arr2 * 0.25
Or, as suggested in the comments:
arr2 = np.broadcast_to(arr * 0.25, shape=(1, 4, 5))
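A quick check, as a sketch, that the broadcast_to version matches the repeat-based solution from the question. One thing worth knowing: np.broadcast_to returns a read-only view rather than a new array, so call .copy() on it if you need to write into the result.

```python
import numpy as np

arr = np.expand_dims(np.arange(5), axis=0)  # shape (1, 5)

# Approach from the question: insert an axis, repeat along it, then scale.
arr_repeat = np.repeat(np.expand_dims(arr, axis=1), 4, axis=1) * 0.25

# Scale the 5 values first; broadcast_to then returns a read-only
# (1, 4, 5) view of them without copying any data.
arr_bcast = np.broadcast_to(arr * 0.25, shape=(1, 4, 5))

print(arr_repeat.shape, arr_bcast.shape)  # (1, 4, 5) (1, 4, 5)
print(np.array_equal(arr_repeat, arr_bcast))  # True
print(arr_bcast.flags.writeable)  # False
```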

How can I manipulate a numpy array without nested loops?

If I have an MxN numpy array named arr, I wish to iterate over all elements and adjust the values like so:
for m in range(arr.shape[0]):
    for n in range(arr.shape[1]):
        arr[m, n] += x**2 * np.cos(m) * np.sin(n)
where x is a random float.
Is there a way to broadcast this over the entire array without a loop, and thus speed up the run time?
You are just adding zeros, because sin(2*pi*k) = 0 for integer k.
However, if you want to vectorize this, the function np.meshgrid could help you.
Check the following example, where I removed the 2 pi in the trigonometric functions so that the added values are nonzero.
import numpy as np
x = 2
arr = np.arange(12, dtype=float).reshape(4, 3)
n, m = np.meshgrid(np.arange(arr.shape[1]), np.arange(arr.shape[0]), sparse=True)
arr += x**2 * np.cos(m) * np.sin(n)
arr
Edit: use the sparse argument to reduce memory consumption.
You can build the values with a nested list comprehension and convert the result to a two-dimensional array:
import numpy as np
from random import random
x = random()
n, m = 10, 20
arr = np.array([[x**2 * np.cos(2*np.pi*j) * np.sin(2*np.pi*i) for j in range(m)] for i in range(n)])
In [156]: arr = np.ones((2, 3))
Replace the range with arange:
In [157]: m, n = np.arange(arr.shape[0]), np.arange(arr.shape[1])
And change the first array to (2,1) shape. A (2,1) array broadcasts with a (3,) to produce a (2,3) result.
In [158]: A = 0.23**2 * np.cos(m[:, None]) * np.sin(n)
In [159]: A
Out[159]:
array([[0. , 0.04451382, 0.04810183],
[0. , 0.02405092, 0.02598953]])
In [160]: arr + A
Out[160]:
array([[1. , 1.04451382, 1.04810183],
[1. , 1.02405092, 1.02598953]])
The meshgrid suggested in the accepted answer does the same thing:
In [161]: np.meshgrid(m, n, sparse=True, indexing="ij")
Out[161]:
[array([[0],
[1]]),
array([[0, 1, 2]])]
This broadcasting may be clearer with:
In [162]: m, n
Out[162]: (array([0, 1]), array([0, 1, 2]))
In [163]: m[:, None] * 10 + n
Out[163]:
array([[ 0, 1, 2],
[10, 11, 12]])
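To confirm that broadcasting reproduces the nested loop from the question, here is a small sketch comparing the two on a 4x3 array of ones. The array size and the seeded random generator are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random()  # a random float, as in the question
arr = np.ones((4, 3))

# Reference: the nested loop from the question, applied to a copy.
looped = arr.copy()
for m in range(looped.shape[0]):
    for n in range(looped.shape[1]):
        looped[m, n] += x**2 * np.cos(m) * np.sin(n)

# Vectorized: an (M, 1) column broadcasts against an (N,) row to give (M, N).
m = np.arange(arr.shape[0])[:, None]
n = np.arange(arr.shape[1])
vectorized = arr + x**2 * np.cos(m) * np.sin(n)

print(np.allclose(looped, vectorized))  # True
```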

can't reverse reshaped numpy array

I want to undo a reshape by calling reshape again on the array to restore its original dimensions.
I have an array train_X with dimensions (x, y, z), and I reshape it:
train_X_1 = train_X.reshape(train_X.shape[0], train_X.shape[1] * train_X.shape[2])
Then I want to reverse the reshape:
train_X_2 = train_X_1.reshape((train_X.shape[0], train_X.shape[1], train_X.shape[2]))
When I compare them:
print((train_X_2 == train_X).all())
I get False.
What's wrong with my code? Thanks.
Are you just trying this:
In [184]: x = np.arange(24).reshape(2,3,4)
In [185]: x1 = x.reshape(2,12)
In [186]: x2 = x1.reshape(2,3,4)
In [187]: np.allclose(x,x2)
Out[187]: True
What's your dtype? allclose is better for floats.
In [218]: data = np.load('../Downloads/train_X.npy')
In [219]: data.shape
Out[219]: (97848, 20, 2)
In [220]: data.dtype
Out[220]: dtype('float64')
In [221]: data1 = data.reshape(data.shape[0], data.shape[1]*data.shape[2])
In [222]: data1.shape
Out[222]: (97848, 40)
In [223]: data2 = data1.reshape(data.shape)
In [224]: data2.shape
Out[224]: (97848, 20, 2)
In [225]: np.allclose(data, data2)
Out[225]: False
In [226]: np.max(np.abs(data - data2))
Out[226]: nan
In [247]: np.isnan(data).sum()
Out[247]: 2514
In [248]: np.isnan(data2).sum()
Out[248]: 2514
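There's your problem: the array contains nan values, and nan never compares equal to anything, not even itself, so == (and allclose with its default settings) reports False. Let's compare without those nan: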
In [251]: np.allclose(np.nan_to_num(data),np.nan_to_num(data2))
Out[251]: True
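Instead of replacing nan with nan_to_num, you can ask the comparison to treat matching nan positions as equal. A minimal sketch on a tiny made-up array; np.allclose has an equal_nan parameter, and np.array_equal gained one in NumPy 1.19.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = a.reshape(3, 1).reshape(3)  # reshape there and back

print((a == b).all())  # False, because nan == nan is False
print(np.allclose(a, b))  # False for the same reason
print(np.allclose(a, b, equal_nan=True))  # True
print(np.array_equal(a, b, equal_nan=True))  # True (NumPy >= 1.19)
```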
It sounds like you want to flatten, then reverse, then reshape.
Starting with an array:
import numpy as np
arr = np.arange(6).reshape((2,3)) # [[0, 1, 2], [3, 4, 5]]
We can flatten it into a 1D array using ravel:
arr = arr.ravel() # [0, 1, 2, 3, 4, 5]
We can then reverse the order:
arr = arr[::-1] # [5, 4, 3, 2, 1, 0]
Then we reshape it:
arr.reshape(2,3) # [[5, 4, 3], [2, 1, 0]]
Altogether:
import numpy as np
arr = np.arange(6).reshape((2,3))
arr = arr.ravel()[::-1].reshape(2,3)
print(arr)
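With the default C index order used by ravel and reshape, reversing the flattened array and reshaping it back to the same shape is the same as reversing every axis. A short sketch checking that against slicing and np.flip (which flips all axes when no axis is given):

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))

# Flatten, reverse, reshape (the approach above).
reversed_flat = arr.ravel()[::-1].reshape(arr.shape)

# Reversing both axes directly gives the same result.
flipped = arr[::-1, ::-1]

print(reversed_flat)
# [[5 4 3]
#  [2 1 0]]
print(np.array_equal(reversed_flat, flipped))  # True
print(np.array_equal(reversed_flat, np.flip(arr)))  # True
```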

How to add element to empty 2d numpy array

I'm trying to insert elements into an empty 2D numpy array. However, I am not getting what I want.
I tried np.hstack, but it only gives me a 1-D array. Then I tried using np.append, but it gives me an error.
Error:
ValueError: all the input arrays must have same number of dimensions
import numpy as np
import pandas as pd
# model: the asker's trained classifier (not shown)
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray = np.concatenate((randomReleaseAngle1, randomVelocity1), axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
    data = [[170, 68.2, i[0], i[1]]]
    df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
    test_y_predictions = model.predict(df)
    print(test_y_predictions)
    if (np.any(test_y_predictions == 1)):
        arr = np.hstack((arr, np.array([i[0], i[1]])))
        arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)
print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
[3.3,4.3],
[7.1,7.3],
[3.3,4.3],
[3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]
The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
...: alist.append([i, i+10])
...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
If we start with an empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952],
[ 1, 10],
[ 2, 11]])
Note that empty does not mean the same thing as the list []. It's a real 2x2 array, with 'unspecified' values. And those values remain when we add other arrays to it.
I could start with an array that has a size-0 dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
[ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate makes a new array, so doing it iteratively is more expensive than list append.
np.append just takes two arrays and calls concatenate, so it doesn't add much. hstack tweaks shapes and joins on the 2nd (horizontal) dimension. vstack is another variant. But they all end up using concatenate.
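A small sketch illustrating that point with two (1, 2) arrays: np.append with an axis and np.vstack produce exactly what np.concatenate on axis 0 does, while np.append without an axis flattens its inputs first, and hstack on 2-D inputs joins along axis 1.

```python
import numpy as np

a = np.array([[1, 10]])
b = np.array([[2, 11]])

joined = np.concatenate((a, b), axis=0)
print(np.array_equal(np.append(a, b, axis=0), joined))  # True
print(np.array_equal(np.vstack((a, b)), joined))  # True

# Without axis, np.append flattens both inputs before joining.
print(np.append(a, b).tolist())  # [1, 10, 2, 11]
# hstack on 2-D inputs joins along axis 1.
print(np.hstack((a, b)).tolist())  # [[1, 10, 2, 11]]
```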
With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
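The other method can be done in a similar way, provided arr1 starts out as np.array([]) rather than np.empty((2,2)); otherwise the four uninitialized values of the empty array end up in the result:
arr1 = np.append(arr1, np.array([i[0], i[1]]))  # in the loop, without axis=0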
arr1 = arr1.reshape(-1, 2)
print(arr1)
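Applying the list-append approach to the question's loop looks roughly like the sketch below. The question's model.predict(df) cannot be reproduced here, so a hypothetical accept() function stands in for it (here it keeps rows with velocity above 50); swap in the real prediction check.

```python
import numpy as np

rng = np.random.default_rng(0)
random_array = np.column_stack((
    rng.uniform(20.0, 77.0, size=5),  # release angles
    rng.uniform(40.0, 60.0, size=5),  # velocities
))

def accept(angle, velocity):
    """Hypothetical stand-in for `np.any(model.predict(df) == 1)`."""
    return velocity > 50.0

rows = []
for angle, velocity in random_array:
    if accept(angle, velocity):
        rows.append([angle, velocity])

# Build the 2-D result once; reshape keeps a (0, 2) shape if nothing matched.
result = np.array(rows, dtype=float).reshape(-1, 2)
print(result.shape)
```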

Numpy - Matrix Generation from list comprehension

I have the following list of np.array:
dataset = [np.random.normal(r_mean/(p*t), r_vol/t/np.sqrt(p), n) \
for t in rule]
I want to transform it into a 2D np.array (i.e. a matrix). I could use np.asarray, but (I believe) it would be inefficient.
Also, each np.random.normal(r_mean/(p*t), r_vol/t/np.sqrt(p), n) is meant to be a column of the resulting matrix, not a row (i.e. I'd have to transpose np.asarray(dataset)).
What is the best way of achieving the result?
You can use broadcasting to create dataset with a single call to numpy.random.normal. Instead of using a list comprehension, make rule a numpy array and use it where you have t in your expression, and request a sample with size (n, len(rule)):
In [66]: r_mean = 1.0
In [67]: r_vol = 3.0
In [68]: p = 2.0
In [69]: rule = np.array([1.0, 100.0, 10000.0])
In [70]: n = 8
In [71]: dataset = np.random.normal(r_mean/(p*rule), r_vol/rule/np.sqrt(p), size=(n, len(rule)))
In [72]: dataset
Out[72]:
array([[ 7.44295301e-01, -1.57786106e-03, -1.85518458e-04],
[ -2.37293991e+00, -2.27875859e-02, 3.38182239e-04],
[ 2.01362974e+00, 5.93566418e-02, -3.00178175e-04],
[ 2.52533022e+00, 8.15380813e-03, 1.82511343e-04],
[ 7.32980563e-01, 2.67511372e-02, -1.95965258e-04],
[ 2.91958598e+00, -1.36314059e-02, 2.45200175e-04],
[ -4.43329724e+00, -5.85052629e-02, -1.75796458e-04],
[ -2.45005431e-01, -1.68543495e-02, 1.69715542e-04]])
If you are unsure that the columns correctly match the parameters, we can test a large sample:
In [73]: n = 100000
Create mu and std so we can see the requested means and standard deviations:
In [74]: mu = r_mean/(p*rule)
In [75]: std = r_vol/rule/np.sqrt(p)
Generate the data:
In [76]: dataset = np.random.normal(mu, std, size=(n, len(rule)))
Here's the mu that we requested:
In [77]: mu
Out[77]: array([ 5.00000000e-01, 5.00000000e-03, 5.00000000e-05])
And here's what we got in the sample:
In [78]: dataset.mean(axis=0)
Out[78]: array([ 4.95672937e-01, 5.08624034e-03, 5.02922664e-05])
Here are the desired standard deviations:
In [79]: std
Out[79]: array([ 2.12132034e+00, 2.12132034e-02, 2.12132034e-04])
And here's what we got:
In [80]: dataset.std(axis=0)
Out[80]: array([ 2.11258192e+00, 2.12437161e-02, 2.11784163e-04])
If you already have the list, you can preallocate the matrix and fill it column by column:
ds = np.empty((dataset[0].size, len(dataset)), dtype=dataset[0].dtype)
for i in range(ds.shape[1]):
    ds[:, i] = dataset[i]
But only do that if you must compute the dataset list first.
Otherwise, use a generator:
ds = np.empty((n, len(rule)))
dataset = (np.random.normal(r_mean/(p*t), r_vol/t/np.sqrt(p), n) for t in rule)
for i, d in enumerate(dataset):
    ds[:, i] = d
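If the list already exists, np.column_stack stacks 1-D arrays as the columns of a 2-D array in a single call, which avoids the explicit transpose mentioned in the question. A sketch using the placeholder parameter values from the first answer, checking it against the preallocate-and-fill loop and against np.asarray(dataset).T:

```python
import numpy as np

r_mean, r_vol, p, n = 1.0, 3.0, 2.0, 8
rule = [1.0, 100.0, 10000.0]

dataset = [np.random.normal(r_mean/(p*t), r_vol/t/np.sqrt(p), n) for t in rule]

# Each 1-D array becomes one column of an (n, len(rule)) matrix.
stacked = np.column_stack(dataset)

# Same values as the preallocate-and-fill loop.
ds = np.empty((n, len(rule)))
for i, d in enumerate(dataset):
    ds[:, i] = d

print(stacked.shape)  # (8, 3)
print(np.array_equal(stacked, ds))  # True
print(np.array_equal(stacked, np.asarray(dataset).T))  # True
```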
