Removing multiple values from ndarray at random - python

I need to remove 11 randomly chosen rows (samples) from a NumPy array of shape (5891, 10) so that the remaining 5880 rows can be reshaped into a 3D array of shape (-1, 6, 10), i.e. with a second dimension of 6. Any help would be appreciated.
array([[-0.0296606 , -0.86639415, 1.31166578, ..., -0.56398655,
-0.62098712, -0.60561292],
[-0.08361501, -0.8338129 , 1.59085632, ..., -0.44607017,
-0.51810143, -0.73432292],
[-0.56023046, -0.90793786, 1.70571559, ..., -0.53988458,
0.16418027, -0.62065893],
...,
[ 0.08385978, -0.85598757, 2.09466405, ..., -0.53553566,
-0.41929891, -0.67636976],
[-0.1878731 , -0.8483329 , 1.93933521, ..., -0.66563641,
-0.43016374, -0.63886954],
[-0.06811212, -0.9358068 , 0.99574035, ..., -0.62080424,
-0.1695455 , -0.8211152 ]])

import numpy as np

arr = np.random.random((5891, 10))
# set a static seed if you want reproducibility of the choices
rng = np.random.default_rng(seed=42)
# choose all but 11 rows (by default the chosen rows come back in shuffled order)
chosen = rng.choice(arr, size=arr.shape[0] - 11, replace=False, axis=0)
# and reshape: 5880 rows -> (980, 6, 10)
out = chosen.reshape((-1, 6, 10))
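If the original row order matters, a minimal sketch under that assumption is to draw the 11 row indices to drop and remove them with np.delete, which keeps the surviving rows in their original order. The seed and the random data are illustrative only.

```python
import numpy as np

arr = np.random.random((5891, 10))
rng = np.random.default_rng(seed=42)  # fixed seed only for reproducibility

n_drop = 11
# Draw 11 distinct row indices to remove
drop_idx = rng.choice(arr.shape[0], size=n_drop, replace=False)

# np.delete removes those rows and keeps the rest in their original order
kept = np.delete(arr, drop_idx, axis=0)
out = kept.reshape((-1, 6, 10))
print(out.shape)  # (980, 6, 10)
```

Note that 5891 % 6 == 5, so dropping only 5 rows would already make the row count divisible by 6; dropping 11 also works because 5880 = 6 * 980.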

Related

How to make a ufunc output a matrix given two array_like operands (instead of trying to broadcast them)?

I would like to get a matrix of values given two ndarrays from a ufunc, for example:
import numpy
import scipy.special
degs = numpy.array(range(5))
pnts = numpy.array([0.0, 0.1, 0.2])
values = scipy.special.eval_chebyt(degs, pnts)
The above code doesn't work (it gives a ValueError because it tries to broadcast two arrays and fails since they have different shapes: (5,) and (3,)); I would like to get a matrix of values with rows corresponding to degrees and columns to points at which polynomials are evaluated (or vice versa, it doesn't matter).
Currently my workaround is simply to use a for loop:
values = numpy.zeros((5, 3))
for j in range(5):
    values[j] = scipy.special.eval_chebyt(j, pnts)
Is there a way to do that? In general, how would you let a ufunc know you want an n-dimensional array if you have n array_like arguments?
I know about numpy.vectorize, but that seems neither faster nor more elegant than a simple for loop (and I'm not even sure you can apply it to an existing ufunc).
UPDATE: What about ufuncs that take 3 or more parameters, for example scipy.special.eval_jacobi? Trying the outer method gives ValueError: outer product only supported for binary functions.
What you need is exactly the outer method of ufuncs:
ufunc.outer(A, B, **kwargs)
Apply the ufunc op to all pairs (a, b) with a in A and b in B.
values = scipy.special.eval_chebyt.outer(degs, pnts)
#array([[ 1. , 1. , 1. ],
# [ 0. , 0.1 , 0.2 ],
# [-1. , -0.98 , -0.92 ],
# [-0. , -0.296 , -0.568 ],
# [ 1. , 0.9208, 0.6928]])
UPDATE
For more parameters, you must broadcast by hand. meshgrid often helps with that, spanning each parameter along its own dimension. For example:
n=3
alpha = numpy.array(range(5))
beta = numpy.array(range(3))
x = numpy.array(range(2))
data = numpy.meshgrid(n,alpha,beta,x)
values = scipy.special.eval_jacobi(*data)
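An alternative to meshgrid, sketched here under the assumption that you want one output axis per parameter in argument order, is np.ix_, which reshapes each 1D argument so that they broadcast against each other without materializing full grids. The parameter ranges below are illustrative.

```python
import numpy as np
import scipy.special

n = np.arange(3)            # degrees (integers)
alpha = np.arange(5.0)
beta = np.arange(3.0)
x = np.array([0.0, 0.5])

# np.ix_ returns arrays shaped (3,1,1,1), (1,5,1,1), (1,1,3,1), (1,1,1,2)
grids = np.ix_(n, alpha, beta, x)

# Broadcasting these gives one axis per parameter: (n, alpha, beta, x)
values = scipy.special.eval_jacobi(*grids)
print(values.shape)  # (3, 5, 3, 2)
```

With meshgrid, note that the default indexing='xy' swaps the first two output axes; passing indexing='ij' keeps them in argument order.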
Reshape the input arguments for broadcasting. In this case, change the shape of degs to be (5, 1) instead of just (5,). The shape (5, 1) broadcast with the shape (3,) results in the shape (5, 3):
In [185]: import numpy as np
In [186]: import scipy.special
In [187]: degs = np.arange(5).reshape(-1, 1) # degs has shape (5, 1)
In [188]: pnts = np.array([0.0, 0.1, 0.2])
In [189]: values = scipy.special.eval_chebyt(degs, pnts)
In [190]: values
Out[190]:
array([[ 1. , 1. , 1. ],
[ 0. , 0.1 , 0.2 ],
[-1. , -0.98 , -0.92 ],
[-0. , -0.296 , -0.568 ],
[ 1. , 0.9208, 0.6928]])

How to split a 3D matrix into 2D matrices lined up in a list?

I have a NumPy array with the following shape:
(1532, 2036, 5)
I would like to generate a list of arrays where each one has the following shape:
(1532, 2036)
You can use Ellipsis to signify all dimensions up to the last. For example:
arr = np.random.rand(4, 3, 2)
arr
array([[[ 0.35235813, 0.57984153],
[ 0.53743048, 0.46753367],
[ 0.80048303, 0.07982378]],
[[ 0.1339381 , 0.84586721],
[ 0.81425027, 0.41086151],
[ 0.34039991, 0.19972737]],
[[ 0.2112466 , 0.73086434],
[ 0.03755819, 0.40113463],
[ 0.74622891, 0.74695994]],
[[ 0.99313615, 0.65634951],
[ 0.90787642, 0.37387861],
[ 0.8738962 , 0.41747727]]])
The list of arrays along the last dimension can be constructed as mentioned in another answer, or with Ellipsis like so:
[arr[..., i] for i in range(arr.shape[-1])]
[array([[ 0.35235813, 0.53743048, 0.80048303],
[ 0.1339381 , 0.81425027, 0.34039991],
[ 0.2112466 , 0.03755819, 0.74622891],
[ 0.99313615, 0.90787642, 0.8738962 ]]),
array([[ 0.57984153, 0.46753367, 0.07982378],
[ 0.84586721, 0.41086151, 0.19972737],
[ 0.73086434, 0.40113463, 0.74695994],
[ 0.65634951, 0.37387861, 0.41747727]])]
Each element has the shape (4, 3).
Likewise, you could do the same for the first dimension, making four (3, 2) arrays.
[arr[i, ...] for i in range(arr.shape[0])]
[array([[ 0.35235813, 0.57984153],
[ 0.53743048, 0.46753367],
[ 0.80048303, 0.07982378]]), array([[ 0.1339381 , 0.84586721],
[ 0.81425027, 0.41086151],
[ 0.34039991, 0.19972737]]), array([[ 0.2112466 , 0.73086434],
[ 0.03755819, 0.40113463],
[ 0.74622891, 0.74695994]]), array([[ 0.99313615, 0.65634951],
[ 0.90787642, 0.37387861],
[ 0.8738962 , 0.41747727]])]
You can also permute the axes with numpy.transpose, then simply iterate through the result:
import numpy as np
a = ... # Define the input array here
out = [m for m in np.transpose(a, (2, 0, 1))]
You can slice the 3D array using
[x[:,:,i] for i in range(5)]
where x is your (1532, 2036, 5) array. This gives a list of five 2D arrays, each of shape (1532, 2036).
The same approach extends to arrays with more dimensions.
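A compact variant, sketched on a small random stand-in for the (1532, 2036, 5) array, is to move the last axis to the front with np.moveaxis and then iterate over the first axis. The resulting 2D arrays are views into the original data, not copies.

```python
import numpy as np

# Small stand-in for the (1532, 2036, 5) array from the question
arr = np.random.rand(4, 3, 5)

# Move the last axis to the front; iterating over axis 0 then yields
# one 2D array per channel (views, no data is copied)
planes = list(np.moveaxis(arr, -1, 0))

# Equivalent Ellipsis form from the answer above
planes_ellipsis = [arr[..., i] for i in range(arr.shape[-1])]

print(len(planes), planes[0].shape)  # 5 (4, 3)
```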

Create 3D array from multiple 2D arrays

I have two monthly gridded datasets which I want to compare later.
The input looks like this for both datasets, and that is also the format I want for the output.
In[4]: data1.shape
Out[4]: (444, 72, 144)
In[5]: gfz.shape
Out[5]: (155, 72, 144)
In[6]: data1
Out[6]:
array([[[ 0.98412287, 0.96739882, 0.91172796, ..., 1.12651634,
1.0682013 , 1.07681048],
[ 1.47803092, 1.44721365, 1.49585509, ..., 1.58934438,
1.66956687, 1.57198083],
[ 0.68730044, 0.76112831, 0.78218687, ..., 0.92582172,
1.07873237, 0.87490368],
...,
[ 1.00752461, 1.00758123, 0.99440521, ..., 0.94128627,
0.88981551, 0.93984401],
[ 1.03467119, 1.02640462, 0.91580886, ..., 0.88302392,
0.99204206, 0.96396238],
[ 0.8280431 , 0.82936555, 0.82637453, ..., 0.92009377,
0.77890259, 0.81065702]],
...,
[[-0.12173297, -0.06624345, -0.02809682, ..., -0.04522502,
-0.11502996, -0.22779272],
[-0.61080372, -0.61958522, -0.52239478, ..., -0.6775983 ,
-0.79460669, -0.70022893],
[-0.12011283, -0.10849079, 0.096185 , ..., -0.45782232,
-0.39763898, -0.31247514],
...,
[ 0.90601307, 0.88580155, 0.90268403, ..., 0.86414611,
0.87041426, 0.86274058],
[ 1.46445823, 1.31938004, 1.37585044, ..., 1.51378822,
1.48515761, 1.49078977],
[ 0.29749078, 0.22273554, 0.27161494, ..., 0.43205476,
0.43777165, 0.36340511]],
[[ 0.41008961, 0.44208974, 0.40928891, ..., 0.45899671,
0.39472976, 0.36803097],
[-0.13514084, -0.17332518, -0.11183424, ..., -0.22284794,
-0.2532815 , -0.15402752],
[ 0.28614867, 0.33750001, 0.48767376, ..., 0.01886483,
0.07220326, 0.17406547],
...,
[ 1.0551219 , 1.09540403, 1.19031584, ..., 1.09203815,
1.07658005, 1.08363533],
[ 1.54310501, 1.49531853, 1.56107259, ..., 1.57243073,
1.5867976 , 1.57728028],
[ 1.1034857 , 0.98658448, 1.14141166, ..., 0.97744882,
1.13562942, 1.08589089]],
[[ 1.02020931, 0.99780071, 0.87209344, ..., 1.11072564,
1.01270151, 0.9222675 ],
[ 0.93467152, 0.81068456, 0.68190312, ..., 0.95696563,
0.84669352, 0.84596157],
[ 0.97022212, 0.94228816, 0.97413743, ..., 1.06613588,
1.08708596, 1.04224277],
...,
[ 1.21519053, 1.23492992, 1.2802881 , ..., 1.33915019,
1.32537413, 1.27963519],
[ 1.32051706, 1.28170252, 1.36266208, ..., 1.29100537,
1.38395023, 1.34622073],
[ 0.86108029, 0.86364979, 0.88489276, ..., 0.81707358,
0.82471925, 0.83550251]]], dtype=float32)
So both have the same spatial resolution of 144x72 but different lengths in time.
As one of them has some missing months, I made sure that only the months where both have data are selected. So for each month contained in both datasets, I fill a two-dimensional array indexed by latitude and longitude. In the end I want a three-dimensional array for data1 and one for data2, both of the same length.
data1_3d = []
data2_3d = []
xy_data1 = [[0 for i in range(len(lons_data1))] for j in range(len(lats_data1))]
xy_data2 = [[0 for i in range(len(lons_data2))] for j in range(len(lats_data2))]
# comparing the time steps
for i in range(len(time_data1)):
    for j in range(len(time_data2)):
        if time_data1.year[i] == time_data2[j].year and time_data1[i].month == time_data2[j].month:
            # loop for data1 which writes the data into a 2D array
            for x in range(len(lats_data1)):
                for y in range(len(lons_data1)):
                    xy_data1[x][y] = data1[j,0,x,y]
            # append to get an array of arrays
            xy_data1 = np.squeeze(np.asarray(xy_data1))
            data1_3d = np.append(data1_3d, [xy_data1])
            # loop for data2 which writes the data into a 2D array
            for x in range(len(lats_data2)):
                for y in range(len(lons_data2)):
                    xy_data2[x][y] = data2[i,x,y]
            # append to get an array of arrays
            xy_data2 = np.squeeze(np.asarray(xy_data2))
            data2_3d = np.append(data2_3d, [xy_data2])
The script runs without an error; however, I only get a really long 1D array.
In[3]: data1_3d
Out[3]: array([ nan, nan, nan, ..., 0.81707358,
0.82471925, 0.83550251])
How can I arrange it to a three dimensional array?
I got it working with the following.
I defined the three-dimensional array with fixed latitude and longitude dimensions and an initially empty (length 0) time axis:
temp_data1 = np.zeros((0, len(lats_data1), len(lons_data1)))
Then I appended the two-dimensional outputs along the time axis:
temp_data1 = np.append(temp_data1, xy_data1[np.newaxis, :, :], axis=0)
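The flat result in the question comes from np.append without axis=, which ravels its inputs. A loop-free sketch, assuming the time axes can be expressed as NumPy datetime64[M] values (the dates and random data below are hypothetical), finds the common months with np.intersect1d(..., return_indices=True) (available since NumPy 1.15) and selects them with fancy indexing along the time axis:

```python
import numpy as np

# Hypothetical monthly time axes; the second series is missing some months
time1 = np.arange('2002-01', '2002-07', dtype='datetime64[M]')  # Jan..Jun
time2 = np.array(['2002-02', '2002-03', '2002-05', '2002-06'], dtype='datetime64[M]')

# Hypothetical gridded data shaped (time, lat, lon)
data1 = np.random.rand(time1.size, 72, 144).astype(np.float32)
data2 = np.random.rand(time2.size, 72, 144).astype(np.float32)

# Months present in both series, plus their positions in each time axis
common, idx1, idx2 = np.intersect1d(time1, time2, return_indices=True)

# Fancy indexing along axis 0 keeps the (lat, lon) structure intact
data1_common = data1[idx1]
data2_common = data2[idx2]
print(data1_common.shape, data2_common.shape)  # (4, 72, 144) (4, 72, 144)

# For comparison: np.append without axis= flattens its inputs,
# which is why the loop in the question produced a long 1D array
flat = np.append([], data1[0])
print(flat.shape)  # (10368,)
```

Each np.append call also copies the whole array, so appending month by month grows quadratically in cost; selecting or stacking once avoids that.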

Assign values from 1d NumPy into classes

If I have a 1d array:
arr = np.array([ 5.243618, 5.219185, 4.755633, 5.685147, 5.2342 , 6.06918 ,
5.324837, 4.857919, 5.768971, 4.310884, 4.442189, 4.883281,
4.591852, 5.8325 , 5.865175, 5.642187, 5.941979, 6.30038 ,
6.475276, 4.598086, 5.822819, 5.938378, 6.271719, 5.465492,
4.230573, 4.331199, 4.912246, 4.878696, 5.393229, 4.857071,
4.95928 , 4.83672 , 5.530075, 4.233449, 5.591468, 4.546228,
4.710242, 4.880406, 4.279519, 4.461141, 6.168588, 6.074305,
5.720245, 6.127273, 5.79335 , 6.176584, 5.04695 , 5.80022 ,
5.899088, 5.925466, 5.095225, 6.33216 , 6.335905, 3.918357,
4.703728, 4.605504, 5.216878, 6.144148, 4.883721, 5.601009,])
and a list containing upper bounds:
bins = [4.9122459999999997, 5.3932289999999998, 5.7202450000000002, 6.0743049999999998, 6.475276]
I'd like to return an array of equal size to arr, containing the bin number for each value (1, 1, 0, 2, 1, 3, 1 etc.)
I've tried np.split() with the bins (patently wrong), but I can't find a simple method to do this.
You can use np.digitize to bin your data:
np.digitize(arr, bins)
The output contains the index of the bin that each data point belongs to. See the np.digitize documentation for details.
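Because bins here are upper bounds, and several values in arr equal a bound exactly (e.g. 4.912246 and 6.475276), the right= parameter matters. For increasing bins, the default returns i with bins[i-1] <= x < bins[i], so a value equal to an upper bound lands in the next bin; right=True uses bins[i-1] < x <= bins[i] instead. A small sketch using a few values from the question:

```python
import numpy as np

# Upper bounds from the question, written with the digits shown in arr
bins = [4.912246, 5.393229, 5.720245, 6.074305, 6.475276]

# A few values from arr, including two that equal an upper bound exactly
vals = np.array([5.243618, 4.755633, 5.685147, 6.06918, 4.912246, 6.475276])

# Default (right=False): a value equal to a bound goes to the next bin
default = np.digitize(vals, bins)

# right=True: each bound is treated as inclusive
inclusive = np.digitize(vals, bins, right=True)

print(default)    # [1 0 2 3 1 5]
print(inclusive)  # [1 0 2 3 0 4]
```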

Force numpy to keep a list a list

x2_Kaxs is an Nx3 numpy array of lists, and the elements in those lists index into another array. I want to end up with an Nx3 numpy array of lists of those indexed elements.
x2_Kcids = array([ ax2_cid[axs] for axs in x2_Kaxs.flat ], dtype=object)
This outputs a (N*3)x1 array of NumPy arrays. Great, that almost works for what I want. All I need to do is reshape it.
x2_Kcids.shape = x2_Kaxs.shape
And this works: x2_Kcids becomes an Nx3 array of NumPy arrays. Perfect.
Except when all the lists in x2_Kaxs have only one element. Then NumPy flattens it into an Nx3 array of integers, and my code expects a list later in the pipeline.
One solution I came up with was to append a dummy element and then pop it off, but that is very ugly. Is there anything nicer?
Your problem is not really about lists of size 1; it is about lists that are all the same size. I have created these dummy samples:
import random
import numpy as np

ax2_cid = np.random.rand(10)
shape = (10, 3)
# lists of random length (1 to 5)
x2_Kaxs = np.empty((10, 3), dtype=object).reshape(-1)
for j in range(x2_Kaxs.size):
    x2_Kaxs[j] = [random.randint(0, 9) for k in range(random.randint(1, 5))]
x2_Kaxs.shape = shape
# lists all of length 1
x2_Kaxs_1 = np.empty((10, 3), dtype=object).reshape(-1)
for j in range(x2_Kaxs_1.size):
    x2_Kaxs_1[j] = [random.randint(0, 9)]
x2_Kaxs_1.shape = shape
# lists all of length 2
x2_Kaxs_2 = np.empty((10, 3), dtype=object).reshape(-1)
for j in range(x2_Kaxs_2.size):
    x2_Kaxs_2[j] = [random.randint(0, 9) for k in range(2)]
x2_Kaxs_2.shape = shape
If we run your code on these three, the return has the following shapes:
>>> np.array([ax2_cid[axs] for axs in x2_Kaxs.flat], dtype=object).shape
(30,)
>>> np.array([ax2_cid[axs] for axs in x2_Kaxs_1.flat], dtype=object).shape
(30, 1)
>>> np.array([ax2_cid[axs] for axs in x2_Kaxs_2.flat], dtype=object).shape
(30, 2)
And the case with all lists of length 2 won't even let you reshape to (n, 3). The problem is that, even with dtype=object, numpy tries to numpify your input as much as possible, which is all the way down to individual elements if all lists are of the same length. I think that your best bet is to preallocate your x2_Kcids array:
x2_Kcids = np.empty_like(x2_Kaxs).reshape(-1)
shape = x2_Kaxs.shape
x2_Kcids[:] = [ax2_cid[axs] for axs in x2_Kaxs.flat]
x2_Kcids.shape = shape
EDIT Since unubtu's answer is no longer visible, I am going to steal from him. The code above can be much more nicely and compactly written as:
x2_Kcids = np.empty_like(x2_Kaxs)
x2_Kcids.ravel()[:] = [ax2_cid[axs] for axs in x2_Kaxs.flat]
With the above example of single item lists:
>>> x2_Kcids_1 = np.empty_like(x2_Kaxs_1).reshape(-1)
>>> x2_Kcids_1[:] = [ax2_cid[axs] for axs in x2_Kaxs_1.flat]
>>> x2_Kcids_1.shape = shape
>>> x2_Kcids_1
array([[[ 0.37685372], [ 0.95328117], [ 0.63840868]],
[[ 0.43009678], [ 0.02069558], [ 0.32455781]],
[[ 0.32455781], [ 0.37685372], [ 0.09777559]],
[[ 0.09777559], [ 0.37685372], [ 0.32455781]],
[[ 0.02069558], [ 0.02069558], [ 0.43009678]],
[[ 0.32455781], [ 0.63840868], [ 0.37685372]],
[[ 0.63840868], [ 0.43009678], [ 0.25532799]],
[[ 0.02069558], [ 0.32455781], [ 0.09777559]],
[[ 0.43009678], [ 0.37685372], [ 0.63840868]],
[[ 0.02069558], [ 0.17876822], [ 0.17876822]]], dtype=object)
>>> x2_Kcids_1[0, 0]
array([ 0.37685372])
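A self-contained Python 3 sketch of the preallocation approach for the all-length-1 case, with a hypothetical lookup array and random indices, contrasting it with the naive construction:

```python
import numpy as np

rng = np.random.default_rng(0)
ax2_cid = rng.random(10)   # hypothetical lookup array

# Hypothetical (10, 3) object array where every cell holds a one-element list
x2_Kaxs = np.empty((10, 3), dtype=object)
for idx in np.ndindex(x2_Kaxs.shape):
    x2_Kaxs[idx] = [int(rng.integers(0, 10))]

# Naive construction: NumPy merges the equal-length arrays into a (30, 1) block
naive = np.array([ax2_cid[axs] for axs in x2_Kaxs.flat], dtype=object)
print(naive.shape)  # (30, 1)

# Preallocate, then fill through a flat view so each cell keeps its own array
x2_Kcids = np.empty_like(x2_Kaxs)
x2_Kcids.ravel()[:] = [ax2_cid[axs] for axs in x2_Kaxs.flat]
print(x2_Kcids.shape)  # (10, 3)
```

Assigning into an existing object array only discovers as many dimensions as the destination has, which is why the one-element arrays survive as individual cells.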
Similar to the suggestion by @Denis:
if x.ndim == 2:
    x.shape += (1,)
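A small sketch of what that trailing axis does, assuming x is the collapsed (N, 3) numeric result from the all-length-1 case (random data here is hypothetical). Each cell then indexes as a one-element array; note this yields a regular numeric array, not an object array, so it only applies when every list has exactly one element.

```python
import numpy as np

x = np.random.rand(10, 3)   # hypothetical collapsed result, shape (N, 3)

if x.ndim == 2:
    # Same effect as x.shape += (1,), but returns a view instead of mutating
    x = x[..., np.newaxis]

print(x.shape)        # (10, 3, 1)
print(x[0, 0].shape)  # (1,)
```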
