Passing multiple slices to a numpy array - python
I have a 2D NumPy array. I want to slice out unequal-length subsets of its columns and put them into a single array, with the remaining values filled with nan. That is to say, in:
data = np.random.normal(size=(100,4))
I want to index from [75, 33, 42, 54] to the end: rows 75 to the end in column 0, rows 33 to the end in column 1, and so on.
I tried data[[slice(75,100),slice(33,100)],:] but it didn't work.
You can do it by creating a mask, with True for the indices you want to be np.nan and False otherwise:
import numpy as np
data = np.random.normal(size=(5,4))
b = np.array([0, 1, 2, 3])
mask = np.arange(len(data))[:, None] < b
data[mask] = np.nan
data
Output:
array([[-0.53306108, nan, nan, nan],
[ 1.32282687, 0.83204007, nan, nan],
[-1.07143908, 0.12972517, -0.4783274 , nan],
[ 0.39686727, -1.20532247, -2.17043218, 0.74859079],
[ 1.82548696, 0.98669461, -1.17961517, -0.7813723 ]])
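Applied to the shapes in the original question (with a seed added here for reproducibility), the same mask idea looks like this:

```python
import numpy as np

np.random.seed(0)  # seed chosen here only for reproducibility
data = np.random.normal(size=(100, 4))
starts = np.array([75, 33, 42, 54])  # per-column start rows from the question

# Broadcasting: rows before each column's start index become NaN
mask = np.arange(len(data))[:, None] < starts
data[mask] = np.nan

print(np.isnan(data).sum(axis=0))  # [75 33 42 54]
```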
To get repeatable results, I seeded the random generator:
np.random.seed(0)
and then created the source array:
data = np.random.normal(size=(100,4))
The list of starting indices I defined as:
ind = [75, 33, 42, 54]
The first step is to compute the maximum length of the output columns:
ml = max([ 100 - i for i in ind ])
Then you can generate the output array as:
result = np.array([ np.pad(data[rInd:, i[0]], (0, ml - 100 + rInd),
constant_values = np.nan) for i, rInd in np.ndenumerate(ind) ]).T
This code:
takes the required slice of the source array (data[rInd:, i[0]]),
pads it with the required number of NaN values,
creates a Numpy array (so far each row contains what the target column should contain),
so the only remaining step is to transpose this array.
The result, for my source data, is:
array([[-1.30652685, 0.03183056, 0.92085882, -0.04225715],
[ 0.66638308, -0.20829876, -1.03424284, 0.48148147],
[ 0.69377315, 0.4393917 , -0.4555325 , 0.23218104],
[-1.12682581, 0.94447949, -0.6436184 , -0.49331988],
[-0.04217145, -0.4615846 , -1.10438334, 0.7811981 ],
[-0.71960439, -0.82643854, -1.29285691, 0.67690804],
[-1.15735526, -1.07993151, 0.52327666, -0.29779088],
[-0.70470028, 1.92953205, 2.16323595, 1.07961859],
[ 0.77325298, 0.84436298, 1.0996596 , -0.57578797],
[-1.75589058, 0.31694261, -0.02432612, 0.69474914],
[ 1.0685094 , -0.65102559, 0.91017891, 0.61037938],
[-0.44092263, -0.68954978, -0.94444626, -0.0525673 ],
[ 0.5785215 , -1.37495129, 2.25930895, 0.08842209],
[ 1.36453185, -1.60205766, -0.46359597, -2.77259276],
[-1.84306955, 1.5430146 , 0.15650654, -0.39095338],
[ 0.69845715, -1.1680935 , -1.42406091, 2.06449286],
[-0.01568211, 0.82350415, -1.15618243, 1.53637705],
[-0.26773354, -0.23937918, 0.42625873, 1.21114529],
[ 0.84163126, -1.61695604, -0.13288058, -0.48102712],
[ 0.64331447, -0.09815039, 1.15233156, 1.13689136],
[-1.69810582, -0.4664191 , 0.52106488, 0.37005589],
[ 0.03863055, 0.37915174, 0.69153875, -0.6801782 ],
[ 1.64813493, -0.34598178, -1.5829384 , -1.34671751],
[-0.35343175, 0.06326199, -0.59631404, 1.07774381],
[ 0.85792392, -0.23792173, 0.52389102, 0.09435159],
[ nan, 0.41605005, 0.39904635, -0.10730528],
[ nan, -2.06998503, -0.65240858, -0.89091508],
[ nan, -0.39727181, -2.03068447, 2.2567235 ],
[ nan, -1.67600381, -0.69204985, -1.18894496],
[ nan, -1.46642433, -1.04525337, 0.60631952],
[ nan, -0.31932842, -0.62808756, 1.6595508 ],
[ nan, -1.38336396, -0.1359497 , -1.2140774 ],
[ nan, -0.50681635, -0.39944903, 0.15670386],
[ nan, 0.1887786 , -0.11816405, -1.43779147],
[ nan, 0.09740017, -1.33425847, -0.52118931],
[ nan, 0.39009332, -0.13370156, 0.6203583 ],
[ nan, -0.11610394, -0.38487981, 0.33996498],
[ nan, 1.02017271, -0.0616264 , -0.39484951],
[ nan, 0.60884383, 0.27451636, -0.99312361],
[ nan, 1.30184623, -0.15766702, 0.49383678],
[ nan, -1.06001582, 0.74718833, 0.88017891],
[ nan, 0.58295368, -2.65917224, -1.02250684],
[ nan, 1.65813068, -0.6840109 , -1.47183501],
[ nan, -0.46071979, -0.68783761, -0.2226751 ],
[ nan, -0.15957344, -0.36469354, -0.76149221],
[ nan, -0.73067775, -0.76414392, 0.85255194],
[ nan, -0.28688719, -0.6522936 , nan],
[ nan, -0.81299299, -0.47965581, nan],
[ nan, -0.31229225, 0.93184837, nan],
[ nan, 0.94326072, -0.19065349, nan],
[ nan, -1.18388064, 0.28044171, nan],
[ nan, 0.45093446, 0.04949498, nan],
[ nan, -0.4533858 , -0.20690368, nan],
[ nan, -0.2803555 , -2.25556423, nan],
[ nan, 0.34965446, -0.98551074, nan],
[ nan, -0.68944918, 0.56729028, nan],
[ nan, -0.477974 , -0.29183736, nan],
[ nan, 0.00377089, 1.46657872, nan],
[ nan, 0.16092817, nan, nan],
[ nan, -1.12801133, nan, nan],
[ nan, -0.24945858, nan, nan],
[ nan, -1.57062341, nan, nan],
[ nan, 0.38728048, nan, nan],
[ nan, -1.6567151 , nan, nan],
[ nan, 0.16422776, nan, nan],
[ nan, -1.61647419, nan, nan],
[ nan, 1.14110187, nan, nan]])
Note that the above code contains i[0] because np.ndenumerate yields a tuple of indices
as the first element of each result.
Since ind is a 1-D array, we are interested in the first index only,
so after i I put [0].
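As a side note, the same padding approach can be written with plain enumerate, which yields bare integers and so avoids the i[0] unpacking (a sketch, using the same seed and start indices as above):

```python
import numpy as np

np.random.seed(0)
data = np.random.normal(size=(100, 4))
ind = [75, 33, 42, 54]
ml = max(100 - i for i in ind)  # length of the longest output column

# enumerate yields plain integers, so no tuple unpacking is needed
result = np.array([np.pad(data[r:, i], (0, ml - (100 - r)),
                          constant_values=np.nan)
                   for i, r in enumerate(ind)]).T
print(result.shape)  # (67, 4)
```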
This one works for me:
data = np.random.normal(size=(100,4))
slices_values = [75, 33, 42, 54] # You name your slices here
slices = [] # In this list you will keep the slices
for i in slices_values:
    x = slice(i, 100)
    slices.append(data[x])
Now you can confirm the shape of each slice:
slices[0].shape # (25, 4)
slices[1].shape # (67, 4)
slices[2].shape # (58, 4)
slices[3].shape # (46, 4)
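As a self-contained sketch, the loop above amounts to the following. Note that each element of slices is a 2-D block of whole rows, not a single column, so this does not by itself build the NaN-padded array the question asks for:

```python
import numpy as np

data = np.random.normal(size=(100, 4))
slices_values = [75, 33, 42, 54]

# each entry keeps all 4 columns of the sliced rows
slices = [data[slice(i, 100)] for i in slices_values]

print([s.shape for s in slices])  # [(25, 4), (67, 4), (58, 4), (46, 4)]
```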
It is difficult to tell what the expected result should be, but IIUC, one way is to create an array and fill it in a loop:
data = np.random.normal(size=(5, 4))
ids = np.array([2, 1, 2, 3])
def test(data, ids):
    arr = np.empty_like(data)
    for i, j in enumerate(ids):
        arr[:j, i] = data[:j, i]
        arr[j:, i] = np.nan
    return arr
res = test(data, ids)
# [[ 0.1768507210788626 2.3777541249700573 0.998732857053734 -1.3101507969798436 ]
# [ 0.18018992116935298 nan -1.443125868756967 -1.3992855573400653 ]
# [ nan nan nan -0.2319322879433409 ]
# [ nan nan nan nan]
# [ nan nan nan nan]]
or:
def test(data, ids):
    arr = np.empty_like(data)
    for i, j in enumerate(ids):
        arr[:j, i] = np.nan
        arr[j:, i] = data[j:, i]
    return arr
# [[ nan nan nan nan]
# [ nan -1.7647540193678475 nan nan]
# [ 0.8203539992532282 1.2952993197746814 0.9421974218807785 nan]
# [-0.6313979666045816 -0.6421770233773478 -0.3816716009896775 -1.7634440039930654]
# [ 1.611668212682313 -0.878108388861928 -0.4985770669099582 0.9072434022928676]]
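The two variants are complements of each other: every cell is NaN in exactly one of the two results. A sketch that checks this (the function names keep_head/keep_tail are mine, not from the answer):

```python
import numpy as np

def keep_head(data, ids):
    # first variant: keep rows above ids[i] in column i, NaN below
    arr = np.empty_like(data)
    for i, j in enumerate(ids):
        arr[:j, i] = data[:j, i]
        arr[j:, i] = np.nan
    return arr

def keep_tail(data, ids):
    # second variant: keep rows from ids[i] down in column i, NaN above
    arr = np.empty_like(data)
    for i, j in enumerate(ids):
        arr[:j, i] = np.nan
        arr[j:, i] = data[j:, i]
    return arr

data = np.random.normal(size=(5, 4))
ids = np.array([2, 1, 2, 3])
head, tail = keep_head(data, ids), keep_tail(data, ids)
print(np.all(np.isnan(head) ^ np.isnan(tail)))  # True
```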
Related
Randomly replace 50% values of one array with 50% of randomly selected values from a second array in Python
I am completely new to Python and coding, and I am stuck trying to replace randomly selected values from one array with values from a second array. My data are extracted from two Iris cubes and consist of LAT and LON data. After loading the two cubes, I can extract the data from the two observation datasets of latitude and longitude, say "obs_1" and "obs_2", with shape (475, 635):

obs_1 <iris 'Cube' of OBSERVATIONS / (g/m2) (latitude: 475; longitude: 635)>
obs_2 <iris 'Cube' of OBSERVATIONS / (g/m2) (latitude: 475; longitude: 635)>

Both obs_1.data and obs_2.data can be treated as numpy arrays:

type(obs_1.data)
Out[174]: numpy.ndarray

size(obs_1.data)
Out[173]: 301625

My obs_1 consists of observations at time=18:00 for a selected day, and obs_2 of an average over time for the same day, from t=14:00 to t=17:00. Now, what I am trying to do is to randomly replace 50% of the values in obs_1 with 50% of randomly selected values from obs_2. The data in the arrays look like this (a selection from the array):

array([[ nan, nan, nan, nan, nan, nan, nan, nan, 3.6444201 , 3.6288068 ,
         3.4562614 , 3.1650603 , 2.837024  , 2.5862055 , 2.5824826 , nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan],
       ...,
       [ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         1.1235833 , 1.2448411 , 0.95834756, 0.99093884, 1.0072019 ,
         1.1916308 , 0.9324562 , 1.0275717 , 1.2712531 , nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, 3.2303405 , 4.449829 ,
         nan]], dtype=float32)

where nan are values masked during the loading process (data not relevant). I did a search and tried np.random and masking, but I can't understand how to randomly select from both arrays and replace the obs_1 mask with the obs_2 mask, given that the masks have different shapes. I am struggling with writing the code, so except for loading the data using an Iris cube (which I can post if it is of any help), I do not have an example to show.

Could someone please point me to an example (I couldn't find any so far about exchanging data between different arrays) or give me any hints on how to proceed. Many thanks in advance. All the best
See this question for randomly selecting from a NumPy array.

obs_1 = np.array([[1, 3, 0], [3, 2, 0], [0, 2, 1], [1, 1, 4], [3, 2, 2],
                  [0, 1, 0], [1, 3, 1], [0, 4, 1], [2, 4, 2], [3, 3, 1]])

obs_2 = np.array([[10, 3, 0], [30, 2, 0], [100, 2, 1], [10, 1, 4], [30, 2, 2],
                  [100, 1, 0], [10, 3, 1], [100, 4, 1], [20, 4, 2], [30, 3, 1]])

n_observation = min(obs_1.shape[0], obs_2.shape[0])
index_1 = np.random.choice(np.arange(obs_1.shape[0]), int(n_observation / 2), replace=False)
index_2 = np.random.choice(np.arange(obs_2.shape[0]), int(n_observation / 2), replace=False)
obs_1[index_1, :] = obs_2[index_2, :]
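The original question also involves NaNs with differing patterns in the two arrays. One hedged sketch (variable names and the small shapes are mine, not from the question) restricts the random choice to positions that are valid in both arrays and replaces in place:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_1 = rng.normal(size=(4, 5))
obs_2 = rng.normal(size=(4, 5))
obs_1[0, :2] = np.nan  # stand-in for values masked on load

# flat positions that are non-NaN in both arrays, so we never copy a NaN in
valid = np.flatnonzero(~np.isnan(obs_1) & ~np.isnan(obs_2))
pick = rng.choice(valid, size=len(valid) // 2, replace=False)

flat = obs_1.ravel()          # a view, so this writes into obs_1
flat[pick] = obs_2.ravel()[pick]
print(np.isnan(obs_1[0, 0]))  # True: masked cells stay masked
```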
Array split task: based on values and custom types
There are two 2-D ndarrays, A and B. A contains panel values for a feature; rows represent days, columns are different regions. There are ~3000 columns and ~5000 rows in A, such as:

A = array([[ 3.53,  3.56,   nan, ...,   nan,   nan,  nan],  # day 1 data
           [-4.91, -2.54,   nan, ...,   nan,   nan,  nan],  # day 2 data
           [-6.31, -3.39,   nan, ...,   nan,   nan,  nan],  # day 3 data, etc
           ...,
           [ 0.  , -3.41,   nan, ..., 12.69,  2.32,  nan],
           [-2.74, -4.14,   nan, ..., -8.63, -1.45,  nan],
           [-1.74, -7.45,   nan, ...,  0.68, -6.52,  nan]])

B contains the type of each corresponding value in A. There are around 30 types in total, such as:

B = array([[ 'A', 'B', nan, ..., nan, nan, nan],  # day 1 type
           [ 'A', 'A', nan, ..., nan, nan, nan],  # day 2 type, etc
           ...,
           [ 'D', 'E', nan, ..., 'I', 'D', nan],
           [ 'X', 'Y', nan, ..., 'O', 'S', nan]])

The goal is, for each day (row), to split the regions into 10 groups based on values (group 10 > group 9 > ...). And for each group, the weight of each type should equal the total count of that type in the row divided by 10. For example, day 1:

# of A: 35 --> weight of A in each group: 3.5
# of B: 33 --> weight of B in each group: 3.3
...
# of Z: 6  --> weight of Z in each group: 0.6

And the result should be something like:

weight_group_1 = array([[ 1, 1, nan, ..., 0.5, ..., 1, ..., nan, nan, nan]])
weight_group_2 = array([[ 0, 0, nan, ..., 1, ..., 0.3, ..., nan, nan, nan]])

and so on. (The sum of each group's weights should be equal, if all steps are correct.) Are there any efficient algorithms to achieve this? Please help, thanks in advance!
Python: Stretch 2D array into 3D based on corresponding array with 3rd-dimension index
Let's say I have some 2D array:

a = np.ones((3,3))

I want to stretch this array into 3 dimensions. I have array b, the same size as a, that provides the index in the 3rd dimension that each corresponding element in a needs to go to. I also have 3D array c that is filled with NaNs. This is the array that the information from a should be put into. The remaining blank spaces that do not get "filled" can remain NaNs.

>>> a = np.ones((3,3))
>>> b = np.random.randint(0,3,(3,3))
>>> c = np.empty((3,3,3))*np.nan
>>>
>>> a
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> b
array([[2, 2, 2],
       [1, 0, 2],
       [1, 0, 0]])
>>> c
array([[[ nan,  nan,  nan],
        [ nan,  nan,  nan],
        [ nan,  nan,  nan]],

       [[ nan,  nan,  nan],
        [ nan,  nan,  nan],
        [ nan,  nan,  nan]],

       [[ nan,  nan,  nan],
        [ nan,  nan,  nan],
        [ nan,  nan,  nan]]])

So, in the above example, I would want to end up with c[0,0,2] = 1. I know I could probably do this with some nested loops, but ideally I want it done in a more efficient/vectorized way.
You can use fancy indexing for this, assuming the largest value in b is always less than c.shape[2]:

n1, n2 = a.shape
c[np.arange(n1)[:,None], np.arange(n2), b] = a
c
#array([[[ nan,  nan,   1.],
#        [ nan,  nan,   1.],
#        [ nan,  nan,   1.]],
#
#       [[ nan,   1.,  nan],
#        [  1.,  nan,  nan],
#        [ nan,  nan,   1.]],
#
#       [[ nan,   1.,  nan],
#        [  1.,  nan,  nan],
#        [  1.,  nan,  nan]]])

Here we use integer arrays for all dimensions to trigger advanced indexing, and the three arrays are broadcast against each other as follows (using numpy.broadcast_arrays to visualize this):

i, j, k = np.broadcast_arrays(np.arange(3)[:,None], np.arange(3), b)
print("first dimension index: ")
print(i)
print("second dimension index: ")
print(j)
print("third dimension index: ")
print(k)

first dimension index:
[[0 0 0]
 [1 1 1]
 [2 2 2]]
second dimension index:
[[0 1 2]
 [0 1 2]
 [0 1 2]]
third dimension index:
[[2 2 2]
 [1 0 2]
 [1 0 0]]

The advanced indexing then goes as (0, 0, 2), (0, 1, 2), (0, 2, 2), ..., i.e. it picks one value from each index array at the same position to form the index of an element. Some test cases:

c[0,0,2]  # 1.0
c[0,1,2]  # 1.0
c[2,1,0]  # 1.0
Ok, so this feels like a total hack, but it does the trick:

a = np.ones((3,3))
b = np.array([[2, 2, 2],
              [1, 0, 2],
              [1, 0, 0]])
c = np.empty((3,3,3))*np.nan
z_coords = np.arange(3)
c[z_coords[None, None, :] == b[..., None]] = a.ravel()

What I do is create a boolean indexing array that is True for the indices we want to assign, and then assign to these:

array([[[ nan,  nan,   1.],
        [ nan,  nan,   1.],
        [ nan,  nan,   1.]],

       [[ nan,   1.,  nan],
        [  1.,  nan,  nan],
        [ nan,  nan,   1.]],

       [[ nan,   1.,  nan],
        [  1.,  nan,  nan],
        [  1.,  nan,  nan]]])
A slower but perhaps clearer option:

x, y = np.indices(c.shape[:2])
c[x, y, b] = a  # same as looping over c[x[i,j], y[i,j], b[i,j]] = a[i,j]

The trick is to produce 3 arrays of indices, all with the same shape - one for each dimension of c. The accepted answer does essentially this, but takes advantage of broadcasting.
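The two index-based answers can be cross-checked against each other in a short sketch:

```python
import numpy as np

a = np.ones((3, 3))
b = np.array([[2, 2, 2], [1, 0, 2], [1, 0, 0]])

# accepted answer: broadcast a column vector and a row vector of indices
c1 = np.full((3, 3, 3), np.nan)
c1[np.arange(3)[:, None], np.arange(3), b] = a

# np.indices variant: explicit same-shape index arrays
c2 = np.full((3, 3, 3), np.nan)
x, y = np.indices(b.shape)
c2[x, y, b] = a

print(np.array_equal(c1, c2, equal_nan=True))  # True
```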
numpy argmax with max less than some number
I have a numpy array:

myArray
array([[ 1.  ,   nan,   nan,   nan,   nan],
       [ 1.  ,   nan,   nan,   nan,   nan],
       [ 0.63,  0.79,  1.  ,   nan,   nan],
       [ 0.25,  0.4 ,  0.64,  0.84,   nan]])

For each row I need to find the column index of the max value, where the max has to be less than 1. In the above array, rows 0 and 1 should return NaN, row 2 should return 1, and row 3 should return 3. I am not sure how to put this condition on argmax.
Here's one approach with np.where -

m = a < 1  # mask of elems < 1 (NaNs compare False, so they are excluded too)

# Set NaNs and elems >= 1 to the global minimum minus 1,
# so that argmax ignores them
idx0 = np.where(m, a, np.nanmin(a) - 1).argmax(1)

# Rows with no valid elems (non-NaN and < 1) become NaN in the output
idx = np.where(m.any(1), idx0, np.nan)

Sample run -

In [97]: a
Out[97]:
array([[ 1.  ,   nan,   nan,   nan,   nan],
       [ 1.  ,   nan,   nan,   nan,   nan],
       [ 0.63,  0.79,  1.  ,   nan,   nan],
       [ 0.25,  0.4 ,  0.64,  0.84,   nan]])

In [98]: m = a < 1

In [99]: idx0 = np.where(m, a, np.nanmin(a) - 1).argmax(1)

In [100]: idx0
Out[100]: array([0, 0, 1, 3])

In [101]: np.where(m.any(1), idx0, np.nan)
Out[101]: array([ nan,  nan,   1.,   3.])
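An equivalent sketch that uses -np.inf as the fill value instead of np.nanmin(a) - 1 (the same masking idea, just with a fill that is guaranteed to lose the argmax):

```python
import numpy as np

a = np.array([[1.0,  np.nan, np.nan, np.nan, np.nan],
              [1.0,  np.nan, np.nan, np.nan, np.nan],
              [0.63, 0.79,   1.0,    np.nan, np.nan],
              [0.25, 0.4,    0.64,   0.84,   np.nan]])

m = a < 1  # NaNs compare False, so they drop out along with values >= 1
idx = np.where(m.any(1), np.where(m, a, -np.inf).argmax(1), np.nan)
# rows 0 and 1 -> nan, row 2 -> 1, row 3 -> 3
```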
How can I conditionally change the values in a numpy array taking into account nans?
My array is a 2D matrix and it has numpy.nan values as well as negative and positive values:

>>> array
array([[ nan,         nan,         nan,        ..., -0.04891211,  nan,  nan],
       [ nan,         nan,         nan,        ...,  nan,         nan,  nan],
       [ nan,         nan,         nan,        ...,  nan,         nan,  nan],
       ...,
       [-0.02510989, -0.02520096, -0.02669156, ...,  nan,         nan,  nan],
       [-0.02725595, -0.02715945, -0.0286231 , ...,  nan,         nan,  nan],
       [ nan,         nan,         nan,        ...,  nan,         nan,  nan]], dtype=float32)

(There are positive numbers in the array; they just don't show in the preview.) I want to replace all the positive numbers with one number and all the negative numbers with another. How can I do that with python/numpy? (For the record, the matrix is the result of a geoimage that I want to classify.)
The fact that you have np.nan in your array should not matter. Just use fancy indexing:

x[x>0] = new_value_for_pos
x[x<0] = new_value_for_neg

If you want to replace your np.nans:

x[np.isnan(x)] = something_not_nan

More info on fancy indexing: see this tutorial and the NumPy documentation.
Try:

a[a>0] = 1
a[a<0] = -1
To add to or subtract from the current value instead (np.nan not affected):

import numpy as np

a = np.arange(-10, 10).reshape((4, 5))
print(a)
a[a<0] = a[a<0] - 2
a[a>0] = a[a>0] + 2
print("after -")
print(a)

Output:

[[-10  -9  -8  -7  -6]
 [ -5  -4  -3  -2  -1]
 [  0   1   2   3   4]
 [  5   6   7   8   9]]
after -
[[-12 -11 -10  -9  -8]
 [ -7  -6  -5  -4  -3]
 [  0   3   4   5   6]
 [  7   8   9  10  11]]
Pierre's answer doesn't work if new_value_for_pos is negative. In that case, you can chain np.where() calls:

# Example values
x = np.array([np.nan, -0.2, 0.3])
new_value_for_pos = -1
new_value_for_neg = 2

x[:] = np.where(x>0, new_value_for_pos, np.where(x<0, new_value_for_neg, x))

Result:

array([nan,  2., -1.])
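Alternatively, computing both masks before assigning avoids the ordering problem without nesting np.where (a small sketch):

```python
import numpy as np

x = np.array([np.nan, -0.2, 0.3])

# take both masks first, so a freshly assigned value
# can't accidentally match the second condition
pos, neg = x > 0, x < 0
x[pos] = -1   # safe even though the replacement is itself negative
x[neg] = 2
print(x)  # [nan  2. -1.]
```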