I iterate over a 3D numpy array and want, at every step, to append a float value to the array along the 3rd dimension (axis=2).
Something like the following (I know the code doesn't work as is; latIndex, lonIndex and data are generated randomly here for simplicity):
import numpy as np
import random
GridData = np.ones((121, 201, 1000))
data = np.random.rand(4800, 4800)
for row in range(4800):
    for column in range(4800):
        latIndex = random.randrange(0, 121, 1)
        lonIndex = random.randrange(0, 201, 1)
        GridData = np.append(GridData[latIndex, lonIndex, :], data[column, row], axis=2)
The size of the 3rd dimension of GridData is arbitrary; it is 1000 in this example.
How can I achieve this?
Addition:
It might be possible without np.append, but then I don't know how to do it, since the next free index along the 3rd dimension is different for every combination of latIndex and lonIndex.
You can allocate extra space for your array grid_data, fill it with NaNs, and keep track of the next index to be filled in a separate array while you iterate through data and copy its values over. If the third dimension fills up completely with non-NaN values for some lat_idx, lon_idx, you just allocate more space. Since appending is expensive with numpy, it's best to make this extra space fairly large so you only reallocate once or twice (below I allocate twice the original space).
Once the array is filled, you can strip the unused appended space with numpy.isnan(). This solution does what you want, but it is very slow (about two minutes for the example sizes you gave); the slowness comes from the Python-level iteration rather than from the numpy operations.
Here's the code:
import random
import numpy as np
grid_data = np.ones(shape=(121, 201, 1000))
data = np.random.rand(4800, 4800)
# keep track of next index to fill for all the arrays in axis 2
next_to_fill = np.full(shape=(grid_data.shape[0], grid_data.shape[1]),
                       fill_value=grid_data.shape[2],
                       dtype=np.int32)
# allocate more space
double_shape = (grid_data.shape[0], grid_data.shape[1], grid_data.shape[2] * 2)
extra_space = np.full(shape=double_shape, fill_value=np.nan)
grid_data = np.append(grid_data, extra_space, axis=2)
for row in range(4800):
    for col in range(4800):
        lat_idx = random.randint(0, 120)
        lon_idx = random.randint(0, 200)
        # allocate more space if needed
        if next_to_fill[lat_idx, lon_idx] >= grid_data.shape[2]:
            grid_data = np.append(grid_data, extra_space, axis=2)
        grid_data[lat_idx, lon_idx, next_to_fill[lat_idx, lon_idx]] = data[row, col]
        next_to_fill[lat_idx, lon_idx] += 1
# remove unnecessary nans that were appended
not_all_nan_idxs = ~np.isnan(grid_data).all(axis=(0, 1))
grid_data = grid_data[:, :, not_all_nan_idxs]
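As a usage note on the code above: next_to_fill still records how many valid entries each cell holds after filling, so you can slice out a single cell's values without the trailing NaN padding. A small sketch (the cell indices here are arbitrary):
# values actually written for one (lat, lon) cell, excluding the NaN padding
lat_idx, lon_idx = 5, 7  # arbitrary example cell
cell_values = grid_data[lat_idx, lon_idx, :next_to_fill[lat_idx, lon_idx]]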
Related
I have a 2d numpy array size 100 x 100.
I want to randomly sample values from the "inside" 80 x 80 values so that I can exclude values which are influenced by edge effects. I want to sample from row 10 to row 90 and within that from column 10 to column 90.
However, importantly, I need to retain the original index values from the 100 x 100 grid, so I can't just trim the dataset and move on. If I do that, I am not really solving the edge effect problem because this is occurring within a loop with multiple iterations.
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
row_idx = np.arange(min_select, max_select)
col_idx = np.arange(min_select, max_select)
indices_random = ?????  # Somehow randomly sample from new_abundances only within the rows and columns of row_idx and col_idx
What I ultimately need is a list of 250 random indices selected from within the flattened new_abundances array. I need to keep the new_abundances array as 2d to identify the "edges" but once that is done, I need to flatten it to get the indices which are randomly selected.
Desired output:
An 1d list of indices from a flattened new_abundances array.
Would something like this solve your problem?
import numpy as np
np.random.seed(0)
mat = np.random.random(size=(100,100))
x_indices = np.random.randint(low=10, high=90, size=250)
y_indices = np.random.randint(low=10, high=90, size=250)
coordinates = list(zip(x_indices,y_indices))
flat_mat = mat.flatten()
flat_index = x_indices * 100 + y_indices
Then you can access elements using any value from the coordinates list, e.g. mat[coordinates[0]] returns the matrix value at coordinates[0]. The value of coordinates[0] is (38, 45) in my case. If the matrix is flattened, you can calculate the 1D index of the corresponding element; in this case, mat[coordinates[0]] == flat_mat[flat_index[0]] holds, where flat_index[0] == 3845 == 100*38 + 45.
Please also note that multiple sampling of the original data is possible this way.
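If duplicates are a problem and you need 250 distinct cells instead, a possible variant (a sketch using numpy.random.choice without replacement over the interior flat indices) would be:
# flat indices of all interior cells (rows 10..89, cols 10..89)
interior = (np.arange(10, 90)[:, None] * 100 + np.arange(10, 90)).ravel()
# 250 distinct flat indices sampled from the interior
flat_index_unique = np.random.choice(interior, size=250, replace=False)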
Using your notation:
import numpy as np
np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)
coords = list(zip(x_indices,y_indices))
flat_new_abundances = new_abundances.flatten()
flat_index = x_indices * gridsize + y_indices
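To pull the sampled values out and sanity-check that the 2D and flattened indexing agree, something like this should work:
sampled = flat_new_abundances[flat_index]
# 2D indexing must return the same values as the flat lookup
assert (sampled == new_abundances[x_indices, y_indices]).all()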
I have a data array of multiple dimensions, with the last one being the distance. On the other hand, I have the distance vector r. For instance:
Data = np.ones((20, 30, 100))
r = np.linspace(10, 50, 100)
Finally, I also have a list of critical distance values called r0, such that r0.shape == Data.shape[:-1]. For instance,
r0 = np.random.random((20, 30))*40 + 10
I'm looking to replace values of Data with zero based on a condition relating r and r0 across the first dimensions of Data. For example, I want for any i and j that:
Data[i, j, r>=r0[i,j]] = 0
Consider that Data can be a big array such that using loops is very long. My current workaround is:
r_temp = np.broadcast_to(np.expand_dims(r, list(np.arange(len(Data.shape)-1))), Data.shape)
Data[r_temp >= r0[..., None]] = 0
It is fast, but it consumes a lot of memory: even though np.broadcast_to returns a view, the comparison r_temp >= r0[..., None] materializes a boolean mask with the full shape of Data, which can be problematic if Data starts to be big.
Is there a solution that does not require creating and storing such a full-size temporary array?
Note: for the creation of r_temp, see here.
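Not a definitive answer, but one memory-light sketch, assuming r is sorted ascending (as in the np.linspace example above): the condition r >= r0[i, j] then selects a contiguous tail of the last axis, so you can compute a per-cell cutoff with np.searchsorted and zero the tails in a short loop over the leading dimensions, with no Data-sized temporary:
# first index k with r[k] >= r0[i, j]; valid because r is sorted ascending
cut = np.searchsorted(r, r0)  # shape (20, 30)
for i, j in np.ndindex(r0.shape):
    Data[i, j, cut[i, j]:] = 0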
I have an index manipulation problem that I can solve, but it's slow. I'm looking for a way to speed this up.
I have a large (m, 2) float array (think 2D point coordinates) and a large idx array into points. (A typical operation is to pick indices out: points[idx].) From idx, I sometimes need to delete a few entries. I can do that with a boolean mask, but the operation is slow, presumably because the entire array is rewritten in memory. Alternative: numpy's masked arrays. Masking is fast, of course, but unfortunately masked arrays don't work as indices: the mask is simply ignored.
MWE:
import numpy
# setup
points = numpy.random.rand(10, 2)
n = 5 # can be very large irl
idx = numpy.random.randint(0, 10, n)
# typical operation with idx:
# points[idx]
# a few entries are deleted
mask = numpy.ones(n, dtype=bool)
mask[2] = False # only a few are masked
idx = idx[mask] # takes a while
# alternative: use ma?
idx = numpy.random.randint(0, 10, n)
idx = numpy.ma.array(idx)
idx[2] = numpy.ma.masked
# Doesn't work, masking is ignored:
points[idx]
Any hints on how to speed this up?
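One possible workaround (a sketch, not the masked-array behaviour asked about): if deletions are rare, keep a plain boolean array alongside idx, flip flags cheaply as entries are deleted, and compact only when the indices are actually used, batching many deletions into a single rewrite:
keep = numpy.ones(n, dtype=bool)
keep[2] = False                # deleting an entry is now just a flag flip
# ...possibly more deletions later...
selected = points[idx[keep]]   # compact once, at lookup time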
My code is
import numpy as np
housing_data = np.loadtxt('Housing.csv', delimiter=',')
x1 = housing_data[:,0]
x2 = housing_data[:,1]
y = housing_data[:,2]
avgX1 = np.mean(x1)
stdX1 = np.std(x1)
normX1 = (x1 - avgX1) / stdX1
avgX2 = np.mean(x2)
stdX2 = np.std(x2)
normX2 = (x2 - avgX2) / stdX2
ones = np.ones((normX2.shape[0], 1))
normalizedX = np.array((ones[0], normX1, normX2))
I'm trying to create a new normalized array with the ones in the first column, then the normX1 and normX2. For some reason, my code isn't working. Any idea what I'm doing wrong?
The actual issue is that you made ones 2D, while normX1 and normX2 are 1D. When you then call np.array((ones[0], normX1, normX2)), ones[0] is the first row of ones, which is another array of length 1. The mismatch in lengths between the three arguments causes np.array to return an array of objects (a numpy array with dtype=object) instead of the 2D float array you expect.
I'd just make ones big enough to fit all your data in the first place and avoid creating an extra array. Then just assign the values of normX1 and normX2 to the columns of that array:
normalizedX = np.ones((normX2.shape[0], 3))
normalizedX[:,1] = normX1
normalizedX[:,2] = normX2
print(normalizedX)
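Equivalently, if you prefer building the array in one call, numpy.column_stack stacks 1D arrays as columns and should give the same result:
normalizedX = np.column_stack((np.ones(normX1.shape[0]), normX1, normX2))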
I am trying to figure out how to iteratively append 2D arrays to generate a single larger array. On each iteration a 16x200 ndarray is generated, as seen below:
For each iteration a new 16x200 array is generated; I would like to 'append' this to the previously generated array for a total of N iterations. For example, with two iterations, the first generated array would be 16x200, and on the second iteration the newly generated 16x200 array would be appended to the first, creating a 16x400 array.
train = np.array([])
for i in [1, 2, 1, 2]:
    spike_count = [0, 0, 0, 0]
    img = cv2.imread("images/" + str(i) + ".png", 0)  # Read the associated image to be classified
    k = np.array(temporallyEncode(img, 200, 4))
    # Somehow append k to train on each iteration
In the case of the embedded code above, the loop iterates 4 times, so the final train array is expected to be 16x800 in size. Any help would be greatly appreciated; I have drawn a blank on how to accomplish this. The code below is a general case:
import numpy as np
totalArray = np.array([])
for i in range(1, 3):
    arrayToAppend = np.zeros((4, 200))
    # Append arrayToAppend to totalArray somehow
While it is possible to perform a concatenate (or one of the 'stack' variants) at each iteration, it is generally faster to accumulate the arrays in a list, and perform the concatenate once. List append is simpler and faster.
alist = []
for i in range(0, 3):
    arrayToAppend = np.zeros((4, 200))
    alist.append(arrayToAppend)
arr = np.concatenate(alist, axis=1)  # to get (4, 600)
# hstack does the same thing
# vstack is the same, but with axis=0  # gives (12, 200)
# stack creates a new dimension  # gives (3, 4, 200), (4, 3, 200) etc
Try using numpy hstack. From the documentation, hstack takes a sequence of arrays and stacks them horizontally to make a single array.
For example:
import numpy as np
x = np.zeros((16, 200))
y = x.copy()
for i in range(5):
    y = np.hstack([y, x])
    print(y.shape)
Gives:
(16, 400)
(16, 600)
(16, 800)
(16, 1000)
(16, 1200)
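As the previous answer notes, each hstack call copies the whole accumulated array, so for many iterations it is usually faster to collect the blocks in a list and stack once:
blocks = [x] * 6        # the six (16, 200) pieces built by the loop above
y = np.hstack(blocks)
print(y.shape)          # (16, 1200)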