I know this is supposed to be simple but I can't figure it out.
The problem:
gt_prices = np.random.uniform(0, 100, size=(121147, 28))
pred_idxs = np.random.randint(0, 28, size=(121147,))
print(gt_prices.shape, pred_idxs.shape)
(121147, 28) (121147,)
I want to get an array of shape (121147,), where for each row I have the element of gt_prices at the position given by pred_idxs.
In other words, I want to do this:
selected_prices = np.array([gt_prices[i, pred_idxs[i]] for i in range(gt_prices.shape[0])])
But I'd like to do everything with NumPy. Is this possible?
You can do the following (I used a smaller first dimension of 3 to make checking correctness easier):
gt_prices = np.random.uniform(0, 100, size=(3, 28))
pred_idxs = np.random.randint(0, 28, size=(3,))
indices = np.expand_dims(pred_idxs, axis=1)
gt_prices[np.arange(gt_prices.shape[0])[:, None], indices]  # shape (3, 1); ravel() for a flat (3,)
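For completeness, plain integer fancy indexing with two 1-D index arrays gives the flat result directly:
selected_prices = gt_prices[np.arange(gt_prices.shape[0]), pred_idxs]  # shape (3,)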
There is now an easy wrapper for this in NumPy: https://numpy.org/devdocs/reference/generated/numpy.take_along_axis.html
For your usage, I believe it would be:
gt_prices = np.random.uniform(0, 100, size=(121147, 28))
pred_idxs = np.random.randint(0, 28, size=(121147, 1))  # number of dimensions has to match
your_output = np.take_along_axis(gt_prices, pred_idxs, axis=1)  # output shape (121147, 1)
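If you want the flat (121147,) shape from the question, you can squeeze out the singleton axis afterwards:
flat_output = your_output.squeeze(axis=1)  # shape (121147,)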
import numpy as np
import pandas as pd
df = pd.read_csv('concrete_data.csv')  # sep and delimiter are aliases; specify at most one
X_raw = df.drop(['concrete_compressive_strength'], axis=1)
y_raw = df['concrete_compressive_strength']
# Isolate our examples for our labeled dataset.
n_labeled_examples = X_raw.shape[0]
training_indices = np.random.randint(low=0, high=len(X_raw), size=3)  # high is exclusive
# Defining the training data
X_training = X_raw.iloc[training_indices]
y_training = y_raw.iloc[training_indices]
The shapes of these variables are:
X_training.shape
(3, 8)
y_training.shape
(3,)
X_raw.shape
(1030, 8)
y_raw.shape
(1030,)
Now, I want to isolate the non-training examples:
X_pool = np.delete(X_raw, training_indices, axis=0)
y_pool = np.delete(y_raw, training_indices, axis=0)
This gives me the following error:
ValueError: Shape of passed values is (1027, 8), indices imply (1030, 8)
I tried reshaping training_indices, but it still gives the same error:
r = np.reshape(training_indices, (3, 1), order='C')
What is wrong, and how should I fix the shape of training_indices?
You can use these lines (pandas' drop removes rows by index label, so it works where np.delete on a DataFrame fails):
X_pool = X_raw.drop(training_indices.tolist())
y_pool = y_raw.drop(training_indices.tolist())
instead of these lines:
X_pool = np.delete(X_raw, training_indices, axis=0)
y_pool = np.delete(y_raw, training_indices, axis=0)
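A note in case the DataFrame does not have the default RangeIndex: drop removes rows by label, so positional indices and labels can disagree. A small sketch that drops by position regardless of the labels:
X_pool = X_raw.drop(X_raw.index[training_indices])
y_pool = y_raw.drop(y_raw.index[training_indices])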
I have the following scenario:
array = np.ndarray(shape=(100, 100), dtype=int)  # this should be an empty array of this size
newarray = np.ndarray(shape=(100, 100), dtype=int)

def function1(parameter1, parameter2):
    for i in range(50):
        function2(pm1, pm2)

def function2(parameter3, parameter4):
    function3(pm3, pm4)

def function3(parameter4, parameter5):
    if statement:
        input = ...   # input is an array of 100 values, shape (100,)
    else:
        input1 = ...  # input1 is an array of 100 values, shape (100,)
    if statement:
        input3 = ...  # input3 is an array of 100 values, shape (100,)
    else:
        input4 = ...  # input4 is an array of 100 values, shape (100,)

I have trouble building a mental model here. My question is: how do we append the input arrays (each of shape (100,)) produced in function3 so that the array defined at the top ends up as a (100, 100) array whose rows are input or input1 and input3 or input4? If that is not possible, how do we store arrays of shape (100,) in an array of shape (100, 100)?
We don't usually use np.ndarray. arr = np.zeros((100, 100), int) is the preferred way of initializing an array of a given size and dtype.
# is the comment character in Python.
Having created such an array, assign new values to a row (or rows) with:
arr[0, :] = np.arange(100)
arr[1:4, :] = np.random.randint(0, 10, (3, 100))
arr[4:10, :] = np.arange(600).reshape(6, 100)
etc.
Better yet, figure out a way of creating the whole array at once:
arr = np.arange(100)[:, None] * np.arange(100)
I think reading the basic numpy introduction will get you started.
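Building on that, here is a minimal sketch of the preallocate-and-fill pattern for your function3 scenario. compute_row is a hypothetical stand-in for whichever of your input/input1 branches fires; each call is assumed to produce one length-100 row:
import numpy as np

arr = np.zeros((100, 100), int)  # preallocate the (100, 100) result

def compute_row(i):
    # hypothetical stand-in for function3: returns one row of 100 values
    return np.full(100, i)

for i in range(100):
    arr[i, :] = compute_row(i)  # each (100,) row fills row i of the result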
I need to slice sections out of a NumPy array in a specific way. Say I have a NumPy array of shape (200, 200, 4). For every index in (200, 200), I want to select the surrounding 5x5x4 window, flatten it, and put it into another array, so the final array has shape (200, 200, 100). Additionally, I want to delete all values at the location (:, :, 12), so finally we'd get shape (200, 200, 99).
I've thought of two ways to go about this but they give different results and I'm not sure what I'm doing wrong.
Method 1:
import numpy as np

arr_lst = [np.random.normal(size=(200, 200)) for _ in range(4)]
slice_arr = np.zeros([200, 200, 99])

start = 0
for i, arr in enumerate(arr_lst):
    for idx, _ in np.ndenumerate(arr):
        # Getting the surrounding 25 pixels
        pos_arr = arr[idx[0]-2:idx[0]+3, idx[1]-2:idx[1]+3]
        # Flattening into size 25
        pos_arr = pos_arr.reshape(-1)
        # Near the boundaries, slicing does not result in size 25
        if pos_arr.shape[0] != 25:
            pos_arr = np.full(25, np.nan)
        if i == 0:
            pos_arr = np.delete(pos_arr, 12)
            end = start + 25 - 1
        else:
            end = start + 25
        slice_arr[idx[0], idx[1], start:end] = pos_arr
    start = end
print(slice_arr[10, 100])
Method 2:
import numpy as np

arr_lst = [np.random.normal(size=(200, 200)) for _ in range(4)]
stacked_arr = np.stack(arr_lst, axis=2)
slice_arr = np.zeros([200, 200, 100])

for i in range(200):
    for j in range(200):
        x = stacked_arr[i-2:i+3, j-2:j+3, 0:4]
        if x.shape != (5, 5, 4):
            x = np.array([np.nan for _ in range(100)])
        else:
            x = x.reshape(100)
        slice_arr[i, j] = x
slice_arr = np.delete(slice_arr, 12, 2)
print(slice_arr[10, 100])
The first method gives me the array I want in the correct order, but the second method feels more natural and is faster. Another question: can I optimize this at all? Is there a fast way to slice around every index at once while keeping each slice the same shape, and then delete the unwanted entries afterwards?
Using @hpaulj's helpful comments, I designed a solution that I think works for my purposes. It's similar to what was suggested here: Rolling windows for ndarrays, but with an additional border of np.nan values. I've posted it here in case anyone else finds it useful; for debugging purposes, I've set the values in the padded array to coordinate tuples. (The discrepancy between my two methods, by the way, was just ordering: Method 2's reshape of each (5, 5, 4) block interleaves the four channels, while Method 1 keeps each channel's 25 values contiguous.)
import numpy as np
from skimage.util.shape import view_as_windows

arr_lst = [np.empty(shape=(200, 200), dtype=tuple) for _ in range(4)]
arr_lst = [np.pad(x, pad_width=2, mode='constant', constant_values=np.nan) for x in arr_lst]
padded_arr = np.stack(arr_lst, axis=2)

# For debugging: set every entry to its own coordinate tuple
for idx, _ in np.ndenumerate(padded_arr):
    padded_arr[idx[0], idx[1], idx[2]] = idx

w = view_as_windows(padded_arr, (5, 5, 4)).reshape(200, 200, 100)
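On NumPy 1.20+, the same windows can be built without scikit-image using np.lib.stride_tricks.sliding_window_view; a sketch under the same nan-padding scheme:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

stacked = np.random.normal(size=(200, 200, 4))
padded = np.pad(stacked, pad_width=((2, 2), (2, 2), (0, 0)),
                mode='constant', constant_values=np.nan)
w = sliding_window_view(padded, (5, 5, 4)).reshape(200, 200, 100)
w = np.delete(w, 12, axis=2)  # drop position 12, leaving shape (200, 200, 99)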
I am creating, inside a for loop, a NumPy array of size 20x30x30x3 in each iteration. I want to concatenate all of those arrays into a bigger one. If there are 100 iteration steps, then the array I want should be 2000x30x30x3. I tried to do it with lists:
new_one_arr1_list = []
new_one_arr2_list = []
all_arr1 = np.array([])
for item in one_arr1:  # 100 iterations
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)
    all_arr1 = np.concatenate([all_arr1, new_one_arr1])
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2 = one_arr1[ind]
    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
In each iteration, new_one_arr1 and new_one_arr2 have shape 20x30x30x3. When I convert new_one_arr1_list and new_one_arr2_list to arrays at the end, the shape is 100x20x30x30x3. How can I end up with a 2000x30x30x3 NumPy array?
EDIT: I tried using concatenate to accumulate the arrays into all_arr1 with all_arr1 = np.concatenate([all_arr1, new_one_arr1]), but I received the message:
ValueError: all the input arrays must have same number of dimensions
In order to create the concatenation and work around the error, I initialized the array as None and test for None in the loop.
That way you do not have to worry about mismatched dimensions.
However, I created some arrays for the ones you only described, and ended up with a final shape of (400, 30, 30, 3).
That fits, since 20 * 20 = 400.
Hope this helps with your solution.
import numpy as np

new_one_arr1_list = []
new_one_arr2_list = []
one_arr1 = np.ones((20, 30, 30, 3))
one_arr2 = np.ones((20, 30, 30, 3))
all_arr1 = None
count = 0
for item in one_arr1:  # 20 iterations with this test data
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)
    if all_arr1 is None:
        all_arr1 = new_one_arr1
    else:
        all_arr1 = np.concatenate([all_arr1, new_one_arr1], axis=0)
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2 = one_arr1[ind]
    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
    count += 1

print(count)
print(all_arr1.shape)
Use the np.concatenate operation given in the documentation:
https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.concatenate.html
Don't concatenate in the first iteration; it'll raise a dimension error. Just copy the array during the first iteration, and keep concatenating for the remaining ones.
new_one_arr1_list = []
new_one_arr2_list = []
all_arr1 = np.array([])
firstIteration = True
for item in one_arr1:  # 100 iterations
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)
    if firstIteration:
        all_arr1 = new_one_arr1
        firstIteration = False
    else:
        all_arr1 = np.concatenate([all_arr1, new_one_arr1])
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2 = one_arr1[ind]
    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
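Since new_one_arr1_list already collects every (20, 30, 30, 3) batch, note that a single concatenate after the loop is both simpler and cheaper than concatenating inside it (repeated concatenation re-copies all_arr1 every iteration):
all_arr1 = np.concatenate(new_one_arr1_list, axis=0)  # 100 batches of 20 -> shape (2000, 30, 30, 3)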
I have the following code:
big_k = gabor((height * 2, width * 2), (height, width))  # returns a 2-D array
r = np.arange(0, radialSlices, radialWidth)
p = np.arange(0, angularSlices, angularWidth)
pp, rr = np.meshgrid(p, r, sparse=False)
z = np.sum(img * big_k[height-rr:2*height-rr, width-pp:2*width-pp])
I get this error:
z = np.sum(img * big_k[height-rr:2*height-rr, width-pp:2*width-pp])
IndexError: invalid slice
I understand this error and why it happens: you can't slice arrays with arrays of indices. The thing is, meshgrid is a fabulous way to speed things up and get rid of the nested loops in my code (otherwise I would have to iterate over angularSlices * radialSlices combinations). Is there a way I can use meshgrid to slice big_k?
You need to broadcast the index yourself, for example:
import numpy as np

a = np.zeros((200, 300))
yy, xx = np.meshgrid([10, 40, 90], [30, 60])      # row/column origins of each block
hh, ww = np.meshgrid(np.arange(5), np.arange(8))  # row/column offsets within a block
# pair every origin with every offset via broadcasting
YY = yy[..., None, None] + hh[None, None, ...]
XX = xx[..., None, None] + ww[None, None, ...]
a[YY, XX] = 1  # sets six 5x8 blocks of the array to 1
(Figure omitted; the resulting image shows six 5x8 blocks of ones, at rows 10, 40, 90 crossed with columns 30, 60.)
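Applied to the gabor example from the question, the same broadcasting idea would look roughly like the sketch below. It assumes img has shape (height, width) and that each (rr, pp) offset should select a full (height, width) window of big_k (the rows/cols/windows names are mine); note it materializes every window at once, trading memory for speed:
# one (height, width) window per (radial, angular) offset
rows = (height - rr)[..., None, None] + np.arange(height)[None, None, :, None]
cols = (width - pp)[..., None, None] + np.arange(width)[None, None, None, :]
windows = big_k[rows, cols]             # shape rr.shape + (height, width)
z = (img * windows).sum(axis=(-2, -1))  # one sum per (radial, angular) offset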