I have the following scenario
array = np.ndarray(shape=(100, 100), dtype=int)  # this should be an empty (uninitialized) array of this size
newarray = np.ndarray(shape=(100, 100), dtype=int)
def function1(parameter1, parameter2):
    for i in range(50):
        function2(pm1, pm2)

def function2(parameter3, parameter4):
    function3(pm3, pm4)

def function3(parameter4, parameter5):
    if (statement):
        input = array of 100 column  # input is an array of (100, 0) size
    else:
        input1 = array of 100 column  # input1 is an array of (100, 0) size
    if (statement):
        input3 = array of 100 column  # input3 is an array of (100, 0) size
    else:
        input4 = array of 100 column  # input4 is an array of (100, 0) size
I am having trouble building a mental model here. My question is: how do we add or append the input arrays created in function3 (each of size (100, 0), i.e. 100 elements) to the array defined at the beginning, so that it ends up as an array of size (100, 100)? Its rows would be input or input1 and input3 or input4.
If this is not possible, how do we store the arrays of size (100, 0) in an array of size (100, 100)?
We don't usually use np.ndarray. arr=np.zeros((100,100),int) is the preferred way of initializing an array of a given size and dtype.
# is the comment character in Python.
Having created such an array, assign new values to a row (or rows) with:
arr[0,:] = np.arange(100)
arr[1:4,:] = np.random.randint(0,10,(3,100))
etc
arr[4:10,:] = np.arange(600).reshape(6,100)
Better yet figure out a way of creating the whole array at once.
arr = np.arange(100)[:,None] * np.arange(100)
I think reading the basic numpy introduction will get you started.
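As a rough sketch of how the row assignment above could fit the nested-function layout from the question: let the innermost function return its rows and let the loop write them into the preallocated array. The function bodies and the values they produce are hypothetical stand-ins, since the question does not say how input, input1, input3 and input4 are computed.

```python
import numpy as np

def function3(i):
    # Hypothetical stand-ins for input/input1 and input3/input4 from the question.
    first = np.full(100, i) if i % 2 == 0 else np.full(100, -i)
    second = np.arange(100) + i
    return first, second

def function2(i):
    # Pass the rows back up instead of storing them in a global.
    return function3(i)

def function1():
    arr = np.zeros((100, 100), dtype=int)
    for i in range(50):
        first, second = function2(i)
        arr[2 * i, :] = first       # row 0, 2, 4, ...
        arr[2 * i + 1, :] = second  # row 1, 3, 5, ...
    return arr

result = function1()
print(result.shape)
```

Returning the rows keeps function3 free of side effects; with 50 iterations and two rows per iteration, all 100 rows get filled.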
Related
I have the following data
>>> import numpy as np
>>> original_classes=np.load("classes.npy")
>>> original_features=np.load("features.npy")
These NumPy arrays have the following shapes
>>> original_classes.shape
(12000,)
>>> original_features.shape
(12000, 224, 224, 3)
What I would like to do is to replace 2/3 of the original_features NumPy array with the content of a new array
>>> new_features=np.load("new-features.npy")
>>> new_features.shape
(600, 224, 224, 3)
However, these data must replace 600 of the positions in original_features Numpy array where the original_classes==11.
That is, there are a total of 12 unique classes in the original_classes array, and there are 1000 features per class in original_features. I simply want to replace 600 features of class 11 with the 600 features from the new_features array. Is there any way of doing that with Python?
P.S= data can be found here
First, we should find out which indices are for class 11:
items_11 = original_classes == 11
idx_11 = np.argwhere(items_11).ravel() # it gives the array of args equal to 11
EDIT
Then, we choose the last 600 items:
selected_idx = idx_11[len(idx_11)-600:]
Or you can select randomly (pick as many items as there are new features, otherwise the total number of samples changes):
size = len(new_features)
selected_idx = np.random.choice(idx_11, size=size, replace=False)
New Data:
mask = np.ones(len(original_features), dtype=bool) # all elements included/True.
mask[selected_idx] = False
new_x = original_features[mask]
new_y = original_classes[mask]
new_x = np.concatenate([new_x,new_features],axis=0)
new_y = np.concatenate([new_y, np.full(len(new_features), 11, dtype=original_classes.dtype)], axis=0)  # keep the label dtype
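If the order of the samples should stay as it is, an alternative is to overwrite the selected positions in place instead of dropping and concatenating. This is only a sketch on small stand-in arrays (3 classes with 5 items each, and target_class playing the role of class 11); the sizes and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for the real data: 3 classes, 5 items per class.
original_classes = np.repeat(np.arange(3), 5)
original_features = rng.random((15, 2, 2, 3))
new_features = rng.random((3, 2, 2, 3))
target_class = 2  # plays the role of class 11

# Positions of the target class, then as many of them as there are new features.
idx = np.flatnonzero(original_classes == target_class)
selected_idx = idx[:len(new_features)]

# Overwrite in place; the labels need no change because the class is the same.
original_features[selected_idx] = new_features
```

With this variant original_classes is left untouched and the array shapes do not change.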
I have a NumPy array with shape (300, 500). Consider this as an image with size (300, 500) and there are 100 objects on it that I want to fill each of them with a different value.
image = np.zeros((300, 500))
I have bounding-box coordinates (x_min, x_max, y_min, y_max) for each of these objects. Then I create indexing arrays using these bounding-box coordinates.
array_of_x_indexing_arrays = []
array_of_y_indexing_arrays = []
for obj in objects:
    x_indices, y_indices = np.mgrid[obj.x_min: obj.x_max + 1, obj.y_min: obj.y_max + 1]
    x_indices, y_indices = x_indices.ravel(), y_indices.ravel()
    array_of_x_indexing_arrays.append(x_indices)
    array_of_y_indexing_arrays.append(y_indices)
Then, I want to assign a different value to image for each of these objects. I stored the values for each object in an array with shape (100,)
data = np.zeros((100,))
# Assume that I filled data such as
# data[0] = 10
# data[1] = 2
# ...
# data[99] = 3
Then what I want to do is the following:
for i in range(len(objects)):
    image[array_of_y_indexing_arrays[i], array_of_x_indexing_arrays[i]] = data[i]
But I want to do this the NumPy way. I have tried the following, but it does not work:
image[array_of_y_indexing_arrays, array_of_x_indexing_arrays] = data
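One possible reason the last line fails is that the per-object index arrays have different lengths, so the lists cannot be turned into one rectangular index array. A sketch of a way around that: concatenate all index arrays into two flat arrays and repeat each value once per pixel of its box. The boxes and values below are made-up examples, given as (x_min, x_max, y_min, y_max) with inclusive bounds as in the question.

```python
import numpy as np

image = np.zeros((6, 8))

# Hypothetical bounding boxes and one value per box.
boxes = [(0, 2, 0, 1), (4, 7, 3, 5)]
data = np.array([10.0, 2.0])

xs, ys = [], []
for x_min, x_max, y_min, y_max in boxes:
    x_idx, y_idx = np.mgrid[x_min:x_max + 1, y_min:y_max + 1]
    xs.append(x_idx.ravel())
    ys.append(y_idx.ravel())

# Number of pixels in each box, used to repeat the matching value.
counts = [len(x) for x in xs]
all_x = np.concatenate(xs)
all_y = np.concatenate(ys)

# A single fancy-indexed assignment covers every box.
image[all_y, all_x] = np.repeat(data, counts)
```

The Python loop over objects that builds the indices is still there; only the assignment itself is a single NumPy call.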
I need to slice sections out of a NumPy array in a specific way. Say I have a (200,200, 4) shape NumPy array. Then for every index in (200, 200), I want to select the 5x5x4 surrounding indexes, flatten it, and then put it into another array. So finally, the shape of the final array would be (200, 200, 100). Additionally, I want to delete all values at the location (:, :, 12). So finally, we'd get shape (200, 200, 99).
I've thought of two ways to go about this but they give different results and I'm not sure what I'm doing wrong.
Method 1:
import numpy as np
arr_lst = [np.random.normal(size=(200, 200)) for _ in range(4)]
slice_arr = np.zeros([200, 200, 99])
start = 0
for i, arr in enumerate(arr_lst):
    for idx, _ in np.ndenumerate(arr):
        # Getting surrounding 25 pixels
        pos_arr = arr[idx[0]-2:idx[0]+3, idx[1]-2:idx[1]+3]
        # Reshaping into size 25
        pos_arr = pos_arr.reshape(-1)
        # Near the boundaries slicing does not result in size 25
        if pos_arr.shape[0] != 25:
            pos_arr = np.full(25, np.nan)
        if i == 0:
            pos_arr = np.delete(pos_arr, 12)
            end = start + 25 - 1
        else:
            end = start + 25
        slice_arr[idx[0], idx[1], start:end] = pos_arr
    start = end
print(slice_arr[10, 100])
Method 2:
import numpy as np
arr_lst = [np.random.normal(size=(200, 200)) for _ in range(4)]
stacked_arr = np.stack(arr_lst, axis=2)
slice_arr = np.zeros([200, 200, 100])
for i in range(200):
    for j in range(200):
        x = stacked_arr[i-2:i+3, j-2:j+3, 0:4]
        if x.shape != (5, 5, 4):
            x = np.array([np.nan for _ in range(100)])
        else:
            x = x.reshape(100)
        slice_arr[i,j] = x
slice_arr = np.delete(slice_arr, 12, 2)
print(slice_arr[10, 100])
The first method gives me the array that I want in the correct order, but the second method feels more natural and faster. Another question I have is if I can optimize this at all? Is there a fast way for slicing around every index at the same time and keeping each slice the same shape? Then afterwards, deleting what things we want to?
Using @hpaulj's helpful comments, I designed a solution that I think works for my purposes. It is similar to what was suggested in "Rolling windows for ndarrays", but adds a border of np.nan values. I've posted it here in case anyone else finds it useful. For debugging purposes, I've set the values in the padded array to coordinate tuples:
import numpy as np
from skimage.util.shape import view_as_windows
arr_lst = [np.empty(shape=(200, 200), dtype=tuple) for _ in range(4)]
arr_lst = [np.pad(x, pad_width=2, mode='constant', constant_values=np.nan) for x in arr_lst]
padded_arr = np.stack(arr_lst, axis=2)
for idx, _ in np.ndenumerate(padded_arr):
    padded_arr[idx[0], idx[1], idx[2]] = idx
w = view_as_windows(padded_arr, (5, 5, 4)).reshape(200, 200, 100)
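For readers without scikit-image, a similar result can probably be had with NumPy alone, assuming NumPy 1.20 or newer, where numpy.lib.stride_tricks.sliding_window_view is available. The sketch below uses a small (20, 20, 4) float array as a stand-in for the (200, 200, 4) data. It keeps the channel-major order of Method 1 (all 25 values of channel 0, then channel 1, and so on), and index 12 is the centre of channel 0. Unlike Method 1, border pixels come out partly NaN rather than entirely NaN.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
stacked = rng.normal(size=(20, 20, 4))  # small stand-in for the (200, 200, 4) array

# Pad the two spatial axes with NaN so every pixel has a full 5x5 neighbourhood.
padded = np.pad(stacked, ((2, 2), (2, 2), (0, 0)),
                mode="constant", constant_values=np.nan)

# Windows over the spatial axes only; the window axes are appended at the end,
# giving shape (20, 20, 4, 5, 5).
windows = sliding_window_view(padded, (5, 5), axis=(0, 1))

# Flatten channel by channel, then drop the centre value of channel 0.
flat = windows.reshape(20, 20, 100)
result = np.delete(flat, 12, axis=2)  # shape (20, 20, 99)
```

The reshape copies the data, so for the full-size array memory use is roughly 25 times that of the padded input.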
I know this is supposed to be simple but I can't figure it out.
The problem:
gt_prices = np.random.uniform(0, 100, size = (121147, 28))
pred_idxs = np.random.randint(0, 28 , size = (121147,))
print(gt_prices.shape, pred_idxs.shape)
(121147, 28) (121147,)
I want to get an array of shape (121147,), where for each row I have the element of gt_prices in the position given by pred_idxs.
In other words, I want to do this:
selected_prices = np.array([gt_prices[i, pred_idxs[i]] for i in range(gt_prices.shape[0])])
But I'd like to do everything with NumPy. Is this possible?
You can do the following (using a smaller first dimension of 3 to make it easier to check correctness):
gt_prices = np.random.uniform(0, 100, size = (3, 28))
pred_idxs = np.random.randint(0, 28 , size = (3,))
indices = np.expand_dims(pred_idxs, axis=1)
gt_prices[np.arange(gt_prices.shape[0])[:,None], indices]
There is now an easy wrapper for this from numpy: https://numpy.org/devdocs/reference/generated/numpy.take_along_axis.html
For your usage, I believe it would be:
gt_prices = np.random.uniform(0, 100, size = (121147, 28))
pred_idxs = np.random.randint(0, 28 , size = (121147, 1)) # number of dimensions has to match
your_output = np.take_along_axis(gt_prices, pred_idxs, axis=1) # output shape [121147, 1]
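Both answers above return a column of shape (n, 1). To get the 1-D shape (n,) the question asks for, pairing a row index array with pred_idxs should be enough, or the take_along_axis result can be squeezed. A small sketch with 5 rows instead of 121147 (the sizes and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
gt_prices = rng.uniform(0, 100, size=(5, 28))
pred_idxs = rng.integers(0, 28, size=(5,))

# Pair each row number with its column index: result has shape (5,).
rows = np.arange(gt_prices.shape[0])
selected_a = gt_prices[rows, pred_idxs]

# Same result via take_along_axis, dropping the extra axis afterwards.
selected_b = np.take_along_axis(gt_prices, pred_idxs[:, None], axis=1).squeeze(axis=1)

# Reference: the list comprehension from the question.
expected = np.array([gt_prices[i, pred_idxs[i]] for i in range(gt_prices.shape[0])])
```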
I am comparing two NumPy arrays and want to add them together. Before doing so, I need to make sure they are the same size. If the sizes are not the same, I want to take the smaller one and fill its last rows with zeros to match the shape of the other.
Both arrays have 16 columns and N rows. I assume it should be pretty straightforward, but I can't get my head around it. So far I am able to compare the shapes of the two arrays.
import csv
import numpy as np
import sys
data = np.genfromtxt('./test1.csv', dtype=float, delimiter=',')
data_sys = np.genfromtxt('./test2.csv', dtype=float, delimiter=',')
print(data.shape)
print(data_sys.shape)
if data.shape != data_sys.shape:
    print("we have an error")
This is the output I got:
=============New file.csv============
(603, 16)
(604, 16)
we have an error
I want to fill the last row of the "data" array with 0 so that I can add the two arrays.
Thanks for your help.
You can use vstack((array1, array2)) from numpy, which takes a tuple of arrays and stacks them vertically. For example:
A = np.random.randint(2, size = (2, 16))
B = np.random.randint(2, size = (5, 16))
print(A.shape)
print(B.shape)
if A.shape[0] < B.shape[0]:
    A = np.vstack((A, np.zeros((B.shape[0] - A.shape[0], 16))))
elif A.shape[0] > B.shape[0]:
    B = np.vstack((B, np.zeros((A.shape[0] - B.shape[0], 16))))
print(A.shape)
print(A)
In your case:
if data.shape[0] < data_sys.shape[0]:
    data = np.vstack((data, np.zeros((data_sys.shape[0] - data.shape[0], 16))))
elif data.shape[0] > data_sys.shape[0]:
    data_sys = np.vstack((data_sys, np.zeros((data.shape[0] - data_sys.shape[0], 16))))
I assume that your matrices always have the same number of columns. If not, you can similarly use hstack to pad them horizontally.
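The same row padding can also be written with np.pad, which avoids the if/elif because padding by zero rows is a no-op. A sketch with dummy arrays in place of the CSV data; pad_rows is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

# Dummy arrays standing in for the two CSV files.
data = np.ones((3, 16))
data_sys = np.ones((4, 16))

n_rows = max(data.shape[0], data_sys.shape[0])

def pad_rows(arr, n_rows):
    # Hypothetical helper: append zero rows at the bottom until arr has n_rows rows.
    missing = n_rows - arr.shape[0]
    return np.pad(arr, ((0, missing), (0, 0)), mode="constant", constant_values=0)

data = pad_rows(data, n_rows)
data_sys = pad_rows(data_sys, n_rows)
total = data + data_sys
```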
If you have only two files, and their shapes differ in just the 0th dimension, a simple check and copy is probably easiest, though it lacks generality:
import numpy as np
data = np.genfromtxt('./test1.csv', dtype=float, delimiter=',')
data_sys = np.genfromtxt('./test2.csv', dtype=float, delimiter=',')
fill_value = 0 # could be np.nan or something else instead
if data.shape[0]>data_sys.shape[0]:
    temp = data_sys
    data_sys = np.ones(data.shape)*fill_value
    data_sys[:temp.shape[0],:] = temp
elif data.shape[0]<data_sys.shape[0]:
    temp = data
    data = np.ones(data_sys.shape)*fill_value
    data[:temp.shape[0],:] = temp
print('Using conditional:')
print(data.shape)
print(data_sys.shape)
if data.shape != data_sys.shape:
    print("we have an error")
A much more general solution is a custom class--overkill for your two files but much easier if you have lots of files to handle. The basic idea is that static class variables sx and sy keep track of the largest widths and heights, and are used when get_data is called, to output a standard shape array. This is pre-filled with your desired fill value, and the actual data from the corresponding file are copied into the upper left corner of the standard shape array:
import numpy as np
class IsomorphicArray:
    sy = 0 # static class variable
    sx = 0 # static class variable
    fill_value = 0.0
    def __init__(self,csv_filename):
        self.data = np.genfromtxt(csv_filename,dtype=float,delimiter=',')
        self.instance_sy,self.instance_sx = self.data.shape
        if self.instance_sy>IsomorphicArray.sy:
            IsomorphicArray.sy = self.instance_sy
        if self.instance_sx>IsomorphicArray.sx:
            IsomorphicArray.sx = self.instance_sx
    def get_data(self):
        out = np.ones((IsomorphicArray.sy,IsomorphicArray.sx))*self.fill_value
        out[:self.instance_sy,:self.instance_sx] = self.data
        return out
isomorphic_array_list = []
for filename in ['./test1.csv','./test2.csv']:
    isomorphic_array_list.append(IsomorphicArray(filename))
numpy_array_list = []
for isomorphic_array in isomorphic_array_list:
    numpy_array_list.append(isomorphic_array.get_data())
print('Using custom class:')
for numpy_array in numpy_array_list:
    print(numpy_array.shape)
Assuming both arrays have 16 columns
len1=len(data)
len2=len(data_sys)
if len1<len2:
    data=np.append(data, np.zeros((len2-len1, 16)),axis=0)
elif len2<len1:
    data_sys=np.append(data_sys, np.zeros((len1-len2, 16)),axis=0)
print(data.shape)
print(data_sys.shape)
if data.shape != data_sys.shape:
    print("we have an error")
else:
    print("we r good")
NumPy provides an append function to add values to an array. For multi-dimensional arrays, the axis argument defines how the values are added. Since you already know which of your arrays is the smaller one, create a zero-filled array for the missing rows with numpy.zeros and append it to your target array.
If you omit axis, np.append flattens both arrays first, so you would then have to reshape the result; passing axis=0 avoids that.
I had a similar situation: two arrays, mask_in of shape (n1, m1) and mask_ot of shape (n2, m2), generated through masks of a 2D image of size (N, M), where mask_ot is larger than mask_in and both share a common center (X0, Y0). I followed the approach suggested by @AniaG using vstack and hstack. I obtained the shapes of both arrays, computed the size difference, and then added the missing elements at both ends.
Here is what I got:
mask_in = np.random.randint(2, size = (2, 8))
mask_ot = np.random.randint(2, size = (6, 16))
mask_in_amp = mask_in
dif_row = mask_ot.shape[0]-mask_in_amp.shape[0]
dif_col = mask_ot.shape[1]-mask_in_amp.shape[1]
# Integer division: np.zeros needs integer sizes (this assumes the differences are even)
complete_row = dif_row // 2
complete_col = dif_col // 2
mask_in_amp = np.vstack((mask_in_amp, np.zeros((complete_row, mask_in_amp.shape[1]))))
mask_in_amp = np.vstack((np.zeros((complete_row, mask_in_amp.shape[1])), mask_in_amp))
mask_in_amp = np.hstack((mask_in_amp, np.zeros((mask_in_amp.shape[0],complete_col))))
mask_in_amp = np.hstack((np.zeros((mask_in_amp.shape[0],complete_col)), mask_in_amp))
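A possibly simpler route to the same centred result is to allocate the large zero array once and copy the small mask into its middle with a slice assignment. A sketch using an all-ones mask so the placement is easy to see; the shapes match the example above, and out_shape, row_off and col_off are names introduced here for illustration:

```python
import numpy as np

mask_in = np.ones((2, 8), dtype=int)
out_shape = (6, 16)  # shape of the larger mask (mask_ot above)

# Offsets of the top-left corner so that the small mask sits in the centre.
row_off = (out_shape[0] - mask_in.shape[0]) // 2
col_off = (out_shape[1] - mask_in.shape[1]) // 2

mask_in_amp = np.zeros(out_shape, dtype=mask_in.dtype)
mask_in_amp[row_off:row_off + mask_in.shape[0],
            col_off:col_off + mask_in.shape[1]] = mask_in
```

This also keeps the integer dtype of the mask, whereas stacking with np.zeros converts it to float, and it still gives the full target shape when a size difference is odd (the extra row or column then lands at the bottom or right).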
If you don't care about the exact shapes of two arrays you can also do the following:
if data.size == data_sys.size:
    print('arrays have the same number of elements, and possibly shape')
else:
    print('arrays do not have the same shape for sure')