Python: Randomly insert multiple rows into 3D numpy array

I have a 3D array arr of shape (2, 5, 5). I also have another array rows_to_ins of shape (3, 5).
I would like to randomly insert the rows of rows_to_ins into each page of arr. However, the rows must not be inserted as a block. In addition, the insert positions should be random for every page of arr.
I am struggling to insert rows_to_ins efficiently. My current solution uses a for-loop.
import numpy as np
arr = np.arange(100, 125).reshape(5, 5)
arr = np.repeat(arr[None, :, :], 2, axis=0)
rows_to_ins = np.random.randint(0, 99, (3,5))
row_nums_3D = np.random.randint(0, arr.shape[1], (2, 1, 3))
arr_ins = list()
for i in range(row_nums_3D.shape[0]):
    arr_ins.append(np.insert(arr[i, :, :], np.squeeze(row_nums_3D[i, :, :]), rows_to_ins, axis=0))
arr_ins = np.asarray(arr_ins)
I am wondering whether I can avoid the for-loop. What would a vectorized solution look like?
Maybe a more concrete example will help to understand my problem.
# arr - shape (2, 5, 5)
[[[100 101 102 103 104]
[105 106 107 108 109]
[110 111 112 113 114]
[115 116 117 118 119]
[120 121 122 123 124]]
[[100 101 102 103 104]
[105 106 107 108 109]
[110 111 112 113 114]
[115 116 117 118 119]
[120 121 122 123 124]]]
# rows_to_ins - shape (3, 5)
[[37 31 28 34 10]
[ 2 97 89 36 11]
[66 14 70 37 45]]
I am looking for a result such as this:
# 3D array with inserted rows - shape (2, 8, 5)
[[[100 101 102 103 104]
[ 2 97 89 36 11]
[66 14 70 37 45]
[105 106 107 108 109]
[110 111 112 113 114]
[115 116 117 118 119]
[120 121 122 123 124]
[37 31 28 34 10]]
[[66 14 70 37 45]
[100 101 102 103 104]
[105 106 107 108 109]
[ 2 97 89 36 11]
[110 111 112 113 114]
[37 31 28 34 10]
[115 116 117 118 119]
[120 121 122 123 124]]]

Here's a vectorized way -
def insert_random_places(arr, rows_to_ins):
    m,n,r = arr.shape
    N = len(rows_to_ins) + n
    idx = np.random.rand(m,N).argsort(1)
    out = np.zeros((m,N,r),dtype=np.result_type(arr, rows_to_ins))
    np.put_along_axis(out,np.sort(idx[:,:n,None],axis=1),arr,axis=1)
    np.put_along_axis(out,idx[:,n:,None],rows_to_ins,axis=1)
    return out
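As a quick sanity check of the function above, the sketch below runs it on the question's shapes and verifies two properties: every page keeps the original rows in their original order, and every page contains each inserted row exactly once. The sample values in rows_to_ins are an assumption chosen so that inserted rows are easy to tell apart from the originals.

```python
import numpy as np

def insert_random_places(arr, rows_to_ins):
    # Same function as above, repeated so this snippet runs on its own.
    m, n, r = arr.shape
    N = len(rows_to_ins) + n
    idx = np.random.rand(m, N).argsort(1)
    out = np.zeros((m, N, r), dtype=np.result_type(arr, rows_to_ins))
    np.put_along_axis(out, np.sort(idx[:, :n, None], axis=1), arr, axis=1)
    np.put_along_axis(out, idx[:, n:, None], rows_to_ins, axis=1)
    return out

arr = np.repeat(np.arange(100, 125).reshape(5, 5)[None], 2, axis=0)
rows_to_ins = np.arange(15).reshape(3, 5)  # all values < 100 (illustrative data)

out = insert_random_places(arr, rows_to_ins)

for page_in, page_out in zip(arr, out):
    is_original = page_out[:, 0] >= 100
    # Original rows survive in their original order.
    assert np.array_equal(page_out[is_original], page_in)
    # Each inserted row appears exactly once (order within a page is random).
    inserted = page_out[~is_original]
    assert np.array_equal(inserted[np.argsort(inserted[:, 0])], rows_to_ins)

print(out.shape)  # (2, 8, 5)
```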
Sample run -
In [58]: arr
Out[58]:
array([[[100, 101, 102, 103, 104],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119]],
[[100, 101, 102, 103, 104],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119]]])
In [59]: rows_to_ins
Out[59]:
array([[77, 72, 9, 20, 80],
[69, 79, 47, 64, 82]])
In [60]: np.random.seed(0)
In [61]: insert_random_places(arr, rows_to_ins)
Out[61]:
array([[[100, 101, 102, 103, 104],
[ 69, 79, 47, 64, 82],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119],
[ 77, 72, 9, 20, 80]],
[[100, 101, 102, 103, 104],
[ 77, 72, 9, 20, 80],
[ 69, 79, 47, 64, 82],
[105, 106, 107, 108, 109],
[110, 111, 112, 113, 114],
[115, 116, 117, 118, 119]]])
Another one based on masking -
def insert_random_places_v2(arr, rows_to_ins):
    m,n,r = arr.shape
    L = len(rows_to_ins)
    N = L + n
    insert_idx = np.random.rand(m,N).argpartition(kth=-L,axis=1)[:,-L:]
    mask = np.zeros((m,N),dtype=bool)
    np.put_along_axis(mask,insert_idx,1,axis=1)
    out = np.zeros((m,N,r),dtype=np.result_type(arr, rows_to_ins))
    rows_to_ins_3D = rows_to_ins[np.random.rand(m,L).argsort(1)]
    out[mask] = rows_to_ins_3D.reshape(-1,r)
    out[~mask] = arr.reshape(-1,r)
    return out
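To see why the masking version keeps the original rows in order, it may help to look at boolean-mask assignment in isolation. In this small sketch the mask is written by hand (an assumption for illustration; the function above draws it randomly). Assigning through out[mask] and out[~mask] fills the selected slots in row-major order, so the rows of each page stay in sequence.

```python
import numpy as np

# Hand-written mask for 2 pages with 4 slots each; True marks an insert slot.
mask = np.array([[False, True, False, False],
                 [True, False, False, False]])

arr = np.array([[[1, 1], [2, 2], [3, 3]],
                [[4, 4], [5, 5], [6, 6]]])   # shape (2, 3, 2)
ins = np.array([[-1, -1]])                    # a single row to insert per page

out = np.zeros((2, 4, 2), dtype=arr.dtype)
out[mask] = np.repeat(ins, 2, axis=0)         # one copy of the row for each page
out[~mask] = arr.reshape(-1, 2)               # original rows, page by page, in order

first_col = out[:, :, 0].tolist()
print(first_col)  # [[1, -1, 2, 3], [-1, 4, 5, 6]]
```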

Related

Getting each column in a 3d numpy array

I converted an image from RGB to CIELAB, and now I need to use the CIELAB values in some equations.
I have been trying to get the values of each column in the array. For example, if I have:
List =
[[[ 65 234 169]
[203 191 245]
[ 36 58 196]
[207 208 143]
[251 208 187]]
[[ 79 69 237]
[ 13 124 42]
[104 165 82]
[170 178 178]
[ 66 42 210]]
[[ 40 163 219]
[142 37 140]
[ 75 205 143]
[246 30 221]
[ 16 98 102]]]
How can I get it to give me the values of each column, like:
1st_column =
65
203
36
207
251
79
13
104
170
66
40
142
75
246
16
Thank you.
Try:
>>> m[:, :, 0]
array([[ 65, 203, 36, 207, 251],
[ 79, 13, 104, 170, 66],
[ 40, 142, 75, 246, 16]])
As suggested by @mozway, you can use the ellipsis syntax: m[..., 0].
To learn more, read "How do you use the ellipsis slicing syntax in Python?"
You can also flatten your array:
>>> m[:, :, 0].flatten()
array([ 65, 203, 36, 207, 251, 79, 13, 104, 170, 66, 40, 142, 75, 246, 16])
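The slicing above can be checked end to end with the array from the question. This sketch rebuilds it under the name m (the name used in the answer) and confirms that the ellipsis form and the explicit form select the same values. ravel() is used here as an alternative to flatten(); it returns a view when the memory layout allows and a copy otherwise.

```python
import numpy as np

m = np.array([
    [[65, 234, 169], [203, 191, 245], [36, 58, 196], [207, 208, 143], [251, 208, 187]],
    [[79, 69, 237], [13, 124, 42], [104, 165, 82], [170, 178, 178], [66, 42, 210]],
    [[40, 163, 219], [142, 37, 140], [75, 205, 143], [246, 30, 221], [16, 98, 102]],
])

first = m[..., 0]                        # same selection as m[:, :, 0]
assert np.array_equal(first, m[:, :, 0])

flat = first.ravel()                     # 1-D sequence of the first-channel values
print(flat.tolist())
# [65, 203, 36, 207, 251, 79, 13, 104, 170, 66, 40, 142, 75, 246, 16]
```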

Sum of positive arrays yields negative results

I am trying to sum three positive arrays, but the result is an array that contains negative values. How is this possible?
#Example of an image
img=np.array(([[[246, 240, 243],[240, 239, 239],
[243, 242, 244]],[[ 241, 240, 240],
[241, 243, 246],[ 239, 239, 239]],
[[249, 249, 250],[ 33, 33, 34],
[249, 249, 249]],[[ 33, 33, 33],
[250, 250, 249],[ 34, 34, 34]]]), dtype=np.uint8)
#Creating three positive arrays from image
#Image type converted to np.int16 as otherwise values remain between 0-255
R=abs((img[:,:,0].astype(np.int16)-255)**2)
G=abs((img[:,:,1].astype(np.int16)-255)**2)
B=abs((img[:,:,2].astype(np.int16)-255)**2)
print(R, G, B)
[[ 81 225 144]
[ 196 196 256]
[ 36 16252 36]
[16252 25 16695]] [[ 225 256 169]
[ 225 144 256]
[ 36 16252 36]
[16252 25 16695]] [[ 144 256 121]
[ 225 81 256]
[ 25 16695 36]
[16252 36 16695]]
#Adding three positive arrays together
R+G+B
array([[ 450, 737, 434],
[ 646, 421, 768],
[ 97, -16337, 108],
[-16780, 86, -15451]], dtype=int16)
I thought it had something to do with the abs() function I am applying; however, the separate results clearly show that the values are positive.
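The numbers in the question are consistent with int16 overflow. An int16 holds at most 32767, but (33 - 255)**2 is 49284, which wraps around to -16252; abs() then makes it look like a valid positive value, and the three-way sum wraps around again. A minimal sketch on one dark pixel from the image, assuming a wider type such as np.int32 is acceptable for the calculation:

```python
import numpy as np

# One dark pixel taken from the question's image, shape (1, 1, 3).
img = np.array([[[33, 33, 34]]], dtype=np.uint8)

def channels(img, dtype):
    # abs((value - 255)**2) per channel, as in the question, in the given dtype.
    return [abs((img[:, :, c].astype(dtype) - 255) ** 2) for c in range(3)]

# int16: the squares already wrapped before abs(), and the sum wraps once more.
R, G, B = channels(img, np.int16)
total16 = R + G + B
print(R.tolist(), total16.tolist())    # [[16252]] [[-16337]]

# int32: enough room for both the squares and their sum.
R, G, B = channels(img, np.int32)
total32 = R + G + B
print(R.tolist(), total32.tolist())    # [[49284]] [[147409]]
```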

Access x_train columns after train test split function

After splitting my data, I am trying to do a feature ranking, but when I try to access X_train.columns I get this error: 'numpy.ndarray' object has no attribute 'columns'.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
y=df['DIED'].values
x=df.drop('DIED',axis=1).values
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)
print('X_train',X_train.shape)
print('X_test',X_test.shape)
print('y_train',y_train.shape)
print('y_test',y_test.shape)
bestfeatures = SelectKBest(score_func=chi2, k="all")
fit = bestfeatures.fit(X_train,y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
I know that train_test_split returns a numpy array, but how should I deal with it?
Maybe this code makes it clear:
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# here I imitate your example data
df = pd.DataFrame(data = np.random.randint(100, size = (50,5)), columns = ['DIED']+[f'col_{i}' for i in range(4)])
df.head()
Out[1]:
DIED col_0 col_1 col_2 col_3
0 36 0 23 43 55
1 81 59 83 37 31
2 32 86 94 50 87
3 10 69 4 69 27
4 1 16 76 98 74
# df here is a DataFrame, with all its attributes, like df.columns
y=df['DIED'].values
x=df.drop('DIED',axis=1).values # <- .values returns a plain 2D array (not a DataFrame), so it has no column names
x
Out[2]:
array([[ 0, 23, 43, 55],
[59, 83, 37, 31],
[86, 94, 50, 87],
[69, 4, 69, 27],
[16, 76, 98, 74],
[17, 50, 52, 31],
[95, 4, 56, 68],
[82, 35, 67, 76],
.....
# now you can access columns by index, like this:
x[:,2] # <- gives you access to the 3rd column
Out[3]:
array([43, 37, 50, 69, 98, 52, 56, 67, 81, 64, 48, 68, 14, 41, 78, 65, 11,
86, 80, 1, 11, 32, 93, 82, 93, 81, 63, 64, 47, 81, 79, 85, 60, 45,
80, 21, 27, 37, 87, 31, 97, 16, 59, 91, 20, 66, 66, 3, 9, 88])
# or you can convert the array back to a DataFrame
pd.DataFrame(data = x, columns = df.columns[1:])
Out[4]:
col_0 col_1 col_2 col_3
0 0 23 43 55
1 59 83 37 31
2 86 94 50 87
3 69 4 69 27
....
The same approach works for all your variables: X_train, X_test, y_train, y_test.
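Applied to the feature-ranking code in the question, one way to do this is to save the column names before calling .values and reattach them afterwards. The sketch below uses random stand-in data and a plain slice in place of the real train/test split (both are assumptions made so the snippet runs without scikit-learn); the name feature_names is hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same layout as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(50, 5)),
                  columns=["DIED"] + [f"col_{i}" for i in range(4)])

features = df.drop("DIED", axis=1)
feature_names = features.columns     # remember the names before they are lost
x = features.values                  # plain ndarray, no .columns attribute

X_train = x[:35]                     # stand-in for the training part of a split

# Rebuild a DataFrame so that .columns is available again.
X_train_df = pd.DataFrame(X_train, columns=feature_names)
dfcolumns = pd.DataFrame(X_train_df.columns)
print(list(X_train_df.columns))      # ['col_0', 'col_1', 'col_2', 'col_3']
```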

Iterating elements in 3D array gives wrong element

I have a numpy array (of an image) whose 3rd dimension has length 3. An example of my array is below. I am attempting to iterate over it so that I access/print the last dimension of the array as a unit. But neither of the techniques below does that: the first yields each individual value and the second yields an entire row.
How can I iterate this numpy array at the level of the last dimension?
My array:
src = cv2.imread('./myimage.jpg')
# naive/shortened example of src contents (shape=(1, 3, 3))
[[[117 108 99]
[115 105 98]
[ 90 79 75]]]
When iterating, my objective is to print the following values, one per iteration:
[117 108 99] # iteration 1
[115 105 98] # iteration 2
[ 90 79 75] # iteration 3
# Attempt 1 to iterate
for index,value in np.ndenumerate(src):
    print(src[index]) # src[index] and value = 117 when I was hoping it equals [117 108 99]
# Attempt 2 to iterate
for index,value in enumerate(src):
    print(src[index]) # value is the entire row
Solution
You could use either of the following two methods. Method-1 prints one pixel per iteration, which is what the question asks for. Method-2 iterates over the last axis itself and works for arrays of any shape; see the section Detailed Solution below.
import numpy as np
src = [[117, 108, 99], [115, 105, 98], [ 90, 79, 75]]
src = np.array(src).reshape((1,3,3))
Method-1
for row in src[0,:]:
    print(row)
Method-2
Iterates over the 3rd dimension, so each e is one channel slice of shape (1, 3), such as [[117 115 90]].
for e in np.transpose(src, [2,0,1]):
    print(e)
Output (Method-1):
[117 108 99]
[115 105 98]
[90 79 75]
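If the goal is one pixel triple per iteration for an image of any height and width, one option (an assumption about the intent, not part of the answer above) is to merge the two spatial axes with reshape first, which leaves a 2D array with one pixel per row:

```python
import numpy as np

# Shortened example from the question, shape (1, 3, 3).
src = np.array([[[117, 108, 99], [115, 105, 98], [90, 79, 75]]])

# Collapse height and width into one axis; the last axis (channels) is kept.
pixels = src.reshape(-1, src.shape[-1])

for pixel in pixels:
    print(pixel)   # one length-3 array per iteration
```

For a real image of shape (H, W, 3) the same reshape yields H * W rows, so the loop body runs once per pixel.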
Detailed Solution
Let us make an array of shape (3,4,5). So, if we iterate over the 3rd dimension, we should find 5 items, each with a shape of (3,4). You could achieve this by using numpy.transpose as shown below:
src = np.arange(3*4*5).reshape((3,4,5))
for e in np.transpose(src, [2,0,1]):
    print(e)
Output:
[[ 0 5 10 15]
[20 25 30 35]
[40 45 50 55]]
[[ 1 6 11 16]
[21 26 31 36]
[41 46 51 56]]
[[ 2 7 12 17]
[22 27 32 37]
[42 47 52 57]]
[[ 3 8 13 18]
[23 28 33 38]
[43 48 53 58]]
[[ 4 9 14 19]
[24 29 34 39]
[44 49 54 59]]
Here the array src is:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
General advice: When working with numpy, explicit python loops should be a last resort. Numpy is an extremely powerful tool which covers most use cases. Learn how to use it properly! If it helps, you can think of numpy as almost its own mini-language within a language.
Now, onto the code. I chose here to keep only the subarrays whose values are all below 100, but of course this is completely arbitrary and serves only to demonstrate the code.
import numpy as np
arr = np.array([[[117, 108, 99], [115, 105, 98], [90, 79, 75]], [[20, 3, 99], [101, 250, 30], [75, 89, 83]]])
cond_mask = np.all(a=arr < 100, axis=2)
arr_result = arr[cond_mask]
Let me know if you have any questions about the code :)

Read contents of a text file into a dictionary

The contents of my text file are as follows:
a 0 45 124 234 53 12 34
a 90 294 32 545 190 87
a 180 89 63 84 73 63 83
How can I read the contents into a dictionary such that a0 becomes the key and the remaining numbers become its values? I would like my dictionary to look like this:
{a0: [45, 124, 234, 53, 12, 34], a90: [294, 32, 545, 190, 87], a180: [89, 63, 84, 73, 63, 83]}
I have tried the conventional approach where I remove the delimiter and then store it in the dictionary as shown below:
newt = {}
newt = {t[0]: t[1:] for t in data}
But here I get only a as the key
This may help you out (it's about Christmas time after all)
d = {}
with open("dd.txt") as f:
    for line in f:
        els = line.split()
        k = ''.join(els[:2])
        d[k] = list(map(int,els[2:]))
print(d)
with an input file of
a 0 45 124 234 53 12 34
a 90 294 32 545 190 87
a 180 89 63 84 73 63 83
it produces
{'a90': [294, 32, 545, 190, 87],
'a180': [89, 63, 84, 73, 63, 83],
'a0': [45, 124, 234, 53, 12, 34]}
It essentially reads each line from the file and splits it into chunks, ignoring whitespace.
It then joins the first two elements to form the key and converts the remaining elements to integers to form the list of values.
I have assumed you want the numbers as integers. If you want them as strings, you can skip the conversion to int:
d[k] = els[2:]
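The same parsing logic can be tried without a file on disk. This sketch feeds the sample lines through io.StringIO (a stand-in chosen for illustration) and additionally skips lines with fewer than three fields, which is an assumption about how blank or malformed lines should be handled.

```python
import io

# Stand-in for the text file so the example runs without touching the disk.
f = io.StringIO(
    "a 0 45 124 234 53 12 34\n"
    "a 90 294 32 545 190 87\n"
    "\n"
    "a 180 89 63 84 73 63 83\n"
)

d = {}
for line in f:
    els = line.split()
    if len(els) < 3:          # skip blank or malformed lines (assumed policy)
        continue
    key = "".join(els[:2])    # "a" + "0" -> "a0"
    d[key] = [int(x) for x in els[2:]]

print(d)
```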
If you like one-liners (kind-of):
with open('my_file.txt') as f:
    res = {''.join(r.split(None, 2)[:2]): [int(x) for x in r.split()[2:]] for r in f}
>>> res
{'a0': [45, 124, 234, 53, 12, 34],
'a180': [89, 63, 84, 73, 63, 83],
'a90': [294, 32, 545, 190, 87]}
