MinMaxScaler: Normalise a 4D array of input - python

I have a 4D array of input that I would like to normalise using MinMaxScaler. For simplicity, I give an example with the following array:
import numpy as np
A = np.array([
[[[0, 1, 2, 3],
[3, 0, 1, 2],
[2, 3, 0, 1],
[1, 3, 2, 1],
[1, 2, 3, 0]]],
[[[9, 8, 7, 6],
[5, 4, 3, 2],
[0, 9, 8, 3],
[1, 9, 2, 3],
[1, 0, -1, 2]]],
[[[0, 7, 1, 2],
[1, 2, 1, 0],
[0, 2, 0, 7],
[-1, 3, 0, 1],
[1, 0, 1, 0]]]
])
A.shape
(3,1,5,4)
In the given example, the array contains 3 input samples, where each sample has the shape (1,5,4). Each column of the input represents 1 variable (feature), so each sample has 4 features.
I would like to normalise the input data, but MinMaxScaler expects a 2D array (n_samples, n_features), like a dataframe.
How then do I use it to normalise this input data?

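Vectorize the data:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
A_sq = np.squeeze(A)  # drop the singleton axis: (3, 1, 5, 4) -> (3, 5, 4)
print(A_sq.shape)
# (3, 5, 4)
A_flat = A_sq.reshape(3, -1)  # reshape to (3, 20): one row per sample
scaler.fit(A_flat)  # note: each of the 20 (row, column) positions is scaled as its own feature
# MinMaxScaler()
A_scaled = scaler.transform(A_flat).reshape(A.shape)  # back to (3, 1, 5, 4)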

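You can use the code below to normalize a 4D array. It fits a separate MinMaxScaler on the (5, 4) matrix of each sample, so each column is scaled within that sample.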
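import numpy as np
from sklearn.preprocessing import MinMaxScaler
def norm(arr):
    arrays_list = list()
    objects_list = list()
    for i in range(arr.shape[0]):
        temp_arr = arr[i][0]  # (1, 5, 4) -> (5, 4): rows are observations, columns are features
        # create a new scaler for every sample; reusing one global scaler would leave
        # objects_list holding the same object, fitted only on the last sample
        scaler = MinMaxScaler(feature_range=(0, 1))
        temp_arr = scaler.fit_transform(temp_arr)
        objects_list.append(scaler)
        arrays_list.append([temp_arr])  # re-wrap to restore the singleton axis
    return objects_list, np.array(arrays_list)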
Pass the array to the function like this:
objects, array = norm(A)
It returns a list of the fitted MinMaxScaler objects and an array of the same shape as the original, containing the normalized values.

" If you want a scaler for each channel, you can reshape each channel of the data to be of shape (10000, 5*5). Each channel (which was previously 5x5) is now a length 25 vector, and the scaler will work. You'll have to transform your evaluation data in the same way with the scalers in channel_scalers."
Maybe this will help; I'm not sure it's exactly what you're looking for, but see:
Python scaling with 4D data
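Since the question says each of the 4 columns is a feature, another option (a sketch, not one of the answers above) is to collapse every axis except the last one, fit a single MinMaxScaler on the resulting (15, 4) array, and reshape back. Each feature is then scaled across all rows of all samples, and the same fitted scaler can be reused on evaluation data reshaped the same way:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

A = np.array([
    [[[0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1], [1, 3, 2, 1], [1, 2, 3, 0]]],
    [[[9, 8, 7, 6], [5, 4, 3, 2], [0, 9, 8, 3], [1, 9, 2, 3], [1, 0, -1, 2]]],
    [[[0, 7, 1, 2], [1, 2, 1, 0], [0, 2, 0, 7], [-1, 3, 0, 1], [1, 0, 1, 0]]],
])

n_features = A.shape[-1]  # the last axis holds the 4 features (columns)

# Collapse every axis except the feature axis: (3, 1, 5, 4) -> (15, 4)
A_2d = A.reshape(-1, n_features)

scaler = MinMaxScaler(feature_range=(0, 1))
A_scaled = scaler.fit_transform(A_2d).reshape(A.shape)  # back to (3, 1, 5, 4)

print(A_scaled.shape)    # (3, 1, 5, 4)
print(scaler.data_min_)  # per-feature minimum seen during fit
print(scaler.data_max_)  # per-feature maximum seen during fit

# Evaluation data must be reshaped the same way and passed to transform(), not fit_transform():
# B_scaled = scaler.transform(B.reshape(-1, n_features)).reshape(B.shape)
```

The design choice is what counts as one feature: here it is a column, whereas flattening each sample to 20 values treats every (row, column) position as a separate feature.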

Related

Mean of 3D array with Numpy

I'm new to Python/NumPy programming.
I have a 3-dimensional array representing an image (x and y axes), where each point of the image holds a pixel's RGB values:
So for example:
a = np.array([[[0, 2, 2], [1, 3, 2]], [[1, 4, 5], [6, 5, 3]]])
I'd like to compute the mean of the R, G, and B color channels over the image.
For example: (0+1+1+6)/4, (2+3+4+5)/4, (2+2+5+3)/4.
I have to use the function numpy.mean.
I've tried several things with slicing, but I don't think that's the right approach: (patch == image)
Thanks for your help!
Consider this array, rgb:
array([[[1, 1, 1],
[1, 2, 3],
[4, 5, 6]],
[[1, 2, 3],
[0, 0, 0],
[1, 0, 0]]])
Then we can get the mean value of each channel with:
import numpy as np
rgb = np.array([[[1,1,1],[1,2,3],[4,5,6]],[[1,2,3],[0,0,0],[1,0,0]]])
print(rgb.shape)
rgb_mean = np.mean(np.mean(rgb, axis=1), axis=0)  # mean over columns, then over rows
print(rgb_mean)
rgb_mean = rgb_mean.astype(np.uint8)  # note: truncates toward zero, does not round
print(rgb_mean.shape)
print(rgb_mean)
Output:
(2, 3, 3)
[1.33333333 1.66666667 2.16666667]
(3,)
[1 1 2]
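For the array in the question, a shorter form is to pass both spatial axes to np.mean at once (it accepts a tuple of axes). Because every pixel has the same weight, this gives the same result as the nested mean above. A small sketch:

```python
import numpy as np

# Image from the question: shape (2, 2, 3) = (rows, columns, RGB channels)
a = np.array([[[0, 2, 2], [1, 3, 2]], [[1, 4, 5], [6, 5, 3]]])

# Average over the two spatial axes, keeping one value per channel
channel_means = np.mean(a, axis=(0, 1))
print(channel_means)  # R, G, B means: 2.0, 3.5 and 3.0
```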

keras-gcn fit model ValueError

I'm using this library (keras-gcn) to build a model that learns from graphs. Here is the code (from the repository):
import numpy as np
from keras_gcn.backend import keras
from keras_gcn import GraphConv
# feature matrix
input_data = np.array([[[0, 1, 2],
[2, 3, 4],
[4, 5, 6],
[7, 7, 8]]])
# adjacency matrix
input_edge = np.array([[[1, 1, 1, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[0, 0, 0, 1]]])
labels = np.array([[[1],
[0],
[1],
[0]]])
data_layer = keras.layers.Input(shape=(None, 3), name='Input-Data')
edge_layer = keras.layers.Input(shape=(None, None), dtype='int32', name='Input-Edge')
conv_layer = GraphConv(units=4, step_num=1, kernel_initializer='ones',
bias_initializer='ones', name='GraphConv')([data_layer, edge_layer])
model = keras.models.Model(inputs=[data_layer, edge_layer], outputs=conv_layer)
model.compile(optimizer='adam', loss='mae', metrics=['mae'])
model.fit([input_data, input_edge], labels)
However, when I run the code I get the following error:
ValueError: Error when checking target: expected GraphConv to have 3 dimensions, but got array with shape (4, 1)
even though the shape of labels is (1, 4, 1).
You should one-hot encode your labels, something like the following:
labels = np.array([[[0, 1],
[1, 0],
[0, 1],
[1, 0]]])
Also, the number of units in the GraphConv layer should equal the number of unique labels, which is 2 in your case.
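Instead of typing the one-hot labels by hand, they can be built from the integer labels. A minimal sketch, assuming the classes are the integers 0 and 1 stored in the last axis as in the question; indexing an identity matrix is one common NumPy idiom for this:

```python
import numpy as np

# Integer labels from the question, shape (1, 4, 1)
labels = np.array([[[1], [0], [1], [0]]])

num_classes = 2  # should match units=... in the GraphConv layer

# Drop the trailing axis, then pick rows of the identity matrix: 0 -> [1, 0], 1 -> [0, 1]
one_hot = np.eye(num_classes, dtype=int)[labels[..., 0]]
print(one_hot.shape)  # (1, 4, 2)
```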
I think the issue is a mismatch between the shapes of your edge_layer and data_layer.
When you call keras.layers.Input, you give data_layer the shape (None, 3) and edge_layer the shape (None, None).
Match the shapes and let me know how it goes.

How do I calculate xi^j in a matrix in Numpy

I am trying to build a matrix from an input array.
I would like to be able to input
a = [0,1,2]
in Python and use NumPy to produce a matrix whose entry at row i and column j is x_i^j,
so for example
the input is:
a = [0,1,2]
and the output should be
[[1,0,0],
[1,1,1],
[1,2,4]]
I have used the following code:
xij = np.matrix([np.power(xi,j) for j in x for xi in x]).reshape(3,3)
[[ 1, 2, 3],
[ 1, 4, 9],
[ 1, 8, 27]]
I assume I'm using the wrong formula in NumPy.
Could you please help me solve this?
Thanks in advance
You need range(len(a)) to generate the exponents, and the for loops in the correct order:
a = [0,1,2]
xij = np.matrix([np.power(xi,j) for xi in a for j in range(len(a))]).reshape(3,3)
# matrix([[1, 0, 0],
# [1, 1, 1],
# [1, 2, 4]])
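With array broadcasting (note that this puts the exponent j on the rows, so the result is the transpose of the matrix asked for):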
In [823]: np.array([0,1,2])**np.arange(3)[:,None]
Out[823]:
array([[1, 1, 1],
[0, 1, 2],
[0, 1, 4]])
In [825]: np.array([1,2,3])**np.arange(1,4)[:,None]
Out[825]:
array([[ 1, 2, 3],
[ 1, 4, 9],
[ 1, 8, 27]])
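To get x_i^j with i on the rows and j on the columns, as the question asks, the broadcast only needs to be flipped so that the base varies down the rows. NumPy also provides np.vander, which with increasing=True builds exactly this (Vandermonde) matrix. A sketch of both:

```python
import numpy as np

a = np.array([0, 1, 2])

# Base x_i varies down the rows, exponent j varies across the columns
xij = a[:, None] ** np.arange(len(a))

# Same matrix via the Vandermonde helper; increasing=True puts x**0 in the first column
xij_vander = np.vander(a, increasing=True)

print(xij)
# [[1 0 0]
#  [1 1 1]
#  [1 2 4]]
```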

Numpy - Apply a custom function on all combination of rows in matrix to get a new matrix?

I have the following function, which applies the histogram intersection kernel to 2 arrays:
def histogram_intersection_kernel(X, Y):
    k = np.array([])
    for x_i, y_i in zip(X, Y):
        k = np.append(k, np.minimum(x_i, y_i))
    return np.sum(k)
Now, let's say I have the following matrix, mat:
[[1,0,0,2,3],
[2,3,4,0,1],
[3,3,5,0,1]]
I would like to find an efficient way to get the matrix that is the result of applying "histogram_intersection_kernel" to all of the combinations of rows in mat. In this example it would be:
[[6,2,2],
[2,10,10],
[2,10,12]]
Extend the dimensions to 3D and leverage broadcasting (here a is mat as a NumPy array) -
np.minimum(a[:,None,:],a[None,:,:]).sum(axis=2)
Or simply -
np.minimum(a[:,None],a).sum(2)
Sample run -
In [248]: a
Out[248]:
array([[1, 0, 0, 2, 3],
[2, 3, 4, 0, 1],
[3, 3, 5, 0, 1]])
In [249]: np.minimum(a[:,None],a).sum(2)
Out[249]:
array([[ 6, 2, 2],
[ 2, 10, 10],
[ 2, 10, 12]])
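The broadcasting version builds an intermediate array of shape (n, n, d), which can get large when there are many rows. One trade-off (a sketch, not from the answer above; the function name histogram_intersection_matrix is hypothetical) is to loop over rows and broadcast each row against the whole matrix, so each step only needs an (n, d) temporary:

```python
import numpy as np

def histogram_intersection_matrix(a):
    """Hypothetical helper: pairwise histogram intersection between all rows of a 2D array."""
    a = np.asarray(a)
    # For each row: element-wise minimum against every row, summed along the feature axis
    return np.array([np.minimum(row, a).sum(axis=1) for row in a])

mat = [[1, 0, 0, 2, 3],
       [2, 3, 4, 0, 1],
       [3, 3, 5, 0, 1]]
print(histogram_intersection_matrix(mat))
# [[ 6  2  2]
#  [ 2 10 10]
#  [ 2 10 12]]
```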

TensorFlow tf.data.Dataset and bucketing

For an LSTM network, I've seen great improvements with bucketing.
I've come across the bucketing section in the TensorFlow docs (tf.contrib).
However, my network uses the tf.data.Dataset API; specifically, I'm working with TFRecords, so my input pipeline looks something like this:
dataset = tf.data.TFRecordDataset(TFRECORDS_PATH)
dataset = dataset.map(_parse_function)
dataset = dataset.map(_scale_function)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.padded_batch(batch_size, padded_shapes={.....})
How can I incorporate the bucketing method into the tf.data.Dataset pipeline?
If it matters, in every record in the TFRecords file I have the sequence length saved as an integer.
Various bucketing use cases using the Dataset API are explained well here.
bucket_by_sequence_length() example:
import tensorflow as tf  # TensorFlow 1.x API: tf.contrib and tf.Session were removed in 2.x
def elements_gen():
    text = [[1, 2, 3], [3, 4, 5, 6, 7], [1, 2], [8, 9, 0, 2]]
    label = [1, 2, 1, 2]
    for x, y in zip(text, label):
        yield (x, y)
def element_length_fn(x, y):
    return tf.shape(x)[0]
dataset = tf.data.Dataset.from_generator(generator=elements_gen,
                                         output_shapes=([None], []),
                                         output_types=(tf.int32, tf.int32))
dataset = dataset.apply(tf.contrib.data.bucket_by_sequence_length(element_length_func=element_length_fn,
                                                                  bucket_batch_sizes=[2, 2, 2],
                                                                  bucket_boundaries=[0, 8]))
batch = dataset.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    for _ in range(2):
        print('Get_next:')
        print(sess.run(batch))
Output:
Get_next:
(array([[1, 2, 3, 0, 0],
[3, 4, 5, 6, 7]], dtype=int32), array([1, 2], dtype=int32))
Get_next:
(array([[1, 2, 0, 0],
[8, 9, 0, 2]], dtype=int32), array([1, 2], dtype=int32))
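To make the output above easier to follow, here is a framework-free sketch of the bucketing idea in plain Python/NumPy. It is only an illustration under simplifying assumptions, not TensorFlow's implementation, and bucket_batches is a hypothetical name: each element goes to the bucket whose [lower, upper) length range contains it, a batch is emitted as soon as its bucket is full, each batch is zero-padded to its own longest sequence, and partially filled buckets are flushed at the end. (In TensorFlow 2.x, where tf.contrib no longer exists, the corresponding transformation is tf.data.experimental.bucket_by_sequence_length; check the docs for your version.)

```python
import bisect
import numpy as np

def bucket_batches(elements, length_fn, boundaries, batch_sizes):
    """Hypothetical helper: group (x, y) pairs into length buckets and yield padded batches."""
    assert len(batch_sizes) == len(boundaries) + 1  # one batch size per bucket
    buckets = [[] for _ in batch_sizes]

    def pad(batch):
        xs, ys = zip(*batch)
        width = max(len(x) for x in xs)  # pad to the longest sequence in this batch
        padded = np.zeros((len(xs), width), dtype=np.int32)
        for row, x in enumerate(xs):
            padded[row, :len(x)] = x
        return padded, np.array(ys, dtype=np.int32)

    for x, y in elements:
        # Bucket i holds lengths in [boundaries[i-1], boundaries[i])
        idx = bisect.bisect_right(boundaries, length_fn(x))
        buckets[idx].append((x, y))
        if len(buckets[idx]) == batch_sizes[idx]:
            yield pad(buckets[idx])
            buckets[idx] = []

    for leftover in buckets:  # flush partially filled buckets
        if leftover:
            yield pad(leftover)

text = [[1, 2, 3], [3, 4, 5, 6, 7], [1, 2], [8, 9, 0, 2]]
label = [1, 2, 1, 2]
for x_batch, y_batch in bucket_batches(zip(text, label), len, [0, 8], [2, 2, 2]):
    print(x_batch, y_batch)
```

With bucket_boundaries=[0, 8], all four sequences (lengths 3, 5, 2 and 4) fall into the middle bucket, which is why the batches above follow the input order and are padded to lengths 5 and 4.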
