I have a 4D array of input that I would like to normalise using MinMaxScaler. For simplicity, I give an example with the following array:
A = np.array([
[[[0, 1, 2, 3],
[3, 0, 1, 2],
[2, 3, 0, 1],
[1, 3, 2, 1],
[1, 2, 3, 0]]],
[[[9, 8, 7, 6],
[5, 4, 3, 2],
[0, 9, 8, 3],
[1, 9, 2, 3],
[1, 0, -1, 2]]],
[[[0, 7, 1, 2],
[1, 2, 1, 0],
[0, 2, 0, 7],
[-1, 3, 0, 1],
[1, 0, 1, 0]]]
])
A.shape
(3,1,5,4)
In the given example, the array contains 3 input samples, where each sample has the shape (1,5,4). Each column of the input represents 1 variable (feature), so each sample has 4 features.
I would like to normalise the input data, But MinMaxScaler expects a 2D array (n_samples, n_features) like dataframe.
How then do I use it to normalise this input data?
Vectorize the data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
A_sq = np.squeeze(A)
print(A_sq.shape)
# (3, 5, 4)
scaler.fit(np.squeeze(A_sq).reshape(3,-1)) # reshape to (3, 20)
#MinMaxScaler()
You can use the below code to normalize 4D array.
from sklearn.preprocessing import MinMaxScaler, StandardScaler
scaler = MinMaxScaler(feature_range=(0, 1))
def norm(arr):
arrays_list=list()
objects_list=list()
for i in range(arr.shape[0]):
temp_arr=arr[i]
temp_arr=temp_arr[0]
scaler.fit(temp_arr)
temp_arr=scaler.transform(temp_arr)
objects_list.append(scaler)
arrays_list.append([temp_arr])
return objects_list,np.array(arrays_list)
pass the array to the function like
objects,array=norm(A)
it will return a list of MinMax objects and your original array with normalize values.
Output:
" If you want a scaler for each channel, you can reshape each channel of the data to be of shape (10000, 5*5). Each channel (which was previously 5x5) is now a length 25 vector, and the scaler will work. You'll have to transform your evaluation data in the same way with the scalers in channel_scalers."
Maybe this will help, not sure if this is what you're looking for exactly, but...
Python scaling with 4D data
Related
I’m new to python/numpy prorammation.
I have a 3-dimensional array representing an image (x and y axis) and each point of the image is associated with a pixel with its rgb values:
So for example:
a = np.array([[[0, 2, 2], [1, 3, 2]], [[1, 4, 5], [6, 5, 3]]])
I'd like to compute the mean of the R, G, and B color channels over the image.
For exemple (0+1+1+6)/4, (2+3+4+5)/4, (2+2+5+3)/4.
And i have to use the function numpy.mean.
I’ve tested several things by slicing but I feel it’s not the right thing to do: (patch == image)
enter image description here
thx for your help !
Let us consider this array rgb
array([[[1, 1, 1],
[1, 2, 3],
[4, 5, 6]],
[[1, 2, 3],
[0, 0, 0],
[1, 0, 0]]])
Then mean values for each channel we can get it by
import numpy as np
rgb = np.array([[[1,1,1],[1,2,3],[4,5,6]],[1,2,3],[0,0,0],[1,0,0]]])
print(rgb.shape)
rgb_mean = np.mean(np.mean(rgb, axis=1), axis=0)
print(rgb_mean)
rgb_mean = rgb_mean.astype(np.uint8)
print(rgb_mean.shape)
print(rgb_mean)
output
(2, 3, 3)
array([1.33333333, 1.66666667, 2.16666667])
(3,)
array([1, 1, 2], dtype=uint8)
I'm using this library to create a model to learn graphs. Here is the code (from repository):
import numpy as np
from keras_gcn.backend import keras
from keras_gcn import GraphConv
# feature matrix
input_data = np.array([[[0, 1, 2],
[2, 3, 4],
[4, 5, 6],
[7, 7, 8]]])
# adjacency matrix
input_edge = np.array([[[1, 1, 1, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[0, 0, 0, 1]]])
labels = np.array([[[1],
[0],
[1],
[0]]])
data_layer = keras.layers.Input(shape=(None, 3), name='Input-Data')
edge_layer = keras.layers.Input(shape=(None, None), dtype='int32', name='Input-Edge')
conv_layer = GraphConv(units=4, step_num=1, kernel_initializer='ones',
bias_initializer='ones', name='GraphConv')([data_layer, edge_layer])
model = keras.models.Model(inputs=[data_layer, edge_layer], outputs=conv_layer)
model.compile(optimizer='adam', loss='mae', metrics=['mae'])
model.fit([input_data, input_edge], labels)
However, when I run the code I get the following error:
ValueError: Error when checking target: expected GraphConv to have 3 dimensions, but got array with shape (4, 1)
while the shape of labels is (1, 4, 1)
You should encode your labels using onehot-encoder, something like the following:
lables = np.array([[[0, 1],
[1, 0],
[0, 1],
[1, 0]]])
Also number of units in GraphConv layer should be equal to the number of unique labels which is 2 in your case.
I think the issue is mismatch between the shapes of your edge_layer and data_layer.
When you use the function keras.layers.Input you're giving data_layer a shape of shape=(None, 3) and then you're giving edge_layer a shape of shape=(None, None)
Match the shapes and let me know how it goes.
I am trying to calculate a matrix from an array that is inputted.
I would like to be able to input
a = [0,1,2]
in python and would like to reshape it with Numpy such that the result is that the array is in the form of x_i^j at row i and column j,
so for example
the input is:
a = [0,1,2]
and the output should be
[[1,0,0],
[1,1,1],
[1,2,4]]
and I have used the following code
xij = np.matrix([np.power(xi,j) for j in x for xi in x]).reshape(3,3)
[[ 1, 2, 3],
[ 1, 4, 9],
[ 1, 8, 27]]
I assume I'm using the wrong formula for Numpy,
please could you assist me in this to solve the problem.
Thanks in advance
You need to use a range(len(a)) to get the exponents and the correct order of for loops
a = [0,1,2]
xij = np.matrix([np.power(xi,j) for xi in a for j in range(len(a))]).reshape(3,3)
# matrix([[1, 0, 0],
# [1, 1, 1],
# [1, 2, 4]])
With array broadcasting:
In [823]: np.array([0,1,2])**np.arange(3)[:,None]
Out[823]:
array([[1, 1, 1],
[0, 1, 2],
[0, 1, 4]])
In [825]: np.array([1,2,3])**np.arange(1,4)[:,None]
Out[825]:
array([[ 1, 2, 3],
[ 1, 4, 9],
[ 1, 8, 27]])
I have the following function, that applies the histogram intersection kernel for 2 arrays:
def histogram_intersection_kernel(X, Y):
k = np.array([])
for x_i,y_i in zip(X,Y):
k = np.append(k,np.minimum(x_i,y_i))
return np.sum(k)
now, lets say I have the following matrix "mat":
[[1,0,0,2,3],
[2,3,4,0,1],
[3,3,5,0,1]]
I would like to find an efficient way to get the matrix that is the result of applying "histogram_intersection_kernel" to all of the combinations of rows in mat. In this example it would be:
[[6,2,2],
[6,10,10],
[2,10,12]]
Extend dimensions to 3D and leverage broadcasting -
np.minimum(a[:,None,:],a[None,:,:]).sum(axis=2)
Or simply -
np.minimum(a[:,None],a).sum(2)
Sample run -
In [248]: a
Out[248]:
array([[1, 0, 0, 2, 3],
[2, 3, 4, 0, 1],
[3, 3, 5, 0, 1]])
In [249]: np.minimum(a[:,None],a).sum(2)
Out[249]:
array([[ 6, 2, 2],
[ 2, 10, 10],
[ 2, 10, 12]])
For an LSTM network, I've seen great improvements with bucketing.
I've come across the bucketing section in the TensorFlow docs which (tf.contrib).
Though in my network, I am using the tf.data.Dataset API, specifically I'm working with TFRecords, so my input pipeline looks something like this
dataset = tf.data.TFRecordDataset(TFRECORDS_PATH)
dataset = dataset.map(_parse_function)
dataset = dataset.map(_scale_function)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.padded_batch(batch_size, padded_shapes={.....})
How can I incorporate the bucketing method into a the tf.data.Dataset pipeline?
If it matters, in every record in the TFRecords file I have the sequence length saved as an integer.
Various bucketing use cases using Dataset API are explained well here.
bucket_by_sequence_length() example:
def elements_gen():
text = [[1, 2, 3], [3, 4, 5, 6, 7], [1, 2], [8, 9, 0, 2]]
label = [1, 2, 1, 2]
for x, y in zip(text, label):
yield (x, y)
def element_length_fn(x, y):
return tf.shape(x)[0]
dataset = tf.data.Dataset.from_generator(generator=elements_gen,
output_shapes=([None],[]),
output_types=(tf.int32, tf.int32))
dataset = dataset.apply(tf.contrib.data.bucket_by_sequence_length(element_length_func=element_length_fn,
bucket_batch_sizes=[2, 2, 2],
bucket_boundaries=[0, 8]))
batch = dataset.make_one_shot_iterator().get_next()
with tf.Session() as sess:
for _ in range(2):
print('Get_next:')
print(sess.run(batch))
Output:
Get_next:
(array([[1, 2, 3, 0, 0],
[3, 4, 5, 6, 7]], dtype=int32), array([1, 2], dtype=int32))
Get_next:
(array([[1, 2, 0, 0],
[8, 9, 0, 2]], dtype=int32), array([1, 2], dtype=int32))