How to model data for tensorflow? - python

I have data of the form:
A B C D E F G
1 0 0 1 0 0 1
1 0 0 1 0 0 1
1 0 0 1 0 1 0
1 0 1 0 1 0 0
...
1 0 1 0 1 0 0
0 1 1 0 0 0 1
0 1 1 0 0 0 1
0 1 0 1 1 0 0
0 1 0 1 1 0 0
A,B,C,D are my inputs and E,F,G are my outputs. I wrote the following code in Python using TensorFlow:
from __future__ import print_function
#from random import randint
import numpy as np
import tflearn
import pandas as pd
data,labels =tflearn.data_utils.load_csv('dummy_data.csv',target_column=-1,categorical_labels=False, n_classes=None)
print(data)
# Build neural network
net = tflearn.input_data(shape=[None, 4])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 3, activation='softmax')
net = tflearn.regression(net)
# Define model
model = tflearn.DNN(net)
#Start training (apply gradient descent algorithm)
data_to_array = np.asarray(data)
print(data_to_array.shape)
#data_to_array= data_to_array.reshape(6,9)
print(data_to_array.shape)
model.fit(data_to_array, labels, n_epoch=10, batch_size=3, show_metric=True)
I am getting an error which says:
ValueError: Cannot feed value of shape (3, 6) for Tensor 'InputData/X:0', which has shape '(?, 4)'
I am guessing this is because my input data has 7 columns (0...6), but I want the input layer to take only the first four columns as input and predict the last 3 columns in the data as output. How can I model this?

If the data is a NumPy array, the first 4 columns can be taken with a simple slice:
data[:,0:4]
The : means "all rows", and 0:4 is the range 0, 1, 2, 3, i.e. the first 4 columns.
If the data isn't a NumPy array yet, convert it first (for example with np.asarray) so you can slice it easily.
Here's a related article on numpy slices: Numpy - slicing 2d row or column vector from array
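A minimal sketch of the full split, assuming all seven columns (A to G) have been loaded into a single NumPy array; the rows below are copied from the question. Columns E, F, G become the targets, which fits the 3-unit softmax output layer. Note that the error reports 6 columns rather than 7, which is consistent with tflearn's load_csv(target_column=-1) having already moved column G into labels before the slice is taken.

```python
import numpy as np

# Rows copied from the question; columns are A, B, C, D, E, F, G.
raw = np.array([
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 0, 1, 1, 0, 0],
], dtype=np.float32)

X = raw[:, 0:4]  # columns A, B, C, D -> inputs, matches input_data(shape=[None, 4])
Y = raw[:, 4:7]  # columns E, F, G -> one-hot targets, matches the 3-unit softmax

print(X.shape, Y.shape)  # (5, 4) (5, 3)
```

With the real file, the same slices would then be passed as model.fit(X, Y, n_epoch=10, batch_size=3, show_metric=True). Loading it with something like np.loadtxt('dummy_data.csv', delimiter=',', skiprows=1) assumes the CSV is comma-separated with one header row.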

Related

Stuck in error loop between Data cardinality is ambiguous and shapes are incompatible with my 3d cnn model

I'm attempting to train my model using the train_on_batch function, as the data is too large to load all at once. The shapes of my training data are X.shape = (388, 108, 36, 36, 36) and Y.shape = (388, 108). To make the data clear: there are 388 X and 388 Y training files. Each X file contains 108 3D arrays of shape (36, 36, 36), and for every 3D array there is a corresponding binary label. I'm trying to iterate through these 388 pairs of files one by one and pass each pair to train_on_batch. Below is the CNN model:
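from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv3D, Activation, MaxPool3D, Flatten, Dense, Dropout
model = Sequential()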
model.add(Conv3D(filters=16, kernel_size=(3,3,3), padding='valid', input_shape=(108, 36, 36, 36)))
model.add(Activation('relu'))
model.add(MaxPool3D(pool_size=(2,2,2)))
model.add(Conv3D(32, kernel_size=(3,3,3)))
model.add(Activation('relu'))
model.add(MaxPool3D(pool_size=(2,2,2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(32))
model.add(Activation('relu'))
model.add(Dropout(0.1))
model.add(Dense(2))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
This was my first for loop for trying to input the data:
for i in range(len(X_train)):
    model.train_on_batch(X_train[i], Y_train[i], sample_weight=None)
Which resulted in the following error:
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 108, 36, 36, 36), found shape=(108, 36, 36, 36)
To fix this, I reshaped my data, and the input was then accepted. I made sure the Y data had a matching shape, but then I ended up in an error loop that I can't figure out myself. Here is the reshape, which results in ValueError: Shapes (1, 108) and (1, 2) are incompatible:
for i in range(len(X_train)):
    new_X_train = X_train[i].reshape(1, 108, 36, 36, 36)
    new_Y_train = Y_train[i].reshape(1, 108)
    model.train_on_batch(new_X_train, new_Y_train)
When I apply .astype('float32').reshape((-1,1)) to Y, I get ValueError: Data cardinality is ambiguous. This makes sense to me, because X and Y then no longer contain the same number of samples (1 versus 108).
The output should be 0 or 1: these are CT scan slices, and the model classifies each array as either "nodule" or "non-nodule". For reference, here is what Y_train[0] looks like:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
I've been trying to wrap my head around this for a while. There are many questions that address each of these errors individually, but when I solve "Data cardinality is ambiguous" I get "Shapes are incompatible", and vice versa. I might be missing something; I tried what several threads suggested for each problem, but I can't figure it out. Is it just the format my training files are in?
As it turns out, I had misread a comment in the guide I was following to set up this model. By passing input_shape=(108,36,36,36) instead of (36,36,36,1), I gave the model the wrong input shape: the 108 volumes in each file belong to the batch dimension, and Conv3D expects a trailing channel axis. Once that was fixed, it worked.
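A minimal sketch of the per-file preparation this fix implies, assuming the model is built with input_shape=(36, 36, 36, 1) and keeps the Dense(2) softmax output with categorical_crossentropy. The function name prepare_file and the label vector are hypothetical, and zeros stand in for real CT data. The 108 volumes become the batch, a channel axis is appended to X, and the 0/1 labels are one-hot encoded so Y matches the (batch, 2) output.

```python
import numpy as np


def prepare_file(x_file, y_file, n_classes=2):
    """Hypothetical helper: shape one X/Y file pair for train_on_batch."""
    # (108, 36, 36, 36) -> (108, 36, 36, 36, 1): add the channel axis Conv3D expects.
    x = np.asarray(x_file, dtype=np.float32)[..., np.newaxis]
    # (108,) of 0/1 -> (108, 2) one-hot, matching Dense(2) + softmax.
    y = np.eye(n_classes, dtype=np.float32)[np.asarray(y_file, dtype=np.int64)]
    return x, y


# Hypothetical file with the shapes from the question.
x_file = np.zeros((108, 36, 36, 36), dtype=np.float32)
y_file = np.array([1] * 48 + [0] * 60)  # hypothetical labels, nodules first

x, y = prepare_file(x_file, y_file)
print(x.shape, y.shape)  # (108, 36, 36, 36, 1) (108, 2)
```

Inside the loop, model.train_on_batch(x, y) then receives 108 samples in both X and Y, so the cardinality matches and the label width matches the output layer. Alternatively, the integer labels can be kept as they are if the model is compiled with loss="sparse_categorical_crossentropy".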

ValueError: Shapes (None, 2, 28) and (None, 2) are incompatible // How can I transform two one-hot vectors into one?

I'm working on a classification problem. The data I use is from the Aras dataset. A few lines of the data look like this:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 17
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 17
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 17
The first 19 columns represent binary sensor data. The last two columns represent the activities of the two people who lived in the household where the data was collected.
I have divided the dataset into several pieces because it is large: 30 days with one data point every second.
What I want my model to do: predict what Person A and Person B are doing at any given moment.
Here is my code (X data: columns 1-19; Y data: columns 20-21):
import keras
from keras import losses
from keras import regularizers
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
from keras.utils import to_categorical
import matplotlib.pyplot as plt
from tensorflow.keras import optimizers
batch_size =512
no_epochs = 5
verbosity = 1
x_train=np.loadtxt('x_train.txt')
x_val=np.loadtxt('x_val.txt')
x_test=np.loadtxt('x_test.txt')
y_train=np.loadtxt('y_train.txt')
y_val=np.loadtxt('y_val.txt')
y_test=np.loadtxt('y_test.txt')
y_train_onehot=keras.utils.to_categorical(y_train)
y_val_onehot=keras.utils.to_categorical(y_val)
y_test_onehot=keras.utils.to_categorical(y_test)
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=[19,]))
model.add(Dense(128, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(
                  learning_rate=0.000001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False,
                  name='Adam'),
              metrics=['accuracy'])
model.summary()
history=model.fit(x_train, y_train_onehot, batch_size, epochs=no_epochs,verbose=verbosity, shuffle=True,validation_data=(x_val, y_val_onehot))
Error: ValueError: Shapes (None, 2, 28) and (None, 2) are incompatible
When I don't convert the labels to one-hot format, training runs, but the result doesn't seem useful (I think). The problem is that I get this ValueError at the end. I know it has something to do with each label row containing two one-hot vectors, but I have no idea how to solve it.
--> I tried merging both one-hot vectors into one, but then every row has 729 columns (27*27, one per label combination), and the label data gets too big for Python to finish running the script.
Windows 10
Keras 2.4.3
Tensorflow 2.3.1
Python 3.7.9
I'm new to this whole topic, so please don't be mad at me if my question is stupid.
Your model requires two outputs, one per person. That is impossible with the Sequential API, so create the model with the Functional API instead.
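A minimal Functional API sketch of that idea: one softmax head per resident, sharing the same hidden layers as the question's model. Assumptions: 28 classes per person (inferred from the 28 in the error message), integer activity ids in the two label columns, and sparse_categorical_crossentropy instead of one-hot targets, which avoids both the (None, 2, 28) label shape and the 729-column combined encoding. Random data stands in for x_train and y_train.

```python
import numpy as np
import tensorflow as tf

N_FEATURES = 19  # sensor columns, as in the question
N_CLASSES = 28   # assumption: inferred from the 28 in the error message

# Shared trunk with the same layer sizes as the question's Sequential model.
inputs = tf.keras.Input(shape=(N_FEATURES,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(128, activation="relu")(x)

# One softmax head per resident.
out_a = tf.keras.layers.Dense(N_CLASSES, activation="softmax", name="person_a")(x)
out_b = tf.keras.layers.Dense(N_CLASSES, activation="softmax", name="person_b")(x)
model = tf.keras.Model(inputs=inputs, outputs=[out_a, out_b])

# Integer labels + sparse loss: no one-hot encoding needed for either head.
model.compile(
    optimizer="adam",
    loss=["sparse_categorical_crossentropy", "sparse_categorical_crossentropy"],
    metrics=[["accuracy"], ["accuracy"]],
)

# Hypothetical random data standing in for x_train / y_train.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 2, size=(64, N_FEATURES)).astype("float32")
y_demo = rng.integers(0, N_CLASSES, size=(64, 2))  # column 0: person A, column 1: person B

# Split the two label columns into one target per output head.
model.fit(x_demo, [y_demo[:, 0], y_demo[:, 1]], epochs=1, batch_size=16, verbose=0)

pred_a, pred_b = model.predict(x_demo[:4], verbose=0)
print(pred_a.shape, pred_b.shape)  # (4, 28) (4, 28)
```

With the real data, the call would look like model.fit(x_train, [y_train[:, 0], y_train[:, 1]], validation_data=(x_val, [y_val[:, 0], y_val[:, 1]]), ...), with each prediction head read separately afterwards.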

How to find similar predicted x between 2 models?

I have 2 models implemented with the same algorithm but with different number of features thus 2 different confusion matrix.
I would like to see which predicted items are similar between those 2 and plot the similarity predicted in a Venn diagram.
Answer
import numpy as np
import pandas as pd

data = {"Mod1":[1,0,1,1,0,0,0,1,1,1],"Mod2":[1,0,1,0,1,0,0,1,0,1]}
df = pd.DataFrame(data)
df["Similar"] = np.where(df["Mod1"]==df["Mod2"],1,0)
df.head()
#output
   Mod1  Mod2  Similar
0     1     1        1
1     0     0        1
2     1     1        1
3     1     0        0
4     0     1        0
This should do the job
Visualization
# !pip install matplotlib-venn
import matplotlib.pyplot as plt
from matplotlib_venn import venn2
venn2(subsets = (3, 3, 7), set_labels = ('Mod1', 'Mod2'))
plt.show()
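The subset sizes passed to venn2 above are hardcoded. Here is a sketch of deriving them from the predictions instead, assuming "similar predicted items" means items predicted as positive (1) by each model; that interpretation is an assumption. venn2 takes the sizes in the order (only the first set, only the second set, both).

```python
import pandas as pd

# Same toy predictions as in the answer above.
df = pd.DataFrame({"Mod1": [1, 0, 1, 1, 0, 0, 0, 1, 1, 1],
                   "Mod2": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]})

# Row indices each model predicted as positive.
pos1 = set(df.index[df["Mod1"] == 1])
pos2 = set(df.index[df["Mod2"] == 1])

only_mod1 = len(pos1 - pos2)  # positive for Mod1 only
only_mod2 = len(pos2 - pos1)  # positive for Mod2 only
both = len(pos1 & pos2)       # positive for both models

subsets = (only_mod1, only_mod2, both)
print(subsets)  # (2, 1, 4)
```

These counts can then be plotted with venn2(subsets=subsets, set_labels=('Mod1', 'Mod2')), or the sets can be passed directly with venn2([pos1, pos2], set_labels=('Mod1', 'Mod2')).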

Custom training with my own images using tf.data

I'm new to TensorFlow and I'm having trouble feeding my custom data to a Keras model.
I've followed the guide "Load images" to convert my .jpg files into a tf.data dataset.
Now I have my data converted to (image_batch, label_batch). The image_batch is an EagerTensor with shape (32,224,224,3) and the label_batch is an EagerTensor with shape (32,2).
Then I found the guide "Custom training: walkthrough", but the data in that guide is converted to an EagerTensor with shape (32,4).
I got a warning when executing this code:
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(3,)), # input shape required
tf.keras.layers.Dense(10, activation=tf.nn.relu),
tf.keras.layers.Dense(3)
])
predictions = model(image_batch)
WARNING:tensorflow:Model was constructed with shape (None, 3) for input Tensor("dense_input:0", shape=(None, 3), dtype=float32), but it was called on an input with incompatible shape (32, 224, 224, 3).
How should I adjust my model or what should I do with my data?
EDIT:
The model now works, but with one additional problem.
When I run the following code:
print("Prediction: {}".format(tf.argmax(predictions, axis=1)))
print(" Labels: {}".format(labels_batch))
it prints:
Prediction: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
Labels: [[ True False]
[False True]
[ True False]
[False True]
[ True False]...(omitted)]
But I expected it to print something like:
Prediction: [0 1 0 1 1 1 0 1 0 1 1 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0]
Labels: [2 0 2 0 0 0 1 0 2 0 0 1 1 2 2 2 1 0 1 0 1 2 0 1 1 1 1 0 2 2 0 2]
with Labels as a one-dimensional array of integers.
Is it normal that the predictions are all 1? What should I do?
Your input is 32 images of shape (224, 224, 3), not (3,). Your input shape needs to be (224, 224, 3).
I also notice that your output shape is going to be (224, 224, 3) as well, which won't match your labels. You need to flatten the data at some point, or do something similar.
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(224,224,3)), # input shape required
tf.keras.layers.Dense(10, activation=tf.nn.relu),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(2)
])
The input to a Dense layer should have shape (None, n), where None is the batch size. In your case, if you'd like to use Dense layers, you should first add a Flatten layer, which unrolls your images to the shape (32, 224 * 224 * 3). The code should be:
model = tf.keras.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(10, activation=tf.nn.relu),
tf.keras.layers.Dense(10, activation=tf.nn.relu),
tf.keras.layers.Dense(2)  # one unit per class; label_batch has shape (32, 2)
])
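As a sanity check of the (32, 224 * 224 * 3) shape, here is what Flatten does to the batch, mimicked with a NumPy reshape on a zero-filled stand-in; this is an illustration, not a replacement for the layer. It also shows how many parameters the first Dense(10) layer ends up with after flattening.

```python
import numpy as np

# Hypothetical batch with the same shape as image_batch in the question.
image_batch = np.zeros((32, 224, 224, 3), dtype=np.float32)

# Flatten keeps the batch axis and merges the rest: 224 * 224 * 3 = 150528 features.
flat = image_batch.reshape(image_batch.shape[0], -1)

# Weights plus biases of the first Dense(10) applied to the flattened input.
n_params_first_dense = flat.shape[1] * 10 + 10
print(flat.shape, n_params_first_dense)  # (32, 150528) 1505290
```

About 1.5 million weights in the first layer alone is one reason image models usually put convolutional layers before Flatten.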
For more details please see https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten
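About the EDIT: the walkthrough's labels are integers, while label_batch here holds one-hot booleans. Below is a minimal NumPy sketch of collapsing one-hot labels into integer class ids with argmax, so they can be compared with tf.argmax(predictions, axis=1). The arrays are hypothetical stand-ins; calling .numpy() on the EagerTensors gives NumPy arrays. Also, in the walkthrough the predictions are computed before any training, so a constant prediction such as all 1s at that point is not by itself a sign of a bug.

```python
import numpy as np

# Hypothetical one-hot boolean labels like the first rows of label_batch.
label_batch = np.array([[True, False],
                        [False, True],
                        [True, False],
                        [False, True],
                        [True, False]])
# Hypothetical raw model outputs: one row per image, one column per class.
predictions = np.array([[0.2, 0.9], [0.1, 0.7], [0.4, 0.5], [0.3, 0.8], [0.0, 0.6]])

int_labels = np.argmax(label_batch, axis=1)    # one-hot -> class id per image
pred_classes = np.argmax(predictions, axis=1)  # highest-scoring class per image
accuracy = np.mean(pred_classes == int_labels)

print(int_labels, pred_classes, accuracy)  # [0 1 0 1 0] [1 1 1 1 1] 0.4
```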

Get Top 3 predicted classes from GaussianNB classifier python

I am trying to predict a class using GaussianNB, but I need to get top 3 predicted classes to create a custom score for the prediction.
My training data is (x, y, class): given x and y, the model needs to predict the class.
The test variable contains (x, y) values and testclass contains the class values.
test is a list in the following format:
Index Type Size Value
0 tuple 2 (0.6424, 0.8325)
1 tuple 2 (0.8493, 0.7848)
2 tuple 2 (0.791, 0.4191)
Test class data
Index Type Size Value
0 str 1 1.274e+09
1 str 1 9.5047e+09
Code:
import csv
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
clf_pf = GaussianNB()
clf_pf.fit(train, trainclass)
print(clf_pf.score(test, testclass))
ff = clf_pf.predict_proba(test)
How to get the top 3 predicted classes from above variable ff?
My ff data looks like this:
0 1 2 3 4 5 6 7 8
0 1.80791e-05 0 0.00126251 0 6.38504e-256 0 0 0 0
1 2.89477e-199 1.01093e-06 0 1.1056e-55 0 5.52213e-67 0 0
2 2.47755e-05 0 2.43499e-08 0 1.00392e-239 0 0 0 0
3 2.54941e-161 3.79815e-06 0 1.53516e-40 0 1.63465e-41 0 0
As mentioned in the comments, ff has shape [n_samples, n_classes]. numpy.argsort gives, for each row, the class indices ordered by probability in ascending order, again as a matrix of shape [n_samples, n_classes]. Then take the last three elements of every row ([:, -3:]) and reverse their order ([:, ::-1]) to get the most probable class first:
np.argsort(ff)[:, -3:][:, ::-1]
Note the [:, in the slicing just means "get all the rows".
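np.argsort returns column indices, not class labels. The columns of predict_proba follow the order of the fitted classifier's classes_ attribute, so the indices can be mapped back to labels (here, the testclass strings). Here is a sketch with a hypothetical 2 x 5 probability matrix and hypothetical labels; with the real model, classes would be clf_pf.classes_.

```python
import numpy as np

# Hypothetical predict_proba output: 2 samples x 5 classes.
ff = np.array([[0.10, 0.50, 0.04, 0.30, 0.06],
               [0.60, 0.05, 0.20, 0.11, 0.04]])
# Hypothetical labels; with a fitted GaussianNB use clf_pf.classes_ instead.
classes = np.array(["a", "b", "c", "d", "e"])

top3_idx = np.argsort(ff, axis=1)[:, -3:][:, ::-1]     # column indices, best first
top3_labels = classes[top3_idx]                        # indices -> class labels
top3_probs = np.take_along_axis(ff, top3_idx, axis=1)  # matching probabilities

print(top3_idx)     # [[1 3 0] [0 2 3]]
print(top3_labels)  # [['b' 'd' 'a'] ['a' 'c' 'd']]
```

Keeping top3_probs alongside the labels makes it straightforward to build a custom score that weights the second and third guesses.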
