I'm a beginner in data science and TensorFlow, so, as a test of my "skills", I wanted to try to make an AI that you give a number and that gives back a 28x28 pixel image of that number. Doing it the other way around is certainly possible, so I figured, why not? The code actually runs fine, but the accuracy of the AI is so low that it just returns random pixels. Is there any way to make this AI more accurate, apart from running something like 100 epochs? Here's the code I'm using:
import tensorflow as tf
import tensorflow.keras as tk
import numpy as np
import matplotlib.pyplot as plt
(train_data, train_labels), (test_data, test_labels) = tk.datasets.mnist.load_data()
model = tk.Sequential([
    tk.layers.Dense(64, activation='relu'),
    tk.layers.Dense(64, activation='relu'),
    tk.layers.Dense(784, activation='relu')])
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics=['acc'])
train_data = np.reshape(train_data, (60000, 784))
test_data = np.reshape(test_data, (-1, 784))
model.fit(train_labels, train_data, epochs=10, validation_data=(test_labels, test_data))
result = model.predict([2])
result = np.reshape(result, (28, 28))
plt.imshow(result)
plt.show()
I'm using Google Colab, since I haven't yet been able to install TensorFlow on my computer; maybe it has something to do with that. Thanks in advance for any answers!
This is very much possible, and has resulted in a vast area of research called Generative Adversarial Networks (GANs).
First off, let me list the problems with your approach:
1. You use a single number as input and expect the model to understand it. In practice this does not work. It's better to use a representation called one-hot encoding (see the sketch after this list).
2. For each label, multiple images exist. Mathematically, if the domain X consists of labels (one-hot encoded or not) and the range Y consists of images, the relationship is one-to-many, which can't be learned in a supervised fashion. By its very nature, supervised learning can model only many-to-one relationships or one-to-one relationships (although there is no point in using ML for the latter; a dictionary is a much better choice). Therefore it is easy to predict labels for images, but impossible to generate images for labels with a fully supervised approach, unless you train on only one image per label.
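Here is a minimal sketch of what one-hot encoding looks like in practice; it reuses the tk import alias from the question, and the sample labels are made up for illustration:
import numpy as np
import tensorflow.keras as tk

labels = np.array([5, 0, 4, 2])  # example digit labels
one_hot = tk.utils.to_categorical(labels, num_classes=10)
print(one_hot[3])  # digit 2 -> [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]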
The way GANs solve the problem in (2) is by generating a "probable" image given a set of random values. Variations of GANs (conditional GANs, for instance) let you specify exactly which value to generate.
I suggest reading the paper that introduced GANs (Goodfellow et al., 2014). Then try out a basic GAN before moving on to generating specific numbers.
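For orientation, here is a heavily compressed GAN sketch on MNIST. The layer sizes, optimizer settings, and step count are arbitrary assumptions on my part rather than the paper's setup, and real results need far more tuning:
import numpy as np
import tensorflow as tf

(train_data, _), _ = tf.keras.datasets.mnist.load_data()
train_data = train_data.reshape(-1, 784).astype("float32") / 255.0

latent_dim = 64
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    tf.keras.layers.Dense(784, activation="sigmoid")])  # pixel values in [0, 1]
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(1, activation="sigmoid")])    # real vs. fake
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

discriminator.trainable = False  # freeze it inside the combined model
gan = tf.keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

batch_size = 128
for step in range(1000):
    # train the discriminator on half real, half generated images
    real = train_data[np.random.randint(0, len(train_data), batch_size)]
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake = generator.predict(noise, verbose=0)
    discriminator.train_on_batch(
        np.concatenate([real, fake]),
        np.concatenate([np.ones(batch_size), np.zeros(batch_size)]))
    # train the generator to fool the (frozen) discriminator
    noise = np.random.normal(size=(batch_size, latent_dim))
    gan.train_on_batch(noise, np.ones(batch_size))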
I'm learning CNNs and wondering why my network is stuck at 0% accuracy even after multiple epochs. I'm sharing the entire code, as it's really simple.
I have a dataset of faces with their respective ages, and I'm using Keras and TensorFlow to train a convolutional neural network to predict age.
However, the accuracy always reports as 0%. I'm very new to neural networks; could you tell me what I am doing wrong?
path = "dataset"
pixels = []
age = []
for img in os.listdir(path):
    ages = img.split("_")[0]
    img = cv2.imread(str(path)+"/"+str(img))
    img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    pixels.append(np.array(img))
    age.append(np.array(ages))
age = np.array(age,dtype=np.int64)
pixels = np.array(pixels)
x_train,x_test,y_train,y_test = train_test_split(pixels,age,random_state=100)
input = Input(shape=(200,200,3))
conv1 = Conv2D(70,(3,3),activation="relu")(input)
conv2 = Conv2D(65,(3,3),activation="relu")(conv1)
batch1 = BatchNormalization()(conv2)
pool3 = MaxPool2D((2,2))(batch1)
conv3 = Conv2D(60,(3,3),activation="relu")(pool3)
batch2 = BatchNormalization()(conv3)
pool4 = MaxPool2D((2,2))(batch2)
flt = Flatten()(pool4)
#age
age_l = Dense(128,activation="relu")(flt)
age_l = Dense(64,activation="relu")(age_l)
age_l = Dense(32,activation="relu")(age_l)
age_l = Dense(1,activation="relu")(age_l)
model = Model(inputs=input,outputs=age_l)
model.compile(optimizer="adam",loss=["mse","sparse_categorical_crossentropy"],metrics=['mae','accuracy'])
save = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=2)
Well, you have to decide whether you want a classification model or a regression model. As it stands now, it looks like you are trying to do regression.
Let's start at the outset. Apparently you have a dataset of image files whose names contain the age, something like 27_01.jpg I assume. So you split the name on the "_" to get the age associated with each image file. You then read the image in using cv2 and convert it to RGB. Now, cv2 already returns the image as an array, so you don't need to convert it to an np array; just use
pixels.append(img)
Now, the variable ages is a string, which you want to convert into an integer. So just use the code
ages =int( img.split("_")[0])
This is now a scalar integer value, not an array, so just use
age.append(ages)
You now have two lists, pixels and age. To use them in a model you need to convert them to np arrays, so use
age=np.array(age)
pixels=np.array(pixels)
Now the next thing you want to do is create a train set and a test set using the train_test_split function. Let's assume you want 90% of the dataset for training and 10% for testing, so use
x_train,x_test,y_train,y_test = train_test_split(pixels,age,train_size=.9, shuffle=True, random_state=100)
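Putting those changes together, the loading code would look something like this (this just consolidates the fragments above; the folder name follows the question):
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

path = "dataset"
pixels = []
age = []
for img_name in os.listdir(path):
    ages = int(img_name.split("_")[0])          # age parsed from the file name
    img = cv2.imread(path + "/" + img_name)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    pixels.append(img)                          # cv2 already returns an array
    age.append(ages)
age = np.array(age)
pixels = np.array(pixels)
x_train, x_test, y_train, y_test = train_test_split(
    pixels, age, train_size=.9, shuffle=True, random_state=100)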
Now let's look at your model. This is what decides whether you are doing regression or classification, and you want regression. Your model is OK but needs some changes. You have 4 dense layers; I suspect this will lead to your model over-fitting, so I recommend adding a dropout layer prior to the last layer. Use the code
drop = Dropout(rate=.4, seed=123)(age_l)
age_l = Dense(1, activation="linear")(drop)
Note the activation is set to linear, so the output can take a range of values that can be compared to the integer values of the age array.
Now, when you compile your model you want your loss to be mse, so it measures the error between the model's output and the ages. sparse_categorical_crossentropy is used when you are doing classification, which is NOT what you are doing. As for the metrics, accuracy is used for classification models, so you only want mae. Your compile code should be
model.compile(optimizer="adam",loss="mse",metrics=['mae'])
Now model.fit looks OK, but you should run for more epochs, say 20. When you run your model, look at the training loss and the validation loss. As the training loss decreases, on AVERAGE the validation loss should trend downward as well. If it starts to trend upward, your model is over-fitting; in that case you may want to add an additional dropout layer.
If you run a sufficient number of epochs, your model will at some point stop improving. You can usually get an improvement in performance if you use an adjustable learning rate. Since you are new to this, you may not have experience with callbacks. Callbacks are passed to model.fit and come in many types; see the Keras callbacks documentation. To implement an adjustable learning rate you can use the ReduceLROnPlateau callback. Set it up to monitor the validation loss: if the validation loss fails to reduce for a "patience" number of epochs, the callback reduces the learning rate by the parameter "factor", where

new_learning_rate = current_learning_rate * factor

and factor is a float between 0 and 1.0. My recommended code for this callback is shown below:
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=2, verbose=1)
I also recommend using the EarlyStopping callback. Set it up to monitor validation loss: if the loss fails to reduce for a "patience" number of consecutive epochs, training is halted. Set the parameter restore_best_weights=True; that way, if the callback halts training, it leaves your model with the weights from the epoch that had the lowest validation loss. My recommended code for the callback is shown below:
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                         verbose=1, restore_best_weights=True)
To use the callbacks in model.fit, include the code
save = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=20,
                 callbacks=[rlronp, estop])
By the way, I think I am familiar with this dataset or a similar one. Do not expect a great root mean squared error; I have seen many models for this data and none had a small error margin. Incidentally, if you want to learn machine learning, there is an excellent set of about 200 tutorials by a guy called Gabriel Atkin, under the name Data Everyday, including one dealing with this kind of age dataset.
I set up a network to learn Fashion MNIST in the style of Hands On Machine Learning, page 298. When I ran the code multiple times, the accuracy was slightly different each time. This made me wonder whether the accuracies were normally distributed around some mean, and whether, with enough runs, I could accurately determine the population mean of all runs for those parameters.
So I ran this code, which fits the model 1000 times, each with differently shuffled training data:
from tensorflow.keras import datasets, models, layers
import matplotlib.pyplot as plt
(X_train, y_train), _ = datasets.fashion_mnist.load_data()
final_accuracies = []
for i in range(1000):
    model = models.Sequential(layers=[
        layers.Flatten(input_shape=[28, 28]),
        layers.Dense(300, activation="relu"),
        layers.Dense(100, activation="relu"),
        layers.Dense(10, activation="softmax")])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
    history = model.fit(X_train / 255.0, y_train, epochs=5, validation_split=0.2)
    final_accuracies.append(history.history['val_accuracy'][-1])
plt.hist(final_accuracies, bins=30)
plt.show()
The resulting distribution of accuracies is shown in the attached histogram.
What kind of distribution is this?! It's clearly not normally distributed; it has a much longer tail toward lower accuracies.
Has the statistics of what kind of distribution these accuracies are drawn from been worked out? If so, please enlighten me.
Also, if this question really fits better at stats.stackexchange.com, please kindly let me know and I'll remove it from here and post it there instead.
I do not know the answer to your question specifically. It would be difficult to figure out, because there are numerous sources of randomness at play each time you run a model. One example is weight initialization; others come from operations that shuffle the data, dropout layers, etc. If you use transfer learning, randomness may be included within the model you select. It would take a lot of work to combine all these sources and mathematically derive the distribution. There are a lot of questions on Stack Overflow about how to get repeatable results in TensorFlow. It turns out that to do so, you have to hunt down every source of randomness and eliminate it by setting a "seed" that makes the random activity repeatable each time the model is run. This is no easy task.
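As a starting point, here is a minimal sketch of pinning the most common seed sources. It covers Python, NumPy, and TensorFlow, but it still does not guarantee bit-identical runs, especially on a GPU:
import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)         # Python's built-in RNG
np.random.seed(SEED)      # NumPy (shuffling, noise, etc.)
tf.random.set_seed(SEED)  # TensorFlow's global seed (weight init, dropout)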
https://colab.research.google.com/drive/1EdCL6YXCAvKqpEzgX8zCqWv51Yum2PLO?usp=sharing
Hello,
Above, I'm trying to identify 5 different types of restorations on dental x-rays with TensorFlow. I'm using the official documentation to follow the steps, but now I'm kind of stuck and I need help. Here are my questions:
1- I have my data on my local disk. The TF example at the link above downloads the data from a different repository. When I want to test my images, do I have any other way than to use the code below?
import numpy as np
from keras.preprocessing import image
from google.colab import files
uploaded = files.upload()
# predicting images
for fn in uploaded.keys():
    path = fn
    img = image.load_img(path, target_size=(180, 180))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images)
    print(fn)
    print(classes)
I'm asking this because the official documentation only shows how to test images one by one, like this:
img = keras.preprocessing.image.load_img(
    sunflower_path, target_size=(img_height, img_width)
)
img_array = keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)
2- I'm using the image_dataset_from_directory method, so I don't have a separate validation directory. Is that OK, or should I use ImageDataGenerator? For testing my data, I picked some images at random from all 5 categories by hand and put them in my test folder, which has 5 subfolders since I have that many categories. Is separating the test data into different folders what I am supposed to do for prediction? If yes, how can I load all these 5 folders simultaneously at test time?
3- I'm also supposed to create the confusion matrix, but I couldn't understand how to apply this to my code. Some say to use scikit-learn's confusion matrix, but then I have to define y_true and y_pred values, which I cannot fit into this code. Am I supposed to evaluate 5 different confusion matrices for 5 different predictions, and how?
4- Sometimes I observe that the validation accuracy starts much higher than the training accuracy. Is this unusual? After 3-4 epochs, the training accuracy catches up with the validation accuracy and continues in a more balanced way. I thought this should not be happening; is everything alright?
5- Final question: why does the first epoch take much, much longer than the other epochs? In my setup, it's about 30-40 minutes to complete the first epoch, and then only about a minute or so to complete every other epoch. Is there a way to fix it, or does it always happen the same way?
thanks.
I am no expert in image processing with tf, but let me try to answer as much as possible:
1
I don't really understand this question, because you are using image_dataset_from_directory, which should handle the file loading process for you. So far, what you are doing there looks good to me.
2
Let me cite tf.keras.preprocessing.image_dataset_from_directory:
Then calling image_dataset_from_directory(main_directory, labels='inferred') will return a tf.data.Dataset that yields batches of images from the subdirectories class_a and class_b, together with labels 0 and 1 (0 corresponding to class_a and 1 corresponding to class_b).
And ImageDataGenerator:
Generate batches of tensor image data with real-time data augmentation.
The data will be looped over (in batches).
As your data is handpicked, there is no need for ImageDataGenerator; image_dataset_from_directory returns what you want. If you have test and validation data (which you should), you can use the tf.data.Dataset functions for splitting data into test, train, and validation sets, or let image_dataset_from_directory do the split for you, as sketched below. This can be a bit clunky, but the time spent learning tf.data.Dataset is well spent.
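For example, here is a sketch of getting a train/validation split straight out of image_dataset_from_directory; the directory name, image size, and 80/20 split are assumptions on my part (note the seed must match across the two calls):
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data_dir", validation_split=0.2, subset="training",
    seed=123, image_size=(180, 180))
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data_dir", validation_split=0.2, subset="validation",
    seed=123, image_size=(180, 180))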
3
The confusion matrix gives you the basis for the F1 score, precision, and recall values. It is not limited to binary classification: scikit-learn's confusion_matrix handles any number of classes, producing one row per true class; precision, recall, and F1 are then defined per class (one class versus the rest) and can be averaged. TensorFlow can also calculate recall and precision for you as metrics during training, so if you ask me, use them.
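If you still want the matrix itself, here is a sketch; it assumes test_ds is a batched tf.data.Dataset as returned by image_dataset_from_directory and model is your trained model:
import numpy as np
from sklearn.metrics import confusion_matrix

y_true, y_pred = [], []
for images, labels in test_ds:
    # predicting batch by batch keeps labels and predictions aligned
    # even if the dataset reshuffles between iterations
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(model.predict(images, verbose=0), axis=1))
print(confusion_matrix(y_true, y_pred))  # 5x5 here: one row per true class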
4
Depending on how the data is shuffled and structured, this can be normal. When there are more hard cases in the training data, the model will have more difficulty predicting them correctly; when there are more easy examples in the validation data, the model will do better there, which gives you a higher accuracy at that point. It is indeed an indicator that the classes in your train and validation data might not be equally distributed. Also note that Keras reports training accuracy as a running average over each epoch, while validation accuracy is computed at the end of the epoch, and dropout (if present) is only active during training; both effects can make validation metrics look better in early epochs.
5
tf.data.Dataset loads the data lazily: the files are not read from disk until the training process has started, which results in a very long first epoch (all images are loaded first) and very short subsequent epochs (all images are already there). You can confirm this by checking the GPU usage of your machine during the first epoch; it should often be doing nothing or be very low.
To fix this, you can use .prefetch(n) on your dataset variable. prefetch() makes the dataset fetch the next n batches while the GPU is already doing some calculations. This might speed up the first epoch.
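A sketch of that, letting tf.data.AUTOTUNE pick the buffer size; adding cache() is an extra assumption on my part and only makes sense if the decoded images fit in memory:
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)  # keep decoded images, overlap loading with training
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)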
I'm trying to do text classification with scikit-learn.
I have text that does not classify well. I think I can improve the predictions by adding data I can deduce in the form of an array of integers.
For example, sample 1 would come with [3, 1, 5, 2] and sample 2 would come with [2, 1, 4, 2]. This would also be true of the test data.
The idea is that the classifier could use both the text and the numbers to classify the data.
I've read the scikit-learn documentation and I can't find how to do it. It must be possible, because everything that is classified is, internally, a vector of numbers, so adding another vector of numbers should not be that much of a problem, but I can't figure out how. partial_fit adds more samples; it does not add more information about the existing samples. Is there a way to do what I'm trying to do? I tried to combine GaussianNB with SGDClassifier, but it turns out I don't know how to do that. (Was it a bad idea?)
What should I do?
I think you could add these new features as extra dimensions of your training data. You need to modify the training data by appending your new features before calling SGD.
A simple/naive way would be:
For example, if my training data with two samples were
X = [ [1,2,3], [8,9,0] ]
And my new features for each sample was
new_feature_X = [ [11,22,33] , [77,88,00] ]
My new training data would be:
X_new = [[1,2,3,11,22,33] , [8,9,0,77,88,00]]
Then you call SGD.fit(X_new, labels)
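In NumPy terms this is just a horizontal stack (a sketch; if your text features come from a scikit-learn vectorizer they are usually sparse, in which case scipy.sparse.hstack is the equivalent):
import numpy as np

X = np.array([[1, 2, 3], [8, 9, 0]])                   # original feature vectors
new_feature_X = np.array([[11, 22, 33], [77, 88, 0]])  # hand-built features per sample
X_new = np.hstack([X, new_feature_X])                  # shape (2, 6)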
As far as my SGD knowledge goes, I don't think there is any other way to combine two features.
The idea is that the classifier could use both the text and the
numbers to classify the data.
I find a neural network much more suitable for this. You could use two input layers, one for the text vectors and one for the numbers, and feed them together into a network to get the output.
I tried to combine GaussianNB with SGDClassifier, but it turns out I
don't know how to do that. (Was it a bad idea?)
SGD means stochastic gradient descent. Is it possible to find the gradient of Naive Bayes? What's the corresponding cost function?
What should I do?
Ensemble. Train two separate classifiers, one using your text data and another one for your new handcrafted features, then take the average of their prediction probabilities. You could also train multiple classifiers and take their votes; a minimal sketch appears after the Keras example below.
Try out the MLPClassifier. I used it a while ago and found it works pretty well with text.
Neural networks. It's pretty easy with Keras; see the example below.
Read the research literature. There is a pretty good chance academia has done some work on your dataset; try to read some of it. Google Scholar and Semantic Scholar are great places to find published research.
from keras.layers import Input, Dense,Concatenate
from keras.models import Model
# This returns a tensor
text_input_vec = Input(shape=(784,))
new_numeric_feature = Input(shape=(4,))
# feed your text to a dense layer
dense1 = Dense(64, activation='relu')(text_input_vec)
# feed your numeric feature to another dense layer
dense2 = Dense(64, activation='relu')(new_numeric_feature)
# concatenate/combine the output of both
concat = Concatenate(axis=-1)([dense1,dense2])
# use the above to predict the label of your text. Layer below
# assumes you have 2 classes
predictions = Dense(2, activation='softmax')(concat)
model = Model(inputs=[text_input_vec,new_numeric_feature], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
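For the ensemble option above, here is a minimal scikit-learn sketch; the estimator choices and the X_train/y_train names are placeholders:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier

# voting='soft' averages the predicted class probabilities, so every
# estimator needs predict_proba (hence the log loss for SGDClassifier;
# the loss is named 'log' in older scikit-learn versions)
ensemble = VotingClassifier(
    estimators=[('sgd', SGDClassifier(loss='log_loss')),
                ('mlp', MLPClassifier())],
    voting='soft')
ensemble.fit(X_train, y_train)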
I'm using a CNN to classify images. I have 16 classes and around 3000 images (a very small dataset), and the dataset is unbalanced. I do a 60/20/20 split, with the same percentage of each class in every set. I use weight regularization, and I have experimented with data augmentation (the Keras augmenter, SMOTE, ADASYN), which helps to prevent overfitting.
When I overfit (epoch=350, loss=2), my model performs better (70+% accuracy, and better on other metrics like F1 score) than when I don't overfit (epoch=50, loss=1), where accuracy is around 60%. Accuracy is measured on the TEST set, while the loss is the validation set loss.
Is it really a bad thing to use the overfitted model as the best model, since performance is better on the test set?
I have run the same model with another test set (which was previously part of the train set) and performance is still better (I tried 3 different splits).
EDIT: From what I have read, validation loss is not always the best metric for telling whether a model is overfitting. In my situation, it's better to use validation F1 score and recall: when they start to decrease, the model is probably overfitting.
I still don't understand why validation loss would be a bad metric for model evaluation, given that training loss is what the model uses to learn.
Yes, it is a bad thing to use an overfitted model as your best model. By definition, a model that overfits doesn't really perform well in real-world scenarios, i.e. on images that are not in the training or test set.
To avoid overfitting, use image augmentation to balance and increase the number of training samples. Also try increasing the dropout fraction. I personally use Keras' ImageDataGenerator to augment the images and save them.
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import glob
import numpy as np

# There are other parameters too. Check the link given at the end of the answer
datagen = ImageDataGenerator(
    brightness_range=(0.4, 0.6),
    horizontal_flip=True,
    fill_mode='nearest'
)

num_of_samples_per_image_augmentation = 8
for image_path in glob.glob(path_to_images):
    img = load_img(image_path)
    x = img_to_array(img)          # creating a Numpy array
    x = x.reshape((1,) + x.shape)  # add a batch dimension
    count = 0
    for batch in datagen.flow(x, save_to_dir='augmented-images/preview/fist',
                              save_prefix='fist', save_format='jpg'):
        count += 1
        if count > num_of_samples_per_image_augmentation:
            break
Here is the link to image augmentation parameters using Keras, https://keras.io/preprocessing/image/
Feel free to use other libraries of your comfort.
A few other methods to reduce overfitting:
1) Tweak your CNN model by reducing the number of trainable parameters.
2) Reduce the fully connected layers.
3) Use transfer learning (pre-trained models); a rough sketch follows below.
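For option 3, here is a rough transfer-learning sketch; MobileNetV2 is just one possible pre-trained backbone, and the input size is an assumption (the 16 classes follow the question):
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base.trainable = False  # freeze the pre-trained convolutional weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(16, activation='softmax')])  # 16 classes per the question
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])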