I am not able to append labels to my training data - python

When I append my labels I end up with 20580 for the length of y, when what I'm hoping for is 120, which is the number of categories. How can I append the categories to my labels?
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
import random as rand
import time
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Activation, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.optimizers import Adam
config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
sess = tf.compat.v1.Session(config=config)
DATADIR = "C:/Users/samue/Documents/Datasets/DogBreeds/images/Images"
CATEGORIES = os.listdir("C:/Users/samue/Documents/Datasets/DogBreeds/images/Images")
IMG_SIZE = 100
training_data = []
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
            except Exception as e:
                pass
create_training_data()
rand.shuffle(training_data)
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y = np.array(y).reshape(-1,)
print(len(CATEGORIES))
print(len(X))
print(len(y))
The outputs I get at the end are:
120
20580
20580

I think you should step back a little from the implementation details, or even from this specific problem, to understand what is going on. In image classification, the objective is to classify the input, a 2D tensor (or a 3D tensor if it's a multichannel image), by assigning it to a label. The number of labels is finite; you can only classify into a certain number of classes.
To give an example, let's take the MNIST database, a well-known toy dataset used for image classification tasks. Its training set contains 60,000 1x28x28 images representing handwritten digits. Generally speaking, the goal with this dataset is to classify each image into one of 10 labels, corresponding to the digits "0" through "9". So the question in this particular case is: given image X, my model needs to predict a class for this image, either "0", "1", ..., or "9"; there are only 10 possibilities. In supervised learning, we use labels to train the model. For any given input, we need to know the ground truth, i.e. the real class this input belongs to. So you end up with as many labels as there are inputs, because each input is assigned its own label, regardless of the number of unique possible labels.
In your use case, you are working with a total of 120 classes and 20,580 images, i.e. 20,580 unique data inputs. Remember, we need a corresponding ground truth for each one of those images: the real class that image belongs to. So naturally you end up with a total of 20,580 labels as well.
This might have been the source of your confusion: in my terms, a label is different from a class. A class set is a finite set of entities (animals, digits, ...), while a label is the assignment of a particular class from that set to a particular input.
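To make the shapes concrete, here is a minimal sketch (with random stand-in labels, not the asker's actual data) showing why y has one entry per image, and how one-hot encoding makes the 120 classes appear as a second axis rather than a shorter array:
import numpy as np
from tensorflow.keras.utils import to_categorical

num_classes = 120    # number of dog breeds (classes)
num_samples = 20580  # number of images; each image gets its own label

# one integer label per image, so y has length 20580, not 120
y = np.random.randint(0, num_classes, size=num_samples)
print(y.shape)  # (20580,)

# one-hot encoding turns each integer label into a 120-long vector,
# but the first axis is still the number of samples
y_one_hot = to_categorical(y, num_classes=num_classes)
print(y_one_hot.shape)  # (20580, 120)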

I think you are a bit confused. You have a dataset consisting of 120 classes.
For each of those classes you need images characteristic of that class. For example, assume you are building a classifier to distinguish images of dogs from images of cats, so you have 2 classes. You can structure your directories as follows:
source_dir
----------cats_dir
------------------cats first image
------------------cats second image
------------------cats nth image
----------dogs_dir
------------------dogs first image
------------------dogs second image
------------------dogs mth image
For your case you will have 120 subdirectories (class directories) below source_dir, and each should contain the images associated with its class. You appear to have a total of 20,580 images; if they are evenly distributed, that is roughly 171 images per class. Now you want to use these images to train a CNN. You can do it the way you were proceeding, but I recommend against it, because you would load all 20,580 100x100 images into memory at once. That requires a lot of memory, and you are likely to get an OOM (out of memory) error. The way to solve that is to feed the data to your model in batches, for example 32 images at a time. Keras has useful functions to assist you in doing that. If you have the directory structure shown above, you can use ImageDataGenerator.flow_from_directory to feed your images to the model in batches; see the Keras documentation. This function also enables you to use image augmentation to expand the diversity of your dataset. Below is the code I recommend for the dog/cat classification example I mentioned above.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

source_dir = r'c:\temp\cats_and_dogs'
v_split = .2  # set this to determine the percentage of data to allocate to the validation set
data_gen = ImageDataGenerator(rescale=1/255, validation_split=v_split)
train_gen = data_gen.flow_from_directory(source_dir, target_size=(100, 100),
                                         class_mode='categorical', batch_size=32,
                                         subset='training', color_mode='grayscale')
valid_gen = data_gen.flow_from_directory(source_dir, target_size=(100, 100),
                                         class_mode='categorical', batch_size=32,
                                         subset='validation', color_mode='grayscale',
                                         shuffle=False)
When you compile your model, set loss='categorical_crossentropy'. You can then use the two generators above as inputs to model.fit, for example:
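A short sketch of that final step (assuming the generators above and a model already defined; recent tf.keras versions accept generators in fit directly, while older versions use fit_generator):
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_gen, validation_data=valid_gen, epochs=10)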

Related

How do I use the given XML annotation files in my CNN to classify images

I have been learning about Convolutional Neural Networks over the last month and am finally trying to understand how to use annotated images for categorical classification. I am currently using the images/annotations found here:
http://web.mit.edu/torralba/www/indoor.html
After downloading the tar file linked for the annotations, I don't understand how I'm supposed to use the extracted XML files to help my CNN classify images. I don't know whether they need to be formatted another way or just combined somehow with the normal images I have. I have been looking for references on how it is supposed to be done, but I haven't found anything as far as I can tell.
This is my current code that I am using to build my original image set without the annotations.
I would appreciate any guidance on what I need to do.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder
import os
import cv2
import pickle
import random
DATADIR = "C:/Users/cadan/OneDrive/Desktop/IndoorImages/Images"
CATEGORIES = os.listdir(DATADIR)
#CATEGORIES = ["airport_inside","artstudio","auditorium","bakery","bar","bathroom","bedroom","bookstore","bowling","buffet"]
new_shape = len(CATEGORIES)
IMG_SIZE = 100
enc = OneHotEncoder(handle_unknown='ignore', categories = 'auto')
NEW_CATEGORIES = np.array(CATEGORIES).reshape(new_shape,1)
transformed = enc.fit_transform(NEW_CATEGORIES[:]).toarray()
training_data = []
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, transformed[class_num]])
            except Exception as e:
                pass
create_training_data()
random.shuffle(training_data)
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
y = np.array(y)
pickle_out = open("images","wb")
pickle.dump(X, pickle_out)
pickle_out.close()
pickle_out = open("categories","wb")
pickle.dump(y, pickle_out)
pickle_out.close()
It really depends on the task that you want to solve, and your description is not completely clear.
Since you are starting to get into DL, I would suggest you start with a simple classification task where you have a set of images as input and a set of single labels as output (in this case, you can use the categories provided by the dataset). To solve this, you can start with a CNN architecture, for example ResNet. In Keras, you can just import the model architecture and change the top layers to match your desired output shape (that is two lines of code!). I really like the examples given by the Keras community; they include a very good entry point for a simple classification task from scratch.
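For instance, a minimal sketch of that idea with tf.keras (num_classes and the input size are placeholders, not from the question):
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

num_classes = 67  # placeholder: set this to len(CATEGORIES) for your data

# the ResNet backbone without its original classification head
base = ResNet50(weights='imagenet', include_top=False,
                pooling='avg', input_shape=(224, 224, 3))
# the promised "two lines": reuse the backbone, add your own output layer
model = models.Sequential([base, layers.Dense(num_classes, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])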
For your specific dataset, I would go the following way (oversimplified):
Build an XML parser for the image class and load the results into a Pandas DataFrame, one column for the filename and another for the label (see the sketch after this list).
Build the CNN as in the previous links.
Use a Keras ImageDataGenerator from the created Pandas Dataframe.
Train the model using .fit()
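A minimal sketch of the parser-to-DataFrame step (assuming one XML file per image with a filename tag and a class tag; the folder name "annotations" and the tag names are assumptions, so adjust them to the actual annotation schema):
import os
import xml.etree.ElementTree as ET
import pandas as pd

rows = []
for xml_file in os.listdir("annotations"):            # hypothetical folder of XML files
    root = ET.parse(os.path.join("annotations", xml_file)).getroot()
    filename = root.findtext("filename")              # tag name is an assumption
    label = root.findtext("scene")                    # adjust to the real schema
    rows.append({"filename": filename, "label": label})
df = pd.DataFrame(rows)

# then feed the DataFrame to a generator in batches
from tensorflow.keras.preprocessing.image import ImageDataGenerator
gen = ImageDataGenerator(rescale=1/255).flow_from_dataframe(
    df, directory="Images", x_col="filename", y_col="label",
    target_size=(100, 100), class_mode="categorical", batch_size=32)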

Neural network returns three classes while there are actually 2

I would like to ask what is almost my final question, related to my previous questions.
Here is the problem description:
I have two categories (dogs and cats). Below is the code for reading the data into a list and converting it to a NumPy array.
This mounts Google Drive:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)
Importing all necessary libraries (I don't actually need glob, but let it stay):
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
import glob
This just demonstrates reading and displaying images, for later use:
#Set main directory and also categories. read the images
MainDirectory ="/content/drive/My Drive/Colab Notebooks/2020YearDeepLearning/Animals/PetImages/"
Categories =["Dog","Cat"]
for category in Categories:
    path = os.path.join(MainDirectory, category)
    print(path)
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
        plt.imshow(img_array, cmap="gray")
        plt.show()
        break
    break
Demonstration of resizing an image:
IMG_SIZE=70
img_array =cv2.resize(img_array,(IMG_SIZE,IMG_SIZE))
plt.imshow(img_array,cmap='gray')
plt.show()
Now the actual code, which reads the data and the labels (dog is 0 and cat is 1) and puts them into an array:
#Create a training Data
training_data =[]
for category in Categories:
    path = os.path.join(MainDirectory, category)
    class_num = Categories.index(category)
    for img in os.listdir(path):
        try:
            img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
            img_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            training_data.append([img_array, class_num])
        except Exception as e:
            pass
After that I just shuffled the data:
import random
random.shuffle(training_data)
Separating the data into X and y and converting to NumPy arrays with the corresponding reshaping:
X =[]
y =[]
for features, label in training_data:
    X.append(features)
    y.append(label)
X =np.array(X).reshape(-1,IMG_SIZE,IMG_SIZE,1)
y =np.array(y)
To demonstrate that there are only two possible values for y (dog is 0 and cat is 1):
print(np.unique(y))  # returns [0, 1]
Now the actual model code:
#create simple convolutional neural network
#normalize data and load all necessary libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,Flatten,Conv2D,MaxPool2D,Activation
X =X/255.0
model =Sequential()
model.add(Conv2D(filters=32,kernel_size=(3,3),input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(filters=32,kernel_size=(3,3)))
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(units=32))
model.add(Dense(units=1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
I have trained the model using the following command:
model.fit(X,y,batch_size=16,validation_split=0.1,epochs=10)
Here is an image of the training run (screenshot not reproduced here).
After that, I took random pictures of a cat and a dog and ran the following command (in this example I am using the dog picture):
#for testing
image =cv2.imread("/content/drive/My Drive/Colab Notebooks/2020YearDeepLearning/Animals/test.jpg")
image =cv2.resize(image,(IMG_SIZE,IMG_SIZE))
image =np.array(image).reshape(-1,IMG_SIZE,IMG_SIZE,1)
print(model.predict_classes(image))
The result is this:
[[0]
[0]
[0]]
For the cat I am getting this:
[[1]
[0]
[0]]
Should I get a result with three elements? I mean, an array of three elements? I actually have two classes, right? Please tell me if I am wrong.
Here is what I suspect:
If your image is not grayscale, meaning it has three channels like a normal RGB image would have, then your reshape here, image = np.array(image).reshape(-1, IMG_SIZE, IMG_SIZE, 1), actually gives the returned array the shape (3, IMG_SIZE, IMG_SIZE, 1). That means you feed in three samples, each with 1 channel, when you predict, and of course you get back three results.
Plus, when you load images for training you load them in grayscale, but when you load one for prediction you forgot to do so. This is why your training works but your prediction does not.
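A minimal sketch of a consistent prediction path (reusing the asker's IMG_SIZE and test path; the /255.0 scaling mirrors the X = X/255.0 step from the training code):
import cv2
import numpy as np

IMG_SIZE = 70
# load in grayscale, exactly as during training
image = cv2.imread("/content/drive/My Drive/Colab Notebooks/2020YearDeepLearning/Animals/test.jpg",
                   cv2.IMREAD_GRAYSCALE)
image = cv2.resize(image, (IMG_SIZE, IMG_SIZE))
# one sample, one channel, scaled like the training data
image = image.reshape(1, IMG_SIZE, IMG_SIZE, 1) / 255.0
print(model.predict(image))  # a single probability for the single sample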

Keras image classification network always predicting one class, and stays at 50% accuracy

I've been working on a Keras network to classify images as to whether they contain traffic lights or not, but so far I've had 0 success. I have a dataset of 11000+ images, and for my first test I used 240 images (or rather, text files for each image with the grayscale pixel values). There is only one output - a 0 or 1 saying whether the image contains traffic lights.
However, when I ran the test, it only predicted one class. Given that 53/240 images had traffic lights, it achieved about 79% accuracy just by predicting 0 all the time. I read that this might be down to imbalanced data, so I scaled down to just 4 images: 2 with traffic lights, 2 without.
Even with this test, it still stuck at 50% accuracy after 5 epochs; it's just predicting one class! Similar questions have been answered but I haven't found anything that is working for me :(
Here is the code I am using:
from keras.datasets import mnist
from keras import models
from keras import layers
from keras.utils import to_categorical
import numpy as np
import os
train_images = []
train_labels = []
#The following is just admin tasks - extracting the grayscale pixel values
#from the text files, adding them to the input array. Same with the labels, which
#are extracted from text files and added to output array. Not important to performance.
for fileName in os.listdir('pixels1/'):
    newRead = open(os.path.join('pixels1/', fileName))
    currentList = []
    for pixel in newRead:
        rePixel = int(pixel.replace('\n', ''))/255
        currentList.append(rePixel)
    train_images.append(currentList)
for fileName in os.listdir('labels1/'):
    newRead = open(os.path.join('labels1/', fileName))
    line = newRead.readline()
    train_labels.append(int(line))
train_images = np.array(train_images)
train_labels = np.array(train_labels)
train_images = train_images.reshape((4,13689))
#model
model = models.Sequential()
model.add(layers.Dense(13689, input_dim=13689, activation='relu'))
model.add(layers.Dense(13689, activation='relu'))
model.add(layers.Dense(1, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=1)
I was hoping at the very least it would be able to recognise the images at the end. I really want to move onto running a training session on my full 11,000 examples, but at this point I can't get it to work with 4.
Rough points:
You seem to believe that the number of units in your dense layers should be equal to your data dimension (13689); this is not the case. Change both of them to something smaller (in the range of 100-200); they do not even have to be equal. A model that big is not recommended with your relatively small number of data samples (images).
Since you are in a binary classification setting with a single node in your last layer, you should use activation='sigmoid' for this (last) layer, and compile your model with loss='binary_crossentropy'.
In imaging applications, the first couple of layers are normally convolutional ones; a sketch combining these points follows.
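A minimal sketch of a corrected model (layer sizes are illustrative, not prescriptive; note that 13689 = 117 x 117, so the flattened vectors can be reshaped into 117x117 single-channel images):
from keras import models, layers

model = models.Sequential()
# a small convolutional feature extractor instead of a huge dense first layer
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(117, 117, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))  # single sigmoid unit for binary output
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# the flattened inputs must then be reshaped accordingly:
# train_images = train_images.reshape((-1, 117, 117, 1))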

TensorFlow: Get input batch from numpy array for 3D CNN

I have 3D image (tiff) data, with each volume inside a folder. I want to read the data and make a batch tensor for a convolutional network. I can read the data as NumPy arrays, but I don't know how to make a batch tensor input for the CNN. Here is the code I have:
import os
import tensorflow as tf
import numpy as np
from skimage import io
from matplotlib import pyplot as plt
from pathlib import Path
data_dir = 'C:/Users/myname/Documents/Projects/Segmentation/DeepLearning/L-net/data/'
data_folders = os.listdir(data_dir)
train_input = []
train_output = []
test_input = []
test_output = []
for idx, folder in enumerate(data_folders):
    im = io.imread(data_dir+folder+'/f0.tiff')
    im = im/im.max()
    train_input.append(tf.convert_to_tensor(im, dtype=tf.float32))
    im = io.imread(data_dir+folder+'/g0.tiff')
    im = im/im.max()
    train_output.append(tf.convert_to_tensor(im, dtype=tf.float32))
Since I am using 3D filters for the CNN, the input should be a 5D tensor. Can someone help me with this? Thanks.
With your approach, you have to load all the data into memory at once, and you also have to take care of all the dimensions yourself. I would suggest using Keras flow_from_directory and generators. Keras has the ImageDataGenerator class, which allows users to collect images from directories, resize them to any size, shuffle them, and so on. You can find the documentation on the Keras website.
Download the train and test datasets and extract them into two different folders named "train" and "test". The train folder should contain n folders, each containing images of the respective class. For example, in the Dogs vs Cats dataset, the train folder should have 2 folders, namely "Dog" and "Cats", containing the respective images.
This is an example of how to create a dataset for your model's input:
train_generator = train_datagen.flow_from_directory(
    directory=r"C:/Users/myname/Documents/Projects/Segmentation/DeepLearning/L-net/data/",
    target_size=(224, 224),    # the size of your input images
    color_mode="rgb",          # could be grayscale or rgb
    batch_size=32,             # number of images in each batch
    class_mode="categorical",
    shuffle=True,              # whether to shuffle the images or not
    seed=42                    # random seed for applying random image augmentation
)
You can perform training like this:
STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size
STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=valid_generator,
                    validation_steps=STEP_SIZE_VALID,
                    epochs=10)
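If you prefer to keep the NumPy approach from the question, a minimal sketch of building the 5D batch tensor directly (assuming every volume has the same depth, height, and width, and that train_input holds plain NumPy arrays rather than the already-converted tensors):
import numpy as np
import tensorflow as tf

batch = np.stack(train_input)    # (N, D, H, W): stack the volumes along a new batch axis
batch = batch[..., np.newaxis]   # (N, D, H, W, 1): add the channel axis Conv3D expects
batch_tensor = tf.convert_to_tensor(batch, dtype=tf.float32)  # 5D input tensor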

How to get Keras network to not output all 1s

I have a bunch of screenshots of someone playing a videogame, a simple game I created in Tkinter (example image not reproduced here).
The idea of the game is that the user controls the box at the bottom of the screen in order to dodge the falling balls (they can only dodge left and right).
My goal is to have the neural network output the position of the player on the bottom of the screen. If they're totally on the left, the neural network should output a 0, if they're in the middle, a .5, and all the way right, a 1, and all the values in-between.
My images are 300x400 pixels. I stored my data very simply. I recorded each of the images and position of the player as a tuple for each frame in a 50-frame game. Thus my result was a list in the form [(image, player position), ...] with 50 elements. I then pickled that list.
So in my code I try to create an extremely basic feed-forward network that takes in the image and outputs a value between 0 and 1 representing where the box on the bottom of the image is. But my neural network is only outputting 1s.
What should I change in order to get it to train and output values close to what I want?
Of course, here is my code:
# machine learning code mostly from https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pickle
def pil_image_to_np_array(image):
    '''Takes an image and converts it to a numpy array'''
    # from https://stackoverflow.com/a/45208895
    # all my images are black and white, so I only need one channel
    return np.array(image)[:, :, 0:1]

def data_to_training_set(data):
    # split the list in the form [(frame 1 image, frame 1 player position), ...] into [[all images], [all player positions]]
    inputs, outputs = [list(val) for val in zip(*data)]
    for index, image in enumerate(inputs):
        # convert the PIL images into numpy arrays so Keras can process them
        inputs[index] = pil_image_to_np_array(image)
    return (inputs, outputs)

if __name__ == "__main__":
    # fix random seed for reproducibility
    np.random.seed(7)
    # load data
    # data will be in the form [(frame 1 image, frame 1 player position), (frame 2 image, frame 2 player position), ...]
    with open("position_data1.pkl", "rb") as pickled_data:
        data = pickle.load(pickled_data)
    X, Y = data_to_training_set(data)
    # get the width of the images
    width = X[0].shape[1]  # == 400
    # convert the player position (a value between 0 and the width of the image) to values between 0 and 1
    for index, output in enumerate(Y):
        Y[index] = output / width
    # flatten the image inputs so they can be passed to a neural network
    for index, inpt in enumerate(X):
        X[index] = np.ndarray.flatten(inpt)
    # keras expects an array (not a list) of image-arrays for input to the neural network
    X = np.array(X)
    Y = np.array(Y)
    # create model
    model = Sequential()
    # my images are 300 x 400 pixels, so each input will be a flattened array of 120000 gray-scale pixel values
    # keep it super simple by not having any deep learning
    model.add(Dense(1, input_dim=120000, activation='sigmoid'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    # Fit the model
    model.fit(X, Y, epochs=15, batch_size=10)
    # see what the model is doing
    predictions = model.predict(X, batch_size=10)
    print(predictions)  # this prints all 1s! # TODO fix
EDIT: print(Y) gives me a mix of values (output not reproduced here), so it's definitely not all zeroes.
Of course, a deeper model might give you better accuracy, but considering that your images are simple, a fairly simple (shallow) model with only one hidden layer should give medium to high accuracy. Here are the modifications you need to make this happen:
Make sure X and Y are of type float32 (currently, X is of type uint8):
X = np.array(X, dtype=np.float32)
Y = np.array(Y, dtype=np.float32)
When training a neural network, it is much better to normalize the training data. Normalization helps the optimization process go smoothly and speeds up convergence to a solution. It also prevents large values from causing large, disruptive gradient updates. Usually, the values of each feature in the input data should fall in a small range, two common ranges being [-1,1] and [0,1]. Therefore, to make sure all values fall in the range [-1,1], we subtract from each feature its mean and divide by its standard deviation:
X_mean = X.mean(axis=0)
X -= X_mean
X_std = X.std(axis=0)
X /= X_std + 1e-8 # add a very small constant to prevent division by zero
Note that we are normalizing each feature (i.e. each pixel in this case), not each image. When you want to predict on new data, i.e. in inference or testing mode, you need to subtract X_mean from the test data and divide it by X_std (you should NEVER subtract the test data's own mean or divide by its own standard deviation; rather, use the mean and std of the training data):
X_test -= X_mean
X_test /= X_std + 1e-8
If you apply the changes in points one and two, you might notice that the network no longer predicts only ones or only zeros. Rather, it shows some faint signs of learning and predicts a mix of zeros and ones. This is not bad, but it is far from good, and we have high expectations! The predictions should be much better than a mix of only zeros and ones. Here, you should take into account the (forgotten!) learning rate. Since the network has a relatively large number of parameters for a relatively simple problem (and there are few training samples), you should choose a smaller learning rate to smooth the gradient updates and the learning process:
from keras import optimizers
model.compile(loss='mean_squared_error', optimizer=optimizers.Adam(lr=0.0001))
You would notice the difference: the loss value reaches around 0.01 after 10 epochs, and the network no longer predicts a mix of zeros and ones; rather, the predictions are much more accurate and close to what they should be (i.e. Y).
Don't forget! We have high (logical!) expectations. So, how can we do better without adding any new layers to the network (although, obviously, adding more layers might help)?
4.1. Gather more training data.
4.2. Add weight regularization. Common ones are L1 and L2 regularization (I highly recommend the Jupyter notebooks of the book Deep Learning with Python, written by François Chollet, the creator of Keras; specifically the notebook that discusses regularization).
You should always evaluate your model in a proper and unbiased way. Evaluating it on the training data (that you used to train it) tells you nothing about how well it would perform on unseen (i.e. new or real-world) data points. (For example, consider a model that stores or memorizes all the training data: it would perform perfectly on the training data, but it would be a useless model and perform poorly on new data.) So we should have separate test and training datasets: we train the model on the training data and evaluate it on the test (i.e. new) data. However, in the process of coming up with a good model you perform lots of experiments: for example, you first change the type and number of layers, train the model, and then evaluate it on the test data to make sure it is good. Then you change something else, say the learning rate, train it again, and evaluate it again on the test data... To make it short, these cycles of tuning and evaluation cause a kind of over-fitting on the test data. Therefore, we need a third dataset called validation data (read more: What is the difference between test set and validation set?):
# first shuffle the data to make sure it isn't in any particular order
indices = np.arange(X.shape[0])
np.random.shuffle(indices)
X = X[indices]
Y = Y[indices]
# you have 200 images
# we select 100 images for training,
# 50 images for validation and 50 images for test data
X_train = X[:100]
X_val = X[100:150]
X_test = X[150:]
Y_train = Y[:100]
Y_val = Y[100:150]
Y_test = Y[150:]
# train and tune the model
# you can attempt train and tune the model multiple times,
# each time with different architecture, hyper-parameters, etc.
model.fit(X_train, Y_train, epochs=15, batch_size=10, validation_data=(X_val, Y_val))
# only and only after completing the tuning of your model
# you should evaluate it on the test data for just one time
model.evaluate(X_test, Y_test)
# after you are satisfied with the model performance
# and want to deploy your model for production use (i.e. real world)
# you can train your model once more on the whole data available
# with the best configurations you have found out in your tunings
model.fit(X, Y, epochs=15, batch_size=10)
(Actually, when we have little training data available, it would be wasteful to set aside separate validation and test sets from the available data. In this case, and if the model is not computationally expensive, instead of holding out a fixed validation set one can do K-fold cross-validation, or iterated K-fold cross-validation when there are very few data samples.)
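A minimal sketch of plain K-fold cross-validation with scikit-learn (not part of the code above; build_model is a hypothetical helper that returns a freshly compiled model each time):
from sklearn.model_selection import KFold
import numpy as np

kf = KFold(n_splits=5, shuffle=True, random_state=7)
fold_losses = []
for train_idx, val_idx in kf.split(X):
    model = build_model()  # re-create the model so each fold starts from scratch
    model.fit(X[train_idx], Y[train_idx], epochs=15, batch_size=10, verbose=0)
    fold_losses.append(model.evaluate(X[val_idx], Y[val_idx], verbose=0))
print(np.mean(fold_losses))  # average validation loss across the 5 folds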
It is around 4 AM at the time of writing this answer and I am feeling sleepy, but I would like to mention one more thing that is not directly related to your question: by using the NumPy library and its functionality you can write more concise and efficient code and save yourself a lot of time. So make sure you practice using it, as it is heavily used in the machine learning community and libraries. To demonstrate, here is the same code you wrote, but with more use of NumPy (note that I have not applied all the changes mentioned above in this code):
# machine learning code mostly from https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pickle
def pil_image_to_np_array(image):
    '''Takes an image and converts it to a numpy array'''
    # from https://stackoverflow.com/a/45208895
    # all my images are black and white, so I only need one channel
    return np.array(image)[:, :, 0]

def data_to_training_set(data):
    # split the list in the form [(frame 1 image, frame 1 player position), ...] into [[all images], [all player positions]]
    inputs, outputs = zip(*data)
    inputs = [pil_image_to_np_array(image) for image in inputs]
    inputs = np.array(inputs, dtype=np.float32)
    outputs = np.array(outputs, dtype=np.float32)
    return (inputs, outputs)

if __name__ == "__main__":
    # fix random seed for reproducibility
    np.random.seed(7)
    # load data
    # data will be in the form [(frame 1 image, frame 1 player position), (frame 2 image, frame 2 player position), ...]
    with open("position_data1.pkl", "rb") as pickled_data:
        data = pickle.load(pickled_data)
    X, Y = data_to_training_set(data)
    # get the width of the images
    width = X.shape[2]  # == 400
    # convert the player position (a value between 0 and the width of the image) to values between 0 and 1
    Y /= width
    # flatten the image inputs so they can be passed to a neural network
    X = np.reshape(X, (X.shape[0], -1))
    # create model
    model = Sequential()
    # my images are 300 x 400 pixels, so each input will be a flattened array of 120000 gray-scale pixel values
    # keep it super simple by not having any deep learning
    model.add(Dense(1, input_dim=120000, activation='sigmoid'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    # Fit the model
    model.fit(X, Y, epochs=15, batch_size=10)
    # see what the model is doing
    predictions = model.predict(X, batch_size=10)
    print(predictions)  # this prints all 1s! # TODO fix
