I've noticed that when doing the following workflow:
load a pre-trained model from keras.applications with weights from ImageNet
fine-tune this model on new data
save the fine-tuned model to an HDF5 file with model.save('file.h5')
re-load the model somewhere else with load_model('file.h5')
The saving and loading steps can take a very long time with some models.
With VGG16, VGG19, or MobileNet, saving and loading complete quickly (a few seconds at most).
However, with NASNet, InceptionV3, or DenseNet121, saving and loading can each take 10 to 30 minutes, as illustrated in the following examples:
import keras
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model, load_model
# VGG16
model_ = keras.applications.vgg16.VGG16(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(model_.output)
x = Dense(16, activation='softmax')(x)
my_model = Model(inputs=model_.input, outputs=x)
my_model.fit(some_data)
my_model.save('file.h5') # takes 2 seconds
load_model('file.h5') # takes 2 seconds
# NASNetMobile
model_ = keras.applications.nasnet.NASNetMobile(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(model_.output)
x = Dense(16, activation='softmax')(x)
my_model = Model(inputs=model_.input, outputs=x)
my_model.fit(some_data)
my_model.save('file.h5') # takes 10 minutes
load_model('file.h5') # takes 5 minutes
# DenseNet121
model_ = keras.applications.densenet.DenseNet121(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(model_.output)
x = Dense(16, activation='softmax')(x)
my_model = Model(inputs=model_.input, outputs=x)
my_model.fit(some_data)
my_model.save('file.h5') # takes 10 minutes
load_model('file.h5') # takes 5 minutes
Monitoring the file from the command line while saving, I can see file.h5 being built up slowly, at around 100 KB per minute in the worst cases, and then, once it reaches about 22 MB, it very quickly completes to the full size (80-100 MB depending on the model).
Is this "standard behaviour", i.e. are such long saving/loading times simply expected because these models are inherently complex, or is something wrong? Also, can anything be done to mitigate this?
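One workaround I am considering (just a sketch, assuming the slowdown comes from serializing the full model topology and optimizer state into one HDF5 file rather than from the weights themselves) is to skip the optimizer state and/or store the architecture and weights separately:
from keras.models import model_from_json

# Option 1: drop the optimizer state from the HDF5 file
my_model.save('file.h5', include_optimizer=False)

# Option 2: save the architecture as JSON and the weights separately
with open('model.json', 'w') as f:
    f.write(my_model.to_json())
my_model.save_weights('weights.h5')

# ...and re-load from the two files
with open('model.json') as f:
    my_model = model_from_json(f.read())
my_model.load_weights('weights.h5')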
Configuration used:
Keras 2.2 with TensorFlow backend
TensorFlow-GPU 1.13
Python 3.6
CUDA 10.1
running on an AWS Deep Learning EC2 pre-configured instance
I'm having a similar experience trying to load a ResNet50 in TF 2.0 and Keras. Not sure what's up, but I see 100% CPU utilization on a single core (out of 64 available CPU cores).
Related
Attempting to fit a Keras model on an audio_dataset_from_directory results in the kernel apparently not responding. The following code reproduces my problem (tested in VS Code and Jupyter Notebook):
import tensorflow.keras as keras
import pandas as pd
import os
# Create architecture of model
inputs = keras.layers.Input((None, 1))
rnn = keras.layers.SimpleRNN(200)(inputs)
output = keras.layers.Dense(1)(rnn)
# Compile model
model = keras.Model(inputs, output)
model.compile(loss="mean_squared_error")
# Load data
data = pd.read_csv(".\\files\\metadata.csv", index_col="title")
data = keras.utils.audio_dataset_from_directory(
    ".\\files\\songs",
    labels=data["UserLikes"].to_list(),
    label_mode="int",
    ragged=True,
    shuffle=True,
)
# Fit model
model.fit(data, epochs=1, verbose=2)
In this code, data["UserLikes"] (and thus y in the Keras dataset) consists of integers in the range [0, inf). Each audio file is processed by Keras as a tensor of floats of shape (timesteps, channels=1). The total size of the audio files is merely 320 MB. The goal of the code is to predict the number of likes a song gets.
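To double-check what the dataset actually yields, I inspect one element before fitting (a quick sketch; the shapes noted in the comments are what I expect, not confirmed output):
# Peek at the dataset structure before calling fit
print(data.element_spec)     # expect a ragged audio spec plus an int label spec
for x, y in data.take(1):
    print(type(x), y.shape)  # x should be a ragged batch of (timesteps, 1) audio tensors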
The result of running this code is nothing: every time I run it, the code gets stuck on model.fit. Sometimes the application (i.e., VS Code or Jupyter Notebook) even crashes.
Any advice would be greatly appreciated.
Thanks to this question, I have mostly been able to solve the problem of TensorFlow allocating memory I didn't want allocated. However, I have recently found that despite using set_session with allow_growth=True, calling model.fit still allocates all of the memory, and I can no longer use it for the rest of my program, even after the function has exited and the model, being a local variable, should no longer hold any allocated memory.
Here is some example code demonstrating this:
from numpy import array
from keras import Input, Model
from keras.layers import Conv2D, Dense, Flatten
from keras.optimizers import SGD
# stops keras/tensorflow from allocating all the GPU's memory immediately
from tensorflow.compat.v1.keras.backend import set_session
from tensorflow.compat.v1 import Session, ConfigProto, GPUOptions
tf_config = ConfigProto(gpu_options=GPUOptions(allow_growth=True))
session = Session(config=tf_config)
set_session(session)
# makes the neural network
def make_net():
    input = Input((2, 3, 3))
    conv = Conv2D(256, (1, 1))(input)
    flattened_input = Flatten()(conv)
    output = Dense(1)(flattened_input)
    model = Model(inputs=input, outputs=output)
    sgd = SGD(0.2, 0.9)
    model.compile(sgd, 'mean_squared_error')
    model.summary()
    return model

def make_data(input_data, target_output):
    input_data.append([[[0 for i in range(3)] for j in range(3)] for k in range(2)])
    target_output.append(0)

def main():
    data_amount = 4096
    input_data = []
    target_output = []
    model = make_net()
    for i in range(data_amount):
        make_data(input_data, target_output)
    model.fit(array(input_data), array(target_output), batch_size=len(input_data))
    return

while True:
    main()
When I run this code with the PyCharm debugger, I find that the GPU RAM used stays at around 0.1 GB until I run model.fit for the first time, at which point memory usage shoots up to 3.2 GB of my 4 GB of GPU RAM. I have also noticed that the memory usage doesn't increase after the first time model.fit is run, and that if I remove the convolutional layer from my network, the memory increase doesn't happen at all.
Could someone please shed some light on my problem?
UPDATE: Setting per_process_gpu_memory_fraction in GPUOptions to 0.1 helps limit the effect in the code included, but not in my actual program. A better solution would still be helpful.
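For reference, here is how I set that fraction (a sketch of just the relevant lines, in the same compat.v1 style as the code above):
# Cap TensorFlow at roughly 10% of GPU memory instead of relying on allow_growth alone
tf_config = ConfigProto(
    gpu_options=GPUOptions(allow_growth=True, per_process_gpu_memory_fraction=0.1)
)
session = Session(config=tf_config)
set_session(session)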
I used to face this problem, and I found a solution from someone whose post I can no longer find; I paste it below. In fact, I found that even with allow_growth=True, TensorFlow seems to use all your memory, so you should simply set an explicit maximum limit.
try this:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, False)
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [
                    tf.config.experimental.VirtualDeviceConfiguration(
                        memory_limit=12288  # set your limit (in MB)
                    )
                ],
            )
        tf.config.experimental.set_visible_devices(gpus[0], "GPU")
        logical_gpus = tf.config.experimental.list_logical_devices("GPU")
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)
Training with SGD on the whole training set in a single batch can (depending on your input data) be very memory-intensive.
Try lowering your batch_size (e.g. 8, 16, or 32).
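For the code in the question, that would look something like this (a sketch; 32 is an arbitrary starting point):
# Train on mini-batches of 32 instead of one batch of 4096 samples
model.fit(array(input_data), array(target_output), batch_size=32)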
Model.summary() gives me this output.
Now how can I inspect the layers of sequential_1 and sequential_3?
I want the whole model summary, but it only shows two Sequential entries, which means two models were combined. So how can I get the summary of both models?
I only have the model.h5 file, nothing else.
A model saved in .h5 format includes everything about the model.
To inspect the layer summaries of a Model nested inside a Model, as in your case, you can extract the layers and then call the summary method on each of them, i.e.:
layer_summary = [layer.summary() for layer in loaded_model.layers]
Here is the complete code I used to reproduce your scenario.
import tensorflow as tf
print('Running Tensorflow version {}'.format(tf.__version__)) # Tensorflow 2.1.0
model_path = '/content/keras_model.h5'
loaded_model = tf.keras.models.load_model(model_path)
loaded_model.summary()
inp = loaded_model.input
layer_summary = [layer.summary() for layer in loaded_model.layers]
I've also used the model.h5 file you uploaded.
I am currently working on a VGG16 model with Keras.
I fine-tune the VGG model with some layers of my own.
After fitting (training) my model, I save it with model.save('name.h5').
It saves without problem.
However, when I try to reload the model with the load_model function, it shows this error:
You are trying to load a weight file containing 17 layers into a model
with 0 layers
Has anyone encountered this problem before?
My Keras version is 2.2.
Here is part of my code ...
from keras.models import Sequential, load_model
from keras.applications.vgg16 import VGG16
from keras.layers import Flatten, Dense, Dropout

vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

global model_2
model_2 = Sequential()
for layer in vgg_model.layers:
    model_2.add(layer)

for layer in model_2.layers:
    layer.trainable = False

model_2.add(Flatten())
model_2.add(Dense(128, activation='relu'))
model_2.add(Dropout(0.5))
model_2.add(Dense(2, activation='softmax'))

model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model_2.fit(x=X_train, y=y_train, batch_size=32, epochs=30, verbose=2)

model_2.save('name.h5')
del model_2
model_2 = load_model('name.h5')
Actually, I do not delete the model and then immediately call load_model; this is just to illustrate my problem.
It seems that this problem is related to the input_shape parameter of the first layer. I had this problem with a wrapper layer (Bidirectional) which did not have an input_shape parameter set. In code:
model.add(Bidirectional(LSTM(units=units, input_shape=(None, feature_size)), merge_mode='concat'))
did not work for loading my old model, because the input_shape is only defined for the LSTM layer, not for the outer wrapper. Instead,
model.add(Bidirectional(LSTM(units=units), input_shape=(None, feature_size), merge_mode='concat'))
worked, because the Bidirectional wrapper layer now has an input_shape parameter. You should check whether the VGG net's input_shape parameter is set, or add a single input layer with the correct input_shape to your model.
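For example (a sketch of the second suggestion, not tested against your exact setup), you could give the Sequential model an explicit input layer before copying the VGG layers into it:
from keras.layers import InputLayer

model_2 = Sequential()
model_2.add(InputLayer(input_shape=(224, 224, 3)))  # explicit input shape for the saved model
for layer in vgg_model.layers:
    model_2.add(layer)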
I spent six hours looking for a solution to apply my trained model.
Finally I rebuilt VGG16 as the model, loaded the .h5 weights I had trained myself, and it worked great!
import numpy as np
from keras import applications
from keras.models import Sequential
from keras.layers import Dense
from keras.preprocessing.image import load_img, img_to_array

weights_model = 'C:/Anaconda/weightsnew2.h5'  # my already trained .h5 weights

vgg = applications.vgg16.VGG16()
cnn = Sequential()
for capa in vgg.layers:
    cnn.add(capa)

cnn.layers.pop()  # drop VGG16's original 1000-class classifier
for layer in cnn.layers:
    layer.trainable = False
cnn.add(Dense(2, activation='softmax'))

cnn.load_weights(weights_model)

def predict(file):
    # longitud, altura: target image width and height, defined elsewhere
    x = load_img(file, target_size=(longitud, altura))
    x = img_to_array(x)
    x = np.expand_dims(x, axis=0)
    array = cnn.predict(x)
    result = array[0]
    respuesta = np.argmax(result)
    if respuesta == 0:
        print("Gato")   # cat
    elif respuesta == 1:
        print("Perro")  # dog
In case anyone is still wondering about this error:
I had the same problem and spent days figuring out what was causing it. I have a copy of my whole code and dataset on another system, where it worked. I noticed that it had something to do with training, because without training the model, saving and loading worked fine.
The only difference between my systems was that I was using tensorflow-gpu on my main system, and for that reason the TensorFlow base version was a little lower (1.14.0 instead of 2.2.0). So all I had to do was use
model.fit_generator()
instead of
model.fit()
before saving it, and it works.
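A minimal sketch of what that looked like in my case (the generator setup here is just an example; your data pipeline may differ):
from keras.preprocessing.image import ImageDataGenerator

# Wrap the existing training arrays in a generator and train with fit_generator
datagen = ImageDataGenerator()
train_gen = datagen.flow(X_train, y_train, batch_size=32)

model_2.fit_generator(train_gen, steps_per_epoch=len(X_train) // 32, epochs=30, verbose=2)
model_2.save('name.h5')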
I'm using Keras with TensorFlow as the backend.
I have one compiled/trained model.
My prediction loop is slow, so I would like to find a way to parallelize the predict_proba calls to speed things up.
I would like to take a list of batches (of data) and then, for each available GPU, run model.predict_proba() over a subset of those batches.
Essentially:
data = [ batch_0, batch_1, ... , batch_N ]
on gpu_0 => return predict_proba(batch_0)
on gpu_1 => return predict_proba(batch_1)
...
on gpu_N => return predict_proba(batch_N)
I know that it's possible in pure TensorFlow to assign ops to a given GPU (https://www.tensorflow.org/tutorials/using_gpu). However, I don't know how this translates to my situation, since I've built/compiled/trained my model using Keras' API.
I had thought that maybe I just needed to use Python's multiprocessing module and start a process per GPU that would run predict_proba(batch_n). I know this is theoretically possible given another SO post of mine: Keras + Tensorflow and Multiprocessing in Python. However, this still leaves me with the dilemma of not knowing how to actually "choose" a GPU to run each process on.
My question boils down to: how does one parallelize prediction for one model in Keras across multiple GPUs when using TensorFlow as Keras' backend?
Additionally, I am curious whether similar parallelization for prediction is possible with only one GPU.
A high level description or code example would be greatly appreciated!
Thanks!
I created a simple example showing how to run a Keras model across multiple GPUs. Basically, multiple processes are created, and each process owns one GPU. To pin a GPU id to a process, setting the environment variable CUDA_VISIBLE_DEVICES (os.environ["CUDA_VISIBLE_DEVICES"]) is a very straightforward way. I hope this Git repo helps; a minimal sketch of the idea follows the link.
https://github.com/yuanyuanli85/Keras-Multiple-Process-Prediction
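The core idea is roughly the following (a sketch, not the repo's exact code; the model path 'model.h5' and the one-process-per-batch layout are assumptions):
import os
from multiprocessing import Process, Queue

def predict_worker(gpu_id, batch, out_queue):
    # Must be set before TensorFlow/Keras initializes in this process,
    # so Keras is imported only inside the worker.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    from keras.models import load_model
    model = load_model('model.h5')  # assumed path to the trained model
    out_queue.put((gpu_id, model.predict(batch)))

if __name__ == '__main__':
    data = [...]  # your list of batches, one per GPU
    results = Queue()
    workers = [Process(target=predict_worker, args=(i, batch, results))
               for i, batch in enumerate(data)]
    for w in workers:
        w.start()
    predictions = [results.get() for _ in workers]  # collect one result per worker
    for w in workers:
        w.join()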
You can use this function to parallelize a Keras model (credits to kuza55).
https://github.com/kuza55/keras-extras/blob/master/utils/multi_gpu.py
from keras.layers import merge
from keras.layers.core import Lambda
from keras.models import Model
import tensorflow as tf
def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
        stride = tf.concat([shape[:1] // parts, shape[1:] * 0], axis=0)
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    # Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:
                inputs = []
                # Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx': i, 'parts': gpu_count})(x)
                    inputs.append(slice_n)

                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]

                # Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # Merge outputs on the CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(merge(outputs, mode='concat', concat_axis=0))

        return Model(input=model.inputs, output=merged)
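Hypothetical usage (my_model and big_batch are placeholder names, not part of the original gist): replicate an already-built Keras model across two GPUs, recompile the parallel version, and predict with a batch size divisible by the GPU count:
# Sketch: wrap an existing compiled model and run prediction split over 2 GPUs
parallel_model = make_parallel(my_model, gpu_count=2)
parallel_model.compile(optimizer='sgd', loss='categorical_crossentropy')
probs = parallel_model.predict(big_batch, batch_size=256)  # 256 splits evenly across 2 GPUs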