AssertionError when using MirroredStrategy: isinstance(x, dataset_ops.DatasetV2) - python

I am trying to use MirroredStrategy to fit my sequential model on two Titan Xp GPUs. I am using TensorFlow 2.0 alpha on Ubuntu 16.04.
I can successfully run the code snippet from the TensorFlow documentation:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(loss='mse', optimizer='sgd')
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
model.fit(dataset, epochs=2)
model.evaluate(dataset)
However, when I try to train on my own data, a sparse matrix with the shapes below (using the Adam optimizer and binary cross-entropy loss):
Shape X_train: (91422, 65545)
Shape y_train: (91422, 1)
I receive an assertion error in _distribution_standardize_user_data at
assert isinstance(x, dataset_ops.DatasetV2)
In the TensorFlow code, line 2166 in training.py seems to be causing this assertion error.
Can someone explain to me what the problem with my data could be?

I got a similar error when using dataset = strategy.experimental_distribute_dataset(train_dataset) with model.fit(dataset).
After I removed the strategy.experimental_distribute_dataset call, it worked fine. This matches the TF documentation, which says that keras.Model.fit() handles distribution automatically and that we need to distribute the dataset manually only for custom training with tf.GradientTape().
You can go through the official MNIST tutorial for more info.

It seems you are feeding a dataset into model.fit, while model.fit is expecting a numpy.ndarray.
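The assertion itself checks that x is a tf.data.Dataset, so one thing to try is feeding the sparse data through a dataset instead of passing the matrix directly. A dense float32 copy of a (91422, 65545) matrix would take roughly 24 GB, so the rows are better densified one batch at a time. The sketch below is an assumption about how that could look, not a confirmed fix: dense_batches and the RowSliceable stand-in are hypothetical names, and the stand-in only imitates the row slicing and toarray() interface of a SciPy CSR matrix so that the example runs without SciPy.

```python
import numpy as np


def dense_batches(x_sparse, y, batch_size):
    """Yield (x, y) batches, densifying only one slice of rows at a time."""
    n_rows = x_sparse.shape[0]
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        x_batch = x_sparse[start:stop].toarray().astype(np.float32)
        yield x_batch, y[start:stop]


class RowSliceable:
    """Hypothetical stand-in for a sparse matrix: .shape, row slicing, .toarray()."""

    def __init__(self, dense):
        self._dense = np.asarray(dense)

    @property
    def shape(self):
        return self._dense.shape

    def __getitem__(self, rows):
        return RowSliceable(self._dense[rows])

    def toarray(self):
        return self._dense


x = RowSliceable(np.eye(5))
y = np.arange(5).reshape(5, 1)
batches = list(dense_batches(x, y, batch_size=2))
print([xb.shape for xb, _ in batches])  # [(2, 5), (2, 5), (1, 5)]
```

A generator like this can be wrapped with tf.data.Dataset.from_generator and the resulting dataset passed to model.fit. Whether that removes the assertion in TF 2.0 alpha has not been verified here.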

Related

Keras model.fit hangs

Attempting to fit a Keras model on an audio_dataset_from_directory results in the kernel apparently not responding. The following code reproduces my problem (tested in VScode and Jupyter Notebook):
import tensorflow.keras as keras
import pandas as pd
import os
# Create architecture of model
inputs = keras.layers.Input((None, 1))
rnn = keras.layers.SimpleRNN(200)(inputs)
output = keras.layers.Dense(1)(rnn)
# Compile model
model = keras.Model(inputs, output)
model.compile(loss="mean_squared_error")
# Load data
data = pd.read_csv(".\\files\\metadata.csv", index_col="title")
data = keras.utils.audio_dataset_from_directory(
    ".\\files\\songs",
    labels=data["UserLikes"].to_list(),
    label_mode="int",
    ragged=True,
    shuffle=True,
)
# Fit model
model.fit(data, epochs=1, verbose=2)
In this code, data["UserLikes"] (and thus y in the Keras dataset) consists of integers in the range [0, inf). Keras processes each audio file as a Tensor of floats of shape (timesteps, channels=1). The total size of the audio files is merely 320 MB. The goal of the code is to predict the number of likes a song gets.
The result of this code is nothing: every time I run it, the code gets stuck on model.fit. Sometimes the application (i.e., VScode or Jupyter Notebook) even crashes.
Any advice would be greatly appreciated.

Save Keras model for production mode without TensorFlow

I want to save a trained Keras model, so that it can be used in the Django REST backend of an application. I did a lot of research, but it seems there isn't any way to use these models without TensorFlow installed.
So, what is the use of this storage? I don't want to install a heavy library like TensorFlow on the server. I tested saving with pickle and joblib, as well as Keras' own model.save().
Is there a way to load this model without installing TensorFlow and only with Keras itself?
This is a part of my code,
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
xtrain, ytrain = np.array(xtrain), np.array(ytrain)
ytrain = np.reshape(ytrain, (ytrain.shape[0], 1, 1))
model = Sequential()
model.add(LSTM(150, return_sequences=True, input_shape=(xtrain.shape[1], 1)))
model.add(LSTM(150, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(xtrain, ytrain, batch_size=1, epochs=7)
model.save('model.h5')
which normally works perfectly, but if I use the model elsewhere, I get this error:
ModuleNotFoundError: No module named 'tensorflow'
You do not need to use TensorFlow in production. You can take the trained coefficients and apply them with the equivalent functions of your programming language.
Sample: input array, time coefficient matrices, and unboxed system inputs to output, with a feedback system in the box containers.
import numpy as np
import tensorflow as tf
# coefficient_0 ... coefficient_9 are the coefficients taken from the trained model
temp = tf.random.normal([10], 1, 0.2, tf.float32)
temp = np.asarray(temp) * np.asarray([ coefficient_0, coefficient_1, coefficient_2, coefficient_3, coefficient_4, coefficient_5, coefficient_6, coefficient_7, coefficient_8, coefficient_9 ]) #action = actions['up']
temp = tf.nn.softmax(temp)
action = int(np.argmin(temp))
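One way to read the advice above is to export the trained weights once, on a machine that has TensorFlow (for example with model.get_weights()), and to re-implement the forward pass with NumPy on the server. The sketch below covers only the two Dense layers at the end of the model in the question. The weights are made-up random values standing in for the exported arrays, dense_forward is a hypothetical helper name, and the LSTM layers would need their own re-implementation, which is not shown.

```python
import numpy as np


def dense_forward(x, kernel, bias):
    """Linear Dense layer: x @ kernel + bias (no activation, as in the model above)."""
    return x @ kernel + bias


# Hypothetical weights standing in for arrays exported from the trained model.
# Keras stores a Dense kernel with shape (input_dim, units) and a bias with shape (units,).
rng = np.random.default_rng(0)
k1, b1 = rng.normal(size=(150, 25)), np.zeros(25)  # Dense(25) after LSTM(150)
k2, b2 = rng.normal(size=(25, 1)), np.zeros(1)     # Dense(1)

# Placeholder for the output of the last LSTM layer (batch of 4 samples).
lstm_output = rng.normal(size=(4, 150))
prediction = dense_forward(dense_forward(lstm_output, k1, b1), k2, b2)
print(prediction.shape)  # (4, 1)
```

With this approach the server needs only NumPy and the saved arrays, at the cost of keeping the hand-written forward pass in sync with the trained architecture.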

Get summary of tensorflow model

model.summary() gives me this output.
Now, how can I inspect the layers of sequential_1 and sequential_3?
I want the whole model summary, but it only shows two Sequential entries, which means two models are combined. How can I get the summary of both models?
I only have the model.h5 file, nothing else.
Models saved in .h5 format include everything about the model.
To inspect the layer summaries of a model nested inside another model, as in your case, you can extract the layers and call the summary method on each of them, i.e.:
layer_summary = [layer.summary() for layer in loaded_model.layers]
Here is the complete code I used in reproducing your scenario.
import tensorflow as tf
print('Running Tensorflow version {}'.format(tf.__version__)) # Tensorflow 2.1.0
model_path = '/content/keras_model.h5'
loaded_model = tf.keras.models.load_model(model_path)
loaded_model.summary()
inp = loaded_model.input
layer_summary = [layer.summary() for layer in loaded_model.layers]
I've also used the model.h5 file you uploaded.
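If the nesting goes deeper than one level, or if the top-level model mixes sub-models with plain layers, a small recursive walk may be more robust than the list comprehension. The following is a sketch under the assumption that nested models expose a layers attribute while plain layers do not; iter_nested_models is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the example runs without TensorFlow.

```python
from types import SimpleNamespace


def iter_nested_models(model, depth=0):
    """Yield (depth, sub_model) for every layer that itself contains layers."""
    for layer in getattr(model, "layers", []):
        if hasattr(layer, "layers"):  # e.g. a Sequential nested inside a Model
            yield depth, layer
            yield from iter_nested_models(layer, depth + 1)


# Stand-ins that mimic the structure described in the question.
dense = SimpleNamespace(name="dense")
seq_1 = SimpleNamespace(name="sequential_1", layers=[dense])
seq_3 = SimpleNamespace(name="sequential_3", layers=[dense])
top = SimpleNamespace(name="model", layers=[seq_1, seq_3])

found = [(depth, sub.name) for depth, sub in iter_nested_models(top)]
print(found)  # [(0, 'sequential_1'), (0, 'sequential_3')]
```

With a real model, the loop body would call sub.summary() on each yielded sub-model, for example for _, sub in iter_nested_models(loaded_model): sub.summary().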

Warning `tried to deallocate nullptr` when using tensorflow eager execution with tf.keras

As per the tensorflow team suggestion, I'm getting used to tensorflow's eager execution with tf.keras. However, whenever I train a model, I receive a warning (EDIT: actually, I receive this warning repeated many times, more than once per training step, flooding my standard output):
E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
The warning doesn't seem to affect the quality of the training but I wonder what it means and if it is possible to get rid of it.
I use a conda virtual environment with python 3.7 and tensorflow 1.12 running on a CPU. (EDIT: a test with python 3.6 gives the same results.) Minimal code that reproduces the warnings follows. Interestingly, commenting out the line tf.enable_eager_execution() makes the warnings disappear.
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()
N_EPOCHS = 50
N_TRN = 10000
N_VLD = 1000
# the label is positive if the input is a number larger than 0.5
# a little noise is added, just for fun
x_trn = np.random.random(N_TRN)
x_vld = np.random.random(N_VLD)
y_trn = ((x_trn + np.random.random(N_TRN) * 0.02) > 0.5).astype(float)
y_vld = ((x_vld + np.random.random(N_VLD) * 0.02) > 0.5).astype(float)
# a simple logistic regression
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, input_dim=1))
model.add(tf.keras.layers.Activation('sigmoid'))
model.compile(
    optimizer=tf.train.AdamOptimizer(),
    # optimizer=tf.keras.optimizers.Adam(), # doesn't work at all with tf eager execution
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Train model on dataset
model.fit(
    x_trn, y_trn,
    epochs=N_EPOCHS,
    validation_data=(x_vld, y_vld),
)
model.summary()
Quick solutions:
The warning did not appear when I ran the same script with TF 1.11, and the optimization reached the same final validation accuracy on a synthetic dataset.
OR
Suppress the errors/warnings using the native os module (adapted from https://stackoverflow.com/a/38645250/2374160), i.e., set the TensorFlow logging environment variable so that no error messages are shown.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
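The message in the question is logged with the E (error) prefix, which is why level 3 is needed here rather than 2. The variable is generally set before the first import of TensorFlow, as in the snippet above. The helper below is a hypothetical convenience wrapper, written as a sketch with the standard library only; it sets the variable and reports whether TensorFlow had not been imported yet.

```python
import os
import sys

# Meaning of the TF_CPP_MIN_LOG_LEVEL values: each level hides the ones below it.
LOG_LEVELS = {
    "0": "show all messages",
    "1": "hide INFO",
    "2": "hide INFO and WARNING",
    "3": "hide INFO, WARNING and ERROR",
}


def set_tf_cpp_log_level(level="3"):
    """Set TF_CPP_MIN_LOG_LEVEL; return True if TensorFlow was not imported yet."""
    if level not in LOG_LEVELS:
        raise ValueError("level must be one of %s" % sorted(LOG_LEVELS))
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    return "tensorflow" not in sys.modules


set_before_import = set_tf_cpp_log_level("3")
print(os.environ["TF_CPP_MIN_LOG_LEVEL"])  # 3
```

Note that level 3 also hides genuine error messages, so this is better treated as a convenience during experiments than as a permanent setting.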
More info:
Solving this error properly may require familiarity with MKL library calls and how they interface with TensorFlow, which is written in C++ (this is beyond my current TF expertise).
In my case, this memory deallocation error occurred whenever the apply_gradients() method of an optimizer was called. In your script, it is called when the model is being fitted to the training data.
This error is raised from here: tensorflow/core/common_runtime/mkl_cpu_allocator.h
I hope this helps as a temporary solution for convenience.

Variable size inputs for CNTK in Keras

I want to feed a CNN with images of different resolutions using Keras. Thus, I defined the input layer shape as (None,None,3), since the images have 3 channels. My problem is that this works well on TensorFlow, but gives an error on CNTK (and I must use CNTK).
The following python code illustrates my problem:
import numpy as np
from keras.models import Model
from keras.layers import Conv2D, Input
input_layer = Input(shape=(None,None,3),name='input')
x = Conv2D(16,3)(input_layer)
x = Conv2D(16,3)(x)
model = Model(inputs=input_layer, outputs=x)
model.compile('adam','mse')
X = np.random.random((1,32,32,3))
Y = model.predict(X)
print(Y.shape)
If I run it with Keras+TensorFlow it executes nicely; however, changing the Keras backend to CNTK gives the error:
ValueError: Convolution operation requires that kernel dim 3 <= input dim 1.
As far as I could find online, this problem should have been fixed as of CNTK 2.2, but I'm using CNTK 2.5. Any ideas on how I can overcome this issue?
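For reference, the shape this model should produce on a (1, 32, 32, 3) input follows from "valid" convolution arithmetic, since Conv2D uses padding='valid' and a stride of 1 by default. The sketch below is plain Python with a hypothetical helper name, conv2d_valid_output; it also raises when the kernel is larger than the input, which is the kind of check the CNTK error message describes. Reading the message as CNTK treating a free (None) dimension as size 1 is an interpretation, not something confirmed here.

```python
def conv2d_valid_output(size, kernel, stride=1):
    """Spatial output size of a 'valid' (unpadded) convolution."""
    if kernel > size:
        raise ValueError("kernel dim %d > input dim %d" % (kernel, size))
    return (size - kernel) // stride + 1


height = width = 32
for _ in range(2):  # the two Conv2D(16, 3) layers of the model above
    height = conv2d_valid_output(height, 3)
    width = conv2d_valid_output(width, 3)

print((1, height, width, 16))  # (1, 28, 28, 16)
```

Calling conv2d_valid_output(1, 3) raises a ValueError, which mirrors the situation in which a 3x3 kernel is checked against a dimension of size 1.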
