I'm struggling to load data into model.fit efficiently. My code creates a training_data object with samples and values. samples is a standard Python list of tf.Tensor objects, and values is a list of integers.
When running
model.fit(training_data.samples, training_data.values, epochs=10)
I get an error
ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'tensorflow.python.framework.ops.EagerTensor'>"}), (<class 'list'> containing values of types {"<class 'int'>"})
I can get this to work by pre-converting everything to numpy arrays like this:
s, v = np.asarray(training_data.samples), np.asarray(training_data.values)
model.fit(s, v, epochs=10)
However, this is impossibly slow. Loading the data and doing very heavy preprocessing (signal chunking, FFT, etc.) take about a minute, but then the conversion alone hangs for an hour with just 1800 samples, and I lose patience before actual learning starts. Each tensor's shape is (94, 257), so nothing big.
So what's an efficient way to pass data to model.fit, given that I already have it in memory?
Hi, this is just a suggestion, but try using a generator built on tf.keras.utils.Sequence; whether it helps also depends on what type of data you are using (see the sketch after the links below).
You can look at the example here:
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
I implemented my own here; you can check it out:
https://github.com/edwin-19/custom_keras_generator/blob/master/notebooks/Model%20Comparison.ipynb
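As a rough illustration of the Sequence approach (just a sketch, assuming training_data.samples is a list of (94, 257) tensors and training_data.values is a list of ints as in the question; the batch size is arbitrary):
import numpy as np
import tensorflow as tf

class TrainingSequence(tf.keras.utils.Sequence):
    def __init__(self, samples, values, batch_size=32):
        self.samples = samples      # list of tf.Tensor, each shaped (94, 257)
        self.values = values        # list of int labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.samples) / self.batch_size))

    def __getitem__(self, idx):
        # stack only one batch at a time instead of converting everything up front
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        x = tf.stack(self.samples[lo:hi])
        y = np.asarray(self.values[lo:hi])
        return x, y

# model.fit(TrainingSequence(training_data.samples, training_data.values), epochs=10)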
For me it's a common error when I use the wrong input format (typically when I pass a list).
Try to convert only training_data.values to a numpy array or a tensor if it's a list.
As pointed out in the documentation, the arguments (x, y, validation_data, ...) only accept a limited set of input types, i.e. numpy arrays and tensors:
model.fit
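Building on that, one possible sketch (an assumption on my part: tf.stack may be much faster here than np.asarray, which converts each EagerTensor one by one):
import numpy as np
import tensorflow as tf

x = tf.stack(training_data.samples)      # one (1800, 94, 257) tensor from the list of tensors
y = np.asarray(training_data.values)     # labels are plain ints, so this part is cheap

model.fit(x, y, epochs=10)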
Related
I'm trying to make an image captioning model using the federated learning library provided by tensorflow, but I'm stuck at this error
Input 0 of layer dense is incompatible with the layer: : expected min_ndim=2, found ndim=1.
this is my input_spec:
input_spec=collections.OrderedDict(x=(tf.TensorSpec(shape=(2048,), dtype=tf.float32), tf.TensorSpec(shape=(34,), dtype=tf.int32)), y=tf.TensorSpec(shape=(None), dtype=tf.int32))
The model takes image features as the first input and a list of vocabulary as a second input, but I can't express this in the input_spec variable. I tried expressing it as a list of lists but it still didn't work. What can I try next?
Great question! It looks to me like this error is coming from TensorFlow proper, indicating that you probably have the correct nested structure but the leaves may be off. Your input_spec looks like it "should work" from TFF's perspective, so it is probably slightly mismatched with the data you have.
The first thing I would try: if you have an example tf.data.Dataset that will be passed in to your client computation, you can simply read input_spec directly off this dataset as its element_spec attribute. This would look something like:
# ds = example dataset
input_spec = ds.element_spec
This is the easiest path. If you have something like lists of lists of numpy arrays, there is still a way to pull this information off the data itself; the following code snippet should get you there:
# data = list of list of numpy arrays
input_spec = tf.nest.map_structure(lambda x: tf.TensorSpec(x.shape, x.dtype), data)
Finally, if you have a list of lists of tf.Tensors, TensorFlow provides a similar function:
# tensor_structure = list of lists of tensors
tf.nest.map_structure(tf.TensorSpec.from_tensor, tensor_structure)
In short, I would recommend not specifying input_spec by hand, but rather letting the data tell you what its input spec should be.
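A minimal end-to-end sketch of that advice (the array shapes are taken from the question; the dummy data and variable names are made up for illustration):
import collections
import numpy as np
import tensorflow as tf

# dummy client data with the shapes from the question
features = np.zeros((5, 2048), dtype=np.float32)   # image feature vectors
captions = np.zeros((5, 34), dtype=np.int32)       # tokenized caption sequences
labels = np.zeros((5,), dtype=np.int32)            # targets

ds = tf.data.Dataset.from_tensor_slices(
    collections.OrderedDict(x=(features, captions), y=labels))

# read the spec off the data instead of writing it by hand
input_spec = ds.element_spec
print(input_spec)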
I have a list of numpy arrays which are to be given as input to the neural network model by a generator. However, the shape of the numpy arrays can differ: it is (*, 1), where * is a dynamic number that changes for every image. How can I write this in the generator? This is what I declared in the neural network model:
preds = Input(shape=(None,1),name='preds')
In the generator, I am trying to do something like this:
result = np.zeros((batchsize,None,1))
for i in range(batch_size):
result[i,:,:]=predictions
But this gives me TypeError: 'NoneType' object cannot be interpreted as an integer on the np.zeros((batchsize, None, 1)) call. What is the correct way to give a numpy array a dynamic shape?
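For what it's worth, a numpy array cannot have a None dimension; only the Keras Input shape can. One common workaround (my suggestion, not from the thread) is to size each batch from the actual prediction lengths, padding to the longest one in the batch:
import numpy as np

# hypothetical batch of per-image predictions with varying lengths (*, 1)
batch_preds = [np.random.rand(np.random.randint(5, 20), 1) for _ in range(4)]

# pad to the longest length in this batch instead of using None
max_len = max(p.shape[0] for p in batch_preds)
result = np.zeros((len(batch_preds), max_len, 1))
for i, p in enumerate(batch_preds):
    result[i, :p.shape[0], :] = p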
I need to feed an image and a vector sampled from a normal distribution simultaneously. As the image dataset I'm using is too large, I create an ImageDeserializer for that part. But I also need to add a random vector (sampled from a numpy normal distribution) to the input map before feeding it to the network. Is there any way to achieve this?
I also tested:
mb_data = reader_train.next_minibatch(mb_size, input_map=input_map)
mb_data[random_input_node] = np.random.normal((mb_size, 100))
but get the following error:
TypeError: cannot convert value of dictionary to N4CNTK13MinibatchDataE
The problem was solved with the following snippet to feed data to the trainer:
mb_data = reader_train.next_minibatch(mb_size, input_map=input_map)
z = np.random.normal(size=(mb_size, 100))  # use the size keyword; one 100-dim vector per sample
my_trainer.train_minibatch({feature_image: mb_data[image].data, feature_z: z})
Also thanks to #mewahl: defining a new reader is another suitable way to solve the problem, and I think it must be faster than what I have done.
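For clarity, the noise sampling alone (the 100-dimensional size is taken from the earlier attempt in the question):
import numpy as np

mb_size = 64
# np.random.normal(mb_size) would return a single scalar drawn with mean mb_size;
# the size keyword is what gives one 100-dim vector per sample in the minibatch
z = np.random.normal(size=(mb_size, 100))
print(z.shape)  # (64, 100)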
How do you fix inconsistent numbers of samples when using GaussianNB()? Also, is it possible to pass a pandas DataFrame as an argument to the model.fit function?
The issue is that GaussianNB expects weather to be in the shape (n_samples, n_features). You currently have it as a one-dimensional array, so GaussianNB is interpreting it as 1 sample with 14 features.
To convert to the right shape, you can use weather[:,None] as described in this answer. So, the following should do the trick:
model.fit(weather[:,None], play)
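A small self-contained sketch of that reshape (the 14 encoded weather values and the play labels are invented to make it runnable):
import numpy as np
from sklearn.naive_bayes import GaussianNB

weather = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1])  # 14 encoded samples
play = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1])     # 14 labels

model = GaussianNB()
# weather[:, None] has shape (14, 1): 14 samples, 1 feature
model.fit(weather[:, None], play)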
I am trying to use nltk's wrapper for scikit-learn's classifiers. I use this code to train the classifier:
classifier = SklearnClassifier(GaussianNB())
classifier.train(self.training_set)
Where training_set looks like
[({'name':'Alpha Hotel', 'clicks':765, 'zip_code':75025},'no bookings')]
The error I am getting is
TypeError: A sparse matrix was passed, but dense data is required. Use
X.toarray() to convert to a dense numpy array.
I don't know how to convert to a dense array, especially since nltk's documentation for the train method requires A list of (featureset, label) where each featureset is a dict mapping strings to either numbers, booleans or strings.
You have three features, and only two of them are in numerical format. You should first convert the 'name' feature to a number. If the name variable is categorical, then you can encode it in a meaningful manner as described here:
http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
I think your labels are also limited, so you can encode them too. The last step is really easy: you just need to convert the nltk format to numpy array format. Just read each feature in a loop and then insert your desired features into X (features) and Y (labels); a sketch follows the link below:
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
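A rough sketch of that loop-and-encode step (the second training example and the 'bookings' label are invented so it runs; LabelEncoder is just one way to turn the names and labels into numbers):
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

training_set = [({'name': 'Alpha Hotel', 'clicks': 765, 'zip_code': 75025}, 'no bookings'),
                ({'name': 'Beta Inn', 'clicks': 12, 'zip_code': 75024}, 'bookings')]

names = [feats['name'] for feats, _ in training_set]
labels = [label for _, label in training_set]

name_enc = LabelEncoder().fit(names)
label_enc = LabelEncoder().fit(labels)

# build dense numpy arrays: one row per featureset, one encoded label per row
X = np.array([[name_enc.transform([feats['name']])[0], feats['clicks'], feats['zip_code']]
              for feats, _ in training_set], dtype=float)
Y = label_enc.transform(labels)

GaussianNB().fit(X, Y)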
Maybe it's a bit late, but this may help others who hit the same problem (because I got this problem yesterday).
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
Like the error said, it needs to be converted to a dense array, so I just converted it as the error suggested:
vector = vectorizer.transform(corpus).toarray()
So just adding .toarray() solved this problem.
;)
When I switched to MultinomialNB or BernoulliNB, neither of them raised the error, with or without toarray().
Note: don't forget to fit and transform your text into a word representation (numeric values) first.