I need to feed an image and a vector sampled from a normal distribution to the network simultaneously. As the image dataset I'm using is too large, I created an ImageDeserializer for that part. But I also need to add a random vector (sampled from the numpy normal distribution) to the input map before feeding it to the network. Is there any way to achieve this?
I also tried:
mb_data = reader_train.next_minibatch(mb_size, input_map=input_map)
mb_data[random_input_node] = np.random.normal(size=(mb_size, 100))
but get the following error:
TypeError: cannot convert value of dictionary to N4CNTK13MinibatchDataE
The problem was solved with the following snippet, which feeds the data to the trainer directly:
mb_data = reader_train.next_minibatch(mb_size, input_map=input_map)
z = np.random.normal(size=(mb_size, 100))
my_trainer.train_minibatch({feature_image: mb_data[image].data, feature_z: z})
Thanks also to #mewahl. Defining a new reader is another suitable way to solve the problem, and I think it would be faster than what I have done.
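When sampling the noise input, note that np.random.normal takes the mean (loc) as its first positional argument, so the shape has to be passed through size. The sketch below only illustrates that difference; the values of mb_size and z_dim are made up, and casting to float32 is an assumption about the dtype the network input expects.

```python
import numpy as np

mb_size, z_dim = 4, 100  # hypothetical values for illustration

# The first positional argument is `loc` (the mean), so this draws ONE scalar
# centred on mb_size, not a batch of vectors.
scalar = np.random.normal(mb_size)

# Passing the shape via `size` gives one noise vector per sample in the minibatch.
# The float32 cast is an assumption about the network's input dtype.
z = np.random.normal(size=(mb_size, z_dim)).astype(np.float32)

print(np.ndim(scalar), z.shape)
```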
I'm trying to create an automated data pre-processing library, and I want to transform string data into numerical data so it can be run through ML algorithms. But I can't seem to reverse it back to its original state, which should be relatively simple given that scikit-learn has a built-in "inverse_transform()" method.
le = LabelEncoder()
def transformCatagorical(data):
    catagorical_data = data.select_dtypes(include=['object']).columns.tolist()
    for cat in catagorical_data:
        transform = le.fit_transform(data[cat].astype(str))
        data[cat] = transform
This is our transformation function which yields good results as shown here:
[Image: transformed data]
But when we try to reverse it using this function:
def reverse(orig, data):
    cols = get_categorical_columns(orig)
    for col in cols:
        data[col] = le.inverse_transform(data[col])
It transforms the data into a completely random, coordinate-like structure. I'm not sure how to explain it without a picture:
[Image: wrongly transformed data]
I've been trying to figure out how/why it's doing this but honestly I'm completely lost. Any help would be appreciated! Thank you!
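One likely cause, assuming the code above is complete: the single shared le is refit on every column, so after the loop it only remembers the classes of the last column it saw, and inverse_transform then decodes every column with that last mapping. A minimal sketch that keeps one encoder per column is shown here; the function names, the encoders dictionary, and the sample DataFrame are illustrative and not part of the original code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def transform_categorical(data):
    """Encode each object column in place and return the fitted encoders by column."""
    encoders = {}
    for col in data.select_dtypes(include=["object"]).columns:
        enc = LabelEncoder()  # a fresh encoder for every column
        data[col] = enc.fit_transform(data[col].astype(str))
        encoders[col] = enc
    return encoders

def reverse(data, encoders):
    """Decode each column with the encoder that was fitted on that same column."""
    for col, enc in encoders.items():
        data[col] = enc.inverse_transform(data[col])

# Made-up example data
df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"],
                   "size": ["S", "L", "M"],
                   "n": [1, 2, 3]})
original = df.copy()

encoders = transform_categorical(df)
reverse(df, encoders)
```

After the round trip, the string columns of df should match those of original, because each column is decoded with its own mapping.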
I'm trying to create a function for a deep learning problem involving satellite image classification. I have searched through a lot of libraries and haven't found what I need. I tried scikit-learn, but I feel that it is not what I need.
Any hints about a specialised function that I may have missed?
The sklearn train_test_split seems to fit all your needs.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
This should do the trick. You can use the permutation array on the X and y data separately if you like.
num_tr, num_va = int(len(data)*0.5), int(len(data)*0.2)
perm = np.random.permutation(len(data))
tr_data = data[perm[:num_tr]]
va_data = data[perm[num_tr:num_tr+num_va]]
te_data = data[perm[num_tr+num_va:]]
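To illustrate using the same permutation on X and y separately, here is a small sketch with toy arrays. The 50/20/30 proportions follow the snippet above; the array contents, the seeded generator, and the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)    # seeded only to make the example repeatable
X = np.arange(20).reshape(10, 2)  # toy features: row i is [2*i, 2*i + 1]
y = np.arange(10)                 # toy labels aligned with the rows of X

num_tr, num_va = int(len(X) * 0.5), int(len(X) * 0.2)
perm = rng.permutation(len(X))

# Slice the permutation once, then index both arrays with the same indices
tr_idx = perm[:num_tr]
va_idx = perm[num_tr:num_tr + num_va]
te_idx = perm[num_tr + num_va:]

X_tr, y_tr = X[tr_idx], y[tr_idx]
X_va, y_va = X[va_idx], y[va_idx]
X_te, y_te = X[te_idx], y[te_idx]
```

Because both arrays are indexed with the same index slices, every feature row stays paired with its label.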
I'm trying to make an image captioning model using the federated learning library provided by tensorflow, but I'm stuck at this error
Input 0 of layer dense is incompatible with the layer: : expected min_ndim=2, found ndim=1.
This is my input_spec:
input_spec=collections.OrderedDict(x=(tf.TensorSpec(shape=(2048,), dtype=tf.float32), tf.TensorSpec(shape=(34,), dtype=tf.int32)), y=tf.TensorSpec(shape=(None), dtype=tf.int32))
The model takes image features as the first input and a list of vocabulary as a second input, but I can't express this in the input_spec variable. I tried expressing it as a list of lists but it still didn't work. What can I try next?
Great question! It looks to me like this error is coming out of TensorFlow proper--indicating that you probably have the correct nested structure, but the leaves may be off. Your input spec looks like it "should work" from TFF's perspective, so it is probably slightly mismatched with the data you have.
The first thing I would try--if you have an example tf.data.Dataset which will be passed in to your client computation, you can simply read input_spec directly off this dataset as the element_spec attribute. This would look something like:
# ds = example dataset
input_spec = ds.element_spec
This is the easiest path. If you have something like "lists of lists of numpy arrays", there is still a way for you to pull this information off the data itself--the following code snippet should get you there:
# data = list of list of numpy arrays
input_spec = tf.nest.map_structure(lambda x: tf.TensorSpec(x.shape, x.dtype), data)
Finally, if you have a list of lists of tf.Tensors, TensorFlow provides a similar function:
# tensor_structure = list of lists of tensors
tf.nest.map_structure(tf.TensorSpec.from_tensor, tensor_structure)
In short, I would recommend not specifying input_spec by hand, but rather letting the data tell you what its input spec should be.
I am trying to use nltk's wrapper for scikit-learn's classifiers. I use this code to train the classifier:
classifier = SklearnClassifier(GaussianNB())
classifier.train(self.training_set)
Where training_set looks like
[({'name':'Alpha Hotel', 'clicks':765, 'zip_code':75025},'no bookings')]
The error I am getting is
TypeError: A sparse matrix was passed, but dense data is required. Use
X.toarray() to convert to a dense numpy array.
I don't know how to convert to a dense array, especially since nltk's documentation for the train method requires "a list of (featureset, label) where each featureset is a dict mapping strings to either numbers, booleans or strings."
You have three features, but only two of them are in numerical format. You should first convert the 'name' feature to a number. If the name variable is categorical, you can encode it in a meaningful manner as described here:
http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
I think your labels are also limited in number, so you can encode them too. The last step is really easy: you just need to convert the nltk format to the numpy array format. Read each feature in a loop and insert your desired features into X (features) and Y (labels):
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
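The loop described above might look roughly like the following sketch. The second training row is made up, and the plain integer codes for names and labels are only one simple choice; they impose an arbitrary order on the names, which is why the encoding page linked above may suggest something more suitable.

```python
import numpy as np

training_set = [
    ({"name": "Alpha Hotel", "clicks": 765, "zip_code": 75025}, "no bookings"),
    ({"name": "Beta Inn", "clicks": 1200, "zip_code": 75001}, "bookings"),  # made-up row
]

# Map each distinct name and each distinct label to an integer code
names = sorted({features["name"] for features, _ in training_set})
labels = sorted({label for _, label in training_set})
name_codes = {name: i for i, name in enumerate(names)}
label_codes = {label: i for i, label in enumerate(labels)}

# Build dense arrays: one row of numeric features per example, one code per label
X = np.array(
    [[name_codes[f["name"]], f["clicks"], f["zip_code"]] for f, _ in training_set],
    dtype=float,
)
Y = np.array([label_codes[label] for _, label in training_set])

print(X.shape, Y)
```

X and Y are then ordinary dense numpy arrays that can be passed to GaussianNB's fit method directly.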
Maybe this is late, but it may help others who run into the same problem (I hit it yesterday).
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
As the error says, the data needs to be converted to a dense array, so I did exactly that:
vector = vectorizer.transform(corpus).toarray()
So just adding .toarray() solves this problem.
;)
When I switched to MultinomialNB or BernoulliNB, neither raised the error, with or without toarray().
Note: don't forget to fit the vectorizer and transform your text into a numeric word representation first.
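Put together, the fix could look like the sketch below. The answer does not say which vectorizer was used, so CountVectorizer is an assumption here, and the corpus and labels are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

# Made-up corpus and labels for illustration
corpus = ["good hotel", "bad hotel", "good stay", "bad stay"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()                    # assumed vectorizer
sparse_counts = vectorizer.fit_transform(corpus)  # SciPy sparse matrix
dense_counts = sparse_counts.toarray()            # dense numpy array

# GaussianNB requires dense input, so it is fitted on the dense array
clf = GaussianNB().fit(dense_counts, labels)

# New text goes through the same fitted vectorizer, then .toarray()
pred = clf.predict(vectorizer.transform(["good hotel"]).toarray())
```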
I am trying to do some classification task with python and SVM.
From the collected data I extracted the feature vectors for each class and created a training set. The feature vectors have n dimensions (39 or more). So, say for 2 classes, I have a set of 39-d feature vectors and a single array of class labels corresponding to each entry in the feature vectors. Currently, I am using mlpy and doing something like this:
import numpy as np
import mlpy
svm=mlpy.Svm('gaussian') #tried a linear kernel too but not having the convergence
instance= np.vstack((featurevector1,featurevector1))
label=np.hstack((np.ones((1,len(featurevector1)),dtype=int),-1*np.ones((1,len(featurevector2)),dtype=int)))
#Assigning a label (+1/-1) to each entry in instance: +1 for entries coming from
#featurevector1 and -1 for entries coming from featurevector2
svm.compute(instance,label) #it converges and outputs 1
svm.predict(testdata) #This one says all class labels are 1, whereas I have test data from both classes
Am I making a mistake here? Or should I use some other library? Please help.
I don't use mlpy, but np.ones((1, len(featurevector1))) should perhaps be just np.ones(len(featurevector1)):
print the .shape of each to see the difference.
(If you have a link to public data anything like yours, could you post it, please?)
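The shape difference mentioned above can be seen in a small numpy-only sketch. The lengths n1 and n2 stand in for len(featurevector1) and len(featurevector2) and are made up; whether mlpy requires the flat form is the answer's suggestion and is not verified here.

```python
import numpy as np

n1, n2 = 3, 2  # hypothetical stand-ins for the two feature vector lengths

# Building the labels from (1, n) arrays gives a 2-D row vector
row = np.hstack((np.ones((1, n1), dtype=int), -1 * np.ones((1, n2), dtype=int)))

# Building them from 1-D arrays gives one label per training sample
flat = np.hstack((np.ones(n1, dtype=int), -1 * np.ones(n2, dtype=int)))

print(row.shape)   # (1, 5)
print(flat.shape)  # (5,)
```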