I have a pretrained Keras model whose output has dimension [n, 4000] (it classifies into 4000 classes).
I need to make predictions on the test data (300K observations).
But when I call model.predict(X_train), I get an out-of-memory error, because I don't have enough RAM to store a matrix of shape (300K, 4000).
Therefore, it would be logical to convert the model output to a sparse matrix.
But wrapping the predict call in scipy's sparse.csr_matrix (sparse.csr_matrix(model.predict(X_train))) does not work, because the dense prediction is first allocated in RAM and only then converted to a sparse matrix.
I could also predict on one batch of test data at a time and convert each batch inside a for loop.
But that seems to me a suboptimal and very resource-consuming approach.
Please advise: are there any other methods for converting the model output into a sparse matrix?
Isn't there a batch_size parameter in the predict()?
If I understand correctly, n is the number of samples, right?
Assume that your system RAM is enough to hold the entire data but the VRAM is not.
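A minimal sketch of the batch-wise approach mentioned in the question. A softmax output is dense, so a sparse matrix only saves memory if small scores are dropped; keeping the top_k scores per row is an assumption made here, not something the question specifies. predict_fn is a hypothetical stand-in for model.predict, and predict_sparse is an illustrative name.

```python
import numpy as np
from scipy import sparse


def predict_sparse(predict_fn, X, batch_size=1024, top_k=5):
    """Predict in batches, keeping only the top_k scores of each row."""
    blocks = []
    for start in range(0, X.shape[0], batch_size):
        # Only one dense (batch_size, n_classes) block exists at a time.
        probs = np.asarray(predict_fn(X[start:start + batch_size]))
        n_rows, n_classes = probs.shape
        k = min(top_k, n_classes)
        # Column indices of the k largest scores in every row (unordered).
        cols = np.argpartition(probs, n_classes - k, axis=1)[:, n_classes - k:]
        rows = np.repeat(np.arange(n_rows), k)
        vals = probs[rows, cols.ravel()]
        blocks.append(
            sparse.csr_matrix((vals, (rows, cols.ravel())), shape=probs.shape)
        )
    # Stack the per-batch sparse blocks into one (n_samples, n_classes) matrix.
    return sparse.vstack(blocks, format="csr")
```

With a Keras model this would be called as predict_sparse(model.predict, X_test). Note that the batch_size argument of model.predict only controls how the computation is chunked; the returned array is still the full dense result, which is why the loop is needed.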
I have trained a Keras model that reads TFRecords because neither the training data nor the validation data fits into memory. Now I would like to compute a confusion matrix, but the classic approach, which uses numpy arrays or "in-memory" tensors (with tf.math.confusion_matrix, for example), will not work.
Is there a simple way to compute a confusion matrix from TFRecords? Even if the amount of data is huge, building a confusion matrix should not eat too much memory. (Answers saying "No because..." will also be very helpful to me.)
Or will I just have to work with a smaller validation set?
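A confusion matrix is only num_classes x num_classes, so it can be accumulated one batch at a time without ever holding all labels in memory. The sketch below shows the idea in plain numpy; the function name and the (y_true, y_pred) batch format are assumptions for illustration, and the batches could come from any iterator over the validation data.

```python
import numpy as np


def streaming_confusion_matrix(batches, num_classes):
    """Accumulate a confusion matrix from an iterable of (y_true, y_pred)."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for y_true, y_pred in batches:
        # add.at handles repeated (true, pred) pairs within one batch.
        np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm
```

With TensorFlow, the same accumulation could presumably be done by summing the result of tf.math.confusion_matrix over the batches of the dataset.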
I have a model trained in Keras and saved as a .h5 file. The model was trained with single-precision floating-point values on the TensorFlow backend. Now I want to implement a hardware accelerator that performs the convolution operation on a Xilinx FPGA. However, before I decide on the fixed-point bit width to use on the FPGA, I need to evaluate the model accuracy after quantising the weights to 8- or 16-bit numbers. I came across TensorFlow's quantisation tools, but I am not sure how to take the weights from each layer, quantise them and store them in a list of numpy arrays. After all layers are quantised, I want to set the weights of the model to the newly formed quantised weights. Could someone help me do this?
This is what I have tried so far to reduce precision from float32 to float16. Please let me know if this is the correct approach.
import numpy as np

# Assumption: w_orginal is the list of float32 arrays from model.get_weights().
w_fp_16_test = []
for i, w in enumerate(w_orginal):
    print('Shape of index: ' + str(i) + ' array is :')
    print(w.shape)
    # Round every element to float16 in one vectorized step (the original
    # inner loop only covered the first len(w) flattened elements), then cast
    # back to float32 so the array keeps the dtype the model expects.
    w_fp_16_test.append(w.astype(np.float16).astype(np.float32))
Sorry, I'm not familiar with TensorFlow, so I can't give you the code, but maybe my experience with quantizing a Caffe model will be useful.
If I understand you correctly, you have a TensorFlow model (float32) that you want to quantize to int8 and save in a numpy.array.
First, you should read all the weights of each layer, which might be a Python list, a numpy.array or something else; it doesn't matter.
Then, the quantization algorithm will influence the accuracy significantly, so you must choose the best one for your model. However, these algorithms share the same core: the scale. All you need to do is scale all the weights into the range -127 to 127 (int8), like a scale layer without bias, and record the scale factor.
Meanwhile, if you want to implement it on an FPGA, the data should be quantized too. Here we have a new problem: the product of two int8 values needs int16, which overflows int8.
To solve this, we create a new parameter, shift, to shift the int16 result back to int8. Note that the shift parameter won't be a constant 8: suppose we have 0 * 0 = 0; then we don't need to shift the result at all.
The last point to think over is that, if the net is too deep, a layer's result could overflow because of some unreasonable scale parameters, so we can't quantize each layer in isolation without thinking about the other layers.
After the whole net has run on the FPGA, if you want to dequantize int8 to float32, just use the last scale parameter (that of the final result) to do a multiplication or division (depending on how you define the scale).
This is a basic quantization algorithm; others, like tf.quantization, may have higher accuracy. Once you have the quantized model, you can save it in whatever format you like; that's not hard.
P.S. Why numpy? A bin file is the best for an FPGA, isn't it?
And do you have any idea about implementing softmax on an FPGA? I'm confused about it...
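The scale step described above can be sketched in numpy as follows. This is only one simple variant (symmetric, one scale per array, taken from the largest absolute weight); the function names are illustrative, and the shift and cross-layer overflow handling mentioned in the answer are not covered.

```python
import numpy as np


def quantize_int8(w):
    """Scale a float array into [-127, 127] and return (int8 array, scale)."""
    max_abs = float(np.max(np.abs(w)))
    # Guard against an all-zero array, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to float32 using the recorded scale."""
    return q.astype(np.float32) * scale
```

To evaluate accuracy in Keras, one could apply quantize_int8 followed by dequantize to every array from model.get_weights() and pass the resulting list to model.set_weights(); the rounding error per weight is then at most half the scale.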
I have some embedding_vectors and I need to use the following new_embeddings:
new_embeddings = tf.nn.embedding_lookup_sparse(
    params=embedding_vectors,
    sp_ids=some_ids,
    sp_weights=None,
)
The problem is that some_ids is a really big and remarkably sparse 2-D tensor that is constant for the given data. My pipeline evaluates its indices, values and shape, which I feed directly to the some_ids sparse_placeholder in the training loop.
Unfortunately it is very slow. It seems that in every training step some_ids is converted to a dense tensor, which seems really unnecessary and strange. Am I right about this conversion, and is there any alternative to embedding_lookup_sparse?
I find tf.sparse_tensor_dense_matmul() is much faster than tf.nn.embedding_lookup_sparse().
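The reason a sparse-dense matmul can replace the lookup is that a sum-combined sparse lookup equals multiplying a sparse (n_rows, vocab_size) count matrix by the embedding matrix. The sketch below illustrates this equivalence with numpy and scipy rather than TensorFlow; the function names are made up for the illustration, and it assumes a sum combiner (a mean combiner would additionally divide each row by its number of ids).

```python
import numpy as np
from scipy import sparse


def lookup_sum_loop(embeddings, ids_per_row):
    """Reference version: gather and sum the embeddings row by row."""
    return np.stack([embeddings[ids].sum(axis=0) for ids in ids_per_row])


def lookup_sum_matmul(embeddings, ids_per_row):
    """Same result as one sparse (n_rows, vocab) @ dense (vocab, dim) product."""
    lengths = [len(ids) for ids in ids_per_row]
    rows = np.repeat(np.arange(len(ids_per_row)), lengths)
    cols = np.concatenate(ids_per_row)
    data = np.ones(len(cols), dtype=embeddings.dtype)
    # Repeated (row, col) pairs are summed, so a repeated id counts twice.
    S = sparse.csr_matrix(
        (data, (rows, cols)), shape=(len(ids_per_row), embeddings.shape[0])
    )
    return S @ embeddings
```

Since some_ids is constant, the sparse matrix only has to be built once and can be reused in every training step.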
I'm trying to train a linear model on a very large dataset.
The feature space is small but there are too many samples to hold in memory.
I'm calculating the Gram matrix on-the-fly and trying to pass it as an argument to sklearn Lasso (or other algorithms) but, when I call fit, it needs the actual X and y matrices.
Any idea how to use the 'precompute' feature without storing the original matrices?
(My answer is based on the usage of svm.SVC, Lasso may be different.)
I think that you are supposed to pass the Gram matrix instead of X to the fit method.
Also, the Gram matrix has shape (n_samples, n_samples) so it should also be too large for memory in your case, right?
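For Lasso in scikit-learn, the precomputed Gram matrix is the feature Gram X.T @ X of shape (n_features, n_features), not the (n_samples, n_samples) kernel matrix used by SVC, so with a small feature space it does fit in memory. A minimal sketch of accumulating it chunk by chunk, assuming the data arrives as (X_chunk, y_chunk) pairs; the function name is illustrative:

```python
import numpy as np


def accumulate_gram(chunks, n_features):
    """Accumulate X.T @ X, X.T @ y and the sample count over data chunks."""
    gram = np.zeros((n_features, n_features))
    xy = np.zeros(n_features)
    n_samples = 0
    for X_chunk, y_chunk in chunks:
        # Each chunk only contributes small (n_features-sized) summaries.
        gram += X_chunk.T @ X_chunk
        xy += X_chunk.T @ y_chunk
        n_samples += X_chunk.shape[0]
    return gram, xy, n_samples
```

Lasso.fit itself still asks for X and y, which matches what the question observes. If your scikit-learn version provides sklearn.linear_model.lars_path_gram, it is documented as working from Xy, Gram and n_samples alone, which may be the closest fit for these accumulated quantities.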
I'm trying to plug a bunch of data (sentiment-tagged tweets) into an SVM using scikit-learn. I've been using CountVectorizer to build a sparse array of word counts, and it's all working fine with smallish data sets (~5000 tweets). However, when I try to use a larger corpus (ideally 150,000 tweets, but I'm currently exploring with 15,000), .toarray(), which converts the sparse format to a dense format, immediately starts taking up immense amounts of memory (30k tweets hit over 50 GB before the MemoryError).
So my question is -- is there a way to feed LinearSVC() or a different manifestation of SVM a sparse matrix? Am I necessarily required to use a dense matrix? It doesn't seem like a different vectorizer would help fix this problem (as this problem seems to be solved by: MemoryError in toarray when using DictVectorizer of Scikit Learn). Is a different model the solution? It seems like all of the scikit-learn models require a dense array representation at some point, unless I've been looking in the wrong places.
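from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

# data, label and breakpt are defined elsewhere in my script.
cv = CountVectorizer(analyzer=str.split)
clf = svm.LinearSVC()
X = cv.fit_transform(data)
trainArray = X[:breakpt].toarray()
testArray = X[breakpt:].toarray()
clf.fit(trainArray, label)
guesses = clf.predict(testArray)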
LinearSVC.fit and its predict method can both handle a sparse matrix as the first argument, so just removing the toarray calls from your code should work.
All estimators that take sparse inputs are documented as doing so. E.g., the docstring for LinearSVC states:
    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        Training vector, where n_samples in the number of samples and
        n_features is the number of features.
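A small self-contained sketch of the suggestion above, with the toarray calls removed. The toy texts, labels and the split point are made up for illustration.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the tagged tweets.
texts = ["good great fun", "great good", "bad awful", "awful bad sad",
         "good fun", "sad awful"]
labels = [1, 1, 0, 0, 1, 0]
breakpt = 4

cv = CountVectorizer(analyzer=str.split)
X = cv.fit_transform(texts)        # scipy sparse matrix, never densified

clf = LinearSVC()
clf.fit(X[:breakpt], labels[:breakpt])   # sparse rows go straight to fit
guesses = clf.predict(X[breakpt:])       # and to predict
```

Row slicing keeps the matrix sparse, so memory use grows with the number of non-zero counts rather than with n_samples * n_features.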