"OSError: Failed to interpret file as a pickle" after saving - python

My code began giving this error after I opened and saved NEweights.npy:
OSError: Failed to interpret file 'D:\\NeuralNetwork\\NEweights.npy' as a pickle
It was working before I saved the file. Why am I receiving this error only now, and is there any way I can still access the data in NEweights.npy? (For context, NEweights.npy is an array of neural network weights trained via Nesterov Accelerated Gradient; I was testing different NN optimizers.)
I have this code to save the numpy arrays to an .npy file:
import numpy as np

# dtype=object because the per-layer weight arrays have different shapes
np.save(f'{path}GDweights.npy', np.array(weights, dtype=object))
I have this to access the numpy arrays:
def getWeights(path):
    return np.load(path, allow_pickle=True)
path = 'D:\\NeuralNetwork\\'
inputs, outputs = grab(f'{path}test.csv')
weightsGD = getWeights(f'{path}GDweights.npy')
weightsM = getWeights(f'{path}Mweights.npy')
weightsNE = getWeights(f'{path}NEweights.npy')
weightsNA = getWeights(f'{path}NAweights.npy')
weightsD = getWeights(f'{path}Dweights.npy')

This error is raised as an IOError, and according to the np.load documentation, it is raised if the input file does not exist or cannot be read.
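A likely explanation, assuming the file was opened and re-saved in another program (e.g. a text editor): the .npy header bytes were altered, so np.load no longer recognizes the format and falls back to trying to unpickle the file, which fails with the error above. A minimal check sketch (the path is the one from the question):

# An intact .npy file begins with the magic bytes b'\x93NUMPY'.
# If this prints anything else, the header was corrupted during the
# re-save, and this copy of the data is most likely unrecoverable.
with open('D:\\NeuralNetwork\\NEweights.npy', 'rb') as f:
    print(f.read(6))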


Huge output message

I'm trying to execute the following code using TensorFlow, Hugging Face's transformers, and the openai/whisper-base model:
import tensorflow as tf
import transformers

# Load the model and tokenizer
model = transformers.TFWhisperModel.from_pretrained('openai/whisper-base')
tokenizer = transformers.WhisperTokenizer.from_pretrained('openai/whisper-base')

# Read the audio file and convert it to a tensor
audio_file = "data/preamble.wav"
with open(audio_file, 'rb') as f:
    audio = f.read()
input_ids = tf.constant(tokenizer.encode(audio, return_tensors='tf'))

# Transcribe the audio
output = model(input_ids)[0]
transcription = tokenizer.decode(output, skip_special_tokens=True)
with open("something.txt", "w") as f:
    f.write(transcription)
I'm getting a huge output error, too big to copy and paste here; an error snippet is below. The entire message consists of the same syntax except for the last line, which I've also pasted below. The attached picture is the top of the error message, which I had to screenshot before it disappeared.

[Screenshot: top of the error message, the first output to the terminal after running the script]

Bottom of the error snippet:
c\xff\x0c\x00\xeb\xff\xb3\xff\xc5\xff\x0f\x00\xde\xff\x16\x00B\x00\x0e\x00\xfd\xff$\x000\x00\xff\x
ff\xe7\xff<\x00\xfb\xff\n\x00/\x008\x00\x06\x00\x17\x00\x1d\x00\xde\xff\xf2\xff\xec\xff\xff\xff\x0
f\x00\x1b\x008\x00\x1d\x003\x00%\x00#\x00\r\x00\x16\x00\x1d\x00\x19\x00\xf7\xff\x14\x00\xff\xff\xc
c\xff\x06\x00\xf1\xff\x11\x00\xf0\xff*\x00P\x00\xe7\xffH\x00\t\x00\xd0\xff\xd0\xff\xee\xff\xf6\xff
\xc6\xff\xe4\xff\xce\xff' is not valid. Should be a string, a list/tuple of strings or a list/tuple
of integers.
The last line, "is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.", is my only clue as to my next step.
I cannot scroll up to find which part of my code is throwing the error. I'm new to machine learning and I don't know what I'm seeing. Any help is appreciated.
Thank you in advance!!!
I tried a try/except block around output and transcription with no change: same output message.
I've tried:
input_ids = str(tf.constant(tokenizer.encode(audio, return_tensors='tf')))
input_ids = []
input_ids = input_ids.append(int(tf.constant(tokenizer.encode(audio, return_tensors='tf'))))
output = model(str(input_ids))[0]
No change to the output
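For what it's worth, the error text suggests the raw wav bytes are being passed to a text tokenizer, which only accepts strings or token ids. The usual Whisper flow feeds audio features to the model instead. A minimal sketch, assuming a 16 kHz mono wav, using WhisperProcessor and TFWhisperForConditionalGeneration from transformers (not the exact classes from the question):

import tensorflow as tf
from scipy.io import wavfile
from transformers import WhisperProcessor, TFWhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained('openai/whisper-base')
model = TFWhisperForConditionalGeneration.from_pretrained('openai/whisper-base')

# Read the audio as samples, not raw bytes; Whisper expects 16 kHz mono
# (resample first if the file differs).
sample_rate, audio = wavfile.read('data/preamble.wav')
audio = audio.astype('float32') / 32768.0  # normalize int16 PCM to [-1, 1]

# Convert samples to log-mel input features, then generate token ids.
inputs = processor(audio, sampling_rate=sample_rate, return_tensors='tf')
predicted_ids = model.generate(inputs.input_features)

# Decode token ids back to text.
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)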

Is there a good way to write 2d arrays or tensors to TFRecords in Tensorflow?

I am currently working on a project using audio data. The first step is to use another model to produce features for each audio example, about [400 x 10_000] per wav file, and each wav file has a label that I'm trying to predict. I will then build another model on top of this to produce my final result.
I don't want to run preprocessing every time I run the model, so my plan was to have a preprocessing pipeline that runs the feature-extraction model and saves the features into a new folder, so the second model can use them directly. I was looking at using TFRecords, but the documentation (tf.io.serialize_tensor, the TFRecord guide) is quite unhelpful.
This is what I've come up with to test it so far:
serialized_features = tf.io.serialize_tensor(features)
feature_of_bytes = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[serialized_features.numpy()]))
features_for_example = {
    'feature0': feature_of_bytes
}
example_proto = tf.train.Example(
    features=tf.train.Features(feature=features_for_example))

filename = 'test.tfrecord'
writer = tf.io.TFRecordWriter(filename)
writer.write(example_proto.SerializeToString())

filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)
for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)
But I'm getting this error:
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 0' failed with Read less bytes than requested
tl;dr:
Getting the above error with TFRecords. Any recommendations to get this example working or another solution not using TFRecords?
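One plausible cause, offered as a guess: the TFRecordWriter in the snippet is never closed, so the record may still be buffered (or only partially flushed) when the reader opens the file, which would explain a truncated-record error. A minimal sketch of the same round trip with the writer used as a context manager and the tensor parsed back out (names follow the question's 'feature0'; the random tensor is a stand-in for the real features):

import tensorflow as tf

features = tf.random.uniform([400, 10000])  # stand-in for the real features

# Write: the `with` block closes and flushes the writer before reading.
with tf.io.TFRecordWriter('test.tfrecord') as writer:
    serialized = tf.io.serialize_tensor(features)
    example_proto = tf.train.Example(features=tf.train.Features(feature={
        'feature0': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[serialized.numpy()]))}))
    writer.write(example_proto.SerializeToString())

# Read: parse the example, then deserialize the tensor.
def parse(record):
    parsed = tf.io.parse_single_example(record, {
        'feature0': tf.io.FixedLenFeature([], tf.string)})
    return tf.io.parse_tensor(parsed['feature0'], out_type=tf.float32)

dataset = tf.data.TFRecordDataset(['test.tfrecord']).map(parse)
for tensor in dataset.take(1):
    print(tensor.shape)  # expect (400, 10000)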

Gensim Model : class 'FileNotFoundError'

The issue is that I have thousands of documents, and I passed all of them to train a Gensim Doc2Vec model. I successfully trained the model and saved it in .model format.
Along with that file, two extra files were also generated:
doc2vec.model
doc2vec.model.trainables.syn1neg.npy
doc2vec.model.wv.vectors.npy
Due to hardware limitations, I trained and saved the model on Google Colab and Google Drive respectively. When I downloaded the generated model and the extra files to my local machine and ran the code, it gave me a FileNotFoundError, even though I placed the files in the same directory as the .py file (the current working directory).
I used the code below:
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize

files = readfiles("CuratedData")
data = [TaggedDocument(words=word_tokenize(_d.decode('utf-8').strip().lower()), tags=[str(i)])
        for i, _d in enumerate(files)]

max_epochs = 100
vec_size = 300
alpha = 0.025

model = Doc2Vec(vector_size=vec_size,
                alpha=alpha,
                min_alpha=0.00025,
                min_count=1,
                dm=1)
model.build_vocab(data)

for epoch in range(max_epochs):
    print('iteration {0}'.format(epoch))
    model.train(data,
                total_examples=model.corpus_count,
                epochs=model.iter)
    # decrease the learning rate
    model.alpha -= 0.0002
    # fix the learning rate, no decay
    model.min_alpha = model.alpha

model.save("doc2vec.model")
print("Model Saved")
Code for Loading the Model
webVec = ""
try:
path = os.path.join(os.getcwd(), "doc2vec.model")
model = Word2Vec.load(path)
data = word_tokenize(content['htmlResponse'].lower())
# Webvector
webVec = model.infer_vector(data)
except ValueError as ve:
print(ve)
except (TypeError, ZeroDivisionError) as ty:
print(ty)
except:
print("Oops!", sys.exc_info()[0], "occurred.")
Any help would be greatly appreciated. Thanks, Cheers
Saving a large model will usually create several subsidiary files for the large internal arrays. All those files must be kept together. (They will all start with the same string, the name you originally specified - in your case, doc2vec.model.)
It's possible there was another file you failed to download. But without seeing the code you used to trigger the error, or the full error traceback (with the filenames and lines of code involved), it's hard to guess what exactly triggered the FileNotFoundError. You may want to edit your question to add that info, so it's clearer what code ran before the exact error and what library code is involved.
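To rule out a missing file, one quick diagnostic sketch: list everything next to the model file that gensim's loader will look for (the subsidiary .npy files must sit in the same directory and keep their original names):

import os

model_path = os.path.join(os.getcwd(), "doc2vec.model")
prefix = os.path.basename(model_path)
folder = os.path.dirname(model_path)
print([f for f in os.listdir(folder) if f.startswith(prefix)])
# Expect all three files from the question:
# ['doc2vec.model', 'doc2vec.model.trainables.syn1neg.npy',
#  'doc2vec.model.wv.vectors.npy']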

Can't convert Frozen Inference Graph to .tflite

I am new to the Object Detection API and TensorFlow in general. I followed this tutorial and in the end produced a frozen_inference_graph.pb. I want to run this object detection model on my phone, which in my understanding requires me to convert it to .tflite (please let me know if this doesn't make sense).
When I tried to convert it using this standard code here:
import tensorflow as tf
graph = 'pathtomygraph'
input_arrays = ['image_tensor']
output_arrays = ['all_class_predictions_with_background']
converter = tf.lite.TFLiteConverter.from_frozen_graph(graph, input_arrays, output_arrays)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
It throws an error, saying:
ValueError: None is only supported in the 1st dimension. Tensor
'image_tensor' has invalid shape '[None, None, None, 3]'
This is a common error I found on the internet, and after searching through many threads, I tried to give an extra parameter to the code:
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph, input_arrays, output_arrays,
    input_shapes={"image_tensor": [1, 600, 600, 3]})
Now it looks like this:
import tensorflow as tf
graph = 'pathtomygraph'
input_arrays = ['image_tensor']
output_arrays = ['all_class_predictions_with_background']
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph, input_arrays, output_arrays,
    input_shapes={"image_tensor": [1, 600, 600, 3]})
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
This works at first, but throws another error at the end, saying:
Check failed: array.data_type == array.final_data_type Array
"image_tensor" has mis-matching actual and final data types
(data_type=uint8, final_data_type=float). Fatal Error: Aborted
I understand that my input tensor has the data type of uint8 and this causes a mismatch, I guess. My question would be, is this the correct way to approach things? (I want to run my model on my phone). If it is, how do I then fix the error? :/
Thank you very much.
Change your model input (the image_tensor placeholder) to have data type tf.float32.
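For the float-input route, a hedged sketch, assuming a TF 1.x SSD-style setup: the Object Detection API ships an export_tflite_ssd_graph.py script that re-exports the model with a float32 input named normalized_input_image_tensor, so the converter never sees the uint8 image_tensor. The graph name, input size, and output names below are the conventional ones for that flow, not values from the question:

import tensorflow as tf

# tflite_graph.pb is the hypothetical re-exported graph, not the original
# frozen_inference_graph.pb from the question.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'tflite_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess',
                   'TFLite_Detection_PostProcess:1',
                   'TFLite_Detection_PostProcess:2',
                   'TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]})
converter.allow_custom_ops = True  # the detection post-process op is custom
tflite_model = converter.convert()
with open('converted_model.tflite', 'wb') as f:
    f.write(tflite_model)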

macos: converting dot to png

I've studied the solutions outlined here:
Converting dot to png in python
However, none of these solutions work for me. In particular, when I try the check_call method, I get the following error:
File "/Users/anaconda/lib/python2.7/subprocess.py", line 1343, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
When I use pydot, I get this error on the line (graph,) = pydot.graph_from_dot_data(dotfile.getvalue()):
TypeError: 'Dot' object is not iterable
Here is some example code I found on one of the above posts that I've been testing:
from sklearn import tree
import pydot
import StringIO
from subprocess import check_call
# Define training and target set for the classifier
train = [[1,2,3],[2,5,1],[2,1,7]]
target = [10,20,30]
# Initialize Classifier. Random values are initialized with always the same random seed of value 0
# (allows reproducible results)
dectree = tree.DecisionTreeClassifier(random_state=0)
dectree.fit(train, target)
# Test classifier with other, unknown feature vector
test = [2,2,3]
predicted = dectree.predict(test)
dotfile = StringIO.StringIO()
tree.export_graphviz(dectree, out_file=dotfile)
check_call(['dot','-Tpng','InputFile.dot','-o','OutputFile.png'])
(graph,)=pydot.graph_from_dot_data(dotfile.getvalue())
graph.write_png("dtree.png")
Thanks in advance.
OSError: [Errno 2] No such file or directory indicates that InputFile.dot may not be present on your filesystem.
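A minimal fix sketch for the original snippet, assuming the intent was to render the StringIO buffer: write it to InputFile.dot before invoking dot (dotfile and check_call are reused from the question's code):

# Persist the in-memory dot source so the `dot` binary can read it.
with open('InputFile.dot', 'w') as f:
    f.write(dotfile.getvalue())
check_call(['dot', '-Tpng', 'InputFile.dot', '-o', 'OutputFile.png'])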
In case you're just interested in converting dot to png, I've created a simple Python example, sample_tree.py, which generates a png from a dot file and works on my Mac:
import pydot
from subprocess import check_call

graph = pydot.Dot(graph_type='graph')
for i in xrange(2):
    edge = pydot.Edge("a", "b%d" % i)
    graph.add_edge(edge)
graph.write_png('sample_tree.png')

# If a dot file needs to be created as well
graph.write_dot('sample_tree.dot')
check_call(['dot', '-Tpng', 'sample_tree.dot', '-o', 'OutputFile.png'])
Btw, this dtree example has also been used here: "Sckit learn with GraphViz exports empty outputs", in case any other similar issues are encountered. Thanks.
