I am currently working on a project using audio data. The first step of the project is to use another model to produce features for each audio example; the features are about [400 x 10_000] for each wav file, and each wav file has a label that I'm trying to predict. I will then build another model on top of these features to produce my final result.
I don't want to run preprocessing every time I run the model, so my plan was to have a preprocessing pipeline that runs the feature extraction model and saves the features into a new folder, so that the second model can use the saved features directly. I was looking at using TFRecords, but the documentation (tf.io.serialize_tensor, tfrecord) is quite unhelpful.
This is what I've come up with to test it so far:
serialized_features = tf.io.serialize_tensor(features)
feature_of_bytes = tf.train.Feature(
bytes_list=tf.train.BytesList(value=[serialized_features.numpy()]))
features_for_example = {
'feature0': feature_of_bytes
}
example_proto = tf.train.Example(
features=tf.train.Features(feature=features_for_example))
filename = 'test.tfrecord'
writer = tf.io.TFRecordWriter(filename)
writer.write(example_proto.SerializeToString())
filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)
for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)
But I'm getting this error:
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 0' failed with Read less bytes than requested
tl;dr:
Getting the above error with TFRecords. Any recommendations to get this example working or another solution not using TFRecords?
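A likely cause of the error is that the TFRecordWriter is never closed, so the record has not been fully flushed to test.tfrecord by the time TFRecordDataset reads it. Below is a minimal sketch of the round trip under a few assumptions: the features are float32, and there is a hypothetical integer "label" field for the per-file label. The writer is used as a context manager so it is closed before reading, and the stored bytes are turned back into a tensor with tf.io.parse_tensor. The helper names write_features and parse_record are illustrative only.

```python
import numpy as np
import tensorflow as tf


def write_features(filename, features, label):
    """Serialize one feature array plus its label into a TFRecord file."""
    serialized = tf.io.serialize_tensor(tf.cast(features, tf.float32))
    example = tf.train.Example(features=tf.train.Features(feature={
        "feature0": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[serialized.numpy()])),
        # "label" is a hypothetical field for the per-wav-file label
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))
    # Using the writer as a context manager closes (and flushes) it before
    # anything tries to read the file back.
    with tf.io.TFRecordWriter(filename) as writer:
        writer.write(example.SerializeToString())


def parse_record(raw_record):
    """Turn one serialized Example back into (features, label) tensors."""
    spec = {
        "feature0": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(raw_record, spec)
    features = tf.io.parse_tensor(parsed["feature0"], out_type=tf.float32)
    return features, parsed["label"]


if __name__ == "__main__":
    # Small stand-in for a [400 x 10_000] feature matrix
    features = np.random.rand(400, 100).astype(np.float32)
    write_features("test.tfrecord", features, label=3)
    dataset = tf.data.TFRecordDataset(["test.tfrecord"]).map(parse_record)
    for x, y in dataset.take(1):
        print(x.shape, int(y.numpy()))
```

Mapping parse_record over a TFRecordDataset lets the second model read the cached features without rerunning extraction. If you want to avoid TFRecords altogether, saving each feature matrix with np.save and loading it with np.load is also workable.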
I am trying to save a sequence of predicted images into a specific folder, but it's not working. I used the code below; it runs, but it does not write/save the images into the "results" folder.
save_image_path = f"results/{image_name}"
cv2.imwrite(save_image_path, cat_image)
I also tried the full folder path, but it still doesn't work and an error occurs:
save_image_path = r'D:\Medical Imaging\Code\segmentation\results'
cv2.imwrite(save_image_path/{image_name}, cat_image)
I also tried it without the image name and extension, and an error occurs:
save_image_path = r'D:\Medical Imaging\Code\segmentation\results'
cv2.imwrite(save_image_path, cat_image)
I am using the PyCharm IDE. Any suggestions or guidance would be appreciated.
You need to specify the file type (".jpg", ".png", ...) in the file name, like this:
save_image_path = f"results/{image_name}.jpg"
cv2.imwrite(save_image_path, cat_image)
Here is the code:
modelDoc = Doc2Vec(size=300, window=5, dm=0, dbow_words=1, hs=0, negative=10, alpha=0.05, min_count=20,
workers=cores, sample=1e-5, seed=0, iter=10)
modelDoc.build_vocab(finalSent)
modelDoc.save(save_model)
My versions:
gensim==3.8.1
numpy==1.16.2
After saving the model, only the vocab_model file is generated;
vocab_model.docvecs.doctag_syn0.npy is not generated.
What is this file used for, and is it necessary for it to be generated?
Were there any errors shown during the .save()?
Does the saved file load & work as expected? (In this case, since the original model wasn't trained, does it train alright as if the save-then-load hadn't happened?)
If there's no error, & it works, it's fine.
(What's the reason that a file of this name was expected, and why was its absence a concern?)
I am trying to use the flow_from_dataframe method of Keras to read training and testing images.
Both my training and testing images are in the same directory, and I read the paths from two different csv files.
My code for reading the test images looks like this:
# Read test file
testdf = pd.read_csv("test.csv")
# load images
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_dataframe(
dataframe=testdf, directory=IMAGE_PATH,
x_col='image_name', y_col=None,
has_ext=True, target_size=(10,10)
,batch_size=32,color_mode='rgb',shuffle=False, class_mode=None)
I get this output:
Found 0 images.
Meanwhile, similar code for reading the training data works properly. I checked that the images exist at the given path, and they do. What are some possible reasons for this error? How can I debug the issue?
EDIT: This is a regression task, so all images are in a single directory, and not in subdirectories, as would be expected for a classification task.
EDIT 2: I added usecols=[0] to read_csv, and now test_datagen finds all the images in the directory, not just the ones listed in the test.csv file.
The issue happens due to NaNs in the dataframe. Ignoring those columns doesn't work. The solution is to replace the NaNs with something else. For example,
testdf = pd.read_csv("test.csv")
testdf.fillna(0, inplace=True)
This replaces the NaNs with 0. Then using ImageDataGenerator as usual works.
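To debug "Found 0 images", it can help to list exactly which rows have no usable file on disk. Below is a minimal sketch in plain pandas. find_unmatched_rows is a hypothetical helper, not part of Keras. It flags rows containing a NaN, as described above, and rows whose file name does not exist in the directory, for example because the extension is missing.

```python
import os
import tempfile

import pandas as pd


def find_unmatched_rows(df, directory, x_col):
    """Return rows likely to be dropped: any NaN, or no file at directory/x_col."""
    has_nan = df.isna().any(axis=1)
    names = df[x_col].astype(str)
    file_exists = names.apply(lambda name: os.path.isfile(os.path.join(directory, name)))
    return df[has_nan | ~file_exists]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as image_dir:
        # Create two empty placeholder image files
        for name in ["abc.png", "def.png"]:
            open(os.path.join(image_dir, name), "wb").close()
        testdf = pd.DataFrame({"image_name": ["abc.png", "def", "ghi.png"],
                               "score": [1.0, 2.0, float("nan")]})
        print(find_unmatched_rows(testdf, image_dir, "image_name"))
```

An empty result suggests the file names and the directory line up, which points the search at other arguments such as directory or y_col.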
I was also facing the same error and found a solution for this.
I was using the absolute path and the correct DataFrame, and everything looked fine, but the code still threw an error: "image not found".
I inspected it and found that my DataFrame contained image names without extensions, while the images in the folder had extensions.
E.g. the image name in the DataFrame was 'abc', but the image in the folder was named 'abc.png'.
Just add .png to the image names in the DataFrame and that will solve your problem.
I tried the code below and it worked!
def append_ext(fn):
    return fn+".png"
train_valid_data["id_code"]=train_valid_data["id_code"].apply(append_ext)
test_data["id_code"]=test_data["id_code"].apply(append_ext)
Let me know if it solves your problem or if you need any further explanation.
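Below is a slightly more defensive variant of append_ext, as a sketch. It only appends the extension when a name does not already end with it. Running it twice, or on a column where some names already have extensions, therefore does not produce names like 'abc.png.png'. The name add_extension and the default ".png" are assumptions. It would be applied as `train_valid_data["id_code"] = add_extension(train_valid_data["id_code"])`.

```python
import pandas as pd


def add_extension(names, ext=".png"):
    """Append ext to every file name that does not already end with it (case-insensitive)."""
    names = names.astype(str)
    already_has_ext = names.str.lower().str.endswith(ext.lower())
    # Series.where keeps values where the condition is True and uses `other` elsewhere
    return names.where(already_has_ext, names + ext)


if __name__ == "__main__":
    ids = pd.Series(["abc", "def.png", "ghi.PNG"])
    print(add_extension(ids).tolist())  # ['abc.png', 'def.png', 'ghi.PNG']
```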
I had the same problem. First, make sure the absolute path you pass as the directory parameter is correct.
The filename in my df had the value image.pgm.png, while the actual image file in the folder was named image.pgm.
I tried to change the filename in df to image.pgm => Still not working
I renamed the image file from image.pgm to image.pgm.png which matches exactly the format in the df => Worked!
I had the same error.
It turned out I had the wrong directory path, and the image extension was missing from the data frame.
So make sure that your directory path is correct and that each image name includes its extension. You can do the following:
def extention_train_data(x):
    return x+".jpg"
Change the jpg extension if yours is different.
Then apply this to your data frame:
train_data['image'] = train_data['image_id'].apply(extention_train_data)
Once the image column contains each image name with its extension, create the generator:
train_generator = datagen.flow_from_dataframe(
train_data,
directory="/kaggle/input/plant-pathology-2020-fgvc7/images/",
x_col = "image",
y_col = "label",
target_size = size,
class_mode = "binary",
batch_size = batch_size,
subset="training",
shuffle = True,
seed = 42,
)
Okay, so I have been having the same issue. My data labels were in a csv file, and the image data was in a separate folder. I thought the issue was caused by the labels and the images in the folder not aligning properly, and did a whole bunch of work to rectify and process the data. That was not the problem.
So, for anyone who's having this issue:
I tried Oussama Ouardini's answer and it worked. Thank you!
I'll also add that if you are doing a train/validation split, make sure the initial ImageDataGenerator object you create has validation_split specified.
def extension_train_data(x):
    return "xc"+str(x)+".png"
train_df['file_id'] = train_df['file_id'].apply(extension_train_data)
Here is my code:
datagen=ImageDataGenerator(rescale=1./255,validation_split=0.2)
# rescale pixel values from 0-255 so that after this step
# all pixel values are in the range [0, 1]
train_generator=datagen.flow_from_dataframe(dataframe=train_df,directory='./img_data/', x_col="file_id", y_col="english_cname",
class_mode="categorical",save_to_dir='./new folder/',
target_size=(64,64),subset="training",
seed=42,batch_size=32,shuffle=False)
val_generator=datagen.flow_from_dataframe(dataframe=train_df,directory='./img_data/', x_col="file_id", y_col="english_cname",
class_mode="categorical",
target_size=(64,64),subset="validation",
seed=42,batch_size=32,shuffle=False)
print("\n Sanity check Line.--------")
My output showed successfully validated image files :)
Found 212 validated image filenames belonging to 88 classes.
Found 52 validated image filenames belonging to 88 classes.
Sanity check Line.----------
I hope someone will find this useful. Cheers!
I'm trying to build a CNN with TensorFlow (r1.4) based on the tf.estimator API. It's a canned model. The idea is to train and evaluate the network with the estimator in Python and use the prediction in C++ without the estimator, by loading a pb file generated after the training.
My first question is: is this possible?
If so: the training part works, and the prediction part works too (with a pb file generated without the estimator), but it doesn't work when I load a pb file exported from the estimator.
I get this error: "Data loss: Can't parse saved_model.pb as binary proto"
My Python code to export the model:
from tensorflow.python.framework import dtypes
from tensorflow.python.ops import parsing_ops

feature_spec = {'input_image': parsing_ops.FixedLenFeature(dtype=dtypes.float32, shape=[1, 48 * 48])}
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
input_fn = tf.estimator.inputs.numpy_input_fn(self.eval_features,
self.eval_label,
shuffle=False,
num_epochs=1)
eval_result = self.model.evaluate(input_fn=input_fn, name='eval')
exporter = tf.estimator.FinalExporter('save_model', export_input_fn)
exporter.export(estimator=self.model, export_path=MODEL_DIR,
checkpoint_path=self.model.latest_checkpoint(),
eval_result=eval_result,
is_the_final_export=True)
It doesn't work with tf.estimator.Estimator.export_savedmodel() either.
If anyone knows of a clear tutorial on using a canned estimator and exporting it, I'd be interested.
Please look at this issue on GitHub; it looks like you have the same problem. Apparently (at least when using estimator.export_savedmodel) you should load the graph with LoadSavedModel instead of ReadBinaryProto, because it's not saved as a GraphDef file.
Here are a few more instructions on how to use it:
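#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"

const string export_dir = ...
SavedModelBundle bundle;
...
// Estimator exports are tagged for serving, so request the "serve" tag
LoadSavedModel(session_options, run_options, export_dir, {kSavedModelTagServe},
               &bundle);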
I can't seem to find the SavedModelBundle documentation for C++, but it's likely close to the Java class of the same name, in which case it basically contains the session and the graph you'll be using.
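saved_model.pb in an export directory is a serialized SavedModel proto that wraps one or more MetaGraphDefs, each identified by a set of tags. This is why reading it into a GraphDef with ReadBinaryProto fails. Below is a sketch for checking which tags an export actually carries, so the LoadSavedModel call can request a matching one. It assumes the Python protobuf module saved_model_pb2 that ships with TensorFlow. The helper name list_saved_model_tags is hypothetical, and the demo builds a throwaway TF2 SavedModel rather than an r1.4 estimator export. The command-line tool `saved_model_cli show --dir <export_dir>` reports the same tag-sets.

```python
import os
import tempfile

import tensorflow as tf
from tensorflow.core.protobuf import saved_model_pb2


def list_saved_model_tags(export_dir):
    """Return the tag set of every MetaGraphDef stored in export_dir/saved_model.pb."""
    saved_model = saved_model_pb2.SavedModel()
    with open(os.path.join(export_dir, "saved_model.pb"), "rb") as f:
        # The file holds a SavedModel proto, not a bare GraphDef
        saved_model.ParseFromString(f.read())
    return [sorted(mg.meta_info_def.tags) for mg in saved_model.meta_graphs]


if __name__ == "__main__":
    # Throwaway TF2 SavedModel for demonstration, not an estimator export
    export_dir = tempfile.mkdtemp()
    tf.saved_model.save(tf.Module(), export_dir)
    print(list_saved_model_tags(export_dir))
```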