tf.image.decode_jpeg often taking forever to load file - python

The following code is part of my TensorFlow graph for reading images. When I use it to iterate through the data, after a few hundred images the program gets stuck in tf.io.read_file(path) forever and does nothing. Worse, the process can't even be interrupted, and I had to restart the session every time.
#tf.function()
def read_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image)
    return image
...
div8k_list = [os.path.join(div8k_save_path, x) for x in os.listdir(div8k_save_path)]
train_path = tf.data.Dataset.from_tensor_slices(div8k_list)
train_images = train_path.map(read_image, num_parallel_calls=tf.data.AUTOTUNE)
I first suspected that a few corrupted images or wrong paths in the data were causing the problem, so I tested the following code.
for path in train_path:
    print(path)
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image)
Surprisingly, the paths at which the loop got stuck had nothing in common. Nor was it a problem with the images themselves: the loop once got stuck at 1056.png, but when I loaded 1056.png explicitly there was no problem.
What could be the cause of this problem?
EDIT: To summarize, the program gets stuck in read_image forever, and I couldn't find any problem in the dataset.
My dataset is the DIV8K dataset and I am running on Colab.
EDIT: The function that is slowing my code down is decode_jpeg, because the following definition of read_image, with the decode step removed, worked multiple times.
#tf.function()
def read_image(path):
    image = tf.io.read_file(path)
    return image

As mentioned in the comment, try the following function to decode the image file, as it can handle mixed file formats (jpg, png, etc.), ref.
tf.io.decode_image(image, expand_animations=False)
However, decode_jpeg should also be able to handle the .png file format nowadays. Without the offending image file, it's hard to pin down what is causing the hang. Most probably the file is somehow corrupted, or it is not a valid file for decode_jpeg despite being named that way; check this solution.
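For concreteness, here is a sketch of the question's read_image with decode_image swapped in; apart from the decode call, this is just the code from the question:
def read_image(path):
    image = tf.io.read_file(path)
    # decode_image dispatches on the file's actual contents (JPEG, PNG, GIF,
    # BMP) rather than trusting the extension; expand_animations=False keeps
    # the result a 3-D image tensor even if a file turns out to be a GIF
    image = tf.io.decode_image(image, expand_animations=False)
    return image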

Related

Convert all images into one pdf?

I would like to finish my script; I have tried a lot to solve this but, being a beginner, failed.
I have a function using imageio that takes images from a website; after that, I would like to resize all the images to 63x88 and put them all into one PDF.
full_path = os.path.join(filePath1, name + ".png")
if os.path.exists(full_path):
    number = 1
    while True:
        full_path = os.path.join(filePath1, name + str(number) + ".png")
        if not os.path.exists(full_path):
            break
        number += 1
imageio.imwrite(full_path, im_padded.astype(np.uint8))
os.chmod(full_path, mode=0o777)
Thanks for any answer.
We (ImageIO) currently don't have a PDF reader/writer. There is a long-standing feature request for it, which hasn't been implemented yet because there is currently nobody willing to contribute it.
Regarding the loading of images, we have an example for this in the docs:
import imageio as iio
from pathlib import Path

images = list()
for file in Path("path/to/folder").iterdir():
    im = iio.imread(file)
    images.append(im)
The caveat is that this particular example assumes you want to read all the images in a folder and that there are only images in said folder. If either assumption doesn't hold for you, you can easily customize the snippet.
Regarding the resizing of images, you have several options, and I recommend scikit-image's resize function.
To then get all the images into a PDF, you could have a look at matplotlib, which can generate a figure that you can save as a PDF file. The exact steps depend on the desired layout of your resulting PDF.
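To make that concrete, here is a minimal sketch combining the three steps (read, resize, write to PDF). The folder path and the 63x88 target size come from the question; the output filename and the one-image-per-page layout are assumptions of mine:
from pathlib import Path

import imageio as iio
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from skimage.transform import resize

with PdfPages("cards.pdf") as pdf:  # hypothetical output filename
    for file in Path("path/to/folder").iterdir():
        im = iio.imread(file)
        # skimage's resize takes (rows, cols), so 63 wide by 88 high is (88, 63)
        small = resize(im, (88, 63), anti_aliasing=True)
        fig, ax = plt.subplots()
        ax.imshow(small)
        ax.axis("off")
        pdf.savefig(fig)  # one resized image per PDF page
        plt.close(fig)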

List index out of range: works on Google Colab but not on local machine?

I'm trying to recreate this project on my local machine. It's designed to run on Google Colab, and I've recreated it there and it works just fine. I now want to try running it on my local machine, so I installed all the required packages, Anaconda, Jupyter Notebook, etc.
When I come to the part where I process the images:
# Loops through imagepaths to load images and labels into arrays
for path in imagepaths:
    img = cv2.imread(path)  # Reads image and returns np.array
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Converts into the correct colorspace (GRAY)
    img = cv2.resize(img, (320, 120))  # Reduce image size so training can be faster
    X.append(img)
    # Processing label in image path
    category = path.split("/")[3]
    label = int(category.split("_")[0][1])
    y.append(label)
It throws the following error:
IndexError: list index out of range
The code has not been changed, for the most part, and the dataset is the same. The only difference is that I'm running locally instead of on Google Colab. I searched online, and someone suggested checking len(path) to verify that (in my case) it goes up to [3], which it does (it is length 33).
The code has changed here: I did not use this line, as I'm not using Google Colab:
from google.colab import files
The "files" is used in this part of code:
# We need to get all the paths for the images to later load them
imagepaths = []

# Go through all the files and subdirectories inside a folder and save path to images inside list
for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        path = os.path.join(root, name)
        if path.endswith("png"):  # We want only the images
            imagepaths.append(path)

print(len(imagepaths))  # If > 0, then a PNG image was loaded
On my local machine, I removed the from google.colab... line and ran everything else normally. The name files is used in the snippet above, yet running it threw no errors. NOTE: len(path) on Jupyter shows 33, len(path) on Google Colab shows 16..?
Does anyone have any idea what the issue could be? I don't think it came from removing that one line of code. If it did, what do you suggest I do to fix it?
Your local machine is running Windows while Colab runs on Linux, and the path separators differ between the two.
You need to replace
category = path.split("/")[3]
with
category = path.split("\\")[2]
and your code should work.
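A more portable variant (my suggestion, not part of the answer above) is to normalize the path and split on the current OS's separator, so the same code runs on both Windows and Linux. Which index holds the label is still an assumption about the dataset layout; the image's direct parent folder is usually what you want:
import os

parts = os.path.normpath(path).split(os.sep)
category = parts[-2]  # assumes the label folder is the image's direct parent
label = int(category.split("_")[0][1])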

Memory error while reading lots of images

I am reading 10000 images using OpenCV in Python:
for filename in os.listdir(directory):
    img = cv2.imread(os.path.join(directory, filename))
    # Different image preprocessing functions are applied after reading
    # After about 700 images, the preprocessing raises a memory error
How can I resolve this error?
We overcame this problem by using a single global image variable and emptying it after extracting the information we needed, so that only one image's data is held in memory at a time.
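As a rough illustration of that idea (preprocess and the results list are hypothetical names, not from the question): read one image at a time, keep only the small derived result, and drop the pixel buffer before the next read, so only one image is in memory at once.
import os
import cv2

results = []
for filename in os.listdir(directory):
    img = cv2.imread(os.path.join(directory, filename))
    if img is None:
        continue  # skip files OpenCV cannot read
    results.append(preprocess(img))  # keep only the small derived result
    img = None  # release the pixel buffer before the next read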

Overcoming "GraphDef cannot be larger than 2GB" in TensorFlow

I am using TensorFlow's ImageNet-trained model to extract the last pooling layer's activations as representation vectors for a new dataset of images.
The model as is predicts on a new image as follows:
python classify_image.py --image_file new_image.jpeg
I edited the main function so that it takes a folder of images, returns predictions for all of them at once, and writes the feature vectors to a CSV file. Here is how I did that:
def main(_):
    maybe_download_and_extract()
    # image = (FLAGS.image_file if FLAGS.image_file else
    #          os.path.join(FLAGS.model_dir, 'cropped_panda.jpg'))
    # Edited to take a directory of image files instead of a single file
    if FLAGS.data_folder:
        images_folder = FLAGS.data_folder
        list_of_images = os.listdir(images_folder)
    else:
        raise ValueError("Please specify image folder")
    with open("feature_data.csv", "wb") as f:
        feature_writer = csv.writer(f, delimiter='|')
        for image in list_of_images:
            print(image)
            current_features = run_inference_on_image(images_folder + "/" + image)
            feature_writer.writerow([image] + current_features)
It worked just fine for around 21 images but then crashed with the following error:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1912, in as_graph_def
raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.
I thought that each call to run_inference_on_image(images_folder+"/"+image) would overwrite the previous image's data and consider only the new image, but that doesn't seem to be the case. How can I resolve this issue?
The problem here is that each call to run_inference_on_image() adds nodes to the same graph, which eventually exceeds the maximum size. There are at least two ways to fix this:
The easy but slow way is to use a different default graph for each call to run_inference_on_image():
for image in list_of_images:
    # ...
    with tf.Graph().as_default():
        current_features = run_inference_on_image(images_folder + "/" + image)
    # ...
The more involved but more efficient way is to modify run_inference_on_image() to run on multiple images: relocate your for loop so that it surrounds the sess.run() call inside that function. You will then no longer reconstruct the entire model on each call, which should make processing each image much faster.
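As a rough sketch of that second approach (TF1-style API; create_graph() and the tensor names are taken from the classify_image.py script discussed here and may differ in your version), build the graph and session once and run only sess.run() per image:
create_graph()  # constructs the inference graph exactly once

with tf.Session() as sess:
    pool3 = sess.graph.get_tensor_by_name('pool_3:0')  # last pooling layer
    for image in list_of_images:
        with open(os.path.join(images_folder, image), 'rb') as jpg:
            image_data = jpg.read()
        # only the run call happens per image; no new nodes are added
        features = sess.run(pool3, {'DecodeJpeg/contents:0': image_data})
        feature_writer.writerow([image] + list(features.flatten()))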
You can move create_graph() to somewhere before the loop for image in list_of_images: (which loops over the files). This way you perform inference multiple times on the same graph. The simplest fix is to put create_graph() at the start of the main function, so the graph is created only once.
A good explanation of why such errors occur is given here. I encountered the same error while using the tf.data API, and came to understand that as the data is iterated over in the session, nodes keep getting appended to the existing graph. So I used tf.reset_default_graph() before building the dataset iterator to make sure the previous graph was cleared away; a rough sketch follows.
Hope this helps in such a scenario.
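A minimal sketch of that reset-before-rebuild pattern (TF1-style API; filenames stands in for your own list of files):
import tensorflow as tf

tf.reset_default_graph()  # drop any nodes accumulated by earlier runs
dataset = tf.data.Dataset.from_tensor_slices(filenames)
iterator = dataset.make_one_shot_iterator()  # TF1-style iterator
next_element = iterator.get_next()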

PIL "IOError: image file truncated" with big images

I think this problem is not Zope-related; nonetheless, I'll explain what I'm trying to do:
I'm using a PUT_factory in Zope to upload images to the ZODB via FTP. The uploaded image is saved as a Zope Image inside a newly created container object. This works fine, but I want to resize the image if it exceeds a certain size (width and height), so I'm using PIL's thumbnail function to resize it, e.g. to 200x200. This works fine as long as the uploaded images are relatively small. I haven't determined the exact limit, but 976x1296 px is still OK.
With bigger pictures I get:
Module PIL.Image, line 1559, in thumbnail
Module PIL.ImageFile, line 201, in load
IOError: image file is truncated (nn bytes not processed).
I tested a lot of jpegs from my camera. I don't think they are all truncated.
Here is my code:
if img and img.meta_type == 'Image':
    pilImg = PIL.Image.open(StringIO(str(img.data)))
elif imgData:
    pilImg = PIL.Image.open(StringIO(imgData))

pilImg.thumbnail((width, height), PIL.Image.ANTIALIAS)
As I'm using a PUT_factory, I don't have a file object; I'm using either the raw data from the factory or a previously created (Zope) Image object.
I've heard that PIL handles image data differently once a certain size is exceeded, but I don't know how to adjust my code for that. Or is it related to PIL's lazy loading?
I'm a little late to reply here, but I ran into a similar problem and I wanted to share my solution. First, here's a pretty typical stack trace for this problem:
Traceback (most recent call last):
  ...
  File ..., line 2064, in ...
    im.thumbnail(DEFAULT_THUMBNAIL_SIZE, Image.ANTIALIAS)
  File "/Library/Python/2.7/site-packages/PIL/Image.py", line 1572, in thumbnail
    self.load()
  File "/Library/Python/2.7/site-packages/PIL/ImageFile.py", line 220, in load
    raise IOError("image file is truncated (%d bytes not processed)" % len(b))
IOError: image file is truncated (57 bytes not processed)
If we look around line 220 (in your case line 201; perhaps you are running a slightly different version), we see that PIL reads the file in blocks and expects those blocks to be of a certain size. It turns out that you can ask PIL to be tolerant of truncated files (files missing some data from the final block) by changing a setting.
Somewhere before your code block, simply add the following:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
...and you should be good.
EDIT: It looks like this helps for the version of PIL bundled with Pillow ("pip install pillow"), but may not work for default installations of PIL
Here is what I did: edit the line LOAD_TRUNCATED_IMAGES = False at /usr/lib/python3/dist-packages/PIL/ImageFile.py:40 to LOAD_TRUNCATED_IMAGES = True.
Editing the file requires root access, though.
I encountered this error while running some PyTorch code that was presumably using the PIL library under the hood. Apply the file edit above only if you hit this error without using PIL directly. Otherwise, simply do:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
The best thing is that you can do this:
if img and img.meta_type == 'Image':
    pilImg = PIL.Image.open(StringIO(str(img.data)))
elif imgData:
    pilImg = PIL.Image.open(StringIO(imgData))

try:
    pilImg.load()
except IOError:
    pass  # You can always log it to a logger

pilImg.thumbnail((width, height), PIL.Image.ANTIALIAS)
As dumb as it seems, it works like a miracle. If your image has missing data, the gap will be filled with grey (check the bottom of your image).
Note: camel case is discouraged in Python and is conventionally used only for class names.
This might not be a PIL issue at all; it might be related to your HTTP server settings. HTTP servers put a limit on the size of the entity body they will accept.
For example, in Apache with FCGI, the option FcgidMaxRequestLen determines the maximum size of a file that can be uploaded.
Check that setting on your server; it might be what is limiting the upload size.
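As an illustration, the directive can be raised in the Apache configuration like this (the 64 MiB value is a hypothetical choice; the unit is bytes):
<IfModule mod_fcgid.c>
    FcgidMaxRequestLen 67108864
</IfModule>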
I ran into the same problem. However, setting ImageFile.LOAD_TRUNCATED_IMAGES = True was not suitable in my case, and I had checked that all my image files were intact, just big.
I read the images using cv2 and then converted them to PIL.Image to get around the problem:
import cv2
from PIL import Image

img = cv2.imread(imgfile, cv2.IMREAD_GRAYSCALE)
img = Image.fromarray(img)
In my case, where the images came over a database connection, I had to change the TDS version to 7.2 to prevent this from happening. It also works with TDS version 8.0, but I had some other issues with 8.0.
When an image is partly broken, an OSError will be raised.
I use the code below to check images and save a list of the broken files:
try:
    img = Image.open(file_path)
    img.load()
    # do the work with the loaded image
except OSError as error:
    print(f"{error} ({file_path})")
    with open("./error_file_list.txt", "a") as error_log:
        error_log.write(str(file_path) + "\n")
