Python PIL (Pillow) resizes my pictures after modifying exif data

Python PIL (Pillow) resizes my pictures after modifying exif data - python

i made a small script in Python, which can set the exif data of my old Whatsapp pictures based on their filename.
I use the piexif and the PIL (Pillow) package.
import piexif
from PIL import Image
from collections import defaultdict
img = Image.open(fname)
try:
exif_dict = piexif.load(img.info["exif"])
except KeyError:
exif_dict = defaultdict(dict)
exif_dict['Exif'][piexif.ExifIFD.DateTimeOriginal] = exiftime(date)
exif_dict['Exif'][piexif.ExifIFD.DateTimeDigitized] = exiftime(date)
exif_bytes = piexif.dump(exif_dict)
img.save('%s' % fname, "jpeg", exif=exif_bytes)
The exiftime() function is only for formatting the date.
However, the script is setting some exif fields, i don't modify compression or someting like that.
My problem is, that the pictures get much smaller, after running that script.
I tested this script with some sample images, e.g. a picture shot with a Nikon D5300 with an resolution of 6000x4000. The original file has about 12Mb, after the script it has only 4Mb.
Does the script cause a quality loss of the picture, or is it just a better compression?

Pillow's .save automatically compresses with a default of 75% quality according to the documentation. You can bump that up to 100% (add quality=100), which will minimize compression and looks like it will skip some compression components completely, but Pillow apparently doesn't have the ability to skip compression altogether. Very few packages do this, and I'm unaware of any in the form of a Python module. Note that the docs say not to raise quality over 95, and I can attest that doing so outputs a BIGGER file.. weird.

A bit of a late answer, but a similar post pointed out a solution here to only write exif information in the file (using piexif).
As a consequence the content of the image is not altered, since no compression (through the "save" command) is done.

Related

loading and saving a JPEG image results in different file content

Here's a script that reads a JPG image and then writes 2 JPG images:
import cv2
# https://github.com/opencv/opencv/blob/master/samples/data/baboon.jpg
input_path = './baboon.jpg'
# Read image
im = cv2.imread(input_path)
# Write image using default quality (95)
cv2.imwrite('./baboon_out.jpg', im)
# Write image using best quality
cv2.imwrite('./baboon_out_100.jpg', im, [cv2.IMWRITE_JPEG_QUALITY, 100])
after running the above script, here's what the files look like:
.
├── baboon.jpg
├── baboon_out.jpg
├── baboon_out_100.jpg
└── main.py
However, the MD5 checksums of the JPGs created by the script do not match the original:
>>> md5 ./baboon.jpg
MD5 (./baboon.jpg) = 9a7171af1d6c6f0901d36d04e1bd68ad
>>> md5 ./baboon_out.jpg
MD5 (./baboon_out.jpg) = 1014782b9e228e848bc63bfba3fb49d9
>>> md5 ./baboon_out_100.jpg
MD5 (./baboon_out_100.jpg) = dbadd2fadad900e289e285393778ad89
Is there anyway to preserve the original image content with OpenCV? In which step is the data being modified?

It's a simplification, but the image will probably always change the checksum if you're reading a JPEG image because it's a lossy format being re-encoded and read by an avalanching hash and definitely will if you do any meaningful work on it, even if (especially if) you directly manipulated the bits of the file rather than loading it as an image
If you want the hash to be the same, just copy the file!
If you're looking for pixel-perfect manipulation, try a lossless format like PNG and understand if you are comparing the pixels or meta attributes a-la Exif (ie. should the Google, GitHub (Microsoft), and Twitter-affiliated fields from your baboon.jpg be included in your comparison? what about color data or depth information..?)
$ exiftool baboon.jpg | grep Description
Description : Open Source Computer Vision Library. Contribute to opencv/opencv development by creating an account on GitHub.
Twitter Description : Open Source Computer Vision Library. Contribute to opencv/opencv development by creating an account on GitHub.
Finally, watch out specifically when using MD5 as it's largely considered broken - a more modern checksum like SHA-256 might help with unforseen issues

There's a fundamental misconception here about the way images are stored and what lossy/lossless means.
Let's imagine we have a really simple 1-pixel grey image and that pixel has brightness 100. Let's make our own format for storing our image... I'm going to choose to store it as a Python script. Ok, here's our file:
print(100)
Now, let's do another lossless version, like PNG might:
print(10*10)
Now we have recreated our image losslessly, because we still get 100, but the MD5 hash is clearly going to be different. PNG does filtering before compressing and it checks whether each pixel is similar to the ones above and to the left in order to save space. One encoder might decide it can do better by looking at the pixel above, another one might think it can do better by looking at the pixel to the left. The result is the same.
Now let's make a new version of the file that stores the image:
# Updated 2022-08-12T21:15:46.544Z
print(100)
Now there's a new comment. The metadata about the image has changed, and so has the MD5 hash. But no pixel data has changed so it is a lossless change. PNG images often store the date/time in the file like this, so even two bit-for-bit identical images will have a different hash for the same pixels if saved 1 second apart.
Now let's make our single pixel image like a JPEG:
print(99)
According to JPEG, that'll look pretty similar, and it's 1-byte shorter, so it's fine. How do you think the MD5 hash will compare?
Now there's a different JPEG library, it thinks this is pretty close too:
print(98)
Again, the image will appear indistinguishable from the original, but your MD5 hash will be different.
TL/DR;
If you want an identical, bitwise perfect copy of your file, use the copy command.
Further food for thought... when you read an image with OpenCV, you actually get a Numpy array of pixels, don't you. Where is the comment that was in your image? The copyright? The GPS location? The camera make and model and shutter speed? The ICC colour profile? They are not in your Numpy array of pixels, are they? They are lost. So, even if the PNG/JPEG encoder made exactly the same decisions as the original encoder, your MD5 hash would still be different because all that other data is missing.

Resize image with Python and keep EXIF and XMP metadata

Its been asked a lot of times how to resize an image and keep the existing exif data. I'm able to do that easily with PIL:
from PIL import Image
im = Image.open("image.jpeg")
exif = im.info['exif']
# process the image, for example resize:
im_resized = im.resize((1920, 1080), resample=PIL.Image.LANCZOS)
im_resized.save("resized.jpeg", quality=70, exif=exif)
I was wondering is there a way to keep the XMP metadata from the original image? There's a lot of GPS data in the XMP which I would love to keep in the resized version.

Not even the successor Pillow can read XMP (and IPTC). Also you're not really keeping anything - you create a whole new file and add a copy of the other EXIF data (including now potentially invalid information like width/height).
JFIF files aren't black magic - their file format can be parsed rather easily; XMP is most likely found in an APP1 segment (just like EXIF). In rare cases it spans over multiple segments. You could first use PIL/Pillow to create your file and then modify it - inserting additional segments is trivial and needs no additional work.

How can I compress an image in Python?

I'm working on a image transmition project in which my JPEG image must be transfered via LoRa, so there are a lot of limitations.
I'm working transfering the image in small chunks but their actual size aren't good enough and I can't reduce their individual sizes by dividing the image even more cause the time to send each chunk is significative.
So, I'm looking for alternatives to compress the data of these small chunks but didn't found anything in Python that allow me to do this to an Image loaded with Pillow.
Note that I don't want to resize the image, just to compress it's data.
I'm looking for suggestion on how to do this.
Must say that I can change my mind on using Pillow if necessary.
One strange effect that is happening and I don't know why is that I never get a chunk with less than about 600bytes. I need something close to 300 bytes.

Modern image formats such PNG and JPEG are already compressed and my general recommendation is take Brendan Long's advice and use those formats and exploit all the work that's been put into them.
That said, if you want to compress the contents of any arbitrary file in Python, here's a very simple example:
import zlib
with open("MyImage.jpg", "rb") as in_file:
compressed = zlib.compress(in_file.read(), 9)
with open("MyCompressedFile", "wb") as out_file:
out_file.write(compressed)

Reading a .JPG Image and Saving it without file size change

I want to write a python code that reads a .jpg picture, alter some of its RBG components and save it again, without changing the picture size.
I tried to load the picture using OpenCV and PyGame, however, when I tried a simple Load/Save code, using three different functions, the resulting images is greater in size than the initial image. This is the code I used.
>>> import cv, pygame # Importing OpenCV & PyGame libraries.
>>> image_opencv = cv.LoadImage('lena.jpg')
>>> image_opencv_matrix = cv.LoadImageM('lena.jpg')
>>> image_pygame = pygame.image.load('lena.jpg')
>>> cv.SaveImage('lena_opencv.jpg', image_opencv)
>>> cv.SaveImage('lena_opencv_matrix.jpg', image_opencv_matrix)
>>> pygame.image.save(image_pygame, 'lena_pygame.jpg')
The original size was 48.3K, and the resulting are 75.5K, 75.5K, 49.9K.
So, I'm not sure I'm missing something that makes the picture original size changes, although I only made a Load/Save, or not?
And is there a better library to use rather than OpenCV or PyGame ?!

JPEG is a lossy image format. When you open and save one, you’re encoding the entire image again. You can adjust the quality settings to approximate the original file size, but you’re going to lose some image quality regardless. There’s no general way to know what the original quality setting was, but if the file size is important, you could guess until you get it close.

The size of a JPEG output depends on 3 things:
The dimensions of the original image. In your case these are the same for all 3 examples.
The color complexity within the image. An image with a lot of detail will be bigger than one that is totally blank.
The quality setting used in the encoder. In your case you used the defaults, which appear to be higher for OpenCV vs. PyGame. A better quality setting will generate a file that's closer to the original (less lossy) but larger.
Because of the lossy nature of JPEG some of this is slightly unpredictable. You can save an image with a particular quality setting, open that new image and save it again at the exact same quality setting, and it will probably be slightly different in size because of the changes introduced when you saved it the first time.

Convert DICOM to TIFF

I'm new to Python so forgive my ignorance If I don't have all the info correct. I'm trying raster through a directory and convert all the DICOM files within to TIFF files. I have gotten the search functionality to work, but I am having a hard time saving the images as TIFFs. I'm using the pydicom libraries to read in the DICOM and manipulate the header information. Also, I have tried using the save_as function in pydicom to save to TIFF, but I would rather use the save function in PIL to properly set the compression of the TIFF. I think the problem is that I can't/don't understand how to extract the actual image data from a DICOM and place it in a new image.Any Help would be greatly appreciated ... Cheers
Python 2.7
PIL 1.1.7
Pydicom 0.9.6

Found an answer online to the same query sometime back, although I don't remember the site but as I applied it to my code, sharing it here for others as well :
import pylab
import dicom
ImageFile=dicom.read_file(<SourceFilePath>) #Path to "*.dcm" file
pylab.imshow(ImageFile.pixel_array,cmap=pylab.cm.bone) #to view image
or if you want to save the image then instead use:
pylab.imsave('<DestinationFilePath>',ImageFile.pixel_array,cmap=pylab.cm.bone)
The imsave will by default save the image in .png format though. You can specify the desired format in the imsave() if it is supported.
Hope it is useful.

If you know how to use PIL to save image data as .tiff, this example should help you to pass image data from pydicom to PIL (there is more here in the comments).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.