Remove EXIF from Image Before Upload to S3 in Python

I want to remove EXIF data from an image before uploading it to S3. I found a similar question (here), but its solution saves the result as a new file, which I don't want. Then I found another approach (here) and implemented it; everything worked when I tested it. After I deployed to production, however, some users reported that uploads of images of 1 MB and above occasionally fail, so they have to retry several times.
So I just want to make sure: is my code correct, or is there something I can improve?
import io
from PIL import Image
# body comes from the HTTP request
img = Image.open(body)
img_format = img.format
# Save it in memory to remove EXIF
temp = io.BytesIO()
img.save(temp, format=img_format)
body = io.BytesIO(temp.getvalue())
# Upload to S3
s3_client.upload_fileobj(body, BUCKET_NAME, file_key)
Note: I'm still investigating whether this issue is caused by something else.

You should be able to copy the pixel data and palette (if any) from an existing image to a new stripped image like this:
from PIL import Image
# Load existing image
existing = Image.open(...)
# Create new empty image, same size and mode
stripped = Image.new(existing.mode, existing.size)
# Copy pixels, but not metadata, across
stripped.putdata(existing.getdata())
# Copy palette across, if any
if 'P' in existing.mode: stripped.putpalette(existing.getpalette())
Note that this will strip ALL metadata from your image... EXIF, comments, IPTC, 8BIM, ICC colour profiles, dpi, copyright, whether it is progressive, whether it is animated.
Note also that it will write JPEG images with PIL's default quality of 75 when you save it, which may or may not be the same as your original image had - i.e. the size may change.
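Tying this back to the question, here is a minimal in-memory sketch of the "strip everything" approach. The names `strip_all_metadata` and `upload_stripped` are hypothetical, and `s3_client` is assumed to be any object with a boto3-style `upload_fileobj` method. It copies pixels with `Image.frombytes`/`tobytes` rather than `putdata`/`getdata`; that is a swapped-in way of doing the same pixel copy, not the code from the answer above. Only the first frame of an animated image is kept.

```python
import io

from PIL import Image


def strip_all_metadata(fileobj, fallback_format="PNG"):
    """Return a BytesIO holding a re-encoded copy with pixels (and palette) only."""
    existing = Image.open(fileobj)
    image_format = existing.format or fallback_format
    # Rebuild the image from raw pixel bytes so nothing from existing.info comes along
    stripped = Image.frombytes(existing.mode, existing.size, existing.tobytes())
    if existing.mode == "P":
        stripped.putpalette(existing.getpalette())
    out = io.BytesIO()
    stripped.save(out, format=image_format)
    out.seek(0)  # rewind so the uploader reads from the first byte
    return out


def upload_stripped(s3_client, fileobj, bucket, key):
    """Strip metadata, then pass the buffer to an S3-style client (assumed interface)."""
    s3_client.upload_fileobj(strip_all_metadata(fileobj), bucket, key)
```

Rewinding with `seek(0)` avoids the second `BytesIO` copy used in the question; either way the uploader must start reading at position 0.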
If the above stripping is excessive, you could just strip the EXIF like this:
from PIL import Image
im = Image.open(...)
# Strip just EXIF data
if 'exif' in im.info: del im.info['exif']
When saving, you could test if JPEG, and propagate the existing quality forward with:
im.save(..., quality='keep')
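Put together, the EXIF-only variant might look like the sketch below. `strip_exif_only` is a hypothetical helper name, and the behaviour has only been reasoned through for JPEG input; other formats may carry metadata in other places. It uses `pop` instead of `del` so a missing key is not an error.

```python
import io

from PIL import Image


def strip_exif_only(fileobj):
    """Re-encode without EXIF; reuse the JPEG quality settings when possible."""
    im = Image.open(fileobj)
    image_format = im.format
    # Drop the EXIF blob from the metadata Pillow carries on the image
    im.info.pop("exif", None)
    save_kwargs = {}
    if image_format == "JPEG":
        # quality="keep" is only valid for images that were opened as JPEG
        save_kwargs["quality"] = "keep"
    out = io.BytesIO()
    im.save(out, format=image_format, **save_kwargs)
    out.seek(0)  # rewind so a later reader starts at the first byte
    return out
```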
Note: If you want to verify what metadata is in any given image, before and after stripping, you can use exiftool or ImageMagick on macOS, Linux and Windows, as follows:
exiftool SOMEIMAGE.JPG
magick identify -verbose SOMEIMAGE.JPG
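If you would rather check from Python than from the command line, a rough equivalent using Pillow alone could look like this. `summarize_metadata` is a hypothetical helper, and it only reports what Pillow itself exposes (the keys of `Image.info` and the base EXIF tags), which is less than exiftool shows.

```python
import io

from PIL import ExifTags, Image


def summarize_metadata(data: bytes):
    """Return the format, the Image.info keys, and the base EXIF tags of encoded image bytes."""
    with Image.open(io.BytesIO(data)) as im:
        exif = im.getexif()
        # Map numeric tag ids to readable names where Pillow knows them
        exif_tags = {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
        return {"format": im.format, "info_keys": sorted(im.info), "exif": exif_tags}
```

Calling it on the bytes before and after stripping shows which entries disappeared.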

Related

Save JPEG comment using Pillow

I need to save an image in Python (created as a NumPy array) as a JPEG file, while including a "comment" in the file with some specific metadata. This metadata will be used by another (third-party) application and is a simple ASCII string. I have a sample image that includes such a "comment", which I can read out using Pillow (PIL) via the image.info['comment'] or the image.app['COM'] property. However, when I try a simple round trip, i.e. loading my sample image and saving it again under a different file name, the comment is not preserved. Likewise, I found no way to include a comment in a newly created image.
I am aware that EXIF tags are the preferred way to save metadata in JPEG images, but as mentioned, the third-party application only accepts this data as a "comment", not as EXIF, which I cannot change. After reading this question, I looked into the binary structure of my sample file and found the comment at the start of the file, after a few bytes of some other (meta)data. I do however not know a lot about binary file manipulation, and also I was wondering if there is a more elegant way, other than messing with the binary...
EDIT: minimum example:
from PIL import Image
img = Image.open(path) # where path is the path to the sample image
# this prints the desired metadata if it is correctly saved in loaded image
print(img.info["comment"])
img.save(new_path) # save with different file name
img.close()
# now open to see if it has been saved correctly
new_img = Image.open(new_path)
print(new_img.info['comment']) # now results in KeyError
I also tried img.save(new_path, info=img.info), but this does not seem to have any effect. Since img.info['comment'] appears identical to img.app['COM'], I also tried img.save(new_path, app=img.app), which does not work either.

Just been having a play with this and I couldn't see anything directly in Pillow to support this. I've found that the save() method supports a parameter called extra that can be used to pass arbitrary bytes to the output file.
We then just need a simple method to turn a comment into a valid JPEG segment, for example:
import struct
from PIL import Image
def make_jpeg_variable_segment(marker: int, payload: bytes) -> bytes:
    "make a JPEG segment from the given payload"
    return struct.pack('>HH', marker, 2 + len(payload)) + payload
def make_jpeg_comment_segment(comment: bytes) -> bytes:
    "make a JPEG comment/COM segment"
    return make_jpeg_variable_segment(0xFFFE, comment)
# open source image
with Image.open("foo.jpeg") as im:
    # save out with new JPEG comment
    im.save('bar.jpeg', extra=make_jpeg_comment_segment("hello world".encode()))
# read file back in to ensure comment round-trips
with Image.open('bar.jpeg') as im:
    print(im.app['COM'])
    print(im.info['comment'])
Note that in my initial attempts I tried appending the comment segment at the end of the file, but Pillow wouldn't load this comment even after calling the .load() method to force it to load the entire JPEG file.
Update: Pillow 9.4.0 and later support this directly through a comment parameter when saving, e.g.:
with Image.open("foo.jpeg") as im:
    im.save('bar.jpeg', comment="hello world")
Hopefully that makes things easier!

Only one image from 5 is downloaded and it knocks out an error

import requests
from PIL import Image
url_shoes_for_choice = [
"https://content.adidas.co.in/static/Product-CM7531/Unisex_OUTDOOR_SANDALS_CM7531_1.jpg",
"https://cdn.shopify.com/s/files/1/0080/1374/2161/products/product-image-897958210_640x.jpg?v=1571713841",
"https://cdn.chamaripashoes.com/media/catalog/product/cache/9/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_8_3.jpg",
"https://ae01.alicdn.com/kf/HTB1EyKjaI_vK1Rjy0Foq6xIxVXah.jpg_q50.jpg",
"https://www.converse.com/dw/image/v2/BCZC_PRD/on/demandware.static/-/Sites-cnv-master-catalog/default/dwb9eb8c43/images/a_107/167708C_A_107X1.jpg"
]
def img():
    for url in url_shoes_for_choice:
        image = requests.get(url, stream=True).raw
        out = Image.open(image)
        out.save('image/image.jpg', 'jpg')
if __name__=="__main__":
    img()
Error:
OSError: cannot identify image file <_io.BytesIO object at 0x7fa185c52d58>

The problem seems to be the byte data returned by requests.get(url, stream=True).raw for one of the images. I'm not sure, but my guess is that the data for the third image is invalid, so instead of reading the raw stream we can fetch the content and wrap it in BytesIO.
I also fixed one more thing in your original code: I added numbering to the file names so that each image is saved under a different name.
from io import BytesIO
def img():
    for count, url in enumerate(url_shoes_for_choice):
        image = requests.get(url, stream=True)
        with BytesIO(image.content) as f:
            with Image.open(f) as out:
                # out.show() # See the images
                out.save('image/image{}.jpg'.format(count))
(This works fine, but I'm not sure what the root cause was. If anyone knows exactly what the issue is, please comment and explain.)

I opened the first link in my browser and saved the image. It's actually a webp file.
$ file Unisex_OUTDOOR_SANDALS_CM7531_1.webp
Unisex_OUTDOOR_SANDALS_CM7531_1.webp: RIFF (little-endian) data, Web/P image, VP8 encoding, 500x500, Scaling: [none]x[none], YUV color, decoders should clamp
You explicitly tell the image library that it should expect a jpg. When you remove that parameter and let it figure it out on its own using out.save('image/image.jpg') the first image successfully downloads for me.
The first two images work this way if you make sure to save each under a different name:
def img():
    i = 0
    for url in url_shoes_for_choice:
        i += 1
        image = requests.get(url, stream=True).raw
        out = Image.open(image)
        out.save('image{}.jpg'.format(i))
The third is a valid JPEG file, as is the fourth, but using the JFIF standard 1.01, which I'm hearing of for the first time. I'm pretty sure you'll have to figure out support for such different file types.
It is worth noting that if I download the images in Chrome and open those with Python, nothing fails. So Chrome might be adding information to the file.
The Pillow documentation explains here that you need a recent enough version for animated images, but that is not your problem.
Support for animated WebP files will only be enabled if the system
WebP library is v0.5.0 or later. You can check webp animation support
at runtime by calling features.check("webp_anim").
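Combining the two answers, one way to make the loop robust is to fetch the decoded body, let Pillow identify the real format, and name the file accordingly. The sketch below rests on assumptions: `filename_for`, `download_images`, and the `EXTENSIONS` table are hypothetical, and the 10-second timeout is arbitrary. A possible factor, not confirmed in this thread, is that `response.raw` yields the body before any content encoding such as gzip is decoded, whereas `response.content` is already decoded.

```python
import io
import os

from PIL import Image

# Extensions for a few common formats; anything else falls back to the lowercased format name
EXTENSIONS = {"JPEG": "jpg", "PNG": "png", "WEBP": "webp", "GIF": "gif"}


def filename_for(data: bytes, stem: str) -> str:
    """Identify the image format from the bytes and return a matching file name."""
    with Image.open(io.BytesIO(data)) as im:
        image_format = im.format
    return f"{stem}.{EXTENSIONS.get(image_format, image_format.lower())}"


def download_images(urls, directory="image"):
    """Fetch each URL and save it; a failing URL is reported instead of aborting the loop."""
    import requests  # imported here so filename_for stays usable without requests installed

    os.makedirs(directory, exist_ok=True)
    saved = []
    for count, url in enumerate(urls):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            name = filename_for(response.content, f"image{count}")
            path = os.path.join(directory, name)
            with open(path, "wb") as fh:
                fh.write(response.content)  # store the bytes untouched, no re-encoding
            saved.append(path)
        except (requests.RequestException, OSError) as exc:
            print(f"skipped {url}: {exc}")
    return saved
```

Writing the downloaded bytes unchanged means a WebP stays a WebP on disk; call `Image.save` with an explicit format instead if every file must end up as JPEG.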

Image does not change when adding an alpha channel

The pillow package has a method called Image.putalpha() which is used to add or change the alpha channel of an image.
I experimented with this method and found that I cannot change the background color of an image.
This is my code to add an alpha channel to the original image:
from PIL import Image
im_owl = Image.open("owl.jpg")
alpha = Image.new("L", im_owl.size, 50)
im_owl.putalpha(alpha)
im_owl.show()
The resulting image looks no different from the original. I have tried different alpha values and see no difference.
What could have been wrong?

Try saving the image and then viewing the saved file.
I also cannot see the change when displaying the image directly with
im_owl.show()
but after saving it with
im_owl.save(...)
I can see that the image has changed.

Try using
im_owl.save("alphadOwl.png")
And then view the saved image. It would seem that the alpha channel isn't applied to bmp or jpg files. It is a bmp file that gets displayed with im.show()
(For the record, I'm on a mac, I don't know if im.show() uses different applications on other devices).
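To see the effect without depending on what the viewer does, one option is to save as PNG (which stores alpha) or to flatten the image onto a solid background before showing it. A minimal sketch, with hypothetical helper names and a white background chosen arbitrarily:

```python
import io

from PIL import Image


def add_uniform_alpha(im, alpha_value):
    """Return an RGBA copy of im whose alpha channel is alpha_value everywhere."""
    rgba = im.convert("RGBA")
    rgba.putalpha(Image.new("L", rgba.size, alpha_value))
    return rgba


def preview_over_background(rgba, color=(255, 255, 255, 255)):
    """Flatten onto an opaque background so viewers without alpha support show the effect."""
    background = Image.new("RGBA", rgba.size, color)
    return Image.alpha_composite(background, rgba).convert("RGB")


def png_bytes(rgba):
    """Encode as PNG, a format that stores the alpha channel."""
    out = io.BytesIO()
    rgba.save(out, format="PNG")
    return out.getvalue()
```

For example, `preview_over_background(add_uniform_alpha(im_owl, 50)).show()` should display a visibly faded image even when the temporary file is a BMP.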

As @sanyam and @Pam have pointed out, saving the converted image shows it correctly. This is because on Windows the image is saved as a temporary BMP file before it is shown in the system default image viewer, as per the PIL documentation:
Image.show(title=None, command=None)
Displays this image. This method is mainly intended for debugging purposes.
On Unix platforms, this method saves the image to a temporary PPM file, and calls
either the xv utility or the display utility, depending on which one can be found.
On macOS, this method saves the image to a temporary BMP file, and opens it with
the native Preview application.
On Windows, it saves the image to a temporary BMP file, and uses the standard BMP
display utility to show it (usually Paint).
To fix this issue, we can patch the Pillow code to use PNG format as default. First, we need to find the root of Pillow package:
import PIL
print(PIL.__path__)
On my system, the output is:
['D:\Anaconda\lib\site-packages\PIL']
Go to this directory and open the file ImageShow.py. I added the following code after the line register(WindowsViewer):
class WindowsPNGViewer(Viewer):
    format = "PNG"
    def get_command(self, file, **options):
        return ('start "Pillow" /WAIT "%s" '
                '&& ping -n 2 127.0.0.1 >NUL '
                '&& del /f "%s"' % (file, file))
register(WindowsPNGViewer, -1)
After that, I can show the image with alpha channel correctly.
References
https://github.com/python-pillow/Pillow/issues/3695
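The same viewer can probably be registered from your own script instead of editing the installed ImageShow.py, which would otherwise be overwritten on the next Pillow upgrade. This is a sketch under that assumption; it reuses the command string from the patch above and only registers the viewer on Windows.

```python
import sys

from PIL import ImageShow


class WindowsPNGViewer(ImageShow.Viewer):
    """Viewer that hands a temporary PNG (which keeps alpha) to the default Windows app."""

    format = "PNG"

    def get_command(self, file, **options):
        return (
            f'start "Pillow" /WAIT "{file}" '
            "&& ping -n 2 127.0.0.1 >NUL "
            f'&& del /f "{file}"'
        )


if sys.platform == "win32":
    # A negative order puts this viewer ahead of the ones Pillow registered itself
    ImageShow.register(WindowsPNGViewer, -1)
```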

How to get Pillow to make identical copies (edit EXIF in-line)

You would think it's quite simple, but the following code doesn't work as I would expect:
from hashlib import md5
from PIL import Image
im = Image.open("/tmp/original.jpg")
im.save("/tmp/new.jpg", quality="keep")
original = Image.open("/tmp/original.jpg")
new = Image.open("/tmp/new.jpg")
assert md5(original.tobytes()).hexdigest() == md5(new.tobytes()).hexdigest()
Why is it that when I'm simply saving an image as a new file, and keeping the quality settings the same, that the image data isn't identical? What am I missing?
Update (Explanation):
My problem is that I have a Pillow JpegImage object being handed to my code as part of a pipeline and I don't have control over the step at which the file is saved to disk:
<magic> → <my code> → <magic that saves to disk>
All I want my code to do is add/update/replace (any of these) the EXIF data for the to-be-saved JPEG image. As this info doesn't appear to be editable on an image object, the only way I can figure out to do this is to save the image to a temporary place (like BytesIO) and then re-open it with Image.open() before passing it to the next function in the chain.
Please tell me that there's a smarter, more efficient way to do this?
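For reference, the BytesIO workaround described above could be sketched as follows. `with_exif_tag` is a hypothetical helper, and it assumes the incoming image was opened from a JPEG, since `quality="keep"` requires that. Whether the later save step in the pipeline writes the EXIF back out depends on how it calls `save()`, which is outside this sketch. Note also that JPEG is lossy, so decoding and re-encoding is not guaranteed to reproduce identical pixel data even with the quality settings kept.

```python
import io

from PIL import Image


def with_exif_tag(im, tag_id, value):
    """Return a re-opened JPEG copy of im that carries the given base EXIF tag."""
    exif = im.getexif()
    exif[tag_id] = value
    buffer = io.BytesIO()
    # quality="keep" requires im to have been opened from a JPEG
    im.save(buffer, format="JPEG", quality="keep", exif=exif.tobytes())
    buffer.seek(0)
    return Image.open(buffer)
```

For example, `with_exif_tag(im, 0x010F, "ExampleMake")` sets the Make tag (id 0x010F) and returns a fresh JPEG image object.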

Django: Download Image from URL, Resize it, append "small" to the end of the filename

I'm looking for a way to download a 640x640 image from a URL, resize the image to 180x180 and append the word small to the end of the resized image filename.
For example, the image is located at this link
http://0height.com/wp-content/uploads/2013/07/18-japanese-food-instagram-1.jpg
Once resized, I would like to append the word small to the end of the filename like so:
18-japanese-food-instagram-1small.jpeg
How can this be done? Also will the downloaded image be saved to memory or will it save to the actual drive? If it does save to the drive, is it possible to delete the original image and keep the resized version?

Why don't you try urllib?
from urllib.request import urlretrieve
urlretrieve("http://0height.com/wp-content/uploads/2013/07/18-japanese-food-instagram-1.jpg", "18-japanese-food-instagram-1.jpg")
Then, to resize this you can use PIL (Pillow) or another library
from PIL import Image
im1 = Image.open("18-japanese-food-instagram-1.jpg")
# Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
im_small = im1.resize((180, 180), Image.LANCZOS)
im_small.save("18-japanese-food-instagram-1_small.jpg")
References:
http://www.daniweb.com/software-development/python/code/216637/resize-an-image-python
Downloading a picture via urllib and python
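On the memory-versus-disk part of the question: the download can stay in memory so that only the resized copy is ever written. A sketch along those lines, with hypothetical helper names and an arbitrary 10-second timeout; it keeps the original file extension rather than switching to .jpeg.

```python
import io
import os
from urllib.parse import urlsplit
from urllib.request import urlopen

from PIL import Image


def small_name(url, suffix="small"):
    """Derive '<stem>small<ext>' from the last path component of the URL."""
    stem, extension = os.path.splitext(os.path.basename(urlsplit(url).path))
    return f"{stem}{suffix}{extension}"


def resize_image_bytes(data, size=(180, 180)):
    """Resize encoded image bytes in memory and return the re-encoded bytes."""
    with Image.open(io.BytesIO(data)) as im:
        image_format = im.format
        resized = im.resize(size, Image.LANCZOS)
    out = io.BytesIO()
    resized.save(out, format=image_format)
    return out.getvalue()


def download_small(url, directory="."):
    """Fetch the image, resize it, and write only the resized copy to disk."""
    with urlopen(url, timeout=10) as response:
        data = response.read()
    path = os.path.join(directory, small_name(url))
    with open(path, "wb") as fh:
        fh.write(resize_image_bytes(data))
    return path
```

Since the original bytes never touch the disk, there is nothing to delete afterwards.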
