Image Cropping Tool (Python)

I'm a film photographer who deals a lot with cropping/image resizing. Because I shoot film, I have to scan my negatives and crop each frame out of the batch scan. My scanner scans four strips of six images each (24 frames/crops per scan).
A friend of mine wrote me a Python script that automatically crops images based on coordinates I enter. The script works well, but there are problems with the file format of the exported images.
From the scan, each frame should produce a 37 MB TIFF at 240 DPI (which is what I get when I crop and export in Adobe Lightroom). Instead, the Cropper outputs a 13 MB TIFF at 72 DPI.
Terminal (I'm on Mac) warns me about a "Decompression Bomb" whenever I run the Cropper. My friend is stumped and suggested I ask Stack Overflow.
I've no Python experience. I can provide the code he wrote and the commands Terminal gives me.
Thoughts?
This would be greatly appreciated and a huge HUGE timesaver.
THANK YOU!
ERROR MESSAGE: /Library/Python/2.7/site-packages/PIL/Image.py:2192: DecompressionBombWarning: Image size (208560540 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.

PIL is merely trying to protect you. It'll not open larger images, as that could be a vector of attack for a malicious user to give you a large image that'll expand to use up all memory. Quoting from the PIL.Image.open() documentation:
Warning: To protect against potential DOS attacks caused by “decompression bombs” (i.e. malicious files which decompress into a huge amount of data and are designed to crash or cause disruption by using up a lot of memory), Pillow will issue a DecompressionBombWarning if the image is over a certain limit.
Since you are not a malicious user and are not accepting images from anyone else, you can simply disable the limit:
from PIL import Image
Image.MAX_IMAGE_PIXELS = None
Setting Image.MAX_IMAGE_PIXELS to None disables the check altogether. You can also set it to a (high) integer value; the default is 1024 * 1024 * 1024 // 4 // 3, nearly 90 million pixels, or about 250MB of uncompressed data for a 3-channel image.
Note that for PIL versions up to 4.3.0, by default, all that happens is that a warning is issued. You could also disable the warning:
import warnings
from PIL import Image
warnings.simplefilter('ignore', Image.DecompressionBombWarning)
Inversely, if you want to prevent such images from being loaded altogether, turn the warning into an exception:
import warnings
from PIL import Image
warnings.simplefilter('error', Image.DecompressionBombWarning)
and you can then expect the Image.DecompressionBombWarning object to be raised as an exception whenever you pass an image in that would otherwise demand a lot of memory.
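The two filters above are plain standard-library warnings filters, so their effect can be tried without a huge image. What follows is only a sketch: FakeBombWarning and load_large_image are hypothetical stand-ins for Image.DecompressionBombWarning and Image.open(), used so the example runs without Pillow. It also wraps the filter in warnings.catch_warnings() so the change stays local instead of applying to the whole program.

```python
import warnings


class FakeBombWarning(Warning):
    """Hypothetical stand-in for PIL.Image.DecompressionBombWarning."""


def load_large_image():
    # Hypothetical stand-in for Image.open() on an oversized image.
    warnings.warn("image exceeds pixel limit", FakeBombWarning)
    return "image data"


def try_load(action):
    """Call load_large_image() with the given filter action, scoped locally."""
    with warnings.catch_warnings():
        warnings.simplefilter(action, FakeBombWarning)
        try:
            return load_large_image()
        except FakeBombWarning as exc:
            return "rejected: {}".format(exc)


print(try_load("ignore"))  # warning suppressed, the load goes ahead
print(try_load("error"))   # warning raised as an exception and caught
```

With real Pillow code the same pattern should apply by substituting Image.DecompressionBombWarning for the stand-in class.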
As of PIL v5.0.0 (released Jan 2018), images that use more than twice the number of pixels set in MAX_IMAGE_PIXELS will result in a PIL.Image.DecompressionBombError exception.
Note that these checks also apply to the Image.crop() operation (you can create a larger image by cropping), and you need to use PIL version 6.2.0 or newer (released in October 2019) if you want to benefit from this protection when working with GIF or ICO files.
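To see where the scan from the question falls relative to these thresholds, here is a small arithmetic sketch. The classify helper is hypothetical and only models the rules described above (warning above the limit, error above twice the limit in v5.0.0 and later); it is not Pillow's own code.

```python
# Default limit quoted above.
DEFAULT_MAX_IMAGE_PIXELS = 1024 * 1024 * 1024 // 4 // 3


def classify(pixels, limit=DEFAULT_MAX_IMAGE_PIXELS):
    """Return 'ok', 'warning' or 'error' for an image with `pixels` pixels."""
    if limit is None:
        return "ok"          # check disabled
    if pixels > 2 * limit:
        return "error"       # DecompressionBombError in v5.0.0 and later
    if pixels > limit:
        return "warning"     # DecompressionBombWarning
    return "ok"


SCAN_PIXELS = 208560540  # pixel count from the warning in the question

print(DEFAULT_MAX_IMAGE_PIXELS)
print(classify(SCAN_PIXELS))
print(classify(SCAN_PIXELS, limit=None))
print(classify(SCAN_PIXELS, limit=250000000))
```

Under this model the 208560540-pixel scan is more than twice the default limit, so on v5.0.0 or later it would be rejected outright unless the limit is raised or disabled first.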

From the Pillow docs:
Warning: To protect against potential DOS attacks caused by "decompression bombs" (i.e. malicious files which decompress into a huge amount of data and are designed to crash or cause disruption by using up a lot of memory), Pillow will issue a DecompressionBombWarning if the image is over a certain limit. If desired, the warning can be turned into an error with warnings.simplefilter('error', Image.DecompressionBombWarning) or suppressed entirely with warnings.simplefilter('ignore', Image.DecompressionBombWarning). See also the logging documentation to have warnings output to the logging facility instead of stderr.

Related

Strange behavior in pyvips, impossible to write some images

I'm currently trying to make pyvips work for a project where I need to manipulate images of a "big but still sensible" size, between 1920x1080 and 40000x40000.
The installation worked well, but these two lines sometimes work and sometimes don't.
import pyvips
img = pyvips.Image.new_from_file('global_maps/MapBigBig.png')
img.write_to_file('global_maps/MapTest.png')
It seems that for the biggest images, I get the following error message when writing back the image (the loading works fine):
pyvips.error.Error: unable to call VipsForeignSavePngFile
pngload: arithmetic overflow
vips2png: unable to write to target global_maps/MapFishermansRowHexTest.png
I say it seems, because the following lines work perfectly well (with a size of 100 000 x 100 000, far bigger than the problematic images):
size = 100000
test = pyvips.Image.black(size, size, bands=3)
test.write_to_file('global_maps/Test.png')
I could not find an answer anywhere. Do you have any idea what I'm doing wrong?
EDIT:
Here is a link to an image that does not work (it weighs 102 MB).
This image was created using pyvips from an image 40 times smaller, this way:
img = pyvips.Image.new_from_file('global_maps/MapNormal.png')
out = img.resize(40, kernel='linear')
out.write_to_file('global_maps/MapBigBig.png')
And it can be read using paint3D or gimp.
I found your error message in libspng:
https://github.com/randy408/libspng/blob/master/spng/spng.c#L5989
It looks like it's being triggered if the decompressed image size would go over your process pointer size. If I try a 32-bit libvips on Windows I see:
$ ~/w32/vips-dev-8.12/bin/vips.exe copy MapFishermansRowHexBigBig.png x2.png
pngload: arithmetic overflow
vips2png: unable to write to target x2.png
But a 64-bit libvips on Windows works fine:
$ ~/vips-dev-8.12/bin/vips.exe copy MapFishermansRowHexBigBig.png x.png
$
So I think switching to a 64-bit libvips build would probably fix your problem. You'll need a 64-bit python too, of course.
I also think this is probably a libspng bug (or misfeature?) since you can read >4gb images on a 32-bit machine as long as you don't try to read them all in one go (libvips reads in chunks, so it should be fine). I'll open an issue on the libspng repo.
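A quick way to check which kind of Python you are running, and whether a given image would fit in a 32-bit address range once decompressed, is sketched below. The helper names are made up for illustration, and the 40000x40000 size with 3 bands is taken from the upper bound mentioned in the question, not from the actual failing file.

```python
import struct


def pointer_bits():
    """Pointer size of the running Python interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def fits_in_address_space(width, height, bands, bits):
    """True if the uncompressed 8-bit image is smaller than 2**bits bytes."""
    return width * height * bands < 2 ** bits


print(pointer_bits())
print(fits_in_address_space(40000, 40000, 3, 32))  # 4.8e9 bytes: too big
print(fits_in_address_space(40000, 40000, 3, 64))
print(fits_in_address_space(1920, 1080, 3, 32))
```

If pointer_bits() reports 32, the interpreter (and therefore the libvips it loads) is a 32-bit build.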

need help creating Jpeg Generational Degradation code

I am currently writing generation-loss code for .jpeg images.
Theory: .jpg is a lossy compression format (for the most part), i.e. every time the image is converted to .jpg, some contents/data of the original image are lost in the process. This results in smaller files, but due to the loss of data the image is of lower quality than the original. In most use cases, the degradation in quality is negligible. But if this process is carried out many times, the pixel data of the image gets compressed (lost) so many times that we end up with just random noise.
I have tried doing it with PIL and cv2, but had no success.
What I tried: opening the image (let's say an image in .png format) and converting it to .jpg, then converting the image (which is now a .jpg) back to .png, so that the aforementioned process can be carried out several times.
My reasoning behind this is, since we are converting the original image into a jpeg, some data should be lost.
I am displaying the image using cv2.imshow() because the window stays active until destroyed explicitly, or until a cv2.destroyWindow()/cv2.destroyAllWindows() call is encountered.
I expected the image to show up, and its quality to gradually decrease as the program goes by, but for some reason the image stays the same. So, I am expecting someone to help me create the code from scratch (as my current efforts are in vain).
P.S.: The reason I didn't post any code is that it's more of a bodge than anything concrete, and does nothing towards achieving the objective. Uploading it would only waste others' time analysing it.
The flaw in your theory is here:
every time the image is converted to .jpg some contents/data of the original image is lost in the process.
If you have already converted to JPEG and recompress with the same settings, you might not lose data.
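A toy model may make this clearer. JPEG's main lossy step is quantization: values are snapped to multiples of a step size. The sketch below (Python 3, with a hypothetical quantize helper and made-up sample values) is not JPEG itself and ignores colour conversion and pixel rounding, but it shows why snapping again with the same step changes nothing, while a different step does.

```python
def quantize(values, step):
    """Snap each value to the nearest multiple of `step` (the lossy part)."""
    return [round(v / step) * step for v in values]


original = [13, 27, 101]          # made-up sample values
first = quantize(original, 10)    # first "save": data is lost here
second = quantize(first, 10)      # second "save", same settings
other = quantize(first, 7)        # second "save", different settings

print(first)
print(second == first)  # same settings: nothing more to lose
print(other == first)   # different settings: values move again
```

By the same reasoning, a degradation loop would probably need to vary something between generations (quality setting, image size, a small crop or shift) for the loss to keep accumulating.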

matplotlib animation.save to animated gif very slow

I'm animating a convergence process that I'm simulating in an IPython 3.1 notebook. I'm visualizing the scatter plot result in a matplotlib animation, which I'm writing out to an animated gif via ImageMagick. There are 3000 frames, each with about 5000 points.
I'm not sure exactly how matplotlib creates these animation files, but it appears to cache up a bunch of frames and then write them out all together-- when I look at the CPU usage, it's dominated by python in the beginning and then by convert at the end.
Writing out the gif is happening exceedingly slowly. It's taking more than an hour to write out a 70MB file to an SSD on a modern MacBook Pro. 'convert' is taking the equivalent of 90% of one core on a 4-core (8 hyperthread) machine.
It takes about 15 minutes to write the first 65MB, and over 2 hours to write the last 5MB.
I think the interesting bits of the code follow-- if there's something else that would be helpful, let me know.
def updateAnim(i,cg,scat,mags):
    if mags[i]==0: return scat,
    cg.convergeStep(mags[i])
    scat.set_offsets(cg._chrgs[::2,0:2])
    return scat,
fig=plt.figure(figsize=(6,10))
plt.axis('equal')
plt.xlim(-1.2,1.2);plt.ylim(-1,3)
c=np.where(co._chrgs[::2,3]>0,'blue','red')
scat=plt.scatter(co._chrgs[::2,0],co._chrgs[::2,1],s=4,color=c,marker='o',alpha=0.25);
ani=animation.FuncAnimation(fig,updateAnim,frames=mags.size,fargs=(co,scat,mags),blit=True);
ani.save('Files/Capacitance/SpherePlateAnimation.gif',writer='imagemagick',fps=30);
Any idea what the bottleneck might be or how I might speed it up? I'd prefer the write out time be small compared to simulation time.
Version: ImageMagick 6.9.0-0 Q16 x86_64 2015-05-30 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules
Delegates (built-in): bzlib cairo djvu fftw fontconfig freetype gslib gvc jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png ps rsvg tiff webp wmf x xml zlib
ps -aef reports:
convert -size 432x720 -depth 8 -delay 3.3333333333333335 -loop 0 rgba:- Files/Capacitance/SpherePlateAnimation.gif
Update
Please read the original answer below before doing anything suggested in this update.
If you want to debug this in some depth, you could separate the ImageMagick part out and identify for sure where the issue is. To do that, I would locate your ImageMagick convert program like this:
which convert # result may be "/usr/local/bin/convert"
and then go to the containing directory, e.g.
cd /usr/local/bin
Now save the original convert program as convert.real - you can always change it back again later by reversing the last two parameters below:
mv convert convert.real
Now, save the following file as convert
#!/bin/bash
dd bs=128k > $HOME/plot.rgba 2> /dev/null
and make that executable by doing
chmod +x convert
Now, when you run matplotlib again, it will execute the script above rather than ImageMagick, and the script will save the raw RGBA data in your login directory in a file called plot.rgba. This will then tell you two things... firstly you will see if matplotlib now runs faster as there is no longer any ImageMagick processing, secondly you will see if the filesize is around 4GB like I am guessing.
Now you can use ImageMagick to process the file after matplotlib is finished using this, with a 10GB memory limit:
convert.real -limit memory 10GiB -size 432x720 -depth 8 -delay 3.33 -loop 0 $HOME/plot.rgba Files/Capacitance/SpherePlateAnimation.gif
You could also consider splitting the file into 2 (or 4), using dd and processing the two halves in parallel and appending them together to see if that helps. Ask, if you want to investigate that option.
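As a rough illustration of the splitting idea, done in Python rather than with dd, the sketch below cuts a raw RGBA dump into parts that each contain a whole number of frames. The function name, the ".partN" output naming and the default of 4 bytes per pixel are assumptions for the example; the frame size would be the 432x720 from the convert command line.

```python
import os


def split_raw_frames(path, width, height, parts=2, bytes_per_pixel=4):
    """Split a raw frame dump into `parts` files on frame boundaries."""
    frame_bytes = width * height * bytes_per_pixel
    total = os.path.getsize(path)
    if total % frame_bytes:
        raise ValueError("file size is not a whole number of frames")
    frames = total // frame_bytes
    per_part = -(-frames // parts)  # ceiling division
    outputs = []
    with open(path, "rb") as src:
        for index in range(parts):
            count = min(per_part, frames - index * per_part)
            if count <= 0:
                break
            out_path = "{}.part{}".format(path, index)
            with open(out_path, "wb") as dst:
                # Copy one frame at a time to keep memory use small.
                for _ in range(count):
                    dst.write(src.read(frame_bytes))
            outputs.append(out_path)
    return outputs
```

For example, split_raw_frames(os.path.expanduser("~/plot.rgba"), 432, 720) would write plot.rgba.part0 and plot.rgba.part1, each of which could then be fed to convert.real separately.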
Original Answer
I am kind of speaking out loud here in the hope that it either helps you directly or it jogs someone else's brain into grasping the problem...
It seems from the commandline you have shared that matplotlib is writing directly to the stdin of ImageMagick's convert tool - I can see that from the RGBA:- parameter that tells me it is sending RGB plus Alpha transparency as raw values on stdin.
That means that there are no intermediate files that I can suggest placing on a RAM-disk, which is where I was heading to with my comment...
The second thing is that, as the raw pixel data is being sent, every single pixel is computed and sent by matplotlib, so the amount of data is independent of the 5,000 points in your simulation, and there is no point in reducing or optimising the number of points.
Another thing to note is that you are using the 16-bit quantisation version of ImageMagick (Q16 in your version string). That effectively doubles the memory requirement, so if you can easily recompile ImageMagick for an 8-bit quantum depth, that may help.
Now, let's look at that input stream, RGBA -depth 8 means 4 bytes per pixel and 432x720 pixels per frame, or 1.2MB per frame. Now, you have 3,000 frames, so that makes 3.6GB minimum, plus the output file of 75MB. I suspect that this is just over the limit of ImageMagick's natural memory limit and that is why it slows down at the end, so my suggestion would be to check the memory limits on ImageMagick and consider increasing them to 4GB-6GB or more if you have it.
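The estimate above can be reproduced with a few lines of arithmetic. The numbers come from the convert command line and the frame count in the question; the 2 GiB figure is the memory limit shown in this answer's example listing, so your own limit may differ.

```python
WIDTH, HEIGHT = 432, 720   # from "-size 432x720"
BYTES_PER_PIXEL = 4        # RGBA at "-depth 8"
FRAMES = 3000              # frame count from the question
GIB = 1024 ** 3

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = frame_bytes * FRAMES

print(frame_bytes)                  # bytes per frame, about 1.2MB
print(total_bytes)                  # bytes for the whole stream
print(round(total_bytes / GIB, 2))  # the same figure in GiB
print(total_bytes > 2 * GIB)        # exceeds a 2 GiB memory limit?
```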
To check the memory and other resource limits:
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 2GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
As you cannot raise the memory limits on the commandline that matplotlib executes, you could do it via an environment variable that you export prior to starting matplotlib like this:
export MAGICK_MEMORY_LIMIT=4294967296
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 4GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
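If matplotlib is started from a notebook rather than a shell, the same variable can be set from Python before calling ani.save(), since child processes inherit the parent's environment. This is a sketch only: it uses a throwaway child Python process in place of convert to show that the value is passed on, and whether ImageMagick honours it depends on your installation.

```python
import os
import subprocess
import sys

# Same value as the export above: 4 GiB in bytes.
os.environ["MAGICK_MEMORY_LIMIT"] = str(4 * 1024 ** 3)

# Stand-in for convert: a child process that reports what it inherited.
child_view = subprocess.check_output(
    [sys.executable, "-c",
     "import os; print(os.environ.get('MAGICK_MEMORY_LIMIT'))"]
).decode().strip()

print(child_view)
```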
You can also change it in your policy.xml file, but that is more involved, so please try this way initially and ask if you get stuck!
Please pass feedback on this as I may be able to suggest other things depending on whether this works. Please also run identify -list configure and edit your question and paste the output there.
Mark Setchell's answer provides a good explanation of what goes wrong, but exporting a higher memory limit did not work for me. I recommend changing policy.xml instead; the only tricky part is finding where it is saved on your system. This answer lists some locations, but it is most probably at
/etc/ImageMagick-6/policy.xml
Open it, and edit the line that says
<policy domain="resource" name="memory" value="256MiB"/>
to something like
<policy domain="resource" name="memory" value="4GiB"/>
Or any limit that is above the size of the unprocessed gif. If it is not, the convert subprocess will be killed.

Python: Manipulating a 16-bit .tiff image in PIL &/or pygame: convert to 8-bit somehow?

Hello all,
I am working on a program which determines the average colony size of yeast from a photograph, and it is working fine with the .bmp images I tested it on. The program uses pygame, and might use PIL later.
However, the camera/software combo we use in my lab will only save 16-bit grayscale TIFFs, and pygame does not seem to be able to recognize 16-bit TIFFs, only 8-bit. I have been reading up for the last few hours on easy ways around this, but even the Python Imaging Library does not seem to be able to work with 16-bit TIFFs: I've tried, and I get "IOError: cannot identify image file".
import Image
img = Image.open("01 WT mm.tif")
My ultimate goal is to have this program be user-friendly and easy to install, so I'm trying to avoid adding additional modules or requiring people to install ImageMagick or something.
Does anyone know a simple workaround to this problem using freeware or pure python? I don't know too much about images: bit-depth manipulation is out of my scope. But I am fairly sure that I don't need all 16 bits, and that probably only around 8 actually have real data anyway. In fact, I once used ImageMagick to try to convert them, and this resulted in an all-white image: I've since read that I should use the command "-auto-levels" because the data does not actually encompass the 16-bit range.
I greatly appreciate your help, and apologize for my lack of knowledge.
P.S.: Does anyone have any tips on how to make my Python program easy for non-programmers to install? Is there a way, for example, to somehow bundle it with Python and pygame so it's only one install? Can this be done for both Windows and Mac? Thank you.
EDIT: I tried to open it in GIMP, and got 3 errors:
1) Incorrect count for field "DateTime" (27, expecting 20); tag trimmed
2) Sorry, can not handle images with 12-bit samples
3) Unsupported layout, no RGBA loader
What does this mean and how do I fix it?
py2exe is the way to go for packaging up your application if you are on a Windows system.
Regarding the 16bit tiff issue:
This example http://ubuntuforums.org/showthread.php?t=1483265 shows how to convert for display using PIL.
Now for the unasked portion of the question: when doing image analysis, you want to maintain the highest dynamic range possible for as long as possible in your image manipulations, because you lose less information that way. As you may or may not be aware, PIL provides many filters/transforms that allow you to enhance the contrast of an image, even out light levels, or perform edge detection. A future direction you might want to consider is displaying the original image (scaled to 8 bit, of course) alongside a scaled image that has been processed for edge detection.
Check out http://code.google.com/p/pyimp/wiki/screenshots for some more examples and sample code.
I would look at pylibtiff, which has a pure python tiff reader.
For bundling, your best bet is probably py2exe and py2app.
This is actually a 2 part question:
1) 16 bit image data mangling for Python - I usually use GDAL + Numpy. This might be a bit too much for your requirements, you can use PIL + Numpy instead.
2) Release engineering Python apps can get messy. Depending on how complex your app is you can get away with py2deb, py2app and py2exe. Learning distutils will help too.
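For the bit-depth part, once the samples are in a NumPy array, reducing them to 8 bits is a linear stretch between the smallest and largest value actually present, which is roughly what the "-auto-levels" idea from the question does. A minimal sketch, assuming the pixel data has already been read into an array by whichever reader ends up working (the function name and the tiny sample frame are made up):

```python
import numpy as np


def stretch_to_uint8(samples):
    """Linearly map the occupied value range of `samples` onto 0..255."""
    data = np.asarray(samples, dtype=np.float64)
    low, high = data.min(), data.max()
    if high == low:
        # Flat image: nothing to stretch.
        return np.zeros(data.shape, dtype=np.uint8)
    scaled = (data - low) * 255.0 / (high - low)
    return np.round(scaled).astype(np.uint8)


# Made-up 2x2 frame with 12-bit data stored in 16-bit samples.
frame = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)
print(stretch_to_uint8(frame))
```

Stretching by the actual minimum and maximum avoids the all-white or all-black result you get when the data occupies only part of the 16-bit range. For measurements it is probably better to keep the original 16-bit array and use the 8-bit copy only for display.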

Why does this PIL call crash python?

import Image
from numpy import zeros, asarray
YUV = zeros((240, 320, 3), dtype='uint8')
im = Image.fromarray(YUV, mode="YCbCr")
blah = asarray(im)
When I run this (IPython 0.10.1 on Py 2.7.1) it seems to make python read some memory which it shouldn't be reading. Sometimes it crashes, sometimes it doesn't but I can surely make it crash by increasing the 320x240 zeros to, say, 3200x2400 and/or calling blah.copy(). Also, if I do:
from matplotlib import pyplot as p
p.subplot(221); p.imshow(blah[:,:,0])
p.subplot(222); p.imshow(blah[:,:,1])
p.subplot(223); p.imshow(blah[:,:,2])
p.subplot(224); p.imshow(blah[:,:,3])
p.gray()
p.show()
I start to see junk memory appearing in blah at around about row 180. What's going on here? Am I converting from PIL Image to numpy array in a bad way? What is the difference between using array(im) instead of asarray(im), and what is preferred? (note in the former case it does still crash sometimes, but it seems to be more stable and less junk)
(Here's a related question)
I noticed that your image is YCbCr 3-channel, but you're displaying 4 channels. Turns out the "junk data" problem is caused by a bug in PIL's array interface, and a fix was committed in Nov 2010. PIL's array interface is returning a 4th channel.
I ran your test case under PIL 1.1.7 and see the noise. I commented out the 224 subplot and re-ran your test using the latest PIL trunk code and a proper 3-channel array is produced, with no noise. The crashing may also be related, but I couldn't reproduce that in my environment.
