I'm animating a convergence process that I'm simulating in an IPython 3.1 notebook. I'm visualizing the scatter plot result in a matplotlib animation, which I'm writing out to an animated gif via ImageMagick. There are 3000 frames, each with about 5000 points.
I'm not sure exactly how matplotlib creates these animation files, but it appears to cache up a bunch of frames and then write them out all together-- when I look at the CPU usage, it's dominated by python in the beginning and then by convert at the end.
Writing out the gif is happening exceedingly slowly. It's taking more than an hour to write out a 70MB file to an SSD on a modern MacBook Pro. 'convert' is taking the equivalent of 90% of one core on an 4 (8 hyperthread) core machine.
It takes about 15 minutes to write the first 65MB, and over 2 hours to write the last 5MB.
I think the interesting bits of the code follow-- if there's something else that would be helpful, let me know.
def updateAnim(i,cg,scat,mags):
if mags[i]==0: return scat,
cg.convergeStep(mags[i])
scat.set_offsets(cg._chrgs[::2,0:2])
return scat,
fig=plt.figure(figsize=(6,10))
plt.axis('equal')
plt.xlim(-1.2,1.2);plt.ylim(-1,3)
c=np.where(co._chrgs[::2,3]>0,'blue','red')
scat=plt.scatter(co._chrgs[::2,0],co._chrgs[::2,1],s=4,color=c,marker='o',alpha=0.25);
ani=animation.FuncAnimation(fig,updateAnim,frames=mags.size,fargs=(co,scat,mags),blit=True);
ani.save('Files/Capacitance/SpherePlateAnimation.gif',writer='imagemagick',fps=30);
Any idea what the bottleneck might be or how I might speed it up? I'd prefer the write out time be small compared to simulation time.
Version: ImageMagick 6.9.0-0 Q16 x86_64 2015-05-30 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules
Delegates (built-in): bzlib cairo djvu fftw fontconfig freetype gslib gvc jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png ps rsvg tiff webp wmf x xml zlib
ps -aef reports:
convert -size 432x720 -depth 8 -delay 3.3333333333333335 -loop 0 rgba:- Files/Capacitance/SpherePlateAnimation.gif
Update
Please read the original answer below before doing anything suggested in this update.
If you want to debug this in some depth, you could separate the ImageMagick part out and identify for sure where the issue is. To do that, I would locate your ImageMagick convert program like this:
which convert # result may be "/usr/local/bin/convert"
and then go to the containing directory, e.g.
cd /usr/local/bin
Now save the original convert program as convert.real - you can always change it back again later by reversing the last two parameters below:
mv convert convert.real
Now, save the following file as convert
#!/bin/bash
dd bs=128k > $HOME/plot.rgba 2> /dev/null
and make that executable by doing
chmod +x convert
Now, when you run matplotlib again, it will execute the script above rather than ImageMagick, and the script will save the raw RGBA data in your login directory in a file called plot.rgba. This will then tell you two things... firstly you will see if matplotlib now runs faster as there is no longer any ImageMagick processing, secondly you will see if the filesize is around 4GB like I am guessing.
Now you can use ImageMagick to process the file after matplotlib is finished using this, with a 10GB memory limit:
convert.real -limit memory 10000000 -size 432x720 -depth 8 -delay 3.33 -loop 0 $HOME/plot.rgba Files/Capacitance/SpherePlateAnimation.gif
You could also consider splitting the file into 2 (or 4), using dd and processing the two halves in parallel and appending them together to see if that helps. Ask, if you want to investigate that option.
Original Answer
I am kind of speaking out loud here in the hope that it either helps you directly or it jogs someone else's brain into grasping the problem...
It seems from the commandline you have shared that matplotlib is writing directly to the stdin of ImageMagick's convert tool - I can see that from the RGBA:- parameter that tells me it is sending RGB plus Alpha transparency as raw values on stdin.
That means that there are no intermediate files that I can suggest placing on a RAM-disk, which is where I was heading to with my comment...
The second thing is that, as the raw pixel data is being sent, every single pixel is computed and sent by matplotlib so it is invariant with the 5,000 points in your simulation - so no point reducing or optimising the number of points.
Another thing to note is that you are using the 16-bit quantisation version of ImageMagick (Q16 in your version string). That effectively doubles the memory requirement, so if you can easily recompile ImageMagick for an 8-bit quantum depth, that may help.
Now, let's look at that input stream, RGBA -depth 8 means 4 bytes per pixel and 432x720 pixels per frame, or 1.2MB per frame. Now, you have 3,000 frames, so that makes 3.6GB minimum, plus the output file of 75MB. I suspect that this is just over the limit of ImageMagick's natural memory limit and that is why it slows down at the end, so my suggestion would be to check the memory limits on ImageMagick and consider increasing them to 4GB-6GB or more if you have it.
To check the memory and other resource limits:
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 2GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
As you cannot raise the memory limits on the commandline that matplotlib executes, you could do it via an environment variable that you export prior to starting matplotlib like this:
export MAGICK_MEMORY_LIMIT=4294967296
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 4GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
You can also change it in your policy.xml file, but that is more involved, so please try this way initially and ask if you get stuck!
Please pass feedback on this as I may be able to suggest other things depending on whether this works. Please also run identify -list configure and edit your question and paste the output there.
Mark Setchell's answer provides a good explanation of what goes wrong, but exporting a higher memory limit did not work. I do actually recommend to change policy.xml instead, the only trick part is to find where it is saved on your system. This answer lists some locations, but it is most probably at
/etc/ImageMagick-6/policy.xml
Open it, and edit the line that says
<policy domain="resource" name="memory" value="256MiB"/>
to something like
<policy domain="resource" name="memory" value="4GiB"/>
Or any limit that is above the size of the unprocessed gif. If it is not, the convert subprocess will be killed.
Related
I'm currently trying to make pyvips work for a project where I need to manipulate images of a "big but still sensible" size, between 1920x1080 and 40000x40000.
The installation worked well, but these particular 2 lines sometime work, sometime don't.
img = pyvips.Image.new_from_file('global_maps/MapBigBig.png')
img.write_to_file('global_maps/MapTest.png')
It seems that for the biggest images, I get the following error message when writing back the image (the loading works fine):
pyvips.error.Error: unable to call VipsForeignSavePngFile
pngload: arithmetic overflow
vips2png: unable to write to target global_maps/MapFishermansRowHexTest.png
I say it seems, because the following lines work perfectly well (with a size of 100 000 x 100 000, far bigger than the problematic images):
size = 100000
test = pyvips.Image.black(size, size, bands=3)
test.write_to_file('global_maps/Test.png')
I could not find an answer anywhere, do you have any idea what I'm doing wrong ?
EDIT:
Here is a link to an image that does not work (it weights 102 Mo).
This image was created using pyvips and a 40 time smaller image, this way:
img = pyvips.Image.new_from_file('global_maps/MapNormal.png')
out = img.resize(40, kernel='linear')
out.write_to_file('global_maps/MapBigBig.png')
And it can be read using paint3D or gimp.
I found your error message in libspng:
https://github.com/randy408/libspng/blob/master/spng/spng.c#L5989
It looks like it's being triggered if the decompressed image size would go over your process pointer size. If I try a 32-bit libvips on Windows I see:
$ ~/w32/vips-dev-8.12/bin/vips.exe copy MapFishermansRowHexBigBig.png x2.png
pngload: arithmetic overflow
vips2png: unable to write to target x2.png
But a 64-bit libvips on Windows works fine:
$ ~/vips-dev-8.12/bin/vips.exe copy MapFishermansRowHexBigBig.png x.png
$
So I think switching to a 64-bit libvips build would probably fix your problem. You'll need a 64-bit python too, of course.
I also think this is probably a libspng bug (or misfeature?) since you can read >4gb images on a 32-bit machine as long as you don't try to read them all in one go (libvips reads in chunks, so it should be fine). I'll open an issue on the libspng repo.
How would you scale/optimize/minimally output a PNG image so that it just falls below a certain maximum file size? (The input sources are various - PDF, JPEG, GIF, TIFF...)
I've looked in many places but can't find an answer to this question.
In ImageMagick a JPEG output can do this with extent (see e.g. ImageMagick: scale JPEG image with a maximum file-size), but there doesn't seem to be an equivalent for other file formats e.g. PNG.
I could use Wand or PIL in a loop (preference for python) until the filesize is below a certain value, but for 1000s of images this will have a large I/O overhead unless there's a way to predict/estimate the filesize without writing it out first. Perhaps this is the only option.
I could also wrap the various (macOS) command-line tools in python.
Additionally, I only want to do any compression at all where it's absolutely necessary (the source is mainly text), which leaves a choice of compression algorithms.
Thanks for all help.
PS Other relevant questions:
Scale image according a maximum file size
Compress a PNG image with ImageMagick
python set maximum file size when converting (pdf) to jpeg using e.g. Wand
Edit: https://stackoverflow.com/a/40588202/1021819 is quite close too - though the exact code there already (inevitably?) makes some choices about how to go about reducing the file size (resize in that case). Perhaps there is no generalized way to do this without a multi-dimensional search.
Also, since the input files are PDFs, can this even be done with PIL? The first choice is about rasterization, for which I have been using Wand.
https://stackoverflow.com/a/34618887/1021819 is also useful, in that it uses Wand, so putting that operation within the binary-chop loop seems to be a way forward.
With PNG there is no tradeoff of compression method and visual appearance because PNG is lossless. Just go for the smallest possible file, using
my "pngcrush" application, "optipng", "zopflipng" or the like.
If you need a smaller file than any of those can produce, try reducing the number of colors to 255 or fewer, which will allow the PNG codec to produce an indexed-color PNG (color-type 3) which is around 1/3 of the filesize of an RGB PNG. You can use ImageMagick's "-colors 255" option to do this. However, I recommend the "pngquant" application for this; it does a better job than IM does in most cases.
I'm a film photographer who deals a lot with cropping/image resizing. Because I shoot film, I have to scan my negatives and crop each frame out of the batch scan. My scanner scans four strips of six images each (24 frames/crops per scan).
A friend of mine wrote me a script for Python that automatically crops images based on inputted coordinates. The script works well but it has problems in the file format of the exported images.
From the scan, each frame should produce a 37mb TIFF at 240 DPI (when I crop and export in Adobe Lightroom). Instead, the Cropper outputs a 13mb 72 DPI TIFF.
Terminal (I'm on Mac) warns me about a "Decompression Bomb" whenever I run the Cropper. My friend is stumped and suggested I ask Stack Overflow.
I've no Python experience. I can provide the code he wrote and the commands Terminal gives me.
Thoughts?
This would be greatly appreciated and a huge HUGE timesaver.
THANK YOU!
ERROR MESSAGE: /Library/Python/2.7/site-packages/PIL/Image.py:2192: DecompressionBombWarning: Image size (208560540 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
PIL is merely trying to protect you. It'll not open larger images, as that could be a vector of attack for a malicious user to give you a large image that'll expand to use up all memory. Quoting from the PIL.Image.open() documentation:
Warning: To protect against potential DOS attacks caused by “decompression bombs” (i.e. malicious files which decompress into a huge amount of data and are designed to crash or cause disruption by using up a lot of memory), Pillow will issue a DecompressionBombWarning if the image is over a certain limit.
Since you are not a malicious user and are not accepting images from anyone else, you can simply disable the limit:
from PIL import Image
Image.MAX_IMAGE_PIXELS = None
Setting Image.MAX_IMAGE_PIXELS disables the check altogether. You can also set it to a (high) integer value; the default is 1024 * 1024 * 1024 // 4 // 3, nearly 90 million pixels or about a 250MB uncompressed data for a 3-channel image.
Note that for PIL versions up to 4.3.0, by default, all that happens is that a warning is issued. You could also disable the warning:
import warnings
from PIL import Image
warnings.simplefilter('ignore', Image.DecompressionBombWarning)
Inversely, if you want to prevent such images from being loaded altogether, turn the warning into an exception:
import warnings
from PIL import Image
warnings.simplefilter('error', Image.DecompressionBombWarning)
and you can then expect the Image.DecompressionBombWarning object to be raised as an exception whenever you pass an image in that would otherwise demand a lot of memory.
As of PIL v5.0.0 (released Jan 2018), images that use twice the number of pixels as the MAX_IMAGE_PIXELS value will result in a PIL.Image.DecompressionBombError exception.
Note that these checks also apply to the Image.crop() operation (you can create a larger image by cropping), and you need to use PIL version 6.2.0 or newer (released in October 2019) if you want to benefit from this protection when working with GIF or ICO files.
From the Pillow docs:
Warning: To protect against potential DOS attacks caused by "decompression bombs" (i.e. malicious files which decompress into a huge amount of data and are designed to crash or cause disruption by using up a lot of memory), Pillow will issue a DecompressionBombWarning if the image is over a certain limit. If desired, the warning can be turned into an error with warnings.simplefilter('error', Image.DecompressionBombWarning) or suppressed entirely with warnings.simplefilter('ignore', Image.DecompressionBombWarning). See also the logging documentation to have warnings output to the logging facility instead of stderr.
Right now reportlab is making PDFs most of the time. However when one file gets several large images (125 files with a total on disk size of 7MB), we end up running out of memory and crashing trying to build a PDF that should ultimately be smaller than 39MB. The problem stems from:
elif mode not in ('L','RGB','CMYK'):
im = im.convert('RGB')
self.mode = 'RGB'
Where nice b&w (bitonal) images are converted to RGB and when you have images with sizes in the 2595x3000, they consume a lot of memory. (Not sure why they consume 2GB, but that point is moot. When we add them to reportlab our entire python memory footprint is about 50MB, when we call
doc.build(elements, canvasmaker=canvasmaker)
Memory usage skyrockets as we go from bitonal PNGs to RGB and then render them onto the page.
While I try to see if I can figure out how to inject bitonal images into reportlab PDFs, I thought I would see if anyone else had an idea of how to fix this problem either in reportlab or with another tool.
We have a working PDF maker using PODOFO in C++, one of my possible solutions is to write a script/outline for that tool that will simply generate the PDF in a subprocess and then return that via a file or stdout.
Short of redoing PIL you are out of luck. The Images are converted internally in PIL to 24 bit color TIFs. This is not something you can easily change.
We switched to Podofo and generate the PDF outside of python.
Hello all,
I am working on a program which determines the average colony size of yeast from a photograph, and it is working fine with the .bmp images I tested it on. The program uses pygame, and might use PIL later.
However, the camera/software combo we use in my lab will only save 16-bit grayscale tiff's, and pygame does not seem to be able to recognize 16-bit tiff's, only 8-bit. I have been reading up for the last few hours on easy ways around this, but even the Python Imaging Library does not seem to be able to work with 16-bit .tiff's, I've tried and I get "IOError: cannot identify image file".
import Image
img = Image.open("01 WT mm.tif")
My ultimate goal is to have this program be user-friendly and easy to install, so I'm trying to avoid adding additional modules or requiring people to install ImageMagick or something.
Does anyone know a simple workaround to this problem using freeware or pure python? I don't know too much about images: bit-depth manipulation is out of my scope. But I am fairly sure that I don't need all 16 bits, and that probably only around 8 actually have real data anyway. In fact, I once used ImageMagick to try to convert them, and this resulted in an all-white image: I've since read that I should use the command "-auto-levels" because the data does not actually encompass the 16-bit range.
I greatly appreciate your help, and apologize for my lack of knowledge.
P.S.: Does anyone have any tips on how to make my Python program easy for non-programmers to install? Is there a way, for example, to somehow bundle it with Python and pygame so it's only one install? Can this be done for both Windows and Mac? Thank you.
EDIT: I tried to open it in GIMP, and got 3 errors:
1) Incorrect count for field "DateTime" (27, expecting 20); tag trimmed
2) Sorry, can not handle images with 12-bit samples
3) Unsupported layout, no RGBA loader
What does this mean and how do I fit it?
py2exe is the way to go for packaging up your application if you are on a windows system.
Regarding the 16bit tiff issue:
This example http://ubuntuforums.org/showthread.php?t=1483265 shows how to convert for display using PIL.
Now for the unasked portion question: When doing image analysis, you want to maintain the highest dynamic range possible for as long as possible in your image manipulations - you lose less information that way. As you may or may not be aware, PIL provides you with many filters/transforms that would allow you enhance the contrast of an image, even out light levels, or perform edge detection. A future direction you might want to consider is displaying the original image (scaled to 8 bit of course) along side a scaled image that has been processed for edge detection.
Check out http://code.google.com/p/pyimp/wiki/screenshots for some more examples and sample code.
I would look at pylibtiff, which has a pure python tiff reader.
For bundling, your best bet is probably py2exe and py2app.
This is actually a 2 part question:
1) 16 bit image data mangling for Python - I usually use GDAL + Numpy. This might be a bit too much for your requirements, you can use PIL + Numpy instead.
2) Release engineering Python apps can get messy. Depending on how complex your app is you can get away with py2deb, py2app and py2exe. Learning distutils will help too.