Strange behavior in pyvips, impossible to write some images

I'm trying to get pyvips working for a project where I need to manipulate images of a "big but still sensible" size, between 1920x1080 and 40000x40000.
The installation went well, but these two lines sometimes work and sometimes don't.
img = pyvips.Image.new_from_file('global_maps/MapBigBig.png')
img.write_to_file('global_maps/MapTest.png')
It seems that for the biggest images, I get the following error message when writing back the image (the loading works fine):
pyvips.error.Error: unable to call VipsForeignSavePngFile
pngload: arithmetic overflow
vips2png: unable to write to target global_maps/MapFishermansRowHexTest.png
I say it seems, because the following lines work perfectly well (with a size of 100 000 x 100 000, far bigger than the problematic images):
size = 100000
test = pyvips.Image.black(size, size, bands=3)
test.write_to_file('global_maps/Test.png')
I could not find an answer anywhere. Do you have any idea what I'm doing wrong?
EDIT:
Here is a link to an image that does not work (it weighs 102 MB).
This image was created with pyvips from a 40 times smaller image, like this:
img = pyvips.Image.new_from_file('global_maps/MapNormal.png')
out = img.resize(40, kernel='linear')
out.write_to_file('global_maps/MapBigBig.png')
It can be opened with Paint 3D or GIMP.

I found your error message in libspng:
https://github.com/randy408/libspng/blob/master/spng/spng.c#L5989
It looks like it's being triggered if the decompressed image size would go over your process pointer size. If I try a 32-bit libvips on Windows I see:
$ ~/w32/vips-dev-8.12/bin/vips.exe copy MapFishermansRowHexBigBig.png x2.png
pngload: arithmetic overflow
vips2png: unable to write to target x2.png
But a 64-bit libvips on Windows works fine:
$ ~/vips-dev-8.12/bin/vips.exe copy MapFishermansRowHexBigBig.png x.png
$
So I think switching to a 64-bit libvips build would probably fix your problem. You'll need a 64-bit python too, of course.
I also think this is probably a libspng bug (or misfeature?) since you can read >4gb images on a 32-bit machine as long as you don't try to read them all in one go (libvips reads in chunks, so it should be fine). I'll open an issue on the libspng repo.
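To check which case you are in, a minimal sketch along these lines reports the pointer width of the running Python and estimates the decompressed size of an image. The 40000 x 40000 RGB dimensions are an assumption taken from the upper bound mentioned in the question, and the helper names are hypothetical, not part of pyvips.

```python
import struct


def pointer_bits():
    """Width in bits of a pointer in the running Python process."""
    return struct.calcsize("P") * 8


def decompressed_bytes(width, height, bands, bytes_per_sample=1):
    """Size of the fully decoded pixel data, ignoring any row padding."""
    return width * height * bands * bytes_per_sample


def fits_in_address_space(nbytes, bits):
    """True if a byte count can be represented in a pointer of this width."""
    return nbytes < 2 ** bits


# Assumed dimensions: 40000 x 40000, 3 bands, 8 bits per sample.
size = decompressed_bytes(40000, 40000, 3)
print(f"this Python is {pointer_bits()}-bit")
print(f"decompressed size: {size} bytes")
print("fits in 32 bits:", fits_in_address_space(size, 32))
print("fits in 64 bits:", fits_in_address_space(size, 64))
```

If the first line reports 32-bit, pyvips will be loading a 32-bit libvips as well, since the library runs inside the Python process.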

Related

matplotlib animation.save to animated gif very slow

I'm animating a convergence process that I'm simulating in an IPython 3.1 notebook. I'm visualizing the scatter plot result in a matplotlib animation, which I'm writing out to an animated gif via ImageMagick. There are 3000 frames, each with about 5000 points.
I'm not sure exactly how matplotlib creates these animation files, but it appears to cache a bunch of frames and then write them out all together. When I look at the CPU usage, it's dominated by python in the beginning and then by convert at the end.
Writing out the gif is exceedingly slow. It's taking more than an hour to write out a 70MB file to an SSD on a modern MacBook Pro. 'convert' is taking the equivalent of 90% of one core on a 4-core (8 hyperthread) machine.
It takes about 15 minutes to write the first 65MB, and over 2 hours to write the last 5MB.
I think the interesting bits of the code follow. If there's something else that would be helpful, let me know.
def updateAnim(i,cg,scat,mags):
    if mags[i]==0: return scat,
    cg.convergeStep(mags[i])
    scat.set_offsets(cg._chrgs[::2,0:2])
    return scat,
fig=plt.figure(figsize=(6,10))
plt.axis('equal')
plt.xlim(-1.2,1.2);plt.ylim(-1,3)
c=np.where(co._chrgs[::2,3]>0,'blue','red')
scat=plt.scatter(co._chrgs[::2,0],co._chrgs[::2,1],s=4,color=c,marker='o',alpha=0.25);
ani=animation.FuncAnimation(fig,updateAnim,frames=mags.size,fargs=(co,scat,mags),blit=True);
ani.save('Files/Capacitance/SpherePlateAnimation.gif',writer='imagemagick',fps=30);
Any idea what the bottleneck might be or how I might speed it up? I'd prefer the write out time be small compared to simulation time.
Version: ImageMagick 6.9.0-0 Q16 x86_64 2015-05-30 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules
Delegates (built-in): bzlib cairo djvu fftw fontconfig freetype gslib gvc jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png ps rsvg tiff webp wmf x xml zlib
ps -aef reports:
convert -size 432x720 -depth 8 -delay 3.3333333333333335 -loop 0 rgba:- Files/Capacitance/SpherePlateAnimation.gif

Update
Please read the original answer below before doing anything suggested in this update.
If you want to debug this in some depth, you could separate the ImageMagick part out and identify for sure where the issue is. To do that, I would locate your ImageMagick convert program like this:
which convert # result may be "/usr/local/bin/convert"
and then go to the containing directory, e.g.
cd /usr/local/bin
Now save the original convert program as convert.real - you can always change it back again later by reversing the last two parameters below:
mv convert convert.real
Now, save the following file as convert
#!/bin/bash
dd bs=128k > $HOME/plot.rgba 2> /dev/null
and make that executable by doing
chmod +x convert
Now, when you run matplotlib again, it will execute the script above rather than ImageMagick, and the script will save the raw RGBA data in your login directory in a file called plot.rgba. This will then tell you two things... firstly you will see if matplotlib now runs faster as there is no longer any ImageMagick processing, secondly you will see if the filesize is around 4GB like I am guessing.
Now you can use ImageMagick to process the file after matplotlib is finished using this, with a 10GB memory limit:
convert.real -limit memory 10GiB -size 432x720 -depth 8 -delay 3.33 -loop 0 $HOME/plot.rgba Files/Capacitance/SpherePlateAnimation.gif
You could also consider splitting the file into 2 (or 4), using dd and processing the two halves in parallel and appending them together to see if that helps. Ask, if you want to investigate that option.
Original Answer
I am kind of speaking out loud here in the hope that it either helps you directly or it jogs someone else's brain into grasping the problem...
It seems from the commandline you have shared that matplotlib is writing directly to the stdin of ImageMagick's convert tool - I can see that from the RGBA:- parameter that tells me it is sending RGB plus Alpha transparency as raw values on stdin.
That means that there are no intermediate files that I can suggest placing on a RAM-disk, which is where I was heading to with my comment...
The second thing is that, as raw pixel data is being sent, every single pixel is computed and sent by matplotlib, so the data volume is independent of the 5,000 points in your simulation. There is no point in reducing or optimising the number of points.
Another thing to note is that you are using the 16-bit quantisation version of ImageMagick (Q16 in your version string). That effectively doubles the memory requirement, so if you can easily recompile ImageMagick for an 8-bit quantum depth, that may help.
Now, let's look at that input stream. RGBA at -depth 8 means 4 bytes per pixel, and at 432x720 pixels per frame that is about 1.2MB per frame. You have 3,000 frames, so that makes about 3.7GB minimum, plus the output file of 70MB. I suspect that this is over ImageMagick's default memory limit and that is why it slows down at the end, so my suggestion would be to check the memory limits on ImageMagick and consider increasing them to 4GB-6GB or more if you have it.
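That arithmetic can be checked with a few lines of Python. The frame size, bit depth and frame count come from the convert command line and the question; the 2 GiB limit used for the comparison is an assumption, so substitute whatever your own installation reports.

```python
WIDTH, HEIGHT = 432, 720      # from "-size 432x720"
BYTES_PER_PIXEL = 4           # RGBA at "-depth 8"
FRAMES = 3000                 # from the question

# Assumed memory limit for the comparison; check your own installation.
ASSUMED_LIMIT = 2 * 2**30

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = frame_bytes * FRAMES

print(f"per frame:  {frame_bytes / 1e6:.2f} MB")
print(f"all frames: {total_bytes / 1e9:.2f} GB")
print("exceeds assumed limit:", total_bytes > ASSUMED_LIMIT)
```

This counts only the raw 8-bit input. On a Q16 build the in-memory copy is roughly twice that, as noted above.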
To check the memory and other resource limits:
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 2GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
As you cannot raise the memory limits on the commandline that matplotlib executes, you could do it via an environment variable that you export prior to starting matplotlib like this:
export MAGICK_MEMORY_LIMIT=4294967296
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 4GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
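If matplotlib runs inside a notebook that was started before the export, the variable may not reach the convert subprocess. A minimal sketch, assuming the same MAGICK_MEMORY_LIMIT variable, sets it from Python before calling ani.save so that child processes started afterwards inherit it. The helper name is hypothetical.

```python
import os


def set_magick_memory_limit(nbytes):
    """Set the ImageMagick memory limit for child processes started later."""
    os.environ["MAGICK_MEMORY_LIMIT"] = str(nbytes)
    return os.environ["MAGICK_MEMORY_LIMIT"]


# 4 GiB, the same value as in the export above.
limit = set_magick_memory_limit(4 * 2**30)
print("MAGICK_MEMORY_LIMIT =", limit)
# Call ani.save(...) after this point so that convert inherits the variable.
```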
You can also change it in your policy.xml file, but that is more involved, so please try this way initially and ask if you get stuck!
Please pass feedback on this as I may be able to suggest other things depending on whether this works. Please also run identify -list configure and edit your question and paste the output there.

Mark Setchell's answer provides a good explanation of what goes wrong, but exporting a higher memory limit did not work. I recommend changing policy.xml instead; the only tricky part is finding where it is saved on your system. This answer lists some locations, but it is most probably at
/etc/ImageMagick-6/policy.xml
Open it, and edit the line that says
<policy domain="resource" name="memory" value="256MiB"/>
to something like
<policy domain="resource" name="memory" value="4GiB"/>
Or any limit that is above the size of the unprocessed GIF data. If the limit is lower than that, the convert subprocess will be killed.
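If you would rather script the change, here is a rough sketch using only the standard library. It works on a sample string, not the real file: SAMPLE_POLICY and the helper name are made up for illustration, and a real policy.xml also contains a DOCTYPE and comments that this round trip would not preserve, so editing by hand is usually simpler.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for the contents of policy.xml.
SAMPLE_POLICY = """<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="disk" value="1GiB"/>
</policymap>"""


def set_memory_limit(policy_xml, new_value):
    """Return the policy XML with the resource/memory value replaced."""
    root = ET.fromstring(policy_xml)
    for policy in root.iter("policy"):
        if policy.get("domain") == "resource" and policy.get("name") == "memory":
            policy.set("value", new_value)
    return ET.tostring(root, encoding="unicode")


updated = set_memory_limit(SAMPLE_POLICY, "4GiB")
print(updated)
```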

how to load a large RGB image in python?

I am a beginner in Python. I would like to load some tif images in Python and then do some image processing on them. I ran into a problem right at the loading step: the images have a size of (2000,2000,3), but Python only loads up to 1920 rows and columns. I have copied my loading code below. It is really simple and I expected it to work, but it did not. If anyone has a suggestion for altering the code, I would be thankful.
from PIL import Image
infile2 = 'e:/orthoData/test-PIL/a1.tif'
im2 = Image.open(infile2)
im2.size
I know it is really simple, but I am really stuck at this point. I tried to read about it in different Python documentation but was not successful.

Update - disregard previous answer (I completely misread the question)
The following should work fine. If it doesn't, the image is likely corrupted.
tif = Image.open('e:/orthoData/test-PIL/a1.tif')

Python qrcode not consistent

I've got a very strange problem with python-qrcode.
I've had it working in our dev environment for a while now, without any issues. We use it to create two QR codes both of which contain URLs of almost exactly the same length (one contains an extra letter and two extra slashes). It's crucial that these two codes be exactly the same size.
Since we set up python-qrcode about five months ago, every single qrcode we have generated has been exactly the same size without fail. However, we've now pushed everything through to a production server and suddenly we have a problem.
Basically, one of the codes we generate is bigger than normal (this is the one with the three extra characters). The other code is the correct size. The two codes are generated using exactly the same function, we just pass the different URL to be encoded.
On my local machine and on our dev server, all the qrcodes are exactly the same size (including the one with the extra characters), but on the production server, the longer one is bigger while the other is correct.
We use Git version control, so all the files/functions etc are identical between the servers. The only difference between the setups is the version of Ubuntu (12.04 vs 12.10 on the production server), but I can't see why that would cause this issue.
If both codes were bigger, I could understand, but I can't work out why one would be bigger than the other on only one server.
If anyone can make any suggestion as to where to start working this out, I'd be very grateful!
EDIT:
Here's the relevant code:
myQrGenerator = qrcode.QRCode(
    version=QRCODE_SIZE,
    error_correction=qrcode.constants.ERROR_CORRECT_M,
    box_size=QRCODE_BOX_SIZE,
    border=QRCODE_BORDER_SIZE
)
myQrGenerator.add_data('%s%s/' % (theBaseUrl, str(theHash)))
myQrGenerator.make(fit=True)
We get those variables from local_settings.py

After a lengthy discussion it was established that the two servers used different URLs. On the server that produced the larger QR code (in terms of QR pixels, and subsequently in terms of image pixels), the data overflowed: the number of bits the predefined version could store was not enough, and with fit=True qrcode made it fit by moving to a larger version that can store more data.
To fix this, fit was set to False to provide a constraint for overflows, and version was increased to accommodate more bits from the start. box_size can be decreased a bit to maintain, more or less, the original image size.
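The size trade-off can be estimated without generating any codes. A QR symbol of version v has 4*v + 17 modules per side, and the sketch below assumes the image side is that count plus the border on both sides, multiplied by box_size. The version, box_size and border values are hypothetical, since the real ones live in local_settings.py.

```python
def modules_per_side(version):
    """Number of modules along one side of a QR symbol (versions 1 to 40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 4 * version + 17


def image_side_px(version, box_size, border):
    """Assumed image side in pixels: (modules + border on both sides) * box_size."""
    return (modules_per_side(version) + 2 * border) * box_size


# Hypothetical settings, for illustration only.
old = image_side_px(version=3, box_size=10, border=4)
new = image_side_px(version=4, box_size=10, border=4)
adjusted = image_side_px(version=4, box_size=9, border=4)

print("original version:", old, "px")
print("one version up:", new, "px")
print("one version up, smaller boxes:", adjusted, "px")
```

With these made-up numbers, going up one version and dropping box_size by one lands within a pixel of the original side length.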

Probably a difference in the way PIL is installed on the box. Looking at the python-qrcode source, it does:
try:
    from PIL import Image, ImageDraw
except ImportError:
    import Image, ImageDraw
See what happens when you do:
from PIL import Image, ImageDraw
On each machine. Either way, if it really isn't a bug in the application code (make doubly sure the same code is on each box), it sounds like it's going to be because of some difference in the way PIL builds itself on Ubuntu 12.10 vs 12.04, probably due to some different headers / libs it uses to compile. Look into ensuring the PIL installation is consistent with the other boxes.
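A quick way to compare the boxes without importing anything is to ask Python which of the two top-level names it can find. This is only a sketch: the helper name is hypothetical, and finding the package does not guarantee that the import itself succeeds.

```python
import importlib.util


def pil_import_paths():
    """Report which of the two import styles has a findable top-level module."""
    return {
        "from PIL import Image": importlib.util.find_spec("PIL") is not None,
        "import Image": importlib.util.find_spec("Image") is not None,
    }


for statement, found in pil_import_paths().items():
    print(f"{statement}: {'found' if found else 'not found'}")
```

Run it on each machine and compare the output.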

Can you reduce memory consumption by ReportLab when embedding very large images, or is there a Python PDF toolkit that can?

Right now reportlab builds our PDFs fine most of the time. However, when one file gets several large images (125 files with a total on-disk size of 7MB), we end up running out of memory and crashing while trying to build a PDF that should ultimately be smaller than 39MB. The problem stems from:
elif mode not in ('L','RGB','CMYK'):
    im = im.convert('RGB')
    self.mode = 'RGB'
Here nice b&w (bitonal) images are converted to RGB, and when you have images with sizes around 2595x3000, they consume a lot of memory. (I'm not sure why they consume 2GB, but that point is moot.) When we add them to reportlab, our entire python memory footprint is about 50MB. When we call
doc.build(elements, canvasmaker=canvasmaker)
Memory usage skyrockets as we go from bitonal PNGs to RGB and then render them onto the page.
While I try to see if I can figure out how to inject bitonal images into reportlab PDFs, I thought I would see if anyone else had an idea of how to fix this problem either in reportlab or with another tool.
We have a working PDF maker using PODOFO in C++. One of my possible solutions is to write a script/outline for that tool that will simply generate the PDF in a subprocess and then return that via a file or stdout.

Short of redoing PIL you are out of luck. The Images are converted internally in PIL to 24 bit color TIFs. This is not something you can easily change.
We switched to Podofo and generate the PDF outside of python.
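For a sense of scale, a back-of-the-envelope calculation shows what the conversion costs. It assumes that all 125 images are about 2595x3000 and that their decoded RGB data is held in memory at the same time; neither is confirmed in the question.

```python
WIDTH, HEIGHT = 2595, 3000   # approximate size given in the question
IMAGES = 125                 # number of files given in the question


def bitonal_bytes(width, height):
    """1 bit per pixel, with each row padded to a whole byte."""
    return ((width + 7) // 8) * height


def rgb_bytes(width, height):
    """3 bytes per pixel for 24-bit RGB."""
    return width * height * 3


one_bitonal = bitonal_bytes(WIDTH, HEIGHT)
one_rgb = rgb_bytes(WIDTH, HEIGHT)
all_rgb = one_rgb * IMAGES

print(f"one bitonal image: {one_bitonal / 1e6:.2f} MB")
print(f"one RGB image:     {one_rgb / 1e6:.2f} MB")
print(f"{IMAGES} RGB images:   {all_rgb / 1e9:.2f} GB")
```

Under those assumptions the RGB copies alone come to several gigabytes, which is in the same range as the usage reported in the question.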

Python: Manipulating a 16-bit .tiff image in PIL &/or pygame: convert to 8-bit somehow?

Hello all,
I am working on a program which determines the average colony size of yeast from a photograph, and it is working fine with the .bmp images I tested it on. The program uses pygame, and might use PIL later.
However, the camera/software combo we use in my lab will only save 16-bit grayscale tiff's, and pygame does not seem to be able to recognize 16-bit tiff's, only 8-bit. I have been reading up for the last few hours on easy ways around this, but even the Python Imaging Library does not seem to be able to work with 16-bit .tiff's, I've tried and I get "IOError: cannot identify image file".
import Image
img = Image.open("01 WT mm.tif")
My ultimate goal is to have this program be user-friendly and easy to install, so I'm trying to avoid adding additional modules or requiring people to install ImageMagick or something.
Does anyone know a simple workaround to this problem using freeware or pure python? I don't know too much about images: bit-depth manipulation is out of my scope. But I am fairly sure that I don't need all 16 bits, and that probably only around 8 actually have real data anyway. In fact, I once used ImageMagick to try to convert them, and this resulted in an all-white image: I've since read that I should use the option "-auto-level" because the data does not actually encompass the 16-bit range.
I greatly appreciate your help, and apologize for my lack of knowledge.
P.S.: Does anyone have any tips on how to make my Python program easy for non-programmers to install? Is there a way, for example, to somehow bundle it with Python and pygame so it's only one install? Can this be done for both Windows and Mac? Thank you.
EDIT: I tried to open it in GIMP, and got 3 errors:
1) Incorrect count for field "DateTime" (27, expecting 20); tag trimmed
2) Sorry, can not handle images with 12-bit samples
3) Unsupported layout, no RGBA loader
What does this mean and how do I fix it?

py2exe is the way to go for packaging up your application if you are on a windows system.
Regarding the 16bit tiff issue:
This example http://ubuntuforums.org/showthread.php?t=1483265 shows how to convert for display using PIL.
Now for the unasked portion of the question: when doing image analysis, you want to maintain the highest dynamic range possible for as long as possible in your image manipulations, because you lose less information that way. As you may or may not be aware, PIL provides you with many filters/transforms that would allow you to enhance the contrast of an image, even out light levels, or perform edge detection. A future direction you might want to consider is displaying the original image (scaled to 8 bit of course) alongside a scaled image that has been processed for edge detection.
Check out http://code.google.com/p/pyimp/wiki/screenshots for some more examples and sample code.

I would look at pylibtiff, which has a pure python tiff reader.
For bundling, your best bet is probably py2exe and py2app.

This is actually a 2 part question:
1) 16 bit image data mangling for Python - I usually use GDAL + Numpy. This might be a bit too much for your requirements, you can use PIL + Numpy instead.
2) Release engineering Python apps can get messy. Depending on how complex your app is you can get away with py2deb, py2app and py2exe. Learning distutils will help too.
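For point 1, the scaling step itself is small once the samples have been read. The sketch below shows the idea in plain Python on a flat list of sample values: a linear stretch of the range actually used onto 0-255. The function name and the sample values are made up for illustration, and with NumPy the same arithmetic would be applied to a whole array at once.

```python
def stretch_to_8bit(samples):
    """Linearly map the occupied range of the samples onto 0..255."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # Flat image: nothing to stretch.
        return [0] * len(samples)
    span = hi - lo
    return [(s - lo) * 255 // span for s in samples]


# Made-up 12-bit samples stored in 16-bit words.
raw = [100, 1100, 2100, 4095]
scaled = stretch_to_8bit(raw)
print(scaled)
```

Keep the original 16-bit data for the actual measurements and use the 8-bit version only for display.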
