converting PDF to TIFF using ghostscript but not writing file to disk - python

I know that ghostscript can convert pdf to tiff and even has python bindings but I'm wondering whether there is a way to avoid writing the resulting tiff to disk (-SOutputFile=/path/to/file.tiff. Rather I want to keep the resulting tiff in memory and use it as a PIL image.

Basically, no, not using the standard devices. That's because the strip size of the TIFF file (and potentially other entries in the TIFF header) can't be written until after the compressed bitmap has been created because the sizes aren't known. So you need to be able to seek back to the beginning and update the header.
Now you could modify the standard TIFF output devices so that they maintain the output in memory, rather than writing to disk, but that's not how they currently work.

Related

Trouble using PIL to change a PNG to a TGA

I am trying to use PIL to convert a PNG to a TGA. I want it to be a non compressed 32-bit image.
I don't know if, or where, it's documented, but some experimentation shows that PIL retains the compression of the input file. I mean, if you open an RLE ("Run Length Encoded") compressed file, it saves it with the same compression, whereas if you open an uncompressed file and subsequently save it, an uncompressed file will be written.
So, if you are getting a compressed file out, I am guessing you must be putting a compressed file into PIL. So, you need to explicitly tell PIL to override the compression like this:
from PIL import Image
# Open an RLE compressed file
im = Image.open('compressed.tga')
# Explicitly save uncompressed
im.save('uncompressed.tga', compression=None)
Keywords: Python, image processing, Targa, TGA, compressed, RLE, uncompressed

How can I compress an image in Python?

I'm working on a image transmition project in which my JPEG image must be transfered via LoRa, so there are a lot of limitations.
I'm working transfering the image in small chunks but their actual size aren't good enough and I can't reduce their individual sizes by dividing the image even more cause the time to send each chunk is significative.
So, I'm looking for alternatives to compress the data of these small chunks but didn't found anything in Python that allow me to do this to an Image loaded with Pillow.
Note that I don't want to resize the image, just to compress it's data.
I'm looking for suggestion on how to do this.
Must say that I can change my mind on using Pillow if necessary.
One strange effect that is happening and I don't know why is that I never get a chunk with less than about 600bytes. I need something close to 300 bytes.
Modern image formats such PNG and JPEG are already compressed and my general recommendation is take Brendan Long's advice and use those formats and exploit all the work that's been put into them.
That said, if you want to compress the contents of any arbitrary file in Python, here's a very simple example:
import zlib
with open("MyImage.jpg", "rb") as in_file:
compressed = zlib.compress(in_file.read(), 9)
with open("MyCompressedFile", "wb") as out_file:
out_file.write(compressed)

ImageMagick: scale PNG image with a maximum file-size

How would you scale/optimize/minimally output a PNG image so that it just falls below a certain maximum file size? (The input sources are various - PDF, JPEG, GIF, TIFF...)
I've looked in many places but can't find an answer to this question.
In ImageMagick a JPEG output can do this with extent (see e.g. ImageMagick: scale JPEG image with a maximum file-size), but there doesn't seem to be an equivalent for other file formats e.g. PNG.
I could use Wand or PIL in a loop (preference for python) until the filesize is below a certain value, but for 1000s of images this will have a large I/O overhead unless there's a way to predict/estimate the filesize without writing it out first. Perhaps this is the only option.
I could also wrap the various (macOS) command-line tools in python.
Additionally, I only want to do any compression at all where it's absolutely necessary (the source is mainly text), which leaves a choice of compression algorithms.
Thanks for all help.
PS Other relevant questions:
Scale image according a maximum file size
Compress a PNG image with ImageMagick
python set maximum file size when converting (pdf) to jpeg using e.g. Wand
Edit: https://stackoverflow.com/a/40588202/1021819 is quite close too - though the exact code there already (inevitably?) makes some choices about how to go about reducing the file size (resize in that case). Perhaps there is no generalized way to do this without a multi-dimensional search.
Also, since the input files are PDFs, can this even be done with PIL? The first choice is about rasterization, for which I have been using Wand.
https://stackoverflow.com/a/34618887/1021819 is also useful, in that it uses Wand, so putting that operation within the binary-chop loop seems to be a way forward.
With PNG there is no tradeoff of compression method and visual appearance because PNG is lossless. Just go for the smallest possible file, using
my "pngcrush" application, "optipng", "zopflipng" or the like.
If you need a smaller file than any of those can produce, try reducing the number of colors to 255 or fewer, which will allow the PNG codec to produce an indexed-color PNG (color-type 3) which is around 1/3 of the filesize of an RGB PNG. You can use ImageMagick's "-colors 255" option to do this. However, I recommend the "pngquant" application for this; it does a better job than IM does in most cases.

Using gdcm (Grassroots DICOM) to Decompress DICOM Image Data

Is it possible to use the Python wrappers for GDCM to decode the image data in a DICOM file?
I have a byte array / string with the bytes of the image data in a DICOM file (i.e. the contents of tag 7fe0,0010 ("Pixel Data")) and I want to decode the image to something raw RGB or greyscale.
I am thinking of something along the lines of this but working with just the image data and not a path to the actual DICOM file itself.
You can read the examples, there is one that shows how one can take an input compressed DICOM and convert it to an uncompressed one. See the code online here:
Decompress Image
If you are a big fan of NumPy, checkout:
This module add support for converting a gdcm.Image to a numpy array.
This is sort of a low level example, which shows how to retrieve that actual raw buffer of the image.
A much nicer class for handling Transfer Syntax conversion would be to use gdcm.ImageChangeTransferSyntax class (allow decompression of icon)
gdcm::ImageChangeTransferSyntax Class Reference
If you do not mind reading a little C++, you can trivially convert the following code from C++ to Python:
Compress Image
Pay attention that this example is actually compressing the image rather than decompressing it.
Finally if you really only have access to the data values contains in the Pixel Data attribute, then you should really have a look at (C# syntax should be close to Python syntax):
Decompress JPEG File
This example shows how one can decompress a JPEG Lossless, Nonhierarchical, First- Order Prediction file. The JPEG data is read from file but it should work if the data is already in memory for your case.

Can you reduce memory consumption by ReportLab when embedding very large images, or is there a Python PDF toolkit that can?

Right now reportlab is making PDFs most of the time. However when one file gets several large images (125 files with a total on disk size of 7MB), we end up running out of memory and crashing trying to build a PDF that should ultimately be smaller than 39MB. The problem stems from:
elif mode not in ('L','RGB','CMYK'):
im = im.convert('RGB')
self.mode = 'RGB'
Where nice b&w (bitonal) images are converted to RGB and when you have images with sizes in the 2595x3000, they consume a lot of memory. (Not sure why they consume 2GB, but that point is moot. When we add them to reportlab our entire python memory footprint is about 50MB, when we call
doc.build(elements, canvasmaker=canvasmaker)
Memory usage skyrockets as we go from bitonal PNGs to RGB and then render them onto the page.
While I try to see if I can figure out how to inject bitonal images into reportlab PDFs, I thought I would see if anyone else had an idea of how to fix this problem either in reportlab or with another tool.
We have a working PDF maker using PODOFO in C++, one of my possible solutions is to write a script/outline for that tool that will simply generate the PDF in a subprocess and then return that via a file or stdout.
Short of redoing PIL you are out of luck. The Images are converted internally in PIL to 24 bit color TIFs. This is not something you can easily change.
We switched to Podofo and generate the PDF outside of python.

Categories