Python: fast way to save a huge numpy array as a lossless image (TIFF)

I have a program that processes huge RGB images in the range of 30000x30000 px.
To load them I use Pillow, which works well.
Then I process them with NumPy, and finally I need to save the result losslessly as TIFF.
However, whether I use Pillow or OpenCV, saving takes very long compared to the runtime of everything else. I think this is because of the image compression: without compression the saving doesn't take long at all, but then my files are >2 GB.
I found the tifffile module, but it takes just as long as OpenCV, unless I missed a parameter.
Is there a module that can compress faster? The ones I tried only use one CPU core.
It also seems to be faster on an Intel machine (i7-9700K, 16 GB) than on my PC (AMD Ryzen 5600X, 32 GB).
Here is the code I used to test:
from PIL import Image
import cv2
import tifffile
import numpy as np
import time

# 30000x30000 RGB test image of random uint8 values
arr = np.random.default_rng().integers(0, 255, size=(30000, 30000, 3), endpoint=True, dtype=np.uint8)

st = time.time()
Image.fromarray(arr).save("test_pil.tiff", compression="tiff_adobe_deflate")
print(f"Pil took {time.time()-st} s")

st = time.time()
# 32946 is the TIFF tag value for (old-style) deflate compression
cv2.imwrite("test_cv2.tiff", arr, params=(cv2.IMWRITE_TIFF_COMPRESSION, 32946))
print(f"Opencv took {time.time()-st} s")

st = time.time()
tifffile.imwrite("test_tifff.tiff", arr, compression="zlib", compressionargs={'level': 5}, predictor=True, tile=(64, 64))
print(f"Tifffile took {time.time()-st} s")
I know these use different compression algorithms by default, but I haven't found parameters that match across the three libraries; this option is generally very poorly documented.
Result (Intel):
Pil took 32.01173210144043 s
Opencv took 60.46461296081543 s
Tifffile took 59.410102128982544 s
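One thing worth trying (a sketch, assuming a reasonably recent tifffile release): tifffile can compress tiles on several threads via its maxworkers argument, which directly addresses the single-core bottleneck described above. The tile size and worker count here are illustrative, not tuned values:

import numpy as np
import tifffile

arr = np.random.default_rng().integers(0, 255, size=(30000, 30000, 3), endpoint=True, dtype=np.uint8)

# With a tiled layout, tifffile can hand each tile to a separate worker
# thread; maxworkers controls how many tiles are compressed in parallel.
tifffile.imwrite("test_tifff_mt.tiff", arr,
                 compression="zlib", compressionargs={'level': 5},
                 predictor=True, tile=(256, 256), maxworkers=8)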

Related

Efficiently saving tiles to a bigtiff image

I have thousands of grayscale tiles of 256 x 256 pixels with dtype np.uint8, and I want to combine those into one pyramidal BigTIFF image as fast as possible.
My current approach is to create a numpy array with the size of the final image, into which I paste all the tiles (this only takes a few seconds). For saving, I have looked into multiple approaches.
1) Tifffile, using the imsave function, which turned out to be very slow; I would estimate at least 10 minutes for a file that would end up at around 700 MB.
2) pyvips, by converting the massive numpy image to a pyvips image using pyvips.Image.new_from_memory, and then saving it using this:
vips_img.tiffsave(filename, tile=True, compression='lzw', bigtiff=True, pyramid=True, Q=80)
Constructing the vips_img takes ~42 seconds and saving it to disk takes another ~30, but this is all done on a single thread. I am wondering if there is any way to do this more time-efficiently, either with a different method or by leveraging multithreading. High-speed storage is available, so things could potentially be saved in a different format first, or handed to a different programming language if needed.
Just brainstorming: all the tiles come from an already existing BigTIFF image and have been put through a preprocessing pipeline, and now need to be saved again. I'm wondering if there could be a way to copy the original file and efficiently replace data inside it.
Edit with more information:
The dimensions of the image are roughly 55k by 45k, but I would like to use this code for larger images too, up to 150k by 150k for example.
For the image of 55k by 45k and tiles of 256 by 256, we're talking about ~53k tiles. These tiles don't all contain information I'm interested in, so in the end I might save only around 50% of them; the remainder of the image can be black. Saving the processed result in the same format seems the most convenient approach to me, as I would like to display it as an overlay.
Edit with intermediate solution:
Earlier I mentioned that creating a pyvips image from a numpy array took 40 seconds. The cause was that my input was a transposed numpy array. The transpose operation itself is very fast, but the underlying memory layout stays as it was, which caused a lot of cache misses when pyvips read the data in transposed order.
So currently the following line takes 30 seconds (to write a 200 MB file):
vips_img.tiffsave(filename, tile=True, compression='lzw', bigtiff=True, pyramid=True, Q=80)
It would be nice if this could be faster, but it seems reasonable.
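Following up on the transposed-array pitfall, a minimal sketch of the fix (array shape here is illustrative): np.ascontiguousarray copies the data into C order once, up front, so new_from_memory and tiffsave read it sequentially instead of striding through memory.

import numpy as np
import pyvips

# Simulate the pitfall: transposing changes the strides, not the layout,
# so subsequent row-by-row reads jump around in memory.
transposed = np.zeros((500, 400, 3), np.uint8).transpose(1, 0, 2)

# One upfront copy into C-contiguous order fixes the access pattern.
a = np.ascontiguousarray(transposed)
vi = pyvips.Image.new_from_memory(a.data, a.shape[1], a.shape[0],
                                  a.shape[2], 'uchar')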
Code Example
In my case, only ~15% of the tiles are interesting and will be preprocessed, but they are scattered all over the image. I would still like to save the result in a gigapixel format, as that allows me to use OpenSlide's convenient library to retrieve parts of the image. In the example I just generated ~15% random data to simulate the ratio of black to information; the performance of the example is similar to the actual implementation, where the data is more scattered over the image.
import time

import numpy as np
import pyvips

def numpy2vips(a):
    # Map numpy dtypes to the corresponding pyvips band formats.
    dtype_to_format = {
        'uint8': 'uchar',
        'int8': 'char',
        'uint16': 'ushort',
        'int16': 'short',
        'uint32': 'uint',
        'int32': 'int',
        'float32': 'float',
        'float64': 'double',
        'complex64': 'complex',
        'complex128': 'dpcomplex',
    }
    height, width, bands = a.shape
    linear = a.reshape(width * height * bands)
    vi = pyvips.Image.new_from_memory(linear.data, width, height, bands,
                                      dtype_to_format[str(a.dtype)])
    return vi

# ~15% random data on top, the rest black, to mimic the real tile layout
left = np.random.randint(0, 256, (7500, 45000), np.uint8)
right = np.zeros((50000, 45000), np.uint8)
img = np.vstack((left, right))
vips_img = numpy2vips(np.expand_dims(img, axis=2))

start = time.time()
vips_img.tiffsave("t1", tile=True, compression='deflate', bigtiff=True, pyramid=True)
print("pyramid deflate took: ", time.time() - start)

start = time.time()
vips_img.tiffsave("t2", tile=True, compression='lzw', bigtiff=True, pyramid=True)
print("pyramid lzw took: ", time.time() - start)

start = time.time()
vips_img.tiffsave("t3", tile=True, compression='jpeg', bigtiff=True, pyramid=True)
print("pyramid jpg took: ", time.time() - start)

start = time.time()
vips_img.dzsave("t4", tile_size=256, depth='one', overlap=0, suffix='.jpg[Q=75]')
print("dzi took: ", time.time() - start)
Output:
pyramid deflate took: 32.69183301925659
pyramid lzw took: 32.10764741897583
pyramid jpg took: 59.79427194595337
I did not wait for the dzsave to finish, as it was already taking more than a couple of minutes.
I tried your test program on my laptop (Ubuntu 19.10) and I see:
pyramid deflate took: 35.757954359054565
pyramid lzw took: 42.69455623626709
pyramid jpg took: 26.614688634872437
dzi took: 44.16632699966431
I'd guess you are not using libjpeg-turbo, the SIMD fork of libjpeg. Unfortunately it's very difficult to install on macOS, because brew is stuck on the non-SIMD version, but it should be easy on your deployment system: just install the libjpeg-turbo package instead of libjpeg (they are binary compatible).
There are various similar projects for zlib (zlib-ng, for example) that should speed up deflate compression dramatically.

Is there a way to examine how much memory an image is occupying with python?

Pillow provides the size attribute to examine the resolution of an image.
>>> from PIL import Image
>>> img = Image.open('Lenna.png')
>>> img.size
(512, 512)
Is there a way to examine how much memory the image is occupying? Is the image using 512*512*4 bytes of memory?
You could use the sys library to get the size of an object in bytes. The difference from Kai's answer is that his approach calculates the size of the image on disk, while this calculates the size of the loaded Python object (with all its metadata):
import sys
sys.getsizeof(img)
EDIT: After seeing this website, sys.getsizeof() seems to work mainly for primitive types.
You could have a look at a more thorough implementation (deep_getsizeof()) here.
This post gives also a lot of details.
And finally, there is also the pympler library that provides tools to calculate the RAM memory used by an object.
from pympler import asizeof
asizeof.asizeof(img)
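For the raw pixel buffer specifically, which usually dominates, a quick sketch: np.asarray copies the pixels into a numpy array, so nbytes reports exactly width * height * bands bytes.

import numpy as np
from PIL import Image

img = Image.open('Lenna.png')
# 512 * 512 * 3 bytes for an RGB image, * 4 if there is an alpha band
print(np.asarray(img).nbytes)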
import os
print(os.stat('somefile.ext').st_size)
or
import os
os.path.getsize('path_to_file.jpg')

Python file I/O significantly slower than OpenCV imread

I have an image dataset where the byte stream for several JPEG images is merged into a single binary file. The byte offsets and number of bytes occupied by each image are known, so I'm using the following code snippet to read it:
import cv2
import numpy as np
def get_image(binary_file_path, byte_offset, buffer_size):
    with open(binary_file_path, 'rb') as ifile:
        ifile.seek(byte_offset)
        image_buffer = np.asarray(bytearray(ifile.read(buffer_size)), np.uint8)  # TOO SLOW
    return cv2.imdecode(image_buffer, cv2.IMREAD_COLOR)
On my machine, the line marked with the "TOO SLOW" comment takes around 50 ms to execute. By contrast, if I use cv2.imread to parse a regular JPEG image file, it hardly takes 10 ms.
Worth noting: in the code snippet above, it takes 50 ms just to read the bytes, whereas cv2.imread reads the file and decodes the JPEG format all within 10 ms.
I would really like to know the reason behind this large discrepancy.
P.S. The times mentioned above were obtained when running the code from within a multiprocessing pool, in case that is relevant.
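One possible contributor is the extra copy made by wrapping the read in bytearray and then converting with np.asarray. A hedged sketch that avoids both steps by reading straight into a uint8 array with np.fromfile (its offset parameter requires numpy >= 1.17):

import cv2
import numpy as np

def get_image(binary_file_path, byte_offset, buffer_size):
    # Read buffer_size bytes starting at byte_offset directly into an
    # array, with no intermediate bytearray or asarray conversion.
    image_buffer = np.fromfile(binary_file_path, dtype=np.uint8,
                               count=buffer_size, offset=byte_offset)
    return cv2.imdecode(image_buffer, cv2.IMREAD_COLOR)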

Lossy compression of numpy array (image, uint8) in memory

I am trying to load a data set of 1,000,000 images into memory. As standard numpy arrays (uint8), all images combined fill around 100 GB of RAM, but I need to get this down to <50 GB while still being able to quickly read the images back into numpy (that's the whole point of keeping everything in memory). Lossless compression like blosc only reduces the size by around 10%, so I moved to JPEG compression. Minimum example:
import io
import numpy as np
from PIL import Image

numpy_array = (255 * np.random.rand(256, 256, 3)).astype(np.uint8)
image = Image.fromarray(numpy_array)
output = io.BytesIO()
image.save(output, format='JPEG')
At runtime I am reading the images with:
[np.array(Image.open(output)) for _ in range(1000)]
JPEG compression is very effective (<10 GB), but the time it takes to read 1,000 images back into numpy arrays is around 2.3 seconds, which seriously hurts the performance of my experiments. I am searching for suggestions that give a better trade-off between compression and read speed.
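For reference, one way to complete the question's minimal example is to keep one compressed buffer per image, so that each can be decoded independently (a sketch; the loop count and random data are just illustrative):

import io
import numpy as np
from PIL import Image

# Compress each image into its own in-memory buffer.
buffers = []
for _ in range(1000):
    arr = (255 * np.random.rand(256, 256, 3)).astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format='JPEG')
    buffers.append(buf)

# Decode on demand; Image.open seeks each buffer back to the start itself.
images = [np.array(Image.open(b)) for b in buffers]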
I am still not certain I understand what you are trying to do, but I created some dummy images and did some tests as follows. I'll show how I did that in case other folks feel like trying other methods and want a data set.
First, I created 1,000 images using GNU Parallel and ImageMagick like this:
parallel convert -depth 8 -size 256x256 xc:red +noise random -fill white -gravity center -pointsize 72 -annotate 0 "{}" -alpha off s_{}.png ::: {0..999}
That gives me 1,000 images called s_0.png through s_999.png: noisy red squares, each annotated with its index number.
Then I did what I think you are trying to do - though it is hard to tell from your code:
#!/usr/local/bin/python3

import io
import time

import numpy as np
from PIL import Image

# Create BytesIO object
output = io.BytesIO()

# Load all 1,000 images and write them into the BytesIO object
for i in range(1000):
    name = "s_{}.png".format(i)
    print("Opening image: {}".format(name))
    im = Image.open(name)
    im.save(output, format='JPEG', quality=50)

nbytes = output.getbuffer().nbytes
print("BytesIO size: {}".format(nbytes))

# Read images back from the BytesIO object into a list
# (time.clock() was removed in Python 3.8; use perf_counter instead)
start = time.perf_counter()
l = [np.array(Image.open(output)) for _ in range(1000)]
diff = time.perf_counter() - start
print("Time: {}".format(diff))
And that takes 2.4 seconds to read all 1,000 images from the BytesIO object and turn them into numpy arrays.
Then I palettised the images by reducing them to 256 colours (which I agree is lossy, just like your method) and kept a list of palettised image objects, which I can later convert back to numpy arrays simply by calling:
np.array(ImageList[i].convert('RGB'))
Storing the data as a palettised image saves 66% of the space, because you only store one byte of palette index per pixel rather than 3 bytes of RGB values, so it is better than the 50% compression you seek.
#!/usr/local/bin/python3

import time

import numpy as np
from PIL import Image

# Empty list of images
ImageList = []

# Load all 1,000 images
for i in range(1000):
    name = "s_{}.png".format(i)
    print("Opening image: {}".format(name))
    im = Image.open(name)
    # Add palettised image to list
    ImageList.append(im.quantize(colors=256, method=2))

# Read images back into numpy arrays
start = time.perf_counter()
l = [np.array(ImageList[i].convert('RGB')) for i in range(1000)]
diff = time.perf_counter() - start
print("Time: {}".format(diff))

# Quick test
# Image.fromarray(l[999]).save("result.png")
That now takes 0.2 s instead of 2.4 s; let's hope the loss of colour accuracy is acceptable to your unstated application :-)

Convolution with kernel size larger than 5x5 in python-pillow

I want to filter an image with a simple convolution kernel in python-pillow. However, to achieve optimal results I need a 9x9 kernel. This is not possible in Pillow, at least when using ImageFilter.Kernel and the built-in filter() method, which are limited to 5x5 kernels.
Short of implementing my own convolution code, is there a way to filter/convolve an image with a kernel size larger than 5x5?
I'm quite surprised to see that PIL doesn't support kernels beyond 5 x 5. It may therefore be prudent to look at other Python packages, such as OpenCV or scipy; in the interest of saving time, let's use scipy (OpenCV is a pain to configure, even though it's quite powerful).
I would recommend using scipy: load your image with imread from the ndimage package, convolve the image with your kernel using ndimage.convolve, then convert back to a PIL image with Image.fromarray when you're done. It supports numpy.ndarray (which is what scipy.ndimage.imread loads), which is great.
Something like this, assuming a 9 x 9 averaging filter:
# Import relevant packages
import numpy as np
from scipy import ndimage
from PIL import Image

# Read in image - change filename to whatever you want
# (note: ndimage.imread was removed in newer scipy releases;
#  imageio.imread is the modern replacement)
img = ndimage.imread('image.jpg')

# Create kernel
ker = (1 / 81.0) * np.ones((9, 9))
# For a colour (h, w, 3) image the kernel needs a third axis so that
# ndimage.convolve's dimensions match: ker = ker[..., None]

# Convolve
out = ndimage.convolve(img, ker)

# Convert back to PIL image
out = Image.fromarray(out, 'RGB')
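If OpenCV turns out to be available after all, the same 9 x 9 averaging filter is a one-liner with cv2.filter2D (a sketch; the filename is a placeholder, and cv2.blur(img, (9, 9)) would give the same result):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
ker = np.ones((9, 9), np.float32) / 81.0
# ddepth=-1 keeps the input depth (uint8); filter2D computes correlation,
# which is identical to convolution for a symmetric kernel like this one.
out = cv2.filter2D(img, -1, ker)
cv2.imwrite('filtered.jpg', out)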
pyvips is another option, if you're not tied to Pillow, numpy or scipy. It's quite a bit faster and needs a lot less memory, especially for larger images. It'll beat OpenCV too, at least on some benchmarks.
I tried on this laptop:
import sys
import numpy as np
from scipy import ndimage
from PIL import Image
img = ndimage.imread(sys.argv[1])
ker = (1 / 81.0) * np.ones((9, 9))
out = ndimage.convolve(img, ker)
out = Image.fromarray(out)
out.save(sys.argv[2])
I can run it like this:
$ /usr/bin/time -f %M:%e ./try257.py ~/pics/wtc-mono.jpg x.jpg
300352:22.47
So a 10k x 10k pixel mono JPEG on a 2015 i5 laptop takes about 22 seconds and needs a peak of 300 MB of memory.
In pyvips it's:
import sys
import pyvips
im = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
size = 9
kernel = size * [size * [1.0 / (size * size)]]
im = im.conv(kernel)
im.write_to_file(sys.argv[2])
I see:
$ /usr/bin/time -f %M:%e ./try258.py ~/pics/wtc-mono.jpg x.jpg
44336:4.76
About 5 seconds and 45 MB of memory.
That's a float convolution. You can swap it to int precision like this:
im = im.conv(kernel, precision="integer")
And I see:
$ /usr/bin/time -f %M:%e ./try258.py ~/pics/wtc-mono.jpg x.jpg
44888:1.79
1.8 seconds.
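And if the result needs to come back into Pillow/numpy land afterwards, here is a sketch of the standard pyvips-to-numpy recipe (assuming a 3-band uint8 RGB input; write_to_memory returns the raw interleaved pixel buffer):

import numpy as np
import pyvips
from PIL import Image

im = pyvips.Image.new_from_file('image.jpg')
size = 9
im = im.conv(size * [size * [1.0 / (size * size)]], precision='integer')

# Reshape the raw buffer into (height, width, bands) to get a numpy array.
arr = np.ndarray(buffer=im.write_to_memory(), dtype=np.uint8,
                 shape=[im.height, im.width, im.bands])
pil_img = Image.fromarray(arr)  # assumes a 3-band uint8 image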
