How to read a big TIFF file in Python?

I'm loading a tiff file from http://oceancolor.gsfc.nasa.gov/DOCS/DistFromCoast/
from PIL import Image
im = Image.open('GMT_intermediate_coast_distance_01d.tif')
The data is large (im.size = (36000, 18000), about 1.3 GB) and the conventional conversion doesn't work; i.e., imarray.shape returns ()
import numpy as np
imarray=np.zeros(im.size)
imarray=np.array(im)
How can I convert this tiff file to a numpy.array?

Maybe you don't have enough RAM for this image; you'll need at least a bit more than 1.3 GB of free memory.
I don't know what you're doing with the image, or whether you really need the entire thing in memory, but I recommend reading it bit by bit if possible, to avoid blowing up your computer.
You can use Image.getdata(), which returns one pixel at a time (see the sketch below).
Also read more about Image.open at this link:
http://www.pythonware.com/library/pil/handbook/
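A minimal sketch of that pixel-by-pixel idea (note that PIL may still decode the whole image internally, so this mainly avoids building a second full-size numpy array on top of it):
from PIL import Image

im = Image.open('GMT_intermediate_coast_distance_01d.tif')
for i, pixel in enumerate(im.getdata()):  # one pixel at a time
    if i >= 10:  # just peek at the first few values
        break
    print(pixel)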

So far I have tested many alternatives, but only GDAL worked reliably, even with huge 16-bit images.
You can open an image with something like this:
from osgeo import gdal
import numpy as np
ds = gdal.Open("name.tif")
channel = np.array(ds.GetRasterBand(1).ReadAsArray())
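If even a single band does not fit in memory, ReadAsArray also accepts a window (pixel offsets and size), so you can process the raster piece by piece; a minimal sketch with an arbitrary 1000x1000 window:
from osgeo import gdal
import numpy as np

ds = gdal.Open("name.tif")
band = ds.GetRasterBand(1)
# read a 1000x1000 block starting at pixel offset (0, 0)
window = np.array(band.ReadAsArray(0, 0, 1000, 1000))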

I had huge TIFF files between 1 and 3 GB and finally managed to open them with Image.open() after manually changing the value of MAX_IMAGE_PIXELS inside the Image.py source code to an arbitrarily large number:
from PIL import Image
import numpy as np

im = np.asarray(Image.open("location/image.tif"))
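With recent Pillow versions you don't need to edit the source at all: the limit is a plain module attribute that can be overridden at runtime. A minimal sketch:
from PIL import Image
import numpy as np

Image.MAX_IMAGE_PIXELS = None  # disable the decompression-bomb size check
im = np.asarray(Image.open("location/image.tif"))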

For 32-bit Python 2.7 you are limited by the amount of memory a single process can address (roughly 2 GB), so very large images cannot be loaded in one piece. One option is to read the image in parts, resize the individual chunks, and reassemble them into an image that requires less RAM.
I recommend using the packages libtiff and opencv for that.
import os
os.environ["PATH"] += os.pathsep + "C:\\Program Files (x86)\\GnuWin32\\bin"
import numpy as np
import libtiff
import cv2

tif = libtiff.TIFF.open("HUGETIFFILE.tif", 'r')
width = tif.GetField("ImageWidth")
height = tif.GetField("ImageLength")
bits = tif.GetField('BitsPerSample')
sample_format = tif.GetField('SampleFormat')

ResizeFactor = 10  # reduce the image size by a factor of 10
Chunks = 8         # read the image in 8 chunks to prevent a MemoryError
                   # (can be increased for bigger files)

ReadStrip = tif.ReadEncodedStrip
typ = tif.get_numpy_type(bits, sample_format)

# read strip by strip, resize each chunk, then stack the chunks
newarr = np.zeros((1, width // ResizeFactor), typ)
for ii in range(0, Chunks):
    pos = 0
    arr = np.empty((height // Chunks, width), typ)
    size = arr.nbytes
    for strip in range(ii * tif.NumberOfStrips() // Chunks,
                       (ii + 1) * tif.NumberOfStrips() // Chunks):
        elem = ReadStrip(strip, arr.ctypes.data + pos, max(size - pos, 0))
        pos = pos + elem
    resized = cv2.resize(arr, (0, 0),
                         fx=1.0 / ResizeFactor, fy=1.0 / ResizeFactor)
    # now remove the large array to free up memory for the next chunk
    del arr
    # recombine the individual resized chunks into the final resized image
    newarr = np.vstack((newarr, resized))

newarr = np.delete(newarr, (0), axis=0)
cv2.imwrite('resized.tif', newarr)
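With this layout the peak memory use per iteration is roughly (height / Chunks) * width * bytes-per-sample for the raw chunk, plus the much smaller resized pieces, rather than the whole uncompressed image at once.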

You can try the dask / dask-image libraries:
import dask_image.imread
ds = dask_image.imread.imread('name.tif')
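imread here returns a lazy dask array, so nothing is read from disk until you ask for it; a minimal sketch of materializing just one window (the tile bounds are arbitrary):
import dask_image.imread

ds = dask_image.imread.imread('name.tif')
tile = ds[0, :1000, :1000].compute()  # pull only this region into memory as numpy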

Related

Convert python wand hdr image to numpy array and back

Python Wand supports converting images directly to Numpy arrays, as can be seen in related questions.
However, when doing this for .hdr (high dynamic range) images, the conversion appears to squash the values into the 0-255 range. As a result, converting from a Python Wand image to an np array and back drastically reduces file size/quality.
# Without converting to a numpy array
img = Image('image.hdr') # Open with Python Wand Image
img.save(filename='test.hdr') # Save with Python wand
Running this opens the image and saves it again, which creates a file with a size of 41.512kb. However, if we convert it to numpy before saving it again..
# With converting to a numpy array
img = Image(filename=os.path.join(path, 'N_SYNS_89.hdr')) # Open with Python Wand Image
arr = np.asarray(img, dtype='float32') # convert to np array
img = Image.from_array(arr) # convert back to Python Wand Image
img.save(filename='test.hdr') # Save with Python wand
This results in a file with a size of 5.186kb.
Indeed, if I look at arr.min() and arr.max() I see that the min and max values for the numpy array are 0 and 255. If I open the .hdr image with cv2 however as an numpy array, the range is much higher.
img = cv2.imread('image.hdr', -1)
img.min() # returns 0
img.max() # returns 868352.0
Is there a way to convert back and forth between numpy arrays and Wand images without this loss?
As per the comment of @LudvigH, the following worked, as in this answer.
img = Image(filename='image.hdr')
img.format = 'rgb'
img.alpha_channel = False # was not required for me, including it for completion
img_array = np.asarray(bytearray(img.make_blob()), dtype='float32')
Now we must reshape the returned img_array. In my case I could not run the following:
img_array.reshape(img.shape)
Instead, in my case img.size was an (x, y) tuple where an (x, y, z) tuple was needed, so I computed the number of channels manually:
n_channels = img_array.size / img.size[0] / img.size[1]
img_array = img_array.reshape(img.size[0],img.size[1],int(n_channels))
After manually calculating z as above, it worked fine. Perhaps this is also what caused the original fault when converting with arr = np.asarray(img, dtype='float32').

Efficient way to make h5py file with memory constraint

Let's say I have an image directory laid out like below:
root
|___dog
| |___img1.jpg
| |___img2.jpg
| |___...
|
|___cat
|___...
I want to convert these image files into an h5py file.
First, I tried to read all the image files and write them into one h5 file.
import os
import numpy as np
import h5py
import PIL.Image as Image

datafile = h5py.File(data_path, 'w')
data_x, data_y = [], []
label_list = os.listdir('root')
for i, label in enumerate(label_list):
    files = os.listdir(os.path.join('root', label))
    for filename in files:
        img = Image.open(os.path.join('root', label, filename))
        ow, oh = 128, 128
        img = img.resize((ow, oh), Image.BILINEAR)
        data_x.append(np.array(img).tolist())
        data_y.append(i)

datafile.create_dataset("data_image", dtype='uint8', data=data_x)
datafile.create_dataset("data_label", dtype='int64', data=data_y)
But I can't do it this way because of the memory constraint (each folder holds more than 200,000 images of size 224x224).
So, what is the best way to make this image to h5 file?
The HDF5/h5py dataset objects have a much smaller memory footprint than the same size NumPy array. (That's one advantage to using HDF5.) You can create the HDF5 file and allocate the datasets BEFORE you start looping on the image files. Then you can operate on the images one at a time (read, resize, and write image 0, then image 1, etc).
The code below creates the necessary datasets, presized for 200,000 images. The code logic is rearranged to work as I described, and the img_cnt variable is used to position each new image in the existing datasets. (Note: I think this works as written; however, without the data I couldn't test it, so it may need minor tweaking.) If you want to adjust the dataset sizes in the future, you can add the maxshape=() parameter to the create_dataset() function, as sketched after the code below.
import os
import numpy as np
import h5py
import PIL.Image as Image

# Open HDF5 and create datasets in advance
datafile = h5py.File(data_path, 'w')
# sized to match the 128x128 resize below;
# RGB images would need shape (200000, 128, 128, 3)
datafile.create_dataset("data_image", (200000, 128, 128), dtype='uint8')
datafile.create_dataset("data_label", (200000,), dtype='int64')

label_list = os.listdir('root')
img_cnt = 0
for i, label in enumerate(label_list):
    files = os.listdir(os.path.join('root', label))
    for filename in files:
        img = Image.open(os.path.join('root', label, filename))
        ow, oh = 128, 128
        img = img.resize((ow, oh), Image.BILINEAR)
        datafile["data_image"][img_cnt, :, :] = np.array(img)
        datafile["data_label"][img_cnt] = i
        img_cnt += 1

datafile.close()
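A minimal sketch of the maxshape idea mentioned above, for when the final image count is unknown (an alternative to the fixed-size create_dataset call, reusing the datafile handle from the answer; growing one row at a time for simplicity):
# create an empty, growable dataset
dset = datafile.create_dataset("data_image", (0, 128, 128),
                               maxshape=(None, 128, 128), dtype='uint8')
# later, before writing image number img_cnt:
dset.resize(img_cnt + 1, axis=0)
dset[img_cnt, :, :] = np.array(img)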

How to shrink data set output from avi file

I'm trying to create a data set from an avi file I have and I know I've made a mistake somewhere.
The AVI file I have is 1,827 KB (4:17), but after running my code to convert the frames into arrays of numbers I now have a file that is 1,850,401 KB. This seems a little large to me.
How can I reduce the size of my data set / where did I go wrong?
# Program to read a video
# and extract frames
import cv2
import numpy as np
import time

# Function to extract frames
def FrameCapture(path):
    # Path to video file
    vidObj = cv2.VideoCapture(path)
    # Used as counter variable
    count = 0
    # Checks whether frames were extracted
    success = 1
    newDataSet = []
    try:
        while success:
            # vidObj object calls read
            # function to extract frames
            success, image = vidObj.read()
            img_reverted = cv2.bitwise_not(image)
            new_img = img_reverted / 255.0
            newDataSet.append(new_img)
            # new_img >> "frame%d.txt" % count
            # Saves the frames with frame-count
            # cv2.imwrite("frame%d.jpg" % count, image)
            count += 1
    except:
        timestr = time.strftime("%Y%m%d-%H%M%S")
        np.save("DataSet" + timestr, newDataSet)

# Driver Code
if __name__ == '__main__':
    # Calling the function
    FrameCapture("20191212-150041output.avi")
I'm going to guess that the video mainly consists of blocks of similar pixels, which the codec has compressed down to such a small file size. When you load single frames into arrays, all of that compression goes away, and depending on the fps of the video you will have thousands of uncompressed images. When you first load an image it is stored as a numpy array of dtype uint8, and the image size is WIDTH * HEIGHT * N_COLOR_CHANNELS bytes. After you divide it by 255.0 to normalize between 0 and 1, the dtype changes to float64 and the image size increases eightfold. You can use this information to calculate the expected size of the images.
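For example, a rough back-of-the-envelope sketch (the frame size and fps here are made up, since the question doesn't give them):
# hypothetical 640x480 RGB video at 10 fps, 4:17 (257 s) long
n_frames = 257 * 10
frame_uint8 = 640 * 480 * 3        # ~0.9 MB per frame as uint8
frame_float64 = frame_uint8 * 8    # ~7.4 MB per frame after dividing by 255.0
total_gb = n_frames * frame_float64 / 1024**3
print(total_gb)                    # roughly 17.6 GB of raw frames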
So your options are to either decrease the height and width of your images (downscale), change to grayscale, or, if your application allows it, stick with uint8 values. If the images don't change too much and you don't need thousands of them, you could also save only every 10th frame, or whatever seems reasonable. If you need them all as-is but they don't fit in memory, consider using a generator to load them on demand (see the sketch below). It will be slower, but at least it will run.
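A minimal sketch of such a generator (process() is a hypothetical stand-in for whatever you do with each frame):
import cv2

def frame_generator(path):
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame  # still uint8; normalize per frame only when needed
    cap.release()

for frame in frame_generator("20191212-150041output.avi"):
    process(frame)  # hypothetical per-frame processing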

Opencv Python open dng format

I can't figure out how to open a DNG file in OpenCV.
The file was created when using the pro options of the Samsung Galaxy S7.
The images created with those options are a DNG file as well as a JPG of size 3024 x 4032 (I believe those are the dimensions of the DNG file as well).
I tried using the answer from here (except with 3 colors instead of grayscale), like so:
import numpy as np
fd = open("image.dng", 'rb')
rows = 4032
cols = 3024
colors = 3
f = np.fromfile(fd, dtype=np.uint8,count=rows*cols*colors)
im = f.reshape((rows, cols,colors)) #notice row, column format
fd.close()
However, I got the following error:
cannot reshape array of size 24411648 into shape (4032,3024,3)
Any help would be appreciated
As far as I know, DNG files can be compressed (even though DNG is a lossless format), so you will need to decode the image first; https://www.libraw.org/ is capable of doing that. (Note also that the error itself is informative: 24,411,648 bytes over 4032 x 3024 = 12,192,768 pixels is roughly two bytes per pixel, which suggests a single 16-bit raw channel plus metadata rather than three 8-bit color channels.)
There is python wrapper for that library (https://pypi.python.org/pypi/rawpy)
import rawpy
import imageio

path = 'image.dng'
with rawpy.imread(path) as raw:
    rgb = raw.postprocess()
imageio.imsave('image.tiff', rgb)  # save the demosaiced result
The process_raw package supports both reading and writing .dng raw images. Here is a Python example:
import cv2
from process_raw import DngFile
# Download raw.dng for test:
# wget https://github.com/yl-data/yl-data.github.io/raw/master/2201.process_raw/raw-12bit-GBRG.dng
dng_path = "./raw-12bit-GBRG.dng"
dng = DngFile.read(dng_path)
rgb1 = dng.postprocess() # demosaicing by rawpy
cv2.imwrite("rgb1.jpg", rgb1[:, :, ::-1])
rgb2 = dng.demosaicing(poww=0.3) # demosaicing with gamma correction 0.3
cv2.imwrite("rgb2.jpg", rgb2[:, :, ::-1])
DngFile.save(dng_path + "-save.dng", dng.raw, bit=dng.bit, pattern=dng.pattern)

Load a tiff stack in a numpy array with python

I am having a little issue with .tif files. I am sure it is only a minor problem that I can't get around (keep in mind, I am a relatively new programmer).
Basically: I have prepared .tif files that are 64x64xn in size (n up to 1000). The image is a single file that contains all of these slices. I would like to load the image into a (multidimensional) numpy array. I have tried:
from PIL import Image as pilimage
import numpy as np

file_path = r"D:\luca\test\test.tif"
print("The selected stack is a .tif")
dataset = pilimage.open(file_path)
tiffarray = np.array(dataset)
expim = tiffarray.astype(np.double)
print(expim.shape)
and other things (like tifffile). I only seem to be able to read the first slice of the stack. Is it possible for "expim" to contain all information that is saved in the tiff stack?
I am not sure if there is a way to get PIL to open multiple slices of a tiff stack.
If you are not bound to using PIL, however, an alternative is scikit-image, which opens multiple slices from a tiff stack by default. Here is some sample code of how to load a tiff stack into a Numpy array using scikit-image:
>>> from skimage import io
>>> im = io.imread('an_image.tif')
>>> print(im.shape)
(2, 64, 64)
Note that the imread function loads the image directly into a Numpy array. Also, the dimensions of the resulting array are ordered (z, y, x) where z represents the depth, y represents the height, and x represents the width. Thus, to get a single slice from the stack all you have to do is:
>>> print(im[1].shape)
(64, 64)
PIL has a seek function to move between the slices of a tiff stack.
from PIL import Image
import numpy as np

file_path = r"D:\luca\test\test.tif"
print("The selected stack is a .tif")
dataset = Image.open(file_path)
h, w = np.shape(dataset)
tiffarray = np.zeros((h, w, dataset.n_frames))
for i in range(dataset.n_frames):
    dataset.seek(i)
    tiffarray[:, :, i] = np.array(dataset)
expim = tiffarray.astype(np.double)
print(expim.shape)
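Since the question also mentions tifffile: it reads a whole multi-page stack into one array in a single call, which may be the simplest route; a minimal sketch:
import tifffile

stack = tifffile.imread(r"D:\luca\test\test.tif")
print(stack.shape)  # (n, 64, 64): slices come first, as with scikit-image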
