Read large numpy-array as czi-file in python

I have the following snippet:
from aicspylibczi import CziFile
from pathlib import Path
pth = Path('/Volumes/USB/20x_HE.czi')
czi = CziFile(pth)
image, shp = czi.read_image(C=0, M=0) # very slow
The parameters C and M are there to slice the big array into small numpy pieces.
The file is 3.4 GB and reading it takes too long (on a MacBook with 8 GB RAM), so I always abort it.
I don't think that should happen, because I only want the first slice of the array, not the whole matrix.

You can try slideio python package (http://slideio.com). It makes use of internal image pyramids. You can read the image partially with high resolution or the whole image with low resolution.
The code below rescales the image so that the width of the delivered raster is 500 pixels (the height is computed to keep the aspect ratio).
import slideio
slide = slideio.open_slide("/data/a.czi", "CZI")
scene = slide.get_scene(0)
block = scene.read_block(size=(500,0))

By slice do you mean the first slice of a z-stack? The package you are using, aicspylibczi, allows you to specify a z coordinate e.g. to read the first z-slice:
image, shp = czi.read_image(C=0, M=0, Z=0)

Related

zonal_stats: width and height must be > 0 error

I am trying to use the function zonal_stats from rasterstats Python package to get the raster statistics from a .tif file of each shape in a .shp file. I manage to do it in QGIS without any problems, but I have to do the same with more than 200 files, which will take a lot of time, so I'm trying the Python way. Both files and replication code are in my Google Drive.
My script is:
import rasterio
import geopandas as gpd
import numpy as np
from rasterio.plot import show
from rasterstats import zonal_stats
from rasterio.transform import Affine
# Import .tif file
raster = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
# Read the raster values
array = raster.read(1)
# Get the affine
affine = raster.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
I get the following error:
Input In [1] in <cell line: 22>
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:32 in zonal_stats
return list(gen_zonal_stats(*args, **kwargs))
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:164 in gen_zonal_stats
rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
File ~\Anaconda3\lib\site-packages\rasterstats\utils.py:41 in rasterize_geom
rv_array = features.rasterize(
File ~\Anaconda3\lib\site-packages\rasterio\env.py:387 in wrapper
return f(*args, **kwds)
File ~\Anaconda3\lib\site-packages\rasterio\features.py:353 in rasterize
raise ValueError("width and height must be > 0")
I have found this question about the same problem, but I can't make its solution work. I have tried to reverse the sign of the items in the Affine of my raster data, without success:
''' Trying to use the same solution of question: https://stackoverflow.com/questions/62010050/from-zonal-stats-i-get-this-error-valueerror-width-and-height-must-be-0 '''
old_tif = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
print(old_tif.profile) # copy & paste the output and change signs
new_tif_profile = old_tif.profile
# Affine(0.004611149999999995, 0.0, -46.828504575,
# 0.0, 0.006521380000000008, -24.01169169)
new_tif_profile['transform'] = Affine(0.004611149999999995, 0.0, -46.828504575,
0.0, -0.006521380000000008, 24.01169169)
new_tif_array = old_tif.read(1)
new_tif_array = np.fliplr(np.flip(new_tif_array))
with rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif', "w", **new_tif_profile) as dest:
    dest.write(new_tif_array, indexes=1)
dem = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif')
# Read the raster values
array = dem.read(1)
# Get the affine
affine = dem.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine=affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
Doing it this way, I don't get the "width and height must be > 0" error! But every stat in zs_shapefile is "NoneType", so it doesn't solve my problem.
Does anyone understand why this error happens, and which sign I have to reverse to make it work? Thanks in advance!

I would be careful with overriding the geotransform of your raster like this, unless you are really convinced the original metadata is incorrect. I'm not too familiar with Affine, but it looks like you're now setting the latitude as positive, which places the raster in the northern hemisphere. My guess would be that this lack of intersection between the vector and raster causes the NoneType results.
I'm also not familiar with rasterstats, but I'm guessing it boils down to GDAL & Numpy at the core of it. So something you could try as a test is to add the all_touched=True keyword:
https://pythonhosted.org/rasterstats/manual.html#rasterization-strategy
If that works, it might indicate that the rasterization fails because your polygons are so small compared to the pixels, that the default rasterization method results in a rasterized polygon of size 0 (in at least one of the dimensions). And that's what the error also hints at (my guess).
Keep in mind that all_touched=True changes the stats you get in result, so I would only do it for testing, or if you're comfortable with this difference.
If you really need a valid value for these (too) small polygons, there are a few workarounds you could try. Something I've done is to simply take the centroid for these polygons, and take the value of the pixel where this centroid falls on.
A potential way to identify these polygons would be to use all_touched with the "count" statistic: every polygon with a count of only 1 might be too small to get rasterized correctly. To really find this out you would probably have to do the rasterization yourself using GDAL, given that rasterstats doesn't seem to allow it.
Note that due to the shape of some of the polygons you use, the centroid might fall outside of the polygon. But given how coarse your raster data is, relative to the vector, I don't think it would impact the result all that much.
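The centroid fallback described above could look roughly like the sketch below. It assumes a north-up raster without rotation; the function name sample_at_point and the parameters origin_x, origin_y, pixel_width and pixel_height are made up for the illustration, and the grid values and coordinates are invented stand-ins, not taken from the question's files.

```python
import math

def sample_at_point(grid, x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Return the value of the pixel containing (x, y), or None if outside.

    Assumes a north-up raster without rotation: (origin_x, origin_y) is the
    upper-left corner and rows run downward (southward).
    """
    col = math.floor((x - origin_x) / pixel_width)
    row = math.floor((origin_y - y) / pixel_height)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Hypothetical 2x3 raster with 0.5-degree pixels, upper-left corner at (-47.0, -23.0)
grid = [[10, 20, 30],
        [40, 50, 60]]
centroid = (-46.4, -23.7)  # in practice this would come from the polygon's centroid
value = sample_at_point(grid, *centroid, origin_x=-47.0, origin_y=-23.0,
                        pixel_width=0.5, pixel_height=0.5)
print(value)
```

For a real dataset the origin and pixel size would be read from the raster's affine transform rather than typed in by hand.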
An alternative is, instead of modifying the vector, to significantly increase the resolution of your raster. You could use gdal_translate to output this to a VRT, with some form of resampling, and avoid having to write this data to disk. Once the resolution is high enough that all polygons rasterize to at least a 1x1 array, it should probably work. But your polygons are tiny compared to the pixels, so it'll be a lot. You could guess it, or analyze the envelopes of all polygons. For example take the smallest edge of the envelope as more or less the resolution that's necessary for a correct rasterization.
Edit: to clarify the above a bit further.
The default rasterization strategy of GDAL (all_touched=False) is to consider a pixel "within" the polygon if the centroid of the pixel intersects with the polygon.
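To make the pixel-center rule concrete, here is a small pure-Python imitation of the two strategies. It is only a sketch and not GDAL's actual implementation: the polygons are simplified to axis-aligned rectangles, the grid origin is assumed to be the lower-left corner, and the function name covered_pixels is made up.

```python
def covered_pixels(rect, n_rows, n_cols, pixel_size=1.0, all_touched=False):
    """List (row, col) pixels that an axis-aligned rectangle would burn in.

    rect is (xmin, ymin, xmax, ymax) in a grid whose origin is the lower-left
    corner; this only imitates the two strategies and is not GDAL itself.
    """
    xmin, ymin, xmax, ymax = rect
    hits = []
    for row in range(n_rows):
        for col in range(n_cols):
            px0, py0 = col * pixel_size, row * pixel_size
            px1, py1 = px0 + pixel_size, py0 + pixel_size
            if all_touched:
                # Any overlap between pixel and rectangle counts.
                hit = px0 < xmax and px1 > xmin and py0 < ymax and py1 > ymin
            else:
                # Only pixels whose center lies inside the rectangle count.
                cx, cy = (px0 + px1) / 2, (py0 + py1) / 2
                hit = xmin <= cx <= xmax and ymin <= cy <= ymax
            if hit:
                hits.append((row, col))
    return hits

tiny = (0.1, 0.1, 0.3, 0.3)   # much smaller than one pixel, misses every center
large = (0.2, 0.2, 1.8, 1.8)  # contains four pixel centers

print(covered_pixels(tiny, 3, 3))                    # default rule: nothing is hit
print(covered_pixels(tiny, 3, 3, all_touched=True))  # the single pixel it touches
print(covered_pixels(large, 3, 3))
```

The tiny rectangle yields an empty pixel list under the default rule, which is the situation that would lead to a zero-sized rasterized array.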
Using QGIS you can, for example, convert the pixels to points, and then do a spatial join with your vector. If you remove polygons that can't be joined (there's a checkbox), you'll get a different vector that most likely should work with rasterstats, given your current raster.
You could perhaps use that in the normal way (all_touched=False), and get the stats for the small polygons using all_touched=True.
In the image below, the green polygons are the ones that intersect with the centroid of a pixel, the red ones don't (and those are probably the ones rasterstats "tries" to rasterize to a size 0 array).

Moving/running window of a Multi-dimensional image array

I am trying to work on an efficient numpy solution to perform a running average of an array of color images across the 4th dimension. A set of color images in a directory is read in a loop and I would like to average in subsets of 3, i.e. if there are n = 5 color images in the directory I would like to average [1,2,3], [2,3,4], [3,4,5], [4,5,1], and [5,1,2], thus writing 5 output average images.
from os import listdir
from os.path import isfile, join
import numpy as np
import cv2
from matplotlib import pyplot as plt
mypath = 'C:/path/to/5_image/dir'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
img = np.empty(len(onlyfiles), dtype=object)
temp = np.zeros((960, 1280, 3, 3), dtype='uint8')
temp_avg = np.zeros((960, 1280, 3), dtype='uint8')
for n in range(0, len(onlyfiles)):
    img[n] = cv2.imread(join(mypath, onlyfiles[n]))
for n in range(0, len(img)):
    if (n+2) < len(img)-1:
        temp[:, :, :, 0] = img[n]
        temp[:, :, :, 1] = img[n + 1]
        temp[:, :, :, 2] = img[n + 2]
        temp_avg = np.mean(temp,axis=3)
        plt.imshow(temp_avg)
        plt.show()
    else:
        break
This script is in no way complete or elegant. The issue I am having is that when plotting the average images, the color space seems distorted and appears like CMYK. I am also not accounting for the last two moving windows [4,5,1] and [5,1,2]. Critique and suggestions welcome.

For performing local operations (such as a running average) across the pixels of an image (or across multiple images), convolution with a kernel is usually a good approach.
Here's how this could be done in your case.
Generating Some Example Data
I used the following to generate 10 images containing random noise to work with:
for i in range(10):
    # uint8 so that cv2.imwrite can store the array as an 8-bit PNG
    an_img = np.random.randint(0, 256, (960,1280,3), dtype=np.uint8)
    cv2.imwrite("img_"+str(i)+".png", an_img)
Preparing the Images
This is how I load the images back in:
import os
# Get target file names
mypath = os.getcwd() # or whatever path you like
fnames = [f for f in listdir(mypath) if f.endswith('.png')]
# Create an array to hold all the images
first_img = cv2.imread(join(mypath, fnames[0]))
y,x,c = first_img.shape
all_imgs = np.empty((len(fnames),y,x,c), dtype=np.uint8)
# Load all the images
for i,fname in enumerate(fnames):
    all_imgs[i,...] = cv2.imread(join(mypath, fname))
Some notes:
I use f.endswith('.png') to be a bit more specific with how I generate the list of filenames, allowing other files to be in the same directory without causing problems.
I place all of the images in a single 4D uint8 array of shape (image,y,x,c) instead of the object array you were using. This is necessary to employ the convolution approach below.
I use the first image to get the dimensions of the images, which makes the code just a little bit more general.
Performing Local Averaging by Kernel Convolution
This is all it takes.
from scipy.ndimage import uniform_filter
done = uniform_filter(all_imgs, size=(3,0,0,0), origin=-1, mode='wrap')
Some notes:
I am using scipy.ndimage because it readily allows for its convolution filters to be applied to images with many dimensions (4 in your case). For cv2, I am only aware of cv2.filter2D, which does not have that functionality as far as I know. However, I am not very familiar with cv2, so I may be wrong about this (will edit if someone corrects me in a comment).
The size kwarg specifies the size of the kernel to use along each dimension of the array. By using (3,0,0,0), I make sure that only the first dimension (=the different images) is used for the averaging.
By default, the running window (or rather the kernel) is used to compute the value of its central pixel. To match this more closely with your code, I used origin=-1, so the kernel computes the value of the pixel one to the left of its center.
By default, the edge cases (the two last images in this case) are handled by padding with a reflection. Your question suggests that what you want is to use the first images again instead. This is done using mode='wrap'.
By default, the filter returns the result in the same dtype as the input, here np.uint8. This is probably desirable, but your example code produces floats, so perhaps you want the filter to return floats as well, which you can do by simply changing the dtype of the input, i.e. done = uniform_filter(all_imgs.astype(np.float64), size...).
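As a cross-check of what the wrapped window is meant to compute, the same running average can be written with plain NumPy using np.roll. This is a sketch of an alternative technique, not the filter call above; the function name circular_running_mean and the tiny test stack are made up for the illustration.

```python
import numpy as np

def circular_running_mean(stack, window=3):
    """Average each frame with the (window - 1) frames after it, wrapping around.

    stack has shape (n_images, height, width, channels).
    """
    acc = np.zeros(stack.shape, dtype=np.float64)
    for k in range(window):
        # np.roll with a negative shift brings frame i + k to position i.
        acc += np.roll(stack, -k, axis=0)
    return acc / window

# Five tiny constant "images" with values 1..5
values = np.arange(1, 6, dtype=np.uint8).reshape(5, 1, 1, 1)
stack = values * np.ones((1, 2, 2, 3), dtype=np.uint8)
means = circular_running_mean(stack)
print(means[:, 0, 0, 0])
```

The first entry averages images [1,2,3] and the last one averages [5,1,2], matching the windows listed in the question. Accumulating in float64 avoids uint8 overflow, at the cost of holding one float copy of the whole stack in memory.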
As for the distorted color space when you plot your averages, I cannot reproduce that. Your approach seems to produce the correct output for my random noise example images (after correction of the issue I pointed out in my comment to your question). Perhaps you could try plt.imshow(temp_avg, interpolation='none') to avoid possible artefacts from imshow's interpolation?
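One possible cause that the posts above do not confirm: np.mean returns a float array with values in 0..255, while Matplotlib's imshow expects float RGB data in the range 0..1 and clips anything above, and cv2.imread delivers channels in BGR order. Assuming that is what happens here, a conversion along these lines before plotting might help (the function name to_displayable_rgb is made up):

```python
import numpy as np

def to_displayable_rgb(avg_bgr):
    """Convert a float BGR average (values 0..255) to a uint8 RGB image."""
    clipped = np.clip(np.rint(avg_bgr), 0, 255).astype(np.uint8)
    return clipped[..., ::-1]  # reverse the channel axis: BGR -> RGB

# Stand-in for np.mean(temp, axis=3): a 1x2 float image in BGR order
avg = np.array([[[10.4, 20.6, 250.0], [0.0, 128.0, 255.0]]])
rgb = to_displayable_rgb(avg)
print(rgb.dtype, rgb[0, 0])
```

With the question's variables this would be used as plt.imshow(to_displayable_rgb(temp_avg)).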

Resample all images in the database to the same voxel size

I have 3 DICOM stacks of size 512x512x133, 512x512x155 and 512x512x277. I would like to resample all the stacks so that each one has size 512x512x277. How can I do that?
I know I can do resampling using slice thickness and pixel spacing, but that would not ensure the same number of slices in each case.

You can use scipy.ndimage.zoom, specifying the array of zoom factors for each axis like this:
# example for first image; desiredshape is e.g. (512, 512, 277)
zoomArray = np.asarray(desiredshape, dtype=float) / original.shape
zoomed = scipy.ndimage.zoom(original, zoomArray)
UPDATE:
If that is too slow, you could try to create separate images from the vertical slices of your "image cube", process them with some high-speed image library (some folks love ImageMagick, there's also PIL, OpenCV, etc.), and stack them together again. That way, you'd take 512 images of size 512x133 and resize them to 512x277, then stack again to 512x512x277, which is your final desired size. This separation would also allow for parallelization. One thing to consider: this only works if the transversal axis (the one along which you slice the 2D images) does not need to be resized!
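If only the number of slices has to change, as in the question, the resampling can also be sketched with NumPy alone by interpolating linearly along the slice axis. This is a swapped-in technique rather than the zoom call above, the function name resample_slices is made up, and the sketch ignores physical slice spacing: it only equalizes the slice count.

```python
import numpy as np

def resample_slices(volume, n_slices):
    """Linearly interpolate a (rows, cols, slices) volume to n_slices slices."""
    old_n = volume.shape[2]
    # Fractional positions of the new slices in the old slice index space.
    pos = np.linspace(0, old_n - 1, n_slices)
    lower = np.floor(pos).astype(int)
    upper = np.minimum(lower + 1, old_n - 1)
    weight = pos - lower
    return volume[:, :, lower] * (1 - weight) + volume[:, :, upper] * weight

# Small stand-in for a 512x512x133 stack: intensity equals the slice index.
volume = np.broadcast_to(np.arange(5, dtype=float), (4, 4, 5))
resampled = resample_slices(volume, 9)
print(resampled.shape)
```

For the stacks in the question the call would be resample_slices(stack, 277). The first and last slices are kept as they are and everything in between is blended from its two nearest neighbours.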

You can use the Resample transform in TorchIO.
import torchio as tio
small, medium, large = dicom_dirs # the folders of your three DICOMs
reference = tio.ScalarImage(large)
resample = tio.Resample(reference)
small_resampled = resample(tio.ScalarImage(small))
medium_resampled = resample(tio.ScalarImage(medium))
The three images now have the same shape, 512 x 512 x 277.
Disclaimer: I am the main developer of TorchIO.

OpenGL Texturing - some jpg's are being distorted in a strange way

I am trying to draw a textured square using Python, OpenGL and GLFW.
Here are all the images I need to show you.
Sorry for the way of posting images, but I don't have enough reputation to post more than 2 links (and I can't even post a photo).
I am getting this:
[the second image from the album]
Instead of that:
[the first image from the album]
BUT if I use some different jpg files:
some of them are being displayed properly,
some of them are being displayed properly until I rotate them 90 degrees (I mean using numpy rot90 function on an array with RGB components) and then send them to the GPU. And it looks like that (colors don't change, I only get some distortion):
Before rotation:
[the third image from the album]
After rotation:
[the fourth image from the album]
It all depends on a file.
Does anybody know what I am doing wrong? Or see anything that I don't see?
Code:
First, I do the thing with initializing glfw, creating a window, etc.
if __name__ == '__main__':
    import sys
    import glfw
    import OpenGL.GL as gl
    import numpy as np
    from square import square
    from imio import imread,rgb_flag,swap_rb
    from txio import tx2gpu,txrefer
    glfw.glfwInit()
    win = glfw.glfwCreateWindow(800,800,"Hello")
    glfw.glfwMakeContextCurrent(win)
    glfw.glfwSwapInterval(1)
    gl.glClearColor(0.75,0.75,0.75,1.0)
Then I load an image using the OpenCV imread function, remembering to swap red with blue. Then I send the image to the GPU - I will describe tx2gpu in a minute.
image = imread('../imtools/image/ummagumma.jpg')
if not rgb_flag: swap_rb(image)
#image = np.rot90(image)
tx_id = tx2gpu(image)
The swap_rb() function (defined in a different file, imported):
def swap_rb(mat):
    X = mat[:,:,2].copy()
    mat[:,:,2] = mat[:,:,0]
    mat[:,:,0] = X
    return mat
Then comes the main loop (I will describe txrefer and square in a moment):
while not glfw.glfwWindowShouldClose(win):
    gl.glClear(gl.GL_COLOR_BUFFER_BIT)
    txrefer(tx_id); square(2); txrefer(0)
    glfw.glfwSwapBuffers(win)
    glfw.glfwPollEvents()
And here is the end of the main function:
glfw.glfwDestroyWindow(win)
glfw.glfwTerminate()
NOW IMPORTANT THINGS:
A function that defines a square looks like that:
def square(scale=1.0,color=None,solid=True):
    s = scale*.5
    if type(color)!=type(None):
        if solid:
            gl.glBegin(gl.GL_TRIANGLE_FAN)
        else:
            gl.glBegin(gl.GL_LINE_LOOP)
        gl.glColor3f(*color[0][:3]); gl.glVertex3f(-s,-s,0)
        gl.glColor3f(*color[1][:3]); gl.glVertex3f(-s,s,0)
        gl.glColor3f(*color[2][:3]); gl.glVertex3f(s,s,0)
        gl.glColor3f(*color[3][:3]); gl.glVertex3f(s,-s,0)
    else:
        if solid:
            gl.glBegin(gl.GL_TRIANGLE_FAN)
        else:
            gl.glBegin(gl.GL_LINE_LOOP)
        gl.glTexCoord2f(0,0); gl.glVertex3f(-s,-s,0)
        gl.glTexCoord2f(0,1); gl.glVertex3f(-s,s,0)
        gl.glTexCoord2f(1,1); gl.glVertex3f(s,s,0)
        gl.glTexCoord2f(1,0); gl.glVertex3f(s,-s,0)
    gl.glEnd()
And texturing functions look like that:
import OpenGL.GL as gl
unit_symbols = [
gl.GL_TEXTURE0,gl.GL_TEXTURE1,gl.GL_TEXTURE2,
gl.GL_TEXTURE3,gl.GL_TEXTURE4,
gl.GL_TEXTURE5,gl.GL_TEXTURE6,gl.GL_TEXTURE7,
gl.GL_TEXTURE8,gl.GL_TEXTURE9,
gl.GL_TEXTURE10,gl.GL_TEXTURE11,gl.GL_TEXTURE12,
gl.GL_TEXTURE13,gl.GL_TEXTURE14,
gl.GL_TEXTURE15,gl.GL_TEXTURE16,gl.GL_TEXTURE17,
gl.GL_TEXTURE18,gl.GL_TEXTURE19,
gl.GL_TEXTURE20,gl.GL_TEXTURE21,gl.GL_TEXTURE22,
gl.GL_TEXTURE23,gl.GL_TEXTURE24,
gl.GL_TEXTURE25,gl.GL_TEXTURE26,gl.GL_TEXTURE27,
gl.GL_TEXTURE28,gl.GL_TEXTURE29,
gl.GL_TEXTURE30,gl.GL_TEXTURE31]
def tx2gpu(image,flip=True,unit=0):
    gl.glActiveTexture(unit_symbols[unit])
    texture_id = gl.glGenTextures(1)
    gl.glBindTexture(gl.GL_TEXTURE_2D,texture_id)
    gl.glTexParameteri(gl.GL_TEXTURE_2D,gl.GL_TEXTURE_WRAP_S,gl.GL_REPEAT)
    gl.glTexParameteri(gl.GL_TEXTURE_2D,gl.GL_TEXTURE_WRAP_T,gl.GL_REPEAT)
    gl.glTexParameteri(gl.GL_TEXTURE_2D,gl.GL_TEXTURE_MAG_FILTER,gl.GL_LINEAR)
    gl.glTexParameteri(gl.GL_TEXTURE_2D,gl.GL_TEXTURE_MIN_FILTER,gl.GL_LINEAR)
    yres,xres,cres = image.shape
    from numpy import flipud
    gl.glTexImage2D(gl.GL_TEXTURE_2D,0,gl.GL_RGB,xres,yres,0,gl.GL_RGB,gl.GL_UNSIGNED_BYTE,flipud(image))
    gl.glBindTexture(gl.GL_TEXTURE_2D,0)
    return texture_id
def txrefer(tex_id,unit=0):
    gl.glColor4f(1,1,1,1);
    gl.glActiveTexture(unit_symbols[unit])
    if tex_id!=0:
        gl.glEnable(gl.GL_TEXTURE_2D)
        gl.glBindTexture(gl.GL_TEXTURE_2D,tex_id)
    else:
        gl.glBindTexture(gl.GL_TEXTURE_2D,0)
        gl.glDisable(gl.GL_TEXTURE_2D)

The problem you have there is an alignment issue. OpenGL's initial alignment setting for "unpacking" images is that each row starts on a 4-byte boundary. That assumption breaks when the row size in bytes (width times bytes per pixel) is not a multiple of 4, which for 3-byte RGB pixels means whenever the image width is not a multiple of 4. But it's easy enough to change this:
glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
would probably do the trick for you. Call it right before glTex[Sub]Image.
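To see which image widths are affected, the row size OpenGL expects can be computed by hand. The sketch below assumes tightly packed uint8 RGB data, as a contiguous NumPy array provides; the function name padded_row_bytes is made up for the illustration.

```python
def padded_row_bytes(width, bytes_per_pixel=3, alignment=4):
    """Bytes OpenGL expects per image row for a given GL_UNPACK_ALIGNMENT."""
    row = width * bytes_per_pixel
    return (row + alignment - 1) // alignment * alignment

for width in (512, 510, 509):
    tight = width * 3  # what a contiguous uint8 RGB NumPy array provides
    print(width, tight, padded_row_bytes(width), padded_row_bytes(width, alignment=1))
```

Whenever the expected row size is larger than the tightly packed one, every row is read with a growing offset and the image appears sheared. Rotating an image by 90 degrees swaps width and height, which would explain why some files only break after rot90.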
Another thing: your unit_symbols list is completely unnecessary. The OpenGL specification explicitly says that GL_TEXTUREn = GL_TEXTURE0 + n, so you can simply do glActiveTexture(GL_TEXTURE0 + n). However, when loading a texture image the unit is irrelevant; it only matters in that loading a texture requires binding one, which happens in a texture unit, and a texture can be bound in any texture unit desired.
Personally I use the highest texture unit for loading images, to avoid accidentally clobbering required state.

How to reduce an image size in image processing (scipy/numpy/python)

Hello, I have an image (1024 x 1024) and I used the "fromfile" command in numpy to put every pixel of that image into a matrix.
How can I reduce the size of the image (e.g. to 512 x 512) by modifying that matrix a?
a = numpy.fromfile( - path - , 'uint8').reshape((1024,1024))
I have no idea how to modify the matrix a to reduce the size of the image. If somebody has any idea, please share it; I would appreciate it. Thanks
EDIT:
When I looked at the result, I found that the reader I have reads the image and puts it into a "matrix", so I changed "array" to "matrix".
Jose told me I can take only the even columns and even rows and put them into a new matrix. That will reduce the image to half size. What command in scipy/numpy do I need to use to do that?
Thanks

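If you want to resize to a specific resolution, use scipy.misc.imresize (note that this function was deprecated in SciPy 1.0 and removed in SciPy 1.3, so it only works with older versions):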
import scipy.misc
i_width = 640
i_height = 480
scipy.misc.imresize(original_image, (i_height, i_width))

Use the zoom function from scipy:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.zoom.html#scipy.ndimage.zoom
import numpy as np
from scipy.ndimage import zoom
a = np.ones((1024, 1024))
small_a = zoom(a, 0.5)

I think the easiest way is to take only some columns and some rows of the image, i.e. to make a sample of the array. Take, for example, only the even rows and the even columns and put them in a new array, and you will have a new image of half the size.
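In NumPy, taking every second row and column is a single slicing expression. The sketch below uses a small 4x4 array as a stand-in for the 1024x1024 image and also shows averaging 2x2 blocks as an alternative that discards less information; both variable names are made up.

```python
import numpy as np

a = np.arange(16, dtype=np.uint8).reshape(4, 4)  # stand-in for the 1024x1024 image

# Keep every second row and column (rows 0, 2, ... and columns 0, 2, ...).
subsampled = a[::2, ::2]

# Alternative: average each 2x2 block, which discards less information.
h, w = a.shape
block_mean = a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(subsampled.shape, block_mean.shape)
```

If a is a numpy.matrix rather than an array, converting it first with np.asarray(a) keeps the reshape-based variant working, since a matrix cannot have four dimensions.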
