I know that item assignment in Dask arrays is not possible but I wonder if there is a better way of doing the following.
The (simplified) problem is as follows: I have 100 images (2048x2048) that need to be mosaicked together into a large 20000x20000 mosaic. These images overlap, so as a first step I compute the offsets between pairs and then each image's absolute position in the large mosaic. Now comes the problem of actually creating the mosaic.
Currently I send all the small images (as dask arrays) to a delayed function that uses normal numpy slicing to build the mosaic. The issues with this approach are that it is not done in parallel, it requires quite a bit of memory per mosaic since the arrays live in just one process/thread, and there is a communication overhead because all the images backing the dask arrays need to be transferred to a single thread. Anyway, this is the pseudocode:
import numpy as np
from dask import delayed

@delayed
def mosaic(imgs, abs_pos, output_shape):
    big_image = np.zeros(output_shape)
    weight_image = np.zeros(output_shape)
    for i in range(len(imgs)):
        im = imgs[i]
        y_pos, x_pos = abs_pos[i]
        y_size, x_size = im.shape
        # paste the tile into its slot and count how many tiles cover each pixel
        big_image[y_pos:y_pos + y_size, x_pos:x_pos + x_size] += im
        weight_image[y_pos:y_pos + y_size, x_pos:x_pos + x_size] += 1
    return big_image, weight_image
Here imgs is a list of dask arrays, abs_pos is a list containing the absolute position of one corner of each image in the mosaic, and output_shape is 20000x20000.
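For context, this is roughly how one mosaic is triggered (a sketch with the same variable names):

delayed_result = mosaic(imgs, abs_pos, (20000, 20000))
# computing the delayed call pulls all the tiles into the single thread that runs it
big_image, weight_image = delayed_result.compute()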
In reality several of these mosaics need to be computed, so the task stream plot looks like the following:
[dask task stream plot]
Here it is building three mosaics: red is the transfer of data to the thread and green is the mosaic function.
I am wondering if there is a better solution to make this more parallel.
I've been fooling around lately with taking the webcam's video stream and giving it a pixel-dependent time delay.
A very simple example of that idea is the famous rolling shutter effect, but applied on the order of seconds instead of 1/100ths it looks like this: https://youtu.be/mQ0hS7l9ckY
Now, rolling shutter is fun and all, but I want something more general. I want a delay map, a (height, width, 3) shaped array that tells me how far back to go in the video. Pseudo-code for this would be
output_image[y, x, c] = video_cache[delay_map[y,x,c], y, x, c]
where the first index of the video cache is time, y, x are self-explanatory, and c is the color channel (BGR because OpenCV is weird).
In essence, each pixel of the output is a pixel of the video at the same position, but at a time determined by the delay map at the very same position.
Here's the solution I have now: I flatten everything, I access the video cache similarly to how you'd unravel a multi-index, and once I'm done I reshape the result back into an image.
This solution works pretty fast, and I'm pretty proud of it. It almost keeps up with my webcam's frame rate (I think I average about 20 of these per second).
I think the flattening and reshaping of each frame costs me some time, and if I could get rid of those I'd get much better results.
Link to the whole file at the bottom.
Here's a skeleton of my implementation.
I have a class called CircularCacheDelayAccess. It stores a cache of video frames (with a given number of frames, called cache_size in my implementation). It enables you to store frames and to get the delay-mapped frame.
Instead of pushing all the frames around each time I store a new one, I keep an index that goes around in a circle, and video[delay=3] would be found via something like cache[index-3]. Thanks to Python's funny negative index tricks, I don't even have to take the positive modulo.
The delay_map is actually a float array; when I use circ_cache.getFrame I input the integer part of delay_map.flatten(), and then I use the fractional part to interpolate between frames.
import numpy as np

class CircularCacheDelayAccess:
    def __init__(self, img_shape: tuple, cache_size: int):
        self.image_shape = img_shape
        self.cache_size = cache_size
        # some useful stuff
        self.multi_index_shape = (cache_size,) + img_shape
        self.image_size = int(np.prod(img_shape))
        self.size = cache_size * self.image_size
        # the index, going around in circles
        self.cache_index = 0
        self.cache = np.empty(self.size)
        # raveled_image_indices is a running index over a frame; it is the same thing as writing
        #   y, x, c = np.mgrid[0:height, 0:width, 0:3]
        #   raveled_image_indices = c + 3 * (x + width * y)
        # but it's a lot easier
        self.raveled_image_indices = np.arange(self.image_size)

    def store(self, image: np.ndarray):
        # (in my implementation I check that the shape matches and raise a ValueError if it does not)
        self.cache_index = (self.cache_index + 1) % self.cache_size
        # since the cache holds entire image frames, the start of each frame is index * image size
        cIndex = self.image_size * self.cache_index
        self.cache[cIndex: cIndex + self.image_size] = image.flatten()

    def getFrame(self, delay_map: np.ndarray):
        # delay_map may either have shape == self.image_shape, or shape == (self.image_size,)
        # (more asserts here, for the shape of delay_map, and to check its values do not exceed the cache size)
        # (if delay_map.shape == image_shape, I flatten it; if we were already given a flattened
        # version, there's no need to do so)
        frame = self.cache[self.image_size * (self.cache_index - delay_map)
                           + self.raveled_image_indices].reshape(self.image_shape)
        return frame
As I've already stated, this works pretty well, but I think I could get it to work better if I could just side-step the flatten and reshape steps.
Also, keeping a flattened version of an array that makes sense in its full-shaped form is pretty awkward.
And I've mentioned the interpolation part. It felt wrong to do that in CircularCacheDelayAccess, but doing the interpolation after calling getFrame twice means I need the fractional part of delay_map in the full-shaped form and the integer part flattened, which is pretty silly.
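For reference, this is roughly how the interpolation outside the class could look; a minimal sketch, assuming circ_cache is an instance of the class above and the delay values stay below cache_size - 1:

import numpy as np

int_delay = np.floor(delay_map).astype(int)
frac = delay_map - int_delay                            # fractional part, full-shaped

later = circ_cache.getFrame(int_delay.flatten())        # the nearer frame
earlier = circ_cache.getFrame(int_delay.flatten() + 1)  # one frame further back
output_image = (1 - frac) * later + frac * earlier      # linear blend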
Here are some fun examples which would probably be pretty hard to understand without seeing the video, but are still fun to look at. It looks even better with a face, but I don't think I should show my face here, so sorry about that:
horizontal rolling shutter, color delay psychedelia, my weirdest effect so far
And here is a link to the entire code, with capture and everything, if you want to mess around with it and read it all.
Thanks in advance!
Currently I have some tracks that simulate people walking around in a 1280x720 pixel area, spanning 12 hours. The recordings store the x, y coordinates and the time (in seconds) of each observation.
I want to create a movie sequence that shows how the people walk over the 12 hours. To do this I split the data into 43200 frames, so each frame corresponds to one second. My end goal is to use these data in a machine learning algorithm.
The idea is then simple: initialize the frames, loop through all the x, y coordinates, and add them to the frames array at their respective timestamps:
>>> frames = np.zeros((43200, 1280, 720, 1))
>>> for track in tracks:
...     for x, y, time in track:
...         frames[int(time), y, x] = 255  # to visualize the walking
In theory this will create 43200 frames that can be saved as mp4, gif or some other format and played back. However, the problem occurs when I try to initialize the numpy array:
>>> np.zeros((43200,720,1280,1))
MemoryError: Unable to allocate 297. GiB for an array with shape (43200, 1280, 720, 1) and data type float64
This makes sense because I'm trying to allocate:
>>> (43200 * 1280 * 720 * 8) / 1024**3
296.630859375
I then thought about saving each frame to an npy file, but each file would be 7.4 MB, which sums up to about 320 GB.
I also thought about splitting the frames up into five different arrays:
>>> a = np.zeros((8640, 720, 1280, 1))
>>> b = np.zeros((8640, 720, 1280, 1))
>>> c = np.zeros((8640, 720, 1280, 1))
>>> d = np.zeros((8640, 720, 1280, 1))
>>> e = np.zeros((8640, 720, 1280, 1))
But I think that seems cumbersome and it does not feel like the best solution. It will most likely slow the training of my machine learning algorithm. Is there a smarter way to do this?
I would just build the video a few frames at a time, then join the frames together using ffmpeg. There should be no need to store the whole video in memory at once based on the description of the use case.
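A minimal sketch of that approach, assuming the tracks structure from the question and that imageio with its ffmpeg plugin (imageio-ffmpeg) is available:

import numpy as np
import imageio
from collections import defaultdict

# Bucket the recorded positions by second so each frame only touches its own points.
points_per_second = defaultdict(list)
for track in tracks:
    for x, y, time in track:
        points_per_second[int(time)].append((y, x))

# Stream the frames straight into an mp4; only one 720x1280 frame is in memory at a time.
with imageio.get_writer("walks.mp4", fps=30) as writer:
    for t in range(43200):
        frame = np.zeros((720, 1280), dtype=np.uint8)
        for y, x in points_per_second[t]:
            frame[y, x] = 255
        writer.append_data(frame)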
I think you will have to split your data into different, smaller arrays, and that probably won't be an issue for machine learning purposes.
However, I don't know if you will be able to create these five numpy arrays, as they will also take a total of 297 GiB of RAM.
I would probably:
save the numpy arrays as PNGs, using for instance matplotlib.pyplot.imsave (a minimal sketch follows this list), or
store them as short videos, as a person won't be seen for longer than that in your video anyway, or
reduce the fps or the resolution if you really want the whole video in one variable
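Sketch of the PNG option, with frame standing for one 720x1280 array and t for its index (hypothetical names, not from the question):

from matplotlib.pyplot import imsave

# One 8-bit grayscale PNG per second instead of 43200 frames in memory.
imsave(f"frame_{t:05d}.png", frame, cmap="gray")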
Let me also add that:
The snippet of code you gave can be executed much faster with frames = np.ones((43200, 1280, 720, 1)) * 255, as nested for loops are very expensive
If you were to create an array by setting all of its coefficients one by one, it would be more efficient to initialize it with np.empty(shape), as that would spare you the time needed to set all the coefficients to zero only to overwrite them in your for loop
I copied some image data to an instance on Google Cloud (8 vCPUs, 64 GB memory, Tesla K80 GPU) and am running into memory problems when converting the raw data into features and changing the data structure of the output. Eventually I'd like to use the derived features in a Keras/TensorFlow neural net.
Process
After copying the data to a storage bucket, I run a build_features.py script to convert the raw data into processed data for the neural network. In this pipeline, I first take each raw image and put it into a list x (which stores the derived features).
Since I'm working with a large number of images (tens of thousands of images of type float32 with dimensions 250x500x3), the list x becomes quite large. Each element of x is a numpy array that stores the image in shape 250x500x3.
Problem 1 - reduced memory as list x grows
I took 2 screenshots that show available memory decreasing as x grows (below). I'm eventually able to complete this step but I'm only left with a few GB of memory so I definitely want to fix this (in the future I want to work with larger data sets). How can I build features in a way where I'm not limited by the size of x?
Problem 2 - Memory error when converting x into numpy array
The step where the instance actually fails is the following:
x = np.array(x)
The failure message is:
Traceback (most recent call last):
File "build_features.py", line 149, in <module>
build_features(pipeline='9_11_2017_fan_3_lights')
File "build_features.py", line 122, in build_features
x = np.array(x)
MemoryError
How can I adjust this step so that I don't run out of memory?
Your code has two copies of every image - one in the list, and one in the array:
images = []
for i in range(many):
    images.append(load_img(i))  # here's the first copy of each image
x = np.array(images)            # join them all together into a second copy
Just load the images straight into the array:
x = np.zeros((many, 250, 500, 3))
for i in range(many):
    x[i] = load_img(i)
Which means that you only hold a copy of one image at a time.
If you don't know the size or dtype of the image ahead of time, or don't want to hard code it, you can use:
x0 = load_img(0)
x = np.zeros((many,) + x0.shape, x0.dtype)
x[0] = x0
for i in range(1, many):
    x[i] = load_img(i)
Having said that, you're on a tricky path here. If you don't have enough room to store your dataset twice in memory, you also don't have room to compute y = x + 1.
You might want to consider using np.float16 to buy more storage, at the cost of precision.
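For instance, a sketch of the same preallocation with half precision (not part of the original answer; many, x0 and load_img as in the snippets above):

# float16 halves the footprint versus float32 and quarters it versus float64.
x = np.zeros((many,) + x0.shape, dtype=np.float16)
x[0] = x0
for i in range(1, many):
    x[i] = load_img(i)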
I have 3 DICOM stacks of size 512x512x133, 512x512x155 and 512x512x277. I would like to resample all the stacks so that their sizes become 512x512x277, 512x512x277 and 512x512x277. How can I do that?
I know I can resample using slice thickness and pixel spacing, but that would not ensure the same number of slices in each case.
You can use scipy.ndimage.zoom, specifying the array of zoom factors for each axis like this:
import scipy.ndimage

# example for the first image
# desiredshape is a numpy array with the target shape, e.g. np.array([512, 512, 277])
zoomArray = desiredshape.astype(float) / original.shape
zoomed = scipy.ndimage.zoom(original, zoomArray)
UPDATE:
If that is too slow, you could try to create separate images from the vertical slices of your "image cube", process them with some high-speed image library (some folks love ImageMagick, there's also PIL, opencv, etc.), and stack them together again. That way, you'd take 512 images of size 512x133, resize them to 512x277, and stack them again into the 512x512x277 you want. This separation would also allow for parallelization. One thing to consider: this only works if the transversal axis (the one along which you slice out the 2D images) is not itself resized!
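A rough sketch of that slice-by-slice idea with OpenCV (an illustration of the suggestion, assuming volume has shape (512, 512, 133) and the first axis is the one left untouched):

import cv2
import numpy as np

# Resize each (512, 133) slice to (512, 277), then stack back into a cube.
resized_slices = [
    cv2.resize(volume[i], (277, 512), interpolation=cv2.INTER_LINEAR)
    for i in range(volume.shape[0])
]
resampled = np.stack(resized_slices, axis=0)  # final shape (512, 512, 277)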
You can use the Resample transform in TorchIO.
import torchio as tio
small, medium, large = dicom_dirs # the folders of your three DICOMs
reference = tio.ScalarImage(large)
resample = tio.Resample(reference)
small_resampled = resample(small)
medium_resampled = resample(medium)
The three images now have the same shape, 512 x 512 x 277.
Disclaimer: I am the main developer of TorchIO.
I've created a class to which I pass an image (2D array, 1280x720). It's supposed to iterate through it, looking for the highest value:
import numpy as np

class myCv:
    def maxIntLoc(self, image):
        intensity = image[0, 0]  # columns, rows
        coordinates = (0, 0)
        for y in xrange(0, len(image)):
            for x in xrange(0, len(image[0])):
                if np.all(image[x, y] > intensity):
                    intensity = image[x, y]
                    coordinates = (x, y)
        return (intensity, coordinates)
Yet when I run it I get the error:
if np.all(image[x,y] > intensity):
IndexError: index 720 is out of bounds for axis 0 with size 720
Any help would be great as I'm new to Python.
Thanks,
Shaun
Regardless of the index error that you are experiencing, which has been addressed by others, iterating through pixels/voxels is not a sensible way to manipulate images. The issue becomes particularly evident in multi-dimensional images, where you face the curse of dimensionality.
The correct way to do this is to use vectorisation in programming languages that support it (e.g. Python, Julia, MATLAB). Through this method you will achieve the results you're looking for much more efficiently (and thousands of times faster); see array programming (a.k.a. vectorisation) to find out more. In Python this can be achieved either with generators, which are not really suitable for images as they don't produce the results until called, or with NumPy arrays.
Here is an example:
Masking image matrices by vectorisation
from numpy.random import randint
from matplotlib.pyplot import figure, imshow, title, grid, show

def mask_img(img, thresh, replacement):
    # Copy of the image for masking. Use of .copy() is essential so that
    # the original array is not modified in place.
    masked = img.copy()
    # Replacement is the value to replace anything that
    # (in this case) is below the threshold.
    masked[img < thresh] = replacement  # Mask using vectorisation methods.
    return masked

# Initial image to be masked (arbitrary example here).
# In this example, we assign a 100 x 100 matrix of random integers
# between 0 and 255 as our sample image.
initial_image = randint(0, 256, [100, 100])
threshold = 150  # Threshold

# Masking process.
masked_image = mask_img(initial_image, threshold, 0)

# Plots.
fig = figure(figsize=[16, 9])
fig.add_subplot(121)
imshow(initial_image, interpolation='none', cmap='gray')
title('Initial image')
grid(False)
fig.add_subplot(122)
imshow(masked_image, interpolation='none', cmap='gray')
title('Masked image')
grid(False)
show()
Which returns a figure with the initial image on the left and the masked image on the right.
Of course you can put the masking process (function) in a loop to do this on a batch of images. You can modify the indices and do it on 3D, 4D (e.g. MRI), or 5D (e.g. CAT scan) images too, without the need to iterate over each individual pixel or voxel.
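Applied to the original task (finding the brightest pixel and its coordinates), a vectorised version could look like this sketch, assuming a 2D single-channel image:

import numpy as np

def max_int_loc(image):
    # Flat index of the maximum, converted back to (row, column) coordinates.
    flat_index = np.argmax(image)
    y, x = np.unravel_index(flat_index, image.shape)
    return image[y, x], (x, y)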
Hope this helps.
In Python, like in most programming languages, indexes start at 0, so along an axis of size 720 you can only access positions 0 to 719.
Check with a debug print what len(image) and len(image[0]) actually return: for a NumPy/OpenCV image the first axis is rows (height), so a 1280x720 frame gives len(image) == 720 and len(image[0]) == 1280, which means image[x, y] has the indices swapped.
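A minimal sketch of the corrected loop (an illustration, not from the original answer), indexing the array as image[row, column]:

import numpy as np

def max_int_loc(image):
    intensity = image[0, 0]
    coordinates = (0, 0)
    for y in range(image.shape[0]):      # rows (720)
        for x in range(image.shape[1]):  # columns (1280)
            if np.all(image[y, x] > intensity):
                intensity = image[y, x]
                coordinates = (x, y)
    return intensity, coordinates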