Currently I have some tracks that simulate people walking around in an area 1280x720pixels that span over 12 hours. The recordings stored are x, y coordinate and the timeframe (in seconds) of the specific recordings.
I want to create a movie sequence that shows how the people are walking over the 12 hours. To do this I split the data into 43200 different frames. This correspond to every frame is one second. My end goal is to use these data in a machine learning algorithm.
The idea is then simple. Initilize the frames, loop through all the x,y coordinate and add them to the array frames with their respective timeframe:
>>> frames = np.zeros((43200, 1280, 720,1))
>>> for track in tracks:
>>> for x,y,time in track:
>>> frames[int(time), y,x] = 255 # to visualize the walking
This will in theory create a frame of 43200 that can be saved as a mp4, gif or some other format and be played. However, the problem occurs when I try to initialize the numpy array:
>>> np.zeros((43200,720,1280,1))
MemoryError: Unable to allocate 297. GiB for an array with shape (43200, 1280, 720, 1) and data type float64
This makes sense because im trying to allocate:
>>> (43200 * 1280 * 720 * 8) * 1024**3
296.630859375
I then thought about saving each frame to a npy file but each file will be 7.4MB which sums up to 320GB.
I also thought about splitting the frames up into five different arrays:
>>> a = np.zeros((8640, 720, 1280, 1))
>>> b = np.zeros((8640, 720, 1280, 1))
>>> c = np.zeros((8640, 720, 1280, 1))
>>> d = np.zeros((8640, 720, 1280, 1))
>>> e = np.zeros((8640, 720, 1280, 1))
But I think that seems cumbersome and it does not feel like the best solution. It will most likely slow the training of my machine learning algorithm. Is there a smarter way to do this?
I would just build the video a few frames at a time, then join the frames together using ffmpeg. There should be no need to store the whole video in memory at once based on the description of the use case.
I think you will have to split your data in different, small arrays, and that probably won't be an issue for machine learning purposes.
However, I don't know if you will be able to create these five numpy arrays as they will also take a total of 297Gb of RAM.
I would probably :
save the numpy arrays as PNGs using for instance matplotlib.pyplot.imsave, or
store them as short videos, as a person won't be seen more than that on your video anyway, or
reduce the fps or the resolution if you really want the whole video in one variable
Let me also add that :
The snippet of code you gave can be executed in a much faster time with frames = np.ones((43200, 1280, 720,1))*255, as intricated for loops are very expensive
If you were to create an array by setting all of its coefficients one by one, it would be more effective to initialize it with np.empty(shape), as it would spare you the time needed to put all the coefficients to zero only to overwrite them in your for loop
Related
Let a numpy array video of shape (T,w,h,3) be given. Here T is a positive integer representing number of frames, w is a positive integer representing the width, h is a positive integer representing the height. Every entry of video is an integer from 0 to 255. In other words, video is a numpy array represents a video in the sense that video[t] is an RGB image for every non-negative integer t < T. After video is given, another array of floats time of shape (T) is given. This array time satisfy time[0]=0 and time[t] < time[t+1] for every non-negative integer t < T. An example of the above situation is given here:
import numpy as np
shape = (200, 500, 1000, 3)
random = np.random.randint(0, 255, shape, dtype= np.uint16)
time = np.zeros((shape[0]), dtype = np.float16)
time[0] = 0
for i in range(1, shape[0]):
x = np.random.random_sample()
time[i] = time[i-1] + x
My goal is to save video and time a playable video file such that:
The video file is in format of either avi or mp4 (so that we can just double click it and play it).
Each frame of the video respects the time array in the following sense: for every non-negative integer t < T, the viewer is seeing the picture video[t] during the time period from time[t] to time[t+1]. The moment time[T-1] is the end of the video.
If possible, keep the original size (in the given example the size is (500,1000)).
How can I achieve this? I tried using the opencv's video writer and it seems I have to enter some fps information which I do not have because the time array can be very non-uniform in terms of when each picture is displayed.
That is impossible with OpenCV. OpenCV's VideoWriter only supports fixed/constant frame rate. Anything based on that will require rounding to the nearest frame time and/or higher-than-necessary frame rates and duplicated frames (or rather frames that contain no change).
You want presentation timestamps (PTS). That's an inherent aspect of media containers and streams. A word of caution: some video players may assume a "reasonable" time span between frames, and may glitch otherwise, like becoming laggy/unresponsive because the whole GUI is tied to video timing... That's the fault of the video player though.
Use PyAV. It's the only ffmpeg wrapper for python I know of that actually uses API calls rather than messing around with subprocesses.
Here's the relevant example: https://github.com/PyAV-Org/PyAV/blob/main/examples/numpy/generate_video_with_pts.py
In short: set frame.pts = int(round(my_pts / stream.codec_context.time_base)) where my_pts is something in seconds.
I wrote that example, derived from the sibling "fixed rate" example. I put some effort into getting the ffmpeg API usage "right" (time bases, containers/streams/contexts) but if it happens to fail or act up, you're allowed and encouraged to question what I did there.
The solution to your problem is to generate all the video frames necessary for a given value of FPS and as a video needs a constant frame rate you have to decide first at which granularity you want your video.
After you have decided the FPS value you go and generate all the required video frames, so you can use the export to video method with a constant frame rate.
The numpy arrays representing the image of the frame stay in the video array same as the last one displayed until there is time to change to another one. The chosen frame rate FPS decides then with which accuracy the changes to new frame image hit the specified time values.
Below Python code with an improved version of getting the time values. It generates all the video frames and the explanations are implemented by self-explaining choice of variable names. The logic behind the algorithm used is to generate a single frame image and repeat it as frame of the resulting video as long as the next value on the time axis is not reached. If the next value on the time axis is reached a new image is generated and repeated as long as the video time does not exceed the next time value. The code writes the created data to an .mp4 file:
import numpy as np
import cv2 as cv
FPS = 15
fps_timeDelta = 1/FPS
noOfImages = 5 # 200
imageShape = (210, 297 , 3) # (500, 1000, 3)
vidWriter = cv.VideoWriter(
'opencv_writeVideo.mp4',
cv.VideoWriter_fourcc(*'MPEG'),
FPS, (imageShape[1], imageShape[0 ])
)
vidFrameTime = np.concatenate(
(np.zeros(1), np.add.accumulate(
np.random.random_sample(size=noOfImages)))
)
vidTime = 0.0
indxVidFrameTime = 1
singleImageRGB = np.random.randint(
0, 256, imageShape, dtype= np.uint8)
cv.imshow("singleImageRGB", singleImageRGB/255 )
cv.waitKey(0)
while vidTime <= vidFrameTime[-1]:
vidTime += fps_timeDelta
if vidTime >= vidFrameTime[indxVidFrameTime]:
singleImageRGB = np.random.randint(0, 255, imageShape, dtype= np.uint8)
indxVidFrameTime += 1
vidWriter.write(singleImageRGB)
I'm working in python with a large dataset of images, each 3600x1800. There are ~360000 images in total. Each image is added to the stack one by one since an initial processing step is run on each image. H5py has proven effective at building the stack, image by image, without it filling the entire memory.
The analysis I am running is calculated on the grid cells - so on 1x1x360000 slices of the stack. Since the analysis of each slice depends on the max and min values of within that slice, I think it is necessary to hold the 360000-long array in memory. I have a fair bit of RAM to work with (~100GB) but not enough to hold the entire stack of 3600x1800x360000 in memory at once.
This means I need (or I think I need) a time-efficient way of accessing the 360000-long arrays. While h5py is efficient at adding each image to the stack, it seems that slicing perpendicular to the images is much, much slower (hours or more).
Am I missing an obvious method to slice the data perpendicular to the images?
Code below is a timing benchmark for 2 different slice directions:
file = "file/path/to/large/stack.h5"
t0 = time.time()
with h5py.File(file, 'r') as f:
dat = f['Merged_liqprec'][:,:,1]
print('Time = ' + str(time.time()- t0))
t1 = time.time()
with h5py.File(file, 'r') as f:
dat = f['Merged_liqprec'][500,500,:]
print('Time = ' + str(time.time()- t1))
Output:
## time to read a image slice, e.g. [:,:,1]:
Time = 0.0701
## time to read a slice thru the image stack, e.g. [500,500,:]:
Time = multiple hours, server went offline for maintenance while running
You have the right approach. Using numpy slicing notation to read the stack of interest reduces the memory footprint.
However, with this large dataset, I suspect I/O performance is going to depend on chunking: 1) Did you define chunks= when you created the dataset, and 2) if so, what is the chunked size? The chunk is the shape of the data block used when reading or writing. When an element in a chunk is accessed, the entire chunk is read from disk. As a result, you will not be able to optimize the shape for both writing images (3600x1800x1) and reading stacked slices (1x1x360000). The optimal shape for writing an image is (shape[0], shape[1], 1) and for reading a stacked slice is (1, 1, shape[2])
Tuning the chunk shape is not trivial. h5py docs recommend the chunk size should be between 10 KiB and 1 MiB, larger for larger datasets. You could start with chunks=True (h5py determines the best shape) and see if that helps.
Assuming you will create this file once, and read many times, I suggest optimizing for reading. However, creating the file may take a long time. I wrote a simple example that you can use to "tinker" with the chunk shape on a small file to observe the behavior before you work on the large file. The table below shows the effect of different chunk shapes on 2 different file sizes. The code follows the table.
Dataset shape=(36,18,360)
chunks
Writing
Reading
None
0.349
0.102
(36,18,1)
0.187
0.436
(1,1,360)
2.347
0.095
Dataset shape=(144,72,1440) (4x4x4 larger)
chunks
Writing
Reading
None
59.963
1.248
(9, 9, 180) / True
11.334
1.588
(144, 72, 100)
79.844
2.637
(10, 10, 1440)
56.945
1.464
Code below:
f = 4
a0, a1, n = f*36, f*18, f*360
c0, c1, cn = a0, a1, 100
#c0, c1, cn = 10, 10, n
arr = np.random.randint(256,size=(a0,a1))
print(f'Writing dataset shape=({a0},{a1},{n})')
start = time.time()
with h5py.File('SO_73791464.h5','w') as h5f:
# Use chunks=True for default size or use chunks=(c0,c1,cn) for user defined size
ds = h5f.create_dataset('images',dtype=int,shape=(a0,a1,n),chunks=True)
print(f'chunks={ds.chunks}')
for i in range(n):
ds[:,:,i] = arr
print(f"Time to create file:{(time.time()-start): .3f}")
start = time.time()
with h5py.File('SO_73791464.h5','r') as h5f:
ds = h5f['images']
for i in range(ds.shape[0]):
for j in range(ds.shape[1]):
dat = ds[i,j,:]
print(f"Time to read file:{(time.time()-start): .3f}")
HDF5 chunked storage improves I/O time, but can still be very slow when reading small slices (e.g., [1,1,360000]). To further improve performance, you need to read larger slices into an array as described by #Jérôme Richard in the comments below your question. You can then quickly access a single slice from the array (because it is in memory).
This answer combines the 2 techniques: 1) HDF5 chunked storage (from my first answer), and 2) reading large slices into an array and then reading a single [i,j] slice from that array. The code to create the file is very similar to the first answer. It is setup to create a dataset of shape [3600, 1800, 100] with default chunk size ([113, 57, 7] on my system). You can increase n to test with larger datasets.
When reading the file, the large slice [0] and [1] dimensions are set equal to the associated chunk shape (so I only access each chunk once). As a result, the process to read the file is "slightly more complicated" (but worth it). There are 2 loops: the 1st loop reads a large slice into an array 'dat_chunk', and the 2nd loop reads a [1,1,:] slice from 'dat_chunk' into a second array 'dat'.
Differences in timing data for 100 images is dramatic. It takes 8 seconds to read all of the data using the method below. It required 74 min with the first answer (reading each [i,j] pair directly). Clearly that is too slow. Just for fun, I increased the dataset size to 1000 images (shape=[3600, 1800, 1000]) in my test file and reran. It takes 4:33 (m:ss) to read all slices with this method. I didn't even try with the previous method (for obvious reasons). Note: my computer is pretty old & slow with a HDD, so your timing data should be faster.
Code to create the file:
a0, a1, n = 3600, 1800, 100
print(f'Writing dataset shape=({a0},{a1},{n})')
start = time.time()
with h5py.File('SO_73791464.h5','w') as h5f:
ds = h5f.create_dataset('images',dtype=int,shape=(a0,a1,n),chunks=True)
print(f'chunks={ds.chunks}')
for i in range(n):
arr = np.random.randint(256,size=(a0,a1))
ds[:,:,i] = arr
print(f"Time to create file:{(time.time()-start): .3f}")
Code to read the file using large array slices:
start = time.time()
with h5py.File('SO_73791464.h5','r') as h5f:
ds = h5f['images']
print(f'shape={ds.shape}')
print(f'chunks={ds.chunks}')
ds_i_max, ds_j_max, ds_k_max = ds.shape
ch_i_max, ch_j_max, ch_k_max = ds.chunks
i = 0
while i < ds_i_max:
i_stop = min(i+ch_i_max,ds_i_max)
print(f'i_range: {i}:{i_stop}')
j = 0
while j < ds_j_max:
j_stop = min(j+ch_j_max,ds_j_max)
print(f' j_range: {j}:{j_stop}')
dat_chunk = ds[i:i_stop,j:j_stop,:]
# print(dat_chunk.shape)
for ic in range(dat_chunk.shape[0]):
for jc in range(dat_chunk.shape[1]):
dat = dat_chunk[ic,jc,:]
j = j_stop
i = i_stop
print(f"Time to read file:{(time.time()-start): .3f}")
I've been fooling around lately with taking the webcam's video steam and giving it a pixel-dependent time delay.
A very simple example for that idea is the famous rolling shutter, but when applied in order of seconds instead of 1/100ths, it looks like this https://youtu.be/mQ0hS7l9ckY
Now, rolling shutter is fun and all, but I want something more general. I want a delay map, a (height, width, 3) shaped array that tells my how far back to go in the video. A pseudo-code for this would be
output_image[y, x, c] = video_cache[delay_map[y,x,c], y, x, c]
where the first index of the video cache is time, y,x are self-explanatory, and c is the color channel (BGR because open cv is weird).
In essence, each pixel of the output is a pixel of the video at the same position, but at a time determined by the delay map at the very same position.
Here's the solution I have now: I flattened everything, I access the video cache similar to how you unravel multi-index nonsense, and once I'm done I reshape the result into an image.
This solution works pretty fast, and I'm pretty proud of it. It almost keeps up with my webcam's frame rate (I think I average on 20 of these per second).
I think the flattening and reshaping of each frame costs me some time, and if I could get rid of those I'd get much better results.
Link to the whole file at the bottom.
Here's a skeleton of my implementation.
I have a class called CircularCacheDelayAccess. It stores a cache of video frames (with given number of frames, called cache_size in my implementation). It enables you to store frames, and get the delay-mapped frame.
Instead of pushing all the frames around each time I store a new one, I keep an index that goes around in a circle, and video[delay=3] would be found via something like cache[index-3]. Thanks to python's funny negative index tricks, I don't even have to get the positive modulo.
The delay_map is actually a float array; when I use circ_cache.getFrame I input the integer part of delay_map.flatten(), and then I use the fractional part to interpolate between frames.
class CircularCacheDelayAccess:
def __init__(self, img_shape: tuple, cache_size: int):
self.image_shape = img_shape
self.cache_size = cache_size
# some useful stuff
self.multi_index_shape = (cache_size,) + img_shape
self.image_size = int(np.prod(img_shape))
self.size = cache_size * self.image_size
# the index, going around in circles
self.cache_index = 0
self.cache = np.empty(self.size)
# raveled_image_indices is a running index over a frame; it is the same thing as writing
# y, x, c = np.mgrid[0:height, 0:width, 0:3]
# raveled_image_indices = c + 3 * (x + width * y)
# but it's a lot easier
self.raveled_image_indices = np.arange(self.image_size)
def store(self, image: np.ndarray):
# (in my implementation I check that the shape matches and raise a ValueError if it does not)
self.cache_index = (self.cache_index + 1) % self.cache_size
# since cache holds entire image frames, the start of each frame is index * image size
cIndex = self.image_size * self.cache_index
self.cache[cIndex: cIndex + self.image_size] = image.flatten()
def getFrame(self, delay_map: np.ndarray):
# delay_map may either have shape == self.image_shape, or shape = (self.image_size,)
# (more asserts, for the shape of delay_map, and to check its values do not exceed the cache size)
# (if delay_map.shape == image_shape, I flatten it. If we were already given a flattened version,
# there's no need to do so)
frame = self.cache[self.image_size * (self.cache_index - delay_map) + self.raveled_image_indices]\
.reshape(self.image_shape)
return frame
As I've already stated, this works pretty good, but I think I could get it to work better if I could just side-step the flatten and reshape steps.
Also, keeping a flattened version of an array that makes sense in its full-shaped form is pretty awkward.
And, I've mentioned the interpolation part. It felt wrong to do that in CircularCacheDelayAccess, but doing the interpolation after I getFrame twice means I need the fractional part of delay_map to be in the full-shaped form, and I need the int part flattened, which is pretty silly.\
Here are some fun examples which would probably be pretty hard to understand without seeing the video, but are still fun to look at. It looks even better with a face, but I don't think I should show my face here, so sorry about that:
horizontal rolling shutter, color delay psychedelia, my weirdest effect so far
And here is a link to the entire code, with capture and stuff if you wanna mess around with it and read the entire code.
Thanks in advance!
I know that item assignment in Dask arrays is not possible but I wonder if there is a better way of doing the following.
The (simplified) problem is as follow: I have got 100 images (2048x2048) that need to be mosaicked together to make a large 20000x20000 mosaic. These images do overlap so as a first step I compute the offsets between pairs and then their absolute position in the large mosaic. Now it comes the problem of creating the mosaic.
Currently what I am doing is sending all the small images (as dask arrays) to a delayed function that uses normal numpy slicing to build the mosaic. Issues with this approach is that is not done in parallel, it requires quite a bit of memory per mosaic since the arrays are stored in just one process/thread and there is a communication overhead because all images in dask arrays need to be transferred to a single thread. Anyway this is the pseudocode:
#delayed
def mosaic(imgs, abs_pos, output_shape):
big_image = np.zeros(output_shape)
weight_image = np.zeros(output_shape)
for i in range(len(imgs)):
im= imgs[i]
y_pos, x_pos = abs_pos[i]
y_size, x_size = im.shape
big_image[y_pos + y_size, x_pos + x_size] += im
weight_image[y_pos + y_size, x_pos + x_size] += 1
return big_image, weight_image
here imgs is a list of dask arrays, abs_pos is a lost containing the absolute position of one of the corners of each image in the mosaic and output_shape is 20000x20000.
In reality several of these mosaics need to be computed, so the task stream plot looks like the following:
dask stream
where it is building here three mosaics, red is transfer of data to the thread and green the mosaic function.
I am wondering if there is a better solution to make this more parallel.
I copied some image data to an instance on Google Cloud (8 vCPU's, 64GB memory, Tesla K80 GPU) and am running into memory problems when converting the raw data into features, and changing the data structure of the output. Eventually I'd like to use the derived features in Keras/Tensorflow neural net.
Process
After copying the data to a storage bucket, I run a build_features.py function to convert the raw data into processed data for the neural network. In this pipeline, I first take each raw image and put it into a list x (which stores the derived features).
Since I'm working with a large number of images (tens of thousands of images that are type float32 and have dimensions 250x500x3) the list x becomes quite large. Each element of x is numpy array that stores the image in shape 250x500x3.
Problem 1 - reduced memory as list x grows
I took 2 screenshots that show available memory decreasing as x grows (below). I'm eventually able to complete this step but I'm only left with a few GB of memory so I definitely want to fix this (in the future I want to work with larger data sets). How can I build features in a way where I'm not limited by the size of x?
Problem 2 - Memory error when converting x into numpy array
The step where the instance actually fails is the following:
x = np.array(x)
The failure message is:
Traceback (most recent call last):
File "build_features.py", line 149, in <module>
build_features(pipeline='9_11_2017_fan_3_lights')
File "build_features.py", line 122, in build_features
x = np.array(x)
MemoryError
How can I adjust this step so that I don't run out of memory?
Your code has two copies of every image - one in the list, and one in the array:
images = []
for i in range(many):
images[i] = load_img(i) # here's the first image
x = np.array(images) # joint them all together into a second copy
Just load the images straight into the array
x = np.zeros((many, 250, 500, 3)
for i in range(many):
x[i] = load_img(i)
Which means that you only hold a copy of one image at a time.
If you don't know the size or dtype of the image ahead of time, or don't want to hard code it, you can use:
x0 = load_img(0)
x = np.zeros((many,) + x0.shape, x0.dtype)
for i in range(1, many):
x[i] = load_img(i)
Having said that, you're on a tricky path here. If you don't have enough room to store your dataset twice in memory, you also don't have room to compute y = x + 1.
You might want to consider using np.float16 to buy more storage, at the cost of precision