Operations on arrays in Python from a memory perspective

I am trying to understand the memory allocation in the following operations:
x_batch, images_path, ImageValidStatus = tf_resize_images(path_list, img_type=col_mode, im_size=IMAGE_SIZE)
x_batch = x_batch/255
x_batch = 1.0 - x_batch
x_batch = x_batch.reshape(x_batch.shape[0], IMAGE_SIZE[0]*IMAGE_SIZE[1]*IMAGE_SIZE[2])
What I am interested in is x_batch, which is a multi-dimensional numpy array of shape (100, 64, 64, 3),
where 100 is the number of images and 64x64x3 are the dimensions of each image.
What is the maximum number of copies of the images present in memory at any one point in time?
In other words, how exactly do the operations x_batch/255, 1-x_batch, and x_batch.reshape behave from a memory perspective?
My main concern is that in some cases I will be processing 500K images at the same time; if I make multiple copies of these images in memory, it will be very difficult to fit everything in memory.

I see "tf" in your code, so I am unsure if you are asking about tensors or arrays. Lets assume you are asking about arrays. In general, arrays are written to memory once and then manipulated. For example,
import numpy as np
data = np.empty((1000,30,30,5)) # Allocates 1000*30*30*5*dtype_size bytes (plus a small overhead).
data.reshape((1000,30,150)) # Returns a view: only the metadata describing how numpy walks the buffer changes; no data is copied.
data += 1 # Adds one to every entry in place, with no extra allocation.
data = 1 - data # Allocates a new array for the result and rebinds the name; the old buffer is freed afterwards, so memory use briefly doubles.
x = data + 1 # Allocates and fills a whole new array, so two full copies now exist.
As long as you don't change the array size (which forces a re-allocation), numpy operates very quickly and efficiently on the data. Not as fast as tensorflow, but very, very fast. In-place additions and other element-wise operations are all done without using more memory. Things like appending to the array, however, can cause problems and make numpy rewrite the whole array in memory.
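To tie this back to the snippet in the question, here is a minimal sketch of how the same pipeline could be written with in-place operations so that, apart from the initial load, no extra image-sized copies are created. The random array below is only a stand-in for whatever tf_resize_images actually returns:
import numpy as np
x_batch = np.random.rand(100, 64, 64, 3) # Stand-in for the loaded batch of 100 images.
x_batch /= 255.0 # In-place divide: no second copy of the batch.
np.subtract(1.0, x_batch, out=x_batch) # 1 - x_batch, written back into the same buffer.
x_batch = x_batch.reshape(x_batch.shape[0], -1) # reshape returns a view; still no copy.
With this variant the peak memory is essentially one copy of the batch, whereas x_batch = x_batch/255 followed by x_batch = 1.0-x_batch briefly holds two copies at each step (the new result plus the old array, until the old one is freed).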

Related

This 5-d numpy array takes up hardly any RAM if shuffled

I have some images stored in chunks of 80 in a numpy array.
trainImages.shape
# (715, 80, 96, 96, 3)
For example, trainImages holds 715 chunks of 80 images each, with each image of size (96, 96, 3).
The array is dtype=float32, so it takes up quite a lot of space in RAM, approximately 6GB.
I shuffle the chunks with this code
shuffler = np.random.permutation(trainImages.shape[0])
trainImages = trainImages[shuffler]
I notice that the RAM usage drops to almost 0. The shape is still the same and I can display the images. All in all the array looks fine, but it hardly takes up any RAM after the shuffle. How can that be?
I'm using Google Colab Pro with 25GB of RAM and I monitor the RAM usage from the indicator at the top.
You can easily reproduce this behavior by pasting this code in a Colab notebook
import numpy as np
a = np.random.rand(715, 80, 96, 96, 3).astype(np.float32)
shuffler = np.random.permutation(a.shape[0])
a = a[shuffler]
I've also tried shuffling the same array reshaped to (57200, 96, 96, 1), so that I was shuffling every individual image. In that case I didn't notice any change in RAM usage, as expected.
It looks like you are running out of memory. Generally, when slicing in numpy the result is a view, which does not take up extra memory. However, when indexing with a boolean mask or a random array of integers there is no regularity numpy can exploit, so it will not return a view but rather a copy. In the line:
a = a[shuffler]
Python will first allocate a new 6GB array, then copy the data into it according to the indices in shuffler, and finally reassign the new array to the old name, releasing the memory of the old one. However, it has to have 12GB allocated at one point! I suspect that Colab, having only ~13GB, kills the Python kernel that tries to allocate more memory than it is allowed. As a result, you see the RAM usage dropping to 0.
I am not quite sure why the reshaped array worked; perhaps when you tested it you had more memory available, so you just barely managed to fit into the 13GB.
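To see the view-versus-copy distinction directly, here is a small sketch with a deliberately tiny array that checks whether the result shares memory with the original:
import numpy as np
a = np.random.rand(8, 4)
perm = np.random.permutation(a.shape[0])
view = a[2:6] # Basic slicing returns a view: no data is copied.
copy = a[perm] # Fancy indexing with an index array returns a copy.
print(np.shares_memory(a, view)) # True
print(np.shares_memory(a, copy)) # False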
However what you can do to shuffle the samples more memory efficiently is to use
np.random.shuffle(a)
This method shuffles the first axis of your data in place, which should prevent the memory overflow.
If you need to shuffle the first axis of two different arrays in a consistent order (for example the input features x and the output labels y), you can set the same seed before each shuffle, ensuring that the two shuffles are equivalent:
np.random.seed(42)
np.random.shuffle(x)
np.random.seed(42)
np.random.shuffle(y)

Python - Store individual numpy matrices to minimize memory on disk and loading time. Binary files?

I wrote some code to generate a large dataset of complex numpy matrices for ML applications, which I would like to somehow store on disk.
The most suitable idea seems to be saving the matrices into separate binary files. However, commands such as bytearray() seem to flatten the matrices into 1D arrays, thus losing the information about the matrix shape.
I guess I might need to fill each line independently, maybe using an additional for loop, but this would also require a for loop when loading and re-assembling the matrix.
What would be the correct procedure for storing those matrices in a way that minimizes the amount of space on disk and loading time?
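For context, numpy's own binary format stores the dtype and shape together with the raw data, so nothing has to be re-assembled by hand on load. A minimal sketch with made-up sizes and file names:
import numpy as np
matrices = [np.random.rand(64, 64) + 1j*np.random.rand(64, 64) for _ in range(10)]
np.savez_compressed('dataset.npz', *matrices) # One compressed archive; shapes and dtypes are preserved.
loaded = np.load('dataset.npz')
first = loaded['arr_0'] # Shape (64, 64), dtype complex128, no manual re-assembly needed.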

A quick way to randomly read many rows of a large numpy matrix with h5py

I'd like to read 2048 randomly chosen rows of a stored numpy matrix with 200 columns within 100ms. So far I've tried h5py. In my case, contiguous mode works faster than chunked mode, and for various other reasons I'm sticking with the former. Writing (in a certain, more orderly way) is very fast (~3ms); unfortunately, reading 2048 randomly chosen rows takes about 250ms. The reading code I'm trying is as follows:
a = f['/test']
x = []
for i in range(2048):
    r = random.randint(1, 2048)
    x.append(a[[r], ...])
x = np.concatenate(x, 0)
Clearly, the speed bottleneck comes from accessing a 2048 times, because I don't know whether there is a one-shot way of accessing random rows; np.concatenate consumes a negligible amount of time. Since the matrix eventually reaches a size of (2048*100k, 200), I probably can't use a method other than contiguous h5py. I've tried a smaller maximal matrix size, but it didn't affect the read time at all. For reference, the following is the entire task I'm trying to achieve as part of a deep reinforcement learning algorithm:
1. Generate a numpy array of size (2048, 200).
2. Write it onto the next available 2048 rows of an extendable dataset of shape (None, 200).
3. Randomly pick 2048 rows from the filled rows of the extendable dataset (irrespective of the chunk generated in step 1).
4. Read the picked rows.
5. Repeat steps 1-4 100k times (so the total dataset size becomes (2048*100k, 200)).
If rows can be selected more than once, I would try with:
random.choices(a, k=2048)
Otherwise, using:
random.sample(a, 2048)
Both methods will return a list of numpy arrays if a is a numpy ndarray.
Furthermore, if a is already a numpy array, why not take advantage of numpy's indexing capabilities and shorten your code to:
x.append(a[np.random.randint(1, 2048, 2048)])
That way a is still accessed multiple times, but it is all done in optimized C code, which should be faster.
Hope this points you in the right direction.
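A minimal sketch of the one-shot fancy-indexing version, assuming the data is already in a numpy array (h5py's own fancy indexing is more restrictive, e.g. indices generally have to be increasing, which is not handled here):
import numpy as np
a = np.random.rand(2048, 200) # Stand-in for the rows once they are in memory.
idx = np.random.randint(0, a.shape[0], size=2048) # 2048 row indices, duplicates allowed.
rows = a[idx] # One fancy-indexing call instead of a 2048-iteration Python loop.
print(rows.shape) # (2048, 200)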

How to efficiently work with large complex numpy arrays?

For my research I am working with large numpy arrays consisting of complex data.
arr = np.empty((15000, 25400), dtype='complex128')
np.save('array.npy', arr)
When stored, they are about 3 GB each. Loading these arrays is a time-consuming process, which made me wonder whether there are ways to speed it up.
One of the things I was thinking of was splitting the array into its complex and real part:
arr_real = arr.real
arr_im = arr.imag
and saving each part separately. However, this didn't seem to improve processing speed significantly. There is some documentation about working with large arrays, but I haven't found much information on working with complex data. Are there smart(er) ways to work with large complex arrays?
If you only need parts of the array in memory, you can load it using memory mapping:
arr = np.load('array.npy', mmap_mode='r')
From the docs:
A memory-mapped array is kept on disk. However, it can be accessed and
sliced like any ndarray. Memory mapping is especially useful for
accessing small fragments of large files without reading the entire
file into memory.
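A short sketch of what that looks like in practice; only the portion that is actually sliced gets read from disk (file name as in the question):
import numpy as np
arr = np.load('array.npy', mmap_mode='r') # Only the header is read at this point.
block = np.array(arr[:100]) # Materializes just the first 100 rows in memory.
print(block.shape, block.dtype)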

Techniques for working with large Numpy arrays? [duplicate]

This question already has answers here:
Very large matrices using Python and NumPy
(11 answers)
Closed 2 years ago.
There are times when you have to perform many intermediate operations on one or more large Numpy arrays. This can quickly result in MemoryErrors. In my research so far, I have found that pickling (Pickle, cPickle, PyTables, etc.) and gc.collect() are ways to mitigate this. I was wondering whether there are any other techniques experienced programmers use when dealing with large quantities of data (other than removing redundancies in your strategy/code, of course).
Also, if there's one thing I'm sure of, it's that nothing is free. With some of these techniques, what are the trade-offs (i.e., speed, robustness, etc.)?
I feel your pain... You sometimes end up storing several times the size of your array in values you will later discard. When processing one item of your array at a time this is irrelevant, but it can kill you when vectorizing.
I'll use an example from work for illustration purposes. I recently coded the algorithm described here using numpy. It is a color map algorithm, which takes an RGB image, and converts it into a CMYK image. The process, which is repeated for every pixel, is as follows:
Use the most significant 4 bits of every RGB value, as indices into a three-dimensional look up table. This determines the CMYK values for the 8 vertices of a cube within the LUT.
Use the least significant 4 bits of every RGB value to interpolate within that cube, based on the vertex values from the previous step. The most efficient way of doing this requires computing 16 arrays of uint8s the size of the image being processed. For a 24-bit RGB image, that is equivalent to needing six times the storage of the image in order to process it.
A couple of things you can do to handle this:
1. Divide and conquer
Maybe you cannot process a 1,000x1,000 array in a single pass. But if you can do it with a Python for loop iterating over 10 arrays of 100x1,000, it is still going to beat a Python iterator over 1,000,000 items by a very wide margin! It's going to be slower than a single vectorized pass, yes, but not by nearly as much. (A short sketch of this follows after this list.)
2. Cache expensive computations
This relates directly to my interpolation example above, and is harder to come across, although worth keeping an eye open for it. Because I am interpolating on a three-dimensional cube with 4 bits in each dimension, there are only 16x16x16 possible outcomes, which can be stored in 16 arrays of 16x16x16 bytes. So I can precompute them and store them using 64KB of memory, and look-up the values one by one for the whole image, rather than redoing the same operations for every pixel at huge memory cost. This already pays-off for images as small as 64x64 pixels, and basically allows processing images with x6 times the amount of pixels without having to subdivide the array.
3. Use your dtypes wisely
If your intermediate values can fit in a single uint8, don't use an array of int32s! This can turn into a nightmare of mysterious errors due to silent overflows, but if you are careful, it can provide a big saving of resources.
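Going back to point 1 (divide and conquer), here is a hedged sketch of processing a large array in row blocks; the element-wise transform and the block size are made up for illustration:
import numpy as np

def process_in_blocks(a, block_rows=100):
    # Work block by block and write results back in place, so only one
    # block-sized temporary exists at any point in time.
    for start in range(0, a.shape[0], block_rows):
        block = a[start:start + block_rows]
        a[start:start + block_rows] = np.sqrt(block) + 1.0 # Placeholder for the real per-element work.
    return a

data = np.random.rand(1000, 1000)
process_in_blocks(data)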
First and most important trick: allocate a few big arrays and use and recycle portions of them, instead of bringing lots of temporary arrays into life and then discarding/garbage collecting them (a sketch of this follows after this answer). It sounds a little old-fashioned, but with careful programming the speed-up can be impressive: you have better control of alignment and data locality, so numeric code can be made more efficient.
Second: use numpy.memmap and hope that the OS caching of disk accesses is efficient enough.
Third: as pointed out by @Jaime, work on block sub-matrices if the whole matrix is too big.
EDIT:
Avoid unnecessary list comprehensions, as pointed out in this answer on SE.
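As an illustration of the first trick (preallocating and recycling buffers), here is a minimal sketch with made-up shapes and a placeholder computation, using the out= argument that most numpy ufuncs accept:
import numpy as np

scratch = np.empty((1000, 1000)) # Allocated once, reused for every batch.

def step(batch, out):
    np.multiply(batch, 2.0, out=out) # Result is written into the recycled buffer, no new temporary.
    out += 1.0
    return out

for _ in range(10):
    batch = np.random.rand(1000, 1000)
    result = step(batch, scratch)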
The dask.array library provides a numpy interface that uses blocked algorithms to handle larger-than-memory arrays with multiple cores.
You could also look into Spartan, Distarray, and Biggus.
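A minimal sketch of the dask.array interface, assuming dask is installed; the shapes and chunk size are arbitrary:
import dask.array as da

x = da.random.random((50000, 50000), chunks=(1000, 1000)) # ~20 GB logically, but never materialized in full.
col_means = x.mean(axis=0) # Lazily builds a graph of blocked operations.
print(col_means.compute()[:5]) # Execution happens here, chunk by chunk, across cores.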
If it is possible for you, use numexpr. For numeric calculations like a**2 + b**2 + 2*a*b (with a and b being arrays) it
will compile machine code that executes fast and with minimal memory overhead, taking care of memory locality (and thus cache optimization) if the same array occurs several times in your expression,
uses all cores of your dual or quad core CPU,
is an extension to numpy, not an alternative.
For medium and large sized arrays, it is faster than numpy alone.
Take a look at the web page given above; there are examples that will help you understand whether numexpr is for you.
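A short sketch, assuming the numexpr package is installed:
import numpy as np
import numexpr as ne

a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)

plain = a**2 + b**2 + 2*a*b # Plain numpy builds several full-size temporaries along the way.
fast = ne.evaluate("a**2 + b**2 + 2*a*b") # Evaluated in blocks, on all cores, with minimal temporaries.
print(np.allclose(plain, fast)) # True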
On top of everything said in the other answers: numpy's ufunc aggregations reduce an array without building intermediate arrays at all, and if we do want to store all the intermediate results of the computation, we can use accumulate instead:
Aggregates
For binary ufuncs, there are some interesting aggregates that can be computed directly from the object. For example, if we'd like to reduce an array with a particular operation, we can use the reduce method of any ufunc. A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.
For example, calling reduce on the add ufunc returns the sum of all elements in the array:
x = np.arange(1, 6)
np.add.reduce(x) # Outputs 15
Similarly, calling reduce on the multiply ufunc results in the product of all array elements:
np.multiply.reduce(x) # Outputs 120
Accumulate
If we'd like to store all the intermediate results of the computation, we can instead use accumulate:
np.add.accumulate(x) # Outputs array([ 1, 3, 6, 10, 15], dtype=int32)
np.multiply.accumulate(x) # Outputs array([ 1, 2, 6, 24, 120], dtype=int32)
Using these numpy operations wisely while performing many intermediate operations on one or more large Numpy arrays can give you great results without the use of any additional libraries.
