Not sure why memory usage seen in `top` is unintuitive


I am working with a simple structured NumPy array (a custom numpy.dtype) and I am using numpy.savez and numpy.load to save the array to disk and read it back. During both storing and loading, the memory usage shown by `top` is not what I would expect. The sample code below demonstrates this.
import sys
import numpy as np
import time
RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])
a = np.zeros(1000000, RouteEntryNP)
time.sleep(10)
print(sys.getsizeof(a))
with open('test.np.npz', 'wb+') as f:
    np.savez(f, a=a)
while True:
    time.sleep(10)
The program starts with a memory usage of about 25M, which is reasonably close to intuition: each RouteEntryNP record takes 14 bytes, so a million records need about 14M for the array data. But while the data is being written to the file, the memory usage shoots up to approximately 250M.
A similar behavior is observed when loading the file: the memory usage shoots up to approximately 160M, and an explicit gc.collect() doesn't seem to help either. This is how I read the file:
import numpy as np
np.load('test.np.npz')
import gc
gc.collect()
The memory usage stays at around 160M. I am not sure why this is happening. Is there a way to 'reclaim' this memory?
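As a rough check of the 14-byte record size mentioned in the question, the record size and the size of the array's data buffer can be read from NumPy directly. This is a minimal sketch that assumes a 64-bit build (8-byte object pointers) and uses a much smaller array than the question does. It only measures the array itself, not what the whole process holds as seen in `top`.

```python
import sys
import numpy as np

RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])
a = np.zeros(1000, RouteEntryNP)   # smaller than the question's 1,000,000

itemsize = RouteEntryNP.itemsize   # bytes per record (packed, no padding)
data_bytes = a.nbytes              # itemsize * number of records
total_bytes = sys.getsizeof(a)     # data buffer plus the ndarray header

print(itemsize, data_bytes, total_bytes)
```

Scaling data_bytes to a million records gives the size of the raw array data, which can then be compared with the figure reported by `top`.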

Related

Is there a way to examine how much memory an image is occupying with python?

Pillow provides size to examine the resolution of an image.
>>> from PIL import Image
>>> img = Image.open('Lenna.png')
>>> img.size
(512, 512)
Is there a way to examine how much memory the image is occupying? Is the image using 512*512*4 bytes of memory?

You could use the sys library to get the size of an object in bytes. The difference with Kai's answer is that he's calculating the size of the image on the disk, while this calculates the size of the loaded python object (with all its metadata):
import sys
sys.getsizeof(img)
EDIT: After seeing this website, sys.getsizeof() seems to work mainly for primitive types.
You could have a look at a more thorough implementation (deep_getsizeof()) here.
This post also gives a lot of details.
And finally, there is also the pympler library, which provides tools to calculate the memory used by an object.
from pympler import asizeof
asizeof.asizeof(img)
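To illustrate why sys.getsizeof() alone can under-report, here is a minimal sketch of a recursive size function. It is not the linked deep_getsizeof() implementation. The helper below is a hypothetical stand-in that only follows dicts, lists, tuples and sets, so it would still miss memory held by C extensions such as image buffers.

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Hypothetical helper: sum sys.getsizeof over an object and its contents."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

rows = [[float(i)] * 10 for i in range(100)]
shallow = sys.getsizeof(rows)    # the outer list only
deep = deep_getsizeof(rows)      # outer list, inner lists and floats
print(shallow, deep)
```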

import os
print(os.stat('somefile.ext').st_size)
or
import os
os.path.getsize('path_to_file.jpg')

What's an intelligent way of loading a compressed array completely from disk into memory - also (identically) compressed?

I am experimenting with a 3-dimensional zarr-array, stored on disk:
Name: /data
Type: zarr.core.Array
Data type: int16
Shape: (102174, 1100, 900)
Chunk shape: (12, 220, 180)
Order: C
Read-only: True
Compressor: Blosc(cname='zstd', clevel=3, shuffle=BITSHUFFLE, blocksize=0)
Store type: zarr.storage.DirectoryStore
No. bytes: 202304520000 (188.4G)
No. bytes stored: 12224487305 (11.4G)
Storage ratio: 16.5
Chunks initialized: 212875/212875
As I understand it, zarr-arrays can also reside in memory - compressed, as if they were on disk. So I thought why not try to load the entire thing into RAM on a machine with 32 GByte memory. Compressed, the dataset would require approximately 50% of RAM. Uncompressed, it would require about 6 times more RAM than available.
Preparation:
import os
import zarr
from numcodecs import Blosc
import tqdm
zpath = '...' # path to zarr data folder
disk_array = zarr.open(zpath, mode = 'r')['data']
c = Blosc(cname = 'zstd', clevel=3, shuffle = Blosc.BITSHUFFLE)
memory_array = zarr.zeros(
    disk_array.shape, chunks = disk_array.chunks,
    dtype = disk_array.dtype, compressor = c
)
The following experiment fails almost immediately with an out of memory error:
memory_array[:, :, :] = disk_array[:, :, :]
As I understand it, disk_array[:, :, :] will try to create an uncompressed, full-size numpy array, which will obviously fail.
Second attempt, which works but is agonizingly slow:
chunk_lines = disk_array.chunks[0]
chunk_number = disk_array.shape[0] // disk_array.chunks[0]
chunk_remain = disk_array.shape[0] % disk_array.chunks[0] # unhandled ...
for chunk in tqdm.trange(chunk_number):
    chunk_slice = slice(chunk * chunk_lines, (chunk + 1) * chunk_lines)
    memory_array[chunk_slice, :, :] = disk_array[chunk_slice, :, :]
Here, I am trying to read a certain number of chunks at a time and put them into my in-memory array. It works, but it is about 6 to 7 times slower than what it took to write this thing to disk in the first place. EDIT: Yes, it's still slow, but the factor of 6 to 7 was due to a disk issue.
What's an intelligent and fast way of achieving this? I'd guess, besides not using the right approach, my chunks might also be too small - but I am not sure.
EDIT: Shape, chunk size and compression are supposed to be identical for the on-disk array and the in-memory array. It should therefore be possible to eliminate the decompress-compress procedure in my example above.
I found zarr.convenience.copy but it is marked as an experimental feature, subject to further change.
Related issue on GitHub

You could conceivably try fsspec.implementations.memory.MemoryFileSystem, which has a .get_mapper() method that gives you the kind of object expected by zarr.
However, this is really just a dict of path:io.BytesIO, which you could make yourself, if you want.

There are a couple of ways one might solve this issue today.
Use LRUStoreCache to cache (some) compressed data in memory.
Coerce your underlying store into a dict and use that as your store.
The first option might be appropriate if you only want some frequently used data in memory. How much you load into memory is something you can configure, so this could be the whole array. Data is only loaded on demand, which may be useful for you.
The second option just creates a new in-memory copy of the array by pulling all of the compressed data from disk. The one downside is if you intend to write back to disk this will be something you need to do manually, but it is not too difficult. The update method is pretty handy for facilitating this copying of data between different stores.
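The second option relies on a store behaving like a mapping from chunk keys to compressed bytes, so copying between stores moves the bytes without a decompress/compress round trip. The toy sketch below illustrates only that idea and does not use zarr at all: the dicts and key names are stand-ins chosen for the example, and zlib takes the place of Blosc. With a real array, the assumption is that you would copy from the on-disk store into a dict in the same way and then open the array on top of that dict.

```python
import zlib

# Stand-ins (assumptions for illustration): plain dicts play the role of the
# on-disk and in-memory stores, and zlib plays the role of the compressor.
disk_store = {
    f"data/{i}.0.0": zlib.compress(bytes(1000) + bytes([i]))
    for i in range(4)
}

memory_store = {}
memory_store.update(disk_store)   # copies the compressed bytes as they are

# Nothing was decompressed during the copy; decompress one chunk on demand.
first_chunk = zlib.decompress(memory_store["data/0.0.0"])
print(len(memory_store), len(first_chunk))
```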

numpy memmap memory usage - want to iterate once

Let's say I have some big matrix saved on disk. Storing it all in memory is not really feasible, so I use memmap to access it:
A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000,162))
Now let's say I want to iterate over this matrix (not necessarily in order) such that each row will be accessed exactly once.
p = some_permutation_of_0_to_2999999()
I would like to do something like that:
start = 0
end = 3000000
num_rows_to_load_at_once = some_size_that_will_fit_in_memory()
while start < end:
    indices_to_access = p[start:start+num_rows_to_load_at_once]
    do_stuff_with(A[indices_to_access, :])
    start = min(end, start+num_rows_to_load_at_once)
As this process goes on, my computer becomes slower and slower, and my RAM and virtual memory usage explodes.
Is there some way to force np.memmap to use up to a certain amount of memory? (I know I won't need more than the amount of rows I'm planning to read at a time and that caching won't really help me since I'm accessing each row exactly once)
Maybe instead is there some other way to iterate (generator like) over a np array in a custom order? I could write it manually using file.seek but it happens to be much slower than np.memmap implementation
do_stuff_with() does not keep any reference to the array it receives so no "memory leaks" in that aspect
thanks

This has been an issue that I've been trying to deal with for a while. I work with large image datasets and numpy.memmap offers a convenient solution for working with these large sets.
However, as you've pointed out, if I need to access each frame (or row in your case) to perform some operation, RAM usage will max out eventually.
Fortunately, I recently found a solution that will allow you to iterate through the entire memmap array while capping the RAM usage.
Solution:
import numpy as np
# create a memmap array
input = np.memmap('input', dtype='uint16', shape=(10000,800,800), mode='w+')
# create a memmap array to store the output
output = np.memmap('output', dtype='uint16', shape=(10000,800,800), mode='w+')
def iterate_efficiently(input, output, chunk_size):
    # create an empty array to hold each chunk
    # the size of this array will determine the amount of RAM usage
    holder = np.zeros((chunk_size,) + input.shape[1:], dtype=input.dtype)
    # iterate through the input one chunk at a time, add 5, and write to output
    for start in range(0, input.shape[0], chunk_size):
        stop = min(start + chunk_size, input.shape[0])
        n = stop - start  # the last chunk may be shorter
        holder[:n] = input[start:stop]  # read in chunk from input
        holder[:n] += 5  # perform some operation
        output[start:stop] = holder[:n]  # write chunk to output
def iterate_inefficiently(input, output):
    output[:] = input[:] + 5
Timing Results:
In [11]: %timeit iterate_efficiently(input,output,1000)
1 loop, best of 3: 1min 48s per loop
In [12]: %timeit iterate_inefficiently(input,output)
1 loop, best of 3: 2min 22s per loop
The size of the array on disk is ~12GB. Using the iterate_efficiently function keeps the memory usage to 1.28GB whereas the iterate_inefficiently function eventually reaches 12GB in RAM.
This was tested on Mac OS.

I've been experimenting with this problem for a couple of days now, and it appears there are two ways to control memory consumption using np.memmap. The first is reliable, while the second would require some testing and will be OS dependent.
Option 1 - reconstruct the memory map with each read / write:
def MoveMMapNPArray(data, output_filename):
    CHUNK_SIZE = 4096
    for idx in range(0, data.shape[1], CHUNK_SIZE):
        # re-open both memmaps for every chunk (see the explanation below)
        x = np.memmap(data.filename, dtype=data.dtype, mode='r', shape=data.shape, order='F')
        y = np.memmap(output_filename, dtype=data.dtype, mode='r+', shape=data.shape, order='F')
        end = min(idx+CHUNK_SIZE, data.shape[1])
        y[:,idx:end] = x[:,idx:end]
Where data is of type np.memmap. This discarding of the memmap object with each read keeps the array from being collected into memory and will keep memory consumption very low if the chunk size is low. It likely introduces some CPU overhead but was found to be small on my setup (MacOS).
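The same idea can be applied to the permuted row access from the question. The sketch below is only an illustration under assumptions: iter_row_batches is a hypothetical helper, the file and its sizes are made up, and sorting the indices inside each batch is an extra choice that may help keep reads roughly sequential. Whether it caps memory as well as Option 1 above has not been measured here.

```python
import os
import tempfile
import numpy as np

def iter_row_batches(filename, dtype, shape, row_order, batch_size):
    """Yield (indices, rows) batches, re-opening the memmap for every batch."""
    for start in range(0, len(row_order), batch_size):
        idx = np.sort(row_order[start:start + batch_size])
        mm = np.memmap(filename, dtype=dtype, mode='r', shape=shape)
        rows = np.array(mm[idx, :])   # copy the batch into ordinary memory
        del mm                        # drop the mapping before the next batch
        yield idx, rows

# Small demo file (the sizes here are made up for the example).
shape = (1000, 16)
path = os.path.join(tempfile.mkdtemp(), 'rows.dat')
source = np.random.rand(*shape).astype('float32')
source.tofile(path)

rng = np.random.default_rng(0)
row_order = rng.permutation(shape[0])
visited = np.zeros(shape[0], dtype=int)
total = 0.0
for idx, rows in iter_row_batches(path, 'float32', shape, row_order, 128):
    visited[idx] += 1                         # each row should be seen once
    total += float(rows.sum(dtype='float64'))
```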
Option 2 - construct the mmap buffer yourself and provide memory advice
If you look at the np.memmap source code here, you can see that it is relatively simple to create your own memory-mapped numpy array. Specifically, with the snippet:
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
mmap_np_array = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=array_offset, order=order)
Note this python mmap instance is stored as the np.memmap's private _mmap attribute.
With access to the python mmap object, and python 3.8, you can use its madvise method, described here.
This allows you to advise the OS to free memory where available. The various madvise constants are described here for linux, with some generic cross platform options specified.
The MADV_DONTDUMP constant looks promising but I haven't tested memory consumption with it like I have for option 1.
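Option 2 can be sketched with the standard mmap module directly. The example below is a minimal illustration under several assumptions: Python 3.8 or newer, a platform that exposes mmap.MADV_DONTNEED (the call is skipped otherwise), and a small made-up file. It uses MADV_DONTNEED rather than MADV_DONTDUMP because, on Linux, MADV_DONTDUMP only excludes pages from core dumps. How much memory the advice actually frees is OS dependent and was not measured here.

```python
import mmap
import os
import tempfile
import numpy as np

# Small demo file (made up for the example).
path = os.path.join(tempfile.mkdtemp(), 'data.bin')
np.arange(1_000_000, dtype='float32').tofile(path)

with open(path, 'rb') as f:
    # Map the whole file read-only; the mapping outlives the file object.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

arr = np.frombuffer(mm, dtype='float32')   # read-only view on the mapping

can_advise = hasattr(mm, 'madvise') and hasattr(mmap, 'MADV_DONTNEED')
chunk = 100_000
total = 0.0
for start in range(0, arr.shape[0], chunk):
    total += float(arr[start:start + chunk].sum(dtype='float64'))
    if can_advise:
        # Tell the OS that the pages touched so far are no longer needed.
        mm.madvise(mmap.MADV_DONTNEED)
```

Because the mapping is read-only and file-backed, any pages that are dropped are simply read from the file again if they are touched later.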

Load .npy file with np.load progress bar

I have a really large .npy file (previously saved with np.save) and I am loading it with:
np.load(open('file.npy', 'rb'))
Is there any way to see the progress of the loading process? I know tqdm and some other libraries for monitoring progress, but I don't know how to use them for this problem.
Thank you!

As far as I am aware, np.load does not provide any callbacks or hooks to monitor progress. However, there is a workaround which may work: np.load can open the file as a memory-mapped file, which means the data stays on disk and is loaded into memory only on demand. We can abuse this machinery to manually copy the data from the memory-mapped file into actual memory using a loop whose progress can be monitored.
Here is an example with a crude progress monitor:
import numpy as np
x = np.random.randn(8096, 4096)
np.save('file.npy', x)
blocksize = 1024 # tune this for performance/granularity
try:
    mmap = np.load('file.npy', mmap_mode='r')
    y = np.empty_like(mmap)
    n_blocks = int(np.ceil(mmap.shape[0] / blocksize))
    for b in range(n_blocks):
        print('progress: {}/{}'.format(b, n_blocks)) # use any progress indicator
        y[b*blocksize : (b+1) * blocksize] = mmap[b*blocksize : (b+1) * blocksize]
finally:
    del mmap # make sure file is closed again
assert np.all(y == x)
Plugging any progress-bar library into the loop should be straightforward.
I was unable to test this with exceptionally large arrays due to memory constraints, so I can't really tell if this approach has any performance issues.

What are the different use cases of joblib versus pickle?

Background: I'm just getting started with scikit-learn, and read at the bottom of the page about joblib, versus pickle.
it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string
I read this Q&A on pickle, "Common use-cases for pickle in Python", and wonder if the community here can share the differences between joblib and pickle. When should one use one over the other?

joblib is usually significantly faster on large numpy arrays because it has special handling for the array buffers of the numpy data structure. To find out about the implementation details, you can have a look at the source code. It can also compress that data on the fly while pickling, using zlib or lz4.
joblib also makes it possible to memory map the data buffer of an uncompressed joblib-pickled numpy array when loading it which makes it possible to share memory between processes.
If you don't pickle large numpy arrays, then regular pickle can be significantly faster, especially on large collections of small python objects (e.g. a large dict of str objects), because the pickle module of the standard library is implemented in C while joblib is pure python.
Since PEP 574 (Pickle protocol 5) was merged in Python 3.8, it is now much more efficient (memory-wise and cpu-wise) to pickle large numpy arrays using the standard library. Large arrays in this context means 4GB or more.
But joblib can still be useful with Python 3.8 to load objects that have nested numpy arrays in memory mapped mode with mmap_mode="r".
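As a small illustration of the protocol 5 point, the sketch below pickles a numpy array with out-of-band buffers, so the array data is handed over as a separate buffer instead of being copied into the pickle stream. It assumes Python 3.8 or newer and a numpy version that supports protocol 5. The array here is tiny compared with the sizes discussed above, so it shows the mechanism rather than the performance benefit.

```python
import pickle
import numpy as np

arr = np.arange(1_000_000, dtype='float64')

# Collect the large data buffers separately from the pickle stream.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The payload only holds metadata; the array data lives in `buffers`.
restored = pickle.loads(payload, buffers=buffers)
print(len(payload), len(buffers), restored.shape)
```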

Thanks to Gunjan for giving us this script! I modified it for Python 3. The script and its results:
# compare pickle loaders
from time import time
import pickle
import os
import _pickle as cPickle
import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn releases
file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'database.clf')
t1 = time()
lis = []
d = pickle.load(open(file,"rb"))
print("time for loading file size with pickle", os.path.getsize(file),"KB =>", time()-t1)
t1 = time()
cPickle.load(open(file,"rb"))
print("time for loading file size with cpickle", os.path.getsize(file),"KB =>", time()-t1)
t1 = time()
joblib.load(file)
print("time for loading file size joblib", os.path.getsize(file),"KB =>", time()-t1)
time for loading file size with pickle 79708 KB => 0.16768312454223633
time for loading file size with cpickle 79708 KB => 0.0002372264862060547
time for loading file size joblib 79708 KB => 0.0006849765777587891

I came across the same question, so I tried this (with Python 2.7), as I need to load a large pickle file:
# compare pickle loaders
from time import time
import pickle
import os
try:
    import cPickle
except ImportError:
    print "Cannot import cPickle"
import joblib
t1 = time()
lis = []
d = pickle.load(open("classi.pickle","r"))
print "time for loading file size with pickle", os.path.getsize("classi.pickle"),"KB =>", time()-t1
t1 = time()
cPickle.load(open("classi.pickle","r"))
print "time for loading file size with cpickle", os.path.getsize("classi.pickle"),"KB =>", time()-t1
t1 = time()
joblib.load("classi.pickle")
print "time for loading file size joblib", os.path.getsize("classi.pickle"),"KB =>", time()-t1
Output for this is
time for loading file size with pickle 1154320653 KB => 6.75876188278
time for loading file size with cpickle 1154320653 KB => 52.6876490116
time for loading file size joblib 1154320653 KB => 6.27503800392
According to these results, joblib performs best of the three modules, ahead of pickle and cPickle. Thanks

Just a humble note ...
Pickle is better for fitted scikit-learn estimators/ trained models. In ML applications trained models are saved and loaded back up for prediction mainly.
