I got the following code:
import sys
from sentence_transformers import InputExample
from lib import DataLoader as DL

def load_train_data():
    train_sentences = DL.load_entire_corpus(data_corpus_path)  # Loading from 9GB files
    # 16GB of memory is now allocated by this process
    train_data = []
    for i in range(len(train_sentences)):
        s = train_sentences.pop()  # Use pop to release item for garbage collector
        train_data.append(InputExample(texts=[s, s]))  # Problem is around here I guess
    return train_data

train_data = load_train_data()
The files loaded in DL.load_entire_corpus contain lists of sentences.
The code crashes because more than 32GB of RAM gets allocated during the process. Up to the for-loop, about 16GB is allocated; during the for-loop it rises to 32GB, which leads to a crash or a hanging system.
print(sys.getsizeof(train_sentences) + sys.getsizeof(train_data)) inside the for-loop never reports more than 10GB. There is no other process that could be allocating the RAM.
What am I missing?
getsizeof() generally goes only one level deep. For a list, it returns a measure of the bytes consumed by the list structure itself, but not bytes consumed by the objects the list contains. For example,
>>> import sys
>>> x = list(range(2000))
>>> sys.getsizeof(x)
16056
>>> for i in range(len(x)):
...     x[i] = None
...
>>> sys.getsizeof(x)
16056
See? The result doesn't change regardless of what the list contains. The only thing that matters to getsizeof() is len(the_list).
So the "missing" RAM is almost certainly being consumed by the InputExample(texts=[s, s]) objects you're appending to train_data.
Related
I have a class that has a cache implemented as a dict for numpy arrays, which can occupy GBs of data.
class WorkOperations(object):
    def __init__(self):
        self.data_cache: Dict[str, Dict[str, Tuple[np.ndarray, np.ndarray]]] = {}

    def get_data(self, key):
        if key not in self.data_cache:
            self.add_data(key)
        return self.data_cache[key]

    def add_data(self, key):
        result = run_heavy_calculation(key)
        self.data_cache[key] = result
I am testing the code with this function -
import gc

def perform_operations():
    work_operations = WorkOperations()
    # input_keys gives a list of keys to process
    for key in input_keys():
        data = work_operations.get_data(key)
        do_some_operation(data)
    del work_operations

perform_operations()
gc.collect()
The result of run_heavy_calculation is heavy in memory and soon data_cache grows and occupies memory in GBs (which is expected).
But memory does not get released even after perform_operations() is done. I tried adding del work_operations and invoking gc.collect() but that did not help either. I checked memory of the process after several hours, but the memory was still not freed up.
If I don't use caching (data_cache) at all (at the cost of latency), memory never goes high.
I am wondering what it is that is taking up the memory. I tried running tracemalloc, but it only showed lines occupying KBs of memory. I also took a memory dump with gdb, looking at memory addresses from the process's pmap and /proc/<pid>/smaps, but the dump is really long and even with a hex editor I couldn't figure out much.
I am measuring memory used by the process with the top command, looking at RES. I also tried outputting memory at the end from within the python process with -
import psutil
import gc
import logging

GIGABYTE = 1024.0 * 1024.0 * 1024.0

perform_operations()
gc.collect()
memory_full_info = psutil.Process().memory_full_info()
logging.info(f"process memory after running the process {memory_full_info.uss / GIGABYTE}")
Could not reproduce on Ubuntu with this:
import itertools
import time
import os
from typing import Dict, Tuple
import numpy as np
import psutil  # installed with pip

process = psutil.Process(os.getpid())
SIZE = 10**7

def run_heavy_calculation(key):
    array1 = np.zeros(SIZE)
    array2 = np.zeros(SIZE)
    # Linux uses "virtual memory", which means that the memory required for the
    # arrays was allocated, but will not actually be deducted until we use it,
    # so we write some 1s into them
    # cf: https://stackoverflow.com/q/29516888/11384184
    for i in range(0, SIZE, 1000):
        array1[i] = 1
        array2[i] = 1
    return {key: (array1, array2)}

class WorkOperations(object):
    def __init__(self):
        self.data_cache: Dict[str, Dict[str, Tuple[np.ndarray, np.ndarray]]] = {}

    def get_data(self, key):
        if key not in self.data_cache:
            self.add_data(key)
        return self.data_cache[key]

    def add_data(self, key):
        result = run_heavy_calculation(key)
        self.data_cache[key] = result

def perform_operations(input_keys):
    work_operations = WorkOperations()
    for key in input_keys():
        data = work_operations.get_data(key)
        time.sleep(0.2)
        print(key, process.memory_info().rss / 10**9)
    del work_operations

perform_operations(lambda: map(str, itertools.product("abcdefgh", "0123456789")))  # dummy keys
print("after operations", process.memory_info().rss / 10**9)
input("pause")
('a', '0') 0.113106944
('a', '1') 0.195014656
('a', '2') 0.276926464
...
('h', '7') 6.421118976
('h', '8') 6.503030784
('h', '9') 6.584942592
after operations 0.031363072
pause
It climbed up to 6.5 GB of RAM, then returned from the function, and all of it was released.
You can add a finalizer (__del__) to the class WorkOperations:
    def __del__(self):
        print("deleted")
I see it printed between the last operation's print and the one after.
Although this is not guaranteed to always be the case (cf this question), it strongly indicates that everything is working as intended: even without the del, the function returns (hence the scope is lost), so the reference count for work_operations drops to 0 and it gets GC'ed.
It can be checked with sys.getrefcount too:
print(sys.getrefcount(work_operations) - 1)  # see https://stackoverflow.com/a/510417/11384184
del work_operations
which for me prints 1.
Please provide a Minimal Reproducible Example and info on your system.
In my code, I create a list of objects inside a for-loop. Something like this (the get_size function was obtained from here: How do I determine the size of an object in Python?):
def mem_now():
    mem = psutil.Process(os.getpid()).memory_info()
    mem = mem[0]/(1024.**2)
    return mem

class lista():
    def __init__(self, val):
        self.x = val
        self.y = val**2
        self.z = val - 1

intlst = []
objlst = []
for i in range(int(1e5)):
    g = lista(i)
    objlst.append(g)
    intlst.append(g.z)

print 'int_list_size = {:.2f} Mb'.format(get_size(intlst)/1e6)
print 'obj_list_size = {:.2f} Mb'.format(get_size(objlst)/1e6)
print 'mem_now (no del) = {:.2f} Mb'.format(mem_now())
del intlst
print 'mem_now (del int) = {:.2f} Mb'.format(mem_now())
del objlst
print 'mem_now (del obj) = {:.2f} Mb'.format(mem_now())
and the output is as follows:
mem_now = 45.11 Mb
int_list_size = 3.22 Mb
obj_list_size = 43.22 Mb
mem_now (no del) = 103.35 Mb
mem_now (del int) = 103.35 Mb
mem_now (del obj) = 69.10 Mb
As you can see, after deleting the lists, I still haven't cleared all the memory. The more attributes and the more iterations I run, the higher the final memory value is. I want to completely delete all the objects I created; how can I do that?
I want to completely delete all the objects I created, how can I do that?
You did. You don't even need to use del (all that del does is remove a reference; the underlying object is cleaned up when there are no more references to it).
Consider a simpler example:
def alloc():
    global x
    x = list(range(1000000))

for i in range(1000):
    alloc()
We repeatedly create the list from scratch in a loop. Even though we did nothing to "delete" the list from previous iterations (you do not need to, and should not try), we do not run out of memory - because the cleanup is automatic. (If you actually did run out of memory, Python would raise MemoryError for a failed allocation.) x takes about 36 megabytes to represent on my system (8 for the list, and 28 for the integers); I do not have 36 gigabytes of RAM, but there is no problem. It just takes a bit of time (about 19 seconds on my system); memory usage does not creep upwards over time.
When I then del x, on my system the process memory usage - as reported by Task Manager - promptly drops. Here's the thing, though: it might not do so consistently, or at all, or immediately, or it may depend on what tool you use to check. It's likely to depend on your operating system and other platform details. In any event, it's not something you could dream of controlling completely from Python, and I suspect a lot of the time you couldn't even do it from assembly. Don't worry about it. The operating system knows better than you how to assign memory to processes. That's why your computer has one.
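One way to see both sides of this without relying on Task Manager is tracemalloc (Python 3 standard library): it reports what the Python interpreter itself has allocated, independent of whether the OS has reclaimed the pages yet. A minimal sketch, not part of the original answer:

```python
import gc
import tracemalloc

tracemalloc.start()

x = list(range(1_000_000))
with_list, _ = tracemalloc.get_traced_memory()

del x
gc.collect()
after_del, _ = tracemalloc.get_traced_memory()

# Python-level allocations drop even when an OS-level figure
# (Task Manager, top) lags behind or never shrinks.
print(f"traced with list: {with_list / 2**20:.1f} MiB")
print(f"traced after del: {after_del / 2**20:.1f} MiB")
```

If tracemalloc shows the memory released but the process RSS stays high, the pages are sitting in the allocator or the OS, not in live Python objects, which is exactly the "don't worry about it" situation described above.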
How can I get the amount of memory used by a single process on the Windows platform with the psutil library? (I don't want the percentage; I want the amount in bytes.)
We can use:
psutil.virtual_memory().used
To find the memory usage of the whole OS in bytes, but how about each single process?
Thanks,
Call memory_info_ex:
>>> import psutil
>>> p = psutil.Process()
>>> p.name()
'python.exe'
>>> _ = p.memory_info_ex()
>>> _.wset, _.pagefile
(11665408, 8499200)
The working set includes pages that are shared or shareable by other processes, so in the above example it's actually larger than the paging file commit charge.
There's also a simpler memory_info method. This returns rss and vms, which correspond to wset and pagefile.
>>> p.memory_info()
pmem(rss=11767808, vms=8589312)
For another example, let's map some shared memory.
>>> import mmap
>>> m = mmap.mmap(-1, 10000000)
>>> p.memory_info()
pmem(rss=11792384, vms=8609792)
The mapped pages get demand-zero faulted into the working set.
>>> for i in range(0, len(m), 4096): m[i] = 0xaa
...
>>> p.memory_info()
pmem(rss=21807104, vms=8581120)
A private copy incurs a paging file commit charge:
>>> s = m[:]
>>> p.memory_info()
pmem(rss=31830016, vms=18604032)
I am currently trying to store some data into .h5 files. I quickly realised that I might have to store my data in parts, as it is not possible to process it all and keep it in RAM. I started out using numpy.array to compress the memory usage, but that resulted in days spent on formatting data.
So I went back to using lists, but made the program monitor the memory usage:
when it is above a specified value, a part is stored in numpy format, such that another process can load it and make use of it. The problem with doing this is that what I thought would release my memory isn't releasing it. For some reason the memory stays the same even though I reset the variable and del the variable. Why isn't the memory being released here?
import numpy as np
import os
import resource
import sys
import gc
import math
import h5py
import SecureString
import objgraph
from numpy.lib.stride_tricks import as_strided as ast

total_frames = 15
total_frames_with_deltas = total_frames*3
dim = 40
window_height = 5

def store_file(file_name, data):
    with h5py.File(file_name, 'w') as f:
        f["train_input"] = np.concatenate(data, axis=1)

def load_data_overlap(saved):
    #os.chdir(numpy_train)
    print "Inside function!..."
    if saved == False:
        train_files = np.random.randint(255, size=(1, 40, 690, 4))
        train_input_data_interweawed_normalized = []
        print "Storing train pic to numpy"
        part = 0
        for i in xrange(100000):
            print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            if resource.getrusage(resource.RUSAGE_SELF).ru_maxrss > 2298842112/10:
                print "Max ram storing part: " + str(part) + " At entry: " + str(i)
                print "Storing Train input"
                file_name = 'train_input_'+'part_'+str(part)+'_'+str(dim)+'_'+str(total_frames_with_deltas)+'_window_height_'+str(window_height)+'.h5'
                store_file(file_name, train_input_data_interweawed_normalized)
                part = part + 1
                del train_input_data_interweawed_normalized
                gc.collect()
                train_input_data_interweawed_normalized = []
                raw_input("something")
            for plot in train_files:
                overlaps_reshaped = np.random.randint(10, size=(45, 200, 5, 3))
                for ind_plot in overlaps_reshaped.reshape(overlaps_reshaped.shape[1], overlaps_reshaped.shape[0], overlaps_reshaped.shape[2], overlaps_reshaped.shape[3]):
                    ind_plot_reshaped = ind_plot.reshape(ind_plot.shape[0], 1, ind_plot.shape[1], ind_plot.shape[2])
                    train_input_data_interweawed_normalized.append(ind_plot_reshaped)
        print len(train_input_data_interweawed_normalized)
        return train_input_data_interweawed_normalized
#------------------------------------------------------------------------------------------------------------------------------------------------------------
saved = False
train_input = load_data_overlap(saved)
output:
.....
223662080
224772096
225882112
226996224
228106240
229216256
230326272
Max ram storing part: 0 At entry: 135
Storing Train input
something
377118720
Max ram storing part: 1 At entry: 136
Storing Train input
something
377118720
Max ram storing part: 2 At entry: 137
Storing Train input
something
You need to explicitly force garbage collection; see here:
According to the official Python documentation, you can force the garbage collector to release unreferenced memory with gc.collect().
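As a minimal sketch of that advice applied to the pattern in the question (deleting and rebinding the accumulator list after a part is stored), assuming CPython's reference counting and using tracemalloc rather than h5py so it is self-contained:

```python
import gc
import tracemalloc

tracemalloc.start()

# Stand-in for the interleaved training data accumulated between dumps.
accumulated = [bytearray(10_000) for _ in range(1_000)]
before, _ = tracemalloc.get_traced_memory()

del accumulated    # drop the only reference to the old list
gc.collect()       # only strictly needed for reference cycles, harmless otherwise
accumulated = []   # start accumulating the next part

after, _ = tracemalloc.get_traced_memory()
print(before, after)  # traced memory drops once the old contents are unreferenced
```

Note that this only works if nothing else still references the old list or its contents; any lingering reference (for example, the list also being stored in a dict) keeps the whole chain alive regardless of gc.collect().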
The following code fills all my memory:
from sys import getsizeof
import numpy

# from http://stackoverflow.com/a/2117379/272471
def getSize(array):
    return getsizeof(array) + len(array) * getsizeof(array[0])

class test():
    def __init__(self):
        pass

    def t(self):
        temp = numpy.zeros([200, 100, 100])
        A = numpy.zeros([200], dtype=numpy.float64)
        for i in range(200):
            A[i] = numpy.sum(temp[i].diagonal())
        return A

a = test()
memory_usage("before")
c = [a.t() for i in range(100)]
del a
memory_usage("After")
print("Size of c:", float(getSize(c))/1000.0)
The output is:
('>', 'before', 'memory:', 20588, 'KiB ')
('>', 'After', 'memory:', 1583456, 'KiB ')
('Size of c:', 8.92)
Why am I using ~1.5 GB of memory if c is ~ 9 KiB? Is this a memory leak? (Thanks)
The memory_usage function was posted on SO and is reported here for clarity:
def memory_usage(text=''):
    """Memory usage of the current process in kilobytes."""
    status = None
    result = {'peak': 0, 'rss': 0}
    try:
        # This will only work on systems with a /proc file system
        # (like Linux).
        status = open('/proc/self/status')
        for line in status:
            parts = line.split()
            key = parts[0][2:-1].lower()
            if key in result:
                result[key] = int(parts[1])
    finally:
        if status is not None:
            status.close()
    print('>', text, 'memory:', result['rss'], 'KiB ')
    return result['rss']
The implementation of diagonal() failed to decrement a reference counter. This issue had been previously fixed, but the change didn't make it into 1.7.0.
Upgrading to 1.7.1 solves the problem! The release notes contain various useful identifiers, notably issue 2969.
The solution was provided by Sebastian Berg and Charles Harris on the NumPy mailing list.
Python allocates memory from the OS when it needs some.
If it no longer needs the memory, it may or may not return it.
If it doesn't return it, the memory will be reused for subsequent allocations, so the memory consumption shouldn't keep growing; you should check that.
About your estimations of memory consumption: As azorius already wrote, your temp array consumes 16 MB, while your A array consumes about 200 * 8 = 1600 bytes (+ 40 for internal reasons). If you take 100 of them, you are at 164000 bytes (plus some for the list).
Besides that, I have no explanation for the memory consumption you have.
I don't think sys.getsizeof returns what you expect.
Your numpy arrays hold 64-bit floats (8 bytes per element). Each call to t() allocates a temp array of 200 * 100 * 100 elements, so 100 calls touch (at least)
8 * 200 * 100 * 100 * 100 / (2.0**30) = 1.49 GB
If that memory is kept alive, you end up at about 1.5 GB; the last few hundred MB are the integers used for indexing the large numpy data and the 100 objects.
It seems that sys.getsizeof always returns 80 no matter how large a numpy array is:
sys.getsizeof(np.zeros([200, 1000, 100]))  # returns 80
sys.getsizeof(np.zeros([20, 100, 10]))  # returns 80
In your code you delete a, which is a tiny factory object whose t method returns huge numpy arrays; you store these huge arrays in a list called c.
Try deleting c instead; then you should regain most of your RAM.
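Note that for the data buffer itself, numpy exposes nbytes, which does not share getsizeof's limitation. A small sketch, not from the original answer:

```python
import numpy as np

a = np.zeros([200, 100, 100])
# 200 * 100 * 100 float64 elements at 8 bytes each
print(a.nbytes)  # 16000000
```

nbytes counts only the element buffer, so for a view it reports the viewed elements without telling you about the (possibly much larger) base array that keeps them alive.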