Memory leak in Python script using VTK

I am currently using a Python script to process information stored in the EnSight Gold format. My Python (2.6) script uses VTK (5.10.0) to process the file: I use the vtkEnSightGoldReader to read the data and loop over the time steps. In principle this works fine for smaller datasets; however, for large datasets (GBs), I see the memory usage (via top) increasing over time while the process is running. The memory fills slowly, but in some cases problems are inevitable.
The following is the minimal script that reproduces my issue.
import vtk
reader = vtk.vtkEnSightGoldReader()
reader.SetCaseFileName("case.case")
reader.Update()
# Get time values
timeset=reader.GetTimeSets()
time=timeset.GetItem(0)
timesteps=time.GetSize()
#reader.ReleaseDataFlagOn()
for j in range(timesteps):
    curTime = time.GetTuple(j)[0]
    print curTime
    reader.SetTimeValue(curTime)
    reader.Update()
    #reader.RemoveAllInputs()
My question is: how can I unload or replace the data stored in memory, instead of continuously using more of it?
As you can see in the source code, I tried the member functions RemoveAllInputs and ReleaseDataFlagOn, but they do not work, or I used them in the wrong way. Unfortunately, I am not getting any closer to a solution.
Something else I tried is the DeepCopy() approach, which I found on the VTK website. However, this approach does not seem useful for me, because the memory issue appears even before I call GetOutput().
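For reference, the DeepCopy pattern I mean looks roughly like the following sketch (my reconstruction; it assumes the reader's output is a vtkMultiBlockDataSet, as it is in VTK 5.x):
# Sketch only: copy the current output out of the pipeline, then release the
# reader's own copy. As noted above, this did not help in my case.
output_copy = vtk.vtkMultiBlockDataSet()
output_copy.DeepCopy(reader.GetOutput())
reader.GetOutput().ReleaseData()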

There is indeed a (minor) memory leak in the vtkEnSightGoldReader. It is caused by a collection object that is not cleared properly, which becomes apparent only when processing very large datasets. Technically it is not a memory leak, since the memory is properly released once the run finishes.
This can only be solved by applying a patch to the VTK source and recompiling. I received the patch below from people at Kitware, so I assume it was rolled into later versions of VTK.
diff --git a/IO/vtkEnSightReader.cxx b/IO/vtkEnSightReader.cxx
index 68a9b8f..7ab8ddd 100644
--- a/IO/vtkEnSightReader.cxx
+++ b/IO/vtkEnSightReader.cxx
@@ -985,6 +985,8 @@ int vtkEnSightReader::ReadCaseFileTime(char* line)
   int timeSet, numTimeSteps, i, filenameNum, increment, lineRead;
   float timeStep;
+  this->TimeSetFileNameNumbers->RemoveAllItems();
+
   // found TIME section
   int firstTimeStep = 1;

Related

What is the source of Python3 Memory error? And how to solve it?

I wanted to plot .csv files in a loop. Searching around on Stack Overflow, I found the suggested solution: use plt.figure(). This solved my problem when I ran it for 2 files, but when I tried it with 20 files I get a MemoryError. It runs up to the 6th file and then throws the error.
The .csv files I am importing each have a size of approximately (800,000 to 1 million) x 10.
Failed solutions/debugging/source of the problem:
I know that importing huge files can lead to a memory error (again, info from Stack Overflow). But here I am loading the files into the same variable over and over, so I did not expect a memory error, since I am not using more memory with each loop iteration.
It is not due to individual files, as I successfully ran the program in batches: (1,5), (5,10), (10,15), (15,20). But I want this to happen in a single run.
I tried defining functions for the plotting, hoping to avoid the problem, but I faced the same issue again.
I think I could avoid this problem if I could flush Python's memory (something like the cache in a browser) after every loop iteration. But how can I accomplish this?
Thanks in advance.
P.S - If I can somehow speed up the program that would be a bonus.
This is the program:
import numpy as np
import matplotlib.pyplot as plt
global n,data1
n = '/media/gautam/New Volume/IIT/Cosmology/2nd year/NEW Codes/k(0.1)_NO-DM.csv'
data1 = np.genfromtxt(n,delimiter=',',dtype=None)
for k in range(0,20):
    a = '/media/gautam/New Volume/IIT/Cosmology/2nd year/NEW Codes/k_(0.1)_%d.csv'%k
    data2 = np.genfromtxt(a,delimiter=',',dtype=None)
    plt.figure()
    diff = data1 - data2
    plt.plot(np.log10(data1[:,1]),np.absolute(diff[:,6]),label='|diff_d|')
    plt.xlabel('log(a)')
    plt.ylabel('|diff_d|')
    plt.legend()
    plt.title('q_d = %d '%data2[0,10])
    plt.savefig('/media/gautam/New Volume/IIT/Cosmology/2nd year/NEW Codes/Fig/k_(0.1)_%d_diff.png'%k)
The Matplotlib documentation says:
If you are creating many figures, make sure you explicitly call “close”
on the figures you are not using, because this will enable pylab to properly
clean up the memory.
So, if you are having memory issues, add a call to plt.close() at the bottom of your for loop.
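A condensed sketch of the loop from the question with the close added (paths and column indices are taken from the code above; labels and title are omitted here for brevity):
import numpy as np
import matplotlib.pyplot as plt

n = '/media/gautam/New Volume/IIT/Cosmology/2nd year/NEW Codes/k(0.1)_NO-DM.csv'
data1 = np.genfromtxt(n, delimiter=',', dtype=None)
for k in range(0, 20):
    a = '/media/gautam/New Volume/IIT/Cosmology/2nd year/NEW Codes/k_(0.1)_%d.csv' % k
    data2 = np.genfromtxt(a, delimiter=',', dtype=None)
    fig = plt.figure()  # keep a handle so the figure can be closed explicitly
    diff = data1 - data2
    plt.plot(np.log10(data1[:, 1]), np.absolute(diff[:, 6]), label='|diff_d|')
    plt.savefig('/media/gautam/New Volume/IIT/Cosmology/2nd year/NEW Codes/Fig/k_(0.1)_%d_diff.png' % k)
    plt.close(fig)  # free the figure's memory before the next iteration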

Huge memory usage in pyROOT

My pyROOT analysis code is using huge amounts of memory. I have reduced the problem to the example code below:
from ROOT import TChain, TH1D
# Load file, chain
chain = TChain("someChain")
inFile = "someFile.root"
chain.Add(inFile)
nentries = chain.GetEntries()
# Declare histograms
h_nTracks = TH1D("h_nTracks", "h_nTracks", 16, -0.5, 15.5)
h_E = TH1D("h_E","h_E",100,-0.1,6.0)
h_p = TH1D("h_p", "h_p", 100, -0.1, 6.0)
h_ECLEnergy = TH1D("h_ECLEnergy","h_ECLEnergy",100,-0.1,14.0)
# Loop over entries
for jentry in range(nentries):
    # Load entry
    entry = chain.GetEntry(jentry)
    # Define variables
    cands = chain.__ncandidates__
    nTracks = chain.nTracks
    E = chain.useCMSFrame__boE__bc
    p = chain.useCMSFrame__bop__bc
    ECLEnergy = chain.useCMSFrame__boECLEnergy__bc
    # Fill histos
    h_nTracks.Fill(nTracks)
    h_ECLEnergy.Fill(ECLEnergy)
    for cand in range(cands):
        h_E.Fill(E[cand])
        h_p.Fill(p[cand])
where someFile.root is a root file with 700,000 entries and multiple particle candidates per entry.
When I run this script it uses ~600 MB of memory. If I remove the line
h_p.Fill(p[cand])
it uses ~400 MB.
If I also remove the line
h_E.Fill(E[cand])
it uses ~150 MB.
If I also remove the lines
h_nTracks.Fill(nTracks)
h_ECLEnergy.Fill(ECLEnergy)
there is no further reduction in memory usage.
It seems that for every extra histogram that I fill of the form
h_variable.Fill(variable[cand])
(i.e. histograms that are filled once per candidate per entry, as opposed to histograms that are just filled once per entry) I use an extra ~200 MB of memory. This becomes a serious problem when I have 10 or more histograms because I am using GBs of memory and I am exceeding the limits of my computing system. Does anybody have a solution?
Update: I think this is a python3 problem.
If I take the script in my original post (above) and run it using python2 the memory usage is ~200 MB, compared to ~600 MB with python3. Even if I try to replicate Problem 2 by using the long variable names, the job still only uses ~200 MB of memory with python2, compared to ~1.3 GB with python3.
During my Googling I came across a few other accounts of people encountering memory leaks when using pyROOT with python3. It seems this is still an issue as of Python 3.6.2 and ROOT 6.08/06, and that for the moment you must use python2 if you want to use pyROOT.
So, using python2 appears to be my "solution" for now, but it's not ideal. If anybody has any further information or suggestions I'd be grateful to hear from you!
I'm glad you figured out Python3 was the problem. But if you (or anyone) continues to have memory usage issues when working with histograms in the future, here are some potential solutions I hope you'll find helpful!
THnSparse
Use THnSparse. THnSparse is an efficient multidimensional histogram that shows its strengths with histograms where only a small fraction of the total bins are filled. You can read more about it here.
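A minimal PyROOT sketch of what that might look like, with hypothetical binning chosen only for illustration (not taken from the code above):
import ROOT
from array import array

# One 3-D sparse histogram over E, p and ECLEnergy instead of several dense
# TH1Ds; only bins that are actually hit consume memory.
nbins = array('i', [100, 100, 100])
xmin = array('d', [-0.1, -0.1, -0.1])
xmax = array('d', [6.0, 6.0, 14.0])
h_sparse = ROOT.THnSparseD('h_sparse', 'E;p;ECLEnergy', 3, nbins, xmin, xmax)

# Fill() takes one coordinate per axis.
h_sparse.Fill(array('d', [1.2, 0.8, 5.5]))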
TTree
TTrees are data structures in ROOT that are, quite frankly, glorified tables. However, they are highly optimized. A TTree is composed of branches and leaves that contain data which can be accessed quickly and efficiently through ROOT. If you put your data into a TTree first and then read it into a histogram, I guarantee you will find lower memory usage and shorter run times.
Here is some example TTree code.
import ROOT

root_file_path = "../hadd_www.root"
muon_ps = ROOT.TFile(root_file_path)
muon_ps_tree = muon_ps.Get("WWWNtuple")
muon_ps_branches = muon_ps_tree.GetListOfBranches()
canv = ROOT.TCanvas()
num_of_events = 5000
ttvhist = ROOT.TH1F('Statistics2', 'Jet eta for ttV (aqua) vs WWW (white); Pseudorapidity', 100, -3, 3)

i = 0
muon_ps_tree.GetEntry(i)
print len(muon_ps_tree.jet_eta)

# Loop over entries, filling the histogram once per jet in each event
while muon_ps_tree.GetEntry(i):
    if i > num_of_events:
        break
    for k in range(len(muon_ps_tree.jet_eta)):
        ttvhist.Fill(float(muon_ps_tree.jet_eta[k]), 1)
    i += 1

ttvhist.Write()
ttvhist.Draw("hist")
ttvhist.SetFillColor(70)
And here's a resource where you can learn about how fantastic TTrees are:
TTree ROOT documentation
For more reading, here is a discussion on speeding up ROOT histogram building on the CERN help forum:
Memory-conservative histograms for usage in DQ monitoring
Best of luck with your data analysis, and happy coding!

No space left while using Multiprocessing.Array in shared memory

I am using Python's multiprocessing functions to run my code in parallel on a machine with roughly 500 GB of RAM. To share some arrays between the different workers I am creating an Array object:
import ctypes
import multiprocessing
import numpy as np

N = 150
ndata = 10000
sigma = 3
ddim = 3
shared_data_base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim*sigma*sigma)
This works perfectly for sigma=1, but for sigma=3 one of the hard drives of the machine slowly fills up until there is no free space left, and then the process fails with this exception:
OSError: [Errno 28] No space left on device
Now I have two questions:
Why does this code write anything to disk at all? Why isn't it all kept in memory?
How can I solve this problem? Can I make Python store the array entirely in RAM, without writing it to the HDD? Or can I change the HDD to which this array is written?
EDIT: I found something online suggesting that the array is stored in "shared memory". But the /dev/shm device has plenty more free space than /dev/sda1, which is the partition being filled up by the code above.
Here is the (relevant part of the) strace log of this code.
Edit #2: I think I have found a workaround for this problem. Looking at the source, I found that multiprocessing tries to create a temporary file in a directory that is determined using
process.current_process()._config.get('tempdir')
Setting this value manually at the beginning of the script
from multiprocessing import process
process.current_process()._config['tempdir'] = '/data/tmp/'
seems to solve the issue. But I think this is not the best way to solve it. So: are there any other suggestions for how to handle this?
These data are larger than 500 GB. shared_data_base alone would be 826.2 GB on my machine according to sys.getsizeof() and 1506.6 GB according to pympler.asizeof.asizeof(). Even if it were only 500 GB, your machine needs some of that memory in order to run. This is why the data are going to disk.
import ctypes
from pympler.asizeof import asizeof
import sys
N = 150
ndata = 10000
sigma = 3
ddim = 3
print(sys.getsizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)
print(asizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)
Note that on my machine (Debian 9), /tmp is the location that fills up. If you find that you must use disk, be certain that the location you use has enough available space; typically /tmp isn't a large partition.
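If spilling to disk is unavoidable, one way to steer the backing file to a roomier filesystem without touching multiprocessing's private _config is to set TMPDIR early. This is only a sketch: it assumes /data/tmp exists and is writable, and that the environment variable is set before the temporary directory is first created.
import os

# Assumption: /data/tmp is a large, writable filesystem. tempfile.gettempdir()
# honors TMPDIR, and multiprocessing falls back to tempfile when no explicit
# tempdir is configured, so the Array's backing file should land there.
os.environ['TMPDIR'] = '/data/tmp'

import ctypes
import multiprocessing

shared_data_base = multiprocessing.Array(ctypes.c_double, 10**6)  # small demo size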

Is there a faster way to copy from a bytearray to a mmap slice in Python?

I am writing code for an XBMC addon that copies an image, provided as a bytearray, into a slice of a mmap object. Using Kern's line profiler, the bottleneck in my code is where I copy the bytearray into the mmap object at the appropriate location. In essence:
import mmap

length = 1920 * 1080 * 4
mymmap = mmap.mmap(0, length + 11, 'Name', mmap.ACCESS_WRITE)
image = capture.getImage()  # returns a bytearray that is 4*1920*1080 in size
mymmap[11:(11 + length)] = str(image)  # the bottleneck according to the line profiler
I cannot change the data types of either the image or the mmap. XBMC provides the image as a bytearray, and the mmap interface was designed by a developer who won't change the implementation. It is NOT being used to write to a file, however; it was his way of getting the data out of XBMC and into his C++ application. I recognize that he could write an interface using ctypes that might handle this better, but he is not interested in further development. The Python implementation in XBMC is 2.7.
I looked at the possibility of using ctypes (in a self-contained way within Python) myself with memmove, but I can't quite figure out how to convert the bytearray and the mmap slice into C structures that can be used with memmove, and I don't know whether that would be any faster. Any advice on a fast way to move these bytes between these two data types?
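For what it is worth, here is a rough sketch of what I had in mind with ctypes (untested; it assumes Python 2.7 and that both objects expose a writable buffer, and I could not tell whether it would actually beat the slice assignment):
import ctypes
import mmap

length = 1920 * 1080 * 4
mymmap = mmap.mmap(0, length + 11, 'Name', mmap.ACCESS_WRITE)
image = bytearray(length)  # stand-in for the bytearray from capture.getImage()

# Wrap both buffers as ctypes char arrays without copying them, then move the
# bytes directly; this skips the intermediate copy made by str(image).
src = (ctypes.c_char * length).from_buffer(image)
dst = (ctypes.c_char * length).from_buffer(mymmap, 11)  # offset 11 into the mmap
ctypes.memmove(dst, src, length)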
If the slice assignment to the mmap object is your bottleneck, I don't think anything can be done to improve the performance of your code. All the assignment does internally is call memcpy.

OpenCL delete data from RAM

I'm copying data onto my graphics card from Python using OpenCL. There I have a kernel processing the data with n threads.
After this step I copy the result back to Python and feed it into a new kernel. (The data is very big, about 900 MB, and the result is about 100 MB.) With the result I need to calculate triangles, which amount to about 200 MB. All of this together exceeds the memory on my graphics card.
I do not need the first 900 MB anymore after the first kernel has finished its work.
My question is: how can I delete the first dataset (stored in one array) from the graphics card?
Here some code:
#Write
self.gridBuf = cl.Buffer(self.context, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=self.grid)
#DO PART 1
...
#Read result
cl.enqueue_read_buffer(self.queue, self.indexBuf,index).wait()
You will need to call clReleaseMemObject on the memory object you created with the call to clCreateBuffer. When the reference count drops to zero, the underlying device/shared memory is released by the implementation.
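Since the code in the question uses PyOpenCL, a sketch of the equivalent there (assuming self.gridBuf is the 900 MB buffer created above; Buffer objects inherit release() from pyopencl.MemoryObject, which wraps clReleaseMemObject):
# Once the first kernel is done with the grid data:
self.queue.finish()     # make sure no kernel is still using the buffer
self.gridBuf.release()  # decrement the OpenCL reference count (clReleaseMemObject)
self.gridBuf = None     # drop the Python reference so it cannot be reused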
