I'm running a histogram operation in a loop over HDF5 files of ~800 MB each (all equal in size).
The result of each histogram is stored in a text file of roughly 5 columns x 30 lines.
t0 = time.time()
for f in filelist:
    d = h5py.File(f, 'r')
    result = make_histogram(d['X'].value)
    ascii_write(result)
    print time.time() - t0
    d.close()
One pass through the loop normally takes ~6-7 seconds per file.
However, at some point a single pass starts taking significantly longer, and the point at which this happens seems random: if I run multiple times with the files in a different order, the slowdown starts at a different place.
I noticed in my system monitor that at this point the process is in "disk sleep" (uninterruptible I/O wait).
How can I fix this?
It seems to be related to this question, but I could not find a definitive answer.
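One way to narrow this down (a sketch based on the loop above, not part of the original timing) is to time the read separately from the histogramming, so the stalls can be attributed either to I/O or to compute. This assumes the filelist, make_histogram, and ascii_write from the question:

import time
import h5py

for f in filelist:
    t0 = time.time()
    with h5py.File(f, 'r') as d:
        data = d['X'][...]          # read the dataset into memory
    t1 = time.time()
    result = make_histogram(data)   # pure CPU work, no disk access
    ascii_write(result)
    t2 = time.time()
    print('read: %.2fs  histogram+write: %.2fs' % (t1 - t0, t2 - t1))

If the long passes show up in the read time, the slowdown is on the I/O side (e.g. the page cache filling up and dirty pages being flushed), not in the histogram code.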
I am opening a list of asm files and closing them after extracting arrays from them. I had issues with RAM usage, which are solved now.
When I was appending the extracted arrays (as numpy arrays) to a list, the RAM usage kept stacking up with each iteration. However, when I changed my code to convert the extracted arrays to lists before appending, the issue was resolved. Please see the line i_arrays.append(h.tolist()). I'm just trying to understand why the RAM usage was stacking up when I was storing np arrays.
Code:
import os
import array
import numpy as np
import psutil
from time import process_time
from tqdm import tqdm

t1_start = process_time()
files = os.listdir('asmFiles')
i_arrays = []
file_names = []
for i in tqdm(files[0:1501]):
    f_name = i.split('.')[0]
    file_names.append(f_name)
    b = 'asmFiles/' + str(i)
    f = open(b, 'rb')
    ln = os.path.getsize(b)
    width = int(ln ** 0.5)
    rem = ln % width
    a = array.array("B")
    a.fromfile(f, ln - rem)
    f.close()
    g = np.reshape(a, (int(len(a) / width), width))
    g = np.uint(g)
    h = g[0][0:800]
    i_arrays.append(h.tolist())
    print(psutil.virtual_memory()[2])  # percent of RAM in use
t1_stop = process_time()
print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)
I'm training my Python abilities by making a bunch of generally useless code, and today I was attempting to print Bad Apple in the console using ASCII art, as one does. Everything went fine until I had to time the prints so that they end in 3 minutes and 52 seconds while maintaining a consistent frame rate. I tried just adding a time.sleep() between prints, hoping it would all magically work, but obviously it didn't.
I customized a version of this git repo https://github.com/aypro-droid/image-to-ascii to transform frames into ASCII art, and used https://pypi.org/project/opencv-python/ to transform the video into frames.
here is my code:
import time

frames = {}
# saving each .txt frame in a dict
for i in range(6955):
    f = open("Frames-to-Ascii/output/output{0}.txt".format(i), "rt")
    frames['t{0}'.format(i)] = f.read()
    f.close()

# start "trigger"
ini = input('start(type anything): ')

start = time.time()
# printing the 6955 frames from the dict
for x in range(6955):
    print(frames['t{0}'.format(x)])
    # my attempt at timing
    time.sleep(0.015)
end = time.time()

# calculating how much time the prints took overall; should be about
# 211.2 seconds, evenly distributed across all the "frames"
print(end - start)
frame example:
here
I'm attempting to time the prints perfectly to the video so I can later use it somewhere else, any tips?
What I understand is that you need to print the frames at a given constant rate?
If yes, then you need to measure the time taken by the print itself and sleep for the frame delay minus that time. Something like:
for x in range(6955):
    start = time.time()
    print("hips")
    end = time.time()
    time.sleep(0.5 - (end - start))
Thus each iteration of the loop will take (approximately) 0.5 s to run. (Change the value according to your needs.)
Of course, if a single print takes more time than the delay, you need to find another strategy: for example, skipping the next frame, etc.
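Building on that idea, here is a sketch (my own extension, not from the answer above) that schedules each frame against an absolute deadline; this avoids passing a negative value to time.sleep() and prevents small timing errors from accumulating into drift. The frame_dt value is an assumption: total runtime divided by frame count, e.g. 232.0 / 6955 for a 3:52 clip.

import time

def play(frames, frame_dt):
    # frames: list of pre-rendered ASCII strings
    # frame_dt: seconds per frame
    t0 = time.time()
    for n, frame in enumerate(frames):
        print(frame)
        # sleep until the absolute deadline of frame n+1; if the print
        # ran long, max() clamps the sleep to zero and later frames
        # catch up instead of drifting further behind
        deadline = t0 + (n + 1) * frame_dt
        time.sleep(max(0.0, deadline - time.time()))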
Context
While training a neural network, I realized the time spent per batch increased when I increased the size of my dataset (without changing the batch size). The important part: I need to fetch 20 .npy files per data point, and this number doesn't depend on the dataset size.
Problem
Training goes from 2s/iteration to 10s/iteration...
There is no apparent reason why training would take longer. However, I managed to track down the bottleneck. It seems to have to do with the loading of the .npy files.
To reproduce this behavior, here's a small script you can run to generate 10,000 dummy .npy files:
import os
import random
import numpy as np

def path(i):
    return os.sep.join(('../datasets/test', str(i)))

def create_dummy_files(N=10000):
    for i in range(N):
        x = np.random.random((100, 100))
        np.save(path(random.getrandbits(128)), x)
Then you can run the following two scripts and compare them yourself:
The first script where 20 .npy files are randomly selected and loaded:
L = os.listdir('../datasets/test')
S = random.sample(L, 20)
for s in S:
    np.load(path(s))  # <- timed this
The second version, where 20 .npy 'sequential' files are selected and loaded.
L = os.listdir('../datasets/test')
i = 100
S = L[i: i + 20]
for s in S:
    np.load(path(s))  # <- timed this
I tested both scripts and ran them 100 times each (in the 2nd script I used the iteration count as the value for i so the same files are not loaded twice). I wrapped the np.load(path(s)) line with time.time() calls. I'm not timing the sampling, only the loading. Here are the results:
Random loads (times roughly stay between 0.1s and 0.4s, average is 0.25s):
Non random loads (times roughly stay between 0.010s and 0.014s, average is 0.01s):
I assume those times are related to what the CPU is doing while the scripts run. However, that doesn't explain this gap. Why are these two results so different? Does it have something to do with the way the files are indexed?
Edit: I printed S in the random-sample script, copied the resulting list of 20 filenames, and ran the script again with S defined literally as that list. The time it took was comparable to the 'sequential' script. So it's not related to the files not being contiguous in the filesystem or anything like that. It looked as if the random sampling was being counted in the timer, yet the timing is defined as:
t = time.time()
np.load(path(s))
print(time.time() - t)
I also tried wrapping np.load (and only np.load) with cProfile: same result.
I did say:
I tested both scripts and ran them 100 times each (in the 2nd script I used the iteration count as the value for i so the same files are not loaded twice)
But as tevemadar mentioned
i should be randomized
I completely messed up the operation of selecting different files in the second version. My code was timing the scripts 100 times like so:
for i in trange(100):
    if rand:
        S = random.sample(L, 20)
    else:
        S = L[i: i + 20]  # <- on every loop only 1 new file is added to the
                          #    selection; 19 files were already cached by the
                          #    previous fetch
For the second script, it should instead be S = L[100*i : 100*i + 20]!
And yes, when timing, the results are comparable.
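For completeness, a minimal sketch of the corrected comparison (assuming the dummy files and the path() helper defined above), where every 'sequential' fetch reads 20 files that no earlier iteration has touched:

import os
import random
import time
import numpy as np

L = os.listdir('../datasets/test')
for variant in ('random', 'sequential'):
    times = []
    for i in range(100):
        if variant == 'random':
            S = random.sample(L, 20)
        else:
            S = L[100 * i : 100 * i + 20]  # disjoint slices: no file is read twice
        t = time.time()
        for s in S:
            np.load(path(s))
        times.append(time.time() - t)
    print(variant, sum(times) / len(times))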
My script takes two movie files as input and writes a 2x1 array movie output (stereoscopic side-by-side, half-width). The input video clips have equal resolution (1280x720), frame rate (60), number of frames (23,899), and format (mp4).
When the write_videofile function starts processing, it shows a very reasonable estimated time of completion (~20 min). But as it processes the frames, it gets slower and slower (indicated by the progress bar and the estimated completion time). The input movie clips are about 6 min long. After three minutes of processing, it indicates it will take over 3 hours to complete. After half an hour of processing, it indicates over 24 hours.
I have tried the 'threads' option of the write_videofile function, but it did not help.
Any idea? Thanks for the help.
---- Script ----
from moviepy.editor import VideoFileClip, clips_array

movie_L = 'movie_L.mp4'
movie_R = 'movie_R.mp4'
output_movie = 'new_movie.mp4'

clip_L = VideoFileClip(movie_L)
(width_L, height_L) = clip_L.size
clip_L = clip_L.resize((width_L/2, height_L))

clip_R = VideoFileClip(movie_R)
(width_R, height_R) = clip_R.size
clip_R = clip_R.resize((width_R/2, height_R))

print("*** Make an array of the two movies side by side")
arrayClip = clips_array([[clip_L, clip_R]])

print("*** Write the video file")
arrayClip.write_videofile(output_movie, threads=4, audio=False)
I realize that this is old, but for anyone still having this issue: be sure to add progress_bar=False to your call, e.g.
arrayClip.write_videofile(output_movie, threads=4, audio=False, progress_bar=False)
Having the progress bar print each of its updates into IDLE takes up a ton of memory, thus slowing down your program until it stops completely. (In newer versions of MoviePy the progress_bar argument is gone; there the bar is suppressed with logger=None, as in the answer below.)
I have also had problems with slow rendering. I find that it helps a lot to use multithreading and also to set the bitrate.
This is my configuration:
videoclip.write_videofile("fractal.mp4", fps=20, threads=16, logger=None, codec="mpeg4", preset="slow", ffmpeg_params=['-b:v', '10000k'])
This works very well, even with preset set to slow. That preset gives better quality for the same number of bits, and if quality is not a concern, you can set it to medium or fast to gain some more speed.
My pyROOT analysis code is using huge amounts of memory. I have reduced the problem to the example code below:
from ROOT import TChain, TH1D
# Load file, chain
chain = TChain("someChain")
inFile = "someFile.root"
chain.Add(inFile)
nentries = chain.GetEntries()
# Declare histograms
h_nTracks = TH1D("h_nTracks", "h_nTracks", 16, -0.5, 15.5)
h_E = TH1D("h_E","h_E",100,-0.1,6.0)
h_p = TH1D("h_p", "h_p", 100, -0.1, 6.0)
h_ECLEnergy = TH1D("h_ECLEnergy","h_ECLEnergy",100,-0.1,14.0)
# Loop over entries
for jentry in range(nentries):
    # Load entry
    entry = chain.GetEntry(jentry)
    # Define variables
    cands = chain.__ncandidates__
    nTracks = chain.nTracks
    E = chain.useCMSFrame__boE__bc
    p = chain.useCMSFrame__bop__bc
    ECLEnergy = chain.useCMSFrame__boECLEnergy__bc
    # Fill histos
    h_nTracks.Fill(nTracks)
    h_ECLEnergy.Fill(ECLEnergy)
    for cand in range(cands):
        h_E.Fill(E[cand])
        h_p.Fill(p[cand])
where someFile.root is a root file with 700,000 entries and multiple particle candidates per entry.
When I run this script it uses ~600 MB of memory. If I remove the line
h_p.Fill(p[cand])
it uses ~400 MB.
If I also remove the line
h_E.Fill(E[cand])
it uses ~150 MB.
If I also remove the lines
h_nTracks.Fill(nTracks)
h_ECLEnergy.Fill(ECLEnergy)
there is no further reduction in memory usage.
It seems that for every extra histogram that I fill of the form
h_variable.Fill(variable[cand])
(i.e. histograms that are filled once per candidate per entry, as opposed to histograms that are filled just once per entry), I use an extra ~200 MB of memory. This becomes a serious problem when I have 10 or more histograms, because I am using GBs of memory and exceeding the limits of my computing system. Does anybody have a solution?
Update: I think this is a python3 problem.
If I take the script in my original post (above) and run it using python2 the memory usage is ~200 MB, compared to ~600 MB with python3. Even if I try to replicate Problem 2 by using the long variable names, the job still only uses ~200 MB of memory with python2, compared to ~1.3 GB with python3.
During my Googling I came across a few other accounts of people encountering memory leaks when using pyROOT with python3. It seems this is still an issue as of Python 3.6.2 and ROOT 6.08/06, and that for the moment you must use python2 if you want to use pyROOT.
So, using python2 appears to be my "solution" for now, but it's not ideal. If anybody has any further information or suggestions I'd be grateful to hear from you!
I'm glad you figured out Python3 was the problem. But if you (or anyone) continues to have memory usage issues when working with histograms in the future, here are some potential solutions I hope you'll find helpful!
THnSparse
Use THnSparse. THnSparse is an efficient multidimensional histogram that shows its strengths in histograms where only a small fraction of the total bins are filled. You can read more about it here.
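As a rough PyROOT sketch (the bin counts and axis ranges below are purely illustrative, borrowed from the question's 1-D histograms), a THnSparse can replace several dense histograms with one sparse multidimensional one:

from array import array
import ROOT

# 3 axes (E, p, ECLEnergy): a dense 3-D histogram would allocate all
# 100^3 bins up front; THnSparse allocates only the bins that get filled
nbins = array('i', [100, 100, 100])
xmin = array('d', [-0.1, -0.1, -0.1])
xmax = array('d', [6.0, 6.0, 14.0])
hs = ROOT.THnSparseD('hs', 'E vs p vs ECLEnergy', 3, nbins, xmin, xmax)

hs.Fill(array('d', [1.2, 0.8, 3.4]))  # one (E, p, ECLEnergy) entry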
TTree
TTrees are data structures in ROOT that are, quite frankly, glorified tables. However, they are highly optimized. A TTree is composed of branches and leaves that contain data which, through ROOT, can be accessed speedily and efficiently. If you put your data into a TTree first and then read it into a histogram, I guarantee you will find lower memory usage and faster run times.
Here is some example TTree code.
import ROOT

root_file_path = "../hadd_www.root"
muon_ps = ROOT.TFile(root_file_path)
muon_ps_tree = muon_ps.Get("WWWNtuple")
muon_ps_branches = muon_ps_tree.GetListOfBranches()
canv = ROOT.TCanvas()
num_of_events = 5000
ttvhist = ROOT.TH1F('Statistics2',
                    'Jet eta for ttV (aqua) vs WWW (white); Pseudorapidity',
                    100, -3, 3)

i = 0
muon_ps_tree.GetEntry(i)
print(len(muon_ps_tree.jet_eta))

# GetEntry returns 0 at the end of the tree, which ends the loop
while muon_ps_tree.GetEntry(i):
    if i > num_of_events:
        break
    # fill the histogram with the eta of every jet in this event
    for k in range(len(muon_ps_tree.jet_eta)):
        ttvhist.Fill(float(muon_ps_tree.jet_eta[k]), 1)
    i += 1

ttvhist.Write()
ttvhist.Draw("hist")
ttvhist.SetFillColor(70)
And here's a resource where you can learn about how fantastic TTrees are:
TTree ROOT documentation
For more reading, here is a discussion on speeding up ROOT histogram building on the CERN help forum:
Memory-conservative histograms for usage in DQ monitoring
Best of luck with your data analysis, and happy coding!