I'm training my Python abilities by writing a bunch of generally useless code, and today I was attempting to print Bad Apple in the console as ASCII art, as one does. Everything went fine until I had to time the prints so that they finish in 3 minutes and 52 seconds while maintaining a consistent framerate. I tried just adding a time.sleep() between prints, hoping it would all magically work, but obviously it didn't.
I customized a version of this GitHub repo https://github.com/aypro-droid/image-to-ascii to transform frames into ASCII art, and used https://pypi.org/project/opencv-python/ to split the video into frames.
Here is my code:
import time

frames = {}
# save each .txt frame in a dict
for i in range(6955):
    f = open("Frames-to-Ascii/output/output{0}.txt".format(i), "rt")
    frames['t{0}'.format(i)] = f.read()
    f.close()

# start "trigger"
ini = input('start(type anything): ')
start = time.time()

# printing the 6955 frames from the dict
for x in range(6955):
    print(frames['t{0}'.format(x)])
    # my attempt at timing
    time.sleep(0.015)

end = time.time()
# calculating how much time the prints took overall, should be about
# 211.2 seconds evenly distributed to all the "frames"
print(end - start)
Frame example: (image linked in the original post)
I'm attempting to time the prints to match the video exactly so I can use it somewhere else later. Any tips?
What I understand is that you need to print the frames at a given constant rate? If yes, then you need to measure the time taken by the print itself and then sleep for the frame delay minus that time. Something like:
for x in range(6955):
    start = time.time()
    print("hips")
    end = time.time()
    # sleep for the remainder of the frame budget; max() avoids passing a
    # negative value to time.sleep() (which raises ValueError) when the
    # print itself ran longer than the delay
    time.sleep(max(0.0, 0.5 - (end - start)))
Thus each iteration of the loop will take approximately 0.5 s to run (change the value according to your needs).
Of course, if a single print takes more time than the delay, you need to find another strategy: for example, skipping the next frame, etc.
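To keep the whole run pinned to 3 min 52 s without drift, you can also schedule each frame against an absolute deadline computed from the start time instead of sleeping a fixed amount. A minimal sketch, reusing the frames dict built in the question (the frame-skip rule shown is just one possible strategy):

import time

TOTAL_FRAMES = 6955
DURATION = 232.0                    # 3 min 52 s, the target length from the question
FRAME_TIME = DURATION / TOTAL_FRAMES

start = time.time()
for x in range(TOTAL_FRAMES):
    # absolute deadline for this frame, so small timing errors never accumulate
    deadline = start + x * FRAME_TIME
    delay = deadline - time.time()
    if delay > 0:
        time.sleep(delay)
        print(frames['t{0}'.format(x)])   # frames dict from the question's code
    # if we are already past the deadline, drop this frame and move on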
Context
While training a neural network, I realized that the time spent per batch increased when I increased the size of my dataset (without changing the batch size). The important part: I need to fetch 20 .npy files per data point, and this number doesn't depend on the dataset size.
Problem
Training goes from 2s/iteration to 10s/iteration...
There is no apparent reason why training would take longer. However, I managed to track down the bottleneck. It seems to have to do with the loading of the .npy files.
To reproduce this behavior, here's a small script you can run to generate 10,000 dummy .npy files:
import os
import random

import numpy as np

def path(i):
    return os.sep.join(('../datasets/test', str(i)))

def create_dummy_files(N=10000):
    for i in range(N):
        x = np.random.random((100, 100))
        np.save(path(random.getrandbits(128)), x)
Then you can run the following two scripts and compare them yourself:
The first script, where 20 .npy files are randomly selected and loaded:
L = os.listdir('../datasets/test')
S = random.sample(L, 20)
for s in S:
    np.load(path(s))  # <- timed this
The second version, where 20 .npy 'sequential' files are selected and loaded.
L = os.listdir('../datasets/test')
i = 100
S = L[i: i + 20]
for s in S:
    np.load(path(s))  # <- timed this
I tested both scripts and ran them 100 times each (in the 2nd script I used the iteration count as the value for i so the same files are not loaded twice). I wrapped the np.load(path(s)) line with time.time() calls. I'm not timing the sampling, only the loading. Here are the results:
Random loads: times roughly stay between 0.1 s and 0.4 s, average 0.25 s.
Non-random loads: times roughly stay between 0.010 s and 0.014 s, average 0.01 s.
I'm assuming those times are related to the CPU's activity when the scripts are loaded. However, it doesn't explain this gap. Why are these two results so different? Does it have something to do with the way files are indexed?
Edit: I printed S in the random-sample script, copied the list of 20 filenames, then ran it again with S defined literally as that list. The time it took was comparable to the 'sequential' script. This means it's not related to the files not being sequential in the filesystem or anything like that. It seems the random sampling gets counted in the timer, yet the timing is defined as:
t = time.time()
np.load(path(s))
print(time.time() - t)
I also tried wrapping np.load (exclusively) with cProfile: same result.
I did say:
I tested both scripts and ran them 100 times each (in the 2nd script I used the iteration count as the value for i so the same files are not loaded twice)
But as tevemadar mentioned
i should be randomized
I completely messed up the operation of selecting different files in the second version. My code was timing the scripts 100 times like so:
for i in trange(100):
    if rand:
        S = random.sample(L, 20)
    else:
        S = L[i: i + 20]  # <- every loop there's only 1 new file added to the
                          #    selection; 19 files were already cached by the
                          #    previous fetch
For the second script, it should rather be S = L[100*i : 100*i + 20]!
And yes, when timed this way, the results are comparable.
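For reference, a corrected version of the benchmark could look like the sketch below (assuming the same dummy dataset generated above; timed_load is a hypothetical helper that times only the np.load calls, not the sampling):

import os
import random
import time

import numpy as np

DATA_DIR = '../datasets/test'   # same dummy dataset as in the question
L = os.listdir(DATA_DIR)

def timed_load(names):
    # accumulate only the time spent inside np.load
    total = 0.0
    for s in names:
        t0 = time.time()
        np.load(os.sep.join((DATA_DIR, s)))
        total += time.time() - t0
    return total

for i in range(100):
    random_batch = random.sample(L, 20)
    sequential_batch = L[100 * i: 100 * i + 20]   # disjoint slices: no file is ever re-read
    print(timed_load(random_batch), timed_load(sequential_batch))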
My script takes two movie files as an input, and writes a 2x1 array movie output (stereoscopic Side-by-Side Half-Width). The input video clips are of equal resolution (1280x720), frame rate (60), number of frames (23,899), format (mp4)...
When the write_videofile function starts processing, it provides an estimated time of completion that is very reasonable, ~20 min. But as it processes frames, it gets slower and slower (as indicated by the progress bar and the estimated completion time). In my case, the input movie clips are about 6 min long. After three minutes of processing, it indicates it will take over 3 hours to complete; after half an hour of processing, it indicates it will take over 24 hours.
I have tried the 'threads' option of the write_videofile function, but it did not help.
Any idea? Thanks for the help.
---- Script ----
from moviepy.editor import VideoFileClip, clips_array

movie_L = 'movie_L.mp4'
movie_R = 'movie_R.mp4'
output_movie = 'new_movie.mp4'

clip_L = VideoFileClip(movie_L)
(width_L, height_L) = clip_L.size
clip_L = clip_L.resize((width_L/2, height_L))

clip_R = VideoFileClip(movie_R)
(width_R, height_R) = clip_R.size
clip_R = clip_R.resize((width_R/2, height_R))

print("*** Make an array of the two movies side by side")
arrayClip = clips_array([[clip_L, clip_R]])

print("*** Write the video file")
arrayClip.write_videofile(output_movie, threads=4, audio=False)
I realize this is old, but for anyone still having this issue, be sure to add progress_bar=False to your code. E.g.:
arrayClip.write_videofile(output_movie, threads=4, audio=False, progress_bar=False)
Printing the progress bar into IDLE each time it updates takes up a ton of memory, slowing your program down until it stops completely.
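Note that in newer moviepy releases (1.0 and later) the progress_bar argument was removed; there the output is controlled through the logger argument instead, as the next answer does:

# moviepy >= 1.0: suppress progress output via the logger argument
arrayClip.write_videofile(output_movie, threads=4, audio=False, logger=None)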
I have also had problems with slow rendering. I find that it helps a lot to use multithreading and also to set the bitrate.
This is my configuration:
videoclip.write_videofile("fractal.mp4", fps=20, threads=16, logger=None, codec="mpeg4", preset="slow", ffmpeg_params=['-b:v', '10000k'])
This works very well even with preset set to slow. That setting gives better quality for the same number of bits, and if quality is not an issue, you could set it to medium or fast to gain some more speed.
How do I get the total number of frames in a video file (.avi) in Python using the OpenCV module?
If possible, what other information (resolution, fps, duration, etc.) can we get about a video file through it?
With a newer OpenCV version (I use 3.1.0) it works like this:
import cv2
cap = cv2.VideoCapture("video.mp4")
length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print( length )
And similar for other video properties cv2.CAP_PROP_*
import cv2

cap = cv2.VideoCapture(fn)
if not cap.isOpened():
    print("could not open :", fn)
    raise SystemExit

length = int(cap.get(cv2.cv.CV_CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
see here for more info.
Also, take all of this with a grain of salt: not all of those props are mandatory, and some might not be available with your capture backend / video codec.
There are two methods to determine the number of frames in a video file:
Method #1: Use built-in OpenCV properties to access the video file's meta information, which is fast and efficient but inaccurate.
Method #2: Manually loop over each frame in the video file with a counter, which is slow and inefficient but accurate.
Method #1 is fast and relies on OpenCV's video property functionality, which almost instantaneously determines the number of frames in a video file. However, there is an accuracy trade-off, since the result depends on your OpenCV and video codec versions. On the other hand, manually counting each frame will be 100% accurate, although it will be significantly slower. Here's a function that attempts Method #1 by default and, if it fails, automatically falls back to Method #2:
import cv2

def frame_count(video_path, manual=False):
    def manual_count(handler):
        frames = 0
        while True:
            status, frame = handler.read()
            if not status:
                break
            frames += 1
        return frames

    cap = cv2.VideoCapture(video_path)
    if manual:
        # Slow, inefficient but 100% accurate method
        frames = manual_count(cap)
    else:
        # Fast, efficient but inaccurate method
        try:
            frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        except Exception:
            frames = manual_count(cap)
    cap.release()
    return frames
Benchmarks
if __name__ == '__main__':
    import timeit

    start = timeit.default_timer()
    print('frames:', frame_count('fedex.mp4', manual=False))
    print(timeit.default_timer() - start, '(s)')

    start = timeit.default_timer()
    print('frames:', frame_count('fedex.mp4', manual=True))
    print(timeit.default_timer() - start, '(s)')
Method #1 results
frames: 3671
0.018054921 (s)
Method #2 results
frames: 3521
9.447095287 (s)
Note the two methods differ by 150 frames, and Method #2 is significantly slower than Method #1. Therefore, if you need speed but are willing to sacrifice accuracy, use Method #1. In situations where you can tolerate a delay but need the exact number of frames, use Method #2.
Here is how it works with Python 3.6.5 (on Anaconda) and OpenCV 3.4.2.
[Note]: You need to drop the "CV_" from the "CV_CAP_PROP_xx" for any property as given on the official OpenCV website.
import cv2
cap = cv2.VideoCapture("video.mp4")
property_id = int(cv2.CAP_PROP_FRAME_COUNT)
length = int(cv2.VideoCapture.get(cap, property_id))
print( length )
Another solution that doesn't depend on the sometimes buggy CV_CAP_PROP getters is to traverse the whole video file in a loop: increase a frame counter every time a valid frame is encountered, and stop when an invalid one comes (end of the video file).
Gathering information about the resolution is trickier because some codecs support variable resolution (similar to VBR in audio files where the bitrate is not a constant but instead covers some predefined range).
constant resolution - you need only the first frame to determine the resolution of the whole video file in this case so traversing the full video is not required
variable resolution - you need to get the resolution of every single frame (width and height) and calculate an average to get the average resolution of the video
FPS can be calculated, however here you have the same problem as with the resolution: constant (CFR) vs variable (VFR). This is more of a multi-threading problem, imho. Personally, I would use a frame counter incremented after each valid frame, while at an interval of 1 second a timer (running in a background thread) would trigger saving the current counter's value and then resetting it. You can store the values in a list in order to calculate the average/constant frame rate at the end, when you will also know the total number of frames in the video.
The disadvantage of this rather simplistic way of doing things is that you have to traverse the whole file, which - in case it's several hours long - will definitely be noticeable by the user. In this case you can be smart about it and do that in a background process while letting the user do something else while your application is gathering this information about the loaded video file.
The advantage is that no matter what video file you have as long as OpenCV can read from it you will get quite accurate results unlike the CV_CAP_PROP which may or may not work as you expect it to.
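A minimal sketch of that traversal for the constant-resolution case (the file name is illustrative):

import cv2

cap = cv2.VideoCapture("video.mp4")

frame_count = 0
width = height = None
while True:
    valid, frame = cap.read()
    if not valid:          # an invalid frame marks the end of the file
        break
    if width is None:      # constant resolution: the first frame is enough
        height, width = frame.shape[:2]
    frame_count += 1

cap.release()
print(frame_count, width, height)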
I have a simple set of code that runs Clustal Omega (a protein multiple sequence alignment program) from Python:
from Bio.Align.Applications import ClustalOmegaCommandline

segments = range(1, 9)
segments.reverse()

for segment in segments:
    in_file = '1.0 - Split FASTA Files/Segment %d.fasta' % segment
    out_file = '1.1 - Aligned FASTA Files/Segment %d Aligned.fasta' % segment
    distmat = '1.1 - Distmats/Segment %d Distmat.fasta' % segment

    cline = ClustalOmegaCommandline(infile=in_file,
                                    outfile=out_file,
                                    distmat_out=distmat,
                                    distmat_full=True,
                                    verbose=True,
                                    force=True)
    print cline
    cline()
I've done some informal tests timing how long my multiple sequence alignments (MSAs) take. On average, each one takes 4 hours, so running all 8 one after another took me 32 hours in total. That was my original intent in running it as a for loop: I could let it run and not worry about it.
However, I did yet another informal test - I took the output from the printed cline, and copied-and-pasted it into 8 separate terminal windows spread across two computers, and ran the MSAs that way. On average, each one took about 8 hours or so... but because they were all running at the same time, it took me only 8 hours to get the results.
In some ways, I've discovered parallel processing! :D
But I'm now faced with the dilemma of how to get it running in Python. I've tried looking at the following SO posts, but I still cannot seem to wrap my head around how the multiprocessing module works.
List of posts:
How do I parallelize a simple Python loop?
Perform a for-loop in parallel in Python 3.2
Parallel loop in python
how to parallelize big for loops in python
Would anybody be kind enough to share how they would parallelize this loop? Many loops I do look similar to this loop, in which I perform some action on a file and write to another file, without ever needing to aggregate the results in memory. The specific difference I am facing is the need to do file I/O, rather than aggregate results from parallel runs of the loop.
Possibly the Joblib library is what you are looking for.
Let me give you an example of its use:
import time
from joblib import Parallel, delayed

def long_function():
    time.sleep(1)

REPETITIONS = 4
Parallel(n_jobs=REPETITIONS)(
    delayed(long_function)() for _ in range(REPETITIONS))
This code runs in 1 second, instead of 4 seconds.
Adapting your code looks like this (sorry, I can't test if this is correct):
from joblib import Parallel, delayed
from Bio.Align.Applications import ClustalOmegaCommandline

def run(segment):
    in_file = '1.0 - Split FASTA Files/Segment %d.fasta' % segment
    out_file = '1.1 - Aligned FASTA Files/Segment %d Aligned.fasta' % segment
    distmat = '1.1 - Distmats/Segment %d Distmat.fasta' % segment
    cline = ClustalOmegaCommandline(infile=in_file,
                                    outfile=out_file,
                                    distmat_out=distmat,
                                    distmat_full=True,
                                    verbose=True,
                                    force=True)
    print cline
    cline()

if __name__ == "__main__":
    segments = range(1, 9)
    segments.reverse()
    Parallel(n_jobs=len(segments))(
        delayed(run)(segment) for segment in segments)
Instead of for segment in segments, write def f(segment) and then use multiprocessing.Pool().map(f, segments)
Figuring out how to put this in context is left as an exercise to the reader.
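Concretely, that suggestion could look like the sketch below (untested, built from the loop in the question):

import multiprocessing

from Bio.Align.Applications import ClustalOmegaCommandline

def run(segment):
    # same body as the original loop, one segment per call
    in_file = '1.0 - Split FASTA Files/Segment %d.fasta' % segment
    out_file = '1.1 - Aligned FASTA Files/Segment %d Aligned.fasta' % segment
    distmat = '1.1 - Distmats/Segment %d Distmat.fasta' % segment
    cline = ClustalOmegaCommandline(infile=in_file,
                                    outfile=out_file,
                                    distmat_out=distmat,
                                    distmat_full=True,
                                    verbose=True,
                                    force=True)
    cline()

if __name__ == '__main__':
    segments = list(range(8, 0, -1))    # 8 down to 1, as in the question
    pool = multiprocessing.Pool(len(segments))
    pool.map(run, segments)             # one alignment per worker process
    pool.close()
    pool.join()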
I'm looping a histogram operation over HDF5 files of ~800 MB each (equal size).
The result of each histogram is stored in text files, each with ~5 columns x 30 lines.
import time
import h5py

t0 = time.time()
for f in filelist:
    d = h5py.File(f, 'r')
    result = make_histogram(d['X'].value)
    ascii_write(result)
    print time.time() - t0
    d.close()
One pass through the loop normally seems to take ~6-7 seconds for each file.
However, at some point one pass through the loop starts taking significantly longer, and this point seems to occur rather randomly when I run the script multiple times with different files first.
I noticed in my system monitor that, at this point, the CPU is in "disk sleep".
How can I fix this?
It seems to be related to this question, but I could not find a definitive answer.