Progress bar in output depending on progress of calculation in Python

If we assume that the following code is a huge, complicated computation that runs for minutes or hours, and we want to tell the user what percentage of the work has completed, what should I do?
num = 1
for i in range(1, 100000):
    num = num * i
print(num)
I want to show users a progress bar, similar to the one displayed when installing something.
I checked here but I did not understand how to write a progress bar that tracks my code's progress.
In the examples at the linked page, they define a sleep or delay time. This is not acceptable, because we do not know in advance how long Python will take to run different code with different functions.

If your index i corresponds to your actual progress, the tqdm package is a good option. A simple example:
from tqdm import tqdm
import time

for i in tqdm(range(100000)):
    time.sleep(0.01)  # sleep 0.01 s
Output:
1%| | 1010/100000 [00:10<16:46, 98.30it/s]
Edit: The progress bar also works if the progress is not known.
def loop_without_known_length():
    # same as above, but the length is not known outside of this function
    for i in range(1000):
        yield i

for i in tqdm(loop_without_known_length()):
    time.sleep(0.01)
Output:
60it [00:00, 97.23it/s]
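Applied to the factorial-style loop from the original question, the same idea might look like the sketch below (the range is reduced to 10000 here just so it finishes quickly; the bar tracks the real iteration count, so no artificial delay estimate is needed):

```python
from tqdm import tqdm

# Wrap the question's loop in tqdm: percentage and ETA come from
# the actual number of iterations completed so far.
num = 1
for i in tqdm(range(1, 10000)):
    num = num * i
```

tqdm only needs an iterable with a known length (or a total= hint) to display the percentage and the estimated remaining time.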

Related

An accurate progress bar for loading files and transforming data using Vaex and Pandas

I am looking for a way to include a progress bar showing the remaining time when loading a file with Vaex (big data files) or transforming big data with Pandas. I have checked this thread https://stackoverflow.com/questions/3160699/python-progress-bar, but unfortunately all of the progress bar examples there are inaccurate for my needs, because the command or code finishes before the progress bar completes.
I am looking for something similar to %time, which prints the time spent by a line or a command. In my case I want to see the estimated time and a progress bar for any command, without using a for loop.
Here is my code:
from progress.bar import Bar

with Bar('Processing', max=1) as bar:
    %time sample_tolls_amount = df_panda_tolls.sample(n=4999)
    bar.next()
Processing |################################| 1/1
CPU times: total: 11.1 s
Wall time: 11.1 s
The for loop is unnecessary because I only need to run this command once. In fact, with a for loop (in the case of max=20), the progress bar was still running after the data (sample_tolls_amount) was already done. Is there a feasible way to check the progress of any command, just like %time does?
I have tried several functions, but all of them fail to show the real progress of the command.
I don't have for loops; I have commands that load or transform big data files. I want to know the progress made and the remaining time every time I run one of these commands. Just like downloading a file in a browser: you see how many GB have been downloaded and how much data remains to download.
I am looking for something easy to apply. Easy like %time (%progress).
I use these two progress bar variants, which require nothing outside the standard library and can be embedded into code quite easily.
Simple progress bar:
import time

n = 25
for i in range(n):
    time.sleep(0.1)
    progress = int(i / n * 50)
    print(f'running {i+1} of {n} {progress * "."}', end='\r', flush=True)
More elaborate progress bar:
import time

def print_progressbar(total, current, barsize=60):
    progress = int(current * barsize / total)
    completed = str(int(current * 100 / total)) + '%'
    print('[', chr(9608) * progress, ' ', completed, '.' * (barsize - progress), '] ',
          str(current) + '/' + str(total), sep='', end='\r', flush=True)

total = 600
barsize = 60
print_frequency = max(min(total // barsize, 100), 1)

print("Start Task..")
for i in range(1, total + 1):
    time.sleep(0.0001)
    if i % print_frequency == 0 or i == 1:
        print_progressbar(total, i, barsize)
print("\nFinished")
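For the one-shot-command case in the question above, a true percentage bar is generally not possible, because a single opaque call like df_panda_tolls.sample(...) exposes no progress hooks. One workaround is to show an elapsed-time spinner in a background thread while the command runs. This is only a sketch, and run_with_spinner is a hypothetical helper, not a library function:

```python
import itertools
import threading
import time

def run_with_spinner(fn, *args, **kwargs):
    # Hypothetical helper: runs fn while a background thread redraws
    # a spinner with the elapsed time, since real progress is unknown.
    done = threading.Event()
    start = time.time()

    def spin():
        for frame in itertools.cycle('|/-\\'):
            if done.is_set():
                break
            print(f'\r{frame} {time.time() - start:.1f} s elapsed', end='', flush=True)
            time.sleep(0.1)

    t = threading.Thread(target=spin)
    t.start()
    try:
        return fn(*args, **kwargs)
    finally:
        done.set()
        t.join()
        print()

# e.g. sample = run_with_spinner(df_panda_tolls.sample, n=4999)
result = run_with_spinner(sum, range(1_000_000))
```

This shows elapsed time rather than remaining time, but it at least confirms the command is still working, which is all that can be done without progress callbacks from the library itself.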

How do you make tqdm give the correct speed estimate when using an initial counter value which is not zero?

I'm trying to show a progress bar for a process that was previously interrupted and then resumed. The progress bar needs to start partially filled, showing what was completed before the interruption. I'm using the initial parameter to set the number of iterations already completed.
However, the speed shown is the total number of iterations completed, including those set via initial, divided by the time elapsed during this run, which overestimates the actual speed. Is there a way to make tqdm ignore the iterations it was initialised with when calculating the speed?
Here is a minimal reproducible example:
import tqdm
import time

prog = tqdm.tqdm(initial=50, total=100, smoothing=0.0)
for _ in range(50, 100):
    time.sleep(1)
    prog.update(1)
    prog.refresh()
prog.close()
This is happening because you have set smoothing to 0. Set smoothing to 1, or simply leave it at the default by not passing smoothing at all, and you will get the correct iteration speed.
Here is the code:
import tqdm
import time

prog = tqdm.tqdm(initial=50, total=100)
for _ in range(50, 100):
    time.sleep(1)
    prog.update(1)
    prog.refresh()
prog.close()
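To sanity-check what the correct number should be, you can compute the rate yourself over just this run's iterations. This is a sketch with the sleeps shortened so it runs fast; 0.01 s stands in for the original 1 s of work:

```python
import time

initial, total = 50, 100
start = time.time()
done_this_run = 0
for _ in range(initial, total):
    time.sleep(0.01)  # shortened stand-in for the real per-iteration work
    done_this_run += 1
elapsed = time.time() - start

# Divide by only the iterations performed in this run, not by the total,
# so the 50 pre-completed iterations don't inflate the rate.
rate = done_this_run / elapsed
print(f'{rate:.1f} it/s')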

progress bar slows down code by factor of 5 using tqdm and multiprocess

I added a progress bar to my Python 2.7 code using tqdm, but it has slowed my code down significantly: in one example, the code takes 12 seconds without the progress bar and 57 seconds with it.
The code without the progress bar looks like this:
p = mp.Pool()
combs = ...  # various combinations
result = p.map(self.parallelize, combs)
p.close()
p.join()
The code with the progress bar is as follows:
from tqdm import tqdm

p = mp.Pool()
combs = ...  # various combinations
result = list(tqdm(p.imap(self.parallelize, combs), total=5000))
p.close()
p.join()
Is there a better way that wouldn't slow down my code as much?
Could it be related to the use of map vs. imap rather than to tqdm itself? See this great answer from the community: multiprocessing.Pool: What's the difference between map_async and imap?
Also, you can adjust the update frequency of tqdm with the miniters parameter. If the slowdown really is caused by tqdm, reducing the update frequency might solve your problem.
miniters : int or float, optional. Minimum progress display update interval, in iterations. If 0 and dynamic_miniters, will automatically adjust to equal mininterval (more CPU efficient, good for tight loops). If > 0, will skip display of the specified number of iterations. Tweak this and mininterval to get very efficient loops. If your progress is erratic, with both fast and slow iterations (network, skipping items, etc.), you should set miniters=1.
https://github.com/tqdm/tqdm#usage

Python memory_profiler inconsistent plots

I have recently started using the python memory profiler from here. As a test run, I tried to profile the toy code from here following the instructions therein. I have some naive questions on the outputs I saw.
import time

@profile
def test1():
    n = 10000
    a = [1] * n
    time.sleep(1)
    return a

@profile
def test2():
    n = 100000
    b = [1] * n
    time.sleep(1)
    return b

if __name__ == "__main__":
    test1()
    test2()
This is the output using the mprof run and then mprof plot command-line options:
After removing the @profile lines, I ran the profiler again and obtained the following result:
Except for the brackets for the functions, I was expecting almost identical plots (since the code is simple), but I am seeing some significant differences, such as the ending time of the plot, variations within the brackets, etc.
Can someone please shed light into these differences?
Edit:
For small intervals, the plot with function profiling looks like:
The differences you are seeing are probably due to the fact that the information stored by @profile is counted within the total memory used by the program. There is also a slight overhead to storing this information, hence the different running times.
Also, you might get slightly different plots in different runs just due to variations in how Python manages memory.
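As a cross-check that does not involve the profiler's own bookkeeping, the standard library's tracemalloc can report the peak allocation around each call. The sketch below mirrors the toy functions above, with the sleeps shortened so it runs quickly; note this measures Python-level allocations, not the process RSS that mprof plots:

```python
import time
import tracemalloc

def test1():
    n = 10000
    a = [1] * n
    time.sleep(0.01)  # shortened from 1 s
    return a

def test2():
    n = 100000
    b = [1] * n
    time.sleep(0.01)
    return b

tracemalloc.start()
test1()
_, peak1 = tracemalloc.get_traced_memory()
tracemalloc.reset_peak()  # Python 3.9+
test2()
_, peak2 = tracemalloc.get_traced_memory()
tracemalloc.stop()

# test2 allocates a ten-times-longer list, so its peak should be larger
print(peak1, peak2)
```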

Looping at a constant rate with high precision for signal sampling

I am trying to sample a signal at 10 kHz in Python. There is no problem when I run this code (at 1 kHz):
import sched, time

i = 0

def f():  # sampling function
    s.enter(0.001, 1, f, ())
    global i
    i += 1
    if i == 1000:
        i = 0
        print("one second")

s = sched.scheduler(time.time, time.sleep)
s.enter(0.001, 1, f, ())
s.run()
When I make the interval smaller, the reported "second" starts to take longer than one second (on my computer, 1.66 s at 10e-6).
Is it possible to run a sampling function at a specific frequency in Python?
You didn't account for the code's own overhead. That error adds up each iteration and skews the "clock".
I'd suggest using a loop with time.sleep() instead (see the comments to https://stackoverflow.com/a/14813874/648265) and computing the sleep time from the next reference moment, so the inevitable error doesn't accumulate:
import time

period = 0.001
t = time.time()
while True:
    t += period
    # <...>
    # max() is needed on Windows due to sleep's behaviour with a negative argument
    time.sleep(max(0, t - time.time()))
Note that the OS scheduling will not allow you to reach precisions beyond a certain level since other processes have to preempt yours from time to time. In this case, you'll need to use some OS-specific facilities for multimedia applications or work out a solution that doesn't need this level of accuracy (e.g. sample the signal with a specialized app and work with its saved output).
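A runnable version of that pattern (finite, so it terminates; the sampling work is elided) shows the drift compensation keeping the total duration close to iterations × period:

```python
import time

period = 0.001      # 1 kHz target
iterations = 1000   # run for about one second, then stop
start = time.time()
t = start
for _ in range(iterations):
    t += period
    # <sampling work would go here>
    time.sleep(max(0.0, t - time.time()))
elapsed = time.time() - start
print(f'{iterations} iterations in {elapsed:.3f} s')
```

Because each sleep targets the absolute time of the next tick rather than a fixed delay, per-iteration overhead is absorbed instead of accumulating, so the total stays near one second even though individual ticks jitter.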
