Find the speed of download for a progressbar - python

I'm writing a script to download videos from a website. I've added a report hook to get download progress. So far it shows the percentage and size of the downloaded data. I thought it'd be interesting to add download speed and an ETA.
The problem is, if I use a simple speed = chunk_size/time, the speeds shown are accurate enough but jump around like crazy. So I've used the history of time taken to download individual chunks, something like speed = chunk_size*n/sum(n_time_history).
Now it shows a stable download speed, but it is most certainly wrong, because its value comes out to a few bits/s while the downloaded file visibly grows at a much faster pace.
Can somebody tell me where I'm going wrong?
Here's my code.
def dlProgress(count, blockSize, totalSize):
    global init_count
    global time_history
    try:
        time_history.append(time.monotonic())
    except NameError:
        time_history = [time.monotonic()]
    try:
        init_count
    except NameError:
        init_count = count
    percent = count*blockSize*100/totalSize
    dl, dlu = unitsize(count*blockSize)  # returns size in kB, MB, GB, etc.
    tdl, tdlu = unitsize(totalSize)
    count -= init_count  # because continuation of partial downloads is supported
    if count > 0:
        n = 5  # length of time history to consider
        _count = n if count > n else count
        time_history = time_history[-_count:]
        time_diff = [i-j for i, j in zip(time_history[1:], time_history[:-1])]
        speed = blockSize*_count / sum(time_diff)
    else:
        speed = 0
    n = int(percent//4)
    try:
        eta = format_time((totalSize-blockSize*(count+1))//speed)
    except:
        eta = '>1 day'
    speed, speedu = unitsize(speed, True)  # returns speed in B/s, kB/s, MB/s, etc.
    sys.stdout.write("\r" + percent + "% |" + "#"*n + " "*(25-n) + "| " + dl + dlu + "/" + tdl + tdlu + speed + speedu + eta)
    sys.stdout.flush()
Edit:
Corrected the logic. Download speed shown is now much better.
As I increase the length of history used to calculate the speed, the stability increases but sudden changes in speed (if download stops, etc.) aren't shown.
How do I make it stable, yet sensitive to large changes?
I realize the question is now more math oriented, but it'd be great if somebody could help me out or point me in the right direction.
Also, please do tell me if there's a more efficient way to accomplish this.

_count = n if count > n else count
time_history = time_history[-_count:]
time_weights = list(range(1, len(time_history)))  # just simple linear weights
time_diff = [(i-j)*k for i, j, k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
To make it more stable and not react when download spikes up or down you could add this as well:
_count = n if count > n else count
time_history = time_history[-_count:]
time_history.remove(min(time_history))
time_history.remove(max(time_history))
time_weights = list(range(1, len(time_history)))  # just simple linear weights
time_diff = [(i-j)*k for i, j, k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
This will remove the highest and lowest spikes in time_history, which will make the displayed number more stable. If you want to be picky, you could generate the weights before the removal and then filter the mapped values using time_diff.index(min(time_diff)).
Also, using a non-linear function (like sqrt()) to generate the weights will give you better results. Oh, and as I said in the comments: adding statistical methods to filter the times should be marginally better, but I suspect it's not worth the overhead it would add.
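Another middle ground between stability and responsiveness, not from the original post, is an exponentially weighted moving average of the instantaneous speed: old samples decay smoothly, so the display is steady, yet a sudden stall or burst still pulls the estimate over within a few chunks. A minimal sketch (the class name and the smoothing factor alpha are my choices):

```python
class SpeedEstimator:
    """Exponentially weighted moving average of download speed."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # higher = more responsive, lower = smoother
        self.speed = None    # current estimate, in bytes/second

    def update(self, chunk_bytes, elapsed_seconds):
        """Fold one chunk's (size, time) sample into the running estimate."""
        if elapsed_seconds <= 0:
            return self.speed
        sample = chunk_bytes / elapsed_seconds
        if self.speed is None:
            self.speed = sample  # first sample seeds the average
        else:
            self.speed = self.alpha * sample + (1 - self.alpha) * self.speed
        return self.speed
```

Feeding it the per-chunk deltas from time_history keeps the jitter of a plain speed = chunk_size/time at bay without the underestimate that comes from averaging over the whole history.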


Getting the results simultaneously from different sources

Python newbie here!
I am trying to write a small program for myself that gets price information from different exchanges and compares them. So far it is working great, but honestly, I want to make it better in terms of performance and efficiency.
What I mean by efficiency is that my program checks the prices step by step and prints the results. My question is: can I convert it to check the prices on the different exchanges simultaneously and print them all at the same time?
Below is the part of the code that I wrote:
# Novadax
symbol_novadax = coin_list[i] + "_USDT"
response_novadax = requests.get('https://api.novadax.com/v1/market/ticker?symbol=' + symbol_novadax)
novadax_dic = json.loads(response_novadax.content)
try:
    if "ask" in novadax_dic["data"]:
        novadax_bid_price = float(novadax_dic["data"]["bid"])
        print("novadax_bid_price " + str(novadax_bid_price))
        novadax_ask_price = float(novadax_dic["data"]['ask'])
        print("novadax_ask_price " + str(novadax_ask_price))
        if max_bid_val < novadax_bid_price:
            max_bid_val = novadax_bid_price
            max_bid_place = "novadax"
        if min_ask_val > novadax_ask_price:
            min_ask_val = novadax_ask_price
            min_ask_place = "novadax"
except:
    print(coin_list[i] + " not in novadax")
    if is_run == False:
        telegram_send.send(messages=["False novadax"], parse_mode=None)
        break

# ZT
symbol_zt = coin_list[i] + "_USDT"
response_zt = requests.get('https://www.ztb.im/api/v1/tickers')
zt_dic = json.loads(response_zt.content)
# print(next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt))
try:
    if "buy" in next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt):
        zt_bid_price = float(next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt)["buy"])
        print("zt_bid_price " + str(zt_bid_price))
        zt_ask_price = float(next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt)['sell'])
        print("zt_ask_price " + str(zt_ask_price))
        if max_bid_val < zt_bid_price:
            max_bid_val = zt_bid_price
            max_bid_place = "zt"
        if min_ask_val > zt_ask_price:
            min_ask_val = zt_ask_price
            min_ask_place = "zt"
except:
    print(coin_list[i] + " not in zt")
    if is_run == False:
        telegram_send.send(messages=["False zt"], parse_mode=None)
        break
my input is something like that:
zt_bid_price = 0.12
zt_ask_price = 0.14
novadax_bid_price = 0.13
novadax_ask_price= 0.14
To be more clear: I am not getting those results at the same time. It prints them in order, and I am planning to add more exchanges in the future, which means that if I decide to print everything at the end of the code, the output will be slightly stale. Does anyone have an idea of how I can solve this problem?
Thanks in advance!
Depending on your implementation, you can use multiprocessing to keep each ticker going. However, there is overhead for each process which, depending on your system, may or may not cause lag. You could have all the processes (tickers) running and, either on a signal or on a time interval, have each one simultaneously poll its source (with a time stamp) and return its data.
There is a learning curve. The line below will get you started.
from multiprocessing import Pool
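Since the work here is network-bound rather than CPU-bound, threads are usually a lighter-weight fit than processes; the same pool pattern works with either through the executor interface. A sketch (the per-exchange fetch functions are placeholders to be filled in from the code above, not real API wrappers):

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(func, items, max_workers=8):
    """Apply func to every item in parallel threads; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))

# Hypothetical usage: one fetcher per exchange, each returning (name, bid, ask).
# def fetch_novadax(symbol): ...
# def fetch_zt(symbol): ...
# results = run_concurrently(lambda fetch: fetch("BTC_USDT"), [fetch_novadax, fetch_zt])
```

Because all requests are in flight at once, the quotes are sampled at nearly the same instant, which matters more and more as exchanges are added.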

Syncing gifs to tempo of music results in shorter duration than expected

I'm attempting to sync a gif to the beat of music playing on Spotify, but I'm encountering a speed issue while doing so. I must be going crazy because I can't find a reason as to why this isn't working. Below is my methodology:
Take initial BPM (ex: 150) and find the Beats/Second (BPS)
BPS = BPM / 60
Find the Seconds/Beat (SPB) from the Beats/Second (BPS)
SPB = 1 / BPS
Find the Seconds/Loop (SPL) by multiplying by the number of Beats/Loop (BPL) of the .gif
SPL = SPB * BPL
Convert Seconds/Loop (SPL) to Milliseconds/Loop (MSPL)
MSPL = SPL * 1000
Divide the Milliseconds/Loop (MSPL) by the number of frames (num_frames) in the .gif to find the time required for one frame (frame_time), rounding to the nearest whole number (ties to even) since .gif frame times are only accurate to whole milliseconds
frame_time = MSPL / num_frames
Add up the total frame times (actual_duration) and loop through the frames adding or subtracting 1 millisecond until actual_duration matches ceil(MSPL) (always prioritizing a longer actual duration over a shorter one)

difference = MSPL - actual_duration
if not math.isclose(0, difference):
    # Add the difference and always prioritize longer duration compared to real duration value
    correction = int(math.ceil(difference))
    for i in range(0, abs(correction)):
        # Add/subtract corrections as necessary to get actual duration as close as possible to calculated duration
        frame_times[i % len(frame_times)] += math.copysign(1, correction)
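The whole conversion chain above collapses into a single expression; a quick sketch to make the arithmetic concrete (the function name is mine):

```python
def frame_time_ms(bpm: float, beats_per_loop: int, num_frames: int) -> float:
    """Ideal milliseconds per frame for a gif that loops once every beats_per_loop beats."""
    ms_per_loop = (60.0 / bpm) * beats_per_loop * 1000  # SPB * BPL * 1000 = MSPL
    return ms_per_loop / num_frames

# e.g. 150 BPM, 4 beats per loop, 40 frames -> about 40 ms per frame
```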
Now from this, the actual Milliseconds/Loop of the gif should always be equal to or greater than MSPL. However, when I save the .gif with the specified frame times, if the correction value is not 0 then the .gif always plays faster than expected. I have noticed that other online services providing the same "sync gif to music" functionality behave the same way, so it's not just me going crazy, I think.
Below is the actual code used to get frame times:
import math

def get_frame_times(tempo: float, beats_per_loop: int, num_frames: int):
    # Calculate the number of seconds per beat in order to get the number of milliseconds per loop
    beats_per_sec = tempo / 60
    secs_per_beat = 1 / beats_per_sec
    duration = math.ceil(secs_per_beat * beats_per_loop * 1000)
    frame_times = []
    # Try to make frame times as even as possible by dividing duration by the number of frames and rounding
    actual_duration = 0
    for _ in range(0, num_frames):
        # Rounding method: banker's rounding (ties go to the nearest even number)
        frame_time = round(duration / num_frames)
        frame_times.append(frame_time)
        actual_duration += frame_time
    # Add the difference and always prioritize a longer duration compared to the real duration value
    difference = duration - actual_duration
    if not math.isclose(0, difference):
        correction = int(math.ceil(difference))
        for i in range(0, abs(correction)):
            # Add/subtract corrections as necessary to get actual_duration as close as possible to the calculated duration
            frame_times[i % len(frame_times)] += math.copysign(1, correction)
    return frame_times
I'm saving the gif by using PIL (Pillow)'s Image module:
frame_times = get_frame_times(tempo, beats_per_loop, num_frames)
frames = []
for i in range(0, num_frames):
    # Frames are appended to the frames list here
    ...
# disposal=2 used since the frames may be transparent
frames[0].save(
    output_file,
    save_all=True,
    append_images=frames[1:],
    loop=0,
    duration=frame_times,
    disposal=2)
Is there anything that I am doing wrong here? I can't seem to find out why this isn't working and why the actual duration of the gif is much shorter than the specified frame times. It makes me feel slightly better that other sites/services that provide this functionality end up with the same results, but at the same time I feel like this should definitely be possible.
Solved! This turns out to be a limitation of the .gif format itself: frame delays are stored in whole centiseconds, so frame times are only accurate to multiples of 10 milliseconds. Upon inspecting the actual frame times of the modified image, they were being floored to the nearest multiple of 10, resulting in an overall faster playback speed than expected.
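That quantization is easy to confirm by round-tripping a tiny gif through Pillow and reading the stored frame durations back; a sketch written against Pillow's Image API:

```python
import io
from PIL import Image

# Two solid frames with deliberately non-multiple-of-10 durations (ms).
frames = [Image.new("RGB", (8, 8), color) for color in ("red", "blue")]
buf = io.BytesIO()
frames[0].save(buf, format="GIF", save_all=True,
               append_images=frames[1:], duration=[123, 456], loop=0)

# Reopen and collect the durations that actually got written to the file.
buf.seek(0)
stored = []
with Image.open(buf) as im:
    for frame in range(im.n_frames):
        im.seek(frame)
        stored.append(im.info["duration"])
print(stored)  # the requested 123/456 ms come back quantized to centiseconds
```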
To fix this, I modified the code to choose frame times in increments of 10 (again prioritizing a longer actual duration if necessary) and dispersing the frame time adjustments as evenly as possible throughout the list:
import math

def get_frame_times(tempo: float, beats_per_loop: int, num_frames: int):
    # Calculate the number of seconds per beat in order to get the number of milliseconds per loop
    beats_per_sec = tempo / 60
    secs_per_beat = 1 / beats_per_sec
    duration = round_tens(secs_per_beat * beats_per_loop * 1000)  # round_tens: helper that rounds to a multiple of 10
    frame_times = []
    # Try to make frame times as even as possible by dividing duration by the number of frames
    actual_duration = 0
    for _ in range(0, num_frames):
        frame_time = round_tens(duration / num_frames)
        frame_times.append(frame_time)
        actual_duration += frame_time
    # Adjust frame times to match the calculated duration as closely as possible, in multiples of 10
    # Keep track of which indexes we've already adjusted and attempt to spread corrections as evenly
    # as possible throughout the frame times
    correction = duration - actual_duration
    adjust_val = int(math.copysign(10, correction))
    i = 0
    seen_i = {i}
    while actual_duration != duration:
        frame_times[i % num_frames] += adjust_val
        actual_duration += adjust_val
        if i not in seen_i:
            seen_i.add(i)
        elif len(seen_i) == num_frames:
            seen_i.clear()
            i = 0
        else:
            i += 1
        i += num_frames // abs(correction // 10)
    return frame_times
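The snippet assumes a round_tens helper that isn't shown; one plausible definition, rounding half away from zero so that ties err toward the longer duration (my guess at the intent, not the author's code):

```python
def round_tens(x: float) -> int:
    """Round x (a non-negative duration in ms) to the nearest multiple of 10, ties rounding up."""
    return int((x + 5) // 10) * 10
```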

Efficiency when printing progress updates, print x vs if x%y==0: print x

I am running an algorithm which reads an Excel document row by row and pushes the rows to a SQL Server, using Python. I would like to print some sort of progress through the loop. I can think of two very simple options, and I would like to know which is more lightweight and why.
Option A:

for x in xrange(1, sheet.nrows):
    print x
    cur.execute()  # pushes to sql

Option B:

for x in xrange(1, sheet.nrows):
    if x % some_check_progress_value == 0:
        print x
    cur.execute()  # pushes to sql
I have a feeling that the second one would be more efficient but only for larger scale programs. Is there any way to calculate/determine this?
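One way to actually measure it: time both variants with timeit, printing to an in-memory buffer so the speed of the console itself doesn't dominate the result (the loop sizes are arbitrary stand-ins for sheet.nrows):

```python
import io
import timeit

def progress(n, step):
    """Loop n times, printing every step-th counter to an in-memory 'console'."""
    buf = io.StringIO()
    for x in range(1, n):
        if x % step == 0:
            print(x, file=buf)
        # cur.execute() would go here

n = 50_000
t_every = timeit.timeit(lambda: progress(n, 1), number=5)      # Option A: print every row
t_sparse = timeit.timeit(lambda: progress(n, 1000), number=5)  # Option B: every 1000th row
print(f"every row: {t_every:.3f}s   every 1000th: {t_sparse:.3f}s")
```

Even against a StringIO sink the sparse version wins; against a real terminal the gap is far larger, since terminal I/O is much slower than an in-memory write.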
I'm a newbie, so I can't comment. An "answer" might be overkill, but it's all I can do for now.
My favorite thing for this is tqdm. It's minimally invasive, both code-wise and output-wise, and it gets the job done.
I am one of the developers of tqdm, a Python progress bar that tries to be as efficient as possible while providing as many automated features as possible.
The biggest performance sink we had was indeed I/O: printing to the console/file/whatever.
But if your loop is tight (more than 100 iterations/second), then it's useless to print every update; you might as well print just 1/10 of the updates and the user would see no difference, while your bar would have 10 times less overhead (faster).
To fix that, we first added a mininterval parameter which updates the display only every x seconds (0.1 seconds by default; the human eye cannot really see anything faster than that). Something like this:
import time

def my_bar(iterator, mininterval=0.1):
    counter = 0
    last_print_t = 0
    for item in iterator:
        if (time.time() - last_print_t) >= mininterval:
            last_print_t = time.time()
            print_your_bar_update(counter)
        counter += 1
This will mostly fix your issue as your bar will always have a constant display overhead which will be more and more negligible as you have bigger iterators.
If you want to go further with the optimization, time.time() is itself a system call and thus costs more than a simple Python statement. To avoid that, you want to minimize the calls to time.time() by introducing another variable: miniters, the minimum number of iterations to skip before even checking the time:
import time

def my_bar(iterator, mininterval=0.1, miniters=10):
    counter = 0
    last_print_t = 0
    last_print_counter = 0
    for item in iterator:
        if (counter - last_print_counter) >= miniters:
            if (time.time() - last_print_t) >= mininterval:
                last_print_t = time.time()
                last_print_counter = counter
                print_your_bar_update(counter)
        counter += 1
You can see that miniters is similar to your Option B modulus solution, but it's better fitted as an added layer over time because time is more easily configured.
With these two parameters, you can manually finetune your progress bar to make it the most efficient possible for your loop.
However, miniters (or a modulus) is tricky to get working for everyone without manual fine-tuning; you need good assumptions and clever tricks to automate that tuning. This is one of the major pieces of ongoing work in tqdm. Basically, what we do is try to calculate miniters so that it matches mininterval, so that checking the time isn't even needed anymore. This automagic setting kicks in after mininterval first triggers, something like this:
from __future__ import division
import time

def my_bar(iterator, mininterval=0.1, miniters=10, dynamic_miniters=True):
    counter = 0
    last_print_t = 0
    last_print_counter = 0
    for item in iterator:
        if (counter - last_print_counter) >= miniters:
            cur_time = time.time()
            if (cur_time - last_print_t) >= mininterval:
                if dynamic_miniters:
                    # Simple rule of three
                    delta_it = counter - last_print_counter
                    delta_t = cur_time - last_print_t
                    miniters = delta_it * mininterval / delta_t
                last_print_t = cur_time
                last_print_counter = counter
                print_your_bar_update(counter)
        counter += 1
There are various ways to compute miniters automatically, but usually you want to update it to match mininterval.
If you are interested in digging more, you can check the dynamic_miniters internal parameters, maxinterval and an experimental monitoring thread of the tqdm project.
Using a modulus check (counter % N == 0) is almost free compared to a print, and it is a great solution if you run a high-frequency loop (and log a lot), especially if you don't need to print on every iteration but want some feedback along the way.

To/From Paging in Python

All,
This may be a pretty novice question, but I am stuck on how to do this in Python. What I need to do is set the to and from params when requesting data from Panoramio.
http://www.panoramio.com/map/get_panoramas.php?set=public&from=0&to=100&minx=-180&miny=-90&maxx=180&maxy=90&size=medium&mapfilter=true
Panoramio only allows you to return 100 records at a time, so I need to build out the URL string to advance through the sets of 100, e.g. 101-200, 201-300, etc. Is there an example anywhere that shows how to do this type of paging in Python?
Thanks,
Adam
UPDATE:
The following example seems to do what I want it to do. Now I have to figure out how to do the actual iteration from 101-200, 201-300, etc...From there I can take those values and build out my query string. Does this make sense?
def counter(low, high):
    current = low
    while current <= high:
        yield current
        current += 100

if __name__ == '__main__':
    for c in counter(100, 200):
        print c
UPDATE #2: I was making it harder than it should have been
def counter(low, high, stop):
    # Note: low and high both advance by 100, so "low <= high" alone would
    # never go false -- an explicit stop bound is needed to end the iteration.
    while low <= stop:
        yield low, high
        low += 100
        high += 100

for i in counter(1, 100, 300):
    print i
for number in range(1, 301, 100):
    low = number
    high = low + 100
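Putting the windowing together with the actual query string from the question, a sketch (the helper name is mine):

```python
BASE_URL = ("http://www.panoramio.com/map/get_panoramas.php"
            "?set=public&from={start}&to={end}"
            "&minx=-180&miny=-90&maxx=180&maxy=90&size=medium&mapfilter=true")

def page_urls(total, page_size=100):
    """Yield one request URL per page of page_size records, covering total records."""
    for start in range(0, total, page_size):
        yield BASE_URL.format(start=start, end=start + page_size)

urls = list(page_urls(300))
# urls[0] asks for records 0-100, urls[1] for 100-200, urls[2] for 200-300
```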

Python Beginner: Selective Printing in loops

I'm a very new Python user (with only a little prior experience in HTML/JavaScript as far as programming goes), and I was trying to find a way to output only intermittent numbers in my loop for a basic bicycle racing simulation (10,000 lines of biker positions would be pretty excessive :P).
In the loop below I tried several 'reasonable' ways to express the condition "a floating point number equals its integer floor" (int, floor division) so as to print out every 100 iterations or so:
for i in range(0, 10000):
    i = i + 1
    t = t + t_step  # t is initialized at 0 while t_step is set at .01
    acceleration_rider1 = (power_rider1 / (70 * velocity_rider1)) - (force_drag1 / 70)
    velocity_rider1 = velocity_rider1 + (acceleration_rider1 * t_step)
    position_rider1 = position_rider1 + (velocity_rider1 * t_step)
    force_drag1 = area_rider1 * (velocity_rider1 ** 2)
    acceleration_rider2 = (power_rider2 / (70 * velocity_rider1)) - (force_drag2 / 70)
    velocity_rider2 = velocity_rider2 + (acceleration_rider2 * t_step)
    position_rider2 = position_rider2 + (velocity_rider2 * t_step)
    force_drag2 = area_rider1 * (velocity_rider2 ** 2)
    if t == int(t):  # TRIED t == t // 1 AND OTHER VARIANTS THAT DON'T WORK HERE :(
        print t, "biker 1", position_rider1, "m", "\t", "biker 2", position_rider2, "m"
The for loop auto-increments for you, so you don't need i = i + 1.
You don't need t either; just use the % (modulo) operator to find multiples of a number.
# Log every 1000 lines.
LOG_EVERY_N = 1000
for i in range(1000):
    ...  # calculations with i
    if (i % LOG_EVERY_N) == 0:
        print "logging: ..."
To print out every 100 iterations, I'd suggest
if i % 100 == 0: ...
If you'd rather not print the very first time, then maybe
if i and i % 100 == 0: ...
(as another answer noted, the i = i + 1 is supererogatory given that i is the control variable of the for loop anyway -- it's not particularly damaging though, just somewhat superfluous, and is not really relevant to the issue of why your if doesn't trigger).
While basing the condition on t may seem appealing, t == int(t) is unlikely to work unless the t_step is a multiple of 1.0 / 2**N for some integer N -- fractions cannot be represented exactly in a float unless this condition holds, because floats use a binary base. (You could use decimal.Decimal, but that would seriously impact the speed of your computation, since float computation are directly supported by your machine's hardware, while decimal computations are not).
The other answers suggest that you use the integer variable i instead. That also works, and is the solution I would recommend. This answer is mostly for educational value.
I think it's a roundoff error that is biting you. Floating point numbers can often not be represented exactly, so adding .01 to t for 100 times is not guaranteed to result in t == 1:
>>> sum([.01]*100)
1.0000000000000007
So when you compare to an actual integer number, you need to build in a small tolerance margin. Something like this should work:
if abs(t - int(t)) < 1e-6:
    print t, "biker 1", position_rider1, "m", "\t", "biker 2", position_rider2, "m"
You can use the Python library tqdm (tqdm derives from the Arabic word taqaddum (تقدّم), which can mean "progress") to show progress, and use tqdm's write() method to print intermittent log statements, as answered by #Stephen.
Why is tqdm useful in your case?
It shows a compact & fancy progress bar with a very minimal code change.
It does not fill your console with thousands of log statements, yet shows accurate iteration progress of your for loop.
Caveats:
It writes its output to stdout only, so it cannot be combined directly with the logging library, though you can redirect it to a log file very easily.
It adds a little performance overhead.
Code
from tqdm import tqdm
from time import sleep

# Log every 100 lines.
LOG_EVERY_N = 100
for i in tqdm(range(1, 1000)):
    if i % LOG_EVERY_N == 0:
        tqdm.write(f"logging : {i}")
    sleep(0.5)
How to install?
pip install tqdm
