To/From Paging in Python - python

All,
This may be a pretty novice question but I am stuck on how to do this in Python. What I need to do is, set the to and from params when requesting data from Panaramio.
http://www.panoramio.com/map/get_panoramas.php?set=public&from=0&to=100&minx=-180&miny=-90&maxx=180&maxy=90&size=medium&mapfilter=true
Panoramio only allows you to return 100 records at a time so I need to build out the url string to show the advancement of the sets of 100. eg. 101-200, 201-300, etc. Is there an example anywhere that will show me how to do this type of paging using Python?
Thanks,
Adam
UPDATE:
The following example seems to do what I want it to do. Now I have to figure out how to do the actual iteration from 101-200, 201-300, etc...From there I can take those values and build out my query string. Does this make sense?
def counter(low, high):
current = low
while current <= high:
yield current
current += 100
if __name__ == '__main__':
for c in counter(100, 200):
print c
UPDATE #2: I was making it harder than it should have been
def counter(low, high):
while low <= high:
yield low, high
low += 100
high += 100
for i in counter(1, 100):
print i

for number in range(1, 301, 100):
low = number
high = low + 100

Related

Function returning different result despite the same inputs in Python

Here is my function that uses the Poloniex Exchange API. It gets a dict of asks (tuples of price and amount) and then calculates the total amount of BTC that would be obtained using a given spend.
But running the function several times returns different amounts despite the dict of asks and spend remaining the same. This problem should be replicable by printing "asks" (defined below) and the function result several times.
def findBuyAmount(spend):
#getOrderBook
URL = "https://poloniex.com/public?command=returnOrderBook&currencyPair=USDT_BTC&depth=20"
#request the bids and asks (returns nested dict)
r_ab = requests.get(url = URL)
# extracting data in json format -> returns a dict in this case!
ab_data = r_ab.json()
asks = ab_data.get('asks',[])
#convert strings into decimals
asks=[[float(elem[0]), elem[1]] for elem in asks]
amount=0
for elem in asks: #each elem is a tuple of price and amount
if spend > 0:
if elem[1]*elem[0] > spend: #check if the ask exceeds volume of our spend
amount = amount+((elem[1]/elem[0])*spend) #BTC that would be obtained using our spend at this price
spend = 0 #spend has been used entirely, leading to a loop break
if elem[1]*elem[0] < spend: #check if the spend exceeds the current ask
amount = amount + elem[1] #BTC that would be obtained using some of our spend at this price
spend = spend - elem[1]*elem[0] #remainder
else:
break
return amount
If the first ask in the asks dict was [51508.93591717, 0.62723766] and spend was 1000, I would expect amount to equal (0.62723766/51508.93591717) * 1000 but I get all kinds of varied outputs instead. How can I fix this?
You get all kinds of varied outputs because you're fetching new data every time you run the function. Split the fetch and the calculation into separate functions so you can test them independently. You can also make the logic much clearer by naming your variables properly:
import requests
def get_asks(url="https://poloniex.com/public?command=returnOrderBook&currencyPair=USDT_BTC&depth=20"):
response = requests.get(url=url)
ab_data = response.json()
asks = ab_data.get('asks', [])
#convert strings into decimals
return [(float(price), qty) for price, qty in asks]
def find_buy_amount(spend, asks):
amount = 0
for price, qty in asks:
if spend > 0:
ask_value = price * qty
if ask_value >= spend:
amount += spend / price
spend = 0
else:
amount += qty
spend -= ask_value
else:
break
return amount
asks = get_asks()
print("Asks:", asks)
print("Buy: ", find_buy_amount(1000, asks))
Your math was wrong for when the ask value exceeds remaining spend; the quantity on the order book doesn't matter at that point, so the amount you can buy is just spend / price.
With the functions split up, you can also run find_buy_amount any number of times with the same order book and see that the result is, in fact, always the same.
The problem is in your "we don't have enough money" path. In that case, the amount you can buy does not depend on the amount that was offered.
if elem[1]*elem[0] > spend:
amount += spend/elem[0]

Creating a monte carlo simulation from a loop python

I am attempting to calculate the probablility of a for loop returning a value lower than 10% of the initial value input using a monte-carlo simulation.
for i in range(0, period):
if i < 1:
r=(rtn_daily[i]+sig_daily[i]*D[i])
stock = stock_initial * (1+r)
elif i >=1:
r=(rtn_daily[i]+sig_daily[i]*D[i])
stock = stock * (1+r)
print(stock)
This is the for-loop that I wish to run a large number of times (200000 as a rough number) and calculate the probability that:
stock < stock_initial * .9
I've found examples that define their initial loop as a function and then will use that function in the loop, so I have tried to define a function from my loop:
def stock_value(period):
for i in range(0, period):
if i < 1:
r=(rtn_daily[i]+sig_daily[i]*D[i])
stock = stock_initial * (1+r)
elif i >=1:
r=(rtn_daily[i]+sig_daily[i]*D[i])
stock = stock * (1+r)
return(stock)
This produces values for 'stock' that don't seem to fit the same range as before being defined as a function.
using this code I tried to run a monte-carlo simulation:
# code to implement monte-carlo simulation
number_of_loops = 200 # lower number to run quicker
for stock_calc in range(1,period+1):
moneyneeded = 0
for i in range(number_of_loops):
stock=stock_value(stock_calc)
if stock < stock_initial * 0.90:
moneyneeded += 1
#print(stock) this is to check the value of stock being produced.
stock_percentage = float(moneyneeded) / number_of_loops
print(stock_percentage)
but this returns no results outside the 10% range even when looped 200000 times, it seems the range/spread of results gets hugely reduced in my defined function somehow.
Can anyone see a problem in my defined function 'stock_value' or can see a way of implementing a monte-carlo simulation in a way I've not come across?
My full code for reference:
#import all modules required
import numpy as np # using different notation for easier writting
import scipy as sp
import matplotlib.pyplot as plt
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#collect variables provided by the user
stock_initial = float(12000) # can be input for variable price of stock initially.
period = int(63) # can be edited to an input() command for variable periods.
mrgn_dec = .10 # decimal value of 10%, can be manipulated to produce a 10% increase/decrease
addmoremoney = stock_initial*(1-mrgn_dec)
rtn_annual = np.repeat(np.arange(0.00,0.15,0.05), 31)
sig_annual = np.repeat(np.arange(0.01,0.31,0.01), 3) #use .31 as python doesn't include the upper range value.
#functions for variables of daily return and risk.
rtn_daily = float((1/252))*rtn_annual
sig_daily = float((1/(np.sqrt(252))))*sig_annual
D=np.random.normal(size=period) # unsure of range to use for standard distribution
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# returns the final value of stock after 63rd day(possibly?)
def stock_value(period):
for i in range(0, period):
if i < 1:
r=(rtn_daily[i]+sig_daily[i]*D[i])
stock = stock_initial * (1+r)
elif i >=1:
r=(rtn_daily[i]+sig_daily[i]*D[i])
stock = stock * (1+r)
return(stock)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# code to implement monte-carlo simulation
number_of_loops = 20000
for stock_calc in range(1,period+1):
moneyneeded = 0
for i in range(number_of_loops):
stock=stock_value(stock_calc)
if stock < stock_initial * 0.90:
moneyneeded += 1
print(stock)
stock_percentage = float(moneyneeded) / number_of_loops
print(stock_percentage)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Posting an answer as I don't have the points to comment. Some queries about your code - going through these might help you find an answer:
Why have you defined rtn_annual as an array, np.repeat(np.arange(0.00,0.15,0.05), 31)? Since it just repeats the values [0.0, 0.05, 0.1], why not define it as a function?:
def rtn_annual(i):
vals = [0.0, 0.05, 0.1]
return vals[i % 3]
Likewise for sig_annual, rtn_daily, and sig_daily - the contents of these are all straightforward functions of the index, so I'm not sure what the advantage could be of making them arrays.
What does D actually represent? As you've defined it, it's a random variable with mean of 0.0, and standard deviation 1.0. So around 95% of the values in D will be in the range (-2.0, +2.0) - is that what you expect?
Have you tested your stock_value() function even on small periods (e.g. from 0 to a few days) to ensure it's doing what you think it should? It's not clear from your question whether you've verified that it's ever doing the right thing, for any input, and your comment "...(possibly?)" doesn't sound very confident.
Spoiler alert - it almost certainly doesn't. In the function stock_value, your return statement is within the for loop. It will get executed the first time round, when i = 0, and the loop will never get any further than that. This would be the chief reason why the function is giving different results to the loop.
Also, where you say "returning a value lower than 10% of...", I assume you mean "returning a value at least 10% lower than...", since that's what your probability stock < stock_initial * .9 is calculating.
I hope this helps. You may want to step through your code with a debugger in your preferred IDE (idle, or thonny, or eclipse, whatever it may be) to see what your code is actually doing.

Efficiency when printing progress updates, print x vs if x%y==0: print x

I am running an algorithm which reads an excel document by rows, and pushes the rows to a SQL Server, using Python. I would like to print some sort of progression through the loop. I can think of two very simple options and I would like to know which is more lightweight and why.
Option A:
for x in xrange(1, sheet.nrows):
print x
cur.execute() # pushes to sql
Option B:
for x in xrange(1, sheet.nrows):
if x % some_check_progress_value == 0:
print x
cur.execute() # pushes to sql
I have a feeling that the second one would be more efficient but only for larger scale programs. Is there any way to calculate/determine this?
I'm a newbie, so I can't comment. An "answer" might be overkill, but it's all I can do for now.
My favorite thing for this is tqdm. It's minimally invasive, both code-wise and output-wise, and it gets the job done.
I am one of the developers of tqdm, a Python progress bar that tries to be as efficient as possible while providing as many automated features as possible.
The biggest performance sink we had was indeed I/O: printing to the console/file/whatever.
But if your loop is tight (more than 100 iterations/second), then it's useless to print every update, you'd just as well print just 1/10 of the updates and the user would see no difference, while your bar would be 10 times less overhead (faster).
To fix that, at first we added a mininterval parameter which updated the display only every x seconds (which is by default 0.1 seconds, the human eye cannot really see anything faster than that). Something like that:
import time
def my_bar(iterator, mininterval=0.1)
counter = 0
last_print_t = 0
for item in iterator:
if (time.time() - last_print_t) >= mininterval:
last_print_t = time.time()
print_your_bar_update(counter)
counter += 1
This will mostly fix your issue as your bar will always have a constant display overhead which will be more and more negligible as you have bigger iterators.
If you want to go further in the optimization, time.time() is also an I/O operation and thus has a cost greater than simple Python statements. To avoid that, you want to minimize the calls you do to time.time() by introducing another variable: miniters, which is the minimum number of iterations you want to skip before even checking the time:
import time
def my_bar(iterator, mininterval=0.1, miniters=10)
counter = 0
last_print_t = 0
last_print_counter = 0
for item in iterator:
if (counter - last_print_counter) >= miniters:
if (time.time() - last_print_t) >= mininterval:
last_print_t = time.time()
last_print_counter = counter
print_your_bar_update(counter)
counter += 1
You can see that miniters is similar to your Option B modulus solution, but it's better fitted as an added layer over time because time is more easily configured.
With these two parameters, you can manually finetune your progress bar to make it the most efficient possible for your loop.
However, miniters (or modulus) is tricky to get to work generally for everyone without manual finetuning, you need to make good assumptions and clever tricks to automate this finetuning. This is one of the major ongoing work we are doing on tqdm. Basically, what we do is that we try to calculate miniters to equal mininterval, so that time checking isn't even needed anymore. This automagic setting kicks in after mininterval gets triggered, something like that:
from __future__ import division
import time
def my_bar(iterator, mininterval=0.1, miniters=10, dynamic_miniters=True)
counter = 0
last_print_t = 0
last_print_counter = 0
for item in iterator:
if (counter - last_print_counter) >= miniters:
cur_time = time.time()
if (cur_time - last_print_t) >= mininterval:
if dynamic_miniters:
# Simple rule of three
delta_it = counter - last_print_counter
delta_t = cur_time - last_print_t
miniters = delta_it * mininterval / delta_t
last_print_t = cur_time
last_print_counter = counter
print_your_bar_update(counter)
counter += 1
There are various ways to compute miniters automatically, but usually you want to update it to match mininterval.
If you are interested in digging more, you can check the dynamic_miniters internal parameters, maxinterval and an experimental monitoring thread of the tqdm project.
Using the modulus check (counter % N == 0) is almost free compared print and a great solution if you run a high frequency iteration (log a lot).
Specially if you does not need to print for each iteration but want some feedback along the way.

Find the speed of download for a progressbar

I'm writing a script to download videos from a website. I've added a report hook to get download progress. So, far it shows the percentage and size of the downloaded data. I thought it'd be interesting to add download speed and eta.
Problem is, if I use a simple speed = chunk_size/time the speeds shown are accurate enough but jump around like crazy. So, I've used the history of time taken to download individual chunks. Something like, speed = chunk_size*n/sum(n_time_history).
Now it shows a stable download speed, but it is most certainly wrong because its value is in a few bits/s, while the downloaded file visibly grows at a faster pace.
Can somebody tell me where I'm going wrong?
Here's my code.
def dlProgress(count, blockSize, totalSize):
global init_count
global time_history
try:
time_history.append(time.monotonic())
except NameError:
time_history = [time.monotonic()]
try:
init_count
except NameError:
init_count = count
percent = count*blockSize*100/totalSize
dl, dlu = unitsize(count*blockSize) #returns size in kB, MB, GB, etc.
tdl, tdlu = unitsize(totalSize)
count -= init_count #because continuation of partial downloads is supported
if count > 0:
n = 5 #length of time history to consider
_count = n if count > n else count
time_history = time_history[-_count:]
time_diff = [i-j for i,j in zip(time_history[1:],time_history[:-1])]
speed = blockSize*_count / sum(time_diff)
else: speed = 0
n = int(percent//4)
try:
eta = format_time((totalSize-blockSize*(count+1))//speed)
except:
eta = '>1 day'
speed, speedu = unitsize(speed, True) #returns speed in B/s, kB/s, MB/s, etc.
sys.stdout.write("\r" + percent + "% |" + "#"*n + " "*(25-n) + "| " + dl + dlu + "/" + tdl + tdlu + speed + speedu + eta)
sys.stdout.flush()
Edit:
Corrected the logic. Download speed shown is now much better.
As I increase the length of history used to calculate the speed, the stability increases but sudden changes in speed (if download stops, etc.) aren't shown.
How do I make it stable, yet sensitive to large changes?
I realize the question is now more math oriented, but it'd be great if somebody could help me out or point me in the right direction.
Also, please do tell me if there's a more efficient way to accomplish this.
_count = n if count > n else count
time_history = time_history[-_count:]
time_weights = list(range(1,len(time_history))) #just a simple linear weights
time_diff = [(i-j)*k for i,j in zip(time_history[1:], time_history[:-1],time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
To make it more stable and not react when download spikes up or down you could add this as well:
_count = n if count > n else count
time_history = time_history[-_count:]
time_history.remove(min(time_history))
time_history.remove(max(time_history))
time_weights = list(range(1, len(time_history))) #just a simple linear weights
time_diff = [(i-j)*k for i,j in zip(time_history[1:], time_history[:-1],time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
This will remove highest and lowest spike in time_history which will make number displayed more stable. If you want to be picky, you probably could generate weights before removal, and then filter mapped values using time_diff.index(min(time_diff)).
Also using non-linear function (like sqrt()) for weights generation will give you better results. Oh and as I said in comments: adding statistical methods to filter times should be marginally better, but I suspect it's not worth overhead it would add.

SQLalchemy performance when iterating queries millions of time

I'm writing a disease simulation in Python, using SQLalchemy, but I'm hitting some performance issues when running queries on a SQLite file I create earlier in the simulation.
The code is below. There are more queries in the outer for loop, but what I've posted is what slowed it down to a crawl. There are 365 days, about 76,200 mosquitos, and each mosquito makes 5 contacts per day, bringing it to about 381,000 queries per simulated day, and 27,813,000 through the entire simulation (and that's just for the mosquitos). It goes along at about 2 days / hour which, if I'm calculating correctly, is about 212 queries per second.
Do you see any issues that could be fixed that could speed things up? I've experimented with indexing the fields which are used in selection but that didn't seem to change anything. If you need to see the full code, it's available here on GitHub. The function begins on line 399.
Thanks so much, in advance.
Run mosquito-human interactions
for d in range(days_to_run):
... much more code before this, but it ran reasonably fast
vectors = session.query(Vectors).yield_per(1000) #grab each vector..
for m in vectors:
i = 0
while i < biting_rate:
pid = random.randint(1, number_humans) # Pick a human to bite
contact = session.query(Humans).filter(Humans.id == pid).first() #Select the randomly-chosen human from SQLite table
if contact: # If the random id equals an ID in the table
if contact.susceptible == 'True' and m.infected == 'True' and random.uniform(0, 1) < beta: # if the human is susceptible and mosquito is infected, infect the human
contact.susceptible = 'False'
contact.exposed = 'True'
elif contact.infected == 'True' and m.susceptible == 'True': # otherwise, if the mosquito is susceptible and the human is infected, infect the mosquito
m.susceptible = 'False'
m.infected = 'True'
nInfectedVectors += 1
nSuscVectors += 1
i += 1
session.commit()

Categories