Creating a Monte Carlo simulation from a loop in Python

I am attempting to calculate the probability of a for loop returning a value lower than 10% of the initial value input using a Monte Carlo simulation.
for i in range(0, period):
    if i < 1:
        r=(rtn_daily[i]+sig_daily[i]*D[i])
        stock = stock_initial * (1+r)
    elif i >=1:
        r=(rtn_daily[i]+sig_daily[i]*D[i])
        stock = stock * (1+r)
print(stock)
This is the for-loop that I wish to run a large number of times (200000 as a rough number) and calculate the probability that:
stock < stock_initial * .9
I've found examples that define their initial loop as a function and then will use that function in the loop, so I have tried to define a function from my loop:
def stock_value(period):
    for i in range(0, period):
        if i < 1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock_initial * (1+r)
        elif i >=1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock * (1+r)
        return(stock)
This produces values for 'stock' that don't seem to fit the same range as before being defined as a function.
Using this code, I tried to run a Monte Carlo simulation:
# code to implement monte-carlo simulation
number_of_loops = 200 # lower number to run quicker
for stock_calc in range(1,period+1):
    moneyneeded = 0
    for i in range(number_of_loops):
        stock=stock_value(stock_calc)
        if stock < stock_initial * 0.90:
            moneyneeded += 1
        #print(stock) this is to check the value of stock being produced.
    stock_percentage = float(moneyneeded) / number_of_loops
    print(stock_percentage)
However, this returns no results outside the 10% range even when looped 200000 times; it seems the range/spread of results gets hugely reduced in my defined function somehow.
Can anyone see a problem in my defined function 'stock_value' or can see a way of implementing a monte-carlo simulation in a way I've not come across?
My full code for reference:
#import all modules required
import numpy as np # using different notation for easier writing
import scipy as sp
import matplotlib.pyplot as plt
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#collect variables provided by the user
stock_initial = float(12000) # can be input for variable price of stock initially.
period = int(63) # can be edited to an input() command for variable periods.
mrgn_dec = .10 # decimal value of 10%, can be manipulated to produce a 10% increase/decrease
addmoremoney = stock_initial*(1-mrgn_dec)
rtn_annual = np.repeat(np.arange(0.00,0.15,0.05), 31)
sig_annual = np.repeat(np.arange(0.01,0.31,0.01), 3) #use .31 as python doesn't include the upper range value.
#functions for variables of daily return and risk.
rtn_daily = float((1/252))*rtn_annual
sig_daily = float((1/(np.sqrt(252))))*sig_annual
D=np.random.normal(size=period) # unsure of range to use for standard distribution
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# returns the final value of stock after 63rd day(possibly?)
def stock_value(period):
    for i in range(0, period):
        if i < 1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock_initial * (1+r)
        elif i >=1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock * (1+r)
        return(stock)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# code to implement monte-carlo simulation
number_of_loops = 20000
for stock_calc in range(1,period+1):
    moneyneeded = 0
    for i in range(number_of_loops):
        stock=stock_value(stock_calc)
        if stock < stock_initial * 0.90:
            moneyneeded += 1
        print(stock)
    stock_percentage = float(moneyneeded) / number_of_loops
    print(stock_percentage)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Posting an answer as I don't have the points to comment. Some queries about your code - going through these might help you find an answer:
Why have you defined rtn_annual as an array, np.repeat(np.arange(0.00,0.15,0.05), 31)? Since it just repeats the values [0.0, 0.05, 0.1], why not define it as a function?:
def rtn_annual(i):
    vals = [0.0, 0.05, 0.1]
    # np.repeat repeats each value 31 times in a row, so index by block
    return vals[i // 31]
Likewise for sig_annual, rtn_daily, and sig_daily - the contents of these are all straightforward functions of the index, so I'm not sure what the advantage could be of making them arrays.
What does D actually represent? As you've defined it, it's a random variable with mean of 0.0, and standard deviation 1.0. So around 95% of the values in D will be in the range (-2.0, +2.0) - is that what you expect?
Have you tested your stock_value() function even on small periods (e.g. from 0 to a few days) to ensure it's doing what you think it should? It's not clear from your question whether you've verified that it's ever doing the right thing, for any input, and your comment "...(possibly?)" doesn't sound very confident.
Spoiler alert - it almost certainly doesn't. In the function stock_value, your return statement is within the for loop. It will get executed the first time round, when i = 0, and the loop will never get any further than that. This would be the chief reason why the function is giving different results to the loop.
Also, where you say "returning a value lower than 10% of...", I assume you mean "returning a value at least 10% lower than...", since that's what your probability stock < stock_initial * .9 is calculating.
I hope this helps. You may want to step through your code with a debugger in your preferred IDE (IDLE, Thonny, Eclipse, whatever it may be) to see what your code is actually doing.
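To illustrate the point about the return statement, here is a minimal sketch rather than a drop-in fix. It returns only after the loop has finished, and it draws fresh random shocks for every path: in the question's code D is generated once at module level, so every call to stock_value reuses the same numbers. The constant 5% annual return and 30% annual volatility, and the names simulate_final_value and prob_below, are assumptions made for illustration and are not from the original code.

```python
import numpy as np

def simulate_final_value(stock_initial, rtn_daily, sig_daily, rng):
    """Return the stock value at the end of one randomly simulated path."""
    shocks = rng.normal(size=len(rtn_daily))  # new draws for every path
    stock = stock_initial
    for mu, sigma, z in zip(rtn_daily, sig_daily, shocks):
        stock *= 1 + mu + sigma * z
    return stock  # returned once, after the last day

def prob_below(stock_initial, rtn_daily, sig_daily,
               threshold=0.9, n_trials=2000, seed=0):
    """Estimate P(final value < threshold * stock_initial) by simulation."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        final = simulate_final_value(stock_initial, rtn_daily, sig_daily, rng)
        if final < stock_initial * threshold:
            hits += 1
    return hits / n_trials

period = 63
rtn_daily = np.full(period, 0.05 / 252)           # assumed 5% annual return
sig_daily = np.full(period, 0.30 / np.sqrt(252))  # assumed 30% annual volatility
p = prob_below(12000.0, rtn_daily, sig_daily)
print(p)
```

With zero volatility the estimate should be exactly 0, which is a quick sanity check that the randomness is what produces the spread.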

Related

When writing a loop to sum a bunch of things together, is it better to add them to a variable in every iteration or put in a list and sum them?

I am testing the performance of a stock trading strategy. For a basket of stocks, I want to loop through each stock, get their returns, volatility, and Sharpe ratio, then average these 3 metrics out across all the stocks.
In summing up the results from each stock, should I add them to a variable in every iteration or put them in a list and call sum()? It probably works either way, but each time there is a similar choice, I wonder which way is better.
I wrote out the non-list version:
# test the average performance on a basket of stocks
def trial_on_basket(stock_list, start, end, short_MA, long_MA):
    total_returns = 0
    total_volatility = 0
    total_sharpe = 0
    for i in stock_list:
        returns, volatility, sharpe = trial(i, start, end, short_MA, long_MA) #this function tests the performance on one stock
        total_returns += returns
        total_volatility += volatility
        total_sharpe += sharpe
    average_returns = total_returns / len(stock_list)
    average_volatility = total_volatility / len(stock_list)
    average_sharpe = total_sharpe / len(stock_list)
    return average_returns, average_volatility, average_sharpe
General suggestions welcomed on how to improve the code. Thanks!
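For comparison, the list-based alternative mentioned in the question might look like the sketch below. It assumes the per-stock results have already been collected as (returns, volatility, sharpe) tuples; the helper name average_metrics and the sample numbers are hypothetical. Both approaches do the same amount of work, so the choice is mostly about readability.

```python
from statistics import fmean

def average_metrics(results):
    """Average each column of a list of (returns, volatility, sharpe) tuples."""
    if not results:
        raise ValueError("results must not be empty")
    # zip(*results) regroups the rows into one tuple per metric
    return tuple(fmean(column) for column in zip(*results))

# Hypothetical per-stock results, standing in for calls to trial(...)
results = [(0.10, 0.20, 0.50), (0.30, 0.40, 1.50)]
avg_returns, avg_volatility, avg_sharpe = average_metrics(results)
print(avg_returns, avg_volatility, avg_sharpe)
```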

How to break out of a while loop when the value stops changing between iterations?

How can I break out of a while loop by comparing the new value of a variable with the previous one?
The code is:
Calc = (PTO2.h - coefficients[1]) / coefficients[0]
while True:
    NewP = IAPWS97(P=PH5, s=Calc)
    Calc = (NewP.h - coefficients[1]) / coefficients[0]
    print(NewP.h)
The results are the following:
3181.2423174700475, 3125.5329929699737, 3145.170908432667,
3138.216970209225, 3140.675480138904, 3139.805801319479,
3140.1133819014494, 3140.0045917261796, 3140.043069467109,
3140.029460245017, 3140.034273686281, 3140.032571219946,
3140.033173365131
The idea is to stop when the value does not increase anymore, i.e. 3140 should be the final value.
This problem could be solved with 5 or 6 iterations.
Please check if this meets your requirements:
# Define required difference
difference = 0.1
solutions = []
# Add 1st solution or guess
# func is whatever function you are using, with the required arguments, arg.
solutions.append(func(arg))
delta = solutions[-1]
# Check if delta is still positive and if the delta meets the requirements
while delta > 0 and delta > difference:
    solutions.append(func(arg))
    delta = solutions[-2] - solutions[-1]
print(f'Final result is: {solutions[-1]}')
This solution supposes that you want to finish execution if the variation starts to go negative. If the sign doesn't matter, you can use abs() in delta.
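Since the printed values oscillate around the limit, the abs() variant is probably what is wanted here. A minimal sketch of that idea follows; it uses math.cos as a stand-in update function because the IAPWS97 call from the question cannot be reproduced here, and the name iterate_until_converged, the tolerance, and the iteration cap are all assumptions.

```python
import math

def iterate_until_converged(step, x0, tol=1e-6, max_iter=100):
    """Apply `step` repeatedly until two successive values differ by less than `tol`."""
    previous = x0
    for iteration in range(1, max_iter + 1):
        current = step(previous)
        # abs() makes the test independent of the sign of the change
        if abs(current - previous) < tol:
            return current, iteration
        previous = current
    raise RuntimeError(f"no convergence within {max_iter} iterations")

# Stand-in example: the fixed point of cos(x), approached by oscillation
value, iterations = iterate_until_converged(math.cos, 1.0)
print(value, iterations)
```

The iteration cap guards against an endless loop if the values never settle.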

Python Code for Standard Deviation with data from SQLITE3

from math import *
import sqlite3
ages=sqlite3.connect('person.sqlite3')
def main():
    ageslist=ages.execute("SELECT age from person")
    #average age
    for row in ageslist:
        row[0]
    average = (sum(row[0]))/len(row[0])
    #subtracts average x from x or opposite and square, depending on n
    for n in range(len(ageslist) - 1):
        if numbers[n] > average:
            numbers.append((ageslist[n] - average)**2)
        if numbers[n] < average:
            numbers.append((average - ageslist[n])**2)
    #takes square rt of the sum of all these numbers and divides by n-1
    Stdv = math.sqrt(sum(ageslist))/(len(ageslist)-1)
    end=time()
    print(Stdv)
main()
I am trying to find the standard deviation of the ages from an SQLITE3 db. However, I am getting the current error:
average = (sum(row[0]))/len(row[0])
TypeError: 'int' object is not iterable
How can I correct this?
The query sent to the database connection returns an iterator. You can only pass over that iterator once before it is flushed from memory. Here is some correction to your code to do what you are asking.
import math
import sqlite3
conn = sqlite3.connect('person.sqlite3')
def main():
    ages_iterator = conn.execute("SELECT age from person")
    # this turns the iterator into an actual list, which you need for stdev
    age_list = [a[0] for a in ages_iterator]
    # average age
    average = sum(age_list) / float(len(age_list))
    # squared difference from the average; because you are squaring the
    # difference, it does not matter if it is greater or less than the average
    numbers = [(age-average)**2 for age in age_list]
    # divides the sum of these numbers by n-1, then takes the square root
    Stdv = math.sqrt(sum(numbers)/float(len(numbers)-1))
    print(Stdv)
main()
Some quick comments in the code..
for row in ageslist:
    row[0] # This statement does nothing
average = (sum(row[0]))/len(row[0]) # This statement will not have a row value to reference because your rows in ageslist will have been iterated through
When you execute ageslist=ages.execute("SELECT age from person"), your ageslist variable is now an iterable object. Once you iterate through it you can no longer reference values in it without executing the database command again.
So I believe you should have a variable that sums the age during every row iteration in the for loop and also another variable that keeps a count of the number of entries in the database. This could be done in the for loop as well. Although I'm sure there is a more "pythonic" way to accomplish this.
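One possibly more "pythonic" route is to read the column into a list once and hand it to the standard library's statistics module. The sketch below is self-contained only because it builds a throwaway in-memory table; the sample ages are made up, and only the table and column names (person, age) come from the question.

```python
import sqlite3
import statistics

# In-memory stand-in for person.sqlite3, filled with made-up ages
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (age INTEGER)")
conn.executemany(
    "INSERT INTO person (age) VALUES (?)",
    [(2,), (4,), (4,), (4,), (5,), (5,), (7,), (9,)],
)

# Consume the cursor exactly once, keeping the values in a list
ages = [row[0] for row in conn.execute("SELECT age FROM person")]

sample_stdev = statistics.stdev(ages)       # divides by n - 1
population_stdev = statistics.pstdev(ages)  # divides by n
print(sample_stdev, population_stdev)
conn.close()
```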

How to speed up Python string matching code

I have this code which computes the Longest Common Subsequence between random strings to see how accurately one can reconstruct an unknown region of the input. To get good statistics I need to iterate it many times but my current python implementation is far too slow. Even using pypy it currently takes 21 seconds to run once and I would ideally like to run it 100s of times.
#!/usr/bin/python
import random
import itertools
#test to see how many different unknowns are compatible with a set of LCS answers.
def lcs(x, y):
    n = len(x)
    m = len(y)
    # table is the dynamic programming table
    table = [list(itertools.repeat(0, n+1)) for _ in xrange(m+1)]
    for i in range(n+1): # i=0,1,...,n
        for j in range(m+1): # j=0,1,...,m
            if i == 0 or j == 0:
                table[i][j] = 0
            elif x[i-1] == y[j-1]:
                table[i][j] = table[i-1][j-1] + 1
            else:
                table[i][j] = max(table[i-1][j], table[i][j-1])
    # Now, table[n][m] is the length of LCS of x and y.
    return table[n][m]
def lcses(pattern, text):
    return [lcs(pattern, text[i:i+2*l]) for i in xrange(0,l)]
l = 15
#Create the pattern
pattern = [random.choice('01') for i in xrange(2*l)]
#create text start and end and unknown.
start = [random.choice('01') for i in xrange(l)]
end = [random.choice('01') for i in xrange(l)]
unknown = [random.choice('01') for i in xrange(l)]
lcslist= lcses(pattern, start+unknown+end)
count = 0
for test in itertools.product('01',repeat = l):
    test=list(test)
    testlist = lcses(pattern, start+test+end)
    if (testlist == lcslist):
        count += 1
print count
I tried converting it to numpy but I must have done it badly as it actually ran more slowly. Can this code be sped up a lot somehow?
Update. Following a comment below, it would be better if lcses used a recurrence directly which gave the LCS between pattern and all sublists of text of the same length. Is it possible to modify the classic dynamic programming LCS algorithm somehow to do this?
The recurrence table (table) is being recomputed 15 times on every call to lcses(), when it depends only on m and n, where m has a maximum value of 2*l and n is at most 3*l.
If your program only computed table once, it would be dynamic programming, which it currently is not. A Python idiom for this would be
table = None
def use_lcs_table(m, n, l):
    global table
    if table is None:
        table = lcs(2*l, 3*l)
    return table[m][n]
Except using a class instance would be cleaner and more extensible than a global table declaration. But this gives you an idea of why it's taking so much time.
Added in reply to comment:
Dynamic Programming is an optimization that requires a trade-off of extra space for less time. In your example you appear to be doing a table pre-computation in lcs() but you build the whole list on every single call and then throw it away. I don't claim to understand the algorithm you are trying to implement, but the way you have it coded, it either:
Has no recurrence relation, thus no grounds for DP optimization, or
Has a recurrence relation, the implementation of which you bungled.
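For reference, the classic LCS-length recurrence can be written so that only two rows of the table are alive at any time, which reduces allocation per call. This is a Python 3 sketch of the standard algorithm, not the sliding-window variant the question's update asks about; the function name lcs_length is an assumption.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of sequences x and y."""
    previous = [0] * (len(y) + 1)  # row for the empty prefix of x
    for xi in x:
        current = [0]  # column 0: empty prefix of y
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]

print(lcs_length("AGGTAB", "GXTXAYB"))
```

It works on any pair of sequences with comparable items, including the lists of '0' and '1' characters used in the question.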

Python Beginner: Selective Printing in loops

I'm a very new python user (had only a little prior experience with html/javascript as far as programming goes), and was trying to find some ways to output only intermittent numbers in my loop for a basic bicycle racing simulation (10,000 lines of biker positions would be pretty excessive :P).
I tried in this loop several 'reasonable' ways to communicate a condition where a floating point number equals its integer floor (int, floor division) to print out every 100 iterations or so:
for i in range (0,10000):
    i = i + 1
    t = t + t_step #t is initialized at 0 while t_step is set at .01
    acceleration_rider1 = (power_rider1 / (70 * velocity_rider1)) - (force_drag1 / 70)
    velocity_rider1 = velocity_rider1 + (acceleration_rider1 * t_step)
    position_rider1 = position_rider1 + (velocity_rider1 * t_step)
    force_drag1 = area_rider1 * (velocity_rider1 ** 2)
    acceleration_rider2 = (power_rider2 / (70 * velocity_rider1)) - (force_drag2 / 70)
    velocity_rider2 = velocity_rider2 + (acceleration_rider2 * t_step)
    position_rider2 = position_rider2 + (velocity_rider2 * t_step)
    force_drag2 = area_rider1 * (velocity_rider2 ** 2)
    if t == int(t): #TRIED t == t // 1 AND OTHER VARIANTS THAT DON'T WORK HERE:(
        print t, "biker 1", position_rider1, "m", "\t", "biker 2", position_rider2, "m"
The for loop auto increments for you, so you don't need to use i = i + 1.
You don't need t, just use % (modulo) operator to find multiples of a number.
# Log every 1000 lines.
LOG_EVERY_N = 1000
for i in range(1000):
    ... # calculations with i
    if (i % LOG_EVERY_N) == 0:
        print "logging: ..."
To print out every 100 iterations, I'd suggest
if i % 100 == 0: ...
If you'd rather not print the very first time, then maybe
if i and i % 100 == 0: ...
(as another answer noted, the i = i + 1 is supererogatory given that i is the control variable of the for loop anyway -- it's not particularly damaging though, just somewhat superfluous, and is not really relevant to the issue of why your if doesn't trigger).
While basing the condition on t may seem appealing, t == int(t) is unlikely to work unless the t_step is a multiple of 1.0 / 2**N for some integer N -- fractions cannot be represented exactly in a float unless this condition holds, because floats use a binary base. (You could use decimal.Decimal, but that would seriously impact the speed of your computation, since float computation are directly supported by your machine's hardware, while decimal computations are not).
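The integer-counter check suggested above can be sketched in Python 3 as follows. The interval of 100 and the list named logged (used here only to record which steps were printed) are illustrative assumptions; the physics update is left as a placeholder comment.

```python
LOG_EVERY_N = 100  # assumed logging interval

logged = []
for i in range(1000):
    # ... per-step calculations would go here ...
    # `i and ...` skips the very first iteration, where i == 0
    if i and i % LOG_EVERY_N == 0:
        logged.append(i)
        print(f"step {i}")
```

Because i is an integer, the comparison is exact and no tolerance is needed.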
The other answers suggest that you use the integer variable i instead. That also works, and is the solution I would recommend. This answer is mostly for educational value.
I think it's a roundoff error that is biting you. Floating point numbers can often not be represented exactly, so adding .01 to t for 100 times is not guaranteed to result in t == 1:
>>> sum([.01]*100)
1.0000000000000007
So when you compare to an actual integer number, you need to build in a small tolerance margin. Something like this should work:
if abs(t - round(t)) < 1e-6:  # round() also catches values just below an integer
    print t, "biker 1", position_rider1, "m", "\t", "biker 2", position_rider2, "m"
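A small Python 3 sketch of both effects: accumulating a step of 0.01 drifts away from the integer, while a step that is a power of two stays exact. The helper name is_near_integer and its tolerance are assumptions. It compares against round(value) rather than int(value), so that a value slightly below an integer is also treated as a match.

```python
def is_near_integer(value, tol=1e-6):
    """True if value lies within tol of the nearest integer."""
    return abs(value - round(value)) < tol

# 0.01 has no exact binary representation, so the running total drifts
t_decimal = 0.0
for _ in range(100):
    t_decimal += 0.01

# 1/128 is a power of two, so every partial sum is exact
t_binary = 0.0
for _ in range(128):
    t_binary += 1.0 / 128

print(t_decimal == 1.0, is_near_integer(t_decimal))
print(t_binary == 1.0, is_near_integer(t_binary))
```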
You can use the Python library tqdm (the name derives from the Arabic word taqaddum (تقدّم), which can mean "progress") to show progress, and use tqdm's write() method to print intermittent log statements, as answered by @Stephen.
Why using tqdm is useful in your case?
Shows compact & fancy progress bar with very minimal code change.
Does not fill your console with thousands of log statement and yet shows accurate iteration progress of your for loop.
Caveats:
Cannot use the logging library, as it writes output to stdout only, though you can easily redirect that to a log file.
Adds a little performance overhead.
Code
from tqdm import tqdm
from time import sleep
# Log every 100 lines.
LOG_EVERY_N = 100
for i in tqdm(range(1,1000)):
    if i%LOG_EVERY_N == 0:
        tqdm.write(f"logging : {i}")
    sleep(0.5)
How to install?
pip install tqdm
Sample GIF that shows console output
