Simulating different data with decorators in Python

I am trying to force myself to understand how decorators work and how I might use them to run a function multiple times.
I am trying to simulate datasets with three variables, but they vary in their sample size and in whether the sampling was conditional or not.
So I create the population distribution that I am sampling from:
from numpy.random import normal, negative_binomial, binomial
import pandas as pd
population_N = 100000
data = pd.DataFrame({
    "Variable A": normal(0, 1, population_N),
    "Variable B": negative_binomial(1, 0.5, population_N),
    "Variable C": binomial(1, 0.5, population_N)
})
Rather than doing the following:
sample_20 = data.sample(20)
sample_50 = data.sample(50)
condition = data["Variable B"] != 0
sample_20_non_random = data[condition].sample(20)
sample_50_non_random = data[condition].sample(50)
I wanted to simplify things and make them more efficient. So I started with a super simple function to which I can pass whether the sample will be random or not.
def simple_function(data_frame, type = "random"):
    if (type == "random"):
        sample = data_frame.sample(sample_size)
    else:
        condition = data_frame["Variable B"] != 0
        sample = data_frame[condition].sample(sample_size)
    return sample
But I want to do this for more than one sample size. So I thought that, rather than writing a for-loop that can be slow, I could maybe just use a decorator. I have also tried and failed to understand their logic, so I thought this would be good practice for understanding them better.
import functools
def decorator(cache = {}, **case):
    def inner(function):
        function_name = function.__name__
        if function_name not in cache:
            cache[function_name] = function
        @functools.wraps(function)
        def wrapped_function(**kwargs):
            if cache[function_name] != function:
                cache[function_name](**case)
            else:
                function(**case)
        return wrapped_function
    return inner
@decorator(sample_size = [20, 50])
def sample(data_frame, type = "random"):
    if (type == "random"):
        sample = data_frame.sample(sample_size)
    else:
        condition = data_frame["Variable B"] != 0
        sample = data_frame[condition].sample(sample_size)
    return sample
I guess what I am not understanding is how the inheritance of the arguments works and how that then affects the iteration over the function in the decorator.
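One way to picture the argument flow is the sketch below. It is only an illustration under a few assumptions: the names for_each_sample_size and draw are made up for this example, and a plain list stands in for the DataFrame so the snippet has no dependencies. The arguments given on the @ line are fixed once, when the function is defined; the arguments given at call time arrive in the wrapper, which is where the loop over sample sizes actually happens. Note that the decorator does not remove the loop, it only hides it.

```python
import functools
import random

def for_each_sample_size(*sample_sizes):
    """Hypothetical decorator factory: call the wrapped function once per size."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            # Call-time arguments land here; sample_sizes comes from the @ line.
            return {n: function(*args, sample_size=n, **kwargs) for n in sample_sizes}
        return wrapper
    return decorator

@for_each_sample_size(20, 50)
def draw(population, sample_size, kind="random"):
    # Conditional sampling keeps only non-zero values, mirroring the Variable B filter.
    pool = population if kind == "random" else [x for x in population if x != 0]
    return random.sample(pool, sample_size)

# Deterministic stand-in population with values 0..5 (an assumption for the demo).
population = [i % 6 for i in range(1000)]
samples = draw(population, kind="conditional")
print({n: len(s) for n, s in samples.items()})  # {20: 20, 50: 50}
```

With a DataFrame, the body of draw would presumably call data_frame.sample(sample_size) instead of random.sample, and the wrapper would stay the same.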

Related

Permutation List with Variable Dependencies- UnboundLocalError

I was trying to break down the code to the simplest form before adding more variables and such. I'm stuck.
I wanted it so that when I use itertools, the first response is the permutations of tricks and the second response depends on the trick's landings() and is a permutation of that trick's corresponding landings. I want to add additional variables that branch off further from landings(), and so on.
The simplest form should print a list that looks like:
Backflip Complete
Backflip Hyper
180 Round Complete
180 Round Mega
Gumbi Complete
My Code:
from re import I
import pandas as pd
import numpy as np
import itertools
from io import StringIO
backflip = "Backflip"
one80round = "180 Round"
gumbi = "Gumbi"
tricks = [backflip,one80round,gumbi]
complete = "Complete"
hyper = "Hyper"
mega = "Mega"
backflip_landing = [complete,hyper]
one80round_landing = [complete,mega]
gumbi_landing = [complete]
def landings(tricks):
    if tricks == backflip:
        landing = backflip_landing
    elif tricks == one80round:
        landing = one80round_landing
    elif tricks == gumbi:
        landing = gumbi_landing
    return landing
for trik, land in itertools.product(tricks,landings(tricks)):
    trick_and_landing = (trik, land)
    result = (' '.join(trick_and_landing))
    tal = StringIO(result)
    tl = (pd.DataFrame((tal)))
    print(tl)
I get the error:
UnboundLocalError: local variable 'landing' referenced before assignment
Add landing = "" after def landings(tricks): to get rid of the error.
But the if checks in your function are wrong. You check whether tricks, which is a list, is equal to backflip etc., which are all strings. That's why none of the ifs are true and landing never gets a value assigned.
That question was also about permutation in python. Maybe it helps.
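As a minimal sketch of one way to get the expected list, each trick can be paired only with its own landings by looking them up per trick instead of passing the whole list to landings(). The dictionary name landings_by_trick is an assumption made for this example; the trick and landing strings come from the question.

```python
# Each trick maps to the landings that are valid for it.
landings_by_trick = {
    "Backflip": ["Complete", "Hyper"],
    "180 Round": ["Complete", "Mega"],
    "Gumbi": ["Complete"],
}

# Look up the landings one trick at a time, so the second element
# depends on the first.
combos = [
    f"{trick} {landing}"
    for trick, landings in landings_by_trick.items()
    for landing in landings
]
print("\n".join(combos))
```

Further dependent variables could be added the same way, with another dictionary keyed by landing and one more for clause in the comprehension.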

Optimize Variable From A Function In Python

I'm used to using Excel for this kind of problem but I'm trying my hand at Python for now.
Basically I have two sets of arrays, one constant, and the other's values come from a user-defined function.
This is the function, simple enough.
import scipy.stats as sp
def calculate_probability(spread, std_dev):
    return sp.norm.sf(0.5, spread, std_dev)
I have two arrays of data, one with entries that run through the calculate_probability function (these are the spreads), and the other a set of constants called expected_probabilities.
spreads = [10.5, 9.5, 10, 8.5]
expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]
The below function is what I am seeking to optimise.
import numpy as np
def calculate_mse(std_dev):
    spread_inputs = np.array(spreads)
    model_probabilities = calculate_probability(spread_inputs,std_dev)
    subtracted_vector = np.subtract(model_probabilities,expected_probabilities)
    vector_powered = np.power(subtracted_vector,2)
    mse_sum = np.sum(vector_powered)
    return mse_sum/len(spreads)
I would like to find a value of std_dev such that function calculate_mse returns as close to zero as possible. This is very easy in Excel using solver but I am not sure how to do it in Python. What is the best way?
EDIT: I've changed my calculate_mse function so that it only takes a standard deviation as a parameter to be optimised. I've tried to return Andrew's answer in an API format using flask but I've run into some issues:
class Minimize(Resource):
    std_dev_guess = 12.0 # might have a better guess than zeros
    result = minimize(calculate_mse, std_dev_guess)
    def get(self):
        return {'data': result},200
api.add_resource(Minimize,'/minimize')
This is the error:
NameError: name 'result' is not defined
I guess something is wrong with the input?
I'd suggest using scipy's optimization library. From there you have a couple of options; the easiest from your current setup would be to just use the minimize function. minimize itself has a large number of options, from simplex methods (Nelder-Mead) to BFGS (the default for unconstrained problems) and COBYLA.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
from scipy.optimize import minimize
n_params = 4 # based on your code so far
spreads_guess = np.zeros(n_params) # might have a better guess than zeros
result = minimize(calculate_mse, spreads_guess)
Give it a shot and if you have extra questions I can edit the answer and elaborate as needed.
Here's just a couple suggestions to clean up your code.
class Minimize(Resource):
    def _calculate_probability(self, spread, std_dev):
        return sp.norm.sf(0.5, spread, scale=std_dev)
    def _calculate_mse(self, std_dev):
        spread_inputs = np.array(self.spreads)
        model_probabilities = self._calculate_probability(spread_inputs, std_dev)
        mse = np.sum((model_probabilities - self.expected_probabilities)**2) / len(spread_inputs)
        print(mse)
        return mse
    def __init__(self, expected_probabilities, spreads, std_dev_guess):
        self.std_dev_guess = std_dev_guess
        self.spreads = spreads
        self.expected_probabilities = expected_probabilities
        self.result = None
    def solve(self):
        self.result = minimize(self._calculate_mse, self.std_dev_guess, method='BFGS')
    def get(self):
        return {'data': self.result}, 200
# run something like
spreads = [10.5, 9.5, 10, 8.5]
expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]
minimizer = Minimize(expected_probabilities, spreads, 10.)
print(minimizer.get()) # returns none since it hasn't been run yet, up to you how to handle this
minimizer.solve()
print(minimizer.get())
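The NameError in the question's edit is not about the input. A name assigned in a class body becomes a class attribute, and class attributes are not in scope inside methods, so a bare result inside get() is looked up as a global. The sketch below reproduces this with a made-up class Report and a trivial computation standing in for the minimize call.

```python
class Report:
    # Runs once, when the class body is executed; stored as a class attribute.
    result = sum([1, 2, 3])

    def broken(self):
        # Class-body names are not visible here, so the bare name is
        # looked up as a global and the lookup fails.
        return result

    def working(self):
        # Reach the attribute through the instance (Report.result also works).
        return self.result

report = Report()
try:
    report.broken()
except NameError as error:
    print(error)          # name 'result' is not defined
print(report.working())   # 6
```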

Python: How to run a function within another function in the same class suppressing the other function output

I am coding a class called BiVAR. In that class I have a function called ResiPlot that plots a figure, and in that function I define a variable called self.resi. The same class has another function called hzTest. In hzTest I am not interested in the plot or the printed output of ResiPlot; all I need is to make sure that self.resi has been defined.
So what I am actually looking for is a way to define self.resi inside hzTest, i.e. calling ResiPlot while suppressing its figure and any printed output.
My code:
from statsmodels.tsa.api import VAR
import matplotlib.pyplot as plt, subprocess
# Calling built-in functions from R
Rsummary = robjects.r['summary']
class BiVAR:
    def __init__(self, df, restrict=0): # Initialize when created
        self.data = np.array(df.values, dtype=float) # self is the new object
        self.isrestricted = restrict
        if self.isrestricted ==0:
            self.Model= VAR(self.data)
        else:
            p = int(input("Since, you want a restricted model please enter the lag p: "))
            self.p=p
            if p==0: p=1
            t= Rvars.VAR(self.data, p, type='const')
            self.Model= Rvars.restrict(t,method = "ser")
    def BestLagAic(self):
        if self.isrestricted==1:
            print('Sorry this can not be excuted since you chose the model to be restricted')
        else:
            R=self.Model.select_order(15)
            return R['aic'] # Split string on blanks
    def Fit(self, *parameters, **keyword_parameters):
        # This function allows you to specify the lag variable. If not specified it will use the p value you previously
        # give it for the restricted VAR model otherwise it will use the best lag based on AIC
        if self.isrestricted ==0:
            if len(parameters)==1:
                p=parameters[0]
                results = self.Model.fit(p)
                print(results.summary())
            elif len(parameters)==0:
                p=self.BestLagAic()
                results = self.Model.fit(p)
                print(results.summary())
            else:
                print('You included so many unrequired variables')
        else:
            p=self.p
            t= Rvars.VAR(self.data, p, type='const')
            t1= Rvars.restrict(t, method = "ser")
            H=str(Rsummary(t1))
            start = H.find('VAR Estimation Results:') + 23
            end = H.find('Roots of the characteristic', start)
            pvalue=H[start:end]
            start1 = H.find('Estimation results for equation') + 31
            pvalue1=H[start1::]
            print(pvalue+pvalue1)
    def ResiPlot(self, *parameters):
        # This function plots the residuals when fitted with a VAR(p) model
        if self.isrestricted ==0:
            if len(parameters)==1:
                p=parameters[0]
                results = self.Model.fit(p)
                resi=results.resid
                self.resi=pd.DataFrame(resi, columns=['Bond-Resi','Equity-Resi'])
                pd.DataFrame(resi).plot()
                plt.show()
            elif len(parameters)==0:
                p=self.BestLagAic()
                results = self.Model.fit(p)
                resi=results.resid
                self.resi=pd.DataFrame(resi, columns=['Bond-Resi','Equity-Resi'])
                pd.DataFrame(resi).plot()
                plt.show()
            else:
                print('You included so many unrequired variables')
        else:
            t= Rvars.VAR(self.data, self.p, type='const')
            t1= Rvars.restrict(t, method = "ser")
            t2=t1.rx2('varresult').rx2('y1').rx2('residuals')
            t3=t1.rx2('varresult').rx2('y2').rx2('residuals')
            resi=pd.DataFrame(np.column_stack((np.array(t2), np.array(t3))), columns=['Bond-Resi','Equity-Resi'])
            self.resi=resi
            pd.DataFrame(resi).plot()
            plt.show()
    def hzTest(self):
        print('This is the Henze-Zirkler Multivariate Normality test applied on the residuals of the fitted model')
        subprocess.call('self.ResiPlot')
        MVNresult =MVN.hzTest(self.resi, qqplot = 0)
        np.array(MVNresult.slots[tuple(MVNresult.slotnames())[1]])[0]
You need (a simple version of) multitier architecture. Don't have the functions that compute things also ask the user for their inputs and plot their results. Then you don't have to "suppress the other function's output"; you just call the computation function in different contexts, some of which produce user-visible output from it and some of which don't.
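A rough sketch of that separation, using a made-up ResidualModel class rather than the real BiVAR (the class, its method names, and the toy numbers are all assumptions for illustration): one method only computes and stores the residuals, one method presents them, and a third method that needs the residuals calls the computation directly and so produces no output.

```python
class ResidualModel:
    """Hypothetical stand-in for BiVAR: computing and presenting are separate."""

    def __init__(self, observed, fitted):
        self.observed = observed
        self.fitted = fitted
        self.resi = None

    def compute_residuals(self):
        # Pure computation: no printing, no plotting.
        self.resi = [o - f for o, f in zip(self.observed, self.fitted)]
        return self.resi

    def show_residuals(self):
        # Presentation layer; a real version would plot here instead of printing.
        for value in self.compute_residuals():
            print(f"{value:+.2f}")

    def max_abs_residual(self):
        # Needs the residuals but produces no output of its own.
        return max(abs(r) for r in self.compute_residuals())

model = ResidualModel([1.0, 2.0, 3.5], [1.5, 2.0, 3.0])
print(model.max_abs_residual())  # 0.5
```

In the question's terms, hzTest would call the computing method, and ResiPlot would call the same method and then plot.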

test getting skipped in pytest

I am trying to use pytest's parametrize with test cases that I get from a different function.
I have tried this
test_input = []
rarp_input1 = ""
rarp_output1 = ""
count =1
def test_first_rarp():
    global test_input
    config = ConfigParser.ConfigParser()
    config.read(sys.argv[2])
    global rarp_input1
    global rarp_output1
    rarp_input1 = config.get('rarp', 'rarp_input1')
    rarp_input1 =dpkt.ethernet.Ethernet(rarp_input1)
    rarp_input2 = config.get('rarp','rarp_input2')
    rarp_output1 = config.getint('rarp','rarp_output1')
    rarp_output2 = config.get('rarp','rarp_output2')
    dict_input = []
    dict_input.append(rarp_input1)
    dict_output = []
    dict_output.append(rarp_output1)
    global count
    test_input.append((dict_input[0],count,dict_output[0]))
    #assert test_input == [something something,someInt]
@pytest.mark.parametrize("test_input1,test_input2,expected1",test_input)
def test_mod_rarp(test_input1,test_input2,expected1):
    global test_input
    assert mod_rarp(test_input1,test_input2) == expected1
But the second test case is getting skipped. It says
test_mod_rarp1.py::test_mod_rarp[test_input10-test_input20-expected10]
Why is the test case getting skipped? I have checked that neither the function nor the input is wrong, because the following code works fine:
@pytest.mark.parametrize("test_input1,test_input2,expected1",[something something,someInt,someInt])
def test_mod_rarp(test_input1,test_input2,expected1):
    assert mod_rarp(test_input1,test_input2) == expected1
I have not put the actual inputs here; they are correct anyway. I also have a config file from which I read the inputs using ConfigParser. test_mod_rarp1.py is the name of the Python file where I am doing this. I basically want to know whether we can access variables (test_input in my example) set in other functions for use in parametrize, and whether that is what causes the problem here. If we can't, how do I change the scope of the variable?
Parametrization happens at collection time, when pytest imports the test module and before any test runs. Data that is only generated while the tests are running is therefore not available to parametrize, and the test is skipped.
The ideal way to achieve what you are trying to do is fixture parametrization.
The example below should make this clear; you can then apply the same logic to your case.
import pytest
input = []
def generate_input():
    global input
    input = [10,20,30]
@pytest.mark.parametrize("a", input)
def test_1(a):
    assert a < 25
def generate_input2():
    return [10, 20, 30]
@pytest.fixture(params=generate_input2())
def a(request):
    return request.param
def test_2(a):
    assert a < 25
Output:
<SKIPPED:>pytest_suites/test_sample.py::test_1[a0]
********** test_2[10] **********
<EXECUTING:>pytest_suites/test_sample.py::test_2[10]
Collected Tests
TEST::pytest_suites/test_sample.py::test_1[a0]
TEST::pytest_suites/test_sample.py::test_2[10]
TEST::pytest_suites/test_sample.py::test_2[20]
TEST::pytest_suites/test_sample.py::test_2[30]
Note that test_1 was skipped because parametrization happened before generate_input() was executed, whereas test_2 is parametrized as required.
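The timing can be illustrated without pytest. The sketch below uses a made-up parametrize function as a simplified stand-in (it is not pytest's implementation, and pytest expands parameters during collection rather than on the decorator line itself), but the effect is the same: the values are read before the code that fills them has run.

```python
cases = []

def parametrize(values):
    """Hypothetical stand-in for a parametrizing decorator (not pytest itself)."""
    snapshot = list(values)          # evaluated when the decorator line runs
    def decorate(func):
        func.cases = snapshot
        return func
    return decorate

@parametrize(cases)                  # cases is still empty at this point
def check_small(a):
    assert a < 25

def fill_cases():
    cases.extend([10, 20])           # runs later, so the decorator never sees it

fill_cases()
print(check_small.cases)   # []
print(cases)               # [10, 20]
```

Calling the generating function directly in the decorator arguments, as generate_input2() does in the fixture above, avoids the problem because the data exists by the time it is read.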

Can homogeneous functions be composed into a calculation network?

Inside a network, information (a package) can be passed between different nodes (hosts), and each host can modify its content so that it carries a different meaning. The final package depends on the hosts' input along its given route through the network.
Now I want to implement a calculation-network model that can do small jobs by following different calculation paths.
Prototype:
def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5
def link(p, r):
    p1 = p
    for x in r:
        p1 = x(p1)
    return p1
p = 100
route = [a,c,d]
result = link(p,route)
#========
target_result = 108
if result == target_result:
    pass # route is OK
I think finally I need something like this:
p with [init_payload, expected_target, passed_path, actual_calculated_result]
|
\/
[CHAOS of possible of functions networks]
|
\/
px [a,a,b,c,e] # ok this path is ok and match the target
Here are my questions; I hope you can help:
Can p determine its route(s) by inspecting the functions and their estimated results?
(1.1) For example, suppose there is a node x() on the route:
def x(p): return p / 0 # I suppose this passes compilation
Can p somehow know that this path is bad and avoid selecting it?
(1.2) Another point of confusion: if p is a self-defined class whose payload is essentially a string, and it carries the path [a,c,d], can p know that a() requires an int and avoid selecting this node?
As in 1.2, when generating the path, can I avoid mistakes like this?
def a(p): return p + 1
def b(p): return p + 2
def x(p): return p.append(1)
def y(p): return p.append(2)
full_node_list = [a,b,x,y]
path = random.sample(full_node_list, 2) # oops: x and y are trouble for an int-typed p, and a and b are trouble for a list-typed p
Please also consider the case where the path is a list of lambda functions.
PS: The whole model is not very clear in my mind yet, so any guidance and direction will be appreciated.
Thanks!
You could test each function first with a set of sample data; any function which returns consistently unusable values might then be discarded.
def isGoodFn(f):
    testData = [1,2,3,8,38,73,159] # random test input
    goodEnough = 0.8 * len(testData) # need 80% pass rate
    try:
        good = 0
        for i in testData:
            if type(f(i)) is int:
                good += 1
        return good >= goodEnough
    except:
        return False
If you know nothing about what the functions do, you will have to essentially do a full breadth-first tree search with error-checking at each node to discard bad results. If you have more than a few functions this will get very large very quickly. If you can guarantee some of the functions' behavior, you might be able to greatly reduce the search space - but this would be domain-specific, requiring more exact knowledge of the problem.
If you had a heuristic measure for how far each result is from your desired result, you could do a directed search to find good answers much more quickly - but such a heuristic would depend on knowing the overall form of the functions (a distance heuristic for multiplicative functions would be very different than one for additive functions, etc).
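The breadth-first search with error-checking could look roughly like the sketch below. It reuses a, c and d from the question; the always-failing node bad, the function name find_route and the max_length cut-off are assumptions added for the example. Without a cut-off or a heuristic the number of routes grows exponentially with the route length, which is the blow-up mentioned above.

```python
from collections import deque

def a(p): return p + 1
def c(p): return p + 3
def d(p): return p + 4
def bad(p): return p / 0   # always raises ZeroDivisionError

def find_route(start, target, nodes, max_length=4):
    """Breadth-first search over routes; branches whose node raises are pruned."""
    queue = deque([(start, [])])
    while queue:
        value, route = queue.popleft()
        if value == target and route:
            return route
        if len(route) == max_length:
            continue  # do not extend routes beyond the cut-off
        for node in nodes:
            try:
                next_value = node(value)
            except (ZeroDivisionError, TypeError, AttributeError):
                continue  # discard this branch
            queue.append((next_value, route + [node]))
    return None

route = find_route(100, 108, [a, c, d, bad])
print([f.__name__ for f in route])  # ['d', 'd']
```

Because the search is breadth-first, it returns a shortest matching route (here d twice) rather than the a, c, d route from the question.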
Your functions can raise TypeError if they are not satisfied with the data types they receive. You can then catch this exception and see whether you are passing an appropriate type. You can also catch any other exception type. But trying to call the functions and catching the exceptions can be quite slow.
You could also organize your functions into different sets depending on the argument type.
functions = { list : [some functions taking a list], int : [some functions taking an int]}
...
x = choose_function(functions[type(p)])
p = x(p)
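A runnable version of this idea might look like the following sketch. The function names and the random_route helper are made up for the example. One deliberate change from the question: the list nodes return a new list, because p.append(1) returns None and would break the route after one step.

```python
import random

def add_one(p): return p + 1
def add_two(p): return p + 2
def append_one(p): return p + [1]   # returns a new list instead of None
def append_two(p): return p + [2]

# Hypothetical registry: node functions grouped by the payload type they accept.
functions = {int: [add_one, add_two], list: [append_one, append_two]}

def random_route(p, length):
    """Apply `length` randomly chosen nodes that accept the current payload type."""
    for _ in range(length):
        node = random.choice(functions[type(p)])
        p = node(p)
    return p

print(random_route(100, 2))   # an int between 102 and 104
print(random_route([], 2))    # a list with two elements
```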
I'm somewhat confused as to what you're trying to do, but: p cannot "know about" the functions until it is run through them. By design, Python functions don't specify what type of data they operate on: e.g. a*5 is valid whether a is a string, a list, an integer or a float.
If there are some functions that might not be able to operate on p, then you could catch exceptions, for example in your link function:
def link(p, r):
    try:
        for x in r:
            p = x(p)
    except (ZeroDivisionError, AttributeError): # List whatever errors you want to catch
        return None
    return p
