How can I fix this AttributeError? I have a pandas DataFrame on which I do some slicing and transformation, and I want to plot the results of the persistence_model function as shown below.
Edit:
I want a function that plots the results of persistence_model with a custom title and custom x- and y-axis labels, and that draws a horizontal line on the same plot.
    class ResidualErrors():
        def __init__(self, data: pd.Series):
            self.data = data

        def _persistence_forecast_model_nrows(self, test_rows):
            slicer = test_rows + 1
            errors = self.data[-slicer:].diff().dropna()
            return errors

        def _persistence_forecast_model_percrows(self, train_perc):
            n = len(self.data)
            ntrain = int(n * train_perc)
            errors = self.data[ntrain:].diff().dropna()
            return errors

        def persistence_model(self, test_rows=None, train_perc=None):
            if (not test_rows) and (not train_perc):
                raise TypeError(r"Please provide 'test_rows' or 'train_perc' arguments.")
            if test_rows and train_perc:
                raise TypeError(r"Please choose one argument either 'test_rows' or 'train_perc'.")
            if test_rows:
                return self._persistence_forecast_model_nrows(test_rows)
            else:
                return self._persistence_forecast_model_percrows(train_perc)

        @classmethod
        def plot_residuals(obj):
            obj.plot()
            plt.show()
Desired output
res = ResidualErrors(data).persistence_model(test_rows=10)
res.plot_residuals()
>> AttributeError: 'Series' object has no attribute 'plot_residuals'
You need to be more aware of what methods return. The first step creates a ResidualErrors object:
res = ResidualErrors(data)
The second step creates a DataFrame or Series:
obj = res.persistence_model(test_rows=10)
You can call plot_residuals on res but not on obj, as you are currently doing:
res.plot_residuals(obj)
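A minimal sketch of one way to get the customised plot described in the Edit, assuming plot_residuals becomes a regular method that takes the Series returned by persistence_model. The parameters title, xlabel, ylabel and hline are hypothetical names that are not in the original code, and only the test_rows branch is reproduced here.

```python
import pandas as pd


class ResidualErrors:
    def __init__(self, data: pd.Series):
        self.data = data

    def persistence_model(self, test_rows: int) -> pd.Series:
        # Residuals of a naive "previous value" forecast over the last rows.
        return self.data.iloc[-(test_rows + 1):].diff().dropna()

    def plot_residuals(self, errors, title="", xlabel="", ylabel="", hline=0.0):
        # Imported lazily so the class stays usable without a plotting backend.
        import matplotlib.pyplot as plt

        ax = errors.plot(title=title)
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        # Horizontal reference line at the (hypothetical) hline value.
        ax.axhline(y=hline, color="red", linestyle="--")
        plt.show()


data = pd.Series([1.0, 3.0, 6.0, 10.0, 15.0])
res = ResidualErrors(data)                    # a ResidualErrors object
errors = res.persistence_model(test_rows=2)   # a pandas Series
# res.plot_residuals(errors, title="Residuals", xlabel="t",
#                    ylabel="error", hline=errors.mean())
```

The plotting call is left commented out so the snippet runs without opening a window. The point is that the method is called on res and receives the Series as an argument.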
I have a function that does some calculations based on a filtered queryset. I am seeking to pass arguments to the function to inform the filter. The basics:
A view to test the output:
    def TestPlayerData(request):
        data = PlayerData(1,2)
        print(data)
        return HttpResponse(data)
The function called is:
    def PlayerData(game, player):
        """This is a function based on EventFilterView"""
        opponent = [2]
        qs = Event.objects.filter(g_players=player).filter(g_name__in=game).filter(g_players__in=opponent)
        count_wins = len(qs.filter(g_winner=player))
        count_played = len(qs.filter(g_players=player))
        if count_played == 0:
            win_ratio = 'na'
        else:
            win_ratio = count_wins/count_played
        return count_wins, count_played, win_ratio
The error received:
"TypeError: 'int' object is not iterable"
However, if I explicitly name the variables in the function rather than pass them from the view, the function works as expected -- like this:
    def PlayerData():
        """This is a function based on EventFilterView"""
        game = [1]
        player = 2
        opponent = [2]
        qs = Event.objects.filter(g_players=player).filter(g_name__in=game).filter(g_players__in=opponent)
        count_wins = len(qs.filter(g_winner=player))
        count_played = len(qs.filter(g_players=player))
        if count_played == 0:
            win_ratio = 'na'
        else:
            win_ratio = count_wins/count_played
        return count_wins, count_played, win_ratio
I am obviously missing some basic python understanding here and would appreciate a point in the right direction.
In the first block of code, you call data = PlayerData(1,2), so game is the int 1.
PlayerData() then uses it in .filter(g_name__in=game). The __in lookup expects an iterable (a list, tuple or queryset), and an int isn't one, which raises the TypeError.
In your final example, where you hard-code the values inside the function, game is the list [1], whereas in the first block of code it was an int. Passing a list from the view, e.g. PlayerData([1], 2), works the same way.
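If the function should accept either a single id or a list of ids, one option is to normalise the argument before it reaches the __in lookup. The helper below is a hypothetical sketch (as_list is not a Django function) and is plain Python, so it runs without a Django project.

```python
def as_list(value):
    """Return value as a list, wrapping a single item if necessary."""
    # Strings are iterable, but here they are treated as one item.
    if isinstance(value, (str, bytes)):
        return [value]
    try:
        return list(value)
    except TypeError:
        # Non-iterables such as int end up here.
        return [value]


single = as_list(1)
several = as_list([1, 2])

# Inside PlayerData the filter could then read (not executed here):
# qs = Event.objects.filter(g_name__in=as_list(game))
```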
I am having some problems adding two class objects together.
This is the code given to me, which will run MY file, the HyperLogLog and a sample text file:
    import HyperLogLog
    import sys

    hlls = [HyperLogLog.HyperLogLog() for _ in range(5)]
    with open(sys.argv[1], "r") as file:
        for line in file:
            cleanLine = line.replace("\n", "")
            (cmd, set, value) = cleanLine.split(" ")[:3]
            # See if this was an add, count, or merge command
            if cmd == "A":
                hlls[int(set)].add(value)
            elif cmd == "C":
                estimate = hlls[int(set)].count()
                print("Estimate:", estimate, "Real count:", value)
            elif cmd == "M":
                (cmd, m1, m2, m3) = cleanLine.split(" ")
                hlls[int(m3)] = hlls[int(m1)] + hlls[int(m2)]
The bottom-most line merges hlls(set m1) and hlls(set m2). Each hlls(set x) stores a single attribute M, which is my HyperLogLog vector. I need to write an __add__ method to make the addition in that line work. I have done this as follows:
    class HyperLogLog:
        def __init__(self):
            self.M = [0 for x in range(m)]

        ##############
        # Code altering the self.M
        ##############

        def __add__(self, other):
            Sum=other.M
            for i,value in enumerate(other.M):
                if value<self.M[i]:
                    Sum[i]=self.M[i]
            self.M=Sum
            return self
This will return the correct value for the m3 set. But it will also alter the self.M value of set m1. How can I return something other than self, which will make hlls[int(m3)] an instance of the HyperLogLog class, with the merged self.M value?
If I just return Sum, hlls[int(m3)] is no longer an instance of the HyperLogLog class.
If I change self.M as I do, I alter the self.M value of hlls[int(m1)].
If I do something like:
    def __add__(self, other):
        Sum=other.M
        for i,value in enumerate(other.M):
            if value<self.M[i]:
                Sum[i]=self.M[i]
        self2=self
        self2.M=Sum
        return self2
The value of self.M of instance hlls[int(m1)] is still changed. I don't understand why.
When you do this:
self2=self
Both self and self2 point to the same object, so when one is changed the other one is changed as well. The easiest fix would be to create a new HyperLogLog object, so you would replace the line above with:
self2=HyperLogLog()
The following line doesn't create a new object instance. It just assigns another name to the same object:
    self2=self
You should create a new HyperLogLog object in the __add__ method.
Something like this:
    def __add__(self, other):
        retval = HyperLogLog()
        retval.M = [max(a, b) for a, b in zip(self.M, other.M)]
        return retval
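A small self-contained sketch of that approach, checking that neither operand is modified. NUM_REGISTERS is a hypothetical stand-in for the question's undefined m, and the registers are filled by hand rather than by the real add method. Note that in the question's version, Sum=other.M is also only a second name for other's list, so that code changes other.M as well.

```python
NUM_REGISTERS = 8  # hypothetical stand-in for the question's `m`


class HyperLogLog:
    def __init__(self):
        self.M = [0] * NUM_REGISTERS

    def __add__(self, other):
        # Build a brand-new object; self and other are only read.
        merged = HyperLogLog()
        merged.M = [max(a, b) for a, b in zip(self.M, other.M)]
        return merged


a = HyperLogLog()
b = HyperLogLog()
a.M[0], a.M[1] = 1, 5   # fill a few registers by hand for the demo
b.M[0], b.M[1] = 3, 2

c = a + b
print(c.M[:2], a.M[:2], b.M[:2])
```

The merged object holds the element-wise maximum, while a.M and b.M keep their original values.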
I am trying to implement the Scipy script from section "Simplifying the syntax" here: http://scipy-cookbook.readthedocs.io/items/FittingData.html
My code is quite long, so I'll post only the parts that seem to be the problem.
I get the following error message: TypeError: unsupported operand type(s) for *: 'int' and 'Parameter'. I understand where it happens: it's the product in this part: return self.amplitude() * np.exp(-1*self.decay_const()*x)
    class Plot():
        def __init__(self,slice_and_echo,first_plot,right_frame):
            self.slice_and_echo = slice_and_echo
            self.first_plot = first_plot
            self.right_frame = right_frame
            self.amplitude = Parameter(1)
            self.decay_const = Parameter(1)

        def function(self,x):
            print(self.amplitude)
            print(self.amplitude())
            return self.amplitude() * np.exp(-1*self.decay_const()*x)

        def create_plot(self):
            plot_figure = Figure(figsize=(10,10), dpi=100)
            self.the_plot = plot_figure.add_subplot(111)
            self.the_plot.plot(self.echoes,self.average,'ro')
            print(self.amplitude())
            self.fit_parameters = self.fit(self.function,[self.amplitude,self.decay_const],self.average)
            print(self.fit_parameters)

        def fit(self,function, parameters, y, x=None):
            def f(params):
                i = 0
                for p in parameters:
                    p.set(params[i])
                    i += 1
                return y - function(x)

            if x is None: x = np.arange(y.shape[0])
            p = [param for param in parameters]
            return optimize.leastsq(f, p)
and the Parameter() class is the same as in the link:
    class Parameter:
        def __init__(self, value):
            self.value = value

        def set(self, value):
            self.value = value

        def __call__(self):
            return self.value
The issue seems to be that, when I call self.amplitude() inside of the create_plot(self): method, the value it returns is an integer (which is what I want!). But that doesn't happen when I call it inside of the function(self,x) method; when I print it inside this method I get: <__main__.Parameter object at 0x1162845c0> instead of the integer 1.
Why would it return different values when called from different methods in the same class? What am I missing here?
Thank you!
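You have a typo in the list comprehension. Your code states:
    p = [param for param in parameters]
and the example code states:
    p = [param() for param in parameters]
Note that in your case you are generating a list of objects of type Parameter instead of a list of numbers. leastsq then passes those objects back to f, where p.set(params[i]) stores a Parameter as each parameter's value, so self.amplitude() returns a Parameter object by the time function is called.
By the way, check out the lmfit module; it simplifies fitting routines a great deal.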
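The difference between the two comprehensions can be seen without running a fit. This is a plain-Python sketch that reuses the Parameter class from the question; the variable names as_objects, as_numbers and error_message are only for illustration.

```python
class Parameter:
    def __init__(self, value):
        self.value = value

    def set(self, value):
        self.value = value

    def __call__(self):
        return self.value


parameters = [Parameter(1), Parameter(0.5)]

as_objects = [param for param in parameters]    # the question's version
as_numbers = [param() for param in parameters]  # the cookbook's version

# Multiplying an int by a Parameter object reproduces the reported error.
try:
    -1 * as_objects[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(as_numbers)
print(error_message)
```

Only as_numbers is a suitable starting guess for optimize.leastsq, because it holds plain numbers.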
I wrote the following program:
    def split_and_add(invoer):
        rij = invoer.split('=')
        rows = []
        for line in rij:
            rows.append(process_row(line))
        return rows

    def process_row(line):
        temp_coordinate_row = CoordinatRow()
        rij = line.split()
        for coordinate in rij:
            coor = process_coordinate(coordinate)
            temp_coordinate_row.add_coordinaterow(coor)
        return temp_coordinate_row

    def process_coordinate(coordinate):
        cords = coordinate.split(',')
        return Coordinate(int(cords[0]),int(cords[1]))

    bestand = file_input()
    rows = split_and_add(bestand)
    for row in range(0,len(rows)-1):
        rij = rows[row].weave(rows[row+1])
        print(rij)
With this class:
    class CoordinatRow(object):
        def __init__(self):
            self.coordinaterow = []

        def add_coordinaterow(self, coordinate):
            self.coordinaterow.append(coordinate)

        def weave(self,other):
            lijst = []
            for i in range(len(self.coordinaterow)):
                lijst.append(self.coordinaterow[i])
                try:
                    lijst.append(other.coordinaterow[i])
                except IndexError:
                    pass
            self.coordinaterow = lijst
            return self.coordinaterow
However there is an error in
    for row in range(0,len(rows)-1):
        rij = rows[row].weave(rows[row+1])
        print(rij)
The outcome of the print statement is as follows:
[<Coordinates.Coordinate object at 0x021F5630>, <Coordinates.Coordinate object at 0x021F56D0>]
It seems as if the program doesn't access the actual object and print it. What am I doing wrong here?
This isn't an error. This is exactly what it means for Python to "access the actual object and print it". This is what the default string representation for a class looks like.
If you want to customize the string representation of your class, you do that by defining a __repr__ method. The typical way to do it is to write a method that returns something that looks like a constructor call for your class.
Since you haven't shown us the definition of Coordinate, I'll make some assumptions here:
    class Coordinate(object):
        def __init__(self, x, y):
            self.x, self.y = x, y

        # your other existing methods

        def __repr__(self):
            return '{}({}, {})'.format(type(self).__name__, self.x, self.y)
If you don't define this yourself, you end up inheriting __repr__ from object, which looks something like:
        return '<{} object at {:#010x}>'.format(type(self).__qualname__, id(self))
Sometimes you also want a more human-readable version of your objects. In that case, you also want to define a __str__ method:
        def __str__(self):
            return '<{}, {}>'.format(self.x, self.y)
Now:
>>> c = Coordinate(1, 2)
>>> c
Coordinate(1, 2)
>>> print(c)
<1, 2>
But notice that the __str__ of a list calls __repr__ on all of its members:
>>> cs = [c]
>>> print(cs)
[Coordinate(1, 2)]
I am using pandas.rolling_apply to fit data to a distribution and get a value from it, but I also need it to report a rolling goodness of fit (specifically, a p-value). Currently I'm doing it like this:
    def func(sample):
        fit = genextreme.fit(sample)
        return genextreme.isf(0.9, *fit)

    def p_value(sample):
        fit = genextreme.fit(sample)
        return kstest(sample, 'genextreme', fit)[1]

    values = pd.rolling_apply(data, 30, func)
    p_values = pd.rolling_apply(data, 30, p_value)
    results = pd.DataFrame({'values': values, 'p_value': p_values})
The problem is that I have a lot of data, and the fit function is expensive, so I don't want to call it twice for every sample. What I'd rather do is something like this:
    def func(sample):
        fit = genextreme.fit(sample)
        value = genextreme.isf(0.9, *fit)
        p_value = kstest(sample, 'genextreme', fit)[1]
        return {'value': value, 'p_value': p_value}

    results = pd.rolling_apply(data, 30, func)
Where results is a DataFrame with two columns. If I try to run this, I get an exception:
    TypeError: a float is required
Is it possible to achieve this, and if so, how?
I had a similar problem and solved it by passing a member function of a separate helper class to apply. That member function returns a single value, as required, but I store the other calculation results as members of the class and can use them afterwards.
Simple Example:
    class CountCalls:
        def __init__(self):
            self.counter = 0

        def your_function(self, window):
            retval = f(window)
            self.counter = self.counter + 1
            return retval

    TestCounter = CountCalls()
    your_seriesOrDataframeColumn.rolling(window=your_window_size).apply(TestCounter.your_function)
    print(TestCounter.counter)
Assume your function f returns a tuple of two values v1, v2. Then you can return v1 and assign it to column_v1 in your dataframe. The second value v2 you simply accumulate in a Series series_val2 within the helper class. Afterwards you just assign that series as a new column to your dataframe.
JML
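The idea described above could be sketched as follows. The names KeepSecondValue and first are hypothetical, and f is a cheap stand-in for the expensive fit. The second value is stored under the window's last index label, which assumes raw=False so that each window arrives as a Series carrying its index.

```python
import pandas as pd


def f(window):
    # Hypothetical stand-in for an expensive computation with two results.
    return window.mean(), window.max()


class KeepSecondValue:
    def __init__(self, func):
        self.func = func
        self.second = {}  # second result, keyed by the window's last label

    def first(self, window):
        v1, v2 = self.func(window)
        self.second[window.index[-1]] = v2
        return v1  # rolling.apply only accepts a single number


series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
helper = KeepSecondValue(f)

values = series.rolling(window=3).apply(helper.first, raw=False)
results = pd.DataFrame({"value": values, "second": pd.Series(helper.second)})
print(results)
```

Keying by index label rather than appending to a list means the second column lines up with the first even though the leading windows are incomplete and produce NaN.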
I had a similar problem before. Here's my solution for it:
    from collections import deque

    class your_multi_output_function_class:
        def __init__(self):
            self.deque_2 = deque()
            self.deque_3 = deque()

        def f1(self, window):
            # Compute all outputs once and queue the extra ones for f2 and f3
            self.k = somefunction(window)
            self.deque_2.append(self.k[1])
            self.deque_3.append(self.k[2])
            return self.k[0]

        def f2(self, window):
            return self.deque_2.popleft()

        def f3(self, window):
            return self.deque_3.popleft()

    func = your_multi_output_function_class()
    output = your_pandas_object.rolling(window=10).agg(
        {'a':func.f1,'b':func.f2,'c':func.f3}
    )
I used and loved @yi-yu's answer so I made it generic:
    from collections import deque
    from functools import partial

    def make_class(func, dim_output):

        class your_multi_output_function_class:
            def __init__(self, func, dim_output):
                assert dim_output >= 2
                self.func = func
                self.deques = {i: deque() for i in range(1, dim_output)}

            def f0(self, *args, **kwargs):
                k = self.func(*args, **kwargs)
                for queue in sorted(self.deques):
                    self.deques[queue].append(k[queue])
                return k[0]

            def accessor(self, index, *args, **kwargs):
                return self.deques[index].popleft()

        klass = your_multi_output_function_class(func, dim_output)
        for i in range(1, dim_output):
            # accessor is only reachable through the class or the instance
            f = partial(klass.accessor, i)
            setattr(klass, 'f' + str(i), f)
        return klass
Given a function f of a pandas Series (windowed, but not necessarily) returning n values, you use it this way:
    rolling_func = make_class(f, n)
    # dict to map the function's outputs to new columns. Eg:
    agger = {'output_' + str(i): getattr(rolling_func, 'f' + str(i)) for i in range(n)}
    windowed_series.agg(agger)
I also had the same issue. I solved it by creating a global data frame and filling it from the rolling function. In the following example script, I generate some random input data. Then, I calculate the min, the max and the mean with a single rolling apply function.
    import pandas as pd
    import numpy as np

    def myFunction(array):
        global index
        global outputDF
        # Some random operation
        outputDF.loc[index, 'min'] = np.nanmin(array)
        outputDF.loc[index, 'max'] = np.nanmax(array)
        outputDF.loc[index, 'mean'] = np.nanmean(array)
        index += 1
        # Returning a useless variable
        return 0

    if __name__ == "__main__":
        # A random window size
        windowSize = 10
        # Preparing some random input data
        inputDF = pd.DataFrame({ 'randomValue': np.random.rand(500) })
        # Pre-allocate memory
        outputDF = pd.DataFrame({ 'min': [np.nan] * len(inputDF),
                                  'max': [np.nan] * len(inputDF),
                                  'mean': [np.nan] * len(inputDF)
                                  })
        # Set the starting index (due to the window size)
        d = (windowSize - 1) / 2
        index = int(np.floor(d))
        # Do the rolling apply here
        inputDF['randomValue'].rolling(window=windowSize,center=True).apply(myFunction,args=())
        assert index + int(np.ceil(d)) == len(inputDF), 'Length mismatch'
        outputDF.index = inputDF.index
        # Optional : Clean the nulls
        outputDF.dropna(inplace=True)
        print(outputDF)