index out of bounds// Python, dataframe, plot

index out of bounds// Python, dataframe, plot - python

I want to plot the point with max value from dataframe.
import pandas as pd
import matplotlib.pyplot as plt
dane = pd.read_table('C:\\xxx.txt', names=('rok', 'kroliki', 'lisy', 'marchewki'))
df = pd.DataFrame(dane)
data = df[1:]
data=data.astype(float)
x = int(data['kroliki'].max())
y = int(data['lisy'].max())
z = int(data['marchewki'].max())
p= data['rok'].where(data['kroliki'] == x)
q = data['rok'].where(data['lisy'] == y)
r = data['rok'].where(data['marchewki'] == z)
p1 = int(p[p.notnull()])
q1 = int(q[q.notnull()])
r1 = int(r[r.notnull()])
point = pd.DataFrame({'x':[p1],'y':[q1],'z':[r1]})
point.plot((p1,x),(q1,y),(r1,z))
I have such an error:
IndexError: index 1993 is out of bounds for axis 0 with size 4
May somebody know what is wrong with this code?
Thanks

I think that when you use Pandas to plot, it will look for indices within itself and not for values.
So, in your case, when you do:
point.plot(p1,x)
Pandas will look for the index 1993 in the x-direction, i.e, throughout all columns. In other words, you should have 1993 columns.
I tried to reproduce your problem as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=('rok', 'kroliki', 'lisy', 'marchewki'))
data = df[1:]
data=data.astype(float)
x = int(data['kroliki'].max())
y = int(data['lisy'].max())
z = int(data['marchewki'].max())
p = data['rok'].where(data['kroliki'] == x)
q = data['rok'].where(data['lisy'] == y)
r = data['rok'].where(data['marchewki'] == z)
p1 = int(p[p.notnull()])
q1 = int(q[q.notnull()])
r1 = int(r[r.notnull()])
point = pd.DataFrame({'x':[p1],'y':[q1],'z':[r1]})
point.plot((p1,x),(q1,y),(r1,z))
I get the following error:
>>> AttributeError: 'tuple' object has no attribute 'lower'
And when I run each point separately:
>>>> IndexError: index 85 is out of bounds for axis 0 with size 3
To solve it:
import matplotlib.pyplot as plt
plt.plot((point.x, point.y, point.z), (x,y,z),'ko')
And I got the following result:
Hope it helps.

Related

Calculate relative phase between two angles - python

I'm trying to calculate the relative phase between a time series of two angles. Using below, the angles are measured by the rotation derived from the xy points associated to Label A and Label B. The angles are moving in a similar direction for the first 3 time points and then deviate for the remaining 3 time points.
My understanding was that the relative phase calculation using a Hilbert transform signified that values closer to 0 ° referred to a pattern of coordination or in-phase. Conversely, values closer to 180° referred to asynchronous patterns or anti-phase. Yet when I export the results below I'm not seeing this?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import hilbert
df = pd.DataFrame({
'Time' : [1,1,2,2,3,3,4,4,5,5,6,6],
'Label' : ['A','B','A','B','A','B','A','B','A','B','A','B'],
'x' : [-2.0,-1.0,-1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0],
'y' : [-2.0,-1.0,-2.0,-1.0,-2.0,-1.0,-3.0,0.0,-4.0,1.0,-5.0,2.0],
})
x = df.groupby('Label')['x'].diff().fillna(0).astype(float)
y = df.groupby('Label')['y'].diff().fillna(0).astype(float)
df['Rotation'] = np.arctan2(y, x)
df['Angle'] = np.degrees(df['Rotation'])
df_A = df[df['Label'] == 'A'].reset_index(drop = True)
df_B = df[df['Label'] == 'B'].reset_index(drop = True)
y1 = df_A['Angle'].values
y2 = df_B['Angle'].values
ang1 = np.angle(hilbert(y1),deg=False)
ang2 = np.angle(hilbert(y2),deg=False)
f,ax = plt.subplots(3,1,figsize=(20,5),sharex=True)
ax[0].plot(y1,color='r',label='y1')
ax[0].plot(y2,color='b',label='y2')
ax[0].legend(bbox_to_anchor=(0., 1.02, 1., .102),ncol=2)
ax[1].plot(ang1,color='r')
ax[1].plot(ang2,color='b')
ax[1].set(title='Angle at each Timepoint')
phase_synchrony = 1-np.sin(np.abs(ang1-ang2)/2)
ax[2].plot(phase_synchrony)
ax[2].set(ylim=[0,1.1],title='Instantaneous Phase Synchrony',xlabel='Time',ylabel='Phase Synchrony')
plt.tight_layout()
plt.show()

By your description I would simply use
phase_synchrony = 1-np.sin(np.abs(y1-y2)/2)
The analytic representation via Hilbert Transform applies when you have only the real part of a signal you know (or assume based on reasonable principles) to be analytic, under such conditions you can find a imaginary part that makes the resulting function analytic.
But in your case you already have x and y, so you can calculate the angle directly as you done already.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import hilbert
df = pd.DataFrame({
'Time' : [1,1,2,2,3,3,4,4,5,5,6,6],
'Label' : ['A','B','A','B','A','B','A','B','A','B','A','B'],
'x' : [-2.0,-1.0,-1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0],
'y' : [-2.0,-1.0,-2.0,-1.0,-2.0,-1.0,-3.0,0.0,-4.0,1.0,-5.0,2.0],
})
x = df.groupby('Label')['x'].diff().fillna(0).astype(float)
y = df.groupby('Label')['y'].diff().fillna(0).astype(float)
df['Rotation'] = np.arctan2(y, x)
df['Angle'] = np.degrees(df['Rotation'])
df_A = df[df['Label'] == 'A'].reset_index(drop = True)
df_B = df[df['Label'] == 'B'].reset_index(drop = True)
y1 = df_A['Angle'].values
y2 = df_B['Angle'].values
# no need to compute the hilbert transforms here
f,ax = plt.subplots(3,1,figsize=(20,5),sharex=True)
ax[0].plot(y1,color='r',label='y1')
ax[0].plot(y2,color='b',label='y2')
ax[0].legend(bbox_to_anchor=(0., 1.02, 1., .102),ncol=2)
ax[1].plot(ang1,color='r')
ax[1].plot(ang2,color='b')
ax[1].set(title='Angle at each Timepoint')
# all I changed
phase_synchrony = 1-np.sin(np.abs(y1-y2)/2)
ax[2].plot(phase_synchrony)
ax[2].set(ylim=[0,1.1],title='Instantaneous Phase Synchrony',xlabel='Time',ylabel='Phase Synchrony')
plt.tight_layout()
plt.show()

Plotting Results from For Iteration

I am new to python and I want to ask how to plot a figure from for loop iteration?
Here is the code!
import numpy as np #numerical python
import matplotlib.pyplot as plt #python plotting
from math import exp #exponential math directory
T_initial = 293
T_reference = range(298,340,2)
R1_initial = 57500
R2_initial = 13300
R3_initial = 18000
R4_initial = 5600
Beta = 4150
Vin = 2.8
for i in T_reference:
R1_refe = R1_initial*exp(Beta*((1/i)-(1/T_initial)))
Rs = (R2_initial/(R2_initial+ R1_refe)) - (R4_initial/(R3_initial+R4_initial))
Vo = Vin*Rs
Vo_round = round(Vo, 3)
print(i,Vo_round)

You can plot the data like this:
for i in T_reference:
R1_refe = R1_initial*exp(Beta*((1/i)-(1/T_initial)))
Rs = (R2_initial/(R2_initial+ R1_refe)) - (R4_initial/(R3_initial+R4_initial))
Vo = Vin*Rs
Vo_round = round(Vo, 3)
plt.scatter(i, Vo_round)
plt.show()
Is this what you were looking for?

Put the values of the items you want to plot into two different arrays using the 'append' method (one for the 'x' axis and one for the 'y' axis).
Then just plot the graph with the matplotlib
It should be something like the below:
is1 = list()
vos = list()
for i in T_reference:
R1_refe = R1_initial*exp(Beta*((1/i)-(1/T_initial)))
Rs = (R2_initial/(R2_initial+ R1_refe)) - (R4_initial/(R3_initial+R4_initial))
Vo = Vin*Rs
Vo_round = round(Vo, 3)
print(i,Vo_round)
is1.append(i)
vos.append(Vo_round)
plt.plot(is1,vos)
Here is a reference for plotting

Two options without a for-loop
Create a function
def v_o(T_reference):
T_initial = 293
R1_initial = 57500
R2_initial = 13300
R3_initial = 18000
R4_initial = 5600
Beta = 4150
Vin = 2.8
R1_refe = R1_initial*exp(Beta*((1/T_reference)-(1/T_initial)))
Rs = (R2_initial/(R2_initial + R1_refe)) - (R4_initial/(R3_initial+R4_initial))
Vo = Vin*Rs
Vo_round = round(Vo, 3)
return Vo_round
Option 1: Use a pandas dataframe
import pandas as pd
import matplotlib.pyplot as plt
# create the dataframe with T_reference
df = pd.DataFrame({'t_ref': [*range(298, 340,2)]})
# Call the function to calculate v_o
df['v_o'] = df.t_ref.apply(v_o)
# plot
df.plot('t_ref', 'v_o', legend=False)
plt.show()
Option 2: use map
T_reference = [*range(298, 340,2)]
v_o = list(map(v_o, T_reference))
plt.plot(T_reference, v_o)
plt.show()
Plot
The plot from both options looks like the following

How to resolve Boolean value error in linear regression model in python?

I am trying to run a fama-macbeth regression in a python. As afirst step I am running the time series for every asset in my portfolio but I am unable to run it because I am getting an error:
'ValueError: Must pass DataFrame with boolean values only'
I am relatively new to python and have heavily relied on this forum to help me out. I hope it you can help me with this issue.
Please let me know how I can resolve this. I will be very grateful to you!
I assume this line is producing the error. Cause when I run the function without the for loop, it works perfectly.
for i in range(cols):
df_beta = RegressionRoll(df=data_set, subset = 0, dependent = data_set.iloc[:,i], independent = data_set.iloc[:,30:], const = True, parameters = 'beta',
win = 12)
The dimension of my matrix is 108x35, 30 stocks and 5 factors over 108 points. Hence I want to run a regression for every stock against the 4 factors and store the result of the coeffs in a dataframe. Sample dataframe:
Date BAS GY AI FP SGL GY LNA GY AKZA NA Market Factor
1/29/2010 -5.28% -7.55% -1.23% -5.82% -7.09% -5.82%
2/26/2010 0.04% 13.04% -1.84% 4.06% -14.62% -14.62%
3/31/2010 10.75% 1.32% 7.33% 6.61% 12.21% 12.21%
The following is the entire code:
import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor
data_set = pd.read_excel(r'C:\XXX\Research Project\Data\Regression.xlsx', sheet_name = 'Fama Macbeth')
data_set.set_index(data_set['Date'], inplace=True)
data_set.drop('Date', axis=1, inplace=True)
X = data_set.iloc[:,30:]
y = data_set.iloc[:,:30]
def RegressionRoll(df, subset, dependent, independent, const, win, parameters):
# Data subset
if subset != 0:
df = df.tail(subset)
else:
df = df
# Loopinfo
end = df.shape[0]
win = win
rng = np.arange(start = win, stop = end, step = 1)
# Subset and store dataframes
frames = {}
n = 1
for i in rng:
df_temp = df.iloc[:i].tail(win)
newname = 'df' + str(n)
frames.update({newname: df_temp})
n += 1
# Analysis on subsets
df_results = pd.DataFrame()
for frame in frames:
#print(frames[frame])
# Rolling data frames
dfr = frames[frame]
y = dependent
x = independent
if const == True:
x = sm.add_constant(dfr[x])
model = sm.OLS(dfr[y], x).fit()
else:
model = sm.OLS(dfr[y], dfr[x]).fit()
if parameters == 'beta':
theParams = model.params[0:]
coefs = theParams.to_frame()
df_temp = pd.DataFrame(coefs.T)
indx = dfr.tail(1).index[-1]
df_temp['Date'] = indx
df_temp = df_temp.set_index(['Date'])
df_results = pd.concat([df_results, df_temp], axis = 0)
if parameters == 'R2':
theParams = model.rsquared
df_temp = pd.DataFrame([theParams])
indx = dfr.tail(1).index[-1]
df_temp['Date'] = indx
df_temp = df_temp.set_index(['Date'])
df_temp.columns = [', '.join(independent)]
df_results = pd.concat([df_results, df_temp], axis = 0)
return(df_results)
cols = len(y.columns)
for i in range(cols):
df_beta = RegressionRoll(df=data_set, subset = 0, dependent = data_set.iloc[:,i], independent = data_set.iloc[:,30:], const = True, parameters = 'beta',
win = 12)
ValueError: Must pass DataFrame with boolean values only

How do I use the output of pandas.ewm.cov?

How is one intended to use the output of the pandas.ewm.cov function. I would presume that there are functions that allow you to directly use it in the form returned for multiplication, but nothing I try seems to work.
For example, suppose I take a minimal use case, stock X and Y returns timeseries in DF1, so we estimate an ewma covariance matrix, then to get the variance estimate for a portfolio of position A and B (given in DF2) I need to compute $x^T C x$, but I can't find the command to do this without writing a for loop?
# Python 3.6, pandas 0.20
import pandas as pd
import numpy as np
np.random.seed(100)
DF1 = pd.DataFrame(dict(X = np.random.normal(size = 100), Y = np.random.normal(size = 100)))
DF2 = pd.DataFrame(dict(A = np.random.normal(size = 100), B = np.random.normal(size = 100)))
COV = DF1.ewm(10).cov()
print(DF1)
print(COV)
# All of the following are invalid
print(COV.dot(DF2))
print(DF2.dot(COV))
print(COV.multiply(DF2))

The best I can figure out is this ugly piece of code
COV.reset_index().rename(columns = dict(level_0 = "index", level_1 = "variable"), inplace = True)
DF2m = pd.melt(DF2.reset_index(), id_vars = "index").sort_values("index")
MDF = pd.merge(COV, DF2m, on=["index", "variable"])
VAR = MDF.groupby("index").apply(lambda x: np.dot(np.dot(x["value"], np.matrix([x["X"], x["Y"]])), x["value"])[0,0])
I hold out hope that there is a nice way to do this...

Convert DF into Numpy Array for calculations

I have the data in a dataframe format that I will use for linear regression calculation using user-built function. Here is the code:
from sklearn.datasets import load_boston
boston = load_boston()
bos = pd.DataFrame(boston.data) # convert to DF
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
y = bos.PRICE
x = bos.drop('PRICE', axis = 1) # DROP PRICE since only want X-type variables (not Y-target)
xw = df.to_array(x)
xw = np.insert(xw,0,1, axis = 1) # to insert a column of "1" values
However, I am getting the error:
AttributeError Traceback (most recent call last)
<ipython-input-131-272f1b4d26ba> in <module>()
1 import copy
2
----> 3 xw = df.to_array(x)
AttributeError: 'int' object has no attribute 'to_array'
I am not sure where the problem. I need to pass an array of values (x in this case) to the function to execute some matrix operations
The insert function was working in a step by step code development but for some reason is failing here.
I tried:
xw = copy.deepcopy(x)
with no success
Any thoughts?

it is x.as_matrix() not df.to_array(x)
Please refer to pandas document for more detail on as_matrix()
Here is the code that work
from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
boston = load_boston()
bos = pd.DataFrame(boston.data) # convert to DF
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
y = bos.PRICE
x = bos.drop('PRICE', axis = 1) # DROP PRICE since only want X-type variables (not Y-target)
xw = x.as_matrix()
xw = np.insert(xw,0,1, axis = 1) # to insert a column of "1" values

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

index out of bounds// Python, dataframe, plot - python

Related

Calculate relative phase between two angles - python

Plotting Results from For Iteration

How to resolve Boolean value error in linear regression model in python?

How do I use the output of pandas.ewm.cov?

Convert DF into Numpy Array for calculations

Categories

Resources