Function for counting the number of oscillations - Python

I'm trying to build a counter that detects the number of oscillations in a given data set.
I'm following a method where the slope at each point is calculated and the count is based on changes between the negative and positive direction.
Is there a pre-existing function for this?
I'm using the following code, and I'm unable to leave out the cells with zero values after taking the difference between consecutive cells.
import pandas as pd
import xlsxwriter
from asammdf import MDF
import numpy as np
dat = MDF("file_name.dat")
app = dat.get('variable_name')
df = pd.DataFrame(app)
print(df)
data = df.loc[0, 0:]
#time step = T
T = 0.01
# Number of sample points
N = len(data)
# sample spacing
x = np.linspace(0.0, N*T, N, endpoint=False)
x1 = data.diff()
print(x1)
df1_1 = pd.DataFrame([x1])
df1_1 = df1_1.replace(0, np.nan)
df1_1 = df1_1.dropna(how='all', axis=0)
df1_1 = df1_1.dropna()
df1 = pd.DataFrame.transpose(df1_1)
df1.to_csv("output.csv")
My data looks like this:
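There is no single built-in pandas counter for this, but the slope-sign-change idea only needs a few lines of NumPy, and scipy.signal.find_peaks is the closest pre-existing helper. A minimal sketch, assuming data is the Series extracted in the code above:
import numpy as np
from scipy.signal import find_peaks
values = data.to_numpy()
# Slope between consecutive samples; drop zero slopes so flat segments
# do not break the sign sequence.
slopes = np.diff(values)
slopes = slopes[slopes != 0]
# A sign change between consecutive slopes marks a direction reversal;
# one full oscillation contains two reversals (a peak and a trough).
reversals = np.count_nonzero(np.diff(np.sign(slopes)) != 0)
n_oscillations = reversals // 2
print(n_oscillations)
# Alternatively, count the peaks directly with an existing function.
peaks, _ = find_peaks(values)
print(len(peaks))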

Related

How to calculate the Sharpe ratio in different time intervals?

import pandas as pd
import numpy as np

bt_dict = {
    'position_strategy1': df1_1hour,
    'position_strategy2': df2_6hour
}

def backtest(bt_dict):
    # ohlc_df is one hour timeframe
    ohlc_df['date'] = pd.to_datetime(ohlc_df['date'])
    ohlc_df.set_index('date', inplace=True)
    all_df = pd.DataFrame(index=ohlc_df.index)
    all_df['close'] = ohlc_df['close']
    for strategy_name, strategy_df in bt_dict.items():
        bt_dict[strategy_name] = strategy_df[['date', 'position']].rename(columns={"position": f"position_{strategy_name}"}).dropna()
        bt_dict[strategy_name]['date'] = pd.to_datetime(bt_dict[strategy_name]['date'])
        bt_dict[strategy_name].set_index('date', inplace=True)
        all_df[f'position_{strategy_name}'] = bt_dict[strategy_name]
    all_df = all_df.fillna(method='ffill')
    all_df['position'] = all_df['position_strategy1'] * 0.6 + \
                         all_df['position_strategy2'] * 0.4
    all_df = all_df.dropna()
    all_df['pnl'] = all_df['position'].shift(1) * (all_df['close'] / all_df['close'].shift(1) - 1)
    sharpe_ratio = all_df['pnl'].mean() / all_df['pnl'].std() * np.sqrt(365 * 24)
    return sharpe_ratio
For example, I have two strategies, one on a 1-hour data frame and one on a 6-hour data frame, and I want to combine them and calculate the Sharpe ratio.
I tried to calculate across multiple timeframes, but the result was wrong.
I hope to find the right way to calculate the Sharpe ratio across different timeframes.
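A common source of wrong numbers here is combining positions that live on different time grids. A sketch of one way to handle it, using synthetic stand-in data (hourly_close, pos_1h and pos_6h are hypothetical placeholders for the real close prices and strategy positions): reindex the slower strategy onto the faster grid, forward-fill, and then compute the PnL and Sharpe ratio on that single grid.
import numpy as np
import pandas as pd
# Synthetic hourly prices and hourly/6-hourly positions (placeholders).
idx_1h = pd.date_range('2023-01-01', periods=24 * 30, freq='h')
hourly_close = pd.Series(100 * np.exp(np.random.randn(len(idx_1h)).cumsum() * 0.001), index=idx_1h)
pos_1h = pd.Series(np.random.choice([-1, 0, 1], len(idx_1h)), index=idx_1h)
idx_6h = idx_1h[::6]
pos_6h = pd.Series(np.random.choice([-1, 0, 1], len(idx_6h)), index=idx_6h)
# Bring the 6-hour positions onto the hourly grid: each 6-hour signal
# is held until the next one arrives.
pos_6h_hourly = pos_6h.reindex(idx_1h).ffill()
# Weighted combined position, now on a single (hourly) timeframe.
position = 0.6 * pos_1h + 0.4 * pos_6h_hourly
# PnL uses the previous bar's position applied to the current bar's return.
ret = hourly_close.pct_change()
pnl = position.shift(1) * ret
# Annualise with the number of hourly bars per year.
sharpe = pnl.mean() / pnl.std() * np.sqrt(365 * 24)
print(sharpe)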

Fast numpy operation on part of dataframe

I have a pandas dataframe with several columns. Two of them are date and time, and the others are numerical.
I need to perform a fast in-place calculation on the numerical part of the dataframe. Currently I ignore the first two columns, convert the numerical columns to a numpy array, and use that array further down the code.
However, I want to keep the processed numerical values in the dataframe without touching date and time.
Now:
# tanh norm
def tanh_ret():
    data = df.to_numpy()
    mu = np.mean(data)
    std = np.std(data)
    return 0.5 * (np.tanh(0.01 * ((data - mu) / std)) + 1)

del df['Date']
del df['Time']
nums = tanh_ret()
del df
What I want: normalize 3 of the 5 df columns in-place.
Mind that the dataset is large, so I would prefer as little data copying as possible while still being reasonably fast.
Create a random pandas dataframe
I consider 5 columns of random values; you can put whatever you want there. The Time and Date columns are set to a constant value.
import datetime as dt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random((100,5)))
now = dt.datetime.now()
df['Time'] = now.strftime('%H:%M:%S')
df['Date'] = now.strftime('%m/%d/%Y')
In-place numerical processing
def tanh_ret(data):
    mu = data.mean()
    std = data.std()
    return 0.5 * (np.tanh(0.01 * ((data - mu) / std)) + 1)

num_cols = df.columns[df.dtypes != 'object']
df[num_cols] = df[num_cols].transform(tanh_ret)
Alternatively:
tan_map = {col: tanh_ret for col in num_cols}
df[num_cols] = df.transform(tan_map)
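If the frame ever contains non-numeric columns that are not stored as object dtype (for example real datetimes), selecting the numeric columns explicitly may be more robust; a small variation on the same idea:
# Pick only the strictly numeric columns; Date and Time are left untouched.
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].transform(tanh_ret)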

Convert create.bspline.basis(rangval, nbasis, norder=norder, breaks=breaks) in R to BSpline(t, c, degree) in Python

I am trying to convert the function create.bspline.basis(rangval, nbasis, norder=norder, breaks=breaks) from R to Python.
I have tried using the BSpline(t, c, degree) function from scipy.interpolate but cannot seem to get the same results as I got in R.
Here is my R code:
library('fda')
df <- read.csv('data.csv', header = T)
df <- df[,1] # convert data frame to vector. Vector has a length of 1941.
rangval <- c(1, length(df))
breaks = seq(1,length(df),length.out=length(df)/60)
norder = 6
nbasis = length(breaks) - 2 + norder
bbasis = create.bspline.basis(rangval,nbasis,norder=norder,breaks=breaks)
plot(bbasis)
Here is my Python code:
from scipy.interpolate import BSpline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
Load the data file as a data frame:
df = pd.read_csv(r'RData\Data.csv')
Convert the data frame to an array:
df = df.to_numpy()
breaks = np.linspace(1, len(df), math.ceil(len(df)/60))
k = math.ceil(len(df)) - 2
degree = 5
order = degree + 1
n = order + k
t = np.zeros(math.ceil(len(df)/60) + 2*order)  # create an array to store the knots
t[:order] = 0
t[-order:] = len(df)
t[order:-order] = breaks
xx = np.arange(len(df))
for i in range(0, n):
    c = np.zeros(n)
    c[i] = 1
    spl = BSpline(t, c, degree)
    plt.plot(xx, spl(xx))
plt.show()
With the Python code above I get a plot of the individual basis functions.
For my Python code, I would like to have all the B-splines in a single object, not just be able to plot each B-spline one at a time. My goal is to pass the full set of B-splines into another function to perform smoothing.
Basically, I am trying to follow the same steps as in the R code above, but using Python.
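If the goal is just to hold the whole basis in one object, one option (a sketch, reusing t, degree and xx from the code above) is to give BSpline an identity matrix as the coefficient array, so every basis function is evaluated in a single call:
# Number of basis functions consistent with the knot vector.
n_basis = len(t) - degree - 1
# Each column of the identity matrix selects one basis function, so a
# single BSpline object represents the full basis.
basis = BSpline(t, np.eye(n_basis), degree)
B = basis(xx)   # shape (len(xx), n_basis); column j is the j-th basis function
plt.plot(xx, B)
plt.show()
The matrix B (or the basis object itself) can then be passed into the smoothing step. On SciPy 1.8 or newer, BSpline.design_matrix(xx, t, degree) returns the same basis as a sparse matrix, which may also be convenient.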

Efficient way to convert Latitude/Longitude to XY

I have a working script that converts latitude and longitude coordinates to Cartesian coordinates. However, I have to perform this for specific points at each point in time (row by row).
I want to do something similar on a larger df. I'm not sure whether a loop that iterates over each row is the most efficient way to do this. Below is the script that converts a single XY point.
import math
import numpy as np
import pandas as pd

point1 = [-37.83028766, 144.9539561]
r = 6371000  # radius of the Earth in metres
phi_0 = point1[1]
cos_phi_0 = math.cos(np.radians(phi_0))

def to_xy(point, r, cos_phi_0):
    lam = point[0]
    phi = point[1]
    return (r * np.radians(lam) * cos_phi_0, r * np.radians(phi))

point1_xy = to_xy(point1, r, cos_phi_0)
This works fine for converting single points. The issue is when I have a large data frame or list (>100,000 rows) of coordinates. Would a loop that iterates through each row be inefficient? Is there a better way to perform the same function?
Below is an example of a slightly bigger df.
d = ({
'Time' : [0,1,2,3,4,5,6,7,8],
'Lat' : [37.8300,37.8200,37.8200,37.8100,37.8000,37.8000,37.7900,37.7900,37.7800],
'Long' : [144.8500,144.8400,144.8600,144.8700,144.8800,144.8900,144.8800,144.8700,144.8500],
})
df = pd.DataFrame(data = d)
This is what I would do if I were you. (By the way, the tuple casting part can be optimised.)
import numpy as np
import pandas as pd

point1 = [-37.83028766, 144.9539561]

def to_xy(point):
    r = 6371000  # radius of the Earth in metres
    lam, phi = point
    cos_phi_0 = np.cos(np.radians(phi))
    return (r * np.radians(lam) * cos_phi_0,
            r * np.radians(phi))

point1_xy = to_xy(point1)
print(point1_xy)

d = ({
    'Lat': [37.8300, 37.8200, 37.8200, 37.8100, 37.8000, 37.8000, 37.7900, 37.7900, 37.7800],
    'Long': [144.8500, 144.8400, 144.8600, 144.8700, 144.8800, 144.8900, 144.8800, 144.8700, 144.8500],
})
df = pd.DataFrame(d)
df['to_xy'] = df.apply(lambda x: tuple(x.values), axis=1).map(to_xy)
print(df)
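For frames with more than 100,000 rows, a fully vectorised version that works directly on the Lat/Long columns avoids the per-row Python overhead of apply. A sketch using the usual equirectangular convention (x from longitude scaled by the cosine of latitude, y from latitude), which may differ in argument order from the function above:
r = 6371000  # mean Earth radius in metres
lat_rad = np.radians(df['Lat'].to_numpy())
lon_rad = np.radians(df['Long'].to_numpy())
# Equirectangular projection applied to whole columns at once.
df['x'] = r * lon_rad * np.cos(lat_rad)
df['y'] = r * lat_rad
print(df)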

Random walk pandas

I am trying to quickly create a simulated random walk series in pandas.
import pandas as pd
import numpy as np
dates = pd.date_range('2012-01-01', '2013-02-22')
y2 = np.random.randn(len(dates))/365
Y2 = pd.Series(y2, index=dates)
start_price = 100
I would like to build another series that starts at start_price on the first date and grows by the random growth rates.
Pseudo-code:
P0 = 100
P1 = P0 * exp(Y2[1])
P2 = P1 * exp(Y2[2])
This is very easy to do in Excel, but I can't think of a way to do it in pandas without iterating over the dataframe/series, and I bump my head doing that as well.
I have tried:
p = Y2.apply(np.exp) - 1
y = p.cumsum()
y.plot()
This should give the cumulatively compounded return since the start.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def geometric_brownian_motion(T=1, N=100, mu=0.1, sigma=0.01, S0=20):
    dt = float(T) / N
    t = np.linspace(0, T, N)
    W = np.random.standard_normal(size=N)
    W = np.cumsum(W) * np.sqrt(dt)  ### standard brownian motion ###
    X = (mu - 0.5 * sigma**2) * t + sigma * W
    S = S0 * np.exp(X)  ### geometric brownian motion ###
    return S

dates = pd.date_range('2012-01-01', '2013-02-22')
T = (dates.max() - dates.min()).days / 365
N = dates.size
start_price = 100
y = pd.Series(
    geometric_brownian_motion(T, N, sigma=0.1, S0=start_price), index=dates)
y.plot()
plt.show()
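The pseudo-code in the question (each price is the previous price times exp of that period's growth rate) can also be computed directly with a cumulative sum, without iterating. A minimal self-contained sketch:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range('2012-01-01', '2013-02-22')
Y2 = pd.Series(np.random.randn(len(dates)) / 365, index=dates)
start_price = 100

# P_n = start_price * exp(y_1 + ... + y_n), so the cumulative sum of the
# log growth rates gives the whole price path in one vectorised step.
prices = start_price * np.exp(Y2.cumsum())
prices.plot()
plt.show()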
