ValueError: Length of values does not match length of index (Python)

I'm trying to implement the stochastic indicator in TA-Lib but I'm getting the error above. The error is on the last line. Please see code below:
import pandas_datareader as pdr
import datetime
import pandas as pd
import numpy as np
import talib as ta
#Download Data
aapl = pdr.get_data_yahoo('AAPL', start=datetime.datetime(2006, 10, 1), end=datetime.datetime(2012, 1, 1))
#Saves Data as CSV on desktop
aapl.to_csv('C:\\Users\\JDOG\\Desktop\\aapl_ohlc.csv', encoding='utf-8')
#Save to dataframe
df = pd.read_csv('C:\\Users\JDOG\\Desktop\\aapl_ohlc.csv', header=0, index_col='Date', parse_dates=True)
#Initialize the `signals` DataFrame with the `signal` column
signals = pd.DataFrame(index=aapl.index)
signals['signal'] = 0.0
#Create slow stochastics //**Broken**
signals['Slow Stochastics'] = ta.STOCH(aapl.High.values,aapl.Low.values,aapl.Close.values,fastk_period=5,slowk_period=3,slowk_matype=0,slowd_period=3,slowd_matype=0)

The error occurs because ta.STOCH returns a tuple of two arrays (slow %K and slow %D), and you are trying to assign that tuple to a single dataframe column. Unpack it into two columns instead. Try this:
thirtyyear['StochSlowk'], thirtyyear['StochSlowD'] = ta.STOCH(thirtyyear['High'].values, thirtyyear['Low'].values, thirtyyear['Close'].values, fastk_period=5, slowk_period=3, slowk_matype=0, slowd_period=3, slowd_matype=0)
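Adapted to the dataframes in the question, a minimal sketch of the same fix might look like this (the column names 'Slow K' and 'Slow D' are just illustrative):
import talib as ta

# ta.STOCH returns two arrays: slow %K and slow %D
slowk, slowd = ta.STOCH(aapl.High.values, aapl.Low.values, aapl.Close.values,
                        fastk_period=5, slowk_period=3, slowk_matype=0,
                        slowd_period=3, slowd_matype=0)
# assign each array to its own column, so lengths match the index
signals['Slow K'] = slowk
signals['Slow D'] = slowd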

Related

TypeError for correlation matrix for stocks during given period - Python

I am currently trying to get a program in Python that helps me to calculate the correlation in movement between equities.
The idea is to input multiple equities and have the program complete a matrix correlation.
I copied the code from a YouTube tutorial; the idea was to adapt it to my needs afterwards, since I am a beginner and no expert in coding. In the video the code works fine, but on my end I keep getting the same error messages.
Here is the code:
import numpy as np
import pandas as pd
# Used to grab the stock prices, with yahoo
import pandas_datareader as web
from datetime import datetime
# To visualize the results
import matplotlib.pyplot as plt
import seaborn
start = datetime(2021, 1, 1)
symbols_list = ['A', 'AAPL']
#array to store prices
symbols=[]
for ticker in symbols_list:
    r = web.DataReader(ticker, 'yahoo', start)
    # add a symbol column
    r['Symbol'] = ticker
    symbols.append(r)
# concatenate into df
df = pd.concat(symbols)
df = df.reset_index()
df = df[['Date', 'Close', 'Symbol']]
df.head()
df_pivot=df.pivot('Date','Symbol','Close').reset_index()
df_pivot.head()
corr_df = df_pivot.corr(method='pearson')
#reset symbol as index (rather than 0-X)
corr_df.head().reset_index()
#del corr_df.index.name
corr_df.head(10)
#plt.figure(figsize=(13,8)
#seaborn.heatmap(corr_df, annot=True, cmap='coolwarm')
#plt.figure()
Here is the error I am getting:
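The traceback isn't reproduced above, so the exact cause is unclear, but since the goal is a correlation matrix of the Close prices, here is a minimal sketch of the pivot-and-correlate step written with keyword arguments (newer pandas versions require them and raise a TypeError on the positional form):
# pivot so each symbol becomes a column of Close prices
df_pivot = df.pivot(index='Date', columns='Symbol', values='Close')
# Pearson correlation between the symbols' Close series
corr_df = df_pivot.corr(method='pearson')
print(corr_df.head())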

Python time series: omitted values after using statsmodels.tsa.seasonal.seasonal_decompose?

Could anyone tell me what I did wrong, such that my first and last six observations are omitted from the final outcome?
I used statsmodels.tsa.seasonal_decompose to do the seasonal adjustment.
Thanks.
import os
import statsmodels.api as sm
import pandas as pd
import numpy as np
#pd.options.display.mpl_style = 'default'
%matplotlib inline
#Load csv data#
cpi = pd.read_csv('/home/pythonwd/thai cpi.csv')
cpi = cpi.dropna()
#Create date and time series#
cpi['date'] = pd.to_datetime(cpi['date'], dayfirst=True)
cpi = cpi.set_index('date')
#Seasonal adjustment#
dec = sm.tsa.seasonal_decompose(cpi["cpi"],model='multiplicative')
dec.plot()
Screenshots of the data before and after the #Seasonal adjustment# step (images not reproduced here) show the first and last six observations missing from the decomposed output.
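A likely explanation (an assumption, since the underlying data isn't shown) is that seasonal_decompose estimates the trend with a centered moving average, so with monthly data (period 12) the first and last six points have no trend estimate and come out as NaN. Newer statsmodels versions offer an extrapolate_trend argument that fills in those edges; a minimal sketch:
import pandas as pd
import statsmodels.api as sm

# cpi is assumed to be a DataFrame with a DatetimeIndex and a 'cpi' column
# extrapolate_trend='freq' extrapolates the trend to the edges instead of
# leaving the first and last period//2 observations as NaN
dec = sm.tsa.seasonal_decompose(cpi["cpi"], model='multiplicative',
                                extrapolate_trend='freq')
print(dec.trend.head(12))  # no leading NaNs once the trend is extrapolated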

Calculate Max DrawDown

I am using pyfolio to calculate the max drawdown and other risk indicators. What should be adjusted to get the correct value?
The max drawdown should be near 27%, but I don't know why a negative value is returned, and the whole drawdown table also doesn't look correct or as expected.
Thanks in advance
benchmark files
results files
import pandas as pd
import pyfolio as pf
import os
import matplotlib.pyplot as plt
from pandas import read_csv
from pyfolio.utils import (to_utc, to_series)
from pyfolio.tears import (create_full_tear_sheet,
                           create_simple_tear_sheet,
                           create_returns_tear_sheet,
                           create_position_tear_sheet,
                           create_txn_tear_sheet,
                           create_round_trip_tear_sheet,
                           create_interesting_times_tear_sheet,)
test_returns = read_csv("C://temp//test_return.csv", index_col=0, parse_dates=True,header=None, squeeze=True)
print(test_returns)
benchmark_returns = read_csv("C://temp//benchmark.csv", index_col=0, parse_dates=True,header=None, squeeze=True)
print(benchmark_returns)
fig = pf.create_returns_tear_sheet(test_returns,benchmark_rets=benchmark_returns,return_fig=True)
fig.savefig("risk.png")
maxdrawdown = pf.timeseries.max_drawdown(test_returns)
print(maxdrawdown)
table = pf.timeseries.gen_drawdown_table(test_returns)
print(table)
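As a cross-check (a sketch, not pyfolio's own implementation), the max drawdown can be computed by hand from the returns series; pyfolio reports it as a negative fraction, so a value around -0.27 would correspond to the expected 27% drawdown:
import pandas as pd

def max_drawdown_manual(returns: pd.Series) -> float:
    # build the cumulative wealth curve from simple returns
    wealth = (1 + returns).cumprod()
    # running peak of the wealth curve
    running_max = wealth.cummax()
    # drawdown at each point, expressed as a negative fraction
    drawdown = wealth / running_max - 1
    return drawdown.min()

print(max_drawdown_manual(test_returns))  # compare with pf.timeseries.max_drawdown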

How to input dataset while using Salesforce-merlion package for timeseries forecasting

I have installed the Salesforce-Merlion package in my conda environment. Now I want to use my own dataset to run the forecasting algorithm. I only need to forecast one univariate series, but I cannot figure out how to do that, because there are some variables I cannot work out how to initialize. The example provided on GitHub uses an already-split dataset. Can someone help me out here?
The GitHub example for forecasting looks like this:
from merlion.utils import TimeSeries
from ts_datasets.forecast import M4
# Data loader returns pandas DataFrames, which we convert to Merlion TimeSeries
time_series, metadata = M4(subset="Hourly")[0]
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
The complete code with their internal dataset is available at the following link:
https://github.com/salesforce/Merlion/tree/main/examples/forecast
(Here they are using their internal dataset M4)
Now, I have to use my dataset. So my code is like this:
from merlion.utils import TimeSeries
df = pd.read_csv(r'C:\Users\Doyel_De_Sarkar\Desktop\forecasting\15786_GIK.csv')
df.dropna(inplace=True)
df['ts'] = pd.to_datetime(df['ts'])
df.sort_values('ts', inplace=True)
trainval = []
for i in range(len(df)):
    if i <= (round((len(df)*0.75),0)):
        trainval.append(True)
    else:
        trainval.append(False)
df['trainval'] = trainval
df = df.drop(columns=['wday', 'hour'])
from merlion.utils import UnivariateTimeSeries
kpi = UnivariateTimeSeries(
    time_stamps=df.ts,        # timestamps in units of seconds
    values=df.saps_total,     # time series values
    name="kpi"                # optional: a name for this univariate
)
kpi_label = UnivariateTimeSeries(
    time_stamps=df.ts,        # timestamps in units of seconds
    values=df.trainval        # time series values
)
from merlion.utils import TimeSeries
time_series, metadata = kpi, kpi_label
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
I am getting the following error
'UnivariateTimeSeries' object has no attribute 'trainval'
at this line:
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
The reason you're getting this error is that trainval is not an attribute of the UnivariateTimeSeries class. In the GitHub example you shared, metadata is a pandas DataFrame, but here you are constructing a UnivariateTimeSeries object (kpi_label) and using it as the metadata.
I'm not sure exactly what your dataset looks like, but try using:
kpi_labels = df.trainval
instead.
Thank you SalmonKiller for taking the time to look into the issue. The dataset used in the GitHub example has a very odd data structure, so I had to create the trainval column myself and set the metadata to the column df[['trainval']]. The univariate I had created was of no use. The issue was with the indexing: once I set the timestamp column as the index, the problem was solved.
Here is the code which is running fine now.
import os
import numpy as np
import pandas as pd
from merlion.models.forecast.smoother import MSESConfig, MSES
from merlion.transform.resample import TemporalResample
from merlion.utils import TimeSeries
df = pd.read_csv(r'<file.csv>')
df['ts'] = pd.to_datetime(df['ts'])
df.set_index('ts', inplace=True)
df.sort_values('ts', inplace=True)
hours = pd.date_range(start=df.index[0], end=df.index[-1], freq='H')
mean = df.saps_total.mean()
df = df.reindex(hours, fill_value=mean)
trainval = []
for i in range(len(df)):
    if i <= (round((len(df)*0.75),0)):
        trainval.append(True)
    else:
        trainval.append(False)
df['trainval'] = trainval
df = df.drop(columns=['wday', 'hour'])
from merlion.utils import TimeSeries
time_series = df[['saps_total']]
metadata = df[['trainval']]
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
from merlion.models.forecast.arima import Arima, ArimaConfig
config1 = ArimaConfig(max_forecast_steps=len(time_series[~metadata.trainval].index), order=(0, 1, 0),
                      transform=TemporalResample(granularity="1h"))
model1 = Arima(config1)
model1.train(train_data=train_data)
test_pred, test_err = model1.forecast(time_stamps=test_data.time_stamps)
print(test_pred)
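To sanity-check the forecast against the held-out data, Merlion also ships evaluation metrics; a minimal sketch, assuming the variables from the code above:
from merlion.evaluate.forecast import ForecastMetric

# compare the ARIMA forecast with the actual test series
smape = ForecastMetric.sMAPE.value(ground_truth=test_data, predict=test_pred)
print(f"sMAPE: {smape:.4f}")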

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?

I have used the following code to convert the scikit-learn breast cancer dataset to a DataFrame, but I am not getting the output. I am very new to Python and not able to figure out what is wrong.
def answer_one():
    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    cancer = load_breast_cancer()
    data = numpy.c_[cancer.data, cancer.target]
    columns = numpy.append(cancer.feature_names, ["target"])
    return pandas.DataFrame(data, columns=columns)
answer_one()
Use pandas
There was a great answer here: How to convert a Scikit-learn dataset to a Pandas dataset?
The keys in the Bunch object give you an idea of which data you want to make columns for.
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = pd.Series(cancer.target)
The following code works
def answer_one():
    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    cancer = load_breast_cancer()
    data = np.c_[cancer.data, cancer.target]
    columns = np.append(cancer.feature_names, ["target"])
    return pd.DataFrame(data, columns=columns)
answer_one()
The reason your code didn't work is that you refer to the numpy and pandas packages by their full names after importing them as np and pd respectively.
However, I suggest that the package imports be done at the beginning of the script, outside the function definition.
As of scikit-learn 0.23 you can do the following to get a DataFrame and save some keystrokes:
df = load_breast_cancer(as_frame=True)
df.frame
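As a short usage note, load_breast_cancer(as_frame=True) returns a Bunch whose frame attribute already contains the features together with the target column, so the whole conversion reduces to (assuming scikit-learn >= 0.23):
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)  # requires scikit-learn >= 0.23
df = data.frame                           # 30 feature columns plus 'target'
print(df.shape)                           # (569, 31)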
Alternatively, inside answer_one():
dataframe = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
dataframe['target'] = cancer.target
return dataframe
