Python time series: omitted values after using statsmodels.tsa.seasonal.seasonal_decompose? - python

Could anyone tell me why the first and last six observations are omitted from the final result?
I used seasonal_decompose from statsmodels.tsa to do the seasonal adjustment.
Thanks.
import os
import statsmodels.api as sm
import pandas as pd
import numpy as np
#pd.options.display.mpl_style = 'default'
%matplotlib inline
#Load csv data#
cpi = pd.read_csv('/home/pythonwd/thai cpi.csv')
cpi = cpi.dropna()
#Create date and time series#
cpi['date'] = pd.to_datetime(cpi['date'], dayfirst=True)
cpi = cpi.set_index('date')
#Seasonal adjustment#
dec = sm.tsa.seasonal_decompose(cpi["cpi"],model='multiplicative')
dec.plot()
Data before the #Seasonal adjustment# line: [image not included]
Data afterwards: [image not included]
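A likely explanation, offered as a sketch rather than a confirmed diagnosis: classical decomposition estimates the trend with a centred moving average, and for monthly data (period 12) that window needs six observations on each side, so the trend and residual are undefined for the first and last six points. The pure-Python function below is an illustration of that idea, not the statsmodels implementation; the name centered_moving_average and the sample series are made up for the example.

```python
def centered_moving_average(values, period):
    """Centred moving average; None wherever the window does not fit."""
    if period % 2 == 0:
        # Even period: 2 x period average, half weight on the two end points.
        weights = [0.5] + [1.0] * (period - 1) + [0.5]
    else:
        weights = [1.0] * period
    weights = [w / period for w in weights]
    half = len(weights) // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        trend[i] = sum(w * v for w, v in zip(weights, window))
    return trend


# Hypothetical three years of monthly data.
series = [100.0 + i for i in range(36)]
trend = centered_moving_average(series, period=12)
missing = [i for i, t in enumerate(trend) if t is None]
print(missing)  # indices 0-5 and 30-35 have no trend estimate
```

If the missing ends are a problem, seasonal_decompose accepts an extrapolate_trend argument (for example extrapolate_trend='freq') that fills them by linear extrapolation; whether extrapolated values are acceptable depends on the analysis.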

Related

How to read the note value in the score?

I am reading note information from a MIDI file. Currently, I just read the onset and offset from PrettyMIDI. However, the onsets and offsets are not what I expected (I hoped for something like 1, 2.5, 3, 4, 6); instead I get values like 1.43, 2.34.
So I am wondering how to read the exact note value.
For example, my expected result is [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5] in this image.
This is my current code.
import os
import sys
from matplotlib import pyplot as plt
from matplotlib import patches
from matplotlib import colors
import pretty_midi
import pandas as pd
import IPython.display as ipd
sys.path.append('..')
# import libfmp.c1
fn = os.path.join('..', 'data', 'C1', "Canon_in_D__Violin_Solo_.mid")
midi_data = pretty_midi.PrettyMIDI(fn)
midi_list = []
for instrument in midi_data.instruments:
    for note in instrument.notes:
        start = note.start
        end = note.end
        pitch = note.pitch
        velocity = note.velocity
        midi_list.append([start, end, pitch, velocity])
midi_list = sorted(midi_list, key=lambda x: (x[0], x[2]))
df = pd.DataFrame(midi_list, columns=['Start', 'End', 'Pitch', 'Velocity'])
html = df.to_html(index=True)
ipd.HTML(html)
Thanks
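One possible approach, sketched under assumptions: PrettyMIDI reports note.start and note.end in seconds, so a note value in beats can be estimated by dividing the duration by the length of one beat and snapping the result to a grid. The tempo (80 BPM), the sample times, and the helper names seconds_to_beats and quantize are all hypothetical and not taken from the file in the question.

```python
def seconds_to_beats(seconds, bpm):
    """Convert a duration in seconds to beats (quarter notes) at a constant tempo."""
    return seconds * bpm / 60.0


def quantize(value, step=0.25):
    """Snap a value to the nearest multiple of `step` beats."""
    return round(value / step) * step


# Hypothetical (start, end) times in seconds, as note.start / note.end would give them.
notes_in_seconds = [(0.0, 0.74), (0.75, 1.49), (1.5, 1.86), (1.875, 2.24)]
bpm = 80.0  # assumed constant tempo, not read from the actual MIDI file

note_values = [
    quantize(seconds_to_beats(end - start, bpm)) for start, end in notes_in_seconds
]
print(note_values)  # [1.0, 1.0, 0.5, 0.5]
```

For files with tempo changes a constant BPM will not hold; in that case converting times with PrettyMIDI's time_to_tick and dividing tick differences by midi_data.resolution may be a better fit.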

How to calculate the probability at a given condition when I know the outcome in pymc3?

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
from pymc3.math import dot, exp
import pandas as pd
trace = pm.sample(
10000,
chains=4,
tune=400,
return_inferencedata=True,
)
summary = az.summary(trace, hdi_prob=0.95)
print(summary)
with m:
    pm.set_data({"condition1": [val1], "condition2": [val2], "condition3": [val3]})
    ppc = pm.sample_posterior_predictive(trace)
In the above code I have the values of the three conditions available, and I know the output as well. I want to calculate the probability of arriving at that output.
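A minimal sketch of one way to estimate this, assuming the outcome is discrete: once posterior predictive draws exist for the given conditions, the probability of the known outcome can be approximated by the fraction of draws equal to it. The draws below are made up, and outcome_probability is a hypothetical helper; in the question they would come from the entry of ppc for the observed variable, whose name is not shown.

```python
def outcome_probability(draws, observed):
    """Fraction of posterior predictive draws equal to the observed outcome."""
    if not draws:
        raise ValueError("need at least one draw")
    return sum(1 for d in draws if d == observed) / len(draws)


# Hypothetical posterior predictive draws of a binary outcome.
draws = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
p = outcome_probability(draws, observed=1)
print(p)  # 0.7
```

For a continuous outcome an exact match has probability zero, so a density estimate or an interval around the observed value would be needed instead.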

Calculate Max DrawDown

I am using pyfolio to calculate the max drawdown and other risk indicators. What should be adjusted to get the correct value?
The max drawdown should be close to 27%; I don't know why a negative value is returned. The whole drawdown table also does not look correct or as expected.
Thanks in advance.
benchmark files
results files
import pandas as pd
import pyfolio as pf
import os
import matplotlib.pyplot as plt
from pandas import read_csv
from pyfolio.utils import (to_utc, to_series)
from pyfolio.tears import (create_full_tear_sheet,
create_simple_tear_sheet,
create_returns_tear_sheet,
create_position_tear_sheet,
create_txn_tear_sheet,
create_round_trip_tear_sheet,
create_interesting_times_tear_sheet,)
test_returns = read_csv("C://temp//test_return.csv", index_col=0, parse_dates=True,header=None, squeeze=True)
print(test_returns)
benchmark_returns = read_csv("C://temp//benchmark.csv", index_col=0, parse_dates=True,header=None, squeeze=True)
print(benchmark_returns)
fig = pf.create_returns_tear_sheet(test_returns,benchmark_rets=benchmark_returns,return_fig=True)
fig.savefig("risk.png")
maxdrawdown = pf.timeseries.max_drawdown(test_returns)
print(maxdrawdown)
table = pf.timeseries.gen_drawdown_table(test_returns)
print(table)
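For comparison, here is a small pure-Python sketch of how max drawdown is commonly computed from a series of simple periodic returns. It is not pyfolio's code, and the sample returns are invented. Two things it may help check: under this convention the result is a negative fraction (so -0.27 would mean a 27% drawdown), and the input is assumed to be returns rather than prices or cumulative values.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline of cumulative wealth, as a negative fraction."""
    wealth = 1.0  # start from one unit of capital
    peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


# Hypothetical simple returns per period.
returns = [0.10, -0.20, 0.05, -0.10, 0.30]
mdd = max_drawdown(returns)
print(round(mdd, 4))  # -0.244
```

If the CSV files hold prices or cumulative returns instead, converting them to periodic returns first (for example with pct_change in pandas) would be the step to try.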

ValueError: Length of values does not match length of index (PYTHON)

I'm trying to implement the stochastic indicator in TA-Lib but I'm getting the error above. The error is on the last line. Please see code below:
import pandas_datareader as pdr
import datetime
import pandas as pd
import numpy as np
import talib as ta
#Download Data
aapl = pdr.get_data_yahoo('AAPL', start=datetime.datetime(2006, 10, 1), end=datetime.datetime(2012, 1, 1))
#Saves Data as CSV on desktop
aapl.to_csv('C:\\Users\\JDOG\\Desktop\\aapl_ohlc.csv', encoding='utf-8')
#Save to dataframe
df = pd.read_csv('C:\\Users\\JDOG\\Desktop\\aapl_ohlc.csv', header=0, index_col='Date', parse_dates=True)
#Initialize the `signals` DataFrame with the `signal` column
signals = pd.DataFrame(index=aapl.index)
signals['signal'] = 0.0
#Create slow stochastics //**Broken**
signals['Slow Stochastics'] = ta.STOCH(aapl.High.values,aapl.Low.values,aapl.Close.values,fastk_period=5,slowk_period=3,slowk_matype=0,slowd_period=3,slowd_matype=0)
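The error occurs because the STOCH function returns a tuple of two arrays (slow %K and slow %D), and you are trying to assign that tuple to a single DataFrame column. Unpack it into two columns instead (thirtyyear below stands in for your DataFrame):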
thirtyyear['StochSlowk'],thirtyyear['StochSlowD'] = ta.STOCH(thirtyyear['High'].values, thirtyyear['Low'].values, thirtyyear['Close'].values, fastk_period=5, slowk_period=3, slowk_matype=0, slowd_period=3, slowd_matype=0)

pandas.DataFrame returns Series not a Dataframe

I am working with a series of images. I read them and store them in a list, then convert the list to a DataFrame, and finally I would like to apply Isomap. When I read the images (I have 84 of them) I get an 84x2303 DataFrame of objects, and each object by itself also looks like a DataFrame. I am wondering how to convert all of it to_numeric so I can use Isomap on it and then plot it.
Here is my code:
import pandas as pd
from scipy import misc
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
import matplotlib.pyplot as plt
import glob
from sklearn import manifold
samples = []
path = 'Datasets/ALOI/32/*.png'
files = glob.glob(path)
for name in files:
    img = misc.imread(name)
    img = img[::2, ::2]
    x = (img/255.0).reshape(-1,3)
    samples.append(x)
df = pd.DataFrame.from_records(samples)
print(df.dtypes)
print(df.shape)
Thanks!
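A possible cause, stated as an assumption: reshape(-1,3) keeps every image as a two-dimensional array of pixels, so each DataFrame cell ends up holding an array (object dtype) instead of a number. Flattening each image into a single row of numbers avoids that. The sketch below shows the idea with plain nested lists and two invented 2x2 RGB images; flatten_image is a hypothetical helper, and with NumPy arrays the equivalent would be reshape(-1).

```python
def flatten_image(image):
    """Turn a rows x columns x channels image into one flat list of floats in [0, 1]."""
    return [channel / 255.0 for row in image for pixel in row for channel in pixel]


# Two hypothetical 2x2 RGB images.
images = [
    [[[0, 0, 0], [255, 255, 255]], [[255, 0, 0], [0, 255, 0]]],
    [[[0, 0, 255], [0, 0, 0]], [[255, 255, 255], [255, 255, 255]]],
]

# One flat numeric row per image: 2 * 2 * 3 = 12 values each.
samples = [flatten_image(img) for img in images]
print(len(samples), len(samples[0]))  # 2 12
```

Rows built this way all have the same length and contain only floats, which is the shape a DataFrame or Isomap expects: one sample per row, one feature per column.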
