My code
stl_fcast = forecast(nottem_stl, steps=12, fc_func=seasonal_naive, seasonal = True)
Error Msg
ValueError Traceback (most recent call last)
<ipython-input-95-39c1ef0e911d> in <module>
1 stl_fcast = forecast(nottem_stl, steps=12, fc_func=seasonal_naive,
----> 2 seasonal = True)
3
4 stl_fcast.head()
~/opt/anaconda3/lib/python3.7/site-packages/stldecompose/stl.py in forecast(stl, fc_func, steps, seasonal, **fc_func_kwargs)
102
103 # forecast index starts one unit beyond observed series
--> 104 ix_start = stl.observed.index[-1] + pd.Timedelta(1, stl.observed.index.freqstr)
105 forecast_idx = pd.DatetimeIndex(freq=stl.observed.index.freqstr,
106 start=ix_start,
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()
ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.
This code used to work with older versions of pandas (0.25).
Appreciate any help, thanks.
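For context, the failure happens inside stldecompose itself: stl.py builds the forecast index with pd.Timedelta(1, freqstr), and pandas 1.0 removed the ambiguous 'M'/'Y' timedelta units. One possible workaround, a sketch rather than a definitive fix, is to pin pandas below 1.0, or to patch that line to use a calendar-aware offset:

import pandas as pd
from pandas.tseries.frequencies import to_offset

# DateOffset-based replacement for the line that fails in stldecompose's stl.py:
# 'M'/'Y' are invalid Timedelta units in pandas >= 1.0 but remain valid
# frequency offsets, so stepping one period past the observed index still works
ix_start = nottem_stl.observed.index[-1] + to_offset(nottem_stl.observed.index.freqstr)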
Related
import proximityhash
# filtering the dataset with the required columns
df_new = df.filter(['latitude', 'longitude', 'cell_radius'])

# assign the column values to variables
latitude = df_new['latitude']
longitude = df_new['longitude']
radius = df_new['cell_radius']
precision = 7

# passing the variables as parameters to the proximityhash library,
# getting the values and assigning them to a new column, proximityhash
df_new['proximityhash'] = df_new.apply([proximityhash.create_geohash(latitude, longitude, radius, precision=7)])
print(df_new)
I used this code to import a dataset, filter the necessary columns into three variables (latitude, longitude, radius), and create a new column called "proximityhash" in the new dataframe, but it throws the error below:
TypeError Traceback (most recent call last)
Input In [29], in <cell line: 15>()
11 import pygeohash as gh
13 import proximityhash
---> 15 df_new['proximityhash']=df_new.apply([proximityhash.create_geohash(latitude,longitude,radius,precision=7)])
17 print(df_new)
File ~\Anaconda3\lib\site-packages\proximityhash.py:57, in create_geohash(latitude, longitude, radius, precision, georaptor_flag, minlevel, maxlevel)
54 height = (grid_height[precision - 1])/2
55 width = (grid_width[precision-1])/2
---> 57 lat_moves = int(math.ceil(radius / height)) #4
58 lon_moves = int(math.ceil(radius / width)) #2
60 for i in range(0, lat_moves):
File ~\Anaconda3\lib\site-packages\pandas\core\series.py:191, in _coerce_method.<locals>.wrapper(self)
189 if len(self) == 1:
190 return converter(self.iloc[0])
--> 191 raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>
Figured out a way to solve this; posting the answer since it might be helpful for others. The TypeError occurs because whole pandas Series were passed to create_geohash, which expects scalar floats. Defining a function and applying it row by row, so each call receives scalars, fixes this:
# filtering the dataset with the required columns
df_new = df[['latitude', 'longitude', 'cell_radius']]

# taking a subset of rows (running the whole process might kill the kernel)
df_new = df_new.iloc[:100, ]

# predefine the precision value
precision = 7

def PH(row):
    latitude = row['latitude']
    longitude = row['longitude']
    cell_radius = row['cell_radius']
    row['proximityhash'] = [proximityhash.create_geohash(
        float(latitude), float(longitude), float(cell_radius), precision=7)]
    return row

df_new = df_new.apply(PH, axis=1)
df_new['proximityhash'] = pd.Series(df_new['proximityhash'], dtype="string")
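For what it's worth, the same result can be obtained without apply; a minimal alternative sketch using a list comprehension over the three columns (same first-100-rows subset as above):

# build the column directly, passing scalar floats row by row
df_new['proximityhash'] = [
    proximityhash.create_geohash(float(lat), float(lon), float(r), precision=7)
    for lat, lon, r in zip(df_new['latitude'], df_new['longitude'], df_new['cell_radius'])
]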
This is my code; I am trying to extract the month into a new column:
import pandas as pd
df = pd.read_excel("..\Data.xlsx")
df.head(4)
p = df["Month"][0]
p[0:3]
I don't know what the issue is here, but it was working well for other datasets with the same attributes.
Dataset:
Month Passengers
0 1995-01-01 112
1 1995-02-01 118
2 1995-03-01 132
3 1995-04-01 129
4 1995-05-01 121
P.S.: In the Excel dataset the month values are in Jan-1995, Feb-1995 format; pandas converted them to YYYY-MM-DD format on import.
Traceback (most recent call last):
File "C:\Users\sreen\AppData\Local\Temp/ipykernel_27276/630478717.py", line 1, in <module>
p[0:3]
TypeError: 'Timestamp' object is not subscriptable
Maybe you need to write p = df["Month"]? In your current code, p is the first value of the Month column, so p[0:3] is just a Timestamp, which can't be subscripted.
This should work for you:
df.rename(columns = {'Month':'Date'}, inplace = True)
df['Month'] = pd.DatetimeIndex(df['Date']).month
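If the goal of p[0:3] was the three-letter month name (e.g. "Jan") rather than the month number, a minimal variant of the same idea (MonthName is a hypothetical column name):

df['MonthName'] = pd.DatetimeIndex(df['Date']).strftime('%b')  # "Jan", "Feb", ...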
I'm new to Python. I put together this code to pull daily options data from yfinance for multiple stock symbols, covering all possible expiration dates for each symbol (different symbols can have different expiration dates). So I created two for loops: the first picks the stock, the second picks the expiration date for that stock. The code looks like this:
pip install yfinance --upgrade --no-cache-dir
import yfinance as yf
import pandas as pd
# List of tickers
tickers = ["AAPL","ABBV","ABT","ACN","ADBE","ADI","ADP","AEP","AGG","ALL","AMAT","AMD","AMGN","AMT","AMZN","APD","ARKF","ARKG","ARKK","ARKQ","ARKW","ARKX","XOP"]
# Loop to pull put and call values for all expirations for all tickers in the list
put_call_combined = []
for X in tickers:
    ticker = X
    DateArray = yf.Ticker(ticker).options
    for Y in DateArray:
        strikeChoice = Y
        opt = yf.Ticker(ticker).option_chain(strikeChoice)
        calls = opt.calls
        puts = opt.puts
        put_call_combined.append([calls.lastTradeDate.max().date(), ticker, strikeChoice,
                                  puts['openInterest'].sum(), puts['volume'].sum(),
                                  calls['openInterest'].sum(), calls['volume'].sum()])
ArrayStore = None

# Final output
df = pd.DataFrame(data=put_call_combined, columns=["dataset_day", "ticker", "expiry",
                                                   "putOI", "putVolume", "callOI", "callVolume"])
df
My problem is that on every run I get random errors; when I look at the final DataFrame, I can see the loop broke at a different symbol each time. Sometimes it fails with an IndexError:
IndexError Traceback (most recent call last)
<ipython-input-26-de7016fd3a37> in <module>
6
7 ticker = X
----> 8 DateArray = yf.Ticker(ticker).options
9
10 for X in DateArray:
~\anaconda3\lib\site-packages\yfinance\ticker.py in options(self)
193 def options(self):
194 if not self._expirations:
--> 195 self._download_options()
196 return tuple(self._expirations.keys())
~\anaconda3\lib\site-packages\yfinance\ticker.py in _download_options(self, date, proxy)
59 self._expirations[_datetime.datetime.utcfromtimestamp(
60 exp).strftime('%Y-%m-%d')] = exp
---> 61 return r['optionChain']['result'][0]['options'][0]
62 return {}
63
IndexError: list index out of range
And sometimes it's a ValueError:
ValueError Traceback (most recent call last)
<ipython-input-25-0a07edf80d9a> in <module>
9 for x in DateArray:
10 strikeChoice = x
---> 11 opt = yf.Ticker(ticker).option_chain(strikeChoice)
12 calls = opt.calls
13 puts = opt.puts
~\anaconda3\lib\site-packages\yfinance\ticker.py in option_chain(self, date, proxy, tz)
92 self._download_options()
93 if date not in self._expirations:
---> 94 raise ValueError(
95 "Expiration `%s` cannot be found. "
96 "Available expiration are: [%s]" % (
ValueError: Expiration `2021-10-15` cannot be found. Available expiration are: [2021-09-17, 2022-01-21, 2023-01-20]
If I reduce the number of symbols, say 4 or 5 at a time, I can get the output for all of them, but when the list gets too long the errors start kicking in randomly. Does anyone have an idea why this might be happening? Am I pushing the limits of the Yahoo Finance API?
Thanks
MT
I'm getting a similar issue when I try to download a large number (100 or more) of historical stock prices using yf.download. I thought I might try using a for loop with yf.Ticker to get the prices one at a time, with a try-except block inside the loop to deal with exceptions (see the sketch below). Hopefully that will at least let the loop complete, and I'll get the rest of the data.
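A minimal sketch of that try/except approach, assuming the tickers list and loop body from the question:

import yfinance as yf
import pandas as pd

put_call_combined = []
for ticker in tickers:  # the tickers list from the question
    try:
        for expiry in yf.Ticker(ticker).options:
            opt = yf.Ticker(ticker).option_chain(expiry)
            put_call_combined.append([opt.calls.lastTradeDate.max().date(), ticker, expiry,
                                      opt.puts['openInterest'].sum(), opt.puts['volume'].sum(),
                                      opt.calls['openInterest'].sum(), opt.calls['volume'].sum()])
    except (IndexError, ValueError) as err:
        # skip symbols where Yahoo returns an empty or inconsistent option chain
        print(f"skipping {ticker}: {err}")

df = pd.DataFrame(put_call_combined, columns=["dataset_day", "ticker", "expiry",
                                              "putOI", "putVolume", "callOI", "callVolume"])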
I'm using Python and trying to calculate trends in SIC for different seasons, so I need to cut each season's months out of the full 1979 to 2009 record.
print sic.shape
(372, 180, 360)

sics = sic[90,:,:]
sicm = []
for i in range(0, 12):
    sicj = sic[i::12,:,:]
    sicm.append(sicj)
    del sicj

sics[0::3,:,:] = sicm[11][:30,:,:]
sics[1::3,:,:] = sicm[0][1:,:,:]
sics[2::3,:,:] = sicm[1][1:,:,:]
Then the result showed:
IndexError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 sics[0::3,:,:]=sicm[11][:30,:,:]

/home/charcoalp/anaconda2/envs/pyn_test/lib/python2.7/site-packages/numpy/ma/core.pyc in __setitem__(self, indx, value)
   3299             _mask = self._mask
   3300             # Set the data, then the mask
-> 3301             _data[indx] = dval
   3302             _mask[indx] = mval
   3303         elif hasattr(indx, 'dtype') and (indx.dtype == MaskType):

IndexError: too many indices for array
My approach is to cut out every Jan, Feb, Mar, ... and build a new array that combines the 3 months of each season. Can this be made to work, or is my approach just wrong?
Thanks a lot if you can help me
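One likely culprit, sketched below assuming sic is the (372, 180, 360) masked array from the question: sic[90,:,:] selects a single time step and returns a 2-D (180, 360) array, so the later three-index assignments into sics raise "too many indices". Allocating a 3-D (90, 180, 360) target keeps the time axis:

import numpy as np

# 30 years x 3 winter months = 90 time steps, same spatial grid as sic
sics = np.ma.zeros((90,) + sic.shape[1:])

# one entry per calendar month, each of shape (31, 180, 360)
sicm = [sic[i::12, :, :] for i in range(12)]

sics[0::3, :, :] = sicm[11][:30, :, :]  # Dec of years 0..29
sics[1::3, :, :] = sicm[0][1:, :, :]    # Jan of years 1..30
sics[2::3, :, :] = sicm[1][1:, :, :]    # Feb of years 1..30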
Given a pandas dataframe, I want to exclude rows corresponding to outliers (Z-score threshold = 3) based on one of the columns.
The dataframe looks like this:
df.dtypes
_id object
_index object
_score object
_source.address object
_source.district object
_source.price float64
_source.roomCount float64
_source.size float64
_type object
sort object
priceSquareMeter float64
dtype: object
For the line:
dff=df[(np.abs(stats.zscore(df)) < 3).all(axis='_source.price')]
The following exception is raised:
-------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-68-02fb15620e33> in <module>()
----> 1 dff=df[(np.abs(stats.zscore(df)) < 3).all(axis='_source.price')]
/opt/anaconda3/lib/python3.6/site-packages/scipy/stats/stats.py in zscore(a, axis, ddof)
2239 """
2240 a = np.asanyarray(a)
-> 2241 mns = a.mean(axis=axis)
2242 sstd = a.std(axis=axis, ddof=ddof)
2243 if axis and mns.ndim < a.ndim:
/opt/anaconda3/lib/python3.6/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
68 is_float16_result = True
69
---> 70 ret = umr_sum(arr, axis, dtype, out, keepdims)
71 if isinstance(ret, mu.ndarray):
72 ret = um.true_divide(
TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'
And the return value of
np.isreal(df['_source.price']).all()
is
True
Why do I get the above exception, and how can I exclude the outliers?
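The exception comes from applying stats.zscore to the entire DataFrame: with object-dtype columns present, np.asanyarray produces an object array whose mean cannot be computed, hence the NoneType addition. In addition, .all(axis=...) expects an integer axis, not a column name. A minimal sketch that scores only the numeric column of interest:

import numpy as np
from scipy import stats

# keep rows whose '_source.price' z-score is within +/-3
dff = df[np.abs(stats.zscore(df['_source.price'])) < 3]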
If one wants to use the Interquartile Range (IQR) of the given dataset instead:
def Remove_Outlier_Indices(df):
    Q1 = df.quantile(0.25)
    Q3 = df.quantile(0.75)
    IQR = Q3 - Q1
    trueList = ~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR)))
    return trueList
Based on the above eliminator function, the non-outlier subset of the dataset can be obtained:
# Arbitrary Dataset for the Example
df = pd.DataFrame({'Data':np.random.normal(size=200)})
# Index List of Non-Outliers
nonOutlierList = Remove_Outlier_Indices(df)
# Non-Outlier Subset of the Given Dataset
dfSubset = df[nonOutlierList]
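Applied to the question's frame, the same function also works on a single column; a hypothetical sketch, assuming the filtering should be based on '_source.price' only:

# quantile() on a Series returns scalars, so the same function applies unchanged
mask = Remove_Outlier_Indices(df['_source.price'])
dff = df[mask]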
Use this boolean mask whenever you have this sort of issue:
df = pd.DataFrame({'Data': np.random.normal(size=200)})  # example
df[np.abs(df.Data - df.Data.mean()) <= (3 * df.Data.std())]  # keep only rows within +/-3 standard deviations in 'Data'
df[~(np.abs(df.Data - df.Data.mean()) > (3 * df.Data.std()))]  # or the other way around
I believe you could create a boolean filter marking the outliers and then select the opposite of it. Note that stats.zscore returns a NumPy array, which has no .apply method, and the comparison should be a threshold rather than an equality:

outliers = np.abs(stats.zscore(df['_source.price'])) >= 3
df_without_outliers = df[~outliers]