statsmodels SARIMAX with exogenous variables matrices are different sizes - python

I'm running a SARIMAX model but running into problems with specifying the exogenous variables. In the first block of code (below) I specify one exogenous variable lesdata['LESpost'] and the model runs without a problem. However, when I add in another exogenous variable I end up with an error message (see stack trace).
ar = (1,0,1) # AR(1 3)
ma = (0) # No MA terms
mod1 = sm.tsa.statespace.SARIMAX(lesdata['emadm'], exog= (lesdata['LESpost'],lesdata['QOF']), trend='c', order=(ar,0,ma), mle_regression=True)
Traceback (most recent call last):
File "<ipython-input-129-d1300aeaeffc>", line 4, in <module>
mle_regression=True)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\tsa\statespace\sarimax.py", line 510, in __init__
endog, exog=exog, k_states=k_states, k_posdef=k_posdef, **kwargs
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 84, in __init__
missing='none')
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 43, in __init__
super(TimeSeriesModel, self).__init__(endog, exog, missing=missing)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\model.py", line 212, in __init__
super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\model.py", line 63, in __init__
**kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\model.py", line 88, in _handle_data
data = handle_data(endog, exog, missing, hasconst, **kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 630, in handle_data
**kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 80, in __init__
self._check_integrity()
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 496, in _check_integrity
super(PandasData, self)._check_integrity()
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 403, in _check_integrity
raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes
Is there something obvious I am missing here? The variables are all of the same length and there are no missing data.
Thanks for reading and hope you can help !

Two-dimensional data needs to have observations in rows and variables in columns after numpy.asarray is applied.
exog = (lesdata['LESpost'],lesdata['QOF'])
Applying asarray to this tuple puts each variable in its own row, giving shape (2, nobs). That is how numpy stacks a sequence of sequences (its row-major default, inherited from C), but statsmodels expects shape (nobs, 2).
DataFrames are already shaped appropriately, so one option is to use a DataFrame with the desired columns:
exog = lesdata[['LESpost', 'QOF']]
Another option for lists or tuples of array_likes is to use numpy.column_stack, e.g.
exog = np.column_stack((lesdata['LESpost'].values,lesdata['QOF'].values))
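To see the difference in shape, here is a minimal sketch with made-up arrays standing in for lesdata['LESpost'] and lesdata['QOF']. The names les_post and qof and their values are placeholders, not the asker's data; pandas Series behave the same way under np.asarray.

```python
import numpy as np

# Placeholder data: five observations of two exogenous variables.
les_post = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
qof = np.array([0.0, 1.0, 1.0, 1.0, 1.0])

# A tuple of 1-D arrays becomes one row per variable: shape (2, nobs).
as_tuple = np.asarray((les_post, qof))

# column_stack puts each variable in its own column: shape (nobs, 2).
as_columns = np.column_stack((les_post, qof))

print(as_tuple.shape)    # (2, 5)
print(as_columns.shape)  # (5, 2)
```

The second layout has one row per observation, which is what the endog/exog size check compares against.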

Related

How to deal with the UndefinedUnitError?

I downloaded data from NOAA and I wanted to calculate vertical velocity using the function vertical_velocity = mpcalc.vertical_velocity(omega, pressure, temperature). But something goes wrong when I deal with the units of the variables.
import xarray as xr
import metpy.calc as mpcalc
omega=xr.open_dataset('D:\\data_english\\jwk\\omega.mon.mean.nc')
temperature=xr.open_dataset('D:\\data_english\\jwk\\air.mon.mean.nc')
height=xr.open_dataset('D:\\data_english\\jwk\\hgt.mon.mean.nc')
pressure=mpcalc.height_to_pressure_std(height['hgt'])
verticalwind=mpcalc.vertical_velocity(omega['omega'], pressure, temperature['air'])
Traceback (most recent call last):
File "<ipython-input-194-da22b63a1943>", line 1, in <module>
verticalwind=mpcalc.vertical_velocity(omega['omega'], pressure, temperature['air'])
File "D:\anaconda\lib\site-packages\metpy\xarray.py", line 1199, in wrapper
_mutate_arguments(bound_args, xr.DataArray, lambda arg, _: arg.metpy.unit_array)
File "D:\anaconda\lib\site-packages\metpy\xarray.py", line 1233, in _mutate_arguments
bound_args.arguments[arg_name] = mutate_arg(arg_val, arg_name)
File "D:\anaconda\lib\site-packages\metpy\xarray.py", line 1199, in <lambda>
_mutate_arguments(bound_args, xr.DataArray, lambda arg, _: arg.metpy.unit_array)
File "D:\anaconda\lib\site-packages\metpy\xarray.py", line 157, in unit_array
return units.Quantity(self._data_array.data, self.units)
File "D:\anaconda\lib\site-packages\metpy\xarray.py", line 134, in units
return units.parse_units(self._data_array.attrs.get('units', 'dimensionless'))
File "D:\anaconda\lib\site-packages\pint\registry.py", line 1084, in parse_units
units = self._parse_units(input_string, as_delta)
File "D:\anaconda\lib\site-packages\pint\registry.py", line 1298, in _parse_units
return super()._parse_units(input_string, as_delta)
File "D:\anaconda\lib\site-packages\pint\registry.py", line 1112, in _parse_units
cname = self.get_name(name)
File "D:\anaconda\lib\site-packages\pint\registry.py", line 636, in get_name
raise UndefinedUnitError(name_or_alias)
UndefinedUnitError: 'Pascal' is not defined in the unit registry
**The units of omega, height and temperature are 'Pascal/s', 'm' and 'degC', respectively. The variable pressure was calculated with the function mpcalc.height_to_pressure_std, and this function didn't give the unit of pressure. But the values of pressure range from 1000 to 0, so I think its unit is 'hPa'.
The error reported that "'Pascal' is not defined in the unit registry". Maybe 'Pascal/s' is not the default unit of omega? But how can I know which units are defined in the unit registry? Can anyone help me? Thanks!**
This is a problem where the unit library MetPy uses (Pint) does not have the same rules about capitalization/case sensitivity as the UDUnits format used by the netCDF Climate and Forecasting Conventions for metadata. Fixing this is on MetPy's todo list, but some roadblocks have been encountered.
The work-around right now is to change your units to something that Pint understands, like:
omega['omega'].attrs['units'] = 'pascal / s'
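If several variables need the same treatment, the replacement can be wrapped in a small helper. This is only a sketch: the UNIT_FIXES table and the fix_units_attr function are hypothetical names, and the mapping contains just the one spelling from this question plus its bare form. It works on a plain dict so it runs without xarray or MetPy installed.

```python
# Hypothetical mapping from unit spellings found in the files to spellings
# that Pint parses. Only the entries needed for this question are listed.
UNIT_FIXES = {
    "Pascal/s": "pascal / s",
    "Pascal": "pascal",
}


def fix_units_attr(attrs, fixes=UNIT_FIXES):
    """Return a copy of attrs with a known-bad 'units' string replaced."""
    fixed = dict(attrs)
    units = fixed.get("units")
    if units in fixes:
        fixed["units"] = fixes[units]
    return fixed


# Stand-in for omega['omega'].attrs.
attrs = {"units": "Pascal/s", "long_name": "omega"}
new_attrs = fix_units_attr(attrs)
print(new_attrs["units"])  # pascal / s
```

With xarray, the result could be written back with omega['omega'].attrs.update(fix_units_attr(omega['omega'].attrs)).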

Error while trying to use Regression on N_dimensional Arrays

My code is supposed to read audio files and predict another audio file (I don't care about accuracy for now, just the error).
regr = svm.SVR()
print('Fitting...')
regr.fit(data0, data1)
clf1= regr.fit(sample_rate1,sample_rate0)
clf0 = regr.fit(data,data1)
print('Done!')
predata = clf.predict(data2)
predrate = clf1.predict(sample_rate2)
wavfile.write('result.wav',predrate,predata)# using predicted ndarrays it saves the audio file
The error which I get is:
Traceback (most recent call last):
File "D:\ Folder\Python\module properties\wav.py", line 10, in <module>
regr.fit(data0, data1)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\svm\_base.py", line 169, in fit
X, y = self._validate_data(X, y, dtype=np.float64,
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\base.py", line 433, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 826, in check_X_y
y = column_or_1d(y, warn=True)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 864, in column_or_1d
raise ValueError(
ValueError: y should be a 1d array, got an array of shape (8960, 2) instead.
Check your independent and dependent variable X and y assignments.
The 'fit' function is used in the form model.fit(X,y), and the code that establishes the model fit and that gave you the error seems to be:
regr.fit(data0, data1)
Thus your predictor variables as written should be X = data0, and your target (output) variable should be y = data1.
Make sure you don't have it in reverse, and it shouldn't be:
regr.fit(data1, data0)
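If the data is correctly assigned, look at the ValueError itself: "y should be a 1d array, got an array of shape (8960, 2) instead."
The target passed to fit must be one-dimensional, so try flattening the array.
Flattening means converting a multidimensional array to a one-dimensional array. Try reshape(-1). Note that flattening a (8960, 2) array yields 17920 values, so X must then have 17920 rows as well.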
data1 = data1.reshape(-1)
I hope this helps! Without any additional information about the dataset and the model's code, it's hard to figure out what to do next.
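As a rough illustration of what the shapes look like, here is a sketch with a zero-filled placeholder for data1. It assumes the two columns are the left and right channels of a stereo file, which the question does not state. Taking one column or the column mean are alternatives to reshape(-1) that keep the sample count at 8960.

```python
import numpy as np

# Placeholder standing in for data1: 8960 samples, 2 columns.
data1 = np.zeros((8960, 2))

flattened = data1.reshape(-1)     # all 17920 values, row by row
first_column = data1[:, 0]        # one column only, 8960 values
column_mean = data1.mean(axis=1)  # average of both columns, 8960 values

print(flattened.shape, first_column.shape, column_mean.shape)
# (17920,) (8960,) (8960,)
```

Whichever form is used for y, its length has to match the number of rows in X.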

df.plot fails after pandas upgrade to v 1.0.1

I was using pandas 0.23.4 and just upgraded to 1.0.1.
I have code that generates a DataFrame, which I plot as a stacked bar plot with df.plot(kind='bar') and as an area plot with df.plot.area(). It was working fine. I decided to upgrade pandas and now neither of the plot commands works. Here is an example:
df=pd.DataFrame()
df["col1"]=[0.7,0.2,0.1,0.0]
df["col2"]=[0.1,0.5,0.2,0.2]
df['col3']=[0.1,0.0,0.1,0.8]
df.plot.area()
This gives the error TypeError: float() argument must be a string or a number, not '_NoValueType'.
I don't know how to fix this. I would appreciate any help.
Thanks!
EDIT: Full error message:
Traceback (most recent call last):
File "<ipython-input-96-b436d7233c8a>", line 1, in <module>
df.plot.area()
File "C:\Users\Anaconda3\lib\site-packages\pandas\plotting\_core.py", line 1363, in area
return self(kind="area", x=x, y=y, **kwargs)
File "C:\Users\Anaconda3\lib\site-packages\pandas\plotting\_core.py", line 847, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File "C:\Users\Anaconda3\lib\site-packages\pandas\plotting\_matplotlib\__init__.py", line 61, in plot
plot_obj.generate()
File "C:\Users\Anaconda3\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 262, in generate
self._setup_subplots()
File "C:\Users\Anaconda3\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 321, in _setup_subplots
axes = fig.add_subplot(111)
File "C:\Users\Anaconda3\lib\site-packages\matplotlib\figure.py", line 1257, in add_subplot
a = subplot_class_factory(projection_class)(self, *args, **kwargs)
File "C:\Users\Anaconda3\lib\site-packages\matplotlib\axes\_subplots.py", line 74, in __init__
self.update_params()
File "C:\Users\Anaconda3\lib\site-packages\matplotlib\axes\_subplots.py", line 136, in update_params
return_all=True)
File "C:\Users\Anaconda3\lib\site-packages\matplotlib\gridspec.py", line 467, in get_position
fig_bottom = fig_bottoms[rows].min()
File "C:\Users\Anaconda3\lib\site-packages\numpy\core\_methods.py", line 32, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial)
TypeError: float() argument must be a string or a number, not '_NoValueType'
Okay, I rebooted my computer and now everything works. No idea what was wrong before!

Memory error when computing eigenvalues in python

I am trying to find the eigenvalues of the adjacency matrix of a large graph (465,017 nodes, 834,797 edges) using the NetworkX adjacency_spectrum method. When I run the script I get a memory error.
Traceback (most recent call last):
File "5.py", line 19, in <module>
w=nx.adjacency_spectrum(G)
File "/home/aiym/anaconda3/lib/python3.5/site-packages/networkx/linalg/spectrum.py", line 75, in adjacency_spectrum
return eigvals(nx.adjacency_matrix(G,weight=weight).todense())
File "/home/aiym/anaconda3/lib/python3.5/site-packages/scipy/sparse/base.py", line 691, in todense
return np.asmatrix(self.toarray(order=order, out=out))
File "/home/aiym/anaconda3/lib/python3.5/site-packages/scipy/sparse/compressed.py", line 920, in toarray
return self.tocoo(copy=False).toarray(order=order, out=out)
File "/home/aiym/anaconda3/lib/python3.5/site-packages/scipy/sparse/coo.py", line 252, in toarray
B = self._process_toarray_args(order, out)
File "/home/aiym/anaconda3/lib/python3.5/site-packages/scipy/sparse/base.py", line 1009, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
Can you help me fix this problem, or suggest other methods to compute eigenvalues without a memory error?
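The traceback shows the failure inside todense(), which allocates a full n x n array. A rough sketch of the arithmetic follows, together with one possible alternative: scipy.sparse.linalg.eigsh, which works on the sparse matrix directly but returns only a few eigenvalues, not the whole spectrum. The assumptions here are 8 bytes per matrix entry, an undirected graph (so the matrix is symmetric), and a small random matrix standing in for the real one, which in NetworkX would come from nx.adjacency_matrix(G).

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import eigsh

# Size of the dense matrix that todense() tries to allocate.
n_nodes = 465_017
bytes_per_entry = 8  # assumption: 64-bit entries
dense_bytes = n_nodes * n_nodes * bytes_per_entry
print(dense_bytes / 1e12, "TB")  # roughly 1.7 TB

# Small random symmetric 0/1 matrix as a stand-in for the real adjacency matrix.
rng = np.random.default_rng(0)
n = 200
rows = rng.integers(0, n, size=400)
cols = rng.integers(0, n, size=400)
upper = coo_matrix((np.ones(400), (rows, cols)), shape=(n, n))
adj = ((upper + upper.T) > 0).astype(np.float64).tocsr()

# The five algebraically largest eigenvalues, computed without densifying.
largest = eigsh(adj, k=5, which="LA", return_eigenvectors=False)
print(np.sort(largest))
```

For a directed graph the matrix is not symmetric, and scipy.sparse.linalg.eigs would be the corresponding routine.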

Dask Memory Error with smaller than memory dataset

I am trying to quickly calculate the correlation matrix of a large dataset (745 rows x 18,048 columns) in Python 3. The dataset was originally read from a 50MB netCDF file and after some manipulation it came to this size. All of the data is stored as float32s. Therefore, I calculated that the final correlation matrix should be around 1.2 GB, which should easily fit into my 8 GB RAM. Using pandas' DataFrame and its methods, it can calculate the entire correlation matrix in around 100 minutes so it is possible to calculate it.
I read up on the dask module and decided to implement it. However, when I try to calculate using the same method, it almost immediately runs into a MemoryError, even though it should fit into memory. After some fiddling I realized it even fails on a relatively small 1000 x 1000 dataset. Is there something else that's going on underneath that is causing this error? I have posted my code below:
import dask.dataframe as ddf
# prepare data in dataframe called df
daskdf = ddf.from_pandas(df, chunksize=1000)
correlation = daskdf.corr()
correlation.compute()
And here's the error trace:
Traceback (most recent call last):
File "C:/Users/mughi/Documents/College Stuff/Project DIVA/Preprocessing Data/DaskCorr.py", line 36, in <module>
correlation.compute()
File "C:\Users\mughi\AppData\Local\Programs\Python\Python36\lib\site-packages\dask\base.py", line 94, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "C:\Users\mughi\AppData\Local\Programs\Python\Python36\lib\site-packages\dask\base.py", line 201, in compute
results = get(dsk, keys, **kwargs)
File "C:\Users\mughi\AppData\Local\Programs\Python\Python36\lib\site-packages\dask\threaded.py", line 76, in get
**kwargs)
File "C:\Users\mughi\AppData\Local\Programs\Python\Python36\lib\site-packages\dask\async.py", line 500, in get_async
raise(remote_exception(res, tb))
dask.async.MemoryError:
Traceback
---------
File "C:\Users\mughi\AppData\Local\Programs\Python\Python36\lib\site-packages\dask\async.py", line 266, in execute_task
result = _execute_task(task, data)
File "C:\Users\mughi\AppData\Local\Programs\Python\Python36\lib\site-packages\dask\async.py", line 247, in _execute_task
return func(*args2)
File "C:\Users\mughi\AppData\Local\Programs\Python\Python36\lib\site-packages\dask\dataframe\core.py", line 3283, in cov_corr_chunk
keep = np.bitwise_and(mask[:, None, :], mask[:, :, None])
Thank you!
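One thing the traceback does show is the line that fails: it broadcasts a mask against itself, which produces an array of shape (rows, columns, columns). The sketch below reproduces that expression on a tiny array and works out the size for the shapes mentioned in the question. It assumes mask is a boolean array with one row per row of the chunk and one column per DataFrame column, which is inferred from the traceback and not confirmed. The helper name pairwise_mask_bytes is made up for illustration.

```python
import numpy as np


def pairwise_mask_bytes(n_rows, n_cols):
    """Bytes needed for a boolean array of shape (n_rows, n_cols, n_cols)."""
    return n_rows * n_cols * n_cols  # numpy booleans take one byte each


# The same broadcasting expression as in the traceback, on a tiny mask.
small_mask = np.ones((4, 3), dtype=bool)
keep = np.bitwise_and(small_mask[:, None, :], small_mask[:, :, None])
print(keep.shape)  # (4, 3, 3)

# Sizes for the two cases described in the question.
print(pairwise_mask_bytes(1000, 1000) / 1e9, "GB")   # 1.0 GB
print(pairwise_mask_bytes(745, 18048) / 1e9, "GB")   # roughly 243 GB
```

Under that assumption the intermediate array, not the final correlation matrix, is what exhausts memory, and its size grows with the square of the column count.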
