Histogram plots over multiple time series - Python

I have a pandas data set where I group the data by day. I would like to take this data and plot a histogram for each day on the same plot, offset to the day on which the data occurred. I researched this and someone suggested pcolor as a good way to do it.
Here is a link to some example data:
http://pastebin.com/rKzj5Qzf
I attempted to use the lambda function from the post below, which creates a Series. pcolor does not like this Series and says it needs more than 1 value to unpack.
stackoverflow.com/questions/17050202/plot-timeseries-of-histograms-in-python
Does anyone know what I am doing wrong?
EDIT:
The Series 'df' referred to below comes from running the following code snippet:
daily = x1.groupby(x1.date).price
f = lambda x: pd.Series(np.histogram(x, bins=bins)[0], index=bins[:-1])
df = daily.apply(f)
Once I do this, I attempt to pass it to matplotlib:
import matplotlib.pyplot as plt
plt.pcolor(df.T)
This is where I get the problem. I clearly have 3 items: date, price, count.
EDIT: Traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-460b943e4ead> in <module>()
----> 1 plt.pcolor(df.T)
/usr/lib/pymodules/python2.7/matplotlib/pyplot.pyc in pcolor(*args, **kwargs)
2926 ax.hold(hold)
2927 try:
-> 2928 ret = ax.pcolor(*args, **kwargs)
2929 draw_if_interactive()
2930 finally:
/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in pcolor(self, *args, **kwargs)
7543 shading = kwargs.pop('shading', 'flat')
7544
-> 7545 X, Y, C = self._pcolorargs('pcolor', *args, allmatch=False)
7546 Ny, Nx = X.shape
7547
/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in _pcolorargs(funcname, *args, **kw)
7339 if len(args) == 1:
7340 C = args[0]
-> 7341 numRows, numCols = C.shape
7342 if allmatch:
7343 X, Y = np.meshgrid(np.arange(numCols), np.arange(numRows))
ValueError: need more than 1 value to unpack
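pcolor needs a 2-D array, but groupby(...).apply(f) with a lambda that returns a Series can give back a one-dimensional Series with a (date, bin) MultiIndex, and then C.shape has only one value to unpack. A minimal sketch of the reshaping (assuming the daily, f and bins from the snippet above; if your pandas version already returns a DataFrame, skip the unstack()):
import matplotlib.pyplot as plt

counts = daily.apply(f).unstack()   # rows: dates, columns: histogram bins
plt.pcolor(counts.T.values)         # 2-D array of counts, one column per day
plt.show()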


Interpolate_to_grid returns all nans

Practicing with MetPy Monday's interpolate_to_grid for METAR data, I successfully got the MSLP grid to work.
Moving on to potential temperature, the result has been all NaN when it "works". When it doesn't work, I get a set of errors that don't appear to help:
import numpy as np
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
from siphon.catalog import TDSCatalog
from metpy.io import parse_metar_file
from metpy.interpolate import interpolate_to_grid, remove_nan_observations
from metpy.plots import add_metpy_logo, current_weather, sky_cover, StationPlot
from metpy.calc import wind_components, wet_bulb_temperature, altimeter_to_station_pressure,potential_temperature,gradient
from metpy.units import units
from datetime import datetime,timedelta
import pandas as pd
mapcrs = ccrs.LambertConformal(central_longitude=-100.,central_latitude=35.,standard_parallels=(30.,60.))
datacrs = ccrs.PlateCarree()
cat = TDSCatalog('https://thredds-test.unidata.ucar.edu/thredds/catalog/noaaport/text/metar/catalog.xml')
ds = cat.datasets[-4]
dattim = ds.name[6:14]+' '+ds.name[15:19]
ds.download()
df = parse_metar_file(ds.name)
#pandas dataframe
#df.head()
df.columns.values
extent = [-120,-72,24,50]
df = df.dropna(subset=['latitude','longitude','elevation','altimeter','air_temperature','eastward_wind','northward_wind','air_pressure_at_sea_level','dew_point_temperature'])
lon = df['longitude'].values
lat = df['latitude'].values
stn_ids = df['station_id'].values
elev = df['elevation'].values
altimeter = df['altimeter'].values
t2 = df['air_temperature'].values
mslp = df['air_pressure_at_sea_level'].values
#projected coords
xp, yp, _ = mapcrs.transform_points(datacrs,lon,lat).T # x,y returned
#mslp WORKS
x_masked, y_masked, mslp = remove_nan_observations(xp,yp,mslp)
#altgridx,altgridy,alt = interpolate_to_grid(x_masked,y_masked,alt, interp_type='cressman')
altgridx,altgridy,mslp = interpolate_to_grid(x_masked,y_masked,mslp, interp_type='barnes',gamma=.5,kappa_star=10, hres=25000)
#Potential temperature doesn't work
pres = altimeter_to_station_pressure(altimeter * units('mbar'), elev * units('m'))*33.8639
print(pres)
# theta
x_masked, y_masked, temp = remove_nan_observations(xp,yp,t2*units('degC'))
x_masked, y_masked, pres = remove_nan_observations(xp,yp,pres)
print(np.size(temp))
potemp = potential_temperature(pres, temp)
print(np.size(potemp))
print(np.unique(np.array(potemp)))
grdx = 75000.
thgridx,thgridy,theta = interpolate_to_grid(x_masked,y_masked, potemp, interp_type='barnes',kappa_star=6, gamma=0.5,hres=grdx)
print(np.shape(thgridx))
print(np.unique(theta))
Here is what is returned from the last section:
[949.361081708803 993.4468013877739 987.2845093729651 ... 1029.0930108008558 1016.002484792407 930.3708063382303] millibar
5837
5837
[236.32885315 237.21299941 239.04372591 ... 368.37047837 369.20079652
370.76269267]
---------------------------------------------------------------------------
DimensionalityError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pint/quantity.py in __float__(self)
896 return float(self._convert_magnitude_not_inplace(UnitsContainer()))
--> 897 raise DimensionalityError(self._units, "dimensionless")
898
DimensionalityError: Cannot convert from 'kelvin' to 'dimensionless'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/var/folders/5n/sg5k98bx6gg4flb4fskykh4m0000gn/T/ipykernel_41626/379842406.py in <module>
11
12 grdx = 75000.
---> 13 thgridx,thgridy,theta = interpolate_to_grid(x_masked,y_masked, potemp, interp_type='barnes',kappa_star=6, gamma=0.5,hres=grdx)
14 print(np.shape(thgridx))
15 print(np.unique(theta))
~/miniconda3/lib/python3.7/site-packages/metpy/pandas.py in wrapper(*args, **kwargs)
19 kwargs = {name: (v.values if isinstance(v, pd.Series) else v)
20 for name, v in kwargs.items()}
---> 21 return func(*args, **kwargs)
22 return wrapper
~/miniconda3/lib/python3.7/site-packages/metpy/interpolate/grid.py in interpolate_to_grid(x, y, z, interp_type, hres, minimum_neighbors, gamma, kappa_star, search_radius, rbf_func, rbf_smooth, boundary_coords)
301 minimum_neighbors=minimum_neighbors, gamma=gamma,
302 kappa_star=kappa_star, search_radius=search_radius,
--> 303 rbf_func=rbf_func, rbf_smooth=rbf_smooth)
304
305 return grid_x, grid_y, img.reshape(grid_x.shape)
~/miniconda3/lib/python3.7/site-packages/metpy/interpolate/points.py in interpolate_to_points(points, values, xi, interp_type, minimum_neighbors, gamma, kappa_star, search_radius, rbf_func, rbf_smooth)
365 return inverse_distance_to_points(points, values, xi, search_radius, gamma, kappa,
366 min_neighbors=minimum_neighbors,
--> 367 kind=interp_type)
368
369 # If this is radial basis function, make the interpolator and apply it
~/miniconda3/lib/python3.7/site-packages/metpy/interpolate/points.py in inverse_distance_to_points(points, values, xi, r, gamma, kappa, min_neighbors, kind)
268 img[idx] = cressman_point(dists, values_subset, r)
269 elif kind == 'barnes':
--> 270 img[idx] = barnes_point(dists, values_subset, kappa, gamma)
271
272 else:
ValueError: setting an array element with a sequence.
I struggled with units, but I think the units are correct now. What could be causing this?
I tried Cressman, I tried a larger Barnes grid, and I tried making sure search_radius was large. Still all NaN whenever it did run.
The problem is caused by interpolate_to_grid choking on units when using Cressman or Barnes, which we definitely need to fix. For now the solution is either to use a different interpolation method (like interp_type='linear', the default) or to strip the units before calling:
thgridx, thgridy, theta = interpolate_to_grid(x_masked, y_masked, potemp.magnitude,
                                              interp_type='barnes', kappa_star=6,
                                              gamma=0.5, hres=grdx)
theta = units.Quantity(theta, 'K')
As far as your problem with NaNs is concerned, you may want to look at the search_radius parameter, which controls the maximum distance from a target grid point within which observations are considered. In data-sparse areas this could cause some drop-outs. By default it uses a guess of 5 times the average distance from one ob point to its nearest neighbor.
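For example, a minimal sketch combining both suggestions (the 300 km search_radius is an illustrative guess, not a tuned value):
thgridx, thgridy, theta = interpolate_to_grid(x_masked, y_masked, potemp.magnitude,
                                              interp_type='barnes', kappa_star=6, gamma=0.5,
                                              hres=grdx, search_radius=300000)  # meters; widen for sparse networks
theta = units.Quantity(theta, 'K')  # reattach the units stripped above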

Getting an error with shap plotting

import shap
from catboost import CatBoostClassifier

X = df.copy()
# Save and drop labels
y = df['class']
X = X.drop('class', axis=1)
cat_features = list(range(0, X.shape[1]))
model = CatBoostClassifier(iterations=2000, learning_rate=0.1, random_seed=12)
model.fit(X, y, verbose=False, plot=False)
explainer = shap.Explainer(model)
shap_values = explainer(X)
shap.force_plot(explainer.expected_value, shap_values[0:5,:],X.iloc[0:5,:], plot_cmap="DrDb")
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-170-ba1eca12b9ed> in <module>
----> 1 shap.force_plot(10, shap_values[0:5,:],X.iloc[0:5,:], plot_cmap="DrDb")
~\anaconda3\lib\site-packages\shap\plots\_force.py in force(base_value, shap_values, features, feature_names, out_names, link, plot_cmap, matplotlib, show, figsize, ordering_keys, ordering_keys_time_format, text_rotation, contribution_threshold)
101
102 if type(shap_values) != np.ndarray:
--> 103 return visualize(shap_values)
104
105 # convert from a DataFrame or other types
~\anaconda3\lib\site-packages\shap\plots\_force.py in visualize(e, plot_cmap, matplotlib, figsize, show, ordering_keys, ordering_keys_time_format, text_rotation, min_perc)
343 return AdditiveForceArrayVisualizer(e, plot_cmap=plot_cmap, ordering_keys=ordering_keys, ordering_keys_time_format=ordering_keys_time_format)
344 else:
--> 345 assert False, "visualize() can only display Explanation objects (or arrays of them)!"
346
347 class BaseVisualizer:
AssertionError: visualize() can only display Explanation objects (or arrays of them)!
I was trying to plot with shap and my data, but I got an error and I don't understand why. I haven't found anything about this. How can I avoid this error?
explainer.expected_value
-5.842052267820879
You should change the last line to this:
shap.force_plot(explainer.expected_value, shap_values.values[0:5,:], X.iloc[0:5,:], plot_cmap="DrDb")
That is, pass shap_values.values instead of just shap_values, because shap_values is an Explanation object that holds the Shapley values, the base_values, and the data. I had the same problem until I inspected the variable.
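A quick sketch of that inspection (attribute names as in recent shap releases):
print(type(shap_values))             # an Explanation object
print(shap_values.values.shape)      # the Shapley values, (n_samples, n_features)
print(shap_values.base_values[:5])   # per-row expected values
print(shap_values.data[:2])          # the feature values that were explained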

A squared variable is outside the index

(A variation of this post, without the detailed traceback, was posted on SO about two hours ago. This version contains the whole traceback.)
I am running statsmodels to get parameter estimates from ordinary least squares (OLS). The data-processing and model-specific commands are shown below. When I use import statsmodels.formula.api as sm as the operative API, the OLS works as desired (after I drop some 15 rows programmatically), giving intuitive results. But when I switch to import statsmodels.api as sm as the binding API, with almost no changes to the code, things fall apart, and the Python interpreter raises an error saying that 'inc_2 is not in the index'. Mind you, inc_2 was computed after the dataframe was read in for both model runs, and yet the run was successful in the first but not in the second. (BTW, p_c_inc_18 is per-capita income, and inc_2 is that variable squared; inc_2 is the offending element in the second run.)
import pandas as pd
import numpy as np
import statsmodels.api as sm
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
eg = pd.read_csv(r'C:/../../../une_edu_pipc_06.csv')
pd.options.display.precision = 3
plt.rc("figure", figsize=(16,8))
plt.rc("font", size=14)
sm_col = eg["lt_hsd_17"] + eg["hsd_17"]
eg["ut_hsd_17"] = sm_col
sm_col2 = eg["sm_col_17"] + eg["col_17"]
eg["bnd_hsd_17"] = sm_col2
eg["d_09"]= eg["Rate_09"]-eg["Rate_06"]
eg["d_10"]= eg["Rate_10"]-eg["Rate_06"] inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
X = eg[["p_c_inc_18","ut_hsd_17","d_10","inc_2"]]
y = eg["Rate_18"]
X = sm.add_constant(X)
mod = sm.OLS(y, X)
res = mod.fit()
print(res.summary())
Here is the traceback in full.
KeyError Traceback (most recent call last)
<ipython-input-21-e2f4d325145e> in <module>
17 eg["d_10"]= eg["Rate_10"]-eg["Rate_06"]
18 inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
---> 19 X = eg[["p_c_inc_18","ut_hsd_17","d_10","inc_2"]]
20 y = eg["Rate_18"]
21 X = sm.add_constant(X)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2804 if is_iterator(key):
2805 key = list(key)
-> 2806 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
2807
2808 # take() does not accept boolean indexers
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1550 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1551
-> 1552 self._validate_read_indexer(
1553 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
1554 )
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1644 if not (self.name == "loc" and not raise_missing):
1645 not_found = list(set(key) - set(ax))
-> 1646 raise KeyError(f"{not_found} not in index")
1647
1648 # we skip the warning on Categorical/Interval
KeyError: "['inc_2'] not in index"
What am I doing wrong?
The syntax you used insists that every string in that list is a column of eg. If you print(eg.columns), you'll see there is no "inc_2" column: inc_2 was computed as a standalone Series and never assigned back into the dataframe, which is why the selection raises a KeyError. If you want to build X from the individual Series, use the standalone inc_2 directly:
X = [
    eg["p_c_inc_18"],
    eg["ut_hsd_17"],
    eg["d_10"],
    inc_2
]
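Alternatively, a minimal sketch of the other route: assign the squared term as a column first, so the original eg[[...]] selection works unchanged:
eg["inc_2"] = eg["p_c_inc_18"] ** 2    # make inc_2 an actual column of the dataframe
X = eg[["p_c_inc_18", "ut_hsd_17", "d_10", "inc_2"]]
X = sm.add_constant(X)
res = sm.OLS(eg["Rate_18"], X).fit()
print(res.summary())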

What to do with: ValueError: x and y can be no greater than 2-D, but have shapes (1L,) and (1L, 28L, 28L)

My goal is to plot a profile of a 2D FITS file that I already ask the user for with a try/except. I want to slice a part of this FITS file and then plot that part. I will just put my whole code here. I don't think there is anything wrong with the try/except part, but the slicing and plotting have some errors.
My main problem is that the program gives me the following error: ValueError: x and y can be no greater than 2-D, but have shapes (1L,) and (1L, 28L, 28L)
import os
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    if f.endswith(".fits"):
        print(f)
which gives me the two FITS files I have in my folder.
from astropy.io import fits
valid = False
while not valid:
    filename = input("Name of 2D FITS file:")
    hdulist = fits.open(filename)
    hdr = hdulist[0].header
    try:
        naxis = hdr['NAXIS']
        valid = (naxis == 2)
        if not valid:
            print("Data structure is not two dimensional")
    except Exception as e:
        print(e)
        valid = False
    print("Valid input:", valid)
hdulist = fits.open(filename)
hdr = hdulist[0].header
Here ends my try/except and starts the messy part where I try to slice the FITS file along an angle.
from matplotlib.pyplot import figure, show
from astropy.io import fits
data=hdulist[0].data
B=input("Enter the y value of the pixel defined as your starting value of the slice B=")
C=input("Enter the y value of the pixel defined as your stopping value of the slice C=")
D=input("Enter the x value of the pixel defined as your starting value of the slice D=")
E=input("Enter the x value of the pixel defined as your stopping value of the slice E=")
A = data[B:C,D:E]
astd = A.std()
print(astd)
fig = figure()
frame = fig.add_subplot(1,1,1)
mappable = frame.imshow(A, interpolation='none', origin="lower", cmap='jet', vmin=1500, vmax=8000)
cbar = fig.colorbar(mappable, ax=frame, fraction=0.046, pad=0.04)
show()
#the figure below always adjusts the axis such that the starting value is in the
#lower left corner and the stopping value in the upper right corner
import math
R= math.degrees(math.atan((C-B)/(E-D)))
L= ((E-D)**2+(C-B)**2)**(0.5)
print("The line from the starting to the stopping point has a length of:", L)
print("The line from the starting to the stopping point has an angle in degrees of:", R)
F=int(L)
print(F)
This gives me a slice; now I want to put this in a profile and plot the whole thing.
# construct interpolation function
import numpy
import scipy
from scipy import interpolate
x = numpy.arange(data.shape[1])
y = numpy.arange(data.shape[0])
f = scipy.interpolate.interp2d(x, y, data)
# extract values on line from D, B to E, C (degrees are not even necessary)
num_points = F
xvalues = numpy.linspace(D, E, num_points)
yvalues = numpy.linspace(B, C, num_points)
zvalues = f(xvalues, yvalues)
print(zvalues)
import numpy as np
from numpy import random
from matplotlib.pyplot import figure, show
from scipy import stats
from numpy import pi
profile=np.array([zvalues])
This is where I start to design the plot. I reused a previous design where velocity was used, hence the name "vels".
#Plot of profile
vels = np.linspace(0, 100, len(profile))
gam = profile.sum()
XY = (vels*profile).sum()
X0 = XY/gam
var = (1/gam)*(profile*(vels-X0)**2).sum()
sigma = var**0.5
skew = (1/gam)*(profile*(vels-X0)**3).sum()/sigma**3
kurt = (1/gam)*(profile*(vels-X0)**4).sum()/sigma**4
print("Mean:", X0)
print("Standard deviation:", sigma)
print("Skewness:", skew)
print("Fisher Kurtosis:", kurt-3)
a = profile.max()
N = 1000
def f(x, mu, sigma, a):
    return a * np.exp(-(x - mu)**2 / (2 * sigma**2))
fig = figure()
frame = fig.add_subplot(1,1,1)
frame.plot(vels, profile,)
frame.plot(vels,f(vels,X0, sigma,profile.max()), color="pink")
frame.set_xlabel('x-axis')
frame.set_ylabel('y-axis')
frame.grid(True)
show()
This is what the program returns once I try to execute it.
('Mean:', 0.0)
('Standard deviation:', 0.0)
('Skewness:', nan)
('Fisher Kurtosis:', nan)
C:\Users\Thomas\Anaconda2\lib\site-packages\ipykernel\__main__.py:10: RuntimeWarning: invalid value encountered in double_scalars
C:\Users\Thomas\Anaconda2\lib\site-packages\ipykernel\__main__.py:11: RuntimeWarning: invalid value encountered in double_scalars
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-80786b6f31c3> in <module>()
24 fig = figure()
25 frame = fig.add_subplot(1,1,1)
---> 26 frame.plot(vels, profile,)
27 frame.plot(vels,f(vels,X0, sigma,profile.max()), color="pink")
28 frame.set_xlabel('x-axis')
C:\Users\Thomas\Anaconda2\lib\site-packages\matplotlib\__init__.pyc in inner(ax, *args, **kwargs)
1889 warnings.warn(msg % (label_namer, func.__name__),
1890 RuntimeWarning, stacklevel=2)
-> 1891 return func(ax, *args, **kwargs)
1892 pre_doc = inner.__doc__
1893 if pre_doc is None:
C:\Users\Thomas\Anaconda2\lib\site-packages\matplotlib\axes\_axes.pyc in plot(self, *args, **kwargs)
1404 kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
1405
-> 1406 for line in self._get_lines(*args, **kwargs):
1407 self.add_line(line)
1408 lines.append(line)
C:\Users\Thomas\Anaconda2\lib\site-packages\matplotlib\axes\_base.pyc in _grab_next_args(self, *args, **kwargs)
405 return
406 if len(remaining) <= 3:
--> 407 for seg in self._plot_args(remaining, kwargs):
408 yield seg
409 return
C:\Users\Thomas\Anaconda2\lib\site-packages\matplotlib\axes\_base.pyc in _plot_args(self, tup, kwargs)
383 x, y = index_of(tup[-1])
384
--> 385 x, y = self._xy_from_xy(x, y)
386
387 if self.command == 'plot':
C:\Users\Thomas\Anaconda2\lib\site-packages\matplotlib\axes\_base.pyc in _xy_from_xy(self, x, y)
245 if x.ndim > 2 or y.ndim > 2:
246 raise ValueError("x and y can be no greater than 2-D, but have "
--> 247 "shapes {} and {}".format(x.shape, y.shape))
248
249 if x.ndim == 1:
ValueError: x and y can be no greater than 2-D, but have shapes (1L,) and (1L, 28L, 28L)
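The shapes in the error message point at the cause: interp2d evaluated with two length-N arrays returns an N x N grid, and wrapping that result in np.array([...]) adds another axis, so profile has shape (1, 28, 28) while vels = np.linspace(0, 100, len(profile)) has length 1. A minimal standalone sketch of the reshaping (synthetic data standing in for the FITS image, and interp2d kept to match the original code even though recent SciPy deprecates it):
import numpy as np
from scipy import interpolate
from matplotlib.pyplot import figure, show

data = np.random.rand(100, 100)       # stand-in for hdulist[0].data
x = np.arange(data.shape[1])
y = np.arange(data.shape[0])
f = interpolate.interp2d(x, y, data)

# stand-ins for the D..E and B..C pixel ranges from the question
xvalues = np.linspace(10, 40, 28)
yvalues = np.linspace(20, 60, 28)

# evaluate point by point along the line so the profile is 1-D
profile = np.array([f(xv, yv)[0] for xv, yv in zip(xvalues, yvalues)])   # shape (28,)

vels = np.linspace(0, 100, len(profile))
fig = figure()
frame = fig.add_subplot(1, 1, 1)
frame.plot(vels, profile)             # both arguments 1-D now, no ValueError
show()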

plotting rolling_mean (pandas) not working

I know there are a bunch of questions on here regarding the use of the rolling_mean function in pandas, but I can't get it to work.
I'm new to Python and to packages like numpy and pandas.
I have a list of datetime objects and a list of simple integers. Plotting them works without a problem, but I can't add a moving-average line to the graph!
I'm trying to use the pandas docs to understand it, but I still can't get it to work. This is what I have tried:
# code before this simply reads a file and converts dates into datetime objects and values into a list
x1 = date_object #list of datetime objects
y1 = values1 #list of integer values
plot(x1,y1) # works fine
t = pd.date_range(date_object[0].strftime('%m/%d/%Y'), date_object[len(date_object)-1].strftime('%m/%d/%Y'), freq='W')
ts = pd.Series(y1, t)
ts_movavg = pd.rolling_mean(ts,10)
plot(ts_movavg)
...and I get the following error:
ValueError: setting an array element with a sequence.
As you can probably quickly tell, I'm very confused. I think I'm missing the point of the Series object.
EDIT: (full traceback)
ValueError Traceback (most recent call last)
<ipython-input-228-2247062d3126> in <module>()
33 ts = pd.Series(y1, x1)
34
---> 35 ts_movavg = PD.rolling_mean(ts,10)
36
37 ts_movavg.head()
C:\Users\****\Anaconda\lib\site-packages\pandas\stats\moments.py in f(arg, window, min_periods, freq, center, time_rule, **kwargs)
507 return _rolling_moment(arg, window, call_cython, min_periods,
508 freq=freq, center=center,
--> 509 time_rule=time_rule, **kwargs)
510
511 return f
C:\Users\****\Anaconda\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, time_rule, **kwargs)
278 arg = _conv_timerule(arg, freq, time_rule)
279 calc = lambda x: func(x, window, minp=minp, **kwargs)
--> 280 return_hook, values = _process_data_structure(arg)
281 # actually calculate the moment. Faster way to do this?
282 if values.ndim > 1:
C:\Users\****\Anaconda\lib\site-packages\pandas\stats\moments.py in _process_data_structure(arg, kill_inf)
326
327 if not issubclass(values.dtype.type, float):
--> 328 values = values.astype(float)
329
330 if kill_inf:
ValueError: setting an array element with a sequence.
Could someone show me how to plot a moving average line using the rolling_mean function?
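A minimal sketch of what usually resolves this (assuming y1 is meant to hold one plain number per date; on modern pandas, rolling_mean has been replaced by Series.rolling):
import pandas as pd
import matplotlib.pyplot as plt

# The index and the values must be the same length, and the values must be
# scalars; nested lists inside y1 are what trigger "setting an array element
# with a sequence" inside rolling_mean.
ts = pd.Series(y1, index=pd.DatetimeIndex(x1)).astype(float)

ts_movavg = pd.rolling_mean(ts, 10)      # modern pandas: ts.rolling(10).mean()

plt.plot(ts.index, ts.values, label='data')
plt.plot(ts_movavg.index, ts_movavg.values, label='10-point moving average')
plt.legend()
plt.show()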
