Making a sns.pairplot using scikit wine dataset

Making a sns.pairplot using scikit wine dataset - python

This seems simple enough, but I can't find a solution online.
I am trying to create an sns.pairplot in Python. I have downloaded the wine dataset, kept the features that I need, and run the plot.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
# Load the wine dataset
wine = datasets.load_wine()
wine = list(zip(wine.data, wine.target))
wine = load_wine()
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
wine = load_wine
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
#This is the code that should run the plot
b=sns.pairplot(df, vars = df.columns[1 :], hue = "target", height = 2.5)
But I get this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'target'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-108-1107acc27949> in <module>
----> 1 b=sns.pairplot(df, vars = df.columns[1 :], hue = "target", height = 2.5)
2
3 plt.show()
~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~\anaconda3\lib\site-packages\seaborn\axisgrid.py in pairplot(data, hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers, height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
1923 # Set up the PairGrid
1924 grid_kws.setdefault("diag_sharey", diag_kind == "hist")
-> 1925 grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue,
1926 hue_order=hue_order, palette=palette, corner=corner,
1927 height=height, aspect=aspect, dropna=dropna, **grid_kws)
~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~\anaconda3\lib\site-packages\seaborn\axisgrid.py in __init__(self, data, hue, hue_order, palette, hue_kws, vars, x_vars, y_vars, corner, diag_sharey, height, aspect, layout_pad, despine, dropna, size)
1212 index=data.index)
1213 else:
-> 1214 hue_names = categorical_order(data[hue], hue_order)
1215 if dropna:
1216 # Filter NA from the list of unique hue names
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'target'
The solution linked to this question: How to convert a Scikit-learn dataset to a Pandas dataset unfortunately doesn't seem to work here.
I also tried 'class' instead of target. Could it be that the 'zip' function isn't working correctly above, so the program can't identify 'target'?
Thank you in advance!

From what you typed it works like this.
from sklearn.datasets import load_iris
wine = load_wine
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
#This is the code that should run the plot
b=sns.pairplot(df, vars = df.columns[1 :], height = 2.5)
The question is how do you want to highlight features and why?
You cut alcohol from the list so the target simply won't be aligned.
Second thing is that it's feature wise pairplot not target/class.
So all in all I don't understand what you are trying to do here

Related

Linear regression (Plotting a regression line)

I am trying to plot a regression line for my assignment as shown below.
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
sb.set()
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
#Structure of data that I am working with
hsedata = pd.read_csv('train.csv')
salePrice = pd.DataFrame(hsedata['SalePrice'])
grLivArea = pd.DataFrame(hsedata['GrLivArea'])
SPtrain, SPtest, LAtrain, LAtest = train_test_split(salePrice, grLivArea, test_size = 0.2)
SPtrain Output
LAtrain Output
#Linear Regression
linreg = LinearRegression()
linreg.fit(LAtrain, SPtrain)
print('Intercept: ', linreg.intercept_)
print('Gradient: ', linreg.coef_)
Intercept: [22485.24894138]
Gradient: [[103.92233101]]
salePricePredict = pd.DataFrame(linreg.predict(LAtrain))
print(salePricePredict.shape)
f = plt.figure(figsize=(16,8))
plt.scatter(LAtrain, SPtrain)
plt.scatter(LAtrain, salePricePredict, color = 'r')
plt.show()
Scatterplot regression line
Up to this point, I managed to print out the regression line with plt.scatter() with no issue as shown above. However, I want to print out a straight line instead with the following code below:
salePricePredict = pd.DataFrame(linreg.predict(LAtrain))
f = plt.figure(figsize=(16,8))
plt.scatter(LAtrain, SPtrain)
plt.plot(LAtrain, salePricePredict, color = 'r')
plt.show()
But this produces error type:
TypeError Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~\anaconda3\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~\anaconda3\lib\site-packages\pandas\_libs\index.pyx:142, in pandas._libs.index.IndexEngine.get_loc()
TypeError: '(slice(None, None, None), None)' is an invalid key
During handling of the above exception, another exception occurred:
InvalidIndexError Traceback (most recent call last)
Input In [24], in <cell line: 4>()
2 f = plt.figure(figsize=(16,8))
3 plt.scatter(LAtrain, SPtrain)
----> 4 plt.plot(LAtrain, salePricePredict, color = 'r')
5 plt.show()
File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:2757, in plot(scalex, scaley, data, *args, **kwargs)
2755 #_copy_docstring_and_deprecators(Axes.plot)
2756 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 2757 return gca().plot(
2758 *args, scalex=scalex, scaley=scaley,
2759 **({"data": data} if data is not None else {}), **kwargs)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:1632, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1390 """
1391 Plot y versus x as lines and/or markers.
1392
(...)
1629 (``'green'``) or hex strings (``'#008000'``).
1630 """
1631 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1632 lines = [*self._get_lines(*args, data=data, **kwargs)]
1633 for line in lines:
1634 self.add_line(line)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:312, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
310 this += args[0],
311 args = args[1:]
--> 312 yield from self._plot_args(this, kwargs)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:487, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs)
484 kw[prop_name] = val
486 if len(xy) == 2:
--> 487 x = _check_1d(xy[0])
488 y = _check_1d(xy[1])
489 else:
File ~\anaconda3\lib\site-packages\matplotlib\cbook\__init__.py:1327, in _check_1d(x)
1321 with warnings.catch_warnings(record=True) as w:
1322 warnings.filterwarnings(
1323 "always",
1324 category=Warning,
1325 message='Support for multi-dimensional indexing')
-> 1327 ndim = x[:, None].ndim
1328 # we have definitely hit a pandas index or series object
1329 # cast to a numpy array.
1330 if len(w) > 0:
File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3628, in Index.get_loc(self, key, method, tolerance)
3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
-> 3628 self._check_indexing_error(key)
3629 raise
3631 # GH#42269
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:5637, in Index._check_indexing_error(self, key)
5633 def _check_indexing_error(self, key):
5634 if not is_scalar(key):
5635 # if key is not a scalar, directly raise an error (the code below
5636 # would convert to numpy arrays and raise later any way) - GH29926
-> 5637 raise InvalidIndexError(key)
InvalidIndexError: (slice(None, None, None), None)
However if I add in .squeeze() to both parameters for .plot as shown below, it works as intended:
salePricePredict = pd.DataFrame(linreg.predict(LAtrain))
f = plt.figure(figsize=(16,8))
plt.scatter(LAtrain, SPtrain)
plt.plot(LAtrain.squeeze(), salePricePredict.squeeze(), color = 'r')
plt.show()
With .squeeze() added
I was wondering why is this the case and what does .squeeze() do to my input datatype? I've tried reading some documentation and explanation as to what it does but I got nowhere with it.
I've also tried comparing the datatypes of what I was working with by using .shape, but .squeeze() does not seem to change anything. Any explanation is greatly appreciated!

What numpy squeeze does is removing the dimensions of size 1 from your array.
The output of your prediction is a two dimensional array, and matplotlib .plot() method can't plot a vector against a two dimensional array.
Try doing the following:
X = np.array([[1,2,3]])
print(X.shape)
print(X.squeeze().shape)
Check how X has only one dimension after using .squeeze(); it has become a vector.
You can check the same by reading the code below:
a = np.array([[1,2,3]])
b = np.array([1, 2, 3])
plt.plot(a.squeeze(), b) # This works
plt.plot(a,b) # This throws a ValueError

Issue with Trackpy - InvalidIndexError when fitting EMSD to a power law from the tutorial

possibly a silly problem, but I'm stuck and I'd really appreciate someone's help. I'm trying to replicate the examples given in the walkthrough tutorial for trackpy (https://soft-matter.github.io/trackpy/dev/tutorial/walkthrough.html). I've been succesful up until I got to the point where I had to fit the ensemble mean-squared displacement to a power law (In [34]):
plt.figure()
plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
plt.xlabel('lag time $t$');
tp.utils.fit_powerlaw(em) # performs linear best fit in log space, plots]
This is the error message I get. Can someone suggest a solution?
Thanks!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx:142, in pandas._libs.index.IndexEngine.get_loc()
TypeError: '(slice(None, None, None), None)' is an invalid key
During handling of the above exception, another exception occurred:
InvalidIndexError Traceback (most recent call last)
Input In [178], in <cell line: 4>()
2 plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
3 plt.xlabel('lag time $t$');
----> 4 tp.utils.fit_powerlaw(em)
File ~/opt/anaconda3/lib/python3.9/site-packages/trackpy/utils.py:50, in fit_powerlaw(data, plot, **kwargs)
48 if plot:
49 from trackpy import plots
---> 50 plots.fit(data, fits, logx=True, logy=True, legend=False, **kwargs)
51 return values
File ~/opt/anaconda3/lib/python3.9/site-packages/trackpy/plots.py:50, in make_axes.<locals>.wrapper(*args, **kwargs)
47 # Delete legend keyword so remaining ones can be passed to plot().
48 legend = kwargs.pop('legend', False)
---> 50 result = func(*args, **kwargs)
52 if legend:
53 handles, labels = kwargs['ax'].get_legend_handles_labels()
File ~/opt/anaconda3/lib/python3.9/site-packages/trackpy/plots.py:652, in fit(data, fits, inverted_model, logx, logy, ax, **kwargs)
650 ax.set_yscale('log')
651 if not inverted_model:
--> 652 fitlines = ax.plot(fits.index, fits, **kwargs)
653 else:
654 fitlines = ax.plot(fits.reindex(data.dropna().index),
655 data.dropna(), **kwargs)
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/axes/_axes.py:1632, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1390 """
1391 Plot y versus x as lines and/or markers.
1392
(...)
1629 (``'green'``) or hex strings (``'#008000'``).
1630 """
1631 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1632 lines = [*self._get_lines(*args, data=data, **kwargs)]
1633 for line in lines:
1634 self.add_line(line)
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/axes/_base.py:312, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
310 this += args[0],
311 args = args[1:]
--> 312 yield from self._plot_args(this, kwargs)
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/axes/_base.py:488, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs)
486 if len(xy) == 2:
487 x = _check_1d(xy[0])
--> 488 y = _check_1d(xy[1])
489 else:
490 x, y = index_of(xy[-1])
File ~/opt/anaconda3/lib/python3.9/site-packages/matplotlib/cbook/__init__.py:1327, in _check_1d(x)
1321 with warnings.catch_warnings(record=True) as w:
1322 warnings.filterwarnings(
1323 "always",
1324 category=Warning,
1325 message='Support for multi-dimensional indexing')
-> 1327 ndim = x[:, None].ndim
1328 # we have definitely hit a pandas index or series object
1329 # cast to a numpy array.
1330 if len(w) > 0:
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3628, in Index.get_loc(self, key, method, tolerance)
3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
-> 3628 self._check_indexing_error(key)
3629 raise
3631 # GH#42269
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:5637, in Index._check_indexing_error(self, key)
5633 def _check_indexing_error(self, key):
5634 if not is_scalar(key):
5635 # if key is not a scalar, directly raise an error (the code below
5636 # would convert to numpy arrays and raise later any way) - GH29926
-> 5637 raise InvalidIndexError(key)
InvalidIndexError: (slice(None, None, None), None)

Trying to create multiple boxplots using df.plot(kind='box) and receiving "IndexError: index 0 is out of bounds for axis 0 with size 0"

I am working through the exercises in Jason Brownlee's "Machine Learning Mastery with Python" and in Chapter 21, we use the Sonar dataset found in the UCI repository.
I've read in the sonar.all-data file:
file = 'datasets/sonar.all-data'
df = pd.read_csv(file, sep=',', header=None)
The data has shape (208, 61).
I now want to create a grid of boxplots with each column in the df as it's own plot.
df.plot(kind='box', subplots=True, layout=(8,8),
sharex=False, sharey=False, figsize=(12, 12))
When executing this code in the notebook, I get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/core/series.py in __setitem__(self, key, value)
1061 try:
-> 1062 self._set_with_engine(key, value)
1063 except (KeyError, ValueError):
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/core/series.py in _set_with_engine(self, key, value)
1094 # fails with AttributeError for IntervalIndex
-> 1095 loc = self.index._engine.get_loc(key)
1096 # error: Argument 1 to "validate_numeric_casting" has incompatible type
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
IndexError Traceback (most recent call last)
/var/folders/v1/yp29bgzj6631r45byvnx3tc00000gn/T/ipykernel_29179/2130552721.py in <module>
----> 1 df.plot(kind='box', subplots=True, layout=(8,8), sharex=False, sharey=False, figsize=(12,12));
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/plotting/_core.py in __call__(self, *args, **kwargs)
970 data.columns = label_name
971
--> 972 return plot_backend.plot(data, kind=kind, **kwargs)
973
974 __call__.__doc__ = __doc__
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/plotting/_matplotlib/__init__.py in plot(data, kind, **kwargs)
69 kwargs["ax"] = getattr(ax, "left_ax", ax)
70 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 71 plot_obj.generate()
72 plot_obj.draw()
73 return plot_obj.result
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/plotting/_matplotlib/core.py in generate(self)
286 self._compute_plot_data()
287 self._setup_subplots()
--> 288 self._make_plot()
289 self._add_table()
290 self._make_legend()
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/plotting/_matplotlib/boxplot.py in _make_plot(self)
144 )
145 self.maybe_color_bp(bp)
--> 146 self._return_obj[label] = ret
147
148 label = [pprint_thing(label)]
~/Desktop/Python/Machine Learning Mastery with Python/mlmp_venv/lib/python3.9/site-packages/pandas/core/series.py in __setitem__(self, key, value)
1065 if is_integer(key) and self.index.inferred_type != "integer":
1066 # positional setter
-> 1067 values[key] = value
1068 else:
1069 # GH#12862 adding a new key to the Series
IndexError: index 0 is out of bounds for axis 0 with size 0
Could someone please help me understand what this error means, what is causing the error, and how to fix it?
Thanks in advance!

This box plot doesn't work if the column headers are unnamed. After you name each of the columns, the box plot should work.

How to enable season selection as JJAS instead of JJA in xarray

I am relatively new to python and programming and have been trying to make some initial plots of precipitation data for the Indian subcontinent specifically for the indian summer monsoon through the period of June,July,August and September. I have managed to understand some of the code in a tutorial to obtain plot for JJA shown below but failing to modify it suitably to show me season as JJAS instead of JJA. Simply substituting JJAS in place of JJA ofcourse yielded the error
KeyError: 'JJAS'
I have seen one solution to this on the same forum but I am unable to adapt it to my code. I would be extremely grateful if I could receive any advice on this. Thank you !
Below is the code
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import cmocean
accesscm2_pr_file = r'C:\Users\uSER\Desktop\DissTrack1\ESGF data files\pr_Amon_CAMS-CSM1-0_historical_r1i1p1f1_gn_185001-201412.nc'
dset = xr.open_dataset(accesscm2_pr_file)
clim = dset['pr'].groupby('time.season').mean('time', keep_attrs=True)
clim.data = clim.data * 86400
clim.attrs['units'] = 'mm/day'
fig = plt.figure(figsize=[12,5])
ax = fig.add_subplot(111, projection=ccrs.PlateCarree(central_longitude=180))
clim.sel(season='JJAS').plot.contourf(ax=ax,
levels=np.arange(0, 13.5, 1.5),
extend='max',
transform=ccrs.PlateCarree(),
cbar_kwargs={'label': clim.units},
cmap=cmocean.cm.haline_r)
ax.coastlines()
plt.show()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'JJAS'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_16124/3658430410.py in <module>
15 fig = plt.figure(figsize=[12,5])
16 ax = fig.add_subplot(111, projection=ccrs.PlateCarree(central_longitude=180))
---> 17 clim.sel(season='JJAS').plot.contourf(ax=ax,
18 levels=np.arange(0, 13.5, 1.5),
19 extend='max',
~\anaconda3\lib\site-packages\xarray\core\dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
1269 Dimensions without coordinates: points
1270 """
-> 1271 ds = self._to_temp_dataset().sel(
1272 indexers=indexers,
1273 drop=drop,
~\anaconda3\lib\site-packages\xarray\core\dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
2363 """
2364 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2365 pos_indexers, new_indexes = remap_label_indexers(
2366 self, indexers=indexers, method=method, tolerance=tolerance
2367 )
~\anaconda3\lib\site-packages\xarray\core\coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
419 }
420
--> 421 pos_indexers, new_indexes = indexing.remap_label_indexers(
422 obj, v_indexers, method=method, tolerance=tolerance
423 )
~\anaconda3\lib\site-packages\xarray\core\indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
272 coords_dtype = data_obj.coords[dim].dtype
273 label = maybe_cast_to_coords_dtype(label, coords_dtype)
--> 274 idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
275 pos_indexers[dim] = idxr
276 if new_idx is not None:
~\anaconda3\lib\site-packages\xarray\core\indexing.py in convert_label_indexer(index, label, index_name, method, tolerance)
189 indexer = index.get_loc(label_value)
190 else:
--> 191 indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
192 elif label.dtype.kind == "b":
193 indexer = label
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'JJAS'

Indeed, grouping by "time.season" will only split your data into "DJF", "MAM", "JJA", and "SON". For other combinations of months you will need to define your own mask(s) to apply when taking a mean. For "JJAS" I often use something like this:
jjas = dset.time.dt.month.isin(range(6, 10))
clim = dset.sel(time=jjas).mean("time")

Pandas KeyError, accessing column

I am trying to run this code:
(this will download the MNIST dataset to %HOME directory!)
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
X, y = mnist["data"], mnist["target"]
import matplotlib as mpl
import matplotlib.pyplot as plt
some_digit = X[0] # **ERROR LINE** <---------
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap = mpl.cm.binary, interpolation="nearest")
plt.axis("off")
plt.show()
I have this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-45-d5d685fca2de> in <module>
2 import matplotlib.pyplot as plt
3 import numpy as np
----> 4 some_digit = X[0]
5 some_digit_image = some_digit.reshape(28, 28)
6 plt.imshow(some_digit_image, cmap = mpl.cm.binary, interpolation="nearest")
~/.local/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 0
Code example is from this book: Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow
I tried X.iloc[0] but its also not working.

From your dataframe pic, there is no column header named 0. If you want to access column by index, you can use .iloc which is primarily integer position based:
df.iloc[:, 0]
Or access by column header list
df[df.columns[0]]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Making a sns.pairplot using scikit wine dataset - python

Related

Linear regression (Plotting a regression line)

Issue with Trackpy - InvalidIndexError when fitting EMSD to a power law from the tutorial

Trying to create multiple boxplots using df.plot(kind='box) and receiving "IndexError: index 0 is out of bounds for axis 0 with size 0"

How to enable season selection as JJAS instead of JJA in xarray

Pandas KeyError, accessing column

Categories

Resources