Pandas KeyError, accessing column - python

I am trying to run this code:
(this will download the MNIST dataset to your home directory!)
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
X, y = mnist["data"], mnist["target"]
import matplotlib as mpl
import matplotlib.pyplot as plt
some_digit = X[0]  # <-- ERROR LINE
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap = mpl.cm.binary, interpolation="nearest")
plt.axis("off")
plt.show()
I have this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-45-d5d685fca2de> in <module>
2 import matplotlib.pyplot as plt
3 import numpy as np
----> 4 some_digit = X[0]
5 some_digit_image = some_digit.reshape(28, 28)
6 plt.imshow(some_digit_image, cmap = mpl.cm.binary, interpolation="nearest")
~/.local/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 0
The code example is from the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
I tried X.iloc[0], but it's also not working.

In your DataFrame there is no column header named 0: X[0] looks up a column label, not a row position. If you want to access a column by position, you can use .iloc, which is primarily integer-position based:
df.iloc[:, 0]
Or access it through the list of column headers:
df[df.columns[0]]
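A minimal sketch of the difference between label lookup and positional lookup, using a small hypothetical DataFrame in place of mnist["data"] (the "pixel1" ... "pixel784" column names are an assumption for illustration). It also suggests why X.iloc[0] alone did not help: it returns a pandas Series, and in current pandas a Series has no reshape method, so the row has to be converted to a NumPy array first. Another option is to ask scikit-learn for arrays directly with fetch_openml('mnist_784', version=1, as_frame=False), in which case X[0] selects a row as in the book.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mnist["data"]: 3 rows x 784 pixel columns.
# The column names "pixel1" ... "pixel784" are assumed for illustration.
columns = [f"pixel{i}" for i in range(1, 785)]
X = pd.DataFrame(np.arange(3 * 784).reshape(3, 784), columns=columns)

# [] on a DataFrame looks up a column *label*; there is no column labelled 0
try:
    X[0]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Row by integer position gives a Series; a Series has no .reshape(),
# so convert it to a NumPy array before reshaping
some_digit = X.iloc[0].to_numpy()
some_digit_image = some_digit.reshape(28, 28)

# Column by integer position, the two ways shown in the answer
first_column = X.iloc[:, 0]
same_column = X[X.columns[0]]
```

With real data, some_digit_image can then be passed to plt.imshow exactly as in the question.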

Related

KeyError: 46220

Edited to address the comments:
added the lines at the beginning where the data is imported from MNIST;
added the full error message from the Jupyter notebook as text.
I am trying to run some very simple code in Python (in a Jupyter notebook, if it matters):
from sklearn.datasets import fetch_openml
x, y = fetch_openml('mnist_784', version=1, return_X_y=True, data_home='./data/')
y = y.astype(int)
fig, ax = plt.subplots(2, 4, figsize=(20, 8))
for a in ax.ravel():
    j = np.random.choice(len(y))
    sns.heatmap(x[j].reshape(28,28), ax=a, cbar=False, cmap='gray_r')
    a.set_title(f'Label: {y[j]}')
    a.set_xticks([])
    a.set_yticks([])
and I get the following error. I don't think this is a code problem, as the code was taken directly from the lecturer's notes. Could anyone help me troubleshoot this and explain what is going on, please?
See the error message below:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 46220
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-6-02155e9f4730> in <module>
2 for a in ax.ravel():
3 j = np.random.choice(len(y))
----> 4 sns.heatmap(x[j].reshape(28,28), ax=a, cbar=False, cmap='gray_r')
5 a.set_title(f'Label: {y[j]}')
6 a.set_xticks([])
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 46220
I suppose that with the line below you were trying to access row j of the pandas DataFrame x:
sns.heatmap(x[j].reshape(28,28), ax=a, cbar=False, cmap='gray_r')
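However, x[j] on a DataFrame looks up a column labelled j, not a row. To access the values of a row by its integer position, use x.iloc[j].values instead. The "Indexing and selecting data" section of the pandas user guide has many more examples.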
The complete code is:
from sklearn.datasets import fetch_openml
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
x, y = fetch_openml('mnist_784', version=1, return_X_y=True, data_home='./data/')
y = y.astype(int)
fig, ax = plt.subplots(2, 4, figsize=(20, 8))
for a in ax.ravel():
    j = np.random.choice(len(y))
    sns.heatmap(x.iloc[j].values.reshape(28,28), ax=a, cbar=False, cmap='gray_r')
    a.set_title(f'Label: {y[j]}')
    a.set_xticks([])
    a.set_yticks([])
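Alternatively, the DataFrame can be converted to a NumPy array once, after which x_arr[j] selects row j the way the lecturer's x[j] presumably expected (the notes were likely written for a setup where x was already an array). A minimal sketch, assuming random integers stand in for the MNIST pixels and made-up string labels stand in for y; the pixel column names are also an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for x and y from fetch_openml (random pixels, not real MNIST).
# The pixel column names are assumed for illustration.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.integers(0, 256, size=(5, 784)),
                 columns=[f"pixel{i}" for i in range(1, 785)])
y = pd.Series(["3", "1", "4", "1", "5"]).astype(int)

j = rng.choice(len(y))

# Option 1: positional row access on the DataFrame, as in the answer
img_from_df = x.iloc[j].values.reshape(28, 28)

# Option 2: convert to NumPy once; afterwards x_arr[j] selects row j
x_arr = x.to_numpy()
img_from_arr = x_arr[j].reshape(28, 28)

# Positional label lookup; y[j] also works here because y has a default RangeIndex
label = y.iloc[j]
```

Option 2 avoids repeating .iloc[...] .values inside the plotting loop, at the cost of losing the column names on x_arr.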
The result produced (image not included): a 2 x 4 grid of randomly chosen digits, each titled with its label.

Error in DeepExplainer for an MLP network that uses a dataset with only float values

I am getting the following error when I use DeepExplainer.
The neural network itself works fine; here is the code:
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import scale
NNModel = Sequential()
# Add an input layer
NNModel.add(Dense(24, activation='relu', input_shape=(43,)))
# Add one hidden layer
NNModel.add(Dense(12, activation='relu'))
# Add one hidden layer
NNModel.add(Dense(6, activation='relu'))
# Add one hidden layer
NNModel.add(Dense(3, activation='relu'))
# Add an output layer
NNModel.add(Dense(1, activation='sigmoid'))
NNModel.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
NNModel.fit(X_train_res,y_train_res,epochs=50, batch_size=1, verbose=1)
import tensorflow.keras.backend
import numpy as np
import shap
X_train = X_train_res.sample(frac=100, replace=True)
X_test_11 = X_df1_test.sample(frac=10, replace=True)
X_train_array = X_train.to_numpy()
deepExplainer = shap.DeepExplainer(NNModel,X_train_array)
X = X_test_11
shap_values = deepExplainer.shap_values(X)
How ever, when i use deepExplainer it throws following error:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-620-fe7c03d72907> in <module>
1 X = X_Train_1
----> 2 shap_values = deepExplainer.shap_values(X)
~\anaconda3\lib\site-packages\shap\explainers\_deep\__init__.py in shap_values(self, X, ranked_outputs, output_rank_order, check_additivity)
122 were chosen as "top".
123
--> 124 return self.explainer.shap_values(X, ranked_outputs, output_rank_order, check_additivity=check_additivity)
~\anaconda3\lib\site-packages\shap\explainers\_deep\deep_tf.py in shap_values(self, X, ranked_outputs, output_rank_order, check_additivity)
310 # assign the attributions to the right part of the output arrays
311 for l in range(len(X)):
--> 312 phis[l][j] = (sample_phis[l][bg_data[l].shape[0]:] * (X[l][j] - bg_data[l])).mean(0)
313
314 output_phis.append(phis[0] if not self.multi_input else phis)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 0
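The traceback stops at line 312 of shap's deep_tf.py, which evaluates X[l][j]. A likely explanation, inferred only from that line: with a single input shap works on a list whose first element is your X, and when that element is a DataFrame, X[0][0] becomes df[0], a lookup of a column labelled 0, hence KeyError: 0. The sketch below mimics that indexing pattern with a hypothetical helper and a made-up DataFrame (it does not call shap itself):

```python
import numpy as np
import pandas as pd

def index_like_traceback(X):
    """Hypothetical helper mimicking the X[l][j] pattern from the traceback:
    wrap a single input in a list, then take sample j with []."""
    X = [X]
    return [X[0][j] for j in range(X[0].shape[0])]

# Hypothetical stand-in for X_test_11: 2 samples with 3 float features
df = pd.DataFrame({"f1": [0.1, 0.2], "f2": [1.0, 2.0], "f3": [5.5, 6.5]})

# With a DataFrame, X[0][0] is df[0]: a lookup of a column labelled 0 -> KeyError
try:
    index_like_traceback(df)
    dataframe_failed = False
except KeyError:
    dataframe_failed = True

# With a NumPy array, arr[0] is the first sample (row), so the same pattern works
rows = index_like_traceback(df.to_numpy())
```

In the question's code that would suggest calling deepExplainer.shap_values(X.to_numpy()), mirroring how the background data is already passed as X_train.to_numpy().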

Cannot find data from index in python

I am trying to show the image at a specific index using matplotlib, but it raises an error and I don't understand why. I am trying to get index 0 of the MNIST data, reshape it to 28 by 28 pixels and then display it with plt.show().
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(
some_digit_image,
cmap = matplotlib.cm.binary,
interpolation="nearest")
plt.axis("off")
plt.show()
KeyError Traceback (most recent call last)
~/Machinelearning/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-35-246778f0802e> in <module>
3 import matplotlib.pyplot as plt
4
----> 5 some_digit = X[0]
6 some_digit_image = some_digit.reshape(28, 28)
7
~/Machinelearning/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~/Machinelearning/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 0
Change
some_digit = X[0]
to
import numpy as np
some_digit = np.array(X.iloc[0])
The same applies to any other row: X[36000] becomes np.array(X.iloc[36000]).

I can't subset rows in pandas this way: df[0] (or with any integer)

I loaded a CSV file and then tried to get the first row using its row index number:
import pandas as pd
pkm = pd.read_csv('datasets/pokemon_data.csv')
pkm[0]
But for some reason I get this error, even though, as far as I know, you can subset the way I did.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/Desktop/ml/ml_env/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-6-19c40ecbd036> in <module>
----> 1 X[0]
~/Desktop/ml/ml_env/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~/Desktop/ml/ml_env/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 0
When I use .iloc or .loc I don't have any issues.
I tried pandas 1.1.5 and 1.2.0 and got the same error with both.
This is how the data looks: [image: pokemon_data]
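pkm[0] asks for the column labelled 0 in pkm: a single value inside [] on a DataFrame selects a column by label, not a row by position. There is no column named 0, which is why it raises KeyError.
Try pkm['HP'] (or any other column name) and it will become clear. To get the first row by position, use pkm.iloc[0], which you already found works.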
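A minimal sketch of the different lookups, using a small hypothetical DataFrame with made-up rows in place of pokemon_data.csv (only the HP column name comes from the answer). It also shows one pandas quirk worth knowing: a slice inside [] selects rows, while a single integer is treated as a column label.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv (rows made up for illustration)
pkm = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
    "HP": [45, 60, 80],
})

hp = pkm["HP"]           # [] with a label -> that column
first_row = pkm.iloc[0]  # row by integer position
same_row = pkm.loc[0]    # row by index label (the default RangeIndex labels are 0, 1, 2)
first_rows = pkm[0:1]    # a *slice* inside [] selects rows -> 1-row DataFrame

# A single integer inside [] is treated as a column label -> KeyError
missing_column = False
try:
    pkm[0]
except KeyError:
    missing_column = True
```

.loc[0] and .iloc[0] agree here only because the CSV was loaded with the default integer index; after sorting or filtering, the two can point at different rows.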

How to impute more than one specific column in a dataset: Python (sklearn)

Getting straight to the problem:
I am imputing my dataset with scikit-learn's SimpleImputer in Python.
However, my dataset contains some columns with integers and some columns with alphabetic values. I am using the median to fill the missing values, and I want to apply it only to specific integer columns, not to the whole dataset.
I tried this:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy="median")
imputer.fit(students['age'], ['sex'], ['failures'])
I want to impute only these columns, which contain only integer values, and not the whole dataset, because the dataset also contains columns with alphabetic data points whose median cannot be taken.
From the above code, I got this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('age', 'sex', 'failures')
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-26-8961e0ce249f> in <module>
2 from sklearn.impute import SimpleImputer
3 imputer = SimpleImputer(strategy="median")
----> 4 imputer.fit(students['age', 'sex', 'failures'])
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: ('age', 'sex', 'failures')
The data is available at https://archive.ics.uci.edu/ml/machine-learning-databases/00320/
Thanks! I hope the problem is clear; I tried my best to explain it.
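students['age', 'sex', 'failures'] asks for a single column whose label is the tuple ('age', 'sex', 'failures'), which is why the KeyError shows that tuple. Select the columns with a list inside the brackets (note the double brackets), which returns a DataFrame containing just those columns:
imputer.fit_transform(students[['age', 'sex', 'failures']])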
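A minimal sketch of imputing only selected numeric columns and writing the result back, using a small hypothetical students DataFrame (values made up; the school column stands in for the text columns that must be left alone). The median strategy only works on numeric data: if a selected column holds letters (in the UCI student data, sex appears to be coded as 'F'/'M'), SimpleImputer(strategy="most_frequent") is the option that also accepts strings.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the students DataFrame: two numeric columns with
# missing values and one text column that must be left untouched.
students = pd.DataFrame({
    "age": [15, np.nan, 17, 18],
    "failures": [0, 1, np.nan, 3],
    "school": ["GP", "GP", "MS", "MS"],
})

numeric_cols = ["age", "failures"]

# Double brackets -> a DataFrame with only the selected columns
imputer = SimpleImputer(strategy="median")
students[numeric_cols] = imputer.fit_transform(students[numeric_cols])
```

Assigning back to students[numeric_cols] replaces only those columns; the fitted medians are kept in imputer.statistics_ so the same values can be applied to a test set with imputer.transform.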
