Discretize all the columns in a dataframe in Python

I have a dataframe where all the columns are continuous variables, and I want to discretize them into bins based on frequency (so that each bin holds the same number of observations).
To do this I iterate through the columns and apply the pd.qcut function to each one, but I am getting the following errors:
<ipython-input-46-87e2efb9d039>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_q[column] = pd.qcut(df_q[column], 3)
Here is a reproducible example:
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt
# Load data
data = datasets.load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Remove categorical variable and bin
df_q = df.loc[:, df.columns != "target"]
for column in df_q:
    df_q[column] = pd.qcut(df_q[column], 3)

I do not get any error or warning message when I run your code. Even so, try making a copy of df before creating df_q:
df2 = df.copy()
df_q = df2.loc[:, df2.columns != "target"]
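Putting the copy and the binning loop together, a minimal sketch could look like the following. It uses synthetic data and made-up column names (a, b, c) in place of the breast cancer dataset so that it runs without scikit-learn, and it calls .copy() directly on the selection rather than on df, which should have the same effect.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original features (hypothetical data and names).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(90, 3)), columns=["a", "b", "c"])

# Take an explicit copy so that later assignments target an independent frame.
df_q = df.loc[:, df.columns != "target"].copy()

# pd.qcut with q=3 splits each column at its terciles, so the three bins
# hold (roughly) the same number of rows.
for column in df_q.columns:
    df_q[column] = pd.qcut(df_q[column], 3)

# Number of rows per bin, for each column.
bin_counts = {column: df_q[column].value_counts().tolist() for column in df_q.columns}
```

Because df_q is an independent copy, df keeps its original float values and only df_q holds the interval categories.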

Related

fillna and copy of a slice problem even after .loc

I am trying to fillna a column of dataframe like the following,
df['temp'] = df['temp'].fillna(method='ffill')
and I am getting,
var/folders/qp/lp_5yt3s65q_pj__6v_kdvnh0000gn/T/ipykernel_10842/2929940072.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['temp'] = df['temp'].fillna(method='ffill')
I revised the code to the following but I am still getting the same error. Do you have any suggestions?
df.loc[:,'temp'] = df['temp'].fillna(method='ffill')
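The question does not show how df was created. If it was itself selected from a larger frame, the warning would persist with .loc as well, because pandas is still tracking df as a possible slice. A sketch under that assumption (the parent frame raw and its city column are hypothetical) copies at the point of selection and uses Series.ffill(), which recent pandas releases prefer over fillna(method='ffill'):

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame; the question does not show where df comes from.
raw = pd.DataFrame(
    {"city": ["A", "A", "B", "A"], "temp": [20.0, np.nan, 15.0, np.nan]}
)

# Copy at the point of selection so that df owns its data.
df = raw[raw["city"] == "A"].copy()

# Forward-fill the missing temperatures within df only.
df["temp"] = df["temp"].ffill()
```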

Replace entire pandas dataframe after scaling without warning

I tried this, following this answer:
x = df[feature_collums]
y = df[[label_column]][label_column]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
x[:] = scaler.fit_transform(x, y)
print('Scaled the data to the 0-1 interval')
But this gives me a warning:
/tmp/ipykernel_560/2431060981.py:14: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
I am having a hard time converting this code to use the .loc attribute. Could someone please show me how to rewrite it with .loc and get rid of the warning?
Thank you!

OK, I found here that you can build a new dataframe from the scaled array (scaled_features is the output of scaler.fit_transform):
data_scaled = pd.DataFrame(scaled_features, index=df.index, columns=df.columns)
Basically, this just creates a new dataframe with the same index and columns, which gets rid of the warning.
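As a fuller sketch of that approach, the example below scales only the feature columns and wraps the result in a new dataframe. The data and the names feature_columns and x_scaled are made up for illustration. Note that the columns argument has to name the columns that were actually scaled, so df.columns only fits when the whole frame went through the scaler.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data and column names, for illustration only.
df = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 40.0], "label": [0, 1, 0]}
)
feature_columns = ["f1", "f2"]

# fit_transform returns a plain NumPy array, not a DataFrame.
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(df[feature_columns])

# Build a new frame from the array, reusing the original index and the
# names of the columns that were scaled. Nothing is written into a slice.
x_scaled = pd.DataFrame(scaled_features, index=df.index, columns=feature_columns)
```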

Getting rid of Error from Fillna in pandas

I tried filling the NA values of a column in a dataframe with:
df1 = data.copy()
df1.columns = data.columns.str.lower()
df2 = df1[['passangerid', 'trip_cost','class']]
df2['class'] = df2['class'].fillna(0)
df2
However, I get this error:
:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
df2['class'] = df2['class'].fillna(0, axis = 0)
Can someone please help?

First of all, I'd advise you to follow the warning message and read up on the caveats in the provided link.
You're getting this warning (not an error) because df2 is a slice of df1, not a separate DataFrame, so pandas cannot tell whether the assignment is also meant to change df1.
To avoid the warning, call the .copy() method when you create df2:
df2 = df1[['passangerid', 'trip_cost','class']].copy()
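Applied to the whole snippet from the question, this could look like the sketch below. The contents of data are invented for illustration; the column names follow the question, including its spelling of passangerid.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data.
data = pd.DataFrame(
    {
        "PassangerId": [1, 2, 3],
        "Trip_Cost": [10.0, 12.5, 8.0],
        "Class": [1.0, np.nan, 3.0],
        "Notes": ["x", "y", "z"],
    }
)

df1 = data.copy()
df1.columns = data.columns.str.lower()

# .copy() makes df2 an independent DataFrame rather than a slice of df1.
df2 = df1[["passangerid", "trip_cost", "class"]].copy()

# The assignment now affects df2 only and raises no SettingWithCopyWarning.
df2["class"] = df2["class"].fillna(0)
```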

Add new column to dataframe - SettingWithCopyWarning

I have a pandas dataframe (pandas version '0.24.2') and a list of the same length.
I want to add this list as a column to the dataframe.
I do this:
df.loc[:, 'new_column'] = pd.Series(my_List, index=df.index)
but I receive this warning:
.../anaconda/lib/python3.7/site-packages/pandas/core/indexing.py:362: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[key] = _infer_fill_value(value)
.../anaconda/lib/python3.7/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Am I doing something wrong?
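The assignment itself looks fine. A warning like this on a .loc assignment usually suggests that df was derived from another dataframe earlier in the code, which the question does not show. A sketch under that assumption (the parent frame, its keep column, and my_list are hypothetical) shows two ways to add the column without the warning:

```python
import pandas as pd

# Hypothetical parent frame and list; the question shows neither.
parent = pd.DataFrame({"id": [1, 2, 3, 4], "keep": [True, False, True, True]})
my_list = ["a", "b", "c"]

# Option 1: copy when selecting, then assign the list as a new column.
df = parent[parent["keep"]].copy()
df["new_column"] = my_list

# Option 2: assign() returns a new frame and never writes into a slice.
df_alt = parent[parent["keep"]].assign(new_column=my_list)
```

Either way the list must have as many elements as the selected frame has rows, and parent is left without the new column.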

Replacing multiple numeric columns with the log value of those columns Python

I am working with a pandas DataFrame in Python that has 10 variables (4 numeric, 6 categorical). I want to replace the values of the 4 numeric variables with the natural log of the current values.
Example of my data below:
df = DataFrame
logcolumns = the names of the columns that I want to convert to the natural log
import numpy as np
import pandas as pd
df = pd.read_csv("myfile.csv")
logcolumns = ['Volume', 'Sales', 'Weight', 'Price']
df[logcolumns] = np.log(df[logcolumns])
After running this, I receive a SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
This process works with an individual column, and with an entire dataframe, but not when I try to run it on a list of selected columns.

You could follow the suggestion in the warning and use label-based access:
df.loc[:, logcolumns] = np.log(df[logcolumns])
The official doc is here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
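A small self-contained sketch of that assignment, with invented values in place of myfile.csv and a hypothetical Region column standing in for the categorical variables:

```python
import numpy as np
import pandas as pd

# Invented values standing in for myfile.csv.
df = pd.DataFrame(
    {
        "Volume": [1.0, np.e],
        "Sales": [10.0, 100.0],
        "Weight": [2.0, 4.0],
        "Price": [5.0, 25.0],
        "Region": ["N", "S"],  # hypothetical categorical column
    }
)
logcolumns = ["Volume", "Sales", "Weight", "Price"]

# Replace the selected numeric columns with their natural logs in one
# label-based assignment; the other columns are left untouched.
df.loc[:, logcolumns] = np.log(df[logcolumns])
```

If df had been selected from another dataframe, taking a .copy() of that selection first would be the more direct way to silence the warning.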
