I am working with a pandas DataFrame in Python that has 10 variables (4 numeric, 6 categorical). I want to replace the values of the 4 numeric variables with the natural log of the current values.
Example of my data below:
df = DataFrame
logcolumns = the names of the columns that I want to convert to the natural log
Import numpy as np
Import pandas as pd
df = pd.read_csv("myfile.csv")
logcolumns = ['Volume', 'Sales', 'Weight', 'Price']
df[logcolumns] = np.log(df[logcolumns])
After running this, I receive a SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
This process works with an individual column, and with an entire dataframe, but not when I try to run it on a list of selected columns.
You could follow up the suggestion inside the warning and use labelled based access:
df.loc[:, logcolumns] = np.log(df[logcolumns])
The official doc is here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Related
I have a dataframe where all the columns are continous variables, and I want to discretize them in binnings based on frequency (so the binnings have the same size).
In order to do this I just apply the pd.cut function and iterate through the columns, however I'm getting the following errors:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_q[column] = pd.qcut(df_q[column], 3)
<ipython-input-46-87e2efb9d039>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_q[column] = pd.qcut(df_q[column], 3)
<ipython-input-46-87e2efb9d039>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
You can find a RepEx here:
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt
# Load data
data = datasets.load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Remove categorical variable and bin
df_q = df.loc[:, df.columns != "target"]
for column in df_q:
df_q[column] = pd.qcut(df_q[column], 3)
I do not get any error or warning message when running your code. Do try to make a copy of df before creating df_q.
df2 = df.copy()
df_q = df2.loc[:, df2.columns != "target"]
I have a Pandas dataframe where its just 2 columns: the first being a name, and the second being a dictionary of information relevant to the name. Adding new rows works fine, but if I try to updates the dictionary column by assigning a new dictionary in place, I get
ValueError: Incompatible indexer with Series
So, to be exact, this is what I was doing to produce the error:
import pandas as pd
df = pd.DataFrame(data=[['a', {'b':1}]], columns=['name', 'attributes'])
pos = df[df.loc[:,'name']=='a'].index[0]
df.loc[pos, 'attributes'] = {'c':2}
I was able to find another solution that seems to work:
import pandas as pd
df = pd.DataFrame(data=[['a', {'b':1}]], columns=['name', 'attributes'])
pos = df[df.loc[:,'name']=='a'].index[0]
df.loc[:,'attributes'].at[pos] = {'c':2}
but I was hoping to get an answer as to why the first method doesn't work, or if there was something wrong with how I had it initially.
Since you are trying to access a dataframe with an index 'pos', you have to use iloc to access the row. So changing your last row as following would work as intended:
df.iloc[pos]['attributes'] = {'c':2}
For me working DataFrame.at:
df.at[pos, 'attributes'] = {'c':2}
print (df)
name attributes
0 a {'c': 2}
I want to split the rows while maintaing the values.
How can I split the rows like that?
The data frame below is an example.
the output that i want to see
You can use the pd.melt( ). Read the documentation for more information: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I tried working on your problem.
import pandas as pd
melted_df = data.melt(id_vars=['value'], var_name="ToBeDropped", value_name="ID1")
This would show a warning because of the unambiguity in the string passed for "value_name" argument. This would also create a new column which I have assigned the name already. The new column will be called 'ToBeDropped'. Below code will remove the column for you.
df = melted_df.drop(columns = ['ToBeDropped'])
'df' will be your desired output.
via wide_to_long:
df = pd.wide_to_long(df, stubnames='ID', i='value',
j='ID_number').reset_index(0)
via set_index and stack:
df = df.set_index('value').stack().reset_index(name='IDs').drop('level_1', 1)
via melt:
df = df.melt(id_vars='value', value_name="ID1").drop('variable', 1)
I am doing 2 things.
1) filter a dataframe in pandas
2) clean unicode text in a specific column in the filtered dataframe.
import pandas as pd
import probablepeople
from unidecode import unidecode
import re
#read data
df1 = pd.read_csv("H:\\data.csv")
#filter
df1=df1[(df1.gender=="female")]
#reset index because otherwise indexes will be as per original dataframe
df1=df1.reset_index()
Now i am trying to clean unicode text in the address column
#clean unicode text
for i in range(10):
df1.loc[i][16] = re.sub(r"[^a-zA-Z.,' ]",r' ',df1.address[i])
However, i am unable to do so and below is the error i am getting.
c:\python27\lib\site-packages\ipykernel\__main__.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I think you can use str.replace:
df1=df1[df1.gender=="female"]
#reset index with parameter drop if need new monotonic index (0,1,2,...)
df1=df1.reset_index(drop=True)
df1.address = df1.address.str.replace(r"[^a-zA-Z.,' ]",r' ')
I'm using Pandas with Python 3. I have a dataframe with a bunch of columns, but I only want to change the data type of all the values in one of the columns and leave the others alone. The only way I could find to accomplish this is to edit the column, remove the original column and then merge the edited one back. I would like to edit the column without having to remove and merge, leaving the the rest of the dataframe unaffected. Is this possible?
Here is my solution now:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
def make_float(var):
var = float(var)
return var
#create a new dataframe with the value types I want
df2 = df1['column'].apply(make_float)
#remove the original column
df3 = df1.drop('column',1)
#merge the dataframes
df1 = pd.concat([df3,df2],axis=1)
It also doesn't work to apply the function to the dataframe directly. For example:
df1['column'].apply(make_float)
print(type(df1.iloc[1]['column']))
yields:
<class 'str'>
df1['column'] = df1['column'].astype(float)
It will raise an error if conversion fails for some row.
Apply does not work inplace, but rather returns a series that you discard in this line:
df1['column'].apply(make_float)
Apart from Yakym's solution, you can also do this -
df['column'] += 0.0