Unable to drop columns from a pandas dataframe

I am trying to drop a column from a pandas dataframe as follows:
df = pd.read_csv('Caravan_Dataset.csv')
X = df.drop('Purchase',axis=1)
y = df['Purchase']
but it does not work.
I also tried the following one:
df = pd.read_csv('Caravan_Dataset.csv')
X = df.drop('Purchase',axis=1,inplace=True)
y = df['Purchase']
but it does not work either. It keeps giving an error saying that Purchase is still in the columns. Any idea how I can do it?

When inplace=True, the data is modified in place: drop returns None and df itself is updated. When inplace=False, drop returns a new DataFrame, which you need to assign to a variable.
Change your code from:
X = df.drop('Purchase',axis=1,inplace=True)
To this:
df.drop('Purchase',axis=1,inplace=True)
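Alternatively, use inplace=False (the default), in which case drop returns a copy without the column, and assign it:
X = df.drop('Purchase',axis=1)
Note that with inplace=True the column is removed from df itself, so take y = df['Purchase'] before the drop.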
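A minimal sketch of both variants on a small made-up frame (feature_1 and feature_2 are placeholder names, not the real Caravan columns). It shows that drop() without inplace leaves df untouched, while inplace=True returns None and changes df, which is why the target is selected first:

```python
import pandas as pd

# Hypothetical stand-in for Caravan_Dataset.csv: two features plus the target
df = pd.DataFrame({
    "feature_1": [33, 37, 37],
    "feature_2": [1, 1, 2],
    "Purchase": ["No", "No", "Yes"],
})

# Grab the target first, while the column still exists in df
y = df["Purchase"]

# Without inplace, drop() returns a new DataFrame and leaves df untouched
X = df.drop(columns="Purchase")  # equivalent to df.drop("Purchase", axis=1)
print(list(df.columns))  # ['feature_1', 'feature_2', 'Purchase']

# With inplace=True, drop() changes df itself and returns None
result = df.drop("Purchase", axis=1, inplace=True)
print(result)            # None
print(list(df.columns))  # ['feature_1', 'feature_2']
```

Assigning the result of an inplace=True call (as in X = df.drop(..., inplace=True)) therefore always gives None.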

You have to assign the result back to df, like df = df.drop('column_name', axis=1)

Related

Change Column Names in Dataframe via list of Column Names

I have a dataframe with a ton of columns. I would like to change a subset of the column names, given as a list, to all uppercase.
The code below doesn't change the column names and the other code I've tried produces errors:
df[cols_to_cap].columns = df[cols_to_cap].columns.str.upper()
What am I missing?

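Try the code below, which uses the rename method. Your attempt doesn't work because df[cols_to_cap] creates a new DataFrame, so changing its columns does not affect df.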
rename_dict = {}
for each_column in cols_to_cap:
    rename_dict[each_column] = each_column.upper()
df.rename(columns=rename_dict, inplace=True)  # inplace=True applies the change to df itself
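A self-contained sketch of the same idea on a hypothetical frame, building the mapping with a dict comprehension and reassigning the result instead of using inplace:

```python
import pandas as pd

# Hypothetical frame and subset of columns to uppercase
df = pd.DataFrame(columns=["id", "name", "city", "score"])
cols_to_cap = ["name", "city"]

# rename() maps old names to new names; columns not in the mapping are kept as-is
df = df.rename(columns={col: col.upper() for col in cols_to_cap})
print(list(df.columns))  # ['id', 'NAME', 'CITY', 'score']
```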

Pandas - Function is overwriting original DF even though I am manipulating a copy?

I am writing a function to categorize data in a df into bins. As a first step, the function extracts numbers from a string and replaces the column of text with a column of numbers.
The function is somehow overwriting the original dataframe, despite me only manipulating a copy of it.
def categorizeColumns(df):
    newdf = df
    if 'Runtime' in newdf.columns:
        for row in range(len(newdf['Runtime'])):
            strRuntime = newdf['Runtime'][row]
            numsRuntime = [int(i) for i in strRuntime.split() if i.isdigit()]
            newdf.loc[row,'Runtime'] = numsRuntime[0]
    return newdf
df = pd.read_csv('moviesSeenRated.csv')
newdf = categorizeColumns(df)
The original df has a column of runtimes like this [34 mins, 32 mins, 44 mins] etc., and newdf should have [34, 32, 44], which it does. However, the original df also changes outside the function.
What's going on here? Any fixes? Thanks in advance.
EDIT: It seems I wasn't making a copy; I needed to use
df.copy()
Thank you all!

The problem is that you aren't actually making a copy of the dataframe in the line newdf = df. To make a copy, you could do newdf = df.copy().

You are not making a copy of the dataframe. newdf = df just creates a second reference to the same object, so changes made through newdf show up in df.
You have to call .copy() on your dataframe:
import pandas as pd
def categorizeColumns(df):
    newdf = df.copy()  # independent copy, so the caller's df is not modified
    if 'Runtime' in newdf.columns:
        for row in range(len(newdf['Runtime'])):
            strRuntime = newdf['Runtime'][row]
            numsRuntime = [int(i) for i in strRuntime.split() if i.isdigit()]
            newdf.loc[row,'Runtime'] = numsRuntime[0]
    return newdf
df = pd.read_csv('moviesSeenRated.csv')
newdf = categorizeColumns(df)
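The sketch below, using the sample runtimes from the question instead of the CSV, shows the difference between a second name and a real copy. It also swaps the row loop for Series.str.extract, a vectorized alternative that is an assumption of this sketch, not the answerer's code; it assumes every entry contains a number:

```python
import pandas as pd

df = pd.DataFrame({"Runtime": ["34 mins", "32 mins", "44 mins"]})  # sample data

alias = df          # same object under another name, no data copied
clone = df.copy()   # independent DataFrame (deep copy by default)
print(alias is df, clone is df)  # True False

def categorize_columns(frame):
    out = frame.copy()  # work on a copy so the caller's frame is untouched
    if "Runtime" in out.columns:
        # Take the first run of digits in each string and convert it to int
        out["Runtime"] = out["Runtime"].str.extract(r"(\d+)", expand=False).astype(int)
    return out

newdf = categorize_columns(df)
print(newdf["Runtime"].tolist())  # [34, 32, 44]
print(df["Runtime"].tolist())     # ['34 mins', '32 mins', '44 mins']
```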

Pandas get_dummies in for loop

I would like to convert categorical variables into dummies using pandas.get_dummies in a for loop.
However, following code does not convert the dataframes.
data_cleaner = [data_train, data_val]
for df in data_cleaner:
    df = pd.get_dummies(df, columns = categorical_fields)
data_train.head() # Not changed
I know that the loop variable in a for loop is just a temporary name, but the modified code below also didn't work.
for i in range(len(data_cleaner)):
    data_cleaner[i] = pd.get_dummies(data_cleaner[i], columns = categorical_fields)
data_train.head() # Still not changed
Can anyone help? Do I have to run get_dummies manually for each dataframe? FYI, pandas get_dummies doesn't provide an inplace option.

You can use a list comprehension:
data_cleaner = [pd.get_dummies(df, columns=categorical_fields) for df in data_cleaner]
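or, to get the results back under separate names (replacing an element of data_cleaner does not change data_train, which still refers to the original frame):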
data_train_dum, data_val_dum = [pd.get_dummies(df, columns=categorical_fields) for df in [data_train, data_val]]

Try the following, which writes the results back into the list and then unpacks them into the original names:
data_cleaner = [data_train, data_val]
for i,df in enumerate(data_cleaner):
    data_cleaner[i] = pd.get_dummies(df, columns = categorical_fields)
data_train,data_val=data_cleaner
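A runnable sketch with two hypothetical frames and one categorical field, showing that the unpacked names now point at the encoded DataFrames:

```python
import pandas as pd

# Hypothetical train/validation frames with one categorical field
data_train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
data_val = pd.DataFrame({"color": ["blue", "red"], "size": [4, 5]})
categorical_fields = ["color"]

# get_dummies returns new DataFrames; rebind the original names to them
data_train, data_val = [
    pd.get_dummies(df, columns=categorical_fields)
    for df in (data_train, data_val)
]
print(list(data_train.columns))  # ['size', 'color_blue', 'color_red']
```

Encoding the frames separately only yields matching columns when both contain the same categories, as they do here.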

Writing string to empty dataframe not working, but works in other dataframes, how to fix?

I have created a dataframe on my local machine like so:
df1 = pd.DataFrame()
Next, I add two columns to the dataframe, and I want to assign a string to only one of them. I am attempting to do this like so:
df1['DF_Name'] = 'test'
df1['DF_Date'] = ''
The columns get added successfully, but the string 'test' does not. When I apply this to other dataframe it works perfectly fine. What am I doing wrong?
I am also trying to add dates to the same dataframe based on a condition, and that is not working either. The logic is:
if max_created > max_updated:
    df1['DF_Date'] = max_created
else:
    df1['DF_Date'] = max_updated
Not sure what I am doing wrong.
Thank you in advance.

You need to add brackets, like:
df1 = pd.DataFrame()
df1['DF_Name'] = ['test']
df1['DF_Date'] = ['']
df1
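Because df1 has no rows, assigning a single value broadcasts it across zero rows: the column is created but stays empty. With the brackets, the value is a one-element list, so pandas creates one row to hold it.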
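A short sketch of both cases. The values of max_created and max_updated are made up for illustration; once df1 has a row, the question's conditional date assignment works, and max() is used here as an equivalent of the if/else:

```python
import pandas as pd
from datetime import date

# Scalar on a frame with no rows: the column exists but holds no values
empty = pd.DataFrame()
empty["DF_Name"] = "test"
print(len(empty))  # 0

# A one-element list gives pandas a row to create
df1 = pd.DataFrame()
df1["DF_Name"] = ["test"]
df1["DF_Date"] = [""]

# Hypothetical dates standing in for max_created / max_updated
max_created = date(2023, 1, 5)
max_updated = date(2023, 3, 1)
df1["DF_Date"] = max(max_created, max_updated)  # scalar fills the existing row
print(df1)
```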

Set value to an entire column of a pandas dataframe

I'm trying to set the entire column of a dataframe to a specific value.
In [1]: df
Out [1]:
issueid industry
0 001 xxx
1 002 xxx
2 003 xxx
3 004 xxx
4 005 xxx
From what I've seen, loc is the best practice when replacing values in a dataframe (or isn't it?):
In [2]: df.loc[:,'industry'] = 'yyy'
However, I still received this much talked-about warning message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
If I do
In [3]: df['industry'] = 'yyy'
I got the same warning message.
Any ideas? Working with Python 3.5.2 and pandas 0.18.1.
EDIT Jan 2023:
Given the volume of visits on this question, it's worth stating that my original question was really more about dataframe copy-versus-slice than "setting value to an entire column".
On copy-versus-slice: My current understanding is that, in general, if you want to modify a subset of a dataframe after slicing, you should create the subset by .copy(). If you only want a view of the slice, no copy() needed.
On setting value to an entire column: simply do df[col_name] = col_value
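A small sketch of the copy-then-modify pattern described in the edit, with a made-up df_all and a hypothetical specific_id:

```python
import pandas as pd

df_all = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx", "xxx", "xxx"],
})
specific_id = "002"  # hypothetical filter value

# .copy() makes the subset independent of df_all before it is modified
df = df_all.loc[df_all["issueid"] == specific_id].copy()
df["industry"] = "yyy"  # set the whole column of the subset

print(df["industry"].tolist())      # ['yyy']
print(df_all["industry"].tolist())  # ['xxx', 'xxx', 'xxx']
```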

You can use the assign function:
df = df.assign(industry='yyy')
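Note that assign returns a new DataFrame rather than changing the original, which is why the answer reassigns df. A quick illustration on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"issueid": ["001", "002"], "industry": ["xxx", "xxx"]})

# assign() returns a new DataFrame with the column replaced; df is unchanged
updated = df.assign(industry="yyy")
print(updated["industry"].tolist())  # ['yyy', 'yyy']
print(df["industry"].tolist())       # ['xxx', 'xxx']
```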

Python can do unexpected things when new objects are defined from existing ones. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]. In this case, pandas remembers that df was derived from df_all and cannot guarantee whether a later assignment to df will also change df_all, so it emits the warning when you modify df.
To avoid these issues altogether, I often have to remind myself to use the copy module, which explicitly forces objects to be copied in memory so that methods called on the new objects are not applied to the source object. I had the same problem as you, and avoided it using the deepcopy function.
In your case, this should get rid of the warning message:
from copy import deepcopy
df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'
EDIT: As David M. pointed out in a comment, the DataFrame's own .copy() method does the same thing:
df = df_all.loc[df_all['issueid']==specific_id,:].copy()
df['industry'] = 'yyy'

df.loc[:,'industry'] = 'yyy'
This does the magic: .loc with ':' selects all rows, so the value is set for the entire column. Hope it helps.

You can do :
df['industry'] = 'yyy'

Assuming your DataFrame looks like the one built below, you can set the whole column with .loc:
import pandas as pd
data = [('001','xxx'), ('002','xxx'), ('003','xxx'), ('004','xxx'), ('005','xxx')]
df = pd.DataFrame(data,columns=['issueid', 'industry'])
print("Old DataFrame")
print(df)
df.loc[:,'industry'] = 'yyy'
print("New DataFrame")
print(df)
To assign numbers, a scalar works the same way; you can also pass a list or array whose length matches the number of rows:
list_of_ones = [1,1,1,1,1]
df.loc[:,'industry'] = list_of_ones
print(df)
Or, if you are using NumPy:
import numpy as np
n = len(df)
df.loc[:,'industry'] = np.ones(n)
print(df)
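A quick check, on the same kind of frame, that a scalar number is broadcast just like a string, while a list must have exactly one value per row:

```python
import pandas as pd

df = pd.DataFrame({"issueid": ["001", "002", "003"], "industry": ["xxx"] * 3})

df["industry"] = 1  # a scalar number is broadcast to every row
print(df["industry"].tolist())  # [1, 1, 1]

try:
    df["industry"] = [1, 1]  # 2 values for 3 rows
except ValueError as err:
    print("ValueError:", err)
```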

This also lets you add conditions on the rows and then change only the cells of a specific column in those rows:
df.loc[(df['issueid'] == '001'), 'industry'] = 'yyy'

Seems to me that:
df1 = df[df['col1']==some_value] produces a DataFrame that pandas treats as possibly being a slice of df, so it cannot tell whether changes to df1 should be reflected in the parent df. This leads to the warning.
Whereas df1 = df[df['col1']==some_value].copy() explicitly creates a new, independent DataFrame, so changes to df1 are not reflected in df. The copy method is recommended if you don't want to change your original df.

I had a similar issue before, even with df.loc[:,'industry'] = 'yyy', but once I refreshed the notebook, it ran fine.
You may want to try re-running the cells after df.loc[:,'industry'] = 'yyy'.

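Use this instead:
df.iloc[:]['industry'] = 'yyy'
Remember: this only works for columns that already exist in the dataframe.
This is for people for whom .loc didn't work. Note that it is chained assignment, which pandas discourages; depending on the pandas version it may not modify df at all, and it never does under Copy-on-Write.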

For anyone else looking for this answer who doesn't want to use copy:
df['industry'] = df['industry'].apply(lambda x: '')

If you have just created a new, empty DataFrame, you cannot directly assign a value to a whole column: the column is created but has no rows, because pandas does not know how many rows the DataFrame should have. You need to either define the size or have some existing columns. For example, the following creates columns A, B and C but no rows:
df = pd.DataFrame()
df["A"] = 1
df["B"] = 2
df["C"] = 3

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
