Dropping column in pandas dataframe not possible [duplicate] - python

This question already has answers here:
Python Pandas: drop a column from a multi-level column index?
(3 answers)
Closed 2 years ago.
I'd like to delete columns in a dataframe.
This is how I import the csv:
dffm = pd.read_csv('..xxxxxx.csv', sep=';', engine='python')
Why is it not possible to delete the column "High"?:
Time Open High Low Close
Date
12.06.20 07:00:00 3046.50 3046.75 3046.00 3046.50
12.06.20 07:00:06 3046.75 3046.75 3046.00 3046.00
12.06.20 07:00:12 3046.00 3046.00 3045.75 3045.75
12.06.20 07:00:18 3046.00 3046.25 3046.00 3046.0
with this line:
dffm = dffm.drop(['High'], axis=1, inplace=True)
error:
"['High'] not found in axis"

Hmm, first of all: the line you are using,
dffm = dffm.drop(['High'], axis=1, inplace=True)
would have returned None even if it succeeded, because the inplace flag means the operation is performed on the current dataframe and nothing is returned.
see:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
try:
dffm.drop(columns=['High'], inplace=True)
If that doesn't work, inspect your dataframe and check the column type; maybe it's not a plain string. That's a long shot, but sometimes strings from CSVs get turned into byte strings (you'll see b"stringvalue").
see :
What is the difference between a string and a byte string?

A possible cause of the error is that the column really does not exist, so check:
'High' in dffm.columns
If the result is False, look for things like leading or trailing spaces in the column names that make the actual name different from what you expect.
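A sketch of that check, using a hypothetical frame whose header carries a deliberate leading space:

```python
import pandas as pd

# Hypothetical frame: the "High" header has a stray leading space
dffm = pd.DataFrame({" High": [3046.75, 3046.75], "Low": [3046.00, 3046.00]})

print("High" in dffm.columns)   # False: the real name is " High"

# Stripping whitespace from every column name fixes the lookup
dffm.columns = dffm.columns.str.strip()
dffm = dffm.drop(columns=["High"])
```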

Kindly try the following
# you can use columns parameter
data = dffm.drop(columns="High")
# when using inplace=True, you don't need to re-assign the dataframe,
# as it modifies the dataframe directly
dffm.drop("High", axis=1, inplace=True)

You might be getting this error since you are using inplace=True and at the same time trying to save the returned DataFrame in dffm.
However, doing it this way is incorrect, since when you turn on the inplace flag the changes are made in place and the method returns None.
You can read about it in the documentation of the drop operation of pandas.DataFrame https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
You can do it using the general way of overwriting the dataframe with the one returned from the operation.
dffm = dffm.drop('High', axis=1)
Or you can use the inplace flag correctly and do it like,
dffm.drop('High', axis=1, inplace=True)
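A minimal illustration of the difference, on a toy frame rather than the asker's csv:

```python
import pandas as pd

dffm = pd.DataFrame({"High": [3046.75], "Low": [3046.00]})

# inplace=True mutates dffm and returns None,
# so re-assigning the result would throw the frame away
result = dffm.drop("High", axis=1, inplace=True)
print(result)              # None
print(list(dffm.columns))  # ['Low']
```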

Related

Error when try adding a column to a dataframe

I am first trying to slice some columns from the original dataframe and then add an additional column 'INDEX' as the last column.
df = df.iloc[:, np.r_[10:17]] #col 0~6
df['INDEX'] = df.index #col 7
The second line gives the error message 'A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead'.
Why am I seeing this and how should I solve it?
I would do
df.loc[:,'INDEX'] = df.index
By default, slicing a dataframe can produce a shallow copy (a view), so operations performed on the slice may actually act on the original dataframe, and that is exactly what the message indicates.
Either of below will make the Python interpreter happy 😃 :
df = df.iloc[:, np.r_[10:17]].copy()
or
df.loc[:, ['INDEX']] = df.index
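Putting the two fixes together in a runnable sketch (a synthetic 20-column frame stands in for the original data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(2, 20))

# .copy() makes the slice an independent frame, so the later
# assignment no longer writes through a view of the original
sliced = df.iloc[:, np.r_[10:17]].copy()
sliced["INDEX"] = sliced.index   # no SettingWithCopyWarning
```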

DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version warning

I am appending a new row to an existing pandas dataframe as follows:
df= df.append(pd.Series(), ignore_index=True)
This is resulting in the subject DeprecationWarning.
The existing df has a mix of string, float and dateime.date datatypes (8 columns totals).
Is there a way to explicitly specify the columns types in the df.append?
I have looked here and here but I still have no solution. Please advise if there is a better way to append a row to the end of an existing dataframe without triggering this warning.
You can try this:
Type_new = pd.Series([], dtype=pd.StringDtype())
This will create an empty Series with an explicit dtype, which avoids the warning.
You can add dtype to your code.
pd.Series(dtype='float64')
df = df.append(pd.Series(dtype = 'object'), ignore_index=True)
If the accepted solution still results in :
'ValueError: No objects to concatenate'
Try this solution from FutureWarning: on `df['col'].apply(pd.Series)`:
(lambda x: pd.Series(x, dtype="float"))
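Note that DataFrame.append itself was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the same "blank row with an explicit dtype" idea has to go through pd.concat. A sketch, with a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a"], "value": [1.0]})

# An explicit dtype avoids the empty-Series warning; wrapping the
# empty Series in a one-row frame gives concat something to append
blank_row = pd.DataFrame([pd.Series(dtype="object")])
df = pd.concat([df, blank_row], ignore_index=True)
```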

Dataframe sum(axis=1) is returning Nan Values

I'm trying to sum the second column ('ALL_PPA'), grouping by Numéro_département.
Here's my code :
df.fillna(0,inplace=True)
df = df.loc[:, ('Numéro_département','ALL_PPA')]
df = df.groupby('Numéro_département').sum(axis=1)
print(df)
My DF is full of numbers and I don't have any NaN values, but when I apply df.sum(axis=1), some rows end up with a NaN value.
Here's what my table looks like before sum() (screenshot in the original post), and after sum() (screenshot in the original post).
My question is : How am I supposed to do this? I've try to use numpy library but, it doesn't work as I want it to work
Drop the first row of that dataframe, as it just has the column names in it, and convert the values to int. Right now the dtype is object because of the mixed data types:
df2 = df.iloc[1:].astype(int).copy()
Then, apply groupby.sum() and specify the column as well:
df3 = df2.groupby('Numéro_département')['ALL_PPA'].sum()
I think using .dropna() before summing the DF will help remove any rows or columns (depending on the axis you choose) with NaN values. Based on the screenshot provided, also drop the first line of the DF, as it is a string.
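A sketch of that cleanup on made-up data (the real frame came from a screenshot, so the values here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Numéro_département": ["01", "01", "02"],
    "ALL_PPA": ["10", "20", "5"],  # strings, as when a header row leaks into the data
})

# Coerce the summed column to numeric before grouping;
# anything unparseable becomes NaN and is then filled with 0
df["ALL_PPA"] = pd.to_numeric(df["ALL_PPA"], errors="coerce").fillna(0)
totals = df.groupby("Numéro_département")["ALL_PPA"].sum()
```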

Unable rename column series

I am unable to rename the column of a series:
tabla_paso4
Date decay
2015-06-29    0.003559
2015-09-18    0.025024
2015-08-24    0.037058
2014-11-20    0.037088
2014-10-02    0.037098
Name: decay, dtype: float64
I have tried:
tabla_paso4.rename('decay_acumul')
tabla_paso4.rename(columns={'decay':'decay_acumul'})
I already had a look at the possible duplicate, but I don't know why, even after applying:
tabla_paso4.rename(columns={'decay':'decay_acumul'},inplace=True)
returns the series like this:
Date
2015-06-29    0.003559
2015-09-18    0.025024
2015-08-24    0.037058
2014-11-20    0.037088
2014-10-02    0.037098
dtype: float64
It looks like your tabla_paso4 is a Series, not a DataFrame.
You can make a DataFrame with named column out of it:
new_df = tabla_paso4.to_frame(name='decay_acumul')
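For example, with a small stand-in for tabla_paso4:

```python
import pandas as pd

tabla_paso4 = pd.Series([0.003559, 0.025024], name="decay")

# A Series has no columns, so rename(columns=...) has nothing to act on;
# to_frame() promotes it to a one-column DataFrame with the given name
new_df = tabla_paso4.to_frame(name="decay_acumul")
```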
Try
tabla_paso4.columns = ['Date', 'decay_acumul']
or
tabla_paso4.rename(columns={'decay':'decay_acumul'}, inplace=True)
What you were doing wrong earlier is that you missed the inplace=True part, so the renamed df was returned but not assigned.
I hope this helps!

Removing index column in pandas when reading a csv

I have the following code which imports a CSV file. There are 3 columns and I want to set the first two of them to variables. When I set the second column to the variable "efficiency" the index column is also tacked on. How can I get rid of the index column?
df = pd.DataFrame.from_csv('Efficiency_Data.csv', header=0, parse_dates=False)
energy = df.index
efficiency = df.Efficiency
print efficiency
I tried using
del df['index']
after I set
energy = df.index
which I found in another post but that results in "KeyError: 'index' "
When writing to and reading from a CSV file, include the arguments index=False and index_col=False, respectively. Here is an example:
To write:
df.to_csv(filename, index=False)
and to read from the csv
pd.read_csv(filename, index_col=False)
This should prevent the issue so you don't need to fix it later.
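A self-contained round trip of that idea, using an in-memory buffer in place of the real file:

```python
import io
import pandas as pd

df = pd.DataFrame({"Energy": [1, 2], "Efficiency": [0.50, 0.55]})

buf = io.StringIO()
df.to_csv(buf, index=False)          # no index column is written
buf.seek(0)
back = pd.read_csv(buf, index_col=False)
```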
df.reset_index(drop=True, inplace=True)
DataFrames and Series always have an index. Although it displays alongside the column(s), it is not a column, which is why del df['index'] did not work.
If you want to replace the index with simple sequential numbers, use df.reset_index().
To get a sense for why the index is there and how it is used, see e.g. 10 minutes to Pandas.
You can set one of the columns as the index, in case it is an "id", for example.
In this case the default index column will be replaced by the column you have chosen.
df.set_index('id', inplace=True)
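For instance, with a hypothetical 'id' column:

```python
import pandas as pd

df = pd.DataFrame({"id": [10, 20], "val": ["a", "b"]})
df.set_index("id", inplace=True)   # 'id' now serves as the index
```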
If your problem is the same as mine, where you just want to reset the column headers to 0 through the number of columns, do
df = pd.DataFrame(df.values)
EDIT:
Not a good idea if you have heterogenous data types. Better just use
df.columns = range(len(df.columns))
You can specify which column to use as the index in your csv file by using the index_col parameter of the read_csv function.
If this doesn't solve your problem, please provide an example of your data.
One thing that I do is:
df = df.reset_index()
df = df.drop(['index'], axis=1)
To avoid creating the default index column, you can set index_col to False and keep the header as zero. Here is an example of how you can do it.
recording = pd.read_excel("file.xls",
sheet_name= "sheet1",
header= 0,
index_col= False)
The header=0 will make the first row your attribute headers, and you can use them later for calling the columns.
It works for me this way:
df = data.set_index("name of the column header to use as the index column")
