Unable to rename the column of a Series - python

I am unable to rename the column of a series:
tabla_paso4
Date decay
2015-06-29    0.003559
2015-09-18    0.025024
2015-08-24    0.037058
2014-11-20    0.037088
2014-10-02    0.037098
Name: decay, dtype: float64
I have tried:
tabla_paso4.rename('decay_acumul')
tabla_paso4.rename(columns={'decay':'decay_acumul'})
I already had a look at the possible duplicate, but I don't understand why applying:
tabla_paso4.rename(columns={'decay':'decay_acumul'},inplace=True)
returns the series like this:
Date
2015-06-29    0.003559
2015-09-18    0.025024
2015-08-24    0.037058
2014-11-20    0.037088
2014-10-02    0.037098
dtype: float64

It looks like your tabla_paso4 is a Series, not a DataFrame.
You can make a DataFrame with a named column out of it:
new_df = tabla_paso4.to_frame(name='decay_acumul')
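To illustrate the difference, here is a minimal sketch that rebuilds a Series similar to the one in the question (the three values and dates are copied from the question; everything else is an assumption). It shows that a Series has a name rather than columns, and that both rename and to_frame return a new object instead of changing the original.

```python
import pandas as pd

# A small stand-in for tabla_paso4: a Series whose index is named "Date"
tabla_paso4 = pd.Series(
    [0.003559, 0.025024, 0.037058],
    index=pd.Index(["2015-06-29", "2015-09-18", "2015-08-24"], name="Date"),
    name="decay",
)

# Passing a scalar to Series.rename changes the Series name.
# The result is a new Series, so it has to be assigned to something.
renamed = tabla_paso4.rename("decay_acumul")

# to_frame turns the Series into a one-column DataFrame with the given column name
as_frame = tabla_paso4.to_frame(name="decay_acumul")

print(tabla_paso4.name)         # the original keeps its old name
print(renamed.name)
print(list(as_frame.columns))
```

If only the name needs to change, assigning the renamed Series back to tabla_paso4 is enough; to_frame is useful when a DataFrame is needed afterwards.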

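If tabla_paso4 is a DataFrame with the columns Date and decay, try
tabla_paso4.columns = ['Date', 'decay_acumul']
or
tabla_paso4.rename(columns={'decay':'decay_acumul'}, inplace=True)
What went wrong earlier is that you left out inplace=True, so the renamed object was returned but never assigned. For a Series the same applies to tabla_paso4.rename('decay_acumul'): assign the result back to keep the new name.
I hope this helps!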

Related

The Drop() method appears to do nothing to my dataframe, but no error is given

I am using drop() to try to remove a row from a dataframe, based on an index. Nothing seems to happen. I get lots of errors if I play with the syntax, but the below example yields the same dataframe I started with.
data= {"col1":[1, 3, 3,3],"col2":[4,5,6,4],"col3":[7,6,6,8]}
testdf2 = pd.DataFrame(data)
testdf2.drop([1])
testdf2
I assume I'm missing something obvious?
When using drop, you must assign the result back to your df (or to a new variable).
import pandas as pd
data= {"col1":[1, 3, 3,3],"col2":[4,5,6,4],"col3":[7,6,6,8]}
testdf2 = pd.DataFrame(data)
testdf2 = testdf2.drop([1])
testdf2
Alternatively, pass inplace=True.
testdf2.drop([1], inplace=True)
However, this could lead to complications regarding view / copy and I usually reassign.
Hope that helps!

Unmerge cells when using groupby (PANDAS)

I grouped some data using groupby:
df1['N° of options'] = df.groupby(['Country','Restaurant']).Food.size()
The result is a dataframe in which the grouped values are merged; instead, I'd like these values to be repeated in every row.
Any clue about how I can display the data like this?
For now, I got something like this:
Thank you!!
Assuming that grouped_df is your grouped dataframe, you can use pandas.DataFrame.reset_index to fill down the rows of your two indexes.
>>> print(grouped_df)
>>> print(grouped_df.reset_index())
Another way to do it is to add the as_index=False argument to your groupby clause:
grouped_df = df.groupby(['SG_UF', 'SG_PARTIDO'], as_index=False).sum()
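As a rough sketch of both suggestions, the example below uses the column names from the question (Country, Restaurant, Food) with made-up rows; the data and the name n_options are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    "Country": ["Italy", "Italy", "Italy", "Japan"],
    "Restaurant": ["Roma", "Roma", "Napoli", "Edo"],
    "Food": ["pizza", "pasta", "pizza", "sushi"],
})

# Grouping by two keys gives a result indexed by (Country, Restaurant),
# which is displayed with repeated Country values "merged"
grouped = df.groupby(["Country", "Restaurant"]).Food.size()

# Option 1: reset_index moves the group keys back into ordinary columns,
# so every row shows its own Country and Restaurant
flat = grouped.reset_index(name="n_options")

# Option 2: as_index=False keeps the keys as columns from the start
# (count ignores missing Food values, whereas size counts every row)
flat2 = df.groupby(["Country", "Restaurant"], as_index=False)["Food"].count()

print(flat)
print(flat2)
```

Both results are plain DataFrames with a default integer index, so the group keys appear on every row.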
If I understand correctly, you want to sort rather than group, since you mention that you want to see the values.
The signature is df_name.sort_values(by=column_name, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
In your code, it could look like:
df.sort_values(by=['Country','Restaurant'])
Use the other arguments as required, for example to set the sort order.

How to index a pandas dataframe by datetime?

I am using the django_pandas package to obtain a Pandas dataframe from stored Django models.
df = qs.to_dataframe(['time', 'price_open', 'price_high', 'price_low', 'price_close'], index='time')
Now I want to access the dataframe by a datetime, however, this does not work. Printing df looks like this, where the time column has a weird position:
If I do print(df.keys()) I get the following result: Index(['price_open', 'price_high', 'price_low', 'price_close', ], dtype='object'), but I expected time to be included. Also, df.columns does not contain time, only the other columns.
How can I access df by a given datetime? Why is time not the key for df? Thanks!
As pointed out by @ThePorcius, reset_index should give you back the time column.
df = df.reset_index()
According to the docs, you can use the on argument of resample to use a column instead of the index.
You'll need to make sure that the time column is a datetime.
dailyFrame = (
    df.resample('D', on='time')
    .agg({'price_open': 'first', 'price_high': 'max', 'price_low': 'min', 'price_close': 'last'})
)
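For reference, here is a small self-contained sketch of why time does not show up in df.columns and how to look rows up by datetime. The price values and timestamps are invented, and the frame is built by hand rather than through django_pandas, so treat it as an approximation of what to_dataframe(..., index='time') produces.

```python
import pandas as pd

# Hypothetical prices; "time" is the index, as with index='time' in the question
times = pd.to_datetime(["2021-01-01 09:00", "2021-01-01 15:00", "2021-01-02 09:00"])
df = pd.DataFrame(
    {
        "price_open": [10.0, 11.0, 12.0],
        "price_high": [11.5, 12.5, 13.0],
        "price_low": [9.5, 10.5, 11.0],
        "price_close": [11.0, 12.0, 12.5],
    },
    index=pd.Index(times, name="time"),
)

# The index is not a column, so "time" is absent from df.columns.
# Rows are looked up through .loc instead:
row = df.loc[pd.Timestamp("2021-01-01 15:00")]   # one row, returned as a Series
day = df.loc["2021-01-01"]                       # all rows of that day

# reset_index turns the index back into a regular "time" column,
# which resample can then use through the on argument
flat = df.reset_index()
daily = flat.resample("D", on="time").agg(
    {"price_open": "first", "price_high": "max", "price_low": "min", "price_close": "last"}
)
print(daily)
```

Keeping time as the index is usually the more convenient choice for lookups; resample also works directly on a DatetimeIndex without the on argument.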

Dropping column in pandas dataframe not possible [duplicate]

This question already has answers here:
Python Pandas: drop a column from a multi-level column index?
(3 answers)
Closed 2 years ago.
I'd like to delete columns in a dataframe.
This is how I import the csv:
dffm = pd.read_csv('..xxxxxx.csv', sep=';', engine='python')
Why is it not possible to delete the column "High"?
              Time     Open     High      Low    Close
Date
12.06.20  07:00:00  3046.50  3046.75  3046.00  3046.50
12.06.20  07:00:06  3046.75  3046.75  3046.00  3046.00
12.06.20  07:00:12  3046.00  3046.00  3045.75  3045.75
12.06.20  07:00:18  3046.00  3046.25  3046.00   3046.0
with this line:
dffm = dffm.drop(['High'], axis=1, inplace=True)
error:
"['High'] not found in axis"
First of all, the line you are using
dffm = dffm.drop(['High'], axis=1, inplace=True)
would have set dffm to None had it succeeded, because with the inplace flag the operation modifies the current dataframe and returns None.
see:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
Try:
dffm.drop(columns=['High'], inplace=True)
If that doesn't work, inspect your dataframe and check the type of the column names; maybe they are not strings. It's a long shot, but strings read from a CSV sometimes end up as byte strings (you'll see b"stringvalue").
See:
What is the difference between a string and a byte string?
A possible cause of the error is that the column really does not exist, so check:
('High' in dffm.columns)
If the result is False, look for differences such as stray spaces that make the actual column name differ from 'High'.
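The check above could be carried out along these lines. This is only a sketch: the frame is built by hand with deliberately padded column names, which is an assumption about what the CSV might contain, not something the question confirms.

```python
import pandas as pd

# Hypothetical frame whose column names carry stray spaces
dffm = pd.DataFrame({"Open": [3046.50], " High": [3046.75], "Low ": [3046.00]})

# The membership test fails because the real name is " High"
print("High" in dffm.columns)

# repr makes leading and trailing whitespace visible
print([repr(c) for c in dffm.columns])

# Strip the whitespace from every column name, then drop as usual
dffm.columns = dffm.columns.str.strip()
dffm = dffm.drop(columns="High")
print(list(dffm.columns))
```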
Kindly try the following:
# you can use the columns parameter
data = dffm.drop(columns="High")
# when using inplace=True, you don't need to re-assign the dataframe,
# as it directly modifies the dataframe
dffm.drop("High", axis=1, inplace=True)
You might be getting this error because you are using inplace=True and at the same time trying to save the returned DataFrame in dffm.
However, doing it this way is incorrect: when you turn on the inplace flag, the changes are made in place and the call returns None.
You can read about it in the documentation of the drop operation of pandas.DataFrame https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
You can do it the general way, by overwriting the dataframe with the one returned from the operation:
dffm = dffm.drop('High', axis=1)
Or you can use the inplace flag correctly, like this:
dffm.drop('High', axis=1, inplace=True)
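A short sketch of what happens when the two styles are mixed, using a hand-built frame with the column names from the question (the single row of values is an assumption for illustration):

```python
import pandas as pd

dffm = pd.DataFrame({"Open": [3046.50], "High": [3046.75], "Low": [3046.00]})

# With inplace=True the frame itself is modified and the call returns None,
# so writing "dffm = dffm.drop(..., inplace=True)" would replace dffm with None
result = dffm.drop("High", axis=1, inplace=True)

print(result)
print(list(dffm.columns))
```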

How to get particular Column of DataFrame in pandas?

I have a data frame named df, and I want its first and second columns (series) in the variables x and y.
I would have done that by the name of the column, like df['A'] or df['B'].
But the problem here is that the first row of data is being used as the header, so the columns have no real names. The header looks like 2.17, 3.145 and so on.
So my questions are:
a) How do I name the columns so that the data (which currently starts in the header) begins right after the names?
b) How do I get a particular column's data if I don't know its name or it doesn't have one?
Thank you.
You might want to read the documentation on indexing.
For what you specified in the question, you can use
x, y = df.iloc[:, [0]], df.iloc[:, [1]]
Set the names kwarg when reading the DataFrame (see the read_csv docs).
So instead of pd.read_csv('kndkma') use pd.read_csv('kndkma', names=['a', 'b', ...]).
It is usually easier to name the columns when you read or create the DataFrame, but you can also name (or rename) the columns afterwards with something like:
df.columns = ['A','B', ...]
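Putting the pieces together, here is a minimal sketch that reads header-less data with explicit names and then selects columns by position. The CSV content is invented (only 2.17 and 3.145 come from the question), and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical file content without a header row
raw = io.StringIO("2.17,3.145\n1.5,2.5\n0.5,4.0\n")

# Passing names tells read_csv that the file has no header,
# so the first line is kept as data
df = pd.read_csv(raw, names=["A", "B"])

# Positional selection works whether or not the columns have names.
# A plain integer returns a Series; a list such as [0] returns a DataFrame.
x = df.iloc[:, 0]
y = df.iloc[:, 1]

print(x.tolist())
print(y.tolist())
```

If one-column DataFrames are preferred over Series, df.iloc[:, [0]] and df.iloc[:, [1]] can be used instead.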
