Rename column with a name from a list - python

For example, I have a list of names:
name_list = ['a', 'b', 'c']
and 3 dataframes:
>> df1
>> k l m
0 12 13 14
1 13 14 15
>> df2
>> o p q
0 10 11 12
1 15 16 17
>> df3
>> r s t
0 1 3 4
1 3 4 5
What I want to do is replace the name of the first column of each dataframe with the corresponding name from name_list. So a will replace k, b will replace o, and c will replace r.
the output will be:
>> df1
>> a l m
0 12 13 14
1 13 14 15
>> df2
>> b p q
0 10 11 12
1 15 16 17
>> df3
>> c s t
0 1 3 4
1 3 4 5
I can do it manually, but it would be better if there is a cleaner method. Thanks.

I totally agree with @ALollz, but nevertheless you can try something like:
import pandas as pd
df1 = pd.DataFrame([[1,2,3]], columns=['k', 'l', 'm'])
df2 = pd.DataFrame([[1,2,3]], columns=['o', 'p', 'q'])
df3 = pd.DataFrame([[1,2,3]], columns=['r', 's', 't'])
name_list = ['a', 'b', 'c']
for index, name in enumerate(name_list, 1):
    # Look up df1, df2, df3 by variable name, then rename the first column in place.
    df = pd.eval('df{index}'.format(index=index))
    df.rename(columns={df.columns[0]: name}, inplace=True)

If you have the dataframes in a list like dfs = [df1, df2, df3] then you can do:
dfs = [dfs[i].rename(columns={dfs[i].columns[0]: name_list[i]}) for i in range(len(dfs))]

You can do it in place:
[df.rename(columns={df.columns[0]: c}, inplace=True)
 for df, c in zip([df1, df2, df3], ['a', 'b', 'c'])]
Alternatively:
for df, c in zip([df1, df2, df3], ['a', 'b', 'c']):
    df.rename(columns={df.columns[0]: c}, inplace=True)
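For completeness, here is a minimal, self-contained sketch of the same idea without inplace=True, using the data from the question. The helper name rename_first_column is made up for illustration; it is not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame([[12, 13, 14], [13, 14, 15]], columns=["k", "l", "m"])
df2 = pd.DataFrame([[10, 11, 12], [15, 16, 17]], columns=["o", "p", "q"])
df3 = pd.DataFrame([[1, 3, 4], [3, 4, 5]], columns=["r", "s", "t"])
name_list = ["a", "b", "c"]


def rename_first_column(df, new_name):
    """Hypothetical helper: return a copy with only the first column renamed."""
    return df.rename(columns={df.columns[0]: new_name})


# Pair each dataframe with its new name and rebind the three variables.
df1, df2, df3 = (
    rename_first_column(df, name)
    for df, name in zip([df1, df2, df3], name_list)
)

print(df1.columns.tolist())
print(df2.columns.tolist())
print(df3.columns.tolist())
```

Rebinding the names avoids mutating the original objects, which can matter if other code still holds references to them.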

Related

Combine two pandas index slices

How can two pandas.IndexSlice objects be combined into one?
Set up of the problem:
import pandas as pd
import numpy as np
idx = pd.IndexSlice
cols = pd.MultiIndex.from_product([['A', 'B', 'C'], ['x', 'y'], ['a', 'b']])
df = pd.DataFrame(np.arange(len(cols)*2).reshape((2, len(cols))), columns=cols)
df:
A B C
x y x y x y
a b a b a b a b a b a b
0 0 1 2 3 4 5 6 7 8 9 10 11
1 12 13 14 15 16 17 18 19 20 21 22 23
How can the two slices idx['A', 'y', :] and idx[['B', 'C'], 'x', :] be combined to show in one dataframe?
Separately they are:
df.loc[:, idx['A', 'y',:]]
A
y
a b
0 2 3
1 14 15
df.loc[:, idx[['B', 'C'], 'x', :]]
B C
x x
a b a b
0 4 5 8 9
1 16 17 20 21
Simply combining them as a list does not play nicely:
df.loc[:, [idx['A', 'y',:], idx[['B', 'C'], 'x',:]]]
....
TypeError: unhashable type: 'slice'
My current solution is incredibly clunky, but gives the sub df that I'm looking for:
df.loc[:, df.loc[:, idx['A', 'y', :]].columns.to_list()
          + df.loc[:, idx[['B', 'C'], 'x', :]].columns.to_list()]
A B C
y x x
a b a b a b
0 2 3 4 5 8 9
1 14 15 16 17 20 21
However this doesn't work when one of the slices is just a series (as expected), which is less fun:
df.loc[:, df.loc[:, idx['A', 'y', 'a']].columns.to_list() + df.loc[:,
idx[['B', 'C'], 'x', :]].columns.to_list()]
...
AttributeError: 'Series' object has no attribute 'columns'
Are there any better alternatives to what I'm currently doing that would ideally work with dataframe slices and series slices?
A general solution is to select each slice separately and join the results with concat:
a = df.loc[:, idx['A', 'y', 'a']]
b = df.loc[:, idx[['B', 'C'], 'x', :]]
df = pd.concat([a, b], axis=1)
print (df)
A B C
y x x
a a b a b
0 2 4 5 8 9
1 14 16 17 20 21
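Another possibility, sketched here under the assumption that the column MultiIndex is lexsorted (as it is when built with from_product from sorted inputs), is to translate each slice into integer column positions with MultiIndex.get_locs and select once with iloc. The names slices, positions and combined are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
cols = pd.MultiIndex.from_product([["A", "B", "C"], ["x", "y"], ["a", "b"]])
df = pd.DataFrame(np.arange(len(cols) * 2).reshape((2, len(cols))), columns=cols)

# One fully specified key (a single column) and one key with a list and a slice.
slices = [idx["A", "y", "a"], idx[["B", "C"], "x", :]]

# get_locs returns an array of integer positions for each key.
positions = np.concatenate([df.columns.get_locs(s) for s in slices])

# A single positional selection keeps the result a DataFrame in every case.
combined = df.iloc[:, positions]
print(combined)
```

Because each key is reduced to positions first, it does not matter whether a slice on its own would have produced a Series or a DataFrame.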

How to reorder rows of a dataframe based on values in a column

I have a dataframe like this:
A B C D
b 3 3 4
a 1 2 1
a 1 2 1
d 4 4 1
d 1 2 1
c 4 5 6
Now I want to reorder the rows based on the values in column A.
I don't want to sort the values, but to reorder them in a specific order such as ['b', 'd', 'c', 'a'].
What I expect is:
A B C D
b 3 3 4
d 4 4 1
d 1 2 1
c 4 5 6
a 1 2 1
a 1 2 1
This is a good use case for pd.Categorical, since you have ordered categories. Just make that column a categorical and mark ordered=True. Then, sort_values should do the rest.
df['A'] = pd.Categorical(df.A, categories=['b', 'd', 'c', 'a'], ordered=True)
df.sort_values('A')
If you want to keep your column as is, you can just use loc and the indexes.
df.loc[pd.Series(pd.Categorical(df.A,
                                categories=['b', 'd', 'c', 'a'],
                                ordered=True))
       .sort_values()
       .index
]
Use a dictionary mapping each string to its position in the desired order, then sort by those values and reindex:
order = ['b', 'd', 'c', 'a']
df = df.reindex(df['A'].map(dict(zip(order, range(len(order))))).sort_values().index)
print(df)
A B C D
0 b 3 3 4
3 d 4 4 1
4 d 1 2 1
5 c 4 5 6
1 a 1 2 1
2 a 1 2 1
Without changing datatype of A, you can set 'A' as index and select elements in the desired order defined by sk.
sk = ['b', 'd', 'c', 'a']
df.set_index('A').loc[sk].reset_index()
Or use a temp column for sorting:
sk = ['b', 'd', 'c', 'a']
(
    df.assign(S=df.A.map({v: k for k, v in enumerate(sk)}))
    .sort_values(by='S')
    .drop('S', axis=1)
)
I'm taking the solution provided by rafaelc a step further. If you want to do it in a chained process, here is how you'd do it:
df = (
    df
    .assign(A=lambda x: pd.Categorical(x['A'], categories=['b', 'd', 'c', 'a'], ordered=True))
    .sort_values('A')
)
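As a further sketch, on pandas 1.1 or later the key argument of sort_values can apply the custom order without changing the dtype of A or adding a temporary column. The dataframe is rebuilt from the question; the names order and rank are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["b", "a", "a", "d", "d", "c"],
    "B": [3, 1, 1, 4, 1, 4],
    "C": [3, 2, 2, 4, 2, 5],
    "D": [4, 1, 1, 1, 1, 6],
})

order = ["b", "d", "c", "a"]
# Map each label to its position in the desired order.
rank = {label: pos for pos, label in enumerate(order)}

# key receives the column as a Series and must return a Series of sort keys.
# mergesort is stable, so rows with equal keys keep their original order.
result = df.sort_values("A", key=lambda s: s.map(rank), kind="mergesort")
print(result)
```

Labels missing from order would map to NaN and, with the default na_position, end up last.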

Series or list index in pandas

I have a list of group IDs:
letters = ['A', 'A/D', 'B', 'B/D', 'C', 'C/D', 'D']
and a dataframe of groups:
groups = pd.DataFrame({'group': ['B', 'A/D', 'D', 'D', 'A']})
I'd like to create a column in the dataframe that gives the position of the group ids in the list, like so:
group group_idx
0 B 2
1 A/D 1
2 D 6
3 D 6
4 A 0
My current solution is this:
group_to_num = {hsg: i for i, hsg in enumerate(letters)}
groups['group_idx'] = groups.applymap(lambda x: group_to_num.get(x)).max(axis=1).fillna(-1).astype(np.int32)
but it seems inelegant. Is there a simpler way of doing this?
You can try a merge after building a dataframe from the list:
groups.merge(pd.DataFrame(letters).reset_index(), left_on='group', right_on=0).\
    rename(columns={'index': 'group_idx'}).drop(columns=0)
group group_idx
0 B 2
1 A/D 1
2 D 6
3 D 6
4 A 0
Use map:
import pandas as pd
letters = ['A', 'A/D', 'B', 'B/D', 'C', 'C/D', 'D']
group_to_num = {hsg: i for i, hsg in enumerate(letters)}
groups = pd.DataFrame({'group': ['B', 'A/D', 'D', 'D', 'A']})
groups['group_idx'] = groups.group.map(group_to_num)
print(groups)
Output
group group_idx
0 B 2
1 A/D 1
2 D 6
3 D 6
4 A 0
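A related sketch, assuming the entries of letters are unique: pd.Index.get_indexer returns the position of each value directly and uses -1 for values that are not found, which matches the fillna(-1) in the question. The label 'Z' below is a made-up value used only to show the not-found case.

```python
import pandas as pd

letters = ["A", "A/D", "B", "B/D", "C", "C/D", "D"]
groups = pd.DataFrame({"group": ["B", "A/D", "D", "D", "A"]})

# Build an Index from the list once; get_indexer needs unique labels.
letter_index = pd.Index(letters)
groups["group_idx"] = letter_index.get_indexer(groups["group"])
print(groups)

# Hypothetical label that is not in letters: reported as -1.
missing = letter_index.get_indexer(["Z"])
print(missing)
```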

Python - Pandas - Edit duplicate items keeping last

Let's say my df is:
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'd'],
                   'col2': [10, 20, 30, 10, 20, 10, 10, 20, 30]})
How can I make all numbers zero keeping the last one only? In this case the result should be:
col1 col2
a 0
a 0
a 30
b 0
b 20
c 10
d 0
d 0
d 30
Thanks!
Use loc and duplicated with the argument keep='last':
df.loc[df.duplicated(subset='col1',keep='last'), 'col2'] = 0
>>> df
col1 col2
0 a 0
1 a 0
2 a 30
3 b 0
4 b 20
5 c 10
6 d 0
7 d 0
8 d 30
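To make the mechanics visible, the sketch below prints the boolean mask that duplicated(keep='last') produces and then applies it with Series.mask, which returns a new column instead of modifying df. The names is_earlier_duplicate and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "c", "d", "d", "d"],
    "col2": [10, 20, 30, 10, 20, 10, 10, 20, 30],
})

# True for every row whose col1 value appears again later in the frame.
is_earlier_duplicate = df.duplicated(subset="col1", keep="last")
print(is_earlier_duplicate.tolist())

# mask() replaces the values where the condition is True; df is left untouched.
result = df.assign(col2=df["col2"].mask(is_earlier_duplicate, 0))
print(result)
```

Note that duplicated looks at the whole column, so the last occurrence is kept even if equal col1 values are not adjacent.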

get first and last values in a groupby

I have a dataframe df
df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                   ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
                  ['X', 'Y'])
How do I get the first and last rows, grouped by the first level of the index?
I tried
df.groupby(level=0).agg(['first', 'last']).stack()
and got
X Y
a first 0 1
last 6 7
b first 8 9
last 12 13
c first 14 15
last 16 17
d first 18 19
last 18 19
This is so close to what I want. How can I preserve the level 1 index and get this instead:
X Y
a a 0 1
d 6 7
b e 8 9
g 12 13
c h 14 15
i 16 17
d j 18 19
j 18 19
Option 1
def first_last(df):
    # .ix was removed in pandas 1.0; iloc selects the first and last rows by position.
    return df.iloc[[0, -1]]
df.groupby(level=0, group_keys=False).apply(first_last)
Option 2 - only works if index is unique
idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
df.loc[idx]
Option 3 - per notes below, this only makes sense when there are no NAs
I also abused the agg function. The code below works, but is far uglier.
df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])
Note
per @unutbu: agg(['first', 'last']) takes the first non-NA values.
I took this to mean that the aggregation must then run column by column. Further, forcing index level 1 to align may not even make sense.
Let's include another test
df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'),
                   list('abcdefghij')],
                  list('XY'))
df.loc[tuple('aa'), 'X'] = np.nan
def first_last(df):
    # .ix was removed in pandas 1.0; iloc selects the first and last rows by position.
    return df.iloc[[0, -1]]
df.groupby(level=0, group_keys=False).apply(first_last)
df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])
Sure enough! This second solution is taking the first valid value in column X. It is now nonsensical to have forced that value to align with the index a.
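The difference described in this note can be checked with a small self-contained sketch. It assumes a float dataframe (so that inserting NaN needs no dtype change) and uses iloc for the positional selection; the names picked and first_x are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20, dtype=float).reshape(10, -1),
                  [list("aaaabbbccd"), list("abcdefghij")],
                  list("XY"))
df.loc[("a", "a"), "X"] = np.nan

# Positional selection: the true first and last row of each group,
# with the second index level preserved.
picked = df.groupby(level=0, group_keys=False).apply(lambda g: g.iloc[[0, -1]])
print(picked)

# 'first' aggregation: skips NaN, so group 'a' reports 2.0 for X,
# a value that belongs to row ('a', 'b'), not to ('a', 'a').
first_x = df.groupby(level=0)["X"].first()
print(first_x)
```

The positional version keeps the NaN in place, which is why it is the one that can be aligned with the original level 1 labels.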
This could be one of the easier solutions.
df.groupby(level = 0, as_index= False).nth([0,-1])
X Y
a a 0 1
d 6 7
b e 8 9
g 12 13
c h 14 15
i 16 17
d j 18 19
Hope this helps. (Y)
Please try this:
For last value: df.groupby('Column_name').nth(-1),
For first value: df.groupby('Column_name').nth(0)
