Pandas join two dataframes based on relationship described in dictionary - python

I have two dataframes that I want to join based on a relationship described in a dictionary of lists, where the keys in the dictionary refer to ids from dfA idA column, and the items in the list are ids from dfB idB column. The dataframes and dictionary look something like this:
dfA
colA colB idA
0 a abc 3
1 b def 4
2 b ghi 5
dfB
colX idB colZ
0 bob 7 a
1 bob 7 b
2 bob 7 c
3 jim 8 d
4 jake 9 a
5 jake 9 e
myDict = { '3': [ '7', '8' ], '4': [], '5': ['7', '9'] }
How can I use myDict to join the two dataframes to produce a dataframe like the following?
dfC
colA colB idA colX idB colZ
0 a abc 3 bob 7 a
1 b
2 c
3 jim 8 d
4 b def 4 None None None
5 b ghi 5 bob 7 a
6 b
7 c
8 jake 9 a
9 e

You can create a linking table (DataFrame) from your dictionary. Below full working example. It might need some row and column sorting at the end to produce exactly your output.
import pandas as pd
import numpy as np
dfA = pd.DataFrame({'colA': ('a', 'b', 'b'),
'colB': ('abc', 'def', 'ghi'),
'idA': ('3', '4', '5')})
dfB = pd.DataFrame({'colX': ('bob', 'bob', 'bob', 'jim', 'jake', 'jake'),
'idB': ('7', '7', '7', '8', '9', '9'),
'colZ': ('a', 'b', 'c', 'd', 'a', 'e')})
myDict = {'3': ['7', '8'], '4': [], '5': ['7', '9']}
dfC = pd.DataFrame(columns=['idA', 'idB'])
i = 0
for key, value in myDict.items():
# the if statement is for empty list to create one record with NaNs
if not value:
dfC.loc[i, 'idA'] = key
dfC.loc[i, 'idB'] = np.nan
i += 1
for val in value:
dfC.loc[i, 'idA'] = key
dfC.loc[i, 'idB'] = val
i += 1
temp = dfA.merge(dfC, how='right')
result = temp.merge(dfB, how='outer')
print(result)
The output is:
colA colB idA idB colX colZ
0 a abc 3 7 bob a
1 a abc 3 7 bob b
2 a abc 3 7 bob c
3 b ghi 5 7 bob a
4 b ghi 5 7 bob b
5 b ghi 5 7 bob c
6 a abc 3 8 jim d
7 b def 4 NaN NaN NaN
8 b ghi 5 9 jake a
9 b ghi 5 9 jake e

This is not greatest solution, but it is fairly simple and gets the job done
temp = pd.DataFrame(dfA.idAaux.tolist(), index = dfA.idA).stack()
temp = temp.reset_index()[['idA', 0]]
temp.columns = ['idA', 'idB']
temp2 = dfA.merge(temp, left_on='idA', right_on='idA', how='left').drop('idAaux', axis=1)
temp2['idB'] = pd.to_numeric(temp2['idB'])
res= temp2.merge(dfB, left_on='idB', right_on='idB', how='left')
Output:
colA colB idA idB colX colZ
0 a abc 3 7.0 bob a
1 a abc 3 7.0 bob b
2 a abc 3 7.0 bob c
3 a abc 3 8.0 jim d
4 b def 4 NaN NaN NaN
5 b ghi 5 7.0 bob a
6 b ghi 5 7.0 bob b
7 b ghi 5 7.0 bob c
8 b ghi 5 9.0 jake a
9 b ghi 5 9.0 jake e

Related

How to add data to NaN rows from another data

How can i add datas from another data, but without removing NaN values?
I have three data similar to this
df_main = df_main = pd.DataFrame({'ID': ['10', '11', '12', '13', '14', '15', '16'], 'Name': [ np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})
ID Name
0 10 NaN
1 11 NaN
2 12 NaN
3 13 NaN
4 14 NaN
5 15 NaN
6 16 NaN
df2 = pd.DataFrame({'ID': ['10', '11', '12'], 'Name': [ 'Peter', 'Bruce', 'Tony']})
ID Name
0 10 Peter
1 11 Bruce
2 12 Tony
df3 = pd.DataFrame({'ID': ['15', '16'], 'Name': ['Wanda', 'Natasha']})
ID Name
0 15 Wanda
1 16 Natasha
What I want to have is data like this:
ID Name
0 10 Peter
1 11 Bruce
2 12 Tony
3 13 NaN
4 14 NaN
5 15 Wanda
6 16 Natasha
I tried this code but it did not work
for id in df2['ID'].unique():
if id in df_main['ID'].unique():
df_main.loc[df_main['ID'] == id, 'Name'] = df2.loc[df2['ID'] == id, 'Name']
for id in df3['ID'].unique():
if id in df_main['ID'].unique():
df_main.loc[df_main['ID'] == id, 'Name'] = df3.loc[df3['ID'] == id, 'Name']
IIUC, you can use concat with GroupBy.first :
out = pd.concat([df2, df_main, df3]).groupby("ID", as_index=False).first()
Output :
print(out)
ID Name
0 10 Peter
1 11 Bruce
2 12 Tony
3 13 None
4 14 None
5 15 Wanda
6 16 Natasha
concat df2/df3 and map the values:
df_main['Name'] = df_main['ID'].map(pd.concat([df2, df3]).set_index('ID')['Name'])
Output:
ID Name
0 10 Peter
1 11 Bruce
2 12 Tony
3 13 NaN
4 14 NaN
5 15 Wanda
6 16 Natasha
df_main.set_index("ID").combine_first(df2.set_index("ID"))\
.combine_first(df3.set_index("ID")).reset_index()
out
ID Name
0 10 Peter
1 11 Bruce
2 12 Tony
3 13 NaN
4 14 NaN
5 15 Wanda
6 16 Natasha

Pandas DataFrame Wide to Long based on first column values

I would like to transform a Pandas DataFrame of the following wide format
df = pd.DataFrame([['A', '1', '2', '3'], ['B', '4', '5', '6'], ['C', '7', '8', '9']], columns=['ABC', 'def', 'ghi', 'jkl'])
df =
ABC def ghi jkl
0 A 1 2 3
1 B 4 5 6
2 C 7 8 9
into a long format, where the values from the first column still correspond to the values in the lower-case columns. The column names cannot be used as stub names. The names of the new columns are irrelevant and could be renamed later.
The output should look something like this:
df =
0 1
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
6 C 7
7 C 8
8 C 9
I am not sure how to best and efficiently do this. Can this be done with wide_to_long()? Then I would not know how to deal with stub names. The best would be an efficient one-liner that can be used on a large table.
Many thanks!!
Use DataFrame.melt with DataFrame.sort_index and remove variable column:
df1 = (df.melt("ABC", value_name='new', ignore_index=False)
.sort_index(ignore_index=True)
.drop('variable', axis=1)
)
print (df1)
ABC new
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
6 C 7
7 C 8
8 C 9
If need more dynamic solution with generate first value of columns names:
first = df.columns[0]
df1 = (df.melt(first, value_name='new', ignore_index=False)
.sort_index(ignore_index=True)
.drop('variable', axis=1))
You can use df.stack:
>>> df.set_index('ABC') \
.stack() \
.reset_index(level='ABC') \
.reset_index(drop=True)
ABC 0
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
6 C 7
7 C 8
8 C 9
or use df.melt as suggested by #MustafaAydın:
>>> df.melt('ABC') \
.sort_values('ABC') \
.drop(columns='variable') \
.reset_index(drop=True)
ABC value
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
6 C 7
7 C 8
8 C 9

Pandas groupby on one column witout losing others columns?

I have a problem with the groupby and pandas, at the beginning I have this chart :
import pandas as pd
data = {'Code_Name':[1,2,3,4,1,2,3,4] ,'Name':['Tom', 'Nicko', 'Krish','Jack kr','Tom', 'Nick', 'Krishx', 'Jacks'],'Cat':['A', 'B','C','D','A', 'B','C','D'], 'T':[9, 7, 14, 12,4, 3, 12, 11]}
# Create DataFrame
df = pd.DataFrame(data)
df
i have this :
Code_Name Name Cat T
0 1 Tom A 9
1 2 Nick B 7
2 3 Krish C 14
3 4 Jack kr D 12
4 1 Tom A 4
5 2 Nick B 3
6 3 Krishx C 12
7 4 Jacks D 11
Now i with groupby :
df.groupby(['Code_Name','Name','Cat'],as_index=False)['T'].sum()
i got this:
Code_Name Name Cat T
0 1 Tom A 13
1 2 Nick B 10
2 3 Krish C 14
3 3 Krishx C 12
4 4 Jack kr D 12
5 4 Jacks D 11
But for me , i need this result :
Code_Name Name Cat T
0 1 Tom A 13
1 2 Nick B 10
2 3 Krish C 26
3 4 Jack D 23
i don't care about Name the Code_name is only thing important for me with sum of T
Thank's
There is 2 ways - for each column with avoid losts add aggreation function - first, last or ', '.join obviuosly for strings columns and aggregation dunctions like sum, mean for numeric columns:
df = df.groupby('Code_Name',as_index=False).agg({'Name':'first', 'Cat':'first', 'T':'sum'})
print (df)
Code_Name Name Cat T
0 1 Tom A 13
1 2 Nicko B 10
2 3 Krish C 26
3 4 Jack kr D 23
Or if some values are duplicated per groups like here Cat values add this columns to groupby - only order should be changed in output:
df = df.groupby(['Code_Name','Cat'],as_index=False).agg({'Name':'first', 'T':'sum'})
print (df)
Code_Name Cat Name T
0 1 A Tom 13
1 2 B Nicko 10
2 3 C Krish 26
3 4 D Jack kr 23
If you don't care about the other variable then just group by the column of interest:
gb = df.groupby(['Code_Name'],as_index=False)['T'].sum()
print(gb)
Code_Name T
0 1 13
1 2 10
2 3 26
3 4 23
Now to get your output, you can take the last value of Name for each group:
gb = df.groupby(['Code_Name'],as_index=False).agg({'Name': 'last', 'Cat': 'first', 'T': 'sum'})
print(gb)
0 1 Tom A 13
1 2 Nick B 10
2 3 Krishx C 26
3 4 Jacks D 23
Perhaps you can try:
(df.groupby("Code_Name", as_index=False)
.agg({"Name":"first", "Cat":"first", "T":"sum"}))
see link: https://datascience.stackexchange.com/questions/53405/pandas-dataframe-groupby-and-then-sum-multi-columns-sperately for the original answer

How do I name the column and index in a Pandas dataframe? [duplicate]

How do I get the index column name in python pandas? Here's an example dataframe:
Column 1
Index Title
Apples 1
Oranges 2
Puppies 3
Ducks 4
What I'm trying to do is get/set the dataframe index title. Here is what i tried:
import pandas as pd
data = {'Column 1' : [1., 2., 3., 4.],
'Index Title' : ["Apples", "Oranges", "Puppies", "Ducks"]}
df = pd.DataFrame(data)
df.index = df["Index Title"]
del df["Index Title"]
print df
Anyone know how to do this?
You can just get/set the index via its name property
In [7]: df.index.name
Out[7]: 'Index Title'
In [8]: df.index.name = 'foo'
In [9]: df.index.name
Out[9]: 'foo'
In [10]: df
Out[10]:
Column 1
foo
Apples 1
Oranges 2
Puppies 3
Ducks 4
You can use rename_axis, for removing set to None:
d = {'Index Title': ['Apples', 'Oranges', 'Puppies', 'Ducks'],'Column 1': [1.0, 2.0, 3.0, 4.0]}
df = pd.DataFrame(d).set_index('Index Title')
print (df)
Column 1
Index Title
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
print (df.index.name)
Index Title
print (df.columns.name)
None
The new functionality works well in method chains.
df = df.rename_axis('foo')
print (df)
Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
You can also rename column names with parameter axis:
d = {'Index Title': ['Apples', 'Oranges', 'Puppies', 'Ducks'],'Column 1': [1.0, 2.0, 3.0, 4.0]}
df = pd.DataFrame(d).set_index('Index Title').rename_axis('Col Name', axis=1)
print (df)
Col Name Column 1
Index Title
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
print (df.index.name)
Index Title
print (df.columns.name)
Col Name
print df.rename_axis('foo').rename_axis("bar", axis="columns")
bar Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
print df.rename_axis('foo').rename_axis("bar", axis=1)
bar Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
From version pandas 0.24.0+ is possible use parameter index and columns:
df = df.rename_axis(index='foo', columns="bar")
print (df)
bar Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
Removing index and columns names means set it to None:
df = df.rename_axis(index=None, columns=None)
print (df)
Column 1
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
If MultiIndex in index only:
mux = pd.MultiIndex.from_arrays([['Apples', 'Oranges', 'Puppies', 'Ducks'],
list('abcd')],
names=['index name 1','index name 1'])
df = pd.DataFrame(np.random.randint(10, size=(4,6)),
index=mux,
columns=list('ABCDEF')).rename_axis('col name', axis=1)
print (df)
col name A B C D E F
index name 1 index name 1
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
print (df.index.name)
None
print (df.columns.name)
col name
print (df.index.names)
['index name 1', 'index name 1']
print (df.columns.names)
['col name']
df1 = df.rename_axis(('foo','bar'))
print (df1)
col name A B C D E F
foo bar
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
df2 = df.rename_axis('baz', axis=1)
print (df2)
baz A B C D E F
index name 1 index name 1
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
df2 = df.rename_axis(index=('foo','bar'), columns='baz')
print (df2)
baz A B C D E F
foo bar
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
Removing index and columns names means set it to None:
df2 = df.rename_axis(index=(None,None), columns=None)
print (df2)
A B C D E F
Apples a 6 9 9 5 4 6
Oranges b 2 6 7 4 3 5
Puppies c 6 3 6 3 5 1
Ducks d 4 9 1 3 0 5
For MultiIndex in index and columns is necessary working with .names instead .name and set by list or tuples:
mux1 = pd.MultiIndex.from_arrays([['Apples', 'Oranges', 'Puppies', 'Ducks'],
list('abcd')],
names=['index name 1','index name 1'])
mux2 = pd.MultiIndex.from_product([list('ABC'),
list('XY')],
names=['col name 1','col name 2'])
df = pd.DataFrame(np.random.randint(10, size=(4,6)), index=mux1, columns=mux2)
print (df)
col name 1 A B C
col name 2 X Y X Y X Y
index name 1 index name 1
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
Plural is necessary for check/set values:
print (df.index.name)
None
print (df.columns.name)
None
print (df.index.names)
['index name 1', 'index name 1']
print (df.columns.names)
['col name 1', 'col name 2']
df1 = df.rename_axis(('foo','bar'))
print (df1)
col name 1 A B C
col name 2 X Y X Y X Y
foo bar
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
df2 = df.rename_axis(('baz','bak'), axis=1)
print (df2)
baz A B C
bak X Y X Y X Y
index name 1 index name 1
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
df2 = df.rename_axis(index=('foo','bar'), columns=('baz','bak'))
print (df2)
baz A B C
bak X Y X Y X Y
foo bar
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
Removing index and columns names means set it to None:
df2 = df.rename_axis(index=(None,None), columns=(None,None))
print (df2)
A B C
X Y X Y X Y
Apples a 2 0 2 5 2 0
Oranges b 1 7 5 5 4 8
Puppies c 2 4 6 3 6 5
Ducks d 9 6 3 9 7 0
And #Jeff solution:
df.index.names = ['foo','bar']
df.columns.names = ['baz','bak']
print (df)
baz A B C
bak X Y X Y X Y
foo bar
Apples a 3 4 7 3 3 3
Oranges b 1 2 5 8 1 0
Puppies c 9 6 3 9 6 3
Ducks d 3 2 1 0 1 0
df.index.name should do the trick.
Python has a dir function that let's you query object attributes. dir(df.index) was helpful here.
Use df.index.rename('foo', inplace=True) to set the index name.
Seems this api is available since pandas 0.13.
If you do not want to create a new row but simply put it in the empty cell then use:
df.columns.name = 'foo'
Otherwise use:
df.index.name = 'foo'
Setting the index name can also be accomplished at creation:
pd.DataFrame(data={'age': [10,20,30], 'height': [100, 170, 175]}, index=pd.Series(['a', 'b', 'c'], name='Tag'))
df.columns.values also give us the column names
The solution for multi-indexes is inside jezrael's cyclopedic answer, but it took me a while to find it so I am posting a new answer:
df.index.names gives the names of a multi-index (as a Frozenlist).
To just get the index column names df.index.names will work for both a single Index or MultiIndex as of the most recent version of pandas.
As someone who found this while trying to find the best way to get a list of index names + column names, I would have found this answer useful:
names = list(filter(None, df.index.names + df.columns.values.tolist()))
This works for no index, single column Index, or MultiIndex. It avoids calling reset_index() which has an unnecessary performance hit for such a simple operation. I'm surprised there isn't a built in method for this (that I've come across). I guess I run into needing this more often because I'm shuttling data from databases where the dataframe index maps to a primary/unique key, but is really just another column to me.

Pandas index column title or name

How do I get the index column name in python pandas? Here's an example dataframe:
Column 1
Index Title
Apples 1
Oranges 2
Puppies 3
Ducks 4
What I'm trying to do is get/set the dataframe index title. Here is what i tried:
import pandas as pd
data = {'Column 1' : [1., 2., 3., 4.],
'Index Title' : ["Apples", "Oranges", "Puppies", "Ducks"]}
df = pd.DataFrame(data)
df.index = df["Index Title"]
del df["Index Title"]
print df
Anyone know how to do this?
You can just get/set the index via its name property
In [7]: df.index.name
Out[7]: 'Index Title'
In [8]: df.index.name = 'foo'
In [9]: df.index.name
Out[9]: 'foo'
In [10]: df
Out[10]:
Column 1
foo
Apples 1
Oranges 2
Puppies 3
Ducks 4
You can use rename_axis, for removing set to None:
d = {'Index Title': ['Apples', 'Oranges', 'Puppies', 'Ducks'],'Column 1': [1.0, 2.0, 3.0, 4.0]}
df = pd.DataFrame(d).set_index('Index Title')
print (df)
Column 1
Index Title
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
print (df.index.name)
Index Title
print (df.columns.name)
None
The new functionality works well in method chains.
df = df.rename_axis('foo')
print (df)
Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
You can also rename column names with parameter axis:
d = {'Index Title': ['Apples', 'Oranges', 'Puppies', 'Ducks'],'Column 1': [1.0, 2.0, 3.0, 4.0]}
df = pd.DataFrame(d).set_index('Index Title').rename_axis('Col Name', axis=1)
print (df)
Col Name Column 1
Index Title
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
print (df.index.name)
Index Title
print (df.columns.name)
Col Name
print df.rename_axis('foo').rename_axis("bar", axis="columns")
bar Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
print df.rename_axis('foo').rename_axis("bar", axis=1)
bar Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
From version pandas 0.24.0+ is possible use parameter index and columns:
df = df.rename_axis(index='foo', columns="bar")
print (df)
bar Column 1
foo
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
Removing index and columns names means set it to None:
df = df.rename_axis(index=None, columns=None)
print (df)
Column 1
Apples 1.0
Oranges 2.0
Puppies 3.0
Ducks 4.0
If MultiIndex in index only:
mux = pd.MultiIndex.from_arrays([['Apples', 'Oranges', 'Puppies', 'Ducks'],
list('abcd')],
names=['index name 1','index name 1'])
df = pd.DataFrame(np.random.randint(10, size=(4,6)),
index=mux,
columns=list('ABCDEF')).rename_axis('col name', axis=1)
print (df)
col name A B C D E F
index name 1 index name 1
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
print (df.index.name)
None
print (df.columns.name)
col name
print (df.index.names)
['index name 1', 'index name 1']
print (df.columns.names)
['col name']
df1 = df.rename_axis(('foo','bar'))
print (df1)
col name A B C D E F
foo bar
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
df2 = df.rename_axis('baz', axis=1)
print (df2)
baz A B C D E F
index name 1 index name 1
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
df2 = df.rename_axis(index=('foo','bar'), columns='baz')
print (df2)
baz A B C D E F
foo bar
Apples a 5 4 0 5 2 2
Oranges b 5 8 2 5 9 9
Puppies c 7 6 0 7 8 3
Ducks d 6 5 0 1 6 0
Removing index and columns names means set it to None:
df2 = df.rename_axis(index=(None,None), columns=None)
print (df2)
A B C D E F
Apples a 6 9 9 5 4 6
Oranges b 2 6 7 4 3 5
Puppies c 6 3 6 3 5 1
Ducks d 4 9 1 3 0 5
For MultiIndex in index and columns is necessary working with .names instead .name and set by list or tuples:
mux1 = pd.MultiIndex.from_arrays([['Apples', 'Oranges', 'Puppies', 'Ducks'],
list('abcd')],
names=['index name 1','index name 1'])
mux2 = pd.MultiIndex.from_product([list('ABC'),
list('XY')],
names=['col name 1','col name 2'])
df = pd.DataFrame(np.random.randint(10, size=(4,6)), index=mux1, columns=mux2)
print (df)
col name 1 A B C
col name 2 X Y X Y X Y
index name 1 index name 1
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
Plural is necessary for check/set values:
print (df.index.name)
None
print (df.columns.name)
None
print (df.index.names)
['index name 1', 'index name 1']
print (df.columns.names)
['col name 1', 'col name 2']
df1 = df.rename_axis(('foo','bar'))
print (df1)
col name 1 A B C
col name 2 X Y X Y X Y
foo bar
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
df2 = df.rename_axis(('baz','bak'), axis=1)
print (df2)
baz A B C
bak X Y X Y X Y
index name 1 index name 1
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
df2 = df.rename_axis(index=('foo','bar'), columns=('baz','bak'))
print (df2)
baz A B C
bak X Y X Y X Y
foo bar
Apples a 2 9 4 7 0 3
Oranges b 9 0 6 0 9 4
Puppies c 2 4 6 1 4 4
Ducks d 6 6 7 1 2 8
Removing index and columns names means set it to None:
df2 = df.rename_axis(index=(None,None), columns=(None,None))
print (df2)
A B C
X Y X Y X Y
Apples a 2 0 2 5 2 0
Oranges b 1 7 5 5 4 8
Puppies c 2 4 6 3 6 5
Ducks d 9 6 3 9 7 0
And #Jeff solution:
df.index.names = ['foo','bar']
df.columns.names = ['baz','bak']
print (df)
baz A B C
bak X Y X Y X Y
foo bar
Apples a 3 4 7 3 3 3
Oranges b 1 2 5 8 1 0
Puppies c 9 6 3 9 6 3
Ducks d 3 2 1 0 1 0
df.index.name should do the trick.
Python has a dir function that let's you query object attributes. dir(df.index) was helpful here.
Use df.index.rename('foo', inplace=True) to set the index name.
Seems this api is available since pandas 0.13.
If you do not want to create a new row but simply put it in the empty cell then use:
df.columns.name = 'foo'
Otherwise use:
df.index.name = 'foo'
Setting the index name can also be accomplished at creation:
pd.DataFrame(data={'age': [10,20,30], 'height': [100, 170, 175]}, index=pd.Series(['a', 'b', 'c'], name='Tag'))
df.columns.values also give us the column names
The solution for multi-indexes is inside jezrael's cyclopedic answer, but it took me a while to find it so I am posting a new answer:
df.index.names gives the names of a multi-index (as a Frozenlist).
To just get the index column names df.index.names will work for both a single Index or MultiIndex as of the most recent version of pandas.
As someone who found this while trying to find the best way to get a list of index names + column names, I would have found this answer useful:
names = list(filter(None, df.index.names + df.columns.values.tolist()))
This works for no index, single column Index, or MultiIndex. It avoids calling reset_index() which has an unnecessary performance hit for such a simple operation. I'm surprised there isn't a built in method for this (that I've come across). I guess I run into needing this more often because I'm shuttling data from databases where the dataframe index maps to a primary/unique key, but is really just another column to me.

Categories