Set a string as index of pandas DataFrame - python

Given the DataFrame df:
df = pd.DataFrame([1,2,3,4])
print(df)
0
0 1
1 2
2 3
3 4
I would like to modify it to:
print(df)
0
A 1
A 2
A 3
A 4

In this specific case you can use:
df.index = ['A'] * len(df)

Use set_index
In [797]: df.set_index([['A']*len(df)], inplace=True)
In [798]: df
Out[798]:
0
A 1
A 2
A 3
A 4

You can also set the index when you create the df:
df = pd.DataFrame([1,2,3,4],index=['A']*4)
df
Out[325]:
0
A 1
A 2
A 3
A 4
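A self-contained sketch that puts the three approaches above side by side and checks that they produce the same DataFrame. It assumes only pandas. The resulting index has duplicate labels, so selecting with df.loc['A'] returns every row rather than a single one.

```python
import pandas as pd

# Approach 1: assign a list of labels directly to the index
df1 = pd.DataFrame([1, 2, 3, 4])
df1.index = ['A'] * len(df1)

# Approach 2: set_index with a list containing one array of labels
# (returns a new DataFrame instead of using inplace=True)
df2 = pd.DataFrame([1, 2, 3, 4])
df2 = df2.set_index([['A'] * len(df2)])

# Approach 3: pass the index at construction time
df3 = pd.DataFrame([1, 2, 3, 4], index=['A'] * 4)

print(df1)
# The index contains duplicate labels, so .loc['A'] returns all four rows
print(df1.loc['A'])
```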


df.rename does not alter df column names, but df.columns and df.set_axis do (Pandas)

I have a pandas DataFrame whose columns I want to rename.
When I run:
df.rename(columns={0:"C", 1:"D"}, inplace=True)
Nothing changes; the columns keep their original names.
But if I do:
df.columns = ["C", "D"]
or
df = df.set_axis(["C", "D"], axis=1)
It works.
Why doesn't df.rename work?
NOTE: I specifically want to rename the first and second columns regardless of their names; in my case the names may change, so I can't specify them.
Example:
df = pd.DataFrame({"A": pd.Series(range(0,2)),"B": pd.Series(range(2,4))})
df
A B
0 0 2
1 1 3
df = pd.DataFrame({"A": pd.Series(range(0,2)),"B": pd.Series(range(2,4))})
df.rename(columns={0:"C", 1:"D"}, inplace=True)
df
A B
0 0 2
1 1 3
df = pd.DataFrame({"A": pd.Series(range(0,2)),"B": pd.Series(range(2,4))})
df.columns = ["C", "D"]
df
C D
0 0 2
1 1 3
df = pd.DataFrame({"A": pd.Series(range(0,2)),"B": pd.Series(range(2,4))})
df = df.set_axis(["C", "D"], axis=1)
df
C D
0 0 2
1 1 3
EDIT:
My original DataFrame had the column names 0 and 1, which is why df.rename(columns={0:"C", 1:"D"}, inplace=True) worked there.
Example:
df = pd.DataFrame([range(2,4), range(4,6)])
df
0 1
0 2 3
1 4 5
df.rename(columns={0:"C", 1:"D"}, inplace=True)
df
C D
0 2 3
1 4 5
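If you don't want to rename by the old names, you could zip the current column labels with the new names; zip stops at the shorter sequence, so only as many columns are renamed as you pass new names for.
If you're using Python 3.7+, dict insertion order is preserved.
Also, don't use inplace=True; assign the result back instead.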
print(df)
A B
0 0 2
1 1 3
df.rename(columns=dict(zip(df.columns, ['C','E'])))
C E
0 0 2
1 1 3
df.rename(columns=dict(zip(df.columns, ['E'])))
E B
0 0 2
1 1 3
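A sketch of why rename appeared to do nothing: rename matches the mapping keys against the existing labels and, by default, silently ignores keys it cannot find. Passing errors='raise' (an existing DataFrame.rename parameter) turns that silent mismatch into a KeyError. The data below mirrors the question's example; renaming by position is done by building the mapping from df.columns.

```python
import pandas as pd

df = pd.DataFrame({"A": range(0, 2), "B": range(2, 4)})

# rename matches by label: 0 and 1 are not column labels here, so nothing changes
unchanged = df.rename(columns={0: "C", 1: "D"})
print(unchanged.columns.tolist())

# errors="raise" makes the label mismatch visible
raised = False
try:
    df.rename(columns={0: "C", 1: "D"}, errors="raise")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Rename the first two columns by position, whatever their current names are
by_position = df.rename(columns=dict(zip(df.columns[:2], ["C", "D"])))
print(by_position.columns.tolist())

# set_axis without inplace returns a new DataFrame with every label replaced
via_set_axis = df.set_axis(["C", "D"], axis=1)
print(via_set_axis.columns.tolist())
```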

join two overlapping dataframes vertically

I am trying to update df1 with df2:
add new rows from df2 to df1
update existing rows (if the row index exists)
df1 = pd.DataFrame([[1,3],[2,4]], index=[1,2], columns=['a','b'])
df2 = pd.DataFrame([[1,0],[2,3]], index=[3,2], columns=['a','b'])
The expected result should be
a b
1 1 3
2 2 3
3 1 0
but
pd.concat([df1, df2]).drop_duplicates(keep='last') # drop_duplicates has no effect
gives a simple vertical stack
a b
1 1 3
2 2 4
3 1 0
2 2 3
df1.merge(df2, how='outer')
gives the same values and destroys the row index
a b
0 1 3
1 2 4
2 1 0
3 2 3
df1.join(df2)
df1.loc[df2.index] = df1.values
both raise errors.
Try this:
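new_df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
# keep only the last occurrence of each index label, so rows from df2 win
new_df = new_df[~new_df.index.duplicated(keep='last')]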
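A self-contained sketch of the answer above, plus DataFrame.combine_first as an alternative that gives the same rows here. The frames are built so that the expected result is 1: (1, 3), 2: (2, 3), 3: (1, 0). combine_first prefers the caller's values and falls back to the other frame, and its index is the union of both; depending on the pandas version, integer columns may come back as float because of the alignment step.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 3], [2, 4]], index=[1, 2], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 0], [2, 3]], index=[3, 2], columns=['a', 'b'])

# Approach 1: stack vertically, then keep the last row for each index label,
# so a row from df2 replaces the df1 row with the same label
stacked = pd.concat([df1, df2])
upserted = stacked[~stacked.index.duplicated(keep='last')].sort_index()

# Approach 2: take df2's values where df2 has the row, otherwise df1's
combined = df2.combine_first(df1).sort_index()

print(upserted)
print(combined)
```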

map DataFrame index and forward fill nan values

I have a DataFrame with integer indexes that are missing some values (i.e. not equally spaced), I want to create a new DataFrame with equally spaced index values and forward fill column values. Below is a simple example:
What I have:
import pandas as pd
df = pd.DataFrame(['A', 'B', 'C'], index=[0, 2, 4])
0
0 A
2 B
4 C
What I want is to use the above to create:
0
0 A
1 A
2 B
3 B
4 C
Use reindex with method='ffill':
import numpy as np
df = df.reindex(np.arange(0, df.index.max() + 1), method='ffill')
Or:
df = df.reindex(np.arange(df.index.min(), df.index.max() + 1), method='ffill')
print (df)
0
0 A
1 A
2 B
3 B
4 C
Using reindex and ffill:
df = df.reindex(range(df.index[0],df.index[-1]+1)).ffill()
print(df)
0
0 A
1 A
2 B
3 B
4 C
You can do this:
In [319]: df.reindex(list(range(df.index.min(),df.index.max()+1))).ffill()
Out[319]:
0
0 A
1 A
2 B
3 B
4 C
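The answers above differ in one detail that shows up only when the original data already contains missing values. A sketch, assuming a hypothetical variant of the question's data with a missing value at label 2: reindex(..., method='ffill') copies the row of the previous existing label as-is (missing value included), while reindex(...).ffill() propagates the last non-missing value and therefore also fills the gap that was already in the data.

```python
import pandas as pd

# Hypothetical data: like the question's, but label 2 holds a missing value
df = pd.DataFrame({'x': ['A', None, 'C']}, index=[0, 2, 4])
full_range = range(df.index.min(), df.index.max() + 1)

# method='ffill' fills new labels from the previous existing label,
# so label 3 inherits the missing value stored at label 2
via_method = df.reindex(full_range, method='ffill')

# .ffill() after reindexing fills every missing value, old or new,
# with the last non-missing value above it
via_ffill = df.reindex(full_range).ffill()

print(via_method)
print(via_ffill)
```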

Pandas Python: how to create multiple columns from a list

I have a list of columns to create:
new_cols = ['new_1', 'new_2', 'new_3']
I want to create these columns in a DataFrame and fill them with zeros:
df[new_cols] = 0
I get the error:
"['new_1', 'new_2', 'new_3'] not in index"
which is true but unfortunate, as I want to create them...
EDIT: This is a duplicate of this question: Add multiple empty columns to pandas DataFrame. However, I'm keeping this one too, because the accepted answer here was the simple solution I was looking for, and it was not the accepted answer there.
EDIT 2: While the accepted answer is the simplest, interesting one-liner solutions were posted below.
You need to add the columns one by one.
for col in new_cols:
    df[col] = 0
Also see the answers in here for other methods.
Use assign with a dictionary:
df = pd.DataFrame({
'A': ['a','a','a','a','b','b','b','c','d'],
'B': list(range(9))
})
print (df)
A B
0 a 0
1 a 1
2 a 2
3 a 3
4 b 4
5 b 5
6 b 6
7 c 7
8 d 8
new_cols = ['new_1', 'new_2', 'new_3']
df = df.assign(**dict.fromkeys(new_cols, 0))
print (df)
A B new_1 new_2 new_3
0 a 0 0 0 0
1 a 1 0 0 0
2 a 2 0 0 0
3 a 3 0 0 0
4 b 4 0 0 0
5 b 5 0 0 0
6 b 6 0 0 0
7 c 7 0 0 0
8 d 8 0 0 0
import pandas as pd
new_cols = ['new_1', 'new_2', 'new_3']
df = pd.DataFrame.from_records([(0, 0, 0)], columns=new_cols)
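Is this what you're looking for? Note that this builds a new one-row DataFrame containing only the new columns; it does not add them to an existing df.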
You can use assign:
new_cols = ['new_1', 'new_2', 'new_3']
values = [0, 0, 0] # could be anything, also pd.Series
df = df.assign(**dict(zip(new_cols, values)))
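A self-contained sketch of the assign approach with a different value per new column. The DataFrame and the values are made up for illustration: scalars are broadcast to every row, and a Series is aligned on the index.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c']})

new_cols = ['new_1', 'new_2', 'new_3']
# One value per new column (hypothetical values)
values = [0, 'x', pd.Series([10, 20, 30], index=df.index)]

# assign returns a new DataFrame; the keyword names come from new_cols
df = df.assign(**dict(zip(new_cols, values)))
print(df)
```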
Try looping through the column names and creating each column:
for col in new_cols:
    df[col] = 0
We can use the apply method to go through the values of a column and assign each element to a new column.
For instance, suppose the DataFrame has a column named keys whose cells hold lists such as
[10,20,30]
In your case, since the values are all 0, we can assign them directly instead of looping. But if we have actual values, we can populate the columns as below:
...
df['new_01']=df['keys'].apply(lambda x: x[0])
df['new_02']=df['keys'].apply(lambda x: x[1])
df['new_03']=df['keys'].apply(lambda x: x[2])
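A self-contained sketch of the apply idea above, assuming a hypothetical keys column whose cells are three-element lists. It also shows a common alternative that builds all the new columns at once from df['keys'].tolist() and joins them back on the index.

```python
from operator import itemgetter

import pandas as pd

# Hypothetical data: each cell of 'keys' holds a three-element list
df = pd.DataFrame({'keys': [[10, 20, 30], [40, 50, 60]]})

# apply: pull one list position into each new column
for i, col in enumerate(['new_01', 'new_02', 'new_03']):
    df[col] = df['keys'].apply(itemgetter(i))

# Alternative: expand every list into columns in one step, keeping df's index
expanded = pd.DataFrame(df['keys'].tolist(), index=df.index,
                        columns=['alt_01', 'alt_02', 'alt_03'])
df = df.join(expanded)
print(df)
```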

concat two dataframe using python

We have one dataframe like
-0.140447131 0.124802527 0.140780106
0.062166349 -0.121484447 -0.140675515
-0.002989106 0.13984927 0.004382326
and the other as
1
1
2
We need to concatenate both DataFrames like this:
-0.140447131 0.124802527 0.140780106 1
0.062166349 -0.121484447 -0.140675515 1
-0.002989106 0.13984927 0.004382326 2
Let's say your first dataframe is like
In [281]: df1
Out[281]:
a b c
0 -0.140447 0.124803 0.140780
1 0.062166 -0.121484 -0.140676
2 -0.002989 0.139849 0.004382
And the second like this:
In [283]: df2
Out[283]:
d
0 1
1 1
2 2
Then you could create a new column in df1 from df2:
In [284]: df1['d_new'] = df2['d']
In [285]: df1
Out[285]:
a b c d_new
0 -0.140447 0.124803 0.140780 1
1 0.062166 -0.121484 -0.140676 1
2 -0.002989 0.139849 0.004382 2
This assumes, however, that both DataFrames share a common index.
Use pd.concat and specify axis=1 to concatenate column-wise (side by side):
df_new = pd.concat([df1, df2], axis=1)
>>> df_new
0 1 2 0
0 -0.140447 0.124803 0.140780 1
1 0.062166 -0.121484 -0.140676 1
2 -0.002989 0.139849 0.004382 2
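Both answers pair rows by index label, not by position. A sketch, assuming made-up data with the same number of rows but different indexes, of what label alignment does in that case, and one way to pair rows by position instead by dropping both indexes first.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [-0.14, 0.06, -0.003]}, index=[0, 1, 2])
# Same number of rows, but a different index
df2 = pd.DataFrame({'d': [1, 1, 2]}, index=[10, 11, 12])

# Label alignment: no labels in common, so the result has 6 rows padded with NaN
aligned = pd.concat([df1, df2], axis=1)
print(aligned.shape)

# Positional pairing: reset both indexes to 0..n-1 before concatenating
by_position = pd.concat([df1.reset_index(drop=True),
                         df2.reset_index(drop=True)], axis=1)
print(by_position)
```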
