Pandas: renaming columns that have the same name - python

I have a DataFrame with duplicated column names: a, b and b. I would like to rename the second b to c.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "b1": [7, 8, 9]})
df = df.rename(index=str, columns={'b1': 'b'})
I tried this, with no success:
df.rename(index=str, columns={2: "c"})

Try:
>>> df.columns = ['a', 'b', 'c']
>>> df
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9

You can always just manually rename all the columns.
df.columns = ['a', 'b', 'c']

You can simply do:
df.columns = ['a','b','c']

If your columns are ordered and you want lettered columns, don't type names out manually. This is prone to error.
You can use string.ascii_lowercase, assuming you have a maximum of 26 columns:
from string import ascii_lowercase
df = pd.DataFrame(columns=['a', 'b', 'b1'])
df.columns = list(ascii_lowercase[:len(df.columns)])
print(df.columns)
Index(['a', 'b', 'c'], dtype='object')

These solutions don't scale when there are many columns.
Here is a solution that works regardless of the number of columns: every column called 'name' gets its position appended, which makes it unique.
df.columns = [f'name{i}' if col == 'name' else col for i, col in enumerate(df.columns)]
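If only the repeated occurrences should be renamed (leaving the first a and b untouched), a small helper can count how often each name has already been seen. This is a minimal sketch: dedupe_columns is a hypothetical helper name, not a pandas function, and the _1, _2 suffix scheme is an assumption. It does not guard against a generated name colliding with an existing column.

```python
import pandas as pd


def dedupe_columns(columns):
    """Return column names where repeats get a numeric suffix (hypothetical helper)."""
    seen = {}  # name -> number of times it has appeared so far
    result = []
    for name in columns:
        count = seen.get(name, 0)
        # The first occurrence keeps its name; later ones become name_1, name_2, ...
        result.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return result


# A frame that really has two columns called "b".
df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]], columns=["a", "b", "b"])
df.columns = dedupe_columns(df.columns)
print(list(df.columns))  # ['a', 'b', 'b_1']
```

From there, df.rename(columns={'b_1': 'c'}) can target the second column unambiguously.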


How to calculate number of rows between 2 indexes of pandas dataframe

I have the following Pandas dataframe in Python:
import pandas as pd
d = {'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)
df.index=['A', 'B', 'C', 'D', 'E']
df
which gives the following output:
   col1  col2
A     1     6
B     2     7
C     3     8
D     4     9
E     5    10
I need to write a function, say getNrRows(fromIndex), that takes an index value as input and returns the number of rows between that index and the last index of the dataframe.
For instance:
nrRows = getNrRows("C")
print(nrRows)
> 2
Because it takes 2 steps (rows) from the index C to the index E.
How can I write such a function in the most elegant way?
The simplest way might be:
len(df[row_index:]) - 1
For reference, there is also the built-in Index.get_indexer_for, which returns an array of positions:
len(df) - df.index.get_indexer_for(['C']) - 1
Out[179]: array([2], dtype=int64)
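To wrap this into the function the question asks for, one option is Index.get_loc, which returns a plain integer position when the index labels are unique. The sketch below assumes a unique index; get_nr_rows is just a snake_case stand-in for the asker's getNrRows, and it takes the dataframe as an explicit argument.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": [6, 7, 8, 9, 10]},
    index=["A", "B", "C", "D", "E"],
)


def get_nr_rows(frame, from_index):
    """Number of rows after from_index, up to the last row (assumes a unique index)."""
    position = frame.index.get_loc(from_index)  # integer position of the label
    return len(frame) - position - 1


print(get_nr_rows(df, "C"))  # 2
```

A label that is not in the index raises KeyError, which is probably the behaviour you want here.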

How to update a dataframe with values from another dataframe when indexes and columns don't match

I want to update the dataframe df with values from another dataframe df_new where some condition holds.
The indexes and the column names of the two dataframes do not match. How can this be done?
names = ['a', 'b', 'c']
df = pd.DataFrame({
'val': [10, 10, 10],
}, index=names)
new_names = ['a', 'c', 'd']
df_new = pd.DataFrame({
'profile': [5, 15, 22],
}, index=new_names)
above_max = df_new['profile'] >= 7
# This works only if indexes of df and df_new match
#df.loc[above_max, 'val'] = df_new['profile']
# expected df:
# val
# a 10
# b 10
# c 15
One idea is to use Series.reindex to align the values from df_new with the index of df before building the mask:
s = df_new['profile'].reindex(df.index)
above_max = s >= 7
df.loc[above_max, 'val'] = s
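An alternative that avoids introducing NaN for the missing labels is to work only with the labels both frames share. This is a sketch of that idea using Index.intersection on the question's data; the variable names selected and common are my own.

```python
import pandas as pd

df = pd.DataFrame({"val": [10, 10, 10]}, index=["a", "b", "c"])
df_new = pd.DataFrame({"profile": [5, 15, 22]}, index=["a", "c", "d"])

# Labels in df_new that satisfy the condition.
selected = df_new.loc[df_new["profile"] >= 7].index

# Keep only those labels that also exist in df.
common = selected.intersection(df.index)

# Assign the matching values; the assignment aligns on the index labels.
df.loc[common, "val"] = df_new.loc[common, "profile"]
print(df)
```

Here only c is updated, because a fails the condition and d does not exist in df.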

How to replace specific character in pandas column with null?

I have a column in a dataset with categorical company sizes, which currently looks like this, where the '-' hyphens represent missing data:
I want to replace the '-' missing-value markers with nulls so I can analyse the missing data. However, when I use pandas replace (see the following code) with a None value, it also seems to null out genuine entries, since they contain hyphens too (e.g. 51-200).
df['Company Size'].replace({'-': None},inplace =True, regex= True)
How can I replace only lone standing hyphens and leave the other entries untouched?
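You don't need regex=True. Without it, only cells that are exactly '-' are replaced:
df['Company Size'] = df['Company Size'].replace({'-': None})
You could also just do the following, though note that it inserts the string 'None' rather than a null value:
df['column_name'] = df['column_name'].replace('-','None')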
import numpy as np
df.replace('-', np.nan, inplace=True)
This code worked for me.
You can do it like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
'B': [5, 6, 7, 8, 9],
'C': ['a', '-', 'c--', 'd', 'e']})
df['C'] = df['C'].replace('-', np.nan)
df = df.where((pd.notnull(df)), None)
# can also use this -> df['C'] = df['C'].where(pd.notnull(df['C']), None)
print(df)
Output:
   A  B     C
0  0  5     a
1  1  6  None
2  2  7   c--
3  3  8     d
4  4  9     e
Another example:
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
'B': ['5-5', '-', 7, 8, 9],
'C': ['a', 'b', 'c--', 'd', 'e']})
df['B'] = df['B'].replace('-', np.nan)
df = df.where((pd.notnull(df)), None)
print(df)
Output:
   A     B    C
0  0   5-5    a
1  1  None    b
2  2     7  c--
3  3     8    d
4  4     9    e
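If you do want to keep regex=True, for example because some cells contain a hyphen surrounded by stray whitespace, the pattern has to be anchored so that it matches the whole cell rather than any hyphen inside it. A minimal sketch with made-up sample values (the whitespace case is an assumption, not something shown in the question):

```python
import numpy as np
import pandas as pd

# Sample data: real ranges contain hyphens, lone hyphens mean "missing".
sizes = pd.Series(["51-200", "-", "1-10", " - ", "1000+"], name="Company Size")

# ^ and $ anchor the match to the entire cell; \s* tolerates surrounding spaces.
cleaned = sizes.replace(r"^\s*-\s*$", np.nan, regex=True)

print(cleaned.isna().tolist())  # [False, True, False, True, False]
```

The unanchored pattern '-' matches every cell containing a hyphen, and replacing a match with a null nulls the whole cell, which is why 51-200 disappeared in the original attempt.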

How to add a value to specific columns of a pandas dataframe?

I have to perform the same arithmetic operation on specific columns of a pandas DataFrame. I currently do it like this:
c.loc[:,'col3'] += cons
c.loc[:,'col5'] += cons
c.loc[:,'col6'] += cons
There should be a simpler way to do all of these in one operation, i.e. to update col3, col5 and col6 in one command.
pd.DataFrame.loc label indexing accepts lists:
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
df.loc[:, ['B', 'C']] += 10
print(df)
   A   B   C
0  1  12  13
1  4  15  16
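Applied to the names from the question, that could look like the sketch below. The sample values, the extra col4 column and cons = 10 are invented for illustration.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
c = pd.DataFrame({"col3": [1, 2], "col4": [3, 4], "col5": [5, 6], "col6": [7, 8]})
cons = 10

# Collect the target columns once, then update them in a single statement.
cols = ["col3", "col5", "col6"]
c.loc[:, cols] += cons

print(c)
```

col4 is left untouched because it is not in the list.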

Select columns of pandas dataframe using a dictionary list value

I have column names in a dictionary and would like to select those columns from a dataframe.
In the example below, how do I select the columns listed in the dictionary value ('b' and 'c') and save them into df_out?
import pandas as pd
ds = {'cols': ['b', 'c']}
d = {'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}
df_in = pd.DataFrame(data=d)
print(ds)
print(df_in)
df_out = df_in[[ds['cols']]]
print(df_out)
TypeError: unhashable type: 'list'
Remove the extra pair of brackets; ds['cols'] is already a list:
df_out = df_in[ds['cols']]
print(df_out)
   b  c
0  3  4
1  4  5
According to the reference, you just need to drop one set of brackets:
df_out = df_in[ds['cols']]
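For completeness, here is a self-contained version of the fix, plus an optional guard for the case where the dictionary names a column that is not in the dataframe. The guard and the extra name 'z' are my own additions, not part of the question.

```python
import pandas as pd

ds = {"cols": ["b", "c"]}
df_in = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]})

# ds["cols"] is already a list, so it can be passed to [] directly.
df_out = df_in[ds["cols"]]

# Optional guard (an assumption): keep only requested names that exist,
# so a stray name such as "z" does not raise a KeyError.
requested = ds["cols"] + ["z"]
wanted = [col for col in requested if col in df_in.columns]
df_safe = df_in[wanted]

print(df_out)
```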
