Hi, I would like to rename only some of the columns in my dataframe.
When I print just the new names for that part with palColAdj.iloc[:, 73:].columns.str[:-2], I see the result I want, but when I try to assign it back to my original dataframe, nothing changes.
So if I write either
palColAdj.iloc[:, 73:].columns=palColAdj.iloc[:, 73:].columns.str[:-2]
or
prodColAdj.iloc[:, 39:].columns=prodColAdj.iloc[:, 39:].columns.str[:-2].to_list()
and afterwards I print
prodColAdj.head()
I still see the original column names. How can this be?
Here's a way to do it.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['aaaa', 'bbbb', 'ccccc'])
# aaaa bbbb ccccc
# 0 1 2 3
# 1 4 5 6
# 2 7 8 9
cols = df.columns.values
# Map each old name to the same name without its last two characters
new_names = {}
for col in cols:
    new_names[col] = col[:-2]
df.rename(new_names, axis=1, inplace=True)
# aa bb ccc
# 0 1 2 3
# 1 4 5 6
# 2 7 8 9
To rename only specific columns, select them first:
cols = df.columns.values[0:2]
# array(['aaaa', 'bbbb'], dtype=object)
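As for why the original attempts change nothing: palColAdj.iloc[:, 73:] builds a new, temporary DataFrame, so assigning to its .columns never reaches the original. Here is a minimal loop-free sketch of the same idea as above. The frame and the `start` position are made up for illustration (in the question it would be 73 or 39).

```python
import pandas as pd

# Hypothetical small frame; only the columns from position `start` onward
# should lose their last two characters.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["id", "x_1", "y_1"])
start = 1  # assumed cut-off position, chosen for this example

# Build the complete list of names, then assign it to df.columns directly.
new_cols = df.columns.to_list()
new_cols[start:] = df.columns[start:].str[:-2]
df.columns = new_cols

print(df.columns.to_list())
```

Assigning to df.columns works because it targets the original object rather than a selection taken from it.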
I'd like to broadcast (expand) a dataframe column-wise from a smaller set of column labels to a larger one, based on a mapping specification. I have the following example; please excuse small mistakes, as it is untested.
import pandas as pd
# my broadcasting mapper spec
mapper = pd.Series(data=['a', 'b', 'c'], index=[1, 2, 2])
# my data
df = pd.DataFrame(data={1: [3, 4], 2: [5, 6]})
print(df)
# 1 2
# --------
# 0 3 5
# 1 4 6
# and I would like to get
df2 = ...
print(df2)
# a b c
# -----------
# 0 3 5 5
# 1 4 6 6
Simply mapping the columns will not work because the mapper index contains duplicates. Instead, I would like to expand to the new values as defined in mapper:
# this will of course not work => raises InvalidIndexError
df.columns = df.columns.to_series().map(mapper)
A naive approach would just iterate over the spec ...
df2 = pd.DataFrame(index=df.index)
for i, v in mapper.items():
    df2[v] = df[i]
Use reindex and set_axis:
out = df.reindex(columns=mapper.index).set_axis(mapper, axis=1)
Output:
a b c
0 3 5 5
1 4 6 6
You can use pd.concat + df.get:
pd.concat({v:df.get(k) for k,v in mapper.items()},axis=1)
a b c
0 3 5 5
1 4 6 6
Consider the following data frame:
import pandas as pd
df = pd.DataFrame([[1, -2, 3, -5, 4 ,2 ,7 ,-8 ,2], [2, -4, 6, 7, -8, 9, 5, 3, 2], [2, 4, 6, 7, 8, 9, 5, 3, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9]]).transpose()
df.columns = ["A", "B", "C", "D"]
A B C D
0 1 2 2 1
1 -2 -4 4 2
2 3 6 6 3
3 -5 7 7 4
4 4 -8 8 5
5 2 9 9 6
6 7 5 5 7
7 -8 3 3 8
8 2 2 2 9
I want to append "pos" to the column name if the column contains only positive values. My attempt is:
pos_idx = df.loc[:, (df>0).all()].columns
df[pos_idx].columns = df[pos_idx].columns + "pos"
However, this does not seem to work: it raises no error, but it does not change the column names. What is interesting is that this code:
df.columns = df.columns + "anything"
actually does append the word "anything" to the column names. Could you please explain why this happens (it works in the general case, but not on the indexed selection), and how to do this correctly?
You are assigning the new column names to a copy of the dataframe. The statement below does not overwrite the column names of df, only those of the temporary selection df[pos_idx]:
df[pos_idx].columns = df[pos_idx].columns + "pos"
Your second code example assigns to df.columns directly, which is why that one works.
How to make it work? Build the full list of column names separately, then assign it to df.columns directly.
How to build the full list? Add "_pos" as a suffix to every column that has no values <= 0.
my_col_list = [col+(count==0)*"_pos" for col, count in (df <= 0).sum().to_dict().items()]
df.columns = my_col_list
First of all, use the .rename() method to change column names.
To add 'pos' to the columns that contain only positive values, you can use this:
renamed_columns = {i: i + ' pos' for i in df.columns if df[i].min() > 0}
df.rename(columns=renamed_columns, inplace=True)
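For completeness, a minimal sketch that combines both answers: compute the "all positive" mask once, then pass a function to rename so that df itself gets the new labels. The shortened data and the name `all_positive` are only for illustration.

```python
import pandas as pd

# Shortened version of the question's data (first three rows only).
df = pd.DataFrame({
    "A": [1, -2, 3],
    "B": [2, -4, 6],
    "C": [2, 4, 6],
    "D": [1, 2, 3],
})

# Boolean Series indexed by column name: True where every value is > 0.
all_positive = (df > 0).all()

# rename returns a new frame; reassigning it to df replaces the old labels.
df = df.rename(columns=lambda c: f"{c}pos" if all_positive[c] else c)
print(df.columns.to_list())
```

Unlike df[pos_idx].columns = ..., this assigns the result back to df instead of to a temporary selection.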
I created a list holding the mean of two other columns; the list has the same length as the number of rows in the dataframe. But when I try to add that list as a column, the entire list gets assigned to each row instead of only the corresponding value.
glucose_mean = []
for i in range(len(df)):
    mean = (df['h1_glucose_max']+df['h1_glucose_min'])/2
    glucose_mean.append(mean)
df['glucose'] = glucose_mean
[Screenshot: data after adding list]
I think you overcomplicated it. You don't need a for-loop, just one line:
df['glucose'] = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2
EDIT:
If you want to work with every row separately, you can use .apply():
def func(row):
    return (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
df['glucose'] = df.apply(func, axis=1)
And if you really need a for-loop, you can use .iterrows() (or similar functions):
glucose_mean = []
for index, row in df.iterrows():
    mean = (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
    glucose_mean.append(mean)
df['glucose'] = glucose_mean
Minimal working example:
import pandas as pd
data = {
    'h1_glucose_min': [1, 2, 3],
    'h1_glucose_max': [4, 5, 6],
}
df = pd.DataFrame(data)
# - version 1 -
df['glucose_1'] = (df['h1_glucose_max'] + df['h1_glucose_min']) / 2
# - version 2 -
def func(row):
    return (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
df['glucose_2'] = df.apply(func, axis=1)
# - version 3 -
glucose_mean = []
for index, row in df.iterrows():
    mean = (row['h1_glucose_max'] + row['h1_glucose_min']) / 2
    glucose_mean.append(mean)
df['glucose_3'] = glucose_mean
print(df)
You do not need to iterate over your frame. Use this instead (example for a pseudo data frame):
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8], 'col2': [10, 9, 8, 7, 6, 5, 4, 100]})
df['mean_col1_col2'] = df[['col1', 'col2']].mean(axis=1)
df
-----------------------------------
col1 col2 mean_col1_col2
0 1 10 5.5
1 2 9 5.5
2 3 8 5.5
3 4 7 5.5
4 5 6 5.5
5 6 5 5.5
6 7 4 5.5
7 8 100 54.0
-----------------------------------
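One difference between the two vectorized versions is worth knowing if the glucose columns can contain missing values: plain arithmetic propagates NaN, while .mean(axis=1) skips NaN by default. A small sketch with made-up values (the NaNs are my assumption, the question does not say whether the data has gaps):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "h1_glucose_min": [1.0, 2.0, np.nan],
    "h1_glucose_max": [4.0, np.nan, 6.0],
})

# Arithmetic: any NaN operand makes the result NaN.
arith = (df["h1_glucose_max"] + df["h1_glucose_min"]) / 2

# Row-wise mean: NaN is skipped (skipna=True is the default).
row_mean = df[["h1_glucose_max", "h1_glucose_min"]].mean(axis=1)

print(pd.DataFrame({"arith": arith, "row_mean": row_mean}))
```

Which behaviour you want depends on whether a mean based on a single reading is meaningful for your data.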
As you can see in the following example, your code appends the entire column on every iteration of the for loop, so when you assign the glucose_mean list as a column, each element is a whole Series instead of a single value:
import pandas as pd
df = pd.DataFrame({'col1':[1, 2, 3, 4], 'col2':[2, 3, 4, 5]})
glucose_mean = []
for i in range(len(df)):
    glucose_mean.append(df['col1'])
print(glucose_mean[0])
df['col2'] = [5, 6, 7, 8]
print(df)
Output:
0 1
1 2
2 3
3 4
Name: col1, dtype: int64
col1 col2
0 1 5
1 2 6
2 3 7
3 4 8
I have two DataFrames that look like this (note: I am still a beginner and trying to learn joins better)
xx = pd.DataFrame(np.array([[13, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
yy = pd.DataFrame(np.array([[1, 2, 3,5], [4, 5, 6,5], [7, 8, 9,5]]),
columns=['aa', 'bb', 'cc','dd'])
I want to perform a left join so that I end up with a final table that looks like this
aa bb cc dd
4 5 6 6
7 8 9 5
I have come up with this so far
zz = pd.merge(yy,xx, how = 'left', left_on= ['aa','bb'], right_on=['a','b'])
But this gives me incorrect output.
Can you please help me with what I need to correct in order to get the desired output?
Any help will be very much appreciated.
Based on the expected output, you need an inner join, not a left join. Also, to merge with on=, the key columns must have the same names in both DataFrames (otherwise you would pass left_on and right_on), so I've renamed the columns of xx to match those in yy:
>>> xx.columns = ['aa', 'bb', 'cc']
>>> pd.merge(yy, xx, how='inner', on=['aa', 'bb', 'cc'])
aa bb cc dd
0 4 5 6 5
1 7 8 9 5
And this would be the output of a left join of yy with xx:
>>> pd.merge(yy, xx, how='left', on=['aa', 'bb', 'cc'])
aa bb cc dd
0 1 2 3 5
1 4 5 6 5
2 7 8 9 5
You need the dataframes to have matching column headers, so you can rename the columns of one dataframe on the fly before merging:
zz = pd.merge(xx.rename(columns={"a": "aa", "b": "bb", "c": "cc"}), yy)
zz
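If you would rather not rename anything, here is a sketch of the same inner join using left_on/right_on, followed by dropping the key columns that came from xx. Keeping only yy's columns at the end is my reading of the desired output, not something stated in the question.

```python
import numpy as np
import pandas as pd

xx = pd.DataFrame(np.array([[13, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=["a", "b", "c"])
yy = pd.DataFrame(np.array([[1, 2, 3, 5], [4, 5, 6, 5], [7, 8, 9, 5]]),
                  columns=["aa", "bb", "cc", "dd"])

# Inner join: keep only rows of yy whose (aa, bb, cc) appear in xx as (a, b, c).
zz = pd.merge(yy, xx, how="inner",
              left_on=["aa", "bb", "cc"], right_on=["a", "b", "c"])

# The merge result also carries a, b, c; keep just the columns of yy.
zz = zz[yy.columns]
print(zz)
```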
I have a DataFrame in which I want to swap the values of two columns in only the third row, while keeping the other rows the same.
In my project I have to swap them under some condition; here is a toy example that probably has no real meaning.
Suppose the dataset is
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
'B': [5, 6, 7, 8, 9],
'C': ['a', 'b', 'c', 'd', 'e']})
df
out[1]:
A B C
0 0 5 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e
I want to have the output:
A B C
0 0 5 a
1 1 6 b
2 7 2 c
3 3 8 d
4 4 9 e
How do I do it?
I have tried:
new_order = [1, 0, 2] # specify new order of the third row
i = 2 # specify row number
df.iloc[i] = df[df.columns[new_order]].loc[i] # reorder the third row only and assign new values to df
I observed from the output of the right-hand side that the columns are reordering as I wanted:
df[df.columns[new_order]].loc[i]
Out[2]:
B 7
A 2
C c
Name: 2, dtype: object
But when assigned to df again, it did nothing. I guess it's because of the name matching.
Can someone help me? Thanks in advance!
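Your guess about name matching looks right to me: when the right-hand side of an assignment is a Series, pandas aligns it to the target by label, so the reordered values go straight back to their original columns. A minimal sketch of one way around it, assuming you only need to swap A and B in that row: convert the right-hand side to a plain array so the values are assigned by position.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3, 4],
                   "B": [5, 6, 7, 8, 9],
                   "C": ["a", "b", "c", "d", "e"]})

i = 2  # row label of the third row

# .to_numpy() drops the column labels, so no label alignment takes place
# and the values land in the listed columns in positional order.
df.loc[i, ["A", "B"]] = df.loc[i, ["B", "A"]].to_numpy()
print(df)
```

Selecting only the two numeric columns also avoids mixing the string column C into the swapped values.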