Given a dataframe df like this:
     Col1  Col2
Key
A       4    10
B       7    10
C       3     9
My desired data frame is
       A   B  C
Col1   4   7  3
Col2  10  10  9
Where Col1 and Col2 are the indices.
How would I specify this? I've tried:
In [419]: mydf.T.reset_index(drop=True)
Out[419]:
Key   A   B  C
0     4   7  3
1    10  10  9
But for some reason, the Key remains. I'm not sure what it is, and I'm not sure how to get rid of it. I've also tried mydf.T.reset_index().set_index('index') but it is very unsightly.
We can use DataFrame.rename_axis() here. After transposing, Key is the name of the columns axis, and rename_axis(None, axis=1) clears it:
In [24]: df.T.rename_axis(None, axis=1)
Out[24]:
       A   B  C
Col1   4   7  3
Col2  10  10  9
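To make it clearer where the stray Key comes from, here is a small self-contained sketch. It rebuilds the frame from the question (the construction itself is my assumption about how the data was created) and shows that Key is just the name of the columns axis after the transpose.

```python
import pandas as pd

# Rebuild the frame from the question; the index is named "Key".
df = pd.DataFrame(
    {"Col1": [4, 7, 3], "Col2": [10, 10, 9]},
    index=pd.Index(["A", "B", "C"], name="Key"),
)

transposed = df.T
# After transposing, the old index name becomes the name of the columns axis.
print(transposed.columns.name)

# rename_axis(None, axis=1) clears that name and leaves the data untouched.
result = transposed.rename_axis(None, axis=1)
print(result)
```

Note that reset_index(drop=True) throws away the Col1/Col2 labels, which is why the attempt in the question ended up with 0 and 1 as the index.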
I am creating a tool to automate some tasks. These tasks generate two DataFrames, but when concatenating them the columns are messed up as follows:
  col2  col4  col3 col1
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F
But I need to rearrange them so that they look like this:
  col1 col2  col3  col4
0    a    A     0     2
1    B    A     1     1
2    c    B     9     9
3    D  NaN     4     8
4    e    D     2     7
5    F    C     3     4
Can someone help me?
I tried sort_values, but it didn't work, and I can't find any other way to solve the problem.
Use the following code:
df.sort_index(axis=1)
You can do either of the following (sort the column names, or list them explicitly):
df = df[sorted(df.columns.tolist())].copy()
df = df[['col1', 'col2', 'col3', 'col4']]
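For reference, here is a sketch that puts the approaches above side by side on the frame from the question. The reindex variant is an extra option I am adding, not something from the answers above. One caveat: sorting is lexicographic, so a column named col10 would sort before col2; in that case the explicit list is the safer choice.

```python
import pandas as pd

# The concatenated frame from the question, with columns out of order.
df = pd.DataFrame(
    {
        "col2": ["A", "A", "B", None, "D", "C"],
        "col4": [2, 1, 9, 8, 7, 4],
        "col3": [0, 1, 9, 4, 2, 3],
        "col1": ["a", "B", "c", "D", "e", "F"],
    }
)

# Option 1: sort the column labels alphabetically.
by_name = df.sort_index(axis=1)

# Option 2: spell out the order you want.
explicit = df[["col1", "col2", "col3", "col4"]]

# Option 3: reindex on the columns axis with a sorted list of labels.
reindexed = df.reindex(columns=sorted(df.columns))

print(by_name)
```

sort_values did not help because it orders rows by their values; it does not reorder the columns.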
I want to replace all rows that have "A" in the name column with a single row from another df.
I have this:
data={"col1":[2,3,4,5,7],
"col2":[4,2,4,6,4],
"col3":[7,6,9,11,2],
"col4":[14,11,22,8,5],
"name":["A","A","V","A","B"],
"n_roll":[8,2,1,3,9]}
df=pd.DataFrame.from_dict(data)
df
This is my single row (the other df):
data2={"col1":[0]
,"col2":[1]
,"col3":[5]
,"col4":[6]
}
df2=pd.DataFrame.from_dict(data2)
df2
This is how I want it to look:
data={"col1":[0,0,4,0,7],
"col2":[1,1,4,1,4],
"col3":[5,5,9,5,2],
"col4":[6,6,22,6,5],
"name":["A","A","V","A","B"],
"n_roll":[8,2,1,3,9]}
df=pd.DataFrame.from_dict(data)
df
I tried df.loc[df["name"]=="A"][df2.columns]=df2
but it did not work.
We can try mask + combine_first:
df = df.mask(df['name'].eq('A'), df2.loc[0], axis=1).combine_first(df)
df
col1 col2 col3 col4 name n_roll
0 0 1 5 6 A 8.0
1 0 1 5 6 A 2.0
2 4 4 9 22 V 1.0
3 0 1 5 6 A 3.0
4 7 4 2 5 B 9.0
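df.loc[df["name"]=="A"][df2.columns]=df2 is chained indexing and is not expected to work: the assignment is applied to a temporary object returned by the first indexing step, not to df itself. For details, see the pandas documentation on returning a view versus a copy.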
You can also use boolean indexing like this:
df.loc[df['name']=='A', df2.columns] = df2.values
Output:
col1 col2 col3 col4 name n_roll
0 0 1 5 6 A 8
1 0 1 5 6 A 2
2 4 4 9 22 V 1
3 0 1 5 6 A 3
4 7 4 2 5 B 9
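Here is the boolean-indexing approach as a complete, runnable sketch using the data from the question. The variable name mask is mine. The important part is that the row selector and the column selector go into one .loc call, and that the replacement row is passed as a plain array so pandas does not try to align it on df2's index label 0.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [2, 3, 4, 5, 7],
        "col2": [4, 2, 4, 6, 4],
        "col3": [7, 6, 9, 11, 2],
        "col4": [14, 11, 22, 8, 5],
        "name": ["A", "A", "V", "A", "B"],
        "n_roll": [8, 2, 1, 3, 9],
    }
)
df2 = pd.DataFrame({"col1": [0], "col2": [1], "col3": [5], "col4": [6]})

# Boolean mask of the rows to overwrite.
mask = df["name"] == "A"

# One .loc call: rows via the mask, columns via df2's column labels.
# to_numpy() yields a 1 x 4 array that is broadcast to every selected row.
df.loc[mask, df2.columns] = df2.to_numpy()

print(df)
```

Columns that are not in df2 (name and n_roll here) are left untouched.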
I'm trying to set the values in a column of a df to either 0 or 1 based on the comparison of two other columns.
My initial df would be this:
col1 col2 col3
1 3 NaN
5 1 NaN
7 4 NaN
5 10 NaN
I'm using this code to try to set col3 to 1 where col1 is greater than or equal to col2, and to 0 in all other rows:
df[df['col1'] >= df['col2']]['col3'] = 1
df[df['col1'] < df['col2']]['col3'] = 0
This is for sure the wrong way to do it, but I'm not sure how else to approach it.
The desired result is a df with the following values:
col1 col2 col3
1 3 0
5 1 1
7 4 1
5 10 0
Thanks in advance.
You could use:
df['col3'] = df.col1.ge(df.col2).astype(int)
print(df)
col1 col2 col3
0 1 3 0
1 5 1 1
2 7 4 1
3 5 10 0
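As a fuller sketch of the same idea: the comparison already produces a boolean Series, so it only needs converting to 0/1. The np.where variant and the column name col3_where are my additions; np.where is handy if you ever need values other than 0 and 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 5, 7, 5], "col2": [3, 1, 4, 10]})

# The comparison yields True/False per row; astype(int) turns that into 1/0.
df["col3"] = (df["col1"] >= df["col2"]).astype(int)

# Equivalent result with np.where, which lets you pick arbitrary values.
df["col3_where"] = np.where(df["col1"] >= df["col2"], 1, 0)

print(df)
```

The code in the question failed for the same reason as any chained assignment: df[condition]['col3'] = 1 writes to a temporary object rather than to df.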
Let's say I have a dataframe (I'll just use a simple example) that looks like this:
import pandas as pd
df = {'Col1':[3,4,2,6,5,7,3,4,9,7,1,3],
'Col2':['B','B','B','B','A','A','A','A','C','C','C','C',],
'Col3':[1,1,2,2,1,1,2,2,1,1,2,2]}
df = pd.DataFrame(df)
Which gives a dataframe like so:
Col1 Col2 Col3
0 3 B 1
1 4 B 1
2 2 B 2
3 6 B 2
4 5 A 1
5 7 A 1
6 3 A 2
7 4 A 2
8 9 C 1
9 7 C 1
10 1 C 2
11 3 C 2
What I want to do is several steps:
1) For each unique value in Col2, and for each unique value in Col3, average Col1. So a desired output would be:
Avg Col2 Col3
1 3.5 B 1
2 4 B 2
3 6 A 1
4 3.5 A 2
5 8 C 1
6 2 C 2
2) Now, for each unique value in Col3, I want the highest average and the corresponding value in Col2. So
Best Avg Col2 Col3
1 8 C 1
2 4 B 2
My attempt has been using df.groupby(['Col3','Col2'], as_index = False).agg({'Col1':'mean'}).groupby(['Col3']).agg({'Col1':'max'})
This gives me the highest average for each Col3 value, but not the corresponding Col2 label. Thank you for any help you can give!
After your first groupby, use sort_values + drop_duplicates:
g1=df.groupby(['Col3','Col2'], as_index = False).agg({'Col1':'mean'})
g1.sort_values('Col1').drop_duplicates('Col3',keep='last')
Out[569]:
Col3 Col2 Col1
4 2 B 4.0
2 1 C 8.0
Or, in case the maximum mean occurs more than once within a group and you want to keep all of those rows:
g1[g1.Col1==g1.groupby('Col3').Col1.transform('max')]
Do the following (I modified your code slightly to make it a bit shorter):
df2 = df.groupby(['Col3','Col2'], as_index = False).mean()
When you print the result, for your input, you will get:
Col3 Col2 Col1
0 1 A 6.0
1 1 B 3.5
2 1 C 8.0
3 2 A 3.5
4 2 B 4.0
5 2 C 2.0
Then run:
res = df2.loc[df2.groupby('Col3').Col1.idxmax()]
When you print the result, you will get:
Col3 Col2 Col1
2 1 C 8.0
4 2 B 4.0
As you can see:
idxmax gives the index label of the row with the maximal element (for each group),
and you can pass this result to loc. (iloc happens to give the same rows here only because df2 has the default 0..n-1 index.)
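Putting both steps together, here is a self-contained sketch on the data from the question. The names means and best are mine. Since idxmax returns index labels, I use .loc for the lookup; also note that idxmax returns only the first row per group when there is a tie.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Col1": [3, 4, 2, 6, 5, 7, 3, 4, 9, 7, 1, 3],
        "Col2": ["B", "B", "B", "B", "A", "A", "A", "A", "C", "C", "C", "C"],
        "Col3": [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2],
    }
)

# Step 1: average Col1 for every (Col3, Col2) combination.
means = df.groupby(["Col3", "Col2"], as_index=False)["Col1"].mean()

# Step 2: for each Col3, find the label of the row with the highest mean,
# then select those rows so the matching Col2 comes along.
best = means.loc[means.groupby("Col3")["Col1"].idxmax()]

print(best)
```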
I need help with formatting my tables. This is a simpler version and I will explain it with an example. If I have a table as follows:
Col1 Col2
A 8
B 2
C 3
A 4
B 5
C 6
A 7
B 1
C 9
I want it to be arranged where highest value of col2 comes first. In this case it is 9 from account C. Therefore all account C values follow, arranged in Col2 order. Next, highest value is shown by account A, so all account A values follow, again arranged in Col2 values order.
The final table should look something like this:
Col1 Col2
C 9
C 6
C 3
A 8
A 7
A 4
B 5
B 2
B 1
What would be the best way to do this? Any ideas?
You may need to create a helper key for sort_values using groupby + transform:
df['helperkey']=df.groupby('Col1').Col2.transform('max')
df.sort_values(['helperkey','Col2'],ascending=[False,False]).drop(columns='helperkey')
Out[102]:
Col1 Col2
8 C 9
5 C 6
2 C 3
0 A 8
6 A 7
3 A 4
4 B 5
1 B 2
7 B 1
There may be a better way, but you could figure out the order, set column Col1 to be an ordered categorical, and sort by Col1 and Col2, in ascending and descending order respectively:
order = df.groupby('Col1').max().sort_values('Col2', ascending=False).index
df['Col1'] = pd.Categorical(df['Col1'], categories=order, ordered=True)
df.sort_values(['Col1', 'Col2'], ascending=[True,False])
Col1 Col2
8 C 9
5 C 6
2 C 3
0 A 8
6 A 7
3 A 4
4 B 5
1 B 2
7 B 1
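Here is the helper-key idea as one runnable chain that does not modify df. The column name group_max is just a placeholder I picked. One thing to be aware of: if two accounts share the same maximum, their rows would interleave, so you might want to add Col1 to the sort keys as a tiebreaker in that case.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Col1": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
        "Col2": [8, 2, 3, 4, 5, 6, 7, 1, 9],
    }
)

result = (
    # Attach each row's group maximum as a temporary sort key.
    df.assign(group_max=df.groupby("Col1")["Col2"].transform("max"))
    # Groups with the largest maximum first, then Col2 descending within a group.
    .sort_values(["group_max", "Col2"], ascending=[False, False])
    # Drop the temporary key again.
    .drop(columns="group_max")
)

print(result)
```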