If I had an asymmetric dataframe list like:
1 2 3
0 a b c
1 d e NaN
2 f NaN NaN
3 g h NaN
or a series like:
0 [a, b, c]
1 [d, e]
2 [f]
3 [g, h]
and I required the last value from each row to create another series like:
0 c
1 e
2 f
3 h
What would be the best way to go about this? Thank you!
You can forward-fill along each row to propagate the last valid value and then take the last column (fillna(method='ffill') is deprecated in recent pandas; ffill is the direct equivalent):
df.ffill(axis=1).iloc[:, -1]
0 c
1 e
2 f
3 h
Use DataFrame.stack, groupby and last:
df.stack().groupby(level=0).last()
0 c
1 e
2 f
3 h
dtype: object
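For reference, here is a minimal, self-contained sketch of both answers on the sample data. The frame is rebuilt by hand from the question (an assumption about how the data was created), and the variable names are illustrative only. The last part covers the series-of-lists variant from the question, which the answers above do not address.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; NaN marks the missing cells.
df = pd.DataFrame(
    [["a", "b", "c"],
     ["d", "e", np.nan],
     ["f", np.nan, np.nan],
     ["g", "h", np.nan]],
    columns=[1, 2, 3],
)

# Forward-fill along each row, then read the final column.
last_ffill = df.ffill(axis=1).iloc[:, -1]

# Stack to long form; last() returns the last non-null entry per row label.
last_stack = df.stack().groupby(level=0).last()

# Series-of-lists variant: .str[-1] picks the final element of each list.
s = pd.Series([["a", "b", "c"], ["d", "e"], ["f"], ["g", "h"]])
last_from_lists = s.str[-1]

print(last_ffill.tolist(), last_stack.tolist(), last_from_lists.tolist())
```

Indexing with -1 rather than a fixed column position keeps the forward-fill version working if the frame gains more columns.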
I have a pandas dataframe to which I want to add a column (col_new) whose values depend on a comparison of values in an existing column (col_exist).
Existing column (type=objects) contains As and Bs.
New column should count, starting with 1.
If an A follows an A, the count should rise by one.
If an A follows a B, the count should rise by one.
If a B follows an A, the count should not rise.
If a B follows a B, the count should not rise.
col_exist col_new
A 1
A 2
A 3
B 3
A 4
B 4
B 4
A 5
B 5
I am completely new to programming, so thank you in advance for your answer.
Use eq and cumsum:
df['col_new'] = df['col_exist'].eq('A').cumsum()
output:
col_exist col_new
0 A 1
1 A 2
2 A 3
3 B 3
4 A 4
5 B 4
6 B 4
7 A 5
8 B 5
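The four rules in the question reduce to one: the count rises on every A and stays put on every B, whatever came before. A running sum of the boolean mask "is this row an A" does exactly that. Here is a small sketch, assuming the column is built by hand from the sample table:

```python
import pandas as pd

# Sample column rebuilt from the question.
df = pd.DataFrame({"col_exist": list("AAABABBAB")})

# eq("A") gives True/False per row; cumsum treats True as 1 and False as 0,
# so the running total increases only on rows that hold an A.
df["col_new"] = df["col_exist"].eq("A").cumsum()

print(df)
```

Note that if the column started with a B, the count would start at 0 rather than 1, which may or may not be what you want.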
I am fairly new to pandas and I've been trying multiple solutions for this problem using dataframe.merge and lambda logic, but I haven't been able to find a solution that consistently produces what I'm looking for.
After filtering some data using
df = df.groupby(['0', '1']).size()
df = df.to_frame(name='2').reset_index()
I obtain the following table. The first two columns represent the starting and ending points respectively, and the third represents the number of times each pair occurred before the groupby:
0 1 2
a d 8
b h 7
c f 3
c e 3
d a 2
b b 2
e c 1
f c 1
g i 1
h b 1
i g 1
I need to consider both start -> end and end -> start points as the same, meaning that the following dataframe:
0 1 2
a d 8
d a 2
should end looking like this:
0 1 2
a d 10
And back to the original table, that one should end looking like this:
0 1 2
a d 10
b h 8
c f 4
c e 4
b b 2
g i 2
I'm fairly sure this should be an easy solution but for the life of me I just can't pinpoint the answer.
You can do it like this:
df1 = df[['0', '1']].apply(sorted, 1, result_type = "expand").rename(columns = {0:'col1', 1:'col2'})
result = df.groupby([df1.col1, df1.col2])['2'].sum().reset_index()
One option is to use apply to sort the values in the two columns row by row, then do another groupby. (Note that your column names may differ; my df was made using pd.read_clipboard().)
df.reset_index(inplace=True)
df[['0','1']]=df[['0','1']].apply(lambda x:sorted(x),axis=1).tolist()
df
0 1 2
0 a d 8
1 b h 7
2 c f 3
3 c e 3
4 a d 2
5 b b 2
6 c e 1
7 c f 1
8 g i 1
9 b h 1
10 g i 1
df.groupby(['0','1'], as_index=False).sum()
0 1 2
0 a d 10
1 b b 2
2 b h 8
3 c e 4
4 c f 4
5 g i 2
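As a variation on the answers above, the per-row sort can also be done with NumPy instead of apply. The sketch below is one possible way to do it, not the answerers' exact method. It assumes string column names '0', '1' and '2' as in the question, and rebuilds the sample table by hand.

```python
import numpy as np
import pandas as pd

# Sample table rebuilt from the question.
df = pd.DataFrame({
    "0": list("abccdbefghi"),
    "1": list("dhfeabccibg"),
    "2": [8, 7, 3, 3, 2, 2, 1, 1, 1, 1, 1],
})

# Sort each (start, end) pair so that d -> a and a -> d share the key (a, d).
pairs = np.sort(df[["0", "1"]].to_numpy(), axis=1)

# Replace the endpoint columns with the sorted pair, then sum the counts.
result = (
    df.assign(**{"0": pairs[:, 0], "1": pairs[:, 1]})
      .groupby(["0", "1"], as_index=False)["2"]
      .sum()
)

print(result)
```

Selecting column '2' before calling sum keeps the string columns out of the aggregation.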
I have the following dataframe
df
A B C D
1 2 NA 3
2 3 NA 1
3 NA 1 2
A, B, C, and D are answers to a question. Respondents ranked their answers from 1 to 3, which means that no row can contain the same value twice. I am trying to make new columns that summarize the top 3, such as:
1st 2nd 3rd
A B D
D A B
C D A
This format will make it easier for me to come up with conclusions such as, here are the 3rd top answers.
I didn't find any way to do this. Could you help me, please?
Thank you very much!
One way is using argsort and indexing the columns:
pd.DataFrame(df.columns.to_numpy()[df.to_numpy().argsort()[:, :-1]],
             columns=['1st', '2nd', '3rd'])
1st 2nd 3rd
0 A B D
1 D A B
2 C D A
Another way is to use stack()/pivot():
(df.stack().astype(int)
.reset_index(name='val')
.pivot(index='level_0', columns='val', values='level_1')
)
Output:
val 1 2 3
level_0
0 A B D
1 D A B
2 C D A
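To make the argsort idea concrete, here is a self-contained sketch on the sample data. It assumes the NA cells are real NaN values (not the string "NA") and that each row has exactly one of them. The names order and ranked are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question, with NaN for the unranked answer.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [2, 3, np.nan],
    "C": [np.nan, np.nan, 1],
    "D": [3, 1, 2],
})

# argsort orders each row's positions from rank 1 upward; NaN sorts last,
# so keeping the first three positions drops the unranked answer.
order = np.argsort(df.to_numpy(), axis=1)[:, :3]

# Map the positions back to column labels.
ranked = pd.DataFrame(
    df.columns.to_numpy()[order],
    columns=["1st", "2nd", "3rd"],
    index=df.index,
)

print(ranked)
```

From here, something like ranked["3rd"].value_counts() would show how often each answer was ranked third.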
This is my table:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 2
Now, I want to group all rows by Column A and B. Column C should be summed and for column E, I want to use the value where value C is max.
I did the first part of grouping A and B and summing C. I did this with:
df = df.groupby(['A', 'B'])['C'].sum()
But at this point, I am not sure how to tell that column E should take the value where C is max.
The end result should look like this:
A B C E
0 1 1 6 4
1 3 3 8 2
Can somebody help me with this last piece?
Thanks!
Using groupby with agg after sorting by C.
In general, if you are applying different functions to different columns, DataFrameGroupBy.agg allows you to pass a dictionary specifying which operation is applied to each column:
df.sort_values('C').groupby(['A', 'B'], sort=False).agg({'C': 'sum', 'E': 'last'})
C E
A B
1 1 6 4
3 3 8 2
By sorting by column C first, and not sorting as part of groupby, we can select the last value of E per group, which will align with the maximum value of C for each group.
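Here is the same approach as a runnable sketch on the sample table, with the frame rebuilt by hand from the question. The trailing reset_index is an addition to match the flat layout shown in the question's expected result.

```python
import pandas as pd

# Sample table rebuilt from the question.
df = pd.DataFrame({
    "A": [1, 1, 3],
    "B": [1, 1, 3],
    "C": [5, 1, 8],
    "E": [4, 1, 2],
})

result = (
    df.sort_values("C")                     # largest C ends up last in each group
      .groupby(["A", "B"], sort=False)      # keep the row order from the sort
      .agg({"C": "sum", "E": "last"})       # sum C, take E from the max-C row
      .reset_index()                        # turn A and B back into columns
)

print(result)
```

If two rows in a group tie for the maximum C, which E is returned depends on how the sort orders the tied rows.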
I have this dataframe:
A B C
0 m 1 b
1 n 4 a
2 p 3 c
3 o 4 d
4 k 6 e
How can I get only the rows where column A is n, p, or k, as follows:
A B C
0 n 4 a
1 p 3 c
2 k 6 e
thanks
Use .loc
df = df.loc[df.A.isin(['n','p','k']),:]
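Here is a short self-contained sketch of the same filter, with the frame rebuilt by hand from the question. The reset_index(drop=True) call is an addition: the expected output in the question shows the index renumbered from 0, which .loc alone does not do.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "A": list("mnpok"),
    "B": [1, 4, 3, 4, 6],
    "C": list("bacde"),
})

# isin builds a boolean mask; .loc keeps the rows where the mask is True.
selected = df.loc[df["A"].isin(["n", "p", "k"])].reset_index(drop=True)

print(selected)
```

Rows come back in their original order in the frame, not in the order of the list passed to isin.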