I have a dataframe df which looks like this, where the OUTPUT column is the one to be calculated:
ID input OUTPUT
1 A,B 1
1 B,C,D 2
1 C 1
2 E,f 1
2 A,B,C 3
3 E 0
Can anyone please help me calculate the OUTPUT column? It counts the changes between consecutive elements of input, also comparing the first element with the last element of the previous row's list; whenever the ID changes, no comparison is made with the previous row, even if its last element is different.
In row 1 the output is 1 because there is only one change, A to B.
In row 2 there are two changes, B to C and C to D (the previous row ended with B, so B to B is not a change).
In row 3 it is 1 because the previous row's list ended with D, and D to C is one change.
When the ID changes we do not compare with the previous row, so for E,f there is only one change, E to f.
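For reference, the sample frame can be rebuilt like this (a sketch; the OUTPUT column is the expected result and is left out):

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3],
    'input': ['A,B', 'B,C,D', 'C', 'E,f', 'A,B,C', 'E'],
})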
Here's one approach:
# get the last element of the previous row within the same ID group
prev_row = df.groupby('ID').input.shift().str.split(',').str[-1]
# prepend it to the current row, build a set and
# count the number of distinct elements
df['OUTPUT'] = (prev_row.str.cat(df.input, sep=',')
                        .fillna(df.input)  # first row of each ID has no previous element
                        .str.split(',')
                        .map(set)
                        .str.len()
                        .sub(1))
print(df)
ID input OUTPUT
0 1 A,B 1
1 1 B,C,D 2
2 1 C 1
3 2 E,f 1
4 2 A,B,C 3
5 3 E 0
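To see what the groupby/shift step produces (and why the fillna is needed for the first row of each ID), here are the intermediate prev_row values for the sample frame, written out as comments. Note that the set-based count gives the number of distinct elements minus 1, which matches the expected change count for this data.

# last element of the previous row within each ID group
prev_row = df.groupby('ID').input.shift().str.split(',').str[-1]
# 0    NaN   <- first row of ID 1, nothing to compare with
# 1      B   <- last element of 'A,B'
# 2      D   <- last element of 'B,C,D'
# 3    NaN   <- ID changed to 2
# 4      f   <- last element of 'E,f'
# 5    NaN   <- ID changed to 3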
I appended a new row to the test data for testing; see the output below:
df["idchng"]= df.ID.diff().ge(1)
df["lastch"]= df.input.str.rpartition(",")[2].shift()
print(df,"\n")
df["inp2"]= np.where(df.idchng, df.input, df.lastch.str.cat(df.input,sep=","))
df.inp2.iat[0]= df.input.iat[0]
def diffstr(s):
ser= pd.Series(s.split(","))
return ser.ne(ser.shift()).sum()-1
df["RSLT"]= df.inp2.map(diffstr)
df= df.drop(columns=["inp2","lastch","idchng"])
print(df,"\n")
Outputs:
# test data:
ID input OUTPUT
0 1 A,B 1
1 1 B,C,D 2
2 1 C 1
3 2 E,f 1
4 2 A,B,C 3
5 3 E 0
6 4 A,A,B,A,C,D,A,E 6
ID input OUTPUT idchng lastch
0 1 A,B 1 False NaN
1 1 B,C,D 2 False B
2 1 C 1 False D
3 2 E,f 1 True C
4 2 A,B,C 3 False f
5 3 E 0 True C
6 4 A,A,B,A,C,D,A,E 6 True E
ID input OUTPUT RSLT
0 1 A,B 1 1
1 1 B,C,D 2 2
2 1 C 1 1
3 2 E,f 1 1
4 2 A,B,C 3 3
5 3 E 0 0
6 4 A,A,B,A,C,D,A,E 6 6
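As a quick sanity check, diffstr on its own reproduces the count for the appended row:

# changes: A->B, B->A, A->C, C->D, D->A, A->E
print(diffstr("A,A,B,A,C,D,A,E"))  # 6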
I have two different dataframes in pandas.
First:
A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  0
2  5  3  2  0
Second:
A  B  C  D  Value
5  3  3  2  1
1  5  4  3  1
I want the A and B values of the first dataframe to be looked up in the second dataframe. If the A and B values match, update the Value column. So: search on only two columns of the other dataframe and update only one column. It is essentially the process we know from SQL (an update via a join).
Result:
A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  1
2  5  3  2  0
The row that changes is the second one (A=1, B=5), where VALUE becomes 1; focusing on it should make the example easier to follow. Despite my attempts, I could not succeed: I only want the Value column to change, but my attempt also changes A and B. I only want the Value column of the matching rows to be updated.
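For reference, the two frames can be rebuilt like this (a sketch, using the column names shown above):

import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 2], 'B': [2, 5, 5],
                    'C': [3, 3, 3], 'D': [5, 2, 2],
                    'VALUE': [0, 0, 0]})
df2 = pd.DataFrame({'A': [5, 1], 'B': [3, 5],
                    'C': [3, 4], 'D': [2, 3],
                    'Value': [1, 1]})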
You can use a merge:
cols = ['A', 'B']
# right-merge on df1[cols] keeps one row per row of df1, in order;
# rows without a match in df2 get NaN, which is filled from the original VALUE
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')['Value']
                   .fillna(df1['VALUE'], downcast='infer'))
output:
A B C D VALUE
0 1 2 3 5 0
1 1 5 3 2 1
2 2 5 3 2 0
I am trying to create a "two-entry table" from many columns in my df. I tried pivot_table / crosstab / groupby, but the results from these functions do not look the way I want, since they do not form a "two-entry table".
For example, if I have a dataframe like this:
df
A B C D E
1 0 0 1 1
0 1 0 1 0
1 1 1 1 1
I would like to transform my df into a df that can be read as a "two-entry table":
A B C D E
A 2 1 1 2 2
B 1 2 1 2 1
C 1 1 1 1 1
D 2 2 1 3 1
E 2 1 1 1 2
To explain the first row: A-A = 2 because A has two 1s in its column; A-B = 1 because they share a 1 in the third row of df; A-C = 1 because they also share a 1 only in the third row; and finally A-E = 2 because they share 1s in the first and third rows of df.
Use pd.DataFrame.dot with T:
df.T.dot(df)  # or equivalently: df.T @ df
Output:
A B C D E
A 2 1 1 2 2
B 1 2 1 2 1
C 1 1 1 1 1
D 2 2 1 3 2
E 2 1 1 2 2
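Since the entries are 0/1, entry (i, j) of the matrix product counts the rows where column i and column j are both 1, which is exactly the shared-ones count described in the question. A minimal repro of the frame above (a sketch):

import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1], 'B': [0, 1, 1], 'C': [0, 0, 1],
                   'D': [1, 1, 1], 'E': [1, 0, 1]})
print(df.T @ df)  # same result as df.T.dot(df)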
I have a dataset with three columns A, B and C:
A B C
1 2 3
1 3 4
1 4 5
1 2 6
2 1 9
2 9 8
2 8 2
2 1 2
I need to get the A, B, C values corresponding to the minimum B value, grouped by column A.
As you can see, the minimum B is duplicated within each group: B=2 appears twice for A=1 and B=1 appears twice for A=2. If I run this command:
dataset[['A', 'B', 'C']].loc[dataset.groupby('A').B.idxmin()]
I get only the first row per group for the minimum B. But how can I get all such rows?
Output:
A B C
1 2 3
2 1 9
Output expected:
A B C
1 2 3
1 2 6
2 1 9
2 1 2
Use GroupBy.transform to broadcast the per-group minimum, compare it with column B, and use the result for boolean indexing:
df = dataset[dataset.groupby('A').B.transform('min').eq(dataset['B'])]
print (df)
A B C
0 1 2 3
3 1 2 6
4 2 1 9
7 2 1 2
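The transform step broadcasts each group's minimum B back onto every row, so the comparison with the original B column is aligned row by row:

print(dataset.groupby('A').B.transform('min'))
# 0    2
# 1    2
# 2    2
# 3    2
# 4    1
# 5    1
# 6    1
# 7    1
# Name: B, dtype: int64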
For the dataframe below, how do I return, into a new column named "Remark", the names of the columns whose values are greater than 1?
A B C
0 1 1 2
1 2 2 2
2 1 3 1
3 4 5 2
Desired output as below:
A B C Remark
0 1 1 2 C
1 2 2 2 A,B,C
2 1 3 1 B
3 4 5 2 A,B,C
Thanks in Advance
Use DataFrame.dot on the boolean DataFrame with the column names plus a separator, then remove the trailing ,:
df['Remark'] = df.gt(1).dot(df.columns + ',').str[:-1]
print (df)
A B C Remark
0 1 1 2 C
1 2 2 2 A,B,C
2 1 3 1 B
3 4 5 2 A,B,C
Details:
print (df.gt(1))
A B C
0 False False True
1 True True True
2 False True False
3 True True True
print (df.gt(1).dot(df.columns + ','))
0 C,
1 A,B,C,
2 B,
3 A,B,C,
dtype: object
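An equivalent, if slower, row-wise version (a sketch, applied to the original three-column frame) makes the same idea explicit:

mask = df[['A', 'B', 'C']].gt(1)
df['Remark'] = mask.apply(lambda row: ','.join(mask.columns[row]), axis=1)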
I want to create a new column with the name of the column holding the max value in each row (by index).
A tie should include all tied columns.
A B C D
TRDNumber
ALB2008081610 3 1 1 1
ALB200808167 1 3 4 1
ALB200808168 3 1 3 1
ALB200808171 2 2 5 1
ALB2008081710 1 2 2 5
Desired output
A B C D Best
TRDNumber
ALB2008081610 3 1 1 1 A
ALB200808167 1 3 4 1 C
ALB200808168 3 1 3 1 A,C
ALB200808171 2 2 5 1 C
ALB2008081710 1 2 2 5 D
I have tried the following code:
df.groupby(['TRDNumber'])[cols].max()
you can do:
>>> f = lambda r: ','.join(df.columns[r])
>>> df.eq(df.max(axis=1), axis=0).apply(f, axis=1)
TRDNumber
ALB2008081610 A
ALB200808167 C
ALB200808168 A,C
ALB200808171 C
ALB2008081710 D
dtype: object
>>> df['best'] = _
>>> df
A B C D best
TRDNumber
ALB2008081610 3 1 1 1 A
ALB200808167 1 3 4 1 C
ALB200808168 3 1 3 1 A,C
ALB200808171 2 2 5 1 C
ALB2008081710 1 2 2 5 D
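The eq/max mask is used instead of idxmax because idxmax keeps only the first column when there is a tie:

# for comparison: idxmax keeps only the first column of a tie
print(df[['A', 'B', 'C', 'D']].idxmax(axis=1))
# ALB200808168 comes back as just 'A', not 'A,C'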