Get columns with the max value accounting for ties in pandas - python

Let's say that we have this pandas DataFrame. I want to know which column has the maximum value per row.
The output for rows 1, 2 and 3 would be all 5 columns.
For row 4 it would be visits_total,
and for row 5 it would be ['content_gene_strength', 'sport_gene_strength', 'visits_total'].
Thanks

Compare all columns to the row-wise maximum with DataFrame.eq, then use DataFrame.dot for matrix multiplication of the boolean mask with the column names plus a separator, and finally remove the trailing separator with Series.str.rstrip:
df['new'] = df.eq(df.max(axis=1), axis=0).dot(df.columns + ',').str.rstrip(',')
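For illustration, a minimal sketch with made-up values (the original DataFrame is not shown in the question; the column names are assumed from the expected output, and only three of the five columns are used):

import pandas as pd

# Hypothetical data -- the real DataFrame is not shown in the question
df = pd.DataFrame({'content_gene_strength': [1, 3, 1],
                   'sport_gene_strength': [2, 3, 1],
                   'visits_total': [3, 3, 1]})

# Boolean mask marking every value that ties the row maximum, then dot
# with 'name,' strings to concatenate the names of the matching columns
df['new'] = df.eq(df.max(axis=1), axis=0).dot(df.columns + ',').str.rstrip(',')
for v in df['new']:
    print(v)
# visits_total
# content_gene_strength,sport_gene_strength,visits_total
# content_gene_strength,sport_gene_strength,visits_total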

Related

How to merge duplicate rows using two columns in pandas

I am trying to merge two rows from the dataframe below, but at the same time I want to replace the None and NaN fields with values from the rows that have them.
I started with
new_df = df.groupby(['source','code'], axis=0)
but the result wasn't what I am looking for. In the dataframe below I would like row 2 and row 5 to merge into a single row filled with the non-empty values.
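One common approach (a sketch with hypothetical data, since the original frame is not shown) is to group on the two key columns and take the first non-null value per column, which GroupBy.first does by default:

import pandas as pd

# Hypothetical frame standing in for the one described in the question
df = pd.DataFrame({'source': ['a', 'a', 'b'],
                   'code': [1, 1, 2],
                   'value': [None, 10.0, 5.0],
                   'label': ['x', None, 'y']})

# GroupBy.first returns the first non-null entry in each column per group,
# so duplicate rows collapse into one row filled with the available values
merged = df.groupby(['source', 'code'], as_index=False).first()
print(merged)
#   source  code  value label
# 0      a     1   10.0     x
# 1      b     2    5.0     y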

Delete Pandas dataframe rows where the sum of all columns equals 0

Is there a way I can drop all rows in a pandas dataframe where the sum of all columns is equal to 0?
You can invert the logic: select all rows whose sum is not equal to 0, using boolean indexing:
df = df[df.sum(axis=1).ne(0)]
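A quick illustration with made-up data:

import pandas as pd

df = pd.DataFrame({'a': [1, -1, 2], 'b': [0, 1, 3]})

# Row 1 sums to 0 and is dropped; the others are kept
df = df[df.sum(axis=1).ne(0)]
print(df)
#    a  b
# 0  1  0
# 2  2  3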

Drop columns according to some criteria

For example, I have a dataframe called dat, and I want to apply a function to each column of the dataframe: if the return value is True, keep the column and move on to the next one; if the return value is False, drop the column and move on.
I know I can write a for loop to do this, but is there an efficient way?
You could do it like this, using boolean indexing on df.columns.
Say, for simplicity, we want to drop all columns whose sum is greater than 50:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[2,4,6,8],'B':[101,102,102,102]})
r = df.apply(np.sum)  # applies the sum function to each column
c = r <= 50           # boolean test per column
df[c[c].index]        # boolean indexing: keep only the columns that pass
Output:
A
0 2
1 4
2 6
3 8
Updating an old answer:
df.loc[:, df.sum() <= 50]
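The same pattern works for any column-wise predicate, not just sum. A sketch (keep_column is a hypothetical check standing in for your function):

import pandas as pd

df = pd.DataFrame({'A':[2,4,6,8],'B':[101,102,102,102]})

# Hypothetical predicate: any function mapping a column to True/False works
def keep_column(col):
    return col.sum() <= 50

# df.apply maps the predicate over columns, producing a boolean mask
# that df.loc then uses to select which columns to keep
print(df.loc[:, df.apply(keep_column)])
#    A
# 0  2
# 1  4
# 2  6
# 3  8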

How to multiply one column by multiple other columns in a Python DataFrame

I have a DataFrame of 100 columns, and I want to multiply one column ('Count') with the columns at positions 6 to 74. Please tell me how to do that.
I have been trying
df = df.ix[0, 6:74].multiply(df["Count"], axis="index")
df = df[df.columns[6:74]]*df["Count"]
Neither of them works.
The resulting DataFrame should still have all 100 columns, with columns 6 to 74 holding the multiplied values in every row.
You can multiply the columns in place, aligning the Series with the rows (a plain df[columns] *= df['Count'] would align on column labels instead and produce NaNs):
columns = df.columns[6:75]
df[columns] = df[columns].multiply(df['Count'], axis=0)
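A quick check on a toy frame (standing in for the 100-column original):

import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [10, 20], 'Count': [2, 3]})

cols = ['x', 'y']
# multiply(..., axis=0) pairs each Count value with its own row
df[cols] = df[cols].multiply(df['Count'], axis=0)
print(df)
#    x   y  Count
# 0  2  20      2
# 1  6  60      3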

Reindexing dataframes and joining columns

Given two DataFrames A and B with the same number of rows but different integer indices, how do I add the columns of A to the columns of B while ignoring the indices? (i.e. row 1 of A goes with row 1 of B regardless of the index values.)
If A has a non-consecutive integer index, how do I reindex A to use consecutive integers 1...n? The index of B is already a 1...n consecutive integer index.
Is it best practice to reindex A and then add the columns from B to it?
You can combine the columns of two DataFrames using concat:
pd.concat([A, B], axis=1)
To make the index consecutive integers you can use reset_index (drop=True discards the old index instead of inserting it as a column):
A.reset_index(drop=True, inplace=True)
Alternatively, you can match the index of B to that of A using:
B.index = A.index
What the "best" choice is here I think depends on the context/the meaning of the index.
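A small sketch showing why the realignment matters (made-up frames):

import pandas as pd

A = pd.DataFrame({'a': [1, 2]}, index=[10, 20])
B = pd.DataFrame({'b': [3, 4]}, index=[0, 1])

# concat aligns on index values, so resetting both to a default
# RangeIndex first pairs the rows by position instead
out = pd.concat([A.reset_index(drop=True), B.reset_index(drop=True)], axis=1)
print(out)
#    a  b
# 0  1  3
# 1  2  4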
