Sum multiple dataframe columns based on a condition - python

I have a Python DataFrame with 30 columns.
I would like to add a new column that holds, for each row, the sum of only those values that equal 1 among the last 10 columns (positions 20:30).
How can I do that?
Thanks
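A minimal sketch, assuming the columns are numeric and that "sum only the columns that equal 1" means adding up the cells in the last 10 columns whose value is exactly 1, which is the same as counting them. The 30-column frame and the names c0 to c29, ones_sum and ones_sum_where are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical 30-column frame (columns c0..c29) with four sample rows
df = pd.DataFrame(
    [[1] * 30, [0] * 30, [1, 0] * 15, [2] * 30],
    columns=[f"c{i}" for i in range(30)],
)

# Positions 20..29, i.e. the last 10 columns (same as df.iloc[:, -10:] here)
last10 = df.iloc[:, 20:30]

# Mask of cells equal to 1; summing the booleans adds 1 per matching cell
df["ones_sum"] = last10.eq(1).sum(axis=1)

# Same result by zeroing every cell that is not 1 and summing what remains;
# this form still works if the condition is changed (e.g. last10.ge(1))
df["ones_sum_where"] = last10.where(last10.eq(1), 0).sum(axis=1)
```

Both columns come out as 10, 0, 5 and 0 for the four sample rows; the value 2 in the last row is excluded because it does not equal 1. Computing last10 before adding the new columns keeps the slice pointing at the original data.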

Related

Pandas dataframe on python

This may be a really easy question, but I can't figure it out. I have a data frame that looks like this:
one two three
1 2 3
2 3 3
3 4 4
The third column has duplicates: rows 1 and 2 both contain 3. I want to keep the first row but drop the second one, because its value in the third column duplicates the first row's. How would I do this?
Pandas DataFrame objects have a method for this. Assuming df is your dataframe, df.drop_duplicates(subset='name_of_third_column') returns a new dataframe in which every row that repeats a value already seen in the third column is removed; by default the first occurrence is kept (keep='first').
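Applied to the table from the question, assuming the columns are literally named one, two and three, a minimal sketch looks like this:

```python
import pandas as pd

# The data frame from the question
df = pd.DataFrame({"one": [1, 2, 3], "two": [2, 3, 4], "three": [3, 3, 4]})

# Drop rows whose value in "three" was already seen; keep="first" is the default
deduped = df.drop_duplicates(subset="three")
print(deduped)
```

The second row (index 1) is dropped because its 3 repeats the first row's, leaving the rows with index 0 and 2. Pass keep="last" to keep the later duplicate instead, or keep=False to drop every row that has a duplicate.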

how to merge duplicate rows using two columns in pandas

I am trying to merge two rows from the dataframe below and, at the same time, replace the None and NaN fields with values from the rows that have them.
I started with
new_df = df.groupby(['source','code'], axis =0)
but the result wasn't what I was looking for. In the dataframe below, I would like rows 2 and 5 to merge into a single row filled with the non-empty values.
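One possible approach, sketched under assumptions since the question's dataframe is not shown: a bare groupby only builds the groups, so it needs an aggregation. GroupBy.first returns, for each group and column, the first value that is not missing (None and NaN both count as missing), which merges duplicate rows into one filled row. The extra columns x and y and all values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 2 share the same (source, code) key
df = pd.DataFrame({
    "source": ["A", "B", "A"],
    "code": [1, 2, 1],
    "x": [10.0, 5.0, np.nan],
    "y": [None, "q", "z"],
})

# One row per (source, code); each column takes its first non-missing value.
# as_index=False keeps source and code as ordinary columns.
merged = df.groupby(["source", "code"], as_index=False).first()
print(merged)
```

The (A, 1) group combines x=10.0 from the first row with y='z' from the third. If two rows in a group both have a value in the same column, first() keeps the earlier one, so order the rows beforehand if a particular one should win.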

Get columns with the max value accounting for ties in pandas

Let's say that we have this pandas DataFrame.
I want to know which columns have the maximum value in each row.
For rows 1, 2 and 3 the output would be all 5 columns.
For row 4 it would be visits_total.
And for row 5 it would be ['content_gene_strength', 'sport_gene_strength', 'visits_total'].
Thanks
Compare every column with the row maximum using DataFrame.eq, then use DataFrame.dot to matrix-multiply the resulting boolean mask by the column names with a separator appended, and finally strip the trailing separator with Series.str.rstrip:
df['new'] = df.eq(df.max(axis=1), axis=0).dot(df.columns + ',').str.rstrip(',')
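A small sketch of the same technique on made-up data (the columns a, b, c and their values are hypothetical, and all columns are assumed numeric so that max works). It also shows a list variant, new_list, since the question asks for a list of names per row; that variant is an addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical numeric data with ties in rows 0 and 2
df = pd.DataFrame({"a": [5, 1, 3], "b": [5, 4, 3], "c": [5, 2, 1]})

# True wherever a cell equals the maximum of its row
is_max = df.eq(df.max(axis=1), axis=0)

# True * "a," == "a," and False * "a," == "", so the dot product
# concatenates the names of the max columns; then drop the trailing comma
df["new"] = is_max.dot(df.columns + ",").str.rstrip(",")

# List variant: select the column labels where the mask row is True
df["new_list"] = pd.Series(
    [is_max.columns[mask].tolist() for mask in is_max.to_numpy()],
    index=df.index,
)
print(df)
```

Row 0 yields "a,b,c", row 1 yields "b" and row 2 yields "a,b". The mask is built before the new columns are added, so the string columns never take part in the comparison.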

How to add a pandas Series to a DataFrame ignoring indices?

I have a DataFrame with random, unsorted row indices, which is a result of removing some 'noise' from the original DataFrame.
row_index col1 col2
2 1 2
19 3 4
432 4 1
I would like to add a pd.Series to this DataFrame. The Series has its index sorted from 0 to n-1, where n, its length, equals the number of rows in the DataFrame.
Having tried multiple ways of adding the Series to my DataFrame, I realized that the data from the Series gets mixed up because (I believe) pandas matches records by their index labels.
Is there a way I can add the Series to the DataFrame while ignoring the indices, so that my data doesn't get mixed up?
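Convert the Series into a DataFrame and concatenate it column-wise. pd.concat(..., axis=1) still aligns rows by their index (ignore_index=True would only renumber the columns), so reset both indices first:
s_df = pd.DataFrame(s)
result = pd.concat([df1.reset_index(drop=True), s_df.reset_index(drop=True)], axis=1)
Here df1 is the DataFrame you want to add the Series to, and s_df is the Series s converted to a DataFrame.
Resetting discards df1's original row labels; restore them with result.index = df1.index if you need them.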
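Alternatively, assign the underlying array, which carries no index and is therefore inserted by position:
df['new_col'] = other_df['column'].values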
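A short sketch of why the data gets mixed up and of two positional fixes, using the DataFrame from the question; the Series s and its values are hypothetical. .to_numpy() is the current spelling of .values, and Series.set_axis relabels the Series with the DataFrame's index before assignment.

```python
import pandas as pd

# DataFrame from the question, with unsorted row labels left after filtering
df = pd.DataFrame({"col1": [1, 3, 4], "col2": [2, 4, 1]}, index=[2, 19, 432])

# Hypothetical Series with the default 0..n-1 index, same length as df
s = pd.Series([10, 20, 30])

# Label alignment: only label 2 exists in both, so row 2 gets s[2] == 30
# and rows 19 and 432 get NaN
aligned = df.assign(new=s)

# Positional: a NumPy array has no index, so values go in row order
df["new"] = s.to_numpy()

# Also positional: give the Series df's labels, then assign
df["new2"] = s.set_axis(df.index)
print(aligned)
print(df)
```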

How to multiply one column to few other multiple column in Python DataFrame

I have a DataFrame with 100 columns, and I want to multiply the columns at positions 6 to 74 by the values in one column ('Count'). Please tell me how to do that.
I have been trying
df = df.ix[0, 6:74].multiply(df["Count"], axis="index")
df = df[df.columns[6:74]]*df["Count"]
Neither of them works.
The resulting DataFrame should still have all 100 original columns, with columns 6 to 74 holding the multiplied values in every row.
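Multiply the selected columns with DataFrame.mul and axis=0, so that Count is matched to the rows. A plain df[columns] *= df['Count'] aligns the Series index against the column labels instead and fills the selected columns with NaN.
columns = df.columns[6:75]  # positions 6 to 74 inclusive
df[columns] = df[columns].mul(df['Count'], axis=0)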
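A scaled-down sketch of the row-wise multiplication, assuming a hypothetical four-column frame in which the columns at positions 1 and 2 (a and b) play the role of positions 6 to 74:

```python
import pandas as pd

# Hypothetical small frame standing in for the 100-column one
df = pd.DataFrame({"Count": [2, 3], "a": [1, 1], "b": [10, 20], "c": [5, 5]})

# Select columns by position, then by label for the assignment
cols = df.columns[1:3]  # "a" and "b"

# axis=0 matches Count to the rows, multiplying each row by its own Count
df[cols] = df[cols].mul(df["Count"], axis=0)
print(df)
```

Columns a and b become [2, 3] and [20, 60], while Count and c are left unchanged, so the frame keeps all of its original columns.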
