Summing a dataframe and keeping row labels - python

I am just wondering whether it is possible to sum a dataframe so that a total value is shown at the end of each column while keeping the string label in the first column (like you would in Excel)?
I am using Python 2.7

Summing a column is as easy as Dataframe_Name['COLUMN_NAME'].sum(); you can review it in the documentation here.
You can also do Dataframe_Name.sum(), which returns the sum of each column.
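For the Excel-style total row with the label kept in the first column, one option (a minimal sketch with made-up column names, not taken from the original question) is to sum the numeric columns and append the result as a new labelled row:
import pandas as pd

# Hypothetical data: a string label column plus numeric columns
df = pd.DataFrame({
    'Description': ['Rent', 'Utilities', 'Food'],
    'Jan': [1000, 150, 400],
    'Feb': [1000, 160, 420],
})

# Sum only the numeric columns, label the result, and append it as a row
totals = df[['Jan', 'Feb']].sum()
totals['Description'] = 'Total'
df.loc[len(df)] = totals
print(df)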

Related

DataFrame Pandas condition over a column

I'm having difficulties applying a condition to a column in my DataFrame. I want to iterate over the column and extract only the values that start with the number 6; the values in that column are floats.
The column is called "Vendor".
This is my DataFrame, and I want to sum the values in the column "Amount in loc.curr.2" only for the rows where the value in the "Vendor" column starts with 6.
This is what I've been trying:
Also this:
idx = df_spend['Vendor'].apply(lambda x: str(x).startswith('6'))
This should create a Boolean pandas.Series that you can use as an index.
summed_col = df_spend.loc[idx, "Amount in loc.curr.2"].sum()
summed_col contains the sum of the column
Definitely take a look at the pandas documentation for the apply function: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
Hope this works! :)
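For completeness, a minimal self-contained sketch of the approach above with made-up data (the column names follow the question):
import pandas as pd

# Hypothetical data mirroring the question's columns
df_spend = pd.DataFrame({
    'Vendor': [600123.0, 700456.0, 600789.0],
    'Amount in loc.curr.2': [100.0, 250.0, 50.0],
})

# Boolean mask: True where the Vendor value starts with "6"
idx = df_spend['Vendor'].apply(lambda x: str(x).startswith('6'))

# Sum the amounts only for the matching rows
summed_col = df_spend.loc[idx, 'Amount in loc.curr.2'].sum()
print(summed_col)  # 150.0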

How to reshape dataframe with pandas?

I have a data frame that contains daily product sales from 2018 to 2021. The dataframe contains four columns (Date, Place, ProductCategory and Sales). In the first two columns (Date, Place) I want to use the available data to fill in the gaps. Once those values are filled in, I would like to delete the rows that have no data in ProductCategory. I would like to do this in Python with pandas.
The sample of my data set looked like this:
I would like the dataframe to look like this:
Use fillna with method 'ffill', which propagates the last valid observation forward to the next valid one. Then drop the rows that contain NAs.
df['Date'].fillna(method='ffill',inplace=True)
df['Place'].fillna(method='ffill',inplace=True)
df.dropna(inplace=True)
You are going to use the forward-filling method to replace null values with the value of the nearest non-null one above: df[['Date', 'Place']] = df[['Date', 'Place']].fillna(method='ffill'). Next, drop the rows with missing values: df.dropna(subset=['ProductCategory'], inplace=True). Congrats, now you have your desired df 😄
Documentation: Pandas fillna function, Pandas dropna function
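Putting both steps together, a minimal runnable sketch with made-up sample data (column names follow the question):
import pandas as pd
import numpy as np

# Hypothetical sample mirroring the question's columns
df = pd.DataFrame({
    'Date': ['2018-01-01', None, None, '2018-01-02'],
    'Place': ['London', None, None, 'Paris'],
    'ProductCategory': ['Food', 'Drinks', None, 'Food'],
    'Sales': [10, 5, np.nan, 7],
})

# Forward-fill Date and Place, then drop rows with no ProductCategory
df[['Date', 'Place']] = df[['Date', 'Place']].fillna(method='ffill')
df.dropna(subset=['ProductCategory'], inplace=True)
print(df)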
Compute the frequency of categories in the column by plotting; the plot shows bars representing the most repeated values:
df['column'].value_counts().plot.bar()
Then get the most frequent value using the index: index[0] gives the most repeated value, index[1] gives the second most repeated, and so on, so you can choose as per your requirement.
most_frequent_attribute = df['column'].value_counts().index[0]
Then fill the missing values with that value:
df['column'].fillna(most_frequent_attribute, inplace=True)
To fill multiple columns with the same method, just define this as a function, like this:
def impute_nan(df, column):
    most_frequent_category = df[column].mode()[0]
    df[column].fillna(most_frequent_category, inplace=True)

for feature in ['column1', 'column2']:
    impute_nan(df, feature)
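A quick sketch of the helper on made-up data (column names are placeholders):
import pandas as pd
import numpy as np

# Hypothetical frame with gaps in two categorical columns
df = pd.DataFrame({
    'column1': ['A', 'B', 'A', np.nan],
    'column2': ['X', np.nan, 'Y', 'X'],
})

for feature in ['column1', 'column2']:
    impute_nan(df, feature)

print(df)  # NaNs replaced by 'A' and 'X', the per-column modes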

Pandas conditional formula based on comparison of two cells

When calculating a new column called "duration_minutes", some of the results are negative because the values were put in the original columns backwards.
time.started_at=pd.to_datetime(time.started_at)
time.ended_at=pd.to_datetime(time.ended_at)
time["duration_minutes"]=(time.ended_at-time.started_at).dt.total_seconds()/60
time.head()
A quick check for negatives time[time.duration_minutes<0] in the "duration_minutes" column shows many rows with negative values because the start and stop times are in the wrong columns.
Is there a way to create and calculate the "duration_minutes" column to deal with this situation?
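One option (a hedged sketch rather than an accepted answer; it assumes a swapped start/stop pair should simply yield a positive duration) is to take the absolute value of the difference, or to subtract the earlier timestamp from the later one explicitly:
# Option 1: take the absolute value of the difference
time["duration_minutes"] = (time.ended_at - time.started_at).dt.total_seconds().abs() / 60

# Option 2: subtract the earlier timestamp from the later one, row by row
start = time[['started_at', 'ended_at']].min(axis=1)
end = time[['started_at', 'ended_at']].max(axis=1)
time["duration_minutes"] = (end - start).dt.total_seconds() / 60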

Pandas: Find string in a column and replace them with numbers with incrementing values

I am working on a dataframe with multiple columns; one of the columns has many rows (more than 1000) that contain string values. Kindly check the table below for more details:
In the above image I want to change the string values in the Group_Number column to numbers by taking the value from the first column (MasterGroup) and appending an incrementing suffix (01, 02, ...), so the values look like below:
I also need to make sure that if a string is duplicated, it is replaced with the number it was already given instead of a new one. For example, in the above image ANAYSIM is duplicated, and instead of assigning a new sequence number I want the repeated string to get the number already assigned to it.
I have checked different links, but they focus on values supplied by the user:
Pandas DataFrame: replace all values in a column, based on condition
Change one value based on another value in pandas
Conditional Replace Pandas
Any help with achieving the desired outcome is highly appreciated.
We could do cumcount with groupby
s = (df.groupby('MasterGroup').cumcount() + 1).mul(10).astype(str)
t = pd.to_datetime(df.Group_number, errors='coerce')
Then we assign
df.loc[t.isnull(), 'Group_number'] = df.MasterGroup.astype(str) + s

Python Pivot - Adding calculation row

I am exploring whether it is possible to create a calculation or total row which uses the column values matched on a specified index value. I am quite new to Python, so I am not sure whether it is possible using pivots. See the pivot I want to replicate below.
As you can see in the image above, I want the Ordered row to be the calculation row. It should subtract the Not Ordered row value in each column from the Grand Total.
Is it possible in Python to search the index, specifying criteria (e.g. "Not Ordered"), and loop through the columns to calculate the "Ordered" row?
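One possible approach (a sketch with made-up row and column labels, since the original pivot is only shown as an image) is to select the rows by index label with .loc and compute the new row across all columns at once:
import pandas as pd

# Hypothetical pivot: index holds order status, columns hold regions
pivot = pd.DataFrame(
    {'North': [40, 100], 'South': [25, 80]},
    index=['Not Ordered', 'Grand Total'],
)

# Calculated row: Grand Total minus Not Ordered, column by column
pivot.loc['Ordered'] = pivot.loc['Grand Total'] - pivot.loc['Not Ordered']
print(pivot)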
