I created this pivot table:
pvt_tbl = pd.pivot_table(df, index = 'ID', columns='Month', values = ['money', 'days active'], aggfunc='sum')
I want to add a calculated field, avg_places_by_day.
I would rather not have to use .to_records.
I was hoping for something like this:
pvt_tbl['avg_places_by_day'] = pvt_tbl['places visited'] / pvt_tbl['days active']
to get something like this
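Because columns='Month' gives the pivot table two-level columns, dividing pvt_tbl['places visited'] by pvt_tbl['days active'] already returns a per-month DataFrame. What remains is attaching that result under a new top-level label. A minimal sketch, assuming 'places visited' is included in values (it is not in the call above) and using made-up sample data:

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Month": ["Jan", "Feb", "Jan", "Feb"],
    "places visited": [10, 6, 8, 9],
    "days active": [5, 3, 4, 3],
})

# Assumption: 'places visited' must be one of the pivoted values.
pvt_tbl = pd.pivot_table(
    df,
    index="ID",
    columns="Month",
    values=["places visited", "days active"],
    aggfunc="sum",
)

# Each selection is a DataFrame with one column per month,
# so the division is aligned month by month.
ratio = pvt_tbl["places visited"] / pvt_tbl["days active"]

# Wrap the ratio under a new top-level column label and append it.
ratio = pd.concat({"avg_places_by_day": ratio}, axis=1)
pvt_tbl = pd.concat([pvt_tbl, ratio], axis=1)

print(pvt_tbl)
```

The result stays a regular pivot table, so pvt_tbl['avg_places_by_day'] can be selected like any of the other value groups.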
A more efficient way to do this?
I have sales records imported from a spreadsheet. I start by importing that list into a DataFrame. I then need to get the average number of orders per customer by month and year.
The spreadsheet does not contain counts, just order and customer IDs.
So I have to count each ID, then drop duplicates and reset the index.
The final DataFrame is exported back into a spreadsheet and a SQL database.
The code below works and I get the desired output, but it seems like it could be more efficient. I am new to pandas and Python, so I'm sure I could do this better.
df_customers = df.filter(
['Month', 'Year', 'Order_Date', 'Customer_ID', 'Customer_Name', 'Patient_ID', 'Order_ID'], axis=1)
df_order_count = df.filter(
['Month', 'Year'], axis=1)
df_order_count['Order_cnt'] = df_customers.groupby(['Month', 'Year'])['Order_ID'].transform('nunique')
df_order_count['Customer_cnt'] = df_customers.groupby(['Month', 'Year'])['Customer_ID'].transform('nunique')
df_order_count['Avg'] = (df_order_count['Order_cnt'] / df_order_count['Customer_cnt']).astype(float).round(decimals=2)
df_order_count = df_order_count.drop_duplicates().reset_index(drop=True)
Try this
g = df.groupby(['Month', 'Year'])
df_order_count['Avg'] = g['Order_ID'].transform('nunique')/g['Customer_ID'].transform('nunique')
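Another option is to aggregate once per group instead of using transform followed by drop_duplicates. This is only a sketch with invented sample rows; the column names come from the question, and named aggregation needs pandas 0.25 or later:

```python
import pandas as pd

# Hypothetical sample rows standing in for the imported spreadsheet.
df = pd.DataFrame({
    "Month": [1, 1, 1, 2],
    "Year": [2020, 2020, 2020, 2020],
    "Order_ID": [101, 102, 103, 104],
    "Customer_ID": ["A", "A", "B", "A"],
})

# One row per (Month, Year) with both distinct counts computed in a single pass.
summary = (
    df.groupby(["Month", "Year"])
      .agg(Order_cnt=("Order_ID", "nunique"),
           Customer_cnt=("Customer_ID", "nunique"))
      .reset_index()
)

# Average orders per customer, rounded to two decimals as in the question.
summary["Avg"] = (summary["Order_cnt"] / summary["Customer_cnt"]).round(2)

print(summary)
```

Since agg already returns one row per group, no duplicate rows are created and there is nothing to drop afterwards.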
How can we apply multiple filters before pivot_table?
I have a DataFrame, df.
I'm able to apply one filter to it with:
pivot = df.query('Brand == ["HTC", "APPLE"]'
).pivot_table(index = ['Outlet'],
columns = ['Material'],
values = ['Sales'],
aggfunc = np.sum, fill_value=0, margins=True)
The code above works correctly with the single Brand filter, but how do I apply more than one filter? In the code below I tried to add a second one, but I got an error. Maybe I'm not passing the second filter correctly.
pivot = df.query('Brand == ["HTC", "APPLE"]', 'City == ["Delhi", "Mumbai"]'
).pivot_table(index = ['Outlet'],
columns = ['Material'],
values = ['Sales'],
aggfunc = np.sum, fill_value=0, margins=True)
Does anyone know how to do this?
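You need a logical operator inside the query string itself. Writing two separate Python strings joined by and evaluates to the second string only, so just the City filter would be applied. Put both conditions in one string:
pivot = df.query('Brand == ["HTC", "APPLE"] and City == ["Delhi", "Mumbai"]'
).pivot_table(index = ['Outlet'],
columns = ['Material'],
values = ['Sales'],
aggfunc = np.sum, fill_value=0, margins=True)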
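To see the combined filter working end to end, here is a small self-contained sketch. The rows are invented for illustration; only the column names come from the question. Both conditions live in one query string, because query takes a single expression as its first argument:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "Brand": ["HTC", "APPLE", "SAMSUNG", "HTC", "APPLE"],
    "City": ["Delhi", "Mumbai", "Delhi", "Pune", "Delhi"],
    "Outlet": ["Outlet A", "Outlet B", "Outlet A", "Outlet C", "Outlet A"],
    "Material": ["Phone", "Phone", "Phone", "Tablet", "Tablet"],
    "Sales": [100, 200, 50, 70, 30],
})

# Inside query, `column == [list]` behaves like an "is in" test,
# and `and` combines the two conditions within the same expression.
filtered = df.query('Brand == ["HTC", "APPLE"] and City == ["Delhi", "Mumbai"]')

pivot = filtered.pivot_table(
    index="Outlet",
    columns="Material",
    values="Sales",
    aggfunc="sum",
    fill_value=0,
    margins=True,
)

print(pivot)
```

More conditions can be chained the same way with and / or, or the filtering can be done with boolean masks such as df[df['Brand'].isin([...]) & df['City'].isin([...])].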
I created a pandas DataFrame that holds various summary statistics for several variables in my dataset. I want to name the columns of the dataframe, but every time I try it deletes all my data. Here is what it looks like without column names:
MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX-MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
sum_stats = pd.DataFrame(data=sum_stats)
sum_stats
My output looks like this:
But for some reason when I add column names:
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
sum_stats = pd.DataFrame(data=sum_stats, columns=columns)
sum_stats
My output becomes this:
Any idea why this is happening?
From the documentation for the columns parameter of the pd.DataFrame constructor:
[...] If data contains column labels, will perform column selection instead.
That means that, if the data passed is already a dataframe, for example, the columns parameter will act as a list of columns to select from the data.
If you change columns to equal a list of some columns that already exist in the dataframe that you're using, e.g. columns=[1, 4], you'll see that the resulting dataframe only contains those two columns, copied from the original dataframe.
Instead, you can assign the columns after you create the dataframe:
sum_stats.columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
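The names can also be supplied at concatenation time through the keys argument of pd.concat, which avoids the second DataFrame constructor call entirely. A minimal sketch with a made-up df (the real dataset is not shown):

```python
import pandas as pd

# Hypothetical data; the non-numeric column is skipped by numeric_only=True.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [10, 20, 60],
    "name": ["x", "y", "z"],
})

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX - MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)

names = ["MIN", "MAX", "RANGE", "MEAN", "MED"]

# keys= labels each Series as it becomes a column.
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1, keys=names)

# For comparison: passing columns= to the constructor selects columns by label,
# and labels not present in the data (here 0..4) come back empty (all NaN).
unnamed = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
selected = pd.DataFrame(data=unnamed, columns=names)

print(sum_stats)
print(selected)
```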
I have a data frame with many columns. I want to turn the values in the category-type column ('Series Name') into columns without losing the other columns.
Below you can see what I did:
I have this
and I use this code:
education_level.pivot(index=education_level.index, columns='Series Name')['Value']
And the result is this:
So I lose the columns 'Country Name', 'Country Code', and 'Year'. And I don't want that. I hope somebody can help me with this issue.
I want to get the following final result:
Country Name - Country Code - Year - Category 1 - Category 2 - ...
Meaning, I want to get the data for a country for a single year on one row.
If all combinations of values in the cols list are unique, use set_index with unstack:
cols = ['Country Name','Country Code','Year','Series Name']
df = education_level.set_index(cols)['Value'].unstack()
If not, use pivot_table with an aggregate function, e.g. mean:
df1 = education_level.pivot_table(index=['Country Name','Country Code','Year'],
columns='Series Name',
values='Value',
aggfunc='mean')
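Both variants can be tried on a small frame. This is a sketch with invented values; the column names follow the question, and the combinations are unique here, so the two results hold the same numbers:

```python
import pandas as pd

# Hypothetical long-format data shaped like the question's education_level.
education_level = pd.DataFrame({
    "Country Name": ["Chile", "Chile", "Peru", "Peru"],
    "Country Code": ["CHL", "CHL", "PER", "PER"],
    "Year": [2010, 2010, 2010, 2010],
    "Series Name": ["Primary", "Secondary", "Primary", "Secondary"],
    "Value": [95.0, 80.0, 90.0, 70.0],
})

cols = ["Country Name", "Country Code", "Year", "Series Name"]

# Variant 1: works only when every combination in cols occurs once.
wide = education_level.set_index(cols)["Value"].unstack()

# Variant 2: tolerates duplicates by aggregating them (mean here).
wide_mean = education_level.pivot_table(
    index=cols[:-1],
    columns="Series Name",
    values="Value",
    aggfunc="mean",
)

print(wide)
print(wide_mean)
```

In both cases the country, code and year end up in the row index rather than being lost; calling reset_index() on the result turns them back into ordinary columns.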
First, a pivot_table has to be made:
education_level= education_level.pivot_table(index=['Country Name','Country Code','Year'],
columns='Series Name',
values='Value',
aggfunc='mean')
Then, since I don't want multi indexing, I create categories and then reset the indexes:
education_level.columns = education_level.columns.add_categories(['Country Name','Country Code','Year'])
education_level.columns = pd.Index(list(education_level.columns))
education_level.reset_index(level=education_level.index.names, inplace = True)
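A possibly simpler variant of the same idea: when the pivoted column labels are categorical, reset_index can fail on some pandas versions because the index names are not existing categories. Converting the labels to a plain Index first, as in the second line above, may make the add_categories step unnecessary. A hedged sketch on made-up data, where observed=True is my own choice to keep only the categories that actually occur:

```python
import pandas as pd

# Hypothetical data; 'Series Name' is categorical, as in the question.
education_level = pd.DataFrame({
    "Country Name": ["Chile", "Chile", "Peru", "Peru"],
    "Country Code": ["CHL", "CHL", "PER", "PER"],
    "Year": [2010, 2010, 2010, 2010],
    "Series Name": pd.Categorical(["Primary", "Secondary", "Primary", "Secondary"]),
    "Value": [95.0, 80.0, 90.0, 70.0],
})

wide = education_level.pivot_table(
    index=["Country Name", "Country Code", "Year"],
    columns="Series Name",
    values="Value",
    aggfunc="mean",
    observed=True,
)

# Replace the categorical column labels with a plain Index,
# then move the three index levels back into regular columns.
wide.columns = pd.Index(list(wide.columns))
flat = wide.reset_index()

print(flat)
```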
This is my data format. I want to reset the index and flatten it into a single table, so that I can count all the IDs (the second row) and plot a histogram of the counts by date.
Is there a simple way to do this?
If reset_index() is not working, you can also convert the table manually.
Assume df1 is your existing DataFrame; we'll create df2, the new one you want.
df2 = pd.DataFrame()
df2['DateTime'] = df1.index.get_level_values(0).tolist()
df2['ID1'] = df1.index.get_level_values(1).tolist()
df2['ID2'] = df1['ID2'].values.tolist()
df2['Count'] = df1['Count'].values.tolist()
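For reference, this is what the reset_index() route looks like on a small two-level index. The data and the level names DateTime and ID1 are assumptions taken from the snippet above, since the original table is not shown:

```python
import pandas as pd

# Hypothetical frame with a (DateTime, ID1) MultiIndex, mirroring the answer's names.
index = pd.MultiIndex.from_tuples(
    [("2021-01-01", 1), ("2021-01-01", 2), ("2021-01-02", 1)],
    names=["DateTime", "ID1"],
)
df1 = pd.DataFrame({"ID2": [7, 8, 9], "Count": [3, 1, 2]}, index=index)

# reset_index moves every index level into an ordinary column.
df2 = df1.reset_index()

# Total count per date, ready for plotting (e.g. per_day.plot(kind="bar")).
per_day = df2.groupby("DateTime")["Count"].sum()

print(df2)
print(per_day)
```

If the index levels have no names, reset_index() labels them level_0, level_1 and so on, and they can be renamed afterwards.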