Unmerge cells when using groupby (PANDAS) - python

I grouped some data using groupby:
df1['N° of options'] = df.groupby(['Country','Restaurant']).Food.size()
The result is a dataframe in which the grouped values appear merged. Instead, I'd like these values to be repeated in every row.
Any clue how I can display the data like this?
For now, I get something like this:
Thank you!!

Assuming that grouped_df is your grouped dataframe, you can use pandas.DataFrame.reset_index to fill down the rows of your two indexes.
>>> print(grouped_df)
>>> print(grouped_df.reset_index())
Another way to do it is to add the as_index=False argument to your groupby call:
grouped_df = df.groupby(['SG_UF', 'SG_PARTIDO'], as_index=False).sum()
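Both approaches can be sketched as follows. This is a minimal example with made-up data: only the column names come from the question, and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Country": ["Italy", "Italy", "Italy", "Japan"],
    "Restaurant": ["Roma", "Roma", "Napoli", "Edo"],
    "Food": ["pizza", "pasta", "pizza", "sushi"],
})

# size() returns a Series indexed by (Country, Restaurant). Repeated index
# labels are only hidden when printed, which is why they look "merged".
counts = df.groupby(["Country", "Restaurant"]).size()

# reset_index turns the index levels into ordinary columns, so every row
# carries its own Country and Restaurant value.
flat = counts.reset_index(name="N° of options")

# Alternative: keep the group keys as columns from the start.
flat_alt = df.groupby(["Country", "Restaurant"], as_index=False)["Food"].count()

print(flat)
```

In flat_alt the count ends up in the Food column, so it may need renaming afterwards.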

If I understand correctly, you are trying to sort rather than group, since you mentioned that you want to see the values.
The signature is df_name.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
In your code, it could look like:
df.sort_values(by=['Country', 'Restaurant'])
Use the other arguments as required, for example to change the sort order.
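A small sketch of the sorting approach, assuming made-up sample rows (only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Country": ["Japan", "Italy", "Italy"],
    "Restaurant": ["Edo", "Roma", "Napoli"],
    "Food": ["sushi", "pizza", "pasta"],
})

# Sort by Country first, then by Restaurant within each country.
# ignore_index=True renumbers the rows 0..n-1 after sorting.
sorted_df = df.sort_values(by=["Country", "Restaurant"], ignore_index=True)

print(sorted_df)
```

Sorting keeps one row per original record, so the Country value is repeated on every line, but it does not compute the counts that the groupby in the question produces.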

Related

Pandas conditionally copy values from one column to another row

I have this Dataframe:
I would like to copy the value of the Date column to the New_Date column, not only in the same row but in every row that has the same User_ID value.
So, it will be:
I tried groupby and then copying, but groupby turned all the values into lists, and rows with the same User_ID can have different values in the other columns, which messes up many things.
I also tried:
df['New_Date'] = df.apply(lambda x: x['Date'] if x['User_ID'] == x['User_ID'] else x['New_Date'], axis=1)
But it only copied values to the same row and left the other two empty.
And this:
if (df['User_ID'] == df['User_ID']):
df['New_Date'] = np.where(df['New_Date'] == '', df['Date'], df['New_Date'])
Neither did what I intended.
Help is appreciated, Thanks!
Try this:
df['New_Date'] = df.groupby('User_ID')['Date'].transform('first')
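As a rough illustration of the transform approach, here is a sketch on invented data. It assumes that each user has a date on only one row and that the other rows are missing; the original data was not posted as text, so this layout is a guess.

```python
import pandas as pd

# Hypothetical data: one known date per user, the rest missing (NaT).
df = pd.DataFrame({
    "User_ID": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(["2021-01-05", None, None, None, "2021-03-09"]),
})

# 'first' picks the first non-missing Date in each User_ID group, and
# transform broadcasts it back to every row of that group.
df["New_Date"] = df.groupby("User_ID")["Date"].transform("first")

print(df)
```

Because transform returns a result aligned with the original rows, the other columns are left untouched, unlike aggregating each group into lists.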
If I'm understanding you correctly, just copy the Date column and then forward-fill the gaps with .ffill(). If you post your data as text I can provide example code.

unable to concat the output for multiple rows

I have a dataframe like the one below.
If I write code like this:
df.iloc[0]
And if I write code like this:
df.iloc[3]
I want to concat df.iloc[0], df.iloc[1], df.iloc[2] and so on, up to however many rows are present, but I'm unable to do it with a for loop. Can anyone help me with this?
Use concat with comprehension:
df1 = pd.concat((df.loc[i] for i in df.index))
Or:
df1 = pd.concat((df.iloc[i] for i in range(len(df.index))))
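To show what the comprehension produces, here is a sketch on a tiny invented dataframe (the column names a and b are placeholders):

```python
import pandas as pd

# Hypothetical two-row frame; each df.iloc[i] is a Series indexed by column name.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Concatenating the row Series stacks them end to end into one long Series.
stacked = pd.concat([df.iloc[i] for i in range(len(df))])

print(stacked)
```

The result repeats the column labels in its index, one block per original row.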

Vlookup/Map value from one dataframe to another dataframe in Python

I want to do something similar to a VLOOKUP in Python.
Here is the dataframe in which I want to look up the value 'Flow_Rate_Lupa':
And here is the dataframe I want to fill, using the same month and day to look up the missing values. Can anyone help me work out how to do this?
I usually merge the two data frames and define an indicator, then keep only the rows where the indicator says 'both', meaning the data is in both data frames.
import pandas as pd
mergedData = pd.merge(df1, df2, how='left', left_on='Key1', right_on='Key2', indicator='Exists')
filteredData = mergedData[mergedData['Exists'] == 'both']
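The indicator approach can be tried on a small example. The frames and the left_val/right_val columns below are invented for illustration; Key1, Key2 and Exists are the placeholder names used in the answer.

```python
import pandas as pd

# Hypothetical frames for illustration only.
df1 = pd.DataFrame({"Key1": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"Key2": [2, 3, 4], "right_val": ["x", "y", "z"]})

# indicator='Exists' adds a column saying where each row was found:
# 'left_only', 'right_only' or 'both'.
merged = pd.merge(df1, df2, how="left", left_on="Key1", right_on="Key2",
                  indicator="Exists")

# Keep only the rows that had a match in df2.
matched = merged[merged["Exists"] == "both"]

print(matched)
```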
Use DataFrame.merge with a left join, then replace the missing values with Series.fillna, using DataFrame.pop to read the helper column and remove it in one step:
df = df2.merge(df1, on=['month','day'], how='left', suffixes=('','_'))
df['Flow_Rate_Lupa'] = df['Flow_Rate_Lupa'].fillna(df.pop('Flow_Rate_Lupa_'))
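Here is a self-contained sketch of the merge-and-fill approach. The sample values are invented; the column names month, day and Flow_Rate_Lupa are taken from the question and the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table (df1) and table with gaps (df2).
df1 = pd.DataFrame({"month": [1, 1, 2], "day": [1, 2, 1],
                    "Flow_Rate_Lupa": [10.0, 11.0, 12.0]})
df2 = pd.DataFrame({"month": [1, 1, 2], "day": [1, 2, 1],
                    "Flow_Rate_Lupa": [np.nan, 99.0, np.nan]})

# The left join keeps every row of df2; the looked-up values arrive in a
# second column with the '_' suffix.
df = df2.merge(df1, on=["month", "day"], how="left", suffixes=("", "_"))

# Fill only the missing values; pop returns the helper column and drops it.
df["Flow_Rate_Lupa"] = df["Flow_Rate_Lupa"].fillna(df.pop("Flow_Rate_Lupa_"))

print(df)
```

Values already present in df2 (99.0 here) are kept, and only the gaps are filled from df1.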

Filling a dataframe with multiple dataframe values

I have about 100 dataframes whose values need to go into one big dataframe. I am presenting the question with two of them:
import pandas as pd
df1 = pd.DataFrame([1,1,1,1,1], columns=["A"])
df2 = pd.DataFrame([2,2,2,2,2], columns=["A"])
Please note that both dataframes have the same column names.
I have a master dataframe with repeated index values, as follows:
master_df=pd.DataFrame(index=df1.index)
master_df= pd.concat([master_df]*2)
Expected output:
master_df['A']=[1,1,1,1,1,2,2,2,2,2]
I am using a for loop to replace every n rows of master_df with df1, df2, ... df100.
Please suggest a better way of doing it.
In fact, df1, df2, ... df100 are the output of a function whose input is the column A values (1, 2). I was wondering if there is something like
another_df=master_df['A'].apply(lambda x: function(x))
Thanks in advance.
If you want to concatenate the dataframes you could just use pandas concat with a list as the code below shows.
First you can add df1 and df2 to a list:
df_list = [df1, df2]
Then you can concat the dfs:
master_df = pd.concat(df_list)
I used the default value of 0 for 'axis' in the concat function (which is what I think you are looking for), but if you want to concatenate the different dfs side by side you can just set axis=1.
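Since the question mentions that the dataframes come out of a function, the list can also be built in a comprehension. In this sketch make_frame is a hypothetical stand-in for that function, which the question does not show:

```python
import pandas as pd

def make_frame(value, n_rows=5):
    # Hypothetical stand-in for the asker's function: returns a
    # single-column frame filled with the input value.
    return pd.DataFrame({"A": [value] * n_rows})

# Build all frames and stack them vertically (axis=0 is the default).
master_df = pd.concat([make_frame(v) for v in (1, 2)])

print(master_df)
```

Each piece keeps its own 0..4 index, which matches the repeated index of master_df in the question; pass ignore_index=True to pd.concat if a fresh running index is preferred.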

Unable to rename the column of a Series

I am unable to rename the column of a series:
tabla_paso4
Date decay
2015-06-29    0.003559
2015-09-18    0.025024
2015-08-24    0.037058
2014-11-20    0.037088
2014-10-02    0.037098
Name: decay, dtype: float64
I have tried:
tabla_paso4.rename('decay_acumul')
tabla_paso4.rename(columns={'decay':'decay_acumul'})
I already had a look at the possible duplicate, but I don't understand why applying:
tabla_paso4.rename(columns={'decay':'decay_acumul'},inplace=True)
returns the series like this:
Date
2015-06-29    0.003559
2015-09-18    0.025024
2015-08-24    0.037058
2014-11-20    0.037088
2014-10-02    0.037098
dtype: float64
It looks like your tabla_paso4 is a Series, not a DataFrame.
You can make a DataFrame with named column out of it:
new_df = tabla_paso4.to_frame(name='decay_acumul')
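A short sketch of the difference, rebuilding part of the Series from the question by hand (the construction code is an assumption; the values and names are taken from the printed output):

```python
import pandas as pd

# Rebuild the first two rows of the Series shown in the question.
tabla_paso4 = pd.Series(
    [0.003559, 0.025024],
    index=pd.Index(["2015-06-29", "2015-09-18"], name="Date"),
    name="decay",
)

# A Series has a single name rather than columns. rename with a scalar
# returns a renamed copy, so the result has to be assigned.
renamed = tabla_paso4.rename("decay_acumul")

# to_frame builds a one-column DataFrame with the chosen column name.
new_df = tabla_paso4.to_frame(name="decay_acumul")

print(renamed.name)
print(new_df.columns.tolist())
```

This also suggests why the first attempt in the question appeared to do nothing: tabla_paso4.rename('decay_acumul') returns a new Series and leaves the original unchanged unless the result is assigned.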
If tabla_paso4 is a DataFrame with the columns Date and decay, try
tabla_paso4.columns = ['Date', 'decay_acumul']
or
tabla_paso4.rename(columns={'decay':'decay_acumul'}, inplace=True)
What went wrong earlier is that the inplace=True part was missing, so the renamed DataFrame was returned but not assigned.
I hope this helps!
