Averaging data of dataframe columns based on redundancy of another column - python

I want to average the values of one column in a pandas dataframe if the rows share the same 'id', which is stored in another column of the same dataframe. To keep it simple, I have:
and I want:
where it is clear that the elements of the 'nx' and 'ny' columns have been averaged whenever their 'nodes' value was the same. The 'maille' column, on the other hand, has to remain untouched.
I'm trying with groupby, but so far I couldn't manage to keep the column 'maille' as it is.
Any idea?

Use GroupBy.transform, specifying the column names in a list for the aggregates, and assign the result back:
cols = ['nx', 'ny']
df[cols] = df.groupby('nodes')[cols].transform('mean')
print(df)
Another idea with DataFrame.update:
df.update(df.groupby('nodes')[cols].transform('mean'))
print(df)
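A minimal runnable sketch of the transform approach, using made-up data in the shape the question describes (the actual frames were not shown):

```python
import pandas as pd

# Hypothetical data: 'nodes' is the id column, 'maille' must stay untouched
df = pd.DataFrame({
    'nodes': [1, 1, 2, 2],
    'nx': [0.0, 2.0, 4.0, 6.0],
    'ny': [1.0, 3.0, 5.0, 7.0],
    'maille': ['a', 'b', 'c', 'd'],
})

cols = ['nx', 'ny']
# transform('mean') returns a frame aligned with df, so it can be
# assigned back in place; 'maille' is never touched
df[cols] = df.groupby('nodes')[cols].transform('mean')
print(df)
```

Rows with nodes=1 both get nx=1.0, ny=2.0, rows with nodes=2 get nx=5.0, ny=6.0, and 'maille' keeps its original values.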

How to get column names in pandas of get_dummies

After I created a data frame and applied get_dummies to it:
df_final = pd.get_dummies(df, columns=['type'])
I got the new columns that I want and everything is working.
My question is: how can I get the names of the new columns created by get_dummies? My dataframe is dynamic, so I can't reference them statically; I wish to save all the new column names in a list.
An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy the output of pd.get_dummies, and target_cols your list of columns to get dummies for.
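A small self-contained sketch of the column-difference trick, with a hypothetical 'type' column standing in for the question's data:

```python
import pandas as pd

# Hypothetical frame with a categorical 'type' column
df = pd.DataFrame({'id': [1, 2, 3], 'type': ['a', 'b', 'a']})

target_cols = ['type']
df_dummy = pd.get_dummies(df, columns=target_cols)

# Columns present after get_dummies but absent before are exactly
# the new dummy columns; difference() returns them sorted
new_cols = df_dummy.columns.difference(df.columns).tolist()
print(new_cols)  # ['type_a', 'type_b']
```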

How to get rows from one dataframe based on another dataframe

I just edited the question as maybe I didn't make myself clear.
I have two dataframes (MR and DT)
The column 'A' in dataframe DT is a subset of the column 'A' in dataframe MR; the two frames are only similar (not equal) in this ID column, and the rest of the columns are different, as is the number of rows.
How can I get the rows from dataframe MR whose MR['ID'] values are equal to values in DT['ID']? Note that values in 'ID' can appear several times in the same column.
(DT has 1538 rows and MR has 2060 rows.)
I tried some lines proposed here: https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe but I got bizarre results, as I don't fully understand the methods they proposed (and the goal is a little different).
Thanks!
Take a look at pandas.Series.isin() method. In your case you'd want to use something like:
matching_id = MR.ID.isin(DT.ID) # This returns a boolean Series of whether values match or not
# Now filter your dataframe to keep only matching rows
new_df = MR.loc[matching_id, :]
Or if you want to just get a new dataframe of combined records for the same ID you need to use merge():
new_df = pd.merge(MR, DT, on='ID')
This will create a new dataframe with columns from both original dfs but only where ID is the same.
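A runnable sketch contrasting the two approaches, with invented frames whose IDs overlap and repeat the way the question describes:

```python
import pandas as pd

# Hypothetical frames: DT['ID'] is a subset of MR['ID'], and IDs may repeat
MR = pd.DataFrame({'ID': [1, 1, 2, 3, 4], 'x': [10, 11, 12, 13, 14]})
DT = pd.DataFrame({'ID': [1, 3], 'y': ['a', 'b']})

# Filtering: keep MR rows whose ID appears anywhere in DT['ID']
matching_id = MR.ID.isin(DT.ID)
filtered = MR.loc[matching_id, :]

# Merging: combine columns from both frames on matching IDs
merged = pd.merge(MR, DT, on='ID')
```

`filtered` keeps the three MR rows with IDs 1, 1 and 3 (MR columns only), while `merged` has those same rows joined with DT's columns.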

How to create a dataframe with the column included in groupby clause?

I have a data frame with columns A and Amount. I have done a group by on 'A'. Now I want to get these values into a new data frame; how can I achieve this?
top_plt=pd.DataFrame(top_plt.groupby('A')['Amount'].sum())
The resulting dataframe contains only the Amount column but the groupby 'A' column is missing.
Example:
Result:
The DataFrame constructor is not necessary; better is to add as_index=False to groupby:
top_plt = top_plt.groupby('A', as_index=False)['Amount'].sum()
Or add DataFrame.reset_index:
top_plt = top_plt.groupby('A')['Amount'].sum().reset_index()
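A minimal sketch of the as_index=False variant, assuming a simple frame with an 'A' grouping column and a numeric 'Amount':

```python
import pandas as pd

# Hypothetical data standing in for the question's frame
top_plt = pd.DataFrame({'A': ['x', 'x', 'y'], 'Amount': [1, 2, 3]})

# as_index=False keeps 'A' as a regular column instead of the index
out = top_plt.groupby('A', as_index=False)['Amount'].sum()
print(out.columns.tolist())  # ['A', 'Amount']
```

Without as_index=False (and without reset_index), 'A' would become the index and the result would contain only the Amount column.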

Fill blank values of one dataframe with the values of another dataframe based on conditions - pandas

I have the above dataframe df, and I have the following dataframe as df2.
I want to fill the missing values in df with the values in df2 corresponding to the id.
Also, for Player1, Player2, Player3: if the value is missing, I want to replace Player1, Player2, Player3 of df with the corresponding values of df2.
Thus the resultant dataframe would look like this.
Notice: Rick, Scott, Andrew are still forward, as they are in df. I just replaced the players in df with the corresponding players in df2.
So far, I have attempted to fill the blank values in df with the values in df2:
df = pd.read_csv('file1.csv')
for s in list(df.columns[1:]):
    df[s] = df[s].str.strip()
df.fillna('', inplace=True)
df.replace(r'', np.nan, regex=True, inplace=True)
df2 = pd.read_csv('file2.csv')
for s in list(df2.columns[1:]):
    df2[s] = df2[s].str.strip()
df.set_index('Team ID', inplace=True)
df2.set_index('Team ID', inplace=True)
df.fillna(df2, inplace=True)
df.reset_index(inplace=True)
I am getting the above result. How can I get the result in Image Number 3?
Using combine_first:
df1 = df1.set_index('Team ID')
df2 = df2.set_index('Team ID')
df2 = df2.combine_first(df1.replace('', np.nan)).reset_index()
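A small sketch of the combine_first pattern with invented two-row frames (the question's actual CSVs were not shown); note combine_first keeps the caller's non-null values and fills its NaNs from the argument:

```python
import numpy as np
import pandas as pd

# Hypothetical frames keyed by 'Team ID'; df1 has a blank to be overridden
df1 = pd.DataFrame({'Team ID': [1, 2], 'Player1': ['Rick', '']})
df2 = pd.DataFrame({'Team ID': [1, 2], 'Player1': ['Bob', 'Alice']})

df1 = df1.set_index('Team ID')
df2 = df2.set_index('Team ID')

# df2's values win wherever present; replacing '' with NaN first makes
# sure blanks in df1 cannot leak through where df2 happens to be NaN
result = df2.combine_first(df1.replace('', np.nan)).reset_index()
```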

Get unique values of multiple columns as a new dataframe in pandas

Given a pandas data frame df with at least columns C1, C2, C3, how would you get all the unique C1, C2, C3 value combinations as a new DataFrame?
In other words, similar to:
SELECT C1,C2,C3
FROM T
GROUP BY C1,C2,C3
I tried:
print(df.groupby(by=['C1','C2','C3']))
but I'm getting
<pandas.core.groupby.DataFrameGroupBy object at 0x000000000769A9E8>
I believe you need drop_duplicates if you want all unique triples:
df = df.drop_duplicates(subset=['C1','C2','C3'])
If you want to use groupby, add first:
df = df.groupby(by=['C1','C2','C3'], as_index=False).first()
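A runnable sketch of the drop_duplicates approach, using a made-up frame with one repeated (C1, C2, C3) triple:

```python
import pandas as pd

# Hypothetical frame: the first two rows share the same (C1, C2, C3) triple
df = pd.DataFrame({
    'C1': [1, 1, 2],
    'C2': ['a', 'a', 'b'],
    'C3': [True, True, False],
})

# Keep one row per unique (C1, C2, C3) combination
unique_triples = df.drop_duplicates(subset=['C1', 'C2', 'C3'])
print(len(unique_triples))  # 2
```

This mirrors the SQL GROUP BY in the question: each distinct triple survives exactly once.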
