I have the following data sample and I am trying to flatten it with pandas, so that there is one row per Candidate_Name.
This is my implementation:
df = df.merge(df, on='Candidate_Name')
But I am not getting the desired result. My desired output is as follows: all the rows that match a Candidate_Name should end up in a single row, where duplicate column names may get a suffix such as _x.
I think you need GroupBy.cumcount with DataFrame.unstack, and then flatten the resulting MultiIndex columns: the first row of each group keeps the original column names, and later rows get a number appended to avoid duplicate column names:
df = df.set_index(['Candidate_Name', df.groupby('Candidate_Name').cumcount()]).unstack()
df.columns = [a if b == 0 else f'{a}_{b}' for a, b in df.columns]
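As a minimal sketch of how the cumcount and unstack steps fit together, here is the same idea on a small made-up frame. Only Candidate_Name comes from the question; the Skill and Years columns and their values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the Candidate_Name column is from the question.
df = pd.DataFrame({
    "Candidate_Name": ["Ann", "Ann", "Bob"],
    "Skill": ["SQL", "Python", "Java"],
    "Years": [3, 5, 2],
})

# Number each row within its Candidate_Name group: 0 for the first, 1 for the second, ...
occurrence = df.groupby("Candidate_Name").cumcount()

# Move the occurrence number into the columns, leaving one row per candidate.
wide = df.set_index(["Candidate_Name", occurrence]).unstack()

# Flatten the MultiIndex: the first occurrence keeps the bare name, later ones get a suffix.
wide.columns = [a if b == 0 else f"{a}_{b}" for a, b in wide.columns]
wide = wide.reset_index()
print(wide)
```

Candidates with fewer rows than the largest group get NaN in the suffixed columns, which may also turn integer columns into floats.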
Related
I have a dataframe which I have created out of a groupby operation (see code and output below):
Index=right_asset_df.groupby(['Year','Asset_Type']).agg({'Titre':'count','psqm':'median','Superficie':'median', 'Montant':'median'}).reset_index()
The output is:
Instead of having Asset_Type as rows, I would like to have it as columns, meaning that the new output Index should have one column for each Asset_Type (Appart and Villa).
Here is an example of the output:
As you can see, each Asset_Type gets its own column.
How can I do that in Python? Thanks.
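A quick way is to use pivot (the arguments are passed by keyword, since recent pandas versions no longer accept them positionally):
# Pivot table with (index, columns, values)
df = Index.pivot(index='Year', columns='Asset_Type', values=['Titre','psqm','Superficie','Montant']).reset_index()
# Flatten the multi-level columns; rstrip removes the trailing '_' left on 'Year'
df.columns = ['_'.join(col).rstrip('_') for col in df.columns]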
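To see the shape this produces, here is a small sketch on an invented frame. The Year, Asset_Type, Titre and psqm names are taken from the question, but the numbers are made up and only two value columns are used to keep it short.

```python
import pandas as pd

# Hypothetical aggregated frame, shaped like the groupby output in the question.
Index = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020],
    "Asset_Type": ["Appart", "Villa", "Appart", "Villa"],
    "Titre": [10, 4, 12, 5],
    "psqm": [1500.0, 2100.0, 1600.0, 2200.0],
})

# One row per Year, one column per (value, Asset_Type) pair.
wide = Index.pivot(index="Year", columns="Asset_Type", values=["Titre", "psqm"])

# Flatten before reset_index so that 'Year' is not touched by the join.
wide.columns = ["_".join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

Flattening the column names before calling reset_index is a design choice here: it avoids having to clean up the empty second level that reset_index would otherwise attach to Year.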
I have the following pandas dataframe
I would like to convert it to a pandas dataframe with one row. Is there a simple way to do it? I tried pivot but was getting weird results.
You can pivot, swap the level of columns names, shift values up to fill NaN values and flatten column names:
out = df.pivot(columns='Study Identification').swaplevel(0,1,axis=1).apply(lambda x: pd.Series(x.dropna().values)).fillna('')
out.columns = out.columns.map(''.join)
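Since the question's data is not shown, here is a rough sketch of the pivot-and-shift-up steps on an invented two-row frame. Only the 'Study Identification' column name comes from the answer; the Value and Unit columns and all cell values are assumptions.

```python
import pandas as pd

# Hypothetical input: one row per study, to be collapsed into a single row.
df = pd.DataFrame({
    "Study Identification": ["S1", "S2"],
    "Value": ["a", "b"],
    "Unit": ["kg", "cm"],
})

out = (
    df.pivot(columns="Study Identification")   # columns become (field, study) pairs
      .swaplevel(0, 1, axis=1)                 # reorder to (study, field)
      .apply(lambda x: pd.Series(x.dropna().values))  # shift non-NaN values to the top
      .fillna("")
)

# Join the two column levels into plain strings such as 'S1Value'.
out.columns = out.columns.map("".join)
print(out)
```

The pivot alone leaves each study's values on its own row with NaN elsewhere; the apply step is what compacts them into one row.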
In your case, you can reshape the df with unstack:
s = df.set_index('A',append=True).unstack(level=1).swaplevel(0,1,axis=1)
s.columns = s.columns.map(''.join)
What I have is a list of DataFrames.
It is important to note that the DataFrames differ in shape, with between 2 and 7 columns, and that the columns are named 0 up to the number of columns minus one (e.g. df1 has 5 columns named 0,1,2,3,4 and df2 has 4 columns named 0,1,2,3).
What I would like is to check whether any row in a column contains a certain string, and if so delete that column.
list_dfs1=[df1,df2,df3...df100]
What I have done so far is below, and I get an error that column 5 is not in the axis (it does exist in some of the DataFrames):
for i, df in enumerate(list_dfs1):
    for index,row in df.iterrows():
        if np.where(row.str.contains("DEC")):
            df.drop(index, axis=1)
Any suggestions?
You could try:
for df in list_dfs1:
    for col in df.columns:
        # If you are unsure about column types, cast column as string:
        df[col] = df[col].astype(str)
        # Check if the column contains the string of interest
        if df[col].str.contains("DEC").any():
            df.drop(columns=[col], inplace=True)
If you know that all columns are already of type string, you can skip the df[col] = df[col].astype(str) step.
You can write a custom function that checks whether a column contains the pattern, using pd.Series.str.contains with pd.Series.any:
def func(s):
    return s.str.contains('DEC').any()
list_df = [df.loc[:, ~df.apply(func)] for df in list_dfs1]
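A small self-contained sketch of this approach follows, assuming two made-up frames with integer column labels and different widths. The has_pattern helper name and the cast to str (so numeric columns do not fail on the .str accessor) are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical frames with integer column labels and different widths.
df1 = pd.DataFrame({0: ["JAN", "FEB"], 1: ["x DEC", "y"], 2: [1, 2]})
df2 = pd.DataFrame({0: ["a", "b"], 1: ["c", "DEC 2020"]})
list_dfs1 = [df1, df2]

def has_pattern(s, pattern="DEC"):
    # Cast to str first so that numeric columns can be searched too.
    return s.astype(str).str.contains(pattern, regex=False).any()

# Keep only the columns where the pattern was not found; the originals are left unchanged.
cleaned = [df.loc[:, ~df.apply(has_pattern)] for df in list_dfs1]

for frame in cleaned:
    print(frame)
```

Because this builds new frames rather than dropping in place, the original list stays intact, which can make it easier to check what was removed.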
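I would take another approach: concatenate the list into one data frame and then eliminate the columns where the string is found.
import pandas as pd
df = pd.concat(list_dfs1)
Let us say your condition is to eliminate any column that has a cell equal to "DEC". Note that this tests exact equality rather than substring containment. Also, concatenating frames of different widths fills the missing cells with NaN, so a mask followed by dropna would remove those columns as well; selecting with a boolean column mask avoids that:
df.loc[:, ~df.eq("DEC").any()]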
My DF has the following columns:
df.columns = ['not-changing1', 'not-changing2', 'changing1', 'changing2', 'changing3', 'changing4']
I want to reorder the last 4 columns without using column names, using their positional index instead.
So, the final column order would be:
result.columns = ['not-changing1', 'not-changing2', 'changing1', 'changing3', 'changing2', 'changing4']
How do I do that?
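One possible way, sketched here under the assumption that only the columns at positions 3 and 4 need to trade places (which is what the desired order above shows), is to build a list of positions and select with iloc. The single row of numbers is made up.

```python
import pandas as pd

# Column names are from the question; the single data row is invented.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["not-changing1", "not-changing2", "changing1",
             "changing2", "changing3", "changing4"],
)

# Start from the current positional order: [0, 1, 2, 3, 4, 5].
order = list(range(df.shape[1]))

# Swap the two positions that should trade places.
i, j = 3, 4
order[i], order[j] = order[j], order[i]

# Select columns by position only; no column names are used.
result = df.iloc[:, order]
print(result.columns.tolist())
```

The same pattern extends to any permutation: edit the order list, then pass it to iloc.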
I have a dataframe with 4 columns. The first 3 columns are numerical variables that describe the features of the last column, which holds strings.
I want to merge the last string column by the previous 3 columns using groupby. That part works: strings that share the same values in the first three columns are merged successfully.
The original dataframe had 1200 rows and the merged one has 1100. However, the merged dataframe has a MultiIndex (hierarchical index) and only contains 2 columns. I tried to reindex it with a generated ascending list of integers, but that failed.
df1.columns
*[Out]Index(['time', 'column','author', 'text'], dtype='object')
series = df1.groupby(['time', 'column','author'])['body_text'].sum()  # merge the last column by the first 3 columns
dfx = series.to_frame()  # get the new df
dfx.columns
*[Out]Index(['author', 'text'], dtype='object')
len(dfx)
*[Out]1100
indexs = list(range(1100))
dfx.reindex(index = indexs)
*[Out]Exception: cannot handle a non-unique multi-index!
Reindexing is not necessary here. It is better to call reset_index on the result, or to pass as_index=False to DataFrame.groupby:
dfx = df1.groupby(['time', 'column','author'])['body_text'].sum().reset_index()
Or:
dfx = df1.groupby(['time', 'column','author'], as_index=False)['body_text'].sum()
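To illustrate, here is a short sketch on invented data showing that both variants give a flat frame with a plain 0..n-1 index. The column names follow the code above; the values are assumptions.

```python
import pandas as pd

# Hypothetical data; column names follow the groupby call in the answer.
df1 = pd.DataFrame({
    "time": [1, 1, 2],
    "column": [10, 10, 20],
    "author": [7, 7, 8],
    "body_text": ["foo ", "bar", "baz"],
})

keys = ["time", "column", "author"]

# Variant 1: aggregate, then move the group keys back into regular columns.
dfx_a = df1.groupby(keys)["body_text"].sum().reset_index()

# Variant 2: ask groupby not to put the keys in the index in the first place.
dfx_b = df1.groupby(keys, as_index=False)["body_text"].sum()

print(dfx_a)
print(dfx_b)
```

Either way the group keys end up as ordinary columns, so there is no MultiIndex left to reindex.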