How to reformat dataframe using pandas? - python

I have the following dataframe:
data = {'Names':['Abbey','English','Maths','Billy','English','Maths','Charlie','English','Maths'],'Subject Grade':['Student Name',85,91,'Student Name',82,74,'Student Name',83,96]}
df = pd.DataFrame(data, columns = ['Names','Subject Grade'])
I would like to reformat the dataframe in order for the names, subject and grades to all be in their respective columns as follows:
data2 = {'Names':['Abbey','Abbey','Billy','Billy','Charlie','Charlie'],'Subject':['English','Maths','English','Maths','English','Maths'],'Grade':[85,91,82,74,83,96]}
df2 = pd.DataFrame(data2, columns = ['Names','Subject','Grade'])

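You can use the following steps:
# Keep the value in 'Names' only on the rows marked "Student Name"
df['name'] = df['Names'].mask(df['Subject Grade'] != "Student Name")
# Carry each student's name down to the subject rows below it
df['name'] = df['name'].ffill()
# Drop the marker rows, then give the columns their final names
df = df.query('`Subject Grade` != "Student Name"')
df = df.rename(columns={'Names':'Subject', 'Subject Grade':'Grade', 'name':'Names'})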
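For reference, here is a self-contained sketch of the same idea run on the data from the question. The variable names is_header, student and out are illustrative choices, and the final column order and integer conversion are assumptions based on the desired df2.

```python
import pandas as pd

data = {'Names': ['Abbey', 'English', 'Maths', 'Billy', 'English', 'Maths',
                  'Charlie', 'English', 'Maths'],
        'Subject Grade': ['Student Name', 85, 91, 'Student Name', 82, 74,
                          'Student Name', 83, 96]}
df = pd.DataFrame(data)

# Rows whose 'Subject Grade' is the marker text hold a student name
is_header = df['Subject Grade'].eq('Student Name')

# Keep names only on marker rows, then carry them down to the subject rows
student = df['Names'].where(is_header).ffill()

# Drop the marker rows and rename the remaining columns
out = df.loc[~is_header].rename(columns={'Names': 'Subject',
                                         'Subject Grade': 'Grade'})
out['Names'] = student[~is_header]
out['Grade'] = out['Grade'].astype(int)  # grades were stored as objects
out = out[['Names', 'Subject', 'Grade']].reset_index(drop=True)
print(out)
```

The reset_index call is optional; it only renumbers the rows after the marker rows are removed.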

Related

How Can I Convert my DataFrame to other column headers?

I have the dataframe of following type - notice the column header groups:
But I want to convert my headers as below:
How can I do this?
I hope this code works for you:
unstack = your_df.unstack().reset_index()
unstack.columns = ["Stock Info", "Bank Ticker", "Date", "Prices"]
categories = unstack["Stock Info"].unique()
group = unstack.groupby("Bank Ticker")
con = pd.DataFrame()
for g, d in group:
    d = d.copy()  # work on a copy so the assignment below does not warn
    d["Stock Info"] = pd.Categorical(d["Stock Info"], categories=categories, ordered=True)
    d = d.pivot(index=["Date","Stock Info"], columns="Bank Ticker", values="Prices").unstack()
    con = pd.concat([con, d], axis=1, sort=False)
print(con)

How to aggregate a dataframe then transpose it with Pandas

I'm trying to achieve this kind of transformation with Pandas.
I made this code but unfortunately it doesn't give the result I'm searching for.
Code:
import pandas as pd
df = pd.read_csv('file.csv', delimiter=';')
df = df.count().reset_index().T.reset_index()
df.columns = df.iloc[0]
df = df[1:]
df
Result:
Do you have any suggestions? Any help will be appreciated.
First create Boolean helper columns that flag the nonOK tests. Then use named aggregation: size counts the IDs, sum totals the Values column, and sum on the Boolean columns counts the True values. Finally, add the two test counts together:
df = (df.assign(NumberOfTest1 = df['Test one'].eq('nonOK'),
                NumberOfTest2 = df['Test two'].eq('nonOK'))
        .groupby('Category', as_index=False)
        .agg(NumberOfID = ('ID','size'),
             Values = ('Values','sum'),
             NumberOfTest1 = ('NumberOfTest1','sum'),
             NumberOfTest2 = ('NumberOfTest2','sum'))
        .assign(TotalTest = lambda x: x['NumberOfTest1'] + x['NumberOfTest2']))
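To see what this produces, here is a small runnable sketch. The question's file is not shown, so the sample rows below are invented for illustration; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical sample data; the real file.csv is not available
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'B'],
    'ID': [1, 2, 3, 4, 5],
    'Values': [10, 20, 30, 40, 50],
    'Test one': ['OK', 'nonOK', 'nonOK', 'OK', 'nonOK'],
    'Test two': ['nonOK', 'nonOK', 'OK', 'OK', 'nonOK'],
})

out = (df.assign(NumberOfTest1=df['Test one'].eq('nonOK'),   # True where test one failed
                 NumberOfTest2=df['Test two'].eq('nonOK'))   # True where test two failed
         .groupby('Category', as_index=False)
         .agg(NumberOfID=('ID', 'size'),                     # rows per category
              Values=('Values', 'sum'),
              NumberOfTest1=('NumberOfTest1', 'sum'),        # summing booleans counts True
              NumberOfTest2=('NumberOfTest2', 'sum'))
         .assign(TotalTest=lambda x: x['NumberOfTest1'] + x['NumberOfTest2']))
print(out)
```

If the transposed layout from the question title is still wanted, calling out.set_index('Category').T on the result is one possible follow-up.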

Pandas reorder raw content

I have the following Excel file:
I converted it to a DataFrame and dropped 2 columns using the code below:
df = pd.read_excel(self.file)
df.drop(['Name', 'Scopus ID'], axis=1, inplace=True)
Now my goal is to switch the order of every name in the df.
For example, the first name is Adedokun, Babatunde Olubayo,
which I would like to convert to Babatunde Olubayo Adedokun.
How can I do that for the entire df, whatever the name is?
Split the name and join the parts back together in reverse order.
import pandas as pd
data = {'Name': ['Adedokun, Babatunde Olubayo', "Uwizeye, Dieudonné"]}
df = pd.DataFrame(data)
def swap_name(name):
    # 'Last, First' -> ['Last', 'First'] -> 'First Last'
    name = name.split(', ')
    return name[1] + ' ' + name[0]
df['Name'] = df['Name'].apply(swap_name)
df
Output:
                         Name
0  Babatunde Olubayo Adedokun
1           Dieudonné Uwizeye
Let's assume you want to do the operation on "Other Names1". Split on the comma, then join the parts in reverse order:
df.loc[:, "Other Names1"] = df["Other Names1"].str.split(", ").apply(lambda parts: " ".join(reversed(parts)))
You can use the str accessor:
df['Name'] = df['Name'].str.split(', ').str[::-1].str.join(' ')
print(df)
# Output
                         Name
0  Babatunde Olubayo Adedokun
1           Dieudonné Uwizeye
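One practical difference between these answers shows up when a cell contains no comma: swap_name indexes name[1] and raises an IndexError, whereas the str accessor chain leaves such a value unchanged. A small sketch, where the single-word name is a made-up value added for illustration:

```python
import pandas as pd

# 'Madonna' is a hypothetical entry with no comma, added to show the edge case
names = pd.Series(['Adedokun, Babatunde Olubayo', 'Uwizeye, Dieudonné', 'Madonna'])

# Split on ', ', reverse each resulting list, and join the parts with a space
swapped = names.str.split(', ').str[::-1].str.join(' ')
print(swapped.tolist())
```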

How to merge multiple columns with same names in a dataframe

I have the following dataframe as below:
df = pd.DataFrame({'Field':'FAPERF',
                   'Form':'LIVERID',
                   'Folder':'ALL',
                   'Logline':'9',
                   'Data':'Yes',
                   'Data':'Blank',
                   'Data':'No',
                   'Logline':'10'})
I need this dataframe:
df = pd.DataFrame({'Field':['FAPERF','FAPERF'],
                   'Form':['LIVERID','LIVERID'],
                   'Folder':['ALL','ALL'],
                   'Logline':['9','10'],
                   'Data':['Yes','Blank','No']})
I tried the code below but could not achieve the desired output.
res3.set_index(res3.groupby(level=0).cumcount(), append=True)['Data'].unstack(0)
Can anyone please help me?
I believe your best option is to create several data frames that share the same column names (for example, 3 data frames that each have a "Data" column) and then concatenate them:
# Wrap each value in a list so pandas can build a one-row frame
df1 = pd.DataFrame({'Field': ['FAPERF'],
                    'Form': ['LIVERID'],
                    'Folder': ['ALL'],
                    'Logline': ['9'],
                    'Data': ['Yes']})
df2 = pd.DataFrame({'Data': ['No'],
                    'Logline': ['10']})
df3 = pd.DataFrame({'Data': ['Blank']})
frames = [df1, df2, df3]
# Columns missing from a frame are filled with NaN in the result
result = pd.concat(frames)
You just need to list the Logline and Data value for each row, build a one-row data frame per entry, and concatenate them:
import pandas as pd
list_df = []
data_type_list = ["yes", "no", "Blank"]
logline_type = ["9", "10", "10"]
for x in range(len(data_type_list)):
    # The constant columns are repeated on every row
    new_dict = {'Field': ['FAPERF'], 'Form': ['LIVERID'], 'Folder': ['ALL'],
                "Data": [data_type_list[x]], "Logline": [logline_type[x]]}
    df = pd.DataFrame(new_dict)
    list_df.append(df)
new_df = pd.concat(list_df)
print(new_df)
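When the per-row values are already in lists, the loop is not strictly needed. As a hedged alternative sketch, the frame can be built in one step and the constant columns added with assign. The pairing of Logline and Data values is copied from the answer above and is an assumption about the intended result.

```python
import pandas as pd

data_type_list = ["yes", "no", "Blank"]
logline_type = ["9", "10", "10"]

# Build the varying columns from the lists, then broadcast the constants
new_df = (pd.DataFrame({"Logline": logline_type, "Data": data_type_list})
            .assign(Field="FAPERF", Form="LIVERID", Folder="ALL")
            [["Field", "Form", "Folder", "Logline", "Data"]])
print(new_df)
```

Unlike the concat version, this gives the rows a fresh 0, 1, 2 index rather than three rows all labelled 0.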

Sort a Pandas DataFrame using both Date and Time

I'm trying to sort my dataframe using sort_values, but I'm not getting the desired output:
df1 = pd.read_csv('raw data/120_FT DDMG.csv')
df2 = pd.read_csv('raw data/120_FT MG.csv')
df3 = pd.read_csv('raw data/120_FT DD.csv')
dconcat = pd.concat([df1,df2,df3])
dconcat['date'] = pd.to_datetime(dconcat['ActivityDates(Individual)']+' '+dconcat['ScheduledStartTime'])
dconcat.sort_values(by='date')
dconcat = dconcat.set_index('date')
print(dconcat)
sort_values returns a new, sorted data frame when inplace=False (the default); the original is left unchanged.
So assign the result back: dconcat = dconcat.sort_values(by='date')
Or sort in place: dconcat.sort_values(by='date', inplace=True)
You can also try this:
dconcat = pd.concat([df1,df2,df3])
dconcat['date'] = pd.to_datetime(dconcat['ActivityDates(Individual)']+' '+dconcat['ScheduledStartTime'])
dconcat.set_index('date', inplace=True)
dconcat.sort_index(inplace=True)
print(dconcat)
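The behaviour described in these answers can be checked with a small sketch. The column names are taken from the question; the three rows are invented sample values, since the CSV files are not available.

```python
import pandas as pd

# Hypothetical rows standing in for the concatenated CSV data
df = pd.DataFrame({
    'ActivityDates(Individual)': ['2021-03-02', '2021-03-01', '2021-03-01'],
    'ScheduledStartTime': ['09:00', '14:30', '08:15'],
})
df['date'] = pd.to_datetime(df['ActivityDates(Individual)'] + ' ' + df['ScheduledStartTime'])

# Calling sort_values without assigning discards the sorted copy
df.sort_values(by='date')
still_unsorted = not df['date'].is_monotonic_increasing

# Assigning the result back keeps the sorted order
df = df.sort_values(by='date')
print(df)
```

Because date and time are combined into one datetime column before sorting, rows on the same day are ordered by their start time as well.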
