This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
I have a dataframe which is similar to:
grades=pd.DataFrame(columns=["person","course_code","grade"],data=[[1,101,2.0],[2,102,1.0],[3,103,3.0],[2,104,4.0],[1,102,5.0],[3,104,2.5],[2,101,1.0]])
Each row holds the grade of a certain student in a certain subject.
I want to convert it to another dataframe that looks like this:
students=pd.DataFrame(columns=[101,102,103,104],data=[[2.0,5.0,"NaN","NaN"],[1.0,1.0,"NaN",4.0],["NaN","NaN",3.0,2.5]])
Each row is a student (the index of the row) with the grades obtained in every subject (every column is a different subject).
I have tried doing this:
for subj in grades["COURSE_CODE"].unique():
    grades_subj=grades[grades["COURSE_CODE"]==subj]
    grades_subj = grades_subj.set_index("EXPEDIENT_CODE", drop = True)
    for st in grades["EXPEDIENT_CODE"].unique():
        grade_num=grades_subj.loc[st]["GRADE"]
        student.loc[st][subj]=grade_num
But I get:
KeyError: 'the label [304208] is not in the [index]'
I have tried other ways too, and I always get errors.
Can someone help me, please?
Try:
grades.pivot_table(index='person', columns='course_code', values='grade')
The values argument lets you choose the column to aggregate.
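As a minimal sketch, here is that call applied to the sample dataframe from the question. Note that pivot_table aggregates with the mean by default, so if a student had two grades for the same course they would be averaged; the sample data has no such repeats.

```python
import pandas as pd

# Sample data copied from the question: one row per (student, course) grade
grades = pd.DataFrame(
    columns=["person", "course_code", "grade"],
    data=[[1, 101, 2.0], [2, 102, 1.0], [3, 103, 3.0], [2, 104, 4.0],
          [1, 102, 5.0], [3, 104, 2.5], [2, 101, 1.0]],
)

# One row per student, one column per course; missing combinations become NaN
students = grades.pivot_table(index="person", columns="course_code", values="grade")
print(students)
```

The result is indexed by person, and courses a student did not take show up as NaN.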
To answer your comment below: you can always add more levels to the index. This is done by passing a list rather than a single string to index. Note that you can do the same with columns. So, based on the example you provide:
grades.pivot_table(index=['person','school'], columns='course_code', values='grade')
After this, I usually recommend calling reset_index() unless you are fluent in slicing and indexing with a MultiIndex.
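A small sketch of the two-level version followed by reset_index(). The school values and the grades below are made up for illustration, since the question's sample data has no school column.

```python
import pandas as pd

# Hypothetical data: the "school" column and its values are assumptions
grades = pd.DataFrame({
    "person": [1, 1, 2, 2],
    "school": ["A", "A", "B", "B"],
    "course_code": [101, 102, 101, 102],
    "grade": [2.0, 5.0, 1.0, 4.0],
})

# Passing a list to index produces a MultiIndex of (person, school)
wide = grades.pivot_table(index=["person", "school"], columns="course_code", values="grade")

# reset_index() turns the index levels back into ordinary columns
flat = wide.reset_index()
print(flat)
```

After reset_index(), person and school are regular columns again, so plain column selection works without any MultiIndex slicing.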
Also, if the correspondence is 1 to 1, you could merge both dataframes using the appropriate join.
Here you have all the information about Reshaping and Pivot Tables in Pandas.
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 months ago.
I have two dataframes. The first has a date in the first column (every date of the last three years), the names of participants in the second column, and further information in the other columns.
In the second dataframe, the first column holds the dates on which we did tests, the second column holds the names again, and the remaining columns hold more information.
I would like to combine the two dataframes so that the information from the second is added to the first. For example, if we did one test on 2-9-2020 and the same test for the same person on 16-9-2022, then from 2-9-2020 until 16-9-2022 I want the value from the first test, and after that the value from the second.
I hope it's clear what I mean.
I tried:
data.merge(data_2, on='Date' & 'About')
but it is not possible to pass two columns to on that way.
Please, it would be nice if you could provide an example. I would try this:
import pandas as pd
new_df = pd.merge(data,names_participants, on = ['Date'], how = 'left')
I would also check that the date format is the same in both dataframes.
With Python and Pandas, you can join on 2 variables by using something like:
df=pd.merge(df,df2,how="left",on=['Date','About']) # can be how="left" or "inner","right","outer"
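For illustration, a minimal sketch of a left merge on two key columns. The column names Date and About come from the question; the rows and the extra columns info and test are invented.

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes in the question
data = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-02", "2020-09-02", "2020-09-03"]),
    "About": ["Ann", "Bob", "Ann"],
    "info": ["a", "b", "c"],
})
data_2 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-02"]),
    "About": ["Ann"],
    "test": [7],
})

# on takes a list of column names; a row matches only if both keys are equal
merged = pd.merge(data, data_2, how="left", on=["Date", "About"])
print(merged)
```

With how="left", every row of data is kept, and rows without a match in data_2 get NaN in the test column.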
I think you had the right idea, but not quite the right syntax. Does the following work for your situation?
import pandas as pd
new_df = pd.merge(data, data2, on = ["Date", "About"], how = "left")
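The question also asks for each test result to stay in effect until the next test date, which an exact-match merge does not do. One possible approach, not suggested in the answers above, is pandas.merge_asof, which matches each row to the most recent earlier key. The sketch below assumes both Date columns are real datetimes and uses invented rows and a made-up score column.

```python
import pandas as pd

# Hypothetical daily rows for one participant
data = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-01", "2020-09-02", "2021-01-01",
                            "2022-09-16", "2022-10-01"]),
    "About": ["Ann"] * 5,
})
# Hypothetical test results on two dates
data_2 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-02", "2022-09-16"]),
    "About": ["Ann", "Ann"],
    "score": [10, 20],
})

# merge_asof requires both frames to be sorted by the "on" key.
# direction="backward" picks the latest test on or before each date,
# and by="About" keeps the matching within the same person.
filled = pd.merge_asof(
    data.sort_values("Date"),
    data_2.sort_values("Date"),
    on="Date",
    by="About",
    direction="backward",
)
print(filled)
```

Dates before the first test get NaN, since there is no earlier result to carry forward.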
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 12 months ago.
I have 2 almost identical pandas dataframes with 5 common columns.
I want to add the second dataframe to the first which has a new column.
Dataframe 1
Dataframe 2
But I want it to update the same row given that columns 'Lot name', 'wafer' and 'site' match (green). If the columns do not match, I want to have the value of NaN as shown below.
Desired output
I have to do this with over 160 discrete columns but with possible matching Lot name, WAFER and SITE values.
I have tried the various merge (left, right, outer) and concat options, but I just can't seem to get it right. Any help or comments are appreciated.
Edit, follow up question:
I am trying to use this in a loop, where each iteration generates a new dataframe assigned to TEMP that needs to be merged with the previous dataframe. I cannot merge with an empty dataframe as it gives a merge error. How can I achieve this?
alldata = pd.DataFrame()
for i in range(len(operation)):
    temp = data[data['OPE_NO'].isin([operation[i]])]
    temp = temp[temp['PARAM_NAME'].isin([parameter[i]])]
    temp = temp.reset_index(drop=True)
    temp = temp[["LOT",'Lot name','WAFER',"SITE","PRODUCT",'PARAM_VALUE_NUMBER']]
    temp = temp.rename(columns={'PARAM_VALUE_NUMBER':'PMRM28LEMCKLYTFR.1~'+operation[i]+'~'+parameter[i]})
    alldata.merge(temp,how='outer')
Your example can be done with the following code:
df1.merge(df2, how="outer")
If I have misunderstood the problem, please let me know.
My English is not good, but I am glad to help.
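For the follow-up about merging inside a loop, one possible pattern is to collect the per-iteration dataframes in a list and merge them at the end, which avoids starting from an empty dataframe. Note also that merge returns a new dataframe rather than modifying alldata in place, so its result has to be assigned. The sketch below uses invented rows and column names param_1 and param_2, and assumes 'Lot name', 'WAFER' and 'SITE' are the matching keys.

```python
from functools import reduce

import pandas as pd

# Assumed key columns; the real data may need more (e.g. LOT, PRODUCT)
keys = ["Lot name", "WAFER", "SITE"]

# Hypothetical per-iteration results, each with one new value column
df1 = pd.DataFrame({"Lot name": ["A", "A"], "WAFER": [1, 2], "SITE": [1, 1],
                    "param_1": [0.1, 0.2]})
df2 = pd.DataFrame({"Lot name": ["A", "B"], "WAFER": [1, 1], "SITE": [1, 1],
                    "param_2": [5.0, 6.0]})
frames = [df1, df2]  # in the loop, this would be frames.append(temp)

# Outer-merge all frames pairwise on the keys; unmatched rows get NaN
alldata = reduce(lambda left, right: left.merge(right, on=keys, how="outer"), frames)
print(alldata)
```

Rows whose keys appear in only one of the frames are kept, with NaN in the columns that came from the other frame.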
This question already has answers here:
How to pivot a dataframe in Pandas? [duplicate]
(2 answers)
Closed 1 year ago.
Hi there, I have a data set that looks like df1 below, and I want to make it look like df2 using pandas. I have tried to use pivot and transpose but can't wrap my head around how to do it. I'd appreciate any help!
This should do the job:
df.pivot_table(index=["AssetID"], columns='MeterName', values='MeterValue')
index: Identifier
columns: row values that will become columns
values: values to put in those columns
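To make the three arguments concrete, here is a small sketch. The column names come from the answer above; the meter names and values are invented, since the original df1 is not shown.

```python
import pandas as pd

# Hypothetical long-format data: one row per (asset, meter) reading
df = pd.DataFrame({
    "AssetID": [1, 1, 2, 2],
    "MeterName": ["voltage", "current", "voltage", "current"],
    "MeterValue": [230.0, 5.0, 231.0, 6.0],
})

# AssetID identifies the rows, each distinct MeterName becomes a column,
# and MeterValue fills the cells
wide = df.pivot_table(index=["AssetID"], columns="MeterName", values="MeterValue")
print(wide)
```

Each distinct MeterName ends up as its own column, with one row per AssetID.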
I often have the same trouble:
https://towardsdatascience.com/reshape-pandas-dataframe-with-pivot-table-in-python-tutorial-and-visualization-2248c2012a31
This could help next time.
This question already has answers here:
Split pandas dataframe based on groupby
(4 answers)
Closed 11 months ago.
I have a dataframe that is being output to a spreadsheet called 'All Data'. Let's say this data contains business addresses (columns for street, city, zip, state). However, I also want to create a worksheet for each unique state containing the exact same columns.
My basic idea was to iterate over every row using df.iterrows() and divide the dataframe like that by appending it to a new dataframe but that seems extremely inefficient. Is there a better way to do this?
I found this answer but that is just a boolean index.
The groupby answers on the other question will work for you too. In your case, something like:
df_list = [d for _, d in df.groupby(['state'])]
This uses a list comprehension to return a list of dataframes, with one dataframe for each state.
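If the state name is needed later (for example as a worksheet name), a dict keyed by state may be more convenient than a list. This is a minimal sketch with invented addresses.

```python
import pandas as pd

# Hypothetical address data with the columns described in the question
df = pd.DataFrame({
    "street": ["1 Main St", "2 Oak Ave", "3 Pine Rd"],
    "city": ["Albany", "Fresno", "Buffalo"],
    "zip": ["12207", "93650", "14201"],
    "state": ["NY", "CA", "NY"],
})

# Iterating over a groupby yields (key, sub-dataframe) pairs
by_state = {state: group for state, group in df.groupby("state")}

for state, group in by_state.items():
    print(state, len(group))
```

Each sub-dataframe keeps all the original columns, so it could then be written to its own sheet, for instance with DataFrame.to_excel and a shared pd.ExcelWriter.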
A simple way to do it would be to get the unique states, filter the dataframe on each one, and save each result as an individual CSV (or do any other operation afterwards).
Here's an example:
# df[column].unique() returns an array of the unique values in that column
for state in df['state'].unique():
    # Filter the dataframe on that value and write the rows to their own file
    # (the file name pattern here is only an example)
    df[df['state']==state].to_csv(f"{state}.csv", index=False)
This question already has answers here:
How do I expand the output display to see more columns of a Pandas DataFrame?
(22 answers)
Closed 3 years ago.
I have two dataframes df and df2 with contents as follows
dataframe df
dataframe df2
I'd like to add to df the two columns from df2, "NUMSESSIONS_ANDROID" and "AVGSESSDUR_ANDROID".
I do this as follows:
df['NUMSESSIONS_ANDROID'] = df2['NUMSESSIONS_ANDROID']
df['AVGSESSDUR_ANDROID'] = df2['AVGSESSDUR_ANDROID']
However when I print the resulting df I see ... in place of AVGSESSDUR_IOS (i.e. it appears to have swallowed that column)
Appreciate any help resolving this ....
As ALollz stated, the fact that you are seeing ... in the output means there is "hidden" data that is part of the dataframe but is not being shown in your console or IDE. However, you can do an easy print to check all the columns that your dataframe contains:
print(list(df))
This will show you the names of all the columns in your df, so you can check whether the ones you want are there.
Furthermore, you can print a specific column as a series (first line) or as a dataframe (second line):
print(df['column_name'])
print(df[['column_name']])
If successful, you will see the series/dataframe. If the column doesn't actually exist in your dataframe, you will get a KeyError.
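The checks above can be sketched on a tiny dataframe. The column names are taken from the question, the values are invented, and NOT_A_COLUMN is a deliberately missing name used to show the KeyError.

```python
import pandas as pd

# Hypothetical dataframe with a few of the columns from the question
df = pd.DataFrame({
    "NUMSESSIONS_IOS": [3],
    "AVGSESSDUR_IOS": [12.5],
    "NUMSESSIONS_ANDROID": [4],
})

# list(df) gives every column name, regardless of how the frame is displayed
column_names = list(df)
print(column_names)

# A membership test avoids the exception entirely
has_ios_duration = "AVGSESSDUR_IOS" in df.columns

# Selecting a column that does not exist raises KeyError
try:
    df["NOT_A_COLUMN"]
    missing_raised = False
except KeyError:
    missing_raised = True
```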
Leveraging @ALollz's hint above ...
"The ... indicates that only part of the DataFrame is being shown in your terminal/output, so 'AVGSESSDUR_IOS' is almost certainly still there it's just not shown. You can look at print(df.iloc[:, 0:3]) to see the first 3 columns for instance."
I added the following two lines to increase the number of columns and width of console display and it worked:
pd.set_option('display.max_columns',20)
pd.set_option('display.width', 1000)
print(df.iloc[:,0:5])
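If changing the display settings globally is not desired, pd.option_context applies them only inside a with block. This is a sketch on an invented 30-column dataframe; setting display.max_columns to None removes the column limit.

```python
import pandas as pd

# Hypothetical wide dataframe: 30 columns named col_0 ... col_29
df = pd.DataFrame({f"col_{i}": [i] for i in range(30)})

# The options are restored to their previous values when the block exits
with pd.option_context("display.max_columns", None, "display.width", 1000):
    text = repr(df)
    print(text)
```

Inside the block every column is printed; outside it, the previous display settings apply again.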