Pandas dataframe combine rows representing the same entity [duplicate] - python

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 8 months ago.
I have data that contains several rows for each employee. Each row contains one attribute and its value. For example:
Worker ID  Last Name  First Name  Metric Name  Metric Value
1          Hanson     Scott       Attendance   98
1          Hanson     Scott       On time      35
2          Avery      Kara        Attendance   95
2          Avery      Kara        On time      57
I would like to combine rows based on worker id, taking metrics to their own columns like so:
Worker ID  Last Name  First Name  Attendance  On time
1          Hanson     Scott       98          35
2          Avery      Kara        95          57
I can do worker_data.pivot_table(values='Metric Value', index='Worker ID', columns=['Metric Name']), but that does not give me the first and last names as columns. What is the best Pandas way to merge these rows?

In your solution, pass a list to the index parameter and, to avoid a MultiIndex in the columns, remove the [] from the columns parameter:
df = (worker_data.pivot_table(index=['Worker ID','Last Name','First Name'],
                              columns='Metric Name',
                              values='Metric Value')
                 .reset_index()
                 .rename_axis(None, axis=1))
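A self-contained sketch of the approach above, using the sample rows from the question. Note that pivot_table aggregates repeated index/column pairs with the mean by default, so this assumes each worker has at most one row per metric (or that averaging is acceptable).

```python
import pandas as pd

# Sample data copied from the question.
worker_data = pd.DataFrame({
    "Worker ID": [1, 1, 2, 2],
    "Last Name": ["Hanson", "Hanson", "Avery", "Avery"],
    "First Name": ["Scott", "Scott", "Kara", "Kara"],
    "Metric Name": ["Attendance", "On time", "Attendance", "On time"],
    "Metric Value": [98, 35, 95, 57],
})

df = (
    worker_data.pivot_table(
        index=["Worker ID", "Last Name", "First Name"],  # keep the names as index levels
        columns="Metric Name",                           # one column per metric
        values="Metric Value",
    )
    .reset_index()              # turn the index levels back into ordinary columns
    .rename_axis(None, axis=1)  # drop the leftover "Metric Name" columns label
)
print(df)
```

The metric columns will probably come back as floats, because the default aggregation is a mean.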

Related

How to split a string column into multiple columns? [duplicate]

This question already has answers here:
Split / Explode a column of dictionaries into separate columns with pandas
(13 answers)
Closed 14 days ago.
I have a data frame with one string column and I'd like to split it into multiple columns, separating on ','. I want each new column to be named after the string that appears before the ':'.
The column looks like this:
0 {"ID":"AP001","Name":"Anderson","Age":"23"}
1 {"ID":"AP002","Name":"Jasmine","Age":"36"}
2 {"ID":"AP003","Name":"Zack","Age":"28"}
3 {"ID":"AP004","Name":"Chole","Age":"39"}
And I want to split to this:
ID     Name      Age
AP001  Anderson  23
AP002  Jasmine   36
AP003  Zack      28
AP004  Chole     39
I have tried to split it by ',', but I'm not sure how to remove the string before ':' and use it as the column name.
data1 = data['demographic'].str.split(',',expand=True)
This is what I get after splitting it:
0             1                  2
"ID":"AP001"  "Name":"Anderson"  "Age":"23"
"ID":"AP002"  "Name":"Jasmine"   "Age":"36"
"ID":"AP003"  "Name":"Zack"      "Age":"28"
"ID":"AP004"  "Name":"Chole"     "Age":"39"
Does anyone know how to do this?
You can use ast.literal_eval:
import ast
import pandas as pd
# Parse each string into a dict, then expand the dicts into columns.
data1 = pd.json_normalize(data['demographic'].apply(ast.literal_eval))
print(data1)
# Output
      ID      Name Age
0  AP001  Anderson  23
1  AP002   Jasmine  36
2  AP003      Zack  28
3  AP004     Chole  39
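The strings in the question look like valid JSON, so json.loads from the standard library is a possible alternative to ast.literal_eval. It also accepts JSON literals such as true, false, and null, which literal_eval would reject. A minimal sketch, assuming the column is named demographic as in the question and every row is well-formed JSON:

```python
import json

import pandas as pd

# Two sample rows copied from the question.
data = pd.DataFrame({
    "demographic": [
        '{"ID":"AP001","Name":"Anderson","Age":"23"}',
        '{"ID":"AP002","Name":"Jasmine","Age":"36"}',
    ]
})

# Parse each JSON string into a dict, then expand the dicts into columns.
parsed = data["demographic"].apply(json.loads)
data1 = pd.json_normalize(parsed.tolist())
print(data1)
```

The Age values stay strings here because they are quoted in the source data; convert them with astype(int) if numbers are needed.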

Merging/concatenating two datasets on a specific column (different lengths) [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have two different datasets
df1
Name Surname Age Address
Julian Ross 34 Main Street
Mary Jane 52 Cook Road
len(1200)
df2
Name Country Telephone
Julian US NA
len(800)
df1 contains the full list of unique names; df2 contains fewer rows because many names were never added to it.
I would like to get a final dataset with the full list of names in df1 (and all the fields that are there) plus the fields in df2. I would then expect a final dataset of length 1200 with some empty fields corresponding to the missing name in df2.
I have tried as follows:
pd.concat([df1.set_index('Name'),df2.set_index('Name')], axis=1, join='inner')
but it returns the length of the smallest dataset (i.e. 800).
I have also tried
df1.merge(df2, how = 'inner', on = ['Name'])
... same result.
I am not totally familiar with joining/merging/concatenating functions, even after reading the document https://pandas.pydata.org/docs/user_guide/merging.html .
I know that this question is probably a duplicate of others and I will be happy to delete it if necessary, but I would be really grateful if you could provide some help and explain how to get the expected result:
df
Name Surname Age Address Country Telephone
Julian Ross 34 Main Street US NA
Mary Jane 52 Cook Road
IIUC, use merge with how='left', as below:
>>> df1.merge(df2, how='left', on='Name')
Name Surname Age Address Country Telephone
0 Julian Ross 34 Main Street US NaN
1 Mary Jane 52 Cook Road NaN NaN
how='left' keeps every row of df1. The result has exactly as many rows as df1 only if there are no duplicate names in df2; a name that appears several times in df2 produces several rows in the result.
Read Pandas Merging 101
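To check the "no duplicate names in df2" condition rather than assume it, merge accepts the validate and indicator parameters. A small sketch with the two sample rows from the question (the variable name merged is only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Julian", "Mary"],
    "Surname": ["Ross", "Jane"],
    "Age": [34, 52],
    "Address": ["Main Street", "Cook Road"],
})
df2 = pd.DataFrame({"Name": ["Julian"], "Country": ["US"], "Telephone": [None]})

merged = df1.merge(
    df2,
    how="left",              # keep every row of df1
    on="Name",
    validate="many_to_one",  # raises MergeError if a Name repeats in df2
    indicator=True,          # adds a _merge column: "both" or "left_only"
)
print(merged)
```

Rows marked left_only in the _merge column are the names that are missing from df2.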

Pandas how to make a transpose of data-frame to get values for the remaining two columns [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
I have a df with these values:
name marks subject
mark 50 math
mark 75 french
tom 25 english
tom 30 Art
luca 100 math
luca 100 art
How can I transpose the dataframe so it looks like this?
name  math  art  french  english
mark  50         75
tom         30           25
luca  100   100
tried:
df.T and df[['marks','subject']].T
but
This is a pivot, not a transpose. First we need to normalize the case of the subject column ('Art' and 'art' would otherwise become separate columns), then we pivot.
df['subject'] = df['subject'].str.lower()
df.pivot(index='name', columns='subject', values='marks')
See here for more info: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot
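A runnable sketch of the two steps above, using the data from the question. Keep in mind that pivot raises a ValueError when the same name/subject pair occurs more than once; pivot_table with an explicit aggfunc is one option in that case. The variable name wide is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["mark", "mark", "tom", "tom", "luca", "luca"],
    "marks": [50, 75, 25, 30, 100, 100],
    "subject": ["math", "french", "english", "Art", "math", "art"],
})

# Normalize case so that "Art" and "art" end up in the same column.
df["subject"] = df["subject"].str.lower()

# One row per name, one column per subject; missing combinations become NaN.
wide = df.pivot(index="name", columns="subject", values="marks")

# Optional: make name an ordinary column and drop the "subject" columns label.
wide = wide.reset_index().rename_axis(None, axis=1)
print(wide)
```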

Pandas Dataframe : Using same category codes on different existing dataframes with same category

I have two pandas dataframes with some columns in common. These columns are of type category but unfortunately the category codes don't match for the two dataframes. For example I have:
>>> df1
artist song
0 The Killers Mr Brightside
1 David Guetta Memories
2 Estelle Come Over
3 The Killers Human
>>> df2
artist date
0 The Killers 2010
1 David Guetta 2012
2 Estelle 2005
3 The Killers 2006
But:
>>> df1['artist'].cat.codes
0 55
1 78
2 93
3 55
Whereas:
>>> df2['artist'].cat.codes
0 99
1 12
2 23
3 99
What I would like is for my second dataframe df2 to take the same category codes as the first one df1 without changing the category values. Is there any way to do this?
(Edit)
Here is a screenshot of my two dataframes. Essentially I want the song_tags to have the same cat codes for artist_name and track_name as the songs dataframe. Also song_tags is created from a merge between songs and another tag dataframe (which contains song data and their tags, without the user information) and then saved and loaded through pickle. Also it might be relevant to add that I had to cast artist_name and track_name in song_tags to type category from type object.
I think essentially my question is: how to modify category codes of an existing dataframe column?
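One possible approach, sketched under the assumption that every artist in df2 also appears among df1's categories: category codes are just positions in the category list, so re-encoding df2 against df1's list with cat.set_categories should make the codes line up without changing the values. Any value that is not in the new list would become NaN. The small frames below are stand-ins for the real data.

```python
import pandas as pd

df1 = pd.DataFrame({"artist": ["The Killers", "David Guetta", "Estelle", "The Killers"]})
df1["artist"] = df1["artist"].astype("category")

# Same values, but a different category order, so the codes differ from df1.
df2 = pd.DataFrame({
    "artist": pd.Categorical(
        ["The Killers", "David Guetta", "Estelle", "The Killers"],
        categories=["The Killers", "Estelle", "David Guetta"],
    )
})

# Re-encode df2 against df1's category list; the values themselves stay the same.
df2["artist"] = df2["artist"].cat.set_categories(df1["artist"].cat.categories)

print(df1["artist"].cat.codes.tolist())
print(df2["artist"].cat.codes.tolist())
```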

Create multiple dataframes from one

I have a dataframe like this:
name time session1 session2 session3
Alex 135 10 3 5
Lee 136 2 6 4
I want to make multiple dataframes, one per session. For example, dataframe 1 should have name, time, and session1; dataframe 2 should have name, time, and session2; and dataframe 3 should have name, time, and session3. I'd like to use a for loop (or a better approach, if there is one), but I don't know how to select columns 1, 2, 3 in one pass and then columns 1, 2, 4, and so on. Does anyone have an idea? The data is stored in a pandas dataframe; I just don't know how to format it here on Stack Overflow. Thank you.
I don't think you need to create a new dictionary for that.
Just slice your data frame directly whenever needed:
df[['name', 'time', 'session1']]
If the following design helps, you can instead set name and time as the index (df = df.set_index(['name', 'time'])) and then simply use
df['session1']
Organize it into a dictionary of dataframes:
dict_of_dfs = {f'df {i}':df[['name','time', i]] for i in df.columns[2:]}
Then you can access each dataframe as you would any other dictionary values:
>>> dict_of_dfs['df session1']
name time session1
0 Alex 135 10
1 Lee 136 2
>>> dict_of_dfs['df session2']
name time session2
0 Alex 135 3
1 Lee 136 6
