I am new to pandas. I have the following dataframe:
df = pd.DataFrame([[1, 'name', 'peter'], [1, 'age', 23], [1, 'height', '185cm']], columns=['id', 'column','value'])
id column value
0 1 name peter
1 1 age 23
2 1 height 185cm
I need to create a single row for each ID. Like so:
id name age height
0 1 peter 23 185cm
Any help is greatly appreciated, thank you.
You can use pivot_table with ','.join as the aggregation function:
df = pd.DataFrame([[1, 'name', 'peter'],
[1, 'age', 23],
[1, 'height', '185cm'],
[1, 'age', 25]], columns=['id', 'column','value'])
print (df)
id column value
0 1 name peter
1 1 age 23
2 1 height 185cm
3 1 age 25
df1 = df.astype(str).pivot_table(index="id",columns="column",values="value",aggfunc=','.join)
print (df1)
column age height name
id
1 23,25 185cm peter
Another solution with groupby + apply join and unstack:
df1 = df.astype(str).groupby(["id","column"])["value"].apply(','.join).unstack(fill_value=0)
print (df1)
column age height name
id
1 23,25 185cm peter
Assuming your dataframe is named df, the line below would help:
df.pivot(index="id", columns="column", values="value")
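Applied to the example dataframe from the question, this should print roughly:
print(df.pivot(index="id", columns="column", values="value"))
column age height name
id
1 23 185cm peter
Note that pivot raises an error if an id/column combination appears more than once; in that case use the pivot_table or groupby approaches above.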
I have two data frames
df1:
ID Date Value
0 9560 07/3/2021 25
1 9560 03/03/2021 20
2 9712 12/15/2021 15
3 9712 08/30/2021 10
4 9920 4/11/2021 5
df2:
ID Value
0 9560
1 9712
2 9920
In df2, I want to fill the "Value" column with the latest value from the "Value" column of df1 for each ID (based on the Date column).
This is my expected output:
ID Value
0 9560 25
1 9712 15
2 9920 5
How could I achieve it?
Based on Daniel Afriyie's approach, I came up with this solution:
import pandas as pd
# Setup for demo
df1 = pd.DataFrame(
columns=['ID', 'Date', 'Value'],
data=[
[9560, '07/3/2021', 25],
[9560, '03/03/2021', 20],
[9712, '12/15/2021', 15],
[9712, '08/30/2021', 10],
[9920, '4/11/2021', 5]
]
)
df2 = pd.DataFrame(
columns=['ID', 'Value'],
data=[[9560, None], [9712, None], [9920, None]]
)
## Actual solution
# Casting 'Date' column to actual dates
df1['Date'] = pd.to_datetime(df1['Date'])
# Sorting by dates
df1 = df1.sort_values(by='Date', ascending=False)
# Dropping duplicates of 'ID' (since it's ordered by date, only the newest of each ID will be kept)
df1 = df1.drop_duplicates(subset=['ID'])
# Merging the values from df1 into df2
df2 = pd.merge(df2[['ID']], df1[['ID', 'Value']], on='ID')
print(df2)
output:
ID Value
0 9560 25
1 9712 15
2 9920 5
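A variation on the same idea (just a sketch, assuming the df1 and df2 defined above): sort by the parsed dates, take the last row per ID with groupby, and map the result onto df2:
df1['Date'] = pd.to_datetime(df1['Date'])
latest = df1.sort_values('Date').groupby('ID')['Value'].last()
df2['Value'] = df2['ID'].map(latest)
print(df2)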
In the example dataframe created below:
Name Age
0 tom 10
1 nick 15
2 juli 14
I want to add another column 'Checks' whose values are 0 or 1, depending on whether the name is contained in the list check = ['nick'].
I have tried the below code:
import numpy as np
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
check = ['nick']
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
df['Checks'] = np.where(df['Name'] == check[0], 1, 0)
#print dataframe.
print(df)
print(check)
You can use str.contains:
phrase = ['tom', 'nick']
df['check'] = df['Name'].str.contains('|'.join(phrase))
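str.contains returns booleans; if you want 0/1 as in your expected output, cast the result to int:
df['check'] = df['Name'].str.contains('|'.join(phrase)).astype(int)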
You can use pandas.Series.isin:
check = ['nick']
df['check'] = df['Name'].isin(check).astype(int)
output:
Name Age check
0 tom 10 0
1 nick 15 1
2 juli 14 0
I've the following dataframe:
id;name;parent_of
1;John;3
2;Rachel;3
3;Peter;
The "parent_of" column holds the id of the parent. What I want is the parent's name instead of its id in the "parent_of" column.
Basically I want to get this:
id;name;parent_of
1;John;Peter
2;Rachel;Peter
3;Peter;
I already wrote a solution, but it is not the most efficient way:
import pandas as pd
d = {'id': [1, 2, 3], 'name': ['John', 'Rachel', 'Peter'], 'parent_of': [3,3,'']}
df = pd.DataFrame(data=d)
df_tmp = df[['id', 'name']]
df = pd.merge(df, df_tmp, left_on='parent_of', right_on='id', how='left').drop('parent_of', axis=1).drop('id_y', axis=1)
df=df.rename(columns={"name_x": "name", "name_y": "parent_of"})
print(df)
Do you have any better solution to achieve this?
Thanks!
Check with map
df['parent_of']=df.parent_of.map(df.set_index('id')['name'])
df
Out[514]:
id name parent_of
0 1 John Peter
1 2 Rachel Peter
2 3 Peter NaN
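If you prefer an empty string instead of NaN for rows without a parent (as in the original data), you can chain fillna onto the same expression:
df['parent_of'] = df.parent_of.map(df.set_index('id')['name']).fillna('')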
I have multiple pandas data frames (more than 70), each having the same columns. Let's say there are only 10 rows in each data frame. I want to count how often each value of a column (here 'Name') occurs across all the data frames and list it. Example:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
data = [['sam', 12], ['nick', 15], ['juli', 14]]
df2 = pd.DataFrame(data, columns = ['Name', 'Age'])
I am expecting the output as
Name Age
tom 1
sam 1
nick 2
juli 2
You can do the following:
from collections import Counter
d={'df1':df1, 'df2':df2, ..., 'df70':df70}
l=[list(d[i]['Name']) for i in d]
m=sum(l, [])
result=Counter(m)
print(result)
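If you then want the counts as a dataframe like the expected output, you can build one from the Counter (a sketch based on the result above):
counts = pd.DataFrame(list(result.items()), columns=['Name', 'Count'])
print(counts)
which should print roughly:
Name Count
0 tom 1
1 nick 2
2 juli 2
3 sam 1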
Do you want value counts of Name column across all dataframes?
main = pd.concat([df,df2])
main["Name"].value_counts()
juli 2
nick 2
sam 1
tom 1
Name: Name, dtype: int64
This can work if your data frames are not costly to concat:
pd.concat([x['Name'] for x in [df,df2]]).value_counts()
nick 2
juli 2
tom 1
sam 1
You can try this:
df = pd.concat([df, df2]).groupby('Name', as_index=False).count()
df.rename(columns={'Age': 'Count'}, inplace=True)
print(df)
Name Count
0 juli 2
1 nick 2
2 sam 1
3 tom 1
You can try this:
df = pd.concat([df, df2])
df = df.groupby(['Name'])['Age'].count().to_frame().reset_index()
df = df.rename(columns={"Age": "Count"})
print(df)
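This should print roughly:
Name Count
0 juli 2
1 nick 2
2 sam 1
3 tom 1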
I have a pandas data frame of the following form:
Name Age BMoney BTime BEffort
John 22 1 0 0
Pete 54 0 1 0
Lisa 26 0 1 1
And I would like to convert it to
Name Age B
John 22 Money
Pete 54 Time
Lisa 26 Effort
Lisa 26 Time
That is, based on the values in the "Breason" columns I would like to create a new column "B" containing "reason". If multiple reasons exist for a person (i.e. a row contains multiple 1's), I would like to create separate rows for that person in my new dataframe showing their different reasons.
With a MultiIndex and stack():
import pandas as pd

# Create the dataframe
df = [["John", 22, 1, 0, 0],
      ["Pete", 54, 0, 1, 0],
      ["Lisa", 26, 0, 1, 1]]
df = pd.DataFrame(df, columns=["Name", "Age", "BMoney", "BTime", "BEffort"])
# Set Multi Indexing
df.set_index(["Name", "Age"], inplace=True)
# Use the fact that columns and Series can carry names and use stack to do the transformation
df.columns.name = "B"
df = df.stack()
df.name = "value"
df = df.reset_index()
# Keep only the rows where the flag is 1, drop the helper column and strip the leading "B"
df = df[df.value == 1].drop("value", axis=1)
df["B"] = df["B"].str[1:]