I am new to pandas. I have the following dataframe:
df = pd.DataFrame([[1, 'name', 'peter'], [1, 'age', 23], [1, 'height', '185cm']], columns=['id', 'column','value'])
id column value
0 1 name peter
1 1 age 23
2 1 height 185cm
I need to create a single row for each ID. Like so:
id name age height
0 1 peter 23 185cm
Any help is greatly appreciated, thank you.
You can use pivot_table with ','.join as the aggregation function:
df = pd.DataFrame([[1, 'name', 'peter'],
[1, 'age', 23],
[1, 'height', '185cm'],
[1, 'age', 25]], columns=['id', 'column','value'])
print (df)
id column value
0 1 name peter
1 1 age 23
2 1 height 185cm
3 1 age 25
df1 = df.astype(str).pivot_table(index="id",columns="column",values="value",aggfunc=','.join)
print (df1)
column age height name
id
1 23,25 185cm peter
Another solution with groupby + apply join and unstack:
df1 = df.astype(str).groupby(["id","column"])["value"].apply(','.join).unstack(fill_value=0)
print (df1)
column age height name
id
1 23,25 185cm peter
Assuming your dataframe is named df, the line below would help:
df.pivot(index="id", columns="column", values="value")
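Applied to the example dataframe from the question, this should print roughly:
print(df.pivot(index="id", columns="column", values="value"))
column age height name
id
1 23 185cm peter
Note that pivot raises an error if an id/column combination appears more than once; in that case use the pivot_table or groupby approaches above.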
I have two data frames
df1:
ID Date Value
0 9560 07/3/2021 25
1 9560 03/03/2021 20
2 9712 12/15/2021 15
3 9712 08/30/2021 10
4 9920 4/11/2021 5
df2:
ID Value
0 9560
1 9712
2 9920
In df2, I want to fill the "Value" column with the latest value from the "Value" column of df1 for each ID (based on the Date column).
This is my expected output:
ID Value
0 9560 25
1 9712 15
2 9920 5
How could I achieve it?
Based on Daniel Afriyie's approach, I came up with this solution:
import pandas as pd
# Setup for demo
df1 = pd.DataFrame(
columns=['ID', 'Date', 'Value'],
data=[
[9560, '07/3/2021', 25],
[9560, '03/03/2021', 20],
[9712, '12/15/2021', 15],
[9712, '08/30/2021', 10],
[9920, '4/11/2021', 5]
]
)
df2 = pd.DataFrame(
columns=['ID', 'Value'],
data=[[9560, None], [9712, None], [9920, None]]
)
## Actual solution
# Casting 'Date' column to actual dates
df1['Date'] = pd.to_datetime(df1['Date'])
# Sorting by dates
df1 = df1.sort_values(by='Date', ascending=False)
# Dropping duplicates of 'ID' (since it's ordered by date, only the newest of each ID will be kept)
df1 = df1.drop_duplicates(subset=['ID'])
# Merging the values from df1 into df2
df2 = pd.merge(df2[['ID']], df1[['ID', 'Value']], on='ID')
print(df2)
output:
ID Value
0 9560 25
1 9712 15
2 9920 5
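A variation on the same idea (just a sketch, assuming the df1 and df2 defined above): sort by the parsed dates, take the last row per ID with groupby, and map the result onto df2:
df1['Date'] = pd.to_datetime(df1['Date'])
latest = df1.sort_values('Date').groupby('ID')['Value'].last()
df2['Value'] = df2['ID'].map(latest)
print(df2)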
In the example dataframe created below:
Name Age
0 tom 10
1 nick 15
2 juli 14
I want to add another column 'Checks' whose values are 0 or 1, depending on whether the name is contained in the list check = ['nick'].
I have tried the below code:
import numpy as np
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
check = ['nick']
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
df['Checks'] = np.where(df['Name'] == check[0], 1, 0)
#print dataframe.
print(df)
print(check)
You can use str.contains:
phrase = ['tom', 'nick']
df['check'] = df['Name'].str.contains('|'.join(phrase))
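str.contains returns booleans; if you want 0/1 as in your expected output, cast the result to int:
df['check'] = df['Name'].str.contains('|'.join(phrase)).astype(int)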
You can use pandas.Series.isin:
check = ['nick']
df['check'] = df['Name'].isin(check).astype(int)
output:
Name Age check
0 tom 10 0
1 nick 15 1
2 juli 14 0
I've the following dataframe:
id;name;parent_of
1;John;3
2;Rachel;3
3;Peter;
The "parent_of" column holds the id of the parent. What I want is the parent's name instead of its id in the "parent_of" column.
Basically I want to get this:
id;name;parent_of
1;John;Peter
2;Rachel;Peter
3;Peter;
I already wrote a solution, but it is not the most efficient way:
import pandas as pd
d = {'id': [1, 2, 3], 'name': ['John', 'Rachel', 'Peter'], 'parent_of': [3,3,'']}
df = pd.DataFrame(data=d)
df_tmp = df[['id', 'name']]
df = pd.merge(df, df_tmp, left_on='parent_of', right_on='id', how='left').drop('parent_of', axis=1).drop('id_y', axis=1)
df=df.rename(columns={"name_x": "name", "name_y": "parent_of"})
print(df)
Do you have any better solution to achieve this?
Thanks!
Check with map
df['parent_of']=df.parent_of.map(df.set_index('id')['name'])
df
Out[514]:
id name parent_of
0 1 John Peter
1 2 Rachel Peter
2 3 Peter NaN
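If you prefer an empty string instead of NaN for rows without a parent (as in the original data), you can chain fillna onto the same expression:
df['parent_of'] = df.parent_of.map(df.set_index('id')['name']).fillna('')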
I have multiple pandas data frames (more than 70), each having the same columns. Let's say there are only 10 rows in each data frame. I want to count how often each value of a column (here 'Name') occurs across all the data frames and list it. Example:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
data = [['sam', 12], ['nick', 15], ['juli', 14]]
df2 = pd.DataFrame(data, columns = ['Name', 'Age'])
I am expecting the output as
Name Age
tom 1
sam 1
nick 2
juli 2
You can do the following:
from collections import Counter
d={'df1':df1, 'df2':df2, ..., 'df70':df70}
l=[list(d[i]['Name']) for i in d]
m=sum(l, [])
result=Counter(m)
print(result)
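If you then want the counts as a dataframe like the expected output, you can build one from the Counter (a sketch based on the result above):
counts = pd.DataFrame(list(result.items()), columns=['Name', 'Count'])
print(counts)
which should print roughly:
Name Count
0 tom 1
1 nick 2
2 juli 2
3 sam 1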
Do you want value counts of Name column across all dataframes?
main = pd.concat([df,df2])
main["Name"].value_counts()
juli 2
nick 2
sam 1
tom 1
Name: Name, dtype: int64
This can work if your data frames are not costly to concat:
pd.concat([x['Name'] for x in [df,df2]]).value_counts()
nick 2
juli 2
tom 1
sam 1
You can try this:
df = pd.concat([df, df2]).groupby('Name', as_index=False).count()
df.rename(columns={'Age': 'Count'}, inplace=True)
print(df)
Name Count
0 juli 2
1 nick 2
2 sam 1
3 tom 1
You can try this:
df = pd.concat([df, df2])
df = df.groupby(['Name'])['Age'].count().to_frame().reset_index()
df = df.rename(columns={"Age": "Count"})
print(df)
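This should print roughly:
Name Count
0 juli 2
1 nick 2
2 sam 1
3 tom 1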
I have a pandas data frame of the following form:
Name Age BMoney BTime BEffort
John 22 1 0 0
Pete 54 0 1 0
Lisa 26 0 1 1
And I would like to convert it to
Name Age B
John 22 Money
Pete 54 Time
Lisa 26 Effort
Lisa 26 Time
That is, based on the values in the "Breason" columns I would like to create a new column "B" containing "reason". If multiple reasons exist for a person (i.e. a row contains multiple 1's), I would like to create separate rows for that person in my new dataframe showing their different reasons.
With a MultiIndex and stack():
import pandas as pd

# Create the dataframe
df = [["John", 22, 1, 0, 0],
      ["Pete", 54, 0, 1, 0],
      ["Lisa", 26, 0, 1, 1]]
df = pd.DataFrame(df, columns=["Name", "Age", "BMoney", "BTime", "BEffort"])
# Set Multi Indexing
df.set_index(["Name", "Age"], inplace=True)
# Use the fact that columns and Series can carry names and use stack to do the transformation
df.columns.name = "B"
df = df.stack()
df.name = "value"
df = df.reset_index()
# Keep only the rows where the flag is 1, drop the helper column and strip the leading "B"
df = df[df.value == 1].drop("value", axis=1)
df["B"] = df["B"].str[1:]