How to delete Pandas rows that have been seen before - python

Suppose I have the following table in a Pandas DataFrame:
Date Name
15/12/01 John Doe
15/12/01 John Doe
15/12/01 John Doe
15/12/02 Mary Jean
15/12/02 Mary Jean
15/12/02 Mary Jean
I would like to delete all instances of John Doe/Mary Jean (or whatever name may be there) with the same date and only keep the latest one. After the operation it would look like this:
Date Name
15/12/01 John Doe
15/12/02 Mary Jean
Here the third (last) instance of both John Doe and Mary Jean has been kept and the rest have been deleted. How can I do this efficiently in Pandas?
Thanks!
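One way to approach this is `DataFrame.drop_duplicates` with `keep="last"`, which keeps the final row of each duplicate group. The sketch below rebuilds the sample table; the variable names `df` and `deduped` are only illustrative, and "latest" is assumed to mean the last row in the current row order.

```python
import pandas as pd

# Sample data from the question: three identical rows per (Date, Name) pair.
df = pd.DataFrame({
    "Date": ["15/12/01"] * 3 + ["15/12/02"] * 3,
    "Name": ["John Doe"] * 3 + ["Mary Jean"] * 3,
})

# Keep only the last occurrence of each (Date, Name) combination.
deduped = df.drop_duplicates(subset=["Date", "Name"], keep="last")
print(deduped)
```

The surviving rows keep their original index labels; add `.reset_index(drop=True)` if a fresh 0..n-1 index is wanted.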

Related

Python remove text if same of another column

In my dataframe, I want to remove the leading text of one column when it starts with the text found in another column.
Example of dataframe:
name var1
John Smith John Smith Hello world
Mary Jane Mary Jane Python is cool
James Bond My name is James Bond
Peter Pan Nothing happens here
Dataframe that I want:
name var1
John Smith Hello world
Mary Jane Python is cool
James Bond My name is James Bond
Peter Pan Nothing happens here
Something as simple as:
df[~df.var1.str.contains(df.var1)]
does not work. How should I write my Python code?
Try using apply with a lambda:
df["var1"] = df.apply(lambda x: x["var1"][len(x["name"]):].strip() if x["name"] == x["var1"][:len(x["name"])] else x["var1"],axis=1)
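How about this? It strips the name only when var1 starts with it, so a name that appears later in the text is left alone.
df['var1'] = [v[len(n):].strip() if v.startswith(n) else v for n, v in zip(df['name'], df['var1'])]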

Change Values in Pandas Dataframe

I have two dataframes. I want to change some of the values in the first using the second.
I know how to change values one at a time using isin and a where statement, but I don't know how to apply a large list of changes.
df1
Name Type
David Staff
Jones Pilot
Jack Pilot
Susan Steward
John Staff
Leroy Staff
Steve Staff
df2
Name Type
David Captain
Leroy Pilot
Steve Pilot
How do I change the "Type" column in df1 using df2?
df_desired
Name Type
David Captain
Jones Pilot
Jack Pilot
Susan Steward
John Staff
Leroy Pilot
Steve Pilot
You can map the Type column of df2 onto df1 by Name, then update df1 with the result:
df1['Type'].update(df1['Name'].map(df2.set_index('Name')['Type']))
print(df1)
Name Type
0 David Captain
1 Jones Pilot
2 Jack Pilot
3 Susan Steward
4 John Staff
5 Leroy Pilot
6 Steve Pilot
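For reference, here is a self-contained sketch of the same idea that assigns the column back instead of calling `update` on `df1['Type']`, since updating a selected column in place may not propagate to `df1` when pandas copy-on-write is enabled. It assumes the names in `df2` are unique; `new_types` is an illustrative variable name.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["David", "Jones", "Jack", "Susan", "John", "Leroy", "Steve"],
    "Type": ["Staff", "Pilot", "Pilot", "Steward", "Staff", "Staff", "Staff"],
})
df2 = pd.DataFrame({
    "Name": ["David", "Leroy", "Steve"],
    "Type": ["Captain", "Pilot", "Pilot"],
})

# Look up each name in df2; names missing from df2 come back as NaN.
new_types = df1["Name"].map(df2.set_index("Name")["Type"])

# Use the new value where one exists, otherwise keep the original Type.
df1["Type"] = new_types.fillna(df1["Type"])
print(df1)
```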

How to replace strings in a dataframe where there is a likely typo

I have been working on this for a few hours but have made no progress on automating it. I have a dataframe with over 50,000 rows.
Occasionally there is a misspelling such as
Rosalind vs Rosalinda
Wong vs Wang
Of course, there can be cases where there really are two different people, but let's assume that they work in different factories:
John Wong from Factory1
John Wang from Factory1 -> Should be changed to John Wong
John Wang from Factory2
Without manually finding all the typos, how do I clean this dataset, or at least identify likely typos?
So the dataframe would go from
DF1
Lname Fname Location
Wong John Factory1
Wang John Factory1
Wong Joh Facotry1
Wang John Factory2
to something like
Lname Fname Location
Wong John Factory1
Wong John Factory1
Wong John Factory1
Wang John Factory2
Is something like this possible? Thanks
Edit: fixed typo in the location
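One possible starting point, offered as a rough sketch rather than a complete solution, is to group by Location and, inside each group, map rare spellings onto the most frequent similar spelling using the standard library's `difflib.get_close_matches`. The helper name `canonicalize` and the 0.7 similarity cutoff are assumptions chosen for this example, and the sketch assumes the Location column is already clean. On real data the cutoff needs tuning, and the suggested replacements should be reviewed before being applied.

```python
import difflib
import pandas as pd

def canonicalize(values, cutoff=0.7):
    """Map each spelling to the most frequent similar spelling in `values`."""
    counts = values.value_counts()  # most frequent spellings come first
    canonical = []                  # spellings accepted as correct so far
    mapping = {}
    for spelling in counts.index:
        match = difflib.get_close_matches(spelling, canonical, n=1, cutoff=cutoff)
        if match:
            mapping[spelling] = match[0]   # treat as a typo of a commoner spelling
        else:
            canonical.append(spelling)     # treat as a new, distinct spelling
            mapping[spelling] = spelling
    return values.map(mapping)

# Sample data from the question, with Location assumed to be already corrected.
df = pd.DataFrame({
    "Lname": ["Wong", "Wang", "Wong", "Wang"],
    "Fname": ["John", "John", "Joh", "John"],
    "Location": ["Factory1", "Factory1", "Factory1", "Factory2"],
})

# Clean each name column separately within every factory.
for col in ["Lname", "Fname"]:
    df[col] = df.groupby("Location")[col].transform(canonicalize)
print(df)
```

Because the correction relies on frequency, a misspelling that occurs as often as the correct form cannot be resolved reliably this way.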

Melting Transformation

I work with financial files that are organized with dates as the columns.
Example Table.
However, I need to transform a table like this to have column names like this: Name, Date, Apples, Oranges. How would you do this using Python, Power Query, or Excel?
Type Name Jan-21 Feb-21 Mar-21
Apples John $1.20 $1.05 $1.65
Oranges John $1.42 $1.15 $1.77
Apples Jim $1.60 $1.15 $1.85
Oranges Jim $1.62 $1.45 $1.37
I want the table to look like this:
Name Dates Apples Oranges
John Jan-21 $1.20 $1.42
John Feb-21 $1.05 $1.15
Jim Jan-21 $1.60 $1.62
Jim Feb-21 $1.15 $1.45
Solved my own problem using Power Query (within Power BI). I unpivoted all columns other than Type and Name, then pivoted the Type column using the new "Value" column. Also, this is apparently called melting, which I wasn't familiar with.
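The same unpivot-then-pivot steps can be sketched in pandas with `melt` followed by `pivot`. The frame below recreates the sample table with the prices kept as strings; the names `wide`, `long_df`, and `result` are illustrative, and passing a list as `index` to `pivot` assumes pandas 1.1 or later.

```python
import pandas as pd

wide = pd.DataFrame({
    "Type": ["Apples", "Oranges", "Apples", "Oranges"],
    "Name": ["John", "John", "Jim", "Jim"],
    "Jan-21": ["$1.20", "$1.42", "$1.60", "$1.62"],
    "Feb-21": ["$1.05", "$1.15", "$1.15", "$1.45"],
    "Mar-21": ["$1.65", "$1.77", "$1.85", "$1.37"],
})

# Unpivot: every date column becomes a (Date, Value) pair per row.
long_df = wide.melt(id_vars=["Type", "Name"], var_name="Date", value_name="Value")

# Pivot: spread the Type values (Apples, Oranges) back out into columns.
result = (
    long_df.pivot(index=["Name", "Date"], columns="Type", values="Value")
    .reset_index()
    .rename_axis(columns=None)
)
print(result)
```

The rows come back sorted by Name and then by the Date label as text, so converting the date labels to real dates first may be preferable if chronological order matters.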

match name and surname from two data frames, extract middle name from one data frame and append it to other

I have two almost identical data frames, A and B. In reality, each data frame has 1000+ names.
I want to match name and surname across both data frames and then copy the middle name from data frame B into data frame A.
data frame A
name surname
John Doe
Tom Sawyer
Huckleberry Finn
data frame B
name middle_name surname
John `O Doe
Tom Philip Sawyer
Lilly Tomas Finn
The result I seek:
name middle name surname
John `O Doe
Tom Philip Sawyer
You can use df.merge with the parameters how='inner' and on=['name','surname'] (here df is presumably data frame A and df1 is data frame B). To get the correct column order, use df.reindex over axis 1.
df = df.merge(df1, how='inner', on=['name', 'surname'])
df.reindex(['name', 'middle_name', 'surname'], axis=1)
name middle_name surname
0 John `O Doe
1 Tom Philip Sawyer
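A self-contained version of this merge on the sample data might look like the following. The variable names `a`, `b`, and `merged` are illustrative; column selection is used here as an alternative way to set the column order.

```python
import pandas as pd

a = pd.DataFrame({
    "name": ["John", "Tom", "Huckleberry"],
    "surname": ["Doe", "Sawyer", "Finn"],
})
b = pd.DataFrame({
    "name": ["John", "Tom", "Lilly"],
    "middle_name": ["`O", "Philip", "Tomas"],
    "surname": ["Doe", "Sawyer", "Finn"],
})

# Inner join keeps only rows whose (name, surname) pair appears in both frames.
merged = a.merge(b, how="inner", on=["name", "surname"])

# Put the columns in the requested order.
merged = merged[["name", "middle_name", "surname"]]
print(merged)
```

Using how="left" instead would keep every row of `a` and leave the middle name missing where `b` has no match.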
