Pandas - remove duplicate items completely from dataframe [duplicate] - python

This question already has answers here:
Drop all duplicate rows across multiple columns in Python Pandas
(8 answers)
Closed 2 years ago.
I want to remove duplicate items completely from a pandas dataframe. For example, I have the dataframe:
   location       area
0  mountain view  1044ft2
1  palo alto      None
2  mountain view  890ft2
3  san carlos     1000ft2
4  belmont        None
What I want to do is find the unique values in the location column and remove every row whose location appears more than once. The final result should look like this (notice mountain view is gone):
   location    area
1  palo alto   None
3  san carlos  1000ft2
4  belmont     None
Thanks.

Use
df.drop_duplicates(subset='location', keep=False)
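To see what keep=False does, here is a minimal sketch that rebuilds the dataframe from the question (the literal values are copied from the example; the variable name result is only for illustration). With keep=False, every row whose location occurs more than once is dropped, rather than keeping the first or last occurrence.

```python
import pandas as pd

# Sample data copied from the question above.
df = pd.DataFrame({
    "location": ["mountain view", "palo alto", "mountain view", "san carlos", "belmont"],
    "area": ["1044ft2", None, "890ft2", "1000ft2", None],
})

# keep=False drops ALL rows that share a duplicated 'location',
# instead of keeping the first or the last occurrence.
result = df.drop_duplicates(subset="location", keep=False)
print(result)
```

The original index labels (1, 3, 4) are preserved; chain .reset_index(drop=True) if a fresh 0-based index is wanted.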

Related

Get name and count of unique elements based on other column unique element [duplicate]

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 4 months ago.
Given the following sample dataframe:
df =
Car       Country
BMW       Germany
Tesla     USA
BMW       Germany
Mercedes  France
Tesla     USA
For each unique value in the Country column, I want the names of the cars in the Car column and how many times each one appears. Desired output:
Germany:
BMW - 2
USA:
Tesla - 2
France:
Mercedes - 1
I tried to use a pivot table, but the result was a mess.
A classic use for groupby:
df.groupby(["Country", "Car"]).size()
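As a rough sketch of how that groupby plays out on the sample data (the variable name counts and the print format are assumptions for illustration), size() returns a Series indexed by (Country, Car) pairs, which can then be iterated to print something close to the desired output:

```python
import pandas as pd

# Sample data copied from the question above.
df = pd.DataFrame({
    "Car": ["BMW", "Tesla", "BMW", "Mercedes", "Tesla"],
    "Country": ["Germany", "USA", "Germany", "France", "USA"],
})

# One row per (Country, Car) pair, with the number of occurrences.
counts = df.groupby(["Country", "Car"]).size()

# Each index entry is a (country, car) tuple.
for (country, car), n in counts.items():
    print(f"{country}: {car} - {n}")
```

Note that groupby sorts the group keys by default, so the countries come out alphabetically rather than in order of first appearance.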

How to replace row value with only the integers within the value using Pandas? [duplicate]

This question already has answers here:
How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?
(6 answers)
Closed 2 years ago.
I have a bunch of columns like this
District
________
State District 1
State District 2
State District 3
4th State House District
5th State House District
State District 6
...
State District 17
I want to transform each of these so it only contains the integer value:
District
________
1
2
3
4
5
6
...
17
How can this be done with Pandas? Is this even possible with Pandas or would I have to perform this transformation with SQL or some database language?
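A simple solution using str.replace would be to strip off all non-numeric characters. Use a raw string for the pattern and pass regex=True, since newer pandas versions treat the pattern as a literal string by default:
df['District'] = df['District'].str.replace(r'\D', '', regex=True)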
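A small sketch of this on a few of the values from the question, assuming a recent pandas where regex=True has to be passed explicitly. The final astype(int) is an addition of mine, not part of the answer above, for the case where real integers are wanted rather than digit strings.

```python
import pandas as pd

# A few of the sample values from the question above.
df = pd.DataFrame({
    "District": ["State District 1", "4th State House District", "State District 17"],
})

# \D matches any non-digit character; replacing it with '' leaves only digits.
# astype(int) is optional and assumes every row contains at least one digit.
df["District"] = df["District"].str.replace(r"\D", "", regex=True).astype(int)
print(df)
```

If a value could contain several separate numbers, this approach would glue their digits together, so it only fits data shaped like the example.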

Reorder Dataframe through transpose and setting values from other columns (Pandas) [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
Given a dataframe:
Location       Rate     Skill
San Francisco  $56-$64  architect
Albany         $43-$50  architect
San Francisco  $23-$48  tester
I'm trying to turn that into this expected result:
Location       architect  tester
San Francisco  $56-$64    $23-$48
Albany         $43-$50
I thought about transposing on the 'Skill' column and then setting its value to the value of 'Rate', but I'm not entirely sure how this can be done.
If you expect only one row for each Location-Skill combination:
df.groupby(['Location', 'Skill'])['Rate'].first().unstack()
Or you can use pivot:
df.pivot(index='Location', columns='Skill', values='Rate')
Note that groupby keeps only the first row of each combination, while pivot raises an error if any combination has more than one row.
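Here is a minimal sketch of both approaches on the sample data (the names wide and via_groupby are only for illustration). In the groupby version I select the 'Rate' column before unstacking, which is my own tweak so that the result has plain skill names as columns.

```python
import pandas as pd

# Sample data copied from the question above.
df = pd.DataFrame({
    "Location": ["San Francisco", "Albany", "San Francisco"],
    "Rate": ["$56-$64", "$43-$50", "$23-$48"],
    "Skill": ["architect", "architect", "tester"],
})

# pivot: one row per Location, one column per Skill, cells filled from Rate.
wide = df.pivot(index="Location", columns="Skill", values="Rate")

# groupby + unstack: same shape, but tolerates repeated
# (Location, Skill) pairs by keeping the first row of each.
via_groupby = df.groupby(["Location", "Skill"])["Rate"].first().unstack()

print(wide)
```

Albany has no tester row, so that cell comes out as a missing value; something like wide.fillna('') would blank it out for display.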

python pandas read dataframe and do not include index [duplicate]

This question already has answers here:
Pandas dataframe hide index functionality?
(8 answers)
Closed 3 years ago.
Very simple question:
I am reading an Excel sheet with Python and I want to print the results without the automatic index that pandas adds.
import pandas as pd
x = pd.read_excel(r'2_56_01.276295.xlsx', index_col=None)
print(x[:3])
This prints the first 3 rows:
blahblah Street Borough
0 55 W 192 ST Bronx
1 2514 EAST TREMONT AV Bronx
2 877 INTERVALE AV Bronx
But I do not want the index.
print(x.to_string(index=False))
should do the trick.
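A quick sketch without the Excel file: the dataframe below is built by hand, and its column names (Street, Borough) are my guess at the layout from the printed rows, so treat them as hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the Excel data; column names are assumed.
x = pd.DataFrame({
    "Street": ["55 W 192 ST", "2514 EAST TREMONT AV", "877 INTERVALE AV"],
    "Borough": ["Bronx", "Bronx", "Bronx"],
})

# to_string(index=False) renders the frame as text without the index column.
text = x[:3].to_string(index=False)
print(text)
```

This only affects how the frame is printed; the index still exists on x itself.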

Data Manipulation in Python [duplicate]

This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
pandas groupby concatenate strings in multiple columns
(1 answer)
Closed 4 years ago.
I am dealing with a data set which has the following fields:
ID   Person_Name  Person_Country
110  Marc         CA
110  Sean         CN
111  Matt         IN
111  Rob          AU
112  Mike         US
I intend to group the data in the following way:
ID   Person_Name  Person_Country
110  Marc; Sean   CA; CN
111  Matt; Rob    IN; AU
112  Mike         US
I tried using the built-in functions like .pivot_table() and .unstack(), but they weren't helpful since I am dealing with non-numeric data.
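No answer was captured for this one, but the linked duplicates point toward a groupby with a string join. A possible sketch, assuming both text columns should be joined with '; ' (the name grouped is only for illustration):

```python
import pandas as pd

# Sample data copied from the question above.
df = pd.DataFrame({
    "ID": [110, 110, 111, 111, 112],
    "Person_Name": ["Marc", "Sean", "Matt", "Rob", "Mike"],
    "Person_Country": ["CA", "CN", "IN", "AU", "US"],
})

# Group by ID and join the strings of each column with '; '.
# as_index=False keeps ID as a regular column in the result.
grouped = df.groupby("ID", as_index=False).agg({
    "Person_Name": "; ".join,
    "Person_Country": "; ".join,
})
print(grouped)
```

This relies on every value being a string; missing values or numbers in those columns would need converting first.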
