Data Manipulation in Python [duplicate] - python

This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
pandas groupby concatenate strings in multiple columns
(1 answer)
Closed 4 years ago.
I am dealing with a data set which has the following fields:
ID Person_Name Person_Country
110 Marc CA
110 Sean CN
111 Matt IN
111 Rob AU
112 Mike US
I intend to group the data as follows:
ID Person_Name Person_Country
110 Marc; Sean CA; CN
111 Matt; Rob IN; AU
112 Mike US
I tried using the built-in functions like .pivot_table() and .unstack(), but they weren't helpful since I am dealing with non-numeric data.
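No answer is included for this question here. One possible approach, sketched below, is to group by ID and join the strings in each group with "; ". It assumes the data is already in a DataFrame with the three columns shown in the question; the variable names df and grouped are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [110, 110, 111, 111, 112],
    "Person_Name": ["Marc", "Sean", "Matt", "Rob", "Mike"],
    "Person_Country": ["CA", "CN", "IN", "AU", "US"],
})

# Group by ID and join the strings of each group with "; "
grouped = df.groupby("ID", as_index=False).agg(
    {"Person_Name": "; ".join, "Person_Country": "; ".join}
)
print(grouped.to_string(index=False))
```

Because "; ".join only works on strings, any non-string column would need to be converted with .astype(str) first.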

Related

Pandas how to make a transpose of data-frame to get values for the remaining two columns [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
I have a DataFrame with these values:
name marks subject
mark 50 math
mark 75 french
tom 25 english
tom 30 Art
luca 100 math
luca 100 art
How can I reshape the DataFrame so it looks like this?
name  math  art  french  english
mark    50           75
tom           30              25
luca   100  100
tried:
df.T and df[['marks','subject']].T
but
This is a pivot, not a transpose. First normalize the case of the subject column (the data contains both "Art" and "art"), then pivot.
df['subject'] = df['subject'].str.lower()
df.pivot(index='name', columns='subject', values='marks')
See here for more info: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot
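For reference, here is the same approach as a self-contained sketch, with the DataFrame rebuilt from the values in the question. The name wide is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["mark", "mark", "tom", "tom", "luca", "luca"],
    "marks": [50, 75, 25, 30, 100, 100],
    "subject": ["math", "french", "english", "Art", "math", "art"],
})

# Lowercase the subjects so "Art" and "art" end up in the same column
df["subject"] = df["subject"].str.lower()

# One row per name, one column per subject; missing combinations become NaN
wide = df.pivot(index="name", columns="subject", values="marks")
print(wide)
```

Note that pivot raises an error if a (name, subject) pair occurs more than once; in that case pivot_table with an aggregation function is the usual alternative.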

Pandas - remove duplicate items completely from dataframe [duplicate]

This question already has answers here:
Drop all duplicate rows across multiple columns in Python Pandas
(8 answers)
Closed 2 years ago.
I want to remove duplicate items completely from a pandas dataframe. For example, I have the dataframe:
location area
0 mountain view 1044ft2
1 palo alto None
2 mountain view 890ft2
3 san carlos 1000ft2
4 belmont None
What I want to do is find the unique values in the location column and remove every item that has a duplicate. The final result should look like this (notice that mountain view is gone):
location area
1 palo alto None
3 san carlos 1000ft2
4 belmont None
Thanks.
Use drop_duplicates with keep=False, which drops every row whose location appears more than once:
df.drop_duplicates(subset='location', keep=False)
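A runnable sketch of the same call on the sample data from the question. The name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "location": ["mountain view", "palo alto", "mountain view",
                 "san carlos", "belmont"],
    "area": ["1044ft2", None, "890ft2", "1000ft2", None],
})

# keep=False removes all rows of a duplicated location,
# instead of keeping the first or last occurrence
result = df.drop_duplicates(subset="location", keep=False)
print(result)
```

The original index labels (1, 3, 4) are preserved; add .reset_index(drop=True) if a fresh index is wanted.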

How can we use sort_values? [duplicate]

This question already has answers here:
how to sort pandas dataframe from one column
(13 answers)
Closed 3 years ago.
                         rating
                           size      mean
title
'Til There Was You (1997)     9  2.333333
1-900 (1994)                  5  2.600000
101 Dalmatians (1996)       109  2.908257
12 Angry Men (1957)         125  4.344000
187 (1997)                   41  3.024390
How can I sort based on the mean column?
Since the columns form a MultiIndex, pass the full column key as a tuple:
>>> df.sort_values([('rating', 'mean')])
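To try this out, the frame from the question can be rebuilt by hand. This is only a sketch: the original was presumably produced by a groupby aggregation, which is not shown, so the construction below is an assumption.

```python
import pandas as pd

# Two-level column index: ("rating", "size") and ("rating", "mean")
columns = pd.MultiIndex.from_tuples([("rating", "size"), ("rating", "mean")])
titles = pd.Index(
    ["'Til There Was You (1997)", "1-900 (1994)", "101 Dalmatians (1996)",
     "12 Angry Men (1957)", "187 (1997)"],
    name="title",
)
df = pd.DataFrame(
    [[9, 2.333333], [5, 2.600000], [109, 2.908257],
     [125, 4.344000], [41, 3.024390]],
    index=titles,
    columns=columns,
)

# Sort by the ("rating", "mean") column, highest mean first
by_mean = df.sort_values([("rating", "mean")], ascending=False)
print(by_mean)
```

Omitting ascending=False gives the default ascending order, as in the answer above.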

python pandas read dataframe and do not include index [duplicate]

This question already has answers here:
Pandas dataframe hide index functionality?
(8 answers)
Closed 3 years ago.
Very simple question.
I am reading an Excel sheet with Python and I want to print the results without the automatic index that pandas adds:
import pandas as pd
x=pd.read_excel(r'2_56_01.276295.xlsx',index_col=None)
print(x[:3])
This prints the first three rows:
blahblah Street Borough
0 55 W 192 ST Bronx
1 2514 EAST TREMONT AV Bronx
2 877 INTERVALE AV Bronx
But I do not want the index.
print(x.to_string(index=False))
should do the trick.
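A small sketch that can be run without the Excel file. The DataFrame below stands in for the result of read_excel, and the split of each printed row into the three columns is an assumption based on the output shown in the question.

```python
import pandas as pd

# Stand-in for the data read from the Excel sheet (column split is assumed)
x = pd.DataFrame({
    "blahblah": [55, 2514, 877],
    "Street": ["W 192 ST", "EAST TREMONT AV", "INTERVALE AV"],
    "Borough": ["Bronx", "Bronx", "Bronx"],
})

# Render the first three rows as text without the index column
text = x[:3].to_string(index=False)
print(text)
```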

Reshaping groupby data into a dataframe [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
I want to create a DataFrame that takes level 1 of the index and turns its values into columns.
My grouping code:
candy_df.groupby(['BAG', 'LOLLIPOP']).agg('count')['STICKID']
Right now my grouping returns this:
BAG     LOLLIPOP
011111  CHOCO        69
        VANILL       33
011112  CHOCO       133
        VANILL      129
I'd like to take index level 1, LOLLIPOP, and turn the different flavors into columns:
BAG CHOCO VANILL
011111 69 33
011112 133 129
candy_df.groupby(['BAG', 'LOLLIPOP'])['STICKID'].count().unstack()
This is also covered by Question 4 in "How can I pivot a dataframe?".
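A minimal sketch of the groupby/unstack step. The rows of candy_df here are made up for illustration (the question does not show the raw data), so the counts differ from those above.

```python
import pandas as pd

# Hypothetical raw data: one row per stick
candy_df = pd.DataFrame({
    "BAG": ["011111", "011111", "011111", "011112", "011112"],
    "LOLLIPOP": ["CHOCO", "CHOCO", "VANILL", "CHOCO", "VANILL"],
    "STICKID": [1, 2, 3, 4, 5],
})

# Count sticks per (BAG, LOLLIPOP), then move the innermost index
# level (LOLLIPOP) into the columns
counts = candy_df.groupby(["BAG", "LOLLIPOP"])["STICKID"].count().unstack()
print(counts)
```

unstack() moves the last index level by default; to turn BAG into a regular column afterwards, call .reset_index() on the result.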
