Formatting pandas output data - python

I have a dataframe and want the output to be formatted to save paper for printing.
GameA GameB
Country
London 5 20
London 5 10
London 3 5
London 3 6
London 8
London 40
France 2 20
France 2 22
France 3
France 3
France 3
USA 10
Is there a way to format the dataframe to look like this:
GameA GameB
Country
London 5 London 20
London 5 London 10
London 3 London 5
London 3 London 6
London London 8
London London 40
GameA GameB
France 2 France 20
France 2 France 22
France 3
France 3
France 3
GameA
USA 10

The formatting is a bit off because of how the text results above were copied and pasted (due to the missing values), but this should work with your actual data:
# Print each country's rows as a separate block, with a blank line between blocks
countries = df.index.unique()
for country in countries:
    print(df.loc[df.index == country])
    print(' ')
GameA GameB
Country
London 5 20
London 5 10
London 3 5
London 3 6
London 8 NaN
London 40 NaN
GameA GameB
Country
France 2 20
France 2 22
France 3 NaN
France 3 NaN
France 3 NaN
GameA GameB
Country
USA 10 NaN
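If the aim is a compact printout, one option is to group on the index and render each block with DataFrame.to_string(na_rep=''), so missing cells print as blanks instead of NaN. This is a minimal sketch, assuming (as in the answer's output) that Country is the index and that the lone London and France values belong to GameA; the helper name format_by_country and the rebuilt sample frame are illustrative. GameB prints as floats (e.g. 20.0) because the column holds NaN.

```python
import numpy as np
import pandas as pd


def format_by_country(df: pd.DataFrame) -> str:
    """Hypothetical helper: one text block per index value, blanks for missing cells."""
    blocks = []
    # sort=False keeps countries in their original order of appearance
    for _, group in df.groupby(level=0, sort=False):
        # drop columns that are entirely empty for this country (e.g. USA has no GameB)
        group = group.dropna(axis=1, how='all')
        blocks.append(group.to_string(na_rep=''))
    return '\n\n'.join(blocks)


df = pd.DataFrame(
    {
        'GameA': [5, 5, 3, 3, 8, 40, 2, 2, 3, 3, 3, 10],
        'GameB': [20, 10, 5, 6, np.nan, np.nan, 20, 22, np.nan, np.nan, np.nan, np.nan],
    },
    index=pd.Index(['London'] * 6 + ['France'] * 5 + ['USA'], name='Country'),
)
print(format_by_country(df))
```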

Related

Missing value replacement using mode in pandas in a subgroup of a group

I have the data set below. I need to group the rows by Name and location and fill the missing values using the mode. Specifically, I need to fill the missing values of Tom from the UK: group the Tom/UK rows, and fill the missing values in that group with the most frequent value of that group.
From the matrix below, I need to replace all the NaN values using the mode.
The dataset:
Name location Value
Tom USA 20
Tom UK Nan
Tom USA Nan
Tom UK 20
Jack India Nan
Nihal Africa 30
Tom UK Nan
Tom UK 20
Tom UK 30
Tom UK 20
Tom UK 30
Sam UK 30
Sam UK 30
Try this:
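import pandas as pd

# Mode of Value within the Tom/UK group, as {'Value': {('Tom', 'UK'): mode}}
modes = (
    df[df.Name.eq('Tom') & df.location.eq('UK')]
    .groupby(['Name', 'location'])
    .agg(pd.Series.mode)
    .to_dict()
)
# Index by (Name, location) so fillna can match rows against the dict keys
df = df.set_index(['Name', 'location']).fillna(modes).reset_index()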
Output:
Name location Value
0 Tom USA 20
1 Tom UK 20
2 Tom USA NaN
3 Tom UK 20
4 Jack India NaN
5 Nihal Africa 30
6 Tom UK 20
7 Tom UK 20
8 Tom UK 30
9 Tom UK 20
10 Tom UK 30
11 Sam UK 30
12 Sam UK 30
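The answer above fills only the Tom/UK group. To fill every (Name, location) group with its own mode, a common alternative is groupby(...).transform; this is a swapped-in technique, not the answer's method, and the helper name fill_with_group_mode is hypothetical. The sketch assumes the "Nan" entries in the question are real missing values (np.nan), not strings. Unlike the output above, it also fills Tom/USA (that group's mode is 20), while a group with no values at all (Jack/India) stays NaN.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(df, keys, column):
    """Hypothetical helper: fill NaNs in `column` with the mode of each `keys` group."""

    def fill(s: pd.Series) -> pd.Series:
        # mode() ignores NaN and returns values in sorted order, so ties pick the smallest
        modes = s.mode()
        return s if modes.empty else s.fillna(modes.iloc[0])

    out = df.copy()  # leave the caller's frame untouched
    out[column] = out.groupby(keys)[column].transform(fill)
    return out


df = pd.DataFrame({
    'Name': ['Tom', 'Tom', 'Tom', 'Tom', 'Jack', 'Nihal', 'Tom',
             'Tom', 'Tom', 'Tom', 'Tom', 'Sam', 'Sam'],
    'location': ['USA', 'UK', 'USA', 'UK', 'India', 'Africa', 'UK',
                 'UK', 'UK', 'UK', 'UK', 'UK', 'UK'],
    'Value': [20, np.nan, np.nan, 20, np.nan, 30, np.nan, 20, 30, 20, 30, 30, 30],
})
print(fill_with_group_mode(df, ['Name', 'location'], 'Value'))
```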

Modify duplicated rows in dataframe (Python)

I am working with a dataframe in pandas and I need a way to automatically modify one of its columns that has duplicate values. The column is of type 'object', and I need to rename the duplicate values. The dataframe is the following:
City Year Restaurants
0 New York 2001 20
1 Paris 2000 40
2 New York 1999 41
3 Los Angeles 2004 35
4 Madrid 2001 22
5 New York 1998 33
6 Barcelona 2001 15
As you can see, New York is repeated 3 times. I would like to create a new dataframe in which these values are modified automatically, giving the following result:
City Year Restaurants
0 New York 2001 2001 20
1 Paris 2000 40
2 New York 1999 1999 41
3 Los Angeles 2004 35
4 Madrid 2001 22
5 New York 1998 1998 33
6 Barcelona 2001 15
I would also be happy with "New York 1", "New York 2" and "New York 3". Any option would be good.
Use np.where to modify the City column where the value is duplicated:
import numpy as np
df['City'] = np.where(df['City'].duplicated(keep=False), df['City'] + ' ' + df['Year'].astype(str), df['City'])
A different approach, without NumPy, uses groupby().cumcount(), which gives your alternative "New York 1", "New York 2", ..., but numbers every value, not only the duplicates:
df['City'] = df['City'] + ' ' + df.groupby('City').cumcount().add(1).astype(str)
City Year Restaurants
0 New York 1 2001 20
1 Paris 1 2000 40
2 New York 2 1999 41
3 Los Angeles 1 2004 35
4 Madrid 1 2001 22
5 New York 3 1998 33
6 Barcelona 1 2001 15
To add the number only to the duplicated values, use loc with a boolean mask:
df.loc[df['City'].duplicated(keep=False), 'City'] = df['City'] + ' ' + df.groupby('City').cumcount().add(1).astype(str)
City Year Restaurants
0 New York 1 2001 20
1 Paris 2000 40
2 New York 2 1999 41
3 Los Angeles 2004 35
4 Madrid 2001 22
5 New York 3 1998 33
6 Barcelona 2001 15
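The same result can be packaged so the counter is computed only over the duplicated rows and the input frame is not modified in place. This is a minimal sketch built from the duplicated(keep=False) and cumcount() calls used above; the helper name number_duplicates is hypothetical.

```python
import pandas as pd


def number_duplicates(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: append ' 1', ' 2', ... to values of `column` that repeat."""
    out = df.copy()
    # True for every occurrence of a value that appears more than once
    dup = out[column].duplicated(keep=False)
    # running count within each repeated value, starting at 1
    counter = out[dup].groupby(column).cumcount().add(1).astype(str)
    out.loc[dup, column] = out.loc[dup, column] + ' ' + counter
    return out


df = pd.DataFrame({
    'City': ['New York', 'Paris', 'New York', 'Los Angeles', 'Madrid', 'New York', 'Barcelona'],
    'Year': [2001, 2000, 1999, 2004, 2001, 1998, 2001],
    'Restaurants': [20, 40, 41, 35, 22, 33, 15],
})
print(number_duplicates(df, 'City'))
```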

Merge dataframes on same row

I have Python code that gets links from a dataframe (df1), collects data from each website, and returns the output in a new dataframe.
df1:
id Name link Country Continent
1 Company1 www.link1.com France Europe
2 Company2 www.link2.com France Europe
3 Company3 www.Link3.com France Europe
The output from the code is df2:
link numberOfPPL City
www.link1.com 8 Paris
www.link1.com 9 Paris
www.link2.com 15 Paris
www.link2.com 1 Paris
I want to join these 2 dataframes into one (dfinal). My code:
dfinal = df1.append(df2, ignore_index=True)
I got dfinal:
link numberOfPPL City id Name Country Continent
www.link1.com 8 Paris
www.link1.com 9 Paris
www.link2.com 15 Paris
www.link2.com 1 Paris
www.link1.com 1 Company1 France Continent
..
..
I want my final dataframe to look like this:
link numberOfPPL City id Name Country Continent
www.link1.com 8 Paris 1 Company1 France Europe
www.link1.com 9 Paris 1 Company1 France Europe
www.link2.com 15 Paris 2 Company2 France Europe
www.link2.com 1 Paris 2 Company2 France Europe
Can anyone help, please?
You can merge the two dataframes on 'link':
outputDF = df2.merge(df1, how='left', on=['link'])
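Unlike append, which only stacks rows (and was removed in pandas 2.0 in favour of pd.concat), merge lines rows up on the key. Below is a runnable sketch with the sample frames rebuilt from the question; the validate argument is an optional safeguard added here, not part of the original answer, and makes the merge fail loudly if a link appears more than once in df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'link': ['www.link1.com', 'www.link2.com', 'www.Link3.com'],
    'Country': ['France'] * 3,
    'Continent': ['Europe'] * 3,
})
df2 = pd.DataFrame({
    'link': ['www.link1.com', 'www.link1.com', 'www.link2.com', 'www.link2.com'],
    'numberOfPPL': [8, 9, 15, 1],
    'City': ['Paris'] * 4,
})

# Left merge keeps every scraped row from df2 and attaches the matching df1 columns.
# validate='many_to_one' raises MergeError if 'link' is not unique in df1.
outputDF = df2.merge(df1, how='left', on='link', validate='many_to_one')
print(outputDF)
```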

How can I find unique records in Python by row count?

df:
Country state item
0 Germany Augsburg Car
1 Spain Madrid Bike
2 Italy Milan Steel
3 Paris Lyon Bike
4 Italy Milan Steel
5 Germany Augsburg Car
In the dataframe above, if we count the appearances of each unique record, we get:
Country state item Appeared
0 Germany Augsburg Car 1
1 Spain Madrid Bike 1
2 Italy Milan Steel 1
3 Paris Lyon Bike 1
4 Italy Milan Steel 2
5 Germany Augsburg Car 2
Since rows 4 and 5 appear for the second time, I want to change their item names to differentiate the records. If a record appears more than once in the data, its item should be renamed Item_A for the first appearance, Item_B for the second appearance, and so on.
Output:
Country state item Appeared
0 Germany Augsburg Car_A 1
1 Spain Madrid Bike 1
2 Italy Milan Steel_A 1
3 Paris Lyon Bike 1
4 Italy Milan Steel_B 2
5 Germany Augsburg Car_B 2
You can first get the Appeared column with groupby().cumcount(), then add the suffixes:
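import numpy as np
# mask of rows whose full record appears more than once
duplicates = df.duplicated(keep=False)
# appearance count: 1 for the first occurrence, 2 for the second, ...
df['Appeared'] = df.groupby([*df]).cumcount().add(1)
# add the suffixes (this array supports up to three appearances)
suffixes = np.array(list('ABC'))
df.loc[duplicates, 'item'] = df['item'] + '_' + suffixes[df.Appeared-1]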
Output:
Country state item Appeared
0 Germany Augsburg Car_A 1
1 Spain Madrid Bike 1
2 Italy Milan Steel_A 1
3 Paris Lyon Bike 1
4 Italy Milan Steel_B 2
5 Germany Augsburg Car_B 2
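The suffixes array above covers only three appearances; a fourth copy of a record would raise an IndexError. A sketch that maps the appearance number to a letter from string.ascii_uppercase instead, which covers up to 26 appearances (an assumption about how many repeats to expect). Using Series.map here, rather than indexing a NumPy array, is a swapped-in choice that keeps the suffix as an index-aligned Series.

```python
import string

import pandas as pd

df = pd.DataFrame({
    'Country': ['Germany', 'Spain', 'Italy', 'Paris', 'Italy', 'Germany'],
    'state': ['Augsburg', 'Madrid', 'Milan', 'Lyon', 'Milan', 'Augsburg'],
    'item': ['Car', 'Bike', 'Steel', 'Bike', 'Steel', 'Car'],
})

# True for every row whose full record occurs more than once
duplicates = df.duplicated(keep=False)
# 1 for the first appearance of a record, 2 for the second, ...
# (the column list is taken before 'Appeared' is added)
df['Appeared'] = df.groupby(list(df.columns)).cumcount().add(1)
# 1 -> 'A', 2 -> 'B', ... up to 26 -> 'Z'
suffix = df['Appeared'].map(lambda n: string.ascii_uppercase[n - 1])
df.loc[duplicates, 'item'] = df['item'] + '_' + suffix
print(df)
```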

How to use python to group by two columns, sum them and use one of the columns to sort and get the n highest per group in pandas

I have a dataframe and I'm trying to group by the Name and Destination columns and calculate the sum of the sales for that Destination for the particular Name and then get the top 2 for each name.
data=
Name Origin Destination Sales
John Italy China 2
Dan UK China 3
Dan UK India 2
Sam UK India 5
Sam Italy Malaysia 1
John Italy Malaysia 1
Dan France India 4
Dan Italy China 2
Sam Italy Malaysia 2
John France Malaysia 1
Sam Italy China 2
Dan UK Malaysia 4
Dan France India 2
John France Malaysia 4
John Italy China 4
John UK Malaysia 1
Sam UK China 4
Sam France China 5
I have tried to do this but I keep getting it sorted by the Destination and not the Sales. Below is the code I tried.
data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')
This code gives me this dataframe:
Name Destination Total_Sales
Dan China 5
Dan India 8
John China 6
John Malaysia 7
Sam China 11
Sam India 5
But it is sorted on the wrong column (Destination); I would like to sort by the sum of the sales (Total_Sales).
The expected result I want to achieve is:
Name Destination Total_Sales
Dan India 8
Dan China 5
John Malaysia 7
John China 6
Sam China 11
Sam India 5
Your code:
grouped_df = data.groupby(['Name', 'Destination'])['Sales'].sum().groupby(level=0).head(2).reset_index(name='Total_Sales')
To sort the result:
sorted_df = grouped_df.sort_values(by=['Name','Total_Sales'], ascending=(True,False))
print(sorted_df)
Output:
Name Destination Total_Sales
1 Dan India 8
0 Dan China 5
3 John Malaysia 7
2 John China 6
4 Sam China 11
5 Sam India 5
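In the code above, head(2) runs before the sort, so it keeps the first two destinations per name in alphabetical order. On this data those happen to be the two largest totals, but that is not guaranteed in general. A sketch that sorts by total first and only then takes head(2) per name, with the sample data rebuilt from the question:

```python
import pandas as pd

rows = [
    ('John', 'Italy', 'China', 2), ('Dan', 'UK', 'China', 3), ('Dan', 'UK', 'India', 2),
    ('Sam', 'UK', 'India', 5), ('Sam', 'Italy', 'Malaysia', 1), ('John', 'Italy', 'Malaysia', 1),
    ('Dan', 'France', 'India', 4), ('Dan', 'Italy', 'China', 2), ('Sam', 'Italy', 'Malaysia', 2),
    ('John', 'France', 'Malaysia', 1), ('Sam', 'Italy', 'China', 2), ('Dan', 'UK', 'Malaysia', 4),
    ('Dan', 'France', 'India', 2), ('John', 'France', 'Malaysia', 4), ('John', 'Italy', 'China', 4),
    ('John', 'UK', 'Malaysia', 1), ('Sam', 'UK', 'China', 4), ('Sam', 'France', 'China', 5),
]
data = pd.DataFrame(rows, columns=['Name', 'Origin', 'Destination', 'Sales'])

top2 = (
    data.groupby(['Name', 'Destination'])['Sales'].sum()
    .reset_index(name='Total_Sales')
    # sort first so that head(2) keeps the two largest totals for each Name
    .sort_values(['Name', 'Total_Sales'], ascending=[True, False])
    .groupby('Name')
    .head(2)
    .reset_index(drop=True)
)
print(top2)
```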
