Can I have a two line caption in pandas dataframe? - python

Create the DataFrame with:
import pandas as pd
df = pd.DataFrame({'Name' : ['John','Harry','Gary','Richard','Anna','Richard','Gary','Richard'], 'Age' : [25,32,37,43,44,56,37,22], 'Zone' : ['East','West','North','South','East','West','North', 'South']})
# keep='last' matches the row indices (0, 1, 4, 6, 7) shown in the output
df = df.drop_duplicates('Name', keep='last')
df.style.set_caption("Team Members Per Zone")
which outputs:
Team Members Per Zone
Name Age Zone
0 John 25 East
1 Harry 32 West
4 Anna 44 East
6 Gary 37 North
7 Richard 22 South
However I'd like it to look like:
Team Members
Per Zone
Name Age Zone
0 John 25 East
1 Harry 32 West
4 Anna 44 East
6 Gary 37 North
7 Richard 22 South

Using an HTML line break (<br>) in the caption works for me in JupyterLab:
df.style.set_caption('This is line one <br> This is line two')

Have you tried it with \n? (Sorry, my reputation is too low to post this as a comment.)

Related

Join and overwrite an attribute in the same dataframe

If I have this dataframe:
# data
data = [['london_1', 10,'london'], ['london_2', 15,'london'], ['london_3', 14,'london'],['london',49,'']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['station', 'info','parent_station'])
So:
station info parent_station
0 london_1 10 london
1 london_2 15 london
2 london_3 14 london
3 london 49
I would like to overwrite the info value of the child station according to the info value of the parent station:
station info parent_station
0 london_1 49 london
1 london_2 49 london
2 london_3 49 london
3 london 49
Is there a simple way to do that?
Additional information:
There could be more than one parent station, but only one parent station per station.
You can map each parent station to its info value, then assign the result only where a parent exists:
df.loc[df.parent_station.ne(''),'info'] = df.parent_station.map(df.set_index('station')['info'])
df
Out[329]:
station info parent_station
0 london_1 49.0 london
1 london_2 49.0 london
2 london_3 49.0 london
3 london 49.0
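As the 49.0 values above suggest, the assignment can leave info as floats, because rows without a parent map to NaN. The following is a minimal sketch, assuming station names are unique and every non-empty parent_station appears in the station column, that fills those gaps with the row's own value and casts back to integers. The names info_by_station and parent_info are illustrative only.

```python
import pandas as pd

data = [['london_1', 10, 'london'], ['london_2', 15, 'london'],
        ['london_3', 14, 'london'], ['london', 49, '']]
df = pd.DataFrame(data, columns=['station', 'info', 'parent_station'])

# Lookup table: station name -> that station's own info value.
# Assumes station names are unique; map() needs a unique index.
info_by_station = df.set_index('station')['info']

# Parent's info for each row; rows with no parent ('') become NaN.
parent_info = df['parent_station'].map(info_by_station)

# Fall back to the row's own info where there is no parent,
# then cast back to integers.
df['info'] = parent_info.fillna(df['info']).astype(int)
print(df)
```

Because the lookup is keyed by station name, this also covers several parent stations at once, as long as each station has at most one parent.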

Combine text using delimiter for duplicate column values

What I'm trying to achieve is to combine the Name values into one comma-delimited value whenever the Country column is duplicated, and to sum the values in the Salary column.
Current input:
df = pd.DataFrame({'Name': {0: 'John',1: 'Steven',2: 'Ibrahim',3: 'George',4: 'Nancy',5: 'Mo',6: 'Khalil'},
'Country': {0: 'USA',1: 'UK',2: 'UK',3: 'France',4: 'Ireland',5: 'Ireland',6: 'Ireland'},
'Salary': {0: 100, 1: 200, 2: 200, 3: 100, 4: 50, 5: 100, 6: 10}})
Name Country Salary
0 John USA 100
1 Steven UK 200
2 Ibrahim UK 200
3 George France 100
4 Nancy Ireland 50
5 Mo Ireland 100
6 Khalil Ireland 10
Expected output:
Rows 1 and 2 of the input are grouped into one because their Country value is duplicated, and their Salary values are summed.
The same goes for rows 4, 5 and 6.
Name Country Salary
0 John USA 100
1 Steven, Ibrahim UK 400
2 George France 100
3 Nancy, Mo, Khalil Ireland 160
This is what I have tried, but I'm not sure how to combine the text in the Name column:
df.groupby(['Country'],as_index=False)['Salary'].sum()
[Out:]
Country Salary
0 France 100
1 Ireland 160
2 UK 400
3 USA 100
Use groupby() and agg():
out = df.groupby('Country', as_index=False).agg({'Name': ', '.join, 'Salary': 'sum'})
If you need only the unique values of the 'Name' column, use:
out = (df.groupby('Country', as_index=False)
         .agg({'Name': lambda x: ', '.join(set(x)), 'Salary': 'sum'}))
Note: a set is unordered, so use pd.unique() in place of set() if the order of the unique values is important.
output of out:
Country Name Salary
0 France George 100
1 Ireland Nancy, Mo, Khalil 160
2 UK Steven, Ibrahim 400
3 USA John 100
Use agg:
df.groupby(['Country'], as_index=False).agg({'Name': ', '.join, 'Salary':'sum'})
To restore the original column order, append [df.columns]. Passing sort=False also keeps the groups in their order of first appearance, which is the row order shown below:
df.groupby(['Country'], as_index=False, sort=False).agg({'Name': ', '.join, 'Salary':'sum'})[df.columns]
Name Country Salary
0 John USA 100
1 Steven, Ibrahim UK 400
2 George France 100
3 Nancy, Mo, Khalil Ireland 160
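The ideas in the answers above can be put together in one runnable sketch. It assumes that repeated names within a country should appear only once and in first-seen order. The helper join_unique is a hypothetical name introduced here, and dict.fromkeys is used as an order-preserving alternative to set().

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Steven', 'Ibrahim', 'George', 'Nancy', 'Mo', 'Khalil'],
    'Country': ['USA', 'UK', 'UK', 'France', 'Ireland', 'Ireland', 'Ireland'],
    'Salary': [100, 200, 200, 100, 50, 100, 10],
})

def join_unique(names):
    # dict.fromkeys drops repeats while keeping first-seen order.
    return ', '.join(dict.fromkeys(names))

# sort=False keeps the countries in their order of first appearance.
out = (
    df.groupby('Country', as_index=False, sort=False)
      .agg({'Name': join_unique, 'Salary': 'sum'})
)

# Restore the original column order (Name, Country, Salary).
out = out[df.columns]
print(out)
```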

How to separate a combined column, but with incongruent data

I'm preparing for a new job where I'll be receiving data submissions of varying quality. Often dates, characters and other fields are combined nonsensically and must be separated before analysis, so I'm thinking ahead about how this might be solved.
Using a fictitious example below, I combined region, rep, and product together.
file['combine'] = file['Region'] + file['Sales Rep'] + file['Product']
   Shift  Region  Sales Rep  Product  Cost per  Units Sold  combine
0  3      East    Shirlene   Pencil   5         71          EastShirlenePencil
1  3      South   Anderson   Folder   17        69          SouthAndersonFolder
2  3      West    Shelli     Folder   17        185         WestShelliFolder
3  3      South   Damion     Binder   30        159         SouthDamionBinder
4  3      West    Shirlene   Stapler  25        41          WestShirleneStapler
Assuming no other data, the question is, how can the 'combine' column be split up?
Many thanks in advance!
If you want space between the strings, you can do:
df["combine"] = df[["Region", "Sales Rep", "Product"]].apply(" ".join, axis=1)
print(df)
Prints:
   Shift  Region  Sales Rep  Product  Cost per  Units Sold  combine
0  3      East    Shirlene   Pencil   5         71          East Shirlene Pencil
1  3      South   Anderson   Folder   17        69          South Anderson Folder
2  3      West    Shelli     Folder   17        185         West Shelli Folder
3  3      South   Damion     Binder   30        159         South Damion Binder
4  3      West    Shirlene   Stapler  25        41          West Shirlene Stapler
Or, if you want to split the already combined string at each capital letter:
import re
df["separated"] = df["combine"].apply(lambda x: re.findall(r"[A-Z][^A-Z]*", x))
print(df)
Prints:
   Shift  Region  Sales Rep  Product  Cost per  Units Sold  combine              separated
0  3      East    Shirlene   Pencil   5         71          EastShirlenePencil   [East, Shirlene, Pencil]
1  3      South   Anderson   Folder   17        69          SouthAndersonFolder  [South, Anderson, Folder]
2  3      West    Shelli     Folder   17        185         WestShelliFolder     [West, Shelli, Folder]
3  3      South   Damion     Binder   30        159         SouthDamionBinder    [South, Damion, Binder]
4  3      West    Shirlene   Stapler  25        41          WestShirleneStapler  [West, Shirlene, Stapler]
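If separate columns are wanted rather than a list, one option is Series.str.extract with one capture group per field. This is only a sketch under a strong assumption: each field is a single word made of one capital letter followed by lowercase letters, as in the sample data. Values such as "McDonald" or multi-word names would not match and would come back as NaN. The frame name parts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "combine": [
        "EastShirlenePencil",
        "SouthAndersonFolder",
        "WestShelliFolder",
        "SouthDamionBinder",
        "WestShirleneStapler",
    ]
})

# One capture group per field: a capital letter followed by lowercase letters.
pattern = r"^([A-Z][a-z]+)([A-Z][a-z]+)([A-Z][a-z]+)$"

# str.extract returns a DataFrame with one column per capture group.
parts = df["combine"].str.extract(pattern)
parts.columns = ["Region", "Sales Rep", "Product"]
print(parts)
```

Rows that do not fit the pattern show up as NaN in every extracted column, which makes malformed submissions easy to find with parts.isna().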

Add a column from an existing dataframe into another between every other column

I'll try my best to explain this, as I had trouble phrasing the title. I have two dataframes, and I would like to insert the columns of df1 into df2 so that they alternate with the columns of df2.
For example, df1 looks like this :
Age City
0 34 Sydney
1 30 Toronto
2 31 Mumbai
3 32 Richmond
After its columns are added to df2, the result should look like this:
Name Age Clicks City Country
0 Ali 34 10 Sydney Australia
1 Lori 30 20 Toronto Canada
2 Asher 31 45 Mumbai United States
3 Lylah 32 33 Richmond United States
In terms of code, I wasn't quite sure where to even start.
# Concatenating the dataframes (incomplete attempt)
for i in range(len(df2)):
    pos = i + 1
    df3 = df2.insert
# df2 = pd.concat([df1, df2], axis=1).sort_index(axis=1)
# df2.columns = np.arange(len(df2.columns))
# print(df2)
I was originally going to run it through a loop, but I wasn't quite sure how to do it. Any help would be appreciated!
You can use itertools.zip_longest to interleave the two sets of column names. For example:
from itertools import zip_longest
# Alternate df2 and df1 column names; drop the None padding from the shorter one
new_columns = [
    v
    for v in (c for a in zip_longest(df2.columns, df1.columns) for c in a)
    if v is not None
]
df_out = pd.concat([df1, df2], axis=1)[new_columns]
print(df_out)
Prints:
Name Age Clicks City Country
0 Ali 34 10 Sydney Australia
1 Lori 30 20 Toronto Canada
2 Asher 31 45 Mumbai United States
3 Lylah 32 33 Richmond United States
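The loop the question was reaching for can also be written with DataFrame.insert. The sketch below is one possible version, not the answer's method. The contents of df2 are not shown in the question, so they are reconstructed here from the expected output (Name, Clicks, Country), which is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Age": [34, 30, 31, 32],
    "City": ["Sydney", "Toronto", "Mumbai", "Richmond"],
})
# Assumed contents of df2, inferred from the expected output.
df2 = pd.DataFrame({
    "Name": ["Ali", "Lori", "Asher", "Lylah"],
    "Clicks": [10, 20, 45, 33],
    "Country": ["Australia", "Canada", "United States", "United States"],
})

df_out = df2.copy()  # leave df2 itself untouched
for i, col in enumerate(df1.columns):
    # After i earlier insertions, the next free slot is at position 2*i + 1.
    # min() appends at the end if df1 has more columns than df2 has gaps.
    pos = min(2 * i + 1, df_out.shape[1])
    df_out.insert(pos, col, df1[col])

print(df_out)
```

DataFrame.insert modifies the frame in place and returns None, which is why the result is not assigned back inside the loop.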

pandas - how to extract top three rows from the dataframe provided

From my pandas DataFrame df I compute the following grouped counts:
grouped = df[(df['X'] == 'venture') & (df['company_code'].isin(['TDS','XYZ','UVW']))].groupby(['company_code','sector'])['X_sector'].count()
The output of this is as follows:
company_code  sector
TDS           Meta              404
              Electrical        333
              Mechanical        533
              Agri              453
XYZ           Sports            331
              Electrical        354
              Movies            375
              Manufacturing     355
UVW           Sports            505
              Robotics          345
              Movies             56
              Health           3263
              Manufacturing     456
              Others            524
Name: X_sector, dtype: int64
What I want is the top three sectors within each company code.
How can I do that?
You will have to chain a groupby here. Consider this example:
import pandas as pd
import numpy as np
np.random.seed(111)
names = [
    'Robert Baratheon',
    'Jon Snow',
    'Daenerys Targaryen',
    'Theon Greyjoy',
    'Tyrion Lannister'
]
df = pd.DataFrame({
    'season': np.random.randint(1, 7, size=100),
    'actor': np.random.choice(names, size=100),
    'appearance': 1
})
s = df.groupby(['season','actor'])['appearance'].count()
print(s.sort_values(ascending=False).groupby('season').head(1)) # <-- head(3) for 3 values
Returns:
season  actor
4       Daenerys Targaryen    7
6       Robert Baratheon      6
3       Robert Baratheon      6
5       Jon Snow              5
2       Theon Greyjoy         5
1       Jon Snow              4
where s is (output clipped at season 4):
season  actor
1       Daenerys Targaryen    2
        Jon Snow              4
        Robert Baratheon      2
        Theon Greyjoy         3
        Tyrion Lannister      4
2       Daenerys Targaryen    4
        Jon Snow              3
        Robert Baratheon      1
        Theon Greyjoy         5
        Tyrion Lannister      3
3       Daenerys Targaryen    2
        Jon Snow              1
        Robert Baratheon      6
        Theon Greyjoy         3
        Tyrion Lannister      3
4       ...
There is no need to make this complicated; a simpler one-liner is possible (using company_code, the grouping column from the question):
Z = df.groupby('company_code')['sector'].value_counts().groupby(level=0).head(3).sort_values(ascending=False).to_frame('counts').reset_index()
Z
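To make the idea concrete on the question's own numbers, here is a minimal sketch that rebuilds the grouped counts shown in the question as a Series and keeps the three largest sectors per company code. The variable names counts and top3 and the column name n are illustrative, and sorting on both keys before head(3) is one possible approach rather than the only one.

```python
import pandas as pd

# Grouped counts copied from the question's output.
index = pd.MultiIndex.from_tuples(
    [
        ("TDS", "Meta"), ("TDS", "Electrical"), ("TDS", "Mechanical"), ("TDS", "Agri"),
        ("XYZ", "Sports"), ("XYZ", "Electrical"), ("XYZ", "Movies"), ("XYZ", "Manufacturing"),
        ("UVW", "Sports"), ("UVW", "Robotics"), ("UVW", "Movies"), ("UVW", "Health"),
        ("UVW", "Manufacturing"), ("UVW", "Others"),
    ],
    names=["company_code", "sector"],
)
counts = pd.Series(
    [404, 333, 533, 453, 331, 354, 375, 355, 505, 345, 56, 3263, 456, 524],
    index=index,
    name="X_sector",
)

top3 = (
    counts.reset_index(name="n")                          # flatten to columns
          .sort_values(["company_code", "n"],
                       ascending=[True, False])           # largest first within each company
          .groupby("company_code")
          .head(3)                                        # first three rows of each group
          .reset_index(drop=True)
)
print(top3)
```

Sorting by company code first keeps each company's rows together in the result, which the global sort_values in the one-liner above does not do.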
