Formatting nlargest output pandas - python

I'm new to pandas and a bit unfamiliar with how it works. I have processed some data and obtained the results I want, but I am having trouble figuring out how to format the output with print. For instance, I only want to display certain rows of data, and I want to put certain values in parentheses.
From doing this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
This is the output I get by doing print(tallmen):
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
And this is the output I want:
Smith, Bob (M) 6' 1.5"
Wright, Frank (M) 6' 3.2"
When I tried to use tallmen as a dictionary, I got an error, so I'm not quite sure what to do. Additionally, is there a way for me to manipulate the height values so that I can reformat them (i.e. display them in the ft/in format shown above)?

You can create a new column this way:
In [207]: df
Out[207]:
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
In [208]: df['new'] = (
     ...:     df.name + ' (' + df.gender + ') ' +
     ...:     (df.height // 12).astype(int).astype(str) +
     ...:     # round(1) avoids float artifacts such as 3.2000000000000028
     ...:     "' " + (df.height % 12).round(1).astype(str) + '"')
     ...:
In [209]: df
Out[209]:
id name gender state height new
6 5 Smith, Bob M New York 73.5 Smith, Bob (M) 6' 1.5"
2 7 Wright, Frank M Kentucky 75.2 Wright, Frank (M) 6' 3.2"

My professor helped me figure this out. Really what I needed was to know how to iterate through values in the DataFrame. My solution looks like this:
import pandas as pd
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
for i, val in tallmen.iterrows():
    feet = val['height'] // 12
    inches = val['height'] % 12
    # %i prints whole feet; %.1f keeps one decimal place of inches
    print("%s (%s) %i' %.1f\"" % (val['name'], val['gender'],
                                   feet, inches))
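The feet-and-inches arithmetic can also be pulled into a small helper so the loop body stays short. The following is only a sketch: the DataFrame is rebuilt by hand from the two rows shown in the question (the real data would come from data_file.csv), and `format_height` is a hypothetical name, not part of pandas.

```python
import pandas as pd

# Two rows copied from the question; stands in for pd.read_csv('data_file.csv').
df = pd.DataFrame({
    "name": ["Smith, Bob", "Wright, Frank"],
    "gender": ["M", "M"],
    "height": [73.5, 75.2],
})


def format_height(total_inches):
    """Hypothetical helper: turn inches into a string such as 6' 1.5\"."""
    feet, inches = divmod(total_inches, 12)
    # .1f hides floating-point noise in the remainder
    return f"{int(feet)}' {inches:.1f}\""


tallmen = df[df["gender"] == "M"].nlargest(2, "height")

# Iterate over the three columns in parallel instead of whole rows.
lines = [
    f"{name} ({gender}) {format_height(height)}"
    for name, gender, height in zip(tallmen["name"], tallmen["gender"], tallmen["height"])
]

for line in lines:
    print(line)
```

Note that nlargest returns the rows sorted from tallest to shortest, so the order of the printed lines follows the heights rather than the original row order.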

Related

How can I loop over this Excel data in Python?

I have an Excel file that contains 5 sheets. I need to create 5 graphs, plotting x against y for each sheet, and I should do it in a loop. How can I do that?
You can load all the sheets:
f = pd.ExcelFile('users.xlsx')
Then extract sheet names:
>>> f.sheet_names
['User_info', 'purchase', 'compound', 'header_row5']
Now you can loop over the sheet names above. For example, to parse one sheet:
>>> f.parse(sheet_name = 'User_info')
User Name Country City Gender Age
0 Forrest Gump USA New York M 50
1 Mary Jane CANADA Tornoto F 30
2 Harry Porter UK London M 20
3 Jean Grey CHINA Shanghai F 30
The loop looks like this:
for name in f.sheet_names:
    df = f.parse(sheet_name = name)
    # do something here
There is no need for separate variables: create the output lists and use this simple loop:
data = pd.ExcelFile("DCB_200_new.xlsx")
l = ['DCB_200_9', 'DCB_200_15', 'DCB_200_23', 'DCB_200_26', 'DCB_200_28']
x = []
y = []
for e in l:
    # column 2 holds x and column 1 holds y; the first two rows are skipped
    x.append(pd.read_excel(data, e, usecols=[2], skiprows=[0,1]))
    y.append(pd.read_excel(data, e, usecols=[1], skiprows=[0,1]))
But, ideally you should be able to load the data only once and loop over the sheets/columns. Please update your question with more info.
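One way to load the data only once is to read every sheet into a dict and then loop over it. The sketch below is an assumption about how that could look: the `sheets` dict is built by hand with made-up values so the example runs without the workbook, and it stands in for what `pd.read_excel("DCB_200_new.xlsx", sheet_name=None)` would return (a dict mapping sheet names to DataFrames). The column names are hypothetical; only the column positions follow the answer above.

```python
import pandas as pd

# Stand-in for: sheets = pd.read_excel("DCB_200_new.xlsx", sheet_name=None)
# The values and column names below are invented for illustration.
sheets = {
    "DCB_200_9": pd.DataFrame({"id": [1, 2], "y": [0.1, 0.2], "x": [10, 20]}),
    "DCB_200_15": pd.DataFrame({"id": [1, 2], "y": [0.3, 0.4], "x": [30, 40]}),
}

x = {}
y = {}
for name, sheet in sheets.items():
    # Select by position, as in the answer: column 2 is x, column 1 is y.
    x[name] = sheet.iloc[:, 2]
    y[name] = sheet.iloc[:, 1]

for name in sheets:
    print(name, list(x[name]), list(y[name]))
```

Each x[name], y[name] pair can then be handed to whatever plotting call is used, one graph per sheet.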

Access data from a pandas DataFrame

I have a pandas dataframe
A B
Joe 20
Raul 22
James 30
How can I create sentences in the following format?
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
To get a Series, join the values with +, casting the numeric column(s) to strings:
s = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
print (s)
0 Mark of Joe is 20
1 Mark of Raul is 22
2 Mark of James is 30
dtype: object
If you need a loop:
for x in df[['A','B']].to_numpy(): #df.to_numpy() if only A,B columns
    print (f'Mark of {x[0]} is {x[1]}')
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
First we replicate your dataframe:
df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})
Then we can add a new column to the dataframe containing the sentences you want:
df['sentence'] = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
Finally we check that the "sentence" column contains the sentences in the format you wanted:
df
A B sentence
Joe 20 Mark of Joe is 20
Raul 22 Mark of Raul is 22
James 30 Mark of James is 30
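If a plain Python list of sentences is enough and no new column is needed, iterating over the two columns in parallel with zip is another option. This is a minimal sketch using the sample values from the question; the variable name `sentences` is only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": ["Joe", "Raul", "James"], "B": [20, 22, 30]})

# zip pairs each name with its mark; the f-string converts the number for us.
sentences = [f"Mark of {name} is {mark}" for name, mark in zip(df["A"], df["B"])]

for sentence in sentences:
    print(sentence)
```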

Extract last word in DataFrame column

This has to be so simple - but I can't figure it out. I have a "name" column within a DataFrame and I'm trying to reverse the order of ['First Name', 'Middle Name', 'Last Name'] to ['Last Name', 'First Name', 'Middle Name'].
Here is my code:
for i in range(2114):
    bb = a['Approved by User'][i].split(" ",2)[2]
    aa = a['Approved by User'][i].split(" ",2)[0]
    a['Full Name]'] = bb+','+aa
Unfortunately I keep getting IndexError: list index out of range with the current code.
This is what I want:
Old column Name| Jessica Mary Simpson
New column Name| Simpson Jessica Mary
One way to do it is to split the string and join it again later in a function (note that this reverses the order of all the words), like so:
import pandas as pd
d = {"name": ["Jessica Mary Simpson"]}
df = pd.DataFrame(d)
a = df.name.str.split()
a = a.apply(lambda x: " ".join(x[::-1])).reset_index()
print(a)
output:
index name
0 0 Simpson Mary Jessica
With your shown samples, you could try the following.
Let's say the following is the df:
fullname
0 Jessica Mary Simpson
1 Ravinder avtar singh
2 John jonny janardan
Here is the code:
df['fullname'].replace(r'^([^ ]*) ([^ ]*) (.*)$', r'\3 \1 \2',regex=True)
OR
df['fullname'].replace(r'^(\S*) (\S*) (.*)$', r'\3 \1 \2',regex=True)
The output will be as follows:
0 Simpson Jessica Mary
1 singh Ravinder avtar
2 janardan John jonny
I think the problem is in your data. Here is a solution using the pandas text functions Series.str.split, indexing, and Series.str.join:
df['Full Name'] = df['Approved by User'].str.split(n=2).str[::-1].str.join(' ')
print (df)
Approved by User Full Name
0 Jessica Mary Simpson Simpson Mary Jessica
1 John Doe Doe John
2 Mary Mary
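If the goal is only to move the last word to the front while keeping the remaining words in their original order (as in the question's "Simpson Jessica Mary" example), a small function applied to the column handles names with any number of words, which also avoids the IndexError on names with fewer than three parts. This is a sketch under that assumption; `last_name_first` is a hypothetical helper name, and the sample rows are taken from the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {"Approved by User": ["Jessica Mary Simpson", "John Doe", "Mary"]}
)


def last_name_first(full_name):
    """Hypothetical helper: move the last word to the front, keep the rest in order."""
    parts = full_name.split()
    # parts[-1:] is a list with the last word (or empty for an empty name)
    return " ".join(parts[-1:] + parts[:-1])


df["Full Name"] = df["Approved by User"].apply(last_name_first)
print(df)
```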

How do I fill a string column using a set in Pandas dataframe?

I have a huge dataset with two specific columns for Sales Person and Manager. I want to make a new column which assigns the sales person name on a different basis.
Let's say that under manager John, I have 4 executives: A, B, C and D.
I want to replace the existing sales persons under John with the executives A, B, C and D in sequence.
Here is what I want to do:
Input -
ID    SalesPerson  Sales Manager
AM12  Oliver       Bren
AM21  Athreyu      John
AM31  Margarita    Fer
AM41  Jenny        Fer
AM66  Omar         John
AM81  Michael      Nati
AM77  Orlan        John
AM87  Erika        Nateran
AM27  Jesus        John
AM69  Randy        John
Output -
ID    SalesPerson  Sales Manager  SalesPerson_new
AM12  Oliver       Bren           oliver
AM21  Athreyu      John           A
AM31  Margarita    Fer            Margarita
AM41  Jenny        Fer            Jenny
AM66  Omar         John           B
AM81  Michael      Nati           Michael
AM77  Orlan        John           C
AM87  Erika        Nateran        Nateran
AM27  Jesus        John           D
AM69  Randy        John           A
We can do this with cumcount and .map.
First we need to build a dictionary that repeats A, B, C, D in cycles of four,
i.e. {0 : 'A', 1 : 'B', 2 : 'C', 3 : 'D', 4 : 'A'}.
We can do this with a helper function and some handy functions from the itertools module.
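from itertools import cycle, zip_longest, islice
from string import ascii_uppercase
import pandas as pd
import numpy as np
def repeatlist(it, count):
    # cycle through `it` endlessly and stop after `count` items
    return islice(cycle(it), count)
# maps 0 -> 'A', 1 -> 'B', 2 -> 'C', 3 -> 'D', 4 -> 'A', ... for 50 positions
mapper = dict(zip_longest(range(50), repeatlist(ascii_uppercase[:4], 50)))
# rows managed by John get the cycled letter; all other rows keep SalesPerson
df['SalesPersonNew'] = np.where(
    df['Sales Manager'].eq('John'),
    df.groupby('Sales Manager')['SalesPerson'].cumcount().map(mapper),
    df['SalesPerson'])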
print(df)
ID SalesPerson Sales Manager SalesPersonNew
0 AM12 Oliver Bren Oliver
1 AM21 Athreyu John A
2 AM31 Margarita Fer Margarita
3 AM41 Jenny Fer Jenny
4 AM66 Omar John B
5 AM81 Michael Nati Michael
6 AM77 Orlan John C
7 AM87 Erika Nateran Erika
8 AM27 Jesus John D
9 AM69 Randy John A
Let's say that your dataframe is the variable df.
First you need to create the new column on your dataframe, which you can initialize with the values already present in the SalesPerson column.
df["SalesPerson_new"] = df["SalesPerson"]
Then you can build a boolean mask that selects the rows where the value of Sales Manager is John, and use it to update only the SalesPerson_new column.
mask = df["Sales Manager"] == "John"
number_of_rows = mask.sum()
df.loc[mask, "SalesPerson_new"] = ["A", "B", "C", "D"][:number_of_rows]
It is important to note that this works only if the list ["A", "B", "C", "D"] has at least as many elements as there are rows selected by the mask. In the sample data John has five rows, so the list would have to be extended or repeated first.
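When there are more matching rows than executives, the letters have to wrap around. One possible way to do that, sketched here with the sample table from the question, is to number John's rows with a cumulative sum over the mask and take that position modulo the number of executives. The names `executives`, `mask` and `position` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": ["AM12", "AM21", "AM31", "AM41", "AM66",
           "AM81", "AM77", "AM87", "AM27", "AM69"],
    "SalesPerson": ["Oliver", "Athreyu", "Margarita", "Jenny", "Omar",
                    "Michael", "Orlan", "Erika", "Jesus", "Randy"],
    "Sales Manager": ["Bren", "John", "Fer", "Fer", "John",
                      "Nati", "John", "Nateran", "John", "John"],
})

executives = np.array(["A", "B", "C", "D"])
mask = df["Sales Manager"].eq("John")

# Running count of John's rows, starting at 0: 0, 1, 2, 3, 4
position = mask.cumsum()[mask] - 1

# Start from the existing names, then overwrite John's rows with the cycled letters.
df["SalesPerson_new"] = df["SalesPerson"]
df.loc[mask, "SalesPerson_new"] = executives[(position % len(executives)).to_numpy()]

print(df)
```

The modulo step is what makes the fifth John row wrap back to A, so the executive list never needs to be as long as the filtered data.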

Get string instead of list in Pandas DataFrame

I have a column Name of string data type. I want to take every word except the last one and put the result in a new column FName, which I managed to do:
df = pd.DataFrame({'Name': ['John A Sether', 'Robert D Junior', 'Jonny S Rab'],
                   'Age': [32, 34, 36]})
df['FName'] = df['Name'].str.split(' ').str[0:-1]
Name Age FName
0 John A Sether 32 [John, A]
1 Robert D Junior 34 [Robert, D]
2 Jonny S Rab 36 [Jonny, S]
But the new column FName contains lists, which I don't want. I want it to look like: John A.
I tried to convert the list to a string, but the result does not seem right.
Any suggestions?
You can use .str.rsplit:
df['FName'] = df['Name'].str.rsplit(n=1).str[0]
Or you can use .str.extract:
df['FName'] = df['Name'].str.extract(r'(\S+\s?\S*)', expand=False)
Or, you can chain .str.join after .str.split:
df['FName'] = df['Name'].str.split().str[:-1].str.join(' ')
Name Age FName
0 John A Sether 32 John A
1 Robert D Junior 34 Robert D
2 Jonny S Rab 36 Jonny S
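The three approaches agree on the three-word names in the question, but they are not equivalent in general. The sketch below compares two of them on a name with four words; "Mary Ann B Smith" is a made-up value added only for this comparison.

```python
import pandas as pd

# First name is from the question; the second is a hypothetical four-word name.
names = pd.Series(["John A Sether", "Mary Ann B Smith"])

# rsplit(n=1) splits once from the right, so everything before the last word is kept.
by_rsplit = names.str.rsplit(n=1).str[0]

# The regex captures at most the first two words.
by_extract = names.str.extract(r"(\S+\s?\S*)", expand=False)

print(by_rsplit.tolist())
print(by_extract.tolist())
```

So if names can have more than three parts, the rsplit (or split and join) variants match the stated goal of dropping only the last word, while the extract pattern keeps just the first two words.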
