Access data from pandas dataframe - python

I have a pandas dataframe
A B
Joe 20
Raul 22
James 30
How can I create sentences in the following format?
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30

For a Series, join the values with +, casting the numeric column(s) to strings:
s = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
print (s)
0 Mark of Joe is 20
1 Mark of Raul is 22
2 Mark of James is 30
dtype: object
If you need a loop:
for x in df[['A','B']].to_numpy(): # df.to_numpy() is enough if there are only columns A and B
    print (f'Mark of {x[0]} is {x[1]}')
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
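As a variation on the loop above, here is a minimal sketch that pairs the two columns with zip instead of going through to_numpy(). The dataframe is rebuilt from the question's data, and the name `sentences` is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})

# zip pairs the values of the two columns row by row
sentences = [f'Mark of {name} is {mark}' for name, mark in zip(df['A'], df['B'])]

for sentence in sentences:
    print(sentence)
```

Collecting the sentences in a list first makes it easy to reuse them later, for example by assigning the list to a new column.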

First we replicate your dataframe
import pandas as pd
df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})
Then we can add a new column to the dataframe containing the sentences you want
df['sentence'] = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
Finally we check that the "sentence" column contains the sentence in the format you wanted
df
A B sentence
Joe 20 Mark of Joe is 20
Raul 22 Mark of Raul is 22
James 30 Mark of James is 30

Related

python - Joining two columns pandas - returning NA if any value is NA, however need to return real join

I have dataframe:
df = pd.DataFrame({'student_id': [71, 63, 23],
'student_name': [nan, 'Peter Andrews', 'Amy Powers'],
})
I am creating a new column which joins id + name using
df['student_id_name'] = df['student_id'].astype(str) + ' ' + df['student_name']
Needed output:
{student_id_name : [71, 63 Peter Andrews, 23 Amy Powers]}
What I get:
{student_id_name : [nan, 63 Peter Andrews, 23 Amy Powers]}
Could you help me get the expected outcome?
Use Series.str.cat with the na_rep parameter, then remove possible trailing spaces with Series.str.strip:
df['student_id_name'] = (df['student_id'].astype(str)
                         .str.cat(df['student_name'], sep=' ', na_rep='')
                         .str.strip())
print (df)
student_id student_name student_id_name
0 71 NaN 71
1 63 Peter Andrews 63 Peter Andrews
2 23 Amy Powers 23 Amy Powers
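To make the difference visible, the following sketch puts the plain + join and the str.cat join side by side on the question's data. The variable names (`ids`, `plus_joined`, `cat_joined`) are illustrative, and np.nan stands in for the question's nan.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [np.nan, 'Peter Andrews', 'Amy Powers']})

ids = df['student_id'].astype(str)

# Plain + propagates the missing value, so the first row becomes NaN
plus_joined = ids + ' ' + df['student_name']

# str.cat substitutes na_rep for the missing name; strip drops the trailing space
cat_joined = ids.str.cat(df['student_name'], sep=' ', na_rep='').str.strip()

print(plus_joined.tolist())
print(cat_joined.tolist())
```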
You can use fillna() to clean up missing/blank values in the dataframe. Then your original expression will work. Note that this actually replaces NaN in the dataframe with the fill value you pass:
import math
import pandas as pd
df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [math.nan, 'Peter Andrews', 'Amy Powers'],
                  })
df = df.fillna('')
df['student_id_name'] = df['student_id'].astype(str) + ' ' + df['student_name']
[Out]:
student_id student_name student_id_name
0 71 71
1 63 Peter Andrews 63 Peter Andrews
2 23 Amy Powers 23 Amy Powers
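If replacing NaN across the whole dataframe is not wanted, one possible variation is to fill only a temporary copy of the name column. This is a sketch under that assumption; `names` is a hypothetical helper variable, and str.strip is added to remove the trailing space left in the first row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [np.nan, 'Peter Andrews', 'Amy Powers']})

# Fill only a temporary copy of the name column; df['student_name'] keeps its NaN
names = df['student_name'].fillna('')
df['student_id_name'] = (df['student_id'].astype(str) + ' ' + names).str.strip()

print(df)
```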

How to replace a row value in a pandas dataframe after a desired number is achieved?

Here's a simple piece of code, similar to what I am doing. I'm trying to replace the value in the row after a 1 with -1. But how would I do it if I don't know where the 1s are in a dataframe with thousands of rows?
import pandas as pd
df = pd.DataFrame({'Name':['Craig', 'Davis', 'Anthony', 'Tony'], 'Age':[22, 27, 24, 33], 'Employed':[0, 1, 0, 0]})
df
I have this...
Name     Age  Employed
Craig    22   0
Davis    27   1
Anthony  24   0
Tony     33   0
I want something like this, but it has to work across thousands of rows
Name     Age  Employed
Craig    22   0
Davis    27   1
Anthony  24   -1
Tony     33   0
Use shift to get the next row after a 1:
df.loc[df['Employed'].shift() == 1, 'Employed'] = -1
print(df)
# Output
Name Age Employed
0 Craig 22 0
1 Davis 27 1
2 Anthony 24 -1
3 Tony 33 0
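To see what shift does here, the sketch below builds the boolean mask first and prints it before assigning. The name `after_one` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Craig', 'Davis', 'Anthony', 'Tony'],
                   'Age': [22, 27, 24, 33],
                   'Employed': [0, 1, 0, 0]})

# shift() moves every value down one row, so the mask is True on the row after a 1
after_one = df['Employed'].shift() == 1
print(after_one.tolist())

# Assign in place through .loc; do not rebind df to the result of the assignment
df.loc[after_one, 'Employed'] = -1
print(df)
```

If two 1s can be adjacent, the second one would also be overwritten; adding `& (df['Employed'] != 1)` to the mask is one way to avoid that, should it matter for your data.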

Get string instead of list in Pandas DataFrame

I have a column Name of string data type. I want to take all the words except the last one and put them in a new column FName, which I managed to do:
df = pd.DataFrame({'Name': ['John A Sether', 'Robert D Junior', 'Jonny S Rab'],
'Age':[32, 34, 36]})
df['FName'] = df['Name'].str.split(' ').str[0:-1]
Name Age FName
0 John A Sether 32 [John, A]
1 Robert D Junior 34 [Robert, D]
2 Jonny S Rab 36 [Jonny, S]
But the new column FName contains lists, which I don't want. I want it to be a string, like: John A.
I tried converting the list to a string, but it does not seem to be right.
Any suggestions?
You can use .str.rsplit:
df['FName'] = df['Name'].str.rsplit(n=1).str[0]
Or you can use .str.extract:
df['FName'] = df['Name'].str.extract(r'(\S+\s?\S*)', expand=False)
Or, you can chain .str.join after .str.split:
df['FName'] = df['Name'].str.split().str[:-1].str.join(' ')
Name Age FName
0 John A Sether 32 John A
1 Robert D Junior 34 Robert D
2 Jonny S Rab 36 Jonny S
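The following sketch checks the rsplit and split/join variants on names with two, three and four words. The sample names are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical names with a varying number of words
names = pd.Series(['Ann Lee', 'John A Sether', 'Mary Jo K Smith'])

# Split once from the right and keep the left part
by_rsplit = names.str.rsplit(n=1).str[0]

# Split on whitespace, drop the last token, join the rest back together
by_join = names.str.split().str[:-1].str.join(' ')

print(by_rsplit.tolist())
print(by_join.tolist())
```

Both variants agree on these inputs. The str.extract pattern shown above captures at most two tokens, so it fits the three-word names in the question but would return 'Mary Jo' for the last name here.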

How to split single column of pandas dataframe into multiple columns with group?

I am new to python pandas. I have one dataframe like below:
df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
'age': ['25', '22','21','32','37','26','24','30']})
print(df)
Name age
0 football 25
1 ramesh 22
2 suresh 21
3 pankaj 32
4 cricket 37
5 rakesh 26
6 mohit 24
7 mahesh 30
The "Name" column contains both the sport names and the names of the sports persons. I want to split it into two different columns like below:
Expected Output:
sports_name sport_person_name age
football ramesh 25
suresh 22
pankaj 32
cricket rakesh 26
mohit 24
mahesh 30
If I group by the "Name" column I don't get the expected output, which is unsurprising because there are no duplicates in "Name". What should I use to get the expected output?
Edit: If you don't want to hardcode the sports names
import numpy as np
import pandas as pd
df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
                   'age': ['', '22','21','32','','26','24','30']})
# treat empty strings as missing values
df = df.replace('', np.nan)
# rows with a missing value in any column hold the sports names
nan_rows = df[df.isnull().any(axis=1)]
sports = nan_rows['Name'].tolist()
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill()
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name','sport_person_name','age']]
print (df)
I just checked which rows contain NaN values in the columns other than "Name"; those rows are definitely sports names. I collected those sports names in a list and used the solutions below to create the sports_name and sport_person_name columns.
You can use:
#define list of sports
sports = ['football','cricket']
#create NaNs if no sport in Name, forward filling NaNs
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill()
#remove same values in columns sports_name and Name, rename column
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
#change order of columns
df = df[['sports_name','sport_person_name','age']]
print (df)
sports_name sport_person_name age
0 football ramesh 22
1 football suresh 21
2 football pankaj 32
3 cricket rakesh 26
4 cricket mohit 24
5 cricket mahesh 30
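The where/ffill step is the core of this solution, so here is a small sketch that prints the intermediate results. The names `is_sport`, `only_sports` and `filled` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['football', 'ramesh', 'suresh', 'pankaj',
                            'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['25', '22', '21', '32', '37', '26', '24', '30']})
sports = ['football', 'cricket']

# True on the rows whose Name is one of the sports
is_sport = df['Name'].isin(sports)

# where() keeps the sport names and puts NaN everywhere else
only_sports = df['Name'].where(is_sport)

# ffill() copies each sport name down to the rows below it
filled = only_sports.ffill()

print(only_sports.tolist())
print(filled.tolist())
```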
A similar solution uses DataFrame.insert, so reordering the columns is not necessary:
#define list of sports
sports = ['football','cricket']
#rename column by dict
d = {'Name':'sport_person_name'}
df = df.rename(columns=d)
#create NaNs if no sport in Name, forward filling NaNs
df.insert(0, 'sports_name', df['sport_person_name'].where(df['sport_person_name'].isin(sports)).ffill())
#remove same values in columns sports_name and Name
df = df[df['sports_name'] != df['sport_person_name']].reset_index(drop=True)
print (df)
sports_name sport_person_name age
0 football ramesh 22
1 football suresh 21
2 football pankaj 32
3 cricket rakesh 26
4 cricket mohit 24
5 cricket mahesh 30
If you want each sport shown only once, add limit=1 to ffill and replace the remaining NaNs with empty strings:
sports = ['football','cricket']
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill(limit=1).fillna('')
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name','sport_person_name','age']]
print (df)
sports_name sport_person_name age
0 football ramesh 22
1 suresh 21
2 pankaj 32
3 cricket rakesh 26
4 mohit 24
5 mahesh 30
The output you want is really a dictionary, not a dataframe.
The dictionary would look like:
{'Sport' : {'Player' : age,'Player2' : age}}
If you really want a dataframe, and the sport name always comes before its players:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['football','ramesh','suresh','pankaj','cricket',
                            'rakesh','mohit','mahesh'],
                   'age': ['25', '22','21','32','37','26','24','30']})
# the code below expects the column to be called sport_person_name
df = df.rename(columns={'Name': 'sport_person_name'})
sports = ['football', 'cricket']
wanted_dict = {}
current_sport = ''
for val in df['sport_person_name']:
    if val in sports:
        current_sport = val
    else:
        wanted_dict[val] = current_sport
# Now you have {name: sport_name, ...}
# mark the sport rows with the placeholder 'sport', look up everyone else in wanted_dict
df['sports_name'] = np.where(df['sport_person_name'].isin(sports),
                             'sport',
                             df['sport_person_name'].map(wanted_dict))
df = df[df['sports_name'] != 'sport']
df = df[['sports_name', 'sport_person_name', 'age']]
What it should look like:
sports_name sport_person_name age
football ramesh 25
football suresh 22
football pankaj 32
cricket rakesh 26
cricket mohit 24
cricket mahesh 30
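The nested dictionary mentioned at the start of this answer is described but not built. One possible way to get it from the split dataframe is sketched below; it assumes the three-column layout produced by the other answers, and `nested` is an illustrative name.

```python
import pandas as pd

# Assumed input: the dataframe after the sport names have been split out
df = pd.DataFrame({'sports_name': ['football', 'football', 'football',
                                   'cricket', 'cricket', 'cricket'],
                   'sport_person_name': ['ramesh', 'suresh', 'pankaj',
                                         'rakesh', 'mohit', 'mahesh'],
                   'age': ['22', '21', '32', '26', '24', '30']})

# One inner dict of {player: age} per sport; sort=False keeps the original order
nested = {
    sport: dict(zip(group['sport_person_name'], group['age']))
    for sport, group in df.groupby('sports_name', sort=False)
}

print(nested)
```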

Formatting nlargest output pandas

I'm new to pandas and so am a bit unfamiliar with how it works. I have processed some data and obtained the results I want, however, I am having trouble figuring out how to format the output with print. For instance, I only want to display certain rows of data, as well as putting certain values in ().
From doing this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
This is the output I get by doing print(tallmen):
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
And this is the output I want:
Smith, Bob (M) 6' 1.5"
Wright, Frank (M) 6' 3.2"
When I tried to use tallmen as a dictionary, it gave me an error, so I'm not quite sure what to do. Additionally, is there a way for me to manipulate the height values so that I can reformat them (i.e. display them in the ft in format shown above)?
You can create a new column this way:
In [207]: df
Out[207]:
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
In [208]: df['new'] = (
...: df.name + ' (' + df.gender + ') ' +
...: (df.height // 12).astype(int).astype(str) +
...: "' " + (df.height % 12).round(1).astype(str) + '"')
...:
In [209]: df
Out[209]:
id name gender state height new
6 5 Smith, Bob M New York 73.5 Smith, Bob (M) 6' 1.5"
2 7 Wright, Frank M Kentucky 75.2 Wright, Frank (M) 6' 3.2"
My professor helped me figure this out. Really what I needed was to know how to iterate through values in the DataFrame. My solution looks like this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
for i, val in tallmen.iterrows():
    feet = val['height'] // 12
    inches = val['height'] % 12
    # %.1f keeps the fractional inch (1.5, 3.2) that %i would truncate
    print("%s (%s) %i' %.1f\"" % (val['name'], val['gender'], feet, inches))
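The feet/inches arithmetic can also be pulled into a small helper, sketched here with divmod. The function name `format_height` is hypothetical, and the two rows are copied from the question's output rather than read from data_file.csv.

```python
import pandas as pd

def format_height(inches_total):
    """Format a height given in inches as feet and inches."""
    # divmod returns the whole feet and the remaining inches in one step
    feet, inches = divmod(inches_total, 12)
    return f"{int(feet)}' {inches:.1f}\""

# Rows copied from the question's printed output
tallmen = pd.DataFrame({'name': ['Smith, Bob', 'Wright, Frank'],
                        'gender': ['M', 'M'],
                        'height': [73.5, 75.2]})

lines = [f"{name} ({gender}) {format_height(height)}"
         for name, gender, height in zip(tallmen['name'],
                                         tallmen['gender'],
                                         tallmen['height'])]

for line in lines:
    print(line)
```

Formatting the inches with one decimal place also hides the floating-point remainder that 75.2 % 12 would otherwise show.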
