I have a pandas dataframe
A B
Joe 20
Raul 22
James 30
How can I create sentences in the following format?
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
To get a Series, join the values with +, casting the numeric column(s) to strings first:
s = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
print (s)
0 Mark of Joe is 20
1 Mark of Raul is 22
2 Mark of James is 30
dtype: object
If you need a loop:
for x in df[['A','B']].to_numpy(): # df.to_numpy() is enough if the frame has only columns A and B
    print (f'Mark of {x[0]} is {x[1]}')
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
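As a variation on the loop above, here is a minimal sketch that builds the sentences as a plain Python list by zipping the two columns. The variable name `sentences` is only illustrative and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})

# zip pairs up the two columns row by row; the f-string formats each pair
sentences = [f'Mark of {name} is {mark}' for name, mark in zip(df['A'], df['B'])]

for line in sentences:
    print(line)
```

This avoids converting the frame to a NumPy array and keeps the column names visible in the loop.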
First we replicate your dataframe
df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})
Then we can add a new column to the dataframe containing the sentences you want
df['sentence'] = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
Finally we check that the "sentence" column contains the sentence in the format you wanted
df
A B sentence
Joe 20 Mark of Joe is 20
Raul 22 Mark of Raul is 22
James 30 Mark of James is 30
I have a dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [np.nan, 'Peter Andrews', 'Amy Powers'],
                   })
I am creating a new column which joins id + name using
df['student_id_name'] = df['student_id'].astype(str) + ' ' + df['student_name']
Needed output:
{student_id_name : [71, 63 Peter Andrews, 23 Amy Powers]}
What I get:
{student_id_name : [nan, 63 Peter Andrews, 23 Amy Powers]}
Could you help me get the expected outcome?
Use Series.str.cat with the na_rep parameter, then remove any trailing spaces with Series.str.strip:
df['student_id_name'] = (df['student_id'].astype(str)
                         .str.cat(df['student_name'], sep=' ', na_rep='')
                         .str.strip())
print (df)
student_id student_name student_id_name
0 71 NaN 71
1 63 Peter Andrews 63 Peter Andrews
2 23 Amy Powers 23 Amy Powers
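To see why the original expression produced NaN while str.cat does not, here is a small side-by-side sketch. The standalone Series `ids` and `names` are illustrative stand-ins for the two dataframe columns.

```python
import numpy as np
import pandas as pd

ids = pd.Series([71, 63, 23])
names = pd.Series([np.nan, 'Peter Andrews', 'Amy Powers'])

# With +, a NaN operand makes the whole result NaN for that row
joined_plus = ids.astype(str) + ' ' + names

# str.cat substitutes na_rep for missing values before joining
joined_cat = ids.astype(str).str.cat(names, sep=' ', na_rep='').str.strip()

print(joined_plus.tolist())
print(joined_cat.tolist())
```

The strip at the end matters because the separator is still added when the name is missing, which would otherwise leave a trailing space after the id.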
You can use fillna() to clean up missing/blank values in the dataframe; your original expression will then work. Note that this actually replaces every NaN in the dataframe with the fill value:
import math
import pandas as pd
df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [math.nan, 'Peter Andrews', 'Amy Powers'],
                   })
df = df.fillna('')
df['student_id_name'] = df['student_id'].astype(str) + ' ' + df['student_name']
[Out]:
student_id student_name student_id_name
0 71 71
1 63 Peter Andrews 63 Peter Andrews
2 23 Amy Powers 23 Amy Powers
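If replacing NaN across the whole dataframe is not wanted, one option is to fill only the column being joined, and only inside the expression. This is a sketch under that assumption; the intermediate name `names` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [np.nan, 'Peter Andrews', 'Amy Powers']})

# Fill NaN in a temporary Series; df['student_name'] itself keeps its NaN
names = df['student_name'].fillna('')

# strip() removes the trailing space left behind when the name is empty
df['student_id_name'] = (df['student_id'].astype(str) + ' ' + names).str.strip()

print(df)
```

The original student_name column is left untouched, so later code that checks for missing names still works.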
Here's a simple piece of code, similar to what I am doing. I'm trying to replace the value in the row after each 1 with -1. How would I do that when I don't know where the 1s are, in a dataframe with thousands of rows?
import pandas as pd
df = pd.DataFrame({'Name':['Craig', 'Davis', 'Anthony', 'Tony'], 'Age':[22, 27, 24, 33], 'Employed':[0, 1, 0, 0]})
df
I have this...
Name     Age  Employed
Craig    22   0
Davis    27   1
Anthony  24   0
Tony     33   0
I want something similar to this but iterable through 1000's of rows
Name     Age  Employed
Craig    22   0
Davis    27   1
Anthony  24   -1
Tony     33   0
Use shift to get the next row after a 1:
df.loc[df['Employed'].shift() == 1, 'Employed'] = -1
print(df)
# Output
Name Age Employed
0 Craig 22 0
1 Davis 27 1
2 Anthony 24 -1
3 Tony 33 0
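To make the shift step easier to follow, the sketch below splits it into named intermediate steps. The names `previous` and `mask` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Craig', 'Davis', 'Anthony', 'Tony'],
                   'Age': [22, 27, 24, 33],
                   'Employed': [0, 1, 0, 0]})

# shift() moves every value down one row, so row i sees the value of row i-1;
# the first row has no predecessor and becomes NaN
previous = df['Employed'].shift()

# True only on rows whose previous row holds a 1 (NaN == 1 is False)
mask = previous == 1

df.loc[mask, 'Employed'] = -1
print(df)
```

One thing to keep in mind: if two 1s can appear in consecutive rows, the second 1 is also overwritten with -1. Adding `& (df['Employed'] != 1)` to the mask would be one way to avoid that, if it matters for the data.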
I have a column Name of string data type. I want to take every word except the last one and put the result in a new column FName, which I managed to do like this:
df = pd.DataFrame({'Name': ['John A Sether', 'Robert D Junior', 'Jonny S Rab'],
                   'Age':[32, 34, 36]})
df['FName'] = df['Name'].str.split(' ').str[0:-1]
Name Age FName
0 John A Sether 32 [John, A]
1 Robert D Junior 34 [Robert, D]
2 Jonny S Rab 36 [Jonny, S]
But the new column FName holds lists, which I don't want. I want it to hold strings like: John A.
I tried converting the list to a string, but the result does not seem right.
Any suggestions?
You can use .str.rsplit:
df['FName'] = df['Name'].str.rsplit(n=1).str[0]
Or you can use .str.extract:
df['FName'] = df['Name'].str.extract(r'(\S+\s?\S*)', expand=False)
Or, you can chain .str.join after .str.split:
df['FName'] = df['Name'].str.split().str[:-1].str.join(' ')
Name Age FName
0 John A Sether 32 John A
1 Robert D Junior 34 Robert D
2 Jonny S Rab 36 Jonny S
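The rsplit and split/join variants agree on the three-word names above, but they can differ on other inputs. The sketch below compares them on two hypothetical names that are not in the original data ('Madonna' and 'Mary Ann Lee Smith').

```python
import pandas as pd

names = pd.Series(['John A Sether', 'Madonna', 'Mary Ann Lee Smith'])

# Split once from the right and keep the left part
by_rsplit = names.str.rsplit(n=1).str[0]

# Split on whitespace, drop the last word, join the rest back together
by_join = names.str.split().str[:-1].str.join(' ')

print(by_rsplit.tolist())
print(by_join.tolist())
```

For a single-word name, rsplit keeps the word while split/join returns an empty string, so the choice depends on what should happen when there is no last name to drop.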
I am new to Python pandas. I have a dataframe like the one below:
df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
                   'age': ['25', '22','21','32','37','26','24','30']})
print(df)
       Name age
0  football  25
1    ramesh  22
2    suresh  21
3    pankaj  32
4   cricket  37
5    rakesh  26
6     mohit  24
7    mahesh  30
The "Name" column contains both sport names and the names of sports persons. I want to split it into two different columns like below:
Expected Output:
sports_name  sport_person_name  age
football     ramesh             25
             suresh             22
             pankaj             32
cricket      rakesh             26
             mohit              24
             mahesh             30
If I group by the "Name" column I don't get the expected output, which makes sense because there are no duplicates in "Name". What should I use to get the expected output?
Edit: If you don't want to hardcode the sports names
import numpy as np
import pandas as pd
df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
                   'age': ['', '22','21','32','','26','24','30']})
# turn empty strings into NaN (exact match on the empty string, no regex needed)
df = df.replace('', np.nan)
# rows with a NaN in any column are taken to be the sport rows
nan_rows = df[df.isnull().any(axis=1)]
sports = nan_rows['Name'].tolist()
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill()
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name','sport_person_name','age']]
print (df)
I checked which rows have NaN in the columns other than "Name"; those rows must be the sports names. I built a list of those sports names and then used the solutions below to create the sports_name and sport_person_name columns.
You can use:
#define list of sports
sports = ['football','cricket']
#create NaNs if no sport in Name, forward filling NaNs
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill()
#remove same values in columns sports_name and Name, rename column
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
#change order of columns
df = df[['sports_name','sport_person_name','age']]
print (df)
sports_name sport_person_name age
0 football ramesh 22
1 football suresh 21
2 football pankaj 32
3 cricket rakesh 26
4 cricket mohit 24
5 cricket mahesh 30
A similar solution uses DataFrame.insert, so reordering the columns afterwards is not necessary:
#define list of sports
sports = ['football','cricket']
#rename column by dict
d = {'Name':'sport_person_name'}
df = df.rename(columns=d)
#create NaNs if no sport in Name, forward filling NaNs
df.insert(0, 'sports_name', df['sport_person_name'].where(df['sport_person_name'].isin(sports)).ffill())
#remove same values in columns sports_name and Name
df = df[df['sports_name'] != df['sport_person_name']].reset_index(drop=True)
print (df)
sports_name sport_person_name age
0 football ramesh 22
1 football suresh 21
2 football pankaj 32
3 cricket rakesh 26
4 cricket mohit 24
5 cricket mahesh 30
If you want the sport to appear only once per group, add limit=1 to ffill and replace the remaining NaNs with empty strings:
sports = ['football','cricket']
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill(limit=1).fillna('')
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name','sport_person_name','age']]
print (df)
sports_name sport_person_name age
0 football ramesh 22
1 suresh 21
2 pankaj 32
3 cricket rakesh 26
4 mohit 24
5 mahesh 30
The output you describe is really a dictionary rather than a dataframe.
The dictionary would look like:
{'Sport' : {'Player' : age,'Player2' : age}}
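A minimal sketch of building that nested dictionary, assuming the list of sports is known in advance and that every sport row comes before its players. The names `nested` and `current_sport` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['football', 'ramesh', 'suresh', 'pankaj',
                            'cricket', 'rakesh', 'mohit', 'mahesh'],
                   'age': ['25', '22', '21', '32', '37', '26', '24', '30']})

sports = ['football', 'cricket']  # assumed to be known in advance

nested = {}
current_sport = None
for name, age in zip(df['Name'], df['age']):
    if name in sports:
        # a sport row opens a new group
        current_sport = name
        nested[current_sport] = {}
    else:
        # every other row is a player belonging to the most recent sport
        nested[current_sport][name] = age

print(nested)
```

If the first row were not a sport, the lookup `nested[current_sport]` would raise a KeyError, which is why the ordering assumption matters here.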
If you really want a dataframe, and the sport name always comes before its players:
import pandas as pd
df = pd.DataFrame({'Name': ['football','ramesh','suresh','pankaj','cricket',
                            'rakesh','mohit','mahesh'],
                   'age': ['25', '22','21','32','37','26','24','30']})
# the code below works on a column called sport_person_name
df = df.rename(columns={'Name': 'sport_person_name'})
sports = ['football', 'cricket']
wanted_dict = {}
current_sport = ''
for val in df['sport_person_name']:
    if val in sports:
        current_sport = val
    else:
        wanted_dict[val] = current_sport
# Now you have {name: sport_name, ...}
# look up each player's sport; sport rows are not in the dict and become NaN
df['sports_name'] = df['sport_person_name'].map(wanted_dict)
# drop the sport rows and put the columns in the wanted order
df = df[df['sports_name'].notna()]
df = df[['sports_name', 'sport_person_name', 'age']]
What it should look like:
sports_name  sport_person_name  age
football     ramesh             22
football     suresh             21
football     pankaj             32
cricket      rakesh             26
cricket      mohit              24
cricket      mahesh             30
I'm new to pandas and so am a bit unfamiliar with how it works. I have processed some data and obtained the results I want, however, I am having trouble figuring out how to format the output with print. For instance, I only want to display certain rows of data, as well as putting certain values in ().
From doing this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
This is the output I get by doing print(tallmen):
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
And this is the output I want:
Smith, Bob (M) 6' 1.5"
Wright, Frank (M) 6' 3.2"
I tried to use tallmen as a dictionary, but that gave me an error, so I'm not quite sure what to do. Additionally, is there a way for me to manipulate the height values so that I can reformat them (that is, display them in the ft in format shown above)?
You can create a new column this way:
In [207]: df
Out[207]:
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
In [208]: df['new'] = (
     ...:     df.name + ' (' + df.gender + ') ' +
     ...:     (df.height // 12).astype(int).astype(str) +
     ...:     "' " + (df.height % 12).round(1).astype(str) + '"')  # round(1) hides float noise such as 3.2000000000000028
     ...:
In [209]: df
Out[209]:
id name gender state height new
6 5 Smith, Bob M New York 73.5 Smith, Bob (M) 6' 1.5"
2 7 Wright, Frank M Kentucky 75.2 Wright, Frank (M) 6' 3.2"
My professor helped me figure this out. Really what I needed was to know how to iterate through values in the DataFrame. My solution looks like this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
for i, val in tallmen.iterrows():
    feet = val['height'] // 12
    inches = val['height'] % 12
    # %.1f keeps one decimal of the inches; \" prints the inch mark
    print("%s (%s) %i' %.1f\"" % (val['name'], val['gender'], feet, inches))
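The feet and inches arithmetic can also be pulled into a small helper so it is reusable. This is a sketch only: `format_height` is a hypothetical name, and the two-row frame stands in for the result of the CSV query above.

```python
import pandas as pd

def format_height(inches_total):
    """Turn a height in inches into a feet-and-inches string such as 6' 1.5"."""
    # divmod returns the whole feet and the leftover inches in one step
    feet, inches = divmod(inches_total, 12)
    return f"{int(feet)}' {inches:.1f}\""

# Stand-in for the filtered dataframe from the question
tallmen = pd.DataFrame({'name': ['Smith, Bob', 'Wright, Frank'],
                        'gender': ['M', 'M'],
                        'height': [73.5, 75.2]})

lines = [f"{name} ({gender}) {format_height(height)}"
         for name, gender, height in zip(tallmen['name'],
                                         tallmen['gender'],
                                         tallmen['height'])]

for line in lines:
    print(line)
```

Formatting the inches with one decimal place also hides the small floating-point residue that the modulo can leave behind.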