I have a column Name of string data type. I want to take every word except the last one and put the result in a new column FName, which I managed to do like this:
df = pd.DataFrame({'Name': ['John A Sether', 'Robert D Junior', 'Jonny S Rab'],
'Age':[32, 34, 36]})
df['FName'] = df['Name'].str.split(' ').str[0:-1]
Name Age FName
0 John A Sether 32 [John, A]
1 Robert D Junior 34 [Robert, D]
2 Jonny S Rab 36 [Jonny, S]
But the new column FName looks like a list, which I don't want. I want it to be like: John A.
I tried to convert the list to a string, but the result does not seem right.
Any suggestions?
You can use .str.rsplit:
df['FName'] = df['Name'].str.rsplit(n=1).str[0]
Or you can use .str.extract:
df['FName'] = df['Name'].str.extract(r'(\S+\s?\S*)', expand=False)
Or, you can chain .str.join after .str.split:
df['FName'] = df['Name'].str.split().str[:-1].str.join(' ')
Name Age FName
0 John A Sether 32 John A
1 Robert D Junior 34 Robert D
2 Jonny S Rab 36 Jonny S
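The three approaches agree on the three-word sample names but can differ on other inputs. The sketch below is only an illustration: the one-word and four-word names are made up for the comparison and are not part of the original data.

```python
import pandas as pd

# Hypothetical names: one word, three words, four words.
names = pd.Series(['Mary', 'John A Sether', 'Ana Maria de Souza'])

# Split once from the right and keep everything before the last word.
via_rsplit = names.str.rsplit(n=1).str[0]

# Regex: the first word plus, optionally, the second word.
via_extract = names.str.extract(r'(\S+\s?\S*)', expand=False)

# Split into words, drop the last word, join the rest back together.
via_join = names.str.split().str[:-1].str.join(' ')

comparison = pd.DataFrame({
    'Name': names,
    'rsplit': via_rsplit,
    'extract': via_extract,
    'join': via_join,
})
print(comparison)
```

For a one-word name, rsplit and extract keep the word while the split/join chain yields an empty string; for a four-word name, extract keeps only the first two words. Which behaviour is right depends on the data.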
I have a pandas dataframe
A B
Joe 20
Raul 22
James 30
How can I create sentences in the following format?
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
To get a Series, join the values with +, casting the numeric column(s) to strings:
s = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
print (s)
0 Mark of Joe is 20
1 Mark of Raul is 22
2 Mark of James is 30
dtype: object
If you need a loop:
for x in df[['A','B']].to_numpy():  # df.to_numpy() if there are only A, B columns
    print(f'Mark of {x[0]} is {x[1]}')
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
First we replicate your dataframe
df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})
Then we can add a new column to the dataframe containing the sentences you want
df['sentence'] = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
Finally we check that the "sentence" column contains the sentence in the format you wanted
df
A B sentence
Joe 20 Mark of Joe is 20
Raul 22 Mark of Raul is 22
James 30 Mark of James is 30
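As an alternative sketch (not from the answers above), the same column can be built with an f-string inside a list comprehension, which avoids the explicit astype(str) cast. The dataframe is the one replicated above.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})

# zip pairs each name with its mark; the f-string converts the number for us.
df['sentence'] = [
    f'Mark of {name} is {mark}'
    for name, mark in zip(df['A'], df['B'])
]
print(df)
```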
This has to be so simple - but I can't figure it out. I have a "name" column within a DataFrame and I'm trying to reverse the order of ['First Name', 'Middle Name', 'Last Name'] to ['Last Name', 'First Name', 'Middle Name'].
Here is my code:
for i in range(2114):
bb = a['Approved by User'][i].split(" ",2)[2]
aa = a['Approved by User'][i].split(" ",2)[0]
a['Full Name]'] = bb+','+aa
Unfortunately I keep getting IndexError: list index out of range with the current code.
This is what I want:
Old column Name| Jessica Mary Simpson
New column Name| Simpson Jessica Mary
One way to do it is to split the string and join it again later in a function, like so:
import pandas as pd
d = {"name": ["Jessica Mary Simpson"]}
df = pd.DataFrame(d)
a = df.name.str.split()
a = a.apply(lambda x: " ".join(x[::-1])).reset_index()
print(a)
Output:
index name
0 0 Simpson Mary Jessica
With the samples you have shown, you could try the following.
Let's say this is the df:
fullname
0 Jessica Mary Simpson
1 Ravinder avtar singh
2 John jonny janardan
Here is the code:
df['fullname'].replace(r'^([^ ]*) ([^ ]*) (.*)$', r'\3 \1 \2',regex=True)
OR
df['fullname'].replace(r'^(\S*) (\S*) (.*)$', r'\3 \1 \2',regex=True)
The output will be as follows:
0 Simpson Jessica Mary
1 singh Ravinder avtar
2 janardan John jonny
I think the problem is in your data. Here is a solution using the pandas text functions Series.str.split, indexing, and Series.str.join:
df['Full Name'] = df['Approved by User'].str.split(n=2).str[::-1].str.join(' ')
print (df)
Approved by User Full Name
0 Jessica Mary Simpson Simpson Mary Jessica
1 John Doe Doe John
2 Mary Mary
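The IndexError in the question comes from split(" ", 2)[2], which fails whenever a name has fewer than three parts. A minimal sketch of one way to move only the last word to the front, keeping the remaining words in their original order, is shown below. The helper name last_name_first is hypothetical, and the sample rows mirror the ones above.

```python
import pandas as pd

def last_name_first(full_name):
    """Move the last word to the front; leave the other words in order."""
    parts = full_name.split()
    # Slicing never raises IndexError, even for one-word or empty names.
    return ' '.join(parts[-1:] + parts[:-1])

df = pd.DataFrame(
    {'Approved by User': ['Jessica Mary Simpson', 'John Doe', 'Mary']}
)
df['Full Name'] = df['Approved by User'].apply(last_name_first)
print(df)
```

Unlike reversing the whole list of words, this gives Simpson Jessica Mary for the first row, which is the order requested in the question.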
I am trying to extract the name from the data frame.
df['target_name'].head()
3 Minnie
4 Albert [unclear]Gles[/unclear]
5 Eliza [unclear]Gles[/unclear]
6 John Slaltery
7 [unclear]P.[/unclear] Slaltery
23 ? Stewart
34 John Maddison
35 Herbert Olney
36 William Iverach
37 [unclear][/unclear]
38 Peter Blacksmith
39 William Oliver
40 Emily
Name: target_name, dtype: object
This is the output. We just want to get rid of the unnecessary characters and fetch the name.
This is what I have done:
import re
df['target_name'] = df['target_name'].astype(str) #converting it into a string.
I tried these two methods, but both gave me the same output, i.e. NaN:
df['target_name'] = df['target_name'].str.extract('([a-zA-Z ]+)', expand=False).str.strip()
df['target_name3'] = df['target_name'].str.replace(r'\([^)]*\)', '').str.strip()
This seems to work for me.
import pandas as pd
import re
target_name = ["Minnie", "Albert [unclear]Gles[/unclear]",
"Eliza [unclear]Gles[/unclear]",
"[unclear]P.[/unclear] Slaltery", "? Stewart"]
df = pd.DataFrame(target_name, columns = ['target_name'])
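# regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching
df['target_name'] = df['target_name'].astype('str').str.replace(r'\/|\?', '', regex=True).str.replace(r'\[[a-z]+\]', '', regex=True).str.strip()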
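The same cleanup can also be sketched with a single pattern that matches the opening tag, the closing tag, or a question mark. This assumes the only markup in the column is [unclear]...[/unclear] and stray ? characters, as in the sample shown in the question.

```python
import pandas as pd

names = pd.Series([
    "Minnie",
    "Albert [unclear]Gles[/unclear]",
    "[unclear]P.[/unclear] Slaltery",
    "? Stewart",
    "[unclear][/unclear]",
])

# \[/?unclear\] matches both [unclear] and [/unclear]; \? matches a literal ?.
cleaned = names.str.replace(r'\[/?unclear\]|\?', '', regex=True).str.strip()
print(cleaned.tolist())
```

Rows that contained nothing but tags end up as empty strings, so they may need separate handling afterwards.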
I'm new to pandas and so am a bit unfamiliar with how it works. I have processed some data and obtained the results I want, however, I am having trouble figuring out how to format the output with print. For instance, I only want to display certain rows of data, as well as putting certain values in ().
From doing this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
This is the output I get by doing print(tallmen):
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
And this is the output I want:
Smith, Bob (M) 6' 1.5"
Wright, Frank (M) 6' 3.2"
I tried to use tallmen as a dictionary, but that gave me an error, so I'm not quite sure what to do. Additionally, is there a way to manipulate the height values so that I can reformat them (i.e. display them in the ft in format shown above)?
You can create a new column this way:
In [207]: df
Out[207]:
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
In [208]: df['new'] = (
...: df.name + ' (' + df.gender + ') ' +
...: (df.height // 12).astype(int).astype(str) +
...: # round(1) avoids float artifacts such as 3.200000000000003
...: "' " + (df.height % 12).round(1).astype(str) + '"')
...:
In [209]: df
Out[209]:
id name gender state height new
6 5 Smith, Bob M New York 73.5 Smith, Bob (M) 6' 1.5"
2 7 Wright, Frank M Kentucky 75.2 Wright, Frank (M) 6' 3.2"
My professor helped me figure this out. Really what I needed was to know how to iterate through values in the DataFrame. My solution looks like this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
for i, val in tallmen.iterrows():
    feet = val['height'] // 12
    inches = val['height'] % 12
    # %i truncates feet to an integer; %.1f keeps one decimal place for inches
    print("%s (%s) %i' %.1f\"" % (val['name'], val['gender'],
                                  feet, inches))
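The feet-and-inches conversion can also be pulled into a small helper, sketched here with divmod. The function name format_height is hypothetical, and the dataframe is rebuilt by hand from the two rows shown above instead of being read from data_file.csv.

```python
import pandas as pd

def format_height(total_inches):
    """Format a height in inches as feet and inches, e.g. 73.5 -> 6' 1.5\"."""
    feet, inches = divmod(total_inches, 12)
    # One decimal place hides float noise in the remainder.
    return f"{int(feet)}' {inches:.1f}\""

tallmen = pd.DataFrame({
    'name': ['Smith, Bob', 'Wright, Frank'],
    'gender': ['M', 'M'],
    'height': [73.5, 75.2],
})

lines = [
    f"{name} ({gender}) {format_height(height)}"
    for name, gender, height in zip(
        tallmen['name'], tallmen['gender'], tallmen['height']
    )
]
print('\n'.join(lines))
```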
I am trying to import the following csv text:
name, favorites, age, other_hobbies
joe, "[madonna, elvis, u2]", 28, "[football, cooking]"
mary, "[lady gaga, adele]", 36, "[]"
With the following pandas command
file_name = "new_data.csv"
df = pd.read_csv(file_name, sep =",")
print(df)
And I get this result:
name favorites age other_hobbies
joe "[madonna elvis u2]" 28 "[football cooking]"
mary "[lady gaga adele]" 36 "[]" NaN NaN
Why is this happening, and how can I get pandas to read this properly?
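The space after each comma means the opening quote is not the first character of its field, so the parser does not treat the field as quoted and splits on the commas inside it. Pass skipinitialspace along with the sep:
df = pd.read_csv("in.csv", sep=",", skipinitialspace=True)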
print(df)
Output:
name favorites age other_hobbies
0 joe [madonna, elvis, u2] 28 [football, cooking]
1 mary [lady gaga, adele] 36 []
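A self-contained sketch of the same call, reading the CSV text from the question out of an in-memory string instead of a file. The to_list helper is hypothetical and goes one step beyond the answer: it assumes the bracketed cells are plain comma-separated items and turns them into Python lists.

```python
import io
import pandas as pd

csv_text = '''name, favorites, age, other_hobbies
joe, "[madonna, elvis, u2]", 28, "[football, cooking]"
mary, "[lady gaga, adele]", 36, "[]"
'''

# skipinitialspace=True drops the blank after each comma, so the quotes
# start their fields and the commas inside them are kept.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

def to_list(cell):
    """Turn a cell like '[a, b]' into ['a', 'b']; '[]' becomes []."""
    inner = cell.strip('[]')
    return [item.strip() for item in inner.split(',')] if inner else []

df['favorites'] = df['favorites'].apply(to_list)
df['other_hobbies'] = df['other_hobbies'].apply(to_list)
print(df)
```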