Get string instead of list in Pandas DataFrame - python

I have a column Name of string data type. I want to take every word except the last one and put the result in a new column FName, which I managed to do with:
df = pd.DataFrame({'Name': ['John A Sether', 'Robert D Junior', 'Jonny S Rab'],
                   'Age': [32, 34, 36]})
df['FName'] = df['Name'].str.split(' ').str[0:-1]
              Name  Age        FName
0    John A Sether   32    [John, A]
1  Robert D Junior   34  [Robert, D]
2      Jonny S Rab   36   [Jonny, S]
But the new column FName looks like a list, which I don't want. I want it to be like: John A.
I tried converting the list to a string, but the result does not seem right.
Any suggestions?

You can use .str.rsplit:
df['FName'] = df['Name'].str.rsplit(n=1).str[0]
Or you can use .str.extract:
df['FName'] = df['Name'].str.extract(r'(\S+\s?\S*)', expand=False)
Or, you can chain .str.join after .str.split:
df['FName'] = df['Name'].str.split().str[:-1].str.join(' ')
              Name  Age     FName
0    John A Sether   32    John A
1  Robert D Junior   34  Robert D
2      Jonny S Rab   36   Jonny S
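The approaches above agree on three-word names but can differ on shorter ones. The following is a minimal sketch of that difference; the two-word and one-word sample names are made up for illustration and are not from the question.

```python
import pandas as pd

# Sample names of different lengths (the last two are illustrative assumptions).
df = pd.DataFrame({"Name": ["John A Sether", "Jonny Rab", "Cher"]})

# Split once from the right and keep the left part.
via_rsplit = df["Name"].str.rsplit(n=1).str[0]

# Split on whitespace, drop the last word, and join the rest.
via_join = df["Name"].str.split().str[:-1].str.join(" ")

print(via_rsplit.tolist())  # ['John A', 'Jonny', 'Cher']
print(via_join.tolist())    # ['John A', 'Jonny', '']
```

For a single-word name, rsplit keeps the word itself while split/join leaves an empty string, so the choice depends on which result you want for such rows.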

Related

Access data from a pandas DataFrame

I have a pandas dataframe
A B
Joe 20
Raul 22
James 30
How can I create sentences in the following format?
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
To get a Series, concatenate the values with +, casting the numeric column(s) to strings first:
s = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
print (s)
0 Mark of Joe is 20
1 Mark of Raul is 22
2 Mark of James is 30
dtype: object
If you need a loop:
for x in df[['A','B']].to_numpy():  # df.to_numpy() if there are only columns A and B
    print(f'Mark of {x[0]} is {x[1]}')
Mark of Joe is 20
Mark of Raul is 22
Mark of James is 30
First we replicate your dataframe
df = pd.DataFrame({'A': ['Joe', 'Raul', 'James'], 'B': [20, 22, 30]})
Then we can add a new column to the DataFrame containing the sentences you want
df['sentence'] = 'Mark of ' + df.A + ' is ' + df.B.astype(str)
Finally we check that the "sentence" column contains the sentence in the format you wanted
df
A B sentence
Joe 20 Mark of Joe is 20
Raul 22 Mark of Raul is 22
James 30 Mark of James is 30
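As an alternative sketch, the same sentences can be built with an f-string inside a list comprehension, which avoids the explicit astype(str) cast. The variable name sentences is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["Joe", "Raul", "James"], "B": [20, 22, 30]})

# zip walks both columns in parallel; the f-string formats the number for us.
sentences = [f"Mark of {name} is {mark}" for name, mark in zip(df["A"], df["B"])]
df["sentence"] = sentences

print("\n".join(sentences))
```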

Extract last word in DataFrame column

This has to be so simple - but I can't figure it out. I have a "name" column within a DataFrame and I'm trying to reverse the order of ['First Name', 'Middle Name', 'Last Name'] to ['Last Name', 'First Name', 'Middle Name'].
Here is my code:
for i in range(2114):
    bb = a['Approved by User'][i].split(" ",2)[2]
    aa = a['Approved by User'][i].split(" ",2)[0]
    a['Full Name]'] = bb+','+aa
Unfortunately I keep getting IndexError: list index out of range with the current code.
This is what I want:
Old column Name| Jessica Mary Simpson
New column Name| Simpson Jessica Mary
One way to do it is to split the string and join the pieces back together in reverse order, like so:
import pandas as pd
d = {"name": ["Jessica Mary Simpson"]}
df = pd.DataFrame(d)
a = df.name.str.split()
a = a.apply(lambda x: " ".join(x[::-1])).reset_index()
print(a)
Output:
   index                  name
0      0  Simpson Mary Jessica
With the samples you have shown, you could try the following.
Let's say this is the df:
fullname
0 Jessica Mary Simpson
1 Ravinder avtar singh
2 John jonny janardan
Here is the code:
df['fullname'].replace(r'^([^ ]*) ([^ ]*) (.*)$', r'\3 \1 \2',regex=True)
OR
df['fullname'].replace(r'^(\S*) (\S*) (.*)$', r'\3 \1 \2',regex=True)
The output will be as follows:
0 Simpson Jessica Mary
1 singh Ravinder avtar
2 janardan John jonny
I think the problem is in your data: an IndexError from split(" ",2)[2] means some names have fewer than three words. Here is a solution using the pandas text functions Series.str.split, indexing and Series.str.join:
df['Full Name'] = df['Approved by User'].str.split(n=2).str[::-1].str.join(' ')
print (df)
       Approved by User             Full Name
0  Jessica Mary Simpson  Simpson Mary Jessica
1              John Doe              Doe John
2                  Mary                  Mary
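Reversing all the words gives "Simpson Mary Jessica", whereas the question asks for "Simpson Jessica Mary". A minimal sketch that moves only the last word to the front, and that also tolerates names with fewer than three words, could look like this. The helper name last_name_first is hypothetical, and the sample rows are the ones used in the answer above.

```python
import pandas as pd

def last_name_first(name):
    """Move the last word to the front and keep the other words in order."""
    parts = name.split()
    # Slicing never raises IndexError, even for one-word or empty names.
    return " ".join(parts[-1:] + parts[:-1])

df = pd.DataFrame(
    {"Approved by User": ["Jessica Mary Simpson", "John Doe", "Mary"]}
)
df["Full Name"] = df["Approved by User"].apply(last_name_first)

print(df["Full Name"].tolist())
# ['Simpson Jessica Mary', 'Doe John', 'Mary']
```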

Regular Expressions in a dataframe Python

I am trying to extract the name from the data frame.
df['target_name'].head()
3 Minnie
4 Albert [unclear]Gles[/unclear]
5 Eliza [unclear]Gles[/unclear]
6 John Slaltery
7 [unclear]P.[/unclear] Slaltery
23 ? Stewart
34 John Maddison
35 Herbert Olney
36 William Iverach
37 [unclear][/unclear]
38 Peter Blacksmith
39 William Oliver
40 Emily
Name: target_name, dtype: object
This is the output. We just want to get rid of the unnecessary characters and fetch the name.
This is what I have done:
import re
df['target_name'] = df['target_name'].astype(str) #converting it into a string.
I tried these two methods, but both gave me the same output, i.e. NaN:
df['target_name'] = df['target_name'].str.extract('([a-zA-Z ]+)', expand=False).str.strip()
df['target_name3'] = df['target_name'].str.replace(r'\([^)]*\)', '').str.strip()
This seems to work for me.
import pandas as pd
import re
target_name = ["Minnie", "Albert [unclear]Gles[/unclear]",
"Eliza [unclear]Gles[/unclear]",
"[unclear]P.[/unclear] Slaltery", "? Stewart"]
df = pd.DataFrame(target_name, columns = ['target_name'])
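# regex=True is required on recent pandas versions, where str.replace treats the pattern as a literal string by default
df['target_name'] = df['target_name'].astype('str').str.replace(r'\/|\?', '', regex=True).str.replace(r'\[[a-z]+\]', '', regex=True).str.strip()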
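The same cleanup can also be sketched with a single pattern that targets the [unclear] and [/unclear] markers and the question mark directly. This assumes those are the only unwanted tokens in the column, which the question does not confirm.

```python
import pandas as pd

names = pd.Series([
    "Minnie",
    "Albert [unclear]Gles[/unclear]",
    "[unclear]P.[/unclear] Slaltery",
    "? Stewart",
    "[unclear][/unclear]",
])

# \[/?unclear\] matches both the opening and the closing marker; \? matches a literal "?".
cleaned = names.str.replace(r"\[/?unclear\]|\?", "", regex=True).str.strip()

print(cleaned.tolist())
# ['Minnie', 'Albert Gles', 'P. Slaltery', 'Stewart', '']
```

Rows that contained nothing but markers end up as empty strings, which you may want to replace with a missing value afterwards.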

Formatting nlargest output pandas

I'm new to pandas and so am a bit unfamiliar with how it works. I have processed some data and obtained the results I want, however, I am having trouble figuring out how to format the output with print. For instance, I only want to display certain rows of data, as well as putting certain values in ().
From doing this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
This is the output I get by doing print(tallmen):
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
And this is the output I want:
Smith, Bob (M) 6' 1.5"
Wright, Frank (M) 6' 3.2"
When I tried to use tallmen as a dictionary, I got an error, so I'm not quite sure what to do. Additionally, is there a way to manipulate the height values so that I can reformat them (i.e. display them in the ft in format shown above)?
You can create a new column this way:
In [207]: df
Out[207]:
id name gender state height
6 5 Smith, Bob M New York 73.5
2 7 Wright, Frank M Kentucky 75.2
In [208]: df['new'] = (
     ...:     df.name + ' (' + df.gender + ') ' +
     ...:     (df.height // 12).astype(int).astype(str) +
     ...:     # round(1) avoids float noise such as 3.200000000000003
     ...:     "' " + (df.height % 12).round(1).astype(str) + '"')
     ...:
In [209]: df
Out[209]:
id name gender state height new
6 5 Smith, Bob M New York 73.5 Smith, Bob (M) 6' 1.5"
2 7 Wright, Frank M Kentucky 75.2 Wright, Frank (M) 6' 3.2"
My professor helped me figure this out. Really what I needed was to know how to iterate through values in the DataFrame. My solution looks like this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
for i, val in tallmen.iterrows():
    feet = val['height'] // 12
    inches = val['height'] % 12
    # %.1f keeps one decimal of the inches; \" prints the closing inch mark
    print("%s (%s) %i' %.1f\"" % (val['name'], val['gender'],
                                  feet, inches))
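The feet-and-inches conversion can also be pulled into a small helper, which keeps the formatting in one place. This is a sketch only: the name format_height is hypothetical, and the DataFrame is built inline from the two rows shown in the question rather than read from data_file.csv.

```python
import pandas as pd

def format_height(inches):
    """Format a height given in inches as feet and inches, e.g. 6' 1.5"."""
    feet, rest = divmod(inches, 12)
    # .1f hides floating-point noise in the remainder.
    return f"{int(feet)}' {rest:.1f}\""

tallmen = pd.DataFrame({
    "name": ["Smith, Bob", "Wright, Frank"],
    "gender": ["M", "M"],
    "height": [73.5, 75.2],
})

lines = [
    f"{name} ({gender}) {format_height(height)}"
    for name, gender, height in zip(
        tallmen["name"], tallmen["gender"], tallmen["height"]
    )
]
print("\n".join(lines))
```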

Why does this Pandas csv import fail?

I am trying to import the following csv text:
name, favorites, age, other_hobbies
joe, "[madonna, elvis, u2]", 28, "[football, cooking]"
mary, "[lady gaga, adele]", 36, "[]"
With the following pandas command
file_name = "new_data.csv"
df = pd.read_csv(file_name, sep =",")
print(df)
And I get this result:
name favorites age other_hobbies
joe "[madonna elvis u2]" 28 "[football cooking]"
mary "[lady gaga adele]" 36 "[]" NaN NaN
Why is this happening, and how can I get pandas to read this properly?
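The space after each comma means a field starts with a space rather than with the quote character, so the parser does not treat it as quoted and splits on the commas inside the brackets. Pass skipinitialspace along with the sep:
df = pd.read_csv("in.csv", sep=",", skipinitialspace=True)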
print(df)
Output:
name favorites age other_hobbies
0 joe [madonna, elvis, u2] 28 [football, cooking]
1 mary [lady gaga, adele] 36 []
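The following sketch reproduces the fix without a file on disk by reading the CSV text from the question through io.StringIO. It also goes one step beyond the answer and turns the bracketed strings into Python lists; the helper parse_list is hypothetical and assumes the items themselves never contain commas.

```python
import io
import pandas as pd

csv_text = '''name, favorites, age, other_hobbies
joe, "[madonna, elvis, u2]", 28, "[football, cooking]"
mary, "[lady gaga, adele]", 36, "[]"
'''

# skipinitialspace=True drops the space after each delimiter,
# so the quoted fields are recognized as quoted.
df = pd.read_csv(io.StringIO(csv_text), sep=",", skipinitialspace=True)

def parse_list(cell):
    """Turn a string like '[a, b]' into ['a', 'b'] and '[]' into []."""
    inner = cell.strip("[]")
    return [item.strip() for item in inner.split(",")] if inner else []

for column in ["favorites", "other_hobbies"]:
    df[column] = df[column].apply(parse_list)

print(df)
```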
