This question already has answers here:
How to reverse the order of first and last name in a Pandas Series
(4 answers)
Closed 4 years ago.
I need to swap a list of names stored in a single dataframe column from the format FirstName LastName to LastName FirstName, using Python.
Below is the sample format:
~Adam Smith
The above need to change into
~Smith Adam
Is there a single-line function available in Python for this?
Could anyone help with this?
Using apply
import pandas as pd
df = pd.DataFrame({"names": ["Adam Smith", "Greg Rogers"]})
df["names"] = df["names"].apply(lambda x: " ".join(reversed(x.split())))
print(df)
Output:
names
0 Smith Adam
1 Rogers Greg
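The same swap can also be written with pandas' vectorized string methods instead of apply. This is only a sketch of an alternative, assuming every entry consists of whitespace-separated name parts; like the apply version, it reverses all parts, so a middle name would be reversed as well.

```python
import pandas as pd

df = pd.DataFrame({"names": ["Adam Smith", "Greg Rogers"]})

# Split each name on whitespace, reverse the resulting list, and join it back.
df["names"] = df["names"].str.split().str[::-1].str.join(" ")
print(df)
```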
This question already has answers here:
How do I create variable variables?
(17 answers)
Closed 4 months ago.
I have a df in python with different cities.
I am trying to create a df for each city.
So I wrote this code in Python and it works. It does what I need. But I was wondering if there is any other way to create the name of each df rather than using
globals()["df_"+str(ciudad)] = new_grouped_by
If I try this:
"df_"+str(ciudad) = new_grouped_by
I get this error: SyntaxError: can't assign to operator
Any tips/suggestions would be more than welcome!
def get_city():
    for ciudad in df["Ciudad"].unique():
        #print (ciudad)
        grouped_by = df.groupby('Ciudad')
        new_grouped_by=[grouped_by.get_group(ciudad) for i in grouped_by.groups]
        globals()["df_"+str(ciudad)] = new_grouped_by
get_city()
A simple way would be to store the dataframes in a dictionary with the city names as keys:
import pandas as pd
data = zip(['Amsterdam', 'Amsterdam', 'Barcelona'],[1,22,333])
df = pd.DataFrame(data, columns=['Ciudad', 'data'])
new_dfs = dict(list(df.groupby('Ciudad')))
Calling new_dfs['Amsterdam'] will then give you the dataframe:
      Ciudad  data
0  Amsterdam     1
1  Amsterdam    22
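The dictionary approach can also be spelled out as a dict comprehension over the groupby object, which makes it easier to adjust each per-city frame while building it. The following is a minimal sketch using the same sample data; resetting the index is an optional choice made here for illustration, not part of the original answer.

```python
import pandas as pd

data = list(zip(["Amsterdam", "Amsterdam", "Barcelona"], [1, 22, 333]))
df = pd.DataFrame(data, columns=["Ciudad", "data"])

# One dataframe per city, keyed by the city name.
new_dfs = {
    ciudad: group.reset_index(drop=True)
    for ciudad, group in df.groupby("Ciudad")
}

# Iterate over the per-city frames instead of creating global variables.
for ciudad, city_df in new_dfs.items():
    print(ciudad, len(city_df))
```

Keeping the frames in a dictionary means the city names stay ordinary data, so they can contain spaces or other characters that would not be valid in a variable name.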
This question already has answers here:
Split / Explode a column of dictionaries into separate columns with pandas
(13 answers)
Closed 9 months ago.
So here's my simple example (the json field in my actual dataset is very nested so I'm unpacking things one level at a time). I need to keep certain columns on the dataset post json_normalize().
https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
Start:
Expected (Excel mockup):
Actual:
import json
import pandas as pd
d = {'report_id': [100, 101, 102], 'start_date': ["2021-03-12", "2021-04-22", "2021-05-02"],
'report_json': ['{"name":"John", "age":30, "disease":"A-Pox"}', '{"name":"Mary", "age":22, "disease":"B-Pox"}', '{"name":"Karen", "age":42, "disease":"C-Pox"}']}
df = pd.DataFrame(data=d)
display(df)
df = pd.json_normalize(df['report_json'].apply(json.loads), max_level=0, meta=['report_id', 'start_date'])
display(df)
Looking at the documentation on json_normalize(), I think the meta parameter is what I need to keep the report_id and start_date but it doesn't seem to be working as the expected fields to keep are not appearing on the final dataset.
Does anyone have advice? Thank you.
As you're dealing with fairly simple JSON that lines up with the frame's index, you can normalize the JSON column and then use .join to attach the result to the original frame along the index.
from ast import literal_eval
df.join(
    pd.json_normalize(df['report_json'].map(literal_eval))
).drop('report_json', axis=1)
report_id start_date name age disease
0 100 2021-03-12 John 30 A-Pox
1 101 2021-04-22 Mary 22 B-Pox
2 102 2021-05-02 Karen 42 C-Pox
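Since the column holds JSON text, json.loads is arguably the safer parser: literal_eval works on this sample but would fail on JSON literals such as true or null. As far as I understand the documentation, the meta parameter refers to keys inside the JSON records themselves, not to columns of the surrounding dataframe, which would explain why it did not keep report_id and start_date. A minimal sketch of the same join using json.loads, with the sample data from the question:

```python
import json

import pandas as pd

df = pd.DataFrame({
    "report_id": [100, 101, 102],
    "start_date": ["2021-03-12", "2021-04-22", "2021-05-02"],
    "report_json": [
        '{"name":"John", "age":30, "disease":"A-Pox"}',
        '{"name":"Mary", "age":22, "disease":"B-Pox"}',
        '{"name":"Karen", "age":42, "disease":"C-Pox"}',
    ],
})

# Parse each JSON string into a dict, then flatten one level into columns.
parsed = pd.json_normalize(df["report_json"].apply(json.loads).tolist(), max_level=0)

# Both frames share the default 0..n-1 index, so join aligns them row by row.
result = df.drop(columns="report_json").join(parsed)
print(result)
```

The join relies on the original frame having a default integer index; with a different index, the parsed frame would need the same index assigned before joining.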
This question already has answers here:
Pandas: create new column in df with random integers from range
(3 answers)
Closed last year.
I have a data frame with the customers as shown below.
df:
id name
1 john
2 dan
3 sam
I also have a list:
['www.costco.com', 'www.walmart.com']
I would like to add a column named domain to df by randomly selecting the elements from the list.
Expected output:
id name domain
1 john www.walmart.com
2 dan www.costco.com
3 sam www.costco.com
Note: since the selection is random, the output may differ between runs.
The values are selected randomly from a given list of strings, so this is not the same as the linked question and not a duplicate. It is a specific question, and it received very specific answers.
You can use random.choices:
import random
lst = ['www.costco.com', 'www.walmart.com']
df['domain'] = random.choices(lst, k=len(df))
A sample output:
id name domain
0 1 john www.walmart.com
1 2 dan www.costco.com
2 3 sam www.costco.com
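For completeness, here is a self-contained version of the same idea. The variable name lst and the seed value are assumptions made for illustration; seeding is optional and only makes the random draw repeatable between runs.

```python
import random

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["john", "dan", "sam"]})
lst = ["www.costco.com", "www.walmart.com"]

# Optional: fix the seed so repeated runs give the same assignment.
random.seed(0)

# Draw one domain per row, with replacement.
df["domain"] = random.choices(lst, k=len(df))
print(df)
```

random.choices samples with replacement, so the list can be shorter than the dataframe.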
This question already has an answer here:
Python pandas splitting text and numbers in dataframe
(1 answer)
Closed 2 years ago.
I have a dataframe that looks like:
Name
John5346
Alex7789
Jackie1123
John Smith4567
A.J Doe349
I am hoping to get:
Name No
John 5346
Alex 7789
Jackie 1123
John Smith 4567
A.J Doe 349
Have tried something like:
df["No"]= df["Name"].str.split(r'[0-9]')
but no such luck. Any ideas? Thanks very much.
EDIT
Updated to include names that have a space or full stop in them
Try:
df[["Name", "sep", "No"]] = df["Name"].str.split(r"(\d)", n=1, expand=True)
df["No"] = df["sep"] + df["No"]
df.drop("sep", inplace=True, axis=1)
The essentials here are:
to split while keeping the separator, wrap the pattern in a capturing group, i.e. parentheses: (\d)
to ensure at most one split, pass n=1
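An alternative that avoids the temporary sep column is str.extract with two named groups. This is a sketch of a different technique, not the split-based one above, and it assumes every entry ends in at least one digit; rows that do not match would come back as NaN.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John5346", "Alex7789", "Jackie1123", "John Smith4567", "A.J Doe349"]}
)

# Lazily capture everything up to the trailing run of digits, then the digits.
parts = df["Name"].str.extract(r"^(?P<Name>.*?)(?P<No>\d+)$")
df["Name"] = parts["Name"]
df["No"] = parts["No"]
print(df)
```

The No column holds strings here; it can be converted with astype(int) if numeric values are needed.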
This question already has answers here:
splitting at underscore in python and storing the first value
(5 answers)
Get last "column" after .str.split() operation on column in pandas DataFrame
(5 answers)
Closed 2 years ago.
I have a dataframe of email addresses, and I want to find the most used email providers (e.g. gmail.com, yahoo.com, etc.). I used the following code
dfEmail=Ecom['Email']
I have the following data
0 pdunlap#yahoo.com
1 anthony41#reed.com
2 amymiller#morales-harrison.com
3 brent16#olson-robinson.info
4 christopherwright#gmail.com
...
9995 iscott#wade-garner.com
9996 mary85#hotmail.com
9997 tyler16#gmail.com
9998 elizabethmoore#reid.net
9999 rachelford#vaughn.com
Name: Email, Length: 10000, dtype: object
I want to split these email addresses at "#" and get only names of email providers.
I tried the following
dfEmailSplit=dfEmail.str.split('#')
dfEmailSplit[500][1]
this gave me the following result:
'gmail.com'
How do I do this for all the email addresses?
import pandas as pd
data = {'email':['pdunlap#yahoo.com', 'anthony41#reed.com', 'amymiller#morales-harrison.com']}
df = pd.DataFrame(data)
# keep only the part after '#', i.e. the provider's domain
tlds = {'tlds': [x.split('#')[1] for x in df['email']]}
df = pd.DataFrame(tlds)
print(df)
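The same split can be done without a Python loop by chaining the .str accessor, and value_counts then answers the original question about the most used providers. A minimal sketch, using a few of the sample addresses from the question and the '#' separator as it appears in the data:

```python
import pandas as pd

emails = pd.Series(
    [
        "pdunlap#yahoo.com",
        "anthony41#reed.com",
        "christopherwright#gmail.com",
        "tyler16#gmail.com",
    ],
    name="Email",
)

# Split every address at '#' and keep the second piece (the provider).
providers = emails.str.split("#").str[1]

# Count how often each provider occurs, most frequent first.
counts = providers.value_counts()
print(counts)
```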