How to get the City from address column in Python Pandas [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed last month.
I am trying to get the city from the Purchase Address column; my code is below.
When I use [0] or [-1], I get the street address or the state/zip. But when I use [1], it raises the error: index out of range.
Can anyone help solve this problem?
When I try to get the street address, it works.
When I try [1] to get the city, which is in the middle of the address, it raises the error.

Example
We need a minimal, reproducible example to answer this, posted as text or code rather than as an image.
import pandas as pd
df = pd.DataFrame(['a,B,c', 'a,C,b', 'd'], columns=['col1'])
df
col1
0 a,B,c
1 a,C,b
2 d
Code
Your code:
df['col1'].apply(lambda x: x.split(',')[1])
IndexError: list index out of range
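Try the following code instead. The row 'd' has no second element after splitting, so indexing the list with [1] fails, whereas .str[1] returns NaN for such rows: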
out = df['col1'].str.split(',').str[1]
out
0 B
1 C
2 NaN
Name: col1, dtype: object
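Applied to the original column, the same idea looks roughly like this. A minimal sketch, assuming the addresses follow a "street, city, state zip" layout; the sample values (including the malformed 'no address given' row) are hypothetical stand-ins for whatever rows in the real data lack commas:

```python
import pandas as pd

# Hypothetical sample data: two well-formed addresses and one row without commas,
# the kind of row that makes x.split(',')[1] raise IndexError
df = pd.DataFrame({'Purchase Address': [
    '917 1st St, Dallas, TX 75001',
    '682 Chestnut St, Boston, MA 02215',
    'no address given',
]})

# .str[1] yields NaN (instead of raising) when a row has fewer than 2 parts;
# .str.strip() removes the space left after the comma
df['City'] = df['Purchase Address'].str.split(',').str[1].str.strip()

# Rows that do not have the expected "street, city, state zip" shape
bad_rows = df[df['Purchase Address'].str.count(',') < 2]

print(df['City'].tolist())  # ['Dallas', 'Boston', nan]
print(bad_rows)
```

Inspecting bad_rows is a quick way to see which rows caused the original IndexError.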

Related

How to find which doctor a patient is using, when only given a list of doctor's patients? (code improvement request) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I need to create a dataframe which lists all patients and their matching doctors.
I have a txt file with doctor/patient records organized in the following format:
Doctor_1: patient23423,patient292837,patient1232423...
Doctor_2: patient456785,patient25363,patient23425665...
And a list of all unique patients.
To do this, I imported the txt file into a doctorsDF dataframe, split on the colon. I also created a patientsDF dataframe with 2 columns: 'Patient', filled from the patient list, and an empty 'Doctor' column.
I then ran the following:
for pat in patientsDF['Patient']:
    for i, doc in enumerate(doctorsDF[1]):
        if doctorsDF[1][i].find(str(pat)) >= 0:
            patientsDF['Doctor'][i] = doctorsDF.loc[i, 0]
        else:
            continue
This worked fine, and now all patients are matched with the doctors, but the method seems clumsy. Is there any function that can more cleanly achieve the result? Thanks!
(First StackOverflow post here. Sorry if this is a newb question!)
If you use Pandas, try:
df = pd.read_csv('data.txt', sep=':', header=None, names=['Doctor', 'Patient'])
df = df[['Doctor']].join(df['Patient'].str.strip().str.split(',')
.explode()).reset_index(drop=True)
Output:
>>> df
Doctor Patient
0 Doctor_1 patient23423
1 Doctor_1 patient292837
2 Doctor_1 patient1232423
3 Doctor_2 patient456785
4 Doctor_2 patient25363
5 Doctor_2 patient23425665
How to search:
>>> df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze()
'Doctor_2'
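To replace the nested loop in the question entirely, the exploded frame can be turned into a patient-to-doctor lookup and mapped onto patientsDF. A minimal sketch, assuming the file layout shown in the question (read here from an in-memory string instead of data.txt so it runs on its own) and that each patient has exactly one doctor; the patientsDF rows, including the hypothetical patient999, are made up for illustration:

```python
import io
import pandas as pd

# Stand-in for data.txt, using the "Doctor: p1,p2,..." layout from the question
raw = """Doctor_1: patient23423,patient292837,patient1232423
Doctor_2: patient456785,patient25363,patient23425665
"""

df = pd.read_csv(io.StringIO(raw), sep=':', header=None, names=['Doctor', 'Patient'])

# One row per (doctor, patient) pair
long_df = (df.assign(Patient=df['Patient'].str.strip().str.split(','))
             .explode('Patient')
             .reset_index(drop=True))

# Lookup Series: index = patient, value = doctor (assumes each patient appears once)
patient_to_doctor = long_df.set_index('Patient')['Doctor']

# Hypothetical patient list; patients with no doctor get NaN
patientsDF = pd.DataFrame({'Patient': ['patient25363', 'patient23423', 'patient999']})
patientsDF['Doctor'] = patientsDF['Patient'].map(patient_to_doctor)
print(patientsDF)
```

Series.map needs the lookup index to be unique; if a patient can have several doctors, a merge on the exploded frame would be the safer choice.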

How to unite two different columns into one column in Python? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a dataframe named sales_raw with 28 columns and 2823 rows. It has two address columns, Address_1 and Address_2, where Address_2 holds additional detail for Address_1. I want to combine them unconditionally and keep the new Address column in the same dataframe.
How to do this? Is there any alternative to do this?
Note: I have some NaN values in the column Address_2
You can use np.where:
>>> sales_raw
Address_1 Address_2
0 AddressA DetailA
1 AddressB NaN
2 AddressC DetailC
import numpy as np
sales_raw['Address'] = np.where(sales_raw['Address_2'].isna(),
                                sales_raw['Address_1'],
                                sales_raw['Address_1'] + ', ' + sales_raw['Address_2'])
>>> sales_raw
Address_1 Address_2 Address
0 AddressA DetailA AddressA, DetailA
1 AddressB NaN AddressB
2 AddressC DetailC AddressC, DetailC
Try this:
df["Address"] = df["Address_1"] + df["Address_2"].fillna("")
This concatenates the values of the two columns, replacing missing values in the second column with empty strings. Note that no separator is inserted between the two parts.
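Since string concatenation is vectorized, you can also do the following. df.pop removes Address_1 and Address_2 from the frame, and fillna("") keeps rows with a missing Address_2 from turning into NaN:
df["Address"] = df.pop("Address_1") + (" " + df.pop("Address_2")).fillna("")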
You can use df.insert() if you'd like to insert the new column at a specific position.
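Another option, sketched here as an alternative rather than taken from the answers above, is to join whatever parts are present row by row, which avoids both NaN results and a dangling separator and extends to more than two columns. The small sales_raw frame is the example from the first answer:

```python
import numpy as np
import pandas as pd

sales_raw = pd.DataFrame({'Address_1': ['AddressA', 'AddressB', 'AddressC'],
                          'Address_2': ['DetailA', np.nan, 'DetailC']})

# For each row: drop missing parts, then join the remaining strings with ', '
sales_raw['Address'] = sales_raw[['Address_1', 'Address_2']].apply(
    lambda row: ', '.join(row.dropna()), axis=1
)

print(sales_raw['Address'].tolist())
# ['AddressA, DetailA', 'AddressB', 'AddressC, DetailC']
```

Row-wise apply is slower than the vectorized versions on large frames, but at 2823 rows the difference is unlikely to matter.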

How to remove duplicates from data frame using python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
dframe= pd.DataFrame({'col1':['A']*3 + ['B']*4 + ['C','B','A'],'col2':[2,3,4,2,4,2,1,3,4,4]})
I want to remove duplicates from both columns, and the final result should look like this:
pd.DataFrame({'col1':['A'] + ['B'] + ['C'],'col2':[2,4,3]})
I tried the following, but the result was not what I expected:
dframe.drop_duplicates(subset=['col1'], keep='first')
Please help.
Thanks
Try one of these:
Using agg() and dropna():
out=dframe.agg(lambda x:pd.Series(pd.unique(x))).dropna()
OR
Using apply() and dropna():
out=dframe.apply(lambda x:pd.Series(pd.unique(x))).dropna()
Output of out:
col1 col2
0 A 2
1 B 3
2 C 4
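The same result can be built more explicitly, which makes the mechanics visible. This is a sketch of the same technique as the answer, not a different one: each column's unique values (in order of first appearance) become a separate Series, the Series are aligned on their index, and dropna() trims everything to the length of the shortest list. Because values are paired by position, the resulting rows are not necessarily rows of the original frame (C never appears with 4 in dframe):

```python
import pandas as pd

dframe = pd.DataFrame({'col1': ['A'] * 3 + ['B'] * 4 + ['C', 'B', 'A'],
                       'col2': [2, 3, 4, 2, 4, 2, 1, 3, 4, 4]})

# Unique values per column, each as its own Series (different lengths)
uniques = {col: pd.Series(dframe[col].unique()) for col in dframe.columns}

# Aligning on the index pads the shorter column with NaN; dropna() removes those rows
out = pd.DataFrame(uniques).dropna()
print(out)
```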

how to add 1 to a column in dataframe range [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I want a range of values built from the two columns From and To. Since the number in the To column should be included in the range, I add 1 to it, as shown below:
df.apply(lambda x : range(x['From'],x['To']+1),1)
df.apply(lambda x : ','.join(map(str, range(x['From'],x['To']))),1)
I need output something like this: if the From value is 5 and the To value is 11, my output should be:
5,6,7,8,9,10,11
I'm only getting up to 10, even though I have added +1 to the end value of the range.
df:
----
From To
15887 16251
15888 16252
15889 16253
15890 16254
The range should be written to a new column.
Try this:
df=pd.DataFrame({'From':[15887,15888,15889,15890],'To':[16251,16252,16253,16254]})
df['Range']=[list(range(i,k+1)) for i,k in zip(df['From'],df['To'])]
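If a comma-separated string is wanted, as in the question, the +1 has to be applied inside the join version too (the second line in the question uses range(x['From'], x['To']) without it, which is why the output stops one short). A sketch using the question's 5 to 11 example plus one row from its table:

```python
import pandas as pd

df = pd.DataFrame({'From': [5, 15887], 'To': [11, 16251]})

# range() excludes its stop value, so To + 1 is needed to include To itself
df['Range'] = [','.join(map(str, range(f, t + 1)))
               for f, t in zip(df['From'], df['To'])]

print(df.loc[0, 'Range'])  # 5,6,7,8,9,10,11
```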

Python :Select the rows for the most recent entry from multiple users [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have a dataframe df with 3 columns:
df=pd.DataFrame({
'User':['A','A','B','A','C','B','C'],
'Values':['x','y','z','p','q','r','s'],
'Date':[14,11,14,12,13,10,14]
})
I want to create a new dataframe containing, for each user, the rows with the highest value in the 'Date' column. The desired dataframe for the example above was posted as an image.
Can anyone help me with this problem?
This answer assumes each user has a single maximum value in the Date column; if several rows of a user tie for the maximum Date, all of them are returned:
def get_max(group):
    return group[group.Date == group.Date.max()]

df.groupby('User').apply(get_max).reset_index(drop=True)
Output:
   Date User Values
0    14    A      x
1    14    B      z
2    14    C      s
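An alternative (a different technique from the answer, not the answerer's code) is groupby(...).idxmax(), which returns the index label of the row with the largest Date for each user and avoids a Python-level function per group. Unlike get_max, it keeps only the first row when a user has tied maxima:

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14],
})

# Index label of each user's latest row (first one if several share the max Date)
latest_idx = df.groupby('User')['Date'].idxmax()

latest = df.loc[latest_idx].reset_index(drop=True)
print(latest)
```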
