I have a dataframe named sales_raw with 28 columns and 2823 rows. It has two address columns, Address_1 and Address_2, where Address_2 holds additional detail for Address_1. I want to combine them into a single new column, Address, in the same dataframe.
How can I do this, and is there an alternative way to do it?
Note: the Address_2 column contains some NaN values.
You can use np.where:
>>> sales_raw
  Address_1 Address_2
0  AddressA   DetailA
1  AddressB       NaN
2  AddressC   DetailC
import numpy as np

sales_raw['Address'] = np.where(sales_raw['Address_2'].isna(),
                                sales_raw['Address_1'],
                                sales_raw['Address_1'] + ', ' + sales_raw['Address_2'])
>>> sales_raw
  Address_1 Address_2            Address
0  AddressA   DetailA  AddressA, DetailA
1  AddressB       NaN           AddressB
2  AddressC   DetailC  AddressC, DetailC
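An alternative that stays inside pandas is Series.str.cat; a sketch on sample data mirroring the question, where na_rep='' stands in for the missing detail and the dangling separator is stripped afterwards:

```python
import pandas as pd

# Sample frame shaped like the question's data
sales_raw = pd.DataFrame({'Address_1': ['AddressA', 'AddressB', 'AddressC'],
                          'Address_2': ['DetailA', None, 'DetailC']})

# Concatenate with ', ', substituting '' where Address_2 is NaN,
# then strip the trailing ', ' left on those rows
sales_raw['Address'] = (sales_raw['Address_1']
                        .str.cat(sales_raw['Address_2'], sep=', ', na_rep='')
                        .str.rstrip(', '))
```

This avoids building the concatenation twice, at the cost of the final rstrip pass.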
Try this:
df["Address"] = df["Address_1"] + df["Address_2"].fillna("")
This concatenates the two columns after replacing missing values in the second column with empty strings. Note that no separator is inserted between the two parts; add one explicitly if you need it.
Since str concatenations are also vectorized, you can simply do:
df["Address"] = df.pop("Address_1") + " " + df.pop("Address_2").fillna("")
pop removes the original columns as it reads them. The .fillna("") matters here too: without it, rows where Address_2 is NaN would yield NaN for the whole concatenation.
You can use df.insert() if you'd like to insert the new column at a specific position.
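For example, df.insert can place the combined column at the front; a small sketch (the column names follow the question, the data is made up):

```python
import pandas as pd

df = pd.DataFrame({'Address_1': ['AddressA', 'AddressB'],
                   'Address_2': ['DetailA', None]})

# Build the combined value, then insert it at position 0
combined = (df.pop('Address_1') + ' ' + df.pop('Address_2').fillna('')).str.strip()
df.insert(0, 'Address', combined)
```

The .str.strip() removes the trailing space left on rows whose Address_2 was NaN.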
I am trying to get the city from the Purchase Address column by splitting each address on commas. When I use index [0] or [-1], I can get the street address or the state/zip. But when I try index [1] (the city sits in the middle of the address), it raises IndexError: list index out of range.
Can anyone help solve this problem?
Example
We need a minimal, reproducible example to answer this, as text or code rather than an image.
df = pd.DataFrame(['a,B,c', 'a,C,b', 'd'], columns=['col1'])
df
    col1
0  a,B,c
1  a,C,b
2      d
Code
Your code:
df['col1'].apply(lambda x: x.split(',')[1])
IndexError: list index out of range
Try the following instead:
out = df['col1'].str.split(',').str[1]
out
0      B
1      C
2    NaN
Name: col1, dtype: object
Indexing through the .str accessor returns NaN for rows that have no second element, instead of raising.
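The same idea works with expand=True, which returns one column per token and pads short rows with missing values instead of raising:

```python
import pandas as pd

df = pd.DataFrame(['a,B,c', 'a,C,b', 'd'], columns=['col1'])

# One column per comma-separated token; short rows are padded
parts = df['col1'].str.split(',', expand=True)
city = parts[1]  # missing where the row had no second token
```

This form is convenient when you need several of the tokens (street, city, state) at once.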
I need to create a dataframe which lists all patients and their matching doctors.
I have a txt file with doctor/patient records organized in the following format:
Doctor_1: patient23423,patient292837,patient1232423...
Doctor_2: patient456785,patient25363,patient23425665...
And a list of all unique patients.
To do this, I imported the txt file into a doctorsDF dataframe, split on the colon. I also created a patientsDF dataframe with two columns: 'Patient', filled from the patient list, and an empty 'Doctor' column.
I then ran the following:
for pat in patientsDF['Patient']:
    for i, doc in enumerate(doctorsDF[1]):
        if doctorsDF[1][i].find(str(pat)) >= 0:
            patientsDF['Doctor'][i] = doctorsDF.loc[i, 0]
        else:
            continue
This worked fine, and now all patients are matched with the doctors, but the method seems clumsy. Is there any function that can more cleanly achieve the result? Thanks!
(First StackOverflow post here. Sorry if this is a newb question!)
If you use Pandas, try:
df = pd.read_csv('data.txt', sep=':', header=None, names=['Doctor', 'Patient'])
df = (df[['Doctor']]
      .join(df['Patient'].str.strip().str.split(',').explode())
      .reset_index(drop=True))
Output:
>>> df
     Doctor          Patient
0  Doctor_1     patient23423
1  Doctor_1    patient292837
2  Doctor_1   patient1232423
3  Doctor_2    patient456785
4  Doctor_2     patient25363
5  Doctor_2  patient23425665
How to search:
>>> df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze()
'Doctor_2'
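If you also want to fill the empty Doctor column of an existing patientsDF, as in the question, Series.map over an index on Patient does it in one line. A sketch, assuming each patient appears under exactly one doctor:

```python
import pandas as pd

# Exploded doctor/patient pairs, shaped like the output above
df = pd.DataFrame({'Doctor': ['Doctor_1', 'Doctor_1', 'Doctor_2'],
                   'Patient': ['patient23423', 'patient292837', 'patient456785']})
patientsDF = pd.DataFrame({'Patient': ['patient292837', 'patient456785']})

# Look up each patient's doctor via an index keyed on Patient
patientsDF['Doctor'] = patientsDF['Patient'].map(df.set_index('Patient')['Doctor'])
```

Patients missing from df simply get NaN in the Doctor column.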
The data looks like this (in a dataframe):
  PatientId                        Payor
0  PAT10000               [Cash, Britam]
1  PAT10001              [Madison, Cash]
2  PAT10002                       [Cash]
3  PAT10003  [Cash, Madison, Resolution]
4  PAT10004        [CIC Corporate, Cash]
I want to remove the square brackets, filter all patients who used at least a certain mode of payment (e.g. Madison), and obtain their IDs. Please help.
This will generate a list of (id, payor) tuples (df is the dataframe):
payment = 'Madison'
ids = [(id, df.Payor[i][1:-1]) for i, id in enumerate(df.PatientId)
       if payment in df.Payor[i]]
Here [1:-1] strips the surrounding brackets, assuming Payor holds strings like '[Cash, Britam]' rather than actual lists.
Let's say your dataframe variable is df and, after removing the square brackets, you want to filter all rows whose Payor column contains "Madison":
df = df.replace({r'\[': '', r'\]': ''}, regex=True)
filteredDf = df.loc[df['Payor'].str.contains('Madison')]
print(filteredDf)
Note that [ and ] are regex metacharacters and must be escaped when regex=True, and that replace returns a new dataframe, so the result has to be assigned back.
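If the Payor column actually holds Python lists (as the bracketed display suggests it might), there are no brackets to strip; a per-row membership test is enough. A sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({'PatientId': ['PAT10001', 'PAT10002', 'PAT10003'],
                   'Payor': [['Madison', 'Cash'], ['Cash'],
                             ['Cash', 'Madison', 'Resolution']]})

# Boolean mask: True where the row's list contains the payment mode
mask = df['Payor'].apply(lambda payors: 'Madison' in payors)
ids = df.loc[mask, 'PatientId'].tolist()
```

Membership in a list is exact, so this also avoids the substring false positives that str.contains can produce.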
I have the user input two lists, one of sizes and one of minutes. For example, they can input sizes 111, 121 and minutes 5, 10, 15.
I want the dataframe columns to be named by each size/minute pair, e.g. 111;5, 111;10, 111;15, and so on. (I used a for loop to extract each size and minute.) I tried df[size+minute] = values (values being the data for each column), but since size and minute are numbers they get added up, so the column name came out as 116 instead of 111;5.
If you have two lists:
l = [111,121]
l2 = [5,10,15]
Then you can use list comprehension to form your column names:
col_names = [str(x)+';'+str(y) for x in l for y in l2]
print(col_names)
['111;5', '111;10', '111;15', '121;5', '121;10', '121;15']
And create a dataframe with these column names using pandas.DataFrame:
df = pd.DataFrame(columns=col_names)
If we add a row of data:
row = pd.DataFrame([[1,2,3,4,5,6]], columns=col_names)
df = pd.concat([df, row], ignore_index=True)
(DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.)
We can see that our dataframe looks like this:
print(df)
  111;5 111;10 111;15 121;5 121;10 121;15
0     1      2      3     4      5      6
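The same cross product of names can also be expressed as a pandas MultiIndex, which keeps size and minute addressable as separate levels instead of fused strings; a sketch:

```python
import pandas as pd

# Two-level columns: every (size, minute) pair
cols = pd.MultiIndex.from_product([[111, 121], [5, 10, 15]],
                                  names=['size', 'minute'])
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=cols)

# Select one size/minute pair, or all columns for one size at once
col_111_5 = df[(111, 5)]
size_111 = df[111]
```

This makes operations like "all columns for size 111" a single selection rather than string matching on column names.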
My csv file looks like this:
5783,145v
g656,4589,3243,tt56
6579
How do I read this with pandas (or otherwise)?
(the table should contain empty cells)
You could pass a dummy separator (any character that never occurs in the data), and then use str.split (by ",") with expand=True:
df = pd.read_csv('path/to/file.csv', sep=" ", header=None)
df = df[0].str.split(",", expand=True).fillna("")
print(df)
Output
      0     1     2     3
0  5783  145v
1  g656  4589  3243  tt56
2  6579
I think the solution proposed by @researchnewbie is good. If you need to replace the NaN values with, say, zero, add this line after the read:
dataFrame.fillna(0, inplace=True)
Try doing the following:
import pandas as pd
dataFrame = pd.read_csv(filename, header=None, names=range(4))
Your empty cells will then contain the NaN value, which is essentially null. Passing names that cover the widest row matters here: without it, read_csv raises a tokenizing error because later rows have more fields than the first.