convert text to csv with python pandas [closed] - python

I want to convert a text file to a CSV file:
import pandas as pd
readfile = pd.read_csv(r'text.txt')
readfile.to_csv(r'CSV.csv', index=None)
my text file format:
the result:
In the result (circled in red), a decimal number is appended where the data is duplicated.
I don't want it to add a decimal number.
Please suggest what I should do next. Thank you.
Also, if it is possible to read the file and convert it to CSV with a limited number of columns, please advise!

# Just add header=None. By default the first line of the text file is treated as the header, which is why pandas renames the duplicate column names.
import pandas as pd
readfile = pd.read_csv(r'text.txt',header=None)
readfile.to_csv(r'CSV.csv', index=None)
# Sample output of readfile
0 1 2 3 4 5 6 7 8
0 1 2 3 5 0.0 0.0 0.0 4 6
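To see where the decimal suffixes come from, here is a minimal sketch using an in-memory sample (the rows are made up for illustration and are not the asker's file). It also shows `usecols`, which is one way to read only a limited set of columns, assuming the columns you want can be picked by position.

```python
import io
import pandas as pd

# Hypothetical sample standing in for text.txt
raw = "1,2,3,5,0.0,0.0,0.0,4,6\n7,8,9,1,0.0,0.0,0.0,2,3\n"

# Default: the first line becomes the header, and repeated names
# are made unique by appending .1, .2, ...
with_header = pd.read_csv(io.StringIO(raw))
print(list(with_header.columns))

# header=None keeps the first line as data
no_header = pd.read_csv(io.StringIO(raw), header=None)

# usecols limits which columns are read (here the first three)
limited = pd.read_csv(io.StringIO(raw), header=None, usecols=[0, 1, 2])

# header=False stops the 0..8 column labels from being written to the CSV
csv_text = no_header.to_csv(index=False, header=False)
print(csv_text)
```

Passing `header=False` to `to_csv` is optional; without it the integer column labels are written as the first line of the output file.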


How to find which doctor a patient is using, when only given a list of doctor's patients? (code improvement request) [closed]

I need to create a dataframe which lists all patients and their matching doctors.
I have a txt file with doctor/patient records organized in the following format:
Doctor_1: patient23423,patient292837,patient1232423...
Doctor_2: patient456785,patient25363,patient23425665...
And a list of all unique patients.
To do this, I imported the txt file into a doctorsDF dataframe, separated by a colon. I also created a patientsDF dataframe with 2 columns: 'Patient', filled from the patient list, and 'Doctor', left empty.
I then ran the following:
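for p, pat in enumerate(patientsDF['Patient']):
    for i, doc in enumerate(doctorsDF[1]):
        if doc.find(str(pat)) >= 0:
            # Write to the patient's row (p), not the doctor's row (i);
            # assumes patientsDF has the default 0..n-1 index
            patientsDF.loc[p, 'Doctor'] = doctorsDF.loc[i, 0]
        else:
            continue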
This worked fine, and now all patients are matched with the doctors, but the method seems clumsy. Is there any function that can more cleanly achieve the result? Thanks!
(First StackOverflow post here. Sorry if this is a newb question!)
If you use Pandas, try:
df = pd.read_csv('data.txt', sep=':', header=None, names=['Doctor', 'Patient'])
df = df[['Doctor']].join(df['Patient'].str.strip().str.split(',')
.explode()).reset_index(drop=True)
Output:
>>> df
Doctor Patient
0 Doctor_1 patient23423
1 Doctor_1 patient292837
2 Doctor_1 patient1232423
3 Doctor_2 patient456785
4 Doctor_2 patient25363
5 Doctor_2 patient23425665
How to search:
>>> df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze()
'Doctor_2'
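If the goal is to fill a doctor column for a separate list of patients, as in the question, one option is to turn the exploded table into a lookup and use `Series.map`. This is only a sketch: the sample text, the `patients` frame, and the name `lookup` are illustrative assumptions, and it assumes each patient appears under exactly one doctor.

```python
import io
import pandas as pd

# Hypothetical sample in the same "Doctor: p1,p2,..." layout
raw = "Doctor_1: patient1,patient2\nDoctor_2: patient3\n"

df = pd.read_csv(io.StringIO(raw), sep=':', header=None,
                 names=['Doctor', 'Patient'])

# One row per (doctor, patient) pair
long_df = (df.assign(Patient=df['Patient'].str.strip().str.split(','))
             .explode('Patient')
             .reset_index(drop=True))

# Patient -> doctor lookup (requires unique patient ids)
lookup = long_df.set_index('Patient')['Doctor']

# Hypothetical list of patients to match; unknown ids become NaN
patients = pd.DataFrame({'Patient': ['patient3', 'patient1', 'patient9']})
patients['Doctor'] = patients['Patient'].map(lookup)
print(patients)
```

Unlike a substring search, this matches whole patient ids, so `patient1` would not accidentally match `patient12`.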

How to remove duplicates from data frame using python [closed]

dframe= pd.DataFrame({'col1':['A']*3 + ['B']*4 + ['C','B','A'],'col2':[2,3,4,2,4,2,1,3,4,4]})
I want to remove duplicates from both columns and final result should look like this:
pd.DataFrame({'col1':['A'] + ['B'] + ['C'],'col2':[2,4,3]})
I tried the following, but the result was not what I expected:
dframe.drop_duplicates(subset=['col1'], keep='first')
Please help.
Thanks
Try either of the following.
Using the agg() and dropna() methods:
out = dframe.agg(lambda x: pd.Series(pd.unique(x))).dropna()
Or using the apply() and dropna() methods:
out = dframe.apply(lambda x: pd.Series(pd.unique(x))).dropna()
Output of out:
col1 col2
0 A 2
1 B 3
2 C 4
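The difference from `drop_duplicates` is that the answer deduplicates each column independently, whereas `drop_duplicates` removes whole rows. A small sketch of both on the question's data follows; the `pd.concat` form is just an alternative spelling of the same per-column idea, not something from the original answer.

```python
import pandas as pd

dframe = pd.DataFrame({
    'col1': ['A'] * 3 + ['B'] * 4 + ['C', 'B', 'A'],
    'col2': [2, 3, 4, 2, 4, 2, 1, 3, 4, 4],
})

# Row-wise: keeps the first full row seen for each col1 value
row_wise = dframe.drop_duplicates(subset=['col1'], keep='first')

# Column-wise: unique values of each column, aligned by position;
# shorter columns are padded with NaN, and dropna() trims those rows
col_wise = pd.concat(
    {name: pd.Series(dframe[name].unique()) for name in dframe.columns},
    axis=1,
).dropna()

print(row_wise)
print(col_wise)
```

Note that after the column-wise step the values in a row no longer come from the same original row, so this only makes sense when the columns are unrelated.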

How to Read A CSV With A Variable Number of Columns? [closed]

My csv file looks like this:
5783,145v
g656,4589,3243,tt56
6579
How do I read this with pandas (or otherwise)?
(the table should contain empty cells)
You could pass a dummy separator, and then use str.split (by ",") with expand=True:
df = pd.read_csv('path/to/file.csv', sep=" ", header=None)
df = df[0].str.split(",", expand=True).fillna("")
print(df)
Output:
      0     1     2     3
0  5783  145v
1  g656  4589  3243  tt56
2  6579
I think that the solution proposed by @researchnewbie is good. If you need to replace the NaN values with, say, zero, you could add this line after the read:
dataFrame.fillna(0, inplace=True)
Try doing the following:
import pandas as pd
dataFrame = pd.read_csv(filename)
Your empty cells should contain the NaN value, which is essentially null.
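One caveat with a plain `read_csv` call: with no column names given, pandas takes the column count from the first row, so a later row with more fields may be rejected. A possible workaround, sketched here under the assumption that scanning the input once for the widest row is acceptable, is to pass explicit `names`; short rows are then padded with NaN. The in-memory sample mirrors the question's data.

```python
import io
import pandas as pd

# Same ragged rows as in the question
raw = "5783,145v\ng656,4589,3243,tt56\n6579\n"

# Find the widest row so every field gets a column
max_cols = max(line.count(',') + 1 for line in raw.splitlines())

df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=list(range(max_cols)),  # explicit names fix the column count
    dtype=str,                    # keep values like '145v' and '5783' as text
).fillna("")                      # empty cells instead of NaN

print(df)
```

The simple comma count assumes no quoted fields containing commas; for such files the `csv` module would be a safer way to measure row width.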

Python :Select the rows for the most recent entry from multiple users [closed]

I have a dataframe df with 3 columns :
df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14]
})
I want to create a new dataframe that contains, for each user, the rows with the highest value in the 'Date' column. For example, for the above dataframe I want the desired dataframe to be as follows (it's a JPEG image):
Can anyone help me with this problem?
This answer assumes that there are different maximum values per user in the Values column:
In [10]: def get_max(group):
    ...:     return group[group.Date == group.Date.max()]
    ...:
In [12]: df.groupby('User').apply(get_max).reset_index(drop=True)
Out[12]:
Date User Values
0 14 A x
1 14 B z
2 14 C s
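An alternative that avoids `apply` is to look up the index of each user's maximum date with `idxmax`. This is a sketch of a different technique from the answer above, and it behaves differently on ties: `idxmax` returns a single row per user (the first maximum), while `get_max` keeps every tied row.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14],
})

# Index label of the row holding each user's largest Date
latest_idx = df.groupby('User')['Date'].idxmax()

# Select those rows and renumber them
latest = df.loc[latest_idx].reset_index(drop=True)
print(latest)
```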

creating date range on csv using python-pandas [closed]

How to create a date range in python using pandas in Y-M-D?
import pandas as pd
df = pd.DataFrame([['2015-07-07','2016-09-22'],['2012-02-03','2013-02-19'],['2013-02-17','2013-03-22']],columns = ['start','end'])
# Convert the string columns to datetime
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
df['range'] = df['end']-df['start']
df
Output should be:
start end range
0 2015-07-07 2016-09-22 443 days
1 2012-02-03 2013-02-19 382 days
2 2013-02-17 2013-03-22 33 days
In case you want to read from csv, switch the beginning to:
df = pd.read_csv('file_name.csv')
In case you want a concatenated column:
df['details'] = [str(x)+' - '+str(y)+' has '+str(z)[:-9] for x,y,z in zip(df['start'],df['end'],df['range'])]
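The same result can be built without slicing the string form of the timedelta. The sketch below uses the `.dt` accessors to get the day count as an integer and to format the dates as Y-M-D; the `days` column name is an illustrative addition, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['2015-07-07', '2016-09-22'],
     ['2012-02-03', '2013-02-19'],
     ['2013-02-17', '2013-03-22']],
    columns=['start', 'end'],
)

# Vectorized conversion of both columns
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

# Timedelta column, plus the plain integer number of days
df['range'] = df['end'] - df['start']
df['days'] = df['range'].dt.days

# Build the text column from explicitly formatted Y-M-D strings
df['details'] = (
    df['start'].dt.strftime('%Y-%m-%d') + ' - '
    + df['end'].dt.strftime('%Y-%m-%d') + ' has '
    + df['days'].astype(str) + ' days'
)
print(df[['days', 'details']])
```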
