creating date range on csv using python-pandas [closed] - python

How do I create a date range in Python using pandas, in Y-M-D format?

import pandas as pd
df = pd.DataFrame([['2015-07-07','2016-09-22'],['2012-02-03','2013-02-19'],['2013-02-17','2013-03-22']],columns = ['start','end'])
# convert the string columns to datetime
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
df['range'] = df['end']-df['start']
df
Output should be:
       start        end    range
0 2015-07-07 2016-09-22 443 days
1 2012-02-03 2013-02-19 382 days
2 2013-02-17 2013-03-22  33 days
In case you want to read from csv, switch the beginning to:
df = pd.read_csv('file_name.csv')
In case you want a concatenated column:
df['details'] = [str(x)+' - '+str(y)+' has '+str(z)[:-9] for x,y,z in zip(df['start'],df['end'],df['range'])]
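The slice str(z)[:-9] works because a whole-day timedelta prints as '443 days 00:00:00'. A minimal sketch of a similar column built with the .dt accessors instead, assuming the same start, end and range columns as above (the separator and wording are copied from the line above; unlike that line, this version also drops the 00:00:00 time part from the two dates):

```python
import pandas as pd

df = pd.DataFrame(
    [['2015-07-07', '2016-09-22'],
     ['2012-02-03', '2013-02-19'],
     ['2013-02-17', '2013-03-22']],
    columns=['start', 'end'],
)
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
df['range'] = df['end'] - df['start']

# Format the dates as Y-M-D and take the whole-day count from the timedelta.
df['details'] = (
    df['start'].dt.strftime('%Y-%m-%d')
    + ' - '
    + df['end'].dt.strftime('%Y-%m-%d')
    + ' has '
    + df['range'].dt.days.astype(str)
    + ' days'
)
print(df['details'].tolist())
```

Taking the day count from .dt.days avoids depending on how the timedelta happens to be printed.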

Related

convert text to csv with python pandas [closed]

I want to convert a text file to a CSV file:
import pandas as pd
readfile = pd.read_csv(r'text.txt')
readfile.to_csv(r'CSV.csv', index=None)
My text file format:
The result:
In the area circled in red, a decimal number is appended to the values that are duplicated.
I don't want it to add that decimal number.
Please suggest what to do next, thank you.
Also, if it is possible to read the file and convert it to CSV with a limited number of columns, please advise!
# Just add header=None. By default the first line of the text file is treated as the header, which is why the duplicate column names get renamed.
import pandas as pd
readfile = pd.read_csv(r'text.txt', header=None)
readfile.to_csv(r'CSV.csv', index=None)
# sample output of readfile
   0  1  2  3    4    5    6  7  8
0  1  2  3  5  0.0  0.0  0.0  4  6
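A small sketch of why the suffixes appear, and of one way to limit the columns. The two-row sample text is a hypothetical stand-in for text.txt (the original file was only shown as an image), and io.StringIO is used so that nothing is read from or written to disk:

```python
import io
import pandas as pd

# Hypothetical stand-in for text.txt: two comma-separated rows, no header row.
text = "1,2,3,5,0,0,0,4,6\n7,8,9,1,0,0,0,2,3\n"

# Default: the first row becomes the header and repeated names get suffixes.
with_header = pd.read_csv(io.StringIO(text))
print(list(with_header.columns))

# header=None: every row is data and the columns are numbered 0..8.
no_header = pd.read_csv(io.StringIO(text), header=None)

# usecols keeps only the listed column positions (here the first three).
limited = pd.read_csv(io.StringIO(text), header=None, usecols=[0, 1, 2])

# header=False keeps the generated column numbers out of the written CSV.
buf = io.StringIO()
no_header.to_csv(buf, index=False, header=False)
print(buf.getvalue())
```

Note that with header=None alone, to_csv still writes the generated column numbers 0 to 8 as a first line; passing header=False is one way to leave them out.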

How to remove duplicates from data frame using python [closed]

dframe= pd.DataFrame({'col1':['A']*3 + ['B']*4 + ['C','B','A'],'col2':[2,3,4,2,4,2,1,3,4,4]})
I want to remove duplicates from both columns and final result should look like this:
pd.DataFrame({'col1':['A'] + ['B'] + ['C'],'col2':[2,4,3]})
I tried the following, but the result was not what I expected:
dframe.drop_duplicates(subset=['col1'], keep='first')
Please help.
Thanks
Try, via the agg() and dropna() methods:
out = dframe.agg(lambda x: pd.Series(pd.unique(x))).dropna()
or via the apply() and dropna() methods:
out = dframe.apply(lambda x: pd.Series(pd.unique(x))).dropna()
Output of out:
  col1  col2
0    A     2
1    B     3
2    C     4
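To see what dropna() is doing in that answer, here is a more explicit sketch of the same idea, built from a dict of Series rather than agg() or apply(). The names uniques and padded are only illustrative. It reproduces the output shown above (2, 3, 4 in col2), which is not the [2, 4, 3] order from the question; the question does not say which rule should produce that order.

```python
import pandas as pd

dframe = pd.DataFrame({
    'col1': ['A'] * 3 + ['B'] * 4 + ['C', 'B', 'A'],
    'col2': [2, 3, 4, 2, 4, 2, 1, 3, 4, 4],
})

# Unique values of each column, in order of first appearance.
uniques = {col: pd.Series(dframe[col].unique()) for col in dframe.columns}

# The Series have different lengths, so the shorter column is padded with NaN.
padded = pd.DataFrame(uniques)
print(padded)

# dropna() removes the padded row, which also discards the value 1 in col2.
out = padded.dropna()
print(out)
```

Each column is deduplicated independently here, so the pairing between col1 and col2 values in the original rows is not preserved.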

Pandas new column with calculation based on other existing column [closed]

I have a pandas DataFrame and want to do a calculation based on an existing column.
However, the apply function is not working for some reason.
Let's say it is something like
df = pd.DataFrame({'Age': age, 'Input': input})
and the Input column is something like [1.10001, 1.49999, 1.60001]
Now I want to add a new column to the DataFrame that does the following:
Add 0.0001 to each element in the column
Multiply each value by 10
Convert each value of the new column to int
Use Series.add, Series.mul and Series.astype:
# input is a Python builtin, so it is better not to use it as a variable name
inp = [1.10001, 1.49999, 1.60001]
age = [10,20,30]
df = pd.DataFrame({'Age': age, 'Input': inp})
df['new'] = df['Input'].add(0.0001).mul(10).astype(int)
print(df)
   Age    Input  new
0   10  1.10001   11
1   20  1.49999   15
2   30  1.60001   16
You could also write a simple function and apply it row by row:
def f(row):
    # same three steps: add 0.0001, multiply by 10, truncate to int
    return int((row['Input'] + 0.0001) * 10)
df['new'] = df.apply(f, axis=1)

How to Read A CSV With A Variable Number of Columns? [closed]

My csv file looks like this:
5783,145v
g656,4589,3243,tt56
6579
How do I read this with pandas (or otherwise)?
(the table should contain empty cells)
You could pass a dummy separator, and then use str.split (by ",") with expand=True:
df = pd.read_csv('path/to/file.csv', sep=" ", header=None)
df = df[0].str.split(",", expand=True).fillna("")
print(df)
Output
      0     1     2     3
0  5783  145v
1  g656  4589  3243  tt56
2  6579
I think that the solution proposed by @researchnewbie is good. If you need to replace the NaN values with, say, zero, you could add this line after the read:
dataFrame.fillna(0, inplace=True)
Try doing the following:
import pandas as pd
dataFrame = pd.read_csv(filename)
Your empty cells should contain the NaN value, which is essentially null.
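On the sample from the question, a plain pd.read_csv(filename) is likely to raise a ParserError, because the second row has more fields than the first. One possible workaround, sketched here under the assumption that the file contains no quoted commas, is to count the widest row first and pass that many column names. The variables text and n_cols are illustrative, and the sample rows are held in memory rather than read from a file:

```python
import io
import pandas as pd

# The sample rows from the question, held in memory instead of a file.
text = "5783,145v\ng656,4589,3243,tt56\n6579\n"

# The widest row decides how many column names to supply.
n_cols = max(len(line.split(',')) for line in text.splitlines())

df = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=list(range(n_cols)),  # explicit names let short rows be padded
    dtype=str,                  # keep values such as 5783 as text
).fillna('')
print(df)
```

Dropping the final fillna('') leaves NaN in the missing cells, which matches the fillna(0) suggestion above.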

Python: Select the rows for the most recent entry from multiple users [closed]

I have a dataframe df with 3 columns :
df=pd.DataFrame({
'User':['A','A','B','A','C','B','C'],
'Values':['x','y','z','p','q','r','s'],
'Date':[14,11,14,12,13,10,14]
})
I want to create a new dataframe that contains the rows with the highest value in the 'Date' column for each user. For example, for the above dataframe I want the desired dataframe to be as follows (it's a JPEG image):
Can anyone help me with this problem?
This answer assumes that each user has one distinct maximum in the Date column (on ties, every tied row is returned):
In [10]: def get_max(group):
    ...:     return group[group.Date == group.Date.max()]
    ...:
In [12]: df.groupby('User').apply(get_max).reset_index(drop=True)
Out[12]:
   Date User Values
0    14    A      x
1    14    B      z
2    14    C      s
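An alternative sketch that avoids groupby().apply(): look up the index label of each user's maximum Date with idxmax() and select those rows. This is a different technique from the answer above, and the variable name latest is only illustrative. On ties it keeps just the first row per user.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14],
})

# idxmax() gives, for each user, the index label of the row with the largest Date.
latest_idx = df.groupby('User')['Date'].idxmax()

# Select those rows and renumber them from 0.
latest = df.loc[latest_idx].reset_index(drop=True)
print(latest)
```

If Date held real timestamps instead of integers, the same call would work once the column has been converted with pd.to_datetime.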
