I have a data frame df
Date Mobile_No Amount Time .....
121526 2014-12-24 739637 200.00 9:44:00
121529 2014-12-28 199002 500.00 9:49:44
121531 2014-12-10 813770 100.00 9:50:41
121536 2014-12-09 178795 100.00 9:52:15
121537 2014-12-09 178795 100.00 9:52:24
where Date is of type datetime64 and Time is of type object. I need to group this data frame by 5-minute time intervals and by Mobile_No. In my expected output, the last two rows are counted as one (same Mobile_No, and the times are less than 5 minutes apart).
Is there any way to achieve this?
First I thought of combining the Date and Time columns into a timestamp, using it as the index, and applying pd.TimeGrouper(), but this doesn't seem to work:
>>>import datetime as dt
>>>import pandas as pd
...
>>> df.apply(lambda x: dt.datetime.combine(x['Date'], dt.time(x['Time'])), axis=1)
gives the error
'an integer is required', u'occurred at index 121526'
If you are having issues, can you not convert both columns to strings, concatenate them, and parse the result with an explicit format in to_datetime:
df['Time']=df['Time'].astype(str)
df['Date']=df['Date'].astype(str)
df['Timestamp'] = df['Date'] +' ' + df['Time']
df.index = pd.to_datetime(df['Timestamp'], format='%Y-%m-%d %H:%M:%S')
From there you can resample or use pd.Grouper as required.
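As a minimal sketch of the pd.Grouper step, the snippet below rebuilds the sample rows from the question and counts rows per Mobile_No in 5-minute bins. The times are zero-padded here as an assumption to keep parsing unambiguous, and the name counts is only illustrative.

```python
import pandas as pd

# Sample rows from the question (times zero-padded for unambiguous parsing).
df = pd.DataFrame({
    "Date": ["2014-12-24", "2014-12-28", "2014-12-10", "2014-12-09", "2014-12-09"],
    "Mobile_No": [739637, 199002, 813770, 178795, 178795],
    "Amount": [200.0, 500.0, 100.0, 100.0, 100.0],
    "Time": ["09:44:00", "09:49:44", "09:50:41", "09:52:15", "09:52:24"],
})

# Build a DatetimeIndex from the concatenated date and time strings.
df.index = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# Count rows per Mobile_No within fixed 5-minute bins of the index.
counts = df.groupby(["Mobile_No", pd.Grouper(freq="5min")]).size()
print(counts)
```

Note that pd.Grouper uses fixed clock-aligned bins (09:50 to 09:55, and so on), so two rows less than 5 minutes apart can still fall into different bins when they straddle a bin edge.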
I have a column with timestamps (strings) which look like the following:
2017-10-25T09:57:00.319Z
2017-10-25T09:59:00.319Z
2017-10-27T11:03:00.319Z
Tbh I do not know the meaning of Z but I guess it is not that important.
How can I convert the above strings into proper timestamps so that I can calculate the difference/delta (e.g. in seconds or minutes)?
I want to have a column that lists the delta from one timestamp to the next.
You can use pd.to_datetime() to convert the string to datetime format. Then get the time difference/delta by .diff(). Finally, convert the timedelta to seconds by .dt.total_seconds(), as follows:
(assuming your column of strings is named Date):
df['Date'] = pd.to_datetime(df['Date'])
df['TimeDelta'] = df['Date'].diff().dt.total_seconds()
Result:
Time delta in seconds:
print(df)
Date TimeDelta
0 2017-10-25 09:57:00.319000+00:00 NaN
1 2017-10-25 09:59:00.319000+00:00 120.0
2 2017-10-27 11:03:00.319000+00:00 176640.0
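Since the question also asks about minutes and about the trailing Z, here is a small sketch on the same three timestamps. The Z suffix denotes UTC, which is why the parsed column is timezone-aware; the column name DeltaMinutes is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"Date": [
    "2017-10-25T09:57:00.319Z",
    "2017-10-25T09:59:00.319Z",
    "2017-10-27T11:03:00.319Z",
]})

# The trailing "Z" marks UTC, so the parsed column is timezone-aware.
df["Date"] = pd.to_datetime(df["Date"])

# Dividing a timedelta column by a one-minute Timedelta yields float minutes.
df["DeltaMinutes"] = df["Date"].diff() / pd.Timedelta(minutes=1)
print(df)
```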
I am trying to round seconds in a dataframe column which contains date and time in the format 01Jan2019:11:03:57.541.
I want to get the result as 01Jan2019:11:03:58
The column is in object format.
Could someone please help.
Use to_datetime to parse the strings into datetimes, then round with Series.dt.round, and finally convert back to strings with strftime:
df = pd.DataFrame({'date':['01Jan2019:11:03:57.541','01Jan2019:11:03:57.241']})
print (df)
date
0 01Jan2019:11:03:57.541
1 01Jan2019:11:03:57.241
df['date'] = (pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')
.dt.round('s')
.dt.strftime('%d%b%Y:%H:%M:%S'))
print (df)
date
0 01Jan2019:11:03:58
1 01Jan2019:11:03:57
I have a DataFrame with two columns, ["StartDate", "Duration"].
The elements in the StartDate column are of datetime type, and the durations are ints.
Something like:
StartDate Duration
08:16:05 20
07:16:01 20
I expect to get:
EndDate
08:16:25
07:16:21
That is, simply add the seconds to the time.
I've been checking some ideas, such as the timedelta types, and it seems all these datetime types support adding timedeltas, but so far I can't find how to do it with DataFrames in a vectorized fashion (it might be possible to iterate over all the rows and perform the operation one by one).
consider this df
StartDate duration
0 01/01/2017 135
1 01/02/2017 235
You can get the datetime column like this
df['EndDate'] = pd.to_datetime(df['StartDate']) + pd.to_timedelta(df['duration'], unit='s')
df.drop(['StartDate', 'duration'], axis=1, inplace=True)
You get
EndDate
0 2017-01-01 00:02:15
1 2017-01-02 00:03:55
EDIT: with the sample dataframe that you posted
df['EndDate'] = pd.to_timedelta(df['StartDate']) + pd.to_timedelta(df['Duration'], unit='s')
# Row-wise alternative (slower than the vectorized line above):
df['EndDate'] = df.apply(lambda x: pd.to_datetime(x.StartDate) + pd.Timedelta(x.Duration, unit='s'), axis=1)
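To get the exact HH:MM:SS output the question expects, one possible sketch is below. It assumes StartDate holds time strings like those shown in the sample, which is a guess since the question says the column is of datetime type.

```python
import pandas as pd

# Assumed input: StartDate as "HH:MM:SS" strings, Duration in seconds.
df = pd.DataFrame({"StartDate": ["08:16:05", "07:16:01"], "Duration": [20, 20]})

# Parse the times (pandas attaches a placeholder date of 1900-01-01).
start = pd.to_datetime(df["StartDate"], format="%H:%M:%S")

# Add the durations in one vectorized step, then keep only the time part.
end = start + pd.to_timedelta(df["Duration"], unit="s")
df["EndDate"] = end.dt.strftime("%H:%M:%S")
print(df)
```

The result is a string column; drop the strftime call if a real datetime column is more useful downstream.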
I'm working on a pandas dataframe. One of my columns is a date (YYYYMMDD) and another is an hour (HH:MM). I would like to concatenate the two columns into one timestamp or datetime64 column, to later use that column as an index (for a time series).
Do you have any ideas? The classic pandas.to_datetime() seems to work only when each column holds a single component (year only, day only, hours only, etc.).
Setup
df
Out[1735]:
id date hour other
0 1820 20140423 19:00:00 8
1 4814 20140424 08:20:00 22
Solution
import datetime as dt
#convert date and hour to str, concatenate them and then convert them to datetime format.
df['new_date'] = df[['date','hour']].astype(str).apply(lambda x: dt.datetime.strptime(x.date + x.hour, '%Y%m%d%H:%M:%S'), axis=1)
df
Out[1756]:
id date hour other new_date
0 1820 20140423 19:00:00 8 2014-04-23 19:00:00
1 4814 20140424 08:20:00 22 2014-04-24 08:20:00
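A vectorized variant of the same idea, sketched on the setup data above: concatenate the two columns as strings and let pd.to_datetime parse the whole column at once instead of calling strptime per row. The name ts for the indexed frame is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1820, 4814],
    "date": [20140423, 20140424],
    "hour": ["19:00:00", "08:20:00"],
    "other": [8, 22],
})

# Join date and hour as strings, then parse the whole column in one call.
combined = df["date"].astype(str) + " " + df["hour"]
df["new_date"] = pd.to_datetime(combined, format="%Y%m%d %H:%M:%S")

# Use the new column as the index for time-series work.
ts = df.set_index("new_date")
print(ts)
```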
I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.dtypes returns:
a object
date object
I want to convert the date column to a date without the time, but:
df['date']=df['date'].astype('datetime64[s]')
returns:
a date
1 2014-06-28 22:00:00
df.dtypes returns:
a object
date datetime64[ns]
But the value is wrong.
I'd like to have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by converting your dates with pd.to_datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are OK with having an object (string) column again, you want:
df['date'] = [x.strftime("%Y-%m-%d") for x in df.date]
To end up with datetime.date objects instead:
df['date'] = [x.date() for x in df.date]
df.date
datetime.date(2014, 6, 29)
Here you go. Just use this pattern:
pd.to_datetime(df['date']).dt.date
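For reference, a self-contained sketch of the two outcomes the question asks for, assuming the date column starts out as strings: dt.normalize() keeps a datetime64 column with the time set to midnight, while dt.date yields plain datetime.date objects in an object column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "date": ["2014-06-29 00:00:00"]})
df["date"] = pd.to_datetime(df["date"])

# Option 1: stay in datetime64, with the time component reset to midnight.
normalized = df["date"].dt.normalize()

# Option 2: plain Python date objects (the column becomes object dtype).
dates = df["date"].dt.date

print(normalized)
print(dates)
```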