How to generate timestamps for a day? - python

I have a value column in a time-series pandas DataFrame, but the frame does not contain a datetime column. The data has 1440 rows, matching the 1440 minutes of a day, so I would like to generate a per-minute timestamp column for this frame. How can I do that?
The desired result is in the format '%Y-%m-%d %H:%M:%S'.
Before
value
0 210.38
1 210.50
2 210.51
3 210.40
4 210.41
After
datetime value
0 2019-09-18 23:55:00 210.38
1 2019-09-18 23:56:00 210.50
2 2019-09-18 23:57:00 210.51
3 2019-09-18 23:58:00 210.40
4 2019-09-18 23:59:00 210.41
Thank you!

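Use DataFrame.insert with date_range to add it as the first column (freq='min' is the minute alias; the older 'T' alias is deprecated in recent pandas versions):
df.insert(0, 'datetime', pd.date_range('2019-09-18 23:55:00', periods=len(df), freq='min'))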
print (df)
datetime value
0 2019-09-18 23:55:00 210.38
1 2019-09-18 23:56:00 210.50
2 2019-09-18 23:57:00 210.51
3 2019-09-18 23:58:00 210.40
4 2019-09-18 23:59:00 210.41
If you want to generate the datetimes dynamically, starting from the current minute:
df.insert(0, 'datetime', pd.date_range(pd.Timestamp.now().floor('min'), periods=len(df), freq='min'))
print (df)
datetime value
0 2020-01-10 10:36:00 210.38
1 2020-01-10 10:37:00 210.50
2 2020-01-10 10:38:00 210.51
3 2020-01-10 10:39:00 210.40
4 2020-01-10 10:40:00 210.41
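For the full-day case in the question (1440 rows), a minimal sketch could look like the following. It assumes the day is 2019-09-18 and uses made-up placeholder values; the strftime step is only needed if the column should hold strings in the '%Y-%m-%d %H:%M:%S' format rather than real datetimes.

```python
import numpy as np
import pandas as pd

# Placeholder data: 1440 made-up values, one per minute of a day (assumption).
df = pd.DataFrame({'value': np.linspace(210.0, 211.0, 1440)})

# One timestamp per minute, starting at midnight of the assumed day.
minutes = pd.date_range('2019-09-18', periods=len(df), freq='min')
df.insert(0, 'datetime', minutes)

# Optional: a string version in the requested format.
df['datetime_str'] = df['datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')

print(df.head())
print(df.tail())
```

Keeping the real datetime column is usually more convenient for later resampling or slicing; the string column is only for display or export.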

Try this:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'value': [210.38, 210.50, 210.51, 210.40, 210.41]})
df['date'] = pd.date_range(start=datetime.today().replace(microsecond=0), periods=len(df), freq='min')
Output:
value date
0 210.38 2020-01-10 15:08:32
1 210.50 2020-01-10 15:09:32
2 210.51 2020-01-10 15:10:32
3 210.40 2020-01-10 15:11:32
4 210.41 2020-01-10 15:12:32

Related

Pandas: time column addition and repeating all rows for a month

I'd like to expand my dataframe by adding a time column with one entry for every hour of a month, repeating each original row for every hour.
Original df
money food
0 1 2
1 4 5
2 5 7
Output:
money food time
0 1 2 2020-01-01 00:00:00
1 1 2 2020-01-01 01:00:00
2 1 2 2020-01-01 02:00:00
...
2230 5 7 2020-01-31 22:00:00
2231 5 7 2020-01-31 23:00:00
where 2231 = out_rows_number-1 = month_days_number*hours_per_day*orig_rows_number - 1
What is the proper way to perform it?
Use a cross join via DataFrame.merge with a new DataFrame that holds all hours of the month, created by date_range:
df1 = pd.DataFrame({'a':1,
'time':pd.date_range('2020-01-01', '2020-01-31 23:00:00', freq='h')})
df = df.assign(a=1).merge(df1, on='a', how='outer').drop('a', axis=1)
print (df)
money food time
0 1 2 2020-01-01 00:00:00
1 1 2 2020-01-01 01:00:00
2 1 2 2020-01-01 02:00:00
3 1 2 2020-01-01 03:00:00
4 1 2 2020-01-01 04:00:00
... ... ...
2227 5 7 2020-01-31 19:00:00
2228 5 7 2020-01-31 20:00:00
2229 5 7 2020-01-31 21:00:00
2230 5 7 2020-01-31 22:00:00
2231 5 7 2020-01-31 23:00:00
[2232 rows x 3 columns]
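On pandas 1.2 or newer, the helper column 'a' is not needed, because merge accepts how='cross' directly. A small sketch of the same cross join, reusing the sample frame from the question (the variable names hours and out are just for illustration):

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'money': [1, 4, 5], 'food': [2, 5, 7]})

# One row per hour of January 2020 (31 days * 24 hours = 744 rows).
hours = pd.DataFrame({
    'time': pd.date_range('2020-01-01', '2020-01-31 23:00:00', freq='h')
})

# Cross join: every row of df is paired with every hour.
out = df.merge(hours, how='cross')
print(out)
```

The result should have the same 2232 rows as the version with the temporary key column.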

Converting to datetime in python

I have time data in a column and am trying to figure out how to get it into datetime format:
2000
2100
2300
2355
0
1
5
10
100
105
330
My question is how I can get these into datetime format. The output should be:
20:00:00
21:00:00
23:00:00
23:55:00
00:00:00
00:01:00
00:05:00
00:10:00
01:00:00
01:05:00
03:30:00
What I tried:
1. da = pd.to_datetime(330, format='%H%M')
output: '03:30:00'
2. d= str(datetime.timedelta(minutes = 55 ))
output : '0:55:00'
But if I apply approach 1 to 100, it gives 10 hours, e.g.:
da = pd.to_datetime(100, format='%H%M')
output: '10:00:00'
Try padding the values to four digits first:
pd.to_datetime(df['time'].astype(str).str.zfill(4), format = '%H%M').dt.time
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
If I understand correctly, str.rjust also works:
pd.to_datetime(s.astype(str).str.rjust(4,'0'),format='%H%M').dt.time
Out[41]:
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
Name: x, dtype: object
Since this is aimed at novices, I am making things more explicit and adding notes on the %H and %M format codes:
df['cname'] = pd.to_datetime(df['cname'].astype(str).str.zfill(4), format = '%H%M').dt.time
print(df['cname'])
# %H Hour (24-hour clock) as a zero-padded decimal number. 07
# %M Minute as a zero-padded decimal number. 06
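To see why the padding matters, here is a small sketch with a few of the values from the question (the names times, padded and parsed are just for illustration). Without padding, '100' is read as hour 10, minute 0; padded to '0100' it is read as hour 01, minute 00.

```python
import pandas as pd

# A few of the integer HHMM values from the question.
times = pd.Series([2000, 2355, 0, 5, 100, 330])

# Left-pad with zeros so every value has exactly four digits.
padded = times.astype(str).str.zfill(4)

# Parse as hour+minute and keep only the time-of-day part.
parsed = pd.to_datetime(padded, format='%H%M').dt.time

print(pd.DataFrame({'raw': times, 'padded': padded, 'time': parsed}))
```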

Pandas: Subtracting two date columns and the result being an integer

I have two columns in a Pandas data frame that are dates.
I am looking to subtract one column from another and the result being the difference in numbers of days as an integer.
A peek at the data:
df_test.head(10)
Out[20]:
First_Date Second Date
0 2016-02-09 2015-11-19
1 2016-01-06 2015-11-30
2 NaT 2015-12-04
3 2016-01-06 2015-12-08
4 NaT 2015-12-09
5 2016-01-07 2015-12-11
6 NaT 2015-12-12
7 NaT 2015-12-14
8 2016-01-06 2015-12-14
9 NaT 2015-12-15
I have created a new column successfully with the difference:
df_test['Difference'] = df_test['First_Date'].sub(df_test['Second Date'], axis=0)
df_test.head()
Out[22]:
First_Date Second Date Difference
0 2016-02-09 2015-11-19 82 days
1 2016-01-06 2015-11-30 37 days
2 NaT 2015-12-04 NaT
3 2016-01-06 2015-12-08 29 days
4 NaT 2015-12-09 NaT
However I am unable to get a numeric version of the result:
df_test['Difference'] = df_test[['Difference']].apply(pd.to_numeric)
df_test.head()
Out[25]:
First_Date Second Date Difference
0 2016-02-09 2015-11-19 7.084800e+15
1 2016-01-06 2015-11-30 3.196800e+15
2 NaT 2015-12-04 NaN
3 2016-01-06 2015-12-08 2.505600e+15
4 NaT 2015-12-09 NaN
How about:
df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days
This returns the difference as int if there are no missing values (NaT) and as float if there are.
Pandas has rich documentation on time series / date functionality and time deltas.
You can also divide a column of dtype timedelta by np.timedelta64(1, 'D'); the output is float rather than int because of the NaN values:
df_test['Difference'] = df_test['Difference'] / np.timedelta64(1, 'D')
print (df_test)
First_Date Second Date Difference
0 2016-02-09 2015-11-19 82.0
1 2016-01-06 2015-11-30 37.0
2 NaT 2015-12-04 NaN
3 2016-01-06 2015-12-08 29.0
4 NaT 2015-12-09 NaN
5 2016-01-07 2015-12-11 27.0
6 NaT 2015-12-12 NaN
7 NaT 2015-12-14 NaN
8 2016-01-06 2015-12-14 23.0
9 NaT 2015-12-15 NaN
See also the pandas documentation on frequency conversion.
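If an integer column is needed even though some rows are NaT, one option (assuming a pandas version with the nullable 'Int64' dtype, i.e. 0.24 or newer) is to cast the day counts to that dtype. A minimal sketch using the first three rows of the question's data:

```python
import pandas as pd

# First three rows of the question's data; None becomes NaT.
df_test = pd.DataFrame({
    'First_Date': pd.to_datetime(['2016-02-09', '2016-01-06', None]),
    'Second Date': pd.to_datetime(['2015-11-19', '2015-11-30', '2015-12-04']),
})

diff = df_test['First_Date'] - df_test['Second Date']

# .dt.days is float when NaT is present; 'Int64' keeps integers and marks gaps as <NA>.
df_test['Difference'] = diff.dt.days.astype('Int64')
print(df_test)
```

Note the capital "I" in 'Int64': the lowercase 'int64' dtype cannot hold missing values.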
You can use datetime module to help here. Also, as a side note, a simple date subtraction should work as below:
import datetime as dt
import numpy as np
import pandas as pd
#Assume we have df_test:
In [222]: df_test
Out[222]:
first_date second_date
0 2016-01-31 2015-11-19
1 2016-02-29 2015-11-20
2 2016-03-31 2015-11-21
3 2016-04-30 2015-11-22
4 2016-05-31 2015-11-23
5 2016-06-30 2015-11-24
6 NaT 2015-11-25
7 NaT 2015-11-26
8 2016-01-31 2015-11-27
9 NaT 2015-11-28
10 NaT 2015-11-29
11 NaT 2015-11-30
12 2016-04-30 2015-12-01
13 NaT 2015-12-02
14 NaT 2015-12-03
15 2016-04-30 2015-12-04
16 NaT 2015-12-05
17 NaT 2015-12-06
In [223]: df_test['Difference'] = df_test['first_date'] - df_test['second_date']
In [224]: df_test
Out[224]:
first_date second_date Difference
0 2016-01-31 2015-11-19 73 days
1 2016-02-29 2015-11-20 101 days
2 2016-03-31 2015-11-21 131 days
3 2016-04-30 2015-11-22 160 days
4 2016-05-31 2015-11-23 190 days
5 2016-06-30 2015-11-24 219 days
6 NaT 2015-11-25 NaT
7 NaT 2015-11-26 NaT
8 2016-01-31 2015-11-27 65 days
9 NaT 2015-11-28 NaT
10 NaT 2015-11-29 NaT
11 NaT 2015-11-30 NaT
12 2016-04-30 2015-12-01 151 days
13 NaT 2015-12-02 NaT
14 NaT 2015-12-03 NaT
15 2016-04-30 2015-12-04 148 days
16 NaT 2015-12-05 NaT
17 NaT 2015-12-06 NaT
Now convert the values to datetime.timedelta objects and read the .days attribute of each valid one:
In [226]: df_test['Diffference'] = df_test['Difference'].astype(dt.timedelta).map(lambda x: np.nan if pd.isnull(x) else x.days)
In [227]: df_test
Out[227]:
first_date second_date Difference Diffference
0 2016-01-31 2015-11-19 73 days 73
1 2016-02-29 2015-11-20 101 days 101
2 2016-03-31 2015-11-21 131 days 131
3 2016-04-30 2015-11-22 160 days 160
4 2016-05-31 2015-11-23 190 days 190
5 2016-06-30 2015-11-24 219 days 219
6 NaT 2015-11-25 NaT NaN
7 NaT 2015-11-26 NaT NaN
8 2016-01-31 2015-11-27 65 days 65
9 NaT 2015-11-28 NaT NaN
10 NaT 2015-11-29 NaT NaN
11 NaT 2015-11-30 NaT NaN
12 2016-04-30 2015-12-01 151 days 151
13 NaT 2015-12-02 NaT NaN
14 NaT 2015-12-03 NaT NaN
15 2016-04-30 2015-12-04 148 days 148
16 NaT 2015-12-05 NaT NaN
17 NaT 2015-12-06 NaT NaN
Hope that helps.
I feel that the answers above do not handle dates that 'wrap' around a year. Handling this is useful when you want the proximity of two dates measured by day of year. To do this as a row operation, I did the following (I used this in a business setting for renewing customer subscriptions).
def get_date_difference(row, x, y):
    try:
        # Calculating the smallest date difference between the start and the close date
        # There's some tricky logic in here to calculate for determining date difference
        # the other way around (Dec -> Jan is 1 month rather than 11)
        sub_start_date = int(row[x].strftime('%j')) # day of year (1-366)
        close_date = int(row[y].strftime('%j')) # day of year (1-366)
        later_date_of_year = max(sub_start_date, close_date)
        earlier_date_of_year = min(sub_start_date, close_date)
        days_diff = later_date_of_year - earlier_date_of_year
        # Calculates the difference going across the next year (December -> Jan)
        days_diff_reversed = (365 - later_date_of_year) + earlier_date_of_year
        return min(days_diff, days_diff_reversed)
    except ValueError:
        return None
Then the function can be applied like this:
dfAC_Renew['date_difference'] = dfAC_Renew.apply(get_date_difference, x = 'customer_since_date', y = 'renewal_date', axis = 1)
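The same day-of-year wrap-around idea can also be sketched without apply, using the .dt.dayofyear accessor. This is only an illustration: the function name day_of_year_gap is made up, and like the function above it assumes a 365-day year, so results can be off by one around leap years.

```python
import pandas as pd

def day_of_year_gap(a, b, year_length=365):
    """Smallest day-of-year distance between two datetime Series,
    allowing the gap to wrap around New Year (assumes a 365-day year)."""
    diff = (a.dt.dayofyear - b.dt.dayofyear).abs()
    wrapped = year_length - diff
    # Keep the direct gap where it is the shorter one, else the wrapped gap.
    return diff.where(diff <= wrapped, wrapped)

start = pd.Series(pd.to_datetime(['2019-12-30', '2020-03-01']))
close = pd.Series(pd.to_datetime(['2020-01-02', '2020-03-11']))
gap = day_of_year_gap(start, close)
print(gap)
```

Rows containing NaT come out as NaN here instead of None.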
Create a vectorized method:
import numpy as np
from pandas.tseries.frequencies import to_offset

def calc_xb_minus_xa(df):
    time_dict = {
        '<Minute>': 'm',
        '<Hour>': 'h',
        '<Day>': 'D',
        '<Week>': 'W',
        '<Month>': 'M',
        '<Year>': 'Y'
    }
    # Pick the unit from the difference in the first row
    time_delta = df.at[df.index[0], 'end_time'] - df.at[df.index[0], 'open_time']
    offset_base_name = str(to_offset(time_delta).base)
    time_term = time_dict.get(offset_base_name)
    result = (df.end_time - df.open_time) / np.timedelta64(1, time_term)
    return result
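Then on your df do:
df['x'] = calc_xb_minus_xa(df)
This is meant to cover minutes, hours, days, weeks, months and years. Note that to_offset on a timedelta only yields fixed-length units (days, hours, minutes and smaller), so the week, month and year entries are not reached in practice.
Change open_time and end_time to match the column names in your df.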
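If the unit is known in advance, a simpler sketch is to divide the timedelta column by a pd.Timedelta of that unit. The column names open_time and end_time follow the answer above; the sample values are made up.

```python
import pandas as pd

# Made-up sample data with the column names used above.
df = pd.DataFrame({
    'open_time': pd.to_datetime(['2020-01-01 00:00', '2020-01-01 06:00']),
    'end_time': pd.to_datetime(['2020-01-01 01:30', '2020-01-02 06:00']),
})

elapsed = df['end_time'] - df['open_time']

# Dividing by a fixed-length Timedelta gives a float count of that unit.
df['hours'] = elapsed / pd.Timedelta(hours=1)
df['days'] = elapsed / pd.Timedelta(days=1)
print(df)
```

This only works for fixed-length units; months and years have no fixed length, so they need calendar-based logic instead.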

python pandas series loc value from multi index

I have a series that looks like this
2014 7 2014-07-01 -0.045417
8 2014-08-01 -0.035876
9 2014-09-02 -0.030971
10 2014-10-01 -0.027471
11 2014-11-03 -0.032968
12 2014-12-01 -0.031110
2015 1 2015-01-02 -0.028906
2 2015-02-02 -0.035563
3 2015-03-02 -0.040338
4 2015-04-01 -0.032770
5 2015-05-01 -0.025762
6 2015-06-01 -0.019746
7 2015-07-01 -0.018541
8 2015-08-03 -0.028101
9 2015-09-01 -0.043237
10 2015-10-01 -0.053565
11 2015-11-02 -0.062630
12 2015-12-01 -0.064618
2016 1 2016-01-04 -0.064852
I want to be able to get the value from a date. Something like:
myseries.loc('2015-10-01') and it returns -0.053565
The index entries are tuples of the form (2016, 1, 2016-01-04).
You can do it like this:
In [32]:
df.loc(axis=0)[:,:,'2015-10-01']
Out[32]:
value
year month date
2015 10 2015-10-01 -0.053565
You can also pass a slice for each level:
In [39]:
df.loc[(slice(None),slice(None),'2015-10-01'),]
Out[39]:
value
year month date
2015 10 2015-10-01 -0.053565
Or just pass the first 2 index levels:
In [40]:
df.loc[2015,10]
Out[40]:
value
date
2015-10-01 -0.053565
Try xs:
print(s.xs('2015-10-01', level=2, axis=0))
#year datetime
#2015 10 -0.053565
#Name: series, dtype: float64
print(s.xs(7, level=1, axis=0))
#year datetime
#2014 2014-07-01 -0.045417
#2015 2015-07-01 -0.018541
#Name: series, dtype: float64
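A self-contained sketch of the xs lookup, assuming the three index levels are named year, month and date (the names are an assumption; the question does not show them for the Series). It uses three of the rows from the question and returns the scalar value for one date.

```python
import pandas as pd

# Three rows from the question, rebuilt with a (year, month, date) MultiIndex.
dates = pd.to_datetime(['2015-09-01', '2015-10-01', '2015-11-02'])
index = pd.MultiIndex.from_arrays(
    [dates.year, dates.month, dates], names=['year', 'month', 'date'])
s = pd.Series([-0.043237, -0.053565, -0.062630], index=index)

# Cross-section on the date level; the result is indexed by (year, month).
match = s.xs(pd.Timestamp('2015-10-01'), level='date')

# Take the single matching value as a plain float.
value = match.iloc[0]
print(value)
```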

How to rearrange a date in python

I have a column in a pandas data frame looking like:
test1.Received
Out[9]:
0 01/01/2015 17:25
1 02/01/2015 11:43
2 04/01/2015 18:21
3 07/01/2015 16:17
4 12/01/2015 20:12
5 14/01/2015 11:09
6 15/01/2015 16:05
7 16/01/2015 21:02
8 26/01/2015 03:00
9 27/01/2015 08:32
10 30/01/2015 11:52
This represents a time stamp as Day Month Year Hour Minute. I would like to rearrange the date as Year Month Day Hour Minute. So that it would look like:
test1.Received
Out[9]:
0 2015/01/01 17:25
1 2015/01/02 11:43
...
Just use pd.to_datetime:
In [33]:
import pandas as pd
pd.to_datetime(df['date'])
Out[33]:
index
0 2015-01-01 17:25:00
1 2015-02-01 11:43:00
2 2015-04-01 18:21:00
3 2015-07-01 16:17:00
4 2015-12-01 20:12:00
5 2015-01-14 11:09:00
6 2015-01-15 16:05:00
7 2015-01-16 21:02:00
8 2015-01-26 03:00:00
9 2015-01-27 08:32:00
10 2015-01-30 11:52:00
Name: date, dtype: datetime64[ns]
In your case:
pd.to_datetime(test1['Received'])
should just work
If you want to change the display format, you need to parse as a datetime and then apply datetime.strftime:
In [35]:
import datetime as dt
pd.to_datetime(df['date']).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[35]:
index
0 01/01/15 17:25:00
1 02/01/15 11:43:00
2 04/01/15 18:21:00
3 07/01/15 16:17:00
4 12/01/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
So the above is now showing month/day/year; in your case the following should work (%Y gives the four-digit year):
pd.to_datetime(test1['Received']).apply(lambda x: dt.datetime.strftime(x, '%Y/%m/%d %H:%M'))
EDIT
It looks like you need to pass the parameter dayfirst=True to to_datetime, otherwise ambiguous dates such as 02/01/2015 are read as month/day:
In [45]:
pd.to_datetime(df['date'], dayfirst=True).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[45]:
index
0 01/01/15 17:25:00
1 01/02/15 11:43:00
2 01/04/15 18:21:00
3 01/07/15 16:17:00
4 01/12/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
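Pandas has this built in; you can specify your datetime format:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
Older versions also offer infer_datetime_format (deprecated since pandas 2.0, where a consistent format is inferred by default).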
>>> import pandas as pd
>>> i = pd.date_range('20000101',periods=100)
>>> df = pd.DataFrame(dict(year = i.year, month = i.month, day = i.day))
>>> pd.to_datetime(df.year*10000 + df.month*100 + df.day, format='%Y%m%d')
0 2000-01-01
1 2000-01-02
...
98 2000-04-08
99 2000-04-09
Length: 100, dtype: datetime64[ns]
You can use the datetime functions to convert from and to strings (this needs from datetime import datetime):
# converts the string to a datetime
date_obj = datetime.strptime(date_string, '%d/%m/%Y %H:%M')
and
# converts the datetime to your requested string format
date_obj.strftime('%Y/%m/%d %H:%M:%S')
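For a whole pandas column, the same parse-then-format round trip can be sketched with an explicit day-first format and the .dt.strftime accessor. The sample strings are taken from the question; the variable names are just for illustration.

```python
import pandas as pd

# A few of the day/month/year strings from the question.
received = pd.Series(['01/01/2015 17:25', '02/01/2015 11:43', '14/01/2015 11:09'])

# Parse with an explicit day-first format so 02/01 is 2 January, not 1 February.
parsed = pd.to_datetime(received, format='%d/%m/%Y %H:%M')

# Format back to strings as year/month/day hour:minute.
rearranged = parsed.dt.strftime('%Y/%m/%d %H:%M')
print(rearranged)
```

Giving the format explicitly avoids the month/day ambiguity that shows up in the first output above.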
