I'm trying to figure out the easiest way to automate the conversion of an array of seconds into datetimes. I'm very familiar with converting seconds since 1970 into a datetime, but the values I have here are the seconds elapsed within a given day. For example, 14084 is the number of seconds that have passed on 2011-11-11, and I was able to generate the datetime below.
str(dt.timedelta(seconds = 14084))
Out[245]: '3:54:44'
dt.datetime.combine(date(2011,11,11),time(3,54,44))
Out[250]: datetime.datetime(2011, 11, 11, 3, 54, 44)
Is there a faster way to do this conversion for an array?
NumPy supports arrays of datetimes, with a timedelta type for manipulating them:
https://numpy.org/doc/stable/reference/arrays.datetime.html
e.g. you can do this:
import numpy as np
date_array = np.arange('2005-02', '2005-03', dtype='datetime64[D]')
date_array = date_array + np.timedelta64(4, 's')  # Add 4 seconds (not in place: the result needs second resolution, not day resolution)
If you have an array of seconds, you could convert it into an array of timedeltas and add that to a fixed datetime
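As a minimal sketch of that idea, assuming the seconds are integers and the day is fixed at 2011-11-11 as in the question (the names `base` and `stamps` are only illustrative):

```python
import numpy as np

# Seconds elapsed within the day, as in the question.
seconds = np.array([14084, 14085, 15003])

# Fixed date at second resolution; adding timedelta64[s] values keeps that unit.
base = np.datetime64('2011-11-11', 's')
stamps = base + seconds.astype('timedelta64[s]')

print(stamps)
# Convert to plain datetime.datetime objects if needed.
print(stamps.tolist())
```

The whole conversion is vectorized, so no Python-level loop over the array is needed.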
Say you have
seconds = [14084, 14085, 15003]
You can use pandas
import pandas as pd
series = pd.to_timedelta(seconds, unit='s') + pd.to_datetime('2011-11-11')
series = series.to_series().reset_index(drop=True)
print(series)
0 2011-11-11 03:54:44
1 2011-11-11 03:54:45
2 2011-11-11 04:10:03
dtype: datetime64[ns]
Or a list comprehension
import datetime
list_comp = [datetime.datetime(2011, 11, 11) +
             datetime.timedelta(seconds=s) for s in seconds]
print(list_comp)
[datetime.datetime(2011, 11, 11, 3, 54, 44), datetime.datetime(2011, 11, 11, 3, 54, 45), datetime.datetime(2011, 11, 11, 4, 10, 3)]
How do I convert a list of dates that are in the form yyyymmdd to a serial number? For example, if I have this list of dates:
t = [1898-10-12 06:00,1898-10-12 12:00,1932-09-30 08:00,1932-09-30 00:00]
How do I convert each date to a serial number? I'm currently using the datetime toordinal() method, but dates on the same day are being rounded to the same serial number. How do I get the same dates with different times to be different numbers?
The items in the list are datetime.datetime objects. I then tried doing:
thurser = []
for i in range(len(t)):
    thurser.append(t[i].toordinal())
But I am not getting serial numbers as floats.
datetime.toordinal() considers only the 'date' part of the datetime object, not the time. So does date.toordinal() - it only has a date part. The first 2 and last 2 elements in your list have datetimes on the same date but at different times, which .toordinal ignores. So, .toordinal will give you the same value for those same-dated datetimes.
In general, the solution would be to calculate the delta between your dates and a pre-determined/fixed one. I'm using datetime.datetime(1, 1, 1), the earliest possible datetime, so all the deltas are positive:
import datetime

thurser = []
# assuming t is a list of datetime objects
for d in t:
    delta = d - datetime.datetime(1, 1, 1)
    thurser.append(delta.days + delta.seconds / (24 * 3600))
>>> print(thurser)
[693149.25, 693149.5, 705555.3333333334, 705555.0]
And if you prefer ints instead of floats, then use seconds instead of days:
thurser.append(int(delta.total_seconds())) # total_seconds has microseconds in the float
>>> print(thurser)
[59888095200, 59888116800, 60959980800, 60959952000]
And to get back the original values in the 2nd example:
>>> [datetime.timedelta(seconds=d) + datetime.datetime(1, 1, 1) for d in thurser]
[datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
>>> _ == t # compare with original values
True
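The fractional-day serial numbers from the first example can be converted back as well. Here is a small sketch, assuming the same reference point of datetime.datetime(1, 1, 1); the helper names `to_serial` and `from_serial` are made up for illustration. Because the serial is a float, the round trip may be off by a few microseconds at these magnitudes, so the integer-seconds variant is the safer choice when exact equality matters.

```python
import datetime

EPOCH = datetime.datetime(1, 1, 1)      # same reference point as above
ONE_DAY = datetime.timedelta(days=1)

def to_serial(d):
    """Days since EPOCH as a float; the time of day becomes the fraction."""
    return (d - EPOCH) / ONE_DAY

def from_serial(serial):
    """Inverse of to_serial, up to float rounding."""
    return EPOCH + datetime.timedelta(days=serial)

d = datetime.datetime(1898, 10, 12, 6, 0)
serial = to_serial(d)
print(serial)               # 693149.25
print(from_serial(serial))  # 1898-10-12 06:00:00
```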
Let me know if my understanding is wrong, but I tried the following and it gives a distinct number for each value in the list.
I replaced
t = ['1898-10-12 06:00','1898-10-12 12:00','1932-09-30 08:00','1932-09-30 00:00']
with
t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
As mentioned in the comments, it is a list of datetime.datetime objects.
To generate a number, I use the total milliseconds between 1970-01-01 00:00:00 and the given date.
Dates before 1970-01-01 therefore give negative values, but the values are still distinct.
import datetime
t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
thurser = []
x = []
for i in range(len(t)):
    thurser.append(t[i].toordinal())
    x.append((t[i]-datetime.datetime.utcfromtimestamp(0)).total_seconds() * 1000.0)
print(thurser)
print(x)
output:
[693150, 693150, 705556, 705556]
[-2247501600000.0, -2247480000000.0, -1175616000000.0, -1175644800000.0]
How do you get a valid timedelta instance when differencing datetimes with different timezones in Python? I'm finding the timedelta is always 0 if the timezones are different.
>>> from dateutil.parser import parse
>>> dt0=parse('2017-02-06 18:14:32-05:00')
>>> dt0
datetime.datetime(2017, 2, 6, 18, 14, 32, tzinfo=tzoffset(None, -18000))
>>> dt1=parse('2017-02-06 23:14:32+00:00')
>>> dt1
datetime.datetime(2017, 2, 6, 23, 02, 12, tzinfo=tzutc())
>>> (dt1-dt0).total_seconds()
0.0
This doesn't make any sense to me. I would have thought that Python's datetime class would be smart enough to normalize both values to UTC internally, and then return a timedelta based on those values. Or throw an exception. Instead it returns 0, implying both datetimes are equal, which clearly they're not. What am I doing wrong here?
You are confused about what the timezone means; the two times you gave are identical, so of course their difference is zero. I can duplicate your results, except that I don't have the discrepancy between the second string and second datetime that you have:
>>> from dateutil.parser import parse
>>> dt0=parse('2017-02-06 18:14:32-05:00')
>>> dt0
datetime.datetime(2017, 2, 6, 18, 14, 32, tzinfo=tzoffset(None, -18000))
>>> dt1=parse('2017-02-06 23:14:32+00:00')
>>> dt1
datetime.datetime(2017, 2, 6, 23, 14, 32, tzinfo=tzutc())
>>> (dt1-dt0).total_seconds()
0.0
But watch what happens when I convert dt0 to UTC. The time gets adjusted by the 5 hour timezone difference, and it becomes identical to the second.
>>> dt0.astimezone(dt1.tzinfo)
datetime.datetime(2017, 2, 6, 23, 14, 32, tzinfo=tzutc())
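The same behaviour can be reproduced with the standard library alone. This is a sketch that builds the two values with datetime.timezone instead of dateutil (an assumption made to keep the example self-contained). It also shows the wall-clock difference you get if the offsets are deliberately dropped:

```python
from datetime import datetime, timedelta, timezone

# 18:14:32 at UTC-05:00 and 23:14:32 at UTC are the same instant.
dt0 = datetime(2017, 2, 6, 18, 14, 32, tzinfo=timezone(timedelta(hours=-5)))
dt1 = datetime(2017, 2, 6, 23, 14, 32, tzinfo=timezone.utc)

# Aware datetimes are compared and subtracted as instants in time.
print(dt0 == dt1)                  # True
print((dt1 - dt0).total_seconds()) # 0.0

# Dropping tzinfo compares the wall-clock readings instead.
wall_clock_diff = dt1.replace(tzinfo=None) - dt0.replace(tzinfo=None)
print(wall_clock_diff)             # 5:00:00
```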
I am attempting to build a program to handle alerts. I want it to be able to handle specific dates like 8/23/2015 7:00 and relative dates like 5 days and 7 hours from now. Specific dates are fine, but for relative dates, if I just add 5 days and 7 hours to the fields of the current datetime, a field can overflow the range allowed for it:
import datetime
dt = datetime.datetime.now()
dayslater = 5
hourslater = 7
minuteslater = 30
alarmTime = datetime.datetime(dt.year, dt.month, dt.day + dayslater,
dt.hour + hourslater,
dt.minute + minuteslater, 0,0)
This works sometimes, but if dayslater were 40 days it would overflow the day field. I did set up a simple check:
if hours >= 24:
    hours -= 24
    days += 1
However, this won't work for overflowing months, whose length in days isn't consistent.
Don't. Dates are hard, and it's very easy to get it wrong.
Instead, use timedelta:
In [1]: from datetime import datetime, timedelta
In [2]: dt = datetime.now()
In [3]: dt
Out[3]: datetime.datetime(2015, 7, 23, 15, 2, 55, 836914)
In [4]: alarmTime = dt + timedelta(days=5, hours=7, minutes=30)
In [5]: alarmTime
Out[5]: datetime.datetime(2015, 7, 28, 22, 32, 55, 836914)
Use a datetime.timedelta() object and leave calculations to the datetime library:
import datetime
delta = datetime.timedelta(days=dayslater, hours=hourslater, minutes=minuteslater)
alarmTime = datetime.datetime.now() + delta
Demo:
>>> import datetime
>>> dt = datetime.datetime.now()
>>> dayslater = 5
>>> hourslater = 7
>>> minuteslater = 30
>>> delta = datetime.timedelta(days=dayslater, hours=hourslater, minutes=minuteslater)
>>> delta
datetime.timedelta(5, 27000)
>>> dt
datetime.datetime(2015, 7, 23, 21, 4, 59, 987926)
>>> dt + delta
datetime.datetime(2015, 7, 29, 4, 34, 59, 987926)
Note how the hours carried over to the next day (from 21:04 to 04:34), and thus the date went from the 23rd to the 29th. I did not have to worry about 'overflow' here.
This continues to work at month boundaries, at year boundaries, and in leap years, with February 29th:
>>> datetime.datetime(2015, 7, 26, 22, 42) + delta
datetime.datetime(2015, 8, 1, 6, 12)
>>> datetime.datetime(2015, 12, 26, 22, 42) + delta
datetime.datetime(2016, 1, 1, 6, 12)
>>> datetime.datetime(2016, 2, 23, 22, 42) + delta
datetime.datetime(2016, 2, 29, 6, 12)
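For the 40-day case mentioned in the question, a small wrapper might look like the sketch below. The function name `alarm_time` and the fixed start time are assumptions chosen so that the result is reproducible; in a real program the start would typically be datetime.now().

```python
from datetime import datetime, timedelta

def alarm_time(start, days=0, hours=0, minutes=0):
    """Return start shifted by a relative offset; timedelta handles all carries."""
    return start + timedelta(days=days, hours=hours, minutes=minutes)

start = datetime(2015, 8, 23, 7, 0)
print(alarm_time(start, days=40, hours=7, minutes=30))
# 2015-10-02 14:30:00
```

The offset crosses both the end of August and the end of September without any manual month-length handling.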
I basically face the same problem posted here: Converting between datetime, Timestamp and datetime64
but I couldn't find a satisfying answer there. My question is how to extract a datetime from a numpy.datetime64 value.
If I try:
np.datetime64('2012-06-18T02:00:05.453000000-0400').astype(datetime.datetime)
it gives me:
1339999205453000000L
My current solution is to convert the datetime64 into a string and then parse that back into a datetime, but that seems like quite a silly method.
Borrowing from
Converting between datetime, Timestamp and datetime64
In [220]: x
Out[220]: numpy.datetime64('2012-06-17T23:00:05.453000000-0700')
In [221]: datetime.datetime.utcfromtimestamp(x.tolist()/1e9)
Out[221]: datetime.datetime(2012, 6, 18, 6, 0, 5, 452999)
Accounting for timezones I think that's right. Looks rather clunky though.
Using int() is more explicit (I think) than tolist():
In [294]: datetime.datetime.utcfromtimestamp(int(x)/1e9)
Out[294]: datetime.datetime(2012, 6, 18, 6, 0, 5, 452999)
Or, to get the datetime in local time:
In [295]: datetime.datetime.fromtimestamp(x.astype('O')/1e9)
But in the test_datetime.py file
https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_datetime.py
I find some other options. First convert the general datetime64 to one of the formats that specify units:
In [296]: x.astype('M8[D]').astype('O')
Out[296]: datetime.date(2012, 6, 18)
In [297]: x.astype('M8[ms]').astype('O')
Out[297]: datetime.datetime(2012, 6, 18, 6, 0, 5, 453000)
This works for arrays:
In [303]: np.array([[x,x],[x,x]],dtype='M8[ms]').astype('O')[0,1]
Out[303]: datetime.datetime(2012, 6, 18, 6, 0, 5, 453000)
Note that Timestamp is a subclass of datetime.datetime, so the result of [4] will generally work wherever a datetime is expected:
In [4]: pd.Timestamp(np.datetime64('2012-06-18T02:00:05.453000000-0400'))
Out[4]: Timestamp('2012-06-18 06:00:05.453000')
In [5]: pd.Timestamp(np.datetime64('2012-06-18T02:00:05.453000000-0400')).to_pydatetime()
Out[5]: datetime.datetime(2012, 6, 18, 6, 0, 5, 453000)
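To summarise the unit-based approach, here is a short sketch. It assumes a datetime64 value written without a UTC offset (NumPy has deprecated parsing offsets in such strings), and the variable names are illustrative. The key point is that nanosecond values convert to a plain integer, while microsecond or coarser units convert to datetime objects.

```python
import datetime
import numpy as np

# Nanosecond resolution, as in the question, but without a UTC offset.
x = np.datetime64('2012-06-18T06:00:05.453', 'ns')

# datetime.datetime only has microsecond precision, so 'ns' falls back to an int.
as_int = x.astype(object)

# Casting to microseconds first yields a real datetime.datetime.
as_dt = x.astype('datetime64[us]').astype(object)
print(as_dt)

# The same cast works element-wise on arrays.
arr = np.array([x, x]).astype('datetime64[us]').astype(object)
print(arr[0])
```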
Short version: I have two TimeSeries (recording start and recording end) I would like to use as indices for data in a Panel (or DataFrame). Not hierarchical, but parallel. I am uncertain how to do this.
Long version:
I am constructing a pandas Panel with some data akin to temperature and density at certain distances from an antenna. As I see it, the most natural structure is having e.g. temp and dens as items (i.e. sub-DataFrames of the Panel), recording time as major axis (index), and thus distance from the antenna as minor axis (columns).
My problem is this: For each recording, the instrument averages/integrates over some amount of time. Thus, for each data dump, two timestamps are saved: start recording and end recording. I need both of those. Thus, I would need something which might be called "parallel indexing", where two different TimeSeries (startRec and endRec) work as indices, and I can get whichever I prefer for a certain data point. Of course, I don't really need to index by both, but both need to be naturally available in the data structure. For example, for any given temperature or density recording, I need to be able to get both the start and end time of the recording.
I could of course keep the two TimeSeries in a separate DataFrame, but with the main point of pandas being automatic data alignment, this is not really ideal.
How can I best achieve this?
Example data
Sample Panel with three recordings at two distances from the antenna:
import pandas as pd
import numpy as np
data = pd.Panel(data={'temp': np.array([[21, 20],
[19, 17],
[15, 14]]),
'dens': np.array([[1001, 1002],
[1000, 998],
[997, 995]])},
minor_axis=['1m', '3m'])
Output of data:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3 (major_axis) x 2 (minor_axis)
Items axis: dens to temp
Major_axis axis: 0 to 2
Minor_axis axis: 1m to 3m
Here, the major axis is currently only an integer-based index (0 to 2). The minor axis is the two measurement distances from the antenna.
I have two TimeSeries I'd like to use as indices:
from datetime import datetime
startRec = pd.TimeSeries([datetime(2013, 11, 11, 15, 00, 00),
datetime(2013, 11, 12, 15, 00, 00),
datetime(2013, 11, 13, 15, 00, 00)])
endRec = pd.TimeSeries([datetime(2013, 11, 11, 15, 00, 10),
datetime(2013, 11, 12, 15, 00, 10),
datetime(2013, 11, 13, 15, 00, 10)])
Output of startRec:
0 2013-11-11 15:00:00
1 2013-11-12 15:00:00
2 2013-11-13 15:00:00
dtype: datetime64[ns]
Being in a Panel makes this a little trickier. I typically stick with DataFrames.
But how does this look:
import pandas as pd
import numpy as np
from datetime import datetime
startRec = pd.TimeSeries([datetime(2013, 11, 11, 15, 0, 0),
datetime(2013, 11, 12, 15, 0, 0),
datetime(2013, 11, 13, 15, 0, 0)])
endRec = pd.TimeSeries([datetime(2013, 11, 11, 15, 0, 10),
datetime(2013, 11, 12, 15, 0, 10),
datetime(2013, 11, 13, 15, 0, 10)])
_data1m = pd.DataFrame(data={
'temp': np.array([21, 19, 15]),
'dens': np.array([1001, 1000, 997]),
'start': startRec,
'end': endRec
}
)
_data3m = pd.DataFrame(data={
'temp': np.array([20, 17, 14]),
'dens': np.array([1002, 998, 995]),
'start': startRec,
'end': endRec
}
)
_data1m.set_index(['start', 'end'], inplace=True)
_data3m.set_index(['start', 'end'], inplace=True)
data = pd.Panel(data={'1m': _data1m, '3m': _data3m})
data.loc['3m'].select(lambda row: row[0] < pd.Timestamp('2013-11-12') or
row[1] < pd.Timestamp('2013-11-13'))
and that outputs:
dens temp
start end
2013-11-11 15:00:00 2013-11-11 15:00:10 1002 20
2013-11-12 15:00:00 2013-11-12 15:00:10 998 17
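pd.Panel and pd.TimeSeries have since been removed from pandas, so the code above only runs on old versions. A rough equivalent for current pandas is sketched below. Treat it as one possible layout rather than the original author's method: both timestamps go into a row MultiIndex, and the distance becomes the top level of a column MultiIndex.

```python
import pandas as pd

# Start and end of each recording; both stay attached to every row.
start = pd.to_datetime(['2013-11-11 15:00:00',
                        '2013-11-12 15:00:00',
                        '2013-11-13 15:00:00'])
end = start + pd.Timedelta(seconds=10)
index = pd.MultiIndex.from_arrays([start, end], names=['start', 'end'])

data_1m = pd.DataFrame({'temp': [21, 19, 15], 'dens': [1001, 1000, 997]}, index=index)
data_3m = pd.DataFrame({'temp': [20, 17, 14], 'dens': [1002, 998, 995]}, index=index)

# The distance becomes the outer column level instead of a Panel item.
data = pd.concat({'1m': data_1m, '3m': data_3m}, axis=1)

# Same filter as the select() call above, written as a boolean mask.
starts = data.index.get_level_values('start')
ends = data.index.get_level_values('end')
mask = (starts < pd.Timestamp('2013-11-12')) | (ends < pd.Timestamp('2013-11-13'))
subset = data.loc[mask, '3m']
print(subset)
```

Either timestamp can be pulled out for any row with `get_level_values`, which covers the "parallel indexing" requirement without a separate DataFrame.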