Working in Python, I need to convert an array of datetime values into sample times, because I want to treat the corresponding times of the time series as sample times [0..T]:
[2013/11/09 14:29:54.660, 2013/11/09 14:29:54.680, ... T] where T > 1000. So I have a pretty big array of more than 1000 datetime values.
I came up with the following code:
tiempos = [datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f") for x in csvTimeColum]
sampletime = [(t - tiempos[0]).microseconds / 1000 for t in tiempos]
This piece of code seems to work well, but I get repeating batches of 1000 samples within the signal:
[0,20,...,980,0,20,...,980,0,20,...,980,...]
So my resulting signal is not a continuous one. How do I properly do this conversion so that the signal stays continuous? Does anybody have a good idea on how to solve this?
Use total_seconds(), which works on timedeltas. The .microseconds attribute only holds the sub-second part of the difference (0-999999 µs), which is why your sample times wrap around instead of growing:
# convert each time difference to milliseconds
sampletime = [(t - tiempos[0]).total_seconds() * 1000 for t in tiempos]
Working example:
import datetime

csvTimeColum = ["2013/11/09 14:29:54.660", "2013/11/09 14:29:54.680"]
tiempos = [datetime.datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f") for x in csvTimeColum]
sampletime = [(t - tiempos[0]).total_seconds() * 1000 for t in tiempos]
sampletime  # [0.0, 20.0]
Related
I am taking the difference between two times (2022-07-20 23:10:00.990000 and 2022-07-20 23:10:02.100000), which gives me back 0:00:01.110000. I want to transform that into HH:MM:SS without the microseconds. The easiest way to take off the microseconds is avg_inqueue_time = str(avg_inqueue_time).split(".")[0], which gives me 0:00:01. Then I try avg_inqueue_time_transformed = datetime.strptime('%H:%M:%S', avg_inqueue_time), but it gives an error: ValueError: time data '%H:%M:%S' does not match format '0:02:07'.
Any ideas how to transform that?
A fast approach is to build a new timedelta object from the original one, keeping only the days and seconds (and thereby dropping the microseconds):
import datetime

time_1 = datetime.datetime(2022, 7, 20, 23, 10, 0, 990000)
time_2 = datetime.datetime(2022, 7, 20, 23, 10, 2, 100000)

timediff = time_2 - time_1
# rebuild the delta from its days and seconds only, discarding microseconds
timediff_wo_microseconds = datetime.timedelta(days=timediff.days, seconds=timediff.seconds)
print(timediff_wo_microseconds)  # 0:00:01
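As a side note, the ValueError in the question comes from the argument order: datetime.strptime() expects the string to parse first and the format second. Parsing the trimmed string would then look like this (avg_inqueue_time reuses the variable name from the question):

from datetime import datetime

avg_inqueue_time = "0:00:01"  # the already-trimmed string
parsed = datetime.strptime(avg_inqueue_time, "%H:%M:%S")
print(parsed.time())  # 00:00:01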
I have made a class called localSun. I've taken a simplified model of the Earth-Sun system and tried to compute the altitude angle of the sun for any location on Earth at any time. When I run the code for the current time and check it against timeanddate, it matches well. So it works.
But then I wanted to go through one whole year in 1-minute intervals and store all the altitude angles for a specific location into a numpy array.
Here's my very first naive attempt, which I'm fairly certain is not good for performance. I just wanted to test it anyway.
import numpy as np
from datetime import datetime
from datetime import date
from datetime import timedelta
...
...
# 'year' (defined in the code omitted above) is presumably the number of
# seconds in a year, so year/60 is the number of minutes
altitudes = np.zeros(int(year / 60))
m = datetime(2018, 5, 29, 15, 21, 0)

for i in range(0, len(altitudes)):
    n = m + timedelta(minutes=i + 1)
    nn = localSun(30, 0, n)
    altitudes[i] = nn.altitude()  # .altitude() is a method of localSun
altitudes is the array in which I want to store all the altitudes; its size is 525969, which is basically the number of minutes in a year.
The localSun() object takes 3 parameters: colatitude (30 deg), longitude (0 deg), and a datetime object (here holding a time from a bit over an hour before this was posted).
So the question is: what would be an efficient way of going through a year in 1-minute intervals and computing the altitude angle at each time? This seems rather slow. Should I use map to update the altitude values instead of a for loop? I presume I'll have to create a new localSun object each time, and it's probably also bad to keep creating the variables n and nn on every iteration.
We can assume all of localSun's methods work fine. I'm just asking whether there is an efficient way of going through a year in 1-minute intervals and updating the array with the altitudes. The code above should reveal enough information.
I would eventually like to do this in 1-second intervals as well, so it would be great to know if there's an efficient way; with this code that takes very long.
This piece of code took about a minute on a university computer, and those are quite fast as far as I know.
I would greatly appreciate it if someone could answer. Thanks in advance!
Numpy has native datetime and timedelta support, so you could take an approach like this:
import datetime
import numpy as np

start = datetime.datetime(2018, 5, 29, 15, 21, 0)
end = datetime.datetime(2019, 5, 29, 15, 21, 0)

n = np.arange(start, end, dtype='datetime64[m]')  # [m] specifies the step as minutes
altitudes = np.vectorize(lambda x, y, z: localSun(x, y, z).altitude())(30, 0, n)
np.vectorize is not fast at all, but it gets this working until you can modify localSun to work with arrays of datetimes.
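For reference, n is a plain datetime64[m] array with one entry per minute, so it can be sliced and reshaped like any other numpy array (abbreviated output):

n[:3]
# array(['2018-05-29T15:21', '2018-05-29T15:22', '2018-05-29T15:23'],
#       dtype='datetime64[m]')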
Since you are already using numpy, you can go one step further with pandas. It has powerful date and time manipulation routines, such as pd.date_range:
import pandas as pd
start = pd.Timestamp(year=2018, month=1, day=1)
stop = pd.Timestamp(year=2018, month=12, day=31)
dates = pd.date_range(start, stop, freq='min')
altitudes = localSun(30, 0, dates)
You would then need to adapt your localSun to work with an array of pd.Timestamp rather than a single datetime.datetime.
Changing from minutes to seconds would then be as simple as changing freq='min' to freq='S'.
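As a rough illustration of what working on an array of timestamps can look like inside localSun, the time-dependent part can be computed for all timestamps at once. This is only a sketch: the names are hypothetical, the hour angle formula is heavily simplified (it ignores the equation of time), and the actual astronomy is assumed to be expressible with numpy ufuncs:

import numpy as np
import pandas as pd

dates = pd.date_range('2018-01-01', '2018-12-31', freq='min')

# seconds elapsed since local midnight, one value per timestamp
seconds_of_day = np.asarray((dates - dates.normalize()) / pd.Timedelta(seconds=1))

# simplified solar hour angle in degrees (15 deg per hour, zero at noon);
# this is an ordinary numpy array, so np.sin/np.cos apply element-wise
hour_angle = (seconds_of_day / 3600.0 - 12.0) * 15.0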
I'm trying to implement a time series prediction model in Python but am facing issues with the datetime data.
I have a dataframe df with two columns, one of datetime type and one of float type.
Then I try to build an array using the values method, but something strange happens: the dates show up as Timestamp objects with a time component.
Basically, because of that I cannot fit the model, and I receive messages such as: "Cannot add integral value to Timestamp without freq."
So what seems to be the problem and how can it be solved?
It's complicated.
First of all, when creating a numpy array, all elements will share a single type. However, datetime64 is not the same as int, so we'll have to resolve that, and we will.
Second, you tried to do this with df.values, which makes sense. However, what happens is that pandas converts the whole df to dtype=object and produces an object array. The problem is that the Timestamps remain Timestamps, and that is what is getting in your way.
So I'd convert them on my own, like this:
a = np.column_stack([df[c].values.astype(int) for c in ['transaction_date', 'amount']])
a
array([[1454284800000000000, 1],
       [1454371200000000000, 2],
       [1454457600000000000, 3],
       [1454544000000000000, 4],
       [1454630400000000000, 5]])
We can always convert the first column of a back like this
a[:, 0].astype(df.transaction_date.values.dtype)
array(['2016-02-01T00:00:00.000000000', '2016-02-02T00:00:00.000000000',
       '2016-02-03T00:00:00.000000000', '2016-02-04T00:00:00.000000000',
       '2016-02-05T00:00:00.000000000'], dtype='datetime64[ns]')
You can convert your integer into a timedelta and do the calculations as you did before:
from datetime import timedelta

interval = timedelta(days=5)

# assuming time_stamp holds one of the Timestamp values from the dataframe,
# e.g. time_stamp = df['transaction_date'].iloc[0]
time_stamp += interval  # 5 days later
I have a list of file creation times obtained using os.path.getmtime:
time_creation_sorted
Out[45]:
array([1.47133334e+09, 1.47133437e+09, 1.47133494e+09,
       1.47133520e+09, 1.47133577e+09, 1.47133615e+09,
       1.47133617e+09, 1.47133625e+09, 1.47133647e+09])
I know how to convert those elements into hours, minutes, and seconds:
datetime.fromtimestamp(time_creation_sorted[1]).strftime('%H:%M:%S')
Out[62]: '09:59:26'
What I would like to do is create another table containing the time elapsed since the first element, expressed as hour:min:sec, so that it would look like:
array(['00:00:00','00:16:36',...])
But I have not managed to find how to do that. Naively taking the difference between the elements of time_creation_sorted and trying to convert it to hour:min:sec does not give anything logical:
datetime.fromtimestamp(time_creation_sorted[1]-time_creation_sorted[0]).strftime('%H:%M:%S')
Out[67]: '01:17:02'
Any idea or link on how to do that?
Thanks,
Grégory
You need to rearrange some parts of your code to get the desired output.
First convert the timestamps to datetime objects; their differences are so-called timedelta objects, and the string representation of a timedelta is exactly what you want:
from datetime import datetime

tstamps = [1.47133334e+09, 1.47133437e+09, 1.47133494e+09, 1.47133520e+09, 1.47133577e+09,
           1.47133615e+09, 1.47133617e+09, 1.47133625e+09, 1.47133647e+09]

tstamps = [datetime.fromtimestamp(stamp) for stamp in tstamps]
tstamps_relative = [str(t - tstamps[0]) for t in tstamps]
print(tstamps_relative)
giving:
['0:00:00', '0:17:10', '0:26:40', '0:31:00', '0:40:30', '0:46:50', '0:47:10', '0:48:30', '0:52:10']
Check out timedelta objects; they give the difference between two dates or times:
https://docs.python.org/2/library/datetime.html#timedelta-objects
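A minimal illustration (values chosen arbitrarily):

from datetime import datetime

a = datetime(2016, 8, 16, 9, 42, 14)
b = datetime(2016, 8, 16, 10, 0, 0)

delta = b - a                  # a timedelta object
print(delta)                   # 0:17:46
print(delta.total_seconds())   # 1066.0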
I am trying to obtain day deltas for a wide range of pandas dates. However, for time deltas greater than 292 years I obtain negative values. For example:
import pandas as pd
dates = pd.Series(pd.date_range('1700-01-01', periods=4500, freq='m'))
days_delta = (dates-dates.min()).astype('timedelta64[D]')
However, using a DatetimeIndex I can do it and it works as I want it to,
import pandas as pd
import numpy as np
dates = pd.date_range('1700-01-01', periods=4500, freq='m')
days_fun = np.vectorize(lambda x: x.days)
days_delta = days_fun(dates.date - dates.date.min())
The question then is how to obtain the correct days_delta for Series objects?
See the pandas documentation on Timedelta limitations:
Pandas represents Timedeltas in nanosecond resolution using 64 bit integers. As such, the 64 bit integer limits determine the Timedelta limits.
Incidentally, this is the same limitation the docs mention for Timestamps in pandas:
Since pandas represents timestamps in nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years
This suggests that the same recommendations the docs make for circumventing the Timestamp limitations can be applied to timedeltas. The solution to the Timestamp limitations is found in the docs:
If you have data that is outside of the Timestamp bounds, see Timestamp limitations, then you can use a PeriodIndex and/or Series of Periods to do computations.
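A minimal sketch of that idea, working at day resolution instead of nanoseconds (this uses plain numpy datetime64[D] rather than a PeriodIndex, but it sidesteps the 64-bit nanosecond limit in the same spirit):

import pandas as pd

dates = pd.Series(pd.date_range('1700-01-01', periods=4500, freq='m'))

# truncate to whole days; differences between datetime64[D] values are
# counted in days, so the 64-bit range is no longer a practical concern
d = dates.values.astype('datetime64[D]')
days_delta = (d - d.min()).astype(int)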
Workaround
If you have continuous dates with small, calculable gaps, as in your example, you can sort the series and then use cumsum to get around this problem, like this:
import pandas as pd

dates = pd.Series(pd.date_range('1700-01-01', periods=4500, freq='m'))
dates = dates.sort_values()
dateshift = dates.shift(1)
(dates - dateshift).fillna(pd.Timedelta(0)).dt.days.cumsum().describe()
count 4500.000000
mean 68466.072444
std 39543.094524
min 0.000000
25% 34233.250000
50% 68465.500000
75% 102699.500000
max 136935.000000
dtype: float64
See the min and max are both positive.
Failaround
If the gaps are too big, this workaround will not work, as here:
dates = pd.Series(pd.to_datetime(['2016-06-06', '1700-01-01', '2200-01-01']))
dates = dates.sort_values()
dateshift = dates.shift(1)
(dates - dateshift).fillna(pd.Timedelta(0)).dt.days.cumsum()
1         0
0    -97931
2    -30883
This is because we calculate the step between each pair of consecutive dates and then add them up. When they are sorted, we are guaranteed the smallest possible steps; in this case, however, each individual step is already too big to handle.
Resetting the order
As you can see in the failaround example, the series is no longer ordered by its index. Fix this by calling .reset_index(drop=True, inplace=True) on the series (drop=True keeps the result a Series rather than turning the old index into a column).
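For example, continuing with the dates series from above (output abbreviated):

dates = dates.sort_values().reset_index(drop=True)
dates
# 0   1700-01-01
# 1   2016-06-06
# 2   2200-01-01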