Netcdf dataset conversion from seconds from starting time to utc hours

I am working with a netCDF file and I need to convert the time from seconds since the starting time (2016-01-01 00:00:00.0) to time in UTC. I'm fairly new to all of this, so I am really struggling!
I have tried using the num2date from netCDF4.
from netCDF4 import date2num , num2date, Dataset
time=f.variables['time'][:]
dates=netCDF4.num2date(time[:],time.units)
print(dates.strftime('%Y%m%d%H') for date in dates)
AttributeError: 'MaskedArray' object has no attribute 'units'

Since you extract time from the variables with time=f.variables['time'][:], it loses its associated units attribute (time is then just a masked array, as the error says).
What you have to pass to num2date() is variables['time'].units, e.g.
from netCDF4 import date2num, num2date, Dataset
file = ... # your nc file
with Dataset(file) as root:
    time = root.variables['time'][:]
    dates = num2date(time, root.variables['time'].units)
    ## directly get UTC hours here:
    # unit_utchours = root.variables['time'].units.replace('seconds', 'hours')
    ## would e.g. be 'hours since 2019-08-15 00:00:00'
    # utc_hours = date2num(dates, unit_utchours)
# check:
print(dates[0].strftime('%Y%m%d%H'))
# e.g. prints 2019081516
...to get the dates as a number, you could e.g. do
num_dates = [int(d.strftime('%Y%m%d%H')) for d in dates]
# replace int with float if you need floating point numbers etc.
...to get the dates in UTC hours, see the commented section in the first code block. Since the dates array contains objects of type datetime.datetime, you could also do
utc_hours = [d.hour+(d.minute/60)+(d.second/3600) for d in dates]
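For the simple case in the question, where the reference time is known to be 2016-01-01 00:00:00 and is assumed to be UTC on the standard calendar, the same conversion can be sketched with the standard library alone. The names seconds_to_utc and utc_hour and the sample offsets are made up for illustration; num2date remains the safer choice because it reads the units from the file.

```python
from datetime import datetime, timedelta, timezone

# Reference time taken from the question; assumed here to be UTC.
START = datetime(2016, 1, 1, tzinfo=timezone.utc)

def seconds_to_utc(seconds):
    """Convert seconds since START to a timezone-aware UTC datetime."""
    return START + timedelta(seconds=float(seconds))

def utc_hour(dt):
    """Hour of day as a float, e.g. 1.5 for 01:30:00."""
    return dt.hour + dt.minute / 60 + dt.second / 3600

# Hypothetical sample offsets: 0 s, 1.5 h, and 1 day + 16 h.
samples = [0, 5400, 86400 + 16 * 3600]
dates = [seconds_to_utc(s) for s in samples]
labels = [d.strftime('%Y%m%d%H') for d in dates]
hours = [utc_hour(d) for d in dates]
print(labels)  # ['2016010100', '2016010101', '2016010216']
print(hours)   # [0.0, 1.5, 16.0]
```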

Related

Convert time zone date column to timestamp format

I have column containing dates in format as seen here....
2021-09-02 06:00:10.474000+00:00
However, I need to convert this column into a 13 numbered timestamp.
I have tried...
df['date_timestamp'] = df[['date']].apply(lambda x: x[0].timestamp(), axis=1).astype(int)
...but this is not producing a 13 numbered timestamp, just 10 numbers instead.
How can get it to spit a 13 numbered timestamp?

Parse the strings to datetime, take the int64 representation, and divide it by 1e6 to get Unix time in milliseconds since the epoch (1970-01-01 UTC). Example:
import numpy as np
import pandas as pd
# string to datetime
s = pd.to_datetime(["2021-09-02 06:00:10.474000+00:00"])
# datetime to Unix time in milliseconds
unix = s.view(np.int64)/1e6
print(unix[0])
# 1630562410473.9998
The standard int64 representation is nanoseconds; so divide by 1e3 if you need microseconds.
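If an exact 13-digit integer is needed, float division can leave rounding noise in the last digits. One way around that, sketched here with the standard library only, is integer floor division of a timedelta. The helper name to_unix_ms is hypothetical, and the sketch assumes the values are strings carrying a UTC offset, like the one in the question; it could be applied to the column element by element.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_unix_ms(text):
    """Parse an ISO-like string with a UTC offset; return whole milliseconds since the epoch."""
    dt = datetime.fromisoformat(text)
    # timedelta // timedelta is exact integer arithmetic, so no float rounding occurs.
    return (dt - EPOCH) // timedelta(milliseconds=1)

ms = to_unix_ms("2021-09-02 06:00:10.474000+00:00")
print(ms)  # 1630562410474
```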

Converting Epoch time format to standard time format

I am having an issue with converting the Epoch time format 1585542406929 into the 2020-09-14 Hours Minutes Seconds format.
I tried running this, but it gives me an error
from datetime import datetime
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
datetime.utcfromtimestamp(df2.timestamp_ms).strftime('%Y-%m-%d %H:%M:%S')
error : cannot convert the series to <class 'int'>
What am I not understanding about this datetime function? Is there a better function that I should be using?
edit: should mention that timestamp_ms is my column from my dataframe called df.

Thanks to @chepner for helping me understand the format that this is in.
A quick solution is the following:
# make a new column with Unix time as @ForceBru mentioned
start_date = '1970-01-01'
df3['helper'] = pd.to_datetime(start_date)
# convert your column of JSON dates / numbers to days
df3['timestamp_ms'] = df3['timestamp_ms'].apply(lambda x: (((x/1000)/60)/60/24))
# add a day adder column
df3['time_added'] = pd.to_timedelta(df3['timestamp_ms'],'d')
# add the two columns together
df3['actual_time'] = df3['helper'] + df3['time_added']
Note that the resulting timestamp is in UTC, so you may have to shift it to your local time zone. For instance, I sent my message at 10:40 am Central Time (Midwest USA), but the timestamp put it at 3:40 pm the same day.
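The error in the question arises because datetime.utcfromtimestamp takes a single number in seconds, not a whole Series in milliseconds. A minimal standard-library sketch for one value follows. The name ms_to_string is hypothetical, and the fixed UTC-5 offset standing in for US Central daylight time is an assumption for illustration (it ignores daylight-saving changes). For a whole column, pd.to_datetime(df['timestamp_ms'], unit='ms') does the conversion in one call.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'

def ms_to_string(timestamp_ms, tz=timezone.utc):
    """Format epoch milliseconds as a string in the given time zone (UTC by default)."""
    moment = EPOCH + timedelta(milliseconds=int(timestamp_ms))
    return moment.astimezone(tz).strftime(DATETIME_FORMAT)

# Assumed fixed offset for US Central daylight time (UTC-5).
central = timezone(timedelta(hours=-5))

utc_text = ms_to_string(1585542406929)
central_text = ms_to_string(1585542406929, central)
print(utc_text)      # 2020-03-30 04:26:46
print(central_text)  # 2020-03-29 23:26:46
```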

Python convert from ordinal time with milliseconds [duplicate]

I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Python's time format and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.

The tutorial you link to has the solution, but with a small issue. The corrected version is this:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
a longer explanation can be found here
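As a quick check, the formula can be wrapped in a function and run on the example from the MATLAB documentation quoted in the question (731885.75 for 31-Oct-2003, 6:00 PM). The function name is arbitrary.

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(matlab_datenum):
    """Convert a MATLAB serial date number to a naive Python datetime."""
    whole_days = datetime.fromordinal(int(matlab_datenum))
    day_fraction = timedelta(days=matlab_datenum % 1)
    # MATLAB counts from year 0, Python from year 1: shift by 366 days.
    return whole_days + day_fraction - timedelta(days=366)

result = matlab_datenum_to_datetime(731885.75)
print(result)  # 2003-10-31 18:00:00
```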

Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format

Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
# Python 2 call; in Python 3 use urllib.request.urlretrieve instead
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat')
# If you don't use squeeze_me = True, then Pandas doesn't like
# the arrays in the dictionary, because they look like an arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# build the DataFrame and index it by time
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
# print df
# <class 'pandas.core.frame.DataFrame'>
# DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
# Data columns (total 2 columns):
# x 201 non-null values
# y 201 non-null values
# dtypes: float64(2)
# plot with Pandas
df.plot()

Here's a way to convert these using numpy.datetime64, rather than datetime.
import numpy as np
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works for serdate either a single integer or an integer array.
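For datenums with a fractional part (times of day), a day-resolution timedelta cannot hold the fraction, so one option is to work in milliseconds relative to the Unix epoch. The sketch below assumes the datenum of 1970-01-01 is 719529, as stated in the pandas answer above; the function and constant names are made up for illustration.

```python
import numpy as np

UNIX_EPOCH_DATENUM = 719529  # MATLAB datenum of 1970-01-01
MS_PER_DAY = 86_400_000

def datenum_to_datetime64(serdate):
    """Convert MATLAB datenum value(s), possibly fractional, to datetime64[ms]."""
    days = np.asarray(serdate, dtype=np.float64) - UNIX_EPOCH_DATENUM
    # Round to whole milliseconds before casting, so the day fraction is kept.
    ms = np.round(days * MS_PER_DAY).astype(np.int64)
    return np.datetime64('1970-01-01', 'ms') + ms.astype('timedelta64[ms]')

# 731885.75 is the MATLAB documentation example (31-Oct-2003, 6:00 PM).
stamps = datenum_to_datetime64([731885.75, 719529])
print(stamps)
```

Millisecond resolution is a design choice here; a finer unit such as 'us' would work the same way if the data needs it.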

Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and the constructor fromordinal in the class datetime and related subclasses. For example, the Python Library Reference for 2.7 says of fromordinal:
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
MATLAB, however, starts counting at year 0, which precedes year 1 by one (leap) year, so 366 days need to be taken into account. (Year 0 counts as a leap year in the proleptic Gregorian calendar, just like 2016, which is exactly 504 four-year cycles later.)
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d,t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour,minutes=t.minute,
                            seconds=t.second)
    tt = tt.total_seconds() / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as an instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum%1)
    return ii + ff
Hope this helps and happy to be corrected.

ValueError: time data does not match format (convert part of string to time)

I have data that is formatted like DDHHMM (day, hour, minutes), for example 120630, meaning the 12th at 06:30. I want to extract only the hour and minutes and convert them to a time object. Is this possible? I get the following error.
time = datetime.strptime(column[3], '%H:%M') #data is from CSV
ValueError: time data '120630' does not match format '%H:%M'

You first need to parse the datetime string in the format it currently is using strptime, and then convert the datetime object to the format you want using strftime:
from datetime import datetime
datetime.strptime('120630', '%d%H%M').strftime('%H:%M')
# '06:30'
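Since the question asks for a time object rather than a string, one option is to call .time() on the parsed datetime instead of strftime. A small sketch using the sample value from the question:

```python
from datetime import datetime

raw = '120630'  # DDHHMM: day 12, 06:30
parsed = datetime.strptime(raw, '%d%H%M')
# strptime fills the missing year and month with defaults; .time() drops the date part.
t = parsed.time()
print(t)  # 06:30:00
```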

Python: convert 'days since 1990' to datetime object

I have a time series that I have pulled from a netCDF file and I'm trying to convert them to a datetime format. The format of the time series is in 'days since 1990-01-01 00:00:00 +10' (+10 being GMT: +10)
time = nc_data.variables['time'][:]
time_idx = 0 # first timestamp
print time[time_idx]
9465.0
My desired output is a datetime object like so (also GMT +10):
"2015-12-01 00:00:00"
I have tried converting this using the time module without much success, although I believe I may be using it wrong (I'm still a novice in Python and programming).
import time
time_datetime = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(time[time_idx]*24*60*60))
Any advice appreciated,
Cheers!

The datetime module's timedelta is probably what you're looking for.
For example:
from datetime import date, timedelta
days = 9465 # This may work for floats in general, but using integers
# is more precise (e.g. days = int(9465.0))
start = date(1990,1,1) # This is the "days since" part
delta = timedelta(days) # Create a time delta object from the number of days
offset = start + delta # Add the specified number of days to 1990
print(offset) # >>> 2015-12-01
print(type(offset)) # >>> <class 'datetime.date'>
You can then use and/or manipulate the offset object, or convert it to a string representation however you see fit.
You can apply the same format string to this date object as you used for time_datetime:
print(offset.strftime('%Y-%m-%d %H:%M:%S'))
Output:
2015-12-01 00:00:00
Instead of using a date object, you could use a datetime object instead if, for example, you were later going to add hours/minutes/seconds/timezone offsets to it.
The code would stay the same as above with the exception of two lines:
# Here, you're importing datetime instead of date
from datetime import datetime, timedelta
# Here, you're creating a datetime object instead of a date object
start = datetime(1990,1,1) # This is the "days since" part
Note: Although you don't state it, the other answer suggests you might be looking for timezone-aware datetimes. If that's the case, dateutil is the way to go in Python 2, as the other answer suggests. In Python 3, you can use the datetime module's timezone class, which is a concrete tzinfo implementation.
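For Python 3, a timezone-aware variant might look like the sketch below. It assumes the '+10' in the units string means a fixed UTC+10 offset with no daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+10 offset, as assumed from the units string in the question.
tz_plus10 = timezone(timedelta(hours=10))
start = datetime(1990, 1, 1, tzinfo=tz_plus10)

days = 9465.0  # value read from the netCDF time variable
stamp = start + timedelta(days=days)

print(stamp.strftime('%Y-%m-%d %H:%M:%S'))  # 2015-12-01 00:00:00
print(stamp.isoformat())                    # 2015-12-01T00:00:00+10:00
```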

netCDF4's num2date is the correct function to use here:
import netCDF4
ncfile = netCDF4.Dataset('./foo.nc', 'r')
time = ncfile.variables['time'] # do not cast to numpy array yet
time_convert = netCDF4.num2date(time[:], time.units, time.calendar)
This will convert the number of days since 1990-01-01 (i.e. the units of time) to Python datetime objects. If time does not have a calendar attribute, you'll need to specify the calendar, or use the default of standard.

We can do this in a couple steps. First, we are going to use the dateutil library to handle our work. It will make some of this easier.
The first step is to get a datetime object from your string (1990-01-01 00:00:00 +10). We'll do that with the following code:
from datetime import datetime
from dateutil.relativedelta import relativedelta
import dateutil.parser
days_since = '1990-01-01 00:00:00 +10'
days_since_dt = dateutil.parser.parse(days_since)
Now, our days_since_dt will look like this:
datetime.datetime(1990, 1, 1, 0, 0, tzinfo=tzoffset(None, 36000))
We'll use that in our next step, determining the new date. We'll use relativedelta from dateutil to handle this math.
new_date = days_since_dt + relativedelta(days=9465.0)
This will result in your value in new_date having a value of:
datetime.datetime(2015, 12, 1, 0, 0, tzinfo=tzoffset(None, 36000))
This method ensures that the answer you receive continues to be in GMT+10.
