Unable to convert Matlab Timestamp to datetime in Python - python

I am trying to convert timestamp column to datetime. This is part of my data set:
Time,INDOOR Ambient Temp.,INDOOR Relative Humidity,INDOOR Air Velocity,INDOOR Mean Radiant Temp.,INDOOR Lumens,INDOOR CO2,Predicted Mean Vote (PMV)
735080.010417,24.584695,63.70399999999999,0.030988,24.584695,,-0.269505
735080.020833,24.584695,63.856,0.030988,24.584695,,-0.26837300000000003
When parsing to datetime using the following code:
# Load data
df = pd.read_csv("ContData.txt", parse_dates=['Time'])
# Group by day and compute the max temp per day
df.index = df['Time']
pd.to_datetime(df['Time']).apply(lambda x: x.date())
# Identify the day, month and year
df['day'] = df['Time'].map(lambda x: x.day)
df['month'] = df['Time'].map(lambda x: x.month)
df['year'] = df['Time'].map(lambda x: x.year)
I am getting the following error:
ValueError: hour must be in 0..23

Matlab considers the origin to be January 0, 0000 and outputs the date as the number of days since then. This creates a bit of an issue because that is not a real date and it falls well outside the datetime64[ns] bounds. After a simple subtraction relative to the POSIX origin (1970-01-01), you can use the vectorized pd.to_datetime conversion.
import pandas as pd
from datetime import datetime
# Additional 366 because January 0, year 0000
offset = datetime(1970, 1, 1).toordinal() + 366 #719529
pd.to_datetime(df['Time']-offset, unit='D')
#0 2012-07-30 00:15:00.028799999
#1 2012-07-30 00:29:59.971200000
#Name: Time, dtype: datetime64[ns]
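The sub-second noise in the output above comes from the limited floating-point precision of the day fractions. As a minimal sketch, assuming the samples are meant to fall on whole seconds (an assumption, not something stated in the question), the converted column can be rounded afterwards:

```python
import pandas as pd
from datetime import datetime

# Sample values taken from the question's Time column
df = pd.DataFrame({"Time": [735080.010417, 735080.020833]})

# Days between Matlab's origin (January 0, 0000) and the POSIX epoch
offset = datetime(1970, 1, 1).toordinal() + 366  # 719529

converted = pd.to_datetime(df["Time"] - offset, unit="D")

# Assumption: timestamps are meant to land on whole seconds
rounded = converted.dt.round("s")
print(rounded)
```

If the real data carries meaningful sub-second information, the rounding step should be skipped.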

Since you added that it's a Matlab absolute time, please try the following:
from datetime import datetime, timedelta
def datenum_to_datetime(datenum):
    """
    Convert Matlab datenum into Python datetime.
    :param datenum: Date in datenum format
    :return: Datetime object corresponding to datenum.
    """
    # The fractional part of the datenum is the time of day
    days = datenum % 1
    # Matlab's day count runs 366 days ahead of Python's ordinal
    return datetime.fromordinal(int(datenum)) \
        + timedelta(days=days) \
        - timedelta(days=366)
print(datenum_to_datetime(your_value))
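For a whole column, a scalar function like this can be mapped over the values. The following is a minimal sketch using the two sample values from the question; the names sample and converted are illustrative only:

```python
from datetime import datetime, timedelta

import pandas as pd


def datenum_to_datetime(datenum):
    """Convert a Matlab datenum (days since January 0, 0000) to datetime."""
    frac = datenum % 1  # fractional day, i.e. the time of day
    return (datetime.fromordinal(int(datenum))
            + timedelta(days=frac)
            - timedelta(days=366))


# Sample values taken from the question's Time column
sample = pd.Series([735080.010417, 735080.020833])
converted = sample.map(datenum_to_datetime)
print(converted)
```

Mapping calls the function once per row, so for large columns the vectorized subtraction is likely to be faster.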

Related

Pandas date column: problem with date conversion

I have a column date in a Covid data set. The dates appear in this format 20211030 (year - month - day).
However, when converting that column, everything appears with 1970.
This is my code:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
The result is this:
0 MI PERU 1970-08-22 21:58:27
1 SAN JUAN DE LURIGANCHO 1970-08-22 19:27:09
2 YANAHUARA 1970-08-22 19:22:01
3 CUSCO 1970-08-22 22:08:41
4 PANGOA 1970-08-22 21:58:36
Thank you in advance for your help, big hug.
I get this error:
ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.
my complete code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
dataset_covid = "datasetcovid.csv"
df = pd.read_csv(dataset_covid, sep=";", usecols=["DISTRITO", "FECHA_RESULTADO"])
df['FECHA_RESULTADO'] = df['FECHA_RESULTADO'].astype('datetime64')
I also tried this other code:
df['FECHA_RESULTADO'] = df['FECHA_RESULTADO'].astype(str).astype('datetime64')
ParserError: year 20210307 is out of range: 20210307.0
In your case, you don't need pd.to_datetime if the column contains strings:
df = pd.DataFrame({'FECHA': ['20211030']})
print(df)
# Output:
FECHA
0 20211030
Use astype (recent pandas versions require the unit to be spelled out, as in 'datetime64[ns]'):
df['FECHA'] = df['FECHA'].astype('datetime64[ns]')
print(df)
# Output:
FECHA
0 2021-10-30
But if the dtype of your column FECHA is integer, you have to cast the column to string first:
df['FECHA'] = df['FECHA'].astype(str).astype('datetime64[ns]')
print(df)
# Output:
FECHA
0 2021-10-30
As noted in the comments, the result is caused by the parameters you are passing to the to_datetime function. To fix this you should:
drop the unit parameter, which is not related to your formatting
add a format parameter which corresponds to the date format you are using
Hence, your code should go from:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
To this:
df["FECHA"] = pd.to_datetime(df["FECHA"], format='%Y%m%d')
To find the proper format string, you can look up the directives in the strftime/strptime format-code documentation. The format parameter itself is described in the docs for the to_datetime function.
In our scenario, %Y corresponds to the year with century as a decimal number, %m to the zero-padded month, and %d to the zero-padded day of the month. This matches the 20211030 (year - month - day) format given.
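To make the difference concrete, here is a minimal sketch with a hypothetical two-row integer column. It shows why unit='s' lands in 1970 and how an explicit format parses the same digits as a calendar date:

```python
import pandas as pd

# Hypothetical sample: dates stored as integers in YYYYMMDD form
df = pd.DataFrame({"FECHA": [20211030, 20210307]})

# unit='s' treats 20211030 as a count of seconds since 1970-01-01
as_seconds = pd.to_datetime(df["FECHA"], unit="s")

# Casting to string and passing an explicit format parses the digits as a date
as_dates = pd.to_datetime(df["FECHA"].astype(str), format="%Y%m%d")

print(as_seconds)
print(as_dates)
```

About 20 million seconds is roughly 234 days, which is why every row ends up in August 1970. If the column was read as float (the ParserError above mentions 20210307.0), the trailing .0 would presumably have to be removed first, for example by casting to an integer dtype before the string cast.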

Not all dates are captured when filtering by dates. Python Pandas

I am filtering a dataframe by dates to produce two separate versions:
Data from only today's date
Data from the last two years
However, when I try to filter on the date, it seems to miss dates that are within the last two years.
date_format = '%m-%d-%Y' # desired date format
today = dt.now().strftime(date_format) # today's date. Will always result in today's date
today = dt.strptime(today, date_format).date() # converting 'today' into a datetime object
today = today.strftime(date_format)
two_years = today - relativedelta(years=2) # date is today's date minus two years.
two_years = two_years.strftime(date_format)
# normalizing the format of the date column to the desired format
df_data['date'] = pd.to_datetime(df_data['date'], errors='coerce').dt.strftime(date_format)
df_today = df_data[df_data['date'] == today]
df_two_year = df_data[df_data['date'] >= two_years]
Which results in:
all dates ['07-17-2020' '07-15-2020' '08-01-2019' '03-25-2015']
today df ['07-17-2020']
two year df ['07-17-2020' '08-01-2019']
The 07-15-2020 date is missing from the two year, even though 08-01-2019 is captured.
You don't need to convert anything to string; simply work with the datetime dtype. Example:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['07-17-2020','07-15-2020','08-01-2019','03-25-2015'])})
today = pd.Timestamp('now')
print(df[df['date'].dt.date == today.date()])
# date
# 0 2020-07-17
print(df[(df['date'].dt.year >= today.year-1) & (df['date'].dt.date != today.date())])
# date
# 1 2020-07-15
# 2 2019-08-01
What you get from the comparison operations (adjust them as needed...) are boolean masks - you can use them nicely to filter the df.
Your datatype conversions are the problem here. You could do this:
today = dt.now() # today's date. Will always result in today's date
two_years = today - relativedelta(years=2) # date is today's date minus two years.
This prints '2018-07-17 18:40:42.704395'. You can then convert it to the date only format.
two_years = two_years.strftime(date_format)
two_years = dt.strptime(two_years, date_format).date()
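The missing 07-15-2020 row follows from comparing formatted strings rather than dates. A minimal sketch of the difference, using the four dates from the question and a fixed "today" of 2020-07-17 (fixed here only so the example is reproducible):

```python
import pandas as pd

dates = ["07-17-2020", "07-15-2020", "08-01-2019", "03-25-2015"]

# Fixed "today" so the example is reproducible (assumption for illustration)
today = pd.Timestamp("2020-07-17")
two_years = today - pd.DateOffset(years=2)  # 2018-07-17

# String comparison goes character by character, so the month decides first
cutoff_str = two_years.strftime("%m-%d-%Y")
string_hits = [d for d in dates if d >= cutoff_str]

# Datetime comparison orders the values chronologically
parsed = pd.to_datetime(pd.Series(dates), format="%m-%d-%Y")
datetime_hits = parsed[parsed >= two_years].dt.strftime("%m-%d-%Y").tolist()

print(string_hits)    # ['07-17-2020', '08-01-2019']
print(datetime_hits)  # ['07-17-2020', '07-15-2020', '08-01-2019']
```

As strings, '07-15-2020' sorts before '07-17-2018' because '15' is less than '17', regardless of the year. Formatting to the desired display format is best left until after the filtering is done.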

Convert date string YYYY-MM-DD to YYYYMM in pandas

Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I obtain this error : 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from the YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the date to datetime and then use strftime. Just note that the result is a string column, so you lose the datetime functionality of the date:
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
Might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
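If the datetime functionality is still needed later, one option is to keep the parsed column and store the YYYYMM label separately. This is a minimal sketch; the column name yyyymm is hypothetical, and the None entry stands in for a missing value:

```python
import pandas as pd

df = pd.DataFrame({"date": ["1997-01-31", None, "1997-12-18"]})

# Keep the parsed datetime column for further date arithmetic
df["date"] = pd.to_datetime(df["date"])

# Hypothetical extra column holding the YYYYMM label as a string;
# missing dates stay missing instead of raising an error
df["yyyymm"] = df["date"].dt.strftime("%Y%m")

print(df)
```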

Trying to convert object to DateTime, getting TypeError

I have two dataframes (see here), which contain dates and times.
The details for the first data frame are:
Date object
Time object
Channel1 float64
Channel2 float64
Channel3 float64
Channel4 float64
Channel5 float64
dtype: object
The details for the second data frame are:
Date object
Time object
Mean float64
STD float64
Min float64
Max float64
dtype: object
I am trying to convert the times to a DateTime object so that I can then do a calculation to make the time relative to the first time instance (i.e. the earliest time would become 0, and then all others would be seconds after the start).
When I try (from here):
df['Time'] = df['Time'].apply(pd.Timestamp)
I get this error:
TypeError: Cannot convert input [15:35:45] of type <class 'datetime.time'> to Timestamp
When I try (from here):
df['Time'] = pd.to_datetime(df['Time'])
I get this error:
TypeError: <class 'datetime.time'> is not convertible to datetime
Any suggestions would be appreciated.
The reason why you are getting the error
TypeError: <class 'datetime.time'> is not convertible to datetime
is literally what it says: your df['Time'] contains datetime.time objects, which cannot be converted to datetime.datetime or Timestamp objects, because both of those require a date component as well.
The solution is to combine df['Date'] and df['Time'] and then, pass it to pd.to_datetime. See below code sample:
df = pd.DataFrame({'Date': ['3/11/2000', '3/12/2000', '3/13/2000'],
'Time': ['15:35:45', '18:35:45', '05:35:45']})
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
Output
Date Time datetime
0 3/11/2000 15:35:45 2000-03-11 15:35:45
1 3/12/2000 18:35:45 2000-03-12 18:35:45
2 3/13/2000 05:35:45 2000-03-13 05:35:45
In the end my solution was different for the two dataframes which I had.
For the first dataframe, the solution which combines the Date column with the Time column worked well:
df['Date Time'] = df['Date'] + ' ' + df['Time']
After the two columns are combined, the following code is used to turn it into a datetime object (note the format='%d/%m/%Y %H:%M:%S' part is required because otherwise it confuses the month/date and uses the US formatting, i.e. it thinks 11/12/2018 is 12th of November, and not 11th of December):
df['Date Time'] = pd.to_datetime(df['Date Time'], format='%d/%m/%Y %H:%M:%S')
For my second dataframe, I went back to an earlier step in my data processing and found an option which saves the date and time to a single column directly. After that, the following code converted it to a datetime object:
df['Date Time'] = df['Date Time'].apply(pd.Timestamp)
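Putting the pieces together for the original goal (seconds relative to the earliest time), here is a minimal sketch. It assumes the Date column holds day-first strings and the Time column holds datetime.time objects, as the error messages suggest; the sample values and the elapsed_s column name are made up for illustration:

```python
import datetime as dt

import pandas as pd

# Hypothetical sample: day-first date strings plus datetime.time objects
df = pd.DataFrame({
    "Date": ["11/12/2018", "11/12/2018", "11/12/2018"],
    "Time": [dt.time(15, 35, 45), dt.time(15, 35, 50), dt.time(15, 36, 45)],
})

# datetime.time objects must be cast to str before concatenating with the date
combined = df["Date"] + " " + df["Time"].astype(str)
df["Date Time"] = pd.to_datetime(combined, format="%d/%m/%Y %H:%M:%S")

# Seconds elapsed since the earliest timestamp
df["elapsed_s"] = (df["Date Time"] - df["Date Time"].min()).dt.total_seconds()

print(df[["Date Time", "elapsed_s"]])
```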

Python datetime delta format

I am attempting to find records in my dataframe that are 30 days old or older. I pretty much have everything working but I need to correct the format of the Age column. Most everything in the program is stuff I found on stack overflow, but I can't figure out how to change the format of the delta that is returned.
import pandas as pd
import datetime as dt
file_name = '/Aging_SRs.xls'
sheet = 'All'
df = pd.read_excel(io=file_name, sheet_name=sheet)
df.rename(columns={'SR Create Date': 'Create_Date', 'SR Number': 'SR'}, inplace=True)
tday = dt.date.today()
tdelta = dt.timedelta(days=30)
aged = tday - tdelta
df = df.loc[df.Create_Date <= aged, :]
# Sets the SR as the index.
df = df.set_index('SR', drop = True)
# Created the Age column.
df.insert(2, 'Age', 0)
# Calculates the days between the Create Date and Today.
df['Age'] = df['Create_Date'].subtract(tday)
The calculation in the last line above gives me the result, but it looks like -197 days +09:39:12 and I need it to just be a positive number 197. I have also tried to search using the python, pandas, and datetime keywords.
df.rename(columns={'Create_Date': 'SR Create Date'}, inplace=True)
writer = pd.ExcelWriter('output_test.xlsx')
df.to_excel(writer)
writer.save()
I can't see your example data, but IIUC and you're just trying to get the absolute value of the number of days of a timedelta, this should work:
df['Age'] = abs(df['Create_Date'].subtract(tday).dt.days)
Explanation:
Given a dataframe with a timedelta column:
>>> df
delta
0 26523 days 01:57:59
1 -1601 days +01:57:59
You can extract just the number of days as an int using dt.days:
>>> df['delta'].dt.days
0 26523
1 -1601
Name: delta, dtype: int64
Then, all you need to do is wrap that in a call to abs to get the absolute value of that int:
>>> abs(df.delta.dt.days)
0 26523
1 1601
Name: delta, dtype: int64
Here is what I worked out for basically the same issue.
# create timestamp for today, normalize to 00:00:00
today = pd.to_datetime('today').normalize()
# match timezone with datetimes in df so subtraction works
today = today.tz_localize(df['posted'].dt.tz)
# create 'age' column for days old
df['age'] = (today - df['posted']).dt.days
This is pretty much the same as the answer above, but without the call to abs(), because subtracting the older date from today already gives a positive result.
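As a minimal sketch of the whole task (records 30 days old or older, with Age as a plain positive integer), the following uses made-up sample rows and a fixed "today" of 2020-07-17 so the result is reproducible:

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
df = pd.DataFrame({
    "SR": ["SR-1", "SR-2", "SR-3"],
    "Create_Date": pd.to_datetime([
        "2020-01-01 14:20:48",
        "2020-06-20 09:00:00",
        "2020-07-10 08:00:00",
    ]),
}).set_index("SR")

# Fixed "today" for reproducibility; real code could use
# pd.Timestamp("today").normalize() instead
today = pd.Timestamp("2020-07-17")

# Subtract the older date from today so the difference is positive, and
# drop the time of day first so only whole calendar days are counted
df["Age"] = (today - df["Create_Date"].dt.normalize()).dt.days

# Keep only records that are at least 30 days old
aged = df[df["Age"] >= 30]
print(aged)
```

Normalizing both sides to midnight avoids displays such as -197 days +09:39:12, where a leftover time-of-day offset is attached to a negative day count.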
