Date Time Format Issues Python - python

I am currently having issues with date-time format, particularly converting string input to the correct python datetime format
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
These are current examples of timestamps I have in my time, I have tried splitting date and time such that I now have the following columns:
WC_Humidity[%] WC_Htgsetp[C] WC_Clgsetp[C] Date Time
0 55.553640 18 26 1900-01-01 00:10:00
1 54.204342 18 26 1900-01-01 00:20:00
2 51.896272 18 26 1900-01-01 00:30:00
3 49.007770 18 26 1900-01-01 00:40:00
4 45.825810 18 26 1900-01-01 00:50:00
I have managed to get the year into datetime format, but there are still 2 problems to resolve:
the data was not recorded in 1900, so I would like to change the year in the Date,
I get the following error whent rying to convert time into time datetime python format
pandas/_libs/tslibs/strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()
ValueError: time data '00:00:00' does not match format ' %m/%d %H:%M:%S' (match)
I tried having 24:00:00, however, python didn't like that either...
preferences:
I would prefer if they were both in the same cell without having to split this information into two columns.
I would also like to get rid of the seconds data as the data was recorded in 10 min intervals so there is no need for seconds in my case.
Any help would be greatly appreciated.

the data was not recorded in 1900, so I would like to change the year in the Date,
datetime.datetime.replace method of datetime.datetime instance is used for this task consider following example:
import pandas as pd
df = pd.DataFrame({"when":pd.to_datetime(["1900-01-01","1900-02-02","1900-03-03"])})
df["when"] = df["when"].apply(lambda x:x.replace(year=2000))
print(df)
output
when
0 2000-01-01
1 2000-02-02
2 2000-03-03
Note that it can be used also without pandas for example
import datetime
d = datetime.datetime.strptime("","") # use all default values which result in midnight of Jan 1 of year 1900
print(d) # 1900-01-01 00:00:00
d = d.replace(year=2000)
print(d) # 2000-01-01 00:00:00

Related

Reading in Date / Time Values Correctly

Any ideas on how I can manipulate my current date-time data to make it suitable for use when converting the datatype to time?
For example:
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'])
The current format for the data is mm/dd 00:00:00
an example of the column in the dataframe can be seen below.
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
For the condition where the hour is denoted as 24, you have two choices. First you can simply reset the hour to 00 and second you can reset the hour to 00 and also add 1 to the date.
In either case the first step is detecting the condition which can be done with a simple find statement t.find(' 24:')
Having detected the condition in the first case it is a simple matter of reseting the hour to 00 and proceeding with the process of formatting the field. In the second case, however, adding 1 to the day is a little more complicated because of the fact you can roll over to next month.
Here is the approach I would use:
Given a df of form:
Date Time
0 01/01 00:00:00
1 01/01 00:24:00
2 01/01 24:00:00
3 01/31 24:00:00
The First Case
def parseDate2(tx):
ti = tx.find(' 24:')
if ti >= 0:
tk = pd.to_datetime(tx[:5]+' 00:'+tx[10:], format= '%m/%d %H:%M:%S')
return tk + du.relativedelta.relativedelta(hours=+24)
return pd.to_datetime(tx, format= '%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(lambda x: parseDate(x))
Produces the following:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-01 00:00:00
3 1900-01-31 00:00:00
For the second case, I employed the dateutil relativedelta library and slightly modified my parseDate funstion as shown below:
import dateutil as du
def parseDate2(tx):
ti = tx.find(' 24:')
if ti >= 0:
tk = pd.to_datetime(tx[:5]+' 00:'+tx[10:], format= '%m/%d %H:%M:%S')
return tk + du.relativedelta.relativedelta(hours=+24)
return pd.to_datetime(tx, format= '%m/%d %H:%M:%S')
df['Date Time'] = df['Date Time'].apply(lambda x: parseDate2(x))
Yields:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-02 00:00:00
3 1900-02-01 00:00:00
​
To access the values of the datetime (namely the time), you can use:
# These are now in a usable format
seconds = df1['Date/Time'].dt.second
minutes = df1['Date/Time'].dt.minute
hours = df1['Date/Time'].dt.hours
And if need be, you can create its own independent time series with:
df1['Dat/Time'].dt.time

First week of year considering the first day last year

I have the following df:
time_series date sales
store_0090_item_85261507 1/2020 1,0
store_0090_item_85261501 2/2020 0,0
store_0090_item_85261500 3/2020 6,0
Being 'date' = Week/Year.
So, I tried use the following code:
df['date'] = df['date'].apply(lambda x: datetime.strptime(x + '/0', "%U/%Y/%w"))
But, return this df:
time_series date sales
store_0090_item_85261507 2020-01-05 1,0
store_0090_item_85261501 2020-01-12 0,0
store_0090_item_85261500 2020-01-19 6,0
But, the first day of the first week of 2020 is 2019-12-29, considering sunday as first day. How can I have the first day 2020-12-29 of the first week of 2020 and not 2020-01-05?
From the datetime module's documentation:
%U: Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
Edit: My originals answer doesn't work for input 1/2023 and using ISO 8601 date values doesn't work for 1/2021, so I've edited this answer by adding a custom function
Here is a way with a custom function
import pandas as pd
from datetime import datetime, timedelta
##############################################
# to demonstrate issues with certain dates
print(datetime.strptime('0/2020/0', "%U/%Y/%w")) # 2019-12-29 00:00:00
print(datetime.strptime('1/2020/0', "%U/%Y/%w")) # 2020-01-05 00:00:00
print(datetime.strptime('0/2021/0', "%U/%Y/%w")) # 2020-12-27 00:00:00
print(datetime.strptime('1/2021/0', "%U/%Y/%w")) # 2021-01-03 00:00:00
print(datetime.strptime('0/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
print(datetime.strptime('1/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
#################################################
df = pd.DataFrame({'date':["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
print(df)
def get_first_day(date):
date0 = datetime.strptime('0/' + date.split('/')[1] + '/0', "%U/%Y/%w")
date1 = datetime.strptime('1/' + date.split('/')[1] + '/0', "%U/%Y/%w")
date = datetime.strptime(date + '/0', "%U/%Y/%w")
return date if date0 == date1 else date - timedelta(weeks=1)
df['new_date'] = df['date'].apply(lambda x:get_first_day(x))
print(df)
Input
date
0 1/2020
1 2/2020
2 3/2020
3 1/2021
4 2/2021
5 1/2023
6 2/2023
Output
date new_date
0 1/2020 2019-12-29
1 2/2020 2020-01-05
2 3/2020 2020-01-12
3 1/2021 2020-12-27
4 2/2021 2021-01-03
5 1/2023 2023-01-01
6 2/2023 2023-01-08
You'll want to use ISO week parsing directives, Ex:
import pandas as pd
date = pd.Series(["1/2020", "2/2020", "3/2020"])
pd.to_datetime(date+"/1", format="%V/%G/%u")
0 2019-12-30
1 2020-01-06
2 2020-01-13
dtype: datetime64[ns]
you can also shift by one day if the week should start on Sunday:
pd.to_datetime(date+"/1", format="%V/%G/%u") - pd.Timedelta('1d')
0 2019-12-29
1 2020-01-05
2 2020-01-12
dtype: datetime64[ns]

How to change the time of a pandas datetime object to the start of the hour?

I have a pandas Dataframe in which one of the column is pandas datetime column created using pd.to_datetime()1. I want to extract the date and hour from each datetime object, in other words, I want to change the minute and seconds to 0.
I used normalize() to change the time to midnight but don't how how to change the time to start of the hour. Please suggest a way to do so.
making some test data and turning it into a dataframe
rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
df = pd.DataFrame(rng)
print(df)
print(df[0].round('H'))
gives the input
0
0 2018-01-01 11:59:00
1 2018-01-01 12:00:00
2 2018-01-01 12:01:00
and rounded to the nearest hour gives
0
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
and
print(df[0].dt.floor('H'))
gives
0
0 2018-01-01 11:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
if you always want to round down. Likewise, ceil('H') if you want to round up
I think you need to checkout pandas.Series.dt.strftime
Or try this:
import datetime
df=pd.DataFrame({'timestamp':[pd.Timestamp('today')]})
df['Date']=[pd.to_datetime(i.date())+ datetime.timedelta(hours=i.hour) for i in df['timestamp']]

How to define a 4-4-5 week period in Pandas

My company uses a 4-4-5 calendar for reporting purposes. Each month (aka period) is 4-weeks long, except every 3rd month is 5-weeks long.
Pandas seems to have good support for custom calendar periods. However, I'm having trouble figuring out the correct frequency string or custom business month offset to achieve months for a 4-4-5 calendar.
For example:
df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
df = pd.DataFrame(
index=df_index, columns=["a"], data=np.random.randint(0, 100, size=len(df_index))
)
df.groupby(pd.Grouper(level=0, freq="4W-SUN")).mean()
Grouping by 4-weeks starting on Sunday results in the following. The first three month start dates are correct but I need every third month to be 5-weeks long. The 4th month start date should be 2020-06-28.
a
date
2020-03-29 16.000000
2020-04-26 50.250000
2020-05-24 39.071429
2020-06-21 52.464286
2020-07-19 41.535714
2020-08-16 46.178571
2020-09-13 51.857143
2020-10-11 44.250000
2020-11-08 47.714286
2020-12-06 56.892857
2021-01-03 55.821429
2021-01-31 53.464286
2021-02-28 53.607143
2021-03-28 45.037037
Essentially what I'd like to achieve is something like this:
a
date
2020-03-29 20.000000
2020-04-26 50.750000
2020-05-24 49.750000
2020-06-28 49.964286
2020-07-26 52.214286
2020-08-23 47.714286
2020-09-27 46.250000
2020-10-25 53.357143
2020-11-22 52.035714
2020-12-27 39.750000
2021-01-24 43.428571
2021-02-21 49.392857
Pandas currently support only yearly and quarterly 5253 (aka 4-4-5 calendar).
See is pandas.tseries.offsets.FY5253 and pandas.tseries.offsets.FY5253Quarter
df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
df = pd.DataFrame(index=df_index)
df['a'] = np.random.randint(0, 100, df.shape[0])
So indeed you need some more work to get to week level and maintain a 4-4-5 calendar. You could align to quarters using the native pandas offset and fill-in the 4-4-5 week pattern manually.
def date_range(start, end, offset_array, name=None):
start = pd.to_datetime(start)
end = pd.to_datetime(end)
index = []
start -= offset_array[0]
while(start<end):
for x in offset_array:
start += x
if start > end:
break
index.append(start)
return pd.Series(index, name=name)
This function takes a list of offsets rather than a regular frequency period, so it allows to move from date to date following the offsets in the given array:
offset_445 = [
pd.tseries.offsets.FY5253Quarter(weekday=6),
4*pd.tseries.offsets.Week(weekday=6),
4*pd.tseries.offsets.Week(weekday=6),
]
df_index_445 = date_range("2020-03-29", "2021-03-27", offset_445, name='date')
Out:
0 2020-05-03
1 2020-05-31
2 2020-06-28
3 2020-08-02
4 2020-08-30
5 2020-09-27
6 2020-11-01
7 2020-11-29
8 2020-12-27
9 2021-01-31
10 2021-02-28
Name: date, dtype: datetime64[ns]
Once the index is created, then it's back to aggregations logic to get the data in the right row buckets. Assuming that you want the mean for the start of each 4 or 5 week period, according to the df_index_445 you have generated, it could look like this:
# calculate the mean on reindex groups
reindex = df_index_445.searchsorted(df.index, side='right') - 1
res = df.groupby(reindex).mean()
# filter valid output
res = res[res.index>=0]
res.index = df_index_445
Out:
a
2020-05-03 47.857143
2020-05-31 53.071429
2020-06-28 49.257143
2020-08-02 40.142857
2020-08-30 47.250000
2020-09-27 52.485714
2020-11-01 48.285714
2020-11-29 56.178571
2020-12-27 51.428571
2021-01-31 50.464286
2021-02-28 53.642857
Note that since the frequency is not regular, pandas will set the datetime index frequency to None.

Pandas - convert float to proper datetime or time object

I have an observational data set which contain weather information. Each column contain specific field in which date and time are in two separate column. The time column contain hourly time like 0000, 0600 .. up to 2300. What I am trying to do is to filter the data set based on certain time frame, for example between 0000 UTC to 0600 UTC. When I try to read the data file in pandas data frame, by default the time column is read in float. When I try to convert it in to datatime object, it produces a format which I am unable to convert. Code example is given below:
import pandas as pd
import datetime as dt
df = pd.read_excel("test.xlsx")
df.head()
which produces the following result:
tdate itime moonph speed ... qnh windir maxtemp mintemp
0 01-Jan-17 1000.0 NM7 5 ... $1,011.60 60.0 $32.60 $22.80
1 01-Jan-17 1000.0 NM7 2 ... $1,015.40 999.0 $32.60 $22.80
2 01-Jan-17 1030.0 NM7 4 ... $1,015.10 60.0 $32.60 $22.80
3 01-Jan-17 1100.0 NM7 3 ... $1,014.80 999.0 $32.60 $22.80
4 01-Jan-17 1130.0 NM7 5 ... $1,014.60 270.0 $32.60 $22.80
Then I extracted the time column with following line:
df["time"] = df.itime
df["time"]
0 1000.0
1 1000.0
2 1030.0
3 1100.0
4 1130.0
5 1200.0
6 1230.0
7 1300.0
8 1330.0
.
.
3261 2130.0
3262 2130.0
3263 600.0
3264 630.0
3265 730.0
3266 800.0
3267 830.0
3268 1900.0
3269 1930.0
3270 2000.0
Name: time, Length: 3279, dtype: float64
Then I tried to convert the time column to datetime object:
df["time"] = pd.to_datetime(df.itime)
which produced the following result:
df["time"]
0 1970-01-01 00:00:00.000001000
1 1970-01-01 00:00:00.000001000
2 1970-01-01 00:00:00.000001030
3 1970-01-01 00:00:00.000001100
It appears that it has successfully converted the data to datetime object. However, it added the hour time to ms which is difficult for me to do filtering.
The final data format I would like to get is either:
1970-01-01 06:00:00
or
06:00
Any help is appreciated.
When you read the excel file specify the dtype of col itime as a str:
df = pd.read_excel("test.xlsx", dtype={'itime':str})
then you will have a time column of strings looking like:
df = pd.DataFrame({'itime':['2300', '0100', '0500', '1000']})
Then specify the format and convert to time:
df['Time'] = pd.to_datetime(df['itime'], format='%H%M').dt.time
itime Time
0 2300 23:00:00
1 0100 01:00:00
2 0500 05:00:00
3 1000 10:00:00
Just addon to Chris answer, if you are unable to convert because there is no zero in the front, apply the following to the dataframe.
df['itime'] = df['itime'].apply(lambda x: x.zfill(4))
So basically is that because the original format does not have even leading digit (4 digit). Example: 945 instead of 0945.
Try
df["time"] = pd.to_datetime(df.itime).dt.strftime('%Y-%m-%d %H:%M:%S')
df["time"] = pd.to_datetime(df.itime).dt.strftime('%H:%M:%S')
For the first and second outputs you want to
Best!

Categories