I have an Excel file in this format:
I'm trying to get the 12.07.41am value in Python (I need to do all edits in Python).
from datetime import timedelta, date, datetime
df['start'] = pd.to_datetime(df['start'])
df['start'] = pd.to_datetime(df['start'], format = '%y/%m/%d #H:%M:%S')
but it gives me the error: hour must be in 0..23
Given a df of the form:
import pandas as pd
tin = [['07:04.2', '08:12.6'], ['12:14.2', "13:12.8"], ['07:24.0', '07:36.6'], ['09:14.2', "10:12.8"]]
df=pd.DataFrame(data=tin, columns=["start", 'end'])
which creates the df:
start end
0 07:04.2 08:12.6
1 12:14.2 13:12.8
2 07:24.0 07:36.6
3 09:14.2 10:12.8
You can convert the time data in the start and end columns to datetime objects, treating the digits after the dot as a fraction of a minute, using:
df['start'] = [pd.to_datetime(t[0:t.index('.')] + f":{int(float(t[t.index('.'):])*60)}", format='%H:%M:%S') for t in df['start'].to_list()]
df['end'] = [pd.to_datetime(t[0:t.index('.')] + f":{int(float(t[t.index('.'):])*60)}", format='%H:%M:%S') for t in df['end'].to_list()]
Producing an updated df:
start end
0 1900-01-01 07:04:12 1900-01-01 08:12:36
1 1900-01-01 12:14:12 1900-01-01 13:12:48
2 1900-01-01 07:24:00 1900-01-01 07:36:36
3 1900-01-01 09:14:12 1900-01-01 10:12:48
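A vectorized alternative to the two list comprehensions, sketched under the assumption that every value has exactly one digit after the dot and that this digit means tenths of a minute (6 seconds each), which is how the code above reads it. The helper name `parse_hh_mm_tenths` is hypothetical.

```python
import pandas as pd

tin = [['07:04.2', '08:12.6'], ['12:14.2', '13:12.8'], ['07:24.0', '07:36.6'], ['09:14.2', '10:12.8']]
df = pd.DataFrame(data=tin, columns=['start', 'end'])

def parse_hh_mm_tenths(s: pd.Series) -> pd.Series:
    """Parse 'HH:MM.t' strings, reading t as tenths of a minute (6 s each)."""
    hh_mm = s.str.split('.').str[0]                 # e.g. '07:04'
    tenths = s.str.split('.').str[1].astype(int)    # e.g. 2
    # Parse the hour/minute part, then add the tenths as whole seconds
    return pd.to_datetime(hh_mm, format='%H:%M') + pd.to_timedelta(tenths * 6, unit='s')

df['start'] = parse_hh_mm_tenths(df['start'])
df['end'] = parse_hh_mm_tenths(df['end'])
print(df)
```

Working in integer seconds avoids the floating-point step of `float(...) * 60`; if the data can have more than one decimal digit, the original float-based version is the safer choice.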
I am trying to get the first date of the next month based on billDate in a dataframe.
I did this:
import pandas as pd
import datetime
from datetime import timedelta
dt = pd.to_datetime('15/4/2019', errors='coerce')
print(dt)
print((dt.replace(day=1) + datetime.timedelta(days=32)).replace(day=1))
It works, and the output is:
2019-04-15 00:00:00
2019-05-01 00:00:00
Now I am applying the same logic to my dataframe in the code below:
df[comNewColName] = (pd.to_datetime(df['billDate'], errors='coerce').replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
But I get this error:
---> 69 df[comNewColName] = (pd.to_datetime(df['billDate'], errors='coerce').replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
70 '''print(df[['billDate']])'''
71 '''df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3']))'''
TypeError: replace() got an unexpected keyword argument 'day'
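You can use Series.dt.to_period for monthly periods, add 1 to get the next month, and then convert back to datetimes with Series.dt.to_timestamp. (The original code fails because Series.replace substitutes values rather than datetime fields, so it has no day argument.)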
print (df)
billDate
0 15/4/2019
1 30/4/2019
2 15/8/2019
df['billDate'] = (pd.to_datetime(df['billDate'], errors='coerce', dayfirst=True)
.dt.to_period('m')
.add(1)
.dt.to_timestamp())
print (df)
billDate
0 2019-05-01
1 2019-05-01
2 2019-09-01
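Another option is pandas' `MonthBegin` offset, which rolls each date forward to the first day of the following month. This is a sketch; the sample rows (including a 1st-of-month date and an unparseable string) and the `nextMonthStart` column name are illustrative. A date already on the 1st moves to the 1st of the next month, the same as the to_period approach.

```python
import pandas as pd

df = pd.DataFrame({'billDate': ['15/4/2019', '30/4/2019', '15/8/2019', '1/5/2019', 'not a date']})

# Day-first parsing; anything unparseable becomes NaT
bill = pd.to_datetime(df['billDate'], errors='coerce', dayfirst=True)

# MonthBegin(1) moves every date to the next first-of-month; NaT stays NaT
df['nextMonthStart'] = bill + pd.offsets.MonthBegin(1)
print(df)
```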
I'm looking to convert a UNIX timestamp to a pandas datetime. I'm importing the timestamps from a separate source, which displays a datetime of 21-01-22 00:01 for the first time point and 21-01-22 00:15 for the second. Yet my conversion is 10 hours behind both. Is this related to the +1000 at the end of each string?
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/'],
})
df['Time'] = df['Time'].str.split('+').str[0]
df['Time'] = df['Time'].str.split('(').str[1]
df['Time'] = pd.to_datetime(df['Time'], unit = 'ms')
Out:
Time
0 2022-01-20 14:01:00
1 2022-01-20 14:15:00
Other source:
Time
0 2022-01-21 00:01:00
1 2022-01-21 00:15:00
You could use a regex to extract the Unix time and the UTC offset, then parse the Unix time to a datetime and add the UTC offset as a timedelta, e.g.:
import pandas as pd
df = pd.DataFrame({
'Time' : ['/Date(1642687260000+1000)/','/Date(1642688100000+1000)/', None],
})
df[['unix', 'offset']] = df['Time'].str.extract(r'(\d+)([+-]\d+)')
# datetime from unix first, leaves NaT for invalid values
df['datetime'] = pd.to_datetime(pd.to_numeric(df['unix']), unit='ms')  # cast strings to numbers before using unit=
# where datetime is not NaT, add the offset:
df.loc[~df['datetime'].isnull(), 'datetime'] += (
pd.to_datetime(df['offset'][~df['datetime'].isnull()], format='%z').apply(lambda t: t.utcoffset())
)
# or without the apply, but by using an underscored method:
# df['datetime'] = (pd.to_datetime(df['unix'], unit='ms') +
# pd.to_datetime(df['offset'], format='%z').dt.tz._offset)
df['datetime']
# 0 2022-01-21 00:01:00
# 1 2022-01-21 00:15:00
# 2 NaT
# Name: datetime, dtype: datetime64[ns]
Unfortunately, if you want to avoid the apply, you have to use an underscored ("private") attribute. That variant also only works with a constant offset, i.e. the same offset throughout the whole series.
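If you would rather avoid both the apply and the private attribute, one option is to do the ±HHMM arithmetic yourself. This is a sketch assuming the offset is always a sign followed by four digits (HHMM); unlike the `.dt.tz` variant, it also copes with rows carrying different offsets.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['/Date(1642687260000+1000)/', '/Date(1642688100000+1000)/', None],
})

# Pull out the epoch milliseconds, the offset sign, and the offset's HH and MM digits
parts = df['Time'].str.extract(r'(?P<ms>\d+)(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})')

# Cast to numbers; rows that did not match become NaN and later NaT
ms = pd.to_numeric(parts['ms'])
sign = parts['sign'].map({'+': 1, '-': -1})
offset_min = sign * (pd.to_numeric(parts['hh']) * 60 + pd.to_numeric(parts['mm']))

# Epoch milliseconds are UTC; adding the offset gives the source's local wall-clock time
df['datetime'] = pd.to_datetime(ms, unit='ms') + pd.to_timedelta(offset_min, unit='min')
print(df['datetime'])
```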
Any ideas on how I can manipulate my current date-time data to make it suitable for use when converting the datatype to time?
For example:
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'])
The current format of the data is mm/dd HH:MM:SS.
An example of the column in the dataframe is shown below:
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
For rows where the hour is written as 24, you have two choices: either simply reset the hour to 00, or reset the hour to 00 and also add one day to the date.
In either case, the first step is detecting the condition, which can be done with a simple find: t.find(' 24:')
Having detected the condition, the first case is simply a matter of resetting the hour to 00 and then parsing the field. In the second case, adding one day is a little more complicated, because the date can roll over into the next month.
Here is the approach I would use:
Given a df of the form:
Date Time
0 01/01 00:00:00
1 01/01 00:24:00
2 01/01 24:00:00
3 01/31 24:00:00
The First Case
import pandas as pd

def parseDate(tx):
    # Hour 24 is not valid for %H, so rewrite it as hour 00 of the same day
    if tx.find(' 24:') >= 0:
        tx = tx.replace(' 24:', ' 00:', 1)
    return pd.to_datetime(tx, format= '%m/%d %H:%M:%S')

df['Date Time'] = df['Date Time'].apply(parseDate)
Produces the following:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-01 00:00:00
3 1900-01-31 00:00:00
For the second case, I used dateutil's relativedelta and slightly modified my parseDate function as shown below:
from dateutil.relativedelta import relativedelta

def parseDate2(tx):
    ti = tx.find(' 24:')
    if ti >= 0:
        # Parse as hour 00 of the same day, then add 24 hours (rolls over month ends)
        tk = pd.to_datetime(tx.replace(' 24:', ' 00:', 1), format= '%m/%d %H:%M:%S')
        return tk + relativedelta(hours=+24)
    return pd.to_datetime(tx, format= '%m/%d %H:%M:%S')

df['Date Time'] = df['Date Time'].apply(parseDate2)
Yields:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-02 00:00:00
3 1900-02-01 00:00:00
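For larger frames, the second-case logic can also be applied without a per-row function. A minimal sketch, assuming the column always follows mm/dd HH:MM:SS; it swaps dateutil's relativedelta for a plain `pd.to_timedelta` day shift.

```python
import pandas as pd

df = pd.DataFrame({'Date Time': ['01/01 00:00:00', '01/01 00:24:00',
                                 '01/01 24:00:00', '01/31 24:00:00']})

# Flag rows whose hour field is the non-standard "24"
is_24 = df['Date Time'].str.contains(' 24:', regex=False)

# Rewrite hour 24 as hour 00 so the %H directive accepts it
parsed = pd.to_datetime(
    df['Date Time'].str.replace(' 24:', ' 00:', regex=False),
    format='%m/%d %H:%M:%S',
)

# Move flagged rows forward one day (month ends roll over automatically)
df['Date Time'] = parsed + pd.to_timedelta(is_24.astype(int), unit='D')
print(df)
```

Dropping the last step gives the first case instead.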
To access the values of the datetime (namely the time), you can use:
# These are now in a usable format
seconds = df1['Date/Time'].dt.second
minutes = df1['Date/Time'].dt.minute
hours = df1['Date/Time'].dt.hour
If needed, you can extract the time component as its own independent series with:
df1['Date/Time'].dt.time
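Putting parsing and access together for the mm/dd HH:MM:SS data from the question, as a sketch: the three sample rows are copied from the question, and the explicit format string is an assumption about the full column.

```python
import pandas as pd

df1 = pd.DataFrame({'Date/Time': ['01/01 00:10:00', '01/01 00:20:00', '01/01 00:30:00']})

# The data has no year, so pass an explicit format; pandas fills in 1900 as the year
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'], format='%m/%d %H:%M:%S')

hours = df1['Date/Time'].dt.hour
minutes = df1['Date/Time'].dt.minute
seconds = df1['Date/Time'].dt.second
times = df1['Date/Time'].dt.time   # Python datetime.time objects
print(times)
```

Rows with an hour of 24 would make this parse fail; they need the hour-24 handling described in the other answer first.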
I am working with the following data, which mixes two date formats and causes confusion later in the process. The data looks like:
S. No DateTime Area
1 03/05/2019 6:33 A
2 06/03/2019 07:23:45 AM B
The first row uses the format %m/%d/%Y h:mm and the second row uses %d/%m/%Y hh:mm:ss AM/PM. The first date value is ambiguous, though: is it 5th March or 3rd May? To put everything in the same format, I want my code to detect each row's date format and convert it to the desired one.
I have tried doing this:
df['Detection Date'] = pd.to_datetime(df['Detection Date & Time'], errors = 'coerce').dt.datetime
col = df['Detection Date'].apply(str)
for i in df.index:
if datetime.datetime.strptime(col, '%m/%d/%Y h:mm'):
ColDate = datetime.datetime.strftime(col, '%d/%m/%Y hh:mm:ss AM/PM')
But I am getting an error saying:
TypeError: strptime() argument 1 must be str, not Series
How should this be done?
Thanks
If it is OK to install a dependency, you can use dateparser:
import pandas as pd
import dateparser
df = pd.DataFrame({'Detection Date & Time': ['03/05/2019 6:33', '06/03/2019 07:23:45 AM']})
df['Date & time'] = df['Detection Date & Time'].apply(dateparser.parse)
You can try each possible format in to_datetime separately: with errors='coerce', values that do not match a format become missing values (NaT), so you can combine the two results with Series.fillna:
date1 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
date2 = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%d/%m/%Y %I:%M:%S %p')  # %I (12-hour clock) so that %p is honoured
df['DateTime'] = date1.fillna(date2)
print (df)
S. No DateTime Area
0 1 2019-03-05 06:33:00 A
1 2 2019-03-06 07:23:45 B
Finally, if you want a specific output format, add Series.dt.strftime. The advantage of this solution is that both formats are verified:
df['DateTime'] = date1.fillna(date2).dt.strftime('%d/%m/%Y %I:%M:%S %p')
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
Details:
print (date1)
0 2019-03-05 06:33:00
1 NaT
Name: DateTime, dtype: datetime64[ns]
print (date2)
0 NaT
1 2019-03-06 07:23:45
Name: DateTime, dtype: datetime64[ns]
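The same fillna idea extends to any number of candidate formats. A sketch, assuming the list order expresses which format should win when a value matches more than one; rows that match none stay NaT.

```python
import pandas as pd

df = pd.DataFrame({'S. No': [1, 2],
                   'DateTime': ['03/05/2019 6:33', '06/03/2019 07:23:45 AM'],
                   'Area': ['A', 'B']})

# Candidate formats, tried in order: the first one that parses a row wins
formats = ['%m/%d/%Y %H:%M', '%d/%m/%Y %I:%M:%S %p']

parsed = pd.Series(pd.NaT, index=df.index, dtype='datetime64[ns]')
for fmt in formats:
    # errors='coerce' turns non-matching rows into NaT, which later formats can fill
    parsed = parsed.fillna(pd.to_datetime(df['DateTime'], format=fmt, errors='coerce'))

df['DateTime'] = parsed
print(df)
```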
Another possible solution, without verifying the other format: only values matching %m/%d/%Y %H:%M are reformatted to %d/%m/%Y %I:%M:%S %p, and all other values are kept unchanged:
parsed = pd.to_datetime(df['DateTime'], errors = 'coerce', format='%m/%d/%Y %H:%M')
# keep the original string wherever parsing failed
df['DateTime'] = parsed.dt.strftime('%d/%m/%Y %I:%M:%S %p').where(parsed.notna(), df['DateTime'])
print (df)
S. No DateTime Area
0 1 05/03/2019 06:33:00 AM A
1 2 06/03/2019 07:23:45 AM B
I have a dataframe column that looks like this:
It reads M:S.MS. How can I convert it into an M:S:MS time format so I can plot it as a time series graph?
If I plot it as is, Python throws an "Invalid literal for float()" error.
Note: this dataframe contains one hour's worth of data, with values between 0:0.0 and 59:59.9.
import pandas as pd
df = pd.DataFrame({'date':['00:02.0','00:05.0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05.0
2 00:08.1
It is possible convert to datetime:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or to timedeltas:
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
EDIT:
To use a custom date instead of 1900-01-01:
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100
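For plotting, a plain numeric axis is often the simplest. This sketch converts the M:S.f strings to elapsed seconds via the timedelta route shown above; the last sample value is added to show the top of the stated range.

```python
import pandas as pd

df = pd.DataFrame({'date': ['00:02.0', '00:05.0', '00:08.1', '59:59.9']})

# Prefix '00:' so 'MM:SS.f' reads as 'HH:MM:SS.f', then convert to elapsed seconds
elapsed = pd.to_timedelta(df['date'].radd('00:'))
df['seconds'] = elapsed.dt.total_seconds()
print(df)
```

The float `seconds` column can then serve directly as the x axis of a time series plot.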