Convert Dataframe column to time format in python - python

I have a dataframe column which looks like this :
It reads M:S.MS. How can I convert it into a M:S:MS timeformat so I can plot it as a time series graph?
If I plot it as it is, python throws an Invalid literal for float() error.
Note
: This dataframe contains one hour worth of data. Values between
0:0.0 - 59:59.9

df = pd.DataFrame({'date':['00:02.0','00:05:0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05:0
2 00:08.1
It is possible convert to datetime:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or to timedeltas:
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
EDIT:
For custom date use:
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100

Related

How to homogenize date type in a pandas dataframe column?

I have a Date column in my dataframe having dates with 2 different types (YYYY-DD-MM 00:00:00 and YYYY-DD-MM) :
Date
0 2023-01-10 00:00:00
1 2024-27-06
2 2022-07-04 00:00:00
3 NaN
4 2020-30-06
(you can use pd.read_clipboard(sep='\s\s+') after copying the previous dataframe to get it in your notebook)
I would like to have only a YYYY-MM-DD type. Consequently, I would like to have :
Date
0 2023-10-01
1 2024-06-27
2 2022-04-07
3 NaN
4 2020-06-30
How please could I do ?
Use Series.str.replace with to_datetime and format parameter:
df['Date'] = pd.to_datetime(df['Date'].str.replace(' 00:00:00',''), format='%Y-%d-%m')
print (df)
Date
0 2023-10-01
1 2024-06-27
2 2022-04-07
3 NaT
4 2020-06-30
Another idea with match both formats:
d1 = pd.to_datetime(df['Date'], format='%Y-%d-%m', errors='coerce')
d2 = pd.to_datetime(df['Date'], format='%Y-%d-%m 00:00:00', errors='coerce')
df['Date'] = d1.fillna(d2)

String dates into unixtime in a pandas dataframe

i got dataframe with column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
the type of the dates is string for every row. What is the easiest way to get that dates into integer unixtime ?
One idea is convert columns to datetimes and to timedeltas:
df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')
times = df['Date'].replace({'(\d+)\s+mins': '00:\\1:00',
'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
#remove rows if missing values in dates and times
df = df[df['Date'].notna() | df['times'].notna()]
df['all'] = df['dates'].dropna().astype(np.int64).append(df['times'].dropna().astype(np.int64))
print (df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000

Adding 1 day to selective time period of date column

I have a table where it has a column 'Date', 'Time', 'Costs'.
I want to select rows where the time is greater than 12:00:00, then add 1 day to 'Date' column of the selected rows.
How should I go about in doing it?
So far I have:
df[df['Time']>'12:00:00']['Date'] = df[df['Time']>'12:00:00']['Date'].astype('datetime64[ns]') + timedelta(days=1)
I am a beginner in learning coding and any suggestions would be really helpful! Thanks.
Use to_datetime first for column Date if not datetimes, then convert column Time to string if possible python times, convert to datetimes and get hours by Series.dt.hour, compare and add 1 day by condition:
df = pd.DataFrame({'Date':['2015-01-02','2016-05-08'],
'Time':['10:00:00','15:00:00']})
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-08 15:00:00
df['Date'] = pd.to_datetime(df['Date'])
mask = pd.to_datetime(df['Time'].astype(str)).dt.hour > 12
df.loc[mask, 'Date'] += pd.Timedelta(days=1)
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-09 15:00:00

Cleaning up pandas column of datetime strings

I currently have some data in the form of datestrings that I would like to standardize into a zero-padded %H:%M:%S string. In its original form, the data deviates from the standard format in the following ways:
The time is not zero padded (e.g. '2:05:00')
There can be trailing whitespaces (e.g., ' 2:05:00')
There can be times over 24H displayed (e.g., '25:00:00')
Currently, this is what I have:
df['arrival_time'] = pd.to_datetime(df['arrival_time'].map(lambda x: x.strip()), format='%H:%M:%S').dt.strftime('%H:%M:%S')
But I get an error on the times that are over 24H. Is there a good way to transform this dataframe column into the proper format?
I believe you need:
df = pd.DataFrame({'arrival_time':['2:05:00','2:05:00','25:00:00'],})
df['arrival_time'] = df['arrival_time'].str.strip().str.zfill(8)
print (df)
arrival_time
0 02:05:00
1 02:05:00
2 25:00:00
Or:
df['arrival_time'] = pd.to_datetime(df['arrival_time'].str.strip(), errors='coerce')
.dt.strftime('%H:%M:%S')
print (df)
arrival_time
0 02:05:00
1 02:05:00
2 NaT
Or:
df['arrival_time'] = (pd.to_timedelta(df['arrival_time'].str.strip())
.astype(str)
.str.extract('\s.*\s(.*)\.', expand=False))
print (df)
arrival_time
0 02:05:00
1 02:05:00
2 01:00:00

How do I set the time of a datetime column in python pandas based on a string column

I have a DataFrame containing a DateTime column with dates but without time ['date_from']. I have the time in column ['Time'] (string). How can I add only the time to the already existing DateTime column?
I tried:
df['date_from'].dt.time = pd.to_datetime(df['Time'], format='%H%M').dt.time
Convert column to to_timedelta and add to datetime column:
#convert to string and if necessary add zero for 4 values
s = df['Time'].astype(str).str.zfill(4)
df['date_from'] += pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
Sample:
df = pd.DataFrame({'date_from':pd.date_range('2015-01-01', periods=3),
'Time':[1501,112, 2012]})
print (df)
Time date_from
0 1501 2015-01-01
1 0112 2015-01-02
2 2012 2015-01-03
s = df['Time'].astype(str).str.zfill(4)
df['date_from'] += pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
print (df)
Time date_from
0 1501 2015-01-01 15:01:00
1 0112 2015-01-02 01:12:00
2 2012 2015-01-03 20:12:00

Categories