split datetime column into date and time columns in pandas - python

I have the following question. I have a Date_time column in my dataframe (along with many other columns).
df["Date_time"].head()
0 2021-05-15 09:54
1 2021-05-27 17:04
2 2021-05-27 00:00
3 2021-05-27 09:36
4 2021-05-26 18:39
Name: Date_time, dtype: object
I would like to split this column into two (date and time).
I use this function, which works fine:
df["Date"] = ""
df["Time"] = ""
def split_date_time(data_frame):
    # loop over the row labels (assumes the default 0..n-1 index)
    for i in range(0, len(data_frame)):
        data_frame.loc[i, "Date"] = data_frame.loc[i, "Date_time"].split()[0]
        data_frame.loc[i, "Time"] = data_frame.loc[i, "Date_time"].split()[1]
split_date_time(df)
But is there a more elegant way? Thanks

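Once the column is converted to datetimes with pd.to_datetime (it is currently dtype object, i.e. strings), the dt accessor can give you the date and time separately:
df["Date_time"] = pd.to_datetime(df["Date_time"])
df["Date"] = df["Date_time"].dt.date
df["Time"] = df["Date_time"].dt.time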
to get
>>> df
Date_time Date Time
0 2021-05-15 09:54:00 2021-05-15 09:54:00
1 2021-05-27 17:04:00 2021-05-27 17:04:00
2 2021-05-27 00:00:00 2021-05-27 00:00:00
3 2021-05-27 09:36:00 2021-05-27 09:36:00
4 2021-05-26 18:39:00 2021-05-26 18:39:00
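A self-contained sketch of the answer, using three of the question's values as a stand-in for the real dataframe. It also shows a string-based alternative (the names raw and parts are only for illustration): if you'd rather keep Date and Time as strings, Series.str.split with expand=True splits every row in one vectorized call, replacing the loop in the question.

```python
import pandas as pd

# A few of the sample values from the question, stored as strings (dtype object)
df = pd.DataFrame({"Date_time": ["2021-05-15 09:54", "2021-05-27 17:04", "2021-05-26 18:39"]})

# Option 1: parse to datetime64, then use the .dt accessor
df["Date_time"] = pd.to_datetime(df["Date_time"], format="%Y-%m-%d %H:%M")
df["Date"] = df["Date_time"].dt.date   # datetime.date objects
df["Time"] = df["Date_time"].dt.time   # datetime.time objects

# Option 2: keep strings and split on the space in one vectorized call
raw = pd.Series(["2021-05-15 09:54", "2021-05-27 17:04", "2021-05-26 18:39"])
parts = raw.str.split(" ", expand=True)
parts.columns = ["Date", "Time"]

print(df)
print(parts)
```

Option 1 gives real date/time objects you can compare and sort; option 2 is enough if the columns are only for display or export.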

Related

pd.to_datetime wrong format

I have a dataframe df in which I want to turn the column 'Time' into a DatetimeIndex. Before conversion, the column looks like this:
01-10-19 09:05
01-10-19 10:04
01-10-19 11:05
01-10-19 12:04
01-10-19 13:04
...
31-05-20 22:05
31-05-20 23:05
01-06-20 00:05
01-06-20 01:05
01-06-20 02:05
So I tried the following line of code:
df['Time'] = pd.to_datetime(df['Time'], format='%d-%m-%Y %H:%M', errors='coerce')
This led to only NaT values in the column, and no DatetimeIndex was set. I've also tried changing the format in multiple ways, such as '%%dd-%%mm-%%YY %%HH:%%MM' or '%d%d-%m%m-%Y%Y %H%H:%M%M', but they resulted in the same error.
When I removed errors='coerce', I got the message: ValueError: time data '09:05' does not match format '%d-%m-%Y %H:%M' (match). What am I missing? Why is it the wrong format and how do I fix it? Thanks very much in advance!
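The format fails because your year has only two digits: %Y expects a four-digit year, while %y matches a two-digit one, so format='%d-%m-%y %H:%M' fits your data. Alternatively, try letting pandas infer the format:
df['Time'] = pd.to_datetime(df['Time'], infer_datetime_format=True)
print(df)
Note that infer_datetime_format is deprecated since pandas 2.0, and that the inference read ambiguous values month-first: in the output below, 01-10-19 became 2019-01-10 rather than 1 October 2019, and 01-06-20 became 2020-01-06.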
#output:
Time
0 2019-01-10 09:05:00
1 2019-01-10 10:04:00
2 2019-01-10 11:05:00
3 2019-01-10 12:04:00
4 2019-01-10 13:04:00
5 2020-05-31 22:05:00
6 2020-05-31 23:05:00
7 2020-01-06 00:05:00
8 2020-01-06 01:05:00
9 2020-01-06 02:05:00
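A sketch of the explicit-format fix, using three of the question's values as sample data. The set_index step, which turns the parsed column into the DatetimeIndex the question asks for, is an addition here; with an explicit day-first format every row is parsed the same way, so 01-10-19 comes out as 1 October 2019.

```python
import pandas as pd

# Sample values from the question: day-month-two-digit-year, then hour:minute
df = pd.DataFrame({"Time": ["01-10-19 09:05", "31-05-20 22:05", "01-06-20 00:05"]})

# %y (lowercase) matches a two-digit year; %Y would expect four digits
df["Time"] = pd.to_datetime(df["Time"], format="%d-%m-%y %H:%M")

# Use the parsed column as a DatetimeIndex
df = df.set_index("Time")
print(df.index)
```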

Formatting a calendar table to a datetime dataframe

I have calendar data in the following format:
df = pd.read_csv('2021.txt', sep=" ")
df.head()
I'd like to have it as:
Date y
2021-01-01 17:26
2021-01-02 17:27
2021-01-03 17:28
2021-01-04 17:28
...
2021-12-31 17:25
I've searched and found no similar questions. I'm trying to provide a minimal example but don't know where to start. I know I have to use pandas.to_datetime function but I don't even know how to apply it in this case because everything is separated.
Use DataFrame.melt with to_datetime and errors='coerce' to convert invalid dates such as 2021-02-30 to missing values, and then remove those rows with DataFrame.dropna. This assumes the columns are Day plus one column per month named with abbreviations such as Jan and Feb, which is what the %b directive matches:
df1 = df.melt('Day', var_name='Date', value_name='y')
df1['Date'] = pd.to_datetime('2021' + df1['Date'] + df1.pop('Day').astype(str),
                             format='%Y%b%d', errors='coerce')
df1 = df1.dropna(subset=['Date'])
print(df1)
Date y
0 2021-01-01 17:28
1 2021-01-02 17:27
2 2021-01-03 17:28
3 2021-01-04 17:28
4 2021-01-05 17:29
.. ... ...
67 2021-12-02 17:15
68 2021-12-03 17:15
69 2021-12-04 17:15
70 2021-12-05 17:15
71 2021-12-06 17:15
[72 rows x 2 columns]
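Since the original calendar file isn't shown, here is a self-contained sketch on a hypothetical miniature calendar (three days, two months; the values are made up). It follows the same melt, to_datetime and dropna steps and shows 30 February being coerced to NaT and dropped; the final sort_values call is an addition so the rows come out in date order rather than month by month.

```python
import pandas as pd

# Hypothetical miniature calendar: one row per day of month, one column per month
df = pd.DataFrame({
    "Day": [1, 2, 30],
    "Jan": ["17:26", "17:27", "17:55"],
    "Feb": ["17:58", "17:59", None],  # there is no 30 February
})

# Wide -> long: one row per (Day, month) pair
df1 = df.melt("Day", var_name="Date", value_name="y")

# Build strings like "2021Jan1"; invalid dates such as 2021Feb30 become NaT
df1["Date"] = pd.to_datetime("2021" + df1["Date"] + df1.pop("Day").astype(str),
                             format="%Y%b%d", errors="coerce")

# Drop the impossible dates and restore chronological order
df1 = df1.dropna(subset=["Date"]).sort_values("Date", ignore_index=True)
print(df1)
```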

Adding Future Dates to DataFrame

How do I add future dates to a data frame? Adding a timedelta, as below, only puts the shifted dates in a new column next to the existing ones; it doesn't add new rows.
import pandas as pd
from datetime import timedelta
df = pd.DataFrame({
    'date': ['2001-02-01','2001-02-02','2001-02-03', '2001-02-04'],
    'Monthly Value': [100, 200, 300, 400]
})
df["date"] = pd.to_datetime(df["date"])
df["future_date"] = df["date"] + timedelta(days=4)
print(df[["date", "future_date"]])
date future_date
0 2001-02-01 00:00:00 2001-02-05 00:00:00
1 2001-02-02 00:00:00 2001-02-06 00:00:00
2 2001-02-03 00:00:00 2001-02-07 00:00:00
3 2001-02-04 00:00:00 2001-02-08 00:00:00
Desired dataframe:
date future_date
0 2001-02-01 00:00:00 2001-02-01 00:00:00
1 2001-02-02 00:00:00 2001-02-02 00:00:00
2 2001-02-03 00:00:00 2001-02-03 00:00:00
3 2001-02-04 00:00:00 2001-02-04 00:00:00
4 2001-02-05 00:00:00
5 2001-02-06 00:00:00
6 2001-02-07 00:00:00
7 2001-02-08 00:00:00
You can do the following:
# set to timestamp
df['date'] = pd.to_datetime(df['date'])
# create a future date df
ftr = (df['date'] + pd.Timedelta(4, unit='days')).to_frame()
ftr['Monthly Value'] = None
# join the future data
df1 = pd.concat([df, ftr], ignore_index=True)
date Monthly Value
0 2001-02-01 100
1 2001-02-02 200
2 2001-02-03 300
3 2001-02-04 400
4 2001-02-05 None
5 2001-02-06 None
6 2001-02-07 None
7 2001-02-08 None
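I found that this also works (DataFrame.append and the closed= argument of pd.date_range were removed in pandas 2.0, so this version uses pd.concat and starts the range the day after the last date):
df = pd.concat([df, pd.DataFrame({'date': pd.date_range(start=df.date.iloc[-1] + pd.Timedelta(days=1),
                                                          periods=4, freq='D')})],
               ignore_index=True)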
If I understand you correctly, we can create a new dataframe of dates running from the minimum of your dates to the maximum plus 4 days, and then concat it back using axis=1:
df['date'] = pd.to_datetime(df['date'])
fdates = pd.DataFrame(
    pd.date_range(df["date"].min(), df["date"].max() + pd.DateOffset(days=4)),
    columns=['future_date'])
df_new = pd.concat([df, fdates], axis=1)
print(df_new[['date','future_date','Monthly Value']])
date future_date Monthly Value
0 2001-02-01 2001-02-01 100.0
1 2001-02-02 2001-02-02 200.0
2 2001-02-03 2001-02-03 300.0
3 2001-02-04 2001-02-04 400.0
4 NaT 2001-02-05 NaN
5 NaT 2001-02-06 NaN
6 NaT 2001-02-07 NaN
7 NaT 2001-02-08 NaN
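A self-contained sketch of the row-wise approach (appending new rows rather than a new column), built with pd.date_range instead of shifting the existing dates. The n_future variable is a name introduced here; the new rows get NaN in "Monthly Value".

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2001-02-01", "2001-02-02", "2001-02-03", "2001-02-04"],
    "Monthly Value": [100, 200, 300, 400],
})
df["date"] = pd.to_datetime(df["date"])

n_future = 4  # how many daily rows to append
# Daily dates starting the day after the last existing date
future = pd.date_range(df["date"].max() + pd.Timedelta(days=1), periods=n_future, freq="D")

# Append them as new rows; "Monthly Value" is NaN for the future dates
df1 = pd.concat([df, pd.DataFrame({"date": future})], ignore_index=True)
print(df1)
```

Starting from max() + 1 day avoids duplicating the last existing date, which is what the closed='right' argument was used for in older code.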

Subtracting rows in different files

I am reading several CSV files from a folder. Each file has a "Time" column.
I would like to add a column called time duration that, for each file, subtracts the time in the file's first row from the time in each row.
What should I add to my code?
frames = []
for name in list_files_log:
    with folder.get_download_stream(name) as f:
        try:
            tmp = pd.read_csv(f)
            tmp["sn"] = get_sn(name)
            tmp["filename"] = os.path.basename(name)
            frames.append(tmp)
        except Exception:
            pass
output = pd.concat(frames, ignore_index=True)
If your Time column looks like this (already parsed as datetimes, for example with pd.to_datetime):
Time
0 2015-02-04 02:10:00
1 2016-03-05 03:30:00
2 2017-04-06 04:40:00
3 2018-05-07 05:50:00
You could create a Duration column with:
df['Duration'] = df['Time'] - df['Time'].iloc[0]
And you'd get:
Time Duration
0 2015-02-04 02:10:00 0 days 00:00:00
1 2016-03-05 03:30:00 395 days 01:20:00
2 2017-04-06 04:40:00 792 days 02:30:00
3 2018-05-07 05:50:00 1188 days 03:40:00
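The answer above subtracts the first row of the whole dataframe. To do it separately for each file, as the question asks, one option is groupby with transform, which broadcasts each file's first Time back onto that file's rows. This sketch assumes the combined frame has the "filename" and "Time" columns built in the question's loop; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical combined frame, as built by the loop in the question
output = pd.DataFrame({
    "filename": ["a.csv", "a.csv", "a.csv", "b.csv", "b.csv"],
    "Time": ["2021-01-01 10:00", "2021-01-01 10:05", "2021-01-01 11:00",
             "2021-03-02 08:00", "2021-03-02 08:30"],
})
output["Time"] = pd.to_datetime(output["Time"])

# First Time of each file, broadcast back onto every row of that file
first_time = output.groupby("filename")["Time"].transform("first")
output["Duration"] = output["Time"] - first_time
print(output)
```

"first" takes the first row in file order; use "min" instead if you want the earliest timestamp regardless of row order.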

How to rearrange a date in python

I have a column in a pandas data frame looking like:
test1.Received
Out[9]:
0 01/01/2015 17:25
1 02/01/2015 11:43
2 04/01/2015 18:21
3 07/01/2015 16:17
4 12/01/2015 20:12
5 14/01/2015 11:09
6 15/01/2015 16:05
7 16/01/2015 21:02
8 26/01/2015 03:00
9 27/01/2015 08:32
10 30/01/2015 11:52
This represents a time stamp as Day Month Year Hour Minute. I would like to rearrange the date as Year Month Day Hour Minute. So that it would look like:
test1.Received
Out[9]:
0 2015/01/01 17:25
1 2015/01/02 11:43
...
Just use pd.to_datetime:
In [33]:
import pandas as pd
pd.to_datetime(df['date'])
Out[33]:
index
0 2015-01-01 17:25:00
1 2015-02-01 11:43:00
2 2015-04-01 18:21:00
3 2015-07-01 16:17:00
4 2015-12-01 20:12:00
5 2015-01-14 11:09:00
6 2015-01-15 16:05:00
7 2015-01-16 21:02:00
8 2015-01-26 03:00:00
9 2015-01-27 08:32:00
10 2015-01-30 11:52:00
Name: date, dtype: datetime64[ns]
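In your case:
pd.to_datetime(test1['Received'])
parses the column, but as the output above shows, without day-first parsing some values are read month-first (02/01/2015 became 2015-02-01); see the EDIT below.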
If you want to change the display format, then you need to parse as a datetime and then apply datetime.strftime:
In [35]:
import datetime as dt
pd.to_datetime(df['date']).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[35]:
index
0 01/01/15 17:25:00
1 02/01/15 11:43:00
2 04/01/15 18:21:00
3 07/01/15 16:17:00
4 12/01/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
So the above now shows month/day/year. In your case, the following gives the Year/Month/Day order you asked for:
pd.to_datetime(test1['Received']).apply(lambda x: dt.datetime.strftime(x, '%Y/%m/%d %H:%M'))
EDIT
It looks like the dates need to be parsed day-first. Passing dayfirst=True to to_datetime does that; an explicit format that matches the data (four-digit %Y, no seconds) is stricter:
In [45]:
pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M').apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[45]:
index
0 01/01/15 17:25:00
1 01/02/15 11:43:00
2 01/04/15 18:21:00
3 01/07/15 16:17:00
4 01/12/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
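A vectorized sketch of the whole fix, using three of the question's values: parse with an explicit day-first format, then render with the Series.dt.strftime accessor instead of apply with a lambda. The names received, parsed and rearranged are only for illustration.

```python
import pandas as pd

# A few of the question's values: day/month/four-digit year hour:minute
received = pd.Series(["01/01/2015 17:25", "02/01/2015 11:43", "14/01/2015 11:09"])

# Parse with an explicit day-first format, then render as Year/Month/Day
parsed = pd.to_datetime(received, format="%d/%m/%Y %H:%M")
rearranged = parsed.dt.strftime("%Y/%m/%d %H:%M")
print(rearranged)
```

The result is a column of strings; keep the parsed datetime column if you still need to sort or do date arithmetic.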
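Pandas has this built in: you can specify your datetime format with the format argument of pandas.to_datetime, documented at
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html.
Older versions also offered infer_datetime_format, which is deprecated since pandas 2.0. For example, with an explicit format: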
>>> import pandas as pd
>>> i = pd.date_range('20000101',periods=100)
>>> df = pd.DataFrame(dict(year = i.year, month = i.month, day = i.day))
>>> pd.to_datetime(df.year*10000 + df.month*100 + df.day, format='%Y%m%d')
0 2000-01-01
1 2000-01-02
...
98 2000-04-08
99 2000-04-09
Length: 100, dtype: datetime64[ns]
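As an alternative to building integers like 20000101, pd.to_datetime also accepts a DataFrame whose columns are named year, month and day and assembles the dates directly. A short sketch reusing the example's frame:

```python
import pandas as pd

i = pd.date_range("20000101", periods=100)
df = pd.DataFrame(dict(year=i.year, month=i.month, day=i.day))

# to_datetime assembles dates directly from columns named year, month, day
dates = pd.to_datetime(df[["year", "month", "day"]])
print(dates.head())
```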
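You can use the datetime functions to convert from and to strings; note that they use % directives rather than DD/MM/YYYY-style patterns:
from datetime import datetime
# converts the string to a datetime
parsed = datetime.strptime(date_string, '%d/%m/%Y %H:%M')
and
# converts the datetime to your requested string format
datetime.strftime(parsed, '%Y/%m/%d %H:%M')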
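A runnable sketch of that round trip for one value from the question (date_string is just a sample); for a whole column, the pandas to_datetime route is more convenient.

```python
from datetime import datetime

date_string = "02/01/2015 11:43"  # day/month/year, as in the question

# strptime: string -> datetime, using % directives rather than DD/MM/YYYY
parsed = datetime.strptime(date_string, "%d/%m/%Y %H:%M")

# strftime: datetime -> string in the requested Year/Month/Day order
rearranged = parsed.strftime("%Y/%m/%d %H:%M")
print(rearranged)
```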
