How to rearrange a date in Python

I have a column in a pandas data frame looking like:
test1.Received
Out[9]:
0 01/01/2015 17:25
1 02/01/2015 11:43
2 04/01/2015 18:21
3 07/01/2015 16:17
4 12/01/2015 20:12
5 14/01/2015 11:09
6 15/01/2015 16:05
7 16/01/2015 21:02
8 26/01/2015 03:00
9 27/01/2015 08:32
10 30/01/2015 11:52
This represents a timestamp as Day Month Year Hour Minute. I would like to rearrange the date as Year Month Day Hour Minute, so that it looks like:
test1.Received
Out[9]:
0 2015/01/01 17:25
1 2015/01/02 11:43
...

Just use pd.to_datetime:
In [33]:
import pandas as pd
pd.to_datetime(df['date'])
Out[33]:
index
0 2015-01-01 17:25:00
1 2015-02-01 11:43:00
2 2015-04-01 18:21:00
3 2015-07-01 16:17:00
4 2015-12-01 20:12:00
5 2015-01-14 11:09:00
6 2015-01-15 16:05:00
7 2015-01-16 21:02:00
8 2015-01-26 03:00:00
9 2015-01-27 08:32:00
10 2015-01-30 11:52:00
Name: date, dtype: datetime64[ns]
In your case:
pd.to_datetime(test1['Received'])
should just work.
If you want to change the display format, you need to parse the column as datetimes and then apply datetime.strftime:
In [35]:
import datetime as dt
pd.to_datetime(df['date']).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[35]:
index
0 01/01/15 17:25:00
1 02/01/15 11:43:00
2 04/01/15 18:21:00
3 07/01/15 16:17:00
4 12/01/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
The above now shows month/day/year. In your case, with a four-digit year (%Y) first, the following should work:
pd.to_datetime(test1['Received']).apply(lambda x: dt.datetime.strftime(x, '%Y/%m/%d %H:%M'))
EDIT
It looks like you need to pass dayfirst=True to to_datetime so that the first field is read as the day:
In [45]:
pd.to_datetime(df['date'], dayfirst=True).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%y %H:%M:%S'))
Out[45]:
index
0 01/01/15 17:25:00
1 01/02/15 11:43:00
2 01/04/15 18:21:00
3 01/07/15 16:17:00
4 01/12/15 20:12:00
5 01/14/15 11:09:00
6 01/15/15 16:05:00
7 01/16/15 21:02:00
8 01/26/15 03:00:00
9 01/27/15 08:32:00
10 01/30/15 11:52:00
Name: date, dtype: object
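As a minimal sketch of the same idea, assuming the column holds strings exactly in the day/month/year hour:minute layout shown in the question, an explicit format string avoids any day/month guessing, and Series.dt.strftime formats the whole column without a lambda. The three sample rows are copied from the question; the variable name parsed is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question (day/month/year hour:minute).
test1 = pd.DataFrame(
    {"Received": ["01/01/2015 17:25", "02/01/2015 11:43", "14/01/2015 11:09"]}
)

# An explicit format leaves no ambiguity about which field is the day.
parsed = pd.to_datetime(test1["Received"], format="%d/%m/%Y %H:%M")

# Format the parsed datetimes back to strings as year/month/day hour:minute.
test1["Received"] = parsed.dt.strftime("%Y/%m/%d %H:%M")
print(test1)
```

Note that the result is a column of strings again (dtype object); keep the parsed datetimes around if you still need to sort or do date arithmetic.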

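pandas has this built in: you can specify your datetime format with the format argument of to_datetime, see
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html.
Alternatively, use infer_datetime_format (deprecated since pandas 2.0, where format inference is the default behaviour).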
>>> import pandas as pd
>>> i = pd.date_range('20000101',periods=100)
>>> df = pd.DataFrame(dict(year = i.year, month = i.month, day = i.day))
>>> pd.to_datetime(df.year*10000 + df.month*100 + df.day, format='%Y%m%d')
0 2000-01-01
1 2000-01-02
...
98 2000-04-08
99 2000-04-09
Length: 100, dtype: datetime64[ns]

You can use the datetime functions to convert from and to strings.
from datetime import datetime
# converts the string to a datetime object
date_object = datetime.strptime(date_string, '%d/%m/%Y %H:%M')
and
# converts the datetime object to your requested string format
datetime.strftime(date_object, '%Y/%m/%d %H:%M')

Related

split datetime column into date and time columns in pandas

I have the following question. I have a date_time column in my dataframe (and many other columns).
df["Date_time"].head()
0 2021-05-15 09:54
1 2021-05-27 17:04
2 2021-05-27 00:00
3 2021-05-27 09:36
4 2021-05-26 18:39
Name: Date_time, dtype: object
I would like to split this column into two (date and time).
I use this formula that works fine:
df["Date"] = ""
df["Time"] = ""
def split_date_time(data_frame):
    for i in range(0, len(data_frame)):
        df["Date"][i] = df["Date_time"][i].split()[0]
        df["Time"][i] = df["Date_time"][i].split()[1]
split_date_time(df)
But is there a more elegant way? Thanks
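The dt accessor can give you the date and time separately. Since the column has dtype object, convert it with pd.to_datetime first:
df["Date_time"] = pd.to_datetime(df["Date_time"])
df["Date"] = df["Date_time"].dt.date
df["Time"] = df["Date_time"].dt.time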
to get
>>> df
Date_time Date Time
0 2021-05-15 09:54:00 2021-05-15 09:54:00
1 2021-05-27 17:04:00 2021-05-27 17:04:00
2 2021-05-27 00:00:00 2021-05-27 00:00:00
3 2021-05-27 09:36:00 2021-05-27 09:36:00
4 2021-05-26 18:39:00 2021-05-26 18:39:00
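If the two new columns only need to be strings, a vectorized string split is another option that skips datetime parsing altogether. This is a sketch under the assumption that every value has the form "date time" separated by whitespace, as in the question's sample.

```python
import pandas as pd

# Sample values copied from the question; the column holds plain strings.
df = pd.DataFrame(
    {"Date_time": ["2021-05-15 09:54", "2021-05-27 17:04", "2021-05-27 00:00"]}
)

# Split once on whitespace; expand=True returns one column per piece.
df[["Date", "Time"]] = df["Date_time"].str.split(n=1, expand=True)
print(df)
```

Unlike the dt accessor approach, this keeps the time exactly as written (for example 09:54 without seconds), but the results are strings rather than date and time objects.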

How to generate timestamps for a day?

I have a time series of values in a pandas DataFrame, but the frame does not contain a datetime column. The data has about 1440 rows, matching the 1440 minutes of a day. I would like to generate a per-minute timestamp column for this frame. How can I do that?
Desired result is under the format '%Y-%m-%d %H:%M:%S'
Before
value
0 210.38
1 210.50
2 210.51
3 210.40
4 210.41
After
datetime value
0 2019-09-18 23:55:00 210.38
1 2019-09-18 23:56:00 210.50
2 2019-09-18 23:57:00 210.51
3 2019-09-18 23:58:00 210.40
4 2019-09-18 23:59:00 210.41
Thank you!
Use DataFrame.insert with date_range to add the timestamps as the first column:
df.insert(0, 'datetime', pd.date_range('2019-09-18 23:55:00', periods=len(df), freq='min'))
print (df)
datetime value
0 2019-09-18 23:55:00 210.38
1 2019-09-18 23:56:00 210.50
2 2019-09-18 23:57:00 210.51
3 2019-09-18 23:58:00 210.40
4 2019-09-18 23:59:00 210.41
If you want to generate the datetimes dynamically, starting from the current minute:
df.insert(0, 'datetime', pd.date_range(pd.Timestamp.now().floor('min'), periods=len(df), freq='min'))
print (df)
datetime value
0 2020-01-10 10:36:00 210.38
1 2020-01-10 10:37:00 210.50
2 2020-01-10 10:38:00 210.51
3 2020-01-10 10:39:00 210.40
4 2020-01-10 10:40:00 210.41
Try this:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'value': [210.38, 210.50, 210.51, 210.40, 210.41]})
df['date'] = pd.date_range(start=datetime.today().replace(microsecond=0), periods=len(df), freq='min')
Output:
value date
0 210.38 2020-01-10 15:08:32
1 210.50 2020-01-10 15:09:32
2 210.51 2020-01-10 15:10:32
3 210.40 2020-01-10 15:11:32
4 210.41 2020-01-10 15:12:32
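For the full-day case described in the question, the pieces above can be put together as in the following sketch. The start date 2019-09-18 and the constant placeholder values are assumptions made for illustration; the string column is only needed if the '%Y-%m-%d %H:%M:%S' text form is required rather than real datetimes.

```python
import pandas as pd

# Placeholder data: one value per minute of a day (1440 rows).
# The constant 210.0 is made up for this sketch.
df = pd.DataFrame({"value": [210.0] * 1440})

# One timestamp per row, starting at midnight of an assumed date.
stamps = pd.date_range("2019-09-18", periods=len(df), freq="min")
df.insert(0, "datetime", stamps)

# Optional: the same timestamps as strings in the requested format.
df["datetime_str"] = df["datetime"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df.head())
```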

Pandas Series of hour values to Series of dates

I have a time series covering January 1979 with 6-hour time steps. The time is given as a continuous count of hours:
1
7
13
18
25
31
.
.
.
739
Is it possible to convert these ints to dates? For instance:
1979/01/01 - 1:00
1979/01/01 - 7:00
1979/01/01 - 13:00
1979/01/01 - 18:00
1979/01/02 - 1:00
Thank you so much!
Setup
df = pd.DataFrame({'hour': [1,7,13,18,25,31]})
Use pd.to_datetime with the unit parameter, and set the origin parameter to the beginning of your desired year.
pd.to_datetime(df.hour, unit='h', origin='1979-01-01')
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
5 1979-01-02 07:00:00
Name: hour, dtype: datetime64[ns]
Here is another way:
import pandas as pd
s = pd.Series([1,7,13])
s = pd.to_datetime(s*1e9*60*60+ pd.Timestamp(1979,1,1).value)
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
dtype: datetime64[ns]
Could also just do this:
from datetime import datetime, timedelta
s = pd.Series([1,7,13,18,25])
s = s.apply(lambda h: datetime(1979, 1, 1) + timedelta(hours=h))
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
dtype: datetime64[ns]
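A vectorized variant of the timedelta idea, sketched here with the same sample hours, converts the whole Series with pd.to_timedelta and adds it to a base timestamp, so no per-row apply is needed. The names hours and dates are only illustrative.

```python
import pandas as pd

# Hour offsets counted from the start of January 1979 (sample from the question).
hours = pd.Series([1, 7, 13, 18, 25, 31])

# Convert the integers to timedeltas and add them to the base timestamp.
dates = pd.Timestamp("1979-01-01") + pd.to_timedelta(hours, unit="h")
print(dates)
```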

Modify timestamp subset column in pandas

I have a Dataframe with a timestamp column:
tij_pd.datetime[0:5]
Out[29]:
0 2016-01-09 05:27:00
1 2016-01-09 06:49:00
2 2016-01-09 08:05:00
3 2016-01-09 12:09:00
4 2016-01-09 14:54:00
Name: datetime, dtype: datetime64[ns]
I need to select times between '00:00' and '04:00' and add 1 hour.
In[31]: tij_pd.set_index('datetime').between_time('00:00','04:00').reset_index().datetime
Out[31]:
0 2016-03-09 01:01:00
1 2016-10-09 00:31:00
...
16 2016-03-09 01:40:00
17 2016-09-23 00:46:00
Name: datetime, dtype: datetime64[ns]
How can I add 1 hour to the datetime column of this subset?
tij_pd['datetime'] = tij_pd['datetime']+pd.to_timedelta(1,'d')
If I understand correctly:
tij_pd.loc[(tij_pd.datetime.dt.hour >= 0) &
           (tij_pd.datetime.dt.hour <= 4),
           'datetime'] += pd.to_timedelta(1, unit='h')
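Note that a mask on dt.hour <= 4 also matches times such as 04:30. If the window should end exactly at 04:00, as between_time('00:00', '04:00') does, one possible sketch is to compare the time elapsed since midnight instead. The sample rows below are partly taken from the question and partly made up to show the boundary.

```python
import pandas as pd

# Sample timestamps; the 04:00 and 04:30 rows are added to show the boundary.
tij_pd = pd.DataFrame({"datetime": pd.to_datetime([
    "2016-01-09 05:27:00",
    "2016-03-09 01:01:00",
    "2016-10-09 00:31:00",
    "2016-01-09 04:00:00",
    "2016-01-09 04:30:00",
])})

# Time elapsed since midnight of each row's own day.
since_midnight = tij_pd["datetime"] - tij_pd["datetime"].dt.normalize()

# True for rows between 00:00 and 04:00 inclusive.
mask = since_midnight <= pd.Timedelta(hours=4)

# Shift only the selected rows by one hour.
tij_pd.loc[mask, "datetime"] += pd.Timedelta(hours=1)
print(tij_pd)
```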

python pandas series loc value from multi index

I have a series that looks like this
2014 7 2014-07-01 -0.045417
8 2014-08-01 -0.035876
9 2014-09-02 -0.030971
10 2014-10-01 -0.027471
11 2014-11-03 -0.032968
12 2014-12-01 -0.031110
2015 1 2015-01-02 -0.028906
2 2015-02-02 -0.035563
3 2015-03-02 -0.040338
4 2015-04-01 -0.032770
5 2015-05-01 -0.025762
6 2015-06-01 -0.019746
7 2015-07-01 -0.018541
8 2015-08-03 -0.028101
9 2015-09-01 -0.043237
10 2015-10-01 -0.053565
11 2015-11-02 -0.062630
12 2015-12-01 -0.064618
2016 1 2016-01-04 -0.064852
I want to be able to get the value from a date. Something like:
myseries.loc('2015-10-01') and it returns -0.053565
The index are tuples in the form (2016, 1, 2016-01-04)
You can do it like this:
In [32]:
df.loc(axis=0)[:,:,'2015-10-01']
Out[32]:
value
year month date
2015 10 2015-10-01 -0.053565
You can also pass slice for each level:
In [39]:
df.loc[(slice(None),slice(None),'2015-10-01'),]
Out[39]:
value
year month date
2015 10 2015-10-01 -0.053565
Or just pass the first 2 index levels:
In [40]:
df.loc[2015,10]
Out[40]:
value
date
2015-10-01 -0.053565
Try xs:
print(s.xs('2015-10-01', level=2, axis=0))
#year datetime
#2015 10 -0.053565
#Name: series, dtype: float64
print(s.xs(7, level=1, axis=0))
#year datetime
#2014 2014-07-01 -0.045417
#2015 2015-07-01 -0.018541
#Name: series, dtype: float64
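To get the bare number rather than a one-row result, the xs call can be followed by a positional lookup. The sketch below rebuilds a small three-level Series from a few rows of the question; the level names year, month and date follow the output shown in the answers above and are otherwise an assumption.

```python
import pandas as pd

# A few rows from the question, rebuilt as a Series with a 3-level index.
dates = pd.to_datetime(["2015-09-01", "2015-10-01", "2015-11-02"])
index = pd.MultiIndex.from_arrays(
    [dates.year, dates.month, dates], names=["year", "month", "date"]
)
myseries = pd.Series([-0.043237, -0.053565, -0.062630], index=index)

# Select on the date level only; the result is still indexed by (year, month).
match = myseries.xs(pd.Timestamp("2015-10-01"), level="date")

# Take the single matching value as a plain float.
value = match.iloc[0]
print(value)
```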
