Concatenate two dataframe columns as one timestamp - python

I'm working with a pandas dataframe. One of my columns is a date (YYYYMMDD) and another is an hour (HH:MM). I would like to concatenate the two columns into one timestamp or datetime64 column, so that I can later use that column as an index (for a time series). The data looks like the sample under Setup.
Do you have any ideas? The classic pandas.to_datetime() seems to work only when each column holds a single component (year only, day only, hour only, and so on).

Setup
df
Out[1735]:
id date hour other
0 1820 20140423 19:00:00 8
1 4814 20140424 08:20:00 22
Solution
import datetime as dt
#convert date and hour to str, concatenate them and then convert them to datetime format.
df['new_date'] = df[['date','hour']].astype(str).apply(lambda x: dt.datetime.strptime(x.date + x.hour, '%Y%m%d%H:%M:%S'), axis=1)
df
Out[1756]:
id date hour other new_date
0 1820 20140423 19:00:00 8 2014-04-23 19:00:00
1 4814 20140424 08:20:00 22 2014-04-24 08:20:00
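A vectorized alternative, sketched under the assumption that date holds YYYYMMDD values and hour holds HH:MM:SS strings as in the setup above: join the two columns as strings and pass the result to pd.to_datetime with an explicit format. The sample frame is rebuilt from the two rows shown, and the name ts is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the setup
df = pd.DataFrame({'id': [1820, 4814],
                   'date': [20140423, 20140424],
                   'hour': ['19:00:00', '08:20:00'],
                   'other': [8, 22]})

# Join date and hour as text, then parse the whole column in one call
combined = df['date'].astype(str) + ' ' + df['hour'].astype(str)
df['new_date'] = pd.to_datetime(combined, format='%Y%m%d %H:%M:%S')

# Use the new column as the index, as the question intends for a time series
ts = df.set_index('new_date')
```

This avoids the row-by-row apply, which tends to matter on larger frames.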

Related

Creating Columns for Hour of Day and date based on datetime column

How can I create two new columns, one with the day only and one with the hour of day only, based on a column that holds a datetime timestamp?
The DataFrame has a column such as:
Timestamp
2019-05-31 21:11:43
2018-11-21 18:01:00
2017-11-21 22:01:04
2020-04-15 11:01:00
2017-04-20 04:00:33
I want two new columns that look like below:
Day | Hour of Day
2019-05-31 21:00
2018-11-21 18:00
2017-11-21 22:00
2020-04-15 11:00
2017-04-20 04:00
I tried something like the line below, but it only gives me a bare number for the hour of day:
df['hour'] = pd.to_datetime(df['Timestamp'], format='%H:%M:%S').dt.hour
The output would be 9 for 9:32:00, which isn't what I want.
Thanks!
Please try dt.strftime(format+string):
df['hour'] = pd.to_datetime(df['Timestamp']).dt.strftime("%H"+":00")
Following your comments below, let's use df.assign and extract the hour and the date separately:
df=df.assign(hour=pd.to_datetime(df['Timestamp']).dt.strftime("%H"+":00"), Day=pd.to_datetime(df['Timestamp']).dt.date)
You could also convert the timestamp to a string and then select substrings by index:
df = pd.DataFrame({'Timestamp': ['2019-05-31 21:11:43', '2018-11-21 18:01:00',
'2017-11-21 22:01:04', '2020-04-15 11:01:00',
'2017-04-20 04:00:33']})
df['Day'], df['Hour of Day'] = zip(*df.Timestamp.apply(lambda x: [str(x)[:10], str(x)[11:13]+':00']))
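For comparison, a minimal sketch that parses the column once and formats both outputs with dt.strftime, so the result does not depend on fixed string positions. The column names Day and Hour of Day follow the question; both new columns hold strings here, which is an assumption about what is wanted.

```python
import pandas as pd

df = pd.DataFrame({'Timestamp': ['2019-05-31 21:11:43', '2018-11-21 18:01:00',
                                 '2017-11-21 22:01:04', '2020-04-15 11:01:00',
                                 '2017-04-20 04:00:33']})

# Parse once, then format the day and the hour separately
ts = pd.to_datetime(df['Timestamp'])
df['Day'] = ts.dt.strftime('%Y-%m-%d')
df['Hour of Day'] = ts.dt.strftime('%H:00')  # minutes and seconds are dropped
```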

counting months between two days in dataframe

I have a dataframe with multiple columns, one of which is a date column. I'm interested in creating a new column that contains the number of months between the date column and a preset date. For example, one of the dates in the 'start date' column is '2019-06-30 00:00:00'. I want to calculate the number of months between that date and the end of 2021 (2021-12-31), place the answer in a new column, and do this for every row of the date column. I haven't been able to work out how to go about this. With a predetermined end date of 2021-12-31, the result should look like this:
df =
|start date months
0|2019-06-30 30
1|2019-08-12 28
2|2020-01-24 23
You can do this using np.timedelta64:
end_date = pd.to_datetime('2021-12-31')
df['start date'] = pd.to_datetime(df['start date'])
df['month'] = ((end_date - df['start date'])/np.timedelta64(1, 'M')).astype(int)
print(df)
start date month
0 2019-06-30 30
1 2019-08-12 28
2 2020-01-24 23
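Recent pandas versions may reject the 'M' unit in np.timedelta64(1, 'M') because a month is not a fixed duration. A minimal sketch of the same idea that avoids that unit: divide the day count by an average month length. The constant DAYS_PER_MONTH (365.2425 / 12) is an assumption of this sketch, so the result is an approximation, not a calendar-month count.

```python
import pandas as pd

df = pd.DataFrame({'start date': ['2019-06-30', '2019-08-12', '2020-01-24']})
df['start date'] = pd.to_datetime(df['start date'])
end_date = pd.Timestamp('2021-12-31')

# Average Gregorian month length in days; an approximation, not a calendar month
DAYS_PER_MONTH = 365.2425 / 12

# Whole days between the dates, converted to months and truncated
days = (end_date - df['start date']).dt.days
df['months'] = (days / DAYS_PER_MONTH).astype(int)
```

For the three sample dates this gives 30, 28 and 23, matching the table above.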
Assume that start date column is of datetime type (not string)
and the reference date is defined as follows:
refDate = pd.to_datetime('2021-12-31')
or any other date of your choice.
Then you can compute the number of months as:
df['months'] = (refDate.to_period('M') - df['start date']\
.dt.to_period('M')).apply(lambda x: x.n)
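The period-based count above ignores the day of the month and only compares calendar months. The same quantity can be sketched with plain integer arithmetic on the year and month parts; ref_date is a hypothetical name used here for the preset date.

```python
import pandas as pd

df = pd.DataFrame({'start date': ['2019-06-30', '2019-08-12', '2020-01-24']})
df['start date'] = pd.to_datetime(df['start date'])
ref_date = pd.Timestamp('2021-12-31')

# Calendar-month difference: 12 per year plus the month offset
start = df['start date'].dt
df['months'] = (ref_date.year - start.year) * 12 + (ref_date.month - start.month)
```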

String dates into unixtime in a pandas dataframe

I have a dataframe with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
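Every row holds the date as a string. What is the easiest way to convert these dates into integer Unix time?
One idea is to convert the column to datetimes where possible and to timedeltas otherwise:
import numpy as np
import pandas as pd
#parse day-month strings, assuming the year 2020; everything else becomes NaT
df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')
#rewrite '3 mins' as '00:3:00' and '2 hours' as '2:00:00' so to_timedelta can parse them
times = df['Date'].replace({r'(\d+)\s+mins': r'00:\1:00',
r'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
#remove rows where both dates and times are missing
df = df[df['dates'].notna() | df['times'].notna()]
#Series.append was removed in pandas 2.0, so combine the two pieces with pd.concat
df['all'] = pd.concat([df['dates'].dropna().astype(np.int64), df['times'].dropna().astype(np.int64)])
print (df)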
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000
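The all column above mixes durations and timestamps, both in nanoseconds. If the goal is one Unix time in seconds per row, one possible reading is that "3 mins" and "2 hours" mean "that long before some reference time". The sketch below rests on that assumption: the reference time now, the year 2020, and the column name unixtime are all hypothetical choices, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['3 mins', '2 hours', '9-Feb', '13-Feb']})

# Hypothetical reference time; relative entries are read as "this long before now"
now = pd.Timestamp('2020-02-14 12:00:00')

# Absolute entries: day-month strings, assuming the year 2020
dates = pd.to_datetime(df['Date'] + '-2020', format='%d-%b-%Y', errors='coerce')

# Relative entries: pull out the number and build a timedelta from it
mins = pd.to_numeric(df['Date'].str.extract(r'(\d+)\s+mins')[0])
hours = pd.to_numeric(df['Date'].str.extract(r'(\d+)\s+hours')[0])
deltas = pd.to_timedelta(mins, unit='min').fillna(pd.to_timedelta(hours, unit='h'))

# Use the parsed date where there is one, otherwise now minus the delta
stamps = dates.fillna(now - deltas)

# Whole seconds since the Unix epoch
df['unixtime'] = (stamps - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
```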

How can I change the datetime values in a pandas df?

I have a pandas dataframe and datetime is used as an index in the following format: datetime.date(2018, 12, 31).
Each datetime represents the fiscal year end, i.e. 31/12/2018, 31/12/2017, 31/12/2016 etc.
However, for some companies the fiscal year end may be 30/11/2018, 31/10/2018, etc. instead of the last date of each year.
Is there a quick way to change the non-standardized datetimes to the last date of each year?
For example, from 30/11/2018 to 31/12/2018, from 31/10/2018 to 31/12/2018, and so on.
df = pd.DataFrame({'datetime': ['2019-01-02','2019-02-01', '2019-04-01', '2019-06-01', '2019-11-30','2019-12-30'],
'data': [1,2,3,4,5,6]})
df['datetime'] = pd.to_datetime(df['datetime'])
df['quarter'] = df['datetime'] + pd.tseries.offsets.QuarterEnd(n=0)
df
datetime data quarter
0 2019-01-02 1 2019-03-31
1 2019-02-01 2 2019-03-31
2 2019-04-01 3 2019-06-30
3 2019-06-01 4 2019-06-30
4 2019-11-30 5 2019-12-31
5 2019-12-30 6 2019-12-31
Here the datetime column holds a few arbitrary dates I picked. Adding the QuarterEnd(n=0) offset rolls each date forward to the end of its quarter (dates already on a quarter end stay unchanged), which standardizes them.
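Since the question asks for the last date of each year, not each quarter, the same pattern can be sketched with pd.offsets.YearEnd(n=0), which rolls a date forward to 31 December of its year and leaves dates already on that day unchanged. The sample dates and the column name year_end are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'datetime': ['2018-10-31', '2018-11-30', '2018-12-31', '2017-12-31'],
                   'data': [1, 2, 3, 4]})
df['datetime'] = pd.to_datetime(df['datetime'])

# n=0 rolls forward to the year end only when the date is not already on it
df['year_end'] = df['datetime'] + pd.offsets.YearEnd(n=0)
```

If the dates live in the index as datetime.date objects, as in the question, converting the index with pd.to_datetime first should make the same offset arithmetic available.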

pandas get business days data from datetime index

I have pandas dataframe as:
df.ix[1:4]
Data
DateTime
2015-05-24 02:00:00 4368.02
2015-05-24 03:00:00 4254.63
2015-05-24 04:00:00 4167.88
I have created a calendar as:
us_bd = CustomBusinessDay(calendar=myCalendar())
How do I extract the business days data and non business days data from df?
Right now I am extracting the dates from df and then checking their presence in us_bd using numpy.in1d which appears very clumsy.
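I'd simply say that a day is a business day if adding one business day and then subtracting one business day returns the same day. Since DateTime is the index in your frame, apply the test to df.index (use df['DateTime'] instead if it is a regular column):
df['is_biz'] = ((df.index + us_bd) - us_bd) == df.index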
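Another way to sketch the split is to ask the offset directly with its is_on_offset method, then filter with the resulting boolean mask. The question's myCalendar is not shown, so USFederalHolidayCalendar stands in for it here as an assumption, and the sample rows are made up for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Stand-in for the question's myCalendar(), which is not shown
us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Friday, Sunday, Memorial Day (a Monday holiday), Tuesday
idx = pd.to_datetime(['2015-05-22 02:00', '2015-05-24 02:00',
                      '2015-05-25 02:00', '2015-05-26 02:00'])
df = pd.DataFrame({'Data': [1.0, 2.0, 3.0, 4.0]}, index=idx)

# True where the timestamp's date is a business day under the calendar
is_biz = pd.Series([us_bd.is_on_offset(ts) for ts in df.index], index=df.index)

biz_days = df[is_biz]       # rows that fall on business days
non_biz_days = df[~is_biz]  # weekends and holidays
```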
