I have a pandas dataframe that has a column with a 5 digit code that represent a day and time, and it works like following:
1 - The first three digits represent the day;
2 - The last two digits represent the hour:minute:second.
Example1: The first row have the code 19501, so the 195 represent the 1st of January of 2009 and the 01 part represents the time from 00:00:00 to 00:29:59;
Example2: In the second row i have the code 19502 which is the 1st of January of 2009 from 00:30:00 to 00:59:59;
Example3: Another example, 19711 would be the 3rd of January of 2009 from 05:00:00 to 05:29:59;
Example4: The last row is the code 73048, which represent the 20th of June of 2010 from 23:30:00 to 23:59:59.
Any ideas in how can I convert this 5 digit code into a proper datetime format?
I'm assuming your column is numeric.
import datetime as dt
df = pd.DataFrame({'code': [19501, 19502, 19711, 73048]})
df['days'] = pd.to_timedelta(df['code']//100, 'D')
df['half-hours'] = df['code']%100
df['hours'] = pd.to_timedelta(df['half-hours']//2, 'h')
df['minutes'] = pd.to_timedelta(df['half-hours']%2*30, 'm')
base_day = dt.datetime(2009, 1, 1) - dt.timedelta(days = 195)
df['dt0'] = base_day + df.days + df.hours + df.minutes - dt.timedelta(minutes = 30)
df['dt1'] = base_day + df.days + df.hours + df.minutes - dt.timedelta(seconds = 1)
A simple solution, add the days to 2008-06-20, add the (time-1)*30min;
df = pd.DataFrame({'code': [19501, 19502, 19711, 73048]})
d, t = df['code'].divmod(100)
df['datetime'] = (
pd.to_timedelta(d, unit='D')
.add(pd.Timestamp('2008-06-20'))
.add(pd.to_timedelta((t-1)*30, unit='T'))
)
NB. this gives you the start of the period, for the end replace (t-1)*30 by t*30-1.
Output:
code datetime
0 19501 2009-01-01 00:00:00
1 19502 2009-01-01 00:30:00
2 19711 2009-01-03 05:00:00
3 73048 2010-06-20 23:30:00
Related
I have the following df:
time_series date sales
store_0090_item_85261507 1/2020 1,0
store_0090_item_85261501 2/2020 0,0
store_0090_item_85261500 3/2020 6,0
Being 'date' = Week/Year.
So, I tried use the following code:
df['date'] = df['date'].apply(lambda x: datetime.strptime(x + '/0', "%U/%Y/%w"))
But, return this df:
time_series date sales
store_0090_item_85261507 2020-01-05 1,0
store_0090_item_85261501 2020-01-12 0,0
store_0090_item_85261500 2020-01-19 6,0
But, the first day of the first week of 2020 is 2019-12-29, considering sunday as first day. How can I have the first day 2020-12-29 of the first week of 2020 and not 2020-01-05?
From the datetime module's documentation:
%U: Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
Edit: My originals answer doesn't work for input 1/2023 and using ISO 8601 date values doesn't work for 1/2021, so I've edited this answer by adding a custom function
Here is a way with a custom function
import pandas as pd
from datetime import datetime, timedelta
##############################################
# to demonstrate issues with certain dates
print(datetime.strptime('0/2020/0', "%U/%Y/%w")) # 2019-12-29 00:00:00
print(datetime.strptime('1/2020/0', "%U/%Y/%w")) # 2020-01-05 00:00:00
print(datetime.strptime('0/2021/0', "%U/%Y/%w")) # 2020-12-27 00:00:00
print(datetime.strptime('1/2021/0', "%U/%Y/%w")) # 2021-01-03 00:00:00
print(datetime.strptime('0/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
print(datetime.strptime('1/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
#################################################
df = pd.DataFrame({'date':["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
print(df)
def get_first_day(date):
date0 = datetime.strptime('0/' + date.split('/')[1] + '/0', "%U/%Y/%w")
date1 = datetime.strptime('1/' + date.split('/')[1] + '/0', "%U/%Y/%w")
date = datetime.strptime(date + '/0', "%U/%Y/%w")
return date if date0 == date1 else date - timedelta(weeks=1)
df['new_date'] = df['date'].apply(lambda x:get_first_day(x))
print(df)
Input
date
0 1/2020
1 2/2020
2 3/2020
3 1/2021
4 2/2021
5 1/2023
6 2/2023
Output
date new_date
0 1/2020 2019-12-29
1 2/2020 2020-01-05
2 3/2020 2020-01-12
3 1/2021 2020-12-27
4 2/2021 2021-01-03
5 1/2023 2023-01-01
6 2/2023 2023-01-08
You'll want to use ISO week parsing directives, Ex:
import pandas as pd
date = pd.Series(["1/2020", "2/2020", "3/2020"])
pd.to_datetime(date+"/1", format="%V/%G/%u")
0 2019-12-30
1 2020-01-06
2 2020-01-13
dtype: datetime64[ns]
you can also shift by one day if the week should start on Sunday:
pd.to_datetime(date+"/1", format="%V/%G/%u") - pd.Timedelta('1d')
0 2019-12-29
1 2020-01-05
2 2020-01-12
dtype: datetime64[ns]
I have a Pandas dataframe, which looks like below
I want to create a new column, which tells the exact date from the information from all the above columns. The code should look something like this:
df['Date'] = pd.to_datetime(df['Month']+df['WeekOfMonth']+df['DayOfWeek']+df['Year'])
I was able to find a workaround for your case. You will need to define the dictionaries for the months and the days of the week.
month = {"Jan":"01", "Feb":"02", "March":"03", "Apr": "04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"}
week = {"Monday":1,"Tuesday":2,"Wednesday":3,"Thursday":4,"Friday":5,"Saturday":6,"Sunday":7}
With this dictionaries the transformation that I used with a custom dataframe was:
rows = [["Dec",5,"Wednesday", "1995"],
["Jan",3,"Wednesday","2013"]]
df = pd.DataFrame(rows, columns=["Month","Week","Weekday","Year"])
df['Date'] = (df["Year"] + "-" + df["Month"].map(month) + "-" + (df["Week"].apply(lambda x: (x - 1)*7) + df["Weekday"].map(week).apply(int) ).apply(str)).astype('datetime64[ns]')
However you have to be careful. With some data that you posted as example there were some dates that exceeds the date range. For example, for
row = ["Oct",5,"Friday","2018"]
The date displayed is 2018-10-33. I recommend using some logic to filter your data in order to avoid this kind of problems.
Let's approach it in 3 steps as follows:
Get the date of month start Month_Start from Year and Month
Calculate the date offsets DateOffset relative to Month_Start from WeekOfMonth and DayOfWeek
Get the actual date Date from Month_Start and DateOffset
Here's the codes:
df['Month_Start'] = pd.to_datetime(df['Year'].astype(str) + df['Month'] + '01', format="%Y%b%d")
import time
df['DateOffset'] = (df['WeekOfMonth'] - 1) * 7 + df['DayOfWeek'].map(lambda x: time.strptime(x, '%A').tm_wday) - df['Month_Start'].dt.dayofweek
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')
Output:
Month WeekOfMonth DayOfWeek Year Month_Start DateOffset Date
0 Dec 5 Wednesday 1995 1995-12-01 26 1995-12-27
1 Jan 3 Wednesday 2013 2013-01-01 15 2013-01-16
2 Oct 5 Friday 2018 2018-10-01 32 2018-11-02
3 Jun 2 Saturday 1980 1980-06-01 6 1980-06-07
4 Jan 5 Monday 1976 1976-01-01 25 1976-01-26
The Date column now contains the dates derived from the information from other columns.
You can remove the working interim columns, if you like, as follows:
df = df.drop(['Month_Start', 'DateOffset'], axis=1)
I have a column of dates. I need to filter out those dates that fall between today's date and end of the current month. If the dates fall between these dates then the next column showns "Y"
Date
Column
01/02/2021
03/02/2021
31/03/2021
Y
01/03/2021
07/03/2021
Y
08/03/2021
Y
Since today's date is 07/03/2021 three dates fall between 07/03/2021 and 31/03/2021.
Convert into datetime column using specific time format and compare with today's timestamp
df.Date = pd.to_datetime(df.Date, format='%d/%m/%Y')
today = pd.to_datetime('today').normalize()
end_of_month = today + pd.tseries.offsets.MonthEnd(1)
df['Column'] = np.where((df.Date >= today) & (df.Date <= end_of_month), 'Y', '')
Output
Date Column
0 2021-02-01
1 2021-02-03
2 2021-03-31 Y
3 2021-03-01
4 2021-03-07 Y
5 2021-03-08 Y
6 2021-04-02
I have a Pandas time series dataframe.
It has minute data for a stock for 30 days.
I want to create a new column, stating the price of the stock at 6 AM for that day, e.g. for all lines for January 1, I want a new column with the price at noon on January 1, and for all lines for January 2, I want a new column with the price at noon on January 2, etc.
Existing timeframe:
Date Time Last_Price Date Time 12amT
1/1/19 08:00 100 1/1/19 08:00 ?
1/1/19 08:01 101 1/1/19 08:01 ?
1/1/19 08:02 100.50 1/1/19 08:02 ?
...
31/1/19 21:00 106 31/1/19 21:00 ?
I used this hack, but it is very slow, and I assume there is a quicker and easier way to do this.
for lab, row in df.iterrows() :
t=row["Date"]
df.loc[lab,"12amT"]=df[(df['Date']==t)&(df['Time']=="12:00")]["Last_Price"].values[0]
One way to do this is to use groupby with pd.Grouper:
For pandas 24.1+
df.groupby(pd.Grouper(freq='D'))[0]\
.transform(lambda x: x.loc[(x.index.hour == 12) &
(x.index.minute==0)].to_numpy()[0])
Older pandas use:
df.groupby(pd.Grouper(freq='D'))[0]\
.transform(lambda x: x.loc[(x.index.hour == 12) &
(x.index.minute==0)].values[0])
MVCE:
df = pd.DataFrame(np.arange(48*60), index=pd.date_range('02-01-2019',periods=(48*60), freq='T'))
df['12amT'] = df.groupby(pd.Grouper(freq='D'))[0].transform(lambda x: x.loc[(x.index.hour == 12)&(x.index.minute==0)].to_numpy()[0])
Output (head):
0 12amT
2019-02-01 00:00:00 0 720
2019-02-01 00:01:00 1 720
2019-02-01 00:02:00 2 720
2019-02-01 00:03:00 3 720
2019-02-01 00:04:00 4 720
I'm not sure why you have two DateTime columns, I made my own example to demonstrate:
ind = pd.date_range('1/1/2019', '30/1/2019', freq='H')
df = pd.DataFrame({'Last_Price':np.random.random(len(ind)) + 100}, index=ind)
def noon_price(df):
noon_price = df.loc[df.index.hour == 12, 'Last_Price'].values
noon_price = noon_price[0] if len(noon_price) > 0 else np.nan
df['noon_price'] = noon_price
return df
df.groupby(df.index.day).apply(noon_price).reindex(ind)
reindex by default will fill each day's rows with its noon_price.
To add a column with the next day's noon price, you can shift the column 24 rows down, like this:
df['T+1'] = df.noon_price.shift(-24)
I have a DataFrame containing a DateTime column with dates but without time ['date_from']. I have the time in column ['Time'] (string). How can I add only the time to the already existing DateTime column?
I tried:
df['date_from'].dt.time = pd.to_datetime(df['Time'], format='%H%M').dt.time
Convert column to to_timedelta and add to datetime column:
#convert to string and if necessary add zero for 4 values
s = df['Time'].astype(str).str.zfill(4)
df['date_from'] += pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
Sample:
df = pd.DataFrame({'date_from':pd.date_range('2015-01-01', periods=3),
'Time':[1501,112, 2012]})
print (df)
Time date_from
0 1501 2015-01-01
1 0112 2015-01-02
2 2012 2015-01-03
s = df['Time'].astype(str).str.zfill(4)
df['date_from'] += pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
print (df)
Time date_from
0 1501 2015-01-01 15:01:00
1 0112 2015-01-02 01:12:00
2 2012 2015-01-03 20:12:00