I have a dataframe that includes a column of day numbers for which the year is known:
print (df)
year day time
0 2012 227 800
15 2012 227 815
30 2012 227 830
... ... ... ...
194250 2013 226 1645
194265 2013 226 1700
I have attempted to convert the day numbers to datetime %m-%d using:
import pandas as pd
df['day'] = pd.to_datetime(df['day'], format='%j').dt.strftime('%m-%d')
which gives:
year day time
0 2012 08-15 800
15 2012 08-15 815
30 2012 08-15 830
... ... ... ...
194250 2013 08-14 1645
194265 2013 08-14 1700
but this conversion is incorrect because the 227th day of 2012 is August 14th (08-14). I believe this error is down to the lack of year specification in the conversion.
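A quick check of this explanation using Python's standard-library datetime.strptime rather than pandas. When the format contains no %Y, the year defaults to 1900, which is not a leap year, so day 227 falls one day later than it does in 2012. That pandas applies the same 1900 default is an assumption, but the 08-15 output above is consistent with it.

```python
from datetime import datetime

# No %Y directive: strptime falls back to the default year 1900 (not a leap year)
no_year = datetime.strptime('227', '%j')
# With the year supplied, day 227 is counted within 2012 (a leap year)
with_year = datetime.strptime('2012 227', '%Y %j')

print(no_year.date(), with_year.date())
```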
How can I include the year in the conversion to get a) %Y-%m-%d; b) %m-%d; c) %Y-%m-%dT%H:%M from the dataframe I have?
Thank you
You can convert the columns to strings, concatenate them and feed the result into pd.to_datetime, supplying the right parsing directive:
import pandas as pd
df = pd.DataFrame({'year': [2012, 2012], 'day' : [227, 228], 'time': [800, 0]})
df['datetime'] = pd.to_datetime(df.year.astype(str) + ' ' +
df.day.astype(str) + ' ' +
df.time.astype(str).str.zfill(4),
format='%Y %j %H%M')
df['datetime']
0 2012-08-14 08:00:00
1 2012-08-15 00:00:00
Name: datetime, dtype: datetime64[ns]
Formatting back to a string is then just a call to strftime via the dt accessor, e.g.
df['datetime'].dt.strftime('%Y-%m-%dT%H:%M')
0 2012-08-14T08:00
1 2012-08-15T00:00
Name: datetime, dtype: object
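A minimal sketch putting this answer's parsing step together with all three output formats the question asks for (a, b and c). The two sample rows are made up to mirror the question's layout, and it assumes time is stored as an HHMM integer.

```python
import pandas as pd

# Two sample rows in the question's layout (values chosen for illustration)
df = pd.DataFrame({'year': [2012, 2013], 'day': [227, 226], 'time': [800, 1645]})

# Parse "YYYY DDD HHMM" strings in one pass; zfill pads 800 to "0800"
stamp = pd.to_datetime(df['year'].astype(str) + ' '
                       + df['day'].astype(str) + ' '
                       + df['time'].astype(str).str.zfill(4),
                       format='%Y %j %H%M')

df['a'] = stamp.dt.strftime('%Y-%m-%d')        # a) date only
df['b'] = stamp.dt.strftime('%m-%d')           # b) month-day only
df['c'] = stamp.dt.strftime('%Y-%m-%dT%H:%M')  # c) date and time
print(df)
```

Parsing once into a datetime column and formatting from that single column keeps all three strings consistent with each other.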
You can try converting year to a datetime and day to a timedelta. Remember to subtract 1 from the day, since day 1 is January 1 itself:
dates = pd.to_datetime(df['year'], format='%Y') + \
pd.to_timedelta(df['day'] -1, unit='D')
Output:
0 2012-08-14
15 2012-08-14
30 2012-08-14
194250 2013-08-14
194265 2013-08-14
dtype: datetime64[ns]
Then extract the month-day with strftime (%m is the month and %d the day; %M would be minutes):
df['day'] = dates.dt.strftime('%m-%d')
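The same year-plus-timedelta idea can also produce option c) without building any strings. This is a sketch that assumes time is an HHMM integer (e.g. 1645 means 16:45), so integer division and modulo by 100 split it into hours and minutes. The sample rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'year': [2012, 2013], 'day': [227, 226], 'time': [800, 1645]})

# Jan 1 of each year, plus (day - 1) days because day 1 is Jan 1 itself
dates = (pd.to_datetime(df['year'].astype(str), format='%Y')
         + pd.to_timedelta(df['day'] - 1, unit='D'))

# Assumes HHMM integers: 1645 // 100 -> 16 hours, 1645 % 100 -> 45 minutes
stamps = (dates
          + pd.to_timedelta(df['time'] // 100, unit='h')
          + pd.to_timedelta(df['time'] % 100, unit='m'))

print(stamps.dt.strftime('%Y-%m-%dT%H:%M').tolist())
```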
I'm trying to convert dates in several formats into YYYY-MM-DD and then merge them into one column, ignoring the nulls, but I end up with TypeError: cannot add DatetimeArray and DatetimeArray
import datetime
import pandas as pd
data = [['Apr 2021'], ['Jan 1'], ['Fri'], ['Jan 18']]
df = pd.DataFrame(data, columns=['date'])
# convert "Month day", e.g. Jan 1
df['date1'] = pd.to_datetime('2021 ' + df['date'], errors='coerce', format='%Y %b %d')
# convert "Month Year", e.g. Apr 2021
df['date2'] = pd.to_datetime(df['date'], errors='coerce')
# convert Fri to this Friday
today = datetime.date.today()
friday = today + datetime.timedelta((4 - today.weekday()) % 7)
this_friday = friday.strftime('%Y-%m-%d')
df['date3'] = df['date'].map({'Fri': this_friday})
df['date3'] = pd.to_datetime(df['date3'])
# this line raises TypeError: cannot add DatetimeArray and DatetimeArray
df['dateFinal'] = df['date1'] + df['date2'] + df['date3']
I checked the dtypes and they're all datetime, so I don't know why this fails. My approach is also not efficient, so feel free to suggest a better way.
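IIUC: the + fails because adding two datetimes is undefined (only datetime + timedelta is). Instead, take the first non-null value in each row.
Try bfill() on axis=1: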
df['dateFinal'] = df[['date1','date2','date3']].bfill(axis=1).iloc[:,0]
OR
via ffill() on axis=1:
df['dateFinal'] = df[['date1','date2','date3']].ffill(axis=1).iloc[:,-1]
OR
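via stack() + to_numpy() (this relies on stack() dropping the NaT values, so every row must have exactly one non-null date):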
df['dateFinal'] = df[['date1','date2','date3']].stack().to_numpy()
Output of df:
date date1 date2 date3 dateFinal
0 Apr 2021 NaT 2021-04-01 NaT 2021-04-01
1 Jan 1 2021-01-01 NaT NaT 2021-01-01
2 Fri NaT NaT 2021-08-13 2021-08-13
3 Jan 18 2021-01-18 NaT NaT 2021-01-18
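Another option, not taken from the answer above, is to chain Series.fillna with the other columns, which keeps the first non-null date per row without reshaping the frame. This sketch uses fixed stand-in values for date1, date2 and date3 (the real date3 depends on today's date) and also shows that adding two datetime columns raises the TypeError from the question.

```python
import pandas as pd

# Stand-in columns shaped like the question's: at most one non-null date per row
df = pd.DataFrame({
    'date1': pd.to_datetime([None, '2021-01-01', None, '2021-01-18']),
    'date2': pd.to_datetime(['2021-04-01', None, None, None]),
    'date3': pd.to_datetime([None, None, '2021-08-13', None]),
})

# datetime + datetime is undefined, which is where the TypeError comes from
try:
    df['date1'] + df['date2']
except TypeError as exc:
    print('TypeError:', exc)

# Start from date1 and fill its gaps from date2, then from date3
df['dateFinal'] = df['date1'].fillna(df['date2']).fillna(df['date3'])
print(df['dateFinal'].dt.strftime('%Y-%m-%d').tolist())
```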
I have a sample dataframe:
year_month
202004
202005
202011
202012
How can I append a month_name column (month name + year) to the dataframe?
year_month month_name
202004 April 2020
202005 May 2020
202011 Nov 2020
202012 Dec 2020
You can use datetime.strptime to convert your string into a datetime object, then use datetime.strftime to convert it back into a string with a different format.
>>> import datetime as dt
>>> import pandas as pd
>>> df = pd.DataFrame(['202004', '202005', '202011', '202012'], columns=['year_month'])
>>> df['month_name'] = df['year_month'].apply(lambda x: dt.datetime.strptime(x, '%Y%m').strftime('%b %Y'))
>>> df
year_month month_name
0 202004 Apr 2020
1 202005 May 2020
2 202011 Nov 2020
3 202012 Dec 2020
The full list of format codes is in the Python documentation, under "strftime() and strptime() Format Codes".
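A vectorized alternative to the row-wise apply is to parse the whole column once with pd.to_datetime and format it through the dt accessor. This is a sketch using the same sample values. %b gives abbreviated month names; %B would give full names such as April. The exact names depend on the active locale, and English is assumed here.

```python
import pandas as pd

df = pd.DataFrame({'year_month': ['202004', '202005', '202011', '202012']})

# Parse the YYYYMM strings in one call, then format the whole column as "Mon YYYY"
df['month_name'] = pd.to_datetime(df['year_month'], format='%Y%m').dt.strftime('%b %Y')
print(df)
```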
I have a full year of data at one-minute resolution:
dayofyear hourofday minuteofhour
1 0 0
.
.
365 23 57
365 23 58
365 23 59
I converted the dayofyear to a date:
df['date']=pd.to_datetime(df['dayofyear'], unit='D', origin=pd.Timestamp('2009-12-31'))
dayofyear hourofday minuteofhour date
1 0 0 2010-01-01
1 0 1 2010-01-01
1 0 2 2010-01-01
1 0 3 2010-01-01
1 0 4 2010-01-01
How can I combine the hourofday and minuteofhour with date in order to create a proper timestamp?
For example: '2010-12-30 19:00:00'.
That way I can do time-based filtering/subsetting etc. in pandas later.
Convert the hourofday and minuteofhour columns into a Timedelta, then add it to the date column:
df['timestamp'] = df['date'] + pd.to_timedelta(df['hourofday'].astype('str') + ':' + df['minuteofhour'].astype('str') + ':00')
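A variant of the line above that skips the 'H:M:00' string round trip and builds the offsets numerically with pd.to_timedelta's unit argument. This is a minimal sketch with two made-up rows in the question's layout.

```python
import pandas as pd

df = pd.DataFrame({'dayofyear': [1, 365], 'hourofday': [0, 23], 'minuteofhour': [4, 59]})
df['date'] = pd.to_datetime(df['dayofyear'], unit='D', origin=pd.Timestamp('2009-12-31'))

# Add hours and minutes as numeric timedeltas instead of parsing strings
df['timestamp'] = (df['date']
                   + pd.to_timedelta(df['hourofday'], unit='h')
                   + pd.to_timedelta(df['minuteofhour'], unit='m'))
print(df['timestamp'].tolist())
```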
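Another option is to build each timestamp row by row with the standard datetime module:
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame({
    'dayofyear': (365, ),
    'hourofday': (23, ),
    'minuteofhour': (57, ),
})
def parse_dt(x):
    # day 1 is 1 January, so add (dayofyear - 1) days to 2010-01-01
    dt = datetime(2010, 1, 1) + timedelta(days=int(x['dayofyear']) - 1)
    # cast to int because the row values are numpy integers
    dt = dt.replace(hour=int(x['hourofday']), minute=int(x['minuteofhour']))
    x['dt'] = dt
    return x
df = df.apply(parse_dt, axis=1)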
print(df)
# dayofyear hourofday minuteofhour dt
#0 365 23 57 2010-12-31 23:57:00
Hope this helps
My dataset has dates in the European format (dd/mm/yyyy), and I'm struggling to parse them correctly with pd.to_datetime: for every day of 12 or less, the month and day get swapped.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force pd.to_datetime to recognize that the input is formatted as dd/mm/yyyy?
Thanks for the help!
Edit: a sample of my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add an explicit format:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
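A small check of this answer on made-up sample strings. With an explicit format there is no day/month guessing, so '01/04/2018' is read as 1 April and '12/05/2018' as 12 May.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['31/03/2018', '01/04/2018', '28/02/2018', '12/05/2018']})

# An explicit day-first format: no ambiguity for days 1-12
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
print(df['Date'].dt.strftime('%Y-%m-%d').tolist())
```

A string that does not match the format raises an error instead of being silently reinterpreted, which makes bad rows easier to spot.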
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
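The same column-assembly idea can be written without a per-row lambda by using the vectorized str.split with expand=True. This is a sketch on the sample from this answer. It assumes every value has exactly three '/'-separated integer parts.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']})

# Split into three columns at once, convert to int, and name them as pd.to_datetime expects
date_parts = df['Date'].str.split('/', expand=True).astype(int)
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
print(df['Date'].dt.strftime('%Y-%m-%d').tolist())
```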
I have a DataFrame with dates in the following format:
12/31/2000 20:00 (month/day/year hours:minutes)
The issue is that some dates in the data set are invalid, for instance:
10/12/2003 24:00 should be 10/13/2003 00:00
This is what I get when I run dfUFO[wrongFormat]
So I have the following code in a pandas notebook to reformat these dates:
def convert2400ToTimestamp(x):
    date = pd.to_datetime(x.datetime.split(" ")[0], format='%m/%d/%Y')
    return date + pd.Timedelta(days=1)
wrongFormat = dfUFO.datetime.str.endswith("24:00", na=False)
dfUFO[wrongFormat] = dfUFO[wrongFormat].apply(convert2400ToTimestamp, axis=1)
This code results in
ValueError: Must have equal len keys and value when setting with an iterable
I don't really understand what this error means. Am I missing something?
EDIT: Changed to
dfUFO.loc[wrongFormat, 'datetime'] = dfUFO[wrongFormat].apply(convert2400ToTimestamp, axis=1)
But the datetime column now shows values like 1160611200000000000 for the date 10/11/2006.
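That integer is most likely the timestamp stored as nanoseconds since the Unix epoch, which is how pandas represents datetime64[ns] values internally. Decoding it gives 2006-10-12 00:00:00, i.e. 10/11/2006 plus one day. This suggests the conversion itself worked and the values only lost their datetime dtype when assigned back into the string column. A quick check:

```python
import pandas as pd

# The integer from the edit, read as nanoseconds since 1970-01-01
raw = 1160611200000000000
print(pd.to_datetime(raw, unit='ns'))
```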
You can parse your datetime column into "correctly named" parts and pass them to pd.to_datetime(), which rolls an hour of 24 over into the next day:
Source DF:
In [14]: df
Out[14]:
datetime
388 10/11/2006 24:00:00
693 10/1/2001 24:00:00
111 10/1/2001 23:59:59
Vectorized solution:
In [11]: pat = r'(?P<month>\d{1,2})\/(?P<day>\d{1,2})\/(?P<year>\d{4}) (?P<hour>\d{1,2})\:(?P<minute>\d{1,2})\:(?P<second>\d{1,2})'
In [12]: df.datetime.str.extract(pat, expand=True)
Out[12]:
month day year hour minute second
388 10 11 2006 24 00 00
693 10 1 2001 24 00 00
111 10 1 2001 23 59 59
In [13]: pd.to_datetime(df.datetime.str.extract(pat, expand=True))
Out[13]:
388 2006-10-12 00:00:00
693 2001-10-02 00:00:00
111 2001-10-01 23:59:59
dtype: datetime64[ns]
From the pandas docs for to_datetime:
Assembling a datetime from multiple columns of a DataFrame. The keys
can be common abbreviations like:
['year', 'month', 'day', 'minute', 'second','ms', 'us', 'ns']
or plurals of the same
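For comparison, here is a sketch of the question's own idea (shift 24:00 rows by one day) applied to the whole column at once. Working on the full column avoids assigning a differently shaped result into dfUFO[wrongFormat]. This is a swapped-in string-replace technique rather than the regex approach in this answer, and the sample strings are hypothetical values in the question's m/d/Y H:M format.

```python
import pandas as pd

df = pd.DataFrame({'datetime': ['10/11/2006 24:00', '10/1/2001 24:00', '10/1/2001 23:59']})

# Flag the rows that use the non-standard 24:00
is_2400 = df['datetime'].str.endswith('24:00')

# Rewrite 24:00 as 00:00 so every string parses with one format
fixed = df['datetime'].str.replace(r' 24:00$', ' 00:00', regex=True)
parsed = pd.to_datetime(fixed, format='%m/%d/%Y %H:%M')

# Add one day to the flagged rows (True -> 1 day, False -> 0 days)
df['datetime'] = parsed + pd.to_timedelta(is_2400.astype(int), unit='D')
print(df['datetime'].tolist())
```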