I have the following date column that I would like to transform to a pandas datetime object. Is it possible to do this with weekly data? For example, 1-2018 stands for week 1 in 2018 and so on. I tried the following conversion but I get an error message: Cannot use '%W' or '%U' without day and year
import pandas as pd
df1 = pd.DataFrame(columns=["date"])
df1['date'] = ["1-2018", "1-2018", "2-2018", "2-2018", "3-2018", "4-2018", "4-2018", "4-2018"]
df1["date"] = pd.to_datetime(df1["date"], format = "%W-%Y")
You need to add a day of the week to the format. Prepending '0' and parsing it with %w pins each date to the Sunday of that week:
df1["date"] = pd.to_datetime('0' + df1["date"], format='%w%W-%Y')
print(df1)
Output
date
0 2018-01-07
1 2018-01-07
2 2018-01-14
3 2018-01-14
4 2018-01-21
5 2018-01-28
6 2018-01-28
7 2018-01-28
As the error message says, you need to specify the day of the week by adding %w :
df1["date"] = pd.to_datetime( '0'+df1.date, format='%w%W-%Y')
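If the Monday of each week is preferred over the Sunday, the same idea should work with a different day digit. This is a minimal sketch, assuming the strings follow the week-year layout from the question; the "1-" prefix, the extra dash in the format, and the name week_start are illustrative choices, not part of the original answers.

```python
import pandas as pd

df1 = pd.DataFrame({"date": ["1-2018", "2-2018", "4-2018"]})

# Prepend "1-" so that %w reads day-of-week 1 (Monday).
# %W is the week number with Monday as the first day of the week.
week_start = pd.to_datetime("1-" + df1["date"], format="%w-%W-%Y")
print(week_start)
```

Because %W counts weeks from the first Monday of the year, week 1 of 2018 starts on 2018-01-01, six days before the Sunday produced by the '0' prefix.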
I have a data frame with a datetime column like so:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
I want to know if there is a way to adjust the dates with this condition:
If the day of the date is before 15, then change the date to the end of last month.
If the day of the date is 15 or after, then change the date to the end of the current month.
My desired output would look something like this:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Using np.where and Josh's suggestion of MonthEnd, this can be simplified a bit.
Given:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
Doing:
import numpy as np
from pandas.tseries.offsets import MonthEnd
# Where the day is less than 15,
# give the month end of the previous month.
# Otherwise,
# give the month end of the current month.
df.dates = np.where(df.dates.dt.day.lt(15),
df.dates.add(MonthEnd(-1)),
df.dates.add(MonthEnd(0)))
print(df)
# Output:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Easy with MonthEnd
Let's set up the data:
import pandas as pd
dates = pd.Series({0: '2017-09-19', 1: '2017-08-28', 2: '2017-07-13'})
dates = pd.to_datetime(dates)
Then:
from pandas.tseries.offsets import MonthEnd
pre, post = dates.dt.day < 15, dates.dt.day >= 15
dates.loc[pre] = dates.loc[pre] + MonthEnd(-1)
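# MonthEnd(0) leaves a date that is already a month end unchanged;
# MonthEnd(1) would push such a date to the end of the next month.
dates.loc[post] = dates.loc[post] + MonthEnd(0)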
Explanation: create masks (pre and post) first. Then use the masks to either get month end for current or previous month, as appropriate.
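One subtlety with these offsets is how they treat a date that already falls on a month end. The sketch below compares MonthEnd(0) and MonthEnd(1) on a small made-up series; the names rolled_0 and rolled_1 are only for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# The second date is already a month end.
dates = pd.Series(pd.to_datetime(["2017-09-19", "2017-09-30", "2017-07-13"]))

# MonthEnd(0): roll forward to the month end, unless already on one.
rolled_0 = dates + MonthEnd(0)

# MonthEnd(1): move to the first month end strictly after the date.
rolled_1 = dates + MonthEnd(1)

print(pd.DataFrame({"date": dates, "MonthEnd(0)": rolled_0, "MonthEnd(1)": rolled_1}))
```

For mid-month dates both give the same result; they differ only for 2017-09-30, which stays put under MonthEnd(0) and becomes 2017-10-31 under MonthEnd(1).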
I have a table that has a column Months_since_Start_fin_year and a Date column. I need to add the number of months in the first column to the date in the second column.
DateTable['Date']=DateTable['First_month']+DateTable['Months_since_Start_fin_year'].astype("timedelta64[M]")
This works OK for month 0, but month 1 already has a different time and for month 2 onwards has the wrong date.
[Image: output table where the early months have the correct date, but month 2, where I would expect June 1st, actually shows May 31st]
It must be adding incomplete months, but I'm not sure how to fix it?
I have also tried
DateTable['Date']=DateTable['First_month']+relativedelta(months=DateTable['Months_since_Start_fin_year'])
but I get a type error that says
TypeError: cannot convert the series to <class 'int'>
My Months_since_Start_fin_year is type int32 and my First_month variable is datetime64[ns]
The problem with adding months as an offset to a date is that not all months are equally long (28-31 days), so you need pd.DateOffset, which handles that ambiguity for you. .astype("timedelta64[M]"), on the other hand, only gives you a fixed duration equal to the average month length within a year (30 days 10:29:06).
Ex:
import pandas as pd
# a synthetic example, since no minimal reproducible example was provided
df = pd.DataFrame({'start_date': 7*['2017-04-01'],
'month_offset': range(7)})
# make sure we have datetime dtype
df['start_date'] = pd.to_datetime(df['start_date'])
# add month offset
df['new_date'] = df.apply(lambda row: row['start_date'] +
pd.DateOffset(months=row['month_offset']),
axis=1)
which would give you e.g.
df
start_date month_offset new_date
0 2017-04-01 0 2017-04-01
1 2017-04-01 1 2017-05-01
2 2017-04-01 2 2017-06-01
3 2017-04-01 3 2017-07-01
4 2017-04-01 4 2017-08-01
5 2017-04-01 5 2017-09-01
6 2017-04-01 6 2017-10-01
You can find similar examples here on SO, e.g. Add months to a date in Pandas. I only modified the answer there by using an apply to be able to take the months offset from one of the DataFrame's columns.
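To illustrate how pd.DateOffset treats months of different lengths, here is a small sketch that builds one offset per row with a list comprehension instead of apply. The sample dates are made up to include month-end start dates, and the column names simply mirror the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": pd.to_datetime(["2017-04-01", "2017-01-31", "2016-01-31"]),
    "month_offset": [2, 1, 1],
})

# One DateOffset per row; int() guards against NumPy integer types.
df["new_date"] = [
    start + pd.DateOffset(months=int(months))
    for start, months in zip(df["start_date"], df["month_offset"])
]
print(df)
```

When the target month is shorter than the start day, the result is clipped to the last valid day: 2017-01-31 plus one month gives 2017-02-28, and 2016-01-31 plus one month gives 2016-02-29.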
I'm new to pandas and I would like to create a DataFrame for each weekday based on a bigger DataFrame with all kind of dates.
I read my initial data from a csv with the method data = pd.read_csv() and then my "Timestamp" column is set to datetime this way : data["Timestamp"] = pd.to_datetime(data["Timestamp"]).
code :
import pandas as pd
from datetime import datetime
import calendar
data = pd.read_csv("stat.csv")
data["Timestamp"] = pd.to_datetime(data["Timestamp"])
dataMonday = data.loc[calendar.day_name[datetime.weekday(data["Timestamp"])] == "Monday"]
Now, here is the output :
TypeError: descriptor 'weekday' for 'datetime.date' objects doesn't apply to a 'Series' object
The only way I've found so far is to iterate with a for loop in the Timestamp column, but it appears to be a bad solution since I can hardly create another Dataframe based on that.
Here is a way to add a column with the name of the weekday. This approach uses the .dt date accessor, and operates on the series, which is fast.
import pandas as pd
n = 8
t = pd.DataFrame({'x': [*range(n)],
'Timestamp': pd.date_range(start='2020-01-01', periods=n, freq='D')})
t['Timestamp'] = pd.to_datetime(t['Timestamp']) # not needed in this example
t['weekday'] = t['Timestamp'].dt.day_name()
print(t)
x Timestamp weekday
0 0 2020-01-01 Wednesday
1 1 2020-01-02 Thursday
2 2 2020-01-03 Friday
3 3 2020-01-04 Saturday
4 4 2020-01-05 Sunday
5 5 2020-01-06 Monday
6 6 2020-01-07 Tuesday
7 7 2020-01-08 Wednesday
As the error suggests, you're trying to apply the weekday function to the whole series. Instead, you need to apply the weekday method element-wise over the series; apply is the tool for this. Note that weekday returns an integer (Monday is 0), not a day name, so compare against 0:
dataMonday = data[data["Timestamp"].apply(datetime.weekday) == 0]
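Since the goal in the question is one DataFrame per weekday, a possible approach is to group on the weekday name and collect the groups in a dictionary. This is a sketch on synthetic data; frames_by_day and data_monday are hypothetical names, and the stat.csv file from the question is replaced by a generated date range.

```python
import pandas as pd

data = pd.DataFrame({
    "Timestamp": pd.date_range("2020-01-01", periods=8, freq="D"),
    "x": range(8),
})

# Group rows by weekday name and keep each group as its own DataFrame.
weekday_names = data["Timestamp"].dt.day_name()
frames_by_day = {name: group for name, group in data.groupby(weekday_names)}

data_monday = frames_by_day["Monday"]
print(data_monday)
```

Only weekdays that actually occur in the data become keys, so a lookup such as frames_by_day["Monday"] would raise a KeyError on data without any Monday.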
My dataset has dates in the European format, and I'm struggling to convert it into the correct format before I pass it through a pd.to_datetime, so for all day < 12, my month and day switch.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add format.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
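A short sketch of what the explicit format does, on made-up strings (the names raw and parsed are illustrative). The errors="coerce" argument is an optional addition, not part of the answer above; it turns strings that do not match the format into NaT instead of raising.

```python
import pandas as pd

raw = pd.Series(["31/03/2018", "01/04/2018", "2018-04-30"])

# An explicit format removes the day/month ambiguity.
# errors="coerce" maps non-matching strings to NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)
```

Here "01/04/2018" is read as 1 April 2018, and the ISO-style third entry becomes NaT because it does not follow dd/mm/yyyy.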
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
I am attempting to add a year to a column of dates in a pandas dataframe, but when I use pd.to_timedelta I get additional hours & minutes. I know I could take the updated time and truncate the hours, but I feel like there must be a way to add a year precisely. My attempt as follows:
import pandas as pd
dates = pd.DataFrame({'date':['20170101','20170102','20170103']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')
dates['date2'] = dates['date'] + pd.to_timedelta(1, unit='y')
dates
yields:
Out[1]:
date date2
0 2017-01-01 2018-01-01 05:49:12
1 2017-01-02 2018-01-02 05:49:12
2 2017-01-03 2018-01-03 05:49:12
How can I add a year without adding 05:49:12 HH:mm:ss?
In [99]: dates['date'] + pd.offsets.DateOffset(years=1)
Out[99]:
0 2018-01-01
1 2018-01-02
2 2018-01-03
Name: date, dtype: datetime64[ns]
leap year check:
In [100]: pd.to_datetime(['2011-02-28', '2012-02-29']) + pd.offsets.DateOffset(years=1)
Out[100]: DatetimeIndex(['2012-02-28', '2013-02-28'], dtype='datetime64[ns]', freq=None)
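You can normalize via pd.Series.dt.normalize (recent pandas versions no longer accept unit='y' in pd.to_timedelta, so this only applies to older releases):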
dates['date2'] = (dates['date'] + pd.to_timedelta(1, unit='y')).dt.normalize()
Or convert datetime to date
dates['date'] = dates['date'].apply(lambda a: a.date())
Edit: This works if you don't care about leap years, etc. Otherwise see jp_data_analysis's answer.
You can use 365 and unit='d':
pd.to_timedelta(365, unit='d')
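To make the leap-year caveat concrete, the sketch below compares a fixed 365-day timedelta with a calendar-aware pd.DateOffset(years=1) on dates in 2016, a leap year. The variable names are illustrative only.

```python
import pandas as pd

start = pd.Timestamp("2016-01-01")  # 2016 is a leap year (366 days)

# A fixed 365-day step lands one day short of the anniversary.
plus_365_days = start + pd.Timedelta(days=365)

# A calendar-aware offset lands on the same month and day one year later.
plus_one_year = start + pd.DateOffset(years=1)

# 29 February has no counterpart in 2017, so the offset clips to 28 February.
leap_day_plus_one_year = pd.Timestamp("2016-02-29") + pd.DateOffset(years=1)

print(plus_365_days, plus_one_year, leap_day_plus_one_year)
```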
You can access the components of a date (year, month and day) using code of the form dataframe["column"].dt.component.
For example, the month component is dataframe["column"].dt.month, and the year component is dataframe["column"].dt.year.
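A minimal sketch of the .dt accessor on a made-up frame, reusing the placeholder names dataframe and "column" from the description above:

```python
import pandas as pd

dataframe = pd.DataFrame({"column": pd.to_datetime(["2018-03-31", "2017-12-01"])})

# Each component comes back as an integer Series aligned with the original index.
dataframe["year"] = dataframe["column"].dt.year
dataframe["month"] = dataframe["column"].dt.month
dataframe["day"] = dataframe["column"].dt.day
print(dataframe)
```

The accessor only exists on datetime-like columns, so a column of strings needs pd.to_datetime first.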