Preserving a Month and Day as Date Format in Python Pandas - python

I'm trying to take a column in yyyy-mm-dd format and convert it to mm-dd format (or MON DD, that works too), while preserving a date or numeric format. I've tried to use pd.to_datetime, but that doesn't seem to work because it requires the year, so it ends up padding the new columns with the year 1900. I'm not looking for a conversion in which the new column is an object, because I need to use the column to plot later on. What's the best approach? The data frame is pretty small.
OldDate NewDate1 NewDate2 NewDate3
2017-01-02 01-02 01/02 Jan 2
2015-05-14 05-14 05/14 May 14

Let's say you have:
import pandas as pd
df = pd.DataFrame({"OldDate":["2017-01-02","2015-05-14"]})
df
OldDate
0 2017-01-02
1 2015-05-14
Then you can do:
# Parse the strings into a real datetime64 column.
df['OldDate'] = pd.to_datetime(df['OldDate'], format="%Y-%m-%d")
# strftime returns strings (object dtype), so these columns are for display only.
df['NewDate1'] = df.OldDate.dt.strftime("%m-%d")
df['NewDate2'] = df.OldDate.dt.strftime("%m/%d")
df['NewDate3'] = df.OldDate.dt.strftime("%b %d")
df
OldDate NewDate1 NewDate2 NewDate3
0 2017-01-02 01-02 01/02 Jan 02
1 2015-05-14 05-14 05/14 May 14
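Note that strftime gives back strings. If the column has to stay a real datetime for plotting, one possible workaround (a sketch, not part of the answer above) is to move every date onto the same placeholder year. The year 2000 and the column name MonthDay are assumptions made for illustration; 2000 is a leap year, so Feb 29 survives.

```python
import pandas as pd

df = pd.DataFrame({"OldDate": ["2017-01-02", "2015-05-14"]})
dates = pd.to_datetime(df["OldDate"], format="%Y-%m-%d")

# Rebuild each date on a shared placeholder year (2000 is an assumption).
# pd.to_datetime assembles datetimes from year/month/day columns.
df["MonthDay"] = pd.to_datetime(
    pd.DataFrame({"year": 2000, "month": dates.dt.month, "day": dates.dt.day})
)

print(df.dtypes)
```

The MonthDay column keeps a datetime dtype, so it sorts and plots chronologically within the year; the tick labels can then be formatted to show only month and day.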

You can use the substring concept on OldDate as below:
OldDate = '2017-01-02'
NewDate1=OldDate[5:]
print(NewDate1) # This will give result as : "01-02"
NewDate2 = OldDate[5:7] + "/" + OldDate[8:10]
print(NewDate2) # This will give result as "01/02"
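The same slicing can be applied to a whole column through the .str accessor. This is a small sketch assuming the column holds strings in yyyy-mm-dd form; the result is still text, not a date.

```python
import pandas as pd

df = pd.DataFrame({"OldDate": ["2017-01-02", "2015-05-14"]})

# .str[...] slices every string in the column at once.
df["NewDate1"] = df["OldDate"].str[5:]
df["NewDate2"] = df["OldDate"].str[5:7] + "/" + df["OldDate"].str[8:10]

print(df)
```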

Related

How do I adjust the dates of a column in pandas according to a threshold?

I have a data frame with a datetime column like so:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
I want to know if there is a way to adjust the dates with this condition:
If the day of the date is before 15, then change the date to the end of last month.
If the day of the date is 15 or after, then change the date to the end of the current month.
My desired output would look something like this:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Using np.where and Josh's suggestion of MonthEnd, this can be simplified a bit.
Given:
dates
0 2017-09-19
1 2017-08-28
2 2017-07-13
Doing:
import numpy as np
from pandas.tseries.offsets import MonthEnd
# Where the day is less than 15,
# give the month end of the previous month.
# Otherwise,
# give the month end of the current month.
df.dates = np.where(df.dates.dt.day.lt(15),
                    df.dates.add(MonthEnd(-1)),
                    df.dates.add(MonthEnd(0)))
print(df)
# Output:
dates
0 2017-09-30
1 2017-08-31
2 2017-06-30
Easy with MonthEnd
Let's set up the data:
dates = pd.Series({0: '2017-09-19', 1: '2017-08-28', 2: '2017-07-13'})
dates = pd.to_datetime(dates)
Then:
from pandas.tseries.offsets import MonthEnd
pre, post = dates.dt.day < 15, dates.dt.day >= 15
dates.loc[pre] = dates.loc[pre] + MonthEnd(-1)
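dates.loc[post] = dates.loc[post] + MonthEnd(0)
Explanation: create masks (pre and post) first. Then use the masks to get the month end for either the previous or the current month, as appropriate. MonthEnd(0) rolls forward to the end of the current month and leaves a date that is already a month end unchanged, whereas MonthEnd(1) would push such a date to the end of the next month.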
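To see how the offsets behave on a date that already falls on a month end, here is a minimal comparison. The two timestamps are made-up examples, not data from the question.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid_month = pd.Timestamp("2017-08-28")
on_month_end = pd.Timestamp("2017-08-31")

# MonthEnd(0) rolls forward to the current month end and keeps a date
# that is already on a month end.
rolled_0 = [mid_month + MonthEnd(0), on_month_end + MonthEnd(0)]

# MonthEnd(1) moves a date that is already on a month end to the next one.
rolled_1 = [mid_month + MonthEnd(1), on_month_end + MonthEnd(1)]

print(rolled_0)
print(rolled_1)
```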

Python - Converting a column with weekly data to a datetime object

I have the following date column that I would like to transform to a pandas datetime object. Is it possible to do this with weekly data? For example, 1-2018 stands for week 1 in 2018 and so on. I tried the following conversion but I get an error message: Cannot use '%W' or '%U' without day and year
import pandas as pd
df1 = pd.DataFrame(columns=["date"])
df1['date'] = ["1-2018", "1-2018", "2-2018", "2-2018", "3-2018", "4-2018", "4-2018", "4-2018"]
df1["date"] = pd.to_datetime(df1["date"], format = "%W-%Y")
You need to add a day of the week to the datetime format. Prefixing '0' and parsing it with %w selects the Sunday of each week:
df1["date"] = pd.to_datetime('0' + df1["date"], format='%w%W-%Y')
print(df1)
Output
date
0 2018-01-07
1 2018-01-07
2 2018-01-14
3 2018-01-14
4 2018-01-21
5 2018-01-28
6 2018-01-28
7 2018-01-28
As the error message says, you need to specify the day of the week by adding %w :
df1["date"] = pd.to_datetime( '0'+df1.date, format='%w%W-%Y')
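If you would rather have each week represented by its Monday than by its Sunday, the same trick works with a different weekday digit. This is a sketch under the assumption that the week numbers follow the %W convention (weeks start on Monday); the "1-" prefix with a separator is my own variation on the answers above.

```python
import pandas as pd

weeks = pd.Series(["1-2018", "2-2018", "4-2018"])

# Prefix the weekday: with %w, 1 means Monday (0 would be Sunday).
mondays = pd.to_datetime("1-" + weeks, format="%w-%W-%Y")

print(mondays)
```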

How to get a date from year, month, week of month and Day of week in Pandas?

I have a Pandas dataframe, which looks like below
I want to create a new column, which tells the exact date from the information from all the above columns. The code should look something like this:
df['Date'] = pd.to_datetime(df['Month']+df['WeekOfMonth']+df['DayOfWeek']+df['Year'])
I was able to find a workaround for your case. You will need to define the dictionaries for the months and the days of the week.
month = {"Jan":"01", "Feb":"02", "Mar":"03", "Apr": "04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"}
week = {"Monday":1,"Tuesday":2,"Wednesday":3,"Thursday":4,"Friday":5,"Saturday":6,"Sunday":7}
With these dictionaries, the transformation that I used on a custom dataframe was:
rows = [["Dec",5,"Wednesday", "1995"],
["Jan",3,"Wednesday","2013"]]
df = pd.DataFrame(rows, columns=["Month","Week","Weekday","Year"])
df['Date'] = (df["Year"] + "-" + df["Month"].map(month) + "-" + (df["Week"].apply(lambda x: (x - 1)*7) + df["Weekday"].map(week).apply(int) ).apply(str)).astype('datetime64[ns]')
However, you have to be careful. Some of the example data you posted produces days that fall outside the month. For example, for
row = ["Oct",5,"Friday","2018"]
the computed date is 2018-10-33, which is not a valid date. I recommend filtering or validating your data to avoid this kind of problem.
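One way to catch such rows, sketched here with made-up strings rather than the original data, is to build the date text first and parse it with errors="coerce", which turns anything unparseable into NaT so it can be filtered out.

```python
import pandas as pd

# Hypothetical date strings: the second has an impossible day of month.
candidates = pd.Series(["1995-12-31", "2018-10-33"])

# Invalid dates become NaT instead of raising an error.
parsed = pd.to_datetime(candidates, format="%Y-%m-%d", errors="coerce")
valid = parsed.notna()

print(parsed)
print(valid)
```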
Let's approach it in 3 steps:
1. Get the month start date Month_Start from Year and Month.
2. Calculate the offset in days, DateOffset, relative to Month_Start from WeekOfMonth and DayOfWeek.
3. Get the actual date Date from Month_Start and DateOffset.
Here's the code:
import time
df['Month_Start'] = pd.to_datetime(df['Year'].astype(str) + df['Month'] + '01', format="%Y%b%d")
# tm_wday and dt.dayofweek both count Monday as 0.
df['DateOffset'] = (df['WeekOfMonth'] - 1) * 7 + df['DayOfWeek'].map(lambda x: time.strptime(x, '%A').tm_wday) - df['Month_Start'].dt.dayofweek
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')
Output:
Month WeekOfMonth DayOfWeek Year Month_Start DateOffset Date
0 Dec 5 Wednesday 1995 1995-12-01 26 1995-12-27
1 Jan 3 Wednesday 2013 2013-01-01 15 2013-01-16
2 Oct 5 Friday 2018 2018-10-01 32 2018-11-02
3 Jun 2 Saturday 1980 1980-06-01 6 1980-06-07
4 Jan 5 Monday 1976 1976-01-01 25 1976-01-26
The Date column now contains the dates derived from the information from other columns.
You can remove the working interim columns, if you like, as follows:
df = df.drop(['Month_Start', 'DateOffset'], axis=1)
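Because the offset is simply added to the month start, a large WeekOfMonth can spill into the following month, as in the Oct 2018 row of the output, which lands on 2018-11-02. A possible check, sketched on two of those rows with a hypothetical Overflow column, is to compare the month of the result with the month of Month_Start before dropping the interim columns.

```python
import pandas as pd

# Month_Start and DateOffset values taken from two rows of the output above.
df = pd.DataFrame({
    "Month_Start": pd.to_datetime(["1995-12-01", "2018-10-01"]),
    "DateOffset": [26, 32],
})
df["Date"] = df["Month_Start"] + pd.to_timedelta(df["DateOffset"], unit="D")

# Flag rows whose computed date left the intended month.
df["Overflow"] = df["Date"].dt.month != df["Month_Start"].dt.month

print(df)
```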

Split Date Time string (not in usual format) and pull out month

I have a dataframe that has a date time string but is not in traditional date time format. I would like to separate out the date from the time into two separate columns. And then eventually also separate out the month.
This is what the date/time string looks like: 2019-03-20T16:55:52.981-06:00
>>> df.head()
Date Score
2019-03-20T16:55:52.981-06:00 10
2019-03-07T06:16:52.174-07:00 9
2019-06-17T04:32:09.749-06:003 1
I tried this but got a type error:
df['Month'] = pd.DatetimeIndex(df['Date']).month
This can be done just using pandas itself. You can first convert the Date column to datetime by passing utc = True:
df['Date'] = pd.to_datetime(df['Date'], utc = True)
And then just extract the month using dt.month:
df['Month'] = df['Date'].dt.month
Output:
Date Score Month
0 2019-03-20 22:55:52.981000+00:00 10 3
1 2019-03-07 13:16:52.174000+00:00 9 3
2 2019-06-17 10:32:09.749000+00:00 1 6
From the documentation of pd.to_datetime you can see a parameter:
utc : boolean, default None
Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
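The question also asks for the date and time in separate columns. A minimal sketch of that, assuming the UTC-converted values are acceptable (the calendar date can differ from the local one near midnight), uses dt.date and dt.time. The column names DateOnly and TimeOnly are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-03-20T16:55:52.981-06:00", "2019-03-07T06:16:52.174-07:00"],
    "Score": [10, 9],
})

# Mixed UTC offsets are normalised to UTC.
df["Date"] = pd.to_datetime(df["Date"], utc=True)

df["DateOnly"] = df["Date"].dt.date   # datetime.date objects
df["TimeOnly"] = df["Date"].dt.time   # datetime.time objects
df["Month"] = df["Date"].dt.month

print(df)
```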

Extracting date components in pandas series

I have problems with transforming a Pandas dataframe column with dates to a number.
import matplotlib.dates
import datetime
for x in arsenalchelsea['Datum']:
    year = int(x[:4])
    month = int(x[5:7])
    day = int(x[8:10])
    hour = int(x[11:13])
    minute = int(x[14:16])
    sec = int(x[17:19])
    arsenalchelsea['floatdate']=date2num(datetime.datetime(year, month, day, hour, minute, sec))
arsenalchelsea
I want to make a new column in my dataframe with the dates as numbers, because I want to make a line graph later with the date on the x-axis.
This is the format of the date:
2017-11-29 14:06:45
Does anyone have a solution for this problem?
Slicing strings to get date components is bad practice. You should convert to datetime and extract directly.
In this case, it seems you can just use pd.to_datetime, but below I also demonstrate how you can extract the various components once you have performed the conversion.
df = pd.DataFrame({'Date': ['2017-01-15 14:55:42', '2017-11-10 12:15:21', '2017-12-05 22:05:45']})
df['Date'] = pd.to_datetime(df['Date'])
df[['year', 'month', 'day', 'hour', 'minute', 'sec']] = \
df['Date'].apply(lambda x: (x.year, x.month, x.day, x.hour, x.minute, x.second)).apply(pd.Series)
Result:
Date year month day hour minute sec
0 2017-01-15 14:55:42 2017 1 15 14 55 42
1 2017-11-10 12:15:21 2017 11 10 12 15 21
2 2017-12-05 22:05:45 2017 12 5 22 5 45
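For larger frames, the components can also be read straight from the .dt accessor, which avoids the row-wise apply. The sketch below also adds a numeric column, since that is what the question asks for; the days_since_epoch name and the choice of 1970-01-01 as the reference date are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2017-01-15 14:55:42", "2017-11-10 12:15:21"]})
df["Date"] = pd.to_datetime(df["Date"])

# Vectorised component access, no apply needed.
df["year"] = df["Date"].dt.year
df["hour"] = df["Date"].dt.hour

# Dates as floats: fractional days since 1970-01-01 (assumed reference date).
df["days_since_epoch"] = (df["Date"] - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)

print(df)
```

A datetime64 column can usually be passed to a plotting call as is, so the numeric column is only needed when a plain float axis is required.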
