How to offset Date to the beginning of the month? - python

I have the data frame that goes more or less like this:
Date x y z
1998-01-30 000445 Abbey National Plc 2.24455118179321
1998-01-30 001097 Mytravel Group 1.55792689323425
The 'Date' column is datetime64[ns] type and I would like to offset the 'Date' column so that my date would shift to the beginning of the month, so this should go like this:
df['New Date'] = df['Date'].offsets.MonthBegin()
But returns an error:
AttributeError: 'Series' object has no attribute 'offsets'
Why so? A single df column is a Series, right?
type(df['Date'])
Out[83]: pandas.core.series.Series

You could try
df['New_date'] = df.set_index('Date').index.to_period('M').to_timestamp('D')
This assumes that Date is already a datetime column. If it isn't, convert it first using:
df['Date'] = pd.to_datetime(df['Date'])
It's not essential, but it is good practice to use underscores instead of spaces in column names.
So New_date instead of New Date. Possibly make it lowercase as well.
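As for the AttributeError: the offset classes live under pd.offsets, they are not an attribute of a Series. Below is a minimal sketch of a Series-based variant of the approach above, assuming the column is already a datetime dtype. The sample frame is made up to resemble the question's data, and the column name New_date is only illustrative.

```python
import pandas as pd

# Sample frame modelled on the question (values are illustrative).
df = pd.DataFrame({
    "Date": pd.to_datetime(["1998-01-30", "1998-02-01", "1998-03-15"]),
    "z": [2.24, 1.56, 0.98],
})

# Convert each date to its monthly period, then back to a timestamp,
# which lands on the first day of that month.
df["New_date"] = df["Date"].dt.to_period("M").dt.to_timestamp()

print(df)
```

Subtracting pd.offsets.MonthBegin(1) from the column is another option, but it moves dates that already fall on the first of a month back to the previous month, so the period round-trip is the safer choice here.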

Related

How to remove the time from datetime of the pandas Dataframe. The type of the column is str and objects, but the value is datetime [duplicate]

I have a variable consisting of 300k records with dates, and the dates look like
2015-02-21 12:08:51
From that date I want to remove the time.
The type of the date variable is pandas.core.series.Series.
This is what I tried:
from datetime import datetime,date
date_str = textdata['vfreceiveddate']
format_string = "%Y-%m-%d"
then = datetime.strftime(date_str,format_string)
some Random ERROR
In the above code, textdata is my dataset name and vfreceiveddate is a variable consisting of dates.
How can I write the code to remove the time from the datetime?
Assuming all your datetime strings are in a similar format, just convert them to datetime using to_datetime and then use the dt.date attribute to get just the date portion:
In [37]:
df = pd.DataFrame({'date':['2015-02-21 12:08:51']})
df
Out[37]:
date
0 2015-02-21 12:08:51
In [39]:
df['date'] = pd.to_datetime(df['date']).dt.date
df
Out[39]:
date
0 2015-02-21
EDIT
If you want to keep the datetime dtype and only reset the time component to midnight, you can call dt.normalize:
In[10]:
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df
Out[10]:
date
0 2015-02-21
You can see that the dtype remains as datetime:
In[11]:
df.dtypes
Out[11]:
date datetime64[ns]
dtype: object
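The practical difference between the two approaches above is the resulting dtype. Here is a small side-by-side sketch; the second sample row and the variable names as_date and as_midnight are made up for illustration.

```python
import datetime
import pandas as pd

df = pd.DataFrame({"date": ["2015-02-21 12:08:51", "2015-03-01 00:15:00"]})
parsed = pd.to_datetime(df["date"])

# dt.date yields plain datetime.date objects, so the column becomes object dtype.
as_date = parsed.dt.date

# dt.normalize keeps a datetime64 dtype and sets the time to 00:00:00.
as_midnight = parsed.dt.normalize()

print(as_date.dtype)
print(as_midnight.dtype)
```

If you plan to keep using the .dt accessor or do date arithmetic on the column afterwards, the normalized version is usually the more convenient one, since an object column of date values no longer supports .dt.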
You're calling datetime.datetime.strftime, which requires as its first argument a datetime.datetime instance, because it's an unbound method; but you're passing it a string instead of a datetime instance, whence the obvious error.
You can work purely at a string level if that's the result you want; with the data you give as an example, date_str.split()[0] for example would be exactly the 2015-02-21 string you appear to require.
Or, you can use datetime, but then you need to parse the string first, not format it -- hence, strptime, not strftime:
dt = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
date = dt.date()
if it's a datetime.date object you want (but if all you want is the string form of the date, such an approach might be "overkill":-).
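For a single string, the two routes described above can be sketched as follows. This assumes date_str is one plain string in the question's format, not a whole pandas Series.

```python
from datetime import datetime

# A single string, taken from the question's sample value.
date_str = "2015-02-21 12:08:51"

# String-only route: keep everything before the first whitespace.
date_part = date_str.split()[0]

# datetime route: parse the full string, then keep only the date component.
parsed = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
only_date = parsed.date()

print(date_part)
print(only_date.isoformat())
```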
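Simply writing date.strftime("%d-%m-%Y") will drop the hours, minutes and seconds. Note that this format string produces day-month-year order; use "%Y-%m-%d" to keep the original order.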

pandas: Extracting only the month-day values for rows from a column

I want to plot a line graph for my data, but the x-axis labels are packed extremely tightly because of the long date format (Y-M-D). I checked the data type of 'date' and it returned:
In[200]: df['date'].dtypes
Out[200]: dtype('O')
So my 'date' values are:
date
----
2020-04-12
2020-05-13
2020-02-02
but I want to extract only the month and day to make the column look like
date
----
04-12
05-13
02-02
How should I do this? I apologise for dupes as I couldn't find anything similar due to my datatype being 'O'. Appreciate all the help!
Use Series.str.split and select the second element of each resulting list by indexing with str[1]:
df['date'] = df['date'].str.split('-', n=1).str[1]
# if the column holds date objects rather than strings
#df['date'] = df['date'].astype(str).str.split('-', n=1).str[1]
print (df)
date
0 04-12
1 05-13
2 02-02
Or convert to datetimes with to_datetime and format them with Series.dt.strftime:
df['date'] = pd.to_datetime(df['date']).dt.strftime('%m-%d')
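Since the goal is a readable plot axis, one possible variant is to keep the parsed dates for ordering and put the short labels in a separate column. This is only a sketch; the month_day column name is an assumption and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2020-04-12", "2020-05-13", "2020-02-02"]})

# Parse once so the full dates stay available for sorting and plotting.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")

# Separate string column intended only for tick labels.
df["month_day"] = df["date"].dt.strftime("%m-%d")

print(df)
```

Overwriting the date column with the "%m-%d" strings, as in the one-liners above, discards the year, so rows from different years would no longer sort chronologically.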

Using DataFrame Index Dates for Date Column Creation

I am trying to reduce the code bloat in my project for the process of creating various date columns (weekday, business day, day index, week index) and I was wondering how I can take the index of my dataframe and build datetime attribute columns from the index.
I thought I could access the .index or index.values and then reference the datetime attributes like month, weekday, etc., but it doesn't appear that Index has those attributes. Would I need to convert the index values to a new list and then build the columns off of that?
Here is my code:
historicals = pd.read_csv("2018-2019_sessions.csv", index_col="date", na_values=0)
type(historicals)
# date format = 2018-01-01, 2018-01-02, etc.
# Additional Date Fields
date_col = historicals.index
date_col.weekday
# AttributeError: 'Index' object has no attribute 'weekday'
Your index is in string format. Your historicals.index probably looks like this:
print(historicals.index)
Index(['2018-01-01', '2018-01-02'], dtype='object')
You need to convert it to a DatetimeIndex, get its weekday attribute, and assign that to a new column:
historicals['weekday'] = pd.to_datetime(historicals.index).weekday
Or
date_col = pd.to_datetime(historicals.index)
print(date_col.weekday)
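Building several date columns from the converted index can be sketched like this. The small frame stands in for the CSV from the question, and the sessions values and the new column names are made up for illustration.

```python
import pandas as pd

# Stand-in for the CSV in the question: string dates as the index.
historicals = pd.DataFrame(
    {"sessions": [120, 95, 130]},
    index=pd.Index(["2018-01-05", "2018-01-06", "2018-01-08"], name="date"),
)

# Convert the string index to a DatetimeIndex once, then reuse it.
historicals.index = pd.to_datetime(historicals.index)
idx = historicals.index

historicals["weekday"] = idx.weekday              # Monday=0 ... Sunday=6
historicals["is_business_day"] = idx.weekday < 5  # Mon-Fri only, ignores holidays
historicals["day_of_year"] = idx.dayofyear
historicals["month"] = idx.month

print(historicals)
```

Passing parse_dates=True together with index_col="date" to pd.read_csv should parse the index at load time, which would make the explicit conversion step unnecessary.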

Convert Raw Date into Year / Month / Day of Week in Pandas

I have a Pandas dataframe with raw dates formatted as such "19990130". I want to convert these into new columns: 'year', 'month', and 'dayofweek'.
I tried using the following:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values
Which does give me an array of datetime objects. However, the next step I tried was using .to_pydatetime() and then .year to try to get the year out, like this:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values.to_pydatetime().year
This works when I test a single value, but with a Pandas dataframe I get:
'numpy.ndarray' object has no attribute 'to_pydatetime'
What's the easiest way to extract the year, month, and day of week from this data?
Try:
s = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
s.dt.year
# or
# s.dt.month, etc
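Put together, creating the three requested columns might look like the sketch below. The sample values are invented, using the "19990130" format from the question.

```python
import pandas as pd

df = pd.DataFrame({"date": ["19990130", "19990201", "20001231"]})

# Parse the raw YYYYMMDD strings; unparseable values would become NaT.
s = pd.to_datetime(df["date"], format="%Y%m%d", errors="coerce")

df["year"] = s.dt.year
df["month"] = s.dt.month
df["dayofweek"] = s.dt.dayofweek  # Monday=0 ... Sunday=6

print(df)
```

The key point is to stay on the Series and use the .dt accessor, rather than dropping to .values, which returns a NumPy array without these attributes. If any value fails to parse, the corresponding entries in the new columns will be NaN and the columns become float.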

How to apply to_datetime on pandas Dataframe column?

I have a dataframe with Timestamp entries in one column, created from strings like so:
df = pd.DataFrame({"x": [pd.to_datetime("MARCH2016")]})
Now I want to select from df based on month, cutting across years, by accessing the .month attribute of the datetime object. However, to_datetime actually created a Timestamp object from the string, and I can't seem to coerce it to datetime. The following works as expected:
type(df.x[0].to_datetime()) # gives datetime object
but using apply (which in my real life example of course I want to do given that I have more than one row) doesn't:
type(df.x.apply(pd.to_datetime)[0]) # returns Timestamp
What am I missing?
The fact that it's a Timestamp is irrelevant here; you can still access the month attribute using the .dt accessor:
In [79]:
df = pd.DataFrame({"x": [pd.to_datetime("MARCH2016")]})
df['x'].dt.month
Out[79]:
0 3
Name: x, dtype: int64
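The actual goal in the question, selecting rows by month across years, then becomes a boolean mask on .dt.month. A minimal sketch with invented dates:

```python
import pandas as pd

df = pd.DataFrame({
    "x": pd.to_datetime(["2015-03-10", "2016-03-01", "2016-07-04"]),
})

# Keep every row whose month is March, regardless of the year.
march_rows = df[df["x"].dt.month == 3]

print(march_rows)
```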
