I have a data frame that looks more or less like this:
Date x y z
1998-01-30 000445 Abbey National Plc 2.24455118179321
1998-01-30 001097 Mytravel Group 1.55792689323425
The 'Date' column is of type datetime64[ns], and I would like to offset it so that each date shifts to the beginning of its month. I expected this to work:
df['New Date'] = df['Date'].offsets.MonthBegin()
But it returns an error:
AttributeError: 'Series' object has no attribute 'offsets'
Why is that? A single df column is a Series, right?
type(df['Date'])
Out[83]: pandas.core.series.Series
You could try
df['New_date'] = df.set_index('Date').index.to_period('M').to_timestamp('D')
This assumes that Date is already a datetime column. If it isn't, convert it first:
df['Date'] = pd.to_datetime(df['Date'])
It's not essential, but it is good practice to separate the words in a column name with an underscore.
So New_date instead of New Date. Possibly make it lowercase as well.
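As for why the original attempt fails: offsets such as MonthBegin live in the pd.offsets module and are not attributes of a Series. A minimal sketch of the period round trip applied directly to the column through the .dt accessor, assuming made-up sample dates and a hypothetical new_date column name:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's 'Date' column.
df = pd.DataFrame(
    {"Date": pd.to_datetime(["1998-01-30", "1998-02-01", "1998-03-15"])}
)

# Convert each date to its monthly period, then back to a timestamp
# at the start of that period (the first day of the month).
df["new_date"] = df["Date"].dt.to_period("M").dt.to_timestamp()

print(df)
```

Dates that already fall on the first of the month stay where they are with this approach.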
I have a variable consisting of 300k records with dates, and the dates look like
2015-02-21 12:08:51
From that date I want to remove the time.
The type of the date variable is pandas.core.series.Series.
This is what I tried:
from datetime import datetime,date
date_str = textdata['vfreceiveddate']
format_string = "%Y-%m-%d"
then = datetime.strftime(date_str,format_string)
some Random ERROR
In the above code, textdata is my dataset name and vfreceiveddate is a variable consisting of dates.
How can I write the code to remove the time from the datetime?
Assuming all your datetime strings are in a similar format then just convert them to datetime using to_datetime and then call the dt.date attribute to get just the date portion:
In [37]:
df = pd.DataFrame({'date':['2015-02-21 12:08:51']})
df
Out[37]:
date
0 2015-02-21 12:08:51
In [39]:
df['date'] = pd.to_datetime(df['date']).dt.date
df
Out[39]:
date
0 2015-02-21
EDIT
If you just want to change the display and not the dtype then you can call dt.normalize:
In[10]:
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df
Out[10]:
date
0 2015-02-21
You can see that the dtype remains as datetime:
In[11]:
df.dtypes
Out[11]:
date datetime64[ns]
dtype: object
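The difference between the two options can be checked side by side. This is a small sketch with made-up sample strings; the variable names parsed, as_dates, and normalized are only for illustration:

```python
import pandas as pd

# Made-up sample strings in the same format as the question.
df = pd.DataFrame({"date": ["2015-02-21 12:08:51", "2015-02-22 00:30:00"]})
parsed = pd.to_datetime(df["date"])

# .dt.date yields Python datetime.date objects, so the dtype becomes object.
as_dates = parsed.dt.date

# .dt.normalize() keeps the datetime dtype and sets the time to midnight.
normalized = parsed.dt.normalize()

print(as_dates.dtype)
print(normalized.dtype)
```

Keeping the datetime dtype is usually the more convenient choice if you still need the .dt accessor afterwards.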
You're calling datetime.datetime.strftime, which requires as its first argument a datetime.datetime instance, because it's an unbound method; but you're passing it a string instead of a datetime instance, whence the obvious error.
You can work purely at a string level if that's the result you want; with the data you give as an example, date_str.split()[0] for example would be exactly the 2015-02-21 string you appear to require.
Or, you can use datetime, but then you need to parse the string first, not format it -- hence, strptime, not strftime:
dt = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
date = dt.date()
if it's a datetime.date object you want (but if all you want is the string form of the date, such an approach might be "overkill":-).
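Both routes can be sketched for a single string as follows. Note the assumption that date_str holds one string, as in the example value, rather than a whole pandas Series:

```python
from datetime import datetime

# A single string, as in the question's example value.
date_str = "2015-02-21 12:08:51"

# String-level approach: keep everything before the first whitespace.
date_only_text = date_str.split()[0]

# datetime approach: parse first (strptime), then take the date part.
parsed = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
date_only = parsed.date()

print(date_only_text)
print(date_only)
```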
Simply calling
date.strftime("%d-%m-%Y") on a datetime object gives a string without the hour, minute, and second.
I want to plot a line graph for my data; however, the x-axis labels become extremely crowded due to the long date format (Y-M-D). I've checked the data type for 'date' and it returned:
In[200]: df['date'].dtypes
Out[200]: dtype('O')
So my 'date' values are:
date
----
2020-04-12
2020-05-13
2020-02-02
but I want to extract only the month and day to make the column look like
date
----
04-12
05-13
02-02
How should I do this? I apologise for dupes as I couldn't find anything similar due to my datatype being 'O'. Appreciate all the help!
Use Series.str.split and select the second list element by indexing with str[1]:
df['date'] = df['date'].str.split('-', n=1).str[1]
# if the column holds date objects, convert to strings first
#df['date'] = df['date'].astype(str).str.split('-', n=1).str[1]
print (df)
date
0 04-12
1 05-13
2 02-02
Or convert to datetimes with to_datetime and format them with Series.dt.strftime:
df['date'] = pd.to_datetime(df['date']).dt.strftime('%m-%d')
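A quick check that both approaches give the same result, using the three dates from the question (the names by_split and by_strftime are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"date": ["2020-04-12", "2020-05-13", "2020-02-02"]})

# String approach: split once on the first '-' and keep the remainder.
by_split = df["date"].str.split("-", n=1).str[1]

# Datetime approach: parse, then format as month-day.
by_strftime = pd.to_datetime(df["date"]).dt.strftime("%m-%d")

print(by_split.tolist())
print(by_strftime.tolist())
```

Both results are plain strings, so they will sort and plot as text rather than as dates.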
I am trying to reduce the code bloat in my project for the process of creating various date columns (weekday, business day, day index, week index) and I was wondering how I can take the index of my dataframe and build datetime attribute columns from the index.
I thought I could access the .index or index.values and then reference the datetime attributes like month, weekday, etc., but it doesn't appear that Index has those attributes. Would I need to convert the index values to a new list and then build the columns off of that?
Here is my code:
historicals = pd.read_csv("2018-2019_sessions.csv", index_col="date", na_values=0)
type(historicals)
# date format = 2018-01-01, 2018-01-02, etc.
# Additional Date Fields
date_col = historicals.index
date_col.weekday
# AttributeError: 'Index' object has no attribute 'weekday'
Your index is in string format. Your historicals.index probably looks like this:
print(historicals.index)
Index(['2018-01-01', '2018-01-02'], dtype='object')
You need to convert it to a DatetimeIndex, read its weekday attribute, and assign the result to a new column:
historicals['weekday'] = pd.to_datetime(historicals.index).weekday
Or
date_col = pd.to_datetime(historicals.index)
print(date_col.weekday)
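To build several attribute columns without repeating the conversion, one option is to convert the index once and read the attributes from it. A minimal sketch, assuming a small in-memory frame in place of the CSV and a made-up sessions column:

```python
import pandas as pd

# Stand-in for the CSV: a string index named 'date' (sample data is made up).
historicals = pd.DataFrame(
    {"sessions": [10, 12]},
    index=pd.Index(["2018-01-01", "2018-01-02"], name="date"),
)

# Convert the index once; it becomes a DatetimeIndex.
historicals.index = pd.to_datetime(historicals.index)

# Datetime attributes are now available directly on the index.
historicals["weekday"] = historicals.index.weekday  # Monday is 0
historicals["month"] = historicals.index.month

print(historicals)
```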
I have a Pandas dataframe with raw dates formatted as such "19990130". I want to convert these into new columns: 'year', 'month', and 'dayofweek'.
I tried using the following:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values
Which does give me an array of datetime objects. However, the next step I tried was using .to_pydatetime() and then .year to try to get the year out, like this:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values.to_pydatetime().year
This works when I test a single value, but with a Pandas dataframe I get:
'numpy.ndarray' object has no attribute 'to_pydatetime'
What's the easiest way to extract the year, month, and day of week from this data?
Try:
s = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
s.dt.year
# or
# s.dt.month, etc
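Spelled out for all three requested columns, this could look like the sketch below. The sample values are made up and assumed to be strings in the "19990130" style:

```python
import pandas as pd

# Made-up sample data in the question's raw format.
df = pd.DataFrame({"date": ["19990130", "19990201"]})

# Parse once; values that do not match the format become NaT.
s = pd.to_datetime(df["date"], format="%Y%m%d", errors="coerce")

df["year"] = s.dt.year
df["month"] = s.dt.month
df["dayofweek"] = s.dt.dayofweek  # Monday is 0, Sunday is 6

print(df)
```

If any value fails to parse, the corresponding entries in the new columns are NaN and those columns become floating point.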
I have a dataframe with Timestamp entries in one column, created from strings like so:
df = pd.DataFrame({"x": pd.to_datetime("MARCH2016")})
Now I want to select from df based on month, cutting across years, by accessing the .month attribute of the datetime object. However, to_datetime actually created a Timestamp object from the string, and I can't seem to coerce it to datetime. The following works as expected:
type(df.x[0].to_datetime()) # gives datetime object
but using apply (which in my real life example of course I want to do given that I have more than one row) doesn't:
type(df.x.apply(pd.to_datetime)[0]) # returns Timestamp
What am I missing?
The fact that it's a Timestamp is irrelevant here; you can still access the month attribute using the .dt accessor:
In [79]:
df = pd.DataFrame({"x": [pd.to_datetime("MARCH2016")]})
df['x'].dt.month
Out[79]:
0 3
Name: x, dtype: int64
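For the actual goal of selecting rows by month across years, the .dt.month values can be used as a boolean mask. A short sketch with made-up dates (march_rows is just an illustrative name):

```python
import pandas as pd

# Made-up dates spanning two years.
df = pd.DataFrame(
    {"x": pd.to_datetime(["2015-03-10", "2016-03-01", "2016-07-04"])}
)

# Keep every row whose month is March, regardless of year.
march_rows = df[df["x"].dt.month == 3]

print(march_rows)
```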