I have a column from a pandas Dataframe that I want to use as input for np.busday_count:
np.busday_count(df['date_from'].tolist(), df['date_to_plus_one'].tolist(), weekmask='1000000')
I have always used .tolist(), but since one of the recent updates this results in an error:
> TypeError: Iterator operand 0 dtype could not be cast from
> dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
The column df['date_from'] is of type dtype: datetime64[ns].
Any tips or solutions for this?
Try using
df['date_from'].dt.date
(a Series has no .date() method of its own; the date accessor lives under .dt).
The column df['date_from'], with dtype datetime64[ns], contains values like
2018-04-06 00:00:00, which is a timestamp.
But np.busday_count takes dates as input, such as a datetime.date or "2018-04-06".
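An alternative that avoids Python date objects is to cast the column to day precision before the call. The following is a minimal sketch with made-up sample data; the column names follow the question, and the dates are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "date_from": pd.to_datetime(["2018-04-02", "2018-04-06"]),
    "date_to_plus_one": pd.to_datetime(["2018-04-09", "2018-04-20"]),
})

# Cast explicitly to day precision, so NumPy does not have to perform
# a lossy cast under the 'safe' rule.
start = df["date_from"].to_numpy().astype("datetime64[D]")
end = df["date_to_plus_one"].to_numpy().astype("datetime64[D]")

# weekmask='1000000' counts Mondays only; the end date is exclusive.
mondays = np.busday_count(start, end, weekmask="1000000")
print(mondays)
```

The explicit astype("datetime64[D]") truncates any time-of-day component, which should be harmless here because the values are all at midnight.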
Related
I have a csv file with a column "graduated" which either shows the date of graduation, or 0 if there is no graduation yet.
df.dtypes returns 'object' for this column. I want to turn all the dates into a '1' (indicating that the person in that row graduated). How can I do that?
Use pandas.to_datetime to convert dates and convert to boolean series. Then, cast it to int to get the desired result.
pd.to_datetime(df.graduated, errors='coerce').notnull().astype(int)
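As a small illustration of the one-liner above, here is a sketch on invented data. It assumes the dates are stored as ISO strings (YYYY-MM-DD) and the missing entries as the string "0"; the explicit format argument and the new column name graduated_flag are assumptions added for the example, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: ISO date strings, or "0" when there is no graduation yet.
df = pd.DataFrame({"graduated": ["2015-06-30", "0", "2017-01-15", "0"]})

# Anything that does not match the format becomes NaT instead of raising.
parsed = pd.to_datetime(df["graduated"], format="%Y-%m-%d", errors="coerce")

# A parsed date means "graduated" (1); NaT means "not yet" (0).
df["graduated_flag"] = parsed.notna().astype(int)
print(df)
```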
Currently, dataframe.dtypes outputs:
age int64
gender object
date datetime64[ns]
time datetime64[ns]
dtype: object
I want the output to only have date and time columns, or conversely, only the columns with type datetime64[ns], i.e. the output should be:
date datetime64[ns]
time datetime64[ns]
dtype: object
I tried various methods such as using dataframe.select_dtypes, but none of them exactly match the required output.
You can select a subset of your df with fewer columns:
df[['date', 'time']].dtypes
More generally, to get all columns with a datetime64 type, do
import numpy as np
tt = df.dtypes
print(tt[tt.apply(lambda x: np.issubdtype(x, np.datetime64))])
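Another way to get the same result is to test each column with pandas' own dtype helper, which should also recognise timezone-aware columns. This is only a sketch; the sample frame mirrors the dtypes listed in the question, and its values are invented.

```python
import pandas as pd

# Hypothetical frame with the same column names as in the question.
df = pd.DataFrame({
    "age": [31],
    "gender": ["F"],
    "date": pd.to_datetime(["2020-01-01"]),
    "time": pd.to_datetime(["2020-01-01 12:30:00"]),
})

# Keep only the columns whose dtype is some flavour of datetime64.
datetime_cols = [
    col for col in df.columns
    if pd.api.types.is_datetime64_any_dtype(df[col])
]
print(df[datetime_cols].dtypes)
```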
I have a column in my dataframe in this format:
2013-01-25 00:00:00+00:00
non-null datetime64[ns, UTC]
I would like to convert this to daily format, like this:
2013-01-25
I tried this approach, but have been receiving an error:
df['date_column'].date()
AttributeError: 'Series' object has no attribute 'date'
The error message is not quite clear to me, because according to df.info() the column should hold datetime objects.
Can anyone suggest an approach for doing this?
In short: it is not advisable to convert to date objects, since you then lose a lot of functionality for inspecting the dates. It might be better to just use dt.floor(..) or dt.normalize(..).
You can convert a series of strings with pd.to_datetime(..), for example:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00']))
0 2013-01-25
dtype: datetime64[ns]
We can then later convert this to date objects with .dt.date:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00'])).dt.date
0 2013-01-25
dtype: object
Note that a date is not a native NumPy type, and thus the Series will hold Python date(..) objects. A disadvantage of this is that you can no longer process the values as datetime-like objects, so the Series loses a lot of functionality.
It might be better to just use dt.floor(..) to floor to the day, and thus keep it a datetime64[ns] object:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00'])).dt.floor(freq='d')
0 2013-01-25
dtype: datetime64[ns]
We can use dt.normalize(..) as well. This just sets the time component to 0:00:00, and leaves the timezone unaffected:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00'])).dt.normalize()
0 2013-01-25
dtype: datetime64[ns]
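The three options can be compared side by side. The sketch below uses an invented timestamp with a non-midnight time so the effect is visible, and passes utc=True so the result is timezone-aware like the column in the question.

```python
import pandas as pd

# Hypothetical timezone-aware value with a time-of-day component.
s = pd.to_datetime(pd.Series(["2013-01-25 13:45:00+00:00"]), utc=True)

as_dates = s.dt.date          # Python datetime.date objects
floored = s.dt.floor("D")     # still datetime-like, time reset to midnight
normalized = s.dt.normalize() # same idea, also keeps the timezone

print(as_dates.iloc[0])
print(floored.iloc[0])
print(normalized.iloc[0])
```

Only the last two keep the .dt accessor available for later date arithmetic.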
I have to compare two columns containing date values and find the difference between the 2 dates.
Out of the 2 columns, one is of datetime type, but the other is of object type.
When trying to convert the object type to datetime using:
final['Valid to']=pd.to_datetime(final['Valid to'])
I am getting an error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
How can I convert the object-type column to datetime so that I can compare the columns and get the required result?
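Use the format parameter to give the exact format of the strings, so that to_datetime knows what kind of string it is converting into a datetime object.
In your case it would be something like
pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S')
Note that a format alone does not widen the range of nanosecond timestamps, which ends in the year 2262, so a placeholder such as 9999-12-31 may still need errors='coerce' or separate handling.
Please post sample data, as someone has already asked in the comments; that would help in giving a correct answer.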
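A minimal sketch of the coercing variant follows, on invented data. It assumes the strings really use the format shown above and that a placeholder like 9999-12-31 can be treated as missing. On pandas versions that store nanosecond timestamps the placeholder should come back as NaT; newer versions with coarser default resolutions may be able to keep it as a real date.

```python
import pandas as pd

# Hypothetical data: one ordinary date and one "never expires" placeholder.
final = pd.DataFrame({
    "Valid to": ["2019-03-31 00:00:00", "9999-12-31 00:00:00"],
})

# errors='coerce' turns values that cannot be represented into NaT
# instead of raising OutOfBoundsDatetime.
final["Valid to"] = pd.to_datetime(
    final["Valid to"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)
print(final)
```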
I have a dataframe called dfactual. It has a column ForeCastEndDate, so
dfactual['ForeCastEndDate'] contains:
311205
311205
This must be a date in the format 31-12-2005, but the current dtype is int64. I tried the following:
dfactual['ForeCastEndDate'] = pd.to_datetime(pd.Series(dfactual['ForecastEndDate']))
I also tried adding the format argument, but it didn't work: the dtype stays the same, int64.
How should I do it?
You can't use to_datetime with a format string on dtypes that are not str, so you need to convert the dtype using astype first; then you can use to_datetime and pass the format string:
In [154]:
df = pd.DataFrame({'ForecastEndDate':[311205]})
pd.to_datetime(df['ForecastEndDate'].astype(str), format='%d%m%y')
Out[154]:
0 2005-12-31
Name: ForecastEndDate, dtype: datetime64[ns]
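One thing to watch with integer-encoded dates is that a leading zero is lost (for example, 010106 is stored as 10106). The sketch below pads the string back to six digits before parsing; the sample values and the zfill step are assumptions added for illustration and are not part of the answer above.

```python
import pandas as pd

# Hypothetical values in ddmmyy form; the second one lost its leading zero.
dfactual = pd.DataFrame({"ForecastEndDate": [311205, 10106]})

# Convert to text and restore the six-digit width before parsing.
as_text = dfactual["ForecastEndDate"].astype(str).str.zfill(6)
dfactual["ForecastEndDate"] = pd.to_datetime(as_text, format="%d%m%y")
print(dfactual)
```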