Modifying output of dtypes - python

Currently, dataframe.dtypes outputs:
age int64
gender object
date datetime64[ns]
time datetime64[ns]
dtype: object
I want the output to only have date and time columns, or conversely, only the columns with type datetime64[ns], i.e. the output should be:
date datetime64[ns]
time datetime64[ns]
dtype: object
I tried various methods such as using dataframe.select_dtypes, but none of them exactly match the required output.

You can select a subset of your df with only the columns you want:
df[['date', 'time']].dtypes
To be more generic and get those with datetime64 type, do
import numpy as np
tt = df.dtypes
print(tt[tt.apply(lambda x: np.issubdtype(x, np.datetime64))])
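As an alternative sketch, select_dtypes can also produce this output if .dtypes is called on its result, since select_dtypes returns a DataFrame rather than a dtype listing. The sample data below is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample frame with the same column names as in the question.
df = pd.DataFrame({
    "age": [31, 45],
    "gender": ["F", "M"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "time": pd.to_datetime(["2020-01-01 08:00", "2020-01-02 09:30"]),
})

# Keep only the datetime columns, then ask for the dtypes of what is left.
datetime_dtypes = df.select_dtypes(include="datetime64").dtypes
print(datetime_dtypes)
```

Only the date and time entries should remain in the printed Series.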

Related

Convert datetime ns to daily format

I have a column in my dataframe in this format:
2013-01-25 00:00:00+00:00
non-null datetime64[ns, UTC]
I would like to convert this to daily format, like this:
2013-01-25
I tried this approach, but have been receiving an error:
df['date_column'].date()
AttributeError: 'Series' object has no attribute 'date'
The error message is not quite clear to me, because the object should be a datetime object according to df.info()
Can anyone suggest an approach of how to do this?
In short: it is not advisable to convert to date objects, since you then lose a lot of functionality for inspecting the dates. It might be better to use dt.floor(..) [pandas-doc] or dt.normalize(..) [pandas-doc] instead.
You can convert a series of strings with pd.to_datetime(..) [pandas-doc], for example:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00']))
0 2013-01-25
dtype: datetime64[ns]
We can then later convert this to date objects with .dt.date [pandas-doc]:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00'])).dt.date
0 2013-01-25
dtype: object
Note that a date is not a native NumPy type, so pandas stores Python date(..) objects instead. A disadvantage of this is that you can no longer process the values as datetime-like objects, so the Series loses a lot of functionality.
It might be better to use dt.floor(..) [pandas-doc] to round down to the day, and thus keep a datetime64[ns] Series:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00'])).dt.floor(freq='d')
0 2013-01-25
dtype: datetime64[ns]
We can use dt.normalize(..) [pandas-doc] as well. This just sets the time component to 0:00:00, and leaves the timezone unaffected:
>>> pd.to_datetime(pd.Series(['2013-01-25 00:00:00+00:00'])).dt.normalize()
0 2013-01-25
dtype: datetime64[ns]
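The trade-off described above can be checked with a small sketch. The sample timestamp and the variable names are made up for illustration, and utc=True is passed so the result is timezone-aware regardless of pandas version.

```python
import datetime
import pandas as pd

# Hypothetical sample value with a non-midnight time, parsed as timezone-aware UTC.
s = pd.to_datetime(pd.Series(["2013-01-25 13:45:00+00:00"]), utc=True)

# Option 1: floor to the day. The result is still a datetime Series.
floored = s.dt.floor("D")

# Option 2: convert to Python date objects. The result is an object Series.
as_dates = s.dt.date

# The floored Series keeps the .dt accessor, so date parts remain available.
print(floored.dt.year.iloc[0], floored.dt.month.iloc[0])

# The date Series only holds plain datetime.date values.
print(type(as_dates.iloc[0]))
```

The floored Series can still be filtered, resampled, or compared with other timestamps, which is why it is usually the more convenient representation.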

Trying to convert object to DateTime, getting TypeError

I have two dataframes (see here), which contain dates and times.
The details for the first data frame are:
Date object
Time object
Channel1 float64
Channel2 float64
Channel3 float64
Channel4 float64
Channel5 float64
dtype: object
The details for the second data frame are:
Date object
Time object
Mean float64
STD float64
Min float64
Max float64
dtype: object
I am trying to convert the times to a DateTime object so that I can then do a calculation to make the time relative to the first time instance (i.e. the earliest time would become 0, and then all others would be seconds after the start).
When I try (from here):
df['Time'] = df['Time'].apply(pd.Timestamp)
I get this error:
TypeError: Cannot convert input [15:35:45] of type <class 'datetime.time'> to Timestamp
When I try (from here):
df['Time'] = pd.to_datetime(df['Time'])
I get this error:
TypeError: <class 'datetime.time'> is not convertible to datetime
Any suggestions would be appreciated.
The reason why you are getting the error
TypeError: <class 'datetime.time'> is not convertible to datetime
is literally what it says: your df['Time'] contains datetime.time objects, which cannot be converted to datetime.datetime or Timestamp objects because both require a date component as well.
The solution is to combine df['Date'] and df['Time'] and then, pass it to pd.to_datetime. See below code sample:
import pandas as pd

df = pd.DataFrame({'Date': ['3/11/2000', '3/12/2000', '3/13/2000'],
                   'Time': ['15:35:45', '18:35:45', '05:35:45']})
# Join the date and time strings, then parse the combined value
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
Output
Date Time datetime
0 3/11/2000 15:35:45 2000-03-11 15:35:45
1 3/12/2000 18:35:45 2000-03-12 18:35:45
2 3/13/2000 05:35:45 2000-03-13 05:35:45
In the end my solution was different for the two dataframes which I had.
For the first dataframe, the solution which combines the Date column with the Time column worked well:
df['Date Time'] = df['Date'] + ' ' + df['Time']
After the two columns are combined, the following code is used to turn it into a datetime object (note the format='%d/%m/%Y %H:%M:%S' part is required because otherwise it confuses the month/date and uses the US formatting, i.e. it thinks 11/12/2018 is 12th of November, and not 11th of December):
df['Date Time'] = pd.to_datetime(df['Date Time'], format='%d/%m/%Y %H:%M:%S')
For my second dataframe, I went back to an earlier step in my data processing and found an option which saves the date and time to a single column directly. After that, the following code converted it to a datetime object:
df['Date Time'] = df['Date Time'].apply(pd.Timestamp)
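Putting the pieces of this thread together, here is a minimal sketch that starts from actual datetime.time objects, as in the original error message, and ends with seconds elapsed since the earliest row. The sample values and the elapsed_s column name are made up, and it assumes the Date column holds day-first strings and the times have no fractional seconds.

```python
import datetime
import pandas as pd

# Hypothetical data: Date as day-first strings, Time as datetime.time objects.
df = pd.DataFrame({
    "Date": ["11/03/2000", "11/03/2000", "12/03/2000"],
    "Time": [datetime.time(15, 35, 45), datetime.time(15, 36, 0), datetime.time(0, 0, 0)],
})

# datetime.time objects cannot be added to strings, so convert them to str first.
combined = df["Date"] + " " + df["Time"].astype(str)

# An explicit format avoids any day/month ambiguity.
df["datetime"] = pd.to_datetime(combined, format="%d/%m/%Y %H:%M:%S")

# Seconds relative to the earliest timestamp (the earliest row becomes 0).
df["elapsed_s"] = (df["datetime"] - df["datetime"].min()).dt.total_seconds()
print(df[["datetime", "elapsed_s"]])
```

Subtracting the minimum rather than the first row keeps the result non-negative even if the rows are not sorted by time.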

Python np.busday_count with datetime64[ns] as input

I have a column from a pandas Dataframe that I want to use as input for np.busday_count:
np.busday_count(df['date_from'].tolist(), df['date_to_plus_one'].tolist(), weekmask='1000000')
I have always used .tolist(), but since one of the last updates this results in an error:
> TypeError: Iterator operand 0 dtype could not be cast from
> dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
The column df['date_from'] is of type dtype: datetime64[ns].
Any tips or solution for this?
Try converting the column to dates first:
df['date_from'].dt.date.tolist()
The column df['date_from'] has dtype datetime64[ns], so it holds timestamps such as 2018-04-06 00:00:00.
np.busday_count, however, expects day-resolution dates such as 2018-04-06 (datetime.date or datetime64[D]).
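Another way to hand day-resolution values to np.busday_count, sketched here under the assumption that both columns are datetime64 columns, is to cast the underlying NumPy arrays to datetime64[D] explicitly. The sample dates are made up; the column names and the weekmask come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "date_from": pd.to_datetime(["2018-04-02", "2018-04-06"]),
    "date_to_plus_one": pd.to_datetime(["2018-04-16", "2018-04-10"]),
})

# Cast to day resolution so no unsafe cast is needed inside busday_count.
start = df["date_from"].to_numpy().astype("datetime64[D]")
end = df["date_to_plus_one"].to_numpy().astype("datetime64[D]")

# weekmask "1000000" counts Mondays only; the end date is exclusive.
mondays = np.busday_count(start, end, weekmask="1000000")
print(mondays)  # [2 1]
```

The explicit cast truncates any time-of-day component, which is what the error message says NumPy refuses to do implicitly under the 'safe' casting rule.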

How to apply to_datetime on pandas Dataframe column?

I have a dataframe with Timestamp entries in one column, created from strings like so:
df = pd.DataFrame({"x": pd.to_datetime("MARCH2016")})
Now I want to select from df based on month, cutting across years, by accessing the .month attribute of the datetime object. However, to_datetime actually created a Timestamp object from the string, and I can't seem to coerce it to datetime. The following works as expected:
type(df.x[0].to_datetime()) # gives datetime object
but using apply (which in my real life example of course I want to do given that I have more than one row) doesn't:
type(df.x.apply(pd.to_datetime)[0]) # returns Timestamp
What am I missing?
The fact that it's a Timestamp is irrelevant here; you can still access the month attribute using the .dt accessor:
In [79]:
df = pd.DataFrame({"x": [pd.to_datetime("MARCH2016")]})
df['x'].dt.month
Out[79]:
0 3
Name: x, dtype: int64
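To select rows by month across years, which was the goal in the question, the .dt.month values can be used as a boolean mask. This is a minimal sketch with made-up dates; march_rows is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample data spanning two years.
df = pd.DataFrame({"x": pd.to_datetime(["2015-03-01", "2016-03-01", "2016-07-15"])})

# Boolean mask: True where the month is March, regardless of year.
march_rows = df[df["x"].dt.month == 3]
print(march_rows)
```

The mask works on the whole column at once, so no apply call is needed.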

String date to date (pandas)

I have a dataframe called dfactual with a column ForeCastEndDate.
dfactual['ForeCastEndDate'] contains:
311205
311205
This must be a date in the format 31-12-2005, but the current format is int64. I tried the following:
dfactual['ForeCastEndDate'] = pd.to_datetime(pd.Series(dfactual['ForecastEndDate']))
I also tried adding the format argument, but it didn't work: the dtype stays int64.
How should I do it?
Called without a format, to_datetime interprets integers as offsets from the epoch rather than as formatted dates, so convert the column to str with astype first, then call to_datetime with the format string:
In [154]:
df = pd.DataFrame({'ForecastEndDate':[311205]})
pd.to_datetime(df['ForecastEndDate'].astype(str), format='%d%m%y')
Out[154]:
0 2005-12-31
Name: ForecastEndDate, dtype: datetime64[ns]
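One caveat with integer-encoded dates is that a leading zero is lost, so the 1st of December 2005 would be stored as 11205. A minimal sketch that pads the strings back to six digits before parsing, assuming every value is meant to be in ddmmyy form (the second sample value is made up):

```python
import pandas as pd

# Hypothetical sample: the second value has lost its leading zero (011205).
df = pd.DataFrame({"ForecastEndDate": [311205, 11205]})

# Convert to str and left-pad with zeros to six characters.
as_text = df["ForecastEndDate"].astype(str).str.zfill(6)

# Parse with an explicit day-month-year format.
parsed = pd.to_datetime(as_text, format="%d%m%y")
print(parsed)
```

If the column only ever contains six-digit values, the zfill step is a no-op and can be dropped.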
