Pivoting out Datetimes and then calling an operation in Pandas/Python - python

I've seen several articles about using datetime and dateutil to convert into datetime objects.
However, I can't figure out how to convert a column into datetime objects so that I can pivot on that column and perform operations against it.
I have a dataframe as such:
Col1 Col 2
a 1/1/2013
a 1/12/2013
b 1/5/2013
b 4/3/2013 ....etc
What I want is :
pivott = pivot_table( df, rows ='Col1', values='Col2', and then I want to get the range of dates for each value in Col1)
I am not sure how to correctly approach this. Even after using
df['Col2']= pd.to_datetime(df['Col2'])
I couldn't do operations against the dates since they are strings...
Any advice?

Use datetime.strptime
import pandas as pd
from datetime import datetime
df = pd.read_csv('somedata.csv')
convertdatetime = lambda d: datetime.strptime(d,'%d/%m/%Y')
converted = df['DATE_TIME_IN_STRING'].apply(convertdatetime)
converted[:10] # you should be getting dtype: datetime64[ns]
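For the per-group date range the question asks about, a minimal sketch might look like the following. It assumes the dates are month/day/year (the question does not say which order is intended) and rebuilds the sample frame by hand; the names summary and range are illustrative only.

```python
import pandas as pd

# Sample data mirroring the question. Month/day/year order is an assumption.
df = pd.DataFrame({
    'Col1': ['a', 'a', 'b', 'b'],
    'Col2': ['1/1/2013', '1/12/2013', '1/5/2013', '4/3/2013'],
})

# Parse with an explicit format so ambiguous dates are not guessed.
df['Col2'] = pd.to_datetime(df['Col2'], format='%m/%d/%Y')

# Earliest and latest date per group, plus the span between them.
summary = df.groupby('Col1')['Col2'].agg(['min', 'max'])
summary['range'] = summary['max'] - summary['min']
print(summary)
```

Once the column holds real datetimes, groupby with min and max gives the range directly, so a pivot table is not strictly needed for this step.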

Related

Convert mm-yyyy to datetime datatype in Python

I am trying to convert a datetime datatype of the form 24/12/2021 07:24:00 to mm-yyyy format which is 12-2021 with datetime datatype. I need the mm-yyyy in datetime format in order to sort the column 'Month-Year' in a time series. I have tried
import pandas as pd
from datetime import datetime
df = pd.read_excel('abc.xlsx')
df['Month-Year'] = df['Due Date'].map(lambda x: x.strftime('%m-%Y'))  # %Y gives the 4-digit year, e.g. 12-2021
df.set_index(['ID', 'Month-Year'], inplace=True)
df.sort_index(inplace=True)
df
The 'Month-Year' column does not sort chronologically because it has object dtype (strings). How can I convert the 'Month-Year' column to a datetime dtype?
I have been able to obtain a solution to the problem.
df['month_year'] = pd.to_datetime(df['Due Date']).dt.to_period('M')
I got this from the link below
https://www.interviewqs.com/ddi-code-snippets/extract-month-year-pandas
df['Month-Year']=pd.to_datetime(df['Month-Year']).dt.normalize()
will convert the Month-Year to datetime64[ns].
Use it before sorting.
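As a rough sketch of the to_period approach applied to the sorting problem, assuming 'Due Date' holds day/month/year strings like the one in the question (the sample rows here are made up for illustration):

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Due Date': ['24/12/2021 07:24:00', '03/01/2021 09:00:00', '15/06/2021 12:30:00'],
})

# Parse day-first timestamps explicitly, then reduce them to monthly periods.
due = pd.to_datetime(df['Due Date'], format='%d/%m/%Y %H:%M:%S')
df['Month-Year'] = due.dt.to_period('M')

# Periods sort chronologically, unlike 'mm-yyyy' strings.
df = df.set_index(['ID', 'Month-Year']).sort_index(level='Month-Year')

# Format as mm-yyyy only for display, after sorting.
labels = list(df.index.get_level_values('Month-Year').strftime('%m-%Y'))
print(labels)
```

Keeping the period dtype in the index and formatting to text only at display time avoids the string-sorting problem altogether.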

Remove time from pandas dataframe datetime64[ns] index

I am trying to merge two pandas dataframes, and to do this I want to make it so that they both have the same index. The problem is, one df has an index of datatype object which just includes the date while the other df has an index of datatype datetime64[ns] which includes the date and time. Is there a way to make these both the same data type so that I can merge the two dataframes?
Convert both indexes to pandas datetimes, then keep only the date part:
df['date_only'] = df['dates'].dt.date
You could convert a date and time format to just date as below
import pandas as pd
date_n_time='2015-01-08 22:44:09'
date=pd.to_datetime(date_n_time).date()
To turn your index into a column, use
df.reset_index()
and to set a column back as the index, pass its name to set_index, e.g.
df.set_index('date_only')
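A minimal sketch of aligning the two indexes directly, without going through a column. The frames left and right and their columns are hypothetical stand-ins for the two dataframes in the question:

```python
import pandas as pd

# 'left' has date-only strings as its index; 'right' has full timestamps.
left = pd.DataFrame({'x': [1, 2]}, index=['2015-01-08', '2015-01-09'])
right = pd.DataFrame(
    {'y': [10, 20]},
    index=pd.to_datetime(['2015-01-08 22:44:09', '2015-01-09 03:15:00']),
)

# Parse the string index into datetimes.
left.index = pd.to_datetime(left.index)
# Drop the time of day (sets it to midnight) while keeping a datetime dtype.
right.index = right.index.normalize()

# Both indexes now hold midnight timestamps, so they line up on the date.
merged = left.join(right, how='inner')
print(merged)
```

Using normalize() rather than .dt.date keeps both indexes as datetime64, which tends to be faster and avoids mixing Python date objects with pandas timestamps.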

Pandas to_datetime function resulting in Unix timestamp instead of Datetime for certain date strings

I am running into an issue where the Pandas to_datetime function results in a Unix timestamp instead of a datetime object for certain rows. The date format in rows that do convert to datetime and rows that convert to Unix timestamp as int appear to be identical. When the problem occurs it seems to affect all the dates in the row.
For example:
2019-01-02T10:12:28.64Z (stored as str) ends up as 1546424003423000000
While
2019-09-17T11:28:49.35Z (stored as str) converts to a datetime object.
Another date in the same row is 2019-01-02T10:13:23.423Z (stored as str) which is converting to a timestamp as well.
There isn't much code to look at; each conversion happens on a single line:
full_df.loc[mask, 'transaction_changed_datetime'] = pd.to_datetime(full_df['SaleItemChangedOn'])
full_df.loc[pd.isnull(full_df['completed_date']), 'completed_date'] = pd.to_datetime(full_df['SaleCompletedOn'])
I've tried with errors='coerce' on as well but the result is the same. I can deal with this problem later in the code, but I would really like to understand why this is happening.
Edit
As requested, this is the MRE that reproduces the issue on my computer. Some notes on this:
The mask is somehow involved. If I remove the mask it converts fine.
If I only pass in the first row in the Dataframe (single row Dataframe) it converts fine.
import pandas as pd
from pandas import NaT, Timestamp
debug_dict = {'SaleItemChangedOn': ['2019-01-02T10:12:28.64Z', '2019-01-02T10:12:28.627Z'],
'transaction_changed_datetime': [NaT, Timestamp('2019-01-02 11:58:47.900000+0000', tz='UTC')]}
df = pd.DataFrame(debug_dict)
mask = (pd.isnull(df['transaction_changed_datetime']))
df.loc[mask, 'transaction_changed_datetime'] = pd.to_datetime(df['SaleItemChangedOn'])
When I try the examples you mention:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':['2019-01-02T10:12:28.64Z', '2019-09-17T11:28:49.35Z', np.nan]})
pd.to_datetime(df['a'])
There doesn't seem to be any issue:
Out[74]:
0 2019-01-02 10:12:28.640000+00:00
1 2019-09-17 11:28:49.350000+00:00
2 NaT
Name: a, dtype: datetime64[ns, UTC]
Could you provide an MRE?
You might want to check if you have more than one column with the same name which is being sent to pd.to_datetime. It solved the datetime being converted to timestamp problem for me.
This appears to have been a bug in pandas that was fixed with the release of version 1.0. The example code above now produces the expected results.
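If you are stuck on an older version, one possible workaround is to avoid the masked .loc assignment and fill the missing values instead. This is only a sketch built on the MRE's column names; it has not been checked against the affected pandas releases:

```python
import pandas as pd

df = pd.DataFrame({
    'SaleItemChangedOn': ['2019-01-02T10:12:28.64Z', '2019-01-02T10:12:28.627Z'],
    # Built with to_datetime so the column is tz-aware from the start.
    'transaction_changed_datetime': pd.to_datetime(
        [pd.NaT, pd.Timestamp('2019-01-02 11:58:47.900', tz='UTC')], utc=True),
})

# Parse the whole source column once, as tz-aware UTC datetimes.
parsed = pd.to_datetime(df['SaleItemChangedOn'], utc=True)

# Fill only the missing entries; existing timestamps are left untouched.
df['transaction_changed_datetime'] = df['transaction_changed_datetime'].fillna(parsed)
print(df['transaction_changed_datetime'])
```

fillna aligns on the index, so each missing row picks up the parsed value from the same row.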

How to convert timedelta to time of day in pandas?

I have a SQL table that contains data of the mySQL time type as follows:
time_of_day
-----------
12:34:56
I then use pandas to read the table in:
df = pd.read_sql('select * from time_of_day', engine)
Looking at df.dtypes yields:
time_of_day timedelta64[ns]
My main issue is that, when writing my df to a csv file, the data comes out all messed up, instead of essentially looking like my SQL table:
time_of_day
0 days 12:34:56.000000000
I'd like to instead (obviously) store this record as a time, but I can't find anything in the pandas docs that talk about a time dtype.
Does pandas lack this functionality intentionally? Is there a way to solve my problem without requiring janky data casting?
Seems like this should be elementary, but I'm confounded.
Pandas does not support a time dtype series
Pandas (and NumPy) do not have a time dtype. Since you wish to avoid Pandas timedelta, you have 3 options: Pandas datetime, Python datetime.time, or Python str. Below they are presented in order of preference. Let's assume you start with the following dataframe:
df = pd.DataFrame({'time': pd.to_timedelta(['12:34:56', '05:12:45', '15:15:06'])})
print(df['time'].dtype) # timedelta64[ns]
Pandas datetime series
You can use Pandas datetime series and include an arbitrary date component, e.g. today's date. Underlying such a series are integers, which makes this solution the most efficient and adaptable.
The default date, if unspecified, is 1-Jan-1970:
df['time'] = pd.to_datetime(df['time'])
print(df)
# time
# 0 1970-01-01 12:34:56
# 1 1970-01-01 05:12:45
# 2 1970-01-01 15:15:06
You can also specify a date, such as today:
df['time'] = pd.Timestamp('today').normalize() + df['time']
print(df)
# time
# 0 2019-01-02 12:34:56
# 1 2019-01-02 05:12:45
# 2 2019-01-02 15:15:06
Pandas object series of Python datetime.time values
The Python datetime module from the standard library supports datetime.time objects. You can convert your series to an object dtype series containing pointers to a sequence of datetime.time objects. Operations will no longer be vectorised, but each underlying value will be represented internally by a number.
df['time'] = pd.to_datetime(df['time']).dt.time
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'datetime.time'>
Pandas object series of Python str values
Converting to strings is only recommended for presentation purposes that are not supported by other types, e.g. Pandas datetime or Python datetime.time. For example:
df['time'] = pd.to_datetime(df['time']).dt.strftime('%H:%M:%S')
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'str'>
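Depending on your pandas version, calling pd.to_datetime directly on a timedelta series may raise a TypeError. A sketch that sidesteps this by adding the timedeltas to an explicit anchor date (the column names time_obj and time_str are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'time': pd.to_timedelta(['12:34:56', '05:12:45', '15:15:06'])})

# Timestamp + timedelta series gives a datetime series on an arbitrary date.
anchored = pd.Timestamp('1970-01-01') + df['time']

# Option A: Python datetime.time objects (object dtype).
df['time_obj'] = anchored.dt.time

# Option B: plain 'HH:MM:SS' strings, convenient for writing to CSV.
df['time_str'] = anchored.dt.strftime('%H:%M:%S')

csv_text = df[['time_str']].to_csv(index=False)
print(csv_text)
```

The string column writes out in the same shape as the original SQL values, which addresses the CSV formatting complaint in the question.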
It's a hack, but you can pull out the components to build a string and convert that string to a datetime.time object:
from datetime import datetime
def convert(td):
    # Join hours, minutes, and seconds from the Timedelta into 'H:M:S'
    parts = [str(td.components.hours), str(td.components.minutes),
             str(td.components.seconds)]
    return datetime.strptime(':'.join(parts), '%H:%M:%S').time()
df['time'] = df['time'].apply(convert)
Found a solution, but I feel like there has to be something more elegant than this:
def convert(x):
    return pd.to_datetime(x).strftime('%H:%M:%S')
df['time_of_day'] = df['time_of_day'].apply(convert)
df['time_of_day'] = pd.to_datetime(df['time_of_day']).apply(lambda x: x.time())
Adapted this code

Frequency of a data frame

I have a data frame indexed with a date (Python datetime object). How could I find the frequency as the number of months of data in the data frame?
I tried the attribute data_frame.index.freq, but it returns None. I also tried the asfreq function using data_frame.asfreq('M', how={'start','end'}), but it does not return the expected results. Please advise how I can get the expected results.
You want to convert your index of datetimes to a DatetimeIndex; the easiest way is to use to_datetime:
df.index = pd.to_datetime(df.index)
Now you can do timeseries/frame operations, like resample or grouping with TimeGrouper (replaced by pd.Grouper in newer pandas versions).
If your data has a consistent frequency, then this will be df.index.freq, if it doesn't (e.g. if some days are missing) then df.index.freq will be None.
You probably want to use pandas Timestamp objects for your index instead of datetime to be able to use 'freq'. See the example below:
import pandas as pd
dates = pd.date_range('2012-1-1','2012-2-1')
df = pd.DataFrame(index=dates)
print (df.index.freq)
yields,
<Day>
You can easily convert your dataframe like so,
df.index = [pd.Timestamp(d) for d in df.index]
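To illustrate the point about missing days, and to count the months of data the question asks about, here is a small sketch. The frame and the names gappy, inferred, and n_months are made up for the example:

```python
import pandas as pd

# Daily data spanning January to mid-March 2012.
dates = pd.date_range('2012-01-01', '2012-03-15', freq='D')
df = pd.DataFrame({'value': range(len(dates))}, index=dates)

# Dropping a single day breaks the regular spacing, so freq becomes None.
gappy = df.drop(df.index[5])
print(gappy.index.freq)

# infer_freq guesses the frequency from the spacing of a regular index.
inferred = pd.infer_freq(df.index)

# Number of distinct calendar months present, regardless of gaps.
n_months = gappy.index.to_period('M').nunique()
print(inferred, n_months)
```

Counting distinct monthly periods works even when the index has no fixed frequency, which may be closer to what "number of months of data" means here.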
