Pandas corrupting datetime object - python

I am trying to create a timezone-aware date column in a pandas DataFrame. When I run the code below, the resulting pandas column does not have the same datetime as the one I put in. What am I doing wrong here?
I am using Python 3.6.2 and pandas 0.20.3.
from datetime import datetime
import pandas as pd
import pytz
date_string = "12/14/2016 12:00"
timezone = pytz.timezone("US/Pacific")
input_datetime = datetime.strptime(date_string, "%m/%d/%Y %H:%M").replace(tzinfo=timezone)
df = pd.DataFrame({"datetime":[input_datetime]})
If I run that code, df['datetime'][0].minute returns 53 while input_datetime.minute returns 0.
When I don't replace the tzinfo I do not have a problem.

If you first convert your input_datetime to a pandas Timestamp, you can read the minutes (or years, etc.) of your DataFrame column with .dt.minute:
input_datetime = pd.to_datetime(datetime.strptime(date_string, "%m/%d/%Y %H:%M")).replace(tzinfo=timezone)
df = pd.DataFrame({"datetime":[input_datetime]})
df['datetime'].dt.minute

You can use pandas .dt and tz_localize:
from datetime import datetime
import pandas as pd
date_string = "12/14/2016 12:00"
df = pd.DataFrame({'datetime':[datetime.strptime(date_string, "%m/%d/%Y %H:%M")]})
df['datetime'].dt.tz_localize('US/Pacific')
Output:
0 2016-12-14 12:00:00-08:00
Name: datetime, dtype: datetime64[ns, US/Pacific]
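For background on the 53 minutes: the pytz documentation notes that passing a pytz timezone through replace(tzinfo=...) does not work for many zones, because the datetime picks up the zone's earliest recorded offset (local mean time) instead of the offset in effect on that date. A minimal sketch of the localize() call that pytz recommends instead, reusing the names from the question (naive, replaced and localized are illustrative names, not from the original post):

```python
from datetime import datetime, timedelta

import pandas as pd
import pytz

date_string = "12/14/2016 12:00"
timezone = pytz.timezone("US/Pacific")
naive = datetime.strptime(date_string, "%m/%d/%Y %H:%M")

# replace() attaches the zone without looking up the offset for this date
replaced = naive.replace(tzinfo=timezone)
# localize() looks up the offset that applies on this date (PST here)
localized = timezone.localize(naive)

print(replaced.utcoffset())
print(localized.utcoffset() == timedelta(hours=-8))

# The localized value survives the trip into a DataFrame unchanged
df = pd.DataFrame({"datetime": [localized]})
print(df["datetime"][0].minute)
```

With the localized value, the column should report the same minute as the input.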

Related

Subtract/add days to Pandas Timestamp

How do I subtract/add days (integer) to a Pandas Timestamp object?
For example, my atomics and datatypes are (lifted from Pycharm):
startDate = {Timestamp} 2008-09-20 00:00:00
dayDistance = {int} 124
The code as pulled from the Internet returns None:
from datetime import timedelta
newDate = startDate - timedelta(days=dayDistance)
I am expecting an object of type Timestamp so it is compatible with the rest of the code downstream from here.
pandas has its own Timedelta data type:
start_date = pd.Timestamp("2008-09-20 00:00:00")
dayDistance = 124
new_date = start_date - pd.Timedelta(dayDistance, unit="d")
But Python's built-in timedelta works too:
from datetime import timedelta
new_date = start_date - timedelta(days=dayDistance)
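As a quick check of the answer above, here is a small sketch using the values from the question. The frame with "start" and "days" columns is a made-up illustration of the same idea applied to a whole column, where each row has its own integer day count:

```python
import pandas as pd

start_date = pd.Timestamp("2008-09-20 00:00:00")
day_distance = 124

# Subtracting or adding a Timedelta keeps the result a pandas Timestamp
earlier = start_date - pd.Timedelta(days=day_distance)
later = start_date + pd.Timedelta(days=day_distance)
print(type(earlier).__name__, earlier)
print(type(later).__name__, later)

# Hypothetical column-wise version: one integer day count per row
frame = pd.DataFrame({"start": [start_date, start_date], "days": [124, 1]})
frame["shifted"] = frame["start"] - pd.to_timedelta(frame["days"], unit="D")
print(frame)
```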
Actually, it works fine in my test, but you may use pd.Timedelta instead:
import pandas as pd
from datetime import timedelta
ts = pd.Timestamp.now()
print(f"{type(ts)} {ts}")
dt_td_ts = ts - timedelta(days=10)
pd_td_ts = ts - pd.Timedelta(10, unit='d')
print(f"With datetime timedelta: ({type(dt_td_ts)}) {dt_td_ts}")
print(f"With pandas timedelta: ({type(pd_td_ts)}) {pd_td_ts}")
returns
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2022-05-15 22:20:04.596195
With datetime timedelta: (<class 'pandas._libs.tslibs.timestamps.Timestamp'>) 2022-05-05 22:20:04.596195
With pandas timedelta: (<class 'pandas._libs.tslibs.timestamps.Timestamp'>) 2022-05-05 22:20:04.596195
As you can see, the type is Timestamp as expected, even when using datetime.timedelta.

Convert mm-yyyy to datetime datatype in Python

I am trying to convert a datetime datatype of the form 24/12/2021 07:24:00 to mm-yyyy format which is 12-2021 with datetime datatype. I need the mm-yyyy in datetime format in order to sort the column 'Month-Year' in a time series. I have tried
import pandas as pd
from datetime import datetime
df = pd.read_excel('abc.xlsx')
df['Month-Year'] = df['Due Date'].map(lambda x: x.strftime('%m-%y'))
df.set_index(['ID', 'Month-Year'], inplace=True)
df.sort_index(inplace=True)
df
The column 'Month-Year' does not sort chronologically because it is of object datatype (strings). How do I convert the 'Month-Year' column to a datetime datatype?
I have been able to obtain a solution to the problem.
df['month_year'] = pd.to_datetime(df['Due Date']).dt.to_period('M')
I got this from the link below
https://www.interviewqs.com/ddi-code-snippets/extract-month-year-pandas
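To illustrate why the period approach sorts correctly, here is a minimal sketch. The three rows are made-up sample data in the dd/mm/yyyy format described in the question, and the names due, df_sorted and labels are only for illustration:

```python
import pandas as pd

# Made-up sample rows in the question's "24/12/2021 07:24:00" format
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Due Date": ["24/12/2021 07:24:00", "03/01/2022 09:00:00", "15/11/2021 18:30:00"],
})

# Parse with an explicit day-first format, then reduce to monthly periods
due = pd.to_datetime(df["Due Date"], format="%d/%m/%Y %H:%M:%S")
df["Month-Year"] = due.dt.to_period("M")

# Periods compare chronologically, unlike "12-21" style strings
df_sorted = df.sort_values("Month-Year")

# Format as mm-yyyy only for display
labels = df_sorted["Month-Year"].dt.strftime("%m-%Y").tolist()
print(labels)
```

The column keeps a period dtype for sorting, and the mm-yyyy text is produced only when it needs to be shown.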
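df['Month-Year'] = pd.to_datetime(df['Month-Year'], format='%m-%y')
will convert the Month-Year strings to datetime64[ns], set to the first day of each month. The explicit format matches the '%m-%y' strings produced above; without it, values such as '12-21' are ambiguous.
Use it before sorting.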

How to convert datetime into GMT +7 in Python?

I have a dataframe that looks like that:
conversation__created_at
0 2020-10-15T03:39:42.766773+00:00
1 2020-10-14T11:24:33.831177+00:00
2 2020-10-14T08:29:44.192258+00:00
3 2020-10-14T01:42:06.674313+00:00
4 2020-10-13T12:57:04.218184+00:00
How to convert it into GMT +7?
I assume you have a pandas series because the data you posted looks like one.
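Then you can use tz_convert. Note that the Etc/ zone names use the POSIX sign convention, which is inverted: 'Etc/GMT-7' is 7 hours ahead of UTC (UTC+07:00), while 'Etc/GMT+7' is 7 hours behind it. Assuming "GMT +7" means 7 hours ahead of UTC:
import pandas as pd
pd.to_datetime('2020-10-15T03:39:42.766773+00:00').tz_convert('Etc/GMT-7')
As pointed out in the comments, since the datetime carries a T in it, it is of string format, so we need to convert to datetime first and then convert to the correct timezone.
pd.to_datetime(series).dt.tz_convert('Etc/GMT-7')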
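The sign of the Etc/ zone names is easy to get backwards, so here is a small sketch that checks both spellings on two of the timestamps from the question. It assumes "GMT +7" means 7 hours ahead of UTC; the names utc, plus7 and minus7 are illustrative:

```python
from datetime import timedelta

import pandas as pd

series = pd.Series([
    "2020-10-15T03:39:42.766773+00:00",
    "2020-10-14T11:24:33.831177+00:00",
])

# Parse the ISO strings into tz-aware UTC timestamps first
utc = pd.to_datetime(series, utc=True)

# "Etc/GMT-7" is UTC+07:00: the Etc/ names follow the inverted POSIX sign
plus7 = utc.dt.tz_convert("Etc/GMT-7")
# "Etc/GMT+7" is UTC-07:00, which is probably not what was wanted
minus7 = utc.dt.tz_convert("Etc/GMT+7")

print(plus7.iloc[0], plus7.iloc[0].utcoffset() == timedelta(hours=7))
print(minus7.iloc[0], minus7.iloc[0].utcoffset() == timedelta(hours=-7))
```

Both results describe the same instant; only the wall-clock reading and the offset differ.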
You can also do this with the datetime library alone:
from datetime import datetime, timedelta, timezone
d = datetime.fromisoformat("2020-10-15T03:39:42.766773+00:00")
tz = timezone(timedelta(hours=7))
new_time = d.astimezone(tz)
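You can use pytz to set the timezone of a naive datetime instance. Note that replace(tzinfo=...) only attaches the zone to the existing wall-clock time without converting it, and that "Etc/GMT+7" is 7 hours behind UTC, as the -07:00 in the output shows.
For example: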
from pytz import timezone
from datetime import datetime
date = datetime.now()
print(date)
tz = timezone("Etc/GMT+7")
date = date.replace(tzinfo=tz)
print(date)
Output:
2020-10-26 10:33:25.934699
2020-10-26 10:33:25.934699-07:00
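You can also apply a pytz timezone to each value in the column (this assumes the column already holds tz-aware Timestamps, e.g. after pd.to_datetime):
from pytz import timezone
def myDate(x):
    # "Etc/GMT-7" is UTC+07:00; the sign in Etc/ names is inverted
    tz = timezone("Etc/GMT-7")
    # astimezone converts the instant; replace(tzinfo=...) would only relabel it
    return x.astimezone(tz)
# Apply to the column (a Series), not to the whole DataFrame
df['conversation__created_at'] = df['conversation__created_at'].apply(lambda ts: myDate(ts.to_pydatetime()))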

I have a date column in a dataframe. I want to change the format of the dates in that column

I have a date column in a dataset where the dates are like 'Apr-12','Jan-12' format. I would like to change the format to 04-2012,01-2012. I am looking for a function which can do this.
I think I know one guy with the same name. Jokes apart, here is the solution to your problem.
There is a built-in function named strptime(), which takes a string and parses it into a datetime object according to a format that describes the string.
It is part of Python's standard datetime module, so there is no need to install anything; just import it.
It works like this: datetime.strptime(your_string, format_of_your_string). To get the format you want, call strftime() on the result.
# You can also do this: from datetime import * (this imports every name from datetime)
from datetime import datetime
date_string = 'Apr-12'
# %b is the abbreviated month name, %y the two-digit year
date_object = datetime.strptime(date_string, '%b-%y')
# %m is the zero-padded month number, %Y the four-digit year
print(date_object.strftime('%m-%Y'))
I hope this will work for you. Happy coding :)
You can do following:
import pandas as pd
df = pd.DataFrame({
    'date': ['Apr-12', 'Jan-12', 'May-12']
})
pd.to_datetime(df['date'], format='%b-%y')
This will output:
0 2012-04-01
1 2012-01-01
2 2012-05-01
Name: date, dtype: datetime64[ns]
Which means you can update your date column right away:
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
You can chain a couple of pandas methods together to get the desired output:
df = pd.DataFrame({'date_fmt':['Apr-12','Jan-12']})
df
Input dataframe:
date_fmt
0 Apr-12
1 Jan-12
Use pd.to_datetime chained with .dt date accessor and strftime
pd.to_datetime(df['date_fmt'], format='%b-%y').dt.strftime('%m-%Y')
Output:
0 04-2012
1 01-2012
Name: date_fmt, dtype: object

change time in timestamp

I have some timestamps in python pandas, Timestamp('2000-02-09 00:00:00') and I would like to convert them to Timestamp('2000-02-09 13:00:00'). Just adding 13 hours wouldn't work as some of them have different time. Can you point to a solution to this problem?
Use replace method of pandas timestamp objects:
import pandas as pd
t = pd.Timestamp('2000-02-09 00:00:00')
t = t.replace(hour=13, minute=0, second=0)
pandas.Timestamp is a datetime subclass and therefore it has all its methods such as .replace():
>>> import pandas as pd
>>> from datetime import datetime
>>> issubclass(pd.Timestamp, datetime)
True
>>> isinstance(pd.Timestamp('2000-02-09 00:00:00'), datetime)
True
>>> pd.Timestamp('2000-02-09 00:00:00').replace(hour=13)
Timestamp('2000-02-09 13:00:00')
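For a whole column rather than a single Timestamp, one possible vectorized sketch is to drop the time of day with .dt.normalize() and then add 13 hours. The sample values and the names s and at_13 are made up for illustration:

```python
import pandas as pd

# Made-up timestamps with different times of day
s = pd.Series(pd.to_datetime(["2000-02-09 00:00:00", "2000-02-10 07:45:30"]))

# normalize() resets every value to midnight; adding 13 hours sets 13:00:00
at_13 = s.dt.normalize() + pd.Timedelta(hours=13)
print(at_13)
```

This avoids calling .replace() row by row while giving the same result for each value.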
