I have a column with datetime and integer values. However, I want to convert the integer values to datetime as well.
StartTime            EndTime              Source
2170-01-01 00:00:00  2170-01-01 00:00:00  NA
1.60405e+18          1.60405e+18          Arm_dearm
I tried using
pd.to_datetime(site_list['StartTime'])
but it yields the same result.
Ultimately, I am not able to run the following line:
np.where(site_list["StartTime"].dt.date!=site_list["EndTime"].dt.date,"armed next day",site_list["Source"])
which throws the following error:
mixed datetimes and integers in passed array
As the error states, the issue is that there are mixed types in the column, so pandas will not convert it.
I'm going to assume it is a UNIX timestamp.
Check this answer to see whether that is the time format you are dealing with.
We can get around this by doing something like this:
import datetime

def convert(time):
    return datetime.datetime.fromtimestamp(time)

site_list["StartTime"] = site_list["StartTime"].apply(lambda x: convert(x) if isinstance(x, int) else x)
This checks each value in the StartTime column to see whether it is an integer. If it is, it converts it to a datetime; if it is not, it leaves it alone.
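Note that in the sample data the numeric values look like nanosecond Unix timestamps (1.60405e+18 lands in late 2020), and pandas will typically hold them as floats rather than ints, so checking for both types may be more robust. A minimal sketch under that assumption, with a small Series standing in for the asker's column:

```python
import pandas as pd

# Stand-in for site_list["StartTime"]: one real datetime and one float
# that looks like a nanosecond Unix timestamp (an assumption here).
s = pd.Series([pd.Timestamp("2170-01-01 00:00:00"), 1.60405e+18])

# Convert only the numeric entries, treating them as nanoseconds since epoch;
# existing Timestamp values pass through untouched.
converted = s.apply(
    lambda x: pd.to_datetime(x, unit="ns") if isinstance(x, (int, float)) else x
)
print(converted)
```

With both entries converted, the `.dt.date` comparison from the question should then work once the column is cast to a datetime dtype.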
I am working with a date column in this form:
Date
1871.01
1871.02
...
1871.10
1871.11
So to convert the column to a datetimeindex, I use:
df["Date"].apply(lambda x: datetime.strptime(str(x), "%Y.%m"))
however the column is converted to:
Date
1871-01-01
1871-02-01
...
1871-01-01
1871-11-01
Does anyone have any idea of what causes this, where all "10"s convert to "01"s? Is there a better way to do this given my inputs are floats?
If the raw format is a float, then 1871.10 and 1871.1 are exactly the same number, so its string representation will be the shorter form, "1871.1", which strptime then reads as January (month 1).
So you should stringify forcing two digits:
df["Date"].apply(lambda x: datetime.strptime("{:.2f}".format(x), "%Y.%m"))
Note: the original format is very bad. The true solution is to fix it at the source (e.g. when you read the input file, tell the read function that the column is a date, not a float).
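Putting it together, a minimal sketch (with a made-up frame standing in for the asker's data):

```python
from datetime import datetime
import pandas as pd

df = pd.DataFrame({"Date": [1871.01, 1871.02, 1871.10, 1871.11]})

# Force two decimal digits before parsing, so 1871.10 stays "1871.10"
# instead of collapsing to "1871.1" (which strptime reads as January).
dates = df["Date"].apply(lambda x: datetime.strptime("{:.2f}".format(x), "%Y.%m"))
print(dates)
```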
Good Afternoon,
I have a huge dataset where time information is stored as a float64 (or integer) in one column of the dataframe, in the format 'ddmmyyyy' (e.g. 20 January 2020 would be the float 20012020.0). I need to convert it into a datetime like 'dd-mm-yyyy'. I saw the function to_datetime, but I can't really manage to obtain what I want. Does someone know how to do it?
Massimo
You could try converting to string and, after that, to the date format you want, like this:
# The first step is to change the type of the column:
# to get rid of the .0 we cast first to int and then to string
df["date"] = df["date"].astype(int)
df["date"] = df["date"].astype(str)

for i, row in df.iterrows():
    x = str(df["date"].iloc[i])
    # In case the date format is similar to 20012020
    if len(x) == 8:
        df.at[i, 'date'] = "{}-{}-{}".format(x[:2], x[2:4], x[4:])
    # In case the format is similar to 1012020
    else:
        df.at[i, 'date'] = "0{}-{}-{}".format(x[0], x[1:3], x[3:])
Edit:
As you said, this solution only works if the month always comes in two digits.
Added the missing variable in the loop.
Moved the column type changes before entering the loop.
Let me know if this helps!
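For what it's worth, the loop can also be replaced with a single vectorized call: zero-pad the string to eight characters (so a day with a missing leading zero still parses) and let to_datetime handle the format. A sketch under the same assumption that the numbers are ddmmyyyy:

```python
import pandas as pd

df = pd.DataFrame({"date": [20012020.0, 1012020.0]})

# int removes the trailing .0, zfill(8) restores a leading zero on the day,
# and format="%d%m%Y" parses ddmmyyyy in one vectorized call.
df["date"] = pd.to_datetime(df["date"].astype(int).astype(str).str.zfill(8),
                            format="%d%m%Y")
print(df["date"])
```

This also leaves the column as a proper datetime64 dtype rather than formatted strings, which is usually what you want for later filtering or resampling.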
I have the following datatable, which I would like to filter by dates greater than "2019-01-01". The problem is that the dates are strings.
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']})
This is my best attempt.
dt_dates[f.days_date > datetime.strptime(f.days_date, "2019-01-01")]
this returns the error
TypeError: strptime() argument 1 must be str, not Expr
what is the best way to filter dates in python's datatable?
Reference
python datatable
f-expressions
Your strptime syntax for converting a string to a datetime is incorrect.
What you're looking for is:
dt_dates[f.days_date > datetime.strptime(f.days_date, "%Y-%m-%d")]
where the second argument to strptime is the date format.
However, let's take a step back, because this isn't the right way to do it.
First, we should convert all the dates in your Frame to datetimes. I'll be honest, I've never used datatable, but the syntax looks extremely similar to pandas' DataFrame.
In a dataframe, we can do the following:
df_date = df_date['days_date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))
This goes through each row and converts each string in the 'days_date' column into a datetime.
From there, we can use a filter to get the relevant rows:
df_date = df_date[df_date['days_date'] > datetime.strptime("2019-01-01", "%Y-%m-%d")]
datatable version 1.0.0 introduced native support for date and time data types. Note the difference between these two ways to initialize data:
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']})
dt_dates.stypes
> (stype.str32,)
and
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']}, stype="date32")
dt_dates.stypes
> (stype.date32,)
The latter frame contains a days_date column of type datatable.Type.date32, which represents a calendar date. One can then filter by date as follows:
split_date = datetime.datetime.strptime("2019-01-01", "%Y-%m-%d")
dt_split_date = dt.time.ymd(split_date.year, split_date.month, split_date.day)
dt_dates[dt.f.days_date > dt_split_date, :]
I am running into an issue where the Pandas to_datetime function results in a Unix timestamp instead of a datetime object for certain rows. The date format in rows that do convert to datetime and rows that convert to Unix timestamp as int appear to be identical. When the problem occurs it seems to affect all the dates in the row.
For example, :
2019-01-02T10:12:28.64Z (stored as str) ends up as 1546424003423000000
While
2019-09-17T11:28:49.35Z (stored as str) converts to a datetime object.
Another date in the same row is 2019-01-02T10:13:23.423Z (stored as str) which is converting to a timestamp as well.
There isn't much code to look at, the conversion happens on a single line:
full_df.loc[mask, 'transaction_changed_datetime'] = pd.to_datetime(full_df['SaleItemChangedOn']) and
full_df.loc[pd.isnull(full_df['completed_date']), 'completed_date'] = pd.to_datetime(full_df['SaleCompletedOn'])
I've tried with errors='coerce' on as well but the result is the same. I can deal with this problem later in the code, but I would really like to understand why this is happening.
Edit
As requested, this is an MRE that reproduces the issue on my computer. Some notes on this:
The mask is somehow involved. If I remove the mask it converts fine.
If I only pass in the first row in the Dataframe (single row Dataframe) it converts fine.
import pandas as pd
from pandas import NaT, Timestamp
debug_dict = {'SaleItemChangedOn': ['2019-01-02T10:12:28.64Z', '2019-01-02T10:12:28.627Z'],
'transaction_changed_datetime': [NaT, Timestamp('2019-01-02 11:58:47.900000+0000', tz='UTC')]}
df = pd.DataFrame(debug_dict)
mask = (pd.isnull(df['transaction_changed_datetime']))
df.loc[mask, 'transaction_changed_datetime'] = pd.to_datetime(df['SaleItemChangedOn'])
When I try the examples you mention:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':['2019-01-02T10:12:28.64Z', '2019-09-17T11:28:49.35Z', np.nan]})
pd.to_datetime(df['a'])
There doesn't seem to be any issue:
Out[74]:
0 2019-01-02 10:12:28.640000+00:00
1 2019-09-17 11:28:49.350000+00:00
2 NaT
Name: a, dtype: datetime64[ns, UTC]
Could you provide an MRE?
You might want to check if you have more than one column with the same name which is being sent to pd.to_datetime. It solved the datetime being converted to timestamp problem for me.
This appears to have been a bug in pandas that was fixed with the release of v1.0. The example code above now produces the expected results.
I want to convert an integer-type date to datetime,
e.g. 20130601000011 (2013-06-01 00:00:11).
I don't know exactly how to use pd.to_datetime.
Any advice would be appreciated, thanks.
P.S. My script is below:
rent_date_raw = pd.Series(1, rent['RENT_DATE'])
return_date_raw = pd.Series(1, rent['RETURN_DATE'])
rent_date = pd.Series([pd.to_datetime(date)
for date in rent_date_raw])
daily_rent_ts = rent_date.resample('D', how='count')
monthly_rent_ts = rent_date.resample('M', how='count')
Pandas seems to deal with your format fine as long as you convert to string first:
import pandas as pd
eg_date = 20130601000011
pd.to_datetime(str(eg_date))
Out[4]: Timestamp('2013-06-01 00:00:11')
Your data at the moment is really more of a string than an integer, since it doesn't really represent a single number. Different subparts of the string reflect different aspects of the time.
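Applied to a whole column rather than a single value, the same idea works with an explicit format string, which removes any parsing ambiguity. A sketch with a made-up Series standing in for the rental data:

```python
import pandas as pd

rent_dates = pd.Series([20130601000011, 20130601000120])

# Convert to string first, then parse with an explicit yyyymmddHHMMSS format.
ts = pd.to_datetime(rent_dates.astype(str), format="%Y%m%d%H%M%S")
print(ts)
```

The resulting datetime64 Series can then be used as an index for the daily/monthly resampling in the question.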