In an Excel cell I call the function =fff(DATE(2001,1,1)). To receive the date argument, I use the following code (xlwings 0.10.0):
@xw.func
@xw.arg('req_date', dates=datetime.date)  # I also tried datetime.datetime
def fff(req_date):
    print(req_date)
which prints just a number, not a datetime object. I worked around the problem with
req_date = datetime.datetime(1899, 12, 30)
+ datetime.timedelta(days=req_date)
but I wonder what I did wrong with the xlwings approach?
xlwings only performs the automatic transformation into a datetime object if the cell in Excel is formatted as a Date. That is, if you put =DATE(2001,1,1) into one cell and then in a different cell write =fff(A1), it will work as you expect (assuming that the date formula is in A1).
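If the argument does arrive as a raw serial number, the manual conversion from the question can be wrapped in a small helper. This is a minimal sketch; 1899-12-30 is Excel's date epoch on Windows (shifted two days from 1900-01-01 to compensate for Excel's phantom 1900-02-29 and its 1-based counting):

```python
import datetime

def excel_serial_to_datetime(serial):
    """Convert an Excel date serial number to a datetime.

    Excel (Windows) counts days from 1899-12-30, so adding
    the serial as a timedelta recovers the calendar date.
    """
    return datetime.datetime(1899, 12, 30) + datetime.timedelta(days=serial)

# DATE(2001,1,1) corresponds to serial number 36892
print(excel_serial_to_datetime(36892))  # 2001-01-01 00:00:00
```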
Related
Is it possible to write date values to Excel using pywin32 without the time? Even though the datetime object I create has no time or timezone associated with it, writing the value to Excel still adds an hour component related to UTC. How can I solve this simple problem?
import win32com.client
from datetime import datetime
excel = win32com.client.Dispatch('Excel.Application')
excel.Visible = True
wb = excel.Workbooks.Add()
ws = wb.Sheets['Sheet1']
# Writes '01/01/2019 03:00:00' instead of '01/01/2019'
ws.Cells(1, 1).Value = datetime(2019, 1, 1)
If you just want the date with no time of day, you can call datetime.date() to get it. Unfortunately, the value must be converted to a string because win32com.client won't accept a datetime.date object directly.
# Writes '1/1/2019'
ws.Cells(1, 1).Value = str(datetime(2019, 1, 1).date())
Update:
You can work around the cell holding a text entry by assigning an Excel formula to the cell instead. Doing this allows you to use the cell more easily in conjunction with other formulas and Excel's other capabilities (such as sorting, charting, etc.).
# Writes 1/1/2019
ws.Cells(1, 1).Formula = datetime(2019, 1, 1).strftime('=DATE(%Y,%m,%d)')
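The strftime trick works because the format codes can sit inside arbitrary literal text, so the template produces an ordinary Excel formula string (note that %m and %d zero-pad, which Excel's DATE function accepts):

```python
from datetime import datetime

# Build the formula string that would be assigned to the cell
formula = datetime(2019, 1, 1).strftime('=DATE(%Y,%m,%d)')
print(formula)  # =DATE(2019,01,01)
```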
Using an API, I retrieved a Tibble (an R object) in Python (via rpy2.robjects); it is a very large two-dimensional table. It contains a column with dates in the format "YYYY-MM-DD" when I print the Tibble object. When I grab the date in Python (simply by indexing the Tibble), it is converted to a five-digit float. For example, the date "2019-09-28" is converted to the float 18167.0. I'm not sure how to convert it back to a string date (e.g. "YYYY-MM-DD").
Does anyone have any ideas? I'm happy to clarify anything that I can :)
Edit: The answer I discovered with help was the following
import pandas as pd
pd.to_datetime(18167.0, unit='d', origin='1970-01-01')
If the Date class got converted to numeric storage mode, we can use as.Date with origin
as.Date(18167, origin = "1970-01-01")
#[1] "2019-09-28"
The Date storage mode is numeric
storage.mode(Sys.Date())
#[1] "double"
In Python, we can also do
from datetime import datetime, date, time
date.fromordinal(int(18167) + date(1970, 1, 1).toordinal()).strftime("%Y-%m-%d")
#'2019-09-28'
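The snippets above all do the same conversion; a plain-Python version using timedelta, assuming the float counts days since the Unix epoch 1970-01-01 (which is R's Date origin):

```python
from datetime import date, timedelta

def epoch_days_to_iso(days):
    """Convert a days-since-1970-01-01 count to an ISO date string."""
    return (date(1970, 1, 1) + timedelta(days=days)).isoformat()

print(epoch_days_to_iso(18167.0))  # 2019-09-28
```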
I have the following datatable, which I would like to filter by dates greater than "2019-01-01". The problem is that the dates are strings.
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']})
This is my best attempt.
dt_dates[f.days_date > datetime.strptime(f.days_date, "2019-01-01")]
this returns the error
TypeError: strptime() argument 1 must be str, not Expr
What is the best way to filter dates in Python's datatable?
Reference
python datatable
f-expressions
Your strptime syntax for converting a string to a datetime is incorrect.
What you're looking for is:
dt_dates[f.days_date > datetime.strptime(f.days_date, "%Y-%m-%d")]
where the second argument to strptime is the date format.
However, let's take a step back, because this isn't the right way to do it.
First, we should convert all the dates in your Frame to datetimes. I'll be honest, I've never used datatable, but the syntax looks extremely similar to pandas' DataFrame.
In a dataframe, we can do the following:
df_date['days_date'] = df_date['days_date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))
This goes through each row and converts each string in the 'days_date' column into a datetime.
From there, we can use a filter to get the relevant rows:
df_date = df_date[df_date['days_date'] > datetime.strptime("2019-01-01", "%Y-%m-%d")]
datatable version 1.0.0 introduced native support for date and time data types. Note the difference between these two ways to initialize the data:
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']})
dt_dates.stypes
> (stype.str32,)
and
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']}, stype="date32")
dt_dates.stypes
> (stype.date32,)
The latter frame contains days_date column of type datatable.Type.date32 that represents a calendar date. Then one can filter by date as follows:
split_date = datetime.datetime.strptime("2019-01-01", "%Y-%m-%d")
dt_split_date = dt.time.ymd(split_date.year, split_date.month, split_date.day)
dt_dates[dt.f.days_date > dt_split_date, :]
The default format of the csv is dd/mm/yyyy. When I convert it to datetime with df['Date'] = pd.to_datetime(df['Date']), it changes the format to mm/dd/yyyy.
Then I used df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%d/%m/%Y')
to convert to dd/mm/yyyy, but the values are then in string (object) format. However, I need them in datetime format. When I use df['Date'] = pd.to_datetime(df['Date']) again, it goes back to the previous format. I need your help.
You can use the parse_dates and dayfirst arguments of pd.read_csv, see: the docs for read_csv()
df = pd.read_csv('myfile.csv', parse_dates=['Date'], dayfirst=True)
This will read the Date column as datetime values, correctly taking the first part of the date input as the day. Note that in general you will want your dates to be stored as datetime objects.
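For example, with a small in-memory csv (hypothetical sample data) you can check that the day really is taken first:

```python
import io
import pandas as pd

# dd/mm/yyyy dates: 2 Jan 2019 and 3 Jan 2019
csv_text = "Date,Value\n02/01/2019,10\n03/01/2019,20\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)
print(df['Date'].dt.day.tolist())  # [2, 3] -- parsed as day, not month
```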
Then, if you need to output the dates as a string you can call dt.strftime():
df['Date'].dt.strftime('%d/%m/%Y')
When I use again this: df['Date'] = pd.to_datetime(df['Date']), it gets back to the previous format.
No, you cannot simultaneously have the string format of your choice and keep your series of type datetime. As remarked here:
datetime series are stored internally as integers. Any human-readable date representation is just that, a representation, not the underlying integer. To access your custom formatting, you can use methods available in Pandas. You can even store such a text representation in a pd.Series variable:
formatted_dates = df['datetime'].dt.strftime('%m/%d/%Y')
The dtype of formatted_dates will be object, which indicates that the elements of your series point to arbitrary Python objects. In this case, those arbitrary objects happen to all be strings.
Lastly, I strongly recommend you do not convert a datetime series to strings until the very last step in your workflow. This is because as soon as you do so, you will no longer be able to use efficient, vectorised operations on such a series.
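The dtype change described above is easy to observe directly:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2019-01-01', '2019-01-02']))
print(dates.dtype)        # datetime64[ns]

formatted = dates.dt.strftime('%d/%m/%Y')
print(formatted.dtype)    # object -- the values are now plain strings
print(formatted.iloc[0])  # 01/01/2019
```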
This solution will work for all cases where a column has mixed date formats. Add more conditions to the function if needed. Pandas to_datetime() function was not working for me, but this seems to work well.
import datetime
import pandas as pd

def format(val):
    a = pd.to_datetime(val, errors='coerce', cache=False).strftime('%m/%d/%Y')
    try:
        # Try a day-first reading of the reformatted string first
        date_time_obj = datetime.datetime.strptime(a, '%d/%m/%Y')
    except ValueError:
        date_time_obj = datetime.datetime.strptime(a, '%m/%d/%Y')
    return date_time_obj.date()
Saving the changes to the same column.
df['Date'] = df['Date'].apply(lambda x: format(x))
Saving as CSV.
df.to_csv(f'{file_name}.csv', index=False, date_format='%s')
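An alternative to the row-by-row function above is a two-pass vectorised parse: try one explicit format per pass with errors='coerce', then fill the gaps from the other pass. This is a sketch; the two formats here are assumptions about what the mixed data contains:

```python
import pandas as pd

s = pd.Series(['2019-01-31', '31/01/2019'])  # hypothetical mixed formats

# Each pass parses only the rows matching its format; the rest become NaT
iso = pd.to_datetime(s, format='%Y-%m-%d', errors='coerce')
dmy = pd.to_datetime(s, format='%d/%m/%Y', errors='coerce')
parsed = iso.fillna(dmy)
print(parsed.dt.strftime('%Y-%m-%d').tolist())  # ['2019-01-31', '2019-01-31']
```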
I was having trouble manipulating time-series data provided to me for a project. The data contains the number of flight bookings made on a website per second over a 30-minute period. Here is part of the column containing the timestamps:
>>> df['Date_time']
0 7/14/2017 2:14:14 PM
1 7/14/2017 2:14:37 PM
2 7/14/2017 2:14:38 PM
I wanted to do
>>> df = df.set_index('Date_time')
and use the datetime and timedelta methods provided by pandas to generate the timestamp to be used as index to access and modify any value in any cell.
Something like
>>> import datetime as dt
>>> td = dt.datetime(year=2017, month=7, day=14, hour=2, minute=14, second=36)
>>> td1 = dt.timedelta(minutes=1, seconds=58)
>>> ti1 = td1 + td
>>> df.at[ti1, 'column_name'] = 65000
But the timestamp generated is of the form
>>> print(ti1)
2017-07-14 02:16:34
which clearly cannot be used directly as an index in my case. Is there a workaround for the above without writing additional methods myself?
I want to do the above because it gives me a greater level of control over the data than looking up the default numerical index of each row I want to update, and so it will be more efficient, according to me.
Can you check the dtype of the 'Date_time' column and confirm that it is string (object)?
df.dtypes
If so, you should be able to cast the values to pd.Timestamp by using the following.
df['timestamp'] = df['Date_time'].apply(pd.Timestamp)
When we call .dtypes now, we should have a 'timestamp' field of type datetime64[ns], which allows us to use builtin pandas methods more easily.
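Using the sample rows from the question, the cast looks like this (a sketch with a trimmed-down frame):

```python
import pandas as pd

df = pd.DataFrame({'Date_time': ['7/14/2017 2:14:14 PM',
                                 '7/14/2017 2:14:37 PM']})

# Cast each string to a pd.Timestamp; pandas infers datetime64[ns]
df['timestamp'] = df['Date_time'].apply(pd.Timestamp)
print(df['timestamp'].dtype)    # datetime64[ns]
print(df['timestamp'].iloc[0])  # 2017-07-14 14:14:14
```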
I would suggest it is prudent to index the dataframe by the timestamp too, achieved by setting the index equal to that column.
df.set_index('timestamp', inplace=True)
We should now be able to use some more useful methods such as
df.loc[timestamp_to_check, :]
df.loc[start_time_stamp : end_timestamp, : ]
df.asof(timestamp_to_check)
to lookup values from the DataFrame based upon passing a datetime.datetime / pd.Timestamp / np.datetime64 into the above. Note that you will need to cast any string (object) 'lookups' to one of the above types in order to make use of the above correctly.
I prefer to use pd.Timestamp() - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html to handle datetime conversion from strings unless I am explicitly certain of what format the datetime string is always going to be in.
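Tying it all together for the update the question wanted (a sketch; the 'bookings' column and its values are made-up sample data):

```python
import pandas as pd

df = pd.DataFrame({'Date_time': ['7/14/2017 2:14:14 PM',
                                 '7/14/2017 2:14:37 PM',
                                 '7/14/2017 2:14:38 PM'],
                   'bookings': [5, 7, 3]})

# Cast to timestamps and index by them
df['timestamp'] = df['Date_time'].apply(pd.Timestamp)
df.set_index('timestamp', inplace=True)

# Point lookup and in-place update with a constructed Timestamp
ts = pd.Timestamp(2017, 7, 14, 14, 14, 37)
df.at[ts, 'bookings'] = 65000
print(df.loc[ts, 'bookings'])  # 65000
```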