Sadly, the answer to this question for datetime.date does not work for datetime.time.
So I implemented a df.apply() function that does what I expect:
def get_ts_timeonly_float(timeonly):
    if isinstance(timeonly, datetime.time):
        return timeonly.hour * 3600 + timeonly.minute * 60 + timeonly.second
    elif isinstance(timeonly, pd.Timedelta):
        return timeonly.seconds

fn_get_ts_timeonly_pd_timestamp = lambda row: get_ts_timeonly_float(row.ts_timeonly)
col = df.apply(fn_get_ts_timeonly_pd_timestamp, axis=1)
df = df.assign(ts_timeonly_as_ts=col.values)
Problem:
However, this is not yet “blazingly fast.” One reason is that .apply()
will try internally to loop over Cython iterators. But in this case,
the lambda that you passed isn’t something that can be handled in
Cython, so it’s called in Python, which is consequently not all that
fast.
This is explained in a great blog post.
So is there a faster method to convert datetime.time into some int representation (like total seconds since the start of the day)? Thanks!
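For reference, one possible vectorized route (a sketch under the assumption that ts_timeonly holds datetime.time objects, not an authoritative answer): round-trip the values through strings and pd.to_timedelta, which avoids calling Python code per row.

import datetime
import pandas as pd

# Hypothetical frame mirroring the question's ts_timeonly column.
df = pd.DataFrame({"ts_timeonly": [datetime.time(1, 2, 3), datetime.time(23, 59, 59)]})

# Convert to "HH:MM:SS" strings, parse them as timedeltas, then take total seconds.
df["ts_timeonly_as_ts"] = pd.to_timedelta(df["ts_timeonly"].astype(str)).dt.total_seconds()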
Related
I've recently started to use vaex for its great potential on large datasets.
I'm trying to apply the following function:
from typing import List

def get_columns(v: str, table_columns: List, pref: str = '', suff: str = '') -> List:
    return [table_columns.index(i) for i in table_columns if (pref + v + suff) in i][0]
to a df as follows:
df["column_day"] = df.apply(get_columns, arguments=[df.part_day, table.columns.tolist(), "total_4wk_"])
but I get the following error when I run df["column_day"]:
NameError: Column or variable 'total_4wk_' does not exist.
I do not understand what I am doing wrong, since other functions (with only one argument) I used with apply worked fine.
Thanks.
I believe vaex expects the arguments passed to apply to actually be expressions.
In your case, table.columns.tolist() and "total_4wk_" are not expressions, so it complains. I would rewrite your get_columns function so that it only takes expressions as arguments; I believe that will work.
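A rough sketch of that rewrite (names taken from the question; everything else is hypothetical): bind the column list and prefix outside the function, so that apply only receives the df.part_day expression.

table_columns = table.columns.tolist()  # plain Python list, bound outside apply
pref = "total_4wk_"

def get_column_index(v):
    # v is the per-row value of the df.part_day expression
    return [table_columns.index(i) for i in table_columns if (pref + str(v)) in i][0]

df["column_day"] = df.apply(get_column_index, arguments=[df.part_day])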
I'm new to python and pandas but I have a problem I cannot wrap my head around.
I'm trying to add a new column to my DataFrame. To achieve that I use the assign() function.
Most of the examples on the internet are painfully trivial and I cannot find a solution for my problem.
What works:
my_dataset.assign(new_col=lambda x: my_custom_long_function(x['long_column']))
def my_custom_long_function(input):
    return input * 2
What doesn't work:
my_dataset.assign(new_col=lambda x: my_custom_string_function(x['string_column']))

def my_custom_string_function(input):
    return input.upper()
What confuses me is that in the debug I can see that even for my_custom_long_function the parameter is a Series, not a long.
I just want to use the lambda function and pass a value of the column to do my already written complicated functions. How do I do this?
Edit: The example here is just for demonstration purposes; the real code is basically an existing complex function that does not care about pandas' types and needs a str as a parameter.
Because the column (a Series) doesn't have an upper method; in order to use it, you need str.upper:
my_dataset.assign(new_col=lambda x: my_custom_string_function(x['string_column']))

def my_custom_string_function(input):
    return input.str.upper()
That said, I would use:
my_dataset['new column'] = my_dataset['string_column'].str.upper()
For efficiency.
Edit:
my_dataset['new column'] = my_dataset['string_column'].apply(lambda x: my_custom_string_function(x))
def my_custom_string_function(input):
    return input.upper()
I am trying to rewrite a Python script in MATLAB and I don't really understand the last line here:
import scipy as sp  # assumed import; the snippet uses sp.linspace

a = 1
v = 0.5
nx = 32
x = sp.linspace(-2.,2.,nx)
dx = (max(x)-min(x))/nx
dt = a*dx.min()/abs(v)
I am struggling with the dt definition. In the code a, dx and v are real numbers. Why is there a .min and why is the bracket empty?
I am sorry for my ignorance, but I am really new to python.
Calling min() with the empty parentheses outputs the minimum value of that object; without the parentheses you would just get the method object itself (function float64.min).
Because dx is a scalar, taking its minimum is meaningless; it has only one value to begin with.
The conversion you seek should be:
dt = a*min(dx)./abs(v)
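As a quick illustration of why the empty parentheses matter (a sketch assuming, as in the question, that x is a NumPy array, so dx ends up as a NumPy scalar rather than a plain Python float):

import numpy as np

x = np.linspace(-2., 2., 32)
dx = (max(x) - min(x)) / 32   # a numpy.float64 scalar, not a plain Python float

print(dx.min)    # the bound method itself: <built-in method min of numpy.float64 ...>
print(dx.min())  # 0.125; the "minimum" of a scalar is just its value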
This Python code is invalid; it should not run, because plain ints do not have a min attribute.
Result: AttributeError: 'int' object has no attribute 'min'
I have this datetime: 2011-07-02 03:03:32.793
To deal with the millisecond issue in Python 2.5 (mentioned here), I try to truncate it and convert it to a datetime as:
import_transform: 'lambda x: x[:18]'
import_transform: transform.import_date_time('%Y-%m-%d %H:%M:%S')
How can I write these two import_transform in one line?
You can do this by writing a lambda function that does both together:
import_transform: lambda x: transform.import_date_time('%Y-%m-%d %H:%M:%S')(x[:18])
Note the chained function calls - transform.import_date_time returns a function, which you're then calling.
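A standalone sketch of that chained-call pattern, with a stand-in for import_date_time (the real transform module from the question is assumed, so this is purely illustrative):

from datetime import datetime

def import_date_time(fmt):
    # Stand-in for transform.import_date_time: returns a parser bound to fmt.
    return lambda s: datetime.strptime(s, fmt)

combined = lambda x: import_date_time('%Y-%m-%d %H:%M:%S')(x.split('.')[0])
print(combined('2011-07-02 03:03:32.793'))  # 2011-07-02 03:03:32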
I am having a hard time coming up with a slick way to handle this sort. I have data coming back from a database read. I want to sort on accountingdate. However, accountingdate may sometimes be null. I am currently doing the following:
results = sorted(results, key=operator.itemgetter('accountingdate'), reverse=True)
But this bombs with "TypeError: can't compare datetime.date to NoneType" due to some accountingdates being null.
What is the "most correct" or "most Pythonic" way to handle this?
Using a key= function is definitely right; you just have to decide how you want to treat the None values: pick a datetime value that you want to treat as the equivalent of None for sorting purposes. E.g.:
import datetime

mindate = datetime.date(datetime.MINYEAR, 1, 1)

def getaccountingdate(x):
    return x['accountingdate'] or mindate

results = sorted(results, key=getaccountingdate, reverse=True)
Just see how much simpler this is than defining a cmp function instead -- and if you do some benchmarking you'll find it's also significantly faster! There's no upside at all in using a cmp function instead of this key function, and it would be a bad design choice to do so.
You could use a custom sorting function that treats None specially:
def nonecmp(a, b):
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    return cmp(a, b)
results = sorted(results, cmp=nonecmp, ...)
This treats None as being smaller than all datetime objects.
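One caveat: the cmp= keyword and the built-in cmp() only exist in Python 2. A rough Python 3 equivalent of the same comparator (a sketch using functools.cmp_to_key; results is the list from the question) would be:

from functools import cmp_to_key

def nonecmp(a, b):
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    return (a > b) - (a < b)   # replacement for the removed built-in cmp()

results = sorted(results, key=cmp_to_key(nonecmp), reverse=True)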