Suppose I have a function like this:
from toolz.curried import *
@curry
def foo(x, y):
    print(x, y)
Then I can call:
foo(1,2)
foo(1)(2)
Both calls print the same output, as expected.
However, I would like to do something like this:
@curry.inverse  # hypothetical
def bar(*args, last):
    print(*args, last)
bar(1, 2, 3)(last)
The idea behind this is that I would like to pre-configure a function and then put it in a pipe like this:
pipe(data,
     f1,  # another function
     bar(1, 2, 3)  # unknown number of arguments
)
Then, bar(1,2,3)(data) would be called as a part of the pipe. However, I don't know how to do this. Any ideas? Thank you very much!
Edit:
A more illustrative example was asked for. Thus, here it comes:
import pandas as pd
from toolz.curried import *
df = pd.DataFrame(data)
def filter_columns(*args, df):
    return df[[*args]]
pipe(df,
     transformation_1,
     transformation_2,
     filter_columns("date", "temperature")
)
As you can see, the DataFrame is piped through the functions, and filter_columns is one of them. However, the function is pre-configured and returns a function that only takes a DataFrame, similar to a decorator. The same behaviour could be achieved with this:
def filter_columns(*args):
    def f(df):
        return df[[*args]]
    return f
However, I would always have to run two calls then, e.g. filter_columns()(df), and that is what I would like to avoid.
Well, I am unfamiliar with the toolz module, but it looks like there is no easy way to curry a function that takes an arbitrary number of arguments, so let's try something else.
First, as an alternative to
def filter_columns(*args):
    def f(df):
        return df[*args]
    return f
(and by the way, df[*args] is a syntax error before Python 3.11; df[[*args]] is the form that selects a list of columns)
To avoid filter_columns()(data), you can take the last element of the arguments as the DataFrame and use slice notation to take everything before it as the column names, for example:
def filter_columns(*argv):
    df, columns = argv[-1], argv[:-1]
    return df[list(columns)]  # a list selects several columns; a tuple would be read as one key
Use it as filter_columns(df), filter_columns("date", "temperature", df), etc.
Then use functools.partial to build the partially applied filter for your pipe, for example:
from functools import partial
from toolz.curried import pipe  # import names explicitly; a star import can silently overwrite something else you use
pipe(df,
     transformation_1,
     transformation_2,
     partial(filter_columns, "date", "temperature")
)
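If you want the filter_columns("date", "temperature") call style from the question without typing partial at each use, a small decorator can do the wrapping. The sketch below is only an illustration: configure_first is a hypothetical name (it is not part of toolz), plain dicts stand in for DataFrames, and a hand-written pipe replaces the toolz one so that the example runs with the standard library alone.

```python
from functools import reduce, wraps


def configure_first(func):
    """Hypothetical helper: the first call stores the configuration
    arguments, the second call supplies the data as the last argument."""
    @wraps(func)
    def configured(*args, **kwargs):
        def apply_to(data):
            return func(*args, data, **kwargs)
        return apply_to
    return configured


@configure_first
def filter_columns(*args):
    # The data arrives last; everything before it is a column name.
    *columns, table = args
    return {name: table[name] for name in columns}


def pipe(data, *funcs):
    # Stand-in for toolz.pipe: feed data through each function in turn.
    return reduce(lambda value, func: func(value), funcs, data)


table = {"date": ["2020-01-01"], "temperature": [21.5], "humidity": [0.4]}
result = pipe(table, filter_columns("date", "temperature"))
print(result)
```

The trade-off is that a function decorated this way can no longer be called with its data in a single call; it always needs the second call, which the pipe performs for you.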
In Python we can assign a function to a variable. For example, the math.sin function:
import math
sin = math.sin
rad = math.radians
print(sin(rad(my_number_in_degrees)))
Is there an easy way to assign a combination of functions (i.e. a function of a function) to a variable? For example:
sin = math.sin(math.radians)  # I cannot use this with brackets
print(sin(my_number_in_degrees))
Just create a wrapper function:
def sin_rad(degrees):
    return math.sin(math.radians(degrees))
Call your wrapper function as normal:
print(sin_rad(my_number_in_degrees))
I think what the author wants is some form of function chaining. In general this is difficult, but it is possible for functions that
take a single argument,
return a single value, and
return a value of the type that the next function in the list expects as its input.
Let us say that there is a list of functions that we need to chain, all of which take a single argument and return a single value, and that the types are consistent. Something like this ...
functions = [np.sin, np.cos, np.abs]
Would it be possible to write a general function that chains all of these together? Well, we can use reduce, although Guido doesn't particularly like map and reduce, and in Python 3 reduce was moved out of the builtins into functools ...
Something like this ...
>>> reduce(lambda m, n: n(m), functions, 3)
0.99005908575986534
Now how do we create a function that does this? Well, just create a function that takes the list of functions and returns a new function:
import numpy as np
from functools import reduce  # reduce is no longer a builtin on Python 3
def chainFunctions(functions):
    def innerFunction(y):
        return reduce(lambda m, n: n(m), functions, y)
    return innerFunction
if __name__ == '__main__':
    functions = [np.sin, np.cos, np.abs]
    ch = chainFunctions(functions)
    print(ch(3))
You could write a helper function to perform the function composition for you and use it to create the kind of variable you want. Some nice features are that it can combine a variable number of functions, and that the innermost (last-listed) function can accept a variable number of arguments; every other function receives the single value returned by the one after it.
import math
try:
    reduce
except NameError:  # Python 3
    from functools import reduce
def compose(*funcs):
    """ Compose a group of functions (f(g(h(...)))) into a single composite func. """
    return reduce(lambda f, g: lambda *args, **kwargs: f(g(*args, **kwargs)), funcs)
sindeg = compose(math.sin, math.radians)
print(sindeg(90))  # -> 1.0
I am trying to create a function which resamples time series data in pandas. I would like to have the option to specify the type of aggregation that occurs depending on what type of data I am sending through (i.e. for some data, taking the sum of each bin is appropriate, while for others, taking the mean is needed, etc.). For example data like these:
import pandas as pd
import numpy as np
dr = pd.date_range('01-01-2020', '01-03-2020', freq='1H')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
I could have a function like this:
def process(df, freq='3H', method='sum'):
    r = df.resample(freq)
    if method == 'sum':
        r = r.sum()
    elif method == 'mean':
        r = r.mean()
    #...
    #more options
    #...
    return r
For a small amount of aggregation methods, this is fine, but seems like it could be tedious if I wanted to select from all of the possible choices.
I was hoping to use getattr to implement something like this post (under "Putting it to work: generalizing method calls"). However, I can't find a way to do this:
def process2(df, freq='3H', method='sum'):
    r = df.resample(freq)
    foo = getattr(r, method)
    return r.foo()
#fails with:
#AttributeError: 'DatetimeIndexResampler' object has no attribute 'foo'
def process3(df, freq='3H', method='sum'):
    r = df.resample(freq)
    foo = getattr(r, method)
    return foo(r)
#fails with:
#TypeError: __init__() missing 1 required positional argument: 'obj'
I get why process2 fails (calling r.foo() looks for the method foo() of r, not the variable foo). But I don't think I get why process3 fails.
I know another approach would be to pass functions to the parameter method, and then apply those functions on r. My inclination is that this would be less efficient? And it still doesn't allow me to access the built-in Resample methods directly.
Is there a working, more concise way to achieve this? Thanks!
Try df.resample(freq).apply(method), which accepts the name of the aggregation as a string.
But unless you are planning some more computation inside the function, it will probably be easier to just hard-code this line.
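As for why process3 fails: getattr(r, method) already returns a method bound to r, so it has to be called with no arguments, as foo(), rather than as r.foo() or foo(r). The sketch below shows the pattern on a small stand-in class (Bins is hypothetical and only mimics the sum and mean methods of a resampler) so that it runs without pandas; on a real resampler the equivalent line would presumably be getattr(df.resample(freq), method)().

```python
class Bins:
    """Hypothetical stand-in for a resampler: groups of values with
    aggregation methods that are looked up by name."""

    def __init__(self, groups):
        self.groups = groups

    def sum(self):
        return [sum(group) for group in self.groups]

    def mean(self):
        return [sum(group) / len(group) for group in self.groups]


def process(bins, method="sum"):
    aggregate = getattr(bins, method)  # bound method: bins is already attached
    return aggregate()                 # so it is called with no arguments


bins = Bins([[1, 2, 3], [4, 5, 6]])
print(process(bins, "sum"))   # [6, 15]
print(process(bins, "mean"))  # [2.0, 5.0]
```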
I have a function which calculates features from my data.
Here is a dummy sample of it:
import numpy as np
val1=[1,2,3,4,5,6,7,8,9]
val2=[2,4,6,8,10,12,14,16]
data=[]
def feature_cal(val):
    val=np.array(val)
    value=val*2
    data.append(np.mean(value))
feature_cal(val1)
feature_cal(val2)
What I want is to pass the function (np.mean here) into feature_cal as an argument instead of hard-coding it inside.
Pseudo code
def feature_cal(val,method):
    val=np.array(val)
    value=val*2
    data.append(method(value))
feature_cal(val1,method=np.mean())
feature_cal(val2,method=np.mean())
This will help me to calculate other features such as np.std(), np.var() without changing the original function
To pass the function you need to remove the parentheses after np.mean:
import numpy as np
def feature_cal(val, method):
    val = np.array(val)
    value = val*2
    data.append(method(value))
feature_cal(val1, method=np.mean)
feature_cal(val2, method=np.mean)
EDIT
If you need to pass arguments to np.mean you can use functools.partial:
import numpy as np
import functools
def feature_cal(val, method):
    val = np.array(val)
    value = val*2
    data.append(method(value))
# val1 and val2 are 1-D, so axis=0 is the only valid axis (axis=1 would raise an error)
bound_function = functools.partial(np.mean, axis=0)
feature_cal(val1, method=bound_function)
feature_cal(val2, method=bound_function)
If I understood you correctly, you need to pass the callable itself and not the result of calling it, as you do now. So this line
feature_cal(val1,method=np.mean())
should read
feature_cal(val1,method=np.mean)
You can pass a function as an argument by writing its name without parentheses; inside the receiving function you then call the parameter with parentheses:
def feature_cal(val,method):
    val=np.array(val)
    value=val*2
    data.append(method(value))
feature_cal(val1,method=np.mean)
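The same idea works with any callable, including one pre-configured with functools.partial. Here is a minimal sketch that uses the standard library's statistics module in place of NumPy; the rounded_mean helper and the choice to return the value instead of appending to a global list are illustrative assumptions, not part of the answers above.

```python
import statistics
from functools import partial


def feature_cal(val, method):
    # Double every value, then reduce the result with the callable passed in.
    doubled = [v * 2 for v in val]
    return method(doubled)


def rounded_mean(values, ndigits):
    # Hypothetical helper that needs an extra argument besides the data.
    return round(statistics.mean(values), ndigits)


val1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

features = [
    feature_cal(val1, statistics.mean),
    feature_cal(val1, statistics.pvariance),
    # partial fixes ndigits now; feature_cal supplies the data later
    feature_cal(val1, partial(rounded_mean, ndigits=1)),
]
print(features)
```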
I know this is super basic and I have been searching everywhere, but I am still confused by everything I'm seeing. I am not sure of the best way to do this and am having a hard time wrapping my head around it.
I have a script where I have multiple functions. I would like the first function to pass its output to the second, then the second to pass its output to the third, etc. Each does its own step in an overall process applied to the starting dataset.
For example, very simplified with bad names but this is to just get the basic structure:
#!/usr/bin/python
# script called process.py
import sys
infile = sys.argv[1]
def function_one():
    do things
    return function_one_output
def function_two():
    take output from function_one, and do more things
    return function_two_output
def function_three():
    take output from function_two, do more things
    return/print function_three_output
I want this to run as one script and print the output/write to new file or whatever which I know how to do. Just am unclear on how to pass the intermediate outputs of each function to the next etc.
infile -> function_one -> (intermediate1) -> function_two -> (intermediate2) -> function_three -> final result/outfile
I know I need to use return, but I am unsure how to call this at the end to get my final output
Individually?
function_one(infile)
function_two()
function_three()
or within each other?
function_three(function_two(function_one(infile)))
or within the actual function?
def function_one():
    do things
    return function_one_output
def function_two():
    input_for_this_function = function_one()
    # etc etc etc
Thank you friends, I am over complicating this and need a very simple way to understand it.
You could define a data streaming helper function
from functools import reduce
def flow(seed, *funcs):
    return reduce(lambda arg, func: func(arg), funcs, seed)
flow(infile, function_one, function_two, function_three)
#for example
flow('HELLO', str.lower, str.capitalize, str.swapcase)
#returns 'hELLO'
Edit:
I would now suggest that a more "pythonic" way to implement the flow function above is:
def flow(seed, *funcs):
    for func in funcs:
        seed = func(seed)
    return seed
As ZdaR mentioned, you can run each function and store the result in a variable then pass it to the next function.
def function_one(file):
    do things on file
    return function_one_output
def function_two(myData):
    doThings on myData
    return function_two_output
def function_three(moreData):
    doMoreThings on moreData
    return/print function_three_output
def Main():
    firstData = function_one(infile)
    secondData = function_two(firstData)
    function_three(secondData)
This is assuming your function_three would write to a file or doesn't need to return anything. Another method, if these three functions will always run together, is to call them inside function_three. For example...
def function_three(file):
    firstStep = function_one(file)
    secondStep = function_two(firstStep)
    doThings on secondStep
    return/print to file
Then all you have to do is call function_three in your main and pass it the file.
For safety, readability and debugging ease, I would temporarily store the results of each function.
def function_one():
    do things
    return function_one_output
def function_two(function_one_output):
    take function_one_output and do more things
    return function_two_output
def function_three(function_two_output):
    take function_two_output and do more things
    return/print function_three_output
result_one = function_one()
result_two = function_two(result_one)
result_three = function_three(result_two)
The added benefit here is that you can then check that each function is correct. If the end result isn't what you expected, just print the results you're getting or perform some other check to verify them. (also if you're running on the interpreter they will stay in namespace after the script ends for you to interactively test them)
result_one = function_one()
print(result_one)
result_two = function_two(result_one)
print(result_two)
result_three = function_three(result_two)
print(result_three)
Note: I used multiple result variables, but as PM 2Ring notes in a comment you could just reuse the name result over and over. That'd be particularly helpful if the results would be large variables.
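If you like the flow helper from the earlier answer but still want to see every intermediate result, the two ideas can be combined. A rough sketch (flow_traced and its trace list are names made up for this example):

```python
def flow_traced(seed, *funcs):
    """Run seed through funcs in order and record each intermediate result."""
    trace = []
    result = seed
    for func in funcs:
        result = func(result)
        # Not every callable has a __name__ (e.g. functools.partial objects),
        # so fall back to its repr.
        trace.append((getattr(func, "__name__", repr(func)), result))
    return result, trace


final, trace = flow_traced("HELLO", str.lower, str.capitalize, str.swapcase)
for name, value in trace:
    print(name, "->", value)
print(final)  # hELLO
```

Keeping the trace in a list rather than printing inside the loop lets the caller decide whether to log it, inspect it in a debugger, or ignore it.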
It's always better (for readability, testability and maintainability) to keep your functions as decoupled as possible, and to write them so the output only depends on the input whenever possible.
So in your case, the best way is to write each function independently, ie:
def function_one(arg):
    do_something()
    return function_one_result
def function_two(arg):
    do_something_else()
    return function_two_result
def function_three(arg):
    do_yet_something_else()
    return function_three_result
Once you're there, you can of course directly chain the calls:
result = function_three(function_two(function_one(arg)))
but you can also use intermediate variables and try/except blocks if needed for logging / debugging / error handling etc:
r1 = function_one(arg)
logger.debug("function_one returned %s", r1)
try:
    r2 = function_two(r1)
except SomePossibleException as e:
    logger.exception("function_two raised %s for %s", e, r1)
    # either return, re-raise, ask the user what to do etc
    return 42  # when in doubt, always return 42 !
else:
    r3 = function_three(r2)
    print("Yay ! result is %s" % r3)
As an extra bonus, you can now reuse these three functions anywhere, each on its own and in any order.
NB : of course there ARE cases where it just makes sense to call a function from another function... Like, if you end up writing:
result = function_three(function_two(function_one(arg)))
everywhere in your code AND it's not an accidental repetition, it might be time to wrap the whole in a single function:
def call_them_all(arg):
    return function_three(function_two(function_one(arg)))
Note that in this case it might be better to decompose the calls into separate steps, as you'll find out when you have to debug it...
I'd do it this way:
def function_one(x):
    # do things
    output = x ** 1
    return output
def function_two(x):
    output = x ** 2
    return output
def function_three(x):
    output = x ** 3
    return output
Note that I have modified the functions to accept a single argument, x, and added a basic operation to each.
This has the advantage that each function is independent of the others (loosely coupled), which allows them to be reused in other ways. In the example above, function_two() returns the square of its argument, and function_three() the cube of its argument. Each can be called independently from elsewhere in your code, without being entangled in a hardcoded call chain, as they would be if you called one function from another.
You can still call them like this:
>>> x = function_one(3)
>>> x
3
>>> x = function_two(x)
>>> x
9
>>> x = function_three(x)
>>> x
729
which lends itself to error checking, as others have pointed out.
Or like this:
>>> function_three(function_two(function_one(2)))
64
if you are sure that it's safe to do so.
And if you ever wanted to calculate the square or cube of a number, you can call function_two() or function_three() directly (but, of course, you would name the functions appropriately).
With d6tflow you can easily chain together complex data flows and execute them. You can quickly load input and output data for each task. It makes your workflow very clear and intuitive.
import d6tflow
class Function_one(d6tflow.tasks.TaskCache):
    def run(self):
        function_one_output = do_things()
        self.save(function_one_output) # instead of return
@d6tflow.requires(Function_one)
class Function_two(d6tflow.tasks.TaskCache):
    def run(self):
        output_from_function_one = self.inputLoad() # load function input
        function_two_output = do_more_things()
        self.save(function_two_output)
@d6tflow.requires(Function_two)
class Function_three(d6tflow.tasks.TaskCache):
    def run(self):
        output_from_function_two = self.inputLoad()
        function_three_output = do_more_things()
        self.save(function_three_output)
d6tflow.run(Function_three()) # executes all functions
function_one_output = Function_one().outputLoad() # get function output
function_three_output = Function_three().outputLoad()
It has many more useful features like parameter management, persistence, intelligent workflow management. See https://d6tflow.readthedocs.io/en/latest/
Calling function_three(function_two(function_one(infile))) this way would be best: you do not need global variables, and each function is completely independent of the others.
Edited to add:
I would also say that function_three should not print anything itself; if you want to print the returned result, use:
print(function_three(function_two(function_one(infile))))
or something like:
output = function_three(function_two(function_one(infile)))
print(output)
Use parameters to pass the values:
def function1():
    foo = do_stuff()
    return function2(foo)
def function2(foo):
    bar = do_more_stuff(foo)
    return function3(bar)
def function3(bar):
    baz = do_even_more_stuff(bar)
    return baz
def main():
    thing = function1()
    print(thing)
I have a Pandas dataframe and want to do different things with it. Now my function has this structure:
def process_dataframe(df, save_to_file, print_to_screen, etc):
    ...
    if save_to_file:
        df.to_csv(filename)
    elif print_to_screen:
        print(df)
    elif...
This is an ugly if/else chain. I want to use a functional instead, i.e. a function pointer. Something like this, where I create several functions:
def save_to_file(df, filename):
    return create_function(to_csv, filename???)
def print_to_screen(df):
    return create_function(print)
Which means I can change the structure of my function to this single line instead:
result = process_dataframe(save_to_file)
...
...
def process_dataframe(df, my_functional):
    return my_functional(df)
The problem is that I don't understand the syntax. For instance, how do I return the class member function ".to_csv" in "save_to_file()"? What does "save_to_file()" look like? Which args does it take?
Of course, I could use a lambda instead of defining each function. But I want to understand how to define functions first. The next step with lambdas, I can figure out myself.
I'd make sure this is actually what you want to do, but assuming it is, you can just write a function that calls functions (and passes through arguments), like this:
def process_df(df, function, *args, **kwargs):
    return function(df, *args, **kwargs)
And define your two actions.
def print_to_screen(df):
    print(df)
def save_to_file(df, filename):
    df.to_csv(filename)
Then you can use these as you like:
In [193]: df = pd.DataFrame([[1,2,3],[2,4,5]], columns=['a','b','c'])
In [197]: process_df(df, print_to_screen)
   a  b  c
0  1  2  3
1  2  4  5
In [198]: process_df(df, save_to_file, 'temp.csv')
#writes temp.csv
The problem is that I dont understand the syntax. For instance, how to
return the class member function ".to_csv" in "save_to_file()?"
I think what you are asking is this :
def save_to_file(filename):
    def df_to_csv(df):
        return df.to_csv(filename)
    return df_to_csv
And the call:
foo = save_to_file('myfile.csv')
foo(df) # <- here "df" will be saved to "myfile.csv"
You could also do this (which I believe is something you originally wanted):
def save_to_file(df, filename):
    def df_to_csv():
        return df.to_csv(filename)
    return df_to_csv
And then call it like so:
foo = save_to_file(df, 'myfile.csv')
foo() # <- "df" is saved to "myfile.csv"
But to me this seems not much less ugly than the first solution, so you might want to rethink your approach.
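The standard library can also build these one-argument callables for you: functools.partial fixes some arguments of an ordinary function, and operator.methodcaller creates a callable that invokes a named method on whatever object it receives. The sketch below uses a hypothetical FakeFrame class in place of a DataFrame so that it runs without pandas; with a real DataFrame, methodcaller('to_csv', 'myfile.csv') should behave like the df_to_csv closure above.

```python
from functools import partial
from operator import methodcaller


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; records calls instead of writing files."""

    def __init__(self):
        self.saved_to = []

    def to_csv(self, filename):
        self.saved_to.append(filename)
        return filename


def save_to_file(df, filename):
    return df.to_csv(filename)


def process_dataframe(df, action):
    # action is any callable that takes the frame as its only argument
    return action(df)


df = FakeFrame()

# Option 1: fix the filename of an ordinary function.
process_dataframe(df, partial(save_to_file, filename="a.csv"))

# Option 2: look up the method by name on whatever object is passed in.
process_dataframe(df, methodcaller("to_csv", "b.csv"))

print(df.saved_to)  # ['a.csv', 'b.csv']
```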