Python style: inline function that needs no inlining?

I'm writing GTK code. I often have short callbacks that don't need to be closures, as they are passed all the parameters they need. For example, I have this in a loop when creating some gtk.TreeViewColumns:
def widthChanged(MAINCOL, SPEC, SUBCOL, expandable):
    if expandable: return
    w = MAINCOL.get_width()
    SUBCOL.set_fixed_width(w)

cl.connect("notify::width", widthChanged, pnlcl, expand)
This is probably inefficient, since the function is being created on every iteration of the loop (side-question: is it actually, or is it optimized?). However, I feel like if I moved all these one-liners to the top level, the code would be more confusing. Any opinions?

Go with whatever style is most readable. Don't worry about speed unless your code profiling tools have told you that the area is a hotspot.
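As to the side-question: in CPython the function body is compiled to a code object only once, when the enclosing code is compiled; the def statement inside the loop merely wraps that shared code object in a new (cheap) function object on each iteration. A quick way to see this:

funcs = []
for i in range(3):
    def cb(col):
        return col
    funcs.append(cb)

# All three wrappers share the same compiled code object.
print(funcs[0].__code__ is funcs[1].__code__ is funcs[2].__code__)  # True

So the per-iteration cost is a small object allocation, not a recompilation.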


Python: writing files in functional programming style

How are we supposed to write files in Python while staying functionally pure? Normally I would do something like this
from typing import Iterable
from io import IOBase

def transform_input(input_lines: Iterable[str]) -> Iterable[str]: ...

def print_pack(input_lines: Iterable[str], output: IOBase) -> None:
    for line in input_lines:
        print(line, file=output)

def main(*args, **kwargs):
    # Somehow we get a bunch of iterables with strings and a list of output streams
    packs_of_input = ...  # Iterable[Iterable[str]]
    output_streams = ...  # Iterable[IOBase]
    packs_to_print = map(transform_input, packs_of_input)
    for pack, output_stream in zip(packs_to_print, output_streams):
        print_pack(pack, output_stream)
We can replace the for-loop with something like this:
list(map(lambda pack_stream: print_pack(*pack_stream), zip(packs_to_print, output_streams)))
but it would only make it look like the printing is done functionally. The problem is that print_pack is not a pure function: all of its work results in a side effect, and it returns nothing.
How are we supposed to write files and remain functionally-pure (or almost pure)?
Essentially, in Python, you need to have an impure function somewhere, so there's no way to have 100% pure functions in this application. In the end you need to do some IO, and IO is impure.
However, what you can do is try to represent a particular layer of abstraction within your application as pure functions, and isolate the part that does the actual side-effects in another module. You can do this pretty easily in an ad-hoc way -- for example, by accumulating the contents of the file you want to write as a pure immutable data structure in your main code. Then your side-effecting code can be reduced in size, since all it needs to do is dump a string to a file.
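A minimal sketch of that separation (the names here are illustrative, not from any particular library):

def render_lines(lines):
    # Pure: build the complete file contents as one immutable value.
    return "\n".join(lines) + "\n"

def write_file(path, contents):
    # Impure shell: the only function that actually performs IO.
    with open(path, "w") as f:
        f.write(contents)

# All the interesting logic stays pure; only the last call is effectful.
write_file("out.txt", render_lines(["alpha", "beta"]))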
We can look to Haskell for a more rigorous way to purely represent the full power of side-effecting operations with pure functions and data structures -- using the Monad abstraction. Essentially, a Monad is something that you can bind callbacks to, to create a chain of effectful computations all based on pure functions. For the IO monad, the Haskell runtime takes care of actually performing the side-effects once you return an IO value from the main function -- so all the code you write is technically pure functions, and the runtime takes care of the IO.
The Effect library (disclaimer: I wrote it) basically implements a certain flavor of Monad (or something very close to a monad) in Python. This lets you represent arbitrary IO (and other side-effects) as pure objects and functions, and put the actual performance of those effects off to the side. So your application code can be 100% pure, as long as you have a kind of library of relatively simple side-effecting functions.
So, for example, to implement a function that writes a list of lines to a file with Effects, you'd do something like this:
from effect.do import do

@do
def write_lines_to_file(lines, filename):
    file_handle = yield open_file(filename)
    for line in lines:
        yield write_data(file_handle, line)
    # alternatively:
    # from effect.fold import sequence; from functools import partial
    # yield sequence(map(partial(write_data, file_handle), lines))
    yield close_file(file_handle)
The Effect library provides this special do decorator that lets you use an imperative-looking syntax to describe a pure effectful operation. The above function is equivalent to this one:
def write_lines_to_file(lines, filename):
    file_handle_eff = open_file(filename).on(
        lambda file_handle:
            sequence(map(partial(write_data, file_handle), lines)).on(
                lambda _: close_file(file_handle)))
    return file_handle_eff
These both assume that three functions exist: open_file, write_data, and close_file. These functions are assumed to return Effect objects that represent the intent to perform those actions. In the end, an Effect is essentially an intent (some transparent description of an action that is requested), and one or more callbacks to run when the result of that action has been completed. The interesting distinction is that write_lines_to_file does not actually write lines to a file; it simply returns some representation of the intent to write some lines to a file.
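As a rough sketch of what one of those assumed functions could look like, following the intent/performer pattern from the Effect documentation (exact import paths and names may differ between versions):

from effect import Effect, TypeDispatcher, sync_performer

class OpenFile(object):
    # A transparent description of the requested action; no IO happens here.
    def __init__(self, filename):
        self.filename = filename

def open_file(filename):
    # Pure: returns an Effect wrapping the intent.
    return Effect(OpenFile(filename))

@sync_performer
def perform_open_file(dispatcher, intent):
    # Impure: the performer that actually opens the file.
    return open(intent.filename, "w")

dispatcher = TypeDispatcher({OpenFile: perform_open_file})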
To actually perform this effect, you need to use the sync_perform function, like sync_perform(dispatcher, write_lines_to_file(lines, filename)). This is an impure function that actually runs the performers for all the effects that your pure representation of an effectful computation uses.
I could go into the details of how open_file, write_data, and close_file would need to be implemented, and of what the "dispatcher" argument is, but really the documentation at https://effect.readthedocs.org/ is probably the right thing to reference at this point.
I also gave a talk at Strange Loop about Effect and its implementation, which you can watch on YouTube: https://www.youtube.com/watch?v=D37dc9EoFus
It's worth noting that Effect is a pretty heavy-handed way to keep code purely functional. You can get a long way toward maintainable code by taking a "functional core/imperative shell" approach and trying your best to write most of your code as pure functions and minimizing the effectful code. But if you're interested in a more rigorous approach, I think Effect is good. My team uses it in production and it has helped out a lot, especially with its testing API.

When is it required to define a separate function in Python

In my code, I am printing a menu
print(num_chars * char)
for option in options:
    print("{:d}. {:s}".format(option, options[option]))
print(num_chars * char)
The code print(num_chars * char) prints a separator/delimiter in order to "beautify" the output. I have learned from several coding tutorials that I am not allowed to write the same code more than once.
Is it really preferable to define a function
def get_char_repeated(char='*', num_chars=30):
    """
    Return the character repeated an arbitrary number of times.
    """
    return num_chars * char
and call this two times in my original code?
Are there any alternatives if I need to print a nice-looking menu from a dictionary?
Thank you.
I have learned from several coding tutorials that I am not allowed to write the same code more than once.
This principle, called "don't repeat yourself" (DRY), is a good rough guideline. For every programmer who writes too many functions (splitting code into units that are too small), there are 20 who write too few.
Don't go overboard with it, though. The reasoning behind DRY is to make reading and changing the code later on easier. print(num_chars * char) is pretty basic already, and super-easy to understand and change, so it doesn't really pay off to factor it into a function.
If the repeated code grows to 3 lines, you can (and probably should) factor it out then.
It's not necessary at that level. What might be helpful, if you often use that whole block of code, is to change it to

def printOptions(options, char='*', num_chars=30):
    print(num_chars * char)
    for option in options:
        print("{:d}. {:s}".format(option, options[option]))
    print(num_chars * char)
The main point of functions is to save time with blocks of code you use a lot in a very similar way by not retyping/copy pasting them. But they also save time when you make changes. If you used the same function in 10 different places you still only need to change it once for all 10 uses to be updated rather than having to find all 10 manually and update them.
So if you decided you wanted to put a title header into this menu printing section and you had used it as this or a similar function in a bunch of places, you could quite easily update them all without difficulty.
I find it a good rule that if a block of code is used more than once and takes up more than 3 lines, it is a candidate to be turned into a function. If it is a very complex single line of code (like x = [i**j for i in z for j in y]) and is used more than twice, it could also be a candidate.
It may be a matter of preference where you draw the line but the basic idea is if it makes your code easier to read or easier to write, turning something into a function can be a good idea. If it makes your code harder to read (because every time you see the function you have to look back at the specifics of what it does), you probably should not have turned that code into a function.
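For instance, the complex one-liner above could be given a name (a sketch; z and y are assumed to be iterables of numbers):

def power_table(z, y):
    # Every i**j combination: i from z (outer loop), j from y (inner loop).
    return [i**j for i in z for j in y]

x = power_table(z, y)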

Recipe for anonymous functions in python?

I'm looking for the best recipe to allow inline definition of functions, or multi-line lambdas, in Python.
For example, I'd like to do the following:
def callfunc(func):
    func("Hello")

>>> callfunc(define('x', '''
... print x, "World!"
... '''))
Hello World!
I've found an example for the define function in this answer:
def define(arglist, body):
    g = {}
    exec("def anonfunc({0}):\n{1}".format(
        arglist,
        "\n".join("    {0}".format(line) for line in body.splitlines())), g)
    return g["anonfunc"]
This is one possible solution, but it is not ideal. Desirable features would be:
- be smarter about indentation,
- hide the innards better (e.g. don't have anonfunc in the function's scope),
- provide access to variables in the surrounding scope / captures,
- better error handling,
and some things I haven't thought of. I had a really nice implementation once that did most of the above, but I lost it, unfortunately. I'm wondering if someone else has made something similar.
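For instance, the third item (access to the surrounding scope) can be approximated by seeding the exec environment from the caller's frame (a sketch using sys._getframe; note that the caller's locals are captured as a snapshot, not as live closure variables):

import sys

def define(arglist, body):
    caller = sys._getframe(1)
    env = dict(caller.f_globals)
    env.update(caller.f_locals)  # snapshot, not a live closure
    exec("def anonfunc({0}):\n{1}".format(
        arglist,
        "\n".join("    {0}".format(line) for line in body.splitlines())), env)
    return env["anonfunc"]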
Disclaimer:
I'm well aware this is controversial among Python users, and regarded as a hack or unpythonic. I'm also aware of the discussions regarding multi-line lambdas on the python-dev mailing list, and that a similar feature was omitted on purpose. However, from the same discussions I've learned that there is also interest in such a function from many others.
I'm not asking whether this is a good idea or not, but instead: Given that one has decided to implement this, (either out of fun and curiosity, madness, genuinely thinking this is a nice idea, or being held at gunpoint) how to make anonymous define work as close as possible to def using python's (2.7 or 3.x) current facilities?
Examples:
A bit more as to why, this can be really handy for callbacks in GUIs:
# gtk example:
self.ntimes = 0
button.connect('clicked', define('*a', '''
self.ntimes += 1
label.set_text("Button has been clicked %d times" % self.ntimes)
'''))
The benefit over defining a function with def is that your code is in a more logical order. This is simplified code taken from a Twisted application:
# twisted example:
def sayHello(self):
    d = self.callRemote(HelloCommand)
    def handle_response(response):
        # do something, this happens after (x)!
        pass
    d.addCallback(handle_response)  # (x)
Note how it seems out of order. I usually break stuff like this up, to keep the code order == execution order:
def sayHello_d(self):
    d = self.callRemote(HelloCommand)
    d.addCallback(self._sayHello_2)
    return d

def _sayHello_2(self, response):
    # handle response
    pass
This is better with respect to ordering, but more verbose. Now, with the anonymous functions trick:
d = self.callRemote(HelloCommand)
d.addCallback(define('response', '''
print "callback"
print "got response from", response["name"]
'''))
If you come from a JavaScript or Ruby background, Python's abilities to deal with anonymous functions may indeed seem limited, but this is for a reason. Python's designers decided that clarity of code is more important than conciseness. If you don't like that, you probably don't like Python at all. There's nothing wrong with that; there are many other choices - why not try a language that tastes better to you?
Putting chunks of code into strings and interpreting them on the fly is definitely the wrong way to "extend" a language, simply because none of the tools you're working with - from syntax highlighters to the Python interpreter itself - would be able to deal with "stringified" code in a sensible way.
To answer the question as asked: what you're doing there is essentially an attempt to construct some better-than-Python programming language and compile it to Python on the fly. The idea is not new in the world of scripting languages, and it can be productive or not (CoffeeScript is an example of a successful implementation), but your approach is wrong. format() is not the tool you're looking for when working with code. If you're writing a compiler, do it properly: use a parser (e.g. pyparsing) to read your code into an AST, walk the AST to generate Python code (or even bytecode), catch syntax errors as you go, and take measures to provide better runtime feedback (e.g. error context, line numbers, etc.). Finally, make sure your compiler works across different Python versions and implementations.
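Even short of a full compiler, the standard library's ast module can at least surface syntax errors in generated source up front, with sensible line numbers (a minimal sketch; the anonfunc wrapper mirrors the define from the question):

import ast

def check_body(arglist, body, name="<define>"):
    src = "def anonfunc({0}):\n{1}".format(
        arglist, "\n".join("    " + line for line in body.splitlines()))
    try:
        tree = ast.parse(src, filename=name)
    except SyntaxError as e:
        # Report line numbers relative to the user's body, not the wrapper.
        raise SyntaxError("in anonymous function, line {0}: {1}".format(
            e.lineno - 1, e.msg))
    return compile(tree, name, "exec")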
Or just use ruby.

Unexpected performance loss when calling Cython function within Python script?

So I have a time-critical section of code within a Python script, and I decided to write a Cython module (with one function -- all I need) to replace it. Unfortunately, the execution speed of the function I'm calling from the Cython module (which I'm calling within my Python script) isn't nearly as fast as I tested it to be in a variety of other scenarios. Note that I CANNOT share the code itself because of contract law! See the following cases, and take them as an initial description of my issue:
(1) Execute Cython function by using the Python interpreter to import the module and run the function. Runs relatively quickly (~0.04 sec on ~100 separate tests, versus original ~0.24 secs).
(2) Call Cython function within Python script at 'global' level (i.e. not inside any function). Same speed as case (1).
(3) Call Cython function within Python script, with Cython function inside my Python script's main function; tested with the Cython function in global and local namespaces, all with the same speed as case (1).
(4) Same as (3), but inside a simple for-loop within said Python function. Same speed as case (1).
(5) Problem! Same as (4), but inside yet another for-loop: the Cython function's execution time (whether called globally or locally) balloons to ~10 times that of the other cases, and this is where I need the function to get called. Nothing odd to report about this loop, and I tested all of its components (adjusting/removing what I could). I also tried using a while loop for giggles, to no avail.
"One thing I've yet to try is making this inner-most loop a function and going from there." EDIT: Just tried this- no luck.
Thanks for any suggestions you have - I deeply regret not being able to share my code... it hurts my soul a little, but my client just can't have this code floating around. Let me know if there is any other information I can provide!
-The Real Problem and an Initial (ugly) Solution-
It turns out that the best hint in this scenario was the obvious one (as usual): it wasn't the for-loop that was causing the problem; why would it? After a few more tests, it became obvious that something about the way I was calling my Cython function was wrong, because I could call it elsewhere (using an input variable different from the one going to the 'real' Cython function) without the performance loss issue.
The underlying issue: data types. I wrote my Cython function to expect a list full of standard floats. Unfortunately, my code did this:
function_input = list(numpy_array_containing_npfloat64_data)  # yuck.
# type(function_input[0]) is numpy.float64, not float
output = Cython_Function(function_input)
inside the Cython function:
def Cython_Function(list function_input):
    cdef many_vars
    """process lots of vars expecting C floats"""
    # Slowness from converting numpy.float64's --> floats???
    # type(output) is list
    return output
I'm aware that I can play around more with types in the Cython function, which I very well may do to prevent having to 'list' an existing numpy array. Anyway, here is my current solution:
function_input = [float(x) for x in function_input]
I welcome any feedback and suggestions for improvement. The function_input numpy array doesn't really need the precision of numpy.float64, but it does get used a few times before getting passed to my Cython function.
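One thing I may try, based on NumPy's documented behavior: ndarray.tolist() already converts numpy.float64 elements to native Python floats, so it could replace both the list() call and the comprehension:

# tolist() yields native Python floats, unlike list(), which keeps
# numpy.float64 scalars.
function_input = numpy_array_containing_npfloat64_data.tolist()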
It could be that, while each individual call to the Cython function is faster than the corresponding Python function, there is more overhead in the Cython function call because it has to look up the name in the module namespace. You can try assigning the function to a local callable first, for example:
from module import function

def main():
    my_func = function
    for i in sequence:
        my_func()
If possible, you should try to include the loops within the Cython function, which would reduce the overhead of a Python loop to the (very minimal) overhead of a compiled C loop. I understand that it might not be possible (i.e. need references from a global/larger scope), but it's worth some investigation on your part. Good luck!
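A sketch of what moving the loop into Cython could look like (hypothetical, since the real code isn't available; it assumes a one-dimensional float64 array passed as a typed memoryview):

# hypothetical_module.pyx
def process_all(double[:] data):
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(data.shape[0]):
        total += data[i]  # this loop compiles to plain C iteration
    return total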
function_input = list(numpy_array_containing_npfloat64_data)

def Cython_Function(list function_input):
    cdef many_vars
I think the problem is in using the numpy array as a list ... can't you use the np.ndarray as input to the Cython function?
cimport numpy as np

def Cython_Function(np.ndarray[np.float64_t, ndim=1] input):
    ....

Is it bad style to reassign long variables as a local abbreviation?

I prefer to use long identifiers to keep my code semantically clear, but in the case of repeated references to the same identifier, I'd like for it to "get out of the way" in the current scope. Take this example in Python:
def define_many_mappings_1(self):
    self.define_bidirectional_parameter_mapping("status", "current_status")
    self.define_bidirectional_parameter_mapping("id", "unique_id")
    self.define_bidirectional_parameter_mapping("location", "coordinates")
    # etc...
Let's assume that I really want to stick with this long method name, and that these arguments are always going to be hard-coded.
Implementation 1 feels wrong because most of each line is taken up with a repetition of characters. The lines are also rather long in general, and will exceed 80 characters easily when nested inside of a class definition and/or a try/except block, resulting in ugly line wrapping. Let's try using a for loop:
def define_many_mappings_2(self):
    mappings = [("status", "current_status"),
                ("id", "unique_id"),
                ("location", "coordinates")]
    for mapping in mappings:
        self.define_parameter_mapping(*mapping)
I'm going to lump together all similar iterative techniques under the umbrella of Implementation 2, which has the improvement of separating the "unique" arguments from the "repeated" method name. However, I dislike that this has the effect of placing the arguments before the method they're being passed into, which is confusing. I would prefer to retain the "verb followed by direct object" syntax.
I've found myself using the following as a compromise:
def define_many_mappings_3(self):
    d = self.define_bidirectional_parameter_mapping
    d("status", "current_status")
    d("id", "unique_id")
    d("location", "coordinates")
In Implementation 3, the long method is aliased by an extremely short "abbreviation" variable. I like this approach because it is immediately recognizable at first glance as a set of repeated method calls, while having fewer redundant characters and much shorter lines. The drawback is the use of an extremely short and semantically unclear identifier, "d".
What is the most readable solution? Is the usage of an "abbreviation variable" acceptable if it is explicitly assigned from an unabbreviated version in the local scope?
itertools to the rescue again! Try using starmap - here's a simple demo:
import itertools
list(itertools.starmap(min, [(1, 2), (2, 2), (3, 2)]))
prints
[1, 2, 2]
starmap returns a lazy iterator, so to actually invoke the methods, you have to consume it, here with a list.
import itertools

def define_many_mappings_4(self):
    list(itertools.starmap(
        self.define_parameter_mapping,
        [
            ("status", "current_status"),
            ("id", "unique_id"),
            ("location", "coordinates"),
        ]))
Normally I'm not a fan of using a dummy list construction to invoke a sequence of functions, but this arrangement seems to address most of your concerns.
If define_parameter_mapping returns None, then you can replace list with any, and then all of the function calls will get made, and you won't have to construct that dummy list.
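A tiny standalone illustration of that trick (report is a stand-in for define_parameter_mapping; since it returns None, any() never short-circuits and so drains the whole iterator):

import itertools

def report(a, b):
    print(a, "->", b)  # returns None, which is falsy

pairs = [("status", "current_status"), ("id", "unique_id")]
any(itertools.starmap(report, pairs))  # runs every call, builds no list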
I would go with Implementation 2, but it is a close call.
I think #2 and #3 are equally readable. Imagine if you had 100s of mappings... Either way, I cannot tell what the code at the bottom is doing without scrolling to the top. In #2 you are giving a name to the data; in #3, you are giving a name to the function. It's basically a wash.
Changing the data is also a wash, since either way you just add one line in the same pattern as what is already there.
The difference comes if you want to change what you are doing to the data. For example, say you decide to add a debug message for each mapping you define. With #2, you add a statement to the loop, and it is still easy to read. With #3, you have to create a lambda or something. Nothing wrong with lambdas -- I love Lisp as much as anybody -- but I think I would still find #2 easier to read and modify.
But it is a close call, and your taste might be different.
I think #3 is not bad, although I might pick a slightly longer identifier than d. But often this type of thing becomes data-driven, so you would then find yourself using a variation of #2 where you loop over the result of a database query or something from a config file.
There's no right answer, so you'll get opinions on all sides here, but I would by far prefer to see #2 in any code I was responsible for maintaining.
#1 is verbose, repetitive, and difficult to change (e.g. say you need to call two methods on each pair or add logging -- then you must change every line). But this is often how code evolves, and it is a fairly familiar and harmless pattern.
#3 suffers the same problem as #1, but is slightly more concise at the cost of requiring what is basically a macro and thus new and slightly unfamiliar terms.
#2 is simple and clear. It lays out your mappings in data form, and then iterates them using basic language constructs. To add new mappings, you only need add a line to the array. You might end up loading your mappings from an external file or URL down the line, and that would be an easy change. To change what is done with them, you only need change the body of your for loop (which itself could be made into a separate function if the need arose).
Your complaint of #2 of "object before verb" doesn't bother me at all. In scanning that function, I would basically first assume the verb does what it's supposed to do and focus on the object, which is now clear and immediately visible and maintainable. Only if there were problems would I look at the verb, and it would be immediately evident what it is doing.
