Python: writing files in functional programming style

How are we supposed to write files in Python while staying functionally pure? Normally I would do something like this:

from typing import Iterable
from io import IOBase

def transform_input(input_lines: Iterable[str]) -> Iterable[str]: ...

def print_pack(input_lines: Iterable[str], output: IOBase) -> None:
    for line in input_lines:
        print(line, file=output)

def main(*args, **kwargs):
    # Somehow we get a bunch of iterables of strings and a list of output streams
    packs_of_input = ...  # Iterable[Iterable[str]]
    output_streams = ...  # Iterable[IOBase]
    packs_to_print = map(transform_input, packs_of_input)
    for pack, output_stream in zip(packs_to_print, output_streams):
        print_pack(pack, output_stream)
We can replace the for-loop with something like this:

list(map(lambda pack_stream: print_pack(*pack_stream), zip(packs_to_print, output_streams)))

but that would only make it look like the printing is done functionally. The problem is that print_pack is not a pure function: all of its work results in a side effect, and it returns nothing.
How are we supposed to write files and remain functionally-pure (or almost pure)?

Essentially, in Python, you need to have an impure function somewhere, so there's no way to have 100% pure functions in this application. In the end you need to do some IO, and IO is impure.
However, what you can do is try to represent a particular layer of abstraction within your application as pure functions, and isolate the part that does the actual side-effects in another module. You can do this pretty easily in an ad-hoc way -- for example, by accumulating the contents of the file you want to write as a pure immutable data structure in your main code. Then your side-effecting code can be reduced in size, since all it needs to do is dump a string to a file.
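To make that ad-hoc approach concrete, here is a minimal sketch (the names render_report and write_file are invented for illustration): the pure core builds the entire file contents as a plain string, and the impure shell shrinks to one trivial writer.

```python
# Functional core: pure function from data to the complete file contents.
def render_report(records):
    """Build the full file contents from (name, score) pairs."""
    header = "name,score"
    lines = [header] + [f"{name},{score}" for name, score in records]
    return "\n".join(lines) + "\n"

# Imperative shell: the only side-effecting code, and it is trivial.
def write_file(path, contents):
    with open(path, "w") as f:
        f.write(contents)

# The pure core is testable without touching the filesystem:
contents = render_report([("alice", 3), ("bob", 7)])
assert contents == "name,score\nalice,3\nbob,7\n"
```

Because all the interesting logic lives in render_report, the side-effecting part needs essentially no tests of its own.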
We can look to Haskell for a more rigorous way to purely represent the full power of side-effecting operations with pure functions and data structures -- using the Monad abstraction. Essentially, a Monad is something that you can bind callbacks to, to create a chain of effectful computations all based on pure functions. For the IO monad, the Haskell runtime takes care of actually performing the side-effects once you return an IO value from the main function -- so all the code you write is technically pure functions, and the runtime takes care of the IO.
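As a toy illustration of the bind idea (this is an invented mini-API, not Haskell's IO and not the Effect library): an "action" is a deferred thunk, bind attaches a callback that produces the next action, and nothing runs until an interpreter performs the whole chain.

```python
class IO:
    """A pure description of a side effect: nothing runs at construction."""
    def __init__(self, thunk):
        self._thunk = thunk  # deferred side effect

    def bind(self, f):
        # f receives the result of this action and returns the next IO action.
        return IO(lambda: f(self._thunk()).run())

    def run(self):
        # The single impure entry point (the "runtime").
        return self._thunk()

def pure(value):
    """Lift a plain value into an IO action with no effects."""
    return IO(lambda: value)

log = []
program = pure("hello").bind(lambda s: IO(lambda: log.append(s)))
assert log == []        # building the program performs no effects
program.run()           # only the interpreter causes the side effect
assert log == ["hello"]
```

Everything up to the final run() call is pure: program is just a value describing what should happen.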
The Effect library (disclaimer: I wrote it) basically implements a certain flavor of Monad (or something very close to a monad) in Python. This lets you represent arbitrary IO (and other side-effects) as pure objects and functions, and put the actual performance of those effects off to the side. So your application code can be 100% pure, as long as you have a kind of library of relatively simple side-effecting functions.
So, for example, to implement a function that writes a list of lines to a file with Effects, you'd do something like this:
@do
def write_lines_to_file(lines, filename):
    file_handle = yield open_file(filename)
    for line in lines:
        yield write_data(file_handle, line)
    # alternatively:
    # from effect.fold import sequence; from functools import partial
    # yield sequence(map(partial(write_data, file_handle), lines))
    yield close_file(file_handle)
The Effect library provides this special do decorator that lets you use an imperative-looking syntax to describe a pure effectful operation. The above function is equivalent to this one:
def write_lines_to_file(lines, filename):
    return open_file(filename).on(
        lambda file_handle:
            sequence(map(partial(write_data, file_handle), lines)).on(
                lambda _: close_file(file_handle)))
These both assume that three functions exist: open_file, write_data, and close_file. These functions are assumed to return Effect objects that represent the intent to perform those actions. In the end, an Effect is essentially an intent (some transparent description of an action that is requested), and one or more callbacks to run when the result of that action has been completed. The interesting distinction is that write_lines_to_file does not actually write lines to a file; it simply returns some representation of the intent to write some lines to a file.
To actually perform this effect, you need to use the sync_perform function, like sync_perform(dispatcher, write_lines_to_file(lines, filename)). This is an impure function that actually runs the performers for all the effects that your pure representation of the effectful computation uses.
I could go into the details of how open_file, write_data, and close_file would need to be implemented, and the details of what the "dispatcher" argument is, but really the documentation at https://effect.readthedocs.org/ is probably the right thing to reference at this point.
I also gave a talk at Strange Loop about Effect and its implementation, which you can watch on YouTube: https://www.youtube.com/watch?v=D37dc9EoFus
It's worth noting that Effect is a pretty heavy-handed way to keep code purely functional. You can get a long way toward maintainable code by taking a "functional core/imperative shell" approach and trying your best to write most of your code as pure functions and minimizing the effectful code. But if you're interested in a more rigorous approach, I think Effect is good. My team uses it in production and it has helped out a lot, especially with its testing API.

Related

How to remove function implementation from a python file?

I am designing python assignments for a class. I define functions, write docstrings and then I implement them. Afterward, I'd like to remove all my implementations of the functions and replace only the code (not the doc-strings, function names, and arguments) with a raise NotImplementedError.
Is there any tool (e.g. IDE) which removes all the code for me automatically, so that I don't have to replace the implemented function by myself? I was thinking about writing a small script, but I thought I might ask here before I do this ...
If anyone has written something similar or knows of a quick way to do this, I would appreciate it a lot.
Here's a minimal example of what I'd like to achieve:
test.py
def add(a, b):
    """
    Adds two numbers
    """
    return a + b

def multiply(a, b):
    """
    Multiplies two numbers
    """
    return a * b
should become in an automated fashion (and of course for much larger files):
test.py
def add(a, b):
    """
    Adds two numbers
    """
    raise NotImplementedError

def multiply(a, b):
    """
    Multiplies two numbers
    """
    raise NotImplementedError
I don't know of a tool to do specifically this, but Python provides great AST manipulation tools within its own standard library via the ast module. You'll need a third-party module to "unparse" the result after transformation back into regular Python code; after a quick search, astunparse seems to do the trick, although there seem to be many others.
Here's a bit of sample code to get you in the right direction. Obviously, you'll need to tweak this to get the exact behavior you want, especially if you want to provide classes instead of just top-level functions (as I've written nothing to handle that use case). But with a bit of Python knowledge, it should be possible to automate.
Also, this is Python 3 (which, as of the start of 2020, is the only supported Python version). If you're still on Python 2, it may require some modifications.
import ast
import astunparse

# Read our file using the built-in Python AST module.
with open('filename.py') as f:
    data = ast.parse(f.read(), 'filename.py')

# Loop through all declarations.
for decl in data.body:
    # Only modify functions.
    if isinstance(decl, ast.FunctionDef):
        # The docstring is the first statement of the body, so we don't
        # want to change it. Instead, replace the rest of the body with
        # our pre-built "raise" call. Note that I figured out what "raise"
        # looked like in AST form by running
        #
        #     ast.dump(ast.parse("raise NotImplementedError()"))
        #
        decl.body[1:] = [ast.Raise(ast.Call(ast.Name('NotImplementedError'), [], []), None)]

# Use astunparse to pretty-print the result as Python code.
print(astunparse.unparse(data))
It's definitely possible to automate, if you're willing to take the time to do it. If you're planning to do this for several assignments, or even over several semesters, you may consider making a script for it. But if you're just doing it once, it may be more worth your time to just do it by hand.

Is there a built-in "apply" function (like "lambda f: f()") as there used to be in Python 2?

Looking at this question, I realised that it is kind of awkward to use multiprocessing's Pool.map when what you want is to run a list of functions in parallel:
from multiprocessing import Pool

def my_fun1(): return 1
def my_fun2(): return 2
def my_fun3(): return 3

with Pool(3) as p:
    one, two, three = p.map(lambda f: f(), [my_fun1, my_fun2, my_fun3])
I'm not saying it is exactly cryptic, but I think I expected some conventional name for this, even if only within functools or something, similarly to apply/call in JavaScript (yes, I know JavaScript didn't have lambdas at the time those functions were defined, and no, I'm not saying JavaScript is an exemplary programming language, just an example). In fact, I definitely think something like this should be present in operator, but (unless my eyes deceive me) it seems to be absent. I read that in the case of the identity function the resolution was to let people define their own trivial functions, and I understand it better in that case because there are a couple of different variations you may want, but this one feels like a missing bit to me.
EDIT: As pointed out in the comments, Python 2 used to have an apply function for this purpose.
First, let's look at the practical question.
For any Python from 2.3 on, you can trivially write not just your no-argument apply, but a perfect-forwarding apply, as a one-liner, as explained in the 2.x docs for apply:
The use of apply() is equivalent to function(*args, **keywords)
In other words:
def apply(function, *args, **keywords):
    return function(*args, **keywords)
… or, as an inline lambda:
lambda f, *a, **k: f(*a, **k)
Of course the C implementation was a bit faster, but this is almost never relevant.1
If you're going to be using this more than once, I think defining the function out-of-line and reusing it by name is probably clearer, but the lambda version is simple and obvious enough (even more so for your no-args use case) that I can't imagine anyone complaining about it.
Also, notice that this is actually more trivial than identity if you understand what you're doing, not less. With identity, it's ambiguous what you should return with multiple arguments (or keyword arguments), so you have to decide which behavior you want; with apply, there's only one obvious answer, and it's pretty much impossible to get wrong.
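To make the forwarding behavior concrete, here are a few sample calls (nothing here beyond the one-liner above):

```python
def apply(function, *args, **keywords):
    """Perfect-forwarding apply: passes everything straight through."""
    return function(*args, **keywords)

assert apply(int, "ff", base=16) == 255   # positional and keyword arguments
assert apply(len, [1, 2, 3]) == 3         # positional only
assert apply(lambda: 42) == 42            # the no-argument case from the question
```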
As for the history:
Python, like JavaScript, originally had no lambda. It's hard to dig up linkable docs for versions before 2.6, and hard to even find them before 2.3, but I think lambda was added in 1.5, and eventually reached the point where it could be used for perfect forwarding around 2.2. Before then, the docs recommended using apply for forwarding, but after that, the docs recommended using lambda in place of apply. In fact, there was no longer any recommended use of apply.
So in 2.3, the function was deprecated.2
During the Python-3000 discussions that led to 3.0, Guido suggested that all of the "functional programming" functions except maybe map and filter were unnecessary.3 Others made good cases for reduce and partial.4 But a big part of the case was that they're actually not trivial to write (in fully-general form), and easy to get wrong. That isn't true for apply. Also, people were able to find relevant uses of reduce and partial in real-world codebases, but the only uses of apply anyone could find were old pre-2.3 code. In fact, it was so rare that it wasn't even worth making the 2to3 tool transform calls to apply.
The final rationale for removing it was summarized in PEP 3100:
apply(): use f(*args, **kw) instead [2]
That footnote links to an essay by Guido called "Python Regrets", which is now a 404 link. The accompanying PowerPoint presentation is still available, however, or you can view an HTML flipbook of the presentation he wrote it for. But all it really says is the same one-liner, and IIRC, the only further discussion was "We already effectively got rid of it in 2.3."
1. In most idiomatic Python code that has to apply a function, the work inside that function is pretty heavy. In your case, of course, the overhead of calling the functions (pickling arguments and passing them over a pipe) is even heavier. The one case where it would matter is when you're doing "Haskell-style functional programming" instead of "Lisp-style"—that is, very few function definitions, and lots of functions made by transforming functions and composing the results. But that's already so slow (and stack-heavy) in Python that it's not a reasonable thing to do. (Flat use of decorators to apply a wrapper or three works great, but a potentially unbounded chain of wrappers will kill your performance.)
2. The formal deprecation mechanism didn't exist yet, so it was just moved to a "Non-essential Built-in Functions" section in the docs. But it was retroactively considered to be deprecated since 2.3, as you can see in the 2.7 docs.
3. Guido originally wanted to get rid of even them; the argument was that list comprehensions can do the same job better, as you can see in the "Regrets" flipbook. But promoting itertools.imap in place of map means it could be made lazy, like the new zip, and therefore better than comprehensions. I'm not sure why Guido didn't just make the same argument with generator expressions.
4. I'm not sure Guido himself was ever convinced for reduce, but the core devs as a whole were.
It sort of is in operator if you do one line of extra work:
>>> def foo():
...     print 'hi'
...
>>> from operator import methodcaller
>>> call = methodcaller('__call__')
>>> call(foo)
hi
Of course, call = lambda f: f() is only one line as well...
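Worth noting for newer Pythons: operator.call was added in Python 3.11 and does exactly this, including argument forwarding. A version-tolerant sketch (the getattr fallback is just for older interpreters):

```python
import operator

# operator.call(obj, /, *args, **kwargs) exists since Python 3.11;
# fall back to the equivalent lambda on older versions.
call = getattr(operator, "call", lambda f, *a, **k: f(*a, **k))

def foo():
    return 'hi'

assert call(foo) == 'hi'
assert call(int, "10", base=2) == 2  # arguments are forwarded too
```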

python style: inline function that needs no inlining?

I'm writing gtk code. I often have short callbacks that don't need to be closures, as they are passed all the parameters they need. For example, I have this in a loop when creating some gtk.TreeViewColumns:

def widthChanged(MAINCOL, SPEC, SUBCOL, expandable):
    if expandable: return
    w = MAINCOL.get_width()
    SUBCOL.set_fixed_width(w)

cl.connect("notify::width", widthChanged, pnlcl, expand)
This is probably inefficient, since the function is being created on every iteration of the loop (side-question: is it actually, or is it optimized?). However, I feel like if I moved all these one-liners to the top level, the code would be more confusing. Any opinions?
Go with whatever style is most readable. Don't worry about speed unless your code profiling tools have told you that the area is a hotspot.
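On the side-question: the function body is compiled to a code object exactly once, but a new function object is created on each loop iteration. This is cheap, and easy to demonstrate without GTK (the callback below is just a stand-in shaped like the one in the question):

```python
def outer():
    callbacks = []
    for i in range(3):
        # A new function object is built each iteration...
        def cb(col, spec, subcol, expandable):
            if expandable:
                return
            subcol.set_fixed_width(col.get_width())
        callbacks.append(cb)
    return callbacks

cbs = outer()
assert len({id(f) for f in cbs}) == 3           # three distinct function objects
assert len({id(f.__code__) for f in cbs}) == 1  # ...but they share one code object
```

So the per-iteration cost is just allocating a small wrapper object, which is almost never worth restructuring code to avoid.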

Have well-defined, narrowly-focused classes ... now how do I get anything done in my program?

I'm coding a poker hand evaluator as my first programming project. I've made it through three classes, each of which accomplishes its narrowly-defined task very well:
HandRange = a string-like object (e.g. "AA"). getHands() returns a list of tuples for each specific hand within the string:
[(Ad,Ac),(Ad,Ah),(Ad,As),(Ac,Ah),(Ac,As),(Ah,As)]
Translation = a dictionary that maps the return list from getHands to values that are useful for a given evaluator (yes, this can probably be refactored into another class).
{'As':52, 'Ad':51, ...}
Evaluator = takes a list from HandRange (as translated by Translator), enumerates all possible hand matchups and provides win % for each.
My question: what should my "domain" class for using all these classes look like, given that I may want to connect to it via either a shell UI or a GUI? Right now, it looks like an assembly line process:
user_input = HandRange()
x = Translation.translateList(user_input)
y = Evaluator.getEquities(x)
This smells funny in that it feels like it's procedural when I ought to be using OO.
In a more general way: if I've spent so much time ensuring that my classes are well defined, narrowly focused, orthogonal, whatever ... how do I actually manage work flow in my program when I need to use all of them in a row?
Thanks,
Mike
Don't make a fetish of object orientation -- Python supports multiple paradigms, after all! Think of your user-defined types, AKA classes, as building blocks that gradually give you a "language" that's closer to your domain rather than to general purpose language / library primitives.
At some point you'll want to code "verbs" (actions) that use your building blocks to perform something (under command from whatever interface you'll supply -- command line, RPC, web, GUI, ...) -- and those may be module-level functions as well as methods within some encompassing class. You'll surely want a class if you need multiple instances, and most likely also if the actions involve updating "state" (instance variables of a class being much nicer than globals) or if inheritance and/or polymorphism come into play; but, there is no a priori reason to prefer classes to functions otherwise.
If you find yourself writing static methods, yearning for a singleton (or Borg) design pattern, or writing a class with no state (just methods) -- these are all "code smells" that should prompt you to check whether you really need a class for that subset of your code, or whether you may be overcomplicating things and should use a module with functions for that part of your code. (Sometimes after due consideration you'll unearth some different reason for preferring a class, and that's all right too, but the point is, don't just pick a class over a module with functions "by reflex", without critically thinking about it!)
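Concretely, the "verb" can be a plain module-level function that composes the assembly line. The stand-in implementations below are invented just to make the sketch runnable; they are not the poster's real HandRange/Translation/Evaluator classes:

```python
def parse(range_string):
    """Stand-in for HandRange.getHands(): split a range string into hands."""
    return [tuple(range_string)]

def translate(hands):
    """Stand-in for Translation.translateList(): map cards to numeric values."""
    return [[ord(c) for c in hand] for hand in hands]

def equities(translated):
    """Stand-in for Evaluator.getEquities(): score each translated hand."""
    return [sum(hand) for hand in translated]

def evaluate(range_string):
    """The single 'verb' a shell UI or a GUI would both call."""
    return equities(translate(parse(range_string)))

assert evaluate("AK") == [ord("A") + ord("K")]
```

Both interfaces then depend on one function, and the "procedural smell" disappears into an ordinary composition.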
You could create a Poker class that ties these all together and initialize all of that stuff in the __init__() method:

class Poker(object):
    def __init__(self, user_input=HandRange()):
        self.user_input = user_input
        self.translation = Translation.translateList(user_input)
        self.evaluator = Evaluator.getEquities(self.translation)
        # and so on...

p = Poker()
# etc, etc...

How do you turn an unquoted Python function/lambda into AST? 2.6

This seems like it should be easy, but I can't find the answer anywhere - nor able to derive one myself. How do you turn an unquoted python function/lambda into an AST?
Here is what I'd like to be able to do.
import ast

class Walker(ast.NodeVisitor):
    pass

# ...
# note, this doesn't work as ast.parse wants a string
tree = ast.parse(lambda x, y: x + y)
Walker().visit(tree)
In general, you can't. For example, 2 + 2 is an expression -- but if you pass it to any function or method, the argument being passed is just the number 4, no way to recover what expression it was computed from. Function source code can sometimes be recovered (though not for a lambda), but "an unquoted Python expression" gets evaluated so what you get is just the object that's the expression's value.
What problem are you trying to solve? There may be other, viable approaches.
Edit: tx to the OP for clarifying. There's no way to do it for lambda or some other corner cases, but as I mention function source code can sometimes be recovered...:
import ast
import inspect

def f():
    return 23

tree = ast.parse(inspect.getsource(f))
print ast.dump(tree)
inspect.getsource raises IOError if it can't get the source code for whatever object you're passing it. I suggest you wrap the parsing and getsource call into an auxiliary function that can accept a string (and just parses it) OR a function (and tries getsource on it, possibly giving better errors in the IOError case).
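A sketch of that auxiliary function (the name to_ast, the textwrap.dedent call, and the error handling are my own choices, not from the answer above; written in Python 3 syntax):

```python
import ast
import inspect
import textwrap

def to_ast(obj):
    """Parse a source string, or try to recover and parse a function's source."""
    if isinstance(obj, str):
        return ast.parse(obj)
    try:
        source = inspect.getsource(obj)  # raises OSError/IOError if unavailable
    except (OSError, TypeError) as e:
        raise ValueError("cannot recover source for %r" % (obj,)) from e
    # Dedent so that methods and nested functions parse as top-level code.
    return ast.parse(textwrap.dedent(source))

tree = to_ast("def f():\n    return 23")
assert isinstance(tree.body[0], ast.FunctionDef)
```

Note that the function path still fails for lambdas typed at the REPL and for anything whose source file is gone, which is exactly the corner case discussed above.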
If you only have access to the function/lambda, you only have the compiled Python bytecode. The exact Python AST can't be reconstructed from the bytecode, because there is information loss in the compilation process, but you can analyze the bytecode and create ASTs from it. There is one such analyzer in GeniuSQL. I also have a small proof of concept that analyzes bytecode and creates SQLAlchemy ClauseElements from it.
The process I used for analyzing is the following:
Split the code into a list of opcodes with potential arguments.
Find the basic blocks in the code by going through the opcodes and, for every jump, creating a basic-block boundary after the jump and before the jump target.
Create a control flow graph from the basic blocks.
Go through all the basic blocks with abstract interpretation tracking stack and variable assignments in SSA form.
To create the output expression just get the calculated SSA return value.
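The first two steps can be sketched with the stdlib dis module (this is only a rough illustration, not the GeniuSQL analyzer or my proof of concept):

```python
import dis

def example(x):
    if x > 0:
        return x
    return -x

# Step 1: split the code into a list of opcodes with their arguments.
ops = [(i.offset, i.opname, i.argval) for i in dis.get_instructions(example)]

# Step 2 (partially): every jump target is the start of a new basic block.
leaders = dis.findlabels(example.__code__.co_code)

# The two-branch function above contains at least one conditional jump,
# so there is at least one jump target.
assert any("JUMP" in name for _, name, _ in ops)
assert leaders
```

A full implementation would also start a block after each jump instruction, then wire the blocks into a control-flow graph for the abstract-interpretation pass.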
I have pasted my proof of concept and example code using it. This is non-clean quickly hacked together code, but you're free to build on it if you like. Leave a note if you decide to make something useful from it.
The Meta library allows you to recover the source in many cases, with some exceptions such as comprehensions and lambdas.
import meta, ast
source = '''
a = 1
b = 2
c = (a ** b)
'''
mod = ast.parse(source, '<nofile>', 'exec')
code = compile(mod, '<nofile>', 'exec')
mod2 = meta.decompile(code)
source2 = meta.dump_python_source(mod2)
assert source == source2
You can't generate AST from compiled bytecode. You need the source code.
Your lambda expression is a function object, which carries lots of information, but I don't think it still has its source code associated with it. I'm not sure you can get what you want.
