Save lambda expressions to a file - python

I am implementing an Evolution Strategy algorithm in Python 3. I have created a class called Individual that reads the configuration from a file (YAML format) which looks like the following:
num_of_genes: 3
pre_init_gene:
  gene1: 1.0
  gene2: 1.0
  gene3: 1.0
metrics:
  - metric1
  - metric2
obj_funcs:
  obj_fun1: 'lambda x,y: x+y'
  obj_fun2: 'lambda x: (x**3)/2'
The idea is that the individual would read this file to get its configuration.
I know I can save my lambda expressions as strings and then call eval on them.
However, is there a more Pythonic solution to this problem? I am not very comfortable with OO in Python, but I am open to suggestions.

I would adhere to the Zen of Python, in particular "Explicit is better than implicit" and "Readability counts". So having your functions as readable strings defining lambdas is a good idea, although, for security reasons, calling eval on the loaded string representation of the lambda might not be. This depends on who has modification access to the file and on which system it runs.
In general you should not care too much whether someone can deliberately inject something resulting in recursive removal of all files on a system, if they have login access rights with which they could do so anyway. However, if e.g. the software runs on a remote system and these files can be edited via some web interface, or if the file changes can be made by someone other than the person using the files, this is something you should take into account.
If the lambdas come from a fixed set, you can just use their string representation as a lookup:
lambdas = {}
for l in [
    'lambda x,y: x+y',
    'lambda x: (x**3)/2',
    # some more
]:
    lambdas[l] = eval(l)
You can then use the string loaded from your configuration YAML to get the actual lambda, and that string cannot be tampered with, as it has to match the available set of lambdas you provided. You can of course load the actual lambda strings from a file that only you can change, instead of hard-coding them in the source code.
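Retrieving the function then fails loudly (with a KeyError) on anything outside the fixed set. A minimal sketch reusing the lambdas dict from above; the config dict here stands in for your parsed YAML:

# 'config' stands in for the parsed YAML configuration from the question
config = {'obj_funcs': {'obj_fun1': 'lambda x,y: x+y'}}
fn = lambdas[config['obj_funcs']['obj_fun1']]  # KeyError if the string was tampered with
print(fn(2, 3))  # -> 5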
This is IMO more explicit than dumping the actual lambda, which results in YAML looking like:
!!python/name:__main__.<lambda>
something that requires unsafe loading of the YAML document anyway.
If you need to be more flexible than pre-defined lambdas, but don't want the insecurity of using eval, then another possibility is to use Python's ast module. That module allows for safe evaluation of unary and binary operators, and can be extended to handle only those functions (e.g. some mathematical functions) that you want to allow in your lambdas. I have done a similar extension in my Python Object Notation module (PON), adding datetime and dedenting capabilities to the AST-evaluated input.
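As an illustration of the idea (not the PON code itself), a minimal evaluator along those lines might look like this; the operator whitelist and the safe_eval name are my own:

import ast
import operator

# whitelist of allowed unary/binary operators
_ops = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr, names):
    # recursively evaluate, allowing only literals, known names,
    # and whitelisted operators; anything else raises
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _ops:
            return _ops[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ops:
            return _ops[type(node.op)](walk(node.operand))
        raise ValueError('disallowed node: %s' % type(node).__name__)
    return walk(ast.parse(expr, mode='eval'))

print(safe_eval('(x**3)/2', {'x': 4}))  # -> 32.0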
Something else: you should IMO improve your YAML. Instead of using gene1, gene2 as keys in a mapping, use a sequence and tag the items:
pre_init_gene:
- !Gene 1.0
- !Gene 1.0
- !Gene 1.0
or, alternatively, tag the sequence:
pre_init_gene: !Genes
- 1.0
- 1.0
- 1.0
Your lambdas have the same "problem" and I would do something like:
obj_funcs:
- !Lambda 'x, y: x+y'
- !Lambda 'x: (x**3)/2'
where the object implementing the from_yaml classmethod for the tag !Lambda transparently does the eval or AST evaluation.
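A minimal sketch with PyYAML's add_constructor; the !Lambda tag comes from the YAML above, and whether you eval or AST-evaluate inside the constructor is up to you:

import yaml

def lambda_constructor(loader, node):
    # eval is used here for brevity; substitute an AST-based
    # evaluator if the file is not fully trusted
    return eval('lambda ' + loader.construct_scalar(node))

yaml.add_constructor('!Lambda', lambda_constructor, Loader=yaml.SafeLoader)

funcs = yaml.safe_load("""
obj_funcs:
- !Lambda 'x, y: x+y'
- !Lambda 'x: (x**3)/2'
""")['obj_funcs']
print(funcs[0](1, 2))  # -> 3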

With cloudpickle you can dump a lambda to bytes. Then you need to convert bytes to str to be written to a file.
import cloudpickle
import base64
def lambda2str(expr):
    b = cloudpickle.dumps(expr)
    s = base64.b64encode(b).decode()
    return s

def str2lambda(s):
    b = base64.b64decode(s)
    expr = cloudpickle.loads(b)
    return expr
e = lambda x, y: x + y
s = lambda2str(e)
print(s) # => gASVNAEAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX2ZpbGxfZnVuY3Rpb26Uk5QoaACMD19tYWtlX3NrZWxfZnVuY5STlGgAjA1fYnVpbHRpbl90eXBllJOUjAhDb2RlVHlwZZSFlFKUKEsCSwBLAksCS0NDCHwAfAEXAFMAlE6FlCmMAXiUjAF5lIaUjCovVXNlcnMvYmxvd25oaXRoZXJtYS9wcm9qZWN0cy90ZXN0L3Rlc3QucHmUjAg8bGFtYmRhPpRLEUMAlCkpdJRSlEr/////fZSHlFKUf
# store s in file, read s from file
e2 = str2lambda(s)
print(e2(1, 1)) # => 2
Note that base64 is used to avoid characters like \n in the encoded string, which would poison the file structure. decode() simply converts the bytes to str so that they can be written to a file.
This is not a concise representation, but it is a robust one. If your working environment is safe, feel free to use your readable version!

Related

Is there a Python equivalent to Raku's dd (i.e. "tiny data dumper")?

I would like to debug my Python code by inspecting multiple variables, dumping out their names and contents, equivalent to Raku's dd (Raku was formerly known as "Perl 6"):
The closest I've found has been mentioned in another post, which compares Python's pprint to Perl 5's Data::Dumper. However, unlike dd, neither of those outputs the name of the variable. dd in Raku is closest to the show function from the Perl 5 module Data::Show, except show additionally outputs the filename and line number.
Here is a demo of Raku's dd in action:
#!/bin/env perl6
my %a = ( :A(1), :B(2) );
my %c = ( :C(3), :D(4) );
dd %a;
dd %c;
Which, when run, results in the following:
Hash %a = {:A(1), :B(2)}
Hash %c = {:C(3), :D(4)}
(By the way, a Hash in Perl or Raku is equivalent to a dictionary in Python)
And here is the closest I have yet been able to get in Python, but it redundantly requires both the name of the variable and the variable itself:
#!/usr/bin/env python
def tiny_dd(name, x):
    print(name + ' is "' + str(x) + '"')
a = { 'A':1, 'B':2}
c = { 'C':3, 'D':4}
tiny_dd('a',a)
tiny_dd('c',c)
Which, when run, results in the following:
a is "{'A': 1, 'B': 2}"
c is "{'C': 3, 'D': 4}"
Repeating a name twice when printing a value often grates on people who think of variables as having unique values. In Python, however, it is often very hard to find the names that reference a particular value, which makes writing a printer like the one you're looking for pretty hard, since you'd need to do some very expensive searching of the namespace.
That said, PySnooper has done all this heavy lifting for you and can print out a great deal of information on how a program is running, which can be very useful for debugging.
Note that in Python 3.8 you get pretty much what you're looking for with the new = specifier for f-strings, which works like this (copied from the release notes):
>>> user = 'eric_idle'
>>> member_since = date(1975, 7, 31)
>>> f'{user=} {member_since=}'
"user='eric_idle' member_since=datetime.date(1975, 7, 31)"
I would like to debug my Python code by inspecting multiple variables
Then you are probably better off using the built-in debugger, pdb.
If you must fall back on debug traces (my not-so-secret vice; just please make sure you don't check them in to version control), then hard-coding the variable name in a print call is not so bad: after all, you can often write something more descriptive than the variable name anyway. There is also the pprint (pretty-print) module for nicer formatting of complex nested data structures.
But if you really want to be able to find a variable given a string with its name, the built-in locals() and globals() functions provide dicts in which you can look up local and global variables respectively. You can also find global variables (attributes) of other modules (as well as attributes of anything else that has attributes) by name using the getattr() builtin function.
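A minimal sketch of such a lookup-based helper, using the inspect module to search the caller's frame (the frame-walking approach here is my own, not from the question):

import inspect

def tiny_dd(name):
    # look up 'name' in the caller's locals, falling back to its globals
    frame = inspect.currentframe().f_back
    value = frame.f_locals.get(name, frame.f_globals.get(name))
    print(name + ' is "' + str(value) + '"')

a = {'A': 1, 'B': 2}
tiny_dd('a')  # => a is "{'A': 1, 'B': 2}"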

Printing z3 expressions using the python api

I am trying to use z3 to simplify a few expressions generated by S2E/KLEE:
from z3 import *
f = open("query.smt2").read()
expr = parse_smt2_string(f)
print(expr)
print(simplify(expr))
But it seems to only log 200 lines. I have also tried writing it to file, but that has the same result.
g = open("simplified_query.smt2", 'w')
g.write(str(simplify(expr)))
g.close()
How should I log the entire expression?
Example input/output: https://paste.ee/p/tRwxQ
You can print the expressions using the Python pretty printer, as you do. It cuts expressions off when they become very big, and the pretty printer is not efficient. There are settings you can give the pretty printer to force it to print full expressions. The function is called set_pp_option and it is defined in z3printer.py. The main option is called max_depth; other options are defined as fields of the Formatter class.
You can also print expressions in SMT2 format using the method "sexpr()".
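For example, a minimal sketch combining both suggestions; the option names below correspond to Formatter fields in z3printer.py, and the values are simply chosen generously large:

from z3 import *

# raise the pretty-printer limits so nothing gets cut off
set_option(max_args=10000000, max_lines=1000000,
           max_depth=10000000, max_visited=1000000)

expr = parse_smt2_string(open("query.smt2").read())
print(simplify(expr))          # full pretty-printed output
print(simplify(expr).sexpr())  # SMT2-style s-expression instead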
BTW, the file you uploaded doesn't parse because it is UTF-8 encoded, but this is orthogonal to your question and probably an artifact of how you uploaded the repro.

Python - any property file or data format that is mostly free-form?

I'm about to roll my own property file parser. I've got a somewhat odd requirement where I need to be able to store metadata in an existing field of a GUI. The data needs to be easily parseable and human readable, preferably with some flexibility in defining the data (no YAML, for example).
I was thinking I could do something like this:
this is random text that is truly a description
.metadata.
owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla
I was thinking I could use something like '.metadata.' to denote that anything below that line is metadata to be parsed. Then I would treat the properties almost like Java properties: read each line in and build a map (or object) to hold the metadata, which would then be output and searchable via a simple web app.
My real question, before I roll my own, is: can anyone suggest a better method for solving this problem? A specific data format or library that would fit this use case? I would normally use something like YAML, but there's no good way for me to validate that the data is indeed in YAML format when it is saved.
You have 3 problems:
1. How to fit two different things into one box. If you are mixing free-form text with something that is more tightly defined, you are always going to end up with stuff that you can't parse. Then you will have a never-ending battle of trying to deal with the rubbish that gets put in. Is there really no other way?
2. How to define a simple format for metadata that is robust enough for simple use. This is a hard problem: all attempts to do so seem to expand until they become quite complicated (e.g. YAML). You will probably have custom requirements for your domain, so what you've proposed may be best.
3. How to parse that format. For this I would recommend parsy. It would be quite simple to split the text on .metadata. and then parse what remains.
Here is an example using parsy:
from parsy import *
attribute = letter.at_least(1).concat()
name = attribute.sep_by(string("."))
value = regex(r"[^\n]+")
definition = seq(name << string(":") << string(" ").many(), value)
metadata = definition.sep_by(string("\n"))
Example usage:
>>> metadata.parse_partial("""owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla""")
([[['owner', 'first'], 'rick'],
  [['owner', 'second'], 'bob'],
  [['property'], 'blue'],
  [['pets', 'mammals', 'dog'], 'rufus'],
  [['pets', 'mammals', 'cat'], 'ludmilla']],
 '')
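Splitting off the free-form description first is a one-liner with str.partition, reusing the metadata parser defined above (the file name here is hypothetical):

text = open("gui_field.txt").read()
description, _, meta = text.partition(".metadata.")
parsed, leftover = metadata.parse_partial(meta.strip())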
YAML is a simple and nice solution. There is a YAML library in Python:
import yaml
output = {'a': 1, 'b': {'c': [2, 3, 4]}}
print(yaml.dump(output, default_flow_style=False))
Giving as a result:
a: 1
b:
  c:
  - 2
  - 3
  - 4
You can also parse YAML from a string and so on. Just explore the library and check whether it fits your requirements.
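For instance, loading is as short as dumping; safe_load is the call to use for input you did not produce yourself:

import yaml
data = yaml.safe_load("a: 1\nb:\n  c:\n  - 2\n  - 3\n  - 4\n")
print(data['b']['c'])  # => [2, 3, 4]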
Good luck!

Introducing a YAML string in a python script

I am working on Python code that reads a YAML file and generates a rule-based model in PySB.
A new rule in the YAML file is specified like:
--- !rule
name: L_binds_R
reaction:
  L(unbound) + R(inactive) >> L(bound)%R(active)
rates:
  - Kf
From this I create an object with PyYAML (a package for working with YAML in Python), and the reaction attribute is stored as a string.
Then, the rule in PySB needs to be specified as:
# Rule(name, reaction, constant)
Rule('L_binds_R', L(unbound) + R(inactive) >> L(bound)%R(active), kf)
My problem lies in the fact that the 'reaction' field from the YAML is stored as a string in the Python object, but PySB does not accept any format other than a plain expression. I have checked in PySB and the reaction field cannot be a string in any case, and I did not find a way to escape the formatting of the variables in YAML.
Any idea how to fix the problem?
You could approach this in one of two ways: restructuring your YAML file to tokenise the reaction rules, or using eval in Python.
Tokenised reaction rules
The best approach would be to structure your YAML file such that your reaction rule is already specified in individual tokens, rather than just one field for the whole reaction, e.g.
--- !rule
name: L_binds_R
reaction:
  reactants:
  - name: L
    site: b
  - name: R
    site: b
    state: inactive
  products:
  - name: L
    site: b
    bond: 1
  - name: R
    site: b
    bond: 1
    state: active
fwd_rate: kf
You could then write a parser to translate this into the following PySB rule, building the ReactionPattern using the classes in PySB core (MonomerPattern, ComplexPattern and so on):
Rule('L_binds_R', L(b=None) + R(b='inactive') >> L(b=1) % R(b=('active', 1)), kf)
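As a rough sketch of such a translator, assuming a Model() with Monomers L, R and Parameter kf is already declared (as in a normal PySB script) and using the reactants/products layout from the YAML sketch above; the helper names are my own:

from pysb import Rule

def token_pattern(tok, monomers):
    # build a site condition from one tokenised reactant/product entry
    val = tok.get('state')
    if 'bond' in tok:
        val = (val, tok['bond']) if val is not None else tok['bond']
    return monomers[tok['name']](**{tok['site']: val})

def build_rule(spec, monomers, params):
    reactants = [token_pattern(t, monomers) for t in spec['reaction']['reactants']]
    products = [token_pattern(t, monomers) for t in spec['reaction']['products']]
    lhs = reactants[0]
    for m in reactants[1:]:
        lhs = lhs + m   # '+' collects independent reactants
    rhs = products[0]
    for m in products[1:]:
        rhs = rhs % m   # '%' joins bound monomers into a complex
    return Rule(spec['name'], lhs >> rhs, params[spec['fwd_rate']])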
If you have control over the code where the YAML is coming from, you might find it easier to either output PySB code directly, or perhaps write to a standard like SBML, which PySB can now read.
You might find it helpful to look at the PySB BioNetGen language (BNGL) parser I wrote, which creates a PySB model from a BioNetGen XML file, as an example of how to create a model from an external file.
Using eval
The alternative is to use eval. While this is the easier solution, it is strongly discouraged for security reasons*. However if the YAML files are all generated by you/your own code and you just want a quick fix, this would do it.
Here’s an example:
# You would read these in from the YAML file, but I’ll just define
# the strings here for simplicity
reaction_name = "L_binds_R"
reaction_str = "L(b=None) + R(b='inactive') >> L(b=1) % R(b=('active', 1))"
reaction_fwd_rate = "Kf"
Rule(reaction_name, eval(reaction_str), eval(reaction_fwd_rate))
# Python output
# (assumes Monomers L and R and parameter Kf are already defined):
# >>> Rule('L_binds_R', L(b=None) + R(b='inactive') >> L(b=1) % R(b=('active', 1)), Kf)
*Consider the case where your YAML contained something like:
reaction:
  import shutil; shutil.rmtree('~')
Importing that YAML file and evaling that field would delete your home directory! eval will execute any arbitrary Python code by definition. It should only be used where the source file is completely trusted. In general you should always "sanitise your inputs" (assume inputs are dangerous until proven otherwise).

Writing a compiler for a DSL in python

I am writing a game in python and have decided to create a DSL for the map data files. I know I could write my own parser with regex, but I am wondering if there are existing python tools which can do this more easily, like re2c which is used in the PHP engine.
Some extra info:
Yes, I do need a DSL, and even if I didn't I still want the experience of building and using one in a project.
The DSL contains only data (declarative?), it doesn't get "executed". Most lines look like:
SOMETHING: !abc #123 #xyz/123
I just need to read the tree of data.
I've always been impressed by pyparsing. The author, Paul McGuire, is active on the python list/comp.lang.python and has always been very helpful with any queries concerning it.
Here's an approach that works really well.
abc= ONETHING( ... )
xyz= ANOTHERTHING( ... )
pqr= SOMETHING( this=abc, that=123, more=(xyz,123) )
Declarative. Easy-to-parse.
And...
It's actually Python. A few class declarations and the work is done. The DSL is actually class declarations.
What's important is that a DSL merely creates objects. When you define a DSL, first you have to start with an object model. Later, you put some syntax around that object model. You don't start with syntax, you start with the model.
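As a rough sketch of what "a few class declarations" can mean here: once the classes exist, the map file itself can simply be exec'd (all names below, including the file name, are invented for illustration):

# the object model: each declaration just records its keyword arguments
class Thing:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

class ONETHING(Thing): pass
class ANOTHERTHING(Thing): pass
class SOMETHING(Thing): pass

# 'level1.map' contains lines like: pqr = SOMETHING(this=abc, that=123)
namespace = {c.__name__: c for c in (ONETHING, ANOTHERTHING, SOMETHING)}
exec(open("level1.map").read(), namespace)
pqr = namespace["pqr"]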
Yes, there are many -- too many -- parsing tools, but none in the standard library.
From what I saw, PLY and SPARK are popular. PLY is like yacc, but you do everything in Python because you write your grammar in docstrings.
Personally, I like the concept of parser combinators (taken from functional programming), and I quite like pyparsing: you write your grammar and actions directly in python and it is easy to start with. I ended up producing my own tree node types with actions though, instead of using their default ParserElement type.
Otherwise, you can also use existing declarative language like YAML.
I have written something like this at work to read in SNMP notification definitions and automatically generate Java classes and SNMP MIB files from them. Using this little DSL, I could write 20 lines of my specification and it would generate roughly 80 lines of Java code and a 100-line MIB file.
To implement this, I actually just used straight Python string handling (split(), slicing, etc.) to parse the file. I find Python's string capabilities to be adequate for most of my (simple) parsing needs.
Besides the libraries mentioned by others, if I were writing something more complex and needed proper parsing capabilities, I would probably use ANTLR, which supports Python (and other languages).
For "small languages" as the one you are describing, I use a simple split, shlex (mind that the # defines a comment) or regular expressions.
>>> line = 'SOMETHING: !abc #123 #xyz/123'
>>> line.split()
['SOMETHING:', '!abc', '#123', '#xyz/123']
>>> import shlex
>>> list(shlex.shlex(line))
['SOMETHING', ':', '!', 'abc', '#', '123']
The following is an example, as I do not know exactly what you are looking for.
>>> import re
>>> result = re.match(r'([A-Z]*): !([a-z]*) #([0-9]*) #([a-z0-9/]*)', line)
>>> result.groups()
('SOMETHING', 'abc', '123', 'xyz/123')
DSLs are a good thing, so you don't need to defend yourself :-)
However, have you considered an internal DSL? These have so many pros versus external (parsed) DSLs that they're at least worth consideration. Mixing a DSL with the power of the native language really solves lots of the problems for you, and Python is not really bad at internal DSLs, with the with statement coming in handy.
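For instance, a tiny internal DSL for map data built on context managers can read almost like a data file; everything below, including the room helper, is invented for illustration:

from contextlib import contextmanager

world = []

@contextmanager
def room(name, **attrs):
    # each 'room' block appends a node to the world tree
    node = {"name": name, "attrs": attrs, "items": []}
    world.append(node)
    yield node

with room("cave", light=False) as cave:
    cave["items"].append("sword")

print(world)  # => [{'name': 'cave', 'attrs': {'light': False}, 'items': ['sword']}]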
Along the lines of declarative Python, I wrote a helper module called 'bpyml' which lets you declare data in Python in a more XML-structured way without the verbose tags; it can be converted to/from XML too, but is valid Python.
https://svn.blender.org/svnroot/bf-blender/trunk/blender/release/scripts/modules/bpyml.py
Example Use
http://wiki.blender.org/index.php/User:Ideasman42#Declarative_UI_In_Blender
Here is a simpler approach to solving it.
What if I could extend Python syntax with new operators to introduce new functionality into the language? For example, a new operator <=> for swapping the values of two variables.
How can I implement such behavior? Here comes the ast module.
The ast module is a handy tool for handling abstract syntax trees. What's cool about it is that it allows me to write Python code that generates a tree and then compile that tree back into Python code.
Let's say we want to compile a superset language (or Python-like language) to Python, from:
a <=> b
to:
a, b = b, a
First I need to convert my 'Python-like' source code into a list of tokens, so I need a tokenizer, i.e. a lexical scanner for Python source code: the tokenize module.
I can use the same machinery to define the grammar of the new 'Python-like' language and then build the structure of the abstract syntax tree (AST).
Why use the AST?
- It is a much safer choice when evaluating untrusted code.
- You can manipulate the tree before executing the code.
from tokenize import untokenize, tokenize, NUMBER, STRING, NAME, OP, COMMA
import io
import ast
s = b"a <=> b\n" # i may read it from file
b = io.BytesIO(s)
g = tokenize(b.readline)
result = []
for token_num, token_val, _, _, _ in g:
# naive simple approach to compile a<=>b to a,b = b,a
if token_num == OP and token_val == '<=' and next(g).string == '>':
first = result.pop()
next_token = next(g)
second = (NAME, next_token.string)
result.extend([
first,
(COMMA, ','),
second,
(OP, '='),
second,
(COMMA, ','),
first,
])
else:
result.append((token_num, token_val))
src = untokenize(result).decode('utf-8')
exp = ast.parse(src)
code = compile(exp, filename='', mode='exec')
def my_swap(a, b):
    env = {
        "a": a,
        "b": b,
    }
    exec(code, env)
    return env['a'], env['b']

print(my_swap(1, 10))
Other modules using AST, whose source code may be a useful reference:
textX-LS: a DSL used to describe a collection of shapes and draw them for us.
Pony ORM: lets you write database queries using Python generators and lambdas, which it translates to SQL query strings; Pony ORM uses AST under the hood.
osso: a role-based access control framework for handling permissions.
