I am implementing an Evolution Strategy algorithm in Python 3. I have created a class called Individual that reads the configuration from a file (YAML format) which looks like the following:
num_of_genes: 3
pre_init_gene:
  gene1: 1.0
  gene2: 1.0
  gene3: 1.0
metrics:
  - metric1
  - metric2
obj_funcs:
  obj_fun1: 'lambda x,y: x+y'
  obj_fun2: 'lambda x: (x**3)/2'
The idea is that the individual would read this file to get its configuration.
I know I can save my lambda expression as a string and then call eval on it.
However, is there a more Pythonic solution for this problem? I do not feel very comfortable with OO in Python, but I am open to suggestions.
I would adhere to the Zen of Python, in particular "Explicit is better than implicit" and "Readability counts".
So having your functions as readable strings defining lambdas is a good idea, although, for security reasons, calling eval on the loaded string representation of the lambda might not be. That again depends on who has modification access to the file and on which system it runs.
In general you should not care too much whether someone can deliberately inject something resulting in recursive removal of all files on a system, if they already have login access rights with which they can do so anyway. However, if e.g. the software runs on a remote system and these files can be edited via some web interface, or if the file changes can be made by someone other than the person using the files, this is something you should take into account.
If the lambdas come from a fixed set, you can just use
their string representation as a lookup:
lambdas = {}
for l in [
    'lambda x,y: x+y',
    'lambda x: (x**3)/2',
    # some more
]:
    lambdas[l] = eval(l)
You can then use the string loaded from your configuration YAML to get
the actual lambda and that string cannot be tampered with, as it has
to match the available set of lambdas you provided. You can of course
load the actual lambda strings from a file that only you can change,
instead of hard-coding them in the source code.
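For illustration, a small sketch of that lookup, assuming the YAML layout from the question, a hypothetical config.yaml file name, and the lambdas dict from above; the string read from the configuration is only used as a dictionary key, so nothing from the file is ever passed to eval:
import yaml

with open('config.yaml') as fp:      # hypothetical file name
    config = yaml.safe_load(fp)

# the configured string is only a dictionary key; a tampered
# string raises KeyError instead of being evaluated
obj_funcs = {name: lambdas[expr]
             for name, expr in config['obj_funcs'].items()}
print(obj_funcs['obj_fun1'](1, 2))   # => 3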
This is IMO more explicit than dumping the actual lambda, which results in YAML looking like:
!!python/name:__main__.%3Clambda%3E
something that requires unsafe loading of the YAML document anyway.
If you need to be more flexible than pre-defined lambdas allow, but don't want the insecurity of using eval, then another possibility is to use Python's ast module. That module allows for safe evaluation of unary and binary operators, and can be extended to handle only those functions (e.g. some mathematical functions) that you want to allow in your lambdas. I have done a similar extension in my Python Object Notation module (PON), adding datetime and dedenting capabilities to the AST-evaluated input.
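As a rough illustration of that idea (not the PON implementation, just a minimal sketch assuming only arithmetic is needed), one can walk the tree produced by ast.parse and allow nothing but numbers, whitelisted names, and unary/binary arithmetic operators:
import ast
import operator

# allowed unary/binary arithmetic operators
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr, names=None):
    names = names or {}
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value                    # numbers only (Python 3.8+)
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]                # whitelisted variables only
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError('disallowed construct: %r' % node)
    return ev(ast.parse(expr, mode='eval'))

print(safe_eval('(x**3)/2', {'x': 4}))  # => 32.0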
Something else is that you should IMO improve your YAML. Instead of using gene1, gene2 as keys in a mapping, use a sequence and tag the items:
pre_init_gene:
- !Gene 1.0
- !Gene 1.0
- !Gene 1.0
or, alternatively, tag the sequence:
pre_init_gene: !Genes
- 1.0
- 1.0
- 1.0
Your lambdas have the same "problem" and I would do something like:
obj_funcs:
- !Lambda 'x, y: x+y'
- !Lambda 'x: (x**3)/2'
where the object implementing the from_yaml classmethod for the
tag !Lambda transparently does the eval or AST evaluation.
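A minimal sketch of that, using PyYAML's add_constructor (ruamel.yaml's register_class/from_yaml works along the same lines); the eval call below is a stand-in for whichever restricted evaluation you settle on:
import yaml

def lambda_constructor(loader, node):
    body = loader.construct_scalar(node)  # e.g. 'x, y: x+y'
    return eval('lambda ' + body)         # or a restricted AST evaluator

yaml.SafeLoader.add_constructor('!Lambda', lambda_constructor)

doc = """
obj_funcs:
- !Lambda 'x, y: x+y'
- !Lambda 'x: (x**3)/2'
"""
funcs = yaml.load(doc, Loader=yaml.SafeLoader)['obj_funcs']
print(funcs[0](1, 2))  # => 3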
With cloudpickle you can dump a lambda to bytes. Then you need to convert the bytes to str so they can be written to a file.
import cloudpickle
import base64
def lambda2str(expr):
    b = cloudpickle.dumps(expr)
    s = base64.b64encode(b).decode()
    return s

def str2lambda(s):
    b = base64.b64decode(s)
    expr = cloudpickle.loads(b)
    return expr
e = lambda x, y: x + y
s = lambda2str(e)
print(s) # => gASVNAEAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX2ZpbGxfZnVuY3Rpb26Uk5QoaACMD19tYWtlX3NrZWxfZnVuY5STlGgAjA1fYnVpbHRpbl90eXBllJOUjAhDb2RlVHlwZZSFlFKUKEsCSwBLAksCS0NDCHwAfAEXAFMAlE6FlCmMAXiUjAF5lIaUjCovVXNlcnMvYmxvd25oaXRoZXJtYS9wcm9qZWN0cy90ZXN0L3Rlc3QucHmUjAg8bGFtYmRhPpRLEUMAlCkpdJRSlEr/////fZSHlFKUf
# store s in file, read s from file
e2 = str2lambda(s)
print(e2(1, 1)) # => 2
Note that base64 encoding avoids things like \n in the encoded string, which would break the file structure. decode() simply converts the bytes to str so the result can be written to a file.
This is not a concise representation, but a safe one. If your working environment is safe, feel free to use your readable version!
I want to parse a log4j configuration in order to know how to parse a given log.
Requirements: Python 2.6+, no custom C modules (unless absolutely required).
For example:
%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p{length=5} [%t] %c:%L %message%n
or
%d{ISO8601} %-5p{length=5} ((%t) %c:%L) %message%n
As a reference, the pattern layout is described here:
Pattern Layouts for log4j
Initially, I was going to customize it for each log pattern, for example using re:
log1 = re.compile(r'([\d-]{10}) ([\d:.]{12}) {1}([A-Z]{0,}) \[(catalina-exec-[0-9]{2})\]{0,} (.*)\n')
Note: I realize that this is not a very comprehensive use of re, nor is it an optimized regular expression. It was for testing only.
I initially started using parsimonious like so (very early stage):
from parsimonious.grammar import Grammar

grammar = Grammar(
    r"""
    category = "%c"
    category_precise = category optional_open number optional_close
    timedate = "%d"
    timedate_absolute = timedate optional_open timedate_abstext optional_close
    timedate_iso = timedate optional_open timedate_isotext optional_close
    timedate_date = timedate optional_open timedate_datetext optional_close
    timedate_era = "G"
    timedate_year_two_digit = ~"y{2}"
    timedate_year_number = ~"(?:y{1}|y{3,})"
    timedate_month = "MM"
    timedate_minute = "mm"
    """
)
Effectively, I am wondering if I am going about this the wrong way. It almost seems like I am using a PEG parser in the wrong way; in fact, the more I look at it, the more I think I am.
I don't need full code, just a good concept, a start, an idea, or a good place to start reading.
In the end, I want to be able to review a log format and, for lack of better words, "convert the log4j2 pattern into a regular expression".
Any help would be appreciated.
I would suggest Plex 2.0. I have found it easy to write code that identifies tokens such as ISO8601, %d, %t, etc. from the configuration file. Then, as you will discern from the documentation, I expect you will be able to write regex code returned by Plex that parses the log file itself.
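To make the concept concrete without Plex, here is a minimal sketch of the token-to-regex idea: scan the layout for conversion specifiers and map each one to a regex fragment with a named group. The fragment table is purely illustrative, nowhere near a complete log4j implementation:
import re

# a conversion specifier: %, optional padding, a name, optional {options}
TOKEN_RE = re.compile(r'%-?\d*(?:\.\d+)?([a-zA-Z]+)(\{[^}]*\})?')

# illustrative fragments only; a real implementation would also
# honour the {options} part of each specifier
FRAGMENTS = {
    'd': r'(?P<date>[\d :,.T-]+)',    # timestamp
    'p': r'(?P<level>[A-Z]+)\s*',     # log level (padding absorbed)
    't': r'(?P<thread>[^\]]+)',       # thread name
    'c': r'(?P<logger>[\w.$]+)',      # logger category
    'L': r'(?P<line>\d+)',            # line number
    'message': r'(?P<message>.*)',    # the message itself
    'n': r'',                         # trailing newline, ignored
}

def pattern_to_regex(layout):
    out, pos = [], 0
    for m in TOKEN_RE.finditer(layout):
        out.append(re.escape(layout[pos:m.start()]))  # literal text in between
        out.append(FRAGMENTS.get(m.group(1), r'.*?'))
        pos = m.end()
    out.append(re.escape(layout[pos:]))
    return re.compile(''.join(out))

rx = pattern_to_regex('%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p{length=5} [%t] %c:%L %message%n')
m = rx.match('2015-01-01 12:00:00.123 INFO  [main] com.example.App:42 started')
print(m.groupdict() if m else None)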
I have a Python script that converts files from a custom form language into compilable C++ files. An example of what such a file looks like could be
data = open_special_file_format('data.nc')
f = div(grad(data.u)) + data.g
write_special_file(f, 'out.nc')
Note that this is Python syntax and in fact is parsed with Python's ast module. The magic that happens here is mostly in the custom keywords div, grad, and a few others.
Since this so closely resembles Python, I was asking myself if it is possible to embed this language into Python. I'm imagining something like
import mylang
data = mylang.open_special_file_format('data.nc')
f = mylang.div(mylang.grad(data.u)) + data.g
mylang.write_special_file(f, 'out.nc')
I'm not really sure though if it's possible to tell the module mylang to create and compile C++ code on the fly and insert it in the right place.
Any hints?
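For illustration, one hedged sketch of how such an embedding could be structured: div and grad return expression objects that overload the arithmetic operators, so f becomes an expression tree the module could later translate to C++ and compile. All names below are hypothetical:
class Expr:
    def __add__(self, other):
        return BinOp('+', self, other)

class Var(Expr):
    def __init__(self, name):
        self.name = name

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

class Call(Expr):
    def __init__(self, func, arg):
        self.func, self.arg = func, arg

def div(e):
    return Call('div', e)

def grad(e):
    return Call('grad', e)

# f is now an expression tree, not a number; a code generator
# could walk it and emit (and compile) the corresponding C++
f = div(grad(Var('u'))) + Var('g')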
I'm using a library, ABPY (library here), for Python, but I think it is an older version. I'm using Python 3.3.
I did fix some print errors, but that's as much as I know; I'm really new to programming.
I want to fetch a webpage, filter the advertising out of it, and then print it again.
EDITED: after Sg'te'gmuj told me how to convert from Python 2.x to 3.x, this is my new code:
#!/usr/local/bin/python3.1
import cgitb;cgitb.enable()
import urllib.request
response = urllib.request.build_opener()
response.addheaders = [('User-agent', 'Mozilla/5.0')]
response = urllib.request.urlopen("http://www.youtube.com")
html = response.read()
from abpy import Filter
with open("easylist.txt") as f:
    ABPFilter = Filter(file('easylist.txt'))
    ABPFilter.match(html)
print("Content-type: text/html")
print()
print (html)
Now it is displaying a blank page
Just took a peek at the library; it seems that the file "easylist.txt" does not exist. You need to create the file and populate it with the appropriate filters (in whatever format ABP specifies).
Additionally, it appears it takes a file object; try something like this instead:
with open("easylist.txt") as f:
    ABPFilter = Filter(f)
I can't say this is wholly accurate though, since I have no experience with the library, but looking at its code I'd suspect either of the two is the problem, if not both.
Addendum #1
Looking at the code more in depth, I have to agree that even if the fix I supplied works, you're going to have more problems (the library is in 2.x as you suggested, while you're using 3.x). I'd suggest utilizing Python's 2to3 tool to convert from typical Python 2 to Python 3 code (it's not foolproof though). The command line would be as follows:
2to3 -w abpy.py
That will convert it from Python 2.x to 3.x code, and re-write the source file.
Addendum #2
The code to pass the file object should use the "f" variable, as shown above (modified to represent that; I wasn't paying attention and just left the old file function call in the argument).
You need to pass a URI to the function as well:
ABPFilter.match(URI)
You'll need to modify the code to pass those items in as an array (I'm assuming, at least); I'm playing with it now to see. At present I'm getting a rule error (not a Python error, but merely error handling used by abpy.py, which is good because it suggests that this is the right train of thought).
The code for the Filter.match function is as following (after using the 2to3 Python script):
def match(self, url, elementtype=None):
    tokens = RE_TOK.split(url)
    print(tokens)
    for tok in tokens:
        if len(tok) > 2:
            if tok in self.index:
                for rule in self.index[tok]:
                    if rule.match(url, elementtype=elementtype):
                        print(str(rule))
What this means is that you're, at present, at a point where you need to program the functionality; it appears this module only indicates the matching rule. However, that is still useful.
It also means that you're going to have to modify this function to take the HTML in place of the "url" parameter. You're going to regex the HTML (this may be rather intensive) for a list of URIs and then run each item through the match loop. Where you go from there to actually filter the nodes, I'm not sure; but there is a list of filter types, so I'm assuming there is a typical procedure ABP uses to remove the nodes (possibly, in some cases, merely by removing the given URI from the HTML?).
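A rough sketch of that step, with a deliberately naive and purely illustrative URL-extraction regex, reusing the html and ABPFilter names from the question:
import re

# deliberately naive: grab href/src attribute values from the raw HTML
URL_RE = re.compile(rb'(?:href|src)=["\']([^"\']+)["\']')

for uri in URL_RE.findall(html):
    ABPFilter.match(uri.decode('utf-8', 'replace'))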
References
http://docs.python.org/3.3/library/2to3.html
I am writing a game in Python and have decided to create a DSL for the map data files. I know I could write my own parser with regex, but I am wondering if there are existing Python tools which can do this more easily, like re2c which is used in the PHP engine.
Some extra info:
Yes, I do need a DSL, and even if I didn't, I'd still want the experience of building and using one in a project.
The DSL contains only data (declarative?); it doesn't get "executed". Most lines look like:
SOMETHING: !abc #123 #xyz/123
I just need to read the tree of data.
I've always been impressed by pyparsing. The author, Paul McGuire, is active on the python list/comp.lang.python and has always been very helpful with any queries concerning it.
Here's an approach that works really well.
abc= ONETHING( ... )
xyz= ANOTHERTHING( ... )
pqr= SOMETHING( this=abc, that=123, more=(xyz,123) )
Declarative. Easy-to-parse.
And...
It's actually Python. A few class declarations and the work is done. The DSL is actually class declarations.
What's important is that a DSL merely creates objects. When you define a DSL, first you have to start with an object model. Later, you put some syntax around that object model. You don't start with syntax, you start with the model.
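For illustration, a minimal sketch of those class declarations; the names are hypothetical stand-ins for whatever your object model needs:
class Thing:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)    # keyword arguments become attributes

class ONETHING(Thing): pass
class ANOTHERTHING(Thing): pass
class SOMETHING(Thing): pass

abc = ONETHING(size=3)
xyz = ANOTHERTHING(color='red')
pqr = SOMETHING(this=abc, that=123, more=(xyz, 123))
print(pqr.that)  # => 123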
Yes, there are many -- too many -- parsing tools, but none in the standard library.
From what I saw, PLY and SPARK are popular. PLY is like yacc, but you do everything in Python because you write your grammar in docstrings.
Personally, I like the concept of parser combinators (taken from functional programming), and I quite like pyparsing: you write your grammar and actions directly in python and it is easy to start with. I ended up producing my own tree node types with actions though, instead of using their default ParserElement type.
Otherwise, you can also use existing declarative language like YAML.
I have written something like this in work to read in SNMP notification definitions and automatically generate Java classes and SNMP MIB files from this. Using this little DSL, I could write 20 lines of my specification and it would generate roughly 80 lines of Java code and a 100 line MIB file.
To implement this, I actually just used straight Python string handling (split(), slicing, etc.) to parse the file. I find Python's string capabilities to be adequate for most of my (simple) parsing needs.
Besides the libraries mentioned by others, if I were writing something more complex and needed proper parsing capabilities, I would probably use ANTLR, which supports Python (and other languages).
For "small languages" as the one you are describing, I use a simple split, shlex (mind that the # defines a comment) or regular expressions.
>>> line = 'SOMETHING: !abc #123 #xyz/123'
>>> line.split()
['SOMETHING:', '!abc', '#123', '#xyz/123']
>>> import shlex
>>> list(shlex.shlex(line))
['SOMETHING', ':', '!', 'abc']
The following is an example, as I do not know exactly what you are looking for.
>>> import re
>>> result = re.match(r'([A-Z]*): !([a-z]*) #([0-9]*) #([a-z0-9/]*)', line)
>>> result.groups()
('SOMETHING', 'abc', '123', 'xyz/123')
DSLs are a good thing, so you don't need to defend yourself :-)
However, have you considered an internal DSL? These have so many pros versus external (parsed) DSLs that they're at least worth consideration. Mixing a DSL with the power of the native language really solves lots of the problems for you, and Python is not bad at all for internal DSLs, with the with statement handy.
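For example, a tiny hypothetical sketch of an internal DSL for map data built around the with statement; MapBuilder is made up for illustration:
class MapBuilder:
    def __init__(self, name):
        self.name, self.tiles = name, []
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        # the block's end is where the collected data gets used
        print('built map %r with %d tiles' % (self.name, len(self.tiles)))
        return False
    def tile(self, ref, *tags):
        self.tiles.append((ref, tags))

with MapBuilder('level1') as m:
    m.tile('!abc', '#123', '#xyz/123')
    m.tile('!def', '#456')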
Along the lines of declarative Python, I wrote a helper module called 'bpyml' which lets you declare data in Python in a more XML-structured way without the verbose tags; it can be converted to/from XML too, but is valid Python.
https://svn.blender.org/svnroot/bf-blender/trunk/blender/release/scripts/modules/bpyml.py
Example Use
http://wiki.blender.org/index.php/User:Ideasman42#Declarative_UI_In_Blender
Here is a simpler approach to solving it.
What if I could extend Python syntax with new operators to introduce new functionality to the language? For example, a new operator <=> for swapping the value of two variables.
How can I implement such behavior? Here comes the ast module.
The ast module is a handy tool for handling abstract syntax trees. What's cool about this module is that it allows me to write Python code that generates a tree and then compiles it to Python code.
Let's say we want to compile a superset language (or Python-like language) to Python, from:
a <=> b
to:
a, b = b, a
I need to convert my 'Python-like' source code into a list of tokens, so I need a tokenizer: a lexical scanner for Python source code. See the tokenize module.
I can use the same meta-language to define both the grammar of the new 'Python-like' language and the structure of the abstract syntax tree (AST).
Why use AST? It is a much safer choice when evaluating untrusted code, and it lets you manipulate the tree before executing the code.
Working on the Tree
from tokenize import untokenize, tokenize, NUMBER, STRING, NAME, OP, COMMA
import io
import ast

s = b"a <=> b\n"  # i may read it from file
b = io.BytesIO(s)
g = tokenize(b.readline)

result = []
for token_num, token_val, _, _, _ in g:
    # naive simple approach to compile a<=>b to a,b = b,a
    if token_num == OP and token_val == '<=' and next(g).string == '>':
        first = result.pop()
        next_token = next(g)
        second = (NAME, next_token.string)
        result.extend([
            first,
            (COMMA, ','),
            second,
            (OP, '='),
            second,
            (COMMA, ','),
            first,
        ])
    else:
        result.append((token_num, token_val))

src = untokenize(result).decode('utf-8')
exp = ast.parse(src)
code = compile(exp, filename='', mode='exec')

def my_swap(a, b):
    global code
    env = {
        "a": a,
        "b": b
    }
    exec(code, env)
    return env['a'], env['b']

print(my_swap(1, 10))
Other modules using AST, whose source code may be a useful reference:
textX-LS: a DSL used to describe a collection of shapes and draw them for us.
Pony ORM: you can write database queries using Python generators and lambdas, which are translated to SQL query strings; Pony ORM uses AST under the hood.
osso: a Role-Based Access Control framework that handles permissions.