Introducing a YAML string in a python script

Introducing a YAML string in a python script - python

I am working on a python code able to read a YAML file and generate a rule-based model in PySB.
A new rule in the YAML file is specified like:
--- !rule
name: L_binds_R
reaction:
L(unbound) + R(inactive) >> L(bound)%R(active)
rates:
- Kf
With this I create a pyyaml object (pyyaml is a package to work with yaml in python) in python and the reaction attribute is stored as a string.
Then, the rule in pysb requires to be specified as:
# Rule(name, reaction, constant)
Rule('L_binds_R', L(unbound) + R(inactive) >> L(bound)%R(active), kf)
My problem relies in the fact that the 'reaction' field in yaml is stored as string in the python object but pysb does not accept any other format than plain text.
I have checked in PySB and the reaction field cannot be a string in any case and I did not find how to scape the formating of variables in YAML.
Any idea to fix the problem?

You could approach this one of two ways: restructuring your YAML find to tokenise the reaction rules, or using eval in Python.
Tokenised reaction rules
The best approach would be to structure your YAML file such that your reaction rule is already specified in individual tokens, rather than just one field for the whole reaction, e.g.
--- rule!
name: L_binds_R
reaction:
reactant:
name: L
site: b
reactant:
name: R
site: b
state: inactive
product:
name: L
site: b
bond: 1
product:
name: R
site: b
bond: 1
state: active
fwd_rate: kf
You could then write a parser to translate this into the following PySB rule, building the ReactionPattern using the classes in PySB core (MonomerPattern, ComplexPattern and so on):
Rule(‘L_binds_R’, L(b=None) + R(b='inactive') >> L(b=1) % R(b=(‘active’, 1)), kf)
If you have control over the code where the YAML is coming from, you might find it easier to either output PySB code directly, or perhaps write to a standard like SBML, which PySB can now read.
You might find it helpful to look at the PySB BioNetGen language (BNGL) parser I wrote, which creates a PySB model from a BioNetGen XML file, as an example of how to create a model from an external file.
Using eval
The alternative is to use eval. While this is the easier solution, it is strongly discouraged for security reasons*. However if the YAML files are all generated by you/your own code and you just want a quick fix, this would do it.
Here’s an example:
# You would read these in from the YAML file, but I’ll just define
# the strings here for simplicity
reaction_name = "L_binds_R"
reaction_str = "L(b=None) + R(b='inactive') >> L(b=1) % R(b=('active', 1))"
reaction_fwd_rate = "Kf"
Rule(reaction_name, eval(reaction_str), eval(reaction_fwd_rate))
# Python output
# (assumes Monomers L and R and parameter Kf are already defined):
# >>> Rule('L_binds_R', L(b=None) + R(b='inactive') >> L(b=1) % R(b=('active', 1)), Kf)
*Consider the case where your YAML contained something like:
reaction:
import shutil; shutil.rmtree('~')
Importing that YAML file and evaling that field would delete your home directory! eval will execute any arbitrary Python code by definition. It should only be used where the source file is completely trusted. In general you should always "sanitise your inputs" (assume inputs are dangerous until proven otherwise).

Related

How to update YAML file without loss of comments and formatting / YAML automatic refactoring in Python

I would like to update YAML file values in Python without losing formatting and comments in Python. For example I would like to tranform
YAML file
value: 456 # nice value
to
value: 6 # nice value
with interface similar to
y = yaml.load('path')
y['value'] = 6
y.save()
Is there some way how to do it elegantly in Python (without writting a new YAML parsing library)?
I need systematic longterm maintainable solution - so no regex substitutions is okey for me since they get ugly and hardly maintainable, when you do to much of them in your code.
I haven't found any Python library which does the job. The only library I found, which is considering the feature, but is not implemented it yet, is C library libyaml(issue on Github). Have I missed any?
The problem might be also formulated as: do you know some automatic refactoring YAML library in Python?
Thanks.

ruamel.yaml may be what you are looking for, it is a YAML parser/emitter that supports roundtrip preservation of comments:
import sys
from ruamel.yaml import YAML
yaml_data = "value: 456 # nice value"
yaml = YAML()
data = yaml.load(yaml_data)
data["value"] = 6
yaml.dump(data, sys.stdout)
Output:
value: 6 # nice value

Exclude header comments when generating documentation with sphinx

I am trying to use sphinx to generate documentation for my python package. I have included well documented docstrings within all of my functions. I have called sphinx-quickstart to create the template, filled in the template, and called make html to generate the documentation. My problem is that I also have comments within my python module that are also showing up in the rendered documentation. The comments are not within the functions, but rather as a header at the top of the .py files. I am required to have these comment blocks and cannot remove them, but I don't want them to show up in my documentation.
I'm current using automodule to pull all of the functions out of the module. I know I can use autofunction to get the individual functions one by one, and this avoids the file headers, but I have a lot of functions and there must be a better way.
How can I adjust my conf.py file, or something else to use automodule, but to only pick up the functions and not the comments outside of the functions?
This is what my .py file looks like:
"""
This code is a part of a proj repo.
https://github.com/CurtLH/proj
author: greenbean4me
date: 2018-02-07
"""
def hit(nail):
"""
Hit the nail on the head
This function takes a string and returns the first character in the string
Parameter
----------
nail : str
Can be any word with any length
Return
-------
x : str
First character in the string
Example
-------
>>> x = hit("hello")
>>> print(x)
>>> "h"
"""
return nail[0]
This is my the auto-generated documentation looks like:
Auto-Generated Documentation
This code is a part of a proj repo.
https://github.com/CurtLH/proj
author: greenbean4me date: 2018-02-07
hammer.hit(nail)
Hit the nail on the head
This function takes a string and returns the first character in the string
nail : str
Can be any word with any length
x : str
First character in the string
>>> x = hit("hello")
>>> print(x)
>>> "h"
For a more comprehensive example, check out this example repo on GitHub: https://github.com/CurtLH/proj

As far as I know, there is no way to configure autodoc not to do this.
There is, however, a simple workaround: Adding an empty docstring at the top of your module.
"""""" # <- add this
"""
This code is a part of a proj repo.
https://github.com/CurtLH/proj
author: greenbean4me
date: 2018-02-07
"""
It's barely noticeable, and will trick autodoc into not displaying your module's real docstring.

Python - any property file or data format that is mostly free-form?

I'm about to roll my own property file parser. I've got a somewhat odd requirement where I need to be able to store metadata in an existing field of a GUI. The data needs to be easily parse-able and human readable, preferably with some flexibility in defining the data (no yaml for example).
I was thinking I could do something like this:
this is random text that is truly a description
.metadata.
owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla
I was thinking I could use something like '.metadata.' to denote that anything below that line is metadata to be parsed. Then, I would treat the properties almost like java properties where I would read each line in and build a map (or object) to hold the metadata, which would then be outputted and searchable via a simple web app.
My real question before I roll this on my own, is can anyone suggest a better method for solving this problem? A specific data format or library that would fit this use case? I would normally use something like yaml or the like, but there's no good way for me to validate that the data is indeed in yaml format when it is saved.

You have 3 problems:
How to fit two different things into one box.
If you are mixing free form text and something that is more tightly defined, you are always going to end up with stuff that you can't parse. Then you will have a never ending battle of trying to deal with the rubbish that gets put in. Is there really no other way?
How to define a simple format for metadata that is robust enough for simple use.
This is a hard problem - all attempts to do so seem to expand until they become quite complicated (e.g. YAML). You will probably have custom requirements for your domain, so what you've proposed may be best.
How to parse that format.
For this I would recommend parsy.
It would be quite simple to split the text on .metadata. and then parse what remains.
Here is an example using parsy:
from parsy import *
attribute = letter.at_least(1).concat()
name = attribute.sep_by(string("."))
value = regex(r"[^\n]+")
definition = seq(name << string(":") << string(" ").many(), value)
metadata = definition.sep_by(string("\n"))
Example usage:
>>> metadata.parse_partial("""owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla""")
([[['owner', 'first'], 'rick'],
[['owner', 'second'], 'bob'],
[['property'], 'blue'],
[['pets', 'mammals', 'dog'], 'rufus'],
[['pets', 'mammals', 'cat'], 'ludmilla']],
'')

YAML is a simple and nice solution. There is a YAML library in Python:
import yaml
output = {'a':1,'b':{'c':output = {'a':1,'b':{'c':[2,3,4]}}}}
print yaml.dump(output,default_flow_style=False)
Giving as a result:
a: 1
b:
c:
- 2
- 3
- 4
You can also parse from string and so. Just explore it and check if it fits your requeriments.
Good luck!

PyYAML and unusual tags

I am working on a project that uses the Unity3D game engine. For some of the pipeline requirements, it is best to be able to update some files from external tools using Python. Unity's meta and anim files are in YAML so I thought this would be strait forward enough using PyYAML.
The problem is that Unity's format uses custom attributes and I am not sure how to work with them as all the examples show more common tags used by Python and Ruby.
Here is what the top lines of a file look like:
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}
...
When I try to read the file I get this error:
could not determine a constructor for the tag 'tag:unity3d.com,2011:74'
Now after looking at all the other questions asked, this tag scheme does not seem to resemble those questions and answers. For example this file uses "!u!" which I was unable to figure out what it means or how something similar would behave (my wild uneducated guess says it looks like an alias or namespace).
I can do a hack way and strip the tags out but that is not the ideal way to try to do this. I am looking for help on a solution that will properly handle the tags and allow me to parse & encode the data in a way that preserves the proper format.
Thanks,
-R

I also had this problem, and the internet was not very helpful. After bashing my head against this problem for 3 days, I was able to sort it out...or at least get a working solution. If anyone wants to add more info, please do. But here's what I got.
1) The documentation on Unity's YAML file format(they call it a "textual scene file" because it contains text that is human readable) - http://docs.unity3d.com/Manual/TextualSceneFormat.html
It is a YAML 1.1 compliant format. So you should be able to use PyYAML or any other Python YAML library to load up a YAML object.
Okay, great. But it doesn't work. Every YAML library has issues with this file.
2) The file is not correctly formed. It turns out, the Unity file has some syntactical issues that make YAML parsers error out on it. Specifically:
2a) At the top, it uses a %TAG directive to create an alias for the string "unity3d.com,2011". It looks like:
%TAG !u! tag:unity3d.com,2011:
What this means is anywhere you see "!u!", replace it with "tag:unity3d.com,2011".
2b) Then it goes on to use "!u!" all over the place before each object stream. But the problem is that - to be YAML 1.1 compliant - it should actually declare a tag alias for each stream (any time a new object starts with "--- "). Declaring it once at the top and never again is only valid for the first stream, and the next stream knows nothing about "!u!", so it errors out.
Also, this tag is useless. It basically appends "tag:unity3d.com,2011" to each entry in the stream. Which we don't care about. We already know it's a Unity YAML file. Why clutter the data?
3) The object types are given by Unity's Class ID. Here is the documentation on that:
http://docs.unity3d.com/Manual/ClassIDReference.html
Basically, each stream is defined as a new class of object...corresponding to the IDs in that link. So a "GameObject" is "1", etc. The line looks like this:
--- !u!1 &100000
So the "--- " defines a new stream. The "!u!" is an alias for "tag:unity3d.com,2011" and the "&100000" is the file ID for this object (inside this file, if something references this object, it uses this ID....remember YAML is a node-based representation, so that ID is used to denote a node connection).
The next line is the root of the YAML object, which happens to be the name of the Unity Class...example "GameObject". So it turns out we don't actually need to translate from Class ID to Human Readable node type. It's right there. If you ever need to use it, just take the root node. And if you need to construct a YAML object for Unity, just keep a dictionary around based on that documentation link to translate "GameObject" to "1", etc.
The other problem is that most YAML parsers (PyYAML is the one I tested) only support 3 types of YAML objects out of the box:
Scalar
Sequence
Mapping
You can define/extend custom nodes. But this amounts to hand writing your own YAML parser because you have to define EXPLICITLY how each YAML constructor is created, and outputs. Why would I use a Library like PyYAML, then go ahead and write my own parser to read these custom nodes? The whole point of using a library is to leverage previous work and get all that functionality from day one. I spent 2 days trying to make a new constructor for each class ID in unity. It never worked, and I got into the weeds trying to build the constructors correctly.
THE GOOD NEWS/SOLUTION:
Turns out, all the Unity nodes I've ever run into so far are basic "Mapping" nodes in YAML. So you can throw away the custom node mapping and just let PyYAML auto-detect the node type. From there, everything works great!
In PyYAML, you can pass a file object, or a string. So, my solution was to write a simple 5 line pre-parser to strip out the bits that confuse PyYAML(the bits that Unity incorrectly syntaxed) and feed this new string to PyYAML.
1) Remove line 2 entirely, or just ignore it:
%TAG !u! tag:unity3d.com,2011:
We don't care. We know it's a unity file. And the tag does nothing for us.
2) For each stream declaration, remove the tag alias ("!u!") and remove the class ID. Leave the fileID. Let PyYAML auto-detect the node as a Mapping node.
--- !u!1 &100000
becomes...
--- &100000
3) The rest, output as is.
The code for the pre-parser looks like this:
def removeUnityTagAlias(filepath):
"""
Name: removeUnityTagAlias()
Description: Loads a file object from a Unity textual scene file, which is in a pseudo YAML style, and strips the
parts that are not YAML 1.1 compliant. Then returns a string as a stream, which can be passed to PyYAML.
Essentially removes the "!u!" tag directive, class type and the "&" file ID directive. PyYAML seems to handle
rest just fine after that.
Returns: String (YAML stream as string)
"""
result = str()
sourceFile = open(filepath, 'r')
for lineNumber,line in enumerate( sourceFile.readlines() ):
if line.startswith('--- !u!'):
result += '--- ' + line.split(' ')[2] + '\n' # remove the tag, but keep file ID
else:
# Just copy the contents...
result += line
sourceFile.close()
return result
To create a PyYAML object from a Unity textual scene file, call your pre-parser function on the file:
import yaml
# This fixes Unity's YAML %TAG alias issue.
fileToLoad = '/Users/vlad.dumitrascu/<SOME_PROJECT>/Client/Assets/Gear/MeleeWeapons/SomeAsset_test.prefab'
UnityStreamNoTags = removeUnityTagAlias(fileToLoad)
ListOfNodes = list()
for data in yaml.load_all(UnityStreamNoTags):
ListOfNodes.append( data )
# Example, print each object's name and type
for node in ListOfNodes:
if 'm_Name' in node[ node.keys()[0] ]:
print( 'Name: ' + node[ node.keys()[0] ]['m_Name'] + ' NodeType: ' + node.keys()[0] )
else:
print( 'Name: ' + 'No Name Attribute' + ' NodeType: ' + node.keys()[0] )
Hope that helps!
-Vlad
PS. To Answer the next issue in making this usable:
You also need to walk the entire project directory and parse all ".meta" files for the "GUID", which is Unity's inter-file reference. So, when you see a reference in a Unity YAML file for something like:
m_Materials:
- {fileID: 2100000, guid: 4b191c3a6f88640689fc5ea3ec5bf3a3, type: 2}
That file is somewhere else. And you can re-cursively open that one to find out any dependencies.
I just ripped through the game project and saved a dictionary of GUID:Filepath Key:Value pairs which I can match against.

Writing a compiler for a DSL in python

I am writing a game in python and have decided to create a DSL for the map data files. I know I could write my own parser with regex, but I am wondering if there are existing python tools which can do this more easily, like re2c which is used in the PHP engine.
Some extra info:
Yes, I do need a DSL, and even if I didn't I still want the experience of building and using one in a project.
The DSL contains only data (declarative?), it doesn't get "executed". Most lines look like:
SOMETHING: !abc #123 #xyz/123
I just need to read the tree of data.

I've always been impressed by pyparsing. The author, Paul McGuire, is active on the python list/comp.lang.python and has always been very helpful with any queries concerning it.

Here's an approach that works really well.
abc= ONETHING( ... )
xyz= ANOTHERTHING( ... )
pqr= SOMETHING( this=abc, that=123, more=(xyz,123) )
Declarative. Easy-to-parse.
And...
It's actually Python. A few class declarations and the work is done. The DSL is actually class declarations.
What's important is that a DSL merely creates objects. When you define a DSL, first you have to start with an object model. Later, you put some syntax around that object model. You don't start with syntax, you start with the model.

Yes, there are many -- too many -- parsing tools, but none in the standard library.
From what what I saw PLY and SPARK are popular. PLY is like yacc, but you do everything in Python because you write your grammar in docstrings.
Personally, I like the concept of parser combinators (taken from functional programming), and I quite like pyparsing: you write your grammar and actions directly in python and it is easy to start with. I ended up producing my own tree node types with actions though, instead of using their default ParserElement type.
Otherwise, you can also use existing declarative language like YAML.

I have written something like this in work to read in SNMP notification definitions and automatically generate Java classes and SNMP MIB files from this. Using this little DSL, I could write 20 lines of my specification and it would generate roughly 80 lines of Java code and a 100 line MIB file.
To implement this, I actually just used straight Python string handling (split(), slicing etc) to parse the file. I find Pythons string capabilities to be adequate for most of my (simple) parsing needs.
Besides the libraries mentioned by others, if I were writing something more complex and needed proper parsing capabilities, I would probably use ANTLR, which supports Python (and other languages).

For "small languages" as the one you are describing, I use a simple split, shlex (mind that the # defines a comment) or regular expressions.
>>> line = 'SOMETHING: !abc #123 #xyz/123'
>>> line.split()
['SOMETHING:', '!abc', '#123', '#xyz/123']
>>> import shlex
>>> list(shlex.shlex(line))
['SOMETHING', ':', '!', 'abc', '#', '123']
The following is an example, as I do not know exactly what you are looking for.
>>> import re
>>> result = re.match(r'([A-Z]*): !([a-z]*) #([0-9]*) #([a-z0-9/]*)', line)
>>> result.groups()
('SOMETHING', 'abc', '123', 'xyz/123')

DSLs are a good thing, so you don't need to defend yourself :-)
However, have you considered an internal DSL ? These have so many pros versus external (parsed) DSLs that they're at least worth consideration. Mixing a DSL with the power of the native language really solves lots of the problems for you, and Python is not really bad at internal DSLs, with the with statement handy.

On the lines of declarative python, I wrote a helper module called 'bpyml' which lets you declare data in python in a more XML structured way without the verbose tags, it can be converted to/from XML too, but is valid python.
https://svn.blender.org/svnroot/bf-blender/trunk/blender/release/scripts/modules/bpyml.py
Example Use
http://wiki.blender.org/index.php/User:Ideasman42#Declarative_UI_In_Blender

Here is a simpler approach to solve it
What if I can extend python syntax with new operators to introduce new functionally to the language? For example, a new operator <=> for swapping the value of two variables.
How can I implement such behavior? Here comes AST module.
The last module is a handy tool for handling abstract syntax trees. What’s cool about this module is it allows me to write python code that generates a tree and then compiles it to python code.
Let’s say we want to compile a superset language (or python-like language) to python:
from :
a <=> b
to:
a , b = b , a
I need to convert my 'python like' source code into a list of tokens.
So I need a tokenizer, a lexical scanner for Python source code. Tokenize module
I may use the same meta-language to define both the grammar of new 'python-like' language and then build the structure of the abstract syntax tree AST
Why use AST?
AST is a much safer choice when evaluating untrusted code
manipulate the tree before executing the code Working on the Tree
from tokenize import untokenize, tokenize, NUMBER, STRING, NAME, OP, COMMA
import io
import ast
s = b"a <=> b\n" # i may read it from file
b = io.BytesIO(s)
g = tokenize(b.readline)
result = []
for token_num, token_val, _, _, _ in g:
# naive simple approach to compile a<=>b to a,b = b,a
if token_num == OP and token_val == '<=' and next(g).string == '>':
first = result.pop()
next_token = next(g)
second = (NAME, next_token.string)
result.extend([
first,
(COMMA, ','),
second,
(OP, '='),
second,
(COMMA, ','),
first,
])
else:
result.append((token_num, token_val))
src = untokenize(result).decode('utf-8')
exp = ast.parse(src)
code = compile(exp, filename='', mode='exec')
def my_swap(a, b):
global code
env = {
"a": a,
"b": b
}
exec(code, env)
return env['a'], env['b']
print(my_swap(1,10))
Other modules using AST, whose source code may be a useful reference:
textX-LS: A DSL used to describe a collection of shapes and draw it for us.
pony orm: You can write database queries using Python generators and lambdas with translate to SQL query sting—pony orm use AST under the hood
osso: Role Based Access Control a framework handle permissions.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Introducing a YAML string in a python script - python

Related

How to update YAML file without loss of comments and formatting / YAML automatic refactoring in Python

Exclude header comments when generating documentation with sphinx

Python - any property file or data format that is mostly free-form?

PyYAML and unusual tags

Writing a compiler for a DSL in python

Categories

Resources