PyYAML and unusual tags

PyYAML and unusual tags - python

I am working on a project that uses the Unity3D game engine. For some of the pipeline requirements, it is best to be able to update some files from external tools using Python. Unity's meta and anim files are in YAML so I thought this would be strait forward enough using PyYAML.
The problem is that Unity's format uses custom attributes and I am not sure how to work with them as all the examples show more common tags used by Python and Ruby.
Here is what the top lines of a file look like:
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}
...
When I try to read the file I get this error:
could not determine a constructor for the tag 'tag:unity3d.com,2011:74'
Now after looking at all the other questions asked, this tag scheme does not seem to resemble those questions and answers. For example this file uses "!u!" which I was unable to figure out what it means or how something similar would behave (my wild uneducated guess says it looks like an alias or namespace).
I can do a hack way and strip the tags out but that is not the ideal way to try to do this. I am looking for help on a solution that will properly handle the tags and allow me to parse & encode the data in a way that preserves the proper format.
Thanks,
-R

I also had this problem, and the internet was not very helpful. After bashing my head against this problem for 3 days, I was able to sort it out...or at least get a working solution. If anyone wants to add more info, please do. But here's what I got.
1) The documentation on Unity's YAML file format(they call it a "textual scene file" because it contains text that is human readable) - http://docs.unity3d.com/Manual/TextualSceneFormat.html
It is a YAML 1.1 compliant format. So you should be able to use PyYAML or any other Python YAML library to load up a YAML object.
Okay, great. But it doesn't work. Every YAML library has issues with this file.
2) The file is not correctly formed. It turns out, the Unity file has some syntactical issues that make YAML parsers error out on it. Specifically:
2a) At the top, it uses a %TAG directive to create an alias for the string "unity3d.com,2011". It looks like:
%TAG !u! tag:unity3d.com,2011:
What this means is anywhere you see "!u!", replace it with "tag:unity3d.com,2011".
2b) Then it goes on to use "!u!" all over the place before each object stream. But the problem is that - to be YAML 1.1 compliant - it should actually declare a tag alias for each stream (any time a new object starts with "--- "). Declaring it once at the top and never again is only valid for the first stream, and the next stream knows nothing about "!u!", so it errors out.
Also, this tag is useless. It basically appends "tag:unity3d.com,2011" to each entry in the stream. Which we don't care about. We already know it's a Unity YAML file. Why clutter the data?
3) The object types are given by Unity's Class ID. Here is the documentation on that:
http://docs.unity3d.com/Manual/ClassIDReference.html
Basically, each stream is defined as a new class of object...corresponding to the IDs in that link. So a "GameObject" is "1", etc. The line looks like this:
--- !u!1 &100000
So the "--- " defines a new stream. The "!u!" is an alias for "tag:unity3d.com,2011" and the "&100000" is the file ID for this object (inside this file, if something references this object, it uses this ID....remember YAML is a node-based representation, so that ID is used to denote a node connection).
The next line is the root of the YAML object, which happens to be the name of the Unity Class...example "GameObject". So it turns out we don't actually need to translate from Class ID to Human Readable node type. It's right there. If you ever need to use it, just take the root node. And if you need to construct a YAML object for Unity, just keep a dictionary around based on that documentation link to translate "GameObject" to "1", etc.
The other problem is that most YAML parsers (PyYAML is the one I tested) only support 3 types of YAML objects out of the box:
Scalar
Sequence
Mapping
You can define/extend custom nodes. But this amounts to hand writing your own YAML parser because you have to define EXPLICITLY how each YAML constructor is created, and outputs. Why would I use a Library like PyYAML, then go ahead and write my own parser to read these custom nodes? The whole point of using a library is to leverage previous work and get all that functionality from day one. I spent 2 days trying to make a new constructor for each class ID in unity. It never worked, and I got into the weeds trying to build the constructors correctly.
THE GOOD NEWS/SOLUTION:
Turns out, all the Unity nodes I've ever run into so far are basic "Mapping" nodes in YAML. So you can throw away the custom node mapping and just let PyYAML auto-detect the node type. From there, everything works great!
In PyYAML, you can pass a file object, or a string. So, my solution was to write a simple 5 line pre-parser to strip out the bits that confuse PyYAML(the bits that Unity incorrectly syntaxed) and feed this new string to PyYAML.
1) Remove line 2 entirely, or just ignore it:
%TAG !u! tag:unity3d.com,2011:
We don't care. We know it's a unity file. And the tag does nothing for us.
2) For each stream declaration, remove the tag alias ("!u!") and remove the class ID. Leave the fileID. Let PyYAML auto-detect the node as a Mapping node.
--- !u!1 &100000
becomes...
--- &100000
3) The rest, output as is.
The code for the pre-parser looks like this:
def removeUnityTagAlias(filepath):
"""
Name: removeUnityTagAlias()
Description: Loads a file object from a Unity textual scene file, which is in a pseudo YAML style, and strips the
parts that are not YAML 1.1 compliant. Then returns a string as a stream, which can be passed to PyYAML.
Essentially removes the "!u!" tag directive, class type and the "&" file ID directive. PyYAML seems to handle
rest just fine after that.
Returns: String (YAML stream as string)
"""
result = str()
sourceFile = open(filepath, 'r')
for lineNumber,line in enumerate( sourceFile.readlines() ):
if line.startswith('--- !u!'):
result += '--- ' + line.split(' ')[2] + '\n' # remove the tag, but keep file ID
else:
# Just copy the contents...
result += line
sourceFile.close()
return result
To create a PyYAML object from a Unity textual scene file, call your pre-parser function on the file:
import yaml
# This fixes Unity's YAML %TAG alias issue.
fileToLoad = '/Users/vlad.dumitrascu/<SOME_PROJECT>/Client/Assets/Gear/MeleeWeapons/SomeAsset_test.prefab'
UnityStreamNoTags = removeUnityTagAlias(fileToLoad)
ListOfNodes = list()
for data in yaml.load_all(UnityStreamNoTags):
ListOfNodes.append( data )
# Example, print each object's name and type
for node in ListOfNodes:
if 'm_Name' in node[ node.keys()[0] ]:
print( 'Name: ' + node[ node.keys()[0] ]['m_Name'] + ' NodeType: ' + node.keys()[0] )
else:
print( 'Name: ' + 'No Name Attribute' + ' NodeType: ' + node.keys()[0] )
Hope that helps!
-Vlad
PS. To Answer the next issue in making this usable:
You also need to walk the entire project directory and parse all ".meta" files for the "GUID", which is Unity's inter-file reference. So, when you see a reference in a Unity YAML file for something like:
m_Materials:
- {fileID: 2100000, guid: 4b191c3a6f88640689fc5ea3ec5bf3a3, type: 2}
That file is somewhere else. And you can re-cursively open that one to find out any dependencies.
I just ripped through the game project and saved a dictionary of GUID:Filepath Key:Value pairs which I can match against.

Related

Python - Import formatted lines as indexed list objects

I am writing a minor OP5 plugin in Python 2.7 (version is out of my hands) that iterates over a multidimensional list that verifies fallback zip downloads have gone as they should.
Up until now I have put each host with their IP address in a multidimensional list looking like (cut short for brevity):
fallback = [
["host1", "192.168.1.3"],
["host2", "192.168.15.59"]
]
...and so on.
This lets me iterate through fallback[i] and use that along with fallback[i][1] for the IP address, the rest of the script uses both of these informations for various tasks and string manipulations. The script as it is now is mechanically sound but relies on availability of these indexes.
There is however a hidden file (.fallbackinfo) containing the same information for another script but it is written for perl, same as the script that uses that file as a source.
The file looks like this:
#hosts = (
["host1", "192.168.1.3", "type of firmware", "subfolder"],
["host2", "192.168.15.59", "type of firmware", "subfolder"],
);
I wish to import this into an iterable multidimensional list in my Python script, but am getting incredibly stuck.
My current attempt is the closest I have gotten:
with open("/home/runninguser/.fallbackinfo") as f:
lines = []
for line in f:
lines.append(line.rstrip().strip())
fallback = lines[1:len(lines)-1]
This has successfully made the list look as I want it, but all lines get imported as str objects. I have attempted to use list() to force the object to become a list but most of the time, that makes each character in the lines to become a list object instead. The network in question is cut off from internet access so I have to rely on built-in modules. My interpretation is that since it is formatted as a list, it should somehow be able to be interpreted as a list.
Can this be done at all, and if so, how?

You can use the json package (built-in) to achieve this:
import json
with open("/home/runninguser/.fallbackinfo") as f:
# For each line
for line in f:
# If the line starts with a bracket
if line.strip()[0] == "[":
# Print the line after removing spaces in front and the comma in the back
# and converting it into a list
print(json.loads(line.strip().rstrip(",")))
If you now use the type() function, you will see the list-formatted strings are now <class 'list'>

Python - any property file or data format that is mostly free-form?

I'm about to roll my own property file parser. I've got a somewhat odd requirement where I need to be able to store metadata in an existing field of a GUI. The data needs to be easily parse-able and human readable, preferably with some flexibility in defining the data (no yaml for example).
I was thinking I could do something like this:
this is random text that is truly a description
.metadata.
owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla
I was thinking I could use something like '.metadata.' to denote that anything below that line is metadata to be parsed. Then, I would treat the properties almost like java properties where I would read each line in and build a map (or object) to hold the metadata, which would then be outputted and searchable via a simple web app.
My real question before I roll this on my own, is can anyone suggest a better method for solving this problem? A specific data format or library that would fit this use case? I would normally use something like yaml or the like, but there's no good way for me to validate that the data is indeed in yaml format when it is saved.

You have 3 problems:
How to fit two different things into one box.
If you are mixing free form text and something that is more tightly defined, you are always going to end up with stuff that you can't parse. Then you will have a never ending battle of trying to deal with the rubbish that gets put in. Is there really no other way?
How to define a simple format for metadata that is robust enough for simple use.
This is a hard problem - all attempts to do so seem to expand until they become quite complicated (e.g. YAML). You will probably have custom requirements for your domain, so what you've proposed may be best.
How to parse that format.
For this I would recommend parsy.
It would be quite simple to split the text on .metadata. and then parse what remains.
Here is an example using parsy:
from parsy import *
attribute = letter.at_least(1).concat()
name = attribute.sep_by(string("."))
value = regex(r"[^\n]+")
definition = seq(name << string(":") << string(" ").many(), value)
metadata = definition.sep_by(string("\n"))
Example usage:
>>> metadata.parse_partial("""owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla""")
([[['owner', 'first'], 'rick'],
[['owner', 'second'], 'bob'],
[['property'], 'blue'],
[['pets', 'mammals', 'dog'], 'rufus'],
[['pets', 'mammals', 'cat'], 'ludmilla']],
'')

YAML is a simple and nice solution. There is a YAML library in Python:
import yaml
output = {'a':1,'b':{'c':output = {'a':1,'b':{'c':[2,3,4]}}}}
print yaml.dump(output,default_flow_style=False)
Giving as a result:
a: 1
b:
c:
- 2
- 3
- 4
You can also parse from string and so. Just explore it and check if it fits your requeriments.
Good luck!

Introducing a YAML string in a python script

I am working on a python code able to read a YAML file and generate a rule-based model in PySB.
A new rule in the YAML file is specified like:
--- !rule
name: L_binds_R
reaction:
L(unbound) + R(inactive) >> L(bound)%R(active)
rates:
- Kf
With this I create a pyyaml object (pyyaml is a package to work with yaml in python) in python and the reaction attribute is stored as a string.
Then, the rule in pysb requires to be specified as:
# Rule(name, reaction, constant)
Rule('L_binds_R', L(unbound) + R(inactive) >> L(bound)%R(active), kf)
My problem relies in the fact that the 'reaction' field in yaml is stored as string in the python object but pysb does not accept any other format than plain text.
I have checked in PySB and the reaction field cannot be a string in any case and I did not find how to scape the formating of variables in YAML.
Any idea to fix the problem?

You could approach this one of two ways: restructuring your YAML find to tokenise the reaction rules, or using eval in Python.
Tokenised reaction rules
The best approach would be to structure your YAML file such that your reaction rule is already specified in individual tokens, rather than just one field for the whole reaction, e.g.
--- rule!
name: L_binds_R
reaction:
reactant:
name: L
site: b
reactant:
name: R
site: b
state: inactive
product:
name: L
site: b
bond: 1
product:
name: R
site: b
bond: 1
state: active
fwd_rate: kf
You could then write a parser to translate this into the following PySB rule, building the ReactionPattern using the classes in PySB core (MonomerPattern, ComplexPattern and so on):
Rule(‘L_binds_R’, L(b=None) + R(b='inactive') >> L(b=1) % R(b=(‘active’, 1)), kf)
If you have control over the code where the YAML is coming from, you might find it easier to either output PySB code directly, or perhaps write to a standard like SBML, which PySB can now read.
You might find it helpful to look at the PySB BioNetGen language (BNGL) parser I wrote, which creates a PySB model from a BioNetGen XML file, as an example of how to create a model from an external file.
Using eval
The alternative is to use eval. While this is the easier solution, it is strongly discouraged for security reasons*. However if the YAML files are all generated by you/your own code and you just want a quick fix, this would do it.
Here’s an example:
# You would read these in from the YAML file, but I’ll just define
# the strings here for simplicity
reaction_name = "L_binds_R"
reaction_str = "L(b=None) + R(b='inactive') >> L(b=1) % R(b=('active', 1))"
reaction_fwd_rate = "Kf"
Rule(reaction_name, eval(reaction_str), eval(reaction_fwd_rate))
# Python output
# (assumes Monomers L and R and parameter Kf are already defined):
# >>> Rule('L_binds_R', L(b=None) + R(b='inactive') >> L(b=1) % R(b=('active', 1)), Kf)
*Consider the case where your YAML contained something like:
reaction:
import shutil; shutil.rmtree('~')
Importing that YAML file and evaling that field would delete your home directory! eval will execute any arbitrary Python code by definition. It should only be used where the source file is completely trusted. In general you should always "sanitise your inputs" (assume inputs are dangerous until proven otherwise).

Python 3.3 library abpy, file undefined

I'm using a library ABPY (library here) for python but it is in older version i think. I'm using Python 3.3.
I did fix some PRINT errors, but that's how much i know, I'm really new on programing.
I want to fetch some webpage and filter it from advertising and then print it again.
EDITED after Sg'te'gmuj told me how to convert from python 2.x to 3.x this is my new code:
#!/usr/local/bin/python3.1
import cgitb;cgitb.enable()
import urllib.request
response = urllib.request.build_opener()
response.addheaders = [('User-agent', 'Mozilla/5.0')]
response = urllib.request.urlopen("http://www.youtube.com")
html = response.read()
from abpy import Filter
with open("easylist.txt") as f:
ABPFilter = Filter(file('easylist.txt'))
ABPFilter.match(html)
print("Content-type: text/html")
print()
print (html)
Now it is displaying a blank page

Just took a peek at the library, it seems that the file "easylist.txt" does not exist; you need to create the file, and populate it with the appropriate filters (in whatever format ABP specifies).
Additionally, it appears it takes a file object; try something like this instead:
with open("easylist.txt") as f:
ABPFilter = Filter(f)
I can't say this is wholly accurate though since I have no experience with the library, but looking at it's code I'd suspect either of the two are the problem, if not both.
Addendum #1
Looking at the code more in-depth, I have to agree that even if that fix I supplied does work, you're going to have more problems (it's in 2.x as you suggested, when you're using 3.x). I'd suggest utilizing Python's 2to3 function, to convert from typical Python 2 to Python 3 code (it's not foolproof though). The command line would be as so:
2to3 -w abpy.py
That will convert it from Python 2.x to 3.x code, and re-write the source file.
Addendum #2
The code to pass the file object should be the "f" variable, as shown above (modified to represent that; I wasn't paying attention and just left the old file function call in the argument).
You need to pass a URI to the function as well:
ABPFilter.match(URI)
You'll need to modify the code to pass those items into an array (I'm assuming at least); I'm playing with it now to see. At present I'm getting a rule error (not a Python error; but merely error handling used by abpy.py, which is good because it suggests that it's the right train of thought).
The code for the Filter.match function is as following (after using the 2to3 Python script):
def match(self, url, elementtype=None):
tokens = RE_TOK.split(url)
print(tokens)
for tok in tokens:
if len(tok) > 2:
if tok in self.index:
for rule in self.index[tok]:
if rule.match(url, elementtype=elementtype):
print(str(rule))
What this means is you're, at present, at a point where you need to program the functionality; it appears this module only indicates the rule. However, that is still useful.
What this means is that you're going to have to modify this function to take the HTML, in place of the the "url" parameter. You're going to regex the HTML (this may be rather intensive) for a list of URIs and then run each item through the match loop Where you go from there to actually filter the nodes, I'm not sure; but there is a list of filter types, so I'm assuming there is a typical procedural ABP does to remove the nodes (possibly, in some cases merely by removing the given URI from the HTML?)
References
http://docs.python.org/3.3/library/2to3.html

How can I get a list of all the strings that Babel knows about?

I have an application in which the main strings are in English and then various translations are made in various .po/.mo files, as usual (using Flask and Flask-Babel). Is it possible to get a list of all the English strings somewhere within my Python code? Specifically, I'd like to have an admin interface on the website which lets someone log in and choose an arbitrary phrase to be used in a certain place without having to poke at actual Python code or .po/.mo files. This phrase might change over time but needs to be translated, so it needs to be something Babel knows about.
I do have access to the actual .pot file, so I could just parse that, but I was hoping for a cleaner method if possible.

You can use polib for this.
This section of the documentation shows examples of how to iterate over the contents of a .po file. Here is one taken from that page:
import polib
po = polib.pofile('path/to/catalog.po')
for entry in po:
print entry.msgid, entry.msgstr

If you alredy use babel you can get all items from po file:
from babel.messages.pofile import read_po
catalog = read_po(open(full_file_name))
for message in catalog:
print message.id, message.string
See http://babel.edgewall.org/browser/trunk/babel/messages/pofile.py.
You alredy can try get items from mo file:
from babel.messages.mofile import read_mo
catalog = read_po(open(full_file_name))
for message in catalog:
print message.id, message.string
But when I try use it last time it's not was availible. See http://babel.edgewall.org/browser/trunk/babel/messages/mofile.py.
You can use polib as #Miguel wrote.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

PyYAML and unusual tags - python

Related

Python - Import formatted lines as indexed list objects

Python - any property file or data format that is mostly free-form?

Introducing a YAML string in a python script

Python 3.3 library abpy, file undefined

How can I get a list of all the strings that Babel knows about?

Categories

Resources