Python - Import formatted lines as indexed list objects - python

I am writing a minor OP5 plugin in Python 2.7 (version is out of my hands) that iterates over a multidimensional list that verifies fallback zip downloads have gone as they should.
Up until now I have put each host with their IP address in a multidimensional list looking like (cut short for brevity):
fallback = [
["host1", "192.168.1.3"],
["host2", "192.168.15.59"]
]
...and so on.
This lets me iterate through fallback[i] and use that along with fallback[i][1] for the IP address, the rest of the script uses both of these informations for various tasks and string manipulations. The script as it is now is mechanically sound but relies on availability of these indexes.
There is however a hidden file (.fallbackinfo) containing the same information for another script but it is written for perl, same as the script that uses that file as a source.
The file looks like this:
#hosts = (
["host1", "192.168.1.3", "type of firmware", "subfolder"],
["host2", "192.168.15.59", "type of firmware", "subfolder"],
);
I wish to import this into an iterable multidimensional list in my Python script, but am getting incredibly stuck.
My current attempt is the closest I have gotten:
with open("/home/runninguser/.fallbackinfo") as f:
lines = []
for line in f:
lines.append(line.rstrip().strip())
fallback = lines[1:len(lines)-1]
This has successfully made the list look as I want it, but all lines get imported as str objects. I have attempted to use list() to force the object to become a list but most of the time, that makes each character in the lines to become a list object instead. The network in question is cut off from internet access so I have to rely on built-in modules. My interpretation is that since it is formatted as a list, it should somehow be able to be interpreted as a list.
Can this be done at all, and if so, how?

You can use the json package (built-in) to achieve this:
import json
with open("/home/runninguser/.fallbackinfo") as f:
# For each line
for line in f:
# If the line starts with a bracket
if line.strip()[0] == "[":
# Print the line after removing spaces in front and the comma in the back
# and converting it into a list
print(json.loads(line.strip().rstrip(",")))
If you now use the type() function, you will see the list-formatted strings are now <class 'list'>

Related

Key 'boot_num' is not recognized when being interpreted from a .JSON file

Currently, I am working on a Boot Sequence in Python for a larger project. For this specific part of the sequence, I need to access a .JSON file (specs.json), establish it as a dictionary in the main program. I then need to take a value from the .JSON file, and add 1 to it, using it's key to find the value. Once that's done, I need to push the changes to the .JSON file. Yet, every time I run the code below, I get the error:
bootNum = spcInfDat['boot_num']
KeyError: 'boot_num'`
Here's the code I currently have:
(Note: I'm using the Python json library, and have imported dumps, dump, and load.)
# Opening of the JSON files
spcInf = open('mki/data/json/specs.json',) # .JSON file that contains the current system's specifications. Not quite needed, but it may make a nice reference?
spcInfDat = load(spcInf)
This code is later followed by this, where I attempt to assign the value to a variable by using it's dictionary key (The for statement was a debug statement, so I could visibly see the Key):
for i in spcInfDat['spec']:
print(CBL + str(i) + CEN)
# Loacting and increasing the value of bootNum.
bootNum = spcInfDat['boot_num']
print(str(bootNum))
bootNum = bootNum + 1
(Another Note: CBL and CEN are just variables I use to colour text I send to the terminal.)
This is the interior of specs.json:
{
"spec": [
{
"os":"name",
"os_type":"getwindowsversion",
"lang":"en",
"cpu_amt":"cpu_count",
"storage_amt":"unk",
"boot_num":1
}
]
}
I'm relatively new with .JSON files, as well as using the Python json library; I only have experience with them through some GeeksforGeeks tutorials I found. There is a rather good chance that I just don't know how .JSON files work in conjunction with the library, but I figure that it would still be worth a shot to check here. The GeeksForGeeks tutorial had no documentation about this, as well as there being minimal I know about how this works, so I'm lost. I've tried searching here, and have found nothing.
Issue Number 2
Now, the prior part works. But, when I attempt to run the code on the following lines:
# Changing the values of specDict.
print(CBL + "Changing values of specDict... 50%" + CEN)
specDict ={
"os":name,
"os_type":ost,
"lang":"en",
"cpu_amt":cr,
"storage_amt":"unk",
"boot_num":bootNum
}
# Writing the product of makeSpec to `specs.json`.
print(CBL + "Writing makeSpec() result to `specs.json`... 75%" + CEN)
jsonobj = dumps(specDict, indent = 4)
with open('mki/data/json/specs.json', "w") as outfile:
dump(jsonobj, outfile)
I get the error:
TypeError: Object of type builtin_function_or_method is not JSON serializable.
Is there a chance that I set up my dictionary incorrectly, or am I using the dump function incorrectly?
You can show the data using:
print(spcInfData)
This shows it to be a dictionary, whose single entry 'spec' has an array, whose zero'th element is a sub-dictionary, whose 'boot_num' entry is an integer.
{'spec': [{'os': 'name', 'os_type': 'getwindowsversion', 'lang': 'en', 'cpu_amt': 'cpu_count', 'storage_amt': 'unk', 'boot_num': 1}]}
So what you are looking for is
boot_num = spcInfData['spec'][0]['boot_num']
and note that the value obtained this way is already an integer. str() is not necessary.
It's also good practice to guard against file format errors so the program handles them gracefully.
try:
boot_num = spcInfData['spec'][0]['boot_num']
except (KeyError, IndexError):
print('Database is corrupt')
Issue Number 2
"Not serializable" means there is something somewhere in your data structure that is not an accepted type and can't be converted to a JSON string.
json.dump() only processes certain types such as strings, dictionaries, and integers. That includes all of the objects that are nested within sub-dictionaries, sub-arrays, etc. See documentation for json.JSONEncoder for a complete list of allowable types.

python: list element in CSV file

I have a csv file of such structure:
Id,Country,Cities
1,Canada,"['Toronto','Ottawa','Montreal']"
2,Italy,"['Rome','Milan','Naples', 'Palermo']"
3,France,"['Paris','Cannes','Lyon']"
4,Spain,"['Seville','Alicante','Barcelona']"
The last column contains a list, but it is represented as a string so that it is treated as a single element. When parsing the file, I need to have this element as a list, not a string. So far I've found the way to convert it:
L = "['Toronto','Ottawa','Montreal']"
seq = ast.literal_eval(L)
Since I'm a newbie in python, my question is -- is this normal way of doing this, or there's a right way to represent lists in CSV so that I don't have to do conversions, or there's a simpler way to convert?
Thanks!
Using ast.literal_eval(...) will work, but it requires special syntax that other CSV-reading software won't recognize, and uses an eval statement which is a red flag.
Using eval can be dangerous, even though in this case you're using the safer literal_eval option which is more restrained than the raw eval function.
Usually what you'll see in CSV files that have many values in a single column is that they'll use a simple delimiter and quote the field.
For instance:
ID,Country,Cities
1,Canada,"Toronto;Ottawa;Montreal"
Then in python, or any other language, it becomes trivial to read without having to resort to eval:
import csv
with open("data.csv") as fobj:
reader = csv.reader(fobj)
field_names = next(reader)
rows = []
for row in reader:
row[-1] = row[-1].split(";")
rows.append(row)
Issues with ast.literal_eval
Even though the ast.literal_eval function is much safer than using a regular eval on user input, it still might be exploitable. The documentation for literal_eval has this warning:
Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.
A demonstration of this can be found here:
>>> import ast
>>> ast.literal_eval("()" * 10 ** 6)
[1] 48513 segmentation fault python
I'm definitely not an expert, but giving a user the ability to crash a program and potentially exploit some obscure memory vulnerability is bad, and in this use-case can be avoided.
If the reason you want to use literal_eval is to get proper typing, and you're positive that the input data is 100% trusted, then I suppose it's fine to use. But, you could always wrap the function to perform some sanity checks:
def sanely_eval(value: str, max_size: int = 100_000) -> object:
if len(value) > max_size:
raise ValueError(f"len(value) is greater than the max_size={max_size!r}")
return ast.literal_eval(value)
But, depending on how you're creating and using the CSV files, this may make the data less portable, since it's a python-specific format.
If you can control the CSV, you could separate the items with some other known character that isn't going to be in a city and isn't a comma. Say colon (:).
Then row one, for example, would look like this:
1,Canada,Toronto:Ottawa:Montreal
When it comes to processing the data, you'll have that whole element, and you can just do
cities.split(':')
If you want to go the other way (you have the cities in a Python list, and you want to create this string) you can use join()
':'.join(['Toronto', 'Ottawa', 'Montreal'])
For the specific structure of the csv, you could convert cities to list like this:
cities = '''"['Rome','Milan','Naples', 'Palermo']"'''
cities = cities[2:-2] # remove "[ and ]"
print(cities) # 'Rome','Milan','Naples', 'Palermo'
cities = cities.split(',') # convert to list
print(cities) # ["'Rome'", "'Milan'", "'Naples'", " 'Palermo'"]
cities = [x.strip() for x in cities] # remove leading or following spaces (if exists)
print(cities) # ["'Rome'", "'Milan'", "'Naples'", "'Palermo'"]
cities = [x[1:-1] for x in cities] # remove quotes '' from each city
print(cities) # ['Rome', 'Milan', 'Naples', 'Palermo']

Can I use asterisk * in config file variable name?

I am using Config Parser to specify a list of variables, and the values for those variables are then pulled from a larger file. The variables/lines in the larger file all look like this:
callCount.1.cell=2
callCount.2.cell=10
callCount.3.cell=12
Rather than listing all these variables specifically, would I be able to use an '*' as a wildcard character, in place of the number, like this:
[variablesToPull]
callCount.*.cell
I can't change the formatting of the larger file I'm pulling values from, and I don't always know what the numbers that are apart of the variables will be.
EDIT: I'm using Python 2.7 to do all my Config Parsing
After looking around for a while I don't think it's possible. I ended up using just the first part of the desired variable name in the config file
[variablesToPull]
callCount
Then I created a dictionary from the file I was storing the full list of variable names and values in (All the following code is for Python 2.7, it probably works for Python 3, but might need some syntax changes)
f = open(variable_names_and_values_file, 'r')
dict_of_vars= {}
for line in f:
k, v = line.strip().split('=')
dict_of_vars[k.strip()] = v.strip()
f.close()
Then I looped over this dictionary and created a new list of the specific variables to pull (like callCount.1.cell, callCount.2.cell, ect...)
new_variables= []
for var in partial_variable_names_from_config_file:
for data in dict_of_vars:
if var in data:
new_variables.append(data)
Initially I didn't want to do so much looping for fear of performance decrease (my file of variables has like 10000 lines), but it doesn't seem to have slowed down my script by a noticeable amount.
Hope this helps someone out there

Getting positions from Python lists to generate a dynamic range

I have a list being built in Python, using this code:
def return_hosts():
'return a list of host names'
with open('./tfhosts') as hosts:
return [host.split()[1].strip() for host in hosts]
The format of tfhosts is that of a hosts file, so what I am doing is taking the hostname portion and populating that into a template, so far this works.
What I am trying to do is make sure that even if more hosts are added they're put into a default section as the other host sections are fixed, this part however I would like to be dynamic, to do that I've got the following:
rendered_inventory = inventory_template.render({
'host_main': gethosts[0],
'host_master1': gethosts[1],
'host_master2': gethosts[2],
'host_spring': gethosts[3],
'host_default': gethosts[4:],
})
Everything is rendered properly except the last host under the host_default section, instead of getting a newline separated lists of hosts, like this (which is what I want):
[host_default]
dc01-worker-02
dc01-worker-03
It just write out the remaining hostnames in a single list, as (which I don't want):
[host_default]
['dc01-worker-02', 'dc01-worker-03']
I've tried to wrap the host default section and split it, but I get a runtime error if I try:
[gethosts[4:].split(",").strip()...
I believe gethosts[4:] returns a list, if gethosts is a list (which seems to be the case) , hence it is directly writing the list to your file.
Also, you cannot do .split() on a list (I guess you hoped to do .split on the string, but gethosts[4:] returns a list). I believe an easy way out for you would be to join the strings in the list using str.join with \n as the delimiter. Example -
rendered_inventory = inventory_template.render({
'host_main': gethosts[0],
'host_master1': gethosts[1],
'host_master2': gethosts[2],
'host_spring': gethosts[3],
'host_default': '\n'.join(gethosts[4:]),
})
Demo -
>>> lst = ['dc01-worker-02', 'dc01-worker-03']
>>> print('\n'.join(lst))
dc01-worker-02
dc01-worker-03
If you own the template, a cleaner approach would be to loop through the list for host_default and print each element in the template. Example you can try using a for loop construct in the jinja template.

PyYAML and unusual tags

I am working on a project that uses the Unity3D game engine. For some of the pipeline requirements, it is best to be able to update some files from external tools using Python. Unity's meta and anim files are in YAML so I thought this would be strait forward enough using PyYAML.
The problem is that Unity's format uses custom attributes and I am not sure how to work with them as all the examples show more common tags used by Python and Ruby.
Here is what the top lines of a file look like:
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}
...
When I try to read the file I get this error:
could not determine a constructor for the tag 'tag:unity3d.com,2011:74'
Now after looking at all the other questions asked, this tag scheme does not seem to resemble those questions and answers. For example this file uses "!u!" which I was unable to figure out what it means or how something similar would behave (my wild uneducated guess says it looks like an alias or namespace).
I can do a hack way and strip the tags out but that is not the ideal way to try to do this. I am looking for help on a solution that will properly handle the tags and allow me to parse & encode the data in a way that preserves the proper format.
Thanks,
-R
I also had this problem, and the internet was not very helpful. After bashing my head against this problem for 3 days, I was able to sort it out...or at least get a working solution. If anyone wants to add more info, please do. But here's what I got.
1) The documentation on Unity's YAML file format(they call it a "textual scene file" because it contains text that is human readable) - http://docs.unity3d.com/Manual/TextualSceneFormat.html
It is a YAML 1.1 compliant format. So you should be able to use PyYAML or any other Python YAML library to load up a YAML object.
Okay, great. But it doesn't work. Every YAML library has issues with this file.
2) The file is not correctly formed. It turns out, the Unity file has some syntactical issues that make YAML parsers error out on it. Specifically:
2a) At the top, it uses a %TAG directive to create an alias for the string "unity3d.com,2011". It looks like:
%TAG !u! tag:unity3d.com,2011:
What this means is anywhere you see "!u!", replace it with "tag:unity3d.com,2011".
2b) Then it goes on to use "!u!" all over the place before each object stream. But the problem is that - to be YAML 1.1 compliant - it should actually declare a tag alias for each stream (any time a new object starts with "--- "). Declaring it once at the top and never again is only valid for the first stream, and the next stream knows nothing about "!u!", so it errors out.
Also, this tag is useless. It basically appends "tag:unity3d.com,2011" to each entry in the stream. Which we don't care about. We already know it's a Unity YAML file. Why clutter the data?
3) The object types are given by Unity's Class ID. Here is the documentation on that:
http://docs.unity3d.com/Manual/ClassIDReference.html
Basically, each stream is defined as a new class of object...corresponding to the IDs in that link. So a "GameObject" is "1", etc. The line looks like this:
--- !u!1 &100000
So the "--- " defines a new stream. The "!u!" is an alias for "tag:unity3d.com,2011" and the "&100000" is the file ID for this object (inside this file, if something references this object, it uses this ID....remember YAML is a node-based representation, so that ID is used to denote a node connection).
The next line is the root of the YAML object, which happens to be the name of the Unity Class...example "GameObject". So it turns out we don't actually need to translate from Class ID to Human Readable node type. It's right there. If you ever need to use it, just take the root node. And if you need to construct a YAML object for Unity, just keep a dictionary around based on that documentation link to translate "GameObject" to "1", etc.
The other problem is that most YAML parsers (PyYAML is the one I tested) only support 3 types of YAML objects out of the box:
Scalar
Sequence
Mapping
You can define/extend custom nodes. But this amounts to hand writing your own YAML parser because you have to define EXPLICITLY how each YAML constructor is created, and outputs. Why would I use a Library like PyYAML, then go ahead and write my own parser to read these custom nodes? The whole point of using a library is to leverage previous work and get all that functionality from day one. I spent 2 days trying to make a new constructor for each class ID in unity. It never worked, and I got into the weeds trying to build the constructors correctly.
THE GOOD NEWS/SOLUTION:
Turns out, all the Unity nodes I've ever run into so far are basic "Mapping" nodes in YAML. So you can throw away the custom node mapping and just let PyYAML auto-detect the node type. From there, everything works great!
In PyYAML, you can pass a file object, or a string. So, my solution was to write a simple 5 line pre-parser to strip out the bits that confuse PyYAML(the bits that Unity incorrectly syntaxed) and feed this new string to PyYAML.
1) Remove line 2 entirely, or just ignore it:
%TAG !u! tag:unity3d.com,2011:
We don't care. We know it's a unity file. And the tag does nothing for us.
2) For each stream declaration, remove the tag alias ("!u!") and remove the class ID. Leave the fileID. Let PyYAML auto-detect the node as a Mapping node.
--- !u!1 &100000
becomes...
--- &100000
3) The rest, output as is.
The code for the pre-parser looks like this:
def removeUnityTagAlias(filepath):
"""
Name: removeUnityTagAlias()
Description: Loads a file object from a Unity textual scene file, which is in a pseudo YAML style, and strips the
parts that are not YAML 1.1 compliant. Then returns a string as a stream, which can be passed to PyYAML.
Essentially removes the "!u!" tag directive, class type and the "&" file ID directive. PyYAML seems to handle
rest just fine after that.
Returns: String (YAML stream as string)
"""
result = str()
sourceFile = open(filepath, 'r')
for lineNumber,line in enumerate( sourceFile.readlines() ):
if line.startswith('--- !u!'):
result += '--- ' + line.split(' ')[2] + '\n' # remove the tag, but keep file ID
else:
# Just copy the contents...
result += line
sourceFile.close()
return result
To create a PyYAML object from a Unity textual scene file, call your pre-parser function on the file:
import yaml
# This fixes Unity's YAML %TAG alias issue.
fileToLoad = '/Users/vlad.dumitrascu/<SOME_PROJECT>/Client/Assets/Gear/MeleeWeapons/SomeAsset_test.prefab'
UnityStreamNoTags = removeUnityTagAlias(fileToLoad)
ListOfNodes = list()
for data in yaml.load_all(UnityStreamNoTags):
ListOfNodes.append( data )
# Example, print each object's name and type
for node in ListOfNodes:
if 'm_Name' in node[ node.keys()[0] ]:
print( 'Name: ' + node[ node.keys()[0] ]['m_Name'] + ' NodeType: ' + node.keys()[0] )
else:
print( 'Name: ' + 'No Name Attribute' + ' NodeType: ' + node.keys()[0] )
Hope that helps!
-Vlad
PS. To Answer the next issue in making this usable:
You also need to walk the entire project directory and parse all ".meta" files for the "GUID", which is Unity's inter-file reference. So, when you see a reference in a Unity YAML file for something like:
m_Materials:
- {fileID: 2100000, guid: 4b191c3a6f88640689fc5ea3ec5bf3a3, type: 2}
That file is somewhere else. And you can re-cursively open that one to find out any dependencies.
I just ripped through the game project and saved a dictionary of GUID:Filepath Key:Value pairs which I can match against.

Categories