How to dump all the variables in a file? - python

Is there an easy and more or less standard way to dump all the variables into a file, something like a stack trace but with the variable names and values? The ones that are in locals(), globals() and maybe dir().
I can't find an easy way. Here's my code for locals(), which doesn't work because the keys can be of different types:
vars1 = list(filter(lambda x: len(x) > 2 and locals()[x][:2] != "__", locals()))
And without filtering, when trying to dump the variables I get an error:
f.write(json.dumps(locals()))
# =>
TypeError: <filter object at 0x7f9bfd02b710> is not JSON serializable
I think there must be something better than doing it manually.

To start, in your non-working example you aren't really filtering the keys (which should normally only be strings, even if that's not technically required); locals()[x] gives you the values.
But even if you did filter the keys in some way, you don't generally know that all of the remaining values are JSON serialisable. Therefore, you either need to filter the values to keep only types that can be mapped to JSON, or you need a default serialiser implementation that applies some sensible serialisation to any value. The simplest thing would be to just use the built-in string representation as a fall-back:
json.dumps(locals(), default=repr)
By the way, there's also a more direct and efficient way of dumping JSON to a file (note the difference between dump and dumps):
json.dump(locals(), f, default=repr)
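For instance, here is a minimal sketch along those lines (the skip_dunders helper, the dump_scope wrapper and the debug_dump.json filename are only illustrative) that writes both locals() and globals() to a file, falling back to repr() for anything JSON can't serialise:
import json

def skip_dunders(d):
    # keep only names that don't start with "__" (drops __builtins__ and friends)
    return {k: v for k, v in d.items() if not str(k).startswith("__")}

def dump_scope(filename, local_vars, global_vars):
    with open(filename, "w") as f:
        json.dump(
            {"locals": skip_dunders(local_vars), "globals": skip_dunders(global_vars)},
            f,
            default=repr,  # fall back to repr() for values JSON can't handle
            indent=2,
        )

# call it from the point of interest, e.g.
# dump_scope("debug_dump.json", locals(), globals())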

Related

Parsing complicated string-based configuration options

I'm trying to build a Python program that will parse the conversion codes from a Universe Database. These conversion codes are highly dense strings that encode a ton of information, and I'm struggling to get started. The different permutations of the conversion codes likely number in the hundreds of thousands (though I haven't done the math).
I'm familiar with argparse; however, I can't come up with a way of handling this kind of parsing with it, and my Google-fu hasn't turned up any other solution.
Initially, my lazy workaround was to just do a dictionary lookup for the most common conversion codes, but now that we're using this Python program for more data, it's becoming a huge chore to maintain each individual conversion code.
For example, date conversion codes may take forms like:
Date: D[n][*m][s][fmt[[f1, f2, f3, f4, f5]]][E][L], e.g. D2/ or D4-2/RM
Datetime: DT[4|D|4D|T|TS|Z][;timezone], e.g. DTZ or DT4;America/Denver
Datetime ISO: DTI[B][R|W][S][Z][2|1|0][;[timezone|offset]], e.g. DTIBZ2 or DTIR;America/Denver
And there are a bunch of other conversion codes, with equally complicated parameters.
My end goal is to be able to convert Universe's string data into the appropriate Python object and back again, and in order to do that, I need to understand these conversion codes.
If it helps, I do not need to validate these conversion codes. Once they are set in the database, they are validated there.
I recommend that you use the readnamedfields/writenamedfields methods; these return OCONV data, and when you write back, the ICONV is handled for you.
import u2py
help(u2py.File.readnamedfields)

Help on function readnamedfields in module u2py:

readnamedfields(self, *args)
    F.readnamedfields(recordid, fieldnames, [lockflag]) -> new DynArray object -- read the specified fields by name of a record in the file
    fieldnames is a u2py.DynArray object with each of its fields being a name defined in the dictionary file
    lockflag is either 0 (default), or [LOCK_EXCLUSIVE or LOCK_SHARED] [ + LOCK_WAIT]
    note: if fieldnames contains names that are not defined in the dictionary, these names are replaced by #ID and no exception is raised
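A very rough usage sketch based only on the signature in the help text above (the CUSTOMER file name, the record id, the field names, and the assumption that u2py.File and u2py.DynArray can be constructed this way are illustrative, not verified against the u2py documentation):
import u2py

# assumption: u2py.File(name) opens the UniVerse file by name,
# and DynArray can be built from a Python list of dictionary field names
customers = u2py.File("CUSTOMER")
fields = u2py.DynArray(["NAME", "ORDER_DATE"])

# read the named fields; values come back already OCONV'd
record = customers.readnamedfields("REC0001", fields)

# ... work with the Python-side values ...

# write them back; the matching ICONV is applied on the way in
# (argument order assumed to mirror readnamedfields)
customers.writenamedfields("REC0001", fields, record)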

Reading a csv persisted list of floats back into a list of floats

I have persisted a list of floats in a csv file and it appears thus (a single row).
"[6.61501123e-04 1.23390303e-04 1.59454121e-03 2.17852772e-02
:
3.02987776e-04 3.83064064e-03 6.90607396e-04 3.30468375e-03
2.78064613e-02]"
Now, when reading it back into a list, I am using the ast.literal_eval approach:
probs = [float(p) for p in ast.literal_eval(row['prob_array'])]
And I get this error:
probs = [float(p) for p in ast.literal_eval(row['prob_array'])]
File "/Users/santino/anaconda/lib/python2.7/ast.py", line 49, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/Users/santino/anaconda/lib/python2.7/ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
[6.61501123e-04 1.23390303e-04 1.59454121e-03 2.17852772e-02
^
SyntaxError: invalid syntax
Not sure how I can instruct ast to read the exponent syntax, or am I wrong in assuming it's the exponent syntax that is causing the exception.
Edit: I used csv.DictWriter to persist into the csv file. Is there a different way I should be persisting?
Edit2:
with open("./input_file.csv","w") as r:
writer = csv.DictWriter(r,fieldnames=["item_id","item_name","prob_array"])
writer.writeheader()
res_list = ...
for i,res in enumerate(res_list):
row_dict = {}
row_dict['item_id'] = id_list[i]
row_dict['prob_array'] = res
row_dict['item_name'] = item_list[i]
writer.writerow(row_dict)
CSV only stores string columns. Using it to store strings, ints, floats, and a few other basic types is fine, as long as you manually convert the values: if you turned an int into a string with str(i), you can get the int back with int(s).
But that isn't true for a list of floats. There's no function you can use to get back the result of str(lst) on an arbitrary list.1 And it isn't true for… whatever you have, which seems to be most likely a numpy array or Pandas Series… either.2
If you can store each float as a separate column, instead of storing a list of them in a single column, that's the easiest answer. But it may not be appropriate.3
So, you just need to pick some other function to use in place of the implicit str, which can be reversed with a simple function call. There are formats designed for persisting data to strings—JSON, XML, even a nested CSV—so that's the first place to look.
Usually JSON should be the first one you look at. As long as it can handle all of your data (and it definitely can here), it's dead simple to use, someone's already thought through all the annoying edge cases, and there's code to parse it for every platform in the universe.
So, you write the value like this:
row_dict['prob_array'] = json.dumps(res)
And then you can read it back like this:
prob_array = json.loads(row['prob_array'])
If prob_array is actually a numpy array or Pandas Series or something rather than a list, you'll want to either convert through list, or use the numpy or Pandas JSON methods instead of the stdlib module.
The only real problem here is that if you want the CSV to be human-readable/editable, the escaped commas and quotes could be pretty ugly.
In this case, you can define a simpler format that's still easy to write and parse for your specific data, and also more human-readable, like just space-separated floats:
row_dict['prob_array'] = ' '.join(map(str, res))
prob_array = [float(val) for val in row['prob_array'].split()]
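Putting the JSON approach together with the writer code from the question, a minimal round-trip sketch (the file name and the id/name/probability lists are placeholders standing in for the question's data):
import csv
import json

# placeholder data standing in for the question's id_list / item_list / res_list
id_list = [1, 2]
item_list = ["a", "b"]
res_list = [[6.61501123e-04, 1.23390303e-04], [3.83064064e-03, 2.78064613e-02]]

# write: store each list of floats as a JSON string in one column
with open("input_file.csv", "w") as w:
    writer = csv.DictWriter(w, fieldnames=["item_id", "item_name", "prob_array"])
    writer.writeheader()
    for item_id, name, res in zip(id_list, item_list, res_list):
        writer.writerow({
            "item_id": item_id,
            "item_name": name,
            # for a numpy array, use res.tolist() here instead
            "prob_array": json.dumps(res),
        })

# read: parse the JSON string back into a list of floats
with open("input_file.csv") as r:
    for row in csv.DictReader(r):
        probs = json.loads(row["prob_array"])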
1. Sometimes you can use ast.literal_eval, but relying on that is never a good idea, and it isn't working here.
2. The human-readable format used by numpy and Pandas is even less parser-friendly than the one used by Python lists. You could switch to their repr instead of their str, but it still isn't going to ast.literal_eval.
3. For an obvious example, imagine a table with two different arbitrary-length lists…

Generate variable definition code from runtime data structure

Say I have a dict at hand at runtime. Is there an easy way to create code that defines the dict? For example, it should output the string
"d = {'string_attr1' : 'value1', 'bool_attr1': True}"
Of course it would be possible to write a converter function by hand, which iterates over the key-value pairs and puts together the string. It would still require handling special cases to decide whether values have to be quoted, etc.
More generally: Is there a built in way or a library to generate variable declarations from runtime data structures?
Context: I would like to use a list of dicts as input for a code generator. The content of the dicts would be queried from an SQL database. I don't want to tightly couple code generation to the querying of the SQL database, so I think it would be convenient to go with generating a python source file defining a list of dictionaries, which can be used as an input to the code generator.
>>> help(repr)
Help on built-in function repr in module __builtin__:

repr(...)
    repr(object) -> string

    Return the canonical string representation of the object.
    For most object types, eval(repr(object)) == object.
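Applied to the list-of-dicts use case described above, a minimal sketch (the dict contents and the generated_config.py module name are made up):
rows = [
    {'string_attr1': 'value1', 'bool_attr1': True},
    {'string_attr1': 'value2', 'bool_attr1': False},
]

# repr() of a structure built only from basic types (str, bool, int, float,
# list, dict, ...) is valid Python source that evaluates back to an equal
# object, so it can be written straight into a generated module
with open("generated_config.py", "w") as f:
    f.write("rows = " + repr(rows) + "\n")
For larger structures, pprint.pformat(rows) gives the same round-trippable output with nicer line breaks; once values stop being plain built-in types, repr() is no longer guaranteed to evaluate back.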

Convert string representation of list of objects back to list in python

I have a list of objects, that has been stringified:
u'[<object: objstuff1, objstuff2>, <object: objstuff1, objstuff2>]'
I want to convert this back into a list:
[<object: objstuff1, objstuff2>, <object: objstuff1, objstuff2>]
I've tried using ast.literal_eval(), but unfortunately, it doesn't seem to work if the elements are objects, and I get a SyntaxError.
Is there any way I can reconvert my string representation of the list of objects back into a list?
You need to have a look at the pickle module to do this.
Basically, dump your objects using pickle.dumps, and load them back using pickle.loads.
ast.literal_eval obviously doesn't work here because a lot of information about the objects (their attributes and values) simply isn't captured in that string. Also note that you will only be able to resurrect data that was actually pickled; if all you have right now are those string representations, you won't be able to recreate the objects from them because of that information loss.
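A minimal sketch of that round trip (MyObject is just a stand-in for whatever class you are actually storing):
import pickle

class MyObject(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

objs = [MyObject(1, 2), MyObject(3, 4)]

# serialise the actual objects (not their str() representation) to a blob
blob = pickle.dumps(objs)

# ... store the blob somewhere, then later ...
restored = pickle.loads(blob)
print(restored[0].a)  # 1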

Is it possible to use multiple keys for a single element in a dict?

I am writing my own function for parsing XML text into objects which I can manipulate and render back into XML text. To handle the nesting, I am allowing XML objects to contain other XML objects as elements.
Since I am automatically generating these XML objects, my plan is to just enter them as elements of a dict as they are created. I was planning on generating an attribute called name which I could use as the key, and having the XML object itself be a value assigned to that key.
All this makes sense to me at this point. But now I realize that I would really like to also save an attribute called line_number, which would be the line from the original XML file where I first encountered the object, and there may be some cases where I would want to locate an XML object by line_number, rather than by name.
So these are my questions:
Is it possible to use a dict in such a way that I could find my XML object either by name or by line number? That is, is it possible to have multiple keys assigned to a single value in a dict?
How do I do that?
If this is a bad idea, what is a better way?
Yes, it is possible. No special magic is required:
In [1]: val = object()
In [2]: d = {}
In [3]: d[123] = val
In [4]: d['name'] = val
In [5]: d
Out[5]: {123: <object at 0x23c6d0>, 'name': <object at 0x23c6d0>}
I would, however, use two separate dictionaries, one for indexing by name, and one for indexing by line number. Even if the sets of names and line numbers are completely disjoint, I think this is a cleaner design.
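For example, a rough sketch of that two-dictionary layout, assuming each parsed object carries the name and line_number attributes described in the question:
by_name = {}
by_line = {}

def register(xml_obj):
    # index the same object under both its name and its line number
    by_name[xml_obj.name] = xml_obj
    by_line[xml_obj.line_number] = xml_obj

# later, look it up by whichever key you have:
# node = by_name['root']
# node = by_line[42]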
my_dict['key1'] = my_dict['key2'] = SomeObject
should work fine, I would think
Since dictionaries can have keys of multiple types, and you are using names (strings only) as one key and numbers (integers only) as another, you can simply make two separate entries point to the same object - one for the number, and one for the string.
d[0] = d['key'] = object1
