I'm trying to create a module that contains data structures (dictionaries) and text strings that describe them. I want to import these dictionaries and descriptions into another module that feeds a GUI. One of the displayed lines shows the contents of the first dictionary, and one of its fields should offer all possible values contained in a second dictionary. I'd like to avoid hard-coding this relationship, and instead embed a reference to the second dictionary (the one holding all possible values) in the string that describes the first. An abstracted example:
dict1 = {
    "1": ["dog", "cat", "fish"],
    "2": ["alpha", "beta", "gamma", "epsilon"]
}
string = "parameter1,parameter2,dict1"
# Silly example starts here
string = string.split(",")
print(string[2]["2"])
I'd like to get ["alpha","beta","gamma","epsilon"], but of course this doesn't work: string[2] is just the text "dict1", not the dictionary itself.
Does anyone have a clever solution to this problem?
Generally, this kind of dynamic name lookup is a bad idea: it leads to code that is very difficult to read and maintain. However, if you must, you can use globals() for this:
globals()[string[2]]["2"]
A better solution would be to put dict1 into a dictionary in the first place:
dict1 = ...
namespace = {'dict1': dict1}
string = ...
namespace[string[2]]["2"]
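For illustration, here is a minimal, runnable sketch of the namespace approach, using the names from the question:

dict1 = {
    "1": ["dog", "cat", "fish"],
    "2": ["alpha", "beta", "gamma", "epsilon"],
}
# Map the name used in the description string to the actual object.
namespace = {'dict1': dict1}
string = "parameter1,parameter2,dict1".split(",")
# string[2] is the text "dict1"; the namespace dict resolves it to the object.
print(namespace[string[2]]["2"])  # ['alpha', 'beta', 'gamma', 'epsilon']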
I have a nested dictionary, where tickers identify certain assets, and for each of these assets I would like to store characteristics in a subdictionary, creating them in a simple loop like the one below:
ticker = ["a", "bb", "ccc"]
ticker_dict = dict.fromkeys(ticker, {"Var": []})
for key in ticker_dict:
    ticker_dict[key]["Var"] = len(key)
From the above code I would expect that for each ticker/asset it saves the "Var" variable as the length of its name, meaning the following:
{"a":{"Var":1},
"bb":{"Var":2},
"ccc":{"Var":3}}
But, rather weirdly in my view, the result is this:
{"a":{"Var":3},
"bb":{"Var":3},
"ccc":{"Var":3}}
To provide further context: in the real process I have four assets, for which I would like to store dataframes in their subdictionaries, as this makes it easy for me to access them later in loops etc. Somehow, though, the data from the last asset is simply copied over all assets, even though I explicitly loop through different keys.
What's going on?
PS: I'm not sure how to explain the problem without the sample code, so I might have missed a similar entry on this site. If so, any hints to it would be appreciated as well of course.
In your code, {"Var":[]} is only evaluated once, so there is only one inner dictionary, shared by all keys. Instead, you can use a dictionary comprehension:
ticker_dict = {key:{"Var":[]} for key in ticker}
and it will work as expected.
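To see why the comprehension fixes it, compare the identity of the inner dicts in both versions (a small sketch of the behaviour described above):

ticker = ["a", "bb", "ccc"]
shared = dict.fromkeys(ticker, {"Var": []})
print(shared["a"] is shared["ccc"])  # True: one inner dict, shared by all keys
fixed = {key: {"Var": []} for key in ticker}
print(fixed["a"] is fixed["ccc"])    # False: a fresh inner dict per key
for key in fixed:
    fixed[key]["Var"] = len(key)
print(fixed)  # {'a': {'Var': 1}, 'bb': {'Var': 2}, 'ccc': {'Var': 3}}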
Given the following YAML file stored in my_yaml, which contains varying sets of dictionary keys and/or class variables (denoted by self._*):
config1.json:
- [[foo, bar], [hello, world]]
config2.json:
- [[foo], [self._hi]]
From the json file, I want to populate a new list of tuples. The items in each tuple are determined by looking up dict keys in this yaml file.
So if I iterate through a dictionary called config1.json, and I have an empty list called config_list, I want to do something like:
config_list.append((i['foo']['bar'], i['hello']['world']))
But if it were config2.json, I want to do something like:
config_list.append((i['foo'], self._hi))
I can do this in a less dynamic way:
for i in my_yaml['config1.json'][0]:
    config_list.append(tuple([i[my_yaml[asset][0][0]][my_yaml[asset][0][1]],
                              i[my_yaml[asset][1][0]][my_yaml[asset][1][1]]]))
or:
for i in my_yaml['config2.json'][0]:
    config_list.append(tuple([i[my_yaml[asset][0][0]], i[my_yaml[asset][1][0]]]))
Instead, I would like to dynamically generate the contents of config_list.
Any ideas or alternatives would be greatly appreciated.
I think you are confusing things a bit. First of all, you refer to a file in "From the json [sic] file", but there is no JSON file mentioned anywhere in the question. There are mapping keys that look like filenames for JSON files, so I hope we can assume you mean "From the value associated with the mapping key that ends in the string .json".
The other confusing thing is that you obfuscate the fact that you want tuples, but load lists nested in lists from your YAML document.
If you want tuples, it is much more clear to specify them in your YAML document:
config1.json:
- !!python/tuple [[foo, bar], [hello, world]]
config2.json:
- !!python/tuple [[foo], [self._hi]]
So you can do:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML(typ='unsafe')
with open('my.yaml') as fp:
my_yaml = yaml.load(fp)
for key in my_yaml:
for idx, elem in enumerate(my_yaml[key]):
print('{}[{}] -> {}'.format(key, idx, my_yaml[key][idx]))
which directly gives you the tuples you seem to want instead of lists you need to process:
config1.json[0] -> (['foo', 'bar'], ['hello', 'world'])
config2.json[0] -> (['foo'], ['self._hi'])
In your question you hard-code access to the first and only element of the sequences that are the values of the root-level mapping. This forces you to use the final [0] in your for loop. I assume you are going to have multiple elements in those sequences, but for a good question you should leave that out, as it is irrelevant to the question of how to get the tuples and thereby only obfuscates things.
Please note that you need to keep control over your input, as using typ='unsafe' is, you guessed it, unsafe. If you cannot guarantee that, use typ='safe' and register and use a tag like !tuple.
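A minimal sketch of that safe variant; the tag name !tuple and the constructor function are assumptions, not something from the question:

import ruamel.yaml

def tuple_constructor(constructor, node):
    # deep=True ensures the nested sequences are fully constructed first
    return tuple(constructor.construct_sequence(node, deep=True))

yaml = ruamel.yaml.YAML(typ='safe')
yaml.constructor.add_constructor('!tuple', tuple_constructor)

doc = """\
config1.json:
- !tuple [[foo, bar], [hello, world]]
"""
print(yaml.load(doc))  # {'config1.json': [(['foo', 'bar'], ['hello', 'world'])]}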
My question is not really about a problem, because the program works the way it is right now; rather, I'm looking for a way to improve its maintainability, since the system is growing quite fast.
In essence, I have a function (let's call it 'a') that processes an XML document (in the form of a Python dict) and is responsible for getting a specific array of elements (let's call it 'obj') from this XML. The problem is that we process a lot of XMLs from different sources; therefore each XML has its own structure, and the obj element is located in different places.
The code is currently in the following structure:
def a(self, code, ...):
    xml_doc = ...  # dict parsed from an XML document that can have one of many structures
    obj = None  # array of objects I want to get from the XML; it might be processed later but is eventually returned
    # Because the XML can have different structures, the object I want to get
    # can be placed in different (well-known) places depending on the code value.
    if code == 'a':
        obj = xml_doc["key1"]["key2"]
    elif code == 'b':
        obj = xml_doc["key3"]
        ...
        # code that processes the obj object
        ...
    elif code == 'c':
        obj = xml_doc["key4"]["key5"]["key6"]
    ...  # elifs for different codes go on indefinitely
    return obj
As you can see (or not - but believe me), it's not very friendly to add new entries to this function or to add processing code for the cases that need it. So I was looking for a way to use dictionaries to map the code to the correct XML structure. Something in the direction of the following example:
...
xml_doc = ...
# That would be extremely neat.
code_to_pattern = {
    'a': xml_doc["key1"]["key2"],
    'b': xml_doc["key3"],
    'c': xml_doc["key4"]["key5"]["key6"],
    ...
}
obj = code_to_pattern[code]
obj = self.process_xml(code, obj) # It will process the array if it has to in another function.
return obj
...
However, the above code doesn't work, for obvious reasons: each entry of the code_to_pattern dictionary tries to access an element of xml_doc that might not exist, so an exception is raised. I thought of adding the entries as strings and then using the exec() function, so Python would only interpret the string at the right moment, but I'm not very fond of exec, and I'm sure someone can come up with a better idea.
The conditional processing part of the XML is easy to do, however I can't think in a better way to have an easy method to add new entries to the system.
I'd be very pleased if someone can help me with some ideas.
EDIT1: Thank you for your replies, guys. You both (@jarondl and @holdenweb) gave me workable, working ideas. For the accepted answer I am going to choose the one that required the least change to the format I gave you, even though I am going to solve it with XPath.
You should first consider alternatives such as XPath for reading the XML, depending on how you parsed it.
If you want to proceed with your dictionary, you can defer evaluation with lambda - no need for exec:
code_to_pattern = {
    'a': lambda doc: doc["key1"]["key2"],
    'b': lambda doc: doc["key3"],
    'c': lambda doc: doc["key4"]["key5"]["key6"],
    ...
}
obj = code_to_pattern[code](xml_doc)
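If an unknown code can occur, a small variation with dict.get avoids an immediate KeyError (falling back to None here is just one possible choice):

getter = code_to_pattern.get(code)
obj = getter(xml_doc) if getter is not None else None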
Essentially you are looking for a data-driven solution. You have the essence of such a solution, but rather than mapping the codes to elements of the xml_doc table, it might be easier to map the codes to the required keys. In other words, look at doing:
xml_doc = ...
code_to_pattern = {
    'a': ("key1", "key2"),
    'b': ("key3",),
    'c': ("key4", "key5", "key6"),
    ...
}
The problem there is that you would then need to adapt to the variable number of keys that the different codes map to, so a simple
obj = code_to_pattern[code]
wouldn't cut it. Be aware, though, that dicts can take tuples as keys, so it's possible (though you'd know better than I) that rather than using successive indices like xml_doc["key4"]["key5"]["key6"] you might be able to use a tuple index like xml_doc["key4", "key5", "key6"]. This may or may not help you with your problem.
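One way to handle the variable number of keys (a sketch, not part of the original answer) is to walk the key sequence with functools.reduce:

from functools import reduce

# Follow each key in turn, starting from the whole document.
obj = reduce(lambda d, k: d[k], code_to_pattern[code], xml_doc)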
Finally, you might find it helpful to learn about the collections.defaultdict object, since it automates the creation of new entries rather than forcing you to test for their presence and create them if absent. This could be helpful even if tuple keys won't cut it for you.
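For illustration, a tiny sketch of the defaultdict behaviour mentioned above (the data is made up):

from collections import defaultdict

paths = defaultdict(list)   # missing keys get a fresh empty list
paths['a'].append("key1")   # no need to test whether 'a' exists yet
paths['a'].append("key2")
print(dict(paths))          # {'a': ['key1', 'key2']}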
I have a list with dictionaries in it as below:
wordsList = [
    {'Definition': 'Allows you to store data with ease', 'Word': 'Database'},
    {'Definition': 'This can either be static or dynamic', 'Word': 'IP'},
]
Essentially, what I want to do is:
Be able to print each separate definition
Be able to print each separate word
And so my question is: How do I do this? I only know how to do this with regular lists/dictionaries, not what I have here.
for word_def in wordsList:
    print(word_def.get("Word"))
    print(word_def.get("Definition"))
    print()
Output:
Database
Allows you to store data with ease
IP
This can either be static or dynamic
Essentially, these are "regular" lists/dictionaries.
You must understand that a list in Python can contain any object, including dicts. Thus, neither the list nor the contained dicts become in any way "irregular".
You can access anything inside your list/dict like that:
wordsList[index][name]
With appropriate values for index/name.
You can also iterate over the list (as shown by SSNR) and thus grab any of the dictionaries contained and deal with them like ordinary dicts.
You can also get hold of one of the dicts this way:
one_dict = wordsList[index]
Then just access the contents:
value = one_dict[name]
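Concretely, with the wordsList from the question:

print(wordsList[1]["Word"])        # IP
print(wordsList[0]["Definition"])  # Allows you to store data with ease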
I have a dict that has unix epoch timestamps for keys, like so:
lookup_dict = {
    1357899: {},  # some dict of data
    1357910: {},  # some other dict of data
}
Except, you know, millions and millions and millions of entries. I'd like to subset this dict, over and over again. Ideally, I'd love to be able to write something like I can in R, like:
lookup_value = 1357900
dict_subset = lookup_dict[key >= lookup_value]
# dict_subset now contains {1357910: {}}
But I confess I can't find any actual proof that this is something Python can do without, one way or another, iterating over every key. If I understand Python correctly (and I might not), key lookup of the form key in dict is very fast; is there any way to do a binary search on dict keys?
To do this without iterating, you're going to need the keys in sorted order. Then you just need to do a binary search for the first one >= lookup_value, instead of checking each one for >= lookup_value.
If you're willing to use a third-party library, there are plenty out there. The first two that spring to mind are bintrees (which uses a red-black tree, like C++, Java, etc.) and blist (which uses a B+Tree). For example, with bintrees, it's as simple as this:
dict_subset = lookup_dict[lookup_value:]
And this will be as efficient as you'd hope—basically, it adds a single O(log N) search on top of whatever the cost of using that subset. (Of course usually what you want to do with that subset is iterate the whole thing, which ends up being O(N) anyway… but maybe you're doing something different, or maybe the subset is only 10 keys out of 1000000.)
Of course there is a tradeoff. Random access to a tree-based mapping is O(log N) instead of "usually O(1)". Also, your keys obviously need to be fully ordered, instead of hashable (and that's a lot harder to detect automatically and raise nice error messages on).
If you want to build this yourself, you can. You don't even necessarily need a tree; just a sorted list of keys alongside a dict. You can maintain the list with the bisect module in the stdlib, as JonClements suggested. You may want to wrap up bisect to make a sorted list object—or, better, get one of the recipes on ActiveState or PyPI to do it for you. You can then wrap the sorted list and the dict together into a single object, so you don't accidentally update one without updating the other. And then you can extend the interface to be as nice as bintrees, if you want.
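A minimal sketch of that roll-your-own approach, keeping a sorted key list next to the dict and searching it with bisect (the function name is illustrative):

import bisect

sorted_keys = sorted(lookup_dict)  # must be kept in sync with lookup_dict

def subset_from(lookup_dict, sorted_keys, lookup_value):
    # O(log N) search for the first key >= lookup_value,
    # then O(K) to collect the K matching entries.
    start = bisect.bisect_left(sorted_keys, lookup_value)
    return {k: lookup_dict[k] for k in sorted_keys[start:]}

print(subset_from(lookup_dict, sorted_keys, 1357900))  # {1357910: {}}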
The following code will work:
some_time_to_filter_for = ...  # some Unix timestamp
# Create a new sub-dictionary
sub_dict = {key: val for key, val in lookup_dict.items()
            if key >= some_time_to_filter_for}
Basically, we just iterate through all the keys in the dictionary and, given a time to filter for, take every key that is greater than or equal to that value and place it into the new dictionary.