I have a Python dictionary that uses integers as keys
d[7] = ...
to reference custom objects:
c = Car()
d[7] = c
However, each of these custom objects also has a string identifier (from a third party). I want to be able to access the objects using both an integer or a string. Is the preferred way to use both keys in the same dictionary?
d[7] = c
d["uAbdskmakclsa"] = c
Or should I split it up into two dictionaries? Or is there a better way?
It really depends on what you're doing.
If you get the different kinds of keys from different sources, so you always know which kind you're looking up, it makes more sense, conceptually, to use separate dictionaries.
On the other hand, if you need to be able to handle keys that could be either kind, it's probably simpler to use a single dictionary. Otherwise, you need to write code like that uses type-switching, or tries one dict and then tries the other on KeyError, or something else ugly.
(If you're worried about efficiency, it really won't make much difference either way. It's only a very, very tiny bit faster to look things up in a 5000-key dictionary as in a 10000-key dictionary, and it only costs a very small of extra memory to keep two 5000-key dictionaries than one 10000-key dictionary. So, don't worry about efficiency; do whichever makes sense. I don't have any reason to believe you are worried about efficiency, but a lot of people who ask questions like this seem to be, so I'm just covering the bases.)
I would use two dicts. One mapping 3rd party keys to integer keys and another one mapping integer keys to objects. Depending on which one you use more frequently you can switch that of course.
It's a fairly specific situation, I doubt there is any 'official' preference on what to do in this situation. I do however, feel that having keys of multiple types is 'dirty', although I can't really think of a reason why it is.
But since you state that the string keys come from a third party, that alone might be a good reason to split off to another dictionary. I would split as well. You never know what the future might bring and this method is easier to maintain. Also less error prone if you think of type safety.
For setting values in your dictionaries you can then use helper methods. This will make adding easier and prevent you from forgetting to add/update to one of the dictionaries.
Related
I've been noodling around with Python for quite a while in my spare time, and while I have sort of understood and definitely used dictionaries, they've always seemed somewhat foreign to me, like I wasn't quite getting them. Maybe it's the name "dictionary" throwing me off, of the fact I started way back when with Basic (I know) which had arrays, but they were quite different.
Can I simply think of a dictionary in Python as nothing more or less than a two-column table where we name the contents of the first column "keys" and the contents of the second column "values"? Is this conceptualization extremely accurate and useful, or problematic?
If the former, I think I can finally swallow the concept in such a way to finally make it more natural to my thinking.
The analogy of a 2-column table might work to some degree but it doesn't cover some important aspects of how and why dictionaries are used in practice.
The comment by #Sayse is more conceptually useful. Think of the dictionary as a physical language dictionary, where the key is the word itself and the value is the word's definition. Two items in the dictionary cannot have the same key but could have the same value. In the analogy of a language dictionary, if two words had the same spelling then they are the same word. However, synonyms can exist where two words which are spelled differently could have the same definition.
The table analogy also doesn't cover the behaviour of a dictionary where the order is not preserved or reliable. In a dictionary, the order does not matter and the item is retrieved by its key. Perhaps another useful analogy is to think of the key as a person's name and the value is the person themselves (and maybe lots of information about them as well). The people are identified by their names but they may be in any given order or location...it doesn't matter, since we know their names we can identify them.
While the order of items in a dictionary may not be preserved, a dictionary has the advantage of having very fast retrieval for a single item. This becomes especially significant as the number of items to lookup grows larger (on the order of thousands or more).
Finally, I would also add that dictionaries can often improve the readability of code. For example, if you wanted create a lookup table of HTML color codes, an API using a dictionary of HTML color names is much more readable and usable than using a list and relying on documentation of indices to retrieve the values.
So if it helps you to conceptualize a dictionary as a table of 2 columns, that is fine, as long as you also keep in mind the rules for their use and the scenarios where they provide some benefit:
Duplicate keys are not allowed
The order of keys is not preserved and therefore not reliable
Retrieving a single item is fast (esp. for many items)
Improved readability of lookup tables
I'm going to store on the order of 10,000 securities X 300 date pairs X 2 Types in some caching mechanism.
I'm assuming I'm going to use a dictionary.
Question Part 1:
Which is more efficient or Faster? Assume that I'll be generally looking up knowing a list of security IDs and the 2 dates plus type. If there is a big efficiency gain by tweaking my lookup, I'm happy to do that. Also assume I can be wasteful of memory to an extent.
Method 1: store and look up using keys that look like strings "securityID_date1_date2_type"
Method 2: store and look up using keys that look like tuples (securityID, date1, date2, type)
Method 3: store and look up using nested dictionaries of some variation mentioned in methods 1 and 2
Question Part 2:
Is there an easy and better way to do this?
It's going to depend a lot on your use case. Is lookup the only activity or will you do other things, e.g:
Iterate all keys/values? For simplicity, you wouldn't want to nest dictionaries if iteration is relatively common.
What about iterating a subset of keys with a given securityID, type, etc.? Nested dictionaries (each keyed on one or more components of your key) would be beneficial if you needed to iterate "keys" with one component having a given value.
What about if you need to iterate based on a different subset of the key components? If that's the case, plain dict is probably not the best idea; you may want relational database, either the built-in sqlite3 module or a third party module for a more "production grade" DBMS.
Aside from that, it matters quite a bit how you construct and use keys. Strings cache their hash code (and can be interned for even faster comparisons), so if you reuse a string for lookup having stored it elsewhere, it's going to be fast. But tuples are usually safer (strings constructed from multiple pieces can accidentally produce the same string from different keys if the separation between components in the string isn't well maintained). And you can easily recover the original components from a tuple, where a string would need to be parsed to recover the values. Nested dicts aren't likely to win (and require some finesse with methods like setdefault to populate properly) in a simple contest of lookup speed, so it's only when iterating a subset of the data for a single component of the key that they're likely to be beneficial.
If you want to benchmark, I'd suggest populating a dict with sample data, then use the timeit module (or ipython's %timeit magic) to test something approximating your use case. Just make sure it's a fair test, e.g. don't lookup the same key each time (using itertools.cycle to repeat a few hundred keys would work better) since dict optimizes for that scenario, and make sure the key is constructed each time, not just reused (unless reuse would be common in the real scenario) so string's caching of hash codes doesn't interfere.
I feel like this must have been asked before (probably more than once), so potential apologies in advance, but I can't find it anywhere (here or through Google).
Anyway, when explaining the difference between lists and tuples in Python, the second thing mentioned, after tuples being immutable, is that lists are best for homogeneous data and tuples are best for heterogeneous data. But nobody seems to think to explain why that's the case. So why is that the case?
First of all, that guideline is only sort of true. You're free to use tuples for homogenous data and lists for heterogenous data, and there may be cases where that's a fine thing to do. One important case is if you need the collection to the hashable so you can use it as a dictionary key; in this case you must use a tuple, even if all the elements are homogenous in nature.
Also note that the homogenous/heterogenous distinction is really about the semantics of the data, not just the types. A sequence of a name, occupation, and address would probably be considered heterogenous, even though all three might be represented as strings. So it's more important to think about what you're going to do with the data (i.e., will you actually treat the elements the same) than about what types they are.
That said, I think one reason lists are preferred for homogenous data is because they're mutable. If you have a list of several things of the same kind, it may make sense to add another one to the list, or take one away; when you do that, you're still left with a list of things of the same kind.
By contrast, if you have a collection of things of heterogenous kinds, it's usually because you have a fixed structure or "schema" to them (e.g., the first one is an ID number, the second is a name, the third is an address, or whatever). In this case, it doesn't make sense to add or remove an element from the collection, because the collection is an integrated whole with specified roles for each element. You can't add an element without changing your whole schema for what the elements represent.
In short, changes in size are more natural for homogenous collections than for heterogenous collections, so mutable types are more natural for homogenous collections.
The difference is philosophical more than anything.
A tuple is meant to be a shorthand for fixed and predetermined data meanings. For example:
person = ("John", "Doe")
So, this example is a person, who has a first name and last name. The fixed nature of this is the critical factor. Not the data type. Both "John" and "Doe" are strings, but that is not the point. The benefit of this is unchangeable nature:
You are never surprised to find a value missing. person always has two values. Always.
You are never surprised to find something added. Unlike a dictionary, another bit of code can't "add a new key" or attribute
This predictability is called immutability
It is just a fancy way of saying it has a fixed structure.
One of the direct benefits is that it can be used as a dictionary key. So:
some_dict = {person: "blah blah"}
works. But:
da_list = ["Larry", "Smith"]
some_dict = {da_list: "blah blah"}
does not work.
Don't let the fact that element reference is similar (person[0] vs da_list[0]) throw you off. person[0] is a first name. da_list[0] is simply the first item in a list at this moment in time.
It's not a rule, it's just a tradition.
In many languages, lists must be homogenous and tuples must be fixed-length. This is true of C++, C#, Haskell, Rust, etc. Tuples are used as anonymous structures. It is the same way in mathematics.
Python's type system, however, does not allow you to make these distinctions: you can make tuples of dynamic length, and you can make lists with heterogeneous data. So you are allowed to do whatever you want with lists and tuples in Python, it just might be surprising to other people reading your code. This is especially true if the people reading your code have a background in mathematics or are more familiar with other languages.
Lists are often used for iterating over them, and performing the same operation to every element in the list. Lots of list operations are based on that. For that reason, it's best to have every element be the same type, lest you get an exception because an item was the wrong type.
Tuples are more structured data; they're immutable so if you handle them correctly you won't run into type errors. That's the data structure you'd use if you specifically want to combine multiple types (like an on-the-fly struct).
I am developing AI to perform MDP, I am getting states(just integers in this case) and assigning it a value, and I am going to be doing this a lot. So I am looking for a data structure that can hold(no need for delete) that information and will have a very fast get/update function. Is there something faster than the regular dictionary? I am looking for anything really so native python, open sourced, I just need fast getting.
Using a Python dictionary is the way to go.
You're saying that all your keys are integers? In that case, it might be faster to use a list and just treat the list indices as the key values. However, you'd have to make sure that you never delete or add list items; just start with as many as you think you'll need, setting them all equal to None, as shown:
mylist = [None for i in xrange(totalitems)]
Then, when you need to "add" an item, just set the corresponding value.
Note that this probably won't actually gain you much in terms of actual efficiency, and it might be more confusing than just using a dictionary.
For 10,000 items, it turns out (on my machine, with my particular test case) that accessing each one and assigning it to a variable takes about 334.8 seconds with a list and 565 seconds with a dictionary.
If you want a rapid prototype, use python. And don't worry about speed.
If you want to write fast scientific code (and you can't build on fast native libraries, like LAPACK for linear algebra stuff) write it in C, C++ (maybe only to call from Python). If fast instead of ultra-fast is enough, you can also use Java or Scala.
So I currently have a 2d list of objects that define the map of a game where each object represents a tile on that map. As I was repurposing the code for something else, I wondered if it would make more sense to use a dictionary to store the map data or to continue using a list. With a list, the indices represent the x and y of the map, whereas in a dictionary, a (x,y) tuple would be the keys for the dictionary.
The reason I ask is because the map changing is a rare event, so the data is fairly static, and as far as i know, the fairly constant lookups will be faster in dictionaries. It should also simplify looping through the map to draw it. Mostly I think using dictionaries will simplify accessing the data, though I'm not sure that will be the case in all cases.
Are these benefits worth the additional memory that I assume the dictionary will take up? or am I even right about the benefits being benefits?
EDIT
I know that the current method works, its was moreso to whether or not it would make sense to switch in order to have cleaner looking code and to find any potential drawbacks.
Stuff like looping through the array would go from something like
for i in range(size[0]):
for e in range(size[1]):
thing.blit(....using i and e)
to
for i, e in dict.items():
i.blit(....using i and e)
or looking up a dict item would be
def get(x, y):
if (x in range(size[0])) and (y in range(size[1])):
return self.map[x][y].tile
to
def get(item):
return self.dict.get(item)
its not much, but its somewhat cleaner, and if its not any slower and there are no other drawbacks i see no reason not to.
I would be wary of premature optimization.
Does your current approach have unacceptable performance? Does the data structure you're using make it harder to reason about or write your code?
If there isn't a specific problem you need to solve that can't be addressed with your current architecture, I would be wary about changing it.
This is a good answer to reference about the speed and memory usage of python lists vs dictionarys: https://stackoverflow.com/a/513906/1506904
Until you get to an incredibly large data set it is most likely that your current method, if it is working well for you, will be perfectly suitable.
I'm not sure that you'll get "the right" answer for this, but when I created the *Game of Life *in Python I used a dict. Realistically there should not be a substantial difference between the lookup cost of multi-dimensional lists and the lookup cost in a dict (both are O(1)), but if you're using a dict then you don't need to worry about instantiating the entire game-board. In chess, this means that you are only creating 32 pieces instead of 64 squares and 32 pieces. In the game of Go, on the other hand you can create only 1 object instead of 361 list-cells.
That said, in the case of the dict you will need to instantiate the tuples. If you can cache those (or only iterate the dict's keys) then maybe you will get the best of all worlds.