Here is the code:
EDIT**** Please no more "it's not possible with unordered dictionary replies". I pretty much already know that. I made this post on the off-chance that it MIGHT be possible or someone has a workable idea.
#position equals some set of two dimensional coords
for name in self.regions["regions"]: # I want to start the iteration with 'last_region'
# I don't want to run these next two lines over every dictionary key each time since the likelihood is that the new
# position is still within the last region that was matched.
rect = (self.regions["regions"][name]["pos1"], self.regions["regions"][name]["pos2"])
if all(self.point_inside(rect, position)):
# record the name of this region in variable- 'last_region' so I can start with it on the next search...
# other code I want to run when I get a match
return
return # if code gets here, the points were not inside any of the named regions
Hopefully the comments in the code explain my situation well enough. Lets say I was last inside region "delta" (i.e., the key name is delta, the value will be sets of coordinates defining it's boundaries) and I have 500 more regions. The first time I find myself in region delta, the code may not have discovered this until, let's say (hypothetically), the 389th iteration... so it made 388 all(self.point_inside(rect, position)) calculations before it found that out. Since I will probably still be in delta the next time it runs (but I must verify that each time the code runs), it would be helpful if the key "delta" was the first one that got checked by the for loop.
This particular code can be running many times a second for many different users.. so speed is critical. The design is such that very often, the user will not be in a region and all 500 records may need to be cycled through and will exit the loop with no matches, but I would like to speed the overall program up by speeding it up for those that are presently in one of the regions.
I don't want an additional overhead of sorting the dictionary in any particular order, etc.. I just want it to start looking with the last one that it successfully matched all(self.point_inside(rect, position))
Maybe this will help a bit more.. The following is the dictionary I am using (only the first record shown) so you can see the structure I coded to above... and yes, despite the name "rect" in the code, it actually checks for the point in a cubical region.
{"regions": {"shop": {"flgs": {"breakprot": true, "placeprot": true}, "dim": 0, "placeplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}, "breakplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}, "protected": true, "banplayers": {}, "pos1": [5120025, 60, 5120208], "pos2": [5120062, 73, 5120257], "ownerUuid": "4f953255-6775-4dc6-a612-fb4230588eff", "accessplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}}, more, more, more...}
You may try to implement some caching mechanism within a custom subclass of dict.
You could set a self._cache = None in __init__, add a method like set_cache(self, key) to set the cache and finally overriding __iter__ to yield self._cache before calling the default __iter__.
However, that can be kinda cumbersome, if you consider this stackoverflow answer and also this one.
For what it's written in your question, I would try, instead, to implement this caching logic in your code.
def _match_region(self, name, position):
rect = (self.regions["regions"][name]["pos1"], self.regions["regions"][name]["pos2"])
return all(self.point_inside(rect, position))
if self.last_region and self._match_region(self.last_region, position):
self.code_to_run_when_match(position)
return
for name in self.regions["regions"]:
if self._match_region(name, position):
self.last_region = name
self.code_to_run_when_match(position)
return
return # if code gets here, the points were not inside any of the named regions
That is right, dictionary is an unordered type. Therefore OrderedDict won't help you much for what you want to do.
You could store the last region into your class. Then, on the next call, check if last region is still good before check the entire dictionary ?
Instead of a for-loop, you could use iterators directly. Here's an example function that does something similar to what you want, using iterators:
def iterate(what, iterator):
iterator = iterator or what.iteritems()
try:
while True:
k,v = iterator.next()
print "Trying k = ", k
if v > 100:
return iterator
except StopIteration:
return None
Instead of storing the name of the region in last_region, you would store the result of this function, which is like a "pointer" to where you left off. Then, you can use the function like this (shown as if run in the Python interactive interpreter, including the output):
>>> x = {'a':12, 'b': 42, 'c':182, 'd': 9, 'e':12}
>>> last_region = None
>>> last_region = iterate(x, last_region)
Trying k = a
Trying k = c
>>> last_region = iterate(x, last_region)
Trying k = b
Trying k = e
Trying k = d
Thus, you can easily resume from where you left off, but there's one additional caveat to be aware of:
>>> last_region = iterate(x, last_region)
Trying k = a
Trying k = c
>>> x['z'] = 45
>>> last_region = iterate(x, last_region)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in iterate
RuntimeError: dictionary changed size during iteration
As you can see, it'll raise an error if you ever add a new key. So, if you use this method, you'll need to be sure to set last_region = None any time you add a new region to the dictionary.
TigerhawkT3 is right. Dicts are unordered in a sense that there is no guaranteed order or keys in the given dictionary. You can even have different order of keys if you iterate over same dictionary. If you want order you need to use either OrderedDict or just plain list. You can convert your dict to list and sort it the way it represents the order you need.
Without knowing what your objects are and whether self in the example is a user instance or an environment instance it is hard to come up with a solution. But if self in the example is the environment, its Class could have a class attribute that is a dictionary of all current users and their last known position, if the user instance is hashable.
Something like this
class Thing(object):
__user_regions = {}
def where_ami(self, user):
try:
region = self.__user_regions[user]
print 'AHA!! I know where you are!!'
except KeyError:
# find region
print 'Hmmmm. let me think about that'
region = 'foo'
self.__user_regions[user] = region
class User(object):
def __init__(self, position):
self.pos = position
thing = Thing()
thing2 = Thing()
u = User((1,2))
v = User((3,4))
Now you can try to retrieve the user's region from the class attribute. If there is more than one Thing they would share that class attribute.
>>>
>>> thing._Thing__user_regions
{}
>>> thing2._Thing__user_regions
{}
>>>
>>> thing.where_ami(u)
Hmmmm. let me think about that
>>>
>>> thing._Thing__user_regions
{<__main__.User object at 0x0433E2B0>: 'foo'}
>>> thing2._Thing__user_regions
{<__main__.User object at 0x0433E2B0>: 'foo'}
>>>
>>> thing2.where_ami(v)
Hmmmm. let me think about that
>>>
>>> thing._Thing__user_regions
{<__main__.User object at 0x0433EA90>: 'foo', <__main__.User object at 0x0433E2B0>: 'foo'}
>>> thing2._Thing__user_regions
{<__main__.User object at 0x0433EA90>: 'foo', <__main__.User object at 0x0433E2B0>: 'foo'}
>>>
>>> thing.where_ami(u)
AHA!! I know where you are!!
>>>
You say that you "don't want an additional overhead of sorting the dictionary in any particular order". What overhead? Presumably OrderedDict uses some additional data structure internally to keep track of the order of keys. But unless you know that this is costing you too much memory, then OrderedDict is your solution. That means profiling your code and making sure that an OrderedDict is the source of your bottleneck.
If you want the cleanest code, just use an OrderedDict. It has a move_to_back method which takes a key and puts it either in the front of the dictionary, or at the end. For example:
from collections import OrderedDict
animals = OrderedDict([('cat', 1), ('dog', 2), ('turtle', 3), ('lizard', 4)])
def check_if_turtle(animals):
for animal in animals:
print('Checking %s...' % animal)
if animal == 'turtle':
animals.move_to_end('turtle', last=False)
return True
else:
return False
Our check_if_turtle function looks through an OrderedDict for a turtle key. If it doesn't find it, it returns False. If it does find it, it returns True, but not after moving the turtle key to the beginning of the OrderedDict.
Let's try it. On the first run:
>>> check_if_turtle(animals)
Checking cat...
Checking dog...
Checking turtle...
True
we see that it checked all of the keys up to turtle. Now, if we run it again:
>>> check_if_turtle(animals)
Checking turtle...
True
we see that it checked the turtle key first.
Related
I am given an assignment when I am supposed to define a function that returns the second element of a tuple if the first element of a tuple matches with the argument of a function.
Specifically, let's say that I have a list of student registration numbers that goes by:
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
And I have defined a function that is supposed to take in the argument of reg_num, such as "S12345", and return the name of the student in this case, "John". If the number does not match at all, I need to print "Not found" as a message. In essence, I understand that I need to sort through the larger tuple, and compare the first element [0] of each smaller tuple, then return the [1] entry of each smaller tuple. Here's what I have in mind:
def get_student_name(reg_num, particulars):
for i in records:
if reg_num == particulars[::1][0]:
return particulars[i][1]
else:
print("Not found")
I know I'm wrong, but I can't tell why. I'm not well acquainted with how to sort through a tuple. Can anyone offer some advice, especially in syntax? Thank you very much!
When you write for i in particulars, in each iteration i is an item of the collection and not an index. As such you cannot do particulars[i] (and there is no need - as you already have the item). In addition, remove the else statement so to not print for every item that doesn't match condition:
def get_student_name(reg_num, particulars):
for i in particulars:
if reg_num == i[0]:
return i[1]
print("Not found")
If you would want to iterate using indices you could do (but less nice):
for i in range(len(particulars)):
if reg_num == particulars[i][0]:
return particulars[i][1]
Another approach, provided to help learn new tricks for manipulating python data structures:
You can turn you tuple of tuples:
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
into a dictionary:
>>> pdict = dict(particulars)
>>> pdict
{'S12345': 'John', 'S23456': 'Max', 'S34567': 'Mary'}
You can look up the value by supplying the key:
>>> r = 'S23456'
>>> dict(pdict)[r]
'Max'
The function:
def get_student_name(reg, s_data):
try:
return dict(s_data)[reg]
except:
return "Not Found"
The use of try ... except will catch errors and just return Not Found in the case where the reg is not in the tuple in the first place. It will also catch of the supplied tuple is not a series of PAIRS, and thus cannot be converted the way you expect.
You can read more about exceptions: the basics and the docs to learn how to respond differently to different types of error.
for loops in python
Gilad Green already answered your question with a way to fix your code and a quick explanation on for loops.
Here are five loops that do more or less the same thing; I invite you to try them out.
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
for t in particulars:
print("{} {}".format(t[0], t[1]))
for i in range(len(particulars)):
print("{}: {} {}".format(i, particulars[i][0], particulars[i][1]))
for i, t in enumerate(particulars):
print("{}: {} {}".format(i, t[0], t[1]))
for reg_value, student_name in particulars:
print("{} {}".format(reg_value, student_name))
for i, (reg_value, student_name) in enumerate(particulars):
print("{}: {} {}".format(i, reg_value, student_name))
Using dictionaries instead of lists
Most importantly, I would like to add that using an unsorted list to store your student records is not the most efficient way.
If you sort the list and maintain it in sorted order, then you can use binary search to search for reg_num much faster than browsing the list one item at a time. Think of this: when you need to look up a word in a dictionary, do you read all words one by one, starting by "aah", "aback", "abaft", "abandon", etc.? No; first, you open the dictionary somewhere in the middle; you compare the words on that page with your word; then you open it again to another page; compare again; every time you do that, the number of candidate pages diminishes greatly, and so you can find your word among 300,000 other words in a very small time.
Instead of using a sorted list with binary search, you could use another data structure, for instance a binary search tree or a hash table.
But, wait! Python already does that very easily!
There is a data structure in python called a dictionary. See the documentation on dictionaries. This structure is perfectly adapted to most situations where you have keys associated to values. Here the key is the reg_number, and the value is the student name.
You can define a dictionary directly:
particulars = {'S12345': 'John', 'S23456': 'Max', 'S34567': 'Mary'}
Or you can convert your list of tuples to a dictionary:
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
particulars_as_dict = dict(particulars)
Then you can check if an reg_number is in the dictionary, with they keyword in; you can return the student name using square brackets or with the method get:
>>> particulars = {'S12345': 'John', 'S23456': 'Max', 'S34567': 'Mary'}
>>> 'S23456' in particulars
True
>>> 'S98765' in particulars
False
>>>
>>> particulars['S23456']
'Max'
>>> particulars.get('S23456')
'Max'
>>> particulars.get('S23456', 'not found')
'Max'
>>>
>>> particulars['S98765']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'S98765'
>>> particulars.get('S98765')
None
>>> particulars.get('S98765', 'not found')
'not found'
The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:
What is setdefault still useful for, today in Python 2.6/2.7?
What popular use cases of setdefault were superseded with collections.defaultdict?
You could say defaultdict is useful for settings defaults before filling the dict and setdefault is useful for setting defaults while or after filling the dict.
Probably the most common use case: Grouping items (in unsorted data, else use itertools.groupby)
# really verbose
new = {}
for (key, value) in data:
if key in new:
new[key].append( value )
else:
new[key] = [value]
# easy with setdefault
new = {}
for (key, value) in data:
group = new.setdefault(key, []) # key might exist already
group.append( value )
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
new[key].append( value ) # all keys have a default already
Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Think you use something HTTP-ish with many headers -- some are optional, but you want defaults for them:
headers = parse_headers( msg ) # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
headers.setdefault( headername, defaultvalue )
I commonly use setdefault for keyword argument dicts, such as in this function:
def notify(self, level, *pargs, **kwargs):
kwargs.setdefault("persist", level >= DANGER)
self.__defcon.set(level, **kwargs)
try:
kwargs.setdefault("name", self.client.player_entity().name)
except pytibia.PlayerEntityNotFound:
pass
return _notify(level, *pargs, **kwargs)
It's great for tweaking arguments in wrappers around functions that take keyword arguments.
defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.
For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value. Likewise, defaultdict(intGen()) always produces 1.
Instead, I used a regular dict:
nextID = intGen()
myDict = {}
for lots of complicated stuff:
#stuff that generates unpredictable, possibly already seen str
strID = myDict.setdefault(myStr, nextID())
Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.
intGen is a tiny class I build that automatically increments an int and returns its value:
class intGen:
def __init__(self):
self.i = 0
def __call__(self):
self.i += 1
return self.i
If someone has a way to do this with defaultdict I'd love to see it.
As most answers state setdefault or defaultdict would let you set a default value when a key doesn't exist. However, I would like to point out a small caveat with regard to the use cases of setdefault. When the Python interpreter executes setdefaultit will always evaluate the second argument to the function even if the key exists in the dictionary. For example:
In: d = {1:5, 2:6}
In: d
Out: {1: 5, 2: 6}
In: d.setdefault(2, 0)
Out: 6
In: d.setdefault(2, print('test'))
test
Out: 6
As you can see, print was also executed even though 2 already existed in the dictionary. This becomes particularly important if you are planning to use setdefault for example for an optimization like memoization. If you add a recursive function call as the second argument to setdefault, you wouldn't get any performance out of it as Python would always be calling the function recursively.
Since memoization was mentioned, a better alternative is to use functools.lru_cache decorator if you consider enhancing a function with memoization. lru_cache handles the caching requirements for a recursive function better.
I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.
As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.
Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.
A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.
However, an interesting use case comes up from the standard library (Python 2.6, _threadinglocal.py):
>>> mydata = local()
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
I would say that using __dict__.setdefault is a pretty useful case.
Edit: As it happens, this is the only example in the standard library and it is in a comment. So may be it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:
Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writeable at any time after the object creation. It is also a dictionary not a defaultdict. It is not sensible for objects in the general case to have __dict__ as a defaultdict because that would make each object having all legal identifiers as attributes. So I can't foresee any change to Python objects getting rid of __dict__.setdefault, apart from deleting it altogether if it was deemed not useful.
I rewrote the accepted answer and facile it for the newbies.
#break it down and understand it intuitively.
new = {}
for (key, value) in data:
if key not in new:
new[key] = [] # this is core of setdefault equals to new.setdefault(key, [])
new[key].append(value)
else:
new[key].append(value)
# easy with setdefault
new = {}
for (key, value) in data:
group = new.setdefault(key, []) # it is new[key] = []
group.append(value)
# even simpler with defaultdict
new = defaultdict(list)
for (key, value) in data:
new[key].append(value) # all keys have a default value of empty list []
Additionally,I categorized the methods as reference:
dict_methods_11 = {
'views':['keys', 'values', 'items'],
'add':['update','setdefault'],
'remove':['pop', 'popitem','clear'],
'retrieve':['get',],
'copy':['copy','fromkeys'],}
One drawback of defaultdict over dict (dict.setdefault) is that a defaultdict object creates a new item EVERYTIME non existing key is given (eg with ==, print). Also the defaultdict class is generally way less common then the dict class, its more difficult to serialize it IME.
P.S. IMO functions|methods not meant to mutate an object, should not mutate an object.
Here are some examples of setdefault to show its usefulness:
"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)
# To retrieve a list of the values for a key
list_of_values = d[key]
# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)
# Despite the empty lists, it's still possible to
# test for the existance of values easily:
if d.has_key(key) and d[key]:
pass # d has some values for key
# Note: Each value can exist multiple times!
"""
e = {}
print e
e.setdefault('Cars', []).append('Toyota')
print e
e.setdefault('Motorcycles', []).append('Yamaha')
print e
e.setdefault('Airplanes', []).append('Boeing')
print e
e.setdefault('Cars', []).append('Honda')
print e
e.setdefault('Cars', []).append('BMW')
print e
e.setdefault('Cars', []).append('Toyota')
print e
# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print e
# NOTE: it's still true that ('Toyota' in e['Cars'])
I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:
# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')
Less succinctly, this looks like this:
# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
os.environ['VENV_DIR'] = '/my/default/path')
It's worth noting that you can also use the resulting variable:
venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')
But that's less necessary than it was before defaultdicts existed.
Another use case that I don't think was mentioned above.
Sometimes you keep a cache dict of objects by their id where primary instance is in the cache and you want to set cache when missing.
return self.objects_by_id.setdefault(obj.id, obj)
That's useful when you always want to keep a single instance per distinct id no matter how you obtain an obj each time. For example when object attributes get updated in memory and saving to storage is deferred.
One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).
For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:
from enum import IntFlag, auto
import threading
class TestFlag(IntFlag):
one = auto()
two = auto()
three = auto()
four = auto()
five = auto()
six = auto()
seven = auto()
eight = auto()
def __eq__(self, other):
return self is other
def __hash__(self):
return hash(self.value)
seen = set()
class cycle_enum(threading.Thread):
def run(self):
for i in range(256):
seen.add(TestFlag(i))
threads = []
for i in range(8):
threads.append(cycle_enum())
for t in threads:
t.start()
for t in threads:
t.join()
len(seen)
# 272 (should be 256)
The solution is to use setdefault() as the last step of saving the computed composite member -- if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.
In addition to what have been suggested, setdefault might be useful in situations where you don't want to modify a value that has been already set. For example, when you have duplicate numbers and you want to treat them as one group. In this case, if you encounter a repeated duplicate key which has been already set, you won't update the value of that key. You will keep the first encountered value. As if you are iterating/updating the repeated keys once only.
Here's a code example of recording the index for the keys/elements of a sorted list:
nums = [2,2,2,2,2]
d = {}
for idx, num in enumerate(sorted(nums)):
# This will be updated with the value/index of the of the last repeated key
# d[num] = idx # Result (sorted_indices): [4, 4, 4, 4, 4]
# In the case of setdefault, all encountered repeated keys won't update the key.
# However, only the first encountered key's index will be set
d.setdefault(num,idx) # Result (sorted_indices): [0, 0, 0, 0, 0]
sorted_indices = [d[i] for i in nums]
[Edit] Very wrong! The setdefault would always trigger long_computation, Python being eager.
Expanding on Tuttle's answer. For me the best use case is cache mechanism. Instead of:
if x not in memo:
memo[x]=long_computation(x)
return memo[x]
which consumes 3 lines and 2 or 3 lookups, I would happily write :
return memo.setdefault(x, long_computation(x))
I like the answer given here:
http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html
In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).
The different use case for setdefault() is when you don't want to overwrite the value of an already set key. defaultdict overwrites, while setdefault() does not. For nested dictionaries it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the present sub dictionary. This is when you use setdefault().
Example with defaultdict:
>>> from collection import defaultdict()
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})
setdefault doesn't overwrite:
>>> bar = dict()
>>> bar.setdefault('a', 4)
>>> bar.setdefault('a', 2)
>>> print(bar)
{'a': 4}
Another usecase for setdefault in CPython is that it is atomic in all cases, whereas defaultdict will not be atomic if you use a default value created from a lambda.
cache = {}
def get_user_roles(user_id):
if user_id in cache:
return cache[user_id]['roles']
cache.setdefault(user_id, {'lock': threading.Lock()})
with cache[user_id]['lock']:
roles = query_roles_from_database(user_id)
cache[user_id]['roles'] = roles
If two threads execute cache.setdefault at the same time, only one of them will be able to create the default value.
If instead you used a defaultdict:
cache = defaultdict(lambda: {'lock': threading.Lock()}
This would result in a race condition. In my example above, the first thread could create a default lock, and the second thread could create another default lock, and then each thread could lock its own default lock, instead of the desired outcome of each thread attempting to lock a single lock.
Conceptually, setdefault basically behaves like this (defaultdict also behaves like this if you use an empty list, empty dict, int, or other default value that is not user python code like a lambda):
gil = threading.Lock()
def setdefault(dict, key, value_func):
with gil:
if key not in dict:
return
value = value_func()
dict[key] = value
Conceptually, defaultdict basically behaves like this (only when using python code like a lambda - this is not true if you use an empty list):
gil = threading.Lock()
def __setitem__(dict, key, value_func):
with gil:
if key not in dict:
return
value = value_func()
with gil:
dict[key] = value
I'm doing a data mining homework with python(2.7). I created a weight dict for all words(that exist in the category), and for the words that don't exist in this dict, i want to assign a default value.
First I tried with setdefault for everykey before using it, it works perfectly, but somehow I think it doesn't look so pythonic. Therefore I tried using defaultdict, which works just fine most of the time. However, sometimes it returns an incorrect value. First I thought it could be caused by defaultdict or lambda function, but apparently there are no errors.
for node in globalTreeRoot.traverse():
...irrelevant...
weight_dict = {.......}
default_value = 1.0 / (totalwords + dictlen)
node.default_value = 1.0/ (totalwords + dictlen)
......
node.weight_dict_ori = weight_dict
node.weight_dict = defaultdict(lambda :default_value,weight_dict)
So, when I tried to print a value that doesn't exist during the loop, it gives me a correct value. However, after the code finishes running, when I try:
print node.weight_dict["doesnotexist"],
it gives me an incorrect value, and when incorrect usually a value related to some other node. I tried search python naming system or assign values to object attributes dynamically, but didn't figure it out.
By the way, is defaultdict faster than using setdefault(k,v) each time?
This is not a use case of defaultdict.
Instead, simply use get to get values from the dictionary.
val = dict.get("doesnotexist", 1234321)
is perfectly acceptable python "get" has a second parameter, the default value if the key was not found.
If you only need this for "get", defaultdict is a bit overkill. It is meant to be used like this:
example = defaultdict(list)
example[key].append(1)
without having to initialize the key-list combination explicitly each time. For numerical values, the improvements are marginal:
ex1, ex2 = dict, defaultdict(lambda: 0)
ex1[key] = ex1.get(key, 0) + 1
ex2[key] += 1
Your original problem probably is because you reused the variable storing the weight. Make sure it is local to the loop!
var = 1
ex3 = defaultdict(lambda: var)
var = 2
print ex3[123]
is supposed to return the current value of var=2. It's not substituted into the dictionary at initialization, but behaves as if you had define a function at this position, accessing the "outer" variable var.
A hack is this:
def constfunc(x):
return lambda: x
ex3 = defaultdict(constfunc(var))
Now constfunc is evaluated at initialization, x is a local variable of the invocation, and the lambda now will return an x which does not change anymore. I guess you can inline this (untested):
ex3 = defaultdict((lambda x: lambda: x)(var))
Behold, the magics of Python, capturing "closures" and the anomalies of imperative languages pretending to do functional programming.
setdefault is definitely what you should use to set a default value.
for node in globalTreeRoot.traverse():
node.default_value = 1.0 / (totalwords + dictlen)
node.weight_dict = {}
# if you did want to use a defaultdict here for some reason, it would be
# node.weight_dict = defaultdict(lambda: node.default_value)
for word in wordlist:
value = node.weight_dict.setdefault(word, node.default_value)
Apparently, there is something wrong with defaultdict.
d1 = {"a":10,"b":9,"c":8}
seven = 7
d2 = defaultdict(lambda :seven,d1)
seven = 8
d3 = defaultdict(lambda :seven,d1)
And the result:
>>> d2[4234]
8
I still don't understand why it works this way. As for my work, I'll stick with setdefault.
UPDATE:
Thanks for answering. I misunderstood how variable scoping works in Python.
I have a function that takes given initial conditions for a set of variables and puts the result into another global variable. For example, let's say two of these variables is x and y. Note that x and y must be global variables (because it is too messy/inconvenient to be passing large amounts of references between many functions).
x = 1
y = 2
def myFunction():
global x,y,solution
print(x)
< some code that evaluates using a while loop >
solution = <the result from many iterations of the while loop>
I want to see how the result changes given a change in the initial condition of x and y (and other variables). For flexibility and scalability, I want to do something like this:
varSet = {'genericName0':x, 'genericName1':y} # Dict contains all variables that I wish to alter initial conditions for
R = list(range(10))
for r in R:
varSet['genericName0'] = r #This doesn't work the way I want...
myFunction()
Such that the 'print' line in 'myFunction' outputs the values 0,1,2,...,9 on successive calls.
So basically I'm asking how do you map a key to a value, where the value isn't a standard data type (like an int) but is instead a reference to another value? And having done that, how do you reference that value?
If it's not possible to do it the way I intend: What is the best way to change the value of any given variable by changing the name (of the variable that you wish to set) only?
I'm using Python 3.4, so would prefer a solution that works for Python 3.
EDIT: Fixed up minor syntax problems.
EDIT2: I think maybe a clearer way to ask my question is this:
Consider that you have two dictionaries, one which contains round objects and the other contains fruit. Members of one dictionary can also belong to the other (apples are fruit and round). Now consider that you have the key 'apple' in both dictionaries, and the value refers to the number of apples. When updating the number of apples in one set, you want this number to also transfer to the round objects dictionary, under the key 'apple' without manually updating the dictionary yourself. What's the most pythonic way to handle this?
Instead of making x and y global variables with a separate dictionary to refer to them, make the dictionary directly contain "x" and "y" as keys.
varSet = {'x': 1, 'y': 2}
Then, in your code, whenever you want to refer to these parameters, use varSet['x'] and varSet['y']. When you want to update them use varSet['x'] = newValue and so on. This way the dictionary will always be "up to date" and you don't need to store references to anything.
we are going to take an example of fruits as given in your 2nd edit:
def set_round_val(fruit_dict,round_dict):
fruit_set = set(fruit_dict)
round_set = set(round_dict)
common_set = fruit_set.intersection(round_set) # get common key
for key in common_set:
round_dict[key] = fruit_dict[key] # set modified value in round_dict
return round_dict
fruit_dict = {'apple':34,'orange':30,'mango':20}
round_dict = {'bamboo':10,'apple':34,'orange':20} # values can even be same as fruit_dict
for r in range(1,10):
fruit_set['apple'] = r
round_dict = set_round_val(fruit_dict,round_dict)
print round_dict
Hope this helps.
From what I've gathered from the responses from #BrenBarn and #ebarr, this is the best way to go about the problem (and directly answer EDIT2).
Create a class which encapsulates the common variable:
class Count:
__init__(self,value):
self.value = value
Create the instance of that class:
import Count
no_of_apples = Count.Count(1)
no_of_tennis_balls = Count.Count(5)
no_of_bananas = Count.Count(7)
Create dictionaries with the common variable in both of them:
round = {'tennis_ball':no_of_tennis_balls,'apple':no_of_apples}
fruit = {'banana':no_of_bananas,'apple':no_of_apples}
print(round['apple'].value) #prints 1
fruit['apple'].value = 2
print(round['apple'].value) #prints 2
I have a question reguarding how I would perform the following task in python.
(I use python 3k)
what I have are several variables which can yield further variables on top of those
and each of those have even more variables
for example:
a generic name would be
item_version_type =
where each part (item, version, and type) refer to different variables(here there are 3 for each)
item = item_a, item_b, item_c
version = range(1,3)
itemtype = itemtype_a, itemtype_b, itemtype_c
simply listing each name and defining it is annoying:
itema_ver1_typea =
itemb_ver1_typea =
itemc_ver1_typea =
itema_ver2_typea =
etc.
etc.
etc.
especially when I have something where one variable is dependent on something else
for example:
if value == True:
version = ver + 1
and to top it off this whole example is rather simply compared to what I'm actually
working with.
one thing I am curious about is using multiple "." type of classes such as:
item.version.type
I know that this can be done
I just can't figure out how to get a class with more than one dot
either that or if anyone can point me to a better method
Thanks for help.
Grouping of data like this can be done in three ways in Python.
First way is tuples:
myvariable = ('Sammalamma', 1, 'Text')
The second way is a dictionary:
myvariable = {'value': 'Sammalamma', 'version': 1, 'type': 'Text'}
And the third way is a class:
class MyClass(object):
def __init__(self, value, version, type):
self.value = value
self.version = version
self.type = type
>>> myvariable = MyClass('Sammalamma', 1, 'Text')
>>> myvariable.value
'Sammalamma'
>>> myvariable.version
1
>>> myvariable.type
'Text'
Which one to use in each case is up to you, although in this case I would claim that the tuple doesn't seem to be the best choice, I would go for a dictionary or a class.
None of this is unique to Python 3, it works in any version of Python.
In addition to #Lennart Regebro's answer if items are immutable:
import collections
Item = collections.namedtuple('Item', 'value version type')
items = [Item(val, 'ver'+ver, t)
for val in 'abc' for ver in '12' for t in ['typea']]
print(items[0])
# -> Item(value='a', version='ver1', type='typea')
item = items[1]
print(item.value, item.type)
# -> b typea
sorry for posting this here instead of the comments but I have no clue how to work the site here.
for clarification
what I need is basically to have be able to get an output of said such as where
I could take a broad area (item) narrow it further (version) and even further (type as in type of item like lets say types are spoon, knife, fork)
or a better description is like arm.left.lower = lower left arm
where I could also have like leg.left.lower
so I could have arm.both.upper to get both left and right upper arms
where a value would be assigned to both.
what I need is to be able to do truth tests etc. and have it return the allowable values
such as
if leg == True
output is --> leg.both.lower, leg.both.upper, leg.left.upper leg.right.upper, etc., etc., etc.
if upper == True
output is --> leg.both.upper, leg.left.upper, etc., etc., etc.
hopefully that helps
Basically I get how to get something like item.version but how do I get something
like item.version.type
I need to have it to be more specific than just item.version
I need to be able to tell if item is this and version is that then type will be x
like
item.version.type
if version == 3:
item.version = spoon.3.bent
#which is different from
if version == 2:
item.version.type = spoon.2.bent