Dictionary use instead of dynamic variable names in Python

I have a long text file containing truck configurations. Each line lists some properties of a truck as a string. Each property has its own fixed-width field in the string, such as:
2 characters = number of axles
2 characters = weight of the first axle
2 characters = weight of the second axle
...
2 characters = weight of the last axle
2 characters = length of the first axle spacing (spacing means distance between axles)
2 characters = length of the second axle spacing
...
2 characters = length of the last axle spacing
As an example:
031028331004
refers to:
number of axles = 3
first axle weight = 10
second axle weight = 28
third axle weight = 33
first spacing = 10
second spacing = 4
Now, you have an idea about my file structure, here is my problem: I would like to group these trucks in separate lists, and name the lists in terms of axle spacings. Let's say I am using a boolean type of approach, and if the spacing is less than 6, the boolean is 1, if it is greater than 6, the boolean is 0. To clarify, possible outcomes in a three axle truck becomes:
00 #Both spacings > 6
10 #First spacing < 6, second > 6
01 #First spacing > 6, second < 6
11 #Both spacings < 6
Now, as you see there are not too many outcomes for a 3 axle truck. However, if I have a 12 axle truck, the number of "possible" combinations go haywire. The thing is, in reality you would not see all "possible" combinations of axle spacings in a 12 axle truck. There are certain combinations (I don't know which ones, but to figure it out is my aim) with a number much less than the "possible" number of combinations.
I would like the code to create lists and fill them with the strings that define the properties I mentioned above if only such a combination exists. I thought maybe I should create lists with variable names such as:
truck_0300[]
truck_0301[]
truck_0310[]
truck_0311[]
on the fly. However, from what I read in SF and other sources, this is strongly discouraged. How would you do it using the dictionary concept? I understand that dictionaries are like 2 dimensional arrays, with a key (in my case the keys would be something like truck_0300, truck_0301 etc.) and value pair (again in my case, the values would probably be lists that hold the actual strings that belong to the corresponding truck type), however I could not figure out how to create that dictionary, and populate it with variable keys and values.
Any insight would be welcome!
Thanks a bunch!

You are definitely correct that it is almost always a bad idea to try and create "dynamic variables" in a scope. Dictionaries usually are the answer to build up a collection of objects over time and reference back to them...
I don't fully understand your application and format, but in general to define and use your dictionary it would look like this:
trucks = {}
trucks['0300'] = ['a']
trucks['0300'].append('c')
trucks['0300'].extend(['c','d'])
aTruck = trucks['0300']
Now since every one of these should be a list of your strings, you might just want to use a defaultdict, and tell it to use a list as the default value for nonexistent keys:
from collections import defaultdict
trucks = defaultdict(list)
trucks['0300']
# []
Note that even though it was a brand new dict that contained no entries, the '0300' key still returns a new list. This means you don't have to check for the key. Just append:
trucks = defaultdict(list)
trucks['0300'].append('a')
A defaultdict is probably what you want, since you do not have to pre-define keys at all. It is there when you are ready for it.
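To connect this back to the truck file, here is a minimal Python 3 sketch of how the grouping could look. The helper name spacing_key, the threshold parameter, and the sample records other than the one from the question are assumptions made for illustration. The question does not say how a spacing of exactly 6 should be flagged, so this sketch treats it as '0'.

```python
from collections import defaultdict

def spacing_key(record, threshold=6):
    """Build a key such as '0301' from a fixed-width truck record (hypothetical helper)."""
    n_axles = int(record[:2])
    # After the 2-character axle count come n weights, then n - 1 spacings.
    fields = [int(record[i:i + 2]) for i in range(2, len(record), 2)]
    spacings = fields[n_axles:]
    # '1' when the spacing is below the threshold, '0' otherwise.
    flags = ''.join('1' if s < threshold else '0' for s in spacings)
    return record[:2] + flags

# Only the first record comes from the question; the rest are made up.
records = ["031028331004", "031530351210", "031129300903", "02122005"]

trucks = defaultdict(list)
for record in records:
    trucks[spacing_key(record)].append(record)

# Only combinations that actually occur end up as keys.
print(sorted(trucks))
```

Because keys are created on first use, sorted(trucks) lists exactly the spacing combinations present in the data, which is what the question is trying to find out.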
Getting key for the max value
From your comments, here is an example of how to get the key with the max value of a dictionary. It is pretty easy, as you just use max and define how it should determine the key to use for the comparisons:
d = {'a':10, 'b':5, 'c':50}
print max(d.iteritems(), key=lambda (k,v): v)
# ('c', 50)
d['c'] = 1
print max(d.iteritems(), key=lambda (k,v): v)
# ('a', 10)
All you have to do is define how to produce a comparison key. In this case I just tell it to take the value as the key. For really simple key functions like this, where you are just telling it to pull an index or attribute from the object, you can make it more efficient by using the operator module so that the key function is in C and not in Python as a lambda:
from operator import itemgetter
...
print max(d.iteritems(), key=itemgetter(1))
#('c', 50)
itemgetter creates a new callable that will pull the second item from the tuple that is passed in by the loop.
Now assume each value is actually a list (similar to your structure). We will make it a list of numbers, and you want to find the key which has the list with the largest total:
d = {'a': range(1,5), 'b': range(2,4), 'c': range(5,7)}
print max(d.iteritems(), key=lambda (k,v): sum(v))
# ('c', [5, 6])
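The snippets above are Python 2 (print statement, iteritems, and a lambda that unpacks a tuple in its parameter list). On Python 3 neither iteritems nor tuple parameters exist, so a rough equivalent, under the assumption that the same dictionaries are used, could be written like this:

```python
from operator import itemgetter

d = {'a': 10, 'b': 5, 'c': 50}

# (key, value) pair with the largest value
best_item = max(d.items(), key=itemgetter(1))
# just the key with the largest value
best_key = max(d, key=d.get)

# values that are lists: pick the entry whose list has the largest total
lists = {'a': list(range(1, 5)), 'b': list(range(2, 4)), 'c': list(range(5, 7))}
largest = max(lists.items(), key=lambda kv: sum(kv[1]))

print(best_item, best_key, largest)
```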

If the number of keys is more than 10,000, then this method is not viable. Otherwise define a dictionary d = {} and loop over your lines:
key = line[:4]
if key not in d:
    d[key] = []
d[key].append(somevalue)
I hope this helps.

Here's a complete solution from string to output:
from collections import namedtuple, defaultdict
# lightweight class
Truck = namedtuple('Truck', 'weights spacings')
def parse_truck(s):
    # convert to array of numbers
    numbers = [int(''.join(t)) for t in zip(s[::2], s[1::2])]
    # check length
    n = numbers[0]
    assert n * 2 == len(numbers)
    numbers = numbers[1:]
    return Truck(numbers[:n], numbers[n:])
trucks = [
    parse_truck("031028331004"),
    ...
]
# dictionary where every key contains a list by default
trucks_by_spacing = defaultdict(list)
for truck in trucks:
    # (True, False) instead of '10'
    key = tuple(space > 6 for space in truck.spacings)
    trucks_by_spacing[key].append(truck)
print trucks_by_spacing
print trucks_by_spacing[True, False]

Related

Intersection of set of values corresponding to given keys from several dictionaries

I am creating a variable that measures the size of a sub-sample of values for a given key in 3 different dictionaries. For example, I want the values common to key a1 in dictionary dict_a, key b2 in dictionary dict_b and key c5 in dictionary dict_c (i.e. the intersection of the sets of values corresponding to given keys from the 3 dictionaries).
I have written code that does it using a loop as follows:
import numpy as np
dict_a = {'a1':[1,3,4], 'a2':[1,5,6,7,8,9,13]}
dict_b = {'b1':[85,7,25], 'b2':[1,8,10,70], 'b3':[1,5,69,13], 'b4':[1,75,15,30]}
dict_c = {'c1':[1,3,4], 'c2':[725,58,2,89], 'c3':[5,684,6,8,2], 'c4':[4,8,88,55,75,2,8], 'c5':[8,5,6,28,24,6], 'c6':[8,52,3,58,26,2]}
keys_a = list(dict_a.keys())
keys_b = list(dict_b.keys())
keys_c = list(dict_c.keys())
a= []
b= []
c= []
size = []
for y in keys_a:
    for u in keys_b:
        for w in keys_c:
            a.append(u)
            b.append(w)
            c.append(y)
            # Define subsample
            subsample = np.intersect1d(dict_a[y],dict_b[u],dict_c[w])
            size.append(len(subsample))
The problem is that my dictionaries are much bigger than in the example and this takes a long time to run.
Is there a way to make this more efficient?

How about using sets?
size = []
for y in keys_a:
    for u in keys_b:
        for w in keys_c:
            common = set.intersection(set(dict_a[y]),
                                      set(dict_b[u]),
                                      set(dict_c[w]))
            size.append(len(common))
Calculating the intersection of sets should also be a lot faster than converting the lists of numbers to arrays first and then using np.intersect1d.
You can use this approach with any hashable types in your lists.

I'm going to chop this into a few bits. First generating the a, b and c lists, and then the main size list using numpy and finally doing the same with Python lists.
Getting the key lists
So if we look then c is actually a list of keys from dict_a and so on. I'm going to assume that is on purpose, but if it's not then replace y with key_a and you'll see what I mean.
We can calculate this easily up front without getting into the main loop. Each item is just repeated by the product of the number of keys in the other two lists. We can do that with something like:
from itertools import repeat, chain
def key_summary(dict_1, dict_2, dict_3):
    count = len(dict_2) * len(dict_3)
    return chain(*(repeat(k, count) for k in dict_1.keys()))
a = list(key_summary(dict_b, dict_a, dict_c))
b = list(key_summary(dict_c, dict_a, dict_b))
c = list(key_summary(dict_a, dict_b, dict_c))
This should be faster, as it's not deeply nested in the loops, but given how easy it is to calculate I think you might want to think about why you need this. Could you achieve your goal without actually making the lists?
Getting the size list
I don't think you are using the intersect1d() function correctly. The docs state that the third argument is assume_unique, which is not what I think you are trying to do. I'm assuming you want the elements which appear in all three lists. One way to do that is:
np.intersect1d(np.intersect1d(val_a, val_b), val_c)
This suggests a way to optimize the loop. Instead of calculating the intersection of val_a and val_b inside every loop, we can instead do it once and re-use it.
for val_a in dict_a.values():
    for val_b in dict_b.values():
        # Get the intersection of a and b first
        cache = np.intersect1d(val_a, val_b)
        if not len(cache):
            # Our first two sets have nothing in common, we know that we are
            # just going to add a bunch of zeros for everything in dict_c
            size.extend(repeat(0, len(dict_c)))
        else:
            size.extend(
                len(np.intersect1d(cache, val_c)) for val_c in dict_c.values())
This also allows us to apply one more optimization which is to skip looping over dict_c at all if the intersection of val_a and val_b has nothing in it. We could also do something similar if val_a is ever empty.
As a final optimization you should always have dict_a be the smallest and dict_c the largest, as this gives us the best chance of skipping steps.
Doing the above netted me a speedup of nearly 2x (1.493ms -> 0.8ms on the example given).
Getting the size list (using Python sets)
I'm assuming you are using the numpy functions for a good reason, but if they aren't essential you can convert your lists to sets, which are very fast to perform intersections on in Python. We can follow a pretty similar approach as above:
dset_a = {k: set(v) for k, v in dict_a.items()}
dset_b = {k: set(v) for k, v in dict_b.items()}
dset_c = {k: set(v) for k, v in dict_c.items()}
size = []
for val_a in dset_a.values():
    for val_b in dset_b.values():
        cache = val_a & val_b
        if not cache:
            size.extend(repeat(0, len(dict_c)))
        else:
            size.extend(len(cache & val_c) for val_c in dset_c.values())
This is vastly faster on the given example. This took 0.019ms vs. 1.493ms for the original (about 80x faster!).
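If the separate a, b and c key lists are not really needed, one option is to key the sizes by the key triple itself. The following is only a sketch under that assumption: it uses trimmed-down copies of the example dictionaries, and the names sets_a, sets_b, sets_c and sizes are made up for illustration.

```python
from itertools import product

# Trimmed-down versions of the dictionaries from the question.
dict_a = {'a1': [1, 3, 4], 'a2': [1, 5, 6, 7, 8, 9, 13]}
dict_b = {'b1': [85, 7, 25], 'b2': [1, 8, 10, 70]}
dict_c = {'c1': [1, 3, 4], 'c5': [8, 5, 6, 28, 24, 6]}

# Convert every list to a set once, outside the loops.
sets_a = {k: set(v) for k, v in dict_a.items()}
sets_b = {k: set(v) for k, v in dict_b.items()}
sets_c = {k: set(v) for k, v in dict_c.items()}

# Map each (key_a, key_b, key_c) triple to the size of the common subset.
sizes = {
    (ka, kb, kc): len(sa & sb & sc)
    for (ka, sa), (kb, sb), (kc, sc) in product(
        sets_a.items(), sets_b.items(), sets_c.items())
}

print(sizes[('a2', 'b2', 'c5')])
```

This keeps each size attached to the keys that produced it, at the cost of not skipping the inner loop when the first two sets are disjoint.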

Iterating over dictionaries within dictionaries, dictionary object turning into string?

test = {'a':{'aa':'value','ab':'value'},'b':{'aa':'value','ab':'value'}}
#test 1
for x in test:
    print(x['aa'])
#test 2
for x in test:
    print(test[x]['aa'])
Why does test 1 give me a TypeError: string indices must be integers but test 2 pass?
Does the for loop turn the dictionary into a string?

If you iterate over a dictionary, you iterate over the keys. So that means in the first iteration x = 'a', and in the second x = 'b' (or vice versa on Python versions before 3.7, where dictionaries did not guarantee any order). It thus simply "ignores" the values. It makes no sense to index a string with a string (there is no straightforward interpretation for 'a'['aa'], or at least not one I can come up with that would be "logical" for a significant number of programmers).
Although this may look quite strange, it is quite consistent with the fact that a membership check for example also works on the keys (if we write 'a' in some_dict, it does not look to the values either).
If you want to use the values, you need to iterate over .values(), so:
for x in test.values():
    print(x['aa'])
If however you use your second test, then this works, since x is a key (for example 'a'), and hence test[x] will fetch you the corresponding value. If you then process test[x] further, you thus process the values of the dictionary.
You can iterate concurrently over keys and values with .items():
for k, x in test.items():
    # ...
    pass
Here in the first iteration k will be 'a' and x will be {'aa':'value','ab':'value'}, in the second iteration k will be 'b' and x will be {'aa':'value','ab':'value'} (again, before Python 3.7 the iterations could be swapped, since dictionaries did not guarantee any order).
If you thus are interested in the outer key, and the value that is associated with the 'aa' key of the corresponding subdictionary, you can use:
for k, x in test.items():
    v = x['aa']
    print(k, v)

When you iterate over a dictionary with a for, you're not iterating over the items, but over the keys ('a', 'b'). These are just strings that mean nothing. That's why you have to do it as on test 2. You could also iterate over the items with test.items().
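The three ways of iterating described in the answers above can be compared side by side. This is just an illustrative sketch; the variable names keys, inner and pairs are made up here.

```python
test = {'a': {'aa': 'value', 'ab': 'value'},
        'b': {'aa': 'value', 'ab': 'value'}}

# Plain iteration yields the keys, which are strings here.
keys = [x for x in test]

# .values() yields the inner dictionaries, which can be indexed with 'aa'.
inner = [x['aa'] for x in test.values()]

# .items() yields (key, inner dictionary) pairs.
pairs = [(k, x['aa']) for k, x in test.items()]

print(keys, inner, pairs)
```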

Trying to compare every number out of this list with the 1 in front of it and the 1 after it

I'm trying to compare every number out of this list with the 1 in front of it and the 1 after it.
For example: I got a list participants = ["A:1", "B:6", "C:5", "D:4", "E:7", "F:3", "G:10", "H:2"]
Now I wanna compare the number of B:6 to A:1 and C:5, but I have no idea how since that A,B and C are still in the way and I need to keep them for later.

You can use split, int and zip.
Let's name your list l:
for previous, current, next in zip(l[:-2], l[1:-1],l[2:]):
    nums = [int(x.split(':')[1]) for x in [previous, current, next]]
    # now do your comparison...

Use a collections.OrderedDict if you need to maintain order but want to properly split the keys and values. The advantage is that you have a proper key-value store of your values, but you also still maintain the order the values were added, like with a list.
from collections import OrderedDict
participants = ["A:1", "B:6", "C:5", "D:4", "E:7", "F:3", "G:10", "H:2"]
d = OrderedDict()
for p in participants:
    letter, number = p.split(':')
    d.update({letter:int(number)})
Using d.keys() you can access the keys, using d.values() you can access the values. So, for example, to print the difference of all numbers to the previous one, do something like this (python 2 code):
from __future__ import print_function # necessary for print() in py2.*
for prev, next in zip(d.keys()[:-1], d.keys()[1:]):
    print(next, "-", prev, "=", d[next] - d[prev])
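As a Python 3 sketch combining the two answers above: parse each entry into a (letter, number) tuple once, then slide a window of three over the parsed list. The question does not say which comparison is wanted, so the "larger than both neighbours" test and the names parsed and peaks are assumptions for illustration only.

```python
participants = ["A:1", "B:6", "C:5", "D:4", "E:7", "F:3", "G:10", "H:2"]

# Keep the letter next to its number so it is still available later.
parsed = [(name, int(number))
          for name, number in (p.split(':') for p in participants)]

# Each step sees an entry together with its predecessor and successor.
peaks = [cur[0]
         for prev, cur, nxt in zip(parsed, parsed[1:], parsed[2:])
         if prev[1] < cur[1] > nxt[1]]

print(peaks)
```

The first and last participants have only one neighbour, so they never appear as the middle element of a window.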

Python. Identity in sets of objects. And hashing

How are __hash__ and __eq__ used for identification in sets?
For example some code that should help to solve some domino puzzle:
class foo(object):
    def __init__(self, one, two):
        self.one = one
        self.two = two
    def __eq__(self,other):
        if (self.one == other.one) and (self.two == other.two): return True
        if (self.two == other.one) and (self.one == other.two): return True
        return False
    def __hash__(self):
        return hash(self.one + self.two)
s = set()
for i in range(7):
    for j in range(7):
        s.add(foo(i,j))
len(s)  # returns 28. Why?
If I use only __eq__(), len(s) equals 49. That's OK because, as I understand it, the objects (1-2 and 2-1 for example) are not the same object but represent the same domino. So I added a hash function.
Now it works the way I want, but I don't understand one thing: the hashes of 1-3 and 2-2 should be the same, so they should be counted as the same object and shouldn't both be added to the set. But they are! I'm stuck.

Equality for dict/set purposes depends on equality as defined by __eq__. However, it is required that objects that compare equal have the same hash value, and that is why you need __hash__. See this question for some similar discussion.
The hash itself does not determine whether two objects count as the same in dictionaries. The hash is like a "shortcut" that only works one way: if two objects have different hashes, they are definitely not equal; but if they have the same hash, they still might not be equal.
In your example, you defined __hash__ and __eq__ to do different things. The hash depends only on the sum of the numbers on the domino, but the equality depends on both individual numbers (in either order). This is legal, since it is still the case that equal dominoes have equal hashes. However, like I said above, it doesn't mean that equal-sum dominoes will be considered equal. Some unequal dominoes will still have equal hashes. But equality is still determined by __eq__, and __eq__ still looks at both individual numbers, so that's what determines whether they are equal.
It seems to me that the appropriate thing to do in your case is to define both __hash__ and __eq__ to depend on the ordered pair --- that is, first compare the greater of the two numbers, then compare the lesser. This will mean that 2-1 and 1-2 will be considered the same.

The hash is only a hint to help Python arrange the objects. When looking for some object foo in a set, it still has to check each object in the set with the same hash as foo.
It's like having a bookshelf for every letter of the alphabet. Say you want to add a new book to your collection, only if you don't have a copy of it already; you'd first go to the shelf for the appropriate letter. But then you have to look at each book on the shelf and compare it to the one in your hand, to see if it's the same. You wouldn't discard the new book just because there's something already on the shelf.
If you want to use some other value to filter out "duplicates", then use a dict that maps the domino's total value to the first domino you saw. Don't subvert builtin Python behavior to mean something entirely different. (As you've discovered, it doesn't work in this case, anyway.)

The requirement for hash functions is that if x == y for two values, then hash(x) == hash(y). The reverse need not be true.
You can easily see why this is the case by considering hashing of strings. Let's say that hash(str) returns a 32-bit number, and we are hashing strings longer than 4 characters (i.e. strings that contain more than 32 bits). There are more possible strings than there are possible hash values, so some non-equal strings must share the same hash (this is an application of the pigeonhole principle).
Python sets are implemented as hash tables. When checking whether an object is a member of the set, it will call its hash function and use the result to pick a bucket, and then use the equality operator to see if it matches any of the items in the bucket.
With your implementation, the 2-2 and 1-3 dominoes will end up in the same hash bucket, but they don't compare equal. Therefore, they can both be added to the set.

You can read about this in the Python data model documentation, but the short answer is that you can rewrite your hash function as:
def __hash__(self):
    return hash(tuple(sorted((self.one, self.two))))
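Putting the pieces from these answers together, here is a sketch of a class where both __eq__ and __hash__ are derived from the same normalized pair. The class name Domino and the helper _key are made up for illustration and are not from the question.

```python
class Domino:
    def __init__(self, one, two):
        self.one = one
        self.two = two

    def _key(self):
        # Normalized form: (1, 2) for both 1-2 and 2-1.
        return tuple(sorted((self.one, self.two)))

    def __eq__(self, other):
        if not isinstance(other, Domino):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal dominoes have equal keys, hence equal hashes.
        return hash(self._key())

s = {Domino(i, j) for i in range(7) for j in range(7)}
print(len(s))
```

With this definition 1-2 and 2-1 collapse into one set entry, while 1-3 and 2-2 stay distinct because __eq__ reports them as different, whatever their hashes are.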

I like the sound of the answer provided by Eevee, but I had difficulty imagining an implementation. Here's my interpretation, explanation and implementation of the answer provided by Eevee.
Use the sum of the two domino values as the dictionary key.
Store either of the domino values as the dictionary value.
For example, given the domino '12', the sum is 3, and therefore the dictionary key will be 3. We can then pick either value (1 or 2) to store in that position (we'll pick the first value, 1).
domino_pairs = {}
pair = '12'
pair_key = sum(map(int, pair))
domino_pairs[pair_key] = int(pair[0]) # Store the first pair's first value.
print domino_pairs
Outputs:
{3: 1}
Although we're only storing a single value from the domino pair, the other value can easily be calculated from the dictionary key and value:
pair = '12'
pair_key = sum(map(int, pair))
domino_pairs[pair_key] = int(pair[0]) # Store the first pair's first value.
# Retrieve pair from dictionary.
print pair_key - domino_pairs[pair_key] # 3-1 = 2
Outputs:
2
But, since two different pairs may have the same total, we need to store multiple values against a single key. So, we store a list of values against a single key (i.e. the sum of a pair's two values). Putting this into a function:
def add_pair(dct, pair):
    pair_key = sum(map(int, pair))
    if pair_key not in dct:
        dct[pair_key] = []
    dct[pair_key].append(int(pair[0]))
domino_pairs = {}
add_pair(domino_pairs, '22')
add_pair(domino_pairs, '04')
print domino_pairs
Outputs:
{4: [2, 0]}
This makes sense. Both pairs sum to 4, yet the first value in each pair differs, so we store both. The implementation so far will allow duplicates:
domino_pairs = {}
add_pair(domino_pairs, '40')
add_pair(domino_pairs, '04')
print domino_pairs
Outputs
{4: [4, 0]}
'40' and '04' are the same in Dominos, so we don't need to store both. We need a way of checking for duplicates. To do this we'll define a new function, has_pair:
def has_pair(dct, pair):
    pair_key = sum(map(int, pair))
    if pair_key not in dct:
        return False
    return (int(pair[0]) in dct[pair_key] or
            int(pair[1]) in dct[pair_key])
As before, we get the sum (our dictionary key). If it's not in the dictionary, then the pair cannot exist. If it is in the dictionary, we must check whether either value in our pair exists in the dictionary 'bucket'. Let's insert this check into add_pair, so that we don't add duplicate domino pairs:
def add_pair(dct, pair):
    pair_key = sum(map(int, pair))
    if has_pair(dct, pair):
        return
    if pair_key not in dct:
        dct[pair_key] = []
    dct[pair_key].append(int(pair[0]))
Now adding duplicate domino pairs works correctly:
domino_pairs = {}
add_pair(domino_pairs, '40')
add_pair(domino_pairs, '04')
print domino_pairs
Outputs:
{4: [4]}
Lastly, a print function shows that storing only the sum of a domino pair, plus a single value from that pair, is equivalent to storing the pair itself:
def print_pairs(dct):
    for total in dct:
        for a in dct[total]:
            a = int(a)
            b = int(total) - int(a)
            print '(%d, %d)'%(a,b)
Testing:
domino_pairs = {}
add_pair(domino_pairs, '40')
add_pair(domino_pairs, '04')
add_pair(domino_pairs, '23')
add_pair(domino_pairs, '50')
print_pairs(domino_pairs)
Outputs:
(4, 0)
(2, 3)
(5, 0)

Determine if a dice roll contains certain combinations?

I am writing a dice game simulator in Python. I represent a roll by using a list containing integers from 1-6. So I might have a roll like this:
[1,2,1,4,5,1]
I need to determine if a roll contains scoring combinations, such as 3 of a kind, 4 of a kind, 2 sets of 3, and straights.
Is there a simple Pythonic way of doing this? I've tried several approaches, but they all have turned out to be messy.

Reorganize into a dict with value: count and test for presence of various patterns.

There are two ways to do this:
def getCounts(L):
    d = {}
    for i in range(1, 7):
        d[i] = L.count(i)
    return d # d is the dictionary which contains the occurrences of all possible dice values
             # and has a 0 if it doesn't occur in the roll
This one is inspired by Ignacio Vazquez-Abrams and dkamins
def getCounts(L):
    d = {}
    for i in set(L):
        d[i] = L.count(i)
    return d # d is the dictionary which contains the occurrences of
             # all and only the values in the roll
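The standard library's collections.Counter builds the same kind of value-to-count mapping in one call. A small sketch using the roll from the question:

```python
from collections import Counter

roll = [1, 2, 1, 4, 5, 1]
counts = Counter(roll)

# Values that never occur simply report a count of 0.
ones = counts[1]
sixes = counts[6]

# The most frequent value together with its count.
top = counts.most_common(1)

print(ones, sixes, top)
```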

I have written code like this before (but with cards for poker). A certain amount of code-sprawl is unavoidable to encode all of the rules of the game. For example, the code to look for n-of-a-kind will be completely different from the code to look for a straight.
Let's consider n-of-a-kind first. As others have suggested, create a dict containing the counts of each element. Then:
counts = sorted(d.values())
if counts[-1] == 4:
    return four_of_a_kind
if counts[-1] == 3 and counts[-2] == 3:
    return two_sets_of_three
# etc.
Checking for straights requires a different approach. When checking for n-of-a-kind, you need to get the counts and ignore the values. Now we need to examine the values and ignore the counts:
ranks = set(rolls)
if len(ranks) == 6: # all six values are present
    return long_straight
# etc.
In general, you should be able to identify rules with a similar flavor, abstract out code that helps with those kinds of rules, and then write just a few lines per rule. Some rules may be completely unique and will not be able to share code with other rules. That's just the way the cookie crumbles.
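As a rough sketch of how the two kinds of checks could sit in one function: the function name score_patterns and the pattern labels below are hypothetical, the roll is assumed to hold six dice, and the actual rules of the game may differ.

```python
from collections import Counter

def score_patterns(roll):
    """Return the (hypothetical) pattern labels found in a six-dice roll."""
    # n-of-a-kind rules look only at the counts, largest first.
    counts = sorted(Counter(roll).values(), reverse=True)
    found = []
    # Straight rules look only at which values are present.
    if len(set(roll)) == 6:
        found.append('straight')
    if counts[0] >= 4:
        found.append('four_of_a_kind')
    elif len(counts) >= 2 and counts[0] == 3 and counts[1] == 3:
        found.append('two_sets_of_three')
    elif counts[0] == 3:
        found.append('three_of_a_kind')
    return found

print(score_patterns([1, 2, 1, 4, 5, 1]))
```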
