Hashing bitarray? Counting - Python

It seems that, for some reason, a dict treats bitarray() objects with identical contents as distinct keys, so duplicates are never merged.
For example:
data = {}
for _ in xrange(10):
    ba = ...generate repeatable bitarrays ...
    data[ba] = 1
print data
{bitarray('11011'): 1, bitarray('11011'): 1, bitarray('11011'): 1, bitarray('01111'): 1, bitarray('11110'): 1, bitarray('11110'): 1, bitarray('01111'): 1, bitarray('01111'): 1, bitarray('11110'): 1, bitarray('11110'): 1}
You can clearly see that duplicates are stored as different keys (e.g. the first two elements), which is weird. What could be the reason?
My goal is simply to count the number of times a bit pattern shows up, and of course dicts are perfect for this, but it seems that bitarray() is for some reason opaque to the hashing algorithm.
By the way, I have to use bitarray() because my patterns are 10000+ bits long.
Any other ideas for an efficient way of counting occurrences of bit patterns?

This answer addresses your first confusion, about duplicate dictionary keys. I assume you're referring to bitarray() from the bitarray module (I've not used this module myself).
In your example above you're not actually getting duplicate dictionary keys. They look like duplicates, but only to the naked eye. For instance:
>>> class X:
...     def __repr__(self):
...         return '"X obj"'
...
>>> x1 = X()
>>> x2 = X()
>>> d = {x1:1, x2:2}
>>> d
{"X obj": 2, "X obj": 1}
But x1 is not equal to x2, so they are not duplicates; they are distinct objects of class X:
>>> x1 == x2
False
>>> #same as
... id(x1) == id(x2)
False
>>> #same as
... x1 is x2
False
Moreover, because class X defines __repr__, which returns the string representation of its objects, dictionary d looks as if it has duplicate keys. It does not, and the keys are not of type str either: the key for value 1 is one X object and the key for value 2 is another X object. They are literally two different objects that share a single string representation, returned by their class's __repr__ method:
>>> # keys are instance of X not strings
... d
{"X obj": 2, "X obj": 1}
>>> d["X obj"]
KeyError: 'X obj'
>>> d[x1]
1
>>> d[x2]
2
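To illustrate the other side of this, here is a minimal Python 3 sketch of a class whose instances do collapse into one key. The `Pattern` class is hypothetical and only stands in for any object that should be compared by content: it defines `__eq__` and `__hash__` from the same data, which is what a dict needs in order to treat two objects as the same key.

```python
class Pattern:
    """Hypothetical wrapper: equality and hash are both derived from the bits."""

    def __init__(self, bits):
        self.bits = bits  # e.g. the string '11011'

    def __eq__(self, other):
        return isinstance(other, Pattern) and self.bits == other.bits

    def __hash__(self):
        # Objects that compare equal must return the same hash.
        return hash(self.bits)

    def __repr__(self):
        return "Pattern(%r)" % self.bits


d = {}
for bits in ["11011", "11011", "01111"]:
    key = Pattern(bits)
    d[key] = d.get(key, 0) + 1

print(d)  # the two '11011' patterns share one key
```

Without the `__eq__` and `__hash__` methods, the same loop would store three separate keys, as with class X above.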

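As of bitarray 0.8.1 (and possibly later versions), I believe bitarray does not satisfy the hash invariant: two bitarrays that compare equal are not guaranteed to have the same hash, so a dict treats them as different keys.
To work around it, convert each bitarray to bytes and use that as the key, as follows.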
>>> from bitarray import bitarray
>>> l = [bitarray('11111'), bitarray('11111'), bitarray('11010'), bitarray('11110'), bitarray('11111'), bitarray('11010')]
>>> ht = {}
>>> for x in l: ht[x.tobytes()] = 0
...
>>> for x in l: ht[x.tobytes()] += 1
...
>>> ht
{'\xf8': 3, '\xf0': 1, '\xd0': 2}
Remember that you can get the bitarray back from the bytes with frombytes(). In that case, though, you have to keep track of the bitarray's length explicitly, because the result is padded to a multiple of 8 bits.
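The padding also matters for counting: two patterns of different length can pad to the same bytes. A rough Python 3 sketch of one way around that is to key on a (length, bytes) pair. The `pack` helper below is hypothetical and only imitates zero-padded packing on plain '0'/'1' strings so that the example runs without the bitarray module; with real bitarrays the key would presumably be `(len(x), x.tobytes())`.

```python
def pack(bits):
    """Hypothetical helper: pack a '0'/'1' string into bytes, zero-padded on the right."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


counts = {}
for bits in ["11111", "111110", "11111"]:
    # The length disambiguates patterns that pad to identical bytes.
    key = (len(bits), pack(bits))
    counts[key] = counts.get(key, 0) + 1

print(counts)
```

Here '11111' and '111110' pack to the same byte, yet they are counted separately because their lengths differ.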
If you want to keep the bitarray in the dictionary also:
>>> from bitarray import bitarray
>>> l = [bitarray('11111'), bitarray('11111'), bitarray('11010'), bitarray('11110'), bitarray('11111'), bitarray('11010')]
>>> ht = {}
>>> for x in l: ht[x.tobytes()] = (0, x)
...
>>> for x in l:
...     old_count = ht[x.tobytes()][0]
...     ht[x.tobytes()] = (old_count+1, x)
...
>>> ht
{'\xf8': (3, bitarray('11111')), '\xf0': (1, bitarray('11110')), '\xd0': (2, bitarray('11010'))}
>>> for x,y in ht.iteritems(): print(y)
...
(3, bitarray('11111'))
(1, bitarray('11110'))
(2, bitarray('11010'))

I solved it:
desc = bitarray(res).to01()
if desc in data: data[desc] += 1
else: data[desc] = 1
Gosh, I miss Perl's no-nonsense autovivification :)
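For what it's worth, the standard library gets fairly close to autovivification for counting. Below is a small Python 3 sketch using collections.Counter; the list of '0'/'1' strings is made up and stands in for the to01() output above.

```python
from collections import Counter

# Stand-ins for bitarray(res).to01(): plain strings of '0' and '1'.
patterns = ["11011", "11011", "01111", "11110", "11110", "01111"]

counts = Counter()
for desc in patterns:
    counts[desc] += 1  # missing keys start at 0, so no membership test is needed

print(counts.most_common())
```

`Counter(patterns)` would build the same counts in one call; the loop is spelled out only to mirror the if/else version.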

Related

Finding the minimum value for different variables

If I compute several values from some variables, for example:
a = x - y
b = x**2 - y**2
c = (x-y)**2
d = x + y
how can I find the minimum value out of all of them? For example:
a = 4
b = 7
c = 3
d = 10
So the minimum value is 3, for c. How can I make my program find this?
What I have thought of so far:
make a list
append a, b, c, d to the list
sort the list
print list[0], as it will be the smallest value.
The problem is that if I append a, b, c, d to a list I have to do something like:
lst.append((a,b,c,d))
This makes the list:
[(4,7,3,10)]
so all the values sit under a single index (lst[0]).
Is there an alternative, or any other way to find the minimum?
Language: Python
Thank you
You can find the index of the smallest item like this
>>> L = [4,7,3,10]
>>> min(range(len(L)), key=L.__getitem__)
2
Now that you know the index, you can get the actual item too, e.g. L[2].
Another way, which finds the answer in the form (index, item):
>>> min(enumerate(L), key=lambda x:x[1])
(2, 3)
I think you may be going the wrong way about solving your problem, but it's possible to pull the values of variables from the local namespace if you know their names, e.g.
>>> a = 4
>>> b = 7
>>> c = 3
>>> d = 10
>>> min(enumerate(['a', 'b', 'c', 'd']), key=lambda x, ns=locals(): ns[x[1]])
(2, 'c')
A better way is to use a dict, so you are not filling your working namespace with these "junk" variables:
>>> D = {}
>>> D['a'] = 4
>>> D['b'] = 7
>>> D['c'] = 3
>>> D['d'] = 10
>>> min(D, key=D.get)
'c'
>>> min(D.items(), key=lambda x:x[1])
('c', 3)
You can see that when the correct data structure is used, the amount of code required is much less.
If you store the numbers in a list you can use reduce, which is O(n) because the list is not sorted (every element has to be examined).
numbers = [999, 1111, 222, -1111]
minimum = reduce(lambda mn, candidate: candidate if candidate < mn else mn, numbers[1:], numbers[0])
Pack the values into a dictionary, find the minimum value, and then find the keys that have that value (there may be more than one minimum):
D = dict(a = 4, b = 7, c = 3, d = 10)
min_val = min(D.values())
for k,v in D.items():
    if v == min_val: print(k)
The built-in function min will do the trick. In your example, min(a,b,c,d) will yield 3.
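Putting the answers together, a short Python 3 sketch under made-up inputs (x = 5 and y = 1 are assumptions, not values from the question) could compute the four expressions straight into a dict and then ask for both the smallest value and its name:

```python
x, y = 5, 1  # hypothetical inputs

results = {
    "a": x - y,
    "b": x**2 - y**2,
    "c": (x - y) ** 2,
    "d": x + y,
}

# min over the keys, ranked by their values, gives the name of the smallest.
name = min(results, key=results.get)
print(name, results[name])
```

If only the value matters and not the name, `min(results.values())` is enough.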

Avoid double lookup when updating dictionary integer members

If a dictionary contains something to which you can hold a reference, you can default-or-update it with one dictionary lookup:
d.setdefault('k', []).append(2)
However, modifying dictionary entries in the same manner is not possible if they're numbers:
d.setdefault('k', 0) += 1 # doesn't work
Instead, you need to do two dict lookups, one for read and one for write:
d['a'] = d.get('a', 0) + 1
This doesn't seem like a great idea for dictionaries with a huge number of keys. So, is there a way to do a default-or-update operation on dictionaries containing numbers? Or, phrased another way, what's the most performant way to apply a default-or-update operation on such dictionaries?
A quick test suggests that collections.defaultdict is about 2.5 times faster than your double-lookup (tested on Python 2.6):
>>> import timeit
>>> s1 = "d = dict((str(n), 0) for n in range(1000000))"
>>> timeit.repeat("d['a'] = d.get('a', 0) + 1", setup=s1)
[0.17711305618286133, 0.17411494255065918, 0.17812514305114746]
>>> s2 = """
... from collections import defaultdict
... d = defaultdict(int, ((str(n), 0) for n in range(1000000)))
... """
>>> timeit.repeat("d['a'] += 1", setup=s2)
[0.07185506820678711, 0.07294416427612305, 0.12155508995056152]
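As a rough Python 3 sketch of the two standard-library options for this default-or-update pattern (the word list is made up): defaultdict(int) supplies 0 for missing keys, and Counter does the same while also accepting an iterable directly. Note that `d[k] += 1` is still a read followed by a write under the hood; the saving comes from not making a separate `get` call in Python code.

```python
from collections import Counter, defaultdict

words = ["a", "b", "a", "c", "a"]

# defaultdict(int) creates missing entries as 0 on first access.
by_default = defaultdict(int)
for w in words:
    by_default[w] += 1

# Counter counts an iterable in one call and returns 0 for unseen keys.
by_counter = Counter(words)

print(dict(by_default), dict(by_counter))
```

Timings will vary by Python version and key distribution, so it is worth re-running a timeit comparison on your own data.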

using FOR statement on 2 elements at once python

I have the following list of variables and a mastervariable
a = (1,5,7)
b = (1,3,5)
c = (2,2,2)
d = (5,2,8)
e = (5,5,8)
mastervariable = (3,2,5)
I'm trying to check whether at least 2 elements of each variable exist in the mastervariable, so that the above would show B (3,5) and D (5,2) as having at least 2 elements matching the mastervariable. Also note that using sets would result in C showing up as matching, but I don't want to count C because only one of the elements in C is in mastervariable (i.e. 2 shows up only once in mastervariable, not twice).
I currently have the very inefficient:
if current_variable[0] == mastervariable[0]:
    if current_variable[1] == mastervariable[1]:
        True
    elif current_variable[2] == mastervariable[1]:
        True
    #### I don't use OR here because I need to know which variables match.
elif current_variable[1] == mastervariable[0]: ##<-- I'm now checking 2nd element
    pass  # etc. etc.
I then continue like the above, checking each one at a time, which is extremely inefficient. I did this because using a FOR statement resulted in me checking the first element twice, which was incorrect:
for i in a:
    for j in a:
        ### this checked if 1 was in the master variable and not 1,5 or 1,7
Is there a way to use 2 FOR statements that lets me check 2 elements in a list at once while skipping any element that has already been used? Alternatively, can you suggest an efficient way to do what I'm trying to do?
Edit: Mastervariable can have duplicates in it.
For the case where matching elements can be duplicated, so that set breaks, use Counter as a multiset. The duplicates between a and master are found by:
from collections import Counter
count_a = Counter(a)
count_master = Counter(master)
count_both = count_a + count_master
dups = Counter({e : min((count_a[e], count_master[e])) for e in count_a if count_both[e] > count_a[e]})
The logic is reasonably intuitive: if there's more of an item in the combined count of a and master, then it is duplicated, and the multiplicity is however many of that item are in whichever of a and master has less of them.
It gives a Counter of all the duplicates, where the count is their multiplicity. If you want it back as a tuple, you can do tuple(dups.elements()):
>>> a
(2, 2, 2)
>>> master
(1, 2, 2)
>>> dups = Counter({e : min((count_a[e], count_master[e])) for e in count_a if count_both[e] > count_a[e]})
>>> tuple(dups.elements())
(2, 2)
Seems like a good job for sets. Edit: sets aren't suitable since mastervariable can contain duplicates. Here is a version using Counters.
>>> a = (1,5,7)
>>>
>>> b = (1,3,5)
>>>
>>> c = (2,2,2)
>>>
>>> d = (5,2,8)
>>>
>>> e = (5,5,8)
>>> D=dict(a=a, b=b, c=c, d=d, e=e)
>>>
>>> from collections import Counter
>>> mastervariable = (5,5,3)
>>> mvc = Counter(mastervariable)
>>> for k,v in D.items():
...     vc = Counter(v)
...     if sum(min(count, vc[item]) for item, count in mvc.items()) >= 2:
...         print k
...
b
e
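Both Counter answers compute a per-element minimum by hand. Counter also supports `&`, which is a multiset intersection (the minimum of the two counts for each element), so the same check can be sketched more compactly. This Python 3 version uses the mastervariable from the question, (3,2,5); the `candidates` and `matches` names are only illustrative.

```python
from collections import Counter

candidates = {
    "a": (1, 5, 7),
    "b": (1, 3, 5),
    "c": (2, 2, 2),
    "d": (5, 2, 8),
    "e": (5, 5, 8),
}
mastervariable = (3, 2, 5)
master_count = Counter(mastervariable)

matches = {}
for name, values in candidates.items():
    # Multiset intersection: each element keeps the smaller of its two counts.
    common = Counter(values) & master_count
    if sum(common.values()) >= 2:
        matches[name] = tuple(common.elements())

print(matches)
```

With this input only b and d qualify; c is left out because 2 occurs just once in mastervariable.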

Tuples in Dicts

Is it possible in Python to add a tuple as a value in a dictionary?
And if it is, how can we add a new value? And how can we remove or change it?
>>> a = {'tuple': (23, 32)}
>>> a
{'tuple': (23, 32)}
>>> a['tuple'] = (42, 24)
>>> a
{'tuple': (42, 24)}
>>> del a['tuple']
>>> a
{}
if you meant to use tuples as keys you could do:
>>> b = {(23, 32): 'tuple as key'}
>>> b
{(23, 32): 'tuple as key'}
>>> b[23, 32] = 42
>>> b
{(23, 32): 42}
Generally speaking there is nothing specific about tuples being in dictionary, they keep behaving as tuples.
Since tuples are immutable, you cannot add a value to the tuple. What you can do, is construct a new tuple from the current tuple and an extra value. The += operator does this for you, provided the left argument is a variable (or in this case a dictionary value):
>>> t = {'k': (1, 2)}
>>> t['k'] += (3,)
>>> t
{'k': (1, 2, 3)}
Regardless, if you plan on altering the tuple value, perhaps it's better to store lists? Those are mutable.
Edit: Since you updated your question†, observe the following:
>>> d = {42: ('name', 'date')}
>>> d[42][0] = 'name2'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
This happens because, as stated before, tuples are immutable. You cannot change them. If you want to change them, then in fact you'll have to create a new one. Thus:
>>> d[42] = ('name2', d[42][1])
>>> d
{42: ('name2', 'date')}
As a side note, you may want to use namedtuples. They work just like regular tuples, but allow you to refer to elements within the tuple by name:
>>> from collections import namedtuple
>>> Person = namedtuple('Person', 'name date')
>>> t = {42: Person('name', 'date')}
>>> t[42] = Person('name2', t[42].date)
>>> t
{42: Person(name='name2', date='date')}
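A small Python 3 sketch of a related convenience: namedtuples have a `_replace` method that builds a new tuple with selected fields changed, so the untouched fields do not have to be copied by hand. The names mirror the example above.

```python
from collections import namedtuple

Person = namedtuple("Person", "name date")
t = {42: Person("name", "date")}

# _replace returns a new Person; the original tuple is left unchanged.
t[42] = t[42]._replace(name="name2")

print(t)
```

The dictionary entry is still replaced rather than modified, since the tuple itself remains immutable.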
†: Next time please edit your actual question. Do not post an answer containing only further questions. This is not a forum.
You can't change a tuple itself. You have to replace it by a different tuple.
When you use a list, you could also add values to it (changing the list itself) without need to replace it:
>>> a = {'list': [23, 32]}
>>> a
{'list': [23, 32]}
>>> a['list'].append(99)
>>> a
{'list': [23, 32, 99]}
In most cases lists can be used as a replacement for tuples, since as far as I know they support all the tuple operations (this is duck typing, man!). The exception is dictionary keys: lists are not hashable, so a key still has to be a tuple.
t1=('name','date')
t2=('x','y')
# "Number" is a String key!
d1={"Number":t1}
# Update the value of "Number"
d1["Number"] = t2
# Use a tuple as key, and another tuple as value
d1[t1] = t2
# Obtain values (getters)
# Can throw a KeyError if "Number" not a key
foo = d1["Number"]
# Does not throw a key error, t1 is the value if "Number" is not in the dict
d1.get("Number", t1)
# t3 now is the same as t1
t3 = d1[ ('name', 'date') ]
You updated your question again. Please take a look at the Python dict docs; the documentation is one of Python's strong points! And play with the interpreter (python) on the command line. But let's continue.
from datetime import datetime
d = {}
# initially key 0
d[0] = ('name', datetime.now())
# id known
d1 = d[0]
del d[0]
# name changed (newname is whatever the new name should be)
tmp = d1
d1 = (newname, tmp[1])
And please consider using a class:
class Person(object):
    personIdCounter = 1

    def __init__(self):
        self.id = Person.personIdCounter
        Person.personIdCounter += 1
        self.name = None
        self.date = None
then
persons = {}
person = Person()
persons[person.id] = person
person.name = "something"
persons[1].name = "something else"
That looks better than a tuple and models your data better.

Is there a Python module/recipe (not numpy) for 2d arrays for small games

I am writing some small games in Python with Pygame & Pyglet as hobby projects.
A class for 2D arrays would be very handy. I use py2exe to send the games to relatives/friends, and numpy is just too big; most of its features are unnecessary for my requirements.
Could you suggest a Python module/recipe I could use for this?
-- Chirag
[Edit]:
List of lists would be usable as mentioned below by MatrixFrog and zvoase. But it is pretty primitive. A class with methods to insert/delete rows and columns as well as to rotate/flip the array would make it very easy and reusable too. dicts are good for sparse arrays only.
Thank you for your ideas.
How about using a defaultdict?
>>> import collections
>>> Matrix = lambda: collections.defaultdict(int)
>>> m = Matrix()
>>> m[3,2] = 6
>>> print m[3,4] # deliberate typo :-)
0
>>> m[3,2] += 4
>>> print m[3,2]
10
>>> print m
defaultdict(<type 'int'>, {(3, 2): 10, (3, 4): 0})
As the underlying dict uses tuples as keys, this supports 1D, 2D, 3D, ... matrices.
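One thing to keep in mind with this approach, sketched below in Python 3: reading a missing cell with `[]` inserts it, as the (3, 4) entry in the output above shows. If that matters (for example when iterating over the occupied cells), `.get` reads without inserting.

```python
from collections import defaultdict

m = defaultdict(int)
m[3, 2] = 6

_ = m[3, 4]               # reading with [] creates the (3, 4) entry
peek = m.get((9, 9), 0)   # .get returns the default without creating an entry

print(sorted(m))
```

After this, the dict holds (3, 2) and (3, 4) but not (9, 9).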
The simplest approach would just be to use nested lists (built with a comprehension, so that every row is a separate list object):
>>> matrix = [[0] * num_cols for _ in range(num_rows)]
>>> matrix[i][j] = 'value' # row i, column j, value 'value'
>>> print repr(matrix[i][j])
'value'
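A common pitfall when building nested lists is worth a short Python 3 illustration: multiplying the outer list, as in `[[0] * num_cols] * num_rows`, repeats references to one row object, so a write to one row shows up in all of them. A comprehension creates a fresh list per row.

```python
num_rows, num_cols = 2, 3

# Outer multiplication repeats the same inner list object num_rows times.
aliased = [[0] * num_cols] * num_rows
# The comprehension evaluates [0] * num_cols once per row.
independent = [[0] * num_cols for _ in range(num_rows)]

aliased[0][0] = "x"
independent[0][0] = "x"

print(aliased)      # both rows changed
print(independent)  # only the first row changed
```

The inner `[0] * num_cols` is safe here because integers are immutable; the sharing problem only arises for the mutable row lists.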
Alternatively, if you’re going to be dealing with sparse matrices (i.e. matrices with a lot of empty or zero values), it might be more efficient to use nested dictionaries. In this case, you could implement setter and getter functions which will operate on a matrix, like so:
def get_element(mat, i, j, default=None):
    # This will also set the accessed row to a dictionary.
    row = mat.setdefault(i, {})
    return row.setdefault(j, default)

def set_element(mat, i, j, value):
    row = mat.setdefault(i, {})
    row[j] = value
And then you would use them like this:
>>> matrix = {}
>>> set_element(matrix, 2, 3, 'value') # row 2, column 3, value 'value'
>>> print matrix
{2: {3: 'value'}}
>>> print repr(get_element(matrix, 2, 3))
'value'
If you wanted, you could implement a Matrix class which implemented these methods, but that might be overkill:
class Matrix(object):
    def __init__(self, initmat=None, default=0):
        if initmat is None: initmat = {}
        self._mat = initmat
        self._default = default

    def __getitem__(self, pos):
        i, j = pos
        return self._mat.setdefault(i, {}).setdefault(j, self._default)

    def __setitem__(self, pos, value):
        i, j = pos
        self._mat.setdefault(i, {})[j] = value

    def __repr__(self):
        return 'Matrix(%r, %r)' % (self._mat, self._default)
>>> m = Matrix()
>>> m[2,3] = 'value'
>>> print m[2,3]
value
>>> m
Matrix({2: {3: 'value'}}, 0)
Maybe pyeuclid matches your needs -- (dated but usable) formatted docs are here, up-to-date docs in ReST format are in this text file in the pyeuclid sources (to do your own formatting of ReST text, use the docutils).
I wrote the class. Don't know if it is a good or redundant but... Posted it here http://bitbucket.org/pieceofpeace/container2d/
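For readers who want the row operations and rotate/flip mentioned in the question's edit without any dependency, here is a minimal Python 3 sketch of a dense list-of-lists wrapper. The `Grid` class and its method names are hypothetical and are not taken from the container2d project linked above.

```python
class Grid:
    """Hypothetical list-of-lists wrapper; names and methods are illustrative only."""

    def __init__(self, rows, cols, fill=0):
        self.cells = [[fill] * cols for _ in range(rows)]

    @property
    def shape(self):
        cols = len(self.cells[0]) if self.cells else 0
        return (len(self.cells), cols)

    def __getitem__(self, pos):
        r, c = pos
        return self.cells[r][c]

    def __setitem__(self, pos, value):
        r, c = pos
        self.cells[r][c] = value

    def insert_row(self, index, fill=0):
        self.cells.insert(index, [fill] * self.shape[1])

    def delete_row(self, index):
        del self.cells[index]

    def flip_horizontal(self):
        # Mirror left-to-right by reversing each row.
        self.cells = [row[::-1] for row in self.cells]

    def rotate_clockwise(self):
        # Reverse the row order, then transpose.
        self.cells = [list(col) for col in zip(*self.cells[::-1])]


g = Grid(2, 3)
g[0, 0] = 1
g[1, 2] = 9
g.rotate_clockwise()
print(g.cells, g.shape)
```

Column insertion and deletion would follow the same pattern by editing each row; they are left out to keep the sketch short.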
