I'm trying to figure out in what order PriorityQueue.get() returns values in Python. At first I thought that smaller priority values are returned first, but after a few examples that doesn't seem to be the case. This is the example I ran:
>>> qe = PriorityQueue()
>>> qe.put("Br", 0)
>>> qe.put("Sh", 0.54743812441605)
>>> qe.put("Gl", 1.1008112004388)
>>> qe.get()
'Br'
>>> qe.get()
'Gl'
>>> qe.get()
'Sh'
Why is it returning values in this order?
According to the docs, put() takes a single item; the second positional argument is actually block, not a value. Since you passed bare strings, the strings themselves were compared and returned in lexicographic order: 'Br' < 'Gl' < 'Sh'.
A typical pattern for entries is a tuple in the form: (priority_number, data).
So you should pass a tuple to put like this:
>>> q = PriorityQueue()
>>> q.put((10,'ten'))
>>> q.put((1,'one'))
>>> q.put((5,'five'))
>>> q.get()
(1, 'one')
>>> q.get()
(5, 'five')
>>> q.get()
(10, 'ten')
Notice the additional parentheses: each entry is a single tuple argument.
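When two entries can share a priority, the payloads themselves get compared next, which fails for non-comparable types. A common sketch (following the pattern in the heapq docs) adds an insertion counter as a tie-breaker:

```python
from itertools import count
from queue import PriorityQueue  # "Queue" in Python 2

q = PriorityQueue()
counter = count()  # monotonically increasing tie-breaker

for priority, data in [(1, 'one'), (1, 'also one'), (0, 'zero')]:
    # (priority, insertion_order, data): equal priorities fall back to
    # insertion order, so data itself is never compared
    q.put((priority, next(counter), data))

print(q.get()[2])  # zero
print(q.get()[2])  # one (inserted before 'also one')
```

With the counter in the middle, entries of equal priority come out in insertion order.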
Related
For simplicity, we have an array
>>> arr = [1,2,3]
>>> for i in range(len(arr)):
...     print(f'{arr[i]=}')
we get
arr[i]=1
arr[i]=2
arr[i]=3
Would it be possible to expand to output like this
arr[i=0]=1
arr[i=1]=2
arr[i=2]=3
or
arr[0]=1
arr[1]=2
arr[2]=3
In real practice this is for debugging: checking arrays with more than 1000 elements. Neither print(f'{arr[{i=}]=}') nor print(f'{arr[{i}]=}') works for me.
I get the idea, but isn't it much more readable to just do:
for i, x in enumerate(arr):
    print(f'arr[{i}]={x}')
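A short sketch contrasting the two styles: the self-documenting `=` specifier (Python 3.8+) echoes the expression text literally, while an explicit `{i}` field injects the concrete index:

```python
arr = [1, 2, 3]

for i, x in enumerate(arr):
    print(f'{arr[i]=}')     # first pass: arr[i]=1 (expression text echoed literally)
    print(f'arr[{i}]={x}')  # first pass: arr[0]=1 (index interpolated)
```

The `=` specifier can never produce the indexed form on its own, because it repeats the source text of the expression verbatim.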
I'm learning Spark with PySpark and I'm trying different things with the function reduce() to understand it properly, but I did something and got a result that makes no sense to me.
The previous examples I executed with reduce were basic things like:
>>> a = sc.parallelize(['a','b','c','d'])
>>> a.reduce(lambda x,y:x+y)
'abcd'
>>> a = sc.parallelize([1,2,3,4])
>>> a.reduce(lambda x,y:x+y)
10
>>> a = sc.parallelize(['azul','verde','azul','rojo','amarillo'])
>>> aV2 = a.map(lambda x:(x,1))
>>> aRes = aV2.reduceByKey(lambda x,y: x+y)
>>> aRes.collect()
[('rojo', 1), ('azul', 2), ('verde', 1), ('amarillo', 1)]
But I tried this:
>>> a = sc.parallelize(['a','b','c','d'])
>>> a.reduce(lambda x,y:x+x)
'aaaaaaaa'
I was expecting 'aaaa' as the result, not 'aaaaaaaa'.
I looked through the reduce() docs for an answer, but I think I'm missing something.
Thanks!
The x in your lambda keeps changing: it holds the accumulated result so far, so at each step x is
a
aa
aaaa
which gives the final result 'aaaaaaaa'. With your expression the number of characters doubles at every step.
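The same doubling can be reproduced locally with functools.reduce, which mirrors what Spark's reduce does within a single partition (a sketch that ignores Spark's partition-level combining):

```python
from functools import reduce

def double_left(x, y):
    # ignores y entirely; doubles the accumulated value
    return x + x

# trace: ('a','b') -> 'aa'; ('aa','c') -> 'aaaa'; ('aaaa','d') -> 'aaaaaaaa'
print(reduce(double_left, ['a', 'b', 'c', 'd']))  # aaaaaaaa
```

Note that because this function is neither commutative nor associative, Spark's result can also depend on how the data is partitioned; RDD.reduce is only guaranteed to be deterministic for commutative and associative functions.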
How do I replace a python object everywhere with another object?
I have two classes, SimpleObject and FancyObject. I've created a SimpleObject, and have several references to it. Now I'd like to create a FancyObject, and make all those references point to the new object.
a = SimpleObject()
some_list.append(a)
b = FancyObject()
a = b is not what I want; it just changes what a points to. I read that the following would work, but it doesn't: I get the error "Attribute __dict__ is not writable":
a.__dict__ = b.__dict__
What I want is the equivalent of (pseudo-C):
*a = *b
I know this is hacky, but is there any way to accomplish this?
There's no way. It'd let you mutate immutable objects and cause all sorts of nastiness.
x = 1
y = (x,)
z = {x: 3}
magic_replace(x, [1])
# x is now a list!
# The contents of y have changed, and z now has an unhashable key.
x = 1 + 1
# Is x 2, or [1, 1], or something stranger?
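A minimal sketch of the underlying name/object distinction: rebinding a name never affects other references, while mutating the object in place does:

```python
a = {'kind': 'simple'}
refs = [a]                 # a second reference to the same dict
a = {'kind': 'fancy'}      # rebinds only the name a

print(refs[0])             # {'kind': 'simple'}: other references untouched

# mutating in place (rather than rebinding) is visible through every reference
b = {'kind': 'simple'}
refs2 = [b]
b.clear()
b['kind'] = 'fancy'
print(refs2[0])            # {'kind': 'fancy'}
```

This is why a = b cannot retarget existing references, and why an in-place "replace everywhere" would have to mutate the original object, which is impossible for immutable types.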
You can put the object in the global namespace of a separate module and then monkey-patch it when you need to.
objstore.py:
replaceable = object()
sample.py:
import objstore
b = object()
def isB():
    return objstore.replaceable is b

if __name__ == '__main__':
    print(isB())  # False
    objstore.replaceable = b
    print(isB())  # True
P.S. Relying on monkey patching is a symptom of bad design.
PyJack has a function replace_all_refs that replaces all references to an object in memory.
An example from the docs:
>>> item = (100, 'one hundred')
>>> data = {item: True, 'itemdata': item}
>>>
>>> class Foobar(object):
... the_item = item
...
>>> def outer(datum):
... def inner():
... return ("Here is the datum:", datum,)
...
... return inner
...
>>> inner = outer(item)
>>>
>>> print item
(100, 'one hundred')
>>> print data
{'itemdata': (100, 'one hundred'), (100, 'one hundred'): True}
>>> print Foobar.the_item
(100, 'one hundred')
>>> print inner()
('Here is the datum:', (100, 'one hundred'))
Calling replace_all_refs
>>> new = (101, 'one hundred and one')
>>> org_item = pyjack.replace_all_refs(item, new)
>>>
>>> print item
(101, 'one hundred and one')
>>> print data
{'itemdata': (101, 'one hundred and one'), (101, 'one hundred and one'): True}
>>> print Foobar.the_item
(101, 'one hundred and one')
>>> print inner()
('Here is the datum:', (101, 'one hundred and one'))
You have a number of options:
Design this in from the beginning, using the Facade pattern (i.e. every object in your main code is a proxy for something else), or a single mutable container (i.e. every variable holds a list; you can change the contents of the list through any such reference). Advantages are that it works with the execution machinery of the language, and is relatively easily discoverable from the affected code. Downside: more code.
Always refer to the same single variable. This is one implementation of the above. Clean, nothing fancy, clear in code. I would recommend this by far.
Use the debug, gc, and introspection features to hunt down every object meeting your criterion and alter the variables while running. The disadvantage is that the value of variables changes during execution without this being discoverable from the affected code. Even if each change is atomic (eliminating one class of errors), it can change the type of a variable after code has already run that assumed a different type, introducing errors that cannot reasonably be anticipated in that code. For example:
a = iter(b)      # works only if b is iterable at this point
[x for x in b]   # b was iterable on the previous line, but between the two lines it was replaced with an int
More subtly, code that discriminates between string and non-string sequences (the defining quirk of strings is that iterating one yields strings, which are themselves iterable), for example when flattening a structure, may break.
Another answer mentions pyjack, which implements option 3. Although it may work, it has all of the problems mentioned above. It is likely appropriate only for debugging and development.
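Option 1 (the proxy/facade approach) can be sketched in a few lines; the class and method names here are illustrative, not from any library:

```python
class Proxy:
    """Forwards attribute access to a swappable target object."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # only called for names not found on the proxy itself
        return getattr(self._target, name)

    def become(self, new_target):
        # every holder of this proxy now sees the new object
        self._target = new_target


class SimpleObject:
    def who(self):
        return 'SimpleObject'


class FancyObject:
    def who(self):
        return 'FancyObject'


a = Proxy(SimpleObject())
some_list = [a]
a.become(FancyObject())
print(some_list[0].who())  # FancyObject
```

Every holder of the proxy sees the swap immediately, and the proxy type itself is a visible hint in the affected code that this can happen.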
Take advantage of mutable objects such as a list.
a = [SimpleObject()]
some_list.append(a)
b = FancyObject()
a[0] = b
Proof that this works:
class SimpleObject():
    def Who(self):
        print('SimpleObject')

class FancyObject():
    def Who(self):
        print('FancyObject')
>>> a = [SimpleObject()]
>>> a[0].Who()
SimpleObject
>>> some_list = []
>>> some_list.append(a)
>>> some_list[0][0].Who()
SimpleObject
>>> b = FancyObject()
>>> b.Who()
FancyObject
>>> a[0] = b
>>> some_list[0][0].Who()
FancyObject
I need a queue structure that sorts elements (id, value) by value on insert. Also, I need to be able to remove the element with the highest value. I don't need this structure to be thread-safe. In Java, this would, I guess, correspond to PriorityQueue.
What structure should I use in Python? Could you provide a single toy example?
Python has something similar (which is really a thread-safe wrapper around heapq):
from queue import PriorityQueue  # "Queue" in Python 2
q = PriorityQueue()
q.put((-1, 'foo'))
q.put((-3, 'bar'))
q.put((-2, 'baz'))
Instead of the largest, you can get the lowest number with q.get():
>>> q.get()
(-3, 'bar')
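A round-trip sketch of the negation trick, restoring the positive value on the way out (Python 3 module names assumed):

```python
from queue import PriorityQueue

q = PriorityQueue()
for value, name in [(1, 'foo'), (3, 'bar'), (2, 'baz')]:
    q.put((-value, name))  # negate so the largest value surfaces first

neg_value, name = q.get()
print(-neg_value, name)  # 3 bar
```

The queue still pops the smallest entry; negating on the way in and out turns that into largest-first behavior.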
If you don't like negatives, you can override the _get method. Note that max only peeks at the largest entry, so it must also be removed explicitly (an O(n) operation, unlike the heap's O(log n) pop):
import heapq

class PositivePriorityQueue(PriorityQueue):
    def _get(self):
        item = max(self.queue)     # max() only peeks...
        self.queue.remove(item)    # ...so remove the entry explicitly (O(n))
        heapq.heapify(self.queue)  # restore the heap invariant for later puts
        return item
You can use the heapq module.
From docs:
This module provides an implementation of the heap queue algorithm,
also known as the priority queue algorithm.
heapq implements a min-heap, so to pop the largest value you will need to negate it. Also, put the id second, since tuples are compared from left to right.
>>> import heapq
>>> queue = []
>>> heapq.heappush(queue, (-1, 'a'))
>>> heapq.heappush(queue, (-2, 'a'))
>>> heapq.heappop(queue)
(-2, 'a')
I think what you're looking for can be found in the heapq library. From http://docs.python.org/2/library/heapq.html :
Heap elements can be tuples. This is useful for assigning comparison values (such as task priorities) alongside the main record being tracked:
>>> from heapq import heappush, heappop
>>>
>>> h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')
Is this the desired behavior?
I am working with a list of namedtuples. I would like to add a field to each named tuple after it has already been created. It seems I can do that by just referencing it as an attribute (as in namedtuple.attribute = 'foo'), but then it isn't added to the list of fields. Is there any reason why I shouldn't do it this way if I don't do anything with the fields list? Is there a better way to add a field?
>>> from collections import namedtuple
>>> result = namedtuple('Result',['x','y'])
>>> result.x = 5
>>> result.y = 6
>>> (result.x, result.y)
(5, 6)
>>> result.description = 'point'
>>> (result.x, result.y, result.description)
(5, 6, 'point')
>>> result._fields
('x', 'y')
What you do works because namedtuple(...) returns a new class. To actually get a Result object, you instantiate that class. So the correct way is:
Result = namedtuple('Result', ['x', 'y'])
result = Result(5, 6)
And you'll find that adding attributes to these instances does not work. So the reason you shouldn't do it is that it doesn't work: only abusing the class object works, and I hope I don't need to go into detail about why that is a horrible, horrible idea.
Note that regardless of whether you can add attributes to namedtuples or not (and even if you list all attributes you need beforehand), you cannot change a namedtuple object after it's created. Tuples are immutable. So if you need to change objects after creation for any reason, in any way or shape, you can't use namedtuple. You're better off defining a custom class (some of the stuff namedtuple adds for you doesn't even make sense for mutable objects).
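That said, if the only "change" needed is a new value for an existing field, namedtuple already supports making a modified copy via _replace:

```python
from collections import namedtuple

Result = namedtuple('Result', ['x', 'y'])
p = Result(5, 6)

q = p._replace(y=7)  # builds a new tuple; p itself is untouched
print(p)  # Result(x=5, y=6)
print(q)  # Result(x=5, y=7)
```

This keeps the immutability guarantees: existing references to p never see the change.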
Notice that here you're modifying the type of the named tuples, not instances of that type. In this case, you'd probably want to create a new type with an additional field from the old one:
result = namedtuple('Result',result._fields+('point',))
e.g.:
>>> result = namedtuple('Result',['x','y'])
>>> result = namedtuple('Result',result._fields+('point',))
>>> result._fields
('x', 'y', 'point')
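Existing instances can then be migrated to the widened type by unpacking them and supplying the new field (a sketch; the field value 'origin' is illustrative):

```python
from collections import namedtuple

Result = namedtuple('Result', ['x', 'y'])
old = Result(1, 2)

# widen the type with one extra field, reusing the old field list
Result3 = namedtuple('Result', Result._fields + ('point',))
new = Result3(*old, 'origin')  # carry over x and y, set the new field

print(new)  # Result(x=1, y=2, point='origin')
```

Since namedtuples are plain tuples, *old unpacks the old values positionally into the new constructor.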
You can easily concatenate namedtuples, keeping in mind that they are immutable
from collections import namedtuple
T1 = namedtuple('T1', 'a,b')
T2 = namedtuple('T2', 'c,d')
t1 = T1(1,2)
t2 = T2(3,4)
def sum_nt_classes(*args):
    # concatenate the field lists of all the namedtuple classes
    return namedtuple('_', ' '.join(sum(map(lambda t: t._fields, args), ())))

def sum_nt_instances(*args):
    # build the combined class, then fill it with the concatenated values
    return sum_nt_classes(*args)(*sum(args, ()))

print(sum_nt_classes(T1, T2)(5, 6, 7, 8))  # _(a=5, b=6, c=7, d=8)
print(sum_nt_instances(t1, t2))            # _(a=1, b=2, c=3, d=4)
You cannot add a new field to a namedtuple after defining it. The only way is to create a new template and create new namedtuple instances from it.
Analysis
>>> from collections import namedtuple
>>> result = namedtuple('Result',['x','y'])
>>> result
<class '__main__.Result'>
result is not a tuple, but the class which creates tuples.
>>> result.x
<property object at 0x02B942A0>
You create a new tuple like this:
>>> p = result(1, 2)
>>> p
Result(x=1, y=2)
>>> p.x
1
Prints the value x in p.
>>> p.x = 5
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
p.x = 5
AttributeError: can't set attribute
This raises an error because tuples are immutable.
>>> result.x = 5
>>> result
<class '__main__.Result'>
>>> result._fields
('x', 'y')
>>> p = result(1, 2)
>>> p
Result(x=1, y=2)
This doesn't change anything.
>>> result.description = 'point'
>>> result
<class '__main__.Result'>
>>> result._fields
('x', 'y')
This doesn't change anything either.
Solution
>>> result = namedtuple('Result', ['x','y'])
>>> p = result(1, 2)
>>> p
Result(x=1, y=2)
>>> # I need one more field
>>> result = namedtuple('Result',['x','y','z'])
>>> p1 = result(1, 2, 3)
>>> p1
Result(x=1, y=2, z=3)
>>> p
Result(x=1, y=2)