I have designed a circular priority queue, but it took me a while because the code is full of conditionals and the time complexity is higher than I'd like.
I implemented it using a list, but I need a more efficient circular priority queue implementation.
I'll show my queue structure below; it may be helpful for anyone looking for code to understand circular priority queues.
class PriorityQueue:
    def __init__(self, n, key=None):
        if key is None:
            key = lambda x: x
        self.maxsize = n
        self.key = key
        self.arr = list(range(self.maxsize))  # placeholder storage
        self.rear = -1
        self.front = 0
        self.nelements = 0

    def isPQueueful(self):
        return self.nelements == self.maxsize

    def isPQueueempty(self):
        return self.nelements == 0

    def insert(self, item):
        if not self.isPQueueful():
            pos = self.rear + 1
            # negative indices walk the occupied slots backwards,
            # which handles the circular wrap-around
            scope = range(self.rear - self.maxsize, self.front - self.maxsize - 1, -1)
            if self.rear == 0 and self.rear < self.front:
                scope = range(0, self.front - self.maxsize - 1, -1)
            for i in scope:
                # shift lower-priority items one slot towards the rear
                if self.key(item) > self.key(self.arr[i]):
                    self.arr[i + 1] = self.arr[i]
                    pos = i
                else:
                    break
            self.rear += 1
            if self.rear == self.maxsize:
                self.rear = 0
            if pos == self.maxsize:
                pos = 0
            self.arr[pos] = item
            self.nelements += 1
        else:
            print("Priority Queue is full")

    def remove(self):
        revalue = None
        if not self.isPQueueempty():
            revalue = self.arr[self.front]
            self.front += 1
            if self.front == self.maxsize:
                self.front = 0
            self.nelements -= 1
        else:
            print("Priority Queue is empty")
        return revalue
I would really appreciate it if someone could say whether what I designed is suitable for use in production code. I suspect it is not very efficient.
If so, can you point out how to design an efficient circular priority queue?
So, think of the interface and implementation separately.
The interface to a circular priority queue will make you think that the structure is a circular queue. It has a "highest" priority head and the next one is slightly lower, and then you get to the end, and the next one is the head again.
The methods you write need to act that way.
But the implementation doesn't actually need to be any kind of queue, list, array or linear structure.
For the implementation, you are trying to maintain a set of nodes that are always sorted by priority. For that, it would be better to use some kind of balanced tree (for example a red-black tree).
You hide that detail below your interface -- when you get to the end, you just reset yourself to the beginning -- your interface makes it look circular.
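As a minimal sketch of that separation (the class name is illustrative; a plain sorted list maintained with bisect stands in for the balanced tree, so insertion is still O(n) -- a red-black tree or skip list would bring that down to O(log n) without changing the interface):

import bisect

class CircularPriorityQueue:
    """Circular view over items kept sorted by priority (highest first)."""

    def __init__(self, key=None):
        self.key = key or (lambda x: x)
        self._keys = []    # negated priorities, ascending, for bisect
        self._items = []   # items sorted from highest to lowest priority
        self._cursor = 0   # current position in the circular traversal

    def insert(self, item):
        k = -self.key(item)  # negate so index 0 holds the highest priority
        pos = bisect.bisect_right(self._keys, k)
        self._keys.insert(pos, k)
        self._items.insert(pos, item)

    def next(self):
        """Return the next item in priority order, wrapping to the head."""
        if not self._items:
            raise IndexError("queue is empty")
        item = self._items[self._cursor]
        self._cursor = (self._cursor + 1) % len(self._items)
        return item

The callers only ever see insert() and next(); swapping the sorted list for a real balanced tree changes nothing above this interface.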
I'm writing a Python program that does some operations on combinational circuits, like comparing them for equality to other circuits, merging gates, counting gates, counting connections, finding fanout gates, and so on.
Right now I'm representing the combinational circuits in the following way
(I also added the test for equality):
class Circuit:
    def __init__(self):
        self.gates = {}  # key = the gate's number, value = the gate

    def __eq__(self, other):
        if set(self.gates.keys()) != set(other.gates.keys()):
            return False
        for key in self.gates.keys():
            if self.gates[key] != other.gates[key]:
                return False
        return True

class Gate:
    def __init__(self, gate_type, number):
        self.gate_type = gate_type  # and, or, nand, nor, xor, xnor
        self.number = number
        self.incoming_gates = []
        self.outgoing_gates = []

    def __eq__(self, other):
        # I know this is not correct, but in my case correct enough
        return (
            self.gate_type == other.gate_type
            and self.number == other.number
            and len(self.incoming_gates) == len(other.incoming_gates)
            and len(self.outgoing_gates) == len(other.outgoing_gates)
        )
My representation in code seems very laborious to me, so I am looking for a better way to do this. I have searched for best practices on this but didn't find anything.
You're looking to implement a directed graph with certain data stored in the vertices. Wikipedia has a discussion of the various ways to represent a graph, and there's a Stack Overflow question discussing the more general problem.
For quickly modifying the topology of the graph, and for operations like merging gates, an adjacency list like you have is often useful.
In general I think the test of an architecture comes when you actually start to implement against it: I'd suspect you'll become very familiar with the benefits and drawbacks of your design quickly once you start using it, and you'll be able to adjust or build helper functions as needed.
You could avoid redundancy in the Gate class by only storing the inbound gate references but that would make the rest of your code more complex to implement. I believe the tradeoff of redundancy vs ease of use should weigh in favour of ease of use.
I don't know how you implement the connections between the gates, but if you hold object references in self.incoming_gates / self.outgoing_gates, you can probably define a gate based only on its incoming links and update each source's outgoing_gates list with self automatically, possibly in the constructor itself, as sketched below.
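A sketch of that idea (the constructor signature is hypothetical; it assumes the lists hold direct object references):

class Gate:
    def __init__(self, gate_type, number, incoming=None):
        self.gate_type = gate_type
        self.number = number
        self.incoming_gates = list(incoming or [])
        self.outgoing_gates = []
        # keep the reverse links consistent without extra bookkeeping
        for source in self.incoming_gates:
            source.outgoing_gates.append(self)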
I have tried to implement a BST. As of now it only adds keys according to the BST property (left is lower, right is bigger), though I implemented it in a different way.
This is how I think BSTs are supposed to be:
Single Direction BST
How I have implemented my BST:
Bi-Directional BST
The question is whether or not this is a correct implementation of a BST.
(The way I see it, in a double-linked BST it would be easier to search, delete and insert.)
import pdb

class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.left_child = None
        self.right_child = None

class BST:
    def __init__(self, root=None):
        self.root = root

    def add(self, value):
        # pdb.set_trace()
        new_node = Node(value)
        self.tp = self.root
        if self.root is not None:
            while True:
                if self.tp.parent is None:
                    break
                else:
                    self.tp = self.tp.parent
            # the self.tp variable now points at the topmost node
            while True:
                if new_node.value >= self.tp.value:
                    if self.tp.right_child is None:
                        new_node.parent = self.tp
                        self.tp.right_child = new_node
                        break
                    elif self.tp.right_child is not None:
                        self.tp = self.tp.right_child
                        print("Going Down Right")
                        print(new_node.value)
                elif new_node.value < self.tp.value:
                    if self.tp.left_child is None:
                        new_node.parent = self.tp
                        self.tp.left_child = new_node
                        break
                    elif self.tp.left_child is not None:
                        self.tp = self.tp.left_child
                        print("Going Down Left")
                        print(new_node.value)
        # root is reset to the newly added node; add() climbs back
        # to the top via the parent links on the next call
        self.root = new_node

newBST = BST()
newBST.add(9)
newBST.add(10)
newBST.add(2)
newBST.add(15)
newBST.add(14)
newBST.add(1)
newBST.add(3)
Edit: I have used while loops instead of recursion. Could someone please elaborate on why using while loops instead of recursion is a bad idea in this particular case, and in general?
BSTs with parent links are used occasionally.
The benefit is not that the links make it easier to search or update (they don't really), but that you can insert before or after any given node, or traverse forward or backward from that node, without having to search from the root.
It becomes convenient to use a pointer to a node to represent a position in the tree, instead of a full path, even when the tree contains duplicates, and that position remains valid as updates or deletions are performed elsewhere.
In an abstract data type, these properties make it easy, for example, to provide iterators that aren't invalidated by mutations.
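For example, here is a sketch of an in-order successor that starts from an arbitrary node, using the left_child/right_child/parent attributes from the question's Node class; no search from the root is needed:

def successor(node):
    """Next node in sorted order, found without starting at the root."""
    if node.right_child is not None:
        # leftmost node of the right subtree
        node = node.right_child
        while node.left_child is not None:
            node = node.left_child
        return node
    # otherwise climb until we come up from a left child
    while node.parent is not None and node is node.parent.right_child:
        node = node.parent
    return node.parent  # None if node was the maximum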
You haven't described how you gain anything with the parent pointer. An algorithm that cares about rewinding to the parent node will do so by crawling back up the call stack.
I've been there -- in my data structures class, I implemented my structures with bi-directional pointers. When we got to binary trees, those pointers ceased to be useful. Proper use of recursion replaces the need to follow a link back up the tree.
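To illustrate (a minimal sketch, not the question's exact classes): a recursive insert never needs a parent pointer, because the call stack remembers the path back up.

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, value):
    """Insert value under node; return the (possibly new) subtree root."""
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node

root = None
for v in (9, 10, 2, 15, 14, 1, 3):
    root = insert(root, v)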
Is there a way to check whether an object is dependent on another object via parenting, constraints, or connections? I would like to do this check prior to parenting an object, to see whether it would cause dependency cycles.
I remember 3ds Max had a command to do exactly this. I checked OpenMaya but couldn't find anything. There is cmds.cycleCheck, but that only works when there currently is a cycle, which would be too late for me to use.
The tricky thing is that these 2 objects could be anywhere in the scene hierarchy, so they may or may not have direct parenting relationships.
EDIT
It's relatively easy to check if the hierarchy will cause any issues:
children = cmds.listRelatives(obj1, ad=True, f=True)
if children and obj2 in children:  # listRelatives returns None if childless
    print("Can't parent to its own children!")
Checking for constraints or connections is another story though.
Depending on what you're looking for, cmds.listHistory or cmds.listConnections will tell you what's coming in to a given node. listHistory is limited to the subset of possible connections that drive shape-node changes, so if you're interested in constraints you'll need to traverse the listConnections for your node and see what's upstream. The list can be arbitrarily large, because it may include lots of hidden nodes (unit conversions, group parts, and so on) that you probably don't care about.
Here's a simple way to trawl the incoming connections of a node and get a tree of incoming connections:
def input_tree(root_node):
    visited = set()  # so we don't get into loops

    # recursively extract input connections
    def upstream(node, depth=0):
        if node not in visited:
            visited.add(node)
            children = cmds.listConnections(node, s=True, d=False)
            if children:
                grandparents = ()
                for history_node in children:
                    grandparents += tuple(d for d in upstream(history_node, depth + 1))
                yield node, tuple(g for g in grandparents if len(g))

    # unfold the recursive generation of the tree
    tree_iter = tuple(i for i in upstream(root_node))
    # return the grandparent array of the first node
    return tree_iter[0][-1]
Which should produce a nested list of input connections like
((u'pCube1_parentConstraint1',
((u'pSphere1',
((u'pSphere1_orientConstraint1', ()),
(u'pSphere1_scaleConstraint1', ()))),)),
(u'pCube1_scaleConstraint1', ()))
in which each level contains a list of inputs. You can then trawl through that to see if your proposed change includes that item.
This won't tell you if the connection will cause a real cycle, however: that's dependent on the data flow within the different nodes. Once you identify the possible cycle you can work your way back to see if the cycle is real (two items affecting each other's translation, for example) or harmless (I affect your rotation and you affect my translation).
This is not the most elegant approach, but it's a quick and dirty way that seems to be working OK so far. The idea is that if a cycle happens, you just undo the operation and stop the rest of the script. Testing with a rig, it catches the cycle no matter how complex the connections are.
import maya.cmds as cmds

# Class to use to undo operations
class UndoStack(object):
    def __init__(self, inputName=''):
        self.name = inputName

    def __enter__(self):
        cmds.undoInfo(openChunk=True, chunkName=self.name, length=300)

    def __exit__(self, type, value, traceback):
        cmds.undoInfo(closeChunk=True)

# Create a sphere and a box
mySphere = cmds.polySphere()[0]
myBox = cmds.polyCube()[0]

# Parent the box to the sphere
myBox = cmds.parent(myBox, mySphere)[0]

# Set a constraint from the sphere to the box (will cause a cycle)
with UndoStack("Parent box"):
    cmds.parentConstraint(myBox, mySphere)

# If there's a cycle, undo it
hasCycle = cmds.cycleCheck([mySphere, myBox])
if hasCycle:
    cmds.undo()
    cmds.warning("Can't do this operation, a dependency cycle has occurred!")
I have a thread which is updating a list called l. Am I right in saying that it is thread-safe to do the following from another thread?
filter(lambda x: x[0] == "in", l)
If it's not thread-safe, is this then the correct approach:
import threading
import time
import Queue

class Logger(threading.Thread):
    def __init__(self, log):
        super(Logger, self).__init__()
        self.log = log
        self.data = []
        self.finished = False
        self.data_lock = threading.Lock()

    def run(self):
        while not self.finished:
            try:
                with self.data_lock:
                    self.data.append(self.log.get(block=True, timeout=0.1))
            except Queue.Empty:
                pass

    def get_data(self, cond):
        with self.data_lock:
            d = filter(cond, self.data)
        return d

    def stop(self):
        self.finished = True
        self.join()
        print("Logger stopped")
where the get_data(self, cond) method is used to retrieve a small subset of the data in self.data in a thread-safe manner.
First, to answer your question in the title: filter is just a function. Hence, its thread-safety will rely on the data-structure you use it with.
As pointed out in the comments already, list operations themselves are thread-safe in CPython and protected by the GIL, but that is arguably only an implementation detail of CPython that you shouldn't rely on. And even if you could rely on it, the thread safety of individual list operations is probably not the kind of thread safety you actually need:
The problem is that iterating over a sequence with filter is in general not an atomic operation. The sequence could be changed during iteration. Depending on the data-structure underlying your iterator this might cause more or less weird effects. One way to overcome this problem is by iterating over a copy of the sequence that is created with one atomic action. Easiest way to do this for standard sequences like tuple, list, string is with the slice operator like this:
filter(lambda x: x[0] == "in", l[:])
Apart from this not necessarily being thread-safe for other data-types, there's one problem with this though: it's only a shallow copy. As your list's elements seem to be list-like as well, another thread could in parallel do del l[1000][:] to empty one of the inner lists (which are pointed to in your shallow copy as well). This would make your filter expression fail with an IndexError.
All that said, it's not a shame to use a lock to protect access to your list and I'd definitely recommend it. Depending on how your data changes and how you work with the returned data, it might even be wise to deep-copy the elements while holding the lock and to return those copies. That way you can guarantee that once returned the filter condition won't suddenly change for the returned elements.
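A sketch of that suggestion (the snapshot helper is hypothetical; it assumes the same lock also guards all writers of the list):

import copy
import threading

lock = threading.Lock()
l = []  # shared list, appended to by another thread under the same lock

def snapshot(cond):
    # Deep-copy under the lock, so the returned elements cannot change
    # after the filter condition has been evaluated.
    with lock:
        return [copy.deepcopy(x) for x in l if cond(x)]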
Regarding your Logger code: I'm not 100% sure how you plan to use this, and whether it's critical for you to run several threads on one queue and join them. What looks weird to me is that you never use Queue.task_done() (assuming that its self.log is a Queue). Also, your polling of the queue is potentially wasteful. If you don't need the join of the thread, I'd suggest at least turning the lock acquisition around:
class Logger(threading.Thread):
    def __init__(self, log):
        super(Logger, self).__init__()
        self.daemon = True
        self.log = log
        self.data = []
        self.data_lock = threading.Lock()

    def run(self):
        while True:
            l = self.log.get()  # thread will sleep here indefinitely
            with self.data_lock:
                self.data.append(l)
            self.log.task_done()

    def get_data(self, cond):
        with self.data_lock:
            d = filter(cond, self.data)
            # maybe deepcopy d here
        return d
Externally you could still do log.join() to make sure that all of the elements of the log queue are processed.
If one thread writes to a list and another thread reads that list, the two must be synchronized. It doesn't matter for that aspect whether the reader uses filter(), an index or iteration or whether the writer uses append() or any other method.
In your code, you achieve the necessary synchronization using a threading.Lock. Since you only access the list within the context of with self.data_lock, the accesses are mutually exclusive.
In summary, your code is formally correct concerning the list handling between threads. But:
You access self.finished without the lock, which is problematic. Assigning to that member changes the object's attribute mapping, so this access should be synchronized too. In practice it won't hurt here, because True and False are global constants; at worst you'll see a short delay between one thread setting the state and the other seeing it. It remains bad, though, because it is habit-forming.
As a rule, when you use a lock, always document which objects this lock protects. Also, document which object is accessed by which thread. The fact that self.finished is shared and requires synchronization would have been obvious. Also, making a visual distinction between public functions and data and private ones (beginning with an _underscore, see PEP 8) helps keeping track of this. It also helps other readers.
A similar issue is your baseclass. In general, inheriting from threading.Thread is a bad idea. Rather, include an instance of the thread class and give it a function like self._main_loop to run on. The reason is that you say that your Logger is a Thread and that all of its baseclass' public members are also public members of your class, which is probably a much wider interface than what you intended.
You should never block while holding a lock. In your code, you block in self.log.get(block=True, timeout=0.1) while holding the lock. During that time, even if nothing actually happens, no other thread will be able to complete a call to get_data(). There is just a tiny window between unlocking the mutex and locking it again in which a caller of get_data() does not have to wait, which is very bad for performance. I could even imagine that your question is motivated by the really bad performance this causes. Instead, call log.get(..) without the lock; it shouldn't need one. Then, with the lock held, append the data to self.data and check self.finished.
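Putting those points together, a minimal sketch of the composition approach (the class shape is illustrative; it assumes log is a Queue, and the thread becomes an implementation detail rather than a base class):

import threading

class Logger(object):
    """Owns a worker thread instead of inheriting from Thread,
    so only get_data() is part of the public interface."""

    def __init__(self, log):
        self._log = log
        self._data = []
        self._data_lock = threading.Lock()
        self._thread = threading.Thread(target=self._main_loop)
        self._thread.daemon = True
        self._thread.start()

    def _main_loop(self):
        while True:
            item = self._log.get()  # block without holding the lock
            with self._data_lock:
                self._data.append(item)
            self._log.task_done()

    def get_data(self, cond):
        with self._data_lock:
            return list(filter(cond, self._data))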
Sorry for such a silly question, but the Python docs are confusing...
Link 1: Queue Implementation
http://docs.python.org/library/queue.html
It says that Queue has a class for a priority queue, but I could not find how to use it:
class Queue.PriorityQueue(maxsize=0)
Link 2: Heap Implementation
http://docs.python.org/library/heapq.html
Here they say that we can implement priority queues indirectly using heapq
import itertools
from heapq import heappush, heappop

pq = []                      # list of entries arranged in a heap
entry_finder = {}            # mapping of tasks to entries
REMOVED = '<removed-task>'   # placeholder for a removed task
counter = itertools.count()  # unique sequence count

def add_task(task, priority=0):
    'Add a new task or update the priority of an existing task'
    if task in entry_finder:
        remove_task(task)
    count = next(counter)
    entry = [priority, count, task]
    entry_finder[task] = entry
    heappush(pq, entry)

def remove_task(task):
    'Mark an existing task as REMOVED. Raise KeyError if not found.'
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    'Remove and return the lowest priority task. Raise KeyError if empty.'
    while pq:
        priority, count, task = heappop(pq)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError('pop from an empty priority queue')
Which is the most efficient priority queue implementation in Python? And how to implement it?
There is no such thing as a "most efficient priority queue implementation" in any language.
A priority queue is all about trade-offs. See http://en.wikipedia.org/wiki/Priority_queue
You should choose one of these two, based on how you plan to use it:
O(log(N)) insertion time and O(1) (findMin+deleteMin)* time, or
O(1) insertion time and O(log(N)) (findMin+deleteMin)* time
(* side note: the findMin time of most queues is almost always O(1), so here I mostly mean the deleteMin time: it can either be O(1) quick if the insertion time is O(log(N)) slow, or the deleteMin time must be O(log(N)) slow if the insertion time is O(1) fast. One should note that both may also be unnecessarily slow, as with binary-tree based priority queues.)
In the latter case, you can choose to implement a priority queue with a Fibonacci heap: http://en.wikipedia.org/wiki/Heap_(data_structure)#Comparison_of_theoretic_bounds_for_variants (as you can see, heapq, which is basically a binary tree, must necessarily have O(log(N)) for both insertion and findMin+deleteMin).
If you are dealing with data with special properties (such as bounded data), then you can achieve O(1) insertion and O(1) findMin+deleteMin time. You can only do this with certain kinds of data, because otherwise you could abuse your priority queue to violate the O(N log(N)) bound on sorting. vEB trees fall under a similar category, since you have a maximum set size (the O(log(log(M))) refers not to the number of elements but to the maximum number of elements), and thus you cannot circumvent the theoretical O(N log(N)) general-purpose comparison-sorting bound.
To implement any queue in any language, all you need is to define the insert(value) and extractMin() -> value operations. This generally just involves a minimal wrapping of the underlying heap; see http://en.wikipedia.org/wiki/Fibonacci_heap to implement your own, or use an off-the-shelf library for a similar heap such as a pairing heap (a Google search turned up http://svn.python.org/projects/sandbox/trunk/collections/pairing_heap.py ).
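For instance, a minimal sketch of such a wrapper over the standard heapq module (the class name is hypothetical):

import heapq

class MinPQ(object):
    """Minimal priority-queue interface wrapped around heapq."""

    def __init__(self):
        self._heap = []

    def insert(self, value):
        heapq.heappush(self._heap, value)  # O(log N)

    def extract_min(self):
        return heapq.heappop(self._heap)   # O(log N)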
If you only care about which of the two you referenced is more efficient (the heapq-based code from http://docs.python.org/library/heapq.html#priority-queue-implementation-notes which you included above, versus Queue.PriorityQueue), then:
There doesn't seem to be any easily findable discussion on the web as to what Queue.PriorityQueue is actually doing; you would have to source-dive into the code, which is linked from the help documentation: http://hg.python.org/cpython/file/2.7/Lib/Queue.py
def _put(self, item, heappush=heapq.heappush):
    heappush(self.queue, item)

def _get(self, heappop=heapq.heappop):
    return heappop(self.queue)
As we can see, Queue.PriorityQueue is also using heapq as the underlying mechanism; therefore they are equally bad (asymptotically speaking). Queue.PriorityQueue additionally supports concurrent access, so I would wager it carries slightly more constant-factor overhead. Because you know the underlying implementation (and the asymptotic behavior) must be the same, the simplest way to compare them is simply to run both on the same large dataset.
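A rough benchmarking sketch of that comparison (using the Python 3 module name queue for Queue; the workload and sizes are illustrative only):

import heapq
import queue   # Python 3 name of the Queue module
import random
import timeit

data = [random.random() for _ in range(100000)]

def use_heapq():
    h = []
    for x in data:
        heapq.heappush(h, x)
    while h:
        heapq.heappop(h)

def use_priority_queue():
    q = queue.PriorityQueue()
    for x in data:
        q.put(x)
    while not q.empty():
        q.get()

print("heapq:        ", timeit.timeit(use_heapq, number=3))
print("PriorityQueue:", timeit.timeit(use_priority_queue, number=3))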
(Do note that Queue.PriorityQueue does not seem to have a way to remove entries, while heapq does. However this is a double-edged sword: Good priority queue implementations might possibly allow you to delete elements in O(1) or O(log(N)) time, but if you use the remove_task function you mention, and let those zombie tasks accumulate in your queue because you aren't extracting them off the min, then you will see asymptotic slowdown which you wouldn't otherwise see. Of course, you couldn't do this with Queue.PriorityQueue in the first place, so no comparison can be made here.)
The version in the Queue module is implemented using the heapq module, so they have equal efficiency for the underlying heap operations.
That said, the Queue version is slower because it adds locks, encapsulation, and a nice object oriented API.
The priority queue suggestions shown in the heapq docs are meant to show how to add additional capabilities to a priority queue (such as sort stability and the ability to change the priority of a previously enqueued task). If you don't need those capabilities, then the basic heappush and heappop functions will give you the fastest performance.
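For reference, a minimal example of that bare-bones approach, using (priority, task) tuples so the heap orders by priority:

import heapq

pq = []
heapq.heappush(pq, (2, "write spec"))
heapq.heappush(pq, (1, "review PR"))
heapq.heappush(pq, (3, "deploy"))

while pq:
    priority, task = heapq.heappop(pq)
    print(priority, task)  # pops in ascending priority: 1, 2, 3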
Although this question has already been answered and marked accepted, here is a simple custom implementation of a priority queue, without using any module, to show how it works.
# class for a node with data and priority
class Node:
    def __init__(self, info, priority):
        self.info = info
        self.priority = priority

# class for the priority queue
class PriorityQueue:
    def __init__(self):
        self.queue = list()
        # if you want, you can set a maximum size for the queue

    def insert(self, node):
        # if queue is empty
        if self.size() == 0:
            # add the new node
            self.queue.append(node)
        else:
            # traverse the queue to find the right place for the new node
            for x in range(0, self.size()):
                # if the priority of the new node is greater
                if node.priority >= self.queue[x].priority:
                    # if we have traversed the complete queue
                    if x == (self.size() - 1):
                        # add the new node at the end
                        self.queue.insert(x + 1, node)
                    else:
                        continue
                else:
                    self.queue.insert(x, node)
                    return True

    def delete(self):
        # remove the first node from the queue
        return self.queue.pop(0)

    def show(self):
        for x in self.queue:
            print(str(x.info) + " - " + str(x.priority))

    def size(self):
        return len(self.queue)
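A quick usage example (the queue keeps nodes sorted by ascending priority value, so delete() returns the lowest priority number first):

pq = PriorityQueue()
pq.insert(Node("task A", 3))
pq.insert(Node("task B", 1))
pq.insert(Node("task C", 2))
pq.show()                # task B - 1, task C - 2, task A - 3
print(pq.delete().info)  # task B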
Find the complete code and explanation here: https://www.studytonight.com/post/implementing-priority-queue-in-python (Updated URL)