Using heaps for schedulers in Python

The official Python docs mention the following regarding heaps:
A nice feature of this sort is that you can efficiently insert new
items while the sort is going on, provided that the inserted items are
not “better” than the last 0’th element you extracted. This is
especially useful in simulation contexts, where the tree holds all
incoming events, and the “win” condition means the smallest scheduled
time. When an event schedules other events for execution, they are
scheduled into the future, so they can easily go into the heap.
I can only think of the following simple algorithm to implement a scheduler using a heap:
import heapq

# Priority queue using a heap
pq = []
# The first element in each tuple is the time at which the task should run.
task1 = (1, Task(...))
task2 = (2, Task(...))
add_task(pq, task1)
add_task(pq, task2)
# Add a few more root-level tasks
while pq:
    time, next_task = heapq.heappop(pq)
    next_task.perform()
    for child_task in next_task.get_child_tasks():
        # Add new child tasks if available (also (time, task) tuples)
        heapq.heappush(pq, child_task)
In this, where does sorting even come into the picture?
And even if a future child task had a time in the 'past', this algorithm would still work correctly.
So why does the author warn about child events only being scheduled for the future?
And what does this mean:
you can efficiently insert new items while the sort is going on,
provided that the inserted items are not “better” than the last 0’th
element you extracted.

Heaps are the data structure used for priority queues: the fundamental property of a min-heap is that the lowest key is on top (in a max-heap, the highest), so you can always extract the smallest or largest element without searching for it.
You can insert new elements while the sort is going on; look at how heapsort works: each time, you build your heap, extract the maximum value and put it at the end of the array, and then decrement heap.length by 1.
If you have already sorted some numbers, [..., 13, 15, 16], and you insert a new number that is higher than the last element you extracted (13, the 0'th element), you will get a wrong result: the new number, say 14, will be extracted but not put in the right place, [1, 2, 5, 7, 14, 13, 15, 16], because it is swapped into the heap.length position, which lies before 13.
This is obviously wrong, so you can only insert elements that are no "better" than the 0'th element you last extracted.
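To make the scheduler pattern concrete, here is a minimal, self-contained sketch (the event names are invented for illustration): events are (time, name) tuples, and a follow-up event pushed mid-loop is safe precisely because its time is no smaller than the one just popped:

```python
import heapq

# Events are (time, name) tuples, so the heap always yields the
# smallest scheduled time first.
events = []
heapq.heappush(events, (2, "water plants"))
heapq.heappush(events, (1, "make coffee"))

order = []
while events:
    time, name = heapq.heappop(events)
    order.append((time, name))
    # An event may schedule a follow-up, but only in the future:
    # its time is >= the time we just popped, so it never contradicts
    # the order of events already extracted.
    if name == "make coffee":
        heapq.heappush(events, (3, "drink coffee"))

print(order)  # [(1, 'make coffee'), (2, 'water plants'), (3, 'drink coffee')]
```

Note that heappop always re-examines the whole heap, so a live scheduler like this would cope even with a "past" event; the docs' warning is about heapsort-style incremental sorting, where elements already extracted are final and cannot be revisited.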

my Python multiprocesses are apparently not independent

I have a very specific problem with Python parallelisation; let's see if I can explain it.
I want to execute a function foo() using the multiprocessing library for parallelisation.
# Create the n processes (in this case 4) and start them.
# Note the trailing comma: args must be a tuple, so args=(i,) not args=(i).
processes = [multiprocessing.Process(target=foo, args=(i,)) for i in range(n)]
for p in processes:
    p.start()
The foo() function is a recursive function that explores a tree in depth until one specific event happens. Depending on how it expands through the tree, this event can occur in a few steps, for example 5, or even in millions. Each tree node holds a set of elements, and in each step I select a random element from this set with rand_element = random.sample(node.set_of_elements, 1)[0] and make a recursive call accordingly, i.e., two different random elements lead to different tree paths.
The problem is that, for some unknown reason, the processes apparently do not behave independently. For example, if I run 4 processes in parallel, sometimes they return this result:
1, Number of steps: 5
2, Number of steps: 5
3, Number of steps: 5
4, Number of steps: 5
that is to say, all the processes take the "good path" and end in very few steps. On the other hand, other times it returns this:
1, Number of steps: 6516
2, Number of steps: 8463
3, Number of steps: 46114
4, Number of steps: 56312
that is to say, all the processes take "bad paths". I haven't had a single execution in which at least one takes the "good path" while the rest take "bad paths".
If I run foo() multiple times sequentially, more than half of the executions end with fewer than 5000 steps, but in concurrency I don't see this proportion; all the processes end either fast or slow.
How is it possible?
Sorry if I can't give you more precise details about the program and execution, but it is too big and complex to explain here.
I have found the solution; I post it in case someone finds it helpful.
The problem was that at some point inside foo() I had used my_set.pop() instead of my_set.remove(random.sample(my_set, 1)[0]). The first one, my_set.pop(), doesn't actually return a random element: a set's iteration order is determined by the hashes of its elements, so pop() always removes the same "first" element for a given layout. The order only looks random because it can vary between interpreter runs. In my case, all the child processes shared that order, so my_set.pop() returned the same first element in every one of them.
You should use collections.OrderedDict (or another ordered data structure) rather than set if your program cares about item order (as random.sample() does, for example). Even in Python 3.7 and later, at the time of this writing, sets are documented as unordered collections, so they should not be used if the order in which items are inserted or enumerated is important to your program.
With set, you should not expect items to be inserted or enumerated in any particular order, even in a pseudorandom order.
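As a sketch of the safer pattern (the element values here are just for illustration), materialise the set into a sequence before drawing from it, so the choice is driven by the RNG state rather than by the set's internal layout. Note that random.sample() no longer accepts sets as of Python 3.11, which makes the explicit conversion mandatory anyway:

```python
import random

my_set = {'a', 'b', 'c', 'd'}

# sorted() gives a stable sequence; random.choice() then picks from it
# using the generator state, which each process can seed independently.
rand_element = random.choice(sorted(my_set))
my_set.remove(rand_element)

print(len(my_set))  # 3
```
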
See also:
Does Python have an ordered set?
Are dictionaries ordered in Python 3.6+?
https://stackoverflow.com/a/64855489/815724

Python: Better way to write nested for loops and if statements

I am trying to find a more Pythonic way of doing the below.
for employee in get_employees:
    for jobs in employee['jobs']:
        for nemployee in employee_comps:
            if nemployee['employee_id'] == employee['id']:
                for njob in nemployee['hourly_compensations']:
                    if njob['job_id'] == jobs['id']:
                        njob['rate'] = jobs['rate']
It works but seems clunky. I'm new to Python; if there is another thread that will help with this, please direct me there!
The main comment I would make is that you are free to change the order of the outer three for loops, because the operation you perform does not depend on the order in which you loop over them (you are not breaking out of any loop when a match is found). Given that, there is no point in running the jobs loop only to reach an if statement inside it that is independent of the value of jobs. It would be more efficient to put the jobs loop inside the other two, so that it can also sit inside the if: the loop then only runs for those combinations of employee and nemployee where the condition evaluates True.
Beyond this, but less importantly, after this rearrangement the consecutive for statements (over independent iterables) can be replaced with a single loop over an itertools.product iterator, reducing the nesting from four explicit loops to two:
from itertools import product

for employee, nemployee in product(get_employees, employee_comps):
    if nemployee['employee_id'] == employee['id']:
        for jobs, njob in product(employee['jobs'],
                                  nemployee['hourly_compensations']):
            if njob['job_id'] == jobs['id']:
                njob['rate'] = jobs['rate']
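A quick check with hypothetical sample data (the field names follow the question) confirms the rewritten loops perform the same update:

```python
from itertools import product

# Minimal sample records shaped like the question's data
get_employees = [{'id': 1, 'jobs': [{'id': 10, 'rate': 25.0}]}]
employee_comps = [{'employee_id': 1,
                   'hourly_compensations': [{'job_id': 10, 'rate': 0.0}]}]

for employee, nemployee in product(get_employees, employee_comps):
    if nemployee['employee_id'] == employee['id']:
        for jobs, njob in product(employee['jobs'],
                                  nemployee['hourly_compensations']):
            if njob['job_id'] == jobs['id']:
                njob['rate'] = jobs['rate']

print(employee_comps[0]['hourly_compensations'][0]['rate'])  # 25.0
```
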
The code you have is very clean and Pythonic; I would suggest staying with it.
If you want it in one line, this should work, but I don't have data to test it on, so I'm not sure.
[[njob.update({'rate': jobs['rate']}) for njob in nemployee['hourly_compensations'] if njob['job_id'] == jobs['id']] for employee in get_employees for jobs in employee['jobs'] for nemployee in employee_comps if nemployee['employee_id'] == employee['id']]

concurrent collections in python

I was wondering if there is any concurrent structure in Python, like queue, but with the ability to remove a specific element.
Example:
import queue

# With queue it would be
q = queue.Queue()
# Put some element
q.put(elem)
# I want to delete a specific element,
# but Queue does not provide this method
q.remove(elem)
What could I use?
Python lists actually work like what you are looking for. In fact, the translation of your code (which requires no imports) looks like this:
# Create the list
q = [element1, element2, element3...]
# Insert an element at a given position
q.insert(position, element4)
# Insert an element at the end
q.append(element4)
# Remove an element by position
del q[position]
# Remove a specific element by value
q.remove(element4)
So you can manage it as desired.
I hope that helps you.
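One caveat: a plain list is not synchronized, so if the "concurrent" part of the question matters, it needs a lock. Here is a minimal sketch of a thread-safe wrapper using only the standard threading module (the class and method names are invented for illustration):

```python
import threading

class RemovableQueue:
    """A tiny queue-like container that also supports removing a
    specific element. Every operation holds a lock, so instances can
    be shared between threads (a sketch only: no blocking get())."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def put(self, elem):
        with self._lock:
            self._items.append(elem)

    def get(self):
        with self._lock:
            return self._items.pop(0)   # FIFO, like queue.Queue

    def remove(self, elem):
        with self._lock:
            self._items.remove(elem)    # delete a specific element

q = RemovableQueue()
for item in ('a', 'b', 'c'):
    q.put(item)
q.remove('b')                # the operation queue.Queue lacks
print(q.get(), q.get())      # a c
```

queue.Queue itself is built on this same idea (a deque guarded by locks and condition variables); the sketch just adds the remove() the question asks for, without Queue's blocking semantics.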

How can I get all child nodes of general tree in Python

My tree has the following structure:
tree = {'0': ('1', '2', '3'), '1': ('4',), '2': ('5', '6'), '3': (), '4': ('7', '8'), '8': ('9', '10', '11')}
How can I write Python code to retrieve all child nodes of a particular node?
For example, if I give it node 4, the code should retrieve 7,8,9,10,11.
For node 2, it should retrieve 5, 6 and so on.
I just started learning the basics of Python, but I have no idea how to implement this for non-binary trees.
You can use a queue. Push the user's requested value into the queue; then, while the queue isn't empty, pop a value, and if it is a key in the dict, print each of its children and add them to the queue to be checked on a later pass.
import queue

tree = {'0': ('1', '2', '3'), '1': ('4',), '2': ('5', '6'),
        '3': (), '4': ('7', '8'), '8': ('9', '10', '11')}
# Note the trailing comma in ('4',): ('4') is just the string '4', not a tuple.

num = input("what you want ")
q = queue.Queue()
q.put(num)
while not q.empty():
    n = q.get()
    if n in tree:
        for child in tree[n]:
            print(child)
            q.put(child)
Note that if you have a tree with a circular reference, such as tree = {'0': ('1',), '1': ('0',)}, this code will run forever. Be careful!
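For a single-threaded traversal, collections.deque is the more usual queue than queue.Queue (which is designed for communication between threads). The same BFS, collected into a list instead of printed, might look like this:

```python
from collections import deque

tree = {'0': ('1', '2', '3'), '1': ('4',), '2': ('5', '6'),
        '3': (), '4': ('7', '8'), '8': ('9', '10', '11')}

def descendants(node):
    """Collect all descendants of `node`, breadth-first."""
    found = []
    q = deque([node])
    while q:
        n = q.popleft()
        for child in tree.get(n, ()):   # leaves simply have no entry
            found.append(child)
            q.append(child)
    return found

print(descendants('4'))  # ['7', '8', '9', '10', '11']
```
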

How to try all possible paths?

I need to try all possible paths, branching every time I hit a certain point. There are <128 possible paths for this problem, so no need to worry about exponential scaling.
I have a player that can take steps through a field. The player
takes a step, and on a step there could be an encounter.
There are two options when an encounter is found: i) Input 'B' or ii) Input 'G'.
I would like to try both and continue repeating this until the end of the field is reached. The end goal is to have tried all possibilities.
Here is the template, in Python, for what I am talking about (Step object returns the next step using next()):
from row_maker_inlined import Step

def main():
    initial_stats = {'n': 1, 'step': 250, 'o': 13, 'i': 113, 'dng': 0, 'inp': 'Empty'}
    player = Step(initial_stats)
    end_of_field = 128
    # Walk until reaching an encounter:
    while player.step['n'] < end_of_field:
        player.next()
        if player.step['enc']:
            print('An encounter has been reached.')
            # Perform an input on an encounter step:
            player.input = 'B'
            # Make a branch of player?
            # Perform this on the branch:
            # player.input = 'G'
            # Keep doing this, branching on each encounter, until the end is reached.
As you can see, the problem is rather simple; I just have no idea, as a beginner programmer, how to solve it.
I believe I may need to use recursion in order to keep branching, but I really don't understand how one "makes a branch" using recursion, or anything else.
What kind of solution should I be looking at?
You should be looking at search algorithms like breadth-first search (BFS) and depth-first search (DFS).
Wikipedia has this as the pseudo-code implementation of BFS:
procedure BFS(G, v) is
    let Q be a queue
    Q.enqueue(v)
    label v as discovered
    while Q is not empty
        v ← Q.dequeue()
        for all edges from v to w in G.adjacentEdges(v) do
            if w is not labeled as discovered
                Q.enqueue(w)
                label w as discovered
Essentially, when you reach an "encounter", you add that point to the end of your queue. Then you take the FIRST element off the queue and explore it, putting all its children into the queue, and so on. It's a non-recursive solution that is simple enough to do what you want.
DFS is similar, but instead of taking the FIRST element from the queue you take the LAST (i.e. you use it as a stack). This makes you explore a single path all the way to a dead end before coming back to explore another.
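To show only the branching mechanics (the Step/player machinery from the question is omitted, and all names here are invented), an explicit stack of partial input sequences implements the DFS and visits every combination of 'B'/'G':

```python
def all_paths(num_encounters, choices=('B', 'G')):
    """Enumerate every input sequence using an explicit DFS stack."""
    paths = []
    stack = [()]                      # start from an empty partial path
    while stack:
        partial = stack.pop()         # LIFO pop -> depth-first order
        if len(partial) == num_encounters:
            paths.append(partial)     # a complete path: record it
        else:
            for c in choices:         # branch: try both inputs here
                stack.append(partial + (c,))
    return paths

print(len(all_paths(3)))  # 8 paths for 3 encounters (2**3)
```

In the question's setting, each partial path would additionally replay the player's steps up to its latest encounter before branching again, so the walking logic lives inside the loop body rather than in a recursive call.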
Good luck!
