Removing 2nd item from a queue, using another queue as an ADT - python

class Queue:
    def __init__(self):
        self._contents = []

    def enqueue(self, obj):
        self._contents.append(obj)

    def dequeue(self):
        return self._contents.pop(0)

    def is_empty(self):
        return self._contents == []
class remove_2nd(Queue):
    def dequeue(self):
        first_item = Queue.dequeue(self)
        # Condition if the queue length isn't greater than two
        if self.is_empty():
            return first_item
        else:
            # Second item to return
            second_item = Queue.dequeue(self)
            # Add back the first item to the queue (stuck here)
The remove_2nd class is basically a queue, except that if the length of the queue is greater than two, every dequeue removes the 2nd item. If it isn't, it behaves like a normal queue. I am only allowed to use the methods of the Queue class to finish remove_2nd.
My algorithm:
If the queue is bigger than two:
Let's say my queue is 1 2 3 4
I would first remove the first item, so it becomes
2 3 4
I would then remove the 2nd item, and that will be the returned value, so then it will be
3 4
I would then add back the first item as wanted:
1 3 4
The problem is, I don't know how to add it back. Enqueue puts it at the end, so it would basically become 3 4 1. I was thinking of reversing the 3 4, but I don't know how to do that either. Any help?
Just want to point out: I'm not allowed to access _contents directly or to create my own private variables for the remove_2nd class. This should strictly be done using the Queue ADT.

One suggested workaround is to give the Queue class an insert method (note that this uses _contents directly, which the constraints above rule out):

def insert(self, position, element):
    self._contents.insert(position, element)

To get the queue back in the right order after removing the first two elements, you'll need to remove all the other elements as well. Once the queue is empty, you can add back the first element and all the other elements one by one.
How exactly you keep track of the values you're removing until you can add them again is a somewhat tricky question that depends on the rules of your assignment. If you can use Python's normal types (as local variables, not as new attributes for your class), you can put them in a list or a deque from the collections module. But you can also just use another Queue instance (an instance of the base type, not your subclass).
Try something like this in your else clause:
second_item = Queue.dequeue(self)  # note, this could be written super().dequeue()
temp = Queue()
while not self.is_empty():
    temp.enqueue(Queue.dequeue(self))
self.enqueue(first_item)
while not temp.is_empty():
    self.enqueue(temp.dequeue())
return second_item
As I commented in the code, Queue.dequeue(self) can be written more "pythonically" using the super builtin. The exact details of the call depend on which version of Python you're using (Python 3's super is much fancier than Python 2's version).
In Python 2, you have to explicitly pass your current class and self, so the call would be super(remove_2nd, self).dequeue(). In Python 3, you simply use super().dequeue() and it "magically" takes care of everything (in reality, the compiler figures out the class at compile time and adds some extra code to let it find self at run time).
For your simple code with only basic inheritance, there's no difference between using super or explicitly looking up the base class by name. But in more complicated situations, using super is very important. If you ever use multiple inheritance, calling overridden methods with super is often the only way to get things to work sanely.
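Putting it all together, the full override might look like this (a sketch assembled from the snippets above, written with Python 3's super()):

class remove_2nd(Queue):
    def dequeue(self):
        first_item = super().dequeue()
        if self.is_empty():
            # Only one item was queued; behave like a normal dequeue.
            return first_item
        second_item = super().dequeue()
        # Drain the rest into a temporary queue, then rebuild the
        # original order with the first item back at the front.
        temp = Queue()
        while not self.is_empty():
            temp.enqueue(super().dequeue())
        self.enqueue(first_item)
        while not temp.is_empty():
            self.enqueue(temp.dequeue())
        return second_item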


Python concurrency with concurrent.futures.ThreadPoolExecutor

Consider the following snippet:
import concurrent.futures
import time
from random import random

class Test(object):
    def __init__(self):
        self.my_set = set()

    def worker(self, name):
        temp_set = set()
        temp_set.add(name)
        temp_set.add(name * 10)
        time.sleep(random() * 5)
        temp_set.add(name * 10 + 1)
        self.my_set = self.my_set.union(temp_set)  # question 1
        return name

    def start(self):
        result = []
        names = [1, 2, 3, 4, 5, 6, 7]
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(names)) as executor:
            futures = [executor.submit(self.worker, x) for x in names]
            for future in concurrent.futures.as_completed(futures):
                result.append(future.result())  # question 2
Is there a chance self.my_set can become corrupted via the line marked "question 1"? I believe union is atomic, but couldn't the assignment be a problem?
Is there a problem on the line marked "question 2"? I believe the list append is atomic, so perhaps this is ok.
I've read these docs:
https://docs.python.org/3/library/stdtypes.html#set
https://web.archive.org/web/20201101025814id_/http://effbot.org/zone/thread-synchronization.htm
Is Python variable assignment atomic?
https://docs.python.org/3/glossary.html#term-global-interpreter-lock
and executed the snippet code provided in this question, but I can't find a definitive answer to how concurrency should work in this case.
Regarding question 1: Think about what's going on here:
self.my_set = self.my_set.union(temp_set)
There's a sequence of at least three distinct steps:
1. The worker call grabs a copy of self.my_set (a reference to the set object).
2. The union method constructs a new set.
3. The worker assigns self.my_set to refer to the newly constructed set.
So what happens if two or more workers concurrently try to do the same thing? (note: it's not guaranteed to happen this way, but it could happen this way.)
Each of them could grab a reference to the original my_set.
Each of them could compute a new set, consisting only of the original members of my_set plus its own contribution.
Each of them could assign its new set to the my_set variable.
The problem is in step three. If it happened this way, then each of those new sets would only contain the contribution from the one worker that created it. There would be no single set containing the new contributions from all of the workers. When it's all over, my_set would only refer to one of those new sets—whichever thread was the last to perform the assignment would "win"—and the other new sets would all be thrown away.
One way to prevent that would be to use mutual exclusion to keep other threads from trying to compute their new sets and update the shared variable at the same time:
import threading

class Test(object):
    def __init__(self):
        self.my_set = set()
        self.my_set_mutex = threading.Lock()

    def worker(self, name):
        ...
        with self.my_set_mutex:
            self.my_set = self.my_set.union(temp_set)
        return name
Regarding question 2: It doesn't matter whether or not appending to a list is "atomic." The result variable is local to the start method. In the code that you've shown, the list to which result refers is inaccessible to any other thread than the one that created it. There can't be any interference between threads unless you share the list with other threads.
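Since each worker already hands its result back through a future, a lock-free variation (my sketch, not the original snippet) is to keep all mutation of my_set in the main thread:

def worker(self, name):
    # Build and return a private set; no shared state is touched here.
    temp_set = {name, name * 10, name * 10 + 1}
    return temp_set

def start(self):
    names = [1, 2, 3, 4, 5, 6, 7]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(names)) as executor:
        futures = [executor.submit(self.worker, x) for x in names]
        for future in concurrent.futures.as_completed(futures):
            # Only the main thread ever touches my_set, so no lock is needed.
            self.my_set |= future.result()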

Questions about "yield from" and "next" behaviour

So I am making a generator from a list but would like to call next on it, which should just return the next item in the list. However, it returns the same object; i.e. the whole piece of code runs again instead of resuming at the yield. The example below shows the expected behaviour when looping through the list, but then next returns 1 twice, whereas I would like the second call of next to return 2.
class demo:
    @property
    def mygen(self):
        a = [1, 2, 3, 4, 5]
        b = [6, 7, 8, 9, 10]
        yield from a
        yield from b

if __name__ == '__main__':
    demo1 = demo()
    print([_ for _ in demo1.mygen])
    demo2 = demo()
    print(next(demo2.mygen))
    print(next(demo2.mygen))
There's a reason I am turning a list into a generator: it is the response from an API call, and I would like to dynamically return the next item in the list and make another API call when I come to the end of that list.
Every call to the property creates a new generator. You should store the generator returned by the property in a variable. Then you will be able to call next on it multiple times. Change
print(next(demo2.mygen))
print(next(demo2.mygen)) # calls next on a fresh generator
to
gen = demo2.mygen
print(next(gen))
print(next(gen)) # calls next on the SAME generator
As others have pointed out, this behaviour should have you reconsider making this a property in the first place. Seeing
demo2.mygen()
makes it much more obvious that there is some dynamic stuff going on, while
demo2.mygen
gives the impression of a more static attribute producing the same object every time. You can find some more elaboration on that here.
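For the API use case, a plain method may fit better than a property. Here's a sketch with a hypothetical fetch_page() standing in for the API call:

class Demo:
    def items(self):
        page = fetch_page()      # hypothetical API call returning a list
        while page:
            yield from page
            page = fetch_page()  # fetch the next page once this one is exhausted

gen = Demo().items()
print(next(gen))  # first item
print(next(gen))  # second item; the same generator resumes where it left off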

Loop through changing dataset with inlineCallbacks/yield (python-twisted)

I have a defer.inlineCallbacks function for incrementally updating a large (>1k) list one piece at a time. This list may change at any time, and I'm getting bugs because of that behavior.
The simplest representation of what I'm doing is:
@defer.inlineCallbacks
def _get_details(self, dt=None):
    data = self.data
    for e in data:
        if needs_update(e):
            more_detail = yield get_more_detail(e)
            do_the_update(e, more_detail)
    schedule_future(self._get_details)
self.data is a list of dictionaries which is initially populated with basic information (e.g. a name and ID) at application start. _get_details will run whenever allowed to by the reactor to get more detailed information for each item in data, updating the item as it goes along.
This works well when self.data does not change, but once it is changed (which can happen at any point), the loop obviously refers to the wrong information. In fact, in that situation it would be better to just stop the loop entirely.
I'm able to set a flag in my class (which the inlineCallback can then check) when the data is changed.
Where should this check be conducted?
How does the inlineCallbacks code execute compared to a normal Deferred (and indeed to a normal Python generator)?
Does code execution stop every time it encounters yield (i.e. can I rely on the code between one yield and the next being atomic)?
In the case of unreliable large lists, should I even be looping through the data (for e in data), or is there a better way?
The Twisted reactor never preempts your code while it is executing -- you have to voluntarily yield to the reactor by returning a value. This is why it is such a terrible thing to write Twisted code that blocks on I/O: the reactor cannot schedule any tasks while you are waiting on your disk.
So the short answer is that yes, execution is atomic between yields.
Without @inlineCallbacks, the _get_details function returns a generator. The @inlineCallbacks decorator simply wraps the generator in a Deferred that traverses the generator until it reaches a StopIteration exception or a defer.returnValue call. When either of those conditions is reached, inlineCallbacks fires its Deferred. It's quite clever, really.
I don't know enough about your use case to help with your concurrency problem. Maybe make a copy of the list with tuple() and update that. But it seems like you really want an event-driven solution and not a state-driven one.
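As a quick illustration of that copy idea (a sketch using the question's assumed helpers), iterating over a snapshot keeps the loop stable even if self.data is mutated mid-flight:

@defer.inlineCallbacks
def _get_details(self, dt=None):
    snapshot = tuple(self.data)  # frozen view taken before the first yield
    for e in snapshot:
        if needs_update(e):
            more_detail = yield get_more_detail(e)
            do_the_update(e, more_detail)
    schedule_future(self._get_details)

Note that updates may then be applied to items that have since been removed from self.data, which is why an event-driven design may fit better.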
You need to protect access to the shared resource (self.data).
You can do this with twisted.internet.defer.DeferredLock.
http://twistedmatrix.com/documents/current/api/twisted.internet.defer.DeferredLock.html
Method acquire: Attempt to acquire the lock. Returns a Deferred that fires on lock acquisition with the DeferredLock as the value. If the lock is locked, then the Deferred is placed at the end of a waiting list.
Method release: Release the lock. If there is a waiting list, then the first Deferred in that waiting list will be called back.
@defer.inlineCallbacks
def _get_details(self, dt=None):
    data = self.data
    i = 0
    while i < len(data):
        e = data[i]
        if needs_update(e):
            more_detail = yield get_more_detail(e)
            # The list may have changed while we waited on the yield;
            # stop the loop if the current element is no longer in place.
            if i >= len(data) or data[i] != e:
                break
            do_the_update(e, more_detail)
        i += 1
    schedule_future(self._get_details)
Based on more testing, the following are my observations.
for e in data iterates through the elements, and each element e remains valid even if data itself has changed, both before and after the yield statement.
As far as I can tell, execution is atomic between one yield and the next.
Looping through the data is more transparently done using a counter. This also allows checking whether the data has changed. The check can be done any time after yield, because any changes must have occurred before yield returned. This results in the code shown above.
self.data is a list of dictionaries...once it is changed (can be at any point) the loop obviously refers to the wrong information
If you're modifying a list while you iterate it, as Raymond Hettinger would say, "You're living in the land of sin and you deserve everything that happens to you." :) Scenarios like this should be avoided, or the list should be immutable. To circumvent this problem, you can use self.data.pop() or a DeferredQueue object to store the data. This way you can add and remove elements at any time without causing adverse effects. Example with a list:
@defer.inlineCallbacks
def _get_details(self, dt=None):
    try:
        data = self.data.pop()
    except IndexError:
        schedule_future(self._get_details)
        defer.returnValue(None)  # exit function
    if needs_update(data):
        more_detail = yield get_more_detail(data)
        do_the_update(data, more_detail)
    schedule_future(self._get_details)
Take a look at DeferredQueue: its get() function returns a Deferred, to which you can chain callbacks to handle each element you pop from the queue.
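A rough sketch of that DeferredQueue variant (my illustration, assuming self.data is a twisted.internet.defer.DeferredQueue and the same helpers as above):

from twisted.internet import defer

@defer.inlineCallbacks
def _get_details(self, dt=None):
    # get() returns a Deferred that fires once an item is available,
    # so the loop waits for data instead of polling an empty list.
    item = yield self.data.get()
    if needs_update(item):
        more_detail = yield get_more_detail(item)
        do_the_update(item, more_detail)
    schedule_future(self._get_details)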

Python - scope of wrapping functions [duplicate]

This question already has answers here:
Counting python method calls within another method (3 answers)
Closed 9 years ago.
The goal is to wrap a function or method and carry data around with the wrapper that's unique to the wrapped function.
As an example - let's say I have object myThing with method foo. I want to wrap myThing.foo with myWrapper, and (as an example) I want to be able to count the number of times myThing.foo is actually called.
So far, the only method I've found to be effective is to just add an attribute to the object -- but this feels a little bit clumsy.
class myThing(object):
    def foo(self):
        return "Foo called."

def myWrap(some_func):
    def _inner(self):
        # a wild kludge appears!
        try:
            self.count += 1
        except AttributeError:
            self.count = 1  # first tracked call
        return some_func(self)
    return _inner
Stick = myThing()
myThing.foo = myWrap(myThing.foo)

for i in range(0, 10):
    Stick.foo()  # returns "Foo called." each time

Stick.count  # 10
So, this achieves the goal, and in fact if there are multiple instances of myThing, each one tracks its own self.count value, which is part of my intended goal. However, I am not certain that adding an attribute to each instance of myThing is the best way to achieve this. If, for example, I were to wrap a function that wasn't part of an object or class, there would be no instance on which to store the attribute.
Maybe there is a hole in my understanding of what's actually happening when a method or function is wrapped. I do know that one can maintain some kind of static data within a closure, as with the following example:
def limit(max_value):
    def compare(x):
        return x > max_value
    return compare

isOverLimit = limit(30)
isOverLimit(45)  # returns True
isOverLimit(12)  # returns False

alsoOver = limit(20)
alsoOver(25)      # returns True
isOverLimit(25)   # returns False
The second example proves that it's not simply modifying the original instance of limit, and that isOverLimit continues to act as it did even after alsoOver is created. So I get the sense that there's a way for the wrapper to carry an incremental variable around with it, and that I'm just missing something obvious.
Seems like this is a dupe of Counting python method calls within another method
The short answer is to use a decorator on the method/function you want to count, and have the decorator store the counter as a function attribute. See the answers in the question I linked.
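A minimal sketch of that approach (my illustration, not code from the linked answers):

import functools

def count_calls(func):
    # The counter lives on the wrapper function itself, so this works
    # for plain functions and methods alike; no instance attribute needed.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.count += 1
        return func(*args, **kwargs)
    wrapper.count = 0
    return wrapper

@count_calls
def foo():
    return "Foo called."

foo()
foo()
print(foo.count)  # 2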

Generalized reference that allows efficient deletion from a variety of containers [closed]

I am writing a custom container class. A constituent object is created independently of the container, and can be a member of no container or multiple containers. The container's public API should support three operations:
iteration over all objects
insertion of a new object
removal of an existing object
The container does some additional work, and its precise implementation may change.
How can I write the public API to this class so that it remains stable as I change the implementation?
If the container is list-like, efficient removal requires the knowledge of the object's index; knowing the object itself is no good (I don't want to search the whole container for the element).
If the container is set-like, there's nothing equivalent to the index, and I need the object itself.
If the container is like a singly linked list, I need some kind of a reference to the object preceding the object being removed.
If the container is like a doubly linked list, I need a reference to the object itself.
I am thinking of having the removal method take a single argument, reference, which has no meaning or use outside of the removal method. Iteration would yield pairs of (object, reference).
Is there any problem with this design? Is there an example or design pattern I can look up?
Ideally, I would rather have the iteration yield a complex object that contains both the original object and the reference, and exhibits the interface of both. But I don't suppose this is doable?
Most container types have a direction that they work well with - from index to indexed, from current to next, etc. Some are bidirectional, but far from all.
Trying to find a value in a python list without using an index is pretty much going to be O(n). You either need to embrace the O(n), or use a different type.
One thing that comes to mind on this, is that if you need to delete something quickly from a lot of container types en masse, you could add an "ignore_this" attribute to your values. If you set it to true, then all your container types start ignoring it, or even removing it when seen.
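A sketch of that lazy-deletion idea (the names are mine, and it assumes the stored values accept new attributes):

class LazyContainer:
    def __init__(self):
        self._items = []

    def insert(self, obj):
        self._items.append(obj)

    def remove(self, obj):
        obj.ignore_this = True  # O(1): just mark the object

    def __iter__(self):
        # Purge marked objects as a side effect of iterating.
        self._items = [o for o in self._items
                       if not getattr(o, "ignore_this", False)]
        return iter(self._items)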
Just encapsulate a list and a dict / a list and a set, ...
Roughly doubles your memory usage and operation times, but clever encapsulation often makes nearly all problem-relevant operations O(1).
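A minimal sketch of that encapsulation (my illustration): a single dict keyed by object identity gives O(1) insertion and removal while preserving insertion order (guaranteed for dicts in Python 3.7+):

class Container:
    def __init__(self):
        self._items = {}  # id(obj) -> obj

    def insert(self, obj):
        self._items[id(obj)] = obj

    def remove(self, obj):
        del self._items[id(obj)]  # O(1), no linear search

    def __iter__(self):
        # Iterate over a snapshot so removal during iteration is safe.
        return iter(list(self._items.values()))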
It might be worth looking at collections.OrderedDict if you're using Python 2.7 and above: http://docs.python.org/library/collections.html#collections.OrderedDict
Here's what I'll do unless someone else helps by finding a better solution:
class Reference:
    # Hash and compare by the identity of the wrapped object, so two
    # Reference instances wrapping the same container compare equal.
    def __init__(self, item):
        self.item = item

    def __hash__(self):
        return id(self.item)

    def __eq__(self, other):
        return self.item is other.item

class RemovalAPI:
    def add_removal_info(self, item, removal_info):
        try:
            references = item.__reference
        except AttributeError:
            references = item.__reference = {}
        references[Reference(self)] = removal_info

    def get_removal_info(self, item):
        try:
            references = item.__reference
        except AttributeError:
            return None
        return references.get(Reference(self))
class Container(list, RemovalAPI):
    def __iter__(self):
        for i in range(len(self)):
            item = self[i]
            self.add_removal_info(item, i)
            yield item

    def remove(self, item):
        removal_info = self.get_removal_info(item)
        del self[removal_info]

    def insert(self, item):
        self.add_removal_info(item, len(self))
        self.append(item)
        # do whatever post-processing I need
        # ...
If I then decide to change the implementation from list to some other data structure, the public interface can remain unchanged:
class Container(orderedset, RemovalAPI):
    # inheriting __iter__ and remove from the parent
    def insert(self, item):
        self.add(item)
        # do whatever post-processing I need
        # ...
Or...
class Container(linkedlist, RemovalAPI):
    def __iter__(self):
        it = super().__iter__()
        last_item = None
        for item in it:
            self.add_removal_info(item, last_item)
            yield item
            last_item = item

    def remove(self, item):
        removal_info = self.get_removal_info(item)
        if removal_info is None:
            self.remove_first()
        else:
            self.remove_after(removal_info)

    def insert(self, item):
        self.add_removal_info(item, None)
        self.add_to_front(item)
        # do whatever post-processing I need
        # ...
