Django: Refer to previous item in model.save() method fails - python

I'm trying to implement a waiting room for items with queue verification.
The general idea: when an item is put into the waiting room, calculate its counter (a value defining how many new items are needed to confirm this one), decrease it after every new item is added, and finally confirm the item once its counter drops below zero. If there are already items in the waiting room, the calculated counter is increased by the counter of the previous item, so items form a queue.
My implementation is very simple, and I came up with it quickly.
But referring to the previous item is not working for me and I can't find the reason - it often returns a 'random' (or at least not the most recent) value from the Waiting objects.
Here's the vital code snippet:
class Waiting(models.Model):
    item = models.ForeignKey(Item)
    counter = models.FloatField(default=0)
    (...)

    def clearup(self):
        (...)  # here is the decrementing and confirming part - it's working fine

    def save(self, update=False):
        if update:
            return super(Waiting, self).save()
        item = self.item
        self.clearup()
        (...)  # nothing important
        self.counter = item.quantity * items_list[item.name][1]
        last = Waiting.objects.exclude(
            item__name="Something I don't want here").order_by('-pk')
        if last:
            last = last[0]
            weight = items_list[last.item.name][1]
            self.counter += (last.item.quantity * weight)
        super(Waiting, self).save()
Edit: this was entirely my own mistake.

Like almost always - it was my fault.
I had mixed in a vital part from another approach to this problem, and that is what caused these issues.
Thanks for trying to help me anyway.

Related

Why is the queue's peek function removing the last item and appending it?

This bit of code is from hackerrank.com.
def pop(self):
    # looks at the top of the queue
    if len(self.stack2) > 0:
        top = self.stack2.pop()
        self.stack2.append(top)
Can someone please explain why it is popping off the last item in the stack/queue and then just appending it? I thought in queues, it is first in first out. In that case, the "top" item in the queue should be self.stack2.pop(0) ?
If you are referring to this code:
def peek(self):
    if len(self.stack2) > 0:
        top = self.stack2.pop()
        self.stack2.append(top)
    else:
        while len(self.stack1) > 1:
            self.stack2.append(self.stack1.pop())
        top = self.stack1.pop()
        self.stack2.append(top)
    return top
... then the clue is in the name: the two lines in question pop a value off the top of self.stack2, storing it in the variable top, then push it back onto the top of the stack so that the stack remains unchanged, and the value can be returned in the last line of the method. Hence the name peek, as in "take a peek at the top value on the stack without permanently changing anything".
The rest of the code, including the else clause in peek(), is the classic implementation of a two-stack queue, explained in detail here:
https://stackoverflow.com/a/39089983
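For reference, here is a minimal sketch of that two-stack queue (illustrative class and method names, not the asker's exact code): stack1 receives new items, and stack2 holds them reversed so the oldest item sits on top.

class TwoStackQueue:
    def __init__(self):
        self.stack1 = []  # enqueue side: receives incoming items
        self.stack2 = []  # dequeue side: items in reversed order, oldest on top

    def enqueue(self, item):
        self.stack1.append(item)

    def dequeue(self):
        # Refill stack2 only when it is empty; each element moves
        # between the stacks at most once, so the amortized cost is O(1).
        if not self.stack2:
            while self.stack1:
                self.stack2.append(self.stack1.pop())
        return self.stack2.pop()

    def peek(self):
        top = self.dequeue()
        self.stack2.append(top)  # put it back, so the queue is unchanged
        return top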

Best way to limit incoming messages to avoid duplicates

I have a system that accepts messages containing URLs; if certain keywords are in a message, an API call is made with the URL as a parameter.
In order to conserve processing and keep my end presentation efficient, I don't want duplicate URLs being submitted within a certain time range.
so if this url ---> http://instagram.com/p/gHVMxltq_8/ comes in and it's submitted to the api
url = incoming.msg['urls']
url = urlparse(url)
if url.netloc == "instagram.com":
    r = requests.get("http://api.some.url/show?url=%s" % url)
and then 3 secs later the same url comes in, I don't want it submitted to the api.
What programming method might I deploy to eliminate/limit duplicate messages from being submitted to the api based on time?
UPDATE USING TIM PETERS' METHOD:
limit = DecayingSet(86400)
l = limit.add(longUrl)
if l == False:
    pass
else:
    r = requests.get("http://api.some.url/show?url=%s" % url)
This snippet is inside a long-running process that is accepting streaming messages via TCP.
Every time I pass the same url in, l returns True.
But when I try it in the interpreter everything is good: it returns False when the set time hasn't expired.
Does it have to do with the fact that the script is running while the set is being added to?
Instance issues?
Maybe overkill, but I like creating a new class for this kind of thing. You never know when requirements will get fancier ;-) For example,
from time import time

class DecayingSet:
    def __init__(self, timeout):  # timeout in seconds
        from collections import deque
        self.timeout = timeout
        self.d = deque()
        self.present = set()

    def add(self, thing):
        # Return True if `thing` not already in set,
        # else return False.
        result = thing not in self.present
        if result:
            self.present.add(thing)
            self.d.append((time(), thing))
        self.clean()
        return result

    def clean(self):
        # forget stuff added >= `timeout` seconds ago
        now = time()
        d = self.d
        while d and now - d[0][0] >= self.timeout:
            _, thing = d.popleft()
            self.present.remove(thing)
As written, it checks for expirations whenever an attempt is made to add a new thing. Maybe that's not what you want, but it should be a cheap check since the deque holds items in order of addition, so it gets out at once if no items are expiring. Lots of possibilities.
Why a deque? Because deque.popleft() is a lot faster than list.pop(0) when the number of items becomes non-trivial.
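A minimal usage sketch (the URL is taken from the question; the 5-second timeout is arbitrary). Note that the instance must be created once and kept alive across messages - the state stored in self.present and self.d is what makes the duplicate check work:

limit = DecayingSet(5)  # created once, outside the message loop
url = "http://instagram.com/p/gHVMxltq_8/"
print limit.add(url)    # True: first sighting, submit to the API
print limit.add(url)    # False: duplicate within the 5-second window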
Suppose your desired interval is 1 hour. Keep two counters that increment every hour but are offset 30 minutes from each other, i.e. counter A goes 1, 2, 3, 4 at 11:17, 12:17, 13:17, 14:17 and counter B goes 1, 2, 3, 4 at 11:47, 12:47, 13:47, 14:47.
Now if a link comes in and has either of the two counters the same as an earlier link, consider it a duplicate.
The benefit of this scheme over explicit timestamps is that you can hash url+counterA and url+counterB to quickly check whether the url exists.
Update: You need two data stores: one, a regular database table (slow) with columns (url, counterA, counterB); and two, a chunk of n bits of memory (fast). Given a url so.com, counterA 17 and counterB 18, first hash "17,so.com" into the range 0 to n - 1 and see if the bit at that address is turned on. Similarly, hash "18,so.com" and see if the bit is turned on.
If the bit is not turned on in either case, you are sure it is a fresh URL within an hour, so we are done (quickly).
If the bit is turned on in either case, then look up the url in the database table to check whether it was that url indeed or some other URL that hashed to the same bit.
Further update: Bloom filters are an extension of this scheme.
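A rough sketch of the two-counter scheme under some assumptions (MD5 as the hash function, a plain Python set standing in for the slow database table; all names are illustrative):

import hashlib
import time

N = 1 << 20                # size of the fast in-memory bit array
bits = bytearray(N // 8)   # the "chunk of n bits of memory"
slow_table = set()         # stand-in for the (url, counter) database table

def counters(now=None):
    # counter A ticks every hour; counter B is offset by 30 minutes
    now = time.time() if now is None else now
    return int(now // 3600), int((now + 1800) // 3600)

def is_duplicate(url):
    a, b = counters()
    duplicate = False
    for key in ("%d,%s" % (a, url), "%d,%s" % (b, url)):
        idx = int(hashlib.md5(key).hexdigest(), 16) % N
        if bits[idx // 8] & (1 << (idx % 8)):
            # bit already on: either this url or a hash collision,
            # so confirm against the slow table
            duplicate = duplicate or key in slow_table
        bits[idx // 8] |= 1 << (idx % 8)
        slow_table.add(key)
    return duplicate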
I'd recommend keeping an in-memory cache of the most-recently-used URLs. Something like a dictionary:
urls = {}
and then for each URL:
if url in urls and (time.time() - urls[url]) < SOME_TIMEOUT:
    pass  # Don't submit the data
else:
    urls[url] = time.time()
    # Submit the data

Circular Buffer Python implementation

I wrote the code for a circular buffer for an interviewstreet question. As it happened, two test cases passed and the others failed. The cause of the failure: index out of range. I tried several test cases after that to reproduce the failure, but unfortunately none of them reproduce the error. Here is the code.
Implement a circular buffer of size N. Allow the caller to append, remove and list the contents of the buffer. Implement the buffer to achieve maximum performance for each of the operations.
"A" n - Append the following n lines to the buffer. If the buffer is full they replace the older entries.
"R" n - Remove first n elements of the buffer. These n elements are the ones that were added earliest among the current elements.
"L" - List the elements of buffer in order of their inserting time.
"Q" - Quit.
class circbuffer():
    # initialization
    def __init__(self, size):
        self.maximum = size
        self.data = []
        self.current = 0

    # appending when the buffer is not full
    def append(self, x):
        if len(self.data) == self.maximum:
            self.current = 0
            self.data[self.current] = x
            self.current = (self.current + 1) % self.maximum
            self.__class__ = bufferfull
        else:
            self.data.append(x)

    def remove(self, x):
        if self.data:
            self.data.pop(0)

    def cget(self):
        return self.data

class bufferfull:
    def append(self, x):
        if len(self.data) < self.maximum:
            self.data.insert(self.current, x)
        else:
            self.data[self.current] = x
        self.current = (self.current + 1) % self.maximum

    def remove(self, x):
        if self.data:
            if self.current > len(self.data):
                self.current = 0
            self.data.pop(self.current)

    def cget(self):
        return self.data[self.current:] + self.data[:self.current]
n = input()
buf = circbuffer(n)
outputbuf = []
while True:
    com = raw_input().split(' ')
    if com[0] == 'A':
        n = int(com[1])
        cominput = []
        for i in xrange(n):
            cominput.append(raw_input())
        for j in cominput:
            buf.append(j)
    elif com[0] == "R":
        n = int(com[1])
        for i in range(n):
            buf.remove(i)
    elif com[0] == "L":
        for i in buf.cget():
            outputbuf.append(i)
    elif com[0] == "Q":
        break
for i in outputbuf:
    print i
The error points to self.data.pop(self.current) in class bufferfull. I cannot get the test data from the interviewstreet people, so I am trying to come up with a test case myself to reproduce the error.
Any insights?
One bug is here:
def remove(self, x):
    if self.data:
        if self.current > len(self.data):
            self.current = 0
        self.data.pop(self.current)
If self.current == len(self.data), you'll try to pop a non-existent element.
As a general remark, your implementation is way too complicated and for that reason wouldn't score very highly in my book (others might view this differently). #9000's comment to your question sums it up nicely:
Keep it simple. Don't be clever when you can be straightforward in the same number of lines. All you need is a head pointer, a tail pointer, and a list of a fixed size. You don't need any fancy metaprogramming stuff whatsoever. – #9000
It looks like you are trying to stop the index out of range error with the code below, but the condition you are checking is wrong.
if self.current > len(self.data):
    self.current = 0
self.data.pop(self.current)
If you call self.data.pop(len(self.data)) you will definitely get that error since lists are 0-indexed. You probably meant:
if self.current >= len(self.data):
    self.current = 0
self.data.pop(self.current)
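For contrast, a minimal sketch of the straightforward design #9000 describes - a fixed-size list plus a head pointer and a count, with no class switching (illustrative code, not the graded solution):

class RingBuffer:
    def __init__(self, size):
        self.size = size
        self.data = [None] * size
        self.head = 0    # index of the oldest element
        self.count = 0   # number of elements currently stored

    def append(self, x):
        if self.count < self.size:
            self.data[(self.head + self.count) % self.size] = x
            self.count += 1
        else:
            self.data[self.head] = x                 # overwrite the oldest entry
            self.head = (self.head + 1) % self.size

    def remove(self, n=1):
        # drop the n earliest elements
        n = min(n, self.count)
        self.head = (self.head + n) % self.size
        self.count -= n

    def contents(self):
        # elements in order of insertion time
        return [self.data[(self.head + i) % self.size] for i in range(self.count)]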

Separating Progress Tracking and Loop Logic

Suppose I want to track the progress of a loop using the progress bar printer ProgressMeter (as described in this recipe).
def bigIteration(collection):
    for element in collection:
        doWork(element)
I would like to be able to switch the progress bar on and off. I also want to update it only every x steps for performance reasons. My naive way to do this is
def bigIteration(collection, progressBar=True):
    if progressBar:
        pm = progress.ProgressMeter(total=len(collection))
        pc = 0
    for element in collection:
        if progressBar:
            pc += 1
            if pc % 100 == 0:
                pm.update(pc)
        doWork(element)
However, I am not satisfied. From an "aesthetic" point of view, the functional code of the loop is now "contaminated" with generic progress-tracking code.
Can you think of a way to cleanly separate progress-tracking code and functional code? (Can there be a progress-tracking decorator or something?)
It seems like this code would benefit from the null object pattern.
# a progress bar that uses ProgressMeter
class RealProgressBar:
    def setMaximum(self, max):
        self.pm = progress.ProgressMeter(total=max)
        self.pc = 0

    def progress(self):
        self.pc += 1
        if self.pc % 100 == 0:
            self.pm.update(self.pc)

# a fake progress bar that does nothing
class NoProgressBar:
    def setMaximum(self, max):
        pass

    def progress(self):
        pass

# Iterate with a given progress bar
def bigIteration(collection, progressBar=NoProgressBar()):
    progressBar.setMaximum(len(collection))
    for element in collection:
        progressBar.progress()
        doWork(element)

bigIteration(collection, RealProgressBar())
(Pardon my French, er, Python, it's not my native language ;) Hope you get the idea, though.)
This lets you move the progress update logic from the loop, but you still have some progress related calls in there.
You can remove this part if you create a generator from the collection that automatically tracks progress as you iterate it.
# turn a collection into one that shows progress when iterated
def withProgress(collection, progressBar=NoProgressBar()):
    progressBar.setMaximum(len(collection))
    for element in collection:
        progressBar.progress()
        yield element

# simple iteration function
def bigIteration(collection):
    for element in collection:
        doWork(element)

# let's iterate with progress reports
bigIteration(withProgress(collection, RealProgressBar()))
This approach leaves your bigIteration function as is and is highly composable. For example, let's say you also want to add cancellation this big iteration of yours. Just create another generator that happens to be cancellable.
# highly simplified cancellation token
# probably needs synchronization
class CancellationToken:
    def __init__(self):
        self.cancelled = False

    def isCancelled(self):
        return self.cancelled

    def cancel(self):
        self.cancelled = True

# iterates a collection with cancellation support
def withCancellation(collection, cancelToken):
    for element in collection:
        if cancelToken.isCancelled():
            break
        yield element

cancelToken = CancellationToken()
progressCollection = withProgress(collection, RealProgressBar())
cancellableCollection = withCancellation(progressCollection, cancelToken)
bigIteration(cancellableCollection)

# meanwhile, on another thread...
cancelToken.cancel()
You could rewrite bigIteration as a generator function as follows:
def bigIteration(collection):
    for element in collection:
        doWork(element)
        yield element
Then, you could do a great deal outside of this:
mycollection = [1, 2, 3]

if progressBar:
    pm = progress.ProgressMeter(total=len(mycollection))
    pc = 0
    for item in bigIteration(mycollection):
        pc += 1
        if pc % 100 == 0:
            pm.update(pc)
else:
    for item in bigIteration(mycollection):
        pass
My approach would be like this:
The looping code yields the progress percentage whenever it changes (or whenever it wants to report it). The progress-tracking code then reads from the generator until it is exhausted, updating the progress bar after every read.
However, this also has some disadvantages:
You need a wrapper to call it without a progress bar, since you still have to read from the generator until it is empty.
You cannot easily return a value at the end. One solution is to wrap the return value, so the progress-tracking code can determine whether the function yielded a progress update or a return value. Actually, it might be nicer to wrap the progress updates, so the regular return value can be yielded unwrapped - but that would require much more wrapping, since it would need to be done for every progress update instead of just once.
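A minimal sketch of that generator idea (assuming the same ProgressMeter recipe; names and the update semantics are illustrative):

def bigIteration(collection):
    total = len(collection)
    last_percent = -1
    for i, element in enumerate(collection):
        doWork(element)
        percent = (i + 1) * 100 // total
        if percent != last_percent:  # yield only when the percentage changes
            last_percent = percent
            yield percent

# progress-tracking code: drain the generator, updating the bar per read
pm = progress.ProgressMeter(total=100)
for percent in bigIteration(collection):
    pm.update(percent)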

Python: why does this code take forever (infinite loop?)

I'm developing an app in Google App Engine. One of my methods never completes, which makes me think it's caught in an infinite loop. I've stared at it, but can't figure it out.
Disclaimer: I'm using http://code.google.com/p/gaeunit to run my tests. Perhaps it's acting oddly?
This is the problematic function:
def _traverseForwards(course, c_levels):
    ''' Looks forwards in the dependency graph '''
    result = {'nodes': [], 'arcs': []}
    if c_levels == 0:
        return result
    model_arc_tails_with_course = set(_getListArcTailsWithCourse(course))
    q_arc_heads = DependencyArcHead.all()
    for model_arc_head in q_arc_heads:
        for model_arc_tail in model_arc_tails_with_course:
            if model_arc_tail.key() in model_arc_head.tails:
                result['nodes'].append(model_arc_head.sink)
                result['arcs'].append(_makeArc(course, model_arc_head.sink))
                # rec_result = _traverseForwards(model_arc_head.sink, c_levels - 1)
                # _extendResult(result, rec_result)
    return result
Originally, I thought it might be a recursion error, but I commented out the recursion and the problem persists. If this function is called with c_levels = 0, it runs fine.
The models it references:
class Course(db.Model):
    dept_code = db.StringProperty()
    number = db.IntegerProperty()
    title = db.StringProperty()
    raw_pre_reqs = db.StringProperty(multiline=True)
    original_description = db.StringProperty()

    def getPreReqs(self):
        return pickle.loads(str(self.raw_pre_reqs))

    def __repr__(self):
        return "%s %s: %s" % (self.dept_code, self.number, self.title)

class DependencyArcTail(db.Model):
    ''' A list of courses that is a pre-req for something else '''
    courses = db.ListProperty(db.Key)

    def equals(self, arcTail):
        for this_course in self.courses:
            if not (this_course in arcTail.courses):
                return False
        for other_course in arcTail.courses:
            if not (other_course in self.courses):
                return False
        return True

class DependencyArcHead(db.Model):
    ''' Maintains a course, and a list of tails with that course as their sink '''
    sink = db.ReferenceProperty()
    tails = db.ListProperty(db.Key)
Utility functions it references:
def _makeArc(source, sink):
    return {'source': source, 'sink': sink}

def _getListArcTailsWithCourse(course):
    ''' returns a LIST, not SET
        there may be duplicate entries
    '''
    q_arc_heads = DependencyArcHead.all()
    result = []
    for arc_head in q_arc_heads:
        for key_arc_tail in arc_head.tails:
            model_arc_tail = db.get(key_arc_tail)
            if course.key() in model_arc_tail.courses:
                result.append(model_arc_tail)
    return result
Am I missing something pretty obvious here, or is GAEUnit acting up?
Also - the test that is making this run slow has no more than 5 models of any kind in the datastore. I know this is potentially slow, but my app only does this once then subsequently caches it.
Ignoring the commented out recursion, I don't think this should be an infinite loop - you are just doing some for-loops over finite results sets.
However, it does seem like this would be really slow. You're looping over entire tables and then doing more datastore queries in every nested loop. It seems unlikely that this sort of request would complete in a timely manner on GAE unless your tables are really, really small.
Some rough numbers:
If H = # of entities in DependencyArcHead and T = average # of tails in each DependencyArcHead, then:
_getListArcTailsWithCourse is doing about H*T queries (an underestimate). In the worst case, the result returned from this function will have H*T elements.
_traverseForwards loops over all these results H times, and thus does another H*(H*T) queries.
Even if H and T are only on the order of 10s, you could be doing thousands of queries. If they're bigger, then ... (and this ignores any additional queries you'd do if you uncommented the recursive call).
In short, I think you may want to try to organize your data a little differently if possible. I'd make a specific suggestion, but what exactly you're trying to do isn't clear to me.
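As one concrete illustration (a sketch, not necessarily the right reorganization for your data): the old db API's db.get() accepts a list of keys and fetches them in a single batch RPC, so _getListArcTailsWithCourse could issue one call per arc head instead of one query per tail key:

def _getListArcTailsWithCourse(course):
    ''' returns a LIST, not SET; there may be duplicate entries '''
    result = []
    course_key = course.key()
    for arc_head in DependencyArcHead.all():
        # one batch fetch per head instead of one db.get() per tail key
        for model_arc_tail in db.get(arc_head.tails):
            if model_arc_tail is not None and course_key in model_arc_tail.courses:
                result.append(model_arc_tail)
    return result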
