Gspread - Change Listener? - python

I currently run a daemon thread that grabs all cell values, calculates if there's a change, and then writes out dependent cells in a loop, ie:
def f():
while not event.is_set():
update()
event.wait(15)
Thread(target=f).start()
This works, but the looped get-all calls are significant I/O.
Rather than doing this, it would be much cleaner if the thread was notified of changes by Google Sheets. Is there a way to do this?

I rephrased my comment on gspread GitHub's Issues:
Getting a change notification from Google Sheets is possible with help of installable triggers in Apps Script. You set up a custom function in the Scripts editor and assign a trigger event for this function. In this function you can fetch an external url with UrlFetchApp.fetch.
On the listening end (your web server) you'll have a handler for this url. This handler will do the job. Depending on the server configuration (many threads or processes) make sure to avoid possible race condition.
Also, I haven't tested non browser-triggered updates. If Sheets trigger the same event for this type of updates there could be a case for infinite loops.

I was able to get this working by triggering an HTTP request whenever Google Sheets detected a change.
On Google Sheets:
function onEdit (e) {
UrlFetchApp.fetch("http://myaddress.com");
}
Python-side (w/ Tornado)
import tornado.ioloop
import tornado.web
class MainHandler(tornado.web.RequestHandler):
def get(self):
on_edit()
self.write('Updating.')
def on_edit():
# Code here
pass
app = tornado.web.Application([(r'/', MainHandler)])
app.listen(#port here)
tornado.ioloop.IOLoop.current().start()
I don't think this sort of functionality should be within the scope of gspread, but I hope the documentation helps others.

Related

My Azure Function in Python v2 doesn't show any signs of running, but it probably is

I have a simple function app in Python v2. The plan is to process millions of images, but right I just want to make the scaffolding right, i.e. no image processing, just dummy data. So I have two functions:
process with an HTTP trigger #app.route, this inserts 3 random image URLs to the Azure Queue Storage,
process_image with a Queue trigger #app.queue_trigger, that processes one image URL from above (currently only logs the event).
I trigger the first one with curl request and as expected, I can see the invocation in the Azure portal in the function's invocation section and I can see the items in the Storage Explorer's queue.
But unexpectedly, I do not see any invocations for the second function, even though after a few seconds the items disappear from the images queue and end up in the images-poison queue. So this means that something did run with the queue items 5 times. I see the following warning in the application insights checking traces and exceptions:
Message has reached MaxDequeueCount of 5. Moving message to queue 'case-images-deduplication-poison'.
Can anyone help with what's going on? Here's the gist of the code.
If I was to guess, something else is hitting that storage queue, like your dev machine or another function, can you put logging into the second function? (sorry c# guy so I don't know the code for logging)
Have you checked the individual function metric, in the portal, Function App >> Functions >> Function name >> overview >> Total execution Count and expand to the relevant time period?
Do note that it take up to 5 minutes for executions to show but after that you'll see them in the metrics

Python3 how to syncronize a manager Dict between two Threads

I am working on a TestSuite app based on Python. In this application I have a running report multiprocessing thread which generate a html report of all the test cases execute by the main program.
The main program it self is multi threaded although using the Normal threading module.
Now this report writer thread act like a server and each test case thread started by the main application can ask for an interface to this writer (IFW).
Communicating from the IFW to the writer just use a single Queue(). However it is also possible for each of the IFW's to ask for a status form the writer and here is where it gets tricky because this data is IFW ID specific.
It is not possible to just use Queues because of the IFW dynamic behavior, so I used a manager to create a proxy Queue however this cause a lot of problems for me (created a new manager instance for each interface). So now I'm trying with the manager().dict() method. However I cannot figure out how to synchronize between the to threads. Here is my code from the IFW:
def getCurrentTestInfo(self):
self._UpdateQueue.put_nowait(WriterCtrlCmd(self.WriterIfId, 'get_current_test_info'))
while self.WriterIfId not in self._RequestDict:
pass
res = self._RequestDict[self.WriterIfId]
del self._RequestDict[self.WriterIfId]
return res
What happens here is that the IFW sends a request cmd to the writer and the writer then returns the test infomation. Along with this request cmd is there a specific IFW ID, this ID is unique.
I know at this point that the entry does not exist in the dict() so I wait for the entry to show up using a "poll" :) and then I read the data. However everyone can see the potential problem in this code.
Is there a way to somehow wait for the dict to be updated by means of the manager ? maybe an event or condition ? Although I would like something linked to the dict() method it self
NOTE there is a period of processing time from the put command until the dict have been updated. so I cannot use Queue.put() instead of Queue.put_nowait()

ec2 wait for instance to come up with timeout [Python]

I'm using AWS python API (boto3). My script starts a few instances and then waits for them to come up online, before proceeding doing stuff. I want the wait to timeout after a predefined period, but I can't find any API for that in Python. Any ideas? A snippet of my current code:
def waitForInstance(id):
runningWaiter = self.ec2c.get_waiter("instance_status_ok")
runningWaiter.wait(InstanceIds = [id])
instance = ec2resource.Instance(id)
return instance.state
I can certainly do something like running this piece of code in a separate thread and terminate it if needed, but I was wondering whether there is already a built in API in boto3 for that and I'm just missing it.
A waiter has a configuration associated with it which can be accessed (using your example above) as:
runningWaiter.config
One of the settings in this config is max_attempts which controls how many attempts will be tried before giving up. The default value is 40. You can change that value like this:
runningWaiter.config.max_attempts = 10
This isn't directly controlling a timeout as your question asked but will cause the waiter to give up earlier.
Why not check the instances status from time to time?
#code copy from boto3 doc
for status in ec2.meta.client.describe_instance_status()['InstanceStatuses']:
print(status)
refence : http://boto3.readthedocs.org/en/latest/guide/migrationec2.html
BTW, it is better to use tag naming for all the instances with a standard naming convention. Query any aws resources with its original ID is a maintenance nightmare.
You could put a sleep timer in your code. Sleep for x minutes, check it to see if it is finished and go back to sleep if not. After y number of attempts take some sort it action.

Python-Twisted Reactor Starting too Early

I have an application that uses PyQt4 and python-twisted to maintain a connection to another program. I am using "qt4reactor.py" as found here. This is all packaged up using py2exe. The application works wonderfully for 99% of users, but one user has reported that networking is failing completely on his Windows system. No other users report the issue, and I cannot replicate it on my own Windows VM. The user reports no abnormal configuration.
The debugging logs show that the reactor.connectTCP() call is executing immediately, even though the reactor hasn't been started yet! There's no mistaking run order because this is a single-threaded process with 60 sec of computation and multiple log messages between this line and when the reactor is supposed to start.
There's a lot of code, so I am only putting in pseudo-code, hoping that there is a general solution for this issue. I will link to the actual code below it.
import qt4reactor
qt4reactor.install()
# Start setting up main window
# ...
from twisted.internet import reactor
# Separate listener for detecting/processing multiple instances
self.InstanceListener = ListenerFactory(...)
reactor.listenTCP(LISTEN_PORT, self.InstanceListener)
# The active/main connection
self.NetworkingFactory = ClientFactory(...)
reactor.connectTCP(ACTIVE_IP, ACTIVE_PORT, self.NetworkingFactory)
# Finish setting up main window
# ...
from twisted.internet import reactor
reactor.runReturn()
The code is nested throughout the Armory project files. ArmoryQt.py (containing the above code) and armoryengine.py (containing the ReconnectingClientFactory subclass used for this connection).
So, the reactor.connectTCP() call executes immediately. The client code executes the send command and then immediately connectionLost() gets called. It does not appear to try to reconnect. It also doesn't throw any errors other than connectionLost(). Even more mysteriously, it receives messages from the remote node later on, and this app even processes them! But it believes it's not connected (and handshake never finished, so the remote node shouldn't be sending messages, but might be a bug/oversight in that program).
What on earth is going on!? How could the reactor get started before I tell it to start? I searched the code and found no other code that (I believe) could start the reactor.
The API that you're looking for is twisted.internet.reactor.callWhenRunning.
However, it wouldn't hurt to have less than 60 seconds of computation at startup, either :). Perhaps you should spread that out, or delegate it to a thread, if it's relatively independent?

Specifying timeout while fetching/putting data in memcache (django)

I have a django based http server and I use django.core.cache.backends.memcached.MemcachedCache as the client library to access memcache. I want to know whether we can set a timeout or something (say 500ms.) so that the call to memcached returns False if it is not able to access the cache for 500ms. and we make the call to the DB. Is there any such setting to do that?
Haven't tried this before, but you may be able to use threading and set up a timeout for the function call to cache. As an example, ignore the example provided in the main body at this link, but look at Jim Carroll's comment:
http://code.activestate.com/recipes/534115-function-timeout/
Adapted for something you might use:
from threading import Timer
import thread, time, sys
def timeout():
thread.interrupt_main()
try:
Timer(0.5, timeout).start()
cache.get(stuff)
except:
print "Use a function to grab it from the database!"
I don't have time to test it right now, but my concern would be whether Django itself is threaded, and if so, is interrupting the main thread what you really want to do? Either way, it's a potential starting point. I did look for a configuration option that would allow for this and found nothing.

Categories