How can I synchronize access to stored data?

How can I synchronize access to stored data? - python

I am using Google App Engine with the Python 2.7 run time. I know it is multi-threaded, but the multi-instance nature of Google App Engine makes this question relevant to any run time.
I have a web messenger application. It is based on the Channel API for receiving various notifications (message received, user connected, user typed) and on the Memcache API for maintaining global state (and on the High Replication Datastore for actually storing the message history).
The time stamp of the last message is kept in Memcache (along with other data, such whether the user is online or not, typing or not and others).
Whenever a user sends a message, the Memcache is updated with the new time stamp value, while the old time stamp value is sent using the Channel API.
When multiple messages are sent at the same time, the Memcache is sometimes overwritten and I get weird values for that time stamp. For example -
One message has a certain time stamp and the next has an older time stamp.
I know about the Memcache Client API, but I cannot use that because I must read the most up to date time stamp first and the write the new time stamp.
In short, I want to somehow wait for the (Memcache?) data to be completely unoccupied (read or write), lock it somehow, proceed throughout the entire request and then release it for the next request.
Any suggestions?
Thank you in advance. :)

There is no explicit locking functionality in the Memcache API, but you can achieve the effect using the Compare-and-Set mechanism. Guido has a writeup with examples here. Only the Client class exposes CAS, though, but it's unclear from your description why that won't work for you.

Related

How to get new (real time) comments from my blog?

I have a blog and an application that gives the number of comments and posts on my blog by using my blog's API.
The issue I'm having is that I want to have my application receive new comments from my application in real-time.
My solution:
I can have my application calling the API every 30 seconds or so to check whether there is a response (i.e. whether there is a new comment).

I think the best solution is to use something called Long Polling to get updates. Its a technique in programming to handle requests with less resources such as CPU being used over time. For a detailed solution for your case search for
Long Polling in Flask application

Facebook's Graph API's request limit on a locally run program? How to get specific data in real time without reaching it?

I've been writing a program in Python which needs to have the datum of the number of likes of a specific Facebook page in real time. The program itself works, but it's based on a loop that is constantly requesting the number of likes and updating it on a variable, and I was afraid that this way it will soon reach the API's limit of requests.
I read that Graph API's request limit per user for an application is 200 requests per hour. Is a program locally run as this one considered an application with one user, or what is it considered?
Also, I read that some users say the API can handle 600 requests per 600 seconds without returning an error, does this still apply? (Could I, for example, delay the loop for one second and still be able to make all the requests?) If not, is there a solution to get that info in real time in a local program? (I saw that Graph can send you updates with a POST on a specified URL, but is there a way to receive those updates without owning an URL? Maybe a way to renew the token or something?). I need to have this program running for almost a whole day, so not being rejected from the API is quite important.
Sorry if it sounds silly or anything, this is the first time I'm using the Graph API (and a web-based API in general).

What's the preferred method for throttle websocket connections?

I have a web app where I am streaming model changes to a backbone collection in a chrome client. There a a few backbone views that may or may not render parts of the page depending on the type of update and what is being looked at. For example some changes to a model result in the view for the collection being re-rendered and there may or may not be a detail panel view open for the model that's being updated. These model changes can happen very fast as the server side workflow involves quite verbose and rapid changes to the model.
Here's the problem: I'm getting a large number of errno 32 pipe broken messages in the webserver's process when sending messages to the client, although the websocket connection is still up and its readyState is still 1 (OPEN).
What I suspect is happening is that the various views haven't finished rendering in the onmessage callback by the time the next message is coming in. After I get these tracebacks in stdout the websocket connection can still work and the UI will still update.
If I put eventlet.sleep(0.02) in the loop that reads model changes off the message queue and sends them on the websocket the broken pipe messages go away, however this isn't a real solution and feels like a nasty hack.
Has anyone has similar problems with websocket's onmessage function trying to do too much work and still being busy when the next message comes in? Anyone have a solution?

I think the most efficient way to do this is that client app tell the server what they are displaying. The server keep track of this and send changes only to the objects currently viewed, only to the concerned client.
A way to do this is by using a "Who Watch What" list of items.
Items are indexed in two ways. From the client ID and with a isVievedBy chainlist inside each data objects (I know it doesn't look clean to mix it with data but it is very efficient).
You'll also need a lastupdate timestamp for each data object.
When a client change view, it send a "I'm viewing this, wich I have the version -timestamp-" message to the server. The server check timestamp and send back the object if required. It also remove obsolete "Who Watch What" (accessing them by client ID) items and create the new ones.
When a data object is updated, loop through the isVievedBy chainlist of this object to know which client should be updated. Put this in message buffers for each client and flush those buffers manually (in case you update several items at the same time, it will send one big message).
This is lot of work, but your app will be efficient and scale gracefully, even with lot of objects and lot of clients. It sends only usefull messages and it is very unlikely that there will be too many of them.
For your onMessage problem, I would store data in a queue and process them asynchronously.

Synchronize Memcache and Datastore on Google App Engine

I'm writing a chat application using Google App Engine. I would like chats to be logged. Unfortunately, the Google App Engine datastore only lets you write to it once per second. To get around this limitation, I was thinking of using a memcache to buffer writes. In order to ensure that no data is lost, I need to periodically push the data from the memcache into the data store.
Is there any way to schedule jobs like this on Google App. Engine? Or am I going about this in entirely the wrong way?
I'm using the Python version of the API, so a Python solution would be preferred, but I know Java well enough that I could translate a Java solution into Python.

To get around the write/update limit of entity groups (note that entities without parent are their own entity group) you could create a new entity for every chat message and keep a property in them that would reference a chat they belong to.
You'd then find all chat messages that belong to a chat via a query. But this would be very inefficient, as you'd then need to do a query for every user for every new message.
So go with the above advice, but additionally do:
Look into backends. This are always-on instances where you could aggregate chat messages in memory (and immediately/periodically flush them to datastore). When user requests latest chat messages, you already have them in memory and would serve them instantly (saving on time and cost compared to using Datastore). Note that backends are not 100% reliable, they might go down from time to time - adjust chat message flushing to datastore accordingly.
Check out Channels API. This will allow you to notify users when there is a new chat message. This way you'd avoid polling for new chat messages and keep the number or requests down.

Sounds like the wrong way since you are risking losing data on memcache.
You can write to one entity group once per second.
You can write separate entity groups very rapidly. So it really depends how you structure your data. For example, if you kept an entire chat in one entity, you can only write that chat once per second. And you'd be limited to 1MB.
You should write a separate entity per message in the chat, you can write very, very quickly, but you need to devise a way to pull all the messages together, in order for the log.
Edit I agree with Peter Knego that the costs of using one entity per message will get way too expensive. His backend suggestion is pretty good too, although if your app is popular, backends don't scale that well.
I was trying to avoid sharding, but I think it will be necessary. If you're not familiar with sharding, read up on this: https://developers.google.com/appengine/articles/sharding_counters
Sharding would be an intermediate between writing one entity for all messages in a conversation, vs one entity per message. You would randomly split the messages between a number of entities. For example, if you save the messages in 3 entities, you can write 5x/sec (I doubt most human conversations would go any faster than that).
On fetching, you would need to grab the 3 entities, and merge the messages in chronological order. This would save you a lot on cost. But you would need to write the code to do the merging.
One other benefit is that your conversation limit would now be 3MB instead of 1MB.

Why not use a pull task? I highly recommend this Google video is you are not familiar enough with task queues. First 15 minutes will cover pull queue info that may apply to your situation. Anything involving per message updates may get quite expensive re: database ops, and this will be greatly exacerbated if you have any indices involved. Video link:
https://www.youtube.com/watch?v=AM0ZPO7-lcE&feature=player_embedded
I would simply set up my chat entity when users initiate it in the on-line handler, passing back the entity id to the chat parties. Send the id+message to your pull queue, and serialize the messages within the chat entity's TextProperty. You wont likely schedule the pull queue cron more often than once per second, so that avoids your entity update limitation. Most importantly: your database ops will be greatly reduced.

I think you could create tasks which will persist the data. This has the advantage that, unlike memcached the tasks are persisted and so no chats would be lost.
when a new chat comes in, create a task to save the chat data. In the task handler do the persist. You could either configure the task queue to pull at 1 per second (or slightly slower) and save each bit of chat data held in the task persist the incoming chats in a temporary table (in different entity groups), and every have the tasks pull all unsaved chats from the temporary table, persist them to the chat entity then remove them from the temporary table.

i think you would be fine by using the chat session as entity group and save the chat messages .
this once per second limit is not the reality, you can update/save at a higher rate and im doing it all the time and i don't have any problem with it.
memcache is volatile and is the wrong choice for what you want to do. if you start encountering issues with the write rate you can start setting up tasks to save the data.

Google App Engine Locking

just wondering if anyone of you has come across this. I'm playing around with the Python mail API on Google App Engine and I created an app that accepts a message body and address via POST, creates an entity in the datastore, then a cron job is run every minute, grabs 200 entities and sends out the emails, then deletes the entities.
I ran an experiment with 1500 emails, had 1500 entities created in the datastore and 1500 emails were sent out. I then look at my stats and see that approx. 45,000 recipients were used from the quota, how is that possible?
So my question is at which point does the "Recipients Emailed" quota actually count? At the point where I create a mail object or when I actually send() it? I was hoping for the second, but the quotas seem to show something different. I do pass the mail object around between crons and tasks, etc. Anybody has any info on this?
Thanks.
Update: Turns out I actually was sending out 45k emails with a queue of only 1500. It seems that one cron job runs until the previous one is finished and works out with the same entities. So the question changes to "how do I lock the entities and make sure nobody selects them before sending the emails"?
Thanks again!

Use tasks to send the email.
Create a task that takes a key as an argument, retrieves the stored entity for that key, then sends the email.
When your handler receives the body and address, store that as you do now but then enqueue a task to do the send and pass the key of your datastore object to the task so it knows which object to send an email for.
You may find that the body and address are small enough that you can simply pass them as arguments to a task and have the task send the email without having to store anything directly in the datastore.
This also has the advantage that if you want to impose a limit on the number of emails sent within a given amount of time (quota) you can set up a task queue with that rate.

Instantiating an email object certainly does not count against your "recipients emailed" quota. Like other App Engine services, you consume quota when you trigger an RPC, i.e. call send().
If you intended to email 1500 recipients and App Engine says you emailed 45,000, your code has a bug.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.