Google App Engine Locking - python

Just wondering if any of you have come across this. I'm playing around with the Python mail API on Google App Engine. I created an app that accepts a message body and address via POST and creates an entity in the datastore; a cron job then runs every minute, grabs 200 entities, sends out the emails, and deletes the entities.
I ran an experiment with 1500 emails: 1500 entities were created in the datastore and 1500 emails were sent out. I then looked at my stats and saw that approx. 45,000 recipients had been counted against the quota. How is that possible?
So my question is: at which point does the "Recipients Emailed" quota actually count? At the point where I create a mail object, or when I actually send() it? I was hoping for the latter, but the quotas seem to show something different. I do pass the mail object around between crons and tasks, etc. Does anybody have any info on this?
Thanks.
Update: Turns out I actually was sending out 45k emails with a queue of only 1500. It seems that a cron job starts before the previous one has finished and works on the same entities. So the question changes to: how do I lock the entities and make sure nobody selects them before the emails are sent?
Thanks again!

Use tasks to send the email.
Create a task that takes a key as an argument, retrieves the stored entity for that key, then sends the email.
When your handler receives the body and address, store them as you do now, but then enqueue a task to do the send, passing the key of your datastore object so the task knows which object to send an email for.
You may find that the body and address are small enough that you can simply pass them as arguments to a task and have the task send the email without having to store anything directly in the datastore.
This also has the advantage that if you want to impose a limit on the number of emails sent within a given amount of time (quota) you can set up a task queue with that rate.
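A minimal sketch of that flow on the old webapp2 + google.appengine stack; the URLs, queue name, and the EmailJob model are placeholders for illustration, not your actual code:

from google.appengine.api import mail, taskqueue
from google.appengine.ext import ndb
import webapp2

class EmailJob(ndb.Model):  # hypothetical model holding one queued email
    address = ndb.StringProperty()
    body = ndb.TextProperty()

class EnqueueHandler(webapp2.RequestHandler):
    def post(self):
        job = EmailJob(address=self.request.get('address'),
                       body=self.request.get('body'))
        job.put()
        # Hand the datastore key to a push task; that task now "owns" the
        # entity, so no cron job will pick it up a second time.
        taskqueue.add(url='/tasks/send_email',
                      params={'key': job.key.urlsafe()},
                      queue_name='emails')

class SendEmailTask(webapp2.RequestHandler):
    def post(self):
        job = ndb.Key(urlsafe=self.request.get('key')).get()
        if job is None:  # already processed, nothing to do
            return
        mail.send_mail(sender='noreply@your-app-id.appspotmail.com',  # must be an authorized sender
                       to=job.address, subject='Your message', body=job.body)
        job.key.delete()  # delete only after a successful send

If you want to throttle sends, give the 'emails' queue an appropriate rate (for example, rate: 200/m) in queue.yaml.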

Instantiating an email object certainly does not count against your "recipients emailed" quota. Like other App Engine services, you consume quota when you trigger an RPC, i.e. call send().
If you intended to email 1500 recipients and App Engine says you emailed 45,000, your code has a bug.

Related

How to send emails according to user request in python?

How do I send an email whenever a user requests it? I thought of a script that sends mail every hour with some content relevant to the recipients, but I don't like the idea of spamming mail to other people. So I would like to send this mail whenever the recipient wants it. For example, if a user sends me a mail with a keyword, I would like the script to run and send the mail back with the content already made up. That way there is no spam, and the information is sent more efficiently.
I appreciate your help in advance.
This is a pretty complex and broad question.
First off, you would need a server (or your PC) to keep the script running all the time, or fire it off periodically via cron.
Secondly, you would need to keep some kind of job queue where you register that a user has sent you an email requesting one in response. There are well-developed, complex job queues like Celery, but you could potentially get by with something simpler - maybe a Google Sheets table.
Thirdly, you would need to parse the emails you receive in order to get the keywords and sender addresses. If you go the Gmail + Google Sheets route, you could probably do it with some custom code, but otherwise you would need to implement email checking on your PC or server. Once you parse them, you add them to the job queue.
Finally, you need to give your Python script the ability to send email - meaning SMTP login credentials. The Python smtplib docs have more on this.
As you can see, there are a lot of things to consider and implement, so you may be better off trying to narrow down your question a bit. :)
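For the last step, here is a minimal sketch (Python 3) of sending the reply with the standard smtplib module; the SMTP host, port, credentials, and addresses are placeholders you would swap for your provider's:

import smtplib
from email.mime.text import MIMEText

def send_reply(to_address, subject, body):
    # Placeholder SMTP settings; use your own provider's host/port and credentials.
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = 'me@example.com'
    msg['To'] = to_address

    with smtplib.SMTP_SSL('smtp.example.com', 465) as server:
        server.login('me@example.com', 'app-password')
        server.send_message(msg)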

How to rule out duplicate messages in Pubsub without using Dataflow and without using ACK in Python?

I have a use-case where I want to read the messages in Pub/Sub without acknowledging them. I would need help on how to rule out the possibility of "duplicate messages", which will remain in the Pub/Sub store when I don't ACK a delivered message.
Solutions that I have thought of:
Store the pulled messages in Datastore and check whether they are the same.
Store the pulled messages in memory at runtime and check whether each new message is a duplicate (O(n) time and O(n) space).
Store the pulled messages in a file and compare new incoming messages against the messages in the file.
Use Dataflow and rule out the possibility (my least preferred option).
As far as I can tell, Pub/Sub has no offset-like feature similar to Kafka's.
Which approach would you suggest here, or is there another alternative I could use?
I am using the Python google-cloud-pubsub_v1 library to create a client and pull messages from Pub/Sub.
Here is the code with the pull logic:
from google.cloud import pubsub_v1

# Placeholder project and subscription names.
project_id = 'my-project'
subscription_name = 'my-subscription'

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_name)
NUM_MESSAGES = 3
# The subscriber pulls a specific number of messages.
response = subscriber.pull(subscription_path, max_messages=NUM_MESSAGES)
for received_message in response.received_messages:
    print(received_message.message.data)
It sounds like Pub/Sub is probably not the right tool for the job. It seems as if you are trying to use Pub/Sub as a persistent data store, which is not the intended use case. Acking is a fundamental part of the lifecycle of a Cloud Pub/Sub message. Pub/Sub messages are deleted if they are unacked after the provided message retention period, which cannot be longer than 7 days.
I would suggest instead that you consider using an SQL database like Cloud Spanner. Then, you could generate a uuid for each message, use this as the primary key for deduplication, and transactionally update the database to ensure there are no duplicates.
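A rough sketch of that idea with the google-cloud-spanner client; the instance, database, table, and column names are made up, and here I key the row on the Pub/Sub message_id rather than a fresh uuid so repeated deliveries collide on the primary key:

from google.api_core.exceptions import AlreadyExists
from google.cloud import spanner

client = spanner.Client()
database = client.instance('my-instance').database('my-database')  # placeholders

def store_once(message_id, payload):
    """Insert a message keyed by its id; a duplicate insert fails at commit."""
    def _insert(transaction):
        transaction.insert(
            table='messages',                      # hypothetical table
            columns=('message_id', 'payload'),
            values=[(message_id, payload)])
    try:
        database.run_in_transaction(_insert)
        return True   # first time we've seen this message
    except AlreadyExists:
        return False  # duplicate, already stored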
I might be able to provide a better answer for you if you provide more information about what you are planning on doing with the deduplicated messages.

What's the preferred method for throttling websocket connections?

I have a web app where I am streaming model changes to a Backbone collection in a Chrome client. There are a few Backbone views that may or may not render parts of the page, depending on the type of update and what is being looked at. For example, some changes to a model result in the view for the collection being re-rendered, and there may or may not be a detail panel view open for the model that's being updated. These model changes can happen very fast, as the server-side workflow involves quite verbose and rapid changes to the model.
Here's the problem: I'm getting a large number of errno 32 (broken pipe) errors in the webserver's process when sending messages to the client, although the websocket connection is still up and its readyState is still 1 (OPEN).
What I suspect is happening is that the various views haven't finished rendering in the onmessage callback by the time the next message is coming in. After I get these tracebacks in stdout the websocket connection can still work and the UI will still update.
If I put eventlet.sleep(0.02) in the loop that reads model changes off the message queue and sends them on the websocket the broken pipe messages go away, however this isn't a real solution and feels like a nasty hack.
Has anyone had similar problems with a websocket's onmessage function trying to do too much work and still being busy when the next message comes in? Does anyone have a solution?
I think the most efficient way to do this is to have the client app tell the server what it is displaying. The server keeps track of this and sends changes only for the objects currently viewed, and only to the clients concerned.
A way to do this is by using a "Who Watch What" list of items.
Items are indexed in two ways: by client ID, and with an isViewedBy chain list inside each data object (I know it doesn't look clean to mix it with the data, but it is very efficient).
You'll also need a last-update timestamp on each data object.
When a client changes view, it sends an "I'm viewing this, and I have the version -timestamp-" message to the server. The server checks the timestamp and sends back the object if required. It also removes obsolete "Who Watch What" items (accessing them by client ID) and creates the new ones.
When a data object is updated, loop through the isViewedBy chain list of that object to know which clients should be updated. Put the update in a message buffer for each client and flush those buffers manually (in case you update several items at the same time, it will send one big message).
This is a lot of work, but your app will be efficient and scale gracefully, even with a lot of objects and a lot of clients. It sends only useful messages, and it is very unlikely that there will be too many of them.
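A bare-bones sketch of the two-way index and the per-client buffers described above, using plain Python dicts; all of the names are invented for the example:

from collections import defaultdict

watched_by_client = defaultdict(set)   # client_id -> set of object ids being viewed
is_viewed_by = defaultdict(set)        # object_id -> set of client ids ("isViewedBy")
buffers = defaultdict(list)            # client_id -> pending updates to flush

def set_view(client_id, object_ids):
    # The client told us what it is now displaying; rebuild both indexes.
    for obj_id in watched_by_client[client_id]:
        is_viewed_by[obj_id].discard(client_id)
    watched_by_client[client_id] = set(object_ids)
    for obj_id in object_ids:
        is_viewed_by[obj_id].add(client_id)

def object_updated(object_id, change):
    # Queue the change only for clients that are actually looking at this object.
    for client_id in is_viewed_by[object_id]:
        buffers[client_id].append(change)

def flush(send):
    # Send one combined message per client, then clear its buffer.
    for client_id, changes in buffers.items():
        if changes:
            send(client_id, changes)
            buffers[client_id] = []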
For your onmessage problem, I would store the incoming data in a queue and process it asynchronously.

Handling / Bouncing Incoming Email on Errors with App Engine Python

When an email is received that generates an error, what is the best way to bounce the message? For example, you store a file in a db.BlobProperty, but an email comes in that exceeds the 1MB limit. There needs to be some way to return a bounce/error for the request so the email doesn't keep hitting the server and increasing the billing every 15 minutes. (Don't ask me how I know :-P ... actually, it's a separate but related issue I posted in another question, here.)
But that other error made it clear I need to deal with this before I get the email with multiple attachments that nails me for 1GB of data.
Normally the mail server handles the bounce, as when you send to a bad address and it returns an error to the client/server. I have searched and didn't find anything helpful on this. YMMV
Is there an undocumented function? What is the proper response to return so that the originating server stops sending?
There's no way to bounce a message once it arrives at your App Engine app. You have two options:
Send the user a 'bounce message' yourself using the outgoing email API
Silently discard the message
In either case, you should install a top-level exception handler (frameworks like webapp and webapp2 have support for this) that logs the exception, performs the appropriate action, and then returns status code 200 instead of 500, so the message won't be redelivered repeatedly.
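A sketch of the "log, act, return 200" idea using the InboundMailHandler that ships with the App Engine Python SDK; process_message stands in for whatever handling code you already have:

import logging
import webapp2
from google.appengine.ext.webapp.mail_handlers import InboundMailHandler

class SafeMailHandler(InboundMailHandler):
    def receive(self, mail_message):
        try:
            process_message(mail_message)   # hypothetical: your existing logic
        except Exception:
            # Log the failure (and optionally send a bounce mail here), but do
            # not re-raise: a 200 response stops App Engine from redelivering.
            logging.exception('Could not process mail from %s', mail_message.sender)
        self.response.set_status(200)

app = webapp2.WSGIApplication([SafeMailHandler.mapping()])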
In your specific case, too, I'd start storing the attachments in the blobstore instead of a blob property, to avoid the 1MB limit.

Synchronize Memcache and Datastore on Google App Engine

I'm writing a chat application using Google App Engine. I would like chats to be logged. Unfortunately, the Google App Engine datastore only lets you write to it once per second. To get around this limitation, I was thinking of using a memcache to buffer writes. In order to ensure that no data is lost, I need to periodically push the data from the memcache into the data store.
Is there any way to schedule jobs like this on Google App Engine? Or am I going about this in entirely the wrong way?
I'm using the Python version of the API, so a Python solution would be preferred, but I know Java well enough that I could translate a Java solution into Python.
To get around the write/update limit of entity groups (note that entities without a parent are their own entity group), you could create a new entity for every chat message and keep a property in each one that references the chat it belongs to.
You'd then find all chat messages that belong to a chat via a query. But this would be very inefficient, as you'd then need to do a query for every user for every new message.
So go with the above advice, but additionally do:
Look into backends. These are always-on instances where you could aggregate chat messages in memory (and immediately/periodically flush them to the datastore). When a user requests the latest chat messages, you already have them in memory and can serve them instantly (saving time and cost compared to using the Datastore). Note that backends are not 100% reliable; they might go down from time to time, so adjust the flushing of chat messages to the datastore accordingly.
Check out the Channel API. It will allow you to notify users when there is a new chat message. This way you avoid polling for new chat messages and keep the number of requests down.
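A hedged sketch of the Channel API part; the client_id scheme and the message format are up to you, and notify_new_message is an invented helper:

import json
from google.appengine.api import channel

def open_chat_channel(user_id, chat_id):
    # One channel per user per chat; the returned token goes to the browser client.
    return channel.create_channel('%s-%s' % (chat_id, user_id))

def notify_new_message(chat_id, user_ids, message_text):
    # Push the new chat message to every connected participant.
    payload = json.dumps({'chat': chat_id, 'text': message_text})
    for user_id in user_ids:
        channel.send_message('%s-%s' % (chat_id, user_id), payload)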
Sounds like the wrong way, since you risk losing data held only in memcache.
You can write to one entity group once per second.
You can write to separate entity groups very rapidly, so it really depends on how you structure your data. For example, if you keep an entire chat in one entity, you can only write that chat once per second, and you'd be limited to 1MB.
If you write a separate entity per message in the chat, you can write very, very quickly, but you need to devise a way to pull all the messages back together, in order, for the log.
Edit: I agree with Peter Knego that one entity per message will get way too expensive in database ops. His backend suggestion is pretty good too, although if your app is popular, backends don't scale that well.
I was trying to avoid sharding, but I think it will be necessary. If you're not familiar with sharding, read up on this: https://developers.google.com/appengine/articles/sharding_counters
Sharding would be an intermediate approach between writing one entity for all messages in a conversation and one entity per message. You would randomly split the messages between a number of entities. For example, if you save the messages across 3 entities, you can write roughly 3x/sec (I doubt most human conversations would go any faster than that).
On fetching, you would need to grab the 3 entities, and merge the messages in chronological order. This would save you a lot on cost. But you would need to write the code to do the merging.
One other benefit is that your conversation limit would now be 3MB instead of 1MB.
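Roughly what that could look like with ndb; the ChatShard model and the key-naming scheme are invented for the example, and NUM_SHARDS matches the 3 entities used above:

import random
import time
from google.appengine.ext import ndb

NUM_SHARDS = 3  # 3 shard entities per conversation, as in the example above

class ChatShard(ndb.Model):  # hypothetical model
    # Each repeated item is one message; store a timestamp with it so the
    # shards can be merged chronologically on read.
    messages = ndb.JsonProperty(repeated=True)

@ndb.transactional
def append_message(conversation_id, text):
    key = ndb.Key(ChatShard, '%s-%d' % (conversation_id, random.randrange(NUM_SHARDS)))
    shard = key.get() or ChatShard(key=key)
    shard.messages.append({'t': time.time(), 'text': text})
    shard.put()

def read_conversation(conversation_id):
    keys = [ndb.Key(ChatShard, '%s-%d' % (conversation_id, i))
            for i in range(NUM_SHARDS)]
    shards = [s for s in ndb.get_multi(keys) if s]
    merged = [m for s in shards for m in s.messages]
    return sorted(merged, key=lambda m: m['t'])  # merge in chronological order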
Why not use a pull task? I highly recommend this Google video if you are not familiar enough with task queues; the first 15 minutes cover pull-queue info that may apply to your situation. Anything involving per-message updates may get quite expensive in database ops, and this will be greatly exacerbated if you have any indices involved. Video link:
https://www.youtube.com/watch?v=AM0ZPO7-lcE&feature=player_embedded
I would simply set up my chat entity when users initiate it in the online handler, passing the entity id back to the chat parties. Send the id + message to your pull queue, and serialize the messages within the chat entity's TextProperty. You won't likely schedule the pull-queue cron more often than once per second, so that avoids your entity update limitation. Most importantly, your database ops will be greatly reduced.
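A rough sketch of that pull-queue flow with the App Engine taskqueue API; the queue name, payload format, and append_to_chat_entity helper are assumptions:

import json
from google.appengine.api import taskqueue

chat_queue = taskqueue.Queue('chat-pull')  # a pull queue defined in queue.yaml

def enqueue_chat_message(chat_id, text):
    # Called from the online handler: one pull task per incoming message.
    chat_queue.add(taskqueue.Task(
        payload=json.dumps({'chat': chat_id, 'text': text}),
        method='PULL'))

def flush_chat_messages():
    # Called from a scheduled worker: lease a batch, apply it, delete it.
    tasks = chat_queue.lease_tasks(lease_seconds=60, max_tasks=100)
    for task in tasks:
        data = json.loads(task.payload)
        append_to_chat_entity(data['chat'], data['text'])  # hypothetical helper
    if tasks:
        chat_queue.delete_tasks(tasks)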
I think you could create tasks which will persist the data. This has the advantage that, unlike memcache, the tasks are persisted, and so no chats would be lost.
When a new chat message comes in, create a task to save the chat data, and do the persisting in the task handler. You could either configure the task queue to process at 1 per second (or slightly slower) and save each bit of chat data held in a task, or persist the incoming chats in a temporary table (in different entity groups) and periodically have the tasks pull all unsaved chats from the temporary table, persist them to the chat entity, and then remove them from the temporary table.
I think you would be fine using the chat session as the entity group and saving the chat messages in it.
This once-per-second limit is not really what you see in practice; you can update/save at a higher rate. I'm doing it all the time and I don't have any problem with it.
Memcache is volatile and is the wrong choice for what you want to do. If you start encountering issues with the write rate, you can start setting up tasks to save the data.
