I am building an application on GAE that needs to notify users when another user performs an action that affects them. A real-world analogy would be being alerted when a friend comments on your Facebook status.
I understand how the Channel API works to actually send notifications in real time, but I'm trying to understand the most effective way to store those notifications in the datastore. Ideally, I want the notification code to be decoupled from the actual event being performed. Is this a good use case for Prospective Search? It doesn't quite feel right, since I don't need to perform any kind of searching; the logic is just: when you see a new comment, create a new notification that is stored in the datastore and pushed to the client through the Channel API if they are connected. I basically need a database trigger, but I don't think GAE supports those.
Why don't you want to couple the event and its notifications in the first place?
I think it may be interesting to know in order to help you with your use case :)
If I had to do this, I would enqueue a task whenever I write something to the datastore that might fire events.
That way you can do your write and have a separate "layer" to process the events.
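A minimal sketch of that approach with the App Engine task queue API; the URL, parameter names and handler are made up for illustration:

from google.appengine.api import taskqueue

def fire_event(event_type, actor_id, recipient_id, target_key):
    # Enqueue the event for asynchronous processing in a separate "layer".
    # The /tasks/process-event handler (not shown) would apply business rules,
    # store a Notification entity and push it out through the Channel API.
    taskqueue.add(url='/tasks/process-event',
                  params={'type': event_type,
                          'actor': actor_id,
                          'recipient': recipient_id,
                          'target': target_key})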
Triggers would not even be that good of an option, since your application would have to "poll" the database to push events to the users' UI.
I think your process (firing events) does not belong in the database, since it will likely need business rules that the datastore cannot provide: for example, when a user ignores another one, you should not fire events for them.
The more business logic you put in your database system, the more complex it gets to maintain & scale IMHO...
Looks like GAE does support mimicking database triggers using hooks.
Hooks can be useful for
query caching
auditing Datastore activity per-user
mimicking database triggers
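For example, with the ndb library a model-level post-put hook can act much like an AFTER INSERT/UPDATE trigger. A minimal sketch, assuming a Comment model and a hypothetical create_notification() helper:

from google.appengine.ext import ndb

class Comment(ndb.Model):
    author = ndb.StringProperty()
    status_owner = ndb.StringProperty()
    text = ndb.TextProperty()

    def _post_put_hook(self, future):
        # Runs after every put() on a Comment, roughly like a database trigger.
        # create_notification() is hypothetical: it would store a Notification
        # entity and push it to the recipient over the Channel API if connected.
        create_notification(recipient=self.status_owner, comment_key=self.key)

Note that the hook fires on updates as well as creates, so you may need to distinguish the two.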
I am making a Slack bot. I have been using the python-slackclient library to develop the bot. It's working great with one team. I am using the Flask web framework.
As people add the app to Slack via the "Add to Slack" button, I get a bot_access_token for each team.
Now
how should I run the code with so many Slack tokens? Should I store them in a list and then loop over all the tokens? That doesn't seem good, since I may not be able to handle the simultaneous messages or events I receive. Or is it actually a fine approach?
Is there another way if it's not?
If you're using the real-time API, you'll need one WebSocket open per team. Yes, you would typically use a loop to establish these connections. Depending on the way slackclient works, you might need to start each in a separate thread or process.
EDIT: As mentioned in the comments below, threading is to be preferred over multiple processes. Even better would be to use something lighter weight than threads, but at this point in your learning, I wouldn't bother over-optimizing here.
SECOND EDIT: It looks like python-slackclient has non-blocking reads, so you don't even need to use threads. E.g. the following will not block:
for team in teams:
    for event in team.client.rtm_read():
        # process the event for that team
        handle_event(team, event)  # handle_event is your own dispatch function
(This assumes some sort of "team" object that contains an instance of SlackClient.)
You do indeed need to
Store each team token. Please remember to encrypt it.
When a team installs your app, create a new RTM connection. When your app/server restarts, loop across all your teams and open an RTM connection for each of them.
Each connection will receive events from that team, and that team only. You will not receive all notifications on the same connection.
(Maybe you are coming from a Facebook Messenger bot background, where all notifications arrive at the same webhook? That's not the case with Slack.)
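Putting both answers together, a rough sketch with python-slackclient (1.x); the token store and handle_event() are placeholders:

import time
from slackclient import SlackClient

tokens = load_team_tokens()   # hypothetical: returns {team_id: bot_access_token}
clients = {team_id: SlackClient(token) for team_id, token in tokens.items()}

for team_id, client in clients.items():
    if not client.rtm_connect():
        print("RTM connect failed for team %s" % team_id)

while True:
    for team_id, client in clients.items():
        # rtm_read() is non-blocking: it returns a (possibly empty) list of events
        for event in client.rtm_read():
            handle_event(team_id, event)   # your own dispatch code
    time.sleep(0.1)   # avoid a tight busy loop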
I recently started playing around with django-social-auth and am looking for some help from the community to figure out the best way to move forward with an idea.
Once a user has registered, you have access to their OAuth token, which allows you to pull certain data.
In my case I want to build a nice little profile based on the user's avatar, location and maybe some other information if it's available.
Would the best way be to:
build a custom task for celery and pull the information and build the profile?
or, make use of signals to build the profile?
This really comes down to synchronous vs. asynchronous. Django signals are synchronous: they will block the response until they have completed. Celery tasks are asynchronous.
Which is better depends on whether the benefits of handling the profile building asynchronously outweigh the cost of maintaining the extra infrastructure Celery requires.
It's basically impossible to answer this without a lot more detail about your specific situation.
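To make the trade-off concrete, here is a rough sketch of both options; the build_profile() helper, task name and signal choice are illustrative rather than anything django-social-auth prescribes:

# Option 1 - synchronous: a signal receiver runs inside the request/response cycle.
from django.contrib.auth.models import User
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=User)
def build_profile_on_create(sender, instance, created, **kwargs):
    if created:
        build_profile(instance)   # hypothetical: fetch avatar/location via the OAuth token

# Option 2 - asynchronous: hand the same work to a Celery task and return immediately.
from celery import shared_task

@shared_task
def build_profile_task(user_id):
    build_profile(User.objects.get(pk=user_id))

# The receiver (or a social-auth pipeline step) would then just call:
#     build_profile_task.delay(instance.pk)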
I'm writing a chat application using Google App Engine. I would like chats to be logged. Unfortunately, the Google App Engine datastore only lets you write to it once per second. To get around this limitation, I was thinking of using a memcache to buffer writes. In order to ensure that no data is lost, I need to periodically push the data from the memcache into the data store.
Is there any way to schedule jobs like this on Google App Engine? Or am I going about this in entirely the wrong way?
I'm using the Python version of the API, so a Python solution would be preferred, but I know Java well enough that I could translate a Java solution into Python.
To get around the write/update limit of entity groups (note that entities without a parent are their own entity group), you could create a new entity for every chat message and keep a property in each that references the chat it belongs to.
You'd then find all the chat messages that belong to a chat via a query. But that alone would be very inefficient, as you'd need to run a query for every user for every new message.
So go with the above advice, but additionally do:
Look into backends. These are always-on instances where you could aggregate chat messages in memory (and immediately/periodically flush them to the datastore). When a user requests the latest chat messages, you already have them in memory and can serve them instantly (saving time and cost compared to using the Datastore). Note that backends are not 100% reliable; they might go down from time to time, so adjust the flushing of chat messages to the datastore accordingly.
Check out the Channel API. This will allow you to notify users when there is a new chat message. This way you avoid polling for new chat messages and keep the number of requests down.
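A minimal ndb sketch of the one-entity-per-message layout described above; the property names are only illustrative:

from google.appengine.ext import ndb

class ChatMessage(ndb.Model):
    chat_id = ndb.StringProperty(required=True)   # which chat this message belongs to
    author = ndb.StringProperty()
    text = ndb.TextProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)

def messages_for_chat(chat_id, limit=100):
    # Pulling a chat's log back together requires a query, ordered by timestamp.
    return (ChatMessage.query(ChatMessage.chat_id == chat_id)
            .order(ChatMessage.created)
            .fetch(limit))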
Sounds like the wrong way, since you risk losing data that exists only in memcache.
You can write to one entity group once per second.
You can write separate entity groups very rapidly. So it really depends how you structure your data. For example, if you kept an entire chat in one entity, you can only write that chat once per second. And you'd be limited to 1MB.
If you write a separate entity per message in the chat, you can write very, very quickly, but you need to devise a way to pull all the messages together, in order, for the log.
Edit I agree with Peter Knego that the costs of using one entity per message will get way too expensive. His backend suggestion is pretty good too, although if your app is popular, backends don't scale that well.
I was trying to avoid sharding, but I think it will be necessary. If you're not familiar with sharding, read up on this: https://developers.google.com/appengine/articles/sharding_counters
Sharding would be an intermediate between writing one entity for all messages in a conversation and one entity per message. You would randomly split the messages between a number of entities. For example, if you save the messages across 3 entities, you can write 3x/sec (I doubt most human conversations would go any faster than that).
On fetching, you would need to grab the 3 entities, and merge the messages in chronological order. This would save you a lot on cost. But you would need to write the code to do the merging.
One other benefit is that your conversation limit would now be 3MB instead of 1MB.
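A rough sketch of that sharding scheme; NUM_SHARDS, the key naming and the merge step are assumptions, not a prescribed layout:

import random
from google.appengine.ext import ndb

NUM_SHARDS = 3   # roughly 3 writes/sec and ~3MB total per conversation

class ChatShard(ndb.Model):
    # keyed by "<chat_id>:<shard_number>"
    messages = ndb.JsonProperty()   # list of {"t": timestamp, "text": ..., "from": ...}

def append_message(chat_id, message):
    shard_key = ndb.Key(ChatShard, '%s:%d' % (chat_id, random.randint(0, NUM_SHARDS - 1)))

    @ndb.transactional
    def txn():
        shard = shard_key.get() or ChatShard(key=shard_key, messages=[])
        shard.messages.append(message)
        shard.put()
    txn()

def read_chat(chat_id):
    keys = [ndb.Key(ChatShard, '%s:%d' % (chat_id, n)) for n in range(NUM_SHARDS)]
    shards = [s for s in ndb.get_multi(keys) if s]
    # Merge the shards back into chronological order before rendering the log.
    return sorted((m for s in shards for m in s.messages), key=lambda m: m['t'])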
Why not use a pull task? I highly recommend this Google video if you are not familiar enough with task queues. The first 15 minutes cover pull queue info that may apply to your situation. Anything involving per-message updates may get quite expensive re: database ops, and this will be greatly exacerbated if you have any indices involved. Video link:
https://www.youtube.com/watch?v=AM0ZPO7-lcE&feature=player_embedded
I would simply set up my chat entity when users initiate it in the online handler, passing back the entity id to the chat parties. Send the id+message to your pull queue, and serialize the messages within the chat entity's TextProperty. You won't likely schedule the pull queue cron more often than once per second, so that avoids your entity update limitation. Most importantly: your database ops will be greatly reduced.
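A sketch of that pull-queue flow; the 'chat-pull' queue (declared with mode: pull in queue.yaml), the cron-driven drain handler and append_to_chat_entity() are all assumptions:

import json
from google.appengine.api import taskqueue

def enqueue_chat_message(chat_id, message):
    # Producer side: called from the request that receives a chat message.
    taskqueue.Queue('chat-pull').add(
        taskqueue.Task(payload=json.dumps({'chat': chat_id, 'msg': message}),
                       method='PULL'))

def drain_chat_queue():
    # Consumer side: run from cron (e.g. every minute) or a backend loop.
    queue = taskqueue.Queue('chat-pull')
    tasks = queue.lease_tasks(lease_seconds=60, max_tasks=100)
    by_chat = {}
    for t in tasks:
        data = json.loads(t.payload)
        by_chat.setdefault(data['chat'], []).append(data['msg'])
    for chat_id, msgs in by_chat.items():
        append_to_chat_entity(chat_id, msgs)   # hypothetical: one write per chat entity
    if tasks:
        queue.delete_tasks(tasks)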
I think you could create tasks which will persist the data. This has the advantage that, unlike memcache, the tasks are persisted, so no chats would be lost.
When a new chat message comes in, create a task to save the chat data, and do the persist in the task handler. You could either configure the task queue to execute at 1 per second (or slightly slower) and save each bit of chat data held in the task, or persist the incoming chats in a temporary table (in different entity groups) and have the tasks periodically pull all unsaved chats from the temporary table, persist them to the chat entity, then remove them from the temporary table.
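A minimal sketch of the first variant using App Engine's deferred library; the Chat model and the 'chat-writes' queue (whose rate you could cap at 1/s in queue.yaml) are assumptions:

from google.appengine.ext import deferred, ndb

class Chat(ndb.Model):
    log = ndb.TextProperty(default=u'')

def save_chat_message(chat_id, message):
    # Runs inside a task, so the user-facing request is not blocked by the write.
    chat = Chat.get_or_insert(chat_id)
    chat.log += message + u'\n'
    chat.put()

def on_incoming_message(chat_id, message):
    # Called from the request handler that receives the chat message.
    deferred.defer(save_chat_message, chat_id, message, _queue='chat-writes')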
I think you would be fine using the chat session as the entity group and saving the chat messages under it.
This once-per-second limit is not really a hard rule; you can update/save at a higher rate. I do it all the time and don't have any problem with it.
Memcache is volatile and is the wrong choice for what you want to do. If you start encountering issues with the write rate, you can start setting up tasks to save the data.
Mornink!
I need to design, write and implement a large system consisting of multiple Unix servers performing different roles and running different services. The system must be bulletproof, robust and fast. Yeah, I know. ;) Since I don't know how to approach this task, I've decided to ask for your opinion before I leave the design stage. Here is how the workflow is supposed to flow:
users interact with the website, where they set up demands for a service
this demand is stored (in a database?) and some kind of message about the new demand is sent to the central (clustered) system via a database/queue
the central system picks up the demand and sends signals to various other systems (clusters) to perform their duties (parts of the demanded service setup)
when they are done, they send a message back to the central system or the website that the service is now being served
Now, what is the modern, robust, clean and efficient way of storing these requests in some kind of queue and executing them? Should I send signals, or should I let all subsystems poll the queue/db for new data? What could that queue be? Should it be a database? How should I deal with the messages? I thought about opening a single TCP connection and sending data over it, along with commands triggering actions/functions on the other end, but on closer inspection there has to be a better way. I also found Spring Python, which has been criticized for being so 90's-ish.
I know it's a very broad question, but I really hope you can help me wrap my head around the design and not make something stupid here :)
Thanks in advance!
Some general ideas for you:
You could have a master-client approach. Requests would be inserted in the master and stored in a database. The master knows the state of each client (same db). Whenever there is a request, the master redirects it to a free client. The client reports back when it has finished the task (including answers, if any), making it able to receive a new task from the master (this removes the need for polling).
Communication could be done using web services. An HTTP request/POST should cover every case. No need to actually go down to the TCP level.
Just general ideas, hope they're useful.
There are a number of message queue technologies out there which are Python-friendly and could serve quite well. The top two that I know of are ActiveMQ and RabbitMQ, both of which play well with Python; I also found this comparison, which states that ActiveMQ currently (as of 18 months ago!) outperforms RabbitMQ.
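For instance, with RabbitMQ and the pika client (1.x API) the producer/consumer pair is only a few lines; the queue name, host and process_demand() are placeholders:

import pika

# Producer: the web front-end publishes a new demand.
conn = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq-host'))
channel = conn.channel()
channel.queue_declare(queue='demands', durable=True)
channel.basic_publish(exchange='', routing_key='demands', body='{"demand_id": 42}')
conn.close()

# Consumer: a worker on another server processes demands as they arrive.
def on_demand(ch, method, properties, body):
    process_demand(body)                               # hypothetical handler
    ch.basic_ack(delivery_tag=method.delivery_tag)

conn = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq-host'))
channel = conn.channel()
channel.queue_declare(queue='demands', durable=True)
channel.basic_consume(queue='demands', on_message_callback=on_demand)
channel.start_consuming()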
I'm looking into ways to track events in a Django application (events would generally be clicks tied to a specific unique user id).
These events would essentially contain an event type like "click"; each click event would be assigned to a unique id (many events can go to one id), and each event would carry a data set including items like the referrer, etc.
I have tried Mixpanel, but for now the data API they offer seems too limiting, as I can't seem to find a way to get all of my data out by a unique id (apart from the event itself).
I'm looking into using django-eventracker, but I'm curious about others' thoughts on the best way to do this. Mongo or CouchDB seem like a great choice here, and the Celery/RabbitMQ combination looks really attractive with Mongo. Pumping these events into the existing application's db seems limiting at this point.
Anyway, this is just a thread to see what others' thoughts are on this and how they have implemented something like it.
Shoot!
I am not familiar with the pre-packaged solutions you mention. Were I to design this from scratch, I'd have a simple bit of JS collecting info on clicks and posting it back to the server via Ajax (using whatever JS framework you're already using), and on the server side I'd simply append that info to a log file for later "offline" processing -- so that would be essentially independent of Django or any other server-side framework.
Appending to a log file is a very lightweight action, while DBs for web use are generally optimized for read-intensive (not write-intensive) operation, so I agree with you that force-fitting that info (as it trickles in) into the existing app's DB is unlikely to offer good performance.
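A bare-bones sketch of that server side in Django; the URL wiring, field names and log path are all made up:

import json
import time
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

EVENT_LOG = '/var/log/myapp/events.log'   # hypothetical path, must be writable by the app

@csrf_exempt
@require_POST
def track_event(request):
    # The page's JS POSTs the event type, the unique id and anything else you care about.
    record = {
        'ts': time.time(),
        'event': request.POST.get('event', 'click'),
        'uid': request.POST.get('uid'),
        'referrer': request.META.get('HTTP_REFERER'),
    }
    # Appending one JSON line per event is cheap; parse the log offline later.
    with open(EVENT_LOG, 'a') as f:
        f.write(json.dumps(record) + '\n')
    return HttpResponse(status=204)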
You probably want to keep a flexible format for your logs to anticipate future needs or changes. In this sense, the schema-less document-oriented databases are nice. One advantage is that the structure of your data will be close to your application needs for whatever analyses you perform later (so, avoiding some of the inevitable parsing/data munging work).
If you're thinking about using mysql, postgresql or such, then you should look into something like rsyslog for buffering writes and avoiding the performance penalty with heavy logging. (I can't say much about celery and other queueing mechanisms for this type of thing, but they sound promising.)
MongoDB has some nice features that make it amenable to logging, such as capped collections. A summary can be found in this post.
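For reference, creating and writing to a capped collection with pymongo looks roughly like this; the database name, size and document fields are arbitrary:

from pymongo import MongoClient

db = MongoClient()['analytics']

# A capped collection preserves insertion order and recycles space automatically,
# which suits an append-only event log.
if 'events' not in db.collection_names():
    db.create_collection('events', capped=True, size=100 * 1024 * 1024)   # ~100 MB

db.events.insert_one({'event': 'click', 'uid': 'u123', 'referrer': 'http://example.com/'})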
If by click, you mean a click on a link that loads a new page (or performs an AJAX request), then what you aim to do is fairly straightforward. Web servers tend to keep plain-text logs about requests - with information about the user, time/date, referrer, the page requested, etc. You could examine these logs and mine the statistics you need.
On the other hand, if you have a web application where clicks don't necessarily generate server requests, then collecting click information with javascript is your best bet.