Overhead when fetching the required client in WAMP WS - Python

I have created a WebSocket server using WAMP WS in Python.
I have a requirement where I subscribe about 500 clients to the WAMP WS server at a time.
But when I publish data, I send it to only a single client based on certain conditions. I know it is simple to just loop through the list of clients, find the eligible one, and then send the data to that client.
I would like to know: is there any other way that avoids the loop, since looping leads to a large overhead if the required client happens to be at the last position?

Presumably you loop through each client's eligibility data and make some sort of decision based on that data. It follows that an index on the eligibility data would give you near-instant access. So, in rough Python, something like:
client_array = []
client_index = {}   # maps eligibility data -> list of clients

client_array.append(new_client)
if new_client.eligibility_data not in client_index:
    client_index[new_client.eligibility_data] = []
client_index[new_client.eligibility_data].append(new_client)
I don't know what the eligibility data is, but say it is the weight of the client. If you wanted to send a message to everybody who weighs between 200 and 205 points, you could find those clients in client_index[200] through client_index[205].
If the condition cannot be determined beforehand, then you may need a database that can handle arbitrary queries to determine the client targets.

When doing a publish, you can provide a list of eligible receivers for the event via options, e.g. similar to the sketch below. The list of eligible receivers should be specified as a list of WAMP session IDs (which is the correct way to identify WAMP clients in this case).
Internally, AutobahnPython uses Python sets and set operations to compute the actual receivers, which is quite fast (and built into the language, meaning native code runs).
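For example, a minimal sketch with Autobahn|Python (the topic name and target_session_id are placeholders; in practice you would record session IDs yourself, e.g. via the wamp.session.on_join meta event):

from autobahn.twisted.wamp import ApplicationSession
from autobahn.wamp.types import PublishOptions

class Backend(ApplicationSession):

    def onJoin(self, details):
        # Hypothetical session ID of the one client that should receive
        # the event; look it up from your own bookkeeping in practice.
        target_session_id = 1234567890

        # The broker delivers the event only to sessions listed in
        # `eligible`; the other ~500 subscribers never see it.
        self.publish(u"com.example.topic",
                     {"payload": "only for one client"},
                     options=PublishOptions(eligible=[target_session_id]))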

Related

How can I optimize an alert system that processes 10k requests / job?

I'm building a Match Service that receives data from a third-party provider over MQTT. This is real-time data, and we save it in an RDS cluster.
In another service our users can create a filter called a Strategy. A cron job runs every 5 minutes, and all Strategy records in the database are sent to a Kafka topic to be processed by the Match Service.
My design is event-based, so for each new Strategy record on the topic, the Match Service performs a database query to check whether any Match crosses that Strategy's threshold. If the threshold is passed, it sends a new message to the broker.
The API processes about 10k Strategies in each job, and it is taking a long time (about 250 s per job).
So my question is whether there is a better way to design this system. I was thinking of adding a Redis layer to avoid database transactions.
All suggestions welcome!
Think long and hard about your relational data store. If you really need it to be relational, then it may absolutely make sense, but if not, a relational database is often a terrible place to dump things like time-series and IoT output. It's a great place for normalized, structured data and reporting, but a lousy place for dump/load workloads and real-time matching.
Look at something like AWS Redshift, Elasticsearch, or another NoSQL solution that can ingest and match things at orders of magnitude higher scale.
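As a rough sketch of the Redis layer the question mentions (not something prescribed here), assuming each incoming match can be reduced to a numeric value per key; every name below is hypothetical:

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def store_match(key, match_id, value):
    # Keep recent match values in a sorted set so strategy checks
    # do not need to hit the RDS cluster at all.
    r.zadd("match:%s" % key, {match_id: value})

def strategy_triggers(key, threshold):
    # Any member scoring at or above the threshold activates the strategy;
    # this is a cheap sorted-set range lookup instead of a SQL query per Strategy.
    hits = r.zrangebyscore("match:%s" % key, threshold, "+inf", start=0, num=1)
    return len(hits) > 0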

How to add rate limiting to a Tornado Python app

Would it be possible to implement a rate-limiting feature in my Tornado app? For example, limit the number of HTTP requests from a specific client if it is identified as sending too many requests per second (which red-flags it as a bot).
I think I could do it manually by storing the requests in a database and analyzing the requests per IP address, but I was checking whether there is already an existing solution for this feature.
I tried checking Tornado's GitHub page; I have the same question as this post, but no explicit answer was provided. I checked Tornado's wiki links as well, but I don't think rate limiting is handled yet.
Instead of storing them in the DB, it would be better to keep them in a dictionary in memory for easy access.
Also, can you share whether the API sits behind a load balancer and which web server is used?
The enterprise-grade solution to your problem is Ambassador.
You can use Ambassador's components, such as the Envoy proxy and Edge Stack, and set them up to do what you need.
Additionally, to store the data you can use any popular caching database that stores key:value pairs, for example Redis.
If you are doing this for a very small project, you can use some npm/pip packages.
Read the docs: https://www.getambassador.io/products/edge-stack/api-gateway/
You should probably do this before your requests reach Tornado.
But if it's an application level feature (limiting requests depending on level of subscription), then you can do it in Tornado in lots of ways, depending on how complex you want the rate limiting to be.
Probably the simplest way is to have a dict on your tornado.web.Application that uses the IP as the key and the timestamp of the last request as the value, and to check every request against it in prepare(): if not enough time has passed since the last request, raise a tornado.web.HTTPError(429) (ideally with a Retry-After header). If you do this, you will still need to clean up this dict now and then to remove entries that have not made a request recently, or it will keep growing (you could do that in on_finish on every request).
If you have another fast/in-memory storage attached (memcached, Redis, SQLite), you could use that, but you definitely should not use an RDBMS, as all those writes will not be great for its performance.
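A minimal sketch of the in-memory dict approach described above (the one-second interval, handler and attribute names are just placeholders, not an established recipe):

import time
import tornado.ioloop
import tornado.web

MIN_INTERVAL = 1.0  # seconds allowed between two requests from the same IP

class RateLimitedHandler(tornado.web.RequestHandler):

    def prepare(self):
        last_seen = self.application.last_request_by_ip
        ip = self.request.remote_ip
        now = time.time()
        if ip in last_seen and now - last_seen[ip] < MIN_INTERVAL:
            # A Retry-After header could also be added in write_error().
            raise tornado.web.HTTPError(429)
        last_seen[ip] = now

class MainHandler(RateLimitedHandler):
    def get(self):
        self.write("ok")

def make_app():
    app = tornado.web.Application([(r"/", MainHandler)])
    app.last_request_by_ip = {}  # ip -> timestamp of the last request
    return app

if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()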

RabbitMQ/pika - Single RPC callback for multiple clients

I have followed the examples at the official RabbitMQ site and then tried to take it a step further: applying the same RPC logic with a single server and multiple clients. Following the examples, I am using BlockingConnection() for now. Each client calls process_data_events() in a loop and checks for its corresponding correlation_id. All of the clients check for their correlation_id on the same callback_queue.
For example, in a setup of 2 clients and 1 server, there are 2 queues: one that both clients publish to, and one that both clients check for the corresponding correlation_id. The code works flawlessly with a single client and a single server (or even multiple servers), but fails when more than one client consumes from the callback_queue.
My experiments have shown that when a client receives (via process_data_events()) an id that is not its own, that id is never processed by the other client. Hence a timeout occurs, or the connection is dropped because no heartbeat is sent for quite some time. The call after which the problem occurs is channel.basic_consume(queue='callback', on_message_callback=on_resp).
Should I use a unique callback queue for each client? The documentation was not as helpful as I had hoped; is there something you would recommend I study?
I can post minimal code to reproduce the issue if you ask me to.
Thanks in advance.
EDIT: This repo contains minimal code to reproduce the issue plus some more details.
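For reference, a minimal sketch of the client side as described in the question (shared 'callback' queue, correlation_id checked in the consumer callback); apart from the queue names mentioned above, all names and the payload are illustrative:

import uuid
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="rpc_queue")
channel.queue_declare(queue="callback")

corr_id = str(uuid.uuid4())
response = None

def on_resp(ch, method, properties, body):
    global response
    # Every client consuming this shared queue receives (and auto-acks)
    # whatever reply happens to be delivered to it, even when the
    # correlation_id belongs to another client.
    if properties.correlation_id == corr_id:
        response = body

channel.basic_consume(queue="callback", on_message_callback=on_resp, auto_ack=True)

channel.basic_publish(
    exchange="",
    routing_key="rpc_queue",
    properties=pika.BasicProperties(reply_to="callback", correlation_id=corr_id),
    body=b"request payload",
)

while response is None:
    connection.process_data_events(time_limit=1)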

Why does a search in the Gmail API return different results than a search on the Gmail website?

I'm using the Gmail API to search users' emails. I've created the following search query:
ticket after:2015/11/04 AND -from:me AND -in:trash
When I run this query in the browser interface of Gmail I get 11 messages (as expected). When I run the same query in the API however, I get only 10 messages. The code I use to query the gmail API is written in Python and looks like this:
searchQuery = 'ticket after:2015/11/04 AND -from:me AND -in:trash'
messagesObj = google.get('/gmail/v1/users/me/messages', data={'q': searchQuery}, token=token).data
print messagesObj.resultSizeEstimate # 10
I sent the same message on to another Gmail address and tested it from that email address, and (to my surprise) it does show up in an API search with that other email address, so the trouble is not the email itself.
After endlessly emailing around through various test Gmail accounts, I *think* (but am not 100% sure) that the browser-interface search function has a different definition of "me". It seems that the API search does not include emails which come from email addresses with the same name, while these results are in fact included in the result of the browser search. For example: if "Pete Kramer" sends an email from petekramer@icloud.com to pete@gmail.com (which both have their name set to "Pete Kramer"), it will show in the browser search and it will NOT show in the API search.
Can anybody confirm that this is the problem? And if so, is there a way to circumvent this to get the same results as the browser search returns? Or does anybody else know why the results from the Gmail browser search differ from the Gmail API search? All tips are welcome!
I would suspect it is the after query parameter that is giving you trouble. 2015/11/04 is not a valid ES5 ISO 8601 date. You could try the alternative after:<time_in_seconds_since_epoch>
# 2015-11-04 <=> 1446595200
searchQuery = 'ticket AND after:1446595200 AND -from:me AND -in:trash'
messagesObj = google.get('/gmail/v1/users/me/messages', data={'q': searchQuery}, token=token).data
print messagesObj.resultSizeEstimate # 11 hopefully!
The q parameter of /messages/list works the same as in the web UI for me (tried it at https://developers.google.com/gmail/api/v1/reference/users/messages/list#try-it).
I think the problem is that you are calling /messages rather than /messages/list.
The first time your application connects to Gmail, or if partial synchronization is not available, you must perform a full sync. In a full sync operation, your application should retrieve and store as many of the most recent messages or threads as are necessary for your purpose. For example, if your application displays a list of recent messages, you may wish to retrieve and cache enough messages to allow for a responsive interface if the user scrolls beyond the first several messages displayed. The general procedure for performing a full sync operation is as follows:
1. Call messages.list to retrieve the first page of message IDs.
2. Create a batch request of messages.get requests for each of the messages returned by the list request. If your application displays message contents, you should use format=FULL or format=RAW the first time your application retrieves a message and cache the results to avoid additional retrieval operations. If you are retrieving a previously cached message, you should use format=MINIMAL to reduce the size of the response as only the labelIds may change.
3. Merge the updates into your cached results. Your application should store the historyId of the most recent message (the first message in the list response) for future partial synchronization.
Note: You can also perform synchronization using the equivalent Threads resource methods. This may be advantageous if your application primarily works with threads or only requires message metadata.
Partial synchronization
If your application has synchronized recently, you can perform a partial sync using the history.list method to return all history records newer than the startHistoryId you specify in your request. History records provide message IDs and type of change for each message, such as message added, deleted, or labels modified since the time of the startHistoryId. You can obtain and store the historyId of the most recent message from a full or partial sync to provide as a startHistoryId for future partial synchronization operations.
Limitations
History records are typically available for at least one week and often longer. However, the time period for which records are available may be significantly less and records may sometimes be unavailable in rare cases. If the startHistoryId supplied by your client is outside the available range of history records, the API returns an HTTP 404 error response. In this case, your client must perform a full sync as described in the previous section.
From the Gmail API documentation: https://developers.google.com/gmail/api/guides/sync
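For illustration, a rough sketch of that full-sync procedure using google-api-python-client; the question's snippets use a different HTTP wrapper, so the service object and the full_sync helper here are assumptions, not the asker's code:

from googleapiclient.discovery import build

# service = build("gmail", "v1", credentials=creds)  # assumed to already exist

def full_sync(service, query):
    # Step 1: list the message IDs matching the search query.
    resp = service.users().messages().list(userId="me", q=query).execute()
    ids = [m["id"] for m in resp.get("messages", [])]

    cached = {}

    def store(request_id, message, exception):
        if exception is None:
            cached[message["id"]] = message

    # Step 2: fetch the messages in one batch request instead of one
    # HTTP round trip per message.
    batch = service.new_batch_http_request(callback=store)
    for msg_id in ids:
        batch.add(service.users().messages().get(userId="me", id=msg_id, format="full"))
    batch.execute()
    return cached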

How do I create a D-Bus service that dynamically creates multiple objects?

I'm new to D-Bus (and to Python, double whammy!) and I am trying to figure out the best way to do something that was discussed in the tutorial.
However, a text editor application could as easily own multiple bus names (for example, org.kde.KWrite in addition to generic TextEditor), have multiple objects (maybe /org/kde/documents/4352 where the number changes according to the document), and each object could implement multiple interfaces, such as org.freedesktop.DBus.Introspectable, org.freedesktop.BasicTextField, org.kde.RichTextDocument.
For example, say I want to create a wrapper around flickrapi such that the service can expose a handful of Flickr API methods (say, urls_lookupGroup()). This is relatively straightforward if I want to assume that the service will always be specifying the same API key and that the auth information will be the same for everyone using the service.
Especially in the latter case, I cannot really assume this will be true.
Based on the documentation quoted above, I am assuming there should be something like this:
# Get the connection proxy object.
flickrConnectionService = bus.get_object("com.example.FlickrService",
                                          "/Connection")

# Ask the connection object to connect; the return value would be
# maybe something like "/connection/5512" ...
flickrObjectPath = flickrConnectionService.connect("MY_APP_API_KEY",
                                                   "MY_APP_API_SECRET",
                                                   flickrUsername)

# Get the service proxy object.
flickrService = bus.get_object("com.example.FlickrService",
                               flickrObjectPath)

# Ask the flickr service object to get group information.
groupInfo = flickrService.getFlickrGroupInfo('s3a-belltown')
So, my questions:
1) Is this how this should be handled?
2) If so, how will the service know when the client is done? Is there a way to detect whether the current client has broken the connection so that the service can clean up its dynamically created objects? Also, how would I create the individual objects in the first place?
3) If this is not how this should be handled, what are some other suggestions for accomplishing something similar?
I've read through a number of D-Bus tutorials and various documentation and about the closest I've come to seeing what I am looking for is what I quoted above. However, none of the examples look to actually do anything like this so I am not sure how to proceed.
1) Mostly yes; I would only change one thing in the connect method, as I explain in 2).
2) D-Bus connections are not persistent. Everything is done with request/response messages, and no connection state is stored unless you implement it in separate objects, as you do with your Flickr object. The D-Bus objects in the Python bindings are mostly proxies that abstract the remote objects as if you were "connected" to them, but what they really do is build messages based on the information you give at object instantiation (object path, interface and so on). So the service cannot know when the client is done unless the client announces it with another explicit call.
To handle unexpected client termination, you can create a D-Bus object in the client and send its object path to the service when connecting; change your connect method to also accept an ObjectPath parameter. The service can listen to the NameOwnerChanged signal to know whether a client has died.
To create the individual objects, you only have to instantiate an object in the same service, as you do with your "/Connection", but you have to be sure you are using a name that does not already exist. You could have a "/Connection/Manager" and various "/Connection/1", "/Connection/2", and so on (see the sketch below).
3) If you need to store connection state, you have to do something like that.
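Here is a rough sketch of the approach from 2) using dbus-python; the bus name com.example.FlickrService, the connect method and the cleanup via watch_name_owner are illustrative assumptions, not the only way to do it:

import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

DBusGMainLoop(set_as_default=True)
bus = dbus.SessionBus()

class Connection(dbus.service.Object):
    """One dynamically created object per connected client."""

class ConnectionManager(dbus.service.Object):

    def __init__(self, bus):
        super().__init__(bus, "/Connection")
        self._bus = bus
        self._next_id = 0
        self._by_sender = {}  # caller's unique bus name -> Connection object

    @dbus.service.method("com.example.FlickrService",
                         in_signature="sss", out_signature="o",
                         sender_keyword="sender")
    def connect(self, api_key, api_secret, username, sender=None):
        self._next_id += 1
        path = "/Connection/%d" % self._next_id
        self._by_sender[sender] = Connection(self._bus, path)

        def on_owner_changed(new_owner, s=sender):
            # An empty owner string means the client vanished from the bus.
            if new_owner == "":
                self._cleanup(s)

        self._bus.watch_name_owner(sender, on_owner_changed)
        return dbus.ObjectPath(path)

    def _cleanup(self, sender):
        obj = self._by_sender.pop(sender, None)
        if obj is not None:
            obj.remove_from_connection()

name = dbus.service.BusName("com.example.FlickrService", bus)
manager = ConnectionManager(bus)
GLib.MainLoop().run()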
