I have a basic web API written in Node.js that writes an object as a hash (HSET) to a Redis cache. Both are running in Docker containers.
I have a Python script running on the same VM which needs to watch the Redis cache and run some code whenever a new hash is written or a field in an existing hash changes.
I came across Redis Pub/Sub but I'm not sure if this is really the proper way to use it.
To test, I created two Python scripts. The first subscribes to the messaging system:
import redis
import json

print("Redis Subscriber")

redis_conn = redis.Redis(
    host='localhost',
    port=6379,
    password='xxx',
    encoding='utf-8',
    decode_responses=True)

def sub():
    pubsub = redis_conn.pubsub()
    pubsub.subscribe("broadcast")
    for message in pubsub.listen():
        if message.get("type") == "message":
            data = json.loads(message.get("data"))
            print(data)

if __name__ == "__main__":
    sub()
The second publishes to the messaging system:
import redis
import json

print("Redis Publisher")

redis_conn = redis.Redis(
    host='localhost',
    port=6379,
    password='xxx',
    encoding='utf-8',
    decode_responses=True)

def pub():
    data = {
        "message": "id:3"
    }
    redis_conn.publish("broadcast", json.dumps(data))

if __name__ == "__main__":
    pub()
I will rewrite the publisher in Node.js, and it will simply publish the HSET key, such as id:3. The subscriber will stay in Python, and when it receives a new message it will use that key ("id:3") to look up the actual hash and do stuff.
This doesn't seem like the right way to do this, but Redis WATCH doesn't support HSET. Is there a better way to accomplish this?
This doesn't seem like the right way to do this, but Redis WATCH doesn't support HSET.
Redis WATCH does support hash keys; what it does not support is watching individual hash fields.
Is there a better way to accomplish this?
While your approach may be acceptable for certain scenarios, keep in mind that pub/sub messages are fire-and-forget: your subscriber may disconnect for whatever reason right after the publisher has published a message but before it has had the chance to read it, and that notification will then be lost forever, even if the subscriber automatically reconnects afterwards.
You may opt instead for Redis Streams, which let you add entries to a given stream (resembling the publishing step in your code) and consume them (akin to your subscriber script) through a process that preserves the messages.
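For illustration, a minimal sketch of that pattern with redis-py; the stream name, consumer group and consumer name below are made up for the example, not anything from your code:

    import redis

    r = redis.Redis(host='localhost', port=6379, decode_responses=True)

    # Producer side: append an entry to the stream (this replaces PUBLISH).
    r.xadd('object-changes', {'key': 'id:3'})

    # Consumer side: create a consumer group once, then read new entries.
    try:
        r.xgroup_create('object-changes', 'watchers', id='0', mkstream=True)
    except redis.ResponseError:
        pass  # group already exists

    while True:
        # Block up to 5 seconds waiting for entries not yet delivered to this group.
        entries = r.xreadgroup('watchers', 'worker-1',
                               {'object-changes': '>'}, count=10, block=5000)
        for stream, messages in entries:
            for message_id, fields in messages:
                print(fields['key'])  # e.g. "id:3" -> HGETALL it and do stuff
                r.xack('object-changes', 'watchers', message_id)

Unlike pub/sub, entries that arrive while the consumer is down stay in the stream and are delivered when it reconnects.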
As an alternative, perhaps simpler, approach, you may just split your hashes into multiple keys, one per field, so that you can WATCH them.
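A rough sketch of that alternative, assuming you store each field under its own key such as id:3:status (the key names and values here are invented for illustration):

    import redis

    r = redis.Redis(host='localhost', port=6379, decode_responses=True)

    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch('id:3:status')          # the transaction aborts if this key changes
                current = pipe.get('id:3:status')  # reads run immediately while watching;
                                                   # current could drive what you write below
                pipe.multi()
                pipe.set('id:3:status', 'processed')  # applied only if the key was untouched
                pipe.execute()
                break
            except redis.WatchError:
                continue                           # someone changed the key in between; retry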
You might want to take a look at key-space notifications. Key-space notifications can automatically publish messages via Pub/Sub when a key is changed, added, deleted, etc.
You can choose to consume events (e.g. HSET was called) and be given the name of the key it was called on, or you can choose to consume keys (e.g. my:awesome:key) and be notified of what event happened. Or both.
You'll need to turn key-space notifications on in order to use them:
redis.cloud:6379> CONFIG SET notify-keyspace-events KEA
You can subscribe to all events and keys like this:
redis.cloud:6379> PSUBSCRIBE '__key*__:*'
"pmessage","__key*__:*","__keyspace#0__:foo","set"
"pmessage","__key*__:*","__keyevent#0__:set","foo"
Hope that helps!
Related
I have a Java Kafka Streams application and a Python application. The Java application produces data and the Python consumer consumes it. When processing.guarantee is set to exactly_once, the Python consumer is not able to deserialize the data; deserialization fails.
I tried a Java consumer, and the Java consumer reads the data successfully. I then set processing.guarantee back to at_least_once in the Java application, and now the Python application can read without any issue.
I checked the payload with a console consumer, and in both the exactly_once and at_least_once cases the payload looks the same. Even the binary payload read by the Python consumer before deserialization looks the same in both cases. What could be the problem in this scenario?
Note: in my case Kafka doesn't have the at least 3 brokers suggested in the documentation for exactly_once to work; there is only one broker in my setup.
Can anyone shed some light on why the Java consumer works but the Python consumer does not?
Update: Looking at the Python logs more closely, it appears two records are being processed by the Python consumer:
The original record, which is processed perfectly fine.
An empty record; the log shows key = b'\x00\x00\x00\x01' and value = b'\x00\x00\x00\x00\x00\x00'. But now I am wondering how this additional record gets sent when exactly_once is set.
Below is the Python code used.
from kafka import KafkaConsumer

params = {
    "bootstrap_servers": "localhost:29092",
    "auto_offset_reset": "latest",
    "group_id": "test",
}

def set_consumer(self):
    try:
        consumer = KafkaConsumer(*self.topics, **self.consumer_params)
        return consumer
    except Exception as e:
        print(e)

for msg in self.consumer:
    try:
        event = self.decode_msg(msg)
        self.logger.info("Json result : %s", str(event))
    except Exception as e:
        self.logger.error(e)
wondering how this additional record gets sent when exactly_once is set
It is a transaction marker. The Java consumer is able to detect these and filter them out, but in Python your deserializer will need to handle them separately. There is a GitHub issue thread that suggests the consumer should already be able to filter out the transaction records; maybe check the librdkafka docs in case you are missing a configuration for this.
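As a stopgap on the Python side, you could skip anything that looks like one of those marker records before deserializing. A rough sketch based on the bytes logged above; the JSON call stands in for whatever deserializer you actually use, and the marker check is a heuristic, not part of any official API:

    import json

    CONTROL_KEY = b'\x00\x00\x00\x01'   # the marker key observed in the logs above

    def decode_msg(self, msg):
        # Skip what looks like a transaction control marker instead of deserializing it.
        if msg.value is None or msg.key == CONTROL_KEY:
            return None
        return json.loads(msg.value.decode("utf-8"))   # replace with your real deserialization

The consuming loop then just ignores None results instead of raising a deserialization error.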
I see there is an EOS example in the confluent-kafka-python repo, but it doesn't consume after the producer sends the transaction records.
I have a Flask app which uses SocketIO to communicate with users currently online. I keep track of them by mapping the user ID with a session ID, which I can then use to communicate with them:
online_users = {'uid...':'sessionid...'}
I declare this in my run.py file where the app is launched, and then I import it where I need it, as such:
from app import online_users
I'm using Celery with RabbitMQ for task deployment, and I need to use this dict from within the tasks. So I import it as above, but when I use it it is empty even when I know it is populated. I realize after reading this that it is because each task is asynchronous and starts a new process with an empty dict, and so my best bet is to use some sort of database or cache.
I'd rather not run an additional service, and I only need to read from the dict (I won't be writing to it from the tasks). Is a cache/database my only option here?
That depends on what you have in the dict. If it's something you can serialize to a string, you can serialize it to JSON and pass it as an argument to the task. If it's an object you cannot serialize, then yes, you need to use a cache/database.
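For the string-to-string mapping in your example, a minimal sketch of that; the task name is made up, and recent Celery versions use a JSON serializer by default, so you could also just pass the plain dict as the argument:

    import json
    from celery import shared_task

    @shared_task
    def notify_online_users(online_users_json):
        online_users = json.loads(online_users_json)   # uid -> session id snapshot
        for uid, sid in online_users.items():
            print(uid, sid)                            # read-only use inside the task

    # At call time, snapshot the current dict and send it along:
    # notify_online_users.delay(json.dumps(online_users))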
I came across this discussion which seems to be a solution for exactly what I'm trying to do.
Communication through a message queue is now implemented in the python-socketio package, through the use of Kombu, which provides a common API for working with several message queues, including Redis and RabbitMQ.
Supposedly an official release will be out soon, but as of now it can be done using an additional package.
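For reference, with a recent Flask-SocketIO this pattern looks roughly like the sketch below; the broker URL, event name and room value are assumptions, not something from the discussion above:

    from flask_socketio import SocketIO

    # A write-only SocketIO handle for the Celery worker process: it publishes to the
    # message queue, and the web process actually delivers the event to the browser.
    external_sio = SocketIO(message_queue='redis://localhost:6379/0')

    def notify_user(session_id, payload):
        # Called from inside a Celery task; session_id comes from your online_users dict.
        external_sio.emit('update', payload, room=session_id)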
Setup:
Tornado HTTP/WebSocket server. The WebSocketHandler reacts to messages from the client (e.g. puts them in the job queue).
A beanstalk job queue which sends jobs to the different components.
Some other components communicating over beanstalk, but those are unrelated to my problem.
Problem:
The WebSocketHandler should react to jobs, but if it is listening on beanstalk, it blocks. A job could be e.g. 'send data xy to client xyz'.
How can this be solved nicely?
My first approach was running a job-queue listener in a separate thread which held a list of pickled WebSocketHandlers, all stored in a Redis DB. Since WebSocketHandler instances can't be pickled (and this approach seems very ugly anyway), I'm searching for another solution.
Any ideas?
Instead of trying to pickle your WebSocketHandler instances you could store them in a class level (or just global) dictionary.
from tornado.websocket import WebSocketHandler

class MyHandler(WebSocketHandler):
    connections = {}  # class-level registry of live handlers

    def __init__(self, *args, **kwargs):
        super(MyHandler, self).__init__(*args, **kwargs)
        self.key = str(self)
        self.connections[self.key] = self
Then you would pass the self.key along with the job to beanstalk, and when you get a job back you look up which connection to send the output to with the key, and then write to it. Something like (pseudo code...)
def beanstalk_listener():
    for response in beanstalk.listen():
        MyHandler.connections[response.data[:10]].write_message(response.data[10:])
I don't think there is any value in trying to persist your WebSocketHandler connections in Redis. They are by nature ephemeral: if your Tornado process restarts or dies, they are of no use. If what you are trying to do is keep a record of which user is waiting for the output of which job, then you'll need to keep track of that separately.
I have a Celery task registered in my tasks.py file. When someone POSTs to /run/pk, I run the task with the given parameters. This task also executes other tasks (normal Python functions), and I'd like to update my page (the HttpResponse returned at /run/pk) whenever a subtask finishes its work.
Here is my task:
from celery.decorators import task

@task
def run(project, branch=None):
    if branch is None:
        branch = project.branch

    print('Creating the virtualenv')
    create_virtualenv(project, branch)
    print('Virtualenv created')  ##### Here I want to send a signal or something to update my page

    runner = runner(project, branch)
    print('Using {0}'.format(runner))

    try:
        result, output = runner.run()
    except Exception as e:
        print('Error: {0}'.format(e))
        return False

    print('Finished')
    run = Run(project=project, branch=branch,
              output=output, **result._asdict())
    run.save()
    return True
Sending push notifications to the client's browser using Django isn't easy, unfortunately. The simplest implementation is to have the client continuously poll the server for updates, but that increases the amount of work your server has to do by a lot. Here's a better explanation of your different options:
Django Push HTTP Response to users
If you weren't using Django, you'd use websockets for these notifications. However Django isn't built for using websockets. Here is a good explanation of why this is, and some suggestions for how to go about using websockets:
Making moves w/ websockets and python / django ( / twisted? )
Many years after this question was asked, Channels is a way you can now achieve this with Django.
The Channels website describes it as a "project to make Django able to handle more than just plain HTTP requests, including WebSockets and HTTP2, as well as the ability to run code after a response has been sent for things like thumbnailing or background calculation."
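As a rough illustration of how that could look for this question, here is a sketch of a Channels consumer the Celery task could notify through the channel layer; the group name, event shape and Channels-3-style API are my own assumptions:

    # consumers.py -- sketch only; assumes Channels with a channel layer configured
    import json
    from channels.generic.websocket import AsyncWebsocketConsumer

    class RunStatusConsumer(AsyncWebsocketConsumer):
        async def connect(self):
            await self.channel_layer.group_add('run-status', self.channel_name)
            await self.accept()

        async def disconnect(self, code):
            await self.channel_layer.group_discard('run-status', self.channel_name)

        async def run_update(self, event):
            # Invoked for group messages sent with {"type": "run.update", ...}.
            await self.send(text_data=json.dumps({'step': event['step']}))

    # From the Celery task, after each subtask finishes:
    # from asgiref.sync import async_to_sync
    # from channels.layers import get_channel_layer
    # async_to_sync(get_channel_layer().group_send)(
    #     'run-status', {'type': 'run.update', 'step': 'Virtualenv created'})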
There is a service called Pusher that will take care of all the messy parts of push notifications in HTML5. They supply a client-side and a server-side library to handle all the messaging and notifications, while taking care of all the HTML5 WebSocket nuances.
I want to schedule an email to be sent to a user upon a specific action.
However, if the user takes another action I want to cancel that email and have it not send.
How would I do that in Django or Python?
Beanstalkd
If you can install beanstalkd and run a Python script from the command line, I would use that to schedule emails. With the beanstalkc client you can accomplish this easily. On Ubuntu you might first need to install:
sudo apt-get install python-yaml python-setuptools
consumer.py:
import beanstalkc

def main():
    beanstalk = beanstalkc.Connection(host='localhost', port=11300)
    while True:
        job = beanstalk.reserve()
        print(job.body)
        job.delete()

if __name__ == '__main__':
    main()
It will print the job 5 seconds after it gets inserted by producer.py. Of course this should be set to a much longer delay for when you actually want to schedule your emails, but for demonstration purposes it will do. You don't want to wait half an hour for a scheduled message when testing ;).
producer.py:
import beanstalkc

def main():
    beanstalk = beanstalkc.Connection(host='localhost', port=11300)
    jid = beanstalk.put('foo', delay=5)

if __name__ == '__main__':
    main()
GAE Task Queue
You could also use the Google App Engine Task Queue to accomplish this. You can specify an eta for your task, and App Engine has a generous free quota. In the task queue webhook, make an asynchronous request to fetch a URL on your server which does the actual sending of the emails.
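A rough sketch with the legacy App Engine (Python 2) Task Queue API; the task name, URL and delay are made up for illustration, so treat this as an outline rather than a drop-in solution:

    from google.appengine.api import taskqueue

    # Schedule the email 30 minutes out; /tasks/send_email would hit your server's
    # email endpoint when the task fires.
    taskqueue.add(name='email-user-42',
                  url='/tasks/send_email',
                  params={'user_id': '42'},
                  countdown=30 * 60)

    # If the user takes the other action first, delete the task before it runs:
    taskqueue.Queue('default').delete_tasks(taskqueue.Task(name='email-user-42'))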
I would set up a cron job which could handle everything you want to do...
If you didn't have access to cron, you could easily do this:
Write a model that stores the email, the time to send, and a BooleanField indicating if the email has been sent.
Write a view which selects all emails that haven't been sent yet but should have by now, and sends them.
Use something like OpenACS Uptime, Pingdom or any other service capable of sending periodic HTTP GET requests to call that view and trigger the email sending. (Both named services are free; the former requests once every 15 minutes, and the latter can be configured to request as often as every minute, from several locations.)
Sure, it's inelegant, but it's a method that works on basically any web host. I used to do something like this when I was writing PHP apps to run on a host that killed all processes after something like 15 seconds.
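A minimal sketch of the model and view described above, assuming something external hits a /send-due-emails/ URL every few minutes; all names here are illustrative:

    # models.py
    from django.db import models

    class ScheduledEmail(models.Model):
        recipient = models.EmailField()
        subject = models.CharField(max_length=200)
        body = models.TextField()
        send_at = models.DateTimeField()
        sent = models.BooleanField(default=False)

    # views.py -- hit by the external pinger (cron, Pingdom, etc.)
    from django.core.mail import send_mail
    from django.http import HttpResponse
    from django.utils import timezone

    def send_due_emails(request):
        due = list(ScheduledEmail.objects.filter(sent=False, send_at__lte=timezone.now()))
        for email in due:
            send_mail(email.subject, email.body, 'noreply@example.com', [email.recipient])
            email.sent = True
            email.save()
        return HttpResponse('sent %d emails' % len(due))

Cancelling an email is then just deleting its row (or flipping a flag) before send_at.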
Are you using Celery? If so, see http://ask.github.com/celery/userguide/executing.html#eta-and-countdown
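With Celery the whole schedule-then-cancel flow is short; a sketch, assuming a send_email task of your own (the task name, argument and delay are made up):

    from celery import shared_task

    @shared_task
    def send_email(user_id):
        ...  # actually send the email here

    # Schedule it 30 minutes out and keep the task id somewhere (e.g. on the user):
    result = send_email.apply_async(args=[42], countdown=30 * 60)

    # If the user takes the other action before it fires, cancel it:
    result.revoke()

revoke() only prevents tasks that have not started running yet, which is exactly the case for a countdown/eta task that is still waiting.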
You said that you want to do it through Python or Django, but it seems as though something else will need to be involved. Considering you are on a shared host, there is a chance that installing other packages could also be a problem.
Another possible solution could be something like this:
Use a JavaScript framework which can set up timed events, start/cancel them, etc. I have done timed events using a framework called ExtJS. Although ExtJS is rather large, I'm sure other frameworks such as jQuery, or even raw JavaScript, could do a similar thing.
Set up a task on a user action that will execute in 5 minutes. The action could be an AJAX call to a Python script which sends the email. If the user does something where the task needs to be stopped, just cancel the event.
It kind of seems complicated and convoluted, but it really isn't. If this seems like a path you would like to try, let me know and I'll edit with some code.