I load messages from a Kafka topic into a database. Loading into the database can fail, and I also do not want to lose unsent messages.
App code:
import faust
app = faust.App('App', broker='kafka://localhost:9092')
source_topic = app.topic('source_topic')
failed_channel = app.channel() # channel for unsent messages
@app.agent(source_topic)
async def process(stream):
    async for batch in stream.take(100_000, within=60):
        # here we have no info about partitions and keys
        # to reuse them when resending if sending failed
        try:
            pass  # send to database; can fail
        except ConnectionError:
            for record in batch:
                # sending to a channel is faster than sending to a topic
                await failed_channel.send(value=record)

@app.agent(failed_channel)
async def resend_failed(stream):
    async for unsent_msg in stream:
        await source_topic.send(value=unsent_msg)
Maybe there is a more standard way to handle such situations? Adding app.topic('source_topic', acks=False) works only after restarting the app.
I load messages from a Kafka topic into a database
Maybe there is a more standard way to handle such situations
Yes - it's called Kafka Connect :-)
The standard pattern is to do any processing on your data and write it back to Kafka topics. Then you use the Kafka topic as a source for a Kafka Connect sink connector, in this case the Kafka Connect JDBC Sink connector.
Kafka Connect is part of Apache Kafka, and handles restarts, scaleout, failures, etc etc.
See also Kafka Connect in Action: JDBC Sink
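For illustration, registering such a JDBC sink through the Kafka Connect REST API could look roughly like this (a sketch assuming Connect listens on localhost:8083 and the Confluent JDBC sink plugin is installed; the topic, JDBC URL and credentials are placeholders):
import requests

# Hypothetical JDBC Sink connector config; adjust names and credentials to your setup.
connector_config = {
    "name": "sink-source-topic-to-db",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "source_topic",
        "connection.url": "jdbc:postgresql://db-host:5432/mydb",
        "connection.user": "db_user",
        "connection.password": "db_password",
        "auto.create": "true",   # create the target table if it does not exist
        "insert.mode": "insert",
    },
}

# Register the connector with the Kafka Connect worker.
resp = requests.post("http://localhost:8083/connectors", json=connector_config)
resp.raise_for_status()
print(resp.json())
Kafka Connect then handles retries and offset management for you, instead of your own dead-letter channel.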
I would like to set up a server that can subscribe to an external stream over a websocket (ws_ext) and then republish that data (after curating it) to internal clients connecting to this server over websockets (ws_int).
My approach so far is to set up a FastAPI server that can open websockets (ws_int) with internal clients.
However, I don't understand how to embed a listener in this server that listens to the external stream and then publishes to these internal clients in a non-blocking way.
Can someone point me to a working example that can help?
Here is what I would like to achieve:
P.S.: I have been able to make this work by decoupling the broadcaster from the subscriber using Redis pub/sub. What I have now is a client that listens to the external stream, curates the data and pushes it to Redis pub/sub; a separate broadcaster then listens to Redis pub/sub and, after curating, pushes it out to clients on its websockets. I would still love to combine these two without using Redis or some such backend.
If all clients are connected to an async websocket endpoint in the broadcaster, then pushing whatever arrives asynchronously from the external stream to the broadcaster at the same time should be non-blocking.
The update process can have an async stream pipeline that filters the results coming from the external stream for each client in the broadcaster.
As an example, an async WebSocket client can be written with async with:
import asyncio
import websockets

async def hello():
    # for a wss:// endpoint you would also pass an SSLContext via ssl=
    async with websockets.connect('ws://localhost:8765') as websocket:
        name = input("What's your name? ")
        await websocket.send(name)
        print(f"> {name}")

        greeting = await websocket.recv()
        print(f"< {greeting}")

asyncio.get_event_loop().run_until_complete(hello())
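Building on that, here is a rough sketch of combining the external listener and the broadcaster in a single FastAPI process without Redis (the external URL, the /ws_int route and the curate() function are placeholder assumptions, not working values):
import asyncio
import websockets
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients = set()  # currently connected internal clients
EXTERNAL_WS_URL = "wss://example.com/stream"  # placeholder external stream

def curate(raw):
    return raw  # placeholder for your curation logic

async def external_listener():
    # Listen to the external stream and fan out to all internal clients.
    async with websockets.connect(EXTERNAL_WS_URL) as ws_ext:
        async for raw in ws_ext:
            msg = curate(raw)
            for client in list(clients):
                try:
                    await client.send_text(msg)
                except Exception:
                    clients.discard(client)

@app.on_event("startup")
async def start_listener():
    # Run the listener as a background task so it does not block the event loop.
    asyncio.create_task(external_listener())

@app.websocket("/ws_int")
async def ws_int(websocket: WebSocket):
    await websocket.accept()
    clients.add(websocket)
    try:
        while True:
            await websocket.receive_text()  # keep the connection open
    except WebSocketDisconnect:
        clients.discard(websocket)
The broadcasting and the external subscription share one event loop here, which is what removes the need for an external broker between them.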
from json import dumps
from kafka import KafkaProducer, KafkaConsumer

# param is a dict of configuration values defined elsewhere
producer = KafkaProducer(bootstrap_servers='kf-p1l-node3:9092,xxxxx,xxxxx',
                         value_serializer=lambda x: dumps(x).encode('utf-8'))  # utf-8

consumer = KafkaConsumer(bootstrap_servers='rdwh-node1:49092,xxxxx,xxxxx',
                         # bootstrap_servers='kf-p1l-node3:9092,xxxxx,xxxxx',
                         auto_offset_reset=param["AUTO_OFFSET_RESET"],
                         consumer_timeout_ms=param["CONSUMER_TIMEOUT_MS"],
                         enable_auto_commit=False,
                         auto_commit_interval_ms=60000,
                         group_id=param["GROUP_ID"],
                         client_id=param["CLIENT_ID"]
                         )
consumer.subscribe([param["TOPIC_IN"]])
This code works if the KafkaProducer's and the KafkaConsumer's bootstrap_servers are the same. But if I change the KafkaConsumer to another server, it doesn't work.
bootstrap_servers is only used for establishing the initial connection to the Kafka cluster; the client then discovers and uses all brokers of that cluster, irrespective of which servers were listed for bootstrapping. So your producer and consumer must bootstrap against the same cluster if the consumer is to read what the producer wrote. You can check the documentation here: http://kafka.apache.org/090/documentation.html
consumer = KafkaConsumer('my-topic',
                         group_id='my-group',
                         bootstrap_servers=['node1:port1', 'node1:port2', 'node2:port3'])
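For example, a minimal sketch (broker address and topic are placeholders) where producer and consumer bootstrap against the same cluster, so the consumer can actually see what the producer wrote:
from json import dumps, loads
from kafka import KafkaProducer, KafkaConsumer

brokers = ['broker-host:9092']  # placeholder: the same list for both clients

producer = KafkaProducer(bootstrap_servers=brokers,
                         value_serializer=lambda x: dumps(x).encode('utf-8'))
producer.send('my-topic', value={'hello': 'world'})
producer.flush()

consumer = KafkaConsumer('my-topic',
                         bootstrap_servers=brokers,
                         auto_offset_reset='earliest',
                         consumer_timeout_ms=10000,
                         value_deserializer=lambda m: loads(m.decode('utf-8')))
for message in consumer:
    print(message.value)
    break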
I am trying to write producer and consumer code in Python using pika for RabbitMQ. However, for my specific case, I need to run the producer on one host and the consumer on another.
I have already written the producer code as:
import pika
credentials = pika.PlainCredentials('username', 'password')
parameters = pika.ConnectionParameters('ip add of another host', 5672, '/', credentials)
connection = pika.BlockingConnection()
channel = connection.channel()
channel.queue_declare(queue='test')
channel.basic_publish(exchange='', routing_key='test', body='hello all!')
print (" [x] sent 'Hello all!")
connection.close()
The above producer code runs without any error. I also created a new user and gave it administrator privileges on the rabbitmq-server. However, when I run the consumer code on the other host running rabbitmq-server, I do not see any output:
import pika

credentials = pika.PlainCredentials('username', 'password')
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection()
channel = connection.channel()
channel.queue_declare(queue='test')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(
    queue='test', on_message_callback=callback, auto_ack=True)

print(' [x] waiting for messages. To exit press CTRL+C')
channel.start_consuming()
So here I have two hosts on the same network, both with RabbitMQ installed; one has version 3.7.10 and the other 3.7.16.
The producer is able to send the text without error, but the consumer on the other host is not receiving anything.
I do not get any problem when both run on the same machine, as I just replace the connection settings with localhost. Since the guest user is only allowed to connect on localhost by default, I created a new user on the consumer host running rabbitmq-server.
Can anyone help me out here?
I have a couple of questions when I see your problem:
Are you 100% sure that in your RabbitMQ management monitoring you see 2 connections, one from your local host and another from the other host? This will help to debug.
Second, did you check that port 5672 is open on the server that hosts RabbitMQ? Maybe your producer does not manage to connect. What is your cloud provider?
If you don't want to manage those kinds of issues, you should use a service like https://zenaton.com. They host everything for you, and you have integrated monitoring, error handling etc.
Your consumer and producer applications must connect to the same RabbitMQ server. If you have two instances of RabbitMQ running they are independent. Messages do not move from one instance of RabbitMQ to another unless you configure Shovel or Federation.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
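If you really do need to keep two separate brokers, the Shovel mentioned above can move messages between them. A rough sketch of declaring a dynamic shovel through the management HTTP API (this is not from the original answers; it assumes the rabbitmq_shovel and rabbitmq_management plugins are enabled, and the host names, credentials and shovel name are placeholders):
import requests

# Move messages from the 'test' queue on the producer's broker
# to the 'test' queue on the consumer's broker.
shovel = {
    "value": {
        "src-uri": "amqp://username:password@producer-host",
        "src-queue": "test",
        "dest-uri": "amqp://username:password@consumer-host",
        "dest-queue": "test",
    }
}

# %2F is the URL-encoded default vhost "/"
resp = requests.put(
    "http://producer-host:15672/api/parameters/shovel/%2F/move-test-queue",
    json=shovel,
    auth=("username", "password"),
)
resp.raise_for_status()
In most cases, though, simply pointing both applications at the same broker (as below) is the right fix.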
You don't seem to be passing the parameters to the BlockingConnection instance.
import pika
rmq_server = "ip_address_of_rmq_server"
credentials = pika.PlainCredentials('username', 'password')
parameters = pika.ConnectionParameters(rmq_server, 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
Also, your consumer is attaching to the localhost hostname. Make sure this actually resolves and that your RabbitMQ service is listening on the localhost address (127.0.0.1); it may not be bound to that address. I believe RabbitMQ binds to all interfaces (and thus all addresses) by default, but I'm not sure.
I have an end-to-end pipeline of a web application like the one below, in Python 3.6:
Socket (connection from client to server) -> Flask Server -> Kafka Producer -> Kafka Consumer -> NLPService
Now when I get some result back from the NLPService, I need to send it back to the client. I am thinking of the steps below:
1. The NLP service writes the result to a different topic via the Kafka producer (done)
2. The Kafka consumer retrieves the result from the Kafka broker (done)
3. The Kafka consumer needs to write the result to the Flask server
4. Then the Flask server will send the result back to the socket
5. The socket writes to the client
I have already done steps 1-2. But I am stuck at steps 3 and 4. How do I write from Kafka to the Flask server? If I just call a function in my server.py, then logically it seems like I have to create a socket within a function in server.py, which will do the job of sending to the client through the socket. But syntax-wise it looks weird. What am I missing?
In consumer.py:
import json
from kafka import KafkaConsumer

# receiving reply
topicReply = 'Reply'
consumerReply = KafkaConsumer(topicReply, value_deserializer=lambda m: json.loads(m.decode('ascii')))

for message in consumerReply:
    # send reply back to the server
    fromConsumer(message.value)
In server.py:
socketio = SocketIO(app)

def fromConsumer(msg):
    @socketio.on('reply')
    def replyMessage(msg):
        send(msg)
The above construct in server.py doesn't make sense to me. Please suggest.
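One way to handle steps 3 and 4 (a minimal sketch, assuming Flask-SocketIO and kafka-python; the broker address, topic and event name are placeholders, and error handling is omitted) is to run the Kafka consumer loop as a background task inside server.py and emit each result straight to the connected clients, instead of calling back into the server from consumer.py:
import json
from flask import Flask
from flask_socketio import SocketIO
from kafka import KafkaConsumer

app = Flask(__name__)
socketio = SocketIO(app)

def consume_replies():
    # Read NLP results from the reply topic and push them to socket clients.
    consumer = KafkaConsumer(
        'Reply',
        bootstrap_servers='localhost:9092',
        value_deserializer=lambda m: json.loads(m.decode('ascii')))
    for message in consumer:
        socketio.emit('reply', message.value)

if __name__ == '__main__':
    socketio.start_background_task(consume_replies)
    socketio.run(app)
With this arrangement the consumer lives inside the Flask process, so there is no need to open a second socket from consumer.py back to the server.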
I have an existing Python system that receives messages using Rabbit MQ. What is the absolute easiest way to get these events pushed to a browser using WebSockets using Python? Bonus if the solution works in all major browsers too.
Thanks,
Virgil
Here https://github.com/Gsantomaggio/rabbitmqexample I wrote a complete example that uses Tornado and RabbitMQ.
You can find all the instructions on that site; anyway, you need:
pip install pika
pip install tornado
First you register your RabbitMQ consumer:
def threaded_rmq():
    channel.queue_declare(queue="my_queue")
    logging.info('consumer ready, on my_queue')
    channel.basic_consume(consumer_callback, queue="my_queue", no_ack=True)
    channel.start_consuming()
Then you register your websocket clients:
class SocketHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        logging.info('WebSocket opened')
        clients.append(self)

    def on_close(self):
        logging.info('WebSocket closed')
        clients.remove(self)
When you get a message, you can redirect it to the websocket clients:
def consumer_callback(ch, method, properties, body):
    logging.info("[x] Received %r" % (body,))
    # The message is broadcast to the connected clients
    for itm in clients:
        itm.write_message(body)
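To tie the snippets above together, the glue they assume looks roughly like this (a sketch adapted from the idea above, not copied verbatim from the linked repository; the port and the /ws route are placeholders):
import logging
import threading

import pika
import tornado.ioloop
import tornado.web
import tornado.websocket

clients = []  # shared with SocketHandler and consumer_callback above

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# SocketHandler, threaded_rmq and consumer_callback are the functions defined above.
application = tornado.web.Application([
    (r'/ws', SocketHandler),
])

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    application.listen(8888)
    # run the RabbitMQ consumer in its own thread so it does not block Tornado's IOLoop
    threading.Thread(target=threaded_rmq, daemon=True).start()
    tornado.ioloop.IOLoop.instance().start()
For strict thread safety, messages arriving on the consumer thread should be handed to the IOLoop (for example with IOLoop.instance().add_callback) before write_message is called.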
You could use Twisted, txAMQP and Autobahn|Python on the server to write a bridge in probably 50 lines of code, and Autobahn|JS on the browser side. Autobahn implements WebSocket, and WAMP on top, which provides you with Publish & Subscribe (as well as Remote Procedure Calls) over WebSocket.
When using raw WebSocket you would have to invent your own Publish & Subscribe over WebSocket - since I guess that is what you are after: extending the AMQP PubSub to the Web. Or you could check out STOMP.
Disclaimer: I am the original author of WAMP and Autobahn.