Kafka consumer to Ruby on Rails - Python

I am currently working with a Rails server that is supposed to run Python scripts, which are Kafka consumers/producers.
The server must run the script, then receive the processed data from the consumer and render it on the site.
I am able to run a script, but I cannot find a solution for connecting the consumer: the consumer either runs non-stop accepting messages, or runs in a while loop. I tried to run the consumer first from Ruby, which starts the consumer, but I never get anything back from it, because it just keeps listening and the other script can never be run.
So the flow of a message should ideally be something like this: email from logged-in user to Kafka producer -> MQ -> Kafka consumer generates data and writes it to the DB -> producer queries data from the database -> MQ -> consumer accepts the data and renders it on the site.
The ideal scenario would be a single script, let's call it a manager, that does all the work and only accepts data and returns it. I was not able to do that either, because that one script also runs the consumer and listens for the producer, but it never gets run.
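Side note on the blocking issue: kafka-python's KafkaConsumer.poll() accepts a timeout and returns whatever records are currently available, so a script can fetch once and return instead of blocking forever in a for msg in consumer: loop. A minimal sketch of that approach (not my actual code, just reusing the topic and broker from the scripts below):

from kafka import KafkaConsumer
import json


def fetch_available(timeout_ms=5000):
    # Connect, poll once with a timeout, and return whatever has arrived.
    consumer = KafkaConsumer('NewUserTopic',
                             bootstrap_servers='localhost:9092',
                             auto_offset_reset='earliest',
                             enable_auto_commit=False,
                             value_deserializer=lambda v: json.loads(v))
    # poll() returns {TopicPartition: [ConsumerRecord, ...]} and gives back
    # control after timeout_ms even if nothing arrived.
    batches = consumer.poll(timeout_ms=timeout_ms)
    consumer.close()
    return [record.value for records in batches.values() for record in records]

The first poll can come back empty while the partition assignment completes, so in practice it may need to be called in a short loop with an overall deadline.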
So here is my code:
from kafka import KafkaProducer
from faker import Faker
import json
import time


class producer1():
    '''
    fr_instance= Faker()
    def get_new_user():
        return {"email_address":fr_instance.email(),"first_name": fr_instance.first_name(),
                "lastname": fr_instance.last_name(), "occupation":fr_instance.job()}
    '''
    def __init__(self):
        self

    def json_serializer(self, data):
        return json.dumps(data).encode("utf-8")

    def send(self, email):
        print(email)
        producer = KafkaProducer(bootstrap_servers='localhost:9092',
                                 value_serializer=self.json_serializer)
        registred_user = {"email": email}
        future = producer.send("NewUserTopic", registred_user)
        print(registred_user)
        result = future.get(timeout=10)


p = producer1()

if __name__ == '__main__':
    email = "testmail#aaaaaaaa.com"
    p.send(email)
Then the first consumer:
from kafka import KafkaConsumer
import json
import random
from sqlalchemy.orm import sessionmaker
import dbservice
import time


class consumer1():
    def __init__(self) -> None:
        self

    def email(self):
        consumer = KafkaConsumer('NewUserTopic',
                                 bootstrap_servers='localhost:9092',
                                 auto_offset_reset='latest', enable_auto_commit=False)
        for msg in consumer:
            msg_out = json.loads(msg.value)
            for value in msg_out.values():
                #return print(msg_out)
                return (value)

    #generate dummy address , eth
    def gen_ETHw(self):
        numbers = str(random.randint(11111, 99999))
        wallet_num = str("Ox" + numbers)
        return (wallet_num)

    #generate dummy address , btc
    def gen_BTCw(self):
        numbers = str(random.randint(11111, 99999))
        wallet_num = str("Ox" + numbers)
        return (wallet_num)

    def commit_db(self, email, ETHw, BTCw):
        Session = sessionmaker(bind=dbservice.engine)
        s = Session()
        input = dbservice.walletdb(email, ETHw, BTCw)
        time.sleep(2)
        s.add(input)
        s.commit()


if __name__ == '__main__':
    while True:
        c = consumer1()
        c.commit_db(c.email(), c.gen_ETHw(), c.gen_BTCw())
The query producer:
import dbservice
from sqlalchemy.orm import sessionmaker
from kafka import KafkaProducer
import json


class query_prod():
    def __init__(self, email) -> None:
        self = self
        self.email = email

    def json_serializer(data):
        return json.dumps(data).encode("utf-8")

    producer = KafkaProducer(bootstrap_servers='localhost:9092',
                             value_serializer=json_serializer)
    Session = sessionmaker(bind=dbservice.engine)
    s = Session()

    def query_address(self, email):
        Session = sessionmaker(bind=dbservice.engine)
        s = Session()
        for s in s.query(dbservice.walletdb).filter_by(email=email):
            return {"email": s.email, "ETH_w": s.ETH_w, "BTC_w": s.BTC_w}

    def send(self, email):
        data_to_send = self.query_address(email)
        future = self.producer.send("QueryAdressToServer", data_to_send)
        print(data_to_send)
        result = future.get(timeout=10)


if __name__ == '__main__':
    email = "testmail#aaaaaaaa.com"
    query_prod = query_prod(email)
    query_prod.send(email)
And the consumer for the data that should be returned to the site:
from kafka import KafkaConsumer
import json
import time


class consume_for_web():
    string = ""

    def __init__(self) -> None:
        self = self
        string = self.string

    def consumer(self):
        consumer = KafkaConsumer('QueryAdressToServer',
                                 bootstrap_servers='localhost:9092',
                                 auto_offset_reset='latest', enable_auto_commit=False)
        print('starting consumer')
        for msg in consumer:
            data = (('{}'.format(json.loads(msg.value))))
            self.string = self.string + data
            return print(data)

    def read_str(self):
        return print(self.string)


if __name__ == '__main__':
    while True:
        c = consume_for_web()
        c.consumer()
        ##print("reading")
        #c.read_str()
And finally my Rails pages controller:
class PagesController < ApplicationController
  def home
  end

  def about
  end

  before_action :require_login

  def genw
    our_input = current_user.email
    puts our_input
    #consumer_result = `python3 /Users/samuelrybar/python_projects/Kafka_demo1/kafka-prod-coms/consumer2.py`
  end

  def mywa
  end

  def save
  end
end
Thanks for your time and help, I really appreciate it. :))

Not sure why you are trying to run Python scripts from a running Rails server. It sounds like a very bad idea to me. You can run Kafka consumers/producers from Ruby and Ruby on Rails directly. I suggest you investigate Karafka. We've been using it successfully at work.

Related

Export data with OPCUa subscription in IoT Edge

I am trying to receive data as an OPC UA client, so I created a subscription for some data nodes. Receiving data is no problem, but exporting it as an IoT Edge client is the problem. I think I have to run 2 threads in parallel. The following code is used:
class Publisher():
    def __init__(self):
        self.module_client = IoTHubModuleClient.create_from_edge_environment()

    async def export_data(self, message, output):
        """Serializes databuffer to json string"""
        print('Export')
        self.module_client.send_message_to_output(message, output)
        print('Exported completed')


class Handler(object):
    """
    Client to subscription. It will receive events from server
    """
    def __init__(self):
        """"""
        self.publisher = Publisher()

    def datachange_notification(self, node, val, data):
        self.publisher.export_data('test', 'output1')


url = "opc.tcp://192.168.3.5:4840"
opcua_client = Client(url)
opcua_client.connect()

opcua_broker = Handler()
sub = opcua_client.create_subscription(20, opcua_broker)
node_list = [opcua_client.get_node('ns=3;s="EM126_Log"."DataLogL"[1]')]
handler = sub.subscribe_data_change(node_list)
time.sleep(100)
The error is as follows:
"RuntimeWarning: coroutine 'Publisher.export_data' was never awaited"
Does anyone know how to export the incoming data in the Handler class (datachange_notification)?
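The warning appears because datachange_notification is a synchronous callback, so calling the async export_data only creates a coroutine object that is never awaited. One way around this, assuming an asyncio event loop is already running in the main thread and the callback fires on the subscription's worker thread, is to hand the coroutine back to that loop. A minimal sketch, reusing the names from the code above:

import asyncio

class Handler(object):
    def __init__(self, loop):
        # The running event loop that owns the IoT Hub client (assumed to exist).
        self.loop = loop
        self.publisher = Publisher()

    def datachange_notification(self, node, val, data):
        # Schedule the coroutine on the main loop from this worker thread;
        # run_coroutine_threadsafe returns a concurrent.futures.Future.
        asyncio.run_coroutine_threadsafe(
            self.publisher.export_data('test', 'output1'), self.loop)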

Sharing an aiohttp.ClientSession between multiple asynchronous callers

The goal
I want to make regular HTTP requests to a REST API. The requests happen at an interval ranging from a few seconds up to multiple hours, depending on the user input. I want to keep a single connection alive and close it in a smart way.
The questions
Is there an existing library that provides features similar to what I wrote with the class SessionOneToN?
Would it be possible to use a single session of type aiohttp.ClientSession that runs forever and is never closed?
What I have tried
What I have tried does work, but I wonder if there is an established library that can achieve the goals.
My code
My application uses the event loop module asyncio, and the HTTP module aiohttp.
My application opens a single session of type aiohttp.ClientSession and shares it between multiple asynchronous callers.
The callers can make their requests concurrently and asynchronously.
Whenever all callers have received their responses at the same time, the aiohttp.ClientSession is closed. A new aiohttp.ClientSession is opened as necessary when a caller makes a new request.
import aiohttp
import asyncio
from contextlib import asynccontextmanager
import sys

url = 'http://stackoverflow.com/questions/64305548/sharing-an-aiohttp-clientsession-between-multiple-asynchronous-callers'


# The callers look like so
async def request():
    async with session() as s:
        async with s.get(url) as response:
            return await response.text()


# The session manager looks like so:
class SessionOneToN:
    """ Manage one session that is reused by multiple clients, instantiate a new
    session object from the given session_class as required. Provide a context
    manager for accessing the session. The session_class returns a context
    manager when instantiated, this context manager is referred to as the
    internal context manager, and is entered when the first
    client enters the context manager returned by context_manager, and is exited
    when the last client exits context_manager."""

    def __init__(self, session_class):
        self.n = 0
        self.session_class = session_class
        self.context = None
        self.aenter_result = None

    async def plus(self):
        self.n += 1
        if self.n == 1:
            # self.context: The internal context manager
            self.context = context = self.session_class()
            # self.aenter_result: The result from entering the internal context
            # manager
            self.aenter_result = await context.__aenter__()
        assert self.aenter_result is not None
        return self.aenter_result

    def minus(self):
        self.n -= 1
        if self.n == 0:
            return True

    def cleanup(self):
        self.context = None
        self.aenter_result = None

    @asynccontextmanager
    async def get_context(self):
        try:
            aenter_result = await self.plus()
            yield aenter_result
        except:
            if self.minus():
                if not await self.context.__aexit__(*sys.exc_info()):
                    self.cleanup()
                    raise
                self.cleanup()
        else:
            if self.minus():
                await self.context.__aexit__(None, None, None)
                self.cleanup()


class SessionOneToNaiohttp:
    def __init__(self):
        self.sotn = SessionOneToN(aiohttp.ClientSession)

    def get_context(self):
        return self.sotn.get_context()


sotnaiohttp = SessionOneToNaiohttp()


def session():
    return sotnaiohttp.get_context()


response_text = asyncio.run(request())
print(response_text[0:10])
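For illustration, several callers can share the session concurrently with the classes above; only one aiohttp.ClientSession is opened, and it is closed when the last caller leaves the context manager. A hypothetical usage sketch, with three identical requests standing in for independent callers:

async def many_callers():
    # All three callers enter session() together, so a single
    # aiohttp.ClientSession serves every request.
    return await asyncio.gather(request(), request(), request())

texts = asyncio.run(many_callers())
print([t[0:10] for t in texts])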

Threads can only be started once in Django Channels

I created a simple Django Channels consumer that should connect to an external source, retrieve data and send it to the client. So, the user opens the page > the consumer connects to the external service and gets the data > the data is sent to the websocket.
Here is my code:
import json
from channels.generic.websocket import WebsocketConsumer, AsyncConsumer, AsyncJsonWebsocketConsumer
from binance.client import Client
from binance.websockets import BinanceSocketManager
import time
import asyncio

client = Client('', '')
trades = client.get_recent_trades(symbol='BNBBTC')
bm = BinanceSocketManager(client)


class EchoConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()
        await self.send_json('test')
        bm.start_trade_socket('BNBBTC', self.process_message)
        bm.start()

    def process_message(self, message):
        JSON1 = json.dumps(message)
        JSON2 = json.loads(JSON1)
        #define variables
        Rate = JSON2['p']
        Quantity = JSON2['q']
        Symbol = JSON2['s']
        Order = JSON2['m']
        asyncio.create_task(self.send_json(Rate))
        print(Rate)
This code works when I open one page; if I try to open a new window with a new account, though, it will throw the following error:
File "C:\Users\User\Desktop\Heroku\github\master\myapp\consumers.py", line 54, in connect
bm.start()
File "C:\Users\User\lib\threading.py", line 843, in start
raise RuntimeError("threads can only be started once")
threads can only be started once
I'm new to Channels, so this is a noob question, but how can I fix this problem? What I wanted to do was: user opens the page and gets the data, another user opens the page and gets the data; is there no way to do that? Or am I simply misunderstanding how Django Channels and websockets work?
Do you really need a secondary thread?
class EchoConsumer(AsyncJsonWebsocketConsumer):

    symbol = ''

    async def connect(self):
        self.symbol = 'BNBBTC'
        # or, more probably, retrieve the value for "symbol" from query_string
        # so the client can specify which symbol he's interested into:
        # socket = new WebSocket("ws://.../?symbol=BNBBTC");
        await self.accept()

    def process_message(self, message):
        # PSEUDO-CODE BELOW !
        if self.symbol == message['symbol']:
            await self.send({
                'type': 'websocket.send',
                'text': json.dumps(message),
            })
For extra flexibility, you might also accept a list of symbols from the client instead:
//HTML
socket = new WebSocket("ws://.../?symbols=XXX,YYY,ZZZ");
then (in the consumer):
class EchoConsumer(AsyncJsonWebsocketConsumer):

    symbols = []

    async def connect(self):
        # here we need to parse "?symbols=XXX,YYY,ZZZ" ...
        # the code below has been stolen from another project of mine and should be suitably adapted
        params = urllib.parse.parse_qs(self.scope.get('query_string', b'').decode('utf-8'))
        try:
            self.symbols = json.loads(params.get('symbols', ['[]'])[0])
        except:
            self.symbols = []

    def process_message(self, message):
        if message['symbol'] in self.symbols:
            ...
I'm no Django developer, but if I understand correctly, the function connect is being called more than once, and bm.start references the same thread, most likely the one created in bm.start_trade_socket (or somewhere else in connect). In conclusion, when bm.start is called, a thread is started, and when it is called again, you get that error.
Here is what the documentation says about start(): it starts the thread's activity.
It should be called at most once per thread object. You have made a global BinanceSocketManager object, "bm".
It will always raise a RuntimeError if called more than once on the same thread object.
Please refer to the code below; it may help you:
from channels.generic.websocket import WebsocketConsumer, AsyncConsumer, AsyncJsonWebsocketConsumer
from binance.client import Client
import json
from binance.websockets import BinanceSocketManager
import time
import asyncio


class EchoConsumer(AsyncJsonWebsocketConsumer):
    client = Client('', '')
    trades = client.get_recent_trades(symbol='BNBBTC')
    bm = BinanceSocketManager(client)

    async def connect(self):
        await self.accept()
        await self.send_json('test')
        self.bm.start_trade_socket('BNBBTC', self.process_message)
        self.bm.start()

    def process_message(self, message):
        JSON1 = json.dumps(message)
        JSON2 = json.loads(JSON1)
        #define variables
        Rate = JSON2['p']
        Quantity = JSON2['q']
        Symbol = JSON2['s']
        Order = JSON2['m']
        asyncio.create_task(self.send_json(Rate))
        print(Rate)
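If the socket manager does need to stay shared, one way to avoid the double start is to guard it so the background thread is started at most once, no matter how many consumers connect. A minimal sketch with a module-level flag and lock (names are illustrative, not from the original answer):

import threading

_bm_lock = threading.Lock()
_bm_started = False

def start_socket_manager_once(bm, callback):
    # Start the shared BinanceSocketManager thread only for the first caller.
    global _bm_started
    with _bm_lock:
        if not _bm_started:
            bm.start_trade_socket('BNBBTC', callback)
            bm.start()
            _bm_started = True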

Tornado: How to get and return large data with less memory usage?

I have a web crawler and an HTTP interface for it.
The crawler gets grouped URLs as a dictionary. I need to return the result in the same format in JSON. But I am faced with large memory usage, which is not returned to the operating system. How can I implement this without the large memory usage?
Code:
#!/usr/bin/env python
# coding=utf-8

import collections

import tornado.web
import tornado.ioloop
import tornado.queues
import tornado.httpclient


class ResponseError(Exception):
    pass


class Crawler(object):
    client = tornado.httpclient.AsyncHTTPClient()

    def __init__(self, groups, concurrency=10, retries=3, validators=None):
        self.groups = groups
        self.concurrency = concurrency
        self.retries = retries
        self.validators = validators or []
        self.requests = tornado.queues.Queue()
        self.responses = collections.defaultdict(list)

    async def worker(self):
        while True:
            await self.consume()

    async def validate(self, response):
        for validator in self.validators:
            validator(response)

    async def save(self, response):
        self.responses[response.request.group].append(response.body.decode('utf-8'))

    async def consume(self):
        async for request in self.requests:
            try:
                response = await self.client.fetch(request, raise_error=False)
                await self.validate(response)
                await self.save(response)
            except ResponseError:
                if request.retries < self.retries:
                    request.retries += 1
                    await self.requests.put(request)
            finally:
                self.requests.task_done()

    async def produce(self):
        for group, urls in self.groups.items():
            for url in urls:
                request = tornado.httpclient.HTTPRequest(url)
                request.group = group
                request.retries = 0
                await self.requests.put(request)

    async def fetch(self):
        await self.produce()
        for __ in range(self.concurrency):
            tornado.ioloop.IOLoop.current().spawn_callback(self.worker)
        await self.requests.join()


class MainHandler(tornado.web.RequestHandler):
    async def get(self):
        urls = []
        with open('urls') as f:  # mock
            for line in f:
                urls.append(line.strip())
        crawler = Crawler({'default': urls})
        await crawler.fetch()
        self.write(crawler.responses)


if __name__ == '__main__':
    app = tornado.web.Application(
        (tornado.web.url(r'/', MainHandler),), debug=True
    )
    app.listen(8000)
    tornado.ioloop.IOLoop.current().start()
It looks to me like most of the memory usage is devoted to self.responses. Since you seem to be ordering responses by "group" before writing them to a file, I can understand why you do it this way. One idea is to store them in a database (MySQL or MongoDB or whatever) with "group" as a column or field value in the database record.
The database might be the final destination of your data, or it might be a temporary place to store the data until crawler.fetch completes. Then query all the data from the database, ordered by "group", and write it to the file.
This doesn't solve the problem; it just means that the database process is responsible for most of your memory usage instead of the Python process. This may be preferable for you, however.
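As a rough sketch of that idea, using SQLite instead of the MySQL/MongoDB mentioned above (table and file names are illustrative), save() could append each body to a table keyed by group, and the handler could later stream the rows back ordered by group instead of keeping everything in self.responses:

import sqlite3

db = sqlite3.connect('responses.db')
db.execute('CREATE TABLE IF NOT EXISTS responses ("group" TEXT, body TEXT)')

def save_to_db(group, body):
    # Called where self.responses[...].append(...) is called now.
    db.execute('INSERT INTO responses VALUES (?, ?)', (group, body))
    db.commit()

def iter_grouped():
    # Stream the rows back ordered by group, instead of building one big dict.
    for group, body in db.execute(
            'SELECT "group", body FROM responses ORDER BY "group"'):
        yield group, body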

How to set IMAP flags using Twisted

How do you delete messages using imap4.IMAP4Client? I cannot get the "deleted" flag correctly applied in order to use the "expunge" method.
I keep getting the following error:
Failure: twisted.mail.imap4.IMAP4Exception: Invalid system flag \
Sample code would be appreciated. This is what I have so far:
from twisted.internet import protocol, reactor
from twisted.mail import imap4

#Variables for connection
username = 'user#host.com'
password = 'mypassword'
host = 'imap.host.com'
port = 143


class IMAP4LocalClient(imap4.IMAP4Client):
    def connectionMade(self):
        self.login(username, password).addCallbacks(self._getMessages, self._ebLogin)

    #reports any connection errors
    def connectionLost(self, reason):
        reactor.stop()

    #drops the connection
    def _ebLogin(self, result):
        print result
        self.transport.loseConnection()

    def _programUtility(self, result):
        print result
        return self.logout()

    def _cbExpungeMessage(self, result):
        return self.expunge().addCallback(self._programUtility)

    def _cbDeleteMessage(self, result):
        return self.setFlags("1:5", flags=r"\\Deleted", uid=False).addCallback(self._cbExpungeMessage)

    #gets the mailbox list
    def _getMessages(self, result):
        return self.list("", "*").addCallback(self._cbPickMailbox)

    #selects the inbox desired
    def _cbPickMailbox(self, result):
        mbox = 'INBOX.Trash'
        return self.select(mbox).addCallback(self._cbExamineMbox)

    def _cbExamineMbox(self, result):
        return self.fetchMessage("1:*", uid=False).addCallback(self._cbDeleteMessage)


class IMAP4ClientFactory(protocol.ClientFactory):
    def buildProtocol(self, addr):
        return IMAP4LocalClient()

    def clientConnectionFailed(self, connector, reason):
        print reason
        reactor.stop()


reactor.connectTCP(host, port, IMAP4ClientFactory())
reactor.run()
Changed to:
    def _cbDeleteMessage(self, result):
        return self.setFlags("1:5", flags=['\\Deleted'], uid=False).addCallback(self._cbExpungeMessage)
thanks to Jean-Paul Calderone, and it worked: setFlags requires a list, not just a string.
I think there are two problems here.
First, you're passing a string as the flags parameter to setFlags. Notice the documentation for that parameter: The flags to set (type: Any iterable of str). Try a list containing one string, instead.
Second, \\Deleted is probably not a flag the server you're interacting with supports. The standard deleted flag in IMAP4 is \Deleted.
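To make the second point concrete (a small illustrative snippet, not from the thread): the raw string r"\\Deleted" used in the question contains two backslashes, which is why the server rejects it, while either of the following spellings produces the single-backslash flag \Deleted that IMAP4 expects:

# Both literals are the same string: one backslash followed by "Deleted".
flags = ['\\Deleted']   # normal string, the backslash is escaped
flags = [r'\Deleted']   # raw string, the backslash is taken literally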
