Dask: can't submit tasks to global client from separate multiprocessing.Process - python

I have two processes. The first creates the global distributed client. The second is a web scraper that should get the global client and submit tasks to it; when everything is done, it sends a message to another process to tell it that it can proceed.
from dask.distributed import Client, as_completed, get_client
from multiprocessing import Process
from time import sleep
import zmq
def get(url) -> dict:
    # downloads data from url
    sleep(3)
    return data
def save(data) -> None:
    # saves data locally
    sleep(3)
    return None
def scraper(urls):
    # global client
    client = get_client()
    # zeromq socket
    context = zmq.Context()
    socket = context.socket(zmq.PUB)
    socket.bind('tcp://*:port')
    while True:
        for future, result in as_completed([client.submit(get, url=url) for url in urls], with_results=True):
            save(data=result)
        socket.send_string('All job is done for this minute, proceed.')
        sleep(60)
if __name__ == '__main__':
    client = Client()
    s = Process(target=scraper, *args, **kwargs)
    s.start()
The problem is that inside the scraper function I can get the global client (it prints correctly), but I can't submit any task to it. The console doesn't print any error; it just hangs without doing anything. I think the cause is that the scraper function runs in a separate multiprocessing.Process.
Any solution or workaround? Thank you.

The Dask client holds open connections to the scheduler. Depending on how your system creates new processes, the child may get copies of those connections that point to nothing useful in the new process, or the client may fail to transfer at all (it is not picklable).
Instead, send the connection information to the child process:
addr = client.scheduler_info()['address']
and create a new client from it in the target function:
client = Client(addr)
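A minimal sketch of that approach, assuming a local cluster started with Client() in the parent. The helper names run_batch and scraper, the stand-in get function, and the example.com URLs are made up for illustration; only the scheduler address string crosses the process boundary.

```python
from multiprocessing import Process

from dask.distributed import Client, as_completed


def get(url):
    # Stand-in for the real download (hypothetical): one small dict per URL.
    return {"url": url, "length": len(url)}


def run_batch(scheduler_address, urls):
    # Build a fresh Client in the calling process from the address string.
    with Client(scheduler_address) as client:
        futures = [client.submit(get, url) for url in urls]
        results = [res for _, res in as_completed(futures, with_results=True)]
    # as_completed yields in completion order, so sort for a stable result.
    return sorted(results, key=lambda item: item["url"])


def scraper(scheduler_address, urls):
    # Child-process target: it receives only picklable arguments.
    for result in run_batch(scheduler_address, urls):
        print(result)


if __name__ == "__main__":
    client = Client()  # local cluster; the scheduler listens on a TCP address
    address = client.scheduler_info()["address"]
    urls = ["https://example.com/a", "https://example.com/b"]
    child = Process(target=scraper, args=(address, urls))
    child.start()
    child.join()
    client.close()
```

The parent keeps its own client, and the child never touches it. Closing the child's client only disconnects that client; it does not shut down the cluster the parent started.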

Related

Multiprocessing IPC with MQTT in python

I am trying to send and receive data between processes using MQTT. So far, a child process can call, through a BaseManager proxy, a method on the main MQTT class, which then publishes the data sent from that process.
The code looks like this:
import paho.mqtt.client as Mqtt  # assumed: the Mqtt alias refers to paho-mqtt
from multiprocessing import Process
from multiprocessing.managers import BaseManager
class MqttClient:
    def __init__(self):
        self.pub = Mqtt.Client()
        self.pub.connect("localhost", 1883, 60)
        self.pub.loop_start()
    def publishM(self, data):  # method to be called from processes to publish data
        self.pub.publish("Test", data)
class subProcess:
    def __init__(self, mqttObj):
        self.MqttObject = mqttObj
        self.data = "Hello World"
        self.MqttObject.publishM(self.data)  # calling method in MqttClient using instance of it
if __name__ == "__main__":
    BaseManager.register('MqttClient', MqttClient)  # registering MqttClient class to BaseManager
    manager = BaseManager()
    manager.start()
    mqttInstance = manager.MqttClient()  # instance of MqttClient
    p = Process(target=subProcess, args=(mqttInstance,))  # pass the MqttClient instance to the process
    p.start()
    p.join()
The above code works well when a child process needs to send data over MQTT. But I also need to send data back to the process over MQTT, meaning the process should be subscribed to a topic and have an on-message callback to receive data. I know I could create an MQTT client inside each process, but then the number of MQTT clients grows with the number of processes. I would like to do this with a single MQTT client.
How can I achieve this? Thanks.

python zerorpc and multiprocessing issue

I'm implementing a bi-directional ping-pong demo app between an electron app and a python backend.
This is the code for the python part which causes the problems:
import sys
import zerorpc
import time
from multiprocessing import Process
def ping_response():
    print("Sleeping")
    time.sleep(5)
    c = zerorpc.Client()
    c.connect("tcp://127.0.0.1:4243")
    print("sending pong")
    c.pong()
class Api(object):
    def echo(self, text):
        """echo any text"""
        return text
    def ping(self):
        p = Process(target=ping_response, args=())
        p.start()
        print("got ping")
        return
def parse_port():
    port = 4242
    try:
        port = int(sys.argv[1])
    except Exception as e:
        pass
    return '{}'.format(port)
def main():
    addr = 'tcp://127.0.0.1:' + parse_port()
    s = zerorpc.Server(Api())
    s.bind(addr)
    print('start running on {}'.format(addr))
    s.run()
if __name__ == '__main__':
    main()
Each time ping() is called from javascript side it will start a new process that simulates some work (sleeping for 5 seconds) and replies by calling pong on nodejs server to indicate work is done.
The issue is that the pong() request never gets to javascript side. If instead of spawning a new process I create a new thread using _thread and execute the same code in ping_response(), the pong request arrives in the javascript side. Also if I manually run the bash command zerorpc tcp://localhost:4243 pong I can see that the pong request is received by the nodejs script so the server on the javascript side works ok.
What happens to the zerorpc client when it runs in a new process, and why doesn't it manage to send the request?
Thank you.
EDIT
It seems it gets stuck in c.pong()
Try gipc.start_process() from the gipc module (installable with pip) instead of multiprocessing.Process(). gipc sets up a fresh gevent context in the child, which a process started with multiprocessing would otherwise inherit from the parent by accident.

Grpc python client server streaming not working as expected

A simple gRPC server and client: the client sends an int and the server streams ints back.
The client reads the messages one by one, but the server runs the generator function to completion immediately, for all responses.
server code:
import test_pb2_grpc as pb_grpc
import test_pb2 as pb2
import time
import grpc
from concurrent import futures
class test_servcie(pb_grpc.TestServicer):
    def Produce(self, request, context):
        for i in range(request.val):
            print("request came")
            rs = pb2.Rs()
            rs.st = i + 1
            yield rs
def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb_grpc.add_TestServicer_to_server(test_servcie(), server)
    server.add_insecure_port('[::]:50051')
    print("service started")
    server.start()
    try:
        while True:
            time.sleep(3600)
    except KeyboardInterrupt:
        server.stop(0)
if __name__ == '__main__':
    serve()
client code:
import time
import grpc
import test_pb2_grpc as pb_grpc
import test_pb2 as pb
def test():
    channel = grpc.insecure_channel(
        '{host}:{port}'.format(host="localhost", port=50051))
    stub = pb_grpc.TestStub(channel=channel)
    req = pb.Rq()
    req.val = 20
    for s in stub.Produce(req):
        print(s.st)
        time.sleep(10)
test()
proto file:
syntax = "proto3";
service Test {
    rpc Produce (Rq) returns (stream Rs);
}
message Rq {
    int32 val = 1;
}
message Rs {
    int32 st = 1;
}
After starting the server, when I run the client, the server-side generator starts running and completes immediately, looping over the whole range.
What I expected is that it would yield values one by one as the client asks for them, but that is not the case.
Is this expected behaviour? My client is still printing values, but the server has already finished the function.
Yes, this behavior is expected. gRPC features flow control between the two sides of an RPC (so that generating messages too fast on one side won't exhaust memory on the other side) but there's also an allowance for a small amount of buffering (so that a reasonably small amount of data may be sent by one side before the other side explicitly asks for it). In your case the twenty messages sent from server to client all fit within this small allowance. The service-side gRPC Python runtime is calling your service-side Produce method, consuming its entire output of twenty messages, and sending all those messages across the network to your client, where they are locally held by the invocation-side gRPC Python runtime until your invocation-side test function asks for them.
If you want to see the effects of flow control in action, try using huge messages (one megabyte in size or so) or altering the size of the allowance (I think this is done with a channel argument but those are an advanced and relatively-unsupported feature so this is left as an exercise).
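The interplay between the buffering allowance and flow control can be pictured with a bounded queue between a fast producer and a slow consumer. This is only an analogy built on Python's standard library, not how gRPC is implemented; the names produce and run and the buffer sizes 32 and 4 are made up for illustration.

```python
import queue
import threading


def produce(buffer, count, finished):
    # Plays the role of the server-side generator: put() blocks once the
    # buffer is full, which is the "flow control" part of the analogy.
    for i in range(count):
        buffer.put(i + 1)
    finished.set()


def run(buffer_size, count=20):
    buffer = queue.Queue(maxsize=buffer_size)
    finished = threading.Event()
    producer = threading.Thread(target=produce, args=(buffer, count, finished))
    producer.start()
    # Let the producer run for a while before the consumer reads anything.
    done_before_reading = finished.wait(timeout=0.5)
    received = [buffer.get() for _ in range(count)]
    producer.join()
    return done_before_reading, received


if __name__ == "__main__":
    print(run(buffer_size=32)[0])  # allowance larger than the 20 messages
    print(run(buffer_size=4)[0])   # allowance smaller than the 20 messages
```

With room for all twenty messages, the producer finishes before the consumer has read anything, which matches what the question observed. With room for only four, the producer stalls until the consumer starts reading.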

How to stop a websocket client without stopping reactor

I have an app similar to a chat room, written in Python, that is meant to do the following:
1. Prompt the user to input a websocket server address.
2. Create a websocket client that connects to the server and sends/receives messages. Disable the ability to create another websocket client.
3. After receiving "close" from the server (NOT a close frame), the client should drop the connection and re-enable the app to create a client. Go back to 1.
4. If the user exits the app, it stops the websocket client if one is running.
My approach is to use a main thread to deal with user input. When the user hits enter, a thread is created for the WebSocketClient using Autobahn's Twisted module, and a Queue is passed to it. The thread checks whether the reactor is running and starts it if it is not.
I override the onMessage method to put a closing flag into the Queue when "close" is received. The main thread busy-checks the Queue until it receives the flag, then goes back to the start. The code looks like the following.
Main thread.
def main_thread():
    while True:
        text = raw_input("Input server url or exit")
        if text == "exit":
            if myreactor:
                myreactor.stop()
            break
        msgq = Queue.Queue()
        threading.Thread(target=wsthread, args=(text, msgq)).start()
        is_close = False
        while True:
            if msgq.empty() is False:
                msg = msgq.get()
                if msg == "close":
                    is_close = True
                else:
                    print msg
            if is_close:
                break
        print 'Websocket client closed!'
Factory and Protocol.
class MyProtocol(WebSocketClientProtocol):
    def onMessage(self, payload, isBinary):
        msg = payload.decode('utf-8')
        self.factory.queue.put(msg)  # the factory stores the Queue as self.queue
        if msg == 'close':
            self.dropConnection(abort=True)
class WebSocketClientFactoryWithQ(WebSocketClientFactory):
    def __init__(self, *args, **kwargs):
        self.queue = kwargs.pop('queue', None)
        WebSocketClientFactory.__init__(self, *args, **kwargs)
Client thread.
def wsthread(url, q):
    global myreactor  # module-level flag shared with main_thread
    factory = WebSocketClientFactoryWithQ(url=url, queue=q)
    factory.protocol = MyProtocol
    connectWS(factory)
    if myreactor is None:
        myreactor = reactor
        reactor.run()
    print 'Done'
Now I have a problem. It seems that my client thread never stops. Even after I receive "close", it still seems to be running, and every time I try to create a new client, it creates a new thread. I understand the first thread won't stop since reactor.run() runs forever, but from the second thread on it should be non-blocking since I'm not starting the reactor anymore. How can I change that?
EDIT:
I ended up solving it by:
Adding stopFactory() after the disconnect.
Calling the protocol functions through reactor.callFromThread().
Starting the reactor in the first thread, putting clients in other threads, and using reactor.callInThread() to create them.
Your main_thread creates new threads running wsthread. wsthread uses Twisted APIs. The first wsthread becomes the reactor thread. All subsequent threads are different and it is undefined what happens if you use a Twisted API from them.
You should almost certainly remove the use of threads from your application. For dealing with console input in a Twisted-based application, take a look at twisted.conch.stdio (not the best documented part of Twisted, alas, but just what you want).

Pass data asynchronously to a Python Server class

I have to pass data from my test cases to a mock server.
What is the best way to do that?
This is what I have so far:
mock_server.py
import SocketServer
import threading
class ThreadedUDPServer(SocketServer.ThreadingMixIn, SocketServer.UDPServer):
    pass
class ThreadedUDPRequestHandler(SocketServer.BaseRequestHandler):
    def __init__(self, request, client_address, server):
        SocketServer.BaseRequestHandler.__init__(self, request, client_address, server)
    def handle(self):
        print self.server.data  # this is where I need the data
class server_wrap:
    def __init__(self):
        self.server = ThreadedUDPServer(("127.0.0.1", 49555), ThreadedUDPRequestHandler)
    def set_data(self, data):
        self.server.data = data
    def start(self):
        server_thread = threading.Thread(target=self.server.serve_forever())
    def stop(self):
        self.server.shutdown()
test_mock.py
server_inst = server_wrap()
server_inst.start()
#code which sets the data and expects the handle method to print the data set
server_inst.stop()
The problem I have with this code is that execution stops at server_inst.start(), where the server goes into an infinite listening mode.
Other solutions that I have tried, but that failed:
Using global variables
Using queues
Starting mock_server.py with its own main
Let me know about any other possible solutions. Thanks in advance.
Update 1:
Using separate threads to send data to the socket:
Changes
test_mock.py
from threading import Thread
def test_set_data(data):
    server_inst = server_wrap()
    server_inst.set_data(data)
    server_inst.start()
if __name__ == "__main__":
    thread = Thread(target=test_set_data, args=("foo_data",))
    thread.setDaemon(True)
    thread.start()
    # test code which verifies that the data set is the same
    # works so far, able to pass data
    # problem starts now
    thread = Thread(target=test_set_data, args=("bar_data",))
    thread.setDaemon(True)
    thread.start()
    # fails with an "address already in use" error
    # Tried calling server.shutdown() in handle, but the error persists. Also, there is no thread.stop in the threading.Thread object
Thanks
The server is supposed to go into listening mode.
You don't need to call server_inst.stop until all the data has been sent and the test finishes, for example in your test teardown or when the test suite has completed.
To send data to the server and let the handler pick it up, open a socket on another thread, then send the data to the server through this socket.
The code should look something like this:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP, to match the UDPServer
sock.connect(("127.0.0.1",49555))
sock.send(... the data ...)
received = sock.recv(1024) # the handle can send a response
sock.close()
Add a function in your Django code that runs on another thread. This function opens the socket, connects, sends the data, and gets the response. You can call it from a view, a middleware, etc.
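Putting the pieces together, here is a rough Python 3 sketch of the mock-server setup (the question's code is Python 2). In the question's start(), serve_forever() is called while the Thread is being constructed, which is why execution blocks there; the sketch passes the bound method instead. The names DataHandler and ask_server, the "ping" payload, and the use of port 0 are assumptions made for illustration.

```python
import socket
import socketserver
import threading


class DataHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (datagram, socket) pair.
        _datagram, sock = self.request
        # Reply with whatever the test stored on the server object.
        sock.sendto(self.server.data.encode("utf-8"), self.client_address)


class ThreadedUDPServer(socketserver.ThreadingMixIn, socketserver.UDPServer):
    data = ""


def ask_server(address, payload=b"ping", timeout=2.0):
    # UDP client: SOCK_DGRAM with sendto/recvfrom, no connection handshake.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, address)
        reply, _ = sock.recvfrom(1024)
        return reply.decode("utf-8")


if __name__ == "__main__":
    server = ThreadedUDPServer(("127.0.0.1", 0), DataHandler)  # port 0: any free port
    # Pass the bound method itself; calling it here would block forever.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    for value in ("foo_data", "bar_data"):
        server.data = value  # one server instance is reused, so nothing is rebound
        print(ask_server(server.server_address))
    server.shutdown()
    server.server_close()
```

Reusing a single server instance and only changing its data attribute between checks also avoids the "address already in use" error from Update 1, which comes from creating a second server on the same port while the first is still bound.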
