I don't understand the difference between websocket (the websocket-client package) and websockets.
I would like to understand the difference so I know which one to use.
I have code with each that seems to do the same thing.
import websockets
import asyncio
import json

async def capture_data():
    uri = "wss://www.bitmex.com/realtime?subscribe=instrument:XBTUSD"
    #uri = "wss://www.bitmex.com/realtime"
    async with websockets.connect(uri) as websocket:
        while True:
            data = await websocket.recv()
            data = json.loads(data)
            try:
                #print('\n', data['data'][0]["lastPrice"])
                print('\n', data)
            except KeyError:
                continue

asyncio.get_event_loop().run_until_complete(capture_data())
import websocket
import json

def on_open(ws):
    print("opened")
    auth_data = {
        'op': 'subscribe',
        'args': ['instrument:XBTUSD']
    }
    ws.send(json.dumps(auth_data))

def on_message(ws, message):
    message = json.loads(message)
    print(message)

def on_close(ws):
    print("closed connection")

socket = "wss://www.bitmex.com/realtime"
ws = websocket.WebSocketApp(socket,
                            on_open=on_open,
                            on_message=on_message,
                            on_close=on_close)
ws.run_forever()
EDIT:
Is Python powerful enough to receive a lot of data via WebSocket? In times of high traffic I could have thousands of messages per second, and even tens of thousands for short periods of time.
I've seen many claims that Node.js might handle this better than Python, but I don't know JS, and even if I just copied code from the internet, I'm not sure I wouldn't lose whatever I gained by having to send the data to my Python script and then process it there anyway.
websocket-client implements only the client side. If you want to create a WebSocket server, you have to use websockets.
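For completeness, here is a minimal echo-server sketch with websockets, adapted from the library's documentation example (the two-argument handler signature matches the older releases of the library that the client code above targets; newer versions pass only the connection):

import asyncio
import websockets

async def echo(websocket, path):
    # Echo every incoming message back to the client.
    async for message in websocket:
        await websocket.send(message)

asyncio.get_event_loop().run_until_complete(
    websockets.serve(echo, "localhost", 8765))
asyncio.get_event_loop().run_forever()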
websocket-client provides WebSocketApp, which is a more JavaScript-like way of using a WebSocket.
I prefer to run WebSocketApp in a thread:
import threading
import websocket

socket = "wss://www.bitmex.com/realtime"
ws = websocket.WebSocketApp(socket,
                            on_open=on_open,
                            on_message=on_message,
                            on_close=on_close)
# run_forever() blocks, so run it in a daemon thread
wst = threading.Thread(target=ws.run_forever)
wst.daemon = True
wst.start()
# your code continues here ...
Now you have an asynchronous WebSocket: your main program keeps running, and every time a message arrives, on_message is called. on_message processes the message in the background thread, without interrupting the main program.
websockets is a third-party Python library built on asyncio (it is not part of the standard library, despite what the name might suggest), and it is the commonly suggested choice today.
As far as I understand, the main difference, from the point of view of client API programming, is that websocket-client supports the callback programming paradigm, in a way similar to JavaScript web-client APIs.
Read the websockets documentation FAQs:
Are there onopen, onmessage, onerror, and onclose callbacks?
No, there aren’t.
websockets provides high-level, coroutine-based APIs. Compared to callbacks, coroutines make it easier to manage control flow in concurrent code.
If you prefer callback-based APIs, you should use another library.
With current Python 3 releases, the common / suggested way to manage asynchronous coroutines is asyncio (which is what the websockets library builds on).
So, answering your last question (Python vs Node.js): asyncio lets you manage asynchronous I/O events (networking-based, in this case) much as Node.js does natively. Broadly speaking, with asyncio you can achieve WebSocket I/O performance similar to what you would get with Node.js.
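As an illustration of that claim, here is a sketch reusing the BitMEX URI from the question (the async-for iteration over the connection is part of the websockets client API): a single asyncio task can drain a high-rate stream without any threads.

import asyncio
import json
import websockets

async def capture_data():
    uri = "wss://www.bitmex.com/realtime?subscribe=instrument:XBTUSD"
    async with websockets.connect(uri) as websocket:
        # The connection is an async iterator: the coroutine is suspended
        # while waiting for the next frame, so one thread can serve the
        # whole stream.
        async for raw in websocket:
            data = json.loads(raw)
            print(data)  # keep this fast, or hand off to an asyncio.Queue

asyncio.get_event_loop().run_until_complete(capture_data())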
Related
Let's say I have a Python gRPC server and a corresponding client.
According to this question, the same gRPC channel can be passed to client stubs, each running in a different thread.
Let's say RPC function foo() is called from thread T1 and the response takes about one second. Can I call foo() from thread T2 in the meantime, while T1 is still waiting for the response, or is the channel somehow locked until the first call returns? In other words: is performance gained by using more threads, because the corresponding server is based on a thread pool and able to handle more requests "in parallel", and if yes, should I use the same channel or a separate channel per thread?
EDIT: According to a quick test, parallel requests using the same channel from different threads are possible, and it makes sense to do it this way. However, before closing the question, I would like to see confirmation from an expert that this code is correct:
import time
import threading
import grpc
import helloworld_pb2
import helloworld_pb2_grpc

def run_in_thread1(channel):
    n = 10000
    for i in range(n):
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        response = stub.SayHello(helloworld_pb2.HelloRequest(name='1'))
        print("client 1: " + response.message)

def run_in_thread2(channel):
    n = 10000
    for i in range(n):
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        response = stub.SayHello2(helloworld_pb2.HelloRequest(name='2'))
        print("client 2: " + response.message)

if __name__ == '__main__':
    print("I'm client")
    channel = grpc.insecure_channel('localhost:50051')
    x1 = threading.Thread(target=run_in_thread1, args=(channel,))
    x1.start()
    x2 = threading.Thread(target=run_in_thread2, args=(channel,))
    x2.start()
    x1.join()
    x2.join()
gRPC uses HTTP/2, which can multiplex many requests over one connection, and gRPC client connections should be re-used for the lifetime of the client app.
If you take your inspiration from what is done when working with databases, I would say you don't need to worry about it, since the connection-opening overhead you would normally pool connections to avoid doesn't exist when working with gRPC.
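As a sketch (a variant of the test in the question, not an official recommendation): since the channel is thread-safe, each thread can create its stub once, outside the loop, while still sharing the single multiplexed channel.

import threading
import grpc
import helloworld_pb2
import helloworld_pb2_grpc

def worker(channel, name, n=10000):
    # One stub per thread, created once; the channel itself is shared.
    stub = helloworld_pb2_grpc.GreeterStub(channel)
    for _ in range(n):
        response = stub.SayHello(helloworld_pb2.HelloRequest(name=name))
        print("client %s: %s" % (name, response.message))

if __name__ == '__main__':
    channel = grpc.insecure_channel('localhost:50051')
    threads = [threading.Thread(target=worker, args=(channel, str(i)))
               for i in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()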
I'd like to be able to trigger a long-running python script via a web request, in bare-bones fashion. Also, I'd like to be able to trigger other copies of the script with different parameters while initial copies are still running.
I've looked at flask, aiohttp, and queueing possibilities. Flask and aiohttp seem to have the least overhead to set up. I plan on executing the existing python script via subprocess.run (however, I did consider refactoring the script into libraries that could be used in the web response function).
With aiohttp, I'm trying something like:
ingestion_service.py:
import time
from aiohttp import web
from pprint import pprint

routes = web.RouteTableDef()

@routes.get("/ingest_pipeline")
async def test_ingest_pipeline(request):
    '''
    Get the job_conf specified from the request and activate the script
    '''
    # subprocess.run the command with lookup of job conf file
    response = web.Response(text="Received data ingestion request")
    await response.prepare(request)
    await response.write_eof()
    # eventually this would be a subprocess.run call
    time.sleep(80)
    return response

def init_func(argv):
    app = web.Application()
    app.add_routes(routes)
    return app
But though the initial request returns immediately, subsequent requests block until the initial request is complete. I'm running a server via:
python -m aiohttp.web -H localhost -P 8080 ingestion_service:init_func
I know that multithreading and concurrency may provide better solutions than asyncio. In this case, I'm not looking for a robust solution, just something that will allow me to run multiple scripts at once via http request, ideally with minimal memory costs.
OK, there were a couple of issues with what I was doing. Namely, time.sleep() is blocking, so asyncio.sleep() should be used. However, since I'm interested in spawning a subprocess, I can use asyncio.subprocess to do that in a non-blocking fashion.
nb:
asyncio: run one function threaded with multiple requests from websocket clients
https://docs.python.org/3/library/asyncio-subprocess.html.
Using these help, but there's still an issue with the webhandler terminating the subprocess. Luckily, there's a solution here:
https://docs.aiohttp.org/en/stable/web_advanced.html
aiojobs has an "atomic" decorator that protects the handler from being cancelled until it completes. So, code along these lines will function:
from aiojobs.aiohttp import setup, atomic
import asyncio
import os
from aiohttp import web

@atomic
async def ingest_pipeline(request):
    # be careful what you pass through to the shell, lest you
    # give away the keys to the kingdom
    shell_command = "[your command here]"
    response_text = f"running {shell_command}"
    response_code = 200
    response = web.Response(text=response_text, status=response_code)
    await response.prepare(request)
    await response.write_eof()
    ingestion_process = await asyncio.create_subprocess_shell(
        shell_command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    stdout, stderr = await ingestion_process.communicate()
    return response

def init_func(argv):
    app = web.Application()
    setup(app)
    app.router.add_get('/ingest_pipeline', ingest_pipeline)
    return app
This is very bare bones, but might help others looking for a quick skeleton for a temporary internal solution.
I have been using stomp.py and stompest for years now to communicate with activemq to great effect, but this has mostly been with standalone python Daemons.
I would like to use these two libraries from the webserver to communicate with the backend, but I am having trouble finding out how to do this without creating a new connection every request.
Is there a standard approach to safely handling TCP connections in the webserver? In other languages, some sort of global object at that level would be used for connection pooling.
HTTP is a synchronous protocol: each waiting client consumes server resources (CPU, memory, file descriptors) while waiting for a response, so the web server has to respond quickly. An HTTP server should not block on external long-running processes when responding to a request.
The solution is to process requests asynchronously. There are two major options:
Use polling.
POST pushes a new task to a message queue:
POST /api/generate_report
{
"report_id": 1337
}
GET checks the MQ (or a database) for a result:
GET /api/report?id=1337
{
"ready": false
}
GET /api/report?id=1337
{
"ready": true,
"report": "Lorem ipsum..."
}
Asynchronous tasks in the Django ecosystem are usually implemented using Celery (a sketch follows the links below), but you can use any MQ directly.
Use WebSockets.
Helpful links:
What are Long-Polling, Websockets, Server-Sent Events (SSE) and Comet?
https://en.wikipedia.org/wiki/Push_technology
https://www.reddit.com/r/django/comments/4kcitl/help_in_design_for_long_running_requests/
https://realpython.com/asynchronous-tasks-with-django-and-celery/
https://blog.heroku.com/in_deep_with_django_channels_the_future_of_real_time_apps_in_django
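To make the Celery option above concrete, here is a hedged sketch for the report example (the task name, module layout and broker URL are illustrative, not from the question):

# tasks.py
from celery import Celery

app = Celery('tasks', broker='amqp://localhost')

@app.task
def generate_report(report_id):
    # The long-running work happens in a Celery worker process,
    # not in the web request.
    return "Lorem ipsum..."

# In the Django view handling POST /api/generate_report:
# generate_report.delay(1337)  # enqueue and return immediately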
Edit:
Here is a pseudocode example of how you can reuse a connection to a MQ:
projectName/appName/services.py:
import stomp

def create_connection():
    conn = stomp.Connection([('localhost', 9998)])
    conn.start()
    conn.connect(wait=True)
    return conn

print('This code will be executed only once per thread')

activemq = create_connection()
projectName/appName/views.py:
from django.http import HttpResponse
from .services import activemq

def index(request):
    activemq.send(message='foo', destination='bar')
    return HttpResponse('Success!')
I'm coming from Node, where handling asynchronous design is as simple as adding a callback and getting on with your life. I'm trying to write some apps in Python where I'm not having the same success, and I'm struggling to find what to search for, since there doesn't seem to be a direct equivalent.
Here's an example where I'm running an MQTT messaging client and waiting for a state change signal from a sensor.
import paho.mqtt.client as mqtt
from ouimeaux.environment import Environment
from ouimeaux.signals import receiver, statechange

def on_connect(client, userdata, rc):
    print('Connected with result code ' + str(rc))
    client.subscribe('lights/#')

def turn_lights_on(client, userdata, rc):
    for (x, value) in enumerate(devices['switches']):
        devices['switches'][x].on()

def turn_lights_off(client, userdata, rc):
    for (x, value) in enumerate(devices['switches']):
        devices['switches'][x].off()

def reply_with_devices(client, userdata, rc):
    # publish each discovered device
    for (x, value) in enumerate(devices['switches']):
        client.publish('devices/new', devices['switches'][x])
    for (x, value) in enumerate(devices['motions']):
        client.publish('devices/new', devices['motions'][x])

def on_switch(switch):
    print "Switch found: ", switch.name
    devices['switches'].append(switch)

def on_motion(motion):
    print "Motion found: ", motion.name
    devices['motions'].append(motion)

client = mqtt.Client("wemo_controller")
client.on_connect = on_connect
client.message_callback_add('lights/on', turn_lights_on)
client.message_callback_add('lights/off', turn_lights_off)
client.message_callback_add('devices/discover', reply_with_devices)
client.connect('localhost', 1883, 60)
print 'Running WEMO controller - listening for messages on localhost:1883'

devices = {'switches': [], 'motions': []}

env = Environment(on_switch, on_motion)
env.start()
env.discover(seconds=3)
switch = env.get_switch('Desk lights')

@receiver(statechange)
def motion(sender, **kwargs):
    print 'A THING HAPPENED'
    print "{} state is {state}".format(sender.name, state="on" if kwargs.get('state') else "off")

env.wait()
client.loop_forever()
Both libraries seem to have their own way of holding up the thread but it seems like I can only have one blocking and listening at a time. I have a feeling that threading might be the answer, but I'm struggling to work out how to implement this and not sure if it's right. I'm also confused as to what wait() and loop_forever() actually do.
The answer I'm looking for is the 'python' way to solve this problem.
You may want to look at the Twisted framework.
"Twisted is an event-driven networking engine written in Python."
It is specifically designed for building asynchronous network applications.
In particular, read up on the reactor, and on using Deferred() to register callbacks.
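Here is a minimal Deferred sketch, just to show the callback-registration style (Deferred and addCallback are core Twisted APIs; the payload is illustrative):

from twisted.internet import defer

def got_result(result):
    print('callback fired with:', result)
    return result

d = defer.Deferred()
d.addCallback(got_result)   # register the callback now...
d.callback('some data')     # ...fire it later, when the result arrives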
Asynchronous programming has been integrated into Python relatively recently. So, if you are using Python 3.3 or later, Python provides the library asyncio especially for this purpose (it was previously developed under the name 'Tulip', and is part of the standard library from 3.4).
If you are using Python 2.7, you can use Trollius, a backport of asyncio. If nothing suits you, you can obviously use the full-fledged network-programming framework Twisted, as suggested in other answers.
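Here is a minimal asyncio sketch in the Python 3.4-era coroutine style (the same style the HBMQTT answer below uses; async/await replaced it from 3.5). Two "listeners" share one thread, which is the shape of the problem in the question:

import asyncio

@asyncio.coroutine
def listen(name, delay):
    while True:
        yield from asyncio.sleep(delay)  # non-blocking wait
        print(name, 'got an event')

loop = asyncio.get_event_loop()
# Both coroutines run concurrently on the single event-loop thread.
loop.create_task(listen('sensor', 1))
loop.create_task(listen('mqtt', 2))
loop.run_forever()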
I'm the author of HBMQTT, an MQTT broker/client library which uses the Python asyncio API.
The client API doesn't need any callbacks. You can use it to subscribe to some topics and then run a loop that reads and processes incoming messages. Something like:
import asyncio
from hbmqtt.client import MQTTClient

C = MQTTClient()

@asyncio.coroutine
def test_coro():
    yield from C.connect(uri='mqtt://localhost/', username=None, password=None)
    # Adapt QOS as needed
    yield from C.subscribe([
        {'filter': 'lights/on', 'qos': 0x01},
        {'filter': 'lights/off', 'qos': 0x01},
        {'filter': 'devices/discover', 'qos': 0x01},
    ])
    while some_condition:  # your own loop condition
        # Wait until the next PUBLISH message arrives
        message = yield from C.deliver_message()
        if message.variable_header.topic_name == 'lights/on':
            pass  # lights on
        elif message.variable_header.topic_name == 'lights/off':
            pass  # lights off
        yield from C.acknowledge_delivery(message.variable_header.packet_id)
    yield from C.disconnect()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(test_coro())
HBMQTT is still under development. It requires Python 3.4.
For a Django application I'm working on, I need to implement two-way RPC so that:
the clients can call RPC methods on the platform, and
the platform can call RPC methods on each client.
As the clients will mostly be behind NATs (which means no public IPs and unpredictable, weird firewalling policies), the platform-to-client direction has to be initiated by the client.
I have a pretty good idea of how I could write this from scratch, and I also think I could work something out of Twisted's publisher/subscriber model, but I've learned that there is always a best way to do it in Python.
So I'm wondering what the best way to do it would be, one that also integrates well with Django. The code will have to cope with hundreds of clients in the short term, and (we hope) with thousands of clients in the medium/long term.
So what library/implementation would you advise me to use?
I'm mostly looking for starting points to RTFM!
WebSocket is a moving target, with new specifications from time to time. Brave developers implement server-side libraries, but few implement the client side; the client for WebSocket is a web browser.
WebSocket is not the only way for a server to talk to a client: EventSource is a simple and pragmatic way to push information to a client. It's just a never-ending page. The Twitter firehose used this trick before the specification existed. The client opens an HTTP connection and waits for events. The connection is kept open, and reopened if there is trouble (the connection gets cut, things like that).
No timeout; you can send many events over one connection.
The difference between WebSocket and EventSource is simple: WebSocket is bidirectional and hard to implement; EventSource is unidirectional and simple to implement, both client and server side.
You can use EventSource as a zombie controller. Each client connects (and reconnects) to the master and waits for instructions. When an instruction is received, the zombie acts, and if needed it can talk back to its master with a classical HTTP request targeting the Django app.
EventSource keeps the connection open, so you need an async server, like Tornado. Django needs a sync server, so you need both, with a dispatcher like nginx in front. Django, or a cron-like action, talks to the async server, which talks to the right zombie. Zombies talk to Django, so the async server doesn't need any persistence; it's just a hub with plugged-in zombies.
Gevent is able to handle such an HTTP server, but there is no decent doc or example for this. It's a shame. I want a car; you give me a screw.
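For what it's worth, here is a hedged sketch of an EventSource stream served with gevent's WSGI server (gevent.pywsgi.WSGIServer is the real class; the "data: ...\n\n" framing comes from the SSE specification; the port and payloads are illustrative):

import gevent
from gevent.pywsgi import WSGIServer

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/event-stream'),
                              ('Cache-Control', 'no-cache')])
    def stream():
        for i in range(5):
            # Each event is a "data: <payload>" line followed by a blank line.
            yield ('data: event %d\n\n' % i).encode()
            gevent.sleep(1)  # cooperative sleep: other connections keep running
    return stream()

WSGIServer(('localhost', 8000), app).serve_forever()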
You can also use Tornado + Tornadio + Socket.io. That's what we are using right now for notifications, and the amount of code you have to write is not that much.
# Excerpted from a larger app: IndexHandler, SocketIOHandler,
# NotificationConnection and ROOT are defined elsewhere.
import os
from os import path as op
from datetime import datetime as dt

from tornado import web
from tornadio2 import SocketConnection, TornadioRouter, SocketServer

class PingConnection(SocketConnection):
    def on_open(self, info):
        print 'Ping', repr(info)

    def on_message(self, message):
        now = dt.utcnow()
        message['server'] = [now.hour, now.minute, now.second,
                             now.microsecond / 1000]
        self.send(message)

class ChatConnection(SocketConnection):
    participants = set()
    unique_id = 0

    @classmethod
    def get_username(cls):
        cls.unique_id += 1
        return 'User%d' % cls.unique_id

    def on_open(self, info):
        print 'Chat', repr(info)
        # Give user unique ID
        self.user_name = self.get_username()
        self.participants.add(self)

    def on_message(self, message):
        pass

    def on_close(self):
        self.participants.remove(self)

    def broadcast(self, msg):
        for p in self.participants:
            p.send(msg)

# The endpoint classes must be defined before the router references them.
class RouterConnection(SocketConnection):
    __endpoints__ = {'/chat': ChatConnection,
                     '/ping': PingConnection,
                     '/notification': NotificationConnection}

    def on_open(self, info):
        print 'Router', repr(info)

MyRouter = TornadioRouter(RouterConnection)

# Create socket application
application = web.Application(
    MyRouter.apply_routes([(r"/", IndexHandler),
                           (r"/socket.io.js", SocketIOHandler)]),
    flash_policy_port=843,
    flash_policy_file=op.join(ROOT, 'flashpolicy.xml'),
    socket_io_port=3001,
    template_path=os.path.join(os.path.dirname(__file__),
                               "templates/notification")
)
Here is a really simple solution I came up with:
import tornado.ioloop
import tornado.web
import time

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.set_header("Content-Type", "text/event-stream")
        self.set_header("Cache-Control", "no-cache")
        self.write("Hello, world")
        self.flush()
        for i in range(0, 5):
            msg = "%d<br>" % i
            self.write("%s\r\n" % msg)  # content
            self.flush()
            time.sleep(5)  # note: time.sleep() blocks the whole IOLoop; demo only

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
and
curl http://localhost:8888
gives output as it comes!
Now, I'll just have to implement the full event-source spec and some kind of data serialization between the server and the clients, but that's trivial. I'll post an URL to the lib I'll write here when it'll be done.
I've recently played with Django, Server-Sent Events and WebSocket, and I've written an article about it at http://curella.org/blog/2012/jul/17/django-push-using-server-sent-events-and-websocket/
Of course, this comes with the usual caveats: Django probably isn't the best fit for evented stuff, and both protocols are still drafts.