Using websocket for python web crawler -- rsv is not implemented, yet - python

I use websocket to open a long-lived connection to the target wss URL, and the connection succeeds. But after receiving one message, the client reports an error, "rsv is not implemented, yet", and closes the connection.
Few people seem to have run into this "rsv is not implemented, yet" error, and the websocket API documentation never mentions it.
The main piece of my code:
def on_message(ws, message):
    print(message)
def on_error(ws, error):
    print("!!!find error!!!")
    print(error)
def on_close(ws):
    print("### why closed ???###")
websocket.enableTrace(True)
ws = websocket.WebSocketApp(url,
                            on_message = on_message,
                            on_error = on_error,
                            on_close = on_close,
                            header = header,
                            cookie = cookie,
                            )
ws.run_forever(origin = 'https://matters.news', skip_utf8_validation = True)
It will give me only one message, and then show that:
!!!find error!!!
rsv is not implemented, yet
send: b'\x88\x82\xd9\xe2\xcc\x8c\xda\n'
### why closed ???###

I received the same error and fixed it by removing:
'Sec-WebSocket-Extensions': 'permessage-deflate'
from my headers.
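A likely explanation: advertising permessage-deflate lets the server compress frames, and compressed frames set the RSV1 bit, which the Python websocket client does not handle. If the headers are copied wholesale from a browser session, the extension header can be filtered out before they are passed to WebSocketApp. A minimal sketch (drop_extension_headers and the sample headers are made up for illustration):

```python
def drop_extension_headers(headers):
    """Return a copy of `headers` without any Sec-WebSocket-Extensions entry.

    `headers` is a dict mapping header names to values. Names are compared
    case-insensitively because HTTP header names are case-insensitive.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "sec-websocket-extensions"
    }


# Hypothetical headers copied from a browser session (placeholder values).
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Sec-WebSocket-Extensions": "permessage-deflate",
}
header = drop_extension_headers(browser_headers)
print(header)  # {'User-Agent': 'Mozilla/5.0'}
```

The resulting dict can then be passed as the header argument in the question's code.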

Related

How to refuse a websocket handshake in a tornado websocket handler

So I'm doing a unittest of a tornado server I programmed which has a websocketHandler. This is my testing code:
def test_unauthorized_websocket(self):
    message = "Hey bro"
    ws = websocket.create_connection('ws://localhost:8888/ws', header=['Authorization: false'])
    send_dict = {'command': test_command, 'message': message}
    serialized_dict = json.dumps(send_dict)
    ws.send(serialized_dict)
    response = ws.recv()
    #test response with assert
My goal with this test is to prove that my tornado server correctly refuses and closes the websocket connection when the authentication header is wrong.
This is my tornado websocketHandler code:
class WebSocketHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        #some code
        headers = self.request.headers
        try:
            auth = headers['Authorization']
        except KeyError:
            self.close(code = 1002, reason = "Unauthorized websocket")
            print("IT'S CLOSED")
            return
        if auth == "true":
            print("Authorized Websocket!")
            #some code
        else:
            print("Unauthorized Websocket... :-(")
            self.close(code = 1002, reason = "Unauthorized websocket")
So when the authentication is wrong, self.close() is called (I'm not sure I need code and reason). This should close the websocket, but that doesn't actually happen on the "client" side. After I call create_connection() in the "client", ws.connected is still True, and when I call ws.send(), the on_message method of the websocketHandler is still called. When it tries to respond with self.write_message(), it raises a WebSocketClosedError. Only then does the "client" actually close its side of the websocket: ws.recv() returns nothing and ws.connected turns to False.
Is there a way I can communicate to the client side (through handshake headers or something) that the websocket is meant to be closed earlier on its side?
You can override prepare() and raise a tornado.web.HTTPError instead of overriding open() and calling self.close(). This rejects the connection at the first opportunity, and the client sees the rejection as part of create_connection() instead of on a later read.

Python websocket automatically closes with basic auth

I am attempting to set up a websocket using the websocket-client library with Python 3.7. The documentation from the API provider states that basic auth is required.
Below is the code I am using to try subscribing to their test channel. The test channel should send responses back nonstop until we close the socket.
import base64
import json
import time
import websocket
email = b'myemail#domain.com'
pw = b'mypassword'
_str = (base64.b64encode(email + b':' + pw)).decode('ascii')
headerValue = 'Authorization: Basic {0}'.format(_str)
msg_received = 0  # counter used by on_message; must exist before the first message
def on_message(ws, msg):
    global msg_received
    print("Message received: ", msg)
    msg_received += 1
    if msg_received > 10:
        ws.send(json.dumps({"unsubscribe": "/test"}))
        ws.close()
def on_error(ws, error):
    print("ERROR: ", error)
def on_close(ws):
    print("Closing websocket...")
def on_open(ws):
    print("Opening...")
    ws.send(json.dumps({'subscribe': '/test'}))
    time.sleep(1)
if __name__ == '__main__':
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("wss://api-ws.myurl.com/",
                                header=[headerValue],
                                on_message=on_message,
                                on_error=on_error,
                                on_close=on_close)
    ws.on_open = on_open
    ws.run_forever()
When I run this, I am able to see my request headers, and their response headers which show that the connection was upgraded to a websocket and I am assigned a Sec-Websocket-Accept. Then, the websocket immediately closes without any responses coming through.
I have tried first sending a post request to the login api and generating a sessionID and csrftoken, and then passing those as cookies in the websocketapp object. It didn't work. I have tried passing the headers as an actual dict but that doesn't work either. I've tried way too many variations of the b64 encoding and none of them work.
Any advice would be appreciated.
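One way to rule out the base64 encoding as the cause is to build the header in a small function and decode it back. A minimal sketch using only the standard library; basic_auth_header and the credentials are placeholders, and this only checks the header value, not why the server closes the socket:

```python
import base64


def basic_auth_header(username, password):
    """Build an 'Authorization: Basic ...' header line from text credentials."""
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")


header_value = basic_auth_header("user@example.com", "mypassword")

# Decode the token again to confirm it round-trips to "username:password".
encoded = header_value.split(" ", 2)[2]
print(base64.b64decode(encoded).decode("utf-8"))  # user@example.com:mypassword
```

If the decoded value matches the expected credentials, the header itself is well-formed and the cause is more likely elsewhere in the exchange.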

How to create a long lived websocket connection in python?

I'm trying to use websockets in Python, and I previously asked a question about this. I quickly realized, though, that the way I was connecting to the server was meant for "one-off" messages, while what I want to do requires listening for notifications constantly.
In the documentation for the Python websocket client I can see the following code:
import websocket
import thread
import time
def on_message(ws, message):
    print message
def on_error(ws, error):
    print error
def on_close(ws):
    print "### closed ###"
def on_open(ws):
    def run(*args):
        for i in range(3):
            time.sleep(1)
            ws.send("Hello %d" % i)
        time.sleep(1)
        ws.close()
        print "thread terminating..."
    thread.start_new_thread(run, ())
if __name__ == "__main__":
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("ws://echo.websocket.org/",
                                on_message = on_message,
                                on_error = on_error,
                                on_close = on_close)
    ws.on_open = on_open
    ws.run_forever()
The thing is, I'm still quite new to Python, and I don't completely understand this code. Since it's example code and does things that I don't need, I would like to understand how it works. For example, I don't understand what the for loop is needed for, or what the __name__ and the __main__ at the bottom are.
Is there a better way?
Thanks,
Sas :)
The for loop is probably just there as an example: it sends Hello 0, Hello 1 and Hello 2 to the server, one second apart.
__name__ == "__main__" is true when the Python interpreter is running the module as the main program rather than importing it. When that is the case, the code creates the WebSocketApp, assigns the functions to call on a message, on an error, and when the socket opens or closes, and then runs the WebSocket forever.
So, to create your own long-lived WebSocket, you can copy this example code and change the on_message, on_error, on_close and on_open functions to do what you want whenever these events occur. on_message is called whenever a message is received, on_error when an error occurs, on_close when the WebSocket closes and on_open when the WebSocket opens.
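The __name__ check can be tried on its own, without any websocket code. A small sketch (describe is a made-up helper, and example.py is a hypothetical file name):

```python
def describe(module_name):
    """Explain how a module with this __name__ value is being used."""
    if module_name == "__main__":
        return "run directly as a script"
    return "imported as module " + module_name


# Python sets the global __name__ for every module it loads.
print(describe(__name__))

if __name__ == "__main__":
    # Only reached when this file is executed directly, e.g. `python example.py`;
    # skipped when another file does `import example`.
    print("starting the long-running work here")
```

In the websocket example, the body of that if block is where the WebSocketApp is created and run_forever() is called.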

Python client keeps websocket open and reacts to received messages

I'm trying to implement a REST client in Python that reacts to messages the server sends over an open websocket.
Here is the scenario:
client opens a websocket with the server
from time to time, the server sends a message to the client
when the client receives a message, it gets some information from the server
The client I currently have is able to open the websocket and receive a message from the server. However, as soon as it receives the message, it gets the information from the server and then terminates, while I'd like it to keep listening for further messages, each of which should make it fetch new content from the server.
Here is the piece of code I have:
def openWs(serverIp, serverPort):
    ##ws url setting
    wsUrl = "ws://"+serverIp+":"+serverPort+"/websocket"
    ##open ws
    ws = create_connection(wsUrl)
    ##send user id
    print "Sending User ID..."
    ws.send("user_1")
    print "Sent"
    ##receiving data on ws
    print "Receiving..."
    result = ws.recv()
    ##getting new content
    getUrl = "http://"+serverIp+":"+serverPort+"/"+result+"/entries"
    getRest(getUrl)
I don't know whether using threads is appropriate or not; I'm not an expert in that.
If someone could help, that would be great.
Thanks in advance.
I ended up with this code, which does what I expect:
import websocket
import thread
import time
def on_message(ws, message):
    print message
def on_error(ws, error):
    print error
def on_close(ws):
    print "### closed ###"
def on_open(ws):
    def run(*args):
        for i in range(3):
            time.sleep(1)
            ws.send("Hello %d" % i)
        time.sleep(1)
        ws.close()
        print "thread terminating..."
    thread.start_new_thread(run, ())
if __name__ == "__main__":
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("ws://localhost:5000/chat",
                                on_message = on_message,
                                on_error = on_error,
                                on_close = on_close)
    ws.on_open = on_open
    ws.run_forever()
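If the create_connection() approach from the question is preferred over WebSocketApp, a common alternative is to call recv() in a loop instead of once. A minimal sketch, assuming only that the connection object has a blocking recv() method; listen and FakeConnection are hypothetical names, and FakeConnection stands in for a real connection so the example runs without a server:

```python
def listen(ws, handle):
    """Call handle(message) for every message until recv() returns nothing.

    `ws` is any object with a blocking recv() method. Exceptions raised by
    recv() (for example when the connection drops) are left to the caller.
    """
    count = 0
    while True:
        message = ws.recv()
        if not message:  # empty result: treat the connection as finished
            break
        handle(message)
        count += 1
    return count


class FakeConnection:
    """Hypothetical stand-in that replays canned messages, then returns ''."""

    def __init__(self, messages):
        self._messages = list(messages)

    def recv(self):
        return self._messages.pop(0) if self._messages else ""


received = []
handled = listen(FakeConnection(["feed_1", "feed_2"]), received.append)
print(handled, received)  # 2 ['feed_1', 'feed_2']
```

With a real connection, ws would be the object returned by create_connection(wsUrl), and handle would build the URL from the message and call getRest() as in the question.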

How to add more headers in websocket python client

I'm trying to send a session ID (which I got after authenticating against an HTTP server) over a websocket connection using the Python websocket client. I need to pass it as a header, because the server reads all the headers and checks them.
The question is: how can I add headers using one of the existing Python websocket client implementations? None of the ones I found seem able to do that. Or am I following the wrong approach to authentication in the first place?
-- Update -- Below is a template of the code I use:
def on_message(ws, message):
    print 'message received ..'
    print message
def on_error(ws, error):
    print 'error happened .. '
    print error
def on_close(ws):
    print "### closed ###"
def on_open(ws):
    print 'Opening Websocket connection to the server ... '
    ## This session_key I got needs to be passed in a websocket header instead of via ws.send.
    ws.send(session_key)
if __name__ == "__main__":
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("ws://localhost:9999/track",
                                on_open = on_open,
                                on_message = on_message,
                                on_error = on_error,
                                on_close = on_close,
                                )
    ws.on_open = on_open
    ws.run_forever()
It seems that websocket-client was updated to include websocket headers since this question was asked. Now you can simply pass a list of header parameters as strings:
custom_protocol = "your_protocol_here"
protocol_str = "Sec-WebSocket-Protocol: " + custom_protocol
ws = websocket.WebSocketApp("ws://localhost:9999/track",
                            on_open = on_open,
                            on_message = on_message,
                            on_error = on_error,
                            on_close = on_close,
                            header = [protocol_str]
                            )
If you are interested in the complete list of valid headers, see the websocket RFC6455 document: https://www.rfc-editor.org/rfc/rfc6455#section-4.3
GitHub Source: https://github.com/liris/websocket-client/blob/master/websocket.py
Nothing is more amusing than reading the source code :))
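When the headers start out as a dict, for example with the session key from the question, they can be turned into the list-of-strings form shown above. A minimal sketch; to_header_lines and the header values are made up for illustration:

```python
def to_header_lines(headers):
    """Turn a {name: value} dict into a list of "Name: value" strings."""
    return ["{}: {}".format(name, value) for name, value in headers.items()]


# Hypothetical header names and values, for illustration only.
header = to_header_lines({
    "Sec-WebSocket-Protocol": "your_protocol_here",
    "Cookie": "session_key=abc123",
})
print(header)
# ['Sec-WebSocket-Protocol: your_protocol_here', 'Cookie: session_key=abc123']
```

The resulting list can be passed as header = header when constructing the WebSocketApp.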
I monkey patched the source code of the Websocket client library to make it able to receive a header as a normal parameter in the initializer, like this:
ws = websocket.WebSocketApp("ws://localhost:9999/track",
                            on_open = on_open,
                            on_message = on_message,
                            on_error = on_error,
                            on_close = on_close,
                            header = ['head1:value1', 'head2:value2']
                            )
This can be done by editing 3 lines in the websocket.py source code of the library:
1- Add header parameter:
## Line 877
class WebSocketApp(object):
    """
    Higher level of APIs are provided.
    The interface is like JavaScript WebSocket object.
    """
    def __init__(self, url,
                 on_open = None, on_message = None, on_error = None,
                 on_close = None, keep_running = True, get_mask_key = None, header = None):
        self.url = url
        self.on_open = on_open
        self.on_message = on_message
        self.on_error = on_error
        self.on_close = on_close
        self.keep_running = keep_running
        self.get_mask_key = get_mask_key
        self.sock = None
        self.header = header
2- Then, pass the self.header to websocket connect method as a header parameter, like this:
## Line 732
self.sock.connect(self.url, header = self.header)
I did try importing the WebSocketApp class instead, but that didn't work: the whole websocket.py module is so interdependent that I found myself importing a lot of things to make it work. Monkey patching is easier and more solid in this case.
That's all, enjoy using your patched library with all the headers you need.
