I am setting up a WebSocket that receives market data for 33 currency pairs, processes the data, and inserts it into a local MySQL database.
What I've tried so far:
Setting up the websocket works fine: I process the data in the on_message callback and insert it directly into the database.
--> The problem was that with 33 pairs the websocket kept stacking up market data in its buffer, and after a few minutes the database would lag behind by at least 10 seconds.
Then I tried processing the data through a thread: the on_message function would spawn a thread that simply appends the market data to a list, like below:
from threading import Thread

datas = []

def add_queue(symbol, t, a, b, r_n):
    global datas
    datas.append([symbol, t, a, b, r_n])

# inside on_message:
if json_msg['ev'] == "C":
    symbol = json_msg['p'].replace("/", "-")
    round_number = pairs_dict_new[symbol]
    t = Thread(target=add_queue,
               args=(symbol, json_msg['t'], json_msg['a'], json_msg['b'], round_number))
    t.start()
and then another function, running a loop in its own thread, would pick the data up and insert it into the database:
def add_db():
    global datas
    try:
        # db = mysql.connector.connect(
        #     host="104.168.157.164",
        #     user="bvnwurux_noe_dev",
        #     password="Tickprofile333",
        #     database="bvnwurux_tick_values"
        # )
        while True:
            for x in datas:
                database.add_db(x[0], x[1], x[2], x[3], x[4])
                if x in datas:
                    datas.remove(x)
    except KeyboardInterrupt:
        print("program ending..")

t2 = Thread(target=add_db)
t2.start()
This still gave a delay; the threaded version wasn't actually using much CPU but rather RAM, and it was even worse.
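(For reference, a thread-safe sketch of the same producer/consumer idea using the standard library's queue.Queue instead of a bare list; database.add_db is my helper from above, everything else here is illustrative:)
import queue
from threading import Thread

tick_queue = queue.Queue()

def add_queue(symbol, t, a, b, r_n):
    # Queue.put is thread-safe, so no global list or manual locking is needed.
    tick_queue.put([symbol, t, a, b, r_n])

def add_db():
    # Single writer thread: one MySQL connection, fed from the queue.
    while True:
        x = tick_queue.get()  # blocks until an item is available
        database.add_db(x[0], x[1], x[2], x[3], x[4])
        tick_queue.task_done()

Thread(target=add_db, daemon=True).start()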
Instead of using a websocket with a thread, I tried plain web requests to the API, with one thread per symbol: each thread would loop over a web request and send the result to the database. My issues here were that MySQL connections don't play well with threads (sometimes two threads would issue a request over the same connection at the same time and crash), or the pipeline would still be delayed by the time it took to process the code, even without a buffer. The code was taking too long to process each response to keep the delay under 10 seconds.
Here is a little example of the basic code I used to get the data.
pairs={'AUDCAD':5,'AUDCHF':5,'AUDJPY':3,'AUDNZD':5,'AUDSGD':2,'AUDUSD':5,'CADCHF':5,'CADJPY':3,'CHFJPY':3,'EURAUD':5,'EURCAD':5,'EURCHF':5,'EURGBP':5,'EURJPY':3,'EURNZD':5,'EURSGD':5,'EURUSD':5,'GBPAUD':5,'GBPCAD':5,'GBPCHF':5,'GBPJPY':3,'GBPNZD':5,'GBPSGD':5,'GBPUSD':5,'NZDCAD':5,'NZDCHF':5,'NZDJPY':3,'NZDUSD':5,'USDCAD':5,'USDCHF':5,'USDJPY':3,'USDSGD':5,'SGDJPY':3}
import json
import websocket
import rel

def on_open(ws):
    print("Opened connection")
    ws.send('{"action":"auth","params":"<API KEY>"}')  # authenticate with the secret API key

def on_message(ws, message):
    print("msg", message)
    json_msg = json.loads(message)[0]
    if json_msg['status'] == "auth_success":  # successfully authenticated
        r = ws.send('{"action":"subscribe","params":"C.*"}')  # subscribe to all currency pairs
        print("should subscribe to " + str(pairs))
    # once the websocket is connected to all the pairs, process the data
    # --> process json_msg

if __name__ == "__main__":
    # websocket.enableTrace(True)  # show all the requests made (debug mode)
    ws = websocket.WebSocketApp("wss://socket.polygon.io/forex",
                                on_open=on_open,
                                on_message=on_message)
    ws.run_forever(dispatcher=rel)  # set the dispatcher for automatic reconnection
    rel.signal(2, rel.abort)  # keyboard interrupt
    rel.dispatch()
I also tried multiprocessing, but that on the other hand crashed my server because it would use 100% CPU, and then requests made to the Apache server would not get through or would take a long time to load. It's really a balance problem.
I'm using an Ubuntu server with 32 CPUs, based in London, and the Polygon API is based in NYC.
I also tried with 4 CPUs in Seattle to NYC, but still no luck.
Even with 4 pairs and 32 CPUs, it would eventually reach a 10 s delay. I think this is more of a code-structure problem.
I am building a dashboard for live tickers.
I am using Zerodha KiteConnect to fetch NSE Options & Futures live data for it. The client for whom I am building the dashboard has provided me the API key, and also provides me the access token daily. The catch is that he uses the same credentials in his own program (which he runs separately on his own laptop) to fetch live ticker data.
When I use KiteConnect to fetch the instrument dump it works; the code below executes successfully:
from kiteconnect import KiteTicker, KiteConnect

access_token = '*********'  # changes every day
api_key = '*********'

kite = KiteConnect(api_key=api_key, access_token=access_token)
instrument_list = kite.instruments(exchange=kite.EXCHANGE_NFO)
but when I use KiteTicker (WebSocket streaming) with the same credentials, as shown in the code below, it produces a 1006 connection error:
kws = KiteTicker(api_key, access_token=kite.access_token)

####### define the callbacks ############
def on_connect(ws, response):
    # Callback on successful connect.
    # Subscribe to a list of instrument_tokens (RELIANCE and ACC here).
    ws.subscribe(instrument_tokens)
    # Set tick in `full` mode.
    ws.set_mode(ws.MODE_FULL, instrument_tokens)

def on_ticks(ws, ticks):
    # Callback to receive ticks.
    # logging.debug("Ticks: {}".format(ticks))
    print(ticks)
    ticks_list.append(ticks)
    # Close the connection after some time.
    if (time.time() - begin_time) > 120:  # run for 2 minutes
        write_json(ticks_list)
        print("close called")
        ws.close()

def on_close(ws, code, reason):
    # On connection close, stop the event loop.
    # Reconnection will not happen after executing `ws.stop()`.
    print("-------- Stopping ------------")
    ws.stop()
    print("--------- Stopped ----------")

### define
ticks_list = []  # will hold the list of JSON objects

##### Assign the callbacks.
kws.on_ticks = on_ticks
kws.on_connect = on_connect
kws.on_close = on_close

begin_time = time.time()
# Infinite loop on the main thread. Nothing after this will run.
# You have to use the pre-defined callbacks to manage subscriptions.
kws.connect()
The exact error produced is:
Connection error: 1006 - connection was closed uncleanly (I dropped the WebSocket TCP connection: close reason without close code)
Can you please guide me as to why this is happening? Also, is it possible to use the same credentials in parallel from two different programs on different IPs to fetch live tick data?
Thanks
I was getting the same error while using the KiteConnect WebSocket. The issue was in the tokens list: you first have to convert the token list values to int.
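That is, something like this before subscribing (instrument_tokens being whatever list you pass to ws.subscribe):
instrument_tokens = [int(token) for token in instrument_tokens]  # the ticker expects ints, not strings
ws.subscribe(instrument_tokens)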
Are you placing an order inside the on_tick method?
You shouldn't put any logic inside the on_tick thread. You need to pass the tick on to another method asynchronously, without blocking the on_tick thread.
There are two ways to pass tick data from the on_tick thread so you can perform operations without blocking it:
1> You can push the tick data to a queue (using celery, rq, etc. as the task queue manager) and have another method that reads from the queue and performs the tasks.
e.g.:
def on_ticks(ws, ticks):
    # Pass the tick data to the queue using celery, rq, etc.
    # Using celery's delay method here to call the helper_method task
    # (helper_method must be registered as a celery task for .delay to exist).
    helper_method.delay(ticks)

def helper_method(ticks):
    # Perform the required operation using the tick data.
    pass
2> Create another thread and perform the required operation in the second thread, using threaded=True.
P.S.: Don't forget to assign the ticker callbacks for the new thread.
import logging
from kiteconnect import KiteTicker

logging.basicConfig(level=logging.DEBUG)

kws = KiteTicker("your_api_key", "your_access_token")

def on_ticks(ws, ticks):
    logging.debug("Ticks: {}".format(ticks))

def on_connect(ws, response):
    ws.subscribe([738561, 5633])
    ws.set_mode(ws.MODE_FULL, [738561])

def on_close(ws, code, reason):
    ws.stop()

kws.on_ticks = on_ticks
kws.on_connect = on_connect
kws.on_close = on_close

kws.connect(threaded=True)

while True:
    # Perform the required data operation using the tick data.
    def on_ticks(ws, ticks):
        ..........
        helper_method(ticks)
        .........

    def helper_method(ticks):
        .........
        # Perform computation here
        ........

    # Assign the callback
    kws.on_ticks = on_ticks
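For illustration, here is a fleshed-out sketch of option 2, handing ticks from the ticker thread to the main thread through a standard-library queue (helper_method and the instrument tokens are placeholders):
import queue
from kiteconnect import KiteTicker

tick_queue = queue.Queue()
kws = KiteTicker("your_api_key", "your_access_token")

def helper_method(ticks):
    print(ticks)  # placeholder for the real computation

def on_ticks(ws, ticks):
    tick_queue.put(ticks)  # hand off immediately; never block this thread

def on_connect(ws, response):
    ws.subscribe([738561, 5633])

kws.on_ticks = on_ticks
kws.on_connect = on_connect
kws.connect(threaded=True)  # the ticker runs in its own thread

while True:
    helper_method(tick_queue.get())  # the main thread consumes the queue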
I'm trying to convert a synchronous flow in Python, which is based on callbacks, to an asynchronous flow using asyncio.
Basically the code interacts a lot with TCP/UNIX sockets. It reads data from the sockets, manipulates it to make decisions, and writes stuff back to the other side. This goes on over multiple sockets at once, and data is sometimes shared between the contexts to make decisions.
EDIT :: The code currently is mostly based on registering a callback to a central entity for a specific socket, and having that entity run the callback when the relevant socket is readable (something like "call this function when that socket has data to be read"). Once the callback is called - a bunch of stuff happens, and eventually a new callback is registered for when new data is available. The central entity runs a select over all sockets registered to figure out which callbacks should be called.
I'm trying to do this without refactoring my entire code, making it as seamless as possible for the programmer. So I was trying to think about it like this: all code should run the same way it does today, but whenever the current code does a socket.recv() to get new data, the process would yield execution to other tasks. When the read returns, it should go back to handling the data from the same point, using the new data it got.
To do this, I wrote a new class called AsyncSocket, which interacts with asyncio's IO streams, and placed the async/await statements almost solely in there, thinking that I would implement the recv method in my class to make it look like a "regular IO socket" to the rest of my code.
So far, this is my understanding of what async programming should allow.
Now to the problem :
My code awaits clients to connect; when one does, each client's context is allowed to read and write from its own connection.
I've simplified the flow to the following to clarify the problem:
import asyncio

class AsyncSocket():
    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer

    def recv(self, numBytes):
        print("called recv!")
        data = self.read_mitigator(numBytes)
        return data

    async def read_mitigator(self, numBytes):
        print("Awaiting of AsyncSocket.reader.read")
        data = await self.reader.read(numBytes)
        print("Done Awaiting of AsyncSocket.reader.read data is %s " % data)
        return data

def mit2(aSock):
    return mit3(aSock)

def mit3(aSock):
    return aSock.recv(100)

async def echo_server(reader, writer):
    print("New Connection!")
    aSock = AsyncSocket(reader, writer)  # create a new async socket and pass it on to the regular code
    while True:
        data = await some_func(aSock)  # this would eventually read from the socket
        print("Data read is %s" % (data))
        if not data:
            break
        writer.write(data)  # echo everything back

async def main(host, port):
    server = await asyncio.start_server(echo_server, host, port)
    await server.serve_forever()

asyncio.run(main('127.0.0.1', 5000))
mit2() and mit3() are synchronous functions that do stuff with the data on the way back before returning to the main client loop, but here I'm just using them as pass-through functions.
The problem starts when I play with the implementation of some_func().
A pass-through implementation (edit: it kind of works), but it still has issues:
def some_func(aSock):
    try:
        return (mit2(aSock))  # works
    except:
        print("Error!!!!")
While an implementation that reads the data and does something with it, like adding a suffix before returning, throws an error:
def some_func(aSock):
    try:
        return (mit2(aSock) + "something")  # doesn't work
    except:
        print("Error!!!!")
The error (as far as I understand it) means it's not really doing what it should:
New Connection!
called recv!
/Users/user/scripts/asyncServer.py:36: RuntimeWarning: coroutine 'AsyncSocket.read_mitigator' was never awaited
return (mit2(aSock) + "something") # doesn't work
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Error!!!!
Data read is None
And the echo server obviously doesn't work.
Obviously my real code looks more like option #2, with a lot more stuff in some_func(), mit2() and mit3(), but I can't get this to work. I'm fairly new to using asyncio/async/await, so what (rather basic concept, I guess) am I missing?
This code won't work as envisioned:
def recv(self, numBytes):
    print("called recv!")
    data = self.read_mitigator(numBytes)
    return data

async def read_mitigator(self, numBytes):
    ...
You cannot call an async function from a sync function and get the result, you must await it, which ensures that you return to the event loop in case the data is not yet ready. This mismatch between async and sync code is sometimes referred to as the issue of function color.
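Applied to the code in the question, the await has to propagate all the way up the call chain; here is a sketch reusing the question's names (note that reader.read() returns bytes, so a suffix would have to be bytes as well):
class AsyncSocket:
    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer

    async def recv(self, numBytes):  # now a coroutine itself
        return await self.reader.read(numBytes)

async def mit3(aSock):
    return await aSock.recv(100)

async def mit2(aSock):
    return await mit3(aSock)

async def some_func(aSock):
    return (await mit2(aSock)) + b"something"  # works, since echo_server awaits some_func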
Since your code is already using non-blocking sockets and an event loop, a good approach to porting it to asyncio might be to first switch to the asyncio event loop. You can use event loop methods like sock_recv to request data:
def start():
    loop = asyncio.get_event_loop()
    sock = make_socket()  # make sure it's non-blocking
    # On Python 3.7+ sock_recv is a coroutine, so wrap it to get a future:
    future_data = asyncio.ensure_future(loop.sock_recv(sock, 1024))
    future_data.add_done_callback(continue_read)
    # return to the event loop - when some data is ready
    # continue_read will be invoked

def continue_read(future):
    data = future.result()
    print('got', data)
    # ... do something with data, e.g. process it
    # and call sock_sendall with the response

asyncio.get_event_loop().call_soon(start)  # pass the function itself, don't call it
asyncio.get_event_loop().run_forever()
Once you have the program working in that mode, you can start moving to coroutines, which allow the code to look like sync code, but work in exactly the same way:
async def start():
    loop = asyncio.get_event_loop()
    sock = make_socket()  # make sure it's non-blocking
    data = await loop.sock_recv(sock, 1024)
    # data is available "immediately", meaning the coroutine gets
    # automatically suspended when awaiting data that is not yet
    # ready, and automatically re-scheduled when the data is ready
    print('got', data)

asyncio.run(start())
The next step can be eliminating make_socket and switching to asyncio streams.
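For reference, the same single read with asyncio streams might look like this minimal sketch (host and port are illustrative):
import asyncio

async def start():
    reader, writer = await asyncio.open_connection('127.0.0.1', 5000)
    data = await reader.read(1024)
    print('got', data)
    writer.close()

asyncio.run(start())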
I am creating a simple TCP server-client script in Python. The server is threaded and forks a new worker/thread for every client connection. So far I have pretty much coded the entire server module, but my function handle_clients(), which is forked for every incoming client connection, is getting very long. To improve the readability of the code I want to split handle_clients() into multiple small functions.
I do understand that when I split handle_clients() into smaller functions, the split functions should be wrapped in mutex locks to synchronize shared usage between multiple handle_clients() threads. Doing this would actually reduce the efficiency of the program, because handle_clients() would have to wait for other threads to unlock the shared functions before using them. My other thought was to create these smaller functions as threads within the handle_clients() thread and wait for them to finish with Thread.join() before continuing. Is there a better way to do this?
My code:
#!/usr/bin/python
import socket
import threading
import pandas as pd

class TCPServer(object):
    NUMBER_OF_THREADS = 0
    BUFFER = 4096
    threads_list = []

    def __init__(self, port, hostname):
        self.socket = socket.socket(
            family=socket.AF_INET, type=socket.SOCK_STREAM)
        self.socket.bind((hostname, port))

    def listen_for_clients(self):
        self.socket.listen(5)
        while True:
            client, address = self.socket.accept()
            client_ID = client.recv(TCPServer.BUFFER)
            print(f'Connected to client: {client_ID}')
            if client_ID:
                TCPServer.NUMBER_OF_THREADS = TCPServer.NUMBER_OF_THREADS + 1
                thread = threading.Thread(
                    target=TCPServer.create_worker, args=(self, client, address, client_ID))
                TCPServer.threads_list.append(thread)
                thread.start()
            if TCPServer.NUMBER_OF_THREADS > 2:
                break
        TCPServer.wait_for_workers()

    def wait_for_workers():
        for thread in TCPServer.threads_list:
            thread.join()

    def create_worker(self, client, address, client_ID):
        print(f'Spawned a new worker for {client_ID}. Worker #: {TCPServer.NUMBER_OF_THREADS}')
        data_list = []
        data_frame = pd.DataFrame()
        client.send("SEND_REQUEST_TYPE".encode())
        request_type = client.recv(TCPServer.BUFFER).decode('utf-8')
        if request_type == 'KMEANS':
            print(f'Client: REQUEST_TYPE {request_type}')
            client.send("SEND_DATA".encode())
            while True:
                data = client.recv(TCPServer.BUFFER).decode('utf-8')
                if data == 'ROW':
                    client.send("OK".encode())
                    while True:
                        data = client.recv(TCPServer.BUFFER).decode('utf-8')
                        print(f'Client: {data}')
                        if data == 'ROW_END':
                            print('Data received: ', data_list)
                            series = pd.Series(data_list)
                            data_frame = data_frame.append(series, ignore_index=True)  # append returns a new DataFrame
                            data_list = []
                            client.send("OK".encode())
                            break
                        else:
                            data_list.append(int(data))
                            client.send("OK".encode())
                elif data == 'DATA_END':
                    client.send("WAIT".encode())
                    # (Vino) pass data to algorithm
                    print(f'Data received from client {client_ID}: ', data_frame)
        elif request_type == 'NEURALNET':
            pass
        elif request_type == 'LINRIGRESSION':
            pass
        elif request_type == 'LOGRIGRESSION':
            pass

def main():
    port = input("Port: ")
    server = TCPServer(port=int(port), hostname='localhost')
    server.listen_for_clients()

if __name__ == '__main__':
    main()
Note: the following block of code is repetitive and will be used multiple times within the handle_client() function.
while True:
    data = client.recv(TCPServer.BUFFER).decode('utf-8')
    if data == 'ROW':
        client.send("OK".encode())
        while True:
            data = client.recv(TCPServer.BUFFER).decode('utf-8')
            print(f'Client: {data}')
            if data == 'ROW_END':
                print('Data received: ', data_list)
                series = pd.Series(data_list)
                data_frame = data_frame.append(series, ignore_index=True)
                data_list = []
                client.send("OK".encode())
                break
            else:
                data_list.append(int(data))
                client.send("OK".encode())
    elif data == 'DATA_END':
        client.send("WAIT".encode())
        # (Vino) pass data to algorithm
        print(f'Data received from client {client_ID}: ', data_frame)
This is the block I want to place in a separate function and call from within the handle_client() thread.
Your code is already long; I won't dive into it, but I'll try to keep things general.
I do understand that when I split handle_client() into smaller functions, the split functions should be wrapped around mutex locks.
That's not directly true: between threads you already have to use locks to guard against concurrent writes to shared memory, regardless of how your code is split into functions.
The server is threaded
Looks like you're doing CPU-intensive work (I see LINALG, NEURALNET, ...), so it is not logical to use threads in Python to dispatch CPU-intensive loads, as the GIL will serialize CPU usage between your threads.
The way to parallelize CPU-intensive work in Python is to use processes.
Processes do not share memory, so you'll be able to manipulate variables freely without mutexes, but they won't be shared at all; I hope your jobs are independent, as they can't share any state.
If you need to share state, avoid locks: it's complicated to handle, it's the road to deadlocks, and it's not readable. Try to implement your "state sharing" with queues instead, as a pipeline of jobs, each worker pulling from a queue, doing work, and pushing to another queue. This keeps things clear and easy to understand. Plus, there are queue implementations for both threads and processes, so you'll be able to switch between the two almost seamlessly.
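For illustration, a minimal sketch of such a pipeline using multiprocessing queues (do_work stands in for your real computation):
from multiprocessing import Process, Queue

def do_work(job):
    return job * 2  # placeholder for the real computation

def worker(in_q, out_q):
    # Each worker pulls a job, processes it, and pushes the result on.
    for job in iter(in_q.get, None):  # None is the shutdown sentinel
        out_q.put(do_work(job))

if __name__ == '__main__':
    in_q, out_q = Queue(), Queue()
    workers = [Process(target=worker, args=(in_q, out_q)) for _ in range(4)]
    for w in workers:
        w.start()
    for job in range(10):
        in_q.put(job)
    for _ in workers:
        in_q.put(None)  # one sentinel per worker
    results = [out_q.get() for _ in range(10)]
    for w in workers:
        w.join()
    print(results)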
if TCPServer.NUMBER_OF_THREADS > 2:
    break
Hey, you're breaking out of your main loop when you have more than two threads, exiting your main process and killing your server; I bet that's not what you want. Oh, and if you use processes instead of threads, you should prefork a pool of them, as their creation costs more than a thread's, and reuse them: a process can take another job after finishing one, it does not have to die (typically, use queues to send jobs to your processes).
Side note: I'd implement this using HTTP instead of raw TCP, to benefit from the notions of request, response, and error reporting, from existing frameworks, and from the ability to use existing clients (curl/wget on the command line, your browser, requests in Python). I'd implement it fully asynchronously (no blocking HTTP request): one request to create a job, and follow-up requests to get the status and the result, like:
$ curl -X POST http://localhost/linalg/jobs/ -d '{your data}'
201 Created
Location: http://localhost/linalg/jobs/1
$ curl -XGET http://localhost/linalg/jobs/1
200 OK
{"status": "queued"}
Some time later…
$ curl -XGET http://localhost/linalg/jobs/1
200 OK
{"status": "in progress"}
Some time later…
$ curl -XGET http://localhost/linalg/jobs/1
200 OK
{"status": "done", "result": "..."}
To implement this, there's a lot of nice work already done, typically aiohttp, apistar, and so on.
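For a flavor of what that could look like, here is a minimal (not production-ready) sketch of such a job API with aiohttp; all route names and fields are illustrative:
import asyncio
import itertools
from aiohttp import web

jobs = {}  # job_id -> {"status": ..., "result": ...}
counter = itertools.count(1)

async def run_job(job_id, data):
    jobs[job_id]["status"] = "in progress"
    await asyncio.sleep(5)  # stand-in for the real work (hand CPU-bound jobs to a process pool)
    jobs[job_id] = {"status": "done", "result": data}

async def create_job(request):
    data = await request.json()
    job_id = next(counter)
    jobs[job_id] = {"status": "queued"}
    asyncio.create_task(run_job(job_id, data))  # schedule the job without blocking the request
    return web.json_response(jobs[job_id], status=201,
                             headers={"Location": "/linalg/jobs/%d" % job_id})

async def get_job(request):
    return web.json_response(jobs[int(request.match_info["id"])])

app = web.Application()
app.router.add_post("/linalg/jobs/", create_job)
app.router.add_get("/linalg/jobs/{id}", get_job)
web.run_app(app, port=8080)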
I am trying to start and stop a service running as a Python web app using Flask. The service involves a loop that executes continuously, listening for microphone input and taking some action if the input surpasses a predefined threshold. I can get the program to start execution when the url is passed with a /on parameter, but once it starts, I can't find a way to stop it. I have tried using request.args.get to monitor the status of the url parameter and watch for it to change from /on to /off, but for some reason, the program doesn't register that I have changed the query string to attempt to halt the execution. Is there a better way to execute my code and have it stop when the url parameter is changed from /on to /off? Any help is greatly appreciated!
import alsaaudio, time, audioop
import RPi.GPIO as G
import pygame
from flask import Flask
from flask import request

app = Flask(__name__)

G.setmode(G.BCM)
G.setup(17, G.OUT)

pygame.mixer.init()
pygame.mixer.music.load("/home/pi/OceanLoud.mp3")

@app.route('/autoSoothe', methods=['GET', 'POST'])
def autoSoothe():
    toggle = request.args.get('state')
    print(toggle)
    if toggle == 'on':
        # Open the device in nonblocking capture mode. The last argument could
        # just as well have been zero for blocking mode. Then we could have
        # left out the sleep call at the bottom of the loop.
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 'null', 0)
        # Set attributes: Mono, 8000 Hz, 16 bit little endian samples
        inp.setchannels(1)
        inp.setrate(8000)
        inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
        # The period size controls the internal number of frames per period.
        # The significance of this parameter is documented in the ALSA API.
        # For our purposes, it is sufficient to know that reads from the device
        # will return this many frames. Each frame being 2 bytes long.
        # This means that the reads below will return either 320 bytes of data
        # or 0 bytes of data. The latter is possible because we are in
        # nonblocking mode.
        inp.setperiodsize(160)
        musicPlay = 0
        while toggle == 'on':
            toggle = request.args.get('state')
            print(toggle)
            if toggle == 'off':
                break
            # Read data from the device
            l, data = inp.read()
            if l:
                try:
                    # audioop.max returns the maximum absolute value of all samples in a fragment.
                    if audioop.max(data, 2) > 20000:
                        G.output(17, True)
                        musicPlay = 1
                    else:
                        G.output(17, False)
                        if musicPlay == 1:
                            pygame.mixer.music.play()
                            time.sleep(10)
                            pygame.mixer.music.stop()
                            musicPlay = 0
                except audioop.error as e:
                    if str(e) != "not a whole number of frames":
                        raise e
            time.sleep(.001)
    return toggle

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000, debug=True)
When an HTTP request is made from the client to the Flask server, the client sends one request and waits for a response from the server. This means when you send the state parameter, there is no way for the client to retroactively change it.
There are a few different ways to get your desired behavior.
The first that comes to my mind is to use some asynchronous code. You could have code that starts a thread/process when the state is "on" and then finishes the request. This thread would run your audio loop. Then the client could send another request but with the state being "off", which could alert the other process to gracefully stop.
Here is some information about multiprocessing; however, there is also a lot of information about how to do similar things with Flask using Celery, etc.
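A minimal sketch of that thread-plus-flag idea, using a threading.Event to signal shutdown (the audio loop body is elided to a placeholder):
import threading
import time
from flask import Flask, request

app = Flask(__name__)
stop_event = threading.Event()
worker = None

def audio_loop():
    while not stop_event.is_set():
        # ... read microphone input and react to it here ...
        time.sleep(0.001)

@app.route('/autoSoothe')
def auto_soothe():
    global worker
    state = request.args.get('state')
    if state == 'on' and (worker is None or not worker.is_alive()):
        stop_event.clear()
        worker = threading.Thread(target=audio_loop, daemon=True)
        worker.start()
    elif state == 'off':
        stop_event.set()  # the loop exits gracefully at its next check
    return state or ''

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)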
I have a Tornado web application; this app can receive GET and POST requests from the client.
The POST requests put the information received into a Tornado Queue; then I pop this information from the queue and use it to perform an operation on the database. This operation can be very slow: it can take several seconds to complete!
In the meantime, while this database operation is going on, I want to be able to receive other POSTs (that put other information in the queue) and GETs. The GETs are instead very fast and must return their result to the client immediately.
The problem is that when I pop from the queue and the slow operation begins, the server doesn't accept other requests from the client. How can I resolve this?
This is the simplified code I have written so far (imports are omitted to avoid a wall of text):
# URLs are defined in a config file
application = tornado.web.Application([
    (BASE_URL, Variazioni),
    (ARTICLE_URL, Variazioni),
    (PROMO_URL, Variazioni),
    (GET_FEEDBACK_URL, Feedback)
])

class Server:
    def __init__(self):
        http_server = tornado.httpserver.HTTPServer(application, decompress_request=True)
        http_server.bind(8889)
        http_server.start(0)
        transactions = TransactionsQueue()  # contains the queue and the functions that interact with it
        IOLoop.instance().add_callback(transactions.process)

    def start(self):
        try:
            IOLoop.instance().start()
        except KeyboardInterrupt:
            IOLoop.instance().stop()

if __name__ == "__main__":
    server = Server()
    server.start()

class Variazioni(tornado.web.RequestHandler):
    ''' Handle the POST request. Put the data received in the queue. '''
    @gen.coroutine
    def post(self):
        TransactionsQueue.put(self.request.body)
        self.set_header("Location", FEEDBACK_URL)

class TransactionsQueue:
    ''' Handle the queue that contains the data.
        When a new request arrives, the generated uuid is put in the queue.
        When the data is popped out, the operation on the database begins.
    '''
    queue = Queue(maxsize=3)

    @staticmethod
    def put(request_uuid):
        ''' Insert the uuid in postgres format into the queue '''
        TransactionsQueue.queue.put(request_uuid)

    @gen.coroutine
    def process(self):
        ''' Loop over the queue and load the data into the database '''
        while True:
            # request_uuid is in postgres format
            transaction = yield TransactionsQueue.queue.get()
            try:
                # this is the slow operation on the database
                yield self._load_json_in_db(transaction)
            finally:
                TransactionsQueue.queue.task_done()
Moreover, I don't understand why, if I do 5 POSTs in a row, all five pieces of data are put in the queue even though the maximum size is 3.
I'm going to guess that you use a synchronous database driver, so _load_json_in_db, although it is a coroutine, is not actually async. Therefore it blocks the entire event loop until the long operation completes. That's why the server doesn't accept more requests until the operation is finished.
Since _load_json_in_db blocks the event loop, Tornado can't accept more requests while it's running, so your queue never grows to its max size.
You need two fixes.
First, use an async database driver written specifically for Tornado, or run database operations on threads using Tornado's ThreadPoolExecutor.
Once that's done your application will be able to fill the queue, so second, TransactionsQueue.put must do:
TransactionsQueue.queue.put_nowait(request_uuid)
This throws an exception if there are already 3 items in the queue, which I think is what you intend.
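For illustration, a minimal sketch combining both fixes, using Tornado's run_on_executor to push the blocking database call onto a thread pool (_load_json_in_db is left as a placeholder):
from concurrent.futures import ThreadPoolExecutor
from tornado import gen
from tornado.concurrent import run_on_executor
from tornado.queues import Queue

class TransactionsQueue:
    queue = Queue(maxsize=3)
    executor = ThreadPoolExecutor(max_workers=4)

    @staticmethod
    def put(request_uuid):
        # Raises tornado.queues.QueueFull once 3 items are waiting.
        TransactionsQueue.queue.put_nowait(request_uuid)

    @run_on_executor
    def _load_json_in_db(self, transaction):
        pass  # blocking database work, now running on a worker thread

    @gen.coroutine
    def process(self):
        while True:
            transaction = yield TransactionsQueue.queue.get()
            try:
                yield self._load_json_in_db(transaction)
            finally:
                TransactionsQueue.queue.task_done()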