How to call pymongo db faster? - python

I am using Motor, but PyMongo was my initial choice; I switched to Motor because it is an asynchronous MongoDB driver for Python.
My aim is to query MongoDB with a large number of calls at the same time and with minimal waiting time.
There are about 1,000 symbols, and for each symbol I have to query its latest candlestick data from MongoDB from time to time in order to perform certain calculations. I need to query the latest 5K documents for each symbol, so the collection contains roughly 1000 * 5000 = 5,000,000 documents.
With Motor and asyncio I use the following method to fetch documents asynchronously, but the code takes a really long time to run and I can't work out why. I am using an 8-core CPU on a virtual machine.
Any help with this problem?
import asyncio
import motor.motor_asyncio

async def getCandleList(symbol):  # each symbol has about 5K latest candles in the collection
    final_str = "{'symbol': '%s'}" % (symbol)
    resultType = 'candlestick_archive'
    dbName = 'candle_db'
    # builds the query as a string and evaluates it; a plain find() call would avoid eval
    cursor = eval("db.{}.find({}).sort('timeStamp',-1)".format(dbName, final_str))
    finalList = await cursor.to_list(length=None)
    return finalList

async def taskForEachSymbol(symbol):
    while True:
        candleList = await getCandleList(symbol)
        await generateSignal(candleList)  # a function that generates certain signals in real time

def getAllTasks():
    awaitableTasks = []
    for symbol in symbolList:  # symbolList contains around 1k symbols
        awaitableTasks.append(asyncio.create_task(taskForEachSymbol(symbol)))
    return awaitableTasks

async def mainTask():
    awaitableTasks = getAllTasks()
    await asyncio.gather(*awaitableTasks, return_exceptions=False)

def main():  # plain function, not async: it drives the loop itself via run_until_complete
    mainLoop.run_until_complete(mainTask())
    print('completed! ... ')

if __name__ == '__main__':
    mainLoop = asyncio.new_event_loop()
    asyncio.set_event_loop(mainLoop)
    client = motor.motor_asyncio.AsyncIOMotorClient(io_loop=mainLoop)
    db = client.candles
    main()
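For reference, here is a minimal sketch (mine, not from the post) of the same query without eval, assuming the collection is db.candle_db with the field names shown above; the compound index is an assumption, but it would let the sort run off the index instead of in memory:

import asyncio
import motor.motor_asyncio

client = motor.motor_asyncio.AsyncIOMotorClient()
db = client.candles

async def get_candle_list(symbol):
    # assumes an index built once elsewhere, e.g.:
    #   await db.candle_db.create_index([('symbol', 1), ('timeStamp', -1)])
    cursor = db.candle_db.find({'symbol': symbol}).sort('timeStamp', -1)
    return await cursor.to_list(length=5000)  # cap at the ~5K docs per symbol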

Related

I want to build an EMA indicator alert for a specific list of cryptocurrencies

Here is my first file (the config):
### configuration details
TELEGRAM_TOKEN = '' # telegram bot token
TELEGRAM_CHANNEL ='' # channel id
INTERVAL = '1m' # binance time interval
SHORT_EMA = 7 # short interval for ema
LONG_EMA = 21 # long interval for ema
And here is my second file:
import requests
import talib
import time
import numpy as np
import websocket
from config import TELEGRAM_TOKEN, TELEGRAM_CHANNEL, INTERVAL, SHORT_EMA, LONG_EMA

def streamKline(currency, interval):
    websocket.enableTrace(False)
    socket = f'wss://stream.binance.com:9443/ws/{currency}@kline_{interval}'
    ws = websocket.WebSocketApp(socket)
    ws.run_forever()

# SYMBOLS TO LOOK FOR ALERTS
SYMBOLS = [
    "ETHUSDT",
    "BTCUSDT",
    "ATOMUSDT",
    "BNBUSDT",
    "FTMBUSD",
    "ENJUSDT",
    "WAXPUSDT"
]

# sending alerts to telegram
def send_message(message):
    url = "https://api.telegram.org/bot{}/sendMessage?chat_id={}&text={}&parse_mode=markdown".format(TELEGRAM_TOKEN, TELEGRAM_CHANNEL, message)
    res = requests.get(url); print(url)
    return res

# getting klines data to process (note: this redefines streamKline from above)
def streamKline(symbol):
    data = socket.streamKline(symbol=symbol, interval=INTERVAL, limit=300)  # more data means more precision, at a trade-off in speed
    return_data = []
    # taking the closing price from each kline
    for each in data:
        return_data.append(float(each[4]))  # 4 is the index of the closing price in each kline
    return np.array(return_data)  # returning a numpy array for talib

def main():
    # an infinite loop that keeps checking the condition
    while True:
        # looping through each coin
        for each in SYMBOLS:
            data = streamKline(each)
            ema_short = talib.EMA(data, int(SHORT_EMA))
            ema_long = talib.EMA(data, int(LONG_EMA))
            last_ema_short = ema_short[-2]
            last_ema_long = ema_long[-2]
            ema_short = ema_short[-1]
            ema_long = ema_long[-1]
            # conditions for alerts: short EMA crosses above long EMA
            if ema_short > ema_long and last_ema_short < last_ema_long:
                message = each + " bull coming " + str(SHORT_EMA) + " over " + str(LONG_EMA); print(each, "alert came")
                send_message(message)
            time.sleep(0.5)

# calling the function
if __name__ == "__main__":
    main()
The config part is all settled; it is the second file, fetching the kline data, that keeps raising errors like this:
data = socket.streamKline(symbol=symbol, interval=INTERVAL, limit=300)
NameError: name 'socket' is not defined
I just don't know how to do it. I want to build an EMA alert that can message me when I am not watching the chart, but this approach does not seem to work. I have tried many times and watched many videos, but I am just a beginner and not improving at all.
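For what it's worth, here is a minimal sketch of one way around the NameError (my suggestion, not from the thread): fetch the candles from Binance's public REST klines endpoint instead of the undefined socket object, then feed the closes to talib exactly as before:

import requests
import numpy as np

def get_klines(symbol, interval, limit=300):
    # hypothetical replacement for the second streamKline, using the public REST API
    url = "https://api.binance.com/api/v3/klines"
    resp = requests.get(url, params={"symbol": symbol, "interval": interval, "limit": limit})
    resp.raise_for_status()
    # each kline is a list; index 4 is the closing price
    return np.array([float(k[4]) for k in resp.json()])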

Discord py - multiprocessing blocks tasks.loop

I am creating a discord bot with Python on Replit.
One function of the bot is that it checks whether the current time is equal to a given time, so I have a tasks.loop event that loops every second. Another function of the bot is a command that generates a graph with data taken from an api.
Both blocks of code run fine on their own, but sometimes after calling the graph command the tasks.loop stops: now is no longer printed every second after bot.pt_list is printed. The following is my code:
import datetime
from discord.ext import tasks
from multiprocessing import Pool
import requests

@tasks.loop(seconds = 1)
async def notif():
    now = datetime.datetime.now() + datetime.timedelta(hours = 8)
    now = now.strftime("%H:%M:%S")
    print(now)

bot.pt_list = []

@bot.command(name = 'graph')
async def graph(ctx):
    bot.rank = rank
    timestamp_url = "https://api.sekai.best/event/29/rankings/time?region=tw"
    timestamp_response = requests.get(timestamp_url)
    timestamp_data = timestamp_response.json()["data"]
    i = 1
    timestamp_filtered = []
    while i <= len(timestamp_data):
        timestamp_filtered.append(timestamp_data[i])
        i += 12
    timestamp_url = []
    if __name__ == '__main__':
        for timestamp in timestamp_filtered:
            timestamp_url.append("https://api.sekai.best/event/29/rankings?region=tw&timestamp=" + timestamp)
        with Pool(20) as p:
            bot.pt_list = p.map(pt, timestamp_url)
        print(bot.pt_list)

def pt(timestamp_url):
    pt_response = requests.get(timestamp_url)
    pt_data = pt_response.json()["data"]["eventRankings"]
    for data in pt_data:
        if data["rank"] == 1:
            return data["score"]
And below is the output:
# prints time every second
15:03:01
15:03:02
15:03:03
15:03:04
[414505, 6782930, 13229090, 19650440, 27690605, 34044730, 34807680, 38346228, 43531083, 48973205, 52643633, 56877023, 62323476, 67464731, 69565641, 74482140, 78791756, 84277236, 87191476, 91832031, 97207348, 102692443, 104280559, 106288572, 111710142, 112763082, 112827552, 113359257, 116211652, 117475362, 117529967, 117560102, 118293877, 118293877, 118430000, 118430000]
15:03:15
15:03:15
# printing stops
However, the tasks.loop does not get stopped every time; sometimes it works and continues to print now after printing bot.pt_list. I'm relatively new to Python and I don't know what the issue is. Could someone explain why this is happening and how to fix it? Thank you!
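One thing worth noting (my observation, not from the thread): p.map blocks the running coroutine until all twenty workers finish, and multiprocessing interacts poorly with an already-running asyncio program. A hedged sketch of a thread-based alternative that keeps the event loop free, reusing the pt helper above:

import asyncio

async def fetch_pt_list(urls):
    # run each blocking requests call in the default thread pool and await them,
    # so notif() keeps ticking while the downloads run
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, pt, url) for url in urls]
    return await asyncio.gather(*futures)

# inside the graph command one could then write (hypothetical):
#     bot.pt_list = await fetch_pt_list(timestamp_url)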

Fetching the order depth with python-binance, but the code does not complete

I am running a Python script to fetch the current order books for all symbols that end with USDT.
Whenever I run it, it fetches the order book for the first three symbols (in this case BTCUSDT, ETHUSDT and BNBUSDT) and never finishes. Any takers on what I am messing up here?
I am using this logic to get a list of the symbols and the order book:
import asyncio
import config as c  # from config.py
import infinity as inf  # user-defined function for infinity (probably not needed)
from binance import AsyncClient, DepthCacheManager, Client

client = Client(c.API_KEY, c.API_SECRET, tld='com')
info = client.get_exchange_info()
symbols = info['symbols']

ls = []
for s in symbols:
    if 'USDT' in s['symbol']:
        # if 'BUSD' not in s['symbol']:
        ls.append(s['symbol'])

async def main():
    # initialise the client
    client = await AsyncClient.create()
    for i in ls:
        async with DepthCacheManager(client, symbol=i, limit=10000) as dcm_socket:
            depth_cache = await dcm_socket.recv()
            symbol = i
            asks = depth_cache.get_asks()[:5]
            bids = depth_cache.get_bids()[:5]
            full = [symbol, asks, bids]
            print(full)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
It won't complete, because it's not supposed to.
DepthCacheManager is designed to establish a WebSocket connection, take a snapshot of the order book, and then subscribe to a stream of updates to the current outstanding orders, which it applies locally in its DepthCache. Each time the cache is updated, it delivers the updated set of current asks/bids, as you can see.
Trading and orders never stop, so why would it?
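If the goal is just one snapshot per symbol rather than a live cache (an assumption about the intent), a plain REST call does terminate. A sketch using python-binance's get_order_book:

import asyncio
from binance import AsyncClient

async def snapshot(symbols):
    # one-shot order book snapshots: each call returns once and the loop completes
    client = await AsyncClient.create()
    try:
        for sym in symbols:
            book = await client.get_order_book(symbol=sym, limit=10)
            print(sym, book['asks'][:5], book['bids'][:5])
    finally:
        await client.close_connection()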
Maybe you want to try https://github.com/LUCIT-Systems-and-Development/unicorn-binance-local-depth-cache:
import unicorn_binance_local_depth_cache
ubldc = unicorn_binance_local_depth_cache.BinanceLocalDepthCacheManager(exchange="binance.com")
ubldc.create_depth_cache("LUNABTC")
asks = ubldc.get_asks("LUNABTC")
bids = ubldc.get_bids("LUNABTC")
That's it :)

Multithread or multiprocess

So, currently, I am using multiprocessing to run these three functions together.
As only the tokens value changes between them, is it recommended to switch to multithreading? If yes, will it really give a performance speed-up? I assume memory usage would certainly be lower.
This is my code:
from database_function import *
from kiteconnect import KiteTicker
import pandas as pd
from datetime import datetime, timedelta
import schedule
import time
from multiprocessing import Process

def tick_A():
    # credentials code here (this is where kws, the KiteTicker instance, is created)
    tokens = [x[0] for x in db_fetchquery("SELECT zerodha FROM script ORDER BY id ASC LIMIT 50")]  # FETCHING FIRST 50 SCRIPTS' TOKENS
    # print(tokens)

    ##### TO MAKE SURE THE TASK STARTS AFTER 8:59 ONLY ###########
    t = datetime.today()
    future = datetime(t.year, t.month, t.day, 8, 59)
    if ((future - t).total_seconds()) < 0:
        future = datetime(t.year, t.month, t.day, t.hour, t.minute, (t.second + 2))
    time.sleep((future - t).total_seconds())
    ##### TO MAKE SURE THE TASK STARTS AFTER 8:59 ONLY ###########

    def on_ticks(ws, ticks):
        global ltp
        ltp = ticks[0]["last_price"]
        for tick in ticks:
            print(f"{tick['instrument_token']}A")
            db_runquery(f'UPDATE SCRIPT SET ltp = {tick["last_price"]} WHERE zerodha = {tick["instrument_token"]}')  # UPDATING LTP IN DATABASE
            # print(f"{tick['last_price']}")

    def on_connect(ws, response):
        # print(f"response from connect :: {response}")
        # Subscribe to a list of instrument_tokens (the tokens fetched above are subscribed here).
        # logging.debug("on connect: {}".format(response))
        ws.subscribe(tokens)
        ws.set_mode(ws.MODE_LTP, tokens)  # SETTING TOKENS TO TICK MODE (LTP / FULL / QUOTE)

    kws.on_ticks = on_ticks
    kws.on_connect = on_connect
    kws.connect(threaded=True)

    ##### TO STOP THE TASK AFTER 15:32 #######
    end_time = datetime(t.year, t.month, t.day, 15, 32)
    while True:
        schedule.run_pending()
        # time.sleep(1)
        if datetime.now() > end_time:
            break
    ##### TO STOP THE TASK AFTER 15:32 #######

def tick_B():
    # everything remains the same; only the tokens value changes
    tokens = [x[0] for x in db_fetchquery("SELECT zerodha FROM script ORDER BY id ASC OFFSET (50) ROWS FETCH NEXT (50) ROWS ONLY")]

def tick_C():
    # everything remains the same; only the tokens value changes
    tokens = [x[0] for x in db_fetchquery("SELECT zerodha FROM script ORDER BY id ASC OFFSET (100) ROWS FETCH NEXT (50) ROWS ONLY")]

if __name__ == '__main__':
    def runInParallel(*fns):
        proc = []
        for fn in fns:
            p = Process(target=fn)
            p.start()
            proc.append(p)
        for p in proc:
            p.join()
    runInParallel(tick_A, tick_B, tick_C)
Most Python implementations do not have true multi-threading, because they use a global interpreter lock (GIL), so only one thread runs Python code at a time.
For I/O-heavy applications this should not make a difference. But if you need CPU-heavy operations done in parallel (and I see that you use pandas, so the answer is probably yes), you will be better off staying with a multi-process app.
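To illustrate the GIL point (my example, not the answerer's): the same CPU-bound function barely speeds up with threads but scales with processes.

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def cpu_bound(n):
    return sum(i * i for i in range(n))  # pure-Python loop, holds the GIL

def timed(executor_cls):
    start = time.perf_counter()
    with executor_cls(max_workers=3) as ex:
        list(ex.map(cpu_bound, [5_000_000] * 3))
    print(executor_cls.__name__, time.perf_counter() - start)

if __name__ == '__main__':
    timed(ThreadPoolExecutor)   # roughly serial: the three threads contend for one GIL
    timed(ProcessPoolExecutor)  # roughly parallel: each process has its own interpreter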

Tornado server using most of the CPU with tornado-sockjs and only two clients

I am using Tornado 4.4.2 with PyPy 5.9.0 (Python 2.7.13), hosted on Ubuntu 16.04.3 LTS.
When a new client logs in, a new class instance is created and passed the socket, so a dialog can be maintained. I am using a global clients[] list to hold these instances. The initial dialog looks like:
clients = []

class RegisterWebSocket(SockJSConnection):
    # initialize the class and handle on-open (some things left out)
    def on_open(self, info):
        self.ipaddress = info.headers['X-Real-Ip']

    def on_message(self, data):
        coinlist = []
        msg = json.loads(data)
        if 'coinlist' in msg:
            coinlist = msg['coinlist']
        if 'currency' in msg:
            currency = msg['currency']
        tz = pendulum.timezone('America/New_York')
        started = pendulum.now(tz).to_day_datetime_string()
        ws = WebClientUpdater(self, self.clientid, coinlist, currency,
                              started, self.ipaddress)
        clients.append(ws)
The ws class is shown below; I use a Tornado PeriodicCallback to update each client with its specific info every 20 seconds:
class WebClientUpdater(SockJSConnection):
    def __init__(self, ws, id, clist, currency, started, ipaddress):
        super(WebClientUpdater, self).__init__(ws.session)
        self.ws = ws
        self.id = id
        self.coinlist = clist
        self.currency = currency
        self.started = started
        self.ipaddress = ipaddress
        self.location = loc  # loc is presumably defined elsewhere in the file
        self.loop = tornado.ioloop.PeriodicCallback(self.updateCoinList,
                        20000, io_loop=tornado.ioloop.IOLoop.instance())
        self.loop.start()
        self.send_msg('welcome ' + id)

    def updateCoinList(self):
        pdata = db.getPricesOfCoinsInCurrency(self.coinlist, self.currency)
        self.send(dict(priceforcoins=pdata))

    def send_msg(self, msg):
        self.send(msg)
I also start a 60-second PeriodicCallback at startup to monitor the clients for closed connections and remove them from the clients[] list. I call it from the startup block:
if __name__ == "__main__":
    app = make_app()
    app.listen(options.port)
    ScheduleSocketCleaning()
and
def ScheduleSocketCleaning():
    def cleanSocketHouse():
        print "checking sockets"
        for x in clients:
            if x.is_closed:
                x = None
        clients[:] = [y for y in clients if not y.is_closed]
    loop = tornado.ioloop.PeriodicCallback(cleanSocketHouse, 60000,
               io_loop=tornado.ioloop.IOLoop.instance())
    loop.start()
If I monitor the server using top, it typically uses 4% CPU with immediate bursts to 60+%, but after a few hours it reaches 90+% and stays there.
I have used strace and I see an enormous number of stat calls on the same files, with errors shown in the strace -c view, but I cannot find any errors in a text file using -o trace.log. How can I find those errors?
I also notice that most of the time is consumed in epoll_wait:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------
 41.61    0.068097           7      9484           epoll_wait
 26.65    0.043617           0    906154      2410 stat
 15.77    0.025811           0    524072           read
 10.90    0.017840         129       138           brk
  2.41    0.003937           9       417           madvise
  2.04    0.003340           0    524072           lseek
  0.56    0.000923           3       298           sendto
  0.06    0.000098           0     23779           gettimeofday
------ ----------- ----------- --------- --------- ------------
100.00    0.163663                1989527      2410 total
Notice 2410 errors above.
When I view the strace output stream with an attached pid, I just see endless stat calls on the same files.
Can someone advise me how to debug this situation better? With only two clients and 20 seconds between client updates, I would expect the CPU usage (there are no other users of the site during this prototype stage) to be less than 1% or thereabouts.
You need to stop the PeriodicCallbacks, otherwise it's a memory leak, and every leaked callback keeps querying and sending every 20 seconds, which would also explain the growing CPU usage. You do that by simply calling .stop() on the PeriodicCallback object (Tornado's PeriodicCallback has start()/stop(); there is no close()). One way to deal with that is in your periodic cleaning task:
def cleanSocketHouse():
    global clients
    new_clients = []
    for client in clients:
        if client.is_closed:
            # I don't know why you call it loop;
            # .timer would be more appropriate
            client.loop.stop()
        else:
            new_clients.append(client)
    clients = new_clients
I'm not sure how accurate .is_closed is (some testing is required). The other way is to alter updateCoinList: the .send() method should fail when the client is no longer connected, right? Therefore a try/except should do the trick:
def updateCoinList(self):
    global clients
    pdata = db.getPricesOfCoinsInCurrency(self.coinlist, self.currency)
    try:
        self.send(dict(priceforcoins=pdata))
    except Exception:
        # log exception?
        self.loop.stop()
        clients.remove(self)  # you should probably use a set instead of a list
If .send() actually doesn't fail (for whatever reason; I'm not that familiar with Tornado), then stick with the first solution.
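A third option (my suggestion, not from the answer) is to stop the timer as soon as the connection closes, in SockJS's on_close hook; this is a sketch and assumes sockjs-tornado fires on_close reliably on disconnect:

class WebClientUpdater(SockJSConnection):
    # ... __init__ and the other methods as above ...
    def on_close(self):
        self.loop.stop()  # stop the PeriodicCallback so it can be garbage-collected
        if self in clients:
            clients.remove(self)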
