python newbie here!
I am trying to write a small program for myself that fetches price information from different exchanges and compares it. So far it works fine, but I want to make it better in terms of performance and efficiency.
What I mean by efficiency: right now my program checks the prices one exchange at a time and prints each result as it goes. Can I convert it to query the exchanges simultaneously and print all the prices at the same time?
Below is the relevant part of the code I wrote:
# Novadax
symbol_novadax = coin_list[i] + "_USDT"
response_novadax = requests.get('https://api.novadax.com/v1/market/ticker?symbol=' + symbol_novadax)
novadax_dic = json.loads(response_novadax.content)
try:
    if "ask" in novadax_dic["data"]:
        novadax_bid_price = float(novadax_dic["data"]["bid"])
        print("novadax_bid_price " + str(novadax_bid_price))
        novadax_ask_price = float(novadax_dic["data"]['ask'])
        print("novadax_ask_price " + str(novadax_ask_price))
        if max_bid_val < novadax_bid_price:
            max_bid_val = novadax_bid_price
            max_bid_place = "novadax"
        if min_ask_val > novadax_ask_price:
            min_ask_val = novadax_ask_price
            min_ask_place = "novadax"
except:
    print(coin_list[i] + " not in novadax")
    if is_run == False:
        telegram_send.send(messages=["False novadax"], parse_mode=None)
        break
# ZT
symbol_zt = coin_list[i] + "_USDT"
response_zt = requests.get('https://www.ztb.im/api/v1/tickers')
zt_dic = json.loads(response_zt.content)
# print(next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt))
try:
    if "buy" in next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt):
        zt_bid_price = float(next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt)["buy"])
        print("zt_bid_price " + str(zt_bid_price))
        zt_ask_price = float(next(item for item in zt_dic["ticker"] if item["symbol"] == symbol_zt)['sell'])
        print("zt_ask_price " + str(zt_ask_price))
        if max_bid_val < zt_bid_price:
            max_bid_val = zt_bid_price
            max_bid_place = "zt"
        if min_ask_val > zt_ask_price:
            min_ask_val = zt_ask_price
            min_ask_place = "zt"
except:
    print(coin_list[i] + " not in zt")
    if is_run == False:
        telegram_send.send(messages=["False zt"], parse_mode=None)
        break
The output I get looks something like this:
zt_bid_price = 0.12
zt_ask_price = 0.14
novadax_bid_price = 0.13
novadax_ask_price = 0.14
To be clearer: I am not getting those results at the same time; they print one after another. I am planning to add more exchanges in the future, so if I decide to print everything at the end of the code, the earlier quotes will already be slightly stale. Does anyone have an idea how I can solve this problem?
Thanks in advance!
Depending on your implementation, you can use multiprocessing to keep each ticker going. There is overhead for each process which, depending on your system, may or may not cause lag. You could have all the processes (one per ticker) running and, either on a signal or on a time interval, have each one poll its source simultaneously (with a timestamp) and return its data.
There is a learning curve. The line below will get you started.
from multiprocessing import Pool
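For illustration, here is a rough sketch of what that could look like for the two exchanges in the question. The fetch helpers are my own wrappers (not part of the original code), and since the work is I/O-bound, a thread pool such as concurrent.futures.ThreadPoolExecutor would do just as well as a process pool:

from multiprocessing import Pool
import requests

def fetch_novadax(symbol):
    # hypothetical helper: returns (exchange, bid, ask), or None if the symbol is missing
    data = requests.get('https://api.novadax.com/v1/market/ticker?symbol=' + symbol).json().get("data", {})
    if "ask" in data:
        return ("novadax", float(data["bid"]), float(data["ask"]))
    return None

def fetch_zt(symbol):
    # hypothetical helper: scans the full ticker list for the requested symbol
    for item in requests.get('https://www.ztb.im/api/v1/tickers').json()["ticker"]:
        if item["symbol"] == symbol:
            return ("zt", float(item["buy"]), float(item["sell"]))
    return None

if __name__ == '__main__':
    symbol = "BTC_USDT"
    with Pool(processes=2) as pool:
        # both requests run at the same time; the results come back together
        pending = [pool.apply_async(f, (symbol,)) for f in (fetch_novadax, fetch_zt)]
        for result in pending:
            quote = result.get()
            if quote:
                print(quote)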
The second 'if' statement midway through this code uses an 'or' between two conditions, and that is what causes the issue; I just don't know how to get around it. The code goes through a data file and turns on the given relay number at a specific time, and I need it to do this only once per given relay. If I use an 'and' between the conditions, it only turns on the first relay that matches the current time, then waits for the next hour and turns on the next given relay.
Could someone suggest something to fix this issue? Thank you!
def schedule():
    metadata, sched = dbx.files_download(path=RELAYSCHEDULE)
    if not sched.content:
        pass  # If file is empty then exit routine
    else:
        relaySchedule = str(sched.content)
        commaNum = relaySchedule.count(',')
        data1 = relaySchedule.split(',')
        for i in range(commaNum):
            data2 = data1[i].split('-')
            Time1 = data2[1]
            currentRN = data2[0]
            currentDT = datetime.datetime.now()
            currentHR = currentDT.hour
            global RN
            global T
            if str(currentHR) == str(Time1):
                if T != currentHR or RN != currentRN:
                    relaynum = int(data2[0])
                    relaytime = int(data2[2])
                    T = currentHR
                    RN = currentRN
                    k = threading.Thread(target=SendToRelay(relaynum, relaytime)).start()
                else:
                    print("Pass")
Desired Inputs:
sched.content = '1-19-10,3-9-20,4-9-10,'
T = ' '
RN = ' '
T and RN are global variables because the loop runs indefinitely; they let the loop know whether the specific Time (T) and Relay Number (RN) have already been used.
Desired Outputs:
If the time is 9 AM then,
T = 9
RN should be whatever the given relay number is, so RN = 3, but I am not sure this is the right thing to use.
Sorry if this is confusing. I basically need the program to read a set of scheduled times at which specific relays should turn on. It should read the current time, and if it matches a time in the schedule, check which relay falls within that time and turn it on for however long is specified. Once it has completed that, it needs to go over the same data again in case there is another relay within the same time that also needs to turn on. The issue is that if I don't use the T and RN variables to check whether a previous relay has already been set, it will read the file and turn on the same relay over and over.
Try printing all the variables you use and check that everything is what you think it is. On top of that, whitespace characters sometimes cause problems with comparisons.
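For example, printing with repr() makes stray whitespace visible (using the variable names from the code above):

print(repr(str(currentHR)), repr(Time1))  # e.g. '9' vs. '9 ' reveals a trailing space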
I fixed it. For anyone wondering, this is the new working code:
def schedule():
    metadata, sched = dbx.files_download(path=RELAYSCHEDULE)
    if not sched.content:
        pass  # If file is empty then exit routine
    else:
        relaySchedule = str(sched.content)
        commaNum = relaySchedule.count(',')
        data1 = relaySchedule.split(',')
        for i in range(commaNum):
            data2 = data1[i].split('-')
            TimeSched = data2[1]
            relaySched = data2[0]
            currentDT = datetime.datetime.now()
            currentHR = currentDT.hour
            global RN
            global T
            if str(currentHR) == str(TimeSched):
                if str(T) != str(currentHR):
                    RN = ''
                    T = currentHR
                if str(relaySched) not in str(RN):
                    relaynum = int(data2[0])
                    relaytime = int(data2[2])
                    # pass the function and its arguments separately so the call actually runs in the thread
                    k = threading.Thread(target=SendToRelay, args=(relaynum, relaytime)).start()
                    RN = str(RN) + str(relaySched)
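For what it is worth, a set of already-fired (hour, relay) pairs can replace the string bookkeeping in T and RN. The following is only a sketch with made-up names, assuming the SendToRelay helper from the code above exists; a real version would also clear the set when the hour rolls over:

import datetime
import threading

fired = set()  # (hour, relay) pairs that have already been handled

def schedule_once(schedule_str):
    hour = datetime.datetime.now().hour
    for entry in schedule_str.strip(',').split(','):
        relay, start_hour, duration = entry.split('-')
        if int(start_hour) == hour and (hour, relay) not in fired:
            threading.Thread(target=SendToRelay, args=(int(relay), int(duration))).start()
            fired.add((hour, relay))

schedule_once('1-19-10,3-9-20,4-9-10,')  # same format as sched.content in the question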
OK, so for a problem I must parse two files of information and then compare them. I need to report any inconsistent data and any data that is in one file but not the other. To do this I have sorted both lists of data, which lets me compare the first element of each list: if they are equal I remove both; if they are inconsistent I report and remove both; if one is ahead of the other I report which one is missing data and then PUT BACK the later entry so it can be compared next time.
As you can see in my code (at least I think so, and I have tested extensively), this method works well for data sets with 100-200 lines per file. When they get larger, like 1,000-1,000,000 lines, it takes too long to report.
I am stumped at how my while loop would be causing this. See below.
The split represents the date (split[0]) and then a piece of information (split[1]).
Any help would be appreciated; this is actually my first Python program.
tl;dr: For some reason my program works fine on small data sets, but on large data sets it takes far too long. It isn't the sort() calls either (i.e. something in my first while loop is blowing up the run time).
ws1.sort()
ws2.sort()

while ws1 and ws2:
    currItem1 = ws1.pop(0)
    currItem2 = ws2.pop(0)
    if currItem1 == currItem2:
        continue
    splitWS1 = currItem1.split()
    splitWS2 = currItem2.split()
    if splitWS1[0] == splitWS2[0] and splitWS1[1] != splitWS2[1]:
        print("Inconsistent Data (" + splitWS1[0] + "): A: " + splitWS1[1] + " B: " + splitWS2[1])
        continue
    elif splitWS1[0] < splitWS2[0]:
        print("Missing Data (" + splitWS1[0] + ") in data set A but not in B")
        ws2.insert(0, currItem2)
        continue
    elif splitWS1[0] > splitWS2[0]:
        print("Missing Data (" + splitWS2[0] + ") in data set B but not in A")
        ws1.insert(0, currItem1)
        continue

while ws2:
    currItem2 = ws2.pop(0)
    splitWS2 = currItem2.split()
    print("Missing data (" + splitWS2[0] + ") in data set B but not in A")

while ws1:
    currItem1 = ws1.pop(0)
    splitWS1 = currItem1.split()
    print("Missing data (" + splitWS1[0] + ") in data set A but not in B")
It's likely these two lines:
currItem1 = ws1.pop(0)
currItem2 = ws2.pop(0)
Because you are removing the first item of each list, Python has to shift every remaining element down by one. If instead you build (outside the loop):
listA = list(reversed(sorted(ws1)))
and then process in the loop with
currItem1 = listA.pop()
doing the same for the second list, you may save a lot of processing time.
Basically, removing the first item in a list is O(n), while removing the last item in the list is O(1). Doing this within the loop means it is O(n^2), but if you reverse the list once beforehand, and then remove the last item in the list, it's O(n).
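Alternatively, collections.deque gives O(1) pops from the left end, so the comparison loop can keep its current shape. A minimal sketch with made-up sample lines:

from collections import deque

ws1 = deque(sorted(["2020-01-01 A", "2020-01-02 B"]))
ws2 = deque(sorted(["2020-01-01 A", "2020-01-03 C"]))

while ws1 and ws2:
    currItem1 = ws1.popleft()   # O(1); unlike list.pop(0), this does not shift the remaining items
    currItem2 = ws2.popleft()
    # ... same comparison and reporting logic as in the question ...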
I'm trying to create a simulation where there are two printers, and I want to find the average wait time for each. I'm using a class for the printer and a class for the task in my program. Basically, I'm appending the wait time of each task in the simulation to a list and calculating the average. My issue is that I'm getting a division-by-zero error, so nothing is being appended. When I try it with one printer (which is essentially the same thing) I have no issues. Here is the code I have for the second printer. I'm using a queue for this.
if printers == 2:
    for currentSecond in range(numSeconds):
        if newPrintTask():
            task = Task(currentSecond, minSize, maxSize)
            printQueue.enqueue(task)
        if (not labPrinter1.busy()) and (not labPrinter2.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter1.startNext(nexttask)
        elif (not labPrinter1.busy()) and (labPrinter2.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter1.startNext(nexttask)
        elif (not labPrinter2.busy()) and (labPrinter1.busy()) and \
                (not printQueue.is_empty()):
            nexttask = printQueue.dequeue()
            waitingtimes.append(nexttask.waitTime(currentSecond))
            labPrinter2.startNext(nexttask)
        labPrinter1.tick()
        labPrinter2.tick()
        averageWait = sum(waitingtimes)/len(waitingtimes)
        outfile.write("Average Wait %6.2f secs %3d tasks remaining." \
                      % (averageWait, printQueue.size()))
Any assistance would be great!
Edit: I should mention that this happens no matter the values. I could have a page range of 99-100 and a PPM of 1, yet I still get a division by zero.
I think your problem stems from an empty waitingtimes on the first iteration or so. If there is no print job in the queue, and there has never been a waiting time inserted, you are going to reach the bottom of the loop with waitingtimes==[] (empty), and then do:
sum(waitingtimes) / len(waitingtimes)
Which will be
sum([]) / len([])
Which is
0 / 0
The easiest way to deal with this would just be to check for it, or catch it:
if not waitingtimes:
    averageWait = 0
else:
    averageWait = sum(waitingtimes)/len(waitingtimes)
Or:
try:
    averageWait = sum(waitingtimes)/len(waitingtimes)
except ZeroDivisionError:
    averageWait = 0
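Or, as a one-line conditional expression over the same waitingtimes list:

averageWait = sum(waitingtimes)/len(waitingtimes) if waitingtimes else 0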
I'm writing a script to download videos from a website. I've added a report hook to get download progress. So, far it shows the percentage and size of the downloaded data. I thought it'd be interesting to add download speed and eta.
The problem is, if I use a simple speed = chunk_size/time, the speeds shown are accurate enough but jump around like crazy. So I've used the history of the time taken to download individual chunks, something like speed = chunk_size*n/sum(n_time_history).
Now it shows a stable download speed, but it is most certainly wrong, because its value is a few bits/s while the downloaded file visibly grows at a much faster pace.
Can somebody tell me where I'm going wrong?
Here's my code.
def dlProgress(count, blockSize, totalSize):
    global init_count
    global time_history
    try:
        time_history.append(time.monotonic())
    except NameError:
        time_history = [time.monotonic()]
    try:
        init_count
    except NameError:
        init_count = count
    percent = count*blockSize*100/totalSize
    dl, dlu = unitsize(count*blockSize)  # returns size in kB, MB, GB, etc.
    tdl, tdlu = unitsize(totalSize)
    count -= init_count  # because continuation of partial downloads is supported
    if count > 0:
        n = 5  # length of time history to consider
        _count = n if count > n else count
        time_history = time_history[-_count:]
        time_diff = [i-j for i, j in zip(time_history[1:], time_history[:-1])]
        speed = blockSize*_count / sum(time_diff)
    else:
        speed = 0
    n = int(percent//4)
    try:
        eta = format_time((totalSize-blockSize*(count+1))//speed)
    except:
        eta = '>1 day'
    speed, speedu = unitsize(speed, True)  # returns speed in B/s, kB/s, MB/s, etc.
    # format() keeps the numeric values out of the string concatenation
    sys.stdout.write("\r{:.0f}% |{}{}| {}{}/{}{} {}{} {}".format(
        percent, "#"*n, " "*(25-n), dl, dlu, tdl, tdlu, speed, speedu, eta))
    sys.stdout.flush()
Edit:
Corrected the logic. Download speed shown is now much better.
As I increase the length of history used to calculate the speed, the stability increases but sudden changes in speed (if download stops, etc.) aren't shown.
How do I make it stable, yet sensitive to large changes?
I realize the question is now more math oriented, but it'd be great if somebody could help me out or point me in the right direction.
Also, please do tell me if there's a more efficient way to accomplish this.
_count = n if count > n else count
time_history = time_history[-_count:]
time_weights = list(range(1,len(time_history))) #just a simple linear weights
time_diff = [(i-j)*k for i, j, k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
To make it more stable and not react when download spikes up or down you could add this as well:
_count = n if count > n else count
time_history = time_history[-_count:]
time_history.remove(min(time_history))
time_history.remove(max(time_history))
time_weights = list(range(1, len(time_history))) #just a simple linear weights
time_diff = [(i-j)*k for i, j, k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
This will remove the highest and lowest spikes in time_history, which will make the displayed number more stable. If you want to be picky, you could generate the weights before the removal and then filter the mapped values using time_diff.index(min(time_diff)).
Also, using a non-linear function (like sqrt()) to generate the weights will give you better results. Oh, and as I said in the comments: adding statistical methods to filter the times should be marginally better, but I suspect it's not worth the overhead it would add.
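To illustrate the sqrt() idea, here is a small variation of the snippet above (same variable names); the slower-growing weights keep older samples relevant while recent chunks still dominate:

import math

time_weights = [math.sqrt(k) for k in range(1, len(time_history))]
time_diff = [(i-j)*k for i, j, k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)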
I am a fairly new programmer and have been learning Python for a few months. For the last two weeks I have been writing a script that searches permutations of numbers that form magic squares.
I finally succeeded in finding all 880 4x4 magic square number sets within 30 seconds. After that I wrote a different Perimeter Magic Square program. It finds more than 10,000,000 permutations, so I want to store them to files part by part. The problem is that my program doesn't use all of my CPU: while it is busy storing partial data to a file, it stops searching for new number sets. I would like one process to keep searching while the others store the found data to files.
The following has a similar structure to my magic square program.
while True:
    print('How many digits do you want? (more than 20): ', end='')
    ansr = input()
    if ansr.isdigit() and int(ansr) > 20:
        ansr = int(ansr)
        break
    else:
        continue

fileNum = 0
itemCount = 0

def fileMaker():
    global fileNum, itemCount
    tempStr = ''
    for i in permutationList:
        itemCount += 1
        tempStr += str(sum(i[:3])) + ' : ' + str(i) + ' : ' + str(itemCount) + '\n'
    fileNum += 1
    file = open('{0} Permutations {1:03}.txt'.format(ansr, fileNum), 'w')
    file.write(tempStr)
    file.close()

numList = [i for i in range(1, ansr+1)]
permutationList = []
itemCount = 0

def makePermutList(numList, ansr):
    global permutationList
    for i in numList:
        numList1 = numList[:]
        numList1.remove(i)
        for ii in numList1:
            numList2 = numList1[:]
            numList2.remove(ii)
            for iii in numList2:
                numList3 = numList2[:]
                numList3.remove(iii)
                for iiii in numList3:
                    numList4 = numList3[:]
                    numList4.remove(iiii)
                    for v in numList4:
                        permutationList.append([i, ii, iii, iiii, v])
                        if len(permutationList) == 200000:
                            print(permutationList[-1])
                            fileMaker()
                            permutationList = []
    fileMaker()

makePermutList(numList, ansr)
I added from multiprocessing import Pool at the top and replaced the two fileMaker() calls at the end with the following:
if __name__ == '__main__':
    workers = Pool(processes=2)
    workers.map(fileMaker, ())
The result? Oh no. It just works awkwardly. For now, multiprocessing looks too difficult for me.
Anybody, please, teach me something. How should my code be modified?
Well, let me address some things that are bugging me before getting to the question you actually asked.
numList = [i for i in range(1, ansr+1)]
I know list comprehensions are cool, but please just do list(range(1, ansr+1)) if you need the iterable to be a list (which you probably don't need, but I digress).
def makePermutList(numList, ansr):
...
This is quite the hack. Is there a reason you can't use itertools.permutations(numList,n)? It's certainly going to be faster, and friendlier on memory.
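For example, a minimal sketch (assuming ansr = 24 as an illustration) of what that replacement could look like:

import itertools

numList = list(range(1, 25))  # e.g. ansr = 24
perms = itertools.permutations(numList, 5)  # lazy: nothing is built up front
print(next(perms))  # (1, 2, 3, 4, 5)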
Lastly, answering your question: if you are looking to improve i/o performance, the last thing you should do is make it multithreaded. I don't mean you shouldn't do it, I mean that it should literally be the last thing you do. Refactor/improve other things first.
You need to take all of that top-level code that uses globals, apply the backspace key to it, and rewrite it as functions that pass data around properly. Then you can think about using threads. I would personally use from threading import Thread and manually spawn Threads to do each unit of I/O rather than using multiprocessing.
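A rough sketch of that suggestion, combining itertools.permutations with one writer thread per chunk; the function names and the chunk size are mine, for illustration only:

import itertools
import threading

def write_chunk(chunk, file_num, ansr):
    # runs in a background thread so the search loop can keep going
    lines = (str(sum(p[:3])) + ' : ' + str(p) + '\n' for p in chunk)
    with open('{0} Permutations {1:03}.txt'.format(ansr, file_num), 'w') as f:
        f.writelines(lines)

def search(ansr, chunk_size=200000):
    threads = []
    chunk = []
    file_num = 0
    for perm in itertools.permutations(range(1, ansr + 1), 5):
        chunk.append(perm)
        if len(chunk) == chunk_size:
            file_num += 1
            t = threading.Thread(target=write_chunk, args=(chunk, file_num, ansr))
            t.start()
            threads.append(t)
            chunk = []  # start a fresh list; the thread keeps its own reference
    if chunk:  # flush whatever is left at the end
        file_num += 1
        t = threading.Thread(target=write_chunk, args=(chunk, file_num, ansr))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

if __name__ == '__main__':
    search(21)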