How do I speed up incredibly slow iteration through pandas dataframe?

I have collected 500 ms stock data for several weeks. Now I want to go through each day's data and determine, for any given last price value, how many times a specific lower bound is passed and then followed by a specific upper bound being passed during the rest of the day. The lower bound has to be passed before the script starts searching for whether the upper bound will be reached.
So essentially, if the last price at row i is 10, the lower and upper bounds might be 9 and 11. The code first looks through the remaining rows i+1, i+2, ... for a moment when the lower bound is reached. As soon as the lower bound is reached, the code switches to looking for when the upper bound is reached. If the upper bound is reached, the success count increases by 1 and the code starts looking for the lower bound again, repeating the whole process.
This entire process occurs for every single row, so each row gets a column holding how many times a successful lower-then-upper bound reach occurred in the rows following that row.
The problem is that I have about 14400 rows per day and about 40 days, so around 576000 rows of data. The iteration takes absolutely forever, and to do this across all of my data my computer would need to run for a few days. Surely I am not doing this in the most efficient way possible? Can anybody point me to a concept that would let me rewrite this code in a much more efficient way? Or am I just stuck waiting forever for it to prepare my data?
range_per = .00069 #the percentage determining lower and upper bound
data['Success?']=np.nan
data['Success? Count']=np.nan
#For every row count how many times the trade in the range would be successful
for i in range(0,len(data)):
    last_price = data.at[i,'lastPrice']
    lower_bound = last_price - last_price*range_per
    upper_bound = last_price + last_price*range_per
    lower_bound_reached = False
    upper_bound_reached = False
    success=0
    for b in range(i+1,len(data)):
        last_price = data.at[b,'lastPrice']
        while lower_bound_reached == False:
            if lower_bound - last_price >=0:
                upper_bound_reached = False
                lower_bound_reached = True
            else:
                break
        while (upper_bound_reached == False and lower_bound_reached ==True):
            if upper_bound - last_price <=0:
                success+=1
                lower_bound_reached = False
                upper_bound_reached = True
            else:
                break
    print('row %s: %s times' %(i, success))
    data['Success? Count'][i] = success
    if success>0:
        data['Success?'][i] = True
    else:
        data['Success?'][i] = False
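One concept that may help is to pull the price column out of the DataFrame into a NumPy array and let vectorized comparisons find the bound crossings, instead of reading every cell with data.at inside two Python loops. The following is only a minimal sketch under assumptions: the function name count_round_trips is made up for illustration, prices are assumed positive, range_per is assumed greater than 0, and the result is meant to match the counting logic of the loop above rather than to be a tuned implementation.

```python
import numpy as np

def count_round_trips(prices, range_per):
    """For each row i, count how often the later prices first reach the
    lower bound and afterwards reach the upper bound (bounds relative to row i)."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    counts = np.zeros(n, dtype=np.int64)
    for i in range(n):
        lower = prices[i] - prices[i] * range_per
        upper = prices[i] + prices[i] * range_per
        rest = prices[i + 1:]
        below = np.flatnonzero(rest <= lower)  # sorted positions hitting the lower bound
        above = np.flatnonzero(rest >= upper)  # sorted positions hitting the upper bound
        pos = 0  # first position in `rest` that has not been consumed yet
        while True:
            k = np.searchsorted(below, pos)  # next lower-bound hit at or after pos
            if k == len(below):
                break
            pos = below[k] + 1
            k = np.searchsorted(above, pos)  # next upper-bound hit after that
            if k == len(above):
                break
            pos = above[k] + 1
            counts[i] += 1
    return counts

# Tiny made-up example: bounds for row 0 are 9.5 and 10.5
counts = count_round_trips([10, 9, 11, 9, 11, 10], 0.05)
print(counts)  # row 0 sees two lower-then-upper round trips
```

The work per row is still proportional to the number of remaining rows, but it happens inside NumPy. Since the question limits the search to the rest of the day, the function would presumably be applied to each day's prices separately, for example via a groupby on a date column (that column is an assumption, it is not shown in the question).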

Related

Checking Measurement Data Error - 'If only Constants arrive'

Example:
If the photovoltaic value stays constant over a certain period, something is wrong, because solar irradiance fluctuates a lot. So one would want to recognize this behavioural pattern in order to, e.g., restart the system.
old_pv = []
while True:
    try:
        count = 1
        sunny_scraper.driver.current_url
        new_pv = sunny_scraper.scrape()
        old_pv.append(new_pv)
        count += 1
        if count in [19, 20]:
            if len(list(set(old_pv))) == 1:
                sunny_scraper.start_browser()
                sunny_scraper.accept_cookies()
                sunny_scraper.enter_email_password()
                new_pv = sunny_scraper.scrape()
                time.sleep(5)
            old_pv = []
            old_pv.append(new_pv)
        else:
            time.sleep(10)
    except:
        time.sleep(120)
        sunny_scraper.start_browser()
        sunny_scraper.accept_cookies()
        sunny_scraper.enter_email_password()
        sunny_scraper.scrape()
        time.sleep(10)
pv - Photovoltaic (solar power)
The pv value is scraped at high resolution (~10 s). I work in the solar industry on power prediction using an all-sky imager. Sometimes the value freezes (reason unclear), so restarting the script is the current idea.
What other implementation options would be conceivable and useful? I would be happy for inspiration.
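One possible option is to keep only the most recent readings in a fixed-size window and flag the stream as frozen when every value in the window is identical. The sketch below assumes this is an acceptable definition of "frozen". The class name FreezeDetector and the window size are made up for illustration, and the scraper itself is left out.

```python
from collections import deque

class FreezeDetector:
    """Reports a frozen stream when the last `window` readings are all identical."""

    def __init__(self, window=20):
        # deque with maxlen drops the oldest reading automatically
        self._recent = deque(maxlen=window)

    def update(self, value):
        """Record a reading and return True if the window is full and constant."""
        self._recent.append(value)
        window_full = len(self._recent) == self._recent.maxlen
        return window_full and len(set(self._recent)) == 1

    def reset(self):
        """Forget old readings, e.g. right after restarting the scraper."""
        self._recent.clear()

# Made-up readings: the value 5.1 repeats three times in a row
detector = FreezeDetector(window=3)
flags = [detector.update(r) for r in [5.0, 5.1, 5.1, 5.1, 5.2]]
print(flags)  # only the fourth reading completes a constant window
```

In the scraping loop, the restart steps would then run whenever update() returns True, followed by reset(), which avoids the manual counter and list handling.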

Function returning different result despite the same inputs in Python

Here is my function that uses the Poloniex Exchange API. It gets a dict of asks (tuples of price and amount) and then calculates the total amount of BTC that would be obtained using a given spend.
But running the function several times returns different amounts despite the dict of asks and spend remaining the same. This problem should be replicable by printing "asks" (defined below) and the function result several times.
def findBuyAmount(spend):
    #getOrderBook
    URL = "https://poloniex.com/public?command=returnOrderBook&currencyPair=USDT_BTC&depth=20"
    #request the bids and asks (returns nested dict)
    r_ab = requests.get(url = URL)
    # extracting data in json format -> returns a dict in this case!
    ab_data = r_ab.json()
    asks = ab_data.get('asks',[])
    #convert strings into decimals
    asks=[[float(elem[0]), elem[1]] for elem in asks]
    amount=0
    for elem in asks: #each elem is a tuple of price and amount
        if spend > 0:
            if elem[1]*elem[0] > spend: #check if the ask exceeds volume of our spend
                amount = amount+((elem[1]/elem[0])*spend) #BTC that would be obtained using our spend at this price
                spend = 0 #spend has been used entirely, leading to a loop break
            if elem[1]*elem[0] < spend: #check if the spend exceeds the current ask
                amount = amount + elem[1] #BTC that would be obtained using some of our spend at this price
                spend = spend - elem[1]*elem[0] #remainder
        else:
            break
    return amount
If the first ask in the asks dict was [51508.93591717, 0.62723766] and spend was 1000, I would expect amount to equal (0.62723766/51508.93591717) * 1000 but I get all kinds of varied outputs instead. How can I fix this?

You get all kinds of varied outputs because you're fetching new data every time you run the function. Split the fetch and the calculation into separate functions so you can test them independently. You can also make the logic much clearer by naming your variables properly:
import requests
def get_asks(url="https://poloniex.com/public?command=returnOrderBook&currencyPair=USDT_BTC&depth=20"):
    response = requests.get(url=url)
    ab_data = response.json()
    asks = ab_data.get('asks', [])
    #convert the price strings into floats
    return [(float(price), qty) for price, qty in asks]
def find_buy_amount(spend, asks):
    amount = 0
    for price, qty in asks:
        if spend > 0:
            ask_value = price * qty
            if ask_value >= spend:
                amount += spend / price
                spend = 0
            else:
                amount += qty
                spend -= ask_value
        else:
            break
    return amount
asks = get_asks()
print("Asks:", asks)
print("Buy: ", find_buy_amount(1000, asks))
Your math was wrong for when the ask value exceeds remaining spend; the quantity on the order book doesn't matter at that point, so the amount you can buy is just spend / price.
With the functions split up, you can also run find_buy_amount any number of times with the same order book and see that the result is, in fact, always the same.
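To see this without touching the network, the calculation can be run against a hard-coded order book. The sketch below repeats find_buy_amount so that it is self-contained. The prices and quantities are made up for illustration and are not real Poloniex data.

```python
def find_buy_amount(spend, asks):
    """Return the BTC obtained by spending `spend` against (price, qty) asks."""
    amount = 0
    for price, qty in asks:
        if spend <= 0:
            break
        ask_value = price * qty
        if ask_value >= spend:
            # The ask is larger than what is left to spend: buy spend / price
            amount += spend / price
            spend = 0
        else:
            # The ask is smaller: take all of it and carry the remainder over
            amount += qty
            spend -= ask_value
    return amount

# Hypothetical order book: 0.01 BTC at 50000, then 1 BTC at 51000
fixed_asks = [(50000.0, 0.01), (51000.0, 1.0)]
first = find_buy_amount(1000, fixed_asks)
second = find_buy_amount(1000, fixed_asks)
print(first, second)  # both calls print the same amount
```

The first ask costs 500, so it is bought in full (0.01 BTC). The remaining 500 buys 500 / 51000 BTC from the second ask.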

The problem is in your "we don't have enough money" path. In that case, the amount you can buy does not depend on the amount that was offered.
if elem[1]*elem[0] > spend:
    amount += spend/elem[0]

How to break a while, when subtracting the new value from a variable from the previous one?

The code is:
Calc = (PTO2.h - coefficients[1]) / coefficients[0]
while True:
    NewP = IAPWS97(P=PH5, s=Calc)
    Calc = (NewP.h - coefficients[1]) / coefficients[0]
    print(NewP.h)
The results are the following:
3181.2423174700475, 3125.5329929699737, 3145.170908432667,
3138.216970209225, 3140.675480138904, 3139.805801319479,
3140.1133819014494, 3140.0045917261796, 3140.043069467109,
3140.029460245017, 3140.034273686281, 3140.032571219946,
3140.033173365131
The idea is to stop once the value no longer changes noticeably, i.e. 3140 should be the final value.
That would take only 5 or 6 iterations.

Please check if this meets your requirements:
# Define required difference
difference = 0.1
solutions = []
# Add 1st solution or guess
# func is whatever function you are using, with the required arguments, arg.
solutions.append(func(arg))
delta = solutions[-1]
# Check if delta is still positive and if the delta meets the requirements
while delta > 0 and delta > difference:
    solutions.append(func(arg))
    delta = solutions[-2] - solutions[-1]
print(f'Final result is: {solutions[-1]}')
This solution supposes that you want to finish execution if the variation starts to go negative. If the sign doesn't matter, you can use abs() in delta.
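Because the values in the question oscillate around the final result, the abs() variant is probably the one that applies here. The following is a minimal sketch of that idea as a generic fixed-point loop. The helper name iterate_until_stable, the tolerance and the iteration cap are assumptions, and math.cos only stands in for the IAPWS97 step so that the example runs on its own.

```python
import math

def iterate_until_stable(step, start, tol=1e-6, max_iter=100):
    """Apply `step` repeatedly until two successive values differ by less than `tol`."""
    value = start
    for n_iter in range(1, max_iter + 1):
        new_value = step(value)
        if abs(new_value - value) < tol:
            return new_value, n_iter
        value = new_value
    raise RuntimeError("No convergence within max_iter iterations")

# Stand-in iteration x -> cos(x); it oscillates towards its fixed point
result, n_iter = iterate_until_stable(math.cos, 1.0)
print(result, n_iter)
```

For the question's loop, step would wrap the two lines inside the while (compute NewP, then the new Calc), and tol would be set to the precision needed, for example 0.1 if only 3140 is of interest.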

Pseudorandomisation with python

I currently have a problem with pseudorandomizing my trials. I am using a while loop in order to create 12 files containing 38 rows (or trials) that match 1 criterion:
1) max color1expl cannot be identical in 3 consecutive rows
Where color1expl is one of the columns in my dataframe.
When I have files of only 38 rows to create, the following script seems to work perfectly.
import pandas as pd
n_dataset_int = 0
n_dataset = str(n_dataset_int)
df_all_possible_trials = pd.read_excel('GroupF' + n_dataset + '.xlsx') # this is my dataset with all possible trials
# creating the files
for iterations in range(0,12): #I need 12 files with pseudorandom combinations
    n_dataset_int += 1 #this keeps track of the number of iterations
    n_dataset = str(n_dataset_int)
    df_experiment = df_all_possible_trials.sample(n=38) #38 is the total number of trials
    df_experiment.reset_index(drop=True, inplace=True)
    #max color1expl cannot be identical in 3 consecutive trials (maximum in 2 consecutive t.)
    randomized = False
    while not randomized: #this while loop reshuffles the rows of the dataframe on every pass
        experimental_df_2 = df_experiment.sample(frac=1).reset_index(drop=True)
        for i in range(0, len(experimental_df_2)):
            try:
                if i == len(experimental_df_2) - 1:
                    randomized = True
                elif (experimental_df_2['color1expl'][i] != experimental_df_2['color1expl'][i+1]) or (experimental_df_2['color1expl'][i] != experimental_df_2['color1expl'][i+2]):
                    continue
                elif (experimental_df_2['color1expl'][i] == experimental_df_2['color1expl'][i+1]) and (experimental_df_2['color1expl'][i] == experimental_df_2['color1expl'][i+2]):
                    break
            except:
                pass
    #export the excel file
    experimental_df_2.to_excel('GroupF_r' + n_dataset + '.xlsx', index=False) #creates a new
However, when I follow the same procedure but increase the number of rows from n=38 to n=228, the script seems to run indefinitely. So far it has run for more than one day without producing any of the 12 files, probably because there are too many combinations to try.
Is there a way to improve this script so that it works with a larger amount of rows?

I think you can change the way you generate the random samples (pseudo-code):
n = 38 # or anything else
my_sample = []
my_sample.append( pop_one_random_from(df_all_possible_trials) )
my_sample.append( pop_one_random_from(df_all_possible_trials) )
while len(my_sample) < n:
    next_one = pop_one_random_from(df_all_possible_trials)
    if next_one is equal to my_sample[-1] and my_sample[-2]:
        put next_one back to df_all_possible_trials
        continue
    else:
        my_sample.append( next_one )
If I get it right, all the different samples (the total number of combinations being 'len(df_all_possible_trials) choose n') have the same probability of being chosen, which is what you're looking for. And it should run faster.
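A runnable variation of this idea, written as a sketch under assumptions: instead of drawing and putting back, it builds the order one trial at a time from the candidates that would not create three identical values in a row, and it starts over if it runs into a dead end. The function name shuffle_without_triples and the restart limit are made up, plain dicts stand in for DataFrame rows, and no claim is made that every valid order is equally likely.

```python
import random

def shuffle_without_triples(items, key=lambda x: x, max_restarts=1000, rng=random):
    """Return `items` in random order with no three consecutive equal keys."""
    for _ in range(max_restarts):
        pool = list(items)
        rng.shuffle(pool)
        ordered = []
        while pool:
            # Indices of candidates that do not complete a run of three equal keys
            allowed = [
                idx for idx, item in enumerate(pool)
                if len(ordered) < 2
                or not (key(item) == key(ordered[-1]) == key(ordered[-2]))
            ]
            if not allowed:
                break  # dead end: only forbidden trials are left, so restart
            ordered.append(pool.pop(rng.choice(allowed)))
        else:
            return ordered  # pool emptied without hitting a dead end
    raise RuntimeError("Could not find a valid order within max_restarts attempts")

# Made-up trials; 'color1expl' mirrors the column name from the question
trials = [{"color1expl": c} for c in ["red"] * 4 + ["green"] * 4 + ["blue"] * 4]
order = shuffle_without_triples(trials, key=lambda t: t["color1expl"])
print([t["color1expl"] for t in order])
```

With a DataFrame, the rows could be passed in via df.to_dict("records") and turned back into a DataFrame afterwards. Because each step only considers valid candidates, the running time should grow with the number of rows rather than with the number of possible shuffles.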

How to jump out the current while loop and run the next loop whenever meeting a certain condition?

The Python script that I am using is not exactly as below, but I just want to show the logic. Let me explain what I am trying to do: I have a database, and I want to fetch one population (a number A) at a time from the database until all numbers are fetched. Each time a number is fetched, the script does some calculations (A to B) and stores the result in C. After all numbers are fetched, the database is updated with C. The while condition just works like a 'switch'.
The thing is that I don't want to fetch a negative number, so when the script does fetch one, I want to immediately jump out of the current loop and get the next number, until it is not negative. I am a Python beginner. The following script is what I could write, but clearly it doesn't work. I think something like continue, break or try+except should be used here, but I have no idea how.
for _ in range(db_size):
    condition = True
    while condition:
        # Get a number from the database
        A = db.get_new_number()
        # Regenerate a new number if A is negative
        if A < 0:
            A = db.get_new_number()
        B = myfunc1(A)
        if B is None:
            continue
        C=myfunc2(B)
        db.update(C)

Use a while loop that repeats until the condition is met.
for _ in range(db_size):
    condition = True
    while condition:
        # Get a number from the database
        while True:
            A = db.get_new_number()
            if A is None:
                raise Exception("Ran out of numbers!")
            # Regenerate a new number if A is negative
            if A >= 0:
                break
        B = myfunc1(A)
        if B is None:
            continue
        C=myfunc2(B)
        db.update(C)
My code assumes that db.get_new_number() returns None when it runs out. Another possibility would be for it to raise an exception itself, then you don't need that check here.
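The inner retry loop can also be tried out in isolation. The sketch below uses a made-up FakeDB class as a stand-in for the real database object, with the same assumption that get_new_number() returns None once it runs out. The helper name next_non_negative is hypothetical as well.

```python
class FakeDB:
    """Hypothetical stand-in for the database; hands out numbers one at a time."""

    def __init__(self, numbers):
        self._numbers = iter(numbers)

    def get_new_number(self):
        # Returns None when there are no numbers left (assumed behaviour)
        return next(self._numbers, None)

def next_non_negative(db):
    """Keep fetching until a non-negative number arrives."""
    while True:
        number = db.get_new_number()
        if number is None:
            raise LookupError("Ran out of numbers!")
        if number >= 0:
            return number

db = FakeDB([-3, -1, 4, -2, 7])
first = next_non_negative(db)   # skips -3 and -1
second = next_non_negative(db)  # skips -2
print(first, second)
```

Moving the retry into its own function keeps the outer loop free of nested while statements.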
