I'm creating a simple trading backtester on Bitcoin, but I'm having trouble with the for loops in my code. The current code is based on two simple moving averages, q and z (no real strategy yet, just for learning purposes). info is a DataFrame column holding Bitcoin historical data from a CSV file. There seems to be an out-of-bounds error and I can't figure it out. Any help would be appreciated.
import pandas as pd
import numpy as np
cash = 10000
file = 'BTC-USD.csv'
data = pd.read_csv(file)
y = data['Adj Close'][1000:]
x = data['Date'][1000:]
v = data['Volume'][1000:]
h = data['High'][1000:]
l = data['Low'][1000:]
def movAvg(values, time):
    times = np.repeat(1.0, time) / time
    # 'valid' mode returns len(values) - time + 1 points
    sma = np.convolve(values, times, 'valid')
    return sma
z = movAvg(y,12)
q = movAvg(y,9)
SP = len(x[50-1:])
def AlgoCal(account, info):
    #i = 1050
    bought = False
    test = []
    for x in info.index:
        if q[x] < z[x]:
            if bought == False:
                temp = info[x]
                account = account - info[x]
                test.append(account)
                bought = True
        elif q[x] > z[x]:
            if bought == True:
                temp = info[x]
                account = account + info[x]
                test.append(account)
                bought = False
        else:
            print("Error")
    return test
money = AlgoCal(cash,y)
print(money)
Sample data from the Yahoo Finance Bitcoin CSV:
Date,Open,High,Low,Close,Adj Close,Volume
2014-09-17,465.864014,468.174011,452.421997,457.334015,457.334015,21056800
2014-09-18,456.859985,456.859985,413.104004,424.440002,424.440002,34483200
........
........
2020-05-21,9522.740234,9555.242188,8869.930664,9081.761719,9081.761719,39326160532
2020-05-22,9080.334961,9232.936523,9008.638672,9182.577148,9182.577148,29810773699
2020-05-23,9185.062500,9302.501953,9118.108398,9209.287109,9209.287109,27727866812
2020-05-24,9196.930664,9268.914063,9165.896484,9268.914063,9268.914063,27658280960
Error:
Traceback (most recent call last):
File "main.py", line 47, in <module>
money = AlgoCal(cash,y)
File "main.py", line 31, in AlgoCal
if q[x]<z[x]:
IndexError: index 1066 is out of bounds for axis 0 with size 1066
Your moving averages have two different lengths. np.convolve in 'valid' mode returns len(values) - window + 1 points, so the 12-period average z is three elements shorter than the 9-period average q, and both are shorter than the price series. On top of that, your loop iterates over info.index, whose labels start at 1000 (from the [1000:] slice), while q and z are plain numpy arrays indexed from 0, so the comparison eventually runs off the end of the shorter array.
If you are going to compare moving averages this way, you need to align them first: trim both arrays so they start on the same day, and iterate by 0-based position rather than by DataFrame labels.
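A minimal sketch of that alignment (variable names follow the question; the trading logic is left as stubs):

y = data['Adj Close'][1000:].reset_index(drop=True)
z = movAvg(y, 12)              # len(y) - 11 values; z[i] covers days i..i+11
q = movAvg(y, 9)               # len(y) - 8 values;  q[j] covers days j..j+8
q = q[12 - 9:]                 # drop q's first 3 values so q[i] and z[i] end on the same day
prices = y[12 - 1:].reset_index(drop=True)  # price on the day each aligned pair ends

for i in range(len(z)):        # both arrays now have len(y) - 11 entries
    if q[i] < z[i]:
        pass                   # buy logic, using prices[i]
    elif q[i] > z[i]:
        pass                   # sell logic, using prices[i]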
I wrote some code a month ago and have been consistently running and updating it. I uploaded my most recent version to GitHub, and I know it worked because I tested it over and over before uploading. Now, with nothing changed, submitting queries no longer works: out of 150 queries only 2 succeed, whereas the data from my most recent run shows 104/150 working. Does anyone know why this might be? My code is below.
"""
Imports needed for the code.
"""
"""
Script to get and clean data
"""
import numpy as np
import pandas as pd
from itertools import chain
from astroquery.gaia import Gaia
from pynverse import inversefunc
from astropy.io import ascii
import wget
import requests
import matplotlib.pyplot as plt
import numpy as np
import math
import pandas as pd
from sklearn.metrics import r2_score
from scipy import stats
import sklearn.metrics as sm
defaults = [0] * 3#needed for ignoring values that don't exsist
data = []#array for storing data
def reject_outliers(data):  # outlier rejection function
    m = 2  # keep values within m standard deviations of the mean
    u = np.mean(data)
    s = np.std(data)
    filtered = [e for e in data if (u - m * s < e < u + m * s)]
    return filtered
def isNaN(num):  # NaN is the only value not equal to itself
    return num != num
def HMS2deg(ra='', dec=''):  # convert RA from HMS form to degrees (Gaia form)
    RA, DEC, rs, ds = '', '', 1, 1
    if ra:
        H, M, S, *_ = [float(i) for i in chain(ra.split(), defaults)]
        if str(H)[0] == '-':
            rs, H = -1, abs(H)
        deg = (H * 15) + (M / 4)
        RA = '{0}'.format(deg * rs)
    if ra and dec:
        return (RA, DEC)
    else:
        return RA or DEC
def HMS2degDEC(dec='', ra=''):  # convert Dec from DMS form to degrees (Gaia form)
    RA, DEC, rs, ds = '', '', 1, 1
    if dec:
        # S already defaults to 0 via the padding above, so no extra handling is needed
        D, M, S, *_ = [float(i) for i in chain(dec.split(), defaults)]
        if str(D)[0] == '-':
            ds, D = -1, abs(D)
        deg = D + (M / 60) + (S / 3600)
        DEC = '{0}'.format(deg * ds)
    if ra and dec:
        return (RA, DEC)
    else:
        return RA or DEC
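A quick sanity check of the two conversions (the input values here are made up):

print(HMS2deg('5 35 0'))      # 5h 35m -> 5*15 + 35/4 = '83.75'
print(HMS2degDEC('-5 23 0'))  # -5d 23m -> -(5 + 23/60) = '-5.3833...'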
count = 1
csv_file = 'test1.csv'  # data-storing file for Gaia
data = pd.read_csv(csv_file, error_bad_lines=False)  # ignore the bad lines
radata = data['R.A.']  # get RA
decdata = data['Dec.']  # get Dec
agedata = data['Age(Myr)']  # get age
diamaterdata = data['Diameter']  # get diameter, later converted to FOV
ra = []  # cleaned RA
dec = []  # cleaned Dec
age = []  # cleaned age
csv_files = ['M42.csv', 'Horsehead.csv', 'M93.csv', 'IrisTrain.csv']  # pre-existing data
ages = [3, 6, 25, 0.055]  # pre-existing data's ages
diameter = []  # cleaned diameter data
gooddata = []  # overall storage for cleaned data
for i in range(len(radata)):  # cleaning RA data and converting
    if isNaN(radata[i]):
        ra.append(0)
    else:
        ra.append(HMS2deg(radata[i]))
print(ra)
for i in range(len(decdata)):  # cleaning Dec data and converting
    if isNaN(decdata[i]):
        dec.append(0)
    else:
        dec.append(HMS2degDEC(decdata[i]))
print(dec)
for i in range(len(diamaterdata)):  # cleaning diameter data and converting to FOV
    if isNaN(diamaterdata[i]):
        diameter.append(0)
    else:
        diameter.append((diamaterdata[i] / 3600) * 100)
print(diameter)
for i in range(len(ra)):  # modified query for each object
    query1 = """ SELECT bp_rp, parallax, pmra, pmdec, phot_g_mean_mag AS gp
    FROM gaiadr2.gaia_source
    WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
    """
    query1 = query1 + " CIRCLE('ICRS'," + str(ra[i]) + "," + str(dec[i]) + "," + str(diameter[i]) + ")" + ")"
    string2 = """
    AND phot_g_mean_flux_over_error > 50
    AND phot_rp_mean_flux_over_error > 20
    AND phot_bp_mean_flux_over_error > 20
    AND visibility_periods_used > 8
    """
    print(query1)
    query1 = query1 + string2
    try:
        job = Gaia.launch_job(query1)  # launch query against the Gaia archive
        print(job)
        results = job.get_results()  # get results
        ascii.write(results, 'values' + str(count) + '.csv', format='csv', fast_writer=False)
        csv_files.append('values' + str(count) + '.csv')  # store in CSV
        ages.append(agedata[i])  # append the matching age
        print(ages)
        count += 1  # avoid re-writing the CSV file by creating different ones
    except:  # if the query throws any error (usually "can't query") the object is skipped; also filters out bad data
        continue
"""
End of Cleaning and Gathering Data
"""
"""
Training and Creating Model with the data
"""
arr2=[]
datasetY=[]
datasetX=[]
Y=[]
av=0
count=[]
count2=[]
MAD=[]
"""
def adjR(x, y, degree):
results = {}
coeffs = np.polyfit(x, y, degree)
p = np.poly1d(coeffs)
yhat = p(x)
ybar = np.sum(y)/len(y)
ssreg = np.sum((yhat-ybar)**2)
sstot = np.sum((y - ybar)**2)
results['r_squared'] = 1- (((1-(ssreg/sstot))*(len(y)-1))/(len(y)-degree-1)
return results
original accuracy calculation
"""
"""
def objective(x, a, b, c):
return a * x + b
needed for scipy modeling, polyfit was more accurate
"""
"""
Line 59-68 checks if CSV data is NAN if it is it will ignore the value and only take the data that can be used
"""
count = 0
for i in range(len(csv_files)):
    data = pd.read_csv(csv_files[i])
    arr = data['gp']
    arr2 = data['bp_rp']
    for j in range(len(arr2)):  # inner index renamed from i to avoid shadowing the outer loop
        if isNaN(arr2[j]):
            continue
        elif 13 <= arr[j] <= 19:
            datasetX.append(arr2[j])
            datasetY.append(arr[j])
            count += 1
    mad = stats.median_absolute_deviation(datasetY)   # calculate MAD for magnitude
    mad2 = stats.median_absolute_deviation(datasetX)  # calculate MAD for color
    madav = (mad + mad2) / 2  # combined MAD
    MAD.append(count)  # append to a list for training and plotting
    datasetX.clear()   # clear for the next file
    datasetY.clear()   # clear for the next file
    count = 0
"""
Plotting data and Traning
"""
ages3=[]
MAD2=[]
ages2 = [4000 if math.isnan(i) else i for i in ages]#ignore any age nan values
print(len(ages3))
print(len(MAD))
MAD=[1.5 if math.isnan(i) else i for i in MAD]#ignore any MAD computation error values
for i in range(len(MAD)):
if(-500<=MAD[i]<=1500 and -25<=ages2[i]<170 or (100<=MAD[i]<=1262) and (278<=ages2[i]<=5067) or (-20<=MAD[i]<=20) and (3900<=ages2[i]<=4100) or (2642<=MAD[i]<=4750) and (0<=ages2[i]<=200) or (7800<=MAD[i]<=315800) and (0<=ages2[i]<=20)):
continue
else:
ages3.append(float(ages2[i]))
MAD2.append(float(MAD[i]))
fig = plt.figure()
ax1 = fig.add_subplot(111)  # integer form; passing the string '111' is deprecated
ax1.scatter(ages3, MAD2, color='blue')
plt.ylim(-7800, 315800)
polyline = np.linspace(-5, 9000, 20)
mod1 = np.poly1d(np.polyfit(ages3, MAD2, 2))  # fit a polynomial of degree 2
predict = np.poly1d(mod1)
ax1.plot(polyline, mod1(polyline), color='red')
print(np.interp(0.795, mod1(polyline), polyline))
print(mod1)  # print the model
plt.show()
"""
End of Training and Creating model/End of Script
"""
Please focus on the querying section: the for i in range(len(ra)) loop above.
Thank you for your time. I know this is really unusual.
After removing the try/except, this is the error:
Traceback (most recent call last):
File "read.py", line 120, in <module>
job = Gaia.launch_job(query1)#Launch query to gaia webpage
File "C:\ProgramData\Anaconda3\lib\site-packages\astroquery\gaia\core.py", line 846, in launch_job
return TapPlus.launch_job(self, query=query, name=name,
File "C:\ProgramData\Anaconda3\lib\site-packages\astroquery\utils\tap\core.py", line 344, in launch_job
results = utils.read_http_response(response, output_format)
File "C:\ProgramData\Anaconda3\lib\site-packages\astroquery\utils\tap\xmlparser\utils.py", line 42, in read_http_response
result = APTable.read(data, format=astropyFormat)
File "C:\ProgramData\Anaconda3\lib\site-packages\astropy\table\connect.py", line 61, in __call__
out = registry.read(cls, *args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\astropy\io\registry.py", line 520, in read
data = reader(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\astropy\io\votable\connect.py", line 116, in read_table_votable
raise ValueError("No table found")
ValueError: No table found
Please note, this has been resolved. The reason is posted on the Gaia website (https://www.cosmos.esa.int/web/gaia/news): planned maintenance. For future reference, if your code stops working and it involves querying, check their website; they have probably posted the reason.
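One takeaway: the bare except: continue in the query loop is what made the outage look like bad data. A small change to that block (a sketch; only the print format is new) surfaces server-side failures instead of silently discarding them:

try:
    job = Gaia.launch_job(query1)  # launch query against the Gaia archive
    results = job.get_results()
except Exception as exc:
    # log which object failed and why, instead of skipping it silently;
    # a maintenance outage then shows up as ~148 identical errors
    print('query {0} failed: {1!r}'.format(i, exc))
    continue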
I am using code written by Victor Velasquez to extract data from raster files which contain daily precipitation data since 1981.
When I run the code, I get an error that some index is out of bounds. I did a little research and found that this is common and there are a lot of similar questions here, but I haven't been able to find the specific solution for this case.
The error:
IndexError Traceback (most recent call last)
<ipython-input-8-eff66ef74d73> in <module>
1 Pisco = Extract_Pisco()
----> 2 Pisco.DataPre()
3 Pisco.ExportExcel()
<ipython-input-7-6cf99336b9e1> in DataPre(self)
23 Band = Data.read(1)
24 X,Y = Data.index(self.x,self.y) #extraigo
---> 25 Pre = Band[X,Y]
26 self.ListPre.append(Pre) #agrego a lista
27
IndexError: index 158116290 is out of bounds for axis 0 with size 198
The part of the code pointed by the traceback is:
def DataPre(self):
    os.chdir(path)
    fileDir = path
    fileExt = r".tif"
    Lis = [_ for _ in os.listdir(fileDir) if _.endswith(fileExt)]
    Lis.sort()  # sort the .tif files
    Inicio = '1981-01-01.tif'
    Fin = '2018-07-31.tif'
    Rini = Lis.index(Inicio)
    Rend = Lis.index(Fin)
    self.Lis = Lis[Rini:Rend + 1]
    self.ListPre = []
    for i in tnrange(0, len(self.Lis), desc="!! Extrayendo Datos !!"):
        with rasterio.open(self.Lis[i]) as Data:
            Band = Data.read(1)
            X, Y = Data.index(self.x, self.y)  # row/col of the point
            Pre = Band[X, Y]
            self.ListPre.append(Pre)  # append to list
Thank you very much!
It looks like the file you are reading does not contain the geospatial point you are trying to find data for. (If this is incorrect, please let me know.)
You can add a check that the point falls inside the raster before indexing into the band. Note that Band is a plain numpy array, so the dimensions come from the dataset, and both bounds should use a strict less-than:

Band = Data.read(1)
X, Y = Data.index(self.x, self.y)
if 0 <= X < Data.height and 0 <= Y < Data.width:
    Pre = Band[X, Y]
    self.ListPre.append(Pre)
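As an aside, rasterio datasets also have a sample() method that reads values at coordinate pairs directly, without loading the whole band into memory (a sketch using the same variables as above):

vals = next(Data.sample([(self.x, self.y)]))  # band values at the point
Pre = vals[0]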
I am a complete newbie to Python 3 and coding, so go easy on me please. :)
As a project I'm creating a football league table based on 2018 EPL results. I have been able to break the .csv file containing an entire season's worth of data into round-by-round results in .csv form using the pandas module. Now I need to extract the table points for each team by round, based on the home and away goals for each team. I'm having a hard time associating the goals with the teams in each fixture. I can figure out how to apply win/draw/lose (3/1/0) points, but only manually per fixture, not dynamically for all fixtures in the round. Then I need to write the table to another .csv file.
FTHG-Full Time Home Goals, FTAG-Full Time Away Goals, FTR-Full Time Result
Example Data
Unnamed: 0,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
0,10/08/2018,Man United,Leicester,2,1,H
1,11/08/2018,Bournemouth,Cardiff,2,0,H
2,11/08/2018,Fulham,Crystal Palace,0,2,A
3,11/08/2018,Huddersfield,Chelsea,0,3,A
4,11/08/2018,Newcastle,Tottenham,1,2,A
5,11/08/2018,Watford,Brighton,2,0,H
6,11/08/2018,Wolves,Everton,2,2,D
7,12/08/2018,Arsenal,Man City,0,2,A
8,12/08/2018,Liverpool,West Ham,4,0,H
9,12/08/2018,Southampton,Burnley,0,0,D
Example Code
import pandas as pd

results = pd.read_csv("2018 Round 1.csv")
team = results.iloc[2, 2]
if results.iloc[2, 4] > results.iloc[2, 5]:
    points = 3
elif results.iloc[2, 4] < results.iloc[2, 5]:
    points = 0
else:  # goals are equal: a draw
    points = 1
table_entry = (team + " " + str(points))
print(table_entry)
# to_csv is a DataFrame method, not a top-level pandas function
pd.DataFrame([table_entry]).to_csv("EPL Table Round 1.csv", index=False)
Thanks for your help.
I hope this helps :)
Please feel free to ask if the code is not clear.
import pandas as pd
import numpy as np

df = pd.read_csv('foot.txt')
# Make a list of all team names, deduplicated since most teams appear in both columns
Home_teams = pd.unique(df['HomeTeam'])
Away_teams = pd.unique(df['AwayTeam'])
teams = pd.unique(np.concatenate((Home_teams, Away_teams)))
df_teams = pd.DataFrame(columns=['team', 'points'])
# For each team in the list...
for team in teams:
    print("*******" + team + "*****")
    points = 0
    df_home = df[df['HomeTeam'] == team]
    res_home = df_home['FTR'].value_counts()
    try:
        points += res_home['H'] * 3
    except KeyError:
        print("Didn't win when home")
    try:
        points += res_home['D'] * 1
    except KeyError:
        print("No home draws")
    df_away = df[df['AwayTeam'] == team]
    res_away = df_away['FTR'].value_counts()
    try:
        points += res_away['A'] * 3
    except KeyError:
        print("Didn't win when away")
    try:
        points += res_away['D'] * 1  # away draws also earn a point
    except KeyError:
        print("No away draws")
    df_teams = df_teams.append({'team': team, 'points': points}, ignore_index=True)
    print(team + " has " + str(points) + " points")
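Since the question also asks to write the table to another .csv file, one way to finish (the output file name is just an example):

df_teams.sort_values('points', ascending=False).to_csv("EPL Table Round 1.csv", index=False)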
I'm trying to store stock data (Open, High, Low, Close, Volume), pulled by pandas_datareader, into 5 distinct lists named accordingly. I am new to Python and am wondering where I am going wrong. I got it to cycle through a one-dimensional list of integer values and assign them to each list, but am unsure of how to handle the additional dimension of the f.head() output. I have twice gotten a traceback error indicating index values out of range, but I suspect my mistake goes beyond a simple index range.
Open, High, Low, Close, Vol = [], [], [], [], []
col_data = [Open, High, Low, Close, Vol]
stock = 'BABA'
# data period
yStart = 2017
mStart = 11
dStart = 14
yEnd = 2018
mEnd = 2
dEnd = 14
import pandas as p
p.core.common.is_list_like = p.api.types.is_list_like
import pandas_datareader.data as pdr
from datetime import datetime
start = datetime(yStart,mStart,dStart)
end = datetime(yEnd,mEnd,dEnd)
f = pdr.DataReader(stock, 'morningstar', start, end)
f.head()
a = 0
b = 0
while a < len(col_data):
    b = 0
    while b < len(f):
        cur = f.loc[f.index[b], col_data[a]]
        col_data[a].append(cur)
        b += 1
    a += 1
I would like to ultimately be able to print the individual lists ( like print(Open) and retrieve the list of Open prices ). Any advice/additional resources that might help would be appreciated.
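If it helps, one likely issue is that col_data holds the list objects themselves, so f.loc[f.index[b], col_data[a]] passes a list where a column name is expected. A minimal sketch of one way around this (the column names are assumed to match the DataReader output):

col_names = ['Open', 'High', 'Low', 'Close', 'Volume']
for name, target in zip(col_names, col_data):
    for b in range(len(f)):
        target.append(f.loc[f.index[b], name])

# or skip the loops entirely:
# Open = f['Open'].tolist()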
I have a Python program for processing big data on one computer (16 CPU cores). Because the data keeps getting bigger, I need it to run on 5 computers. I am new to Spark and still feel confused after reading some docs. I would appreciate it if anyone could tell me the best way to set up a small cluster.
Here are some details:
The program counts the traded volume at every price for each stock (one day at a time) from tick-transaction pandas DataFrame data.
There are more than 3000 stocks and about 1 billion transactions in one day. The size of one day's data file (DataFrame) is between 1 and 2 GB.
Getting the results for 300 days currently takes 3 days on one computer; I hope adding 4 more computers will shorten that time.
Here is the sample code in Python:
import os
import multiprocessing as mp
import pandas as pd
import sharedmem

def ticks_to_priceline(day=None):
    # file name for the tick DataFrame file, one day per file
    fn = get_tick_dataframe_filename_byday(day)
    with pd.HDFStore(fn, 'r') as tick_store:
        tick_dataframe = tick_store.select("tick")
    all_stock_symbols = tick_dataframe.symbol.drop_duplicates()
    sblist = []
    # cut into small chunks
    chunk = 300
    for xx in range(len(all_stock_symbols) // chunk + 1):
        sblist.append(all_stock_symbols[xx * chunk:(xx + 1) * chunk])
    # run with all CPUs
    with sharedmem.MapReduce(np=mp.cpu_count()) as pool:
        def work(chunk_list):
            result = {}
            for symbol in chunk_list:
                data = tick_dataframe[tick_dataframe.symbol == symbol]
                if not data.empty and len(data) > 99:
                    df1 = data.loc[:, [u'timestamp', u'price', u'volume']]
                    df1['vol_diff'] = df1.volume.diff().fillna(0)
                    df2 = df1.loc[:, ['price', 'vol_diff']]
                    df2.price = df2.price.apply(int)
                    rs = df2.groupby('price').sum()
                    rs = rs.sort_index(ascending=0).reset_index()
                    result[symbol] = rs
            return result
        rslist = pool.map(work, sblist)
    return rslist
I have already set up a Spark cluster in standalone mode for testing. My main problem is how to rewrite the code above.
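Not an authoritative answer, but here is a minimal PySpark sketch of the same per-price volume aggregation, assuming the tick data can be exported to a format Spark reads directly (the Parquet paths and column names are assumptions carried over from the pandas code):

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("priceline").getOrCreate()
# columns assumed: symbol, timestamp, price, volume
ticks = spark.read.parquet("ticks_one_day.parquet")

# volume.diff().fillna(0), computed per symbol in timestamp order
w = Window.partitionBy("symbol").orderBy("timestamp")
ticks = ticks.withColumn(
    "vol_diff",
    F.coalesce(ticks.volume - F.lag("volume").over(w), F.lit(0)))

# traded volume per integer price level, per stock
result = (ticks.withColumn("price", ticks.price.cast("int"))
               .groupBy("symbol", "price")
               .agg(F.sum("vol_diff").alias("vol_diff"))
               .orderBy("symbol", F.desc("price")))
result.write.parquet("priceline_one_day.parquet")

Submitted with spark-submit against the standalone master, the groupBy is distributed across all 5 machines, so the per-symbol Python loop and the sharedmem pool disappear entirely.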