I'm trying to get 1-minute BTC-USD data for one day and then, for each Open value, compare it with the next one: if the next value is greater, buy, and vice versa.
This is what I've got:
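import numpy as np
import pandas as pd
import yfinance as yf
data = yf.download(tickers='BTC-USD', period='1d', interval='1m')
Opens = data['Open'].to_numpy()
# this nested loop compares every Open with every other Open (n * n comparisons),
# not each value with the one that follows it
for x in Opens:
    for y in Opens:
        if x > y:
            print("Buy")
        else:
            print("Sell")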
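Vectorize the comparison with shift() and store all the Buy/Sell values in a column named decision:
import numpy as np
import pandas as pd
import yfinance as yf
data = yf.download(tickers='BTC-USD', period='1d', interval='1m')
# .copy() makes Opens an independent DataFrame, so adding a column
# does not trigger pandas' SettingWithCopyWarning
Opens = data[['Open']].copy()
# Buy when this minute's Open is higher than the previous minute's Open, otherwise Sell
Opens['decision'] = np.where(Opens['Open'] > Opens['Open'].shift(1), 'Buy', 'Sell')
print(Opens)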
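A minimal sketch with made-up prices (no download needed; the numbers are hypothetical) showing what the shift(1) comparison produces, and how shift(-1) would instead compare each Open with the next one, which is closer to the wording of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute Open prices standing in for the yfinance download.
opens = pd.DataFrame({"Open": [100.0, 101.0, 100.5, 100.5, 102.0]})

# shift(1): compare each Open with the PREVIOUS minute's Open.
# The first row has no previous value (NaN), so the comparison is False -> "Sell".
opens["decision"] = np.where(opens["Open"] > opens["Open"].shift(1), "Buy", "Sell")

# shift(-1): compare the NEXT minute's Open with the current one.
# This looks into the future, so it only makes sense for labelling past data.
# The last row has no next value, so it always gets "Sell".
opens["next_decision"] = np.where(opens["Open"].shift(-1) > opens["Open"], "Buy", "Sell")

print(opens)
```

Note that equal values (rows 2 and 3) count as "Sell" with np.where; if you want a separate label for "no change", np.select with a third condition would be one way to add it.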
My code runs without errors, but it doesn't produce the output it should, and I'm not sure where the issue occurs. Could someone help me correct it? Do you need the CSV too?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("/content/drive/MyDrive/replicates/Replicate 3 Gilts just measures.csv")
df.info()
df.head()
# remove the irrelevant columns
cols_to_drop = ["animal"]
df = df.drop(columns=cols_to_drop)
# first five rows of the data frame after removing columns
df.head()
deep_df = df.copy(deep=True)
numerical_columns = [col for col in df.columns if (df[col].dtype == 'int64' or
                                                   df[col].dtype == 'float64')]
df[numerical_columns].describe().loc[['min', 'max', 'mean', '50%'], :]
df[df['i1000.0'] == df['i1000.0'].min()]
This is where the issue occurs:
i1000_bucket = df.groupby(pd.cut(df["i1000.0"], bins=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
number_bucket = df.groupby(pd.cut(df["i1000.0"], bins=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
# round(...) was missing on the next line, so it built a tuple (Series, 2) instead of a Series
i1000_bucket = round((i1000_bucket["i1000.0"].sum() / i1000_bucket.size()) * 100, 2)
number_bucket = round((number_bucket["i1000.0"].sum() / number_bucket.size()) * 100, 2)
The figure appears, but nothing is actually plotted:
x = [str(i)+"-"+str(i+10) for i in range(10,91,10)]
plt.plot(x,number_bucket.values)
plt.xlabel("i1000.0")
plt.ylabel("p1000.0")
plt.title("1000.0 comparisons")
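A likely cause of the empty plot, sketched with made-up numbers (the real CSV isn't shown, so the values below and the explicit observed=False are assumptions): pd.cut drops values outside (10, 100], and any bin with no rows gives 0 / 0 = NaN in the sum/size ratio. plt.plot leaves a gap wherever a value is NaN, so if most bins are empty there may be no two neighbouring points to join with a line.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV column "i1000.0"; these values are made up.
df = pd.DataFrame({"i1000.0": [12.0, 15.0, 47.0, 55.0, 58.0, 120.0]})

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# pd.cut assigns each value to a (left, right] bin; 120.0 is outside all bins and gets NaN,
# so groupby drops that row.
buckets = pd.cut(df["i1000.0"], bins=bins)
# observed=False keeps all 9 bins, including the empty ones.
grouped = df.groupby(buckets, observed=False)["i1000.0"]

# Same formula as the question: bucket sum / bucket size * 100, rounded to 2 places.
# Empty bins give 0 / 0 = NaN, which plt.plot() does not draw.
number_bucket = round(grouped.sum() / grouped.size() * 100, 2)
print(number_bucket)
```

If the printed Series is mostly NaN, passing marker='o' to plt.plot makes isolated points visible; if it is entirely NaN, the bin edges probably don't cover the range of i1000.0 (compare them with the min and max from describe()).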
Here is the test code for my MACD function; however, the values I'm getting are incorrect. I don't know whether it's because my span is in days while my data is in 2-minute increments, or whether it's a separate issue. Any help would be much appreciated :)
import yfinance as yf
import pandas as pd
import pandas_ta as ta
import numpy as np
import datetime as dt
import time
dataTSLA = yf.download(tickers='TSLA', period='1mo', interval='2m', auto_adjust=True)
def indicatorMACD(data):
    exp1 = data['Close'].ewm(span=12, adjust=False).mean()
    exp2 = data['Close'].ewm(span=26, adjust=False).mean()
    macd = exp1 - exp2
    signalLine = macd.ewm(span=9, adjust=False).mean()
    return [macd, signalLine]
print(indicatorMACD(dataTSLA))
I'm getting around 0.66 for the MACD and 0.23 for the signal line, when they should be -0.23 and -0.64, respectively.
Use min_periods instead of adjust=False.
Code:
import pandas as pd
import pandas_datareader as pdr
import matplotlib.pyplot as plt
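# note: pandas_datareader's 'yahoo' source has stopped working for many users;
# yfinance's yf.download('BTC-USD', start='2020-01-01') is a common replacement
df = pdr.DataReader('BTC-USD', data_source='yahoo', start='2020-01-01')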
df
Function definition:
def MACD(DF, a, b, c):
    df = DF.copy()
    df['MA FAST'] = df['Close'].ewm(span=a, min_periods=a).mean()
    df['MA SLOW'] = df['Close'].ewm(span=b, min_periods=b).mean()
    df['MACD'] = df['MA FAST'] - df['MA SLOW']
    df['Signal'] = df['MACD'].ewm(span=c, min_periods=c).mean()
    df.dropna(inplace=True)
    return df
Function call:
data = MACD(df , 12,26,9)
data
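On the days-versus-minutes question: ewm's span is counted in rows (observations), not calendar time, so with interval='2m' a span of 12 covers 12 bars, i.e. 24 minutes. The sketch below (hypothetical prices) checks pandas' adjust=False EMA against the recursive formula with alpha = 2 / (span + 1), and shows that the result also depends on where the series starts. That start-point dependence is one plausible reason values differ from a charting platform that uses a longer history; this is an assumption, not something verified against the platform in question.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices; each row would be one 2-minute bar with interval='2m'.
close = pd.Series([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 12.8])

def ema_manual(values, span):
    """Recursive EMA matching pandas' adjust=False definition."""
    alpha = 2 / (span + 1)          # span is converted to alpha this way
    out = [values[0]]               # y[0] = x[0]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])  # y[t] = a*x[t] + (1-a)*y[t-1]
    return np.array(out)

span = 3
pandas_ema = close.ewm(span=span, adjust=False).mean().to_numpy()
manual_ema = ema_manual(close.to_numpy(), span)
print(np.allclose(pandas_ema, manual_ema))  # True

# Starting the same data two rows later changes the final EMA value.
shorter = close.iloc[2:].ewm(span=span, adjust=False).mean()
print(pandas_ema[-1], shorter.iloc[-1])
```

To get spans measured in days from intraday data, one option is to resample to daily bars first (for example close.resample('1D').last() on a DatetimeIndex) and run the same MACD on that.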
I'm trying to plot a time graph with month and year combined on the x axis and values on the y axis. Python reads my Excel data with decimal points, so it won't let me convert to %m-%Y. Any ideas?
[Image: my Excel data]
How Python reads my data:
0 3.0-2015.0
1 5.0-2015.0
3 6.0-2017.0
...
68 nan-nan
69 nan-nan
70 nan-nan
71 nan-nan
# Code
import plotly
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import pandas as pd
import math
# Set Directory
workbook1 = 'GAP Insurance - 1.xlsx'
workbook2 = 'GAP Insurance - 2.xlsx'
workbook3 = 'GAP Insurance - 3.xlsx'
df = pd.read_excel(workbook1, 'Sheet1',)
# Set x axis
df['Time'] = (df['Month']).astype(str)+ '-' + (df['Year']).astype(str)
df['Time'] = pd.to_datetime(df['Time'], format='%m-%Y').dt.strftime('%m-%Y')
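You could try converting to "int" before converting to "str" in this line. The empty rows at the end (nan-nan) have to be dropped first, because astype(int) raises an error on NaN:
df = df.dropna(subset=['Month', 'Year'])
df['Time'] = df['Month'].astype(int).astype(str) + '-' + df['Year'].astype(int).astype(str)
This ensures that the stored strings do not include decimal points (for example "3-2015" instead of "3.0-2015.0").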
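An alternative that skips the string round-trip entirely: pd.to_datetime can assemble dates from a DataFrame with year, month and day columns. This is a sketch on hypothetical data shaped like the output above (floats plus empty trailing rows); fixing the day to 1 is an assumption, since the sheet only has month and year.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet: Excel numbers come back as floats,
# and the empty trailing rows come back as NaN.
df = pd.DataFrame({
    "Month": [3.0, 5.0, 6.0, np.nan],
    "Year": [2015.0, 2015.0, 2017.0, np.nan],
})

# Rows without a month or year cannot become dates, and astype(int) fails on NaN.
df = df.dropna(subset=["Month", "Year"])

# Build year/month/day columns and let pandas assemble real datetimes (day fixed to 1).
parts = pd.DataFrame({
    "year": df["Year"].astype(int),
    "month": df["Month"].astype(int),
    "day": 1,
})
df["Time"] = pd.to_datetime(parts)
print(df)
```

Keeping Time as a datetime column (rather than converting it back with .dt.strftime('%m-%Y')) lets the plotting library treat the x axis as time; the month-year format can then be applied to the tick labels instead.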
I'm using pandas to create 3 HTML tables from 3 DataFrames, and I want the output to be an HTML file. The code I'm currently using writes the tables one below the other. I want one table on top and the other two side by side below it. What could I change in the code to achieve that?
import numpy as np
from numpy.random import randn
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(randn(5,4),columns='W X Y Z'.split())
df1 = pd.DataFrame(randn(5,4),columns='A B C D'.split())
df2 = pd.DataFrame(randn(5,4),columns='E F G K'.split())
with open("a.html", 'w') as _file:
    _file.write(df.head().to_html() + "\n\n" + df1.head().to_html() + "\n\n" + df2.head().to_html())
Here's my proposal based on your original code:
import numpy as np
from numpy.random import randn
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(randn(5,4),columns='W X Y Z'.split())
df1 = pd.DataFrame(randn(5,4),columns='A B C D'.split())
df2 = pd.DataFrame(randn(5,4),columns='E F G K'.split())
html = """
{table1}
<table>
<tr>
<td>{table2}</td>
<td>{table3}</td>
</tr>
</table>
""".format(
table1=df.head().to_html(),
table2=df1.head().to_html(),
table3=df2.head().to_html()
)
with open("a.html", 'w') as _file:
    _file.write(html)
My data has over 3000 rows. Below is my code for computing a 20-day value (the Volume Ratio used in the stock market). It takes more than 2 minutes to run. Is there a good way to reduce the running time?
import pandas as pd
import numpy as np
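# pandas.io.data was removed from pandas; DataReader now lives in the separate pandas_datareader package
from pandas_datareader.data import DataReader
import matplotlib.pyplot as plt
# the 'yahoo' source may no longer work; yfinance's yf.download('047040.KS', start='2010') is an alternative
data = DataReader('047040.KS', 'yahoo', start='2010')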
data['vr']=0
data['Volume Ratio']=0
data['acend']=0
data['vr'] = np.sign(data['Close']-data['Open'])
data['vr'] = np.where(data['vr']==0,0.5,data['vr'])
data['vr'] = np.where(data['vr']<0,0,data['vr'])
data['acend'] = np.multiply(data['Volume'],data['vr'])
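# one Python iteration per row, each with chained assignment and three Python sum() calls: this is the slow part
for i in range(len(data['Open'])):
    if i < 19:
        data['Volume Ratio'][i] = 0
    else:
        # note: the slice [i-19:i] covers 19 rows (i-19 .. i-1) and does not include row i
        data['Volume Ratio'][i] = ((sum(data['acend'][i-19:i])) / ((sum(data['Volume'][i-19:i]) - sum(data['acend'][i-19:i])))) * 100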
Consider using conditional row selection and .rolling().sum():
up = data['acend'].rolling(window=20).sum()      # each window covers rows i-19 .. i (20 rows)
total = data['Volume'].rolling(window=20).sum()
data['Volume Ratio'] = 0.0                        # float column; rows 0-18 keep the value 0
data.loc[data.index[19:], 'Volume Ratio'] = up / (total - up) * 100
or, simplified: .rolling().sum() produces NaN for the first 19 rows (window - 1), so just use .fillna(0):
data['new_col'] = data['acend'].rolling(window=20).sum().div(
    data['Volume'].rolling(window=20).sum().subtract(data['acend'].rolling(window=20).sum())).mul(100).fillna(0)
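A self-contained check, on randomly generated prices (hypothetical, in place of the DataReader download), that the rolling-sum version gives the same numbers as an explicit loop over the same 20-row window. The loop here uses rows i-19 .. i inclusive, i.e. the 20-row window that rolling(20) uses, not the 19-row slice from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical OHLCV-like data; the real data would come from DataReader.
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "Open": rng.uniform(90, 110, n),
    "Close": rng.uniform(90, 110, n),
    "Volume": rng.integers(1_000, 10_000, n).astype(float),
})

# Same rule as the question: up day -> 1, flat day -> 0.5, down day -> 0.
vr = np.sign(data["Close"] - data["Open"]).replace(0, 0.5).clip(lower=0)
data["acend"] = data["Volume"] * vr

# Vectorized: rolling sums over 20 rows, NaN for the first 19 rows replaced by 0.
window = 20
up = data["acend"].rolling(window).sum()
total = data["Volume"].rolling(window).sum()
data["Volume Ratio"] = (up / (total - up) * 100).fillna(0)

# Slow reference loop over the same windows, for comparison only.
ref = np.zeros(n)
for i in range(window - 1, n):
    a = data["acend"].iloc[i - window + 1 : i + 1].sum()
    v = data["Volume"].iloc[i - window + 1 : i + 1].sum()
    ref[i] = a / (v - a) * 100
```

The vectorized part does a fixed number of whole-column operations instead of thousands of per-row Python calls, which is where the speed-up comes from.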