I'm trying to take the moving average of a stock's volume using TA-Lib, but I'm getting the error above. Any suggestions on how to fix this? Thanks!
See code below:
import pandas_datareader as pdr
import datetime
import pandas as pd
import numpy as np
import talib as ta
#Download Data
aapl = pdr.get_data_yahoo('AAPL', start=datetime.datetime(2006, 10, 1), end=datetime.datetime(2012, 1, 1))
#Saves Data as CSV on desktop
aapl.to_csv('C:\\Users\\JDOG\\Desktop\\aapl_ohlc.csv', encoding='utf-8')
#Save to dataframe
df = pd.read_csv('C:\\Users\\JDOG\\Desktop\\aapl_ohlc.csv', header=0, index_col='Date', parse_dates=True)
twenty_ma = 20
signals = pd.DataFrame(index=aapl.index)
signals['signal'] = 0.0
signals['20 MA'] = ta.SMA(aapl.Volume.values,twenty_ma)
It looks like SMA expects an array of floats rather than ints:
In [11]: ta.SMA(aapl.Volume.values.astype('float64'), twenty_ma)
Out[11]:
array([ nan, nan, nan, ..., 78960385., 76585880.,
73991890.])
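As a quick cross-check that does not need TA-Lib, here is a minimal sketch with made-up volume numbers (the values and the window of 3 are assumptions for illustration only). It shows the integer-to-float cast and computes the same kind of simple moving average with pandas, which should also leave NaN in the first window - 1 positions:

```python
import numpy as np
import pandas as pd

# Made-up integer volumes standing in for aapl.Volume (illustrative only)
volume = pd.Series([100, 200, 300, 400, 500], dtype="int64")
window = 3  # hypothetical window; the question uses 20

# The cast suggested in the answer: int64 -> float64
as_float = volume.to_numpy(dtype="float64")

# Simple moving average with pandas, for comparison
sma = volume.astype("float64").rolling(window).mean()
print(as_float.dtype)
print(sma)
```

If the pandas result and the TA-Lib result disagree on real data, the input dtype and the window length are the first things to check.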
I'm trying to get one day of 1-minute BTC-USD data. Then, for each open value, I compare it with the next one: if the next value is greater than the first, then buy, and vice versa.
This is what I've got:
import numpy as np
import pandas as pd
import yfinance as yf
data = yf.download(tickers='BTC-USD', period='1d', interval='1m')
Opens = data['Open'].to_numpy()
for x in Opens:
    for y in Opens:
        if x > y:
            print("Buy")
        else:
            print("Sell")
Storing all the Buy/Sell into a column named decision:
import numpy as np
import pandas as pd
import yfinance as yf
data = yf.download(tickers='BTC-USD', period='1d', interval='1m')
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning
Opens = data[['Open']].copy()
Opens['decision'] = np.where(Opens['Open'] > Opens['Open'].shift(1), 'Buy', 'Sell')
print(Opens)
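To see what the shift-based comparison does without downloading anything, here is a small sketch on made-up prices. Note that with np.where the first row (no previous value) and any unchanged price fall into the 'Sell' branch; the variant below uses np.select with a 'Hold' label for those cases, which is my own assumption and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Made-up open prices (illustrative only)
opens = pd.DataFrame({"Open": [100.0, 101.5, 101.0, 101.0, 102.0]})

current = opens["Open"]
previous = current.shift(1)  # NaN in the first row

# Comparisons against NaN are False, so the first row falls through to the default
opens["decision"] = np.select(
    [current > previous, current < previous],
    ["Buy", "Sell"],
    default="Hold",  # hypothetical label for "no previous value" or "unchanged"
)
print(opens)
```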
I am trying to read some date info from Excel using openpyxl which returns them as datetime.datetime. I need to resample the data and then plot the resampled times using matplotlib. I would like an array of dates starting at tmin, incremented by 30.0 days.
The code stub is
import numpy as np
import datetime
vals = [
datetime.date.fromisoformat('2004-06-01'),
datetime.date.fromisoformat('2004-07-01'),
datetime.date.fromisoformat('2004-08-01'),
datetime.date.fromisoformat('2004-09-01'),
datetime.date.fromisoformat('2004-10-01'),
datetime.date.fromisoformat('2004-11-01')]
xtim = np.array(vals)
tmin = np.min(xtim)
ytim = np.arange(0.0, 150.0, 30.0)
tnew = tmin + ytim.astype('timedelta64[D]')
Unfortunately, this gives me the error message
UFuncTypeError: ufunc 'add' cannot use operands with types dtype('O') and dtype('<m8[D]')
You should wrap tmin in an array with 'datetime64' type:
np.array(tmin, dtype='datetime64') + ytim.astype('timedelta64[D]')
output: array(['2004-06-01', '2004-07-01', '2004-07-31', '2004-08-30', '2004-09-29'], dtype='datetime64[D]')
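A self-contained sketch of the same idea, assuming tmin is a plain datetime.date as in the question. It builds the steps directly as integers, converts tmin with np.datetime64, and converts the result back to datetime.date objects in case the plotting code prefers those:

```python
import datetime
import numpy as np

tmin = datetime.date(2004, 6, 1)  # same start date as in the question

# Integer day offsets 0, 30, 60, 90, 120 as timedelta64[D]
steps = np.arange(0, 150, 30).astype("timedelta64[D]")

# datetime64 + timedelta64 is a supported ufunc combination
tnew = np.datetime64(tmin, "D") + steps

# Back to plain datetime.date objects
as_dates = tnew.tolist()
print(tnew)
print(as_dates)
```

The original error arises because np.min on an object array returns a datetime.date, and NumPy has no 'add' loop for an object operand and a timedelta64 array.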
To add 30 days, I'd suggest using datetime.timedelta(days=30).
For example:
import numpy as np
import datetime
vals = [
datetime.date.fromisoformat('2004-06-01'),
datetime.date.fromisoformat('2004-07-01'),
datetime.date.fromisoformat('2004-08-01'),
datetime.date.fromisoformat('2004-09-01'),
datetime.date.fromisoformat('2004-10-01'),
datetime.date.fromisoformat('2004-11-01')]
xtim = np.array(vals)
tmin = np.min(xtim)
t_new=[tmin+datetime.timedelta(days=i) for i in np.arange(0.0, 150.0, 30.0)]
I am trying to resample my data annually, but I am struggling to set the start day of the resampling.
import xarray as xr
import numpy as np
import pandas as pd
da = xr.DataArray(
np.linspace(0, 11, num=36),
coords=[
pd.date_range(
"15/12/1999", periods=36,
)
],
dims="time",
)
da.resample(time="1Y").mean()
What I am trying to achieve is to get the means of the following periods: 15/12/1999-15/12/2000, 15/12/2000-15/12/2001, 15/12/2001-15/12/2002, ...
I solved it by shifting the time back to the first day of the month, resampling with the corresponding pandas anchored offset, and then shifting the time forward again.
import xarray as xr
import numpy as np
import pandas as pd
da = xr.DataArray(
np.concatenate([np.zeros(365), np.ones(365)]),
coords=[
pd.date_range(
"06/15/2017", "06/14/2019", freq='D'
)
],
dims="time",
)
days_to_first_of_month = pd.Timedelta(days=int(da.time.dt.day[0])-1)
da['time'] = da.time - days_to_first_of_month
month = da.time.dt.strftime("%b")[0].values
resampled = da.resample(time=f'AS-{month}').sum()
resampled['time'] = resampled.time + days_to_first_of_month
print(resampled)
Is there a more efficient or clean way?
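One possible alternative, offered as a sketch rather than a tested xarray recipe: label every timestamp with the number of whole years elapsed since the first timestamp and group by that label. The example below uses plain pandas with the same synthetic data; I assume the same labels could be attached as a coordinate and used with xarray's groupby, but I have not verified that here. An anchor date of 29 February would need extra care.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-06-15", "2019-06-14", freq="D")
s = pd.Series(np.concatenate([np.zeros(365), np.ones(365)]), index=idx)

anchor = idx[0]  # the periods start on this month/day every year

# True where the date falls before this year's anniversary of the anchor
before_anniversary = np.asarray(
    (idx.month < anchor.month)
    | ((idx.month == anchor.month) & (idx.day < anchor.day))
)

# Whole years elapsed since the anchor: 0 for the first period, 1 for the second, ...
years_elapsed = np.asarray(idx.year) - anchor.year - before_anniversary.astype(int)

totals = s.groupby(years_elapsed).sum()
print(totals)
```

This avoids shifting the time axis back and forth, at the cost of getting integer period labels instead of timestamps.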
When I try to import data from Excel using pandas and NumPy, I get the error shown below.
...
import pandas as pd
import numpy as np
# Importing prostate data
prostate_data = pd.read_excel(r'C:\Users\shrey\Documents\BIOE 594\prostate_dat.xlsx')
data = pd.DataFrame(prostate_data, columns= ['lcavol','lweight','age','lbph',
'svi','lcp','gleason','pgg45','lpsa'])
data.to_numpy()
A = data[:,0:7]
b = data[:,8]
At= np.transpose(A)
y = np.linalg.inv(At*A) # Estimating parameter using normal equation
x = y * (At*b)
print(x)
...
Error: TypeError: '(slice(None, None, None), slice(0, 7, None))' is an invalid key
data.to_numpy() returns a new array and leaves the DataFrame unchanged, so data[:,0:7] is still indexing a DataFrame, which raises the TypeError. Assign the result back:
data = data.to_numpy()
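Once data is a NumPy array, note that * on arrays is element-wise, so At*A would not compute the matrix product that the normal equation needs. A minimal sketch on synthetic data (the matrix, the coefficients, and the sizes are made up for illustration) using @ and np.linalg.solve, with np.linalg.lstsq as a cross-check:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix and known coefficients (illustrative only)
A = rng.normal(size=(50, 3))
true_x = np.array([1.5, -2.0, 0.5])
b = A @ true_x  # noise-free response, so the fit should recover true_x

# Normal equation: solve (A^T A) x = A^T b, using @ for matrix products
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Least-squares solver, usually numerically safer than forming A^T A
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)
print(x_lstsq)
```

Also worth checking in the original code: the slice 0:7 selects seven columns (indices 0 to 6), so column 7 is used neither as a feature nor as the response.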
I have code that downloads tickers and runs a linear regression for each stock in the downloaded list. I am stuck on the last step: showing the prediction and residual values for each stock, for the last date in the data.
import pandas as pd
import numpy as np
import yfinance as yf
import datetime as dt
from sklearn import linear_model
tickers = ['EXPE','MSFT']
data = yf.download(tickers, start="2012-04-03", end="2017-07-07")['Close']
data = data.reset_index()
data = data.dropna()
df = pd.DataFrame(data, columns = ["Date"])
df["Date"]=df["Date"].apply(lambda x: x.toordinal())
for ticker in tickers:
    data[ticker] = pd.DataFrame(data, columns = [ticker])
    X = df
    y = data[ticker]
    lm = linear_model.LinearRegression()
    model = lm.fit(X,y)
    predictions = lm.predict(X)
    residuals = y-lm.predict(X)
    print(predictions[-1:])
    print(residuals[-1:])
The current output looks like this:
[136.28856636]
1323 13.491432
Name: EXPE, dtype: float64
[64.19943648]
1323 5.260563
Name: MSFT, dtype: float64
But I would like it to show like this (as pandas table):
Predictions Residuals
EXPE 136.29 13.49
MSFT 64.20 5.26
You could do something like this where you store values in a list:
import pandas as pd
import numpy as np
import yfinance as yf
import datetime as dt
from sklearn import linear_model
tickers = ['EXPE','MSFT']
data = yf.download(tickers, start="2012-04-03", end="2017-07-07")['Close']
data = data.reset_index()
data = data.dropna()
df = pd.DataFrame(data, columns = ["Date"])
df["Date"]=df["Date"].apply(lambda x: x.toordinal())
predictions_output = []
residuals_output = []
for ticker in tickers:
    data[ticker] = pd.DataFrame(data, columns = [ticker])
    X = df
    y = data[ticker]
    lm = linear_model.LinearRegression()
    model = lm.fit(X,y)
    predictions = lm.predict(X)
    residuals = y-lm.predict(X)
    # Keep only the value for the last date, as a plain float
    predictions_output.append(float(predictions[-1]))
    residuals_output.append(float(residuals.iloc[-1]))
expectation_df = pd.DataFrame(list(zip(predictions_output, residuals_output)),
                              columns =['Predictions', 'Residuals']).set_index([tickers])
print(expectation_df)
with the output being:
Predictions Residuals
EXPE 136.288566 13.491432
MSFT 64.199436 5.260563
EDIT: I went too quickly and looked back and realized tickers was already defined, so you can use that to set your index here and lose the Tickers index heading to match your desired output.
Also, if you want those values rounded, use these two lines in place of the two append calls in the loop:
predictions_output.append(round(float(predictions[-1]), 2))
residuals_output.append(round(float(residuals.iloc[-1]), 2))
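Another way to build the same table, sketched here on synthetic data so that it runs without a download: collect one dictionary per ticker and let pandas assemble and round the frame at the end. The ticker names 'AAA' and 'BBB' and the prices are made up, and np.polyfit stands in for sklearn's LinearRegression purely to keep the example dependency-free.

```python
import numpy as np
import pandas as pd

# Synthetic "ordinal date" axis and prices for two hypothetical tickers
days = np.arange(10, dtype=float)
prices = pd.DataFrame({"AAA": 2.0 * days + 1.0, "BBB": 0.5 * days + 3.0})

rows = {}
for ticker in prices.columns:
    y = prices[ticker].to_numpy()
    # Straight-line fit (swapped in for LinearRegression in this sketch)
    slope, intercept = np.polyfit(days, y, deg=1)
    fitted = slope * days + intercept
    rows[ticker] = {"Predictions": fitted[-1], "Residuals": y[-1] - fitted[-1]}

# One row per ticker, rounded once at the end instead of inside the loop
summary = pd.DataFrame.from_dict(rows, orient="index").round(2)
print(summary)
```

Keeping the rounding out of the loop leaves the full-precision values available if they are needed later.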