I have data with a length of over 3000 rows.
Below is the code for computing a 20-day value (the Volume Ratio used in stock markets).
It takes more than 2 minutes to run.
Is there a good way to reduce the running time?
import pandas as pd
import numpy as np
from pandas_datareader import DataReader  # pandas.io.data was removed from pandas
import matplotlib.pylab as plt
data = DataReader('047040.KS','yahoo',start='2010')
data['vr']=0
data['Volume Ratio']=0
data['acend']=0
data['vr'] = np.sign(data['Close']-data['Open'])
data['vr'] = np.where(data['vr']==0,0.5,data['vr'])
data['vr'] = np.where(data['vr']<0,0,data['vr'])
data['acend'] = np.multiply(data['Volume'],data['vr'])
for i in range(len(data['Open'])):
    if i < 19:
        data['Volume Ratio'][i] = 0
    else:
        data['Volume Ratio'][i] = (sum(data['acend'][i-19:i]) / (sum(data['Volume'][i-19:i]) - sum(data['acend'][i-19:i]))) * 100
Consider using conditional row selection and rolling.sum():
data.loc[data.index[:20], 'Volume Ratio'] = 0
data.loc[data.index[20:], 'Volume Ratio'] = (
    data['acend'].rolling(window=20).sum()
    / (data['Volume'].rolling(window=20).sum() - data['acend'].rolling(window=20).sum())
) * 100
or, simplified: .rolling(window=20).sum() will create np.nan for the first 19 values (a full 20-row window is required), so just use .fillna(0):
data['new_col'] = (data['acend'].rolling(window=20).sum()
                   .div(data['Volume'].rolling(window=20).sum()
                        .subtract(data['acend'].rolling(window=20).sum()))
                   .mul(100)
                   .fillna(0))
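For reference, here is a self-contained sketch of the whole vectorized pipeline on synthetic data (synthetic because pandas.io.data has been removed; the column names and the 20-day window follow the question):
import numpy as np
import pandas as pd

# Synthetic OHLCV frame standing in for the DataReader download
n = 3000
rng = np.random.RandomState(0)
data = pd.DataFrame({
    'Open': rng.uniform(90, 110, n),
    'Close': rng.uniform(90, 110, n),
    'Volume': rng.randint(1000, 10000, n).astype(float),
})

# Up days count fully, flat days count half, down days not at all
sign = np.sign(data['Close'] - data['Open'])
data['vr'] = np.where(sign == 0, 0.5, sign.clip(lower=0))
data['acend'] = data['Volume'] * data['vr']

# 20-day Volume Ratio: rising-day volume over falling-day volume, in percent
up = data['acend'].rolling(window=20).sum()
total = data['Volume'].rolling(window=20).sum()
data['Volume Ratio'] = up.div(total - up).mul(100).fillna(0)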
I am new to Python and do not have a lot of experience.
I am trying to add constraints to this code so that the weight of a stock cannot be equal to 0, but the weight of the same stock cannot be above 5% either. This constraint should hold for all the stocks.
This is what I have so far; does anyone have tips on how to add these constraints?
Thanks in advance!
import os
import pandas as pd
import numpy as np
from scipy.optimize import linprog
data = pd.read_excel("data.xlsm")
# change data['WGT_GLOBAL'] s.t. EUTax = 20
data['Weights screened'] = data['WGT_GLOBAL']*data['Positiv screening']
EUTax = (data['Weights screened']*data['EU tax']).sum()
# min = -(x*data['YTD Return']).sum()
# s.t. x >= 0, x <= 1, (x*data['Positiv screening']*data['EU tax']).sum() = 20
obj = -(data['YTD Return'].fillna(0).to_numpy())
bnd = [(0,1)]
lhs_eq = [(data['Positiv screening']*data['EU tax']).to_numpy(), np.ones(len(data))]
rhs_eq = [[20],[1]]
opt = linprog(c=obj, A_eq=lhs_eq, b_eq=rhs_eq, bounds=bnd, method="revised simplex")
optimal_weights = opt.x
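For the constraints themselves, one reading is that every weight must be strictly positive but at most 5%; that maps directly onto linprog's bounds argument, one (min, max) pair per variable. A minimal self-contained sketch under that reading (the data here is made up, and the 1e-6 lower bound is a stand-in for "not equal to 0", since a linear program cannot express a strict inequality):
import numpy as np
from scipy.optimize import linprog

# Made-up stand-ins for the question's columns
n = 50
rng = np.random.RandomState(0)
returns = rng.uniform(-0.1, 0.2, n)       # plays the role of YTD Return
screened_tax = rng.uniform(0, 40, n)      # plays Positiv screening * EU tax

obj = -returns                            # maximize return = minimize its negative
lhs_eq = [screened_tax, np.ones(n)]       # EUTax constraint, weights sum to 1
rhs_eq = [20, 1]

# Each weight in [1e-6, 0.05]: strictly positive (approximately) and at most 5%
bnd = [(1e-6, 0.05)] * n

# "highs" because "revised simplex" is deprecated in recent SciPy
opt = linprog(c=obj, A_eq=lhs_eq, b_eq=rhs_eq, bounds=bnd, method="highs")
print(opt.x)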
I'm trying to get BTC-USD data at 1-minute intervals over one day, and then, for each open value, compare it with the next one: if the next open is greater, buy, and vice versa.
This is what I've got:
import numpy as np
import pandas as pd
import yfinance as yf
data = yf.download(tickers='BTC-USD', period='1d', interval='1m')
Opens = data['Open'].to_numpy()
for x in Opens:
    for y in Opens:
        if x > y:
            print("Buy")
        else:
            print("Sell")
Storing all the Buy/Sell decisions in a column named decision:
import numpy as np
import pandas as pd
import yfinance as yf
data = yf.download(tickers='BTC-USD', period='1d', interval='1m')
Opens = data[['Open']].copy()  # copy to avoid SettingWithCopyWarning
Opens['decision'] = np.where(Opens['Open'] > Opens['Open'].shift(1), 'Buy', 'Sell')
print(Opens)
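One detail to note: shift(1) yields NaN for the first row, so the comparison there is False and the first minute gets labelled 'Sell' by default. If you would rather leave it (and flat minutes) unlabelled, a small variation with np.select:
import numpy as np
import pandas as pd
import yfinance as yf

data = yf.download(tickers='BTC-USD', period='1d', interval='1m')
Opens = data[['Open']].copy()

prev = Opens['Open'].shift(1)
Opens['decision'] = np.select(
    [Opens['Open'] > prev, Opens['Open'] < prev],  # strictly up / strictly down
    ['Buy', 'Sell'],
    default='',  # first row and unchanged opens stay blank
)
print(Opens)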
I have a data frame which is like the following :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import csv
import seaborn as sns
import warnings
df_input = pd.read_csv('combine_input.csv', delimiter=',')
df_output = pd.read_csv('combine_output.csv', delimiter=',')
In this data frame there are many repeated rows; for example, the first row is repeated more than 1000 times, and similarly for the other rows.
When I plot the time distribution I get the figure below, which shows the frequency of the time parameter:
df_input.plot(y='time',kind = 'hist',figsize=(10,10))
plt.grid()
plt.show()
My question is: how can I take only the data inside the red rectangle, for example at time = 0.006 and frequency = 0.75e6 (see the following picture)?
Note: in place of target you have to use time, since your column name is time; or else rename your column to target.
def calRows(df, x, y):
    # Restrict to rows with target values up to the x-intercept
    df1 = pd.DataFrame(df.target[df.target <= x])
    minCount = len(df1)
    targets = df1.target.unique()
    for i in targets:
        count = len(df1[df1.target == i])
        if minCount > count:
            minCount = count
    # Cap the per-target count at the y-intercept
    if minCount > y:
        minCount = int(y)
    return minCount
You have to pass your data frame, the x-intercept of the graph, and the y-intercept of the graph to the calRows(df, x, y) function, which will return the number of rows to take for each target.
rows = calRows(df, 6, 75)
print(rows)
The takeFeatures(df, rows, x) function takes the dataframe, rows (the result of the first function), and the x-intercept of the graph, and returns the final dataframe.
def takeFeatures(df, rows, x):
    finalDf = pd.DataFrame(columns=df.columns)
    df1 = df[df.target <= x]
    targets = df1.target.unique()
    for i in targets:
        # Randomly sample the capped number of rows for each target value
        targeti = df1[df1.target == i]
        sample = targeti.sample(rows)
        finalDf = pd.concat([finalDf, sample])
    return finalDf
Calling the takeFeatures() function:
final = takeFeatures(df, rows, 6)
print(final)
Your final DataFrame will have the values you expected from the graph.
After plotting this final dataframe you will get a graph like this:
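That plot can be reproduced by running the question's histogram code on the filtered frame (a sketch; it assumes your column is named time as in the question):
import matplotlib.pyplot as plt

# Histogram of the filtered data should now be capped at the chosen frequency
final.plot(y='time', kind='hist', figsize=(10, 10))
plt.grid()
plt.show()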
I was recently given a challenge of calculating the presence of bull/bear markets, using the values -1 and 1 to denote which one is which.
It is straightforward enough to do this with a for loop, but I know this is the worst way to do these things and it's better to use numpy/pandas methods if possible. However, I'm not seeing an easy way to do it.
Are there any ways to do this, maybe using changes of +/- 20% from the current point to determine which regime you're in?
Here's a sample dataframe:
import numpy as np
import pandas as pd

dates = pd.date_range(start='1950-01-01', periods=25000)
rand = np.random.RandomState(42)
vals = np.zeros(25000)
vals[0] = 15
for i in range(1, 25000):
    vals[i] = vals[i-1] + rand.normal(0, 1)
df = pd.DataFrame(vals, columns=['Price'], index=dates)
The plot of these prices looks like this:
Anyone have any recommendations to calculate what regime the current point is in?
If you have to use a for loop then that's fine.
Here is a solution using the S&P 500 index from Yahoo! Finance (ticker ^GSPC):
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import yfinance as yf
import requests_cache
session = requests_cache.CachedSession()
df = yf.download('^GSPC', session=session)
df = df[['Adj Close']].copy()
# Drawdown: fractional decline from the running maximum
df['dd'] = df['Adj Close'].div(df['Adj Close'].cummax()).sub(1)
# Number each drawdown episode (a new one starts when dd first dips below 0)
df['ddn'] = ((df['dd'] < 0.) & (df['dd'].shift() == 0.)).cumsum()
# Deepest point reached in each episode
df['ddmax'] = df.groupby('ddn')['dd'].transform('min')
# Bear market: the episode falls below -20%, flagged only while still declining toward the trough
df['bear'] = (df['ddmax'] < -0.2) & (df['ddmax'] < df.groupby('ddn')['dd'].transform('cummin'))
# Number each bear market and collect its first and last date
df['bearn'] = ((df['bear'] == True) & (df['bear'].shift() == False)).cumsum()
bears = df.reset_index().query('bear == True').groupby('bearn')['Date'].agg(['min', 'max'])
print(bears)
df['Adj Close'].plot()
for i, row in bears.iterrows():
    plt.fill_between(row, df['Adj Close'].max(), alpha=0.25, color='r')
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.StrMethodFormatter('{x:,.0f}'))
plt.ylabel('S&P 500 Index (^GSPC)')
plt.title('S&P 500 Index with Bear Markets (> 20% Declines)')
plt.savefig('bears.png')
plt.show()
Here are the bear markets in data frame bears:
min max
bearn
1 1956-08-06 1957-10-21
2 1961-12-13 1962-06-25
3 1966-02-10 1966-10-06
4 1968-12-02 1970-05-25
5 1973-01-12 1974-10-02
6 1980-12-01 1982-08-11
7 1987-08-26 1987-12-03
8 2000-03-27 2002-10-08
9 2007-10-10 2009-03-06
10 2020-02-20 2020-03-20
11 2022-01-04 2022-06-15
Here is a plot:
Edit: I think this is an improvement from my first solution since ^GSPC provides a longer time series and bear markets are not typically dividend-adjusted.
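Since the question asked for -1/1 values, the bear flag above maps onto a regime column directly (a one-line sketch using the df computed above; 'regime' is just a hypothetical name):
# Bear market days -> -1, everything else (bull) -> 1
df['regime'] = np.where(df['bear'], -1, 1)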
I think this might work:
import numpy as np
vals = np.random.normal(0, 1, 25000)
vals[0] = 15
vals = np.cumsum(vals)
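If you also want the result to match the seeded loop in the question, draw the 24999 increments from the same RandomState and prepend the starting value before the cumulative sum (a sketch; for the legacy RandomState, one batch draw of n normals produces the same stream as n single draws, so this should reproduce the loop's values):
import numpy as np
import pandas as pd

rand = np.random.RandomState(42)
# Start at 15, then accumulate the same N(0, 1) steps the loop would draw
steps = np.concatenate(([15.0], rand.normal(0, 1, 24999)))
vals = steps.cumsum()

dates = pd.date_range(start='1950-01-01', periods=25000)
df = pd.DataFrame(vals, columns=['Price'], index=dates)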
I just wrote a function to calculate the standard deviation of one specific column of a pandas dataframe. I wanted to ask if there is a more efficient way to do this than what I have here.
import numpy as np
import pandas as pd
from astropy.table import Table
import random
def std_dev(Dataframe, column, nb_rows, av):
    DF = Dataframe
    calc = []
    for x in DF[column].tolist():
        # squared deviation from the supplied mean
        r = (x - av)**2
        calc.append(r)
    # calculating sum over calc
    R = sum(calc)
    return (R / (nb_rows - 1))**(1 / 2)

if __name__ == "__main__":
    sample = Table({
        'row': np.array([round(random.uniform(3300, 3700), 2)
                         for i in range(20)])
    })
    df = sample.to_pandas()
    a = std_dev(df, 'row', 20, 3500)
Thanks for the help.
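For what it's worth, the loop can be collapsed into a vectorized expression; a sketch that keeps your signature, including the externally supplied mean av (note this is not the same as df[column].std(), which uses the sample mean of the column):
import numpy as np

def std_dev_vec(df, column, nb_rows, av):
    # Sum of squared deviations from the supplied mean, divided by n - 1
    dev2 = (df[column].to_numpy() - av) ** 2
    return np.sqrt(dev2.sum() / (nb_rows - 1))

# Same call as before: a = std_dev_vec(df, 'row', 20, 3500)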