Why does yfinance data give different numbers at different times of day - python

I'm a beginner in Python and I've been doing a project on a stock screener for one of my classes. I create a CSV list of several tickers, loop through that list to get data from yfinance, and then check the data against a set of criteria. Here's the loop I use to get data from yfinance:
for i in allrecs:
    stock = yf.Ticker(i[0])
    PriceEarning = stock.info["forwardPE"]
    cap = stock.info["marketCap"]
    dividend = stock.info["dividendYield"]
    dividend_output = formatpercentage(dividend)
    cap_formatted = "${:,.0f}".format(cap)
    beta = stock.info['beta']
    # result = condition(EPS_forward)
    # rec = i[0], result
    print(i[0],PriceEarning, cap, dividend, beta)
The problem I'm having is that I receive different numbers at different times of day. For example, when I was working on the file at 2pm the calls returned dividend yield and beta, but now at 4pm they all return "None". Also, when I retrieve the data for an individual ticker I get the numbers, but when I run the same ticker through the loop it returns "None". How should I fix it?
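This does not explain why the values change, but one way to keep the loop running when a field is missing is to read the fields defensively. The sketch below assumes that `stock.info` behaves like a plain mapping in which some keys may be absent or set to None; `extract_fields`, `format_cap` and `sample_info` are hypothetical names used only for illustration, and the sample values are made up.

```python
from typing import Any, Mapping, Optional

# Field names taken from the loop in the question.
FIELDS = ("forwardPE", "marketCap", "dividendYield", "beta")


def extract_fields(info: Mapping[str, Any]) -> dict:
    """Return the screener fields, using None for anything that is missing."""
    return {name: info.get(name) for name in FIELDS}


def format_cap(cap: Optional[float]) -> str:
    """Format the market cap, tolerating a missing value."""
    return "n/a" if cap is None else "${:,.0f}".format(cap)


# Hypothetical payload standing in for stock.info (made-up values).
sample_info = {"forwardPE": 21.5, "marketCap": 1234567890, "beta": None}

row = extract_fields(sample_info)
missing = [name for name, value in row.items() if value is None]
print(row, format_cap(row["marketCap"]), "missing:", missing)
```

In the real loop, `info = stock.info` could be read once per ticker and passed to `extract_fields(info)`, so that tickers with gaps are reported instead of raising an error.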

Related

How to save results (and recall them when needed) of a simulation in Python?

I started an actuarial project in Python (based on the idea shown in this model) in which I want to simulate, from a set of inputs and with some degree of internal randomness added (as done here: https://github.com/Saurabh0503/Financial-modelling-and-valuationn/blob/main/Dynamic%20Salary%20Retirement%20Model%20Internal%20Randomness.ipynb), how long it will take an individual to retire, given a certain amount of wealth, a certain annual salary, and a certain annual payment (calculated as the desired cash divided by the number of years needed to retire). In my variation of the model, the user can define their own parameters, which makes the model more flexible and user-friendly, and a function calculates the desired retirement cash from the individual's propensity to save and to spend.
The problem is that I want to summarize the model's output (by taking the mean, max, min and standard deviation of wealth, salary and years to retirement), so I need to save the results and recall them later, but I have no idea how to do this.
I tried saving the simulation's output in a pandas DataFrame. In particular, I wrote this function:
def get_salary_wealth_year_case_df(data):
    all_ytrs = []
    salary = []
    wealth = []
    annual_payments = []
    for i in range(data.n_iter):
        ytr = years_to_retirement(data, print_output=False)
        sal = salary_at_year(data, year, case, print_output=False)
        wlt = wealth_at_year(data, year, prior_wealth, case, print_output=False)
        pmt = annual_pmts_case_df(wealth_at_year, year, case, print_output=False)
        all_ytrs.append(ytr)
        salary.append(sal)
        annual_payments.append(pmt)
    df = pd.DataFrame()
    df['Years to Retirement'] = all_ytrs
    df['Salary'] = sal
    df['Wealth'] = wlt
    df['Annual Payments'] = pmt
    return df
I need some feedback on what I'm doing. Am I doing it right? If so, are there more efficient ways to do it? If not, what should I do? Thanks in advance!
Given the inputs used for the function, I'm assuming your code (as it is) will do just fine in terms of computation speed.
As suggested, you can add a saving option to your function so the results that are being returned are stored in a .csv file.
def get_salary_wealth_year_case_df(data, path):
    all_ytrs = []
    salary = []
    wealth = []
    annual_payments = []
    for i in range(data.n_iter):
        ytr = years_to_retirement(data, print_output=False)
        sal = salary_at_year(data, year, case, print_output=False)
        wlt = wealth_at_year(data, year, prior_wealth, case, print_output=False)
        pmt = annual_pmts_case_df(wealth_at_year, year, case, print_output=False)
        all_ytrs.append(ytr)
        salary.append(sal)
        # Collect wealth as well, so every column has one value per iteration
        wealth.append(wlt)
        annual_payments.append(pmt)
    df = pd.DataFrame()
    df['Years to Retirement'] = all_ytrs
    # Assign the collected lists rather than the values from the last iteration
    df['Salary'] = salary
    df['Wealth'] = wealth
    df['Annual Payments'] = annual_payments
    # Save the dataframe to a given path inside your workspace
    df.to_csv(path, header=False)
    return df
After saving, returning the object is optional. It depends on whether you are going to use this dataframe later in your code.
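As a minimal sketch of the save, recall and summarize cycle asked about in the question: the data below is randomly generated and purely hypothetical (it is not the output of the retirement model), and the file name `simulation_results.csv` is an assumption. Keeping the header row when writing makes it easier to read the columns back by name.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical simulation output: one row per iteration (made-up numbers).
rng = np.random.default_rng(0)
n_iter = 1000
results = pd.DataFrame({
    "Years to Retirement": rng.integers(20, 40, size=n_iter),
    "Salary": rng.normal(60_000, 5_000, size=n_iter),
    "Wealth": rng.normal(1_500_000, 100_000, size=n_iter),
})

# Save the results, keeping the column names but not the row index.
path = os.path.join(tempfile.mkdtemp(), "simulation_results.csv")
results.to_csv(path, index=False)

# Recall the results later and summarize every column at once.
recalled = pd.read_csv(path)
summary = recalled.agg(["mean", "max", "min", "std"])
print(summary)
```

`summary` has one row per statistic and one column per simulated quantity, so a single value can be picked out with, for example, `summary.loc["mean", "Wealth"]`.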

How to match asset price data from a csv file to another csv file with relevant news by date

I am researching the impact of news article sentiment related to a financial instrument and its potential effect on the instrument's price. I tried to take the timestamp of each news item, truncate it to the minute (i.e. remove the second and microsecond components), and get the base share price of the instrument at that time and at several intervals after it, in our case t+2. However, the program adds the twoM column but does not fill it with any calculated price changes.
Previously, I used Reuters Eikon and its functions to conduct the research, described in the article below.
https://developers.refinitiv.com/article/introduction-news-sentiment-analysis-eikon-data-apis-python-example
However, instead of using data available from Eikon, I would like to use my own csv news file with my own price data from another csv file. I am trying to match the
excel_file = 'C:\\Users\\Artur\\PycharmProjects\\JRA\\sentimenteikonexcel.xlsx'
df = pd.read_excel(excel_file)
sentiment = df.Sentiment
print(sentiment)
start = df['GMT'].min().replace(hour=0,minute=0,second=0,microsecond=0).strftime('%Y/%m/%d')
end = df['GMT'].max().replace(hour=0,minute=0,second=0,microsecond=0).strftime('%Y/%m/%d')
spot_data = 'C:\\Users\\Artur\\Desktop\\stocksss.csv'
spot_price_10 = pd.read_csv(spot_data)
print(spot_price_10)
df['twoM'] = np.nan
for idx, newsDate in enumerate(df['GMT'].values):
    sTime = df['GMT'][idx]
    sTime = sTime.replace(second=0, microsecond=0)
    try:
        t0 = spot_price_10.iloc[spot_price_10.index.get_loc(sTime),2]
        df['twoM'][idx] = ((spot_price_10.iloc[spot_price_10.index.get_loc((sTime + datetime.timedelta(minutes=10))),3]/(t0)-1)*100)
    except:
        pass
print(df)
However, the program does not return the twoM price change values.
I assume that you got a warning because you are trying to write to a view. With chained indexing, i.e. two sets of [] (one for the column, one for the row), the assignment may land on a temporary copy, so that form should only be used for reading. You must use loc or iloc to write a value:
...
    try:
        t0 = spot_price_10.iloc[spot_price_10.index.get_loc(sTime),2]
        df.loc[idx,'twoM'] = ((spot_price_10.iloc[spot_price_10.index.get_loc((sTime + datetime.timedelta(minutes=10))),3]/(t0)-1)*100)
    except:
        pass
...
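A further point worth checking: in the question's code the price CSV is read with the default integer index, so `index.get_loc(sTime)` would likely raise a KeyError for every timestamp, and the bare `except: pass` hides it. The sketch below shows one way to do the lookup without a loop, assuming the prices are indexed by minute-level timestamps. The price and news values are made up, and the column names `close` and `Sentiment` are assumptions about the files.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level prices, indexed by timestamp (made-up values).
prices = pd.DataFrame(
    {"close": [100.0, 101.0, 102.0, 103.0, 104.0]},
    index=pd.date_range("2020-01-02 09:00", periods=5, freq="min"),
)

# Hypothetical news items with second-level timestamps.
news = pd.DataFrame({
    "GMT": pd.to_datetime([
        "2020-01-02 09:00:17",
        "2020-01-02 09:02:40",
        "2020-01-02 09:04:05",
    ]),
    "Sentiment": [0.4, -0.1, 0.2],
})

# Truncate each news timestamp to the minute.
news_minute = news["GMT"].dt.floor("min")

# Look up the price at t0 and at t0 + 2 minutes; missing minutes become NaN.
t0 = prices["close"].reindex(news_minute).to_numpy()
t2 = prices["close"].reindex(news_minute + pd.Timedelta(minutes=2)).to_numpy()

# Percentage price change two minutes after each news item.
news["twoM"] = (t2 / t0 - 1) * 100
print(news)
```

Rows whose t0 or t+2 minute is not in the price data end up as NaN instead of being skipped silently, which makes missing matches easy to spot.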

Alpha Vantage stockinfo only collects 4 dfs properly formatted, not 6

I can get stock info for 4 tickers from Alpha Vantage, but the remaining DataFrames do not receive the data I ask for. As a result, my concatenated df gets interpreted as NoneType (because the first 4 dfs are formatted differently from the last 2). That is not my main problem, though. The problem is that only 4 of my requests succeed. If I can fix that, the resulting concatenated df will be intact.
My code
import pandas as pd
import datetime
import requests
from alpha_vantage.timeseries import TimeSeries
import time
tickers = []
def alvan_csv(stocklist):
    api_key = 'demo' # For use with Alpha Vantage stock-info retrieval.
    for ticker in stocklist:
        #data=requests.get('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=%s&apikey={}'.format(api_key) %(ticker))
        df = pd.read_csv('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&datatype=csv&symbol=%s&apikey={}'.format(api_key) %(ticker))#, index_col = 0) &outputsize=full
        df['ticker'] = ticker
        tickers.append(df)
        # concatenate all the dfs
        df = pd.concat(tickers)
        print('\ndata before json parsing for === %s ===\n%s' %(ticker,df))
        df['adj_close'] = df['adjusted_close']
        del df['adjusted_close']
        df['date'] = df['timestamp']
        del df['timestamp']
        df = df[['date','ticker','adj_close','volume','dividend_amount','split_coefficient','open','high','low']] #
        df=df.sort_values(['ticker','date'], inplace=True)
        time.sleep(20.3)
        print('\ndata after col reshaping for === %s ===\n%s' %(ticker,df))
    return df
if __name__ == '__main__':
    stocklist = ['vws.co','nflx','mmm','abt','msft','aapl']
    df = alvan_csv(stocklist)
NB. Please note that to use the Alpha Vantage API, you need a free API key, which you may obtain here: https://www.alphavantage.co/support/#api-key
Replace the demo API Key with your API Key to make this code work.
Any ideas as to get this to work?
Apparently Alpha Vantage has a fairly low fair-usage allowance, measured in number of queries per minute. In effect, only the first 4 stocks are allowed at full speed. The remaining requests need to pause before downloading so as not to violate the fair-usage policy.
I have now introduced a pause between my stock-queries. At the moment I get approx 55% of my stocks, if I pause for 10 sec. between calls, and 100% if I pause for 15 seconds.
I will be testing exactly how low the pause can be set to allow for 100% of stocks to come through.
I must say, compared to the super high-speed train we had at finance.yahoo.com, this strikes me as a steam train. Really, really slow downloads. Getting my 500 tickers takes me 2½ hours. But I guess beggars can't be choosers. This is a free service and I will manage with this.
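A minimal sketch of the pause-between-calls approach described above. `download_all` and `fake_fetch` are hypothetical names: `fake_fetch` stands in for the `pd.read_csv(...)` call against the Alpha Vantage URL so that the example runs without a network connection or an API key, and the 15-second pause is the value reported above, not a documented limit. The sketch also concatenates once after the loop and uses the return value of `sort_values`, because `sort_values(..., inplace=True)` returns None, which would explain the NoneType result mentioned in the question.

```python
import time

import pandas as pd


def download_all(tickers, fetch_one, pause_seconds=15.0, sleep=time.sleep):
    """Fetch one DataFrame per ticker, pausing between consecutive requests."""
    frames = []
    for n, ticker in enumerate(tickers):
        if n > 0:
            sleep(pause_seconds)  # wait between calls, not before the first one
        frame = fetch_one(ticker)
        frame["ticker"] = ticker
        frames.append(frame)
    # Concatenate once, after all downloads have finished.
    combined = pd.concat(frames, ignore_index=True)
    # sort_values returns a new DataFrame, so use its return value.
    return combined.sort_values(["ticker", "timestamp"]).reset_index(drop=True)


def fake_fetch(ticker):
    """Hypothetical stand-in for the real download (made-up values)."""
    return pd.DataFrame({
        "timestamp": ["2020-01-03", "2020-01-02"],
        "adjusted_close": [2.0, 1.0],
    })


# Record the requested pauses instead of actually sleeping in this demo.
pauses = []
result = download_all(["nflx", "mmm", "abt"], fake_fetch, sleep=pauses.append)
print(result)
print("pauses requested:", pauses)
```

With a real fetch function, leaving `sleep` at its default makes the loop actually wait `pause_seconds` between requests.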

more efficient way of looping over name list in python

I'm playing around with Bittrex's API to get the current price of a coin (e.g. BTC-LTC). So in this case, the API call will read:
r = requests.get('https://bittrex.com/api/v1.1/public/getticker?market=BTC-LTC').json()
pd = pandas.DataFrame(r)
print(pd)
If I want to get the current price of maybe 50 or 200 different coins, I wrote a loop to replace BTC-LTC with that particular market coin name (the names come from another part of the Bittrex API).
for i in marketnames:
    r = requests.get('https://bittrex.com/api/v1.1/public/getticker?market={names}'.format(names=i)).json()
    pd = pandas.DataFrame(r)
    print(pd)
The problem with this loop is that it goes through 1 by 1, iterating over the list of coin names, 200 times to get the price.
Is there a more efficient way of doing this?
Was there a typo in your code? If you iterate through the marketnames list, you should use i in the request URL, as below:
for i in marketnames:
    r = requests.get('https://bittrex.com/api/v1.1/public/getticker?market={names}'.format(names=i)).json()
    pd = pandas.DataFrame(r)
    print(pd)
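On the efficiency question itself: each request spends most of its time waiting on the network, so one possible approach is to issue several requests concurrently with a thread pool. This is a sketch, not a tested Bittrex client. `fetch_ticker` is a hypothetical stand-in that only simulates latency, the market names are illustrative, and any rate limits the API enforces would still apply.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_ticker(market):
    """Hypothetical stand-in for one HTTP request for a single market."""
    time.sleep(0.05)  # simulate network latency
    return {"market": market, "Last": None}  # placeholder payload


def fetch_all(markets, fetch=fetch_ticker, max_workers=8):
    """Run the fetch function for every market using a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the input list.
        return list(pool.map(fetch, markets))


marketnames = ["BTC-LTC", "BTC-ETH", "BTC-XRP", "BTC-DOGE"]
results = fetch_all(marketnames)
print(results)
```

In real use, `fetch_ticker` would wrap the `requests.get(...).json()` call from the loop above, and `max_workers` would be kept small enough to respect the API's limits.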

Loop variables through python urls on a list

I can get one stock ticker into my URL, but how do I create a list and loop through it to build each URL? Please see my failed attempt below. It's for a tweepy project I'm fiddling with. I'm mostly concerned with getting through multiple URLs.
ticker=["AAPL","XOM"]
For i < len(ticker):
    responeData = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol="+str(ticker[i])+"&apikey=XXXXXX")
    symbol = str(responeData.json()['Meta Data']['2. Symbol'])
    refresh = str(responeData.json()['Meta Data']['3. Last Refreshed'])
    checkclose = str(responeData.json()['Time Series (Daily)'])
    close=str(responeData.json()['Time Series (Daily)'][refresh]['4. close'])
    api.update_status(status=symbol+' '+refresh+' Close Price: $'+close)
You can use a for loop to iterate your list.
tickers=["AAPL","XOM"]
for ticker in tickers:
    responeData = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol="+ticker+"&apikey=XXXXXX")
    data = responeData.json()  # parse the response body once
    symbol = str(data['Meta Data']['2. Symbol'])
    refresh = str(data['Meta Data']['3. Last Refreshed'])
    checkclose = str(data['Time Series (Daily)'])
    close=str(data['Time Series (Daily)'][refresh]['4. close'])
    api.update_status(status=symbol+' '+refresh+' Close Price: $'+close)
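The URL building and the response parsing can also be split into two small functions, which makes the loop easier to test without calling the API. This is a sketch: `build_url` and `latest_close` are hypothetical helper names, the sample payload contains made-up values, and only the JSON keys are taken from the code above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_url(ticker, api_key="XXXXXX"):
    """Build the daily-adjusted query URL for one ticker."""
    params = {
        "function": "TIME_SERIES_DAILY_ADJUSTED",
        "symbol": ticker,
        "apikey": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def latest_close(payload):
    """Read the most recent close from a parsed response."""
    refresh = payload["Meta Data"]["3. Last Refreshed"]
    return payload["Time Series (Daily)"][refresh]["4. close"]


tickers = ["AAPL", "XOM"]
urls = [build_url(ticker) for ticker in tickers]

# Hypothetical payload with made-up values, shaped like the keys used above.
sample = {
    "Meta Data": {"2. Symbol": "AAPL", "3. Last Refreshed": "2020-01-02"},
    "Time Series (Daily)": {"2020-01-02": {"4. close": "100.00"}},
}

for url in urls:
    print(url)
print(latest_close(sample))
```

In the real loop, each URL would be passed to `requests.get(url)` and the parsed JSON handed to `latest_close`.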
