I'm trying to build a cryptocurrency price tracker in Python (see code below). I'm working with Python 3.10.1 in Visual Studio Code.
import pandas_datareader.data as web
import datetime as dt

currency = 'EUR'
metric = 'Close'
crypto = ['BTC', 'ETH']
colnames = []
first = True

start = dt.datetime(2020, 1, 1)
end = dt.datetime.now()

for ticker in crypto:
    data = web.DataReader(f'{crypto}-{currency}', 'yahoo', start, end)
    if first:
        combined = data[[metric]].copy()
        colnames.append(ticker)
        combined.columns = colnames
        first = False
    else:
        combined = combined.join(data[metric])
        colnames.append(ticker)
        combined.columns = colnames
When I execute this code, I get the following error:
RemoteDataError: No data fetched for symbol ['BTC', 'ETH']-EUR using YahooDailyReader
When I change the variable crypto to the string 'BTC' to pull only BTC prices, the code runs, but the output looks like this:
Date               B            T            C
2020-01-01   6417.781738  6417.781738  6417.781738
2020-01-02   6252.938477  6252.938477  6252.938477
2020-01-03   6581.735840  6581.735840  6581.735840
In the scenario of only pulling BTC, the variable colnames looks like this: colnames = ['B', 'T', 'C']. I suspect there's something wrong with that variable, and it's potentially the reason why my code fails when I try to pull the data for multiple cryptocurrencies, but I can't quite figure it out and solve my problem.
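For reference, the two symptoms share one cause: the f-string interpolates the whole crypto variable instead of the loop variable ticker, so with crypto = ['BTC', 'ETH'] the symbol becomes ['BTC', 'ETH']-EUR, and with crypto = 'BTC' the loop iterates over the characters 'B', 'T', 'C'. A minimal sketch of a fix (assuming the 'yahoo' source in pandas_datareader still serves these symbols):

import pandas as pd
import pandas_datareader.data as web
import datetime as dt

currency = 'EUR'
metric = 'Close'
crypto = ['BTC', 'ETH']  # keep this a list so the loop yields whole tickers, not characters

start = dt.datetime(2020, 1, 1)
end = dt.datetime.now()

frames = []
for ticker in crypto:
    # interpolate the loop variable, not the whole list
    data = web.DataReader(f'{ticker}-{currency}', 'yahoo', start, end)
    frames.append(data[metric].rename(ticker))

combined = pd.concat(frames, axis=1)  # one Close column per ticker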
I am trying to get the current price and market cap of all of the tickers in the S&P 500. The way I am currently doing it is very slow, so I was wondering if there is anything I could do to improve it, or whether there are other methods.
Here is my current method, which simply prints the ticker, market cap and current price:
import yfinance as yf

# I am using a csv file with a list of all the tickers, which I use to create a pandas
# dataframe and form a space-separated string of all of the tickers called all_symbols.
# I have simplified the pandas dataframe to a list for the purpose of this question.
ticker_list = ["A", "AL", "AAP", "AAPL", ... "ZBRA", "ZION", "ZTS"]
all_symbols = " ".join(ticker_list)
tickers = yf.Tickers(all_symbols)

for ticker in ticker_list:
    price = tickers.tickers[ticker].info["currentPrice"]
    market_cap = tickers.tickers[ticker].info["marketCap"]
    print(ticker, market_cap, price)
This method is currently very slow, and the information is retrieved one ticker at a time. Is there any way to make it faster and/or get the ticker info as a batch?
I have also tried using the yf.download method to download information on multiple tickers at once, which was faster, but I could not get the information I wanted from it. Is it possible to get the market cap and current price using the yf.download method?
Although there have been similar questions to this, they all seem to use the same general idea that I use, which takes a long time when the number of tickers is high. I have yet to find any solution faster than my current one, so any suggestions are appreciated, including solutions that don't use yfinance, as long as they get real-time data without a massive delay.
There is another library you can try called yahooquery. In my trial, the time dropped from 34 seconds to 0.4 seconds.
from yahooquery import Ticker

ticker_list = ["A", "AL", "AAP", "AAPL", "ZBRA", "ZION", "ZTS"]
all_symbols = " ".join(ticker_list)
myInfo = Ticker(all_symbols)
myDict = myInfo.price

for ticker in ticker_list:
    ticker = str(ticker)
    longName = myDict[ticker]['longName']
    market_cap = myDict[ticker]['marketCap']
    price = myDict[ticker]['regularMarketPrice']
    print(ticker, longName, market_cap, price)
There is a lot of other information in the myDict dictionary; check it out.
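For instance, a quick sketch to see which fields are available for a symbol (the exact keys depend on what Yahoo returns for that symbol):

# list every field yahooquery exposes for one symbol
print(sorted(myDict["AAPL"].keys()))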
You may find that getting the values for a single ticker in a discrete thread will give you better overall performance. Here's an example:
import yfinance as yf
from concurrent.futures import ThreadPoolExecutor

def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    print(f"{ticker} {info['currentPrice']} {info['marketCap']}")

ticker_list = ['AAPL', 'ORCL', 'PREM.L', 'UKOG.L', 'KOD.L', 'TOM.L', 'VELA.L', 'MSFT', 'AMZN', 'GOOG']

with ThreadPoolExecutor() as executor:
    executor.map(get_stats, ticker_list)
Output:
VELA.L 0.035 6004320
UKOG.L 0.1139 18496450
PREM.L 0.461 89516976
ORCL 76.755 204970377216
MSFT 294.8669 2210578825216
TOM.L 0.604 10558403
KOD.L 0.3 47496900
AMZN 3152.02 1603886514176
AAPL 171.425 2797553057792
GOOG 2698.05 1784584732672
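If you want the values collected rather than printed, executor.map returns each call's result in input order; a sketch of that variation using the same yfinance calls:

import yfinance as yf
from concurrent.futures import ThreadPoolExecutor

def get_stats(ticker):
    info = yf.Tickers(ticker).tickers[ticker].info
    # return a tuple instead of printing, so results can be collected
    return ticker, info.get('currentPrice'), info.get('marketCap')

ticker_list = ['AAPL', 'ORCL', 'MSFT']

with ThreadPoolExecutor() as executor:
    results = {}
    for ticker, price, mcap in executor.map(get_stats, ticker_list):
        results[ticker] = {'price': price, 'market_cap': mcap}

print(results)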
I am trying to scrape historical data from a table on coinmarketcap. However, the code that I run gives back "no data." I thought it would be fairly easy, but I'm not sure what I am missing.
import requests
from bs4 import BeautifulSoup

url = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
data = requests.get(url)
bs = BeautifulSoup(data.text, "lxml")
table_body = bs.find('tbody')
rows = table_body.find_all('tr')

for row in rows:
    cols = row.find_all('td')
    cols = [x.text.strip() for x in cols]
    print(cols)
Output:
C:\Users\Ejer\anaconda3\envs\pythonProject\python.exe C:/Users/Ejer/PycharmProjects/pythonProject/CloudSQL_test.py
['No Data']
Process finished with exit code 0
You don't need to scrape the data; you can request it directly:
import time
import requests
import pandas as pd

def get_timestamp(datetime: str):
    return int(time.mktime(time.strptime(datetime, '%Y-%m-%d %H:%M:%S')))

def get_btc_quotes(start_date: str, end_date: str):
    start = get_timestamp(start_date)
    end = get_timestamp(end_date)
    url = f'https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical?id=1&convert=USD&time_start={start}&time_end={end}'
    return requests.get(url).json()

data = get_btc_quotes(start_date='2020-12-01 00:00:00',
                      end_date='2020-12-10 00:00:00')

# making A LOT of assumptions here, hopefully the keys don't change in the future
data_flat = [quote['quote']['USD'] for quote in data['data']['quotes']]
df = pd.DataFrame(data_flat)
print(df)
Output:
open high low close volume market_cap timestamp
0 18801.743593 19308.330663 18347.717838 19201.091157 3.738770e+10 3.563810e+11 2020-12-02T23:59:59.999Z
1 19205.925404 19566.191884 18925.784434 19445.398480 3.193032e+10 3.609339e+11 2020-12-03T23:59:59.999Z
2 19446.966422 19511.404714 18697.192914 18699.765613 3.387239e+10 3.471114e+11 2020-12-04T23:59:59.999Z
3 18698.385279 19160.449265 18590.193675 19154.231131 2.724246e+10 3.555639e+11 2020-12-05T23:59:59.999Z
4 19154.180593 19390.499895 18897.894072 19345.120959 2.529378e+10 3.591235e+11 2020-12-06T23:59:59.999Z
5 19343.128798 19411.827676 18931.142919 19191.631287 2.689636e+10 3.562932e+11 2020-12-07T23:59:59.999Z
6 19191.529463 19283.478339 18269.945444 18321.144916 3.169229e+10 3.401488e+11 2020-12-08T23:59:59.999Z
7 18320.884784 18626.292652 17935.547820 18553.915377 3.442037e+10 3.444865e+11 2020-12-09T23:59:59.999Z
8 18553.299728 18553.299728 17957.065213 18264.992107 2.554713e+10 3.391369e+11 2020-12-10T23:59:59.999Z
Your problem is basically that the table is created dynamically by JavaScript, so scraping it directly would require running a JS interpreter (e.g. a headless browser). However, you can just check the network monitor in your browser's developer tools, find the underlying request, and it will likely return full JSON or XML raw data, so you don't need to scrape at all. I did this and got this request:
https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical?id=1&convert=USD&time_start=1604016000&time_end=1609286400
Check it out; I hope it helps!
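For completeness, a minimal sketch of calling that endpoint directly (assuming it still returns the same JSON layout shown in the previous answer):

import requests

url = ('https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical'
       '?id=1&convert=USD&time_start=1604016000&time_end=1609286400')
payload = requests.get(url).json()

# each element carries OHLCV data under quote -> USD, per the answer above
for quote in payload['data']['quotes']:
    print(quote['quote']['USD']['close'])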
I am researching the impact of news article sentiment related to a financial instrument and its potential effect on the instrument's price. I take the timestamp of each news item, truncate it to minute data (i.e. remove the second and microsecond components), and get the base share price of the instrument at that time and at several intervals after that time, in our case t+2. However, the program creates the twoM column in the file but does not return any calculated price changes.
Previously, I used Reuters Eikon and its functions to conduct the research, described in the article below.
https://developers.refinitiv.com/article/introduction-news-sentiment-analysis-eikon-data-apis-python-example
However, instead of using data available from Eikon, I would like to use my own csv news file with my own price data from another csv file. I am trying to match the timestamp of each news item to the corresponding timestamp in the price file.
import datetime
import numpy as np
import pandas as pd

excel_file = 'C:\\Users\\Artur\\PycharmProjects\\JRA\\sentimenteikonexcel.xlsx'
df = pd.read_excel(excel_file)
sentiment = df.Sentiment
print(sentiment)

start = df['GMT'].min().replace(hour=0, minute=0, second=0, microsecond=0).strftime('%Y/%m/%d')
end = df['GMT'].max().replace(hour=0, minute=0, second=0, microsecond=0).strftime('%Y/%m/%d')

spot_data = 'C:\\Users\\Artur\\Desktop\\stocksss.csv'
spot_price_10 = pd.read_csv(spot_data)
print(spot_price_10)

df['twoM'] = np.nan

for idx, newsDate in enumerate(df['GMT'].values):
    sTime = df['GMT'][idx]
    sTime = sTime.replace(second=0, microsecond=0)
    try:
        t0 = spot_price_10.iloc[spot_price_10.index.get_loc(sTime), 2]
        df['twoM'][idx] = ((spot_price_10.iloc[spot_price_10.index.get_loc((sTime + datetime.timedelta(minutes=10))), 3] / t0 - 1) * 100)
    except:
        pass

print(df)
However, the program is not able to return the twoM price change values.
I assume you got a warning because you are trying to make changes to a view. As soon as you use two [] lookups (one for the column, one for the row), you can only read; you must use loc or iloc to write a value:
...
try:
    t0 = spot_price_10.iloc[spot_price_10.index.get_loc(sTime), 2]
    df.loc[idx, 'twoM'] = ((spot_price_10.iloc[spot_price_10.index.get_loc((sTime + datetime.timedelta(minutes=10))), 3] / t0 - 1) * 100)
except:
    pass
...
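To see the difference in isolation, here is a standalone sketch (toy data, not the asker's files):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'twoM': [None, None, None]})

# chained indexing: may write to a temporary copy and raise SettingWithCopyWarning
df['twoM'][0] = 42

# label-based assignment: always writes to the original frame
df.loc[0, 'twoM'] = 42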
I can get stock info for 4 tickers from Alpha Vantage before the remaining DataFrames stop receiving the stock info I ask for. My resulting concatenated df then gets interpreted as NoneType (because the first 4 dfs are formatted differently than the last 2). That is not my problem, though; the real problem is that only 4 of my requests succeed. If I can fix that, the resulting concatenated df will be intact.
My code:
import pandas as pd
import datetime
import requests
from alpha_vantage.timeseries import TimeSeries
import time

tickers = []

def alvan_csv(stocklist):
    api_key = 'demo'  # For use with Alpha Vantage stock-info retrieval.
    for ticker in stocklist:
        #data=requests.get('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=%s&apikey={}'.format(api_key) %(ticker))
        df = pd.read_csv('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&datatype=csv&symbol=%s&apikey={}'.format(api_key) % (ticker))  #, index_col = 0) &outputsize=full
        df['ticker'] = ticker
        tickers.append(df)
        # concatenate all the dfs
        df = pd.concat(tickers)
        print('\ndata before json parsing for === %s ===\n%s' % (ticker, df))
        df['adj_close'] = df['adjusted_close']
        del df['adjusted_close']
        df['date'] = df['timestamp']
        del df['timestamp']
        df = df[['date', 'ticker', 'adj_close', 'volume', 'dividend_amount', 'split_coefficient', 'open', 'high', 'low']]
        df = df.sort_values(['ticker', 'date'], inplace=True)  # nb: with inplace=True, sort_values returns None
        time.sleep(20.3)
        print('\ndata after col reshaping for === %s ===\n%s' % (ticker, df))
    return df

if __name__ == '__main__':
    stocklist = ['vws.co', 'nflx', 'mmm', 'abt', 'msft', 'aapl']
    df = alvan_csv(stocklist)
NB: to use the Alpha Vantage API, you need a free API key, which you may obtain here: https://www.alphavantage.co/support/#api-key
Replace the demo API key with your own key to make this code work.
Any ideas as to how to get this to work?
Apparently Alpha Vantage has a pretty low fair-usage allowance, measured in number of queries per minute. So in effect only the first 4 stocks are allowed at full speed; the rest of the stocks need a pause before downloading so as not to violate their fair-usage policy.
I have now introduced a pause between my stock queries. At the moment I get approx. 55% of my stocks if I pause for 10 sec between calls, and 100% if I pause for 15 seconds.
I will be testing exactly how low the pause can be set while still allowing 100% of the stocks to come through.
I must say that compared to the super-high-speed train we had at finance.yahoo.com, this strikes me as a steam train: really, really slow downloads. To get my 500-odd tickers takes me 2½ hours. But I guess beggars can't be choosers. This is a free service and I will manage with it.
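For illustration, a minimal sketch of that pause in a download loop (fetch_with_pause is a hypothetical helper name; 15 seconds is the interval that worked above, and the CSV endpoint is the same one used in the question):

import time
import pandas as pd

def fetch_with_pause(stocklist, api_key, pause=15):
    frames = []
    for ticker in stocklist:
        url = ('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED'
               '&datatype=csv&symbol={}&apikey={}'.format(ticker, api_key))
        frames.append(pd.read_csv(url))
        time.sleep(pause)  # stay under Alpha Vantage's queries-per-minute limit
    return pd.concat(frames)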
I can get one stock ticker into my URL, but how do I create a list and loop through the URLs? Please see my failed attempt below; it's for a tweepy project I'm fiddling with. I'm mostly worried about getting through multiple URLs.
ticker = ["AAPL", "XOM"]
For i < len(ticker):
    responeData = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol="+str(ticker[i])+"&apikey=XXXXXX")
    symbol = str(responeData.json()['Meta Data']['2. Symbol'])
    refresh = str(responeData.json()['Meta Data']['3. Last Refreshed'])
    checkclose = str(responeData.json()['Time Series (Daily)'])
    close = str(responeData.json()['Time Series (Daily)'][refresh]['4. close'])
    api.update_status(status=symbol+' '+refresh+' Close Price: $'+close)
You can use a for loop to iterate over your list.
import requests

tickers = ["AAPL", "XOM"]
for ticker in tickers:
    responeData = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol="+ticker+"&apikey=XXXXXX")
    symbol = str(responeData.json()['Meta Data']['2. Symbol'])
    refresh = str(responeData.json()['Meta Data']['3. Last Refreshed'])
    checkclose = str(responeData.json()['Time Series (Daily)'])
    close = str(responeData.json()['Time Series (Daily)'][refresh]['4. close'])
    api.update_status(status=symbol+' '+refresh+' Close Price: $'+close)
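As a small follow-up, each .json() call re-parses the HTTP response; parsing it once is tidier. A sketch (api is assumed to be the asker's tweepy client):

import requests

tickers = ["AAPL", "XOM"]
for ticker in tickers:
    resp = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=" + ticker + "&apikey=XXXXXX")
    payload = resp.json()  # parse the JSON body once and reuse it
    symbol = payload['Meta Data']['2. Symbol']
    refresh = payload['Meta Data']['3. Last Refreshed']
    close = payload['Time Series (Daily)'][refresh]['4. close']
    api.update_status(status=symbol + ' ' + refresh + ' Close Price: $' + close)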