Scaling this python script for multiple stocks - python

First off, thank you for taking the time to help me. We are using this python script that pulls data from Yahoo for a given time period.
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web
style.use('ggplot')
start = dt.datetime(2007,1,1)
end = dt.datetime(2022,1,31)
df = web.DataReader('AAPL','yahoo', start, end)
df.to_csv('AAPL.csv')
The code above grabs the data we need from Yahoo for the AAPL stock for the dates we set, then writes a CSV for that stock. The problem is that we need to do this for 5000 different stocks. We have a CSV file with all the tickers we need to run this program over. How can we modify our code to loop over the tickers in our CSV, instead of running this program manually 5000 times?

You don't have to write the intermediate dataframes (for individual stock symbols) to file. Try something like:
tickers = pd.read_csv(r'path_to/symbols.csv')['symbol_column_name'].values
full_df = pd.DataFrame({})
for ticker in tickers:
    df = web.DataReader(ticker,'yahoo', start, end)
    full_df = pd.concat((full_df, df))
# Now write merged table to file
full_df.to_csv('my_output_table.csv')
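With thousands of tickers, two things tend to matter in a loop like this: one failing symbol should not abort the whole run, and the merged table needs a column saying which ticker each row belongs to. The following is only a rough sketch of that idea. The names `collect_history`, `fetch`, and `fake_fetch` are hypothetical and not part of pandas_datareader; `fake_fetch` is a stand-in that returns made-up prices so the example runs without a network connection.

```python
import pandas as pd

def collect_history(tickers, fetch):
    """Fetch one DataFrame per ticker and stack them, skipping failures.

    `fetch` is any callable taking a ticker and returning a DataFrame
    (hypothetical hook; in the thread it would wrap web.DataReader).
    """
    frames, failed = [], []
    for ticker in tickers:
        try:
            df = fetch(ticker)
        except Exception as exc:  # a bad symbol should not abort the whole run
            failed.append((ticker, str(exc)))
            continue
        df = df.copy()
        df.insert(0, 'Symbol', ticker)  # remember which ticker the rows belong to
        frames.append(df)
    # Concatenate once at the end instead of growing a DataFrame inside the loop
    full_df = pd.concat(frames) if frames else pd.DataFrame()
    return full_df, failed

def fake_fetch(ticker):
    """Stand-in for the real download; returns made-up prices."""
    if ticker == 'BAD':
        raise KeyError('no data')
    idx = pd.to_datetime(['2022-01-03', '2022-01-04'])
    return pd.DataFrame({'Close': [1.0, 2.0]}, index=idx)

full_df, failed = collect_history(['AAPL', 'BAD', 'MSFT'], fake_fetch)
```

With the setup from the question, `fetch` could be something like `lambda t: web.DataReader(t, 'yahoo', start, end)`, and `failed` lists the tickers to retry later.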

Related

Append YahooQuery (yfinance alternative) financial data to Excel for thousands of stocks to save time and offset errors?

The code below runs great, but it only pulls data for three symbols (AAPL, GOOG, and MSFT). Errors occur as the number of symbols goes into the thousands, no output file is produced, and hours of time are wasted as a result.
Thus, I would like to append the results to the Excel output file as the data is gathered for each symbol individually (or append 100 symbols at a time). That way, if an error occurs, the data gathered so far is still saved to a file.
import pandas as pd
from yahooquery import Ticker
symbols = ['AAPL','GOOG','MSFT']
faang = Ticker(symbols)
# Call all_financial_data once and keep the result (the bare call above it was redundant)
df = faang.all_financial_data(frequency='q')
df.to_excel('output.xlsx')
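One way to avoid losing hours of work is to process the symbols in batches and write each batch to disk as soon as it arrives. The sketch below is a hedged illustration, not yahooquery's own API: `chunked`, `save_in_batches`, and `fake_fetch_batch` are hypothetical names, the fetch function is a stand-in that returns made-up values, and CSV is swapped in for Excel because appending rows to a CSV is a single `to_csv(mode='a')` call. It also assumes every batch returns the same columns in the same order; if that does not hold, writing one file per batch would be safer.

```python
import os
import tempfile
import pandas as pd

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def save_in_batches(symbols, fetch_batch, out_path, batch_size=100):
    """Fetch symbols batch by batch, appending each result to out_path.

    Returns the symbols belonging to batches that raised an error.
    """
    failed = []
    for batch in chunked(symbols, batch_size):
        try:
            df = fetch_batch(batch)
        except Exception:
            failed.extend(batch)  # keep going; earlier batches are already on disk
            continue
        write_header = not os.path.exists(out_path)  # header only for the first write
        df.to_csv(out_path, mode='a', header=write_header)
    return failed

def fake_fetch_batch(batch):
    """Stand-in for the real download; returns made-up values."""
    if 'BAD' in batch:
        raise ValueError('simulated failure')
    return pd.DataFrame({'value': [1.0] * len(batch)},
                        index=pd.Index(batch, name='symbol'))

out_path = os.path.join(tempfile.mkdtemp(), 'output.csv')
failed = save_in_batches(['AAPL', 'GOOG', 'BAD', 'X', 'MSFT'],
                         fake_fetch_batch, out_path, batch_size=2)
saved = pd.read_csv(out_path, index_col=0)
```

In the question's setting, `fetch_batch` could be something like `lambda batch: Ticker(batch).all_financial_data(frequency='q')`, and the symbols in `failed` can be retried in a later run.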

Retrieve a lot of data from Yahoo finance

I have a CSV file which contains the ticker symbols for all the stocks listed on Nasdaq. Here is a link to that CSV file; one can download it from there. There are more than 8000 stocks listed. The following is the code:
import pandas as pd
import yfinance as yf  # pip install yfinance
tick_pd = pd.read_csv("/path/to/the/csv/file/nasdaq_screener_1654004691484.csv",
                      usecols = [0])
I have made a function which retrieves the historical stock prices for a ticker symbol. That function is as follows:
## function to be applied on each stock symbol
def appfunc(ticker):
    A = yf.Ticker(ticker).history(period="max")
    A["symbol"] = ticker
    return A
And I apply this function to each row of tick_pd, as follows:
hist_prices = tick_pd.apply(appfunc)
But this takes far too much time. I was hoping someone could suggest a way to retrieve this data more quickly, or a way to parallelize it. I am quite new to Python, so I don't know many ways to do this.
Thanks in advance
You can use yf.download to download all tickers asynchronously:
tick_pd = pd.read_csv('nasdaq_screener_1654024849057.csv', usecols=[0])
df = yf.download(tick_pd['Symbol'].tolist(), period='max')
You can also pass threads as a parameter of yf.download:
# Enable mass downloading (default is True)
df = yf.download(tick_pd['Symbol'].tolist(), period='max', threads=True)
# OR
# You can control the number of threads
df = yf.download(tick_pd['Symbol'].tolist(), period='max', threads=8)
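If you would rather keep a per-ticker function like the one in the question, a thread pool from the standard library is one way to run it for many tickers at once. This is a minimal sketch under stated assumptions: `fetch_all` and `fake_history` are hypothetical names, and `fake_history` returns made-up prices in place of a real `yf.Ticker(...).history(...)` call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def fetch_all(tickers, fetch_one, max_workers=8):
    """Run fetch_one(ticker) in a thread pool and stack the results.

    pool.map returns results in the same order as `tickers`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        frames = list(pool.map(fetch_one, tickers))
    return pd.concat(frames)

def fake_history(ticker):
    """Stand-in for the per-ticker download; returns made-up prices."""
    idx = pd.to_datetime(['2022-01-03', '2022-01-04'])
    A = pd.DataFrame({'Close': [1.0, 2.0]}, index=idx)
    A['symbol'] = ticker
    return A

hist_prices = fetch_all(['AAPL', 'MSFT', 'GOOG'], fake_history, max_workers=3)
```

Threads suit this job because the work is dominated by waiting on the network. Note that an exception raised by any single ticker propagates out of `pool.map`, so a real `fetch_one` would probably want its own try/except.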

How do I only get workday data from pandas datareader?

I am trying to get only workday data when importing Bitcoin quotes from Yahoo Finance. However, when I import it, I also get weekend data, which I do not need. I transferred all the data to .csv files to check what the problem was, and found that the Bitcoin data included weekends and holidays. Since Bitcoin is traded 24/7, I am getting more data. How do I get only data from workdays?
Code:
import pandas_datareader.data as web
import datetime as dt
start = dt.datetime(2017,1,1)
end = dt.datetime(2017,2,1)
a = web.DataReader('BTC-USD', 'yahoo', start, end)
a.to_csv('BTC.csv')
(Coded in Spyder, Python 3.7)
Use this:
import pandas_datareader.data as web
a = web.DataReader('BTC-USD', 'yahoo', '2017-01-01', '2017-02-01')
a_business_days = a[a.index.dayofweek < 5]
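The `dayofweek < 5` filter drops Saturdays and Sundays but keeps holidays that fall on a weekday. If holidays should go too, pandas ships a holiday calendar that can be combined with `CustomBusinessDay`. The sketch below assumes the US federal holiday calendar is an acceptable approximation (stock exchange holidays differ from it in a few cases), uses the hypothetical helper name `keep_business_days`, and runs on made-up daily data instead of a real download.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

def keep_business_days(df):
    """Return only rows whose date is a weekday and not a US federal holiday."""
    us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())
    valid_days = pd.date_range(df.index.min(), df.index.max(), freq=us_bday)
    return df[df.index.normalize().isin(valid_days)]

# Made-up daily data covering the same window as the question
idx = pd.date_range('2017-01-01', '2017-02-01', freq='D')
a = pd.DataFrame({'Close': range(len(idx))}, index=idx)
a_business_days = keep_business_days(a)
```

In this window the filter removes the weekends plus Monday 2017-01-02 (New Year's Day observed) and Monday 2017-01-16 (Martin Luther King Jr. Day).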

How to update instead of rewriting csv by python with pandas to fetch stock data?

I am an absolute noob in terms of programming.
I wish to fetch historical data for a list of stocks from Yahoo for data analysis.
I modified the script I found and got this.
#settings for importing built-in datetime and date libraries
#and external pandas_datareader libraries
import pandas_datareader.data as web
import datetime
from datetime import timedelta
#read ticker symbols from a file to python symbol list
symbol = []
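# raw string so the backslashes in the Windows path are not treated as escapes
with open(r'E:\Google drive\Investment\Python Stock pick\Stocklist1.txt') as f:
    for line in f:
        symbol.append(line.strip())
# no explicit f.close() is needed: the with block closes the file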
end = datetime.datetime.today()
start = end - timedelta(days=400)
#set path for csv file
path_out = 'E:/Google drive/Investment/Python Stock pick/CSV/'
i=0
while i<len(symbol):
    try:
        df = web.DataReader(symbol[i], 'yahoo', start, end)
        df.insert(0,'Symbol',symbol[i])
        df = df.drop(['Adj Close'], axis=1)
        if i == 0:
            df.to_csv(path_out+symbol[i]+'.csv')
            print(i, symbol[i],'has data stored to csv file')
        else:
            df.to_csv(path_out+symbol[i]+'.csv',header=True)
            print(i, symbol[i],'has data stored to csv file')
    except:
        print("No information for ticker # and symbol:")
        print(i,symbol[i])
        i=i+1
        continue
    i=i+1
I run the script every day, and each run fetches the past 400 days of stock data.
It replaces the entire CSV file every time, overwriting the old data with the new.
Is there any way for the script to just add the new data to the CSV file?
Thanks a lot in advance. I am new to programming and have no idea how to do this.
I think you need to open the file with mode 'a+' instead. Otherwise the file will keep looping itself; that is what happened to me.
You have to pass the mode parameter 'a' (append):
with open(r'E:\Google drive\Investment\PythonStockpic\Stocklist1.txt','a') as f:
    f.write(line.strip())
see: append new row to old csv file python
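Since the data in the question is written with pandas, the same append idea can be applied through `DataFrame.to_csv`, which accepts `mode='a'` and `header=False`. Below is a minimal sketch, assuming the CSV's first column is the date index and that the columns stay in the same order between runs. The helper name `append_new_rows` is hypothetical, and the demo uses made-up prices in a temporary folder.

```python
import os
import tempfile
import pandas as pd

def append_new_rows(df, csv_path):
    """Append only the rows of df whose dates are not yet in csv_path.

    Returns the number of rows written.
    """
    if os.path.exists(csv_path):
        old = pd.read_csv(csv_path, index_col=0, parse_dates=True)
        df = df[~df.index.isin(old.index)]           # drop dates already stored
        df.to_csv(csv_path, mode='a', header=False)  # append without repeating the header
    else:
        df.to_csv(csv_path)                          # first run: create file with header
    return len(df)

# Made-up data: two daily runs whose date ranges overlap by one day
demo_path = os.path.join(tempfile.mkdtemp(), 'AAPL.csv')
day1 = pd.DataFrame({'Close': [1.0, 2.0]},
                    index=pd.to_datetime(['2022-01-03', '2022-01-04']))
day1.index.name = 'Date'
day2 = pd.DataFrame({'Close': [2.0, 3.0]},
                    index=pd.to_datetime(['2022-01-04', '2022-01-05']))
day2.index.name = 'Date'

first_added = append_new_rows(day1, demo_path)
second_added = append_new_rows(day2, demo_path)
result = pd.read_csv(demo_path, index_col=0, parse_dates=True)
```

In the question's loop, the two `df.to_csv(...)` calls could be replaced with a call like `append_new_rows(df, path_out + symbol[i] + '.csv')`, so the daily 400-day download only adds dates that are not yet in the file.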

how to save Python pandas data into excel file?

I am trying to load data from a web source and save it as an Excel file, but I am not sure how to do it. What should I do? The original dataframe has several columns. Let's say that I am trying to save the 'Open' column.
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import datetime
import pandas as pd
def ViewStockTrend(compcode):
    start = datetime.datetime(2015,2,2)
    end = datetime.datetime(2016,7,13)
    stock = web.DataReader(compcode,'yahoo',start,end)
    print(stock['Open'])
compcode = ['FDX','GOOGL','FB']
aa= ViewStockTrend(compcode)
Once your function returns the pandas DataFrame (as posted, ViewStockTrend only prints and returns None), just use to_excel on the entire thing if you want:
aa.to_excel('output/filename.xlsx')
If stock is a pandas DataFrame, you need to construct a new DataFrame from that column and output that one to Excel:
df = pd.DataFrame(stock['Open'])
df.to_excel('path/to/your/file.xlsx')
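For several tickers, one possible layout is a single table with one 'Open' column per ticker, which can then be written out in one go. The following is only a sketch under assumptions: `build_open_table` and `frames_by_ticker` are hypothetical names, and the input is made-up data shaped like one downloaded DataFrame per ticker.

```python
import pandas as pd

def build_open_table(frames_by_ticker):
    """Return one DataFrame with an 'Open' column per ticker, aligned on the date index."""
    return pd.DataFrame({t: f['Open'] for t, f in frames_by_ticker.items()})

# Made-up prices standing in for one downloaded DataFrame per ticker
idx = pd.to_datetime(['2016-07-12', '2016-07-13'])
frames_by_ticker = {
    'FDX': pd.DataFrame({'Open': [1.0, 2.0], 'Close': [1.5, 2.5]}, index=idx),
    'GOOGL': pd.DataFrame({'Open': [3.0, 4.0], 'Close': [3.5, 4.5]}, index=idx),
}
open_table = build_open_table(frames_by_ticker)
```

The resulting `open_table` can be saved with `open_table.to_excel('output.xlsx')`, which needs an Excel writer engine such as openpyxl to be installed.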
