I'm reading in stock data from Yahoo associated with the "tickers" (stock codes) provided to me in a CSV file. However, some of the stock codes are actually not available on Yahoo, so I was wondering if there is a way to account for this in my code below via Exception Handling.
import pandas
import pandas.io.data as web
import datetime
import csv
f1 = open(r'C:\Users\Username\Documents\Programming\Financialdata.csv')  # enter the location of the file (raw string so the backslashes aren't treated as escapes)
c1 = csv.reader(f1)
tickers = []
for row in c1:  # reading tickers from the csv file
    tickers.append(row)
start = datetime.datetime(2012, 1, 1)
end = datetime.datetime(2013, 1, 1)
l = []; m = []; tickernew = []
i = 0; j = 0; k = 0; z = []
for tick in tickers[0]:
    f = web.DataReader(tick, 'yahoo', start, end)
    if len(f) == 250:  # checking if the stock was traded for 250 days
        tickernew.append(tick)  # new ticker list to keep track of the new index number of tickers
        k = k + 1  # k keeps track of the number of new tickers
        for i in range(0, len(f) - 1):
            m.append(f['Adj Close'][i + 1] / f['Adj Close'][i])  # calculating returns
Absolutely. Your first step should be to look at the traceback you get when your program crashes because of the invalid input you mention.
Then, simply wrap the line of code you're crashing on in a try/except. Good Python style encourages you to be specific about the type of exception you're handling. So, for example, if the crash raises a ValueError, you'll want to do this:
try:
    bad_line_of_code
except ValueError:
    handle_the_issue
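Applied to your code, the call that can blow up is web.DataReader, so that is the line to wrap. A minimal sketch, assuming the missing-ticker failure surfaces as an IOError (older pandas.io.data versions raise that for failed Yahoo lookups; check your actual traceback and substitute whatever exception it names):

for tick in tickers[0]:
    try:
        f = web.DataReader(tick, 'yahoo', start, end)
    except IOError:  # replace with the exception your traceback actually shows
        print('No data on Yahoo for ticker %s, skipping it' % tick)
        continue  # move on to the next ticker
    if len(f) == 250:  # checking if the stock was traded for 250 days
        tickernew.append(tick)
        k = k + 1
        for i in range(0, len(f) - 1):
            m.append(f['Adj Close'][i + 1] / f['Adj Close'][i])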
There are better ways to skin this cat but I am going through it as an exercise to help me understand some of the python structures.
I have a list of stock tickers from various markets. Some of them are invalid; I don't know why, but they are. I was using the Yahoo financial historic download URL as a hack to iterate through them, find which ones returned something and which ones returned an error, and keep a list of the good ones. That didn't work so well, as the program stopped returning valid stocks. I think it was just pinging too fast, but no matter.
I jumped over to the YahooFinancials API and I can make it all work, except I don't see how I can iterate through that same sketchy list and find the good tickers. It returns a string on a failure rather than a file-not-found exception, so I can't trap it in a try/except block.
Is there a way to test for this or is there some method in the API that would allow me to test for ticker validity?
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import yfinance as yf
from yahoofinancials import YahooFinancials
df = pd.DataFrame()
dfG = pd.DataFrame(columns=['name'])
dfB = pd.DataFrame(columns=['name'])
GoodList = []
BadList = []
file = open("NYSE.txt", 'r')
symbols = file.readlines()
file.close
for i in symbols:
try:
dff = yf.download(i.strip(),
start='2021-06-09',
end='2021-06-12',
progress=False,
)
except:
BadList.append(i.strip())
print(f'The file {i.strip()} does not exist')
else:
GoodList.append(i.strip())
print(i.strip())
dfG["name"] = GoodList
dfB["name"] = BadList
dfG.to_csv('NYSE_Good.txt')
dfB.to_csv('NYSE_Bad.txt')
An example of the bad output looks like this:
1 Failed download:
PEB-H: No data found, symbol may be delisted
This isn't an exception, so I am not sure how to capture it and keep those tickers out of the mix. Understand that the wording is left over from when I was checking for a historical file on Yahoo, which is why it looks a little odd. Just prototyping here.
Any thoughts?
Edit: I couldn't get the exception handling to work properly, so I punted and used the returned empty dataset as the condition to handle it. That solved the problem.
for i in symbols:
    try:
        df2 = yf.download(i.strip(),
                          start='2021-06-09',
                          end='2021-06-12',
                          progress=False,
                          )
    except:
        print(f'There was an exception for {i.strip()}')
    else:
        if df2.empty:
            BadList.append(i.strip())
            print(f'The file {i.strip()} does not exist')
        else:
            GoodList.append(i.strip())
            print(i.strip())
I want to download historical intraday stock data. I've found AlphaVantage offers two years of data. It's the longest history of data that I've found for free.
I'm making a script to download the full two years of data for all ticker symbols that they offer and in all timeframes. They provide the data divided into 30-day intervals counted back from the current day (or the last trading day, I'm not sure). The rows go from the newest to the oldest datetime. I want to reverse the order in which the data appears and concatenate all the months, with the column headers appearing only once. So I would have a single csv file with two years of data for each stock and timeframe, with the rows going from the oldest to the newest datetime.
The problem I have is that I also want to use the script to update the data, and I don't know how to append only the data that doesn't already appear in my files. The data that I've downloaded goes from 2020-09-28 07:15:00 to 2020-10-26 20:00:00 in 15-minute intervals (when they exist; some are missing). When I use the script again I'd like to update the data: delete somehow the rows that already appear and append only the rest. So if the last datetime that appears is, for example, 2020-10-26 20:00:00, it would continue appending from 2020-10-26 20:15:00 if it exists. How can I update the data correctly?
Also, when updating, if the file already exists, it copies the column headers, which is something I don't want. Edit: I've solved this with header=(not os.path.exists(file)) but it seems very inefficient to check if the file exists in every iteration.
I also have to make the script comply with the API's rule of 5 calls per minute and 500 calls per day. Is there a way for the script to stop when it reaches the daily limit and continue at that point next time it runs? Or should I just add a 173 seconds sleep between API calls?
import os
import glob
import pandas as pd
from typing import List
from requests import get
from pathlib import Path
import os.path
import sys
BASE_URL= 'https://www.alphavantage.co/'
def download_previous_data(
    file: str,
    ticker: str,
    timeframe: str,
    slices: List,
):
    for _slice in slices:
        url = f'{BASE_URL}query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol={ticker}&interval={timeframe}&slice={_slice}&apikey=demo&datatype=csv'
        pd.read_csv(url).iloc[::-1].to_csv(file, mode='a', index=False, encoding='utf-8-sig')

def main():
    # Get a list of all ticker symbols
    print('Downloading ticker symbols:')
    #df = pd.read_csv('https://www.alphavantage.co/query?function=LISTING_STATUS&apikey=demo')
    #tickers = df['symbol'].tolist()
    tickers = ['IBM']
    timeframes = ['1min', '5min', '15min', '30min', '60min']
    # To download the data in a subdirectory where the script is located
    modpath = os.path.dirname(os.path.abspath(sys.argv[0]))
    # Make sure the download folders exist
    for timeframe in timeframes:
        download_path = f'{modpath}/{timeframe}'
        #download_path = f'/media/user/Portable Drive/Trading/data/{timeframe}'
        Path(download_path).mkdir(parents=True, exist_ok=True)
    # For each ticker symbol download all data available for each timeframe
    # except for the last month which would be incomplete.
    # Each download iteration has to be in a 'try except' in case the ticker symbol isn't available on alphavantage
    for ticker in tickers:
        print(f'Downloading data for {ticker}...')
        for timeframe in timeframes:
            download_path = f'{modpath}/{timeframe}'
            filepath = f'{download_path}/{ticker}.csv'
            # NOTE:
            # To ensure optimal API response speed, the trailing 2 years of intraday data is evenly divided into 24 "slices" - year1month1, year1month2,
            # year1month3, ..., year1month11, year1month12, year2month1, year2month2, year2month3, ..., year2month11, year2month12.
            # Each slice is a 30-day window, with year1month1 being the most recent and year2month12 being the farthest from today.
            # By default, slice=year1month1
            if Path(filepath).is_file():  # if the file already exists
                # download the previous to last month
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)
            else:  # if the file doesn't exist
                # download the two previous years
                #slices = ['year2month12', 'year2month11', 'year2month10', 'year2month9', 'year2month8', 'year2month7', 'year2month6', 'year2month5', 'year2month4', 'year2month3', 'year2month2', 'year2month1', 'year1month12', 'year1month11', 'year1month10', 'year1month9', 'year1month8', 'year1month7', 'year1month6', 'year1month5', 'year1month4', 'year1month3', 'year1month2']
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)

if __name__ == '__main__':
    main()
You have an awful lot of questions within your question!
These are suggestions for you to try, but I have no way to test them:
Read all your file names into a list once and check new names against that list rather than pinging the OS each time.
Read the data from the existing file, append the new data in pandas, and write the file back out. I can't tell whether you are already appending the csv files, but if you're having difficulty there, just read the old data and append the new data until you figure out how to append correctly. Or save new iterations to their own files and consolidate the files later.
Look into drop_duplicates() if you are concerned about duplicate rows (see the sketch after this list).
Look into the time module and time.sleep() in your for loops to reduce the call rate (also covered in the sketch below).
If you have 1min data you can look into resample() to get 5min, 15min, etc. rather than downloading all those timeframes.
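For the last points, a minimal sketch of how the update step could look; update_csv and the 'time' column name are illustrative assumptions, not part of the AlphaVantage API:

import time
import pandas as pd

def update_csv(filepath, new_data, time_col='time'):
    # Merge the freshly downloaded rows with whatever is already on disk,
    # keep each timestamp only once, and rewrite the file with a single header.
    try:
        old = pd.read_csv(filepath)
        merged = pd.concat([old, new_data], ignore_index=True)
    except FileNotFoundError:
        merged = new_data
    merged.drop_duplicates(subset=[time_col]).sort_values(time_col).to_csv(filepath, index=False)

# In the download loop, sleep between requests to respect the API limits:
# new_data = pd.read_csv(url).iloc[::-1]
# update_csv(filepath, new_data)
# time.sleep(15)   # ~4 calls/minute; use ~173 s if you also need to stay under 500 calls/day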
So I developed a script that would pull data from a live-updated site tracking coronavirus data. I set it up to pull data every 30 minutes but recently tested it on updates every 30 seconds.
The idea is that it creates the request to the site, pulls the html, creates a list of all of the data I need, then restructures into a dataframe (basically it's the country, the cases, deaths, etc.).
Then it will take each row and append it to the corresponding one of the 123 Excel files for the various countries. This works well for, I believe, somewhere in the range of 30-50 iterations before it either causes file corruption or weird data entries.
I have my code below. I know it's poorly written (my initial reasoning was that I felt confident I could set it up quickly and I wanted to collect data quickly; unfortunately I overestimated my abilities, but now I want to learn what went wrong). Below my code I'll include sample output.
PLEASE note that this 30-second interval is only for quick testing; I don't normally intend to send that many requests for months on end. I just wanted to see what the issue was. Originally it was set to pull every 30 minutes when I detected this issue.
See below for the code:
import schedule
import time
def RecurringProcess2():
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    import datetime
    import numpy as np
    from os import listdir
    import os
    try:
        extractTime = datetime.datetime.now()
        extractTime = str(extractTime)
        print("Access Initiated at " + extractTime)
        link = 'https://www.worldometers.info/coronavirus/'
        response = requests.get(link)
        soup = BeautifulSoup(response.text, 'html.parser').findAll('td')  #[1107].get_text()
        table = pd.DataFrame(columns=['Date and Time','Country','Total Cases','New Cases','Total Deaths','New Deaths','Total Recovered','Active Cases','Serious Critical','Total Cases/1M pop'])
        soupList = []
        for i in range(1107):
            value = soup[i].get_text()
            soupList.insert(i, value)
        table = np.reshape(soupList, (123, -1))
        table = pd.DataFrame(table)
        table.columns = ['Country','Total Cases','New Cases (+)','Total Deaths','New Deaths (+)','Total Recovered','Active Cases','Serious Critical','Total Cases/1M pop']
        table['Date & Time'] = extractTime
        #Below code is run once to generate the initial files. That's it.
        # for i in range(122):
        #     fileName = table.iloc[i,0] + '.xlsx'
        #     table.iloc[i:i+1,:].to_excel(fileName)
        FilesDirectory = 'D:\\Professional\\Coronavirus'
        fileType = '.csv'
        filenames = listdir(FilesDirectory)
        DataFiles = [filename for filename in filenames if filename.endswith(fileType)]
        for file in DataFiles:
            countryData = pd.read_csv(file, index_col=0)
            MatchedCountry = table.loc[table['Country'] == str(file)[:-4]]
            if file == ' USA .csv':
                print("Country Data Rows: ", len(countryData))
                if os.stat(file).st_size < 1500:
                    print("File Size under 1500")
            countryData = countryData.append(MatchedCountry)
            countryData.to_csv(FilesDirectory+'\\'+file, index=False)
    except:
        pass
    print("Process Complete!")
    return

schedule.every(30).seconds.do(RecurringProcess2)

while True:
    schedule.run_pending()
    time.sleep(1)
When I check the output after some number of iterations (usually successful for around 30-50), a file has either kept only 2 rows and lost all the others, or it keeps appending while dropping one entry from the row above, two entries from the row above that, and so on (essentially forming a triangle of sorts).
Above that there would be a few hundred empty rows. Does anyone have an idea of what is going wrong here? I'd consider this a failed attempt but would still like to learn from it. I appreciate any help in advance.
Hi, as per my understanding the webpage only has one table element. My suggestion would be to use the pandas read_html method, as it returns a clean, structured table.
Try the code below; you can modify it to run on your schedule:
import requests
import pandas as pd
url = 'https://www.worldometers.info/coronavirus/'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
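If you then want per-country CSV files without duplicate rows, a minimal sketch along these lines could work; the country column position and the output folder are assumptions you would need to adjust for your setup:

import datetime
import os
import pandas as pd
import requests

url = 'https://www.worldometers.info/coronavirus/'
df = pd.read_html(requests.get(url).content)[-1]
df['Date & Time'] = str(datetime.datetime.now())

country_col = df.columns[1]  # inspect df.columns and pick the column that holds the country name
out_dir = 'coronavirus_data'
os.makedirs(out_dir, exist_ok=True)

for country, rows in df.groupby(country_col):
    path = os.path.join(out_dir, f'{country}.csv')
    if os.path.exists(path):
        # combine with what is already on disk so old rows are never overwritten
        rows = pd.concat([pd.read_csv(path), rows], ignore_index=True)
    rows.drop_duplicates().to_csv(path, index=False)  # append without repeating old rows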
Disclaimer: I'm still evaluating this solution. So far it works almost perfectly for 77 rows.
Originally I had set the script up to run for .xlsx files. I converted everything to .csv but retained the index column code:
countryData = pd.read_csv(file,index_col=0)
I started realizing that things were being ordered differently every time the script ran. I have since removed that from the code and so far it works. Almost.
Unnamed: 0 Unnamed: 0.1
0 7
7
For some reason I get the above output in every file and I don't know why. It's in the first two columns, yet the data still seems to be read and written correctly. I'm not sure what's going on here.
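Those Unnamed columns are what pandas adds when a DataFrame's index is written into the CSV on one run and then read back as an ordinary column on the next run (and again on the run after that, hence 'Unnamed: 0.1'). A small sketch of the pattern that avoids it:

import pandas as pd

df = pd.DataFrame({'Country': ['USA'], 'Total Cases': [100]})

# Write without the index so no extra unnamed column is created...
df.to_csv('USA.csv', index=False)

# ...and read it back without treating any column as an index.
df = pd.read_csv('USA.csv')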
I am setting up a Raspberry Pi Zero to sense air quality, temperature and humidity. I am able to run the script to get the sensed data every 60 seconds. What should I do if I need to save the data in a CSV file at regular intervals?
import bme680
import time
from datetime import datetime
from bme680 import BME680
from pms5003 import PMS5003, ReadTimeoutError
try:
    sensor = bme680.BME680(bme680.I2C_ADDR_PRIMARY)
except IOError:
    sensor = bme680.BME680(bme680.I2C_ADDR_SECONDARY)
pms5003 = PMS5003()
readings = pms5003.read()
sensor.set_humidity_oversample(bme680.OS_2X)
sensor.set_pressure_oversample(bme680.OS_4X)
sensor.set_temperature_oversample(bme680.OS_8X)
sensor.set_filter(bme680.FILTER_SIZE_3)
print('Data sensing')
try:
    while True:
        if sensor.get_sensor_data():
            output = datetime.now().strftime('%Y-%m-%d,%H:%M:%S,') + '{0:.2f} C,{1:.2f} hPa,{2:.3f} %RH'.format(
                sensor.data.temperature,
                sensor.data.pressure,
                sensor.data.humidity)
            if pms5003.read():
                print(output, readings)
            else:
                print(output)
        time.sleep(60)
except KeyboardInterrupt:
    pass
I expect the program to save the data in a CSV file with headers like date, time, temperature, humidity, etc.
There are plenty of ways in Python to store that data in CSV or any other file format. If you want more control and a more detailed CSV, you can use pandas or NumPy.
But if you just want a simple solution, here it is.
import csv

def store_data(time, temperature, pressure, humidity):
    row = [time, temperature, pressure, humidity]
    with open('sensor_output.csv', 'a') as csvFile:
        writer = csv.writer(csvFile)
        writer.writerow(row)
    # the with block closes the file automatically, no explicit close() needed
Just pass your values to this function and Python will handle the rest. The file will be created automatically and appended to every time.
store_data(datetime.now().strftime('%Y-%m-%d,%H:%M:%S,'),sensor.data.temperature,sensor.data.pressure,sensor.data.humidity)
This is how you can call the function in your case.
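If you also want a header row (date, time, temperature and so on) written exactly once, a small variation is to write it only when the file is new or empty; this is a sketch, and the header names below are just examples:

import csv
import os

def store_data_with_header(time, temperature, pressure, humidity):
    file = 'sensor_output.csv'
    # Write the header only if the file doesn't exist yet or is empty.
    new_file = not os.path.exists(file) or os.path.getsize(file) == 0
    with open(file, 'a') as csvFile:
        writer = csv.writer(csvFile)
        if new_file:
            writer.writerow(['datetime', 'temperature_C', 'pressure_hPa', 'humidity_RH'])
        writer.writerow([time, temperature, pressure, humidity])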
Update: if you are familiar with pandas and DataFrames, then this answer might help you:
Writing a pandas DataFrame to CSV file
I am an absolute noob in terms of programming.
I wish to fetch historical data for a list of stocks from Yahoo for data analysis.
I modified a script I found and got this:
#settings for importing built-in datetime and date libraries
#and external pandas_datareader libraries
import pandas_datareader.data as web
import datetime
from datetime import timedelta
#read ticker symbols from a file to python symbol list
symbol = []
with open('E:\Google drive\Investment\Python Stock pick\Stocklist1.txt') as f:
    for line in f:
        symbol.append(line.strip())
# the with statement closes the file automatically, so no explicit f.close() is needed
end = datetime.datetime.today()
start = end - timedelta(days=400)
#set path for csv file
path_out = 'E:/Google drive/Investment/Python Stock pick/CSV/'
i = 0
while i < len(symbol):
    try:
        df = web.DataReader(symbol[i], 'yahoo', start, end)
        df.insert(0, 'Symbol', symbol[i])
        df = df.drop(['Adj Close'], axis=1)
        if i == 0:
            df.to_csv(path_out+symbol[i]+'.csv')
            print(i, symbol[i], 'has data stored to csv file')
        else:
            df.to_csv(path_out+symbol[i]+'.csv', header=True)
            print(i, symbol[i], 'has data stored to csv file')
    except:
        print("No information for ticker # and symbol:")
        print(i, symbol[i])
        i = i + 1
        continue
    i = i + 1
I run the script every day and it fetches past stock data.
It replaces the entire csv file each time, always overwriting the old data with the new.
Is there any way for the script to just add the new data to the csv file?
Thanks a lot in advance. I am all new to the programming world and have no idea how to do this.
I think you need to open the output file in append mode ('a' or 'a+') instead; otherwise the file keeps getting overwritten on every run. That is what happened to me.
You have to add the append mode parameter 'a', i.e. pass mode='a' where the CSV is written:
df.to_csv(path_out+symbol[i]+'.csv', mode='a', header=False)  # header=False so the column names aren't repeated on every append
see: append new row to old csv file python
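If you also want to avoid duplicate rows where the 400-day windows of consecutive daily runs overlap, one possible sketch (not the only way) is to merge the old and new data and drop duplicate dates before writing; append_new_rows is a hypothetical helper for illustration:

import os
import pandas as pd

def append_new_rows(csv_path, df_new):
    # Combine any existing rows with the freshly downloaded ones,
    # keep each date only once, and rewrite the file with a single header.
    if os.path.exists(csv_path):
        df_old = pd.read_csv(csv_path, index_col=0, parse_dates=True)
        combined = pd.concat([df_old, df_new])
        combined = combined[~combined.index.duplicated(keep='last')].sort_index()
    else:
        combined = df_new
    combined.to_csv(csv_path)

# inside the loop, instead of df.to_csv(...):
# append_new_rows(path_out + symbol[i] + '.csv', df)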