I would like to read a set of CSV files from a URL as dataframes. The files contain a date in their names, like YYYYMMDD.csv. I need to iterate over a set of predefined dates and read each corresponding file into a Python dataframe.
Sometimes the file does not exist and an error as follows is thrown:
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
What I would do in this situation is add one day to the date (turning 2020-05-01 into 2020-05-02); if that also throws the error, add two days, and so on, up to at most three days, until there is a URL that works without an error.
I would like to know how to write this in a program, maybe with nested try/except, such that if adding 1 day to the date leads to a URL without an error, the subsequent steps are not executed.
As I don't have the data I will use the following URL as an example:
import io

import pandas as pd
import requests

url = 'http://winterolympicsmedals.com/medals.csv'
s = requests.get(url).content
c = pd.read_csv(io.BytesIO(s))  # read_csv needs a path or file-like object, not raw bytes
Here the file being read is medals.csv. If you try madels.csv or modals.csv you will get the error I am talking about. So I need to know how to handle the errors in up to 3 steps, replacing the file name until I get the desired dataframe: first we try madels.csv, which results in an error; then modals.csv, which also results in an error; and finally medals.csv, which gives the desired output.
My problem is that sometimes the modification I make to the file name also fails in the except block, so I need to know how to accommodate a second modification.
No need for nested try/except blocks; all you need is one try/except and a for loop.
First, a function that tries to read a file (it returns the content of the file, or None if the file is not found):
def read_file(fp):
    try:
        with open(fp, 'r') as f:
            text = f.read()
        return text
    except Exception as e:
        print(e)
        return None
Then, a function that tries to find a file for a predefined date (an example input would be '20220514'). The function tries to read the content of the file with the given date, or dates up to 3 days after it:
from datetime import datetime, timedelta

def read_from_predefined_date(date):
    date_format = '%Y%m%d'
    date = datetime.strptime(date, date_format)
    result = None
    for i in range(4):
        date_to_read = date + timedelta(days=i)
        date_as_string = date_to_read.strftime(date_format)
        fp = f'data/{date_as_string}.csv'
        result = read_file(fp)
        if result:
            break
    return result
To test, create e.g. a data/20220515.csv file and run the following code:
d = '20220514'
result = read_from_predefined_date(d)
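Since the question is about URLs rather than local files, here is the same pattern adapted to fetch each CSV over HTTP and return a DataFrame. A minimal sketch, assuming a hypothetical base URL and the YYYYMMDD naming scheme from the question:

import io
from datetime import datetime, timedelta

import pandas as pd
import requests

BASE_URL = 'http://example.com/data'  # hypothetical; substitute the real server

def read_csv_from_url(url):
    """Return a DataFrame, or None if the file can't be fetched or parsed."""
    try:
        response = requests.get(url)
        response.raise_for_status()  # turn a 404 into an exception
        return pd.read_csv(io.BytesIO(response.content))
    except (requests.exceptions.RequestException, pd.errors.ParserError) as e:
        print(e)
        return None

def read_from_predefined_date(date):
    date_format = '%Y%m%d'
    date = datetime.strptime(date, date_format)
    for i in range(4):  # the given date plus up to 3 days after
        date_as_string = (date + timedelta(days=i)).strftime(date_format)
        df = read_csv_from_url(f'{BASE_URL}/{date_as_string}.csv')
        if df is not None:
            return df
    return None

df = read_from_predefined_date('20200501')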
Here's a simple function that, given that URL, will do exactly what you ask. Be aware that a slight change in the input URL can lead to errors, so make sure the date format is exactly what you mentioned. One note on URL parsing: for arbitrary full URLs, urllib.parse would be more robust; the workaround below is enough for this naming scheme. In any case:
import datetime as dt
import io

import pandas as pd
import requests

def url_next_day(url):
    # for arbitrary full URLs, urllib.parse would be more robust,
    # but here's a workaround
    filename = url.rstrip("/").split("/")[-1]
    # removesuffix (Python 3.9+) rather than strip(".csv"), which strips
    # characters from the ends, not the literal suffix
    date = dt.datetime.strptime(filename.removesuffix(".csv"), "%Y%m%d").date()
    date_plus_one_day = date + dt.timedelta(days=1)
    new_file_name = date_plus_one_day.strftime("%Y%m%d") + ".csv"
    return url.replace(filename, new_file_name)
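For example (the URL here is hypothetical):

url_next_day('http://example.com/data/20200501.csv')
# -> 'http://example.com/data/20200502.csv'

And here's the loop that uses it: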
for url in list_of_urls:
    try:
        s = requests.get(url)
        s.raise_for_status()  # make a 404 raise instead of returning the error page
        c = pd.read_csv(io.BytesIO(s.content))
    except Exception as e:
        print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
        for _ in range(3):  # because you want at most 3 days after
            try:
                url = url_next_day(url)
                s = requests.get(url)
                s.raise_for_status()
                c = pd.read_csv(io.BytesIO(s.content))
                break
            except Exception:
                pass
        else:
            print("No file available in the days after. Moving on")
Happy Coding!
OK, I have enough changes I want to recommend on top of @Daniel Gonçalves's initial solution that I'm going to post them as a second answer.
1- The loop trying additional days needs to break when it gets a hit, so it doesn't keep going.
2- That loop needs an else: block to handle the complete-failure case.
3- It is best practice to catch only the exception you mean to catch and know how to handle. A failed fetch should surface as a specific HTTP error; with requests that means calling response.raise_for_status() and catching requests.exceptions.HTTPError (the urllib.error.HTTPError in the question's traceback comes from urllib, which requests does not raise). Any other exception means something else is wrong with the program, and it is best not to catch it, so that you notice it and fix your program when that happens.
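A toy illustration of point 3, using the example URL from the question: a broad except swallows a typo-induced NameError and mislabels it as a fetch failure, while a narrow except lets the bug surface:

import requests

url = 'http://winterolympicsmedals.com/medals.csv'

# Too broad: a typo (NameError) gets reported as if the fetch had failed.
try:
    respons = requests.get(url)   # note the misspelled variable...
    response.raise_for_status()   # ...so this line raises NameError
except Exception as e:
    print(f"Trying the days after... but the real problem is: {e!r}")

# Narrow: only genuine HTTP failures are handled; the typo above would
# instead crash loudly, which is what you want during development.
try:
    response = requests.get(url)
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(f"HTTP failure: {e}")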
The result:
for url in list_of_urls:
    try:
        s = requests.get(url)
        s.raise_for_status()
        c = pd.read_csv(io.BytesIO(s.content))
    except requests.exceptions.HTTPError as e:
        print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
        for _ in range(3):  # because you want at most 3 days after
            try:
                url = url_next_day(url)
                s = requests.get(url)
                s.raise_for_status()
                c = pd.read_csv(io.BytesIO(s.content))
                break
            except requests.exceptions.HTTPError:
                print(f"Also failed to fetch {url}...")
        else:
            # this block is only executed if the loop never breaks
            print("No file available in the days after. Moving on.")
            c = None  # or an empty data frame, or whatever won't crash the rest of your code
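If the for ... else construct is unfamiliar: the else block runs only when the loop finishes without hitting break. A quick self-contained demonstration:

for n in [1, 3, 5]:
    if n % 2 == 0:
        print(f"found an even number: {n}")
        break
else:
    print("no even number found")  # printed, because the loop never broke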
Hi, I have a function that parses the first 60 lines of a file and is supposed to alert the user if any lines are entirely whitespace. These can occur anywhere in those 60 lines, so I want the script to parse all 60 lines regardless, mainly because I need data from a couple of the lines for my error reporting, and we might want to know where those errors occur in the future. I wrote this:
def header_data(data):
    dictionary = {}
    datalen = len(data)
    itrcntr = 0
    try:
        for line in data:
            itrcntr += 1
            if line.isspace():
                raise Exception('File has badly formatted header data line(s)')
            else:
                linesplit = line.rstrip().split(":")
                if len(linesplit) > 1:
                    dictionary[linesplit[0]] = linesplit[1].strip()
        return dictionary
    except Exception as e:
        errmsg = str(e)
        if itrcntr == datalen:
            return (dictionary, errmsg)
        else:
            pass
With this function, I'd expect that if it sees that itrcntr is not equal to datalen, it would pass and go back into the try block and move on to the next line. But this doesn't happen. Instead it breaks out of the function and continues at the next line in the caller. How can I make it keep looping until it reaches the end of the loop, and then return the dictionary along with the error message? Or can this not be done with try/except handlers?
Unless you want to catch exceptions other than the line.isspace situation, I wouldn't use a try block at all. Just collect your errors in a list, e.g.:
errors = []
itrcntr = 0
for line in data:
    itrcntr += 1
    if line.isspace():
        errors.append('File has badly formatted header data at line %d.' % itrcntr)

# then at the end:
if errors:
    # do something about it...
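Putting that together, a sketch of header_data without the try block, returning the dictionary and all error messages (assuming the same colon-separated header format as in the question):

def header_data(data):
    dictionary = {}
    errors = []
    for lineno, line in enumerate(data, start=1):
        if line.isspace():
            errors.append('File has badly formatted header data at line %d.' % lineno)
            continue
        linesplit = line.rstrip().split(":")
        if len(linesplit) > 1:
            dictionary[linesplit[0]] = linesplit[1].strip()
    return dictionary, errors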
If an exception occurs, the rest of the try clause is skipped and the except clause runs.
If you raise an exception anywhere in the try block, everything after it is skipped. So if you want the loop to continue, just don't use try/except there. Collect every error message and then return it.
I am trying to scrape data using yfinance and have hit a roadblock when trying to retrieve a ticker with no data; the error is: 7086.KL: No data found for this date range, symbol may be delisted.
How do I catch this error? I've tried catching it as seen in the code below, but it still prints that error.
The code:
import yfinance as yf

tickerdata = yf.Ticker("7086.KL")
try:
    history = tickerdata.history(start="2019-06-01", end="2020-05-01")
except ValueError as ve:
    print("Error")
Any advice on how to solve this?
I just had a look at the source code. It looks like they are indeed just printing the message. But they are also adding the error to a dictionary in the shared.py file. You can use this to check for errors:
import yfinance as yf
from yfinance import shared

ticker = "7086.KL"  # your ticker as a string
tickerdata = yf.Ticker(ticker)
history = tickerdata.history(start="2019-06-01", end="2020-05-01")
error_message = shared._ERRORS[ticker]
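Note that indexing shared._ERRORS raises a KeyError when the fetch succeeded and no error was recorded, so .get is safer; also, _ERRORS is an internal, underscore-prefixed name, so it may change between yfinance versions. A usage sketch:

error_message = shared._ERRORS.get(ticker)  # None if no error was recorded
if error_message:
    print(f"Could not fetch {ticker}: {error_message}")
else:
    print(history.head())  # history is a normal DataFrame here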
I see that this seems to be a common error, but I'm not seeing the answer for my case.
UnboundLocalError: local variable 'tfstate_dict' referenced before assignment
#!/usr/bin/env python
import json

def main():
    get_sfleet_id()

def get_sfleet_id():
    try:
        f = open("terraform_remote.tfstate", "r")
        contents = f.read()
        tfstate_dict = json.load(contents)
    except:
        print("error loading %s" % f)
    print(contents)
    print(tfstate_dict)

if __name__ == '__main__': main()
As I wrote in my comment, tfstate_dict never comes into existence, because the line that would assign it is the one that raises. But that's not to say the same applies to all the preceding code; it only affects tfstate_dict, which happens to be assigned on the very last line of the try block.
This is easily testable with the following:
try:
    a = int(2)
    b = int(3)
    c = int('hi')  # raises ValueError
except:
    print(locals())
    print()
    print(locals().get('a'))
You should see that 'a' and 'b' are both defined and can be accessed (depending on how you're running this code, there could be a lot of other stuff in locals() too). So the existence of 'a' and 'b' gives you no assurance that 'c' exists.
There are two issues with your current exception handling:
There's probably too much going on in your try block to be handled the way you currently handle it. This code will also fail if the file cannot be located, but you wouldn't necessarily know that was happening. And if your code originally failed only on tfstate_dict = json.load(contents), you're now scratching your head about why you're suddenly getting a NameError on print(contents).
You don't want to catch these issues with a blanket except. At a minimum, use except Exception as e:, which lets you print e too.
Here's a hypothetical situation where you handle the file not existing, and also take a shot at parsing the JSON.
import json
from json.decoder import JSONDecodeError

try:
    with open('something.json') as infile:
        try:
            #data = json.load(infile)     # This is what you'd really use
            data = json.loads("{hi: 2}")  # But let's make it fail
        except JSONDecodeError:
            print("Not valid JSON, try something else")
            data = infile.read()
except FileNotFoundError:
    print("Can't find file")
    data = ''
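Applied back to your get_sfleet_id, a sketch along the same lines (keeping your file name; note json.loads parses a string, while json.load expects a file object):

import json

def get_sfleet_id():
    try:
        with open("terraform_remote.tfstate", "r") as f:
            contents = f.read()
    except FileNotFoundError:
        print("error: terraform_remote.tfstate not found")
        return None
    try:
        tfstate_dict = json.loads(contents)  # loads, since contents is a string
    except json.JSONDecodeError as e:
        print("error parsing tfstate: %s" % e)
        return None
    print(tfstate_dict)
    return tfstate_dict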
I am using the following code to read data from Yahoo Finance for a list of symbols downloaded from Nasdaq.
pnls = {i: dreader.DataReader(i, 'yahoo', '1985-01-01', '2017-03-30') for i in symbols}

for df_name in pnls:
    try:
        pnls.get(df_name).to_csv("/Users/Jiong/Documents/data/{}_data.csv".format(df_name), index=True, header=True)
    except:
        print("error {}".format(df_name))
    else:
        print("done {}".format(df_name))
I guess some symbols may not be valid, and Yahoo Finance throws a RemoteDataError exception.
The code above is supposed to continue on, but it still stops at the errors.
Isn't a bare except supposed to catch all exceptions, or is this a runtime error?
Is there any way to get the code to ignore the errors and continue? Thanks. See the errors below from running it:
118 if params is not None and len(params) > 0:
119 url = url + "?" + urlencode(params)
--> 120 raise RemoteDataError('Unable to read URL: {0}'.format(url))
121
122 def _read_lines(self, out):
RemoteDataError: Unable to read URL: http://ichart.finance.yahoo.com/table.csv?c=1985&f=2017&s=MITT%5EA&g=d&ignore=.csv&d=2&e=30&a=0&b=1
You need to handle the raised exception, or execution will halt at the point where it is raised. If a raised exception is not caught and handled, the program is interrupted.
What you need is something like this:
except RemoteDataError as exp:
    print('Unable to read URL: {0}'.format(url))
You can refer to this documentation for more details on errors.
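One more note: in your code the RemoteDataError is raised inside the dict comprehension that builds pnls, before the for loop (and its try block) is ever reached, which is why your bare except never fires. A sketch that moves the fetch itself inside a try; the pandas_datareader import behind your dreader alias is assumed:

from pandas_datareader import data as dreader  # assumed to be your 'dreader' alias
from pandas_datareader._utils import RemoteDataError

pnls = {}
for i in symbols:  # 'symbols' is your list downloaded from Nasdaq
    try:
        pnls[i] = dreader.DataReader(i, 'yahoo', '1985-01-01', '2017-03-30')
    except RemoteDataError as e:
        print('Unable to read data for {}: {}'.format(i, e))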
Grabbed this import from another page; it seemed to correct the issue for me. I was getting a remote error while trying to pull stock data from Yahoo. You'll need the following import before you can catch the RemoteDataError exception:
from pandas_datareader._utils import RemoteDataError

import numpy as np
import pandas as pd
from pandas_datareader import data as web  # assumed to be the import behind 'web'

df_ = pd.DataFrame()
for i in assets:
    print(i)
    try:
        vec = web.DataReader(i, 'yahoo', start='12/10/2006', end='2/1/2007')
        vec['asset'] = i
        vec['returns_close_raw'] = np.log(vec.Close/vec.Close.shift())
        vec['returns_open_raw'] = np.log(vec.Open/vec.Open.shift())
        vec['returns_open_raw10'] = np.log(vec.Open/vec.Open.shift(10))
        vec['returns_close_raw10'] = np.log(vec.Close/vec.Close.shift(10))
        df_ = pd.concat([df_, vec])
    except RemoteDataError:
        print('remote error')
    except KeyError:
        print('key error')
I am a relative Python newbie and I am getting confused about how to properly handle exceptions. Apologies for the dumb question.
In my main() I iterate through a list of dates, and for each date I call a function which downloads a CSV file from a public web server. I want to catch exceptions properly, for obvious reasons, but especially because I do not know when the files of interest will be available for download. My program will run as part of a cron job and will attempt to download these files every 3 hours if they are available.
What I want is to download the first file in the list of dates, and if that results in a 404, the program shouldn't proceed to the next file, because the assumption is that if the oldest date in the list is not available, none of the later ones will be available either.
I have the following Python pseudo-code. I have try/except blocks inside the function that attempts to download the files, but if an exception occurs inside the function, how do I properly handle it in main() so I can decide whether or not to proceed to the next date? The reason I put the download in a function is that I want to reuse that code later in the same main() block for other file types.
def main():
    ...
    ...
    # datelist is a list of date objects
    for date in datelist:
        download_file(date)

def download_file(date):
    date_string = str(date.year) + str(date.strftime('%m')) + str(date.strftime('%d'))
    request = HTTP_WEB_PREFIX + date_string + FILE_SUFFIX
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, e:
        print "HTTPError = " + str(e)
    except urllib2.URLError, e:
        print "URLError = " + str(e)
    except httplib.HTTPException, e:
        print "HTTPException = " + str(e)
    except IOError, e:
        print "IOError = " + str(e)
    except Exception:
        import traceback
        print "Generic exception: " + traceback.format_exc()
    else:
        print "No problem downloading %s - continue..." % (response)
        try:
            # response is a file-like object, not a name; name the local file after the date
            with open(TMP_DOWNLOAD_DIRECTORY + date_string + FILE_SUFFIX, 'wb') as f:
                f.write(response.read())
        except IOError, e:
            print "IOError = " + str(e)
The key concept here is, if you can fix the problem, you should trap the exception; if you can't, it's the caller's problem to deal with. In this case, the downloader can't fix things if the file isn't there, so it should bubble up its exceptions to the caller; the caller should know to stop the loop if there's an exception.
So let's move all the exception handling out of the function into the loop, and fix it so it craps out if there's a failure downloading the file, as the spec requires:
import sys

for date in datelist:
    date_string = (str(date.year) +
                   str(date.strftime('%m')) +
                   str(date.strftime('%d')))
    try:
        download_file(date_string)
    except:
        e = sys.exc_info()[0]
        print ( "Error downloading for date %s: %s" % (date_string, e) )
        break
download_file should now, unless you want to put in retries or something like that, simply not trap the exceptions at all. Since you've formatted the date as you like in the caller, that code can come out of download_file as well, giving the much simpler
def download_file(date_string):
    request = HTTP_WEB_PREFIX + date_string + FILE_SUFFIX
    response = urllib2.urlopen(request)
    print "No problem downloading %s - continue..." % (response)
    # name the local file after the date; response itself is not a string
    with open(TMP_DOWNLOAD_DIRECTORY + date_string + FILE_SUFFIX, 'wb') as f:
        f.write(response.read())
I would suggest that the print statement is superfluous, but if you really want it, the logging module is a more flexible way forward, as it will let you turn the output on or off later by changing a config file instead of the code.
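A minimal sketch of that logging suggestion; the level, format, and sample values here are just one reasonable choice:

import logging

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger(__name__)

# replaces: print "No problem downloading %s - continue..." % (response)
log.info("No problem downloading %s - continuing...", "20130101.csv")

# replaces the print in the loop's except block:
log.error("Error downloading for date %s", "20130101")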
From my understanding of your question, you should just put the code you want to run when you encounter a particular exception into the corresponding except block. You don't have to print the encountered error; you can do whatever is necessary when it is raised: show a popup with information/options, or otherwise direct your program to its next step. Your else section isolates the code that should only execute if none of your exceptions are raised.