I have a very simple caching service that caches files stored on S3. Sometimes the file I am trying to cache locally does not exist on AWS S3. In one of my files that uses the caching service, I therefore prefer to return None if the file I am trying to cache is not found.
However, I will be using the caching service in many other places, and my peers have told me that cache.cache_file() should still raise an error in this case, but a simpler one such as FileNotFoundError that doesn't require the caller to check if e.response["Error"]["Code"] == "404".
My Caching Code
import logging
import os
from pathlib import Path
from stat import S_IREAD, S_IRGRP, S_IROTH

from mylib import s3
from mylib.aws.clients import s3_client, s3_resource

logger = logging.getLogger(__name__)


class Cache:
    def _is_file_size_equal(self, s3_path_of_file: str, local_path: Path, file_name: str) -> bool:
        bucket, key = s3.deconstruct_s3_url(f"{s3_path_of_file}/{file_name}")
        s3_file_size = s3_resource().Object(bucket, key).content_length
        local_file_size = (local_path / file_name).stat().st_size
        return s3_file_size == local_file_size

    def cache_file(self, s3_path_of_file: str, local_path: Path, file_name: str) -> None:
        bucket, key = s3.deconstruct_s3_url(f"{s3_path_of_file}/{file_name}")
        if not (local_path / file_name).exists() or not self._is_file_size_equal(
            s3_path_of_file, local_path, file_name
        ):
            os.makedirs(local_path, exist_ok=True)
            s3_client().download_file(bucket, key, f"{local_path}/{file_name}")
            os.chmod(local_path / file_name, S_IREAD | S_IRGRP | S_IROTH)
        else:
            logger.info("Cached File is Valid!")
My Code that calls the Caching Code
def get_required_stream(environment: str, proxy_key: int) -> Optional[BinaryIO]:
    s3_overview_file_path = f"s3://{TRACK_BUCKET}/{environment}"
    overview_file = f"{some_key}.mv"
    local_path = _cache_directory(environment)
    try:
        cache.cache_file(s3_overview_file_path, local_path, overview_file)
        overview_file_cache = local_path / f"{proxy_key}.mv"
        return overview_file_cache.open("rb")
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return None
        else:
            raise
Issue
Being new to Python, I am a little unsure how this would work. I assume it means that the code that calls the caching service, especially the except part, would look something like this:
    except FileNotFoundError:
        return None
And in the caching service, where I have s3_client().download_file(bucket, key, f"{local_path}/{file_name}"), would I wrap the call in a try/except?
This question probably comes across as trivial, and it is, but I thought I would ask it here anyway since it is a good learning opportunity for me and a chance to understand how to write clean code. I would love suggestions on how I can achieve this, and to hear whether my assumption is wrong.
def get_required_stream(environment: str, proxy_key: int) -> Optional[BinaryIO]:
    s3_overview_file_path = f"s3://{TRACK_BUCKET}/{environment}"
    overview_file = f"{some_key}.mv"
    local_path = _cache_directory(environment)
    # Build the path before the try block so it is defined in the handler below
    overview_file_cache = local_path / f"{proxy_key}.mv"
    try:
        cache.cache_file(s3_overview_file_path, local_path, overview_file)
        return overview_file_cache.open("rb")
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "404":
            exc = FileNotFoundError()
            exc.filename = overview_file_cache
            raise exc
        raise

# then you can use your function like this
try:
    filedesc = get_required_stream(...)
except FileNotFoundError as e:
    print(f'{e.filename} not found')
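If the translation should live in the caching service itself, so that every caller sees FileNotFoundError, the download call can be wrapped there instead. The following is only a minimal sketch under stated assumptions: `ClientError` here is a stand-in class that mimics the `response` dictionary of `botocore.exceptions.ClientError`, and `download` / `fake_download` are hypothetical placeholders for `s3_client().download_file`.

```python
class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError (assumption: same `response` layout)."""

    def __init__(self, code):
        super().__init__(f"S3 error {code}")
        self.response = {"Error": {"Code": code}}


def cache_file(download, bucket, key, destination):
    """Download `key`, translating a 404 from S3 into FileNotFoundError."""
    try:
        download(bucket, key, destination)
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            # Chain the original error so the traceback still shows the S3 failure.
            raise FileNotFoundError(f"s3://{bucket}/{key}") from e
        raise  # anything else (permissions, throttling, ...) propagates unchanged


def fake_download(bucket, key, destination):
    # Hypothetical downloader that always reports a missing object.
    raise ClientError("404")


def get_stream_or_none():
    """Caller side: the 404 check is gone, only FileNotFoundError is handled."""
    try:
        cache_file(fake_download, "some-bucket", "missing.mv", "/tmp/missing.mv")
    except FileNotFoundError:
        return None
    return "stream"
```

With this shape the caller no longer needs to know anything about botocore, while errors other than a missing object still surface as the original exception.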
If you do not want calling code to catch botocore.exceptions.ClientError exception, you could wrap your cache_file method in a try except block and throw a specific exception. I would also go a step further and create a simple custom exception object that wraps botocore.exceptions.ClientError and exposes error_code and error_message from boto exception. That way, caller doesn't have to separately catch FileNotFoundError when file not found and then again botocore.exceptions.ClientError for a different type of error (say permission or some network error). They can just catch the custom exception and further inspect for more details.
try:
    ...  # do something
except botocore.exceptions.ClientError as ex:
    raise YourCustomS3Exception(ex)  # YourCustomS3Exception needs to handle ex
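A rough sketch of what such a wrapper could look like is shown below. The names `CacheS3Error`, `not_found` and `download_or_wrap` are hypothetical, and `ClientError` is again a stand-in that only imitates the `response` dictionary of the real botocore exception.

```python
class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError (assumption: same `response` layout)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


class CacheS3Error(Exception):
    """Hypothetical wrapper exposing the error code and message of a ClientError."""

    def __init__(self, original):
        error = original.response.get("Error", {})
        self.error_code = error.get("Code")
        self.error_message = error.get("Message")
        super().__init__(f"{self.error_code}: {self.error_message}")

    @property
    def not_found(self):
        # Convenience flag so callers do not compare against "404" themselves.
        return self.error_code == "404"


def download_or_wrap(download):
    """Run `download`, re-raising any ClientError as CacheS3Error."""
    try:
        download()
    except ClientError as ex:
        raise CacheS3Error(ex) from ex
```

A caller can then write a single `except CacheS3Error as err:` clause and branch on `err.not_found` or `err.error_code` as needed.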
One clean way would be an additional file_exists() method in the Cache class, so every user of the cache can check before attempting the actual caching/download, just like with Python's filesystem/path functions.
An exception can still occur if the file is deleted/becomes unreachable between the file_exists() call and the download, but I think in this rare case, the botocore exception is just fine.
I am running the following try-except code:
try:
    paths = file_system_client.get_paths("{0}/{1}/0/{2}/{3}/{4}".format(container_initial_folder, container_second_folder, chronological_date[0], chronological_date[1], chronological_date[2]), recursive=True)
    list_of_paths=["abfss://{0}#{1}.dfs.core.windows.net/".format(storage_container_name, storage_account_name)+path.name for path in paths if ".avro" in path.name]
except Exception as e:
    if e=="AccountIsDisabled":
        pass
    else:
        print(e)
If my try-except runs into the following error, I want neither to print it nor to stop my program's execution:
"(AccountIsDisabled) The specified account is disabled.
RequestId:3159a59e-d01f-0091-5f71-2ff884000000
Time:2020-05-21T13:09:03.3540242Z"
I just want to skip it and print any other error/exception (e.g. TypeError, ValueError, etc.) that occurs.
Is this feasible in Python 3?
Please note that the .get_paths() method belongs to the azure.storage.filedatalake module, which enables a direct connection between Python and Azure Data Lake for path extraction.
I mention this to point out that the exception I am trying to bypass is not a built-in exception.
[Update] In short, after following the proposed answers, I modified my code to this:
import sys
from concurrent.futures import ThreadPoolExecutor
from azure.storage.filedatalake._models import StorageErrorException
from azure.storage.filedatalake import DataLakeServiceClient, DataLakeFileClient

storage_container_name="name1" #confidential
storage_account_name="name2" #confidential
storage_account_key="password" #confidential
container_initial_folder="name3" #confidential
container_second_folder="name4" #confidential

def datalake_connector(storage_account_name, storage_account_key):
    global service_client
    datalake_client = DataLakeServiceClient(account_url="{0}://{1}.dfs.core.windows.net".format("https", storage_account_name), credential=storage_account_key)
    print("Client successfuly created!")
    return datalake_client

def create_list_paths(chronological_date,
                      container_initial_folder="name3",
                      container_second_folder="name4",
                      storage_container_name="name1",
                      storage_account_name="name2"
                      ):
    list_of_paths=list()
    print("1. success")
    paths = file_system_client.get_paths("{0}/{1}/0/{2}/{3}/{4}".format(container_initial_folder, container_second_folder, chronological_date[0], chronological_date[1], chronological_date[2]), recursive=True)
    print("2. success")
    list_of_paths=["abfss://{0}#{1}.dfs.core.windows.net/".format(storage_container_name, storage_account_name)+path.name for path in paths if ".avro" in path.name]
    print("3. success")
    list_of_paths=functools.reduce(operator.iconcat, result, [])
    return list_of_paths

service_client = datalake_connector(storage_account_name, storage_account_key)
file_system_client = service_client.get_file_system_client(file_system=storage_container_name)
try:
    list_of_paths=[]
    executor=ThreadPoolExecutor(max_workers=8)
    print("Start path extraction!")
    list_of_paths=[executor.submit(create_list_paths, i, container_initial_folder, storage_container_name, storage_account_name).result() for i in date_list]
except:
    print("no success")
    print(sys.exc_info())
Unfortunately, for some reason the StorageErrorException is still not handled; I am still getting the following stdout:
Listing [Python 3.Docs]: Compound statements - The try statement.
There are several ways of achieving this. Here's one:
try:
    # ...
except StorageErrorException:
    pass
except:
    print(sys.exc_info()[1])
Note that except: is tricky because you might silently handle exceptions that you shouldn't. Another way would be to catch any exception the code could raise explicitly.
try:
    # ...
except StorageErrorException:
    pass
except (SomeException, SomeOtherException, SomeOtherOtherException) as e:
    print(e)
Quickly browsing [MS.Docs]: filedatalake package and the sourcecode, revealed that StorageErrorException (which extends [MS.Docs]: HttpResponseError class) is the one that you need to handle.
Might want to check [SO]: About catching ANY exception.
Related to the failure of catching the exception, apparently there are 2 having the same name:
azure.storage.blob._generated.models._models_py3.StorageErrorException (currently imported)
azure.storage.filedatalake._generated.models._models_py3.StorageErrorException
I don't know the rationale (I didn't work with the package), but given the fact the package raises an exception defined in another package when it also defines one with the same name, seems lame. Anyway importing the right exception solves the problem.
As a side note, when dealing with this kind of situation, don't only import the base name, but work with the fully qualified one:
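import azure.storage.filedatalake._generated.models._models_py3
and then refer to the class as azure.storage.filedatalake._generated.models._models_py3.StorageErrorException (an import statement can only name a module, not a class inside it).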
If you want to compare the type of the exception, change your condition to:
if type(e)==AccountIsDisabled:
example:
class AccountIsDisabled(Exception):
    pass

print("try #1")
try:
    raise AccountIsDisabled
except Exception as e:
    if type(e)==AccountIsDisabled:
        pass
    else:
        print(e)

print("try #2")
try:
    raise Exception('hi', 'there')
except Exception as e:
    if type(e)==AccountIsDisabled:
        pass
    else:
        print(e)
Output:
try #1
try #2
('hi', 'there')
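For reference, the original condition `e == "AccountIsDisabled"` can never be true, because an exception object is not equal to a string. The sketch below illustrates this and one possible alternative. `StorageError` and `handle` are hypothetical names, not part of the Azure SDK, and matching on the message text is an assumption based on the error message quoted in the question.

```python
class StorageError(Exception):
    """Hypothetical stand-in for the SDK exception raised by get_paths()."""


def handle(exc):
    """Return None for a disabled account, otherwise the text that would be printed."""
    # isinstance() also matches subclasses, unlike type(exc) == StorageError.
    if isinstance(exc, StorageError) and "AccountIsDisabled" in str(exc):
        return None  # skip silently
    return str(exc)


disabled = StorageError("(AccountIsDisabled) The specified account is disabled.")

print(disabled == "AccountIsDisabled")  # False: an exception never equals a string
print(handle(disabled))                 # None: the error is skipped
print(handle(ValueError("bad value")))  # bad value
```

If the real exception class exposes a structured error code, checking that attribute would be more robust than searching the message text.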
There are two functions: one downloads the excel file (ExcelFileUploadView(APIView)) and the other processes the downloaded file(def parse_excel_rfi_sheet).
Function parse_excel_rfi_sheet is called inside ExcelFileUploadView(APIView)
class ExcelFileUploadView(APIView):
    parser_classes = (MultiPartParser, FormParser)
    permission_classes = (permissions.AllowAny,)

    def put(self, request, format=None):
        if 'file' not in request.data:
            raise ParseError("Empty content")
        f = request.data['file']
        filename = f.name
        if filename.endswith('.xlsx'):
            try:
                file = default_storage.save(filename, f)
                r = parse_excel_rfi_sheet(file)
                status = 200
            except:
                raise Exception({"general_errors": ["Error during file upload"]})
            finally:
                default_storage.delete(file)
        else:
            status = 406
            r = {"general_errors": ["Please upload only xlsx files"]}
        return Response(r, status=status)

def parse_excel_rfi_sheet(file):
    workbook = load_workbook(filename=file)
    sheet = workbook["RFI"]
    curent_module_coordinate = []
    try:
        ....
        curent_module_coordinate.append(sheet['E688'].value)
        curent_module_coordinate.append(sheet['E950'].value)
        if check_exel_rfi_template_structure(structure=curent_module_coordinate):
            file_status = True
        else:
            file_status = False
    except:
        raise Exception({"general_errors": ["Error during excel file parsing. Unknown module cell"]})
The problem is that when an error occurs inside the parse_excel_rfi_sheet, I do not see a call of {"general_errors": ["Error during excel file parsing. Unknown module cell"]}
Instead, I always see the call
{"general_errors": ["Error during file upload"]}
That's why I can't understand at what stage the error occurred: at the moment of downloading the file or at the moment of processing.
How to change this?
Since you are calling parse_excel_rfi_sheet from ExcelFileUploadView, whenever the exception {"general_errors": ["Error during excel file parsing. Unknown module cell"]} is raised inside parse_excel_rfi_sheet, the try block in ExcelFileUploadView fails, control passes to its except clause, and that clause raises the exception {"general_errors": ["Error during file upload"]}.
You can verify this by printing the exception caught in the ExcelFileUploadView function.
Change the try block to the following:
try:
    file = default_storage.save(filename, f)
    r = parse_excel_rfi_sheet(file)
    status = 200
except Exception as e:
    print("Exception raised ", e)
    raise Exception({"general_errors": ["Error during file upload"]})
Your problem comes from catching absolutely all exceptions, first in parse_excel_rfi_sheet, then once again in your put method. Both bare except clauses (except: whatever_code_here) and large try blocks are antipatterns - you only want to catch the exact exceptions you're expecting at a given point (using except (SomeExceptionType, AnotherExceptionType, ...) as e:), and have as little code as possible in your try blocks so you are confident you know where the exception comes from.
The only exception (no pun intended) to this rule is the case of "catch all" handlers at a higher level, which are used to catch unexpected errors, log them (so you have a trace of what happened), and present a friendly error message to the user - but even then, you don't want a bare except clause but an except Exception as e.
TL;DR: never assume anything about which exception was raised, where and why, and never pass exceptions silently (at least log them - and check your logs).
raise Exception(...) generates a new Exception instance and raises that one.
This means, the try ... except in put effectively throws away the exception it caught and replaces it with a new one with message "Error during file upload", which is why you always see the same message.
A clean way to handle this would be to define a custom subclass of Exception (e.g., InvalidFormatException) and raise that one in parse_excel_rfi_sheet, having two different except cases in put:
class InvalidFormatException(Exception):
    pass

[...]

def parse_excel_rfi_sheet(file):
    workbook = load_workbook(filename=file)
    sheet = workbook["RFI"]
    curent_module_coordinate = []
    try:
        ....
        curent_module_coordinate.append(sheet['E688'].value)
        curent_module_coordinate.append(sheet['E950'].value)
        if check_exel_rfi_template_structure(structure=curent_module_coordinate):
            file_status = True
        else:
            file_status = False
    except:
        raise InvalidFormatException({"general_errors": ["Error during excel file parsing. Unknown module cell"]})
Your put then becomes:
def put(self, request, format=None):
    if 'file' not in request.data:
        raise ParseError("Empty content")
    f = request.data['file']
    filename = f.name
    if filename.endswith('.xlsx'):
        try:
            file = default_storage.save(filename, f)
            r = parse_excel_rfi_sheet(file)
            status = 200
        except InvalidFormatException:
            raise  # pass on the exception
        except:
            raise Exception({"general_errors": ["Error during file upload"]})
        finally:
            default_storage.delete(file)
    else:
        status = 406
        r = {"general_errors": ["Please upload only xlsx files"]}
    return Response(r, status=status)
Warning: As pointed out in the comments to this answer, note that -although not directly inquired- the OP's code should be further modified to remove the bare except: clause, as this is probably not the expected behaviour.
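One way to address that warning is to keep each try block small, catch only the exception type expected at that point, and chain the new exception with `raise ... from` so the original cause stays visible in the traceback. The sketch below is an illustration under assumptions, not the OP's actual code: `UploadError`, `parse`, `handle_upload` and the dictionary of cells are hypothetical simplifications of the view and the workbook.

```python
class InvalidFormatException(Exception):
    pass


class UploadError(Exception):
    """Hypothetical error for failures while storing the file."""


def parse(cells):
    """Read the two expected cells; a missing cell is a format error."""
    try:
        return cells["E688"], cells["E950"]
    except KeyError as exc:  # only the error we expect from a missing cell
        raise InvalidFormatException(f"Unknown module cell {exc}") from exc


def handle_upload(save, cells):
    """Store the file, then parse it, keeping the two failure modes apart."""
    try:
        save()
    except OSError as exc:
        raise UploadError("Error during file upload") from exc
    # Parsing happens outside the upload try block, so its errors keep their own type.
    return parse(cells)
```

Because the two steps are no longer wrapped by the same handler, the exception type alone tells the caller whether storing or parsing failed, and `__cause__` still points at the underlying error.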
What is the proper way to test a scenario that while initializing my object an exception will be raised? With given snippet of code:
def __init__(self, snmp_node: str = "0", config_file_name: str = 'config.ini'):
    [...]
    self.config_file_name = config_file_name
    try:
        self.config_parser.read(self.config_file_name)
        if len(self.config_parser.sections()) == 0:
            raise FileNotFoundError
    except FileNotFoundError:
        msg = "Error msg"
        return msg
I tried the following test:
self.assertTrue("Error msg", MyObj("0", 'nonExistingIniFile.ini'))
But I got an error saying that __init__ may not return a str.
What is the proper way to handle such a situation? Maybe some other workaround: I just want to be sure that if a user passes a wrong .ini file, the program won't accept it.
__init__ is required to return None. I think you are looking for self.assertRaises:
with self.assertRaises(FileNotFoundError):
    MyObj("0", 'nonExistingIniFile.ini')
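For assertRaises to pass, the constructor has to let the exception propagate instead of catching it and returning a message. Below is a minimal, self-contained sketch; `MyObj` is a hypothetical reduced version of the class from the question that keeps only the config check.

```python
import configparser
import unittest


class MyObj:
    """Hypothetical reduced version of the class from the question."""

    def __init__(self, snmp_node="0", config_file_name="config.ini"):
        self.config_parser = configparser.ConfigParser()
        # read() silently skips files it cannot open, hence the explicit check.
        self.config_parser.read(config_file_name)
        if not self.config_parser.sections():
            raise FileNotFoundError(config_file_name)


class MyObjTest(unittest.TestCase):
    def test_missing_ini_is_rejected(self):
        with self.assertRaises(FileNotFoundError):
            MyObj("0", "nonExistingIniFile.ini")
```

Saved in a test module, this can be run with `python -m unittest`.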
I've written a python script that wants to write logs to a file at /var/log/myapp.log. However, on some platforms this doesn't exist, or we might not have permission to do that. In that case, I'd like to try writing somewhere else.
def get_logfile_handler():
    log_file_handler = None
    log_paths = ['/var/log/myapp.log', './myapp.log']
    try:
        log_file_handler = logging.FileHandler(log_paths[0])
    except IOError:
        log_file_handler = logging.FileHandler(log_paths[1])
    return log_file_handler
The above code may work, but it seems far from elegant - in particular, trying a different file as part of the exception handling seems wrong. It could just throw another exception!
Ideally, it would take an arbitrary list of paths rather than just two. Is there an elegant way to write this?
you can simply use a loop such as:
def get_logfile_handler():
    log_file_handler = None
    log_paths = ['/var/log/myapp.log', './myapp.log']
    for log_path in log_paths:
        try:
            return logging.FileHandler(log_path)
        except IOError:
            pass
    raise Exception("Cannot open log file!")
HTH
There is, as #PM-2's comment suggests, no need to reference each possible path individually. You could try something like this:
def getlogfile_handler():
    log_file_handler = None
    log_paths = ('/var/log/myapp.log', './myapp.log') # and more
    for log_path in log_paths:
        try:
            log_file_handler = logging.FileHandler(log_path)
            break
        except IOError:
            continue
    else:
        raise ValueError("No log path available")
    return log_file_handler
The else clause handles the case where the loop terminates without finding a suitable log_path value. If the loop breaks early then (and only then) the return statement is executed.
It's perfectly OK to use exceptions for control flow purposes like this - the cases are exceptional but they aren't errors - the only real error occurs when no path can be found, in which case the code raises its own exception that the caller may catch if it so chooses.
def get_logfile_handler( log_files ):
    log_file_handler = None
    for path in log_files:
        try:
            log_file_handler = logging.FileHandler(path)
            break
        except IOError:
            pass
    return log_file_handler
This may solve your problem. You can define locally/globally instead of passing as a parameter.
As a possible solution if you want to avoid handling exceptions you can check the Operating System you are on by using the platform module and then choose the required path. Then you can use the os and stat modules in order to check permissions.
import platform
import os
import stat
...
#the logic goes here
...
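As a rough idea of what that logic could look like, the sketch below uses os.access to pick the first candidate that appears writable. The function name `first_writable_path` is hypothetical, and the platform check is left out for brevity. Note that such a check is only advisory: permissions can change between the check and the actual open, so logging.FileHandler can still raise.

```python
import os


def first_writable_path(candidates):
    """Return the first candidate that looks writable, or None if there is none."""
    for candidate in candidates:
        directory = os.path.dirname(os.path.abspath(candidate))
        if os.path.exists(candidate):
            # Existing file: it must itself be writable.
            writable = os.access(candidate, os.W_OK)
        else:
            # New file: its directory must exist and be writable.
            writable = os.path.isdir(directory) and os.access(directory, os.W_OK)
        if writable:
            return candidate
    return None
```

The result could then be passed to logging.FileHandler, ideally still inside a try/except for the race described above.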
I created a class named Options. It works fine, but not with Python 2.
And I want it to work on both Python 2 and 3.
The problem is identified: FileNotFoundError doesn't exist in Python 2.
But if I use IOError, it doesn't work in Python 3.
Changed in version 3.3: EnvironmentError, IOError, WindowsError, VMSError, socket.error, select.error and mmap.error have been merged into OSError.
What should I do? (Please do not discuss my choice of portability, I have reasons.)
Here's the code:
#!/usr/bin/python
#-*-coding:utf-8*
#option_controller.py
#Walle Cyril
#25/01/2014
import json
import os

class Options():
    """Options is a class designed to read, add and change informations in a JSON file with a dictionnary in it.
    The entire object works even if the file is missing since it re-creates it.
    If present it must respect the JSON format: e.g. keys must be strings and so on.
    If something corrupted the file, just destroy the file or call read_file method to remake it."""

    def __init__(self,directory_name="Cache",file_name="options.json",imported_default_values=None):
        #json file
        self.option_file_path=os.path.join(directory_name,file_name)
        self.directory_name=directory_name
        self.file_name=file_name
        #self.parameters_json_file={'sort_keys':True, 'indent':4, 'separators':(',',':')}
        #the default data
        if imported_default_values is None:
            DEFAULT_INDENT = 2
            self.default_values={\
                "translate_html_level": 1,\
                "indent_size":DEFAULT_INDENT,\
                "document_title":"Titre"}
        else:
            self.default_values=imported_default_values

    def read_file(self,read_this_key_only=False):
        """returns the value for the given key or a dictionary if the key is not given.
        returns None if it's impossible"""
        try:
            text_in_file=open(self.option_file_path,'r').read()
        except FileNotFoundError:#not 2.X compatible
            text_in_file=""#if the file is not there we re-make one with default values
        if text_in_file=="":#same if the file is empty
            self.__insert_all_default_values()
            text_in_file=open(self.option_file_path,'r').read()
        try:
            option_dict=json.loads(text_in_file)
        except ValueError:
            #if the json file is broken we re-make one with default values
            self.__insert_all_default_values()
            text_in_file=open(self.option_file_path,'r').read()
            option_dict=json.loads(text_in_file)
        if read_this_key_only:
            if read_this_key_only in option_dict:
                return option_dict[read_this_key_only]#
            else:
                #if the value is not there it should be written for the next time
                if read_this_key_only in self.default_values:
                    self.add_option_to_file(read_this_key_only,self.default_values[read_this_key_only])
                    return self.default_values[read_this_key_only]
                else:
                    #impossible because there is not default value so the value isn't meant to be here
                    return None
        else:
            return option_dict

    def add_option_to_file(self,key,value):#or update
        """Adds or updates an option(key and value) to the json file if the option exists in the default_values of the object."""
        option_dict=self.read_file()
        if key in self.default_values:
            option_dict[key]=value
            open(self.option_file_path,'w').write(\
                json.dumps(option_dict,sort_keys=True, indent=4, separators=(',',':')))

    def __insert_all_default_values(self):
        """Recreate json file with default values.
        called if the document is empty or non-existing or corrupted."""
        try:
            open(self.option_file_path,'w').write(\
                json.dumps(self.default_values,sort_keys=True, indent=4, separators=(',',':')))
        except FileNotFoundError:
            os.mkdir(self.directory_name)#Create the directory
            if os.path.isdir(self.directory_name):#succes
                self.__insert_all_default_values()
            else:
                print("Impossible to write in %s and file %s not found" % (os.getcwd(),self.option_file_path))

#demo
if __name__ == '__main__':
    option_file_object=Options()
    print(option_file_object.__doc__)
    print(option_file_object.read_file())
    option_file_object.add_option_to_file("","test")#this should have no effect
    option_file_object.add_option_to_file("translate_html_level","0")#this should have an effect
    print("value of translate_html_level:",option_file_object.read_file("translate_html_level"))
    print(option_file_object.read_file())
If FileNotFoundError isn't there, define it:
try:
    FileNotFoundError
except NameError:
    FileNotFoundError = IOError
Now you can catch FileNotFoundError in Python 2 since it's really IOError.
Be careful though, IOError has other meanings. In particular, any message should probably say "file could not be read" rather than "file not found."
You can use the base class exception EnvironmentError and use the 'errno' attribute to figure out which exception was raised:
from __future__ import print_function
import os
import errno

try:
    open('no file of this name') # generate 'file not found error'
except EnvironmentError as e: # OSError or IOError...
    print(os.strerror(e.errno))
Or just use IOError in the same way:
try:
    open('/Users/test/Documents/test') # will be a permission error
except IOError as e:
    print(os.strerror(e.errno))
That works on Python 2 or Python 3.
Be careful not to compare against number values directly, because they can be different on different platforms. Instead, use the named constants in Python's standard library errno module which will use the correct values for the run-time platform.
The Python 2 / 3 compatible way to except a FileNotFoundError is this:
import errno

try:
    with open('some_file_that_does_not_exist', 'r'):
        pass
except EnvironmentError as e:
    if e.errno != errno.ENOENT:
        raise
Other answers are close, but don't re-raise if the error number doesn't match.
Using IOError is fine for most cases, but for some reason os.listdir() and friends raise OSError instead on Python 2. Since IOError inherits from OSError it's fine to just always catch OSError and check the error number.
Edit: The previous sentence is only true on Python 3. To be cross compatible, instead catch EnvironmentError and check the error number.
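Applied to a case like the read_file method from the question, the same pattern could be wrapped in a small helper. This is only a sketch; the name `read_text_or_default` is hypothetical.

```python
import errno


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path, "r") as handle:
            return handle.read()
    except EnvironmentError as e:  # IOError/OSError on Python 2, OSError on Python 3
        if e.errno != errno.ENOENT:
            raise  # permission problems and the like are real errors
        return default
```

Only a missing file is treated as "use the default"; every other error number is re-raised so it is not silently swallowed.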
For what it's worth, although IOError is hardly mentioned in Python 3's official documentation and does not even show up in its official exception hierarchy, it is still there: in Python 3 it is an alias of OSError, which is the parent class of FileNotFoundError. See python3 -c "print(isinstance(FileNotFoundError(), IOError))", which gives you True. Therefore, you can technically write your code in this way, which works for both Python 2 and Python 3.
try:
    content = open("somefile.txt").read()
except IOError:  # Works in both Python 2 & 3
    print("Oops, we can not read this file")
It might be "good enough" in many cases. Although in general, it is not recommended to rely on an undocumented behavior. So, I'm not really suggesting this approach. I personally use Kindall's answer.