I am trying to use pytest to test some code that uses towerlib, an external Python package. Part of the functionality is that towerlib dynamically adds attributes to its credential module at runtime, and the Credential class then imports its own module and looks up that attribute:
This is in towerlib/entities/credential.py:
class Credential:  # pylint: disable=too-few-public-methods
    """Credential factory to handle the different credential types returned."""

    def __new__(cls, tower_instance, data):
        try:
            credential_type_name = tower_instance.get_credential_type_by_id(data.get('credential_type')).name
            credential_type_name = ''.join(credential_type_name.split())
            credential_type = f'{credential_type_name}Credential'
            credential_type_obj = getattr(importlib.import_module('towerlib.entities.credential'), credential_type)
            credential = credential_type_obj(tower_instance, data)
        except Exception:  # pylint: disable=broad-except
            LOGGER.exception(
                'Could not dynamically load credential with type : "%s", trying a generic one.', credential_type
            )
            credential = GenericCredential(tower_instance, data)
        return credential
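For context, the dynamic registration that this factory relies on might look roughly like the sketch below. This is a hypothetical illustration of the pattern, not towerlib's actual code; register_credential_type and the subclassing of GenericCredential are assumptions made only to show how a class can be attached to a module at runtime and later resolved with getattr.

import importlib

def register_credential_type(type_name):
    # Hypothetical helper, not part of towerlib: build a class at runtime and
    # attach it to the credential module under the name the factory will look for.
    module = importlib.import_module('towerlib.entities.credential')
    cls = type(f'{type_name}Credential', (GenericCredential,), {})
    setattr(module, cls.__name__, cls)  # later found via getattr(module, ...)
    return cls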
I assume pytest does some sort of caching or other behavior that doesn't get updated when the module dynamically adds attributes to itself and re-imports itself. Does anyone know whether that's how pytest works, and whether it's configurable with a flag or a decorator on my test?
The function that I'm trying to test works correctly when run outside of pytest, but when running it in a pytest function, I'm getting the "Could not dynamically load credential with type" exception.
Running in the REPL:
>>> e2e.prepare_and_run_e2e(["my_device"], None, None, False, True, True, False, True)
^this returns a full result set with no errors.
My pytest function:
@pytest.mark.parametrize("device", get_all_e2e_devices())
def test_e2e(device):
    results = e2e.prepare_and_run_e2e([device], None, None, False, True, True, False, True)
    ...
Ends up with:
ERROR credentials:credential.py:189 Could not dynamically load credential with type : "NetworkCredential", trying a generic one.
Traceback (most recent call last):
File "{my local path}\.venv\lib\site-packages\towerlib\entities\credential.py", line 186, in __new__
credential_type_obj = getattr(importlib.import_module('towerlib.entities.credential'), credential_type)
AttributeError: module 'towerlib.entities.credential' has no attribute 'NetworkCredential'
I tried running pytest with -p no:cacheprovider, but that gives me the same result.
I don't want to patch/mock around the Credential, because this is a special test in my application: I want the test to actually get the towerlib Credential and talk to an external system. I also can't change towerlib's somewhat confusing approach, since it's an external library, and I need to share this test with a team that will be using the standard towerlib.
I have a deployment package with the following structure:
my-project.zip
--- my-project.py
------ lambda_handler()
Then I define the handler path in the configuration file:
my-project.lambda_handler
I get the error:
'handler' missing on module
and I cannot understand why.
There are several issues that can cause this error.
Issue#1:
The very first issue you're going to run into: if you name the file incorrectly, you get this error:
Unable to import module 'lambda_function': No module named lambda_function
If you name the function incorrectly, you get this error:
Handler 'handler' missing on module 'lambda_function_file': 'module' object has no attribute 'handler'
On the dashboard, make sure the handler field is entered as function_filename.actual_function_name and make sure they match up in your deployment package.
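To make the naming concrete, here is a hedged sketch for the Python runtime; the file name, function name, and return value are only illustrative:

# lambda_function.py  <- the part of the Handler setting before the dot is this file name
def lambda_handler(event, context):  # <- the part after the dot is this function name
    # Handler field on the dashboard / in the deployment config: lambda_function.lambda_handler
    return {"statusCode": 200, "body": "ok"}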
If only the messages were a bit more instructive, that would have been a simpler step.
Resource Link:
No lambda_function?
Issue#2:
adrian_praja solved this issue on the AWS forum. He answered the following:
I believe your index.js should contain
exports.createThumbnailHandler = function(event, context) {}
Issue#3:
Solution: correctly specify the method to call
This happens when the method that Lambda should call (in this case a Node.js handler) is specified incorrectly in the Lambda settings.
Please review the specification of the method to call.
In the case of the above error message, Lambda attempted to call the handler method of index.js, but the corresponding method could not be found.
The method to call is set in the "Handler" field on the configuration tab.
For example, to call the handler method of index.js, the Handler field would be set to index.handler.
Resource Link:
http://qiita.com/kazuqqfp/items/ac8d93918d0030b31aad
AWS Lambda Function is returning Handler 'handler' missing on module 'index'
I had this issue and had to make sure I had a function called handler in my file, e.g.:
# this just takes whatever is sent to the api gateway and sends it back
def handler(event, context):
    try:
        return response(event, 200)
    except Exception as e:
        return response('Error: ' + str(e), 400)  # e.message does not exist in Python 3

def response(message, status_code):
    return message
I'm trying to get an S3 hook in Apache Airflow using the Connection object.
It looks like this:
class S3ConnectionHandler:
    def __init__(self):
        # values are read from configuration class, which loads from env. variables
        self._s3 = Connection(
            conn_type="s3",
            conn_id=config.AWS_CONN_ID,
            login=config.AWS_ACCESS_KEY_ID,
            password=config.AWS_SECRET_ACCESS_KEY,
            extra=json.dumps({"region_name": config.AWS_DEFAULT_REGION}),
        )

    @property
    def s3(self) -> Connection:
        return get_live_connection(self.logger, self._s3)

    @property
    def s3_hook(self) -> S3Hook:
        return self.s3.get_hook()
I get an error:
Broken DAG: [...] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/connection.py", line 282, in get_hook
return hook_class(**{conn_id_param: self.conn_id})
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/base_aws.py", line 354, in __init__
raise AirflowException('Either client_type or resource_type must be provided.')
airflow.exceptions.AirflowException: Either client_type or resource_type must be provided.
Why does this happen? From what I understand, S3Hook calls the constructor of its parent class, AwsHook, and passes client_type as the string "s3". How can I fix this?
I took this hook configuration from here.
EDIT: I even get the same error when directly creating the S3 hook:
@property
def s3_hook(self) -> S3Hook:
    # return self.s3.get_hook()
    return S3Hook(
        aws_conn_id=config.AWS_CONN_ID,
        region_name=self.config.AWS_DEFAULT_REGION,
        client_type="s3",
        config={"aws_access_key_id": self.config.AWS_ACCESS_KEY_ID, "aws_secret_access_key": self.config.AWS_SECRET_ACCESS_KEY}
    )
If you're using Airflow 2, please refer to the new documentation; it can be kind of tricky, as most Google searches redirect you to the old docs.
In my case I was using AwsHook and had to switch to AwsBaseHook, as it seems to be the only correct one for version 2. I had to switch the import path as well: the AWS stuff isn't under contrib anymore, it's under providers.
And as you can see in the new documentation, you can pass either client_type or resource_type as an AwsBaseHook parameter, depending on which one you want to use. Once you do that, your problem should be solved.
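A minimal sketch of that, assuming Airflow 2.x with the Amazon provider package installed; the connection id "aws_default" is an assumption:

from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook

hook = AwsBaseHook(aws_conn_id="aws_default", client_type="s3")  # or resource_type="s3"
s3_client = hook.get_conn()  # the boto3 client/resource built from the connection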
None of the other answers worked and I couldn't get around this, so I ended up using the boto3 library directly, which also gave me more low-level flexibility than the Airflow hooks offered.
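For reference, going straight to boto3 might look like the following sketch; the credential and region variables are assumptions standing in for however you load configuration:

import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_DEFAULT_REGION,
)
s3.upload_file("local.csv", "my-bucket", "remote.csv")  # e.g. upload a local file to a bucket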
First of all, I suggest that you create an S3 connection; for this you must go to Admin >> Connections.
After that, and assuming that you want to load a file into an S3 bucket, you can write:
from airflow.providers.amazon.aws.hooks.s3 import S3Hook  # Airflow 2.x import path

def load_csv_S3():
    # Send to S3
    hook = S3Hook(aws_conn_id="s3_conn")
    hook.load_file(
        filename='/write_your_path_file/filename.csv',
        key='filename.csv',
        bucket_name="BUCKET_NAME",
        replace=True,
    )
Finally, you can check all the functions of S3Hook 👉 HERE
What has worked for me, in case it helps someone, is in my answer to a similar post: https://stackoverflow.com/a/73652781/4187360
I want to log raw bytes. But if I change the file mode in FileHandler from "w" to "wb", the logger fails with an error whatever data I pass to it: string or bytes.
logging.getLogger("clientIn").error(b"bacd")
Traceback (most recent call last):
File "/usr/lib/python3.4/logging/__init__.py", line 980, in emit
stream.write(msg)
TypeError: 'str' does not support the buffer interface
Call stack:
File "<string>", line 1, in <module>
File "/usr/lib/python3.4/multiprocessing/spawn.py", line 106, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.4/multiprocessing/spawn.py", line 119, in _main
return self._bootstrap()
File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
self.run()
File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/serj/work/proxy_mult/proxy/connection_worker_process.py", line 70, in __call__
self._do_work(ipc_socket)
File "/home/serj/work/proxy_mult/proxy/connection_worker_process.py", line 76, in _do_work
logging.getLogger("clientIn").error("bacd")
Message: 'bacd'
I need a way to adapt the logging module to binary data.
The easiest solution would be to keep the handler in text mode and store the bytes in a string form (for example their repr) before logging them.
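A minimal sketch of that approach, keeping the FileHandler in the default text mode; the logger name and file name are just placeholders:

import logging

logging.basicConfig(filename="client.log", filemode="w", level=logging.DEBUG)
raw = b"\x00\x01bacd"
logging.getLogger("clientIn").error("raw bytes: %r", raw)  # logs the repr of the bytes as text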
The other possible way is to customize your logging. The documentation is a start, but you will need to look at examples of how people have done it. Personally I have only gone as far as using a slightly customized record, handler, and formatter to let my logger use an SQLite backend.
There are multiple things you need to modify (sorry for not being that specific, but I am also still a beginner when it comes to Python's logging module):
LogRecord - if you inherit from it, you will see that __init__(...) specifies an argument msg of type object. As the documentation states, msg is the event description message, possibly a format string with placeholders for variable data. Imho, if msg were supposed to be just a string, it would not have been of type object. This is a place where you can investigate further, including the use of args. Inheriting is not really necessary in many cases, and a simple namedtuple would do just fine.
LoggerAdapter - this holds the contextual information of a message, which can contain arbitrary data (from what I understand). You will need a custom adapter to work with that.
In addition you will probably have to use a custom Formatter and/or Handler. Worst case, you will have to use some arbitrary string message while passing the extra data (binary or otherwise) alongside it.
Here is a quick and dirty example, where I use a namedtuple to hold the extra data. Note that I was unable to just pass the extra data without an actual message, but you might be able to get around this if you implement an actual custom LogRecord. Also note that I am omitting the rest of my code, since this is just a demonstration of the customization:
# imports assumed for this snippet (the answer omits them)
import logging
from collections import namedtuple
from torch.utils.tensorboard import SummaryWriter

TensorBoardLogRecord = namedtuple('TensorBoardLogRecord', 'dtime lvl src msg tbdata')
TensorBoardLogRecordData = namedtuple('tbdata', 'image images scalar scalars custom_scalars')

class TensorBoardLoggerHandler(logging.Handler):
    def __init__(self, level=logging.INFO, tboard_dir='./runs') -> None:
        super().__init__(level)
        self.tblogger = SummaryWriter(tboard_dir)

    def emit(self, record: TensorBoardLogRecord) -> None:
        # For debugging, print record.__dict__ to see how the record is structured.
        # If the record contains TensorBoard data, add it to TB and flush.
        if hasattr(record, 'args'):
            # TODO Do something with the arguments
            ...

class TensorBoardLogger(logging.Logger):
    def __init__(self, name: str = 'TensorBoardLogger', level=logging.INFO, tboard_dir='./runs') -> None:
        super().__init__(name, level)
        self.handler = TensorBoardLoggerHandler(level, tboard_dir)
        self.addHandler(self.handler)
    ...

logging.setLoggerClass(TensorBoardLogger)
logger = logging.getLogger('TensorBoardLogger')
logger.info('Some message', TensorBoardLogRecordData(None, None, 10000, None, None))
What I am trying to do is give the logger (still a work in progress) the ability to actually write a TensorBoard log entry (in my case via the PyTorch utilities module) that can be visualized with the tool in the web browser. Yours doesn't need to be that complicated. This "solution" is mostly for the case where you can't find a way to override the msg handling.
I also found this repository - visual logging - which uses the logging facilities of the Python module to handle images. Following the code provided by the repo, I was able to get
<LogRecord: TensorBoardLogger, 20, D:\Projects\remote-sensing-pipeline\log.py, 86, "TensorBoardLogRecord(image=None, images=None, scalar=1, scalars=None, custom_scalars=None)">
{'name': 'TensorBoardLogger', 'msg': TensorBoardLogRecord(image=None, images=None, scalar=1, scalars=None, custom_scalars=None), 'args': (), 'levelname': 'INFO', 'levelno': 20, 'pathname': 'D:\\Projects\\remote-sensing-pipeline\\log.py', 'filename': 'log.py', 'module': 'log', 'exc_info': None, 'exc_text': None, 'stack_info': None, 'lineno': 86, 'funcName': '<module>', 'created': 1645193616.9026344, 'msecs': 902.6343822479248, 'relativeCreated': 834.2068195343018, 'thread': 6508, 'threadName': 'MainThread', 'processName': 'MainProcess', 'process': 16208}
by just calling
logger = TensorBoardLogger(tboard_dir='./LOG')
logger.info(TensorBoardLogRecord(image=None, images=None, scalar=1, scalars=None, custom_scalars=None))
where I changed TensorBoardLogRecord to be
TensorBoardLogRecord = namedtuple('TensorBoardLogRecord' , 'image images scalar scalars custom_scalars')
As you can see, the msg is my TensorBoardLogRecord object, which confirms both my statement above and the statement in the documentation: as long as you customize your logging properly, you can log whatever you want. In the case of the repo I pointed at, the author is using images, which are numpy objects. Ultimately, however, those images are read from image files, so binary data is involved there too.
I'm attempting to fix a bug in the Python package caniusepython3 which arises because distlib isn't parsing PyPI projects correctly. I've written this unit test:
@mock.patch('distlib.locators.locate')
def test_blocking_dependencies_locators_fails(self, distlib_mock):
    """
    Testing the work around for //bitbucket.org/pypa/distlib/issue/59/
    """
    py3 = {'py3_project': ''}
    breaking_project = 'test_project'
    distlib_mock.locators.locate.return_value = "foo"
    distlib_mock.locators.locate.side_effect = AttributeError()
    got = dependencies.blocking_dependencies([breaking_project], py3)
    # If you'd like to test that a message is logged we can use
    # testfixtures.LogCapture or stdout redirects.
That way, when distlib fixes the error in its next release, the test case will still be valid.
The problem is that the MagicMock never raises an AttributeError as I expected; instead the call returns the MagicMock object itself:
try:
    # sets dist to <MagicMock name='locate()' id='4447530792'>
    dist = distlib.locators.locate(project)
except AttributeError:
    # This is a work around //bitbucket.org/pypa/distlib/issue/59/
    log.warning('{0} found but had to be skipped.'.format(project))
    continue
This causes the following stack trace later on, because the object repr is returned:
======================================================================
ERROR: Testing the work around for //bitbucket.org/pypa/distlib/issue/59/
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest/mock.py", line 1136, in patched
return func(*args, **keywargs)
File "/Users/alexlord/git/caniusepython3/caniusepython3/test/test_dependencies.py", line 81, in test_blocking_dependencies_locators_fails
got = dependencies.blocking_dependencies([breaking_project], py3)
File "/Users/alexlord/git/caniusepython3/caniusepython3/dependencies.py", line 119, in blocking_dependencies
return reasons_to_paths(reasons)
File "/Users/alexlord/git/caniusepython3/caniusepython3/dependencies.py", line 43, in reasons_to_paths
parent = reasons[blocker]
File "/Users/alexlord/git/caniusepython3/caniusepython3/dependencies.py", line 29, in __getitem__
return super(LowerDict, self).__getitem__(key.lower())
nose.proxy.KeyError: <MagicMock name='locate().name.lower().lower()' id='4345929400'>
-------------------- >> begin captured logging << --------------------
ciu: INFO: Checking top-level project: test_project ...
ciu: INFO: Locating <MagicMock name='locate().name.lower()' id='4344734944'>
ciu: INFO: Dependencies of <MagicMock name='locate().name.lower()' id='4344734944'>: []
--------------------- >> end captured logging << ---------------------
Why is the MagicMock not raising an exception when distlib.locators.locate() is called?
Update: I was able to get this unit test to work when I switched to using
def test_blocking_dependencies_locators_fails(self):
    """
    Testing the work around for //bitbucket.org/pypa/distlib/issue/59/
    """
    with mock.patch.object(distlib.locators, 'locate') as locate_mock:
        py3 = {'py3_project': ''}
        breaking_project = 'test_project'
        locate_mock.side_effect = AttributeError()
        got = dependencies.blocking_dependencies([breaking_project], py3)
        # If you'd like to test that a message is logged we can use
        # testfixtures.LogCapture or stdout redirects.
But I'm still wondering what I did wrong with the decorator format.
When you use @mock.patch, it mocks what you tell it to and passes that mock object as a parameter. Thus, your distlib_mock parameter is the mocked locate function itself, and you're effectively setting attributes on distlib.locators.locate.locators.locate. Set the attributes directly on the provided mock, and things should work better.
@mock.patch('distlib.locators.locate')
def test_blocking_dependencies_locators_fails(self, locate_mock):
    # ...
    locate_mock.return_value = "foo"
    locate_mock.side_effect = AttributeError()
    # ...
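For a self-contained illustration of the same principle (standard library only; os.getcwd is just a convenient stand-in target), note that the object patch() creates is exactly what gets passed into the decorated function, so side_effect must be set on that parameter directly:

from unittest import mock
import os

@mock.patch('os.getcwd')
def demo(getcwd_mock):
    # the parameter *is* the mock that replaced os.getcwd
    getcwd_mock.side_effect = AttributeError()
    try:
        os.getcwd()  # the patched call now raises
    except AttributeError:
        print("raised as configured")

demo()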
I have the following code (Python 2.5, GAE dev server):
try:
    yt_service.UpgradeToSessionToken()  # this line produces TokenUpgradeFailed
except gdata.service.TokenUpgradeFailed:
    return HttpResponseRedirect(auth_sub_url())  # this line will never be executed (why?)
except Exception, exc:
    return HttpResponseRedirect(auth_sub_url())  # instead this line is executed (why?)
So I set a breakpoint at the last line, and under the debugger I see:
"exc" TokenUpgradeFailed: {'status': 403, 'body': 'html stripped', 'reason': 'Non 200 response on upgrade'}
"type(exc)" type: <class 'gdata.service.TokenUpgradeFailed'>
"exc is gdata.service.TokenUpgradeFailed" bool: False
"exc.__class__" type: <class 'gdata.service.TokenUpgradeFailed'>
"isinstance(exc, gdata.service.TokenUpgradeFailed)" bool: False
"exc.__class__.__name__" str: TokenUpgradeFailed
What am I missing in Python exception handling? Why is isinstance(exc, gdata.service.TokenUpgradeFailed) False?
This error can occur if your relative/absolute import statements do not match everywhere. If there is a mismatch, the target module can be loaded more than once, in slightly different contexts. Usually this isn't a problem, but it does prevent classes from the differently loaded modules from comparing as equal (hence the exception-catching problem).
There may be other causes for the error, but I suggest looking through your code and ensuring that everything importing the gdata.service module explicitly mentions the gdata package. Even within the gdata package itself, each module using the service module should import it from the package explicitly via from gdata import service rather than by way of the relative import: import service.
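To see why mismatched imports break except/isinstance, here is a small sketch in modern Python (the original question is Python 2.5, but the mechanism is the same); the file and module names are hypothetical:

import importlib.util
import pathlib
import tempfile

# one source file defining an exception class
path = pathlib.Path(tempfile.mkdtemp()) / "service.py"
path.write_text("class TokenUpgradeFailed(Exception):\n    pass\n")

def load_as(name):
    # load the same file as a module registered under the given name
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

pkg_style = load_as("gdata.service")  # e.g. loaded via "from gdata import service"
flat_style = load_as("service")       # e.g. loaded via "import service"

exc = flat_style.TokenUpgradeFailed()
print(isinstance(exc, pkg_style.TokenUpgradeFailed))  # False: same code, two distinct classes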