Can't provide much context given the complexity, but I'm hoping for some insight/thought-provoking questions as to why this is happening.
I'm testing a process that loads files into a database, so I'm patching the credentials for the database connection with unittest.mock.patch so that it uses test rather than production credentials. We have a series of mocks that are applied as a context manager; here is a simplified version:
from contextlib import ExitStack
from unittest.mock import patch

def context_stack(contexts):
    # Enter every context manager; closing the stack undoes them all
    stack = ExitStack()
    for context in contexts:
        stack.enter_context(context)
    return stack

def patch_mocks():
    mocks = [
        patch('db_config.ReadWrite', db_mocks.ReadWrite),
        patch('db_config.ReadWrite', db_mocks.ReadWrite)
    ]
    return context_stack(mocks)
It gets used as such (simplified):
with patch_mocks():
    LoadFiles(file_list)
LoadFiles iterates over each file in file_list and attempts to insert its contents into the database. The underlying methods connect to the database using db_config.ReadWrite, which is of course patched with db_mocks.ReadWrite. This works fairly consistently except that, seemingly at random, it fails because it tries to use the real db_config.ReadWrite when creating the connection.
So for example, there could be a hundred files, and the patch is used successfully for most of them, but at some random point halfway through the patch stops being used and the test fails. What conditions/variables could be causing this patch to not be applied? Is there a limit to the number of patches that can be applied? Should it be applied in another way?
My first line of investigation would involve this warning from the docs on .patch():
target should be a string in the form 'package.module.ClassName'. The target is imported and the specified object replaced with the new object, so the target must be importable from the environment you are calling patch() from. The target is imported when the decorated function is executed, not at decoration time.
and this further explanation on Where to patch
The basic principle is that you patch where an object is looked up, which is not necessarily the same place as where it is defined.
I would try to find a broken case and check the status of the import environment there to make sure the same import you're using everywhere else is reachable from there.
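To illustrate the "patch where it is looked up" rule quoted above, here is a minimal sketch. The two modules are built in memory as hypothetical stand-ins (db_config and a loader module that copied the name with a from-import); they are assumptions for the demonstration, not the asker's real code.

```python
import sys
import types
from unittest.mock import patch


class ReadWrite:
    name = "production"


class FakeReadWrite:
    name = "test"


# Hypothetical stand-in for the real db_config module
db_config = types.ModuleType("db_config")
db_config.ReadWrite = ReadWrite
sys.modules["db_config"] = db_config

# Hypothetical loader module that ran `from db_config import ReadWrite`,
# which copies the reference into its own namespace at import time
loader = types.ModuleType("loader")
loader.ReadWrite = db_config.ReadWrite
sys.modules["loader"] = loader


def connect_via_module():
    # Looks the name up on db_config at call time, so it sees the patch
    import db_config
    return db_config.ReadWrite.name


def connect_via_copied_name():
    # Uses the reference copied into loader, which the patch never touched
    return loader.ReadWrite.name


with patch("db_config.ReadWrite", FakeReadWrite):
    via_module = connect_via_module()
    via_copy = connect_via_copied_name()

after_exit = connect_via_module()
print(via_module, via_copy, after_exit)
```

If any code path in the real project reaches ReadWrite through a copied name like this, or runs after the with block has exited, it would use the production class even though the patch itself is fine.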
Do not patch or mock; instead, use the repository pattern to access the database.
You would then have two implementations of the Repository interface:
in memory: keeps all data in-memory
using a DB-driver/connector: actually writes to the DB
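A minimal sketch of those two implementations could look like the following. All names here (FileRepository, InMemoryFileRepository, SqliteFileRepository, load_files) are hypothetical, and sqlite3 is only a stand-in for whatever driver the real project uses.

```python
import sqlite3
from abc import ABC, abstractmethod


class FileRepository(ABC):
    """Hypothetical interface: the only way the loader touches storage."""

    @abstractmethod
    def insert(self, name, contents):
        ...

    @abstractmethod
    def count(self):
        ...


class InMemoryFileRepository(FileRepository):
    """Keeps all data in a dict; used by the tests."""

    def __init__(self):
        self.rows = {}

    def insert(self, name, contents):
        self.rows[name] = contents

    def count(self):
        return len(self.rows)


class SqliteFileRepository(FileRepository):
    """Writes to a real database through a DB-API connection."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, contents TEXT)"
        )

    def insert(self, name, contents):
        self.connection.execute(
            "INSERT OR REPLACE INTO files (name, contents) VALUES (?, ?)",
            (name, contents),
        )

    def count(self):
        return self.connection.execute("SELECT COUNT(*) FROM files").fetchone()[0]


def load_files(repository, files):
    # The loader depends on the interface only, never on credentials
    for name, contents in files.items():
        repository.insert(name, contents)


files = {"a.csv": "1,2,3", "b.csv": "4,5,6"}

memory_repo = InMemoryFileRepository()
load_files(memory_repo, files)

sqlite_repo = SqliteFileRepository(sqlite3.connect(":memory:"))
load_files(sqlite_repo, files)
```

Because the repository is passed in explicitly, a test cannot silently fall back to production credentials: it only ever sees the object it was handed.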
I have read about unit testing in Python, but all the examples I found are based on trivial cases of mocking objects. I have no idea how to implement tests for my project so that it can be tested without accessing the COM interface. The project is mainly used to control another application via its COM API, plus evaluating the data produced by the app and making reports, so let's say it is a higher-abstraction-level API packaged as a Python library. The COM interface is non-trivial: a main object exposes some managers, the managers expose containers of objects, and those objects hold references to other objects, so it is a web of connected objects responsible for different parts of the controlled application. The architecture of the Python library more or less follows the COM structure. During library import the main COM object is dispatched and stored in a central module; on import, the manager-level modules get references to the COM manager objects from the central module, and the manager-level methods then use these COM manager objects. Examples for better understanding:
# __init__
from managera import ManagerA
from centralobject import CentralObject
central_object = CentralObject()
manager_a = ManagerA()  # here all initialisation to make everything work,
manager_b = ManagerB()  # so full com object needed to import the package
...
# centralobject
class CentralObject():
    def __init__(self):
        self.com = Dispatch("application")
...
# managera
from project import central_object
class ManagerA():
    def __init__(self):
        self.com = central_object.com.manager_a
    def manager_a_method1(self, x):
        foo = self.com.somecontainer[x].somemethod()
        foo.configure(3)
        return foo
...
In its current state it is hard to test. It is not even possible to import it without a connection to the COM app. Dispatch could be moved into some init function that is executed after the import, but I don't see how that would make testing possible. One solution would be to make a test double that has a structure similar to the original application, but to my mind, inexperienced in testing, it seems a bit overkill to do this in each method test. Maybe it is not, and that's how it should be done; I am asking for advice from somebody more experienced.
The second solution that came to my mind is to dispatch the COM object every time any COM call is made and then mock it, but that seems like a lot of dispatches and I don't see how it makes the code better.
I also tried defining the managers as modules rather than classes, but that seemed even harder to test, because the COM managers were then referenced during module import, not object instantiation.
What should be done to make testing possible and nice without accessing COM? I am interested in everything: a solution, advice, or just a topic that I should read more about.
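One possible direction, sketched under assumptions rather than offered as the definitive design: pass the COM manager into ManagerA instead of fetching it at import time, and let unittest.mock.MagicMock play the test double so that no hand-built copy of the application structure is needed. The constructor parameter com_manager is hypothetical; the method body is taken from the example above.

```python
from unittest.mock import MagicMock


class ManagerA:
    def __init__(self, com_manager):
        # The COM manager is injected, so importing this module needs no COM
        self.com = com_manager

    def manager_a_method1(self, x):
        foo = self.com.somecontainer[x].somemethod()
        foo.configure(3)
        return foo


# MagicMock creates attributes, item access and return values on demand,
# so the web of connected objects does not have to be rebuilt by hand
fake_com = MagicMock()
manager = ManagerA(fake_com)
result = manager.manager_a_method1(7)

# Verify the interaction with the fake COM object
fake_com.somecontainer.__getitem__.assert_called_once_with(7)
result.configure.assert_called_once_with(3)
```

In production code the package would call Dispatch once in an explicit setup function and hand the resulting objects to the managers; only that small setup function then needs the real application.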
This is my fabric code:
from fabric import Connection, task
server = Connection(host="username@server.com:22", connect_kwargs={"password": "mypassword"})
@task
def dostuff(somethingmustbehere):
    server.run("uname -a")
This code works just fine. When I execute fab dostuff it does what I want it to do.
When I remove somethingmustbehere however I get this error message:
raise TypeError("Tasks must have an initial Context argument!")
TypeError: Tasks must have an initial Context argument!
I never defined somethingmustbehere anywhere in my code. I just put it in and the error is gone and everything works. But why? What is this variable? Why do I need it? Why is it so important? And if it is so important why can it just be empty? I am really lost here. Yes it works, but I cannot run code that I don't understand. It drives me insane. :-)
Please be aware that I'm talking about the Python 3(!) version of Fabric!
The Fabric version is 2.4.0
To be able to run a @task you need a context argument. Fabric uses invoke's task(), which expects to see a context object. The variable is normally named c or ctx (I always use ctx to make it clearer). I prefer not to use c because I normally use that name for the connection.
Check this line on GitHub from the invoke package repo: you will see that it raises an exception when the context argument is not present, but it doesn't explain why!
To learn more about the Context object, what it is and why we need it, you can read the following on the pyinvoke site:
Aside: what exactly is this ‘context’ arg anyway? A common problem
task runners face is transmission of “global” data - values loaded
from configuration files or other configuration vectors, given via CLI
flags, generated in ‘setup’ tasks, etc.
Some libraries (such as Fabric 1.x) implement this via module-level
attributes, which makes testing difficult and error prone, limits
concurrency, and increases implementation complexity.
Invoke encapsulates state in explicit Context objects, handed to tasks
when they execute. The context is the primary API endpoint, offering
methods which honor the current state (such as Context.run) as well as
access to that state itself.
See both of these links:
Context
what exactly is this ‘context’ arg anyway?
To be honest, I wasted a lot of time figuring out what the context is and why my code wouldn't run without it. But at some point I just gave up and started using it to make my code run without errors.
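To make the error less mysterious, here is a simplified imitation of that kind of signature check, written with the standard library only. It is not invoke's real implementation, and this Context class and task decorator are hypothetical stand-ins; when exactly invoke performs its own check is not shown here.

```python
import inspect


class Context:
    """Hypothetical stand-in for invoke's Context: carries config/state."""

    def __init__(self, config=None):
        self.config = config or {}


def task(func):
    """Simplified imitation of the check; not invoke's actual code."""
    params = list(inspect.signature(func).parameters)
    if not params:
        # Same message as in the question's traceback
        raise TypeError("Tasks must have an initial Context argument!")
    return func


@task
def dostuff(ctx):
    # The runner supplies the context; the parameter name is arbitrary
    return ctx.config.get("host")


result = dostuff(Context({"host": "server.com"}))

try:
    @task
    def broken():
        pass
    error = None
except TypeError as exc:
    error = str(exc)

print(result, error)
```

This also suggests why the question's somethingmustbehere parameter could be ignored: the task runner only needs a first parameter to pass the context into, and that task body used the module-level server connection instead of the context.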
We are facing a strange issue in our application, where we are getting this error:
Request was aborted after waiting too long to attempt to service your request.
Earlier we were able to scale well with the same number of idle instances, but now we cannot, and we have not changed anything code-wise that could impact startup time. Yet we are now receiving timeouts.
With Python it's possible to significantly improve instance startup time, depending on your app's code organisation.
Lazy loading of request handler code loads, at instance startup, only the Python file containing the handler for the actual request that triggered the instance start, not the entire app code. The handler code can thus be separated from the module's main file mentioned in the module's .yaml, into as many files as you want, even a single handler per file.
For example in a particular module's module_blah.py file you could have something like this loading only the respective file1.py, file2.py or rare.py files in addition to the module_blah.py file (which is loaded based on the module's .yaml file) and only if/when the respective paths are requested:
app = webapp2.WSGIApplication([
    ('/path1', 'file1.HandlerOne'),      # loads file1.py
    ('/path2', 'file2.HandlerTwo'),      # loads file2.py
    ('/pathX', 'rare.RareHandlerOne'),   # loads rare.py
    ('/pathY', 'rare.RareHandlerTwo'),   # loads rare.py
    ('/.*', ModuleBlahHandlerX)          # already loaded module_blah.py
], debug=True, config=apart_config)
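The reason string routes defer loading can be sketched with the standard library alone. This is only an illustration of the general idea, not webapp2's actual implementation; the resolve helper is hypothetical and the stdlib module colorsys stands in for a handler file such as file1.py.

```python
import importlib
import sys


def resolve(dotted_path):
    """Import 'module.Name' on first use and return the named attribute."""
    module_name, _, attribute = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Declaring the route table costs nothing: these are just strings
routes = {"/colors": "colorsys.rgb_to_hsv"}

# The module is imported only when a request for that path is dispatched
handler = resolve(routes["/colors"])
hsv = handler(1.0, 0.0, 0.0)
loaded = "colorsys" in sys.modules
```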
You can also place the necessary import statements for heavier/3rd party libraries inside the methods using them instead of at the top of the files, which also loads those files only when the respective methods are invoked.
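A minimal sketch of such a deferred import, with the stdlib module statistics standing in for a heavy third-party library and render_report as a hypothetical function name:

```python
import sys


def render_report(values):
    # Deferred import: the module is loaded on the first call to this
    # function rather than when the file containing it is imported
    import statistics
    return statistics.mean(values)


average = render_report([1, 2, 3])
loaded_after_call = "statistics" in sys.modules
```

Later calls find the module already cached in sys.modules, so only the first call pays the import cost.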
The drawback is that the response time for some requests may occasionally see some peaks - when the respective code is loaded, on demand, in the instances handling those requests. Subsequent requests handled by the same instances will be faster since the code is already loaded.
I guess changes in the SDK or in the operating conditions of the GAE infra (overall load balancing activity, shorter or longer transients due to maintenance, outages, etc.) at certain times may account for variations in the instance startup time, potentially causing the symptoms you describe if your instance's startup time was close enough to the maximum allowed. The techniques I described could help your app stay further away from that max, reducing the chances of hitting the problem.
Finally, configuring a more powerful instance class would also speed up the instance startup time for the same app code.
Problem:
We are seeing the following problem after upgrading the MapReduce library to the latest version (we had a version using the File API and we switched when that became deprecated):
When running the MapReducer, in the tasks it’s starting, PipelineBase from the library sporadically becomes unavailable, forcing the task to fail. At the retry, it works ok. After a random period of time has passed, said part of the MapReduce library becomes completely and globally unavailable. This means it’s not available in the normal application background, nor in the tasks.
The only way to make the application respond again is to reset all instances and make them reload the whole application code from scratch.
Error:
File "/process/custom_mapper/views.py", line 23, in <module>
  from process.custom_mapper.pipeline import CustomMapperPipeline
File "/process/custom_mapper/pipeline.py", line 7, in <module>
  class CustomMapperPipeline(PipelineBase):
TypeError: Error when calling the metaclass bases
  cannot create 'NoneType' instances
Which implies that PipelineBase is None. Looking inside the MapReduce library, in base_handler.py, from where PipelineBase is imported, we can see this code:
try:
    from mapreduce import pipeline_base
except ImportError:
    pipeline_base = None
And then:
if pipeline_base:
    # For backward compatiblity.
    PipelineBase = pipeline_base.PipelineBase
else:
    PipelineBase = None
Our assumption is that the import from mapreduce import pipeline_base is failing.
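The failure mode can be reproduced in isolation. The sketch below follows the same try/except pattern with a deliberately missing module (the module name is hypothetical) and then subclasses the result; the exact message differs between Python versions, but the exception is a TypeError raised far from the import that actually failed.

```python
try:
    # Hypothetical module name that is guaranteed not to be importable
    import module_that_is_not_installed_xyz as pipeline_base
except ImportError:
    # The real cause is swallowed here
    pipeline_base = None

if pipeline_base:
    PipelineBase = pipeline_base.PipelineBase
else:
    PipelineBase = None

try:
    # Subclassing None is where the problem finally surfaces
    class CustomMapperPipeline(PipelineBase):
        pass
    failure = None
except TypeError as exc:
    failure = exc

print(type(failure).__name__)
```

Logging the caught ImportError, or re-raising it, at the point of the original import would show which dependency is missing.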
Extra info:
The app is using Flask and the MapReducer is started remotely via an external endpoint. This endpoint is defined inside urls.py, loading a mapper controller class. This class imports the CustomMapperPipeline module. This means that each time a URL is accessed, the CustomMapperPipeline class is instantiated. If PipelineBase is None, then the whole app fails.
At some point in time PipelineBase becomes completely unavailable.
As per the instructions, the mapreducer package is in the root of the app, next to the app.yaml file.
There are only 2 items sent to the MapReducer currently.
Our implementation:
The map reducer goes through a list of items. Each item creates a set of X task queues to process data. Each of the X task queues in turn creates Y task queues of its own to process data.
We are using the following trigger for the map reduce:
yield MapreducePipeline(
    job_name,
    'process.custom_mapper.mappers.map_items',
    'process.custom_mapper.mappers.reduce_items',
    'process.custom_mapper.readers.ItemInputReader',
    'mapreduce.output_writers.GoogleCloudStorageOutputWriter',
    mapper_params={
        [...]
    },
    reducer_params={
        'output_writer': {
            'mime_type': 'text/plain',
            'bucket_name': bucket_name
        }
    }
)
The CustomInputReader makes sure that each item is distributed to an individual shard.
It looks like it sometimes couldn't load the pipeline dependency that the MapReduce library uses to manage the pipelines when dealing with the map and reduce jobs. The pipeline library was added as a dependency in the requirements.txt file, and so resided in a different folder from the MapReduce library, which was in the root of the app.
The solution was to manually copy the pipeline library into the root of the app.
This is such an easy problem, but it was made much more difficult by the way the pipeline_base import was handled. I've opened a pull request to fix that and I am waiting for an answer: https://github.com/GoogleCloudPlatform/appengine-mapreduce/pull/82
Is there a way to run cProfile or line_profile on a script on a server?
i.e., how could I get the results from either of the two for the script at http://www.Example.com/cgi-bin/myScript.py
Thanks!
Not sure what line_profile is. For cProfile, you just need to direct the results to a file you can later read on the server (depending on what kind of access you have to the server).
To quote the example from the docs,
import cProfile
cProfile.run('foo()', 'fooprof')
and put all the rest of the code into a def foo(): -- then later retrieve that fooprof file and analyze it at leisure (assuming your script runs with permissions to write it in the first place, of course).
Of course you can ensure different runs get profiled into different files, etc, etc -- whether this is practical also depends on what kind of access and permissions you're getting from your hosting provider, i.e., how are you allowed to persist data, in a way that lets you retrieve that data later? That's not a question of Python, it's a question of contracts between you and your hosting provider;-).
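As a slightly fuller sketch of the same idea: profile the work, dump the stats to a file, and read them back later with pstats. The temporary directory and the body of foo are placeholders; on a real server the path would be wherever the script is allowed to write. Profile.runcall is used here instead of cProfile.run because cProfile.run evaluates its string in the __main__ namespace, which may not be where foo lives.

```python
import cProfile
import os
import pstats
import tempfile


def foo():
    # Stand-in for the real body of the script
    return sum(i * i for i in range(10000))


# Placeholder location; pick a directory the script may write to
profile_path = os.path.join(tempfile.mkdtemp(), "fooprof")

profiler = cProfile.Profile()
result = profiler.runcall(foo)      # run foo() under the profiler
profiler.dump_stats(profile_path)   # persist the stats for later retrieval

# Later, possibly on another machine after downloading the file
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative")
stats.print_stats(5)                # show the five most expensive entries
```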