Access variables and lists from function - python

I am new to unit testing with Python. I would like to test some functions in my code. In particular I need to test if the outputs have specific dimensions or the same dimensions.
My Python script for unit testing looks like this:
import unittest
from func import *

class myTests(unittest.TestCase):
    def setUp(self):
        # I am not really sure what the purpose of this function is
        pass

    def test_main(self):
        # check if outputs of the function "main" are not empty:
        self.assertTrue(main, msg='The main() function provides no return values!')
        # check if "run['l_modeloutputs']" and "run['l_dataoutputs']", within the
        # main() function, have the same size:
        self.assertCountEqual(self, run['l_modeloutputs'], run['l_dataoutputs'], msg=None)
        # --> Doesn't work so far!
        # check if the dimensions of "props['k_iso']", within the main() function,
        # are (80, 40, 100):

    def tearDown(self):
        # I am also not sure of the purpose of this function
        pass

if __name__ == "__main__":
    unittest.main()
Here is the code under test:
def main(param_file):
    # Load parameter file
    run, model, sequences, hydraulics, flowtrans, elements, mg = hu.model_setup(param_file)
    # some other code
    ...
    if 'l_modeloutputs' in run:
        if hydraulics['flag_gen'] is False:
            print('No hydraulic parameters generated. No model outputs saved')
        else:
            save_models(realdir, realname, mg, run['l_modeloutputs'], flowtrans, props['k_iso'], props['ktensors'])
I need to access the parameters run['l_modeloutputs'] and run['l_dataoutputs'] of the main function from func.py. How can I pass the dimensions of these parameters to the unit testing script?

It sounds like one of two things is going on: either your code isn't laid out in a way that is easy to test, or you are trying to test or call too much code in one go.
If your code is laid out like the following:
def main(file_name):
    with open(file_name) as file:
        ...  # do work
    results = outcome_of_work
and you are trying to test what you have got from the file_name as well as the size of results, then you may want to think of refactoring this so that you can test a smaller action. Maybe:
def main(file_name):
    # `get_file_contents` appears to be `hu.model_setup`
    # `file_contents` would be `run`
    file_contents = get_file_contents(file_name)
    results = do_work_on_file_contents(file_contents)
Of course, if you already have a similar setup then the following is also applicable. This way you can write simpler tests, as you have direct control over what goes into the test (file_name or file_contents) and can then check the outcome (file_contents or results) against expected results.
With the unittest module you would basically be creating a small function for each test:
class Test(TestCase):
    def test_get_file_contents(self):
        # ... set up example `file-like object` ...
        run = hu.model_setup(file_name)
        self.assertCountEqual(
            run['l_modeloutputs'], run['l_dataoutputs'])
        # ... repeat for other possible files ...

    def test_do_work_on_file_contents(self):
        example_input = ...  # set up input
        example_output = do_work_on_file_contents(example_input)
        assert example_output == as_expected
This can then be repeated for different sets of potential inputs, both good and edge cases.
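For the specific dimension checks in the question, here is a minimal sketch. It assumes main() is refactored to return run and props, and that props['k_iso'] is a numpy array; both of these are assumptions, not shown in the original code:

import unittest
from func import main

class TestDimensions(unittest.TestCase):
    def test_output_dimensions(self):
        # assumes main() is refactored to `return run, props`
        run, props = main('params.ini')  # placeholder parameter file
        # same-size check for the two output lists
        self.assertEqual(len(run['l_modeloutputs']), len(run['l_dataoutputs']))
        # exact shape check, assuming a numpy array
        self.assertEqual(props['k_iso'].shape, (80, 40, 100))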
It's probably worth looking around for a more in-depth tutorial, as this is obviously only a very quick overview.
And setUp and tearDown are only needed if there is something to be done before or after each test you have written (e.g. if you have to set up an object in a particular way for several tests, this can be done in setUp, which runs before each test function; tearDown runs after each test function).
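For illustration, here is a minimal sketch of setUp/tearDown with a throwaway fixture (the file name and contents are made up):

import os
import tempfile
import unittest

class TestWithFixture(unittest.TestCase):
    def setUp(self):
        # runs before each test method: create a temporary parameter file
        self.tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.ini', delete=False)
        self.tmp.write('[run]\n')  # dummy content
        self.tmp.close()

    def tearDown(self):
        # runs after each test method, even if the test failed
        os.unlink(self.tmp.name)

    def test_param_file_exists(self):
        self.assertTrue(os.path.exists(self.tmp.name))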

Related

How can I accept and run a user's code securely on my web app?

I am working on a Django-based web app that takes a Python file as input, which contains some function. In the backend I have some lists that are passed as parameters to the user's function, which generates a single value as output. The result is then used for some further computation.
Here is what the function inside the user's file looks like:
def somefunctionname(list):
    '''some computation performed on list'''
    return float_value  # a single float
At present, the approach I am using is to take the user's file as a normal file input. Then in my views.py I execute the file as a module and pass the parameters with the eval function. A snippet is given below.
Here modulename is the name of the Python file that I took from the user and imported as a module:
exec("import "+modulename)
result = eval(f"{modulename}.{somefunctionname}(arguments)")
This is working absolutely fine, but I know it is not a secure approach.
My question: is there any other way to run a user's file securely, given that the method I am using is not secure? I know the proposed solutions can't be foolproof, but what are the other ways this can be solved (for example, if it can be solved with dockerization, what would the approach be, or are there external tools I can use with an API)?
Or, if possible, can somebody tell me how I can simply sandbox this, or point me to any tutorial that can help?
Any reference or resource will be helpful.
It is an important question. In Python, sandboxing is not trivial.
It is one of the few cases where it matters which Python interpreter you are using. For example, Jython generates Java bytecode, and the JVM has its own mechanism to run code securely.
For CPython, the default interpreter, there were originally some attempts to make a restricted execution mode, but they were abandoned a long time ago.
Currently, there is an unofficial project, RestrictedPython, that might give you what you need. It is not a full sandbox, i.e. it will not give you restricted filesystem access or anything like that, but for your needs it may be just enough.
Basically, its developers rewrote Python compilation in a more restricted way.
What it allows you to do is to compile a piece of code and then execute it, all in a restricted mode. For example:
from RestrictedPython import safe_builtins, compile_restricted

source_code = """
print('Hello world, but secure')
"""

byte_code = compile_restricted(
    source_code,
    filename='<string>',
    mode='exec'
)
exec(byte_code, {'__builtins__': safe_builtins})

>>> Hello world, but secure
Running with __builtins__ = safe_builtins disables dangerous functions like opening files, import, and so on. There are also other variations of builtins and other options; take some time to read the docs, they are pretty good.
EDIT:
Here is an example for your use case:
from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem

def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Execute user code in a restricted env
        Args:
            user_code(str) - String containing the unsafe code
            user_func(str) - Function inside user_code to execute and return value
            *args, **kwargs - arguments passed to the user function
        Return:
            Return value of the user_func
    """

    def _apply(f, *a, **kw):
        return f(*a, **kw)

    try:
        # These are the variables we allow user code to see.
        # result will contain the return value.
        restricted_locals = {
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }

        # If you want the user to be able to use some of your functions inside his code,
        # you should add those functions to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to
        # call functions with args and kwargs, and _getitem_ to be able to use arrays.
        # Just think before you add something else. I am not saying you shouldn't do it;
        # you should just understand what you are doing, that's all.
        restricted_globals = {
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }

        # Add another line to the user code that executes user_func
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)

        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

        # Run it
        exec(byte_code, restricted_globals, restricted_locals)

        # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]

    except SyntaxError as e:
        # Do whatever you want if the user's code does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment for the user here.
        raise
Now you have a function execute_user_code that receives some unsafe code as a string, the name of a function from that code, and arguments, and it returns the return value of that function with the given arguments.
Here is a very stupid example of some user code:
example = """
def test(x, name="Johny"):
return name + " likes " + str(x*x)
"""
# Lets see how this works
print(execute_user_code(example, "test", 5))
# Result: Johny likes 25
But here is what happens when the user code tries to do something unsafe:
malicious_example = """
import sys
print("Now I have the access to your system, muhahahaha")
"""
# Lets see how this works
print(execute_user_code(malicious_example, "test", 5))
# Result - evil plan failed:
# Traceback (most recent call last):
# File "restr.py", line 69, in <module>
# print(execute_user_code(malitious_example, "test", 5))
# File "restr.py", line 45, in execute_user_code
# exec(byte_code, restricted_globals, restricted_locals)
# File "<user_code>", line 2, in <module>
#ImportError: __import__ not found
Possible extension:
Pay attention that the user code is compiled on each call to the function. However, it is possible that you would like to compile the user code once and then execute it with different parameters. In that case, all you have to do is save the byte_code somewhere and call exec with a different set of restricted_locals each time.
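A minimal sketch of that idea, reusing restricted_globals from above (and assuming user_code already ends with a line that assigns to result):

# Compile once...
byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

# ...then execute it many times, with fresh locals per run
for args in [(1,), (2,), (3,)]:
    restricted_locals = {"result": None, "args": args, "kwargs": {}}
    exec(byte_code, restricted_globals, restricted_locals)
    print(restricted_locals["result"])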
EDIT2:
If you want to use import, you can write your own import function that only allows modules you consider safe to be imported. Example:
def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
        globals[name] = __import__(name, globals, locals, fromlist, level)
    else:
        raise Exception("Don't you even think about it {0}".format(name))

safe_builtins['__import__'] = _import  # Must be a part of builtins

restricted_globals = {
    "__builtins__": safe_builtins,
    "_getitem_": default_guarded_getitem,
    "_apply_": _apply,
}
....
i_example = """
import math
def myceil(x):
return math.ceil(x)
"""
print(execute_user_code(i_example, "myceil", 1.5))
Note that this sample import function is VERY primitive; it will not work with stuff like from x import y. You can look here for a more complex implementation.
EDIT3:
Note that a lot of Python's built-in functionality is not available out of the box in RestrictedPython. That does not mean it is not available at all, but you may need to implement some function for it to become available.
Even obvious things like sum or the += operator are not straightforward in the restricted environment.
For example, the for loop uses the _getiter_ function, which you must implement and provide yourself (in globals). Since you want to avoid infinite loops, you may want to put some limit on the number of iterations allowed. Here is a sample implementation that limits the number of iterations to 100:
MAX_ITER_LEN = 100

class MaxCountIter:
    def __init__(self, dataset, max_count):
        self.i = iter(dataset)
        self.left = max_count

    def __iter__(self):
        return self

    def __next__(self):
        if self.left > 0:
            self.left -= 1
            return next(self.i)
        else:
            raise StopIteration()

def _getiter(ob):
    return MaxCountIter(ob, MAX_ITER_LEN)

....

restricted_globals = {
    "_getiter_": _getiter,

....

for_ex = """
def sum(x):
    y = 0
    for i in range(x):
        y = y + i
    return y
"""

print(execute_user_code(for_ex, "sum", 6))
If you don't want to limit the loop count, just use the identity function as _getiter_:

restricted_globals = {
    "_getiter_": lambda x: x,
Note that simply limiting the loop count does not guarantee security. First, loops can be nested. Second, you cannot limit the execution count of a while loop. To make it secure, you have to execute unsafe code under some timeout.
Please take a moment to read the docs.
Note that not everything is documented (although many things are). You have to learn to read the project's source code for more advanced things. The best way to learn is to try to run some code, see what kind of function is missing, then look at the project's source code to understand how to implement it.
EDIT4:
There is still another problem: restricted code may contain infinite loops. To avoid them, some kind of timeout is required on the code.
Unfortunately, since you are using Django, which is multi-threaded unless you explicitly specify otherwise, the simple trick for timeouts using signals will not work here; you have to use multiprocessing.
The easiest way in my opinion is to use the timeout-decorator library. Simply add a decorator to execute_user_code so it looks like this:
@timeout_decorator.timeout(5, use_signals=False)
def execute_user_code(user_code, user_func, *args, **kwargs):
And you are done. The code will never run for more than 5 seconds.
Pay attention to use_signals=False; without this it may have some unexpected behavior in Django.
Also note that this is relatively heavy on resources (and I don't really see a way to overcome this). I mean, not really crazy heavy, but it is an extra process spawn. You should keep that in mind in your web server configuration: an API which allows executing arbitrary user code is more vulnerable to DDoS.
For sure, with Docker you can sandbox the execution if you are careful. You can restrict CPU cycles, cap memory, close all network ports, run as a user with read-only access to the filesystem, and so on.
Still, this would be extremely complex to get right, I think. In my view you should not allow a client to execute arbitrary code like that.
My recommendation would be to check whether a ready-made solution already exists and use that; some sites already allow you to submit code (Python, Java, whatever) that is executed on the server.
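As a rough illustration of those Docker restrictions (the image, limits, and paths below are all assumptions, not a vetted security configuration):

import subprocess

# Hypothetical sketch: run an untrusted script in a locked-down container.
result = subprocess.run(
    [
        "docker", "run", "--rm",
        "--network", "none",   # no network access
        "--memory", "128m",    # cap memory
        "--cpus", "0.5",       # cap CPU
        "--read-only",         # read-only root filesystem
        "--user", "nobody",    # unprivileged user
        "-v", "/sandbox/user_script.py:/code/user_script.py:ro",
        "python:3.11-slim",
        "python", "/code/user_script.py",
    ],
    capture_output=True, text=True, timeout=10,  # hard timeout as a last resort
)
print(result.stdout)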

Concurrent.futures.map initializes code from beginning

I am a fairly beginner programmer with Python, without much general experience, and currently I'm trying to parallelize a process that is heavily CPU-bound in my code. I'm using Anaconda to create environments and Visual Studio Code to debug.
A summary of the code is as follows:
from tkinter import filedialog
import myfuncs as mf, concurrent.futures

file_path = filedialog.askopenfilename(title='Ask for a file containing data')
# import data from file_path
a = input('Ask the user for input')
Next, calculations are made from these, and I reach a stage where I need to iterate over a list of lists. These lists may contain up to two values, and calls are made to a separate file.
For example, the inputs are:
sub_data1 = [test1]
sub_data2 = [test1, test2]
dataset = [sub_data1, sub_data2]
This is the stage where I use a concurrent.futures.ProcessPoolExecutor() instance and its .map() method:

with concurrent.futures.ProcessPoolExecutor() as executor:
    sm_res = executor.map(mf.process_distr, dataset)
Inside myfuncs.py, the mf.process_distr() function works like this:

def process_distr(tests):
    sm_reg = []
    for i in range(len(tests)):
        if i == 0:
            # do stuff
            sm_reg.append(result1)
        else:
            # do stuff
            sm_reg.append(result2)
    return sm_reg
The problem is that when I execute this code from main.py, main.py seems to start running multiple times: the user-input prompt and the file dialog pop up multiple times (as many times as the core count).
How can I resolve this?
Edit: After reading more into it, encapsulating the whole main.py code in:
if __name__ == '__main__':
did the trick. Thank you to anyone who gave time to help with my rookie problem.
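For illustration, a minimal sketch of that fix (the module and function names mirror the question; the data is a placeholder):

# main.py
import concurrent.futures
import myfuncs as mf  # module from the question

def run():
    dataset = [['test1'], ['test1', 'test2']]  # placeholder data
    with concurrent.futures.ProcessPoolExecutor() as executor:
        sm_res = executor.map(mf.process_distr, dataset)
    print(list(sm_res))

if __name__ == '__main__':
    # Worker processes re-import this module; the guard ensures prompts,
    # dialogs and pool creation only run in the original process.
    run()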

Python unit test advice

Can I get some advice on writing a unit test for the following piece of code?
%python
import sys
import json

sys.argv = []
sys.argv.append('{"product1":{"brand":"x","type":"y"}}')
sys.argv.append('{"product1":{"brand":"z","type":"a"}}')
products = sys.argv

yy = {}
my_products = []
for n, i in enumerate(products[:]):
    xx = json.loads(i)
    for j in xx.keys():
        yy["brand"] = xx[j]['brand']
        yy["type"] = xx[j]["type"]
    my_products.append(yy)
print my_products
As it stands there aren't any units to test!!!
A test might consist of:
- packaging your program in a script
- invoking your program from a Python unit test as a subprocess
- piping the output of your command process to a buffer
- asserting the buffer is what you expect it to be
While the above would technically allow you to have an automated test on your code, it comes with a lot of burden:
- multiprocessing
- weak assertions, by not having types
- coarse interaction (you have to invoke a script and can't just assert on the brand/type logic)
One way to address those issues could be to package your code into smaller units, i.e. create a function to encapsulate:

for j in xx.keys():
    yy["brand"] = xx[j]['brand']
    yy["type"] = xx[j]["type"]
my_products.append(yy)
Import it, exercise it, and assert on its output. Then there might be something to map the loading and the application of the xx.keys() loop over an array (which you could also encapsulate as a function).
And then there could be the highest level, taking in args and composing the loader and product-mapper. Since your code will be thoroughly unit tested at that point, you may get away with not having a test for your top-level script. A sketch of that refactoring is below.
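A hedged sketch of that refactoring (the function and module names are invented for illustration):

# products.py -- the loop body extracted into small, testable units
import json

def extract_product(record):
    """Map one parsed product record to a flat brand/type dict."""
    return {"brand": record["brand"], "type": record["type"]}

def load_products(raw_strings):
    """Parse each JSON string and flatten every product it contains."""
    my_products = []
    for raw in raw_strings:
        parsed = json.loads(raw)
        for key in parsed:
            my_products.append(extract_product(parsed[key]))
    return my_products

Each unit can then be exercised directly, in-process:

# test_products.py
import unittest
from products import extract_product, load_products

class TestProducts(unittest.TestCase):
    def test_extract_product(self):
        record = {"brand": "x", "type": "y"}
        self.assertEqual(extract_product(record), {"brand": "x", "type": "y"})

    def test_load_products(self):
        raw = ['{"product1":{"brand":"x","type":"y"}}']
        self.assertEqual(load_products(raw), [{"brand": "x", "type": "y"}])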

Writing python unit tests inside the actual code

Sometimes I write small utility functions and pack them as a Python package.
How small? 30-60 lines of Python.
And my question is: do you think writing the tests inside the actual code is bad practice? An abuse?
I can see great benefits, like having usage examples inside the code itself without jumping between files (again, for really small projects).
Example:
#!/usr/bin/env python

# Actual code
def increment(number, by=1):
    return number + by

# Tests
def test_increment_positive():
    assert increment(1) == 2

def test_increment_negative():
    assert increment(-5) == -4

def test_increment_zero():
    assert increment(0) == 1
The general idea is taken from the Riemann monitoring framework, which I use; in Riemann you write your test file along with your code (link).
You can write doctests inside your documentation to indicate how your function should be used:
def increment(number, by=1):
    """ Increments the given number by some other number

    >>> increment(3)
    4
    >>> increment(5, 3)
    8
    """
    return number + by
From the documentation:
- To check that a module's docstrings are up-to-date by verifying that all interactive examples still work as documented.
- To perform regression testing by verifying that interactive examples from a test file or a test object work as expected.
- To write tutorial documentation for a package, liberally illustrated with input-output examples. Depending on whether the examples or the expository text are emphasized, this has the flavor of "literate testing" or "executable documentation".
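To actually run those examples, the standard library's doctest runner can be invoked from the module itself or from the command line; a minimal sketch:

if __name__ == "__main__":
    # Verify all docstring examples in this module
    import doctest
    doctest.testmod(verbose=True)

# Alternatively, without modifying the file:
#   python -m doctest your_module.py -v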

How to skip the rest of tests in the class if one has failed?

I'm creating test cases for web tests using Jenkins, Python, Selenium 2 (WebDriver) and the py.test framework.
So far I'm organizing my tests in the following structure:
each class is a Test Case and each test_ method is a Test Step.
This setup works GREAT when everything is working fine; however, when one step crashes, the rest of the "Test Steps" go crazy. I'm able to contain the failure inside the class (Test Case) with the help of teardown_class(), but I'm looking into how to improve this.
What I need is to somehow skip (or xfail) the rest of the test_ methods within one class if one of them has failed, so that the rest of the test cases are not run and marked as FAILED (since that would be a false positive).
Thanks!
UPDATE: I'm not looking for the answer "it's bad practice", since calling it that is very arguable (each Test Class is independent, and that should be enough).
UPDATE 2: Putting an "if" condition in each test method is not an option; it is a LOT of repeated work. What I'm looking for is (maybe) somebody who knows how to use hooks on the class methods.
I like the general "test step" idea. I'd term it "incremental" testing, and it makes most sense in functional testing scenarios IMHO.
Here is an implementation that doesn't depend on internal details of pytest (except for the official hook extensions). Copy this into your conftest.py:
import pytest

def pytest_runtest_makereport(item, call):
    if "incremental" in item.keywords:
        if call.excinfo is not None:
            parent = item.parent
            parent._previousfailed = item

def pytest_runtest_setup(item):
    previousfailed = getattr(item.parent, "_previousfailed", None)
    if previousfailed is not None:
        pytest.xfail("previous test failed (%s)" % previousfailed.name)
If you now have a "test_step.py" like this:
import pytest

@pytest.mark.incremental
class TestUserHandling:
    def test_login(self):
        pass

    def test_modification(self):
        assert 0

    def test_deletion(self):
        pass
then running it looks like this (using -rx to report on xfail reasons):
(1)hpk@t2:~/p/pytest/doc/en/example/teststep$ py.test -rx
============================= test session starts ==============================
platform linux2 -- Python 2.7.3 -- pytest-2.3.0.dev17
plugins: xdist, bugzilla, cache, oejskit, cli, pep8, cov, timeout
collected 3 items

test_step.py .Fx

=================================== FAILURES ===================================
______________________ TestUserHandling.test_modification ______________________

self = <test_step.TestUserHandling instance at 0x1e0d9e0>

    def test_modification(self):
>       assert 0
E       assert 0

test_step.py:8: AssertionError
=========================== short test summary info ============================
XFAIL test_step.py::TestUserHandling::()::test_deletion
  reason: previous test failed (test_modification)
================ 1 failed, 1 passed, 1 xfailed in 0.02 seconds =================
I am using "xfail" here because skips are rather for wrong environments or missing dependencies, wrong interpreter versions.
Edit: Note that neither your example nor my example would directly work with distributed testing. For this, the pytest-xdist plugin needs to grow a way to define groups/classes to be sent wholesale to one testing slave, instead of the current mode, which usually sends the test functions of a class to different slaves.
If you'd like to stop the test execution after N failures anywhere (not in a particular test class), the command-line option pytest --maxfail=N is the way to go:
https://docs.pytest.org/en/latest/usage.html#stopping-after-the-first-or-n-failures
If you instead want to stop a test that is comprised of multiple steps when any of them fails (and continue executing the other tests), you should put all your steps in a class, use the @pytest.mark.incremental decorator on that class, and edit your conftest.py to include the code shown here:
https://docs.pytest.org/en/latest/example/simple.html#incremental-testing-test-steps
The pytest -x option will stop the test run after the first failure:

pytest -vs -x test_sample.py
It's generally bad practice to do what you are doing. Each test should be as independent as possible from the others, while here you completely depend on the results of the other tests.
Anyway, reading the docs, it seems that a feature like the one you want is not implemented (probably because it wasn't considered useful).
A workaround could be to "fail" your tests by calling a custom method which sets some condition on the class, and to mark each test with the skipif decorator:

class MyTestCase(unittest.TestCase):
    skip_all = False

    @pytest.mark.skipif("MyTestCase.skip_all")
    def test_A(self):
        ...
        if failed:
            MyTestCase.skip_all = True

    @pytest.mark.skipif("MyTestCase.skip_all")
    def test_B(self):
        ...
        if failed:
            MyTestCase.skip_all = True
Or you can do this check before running each test and eventually call pytest.skip().
edit:
Marking as xfail can be done in the same way, but using the corresponding function calls.
Probably, instead of rewriting the boilerplate code for each test, you could write a decorator (this would probably require that your methods return a "flag" stating whether they failed or not); a sketch of that idea follows below.
Anyway, I'd like to point out that, as you state, if one of these tests fails then other failing tests in the same test case should be considered false positives...
but you can do this "by hand": just check the output and spot the false positives, even though this might be boring and error-prone.
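A hedged sketch of such a decorator (the names are invented; rather than having each method return a flag, this version records a raised exception on the class):

import functools
import pytest

def skip_after_failure(test_method):
    """Skip this test if an earlier test in the class already failed."""
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        if getattr(type(self), "skip_all", False):
            pytest.skip("a previous test in this class failed")
        try:
            return test_method(self, *args, **kwargs)
        except Exception:
            type(self).skip_all = True  # remember the failure for later tests
            raise
    return wrapper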
You might want to have a look at pytest-dependency. It is a plugin that allows you to skip some tests if some other test has failed.
In your very case, it seems that the incremental testing that gbonetti discussed is more relevant.
Based on hpk42's answer, here is my slightly modified incremental mark that makes test cases xfail if the previous test failed (but not if it xfailed or was skipped). This code has to be added to conftest.py:
import pytest

try:
    pytest.skip()
except BaseException as e:
    Skipped = type(e)

try:
    pytest.xfail()
except BaseException as e:
    XFailed = type(e)

def pytest_runtest_makereport(item, call):
    if "incremental" in item.keywords:
        if call.excinfo is not None:
            if call.excinfo.type in {Skipped, XFailed}:
                return
            parent = item.parent
            parent._previousfailed = item

def pytest_runtest_setup(item):
    previousfailed = getattr(item.parent, "_previousfailed", None)
    if previousfailed is not None:
        pytest.xfail("previous test failed (%s)" % previousfailed.name)
And then a collection of test cases has to be marked with @pytest.mark.incremental:
import pytest

@pytest.mark.incremental
class TestWhatever:
    def test_a(self):  # this will pass
        pass

    def test_b(self):  # this will be skipped
        pytest.skip()

    def test_c(self):  # this will fail
        assert False

    def test_d(self):  # this will xfail because test_c failed
        pass

    def test_e(self):  # this will xfail because test_c failed
        pass
UPDATE: Please take a look at @hpk42's answer. His answer is less intrusive.
This is what I was actually looking for:
from _pytest.runner import runtestprotocol
import pytest
from _pytest.mark import MarkInfo

def check_call_report(item, nextitem):
    """
    If a test method fails, mark the rest of the test methods as 'skip'.
    Also, if any of the methods is marked as 'pytest.mark.blocker',
    interrupt further testing.
    """
    reports = runtestprotocol(item, nextitem=nextitem)
    for report in reports:
        if report.when == "call":
            if report.outcome == "failed":
                for test_method in item.parent._collected[item.parent._collected.index(item):]:
                    test_method._request.applymarker(pytest.mark.skipif("True"))
                    if test_method.keywords.has_key('blocker') and isinstance(test_method.keywords.get('blocker'), MarkInfo):
                        item.session.shouldstop = "blocker issue has failed or was marked for skipping"
            break

def pytest_runtest_protocol(item, nextitem):
    # add to the hook
    item.ihook.pytest_runtest_logstart(
        nodeid=item.nodeid, location=item.location,
    )
    check_call_report(item, nextitem)
    return True
Now, adding this to conftest.py or as a plugin solves my problem.
It's also improved to STOP testing if a blocker test has failed (meaning that all further tests would be useless).
Or quite simply, instead of calling py.test from cmd (or tox, or wherever), just call:
py.test --maxfail=1
See here for more switches:
https://pytest.org/latest/usage.html
To complement hpk42's answer, you can also use pytest-steps to perform incremental testing; this can help you in particular if you wish to share some kind of incremental state/intermediate results between the steps.
With this package you do not need to put all the steps in a class (you can, but it is not required); simply decorate your "test suite" function with @test_steps:
from pytest_steps import test_steps

def step_a():
    # perform this step ...
    print("step a")
    assert not False  # replace with your logic

def step_b():
    # perform this step
    print("step b")
    assert not False  # replace with your logic

@test_steps(step_a, step_b)
def test_suite_no_shared_results(test_step):
    # Execute the step
    test_step()
You can add a steps_data parameter to your test function if you wish to share a StepsDataHolder object between your steps:
import pytest
from pytest_steps import test_steps, StepsDataHolder

def step_a(steps_data):
    # perform this step ...
    print("step a")
    assert not False  # replace with your logic

    # intermediate results can be stored in steps_data
    steps_data.intermediate_a = 'some intermediate result created in step a'

def step_b(steps_data):
    # perform this step, leveraging the previous step's results
    print("step b")

    # you can leverage the results from previous steps...
    # ... or pytest.skip if not relevant
    if len(steps_data.intermediate_a) < 5:
        pytest.skip("Step b should only be executed if the text is long enough")

    new_text = steps_data.intermediate_a + " ... augmented"
    print(new_text)
    assert len(new_text) == 56

@test_steps(step_a, step_b)
def test_suite_with_shared_results(test_step, steps_data: StepsDataHolder):
    # Execute the step with access to the steps_data holder
    test_step(steps_data)
Finally, you can automatically skip or fail a step if another one has failed using @depends_on; check the documentation for details.
(I'm the author of this package by the way ;) )
