Python Mock Process for Unit Testing


Background:
I am currently writing a process monitoring tool (Windows and Linux) in Python and implementing unit test coverage. The process monitor hooks into the Windows API function EnumProcesses on Windows and monitors the /proc directory on Linux to find current processes. The process names and process IDs are then written to a log which is accessible to the unit tests.
Question:
When I unit test the monitoring behavior I need a process to start and terminate. I would love if there would be a (cross-platform?) way to start and terminate a fake system process that I could uniquely name (and track its creation in a unit test).
Initial ideas:
I could use subprocess.Popen() to open any system process but this runs into some issues. The unit tests could falsely pass if the process I'm using to test is run by the system as well. Also, the unit tests are run from the command line and any Linux process I can think of suspends the terminal (nano, etc.).
I could start a process and track it by its process ID but I'm not exactly sure how to do this without suspending the terminal.
These are just thoughts and observations from initial testing and I would love it if someone could prove me wrong on either of these points.
I am using Python 2.6.6.
Edit:
Get all Linux process IDs:
try:
    processDirectories = os.listdir(self.PROCESS_DIRECTORY)
except IOError:
    return []
return [pid for pid in processDirectories if pid.isdigit()]
Get all Windows process IDs:
import ctypes, ctypes.wintypes
Psapi = ctypes.WinDLL('Psapi.dll')
EnumProcesses = self.Psapi.EnumProcesses
EnumProcesses.restype = ctypes.wintypes.BOOL
count = 50
while True:
    # Build arguments to EnumProcesses
    processIds = (ctypes.wintypes.DWORD*count)()
    size = ctypes.sizeof(processIds)
    bytes_returned = ctypes.wintypes.DWORD()
    # Call enum processes to find all processes
    if self.EnumProcesses(ctypes.byref(processIds), size, ctypes.byref(bytes_returned)):
        if bytes_returned.value < size:
            return processIds
        else:
            # We weren't able to get all the processes so double our size and try again
            count *= 2
    else:
        print "EnumProcesses failed"
        sys.exit()
Windows code is from here

edit: this answer is getting long :), but some of my original answer still applies, so I leave it in :)
Your code is not so different from my original answer. Some of my ideas still apply.
When you write unit tests, you want to test only your own logic. When your code interacts with the operating system, you usually want to mock that part out, because you don't have much control over the output of those libraries, as you found out. So it's easier to mock those calls.
In this case, there are two calls that interact with the system: os.listdir and EnumProcesses. Since you didn't write them, we can easily fake them to return what we need, which in this case is a list.
But wait, in your comment you mentioned:
"The issue I'm having with it however is that it really doesn't test
that my code is seeing new processes on the system but rather that the
code is correctly monitoring new items in a list."
The thing is, we don't need to test the code that actually monitors the processes on the system, because it's third-party code. What we need to test is that your code's logic handles the returned processes, because that's the code you wrote. The reason we are testing over a list is that a list is what your logic works on: os.listdir and EnumProcesses return a list of pids (numeric strings and integers, respectively) and your code acts on that list.
I'm assuming your code is inside a class (you are using self in your code). I'm also assuming that the two snippets are isolated inside their own methods (you are using return). So this will be sort of what I suggested originally, except with actual code :) I don't know if they are in the same class or different classes, but it doesn't really matter.
Linux method
Now, testing your Linux process function is not that difficult. You can patch os.listdir to return a list of pids.
def getLinuxProcess(self):
    try:
        processDirectories = os.listdir(self.PROCESS_DIRECTORY)
    except IOError:
        return []
    return [pid for pid in processDirectories if pid.isdigit()]
Now for the test.
import os
import unittest
from fudge import patched_context
from linux_process import LinuxProcessClass # hypothetical module name; the class that contains the getLinuxProcess method

class LinuxProcessTest(unittest.TestCase):

    def test_LinuxProcess(self):
        """Test the logic of our getLinuxProcess.
        We patch os.listdir and return our own list, because os.listdir
        returns a list. We do this so that we can control the output
        (we test *our* logic, not a built-in library's functionality).
        """
        # Test we can parse our pids
        fakeProcessIds = ['1', '2', '3']
        with patched_context(os, 'listdir', lambda x: fakeProcessIds):
            myClass = LinuxProcessClass()
            ....
            result = myClass.getLinuxProcess()
            expected = ['1', '2', '3'] # getLinuxProcess returns the pids as strings
            self.assertEqual(result, expected)

        # Test we can handle IOError
        def raiseIOError(path):
            raise IOError # raise is a statement, so it cannot go inside a lambda
        with patched_context(os, 'listdir', raiseIOError):
            myClass = LinuxProcessClass()
            ....
            result = myClass.getLinuxProcess()
            expected = []
            self.assertEqual(result, expected)

        # Test we only get pids
        fakeProcessIds = ['1', '2', '3', 'do', 'not', 'parse']
        .....
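If Fudge is not available, the same idea can be sketched with the standard library's unittest.mock (Python 3.3 and later). This is a swap-in for patched_context, not what the answer itself uses, and the LinuxProcessClass below is a hypothetical stand-in built around the getLinuxProcess method shown above.

```python
import os
import unittest
from unittest import mock


class LinuxProcessClass(object):
    """Hypothetical stand-in for the asker's class."""
    PROCESS_DIRECTORY = '/proc'

    def getLinuxProcess(self):
        try:
            processDirectories = os.listdir(self.PROCESS_DIRECTORY)
        except OSError:  # IOError is an alias of OSError on Python 3
            return []
        return [pid for pid in processDirectories if pid.isdigit()]


class LinuxProcessTest(unittest.TestCase):

    def test_returns_only_numeric_entries(self):
        fake = ['1', '2', '3', 'do', 'not', 'parse']
        # os.listdir is replaced only inside the with block
        with mock.patch('os.listdir', return_value=fake):
            result = LinuxProcessClass().getLinuxProcess()
        self.assertEqual(result, ['1', '2', '3'])

    def test_listing_failure_gives_empty_list(self):
        # side_effect makes the patched os.listdir raise instead of return
        with mock.patch('os.listdir', side_effect=OSError):
            result = LinuxProcessClass().getLinuxProcess()
        self.assertEqual(result, [])
```

Saved as a test module, this could be run with python -m unittest; nothing in it touches the real /proc directory.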
Windows method
Testing your Windows method is a little trickier. What I would do is the following:
def prepareWindowsObjects(self):
    """Create and set up objects needed to get the windows process"""
    ...
    self.Psapi = ctypes.WinDLL('Psapi.dll')
    EnumProcesses = self.Psapi.EnumProcesses
    EnumProcesses.restype = ctypes.wintypes.BOOL
    self.EnumProcesses = EnumProcesses
    ...

def getWindowsProcess(self):
    count = 50
    while True:
        .... # Build arguments to EnumProcesses and call EnumProcesses
        if self.EnumProcesses(ctypes.byref(processIds),...
            ..
        else:
            return []
I separated the code into two methods to make it easier to read (I believe you are already doing this). Here is the tricky part, EnumProcesses is using pointers and they are not easy to play with. Another thing is, that I don't know how to work with pointers in Python, so I couldn't tell you of an easy way to mock that out =P
What I can tell you is to simply not test it. Your logic there is very minimal. Besides increasing the size of count, everything else in that function is creating the space EnumProcesses pointers will use. Maybe you can add a limit to the count size but other than that, this method is short and sweet. It returns the windows processes and nothing more. Just what I was asking for in my original comment :)
So leave that method alone. Don't test it. Make sure, though, that anything that uses getWindowsProcess and getLinuxProcess gets mocked out as per my original suggestion.
Hopefully this makes more sense :) If it doesn't let me know and maybe we can have a chat session or do a video call or something.
original answer
I'm not exactly sure how to do what you are asking, but whenever I need to test code that depends on some outside force (external libraries, popen or in this case processes) I mock out those parts.
Now, I don't know how your code is structured, but maybe you can do something like this:
def getWindowsProcesses(self, ...):
    '''Call Windows API function EnumProcesses and
    return the list of processes
    '''
    # ... call EnumProcesses ...
    return listOfProcesses

def getLinuxProcesses(self, ...):
    '''Look in /proc dir and return list of processes'''
    # ... look in /proc ...
    return listOfProcesses
These two methods only do one thing, get the list of processes. For Windows, it might just be a call to that API and for Linux just reading the /proc dir. That's all, nothing more. The logic for handling the processes will go somewhere else. This makes these methods extremely easy to mock out since their implementations are just API calls that return a list.
Your code can then easily call them:
def getProcesses(self, ...):
    '''Get the processes running.'''
    isLinux = ... # logic for determining OS
    if isLinux:
        processes = self.getLinuxProcesses(...)
    else:
        processes = self.getWindowsProcesses(...)
    # ... do something with processes, write to log file, etc ...
In your test, you can then use a mocking library such as Fudge. You mock out these two methods to return what you expect them to return.
This way you'll be testing your logic since you can control what the result will be.
from fudge import patched_context
...
def test_getProcesses(self, ...):
    monitor = MonitorTool(..)

    # Patch the method that gets the processes. Whenever it gets called, return
    # our predetermined list.
    originalProcesses = [....pids...]
    with patched_context(monitor, "getLinuxProcesses", lambda *args: originalProcesses):
        monitor.getProcesses()
        # ... assert logic is right ...

    # Let's "add" some new processes and test that our logic realizes new
    # processes were added.
    newProcesses = [...]
    updatedProcesses = originalProcesses + newProcesses
    with patched_context(monitor, "getLinuxProcesses", lambda *args: updatedProcesses):
        monitor.getProcesses()
        # ... assert logic caught new processes ...

    # Let's "kill" our new processes and test that our logic can handle it
    with patched_context(monitor, "getLinuxProcesses", lambda *args: originalProcesses):
        monitor.getProcesses()
        # ... assert logic caught processes were 'killed' ...
Keep in mind that if you test your code this way, you won't get 100% code coverage (since your mocked methods won't be run), but this is fine. You're testing your code and not third party's, which is what matters.
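To make the "add, then kill" sequence concrete, here is a minimal runnable sketch. The MonitorTool class and its set-difference logic are hypothetical (the real tool's logic was never shown), and unittest.mock from the Python 3 standard library stands in for Fudge.

```python
from unittest import mock


class MonitorTool(object):
    """Hypothetical monitor: remembers the last pid snapshot and reports changes."""

    def __init__(self):
        self.known = set()

    def getLinuxProcesses(self):
        # The real tool would read /proc here; the test never reaches this.
        raise NotImplementedError

    def getProcesses(self):
        current = set(self.getLinuxProcesses())
        started = current - self.known   # pids seen for the first time
        stopped = self.known - current   # pids that disappeared
        self.known = current
        return sorted(started), sorted(stopped)


monitor = MonitorTool()
# Three consecutive snapshots: baseline, one process added, that process gone.
snapshots = [['1', '2'], ['1', '2', '99'], ['1', '2']]
with mock.patch.object(monitor, 'getLinuxProcesses', side_effect=snapshots):
    first = monitor.getProcesses()    # (['1', '2'], [])
    second = monitor.getProcesses()   # (['99'], [])
    third = monitor.getProcesses()    # ([], ['99'])
```

Passing a list as side_effect makes each call return the next snapshot, so one patch covers the whole start/stop sequence.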
Hopefully this might be able to help you. I know it doesn't answer your question, but maybe you can use this to figure out the best way to test your code.

Your original idea of using subprocess is a good one. Just create your own executable and name it something that identifies it as a testing thing. Maybe make it do something like sleep for a while.
Alternately, you could actually use the multiprocessing module. I've not used python in windows much, but you should be able to get process identifying data out of the Process object you create:
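import multiprocessing
import time

p = multiprocessing.Process(target=time.sleep, args=(30,))
p.start()
pid = p.pid # Process exposes the child's pid as an attribute; it has no getpid() method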
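For the subprocess route, a minimal sketch follows (written for Python 3; the helper names and the marker string are made up for illustration). Popen returns immediately, so the terminal running the tests is not suspended, and the child can be tracked by its pid. The marker ends up in the child's command line (for example /proc/<pid>/cmdline on Linux), which helps tell it apart from processes the system started.

```python
import subprocess
import sys

# Hypothetical label used to recognise the test process in a command line.
MARKER = "monitor_unit_test_dummy"
# The child is just the current interpreter sleeping for a while.
CHILD_CODE = "import time; time.sleep(30)  # " + MARKER


def start_dummy_process():
    """Start a throwaway child process and return its Popen handle."""
    return subprocess.Popen([sys.executable, "-c", CHILD_CODE])


def stop_dummy_process(proc):
    """Terminate the child and wait until the OS has reaped it."""
    proc.terminate()
    proc.wait()


proc = start_dummy_process()
pid = proc.pid                      # what the monitor should report as new
alive = proc.poll() is None         # True while the child is still sleeping
stop_dummy_process(proc)
exited = proc.poll() is not None    # True once the child has gone away
```

In a real test the monitor would be asked for its process list between the start and stop calls, and again afterwards.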

Related

Is it possible to test a while True loop with pytest (I try with a timeout)?

I have a python function foo with a while True loop inside.
For background: it is expected to stream info from the web, do some writing and run indefinitely. The asserts test if the writing was done correctly.
Clearly I need it to stop sometime, in order to test.
What I did was to run it via multiprocessing and introduce a timeout there. However, when I look at the test coverage, the functions that ran through multiprocessing are not marked as covered.
Question 1: Why does pytest not work this way?
Question 2: How can I make this work?
I was thinking it's probably because I technically exit the loop, so maybe pytest does not mark this as tested....
import time
import multiprocessing

def test_a_while_loop():
    # Start through multiprocessing in order to have a timeout.
    p = multiprocessing.Process(
        target=foo,
        name="Foo",
    )
    try:
        p.start()
        # my timeout
        time.sleep(10)
        p.terminate()
    finally:
        # Cleanup.
        p.join()
    # Asserts below
    ...
More info
I looked into adding a decorator such as @pytest.mark.timeout(5) (as suggested here), but that did not work: it stops the whole test function, so I never get to the asserts.
If I don't find a way, I will just test the parts, but ideally I would like to find a way to test by breaking the loop.
I know I can re-write my code in order to make it have a timeout, but that would mean changing the code to make it testable, which I don't think is a good design.
Mocks I have not tried (as suggested here), because I don't believe I can mock what I do, since it writes info from the web. I need to actually see the "original" working.

Break out the functionality you want to test into a helper method. Test the helper method.
def scrape_web_info(url):
    data = get_it(url)
    return data

# In production:
while True:
    scrape_web_info(...)

# During test:
def test_web_info():
    assert scrape_web_info(...) == ...

Yes, it is possible and the code above shows one way to do it (run through a multiprocessing with a timeout).
Since the asserts were running fine, I found out that the issue was not the pytest, but the coverage report not accounting for the multiprocessing properly.
I describe how I fixed this (now separate) issue in another question here.

Actually, I had the same problem with testing an endless task and its coverage. However, in my code there is a .run_forever() method which runs a .run_once() method inside an infinite loop, so I can write a unit test for the .run_once() method to test its functionality. Nevertheless, if you want to test your forever function despite the Halting Problem to get more code coverage, I propose the following approach, which uses a timeout instead of the tools you've mentioned, including multiprocessing and @pytest.mark.timeout(5), which didn't work for me either:
First, install the interruptingcow PyPI package to have a nice timeout for raising an optional exception: pip install interruptingcow
Then:
import pytest
import asyncio
from interruptingcow import timeout
from <path-to-loop-the-module> import EventLoop

class TestCase:
    @pytest.mark.parametrize("test_case", ['none'])
    def test_events(self, test_case: list):
        assert EventLoop().run_once() # It's usual

    @pytest.mark.parametrize("test_case", ['none'])
    def test_events2(self, test_case: list):
        try:
            with timeout(10, exception=asyncio.CancelledError):
                EventLoop().run_forever()
                assert False
        except asyncio.CancelledError:
            assert True
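A different way to break out of the loop, sketched below with the standard library's unittest.mock rather than a timeout: wrap run_once so that it raises after a fixed number of iterations. The EventLoop class, the StopLoop exception and the run_iterations helper are hypothetical stand-ins, since the real class was not shown. Because everything runs in the test process, a coverage tool sees the loop body being executed.

```python
from unittest import mock


class StopLoop(Exception):
    """Raised by the test wrapper to break out of the infinite loop."""


class EventLoop(object):
    """Hypothetical stand-in for a class with run_once/run_forever."""

    def __init__(self):
        self.iterations = 0

    def run_once(self):
        self.iterations += 1
        return True

    def run_forever(self):
        while True:
            self.run_once()


def run_iterations(loop, n):
    """Let run_forever execute n real iterations, then stop it."""
    real_run_once = loop.run_once    # bound method captured before patching
    calls = []

    def limited_run_once():
        if len(calls) == n:
            raise StopLoop
        calls.append(real_run_once())

    with mock.patch.object(loop, 'run_once', side_effect=limited_run_once):
        try:
            loop.run_forever()
        except StopLoop:
            pass


def test_run_forever_does_real_work():
    loop = EventLoop()
    run_iterations(loop, 3)
    assert loop.iterations == 3
```

The design trade-off is that the test depends on run_forever calling self.run_once, which holds for the structure described above but is an assumption about any other code.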

Segmentation fault when initializing array

I am getting a segmentation fault when initializing an array.
I have a callback function from when an RFID tag gets read
IDS = []
def readTag(e):
    epc = str(e.epc, 'utf-8')
    if not epc in IDS:
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
    for x in vals:
        IDS.append([vals[0], vals[1], vals[2]])
    for x in IDS:
        print(x[0])
    r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
    r.set_region("NA")
    r.start_reading(readTag, on_time=1500)
    input("press any key to stop reading: ")
    r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know because when I replace it with a print call, the program runs just fine. I've tried using different types for the list elements (integers), creating a list of the same objects outside of the append call, etc. For some reason, just creating a list inside the readTag function, like row = [1,2,3], causes the segmentation fault.
Does anyone know what causes this error and how I can fix it? Also just to be a little more specific, the readTag function will work fine for the first two (only ever two) calls, but then it crashes and the Reader object that has the start_reading() function is from the mercury-api

This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so when it invokes your callback function readTag(e) a segfault occurs. I don't think that the behavior that you want is supported by that library

To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general pure-Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter, or some extension that you're using. That's not to say pure-Python won't break, it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reading() method you're using is "asynchronous", meaning it starts a new thread or process and returns immediately, and the background thread then continues to call your callback each time something is scanned.
I don't really know enough about the nitty-gritty of CPython to say exactly what's going on, but you've declared IDS = [] as a global variable and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the behalf of the developer. If you really need asynchronous reads it's worth sending them an issue report.
Otherwise, considering you're just waiting for input you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead with something like:
try:
    while True:
        readTags(r.read(timeout=10))
except KeyboardInterrupt: ## break loop on SIGINT (Ctrl-C)
    pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly, and if you're writing more than just a quick script you probably want to use threads to interrupt the loop properly as SIGINT is pretty hacky.
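As a rough sketch of the "use threads to interrupt the loop" idea, the polling loop can watch a threading.Event instead of relying on Ctrl-C. The FakeReader class below is a hypothetical stand-in so the example runs without hardware; it only mimics a read() call that returns a list of tags, and whether this arrangement avoids the crash with the real library is untested.

```python
import threading
import time


class FakeReader(object):
    """Hypothetical stand-in for the RFID reader; read() returns a list of tags."""

    def __init__(self, batches):
        self._batches = list(batches)

    def read(self, timeout=500):
        return self._batches.pop(0) if self._batches else []


def poll_tags(reader, seen, stop_event, interval=0.01):
    """Read synchronously until stop_event is set, recording first-seen times."""
    while not stop_event.is_set():
        for epc in reader.read():
            if epc not in seen:
                seen[epc] = time.strftime('%m/%d/%Y %H:%M:%S')
        stop_event.wait(interval)   # sleeps, but wakes early when stop is set


stop = threading.Event()
seen = {}
reader = FakeReader([['tag-a'], ['tag-a', 'tag-b']])
worker = threading.Thread(target=poll_tags, args=(reader, seen, stop))
worker.start()
time.sleep(0.2)     # stands in for waiting on user input
stop.set()
worker.join()       # after this, only the main thread touches seen
```

Using a dict keyed by EPC also makes the "already seen" check a direct lookup instead of a scan over a list of rows.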

RAM is not freed after a Python function is invoked

I'm using an in-house Python library for scientific computing. I need to consecutively copy an object, modify it, and then delete it. The object is huge which causes my machine to run out of memory after a few cycles.
The first problem is that I use python's del to delete the object, which apparently only dereferences the object, rather than freeing up RAM.
The second problem is that even when I encapsulate the whole process in a function, after the function is invoked, the RAM is still not freed up. Here's a code snippet to better explain the issue.
ws = op.core.Workspace()
net = op.network.Cubic(shape=[100,100,100], spacing=1e-6)
proj = net.project
def f():
    for i in range(5):
        clone = ws.copy_project(proj)
        result = do_something_with(clone)
        del clone
f()
gc.collect()
>>> ws
{'sim_01': [<openpnm.network.Cubic object at 0x7fed1c417780>],
'sim_02': [<openpnm.network.Cubic object at 0x7fed1c417888>],
'sim_03': [<openpnm.network.Cubic object at 0x7fed1c417938>],
'sim_04': [<openpnm.network.Cubic object at 0x7fed1c417990>],
'sim_05': [<openpnm.network.Cubic object at 0x7fed1c4179e8>],
'sim_06': [<openpnm.network.Cubic object at 0x7fed1c417a40>]}
My question is how do I completely delete a Python object?
Thanks!
PS. In the code snippet, each time ws.copy_project is called, a copy of proj is stored in ws dictionary.

There are some really smart python people on here. They may be able to tell you better ways to keep your memory clear, but I have used leaky libraries before, and found one (so-far) foolproof way to guarantee that your memory gets cleared after use: execute the memory hog in another process.
To do this, you'd need to arrange for an easy way to make your long calculation be executable separately. I have done this by adding special flags to my existing python script that tells it just to run that function; you may find it easier to put that function in a separate .py file, e.g.:
do_something_with.py
import sys
import openpnm as op

def run_one(i): # renamed so it does not shadow (and recurse into) do_something_with
    # Your example is still too vague. Clearly, something differentiates
    # each do_something_with, otherwise you're just taking the
    # same inputs 5 times over.
    # Whatever the difference is, pass it in as an argument to the function
    ws = op.core.Workspace()
    net = op.network.Cubic(shape=[100,100,100], spacing=1e-6)
    proj = net.project
    # You may not even need to clone anymore?
    clone = ws.copy_project(proj)
    result = do_something_with(clone) # your existing computation

# Whatever arg(s) you need to get to the function, just pass it in on the command line
if __name__ == "__main__":
    sys.exit(run_one(sys.argv[1:]))
You can do this using any of the python tools that handle subprocesses. In python 3.5+, the recommended way to do this is subprocess.run. You could change your bigger function to something like this:
import subprocess

def invoke_do_something(i):
    completed_args = subprocess.run(["python", "do_something_with.py", str(i)], check=False)
    return completed_args.returncode

# map is lazy in Python 3, so wrap it in list() to actually run the subprocesses
results = list(map(invoke_do_something, range(5)))
You'll obviously need to tailor this to fit your own situation, but by running in a subprocess, you're guaranteed not to have to worry about the memory getting cleaned up. As an added bonus, you could potentially use multiprocessing.Pool.map to use multiple processors at one time. (I deliberately coded this to use map to make such a transition simple. You could still use your for loop if you prefer, and then you don't need the invoke... function.) Multiprocessing could speed up your processing, but since you're already worried about memory, it is almost certainly a bad idea: with multiple copies of the big memory hog running at once, your system itself will likely run out of memory quickly and kill your process.
Your example is fairly vague, so I've written this at a high level. I can answer some questions if you need.
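Here is a self-contained sketch of the same pattern that can be run as-is. The child program is a made-up placeholder (it builds a big list and prints a sum) rather than the OpenPNM computation, and two details differ from the snippet above by choice: sys.executable is used so the child runs under the same interpreter, and the result comes back on stdout because exit codes only carry small integers.

```python
import subprocess
import sys

# Hypothetical child program: builds a large throwaway object and prints a result.
CHILD_CODE = """
import sys
n = int(sys.argv[1])
big = list(range(n))      # stand-in for the huge cloned project
print(sum(big))
"""


def run_in_child(n):
    """Run the memory-heavy step in a fresh interpreter and return its result."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(n)],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,            # raise if the child exits with an error
    )
    return int(completed.stdout)


# Each call starts and ends its own process, so the memory used by "big"
# is returned to the operating system when that process exits.
results = [run_in_child(n) for n in (10, 100, 1000)]
```

For larger results than a single number, the child could write to a file and print only the file name.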

python simple threading won't ends without doing anything (maybe)

When I run the following code (using "sudo python servers.py"), the process seems to finish immediately after just printing "test".
Why don't the proxy_server functions run? Or maybe they do and I just don't realize it (the first line in the proxy function doesn't print anything).
This is a stripped-down version of the code; I didn't want to include unnecessary content, yet it still demonstrates my problem:
import os,sys,thread,socket,select,struct,time
HTTP_PORT = 80
FTP_PORT=21
FTP_DATA_PORT = 20
IP_IN = '10.0.1.3'
IP_OUT = '10.0.3.3'
sys_http = 'http_proxy'
sys_ftp = 'ftp_proxy'
sys_ftp_data = 'ftp_data_proxy'
def main():
try:
thread.start_new_thread(proxy_server, (HTTP_PORT, IP_IN,sys_http,http_handler))
thread.start_new_thread(proxy_server, (FTP_PORT, IP_IN,sys_ftp,http_handler))
thread.start_new_thread(proxy_server, (FTP_DATA_PORT, IP_OUT,sys_ftp_data,http_handler))
print "test"
except e:
print 'Error!'
sys.exit(1)
def proxy_server(host,port,fileName,handler):
print "Proxy Server Running on ",host,":",port
def http_handler(src,sock):
return ''
if __name__ == '__main__':
main()
What am i missing or doing wrong ?

First, you have indentation problems related to using mixed tabs and spaces for indentation. While they didn't cause your code to misbehave in this particular case, they will cause you problems later if you don't stick to consistently using one or the other. They've already broken the displayed indentation in your question; see the print "test" line in main, which looks misaligned.
Second, instead of the low-level thread module, you should be using threading. Your problem is occurring because, as documented in the thread module documentation,
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
threading threads let you explicitly define whether other threads should survive the death of the main thread, and default to surviving. In general, threading is much easier to use correctly.
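A minimal sketch of the threading version, written for Python 3. The hosts, ports and names are taken from the question, the handler argument is left out, the (host, port) order follows the function signature rather than the call sites, and the function body is only a placeholder. The join() calls are what keep the main thread from finishing before the workers have run.

```python
import threading

results = []
results_lock = threading.Lock()


def proxy_server(host, port, name):
    """Placeholder body: a real server would bind and listen here."""
    with results_lock:
        results.append("%s running on %s:%d" % (name, host, port))


def main():
    specs = [
        ('10.0.1.3', 80, 'http_proxy'),
        ('10.0.1.3', 21, 'ftp_proxy'),
        ('10.0.3.3', 20, 'ftp_data_proxy'),
    ]
    threads = [threading.Thread(target=proxy_server, args=spec) for spec in specs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()    # wait here until every worker has finished
    return threads


threads = main()
```

Threads created this way are non-daemon by default, so even without the join() calls the interpreter would wait for them before exiting.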

developing for modularity & reusability: how to handle While True loops?

I've been playing around with the pybluez module recently to scan for nearby Bluetooth devices. What I want to do now is extend the program to also find nearby WiFi client devices.
The WiFi client scanner will need a while True loop to continually monitor the airwaves. If I were to write this as a straight-up, one-file program, it would be easy.
import ...
while True:
    client = scan()
    print client['mac']
What I want, however, is to make this a module. I want to be able to reuse it later and, possibly, have others use it too. What I can't figure out is how to handle the loop.
import mymodule
scan()
Assuming the first example code was 'mymodule', this program would simply print out the data to stdout. I would want to be able to use this data in my program instead of having the module print it out...
How should I code the module?

I think the best approach is going to be to have the scanner run on a separate thread from the main program. The module should have methods that start and stop the scanner, and another that returns the current access point list (using a lock to synchronize). See the threading module.
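A minimal sketch of that design, written for Python 3. The Scanner class and its method names are illustrative, and scan_func is assumed to return a dict with a 'mac' key, as in the question. A fake scan function is used at the bottom so the example runs without any radio hardware.

```python
import itertools
import threading
import time


class Scanner(object):
    """Runs a scan function on a background thread and collects what it finds."""

    def __init__(self, scan_func, interval=0.01):
        self._scan = scan_func          # assumed to return a dict with a 'mac' key
        self._interval = interval
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._macs = set()
        self._thread = None

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True      # do not keep the program alive on exit
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def clients(self):
        with self._lock:
            return sorted(self._macs)   # a copy, never the shared set itself

    def _run(self):
        while not self._stop.is_set():
            client = self._scan()
            with self._lock:
                self._macs.add(client['mac'])
            self._stop.wait(self._interval)


# Usage with a fake scan function that alternates between two addresses.
fake_macs = itertools.cycle(['aa:aa', 'bb:bb'])
scanner = Scanner(lambda: {'mac': next(fake_macs)})
scanner.start()
time.sleep(0.2)                 # the main program would do its own work here
scanner.stop()
found = scanner.clients()
```

The caller never sees the loop; it only calls start(), stop() and clients(), which is what makes the module reusable.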

How about something pretty straightforward like:
mymodule.py
import ...
def scanner():
    while True:
        client = scan()
        yield client['mac']
othermodule.py
import mymodule
for mac in mymodule.scanner():
    print mac
If you want something more useful than that, I'd also suggest a background thread as @kindall did.

Two interfaces would be useful.
scan() itself, which returned a list of found devices, such that I could call it to get an instantaneous snapshot of available bluetooth. It might take a max_seconds_to_search or a max_num_to_return parameter.
A "notify on found" function that accepts a callback. For instance (maybe typos, I just wrote this off the cuff):
import time

def find_bluetooth(callback_func, time_to_search = 5.0):
    already_found = []
    start_time = time.time() # wall-clock time; time.clock() counts CPU time on Unix
    while 1:
        if time.time()-start_time > time_to_search: break
        found = scan()
        for entry in found:
            if entry not in already_found:
                callback_func(entry)
                already_found.append(entry)
which would be used by doing this:
def my_callback(new_entry):
    print new_entry # or something more interesting...
find_bluetooth(my_callback)

If I get your question, you want scan() in a separate file, so that it can be reused later.
Create utils.py
def scan():
    # write code for scan here.
Create WiFi.py
import utils
def scan_wifi():
    while True:
        cli = utils.scan()
        ...
    return
