UnitTesting questions to check my coding - python

My assignment is to test my StatsList class with reasonable inputs that might go into it. I'm very confused about how to write unit tests for the questions provided. I've correctly done the first one in class. The questions are:
What is the count, mean, median, and mode of an empty list (a StatsList that nothing's been appended to)?
What is the count, mean, median, and mode of a list with one value?
What is the count, mean, median, and mode of a list with two values?
If values are inserted in the wrong order (as in the example above), does the median still work?
If values are inserted in order, does the median still work?
If there are multiple values that all appear the same number of times, what will the mode be?
Coding:
import unittest
import statslist

class StatsTest(unittest.TestCase):
    def test_Append(self):
        sl = statslist.StatsList()
        self.assertEqual(0, sl.count())
        sl.append(10)
        self.assertEqual(1, sl.count())

    def test_OneValue(self):
        pass  # TODO

    def test_Mean(self):
        pass  # TODO: got stuck at self.assert...

    def test_Median(self):
        pass  # TODO

    def test_Mode(self):
        pass  # TODO

if __name__ == '__main__':
    unittest.main()
Coding for StatsList:
class StatsList:
    def __init__(self):
        self.nums = []

    def append(self, number):
        self.nums.append(number)

    def count(self):
        return len(self.nums)

    def mean(self):
        # Use a local accumulator; keeping a running self.sum across calls
        # would double-count on every call to mean().
        total = 0
        for num in self.nums:
            total = total + num
        return total / len(self.nums)

    def median(self):
        self.nums.sort()
        midPos = self.count() // 2
        if self.count() % 2 == 0:
            median = (self.nums[midPos] + self.nums[midPos - 1]) / 2.0
        else:
            median = self.nums[midPos]
        return median

    def mode(self):
        counts = {}
        for num in self.nums:
            counts[num] = counts.get(num, 0) + 1
        return max(counts, key=counts.get)
def byFreq(pair):
    return pair[1]

def main():
    l = StatsList()
    l.append(1)
    l.append(11)
    l.append(3)
    l.append(1)
    l.append(4)
    print("Count:", l.count())    # should print 5
    print("Mean:", l.mean())      # should print 4.0
    print("Median:", l.median())  # should print 3
    print("Mode:", l.mode())      # should print 1

if __name__ == '__main__':
    main()

You can write something like this:
import unittest
import statslist

class StatsTest(unittest.TestCase):
    def test_append(self):
        sl = statslist.StatsList()
        self.assertEqual(0, sl.count())
        sl.append(10)
        self.assertEqual(1, sl.count())

    def test_one_value(self):
        # given
        sl = statslist.StatsList()
        # when
        sl.append(10)
        # then
        self.assertEqual(1, sl.count())
        self.assertEqual(10, sl.mean())
        self.assertEqual(10, sl.median())
        self.assertEqual(10, sl.mode())

    def test_two_values(self):
        # given
        sl = statslist.StatsList()
        # when
        sl.append(10)
        sl.append(11)
        # then
        self.assertEqual(2, sl.count())
        self.assertEqual(10.5, sl.mean())
        self.assertEqual(10.5, sl.median())
        # with a tie, max() returns the first key it sees, here 10
        self.assertEqual(10, sl.mode())

    def test_median_wrong_order(self):
        # given
        sl = statslist.StatsList()
        # when
        sl.append(12)
        sl.append(13)
        sl.append(11)
        # then
        self.assertEqual(12, sl.median())

    def test_median_in_order(self):
        # given
        sl = statslist.StatsList()
        # when
        sl.append(11)
        sl.append(12)
        sl.append(13)
        # then
        self.assertEqual(12, sl.median())

    def test_mode_with_multiple_vals_same_num_of_times(self):
        # given
        sl = statslist.StatsList()
        # when
        sl.append(11)
        sl.append(11)
        sl.append(12)
        sl.append(12)
        sl.append(13)
        # then
        # 11 and 12 both appear twice; this mode() implementation returns the
        # first key with the maximum count, so 11 wins
        self.assertEqual(11, sl.mode())
The idea of unit tests is to make sure your code actually works the way it is supposed to. It is a great way to discover bugs early and to save yourself countless hours debugging that weird bug that just happened in production.
Your unit tests should cover all (or most) edge cases. This brings an additional advantage: they automatically document your code and help others refactor it later, because they can just run the unit tests, and if something fails after refactoring, that probably means they've done something wrong.
Depending on your needs, you could improve your code to track the stats incrementally as each element is appended. This would let count(), mean() and mode() run in O(1) time (and would make median() an O(1) lookup if the list is kept sorted), although depending on the algorithms used it might slow down the append() method.
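As a minimal sketch (not part of the assignment, just to illustrate the trade-off): a running total gives an O(1) mean, a counts dict gives an O(1) mode, and bisect.insort keeps the list sorted so the median becomes an O(1) lookup, at the cost of an O(n) insert in append():

import bisect

class RunningStatsList:
    def __init__(self):
        self.nums = []    # kept sorted at all times
        self.total = 0
        self.counts = {}
        self._mode = None

    def append(self, number):
        bisect.insort(self.nums, number)  # O(n) insert that keeps the list sorted
        self.total += number
        self.counts[number] = self.counts.get(number, 0) + 1
        if self._mode is None or self.counts[number] > self.counts[self._mode]:
            self._mode = number

    def count(self):
        return len(self.nums)

    def mean(self):
        return self.total / len(self.nums)

    def median(self):
        mid = len(self.nums) // 2
        if len(self.nums) % 2 == 0:
            return (self.nums[mid] + self.nums[mid - 1]) / 2.0
        return self.nums[mid]

    def mode(self):
        return self._mode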


Mocking API call embedded in some objects and changing behavior based on inputs within object

This is a continuation of the SO question asked here but with a more complicated pattern than originally requested. My main intention is to try to mock an API call based on values passed to its caller. The API call has no idea of the values passed to its caller but needs to provide the correct behavior so that the caller can be tested fully. I am using time to determine which behavior I want when I want it.
Given an object:
# some_object.py
from some_import import someApiCall

class SomeObject():
    def someFunction(self, time, a, b, c):
        apiReturnA = someApiCall(a)
        returnB = b + 1
        apiReturnC = someApiCall(c)
        return [apiReturnA, returnB, apiReturnC]
that is created by another object with entry point code:
# some_main.py
import some_object

class myMainObject():
    def entry_point(self, time):
        someObj = some_object.SomeObject()
        if 'yesterday' == time:
            (a, b, c) = (1, 1, 1)
        elif 'today' == time:
            (a, b, c) = (2, 2, 2)
        elif 'later' == time:
            (a, b, c) = (3, 3, 3)
        elif 'tomorrow' == time:
            (a, b, c) = (4, 4, 4)
        else:
            return "ERROR"
        return someObj.someFunction(time, a, b, c)
how can I get someApiCall to change based on the time argument?
# some_import.py
def someApiCall(var):
    print("I'm not mocked! I'm slow, random and hard to test")
    return var + 2
Here is an example test case
# test_example.py
import some_main

def amend_someApiCall_yesterday(var):
    # Reimplement api.someApiCall
    return var * 2

def amend_someApiCall_today(var):
    # Reimplement api.someApiCall
    return var * 3

def amend_someApiCall_later(var):
    # Just wrap around api.someApiCall and call the original function afterwards. Here we can
    # also put some conditionals, e.g. only call the original someApiCall if var is an even number.
    import some_import
    var *= 4
    return some_import.someApiCall(var)

def someObject_decorator_patch(someFunction, mocker, *args):
    def wrapper(time, a, b, c):
        # If x imports y.z and we want to patch the calls to z, then we have to patch x.z. Patching
        # y.z would still retain the original value of x.z, thus still calling the original
        # functionality. Thus here, we would be patching some_object.someApiCall and not
        # some_import.someApiCall.
        if time == "yesterday":
            mocker.patch("some_object.someApiCall", side_effect=amend_someApiCall_yesterday)
        elif time == "today":
            mocker.patch("some_object.someApiCall", side_effect=amend_someApiCall_today)
        elif time == "later":
            mocker.patch("some_object.someApiCall", side_effect=amend_someApiCall_later)
        elif time == "tomorrow":
            mocker.patch("some_object.someApiCall", return_value=0)
        else:
            # Use the original api.someApiCall
            pass
        return someFunction(time, a, b, c)
    return wrapper

def test_some_main(mocker):
    results = 0
    uut = some_main.myMainObject()
    times = ['yesterday', 'today', 'later', 'tomorrow']
    mocker.patch.object(some_main.some_object.SomeObject, 'someFunction', someObject_decorator_patch)
    for time in times:
        results = uut.entry_point(time)
        print(results)
        assert 0 != results
The test case doesn't get the result I want (it returns a function pointer).
Here is an improvised version of https://stackoverflow.com/a/67498948/11043825 that works without decorators.
As already pointed out, we still need to intercept the calls to the function that receives the time argument, since that argument dictates how someApiCall should behave; that function is either entry_point or someFunction. Here we intercept someFunction.
Instead of implementing a Python decorator on someFunction, which then requires calling the explicitly created decorated function, here we amend someFunction in place (this still follows the decorator design pattern) and make it available to the rest of the source code without changing any call sites. It is an in-place replacement of the original functionality: we wrap the original function with an updated one that inspects time before calling the original.
Also, for your reference, I solved it for two kinds of functions: a class method, src.SomeClass.someFunction, and a module-level function, src.someFunction2.
./_main.py
import src

class MyMainClass:
    def __init__(self):
        self.var = 0

    def entry_point(self, time):
        someObj = src.SomeClass()
        self.var += 1
        if self.var >= 10:
            self.var = 0
        ret = f'\n[1]entry_point({time})-->{someObj.someFunction(time, self.var)}'
        self.var += 1
        ret += f'\n[2]entry_point({time})-->{src.someFunction2(time, self.var)}'
        return ret
./src.py
# import needed so that src.someSloowApiCall exists as a patchable name
from api import someSloowApiCall, someSloowApiCall2

class SomeClass:
    def someFunction(self, time, var):
        return f'someFunction({time},{var})-->{someSloowApiCall(var)}'

def someFunction2(time, var):
    return f'someFunction2({time},{var})-->{someSloowApiCall2(var)}'
./api.py
def someSloowApiCall(var):
    return f'someSloowApiCall({var})-->{special_message(var)}'

def someSloowApiCall2(var):
    return f'someSloowApiCall2({var})-->{special_message(var)}'

def special_message(var):
    special_message = "I'm not mocked! I'm slow, random and hard to test"
    if var > 10:
        special_message = "I'm mocked! I'm not slow, random or hard to test"
    return special_message
./test_main.py
import _main, pytest, api

def amend_someApiCall_yesterday(var):
    # Reimplement api.someSloowApiCall
    return f'amend_someApiCall_yesterday({var})'

def amend_someApiCall_today(var):
    # Reimplement api.someSloowApiCall
    return f'amend_someApiCall_today({var})'

def amend_someApiCall_later(var):
    # Just wrap around api.someSloowApiCall and call the original function afterwards. Here we can
    # also put some conditionals, e.g. only call the original someSloowApiCall if var is an even number.
    return f'amend_someApiCall_later({var})-->{api.someSloowApiCall(var+10)}'

def amend_someApiCall_later2(var):
    # Just wrap around api.someSloowApiCall2 and call the original function afterwards. Here we can
    # also put some conditionals, e.g. only call the original someSloowApiCall2 if var is an even number.
    return f'amend_someApiCall_later2({var})-->{api.someSloowApiCall2(var+10)}'

def get_amended_someFunction(mocker, original_func):
    def amend_someFunction(self, time, var):
        if time == "yesterday":
            mocker.patch("_main.src.someSloowApiCall", amend_someApiCall_yesterday)
            # or
            # src.someSloowApiCall = amend_someApiCall_yesterday
        elif time == "today":
            mocker.patch("_main.src.someSloowApiCall", amend_someApiCall_today)
            # or
            # src.someSloowApiCall = amend_someApiCall_today
        elif time == "later":
            mocker.patch("_main.src.someSloowApiCall", amend_someApiCall_later)
            # or
            # src.someSloowApiCall = amend_someApiCall_later
        elif time == "tomorrow":
            mocker.patch("_main.src.someSloowApiCall", lambda var: f'lambda({var})')
            # or
            # src.someSloowApiCall = lambda var: 0
        else:
            pass
            # or
            # src.someSloowApiCall = someSloowApiCall
        return original_func(self, time, var)
    return amend_someFunction

def get_amended_someFunction2(mocker, original_func):
    def amend_someFunction2(time, var):
        if time == "yesterday":
            mocker.patch("_main.src.someSloowApiCall2", amend_someApiCall_yesterday)
            # or
            # src.someSloowApiCall2 = amend_someApiCall_yesterday
        elif time == "today":
            mocker.patch("_main.src.someSloowApiCall2", amend_someApiCall_today)
            # or
            # src.someSloowApiCall2 = amend_someApiCall_today
        elif time == "later":
            mocker.patch("_main.src.someSloowApiCall2", amend_someApiCall_later2)
            # or
            # src.someSloowApiCall2 = amend_someApiCall_later2
        elif time == "tomorrow":
            mocker.patch("_main.src.someSloowApiCall2", lambda var: f'lambda2({var})')
            # or
            # src.someSloowApiCall2 = lambda var: 0
        else:
            pass
            # or
            # src.someSloowApiCall2 = someSloowApiCall2
        return original_func(time, var)
    return amend_someFunction2

@pytest.mark.parametrize(
    'time',
    [
        'yesterday',
        'today',
        'later',
        'tomorrow',
        'whenever',
    ],
)
def test_entrypointFunction(time, mocker):
    mocker.patch.object(
        _main.src.SomeClass,
        "someFunction",
        side_effect=get_amended_someFunction(mocker, _main.src.SomeClass.someFunction),
        autospec=True,  # Needed for the self argument
    )
    # or
    # src.SomeClass.someFunction = get_amended_someFunction(mocker, src.SomeClass.someFunction)
    mocker.patch(
        "_main.src.someFunction2",
        side_effect=get_amended_someFunction2(mocker, _main.src.someFunction2),
    )
    # or
    # src.someFunction2 = get_amended_someFunction2(mocker, src.someFunction2)
    uut = _main.MyMainClass()
    print(f'\nuut.entry_point({time})-->{uut.entry_point(time)}')
Output:
$ pytest -rP
=================================== PASSES ====================================
_____________________ test_entrypointFunction[yesterday] ______________________
---------------------------- Captured stdout call -----------------------------
uut.entry_point(yesterday)-->
[1]entry_point(yesterday)-->someFunction(yesterday,1)-->amend_someApiCall_yesterday(1)
[2]entry_point(yesterday)-->someFunction2(yesterday,2)-->amend_someApiCall_yesterday(2)
_______________________ test_entrypointFunction[today] ________________________
---------------------------- Captured stdout call -----------------------------
uut.entry_point(today)-->
[1]entry_point(today)-->someFunction(today,1)-->amend_someApiCall_today(1)
[2]entry_point(today)-->someFunction2(today,2)-->amend_someApiCall_today(2)
_______________________ test_entrypointFunction[later] ________________________
---------------------------- Captured stdout call -----------------------------
uut.entry_point(later)-->
[1]entry_point(later)-->someFunction(later,1)-->amend_someApiCall_later(1)-->someSloowApiCall(11)-->I'm mocked! I'm not slow, random or hard to test
[2]entry_point(later)-->someFunction2(later,2)-->amend_someApiCall_later2(2)-->someSloowApiCall2(12)-->I'm mocked! I'm not slow, random or hard to test
______________________ test_entrypointFunction[tomorrow] ______________________
---------------------------- Captured stdout call -----------------------------
uut.entry_point(tomorrow)-->
[1]entry_point(tomorrow)-->someFunction(tomorrow,1)-->lambda(1)
[2]entry_point(tomorrow)-->someFunction2(tomorrow,2)-->lambda2(2)
______________________ test_entrypointFunction[whenever] ______________________
---------------------------- Captured stdout call -----------------------------
uut.entry_point(whenever)-->
[1]entry_point(whenever)-->someFunction(whenever,1)-->someSloowApiCall(1)-->I'm not mocked! I'm slow, random and hard to test
[2]entry_point(whenever)-->someFunction2(whenever,2)-->someSloowApiCall2(2)-->I'm not mocked! I'm slow, random and hard to test
============================== 5 passed in 0.07s ==============================
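As a side note, when each test exercises a single time value, the indirection through a wrapped someFunction may not be needed at all: you can patch the name where it is looked up, once per test. A minimal sketch against the question's some_main/some_object modules (assuming the corrected someFunction signature above; the lambda stands in for whatever replacement behavior is wanted):

# test_simple.py (sketch)
import some_main

def test_entry_point_today(mocker):
    # some_object does "from some_import import someApiCall",
    # so the name to patch is some_object.someApiCall
    mocker.patch("some_object.someApiCall", side_effect=lambda var: var * 3)
    uut = some_main.myMainObject()
    # 'today' maps to (a, b, c) = (2, 2, 2): someApiCall(2) -> 6, b + 1 -> 3
    assert [6, 3, 6] == uut.entry_point("today")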

Why does python multiprocessing script slow down after a while?

I read an old question Why does this python multiprocessing script slow down after a while? and many others before posting this one. They do not answer the problem I'm having.
IDEA OF THE SCRIPT.
The script generates 256x256 arrays in a loop. The elements of an array are calculated one by one from a list of dictionaries holding the relevant params, one dictionary per array element (256*256 in total per list). The list is how I enable parallel calculation.
THE PROBLEM.
In the beginning, data generation speeds up from a dozen seconds to a few seconds. Then, after a few iterations, it starts slowing down by a fraction of a second with each new array generated, to the point where it takes forever to calculate anything.
Additional info.
I am using the pool.map function. After making a few small changes to identify which element is being calculated, I also tried map_async. Unfortunately, it is slower because I need to init the pool each time I finish calculating an array.
When using pool.map, I init the pool once before anything starts; this way I hope to save the pool initialization time compared to map_async.
CPU usage is low, up to ~18%.
In my case the hard drive isn't a bottleneck: all the data needed for the calculations is in RAM, and I don't save anything to disk.
I also checked whether the problem persists with a different number of cores (2-24). No change there either.
I ran some additional tests, creating and terminating a pool (a) each time an array is generated and (b) every 10 arrays. In each case the execution slows down relative to the previous pool's execution time, i.e. if the previous one slowed down to 5 s, the next one takes 5.X s, and so on. The only time the execution doesn't slow down is when I run the code serially.
Working env: Windows 10, Python 3.7, conda 4.8.2, Spyder 4.
THE QUESTION: Why does multiprocessing slow down after a while when only CPU and RAM are involved (no hard-drive bottleneck)? Any idea?
UPDATED CODE:
import multiprocessing as mp
from tqdm import tqdm
import numpy as np
import random

def wrapper_(arg):
    return tmp.generate_array_elements(
        self=arg['self'],
        nu1=arg['nu1'],
        nu2=arg['nu2'],
        innt=arg['innt'],
        nu1exp=arg['nu1exp'],
        nu2exp=arg['nu2exp'],
        ii=arg['ii'],
        jj=arg['jj'],
        llp=arg['self'].llp,
        rr=arg['self'].rr,
    )

class tmp:
    def __init__(self, multiprocessing, length, n_of_arrays):
        self.multiprocessing = multiprocessing
        self.inshape = (length, length)
        self.length = length
        self.ll_len = n_of_arrays
        self.num_cpus = 8
        self.maxtasksperchild = 10000
        self.rr = 0

    """original function is different, modified to return something"""
    """for the example purpose, lp is not relevant here but in general is"""
    def get_ll(self, lp):
        return [random.sample(range(self.length), int(np.random.random()*12)+1) for ii in range(self.ll_len)]

    """original function is different, modified to return something"""
    def get_ip(self): return np.random.random()

    """original function is different, modified to return something"""
    def get_op(self): return np.random.random(self.length)

    """original function is different, modified to return something"""
    def get_innt(self, nu1, nu2, ip):
        return nu1*nu2/ip

    """original function is different, modified to return something"""
    def __get_pp(self, nu1):
        return np.exp(nu1)

    """dummy function for the example purpose"""
    def dummy_function(self):
        """do important stuff"""
        return

    """dummy function for the example purpose"""
    def dummy_function_2(self, result):
        """do important stuff"""
        return np.reshape(result, self.inshape)

    """dummy function for the example purpose"""
    def dummy_function_3(self):
        """do important stuff"""
        return

    """original function is different, modified to return something"""
    """for the example purpose, lp is not relevant here but in general is"""
    def get_llp(self, ll, lp):
        return [{'a': np.random.random(), 'b': np.random.random()} for ii in ll]

    """NOTE: lp is not used here for the example purpose, but
    in the original code it's a very important variable containing
    relevant data for the calculations"""
    def generate(self, lp={}):
        """create the list that is used for the creation of the 2-D arrays"""
        """providing here a dummy pp param to get_ll"""
        ll = self.get_ll(lp)
        ip = self.get_ip()
        self.op = self.get_op()
        """length of args_tmp = self.length * self.length = 256 * 256"""
        args_tmp = [
            {'self': self,
             'nu1': nu1,
             'nu2': nu2,
             'ii': ii,
             'jj': jj,
             'innt': np.abs(self.get_innt(nu1, nu2, ip)),
             'nu1exp': np.exp(1j*nu1*ip),
             'nu2exp': np.exp(1j*nu2*ip),
             } for ii, nu1 in enumerate(self.op) for jj, nu2 in enumerate(self.op)]
        pool = {}
        """Create a pool of worker processes"""
        if self.multiprocessing:
            pool = mp.Pool(self.num_cpus, maxtasksperchild=self.maxtasksperchild)
        """number of arrays is equal to len of ll, here 300"""
        for ll_ in tqdm(ll):
            """Generate data"""
            self.__generate(ll_, lp, pool, args_tmp)
        if self.multiprocessing:
            pool.terminate()

    def __generate(self, ll, lp, pool={}, args_tmp=[]):
        """In the original code there are plenty of other things done here
        using the class' methods; they are not shown for the example purpose"""
        self.dummy_function()
        self.llp = self.get_llp(ll, lp)
        """originally the value is taken from lp"""
        self.rr = self.rr
        if self.multiprocessing and pool:
            result = pool.map(wrapper_, args_tmp)
        else:
            result = [wrapper_(arg) for arg in args_tmp]
        """In the original code there are plenty of other things done here
        using the class' methods; they are not shown for the example purpose"""
        result = self.dummy_function_2(result)

    """original function is different"""
    def generate_array_elements(self, nu1, nu2, llp, innt, nu1exp, nu2exp, ii=0, jj=0, rr=0):
        if rr == 1 and self.inshape[0] - 1 - jj < ii:
            return 0
        elif rr == -1 and ii > jj:
            return 0
        elif rr == 0:
            """do nothing"""
        ll1 = []
        ll2 = []
        """In the original code there are plenty of other things done here
        using the class' methods; they are not shown for the example purpose"""
        self.dummy_function_3()
        for kk, ll in enumerate(llp):
            ll1.append(
                self.__get_pp(nu1) *
                nu1*nu2*nu1exp**ll['a']*np.exp(1j*np.random.random())
            )
            ll2.append(
                self.__get_pp(nu2) *
                nu1*nu2*nu2exp**ll['b']*np.exp(1j*np.random.random())
            )
        t1 = sum(ll1)
        t2 = sum(ll2)
        result = innt*np.abs(t1 - t2)
        return result

g = tmp(False, 256, 300)
g.generate()
It is hard to tell what is going on in your algorithm. I don't know a lot about multiprocessing, but it is probably safer to stick with functions and avoid passing self down into the pooled processes, which is what happens when you pass args_tmp to wrapper_ in pool.map(). Overall, try to reduce how much data is passed between the parent and child processes. I moved the generation of the lp list into the pool workers to avoid passing excessive data.
Lastly, although I don't think it matters in this example code, you should either clean up after using the pool or use the pool with with.
I rewrote some of your code to try things out, and this seems faster, but I'm not 100% sure it adheres to your algorithm; some of the variable names are hard to distinguish.
This runs a lot faster for me, but it is hard to tell whether it produces your solutions accurately. If it does, my conclusion is that the extra data passing was significantly slowing down the pool workers.
# main.py
if __name__ == '__main__':
    import os
    import sys
    file_dir = os.path.dirname(__file__)
    sys.path.append(file_dir)
    from tmp import generate_1
    parallel = True
    generate_1(parallel)
# tmp.py
import multiprocessing as mp
import numpy as np
import random
from tqdm import tqdm
from itertools import starmap

def wrapper_(arg):
    return arg['self'].generate_array_elements(
        nu1=arg['nu1'],
        nu2=arg['nu2'],
        ii=arg['ii'],
        jj=arg['jj'],
        lp=arg['self'].lp,
        nu1exp=arg['nu1exp'],
        nu2exp=arg['nu2exp'],
        innt=arg['innt']
    )

def generate_1(parallel):
    """create the list that is used for the creation of the 2-D arrays"""
    il = np.random.random(256)
    """generating params for parallel data generation"""
    """some params are also calculated here to speed up the calculation process
    because they are always the same, so they can be calculated just once"""
    """this code creates a list of 256*256 elements"""
    args_tmp = [
        {
            'nu1': nu1,
            'nu2': nu2,
            'ii': ii,
            'jj': jj,
            'innt': np.random.random()*nu1+np.random.random()*nu2,
            'nu1exp': np.exp(1j*nu1),
            'nu2exp': np.exp(1j*nu2),
        } for ii, nu1 in enumerate(il) for jj, nu2 in enumerate(il)]
    """get the list of arrays to generate"""
    ip_list = [random.sample(range(256), int(np.random.random()*12)+1) for ii in range(300)]
    map_args = [(idx, ip, args_tmp) for idx, ip in enumerate(ip_list)]
    """init the pool; start_generate_2 is a separate function that does the other important things"""
    if parallel:
        with mp.Pool(8, maxtasksperchild=10000) as pool:
            result = pool.starmap(start_generate_2, map_args)
    else:
        result = starmap(start_generate_2, map_args)
    # Wrap iterator in list call.
    return list(result)

def start_generate_2(idx, ip, args_tmp):
    print('starting {idx}'.format(idx=idx))
    runner = Runner()
    result = runner.generate_2(ip, args_tmp)
    print('finished {idx}'.format(idx=idx))
    return result

class Runner():
    def generate_2(self, ip, args_tmp):
        """NOTE: the original method is much more extensive and uses other methods of the class,
        so it must remain a non-static method of the class!"""
        self.lp = [{'a': np.random.random(), 'b': np.random.random()} for ii in ip]
        """this part creates a 1-D array of the length of args_tmp, i.e. 256*256"""
        result = map(wrapper_, [dict(args, self=self) for args in args_tmp])
        """it's then reshaped to a 2-D array"""
        result = np.reshape(list(result), (256, 256))
        return result

    def generate_array_elements(self, nu1, nu2, ii, jj, lp, nu1exp, nu2exp, innt):
        """doing heavy calc"""
        if ii > jj:
            return 0
        ll1 = []
        ll2 = []
        for kk, ll in enumerate(lp):
            ll1.append(nu1*nu2*nu1exp**ll['a']*np.exp(1j*np.random.random()))
            ll2.append(nu1*nu2*nu2exp**ll['b']*np.exp(1j*np.random.random()))
        t1 = sum(ll1)
        t2 = sum(ll2)
        result = innt*np.abs(t1 - t2)
        return result
I'm adding a generic template to show an architecture where you split the preparation of the shared args away from the task runner and still use classes. The strategy here is to not create too many tasks (300 seems faster than trying to split the work into 64000 tasks) and to not pass too much data to each task. The interface of launch_task should be kept as simple as possible; in my refactoring of your code it is equivalent to start_generate_2.
import multiprocessing
from itertools import starmap

class Launcher():
    def __init__(self, parallel):
        self.parallel = parallel

    def generate_shared_args(self):
        return [(i, j) for i, j in enumerate(range(300))]

    def launch(self):
        shared_args = self.generate_shared_args()
        if self.parallel:
            with multiprocessing.Pool(8) as pool:
                result = pool.starmap(launch_task, shared_args)
        else:
            result = starmap(launch_task, shared_args)
        # Wrap in list to resolve iterable.
        return list(result)

def launch_task(i, j):
    task = Task(i, j)
    return task.run()

class Task():
    def __init__(self, i, j):
        self.i = i
        self.j = j

    def run(self):
        return self.i + self.j

if __name__ == '__main__':
    parallel = True
    launcher = Launcher(parallel)
    print(launcher.launch())
There is a warning about the cleanup of pools in the Pool documentation: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
The first item in the programming guidelines discusses avoiding shared state, and specifically large amounts of data: https://docs.python.org/3/library/multiprocessing.html#programming-guidelines
Ian Wilson's suggestions were very helpful and one of them resolved the issue, which is why his answer is marked as the correct one.
As he suggested, it's better to call the pool on a smaller number of tasks. So instead of calling pool.map for every element of every array (N arrays * 256*256 tasks in total), I now call pool.map on the function that calculates a whole array, so just N times. The array calculation inside the function is done serially.
I'm still sending self as a param because it's needed in the function, but it doesn't have any impact on the performance.
That small change speeds up the calculation from 7-15 s per array to about 1.5 it/s to 2 s/it!
CURRENT CODE:
import multiprocessing as mp
import tqdm
import numpy as np
import random

def wrapper_(arg):
    return tmp.generate_array_elements(
        self=arg['self'],
        nu1=arg['nu1'],
        nu2=arg['nu2'],
        innt=arg['innt'],
        nu1exp=arg['nu1exp'],
        nu2exp=arg['nu2exp'],
        ii=arg['ii'],
        jj=arg['jj'],
        llp=arg['self'].llp,
        rr=arg['self'].rr,
    )

"""NEW WRAPPER HERE"""
"""Sending self doesn't have a bad impact on the performance, at least I don't complain :)"""
def generate(arg):
    tmp._tmp__generate(arg['self'], arg['ll'], arg['lp'], arg['pool'], arg['args_tmp'])

class tmp:
    def __init__(self, multiprocessing, length, n_of_arrays):
        self.multiprocessing = multiprocessing
        self.inshape = (length, length)
        self.length = length
        self.ll_len = n_of_arrays
        self.num_cpus = 8
        self.maxtasksperchild = 10000
        self.rr = 0

    """original function is different, modified to return something"""
    """for the example purpose, lp is not relevant here but in general is"""
    def get_ll(self, lp):
        return [random.sample(range(self.length), int(np.random.random()*12)+1) for ii in range(self.ll_len)]

    """original function is different, modified to return something"""
    def get_ip(self): return np.random.random()

    """original function is different, modified to return something"""
    def get_op(self): return np.random.random(self.length)

    """original function is different, modified to return something"""
    def get_innt(self, nu1, nu2, ip):
        return nu1*nu2/ip

    """original function is different, modified to return something"""
    def __get_pp(self, nu1):
        return np.exp(nu1)

    """dummy function for the example purpose"""
    def dummy_function(self):
        """do important stuff"""
        return

    """dummy function for the example purpose"""
    def dummy_function_2(self, result):
        """do important stuff"""
        return np.reshape(result, self.inshape)

    """dummy function for the example purpose"""
    def dummy_function_3(self):
        """do important stuff"""
        return

    """original function is different, modified to return something"""
    """for the example purpose, lp is not relevant here but in general is"""
    def get_llp(self, ll, lp):
        return [{'a': np.random.random(), 'b': np.random.random()} for ii in ll]

    """NOTE: lp is not used here for the example purpose, but
    in the original code it's a very important variable containing
    relevant data for the calculations"""
    def generate(self, lp={}):
        """create the list that is used for the creation of the 2-D arrays"""
        """providing here a dummy pp param to get_ll"""
        ll = self.get_ll(lp)
        ip = self.get_ip()
        self.op = self.get_op()
        """length of args_tmp = self.length * self.length = 256 * 256"""
        args_tmp = [
            {'self': self,
             'nu1': nu1,
             'nu2': nu2,
             'ii': ii,
             'jj': jj,
             'innt': np.abs(self.get_innt(nu1, nu2, ip)),
             'nu1exp': np.exp(1j*nu1*ip),
             'nu2exp': np.exp(1j*nu2*ip),
             } for ii, nu1 in enumerate(self.op) for jj, nu2 in enumerate(self.op)]
        pool = {}
        """MAJOR CHANGE IN THIS PART AND BELOW"""
        map_args = [{'self': self, 'idx': (idx, len(ll)), 'll': ll_, 'lp': lp, 'pool': pool,
                     'args_tmp': args_tmp} for idx, ll_ in enumerate(ll)]
        if self.multiprocessing:
            pool = mp.Pool(self.num_cpus, maxtasksperchild=self.maxtasksperchild)
            for _ in tqdm.tqdm(pool.imap_unordered(generate, map_args), total=len(map_args)):
                pass
            pool.close()
            pool.join()
        else:
            for map_arg in tqdm.tqdm(map_args):
                generate(map_arg)

    def __generate(self, ll, lp, pool={}, args_tmp=[]):
        """In the original code there are plenty of other things done here
        using the class' methods; they are not shown for the example purpose"""
        self.dummy_function()
        self.llp = self.get_llp(ll, lp)
        """originally the value is taken from lp"""
        self.rr = self.rr
        """REMOVED PARALLEL CALL HERE"""
        result = [wrapper_(arg) for arg in args_tmp]
        """In the original code there are plenty of other things done here
        using the class' methods; they are not shown for the example purpose"""
        result = self.dummy_function_2(result)

    """original function is different"""
    def generate_array_elements(self, nu1, nu2, llp, innt, nu1exp, nu2exp, ii=0, jj=0, rr=0):
        if rr == 1 and self.inshape[0] - 1 - jj < ii:
            return 0
        elif rr == -1 and ii > jj:
            return 0
        elif rr == 0:
            """do nothing"""
        ll1 = []
        ll2 = []
        """In the original code there are plenty of other things done here
        using the class' methods; they are not shown for the example purpose"""
        self.dummy_function_3()
        for kk, ll in enumerate(llp):
            ll1.append(
                self.__get_pp(nu1) *
                nu1*nu2*nu1exp**ll['a']*np.exp(1j*np.random.random())
            )
            ll2.append(
                self.__get_pp(nu2) *
                nu1*nu2*nu2exp**ll['b']*np.exp(1j*np.random.random())
            )
        t1 = sum(ll1)
        t2 = sum(ll2)
        result = innt*np.abs(t1 - t2)
        return result

g = tmp(False, 256, 300)
g.generate()
Thank you Ian, again.

Python - Log the things a string has previously been

If this is my code:
x = 1
x = 2
x = 3
How can I “log” the things x has been and print them? If my explanation was dumb, then here’s what I expect:
>>> # Code to print the things x has been
1, 2, 3
>>>
How can I achieve this?
Since assignment overwrites the value bound to the name (in your example, x), it is not possible to do exactly what you want. However, you could create an object whose value can be changed and whose history is remembered, for example like this:
#!/usr/bin/env python3
class ValueWithHistory():
    def __init__(self):
        self.history = []
        self._value = None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self.history.append(new_value)
        self._value = new_value

    def get_history(self):
        return self.history

    def clear_history(self):
        self.history.clear()

def main():
    test = ValueWithHistory()
    test.value = 1
    print(test.value)
    test.value = 2
    print(test.value)
    test.value = 3
    print(test.value)
    print(test.get_history())

if __name__ == '__main__':
    main()
This prints:
1
2
3
[1, 2, 3]
Of course, you could also use a set instead of a list to only remember each unique value once, for example.
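A plain set would lose the order in which the values occurred, though. If you want each unique value once but in first-seen order, here is a small sketch (a hypothetical helper, not part of the class above):

def unique_history(history):
    # dict keys preserve insertion order (Python 3.7+), so this deduplicates
    # while keeping first-seen order
    return list(dict.fromkeys(history))

print(unique_history([1, 2, 2, 3, 1]))  # prints [1, 2, 3]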
You can have a second thread observe the string and record the changes:
from threading import Thread
import time

my_string = ''  # the variable being watched
log = []        # the history of values

def string_watcher():
    global my_string
    global log
    temp = ''
    while True:
        if my_string != temp:
            log.append(my_string)
            temp = my_string
        time.sleep(0.01)  # without a short pause this loop would burn a full core

t = Thread(target=string_watcher, daemon=True)
t.start()
This checks whether the string my_string has been changed and appends the new value to the list log whenever it has. With this you should be able to call print(log) at any moment of the runtime. Note that values changing faster than the polling interval can be missed.

Python possible to short circuit function call

I am building a function to construct objects with set attributes (similar to a namedtuple); however, the output length must be variable.
I would like to build a function that allows the user to append additional attributes through a function call. Importantly, I would like to find a way to 'short-circuit' parameters and am unsure if Python is powerful enough to do this.
To explain, take this trivial example:
def foo():
    print("foo")
    return False

def bar():
    print("bar")
    return True

if foo() and bar():
    pass
foo() returns False, so the and expression short-circuits: the console only prints foo, and bar is never executed.
Is there a way to mimic this behavior with inspection or reflection with respect to function calls? My implementation is shown below:
from inspect import stack

cache = {}

def fooFormat(**kwargs):
    caller = stack()[1][3]
    if caller not in cache:
        class fooOut(object):
            def __init__(self, **kwargs):
                self.__dict__.update(kwargs)

            def optional(self, opt, **kwargs):
                if opt:
                    self.__dict__.update(kwargs)
                return self

            def __str__(self):
                return caller + str(self.__dict__)
        cache[caller] = fooOut
    return cache[caller](**kwargs)
def stdev(nums, avg=None):
    print("\tStdev call")
    if avg is None:
        avg = sum(nums) / len(nums)
    residuals = sum((i - avg)**2 for i in nums)
    return residuals**.5

def stats(nums, verbose=False):
    if verbose:
        print("Stats call with verbose")
    else:
        print("Stats call without verbose")
    total = sum(nums)
    N = len(nums)
    avg = total / N
    return fooFormat(
        avg=avg,
        lowerB=min(nums),
        upperB=max(nums)).optional(verbose,
                                   stdev=stdev(nums, avg))
In the function stats, the returned fooFormat object should of course contain avg, lowerB, and upperB; additionally, it should contain stdev if verbose is set to True. Moreover, the function stdev should NOT be called if verbose is set to False.
stats([1,2,3,4], False)
stats([1,2,3,4], True)
Of course, a way around this is:
if verbose:
    return fooFormat(
        avg=avg,
        lowerB=min(nums),
        upperB=max(nums),
        stdev=stdev(nums, avg))
else:
    return fooFormat(
        avg=avg,
        lowerB=min(nums),
        upperB=max(nums))
However, I am hoping to implement this behavior without a branch.
This doesn't quite answer the short-circuiting point, but here is a more efficient way of writing it:
out_dic = {  # these items will always be calculated
    'avg': avg,
    'lowerB': min(nums),
    'upperB': max(nums)
}
if verbose:  # this is calculated only if verbose
    out_dic['stdev'] = stdev(nums, avg)
return fooFormat(**out_dic)
In other words, you can expand a dictionary into the kwargs and add to the dictionary dynamically.
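If the branch itself is what you want to avoid, a sketch of an alternative (my own variation, not part of the answer above): let optional accept zero-argument callables and only evaluate them when the flag is truthy. Wrapping stdev in a lambda then defers the call, just as and defers bar() in the trivial example:

def optional(self, opt, **kwargs):
    # Lazy variant (sketch): a value may be a zero-argument callable that is
    # only evaluated when opt is truthy, mimicking short-circuit evaluation.
    if opt:
        self.__dict__.update(
            {k: v() if callable(v) else v for k, v in kwargs.items()}
        )
    return self

# Usage: stdev is wrapped in a lambda, so it is never called unless verbose:
# fooFormat(avg=avg, lowerB=min(nums), upperB=max(nums)).optional(
#     verbose, stdev=lambda: stdev(nums, avg))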

List callbacks?

Is there any way to make a list call a function every time the list is modified?
For example:
>>> l = [1, 2, 3]
>>> def callback():
...     print("list changed")
...
>>> apply_callback(l, callback)  # Possible?
>>> l.append(4)
list changed
>>> l[0] = 5
list changed
>>> l.pop(0)
list changed
5
Borrowing from the suggestion by @sr2222, here's my attempt. (I'll use a decorator without the syntactic sugar):
import sys
_pyversion = sys.version_info[0]

def callback_method(func):
    def notify(self, *args, **kwargs):
        for _, callback in self._callbacks:
            callback()
        return func(self, *args, **kwargs)
    return notify

class NotifyList(list):
    extend = callback_method(list.extend)
    append = callback_method(list.append)
    remove = callback_method(list.remove)
    pop = callback_method(list.pop)
    __delitem__ = callback_method(list.__delitem__)
    __setitem__ = callback_method(list.__setitem__)
    __iadd__ = callback_method(list.__iadd__)
    __imul__ = callback_method(list.__imul__)

    # Take care to return a new NotifyList if we slice it.
    if _pyversion < 3:
        __setslice__ = callback_method(list.__setslice__)
        __delslice__ = callback_method(list.__delslice__)

        def __getslice__(self, *args):
            return self.__class__(list.__getslice__(self, *args))

    def __getitem__(self, item):
        if isinstance(item, slice):
            return self.__class__(list.__getitem__(self, item))
        else:
            return list.__getitem__(self, item)

    def __init__(self, *args):
        list.__init__(self, *args)
        self._callbacks = []
        self._callback_cntr = 0

    def register_callback(self, cb):
        self._callbacks.append((self._callback_cntr, cb))
        self._callback_cntr += 1
        return self._callback_cntr - 1

    def unregister_callback(self, cbid):
        for idx, (i, cb) in enumerate(self._callbacks):
            if i == cbid:
                self._callbacks.pop(idx)
                return cb
        else:
            return None

if __name__ == '__main__':
    A = NotifyList(range(10))

    def cb():
        print("Modify!")

    # register a callback
    cbid = A.register_callback(cb)
    A.append('Foo')
    A += [1, 2, 3]
    A *= 3
    A[1:2] = [5]
    del A[1:2]

    # Add another callback. They'll be called in order (oldest first)
    def cb2():
        print("Modify2")
    A.register_callback(cb2)

    print("-"*80)
    A[5] = 'baz'
    print("-"*80)

    # unregister the first callback
    A.unregister_callback(cbid)
    A[5] = 'qux'
    print("-"*80)

    print(A)
    print(type(A[1:3]))
    print(type(A[1:3:2]))
    print(type(A[5]))
The great thing about this is if you realize you forgot to consider a particular method, it's just 1 line of code to add it. (For example, I forgot __iadd__ and __imul__ until just now :)
EDIT
I've updated the code slightly to be py2k and py3k compatible. Additionally, slicing creates a new object of the same type as the parent. Please feel free to continue poking holes in this recipe so I can make it better. This actually seems like a pretty neat thing to have on hand ...
You'd have to subclass list and modify __setitem__.
class NotifyingList(list):
    def __init__(self, *args, **kwargs):
        # without this super() call, initial contents passed to the
        # constructor would be silently ignored
        super(NotifyingList, self).__init__(*args, **kwargs)
        self.on_change_callbacks = []

    def __setitem__(self, index, value):
        for callback in self.on_change_callbacks:
            callback(self, index, value)
        super(NotifyingList, self).__setitem__(index, value)

notifying_list = NotifyingList()

def print_change(list_, index, value):
    print('Changing index %d to %s' % (index, value))

notifying_list.on_change_callbacks.append(print_change)
As noted in comments, it's more than just __setitem__.
You might even be better served by building an object that implements the list interface and dynamically adds and removes descriptors to and from itself in place of the normal list machinery. Then you can reduce your callback calls to just the descriptor's __get__, __set__, and __delete__.
I'm almost certain this can't be done with the standard list.
I think the cleanest way would be to write your own class to do this (perhaps inheriting from list).
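On Python 3, collections.UserList exists precisely to make such subclassing easier. A minimal sketch covering only a few mutating methods (the full set to intercept is the same as in the recipe above):

from collections import UserList

class CallbackList(UserList):
    # UserList keeps its contents in self.data, so intercepting mutations
    # is just a matter of overriding the mutating methods.
    def __init__(self, initial=None, callback=None):
        super().__init__(initial or [])
        self.callback = callback or (lambda: None)

    def append(self, item):
        super().append(item)
        self.callback()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self.callback()

    def pop(self, index=-1):
        item = super().pop(index)
        self.callback()
        return item

l = CallbackList([1, 2, 3], callback=lambda: print("list changed"))
l.append(4)   # list changed
l[0] = 5      # list changed
l.pop(0)      # list changed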
