I am trying to get a list of the names of a function's outputs.
For example, let's say I define a function called compute as below:
def compute(a, b):
    add = a + b
    sub = a - b
    return add, sub
What I want to do next is to create a new function that takes this compute function as an argument and returns the names of its outputs, add and sub, as a list of strings.
That is, if I name the function "output_list", I want the function output_list(compute) to return ['add', 'sub'].
It seems like it should be simple, but I am having trouble writing it.
What should the code look like?
This is not possible in general. The names of the local variables inside compute are not attached to the values it returns; once the tuple leaves the function, there is no record of which names produced it.
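That said, if you only need a best-effort answer and you control the source, you can parse the function with the standard ast and inspect modules and read the names out of its return statement. This is a fragile sketch, not a general solution: it needs the source to be available, and it fails for lambdas, built-ins, and return statements that aren't plain tuples of names.
import ast
import inspect
import textwrap

def output_list(func):
    # Parse the function's own source and collect the names appearing
    # in its return statement. Best-effort only.
    source = textwrap.dedent(inspect.getsource(func))
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Return) and isinstance(node.value, ast.Tuple):
            return [elt.id for elt in node.value.elts
                    if isinstance(elt, ast.Name)]
    return []

def compute(a, b):
    add = a + b
    sub = a - b
    return add, sub

print(output_list(compute))  # ['add', 'sub']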
Well, it might defeat the purpose, but if you are the one who defines the compute function, you could do something like this (note that varname is a third-party package, and the attribute only exists after compute has been called at least once):
from varname import nameof

def compute(a, b):
    add = a + b
    sub = a - b
    compute.output_list = [nameof(add), nameof(sub)]
    return add, sub

>>> compute(1, 2)  # the attribute is set when the function runs
(3, -1)
>>> compute.output_list
['add', 'sub']
Your question is a bit confusing: how do you want to pass a called function (which takes parameters) as a parameter to another function without supplying values for those parameters? And do you want the output to be a list of the variable names as strings, or a list of the variables' values?
I will assume the most likely scenario: that you want the results of the variables as a list of values for another function.
Code Syntax
def compute(a, b):
    add = a + b
    sub = a - b
    return [add, sub]

# Note: the default value is computed once, when the def statement runs
def another_function(lista=compute(3, 4)):
    return lista

print(another_function())
OUTPUT
[7, -1]
The Ray tutorial explains that to have a method on an actor return multiple object IDs, one can use the @ray.method decorator (see here). But in the 'Learning to Play Pong' example, the method compute_gradient actually has two return values, which are used as object IDs later, yet it is not decorated with @ray.method (see here). I would like to understand what the use of @ray.method is.
Without the decorator, the call returns a single object ID, and fetching that object gives you the whole tuple.
The equivalent paradigm in plain Python is:
def test():
    return 1, 2

a = test()
assert a == (1, 2)
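For contrast, here is a minimal sketch of what the decorator changes on an actor, assuming a recent Ray release where the keyword is num_returns (older versions spell it num_return_vals):
import ray

ray.init()

@ray.remote
class Worker:
    # Without the decorator: one object ref, resolving to the tuple.
    def pair(self):
        return 1, 2

    # With the decorator: the call itself yields two object refs.
    @ray.method(num_returns=2)
    def split_pair(self):
        return 1, 2

w = Worker.remote()
assert ray.get(w.pair.remote()) == (1, 2)

ref_a, ref_b = w.split_pair.remote()
assert ray.get(ref_a) == 1
assert ray.get(ref_b) == 2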
I would like concurrent.futures.ProcessPoolExecutor.map() to call a function that takes two or more arguments. In the example below, I have resorted to using a lambda function and to defining ref as a list of equal size to numberlist, filled with an identical value.
1st Question: Is there a better way of doing this? In the case where numberlist can be millions to billions of elements in size, ref would have to follow numberlist in size, and this approach unnecessarily takes up precious memory, which I would like to avoid. I did this because I read that the map function stops mapping when the shortest iterable is exhausted.
import concurrent.futures as cf

nmax = 10
numberlist = range(nmax)
ref = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
workers = 3

def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x

a = map(lambda x, y: _findmatch(x, y), numberlist, ref)
for n in a:
    print(n)
    if str(ref[0]) in n:
        print('match')

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(lambda x, y: _findmatch(x, ref), numberlist, ref):
        print(type(n))
        print(n)
        if str(ref[0]) in n:
            print('match')
Running the code above, I found that the built-in map function was able to achieve my desired outcome. However, when I transferred the same terms to concurrent.futures.ProcessPoolExecutor.map(), Python 3.5 failed with this error:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
obj = ForkingPickler.dumps(obj)
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7fd2a14db0d0>: attribute lookup <lambda> on __main__ failed
Question 2: Why did this error occur, and how do I get concurrent.futures.ProcessPoolExecutor.map() to call a function with more than one argument?
To answer your second question first, you are getting an exception because a lambda function like the one you're using is not picklable. Since Python uses the pickle protocol to serialize the data passed between the main process and the ProcessPoolExecutor's worker processes, this is a problem. It's not clear why you are using a lambda at all. The lambda you had takes two arguments, just like the original function. You could use _findmatch directly instead of the lambda and it should work.
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_findmatch, numberlist, ref):
        ...
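As an aside, the pickling failure is easy to reproduce without any executor involved; here is a quick demonstration:
import pickle

def named(x):
    return x

pickle.dumps(named)  # works: functions are pickled by their qualified name

try:
    pickle.dumps(lambda x: x)
except Exception as exc:
    print(exc)  # Can't pickle <function <lambda> ...>: attribute lookup failed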
As for the first issue about passing the second, constant argument without creating a giant list, you could solve this in several ways. One approach might be to use itertools.repeat to create an iterable object that repeats the same value forever when iterated on.
But a better approach would probably be to write an extra function that passes the constant argument for you. (Perhaps this is why you were trying to use a lambda function?) It should work if the function you use is accessible at the module's top-level namespace:
def _helper(x):
    return _findmatch(x, 5)

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_helper, numberlist):
        ...
(1) No need to make a list. You can use itertools.repeat to create an iterator that just repeats the same value.
(2) You need to pass a named function to map because it will be passed to the subprocess for execution. map uses the pickle protocol to send things, and lambdas can't be pickled, so they can't be part of the map. But it's totally unnecessary: all your lambda did was call a two-parameter function with two parameters. Remove it completely.
The working code is
import concurrent.futures as cf
import itertools

nmax = 10
numberlist = range(nmax)
workers = 3

def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
        print(type(n))
        print(n)
        #if str(ref[0]) in n:
        #    print('match')
Regarding your first question: do I understand correctly that you want to pass an argument whose value is determined only at the time you call map, but which is constant for all instances of the mapped function? If so, I would do the map with a function derived from a "template function", with the second argument (ref in your example) baked into it using functools.partial:
from functools import partial

refval = 5

def _findmatch(ref, listnumber):  # arguments swapped
    ...

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(partial(_findmatch, refval), numberlist):
        ...
Re. question 2, first part: I haven't found the exact piece of code that pickles (serializes) the function to be executed in parallel, but that has to happen somewhere: not only the arguments but also the function itself has to be transferred to the workers, and it likely has to be serialized for this transfer. The fact that partial functions can be pickled while lambdas cannot is mentioned elsewhere, for instance here: https://stackoverflow.com/a/19279016/6356764.
Re. question 2, second part: if you wanted to call a function with more than one argument in ProcessPoolExecutor.map, you would pass it the function as the first argument, followed by an iterable of first arguments for the function, followed by an iterable of its second arguments etc. In your case:
for n in executor.map(_findmatch, numberlist, ref):
    ...
The general "why global variables are bad" answers don't seem to cover this, so I'm just curious if this would be bad practice or not.
Here is an example of what I mean:
- You have one function to generate a list of items
- You have another function to build these items
You would like to output the current progress in each function every x seconds, where x is defined at the start of the code. Would it be bad to set this as a global variable? You'd avoid having to pass an unimportant value to each function, which is what I did in my first version of the code; it got so messy in places that I had to gather everything into lists, then pull the values back out inside each function. This made it impossible to test individual functions, as they needed too much input to run. Also, if a function only needs one variable to run, I don't want to add anything else to it, so passing a huge list/dictionary containing everything isn't really ideal.
Or alternatively, would it be worthwhile setting up a global dictionary to contain any values you may want to use throughout the code? You could set the variable name as the key so it's easy to access wherever it's needed. I'm not so sure about other versions of Python, but in Maya it's all pretty much contained in the same block of code, so there's no real danger of it affecting anything else.
Global variables aren't bad in themselves. What's bad is modifying global variables in your functions. Reading global constants, which is what you're proposing here, is fine - you don't even need the global keyword, since Python will look up the var in surrounding scopes until it finds it. You might want to put your variable in ALL_CAPS to signify that it is indeed a constant.
However, you still might consider whether your code would be better structured as a class, with your variable as an instance attribute.
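For illustration, here is a minimal sketch of both options; the names UPDATE_INTERVAL and ItemBuilder are made up for the example:
# Module-level constant: functions can read it without the global keyword.
UPDATE_INTERVAL = 5  # seconds; hypothetical name

def generate_items():
    print("reporting every {}s".format(UPDATE_INTERVAL))

# Class-based alternative: the interval becomes an instance attribute.
class ItemBuilder:
    def __init__(self, update_interval=UPDATE_INTERVAL):
        self.update_interval = update_interval

    def build(self):
        print("reporting every {}s".format(self.update_interval))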
As stated by Daniel, using a global variable isn't bad in itself; however, using them just because it solves the problem doesn't make it a good use of a global variable.
In your example above, you noted you had three conditions:
1) You want a function to generate a list of items
2) You have a second function to build those items (the assumption is that it's being called from the first function)
3) You want to be able to check those items.
None of those indicates a need for a global, but a class may solve your problem. I would suggest using a class similar to the one I have below:
from threading import Thread  # To thread the build logic
import random  # For random sleep times to increase entropy
import time

class builder(object):
    def __init__(self, numObjs):
        self.myList = []
        self.numObjs = int(numObjs)
        self.inProcess = False
        self.currGenThread = None  # defined up front so currStatus() can run first

    # Build a list of objects on a background thread
    def build(self):
        if self.currStatus(False) and not self.inProcess:
            self.inProcess = True
            self.currGenThread = Thread(target=self.generate)
            self.currGenThread.start()
        else:
            print("Failed to start new thread -- current thread in process!")

    # Generates the objects for the list
    def generate(self):
        self.myList = []
        for currObj in range(self.numObjs):
            self.myList.append(currObj)
            time.sleep(random.randint(0, 1))  # Sleep randomly

    # Prints the current status
    # Return False indicates completion
    def currStatus(self, alsoPrint=True):
        retVal = True
        if len(self.myList) >= self.numObjs:
            if self.currGenThread:
                self.currGenThread.join()
            self.inProcess = False
            retVal = False
        # Print status if called for
        if alsoPrint:
            print("Progress %d -- %s" % (len(self.myList) // self.numObjs,
                                         self.myList))
        return retVal

obj = builder(10)
obj.build()
while obj.currStatus():
    time.sleep(1)
It's not the perfect example, but if you run that code you get something like this:
$python test.py
Progress 0 -- []
Progress 0 -- [0, 1, 2, 3]
Progress 0 -- [0, 1, 2, 3, 4]
Progress 0 -- [0, 1, 2, 3, 4, 5]
Progress 0 -- [0, 1, 2, 3, 4, 5, 6]
Progress 0 -- [0, 1, 2, 3, 4, 5, 6, 7, 8]
Progress 1 -- [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
$python test.py
Progress 0 -- []
Progress 0 -- [0, 1]
Progress 0 -- [0, 1, 2, 3, 4, 5, 6, 7]
Progress 1 -- [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
You can do the same thing with a global or in several other ways. Hopefully this helps.
Maybe your function-based design is bad. Consider using a class to do these things. Read about design patterns like Builder, Decorator, Iterator, etc.
Globals are great if you really want just one set of data. And modifying globals has reasonable use cases too. Take the re module for example: it caches regular expressions it has already seen in a global dict to save the cost of recompiling. Although re doesn't protect its cache with a lock, it's common to use a threading.Lock to make sure you don't insert the same key multiple times.
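A minimal sketch of that pattern, with made-up cache and helper names:
import re
import threading

_pattern_cache = {}
_cache_lock = threading.Lock()

def compile_cached(pattern):
    # Module-level (global) cache, guarded so concurrent threads
    # don't race on inserting the same key.
    with _cache_lock:
        if pattern not in _pattern_cache:
            _pattern_cache[pattern] = re.compile(pattern)
        return _pattern_cache[pattern]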
Given Python code such as
def func():
    for i in range(10):
        pass

for i in range(10):
    pass
pylint complains
Redefining name 'i' from outer scope
What is the Pythonic way to write the above? Use a different variable locally, say j?
But why, when the variable means exactly the same thing in both cases (i for index)? Let's say I change all local indexes to j, and then later I find I want to use j as the second index in the global scope. Do I have to change everything again?
I can't just disable the lint warnings; I don't want to have them at all. I want to write Pythonic code, and yet I want to use the same name for the same thing throughout, in a simple case like the above. Is this not possible?
You can avoid global variable conflict by not having any global variables:
def func():
    for i in range(10):
        pass

def _init_func():
    for i in range(10):
        pass

_init_func()
Any code that needs to run at module-init time can be put into a single function. This leaves, as the only executable code to run during module init: def statements, class statements, and one function call.
Similarly, if your code is not intended to be imported, but rather is a script to be run,
def func():
    for i in range(10):
        pass

def main():
    for i in range(10):
        pass

if __name__ == "__main__":
    main()
Because it eliminates the risk of using one variable when you believe you are using the other. Lint tools are made to make your code more robust. By giving all your variables different names, you ensure that no such conflict can arise.
This is especially critical in an interpreted language, because the errors are not checked at "compile time". I once had a problem where the second call to a function gave me an error: I had renamed a function and didn't realize that, in some cases, a variable was created with the same name as the function, so when I tried to call the function, the interpreter tried to "call" my newly created variable, which never worked.
This lint policy avoids that kind of problem.
Here is some sample code (a program to compute pi):
from fractions import Fraction

def order(x):
    r, old_r, n, old_n = 2, 1, 1, 0
    while (x >= r):
        r, old_r, n, old_n = r*r, r, 2*n, n
    return order(x >> old_n) + old_n if old_n > 0 else 0

def term(m, n, i):
    return Fraction(4 * m, n**(2*i+1) * (2*i+1))

def terms_generator(exp_prec):
    ws = [[term(parm[1], parm[2], 0), 0] + list(parm)
          for parm in ((1, 44, 57),
                       (1, 7, 239),
                       (-1, 12, 682),
                       (1, 24, 12943))]
    digits = 0
    while digits < exp_prec:
        curws = max(ws, key=lambda col: col[0])
        digits = int(0.30103 *
                     (order(curws[0].denominator))
                     - order(curws[0].numerator))
        yield curws[2] * curws[0], digits
        curws[2] = -curws[2]
        curws[1] += 1
        curws[0] = term(curws[3], curws[4], curws[1])

expected_precision = 1000
pi = 0
for term, dgts in terms_generator(expected_precision):
    pi += term
    print("{} digits".format(dgts))

print("pi = 3.{}".format(int((pi - 3) * 10**expected_precision)))
In this case, I initialized a variable from a generator, and the generator used another function whose name conflicted with my variable name once the variable was initialized by the generator. Well, it's not a very good example, because here both names are global, but from the structure of the code it wasn't immediately obvious that this would happen.
My point is that even if you KNOW how to program, you make mistakes, and practices like this help reduce the risk of those mistakes staying hidden.
The linter warns because i lives on after the loop, provided the loop ran at least once. If you then used i without re-initializing it, it would still hold the value from the last iteration of the loop.
The way you use it is fine, since i is always re-initialized by the for statement.
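A quick illustration of the loop variable outliving its loop:
for i in range(3):
    pass
print(i)  # prints 2: the loop variable survives after the loop ends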
A useful practice could be to name all values in the outer scope in ALL_CAPS. That way, no mistakes can be made.
This answer was rightfully determined to be wrong. Please see: https://stackoverflow.com/a/25072186