NameError on global variables when multiprocessing, only in subdirectory - python

I have a main process which uses execfile and runs a script in a child process. This works fine unless the script is in another directory -- then everything breaks down.
This is in mainprocess.py:
from multiprocessing import Process
m = "subdir\\test.py"
if __name__ == '__main__':
    p = Process(target=execfile, args=(m,))
    p.start()
Then in a subdirectory aptly named subdir, I have test.py
import time
def foo():
    print time.time()
foo()
When I run mainprocess.py, I get the error:
NameError: global name 'time' is not defined
but the issue isn't limited to module names -- sometimes I'll get an error on a function name on other pieces of code.
I've tried importing time in mainprocess.py and also inside the if statement there, but neither has any effect.
One way of avoiding the error (though I haven't tried it) would be to copy test.py into the parent directory and insert a line in the file to os.chdir back to the original directory. However, this seems rather sloppy.
So what is happening?

The solution is to change your Process initialization:
p = Process(target=execfile, args=(m, {}))
Honestly, I'm not entirely sure why this works. I know it has something to do with which dictionary (locals vs. globals) that the time import is added to. It seems like when your import is made in test.py, it's treated like a local variable, because the following works:
import time # no foo() anymore
print(time.time()) # the call to time.time() is in the same scope as the import
However, the following also works:
import time
def foo():
    global time
    print(time.time())
foo()
This second example shows me that the import is still assigned to some kind of global namespace, I just don't know how or why.
If you call execfile() normally, rather than in a subprocess, everything runs fine, and in fact you can then use the time module anywhere after the execfile() call in your main process, because time has been brought into the same namespace. I think that since you're launching it in a subprocess, there is no module-level namespace for the import to be assigned to (execfile doesn't create a module object when called). I think that when we add the empty dictionary to the call to execfile, we're supplying the globals dictionary argument, thus giving the import mechanism a global namespace to assign the name time to.
Some links for background:
1) Tutorial page on namespaces and scope
- look here for builtin, global, and local namespace explanations first
2) Python docs on execfile command
3) A very similar question on a non-SO site


python global operator emulation

In Lutz's book I read how to emulate the global statement in a function body.
I created a p.py file in my documents folder:
var = 0
def func():
    import p  # import itself
    p.var = 15
func()
print(var)
output:
15
0
I thought it was supposed to simply print 15, but for some reason it also added 0 to the output. So I'm wondering why that happened.
For example, when I do the same thing in the terminal, but in the main module, it works as I want:
var = 0
def func():
    import __main__  # import itself
    __main__.var = 15
func()
print(var)
and output is
15
I have Python 3.7.7.
Files are not modules: files are used to define modules. If you run p.py as a script that contains import p, there are two modules, __main__ and p, both created from the same file, but each with its own global namespace.
Okay, let's break this into a few steps.
First of all:
self-importing a module is not a good thing to do - one reason why is exactly the behaviour you noticed in your question.
import p  # import itself
var = 0
def func():
    p.var = 15
func()
print(var)
print(p.var)
Running this, you will notice that var and p.var are actually 2 separate variables, even though logically they are the same thing, just in a different namespace.
You've also got func and p.func, which again do the same but are 2 separate things.
Second problem:
The global keyword should be used only when absolutely necessary. When you have a global variable, it is much harder to track when it changes and to control the flow of your program - see Global Variables Are Bad (not everything there applies to Python, but most points still stand).
Finally, why you are actually seeing the behaviour you see:
In Python, when a module is imported, all of it is executed, just like the main script you are running.
Let's say you have a pr.py file containing only this:
print("text")
Importing such a module with import pr would be enough to see text on console, because it gets executed on import.
That's why you see 15 printed: during import p the whole file runs again as module p, so func() sets p.var = 15 and print(var) prints it - because inside module p, var and p.var are the same name. Then import p finishes and we get to print(var) in the main script - but because we are now in the __main__ module, var is not the same as p.var, and it still has the original value of 0.
Why does import __main__ work differently? It's because it is handled specially by Python, as can be read here. In short, __main__ is initialised at the start of the program, so import __main__ does not cause the code to run again, just as importing any module more than once only causes it to be executed once.

Does using 'import module_name' statement in a function cause the module to be reloaded?

I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
But if we move the import statement into a function, then for each function call does the module get re-loaded? If not, then why is it good practice to import a module at the top of the file, instead of in a function?
Does this behavior change for a multi threaded or multi process app?
It does not get loaded every time.
Proof:
file.py:
print('hello')
file2.py:
def a():
    import file
a()
a()
Output:
hello
Then why put it on the top?:
Because writing the imports inside a function will cause calls to that function to take longer.
I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
Right!
But if we move the import statement into a function, then for each function call does the module get re-loaded?
No. But if you want, you can explicitly reload it with something like this:
import importlib
importlib.reload(target_module)
If not, then why is it a good practice to import a module at the top of the file, instead of in function?
When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is.
Even though it does not get reloaded, Python still has to check whether the module is already imported. So there is some extra work done each time the function is called, which is unnecessary.
It doesn't get reloaded after every function call and threading does not change this behavior. Here's how I tested it:
test.py:
print("Loaded")
testing.py:
import _thread
def call():
    import test
for i in range(10):
    call()
_thread.start_new_thread(call, ())
_thread.start_new_thread(call, ())
OUTPUT:
Loaded
To answer your second question: if you import the module at the top of the file, it is available to all functions within that Python file. This saves you from having to import the same module in each of the functions that use it.

Forcing Unload/Deconstruction of Dynamically Imported File from Source

Been a longtime browser of SO, finally asking my own questions!
So, I am writing an automation script/module that looks through a directory recursively for python modules with a specific name. If I find a module with that name, I load it dynamically, pull what I need from it, and then unload it. I noticed though that simply del'ing the module does not remove all references to that module, there is another lingering somewhere and I do not know where it is. I tried taking a peek at the source code, but couldn't make sense of it too well. Here is a sample of what I am seeing, greatly simplified:
I am using Python 3.5.2 (Anaconda v4.2.0). I am using importlib, and that is what I want to stick with. I also want to be able to do this with vanilla python-3.
I got the import from source from the python docs here (yes I am aware this is the Python 3.6 docs).
My main driver...
# main.py
import importlib.util
import sys

def foo():
    spec = importlib.util.spec_from_file_location('a', 'a.py')
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    print(sys.getrefcount(module))
    del module
    del spec

if __name__ == '__main__':
    foo()
    print('THE END')
And my sample module...
# a.py
print('hello from a')

class A():
    def __del__(self):
        print('SO LONG A!')

inst = A()
Output:
$ python main.py
hello from a
2
THE END
SO LONG A!
I expected to see "SO LONG A!" printed before "THE END". So, where is this other hidden reference to my module? I understand that my del's are gratuitous given that I have it wrapped in a function; I just wanted the deletion and scope to be explicit. How do I get a.py to completely unload? I plan on dynamically loading a ton of modules like a.py, and I do not want to hold on to them any longer than I really have to. Is there something I am missing?
There is a circular reference here: the module object references objects that, in turn, reference the module's namespace again (for example, A.__del__ holds the module's globals dict via its __globals__ attribute).
This means the module is not cleared immediately (as the reference count never goes to 0 by itself). You need to wait for the circle to be broken by the garbage collector.
You can force this by calling gc.collect():
import gc

# ...

if __name__ == '__main__':
    foo()
    gc.collect()
    print('THE END')
With that in place, the output becomes:
$ python main.py
hello from a
2
SO LONG A!
THE END

How to run a large amount of python code from a string?

I need to be able to run a large amount of Python code from a string. Simply using exec doesn't seem to work: while the code runs perfectly in a normal setting, running it this way throws an error. I also don't think I can just import it, as it is hosted on the internet. Here is the code:
import urllib.request

URL = "https://dl.dropboxusercontent.com/u/127476718/instructions.txt"

def main():
    instructions = urllib.request.urlopen(URL)
    exec(instructions.read().decode())

if __name__ == "__main__":
    main()
This is the error I've been getting:
Traceback (most recent call last):
  File "C:\Python33\rc.py", line 12, in <module>
    main()
  File "C:\Python33\rc.py", line 9, in main
    exec(instructions.read().decode())
  File "<string>", line 144, in <module>
  File "<string>", line 120, in main
NameError: global name 'Player' is not defined
The code I'm trying to run is available in the link in the first code snippet.
If you have any questions I'll answer them. Thank you.
Without specifying globals, the exec function (Python/bltinmodule.c) uses PyEval_GetGlobals() and PyEval_GetLocals(). For the execution frame of a function, the latter creates a new f_locals dict, which will be the target for the IMPORT_NAME, STORE_NAME, LOAD_NAME ops in the compiled code.
At the module level in Python the normal state of affairs is globals() == locals(). In that case STORE_NAME is using the module's globals, which is what a function defined within the module will use as its global namespace. However, using separate dicts for globals and locals obviously breaks that assumption.
The solution is to manually supply globals, which exec will then also use as locals:
def main():
    instructions = urllib.request.urlopen(URL)
    exec(instructions.read().decode(), globals())
You could also use a new dict that has __name__ defined:
def main():
    instructions = urllib.request.urlopen(URL)
    g = {'__name__': '__main__'}
    exec(instructions.read().decode(), g)
I see in the source that the current directory will need a sound file named "pickup.wav", else you'll just get another error.
Of course, the comments about the security problems with using exec like this still apply. I'm only addressing the namespace technicality.
First I thought you might try __import__ with a StringIO object. Might look something like StackOverflow: Local Import Statements in Python.
... but that's not right.
Then I thought of using the imp module, but that doesn't seem to work either.
Then I looked at: Alex Martelli's answer to Use of Eval in Python --- and tried to use it on a silly piece of code myself.
I can get the ast object, and the result of compile() from that (though it also seems that one can simply call compile(some_string_containing_python_source, 'SomeName', 'exec') without going through the ast.parse() intermediary step, if you like). From what I gather, you'd use ast if you wanted to traverse the resulting syntax tree, inspecting and possibly modifying nodes, before compiling it.
At the end it seems that you'll need to exec() the results of your compile() before you have resulting functions, classes or variables defined in your execution namespace.
You can use a pipe to feed the string to a child Python process and read the output back from it.
Google os.popen or subprocess.Popen.

IPython Parallel Computing Namespace Issues

I've been reading and re-reading the IPython documentation/tutorial, and I can't figure out the issue with this particular piece of code. It seems that the function dimensionless_run is not visible in the namespace delivered to each of the engines, but I'm confused because the function is defined in __main__ and is clearly visible as part of the global namespace.
wrapper.py:
import math, os

def dimensionless_run(inputs):
    output_file = open(inputs['fn'],'w')
    ...
    return output_stats

def parallel_run(inputs):
    import math, os  ## Removing this line causes a NameError: global name 'math'
                     ## is not defined.
    folder = inputs['folder']
    zfill_amt = int(math.floor(math.log10(inputs['num_iters'])))
    for i in range(inputs['num_iters']):
        run_num_str = str(i).zfill(zfill_amt)
        if not os.path.exists(folder + '/'):
            os.mkdir(folder)
        dimensionless_run(inputs)
    return

if __name__ == "__main__":
    inputs = [input1,input2,...]
    client = Client()
    lbview = client.load_balanced_view()
    lbview.block = True
    for x in sorted(globals().items()):
        print x
    lbview.map(parallel_run,inputs)
Executing this code after ipcluster start --n=6 yields the sorted global dictionary, including the math and os modules, and the parallel_run and dimensionless_run functions. This is followed by an IPython.parallel.error.CompositeError: one or more exceptions from call to method: parallel_run, which is composed of a large number of [n:apply]: NameError: global name 'dimensionless_run' is not defined, where n runs from 0-5.
There are two things I don't understand, and they're clearly linked.
Why doesn't the code identify dimensionless_run in the global namespace?
Why is import math, os necessary inside the definition of parallel_run?
Edited: This turned out not to be much of a namespace error at all - I was executing ipcluster start --n=6 in a directory that didn't contain the code. To fix it, all I needed to do was execute the start command in my code's directory. I also fixed it by adding the lines:
inputs = input_pairs
os.system("ipcluster start -n 6") #NEW
client = Client()
...
lbview.map(parallel_run,inputs)
os.system("ipcluster stop") #NEW
which start the required cluster in the right place.
This is mostly a duplicate of Python name space issues with IPython.parallel, which has a more detailed answer, but the gist:
When the Client sends parallel_run to the engine, it just sends that function, not the entire namespace in which the function is defined (the __main__ module). So when running the remote parallel_run, lookups to math or os or dimensionless_run will look first in locals() (what has been defined already in the function, i.e. your in-function imports), then in the globals(), which is the __main__ module on the engine.
There are various approaches to making sure names are available on the engines, but perhaps the simplest is to explicitly define/send them to the engines (the interactive namespace is __main__ on the engines, just like it is locally in IPython):
client[:].execute("import os, math")
client[:]['dimensionless_run'] = dimensionless_run
prior to making your run, in which case everything should work as you expect.
This is an issue unique to modules defined interactively / in a script - it does not come up if this file is a module instead of a script, e.g.
from mymod import parallel_run
lbview.map(parallel_run, inputs)
In which case the globals() is the module globals, which are generally the same everywhere.
