Python global statement emulation

In Lutz's book I read how to emulate the global statement in a function body.
I created a p.py file in the Documents folder:
var = 0
def func():
    import p  # import itself
    p.var = 15
func()
print(var)
output:
15
0
I thought it was supposed to simply print 15, but for some reason it also added 0 to the output. So I'm wondering why this happened.
For example, when I do the same thing in the terminal, in the main module, it works as I want:
var = 0
def func():
    import __main__  # import itself
    __main__.var = 15
func()
print(var)
and the output is
15
I have Python 3.7.7.

Files are not modules: files are used to define modules. If you run p.py as a script that contains import p, there are two modules, __main__ and p, both created from the same file, but each with its own global namespace.
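A minimal sketch of that point, assuming Python 3.5+ and using `importlib.util` to build two module objects from one file under two different names (much as running `p.py` as a script and then importing `p` does). The file name `shared.py` and the helper `load_as` are hypothetical, made up for this illustration.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_as(name, path):
    """Create and execute a brand-new module object called `name` from the file at `path`."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "shared.py"  # hypothetical file, written only for this demo
        path.write_text("var = 0\n")
        first = load_as("first", path)
        second = load_as("second", path)
    # Both objects were created from the same file, but each has its own global namespace.
    first.var = 15
    return first.var, second.var, first is second


print(demo())  # (15, 0, False)
```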

Okay, let's break this into a few steps.
First of all:
Self-importing a module is not a good idea; the things you noticed in your question are one example of why.
import p  # import itself
var = 0
def func():
    p.var = 15
func()
print(var)
print(p.var)
Running this, you will notice that var and p.var are actually two separate variables, even though logically they are the same thing, just in different namespaces.
You've also got func and p.func, which again do the same thing but are two separate objects.
Second problem:
The global keyword should be used only when absolutely necessary. When you have a global variable, it is much harder to track when it changes and to control the flow of your program; see Global Variables Are Bad (not everything there applies to Python, but most points still stand).
Finally, why you are actually seeing the behaviour you see:
In Python, when a module is imported for the first time, all of its top-level code is executed, just like the main script you are running.
Let's say you have pr.py file containing only this:
print("text")
Importing such a module with import pr would be enough to see text on the console, because the module gets executed on import.
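A small sketch of "executed on import, but only once", assuming Python 3.7+ for the stdlib calls used here. It writes the `pr.py` module above into a temporary directory, puts that directory on `sys.path`, and imports it twice while capturing stdout; the helper name `count_import_side_effects` is hypothetical.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path


def count_import_side_effects():
    """Import pr twice and count how many times its top-level print ran."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "pr.py").write_text('print("text")\n')
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # the file was created after the interpreter started
        try:
            out = io.StringIO()
            with contextlib.redirect_stdout(out):
                importlib.import_module("pr")  # first import: pr.py is executed
                importlib.import_module("pr")  # already in sys.modules: not executed again
            return out.getvalue().count("text")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("pr", None)  # forget the module so the demo can be repeated


print(count_import_side_effects())  # 1
```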
That's why you see 15 printed: during import p, the module body runs, including print(var), and because we are in module p at that point, p.var and var are the same variable. Then import p finishes and we reach print(var) in the __main__ module, where var is not the same as p.var and still has its original value of 0.
Why does import __main__ work differently? Because __main__ is handled specially by Python, as described in the documentation for __main__. In short, the __main__ module is initialised at the start of the program and is already registered in sys.modules, so import __main__ does not cause the code to run again, just as importing a module several times only executes it once.
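To reproduce both behaviours side by side, the sketch below writes the question's two variants to a temporary directory and runs each one as the main script with `subprocess` (Python 3.7+ for `capture_output`). Printing `__name__` next to `var` is an addition for illustration, and the file name `demo.py` and the helper `run_script` are hypothetical.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# The question's p.py, with __name__ added to the print so we can see which module prints.
SELF_IMPORT = textwrap.dedent("""\
    var = 0
    def func():
        import p  # imports this file again, as a new module named 'p'
        p.var = 15
    func()
    print(__name__, var)
""")

# The terminal version, saved to a file: __main__ is already in sys.modules.
MAIN_IMPORT = textwrap.dedent("""\
    var = 0
    def func():
        import __main__  # returns the running module; nothing is re-executed
        __main__.var = 15
    func()
    print(__name__, var)
""")


def run_script(filename, source):
    """Write `source` to `filename` in a temp dir and run it as the main script."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / filename
        script.write_text(source)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.splitlines()


print(run_script("p.py", SELF_IMPORT))     # ['p 15', '__main__ 0']
print(run_script("demo.py", MAIN_IMPORT))  # ['__main__ 15']
```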

Related

Only want one function to be imported yet the full program comes over

I made two programs for demonstration purposes. I would like to import the global variable from the transmitter function into another file, but the problem I run into is that the while True loop also comes along and spoils my whole second program, because the second program now also starts printing the iterations.
Program 1:
import time
def transmitter():
    global temp
    temp = 2
transmitter()
while True:  # a random task just to see if I only imported the function
    x = 0
    print(x + 1)
    time.sleep(0.2)
Program 2:
from transmitguy import transmitter
def valuepullup():
    newval = transmitguy.transmitter()
    print(newval)
valuepullup()
I only need my second program to show the value 2 once (2 is the global variable from file 1).
The short answer is that you can't get only one piece of a module. from x import y imports x in the same way import x does. The only difference is that, in the former case, y is added to your current global namespace, and in the latter case, x is. The docs for import say:
The from form ... find[s] the module specified in the from clause, loading and initializing it if necessary ...
I am not sure exactly what you are trying to accomplish. As the commenters noted, you can guard the loop with an if __name__ == '__main__': check. However, you might do better to put your variable in its own module, then import that module from both of your existing modules.
See also the tutorial.
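A minimal sketch of the `__main__` guard the commenters suggested, assuming a hypothetical rewrite of Program 1 saved as `transmitguy.py`. To keep it self-contained, it writes that file to a temporary directory and imports it with `importlib.import_module`, which executes the module's top level just as `from transmitguy import transmitter` would; `value_pullup` is a hypothetical stand-in for Program 2.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical rewrite of Program 1: the endless loop only runs when the file is
# executed directly, not when another program imports it.
TRANSMITGUY = textwrap.dedent("""\
    import time

    def transmitter():
        global temp
        temp = 2

    transmitter()

    if __name__ == "__main__":  # only true when the file is run as a script
        while True:
            x = 0
            print(x + 1)
            time.sleep(0.2)
""")


def value_pullup():
    """Program 2 as a function: import transmitguy and read its global `temp`."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "transmitguy.py").write_text(TRANSMITGUY)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            transmitguy = importlib.import_module("transmitguy")  # runs the top level once
            return transmitguy.temp
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("transmitguy", None)


print(value_pullup())  # 2
```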

Forcing Unload/Deconstruction of Dynamically Imported File from Source

Been a longtime browser of SO, finally asking my own questions!
So, I am writing an automation script/module that looks through a directory recursively for Python modules with a specific name. If I find a module with that name, I load it dynamically, pull what I need from it, and then unload it. I noticed, though, that simply del'ing the module does not remove all references to it; another reference is lingering somewhere, and I do not know where. I tried taking a peek at the source code, but couldn't make much sense of it. Here is a greatly simplified sample of what I am seeing:
I am using Python 3.5.2 (Anaconda v4.2.0). I am using importlib, and that is what I want to stick with. I also want to be able to do this with vanilla python-3.
I got the import from source from the python docs here (yes I am aware this is the Python 3.6 docs).
My main driver...
# main.py
import importlib.util
import sys

def foo():
    spec = importlib.util.spec_from_file_location('a', 'a.py')
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    print(sys.getrefcount(module))
    del module
    del spec

if __name__ == '__main__':
    foo()
    print('THE END')
And my sample module...
# a.py
print('hello from a')
class A():
    def __del__(self):
        print('SO LONG A!')
inst = A()
Output:
python main.py
hello from a
2
THE END
SO LONG A!
I expected to see "SO LONG A!" printed before "THE END". So, where is this other hidden reference to my module? I understand that my dels are redundant, since everything is wrapped in a function; I just wanted the deletion and scope to be explicit. How do I get a.py to unload completely? I plan on dynamically loading a lot of modules like a.py, and I do not want to hold on to them any longer than I have to. Is there something I am missing?
There is a circular reference here: the module's global namespace references objects (the class A and its methods) that reference that namespace again through each function's __globals__.
This means the namespace, and the inst object stored in it, is not cleared immediately, because its reference count never drops to 0 by itself. You need to wait for the cycle to be broken by the garbage collector.
You can force this by calling gc.collect():
import gc
# ...
if __name__ == '__main__':
    foo()
    gc.collect()
    print('THE END')
With that in place, the output becomes:
$ python main.py
hello from a
2
SO LONG A!
THE END
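An in-process sketch of the same experiment, assuming CPython 3.4+ (reference counting plus the cyclic collector). Instead of printing, a hypothetical `EVENTS` list injected into the module records when `A.__del__` runs, and automatic collection is switched off during the demo so that the only collection is the explicit `gc.collect()`.

```python
import gc
import importlib.util
import tempfile
from pathlib import Path

# Same shape as the question's a.py; EVENTS is a hypothetical hook injected below.
A_SOURCE = """\
class A:
    def __del__(self):
        EVENTS.append("SO LONG A!")

inst = A()
"""


def load_and_drop(events):
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "a.py"
        path.write_text(A_SOURCE)
        spec = importlib.util.spec_from_file_location("a", str(path))
        module = importlib.util.module_from_spec(spec)
        module.EVENTS = events  # inject the list before the module body runs
        spec.loader.exec_module(module)
        del module, spec
        # Even after `del module`, the namespace dict stays alive: A.__del__.__globals__
        # points back to it, forming a cycle that only the cyclic collector can break.


def demo():
    events = []
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running at an arbitrary moment
    try:
        load_and_drop(events)
        before = list(events)  # the cycle still keeps `inst` alive here
        gc.collect()           # breaks the cycle; A.__del__ runs during this call
        after = list(events)
    finally:
        if was_enabled:
            gc.enable()
    return before, after


print(demo())  # ([], ['SO LONG A!'])
```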

NameError on global variables when multiprocessing, only in subdirectory

I have a main process which uses execfile and runs a script in a child process. This works fine unless the script is in another directory -- then everything breaks down.
This is in mainprocess.py:
from multiprocessing import Process
m = "subdir\\test.py"
if __name__ == '__main__':
    p = Process(target=execfile, args=(m,))
    p.start()
Then in a subdirectory aptly named subdir, I have test.py
import time
def foo():
    print time.time()
foo()
When I run mainprocess.py, I get the error:
NameError: global name 'time' is not defined
But the issue isn't limited to module names: in other pieces of code I sometimes get the error on a function name.
I've tried importing time in mainprocess.py and also inside the if statement there, but neither has any effect.
One way of avoiding the error (I haven't tried this) is to copy test.py into the parent directory and insert a line in the file to os.chdir back to the original directory. However, this seems rather sloppy.
So what is happening?
The solution is to change your Process initialization:
p = Process(target=execfile, args=(m, {}))
Honestly, I'm not entirely sure why this works. I know it has something to do with which dictionary (locals vs. globals) the time import is added to. It seems like when your import is made in test.py, it's treated like a local variable, because the following works:
import time # no foo() anymore
print(time.time()) # the call to time.time() is in the same scope as the import
However, the following also works:
import time
def foo():
    global time
    print(time.time())
foo()
This second example shows me that the import is still assigned to some kind of global namespace, I just don't know how or why.
If you call execfile() normally, rather than in a subprocess, everything runs fine, and in fact you can then use the time module anywhere after the execfile() call in your main process, because time has been brought into the same namespace. I think that since you're launching it in a subprocess, there is no module-level namespace for the import to be assigned to (execfile doesn't create a module object when called). When we add the empty dictionary to the call to execfile, we're supplying the globals dictionary argument; since the locals dictionary defaults to the globals dictionary when it is omitted, the import and the function foo then share one namespace, which gives the import mechanism a global namespace to assign the name time to.
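The question uses Python 2's `execfile`, which no longer exists in Python 3. A rough Python 3 analogue with `exec` shows the mechanism, assuming the Python 2 failure comes from the same split: top-level statements bind names in the locals dict, while a function looks up free names in its globals dict. `SOURCE` and `run_with` are hypothetical names for this sketch.

```python
# Module-level code in the style of test.py, run through exec() instead of execfile().
SOURCE = """\
import time

def foo():
    return time.time()

stamp = foo()
"""


def run_with(globals_dict, locals_dict):
    """Execute SOURCE and report whether foo() could see the top-level import."""
    try:
        exec(SOURCE, globals_dict, locals_dict)
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"


# Two different dicts: `import time` and `def foo` bind names in the locals dict,
# but foo() resolves `time` through its globals dict, which never receives it.
print(run_with({}, {}))  # NameError: name 'time' is not defined

# One dict for both roles (like exec(SOURCE, g) or execfile(m, {})): foo() finds `time`.
shared = {}
print(run_with(shared, shared))  # ok
```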
Some links for background:
1) Tutorial page on namespaces and scope
- look here for builtin, global, and local namespace explanations first
2) Python docs on execfile command
3) A very similar question on a non-SO site

IPython Parallel Computing Namespace Issues

I've been reading and re-reading the IPython documentation/tutorial, and I can't figure out the issue with this particular piece of code. It seems to be that the function dimensionless_run is not visible to the namespace delivered to each of the engines, but I'm confused because the function is defined in __main__, and clearly visible as part of the global namespace.
wrapper.py:
import math, os
def dimensionless_run(inputs):
    output_file = open(inputs['fn'],'w')
    ...
    return output_stats
def parallel_run(inputs):
    import math, os ## Removing this line causes a NameError: global name 'math'
                    ## is not defined.
    folder = inputs['folder']
    zfill_amt = int(math.floor(math.log10(inputs['num_iters'])))
    for i in range(inputs['num_iters']):
        run_num_str = str(i).zfill(zfill_amt)
        if not os.path.exists(folder + '/'):
            os.mkdir(folder)
        dimensionless_run(inputs)
    return
if __name__ == "__main__":
    inputs = [input1,input2,...]
    client = Client()
    lbview = client.load_balanced_view()
    lbview.block = True
    for x in sorted(globals().items()):
        print x
    lbview.map(parallel_run,inputs)
Executing this code after ipcluster start --n=6 yields the sorted global dictionary, including the math and os modules, and the parallel_run and dimensionless_run functions. This is followed by an IPython.parallel.error.CompositeError: one or more exceptions from call to method: parallel_run, which is composed of a large number of [n:apply]: NameError: global name 'dimensionless_run' is not defined, where n runs from 0-5.
There are two things I don't understand, and they're clearly linked.
Why doesn't the code identify dimensionless_run in the global namespace?
Why is import math, os necessary inside the definition of parallel_run?
Edited: This turned out not to be much of a namespace error at all: I was executing ipcluster start --n=6 in a directory that didn't contain the code. To fix it, all I needed to do was execute the start command in my code's directory. I also fixed it by adding the lines:
inputs = input_pairs
os.system("ipcluster start -n 6") #NEW
client = Client()
...
lbview.map(parallel_run,inputs)
os.system("ipcluster stop") #NEW
which start the required cluster in the right place.
This is mostly a duplicate of Python name space issues with IPython.parallel, which has a more detailed answer, but the gist:
When the Client sends parallel_run to the engine, it just sends that function, not the entire namespace in which the function is defined (the __main__ module). So when running the remote parallel_run, lookups to math or os or dimensionless_run will look first in locals() (what has been defined already in the function, i.e. your in-function imports), then in the globals(), which is the __main__ module on the engine.
There are various approaches to making sure names are available on the engines, but perhaps the simplest is to explicitly define/send them to the engines (the interactive namespace is __main__ on the engines, just like it is locally in IPython):
client[:].execute("import os, math")
client[:]['dimensionless_run'] = dimensionless_run
prior to making your run, in which case everything should work as you expect.
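The lookup rule can be modelled locally without IPython at all. This is a rough sketch, not how IPython.parallel is implemented: `types.FunctionType` rebuilds a function from its code object against a plain dict standing in for the engine's `__main__`, and the hypothetical names `helper`, `task`, and `rebuild_on_engine` play the roles of `dimensionless_run`, `parallel_run`, and the engine.

```python
import builtins
import math
import types


def helper(x):  # plays the role of dimensionless_run
    return math.sqrt(x)


def task(x):  # plays the role of parallel_run
    return helper(x)


def rebuild_on_engine(func, engine_main):
    """Rough model of an engine receiving `func`: same code object, but its
    global lookups now go to the engine's __main__ namespace (`engine_main`)."""
    return types.FunctionType(func.__code__, engine_main, func.__name__)


def call(func, arg):
    try:
        return func(arg)
    except NameError as exc:
        return f"NameError: {exc}"


engine_main = {"__builtins__": builtins}  # stand-in for __main__ on an engine
remote_task = rebuild_on_engine(task, engine_main)
print(call(remote_task, 16))  # NameError: name 'helper' is not defined

# Like client[:]['helper'] = helper: the function arrives, but its globals are the engine's.
engine_main["helper"] = rebuild_on_engine(helper, engine_main)
print(call(remote_task, 16))  # NameError: name 'math' is not defined

# Like client[:].execute("import math"): now every lookup succeeds.
engine_main["math"] = math
print(call(remote_task, 16))  # 4.0
```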
This issue is unique to code defined interactively or in a script; it does not come up if this file is a module instead of a script, e.g.
from mymod import parallel_run
lbview.map(parallel_run, inputs)
In which case the globals() is the module globals, which are generally the same everywhere.

Why do all module run together?

I just set up a fresh copy of Eclipse and installed PyDev.
In my first trial of PyDev with Eclipse, I created two modules under the src package (the default one):
FirstModule.py:
'''
Created on 18.06.2009
#author: Lars Vogel
'''
def add(a,b):
    return a+b
def addFixedValue(a):
    y = 5
    return y +a
print "123"
run.py:
'''
Created on Jun 20, 2011
#author: Raymond.Yeung
'''
from FirstModule import add
print add(1,2)
print "Helloword"
When I open the drop-down menu of the Run button and click "ProjectName run.py", this is the result:
123
3
Helloword
Apparently both modules ran. Why? Is this the default setting?
When you import a module, everything in it is "run". This means that classes and function objects are created, global variables are set, and print statements are executed. *)
It is common practice to enclose statements only meant to be executed when the module is run directly in an if-block such as this:
if __name__ == "__main__":
    print "123"
Now if you run the module as a script, __name__ is set to "__main__", so "123" will be printed. However, if you import the module from somewhere else __name__ will be "FirstModule" in your case, not "__main__", so whatever is in the block will not be executed.
*) Note that if you import the same module again, it is not "run" again. Python keeps track of imported modules (in sys.modules) and just re-uses the already imported module the second time. This makes C/C++ tricks like #ifndef include guards, which ensure a header file's body is only included once, unnecessary in Python.
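A small sketch of the `__name__` check, assuming Python 3 (so `print` is a function). `runpy.run_path` executes `FirstModule.py` with whatever `__name__` is passed as `run_name`, which lets one process imitate both "run as a script" and "imported as FirstModule"; the helper `run_as` is hypothetical.

```python
import contextlib
import io
import runpy
import tempfile
from pathlib import Path

# FirstModule.py from the question, in Python 3 syntax, with the print guarded.
FIRST_MODULE = """\
def add(a, b):
    return a + b

if __name__ == "__main__":
    print("123")
"""


def run_as(run_name):
    """Execute FirstModule.py with __name__ set to `run_name`; return (output, add(1, 2))."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "FirstModule.py"
        path.write_text(FIRST_MODULE)
        out = io.StringIO()
        with contextlib.redirect_stdout(out):
            namespace = runpy.run_path(str(path), run_name=run_name)
        return out.getvalue(), namespace["add"](1, 2)


print(run_as("__main__"))     # ('123\n', 3)
print(run_as("FirstModule"))  # ('', 3)
```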
