I'm trying to refactor a project targeting Python 3.6 and pytest. The test suite contains a lot of debug statements such as:
print('This is how something looks right now', random_thing.foo.bar.start,
      random_thing.foo.bar.middle, random_thing.foo.bar.end)
The idea behind these statements is that if a test starts failing in future, we will have some context to help us track down what the problem could be. There's no need to test what the actual values are right now in that test, but once things start failing, having that information is important for further debugging.
I would like to avoid repeating random_thing.foo.bar. that many times. I could assign that to a temporary variable, but the code does not really need that variable available ever after. I'm not really worried about performance, but I have a strong preference for keeping the code "clean" -- and "leaking" these variable names rubs me the wrong way. There is a feature like this in other languages that I'm familiar with, so I'm wondering how to do this in Python.
I'm fluent in C++, where I would probably just put that debug print into an extra scope:
{
    const auto& bar = random_thing.foo.bar;
    debug << "start: " << bar.start << ", middle: " << bar.middle << ", end: " << bar.end;
}
Given that there are no anonymous blocks in Python, is there a "Pythonic" way of avoiding this namespace clutter? I'm not really looking for opinions or a popularity contest, but for a review based on how people who have been doing Python longer than me perceive these approaches, so here are a few things that I tried:
1. Just add that damn variable and del it afterwards
Well, I don't like repeatedly doing stuff that a machine should do for me.
2. with statement and contextlib.nullcontext
In Python, there is no new scope with the with statement, so this leaves that opj variable available through locals:
>>> import os
>>> import os.path
>>> import contextlib
>>> with contextlib.nullcontext(os.path.join) as opj:
...     print(type(opj))
...
<class 'function'>
>>> print(type(opj))
<class 'function'>
3. with statement and Vladimir Iakovlev's let statement decorator
from contextlib import contextmanager
from inspect import currentframe, getouterframes
@contextmanager
def let(**bindings):
    frame = getouterframes(currentframe(), 2)[-1][0]  # 2 because first frame in `contextmanager` is the decorator
    locals_ = frame.f_locals
    original = {var: locals_.get(var) for var in bindings.keys()}
    locals_.update(bindings)
    yield
    locals_.update(original)
The code looks awesome to me:
>>> a = 3
>>> b = 4
>>> with let(a=33, b=44):
...     print(a, b)
...
(33, 44)
>>> print(a, b)
(3, 4)
It does not undef a variable which was not defined before, but that's easy to add. Is manipulating the stack in this way a sane idea? My Python-fu is limited, so I'm torn between seeing this as uber-cool and uber-hackish. Is the final result "reasonably Pythonic"?
4. A wrapper around print with **kwargs
Let's use **kwargs:
def print_me(format, **kwargs):
    print(format.format(**kwargs))
print_me('This is it: {bar.start} {bar.middle} {bar.end}', bar=random_thing.foo.bar)
This is good enough, but f-strings can contain actual expressions, such as:
foo = 10
print(f'{foo + 1}')
I would like to keep this functionality. I understand that str.format cannot really support this because of the security implications of passing user-defined inputs.
Your best option is to just create the variable and leave it there, or del it afterward if it really bothers you that much.
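To make that concrete, here is a minimal sketch of the "assign, print, del" pattern inside a test-like function. The random_thing object is a hypothetical stand-in built from SimpleNamespace, and test_something is an illustrative name; neither comes from the real project.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the question's random_thing object.
random_thing = SimpleNamespace(
    foo=SimpleNamespace(bar=SimpleNamespace(start=1, middle=2, end=3)))


def test_something():
    # Temporary name used only for the debug output.
    bar = random_thing.foo.bar
    message = f'start: {bar.start}, middle: {bar.middle}, end: {bar.end}'
    print(message)
    del bar  # the temporary name is unbound from here on
    # Report whether the name is still among the function's locals.
    return message, 'bar' in locals()


message, bar_still_bound = test_something()
```

After the del, any later accidental use of bar in the same function fails loudly instead of silently reusing the debug variable.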
with is not a viable approach. Particularly, that let thing is completely broken in multiple ways.
The most important way it's wrong is that modifying f_locals is undefined behavior, but this isn't immediately apparent in tests due to the other bugs. Two of the other bugs are that the 2 controls something completely unrelated to what the author thought, and the [-1] is indexing from the wrong end. These bugs cause the code to access the "root" stack frame, the one at the start of the stack, instead of the frame the author wanted. Finally, it has no handling for actually clearing variables - it can only set them to None.
If you test it with a function, you'll find that it doesn't work:
from contextlib import contextmanager
from inspect import currentframe, getouterframes
@contextmanager
def let(**bindings):
    frame = getouterframes(currentframe(), 2)[-1][0]  # 2 because first frame in `contextmanager` is the decorator
    locals_ = frame.f_locals
    original = {var: locals_.get(var) for var in bindings.keys()}
    locals_.update(bindings)
    yield
    locals_.update(original)
def f():
    x = 1
    with let(x=3):
        print(x)
f()
print(x)
Output:
1
None
The 3 isn't visible in the code that should have seen it, and there's an extra None hanging around in the wrong scope afterwards.
There's no good way to get the functionality you want out of a with statement. Default with scope rules don't do what you want, and Python doesn't provide a way for a context manager to mess with the locals of the code that called it.
If you really hate that variable and you don't want to use del, the closest thing to a good option might be to use a Javascript-style immediately-invoked lambda:
(lambda x: print(f'start: {x.start}, middle: {x.middle}, end: {x.end}'))(
random_thing.foo.bar)
I think this option is a lot worse than just assigning x the normal way, but maybe you think differently.
Here's a bit of fun with it.
#Fake object structure 👇
class Bar:
    start = "mystart"
    middle = "mymiddle"
    end = "theend"
class Foo:
    bar = Bar
class Rando:
    foo = Foo
random_thing = Rando()
#Fake object structure 👆
def printme(tmpl, di_g={}, di_l={}, **kwargs):
    """ use passed-in dictionaries, typically globals(), locals() then kwargs
    last-one wins.
    """
    di = di_g.copy()
    di.update(**di_l)
    di.update(**kwargs)
    print(tmpl.format(**di))
bar = random_thing.foo.bar
printme('This is it: {bar.start} {bar.middle} {bar.end}', globals())
printme('This is it: {bar.start} {bar.middle} {bar.end}', bar=Bar)
def letsdoit():
    "using locals and overriding bar"
    bar = Bar()
    bar.middle = "themiddle"
    printme('This is it: {bar.start} {bar.middle} {bar.end} {fooplus}', globals(), locals(), fooplus=(10+1))
letsdoit()
output:
This is it: mystart mymiddle theend
This is it: mystart mymiddle theend
This is it: mystart themiddle theend 11
I have had PyCharm 2017.3 extract some code inside a top-level function to another top-level function, and it does a good job.
However, sometimes I would like to not put the extracted function on top level, but rather it should become a function nested inside the existing function. The rationale is re-using code that is only used inside a function, but several times there. I think that this "sub-function" should ideally not be accessible outside of the original function.
How can I do this? I have not seen any options in the refactoring dialog.
Example
a) Original code:
def foo():
    a = ''
    if a == '':
        b = 'empty'
    else:
        b = 'not empty'
    return b
b) What extracting does:
def foo():
    a = ''
    b = bar(a)
    return b
def bar(a):
    if a == '':
        b = 'empty'
    else:
        b = 'not empty'
    return b
c) What I would like to have:
def foo():
    def bar():
        if a == '':
            b = 'empty'
        else:
            b = 'not empty'
        return b
    a = ''
    b = bar()
    return b
I am aware that bar's b will shadow foo's b unless it is renamed in the process. I also thought about completely accepting the shadowing by not returning or requesting b and just modifying it inside bar.
Please also hint me if what I want is not a good thing for any reason.
It is considered good practice to keep function boundaries isolated: get data as parameters and spit data as return values with as little side-effects as possible. That said, there are a few special cases where you break this rule; many of them when using closures. Closures are not as idiomatic in Python as they are in Javascript - personally I think it is good but many people disagree.
There is one place where closures are absolutely idiomatic in Python: decorators. For other cases where you would use a closure in order to avoid global variables and provide some form of data hiding, there are other alternatives in Python. Although some people advocate using a closure instead of a class when it has just one method, a plain function combined with functools.partial can be even better.
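As a rough illustration of that last point, here is the same behaviour written once as a closure and once as a plain function bound with functools.partial. The names make_scaler and scale are hypothetical, chosen only for this sketch.

```python
from functools import partial


def make_scaler(factor):
    """Closure version: factor is captured from the enclosing scope."""
    def scale(x):
        return x * factor
    return scale


def scale(factor, x):
    """Plain-function version: every input is an explicit parameter."""
    return x * factor


double_closure = make_scaler(2)
double_partial = partial(scale, 2)  # pre-binds factor=2
```

Both callables behave the same, but the partial version keeps the bound argument inspectable (double_partial.args) and leaves scale usable and testable on its own.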
This is my guess about why there is no such feature in Pycharm: we almost never do it in Python, instead we tend to keep the function signature as foo(x) even when we can get x from the enclosing scope. Hell, in Python our methods receive self explicitly where most languages have an implicit this. If you write code this way then Pycharm already does everything that is needed when refactoring: it fixes the indentation when you cut & paste.
If you catch yourself doing this kind of refactoring a lot I guess you are coming from a language where closures are more idiomatic like Javascript or Lisp.
So my point is: this "nested to global" or "global to nested" function refactoring feature does not exist in Pycharm because nested functions relying on the enclosing scopes are not idiomatic in Python unless for closures - and even closures are not that idiomatic outside of decorators.
If you care enough, go ahead and file a feature request at their issue tracker or upvote some related tickets like #PY-12802 and #PY-2701. As you can see, those have not attracted a lot of attention, possibly because of the reasons above.
I recently came across a number of places in our code which do things like this:
...
globals()['machine'] = otherlib.Machine()
globals()['logger'] = otherlib.getLogger()
globals()['logfile'] = datetime.datetime.now().strftime('logfiles_%Y_%m_%d.log')
and I am more than a little confused as to why people would do that, rather than doing
global machine
machine = otherlib.Machine()
and so on.
Here is a slightly anonymised function which does this, in full:
def openlog(num):
    log_file = '/log_dir/thisprogram.' + num
    if os.path.exists(log_file):
        os.rename(log_file, log_file + '.old')
    try:
        globals()["log"] = open(log_file, 'w')
        return log
    except:
        print 'Unable to open ' + log_file
        sys.exit(1)
It confuses the hell out of pylint (0.25) as well as me.
Is there any reason for coding it that way? There's minimal usage of eval in our code, and this isn't in a library.
PS I checked "Reason for globals() in python", but it doesn't really answer why you'd use this for setting globals in a program.
Maybe the function uses a local variable with the same name as the global one, and the programmer didn't want to bother changing the variable name?
def foo(bar):
    global bar  # SyntaxError
    bar = bar + 1
def foo(bar):
    globals()['bar'] = bar + 1
foo(1)
print(bar)  # prints 2
Another use case, albeit still a bit specious (and clearly not the case in the example function you gave), is for defining variable names dynamically. This is rarely, if ever, a good idea, but it does come up a lot in questions on this site, at least. For example:
>>> def new_variable():
...     name = input("Give your new variable a name! ")
...     value = input("Give your new variable a value! ")
...     globals()[name] = value
...
>>> new_variable()
Give your new variable a name! foo
Give your new variable a value! bar
>>> print(foo)
bar
Otherwise, I can think of only one reason to do this: perhaps some supervising entity requires that all global variables be set this way, e.g. "in order to make it really, really clear that these variables are global". Or maybe that same supervising entity has placed a blanket ban on the global keyword, or docks programmer pay for each line.
I'm not saying that any of these would be a good reason, but then again, I truly can't conceive of a good reason to define variables this way if not for scoping purposes (and even then, it seems questionable...).
Just in case, I did a timing check, to see if maybe the globals() call is faster than using the keyword. I'd expect the function call + dictionary access to be significantly slower, and it is.
>>> import timeit
>>> timeit.timeit('foo()', 'def foo():\n\tglobals()["bar"] = 1',number=10000000)
2.733132876863408
>>> timeit.timeit('foo()', 'def foo():\n\tglobal bar\n\tbar = 1',number=10000000)
1.6613818077011615
Given the code you posted and my timing results, I can think of no legitimate reason for the code you're looking at to be written like this. Looks like either misguided management requirement, or simple incompetence.
Are the authors PHP converts? This is valid code in PHP:
$GLOBALS['b'] = $GLOBALS['a'] + $GLOBALS['b'];
See this for more examples. If someone was used to this way of writing the code, maybe they just used the closest matching way of doing it in Python and didn't bother to check for alternatives.
You'd sometimes use the superglobal $GLOBALS variable to define something, because although the global keyword exists in PHP, it will only import existing variables; it cannot create a new variable as far as I know.
I often do interactive work in Python that involves some expensive operations that I don't want to repeat often. I'm generally running whatever Python file I'm working on frequently.
If I write:
import functools32
@functools32.lru_cache()
def square(x):
    print "Squaring", x
    return x*x
I get this behavior:
>>> square(10)
Squaring 10
100
>>> square(10)
100
>>> runfile(...)
>>> square(10)
Squaring 10
100
That is, rerunning the file clears the cache. This works:
try:
    safe_square
except NameError:
    @functools32.lru_cache()
    def safe_square(x):
        print "Squaring", x
        return x*x
but when the function is long it feels strange to have its definition inside a try block. I can do this instead:
def _square(x):
    print "Squaring", x
    return x*x
try:
    safe_square_2
except NameError:
    safe_square_2 = functools32.lru_cache()(_square)
but it feels pretty contrived (for example, in calling the decorator without an '@' sign).
Is there a simple way to handle this, something like:
@non_resetting_lru_cache()
def square(x):
    print "Squaring", x
    return x*x
?
Writing a script to be executed repeatedly in the same session is an odd thing to do.
I can see why you'd want to do it, but it's still odd, and I don't think it's unreasonable for the code to expose that oddness by looking a little odd, and having a comment explaining it.
However, you've made things uglier than necessary.
First, you can just do this:
@functools32.lru_cache()
def _square(x):
    print "Squaring", x
    return x*x
try:
    safe_square_2
except NameError:
    safe_square_2 = _square
There is no harm in attaching a cache to the new _square definition. It won't waste any time, or more than a few bytes of storage, and, most importantly, it won't affect the cache on the previous _square definition. That's the whole point of closures.
There is a potential problem here with recursive functions. It's already inherent in the way you're working, and the cache doesn't add to it in any way, but you might only notice it because of the cache, so I'll explain it and show how to fix it. Consider this function:
@lru_cache()
def _fact(n):
    if n < 2:
        return 1
    return _fact(n-1) * n
When you re-exec the script, even if you have a reference to the old _fact, it's going to end up calling the new _fact, because it's accessing _fact as a global name. It has nothing to do with the @lru_cache; remove that, and the old function will still end up calling the new _fact.
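Here is a small sketch of that late name lookup, with no cache involved. The second definition deliberately returns a marker value (-1) so it is obvious which function the recursion ends up in; old_fact is just an illustrative name for a saved reference.

```python
def fact(n):
    if n < 2:
        return 1
    # The name fact is looked up at call time, not at definition time.
    return fact(n - 1) * n


old_fact = fact  # keep a reference to the first definition


def fact(n):
    # Stands in for the definition created by re-running the file.
    return -1


# old_fact(3) runs the old body once, then recurses into the NEW fact.
result = old_fact(3)
```

Here result is -3 rather than 6, because the recursive call inside the old function resolves the name to the new definition.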
But if you're using the renaming trick above, you can just call the renamed version:
@lru_cache()
def _fact(n):
    if n < 2:
        return 1
    return fact(n-1) * n
Now the old _fact will call fact, which is still the old _fact. Again, this works identically with or without the cache decorator.
Beyond that initial trick, you can factor that whole pattern out into a simple decorator. I'll explain step by step below, or see this blog post.
Anyway, even with the less-ugly version, it's still a bit ugly and verbose. And if you're doing this dozens of times, my "well, it should look a bit ugly" justification will wear thin pretty fast. So, you'll want to handle this the same way you always factor out ugliness: wrap it in a function.
You can't really pass names around as objects in Python. And you don't want to use a hideous frame hack just to deal with this. So you'll have to pass the names around as strings. Like this:
globals().setdefault('fact', _fact)
The globals function just returns the current scope's global dictionary. Which is a dict, which means it has the setdefault method, which means this will set the global name fact to the value _fact if it didn't already have a value, but do nothing if it did. Which is exactly what you wanted. (You could also use setattr on the current module, but I think this way emphasizes that the script is meant to be (repeatedly) executed in someone else's scope, not used as a module.)
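A quick sketch of the setdefault behaviour described above, using an ordinary dict as a stand-in for the one returned by globals(). The two function names are hypothetical.

```python
# A plain dict standing in for the dict returned by globals().
namespace = {}


def first_definition():
    return 'first'


def second_definition():
    return 'second'


# First run of the script: the name is unbound, so it gets set.
namespace.setdefault('fact', first_definition)
# Second run: the name already exists, so setdefault leaves it alone.
namespace.setdefault('fact', second_definition)

kept = namespace['fact']()
```

The binding from the first run survives, which is exactly what keeps the old cache alive.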
So, here that is wrapped up in a function:
def new_bind(name, value):
    globals().setdefault(name, value)
… which you can turn into a decorator almost trivially:
def new_bind(name):
    def wrap(func):
        globals().setdefault(name, func)
        return func
    return wrap
Which you can use like this:
@new_bind('foo')
def _foo():
    print(1)
But wait, there's more! The func that new_bind gets is going to have a __name__, right? If you stick to a naming convention, like that the "private" name must be the "public" name with a _ prefixed, we can do this:
def new_bind(func):
    assert func.__name__[0] == '_'
    globals().setdefault(func.__name__[1:], func)
    return func
And you can see where this is going:
@new_bind
@lru_cache()
def _square(x):
    print "Squaring", x
    return x*x
There is one minor problem: if you use any other decorators that don't wrap the function properly, they will break your naming convention. So… just don't do that. :)
And I think this works exactly the way you want in every edge case. In particular, if you've edited the source and want to force the new definition with a new cache, you just del square before rerunning the file, and it works.
And of course if you want to merge those two decorators into one, it's trivial to do so, and call it non_resetting_lru_cache.
However, I'd keep them separate. I think it's more obvious what they do. And if you ever want to wrap another decorator around @lru_cache, you're probably still going to want @new_bind to be the outermost decorator, right?
What if you want to put new_bind into a module that you can import? Then it's not going to work, because it will be referring to the globals of that module, not the one you're currently writing.
You can fix that by explicitly passing your globals dict, or your module object, or your module name as an argument, like @new_bind(__name__), so it can find your globals instead of its own. But that's ugly and repetitive.
You can also fix it with an ugly frame hack. At least in CPython, sys._getframe() can be used to get your caller's frame, and frame objects have a reference to their globals namespace, so:
def new_bind(func):
    assert func.__name__[0] == '_'
    g = sys._getframe(1).f_globals
    g.setdefault(func.__name__[1:], func)
    return func
Notice the big box in the docs that tells you this is an "implementation detail" that may only apply to CPython and is "for internal and specialized purposes only". Take this seriously. Whenever someone has a cool idea for the stdlib or builtins that could be implemented in pure Python, but only by using _getframe, it's generally treated almost the same as an idea that can't be implemented in pure Python at all. But if you know what you're doing, and you want to use this, and you only care about present-day versions of CPython, it will work.
There is no persistent_lru_cache in the stdlib. But you can build one pretty easily.
The functools source is linked directly from the docs, because this is one of those modules that's as useful as sample code as it is for using it directly.
As you can see, the cache is just a dict. If you replace that with, say, a shelf, it will become persistent automatically:
def persistent_lru_cache(filename, maxsize=128, typed=False):
    """new docstring explaining what filename does"""
    # same code as before up to here
    def decorating_function(user_function):
        cache = shelve.open(filename)
        # same code as before from here on.
Of course that only works if your arguments are strings. And it could be a little slow.
So, you might want to instead keep it as an in-memory dict, and just write code that pickles it to a file atexit, and restores it from a file if present at startup:
    def decorating_function(user_function):
        # ...
        try:
            with open(filename, 'rb') as f:
                cache = pickle.load(f)
        except:
            cache = {}
        def cache_save():
            with lock:
                with open(filename, 'wb') as f:
                    pickle.dump(cache, f)
        atexit.register(cache_save)
        # …
        wrapper.cache_save = cache_save
        wrapper.cache_filename = filename
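The fragment above only makes sense when pasted into the functools source, so here is a self-contained sketch of the same load-at-startup, pickle-at-exit idea. It is deliberately simplified: a plain unbounded dict keyed on positional arguments, with no LRU eviction, no locking and no keyword-argument support. The name persistent_cache and the temporary file path are assumptions made for this example.

```python
import atexit
import functools
import os
import pickle
import tempfile


def persistent_cache(filename):
    """Memoize a function in a dict that is loaded from and saved to filename."""
    def decorating_function(user_function):
        try:
            with open(filename, 'rb') as f:
                cache = pickle.load(f)
        except (OSError, EOFError, pickle.UnpicklingError):
            cache = {}  # no usable cache file yet

        def cache_save():
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)

        atexit.register(cache_save)

        @functools.wraps(user_function)
        def wrapper(*args):
            if args not in cache:
                cache[args] = user_function(*args)
            return cache[args]

        wrapper.cache_save = cache_save
        wrapper.cache_filename = filename
        return wrapper
    return decorating_function


path = os.path.join(tempfile.mkdtemp(), 'square.cache')
calls = []


@persistent_cache(path)
def square(x):
    calls.append(x)
    return x * x


square(4)
square.cache_save()  # normally triggered by atexit


@persistent_cache(path)  # simulates a fresh process loading the file
def square_again(x):
    calls.append(x)
    return x * x


result = square_again(4)  # answered from the pickled cache
```

The second decorated function never runs its body for the argument 4, because the result was restored from the file.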
Or, if you want it to write every N new values (so you don't lose the whole cache on, say, an _exit or a segfault or someone pulling the cord), add this to the second and third versions of wrapper, right after the misses += 1:
if misses % N == 0:
    cache_save()
See here for a working version of everything up to this point (using save_every as the "N" argument, and defaulting to 1, which you probably don't want in real life).
If you want to be really clever, maybe copy the cache and save that in a background thread.
You might want to extend the cache_info to include something like number of cache writes, number of misses since last cache write, number of entries in the cache at startup, …
And there are probably other ways to improve this.
From a quick test, with save_every=1, this makes the cache on both get_pep and fib (from the functools docs) persistent, with no measurable slowdown to get_pep and a very small slowdown to fib the first time (note that fib(100) has 100097 hits vs. 101 misses…), and of course a large speedup to get_pep (but not fib) when you re-run it. So, just what you'd expect.
I can't say I won't just use @abarnert's "ugly frame hack", but here is the version that requires you to pass in the calling module's globals dict. I think it's worth posting given that decorator functions with arguments are tricky and meaningfully different from those without arguments.
def create_if_not_exists_2(my_globals):
    def wrap(func):
        if "_" != func.__name__[0]:
            raise Exception("Function names used in cine must begin with '_'")
        my_globals.setdefault(func.__name__[1:], func)
        def wrapped(*args):
            return func(*args)
        return wrapped
    return wrap
Which you can then use in a different module like this:
from functools32 import lru_cache
from cine import create_if_not_exists_2
@create_if_not_exists_2(globals())
@lru_cache()
def _square(x):
    print "Squaring", x
    return x*x
assert "_square" in globals()
assert "square" in globals()
I've gained enough familiarity with decorators during this process that I was comfortable taking a swing at solving the problem another way:
from functools32 import lru_cache
try:
    my_cine
except NameError:
    class my_cine(object):
        _reg_funcs = {}
        @classmethod
        def func_key(cls, f):
            try:
                name = f.func_name
            except AttributeError:
                name = f.__name__
            return (f.__module__, name)
        def __init__(self, f):
            k = self.func_key(f)
            self._f = self._reg_funcs.setdefault(k, f)
        def __call__(self, *args, **kwargs):
            return self._f(*args, **kwargs)
if __name__ == "__main__":
    @my_cine
    @lru_cache()
    def fact_my_cine(n):
        print "In fact_my_cine for", n
        if n < 2:
            return 1
        return fact_my_cine(n-1) * n
    x = fact_my_cine(10)
    print "The answer is", x
@abarnert, if you are still watching, I'd be curious to hear your assessment of the downsides of this method. I know of two:
You have to know in advance what attributes to look in for a name to associate with the function. My first stab at it only looked at func_name which failed when passed an lru_cache object.
Resetting a function is painful: del my_cine._reg_funcs[('__main__', 'fact_my_cine')], and the swing I took at adding a __delitem__ was unsuccessful.
I can't really think of any reason why Python needs the del keyword (and most languages seem to not have a similar keyword). For instance, rather than deleting a variable, one could just assign None to it. And when deleting from a dictionary, a del method could be added.
Is there a reason to keep del in Python, or is it a vestige of Python's pre-garbage collection days?
Firstly, you can del other things besides local variables
del list_item[4]
del dictionary["alpha"]
Both of which should be clearly useful. Secondly, using del on a local variable makes the intent clearer. Compare:
del foo
to
foo = None
I know in the case of del foo that the intent is to remove the variable from scope. It's not clear that foo = None is doing that. If somebody just assigned foo = None I might think it was dead code. But I instantly know what somebody who codes del foo was trying to do.
There's this part of what del does (from the Python Language Reference):
Deletion of a name removes the binding of that name from the local or global namespace
Assigning None to a name does not remove the binding of the name from the namespace.
(I suppose there could be some debate about whether removing a name binding is actually useful, but that's another question.)
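A short sketch of the difference between the two statements: after assigning None the name is still bound, while after del looking it up raises NameError. The flag names are only illustrative.

```python
foo = 'bar'

foo = None
try:
    foo  # the name is still bound; its value just happens to be None
    bound_after_none = True
except NameError:
    bound_after_none = False

del foo
try:
    foo  # the binding itself is gone now
    bound_after_del = True
except NameError:
    bound_after_del = False
```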
One place I've found del useful is cleaning up extraneous variables in for loops:
for x in some_list:
    do(x)
del x
Now you can be sure that x will be undefined if you use it outside the for loop.
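One caveat worth sketching: if the iterable is empty, the loop never binds the variable, so the del itself raises NameError (inside a function, its subclass UnboundLocalError). The function process below is a hypothetical example, not code from the answer.

```python
def process(items):
    seen = []
    for x in items:
        seen.append(x)
    # Remove the leaked loop variable; this fails if the loop body
    # never ran, because x was never bound in that case.
    del x
    return seen


result = process([1, 2, 3])

try:
    process([])
    empty_ok = True
except NameError:  # UnboundLocalError is a subclass of NameError
    empty_ok = False
```

If the sequence can be empty, the del needs a guard or a try/except around it.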
Deleting a variable is different than setting it to None
Deleting variable names with del is probably something used rarely, but it is something that could not trivially be achieved without a keyword. If you can create a variable name by writing a=1, it is nice that you can theoretically undo this by deleting a.
It can make debugging easier in some cases, as trying to access a deleted variable will raise a NameError.
You can delete class instance attributes
Python lets you write something like:
class A(object):
    def set_a(self, a):
        self.a = a
a = A()
a.set_a(3)
if hasattr(a, "a"):
    print("Hallo")
If you choose to dynamically add attributes to a class instance, you certainly want to be able to undo it by writing
del a.a
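Putting the two pieces together, a minimal sketch of adding an instance attribute dynamically and then undoing it with del; the flag names are illustrative.

```python
class A(object):
    def set_a(self, a):
        self.a = a


a = A()
a.set_a(3)
had_attr = hasattr(a, 'a')        # attribute exists after set_a

del a.a                           # remove it from the instance again
has_attr_after = hasattr(a, 'a')  # the instance no longer has it
```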
There is a specific example of when you should use del (there may be others, but I know about this one offhand): when you are using sys.exc_info() to inspect an exception. This function returns a tuple: the type of the exception that was raised, the exception value, and a traceback.
The first two values are usually sufficient to diagnose an error and act on it, but the third contains the entire call stack between where the exception was raised and where the exception is caught. In particular, if you do something like
try:
    do_evil()
except:
    exc_type, exc_value, tb = sys.exc_info()
    if something(exc_value):
        raise
the traceback, tb ends up in the locals of the call stack, creating a circular reference that cannot be garbage collected. Thus, it is important to do:
try:
    do_evil()
except:
    exc_type, exc_value, tb = sys.exc_info()
    del tb
    if something(exc_value):
        raise
to break the circular reference. In many cases where you would want to call sys.exc_info(), like with metaclass magic, the traceback is useful, so you have to make sure that you clean it up before you can possibly leave the exception handler. If you don't need the traceback, you should delete it immediately, or just do:
exc_type, exc_value = sys.exc_info()[:2]
to avoid it altogether.
Just another thought.
When debugging HTTP applications in a framework like Django, a call stack full of useless, leftover variables can be very painful for developers, especially when the list is very long. At that point, controlling the namespace can be useful.
Using "del" explicitly is also better practice than assigning a variable to None. If you attempt to del a variable that doesn't exist, you'll get a runtime error but if you attempt to set a variable that doesn't exist to None, Python will silently set a new variable to None, leaving the variable you wanted deleted where it was. So del will help you catch your mistakes earlier
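A small sketch of that point, with a deliberate typo (conter instead of counter): del reports the mistake immediately, while the None assignment quietly creates a second name. The variable names are made up for the example.

```python
counter = 1

try:
    del conter  # typo: there is no such name, so this fails loudly
    typo_caught = False
except NameError:
    typo_caught = True

conter = None  # same typo: silently creates a brand-new name instead
original_untouched = (counter == 1)
```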
del is often seen in __init__.py files. Any global variable that is defined in an __init__.py file is automatically "exported" (it will be included in a from module import *). One way to avoid this is to define __all__, but this can get messy and not everyone uses it.
For example, if you had code in __init__.py like
import sys
if sys.version_info < (3,):
    print("Python 2 not supported")
Then your module would export the sys name. You should instead write
import sys
if sys.version_info < (3,):
    print("Python 2 not supported")
del sys
To add a few points to above answers:
del x
Definition of x indicates r -> o (a reference r pointing to an object o) but del x changes r rather than o. It is an operation on the reference (pointer) to object rather than the object associated with x. Distinguishing between r and o is key here.
It removes it from locals().
Removes it from globals() if x belongs there.
Removes it from the stack frame (removes the reference physically from it, but the object itself resides in object pool and not in the stack frame).
Removes it from the current scope. It is very useful to limit the span of definition of a local variable, which otherwise can cause problems.
It is more about declaration of the name rather than definition of content.
It affects where x belongs to, not where x points to. The only physical change in memory is this. For example if x is in a dictionary or list, it (as a reference) is removed from there(and not necessarily from the object pool). In this example, the dictionary it belongs is the stack frame (locals()), which overlaps with globals().
I've found del to be useful for pseudo-manual memory management when handling large data with Numpy. For example:
for image_name in large_image_set:
    large_image = io.imread(image_name)
    height, width, depth = large_image.shape
    large_mask = np.all(large_image == <some_condition>)
    # Clear memory, make space
    del large_image; gc.collect()
    large_processed_image = np.zeros((height, width, depth))
    large_processed_image[large_mask] = (new_value)
    io.imsave("processed_image.png", large_processed_image)
    # Clear memory, make space
    del large_mask, large_processed_image; gc.collect()
This can be the difference between bringing a script to a grinding halt as the system swaps like mad when the Python GC can't keep up, and it running perfectly smooth below a loose memory threshold that leaves plenty of headroom to use the machine to browse and code while it's working.
Force closing a file after using numpy.load:
A niche usage perhaps but I found it useful when using numpy.load to read a file. Every once in a while I would update the file and need to copy a file with the same name to the directory.
I used del to release the file and allow me to copy in the new file.
Note I want to avoid the with context manager as I was playing around with plots on the command line and didn't want to be pressing tab a lot!
See this question.
I would like to elaborate on the accepted answer to highlight the nuance between setting a variable to None versus removing it with del:
Given the variable foo = 'bar', and the following function definition:
def test_var(var):
    if var:
        print('variable tested true')
    else:
        print('variable tested false')
Once initially declared, test_var(foo) yields variable tested true as expected.
Now try:
foo = None
test_var(foo)
which yields variable tested false.
Contrast this behavior with:
del foo
test_var(foo)
which now raises NameError: name 'foo' is not defined.
As an example of what del can be used for, I find it useful in situations like this:
def f(a, b, c=3):
    return '{} {} {}'.format(a, b, c)
def g(**kwargs):
    if 'c' in kwargs and kwargs['c'] is None:
        del kwargs['c']
    return f(**kwargs)
# g(a=1, b=2, c=None) === '1 2 3'
# g(a=1, b=2) === '1 2 3'
# g(a=1, b=2, c=4) === '1 2 4'
These two functions can be in different packages/modules, and the programmer doesn't need to know what default value the argument c in f actually has. So by using kwargs in combination with del you can say "I want the default value for c" by setting it to None (or, in this case, by leaving it out).
You could do the same thing with something like:
def g(a, b, c=None):
    kwargs = {'a': a,
              'b': b}
    if c is not None:
        kwargs['c'] = c
    return f(**kwargs)
However I find the previous example more DRY and elegant.
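A variation on the same idea, as a sketch rather than a recommendation: instead of deleting one key, filter out every argument that was passed as None. Note the assumption that None always means "use the default"; a required argument such as a passed as None would then be missing and f would raise TypeError.

```python
def f(a, b, c=3):
    return '{} {} {}'.format(a, b, c)


def g(**kwargs):
    # Keep only the arguments that were given a real value,
    # so f falls back to its own defaults for the rest.
    supplied = {name: value for name, value in kwargs.items() if value is not None}
    return f(**supplied)


print(g(a=1, b=2, c=None))
print(g(a=1, b=2))
print(g(a=1, b=2, c=4))
```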
When is del useful in python?
You can use it to remove a single element of a list instead of the slice syntax x[i:i+1]=[]. This may be useful if, for example, you are in os.walk and wish to remove an entry from the list of directories so that it is not visited. I would not consider a keyword useful for this though, since one could just make a [].remove(index) method (the existing .remove method is actually search-and-remove-first-instance-of-value).
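The os.walk case could look roughly like this. It is a sketch with made-up directory names, built in a temporary directory so it can be run anywhere; walk_skipping is a hypothetical helper, not part of os:

```python
import os
import tempfile


def walk_skipping(root, skip_name):
    """Return the names of visited directories, pruning any called skip_name."""
    visited = []
    for current, dirs, _files in os.walk(root):   # top-down is the default
        visited.append(os.path.basename(current))
        if skip_name in dirs:
            # Deleting in place tells os.walk not to descend into that entry.
            del dirs[dirs.index(skip_name)]
    return visited


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep", "inner"))
    os.makedirs(os.path.join(root, "skip", "hidden"))
    visited = walk_skipping(root, "skip")

print(visited)
```

The deletion has to happen on the list object that os.walk handed out; rebinding dirs to a new list would not prune anything.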
I think one of the reasons that del has its own syntax is that replacing it with a function might be hard in certain cases given it operates on the binding or variable and not the value it references. Thus if a function version of del were to be created a context would need to be passed in. del foo would need to become globals().remove('foo') or locals().remove('foo') which gets messy and less readable. Still I say getting rid of del would be good given its seemingly rare use. But removing language features/flaws can be painful. Maybe python 4 will remove it :)
The "del" command is very useful for controlling data in an array, for example:
elements = ["A", "B", "C", "D"]
# Remove first element.
del elements[:1]
print(elements)
Output:
['B', 'C', 'D']
del deletes the binding of the variable, not the object that it points to; the object stays alive as long as another name still refers to it.
>>> a = ['a', 'b', 'c']
>>> b = a
>>> del a
>>> b
['a', 'b', 'c']
>>> a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
A simple use case I can think of would be in case you have used a built-in function name as a variable, and you want to use that function after it has been already "overridden" by your variable name.
t = ('a', "letter")
value, type = t
print(value, type)
del type
print(type(value))
Output:
a letter
<class 'str'>
Yet another niche usage:
In pyroot with ROOT5 or ROOT6, "del" may be useful to remove a python object that referred to a no-longer existing C++ object. This allows the dynamic lookup of pyroot to find an identically-named C++ object and bind it to the python name. So you can have a scenario such as:
import ROOT as R
input_file = R.TFile('inputs/___my_file_name___.root')
tree = input_file.Get('r')
tree.Draw('hy>>hh(10,0,5)')
R.gPad.Close()
R.hy # shows that hy is still available. It can even be redrawn at this stage.
tree.Draw('hy>>hh(3,0,3)') # overwrites the C++ object in ROOT's namespace
R.hy # shows that R.hy is None, since the C++ object it pointed to is gone
del R.hy
R.hy # now finds the new C++ object
Hopefully, this niche will be closed with ROOT7's saner object management.
del removes the variable from the current scope until it is assigned again. Setting it to None keeps it in the current scope.
a = "python string"
print(a)
del a
print(a)
a = "new python string"
print(a)
Output:
python string
Traceback (most recent call last):
File "testing.py", line 4, in <module>
print(a)
NameError: name 'a' is not defined
As I have not seen an interactive console answer, I'll be showing one.
With foo = None, the name foo still exists; it simply no longer points to the original object.
With del foo, the name itself is removed (and the object is destroyed too, once nothing else references it).
So if you test something like if foo is None after foo was deleted, it will raise NameError, because the name was removed by del.
Deletion of a target list recursively deletes each target, from left to right.
Meanwhile, foo = None just makes the name refer to None, so the name is still alive.
[...]In Python, variables are references to objects and any variable can reference any object[...]
Link to quote 1
Link to quote 2
Another niche case, but useful.
from getpass import getpass
password = getpass()  # "pass" is a reserved keyword and cannot be used as a name
token = get_auth_token(password)
del password
# Assume more code here...
After the deletion of the password variable, you don't run the risk of it being printed out later by mistake, or otherwise ending up in a log or stack trace.
Here are my two cents:
I have an optimization problem for which I use the NLopt library.
I was initializing the class and some of its methods, which I was using in several other parts of the code.
I was getting random results even when solving the same numerical problem.
I realized that, by doing so, some spurious data was left in the object when it should have been clean. After using del, I guess the memory is properly cleared; it might be an internal issue of that class, where some variables do not like being reused without a proper constructor call.
Once I had to use:
del serial
serial = None
because using only:
serial = None
didn't release the serial port fast enough to immediately open it again.
From that lesson I learned that del really meant: "GC this NOW! and wait until it's done" and that is really useful in a lot of situations. Of course, you may have a system.gc.del_this_and_wait_balbalbalba(obj).
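For comparison, here is a small sketch of what happens on CPython, which frees an object as soon as its last reference goes away. FakePort is a hypothetical stand-in that just logs its release; a real serial library may hold extra references or release the device asynchronously, which could explain the behaviour described above.

```python
events = []


class FakePort:
    """Hypothetical stand-in for a serial port that logs when it is released."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append("released " + self.name)


def release_with_del():
    port = FakePort("A")
    del port                 # last reference gone: __del__ runs right here on CPython
    return list(events)


def release_by_rebinding():
    port = FakePort("B")
    port = None              # rebinding also drops the last reference
    return list(events)


after_del = release_with_del()
after_rebind = release_by_rebinding()
print(after_del, after_rebind)
```

Other Python implementations may run __del__ later, so an explicit close() call or a with block is the more dependable way to release a device.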
del is the equivalent of "unset" in many languages
and as a cross reference point moving from another language to python..
people tend to look for commands that do the same thing that they used to do in their first language...
also
setting a var to "" or None doesn't really remove the var from scope..it just empties its value
the name of the var itself would still be stored in memory...why?!?
in a memory intensive script..keeping trash behind it's just a no no
and anyways...every language out there has some form of an "unset/delete" var function..why not python?
In many languages (and places) there is a nice practice of creating local scopes by creating a block like this.
void foo()
{
    ... Do some stuff ...
    if(TRUE)
    {
        char a;
        int b;
        ... Do some more stuff ...
    }
    ... Do even more stuff ...
}
How can I implement this in Python without getting the unexpected indent error and without using some sort of if True: trick?
Why do you want to create new scopes in python anyway?
The normal reason for doing it in other languages is variable scoping, but that doesn't happen in python.
if True:
    a = 10
print(a)  # a is still visible here: the if block did not create a scope
In Python, scoping is of three types : global, local and class. You can create specialized 'scope' dictionaries to pass to exec / eval(). In addition you can use nested scopes
(defining a function within another). I found these to be sufficient in all my code.
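The exec approach might be sketched like this; the snippet and the name scratch are made up for illustration, and exec should of course only be fed code you trust:

```python
code = """
a = 10
b = a * 2
"""

scratch = {}               # receives every name the snippet assigns
exec(code, {}, scratch)    # empty globals, scratch used as the local namespace

print(scratch)
```

Nothing from the snippet is bound in the surrounding namespace; the results are only reachable through the scratch dictionary.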
As Douglas Leeder said already, the main reason to use it in other languages is variable scoping and that doesn't really happen in Python. In addition, Python is the most readable language I have ever used. It would go against the grain of readability to do something like if-true tricks (Which you say you want to avoid). In that case, I think the best bet is to refactor your code into multiple functions, or use a single scope. I think that the available scopes in Python are sufficient to cover every eventuality, so local scoping shouldn't really be necessary.
If you just want to create temp variables and let them be garbage collected right after using them, you can use
del varname
when you don't want them anymore.
If its just for aesthetics, you could use comments or extra newlines, no extra indentation, though.
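For practical purposes, code in a Python function sees a local scope and a global scope (plus the scopes of any enclosing functions and the built-ins); blocks such as if or for do not create a scope of their own. Variables that are assigned in a function are in its local scope no matter what indentation level they were created at. Calling a nested function will have the effect that you're looking for.
def foo():
    a = 1
    def bar():
        b = 2
        print(a, b)  # will print "1 2"
    bar()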
Still like everyone else, I have to ask you why you want to create a limited scope inside a function.
Variables in list comprehensions (Python 3+) and generator expressions are local:
>>> i = 0
>>> [i+1 for i in range(10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> i
0
but why exactly do you need this?
A scope is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace...
Please, read the documentation and clarify your question.
btw, you don't need if(TRUE){} in C, a simple {} is sufficient.
As mentioned in the other answers, there is no analogous functionality in Python to creating a new scope with a block, but when writing a script or a Jupyter Notebook, I often (ab)use classes to introduce new namespaces for similar effect. For example, in a notebook where you might have a model "Foo", "Bar" etc. and related variables you might want to create a new scope to avoid having to reuse names like
model = FooModel()
optimizer = FooOptimizer()
...
model = BarModel()
optimizer = BarOptimizer()
or suffix names like
model_foo = ...
optimizer_foo = ...
model_bar = ...
optimizer_bar = ...
Instead you can introduce new namespaces with
class Foo:
    model = ...
    optimizer = ...
    loss = ...
class Bar:
    model = ...
    optimizer = ...
    loss = ...
and then access the variables as
Foo.model
Bar.optimizer
...
I find that using namespaces this way to create new scopes makes code more readable and less error-prone.
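If defining a class only to hold a few names feels heavy, types.SimpleNamespace from the standard library gives a similar grouping. A small sketch, with placeholder strings standing in for the real model and optimizer objects:

```python
from types import SimpleNamespace

# Placeholder values; in a notebook these would be real model/optimizer objects.
foo = SimpleNamespace(model="foo-model", optimizer="foo-optimizer")
bar = SimpleNamespace(model="bar-model", optimizer="bar-optimizer")

foo.loss = 0.5          # attributes can be added later, like class attributes
del bar.optimizer       # and removed again when no longer needed

print(foo.model, foo.loss, hasattr(bar, "optimizer"))
```

As with the class version, this groups names rather than creating a real scope: foo and bar themselves still live in the enclosing namespace.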
While the leaking scope is indeed a feature that is often useful,
I have created a package to simulate block scoping (with selective leaking of your choice, typically to get the results out) anyway.
from scoping import scoping
a = 2
with scoping():
    assert(2 == a)
    a = 3
    b = 4
    scoping.keep('b')
    assert(3 == a)
assert(2 == a)
assert(4 == b)
https://pypi.org/project/scoping/
I would see this as a clear sign that it's time to create a new function and refactor the code. I can see no reason to create a new scope like that. Any reason in mind?
def a():
    def b():
        pass
    b()
If I just want some extra indentation or am debugging, I'll use if True:
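As a minimal sketch of the refactoring suggested above (function and variable names are made up for illustration): temporaries created inside a helper disappear when it returns, so only the result reaches the caller.

```python
def summarize(values):
    # total and count exist only while this call is running.
    total = sum(values)
    count = len(values)
    return total / count


average = summarize([1, 2, 3, 4])
print(average)
```

The caller ends up with one new name, average, instead of three.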
Like so, for arbitrary name t:
### at top of function / script / outer scope (maybe just big jupyter cell)
try: t
except NameError:
    class t:
        pass
else:
    raise NameError('please `del t` first')
#### Cut here -- you only need 1x of the above -- example usage below ###
t.tempone = 5 # make new temporary variable that definitely doesn't bother anything else.
# block of calls here...
t.temptwo = 'bar' # another one...
del t.tempone # you can have overlapping scopes this way
# more calls
t.tempthree = t.temptwo; del t.temptwo # done with that now too
print(t.tempthree)
# etc, etc -- any number of variables will fit into t.
### At end of outer scope, to return `t` to being 'unused'
del t
All the above could be in a function def, or just anyplace outside defs along a script.
You can add or del new elements to an arbitrary-named class like that at any point. You really only need one of these -- then manage your 'temporary' namespace as you like.
The del t statement isn't necessary if this is in a function body, but if you include it, then you can copy/paste chunks of code far apart from each other and have them work how you expect (with different uses of 't' being entirely separate, each use starting with that try: t... block, and ending with del t).
This way if t had been used as a variable already, you'll find out, and it doesn't clobber t so you can find out what it was.
This is less error prone than using a series of randomly named functions just to call them once -- since it avoids having to deal with their names, or remembering to call them after their definition, especially if you have to reorder long code.
This basically does exactly what you want: Make a temporary place to put things you know for sure won't collide with anything else, and which you are responsible for cleaning up inside as you go.
Yes, it's ugly, and probably discouraged -- you will be directed to decompose your work into a set of smaller, more reusable functions.
As others have suggested, the python way to execute code without polluting the enclosing namespace is to put it in a class or function. This presents a slight and usually harmless problem: defining the function puts its name in the enclosing namespace. If this causes harm to you, you can name your function using Python's conventional temporary variable "_":
def _():
polluting_variable = foo()
...
_() # Run the code before something overwrites the variable.
This can be done recursively as each local definition masks the definition from the enclosing scope.
This sort of thing should only be needed in very specific circumstances. An example where it is useful is when using Databricks' %run magic, which executes the contents of another notebook in the current notebook's global scope. Wrapping the child notebook's commands in temporary functions prevents them from polluting the global namespace.