Have PyCharm extract code in a Python function to a nested function

I have had PyCharm 2017.3 extract some code inside a top-level function to another top-level function, and it does a good job.
However, sometimes I would like to not put the extracted function on top level, but rather it should become a function nested inside the existing function. The rationale is re-using code that is only used inside a function, but several times there. I think that this "sub-function" should ideally not be accessible outside of the original function.
How can I do this? I have not seen any options in the refactoring dialog.
Example
a) Original code:
def foo():
    a = ''
    if a == '':
        b = 'empty'
    else:
        b = 'not empty'
    return b
b) What extracting does:
def foo():
    a = ''
    b = bar(a)
    return b
def bar(a):
    if a == '':
        b = 'empty'
    else:
        b = 'not empty'
    return b
c) What I would like to have:
def foo():
    def bar():
        if a == '':
            b = 'empty'
        else:
            b = 'not empty'
        return b
    a = ''
    b = bar()
    return b
I am aware that bar's b will shadow foo's b unless it is renamed in the process. I also thought about completely accepting the shadowing by not returning or requesting b and just modifying it inside bar.
Please also let me know if what I want is not a good thing for any reason.

It is considered good practice to keep function boundaries isolated: get data as parameters and spit data out as return values, with as few side effects as possible. That said, there are a few special cases where you break this rule, many of them involving closures. Closures are not as idiomatic in Python as they are in JavaScript - personally I think that is good, but many people disagree.
There is one place where closures are absolutely idiomatic in Python: decorators. For other cases where you would use a closure to avoid global variables and provide some form of data hiding, Python has other alternatives. Although some people advocate using a closure instead of a class when the class would have just one method, a plain function combined with functools.partial can be even better.
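Here is a minimal sketch of the two styles, built on the question's foo/bar example. The names foo_nested, describe and describe_b are hypothetical and exist only for illustration. The first version relies on the enclosing scope. The second passes the data explicitly and uses functools.partial to pre-bind an argument that a closure would otherwise capture:

```python
from functools import partial


def foo_nested():
    """Closure version: bar() reads `a` from the enclosing function."""
    a = ''

    def bar():
        return 'empty' if a == '' else 'not empty'

    return bar()


def bar(a):
    """Top-level version: everything bar() needs arrives as a parameter."""
    return 'empty' if a == '' else 'not empty'


def describe(label, a):
    """Hypothetical helper with an extra argument we want to fix in advance."""
    return label + ': ' + bar(a)


# partial() pre-binds `label`, much as a closure would capture it.
describe_b = partial(describe, 'b')

print(foo_nested())    # empty
print(bar('x'))        # not empty
print(describe_b(''))  # b: empty
```

The top-level bar can be tested and reused on its own. The nested one cannot be reached from outside foo_nested, which is exactly the trade-off the question asks about.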
This is my guess about why there is no such feature in PyCharm: we almost never do it in Python. Instead we tend to keep the function signature as foo(x) even when we could get x from the enclosing scope. Hell, in Python our methods receive self explicitly where most languages have an implicit this. If you write code this way, PyCharm already does everything that is needed when refactoring: it fixes the indentation when you cut & paste.
If you catch yourself doing this kind of refactoring a lot, I guess you are coming from a language where closures are more idiomatic, like JavaScript or Lisp.
So my point is: this "nested to global" or "global to nested" function refactoring feature does not exist in PyCharm because nested functions relying on the enclosing scope are not idiomatic in Python except as closures - and even closures are not that idiomatic outside of decorators.
If you care enough, go ahead and file a feature request at their issue tracker or upvote related tickets like #PY-12802 and #PY-2701 - as you can see, those have not attracted a lot of attention, possibly for the reasons above.

Related

Identify unintentional read/write of global variables inside a python function? For example using static analysis?

One of the things I find frustrating with python is that if I write a function like this:
def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b
And then run it like so:
SomeDict = {}
SomeDict['SomeKey'] = 0
b = UnintentionalValueChangeOfGlobal(10)
print(SomeDict['SomeKey'])
Python will: 1) find and use SomeDict during the function call even though I have forgotten to provide it as an input to the function; 2) permanently change the value of SomeDict['SomeKey'] even though it is not included in the return statement of the function.
For me this often leads to variables unintentionally changing values - SomeDict['SomeKey'] in this case becomes 110 after the function is called when the intent was to only manipulate the function output b.
In this case I would have preferred that python: 1) crashes with an error inside the function saying that SomeDict is undefined; 2) under no circumstances permanently changes the value of any variable other than the output b after the function has been called.
I understand that it is not possible to disable the use of globals altogether in python, but is there a simple method (a module or an IDE etc.) which can perform static analysis on my python functions and warn me when a function is using and/or changing the value of variables which are not the function's output? I.e., warn me whenever variables are used or manipulated which are not local to the function?

One of the reasons Python doesn't provide any obvious and easy way to prevent accessing (undeclared) global names in a function is that in Python everything (well, everything that can be assigned to a name at least) is an object, including functions, classes and modules. Preventing a function from accessing undeclared global names would therefore make for quite verbose code... And nested scopes (closures etc.) don't help either.
And, of course, despite globals being evil, there ARE still legitimate reasons for mutating a global object sometimes. FWIW, even linters (well, pylint and pyflakes at least) don't seem to have any option to detect this AFAICT. You'll have to double-check for yourself, though, as I might have overlooked it, or it might exist as a pylint extension or in another linter.
OTOH, I have very seldom had bugs coming from such an issue in 20+ years (I can't remember a single occurrence, actually). Routinely applying basic good practices seems effective enough to prevent such issues: short functions avoiding side effects as much as possible, meaningful names and good naming conventions, unit testing at least the critical parts, etc.
One of the points here is that I have a rule that non-callable globals are to be treated as (pseudo) constants, which is denoted by naming them ALL_UPPER. This makes it very obvious when you actually either mutate or rebind one...
As a more general rule: Python is by nature a very dynamic language (heck, you can even change the class of an object at runtime...) and with a "we're all consenting adults" philosophy, so it's indeed "lacking" most of the safety guards you'll find in more "B&D" languages like Java and relies instead on conventions, good practices and plain common sense.
Now, Python is not only very dynamic but also exposes many of its internals. So you can certainly (if this doesn't already exist) write a pylint extension that would at least detect global names in function code. Hint: you can access the code object of a function with yourfunc.func_code (py2) or yourfunc.__code__ (py3), and then inspect which names are used in the code, e.g. through its co_names attribute or the dis module. But unless you have to deal with a team of sloppy, undisciplined devs (in which case you have another issue - there's no technical solution to stupidity), my very humble opinion is that you're wasting your time.
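Following that hint, here is a minimal Python 3 sketch of the raw information such a check could build on, using the function from the question. It is not a pylint extension. It just shows the code object's co_varnames and co_names tuples, and the LOAD_GLOBAL / STORE_GLOBAL instructions reported by the standard dis module:

```python
import dis

SomeDict = {'SomeKey': 0}


def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b


code = UnintentionalValueChangeOfGlobal.__code__

# co_varnames holds the function's own locals (arguments first).
print(code.co_varnames)  # ('a', 'b')

# co_names holds the other names the bytecode refers to (globals,
# builtins, attribute names, ...). Here that is only SomeDict.
print(code.co_names)     # ('SomeDict',)

# dis separates real global accesses from, e.g., attribute lookups.
global_names = {
    ins.argval
    for ins in dis.get_instructions(UnintentionalValueChangeOfGlobal)
    if ins.opname in ('LOAD_GLOBAL', 'STORE_GLOBAL')
}
print(global_names)      # {'SomeDict'}
```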

Ideally I would have wanted the global-checking functionality I’m searching for to be implemented within an IDE and continuously used to assess the use of globals in functions. But since that does not appear to exist I threw together an ad hoc function which takes a python function as input and then looks at the bytecode instructions of the function to see if there are any LOAD_GLOBAL or STORE_GLOBAL instructions present. If it finds any, it tries to assess the type of the global and compare it to a list of user provided types (int, float, etc..). It then prints out the name of all global variables used by the function.
The solution is far from perfect and quite prone to false positives. For instance, if np.unique(x) is used in a function before numpy has been imported (import numpy as np) it will erroneously identify np as a global variable instead of a module. It will also not look into nested functions etc.
But for simple cases such as the example in this post it seems to work fine. I just used it to scan through all the functions in my codebase and it found another global usage that I was unaware of – so at least for me it is useful to have!
Here is the function:
def CheckAgainstGlobals(function, vartypes):
    """
    Function for checking if another function reads/writes data from/to global
    variables. Only variables of the types contained within 'vartypes' and
    unknown types are included in the output.
    Inputs:
        function - a python function
        vartypes - a list of variable types (int, float, dict,...)
    Example:
        # Define a function
        def testfcn(a):
            a = 1 + b
            return a
        # Check if the function read/writes global variables.
        CheckAgainstGlobals(testfcn,[int, float, dict, complex, str])
        # Should output:
        >> Global-check of fcn: testfcn
        >> Loaded global variable: b (of unknown type)
    """
    import dis
    globalsFound = []
    # Disassemble the function's bytecode in a human-readable form.
    bytecode = dis.Bytecode(function)
    # Step through each instruction in the function.
    for instr in bytecode:
        # Check if instruction is to either load or store a global.
        if instr.opname == 'LOAD_GLOBAL' or instr.opname == 'STORE_GLOBAL':
            # Check if it's possible to determine the type of the global by
            # looking its name up in the namespace the function was defined in.
            try:
                value = eval(instr.argval, function.__globals__)
                TypeAvailable = True
            except Exception:
                TypeAvailable = False
            # Determine if the global variable is being loaded or stored and
            # check if its value matches any of the vartypes provided as input.
            if instr.opname == 'LOAD_GLOBAL':
                if TypeAvailable:
                    for t in vartypes:
                        if isinstance(value, t):
                            s = ('Loaded global variable: %s (of type %s)' % (instr.argval, t))
                            if s not in globalsFound:
                                globalsFound.append(s)
                else:
                    s = ('Loaded global variable: %s (of unknown type)' % (instr.argval))
                    if s not in globalsFound:
                        globalsFound.append(s)
            if instr.opname == 'STORE_GLOBAL':
                if TypeAvailable:
                    for t in vartypes:
                        if isinstance(value, t):
                            s = ('Stored global variable: %s (of type %s)' % (instr.argval, t))
                            if s not in globalsFound:
                                globalsFound.append(s)
                else:
                    s = ('Stored global variable: %s (of unknown type)' % (instr.argval))
                    if s not in globalsFound:
                        globalsFound.append(s)
    # Print out summary of detected global variable usage.
    if len(globalsFound) == 0:
        print('\nGlobal-check of fcn: %s. No read/writes of global variables were detected.' % (function.__code__.co_name))
    else:
        print('\nGlobal-check of fcn: %s' % (function.__code__.co_name))
        for s in globalsFound:
            print(s)
When used on the function in the example directly after the function has been declared, it will warn about the usage of the global variable SomeDict, but it will not know its type:
def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b
# Will find the global, but not know its type.
CheckAgainstGlobals(UnintentionalValueChangeOfGlobal,[int, float, dict, complex, str])
>> Global-check of fcn: UnintentionalValueChangeOfGlobal
>> Loaded global variable: SomeDict (of unknown type)
When used after SomeDict has been defined it also detects that the global is a dict:
SomeDict = {}
SomeDict['SomeKey'] = 0
b = UnintentionalValueChangeOfGlobal(10)
print(SomeDict['SomeKey'])
# Will find the global, and also see its type.
CheckAgainstGlobals(UnintentionalValueChangeOfGlobal,[int, float, dict, complex, str])
>> Global-check of fcn: UnintentionalValueChangeOfGlobal
>> Loaded global variable: SomeDict (of type <class 'dict'>)
Note: in its current state the function fails to detect that SomeDict['SomeKey'] changes value. I.e., it only detects the load instruction, not that the previous value of the global is manipulated. That is because the instruction STORE_SUBSCR seems to be used in this case instead of STORE_GLOBAL. But the use of the global is still detected (since it is being loaded) which is enough for me.
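The effect described in the note can be seen by listing only the relevant instructions for the example function. This is a small sketch. Exact opcode sequences vary between Python versions, so it only filters on opcode names:

```python
import dis


def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b


interesting = ('LOAD_GLOBAL', 'STORE_GLOBAL', 'STORE_SUBSCR')
ops = [
    (ins.opname, ins.argval)
    for ins in dis.get_instructions(UnintentionalValueChangeOfGlobal)
    if ins.opname in interesting
]
# The subscript assignment loads the dict with LOAD_GLOBAL and then mutates
# it in place with STORE_SUBSCR. No STORE_GLOBAL is emitted because the
# name SomeDict itself is never rebound.
for op in ops:
    print(op)
```

Detecting such in-place mutations would therefore mean tracking which loaded global ends up as the target of a STORE_SUBSCR (or STORE_ATTR). The function above does not attempt that.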

You can check the variable using globals():
def UnintentionalValueChangeOfGlobal(a):
    if 'SomeDict' in globals():
        raise Exception('Var in globals')
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b
SomeDict = {}
SomeDict['SomeKey'] = 0
b = UnintentionalValueChangeOfGlobal(10)
print(SomeDict['SomeKey'])

Limit Python function scope to local variables only

Is there a way to limit function so that it would only have access to local variable and passed arguments?
For example, consider this code
a = 1
def my_fun(x):
    print(x)
    print(a)
my_fun(2)
Normally the output will be
2
1
However, I want to limit my_fun to local scope so that print(x) would work but throw an error on print(a). Is that possible?

I feel like I should preface this with: Do not actually do this.
You (sort of) can with functions, but you will also disable calls to all other global methods and variables, which I do not imagine you would like to do.
You can use the following decorator to have the function act like there are no variables in the global namespace:
import types
noglobal = lambda f: types.FunctionType(f.__code__, {})
And then call your function:
a = 1
@noglobal
def my_fun(x):
    print(x)
    print(a)
my_fun(2)
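However, this actually results in a different error than the one you want:
NameError: name 'print' is not defined
By not allowing globals to be used, you cannot use print() either. (This applies to Python versions before 3.10. Since 3.10, a function whose globals dict has no __builtins__ entry falls back to the current builtins, so print() keeps working and the call fails on a instead.)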
Now, you could pass in the functions that you want to use as parameters, which would allow you to use them inside the function, but this is not a good approach and it is much better to just keep your globals clean.
a = 1
@noglobal
def my_fun(x, p):
    p(x)
    p(a)
my_fun(2, print)
Output:
2
NameError: name 'a' is not defined
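A variant of the same types.FunctionType trick is sketched below. It passes an explicit __builtins__ entry, so print() keeps working on every Python 3 version and only module-level names disappear. Copying __defaults__ and __closure__ goes beyond the one-liner above, so that functions with default arguments or closures still work:

```python
import builtins
import types


def noglobal(f):
    """Rebuild f so that its globals contain nothing but the builtins."""
    return types.FunctionType(
        f.__code__,                  # same compiled body
        {'__builtins__': builtins},  # fresh globals: builtins only
        f.__name__,
        f.__defaults__,
        f.__closure__,
    )


a = 1


@noglobal
def my_fun(x):
    print(x)  # builtins such as print() are still found
    print(a)  # the module-level `a` is not: NameError


try:
    my_fun(2)
except NameError as exc:
    print(exc)  # name 'a' is not defined
```

The same caveat as above still applies. Every imported module and helper function is a global too, so a decorated function cannot use any of them unless they are passed in.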

Nope. The scoping rules are part of a language's basic definition. To change this, you'd have to alter the compiler to exclude items higher on the context stack, but still within the user space. You obviously don't want to limit all symbols outside the function's context, as you've used one in your example: the external function print. :-)

Alternative to exec

I'm currently trying to code a Python (3.4.4) GUI with tkinter which should allow to fit an arbitrary function to some datapoints. To start easy, I'd like to create some input-function and evaluate it. Later, I would like to plot and fit it using curve_fit from scipy.
In order to do so, I would like to create a dynamic (fitting) function from a user-input-string. I found and read about exec, but people say that (1) it is not safe to use and (2) there is always a better alternative (e.g. here and in many other places). So, I was wondering what would be the alternative in this case?
Here is some example code with two nested functions which works but it's not dynamic:
def buttonfit_press():
    def f(x):
        return x+1
    return f
print(buttonfit_press()(4))
And here is some code that gives rise to NameError: name 'f' is not defined before I can even start to use xval:
def buttonfit_press2(xval):
    actfitfunc = "f(x)=x+1"
    execstr = "def {}:\n return {}\n".format(actfitfunc.split("=")[0], actfitfunc.split("=")[1])
    exec(execstr)
    return f
print(buttonfit_press2(4))
An alternative approach with types.FunctionType discussed here (10303248) wasn't successful either...
So, my question is: Is there a good alternative I could use for this scenario? Or if not, how can I make the code with exec run?
I hope it's understandable and not too vague. Thanks in advance for your ideas and input.
@Gábor Erdős:
Either I don't understand or I disagree. If I code the same segment in the mainloop, it recognizes f and I can execute the code segment from execstr:
actfitfunc = "f(x)=x+1"
execstr = "def {}:\n return {}\n".format(actfitfunc.split("=")[0], actfitfunc.split("=")[1])
exec(execstr)
print(f(4))
>>> 5
@Łukasz Rogalski:
Printing execstr seems fine to me:
def f(x):
 return x+1
Indentation error is unlikely due to my editor, but I double-checked - it's fine.
Introducing my_locals, calling it in exec and printing in afterwards shows:
{'f': <function f at 0x000000000348D8C8>}
However, I still get NameError: name 'f' is not defined.
@user3691475:
Your example is very similar to my first example. But this is not "dynamic" in my understanding, i.e. one cannot change the output of the function while the code is running.
@Dunes:
I think this is going in the right direction, thanks. However, I don't understand yet how I can evaluate and use this function in the next step? What I mean is: in order to be able to fit it, I have to extract fitting variables (i.e. a in f(x)=a*x+b) or evaluate the function at various x-values (i.e. print(f(3.14))).

The problem with exec/eval, is that they can execute arbitrary code. So to use exec or eval you need to either carefully parse the code fragment to ensure it doesn't contain malicious code (an incredibly hard task), or be sure that the source of the code can be trusted. If you're making a small program for personal use then that's fine. A big program that's responsible for sensitive data or money, definitely not. It would seem your use case counts as having a trusted source.
If all you want is to create an arbitrary function at runtime, then just use a combination of the lambda expression and eval, e.g.:
func_str = "lambda x: x + 1" # equates to f(x)=x+1
func = eval(func_str)
assert func(4) == 5
The reason why your attempt isn't working is that exec, called inside a function without explicit namespaces, runs in the dictionary returned by locals(). In the context of a function, locals() returns a copy (snapshot) of the local namespace, and mutations to that dictionary do not affect the function's actual local variables, so f never becomes a local name. You would need to do something like:
def g():
    src = """
def f(x):
    return x + 1
"""
    exec_namespace = {}  # exec will place the function f in this dictionary
    exec(src, exec_namespace)
    return exec_namespace['f']  # retrieve f
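The follow-up question asks how to extract the fit variables (a and b in f(x)=a*x+b). One possible extension of the lambda + eval idea is the hypothetical helper below (make_fit_function is not from any library). It uses the standard ast module to find the parameter names in the user's expression before building the lambda. It assumes trusted input, just like the answer above, and treats every name from the math module (including e and pi) as a constant rather than a parameter:

```python
import ast
import math


def make_fit_function(user_input):
    """Turn a string like 'f(x)=a*x+b' into (callable, parameter_names).

    Assumptions: the left-hand side names exactly one variable, and every
    other bare name on the right-hand side that is not in the math module
    is treated as a fit parameter. Input must come from a trusted source.
    """
    lhs, rhs = user_input.split('=', 1)
    var = lhs[lhs.index('(') + 1:lhs.index(')')].strip()

    # Collect all bare names used in the expression (x, a, b, sin, ...).
    tree = ast.parse(rhs.strip(), mode='eval')
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    params = sorted(names - {var} - set(dir(math)))

    # Build e.g. 'lambda x, a, b: a*x+b' and evaluate it with the math
    # functions visible and the builtins hidden. This limits accidents;
    # it does NOT make eval safe for untrusted input.
    src = 'lambda {}: {}'.format(', '.join([var] + params), rhs.strip())
    namespace = dict(vars(math))
    namespace['__builtins__'] = {}
    return eval(src, namespace), params


f, params = make_fit_function('f(x)=a*x+b')
print(params)            # ['a', 'b']
print(f(3.0, 2.0, 1.0))  # 7.0
```

Because the parameters follow x in the signature, f has the shape that scipy.optimize.curve_fit expects (curve_fit(f, xdata, ydata)). params gives the names of the fitted values in the same order.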

I'm not sure what exactly you are trying to do, i.e. what functions are allowed, what operations are permitted, etc.
Here is an example of a function generator with one dynamic parameter:
>>> def generator(n):
...     def f(x):
...         return x+n
...     return f
...
>>> plus_one=generator(1)
>>> print(plus_one(4))
5

does python 2.5 have an equivalent to Tcl's uplevel command?

Does python have an equivalent to Tcl's uplevel command? For those who don't know, the "uplevel" command lets you run code in the context of the caller. Here's how it might look in python:
def foo():
    answer = 0
    print "answer is", answer  # should print 0
    bar()
    print "answer is", answer  # should print 42
def bar():
    uplevel("answer = 42")
It's more than just setting variables, however, so I'm not looking for a solution that merely alters a dictionary. I want to be able to execute any code.

In general, what you ask is not possible (with the results you no doubt expect). E.g., imagine the "any code" is x = 23. Will this add a new variable x to your caller's set of local variables, assuming you do find a black-magical way to execute this code "in the caller"? No it won't -- the crucial optimization performed by the Python compiler is to define once and for all, when def executes, the exact set of local variables (all the barenames that get assigned, or otherwise bound, in the function's body), and turn every access and setting to those barenames into very fast indexing into the stackframe. (You could systematically defeat that crucial optimization e.g. by having an exec '' at the start of every possible caller -- and see your system's performance crash through the floor in consequence).
Except for assigning to the caller's local barenames, exec thecode in theglobals, thelocals may do roughly what you want, and the inspect module lets you get the locals and globals of the caller in a semi-reasonable way (in as far as deep black magic -- which would make me go postal on any coworker suggesting it be perpetrated in production code -- can ever be honored with the undeserved praise of calling it "semi-reasonable", that is;-).
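A Python 3 sketch of that exec + inspect approach follows (the question is about 2.5, so the syntax differs). uplevel is a hypothetical helper, and inspect.currentframe() may return None on Python implementations without stack frame support. The sketch reads the caller's locals and mutates the objects they refer to. Whether a rebinding such as answer = 42 sticks depends on the version: before CPython 3.13 it is silently lost, and PEP 667 made frame.f_locals write-through in 3.13. That is exactly the kind of behavior not to rely on:

```python
import inspect


def uplevel(src):
    """Hypothetical helper: run `src` with the caller's globals and locals."""
    caller = inspect.currentframe().f_back
    try:
        exec(src, caller.f_globals, caller.f_locals)
    finally:
        del caller  # avoid keeping the frame alive through a reference cycle


def foo():
    answer = 0
    log = []
    # Reading the caller's locals and mutating the objects they point to works.
    uplevel("log.append(answer + 42)")
    return log


print(foo())  # [42]
```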
But you do specify "I want to be able to execute any code." and the only solution to that unambiguous specification (and thanks for being so precise, as it makes answering easier!) is: then, use a different programming language.

Is the third party library written in Python? If yes, you could rewrite and rebind the function "foo" at runtime with your own implementation. Like so:
import third_party
original_foo = third_party.foo
def my_foo(*args, **kwds):
    # do your magic...
    return original_foo(*args, **kwds)
third_party.foo = my_foo
I guess monkey-patching is slightly better than rewriting frame locals. ;)
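To make that runnable without a real library, the sketch below builds a stand-in module with types.ModuleType. The module name third_party and its foo are hypothetical placeholders:

```python
import types

# Stand-in for the real third-party module (hypothetical), built in place
# so the sketch runs without installing anything.
third_party = types.ModuleType('third_party')
third_party.foo = lambda x: x * 2

original_foo = third_party.foo


def my_foo(*args, **kwds):
    # do your magic...
    print('calling foo with', args, kwds)
    return original_foo(*args, **kwds)  # pass the original result through


third_party.foo = my_foo

print(third_party.foo(21))  # prints the log line, then 42
```

One caveat: code that did `from third_party import foo` before the patch keeps a reference to the original function. Only lookups that go through the module attribute (third_party.foo) see the replacement.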

How can one create new scopes in python

In many languages (and places) there is a nice practice of creating local scopes by creating a block like this.
void foo()
{
    ... Do some stuff ...
    if(TRUE)
    {
        char a;
        int b;
        ... Do some more stuff ...
    }
    ... Do even more stuff ...
}
How can I implement this in Python without getting the unexpected indent error and without using some sort of if True: tricks?

Why do you want to create new scopes in python anyway?
The normal reason for doing it in other languages is variable scoping, but that doesn't happen in python.
if True:
    a = 10
print a

In Python, scoping is of three types: global, local and class. You can create specialized 'scope' dictionaries to pass to exec / eval(). In addition, you can use nested scopes (defining a function within another). I found these to be sufficient in all my code.
As Douglas Leeder said already, the main reason to use it in other languages is variable scoping and that doesn't really happen in Python. In addition, Python is the most readable language I have ever used. It would go against the grain of readability to do something like if-true tricks (Which you say you want to avoid). In that case, I think the best bet is to refactor your code into multiple functions, or use a single scope. I think that the available scopes in Python are sufficient to cover every eventuality, so local scoping shouldn't really be necessary.

If you just want to create temp variables and let them be garbage collected right after using them, you can use
del varname
when you don't want them anymore.
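A short sketch of the del approach (the function and names are made up for illustration). After del, the name is gone from the function's namespace. In CPython the list is reclaimed as soon as its last reference disappears, while other implementations may collect it later:

```python
def summarize(values):
    squares = [v * v for v in values]  # temporary helper list
    total = sum(squares)
    del squares  # unbind the name; the list can now be reclaimed
    try:
        squares  # referencing the deleted local raises an error
    except NameError:  # UnboundLocalError is a subclass of NameError
        still_bound = False
    else:
        still_bound = True
    return total, still_bound


print(summarize([1, 2, 3]))  # (14, False)
```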
If it's just for aesthetics, you could use comments or extra blank lines rather than extra indentation.

Within a function, Python does not create a new scope for each indented block. Names are looked up in the local, enclosing-function, global and built-in scopes, and variables that are assigned in a function are local to it no matter what indentation level they were created at. Calling a nested function will have the effect that you're looking for.
def foo():
    a = 1
    def bar():
        b = 2
        print a, b  # will print "1 2"
    bar()
Still like everyone else, I have to ask you why you want to create a limited scope inside a function.
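A small Python 3 sketch makes the contrast explicit (names are illustrative). A name bound inside an if-block is an ordinary local of the whole function, while a name bound inside a nested function stays private to it:

```python
def foo():
    if True:
        leaked = 'still visible'  # an if-block does not start a new scope
    result = [leaked]

    def bar():
        hidden = 2  # local to bar(); it disappears when bar() returns
        return hidden

    result.append(bar())
    result.append('hidden' in locals())  # foo() never sees bar's local
    return result


print(foo())  # ['still visible', 2, False]
```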

Variables in list comprehensions (Python 3+) and generator expressions are local:
>>> i = 0
>>> [i+1 for i in range(10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> i
0
But why exactly do you need this?

From the Python tutorial (section "Python Scopes and Namespaces"): A scope is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace...
Please read the documentation and clarify your question.
btw, you don't need if(TRUE){} in C, a simple {} is sufficient.

As mentioned in the other answers, there is no analogous functionality in Python to creating a new scope with a block, but when writing a script or a Jupyter Notebook, I often (ab)use classes to introduce new namespaces for similar effect. For example, in a notebook where you might have a model "Foo", "Bar" etc. and related variables you might want to create a new scope to avoid having to reuse names like
model = FooModel()
optimizer = FooOptimizer()
...
model = BarModel()
optimizer = BarOptimizer()
or suffix names like
model_foo = ...
optimizer_foo = ...
model_bar = ...
optimizer_bar = ...
Instead you can introduce new namespaces with
class Foo:
    model = ...
    optimizer = ...
    loss = ...
class Bar:
    model = ...
    optimizer = ...
    loss = ...
and then access the variables as
Foo.model
Bar.optimizer
...
I find that using namespaces this way to create new scopes makes code more readable and less error-prone.

While the leaking scope is indeed a feature that is often useful, I have created a package to simulate block scoping anyway (with selective leaking of your choice, typically to get the results out).
from scoping import scoping
a = 2
with scoping():
    assert 2 == a
    a = 3
    b = 4
    scoping.keep('b')
    assert 3 == a
assert 2 == a
assert 4 == b
https://pypi.org/project/scoping/

I would see this as a clear sign that it's time to create a new function and refactor the code. I can see no reason to create a new scope like that. Any reason in mind?

def a():
    def b():
        pass
    b()
If I just want some extra indentation or am debugging, I'll use if True:

Like so, for arbitrary name t:
### at top of function / script / outer scope (maybe just big jupyter cell)
try: t
except NameError:
    class t:
        pass
else:
    raise NameError('please `del t` first')
#### Cut here -- you only need 1x of the above -- example usage below ###
t.tempone = 5  # make new temporary variable that definitely doesn't bother anything else.
# block of calls here...
t.temptwo = 'bar'  # another one...
del t.tempone  # you can have overlapping scopes this way
# more calls
t.tempthree = t.temptwo; del t.temptwo  # done with that now too
print(t.tempthree)
# etc, etc -- any number of variables will fit into t.
### At end of outer scope, to return `t` to being 'unused'
del t
All the above could be in a function def, or just anyplace outside defs along a script.
You can add or del new elements to an arbitrary-named class like that at any point. You really only need one of these -- then manage your 'temporary' namespace as you like.
The del t statement isn't necessary if this is in a function body. If you include it, though, you can copy/paste chunks of code far apart from each other and have them work how you expect: different uses of 't' stay entirely separate, each starting with that try: t... block and ending with del t.
This way if t had been used as a variable already, you'll find out, and it doesn't clobber t so you can find out what it was.
This is less error-prone than using a series of randomly named functions just to call them once -- since it avoids having to deal with their names, or remembering to call them after their definition, especially if you have to reorder long code.
This basically does exactly what you want: Make a temporary place to put things you know for sure won't collide with anything else, and which you are responsible for cleaning up inside as you go.
Yes, it's ugly, and probably discouraged -- you will be directed to decompose your work into a set of smaller, more reusable functions.

As others have suggested, the python way to execute code without polluting the enclosing namespace is to put it in a class or function. This presents a slight and usually harmless problem: defining the function puts its name in the enclosing namespace. If this causes harm to you, you can name your function using Python's conventional temporary variable "_":
def _():
    polluting_variable = foo()
    ...
_()  # Run the code before something overwrites the variable.
This can be done recursively as each local definition masks the definition from the enclosing scope.
This sort of thing should only be needed in very specific circumstances. An example where it is useful is when using Databricks' %run magic, which executes the contents of another notebook in the current notebook's global scope. Wrapping the child notebook's commands in temporary functions prevents them from polluting the global namespace.
