To what extent should global variables be avoided when using functions? [closed] - python

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I have a function that gets the user's preferred directory for use in many other functions:
def get_pref():
    q1 = "What is your pref. dir.?"
    path = input(q1)
    ...
    return path
Then, the path is used to specify locations in numerous functions (only two are shown here).
Option A:
def do_stuff1():
    thing = path + file_name  # using global path variable
    ...

def do_stuff2():
    thing = path + file_name  # using global path variable
    ...

path = get_pref()
do_stuff1()
do_stuff2()
Option B:
def do_stuff1(path_):
    thing = path_ + file_name  # accepts path as argument
    ...

def do_stuff2(path_):
    thing = path_ + file_name  # accepts path as argument
    ...

path = get_pref()
do_stuff1(path)
do_stuff2(path)
Option A accesses the global variable path in each function. Option B accepts the path as an argument in each function.
Option B seems repetitive, since the same variable is passed each time, but I know globals are strongly discouraged in Python. Would it be acceptable to declare the path as a global constant, or is there a more conventional way?

If you are writing a short script, you shouldn't be afraid to use globals. Globals are discouraged because of namespace pollution and lack of modularity. In a short script, however, they will not have a significant impact on maintainability.
However, if you are producing a larger module, consider making a class out of your related functions and storing the path as an instance variable. You might even pass the path into the class's constructor, to make it clear to other engineers that the class's functionality depends heavily on the path value. A getter and setter (in Python, typically a property) would also be advisable for this attribute.
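Here is a minimal sketch of that suggestion. The class name `DataProcessor` and the "non-empty" validation rule are hypothetical. The constructor assigns through the property, so the setter's check also runs when the object is created:

```python
import os


class DataProcessor:
    """Groups functions that all depend on the user's preferred directory."""

    def __init__(self, path):
        # Passing the path to the constructor makes the dependency explicit;
        # the assignment goes through the property setter below.
        self.path = path

    @property
    def path(self):
        """Getter for the preferred directory."""
        return self._path

    @path.setter
    def path(self, value):
        """Setter that rejects empty values (an assumed rule) before storing."""
        if not value:
            raise ValueError("path must be a non-empty string")
        self._path = value

    def do_stuff1(self, file_name):
        # Methods read the shared path instead of taking it as an argument.
        return os.path.join(self.path, file_name)

    def do_stuff2(self, file_name):
        return os.path.join(self.path, file_name)


processor = DataProcessor("/tmp/data")
print(processor.do_stuff1("a.txt"))
```

Callers construct the object once, for example with `DataProcessor(get_pref())`, and then call its methods without repeating the path.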

I prefer B) because I generally find it makes code easier to read and reason about.
You're right though. Strict adherence to this means that every dependency that a function needs must be not only passed as arguments to that function, but must also be passed to every parent function up the call chain. This can complicate code; even if it does make clearer the "pipes of information" that are involved.
In this case though, I think A) is acceptable for a couple reasons:
The data involved is immutable, so you know it can't change out from under you at odd times.
It seems like you're also only assigning it once, so you don't need to worry later about the reference changing either.
The key that I've always kept in mind is that globals are not inherently evil. Global mutable state can be, though, because it's background information that you can't easily see but that can affect the behavior of tests and other functions. As long as global state isn't changing (either through mutation of the object or through reassignment of the name), I personally don't have any issue with globals.
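Under this answer's conditions (an immutable string, assigned once), Option A might look like the sketch below. The constant name `PREF_DIR` and its value are hypothetical. The upper-case name follows the PEP 8 convention for constants, which signals that nothing should reassign it. `os.path.join` replaces `+` so that a missing separator cannot silently produce a wrong path:

```python
import os

# Hypothetical value. In the question this would be the result of get_pref(),
# assigned exactly once at start-up, before any function reads it.
PREF_DIR = "/home/user/projects"


def do_stuff1(file_name):
    # Reads the module-level constant and never reassigns it.
    return os.path.join(PREF_DIR, file_name)


def do_stuff2(file_name):
    return os.path.join(PREF_DIR, file_name)


print(do_stuff1("notes.txt"))
```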

If you have data and functionality that necessarily interoperate, consider wrapping them in a class. The "path" can be stored as an attribute of the class instance.
import os

class MyClass(object):
    def __init__(self, path):
        self.path = path

    def my_func(self):
        thing = os.path.join(self.path, "foo")  # read the instance attribute
        return thing

You can use a class to store your "global" like this, which is like a compromise between A and B.
# dc = data container
class dc():
    def __init__(self):
        self.path = "aaa-"

def do_stuff1(dc_object, file_name):
    dc_object.path = dc_object.path + file_name

Obj = dc()
do_stuff1(Obj, "test")
print(Obj.path)
#out: aaa-test
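In the `dc` example, `do_stuff1` changes `path` on the shared container, so every other function that holds the same object sees the new value. If the container should be read-only, which matches the "immutable" reasoning in an earlier answer, a frozen dataclass (Python 3.7+) is one option. The name `Settings` is hypothetical:

```python
import os
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Settings:
    # frozen=True makes attribute assignment after creation raise an error.
    path: str


def do_stuff1(settings, file_name):
    # Builds a new value instead of modifying the shared container.
    return os.path.join(settings.path, file_name)


settings = Settings(path="aaa")
print(do_stuff1(settings, "test"))

try:
    settings.path = "bbb"
except FrozenInstanceError:
    print("settings cannot be changed after creation")
```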

As a beginning programmer, you should avoid globals as much as possible, until you gain the experience to know when they're acceptable.
Option B is clean code: you should pass the path as an argument to the function. And no, it is not repetitive code :)

Related

How to split up a Python application without relying on global variables? [duplicate]

This question already has answers here:
How to make a cross-module variable?
(12 answers)
Closed 4 years ago.
I have found myself using a lot of global variables in order to access user input (e.g. a file path) but I know this is bad practice and will likely lead to spaghetti code in the end.
How can I better organise this application without global variables so that I can make use of variables that are set within the GUI through button presses etc?
I've tried global variables, which do work but will probably lead to bad habits; without them, I get errors about variables being out of scope or undefined.
The application works so far when everything is within one .py file but now that it is growing and I want to split it up into a proper structure I am struggling.
I want to be able to use variables from within a method tied to my GUI so that I can split the application up into multiple files rather than having it all within one file which doesn't seem practical or good practice. I'm not sure if it would be better to use classes instead or if there is a better approach for what I am trying to achieve.
What you need to do here is look into OOP or object oriented programming.
With OOP, you can define self.whatever = something, and that variable will be accessible the way you want it.
A good place to look into OOP is here.
If you have names/variables that you know about in advance, put them all in a single module then import that module's contents into all your other modules. Maybe one of the variables in that module can be a dictionary that holds some stuff or can be added to as the program executes.
tmp.py:
foo = 'bar'
b.py:
import tmp
def f():
    return tmp.foo
a.py:
import tmp, b
print(b.f())
tmp.foo = 'ICHANGED!!'
print(b.f())
>>>
bar
ICHANGED!!
This works. The name foo is assigned in tmp.py, and tmp is then imported into a.py and b.py. a.py changes tmp.foo, and the function in b.py sees the change.
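There is one caveat. This only works because b.py reads `tmp.foo` through the module object every time `f()` runs. A module that does `from tmp import foo` instead binds its own name to whatever value existed at import time, so it won't see later reassignments. The sketch below simulates tmp.py with `types.ModuleType` so that it fits in one file. That simulation is only an illustration device:

```python
import types

# Stand-in for tmp.py. In a real project tmp would be its own file.
tmp = types.ModuleType("tmp")
tmp.foo = "bar"

# Like `from tmp import foo`: copies the object bound to tmp.foo right now.
foo = tmp.foo


def f():
    # Like b.py: looks up tmp.foo each time it is called.
    return tmp.foo


tmp.foo = "ICHANGED!!"
print(f())
print(foo)
```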
Yes, creating a class seems best.
from tkinter import BOTH, Frame, Toplevel, filedialog

import numpy as np
import pandas as pd
from pandastable import Table  # assumes the Table widget comes from pandastable


class CSVProcessor(object):
    def __init__(self, csv_file_path=None):
        self.csv_file_path = csv_file_path
        self.df = None

    def csv_open(self):
        self.csv_file_path = filedialog.askopenfilename()

    def csv_display(self):
        window2 = Toplevel()
        csv_frame = Frame(window2)
        csv_frame.pack(fill=BOTH, expand=1)
        self.df = pd.read_csv(self.csv_file_path)
        window2.table = csv_view = Table(csv_frame,
                                         dataframe=self.df,
                                         showtoolbar=False,
                                         showstatusbar=False)
        csv_view.show()

    def pre_processing_split(self):
        np.random.seed(11)
        self.df = pd.read_csv(self.csv_file_path)


csv_processor = CSVProcessor()
csv_processor.csv_open()
# ...etc...
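When the button handlers live in separate modules, one possible approach is to create a single state object in the main module and bind it to each callback with `functools.partial`, so that no handler needs a global. The names `AppState`, `on_open` and `on_display` are hypothetical. The GUI calls are replaced by plain function calls so the sketch runs without tkinter:

```python
from dataclasses import dataclass
from functools import partial
from typing import Optional


@dataclass
class AppState:
    # Hypothetical container for values the GUI sets, such as the chosen file.
    csv_file_path: Optional[str] = None


def on_open(state, chosen_path):
    # In the GUI, chosen_path would come from filedialog.askopenfilename().
    state.csv_file_path = chosen_path


def on_display(state):
    if state.csv_file_path is None:
        raise RuntimeError("no CSV file has been chosen yet")
    return f"displaying {state.csv_file_path}"


state = AppState()
# With tkinter this could be Button(root, command=partial(on_display, state)).
display_callback = partial(on_display, state)
on_open(state, "data.csv")
print(display_callback())
```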

How to pass down multiple parameter through several functions

Let's assume we have an exposed function (Level 0). We call this function with various parameters. Internally, this function calls a second function (Level 1), which does not use any of the given parameters other than to call a third function (Level 2) with them as arguments. It might do some other stuff, however.
My question is: how can we pass down the arguments without creating too much noise in the middle-layer function (Level 1)? I list some possible ways below. Be warned, however, that some of them are rather ugly and are only there for completeness. I'm looking for an established guideline rather than individual personal opinion on the topic.
# Transport all of them individually down the road.
# This is the most obvious way. However, as the number of parameters grows,
# so does the noise in A_1, since they are only passed along.
def A_0(msg, term_print):
    A_1(msg, term_print)

def A_1(msg, term_print):
    A_2(msg, term_print)

def A_2(msg, term_print):
    print(msg, end=term_print)
# Create a parameter object (in this case a dict) and pass it down.
# This reduces the number of parameters. However, when reading only the source
# of B_1, it is impossible to determine what par is.
def B_0(msg, term_print):
    B_1({'msg': msg, 'end': term_print})

def B_1(par):
    B_2(par)

def B_2(par):
    print(par['msg'], end=par['end'])
# Use global variables. We all know the pitfalls of global variables. However,
# in Python they are at least limited to their module.
def C_0(msg, term_print):
    global MSG, TERM_PRINT
    MSG = msg
    TERM_PRINT = term_print
    C_1()

def C_1():
    C_2()

def C_2():
    print(MSG, end=TERM_PRINT)
# Use the fact that Python functions are objects, so we can attach attributes
# to them. This makes the variables somewhat more 'localised' than before.
# However, it also makes the code harder to maintain: when we change D_2, we have
# to alter D_0 as well, even though D_0 never calls D_2 directly.
def D_0(msg, term_print):
    D_2.msg = msg
    D_2.term_print = term_print
    D_1()

def D_1():
    D_2()

def D_2():
    print(D_2.msg, end=D_2.term_print)
# Create a class with the methods E_1 and E_2 to enclose the variables.
class E(dict):
    def E_1(self):
        self.E_2()

    def E_2(self):
        print(self['msg'], end=self['end'])

def E_0(msg, term_print):
    E([('msg', msg), ('end', term_print)]).E_1()
# Create a nested scope. This makes the function very hard to read. Furthermore,
# F_1 cannot be called directly from outside (without abusing the construct).
def F_0(msg, term_print):
    def F_1():
        F_2()

    def F_2():
        print(msg, end=term_print)

    F_1()

A_0('What', ' ')
B_0('is', ' ')
C_0('the', ' ')
D_0('best', ' ')
E_0('way', '')
F_0('?', '\n')
It's hard to give a complete answer without knowing the full specifics of why there are so many parameters and so many levels of functions. But in general, passing too many parameters is considered a code smell.
Generally, if a group of functions all make use of the same parameters, it means they are closely related in some way, and may benefit from encapsulating the parameters within a Class, so that all the associated methods can share that data.
TooManyParameters is often a CodeSmell. If you have to pass that much
data together, it could indicate the data is related in some way and
wants to be encapsulated in its own class. Passing in a single
data structure that belongs apart doesn't solve the problem. Rather,
the idea is that things that belong together, keep together; things
that belong apart, keep apart; per the OneResponsibilityRule.
Indeed, you may find that entire functions are completely unnecessary if all they are doing is passing data along to some other function.
class A():
    def __init__(self, msg, term_print):
        self.msg = msg
        self.term_print = term_print

    def a_0(self):
        return self.a_1()

    def a_1(self):
        return self.a_2()

    def a_2(self):
        print(self.msg, end=self.term_print)
Depending on the meaning of your sets of parameters and of your function A0, using the *args notation may also be an option:
def A0(*args):
    A1(*args)
This allows any number of arguments to be passed to A0 and will pass them on to A1 unchanged. If the semantics of A0 is just that, then the * notation expresses the intention best. However, if you are going to pass on the arguments in a different order or do anything else with them besides just passing them on as an opaque sequence, this notation is not a good fit.
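The same idea can forward keyword arguments too, by adding `**kwargs`. In the sketch below, the lower-case names `a0`, `a1` and `a2` are hypothetical, and `a2` returns its result instead of printing so the result is easy to check. One trade-off follows from the caveat above: the signatures of `a0` and `a1` no longer document what `a2` actually needs.

```python
def a2(msg, end="\n"):
    # The only level that actually uses the arguments.
    return msg + end


def a1(*args, **kwargs):
    # Passes positional and keyword arguments through unchanged.
    return a2(*args, **kwargs)


def a0(*args, **kwargs):
    return a1(*args, **kwargs)


print(a0("What", end=" ") + a0("works?"), end="")
```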
The book "Code Complete 2" by Steve McConnell lists eliminating tramp data as one reason to use globals, while advising that global data be used only as a last resort. In its words:
Reasons to Use Global Data
Eliminating tramp data
Sometimes you pass data to a routine or class
merely so that it can be passed to another routine or class. For
example, you might have an error-processing object that's used in each
routine. When the routine in the middle of the call chain doesn't use
the object, the object is called "tramp data". Use of global variables can eliminate tramp data.
Use Global Data Only as a Last Resort
Before you resort to using global data
consider a few alternatives:
Begin by making each variable local and make variables global only as you need to
Make all variables local to individual routines initially. If you find
they're needed elsewhere, make them private or protected class
variables before you go so far as to make them global. If you finally
find that you have to make them global, do it, but only when you're
sure you have to. If you start by making a variable global, you'll
never make it local, whereas if you start by making it local, you
might never need to make it global.
Distinguish between global and class variables
Some variables are truly global in that they are accessed throughout
the whole program. Others are really class variables, used heavily
only within a certain set of routines. It's OK to access a class
variable any way you want to within the set of routines that use it
heavily. If routines outside the class need to use it, provide the
variable's value by means of an access routine. Don't access class
values directly - as if they were global variables - even if your
programming language allows you to. This advice is tantamount to
saying "Modularize! Modularize! Modularize!"
Use access routines
Creating access routines is the workhorse approach to getting around
problems with global data...
Link:
https://books.google.com/books/about/Code_Complete.html?hl=nl&id=LpVCAwAAQBAJ
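In Python, the "access routines" advice from the quote might look like the sketch below. The module-level data has a leading underscore (private by convention) and is reached only through functions. It reuses the book's error-processing example, but the function names and the single-file layout are hypothetical:

```python
# errors.py (hypothetical module): the log itself is private by convention.
_error_log = []


def record_error(message):
    """Access routine: the only sanctioned way to add to the log."""
    if not isinstance(message, str):
        raise TypeError("message must be a string")
    _error_log.append(message)


def get_errors():
    """Access routine: returns a copy so callers cannot modify the log."""
    return list(_error_log)


def clear_errors():
    _error_log.clear()


def parse_record(line):
    # A routine deep in a call chain can report errors without an error
    # object being passed down to it as tramp data.
    if "," not in line:
        record_error(f"malformed line: {line!r}")
        return None
    return line.split(",")


parse_record("a,b")
parse_record("oops")
print(get_errors())
```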

How can I pass on called function value in Python?

Let's say I have a code like this:
def read_from_file(filename):
    list = []
    for i in filename:
        value = i[0]
        list.append(value)
    return list

def other_function(other_filename):
    """
    That's where my question comes in. How can I get the list
    from the other function if I do not know the value "filename" will get?
    I would like to use the "list" in this function
    """

read_from_file("apples.txt")
other_function("pears.txt")
I'm aware that this code might not work or might not be perfect. But the only thing I need is the answer to my question in the code.
You have two general options. You can make your list a global variable that all functions can access (usually this is not the right way), or you can pass it to other_function (the right way). So
def other_function(other_filename, anylist):
    pass  # your code here

somelist = read_from_file("apples.txt")
other_function("pears.txt", somelist)
You need to "catch" the value returned from the first function, and then pass that to the second function.
file_list = read_from_file('apples.txt')
other_function(file_list)
You need to store the returned value in a variable before you can pass it to another function.
a = read_from_file("apples.txt")
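Here is a complete, runnable version of "store the return value, then pass it on". The question's loop iterates over the characters of the filename string. This sketch assumes the intent was to read the file's lines and collect each line's first character. A small sample file is created with `tempfile` so that the example can run anywhere:

```python
import os
import tempfile


def read_from_file(filename):
    """Return the first character of each non-empty line in filename."""
    values = []  # avoids the name `list`, which would shadow the built-in
    with open(filename, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                values.append(line[0])
    return values


def other_function(other_filename, values):
    # Receives the list as an argument instead of reaching for a global.
    return f"{other_filename}: {values}"


# Create a sample file standing in for apples.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("apple\navocado\n\nbanana\n")
    apples_path = tmp.name

try:
    apple_values = read_from_file(apples_path)  # catch the return value...
    print(other_function("pears.txt", apple_values))  # ...and pass it on
finally:
    os.remove(apples_path)
```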
There are at least three reasonable ways to achieve this and two which a beginner will probably never need:
Store the returned value of read_from_file and give it as a parameter to other_function (so adjust the signature to other_function(other_filename, whatever_list))
Make whatever_list a global variable.
Use an object and store whatever_list as a property of that object
(Use nested functions)
(Search for the value via the garbage collector, gc ;-) )
Nested functions
def foo():
    bla = "OK..."
    def bar():
        print(bla)
    bar()

foo()
Global variables
What are the rules for local and global variables in Python? (official docs)
Global and Local Variables
Very short example
Misc
You should not use list as a variable name, as you're shadowing the built-in list type.
You should use a descriptive name for your variables. What is the content of the list?
Using global variables can sometimes be avoided in a good way by creating objects. While I'm not always a fan of OOP, sometimes it is just what you need. Have a look at one of the many tutorials (e.g. here), get familiar with it, and figure out whether it fits your task. (And don't use it all the time just because you can. Python is not Java.)

How to avoid excessive parameter passing?

I am developing a medium size program in python spread across 5 modules. The program accepts command line arguments using OptionParser in the main module e.g. main.py. These options are later used to determine how methods in other modules behave (e.g. a.py, b.py). As I extend the ability for the user to customise the behaviour or the program I find that I end up requiring this user-defined parameter in a method in a.py that is not directly called by main.py, but is instead called by another method in a.py:
main.py:
import a
p = some_command_line_argument_value
a.meth1(p)

a.py:
def meth1(p):
    # some code
    res = meth2(p)
    # some more code w/ res

def meth2(p):
    # do something with p
    pass
This excessive parameter passing seems wasteful and wrong, but as hard as I try, I cannot think of a design pattern that solves this problem. While I had some formal CS education (a minor in CS during my B.Sc.), I've only really come to appreciate good coding practices since I started using Python. Please help me become a better programmer!
Create objects of types relevant to your program, and store the command line options relevant to each in them. Example:
import WidgetFrobnosticator
f = WidgetFrobnosticator()
f.allow_concave_widgets = option_allow_concave_widgets
f.respect_weasel_pins = option_respect_weasel_pins
# Now the methods of WidgetFrobnosticator have access to your command-line parameters,
# in a way that's not dependent on the input format.
import PlatypusFactory
p = PlatypusFactory()
p.allow_parthenogenesis = option_allow_parthenogenesis
p.max_population = option_max_population
# The platypus factory knows about its own options, but not those of the WidgetFrobnosticator
# or vice versa. This makes each class easier to read and implement.
Maybe you should organize your code more into classes and objects? As I was writing this, Jimmy showed a class-instance based answer, so here is a pure class-based answer. This would be most useful if you only ever wanted a single behavior; if there is any chance at all you might want different defaults some of the time, you should use ordinary object-oriented programming in Python, i.e. pass around class instances with the property p set in the instance, not the class.
class Aclass(object):
    p = None

    @classmethod
    def init_p(cls, value):
        cls.p = value  # store on the class, not in a local variable

    @classmethod
    def meth1(cls):
        # some code
        res = cls.meth2()
        # some more code w/ res

    @classmethod
    def meth2(cls):
        # do something with cls.p
        pass

from a import Aclass as ac
ac.init_p(some_command_line_argument_value)
ac.meth1()
ac.meth2()
If "a" is a real object and not just a set of independent helper methods, you can create a "p" member variable in "a" and set it when you instantiate an "a" object. Then your main module will not need to pass "p" into meth1 and meth2 once "a" has been instantiated.
[Caution: my answer isn't specific to python.]
I remember that Code Complete called this kind of parameter a "tramp parameter". Googling for "tramp parameter" doesn't return many results, however.
Some alternatives to tramp parameters might include:
Put the data in a global variable
Put the data in a static variable of a class (similar to global data)
Put the data in an instance variable of a class
Pseudo-global variable: hidden behind a singleton, or some dependency injection mechanism
Personally, I don't mind a tramp parameter as long as there's no more than one; i.e. your example is OK for me, but I wouldn't like ...
import a
p1 = some_command_line_argument_value
p2 = another_command_line_argument_value
p3 = a_further_command_line_argument_value
a.meth1(p1, p2, p3)
... instead I'd prefer ...
import a
p = several_command_line_argument_values
a.meth1(p)
... because if meth2 decides that it wants more data than before, I'd prefer if it could extract this extra data from the original parameter which it's already being passed, so that I don't need to edit meth1.
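With command-line options, the parsed-options object itself can be that single parameter. The question uses OptionParser from `optparse`. This sketch uses the newer `argparse` module instead, with hypothetical options `--verbose` and `--threshold`. `meth1` only forwards the namespace, so `meth2` can start reading another option without any change to `meth1`:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical example program")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--threshold", type=float, default=0.5)
    return parser


# a.py: meth1 only forwards the options object.
def meth1(options):
    res = meth2(options)
    return res


def meth2(options):
    # meth2 picks out the options it needs.
    if options.verbose:
        return f"threshold={options.threshold}"
    return "quiet"


# main.py: parse once and pass the whole namespace down.
options = build_parser().parse_args(["--verbose", "--threshold", "0.8"])
print(meth1(options))
```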
With objects, parameter lists should normally be very small, since most appropriate information is a property of the object itself. The standard way to handle this is to configure the object properties and then call the appropriate methods of that object. In this case set p as an attribute of a. Your meth2 should also complain if p is not set.
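Here is a sketch of "configure the object, then call its methods", with `meth2` complaining when `p` has not been set. The class `A` stands in for module `a` and is hypothetical, and so is the doubling inside `meth2`:

```python
class A:
    """Hypothetical class standing in for module a from the question."""

    def __init__(self, p=None):
        self.p = p

    def meth1(self):
        res = self.meth2()  # no need to pass p along
        return f"meth1 used {res}"

    def meth2(self):
        # Fail loudly instead of silently working with a missing setting.
        if self.p is None:
            raise ValueError("p must be set before calling meth2")
        return self.p * 2


a = A()
a.p = 21          # configure the object first...
print(a.meth1())  # ...then call its methods
```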
Your example is reminiscent of the code smell Message Chains. You may find the corresponding refactoring, Hide Delegate, informative.

How can one create new scopes in python

In many languages (and places) there is a nice practice of creating local scopes by creating a block like this.
void foo()
{
    ... Do some stuff ...
    if(TRUE)
    {
        char a;
        int b;
        ... Do some more stuff ...
    }
    ... Do even more stuff ...
}
How can I implement this in Python without getting the "unexpected indent" error and without using some sort of if True: trick?
Why do you want to create new scopes in python anyway?
The normal reason for doing it in other languages is variable scoping, but blocks don't create a new scope in Python:
if True:
    a = 10
print(a)
In Python, scoping is of three types: global, local and class. You can create specialized 'scope' dictionaries to pass to exec / eval(). In addition, you can use nested scopes (defining a function within another). I found these to be sufficient in all my code.
As Douglas Leeder said already, the main reason to use it in other languages is variable scoping and that doesn't really happen in Python. In addition, Python is the most readable language I have ever used. It would go against the grain of readability to do something like if-true tricks (Which you say you want to avoid). In that case, I think the best bet is to refactor your code into multiple functions, or use a single scope. I think that the available scopes in Python are sufficient to cover every eventuality, so local scoping shouldn't really be necessary.
If you just want to create temp variables and let them be garbage collected right after using them, you can use
del varname
when you don't want them anymore.
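Note that `del` removes the name binding, not the object itself. In CPython the object is freed as soon as no other references to it remain, but that timing is an implementation detail. The sketch below uses a hypothetical function `process` and shows that the name is gone after `del`:

```python
def process():
    temp = [n * n for n in range(5)]  # temporary helper value
    total = sum(temp)
    del temp  # unbinds the name; the list can be freed if unreferenced
    try:
        temp  # referencing the deleted local raises an error
    except NameError:  # actually UnboundLocalError, a subclass of NameError
        still_defined = False
    else:
        still_defined = True
    return total, still_defined


print(process())
```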
If it's just for aesthetics, you could use comments or extra blank lines, though not extra indentation.
Python has no block scope: a variable used in a function is local to that function, no matter what indentation level it was created at. Calling a nested function will have the effect you're looking for.
def foo():
    a = 1
    def bar():
        b = 2
        print(a, b)  # will print "1 2"
    bar()
Still like everyone else, I have to ask you why you want to create a limited scope inside a function.
Variables in list comprehensions (Python 3+) and generator expressions are local:
>>> i = 0
>>> [i+1 for i in range(10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> i
0
but why exactly do you need this?
A scope is a textual region of a
Python program where a namespace is
directly accessible. “Directly
accessible” here means that an
unqualified reference to a name
attempts to find the name in the
namespace...
Please, read the documentation and clarify your question.
btw, you don't need if(TRUE){} in C, a simple {} is sufficient.
As mentioned in the other answers, there is no analogous functionality in Python to creating a new scope with a block, but when writing a script or a Jupyter Notebook, I often (ab)use classes to introduce new namespaces for similar effect. For example, in a notebook where you might have a model "Foo", "Bar" etc. and related variables you might want to create a new scope to avoid having to reuse names like
model = FooModel()
optimizer = FooOptimizer()
...
model = BarModel()
optimizer = BarOptimizer()
or suffix names like
model_foo = ...
optimizer_foo = ...
model_bar = ...
optimizer_bar = ...
Instead you can introduce new namespaces with
class Foo:
    model = ...
    optimizer = ...
    loss = ...

class Bar:
    model = ...
    optimizer = ...
    loss = ...
and then access the variables as
Foo.model
Bar.optimizer
...
I find that using namespaces this way to create new scopes makes code more readable and less error-prone.
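Below is a runnable version of this class-as-namespace trick. Plain strings and numbers replace `FooModel()`, `FooOptimizer()` and similar, which were elided in the answer. One documented caveat applies: names defined in a class body are not visible inside methods defined in that body, so the trick works best for plain values accessed as `Foo.name`.

```python
class Foo:
    # Hypothetical stand-ins for FooModel(), FooOptimizer(), ...
    model = "foo-model"
    learning_rate = 0.1


class Bar:
    model = "bar-model"
    learning_rate = 0.01


# Same attribute names in both namespaces, with no clash and no suffixes.
print(Foo.model, Foo.learning_rate)
print(Bar.model, Bar.learning_rate)
```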
While the leaking scope is indeed a feature that is often useful, I have created a package to simulate block scoping anyway (with selective leaking of your choice, typically to get the results out).
from scoping import scoping
a = 2
with scoping():
    assert(2 == a)
    a = 3
    b = 4
    scoping.keep('b')
    assert(3 == a)
assert(2 == a)
assert(4 == b)
https://pypi.org/project/scoping/
I would see this as a clear sign that it's time to create a new function and refactor the code. I can see no reason to create a new scope like that. Any reason in mind?
def a():
    def b():
        pass
    b()
If I just want some extra indentation or am debugging, I'll use if True:
Like so, for arbitrary name t:
### at top of function / script / outer scope (maybe just big jupyter cell)
try: t
except NameError:
    class t:
        pass
else:
    raise NameError('please `del t` first')
#### Cut here -- you only need 1x of the above -- example usage below ###
t.tempone = 5 # make new temporary variable that definitely doesn't bother anything else.
# block of calls here...
t.temptwo = 'bar' # another one...
del t.tempone # you can have overlapping scopes this way
# more calls
t.tempthree = t.temptwo; del t.temptwo # done with that now too
print(t.tempthree)
# etc, etc -- any number of variables will fit into t.
### At end of outer scope, to return `t` to being 'unused'
del t
All the above could be in a function def, or just anyplace outside defs along a script.
You can add or delete attributes on an arbitrarily named class like that at any point. You really only need one of these -- then manage your 'temporary' namespace as you like.
The del t statement isn't necessary if this is in a function body, but if you include it, you can copy and paste chunks of code that are far apart from each other and have them work as you expect (with different uses of 't' being entirely separate, each starting with that try: t ... block and ending with del t).
This way if t had been used as a variable already, you'll find out, and it doesn't clobber t so you can find out what it was.
This is less error-prone than using a series of randomly named functions just to call each one once, since it avoids having to deal with their names or remembering to call them after their definition, especially if you have to reorder long code.
This basically does exactly what you want: Make a temporary place to put things you know for sure won't collide with anything else, and which you are responsible for cleaning up inside as you go.
Yes, it's ugly, and probably discouraged -- you will be directed to decompose your work into a set of smaller, more reusable functions.
As others have suggested, the python way to execute code without polluting the enclosing namespace is to put it in a class or function. This presents a slight and usually harmless problem: defining the function puts its name in the enclosing namespace. If this causes harm to you, you can name your function using Python's conventional temporary variable "_":
def _():
    polluting_variable = foo()
    ...

_()  # Run the code before something overwrites the variable.
This can be done recursively as each local definition masks the definition from the enclosing scope.
This sort of thing should only be needed in very specific circumstances. An example where it is useful is when using Databricks' %run magic, which executes the contents of another notebook in the current notebook's global scope. Wrapping the child notebook's commands in temporary functions prevents them from polluting the global namespace.
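Here is a small self-contained version of the `_()` pattern. The work inside the function is a hypothetical placeholder. The function returns only the value that should escape, and `del _` then removes the helper name as well, so none of the temporary names end up in the global namespace:

```python
def _():
    polluting_variable = [1, 2, 3]  # hypothetical temporary work
    return sum(polluting_variable)


result = _()  # run immediately; only the result is kept
del _         # optionally remove the helper name too
print(result)
```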
