I am writing a piece of scientific software in Python which comprises both a Poisson equation solver (using the Newton method) on a rectangular mesh, and a particle-in-cell code. I've written the Newton Solver and the particle-in-cell code as separate functions, which are called by my main script.
I had originally written the code as one large script, but decided to break it up so that it was more modular and so that the individual functions could be called on their own. My problem is that I have a large number of "global" variables which I consider parameters for the problem. These are mostly problem constants and parameters which define the problem geometry and mesh (such as dimensions, the locations of certain boundary conditions, the boundary-condition values themselves, etc.).
These parameters are required by both the main script and the individual functions. My question is: what is the best (and most proper) way to store these variables so that they can be accessed by both the main script and the functions?
My current solution is to define a class in a separate module (parameters.py) as so:
class Parameters:
    length = 0.008
    width = 0.0014
    nz = 160
    nr = 28
    dz = length / nz
    dr = width / nr
    ...
In my main script I then have:
from parameters import Parameters
par = Parameters()
coeff_a = -2 * (1/par.dr**2 + 1/par.dz**2)
...
This method allows me to use par as a container for my parameters which can be passed to any function I want. It also provides an easy way to set up the problem space when running just one of the functions on its own. My only concern is that no single function requires everything stored in par, so it seems inefficient to pass the whole object forward all the time. I could probably remove many of the parameters from par, but then I would need to recalculate them every time a function is called, which seems even more inefficient.
Is there a standard solution which people use in these scenarios? I should mention that my functions are not changing the attributes of par, just reading them. I am also interested in achieving high performance, if possible.
Generally, when your program requires many parameters in different places, it makes sense to come up with a neat configuration system, usually a class that provides a certain interface to your own code.
Upon instantiation of that class, you have a configuration object at hand which you can pass around. In some places you might want to populate it, in other places you just might want to use it. In any case, this configuration object will be globally accessible. If your program is a Python package, then this configuration mechanism might be written in its own module which you can import from all other modules in your package.
The configuration class might provide useful features such as parameter registration (a certain code section says that it needs a certain parameter to be set), definition of defaults and parameter validation.
The actual population of parameters is then based on defaults, user-given commandline arguments or user-given input files.
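A minimal sketch of such a configuration class, with parameter registration, defaults and validation; all names here are illustrative, not a specific library's API:

```python
class Config:
    """Illustrative configuration object: register, validate, then read."""

    def __init__(self):
        self._params = {}
        self._validators = {}

    def register(self, name, default=None, validator=None):
        # A code section declares that it needs this parameter.
        self._params[name] = default
        if validator is not None:
            self._validators[name] = validator

    def set(self, name, value):
        # Population from defaults, command-line arguments or input files
        # would funnel through here.
        if name not in self._params:
            raise KeyError(f"unregistered parameter: {name}")
        validator = self._validators.get(name)
        if validator is not None and not validator(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        self._params[name] = value

    def __getattr__(self, name):
        # Allows cfg.nz-style read access for registered parameters.
        try:
            return self._params[name]
        except KeyError:
            raise AttributeError(name)


cfg = Config()
cfg.register("nz", default=160, validator=lambda v: v > 0)
cfg.set("nz", 320)
print(cfg.nz)  # 320
```

An instance like cfg can then live in its own module and be imported wherever it is needed.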
To make Jan-Philip Gehrcke's answer more figurative, check out A global class pattern for python (btw: it's just a normal class, nothing special about "global" - but you can pass it around "globally").
Before actually implementing this in my own program, I had the same idea but wanted to find out how others would do it (like questioner nicholls). I was a bit skeptical about implementing this in the first place; in particular, it looked quite strange to instantiate a class in the module itself. But it works fine.
There are some things to keep in mind, though:
It is not super clean. For instance, someone who doesn't know the functions in your module wouldn't expect that a parameter in a configuration class needs to be set first.
If you have to reload your module/functions but want to maintain the values set in your configuration class, you should not instantiate the configuration class again: if "mem" not in locals(): mem = Mem()
It's not advised to assign a parameter from your configuration class as a default argument for a function. For example function(a, b=mem.defaultB).
You cannot change this default value later, after initialization. Instead, do function(a, b=None): if b is None: b = mem.defaultB. Then you can also adjust your configuration class after loading your module/functions.
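The pitfall is that Python evaluates default argument values once, at function definition time. A minimal sketch, reusing the mem/defaultB names from above (which are just placeholders):

```python
class Mem:
    """Placeholder configuration class."""
    defaultB = 10

mem = Mem()

# Bad: mem.defaultB is read once, when the def statement executes.
def f_bad(a, b=mem.defaultB):
    return a + b

# Good: the default is resolved at call time, so later changes are seen.
def f_good(a, b=None):
    if b is None:
        b = mem.defaultB
    return a + b

mem.defaultB = 99   # adjust the configuration after the functions were defined
print(f_bad(1))     # 11  -- still uses the stale default
print(f_good(1))    # 100 -- picks up the updated configuration
```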
Certainly there are more issues...
I am making a PyCharm plugin that will generate code examples based on a Python class. The user will be able to select a class and the plugin will generate code examples that utilise the attributes and methods of the class.
Basic Class
class Rectangle:
    # Hardcoded values
    width = 5
    length = 10

    def calculate_area(self):
        return self.width * self.length
Basic Example
r = Rectangle()
r.calculate_area()
The problem arises when the class contains methods that take parameters. For example, if the class contains a method that takes a string parameter, then I want the generated examples to pass either an empty or a random string to the method. Is there a good way to analyse the potential type of the parameters (when not strongly defined) to generate code examples?
Potential Solutions
So far I have come up with 2 applicable solutions (passing None parameters when calling the method is not applicable):
A potential solution would be to keep track of the parameters and analyse the operations that they use/are used in to determine their type. In the code below num will most likely be an integer, as determined by the method body:
def calculate_even(num):
    return num % 2 == 0
I know that you can execute a string containing Python code and check if it compiles correctly (source). A possible solution (one that I would love to avoid) is to generate code examples, passing different types as method parameters and determining which ones work after executing said code examples.
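For reference, that check can be done with the built-in compile(); note that it only catches syntax errors, not runtime errors, so by itself it says nothing about which argument types would actually work:

```python
def compiles(source):
    """Return True if the snippet is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

# NameErrors etc. only appear at runtime, so this still compiles:
print(compiles("r = Rectangle()\nr.calculate_area()"))  # True
print(compiles("r = Rectangle("))                       # False
```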
I can see huge downsides with both solutions, as the first one will require the handling of lots of operations for the different types and will still most likely be inaccurate, and the second one becomes incredibly slow for methods that take multiple parameters. I am really hoping that there's a more elegant solution that I have missed.
Edit
I am looking for a possible solution that won't require the use of type annotations/hints
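A sketch of the first idea without annotations, using the standard ast module to inspect how a parameter is used. The single rule below (a % operand suggesting an integer) is purely illustrative; a real analyser would need many such rules and would still be a heuristic:

```python
import ast

SOURCE = """
def calculate_even(num):
    return num % 2 == 0
"""

def guess_param_type(source, param):
    """Very rough heuristic: infer a parameter's type from its operations."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # num % 2 suggests an integer operand; string methods would
        # suggest str, iteration would suggest a sequence, and so on.
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mod):
            if isinstance(node.left, ast.Name) and node.left.id == param:
                return int
    return None  # unknown

print(guess_param_type(SOURCE, "num"))  # <class 'int'>
```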
I am creating various classes for computational geometry that all subclass numpy.ndarray. The DataCloud class, which is typical of these classes, has Python properties (for example, convex_hull, delaunay_triangulation) that would be time consuming and wasteful to calculate more than once. I want to do calculations once and only once. Also, just in time, because for a given instance, I might not need a given property at all. It is easy enough to set this up by setting self.__convex_hull = None in the constructor and, if/when the convex_hull property is called, doing the required calculation, setting self.__convex_hull, and returning the calculated value.
The problem is that once any of those complicated properties is invoked, any changes to the contents made, external to my subclass, by the various numpy (as opposed to DataCloud subclass) methods will invalidate all the calculated properties, and I won't know about it. For example, suppose external code simply does this to the instance: datacloud[3,8] = 5. So is there any way to either (1) make the ndarray base class read-only once any of those properties is calculated or (2) have ndarray set some indicator that there has been a change to its contents (which for my purposes makes it dirty), so that then invoking any of the complex properties will require recalculation?
Looks like the answer is to clear the writeable flag on the instance (setflags is a method of the array instance, not of the class):
datacloud.setflags(write=False)
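A minimal sketch of option (1) for an ndarray subclass like the questioner's. Here expensive stands in for properties such as convex_hull, and the sum is a placeholder for the real calculation:

```python
import numpy as np

class DataCloud(np.ndarray):
    """ndarray subclass whose expensive property is computed once,
    after which the underlying buffer is frozen."""

    def __new__(cls, data):
        obj = np.asarray(data, dtype=float).view(cls)
        obj._cached = None
        return obj

    def __array_finalize__(self, obj):
        # Propagate the cache slot to views and new instances.
        self._cached = getattr(obj, "_cached", None)

    @property
    def expensive(self):
        if self._cached is None:
            self._cached = float(self.sum())  # placeholder calculation
            self.setflags(write=False)        # option (1): now read-only
        return self._cached


cloud = DataCloud([[0.0, 1.0], [2.0, 3.0]])
print(cloud.expensive)    # 6.0
# cloud[0, 0] = 5.0       # would now raise "assignment destination is read-only"
```

Option (2), a dirty flag, would instead require intercepting every mutating operation (e.g. via `__setitem__`), which is considerably more invasive.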
Some methods don't produce a new variable; e.g. list.reverse() works like this:
lists = [123, 456, 789]
lists.reverse()
print(lists)
This method reverses the list in place (without creating a new variable).
Why are there various ways of operating on variables in Python?
Calls like variable.method().method2().method3() can be chained, but type(variable) and print(variable) are written differently. Why can't we write variable.print() or variable.type()?
Is there a philosophical reason for this in Python?
You may be confused by the difference between a function and a method, and by three different purposes to them. As much as I dislike using SO for tutorial purposes, these issues can be hard to grasp from other documentation. You can look up function vs method easily enough -- once you know it's a (slightly) separate issue.
Your first question is a matter of system design. Python merely facilitates what programmers want to do, and the differentiation is common to many (most?) programming languages since ASM and FORTRAN crawled out of the binary slime pools in the days when dinosaurs roamed the earth.
When you design how your application works, you need to make a lot of implementation decisions: individual variables vs a sequence, in-line coding vs functions, separate functions vs encased functions vs classes and methods, etc. Part of this decision making is what each function should do. You've raised three main types:
(1) Process this data -- take the given data and change it, rearrange it, whatever needs doing -- but I don't need the previous version, just the improved version, so just put the new stuff where the old stuff was. This is used almost exclusively when one variable is getting processed; we don't generally take four separate variables and change each of them. In that case, we'd put them all in a list and change the list (a single variable). reverse falls into this class.
One important note is that for such a function, the argument in question must be mutable (capable of change). Python has mutable and immutable types. For instance, a list is mutable; a tuple is immutable. If you wanted to reverse a tuple, you'd need to return a new tuple; you can't change the original.
(2) Tell me something interesting -- take the given data and extract some information. However, I'm going to need the originals, so leave them alone. If I need to remember this cool new insight, I'll put it in a variable of my own. This is a function that returns a value. sqrt is one such function.
(3) Interact with the outside world -- input or output data permanently. For output, nothing in the program changes; we may present the data in an easy-to-read format, but we don't change anything internally. print is such a function.
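The three purposes can be put side by side in a short sketch:

```python
import math

# (1) Process this data: reverse() changes the list in place, returns None
data = [123, 456, 789]
data.reverse()                       # data is now [789, 456, 123]

# An immutable tuple cannot be changed, so a new object must be built:
frozen = (123, 456, 789)
reordered = tuple(reversed(frozen))  # frozen itself is untouched

# (2) Tell me something interesting: sqrt() returns a value, input untouched
root = math.sqrt(16)                 # 4.0

# (3) Interact with the outside world: print() outputs, changes nothing
print(root)
```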
Much of this decision also depends on the function's designed purpose: is this a "verb" function (do something) or a noun/attribute function (look at this data and tell me what you see)?
Now you get the interesting job for yourself: learn the art of system design. You need to become familiar enough with the available programming tools that you have a feeling for how they can be combined to form useful applications.
See the documentation:
The reverse() method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence.
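This is easy to verify; reverse() returns None precisely because it operates by side effect:

```python
lists = [123, 456, 789]
result = lists.reverse()   # in place: the list itself changes...
print(result)              # None -- ...and nothing useful is returned
print(lists)               # [789, 456, 123]
```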
This may be a straight-up unwise idea, so I'd best explain the context. I am finding that some of my functions have multiple and sometimes mutually exclusive or interdependent keyword arguments; i.e., they offer the user the ability to input a certain piece of data as (say) a numpy array or a dataframe. And then if a numpy array, an index can be separately passed, but not if it's a dataframe.
Which has led me to wonder if it's worth creating some kind of keyword parser function to handle these exclusivities/dependencies. One issue with this is that the keyword parser function would then need to return any variables created (and ex-ante, we would not know their number or their names) into the namespace of the function that called it. I'm not sure if that's possible, at least in a reasonable way (I imagine it could be achieved by directly changing the local dict but that's sometimes said to be a bad idea).
So my question is:
1. Is this a bad idea in the first place? Would creating separate functions depending on whether the input was a dataframe or ndarray be more sensible and simpler?
2. Is it possible without too much hacking to have a function return an unspecified number of variables into the local namespace?
Apologies for the slightly vague nature of this question but any thoughts gratefully received.
A dict is a good way to package a variable number of named values. If the parser returns a dict, then there is a single object that can be queried to get those names and values, avoiding the problem of needing to know the number and names ahead of time.
Another possibility would be to put the parser into a class, either as a factory method (classmethod or staticmethod returning an instance) or as a regular method (invoked during or after __init__), where the class instance holds the parsed values.
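A sketch of such a dict-returning parser for the array-vs-dataframe case in the question; parse_input and its rules are illustrative, and the dataframe check is a simple duck-typed test on a columns attribute:

```python
def parse_input(data=None, index=None):
    """Validate mutually exclusive/interdependent keywords; return one dict."""
    if hasattr(data, "columns"):       # duck-typed check for a DataFrame
        if index is not None:
            raise ValueError("index cannot be passed together with a dataframe")
        return {"values": data.values, "index": data.index}
    values = list(data)
    if index is None:                  # arrays may carry a separate index
        index = list(range(len(values)))
    return {"values": values, "index": index}


parsed = parse_input(data=[1.0, 2.0, 3.0])
print(parsed["index"])   # [0, 1, 2]
```

The caller then queries one object (`parsed["values"]`, `parsed["index"]`) instead of receiving an unknown number of variables into its namespace.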
What is the most efficient (in terms of processing speed and memory utilisation) method for passing a large number of user-input variables as arguments to a function, and for returning multiple results?
A long string of arguments and return values each time I call the function - e.g. (a,b,c,d,e,f,g) = MyFunction(a,b,c,d,e,f,g) - seems inelegant, and I'm guessing is also inefficient; especially if I have to call the function repeatedly or recursively.
However, defining the whole list of variables as global outside of the function is also ugly, and carries the danger of names being inadvertently reused for several different variables as my program grows.
I've tried putting all the variables into a single array or list and passed that to the function as a single argument, as this seems neater.
Am I correct in thinking that this is also more efficient, even for huge arrays, since it is only the pointer to the start of the array that is passed to the function each time, not the whole array itself?
If arrays are the best method for passing a large number of variables to/from a function, at what point does this efficiency saving kick in - e.g. is it better to pass a string of arguments if the number of arguments is less than 5, but use an array or list if 5 or more arguments are required?
A previous discussion on StackExchange:
Elegant way to pass multiple arguments to a function
has recommended using struct rather than vectors/arrays for passing multiple arguments. Why is this method preferred to using arrays, and at what point do efficiency savings justify the added complexity of using struct?
Are there any other methods that I should consider which will work in Python or C/C++? (e.g. I'm new to object-oriented programming, but wonder if this might offer a solution which is specific to Python?)
Many thanks
All of this depends on the target system and its calling convention for functions. This answer applies to C and C++ only.
Generally, file-scope variables will be the fastest option. In such cases, the variable should never be declared as global (accessible throughout the whole project), but as static (accessible from the local file only).
Still, such static file-scope variables should be avoided for several reasons: they can make the code harder to read and maintain, undisciplined use may lead to "spaghetti code", they create re-entrancy issues, and they add extra identifiers to the file-scope namespace.
It should be noted that when the number of parameters is limited, keeping them as separate parameters might increase performance, as the compiler may then store some of them in CPU registers instead of on the stack. CPU registers are the fastest way of passing parameters to a function. How this works is very system-specific. However, writing your program in such a manner that you hope to get the parameters passed through CPU registers is premature optimization in most cases.
The best, de facto way of passing multiple arguments is indeed to create a custom struct (or C++ class) containing all of the arguments. This structure is then passed by reference to the function. Try to make it so that the struct contains only variables related to each other. Consider putting variables that are not related to each other, or special just for one given function, in a separate parameter. Good program design supersedes efficiency in most cases.
The reason why a struct/class is preferable instead of an array, is simply because the variables together form a unique type, but also since they will likely have different types compared to each other. Making an array of variables that all have different types doesn't make any sense.
And in C++, a class offers other advantages over an array, such as constructors and destructors, custom assignment operators etc.
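For the Python side of the question, the closest analogue of the recommended struct is a small class, for instance a dataclass, grouping the related parameters; the names below are illustrative, echoing the mesh parameters from earlier in the thread:

```python
from dataclasses import dataclass

@dataclass
class MeshParams:
    length: float
    width: float
    nz: int
    nr: int

def cell_size(p: MeshParams):
    # Only the reference to p is passed; the fields are not copied.
    return p.length / p.nz, p.width / p.nr

p = MeshParams(length=0.008, width=0.0014, nz=160, nr=28)
print(cell_size(p))   # (5e-05, 5e-05)
```

As in C, this groups related values under one named type rather than forcing them into a homogeneous array.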
It will obviously depend on what you want to do, because each of the containers has a different purpose.
In terms of processing speed and memory, you should certainly pass a pointer or a reference to a container (structure, class, array, tuple...), so that only the address of the container is copied, not all the data.
However, you should not create a structure, or put all your variables in the same container, just to pass them as a parameter of a function. All the variables that you put in the same data structure should be related.
In the example you gave, there are multiple variables of different types. That is why a structure is preferred: an array requires all elements to have the same type. In Python you could use a named tuple to store variables of different types.