What is the most efficient (in terms of processing speed and memory utilisation) method for passing a large number of user-input variables as arguments to a function, and for returning multiple results?
A long string of arguments and return values each time I call the function - e.g. (a,b,c,d,e,f,g) = MyFunction(a,b,c,d,e,f,g) - seems inelegant, and I'm guessing it is also inefficient, especially if I have to call the function repeatedly or recursively.
However, defining the whole list of variables as global outside of the function is also ugly, and carries the danger of the same variable name being inadvertently reused for several different purposes as my program grows.
I've tried putting all the variables into a single array or list and passing that to the function as a single argument, as this seems neater.
Am I correct in thinking that this is also more efficient, even for huge arrays, since it is only the pointer to the start of the array that is passed to the function each time, not the whole array itself?
If arrays are the best method for passing a large number of variables to/from a function, at what point does this efficiency saving kick in - e.g. is it better to pass a string of arguments if the number of arguments is less than 5, but use an array or list if 5 or more arguments are required?
A previous discussion on StackExchange:
Elegant way to pass multiple arguments to a function
has recommended using struct rather than vectors/arrays for passing multiple arguments. Why is this method preferred to using arrays, and at what point do efficiency savings justify the added complexity of using struct?
Are there any other methods that I should consider which will work in Python or C/C++?
(e.g. I'm new to object-oriented programming, but wonder if this might offer a solution which is specific to Python?)
Many thanks
All of this depends on the target system and its calling convention for functions. This answer applies to C and C++ only.
Generally, the use of file-scope variables will be the fastest option. In such cases, the variable should never be declared as global (accessible throughout the whole project), but as static (accessible from the local file only).
Still, such static file-scope variables should be avoided for several reasons: they can make the code harder to read and maintain, undisciplined use may lead to "spaghetti code", they create re-entrancy issues, and they add extra identifiers to the file-scope namespace.
It should be noted that when the number of parameters is small, keeping them as separate parameters might increase performance, as the compiler may then pass some of them in CPU registers instead of storing them on the stack. CPU registers are the fastest way of passing parameters to a function. How this works is very system-specific. However, writing your program in such a manner that you hope the parameters will be passed through CPU registers is premature optimization in most cases.
The de facto best way of passing multiple arguments is indeed to create a custom struct (or C++ class) containing all of the arguments, and to pass this structure to the function by reference. Try to make it so that the struct contains only variables that are related to each other. Consider passing variables that are not related to each other, or that are special to just one given function, as separate parameters. Good program design supersedes efficiency in most cases.
The reason why a struct/class is preferable to an array is simply that the variables together form a distinct type, and also that they will likely have different types from one another. Making an array out of variables that all have different types doesn't make any sense.
And in C++, a class offers other advantages over an array, such as constructors and destructors, custom assignment operators etc.
It will obviously depend on what you want to do, because each of the containers has a different purpose.
In terms of processing speed and memory, you should certainly use a pointer or a reference to a container (structure, class, array, tuple, ...), so that only the address of the container is copied, not all of the data.
However, you should not create a structure, or put all your variables in the same container, just in order to pass them as a single parameter of a function. All the variables that you put in the data structure should be related.
In the example that you gave, there are multiple variables of different types. That is why a structure is preferred: an array requires that all elements have the same type. In Python you could use a named tuple to store variables of different types.
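For illustration, here is a minimal Python sketch of that idea (the Particle fields and the advance function are made up purely for this example):

from collections import namedtuple

# Group related values into one named container instead of passing
# them as seven separate positional arguments.
Particle = namedtuple("Particle", ["x", "y", "z", "vx", "vy", "vz", "mass"])

def advance(p, dt):
    # Only the reference to the namedtuple is passed, not a copy of its fields.
    return p._replace(x=p.x + p.vx * dt,
                      y=p.y + p.vy * dt,
                      z=p.z + p.vz * dt)

p = Particle(0.0, 0.0, 0.0, 1.0, 2.0, 3.0, mass=1.5)
p = advance(p, dt=0.1)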
Some methods don't need to make a new variable; e.g. lists.reverse() works like this:
lists = [123, 456, 789]
lists.reverse()
print(lists)
This method reverses the list in place (without creating a new variable).
Why are there so many different ways of working with variables in Python?
Some calls, like variable.method().method2().method3(), can be chained one after another, but type(variable) and print() are written differently. Why can't we type something like variable.print() or variable.type()?
Is there any philosophical reason for this in Python?
You may be confused by the difference between a function and a method, and by three different purposes to them. As much as I dislike using SO for tutorial purposes, these issues can be hard to grasp from other documentation. You can look up function vs method easily enough -- once you know it's a (slightly) separate issue.
Your first question is a matter of system design. Python merely facilitates what programmers want to do, and the differentiation is common to many (most?) programming languages since ASM and FORTRAN crawled out of the binary slime pools in the days when dinosaurs roamed the earth.
When you design how your application works, you need to make a lot of implementation decisions: individual variables vs a sequence, in-line coding vs functions, separate functions vs encased functions vs classes and methods, etc. Part of this decision making is what each function should do. You've raised three main types:
(1) Process this data -- take the given data and change it, rearrange it, whatever needs doing -- but I don't need the previous version, just the improved version, so just put the new stuff where the old stuff was. This is used almost exclusively when one variable is getting processed; we don't generally take four separate variables and change each of them. In that case, we'd put them all in a list and change the list (a single variable). reverse falls into this class.
One important note is that for such a function, the argument in question must be mutable (capable of change). Python has mutable and immutable types. For instance, a list is mutable; a tuple is immutable. If you wanted to reverse a tuple, you'd need to return a new tuple; you can't change the original.
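A short example of the difference, using the reverse case from above:

lst = [1, 2, 3]
lst.reverse()                    # mutates lst in place, returns None
print(lst)                       # [3, 2, 1]

tup = (1, 2, 3)
new_tup = tuple(reversed(tup))   # tuples are immutable, so build a new one
print(new_tup)                   # (3, 2, 1)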
(2) Tell me something interesting -- take the given data and extract some information. However, I'm going to need the originals, so leave them alone. If I need to remember this cool new insight, I'll put it in a variable of my own. This is a function that returns a value. sqrt is one such function.
(3) Interact with the outside world -- input or output data permanently. For output, nothing in the program changes; we may present the data in an easy-to-read format, but we don't change anything internally. print is such a function.
Much of this decision also depends on the function's designed purpose: is this a "verb" function (do something) or a noun/attribute function (look at this data and tell me what you see)?
Now you get the interesting job for yourself: learn the art of system design. You need to become familiar enough with the available programming tools that you have a feeling for how they can be combined to form useful applications.
See the documentation:
The reverse() method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence.
This may be a straight-up unwise idea, so I'd best explain the context. I am finding that some of my functions have multiple and sometimes mutually exclusive or interdependent keyword arguments - i.e., they offer the user the ability to input a certain piece of data as (say) a numpy array or a dataframe. And then if it's a numpy array, an index can be separately passed, but not if it's a dataframe.
Which has led me to wonder if it's worth creating some kind of keyword parser function to handle these exclusivities/dependencies. One issue with this is that the keyword parser function would then need to return any variables created (and ex-ante, we would not know their number or their names) into the namespace of the function that called it. I'm not sure if that's possible, at least in a reasonable way (I imagine it could be achieved by directly changing the local dict but that's sometimes said to be a bad idea).
So my question is:
1. Is this a bad idea in the first place? Would creating separate functions depending on whether the input was a dataframe or ndarray be more sensible and simpler?
2. Is it possible without too much hacking to have a function return an unspecified number of variables into the local namespace?
Apologies for the slightly vague nature of this question but any thoughts gratefully received.
A dict is a good way to package a variable number of named values. If the parser returns a dict, then there is a single object that can be queried to get those names and values, avoiding the problem of needing to know the number and names ahead of time.
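A minimal sketch of what that could look like for the array-vs-DataFrame case described in the question (parse_data_kwargs and its exact rules are hypothetical, just to show the shape of the approach):

import numpy as np
import pandas as pd

def parse_data_kwargs(data=None, index=None):
    # Hypothetical keyword parser: returns a dict of derived values
    # instead of trying to inject variables into the caller's namespace.
    parsed = {}
    if isinstance(data, pd.DataFrame):
        if index is not None:
            raise ValueError("index cannot be given together with a DataFrame")
        parsed["values"] = data.to_numpy()
        parsed["index"] = data.index.to_numpy()
    elif isinstance(data, np.ndarray):
        parsed["values"] = data
        parsed["index"] = np.asarray(index) if index is not None else np.arange(len(data))
    else:
        raise TypeError("data must be a numpy array or a pandas DataFrame")
    return parsed

def my_function(data=None, index=None):
    parsed = parse_data_kwargs(data=data, index=index)
    values, index = parsed["values"], parsed["index"]
    # ... the rest of the function works with values/index ...
    return values.sum(), index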
Another possibility would be to put the parser into a class, either as a factory method (classmethod or staticmethod returning an instance) or as a regular method (invoked during or after __init__), where the class instance holds the parsed values.
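Sketched under the same assumptions, reusing the hypothetical parse_data_kwargs from above:

class ParsedInput:
    # Hypothetical holder for parsed keyword arguments.
    def __init__(self, values, index):
        self.values = values
        self.index = index

    @classmethod
    def from_kwargs(cls, **kwargs):
        # The same mutual-exclusion rules could live here; the instance
        # then carries the parsed results around.
        parsed = parse_data_kwargs(**kwargs)
        return cls(parsed["values"], parsed["index"])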
I have a function f(x), which does something and returns values (a tuple).
I have another function that calls this function after processing parameters (the whole function's operation is irrelevant to the question); and now I would like to know if there is anything evil in returning the function call directly, versus running the function, dumping the output into a variable and returning the variable.
A variable has a cost, and assigning a value to a variable has a cost; but besides that, is there any sorcery that would happen behind the scenes that would make one better than the other?
def myfunction(self):
    [do something]
    return f(x)
is the same as
def myfunction(self):
    [do something]
    b = f(x)
    return b
or is one to be preferred over the other (and why)? I am talking purely from the OOP perspective, without considering that creating variables and assigning to them has a cost in terms of memory and CPU cycles.
That doesn't return the function. Returning the function would look like return f. You're returning the result of the function call. Generally speaking, the only reason to save that result before returning it is if you plan to do some other kind of processing on it before the return, in which case it's faster to just refer to a saved value rather than recomputing it. Another reason to save it would be for clarity, turning what might be a long one-liner with extensive chaining into several steps.
There's a possibility that those two functions might produce different results if you have some kind of asynchronous process that modifies your data in the background between saving the reference and returning it, but that's something you'll have to keep in mind based on your program's situation.
In a nutshell, save it if you want to refer to it, or just return it directly otherwise.
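For instance, a trivial (made-up) illustration of the readability trade-off:

import math

# One long line with extensive chaining:
def hypotenuse_str(a, b):
    return "{:.2f}".format(math.sqrt(sum(x * x for x in (a, b))))

# The same result, saved in named steps for clarity:
def hypotenuse_str_verbose(a, b):
    squares = sum(x * x for x in (a, b))
    length = math.sqrt(squares)
    return "{:.2f}".format(length)

print(hypotenuse_str(3, 4))          # 5.00
print(hypotenuse_str_verbose(3, 4))  # 5.00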
Those are practically identical; use whichever one you think is more readable. If the performance of one versus the other actually matters to you, perhaps Python is not the best choice ;).
The cost difference between these is utterly negligible: in the worst case, one extra dictionary store, one extra dictionary lookup and one extra string in memory. In practice it won't even be that bad, since CPython stores local variables in a C array, so it's more like two C-level pointer indirections.
As a matter of style, I would usually avoid the unnecessary variable, but it's possible that it might be better in particular cases. As a guideline, think about things like whether the amalgamated version leads to an excessively long line of code, whether the extra variable has a better name than e.g. result, and how clear it is that that function call is the result you need (and if it isn't, whether/how much a variable helps).
I am looking for a way to rename all variables in a formula according to a given substitution map. I am currently using the substitute function, but it seems to be quite slow.
Is there another function I can use which is faster than it? is there any other way of doing it quickly?
N.B. I am only substituting fresh variables for the variables in the original formula, so there are no renaming clashes. Is there any way to perform the renaming faster under this assumption?
For instance,
# given
f = And(Int('x') > Int('y'), Or(Int('x') - 5 >= Int('z'), Int('k') > 1))
# expected result after substitution
# f = And(Int('v0') > Int('v1'), Or(Int('v0') - 5 >= Int('v2'), Int('v3') > 1))
Is there any way to do it working on the context of f?
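For reference, the renaming I am currently doing looks roughly like this:

from z3 import Int, And, Or, substitute

f = And(Int('x') > Int('y'), Or(Int('x') - 5 >= Int('z'), Int('k') > 1))

# Rename each original variable to a fresh one.
g = substitute(f,
               (Int('x'), Int('v0')),
               (Int('y'), Int('v1')),
               (Int('z'), Int('v2')),
               (Int('k'), Int('v3')))
print(g)  # And(v0 > v1, Or(v0 - 5 >= v2, v3 > 1))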
There isn't an inherently faster way over the API. I have a few comments regarding speed:
You seem to be using the Python API, which by itself has a huge overhead. It may help to time the portion spent in Python separately from the time spent inside Z3.
The implementation of the substitute function uses a class that gets allocated on the stack. It is quite possible that making this class a persistent attribute on the context would speed up amortized time, because it would not be allocating and re-allocating memory repeatedly. I would have to profile an instance to be able to tell whether this change really pays off.
The more fundamental way to perform renaming is to work with implicit renaming, i.e. not apply substitution at all but access variables with different offsets. This low-level way of dereferencing variables is not available over the API, or even in the way we represent high-level expressions, so it is not going to be an option.
If your application allows it, you may be able to work with existing terms and encode substitutions implicitly. For example in some applications one can just add equality constraints between old and new variables.
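A rough sketch of that last idea, assuming all you need is for the fresh names to be available alongside the originals inside a solver:

from z3 import Int, And, Or, Solver

f = And(Int('x') > Int('y'), Or(Int('x') - 5 >= Int('z'), Int('k') > 1))

s = Solver()
s.add(f)
# Instead of rewriting f, introduce the fresh variables and tie them
# to the originals with equality constraints.
s.add(Int('v0') == Int('x'),
      Int('v1') == Int('y'),
      Int('v2') == Int('z'),
      Int('v3') == Int('k'))
print(s.check())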
I am writing a piece of scientific software in Python which comprises both a Poisson equation solver (using the Newton method) on a rectangular mesh, and a particle-in-cell code. I've written the Newton Solver and the particle-in-cell code as separate functions, which are called by my main script.
I had originally written the code as one large script, but decided to break it up so that it was more modular, and so that the individual functions could be called on their own. My problem is that I have a large number of "global" variables which I consider parameters for the problem. These are mostly problem constants and parameters which define the problem geometry and mesh (such as dimensions, locations of certain boundaries, boundary conditions, etc.).
These parameters are required by both the main script and the individual functions. My question is: What is the best way (and most proper) to store these variables such that they can be accessed by both the main script and the functions.
My current solution is to define a class in a separate module (parameters.py) as so:
class Parameters:
    length = 0.008
    width = 0.0014
    nz = 160
    nr = 28
    dz = length/nz
    dr = width/nr
    ...
In my main script I then have:
from parameters import Parameters
par = Parameters()
coeff_a = -2 * (1/par.dr**2 + 1/par.dz**2)
...
This method allows me to then use par as a container for my parameters which can be passed to any functions I want. It also provides an easy way to easily set up the problem space to run just one of the functions on their own. My only concern is that each function does not require everything stored in par, and hence it seems inefficient passing it forward all the time. I could probably remove many of the parameters from par, but then I would need to recalculate them every time a function is called, which seems even more inefficient.
Is there a standard solution which people use in these scenarios? I should mention that my functions are not changing the attributes of par, just reading them. I am also interested in achieving high performance, if possible.
Generally, when your program requires many parameters in different places, it makes sense to come up with a neat configuration system, usually a class that provides a certain interface to your own code.
Upon instantiation of that class, you have a configuration object at hand which you can pass around. In some places you might want to populate it, in other places you just might want to use it. In any case, this configuration object will be globally accessible. If your program is a Python package, then this configuration mechanism might be written in its own module which you can import from all other modules in your package.
The configuration class might provide useful features such as parameter registration (a certain code section says that it needs a certain parameter to be set), definition of defaults and parameter validation.
The actual population of parameters is then based on defaults, user-given commandline arguments or user-given input files.
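A minimal sketch of such a configuration class, reusing the parameter names from the question (the override and validation mechanism shown here is just one possible design):

class Config:
    # Hypothetical configuration object with defaults and simple validation.
    _defaults = {"length": 0.008, "width": 0.0014, "nz": 160, "nr": 28}

    def __init__(self, **overrides):
        unknown = set(overrides) - set(self._defaults)
        if unknown:
            raise KeyError("unknown parameters: {}".format(sorted(unknown)))
        self.__dict__.update(self._defaults)
        self.__dict__.update(overrides)
        # Derived parameters are computed once, here.
        self.dz = self.length / self.nz
        self.dr = self.width / self.nr

# Populate from defaults, command-line arguments or an input file, then pass it around:
par = Config(nz=320)
print(par.dz, par.dr)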
To make Jan-Philip Gehrcke's answer more concrete, check out A global class pattern for python (by the way: it's just a normal class, nothing special about "global" - but you can pass it around "globally").
Before actually implementing this in my own program, I had the same idea but wanted to find out how others would do it (like the questioner nicholls). I was a bit skeptical about implementing this in the first place; in particular, it looked quite strange to instantiate a class in the module itself. But it works fine.
There are some things to keep in mind, though:
It is not super clean. For instance, someone who doesn't know the functions in your module wouldn't expect that a parameter in the configuration class needs to be set first.
If you have to reload your module/functions but want to maintain the values set in your configuration class, you should not instantiate the configuration class again: if "mem" not in locals(): mem = Mem()
It's not advisable to use a parameter from your configuration class as a default argument for a function, for example function(a, b=mem.defaultB): default arguments are evaluated once, at function definition time, so you cannot change this default value later after initialization. Instead, do function(a, b=None) and inside the function set if b is None: b = mem.defaultB. Then you can also adjust your configuration class after you have loaded your module/functions.
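A small sketch of that pitfall, using the Mem/defaultB names from the point above:

class Mem:
    def __init__(self):
        self.defaultB = 10

mem = Mem()

def bad(a, b=mem.defaultB):      # b is frozen to 10 at definition time
    return a + b

def good(a, b=None):             # resolve the default at call time instead
    if b is None:
        b = mem.defaultB
    return a + b

mem.defaultB = 99
print(bad(1))    # 11 -- still uses the old default
print(good(1))   # 100 -- picks up the updated configuration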
Certainly there are more issues...