Preventing symfit models from sharing parameter objects - python

Multiple symfit model instances share parameter objects with the same name. I'd like to understand where this behaviour comes from, what its intent is, and whether it's possible to deactivate it.
To illustrate what I mean, a minimal example:
import symfit as sf
# Create Parameters and Variables
a = sf.Parameter('a', value=0)
b = sf.Parameter('b', value=1, fixed=True)
x, y = sf.variables('x, y')
# Instantiate two models
model1 = sf.Model({y: a*x + b})
model2 = sf.Model({y: a*x + b})
# They are indeed not the same
id(model1) == id(model2)
>>False
# There are two parameters
print(model1.params)
>>[a,b]
print(model1.params[1].name, model1.params[1].value)
>>b 1
print(model2.params[1].name, model2.params[1].value)
>>b 1
# They are initially identical
# We want to manually modify the fixed one in only one model
model1.params[1].value = 3
# Both have changed
print(model1.params[1].name, model1.params[1].value)
>>b 3
print(model2.params[1].name, model2.params[1].value)
>>b 3
id(model1.params[1]) == id(model2.params[1])
>>True
# The parameter is the same object
I want to fit multiple data streams with different models, but with different fixed parameter values depending on the data stream. Renaming the parameters in each instance of the model would work, but is ugly given that the parameter represents the same quantity. Processing the streams sequentially and modifying the parameters in between is possible, but I worry about unintended interactions between steps.
PS: Can someone with sufficient reputation please create the symfit tag?

Excellent question. In principle this is because Parameter objects are a subclass of sympy.Symbol, and from its docstring:
Symbols are identified by name and assumptions:
>>> from sympy import Symbol
>>> Symbol("x") == Symbol("x")
True
>>> Symbol("x", real=True) == Symbol("x", real=False)
False
This is fundamental to the inner working of sympy, and therefore something we also use in symfit. But the value and fixed arguments are not viewed as assumptions, so they are not used to distinguish parameters.
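To illustrate this with symfit directly (a minimal sketch; exact behavior may vary between symfit versions):
import symfit as sf

p1 = sf.Parameter('b', value=1)
p2 = sf.Parameter('b', value=2)
print(p1 == p2)  # True: identified by name and assumptions, not by value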
Now, to your question on how this would affect fitting. Like you say, working sequentially is a good solution, and one that will not have any side effects:
model = sf.Model({y: a*x + b})
b.fixed = True
fit_results = []
for b_value, xdata, ydata in datastream:
    b.value = b_value
    fit = sf.Fit(model, x=xdata, y=ydata)
    fit_results.append(fit.execute())
So there is no need to define a new Parameter every iteration; b.value will stay the same within each iteration of the loop, so there is no way this can go wrong. The only way I can imagine this going wrong is if you use threading, which would probably create race conditions. But threading is not desirable for CPU-bound tasks anyway; multiprocessing is the way to go. And in that case, separate processes will be spawned, creating separate microcosms, so there should be no problem there either.
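For completeness, a minimal sketch of what that multiprocessing route could look like (hedged: untested, and it assumes model, b and datastream are defined at module level so worker processes can find them):
import multiprocessing as mp

def fit_one(stream_item):
    b_value, xdata, ydata = stream_item
    b.value = b_value  # only affects this worker process's copy of the parameter
    return sf.Fit(model, x=xdata, y=ydata).execute()

if __name__ == '__main__':
    with mp.Pool() as pool:
        fit_results = pool.map(fit_one, datastream)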
I hope this answers your question; if not, let me know.
p.s. I'm slowly answering my way up to 1500 to make that tag, but if someone beats me to it I'd be all the happier for it of course ;)

Related

Using global parameters of z3py

I'm trying to use an SMT solver on a scheduling problem and could not find anything helpful in the documentation.
It seems the following ways of setting parameters do not have any effect on the solver.
from z3 import *
set_param(logic="QF_UFIDL")
s = Optimize() # or even Solver()
or even
from z3 import *
s = Optimize()
s.set("parallel.enable", True)
So how can I set [global] parameters effectively in z3py? To be specific, I need to set the parameters below:
parallel.enable=True
auto_config=False
smtlib2_compliant=True
logic="QF_UFIDL"
Use global parameter statements like the following on separate lines before creating Solver or Optimize object:
set_param('parallel.enable', True)
set_param('parallel.threads.max', 4) # default 10000
To set non-global parameters specific to a Solver or Optimize object, you can use the help() function to show available parameters:
o = Optimize()
o.help()
s = Solver()
s.help()
The following example shows how to set an Optimize parameter:
opt = Optimize()
opt.set(priority='pareto')
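Putting it together for the parameters in the question, a hedged sketch (parameter names can vary between z3 versions, so check set_param's error messages; note that logic is not a global parameter but is chosen when constructing the solver):
from z3 import *

set_param('parallel.enable', True)
set_param('auto_config', False)
set_param('smtlib2_compliant', True)

s = SolverFor('QF_UFIDL')  # picks the logic for this solver instance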
Use set_param, as described here: https://z3prover.github.io/api/html/namespacez3py.html#a4ae524d5f91ad1b380d8821de01cd7c3
It isn't clear what's not working for you. Are you getting an error message back? From your description, I understand that the setting does indeed take place, but you don't see any change in behavior? For that, you'll have to provide a concrete example we can look at. Note that for most parameters, the effects will only be visible with benchmarks that trigger the option, and even then it'll be hard to tell what (if any) effect it had, unless you dig into verbose log output.
Also, the parallel-solving features, which you seem to be interested in, aren't going to gain you much. See Section 9.2 of https://z3prover.github.io/papers/z3internals.html: essentially it boils down to attempting to solve the same problem with different seeds to see if one of them goes faster. If you have many cores lying around it might be worth a try, but don't expect magic out of it.

The parameter values are not updating in the Pyomo model with PySP callback

I am using PySP (Pyomo) for a stochastic optimization problem. I created a concrete model for my problem and also defined the scenarios based on the farmer's example given in
https://github.com/Pyomo/pysp/blob/main/examples/farmer/concrete/ReferenceModel.py
In the above example a pysp_instance_creation_callback() function is called for each of the scenarios. In the function, an instance of the model is cloned for each scenario so that the scenario variable (Yield in this case) is updated for each of the scenarios using instance.Yield.store_values(Yield[scenario_name]).
I followed a similar approach for my problem. However, in my case the size of the uncertain data varies for each scenario, unlike the farmer's example, where the scenarios cover just three crops (wheat, sugar, corn). For instance, my scenarios would look like this:
Scenario1 = {123, 124, 118}
Scenario2 = {117, 10}
Scenario3 = {118, 120, 125, 126}
Scenario4 = {0, 125}
...
My code snippet looks something like the one below (I have only included the relevant constraints and variables for simplicity):
# Variable:
model.nEdges = 129
model.x_ij = range(0, model.nEdges)  # line switching variable range
model.xij = Var(model.x_ij, bounds=(0, 1), within=Binary)
# Scenario parameter:
model.Fault = Param(mutable=True, initialize={123, 124, 118}, within=Any)
# Constraint:
for key, ite in model.Fault.items():
    for faulty in ite.value:
        model.c.add(model.xij[faulty] == 0)
# Scenarios:
Fault = {}
Fault['Scenario1'] = {123, 124, 118}
Fault['Scenario2'] = {120, 124, 118}
Fault['Scenario3'] = {1, 125}
# callback function to update the model parameter
def pysp_instance_creation_callback(scenario_name, node_names):
    instance = model.clone()
    instance.Fault.store_values(Fault[scenario_name])
    return instance
However, this method did not work for me. The model.Fault value remains the same for each of the scenarios as it was initialized, i.e., {123, 124, 118}. If I check the instance value for each scenario, i.e., instance.Fault.value, the values do seem to be updating (instance.Fault.value gives different values, consistent with the different scenarios), but in the output LP file for the actual model the constraints are not updated as desired, and the final solution comes out the same for each scenario, as mentioned before. I am not sure how to tackle this issue and have been stuck on this problem for days. Can anybody help me here?
The short answer is that you are not using Param correctly. Mutable Params are meant to hold scalar values that appear in the expression tree (used in either objectives or constraints). You are putting a complex data structure into the Param and then iterating over it when you create the original model, using the data in indirection (as the index of another variable). The reason that this is not working is that model.clone() makes a copy of model as it currently exists, which copies the original constraints. The constraints that you added to model.c are actually independent of the mutable Param Fault, so there is no way for Pyomo to know what / how to update the constraints when you change the values in it.
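As a hedged aside (a toy model, not from the original answer), this is the kind of use mutable Params are designed for: a scalar value that appears directly in an expression, which Pyomo can then track and update:
from pyomo.environ import ConcreteModel, Param, Var, Constraint

toy = ConcreteModel()
toy.p = Param(mutable=True, initialize=1.0)
toy.v = Var()
toy.c = Constraint(expr=toy.v <= toy.p)  # p lives in the expression tree
toy.p.set_value(2.0)  # updating p changes the constraint the solver sees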
A better approach for this specific situation would be to fix variables instead of creating additional constraints:
model.xij = Var(model.x_ij, bounds=(0, 1), within=Binary)
# Scenarios:
Fault = {}
Fault['Scenario1'] = {123, 124, 118}
Fault['Scenario2'] = {120, 124, 118}
Fault['Scenario3'] = {1, 125}
# callback function: fix the faulty variables for each scenario
def pysp_instance_creation_callback(scenario_name, node_names):
    instance = model.clone()
    for faulty in Fault[scenario_name]:
        instance.xij[faulty].fix(0)
    return instance
Before I get started, I should point out that PySP is no longer under active development. New development is all going on in mpi-sppy https://github.com/Pyomo/mpi-sppy
Returning to your issue: PySP assumes that the "shape" of the model is the same for each scenario, so you will have to do some more coding to make that assumption valid. The same is more or less true for mpi-sppy, although it has mechanisms to allow for Vars that have zero probability depending on the scenario. mpi-sppy would have a little less trouble than PySP with parameters that change size between scenarios, but it does have some components that assume that the shape of the model can be determined from an arbitrary scenario (PySP makes use of a ReferenceModel, so it strongly assumes that the shape does not change).

Using Python Ray With CPLEX Model Object

I am trying to parallelize an interaction with a Python object that is computationally expensive. I would like to use Ray to do this but so far my best efforts have failed.
The object is a CPLEX model object and I'm trying to add a set of constraints for a list of conditions.
Here's my setup:
import numpy as np
import docplex.mp.model as cpx
import ray
m = cpx.Model(name="mymodel")
def mask_array(arr, mask_val):
    array_mask = np.argwhere(arr == mask_val)
    arg_slice = [i[0] for i in array_mask]
    return arg_slice
weeks = [1,3,7,8,9]
const = 1.5
r = rate = np.array(df['r'].tolist(), dtype=np.float)
x1 = m.integer_var_list(data_indices, lb=lower_bound, ub=upper_bound)
x2 = m.dot(x1, r)
@ray.remote
def add_model_constraint(m, x2, x2sum, const):
    m.add_constraint(x2sum <= x2*const)
    return m
x2sums = []
for w in weeks:
    arg_slice = mask_array(x2, w)
    x2sum = m.dot([x2[i] for i in arg_slice], r[arg_slice])
    x2sums.append(x2sum)

#: this is the expensive part
for x2sum in x2sums:
    add_model_constraint.remote(m, x2, x2sum, const)
In a nutshell, what I'm doing is creating a model object and some variables, and then looping over a set of weeks in order to build constraints. I subset my variables, compute some dot products, and apply the constraints. I would like to be able to create the constraints in parallel because it takes a while, but so far my code just hangs and I'm not sure why.
I don't know if I should return the model object from my function, because by default the m.add_constraint method modifies the object in place. But at the same time I know Ray returns references to the remote value, so yeah, I'm not sure what's supposed to happen there.
Is this a valid use of Ray at all? Is it reasonable to expect to be able to modify a CPLEX object in this way (or any other arbitrary Python object)?
I am new to Ray, so I may be structuring this all wrong, or maybe this will never work for X, Y, and Z reasons, which would also be good to know.
The Model object is not designed to be used in parallel. You cannot add constraints from multiple threads at the same time; this will result in undefined behavior. You would need at least a lock to make sure that only one thread at a time adds constraints.
Note that parallel model building may not be a good idea at all: the order of constraints will be more or less random. On the other hand, behavior of the solver may depend on the order of constraints (this is called performance variability). So you may have a hard time reproducing certain results/behavior.
I understand the primary issue was the performance of model building.
From the code you sent, I have two suggestions to address this:
Post constraints in batches: store the constraints in a list and add them all at once using Model.add_constraints(); this should be more efficient than adding them one at a time (see the sketch at the end of this answer).
Experiment with Model.dotf() (functional-style scalar product). It avoids building auxiliary lists, passing instead a function of the key that returns the coefficient.
This method is new in Docplex version 2.12.
For example, assuming a list of 3 variables:
abc = m.integer_var_list(3, name=["a", "b", "c"])
m.dotf(abc, lambda k: k + 2)
>> docplex.mp.LinearExpression(a+2b+3c)
Model.dotf() is usually faster than Model.dot().
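A minimal sketch of the batching suggestion from the first point, reusing the names from the question (hedged, untested against your model):
constraints = [x2sum <= x2 * const for x2sum in x2sums]  # build expressions first
m.add_constraints(constraints)  # one batched call instead of a Python-level loop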

Questions related to performance/efficiency in Python/Django

I have a few questions that have been bothering me for the past few days. I'm a beginner Python/Django programmer, so I just want to clear up a few things before I dive into real product development (for Python 2.7.*).
1) Saving a value in a variable before using it in a function

for x in some_list/tuple:
    func(do_something(x))

or

for x in some_list/tuple:
    y = do_something(x)
    func(y)

Which one is faster, or which one SHOULD I use?
2) Creating a new object of a model in Django

def myview(request):
    u = User(username="xyz12", city="TA", name="xyz", ...)
    u.save()

or

def myview(request):
    d = {'username': "xyz12", 'city': "TA", 'name': "xyz", ...}
    u = User(**d)
    u.save()
3) Creating a dictionary

var = dict(key1=val1, key2=val2, ...)

or

var = {'key1': val1, 'key2': val2, ...}
4) I know .append() is faster than +=, but what if I want to append one list's elements to another?

a = [1, 2, 3]; b = [4, 5, 6]
a += b

or

for i in b:
    a.append(i)
This is a very interesting question, but I think you aren't asking it for the right reasons. The performance gained by such optimisations is negligible, especially if you're working with a small number of elements.
On the other hand, what is really important is the ease of reading the code and its clarity.
def myview(request):
    d = {'username': "xyz12", 'city': "TA", 'name': "xyz", ...}
    u = User(**d)
    u.save()
This code, for example, isn't "easy" to read and understand at first sight. It requires thinking before you find out what it actually does. Unless you need the intermediary step, don't do it.
For the 4th point, I'd go for the first solution, which is much clearer (and it avoids the function-call overhead of calling the same function in a loop). You could also use more specialised functions for better performance, such as reduce (see this answer: https://stackoverflow.com/a/11739570/3768672 and this thread as well: What is the fastest way to merge two lists in python?).
The 1st and 3rd points are usually up to what you prefer, as both options are really similar and will probably be optimised when compiled to bytecode anyway.
If you really want to optimise more your code, I advise you to go check this out : https://wiki.python.org/moin/PythonSpeed/PerformanceTips
PS: Ultimately, you can still do your own tests. Write two functions doing the exact same thing with the two different methods you want to test, measure the execution times of these methods, and compare them (be careful: run the tests multiple times to reduce the uncertainty).
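For instance, a rough sketch of such a micro-benchmark with the standard timeit module (only a ballpark figure, since a keeps growing across iterations):
import timeit

setup = "a = list(range(1000)); b = list(range(1000))"

t_concat = timeit.timeit("a += b", setup=setup, number=1000)
t_append = timeit.timeit("for i in b: a.append(i)", setup=setup, number=1000)

print("a += b:      %.4fs" % t_concat)
print("append loop: %.4fs" % t_append)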

Class with too many parameters: better design strategy?

I am working with models of neurons. One class I am designing is a cell class which is a topological description of a neuron (several compartments connected together). It has many parameters but they are all relevant, for example:
number of axon segments, apical bifibrications, somatic length, somatic diameter, apical length, branching randomness, branching length and so on and so on... there are about 15 parameters in total!
I can set all of these to some default value, but then my class looks crazy with several lines just for parameters. This kind of thing must happen occasionally to other people too; is there some obvious better way to design this, or am I doing the right thing?
UPDATE:
As some of you have asked, I have attached my code for the class. As you can see, this class has a huge number of parameters (>15), but they are all used and are necessary to define the topology of a cell. The problem, essentially, is that the physical object they create is very complex. I have attached an image representation of objects produced by this class. How would experienced programmers do this differently to avoid so many parameters in the definition?
class LayerV(__Cell):
    def __init__(self, somatic_dendrites=10, oblique_dendrites=10,
                 somatic_bifibs=3, apical_bifibs=10, oblique_bifibs=3,
                 L_sigma=0.0, apical_branch_prob=1.0,
                 somatic_branch_prob=1.0, oblique_branch_prob=1.0,
                 soma_L=30, soma_d=25, axon_segs=5, myelin_L=100,
                 apical_sec1_L=200, oblique_sec1_L=40, somadend_sec1_L=60,
                 ldecf=0.98):
        import random
        import math
        # make the main regions:
        axon = Axon(n_axon_seg=axon_segs)
        soma = Soma(diam=soma_d, length=soma_L)
        main_apical_dendrite = DendriticTree(
            bifibs=apical_bifibs, first_sec_L=apical_sec1_L,
            L_sigma=L_sigma, L_decrease_factor=ldecf,
            first_sec_d=9, branch_prob=apical_branch_prob)
        # make the somatic dendrites
        somatic_dends = self.dendrite_list(
            num_dends=somatic_dendrites, bifibs=somatic_bifibs,
            first_sec_L=somadend_sec1_L, first_sec_d=1.5, L_sigma=L_sigma,
            branch_prob=somatic_branch_prob, L_decrease_factor=ldecf)
        # make oblique dendrites:
        oblique_dends = self.dendrite_list(
            num_dends=oblique_dendrites, bifibs=oblique_bifibs,
            first_sec_L=oblique_sec1_L, first_sec_d=1.5, L_sigma=L_sigma,
            branch_prob=oblique_branch_prob, L_decrease_factor=ldecf)
        # connect axon to soma:
        axon_section = axon.get_connecting_section()
        self.soma_body = soma.body
        soma.connect(axon_section, region_end=1)
        # connect apical dendrite to soma:
        apical_dendrite_firstsec = main_apical_dendrite.get_connecting_section()
        soma.connect(apical_dendrite_firstsec, region_end=0)
        # connect oblique dendrites to apical first section:
        for dendrite in oblique_dends:
            # for now connecting randomly, but this should be done on some linspace
            apical_location = math.exp(-5*random.random())
            apsec = dendrite.get_connecting_section()
            apsec.connect(apical_dendrite_firstsec, apical_location, 0)
        # connect dendrites to soma:
        for dend in somatic_dends:
            dendsec = dend.get_connecting_section()
            # for now connecting randomly, but this should be done on some linspace
            soma.connect(dendsec, region_end=random.random())
        # assign public sections
        self.axon_iseg = axon.iseg
        self.axon_hill = axon.hill
        self.axon_nodes = axon.nodes
        self.axon_myelin = axon.myelin
        self.axon_sections = [axon.hill] + [axon.iseg] + axon.nodes + axon.myelin
        self.soma_sections = [soma.body]
        self.apical_dendrites = main_apical_dendrite.all_sections + self.seclist(oblique_dends)
        self.somatic_dendrites = self.seclist(somatic_dends)
        self.dendrites = self.apical_dendrites + self.somatic_dendrites
        self.all_sections = self.axon_sections + [self.soma_sections] + self.dendrites
UPDATE: This approach may be suited to your specific case, but it definitely has its downsides; see Is kwargs an antipattern?
Try this approach:
class Neuron(object):
    def __init__(self, **kwargs):
        prop_defaults = {
            "num_axon_segments": 0,
            "apical_bifibrications": "fancy default",
            ...
        }
        for (prop, default) in prop_defaults.iteritems():
            setattr(self, prop, kwargs.get(prop, default))
You can then create a Neuron like this:
n = Neuron(apical_bifibrications="special value")
I'd say there is nothing wrong with this approach - if you need 15 parameters to model something, you need 15 parameters. And if there's no suitable default value, you have to pass in all 15 parameters when creating an object. Otherwise, you could just set the default and change it later via a setter or directly.
Another approach is to create subclasses for certain common kinds of neurons (in your example) and provide good defaults for certain values, or derive the values from other parameters.
Or you could encapsulate parts of the neuron in separate classes and reuse these parts for the actual neurons you model. I.e., you could write separate classes for modeling a synapse, an axon, the soma, etc.
You could perhaps use a Python "dict" object?
http://docs.python.org/tutorial/datastructures.html#dictionaries
Having so many parameters suggests that the class is probably doing too many things.
I suggest that you divide your class into several classes, each of which takes some of your parameters. That way each class is simpler and won't take so many parameters.
Without knowing more about your code, I can't say exactly how you should split it up.
Looks like you could cut down the number of arguments by constructing objects such as Axon, Soma and DendriticTree outside of the LayerV constructor, and passing those objects instead.
Some of the parameters are only used in constructing, e.g., the DendriticTree; others are used in other places as well, so the problem is not as clear-cut, but I would definitely try that approach.
Could you supply some example code of what you are working on? It would help us get an idea of what you are doing, and help us get answers to you sooner.
If it's just the arguments you are passing to the class that make it long, you don't have to put everything in __init__. You can set the parameters after you create the instance, or pass a dictionary/class full of the parameters as an argument.
class MyClass(object):
    def __init__(self, **kwargs):
        # set the defaults on self so the hasattr() check below can see them
        self.arg1 = None
        self.arg2 = None
        self.arg3 = None
        for (key, value) in kwargs.iteritems():
            if hasattr(self, key):
                setattr(self, key, value)

if __name__ == "__main__":
    a_class = MyClass()
    a_class.arg1 = "A string"
    a_class.arg2 = 105
    a_class.arg3 = ["List", 100, 50.4]

    b_class = MyClass(arg1="Astring", arg2=105, arg3=["List", 100, 50.4])
After looking over your code and realizing I have no idea how any of those parameters relate to each other (solely because of my lack of knowledge on the subject of neuroscience), I would point you to a very good book on object-oriented design. Building Skills in Object Oriented Design by Steven F. Lott is an excellent read, and I think it would help you, and anyone else, in laying out object-oriented programs.
It is released under the Creative Commons License, so it is free for you to use; here is a link to it in PDF format: http://homepage.mac.com/s_lott/books/oodesign/build-python/latex/BuildingSkillsinOODesign.pdf
I think your problem boils down to the overall design of your classes. Sometimes, though very rarely, you need a whole lot of arguments to initialize an object, and most of the responses here have detailed other ways of initialization, but in a lot of cases you can break the class up into smaller, easier-to-handle, less cumbersome classes.
This is similar to the other solutions that iterate through a default dictionary, but it uses a more compact notation:
class MyClass(object):
    def __init__(self, **kwargs):
        self.__dict__.update(dict(
            arg1=123,
            arg2=345,
            arg3=678,
        ), **kwargs)
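Usage is then a quick one-liner (a small sketch; the attribute names follow the defaults above):
obj = MyClass(arg2=999)
print(obj.arg1, obj.arg2, obj.arg3)  # 123 999 678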
Can you give a more detailed use case? Maybe a prototype pattern will work:
If there are some similarities in groups of objects, a prototype pattern might help. Do you have a lot of cases where one population of neurons is just like another except different in some way? (I.e., rather than having a small number of discrete classes, you have a large number of classes that slightly differ from each other.)
Python is a class-based language, but just as you can simulate class-based programming in a prototype-based language like Javascript, you can simulate prototypes by giving your class a clone() method that creates a new object and populates its ivars from the parent. Write the clone method so that keyword parameters passed to it override the "inherited" parameters, so you can call it with something like:
new_neuron = old_neuron.clone( branching_length=n1, branching_randomness=r2 )
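A minimal sketch of such a clone method (hedged; this Neuron is a stand-in that keeps its parameters in a dict, not the asker's real class):
class Neuron(object):
    def __init__(self, **params):
        self.params = params

    def clone(self, **overrides):
        new_params = dict(self.params)
        new_params.update(overrides)  # keyword args override the "inherited" values
        return Neuron(**new_params)

old_neuron = Neuron(branching_length=10, branching_randomness=0.5)
new_neuron = old_neuron.clone(branching_length=12)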
I have never had to deal with this situation, or this topic. Your description implies to me that you may find, as you develop the design, that there are a number of additional classes that will become relevant - compartment is the most obvious. If these do emerge as classes in their own right, it is probable that some of your parameters become parameters of these additional classes.
You could create a class for your parameters.
Instead of passing a bunch of parameters, you pass one object.
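For example (a hypothetical sketch; AxonParams is not from the question's code, and the defaults are taken from it):
class AxonParams(object):
    def __init__(self, n_segments=5, myelin_L=100):
        self.n_segments = n_segments
        self.myelin_L = myelin_L

axon_params = AxonParams(n_segments=8)  # override only what differs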
In my opinion, in your case the easy solution is to pass higher-order objects as parameters.
For example, in your __init__ you have a DendriticTree that uses several arguments from your main class LayerV:
main_apical_dendrite = DendriticTree(
bifibs=apical_bifibs,
first_sec_L=apical_sec1_L,
L_sigma=L_sigma,
L_decrease_factor=ldecf,
first_sec_d=9,
branch_prob=apical_branch_prob
)
Instead of passing these 6 arguments to your LayerV you would pass the DendriticTree object directly (thus saving 5 arguments).
You probably want to have these values accessible everywhere, therefore you will have to save this DendriticTree:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite, ...):
        self.main_apical_dendrite = main_apical_dendrite
If you want to have a default value too, you can have:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite=None, ...):
        self.main_apical_dendrite = main_apical_dendrite or DendriticTree()
This way you delegate what the default DendriticTree should be to the class dedicated to that matter, instead of having this logic in the higher-order class LayerV.
Finally, when you need to access the apical_bifibs you used to pass to LayerV, you just access it via self.main_apical_dendrite.bifibs.
In general, even if the class you are creating is not a clear composition of several classes, your goal is to find a logical way to split your parameters: not only to make your code cleaner, but mostly to help people understand what these parameters will be used for. In the extreme cases where you can't split them, I think it's totally OK to have a class with that many parameters. If there is no clear way to split the arguments, you'll probably end up with something even less clear than a list of 15 arguments.
If you feel like creating a class to group parameters together is overkill, then you can simply use collections.namedtuple which can have default values as shown here.
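A small sketch of that idea (note: the defaults= argument requires Python 3.7+; SomaParams is a hypothetical grouping of the soma-related parameters from the question):
from collections import namedtuple

SomaParams = namedtuple('SomaParams', ['length', 'diam'], defaults=[30, 25])

params = SomaParams(diam=20)  # override one field, keep the other default
print(params.length, params.diam)  # 30 20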
I want to reiterate what a number of people have said: there's nothing wrong with that number of parameters, especially when it comes to scientific computing/programming.
Take, for example, sklearn's KMeans++ clustering implementation, which has 11 parameters you can init with. There are numerous examples like that, and there is nothing wrong with them.
I would say there is nothing wrong as long as you make sure you need those params. If you really want to make it more readable, I would recommend the following style.
I wouldn't say it's a best practice or anything; it just makes it easy for others to know what is necessary for this object and what is optional.
class LayerV(__Cell):
    # author: {name, url} who made this info
    def __init__(self, no_default_params, some_necessary_params):
        self.necessary_param = some_necessary_params
        self.no_default_param = no_default_params
        self.something_else = "default"
        self.some_option = "default"

    def b_option(self, value):
        self.some_option = value
        return self

    def b_else(self, value):
        self.something_else = value
        return self
I think the benefits of this style are:
You can easily tell which params are necessary from the __init__ method.
Unlike with a setter, you don't need two lines to construct the object when you need to set an option value.
The disadvantage is that you create more methods in your class than before.
sample:
la = LayerV("no_default", "necessary").b_else("sample_else")
After all, if you have a lot of "necessary" and "no_default" params, always think about whether the class (or method) is doing too many things.
If your answer is no, just go ahead.
