Counting used attributes at runtime - python

I'm working on a python project that requires me to compile certain attributes of some objects into a dataset. The code I'm currently using is something like the following:
class VectorBuilder(object):
    SIZE = 5

    def __init__(self, player, frame_data):
        self.player = player
        self.fd = frame_data

    def build(self):
        self._vector = []
        self._add(self.player)
        self._add(self.fd.getSomeData())
        self._add(self.fd.getSomeOtherData())
        char = self.fd.getCharacter()
        self._add(char.getCharacterData())
        self._add(char.getMoreCharacterData())
        assert len(self._vector) == self.SIZE
        return self._vector

    def _add(self, element):
        self._vector.append(element)
However, this code is slightly unclean because adding/removing attributes to/from the dataset also requires correctly adjusting the SIZE variable. The reason I even have the SIZE variable is that the size of the dataset needs to be known at runtime before the dataset itself is created.
I've thought of instead keeping a list of all the functions used to construct the dataset as strings (as in attributes = ['getPlayer', 'fd.getSomeData', ...]) and then defining the build function as something like:
def build(self):
    self._vector = []
    for att in attributes:
        self._vector.append(getattr(self, att)())
    return self._vector
This would let me access the size as simply len(attributes) and I only ever need to edit attributes, but I don't know how to make this approach work with the chained function calls, such as self.fd.getCharacter().getCharacterData().
Is there a cleaner way to accomplish what I'm trying to do?
EDIT:
Some additional information and clarification is necessary.
I was using __ due to some bad advice I read online (essentially saying I should use _ for module-private members and __ for class-private members). I've edited them to _ attributes now.
The getters are a part of the framework I'm using.
The vector is stored as a private class member so I don't have to pass it around the construction methods, which are in actuality more numerous than the simple _add, doing some other stuff like normalisation and bool->int conversion on the elements before adding them to the vector.
SIZE, as it currently stands, is a true constant. It is only ever given a value in the first line of VectorBuilder and never changed at runtime. I realise that I did not clarify this properly in the main post, but new attributes never get added at runtime. The adjustment I was talking about would take place at programming time. For example, if I wanted to add a new attribute, I would need to add it in the build function, e.g.:
self._add(self.fd.getCharacter().getAction().getActionData().getSpeed())
, as well as change the SIZE definition to SIZE = 6.
The attributes are compiled into what is currently a simple python list (but will probably be replaced with a numpy array), then passed into a neural network as an input vector. However, the neural network itself needs to be built first, and this happens before any data is made available (i.e. before any input vectors are created). In order to be built successfully, the neural network needs to know the size of the input vectors it will be receiving, though. This is why SIZE is necessary and also the reason for the assert statement - to ascertain that the vectors I'm passing to the network are in fact the size I claimed I would be passing to it.
I'm aware the code is unpythonic, that is why I'm here - the code works, it's just ugly.

Instead of providing the attribute names as a list of strings to build the input arguments from, why don't you initialize the build function with a list containing all the values returned by your getter functions?
Your SIZE variable would then simply be the length of the dynamic argument list passed to build(self, *args), for example.
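A related sketch, going back to the asker's list idea: storing callables (plain lambdas) instead of attribute-name strings handles chained getter calls, and SIZE can still be computed before any data exists. The EXTRACTORS name is my own invention, not part of the original code:

from frame_framework import FrameData  # hypothetical import for the framework

class VectorBuilder(object):
    # Each entry is a callable taking the builder; chained calls work
    # because the whole chain lives inside the lambda.
    EXTRACTORS = [
        lambda self: self.player,
        lambda self: self.fd.getSomeData(),
        lambda self: self.fd.getSomeOtherData(),
        lambda self: self.fd.getCharacter().getCharacterData(),
        lambda self: self.fd.getCharacter().getMoreCharacterData(),
    ]
    SIZE = len(EXTRACTORS)  # known before any frame data arrives

    def __init__(self, player, frame_data):
        self.player = player
        self.fd = frame_data

    def build(self):
        return [extract(self) for extract in self.EXTRACTORS]

Adding or removing an attribute is then a one-line edit to EXTRACTORS, and SIZE stays correct automatically.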

Related

Best way to initialize variable in a module?

Let's say I need to write incoming data into a dataset on the cloud.
When, where, and whether I will need the dataset in my code depends on the data coming in.
I only want to get a reference to the dataset once.
What is the best way to achieve this?
Initialize as global variable at start and access through global variable
if __name__ == "__main__":
    dataset = ...  # get dataset from internet
This seems like the simplest way, but initializes the variable even if it is never needed.
Get reference first time the dataset is needed, save in global variable, and access with get_dataset() method
dataset = None

def get_dataset():
    global dataset
    if dataset is None:
        dataset = ...  # get dataset from internet
    return dataset
Get reference first time the dataset is needed, save as function attribute, and access with get_dataset() method
def get_dataset():
    if not hasattr(get_dataset, 'dataset'):
        get_dataset.dataset = ...  # get dataset from internet
    return get_dataset.dataset
Any other way
The typical way to do what you want is to wrap the service call for the data in a class:
class MyService(object):
    dataset = None

    def get_data(self):
        if self.dataset is None:
            self.dataset = get_my_data()
        return self.dataset
Then you instantiate it once in your main and use it wherever you need it.
if __name__ == "__main__":
    data_service = MyService()
    data = data_service.get_data()
    # or pass the service to whoever needs it
    my_function_that_uses_data(data_service)
The dataset variable is internal but accessible through a discoverable function. You could also use a property on the instance of the class.
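For illustration, a hedged sketch of that property variant (the _dataset attribute name is my own):

class MyService(object):
    def __init__(self):
        self._dataset = None

    @property
    def data(self):
        # lazy: fetched once, on first attribute access
        if self._dataset is None:
            self._dataset = get_my_data()
        return self._dataset

Callers then just write data_service.data, and the lazy fetch stays an implementation detail.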
Also, using objects and classes makes intent much clearer in a large project, as the functionality should be self-explanatory from the class name and methods.
Note that you can easily make this a generic service too, passing it the way to fetch data in the initialization (like a url?), so it can be re-used with different endpoints.
One caveat: avoid instantiating the class multiple times in your submodules, as opposed to once in main; if you did, the data would be fetched and stored for each instance. On the other hand, you can pass the instance of the class to a submodule and only fetch the data when it's needed (i.e., it may never be fetched if your submodule never needs it), whereas with all your proposed options, the dataset needs to be fetched first before being passed somewhere else.
Note about your proposed options:
Initializing in the if __name__ == '__main__' section:
It is not initialized if the module is imported rather than run directly (the if __name__ == '__main__' block only executes when the module is run as a script).
You need to fetch the data to pass it somewhere else, even if you don't need it in main.
Set a global within a function.
The use of global is generally discouraged, as it is in any programming language. Modifying variables out of scope is a recipe for encountering odd behaviors. It also tends to make the code harder to test if you rely on this global which is only set in a specific workflow.
Attribute on a function
This one is a bit of an eyesore: it would certainly work, and the functionality is very similar to the class pattern I propose, but you have to admit attributes on functions are not very pythonic. The advantage of the class is that you can initialize it in many ways, can subclass it, etc., and yet not fetch the data until you need it. Using a straight function is 'simpler' but much more limited.
You can also use the lru_cache decorator from the functools module for achieving the goal of running an expensive operation only once.
As long as the parameters are the same, calling the function again and again returns the same object.
https://docs.python.org/3/library/functools.html#functools.lru_cache
from functools import lru_cache

@lru_cache
def fun(input1, input2):
    ...  # expensive operation
    return result
Similar to MrE's answer, it is best to encapsulate the data with a wrapper.
However, I would recommend you use a Python closure instead of a class.
A class should be used to encapsulate data and relevant functions that are closely related to the data. A class should be something that you will instantiate objects of and objects will retain individuality. You can read more about this here
You can use closures in the following way
def get_dataset_wrapper():
    dataset = None
    def get_dataset():
        nonlocal dataset
        if dataset is None:
            dataset = ...  # get dataset from internet
        return dataset
    return get_dataset
You can use this in the following way
dataset = get_dataset_wrapper()()
If the ()() syntax bothers you, bind the inner function once and reuse it:
get_dataset = get_dataset_wrapper()
dataset = get_dataset()
(Wrapping the ()() call inside another function would re-create the closure on every call and re-fetch the dataset each time, defeating the caching.)

python expression for namedtuple size

I came across the following expression in Python. I believe the expression itself is legit, and the way it's constructed actually makes sense. But still, coming from a strongly typed background, I am wondering if people typically do that in Python, i.e. using the same class to express two quite different concepts (data vs. data-size metadata).
class AttentionDecoderOutput(
        namedtuple("DecoderOutput", [
            "logits", "predicted_ids", "cell_output", "attention_scores",
            "attention_context"
        ])):
    """Augmented decoder output that also includes the attention scores.
    """
    pass
AttentionDecoderOutput defines a data structure. The output_size below defines the size of each element in the namedtuple.
def output_size(self):
    return AttentionDecoderOutput(
        logits=self.vocab_size,
        predicted_ids=tf.TensorShape([]),
        cell_output=self.cell.output_size,
        attention_scores=tf.shape(self.attention_values)[1:-1],
        attention_context=self.attention_values.get_shape()[-1])

Semantic Type Safety in Python

In my recent project I have the problem that some values are often misinterpreted. For instance, I calculate a wave as a sum of two waves (for which I need two amplitudes and two phase shifts), and then sample it at 4 points. I pass these tuples of four values to different functions, but sometimes I make the mistake of passing wave parameters instead of sample points.
These errors are hard to find, because all the calculations work without any error, but the values are totally meaningless in this context and so the results are just wrong.
What I want now is some kind of semantic type. I want to state that one function returns sample points and another function expects sample points, and that I can do nothing that conflicts with these declarations without immediately getting an error.
Is there any way to do this in python?
I would recommend implementing specific data types to be able to distinguish between different kind of information with the same structure.
You can simply subclass list for example and then do some type checking at runtime within your functions:
class WaveParameter(list):
    pass

class Point(list):
    pass

# you can use them just like lists
point = Point([1, 2, 3, 4])
wp = WaveParameter([5, 6])

# of course all methods from list are inherited
wp.append(7)
wp.append(8)

# let's check them
print(point)
print(wp)

# type checking examples
print(isinstance(point, Point))
print(isinstance(wp, Point))
print(isinstance(point, WaveParameter))
print(isinstance(wp, WaveParameter))
So you can include this kind of type checking in your functions, to make sure the correct kind of data was passed to it:
def example_function_with_waveparameter(data):
    if not isinstance(data, WaveParameter):
        log.error("received wrong parameter type (%s instead of WaveParameter)" %
                  type(data))
    # and then do the stuff
or simply assert:
def example_function_with_waveparameter(data):
    assert isinstance(data, WaveParameter)
Python's notion of a "semantic type" is called a class, but as mentioned, Python is dynamically typed, so even using custom classes instead of tuples you won't get any compile-time error; at best you'll get runtime errors if your classes are designed in such a way that trying to use one instead of the other will fail.
Now classes are not just about data, they are about behaviour too, so if you have functions that do waveform-specific computations these functions would probably become methods of the Waveform class, and idem for the Point part, and this might be enough to avoid logical errors like passing a "waveform" tuple to a function expecting a "point" tuple.
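As a hedged illustration of that suggestion (all class and method names here are assumptions, not from the question):

import math

class SamplePoints(list):
    """Marker type: values sampled from a waveform."""

class Waveform:
    def __init__(self, amplitude, phase_shift):
        self.amplitude = amplitude
        self.phase_shift = phase_shift

    def sample(self, positions):
        # only a Waveform can produce SamplePoints, so the two
        # concepts can no longer be mixed up silently
        return SamplePoints(self.amplitude * math.sin(x + self.phase_shift)
                            for x in positions)

def rms(points):
    assert isinstance(points, SamplePoints)  # fails fast on misuse
    return (sum(p * p for p in points) / len(points)) ** 0.5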
To make a long story short: if you want a statically typed functional language, Python is not the right tool (Haskell might be a better choice). If you really want / have to use Python, try using classes and methods instead of tuples and functions; it still won't detect type errors at compile time, but chances are you'll have fewer type errors AND that those errors will be detected at runtime instead of producing wrong results.

Class with too many parameters: better design strategy?

I am working with models of neurons. One class I am designing is a cell class which is a topological description of a neuron (several compartments connected together). It has many parameters but they are all relevant, for example:
number of axon segments, apical bifibrications, somatic length, somatic diameter, apical length, branching randomness, branching length and so on and so on... there are about 15 parameters in total!
I can set all these to some default value but my class looks crazy with several lines for parameters. This kind of thing must happen occasionally to other people too, is there some obvious better way to design this or am I doing the right thing?
UPDATE:
As some of you have asked I have attached my code for the class, as you can see this class has a huge number of parameters (>15) but they are all used and are necessary to define the topology of a cell. The problem essentially is that the physical object they create is very complex. I have attached an image representation of objects produced by this class. How would experienced programmers do this differently to avoid so many parameters in the definition?
class LayerV(__Cell):
    def __init__(self, somatic_dendrites=10, oblique_dendrites=10,
                 somatic_bifibs=3, apical_bifibs=10, oblique_bifibs=3,
                 L_sigma=0.0, apical_branch_prob=1.0,
                 somatic_branch_prob=1.0, oblique_branch_prob=1.0,
                 soma_L=30, soma_d=25, axon_segs=5, myelin_L=100,
                 apical_sec1_L=200, oblique_sec1_L=40, somadend_sec1_L=60,
                 ldecf=0.98):
        import random
        import math

        # make the main regions:
        axon = Axon(n_axon_seg=axon_segs)
        soma = Soma(diam=soma_d, length=soma_L)
        main_apical_dendrite = DendriticTree(
            bifibs=apical_bifibs, first_sec_L=apical_sec1_L,
            L_sigma=L_sigma, L_decrease_factor=ldecf,
            first_sec_d=9, branch_prob=apical_branch_prob)

        # make the somatic dendrites
        somatic_dends = self.dendrite_list(
            num_dends=somatic_dendrites,
            bifibs=somatic_bifibs, first_sec_L=somadend_sec1_L,
            first_sec_d=1.5, L_sigma=L_sigma,
            branch_prob=somatic_branch_prob, L_decrease_factor=ldecf)

        # make oblique dendrites:
        oblique_dends = self.dendrite_list(
            num_dends=oblique_dendrites,
            bifibs=oblique_bifibs, first_sec_L=oblique_sec1_L,
            first_sec_d=1.5, L_sigma=L_sigma,
            branch_prob=oblique_branch_prob, L_decrease_factor=ldecf)

        # connect axon to soma:
        axon_section = axon.get_connecting_section()
        self.soma_body = soma.body
        soma.connect(axon_section, region_end=1)

        # connect apical dendrite to soma:
        apical_dendrite_firstsec = main_apical_dendrite.get_connecting_section()
        soma.connect(apical_dendrite_firstsec, region_end=0)

        # connect oblique dendrites to apical first section:
        for dendrite in oblique_dends:
            apical_location = math.exp(-5 * random.random())  # for now connecting randomly but need to do this on some linspace
            apsec = dendrite.get_connecting_section()
            apsec.connect(apical_dendrite_firstsec, apical_location, 0)

        # connect dendrites to soma:
        for dend in somatic_dends:
            dendsec = dend.get_connecting_section()
            soma.connect(dendsec, region_end=random.random())  # for now connecting randomly but need to do this on some linspace

        # assign public sections
        self.axon_iseg = axon.iseg
        self.axon_hill = axon.hill
        self.axon_nodes = axon.nodes
        self.axon_myelin = axon.myelin
        self.axon_sections = [axon.hill] + [axon.iseg] + axon.nodes + axon.myelin
        self.soma_sections = [soma.body]
        self.apical_dendrites = main_apical_dendrite.all_sections + self.seclist(oblique_dends)
        self.somatic_dendrites = self.seclist(somatic_dends)
        self.dendrites = self.apical_dendrites + self.somatic_dendrites
        self.all_sections = self.axon_sections + [self.soma_sections] + self.dendrites
UPDATE: This approach may be suited to your specific case, but it definitely has its downsides; see "Is kwargs an antipattern?"
Try this approach:
class Neuron(object):
    def __init__(self, **kwargs):
        prop_defaults = {
            "num_axon_segments": 0,
            "apical_bifibrications": "fancy default",
            # ...
        }
        for (prop, default) in prop_defaults.items():
            setattr(self, prop, kwargs.get(prop, default))
You can then create a Neuron like this:
n = Neuron(apical_bifibrications="special value")
I'd say there is nothing wrong with this approach - if you need 15 parameters to model something, you need 15 parameters. And if there's no suitable default value, you have to pass in all 15 parameters when creating an object. Otherwise, you could just set the default and change it later via a setter or directly.
Another approach is to create subclasses for certain common kinds of neurons (in your example) and provide good defaults for certain values, or derive the values from other parameters.
Or you could encapsulate parts of the neuron in separate classes and reuse these parts for the actual neurons you model. I.e., you could write separate classes for modeling a synapse, an axon, the soma, etc.
You could perhaps use a Python "dict" object?
http://docs.python.org/tutorial/datastructures.html#dictionaries
Having so many parameters suggests that the class is probably doing too many things.
I suggest that you want to divide your class into several classes, each of which take some of your parameters. That way each class is simpler and won't take so many parameters.
Without knowing more about your code, I can't say exactly how you should split it up.
Looks like you could cut down the number of arguments by constructing objects such as Axon, Soma and DendriticTree outside of the LayerV constructor, and passing those objects instead.
Some of the parameters are only used in constructing, e.g., the DendriticTree, while others are used in other places as well, so the split is not as clear-cut, but I would definitely try that approach, as in the sketch below.
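A hedged sketch of what the call site might then look like (the reduced LayerV signature is an assumption; the part classes and their keyword arguments are taken from the question's code):

# build the parts up front, each with its own parameters
axon = Axon(n_axon_seg=5)
soma = Soma(diam=25, length=30)
apical = DendriticTree(bifibs=10, first_sec_L=200, L_sigma=0.0,
                       L_decrease_factor=0.98, first_sec_d=9,
                       branch_prob=1.0)

# LayerV now only wires pre-built parts together
cell = LayerV(axon=axon, soma=soma, main_apical_dendrite=apical)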
Could you supply some example code of what you are working on? It would help us get an idea of what you are doing and get help to you sooner.
If it's just the arguments you are passing to the class that make it long, you don't have to put it all in __init__. You can set the parameters after you create the class, or pass a dictionary/class full of the parameters as an argument.
class MyClass(object):
    def __init__(self, **kwargs):
        self.arg1 = None
        self.arg2 = None
        self.arg3 = None
        for (key, value) in kwargs.items():
            if hasattr(self, key):
                setattr(self, key, value)

if __name__ == "__main__":
    a_class = MyClass()
    a_class.arg1 = "A string"
    a_class.arg2 = 105
    a_class.arg3 = ["List", 100, 50.4]

    b_class = MyClass(arg1="A string", arg2=105, arg3=["List", 100, 50.4])
After looking over your code and realizing I have no idea how any of those parameters relate to each other (solely because of my lack of knowledge on the subject of neuroscience), I would point you to a very good book on object-oriented design. Building Skills in Object Oriented Design by Steven F. Lott is an excellent read, and I think it would help you, and anyone else, in laying out object-oriented programs.
It is released under the Creative Commons License, so is free for you to use, here is a link of it in PDF format http://homepage.mac.com/s_lott/books/oodesign/build-python/latex/BuildingSkillsinOODesign.pdf
I think your problem boils down to the overall design of your classes. Sometimes, though very rarely, you need a whole lot of arguments to initialize, and most of the responses here have detailed other ways of initialization, but in a lot of cases you can break the class up into smaller, easier-to-handle, less cumbersome classes.
This is similar to the other solutions that iterate through a default dictionary, but it uses a more compact notation:
class MyClass(object):
    def __init__(self, **kwargs):
        self.__dict__.update(dict(
            arg1=123,
            arg2=345,
            arg3=678,
        ), **kwargs)
Can you give a more detailed use case? If there are similarities between groups of objects, a prototype pattern might help. Do you have a lot of cases where one population of neurons is just like another except different in some way? (I.e., rather than having a small number of discrete classes, you have a large number of classes that slightly differ from each other.)
Python is a class-based language, but just as you can simulate class-based programming in a prototype-based language like JavaScript, you can simulate prototypes by giving your class a clone() method that creates a new object and populates its ivars from the parent. Write the clone method so that keyword parameters passed to it override the "inherited" parameters, so you can call it with something like:
new_neuron = old_neuron.clone(branching_length=n1, branching_randomness=r2)
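A hedged sketch of such a clone() method (the deepcopy-based approach is my assumption, not from the answer):

import copy

class Neuron(object):
    def __init__(self, **params):
        self.__dict__.update(params)

    def clone(self, **overrides):
        # copy all of this instance's attributes, then let the
        # keyword arguments override the "inherited" values
        new = copy.deepcopy(self)
        new.__dict__.update(overrides)
        return new

With this, old_neuron.clone(branching_length=n1) returns an independent object that differs from its prototype only in the overridden parameters.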
I have never had to deal with this situation, or this topic. Your description implies to me that you may find, as you develop the design, that there are a number of additional classes that will become relevant - compartment is the most obvious. If these do emerge as classes in their own right, it is probable that some of your parameters become parameters of these additional classes.
You could create a class for your parameters.
Instead of passing a bunch of parameters, you pass one object.
In my opinion, in your case the easy solution is to pass higher-order objects as parameters.
For example, in your __init__ you have a DendriticTree that uses several arguments from your main class LayerV:
main_apical_dendrite = DendriticTree(
    bifibs=apical_bifibs,
    first_sec_L=apical_sec1_L,
    L_sigma=L_sigma,
    L_decrease_factor=ldecf,
    first_sec_d=9,
    branch_prob=apical_branch_prob
)
Instead of passing these 6 arguments to your LayerV you would pass the DendriticTree object directly (thus saving 5 arguments).
You probably want to have these values accessible everywhere, therefore you will have to save this DendriticTree:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite, ...):
        self.main_apical_dendrite = main_apical_dendrite
If you want to have a default value too, you can have:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite=None, ...):
        self.main_apical_dendrite = main_apical_dendrite or DendriticTree()
This way you delegate what the default DendriticTree should be to the class dedicated to that matter, instead of having this logic in the higher-order class LayerV.
Finally, when you need to access the apical_bifibs you used to pass to LayerV you just access it via self.main_apical_dendrite.bifibs.
In general, even if the class you are creating is not a clear composition of several classes, your goal is to find a logical way to split your parameters. Not only to make your code cleaner, but mostly to help people understand what these parameters will be used for. In the extreme cases where you can't split them, I think it's totally OK to have a class with that many parameters. If there is no clear way to split the arguments, then you'll probably end up with something even less clear than a list of 15 arguments.
If you feel like creating a class to group parameters together is overkill, you can simply use collections.namedtuple, which supports default values as well (via the defaults argument in Python 3.7+).
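A hedged sketch of that option (the field names are made up for illustration):

from collections import namedtuple

# defaults apply to the rightmost fields (Python 3.7+)
SomaParams = namedtuple("SomaParams", ["length", "diameter"],
                        defaults=[30, 25])

params = SomaParams()             # SomaParams(length=30, diameter=25)
custom = SomaParams(diameter=40)  # override just one field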
I want to reiterate what a number of people have said: there's nothing wrong with that number of parameters, especially when it comes to scientific computing/programming.
Take, for example, sklearn's KMeans clustering implementation, which has 11 parameters you can init with. There are numerous examples like that, and there's nothing wrong with them.
I would say there is nothing wrong, as long as you make sure you need those params. If you really want to make it more readable, I would recommend the following style.
I wouldn't call it a best practice or anything; it just makes it easy for others to see what is necessary for this object and what is optional.
class LayerV(__Cell):
    # author: {name, url} who made this info
    def __init__(self, no_default_params, some_necessary_params):
        self.necessary_param = some_necessary_params
        self.no_default_param = no_default_params
        self.something_else = "default"
        self.some_option = "default"

    def b_option(self, value):
        self.some_option = value
        return self

    def b_else(self, value):
        self.something_else = value
        return self
I think the benefits of this style are:
You can easily see which params are necessary in the __init__ method.
Unlike with setters, you don't need two lines to construct the object if you need to set an option value.
The disadvantage is that you create more methods in your class than before.
sample:
la = LayerV("no_default", "necessary").b_else("sample_else")
After all, if you have a lot of "necessary" and "no default" params, always ask whether the class (or method) does too many things.
If the answer is no, just go ahead.

Have well-defined, narrowly-focused classes ... now how do I get anything done in my program?

I'm coding a poker hand evaluator as my first programming project. I've made it through three classes, each of which accomplishes its narrowly-defined task very well:
HandRange = a string-like object (e.g. "AA"). getHands() returns a list of tuples for each specific hand within the string:
[(Ad,Ac),(Ad,Ah),(Ad,As),(Ac,Ah),(Ac,As),(Ah,As)]
Translation = a dictionary that maps the return list from getHands to values that are useful for a given evaluator (yes, this can probably be refactored into another class).
{'As':52, 'Ad':51, ...}
Evaluator = takes a list from HandRange (as translated by Translator), enumerates all possible hand matchups and provides win % for each.
My question: what should my "domain" class for using all these classes look like, given that I may want to connect to it via either a shell UI or a GUI? Right now, it looks like an assembly line process:
user_input = HandRange()
x = Translation.translateList(user_input)
y = Evaluator.getEquities(x)
This smells funny in that it feels like it's procedural when I ought to be using OO.
In a more general way: if I've spent so much time ensuring that my classes are well defined, narrowly focused, orthogonal, whatever ... how do I actually manage work flow in my program when I need to use all of them in a row?
Thanks,
Mike
Don't make a fetish of object orientation -- Python supports multiple paradigms, after all! Think of your user-defined types, AKA classes, as building blocks that gradually give you a "language" that's closer to your domain rather than to general purpose language / library primitives.
At some point you'll want to code "verbs" (actions) that use your building blocks to perform something (under command from whatever interface you'll supply -- command line, RPC, web, GUI, ...) -- and those may be module-level functions as well as methods within some encompassing class. You'll surely want a class if you need multiple instances, and most likely also if the actions involve updating "state" (instance variables of a class being much nicer than globals) or if inheritance and/or polymorphism come into play; but there is no a priori reason to prefer classes to functions otherwise.
If you find yourself writing static methods, yearning for a singleton (or Borg) design pattern, or writing a class with no state (just methods) -- these are all "code smells" that should prompt you to check whether you really need a class for that subset of your code, or whether you may be overcomplicating things and should use a module with functions for that part of your code. (Sometimes, after due consideration, you'll unearth some different reason for preferring a class, and that's all right too, but the point is: don't just pick a class over a module with functions "by reflex", without critically thinking about it!)
You could create a Poker class that ties these all together and initialize all of that stuff in the __init__() method:
class Poker(object):
    def __init__(self, user_input=None):
        # avoid a stateful default argument by constructing it here
        self.user_input = user_input if user_input is not None else HandRange()
        self.translation = Translation.translateList(self.user_input)
        self.evaluator = Evaluator.getEquities(self.translation)
        # and so on...
p = Poker()
# etc, etc...
