In Python, I quite often generate a pickle file to preserve the work I have done during a programming session.
Is there any way to store something like a docstring in the pickle that explains how the pickle was generated and what its meaning is?
Because you can combine all kinds of items into dictionaries, tuples, and lists before pickling them, I would say the most straightforward solution would be to use a dictionary that has a docstring key.
pickle_dict = {'objs': [some, stuff, inhere],
               'docstring': 'explanation of those objects'}
Of course, depending on what you are pickling, you may want key-value pairs for each object instead of a list of objects.
When you open the pickle back up, you can just read the docstring to remember how this pickle came to be.
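A minimal sketch of that idea (the objects, file name, and explanation text here are just placeholders):
import pickle

pickle_dict = {
    'objs': [42, 3.14, 'stuff'],  # whatever work you want to conserve
    'docstring': 'Objects produced while experimenting with the calibration code.',
}

with open('work.pkl', 'wb') as f:
    pickle.dump(pickle_dict, f)

# Later, read the docstring to remember how this pickle came to be:
with open('work.pkl', 'rb') as f:
    loaded = pickle.load(f)
print(loaded['docstring'])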
As an alternative solution: often I just need to save one or two integer values about the pickle. In that case, I choose to save them in the pickle's filename. Depending on what you are doing, this could be preferable, since you can read the "docstring" without having to unpickle the file.
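For instance (a hypothetical naming scheme; the values are made up):
n_epochs, seed = 40, 7
filename = f'results_epochs{n_epochs}_seed{seed}.pkl'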
DataFrames and lists don't typically have docstrings because they are data. The docstring specification says:
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.
You can create any of these to make a docstring associated with the process that uses your data: the main class of your module, for example.
class MyClass:
    """My docstring"""
    def __init__(self, df):
        self.df = df  # Your dataframe
        ...
Something like this seems closest to what you were asking for within the conventions of the language.
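For example, the docstring travels with the class when instances are pickled and unpickled (a small sketch; it assumes MyClass is importable wherever you unpickle):
import pickle

obj = pickle.loads(pickle.dumps(MyClass(df=None)))
print(type(obj).__doc__)  # 'My docstring'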
Let's say I need to write incoming data into a dataset on the cloud.
When, where, and whether I will need the dataset in my code depends on the data coming in.
I only want to get a reference to the dataset once.
What is the best way to achieve this?
Initialize as global variable at start and access through global variable
if __name__ == "__main__":
    dataset = ...  # get dataset from internet
This seems like the simplest way, but initializes the variable even if it is never needed.
Get reference first time the dataset is needed, save in global variable, and access with get_dataset() method
dataset = None

def get_dataset():
    global dataset
    if dataset is None:
        dataset = ...  # get dataset from internet
    return dataset
Get reference first time the dataset is needed, save as function attribute, and access with get_dataset() method
def get_dataset():
    if not hasattr(get_dataset, 'dataset'):
        get_dataset.dataset = ...  # get dataset from internet
    return get_dataset.dataset
Any other way?
The typical way to do what you want is to wrap the service that fetches the data in a class:
class MyService:
    dataset = None

    def get_data(self):
        if self.dataset is None:
            self.dataset = get_my_data()
        return self.dataset
Then you instantiate it once in your main and use it wherever you need it.
if __name__ == "__main__":
    data_service = MyService()
    data = data_service.get_data()
    # or pass the service to whoever needs it
    my_function_that_uses_data(data_service)
The dataset variable is internal but accessible through a discoverable function. You could also use a property on the instance of the class.
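For example, the property variant might look like this (a sketch reusing the hypothetical get_my_data helper from above):
class MyService:
    def __init__(self):
        self._dataset = None

    @property
    def dataset(self):
        # fetch lazily on first access, then reuse the cached value
        if self._dataset is None:
            self._dataset = get_my_data()
        return self._dataset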
Also, using objects and classes makes things much clearer in a large project, as the functionality should be self-explanatory from the class name and methods.
Note that you can easily make this a generic service too, passing it the way to fetch data in the initialization (like a url?), so it can be re-used with different endpoints.
One caveat: avoid instantiating the class multiple times in your submodules, as opposed to once in the main; if you did, the data would be fetched and stored separately for each instance. On the other hand, you can pass the one instance of the class to a submodule and only fetch the data when it's needed (i.e., it may never be fetched if your submodule never needs it), whereas with all your proposed options, the dataset has to be fetched first before being passed somewhere else.
Note about your proposed options:
Initializing in the if __name__ == '__main__' section:
The dataset is not initialized if the module is imported as a module (it is only initialized when the module is run from the shell).
You need to fetch the data to pass it somewhere else, even if you don't need it in main.
Set a global within a function.
The use of global is generally discouraged, as it is in any programming language. Modifying variables out of scope is a recipe for encountering odd behaviors. It also tends to make the code harder to test if you rely on this global which is only set in a specific workflow.
Attribute on a function
This one is a bit of an eyesore: it would certainly work, and the functionality is very similar to the class pattern I propose, but you have to admit attributes on functions are not very Pythonic. The advantage of the class is that you can initialize it in many ways, subclass it, etc., and still not fetch the data until you need it. Using a straight function is 'simpler' but much more limited.
You can also use the lru_cache decorator from the functools module for achieving the goal of running an expensive operation only once.
As long as the parameters are the same, calling the function again and again returns the same object.
https://docs.python.org/3/library/functools.html#functools.lru_cache
from functools import lru_cache

@lru_cache()
def fun(input1, input2):
    ...  # expensive operation
    return result
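For example, with a concrete (made-up) function you can check that repeated calls return the same cached object:
from functools import lru_cache

@lru_cache()
def expensive(x):
    return [x] * 3  # stands in for an expensive computation

assert expensive(2) is expensive(2)  # computed once, then cached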
Similar to MrE's answer, it is best to encapsulate the data with a wrapper.
However, I would recommend using a Python closure instead of a class.
A class should be used to encapsulate data and relevant functions that are closely related to the data. A class should be something you instantiate objects of, and those objects retain individuality. You can read more about this here
You can use closures in the following way
def get_dataset_wrapper():
    dataset = None
    def get_dataset():
        nonlocal dataset
        if dataset is None:
            dataset = ...  # get dataset from internet
        return dataset
    return get_dataset
You can use this in the following way
dataset = get_dataset_wrapper()()
If the ()() syntax bothers you, bind the inner function to a name once and call that instead. (Wrapping the double call in another function would defeat the caching, since each call to get_dataset_wrapper() creates a fresh closure.)
get_dataset = get_dataset_wrapper()
dataset = get_dataset()
I have a bunch of File objects, and a bunch of Folder objects. Each folder has a list of files. Now, sometimes I'd like to lookup which folder a certain file is in. I don't want to traverse over all folders and files, so I create a lookup dict file -> folder.
folder = Folder()
myfile = File()
folder_lookup = {}
# This is pseudocode, I don't actually reach into the Folder
# object, but have an appropriate method
folder.files.append(myfile)
folder_lookup[myfile] = folder
Now, the problem is, the files are mutable objects. My application is built around this fact: I change properties on them, and the GUI is notified and updated accordingly. Of course you can't put mutable objects in dicts. So what I tried first is to generate a hash based on the current content, basically:
def __hash__(self):
    return hash((self.title, ...))
This didn't work, of course, because when the object's contents changed, its hash (and thus its identity) changed, and everything got messed up. What I need is an object that keeps its identity although its contents change. I tried various things, like making __hash__ return id(self), overriding __eq__, and so on, but never found a satisfying solution. One complication is that the whole construction should be picklable, which means I'd have to store the id on creation, since it could change when pickling, I guess.
So I basically want to use the identity of an object (not its state) to quickly look up data related to the object. I've actually found a really nice pythonic workaround for my problem, which I might post shortly, but I'd like to see if someone else comes up with a solution.
I felt dirty writing this. Just put folder as an attribute on the file.
class dodgy(list):
    def __init__(self, title):
        self.title = title
        super().__init__()
        # a throwaway class object, unique per instance, so the hash
        # is based on identity rather than content
        self.store = type("store", (object,), {"blanket": self})

    def __hash__(self):
        return hash(self.store)
innocent_d = {}
dodge_1 = dodgy("dodge_1")
dodge_2 = dodgy("dodge_2")
innocent_d[dodge_1] = dodge_1.title
innocent_d[dodge_2] = dodge_2.title
print(innocent_d[dodge_1])
dodge_1.extend(range(5))
dodge_1.title = "oh no"
print(innocent_d[dodge_1])
OK, everybody noticed the extremely obvious workaround (that took me some days to come up with): just put an attribute on File that tells you which folder it is in. (Don't worry, that is also what I did.)
But, it turns out that I was working under wrong assumptions. You are not supposed to use mutable objects as keys, but that doesn't mean you can't (diabolic laughter)! The default implementation of __hash__ returns a unique value, probably derived from the object's address, that remains constant in time. And the default __eq__ follows the same notion of object identity.
So you can put mutable objects in a dict, and they work as expected (if you expect equality based on instance, not on value).
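A quick demonstration of that default behavior (using a bare class, not the asker's File):
class File:
    pass

f = File()
d = {f: 'folder1'}
f.title = 'changed'   # mutate the object freely
print(d[f])           # still found: default __hash__/__eq__ use identity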
See also: I'm able to use a mutable object as a dictionary key in python. Is this not disallowed?
I was having problems because I was pickling/unpickling the objects, which of course changed the hashes. One could generate a unique ID in the constructor, and use that for equality and deriving a hash to overcome this.
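A minimal sketch of that idea (the File class here is hypothetical):
import uuid

class File:
    def __init__(self):
        self._uid = uuid.uuid4()  # stored as data, so it survives pickling

    def __hash__(self):
        return hash(self._uid)

    def __eq__(self, other):
        return isinstance(other, File) and self._uid == other._uid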
(For the curious, as to why such a "lookup based on instance identity" dict might be necessary: I've been experimenting with a kind of "object database". You have pure Python objects, put them in lists/containers, and can define indexes on attributes for faster lookup, complex queries and so on. For foreign keys (1:n relationships) I can just use containers, but for the backlink I have to come up with something clever if I don't want to modify the objects on the n side.)
Disclaimer: this is perhaps a quite subjective question with no 'right' answer but I'd appreciate any feedback on best-practices and program design. So here goes:
I am writing a library where text files are read into Text objects. Now these might be initialized with a list of file-names or directly with a list of Sentence objects. I am wondering what the best / most Pythonic way to do this might be because, if I understand correctly, Python doesn't directly support method overloading.
One example I found in Scikit-Learn's feature extraction module simply passes the type of the input as an argument while initializing the object. I assume that once this parameter is set it's just a matter of handling the different cases internally:
if input == 'filename':
    ...  # glob and read files
elif input == 'content':
    ...  # do something else
While this is easy to implement, it doesn't look like a very elegant solution. So I am wondering if there is a better way to handle multiple types of inputs to initialize a class that I am overlooking.
One way is to just create classmethods with different names for the different ways of instantiating the object:
class Text(object):
    def __init__(self, data):
        # handle data in whatever "basic" form you need
        ...

    @classmethod
    def fromFiles(cls, files):
        # process a list of filenames into the form that __init__ needs
        processed_data = ...
        return cls(processed_data)

    @classmethod
    def fromSentences(cls, sentences):
        # process a list of Sentence objects into the form that __init__ needs
        processed_data = ...
        return cls(processed_data)
This way you just create one "real" or "canonical" initialization method that accepts whatever "lowest common denominator" format you want. The specialized fromXXX methods can preprocess different types of input to convert them into the form they need to be in to pass to that canonical instantiation. The idea is that you call Text.fromFiles(...) to make a Text from filenames, or Text.fromSentences(...) to make a Text from sentence objects.
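Usage would then look something like this (the filenames and Sentence objects are placeholders):
text_a = Text.fromFiles(['ch1.txt', 'ch2.txt'])
text_b = Text.fromSentences([sentence1, sentence2])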
It can also be acceptable to do some simple type-checking if you just want to accept one of a few enumerable kinds of input. For instance, it's not uncommon for a class to accept either a filename (as a string) or a file object. In that case you'd do:
def __init__(self, file):
    if isinstance(file, str):
        # If a filename string was passed in, open the file before proceeding
        file = open(file)
    # Now you can handle file as a file object
This becomes unwieldy if you have many different types of input to handle, but if it's something relatively contained like this (e.g., an object or the string "name" that can be used to get that object), it can be simpler than the first method I showed.
You can use duck typing. First you treat the arguments as if they are of type X; if that raises an exception, you assume they are of type Y, and so on:
class Text(object):
    def __init__(self, *init_vals):
        try:
            fileobjs = [open(fname) for fname in init_vals]
        except TypeError:
            # Then we consider them as file objects.
            fileobjs = init_vals
        try:
            sentences = [parse_sentences(fobj) for fobj in fileobjs]
        except TypeError:
            # Then init_vals are Sentence objects.
            sentences = fileobjs
Note that the absence of type checking means that the method actually accepts any type that implement one of the interfaces you actually use (e.g. file-like object, Sentence-like object etc.).
This method becomes quite heavy if you want to support a lot of different types, but I'd consider that bad code design anyway. Accepting more than two or three types as initializers will probably confuse any programmer who uses your class, since they will always have to think "wait, did X also accept Y, or was it Z that accepted Y...".
It's probably better to design the constructor to accept only two or three different interfaces and to provide the user with functions or classes that convert other commonly used types to those interfaces.
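For example, a small helper could convert filenames to the file-object interface up front (a sketch reusing the hypothetical parse_sentences from above):
def sentences_from_files(filenames):
    # turn filenames into whatever parse_sentences produces
    return [parse_sentences(open(fname)) for fname in filenames]

text = Text(*sentences_from_files(['a.txt', 'b.txt']))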
I am working with models of neurons. One class I am designing is a cell class which is a topological description of a neuron (several compartments connected together). It has many parameters but they are all relevant, for example:
number of axon segments, apical bifibrications, somatic length, somatic diameter, apical length, branching randomness, branching length and so on and so on... there are about 15 parameters in total!
I can set all these to some default value, but my class looks crazy with several lines of parameters. This kind of thing must happen occasionally to other people too; is there some obviously better way to design this, or am I doing the right thing?
UPDATE:
As some of you have asked, I have attached my code for the class. As you can see, this class has a huge number of parameters (>15), but they are all used and are necessary to define the topology of a cell. The problem, essentially, is that the physical object they create is very complex. I have attached an image representation of objects produced by this class. How would experienced programmers do this differently to avoid so many parameters in the definition?
class LayerV(__Cell):
    def __init__(self, somatic_dendrites=10, oblique_dendrites=10,
                 somatic_bifibs=3, apical_bifibs=10, oblique_bifibs=3,
                 L_sigma=0.0, apical_branch_prob=1.0,
                 somatic_branch_prob=1.0, oblique_branch_prob=1.0,
                 soma_L=30, soma_d=25, axon_segs=5, myelin_L=100,
                 apical_sec1_L=200, oblique_sec1_L=40, somadend_sec1_L=60,
                 ldecf=0.98):
        import random
        import math

        # make the main regions:
        axon = Axon(n_axon_seg=axon_segs)
        soma = Soma(diam=soma_d, length=soma_L)
        main_apical_dendrite = DendriticTree(
            bifibs=apical_bifibs, first_sec_L=apical_sec1_L,
            L_sigma=L_sigma, L_decrease_factor=ldecf,
            first_sec_d=9, branch_prob=apical_branch_prob)

        # make the somatic dendrites:
        somatic_dends = self.dendrite_list(
            num_dends=somatic_dendrites,
            bifibs=somatic_bifibs, first_sec_L=somadend_sec1_L,
            first_sec_d=1.5, L_sigma=L_sigma,
            branch_prob=somatic_branch_prob, L_decrease_factor=ldecf)

        # make the oblique dendrites:
        oblique_dends = self.dendrite_list(
            num_dends=oblique_dendrites,
            bifibs=oblique_bifibs, first_sec_L=oblique_sec1_L,
            first_sec_d=1.5, L_sigma=L_sigma,
            branch_prob=oblique_branch_prob, L_decrease_factor=ldecf)

        # connect axon to soma:
        axon_section = axon.get_connecting_section()
        self.soma_body = soma.body
        soma.connect(axon_section, region_end=1)

        # connect apical dendrite to soma:
        apical_dendrite_firstsec = main_apical_dendrite.get_connecting_section()
        soma.connect(apical_dendrite_firstsec, region_end=0)

        # connect oblique dendrites to the first apical section:
        for dendrite in oblique_dends:
            # for now connecting randomly, but this should be done on some linspace
            apical_location = math.exp(-5 * random.random())
            apsec = dendrite.get_connecting_section()
            apsec.connect(apical_dendrite_firstsec, apical_location, 0)

        # connect dendrites to soma:
        for dend in somatic_dends:
            dendsec = dend.get_connecting_section()
            # for now connecting randomly, but this should be done on some linspace
            soma.connect(dendsec, region_end=random.random())

        # assign public sections
        self.axon_iseg = axon.iseg
        self.axon_hill = axon.hill
        self.axon_nodes = axon.nodes
        self.axon_myelin = axon.myelin
        self.axon_sections = [axon.hill] + [axon.iseg] + axon.nodes + axon.myelin
        self.soma_sections = [soma.body]
        self.apical_dendrites = main_apical_dendrite.all_sections + self.seclist(oblique_dends)
        self.somatic_dendrites = self.seclist(somatic_dends)
        self.dendrites = self.apical_dendrites + self.somatic_dendrites
        self.all_sections = self.axon_sections + self.soma_sections + self.dendrites
UPDATE: This approach may be suitable in your specific case, but it definitely has its downsides; see Is kwargs an antipattern?
Try this approach:
class Neuron(object):
    def __init__(self, **kwargs):
        prop_defaults = {
            "num_axon_segments": 0,
            "apical_bifibrications": "fancy default",
            # ... more defaults here
        }
        for prop, default in prop_defaults.items():
            setattr(self, prop, kwargs.get(prop, default))
You can then create a Neuron like this:
n = Neuron(apical_bifibrications="special value")
I'd say there is nothing wrong with this approach - if you need 15 parameters to model something, you need 15 parameters. And if there's no suitable default value, you have to pass in all 15 parameters when creating an object. Otherwise, you could just set the default and change it later via a setter or directly.
Another approach is to create subclasses for certain common kinds of neurons (in your example) and provide good defaults for certain values, or derive the values from other parameters.
Or you could encapsulate parts of the neuron in separate classes and reuse these parts for the actual neurons you model. I.e., you could write separate classes for modeling a synapse, an axon, the soma, etc.
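A sketch of the subclassing idea, building on the Neuron class above (the default values here are made up):
class PyramidalNeuron(Neuron):
    def __init__(self, **kwargs):
        defaults = {"num_axon_segments": 5, "apical_bifibrications": 10}
        defaults.update(kwargs)  # caller-supplied values win
        super().__init__(**defaults)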
You could perhaps use a Python "dict" object?
http://docs.python.org/tutorial/datastructures.html#dictionaries
Having so many parameters suggests that the class is probably doing too many things.
I suggest you divide your class into several classes, each of which takes some of your parameters. That way each class is simpler and won't take so many parameters.
Without knowing more about your code, I can't say exactly how you should split it up.
Looks like you could cut down the number of arguments by constructing objects such as Axon, Soma and DendriticTree outside of the LayerV constructor, and passing those objects instead.
Some of the parameters are only used in constructing, e.g., the DendriticTree, while others are used in other places as well, so the problem is not as clear-cut, but I would definitely try that approach.
Could you supply some example code of what you are working on? It would help to get an idea of what you are doing and get help to you sooner.
If it's just the arguments you are passing to the class that make it long, you don't have to put it all in __init__. You can set the parameters after you create the class, or pass a dictionary/class full of the parameters as an argument.
class MyClass(object):
    def __init__(self, **kwargs):
        self.arg1 = None
        self.arg2 = None
        self.arg3 = None
        for key, value in kwargs.items():
            if hasattr(self, key):
                setattr(self, key, value)

if __name__ == "__main__":
    a_class = MyClass()
    a_class.arg1 = "A string"
    a_class.arg2 = 105
    a_class.arg3 = ["List", 100, 50.4]

    b_class = MyClass(arg1="A string", arg2=105, arg3=["List", 100, 50.4])
After looking over your code and realizing I have no idea how any of those parameters relate to each other (solely because of my lack of knowledge on the subject of neuroscience), I would point you to a very good book on object-oriented design. Building Skills in Object Oriented Design by Steven F. Lott is an excellent read and I think would help you, and anyone else, in laying out object-oriented programs.
It is released under the Creative Commons License, so it is free for you to use; here is a link to it in PDF format: http://homepage.mac.com/s_lott/books/oodesign/build-python/latex/BuildingSkillsinOODesign.pdf
I think your problem boils down to the overall design of your classes. Sometimes, though very rarely, you need a whole lot of arguments to initialize, and most of the responses here have detailed other ways of initialization, but in a lot of cases you can break the class up into smaller, easier-to-handle, less cumbersome classes.
This is similar to the other solutions that iterate through a default dictionary, but it uses a more compact notation:
class MyClass(object):
    def __init__(self, **kwargs):
        self.__dict__.update(dict(
            arg1=123,
            arg2=345,
            arg3=678,
        ), **kwargs)
Can you give a more detailed use case? Maybe a prototype pattern will work.
If there are some similarities between groups of objects, a prototype pattern might help. Do you have a lot of cases where one population of neurons is just like another except different in some way? (I.e., rather than having a small number of discrete classes, you have a large number of classes that differ slightly from each other.)
Python is a class-based language, but just as you can simulate class-based programming in a prototype-based language like JavaScript, you can simulate prototypes by giving your class a clone method that creates a new object and populates its ivars from the parent. Write the clone method so that keyword parameters passed to it override the "inherited" parameters, so you can call it with something like:
new_neuron = old_neuron.clone(branching_length=n1, branching_randomness=r2)
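A minimal sketch of such a clone method (the parameter storage is simplified to a single dict; this is not the asker's actual class):
class Neuron:
    def __init__(self, **params):
        self.params = dict(params)

    def clone(self, **overrides):
        # keyword arguments override the "inherited" parameters
        return type(self)(**{**self.params, **overrides})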
I have never had to deal with this situation, or this topic. Your description implies to me that you may find, as you develop the design, that there are a number of additional classes that will become relevant - compartment is the most obvious. If these do emerge as classes in their own right, it is probable that some of your parameters become parameters of these additional classes.
You could create a class for your parameters.
Instead of passing a bunch of parameters, you pass one object.
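For example, the parameters object could be a simple dataclass (the field names and defaults here are illustrative, not taken from the asker's code):
from dataclasses import dataclass

@dataclass
class DendriteParams:
    bifibs: int = 10
    first_sec_L: float = 200.0
    L_sigma: float = 0.0

params = DendriteParams(L_sigma=0.5)
layer = LayerV(dendrite_params=params)  # hypothetical signature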
In my opinion, in your case the easy solution is to pass higher-order objects as parameters.
For example, in your __init__ you have a DendriticTree that uses several arguments from your main class LayerV:
main_apical_dendrite = DendriticTree(
bifibs=apical_bifibs,
first_sec_L=apical_sec1_L,
L_sigma=L_sigma,
L_decrease_factor=ldecf,
first_sec_d=9,
branch_prob=apical_branch_prob
)
Instead of passing these 6 arguments to your LayerV you would pass the DendriticTree object directly (thus saving 5 arguments).
You probably want these values to be accessible everywhere, so you will have to save this DendriticTree:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite, ...):
        self.main_apical_dendrite = main_apical_dendrite
If you want to have a default value too, you can have:
class LayerV(__Cell):
    def __init__(self, main_apical_dendrite=None, ...):
        self.main_apical_dendrite = main_apical_dendrite or DendriticTree()
This way you delegate what the default DendriticTree should be to the class dedicated to that matter, instead of having this logic in the higher-order class LayerV.
Finally, when you need to access the apical_bifibs you used to pass to LayerV you just access it via self.main_apical_dendrite.bifibs.
In general, even if the class you are creating is not a clear composition of several classes, your goal is to find a logical way to split your parameters: not only to make your code cleaner, but mostly to help people understand what these parameters will be used for. In the extreme cases where you can't split them, I think it's totally OK to have a class with that many parameters. If there is no clear way to split the arguments, then you'll probably end up with something even less clear than a list of 15 arguments.
If you feel like creating a class to group parameters together is overkill, then you can simply use collections.namedtuple which can have default values as shown here.
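A short sketch of the namedtuple option (field names are illustrative; defaults for namedtuple fields require Python 3.7+):
from collections import namedtuple

DendriteParams = namedtuple(
    'DendriteParams',
    ['bifibs', 'first_sec_L', 'L_sigma'],
    defaults=[10, 200.0, 0.0],
)

params = DendriteParams(L_sigma=0.5)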
I want to reiterate what a number of people have said: there's nothing wrong with that number of parameters, especially when it comes to scientific computing/programming.
Take, for example, sklearn's KMeans++ clustering implementation, which has 11 parameters you can init with. There are numerous examples like that, and there is nothing wrong with them.
I would say there is nothing wrong with it if you make sure you need those params. If you really want to make it more readable, I would recommend the following style.
I wouldn't say it is a best practice or anything; it just makes it easy for others to know what is necessary for this object and what is optional.
class LayerV(__Cell):
    # author: {name, url} who made this info
    def __init__(self, no_default_params, some_necessary_params):
        self.necessary_param = some_necessary_params
        self.no_default_param = no_default_params
        self.something_else = "default"
        self.some_option = "default"

    def b_option(self, value):
        self.some_option = value
        return self

    def b_else(self, value):
        self.something_else = value
        return self
I think the benefits of this style are:
You can easily see which params are necessary in the __init__ method.
Unlike with a setter, you don't need two lines to construct the object when you need to set an optional value.
The disadvantage is that you create more methods in your class than before.
sample:
la = LayerV("no_default", "necessary").b_else("sample_else")
After all, if you have a lot of "necessary" and "no default" params, always ask whether this class (or method) does too many things.
If the answer is no, just go ahead.