I have a project in which I run multiple data sets through a specific function that "cleans" them.
The cleaning function looks like this:
Misc.py
import sys
from FileIO import FileIO  # custom file-reading class; import path assumed from the layout below

def clean(my_data):
    sys.stdout.write("Cleaning genes...\n")
    synonyms = FileIO("raw_data/input_data", 3, header=False).openSynonyms()
    clean_data = {}
    for g in list(my_data):  # copy the keys: we delete from my_data inside the loop
        if g in synonyms:
            # Found a data point which appears in the synonym list.
            for synonym in synonyms[g]:
                if synonym in my_data:
                    del my_data[synonym]
                    clean_data[g] = synonym
                    sys.stdout.write("\t%s is also known as %s\n" % (g, clean_data[g]))
    return my_data
FileIO is a custom class I made to open files.
My question is: this function will be called many times throughout the program's life cycle, and I don't want to re-read input_data every time, since it's going to be the same every time. I know that I can just return it and pass it as an argument, in this way:
def clean(my_data, synonyms=None):
    if synonyms is None:
        ...
    else:
        ...
But is there another, better-looking way of doing this?
My file structure is the following:
lib/
    Misc.py
    FileIO.py
    __init__.py
    ...
raw_data/
runme.py
From runme.py, I do from lib import * and call all the functions I made.
Is there a Pythonic way to go about this? Like a 'memory' for the function?
Edit:
This line: synonyms = FileIO("raw_data/input_data", 3, header=False).openSynonyms() returns a collections.OrderedDict() built from input_data, using the 3rd column as the key of the dictionary.
The dictionary for the following dataset:
column1 column2 key data
... ... A B|E|Z
... ... B F|W
... ... C G|P
...
Will look like this:
OrderedDict([('A',['B','E','Z']), ('B',['F','W']), ('C',['G','P'])])
This tells my script that A is also known as B, E, and Z; B as F and W; etc. So these are the synonyms. Since the synonyms list will never change throughout the life of the program, I want to read it just once and re-use it.
Use a class with a __call__ method. You can call objects of this class and store data between calls in the object; data that never changes is probably best set up in the constructor. What you've made this way is known as a 'functor' or 'callable object'.
Example:
class Incrementer:
    def __init__(self, increment):
        self.increment = increment

    def __call__(self, number):
        return self.increment + number

incrementerBy1 = Incrementer(1)
incrementerBy2 = Incrementer(2)
print(incrementerBy1(3))
print(incrementerBy2(3))
Output:
4
5
[EDIT]
Note that you can combine @Tagc's answer with mine to create exactly what you're looking for: a 'function' with built-in memory.
Name your class Clean rather than DataCleaner, name the instance clean, and name the method __call__ rather than clean.
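A minimal sketch of that combination, reusing the FileIO class and the cleaning loop from the question (error handling omitted):

class Clean:
    def __init__(self, synonym_file):
        # Read the synonyms once, at construction time.
        self.synonyms = FileIO(synonym_file, 3, header=False).openSynonyms()

    def __call__(self, data):
        for g in list(data):  # copy the keys: we delete from data below
            if g in self.synonyms:
                for synonym in self.synonyms[g]:
                    if synonym in data:
                        del data[synonym]
        return data

clean = Clean("raw_data/input_data")
clean(my_data)    # no file read here...
clean(more_data)  # ...nor here; the synonyms are remembered by the object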
Like a 'memory' for the function
Half-way to rediscovering object-oriented programming.
Encapsulate the data cleaning logic in a class, such as DataCleaner. Make it so that instances read synonym data once when instantiated and then retain that information as part of their state. Have the class expose a clean method that operates on the data:
class FileIO(object):
    # Stub standing in for the asker's custom file-reading class.
    def __init__(self, file_path, some_num, header):
        pass

    def openSynonyms(self):
        return []


class DataCleaner(object):
    def __init__(self, synonym_file):
        self.synonyms = FileIO(synonym_file, 3, header=False).openSynonyms()

    def clean(self, data):
        for g in data:
            if g in self.synonyms:
                # ...
                pass


if __name__ == '__main__':
    dataCleaner = DataCleaner('raw_data/input_file')
    dataCleaner.clean('some data here')
    dataCleaner.clean('some more data here')
As a possible future optimisation, you can expand on this approach to use a factory method to create instances of DataCleaner which can cache instances based on the synonym file provided (so you don't need to do expensive recomputation every time for the same file).
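A sketch of that caching factory (the names here are illustrative, not part of the answer's code):

_cleaner_cache = {}

def get_cleaner(synonym_file):
    # Build at most one DataCleaner per synonym file.
    if synonym_file not in _cleaner_cache:
        _cleaner_cache[synonym_file] = DataCleaner(synonym_file)
    return _cleaner_cache[synonym_file]

cleaner_a = get_cleaner('raw_data/input_file')
cleaner_b = get_cleaner('raw_data/input_file')
assert cleaner_a is cleaner_b  # same instance; the file was read only once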
I think the cleanest way to do this would be to decorate your "clean" (pun intended) function with another function that provides the synonyms as a local for the function. This is, IMO, cleaner and more concise than creating another custom class, yet it still allows you to easily change the input_data file if you need to (factory function):
def defineSynonyms(datafile):
    def wrap(func):
        # Read the synonym file once, when clean() is decorated,
        # rather than on every call.
        synonyms = FileIO(datafile, 3, header=False).openSynonyms()
        def wrapped(*args, **kwargs):
            kwargs['synonyms'] = synonyms
            return func(*args, **kwargs)
        return wrapped
    return wrap

@defineSynonyms("raw_data/input_data")
def clean(my_data, synonyms={}):
    # do stuff with synonyms and my_data...
    pass
I'm brand new to classes and I don't really know when to use them. I want to write a program for the simulation of EPR/NMR spectra which requires information about the simulated system. The relevant thing is this: I have a function called rel_inty(I_n,N) that calculates this relevant information from two values. The problem is that it becomes very slow when either of these values becomes large (I_n,N >= 10). That's why I opted to calculate rel_inty(I_n,N) beforehand for the most relevant combinations of (I_n,N) and save them in a dictionary. I write that dictionary to a file and can import it using eval(), since calculating rel_inty(I_n,N) dynamically on each execution would be way too slow.
Now I had the following idea: what if I create a class manage_Dict(), whose methods can either recreate a basic dictionary with a def basic(): method, in case the old file somehow gets deleted, or expand the existing one with a def expand(): method, if the basic one doesn't contain a user-specified combination of (I_n,N)?
This would be the outline of that class:
import os

class manage_Dict():

    def rel_inty(I_n, N):
        '''calculates relative intensities for a combination (I_n,N)'''

    def basic():
        '''creates a dict for preset combinations of I_n,N'''
        with open('SpinSys.txt', 'w') as outf:
            Dict = {}
            I_n_List = [somevalues]
            N_List = [somevalues]
            for I_n in I_n_List:
                Dict[I_n] = {}
                for N in N_List:
                    Dict[I_n][N] = rel_inty(I_n, N)
            outf.write(str(Dict))

    def expand(*args):
        '''expands the existing dict for all tuples (I_n,N) in *args'''
        with open('SpinSys.txt', 'r') as outf:
            Dict = eval(outf.read())
        for tup in args:
            I_n = tup[0]
            N = tup[1]
            Dict[I_n][N] = rel_inty(I_n, N)
        os.remove('SpinSys.txt')
        with open('SpinSys.txt', 'w') as outf:
            outf.write(str(Dict))
Usage:
'''Recreate SpinSys.txt if lost'''
manage_Dict.basic()
'''Expand SpinSys.txt in case of missing (I_n,N)'''
manage_Dict.expand((10,5),(11,3),(2,30))
Would this be a sensible solution? I'm wondering because I usually see classes that use self and __init__ to create an object instance, rather than just managing function calls.
If we are going to make use of an object, let's make sure it's doing some useful work for us and the interface is nicer than just using functions. I'm going to suggest a few big tweaks that will make life easier:
We can subclass dict itself, and then our object is a dict, as well as all our custom fancy stuff.
Use JSON instead of text files, so we can quickly, naturally and safely serialise and deserialise.
import json

class SpectraDict(dict):

    PRE_CALC_I_N = ["...somevalues..."]
    PRE_CALC_N = ["...somevalues..."]

    def rel_inty(self, i_n, n):
        # Calculate and store results from the main function
        if i_n not in self:
            self[i_n] = {}
        if n not in self[i_n]:
            self[i_n][n] = self._calculate_rel_inty(i_n, n)
        return self[i_n][n]

    def _calculate_rel_inty(self, i_n, n):
        # Some exciting calculation here instead...
        return 0

    def pre_calculate(self):
        for i_n in self.PRE_CALC_I_N:
            for n in self.PRE_CALC_N:
                # Force the dict to calculate and store the values
                self.rel_inty(i_n, n)
        return self

    @classmethod
    def load(cls, json_file):
        with open(json_file) as fh:
            return cls(json.load(fh))

    def save(self, json_file):
        with open(json_file, 'w') as fh:
            json.dump(self, fh)
        return self
Now when we ask for values using the rel_inty() function, the answer is immediately stored in the dict itself before being handed back. This is called memoization / caching. So, to pre-fill our object with the pre-calculated values, we just need to ask it for lots of answers and it will store them.
After that we can either load or save quite naturally using JSON:
# Bootstrapping from scratch:
s_dict = SpectraDict().pre_calculate().save('spin_sys.json')
# Loading and updating with new values
s_dict = SpectraDict.load('spin_sys.json')
s_dict.rel_inty(10, 45) # All your new calculations here...
s_dict.save('spin_sys.json')
Edit:
This question has been marked as a duplicate, but I don't think that it is. Implementing the suggested answer, that is, to use the Mapping ABC, does not have the behavior I would like:
from collections import Mapping

class data(Mapping):
    def __init__(self, params):
        self.params = params

    def __getitem__(self, k):
        print "getting", k
        return self.params[k]

    def __len__(self):
        return len(self.params)

    def __iter__(self):
        return (k for k in self.params.keys())

def func(*args, **kwargs):
    print "In func"
    return None

ps = data({"p1": 1., "p2": 2.})

print "\ncalling...."
func(ps)

print "\ncalling...."
func(**ps)
Output:
calling....
In func

calling....
getting p2
getting p1
In func
Which, as mentioned in the question, is not what I want.
The other solution, given in the comments, is to modify the routines that are causing problems. That will certainly work; however, I was looking for a quick (lazy?) fix!
Question:
How can I implement the ** operator for a class, other than via __getitem__? For example, I would like to be able to do this:
def func(**kwargs):
    <do some clever stuff>

x = some_generic_class()
func(**x)
without an implicit call to some_generic_class.__getitem__(). In my application I have already implemented __getitem__ with some data logging which I do not want to perform when the class is referenced as above.
If it's not possible to overload the ** operator, is it possible to detect when __getitem__ is being called as a result of the class being passed to a function, rather than explicitly?
Background:
I am working on a physics model that is built out of a set of packages which are chosen according to user input at runtime. The flexible structure of the model means that I rarely know the required parameters, and so I pass a dict of parameter names and values between the models. In order to make this more user-friendly, I am now trying to develop a class paramlist that overloads the dict functionality with a set of routines that do some consistency checking, set default values, etc. The idea is that I pass an instance of paramlist rather than a dict. One of the more important aims is to keep a log of which members of paramlist have been referenced by the physics packages and which ones have not. A stripped-out version is below, which aims to maintain a second dict that logs whether a parameter has been referenced:
from copy import copy

class paramlist(object):
    def __init__(self, params):
        self.params = copy(params)
        self.used = {k: False for k in self.params}

    def __getitem__(self, k):
        try:
            v = self.params[k]
        except KeyError:
            raise KeyError("Parameter {} not in parameter list".format(k))
        else:
            self.used[k] = True
            return v

    def __setitem__(self, k, v):
        self.params[k] = v
        self.used[k] = False
Which does not have the behaviour I want:
ps = paramlist({"p1": 1.})

def donothing(*args, **kwargs):
    return None

donothing(ps)
print ps.used["p1"]
donothing(**ps)
print ps.used["p1"]
Output:
False
True
I would like the used dict to remain False in both cases, so that I can tell the user that one of their parameters was not used (implying that they screwed up and a default value has been used instead). I presume that the ** case has the effect of calling __getitem__ on every entry in the paramlist.
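A minimal probe (a hypothetical class, for illustration only) confirms that presumption: ** unpacking asks the object for its keys() and then calls __getitem__ for each key:

class Probe(object):
    def keys(self):
        print "keys() called"
        return ["p1"]

    def __getitem__(self, k):
        print "__getitem__(%r) called" % k
        return 1.

def f(**kwargs):
    pass

f(**Probe())
# keys() called
# __getitem__('p1') called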
I have a class where I want to initialize an attribute self.listN and an add_to_listN method for each element of a list, e.g. from attrs = ['list1', 'list2'] I want list1 and list2 to be initialized as empty lists and the methods add_to_list1 and add_to_list2 to be created. Each add_to_listN method should take two parameters, say value and unit, and append a tuple (value, unit) to the corresponding listN.
The class should therefore look like this in the end:
class Foo():
    def __init__(self):
        self.list1 = []
        self.list2 = []

    def add_to_list1(self, value, unit):
        self.list1.append((value, unit))

    def add_to_list2(self, value, unit):
        self.list2.append((value, unit))
Leaving aside all the checks and the rest of the class, I came up with this:
class Foo():
    def __init__(self):
        for attr in ['list1', 'list2']:
            setattr(self, attr, [])
            setattr(self, 'add_to_%s' % attr, self._simple_add(attr))

    def _simple_add(self, attr):
        def method(value, unit=None):
            getattr(self, attr).append((value, unit))
        return method
I also checked other solutions such as the ones suggested here and I would like to do it "right", so my questions are:
Are/Should these methods (be) actually classmethods or not?
Is there a cost in creating the methods in __init__, and in this case is there an alternative?
Where is the best place to run the for loop and add these methods? Within the class definition? Out of it?
Is the use of metaclasses recommended in this case?
Update
Although Benjamin Hodgson makes some good points, I'm not asking for a (perhaps better) alternative way to do this but for the best way to use the tools that I mentioned. I'm using a simplified example in order not to focus on the details.
To further clarify my questions: the add_to_listN methods are meant to be additional, not to replace setters/getters (so I still want to be able to do l1 = f.list1 and f.list1 = [] with f = Foo()).
You are making a design error. You could override __getattr__, parse the attribute name, and return a closure which does what you want, but it's strange to dynamically generate methods, and strange code is bad code. There are often situations where you need to do it, but this is not one of them.
Instead of generating n methods which each do the same thing to one of n objects, why not just write one method which is parameterised by n? Something roughly like this:
class Foo:
    def __init__(self):
        self.lists = [
            [],
            []
        ]

    def add(self, row, value):
        self.lists[row].append(value)
Then foo.add1(x) becomes simply foo.add(1, x); foo.add2(x) becomes foo.add(2, x), and so on. There's one method, parameterised along the axis of variation, which serves all cases - rather than a litany of ad-hoc generated methods. It's much simpler.
Don't mix up the data in your system with the names of the data in your system.
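For example, with made-up values (and remembering that the rows are 0-indexed):

foo = Foo()
foo.add(0, (12.5, 'kg'))   # what add_to_list1(12.5, 'kg') would have done
foo.add(1, (3.2, 'm'))     # what add_to_list2(3.2, 'm') would have done
print(foo.lists)           # [[(12.5, 'kg')], [(3.2, 'm')]]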
I have a class, and I would like to be able to create multiple objects of that class and place them in an array. I did it like so:
rooms = []
rooms.append(Object1())
...
rooms.append(Object4())
I then have a dict of functions, and I would like to pass the object to the function. However, I'm encountering some problems. For example, I have a dict:
dict = {'look': CallLook(rooms[i])}
I'm able to pass it into the function; however, in the function, if I try to call an object's method it gives me problems:
def CallLook(current_room):
    current_room.examine()
I'm sure that there has to be a better way to do what I'm trying to do, but I'm new to Python and I haven't seen a clean example on how to do this. Anyone have a good way to implement a list of objects to be passed into functions? All of the objects contain the examine method, but they are objects of different classes. (I'm sorry I didn't say so earlier)
The specific error states: TypeError: 'NoneType' object is not callable
Anyone have a good way to implement a list of objects to be passed into functions? All of the objects contain the examine method, but they are objects of different classes. (I'm sorry I didn't say so earlier)
This is Python's plain duck-typing.
class Room:
    def __init__(self, name):
        self.name = name

    def examine(self):
        return "This %s looks clean!" % self.name

class Furniture:
    def __init__(self, name):
        self.name = name

    def examine(self):
        return "This %s looks comfortable..." % self.name

def examination(l):
    for item in l:
        print item.examine()

list_of_objects = [Room("Living Room"), Furniture("Couch"),
                   Room("Restrooms"), Furniture("Bed")]

examination(list_of_objects)
Prints:
This Living Room looks clean!
This Couch looks comfortable...
This Restrooms looks clean!
This Bed looks comfortable...
As for your specific problem: probably you have forgotten to return a value from examine()? (Please post the full error message (including full backtrace).)
I then have a dict of functions, and I would like to pass the object to the function. However, I'm encountering some problems. For example, I have a dict:
my_dict = {'look': CallLook(rooms[i])} # this is no dict of functions
The dict you have created may evaluate to {'look': None} (assuming your examine() doesn't return a value.) Which could explain the error you've observed.
If you wanted a dict of functions you needed to put in a callable, not an actual function call, e.g. like this:
my_dict = {'look': CallLook} # this is a dict of functions
if you want to bind the 'look' to a specific room you could redefine CallLook:
def CallLook(current_room):
    return current_room.examine  # return the bound examine method

my_dict = {'look': CallLook(room[i])}  # this is also a dict of functions
Another issue with your code is that you are shadowing the built-in dict() method by naming your local dictionary dict. You shouldn't do this. This yields nasty errors.
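For example:

dict = {'look': None}  # shadows the built-in dict type
d = dict(a=1)          # TypeError: 'dict' object is not callable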
Assuming you don't have basic problems (like syntax errors because the code you have pasted is not valid Python), this example shows you how to do what you want:
>>> class Foo():
... def hello(self):
... return 'hello'
...
>>> r = [Foo(),Foo(),Foo()]
>>> def call_method(obj):
... return obj.hello()
...
>>> call_method(r[1])
'hello'
Assuming you have a class Room the usual way to create a list of instances would be using a list comprehension like this
rooms = [Room() for i in range(num_rooms)]
I think there are some things you may not be getting about this:
dict = {'look': CallLook(rooms[i])}
This creates a dict with just one entry: a key 'look', and a value which is the result of evaluating CallLook(rooms[i]) right at the point of that statement. It also then uses the name dict to store this object, so you can no longer use dict as a constructor in that context.
Now, the error you are getting tells us that rooms[i] is None at that point in the programme.
You don't need CallLook (which is also named non-standardly) - you can just use the expression rooms[i].examine(), or, if you want to evaluate the call later, rooms[i].examine (see the sketch below).
You probably don't need the dict at all.
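A sketch of the deferred-call version (variable names are illustrative):

actions = {'look': rooms[i].examine}  # store the bound method, don't call it yet
# ... later ...
actions['look']()                     # the examine() call happens here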
That is not a must, but in some cases, using hasattr() is good... getattr() is another way to get an attribute off an object...
So:
rooms = [Obj1(), Obj2(), Obj3()]

if hasattr(rooms[i], 'examine'):  # first check if our object has the selected function or attribute
    getattr(rooms[i], 'examine')    # this just evaluates to the method without calling it, and equals rooms[i].examine
    getattr(rooms[i], 'examine')()  # by adding () to the end, we evaluate and then call the function
You may also pass parameters to examine function like:
getattr(rooms[i], 'examine')(param1, param2)
I'm not sure of your requirement, but you can use a dict to store multiple objects of a class.
Maybe this will help:
>>> class c1():
... print "hi"
...
hi
>>> c = c1()
>>> c
<__main__.c1 instance at 0x032165F8>
>>> d ={}
>>> for i in range (10):
... d[i] = c1()
...
>>> d[0]
<__main__.c1 instance at 0x032166E8>
>>> d[1]
<__main__.c1 instance at 0x032164B8>
>>>
This creates objects of the c1 class and stores them in the dict. Obviously, in this case you could use a list instead of a dict.
I have this code:
fields = ['name', 'email']

def clean_name():
    pass

def clean_email():
    pass
How can I call clean_name() and clean_email() dynamically?
For example:
for field in fields:
    clean_{field}()
I used the curly brackets because that's how I used to do it in PHP, but it obviously doesn't work here.
How to do this with Python?
If you don't want to use globals or vars, and don't want to make a separate module and/or class to encapsulate the functions you want to call dynamically, you can call them as attributes of the current module:
import sys
...
getattr(sys.modules[__name__], "clean_%s" % fieldname)()
Using globals() is a very, very bad way of doing this. You should be doing it this way:
fields = {'name': clean_name, 'email': clean_email}

for key in fields:
    fields[key]()
Map your functions to values in a dictionary.
Using vars()[] is wrong, too.
It would be better to have a dictionary of such functions than to look in globals().
The usual approach is to write a class with such functions:
class Cleaner(object):
    def clean_name(self):
        pass
and then use getattr to get access to them:
cleaner = Cleaner()
for f in fields:
    getattr(cleaner, 'clean_%s' % f)()
You could even move further and do something like this:
class Cleaner(object):
    def __init__(self, fields):
        self.fields = fields

    def clean(self):
        for f in self.fields:
            getattr(self, 'clean_%s' % f)()
Then inherit it and declare your clean_<name> methods on an inherited class:
cleaner = Cleaner(['one', 'two'])
cleaner.clean()
Actually, this can be extended even further to make it cleaner. The first step would probably be adding a check with hasattr() that such a method exists in your class, as sketched below.
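A sketch of that check, building on the class above:

class Cleaner(object):
    def __init__(self, fields):
        self.fields = fields

    def clean(self):
        for f in self.fields:
            name = 'clean_%s' % f
            if hasattr(self, name):    # skip fields that have no clean_<name> method
                getattr(self, name)()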
I have come across this problem twice now, and finally came up with a safe and not ugly solution (in my humble opinion).
RECAP of previous answers:
globals is the hacky, fast & easy method, but you have to be super consistent with your function names, and it can break at runtime if variables get overwritten. Also it's un-Pythonic, unsafe, unethical, yadda yadda...
Dictionaries (i.e. string-to-function maps) are safer and easy to use... but it annoys me to no end that I have to spread dictionary assignments across my file, which are easy to lose track of.
Decorators made the dictionary solution come together for me. Decorators are a pretty way to attach side-effects & transformations to a function definition.
Example time
fields = ['name', 'email', 'address']

# set up our function dictionary
cleaners = {}

# this is a parameterized decorator
def add_cleaner(key):
    # this is the actual decorator
    def _add_cleaner(func):
        cleaners[key] = func
        return func
    return _add_cleaner
Whenever you define a cleaner function, add this to the declaration:
@add_cleaner('email')
def email_cleaner(email):
    # do stuff here
    return result
The functions are added to the dictionary as soon as their definition is parsed and can be called like this:
cleaned_email = cleaners['email'](some_email)
Alternative proposed by PeterSchorn:
def add_cleaner(func):
    cleaners[func.__name__] = func
    return func

@add_cleaner
def email():
    # clean email
    pass
This uses the function name of the cleaner method as its dictionary key.
It is more concise, though I think the method names become a little awkward.
Pick your favorite.
globals() will give you a dict of the global namespace. From this you can get the function you want:
f = globals()["clean_%s" % field]
Then call it:
f()
Here's another way:
myscript.py:
def f1():
    print 'f1'

def f2():
    print 'f2'

def f3():
    print 'f3'
test.py:
import myscript

for i in range(1, 4):
    getattr(myscript, 'f%d' % i)()
I had a requirement to call different methods of a class from one of its own methods, on the basis of a list of method names passed as input (for running periodic tasks in FastAPI). For executing methods of Python classes, I have expanded the answer provided by @khachik. Here is how you can achieve it from inside or outside of the class:
>>> class Math:
... def add(self, x, y):
... return x+y
... def test_add(self):
... print(getattr(self, "add")(2,3))
...
>>> m = Math()
>>> m.test_add()
5
>>> getattr(m, "add")(2,3)
5
Note how you can do it from within the class using self, like this:
getattr(self, "add")(2,3)
And from outside the class using an object of the class like this:
m = Math()
getattr(m, "add")(2,3)
Here's another way: define the functions then define a dict with the names as keys:
>>> z = {"email": clean_email, "name": clean_name}
>>> z['email']()
>>> z['name']()
then you loop over the names as keys.
Or how about this one? Construct a string and use eval:
>>> field = "email"
>>> f="clean_"+field+"()"
>>> eval(f)
then just loop and construct the strings for eval.
Note that any method that requires constructing a string for evaluation is regarded as kludgy.
for field in fields:
    vars()['clean_' + field]()
In case you have a lot of functions with different numbers of parameters:
class Cleaner:
    @classmethod
    def clean(cls, type, *args, **kwargs):
        getattr(cls, f"_clean_{type}")(*args, **kwargs)

    @classmethod
    def _clean_email(cls, *args, **kwargs):
        print("invoked _clean_email function")

    @classmethod
    def _clean_name(cls, *args, **kwargs):
        print("invoked _clean_name function")

for type in ["email", "name"]:
    Cleaner.clean(type)
Output:
invoked _clean_email function
invoked _clean_name function
I would use a dictionary which maps field names to cleaning functions. If some fields don't have a corresponding cleaning function, the for loop handling them can be kept simple by providing some sort of default function for those cases. Here's what I mean:
fields = ['name', 'email', 'subject']

def clean_name():
    pass

def clean_email():
    pass

# (one-time) field-to-cleaning-function map construction
def get_clean_func(field):
    try:
        return eval('clean_' + field)
    except NameError:
        return lambda: None  # do nothing

clean = dict((field, get_clean_func(field)) for field in fields)

# sample usage
for field in fields:
    clean[field]()
The code above constructs the function dictionary dynamically by determining if a corresponding function named clean_<field> exists for each one named in the fields list. You likely would only have to execute it once since it would remain the same as long as the field list or available cleaning functions aren't changed.