Iterate through different permutations of 4 functions in Python - python

OK, I am using different taggers to tag a text: default, unigram, bigram and trigram.
I have to check which combination of three of those four taggers is the most accurate.
To do that I have to loop through all the possible combinations, which I do like this:
permutaties = list(itertools.permutations(['default_tagger', 'unigram_tagger',
                                           'bigram_tagger', 'trigram_tagger'], 3))
resultaten = []
for element in permutaties:
    resultaten.append(accuracy(element))
so each element is a tuple of three tagmethods like for example: ('default_tagger', 'bigram_tagger', 'trigram_tagger')
In the accuracy function I now have to dynamically call the three accompanying methods of each tagger, the problem is: I don't know how to do this.
The tagger functions are as follows:
unigram_tagger = nltk.UnigramTagger(brown_train, backoff=backofff)
bigram_tagger = nltk.BigramTagger(brown_train, backoff=backofff)
trigram_tagger = nltk.TrigramTagger(brown_train, backoff=backofff)
default_tagger = nltk.DefaultTagger('NN')
So for the example the code should become:
t0 = nltk.DefaultTagger('NN')
t1 = nltk.BigramTagger(brown_train, backoff=t0)
t2 = nltk.TrigramTagger(brown_train, backoff=t1)
t2.evaluate(brown_test)
So in essence the problem is how to iterate through all 24 three-element permutations of that list of 4 functions.
Any Python Masters that can help me?

Not sure if I understood what you need, but you can pass the methods you want to call themselves instead of strings - so your code could become something like:
permutaties = itertools.permutations([nltk.UnigramTagger, nltk.BigramTagger,
                                      nltk.TrigramTagger, nltk.DefaultTagger], 3)
resultaten = []
for element in permutaties:
    resultaten.append(accuracy(element, brown_train, brown_test))

def accuracy(element, brown_train, brown_test):
    if element is nltk.DefaultTagger:
        evaluator = element("NN")
    else:
        evaluator = element(brown_train, backoff=XXX)
        # maybe insert more elif clauses to retrieve the proper backoff
        # parameter -- or you could use a tuple in the call to permutations
        # so the appropriate backoff is available for each function to be called
    return evaluator.evaluate(brown_test)  # ? I am not sure from your code if this is your intent

Starting with jsbueno's code, I suggest writing a wrapper function for each of the taggers to give them the same signature. And since you only need them once, I suggest using a lambda.
permutaties = itertools.permutations(
    [lambda backoff: nltk.DefaultTagger("NN"),
     lambda backoff: nltk.UnigramTagger(brown_train, backoff=backoff),
     lambda backoff: nltk.BigramTagger(brown_train, backoff=backoff),
     lambda backoff: nltk.TrigramTagger(brown_train, backoff=backoff)], 3)
This would allow you to call each directly, without a special function that figures out which function you're calling and employs the appropriate signature.

Basing on jsbueno's code, I think that you want to reuse evaluator as the backoff argument, so the code should be:
permutaties = itertools.permutations([nltk.UnigramTagger, nltk.BigramTagger,
                                      nltk.TrigramTagger, nltk.DefaultTagger], 3)
resultaten = []
for element in permutaties:
    resultaten.append(accuracy(element, brown_train, brown_test))

def accuracy(element, brown_train, brown_test):
    evaluator = "NN"
    for e in element:
        if evaluator == "NN":
            evaluator = e("NN")
        else:
            evaluator = e(brown_train, backoff=evaluator)
            # maybe insert more elif clauses to retrieve the proper backoff
            # parameter -- or you could use a tuple in the call to permutations
            # so the appropriate backoff is available for each function to be called
    return evaluator.evaluate(brown_test)  # ? I am not sure from your code if this is your intent
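Putting the pieces together, here is a minimal runnable sketch of the whole search. StubTagger is a made-up stand-in for the NLTK tagger classes (so this runs without NLTK or the Brown corpus); its evaluate just reports the backoff chain instead of a real accuracy score, and with real data you would substitute nltk.DefaultTagger etc.

```python
import itertools

class StubTagger:
    """Stand-in for an NLTK tagger: remembers its backoff chain."""
    def __init__(self, name, backoff=None):
        self.name = name
        self.backoff = backoff

    def evaluate(self, test_set):
        # a real tagger would score itself on test_set; here we return
        # the backoff chain as a label so the result is inspectable
        chain, t = [], self
        while t is not None:
            chain.append(t.name)
            t = t.backoff
        return ' <- '.join(chain)

# uniform-signature factories (the lambda trick from the second answer);
# the stub "default" tagger simply ignores its backoff argument
factories = [
    lambda backoff: StubTagger('default', backoff),
    lambda backoff: StubTagger('unigram', backoff),
    lambda backoff: StubTagger('bigram', backoff),
    lambda backoff: StubTagger('trigram', backoff),
]

def accuracy(factory_chain, test_set):
    # build each tagger with the previously built one as its backoff
    evaluator = None
    for make in factory_chain:
        evaluator = make(evaluator)
    return evaluator.evaluate(test_set)

resultaten = [accuracy(chain, test_set=None)
              for chain in itertools.permutations(factories, 3)]
```

With 4 factories taken 3 at a time this evaluates all 24 orderings; with the real taggers you would then pick the permutation whose evaluate score is highest.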

Related

Combining eval() and return in a function

I'm trying to use a function to iteratively return several machine learning models (pickles) with a function based on the accuracy cutoff I specify.
My issue is that I'm trying to load the pickles with eval, as their names correspond to the number given by sdf['number']. The eval function is not loading my pickles, and beyond that, I want them to be loaded and returned by my function. I have tested this by attempting to directly run data through each model after loading it before it moves on to the next one, but it is returning "learn0 not defined" for example.
Any thoughts on how to better do this iteratively?
Variables Explained:
jar = A list of the different variable names (learner names) that I expected it to load. For example, learn0, learn1, etc.
cutoff = Accuracy Cutoff
sdf_temp = Temporary Study DataFrame
def piklJar(sdf, cutoff):
    sdf_temp = sdf[sdf['value'] <= cutoff]
    jar = []
    i = 0
    for pklNum in sdf_temp['number']:
        eval('"learn{} = load_learner({}/Models/Pkl {}.pkl)".format(i,datapath,pklNum)')
        jar.append('learn{}'.format(i))
        i += 1
    return jar
eval isn't needed. Your example wasn't working code, but this is approximately the same thing:
def piklJar(sdf, cutoff):
    sdf_temp = sdf[sdf['value'] <= cutoff]
    return [load_learner(f'{datapath}/Models/Pkl {pklNum}.pkl')
            for pklNum in sdf_temp['number']]
After calling jar = piklJar(...), jar[0] would be equivalent to learn0, jar[1] to learn1, etc. The models returned by the load_learner calls are stored in a list built with a list comprehension.
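If you really want name-style access (learn0, learn1, …), a dict is the idiomatic replacement for eval-created variable names. A sketch with a made-up stand-in load_learner (the real one would deserialize your pickles):

```python
def load_learner(path):
    # stand-in: the real load_learner would deserialize a model from path
    return f"model from {path}"

datapath = "/data"

def pikl_jar(numbers):
    # dict keys replace the dynamically named learn0, learn1, ... variables
    return {f"learn{i}": load_learner(f"{datapath}/Models/Pkl {n}.pkl")
            for i, n in enumerate(numbers)}

jar = pikl_jar([3, 7])
# jar["learn0"] is the model loaded from ".../Pkl 3.pkl"
```

Looking a model up by key (jar["learn0"]) then does exactly what the eval-assigned variable names were meant to do, without any dynamic code execution.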

Python multiprocessing partial pool with multiple iterables

I am trying to accomplish a task in parallel using a multiprocessing pool in Python. Basically there are some static parameters for a function and a bunch of variable parameters for different hyperparameters. For example:
def simulate(static1, static2, iter1, iter2):
    # do some math in a for loop
    return output
Now the thing is that the nth component of iter2 goes only with the nth component of iter1. Say
iter1 = [1,2,3,4]
iter2 = [x,y,z,w]
So during iteration (1,x), (2,y) etc. should be the parameters, and in the end I expect to get 4 different outputs. So I am trying to implement
partial_function = partial(simulate, static1=s1, static2=s2)
output = pool.map(partial_function, (iter1, iter2))
I am stuck on how to use multiple iterables, given that Python raises a TypeError saying simulate() is missing 1 positional argument. Any suggestions?

Validating that all components required for an object to exist are present

I need to write a script that gets a list of components from an external source and based on a pre-defined list it validates whether the service is complete. This is needed because the presence of a single component doesn't automatically imply that the service is present - some components are pre-installed even when there is no service. I've devised something really simple below, but I was wondering what is the intelligent way of doing this? There must be a cleaner, simpler way.
# Components that make up a complete service
serviceComponents = ['A', 'B']
# Input from JSON
data = ['B', 'A', 'C']
serviceComplete = True
for i in serviceComponents:
    if i in data:
        print 'yay ' + i + ' found from ' + ', '.join(serviceComponents)
    else:
        serviceComplete = False
        break
# If serviceComplete == True do blabla...
You could do it a few different ways:
set(serviceComponents) <= set(data)
set(serviceComponents).issubset(data)
all(c in data for c in serviceComponents)
You can make it shorter, but you lose readability. What you have now is probably fine. I'd go with the first approach personally, since it expresses your intent clearly with set operations.
# Components that make up a complete service
serviceComponents = ['A', 'B']
# Input from JSON
data = ['B', 'A', 'C']
if all(item in data for item in serviceComponents):
    print("All required components are present")
The built-in set would serve you here; use set.issubset to verify that your required service components are a subset of the input data:
serviceComponents = set(['A', 'B'])
input_data = set(['B', 'A', 'C'])
if serviceComponents.issubset(input_data):
    pass  # perform actions ...
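All three suggested forms agree; as a quick sanity check with the sample data, including a case where a required component is missing:

```python
serviceComponents = ['A', 'B']
data = ['B', 'A', 'C']

# all three forms on the complete input
checks = [
    set(serviceComponents) <= set(data),
    set(serviceComponents).issubset(data),
    all(c in data for c in serviceComponents),
]

# with a required component ('B') missing, every form reports False
incomplete = ['A', 'C']
checks_missing = [
    set(serviceComponents) <= set(incomplete),
    set(serviceComponents).issubset(incomplete),
    all(c in incomplete for c in serviceComponents),
]
```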

How to write a function in python?

I am doing exploratory data analysis, and I keep using the same lines of code many times, so I realized I should write a function for it. But I am new to Python and don't know exactly how to define a function, so please help me.
textdata is my main dataframe, and tonumber and smstext are my variables.
# subsetting the textdata
mesbytonum = textdata[['tonumber', 'smstext']]
# calculating the no.of messages by tonumber
messbytonum_freq = mesbytonum.groupby('tonumber').agg(len)
# resetting the index
messbytonum_freq.reset_index(inplace=True)
# making them in a descending order
messbytonum_freq_result = messbytonum_freq.sort(['smstext'], ascending=[0])
#calcuating percentages
messbytonum_freq_result['percentage'] = messbytonum_freq_result['smstext']/sum(messbytonum_freq_result['smstext'])
# considering top10
top10tonum = messbytonum_freq_result.head(10)
# top10tonum
I have repeated similar code around 20 times, so I want to write a function for the above code to make my code smaller. Please help me with how to define it.
Thanks in advance
The function is defined like this:
def func(arg1, arg2, argN):
    # do something
    # you may need to return value(s) too
And called like this:
func(1, 2, 3)  # you can use anything instead of 1, 2 and 3
It will be
def MyFunc(textdata):
    mesbytonum = textdata[['tonumber', 'smstext']]
    messbytonum_freq = mesbytonum.groupby('tonumber').agg(len)
    messbytonum_freq.reset_index(inplace=True)
    messbytonum_freq_result = messbytonum_freq.sort(['smstext'], ascending=[0])
    messbytonum_freq_result['percentage'] = messbytonum_freq_result['smstext'] / sum(messbytonum_freq_result['smstext'])
    top10tonum = messbytonum_freq_result.head(10)
    return  # what do you want to return?
# use this function
result = MyFunc(<argument here>)
# then you need to use result somehow
Your function can also return multiple values:
return spam, egg
which you have to use like this:
mySpam, myEgg = MyFunction(<argument>)
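For the concrete pandas case, a sketch of one possible function with the column names as parameters, so the same code serves all ~20 repetitions. It uses the current sort_values API rather than the since-removed DataFrame.sort; top_counts and its parameters are made-up names:

```python
import pandas as pd

def top_counts(df, group_col, value_col, n=10):
    # count rows per group, sort descending, and add a percentage column
    freq = df.groupby(group_col)[value_col].count().reset_index()
    freq = freq.sort_values(value_col, ascending=False)
    freq['percentage'] = freq[value_col] / freq[value_col].sum()
    return freq.head(n)

# tiny example frame standing in for the real textdata
textdata = pd.DataFrame({
    'tonumber': ['111', '222', '111', '333', '111', '222'],
    'smstext': list('abcdef'),
})
top10tonum = top_counts(textdata, 'tonumber', 'smstext')
```

The same call with different group_col/value_col arguments then replaces each repeated block.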

Can homogeneous functions be composed into a calculation network?

Inside a network, information (a package) can be passed to different nodes (hosts); by modifying its content it can carry different meanings. The final package depends on the hosts' input along its given route through the network.
Now I want to implement a calculating-network model that can do small jobs when given different calculation paths.
Prototype:
def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5
def link(p, r):
    p1 = p
    for x in r:
        p1 = x(p1)
    return p1

p = 100
route = [a, c, d]
result = link(p, route)
#========
target_result = 108
if result == target_result:
    pass  # route is OK
I think finally I need something like this:
p with [init_payload, expected_target, passed_path, actual_calculated_result]
|
\/
[CHAOS of possible of functions networks]
|
\/
px [a,a,b,c,e] # ok this path is ok and match the target
Here are my questions; I hope you can help:
1. Can p determine the route(s) by inspecting the functions and the estimated result?
(1.1) For example, if there is a node x() on the route
def x(p): return x / 0  # I suppose it can pass compilation
can p somehow know this path is not good and avoid selecting it?
(1.2) Another confusion: if p is a self-defined class type whose payload is essentially a string, and it carries a path [a,c,d], can p know that a() requires an int and avoid selecting that node?
2. Same as 1.2: when generating the path, can I avoid an oops like
def a(p): return p + 1
def b(p): return p + 2
def x(p): return p.append(1)
def y(p): return p.append(2)
full_node_list = [a, b, x, y]
path = random(2, full_node_list)  # oops: x and y will be trouble for an int-typed p, and a and b for a list-typed p
3. Please also consider the case where the path is a list of lambda functions.
PS: as the whole model is not yet very clear in my mind, any leading and directing will be appreciated.
THANKS!
You could test each function first with a set of sample data; any function which returns consistently unusable values might then be discarded.
def isGoodFn(f):
    testData = [1, 2, 3, 8, 38, 73, 159]  # random test input
    goodEnough = 0.8 * len(testData)  # need 80% pass rate
    try:
        good = 0
        for i in testData:
            if type(f(i)) is int:
                good += 1
        return good >= goodEnough
    except:
        return False
If you know nothing about what the functions do, you will have to essentially do a full breadth-first tree search with error-checking at each node to discard bad results. If you have more than a few functions this will get very large very quickly. If you can guarantee some of the functions' behavior, you might be able to greatly reduce the search space - but this would be domain-specific, requiring more exact knowledge of the problem.
If you had a heuristic measure for how far each result is from your desired result, you could do a directed search to find good answers much more quickly - but such a heuristic would depend on knowing the overall form of the functions (a distance heuristic for multiplicative functions would be very different than one for additive functions, etc).
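The exhaustive search described above can be sketched as a brute-force scan over all routes of a fixed length, reusing the question's toy functions; itertools.product (rather than permutations) lets a function repeat along a route, matching the [a,a,b,c,e]-style paths in the question:

```python
import itertools

def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4

def link(p, route):
    # run the payload through each function on the route in order
    for f in route:
        p = f(p)
    return p

def find_routes(p, funcs, length, target):
    # brute force: try every route of the given length and keep the hits;
    # the search space is len(funcs) ** length, so this explodes quickly
    return [r for r in itertools.product(funcs, repeat=length)
            if link(p, r) == target]

routes = find_routes(100, [a, b, c, d], 3, 108)
```

For these additive functions every route whose increments sum to 8 matches; a real version would wrap link in a try/except so routes that raise are discarded instead of aborting the search.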
Your functions can raise TypeError if they are not satisfied with the data types they receive. You can then catch this exception and see whether you are passing an appropriate type. You can also catch any other exception type. But trying to call the functions and catching the exceptions can be quite slow.
You could also organize your functions into different sets depending on the argument type.
functions = { list : [some functions taking a list], int : [some functions taking an int]}
...
x = choose_function(functions[type(p)])
p = x(p)
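A runnable sketch of that dispatch table, with made-up stand-in functions and a random pick standing in for choose_function:

```python
import random

def inc(p): return p + 1
def dbl(p): return p * 2
def push(p): return p + [1]
def rev(p): return list(reversed(p))

# map each payload type to the functions that can safely handle it
functions = {int: [inc, dbl], list: [push, rev]}

def step(p):
    # only functions compatible with p's type are candidates
    f = random.choice(functions[type(p)])
    return f(p)
```

Because the lookup is keyed on type(p), an int payload can never be routed into the list-only functions, avoiding the "oops" from the question up front instead of catching it as an exception.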
I'm somewhat confused as to what you're trying to do, but: p cannot "know about" the functions until it is run through them. By design, Python functions don't specify what type of data they operate on: e.g. a*5 is valid whether a is a string, a list, an integer or a float.
If there are some functions that might not be able to operate on p, then you could catch exceptions, for example in your link function:
def link(p, r):
    try:
        for x in r:
            p = x(p)
    except (ZeroDivisionError, AttributeError):  # list whatever errors you want to catch
        return None
    return p
