float division by zero error related to ngram and nltk

float division by zero error related to ngram and nltk - python

My task is to use 10-fold cross validation method with uni, bi and trigrams in a corpus and compare their accuracy. However, I am stuck with a float division error. All of these codes are given by the question setter except for the loop, so the error is probably there. Here, we are only using the first 1000 sentences to test the program, and that line will be removed once I know the program runs.
import codecs
mypath = "/Users/myname/Desktop/"
corpusFile = codecs.open(mypath + "estonianSample.txt",mode="r",encoding="latin-1")
sentences = [[tuple(w.split("/")) for w in line[:-1].split()] for line in corpusFile.readlines()]
corpusFile.close()
from math import ceil
N=len(sentences)
chunkSize = int(ceil(N/10.0))
sentences = sentences[:1000]
chunks=[sentences[i:i+chunkSize] for i in range(0, N, chunkSize)]
for i in range(10):
training = reduce(lambda x,y:x+y,[chunks[j] for j in range(10) if j!=i])
testing = chunks[i]
from nltk import UnigramTagger,BigramTagger,TrigramTagger
t1 = UnigramTagger(training)
t2 = BigramTagger(training,backoff=t1)
t3 = TrigramTagger(training,backoff=t2)
t3.evaluate(testing)
This is what the error says:
runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname')
Traceback (most recent call last):
File "<ipython-input-1-921164840ebd>", line 1, in <module>
runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname')
File "/Users/myname/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "/Users/myname/pythonhw3.py", line 34, in <module>
t3.evaluate(testing)
File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/tag/api.py", line 67, in evaluate
return accuracy(gold_tokens, test_tokens)
File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/metrics/scores.py", line 40, in accuracy
return float(sum(x == y for x, y in izip(reference, test))) / len(test)
ZeroDivisionError: float division by zero

Your error is occurring due to the return value being close to negative infinity.
The line specifically causing the issue is,
t3.evaluate(testing)
What you can do instead is,
try:
t3.evaluate(testing)
except ZeroDivisonError:
# Do whatever you want it to do
print(0)
It works on my end. Try it out!
The answer is four years later, but hopefully, a fellow net citizen can find this helpful.

Related

My created function won't accept an array as one of the arguments

I've written a function that takes two arguments, one for no. dimensions and another for no. simulations. The function does exactly what is needed (calculating the volume of a unit hypersphere), however when I wish to plot the function over a range of dimensions it returns an error: ''list' object cannot be interpreted as an integer'.
My function is the following,
def hvolume(ndim, nsim):
ob = [np.random.uniform(0.0,1.0,(nsim, ndim))]
ob = np.concatenate(ob)
i = 0
res = []
while i <= nsim-1:
arr = np.sqrt(np.sum(np.square(ob[i])))
i += 1
res.append(arr)
N = nsim
n = ndim
M = len([i for i in res if i <= 1])
return ((2**n)*M/N)
The error traceback is:
Traceback (most recent call last):
File "<ipython-input-192-4c4a2c778637>", line 1, in <module>
runfile('H:/Documents/Python Scripts/Q4ATTEMPT.py', wdir='H:/Documents/Python Scripts')
File "C:\Users\u1708511\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "C:\Users\u1708511\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "H:/Documents/Python Scripts/Q4ATTEMPT.py", line 20, in <module>
print(hvolume(d, 2))
File "H:/Documents/Python Scripts/Q4ATTEMPT.py", line 4, in hvolume
ob = [np.random.uniform(0.0,1.0,(nsim, ndim))]
File "mtrand.pyx", line 1307, in mtrand.RandomState.uniform
File "mtrand.pyx", line 242, in mtrand.cont2_array_sc
TypeError: 'list' object cannot be interpreted as an integer
I really have no idea where to go from here, and have searched thoroughly online for how to resolve this. Unfortunately I'm a beginner with this!
Any help is appreciated.

If you simply try your first line in the function;
ob = [np.random.uniform(0.0,1.0,(nsim, ndim))]
with a list as one of the variables like so;
[np.random.uniform(0.0,1.0,([1,2], 2))]
you will get the error:
TypeError: 'list' object cannot be interpreted as an integer
This is because the uniform command it looking for an integer, not a list. You will need to make a for loop if you would like to handle lists.

One pattern I use for situations like this would be to begin the function with a block to handle the case of if they're iterators. Something like this for example.
from collections import Iterator
def hvolume(ndim, nsim):
outputs = []
if isinstance(ndim, Iterator):
for ndim_arg in ndim:
outputs.append(hvolume(ndim_arg, nsim))
if isinstance(nsim, Iterator):
for nsim_arg in nsim:
outputs.append(hvolume(ndim, nsim_arg))
if len(outputs) == 0: # neither above is an Iterator
# ... the rest of the function but it appends to outputs
return outputs

Check the input parameters of your method "hvolume", it seems that you give a list either nsim or ndim, which should be both integer values. That makes the uniform throw a TypeError Exception.

scikitlearn adapt bigram to svm

I have problem. here's my code.
http://colorscripter.com/s/9vc2ryj
And I mistaked. evaluate_classifier(bigram_word_feats) is what I want.
I'm trying to text mining by SVM.
The feature vectors are bigram model.
But I got a problem:
Traceback (most recent call last):
File "C:/Users/LG/Desktop/untitled1/TEST.py", line 184, in <module>
evaluate_classifier(bigram_word_feats)
File "C:/Users/LG/Desktop/untitled1/TEST.py", line 90, in evaluate_classifier
classifier.train(trainfeats)
File "C:\Users\LG\Anaconda3\lib\site-packages\nltk\classify\scikitlearn.py", line 115, in train
X = self._vectorizer.fit_transform(X)
File "C:\Users\LG\Anaconda3\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 226, in fit_transform
return self._transform(X, fitting=True)
File "C:\Users\LG\Anaconda3\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 190, in _transform
feature_names.sort()
TypeError: unorderable types: tuple() < str()
Why this happen and how can I solve?
and what's the process of nltk classifier?
give it to my feature word and period? Then it just generate svm model?
Oh and I'm using python 3. Do I need to use python 2?

New answer:
I think the problem is that nltk expects a dict indexed by strings instead of tuples. Can you try to replace the return statement from:
return dict([(ngram, True) for ngram in itertools.chain(words, bigrams)])
to the following:
return dict([('|'.join (ngram), True) for ngram in itertools.chain(words, bigrams)])
Old answer:
`train` methods of Scikit-learn predictors expect two inputs: features and targets. Something like the following (not tested):
negfeats = [featx(f) for f in word_split(negdata)]posfeats = [featx(f) for f in word_split(posdata)]...trainlabels = [-1,] * negcutoff + [+1,] * poscutoffclassifier.train(trainfeats, trainlabels)
In defining trainlabels, I followed your style of using arithmetic operators on lists but I wouldn't do it in my code as it makes it less readable.

Sage for Graph Theory, KeyError

I'll start with my code, because this may just be an obvious problem to those with better understanding of the language:
g = graphs.CompleteGraph(60).complement()
for i in range(1,180):
a = randint(0,59)
b = randint(0,59)
h = copy(g)
h.add_edge(a,b)
if h.is_circular_planar():
g.add_edge(a,b)
strong = copy(strong_resolve(g))
S = strong.vertex_cover()
d = {'#00FF00': [], '#FF0000': []}
for v in G.vertices():
if v in S:
d['#FF0000'].append(v)
else:
d['#00FF00'].append(v)
g.plot(layout="spring", vertex_colors=d).show()
strong.plot(vertex_colors=d).show()
new_strong = copy(strong)
for w in new_strong.vertices():
if len(new_strong.neighbors(w)) == 0: #trying to remove
new_strong.delete_vertex(w) #disconnected vertices
new_strong.plot(vertex_colors=d).show()
A couple notes: strong_resolve is a function which takes in a graph and outputs another graph. The first two blocks of code work fine.
My problem is that once I add the third block things don't work anymore. In fiddling around I've gotten variants of this code that when added cause errors, and when removed the errors remain somehow. What happens now is that the for loop seems to go until its end and only then it will give the following error:
Traceback (most recent call last): if h.is_circular_planar():
File "", line 1, in <module>
File "/tmp/tmprzreop/___code___.py", line 30, in <module>
exec compile(u'new_strong.plot(vertex_colors=d).show()
File "", line 1, in <module>
File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/misc/decorators.py", line 550, in wrapper
return func(*args, **options)
File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 15706, in plot
return self.graphplot(**options).plot()
File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 15407, in graphplot
return GraphPlot(graph=self, options=options)
File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/graphs/graph_plot.py", line 247, in __init__
self.set_vertices()
File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/graphs/graph_plot.py", line 399, in set_vertices
pos += [self._pos[j] for j in vertex_colors[i]]
KeyError: 0
this can vary in that KeyError: 0 is occasionally 1 or 2 depending on some unknown factor.
I apologize in advance for my horrible code and acknowledge that I really have no idea what I'm doing but I'd really appreciate if someone could help me out here.

I figured it out! It turns out the error came from d having entries that made no sense in new_strong, namely those for vertices that were deleted already. This caused the key error when plot() tried to colour the vertices according to d.

How do I fix the 'TypeError: hasattr(): attribute name must be string' error?

I have the following code:
import pymc as pm
from matplotlib import pyplot as plt
from pymc.Matplot import plot as mcplot
import numpy as np
from matplotlib import rc
res = [18.752, 12.450, 11.832]
v = pm.Uniform('v', 0, 20)
errors = pm.Uniform('errors', 0, 100, size = 3)
taus = 1/(errors ** 2)
mydist = pm.Normal('mydist', mu = v, tau = taus, value = res, observed = True)
model=pm.Model([mydist, errors, taus, v, res])
mcmc=pm.MCMC(model) # This is line 19 where the TypeError originates
mcmc.sample(20000,10000)
mcplot(mcmc.trace('mydist'))
For some reason it doesn't work, I get the 'TypeError: hasattr(): attribute name must be string' error, with the following trace:
Traceback (most recent call last):
File "<ipython-input-49-759ebaf4321c>", line 1, in <module>
runfile('C:/Users/Paul/.spyder2-py3/temp.py', wdir='C:/Users/Paul/.spyder2-py3')
File "C:\Users\Paul\Miniconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "C:\Users\Paul\Miniconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/Paul/.spyder2-py3/temp.py", line 19, in <module>
mcmc=pm.MCMC(model)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\MCMC.py", line 82, in __init__
**kwds)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Model.py", line 197, in __init__
Model.__init__(self, input, name, verbose)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Model.py", line 99, in __init__
ObjectContainer.__init__(self, input)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Container.py", line 606, in __init__
conservative_update(self, input_to_file)
File "C:\Users\Paul\Miniconda3\lib\site-packages\pymc\Container.py", line 549, in conservative_update
if not hasattr(obj, k):
TypeError: hasattr(): attribute name must be string
How do I make it work and output "mydist"?
Edit: I posted a wrong trace at first by accident.
Edit2: It all must be because res doesn't have a name, because it's an array, but I don't know how to assign a name to it, so it'll make this work.

I must admit that I'm not familiar with pymc, but changing it to the following at least made the application run:
mydist = pm.Normal('mydist', mu = v, tau = taus, value = res, observed = False)
mcmc=pm.MCMC([mydist, errors, taus, v, res])
This seems to be because you were wrapping everything in a Model which is an extension of ObjectContainer, but since you passed it a list, MCMC file_items in Container.py tried to assign index 4 in a list to something using replace, but since Model is an ObjectContainer it assigned the key 4 in it's __dict__ causing the weird TypeError you got. Removing the wrapping Model caused MCMC to correctly use an ListContainer instead.
Now, there's probably a bug in Model.py on line 543 where observable stochastics aren't stored in the database - the expression is for object in self.stochastics | self.deterministics: but I suspect it should include self.observable_stochastics too - so I needed to change observable to False or the last line would throw a KeyError.
I'm not familiar enough with pymc to determine if it's actually or bug or desired behaviour so I leave it up to you to submit an issue about it.

You simply need to define res as a numpy array:
res = np.array([18.752, 12.450, 11.832])
Then you'll get an error here mcmc.trace('mydist')because mydist is observed data, and therefore is not sampled. You probably want to plot other variables...

python error using PaCal statistical package

I recently started exploring Python and have encountred a problem with a package named PaCal
Everything looks to be working fine except that I keep having this error anytime I want to print out some data (like in print A.mean() )
the error line is :
Traceback (most recent call last):
File "C:\Users\rmobenta\Desktop\tt.py", line 12, in <module>
print A.interval(0.95)
File "C:\Python27\lib\site-packages\pacal\distr.py", line 229, in interval
return self.quantile(p_lim), self.quantile(1.0 - p_lim)
File "C:\Python27\lib\site-packages\pacal\distr.py", line 215, in quantile
return self.get_piecewise_cdf().inverse(y)
File "C:\Python27\lib\site-packages\pacal\segments.py", line 1721, in inverse
x = findinv(segi.f, a = segi.a, b = segi.b, c = y, rtol = params.segments.cumint.reltol, maxiter = params.segments.cumint.maxiter) # TODO PInd, MInf
File "C:\Python27\lib\site-packages\pacal\utils.py", line 384, in findinv
return brentq(lambda x : fun(x) - c, a, b, **kwargs)
File "C:\Python27\lib\site-packages\scipy\optimize\zeros.py", line 414, in brentq
raise ValueError("rtol too small (%g < %g)" % (rtol, _rtol))
ValueError: rtol too small (1e-16 < 4.44089e-16)
I am using a two-line script that I got for a demo (given by the author of this package) and have no idea how to tackle this issue.
Here is the script:
from pacal import *
Y = UniformDistr(1, 2)
X = UniformDistr(3, 4)
A = atan(Y / X)
A.plot()
print A.mean()
print A.interval(0.95)

The problem comes from PaCal that defines in l.141 of params.py: segments.vumint.reltol = 1e-16.
This is the value passed as rtol in segments.py to the SciPy function brentq().
Finally it is compared to numpy.finfo(float).eps * 2 (l.413 and l.10 of scipy/optimize/zeros.py) and is unfortunately lesser.
So it could be a problem of PaCal implementation, not related to your code.
Note that the value you provided to interval() corresponds to the default value (l.222 of distr.py).
I think you should contact the PaCal developers to get more informations and probably open an issue.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

float division by zero error related to ngram and nltk - python

Related

My created function won't accept an array as one of the arguments

scikitlearn adapt bigram to svm

Sage for Graph Theory, KeyError

How do I fix the 'TypeError: hasattr(): attribute name must be string' error?

python error using PaCal statistical package

Categories

Resources