gensim lemmatize error generator raised StopIteration


I'm trying to run some simple code to lemmatize a string, but I get an error about iteration.
I have found some solutions that involve reinstalling web.py, but that did not work for me.
Python code:
from gensim.utils import lemmatize
lemmatize("gone")
The error is:
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
I:\Anaconda\lib\site-packages\pattern\text\__init__.py in _read(path, encoding, comment)
608 yield line
--> 609 raise StopIteration
610
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-4-9daceee1900f> in <module>
1 from gensim.utils import lemmatize
----> 2 lemmatize("gone")
-------------------------------------------------------------------------------------
I:\Anaconda\lib\site-packages\pattern\text\__init__.py in <genexpr>(.0)
623 def load(self):
624 # Arnold NNP x
--> 625 dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
626
627 #--- FREQUENCY -------------------------------------------------------------------------------------
RuntimeError: generator raised StopIteration

The error message is misleading – it occurs when there's nothing to properly lemmatize.
By default, lemmatize() only accepts word tags NN|VB|JJ|RB. Pass in a regexp that matches any string to change this:
>>> import re
>>> lemmatize("gone", allowed_tags=re.compile('.*'))
[b'go/VB']
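Separately from the tag filter, the wording of the message comes from PEP 479, which is the default behaviour since Python 3.7: a StopIteration raised inside a generator body is re-raised as "RuntimeError: generator raised StopIteration". The traceback above shows that situation, an explicit raise StopIteration at the end of the _read generator. A minimal sketch with hypothetical function names (not the library's actual code) showing the old idiom next to the PEP 479-compatible one:

```python
def read_old_style(lines):
    # Pre-PEP 479 idiom, as in the traceback: end the generator by
    # raising StopIteration explicitly.
    for line in lines:
        yield line
    raise StopIteration


def read_fixed(lines):
    # PEP 479-compatible idiom: simply return (or fall off the end).
    for line in lines:
        yield line
    return


try:
    list(read_old_style(["a", "b"]))
except RuntimeError as exc:
    print(exc)  # generator raised StopIteration

print(list(read_fixed(["a", "b"])))  # ['a', 'b']
```

Before Python 3.7 the first function ended normally (with at most a deprecation warning), which is why code written this way only breaks on newer interpreters.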

Related

Generator raised StopIteration in find_job_titles package

I am trying to run this code:
from find_job_titles import FinderAcora
finder=FinderAcora()
finder.findall('IT Audit & Governance')
But it gives me this error every time:
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/find_job_titles/__init__.py in longest_match(matches)
48 """
---> 49 longest = next(matches)
50
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-31-5b965ac3d7be> in <module>
----> 1 finder.findall('IT Audit & Governance')
/usr/local/lib/python3.8/dist-packages/find_job_titles/__init__.py in findall(self, string, use_longest)
82 else return all overlapping matches
83 :returns: list of matches of type `Match`
---> 84 """
85 return list(self.finditer(string, use_longest=use_longest))
86
RuntimeError: generator raised StopIteration
I tried using the suggestions from this Stack Overflow post but it didn't work.
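The traceback itself shows the mechanism: longest_match calls next(matches), which raises StopIteration when the iterator is empty (presumably because no job title was matched in 'IT Audit & Governance'). Because that call runs inside finditer, which appears to be a generator, Python 3.7+ converts it into the RuntimeError. A minimal sketch of the usual guard, next(iterator, default); the Match tuple and both function bodies are hypothetical illustrations, not the package's code:

```python
from collections import namedtuple

# Hypothetical match type, for illustration only.
Match = namedtuple("Match", ["start", "end", "text"])


def longest_match(matches):
    """Return the longest match from an iterator, or None if it is empty."""
    longest = next(matches, None)  # the default avoids StopIteration
    if longest is None:
        return None
    for match in matches:
        if match.end - match.start > longest.end - longest.start:
            longest = match
    return longest


def finditer(candidates):
    # A generator: an unguarded next() on an empty iterator in here would
    # surface as "RuntimeError: generator raised StopIteration".
    best = longest_match(iter(candidates))
    if best is not None:
        yield best


print(list(finditer([])))  # []
print(list(finditer([Match(0, 2, "IT"), Match(0, 8, "IT Audit")])))
# [Match(start=0, end=8, text='IT Audit')]
```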

Issue creating data for training and testing using 3 folders containing images

I am running:
path = Path('/content/drive/MyDrive/X-Ray_Image_DataSet')
np.random.seed(41)
data = ImageDataBunch.from_folder(dta, train="Train", valid ="Valid", ds_tfms=get_transforms(),size=(256,256), bs=32, num_workers=4).normalize()
And I am getting this error:
/usr/local/lib/python3.7/dist-packages/fastai/data_block.py:458: UserWarning: Your training set is empty. If this is by design, pass `ignore_empty=True` to remove this warning.
warn("Your training set is empty. If this is by design, pass `ignore_empty=True` to remove this warning.")
/usr/local/lib/python3.7/dist-packages/fastai/data_block.py:461: UserWarning: Your validation set is empty. If this is by design, use `split_none()`
or pass `ignore_empty=True` when labelling to remove this warning.
or pass `ignore_empty=True` when labelling to remove this warning.""")
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, label_delim, **kwargs)
264 if label_delim is not None: return MultiCategoryList
--> 265 try: it = index_row(labels,0)
266 except: raise Exception("""Can't infer the type of your targets.
IndexError: index 0 is out of bounds for axis 0 with size 0
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/fastai/data_block.py in get_label_cls(self, labels, label_cls, label_delim, **kwargs)
265 try: it = index_row(labels,0)
266 except: raise Exception("""Can't infer the type of your targets.
--> 267 It's either because your data source is empty or because your labelling function raised an error.""")
268 if isinstance(it, (float, np.float32)): return FloatList
269 if isinstance(try_int(it), (str, Integral)): return CategoryList
Exception: Can't infer the type of your targets.
It's either because your data source is empty or because your labelling function raised an error.

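Instead of relying on separate Train/Valid folders, you can let fastai split the images randomly; valid_pct=0.2 holds out 20% of them for validation:
np.random.seed(41)
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
    ds_tfms=get_transforms(), size=(256,256), bs=32, num_workers=4).normalize()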
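fastai v1's ImageDataBunch.from_folder expects an ImageNet-style layout (path/<train>/<label>/<image>, and likewise for the validation folder), and an empty training or validation set usually means no labelled images were found there. A quick standard-library check of what is actually on disk, assuming the Train/Valid folder names from the question; count_images and the extension list are hypothetical:

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(root, splits=("Train", "Valid")):
    """Map each split to {label: image count}, or None if the folder is missing."""
    root = Path(root)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            counts[split] = None  # the split folder does not exist at all
            continue
        # One entry per label subfolder, counting only image files in it.
        counts[split] = {
            label_dir.name: sum(1 for f in label_dir.iterdir()
                                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES)
            for label_dir in sorted(split_dir.iterdir())
            if label_dir.is_dir()
        }
    return counts
```

Calling count_images(path) should show a non-zero count for every class in both splits; a None or an empty dict points at a misnamed or misplaced folder.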

How to perform assert introspection in Python

I'm looking into how to perform assert introspection in Python, in the same way that py.test does. For example...
>>> a = 1
>>> b = 2
>>> assert a == b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError # <--- I want more information here, eg 'AssertionError: 1 != 2'
I see that the py.code library has some functionality around this and I've also seen this answer, noting that sys.excepthook allows you to plug in whatever behavior you want to exceptions, but it's not clear to me how to put it all together.

You can do something like this if you want to show a detailed error message:
def assertion(a,b):
    try:
        assert a==b
    except AssertionError as e:
        e.args += ('some other', 'information',)
        raise
a=1
b=2
assertion(a,b)
This code will give this output:
Traceback (most recent call last):
File "tp.py", line 11, in <module>
assertion(a,b)
File "tp.py", line 4, in assertion
assert a==b
AssertionError: ('some other', 'information')
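For the common case, the built-in way to attach the values is the optional second expression of the assert statement, which becomes the AssertionError message. A minimal sketch (the check_equal name is hypothetical):

```python
def check_equal(a, b):
    # The expression after the comma is evaluated only when the test fails,
    # and becomes the AssertionError message.
    assert a == b, "%r != %r" % (a, b)


try:
    check_equal(1, 2)
except AssertionError as exc:
    print(exc)  # 1 != 2
```

Keep in mind that assert statements, including their messages, are skipped entirely when Python runs with -O.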

unittest's assertions give extra information (possibly more than you need). This was inspired by Raymond Hettinger's talk.
This is a partial answer: it only gives the values of a and b (the last line of the output), not the additional introspection unique to pytest that you are also looking for.
import unittest
class EqualTest(unittest.TestCase):
    def testEqual(self, a, b):
        self.assertEqual(a, b)
a, b = 1, 2
assert_ = EqualTest().testEqual
assert_(a, b)
Output
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-4-851ce0f1f668> in <module>()
9 a, b = 1, 2
10 assert_ = EqualTest().testEqual
---> 11 assert_(a, b)
<ipython-input-4-851ce0f1f668> in testEqual(self, a, b)
4
5 def testEqual(self, a, b):
----> 6 self.assertEqual(a, b)
7
8
C:\Anaconda3\lib\unittest\case.py in assertEqual(self, first, second, msg)
818 """
819 assertion_func = self._getAssertEqualityFunc(first, second)
--> 820 assertion_func(first, second, msg=msg)
821
822 def assertNotEqual(self, first, second, msg=None):
C:\Anaconda3\lib\unittest\case.py in _baseAssertEqual(self, first, second, msg)
811 standardMsg = '%s != %s' % _common_shorten_repr(first, second)
812 msg = self._formatMessage(msg, standardMsg)
--> 813 raise self.failureException(msg)
814
815 def assertEqual(self, first, second, msg=None):
AssertionError: 1 != 2
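Since Python 3.2 a TestCase can be instantiated without naming a test method, so the subclass is not strictly needed. A minimal sketch that borrows assertEqual from a bare instance (assert_equal is just a local name chosen here):

```python
import unittest

# A bare TestCase instance, used only for its assert* helper methods.
_tc = unittest.TestCase()
assert_equal = _tc.assertEqual

try:
    assert_equal(1, 2)
except AssertionError as exc:
    print(exc)  # 1 != 2

assert_equal([1, 2], [1, 2])  # passes silently
```

For lists, tuples, dicts, sets and strings, assertEqual dispatches to type-specific comparisons whose failure messages include a diff, which is where most of the "possibly more than you need" detail comes from.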

I don't think it is straightforward to reproduce pytest's assert introspection in a standalone context. The docs contain a few more details on how it works:
pytest rewrites test modules on import. It does this by using an import hook to write new pyc files. Most of the time this works transparently. However, if you are messing with import yourself, the import hook may interfere. If this is the case, simply use --assert=reinterp or --assert=plain. Additionally, rewriting will fail silently if it cannot write new pycs, i.e. in a read-only filesystem or a zipfile.
It looks like making that work in arbitrary modules would require quite a few hacks, so you're probably better off using one of the solutions suggested in the other answers.
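The core idea behind that rewriting, transforming assert statements before they are compiled, can be sketched with the standard ast module. This minimal version, with hypothetical EqAssertRewriter and run_with_introspection names, only handles a single == comparison and evaluates both operands a second time to build the message, so it illustrates the technique rather than reproducing pytest (which does this in an import hook and writes the results to pyc files, as quoted above):

```python
import ast
import copy


class EqAssertRewriter(ast.NodeTransformer):
    """Give `assert x == y` (when it has no message) the message '<x!r> != <y!r>'."""

    def visit_Assert(self, node):
        test = node.test
        if (node.msg is None and isinstance(test, ast.Compare)
                and len(test.ops) == 1 and isinstance(test.ops[0], ast.Eq)):
            # Copies of both operands are re-evaluated inside the message.
            left = copy.deepcopy(test.left)
            right = copy.deepcopy(test.comparators[0])
            node.msg = ast.BinOp(
                left=ast.Constant(value="%r != %r"),
                op=ast.Mod(),
                right=ast.Tuple(elts=[left, right], ctx=ast.Load()),
            )
        return node


def run_with_introspection(source, namespace=None):
    """Parse `source`, rewrite its asserts, then compile and exec it."""
    tree = EqAssertRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # the new nodes need line numbers
    code = compile(tree, "<rewritten>", "exec")
    exec(code, {} if namespace is None else namespace)


try:
    run_with_introspection("a = 1\nb = 2\nassert a == b\n")
except AssertionError as exc:
    print("AssertionError:", exc)  # AssertionError: 1 != 2
```

Asserts that already carry a message, or use other comparisons, are left untouched.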

Error in reading and writing files in Python

I am trying to convert files from one format to another in Python. The current format is DAQ (data acquisition format), which is read in first. Then I use the undaqTools module to write the files to HDF5 format.
import glob
ctnames = glob.glob('*.daq')
Here are a few of the filenames (there are 100 in total):
ctnames
['Cars_20160601_01.daq',
'Cars_20160601_02.daq',
'Cars_20160601_03.daq',
'Cars_20160601_04.daq',
'Cars_20160601_05.daq',
'Cars_20160601_06.daq',
'Cars_20160601_07.daq',
.
.
.
## Importing undaqTools:
from undaqTools import Daq
daq = Daq()
Reading the DAQ files and writing to HDF5:
for n in ctnames:
    x = daq.read(n)
    daq.write_hd5(x)
This is the error I got:
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.py:405: RuntimeWarning: Failed loading file on frame 46970. (stopped reading file)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-6fe7a8c9496d> in <module>()
1 for n in ctnames:
----> 2 x = daq.read(n)
3 daq.write_hd5(x)
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.pyc in read_daq(self, filename, elemlist, loaddata, process_dynobjs, interpolate_missing_frames)
272
273 if loaddata:
--> 274 self._loaddata()
275 self._unwrap_lane_deviation()
276
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.pyc in _loaddata(self)
449 assert tmpdata[name].shape[0] == frame.frame.shape[0]
450 else:
--> 451 assert tmpdata[name].shape[1] == frame.frame.shape[0]
452
453 # cast as Element objects
AssertionError:
Questions
I have 2 questions:
1. How do I know which of the 100 files is throwing the error?
2. How do I skip the files if they throw the error?

Wrap the read() call in a try/except block. If you get an exception, print the current filename and skip to the next one:
for n in ctnames:
    try:
        x = daq.read(n)
    except AssertionError:
        print('Could not process file %s. Skipping.' % n)
        continue
    daq.write_hd5(x)
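A slightly more general version of the same loop answers both questions at once: it keeps going past bad files and records which files failed and why. A minimal sketch; convert_all and the stand-in fake_convert are hypothetical, and in practice convert would wrap the daq.read(...) and daq.write_hd5(...) calls from the question:

```python
import traceback


def convert_all(filenames, convert):
    """Run convert(name) on each file, skipping (and recording) failures.

    Returns (converted, failed); failed maps each filename to its traceback text.
    """
    converted, failed = [], {}
    for name in filenames:
        try:
            convert(name)
        except Exception:  # deliberately broad: any per-file error is recorded
            failed[name] = traceback.format_exc()
            continue
        converted.append(name)
    return converted, failed


def fake_convert(name):
    # Stand-in for the real read/write calls: fail on one file.
    if name.endswith("_02.daq"):
        raise AssertionError("frame shape mismatch")


converted, failed = convert_all(["Cars_01.daq", "Cars_02.daq"], fake_convert)
print(converted)       # ['Cars_01.daq']
print(sorted(failed))  # ['Cars_02.daq']
```

Catching Exception rather than only AssertionError also covers files that fail in other ways; narrow the except clause if you would rather let unexpected errors stop the loop.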

Sage doesn't find cycle_type() attribute on a permutation element

I'm trying to work on some group theory with Sage.
In particular, I was trying to learn the basic commands related to symmetric groups.
My input is
G=SymmetricGroup(6)
sigma=G('(1,3,5)(4,6)')
Then I call sigma.cycle_type(). According to the documentation, the output should be a list of the lengths of the cycles that form sigma, in decreasing order; in this case I expect something like [3,2]. Instead I get an AttributeError:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-94f73ca80516> in <module>()
----> 1 sigma.cycle_type()
/home/sage/sage-7.2/src/sage/structure/element.pyx in sage.structure.element.Element.__getattr__ (/home/sage/sage-7.2/src/build/cythonized/sage/structure/element.c:4649)()
411 dummy_error_message.name = name
412 raise dummy_attribute_error
--> 413 return getattr_from_other_class(self, P._abstract_element_class, name)
414
415 def __dir__(self):
/home/sage/sage-7.2/src/sage/structure/misc.pyx in sage.structure.misc.getattr_from_other_class (/home/sage/sage-7.2/src/build/cythonized/sage/structure/misc.c:1870)()
257 dummy_error_message.cls = type(self)
258 dummy_error_message.name = name
--> 259 raise dummy_attribute_error
260 if isinstance(attribute, methodwrapper):
261 dummy_error_message.cls = type(self)
AttributeError: 'sage.groups.perm_gps.permgroup_element.SymmetricGroupElement' object has no attribute 'cycle_type'
What am I doing wrong?

Possibly you just need a newer version of Sage? In a late beta of 7.3, I get:
sage: sigma.cycle_type()
[3, 2, 1]
Note that the version on SageMathCloud currently appears to be too old for this, if that's your platform.
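Note that the output is [3, 2, 1] rather than the [3,2] the question expected: in SymmetricGroup(6) the fixed point 2 counts as a cycle of length 1. A plain-Python sketch of the same computation for checking results by hand, using a hypothetical cycle_type function that takes disjoint cycles (not Sage's implementation):

```python
def cycle_type(cycles, n):
    """Cycle lengths, largest first, of a permutation of {1, ..., n}.

    `cycles` must be disjoint tuples; points not listed are fixed (1-cycles).
    """
    moved = set()
    lengths = []
    for cycle in cycles:
        lengths.append(len(cycle))
        moved.update(cycle)
    # Every point not moved by any cycle contributes a 1-cycle.
    lengths.extend([1] * (n - len(moved)))
    return sorted(lengths, reverse=True)


print(cycle_type([(1, 3, 5), (4, 6)], 6))  # [3, 2, 1]
```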
