use polyglot package for Named Entity Recognition in hebrew

use polyglot package for Named Entity Recognition in hebrew - python

I am trying to use the polyglot package for Named Entity Recognition in hebrew.
this is my code:
# -*- coding: utf8 -*-
import polyglot
from polyglot.text import Text, Word
from polyglot.downloader import downloader
downloader.download("embeddings2.iw")
text = Text(u"in france and in germany")
print(type(text))
text2 = Text(u"נסעתי מירושלים לתל אביב")
print(type(text2))
print(text.entities)
print(text2.entities)
this is the output:
<class 'polyglot.text.Text'>
<class 'polyglot.text.Text'>
[I-LOC([u'france']), I-LOC([u'germany'])]
Traceback (most recent call last):
File "C:/Python27/Lib/site-packages/IPython/core/pyglot.py", line 15, in <module>
print(text2.entities)
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Python27\lib\site-packages\polyglot\text.py", line 132, in entities
for i, (w, tag) in enumerate(self.ne_chunker.annotate(self.words)):
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Python27\lib\site-packages\polyglot\text.py", line 100, in ne_chunker
return get_ner_tagger(lang=self.language.code)
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
cache[key] = obj(*args, **kwargs)
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 191, in get_ner_tagger
return NEChunker(lang=lang)
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 104, in __init__
super(NEChunker, self).__init__(lang=lang)
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 40, in __init__
self.predictor = self._load_network()
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 109, in _load_network
self.embeddings = load_embeddings(self.lang, type='cw', normalize=True)
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
cache[key] = obj(*args, **kwargs)
File "C:\Python27\lib\site-packages\polyglot\load.py", line 61, in load_embeddings
p = locate_resource(src_dir, lang)
File "C:\Python27\lib\site-packages\polyglot\load.py", line 43, in locate_resource
if downloader.status(package_id) != downloader.INSTALLED:
File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 738, in status
info = self._info_or_id(info_or_id)
File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 508, in _info_or_id
return self.info(info_or_id)
File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 934, in info
raise ValueError('Package %r not found in index' % id)
ValueError: Package u'embeddings2.iw' not found in index
The english worked but not the hebrew.
Whether I try to download the package u'embeddings2.iw' or not I get:
ValueError: Package u'embeddings2.iw' not found in index

I got it!
It seems like a bug to me.
The language detection defined the language as 'iw' which is the The former ISO 639 language code for Hebrew, and was changed to 'he'.
The text.entities did not recognize the iw code, so i changes it like so:
text2.hint_language_code = 'he'

Related

xlwings recently stopped getting live data from excel via Range

I was running a script to get data from excel for over a year using the Xlwings range command like so...
list=Range('A1:D10').value
Suddenly, it stopper working. I had changed nothing in the code nor the system, other than maybe installing another network card.
This is the error when trying to use the Range assignment now.
Traceback (most recent call last):
File "G:\python32\fetcher.py", line 61, in <module>
listFull = getComData()
File "G:\python32\fetcher.py", line 38, in getComData
listFull=Range('A4:H184').value
File "G:\python32\lib\site-packages\xlwings\main.py", line 1490, in __init__
impl = apps.active.range(cell1).impl
File "G:\python32\lib\site-packages\xlwings\main.py", line 439, in range
return Range(impl=self.impl.range(cell1, cell2))
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 457, in range
xl1 = self.xl.Range(arg1)
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 341, in xl
self._xl = get_xl_app_from_hwnd(self._hwnd)
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 251, in get_xl_app_from_hwnd
disp = COMRetryObjectWrapper(Dispatch(p))
File "G:\python32\lib\site-packages\win32com\client\__init__.py", line 96, in Dispatch
return __WrapDispatch(dispatch, userName, resultCLSID, typeinfo, clsctx=clsctx)
File "G:\python32\lib\site-packages\win32com\client\__init__.py", line 37, in __WrapDispatch
klass = gencache.GetClassForCLSID(resultCLSID)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 180, in GetClassForCLSID
mod = GetModuleForCLSID(clsid)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 223, in GetModuleForCLSID
mod = GetModuleForTypelib(typelibCLSID, lcid, major, minor)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 259, in GetModuleForTypelib
mod = _GetModule(modName)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 622, in _GetModule
mod = __import__(mod_name)
ValueError: source code string cannot contain null bytes

PicklingError when getting the result from ray

I'm working on slowly converting my very serialized text analysis engine to use Modin and Ray. Feels like I'm nearly there, however, I seem to have hit a stumbling block. My code looks like this:
vectorizer = TfidfVectorizer(
analyzer=ngrams, encoding="ascii", stop_words="english", strip_accents="ascii"
)
tf_idf_matrix = vectorizer.fit_transform(r_strings["name"])
r_vectorizer = ray.put(vectorizer)
r_tf_idf_matrix = ray.put(tf_idf_matrix)
n = 2
match_results = []
for fn in files["c.file"]:
match_results.append(
match_name.remote(fn, r_vectorizer, r_tf_idf_matrix, r_strings, n)
)
match_returns = ray.get(match_results)
I'm following the guidance from the "anti-patterns" section in the Ray documentation, on what to avoid, and this is very similar to that of the "better" pattern.
Traceback (most recent call last):
File "alt.py", line 213, in <module>
match_returns = ray.get(match_results)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/worker.py", line 1501, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(PicklingError): ray::match_name() (pid=23393, ip=192.168.1.173)
File "python/ray/_raylet.pyx", line 564, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 565, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1652, in ray._raylet.CoreWorker.store_task_outputs
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 327, in serialize
return self._serialize_to_msgpack(value)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 307, in _serialize_to_msgpack
self._serialize_to_pickle5(metadata, python_objects)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 267, in _serialize_to_pickle5
raise e
File "/home/myuser/.local/lib/python3.7/site-packages/ray/serialization.py", line 264, in _serialize_to_pickle5
value, protocol=5, buffer_callback=writer.buffer_callback)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/home/myuser/.local/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 580, in dump
return Pickler.dump(self, obj)
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class
Definitely an unexpected result. I'm not sure where to go next with this and would appreciate help from folks who have more experience with Ray and Modin.

Python connectivity with HBase using happybase

Can someone help me with the stacktrace generated while using happybase library?
I am trying to pass a dictionary object of python 3.4 in the 'put' method and the following stack trace is generated::
x
b"TWLb'25-Jan-13'"
data_values
{b'Low': b'0.10', b'Date': b'25-Jan-13', b'Volume': b'-', b'Close': b'0.12', b'High': b'0.12', b'Open': b'0.12'}
Traceback (most recent call last):
File "/home/msingal/Desktop/asd/Daily.py", line 63, in insert
hbase_test.insert_data(code, data_format)
File "/home/msingal/Desktop/asd/hbase_test.py", line 56, in insert_data
con.table(ticker, use_prefix=False).put(x, data_values)
File "/usr/lib/python3.4/site-packages/happybase/table.py", line 464, in put
batch.put(row, data)
File "/usr/lib/python3.4/site-packages/happybase/batch.py", line 137, in __exit__
self.send()
File "/usr/lib/python3.4/site-packages/happybase/batch.py", line 60, in send
self._table.connection.client.mutateRows(self._table.name, bms, {})
File "/usr/lib64/python3.4/site-packages/thriftpy/thrift.py", line 198, in _req
return self._recv(_api)
File "/usr/lib64/python3.4/site-packages/thriftpy/thrift.py", line 210, in _recv
fname, mtype, rseqid = self._iprot.read_message_begin()
File "thriftpy/protocol/cybin/cybin.pyx", line 429, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6325)
File "thriftpy/protocol/cybin/cybin.pyx", line 60, in cybin.read_i32 (thriftpy/protocol/cybin/cybin.c:1546)
File "thriftpy/transport/buffered/cybuffered.pyx", line 65, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.c_read (thriftpy/transport/buffered/cybuffered.c:1881)
File "thriftpy/transport/buffered/cybuffered.pyx", line 69, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.read_trans (thriftpy/transport/buffered/cybuffered.c:1948)
File "thriftpy/transport/cybase.pyx", line 61, in thriftpy.transport.cybase.TCyBuffer.read_trans (thriftpy/transport/cybase.c:1472)
File "/usr/lib64/python3.4/site-packages/thriftpy/transport/socket.py", line 125, in read
message='TSocket read 0 bytes')
thriftpy.transport.TTransportException: TTransportException(message='TSocket read 0 bytes', type=4)
The lines of code are::
ticker = 'TWLB'
data_values = {b'Low': b'0.10', b'Date': b'25-Jan-13', b'Volume': b'-', b'Close': b'0.12', b'High': b'0.12', b'Open': b'0.12'}
x = (ticker + str(data_values.get(b'Date'))).encode('ASCII')
print('x')
print(x)
print('data_values')
print(data_values)
con.table(ticker, use_prefix=False).put(x, data_values)
Any help on solution and explaination would be apprciated.
I am new to StackOverflow so if my language feels offencive, please forgive me. I have tried to provide all the relevant info but if some info is missing let me know and I will update.

POLYGLOT >> ValueError: Package u'pos2.ms' not found in index

I learn to use polyglot to give POS tag Indonesian texts.
import polyglot
from polyglot.text import Text, Word
text=Text("Menurut dia, Syahganda, dikenal sebagai penggiat isu-isu pertanahan serta perburuhan.")
print text.pos_tags
But error appeared:
Traceback (most recent call last):
File "polyglot-tagger.py", line 35, in <module>
arrTag=text.pos_tags
File "/usr/local/lib/python2.7/dist-packages/polyglot/decorators.py", line 20, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/usr/local/lib/python2.7/dist-packages/polyglot/text.py", line 147, in pos_tags
for word,t in self.pos_tagger.annotate(self.words):
File "/usr/local/lib/python2.7/dist-packages/polyglot/decorators.py", line 20, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/usr/local/lib/python2.7/dist-packages/polyglot/text.py", line 100, in pos_tagger
return get_pos_tagger(lang=self.language.code)
File "/usr/local/lib/python2.7/dist-packages/polyglot/decorators.py", line 30, in memoizer
cache[key] = obj(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 147, in get_pos_tagger
return POSTagger(lang=lang)
File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 126, in __init__
super(POSTagger, self).__init__(lang=lang)
File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 40, in __init__
self.predictor = self._load_network()
File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 134, in _load_network
self.model = load_pos_model(lang=self.lang, version=2)
File "/usr/local/lib/python2.7/dist-packages/polyglot/decorators.py", line 30, in memoizer
cache[key] = obj(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/polyglot/load.py", line 114, in load_pos_model
p = locate_resource(src_dir, lang)
File "/usr/local/lib/python2.7/dist-packages/polyglot/load.py", line 47, in locate_resource
if downloader.status(package_id) != downloader.INSTALLED:
File "/usr/local/lib/python2.7/dist-packages/polyglot/downloader.py", line 737, in status
info = self._info_or_id(info_or_id)
File "/usr/local/lib/python2.7/dist-packages/polyglot/downloader.py", line 507, in _info_or_id
return self.info(info_or_id)
File "/usr/local/lib/python2.7/dist-packages/polyglot/downloader.py", line 933, in info
raise ValueError('Package %r not found in index' % id)
ValueError: Package u'pos2.ms' not found in index
When I tried to download pos2.ms(Part-of-speech Model for Malay), it doesn't exists in model. What should I do?
**I use Ubuntu and python 2.7
Thanks for your help before

Check the language coverage for Malay
http://polyglot.readthedocs.org/en/latest/POS.html#languages-coverage
We are planning to add more languages in the futrue

Unable to read column family using pycassa

I've just started using pycassa, so if this is a stupid question, I apologize upfront.
I have a column family with the following schema:
create column family MyColumnFamilyTest
with column_type = 'Standard'
and comparator = 'CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.TimeUUIDType)'
and default_validation_class = 'BytesType'
and key_validation_class = 'UTF8Type'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and populate_io_cache_on_flush = false
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
When I try to do a get() with a valid key (works fine in cassandra-cli) I get:
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
cf.get('mykey',column_count=3)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 664, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 444, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib/python2.7/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
Here's some more information I've discovered:
When using cassandra-cli I can see the data as:
% cassandra-cli -h 10.249.238.131
Connected to: "LocalDB" on 10.249.238.131/9160
Welcome to Cassandra CLI version 1.2.10-SNAPSHOT
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default#unknown] use Keyspace;
[default#Keyspace] list ColumnFamily;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:
=> (name=autoZoning:::, value=01, timestamp=1391298393966000)
=> (name=creationTime:::, value=00000143efd8b76e, timestamp=1391298393966000)
=> (name=inactive:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=00, timestamp=1391298393966000)
=> (name=label:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=726a6d2d766e782d76613031, timestamp=1391298393966000)
1 Row Returned.
Elapsed time: 16 msec(s).
Since it was unclear what was causing the exception, I decided to add a print prior to the 'return self._name_unpacker(b)' line in columnfamily.py and I see:
>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
Attempting to unpack: <00>\rautoZoning<00><00><00><00><00><00><00><00><00><00>
Traceback (most recent call last):
File "<pyshell#172>", line 1, in <module>
cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 665, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 445, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib/python2.7/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
I have no idea where the extra characters are coming from around the column name. But that got me curious so I added another print in _cosc_to_dict in columnfamily.py and I see:
>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
list_col_or_super is: []
list_col_or_super is: [ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\rautoZoning\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', value='\x01', ttl=None),
counter_super_column=None, super_column=None, counter_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x0ccreationTime\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
value='\x00\x00\x01C\xef\xd8\xb7n', ttl=None), counter_super_column=None, super_column=None,
counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x08inactive\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='\x00', ttl=None), counter_super_column=None, super_column=None,
counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x05label\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='thisIsATest', ttl=None), counter_super_column=None, super_column=None, counter_column=None)]
autoZoning unpack:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 666, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 369, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 446, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib64/python2.6/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
Am I correct in assuming that the extra characters around the column names are what is responsible for the 'ValueError: bytes is not a 16-char string' exception?
Also if I try to use the column name and select it I get:
>>> cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])
Traceback (most recent call last):
File "<pyshell#184>", line 1, in <module>
cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 651, in get
cp = self._column_path(super_column, column)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 383, in _column_path
self._pack_name(column, False))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 426, in _pack_name
return self._name_packer(value, slice_start)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 115, in pack_composite
packed = packer(item)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 298, in pack_uuid
randomize=True)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/util.py", line 75, in convert_time_to_uuid
'neither a UUID, a datetime, or a number')
ValueError: Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number
Any further thoughts?
Thanks,
Rob

Turns out that the problem wasn't with the key, it was being caused, in part, by a bug in pycassa that wasn't handling an empty (null) string in the column UUID. A short-term fix is in the answer in google groups:
https://groups.google.com/d/msg/pycassa-discuss/Vf_bSgDIi9M/KTA1kbE9IXAJ
The other part of the answer was to get at the columns by using tuples (with the UUID as a UUID and not a str) instead of a string with ':' separators because that's, as I found out, a cassandra-cli thing.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

use polyglot package for Named Entity Recognition in hebrew - python

I got it! It seems like a bug to me. The language detection defined the language as 'iw' which is the The former ISO 639 language code for Hebrew, and was changed to 'he'. The text.entities did not recognize the iw code, so i changes it like so: text2.hint_language_code = 'he'

Related

xlwings recently stopped getting live data from excel via Range

PicklingError when getting the result from ray

Python connectivity with HBase using happybase

POLYGLOT >> ValueError: Package u'pos2.ms' not found in index

Unable to read column family using pycassa

Categories

Resources