I've written a script that reads an .xlsx file using openpyxl on MacOS.
I made updates to allow my script to run on MacOS or Linux. Openpyxl is raising an error when calling load_workbook.
import platform
system = platform.system()
if system == "Darwin":
cpgatt = "/Users/tlombard/desktop/py-sandbox/PickleJar/CSG_CHG_Clarity_attributes1.xlsx"
if system == 'Linux':
cpgatt = "/ws/tlombard-sjc/Picklejar/CSG_CHG_Clarity_attributes1.xlsx"
cpgatt_wb = openpyxl.load_workbook(cpgatt)
The error I get is:
"TypeError: __init__() got an unexpected keyword argument 'quotePrefix'"
The full traceback:-------
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/auto/elb_hwtools/cae/lib/python/v2.7.8/openpyxl/reader/excel.py", line 149, in load_workbook
_load_workbook(wb, archive, filename, read_only, keep_vba)
File "/auto/elb_hwtools/cae/lib/python/v2.7.8/openpyxl/reader/excel.py", line 237, in _load_workbook
keep_vba=keep_vba)
File "/auto/elb_hwtools/cae/lib/python/v2.7.8/openpyxl/reader/worksheet.py", line 324, in read_worksheet
fast_parse(ws, xml_source, shared_strings, style_table, color_index)
File "/auto/elb_hwtools/cae/lib/python/v2.7.8/openpyxl/reader/worksheet.py", line 312, in fast_parse
parser.parse()
File "/auto/elb_hwtools/cae/lib/python/v2.7.8/openpyxl/reader/worksheet.py", line 93, in parse
dispatcher[tag_name](element)
File "/auto/elb_hwtools/cae/lib/python/v2.7.8/openpyxl/reader/worksheet.py", line 189, in parse_row_dimensions
self.parse_cell(cell)
File "/auto/elb_hwtools/cae/lib/python/v2.7.8/openpyxl/reader/worksheet.py", line 134, in parse_cell
cell = Cell(self.ws, column, row, **style)
TypeError: __init__() got an unexpected keyword argument 'quotePrefix'
>>> cpgatt
'/ws/tlombard-sjc/Picklejar/CSG_CHG_Clarity_attributes1.xlsx'
>>>
I had the same problem, here is my full trace:
Traceback (most recent call last):
File "__init__.py", line 318, in readExcelFile
wb2 = load_workbook( unicode(self.ui_sourcePath.text()))#self.ui_sourcePath.text().replace("/", "\\"))
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 149, in load_workbook
_load_workbook(wb, archive, filename, read_only, keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 237, in _load_workbook
keep_vba=keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 325, in read_worksheet
fast_parse(ws, xml_source, shared_strings, style_table, color_index)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 313, in fast_parse
parser.parse()
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 93, in parse
dispatcher[tag_name](element)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 190, in parse_row_dimensions
self.parse_cell(cell)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 135, in parse_cell
cell = Cell(self.ws, column, row, **style)
TypeError: __init__() got an unexpected keyword argument 'quotePrefix'
As you can see in the trace, the error is occuring when openpyxl tries to parse the workbook. I added the following line in C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py #135
column, row = coordinate_from_string(coordinate)
print column, row
cell = Cell(self.ws, column, row, **style)
The problem came from one of the cell in my Excel workbook, and only one. My temporary solution to this problem is to delete de quotePrefix key in the style dict:
style = {}
if style_id is not None:
style_id = int(style_id)
style = self.styles[style_id]
if "quotePrefix" in style.keys():
del style["quotePrefix"]
column, row = coordinate_from_string(coordinate)
cell = Cell(self.ws, column, row, **style)
self.ws._add_cell(cell)
I couldn't manage to find why this particular cell cause these problems and how to override a quote prefix.
Related
I was running a script to get data from excel for over a year using the Xlwings range command like so...
list=Range('A1:D10').value
Suddenly, it stopper working. I had changed nothing in the code nor the system, other than maybe installing another network card.
This is the error when trying to use the Range assignment now.
Traceback (most recent call last):
File "G:\python32\fetcher.py", line 61, in <module>
listFull = getComData()
File "G:\python32\fetcher.py", line 38, in getComData
listFull=Range('A4:H184').value
File "G:\python32\lib\site-packages\xlwings\main.py", line 1490, in __init__
impl = apps.active.range(cell1).impl
File "G:\python32\lib\site-packages\xlwings\main.py", line 439, in range
return Range(impl=self.impl.range(cell1, cell2))
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 457, in range
xl1 = self.xl.Range(arg1)
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 341, in xl
self._xl = get_xl_app_from_hwnd(self._hwnd)
File "G:\python32\lib\site-packages\xlwings\_xlwindows.py", line 251, in get_xl_app_from_hwnd
disp = COMRetryObjectWrapper(Dispatch(p))
File "G:\python32\lib\site-packages\win32com\client\__init__.py", line 96, in Dispatch
return __WrapDispatch(dispatch, userName, resultCLSID, typeinfo, clsctx=clsctx)
File "G:\python32\lib\site-packages\win32com\client\__init__.py", line 37, in __WrapDispatch
klass = gencache.GetClassForCLSID(resultCLSID)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 180, in GetClassForCLSID
mod = GetModuleForCLSID(clsid)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 223, in GetModuleForCLSID
mod = GetModuleForTypelib(typelibCLSID, lcid, major, minor)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 259, in GetModuleForTypelib
mod = _GetModule(modName)
File "G:\python32\lib\site-packages\win32com\client\gencache.py", line 622, in _GetModule
mod = __import__(mod_name)
ValueError: source code string cannot contain null bytes
I have a .xlsx sheet with a huge filename like this: child_id_survey_918_choose_touching_results_1541687696.xlsx when I try to use the load_workbook() to load this file it gives me a weird error.
My python file and the excel sheet both lie in the same folder.
CODE:
from openpyxl import load_workbook
workbook = load_workbook('child_id_survey_918_choose_touching_results_1541687696.xlsx')
Error:
Traceback (most recent call last):
File "test.py", line 2, in <module>
workbook = load_workbook('child_id_survey_918_choose_touching_results_1541687696.xlsx')
File "C:\Python38\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader.read()
File "C:\Python38\lib\site-packages\openpyxl\reader\excel.py", line 280, in read
self.read_worksheets()
File "C:\Python38\lib\site-packages\openpyxl\reader\excel.py", line 228, in read_worksheets
ws_parser.bind_all()
File "C:\Python38\lib\site-packages\openpyxl\worksheet\_reader.py", line 434, in bind_all
self.bind_cells()
File "C:\Python38\lib\site-packages\openpyxl\worksheet\_reader.py", line 337, in bind_cells
for idx, row in self.parser.parse():
File "C:\Python38\lib\site-packages\openpyxl\worksheet\_reader.py", line 153, in parse
row = self.parse_row(element)
File "C:\Python38\lib\site-packages\openpyxl\worksheet\_reader.py", line 274, in parse_row
cells = [self.parse_cell(el) for el in row]
File "C:\Python38\lib\site-packages\openpyxl\worksheet\_reader.py", line 274, in <listcomp>
cells = [self.parse_cell(el) for el in row]
File "C:\Python38\lib\site-packages\openpyxl\worksheet\_reader.py", line 181, in parse_cell
style_id = int(style_id)
ValueError: invalid literal for int() with base 10: 'string'
I have also tried using this: workbook = load_workbook(filename = 'child_id_survey_918_choose_touching_results_1541687696.xlsx')
Why am I getting this error and how can I solve it?
I'm new to Python, i was try in to access a file via openpyxl and
on using the following code:
import openpyxl
wb1=openpyxl.load_workbook('DATA_G1.xlsm')
I get an error
TypeError: __init__() got an unexpected keyword argument 'noTextEdit''
EDIT1: I'm putting here the complete line
>>> import openpyxl
>>> os.chdir('C:\\Users\\stephinj\\OneDrive\\LEARN_CODE')
>>> wb=openpyxl.load_workbook('DATA_G1.xlsm')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\reader\excel.py", line 276, in load_workbook
for c in find_charts(archive, rel.target):
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\chart\reader.py", line 50, in find_charts
drawing = SpreadsheetDrawing.from_tree(tree)
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 84, in from_tree
obj = desc.expected_type.from_tree(el)
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 84, in from_tree
obj = desc.expected_type.from_tree(el)
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 84, in from_tree
obj = desc.expected_type.from_tree(el)
[Previous line repeated 1 more times]
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 100, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'noTextEdit''
Edit2:
>>> print(openpyxl.__version__)
2.5.8
>>> wb1 = openpyxl.load_workbook('test')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\reader\excel.py", line 175, in load_workbook
archive = _validate_archive(filename)
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\reader\excel.py", line 122, in _validate_archive
archive = ZipFile(filename, 'r', ZIP_DEFLATED)
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\zipfile.py", line 1182, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'test'
Edit3: Solved
Found Error, its because of some bug related to 'Autoshape' object in excel sheet
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\chart\reader.py", line 50, in find_charts
drawing = SpreadsheetDrawing.from_tree(tree)
Solved
Found Error, its because of some bug related to 'Autoshape' object in excel sheet
File "C:\Users\stephinj\AppData\Local\Programs\Python\Python37-32\lib\site-packages\openpyxl\chart\reader.py", line 50, in find_charts
drawing = SpreadsheetDrawing.from_tree(tree)
I am trying to open an xlsx file that is created by another system (and this is the format in which the data always comes, and is not in my control). I tried both openpyxl (v2.3.2) and xlrd (v1.0.0) (as well as pandas (v0.20.1) read_excel and pd.ExcelFile(), both of which are using xlrd, and so may be moot), and I am running into errors; plus not finding answers from my searches. Any help is appreciated.
xlrd code:
import xlrd
workbook = xlrd.open_workbook(r'C:/Temp/Data.xlsx')
Error:
Traceback (most recent call last):
File "<ipython-input-3-9e5d87f720d0>", line 2, in <module>
workbook = xlrd.open_workbook(r'C:/Temp/Data.xlsx')
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\xlsx.py", line 833, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\xlsx.py", line 548, in own_process_stream
self_do_row(elem)
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\xlsx.py", line 685, in do_row
self.sheet.put_cell(rowx, colx, None, float(tvalue), xf_index)
ValueError: could not convert string to float:
openpyxl code:
import openpyxl
wb = openpyxl.load_workbook(r'C:/Temp/Data.xlsx')
Error:
Traceback (most recent call last):
File "<ipython-input-2-6083ad2bc875>", line 1, in <module>
wb = openpyxl.load_workbook(r'C:/Temp/Data.xlsx')
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 234, in load_workbook
parser.parse()
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\reader\worksheet.py", line 106, in parse
dispatcher[tag_name](element)
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\reader\worksheet.py", line 243, in parse_row_dimensions
self.parse_cell(cell)
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\reader\worksheet.py", line 188, in parse_cell
value = _cast_number(value)
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\cell\read_only.py", line 23, in _cast_number
return long(value)
ValueError: invalid literal for int() with base 10: ' '
pandas code:
import pandas as pd
df = pd.read_excel(r'C:/Temp/Data.xlsx', sheetname='Sheet1')
Error:
Traceback (most recent call last):
File "<ipython-input-5-b86ec98a4e9e>", line 2, in <module>
df = pd.read_excel(r'C:/Temp/Data.xlsx', sheetname='Sheet1')
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\io\excel.py", line 200, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\io\excel.py", line 257, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\xlsx.py", line 833, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\xlsx.py", line 548, in own_process_stream
self_do_row(elem)
File "C:\Program Files\Anaconda3\lib\site-packages\xlrd\xlsx.py", line 685, in do_row
self.sheet.put_cell(rowx, colx, None, float(tvalue), xf_index)
ValueError: could not convert string to float:
For what its worth, here is an example snippet of the input file:
I am guessing that the errors are coming from the first row having blanks beyond the first column - because the errors vanish when I delete the first two rows and . I cannot skip the first two rows, because I want to extract the value in cell A1. I would also like to force the values read to be string type, and will later convert to float with error checking. thanks!
===========
Update(Aug 9 10AM EDT): Using Charlie's suggestion, was able to open excel file in read only mode; and was able to read most of the contents - but still running into an error somewhere.
new code (sorry it is not very pythonic - still a newbie):
wb = openpyxl.load_workbook(r'C:/Temp/Data.xlsx', read_only=True)
ws = wb['Sheet1']
ws.max_row = ws.max_column = None
i=1
for row in ws.rows:
for cell in row:
if i<2000:
i += 1
try:
print(i, cell.value)
except:
print("error")
Error:
Traceback (most recent call last):
File "<ipython-input-65-2e8f3cf2294a>", line 2, in <module>
for row in ws.rows:
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\worksheet\read_only.py", line 125, in get_squared_range
yield tuple(self._get_row(element, min_col, max_col))
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\worksheet\read_only.py", line 165, in _get_row
value, data_type, style_id)
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\cell\read_only.py", line 36, in __init__
self.value = value
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\cell\read_only.py", line 132, in value
value = _cast_number(value)
File "C:\Program Files\Anaconda3\lib\site-packages\openpyxl\cell\read_only.py", line 23, in _cast_number
return long(value)
ValueError: invalid literal for int() with base 10: ' '
=========
Update2 (10:35AM): when i read the file without ws.max_row and ws.max_column set as None, the code was reading just one column, without errors. The value in cell A66 is "Generated from:". But when i read the file with ws.max_row and ws.max_column set as None, this particular cell is causing trouble. But I can read all other cells before that, and that will work fine for me, right now. thanks, #Charlie.
Sounds like the source file is probably corrupt and contains cells that with empty strings that are typed as numbers. You might be able to use openpyxl's read-only mode to skip the first tow rows.
If your program works after you delete the first two rows then lets skip them. try use skiprows to ignore the first 2 rows that are blanks or are headers. you can use the parse method from panda.
xls = pd.read_excel('C:/Temp/Data.xlsx')
df = xls.parse('Sheet1', skiprows=2) #assuming your data is on sheet1.
I've just started using pycassa, so if this is a stupid question, I apologize upfront.
I have a column family with the following schema:
create column family MyColumnFamilyTest
with column_type = 'Standard'
and comparator = 'CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.TimeUUIDType)'
and default_validation_class = 'BytesType'
and key_validation_class = 'UTF8Type'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and populate_io_cache_on_flush = false
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
When I try to do a get() with a valid key (works fine in cassandra-cli) I get:
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
cf.get('mykey',column_count=3)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 664, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 444, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib/python2.7/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
Here's some more information I've discovered:
When using cassandra-cli I can see the data as:
% cassandra-cli -h 10.249.238.131
Connected to: "LocalDB" on 10.249.238.131/9160
Welcome to Cassandra CLI version 1.2.10-SNAPSHOT
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default#unknown] use Keyspace;
[default#Keyspace] list ColumnFamily;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:
=> (name=autoZoning:::, value=01, timestamp=1391298393966000)
=> (name=creationTime:::, value=00000143efd8b76e, timestamp=1391298393966000)
=> (name=inactive:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=00, timestamp=1391298393966000)
=> (name=label:::14fe78e0-8b9b-11e3-b171-005056b700bb, value=726a6d2d766e782d76613031, timestamp=1391298393966000)
1 Row Returned.
Elapsed time: 16 msec(s).
Since it was unclear what was causing the exception, I decided to add a print prior to the 'return self._name_unpacker(b)' line in columnfamily.py and I see:
>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
Attempting to unpack: <00>\rautoZoning<00><00><00><00><00><00><00><00><00><00>
Traceback (most recent call last):
File "<pyshell#172>", line 1, in <module>
cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 665, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 368, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 445, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib/python2.7/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
I have no idea where the extra characters are coming from around the column name. But that got me curious so I added another print in _cosc_to_dict in columnfamily.py and I see:
>>> cf.get(dict(cf.get_range(column_count=0,filter_empty=False)).keys()[0])
list_col_or_super is: []
list_col_or_super is: [ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\rautoZoning\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', value='\x01', ttl=None),
counter_super_column=None, super_column=None, counter_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x0ccreationTime\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
value='\x00\x00\x01C\xef\xd8\xb7n', ttl=None), counter_super_column=None, super_column=None,
counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x08inactive\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='\x00', ttl=None), counter_super_column=None, super_column=None,
counter_column=None), ColumnOrSuperColumn(column=Column(timestamp=1391298393966000,
name='\x00\x05label\x00\x00\x00\x00\x00\x00\x00\x00\x10\x14\xfex\xe0\x8b\x9b\x11\xe3\xb1q\x00PV\xb7\x00\xbb\x00', value='thisIsATest', ttl=None), counter_super_column=None, super_column=None, counter_column=None)]
autoZoning unpack:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 666, in get
return self._cosc_to_dict(list_col_or_super, include_timestamp, include_ttl)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 369, in _cosc_to_dict
ret[self._unpack_name(col.name)] = self._col_to_dict(col, include_timestamp, include_ttl)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/columnfamily.py", line 446, in _unpack_name
return self._name_unpacker(b)
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 140, in unpack_composite
components.append(unpacker(bytestr[2:2 + length]))
File "/usr/local/lib64/python2.6/site-packages/pycassa-1.11.0-py2.6.egg/pycassa/marshal.py", line 374, in <lambda>
return lambda v: uuid.UUID(bytes=v)
File "/usr/lib64/python2.6/uuid.py", line 144, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
Am I correct in assuming that the extra characters around the column names are what is responsible for the 'ValueError: bytes is not a 16-char string' exception?
Also if I try to use the column name and select it I get:
>>> cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])
Traceback (most recent call last):
File "<pyshell#184>", line 1, in <module>
cf.get(u'urn:keyspace:ColumnFamily:a36e8ab1-7032-4e4c-a53d-e3317f63a640:',columns=['autoZoning:::'])
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 651, in get
cp = self._column_path(super_column, column)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 383, in _column_path
self._pack_name(column, False))
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/columnfamily.py", line 426, in _pack_name
return self._name_packer(value, slice_start)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 115, in pack_composite
packed = packer(item)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/marshal.py", line 298, in pack_uuid
randomize=True)
File "/usr/local/lib/python2.7/dist-packages/pycassa-1.11.0-py2.7.egg/pycassa/util.py", line 75, in convert_time_to_uuid
'neither a UUID, a datetime, or a number')
ValueError: Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number
Any further thoughts?
Thanks,
Rob
Turns out that the problem wasn't with the key, it was being caused, in part, by a bug in pycassa that wasn't handling an empty (null) string in the column UUID. A short-term fix is in the answer in google groups:
https://groups.google.com/d/msg/pycassa-discuss/Vf_bSgDIi9M/KTA1kbE9IXAJ
The other part of the answer was to get at the columns by using tuples (with the UUID as a UUID and not a str) instead of a string with ':' separators because that's, as I found out, a cassandra-cli thing.