dvc.api.read() raises an "UnicodeDecodeError"

I am trying to access a DICOM file (an image saved in the Digital Imaging and Communications in Medicine format):
import dvc.api
path = 'dir/image.dcm'
remote = 'remote_name'
repo = 'git_repo'
mode = 'r'
data = dvc.api.read(path = path, remote = remote, repo = repo, mode = mode)
When I run the previous code, and after the "downloading progress bar" is complete, I get the following error:
Traceback (most recent call last):
File "draft.py", line 7, in <module>
mode ='r')
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\api.py", line 91, in read
return fd.read()
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 764: character maps to <undefined>
I tried to overcome this issue by using the encoding argument:
data = dvc.api.read(path = path, remote = remote, repo = repo, mode = mode, encoding='ANSI')
I chose ANSI because that is the encoding Notepad++ reports when I open a DICOM file. However, this raises the following error:
Exception ignored in: <bound method Pool.__del__ of <dvc.fs.pool.Pool object at 0x0000021D1347A160>>
Traceback (most recent call last):
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\fs\pool.py", line 42, in __del__
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\fs\pool.py", line 46, in close
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\dvc\fs\ssh\connection.py", line 71, in close
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\paramiko\sftp_client.py", line 194, in close
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\paramiko\sftp_client.py", line 185, in _log
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\site-packages\paramiko\sftp.py", line 158, in _log
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\logging\__init__.py", line 1372, in log
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\logging\__init__.py", line 1441, in _log
File "C:\Users\lbrandao\anaconda3\envs\my_env\lib\logging\__init__.py", line 1411, in makeRecord
TypeError: 'NoneType' object is not callable
I also tried encoding = 'utf-8', but the "UnicodeDecodeError" continues to appear:
Traceback (most recent call last):
File "draft.py", line 7, in <module>
mode ='r', encoding='utf-8')
File "C:\Users\lbrandao\anaconda3\envs\ccab_env_dev\lib\site-packages\dvc\api.py", line 91, in read
return fd.read()
File "C:\Users\lbrandao\anaconda3\envs\ccab_env_dev\lib\codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 140: invalid continuation byte
Can anyone please help? Thanks.

I am not familiar with dvc.api, but a lot of people who deal with DICOMs use either pydicom or SimpleITK. I wrote a library (cleanX) which can get most of the data using either, and you can look at an example notebook here.
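Going by the tracebacks, one likely cause is that mode = 'r' asks for the file to be decoded as text, while a DICOM file is binary, so no text encoding will fit all of its bytes. Below is a minimal stdlib-only sketch of the difference. The file and its contents are made up for illustration (a 128-byte preamble, the "DICM" marker, then a few bytes including 0x81). With dvc, the corresponding change would be passing mode='rb' so that read() returns bytes; that part is shown only as a comment because it needs a real repo and remote.

```python
import os
import tempfile

# Hypothetical stand-in for a DICOM file: 128-byte preamble, the "DICM"
# marker, then arbitrary binary data that includes byte 0x81.
fake_dicom = b"\x00" * 128 + b"DICM" + b"\x81\xd0\x00\x10"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image.dcm")
    with open(path, "wb") as f:
        f.write(fake_dicom)

    # Text mode: Python tries to decode the bytes and fails on 0x81,
    # which cp1252 leaves undefined.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode: no decoding happens, the bytes come back untouched.
    with open(path, "rb") as f:
        data = f.read()

print(type(text_mode_error).__name__)
print(data[128:132])

# With dvc the equivalent would be (not run here):
#   data = dvc.api.read(path, repo=repo, remote=remote, mode="rb")
# and the bytes could then be parsed with, for example,
#   pydicom.dcmread(io.BytesIO(data))
```

The point of the sketch is only that the bytes should never go through a text codec; parsing them is then a job for a DICOM library.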

Related

anaconda-navigator not running after an abrupt shutdown

I had been running anaconda-navigator for the past few days. One day my system shut down abruptly. Since then, running anaconda-navigator shows the following error:
dstlab2#dstlab2-Veriton-M200-H81:~$ anaconda-navigator
Traceback (most recent call last):
File "/home/dstlab2/anaconda3/bin/anaconda-navigator", line 11, in <module>
sys.exit(main())
File "/home/dstlab2/anaconda3/lib/python3.7/site-packages/anaconda_navigator/app/main.py", line 99, in main
from anaconda_navigator.utils.logs import clean_logs
File "/home/dstlab2/anaconda3/lib/python3.7/site-packages/anaconda_navigator/utils/logs.py", line 18, in <module>
from anaconda_navigator.config import (LOG_FILENAME, LOG_FOLDER,
File "/home/dstlab2/anaconda3/lib/python3.7/site-packages/anaconda_navigator/config/__init__.py", line 27, in <module>
from anaconda_navigator.config.main import CONF
File "/home/dstlab2/anaconda3/lib/python3.7/site-packages/anaconda_navigator/config/main.py", line 71, in <module>
raw_mode=True,
File "/home/dstlab2/anaconda3/lib/python3.7/site-packages/anaconda_navigator/config/user.py", line 221, in __init__
self.load_from_ini()
File "/home/dstlab2/anaconda3/lib/python3.7/site-packages/anaconda_navigator/config/user.py", line 279, in load_from_ini
self.read(self.filename(), encoding='utf-8')
File "/home/dstlab2/anaconda3/lib/python3.7/configparser.py", line 696, in read
self._read(fp, filename)
File "/home/dstlab2/anaconda3/lib/python3.7/configparser.py", line 1014, in _read
for lineno, line in enumerate(fp, start=1):
File "/home/dstlab2/anaconda3/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf2 in position 378: invalid continuation byte
Can anyone help me set things right?

It looks as if a configuration file used by anaconda navigator has been corrupted. The file probably exists in the $HOME/.anaconda folder; it may have a .ini extension.
If you can identify the file, you could try replacing the byte that's causing the problem (make a backup copy of the file first):
>>> with open('config.ini', 'rb+') as f:
...     data = f.read()
...     data = data.replace(b'\xf2', b'')
...     assert data
...     f.seek(0)
...     f.write(data)
...     f.truncate()  # drop the leftover bytes, since the new content is shorter
...
0
171
171
Note that there may be more than one invalid byte.
It's also possible that the file has been truncated or corrupted so much that it needs to be deleted or replaced entirely.
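To see how many bytes are affected before editing anything, something like the following sketch could help. It scans a byte string and reports every offset where UTF-8 decoding fails. find_invalid_utf8 is a hypothetical helper name, and the sample bytes are invented for illustration; for a real file you would pass in the result of open(path, 'rb').read().

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (offset, byte value) for bytes that break UTF-8 decoding."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest decodes cleanly
        except UnicodeDecodeError as exc:
            offset = pos + exc.start
            bad.append((offset, data[offset]))
            pos = offset + 1  # skip the offending byte and keep scanning
    return bad


# Hypothetical corrupted config contents, for illustration only.
sample = b"[main]\nname = caf\xf2\nkey = \xffvalue\n"
print(find_invalid_utf8(sample))
```

If the list is long or the offsets cluster at the end of the file, that points towards truncation rather than a single stray byte, and replacing the file is probably the safer route.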

Rasa App breaks in PyCharm but works fine in terminal

Whenever I try to run my Rasa app using the run button in PyCharm, or try to use the debugger, I get the following error:
Traceback (most recent call last):
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/pykwalify/core.py", line 76, in __init__
self.source = yaml.load(stream)
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/ruamel/yaml/main.py", line 933, in load
loader = Loader(stream, version, preserve_quotes=preserve_quotes)
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/ruamel/yaml/loader.py", line 50, in __init__
Reader.__init__(self, stream, loader=self)
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/ruamel/yaml/reader.py", line 85, in __init__
self.stream = stream # type: Any # as .read is called
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/ruamel/yaml/reader.py", line 130, in stream
self.determine_encoding()
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/ruamel/yaml/reader.py", line 190, in determine_encoding
self.update_raw()
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/ruamel/yaml/reader.py", line 297, in update_raw
data = self.stream.read(size)
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 473: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/matthewspeck/project/trainer_app/app.py", line 25, in <module>
parser=False, core=True)
File "/Users/matthewspeck/project/trainer_app/rasa_model.py", line 165, in make_rasa_model
rasa_config=rasa_config
File "/Users/matthewspeck/project/trainer_app/rasa_model.py", line 66, in __init__
self._parser = create_agent(use_rasa_nlu=True, load_models=True)
File "/Users/matthewspeck/project/trainer_app/rasa.py", line 32, in create_agent
domain = create_domain()
File "/Users/matthewspeck/project/trainer_app/rasa.py", line 83, in create_domain
domain = ClarifyDomain.load(domain_path)
File "/Users/project/clarification/domain.py", line 39, in load
domain = TemplateDomain.load(filename)
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/rasa_core/domain.py", line 404, in load
cls.validate_domain_yaml(filename)
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/rasa_core/domain.py", line 438, in validate_domain_yaml
schema_files=[schema_file])
File "/Users/matthewspeck/anaconda3/envs/proj_env/lib/python3.6/site-packages/pykwalify/core.py", line 78, in __init__
raise CoreError(u"Unable to load any data from source yaml file")
pykwalify.errors.CoreError: <CoreError: error code 3: Unable to load any data from source yaml file: Path: '/'>
Process finished with exit code 1
However, when I run the app from my terminal, or from my text editor (I use VSCode), it runs with no problems whatsoever. I've looked online and every answer I see has something to do with Rasa, but nothing mentions problems with PyCharm.
I've also checked that the yaml for the domain is properly formatted, and it is. Anyone have any idea why I would be getting this error in PyCharm, but not in any other environment, and how I could fix it?

I believe your problem was fixed with Rasa version 0.12 (changelog: https://github.com/RasaHQ/rasa_core/blob/master/CHANGELOG.rst#0120---2018-11-11).
I recommend upgrading to a newer version of Rasa Core which parses the training data correctly.
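As for why it only happens in PyCharm: the 'ascii' codec in the first traceback suggests the domain file is being opened with an ASCII default encoding there, whereas the terminal presumably runs with a UTF-8 locale. That is an assumption based on the traceback, not something I have confirmed. A small sketch for checking the default from inside each environment, and for reading a file with an explicit encoding so the result no longer depends on the environment (the file name and contents are hypothetical):

```python
import locale
import os
import tempfile

# What open() falls back to when no encoding is given; compare the value
# printed from PyCharm's run configuration with the one from the terminal.
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical YAML file containing a curly apostrophe (UTF-8 bytes e2 80 99).
    path = os.path.join(tmp, "domain.yml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("utter_greet:\n  - \"It\u2019s nice to meet you\"\n")

    # Forcing ASCII reproduces the kind of failure seen in the traceback.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit UTF-8 encoding behaves the same in every environment.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(ascii_failed)
print("\u2019" in text)
```

If the two environments print different default encodings, setting LANG or LC_ALL to a UTF-8 locale (for example en_US.UTF-8) in the PyCharm run configuration is one thing to try.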

Umlauts in JSON files lead to errors in Python code created by ANTLR4

I've created python modules from the JSON grammar on github / antlr4 with
antlr4 -Dlanguage=Python3 JSON.g4
I've written a main program "JSON2.py" following this guide: https://github.com/antlr/antlr4/blob/master/doc/python-target.md
and downloaded the example1.json also from github.
python3 ./JSON2.py example1.json # works perfectly, but
python3 ./JSON2.py bookmarks-2017-05-24.json # the bookmarks contain German Umlauts like "ü"
...
File "/home/xyz/lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
The offending line in JSON2.py is
input = FileStream(argv[1])
I've searched stackoverflow and tried this instead of using the above FileStream:
fp = codecs.open(argv[1], 'rb', 'utf-8')
try:
    input = fp.read()
finally:
    fp.close()
lexer = JSONLexer(input)
stream = CommonTokenStream(lexer)
parser = JSONParser(stream)
tree = parser.json() # This is line 39, mentioned in the error message
Execution of this program ends with an error message, even if the input file doesn't contain Umlauts:
python3 ./JSON2.py example1.json
Traceback (most recent call last):
File "./JSON2.py", line 46, in <module>
main(sys.argv)
File "./JSON2.py", line 39, in main
tree = parser.json()
File "/home/x/Entwicklung/antlr/links/JSONParser.py", line 108, in json
self.enterRule(localctx, 0, self.RULE_json)
File "/home/xyz/lib/python3.5/site-packages/antlr4/Parser.py", line 358, in enterRule
self._ctx.start = self._input.LT(1)
File "/home/xyz/lib/python3.5/site-packages/antlr4/CommonTokenStream.py", line 61, in LT
self.lazyInit()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 186, in lazyInit
self.setup()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 189, in setup
self.sync(0)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 111, in sync
fetched = self.fetch(n)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 123, in fetch
t = self.tokenSource.nextToken()
File "/home/xyz/lib/python3.5/site-packages/antlr4/Lexer.py", line 111, in nextToken
tokenStartMarker = self._input.mark()
AttributeError: 'str' object has no attribute 'mark'
This parses correctly:
javac *.java
grun JSON json -gui bookmarks-2017-05-24.json
So the grammar itself is not the problem.
So finally the question: How should I process the input file in python, so that lexer and parser can digest it?
Thanks in advance.

Make sure your input file is actually encoded as UTF-8. Many problems with character recognition by the lexer are caused by using other encodings. I just took a testbed application, added ë to the list of available characters for an IDENTIFIER, and it works again. UTF-8 is the key -- and make sure your grammar also allows these characters where you want to accept them.

I solved it by passing the encoding info:
input = FileStream(sys.argv[1], encoding = 'utf8')
Without the encoding argument, I get the same error as you:
Traceback (most recent call last):
File "test.py", line 20, in <module>
main()
File "test.py", line 9, in main
input = FileStream(sys.argv[1])
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding, errors))
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
Where my input data is
[今明]天(台南|高雄)的？天氣如何
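Regarding the second attempt in the question: the AttributeError appears because the lexer was handed a plain str instead of a character stream. If you want to read the file yourself, here is a sketch that mirrors the codecs.decode call visible in the FileStream traceback. read_source is a hypothetical helper name and the sample file is made up. The decoded string would still need to be wrapped in the runtime's InputStream before it goes to the lexer; that step is shown only as a comment because it needs the ANTLR runtime and the generated JSONLexer.

```python
import codecs
import os
import tempfile


def read_source(file_name, encoding="utf-8", errors="strict"):
    """Read a file as bytes and decode it explicitly (hypothetical helper)."""
    with open(file_name, "rb") as f:
        raw = f.read()
    return codecs.decode(raw, encoding, errors)


with tempfile.TemporaryDirectory() as tmp:
    # Made-up JSON file containing an umlaut, saved as UTF-8.
    path = os.path.join(tmp, "bookmarks.json")
    with open(path, "wb") as f:
        f.write('{"title": "M\u00fcnchen"}'.encode("utf-8"))

    text = read_source(path)  # decodes fine as UTF-8
    try:
        read_source(path, encoding="ascii")  # reproduces the original error
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

print(ascii(text))
print(ascii_failed)

# With the ANTLR runtime installed, the decoded text would be wrapped
# before lexing (not run here):
#   from antlr4 import InputStream
#   lexer = JSONLexer(InputStream(text))
```

In practice, passing encoding to FileStream as shown above is shorter; the manual route is mainly useful when the text does not come from a file.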

First Robotics Competition, PyNetworkTables nt.thread died

We're using a Raspberry Pi as a co-processor. Its code is written in Python and sends data to the robot (whose code is in Java) over pyNetworkTables. The thing is, this error did not occur until the first match on the field. It worked during practice. It also worked after the bridge was imaged.
DEBUG:nt:client connected
DEBUG:nt:NetworkConnection stopping (<ntcore.network_connection.NetworkConnection object at 0x712411b0>)
ERROR:nt:Unhandled exception during handshake
Traceback (most recent call last):
File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/network_connection.py", line 240, in _readThreadMain
handshake_success = self.m_handshake(self, _getMessage, self._sendMessages)
File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/dispatcher.py", line 488, in _clientHandshake
msg = get_msg()
File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/network_connection.py", line 228, in _getMessage
return Message.read(self.m_stream, decoder, self.m_get_entry_type)
File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/message.py", line 123, in read
value = codec.read_value(value_type, rstream)
File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 126, in read_value
return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 126, in <listcomp>
return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 198, in read_string_v3
return rstream.read(slen).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 47: invalid continuation byte
INFO:nt:DISCONNECTED 10.0.66.2 port 1735 (Robot)
DEBUG:nt:write thread died (<ntcore.network_connection.NetworkConnection object at 0x70088430>)

This is most likely a Unicode bug in the library that you're referring to. We're tracking the bug at https://github.com/robotpy/pynetworktables/issues/42
Disclaimer: I'm the author of the library

UnicodeDecodeError when trying to save an Excel File with Python xlwt

I'm running a Python script that writes HTML code found using BeautifulSoup into multiple rows of an Excel spreadsheet column.
[...]
Col_HTML = 19
w_sheet.write(row_index, Col_HTML, str(HTML_Code))
wb.save(output)
When trying to save the file, I get the following error message:
Traceback (most recent call last):
File "C:\Users\[..]\src\MYCODE.py", line 201, in <module>
wb.save(output)
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 662, in save
doc.save(filename, self.get_biff_data())
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 637, in get_biff_data
shared_str_table = self.__sst_rec()
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 599, in __sst_rec
return self.__sst.get_biff_record()
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\BIFFRecords.py", line 76, in get_biff_record
self._add_to_sst(s)
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\BIFFRecords.py", line 91, in _add_to_sst
u_str = upack2(s, self.encoding)
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\UnicodeUtils.py", line 50, in upack2
us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5181: ordinal not in range(128)
I've successfully written Python scripts that write into worksheets in the past. This is the first time I've tried to write a string of HTML into cells, and I'm wondering what is causing the error and how I could fix it.

Use this line before passing HTML_Code to w_sheet.write:
HTML_Code = HTML_Code.decode('utf-8')
The error line UnicodeDecodeError: 'ascii' codec can't decode shows that xlwt is trying to decode a byte string with the ascii codec, and the string contains non-ASCII bytes. Decoding it yourself with the proper encoding, utf-8, hands xlwt a unicode object that it can store as is. Also drop the str() call, because in Python 2 it would try to convert the unicode object back into a byte string.
So, you have:
Col_HTML = 19
HTML_Code = HTML_Code.decode('utf-8')
w_sheet.write(row_index, Col_HTML, HTML_Code)
