I'm unable to read in a DICOM file as I usually would; the read fails with the error:
AttributeError: 'DicomDir' object has no attribute 'DirectoryRecordSequence'
I've tried:
pydicom.fileset.FileSet
using specific tags with dcmread
pydicom.filereader.read_dicomdir
pydicom.filereader.read_partial
using force=True in dcmread
pydicom.filereader.read_file_meta_info is about the only thing that hasn't returned an error; it yields:
(0002, 0000) File Meta Information Group Length UL: 172
(0002, 0001) File Meta Information Version OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: Media Storage Directory Storage
(0002, 0003) Media Storage SOP Instance UID UI: 2.25.330614241706723499239981063503184149269
(0002, 0010) Transfer Syntax UID UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID UI: 1.3.6.1.4.1.30071.8
(0002, 0013) Implementation Version Name SH: 'fo-dicom 4.0.7'
Moreover, the image is supposed to be a regular DICOM file, not a DICOMDIR. I can open the file in ImageJ and view its header information there, so I know the data is recoverable.
Is there a way for me to read in this file in Python or alternatively force it to ignore looking for DirectoryRecordSequence?
Edit:
Code and stacktrace from using FileSet:
from pydicom.fileset import FileSet
fs = FileSet("unprocessed.dcm")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-2b6ba2e435fe> in <module>
1 from pydicom.fileset import FileSet
----> 2 fs = FileSet("unprocessed.dcm")
c:\****\appdata\local\programs\python\python38-32\lib\site-packages\pydicom\fileset.py in __init__(self, ds)
998 # Check the DICOMDIR dataset and create the record tree
999 if ds:
-> 1000 self.load(ds)
1001 else:
1002 # New File-set
c:\****\appdata\local\programs\python\python38-32\lib\site-packages\pydicom\fileset.py in load(self, ds_or_path, include_orphans, raise_orphans)
1641 ds = ds_or_path
1642 else:
-> 1643 ds = dcmread(ds_or_path)
1644
1645 sop_class = ds.file_meta.get("MediaStorageSOPClassUID", None)
c:\****\appdata\local\programs\python\python38-32\lib\site-packages\pydicom\filereader.py in dcmread(fp, defer_size, stop_before_pixels, force, specific_tags)
1027 stop_when = _at_pixel_data
1028 try:
-> 1029 dataset = read_partial(
1030 fp,
1031 stop_when,
c:\****\appdata\local\programs\python\python38-32\lib\site-packages\pydicom\filereader.py in read_partial(fileobj, stop_when, defer_size, force, specific_tags)
879 DeprecationWarning
880 )
--> 881 ds = DicomDir(
882 fileobj,
883 dataset,
c:\****\appdata\local\programs\python\python38-32\lib\site-packages\pydicom\dicomdir.py in __init__(self, filename_or_obj, dataset, preamble, file_meta, is_implicit_VR, is_little_endian)
94
95 self.patient_records: List[Dataset] = []
---> 96 self.parse_records()
97
98 def parse_records(self) -> None:
c:\****\appdata\local\programs\python\python38-32\lib\site-packages\pydicom\dicomdir.py in parse_records(self)
125
126 # Build the mapping from file offsets to records
--> 127 records = self.DirectoryRecordSequence
128 if not records:
129 return
c:\****\appdata\local\programs\python\python38-32\lib\site-packages\pydicom\dataset.py in __getattr__(self, name)
834 return {}
835 # Try the base class attribute getter (fix for issue 332)
--> 836 return object.__getattribute__(self, name)
837
838 @property
AttributeError: 'DicomDir' object has no attribute 'DirectoryRecordSequence'
pydicom reads the dataset correctly, but because it identifies itself as Media Storage Directory it gets processed by the deprecated DicomDir class, even when passed directly to the FileSet class. Because the dataset isn't a valid Media Storage Directory instance, this fails, producing the exception seen.
You should be able to fix this by changing the file meta information's (0002,0002) Media Storage SOP Class UID during read:
from pydicom import dcmread
from pydicom import config

def fix_sop_class(elem, **kwargs):
    if elem.tag == 0x00020002:
        # Digital X-Ray Image Storage - For Processing
        elem = elem._replace(value=b"1.2.840.10008.5.1.4.1.1.1.1.1")
    return elem

config.data_element_callback = fix_sop_class
ds = dcmread('path/to/file')
By changing the SOP Class UID, that processing is skipped and the dataset is returned.
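If the file will be read repeatedly, one option (a sketch beyond the original answer; the output path is hypothetical) is to persist the corrected dataset once, so later reads no longer need the callback:
# persist the corrected SOP Class UID so future reads skip the DICOMDIR path
ds.save_as('path/to/file_fixed.dcm', write_like_original=False)  # hypothetical output path

# restore the default read behaviour
config.data_element_callback = None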
I have some files inside a container named data:
folder1/somepath/folder2/output/folder3/my_file1.csv
folder1/somepath/folder2/output/folder3/my_file4.csv
folder1/somepath/folder2/output/folder3/my_file23.csv
I have the following code:
import os
from azure.identity import ManagedIdentityCredential
from azure.storage.blob import BlobServiceClient

file_names_prefix = os.path.join('folder1/somepath/', 'folder2', 'output', 'folder3', 'my_file')
client = BlobServiceClient('https://mystoragename.blob.core.windows.net', credential=ManagedIdentityCredential()).get_container_client('data')
blob_list = client.list_blobs(name_starts_with=file_names_prefix)
file_list = [blob.name for blob in blob_list]
The code above produces the following output:
['folder1/somepath/folder2/output/folder3/my_file1.csv',
'folder1/somepath/folder2/output/folder3/my_file4.csv',
'folder1/somepath/folder2/output/folder3/my_file23.csv']
but when trying to delete these files using:
client.delete_blobs(file_list)
there is an error:
TypeError Traceback (most recent call last)
/tmp/ipykernel_2376/712121654.py in <module>
----> 1 client.delete_blobs(file_list)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azure/core/tracing/decorator.py in wrapper_use_tracer(*args, **kwargs)
81 span_impl_type = settings.tracing_implementation()
82 if span_impl_type is None:
---> 83 return func(*args, **kwargs)
84
85 # Merge span is parameter is set, but only if no explicit parent are passed
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azure/storage/blob/_container_client.py in delete_blobs(self, *blobs, **kwargs)
1298 return iter(list())
1299
-> 1300 reqs, options = self._generate_delete_blobs_options(*blobs, **kwargs)
1301
1302 return self._batch_send(*reqs, **options)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azure/storage/blob/_container_client.py in _generate_delete_blobs_options(self, *blobs, **kwargs)
1206 req = HttpRequest(
1207 "DELETE",
-> 1208 "/{}/{}{}".format(quote(container_name), quote(blob_name, safe='/~'), self._query_str),
1209 headers=header_parameters
1210 )
/anaconda/envs/azureml_py38/lib/python3.8/urllib/parse.py in quote(string, safe, encoding, errors)
817 if errors is not None:
818 raise TypeError("quote() doesn't support 'errors' for bytes")
--> 819 return quote_from_bytes(string, safe)
820
821 def quote_plus(string, safe='', encoding=None, errors=None):
/anaconda/envs/azureml_py38/lib/python3.8/urllib/parse.py in quote_from_bytes(bs, safe)
842 """
843 if not isinstance(bs, (bytes, bytearray)):
--> 844 raise TypeError("quote_from_bytes() expected bytes")
845 if not bs:
846 return ''
TypeError: quote_from_bytes() expected bytes
Can someone please help?
I tried various things, but nothing worked, so I ended up deleting the files in a loop:
for file in file_list:
    client.delete_blob(file)
See https://github.com/Azure/azure-sdk-for-python/issues/25764. delete_blobs takes *blobs as its first argument. So
client.delete_blobs(*file_list)
should do the trick.
Here are the official docs for reference.
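One caveat: the blob batch API accepts at most 256 blobs per request, so for long listings you may need to chunk the calls. A minimal sketch, reusing file_list from above:
# delete in chunks of up to 256 blobs, the documented per-batch limit
batch_size = 256
for i in range(0, len(file_list), batch_size):
    client.delete_blobs(*file_list[i:i + batch_size])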
The error is due to lack of permissions. Azure uses Shared Access Signature (SAS) tokens and RBAC roles to protect Azure Blob Storage objects such as containers and blobs. The code snippet above uses default credentials, which have read and list access to the blob container being used, but that identity does not have a role that allows deleting blobs. Check the Azure documentation to learn which RBAC roles allow blob deletion.
In order to delete a blob, the role must include the RBAC action Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete.
See the Azure documentation for the full list of RBAC actions.
Also refer to this SO answer.
I have a shapefile on my HDFS and I would like to import it in my Jupyter Notebook with geopandas (version 0.8.1).
I tried the standard read_file() method, but it does not recognize the HDFS directory; I believe it searches my local directory instead, since a test with a local path reads the shapefile correctly.
This is the code I used:
import geopandas as gpd
shp = gpd.read_file('hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp')
and the error I obtained:
---------------------------------------------------------------------------
CPLE_OpenFailedError Traceback (most recent call last)
fiona/_shim.pyx in fiona._shim.gdal_open_vector()
fiona/_err.pyx in fiona._err.exc_wrap_pointer()
CPLE_OpenFailedError: hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp: No such file or directory
During handling of the above exception, another exception occurred:
DriverError Traceback (most recent call last)
<ipython-input-17-3118e740e4a9> in <module>
----> 2 shp = gpd.read_file('hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp')
3 print(shp.shape)
4 shp.head(3)
/opt/venv/geocoding/lib/python3.6/site-packages/geopandas/io/file.py in _read_file(filename, bbox, mask, rows, **kwargs)
94
95 with fiona_env():
---> 96 with reader(path_or_bytes, **kwargs) as features:
97
98 # In a future Fiona release the crs attribute of features will
/opt/venv/geocoding/lib/python3.6/site-packages/fiona/env.py in wrapper(*args, **kwargs)
398 def wrapper(*args, **kwargs):
399 if local._env:
--> 400 return f(*args, **kwargs)
401 else:
402 if isinstance(args[0], str):
/opt/venv/geocoding/lib/python3.6/site-packages/fiona/__init__.py in open(fp, mode, driver, schema, crs, encoding, layer, vfs, enabled_drivers, crs_wkt, **kwargs)
255 if mode in ('a', 'r'):
256 c = Collection(path, mode, driver=driver, encoding=encoding,
--> 257 layer=layer, enabled_drivers=enabled_drivers, **kwargs)
258 elif mode == 'w':
259 if schema:
/opt/venv/geocoding/lib/python3.6/site-packages/fiona/collection.py in __init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)
160 if self.mode == 'r':
161 self.session = Session()
--> 162 self.session.start(self, **kwargs)
163 elif self.mode in ('a', 'w'):
164 self.session = WritingSession()
fiona/ogrext.pyx in fiona.ogrext.Session.start()
fiona/_shim.pyx in fiona._shim.gdal_open_vector()
DriverError: hdfs://hdfsha/my_hdfs_directory/my_shapefile.shp: No such file or directory
So, I was wondering: is it actually possible to read a shapefile stored in HDFS with geopandas? If yes, how?
If someone is still looking for an answer to this question, I managed to find a workaround.
First of all, you need a .zip file which contains all the data related to your shapefile (.shp, .shx, .dbf, ...). Then, we use pyarrow to establish a connection to HDFS and fiona to read the zipped shapefile.
Package versions I'm using:
pyarrow==2.0.0
fiona==1.8.18
The code:
# import packages
import pandas as pd
import geopandas as gpd
import fiona
import pyarrow

# establish a connection to HDFS
fs = pyarrow.hdfs.connect()

# read the zipped shapefile
with fiona.io.ZipMemoryFile(fs.open('hdfs://my_hdfs_directory/my_zipped_shapefile.zip')) as z:
    with z.open('my_shp_file_within_zip.shp') as collection:
        gdf = gpd.GeoDataFrame.from_features(collection)
        print(gdf.shape)
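As a side note, pyarrow.hdfs.connect() is the legacy filesystem API. An untested sketch of the same idea with the newer pyarrow.fs interface (host and port are placeholders for your cluster):
import fiona
import geopandas as gpd
from pyarrow import fs

hdfs = fs.HadoopFileSystem('hdfsha', port=8020)  # placeholder host/port

# ZipMemoryFile also accepts raw bytes, so read the whole zip into memory
with hdfs.open_input_file('/my_hdfs_directory/my_zipped_shapefile.zip') as f:
    with fiona.io.ZipMemoryFile(f.read()) as z:
        with z.open('my_shp_file_within_zip.shp') as collection:
            gdf = gpd.GeoDataFrame.from_features(collection)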
I am trying to load a pre-trained word2vec model in pkl format taken from here
The line of code I use to load it:
model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl')
However, I keep getting the following error (full traceback):
UnpicklingError Traceback (most recent call last)
<ipython-input-15-ebd5780b6636> in <module>
55
56 #Load pretrained word2vec
---> 57 model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl',mmap='r')
58
~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
1551 @classmethod
1552 def load(cls, fname_or_handle, **kwargs):
-> 1553 model = super(WordEmbeddingsKeyedVectors, cls).load(fname_or_handle, **kwargs)
1554 if isinstance(model, FastTextKeyedVectors):
1555 if not hasattr(model, 'compatible_hash'):
~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
226 @classmethod
227 def load(cls, fname_or_handle, **kwargs):
--> 228 return super(BaseKeyedVectors, cls).load(fname_or_handle, **kwargs)
229
230 def similarity(self, entity1, entity2):
~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in load(cls, fname, mmap)
433 compress, subname = SaveLoad._adapt_by_suffix(fname)
434
--> 435 obj = unpickle(fname)
436 obj._load_specials(fname, mmap, compress, subname)
437 logger.info("loaded %s", fname)
~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in unpickle(fname)
1396 # Because of loading from S3 load can't be used (missing readline in smart_open)
1397 if sys.version_info > (3, 0):
-> 1398 return _pickle.load(f, encoding='latin1')
1399 else:
1400 return _pickle.loads(f.read())
UnpicklingError: invalid load key, ':'.
I tried loading it with load_word2vec_format, but no luck. Any ideas what might be wrong with it?
Per your link https://wikipedia2vec.github.io/wikipedia2vec/pretrained/, these are to be loaded using that library's Wikipedia2Vec.load() method.
Gensim's .load() methods should only be used with files saved directly from Gensim model objects.
The Wikipedia2Vec project does say that their .txt file formats would load with .load_word2vec_format(), so you could also try that - but with one of their .txt format files.
Their full model .pkl files are only going to work with their class's own loading function.
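For illustration, a minimal sketch of both routes (assuming the wikipedia2vec package is installed; the .txt filename is a hypothetical download from the same page):
from wikipedia2vec import Wikipedia2Vec
from gensim.models import KeyedVectors

# the project's own loader handles its .pkl files
model = Wikipedia2Vec.load('enwiki_20180420_500d.pkl')
vec = model.get_word_vector('python')

# gensim can read the project's .txt exports, which are in word2vec format
kv = KeyedVectors.load_word2vec_format('enwiki_20180420_500d.txt')  # hypothetical filename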
Not sure if I can post a question like this here, so please redirect me if I'm in the wrong place.
I've bought a Wahoo TICKR X to monitor my heart rate during exercise. I would also like to get more familiar with Python, so I decided to do the analysis of my heart rate myself in Python instead of in the Wahoo app. I thought this would also give me more freedom in the choice of visualization, testing, etc.
I've recorded my heart rate for 5 minutes or so and exported the .fit file. However, I can't even find a suitable library to read the .fit file. Can anyone recommend a library that works with .fit files from Wahoo?
I'm using Ubuntu, Anaconda, and Python 3.7.
import pyfits
# Load the FITS file into the program
hdulist = pyfits.open('/home/bradmin/Downloads/2020-03-26.fit')
# Load table data as tbdata
tbdata = hdulist[1].data
OSError Traceback (most recent call last)
<ipython-input-3-a970e2cd9dee> in <module>
2
3 # Load the FITS file into the program
----> 4 hdulist = pyfits.open('/home/bradmin/Downloads/2020-03-26.fit')
5
6 # Load table data as tbdata
~/anaconda3/lib/python3.7/site-packages/pyfits/hdu/hdulist.py in fitsopen(name, mode, memmap, save_backup, **kwargs)
122 raise ValueError('Empty filename: %s' % repr(name))
123
--> 124 return HDUList.fromfile(name, mode, memmap, save_backup, **kwargs)
125
126
~/anaconda3/lib/python3.7/site-packages/pyfits/hdu/hdulist.py in fromfile(cls, fileobj, mode, memmap, save_backup, **kwargs)
264
265 return cls._readfrom(fileobj=fileobj, mode=mode, memmap=memmap,
--> 266 save_backup=save_backup, **kwargs)
267
268 @classmethod
~/anaconda3/lib/python3.7/site-packages/pyfits/hdu/hdulist.py in _readfrom(cls, fileobj, data, mode, memmap, save_backup, **kwargs)
853 # raise and exception
854 if mode in ('readonly', 'denywrite') and len(hdulist) == 0:
--> 855 raise IOError('Empty or corrupt FITS file')
856
857 # initialize/reset attributes to be used in "update/append" mode
OSError: Empty or corrupt FITS file
link to the file: https://wetransfer.com/downloads/6d054a5d52899aefcb1bcd22bda92ba120200326161849/b9831a
EDIT
I've tried this now, but I get an error:
import fitdecode

src_file = "/home/bradmin/Downloads/2020-03-26.fit"

with fitdecode.FitReader(src_file) as fit:
    for frame in fit:
        # The yielded frame object is of one of the following types:
        # * fitdecode.FitHeader
        # * fitdecode.FitDefinitionMessage
        # * fitdecode.FitDataMessage
        # * fitdecode.FitCRC
        if isinstance(frame, fitdecode.FitDataMessage):
            # Here, frame is a FitDataMessage object.
            # A FitDataMessage object contains decoded values that
            # are directly usable in your script logic.
            print(frame.name)
file_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
field_description
field_description
field_description
field_description
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-7-e8d95d3087dc> in <module>
2
3 with fitdecode.FitReader(src_file) as fit:
----> 4 for frame in fit:
5 # The yielded frame object is of one of the following types:
6 # * fitdecode.FitHeader
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in __iter__(self)
191
192 def __iter__(self):
--> 193 yield from self._read_next()
194
195 @property
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in _read_next(self)
298 assert self._header
299
--> 300 record = self._read_record()
301 if not record:
302 break
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in _read_record(self)
443 self._add_dev_data_id(message)
444 elif message.mesg_type.mesg_num == profile.MESG_NUM_FIELD_DESCRIPTION:
--> 445 self._add_dev_field_description(message)
446
447 return message
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in _add_dev_field_description(self, message)
780 base_type_id = message.get_field('fit_base_type_id').raw_value
781 field_name = message.get_field('field_name').raw_value
--> 782 units = message.get_field('units').raw_value
783
784 try:
~/anaconda3/lib/python3.7/site-packages/fitdecode/records.py in get_field(self, field_name_or_num, idx)
188 raise KeyError(
189 f'field "{field_name_or_num}" (idx #{idx}) not found in ' +
--> 190 f'message "{self.name}"')
191
192 def get_fields(self, field_name_or_num):
KeyError: 'field "units" (idx #0) not found in message "field_description"'
The format seems to be this FIT format; pyfits, it seems, is for an entirely different format (FITS, used in astronomy).
The article above refers to a gpsbabel tool, which you could use to convert the FIT file to something more interoperable and usable, e.g. GPX (an XML-based format that's easy to parse).
Or, of course, if you want a pure-Python solution, you can port the FIT format reading bits from gpsbabel to Python, or use the fitdecode library.
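For instance, a minimal sketch of pulling heart-rate samples out of a FIT file with fitdecode (field names follow the FIT profile; whether your file's record messages carry heart_rate is an assumption, and the KeyError you hit on field_description messages may need a newer fitdecode release):
import fitdecode

heart_rates = []
with fitdecode.FitReader('/home/bradmin/Downloads/2020-03-26.fit') as fit:
    for frame in fit:
        # 'record' messages carry the per-sample sensor data
        if isinstance(frame, fitdecode.FitDataMessage) and frame.name == 'record':
            if frame.has_field('heart_rate'):
                heart_rates.append(frame.get_value('heart_rate'))

print(heart_rates[:10])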
I'm using pydicom 1.0.0a1, downloaded from here. When I run the following code:
import pydicom

ds = pydicom.read_file('./DR/abnormal/abc.dcm', force=True)
ds.pixel_array
this error occurs:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-d4e81d303439> in <module>()
7 ds=pydicom.read_file('./DR/abnormal/abc.dcm',force=True)
8
----> 9 ds.pixel_array
10
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in __getattr__(self, name)
501 if tag is None: # `name` isn't a DICOM element keyword
502 # Try the base class attribute getter (fix for issue 332)
--> 503 return super(Dataset, self).__getattribute__(name)
504 tag = Tag(tag)
505 if tag not in self: # DICOM DataElement not in the Dataset
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in pixel_array(self)
1064 The Pixel Data (7FE0,0010) as a NumPy ndarray.
1065 """
-> 1066 return self._get_pixel_array()
1067
1068 # Format strings spec'd according to python string formatting options
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in _get_pixel_array(self)
1042 elif self._pixel_id != id(self.PixelData):
1043 already_have = False
-> 1044 if not already_have and not self._is_uncompressed_transfer_syntax():
1045 try:
1046 # print("Pixel Data is compressed")
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in _is_uncompressed_transfer_syntax(self)
662 """Return True if the TransferSyntaxUID is a compressed syntax."""
663 # FIXME uses file_meta here, should really only be thus for FileDataset
--> 664 return self.file_meta.TransferSyntaxUID in NotCompressedPixelTransferSyntaxes
665
666 def __ne__(self, other):
/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.pyc in __getattr__(self, name)
505 if tag not in self: # DICOM DataElement not in the Dataset
506 # Try the base class attribute getter (fix for issue 332)
--> 507 return super(Dataset, self).__getattribute__(name)
508 else:
509 return self[tag].value
AttributeError: 'Dataset' object has no attribute 'TransferSyntaxUID'
I read the Google group post and changed filereader.py to the posted version, and then I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/__init__.py", line 41, in read_file
from pydicom.dicomio import read_file
File "/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dicomio.py", line 3, in <module>
from pydicom.filereader import read_file, read_dicomdir
File "/Applications/anaconda/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/filereader.py", line 35, in <module>
from pydicom.datadict import dictionaryVR
ImportError: cannot import name dictionaryVR
Does anybody know how to solve this problem?
You should set the TransferSyntaxUID after reading the file, before trying to get the pixel_array:
import pydicom
import pydicom.uid

ds = pydicom.read_file('./DR/abnormal/abc.dcm', force=True)
# or whatever the correct transfer syntax for the file is
ds.file_meta.TransferSyntaxUID = pydicom.uid.ImplicitVRLittleEndian
ds.pixel_array
The correction from the post you referenced was made before some changes to harmonize naming in the code, so the error is thrown because the current master uses dictionary_VR rather than dictionaryVR. Setting the transfer syntax in user code, as above, avoids that problem.
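If you have to keep force=True, a defensive variation of the above (a sketch, not part of the referenced fix) also covers files whose file meta is missing entirely:
import pydicom
import pydicom.uid
from pydicom.dataset import Dataset

ds = pydicom.read_file('./DR/abnormal/abc.dcm', force=True)

# with force=True the file meta may be absent; create it before assigning
if not hasattr(ds, 'file_meta'):
    ds.file_meta = Dataset()
ds.file_meta.TransferSyntaxUID = pydicom.uid.ImplicitVRLittleEndian

arr = ds.pixel_array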