How to get Meta data of Dicom image in SimpleITK using Python


I recently started using SimpleITK to modify some DICOM images. However, I am unable to modify the metadata; in fact, I can't even access it.
I know, thanks to a script I found here: https://github.com/SimpleITK/SimpleITK/pull/262/files?diff=split, that metadata is not loaded by default because it slows the process down. I also know that to load the metadata I should call the reader's ".LoadPrivateTagsOn()" method.
However, whenever I call '.GetMetaDataKeys()' on my image object, it returns an empty tuple. I expected the code below to give me some keys, but it didn't.
#=========================================================================
#
# Copyright Insight Software Consortium
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0.txt
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#=========================================================================
from __future__ import print_function
import SimpleITK as sitk
import sys, time, os
import numpy as np
# if len( sys.argv ) < 2:
# print( "Usage: python " + __file__ + "<output_directory>" )
# sys.exit ( 1 )
# Create a new series from a numpy array
new_arr = np.random.uniform(-10, 10, size = (3,4,5)).astype(np.int16)
new_img = sitk.GetImageFromArray(new_arr)
new_img.SetSpacing([2.5,3.5,4.5])
directory = r"C:\Users\jeroen\Documents\2eMaster\Reconstruction3D\Projet Femur\Dicom\test"
# Write the 3D image as a series
# IMPORTANT: There are many DICOM tags that need to be updated when you modify an
# original image. This is a delicate operation and requires knowledge of
# the DICOM standard. This example only modifies some. For a more complete
# list of tags that need to be modified see:
# http://gdcm.sourceforge.net/wiki/index.php/Writing_DICOM
writer = sitk.ImageFileWriter()
# Use the study/series/frame of reference information given in the meta-data
# dictionary and not the automatically generated information from the file IO
writer.KeepOriginalImageUIDOn()
# Copy relevant tags from the original meta-data dictionary (private tags are also
# accessible).
tags_to_copy = ["0010|0010", # Patient Name
                "0010|0020", # Patient ID
                "0010|0030", # Patient Birth Date
                "0020|000D", # Study Instance UID, for machine consumption
                "0020|0010", # Study ID, for human consumption
                "0008|0020", # Study Date
                "0008|0030", # Study Time
                "0008|0050", # Accession Number
                "0008|0060"  # Modality
]
modification_time = time.strftime("%H%M%S")
modification_date = time.strftime("%Y%m%d")
# Copy some of the tags and add the relevant tags indicating the change.
# For the series instance UID (0020|000e), each of the components is a number, cannot start
# with zero, and separated by a '.' We create a unique series ID using the date and time.
# tags of interest:
direction = new_img.GetDirection()
print(new_img.HasMetaDataKey("0008|0021"))
series_tag_values = [(k, new_img.GetMetaData(k)) for k in tags_to_copy if new_img.HasMetaDataKey(k)] + \
                    [("0008|0031",modification_time), # Series Time
                     ("0008|0021",modification_date), # Series Date
                     ("0008|0008","DERIVED\\SECONDARY"), # Image Type
                     ("0020|000e", "1.2.826.0.1.3680043.2.1125."+modification_date+".1"+modification_time), # Series Instance UID
                     ("0020|0037", '\\'.join(map(str, (direction[0], direction[3], direction[6],# Image Orientation (Patient)
                                                       direction[1],direction[4],direction[7])))),
                     ("0008|103e", "Created-SimpleITK")] # Series Description
print(new_img.GetMetaDataKeys())
for i in range(new_img.GetDepth()):
    image_slice = new_img[:,:,i]
    # Tags shared by the series.
    for tag, value in series_tag_values:
        image_slice.SetMetaData(tag, value)
    # Slice specific tags.
    image_slice.SetMetaData("0008|0012", time.strftime("%Y%m%d")) # Instance Creation Date
    image_slice.SetMetaData("0008|0013", time.strftime("%H%M%S")) # Instance Creation Time
    image_slice.SetMetaData("0008|0060", "CT") # set the type to CT so the thickness is carried over
    image_slice.SetMetaData("0020|0032", '\\'.join(map(str,new_img.TransformIndexToPhysicalPoint((0,0,i))))) # Image Position (Patient)
    image_slice.SetMetaData("0020|0013", str(i)) # Instance Number (key uses '|' like the other tags, not ',')
    # Write to the output directory and add the extension dcm, to force writing in DICOM format.
    writer.SetFileName(os.path.join(directory,str(i)+'.dcm'))
    writer.Execute(image_slice)
print(new_img.GetMetaDataKeys())
# Re-read the series
# Read the original series. First obtain the series file names using the
# image series reader.
data_directory = directory
series_IDs = sitk.ImageSeriesReader.GetGDCMSeriesIDs(data_directory)
if not series_IDs:
    print("ERROR: given directory \""+data_directory+"\" does not contain a DICOM series.")
    sys.exit(1)
series_file_names = sitk.ImageSeriesReader.GetGDCMSeriesFileNames(data_directory, series_IDs[0])
series_reader = sitk.ImageSeriesReader()
series_reader.SetFileNames(series_file_names)
# Configure the reader to load all of the DICOM tags (public + private):
# By default tags are not loaded (saves time).
# By default if tags are loaded, the private tags are not loaded.
# We explicitly configure the reader to load tags, including the
# private ones.
series_reader.LoadPrivateTagsOn()
image3D = series_reader.Execute()
print(image3D.GetMetaDataKeys())
sys.exit( 0 )
Any help is greatly appreciated!
EDIT: It seems that I also need to call the '.MetaDataDictionaryArrayUpdateOn()' method on my reader. However, when I try that, I get an error saying that the 'ImageSeriesReader' class has no such method, even though it is mentioned in the documentation. Any suggestions?

I'm going to answer my own question here. Thanks to a post I made on GitHub, I found the answer: the method '.MetaDataDictionaryArrayUpdateOn()' is not implemented in this build (1.0.1).
There are two workarounds; credit goes to the SimpleITK GitHub community.
You can find the post here: https://github.com/SimpleITK/SimpleITK/issues/331
The problem should be resolved in the next release (expected sometime in January).

Related

Assign a variable/tag to specific text within a PowerPoint Text Box (in PowerPoint)

I would like to update specific text within a PowerPoint text box by assigning that specific text a variable within PowerPoint that I can then (1) identify in Python, and (2) replace the text based on data in an Excel model.
The ultimate goal is that in the output PowerPoint presentation, we should only see the underlying value of the variable (e.g., 32.4%), not the variable name (e.g., #pct_total_var#). And if changes need to be made, the underlying reference to the variable is kept, such that I can re-run Python to update the presentation.
Note: I have already implemented a "search and replace" type solution based on answers here, and elsewhere. However the issue with that approach is that you need to work with a template PowerPoint and an output PowerPoint, because once the variables (e.g., #pct_total_var#) have been replaced with values from an Excel model, it is no longer possible to identify the variables in the output presentation file in order to update.
Is it possible to assign a variable/tag to text within a PowerPoint Text Box?
Below is the current Python code that I have. I just need to be able to define variables inside PowerPoint. But thought this might help others searching for the same.
from pptx import Presentation
import pandas as pd
# Paths to model, and template deck
model = "path/to/model.xlsx"
template = "path/to/template.pptx"
# Step 1: Get data from model to replace in PowerPoint
df = pd.read_excel(model, sheet_name='UpdateReportPPT')
df = df[['variable_id', 'formatted_value']]
values_to_update = df.set_index('variable_id').T.to_dict('records')[0]
# Step 2: Open template PowerPoint presentation
prs = Presentation(template)
# Step 3.1: Get shapes within each slide
slides = [slide for slide in prs.slides]
shapes = []
for slide in slides:
    for shape in slide.shapes:
        shapes.append(shape)
# Step 3.2: Update variables
for shape in shapes:
    # THIS IS WHERE I'LL IDENTIFY THE VARIABLES STORED IN POWERPOINT & UPDATE BASED ON THE EXCEL MODEL
    pass  # placeholder so the loop body is valid Python
# Step 4: Save to output PowerPoint presentation
prs.save(template)
print('Done.')

How to save DICOM in same series using pydicom [duplicate]

I have saved the preprocessed DICOM slices (300 .dcm files in total) to a folder, but when I open that folder in RadiANT DICOM Viewer only one slice is displayed. My code is attached below. Can you please help me display the whole scan? I think the main problem is in Image Position and Slice Location.
import os
import numpy as np
import matplotlib.pyplot as plt
import pydicom
from pydicom.encaps import encapsulate
from pydicom.uid import JPEG2000
from imagecodecs import jpeg2k_encode
basepath="/home/hammad/AssementTask/DICOM/"
des_path="/home/hammad/AssementTask/g/"
file_list = [f.path for f in os.scandir(basepath)]
ds = pydicom.dcmread(file_list[0])
for i in range(imgs_after_resamp.shape[0]):
    out = imgs_after_resamp[i,:,:]
    #Need to copy() to meet jpeg2k_encodes C contiguous requirement
    arr_crop = out.copy()
    out = out.astype(np.int16)
    # jpeg2k_encode to perform JPEG2000 compression
    arr_jpeg2k = jpeg2k_encode(arr_crop)
    # convert from bytearray to bytes before saving to PixelData
    arr_jpeg2k = bytes(arr_jpeg2k)
    ds.Rows = arr_crop.shape[0]
    ds.Columns = arr_crop.shape[1]
    ds[0x0018, 0x0050].value=np.round(spacing[0])
    ds[0x0028, 0x0030].value=[np.round(spacing[1]),np.round(spacing[2])]
    ds.InstanceNumber = i
    ds.PixelData = encapsulate([arr_jpeg2k])
    ds.save_as((des_path + str(i) + '.dcm'.format(i)))

I am not familiar with Python, but some things seem obvious to me, so I will attempt an answer:
ds = pydicom.dcmread(file_list[0])
for i in range(imgs_after_resamp.shape[0]):
    [...]
You are reading one file and use it as a template for all the resampled files. At minimum, you will have to create a new SOP Instance UID (0008,0018) for each file that you save. This is very likely the reason why the viewer only displays one image. The SOP Instance UID uniquely identifies the image. If all your resampled images have the same SOP Instance UID, this will tell the viewer that the same image is loaded over and over again. I.e. the newly loaded image is considered a duplicate.
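A fresh UID per saved file can be produced with the standard library alone. The sketch below uses the "2.25." root that DICOM reserves for UIDs derived from a UUID; the function name new_dicom_uid is hypothetical, and pydicom's own pydicom.uid.generate_uid() would serve the same purpose.

```python
import uuid


def new_dicom_uid():
    """Return a fresh UID of the form '2.25.<UUID as a decimal integer>'."""
    # str() of an int never has leading zeros, and the result stays well
    # under the 64-character limit that DICOM places on UIDs.
    return "2.25." + str(uuid.uuid4().int)
```

Inside the saving loop from the question, this would be assigned before each save, for example ds.SOPInstanceUID = new_dicom_uid(), with ds.file_meta.MediaStorageSOPInstanceUID set to the same value.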
And yes, to update the geometry information, further attributes need to be set to appropriate values. This partially depends on the type of image (SOP Class UID, 0008,0016). But here are the main suspects:
Image Position Patient (0020,0032)
Image Orientation Patient (0020,0037)
Slice Location (0020,1041)
Furthermore, make sure that the Frame Of Reference UID (0020,0052) is only kept from the original images if both image sets use the same coordinate system (i.e. an Image Position Patient in your resampled stack must refer to the same origin as in the original images). If in doubt, assign a new FOR-UID. It must be identical for all images in your stack.
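How the per-slice geometry values relate to each other can be sketched in plain Python. This assumes the slices are stacked along the normal of the image plane with a constant spacing, and it takes Slice Location to be the position along that normal, which is a common convention rather than a requirement. The function name and parameters are made up for illustration.

```python
def slice_geometry(first_position, orientation, slice_spacing, index):
    """Return (image_position_patient, slice_location) for slice `index`.

    first_position: (x, y, z) of the first slice's top-left voxel, in mm
    orientation:    the six direction cosines of Image Orientation (Patient)
    slice_spacing:  distance between neighbouring slices, in mm
    """
    rx, ry, rz, cx, cy, cz = orientation
    # Slice normal = row direction x column direction (cross product).
    normal = (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)
    position = tuple(p + index * slice_spacing * n
                     for p, n in zip(first_position, normal))
    # Assumption: Slice Location is the position projected onto the normal.
    location = sum(p * n for p, n in zip(position, normal))
    return position, location
```

In the question's loop, the results could be written with ds.ImagePositionPatient = list(position) and ds.SliceLocation = location, keeping Image Orientation Patient the same for every slice.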
Last point: this depends on the SOP Class even more, so I can only give you a general hint. In DICOM terms, the resampling is a derivation, so the first component of Image Type (0008,0008) must be "DERIVED". This triggers a plethora of other requirements, depending on the SOP Class. Usually, you have to describe the type of derivation and reference the images from which you derived the resampled image.
Not all of this is necessary to have the images displayed properly in a viewer. But if you intend to write a product-quality implementation, you need to consider these points. Look into the module table for your IOD in DICOM Part 3 as a starting point for updating the header information.

cross section plot using python Iris module

I want to plot a cross-section along longitude using the Python Iris module, which was developed for oceanography and meteorology. I'm using their example:
http://scitools.org.uk/iris/docs/v1.4/examples/graphics/cross_section.html
I tried to adapt their code to my data, but the output of my code is empty.
data: http://data.nodc.noaa.gov/thredds/fileServer/woa/WOA09/NetCDFdata/temperature_annual_1deg.nc
import iris
import iris.plot as iplt
import iris.quickplot as qplt
# Enable a future option, to ensure that the netcdf load works the same way
# as in future Iris versions.
iris.FUTURE.netcdf_promote = True
# Load some test data.
fname = 'temperature_annual_1deg.nc'
theta = iris.load_cube(fname, 'sea_water_temperature')
# Extract a single depth vs longitude cross-section. N.B. This could
# easily be changed to extract a specific slice, or even to loop over *all*
# cross section slices.
cross_section = next(theta.slices(['longitude',
                                   'depth']))
qplt.contourf(cross_section, coords=['longitude', 'depth'],
              cmap='RdBu_r')
iplt.show()

What you need to understand here is that your current cross_section is the first member of the theta.slices iterator, meaning that it starts from one end of the coordinates (which contains no data in this case). So you need to step through later members of the iterator until you get some data. Adding these lines to the code may help you understand what is going on:
import numpy as np
cs = theta.slices(['longitude', 'depth'])
for i in cs:
    print(np.nanmax(i))
Which should print something like:
--
--
--
-0.8788
-0.9052
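Building on that check, the first slice that actually holds values can be picked automatically. This is a minimal sketch that assumes each slice exposes a .data attribute which may be a NumPy masked array, as Iris cubes do; the helper name first_slice_with_data is hypothetical.

```python
import numpy as np


def first_slice_with_data(slices):
    """Return the first slice whose data has at least one unmasked value."""
    for candidate in slices:
        if np.ma.count(candidate.data) > 0:  # number of non-masked points
            return candidate
    return None  # every slice was fully masked
```

In the question's script this would replace the next(...) call, for example cross_section = first_slice_with_data(theta.slices(['longitude', 'depth'])).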

Extract image position from .docx file using python-docx

I'm trying to get the image index from a .docx file using the python-docx library. I'm able to extract the name of the image and its height and width, but not the index of its position in the Word file.
import docx
doc = docx.Document(filename)
for s in doc.inline_shapes:
    print (s.height.cm,s.width.cm,s._inline.graphic.graphicData.pic.nvPicPr.cNvPr.name)
Output:
21.228 15.920 IMG_20160910_220903848.jpg
I would also like to know whether there is a simpler way to get the image name, in the way that s.height.cm gives me the height in cm. My primary requirement is to find out where the image is in the document, because I need to extract the image, do some work on it, and then put it back in the same location.

This operation is not directly supported by the API.
However, if you're willing to dig into the internals a bit and use the underlying lxml API it's possible.
The general approach would be to access the ImagePart instance corresponding to the picture you want to inspect and modify, then read and write the ._blob attribute (which holds the image file as bytes).
This specimen XML might be helpful:
http://python-docx.readthedocs.io/en/latest/dev/analysis/features/shapes/picture.html#specimen-xml
From the inline shape containing the picture, you get the <a:blip> element with this:
blip = inline_shape._inline.graphic.graphicData.pic.blipFill.blip
The relationship id (r:id generally, but r:embed in this case) is available at:
rId = blip.embed
Then you can get the image part from the document part
document_part = document.part
image_part = document_part.related_parts[rId]
And then the binary image is available for read and write on ._blob.
If you write a new blob, it will replace the prior image when saved.
You probably want to get it working with a single image and get a feel for it before scaling up to multiple images in a single document.
There might be one or two image characteristics that are cached, so you might not get all the finer points working until you save and reload the file, so just be alert for that.
Not for the faint of heart as you can see, but should work if you want it bad enough and can trace through the code a bit :)

You can also inspect the paragraphs with a simple loop and check which ones contain an image, for example by testing whether the paragraph's XML contains "graphicData" (you can do the same with runs):
from docx import Document
image_paragraphs = []
doc = Document(path_to_docx)
for par in doc.paragraphs:
    if 'graphicData' in par._p.xml:
        image_paragraphs.append(par)
Then you unzip the docx file; the images are in the "word/media" folder, and they are in the same order as they appear in the image_paragraphs list. Each paragraph element gives you many options for changing it. If you want to extract an image, process it, and then insert it in the same place, then:
paragraph.clear()
paragraph.add_run('your description, if needed')
run = paragraph.runs[0]
run.add_picture(path_to_pic, width, height)
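The unzip step does not need an external tool, since a .docx file is an ordinary ZIP archive. The sketch below lists the image parts with the standard zipfile module. It assumes the images sit under "word/media/", which is where Word and python-docx normally put them, and the order inside the archive is not guaranteed to match the order in the document. The helper name list_docx_images is made up for illustration.

```python
import zipfile


def list_docx_images(docx_file):
    """Return the names of the image parts stored inside a .docx package."""
    with zipfile.ZipFile(docx_file) as package:
        return [name for name in package.namelist()
                if name.startswith("word/media/")]
```

A single image can then be pulled out with zipfile.ZipFile(path).read(name), which returns its bytes.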

I've never really written any answers here, but I think this might be the solution to your problem. With this short piece of code you can see the positions of your images among all the paragraphs. Hope it helps.
import docx
doc = docx.Document(filename)
paraGr = []
index = []
par = doc.paragraphs
for i in range(len(par)):
    paraGr.append(par[i].text)
    if 'graphicData' in par[i]._p.xml:
        index.append(i)

If you are using Python 3
pip install python-docx
import docx
doc = docx.Document(document_path)
P = []
I = []
par = doc.paragraphs
for i in range(len(par)):
    P.append(par[i].text)
    if 'graphicData' in par[i]._p.xml:
        I.append(i)
print(I)
# prints the list of paragraph indexes that contain an image

EXIF info in Python - libexif

I have been using pyexiv2 to read exif information from JPEG files in python, and noticed that one tag in particular - ExposureTime - is not reported the same by exiv2 as with another exif library, libexif.
Any exiv2-based utility I've tried simplifies the ExposureTime tag to a "rational" such as 0/1, 0, or similar. libexif-based utilities (in particular, the "exif" tool) report a much more detailed "1/-21474836 sec." for the same tag in the same image.
Firstly I'd like to understand: what can account for this difference? I'm assuming that the latter of the two is correct.
Secondly, and assuming that the more detailed tag as reported by libexif is correct, I'd like to be able to obtain this value in Python, where as far as I can see it is not possible using any EXIF tools that I have come across (pyexiv2 for example). Is there a tool or method that I am not considering?
I have stumbled across one potential solution with the use of the libexif C library in python with ctypes as noted in this previously answered question - though I could not find examples of how I could do this.
Any help is greatly appreciated. Thanks!

In case this helps, here are some hacks I recently wrote to set missing lens and F-number information, since I was using a manual lens. I also computed the actual absolute EV so that HDR processing tools (HDR Luminance) can pick it up automatically later. I commented out the "write" action below for safety. It should be pretty much self-explanatory.
The top files section makes a list of files to work on in the current folder (here all *.ARW (Sony raw files)). Adjust the pattern and path as needed.
#!/usr/bin/env python
import os
import time
import array
import math
# make file list (take all *.ARW files in current folder)
files = [f for f in os.listdir(".") if f.endswith(".ARW")]
files.sort() # just to be nice
# have a dict. of tags to work with in particular
tags = {'Aperture':10., 'Exposure Time ':1./1250, 'Shutter Speed':1./1250, 'ISO':200., 'Stops Above Base ISO':0., 'Exposure Compensation':0. }
# arbitrary chosen base EV to get final EV compensation numbers into +/-10 range
EVref = math.log (math.pow(tags['Aperture'],2.0)/tags['Shutter Speed'], 2.0) - 4
print ('EVref=', EVref)
for f in files:
    print (f)
    meta=os.popen("exiftool "+f).readlines()
    for tag in meta:
        set = str(tag).rstrip("\n").split(":")
        for t,x in tags.items():
            if str(set[0]).strip(" ") == t:
                tags[t] = float ( str(os.popen("calc -- "+set[1]).readlines()).strip("[]'~\\t\\n"))
                print (t, tags[t], set[1])
    ev = math.log (math.pow(tags['Aperture'],2.0)/tags['Shutter Speed'], 2.0)
    EV = EVref - ev + tags['Stops Above Base ISO']
    print ('EV=', EV)
    # uncomment/edit to update EXIF in place:
    # os.system('exiftool -ExposureCompensation='+str(EV)+' '+f)
    # os.system('exiftool -FNumber=10 '+f)
    # os.system('exiftool -FocalLength=1000.0 '+f)
    # os.system('exiftool -FocalLengthIn35mmFormat=1000.0 '+f)
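The value parsing in the script relies on an external "calc" program. A standard-library alternative is sketched below, assuming exiftool prints lines of the form "Tag Name : value" and that the value is a plain number or a fraction such as 1/1250. Both function names are hypothetical.

```python
from fractions import Fraction


def parse_exiftool_line(line):
    """Split one 'Tag Name : value' line into (name, value_text)."""
    # partition() splits at the first colon only, so values that contain
    # colons themselves (dates, times) stay in one piece.
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


def to_number(value_text):
    """Convert text such as '1/1250', '10.0' or '200' to a float."""
    return float(Fraction(value_text))
```

to_number raises ValueError for text that is not a plain number or fraction, so values with units or other decoration would need extra handling.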
