How do I import tif using gdal?
I'm trying to get my tif file in a usable format in Python, so I can analyze the data. However, every time I import it, I just get an empty list. Here's my code:
xValues = [447520.0, 432524.0, 451503.0]
yValues = [4631976.0, 4608827.0, 4648114.0]
gdal.AllRegister()
dataset = gdal.Open('final_snow.tif', GA_ReadOnly)
if dataset is None:
    print 'Could not open image'
    sys.exit(1)
data = np.array([gdal.Open(name, gdalconst.GA_ReadOnly).ReadAsArray() for name, descr in dataset.GetSubDatasets()])
print 'this is data ', data
It always prints an empty list, but it doesn't throw an error. I checked other questions, such as "Create shapefile from tif file using GDAL". What might be the problem?
For osgeo.gdal, it should look like this:
from osgeo import gdal
gdal.UseExceptions() # not required, but a good idea
dataset = gdal.Open('final_snow.tif', gdal.GA_ReadOnly)
data = dataset.ReadAsArray()
Where data is either a 2D array for 1-banded rasters, or a 3D array for multiband.
An alternative with rasterio looks like:
import rasterio
with rasterio.open('final_snow.tif', 'r') as r:
    data = r.read()
Where data is always a 3D array, with the first dimension as band index.
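Because the two calls return different shapes for a single-band file, code that has to accept either result can normalise to the band-first 3D layout. A minimal sketch using only NumPy; the helper name as_band_stack is hypothetical and is not part of GDAL or rasterio:

```python
import numpy as np


def as_band_stack(data):
    """Return a (bands, rows, cols) array from a 2D or 3D raster array."""
    arr = np.asarray(data)
    if arr.ndim == 2:
        # Single-band result, e.g. from dataset.ReadAsArray(): add a band axis.
        return arr[np.newaxis, :, :]
    if arr.ndim == 3:
        # Already band-first, e.g. from rasterio's read().
        return arr
    raise ValueError("expected a 2D or 3D array, got %d dimensions" % arr.ndim)


single = np.zeros((4, 5))      # stand-in for a 1-band raster
multi = np.zeros((3, 4, 5))    # stand-in for a 3-band raster
print(as_band_stack(single).shape)
print(as_band_stack(multi).shape)
```

After this step, band i is always stack[i], regardless of which library produced the array.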
I have cobbled together some Python code to work through a folder of DICOM files, splitting each image in two.
All my DICOM files are X-rays of both the left and right feet, and I need to separate them.
To do this I am adapting some code written by @g_unit, seen here
Unfortunately, this attempt results in two unaltered, unsplit copies of the original file. It does work when writing the files as PNG or JPG, but not when writing them as DICOM. My test image in the console also looks good.
In the example below, I am using a folder with only one file in it. I will adapt the code to write the new files and filenames once my single sample works.
import matplotlib.pyplot as plt
import pydicom
import pydicom as pd
import os
def main():
    path = 'C:/.../test_block_out/'
    # iterate through the names of contents of the folder
    for file in os.listdir(path):
        # create the full input path and read the file
        input_path = os.path.join(path, file)
        dataset = pd.dcmread(input_path)
        shape = dataset.pixel_array.shape
        # get the half of the x dimension. For the y dimension use shape[0]
        half_x = int(shape[1] / 2)
        # slice the halves
        # [first_axis, second_axis] so [:,:half_x] means slice all from first axis, slice 0 to half_x from second axis
        left_part = dataset.pixel_array[:, :half_x].tobytes()
        right_part = dataset.pixel_array[:, half_x:].tobytes()
        # Save halves
        path_to_left_image = 'C:.../test_file/left.dcm'
        path_to_right_image = 'C:.../test_file/right.dcm'
        dataset.save_as(path_to_left_image, left_part)
        dataset.save_as(path_to_right_image, right_part)
        # print test image
        plt.imshow(dataset.pixel_array[:, :half_x])
        # plt.imshow(dataset.pixel_array[:, half_x:])

if __name__ == '__main__':
    main()
I have tried to write the pixel array to dataset.PixelData - but this throws the error:
ValueError: The length of the pixel data in the dataset (5120000 bytes) doesn't match the expected length (10240000 bytes). The dataset may be corrupted or there may be an issue with the pixel data handler.
Which makes sense, since it's half my original dimensions. It will write a DCM, but I cannot load this DCM into any DICOM viewer tools ('Decode error!')
Is there a way to get this to write the files as DCMs, not PNGs? Or will the DCMs always fail to load if the dimensions are incorrect?
A kind colleague has helped by providing the answer.
The issue was that I was saving "dataset", not "left_part".
The solution was to create a new pydicom object by deep copying the dataset read from the DCM file, and then to modify the copy (its PixelData and its Columns value).
Code below:
import copy  # needed for copy.deepcopy; add to the imports at the top

# iterate through the names of contents of the folder
for file in os.listdir(path):
    # create the full input path and read the file
    input_path = os.path.join(path, file)
    dataset = pd.dcmread(input_path)
    left_part = copy.deepcopy(dataset)
    right_part = copy.deepcopy(dataset)
    shape = dataset.pixel_array.shape
    # get the half of the x dimension. For the y dimension use shape[0]
    half_x = int(shape[1] / 2)
    # slice the halves
    # [first_axis, second_axis] so [:,:half_x] means slice all from first axis, slice 0 to half_x from second axis
    left_part.PixelData = dataset.pixel_array[:, :half_x].tobytes()
    left_part['Columns'].value = half_x
    right_part.PixelData = dataset.pixel_array[:, half_x:].tobytes()
    right_part['Columns'].value = shape[1] - half_x
    # Save halves
    path_to_left_image = os.path.join(path, 'left_' + file)
    path_to_right_image = os.path.join(path, 'right_' + file)
    left_part.save_as(path_to_left_image)
    right_part.save_as(path_to_right_image)
    # print test image
    plt.imshow(left_part.pixel_array)
    plt.show()
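The ValueError quoted in the question comes down to arithmetic: for uncompressed single-sample data, the expected PixelData length is Rows x Columns x bytes per pixel, so halving the array without also halving Columns leaves the header and the data out of step. A NumPy-only sketch of that check; the 2000 x 2560 16-bit frame is an assumed size chosen because it reproduces the byte counts in the error message, and expected_length is a hypothetical helper, not a pydicom function:

```python
import numpy as np


def expected_length(rows, columns, bytes_per_pixel):
    """Bytes of uncompressed, single-sample pixel data implied by the header values."""
    return rows * columns * bytes_per_pixel


# Assumed frame size; the real image dimensions are not given in the question.
pixels = np.zeros((2000, 2560), dtype=np.uint16)
rows, columns = pixels.shape
half_x = columns // 2

left = pixels[:, :half_x]
right = pixels[:, half_x:]

# Header still describes the full frame: the two lengths disagree.
print(len(left.tobytes()), expected_length(rows, columns, pixels.itemsize))
# Header updated to the new column count: the two lengths agree.
print(len(left.tobytes()), expected_length(rows, left.shape[1], pixels.itemsize))
```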
I have a set of images with the same size. And I want to insert them into a dataframe, with the rows being the names of the images and the columns being the pixels. They are all in the same directory.
I can already do this for a folder with a few images (as shown in the "Example for 7 images" link below), but when I try it on a dataset of 9912 images, the run ends with "Killed". How can I optimize this code so that it handles all the images?
from matplotlib import image
import numpy as np
import pandas as pd
import glob
columns = ["file"]
for i in range(150528):
    columns.append("pixel" + str(i))
df = pd.DataFrame(columns=columns)
i = 0
for file in glob.glob('/home/nuno/resizepics/*.jpg'):
    imgarr = image.imread(file)
    imgarr = imgarr.flatten()
    df.loc[i, "file"] = file
    for j in range(len(imgarr)):
        df.iloc[i, j+1] = imgarr[j]
    i += 1
# print(df)
df.to_csv('pixels.csv')
Example for 7 images
If "Killed" means the script raises an error, you can try using exceptions (try, except, else) and have it resume from the point where it stopped. Because the script works with a large amount of data, you can also try slowing it down a little with the time module.
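A different approach from the retry suggestion above, offered only as a sketch: a bare "Killed" often means the operating system ended the process, commonly because it ran out of memory, and filling a DataFrame one cell at a time is both slow and memory-hungry. Preallocating a single uint8 NumPy array and writing one row per image avoids that. The names build_pixel_matrix and fake_load are hypothetical, and fake_load stands in for matplotlib.image.imread so the sketch runs without any files:

```python
import numpy as np

PIXELS_PER_IMAGE = 150528  # value from the question (equal to 224 * 224 * 3)


def build_pixel_matrix(paths, load, pixels_per_image):
    """Fill one preallocated uint8 row per image instead of one cell at a time."""
    matrix = np.empty((len(paths), pixels_per_image), dtype=np.uint8)
    for row, path in enumerate(paths):
        matrix[row, :] = np.asarray(load(path), dtype=np.uint8).ravel()
    return matrix


def fake_load(path):
    # Hypothetical stand-in for image.imread: a tiny 2 x 2 RGB image.
    return np.full((2, 2, 3), len(path), dtype=np.uint8)


paths = ["a.jpg", "bb.jpg", "ccc.jpg"]
matrix = build_pixel_matrix(paths, fake_load, pixels_per_image=12)
print(matrix.shape, matrix.dtype)

# Rough size in bytes of the full data set from the question, stored as uint8.
print(9912 * PIXELS_PER_IMAGE * np.dtype(np.uint8).itemsize)
```

With real files, load would be image.imread and pixels_per_image would be 150528. The file names can be kept in a separate list, or used as the index when building a DataFrame from the finished matrix in one step.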
I am trying to load a .mat dataset into my dataframe. So far, I am only able to load a single file at a time from the folder TrainingSet1 with
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
data = loadmat('A2001.mat')
I am able to see the data in it, but how am I supposed to load the whole TrainingSet1 folder so that I can view all of it?
Also, how could I view the .mat files as images?
Here's my code:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.metrics import error_rate
from mat4py import loadmat
from pylab import*
import matplotlib
import os
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
data = loadmat('A2001.mat')
data
{'ECG': {'sex': 'Male', 'age': 68,
'data': [[0.009784321006571624,
0.006006033870606647,
... (this is roughly what the data looks like)
imshow('A2001.mat',[])
---------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-23bbdf3a7668> in <module>
----> 1 imshow('A2001.mat',[])...A long error is displayed
TypeError: unhashable type: 'list'
Thanks for any help
It is hard to tell from your post what the input format is and what output format you want.
I am giving you an example of reading all the .mat files in the folder, and an example of how to show data['data'] as an image.
I hope the example is enough for you to keep going on your own.
I created a sample data set 'A2001.mat', 'A2002.mat', 'A2003.mat' using MATLAB.
If you have MATLAB installed, I recommend running the following code to create the sample input (so that the Python sample is reproducible):
ECG.sex = 'Male';
ECG.age = 68;
data = im2double(imread('cameraman.tif')) / 10; % Divide by 10 for simulating range [0, 0.1] instead of [0, 1]
save('A2001.mat', 'ECG', 'data');
ECG.sex = 'Male';
ECG.age = 46;
data = im2double(imread('cell.tif'));
save('A2002.mat', 'ECG', 'data');
ECG.sex = 'Female';
ECG.age = 54;
data = im2double(imread('tire.tif'));
save('A2003.mat', 'ECG', 'data');
The Python code sample does the following:
Get a list of all mat files in the folder using glob.glob('*.mat').
Iterate mat files, load data from the files, and append the data to a list.
The result of the loop is a list named alldata, containing data from all mat files.
Iterate alldata and show data['data'] as an image.
(Assuming data['data'] is the matrix you want to show as an image).
Here is the code:
from matplotlib import pyplot as plt
from mat4py import loadmat
import glob
import os
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
# Get a list of .mat files in the current folder
mat_files = glob.glob('*.mat')
# List for storing all the data
alldata = []
# Iterate mat files
for fname in mat_files:
    # Load mat file data into data.
    data = loadmat(fname)
    # Append data to the list
    alldata.append(data)
# Iterate alldata elements, and show images
for data in alldata:
    # Assume image is stored in matrix named data in MATLAB.
    # data['data'], access data with string 'data', because data is a dictionary
    img = data['data']
    # Show data as image using matplotlib
    plt.imshow(img, cmap='gray')
    plt.show(block=True)  # Show image with "blocking"
Update:
The ECG data is not an image but a list of 12 data samples.
The internal structure of the data (after data = loadmat(fname)) is:
Parent dictionary named data.
data contains a dictionary in data['ECG'].
data['ECG']['data'] is a list of 12 lists.
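To make that nesting concrete, here is a small sketch with a hand-built dictionary shaped like the loadmat output. The numbers are made up, and each of the 12 lists holds only three values to keep the example short:

```python
import numpy as np

# Hypothetical stand-in for the result of loadmat(fname): 12 lists of 3 values.
data = {'ECG': {'sex': 'Male', 'age': 68,
                'data': [[0.01 * row + 0.001 * t for t in range(3)]
                         for row in range(12)]}}

ecg = data['ECG']                 # inner dictionary
signals = np.array(ecg['data'])   # list of 12 lists -> 2D array
print(signals.shape)              # one row per list
print(signals.T.shape)            # transposed: one column per list
```

matplotlib's plot draws one line per column of a 2D array, so plotting the transposed array gives one line per list.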
The following code iterates the mat files and displays the ECG data as a graph:
from matplotlib import pyplot as plt
from mat4py import loadmat
import glob
import os
import numpy as np
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
# Get a list of .mat files in the current folder
mat_files = glob.glob('*.mat')
# List for storing all the data
alldata = []
# Iterate mat files
for fname in mat_files:
    # Load mat file data into data.
    data = loadmat(fname)
    # Append data to the list
    alldata.append(data)
# Iterate alldata elements, and plot the ECG data
for data in alldata:
    # The internal structure of the data is a dictionary within a dictionary.
    ecg = data['ECG']
    data = ecg['data']  # Data is a list of lists
    # Convert data to NumPy array
    ecg_data = np.array(data)
    # Show data as image using matplotlib
    #plt.imshow(img, cmap='gray')
    plt.plot(ecg_data.T)  # Plot the data as graph.
    plt.show(block=True)  # Show the graph with "blocking"
Result: [figures: plots for A0001.mat and A0002.mat]
Graph with labels:
# Iterate alldata elements, and show graphs
for data in alldata:
    # The internal structure of the data is a dictionary within a dictionary.
    ecg = data['ECG']
    data = ecg['data']  # Data is a list of lists
    # Convert data to NumPy array
    #ecg_data = np.array(data)
    # Show data as graph using matplotlib
    # Iterate data list:
    for i in range(len(data)):
        # Plot the data as graph.
        # Set labels d0, d1, d2...
        plt.plot(data[i], label='d'+str(i))
    plt.legend()  # Add legend
    plt.show(block=True)  # Show the graph with "blocking"
Result: [figure: graph with labels]
I am trying to create a raster file after filling NO DATA with some value using gdal in Python.
I have a function that gets me the raster array.
def raster2array(rasterfn):
    try:
        bndNum_Val_Dic = {}
        raster = gdal.Open(rasterfn)
        for bandNum in range(raster.RasterCount):
            bandNum += 1
            band = raster.GetRasterBand(bandNum)
            bandVal = band.ReadAsArray()
            bndNum_Val_Dic[bandNum] = bandVal
        raster = None
        return bndNum_Val_Dic
    except Exception as e:
        print(e)
Using the value returned by this function, I am trying to write my raster, which throws an error at "outband.WriteArray(array)": 'dict' object has no attribute 'shape'.
import numpy as np
import gdal
from osgeo import osr
rasterfn ="MAH_20.tif"
newRasterfn ="MAH_FND.tif"
array= raster2array(rasterfn)
newValue = 100
Driver= 'GTiff'
bandNumber=1
raster = gdal.Open(rasterfn)
geotransform = raster.GetGeoTransform()
originX = geotransform[0]
originY = geotransform[3]
pixelWidth = geotransform[1]
pixelHeight = geotransform[5]
cols = raster.RasterXSize
rows = raster.RasterYSize
bandCount=raster.RasterCount
rasterDataType=raster.GetRasterBand(bandNumber).DataType
global Flag
if(Flag):
    driver = gdal.GetDriverByName(Driver)
    global outRaster
    outRaster = driver.Create(newRasterfn, cols, rows, bandCount, rasterDataType)
    Flag=False
outband = outRaster.GetRasterBand(bandNumber)
outRaster.SetGeoTransform((originX, pixelWidth, 0, originY, 0, pixelHeight))
outband = outRaster.GetRasterBand(bandNumber)
outband.WriteArray(array)
outRasterSRS = osr.SpatialReference()
outRasterSRS.ImportFromWkt(raster.GetProjectionRef())
outRaster.SetProjection(outRasterSRS.ExportToWkt())
outRaster.GetRasterBand(bandNumber).SetNoDataValue(newValue)
raster=None
if(bandNumber==bandCount):
    outRaster=None
    outband=None
    raster=None
I am using python 3.5 and GDAL 3.0.2. Is there any way to fix this?
Any help will be appreciated
You are trying to write a dictionary, while GDAL expects a NumPy array. It's not completely clear which data you are trying to write, but changing your write statement to something like the line below should at least get rid of the error message. Make sure you write the correct band.
outband.WriteArray(array[bandNumber])
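Since the stated goal is to fill NO DATA cells before writing, the per-band arrays in the dictionary can be filled one by one and then passed to WriteArray individually. A NumPy-only sketch; fill_nodata is a hypothetical helper, and the NoData value of -9999 is an assumption (in practice it would come from band.GetNoDataValue()):

```python
import numpy as np


def fill_nodata(band_arrays, nodata, new_value):
    """Return a new {band_number: array} dict with nodata cells replaced."""
    return {num: np.where(arr == nodata, new_value, arr)
            for num, arr in band_arrays.items()}


# Stand-in for the dictionary returned by raster2array().
bands = {1: np.array([[1, -9999], [3, 4]]),
         2: np.array([[-9999, -9999], [7, 8]])}

filled = fill_nodata(bands, nodata=-9999, new_value=100)
print(hasattr(bands, 'shape'))   # the dict itself has no shape attribute
print(filled[1].shape)           # each value in it is an array
print(filled[1])
```

Each filled[bandNumber] is then a plain 2D array of the kind WriteArray accepts.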
I have a lot of images (DICOM files read with pydicom). I would like to split each one in half: from 1 image, I would like 2 images, a left part and a right part.
Input: 1000x1000
Output: 500x1000 (width x height).
Currently, I can only read a file.
ds = pydicom.read_file(image_fps[0]) # read dicom image from filepath
As a first step, I would like to put one half in one folder and the other half in a second folder.
This is what I have: [image]
This is what I want: [image]
I use Mask-RCNN for an object localization problem. I would like to crop each image (pydicom file) to 50% of its size.
EDIT1:
import SimpleITK as sitk
filtered_image = sitk.GetImageFromArray(left_part)
sitk.WriteImage(filtered_image, '/home/wojtek/Mask/nnna.dcm', True)
I now have a DICOM file, but I can't display it:
this transfer syntax JPEG 2000 Image Compression (Lossless Only), can not be read because Pillow lacks the jpeg 2000 decoder plugin
Once you have called pydicom.dcmread(), your pixel data is available at ds.pixel_array. You can slice out the data you want and save it with any suitable library. In this example I use matplotlib, as I also use it to verify that my slicing is correct. Adjust to your needs; one thing you need to do is generate the correct paths and filenames for saving. Have fun!
(this script assumes the filepaths are available in a paths variable)
import pydicom
import matplotlib
# for testing if the slice is correct
from matplotlib import pyplot as plt
for path in paths:
    # read the dicom file
    ds = pydicom.dcmread(path)
    # find the shape of your pixel data
    shape = ds.pixel_array.shape
    # get the half of the x dimension. For the y dimension use shape[0]
    half_x = int(shape[1] / 2)
    # slice the halves
    # [first_axis, second_axis] so [:,:half_x] means slice all from first axis, slice 0 to half_x from second axis
    left_part = ds.pixel_array[:, :half_x]
    right_part = ds.pixel_array[:, half_x:]
    # to check whether the slices are correct, matplotlib can be convenient
    # plt.imshow(left_part); do not do this in the loop
    # save the files, see the documentation for matplotlib if you want a different format
    # bmp, png are surely supported
    # raw strings, so that \t, \a, \f and \r are not read as escape sequences
    path_to_left_image = r'generate\the\path\and\filename\for\the\left\image.bmp'
    path_to_right_image = r'generate\the\path\and\filename\for\the\right\image.bmp'
    matplotlib.image.imsave(path_to_left_image, left_part)
    matplotlib.image.imsave(path_to_right_image, right_part)
If you want to save the DICOM files keep in mind that they may not be valid DICOM if you do not update the appropriate data. For instance the SOP Instance UID is technically not allowed to be the same as in the original DICOM file, or any other SOP Instance UID for that matter. How important that is, is up to you.
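One way to obtain a fresh identifier, sketched with only the standard library: the DICOM standard permits UIDs formed from the root 2.25. followed by a UUID written as a decimal integer. pydicom also provides pydicom.uid.generate_uid() for this purpose; the helper name new_uid below is hypothetical:

```python
import uuid


def new_uid():
    """Build a UUID-derived UID: root '2.25.' plus the UUID as a decimal integer."""
    return "2.25." + str(uuid.uuid4().int)


uid = new_uid()
print(uid)
print(len(uid) <= 64)  # DICOM UIDs are limited to 64 characters
```

The result would be assigned to ds.SOPInstanceUID for each half before saving; whether the matching file meta value also needs updating depends on how strict the consuming software is.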
With a script like below you can define named slices and split any dicom image file it finds in the supplied path into the appropriate slices.
import os
import pydicom
import numpy as np
def save_partials(parts, path_to_directory):
    """
    parts: list of tuples, each tuple specifying a name and a list of four slice offsets
           [row_start, row_stop, column_start, column_stop]
    path_to_directory: path to directory containing dicom files
    any file with a .dcm extension will have its image data split into the specified slices and saved accordingly.
    original file will not be modified
    """
    dir_content = [os.path.join(path_to_directory, item) for item in os.listdir(path_to_directory)]
    # dir_content already holds full paths, so test them directly
    files = [i for i in dir_content if os.path.isfile(i)]
    for file in files:
        root, extension = os.path.splitext(file)
        if extension.lower() != '.dcm':
            # not a .dcm file, continue with next iteration of loop
            continue
        for part in parts:
            # re-read the file for every part so each slice starts from the original data
            ds = pydicom.dcmread(file)
            if not isinstance(ds.pixel_array, np.ndarray):
                # no image data available
                continue
            part_name = part[0]
            p = part[1]  # slice list
            ds.PixelData = ds.pixel_array[p[0]:p[1], p[2]:p[3]].tobytes()
            ds.Rows = p[1] - p[0]
            ds.Columns = p[3] - p[2]
            ##
            ## Here you can modify any tags using ds.KeyWord
            ##
            new_file_name = "{r}-{pn}{ext}".format(r=root, pn=part_name, ext=extension)
            ds.save_as(new_file_name)
            print('saved {}'.format(new_file_name))

dir_path = '/home/wojtek/Mask'
parts = [('left', [0,512,0,256]),
         ('right', [0,512,256,512])]
save_partials(parts, dir_path)
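The hard-coded offsets above fit a 512 x 512 image. For other sizes, the left and right parts can be derived from the image shape. A small sketch, where half_parts is a hypothetical helper that produces the same (name, [row_start, row_stop, column_start, column_stop]) tuples that save_partials expects:

```python
def half_parts(rows, columns):
    """Return 'left' and 'right' slice specifications for an image of the given size."""
    half = columns // 2
    return [('left', [0, rows, 0, half]),
            ('right', [0, rows, half, columns])]


print(half_parts(512, 512))
print(half_parts(1000, 1001))  # odd width: the right part gets the extra column
```

Inside save_partials, rows and columns could be taken from ds.Rows and ds.Columns after reading each file.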