Different result loading binary files in MATLAB and Python

I am trying to read a binary file from a readout board that will be converted to an image. In MATLAB, all the bytes are read correctly and the image is completely populated. But in Python (version 2.7, using Anaconda) there is a line of zeros every 127 columns.
The Matlab code is:
fid = fopen(filename);
Rawdata = fread(fid,'uint8');
Data1d = Rawdata(2:2:end).* 256+ Rawdata(1:2:end) ;
% converts Data1 to a 2D vector, adding a row of zeros to make the reshape
% possible to 3D
Data2d = [reshape(Data1d,4127,1792); zeros(1,1792)];
% reshapes again, but adding a new dimension
Data3d = reshape(Data2d(:),129,32,1792);
% selects the first 128 values in the first dimension
Data3d = Data3d(1:128,:,:);
Data2d = reshape(Data3d(:),4096,1792);
Data2d = Data2d';
CMVimage = Data2d;
fclose(fid); %VGM 2017-01-14 the file should be closed.
In Python I tried both np.fromfile() and reading the bytes directly with f.read(), with the same result.
import numpy as np
import matplotlib.pyplot as plt
"""
reads the input .dat file and converts it to an image
Problem: line of zeros every 127 columns in columns: 127,257,368...
curiously, the columns are in the position of the new byte.
In matlab it works very well.
"""
def readDatFile(filename):
    """ reads the binary file in python not in numpy
    the data is byte type and it is converted to integer.
    """
    import binascii
    f = open(filename, 'rb')
    data = f.read()
    #dataByte = bytearray(data)
    f.close()
    data_out = []
    for num in data:
        aux = int(binascii.hexlify(num), 16)
        data_out.append(aux)
        #print aux
    myarray = np.asarray(data_out)
    return myarray
def rawConversionNew(filename):
    # reads data from a binary file with type uint
    # f = open(filename, 'rb')
    # Rawdata = np.fromfile(f, dtype=np.uint8)
    # f.close()
    Rawdata = readDatFile(filename)
    ## gets the image
    Data1d = 256*Rawdata[1::2] + Rawdata[0::2]
    Data2d = Data1d.reshape(1792,4127)
    Data2d = Data2d.T
    Data2d = np.vstack([Data2d,np.zeros((1,1792),dtype=np.uint16)] )
    Data3d = Data2d.reshape(129,32,1792)
    Data3d = Data3d[0:128,:,:]
    #plt.figure()
    #plt.plot(np.arange(Data3d.shape[0]),Data3d[:,1,1])
    #print (Data3d[:,0,0])
    CMVimage = Data3d.reshape(4096,1792).T
    return CMVimage

There were in fact two errors: the file was not opened in binary mode ("rb"), and the reshape is done differently in MATLAB and NumPy. MATLAB fills arrays column by column (Fortran order), whereas NumPy fills them row by row (C order) by default.
If the reshape is done using reshape(dim1, dim2, order='F'), the results are the same. See: Matlab vs Python: Reshape
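As a rough sketch of what that fix could look like, the function below ports the MATLAB steps from the question one by one and passes order='F' to every reshape. The function name raw_to_image and its parameter names are made up for this illustration; the default sizes are the ones used in the question.

```python
import numpy as np

def raw_to_image(raw_bytes, n_cols=1792, n_samples=4127, block=129, n_blocks=32):
    """Port of the MATLAB pipeline; every reshape uses order='F'."""
    raw = np.asarray(raw_bytes, dtype=np.uint16)
    # Combine little-endian byte pairs into 16-bit values
    data1d = raw[1::2] * 256 + raw[0::2]
    data2d = data1d.reshape((n_samples, n_cols), order='F')
    # Pad one row of zeros so each column splits into n_blocks blocks
    data2d = np.vstack([data2d, np.zeros((1, n_cols), dtype=np.uint16)])
    data3d = data2d.reshape((block, n_blocks, n_cols), order='F')
    # Drop the last value of every block
    data3d = data3d[:block - 1, :, :]
    return data3d.reshape(((block - 1) * n_blocks, n_cols), order='F').T

# Tiny illustration of the two element orders
a = np.arange(6)
c_order = a.reshape(2, 3)             # rows filled first (NumPy default)
f_order = a.reshape(2, 3, order='F')  # columns filled first (MATLAB)
```

With a real file, the call would be something like raw_to_image(np.fromfile(filename, dtype=np.uint8)), which should give a 1792 x 4096 array like CMVimage in the MATLAB code.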

Related

Write all numpy arrays to binary file in a loop

I have this code:
from osgeo import gdal
import numpy as np
ds = gdal.Open('image.tif')
# loop through each band
for bi in range(ds.RasterCount):
    band = ds.GetRasterBand(bi + 1)
    # Read this band into a 2D NumPy array
    ar = band.ReadAsArray()
    print('Band %d has type %s'%(bi + 1, ar.dtype))
    ar.astype('uint16').tofile("converted.raw")
As a result, I get the converted.raw file, but it only contains data from the last iteration of the for loop. How can I make a file that contains the data from all iterations together?
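Open the output file once, before the loop, and write every band to that same handle, for example with np.save:
from osgeo import gdal
import numpy as np
ds = gdal.Open('image.tif')
# loop through each band
with open("converted.raw", "wb") as outfile:
    for bi in range(ds.RasterCount):
        band = ds.GetRasterBand(bi + 1)
        # Read this band into a 2D NumPy array
        ar = band.ReadAsArray()
        print('Band %d has type %s'%(bi + 1, ar.dtype))
        # np.save writes a .npy header before each array
        np.save(outfile, ar.astype('uint16'))
Note that np.save stores each array in .npy format, header included, so the result is not a plain raw dump. If the file must stay headerless, calling ar.astype('uint16').tofile(outfile) on the open handle appends the bytes of each band instead.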
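If the goal is a headerless raw file, one option is to pass the open file handle to ndarray.tofile, so that each band is appended instead of overwriting the previous one. The sketch below leaves GDAL out and uses small made-up arrays in place of band.ReadAsArray(); the helper names write_bands_raw and read_bands_raw are hypothetical.

```python
import os
import tempfile
import numpy as np

def write_bands_raw(bands, path):
    """Append every 2D band to one headerless uint16 file."""
    with open(path, "wb") as outfile:
        for band in bands:
            band.astype("uint16").tofile(outfile)

def read_bands_raw(path, n_bands, height, width):
    """Read the concatenated bands back; the shape must be known."""
    flat = np.fromfile(path, dtype="uint16")
    return flat.reshape(n_bands, height, width)

# Hypothetical stand-in for the arrays returned by band.ReadAsArray()
bands = [np.full((2, 3), value) for value in (1, 2, 3)]
raw_path = os.path.join(tempfile.mkdtemp(), "converted.raw")
write_bands_raw(bands, raw_path)
restored = read_bands_raw(raw_path, 3, 2, 3)
```

A raw file carries no shape or dtype information, so the reader has to know the band count and dimensions from somewhere else, for instance from the original raster.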

Python input for Spectral Clustering

I am using the code from https://github.com/pin3da/spectral-clustering/blob/master/spectral/utils.py to spectrally cluster data in https://cs.joensuu.fi/sipu/datasets/s1.txt
May I know how I can change the code so that it accepts a txt file as input?
I have given the original code below for reference.
Original code from GitHub:
import numpy
import scipy.io
import h5py
def load_dot_mat(path, db_name):
    try:
        mat = scipy.io.loadmat(path)
    except NotImplementedError:
        mat = h5py.File(path)
    return numpy.array(mat[db_name]).transpose()
I do not understand the purpose of the variable db_name.
The code you show here just opens a given mat or h5 file. The path to the file (path) and the name of the data set within the file (db_name) are provided as arguments to the load_dot_mat function.
To load your txt file, we can create our own little load function:
import numpy as np
def load_txt(filename):
    with open(filename, "r") as f:
        # split on spaces and skip the empty strings left by repeated spaces
        data = [[int(x) for x in line.split(" ") if x != ""] for line in f]
    return np.array(data)
This function takes the path to your txt file as an argument and returns a NumPy array with the data from your file. The data array has shape (5000, 2) for the file you provided. You may want to use float instead of int if other files contain float values and not only integers.
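As an alternative to the hand-written parser, NumPy's own np.loadtxt reads whitespace-separated columns and returns floats by default. The sketch below writes a two-line sample file first so that it runs on its own; the sample values and the exact spacing are assumptions about the layout of s1.txt, not data taken from it.

```python
import os
import tempfile
import numpy as np

# Small stand-in file: two whitespace-separated columns with
# leading spaces (assumed layout, hypothetical values).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as f:
    f.write("    664159    550946\n    665845    557965\n")

# loadtxt splits on any run of whitespace and yields float64 by default
data = np.loadtxt(sample_path)
```

For the real file this would become data = np.loadtxt("s1.txt"), and the result can be passed on in the same way as the output of load_txt.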
The complete clustering step for your data could then look like this:
from itertools import cycle, islice
import matplotlib.pyplot as plt
import numpy as np
import seaborn
from spectral import affinity, clustering
seaborn.set()
def load_txt(filename):
    with open(filename, "r") as f:
        data = [[int(x) for x in line.split(" ") if x != ""] for line in f]
    return np.array(data)
data = load_txt("s1.txt")
A = affinity.com_aff_local_scaling(data)
n_cls = 15 # found by looking at your data
Y = clustering.spectral_clustering(A, n_cls)
colors = np.array(list(islice(cycle(seaborn.color_palette()), int(max(Y) + 1))))
fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.scatter(data[:, 0], data[:, 1], color=colors[Y], s=6, alpha=0.6)
plt.show()

python read and convert raw 3d image file

I am working with CT scan medical images in raw format. Each file is basically a 3D matrix of voxels (512 x 512 x number of slices). I'd like to extract each slice of the file into a separate file.
import numpy as np
import matplotlib.pyplot as plt
# reading the raw image into a string. The image files can be found at:
# https://grand-challenge.org/site/anode09/details/
f = open('test01.raw', 'rb')
img_str = f.read()
# converting to a uint16 numpy array
img_arr = np.fromstring(img_str, np.uint16)
# get the first image and plot it
im1 = img_arr[0:512*512]
im1 = np.reshape(im1, (512, 512))
plt.imshow(im1, cmap=plt.cm.gray_r)
plt.show()
The result definitely looks like a chest ct scan, but the texture of the image is strange, as if the pixels were misplaced.
Some relevant info might be located in the associated .mhd info file, but I'm not sure where to look:
ObjectType = Image
NDims = 3
BinaryData = True
BinaryDataByteOrderMSB = False
CompressedData = False
TransformMatrix = 1 0 0 0 1 0 0 0 1
Offset = 0 0 0
CenterOfRotation = 0 0 0
AnatomicalOrientation = RPI
ElementSpacing = 0.697266 0.697266 0.7
DimSize = 512 512 459
ElementType = MET_SHORT
ElementDataFile = test01.raw
Try it this way:
import numpy as np
FileName = 'test01.raw'  # ElementDataFile from the .mhd header
Dim_size = (512, 512, 459)  # DimSize (x, y, z); or read it from your .mhd file
f = open(FileName, 'rb')  # opens the file for reading in binary mode
# MET_SHORT is a signed 16-bit integer, so use int16 rather than uint16
img_arr = np.fromfile(f, dtype=np.int16)
f.close()
# x varies fastest in the file, so the C-ordered shape is (z, y, x)
img_arr = img_arr.reshape(Dim_size[2], Dim_size[1], Dim_size[0])
If you are memory limited, read the file in chunks, one slice at a time:
f = open(FileName, 'rb')  # opens the file for reading in binary mode
for i in range(0, Dim_size[2]):
    img_arr = np.fromfile(f, dtype=np.int16, count=Dim_size[0]*Dim_size[1])
    img = img_arr.reshape(Dim_size[1], Dim_size[0])
    # Do something with the slice
f.close()
A good way to check what is actually in the raw file is to open it in ImageJ. There is even a plugin for reading such ITK-compatible files, but a direct raw import should also work.
https://imagej.net/Welcome
http://ij-plugins.sourceforge.net/plugins/3d-io/
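Since the question asks for each slice in a separate file, here is one possible sketch using np.memmap, which maps the raw file instead of loading it whole. The function name split_raw_into_slices, the output naming scheme and the choice of .npy as output format are assumptions made for this example; the default dtype "<i2" (little-endian signed 16-bit) follows from ElementType = MET_SHORT and BinaryDataByteOrderMSB = False in the header.

```python
import os
import numpy as np

def split_raw_into_slices(raw_path, out_dir, dim_size, dtype="<i2"):
    """Save every z-slice of a headerless raw volume as its own .npy file.

    dim_size is (x, y, z) as listed under DimSize in the .mhd header.
    """
    nx, ny, nz = dim_size
    # x varies fastest on disk, so the C-ordered shape is (z, y, x)
    volume = np.memmap(raw_path, dtype=dtype, mode="r", shape=(nz, ny, nx))
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for k in range(nz):
        path = os.path.join(out_dir, "slice_%04d.npy" % k)
        # np.asarray copies the mapped slice into a regular array
        np.save(path, np.asarray(volume[k]))
        paths.append(path)
    return paths
```

For the file in the question, the call would look like split_raw_into_slices('test01.raw', 'slices', (512, 512, 459)).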

Cannot load an array with numpy load()

I cannot load an array from a binary file. What am I doing wrong?
pic = imread('headey-640.bmp')
save('test.in.npy', pic)
f = open('test.in.npy','r')
A = load(f)
---------------------------------------------------------------------------
ValueError: total size of new array must be unchanged
You have to open your file in binary mode:
import numpy as np
x = np.array([1,2,3])
np.save("test.npy", x)
with open("test.npy", "rb") as npy:
    a = np.load(npy)
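Another way to avoid the file-mode issue is to hand np.load the file name rather than a file object, in which case NumPy opens the file itself. A minimal round trip, using a temporary path chosen only for this example:

```python
import os
import tempfile
import numpy as np

x = np.array([1, 2, 3])
# Temporary location chosen only for this illustration
npy_path = os.path.join(tempfile.mkdtemp(), "test.npy")
np.save(npy_path, x)
# Passing the path lets NumPy handle opening the file
a = np.load(npy_path)
```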

python numpy: array of arrays

I'm trying to build a numpy array of arrays of arrays with the code below, which gives me:
ValueError: setting an array element with a sequence.
My guess is that in numpy I need to declare the arrays as multi-dimensional from the beginning, but I'm not sure.
How can I fix the code below so that I can build an array of arrays of arrays?
from PIL import Image
import pickle
import os
import numpy
indir1 = 'PositiveResize'
trainimage = numpy.empty(2)
trainpixels = numpy.empty(80000)
trainlabels = numpy.empty(80000)
validimage = numpy.empty(2)
validpixels = numpy.empty(10000)
validlabels = numpy.empty(10000)
testimage = numpy.empty(2)
testpixels = numpy.empty(10408)
testlabels = numpy.empty(10408)
i=0
tr=0
va=0
te=0
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
        try:
            im = Image.open(os.path.join(root,f))
            Imv=im.load()
            x,y=im.size
            pixelv = numpy.empty(6400)
            ind=0
            for i in range(x):
                for j in range(y):
                    temp=float(Imv[j,i])
                    temp=float(temp/255.0)
                    pixelv[ind]=temp
                    ind+=1
            if i<40000:
                trainpixels[tr]=pixelv
                tr+=1
            elif i<45000:
                validpixels[va]=pixelv
                va+=1
            else:
                testpixels[te]=pixelv
                te+=1
            print str(i)+'\t'+str(f)
            i+=1
        except IOError:
            continue
trainimage[0]=trainpixels
trainimage[1]=trainlabels
validimage[0]=validpixels
validimage[1]=validlabels
testimage[0]=testpixels
testimage[1]=testlabels
Don't try to smash your entire object into a numpy array. If you have distinct things, use a numpy array for each one then use an appropriate data structure to hold them together.
For instance, if you want to do computations across images then you probably want to just store the pixels and labels in separate arrays.
import numpy as np
trainpixels = np.empty([10000, 80, 80])
trainlabels = np.empty(10000)
for i in range(10000):
    trainpixels[i] = ...
    trainlabels[i] = ...
To access an individual image's data:
imagepixels = trainpixels[253]
imagelabel = trainlabels[253]
And you can easily do stuff like compute summary statistics over the images.
meanimage = np.mean(trainpixels, axis=0)
meanlabel = np.mean(trainlabels)
If you really want all the data to be in the same object, you should probably use a struct array as Eelco Hoogendoorn suggests. Some example usage:
# Construction and assignment
# plain int replaces np.int, which newer NumPy releases no longer provide
trainimages = np.empty(10000, dtype=[('label', int), ('pixel', int, (80,80))])
for i in range(10000):
    trainimages['label'][i] = ...
    trainimages['pixel'][i] = ...
# Summary statistics
meanimage = np.mean(trainimages['pixel'], axis=0)
meanlabel = np.mean(trainimages['label'])
# Accessing a single image
image = trainimages[253]
imagepixels, imagelabel = trainimages[['pixel', 'label']][253]
Alternatively, if you want to process each one separately, you could store each image's data in separate arrays and bind them together in a tuple or dictionary, then store all of that in a list.
trainimages = []
for i in range(10000):
    pixels = ...
    label = ...
    image = (pixels, label)
    trainimages.append(image)
Now to access a single image's data:
imagepixels, imagelabel = trainimages[253]
This makes it more intuitive to access a single image, but because all the data is not in one big numpy array you don't get easy access to functions that work across images.
Refer to the examples in numpy.empty:
>>> np.empty([2, 2])
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]]) #random
Give your arrays the full N-dimensional shape up front:
testpixels = numpy.empty([96, 96])
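To tie the answers back to the question: each slot of a 1-D array such as numpy.empty(80000) holds a single number, so assigning a 6400-element vector to one slot is what triggers the ValueError. The sketch below pre-allocates one row per image instead. It uses randomly generated stand-in images rather than files from disk, and the sizes and labels are made up for the illustration.

```python
import numpy as np

n_images, height, width = 5, 80, 80
# Random stand-ins for the 80x80 grayscale images loaded with PIL
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(n_images, height, width))

# One row of 6400 scaled pixel values per image, one label per image
pixels = np.empty((n_images, height * width))
labels = np.empty(n_images, dtype=int)
for idx, image in enumerate(fake_images):
    pixels[idx] = image.ravel() / 255.0
    labels[idx] = idx % 2  # hypothetical labels

# Keep pixels and labels together without forcing them into one array
dataset = (pixels, labels)
```

The same pattern can be repeated for the validation and test sets, each with its own row count.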
