h5py open file with unknown datasets - python

I am trying to use h5py to open a file that was created by another program. Unfortunately I don't know the inner structure of the file. All I know is that it should contain a 20x20 matrix, which I would like to process with numpy.
Here is what I have done so far:
import numpy
import h5py
f = h5py.File('example.hdf5')
print(f.keys())
The result is as follows:
KeysViewWithLock(<HDF5 file "example.hdf5" (mode r+)>)
How do I proceed from here? I want to access the matrix as a single numpy.ndarray. The h5py documentation mostly talks about creating hdf5 files, not reading unknown ones.
Thanks a lot.
SOLUTION (thanks to akash karothiya)
Use print(list(f.keys())) instead. That gives the names of the groups/datasets, which can then be accessed as a = f['dataset'].

OK, as mentioned above, akash karothiya helped me find the solution.
Instead of print(f.keys()), use print(list(f.keys())). This returns ['dataset'].
With that name I can get an h5py dataset object, which I then convert into a numpy array as follows:
a = f['dataset']
b = numpy.zeros(numpy.shape(a), dtype=complex)
for i in range(numpy.size(a, 0)):
    b[i, :] = numpy.asarray(a[i]['real'] + 1j * a[i]['imag'], dtype=complex)
UPDATE:
New version without the for loop, potentially faster and more versatile (it works for both complex and real data, and for cubes with dimensions NxMxO as well):
a = f['dataset']
if len(a.dtype) == 0:  # plain (real-valued) dataset
    b = numpy.squeeze(a[()])
elif len(a.dtype) == 2:  # compound dtype with 'real' and 'imag' fields
    b = numpy.squeeze(a[()]['real'] + 1.0j * a[()]['imag'])
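If the file has nested groups, list(f.keys()) only shows the top level. A minimal exploration sketch that walks the whole hierarchy instead (it assumes the file name from the question):
import h5py
with h5py.File('example.hdf5', 'r') as f:
    f.visititems(lambda name, obj: print(name, obj))  # prints every group/dataset path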

Related

Is there a way to convert multiple tiff files to numpy array at once?

I'm doing a convolutional neural network classification, and all my training tiles (1000 of them) are in geotiff format. I need to get all of them into a numpy array, but I have only found code that does it for one tiff file at a time.
Is there a way to convert a whole folder of tiff files at once?
Thanks!
Try using a for loop to go through your folder.
Is your goal to get them into 1000 different numpy arrays, or into 1 numpy array? If you want the latter, it might be easiest to merge them all into one larger .tiff file, then use the code you have to process it.
If you want to get them into 1000 different arrays, this reads through a directory, uses your code to make a numpy array from each file, and sticks them in a list:
import os

arrays_from_files = []
os.chdir("your-folder")
for name in os.listdir():
    if os.path.isfile(name):
        nparr = ...  # use your code here to read one tiff into an array
        arrays_from_files.append(nparr)
It might be a good idea to use a dictionary mapping filenames to arrays, to make debugging easier down the road:
import os

arrays_by_filename = {}
os.chdir("your-folder")
for name in os.listdir():
    if os.path.isfile(name):
        nparr = ...  # use your code here to read one tiff into an array
        arrays_by_filename[name] = nparr
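If the goal is one big array instead, and all tiles share the same shape, a rough sketch along these lines might work; Pillow and the folder/pattern names are assumptions here, and geotiff-specific metadata would need a dedicated library such as rasterio:
import glob
import numpy as np
from PIL import Image  # pip install Pillow

# collect every .tif/.tiff in the (assumed) folder, sorted for a stable order
paths = sorted(glob.glob("your-folder/*.tif*"))
# stack the individual 2-D tiles into one (N, H, W) array
stack = np.stack([np.array(Image.open(p)) for p in paths])
print(stack.shape)  # e.g. (1000, 256, 256) for 1000 tiles of 256x256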

numpy.fromfile seems to be unable to read large files

I wanted to write a very simple Python helper tool for my project that reads binary data from an ECG record. I found somewhere that numpy.fromfile is the most appropriate tool for this, so I wrote:
#!/usr/bin/env python3
import sys
import numpy as np
arrayOfNums = np.fromfile(sys.argv[1], 'short')
print("Converting " + sys.argv[1] + "...")
conversionOutput = open("output", "x")
conversionOutput.write(np.array2string(arrayOfNums, separator=' '))
conversionOutput.close()
print("Conversion done.")
I did that to write out the data, which consists of unseparated 2-byte records. The input file is somewhat large (over 7 MB), but not, I think, large enough to cause numpy any trouble.
The output I got in the file: [-32243 -32141 -32666 ... -32580 -32635 -32690]
Why the dots in the middle? It seems to convert the data fine, but it omits almost everything it is supposed to save. Any help would be appreciated.
NumPy reads your file correctly. To keep long arrays readable, it abbreviates the printed output with dots:
import numpy as np
a = np.random.random(10000)
Output:
>>> a
array([0.20902653, 0.80097215, 0.06909818, ..., 0.5963183 , 0.94024005,
0.31870234])
>>> a.shape
(10000,)
a contains 10000 values, not just the 6 that are displayed.
Update
To display the full output:
import sys
np.set_printoptions(threshold=sys.maxsize)
print(a)
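A sketch of an alternative that sidesteps the print options entirely: have numpy write every value itself with savetxt (the output file name mirrors the question):
import sys
import numpy as np

arrayOfNums = np.fromfile(sys.argv[1], 'short')
# savetxt writes each value out in full, so nothing is elided
np.savetxt("output", arrayOfNums, fmt="%d")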

Advice to go from MATLAB .mat file with structures to a numpy array?

I am trying to read a large structure from a .mat file into a Jupyter notebook. I am a little new to Python, so I'm not sure why my solution isn't working.
The structure from MATLAB (2020) that I am reading is laid out like this:
pose.frames.ind
where there are 44 frames and 63 ind in each frame. I am reading it into a Jupyter notebook with mat4py. I am adapting someone else's code, so after reading the structure in I need to convert it to a tensor that can go into another function.
import numpy as np
import torch
from mat4py import loadmat

val = loadmat('pose.mat')
pose_body = val['pose']['frames']['ind'][0]
pose_body = np.asarray(pose_body)
pose_body = torch.FloatTensor(pose_body).to(comp_device)  # comp_device is defined elsewhere in the notebook
When I feed pose_body = np.zeros([1,63]) into the line that converts it to a torch tensor, the code works fine. However, when I feed it the array I imported, something goes wrong and I get this error:
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 2 and 1 at C:/w/1/s/tmp_conda_3.7_055306/conda/conda-bld/pytorch_1556690124416/work/aten/src\THC/generic/THCTensorMath.cu:62
Is there an easier/better way to convert the data from MATLAB into the format I need? I am a little unfamiliar with Python, and it seems like they're the same type of array: from type(pose_body) and tf.size(pose_body) I know that both numpy arrays have a shape of 63, a dtype of int32, and are of class "numpy.ndarray".
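One thing to check: np.zeros([1, 63]) is 2-D, while a single frame read from the .mat file is presumably 1-D with shape (63,), which would explain the "got 2 and 1" in the error. A sketch of that guess, reusing the question's setup (comp_device is whatever device the surrounding notebook defines):
import numpy as np
import torch
from mat4py import loadmat

val = loadmat('pose.mat')
# one frame arrives with shape (63,); add a leading axis to match (1, 63)
pose_body = np.asarray(val['pose']['frames']['ind'][0], dtype=np.float32)
pose_body = pose_body[np.newaxis, :]
pose_body = torch.FloatTensor(pose_body).to(comp_device)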

Memmap in MATLAB for huge arrays

I want to create a memmap in MATLAB.
In Python I can do this with:
ut = np.memmap('my_array.mmap', dtype=np.float64, mode='w+', shape=(140000,3504))
Then I use it like a normal array, and the OS makes sure my memory never overflows. How do I do this in MATLAB?
From the docs it seems MATLAB wants me to create an array first, write it to a file, and only then read it back with memmapfile!
The MATLAB docs are not clear enough.
Please provide an example that creates a random array of size (140000,15000) and multiplies it by some other, similarly sized matrix.
You have to create an empty file first, then use memmapfile:
sz = [140000, 3504];  % avoid naming this 'size', which would shadow the builtin
filesize = 0;
datatype = 'float64';
filename = 'my_array.dat';
fid = fopen(filename, 'w+');
max_chunk_size = 1000000;
% fill an empty file of the right total size, one chunk of zeros at a time
while filesize < prod(sz)
    to_write = min(prod(sz) - filesize, max_chunk_size);
    filesize = filesize + fwrite(fid, zeros(to_write, 1), datatype);
end
fclose(fid);
m = memmapfile(filename, 'Format', 'double', 'Writable', true);
I think what you are looking for is the function memmapfile
Example:
m = memmapfile('my_array.dat','Format','double', 'Writable',true)
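For comparison, here is a rough NumPy sketch of the multiply-two-big-matrices use case from the question, processed in row chunks so only a slice is resident at a time (the second file name and the chunk size are made up):
import numpy as np

rows, cols = 140000, 3504
a = np.memmap('my_array.mmap', dtype=np.float64, mode='r', shape=(rows, cols))
b = np.memmap('other.mmap', dtype=np.float64, mode='r', shape=(cols, cols))
out = np.memmap('result.mmap', dtype=np.float64, mode='w+', shape=(rows, cols))
chunk = 10000  # rows per pass; tune to the available RAM
for start in range(0, rows, chunk):
    stop = min(start + chunk, rows)
    out[start:stop] = a[start:stop] @ b  # only this slice is pulled into memory
out.flush()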

Matrix from Python to MATLAB

I'm working with Python and MATLAB right now, and I have a 2D array in Python that I need to write to a file and then read into MATLAB as a matrix. Any ideas on how to do this?
Thanks!
If you use numpy/scipy, you can use the scipy.io.savemat function:
import numpy, scipy.io
arr = numpy.arange(9) # 1d array of 9 numbers
arr = arr.reshape((3, 3)) # 2d array of 3x3
scipy.io.savemat('c:/tmp/arrdata.mat', mdict={'arr': arr})
Now you can load this data into MATLAB using File -> Load Data. Select the file, and the arr variable (a 3x3 matrix) will be available in your environment.
Note: I did this on scipy 0.7.0. (scipy 0.6 has savemat in the scipy.io.mio module.) See the latest documentation for more detail.
EDIT: updated link thanks to @gnovice.
I think ars has the most straightforward answer for saving the data to a .mat file from Python (using savemat). To add just a little to their answer, you can also load the .mat file into MATLAB programmatically using the LOAD function, instead of doing it by hand via the MATLAB command window menu...
You can use either the command syntax form of LOAD:
load c:/tmp/arrdata.mat
or the function syntax form (if you have the file path stored in a string):
filePath = 'c:/tmp/arrdata.mat';
data = load(filePath);
I wrote a small function to do the same thing without needing numpy. It takes a list of lists and returns a string containing a MATLAB-formatted matrix.
def arrayOfArrayToMatlabString(array):
    return '[' + "\n ".join(" ".join("%6g" % val for val in line) for line in array) + ']'
Write "myMatrix = " + arrayOfArrayToMatlabString(array) to a .m file, open it in matlab, and execute it.
I would probably use numpy.savetxt('yourfile.txt', yourarray) in Python
and then yourarray = load('yourfile.txt') in MATLAB. (Give the text file a non-.mat extension: load treats a .mat file as a binary MAT-file, not as ASCII.)
You could write the matrix in Python to a CSV file and read it in MATLAB using csvread.
You can also call MATLAB directly from Python:
from mlabwrap import mlab
import numpy
a = numpy.array([1,2,3])
mlab.plot(a)
The toolbox npy-matlab can read *.npy binary files into MATLAB. *.npy files can be directly exported with the NumPy module. From the documentation:
>> a = rand(5,4,3);
>> writeNPY(a, 'a.npy');
>> b = readNPY('a.npy');
>> sum(a(:)==b(:))
ans =
60
npy-matlab is a simple collection of M-files available from GitHub, with a 2-clause BSD licence.
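On the Python side, the matching export is np.save:
import numpy as np
np.save('a.npy', np.random.rand(5, 4, 3))  # readNPY('a.npy') then loads this array in MATLAB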
