Read a float binary file into 2D arrays in python and matlab

Read a float binary file into 2D arrays in python and matlab - python

I have some binary input files (extension ".bin") that describe a 2D field of ocean depth and which are all negative float numbers. I have been able to load them in matlab as follows:
f = fopen(filename,'r','b');
data = reshape(fread(f,'float32'),[128 64]);
This matlab code gives me double values between 0 and -5200. However, when I try to the same in Python, I strangely get values between 0 and 1e-37. The Python code is:
f = open(filename, 'rb')
data = np.fromfile(f, np.float32)
data.shape = (64,128)
The strange thing is that there is a mask value of 0 for land which shows up in the right places in the (64,128) array in both cases. It seems to just be the magnitude and sign of the numpy.float32 values that are off.
What am I doing wrong in the Python code?

numpy.fromfile isn't platform independant, especially the "byte-order" is mentioned in the documentation:
Do not rely on the combination of tofile and fromfile for data storage, as the binary files generated are are not platform independent. In particular, no byte-order or data-type information is saved.
You could try:
data = np.fromfile(f, '>f4') # big-endian float32
and:
data = np.fromfile(f, '<f4') # little-endian float32
and check which one (big endian or little endian) gives the correct values.

Base on your matlab fopen, the file is in big endian ('b'). But your python code does not take care of the endianness.

Related

trouble saving numpy array to matlab readable file

I have an image sequence as a numpy array;
Mov (15916, 480, 768)
dtype = int16
i've tried using Mov.tofile(filename)
this saves the array and I can load it again in python and view the images.
In matlab the images are corrupted after about 3000 frames.
Using the following also works but has the same problem when I retrieve the images in matlab;
fp = np.memmap(sbxpath, dtype='int16', mode='w+', shape=Mov.shape)
fp[:,:,:] = Mov[:,:,:]
If I use:
mv['mov'] = Mov
sio.savemat(sbxpath, mv)
I get the following error;
OverflowError: Python int too large to convert to C long
what am I doing wrong?

I'm sorry for this, because it is a beginners problem. Python saves variables as integers or floats depending on how they are initialized. Matlab defaults to 8 byte doubles. My matlab script expects doubles, my python script was outputting all kinds of variable types, so naturally things got messed up.

Converting float32 to bit-equivalent int32

The Pillow module in Python insists on opening a 32-bit/pixel TIFF file I have as if the pixels were of type float32, whereas I believe the correct interpretation is unsigned int32. If I go ahead and load the data into a 640x512 array of type float32, how can I retype it as uint32 while preserving the underlying binary representation?
In Fortran and C, it's easy to have pointers or data structures of different type pointing to the same block of physical memory so that the raw memory contents can be easily be interpreted according to whatever type I want. Is there an equivalent procedure in Python?
Sample follows (note that I have no information about compression etc.; the file in question was extracted by a commercial software program from a proprietary file format):
from PIL import Image
infile = "20181016_071207_367_R.tif"
im = Image.open(infile)
data = np.array(im.getdata())
print(data)
[ -9.99117374 -10.36103535 -9.80696869 ... -18.41988373 -18.35027885
-18.69905663]

Assuming you have im.mode originally equal to F, you can force Pillow to re-load the same data under a different mode (an very unusual desire indeed) in a somewhat hackish way like that:
imnew = im.convert(mode='I')
imnew.frombytes(im.tobytes())
More generally (outside the context of PIL), whenever you encounter the need to deal with raw memory representation in Python, you should usually rely on numpy or Python's built-in memoryview class with the struct module.
Here is an example of reinterpreting an array of numpy float32 as int32:
a = np.array([1.0, 2.0, 3.0], dtype='float32')
a_as_int32 = a.view('int32')
Here is an example of doing the same using memoryview:
# Create a memory buffer
b = bytearray(4*3)
# Write three floats
struct.pack_into('fff', b, 0, *[1.0, 2.0, 3.0])
# View the same memory as three ints
mem_as_ints = memoryview(b).cast('I')

The answer, in this case, is that Pillow is loading the image with the correct type (float 32) as specified in the image exported from the thermal camera. There is no need to cast the image to integer, and doing so would cause an incorrect result.

Changing a wav file's volume using Python

I have tried using the Pydub library; however, it only allows the reduction or increase of a certain amount of decibels. How would I proceed if I wanted, for example, to reduce the volume of the wav by a certain percent?

This is simple enough to just do with just the tools in the stdlib.
First, you use wave to open the input file and create an output file:
pathout = os.path.splitext(path)[0] + '-quiet.wav'
with wave.open(path, 'rb') as fin, wave.open(pathout, 'wb') as fout:
Now, you have to copy over all the wave params, and hold onto the sample width for later:
fout.setparams(fin.getparams())
sampwidth = fin.getsampwidth()
Then you loop over frames until done:
while True:
frames = bytearray(fin.readframes(1024))
if not frames:
break
You can use audioop to process this data:
frames = audioop.mul(frames, sampwidth, factor)
… but this will only work for 16-bit little-endian signed LPCM wave files (the most common kind, but not the only kind). You could solve that with other functions—most importantly, lin2lin to handle 8-bit unsigned LPCM (the second most common kind). But it's worth understanding how to do it manually:
for i in range(0, len(frames), sampwidth):
if sampwidth == 1:
# 8-bit unsigned
frames[i] = int(round((sample[0] - 128) * factor + 128)
else:
# 16-, 24-, or 32-bit signed
sample = int.from_bytes(frames[i:i+sampwidth], 'little', signed=True)
quiet = round(sample * factor)
frames[i:i+sampwidth] = int(quiet).to_bytes(sampwidth, 'little', signed=True)
audioop.mul only handles the else part—but it does more than I've done here. In particular, it has to handle cases with factors over 1—a naive multiply would clip, which will just add weird distortion without adding the desired max energy. (It's worth reading the pure Python implementation from PyPy if you want to learn the basics of this stuff.)
If you also want to handle float32 files, you need to look at the format, because they have the same sampwidth as int32, and you'll probably want the struct module or the array module to pack and unpack them. If you want to handle even less common formats, like a-law and µ-law, you'll need to read a more detailed format spec. Notice that audioop has tools for handling most of them, like ulaw2lin to convert µ-law to LPCM so you can process it and convert it back—but again, it might be worth learning how to do it manually. And for some of them, like CoolEdit float24/32, you pretty much have to do it manually.
Anyway, once you've got the quieted frames, you just write them out:
fout.writeframes(frames)

You could use the mul function from the built-in audioop module. This is what pydub uses internally, after converting the decibel value to a multiplication factor.

interpreting binary data in Python

I am using Python 2.7
I am trying to use a large binary file that specifies the latitude/longitude of each pixel of an image.
Using this code: open('input', 'rb').read(50)
The file looks like this:
\x80\x0c\x00\x00\x00\x06\x00\x00.\x18\xca\xe4.\x18\xcc\xe4.\x18\xcf\xe4.\x18\xd1\xe4.\x18\xd3\xe4.\x18\xd5\xe4.\x18\xd7\xe4.\x18\xd9\xe4/\x18\xdb\xe4/\x18\xdd\xe4/\x18...
The read-me for the file gives the following information for decoding but I am unsure of how to apply them:
The files are in LSBF byte order. Files start with 2 4-byte integer values giving the pixel and line (x,y) size of the file. After the files have succeeding pairs of elements are 2-byte integer values of latitude and longitude multiplied 100 and truncated (e.g. 75.324 E is "-7532").
Thanks for any help.
Note the reason for doing this ultimately would be to draw/alter an image based on lat/long and not the pixel # in case anyone was wondering.

Generally when working with binary files you'll want to use the pack and unpack (https://docs.python.org/2/library/struct.html) functions. That will allow you to control the endianess of the data as well as the data types. In your specific case you'll probably want to read the header info first, something like
with open('input', 'rb') as fp:
# read the header bytes to get the number of elements
header_bytes = fp.read(8)
# convert the bytes to two unsigned integers.
x, y = unpack("II", header_bytes)
# Loop through the file getting the rest of the data
# read Y lines of X pixels
...
The above was not actually run or test, just trying to give you a general sense of an approach.

how to open a large bin file ( ~ 8 GB image stream) in opencv python?

Background
The binary file contain successive raw output from a camera sensor which is in the form of a bayer pattern. i.e. the data is successive blocks containing information of the form shown below and where each block is a image in image stream
[(bayer width) * (bayer height) * sizeof(short)]
Objective
To read information from a specific block of data and store it as an array for processing. I was digging through the opencv documentation, and totally lost on how to proceed. I apologize for the novice question but any suggestions?

Assuming you can read the binary file (as a whole), I would try to use
Numpy to read it into a numpy.array. You can use numpy.fromstring and depending on the system the file was written on (little or big endian), use >i2 or <i2 as your data type (you can find the list of data types here).
Also note that > means big endian and < means little endian (more on that here)
You can set an offset and specify the length in order to read to read a certain block.
import numpy as np
with open('datafile.bin','r') as f:
dataBytes = f.read()
data = np.fromstring(dataBytes[blockStartIndex:blockEndIndex], dtype='>i2')
In case you cannot read the file as a whole, I would use mmap (requires a little knowledge of C) in order to break it down to multiple files and then use the method above.

OP here, with #lsxliron's suggestion I looked into using Numpy to achieve my goals and this is what I ended up doing
import numpy as np
# Data read length
length = (bayer width) * (bayer height)
# In terms of bytes: short = 2
step = 2 * length
# Open filename
img = open("filename","rb")
# Block we are interested in i
img.seek(i * step)
# Copy data as Numpy array
Bayer = np.fromfime(img,dtype=np.uint16,count=length)
Bayer now holds the bayer pattern values in the form of an numpy array success!!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read a float binary file into 2D arrays in python and matlab - python

Base on your matlab fopen, the file is in big endian ('b'). But your python code does not take care of the endianness.

Related

trouble saving numpy array to matlab readable file

Converting float32 to bit-equivalent int32

Changing a wav file's volume using Python

interpreting binary data in Python

how to open a large bin file ( ~ 8 GB image stream) in opencv python?

Categories

Resources