numpy.fromfile differences between Python 2.7.3 and 2.7.6

I'm having an issue running the same code on two machines, and I've narrowed it down to a difference between the Python versions installed on them (2.7.3 and 2.7.6 respectively).
Here is the input file, found on GitHub (https://github.com/tkkanno/PhD_work/blob/master/1r).
In Python 2.7.3 with numpy version 1.11.1, the following code works as expected:
import numpy as np
s = 'directory/to/file'
f = open(s, 'rb')
y = np.fromfile(f,'<l')
y.shape
This gives a numpy array of shape (16384,). However, when it is run on Python 2.7.6/numpy 1.11.1, it gives an array half that size (8192,), which isn't acceptable for me.
I can't understand why numpy is acting this way with different versions of Python. I would be grateful for any suggestions.

Converted from my comment:
You're likely running on Python/OS builds with different notions of how big a long is. C doesn't require a specific long size, and in practice Windows always treats it as 32 bits, while other common desktop OSes treat it as 32 bits if the OS and Python are built for 32-bit CPUs (ILP32) and 64 bits if built for 64-bit CPUs (LP64).
If you want a fixed-width type on all OSes, don't use the system-dependent-width types; use fixed-width types instead. From your comments, the expected behavior is to load 32-bit/4-byte values. If you were just using native endianness, you could simply pass numpy.int32 (numpy accepts the raw class as a dtype). Since you want to specify the endianness explicitly (perhaps this might run on a big-endian system), you can instead pass '<i4', which explicitly states little-endian (<) signed integer (i) four bytes in size (4):
import numpy as np
s = 'directory/to/file'
with open(s, 'rb') as f: # Use with statements when opening files
    y = np.fromfile(f, '<i4') # Use an explicit fixed-width type
y.shape
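To confirm that this is what is happening, you can inspect the native long width on each machine. A minimal sketch, using only struct.calcsize and numpy.dtype:
import struct
import numpy as np
# Width of the platform's native C long, in bytes:
# 4 on Windows and ILP32 builds, 8 on LP64 builds (64-bit Linux/macOS).
print(struct.calcsize('l'))
print(np.dtype('l').itemsize)  # fromfile's '<l' follows this same native width
Running this on both machines should print 4 on one and 8 on the other, which accounts for the halved element count.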

Related

trouble saving numpy array to matlab readable file

I have an image sequence as a numpy array, Mov, with shape (15916, 480, 768) and dtype int16.
I've tried using Mov.tofile(filename). This saves the array, and I can load it again in Python and view the images, but in MATLAB the images are corrupted after about 3000 frames.
Using the following also works, but has the same problem when I retrieve the images in MATLAB:
fp = np.memmap(sbxpath, dtype='int16', mode='w+', shape=Mov.shape)
fp[:,:,:] = Mov[:,:,:]
If I use:
mv['mov'] = Mov
sio.savemat(sbxpath, mv)
I get the following error:
OverflowError: Python int too large to convert to C long
what am I doing wrong?
I'm sorry for this, because it is a beginner's problem. Python saves variables as integers or floats depending on how they are initialized, while MATLAB defaults to 8-byte doubles. My MATLAB script expected doubles, but my Python script was outputting all kinds of variable types, so naturally things got messed up.
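A minimal sketch of the corresponding fix, assuming the MATLAB side expects doubles (the small array here is a stand-in for the real image stack):
import numpy as np
import scipy.io as sio
Mov = np.zeros((10, 480, 768), dtype=np.int16)  # stand-in for the real data
# Convert to float64 so MATLAB receives the 8-byte doubles it expects.
sio.savemat('mov.mat', {'mov': Mov.astype(np.float64)})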

np.fromfile with count=-1 adds unexpected zeros

I am trying to use np.fromfile to read a binary file that I have written from Fortran using direct access. However, if I set count=-1 instead of max_items, np.fromfile returns a larger array than expected, adding zeros after the vector I wrote in binary.
Fortran test code:
program testing
  implicit none
  integer*4 :: i
  open(1, access='DIRECT', recl=20, file='mytest', form='unformatted', convert='big_endian')
  write(1, rec=1) (i, i=1,20)
  close(1)
end program
How I am using np.fromfile:
import numpy as np
f = open('mytest', 'rb')
f.seek(0)
x = np.fromfile(f, '>i4', count=20)
print len(x), x
Used like this, it returns exactly my [1,...,20] array, but setting count=-1 returns [1,...,20,0,0,0,0,0] with a size of 1600.
I am on a little-endian machine (which shouldn't affect anything), and I am compiling the Fortran code with ifort.
I am just curious about the reason this happens, to avoid any surprises in the future.
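One way to see where the extra zeros come from is to compare the file's actual size with what you expect, since count=-1 simply reads to the end of the file. A minimal sketch (mytest as above):
import os
import numpy as np
size_bytes = os.path.getsize('mytest')
print size_bytes, size_bytes // 4  # total bytes, and how many 4-byte integers that is
with open('mytest', 'rb') as f:
    x = np.fromfile(f, '>i4')  # count=-1 is the default: read everything
print len(x)
If the file is larger than the 80 bytes that 20 four-byte integers need, the extra bytes are what count=-1 reads in as zeros.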

Read a float binary file into 2D arrays in python and matlab

I have some binary input files (extension ".bin") that describe a 2D field of ocean depths, all of which are negative float numbers. I have been able to load them in MATLAB as follows:
f = fopen(filename,'r','b');
data = reshape(fread(f,'float32'),[128 64]);
This MATLAB code gives me double values between 0 and -5200. However, when I try to do the same in Python, I strangely get values between 0 and 1e-37. The Python code is:
f = open(filename, 'rb')
data = np.fromfile(f, np.float32)
data.shape = (64,128)
The strange thing is that there is a mask value of 0 for land which shows up in the right places in the (64,128) array in both cases. It seems to just be the magnitude and sign of the numpy.float32 values that are off.
What am I doing wrong in the Python code?
numpy.fromfile isn't platform independent; in particular, the byte order is mentioned in the documentation:
Do not rely on the combination of tofile and fromfile for data storage, as the binary files generated are not platform independent. In particular, no byte-order or data-type information is saved.
You could try:
data = np.fromfile(f, '>f4') # big-endian float32
and:
data = np.fromfile(f, '<f4') # little-endian float32
and check which one (big endian or little endian) gives the correct values.
Based on your MATLAB fopen, the file is big-endian ('b'), but your Python code does not take the endianness into account.
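Putting it together, a minimal sketch of the corrected read (filename as in your code):
import numpy as np
with open(filename, 'rb') as f:
    data = np.fromfile(f, '>f4')  # '>' = big-endian, matching MATLAB's fopen(...,'b')
data.shape = (64, 128)
Note that MATLAB fills its [128 64] array column by column while NumPy fills (64, 128) row by row, so the two arrays are transposes of each other; data.T matches the MATLAB layout element for element.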

struct.error: unpack requires a string argument of length 4 - audio file

I am a complete beginner in programming, and I use Ubuntu.
But now I am trying to perform sound analysis with Python.
In the following code I use the wave module to open the WAV file and struct to convert the information:
import wave
from struct import *
fp = wave.open("sound.wav", "rb")
total_num_samps = fp.getnframes()
num_fft = (total_num_samps / 512) - 2  # for an FFT length of 512
for i in range(num_fft):
    tempb = fp.readframes(512)
    tempb2 = unpack('f', tempb)
    print(tempb2)
So in the terminal, the message that appears is:
struct.error: unpack requires a string argument of length 4
Please, can someone help me solve this? Or does someone have a suggestion for another strategy to interpret the sound file?
The format string provided to struct has to tell it exactly the format of the second argument. For example, "there are one hundred and three unsigned shorts". The way you've written it, the format string says "there is exactly one float". But then you provide it a string with way more data than that, and it barfs.
So issue one is that you need to specify the exact number of packed C types in your byte string: in this case, 512 (the number of frames) times the number of channels (likely 2, but your code doesn't take this into account).
The second issue is that your .wav file simply doesn't contain floats. If it's 8-bit, it contains unsigned chars; if it's 16-bit, it contains signed shorts; etc. You can check the actual sample width of your .wav by calling fp.getsampwidth().
So then: let's assume you have 512 frames of two-channel 16-bit audio; you would write the call to struct as something like:
channels = fp.getnchannels()
...
tempb = fp.readframes(512)
tempb2 = struct.unpack('{}h'.format(512 * channels), tempb)  # 'h' = signed 16-bit short
Using SciPy, you could load the .wav file into a NumPy array using:
import scipy.io.wavfile as wavfile
sample_rate, data = wavfile.read(FILENAME)
NumPy/SciPy will also be useful for computing the FFT.
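For instance, a minimal sketch of the FFT step, assuming data comes from the wavfile.read call above:
import numpy as np
mono = data if data.ndim == 1 else data[:, 0]  # take one channel if stereo
spectrum = np.abs(np.fft.rfft(mono[:512]))  # magnitude spectrum of the first 512 samples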
Tips:
On Ubuntu, you can install NumPy/SciPy with
sudo apt-get install python-scipy
This will install NumPy as well, since NumPy is a dependency of SciPy.
Avoid using * imports such as from struct import *. This copies names from the struct namespace into the current module's global namespace. Although it saves you a bit of typing, you pay an awful price later when the script becomes more complex and you lose track of where variables are coming from (or worse, the imported names mask the values of other variables with the same name).
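A tiny illustration of that masking problem, using struct's own pack function:
from struct import *
pack = "backpack"  # oops: this silently replaces struct's pack() function
# pack('f', 1.0)   # would now fail: 'str' object is not callable
import struct      # preferred: names stay in their own namespace
struct.pack('f', 1.0)  # always unambiguous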

It appears I've run out of 32-bit address space. What are my options?

I'm trying to take the covariance of a large matrix using numpy.cov. I get the following error:
Python(22498,0xa02e3720) malloc: *** mmap(size=1340379136) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Process Python bus error
It seems that this is not uncommon for 32-bit machines/builds (I have a 64-bit Mac OS X 10.5 machine, but I am using 32-bit Python and numpy builds, as I had trouble building numpy+scipy+matplotlib on a 64-bit installation).
So at this point, what would be the recommended course of action that will allow me to proceed with the analysis, short of switching machines (no others are available to me at the moment)? Export to Fortran/C? Is there a simple(r) solution? Thanks for your suggestions.
If I were in your place, I would try to pickle (save) the matrix to the hard drive, close Python, then from the command line reopen the pickled file and do the computation in a fresh Python instance.
I would do that because your problem may be occurring before the covariance computation even starts.
import cPickle
import os.path
import numpy
M = numpy.array([[1, 2], [3, 4]])  # here it will be your matrix
path = os.path.expanduser("~/M.pic")  # "~" must be expanded explicitly
cPickle.dump(M, open(path, "wb"))  # this is where you pickle the matrix
Here you close Python. Your file should be saved in your home directory as "M.pic". Then:
import cPickle
import os.path
import numpy
M = cPickle.load(open(os.path.expanduser("~/M.pic"), "rb"))
C = numpy.cov(M)  # the covariance computation, on a fresh Python instance
If it still does not work, try setting a "good" dtype for your data. numpy seems to use dtype 'float64' or 'int64' by default. These are huge, and if you do not need that precision you might want to reduce it to 'int32' or 'float32':
import numpy
M = numpy.array([[1, 2], [3, 4]], dtype=numpy.float32)
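For scale, halving the itemsize halves the buffer; a quick check on a hypothetical 1000x1000 matrix:
import numpy
M64 = numpy.zeros((1000, 1000))  # float64 by default
M32 = numpy.zeros((1000, 1000), dtype=numpy.float32)
print M64.nbytes, M32.nbytes  # 8000000 vs 4000000 bytes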
Indeed, I can assure you that C/Fortran is not the answer here: numpy is already written in C/Fortran, and probably by people cleverer than you and me ;)
Out of curiosity: how big is your matrix? How big is the pickled file?
