Reading wav files - Native int16 values not doubles - python

Is it possible to extract the same values as Python does in their library?
For example, I'm using C++ and have managed to extract values of type double from a .wav file, however, when working in Python the values are in int16 format. Here is what I have so far:
uint16_t c;
for (unsigned i = 0; i < size; i++)
{
    c = (unsigned)(data[i]);
    rawSignal.push_back(c);
}
This does not work, because when reading the wav file in using Python:
import wave
import numpy as np
w = wave.open(wavefile, "rb")
p = w.getparams()
s = w.readframes(p[3])  # p[3] is the number of frames
w.close()
sd = np.frombuffer(s, dtype=np.int16)
I get the following graph displayed:
However using the method above, I get the following:
NOTE: When using double values, it displays correctly. So is there an effective way to convert the double into the 'uint16_t' format?

Related

Using Python to write a mix of integer and floating point numbers to a binary file read by a code in C

I have code in C which reads data from a file in a binary format:
FILE *file;
int int_var;
double double_var;
file = fopen("file.dat", "r");
fread(&int_var, sizeof(int), 1, file);
fread(&double_var, sizeof(double), 1, file);
The above is a simplified but accurate version of the actual code. I have no choice over this code or the format of this file.
The data being read in C is produced using Python code. How do I write this data to a file in the same binary format? I looked into bytes and bytearrays, but they seem to only work with integers and strings. I need something like:
f = open('file.dat', 'wb')
f.write(5)
f.write(5.0)
f.close()
that will work with the above C code.
As mentioned in a comment, you need the struct library:
Creating file.dat with
#!/usr/bin/env python3
import struct
with open('file.dat', 'wb') as f:
    f.write(struct.pack('=id', 1, 5.0))
and then reading it with
#include <stdio.h>
int main(void) {
    int int_var;
    double double_var;
    FILE *file = fopen("file.dat", "rb");
    if (!file) {
        fprintf(stderr, "couldn't open file.dat!\n");
        return 1;
    }
    if (fread(&int_var, sizeof(int), 1, file) != 1) {
        fprintf(stderr, "failed to read int!\n");
        return 1;
    }
    if (fread(&double_var, sizeof(double), 1, file) != 1) {
        fprintf(stderr, "failed to read double!\n");
        return 1;
    }
    printf("int = %d\ndouble = %f\n", int_var, double_var);
    fclose(file);
    return 0;
}
will output
int = 1
double = 5.000000
Note the = in the pack format definition; that tells python not to add alignment padding bytes like you'd get in a C structure like
struct foo {
    int int_var;
    double double_var;
};
Without that, you'll get unexpected results reading the double in this example. You also have to worry a little bit about endianness if you want the file to be portably read on any other computer.
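The effect of the = prefix can be checked with struct.calcsize. This is a minimal sketch: the native-mode size depends on the platform's alignment rules, so only the standard-size results are stated as facts here.

```python
import struct

# Standard sizes, no alignment: 4-byte int followed by 8-byte double.
packed = struct.pack('=id', 1, 5.0)
print(len(packed), struct.calcsize('=id'))   # 12 12

# Native mode ('@', the default) may insert padding before the double,
# just as a C compiler would inside a struct (commonly 16 bytes in total).
print(struct.calcsize('@id'))

# An explicit byte order makes the file portable between machines.
little = struct.pack('<id', 1, 5.0)
big = struct.pack('>id', 1, 5.0)
print(little[:4], big[:4])
```

If the C reader runs on a machine with a different byte order, '<' or '>' would have to match that machine rather than the one running Python.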

How to read struct.pack encoded string in Objective-C?

I have a Python server that sends data to the client as bytes using the struct.pack function. The data is constructed as struct.pack("!bhhh", 0x1, x, y, z).
How do I read back all the arguments on the client side in Objective-C?
I use the following code right now:
NSString *command = [[NSString alloc] initWithBytes:buffer length:len encoding:NSASCIIStringEncoding];
and get a result as
ÿÿÿø
Since you are using the ! prefix, the data is written in Big Endian. This prevents you from simply type-casting your input data to the appropriate type.
Therefore, you need to calculate the Words (your three h values) using bitwise shifting. You can easily create a macro to simplify this:
// Read an unsigned byte (B) from buf at pos
#define GET_B(buf, pos) ((uint8_t)(buf)[(pos)])
// Read a signed big-endian word (h) from buf at pos
#define GET_h(buf, pos) ((int16_t)(((uint8_t)(buf)[(pos)] << 8) | (uint8_t)(buf)[(pos) + 1]))
// Read an unsigned big-endian word (H) from buf at pos
#define GET_H(buf, pos) ((uint16_t)(((uint8_t)(buf)[(pos)] << 8) | (uint8_t)(buf)[(pos) + 1]))
Using this with some example input looks like this:
// created with struct.pack('!bhhh', 1, -200, -300, -400)
unsigned char input[] = {0x01, 0xff, '8', 0xfe, 0xd4, 0xfe, 'p'};
NSLog(@"a: %d", GET_B(input, 0));
NSLog(@"x: %d", GET_h(input, 1));
NSLog(@"y: %d", GET_h(input, 3));
NSLog(@"z: %d", GET_h(input, 5));
Please be aware how many bytes the different data types occupy. b is just one byte, but h is two. Therefore the offsets are 0, 1, 3 and 5.
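The byte layout and offsets can be checked from the Python side. The sketch below packs the same example values and decodes the three words by shifting, the way the macros are meant to; be_int16 is a hypothetical helper written for this illustration, not part of any library.

```python
import struct

# Same payload as the example input: one signed byte, three signed shorts,
# network (big-endian) byte order, no padding.
payload = struct.pack('!bhhh', 1, -200, -300, -400)
print(payload)  # b'\x01\xff8\xfe\xd4\xfep'

def be_int16(buf, pos):
    """Decode a signed big-endian 16-bit value at pos by shifting."""
    value = (buf[pos] << 8) | buf[pos + 1]
    return value - 0x10000 if value & 0x8000 else value

print([be_int16(payload, pos) for pos in (1, 3, 5)])  # [-200, -300, -400]
```

The high byte comes first and is shifted left by 8; the sign is applied only after both bytes have been combined.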
This isn't string data. Don't use NSString.
You need to define a structure with an equivalent layout. This is a bit tricky, because the structure you're using is oddly aligned:
struct __attribute__((__packed__)) {
    uint8_t b;
    uint16_t h1, h2, h3;
} *bhhh = (void *) buffer;
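You can now refer to the contents of the fields as bhhh->b, bhhh->h1, etc. Because the data was packed with !, the 16-bit fields arrive in network (big-endian) byte order, so on a little-endian host convert each one with ntohs() and cast the result to int16_t to recover the signed values.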
Note that the names I'm using are all totally bogus, because I have no idea what the data represents. Don't copy them verbatim.

reading struct in python from created struct in c

I am very new at using Python and very rusty with C, so I apologize in advance for how dumb and/or lost I sound.
I have a function in C that creates a .dat file containing data. I am opening the file using Python to read it. One of the things I need to read is a struct that was created in the C function and written in binary. In my Python code I am at the appropriate line of the file to read in the struct. I have tried unpacking the struct both item by item and as a whole, without success. Most of the items in the struct were declared 'real' in the C code. I am working on this code with someone else; the main source code is his, and he declared the variables as 'real'. I need to put this in a loop because I want to read all of the files in the directory that end in '.dat'. To start the loop I have:
for files in os.listdir(path):
    if files.endswith(".dat"):
        part = open(path + files, "rb")
        for line in part:
Inside that loop I read all of the lines before the one containing the struct. When I get to that line I have:
part_struct = part.readline()
r = struct.unpack('<d8', part_struct[0])
I'm trying to just read the first thing stored in the struct. I saw an example of this somewhere on here. And when I try this I'm getting an error that reads:
struct.error: repeat count given without format specifier
I will take any and all tips someone can give me. I have been stuck on this for a few days and have tried many different things. To be honest, I think I don't understand the struct module but I've read as much as I could on it.
Thanks!
You could use ctypes.Structure or struct.Struct to specify the format of the file. To read structures from the file produced by the C code in @perreal's answer:
"""
struct { double v; int t; char c;};
"""
from ctypes import *
class YourStruct(Structure):
    _fields_ = [('v', c_double),
                ('t', c_int),
                ('c', c_char)]
with open('c_structs.bin', 'rb') as file:
    result = []
    x = YourStruct()
    while file.readinto(x) == sizeof(x):
        result.append((x.v, x.t, x.c))
print(result)
# -> [(12.100000381469727, 17, 's'), (12.100000381469727, 17, 's'), ...]
See io.BufferedIOBase.readinto(). It is supported in Python 3 but it is undocumented in Python 2.7 for a default file object.
struct.Struct requires the trailing padding bytes (x) to be specified explicitly:
"""
struct { double v; int t; char c;};
"""
from struct import Struct
x = Struct('dicxxx')
with open('c_structs.bin', 'rb') as file:
    result = []
    while True:
        buf = file.read(x.size)
        if len(buf) != x.size:
            break
        result.append(x.unpack_from(buf))
print(result)
It produces the same output.
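The padding can be checked without a data file. This sketch compares a few format strings against the size ctypes reports for the same layout; Record is a name chosen for the illustration, and the exact numbers depend on the platform's alignment rules (13 and 16 are typical).

```python
import ctypes
import struct

class Record(ctypes.Structure):
    # Mirrors: struct { double v; int t; char c; };
    _fields_ = [('v', ctypes.c_double),
                ('t', ctypes.c_int),
                ('c', ctypes.c_char)]

print(struct.calcsize('dic'))      # fields only, no trailing padding
print(struct.calcsize('dicxxx'))   # three explicit pad bytes
print(struct.calcsize('dic0d'))    # '0d' aligns the end to a double boundary
print(ctypes.sizeof(Record))       # what sizeof would report in C
```

Ending the format with a zero-count code such as '0d' lets the struct module work out the trailing padding itself, which avoids hard-coding the number of x bytes.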
To avoid unnecessary copying Array.from_buffer(mmap_file) could be used to get an array of structs from a file:
import mmap # Unix, Windows
from contextlib import closing
with open('c_structs.bin', 'rb') as file:
    with closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_COPY)) as mm:
        result = (YourStruct * 3).from_buffer(mm) # without copying
        print("\n".join(map("{0.v} {0.t} {0.c}".format, result)))
Some C code:
#include <stdio.h>
typedef struct { double v; int t; char c;} save_type;
int main() {
    save_type s = { 12.1f, 17, 's'};
    /* "wb" so that no newline translation happens on Windows */
    FILE *f = fopen("output", "wb");
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fclose(f);
    return 0;
}
Some Python code:
import struct
with open('output', 'rb') as f:
    chunk = f.read(16)
    while chunk:
        print(struct.unpack('dicccc', chunk))
        chunk = f.read(16)
Output:
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
There is also the padding issue: the padded size of save_type is 16, so we read 3 extra bytes per record and ignore them.
A number in the format specifier means a repeat count, but it has to go before the letter, like '<8d'. However you said you just want to read one element of the struct. I guess you just want '<d'. I guess you are trying to specify the number of bytes to read as 8, but you don't need to do that. d assumes that.
I also noticed you are using readline. That seems wrong for reading binary data. It will read until the next carriage return / line feed, which will occur randomly in binary data. What you want to do is use read(size), like this:
part_struct = part.read(8)
r = struct.unpack('<d', part_struct)
Actually, you should be careful, as read can return less data than you request. You need to repeat it if it does.
part_struct = b''
while len(part_struct) < 8:
    data = part.read(8 - len(part_struct))
    if not data:
        raise EOFError("unexpected end of file")
    part_struct += data
r = struct.unpack('<d', part_struct)
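The loop above can be wrapped in a small helper and tried without a real file. This is a sketch only: read_exact is a hypothetical name, and an in-memory io.BytesIO stands in for the .dat file, which is assumed here to hold little-endian doubles.

```python
import io
import struct

def read_exact(stream, size):
    """Read exactly `size` bytes or raise EOFError (hypothetical helper)."""
    buf = b''
    while len(buf) < size:
        data = stream.read(size - len(buf))
        if not data:
            raise EOFError("unexpected end of file")
        buf += data
    return buf

# In-memory stand-in for the .dat file: eight little-endian doubles.
stream = io.BytesIO(struct.pack('<8d', *range(8)))

# One double: no repeat count needed, 'd' is always 8 bytes with '<'.
first = struct.unpack('<d', read_exact(stream, 8))[0]
# Seven more: the repeat count goes before the letter.
rest = struct.unpack('<7d', read_exact(stream, 7 * 8))
print(first, rest)
```

struct.unpack always returns a tuple, which is why the single value is taken with [0].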
I had the same problem recently, so I wrote a module for the task, stored here: http://pastebin.com/XJyZMyHX
example code:
MY_STRUCT="""typedef struct __attribute__ ((__packed__)){
uint8_t u8;
uint16_t u16;
uint32_t u32;
uint64_t u64;
int8_t i8;
int16_t i16;
int32_t i32;
int64_t i64;
long long int lli;
float flt;
double dbl;
char string[12];
uint64_t array[5];
} debugInfo;"""
PACKED_STRUCT='\x01\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\x00\xff\x00\x00\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff*\x00\x00\x00\x00\x00\x00\x00ff\x06@\x14\xaeG\xe1z\x14\x08@testString\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00'
if __name__ == '__main__':
    print "String:"
    print depack_bytearray_to_str(PACKED_STRUCT,MY_STRUCT,"<" )
    print "Bytes in Stuct:"+str(structSize(MY_STRUCT))
    nt=depack_bytearray_to_namedtuple(PACKED_STRUCT,MY_STRUCT,"<" )
    print "Named tuple nt:"
    print nt
    print "nt.string="+nt.string
The result should be:
String:
u8:1
u16:256
u32:65536
u64:4294967296
i8:-1
i16:-256
i32:-65536
i64:-4294967296
lli:42
flt:2.09999990463
dbl:3.01
string:u'testString\x00\x00'
array:(1, 2, 3, 4, 5)
Bytes in Stuct:102
Named tuple nt:
CStruct(u8=1, u16=256, u32=65536, u64=4294967296L, i8=-1, i16=-256, i32=-65536, i64=-4294967296L, lli=42, flt=2.0999999046325684, dbl=3.01, string="u'testString\\x00\\x00'", array=(1, 2, 3, 4, 5))
nt.string=u'testString\x00\x00'
Numpy can be used to read/write binary data. You just need to define a custom np.dtype instance that defines the memory layout of your c-struct.
For example, here is some C++ code defining a struct (should work just as well for C structs, though I'm not a C expert):
#include <cstdint>
#include <fstream>
#include <string>
struct MyStruct {
    uint16_t FieldA;
    uint16_t pad16[3];
    uint32_t FieldB;
    uint32_t pad32[2];
    char FieldC[4];
    uint64_t FieldD;
    uint64_t FieldE;
};
void write_struct(const std::string& fname, MyStruct h) {
    // This function serializes a MyStruct instance and
    // writes the binary data to disk.
    std::ofstream ofp(fname, std::ios::out | std::ios::binary);
    ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));
}
Based on the advice I found at stackoverflow.com/a/5397638, I've included explicit padding in the struct (the pad16 and pad32 fields) so that every field sits at a known offset and the serialized layout is predictable. Compilers insert alignment padding in plain C structs as well, so the same precaution applies there.
Now, in python, we create a numpy.dtype object describing the memory-layout of MyStruct:
import numpy as np
my_struct_dtype = np.dtype([
    ("FieldA", np.uint16),
    ("pad16", np.uint16, (3,)),
    ("FieldB", np.uint32),
    ("pad32", np.uint32, (2,)),
    ("FieldC", np.byte, (4,)),
    ("FieldD", np.uint64),
    ("FieldE", np.uint64),
])
Then use numpy's fromfile to read the binary file where you've saved your c-struct:
# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]
FieldA = struct_data["FieldA"]
FieldB = struct_data["FieldB"]
FieldC = struct_data["FieldC"]
FieldD = struct_data["FieldD"]
FieldE = struct_data["FieldE"]
if FieldA != expected_value_A:
    raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
    raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
    raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...
The count=1 argument in the above call np.fromfile(..., count=1) is so that the returned array will have only one element; this means "read the first struct instance from the file". Note that I am indexing [0] to get that element out of the array.
If you have appended the data from many c-structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). Setting count=-1, which is the default for the np.fromfile and np.frombuffer functions, means "read all of the data", resulting in a 1-dimensional array of shape (number_of_struct_instances,).
You can also use the offset keyword argument to np.fromfile to control where in the file the data read will begin.
To conclude, here are some numpy functions that will be useful once your custom dtype has been defined:
Reading binary data as a numpy array:
np.frombuffer(bytes_data, dtype=...): Interpret the given binary data (e.g. a python bytes instance) as a numpy array of the given dtype. You can define a custom dtype that describes the memory layout of your c struct.
np.fromfile(filename, dtype=...): Read binary data from filename. Should be the same result as np.frombuffer(open(filename, "rb").read(), dtype=...).
Writing a numpy array as binary data:
ndarray.tobytes(): Construct a python bytes instance containing raw data from the given numpy array. If the array's data has a dtype corresponding to a c-struct, then the bytes coming from ndarray.tobytes can be deserialized by c/c++ and interpreted as an (array of) instances of that c-struct.
ndarray.tofile(filename): Binary data from the array is written to filename. This data could then be deserialized by c/c++. Equivalent to open(filename, "wb").write(a.tobytes()).

Python Fast Input Output Using Buffer Competitive Programming

I have seen people using buffer in different languages for fast input/output in Online Judges. For example this http://www.spoj.pl/problems/INTEST/ is done with C like this:
#include <stdio.h>
#define size 50000
int main (void){
    unsigned int n=0,k,t;
    char buff[size];
    unsigned int divisible=0;
    int block_read=0;
    int j;
    t=0;
    /* %u matches unsigned int; %lu would expect unsigned long */
    scanf("%u %u\n",&t,&k);
    while(t){
        block_read =fread(buff,1,size,stdin);
        for(j=0;j<block_read;j++){
            if(buff[j]=='\n'){
                t--;
                if(n%k==0){
                    divisible++;
                }
                n=0;
            }
            else{
                n = n*10 + (buff[j] - '0');
            }
        }
    }
    printf("%u",divisible);
    return 0;
}
How can this be done with python?
import sys
file = sys.stdin
size = 50000
t = 0
while t != 0:
    block_read = file.read(size)
    ...
    ...
Most probably this will not increase performance though: Python is an interpreted language, so you basically want to spend as much time in native code (standard library input/parsing routines in this case) as possible.
TL;DR either use built-in routines to parse integers or get some sort of 3rd party library which is optimized for speed.
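As an illustration of the "use built-in routines" advice, here is a minimal sketch that reads the whole input at once and leaves tokenizing and integer parsing to bytes.split() and int(). The function names are made up for this example, and whether this is fast enough for a particular judge is not something the sketch can promise.

```python
import sys

def count_divisible(data):
    """Count the values after the 'n k' header that are divisible by k."""
    tokens = data.split()          # splits on any whitespace in native code
    if len(tokens) < 2:
        return 0
    n, k = int(tokens[0]), int(tokens[1])
    return sum(1 for tok in tokens[2:2 + n] if int(tok) % k == 0)

def main():
    # On a judge: read all of stdin as bytes in one call, then parse.
    print(count_divisible(sys.stdin.buffer.read()))

# Small in-memory check instead of real stdin.
sample = b"7 3\n1\n51\n966369\n7\n9\n999996\n11\n"
print(count_divisible(sample))  # 4
```

Reading bytes through sys.stdin.buffer skips text decoding, and int() accepts ASCII digit bytes directly.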
I tried solving this one in Python 3 and couldn't get it to work no matter how I tried reading the input. I then switched to running it under Python 2.5 so I could use
import psyco
psyco.full()
After making that change I was able to get it to work by simply reading input from sys.stdin one line at a time in a for loop. I read the first line using raw_input() and parsed the values of n and k, then used the following loop to read the remainder of the input.
for line in sys.stdin:
    count += not int(line) % k

export matlab variable to text for python usage

So let's start off by saying I'm a total beginner in matlab. I'm working with python and now I've received some data in a matlab file that I need to export to a format I could use with python.
I've googled around and found I can export a matlab variable to a text file using:
dlmwrite('my_text', MyVariable, 'delimiter' , ',');
The variable I need to export is a 16000 x 4000 matrix of doubles of the form 0.006747668446927, and this is where the problem starts: I need to export the full value of each double. Trying with that function led me to export the numbers in a format like 0.0067477. This won't do, since I need a lot more precision for what I'm doing. So how can I export the full values of each of these numbers? Or if you have a more elegant way of using that huge matlab matrix in python, please feel free to suggest it.
Regards,
Bogdan
To exchange big chunks of numerical data between Python and Matlab I recommend HDF5: http://en.wikipedia.org/wiki/Hierarchical_Data_Format
The Python binding is called h5py: http://code.google.com/p/h5py
Here are two examples for both directions. First from Matlab to Python:
% matlab
points = [1 2 3 ; 4 5 6 ; 7 8 9 ; 10 11 12 ];
hdf5write('test.h5', '/Points', points);
# python
import h5py
with h5py.File('test.h5', 'r') as f:
    points = f['/Points'][()]  # read the whole dataset (.value was removed in h5py 3.0)
And now from Python to Matlab
# python
import h5py
import numpy
points = numpy.array([ [1., 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12] ])
with h5py.File('test.h5', 'w') as f:
    f['/Points'] = points
% matlab
points = hdf5read('test.h5', '/Points');
NOTE: A column in Matlab will come out as a row in Python and vice versa. This isn't a bug but the difference between the way C and Fortran interpret a contiguous piece of data in memory.
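The row/column swap can be illustrated without HDF5 at all. The sketch below uses plain Python lists and is a simplified model of the memory layout, not a description of what hdf5write or h5py do internally.

```python
# The 4x3 matrix from the Matlab example.
points = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
n_rows, n_cols = 4, 3

# Matlab (Fortran order) lays the matrix out column by column.
flat = [points[r][c] for c in range(n_cols) for r in range(n_rows)]
print(flat)  # [1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12]

# A C-order reader chops the same buffer into rows, so each Matlab
# column comes out as a row: the result is the transpose.
as_c_rows = [flat[i:i + n_rows] for i in range(0, len(flat), n_rows)]
print(as_c_rows)  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```

Transposing once on the Python side restores the orientation the matrix had in Matlab.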
Scipy has tools for reading MATLAB .mat files natively: see e.g. http://www.janeriksolem.net/2009/05/reading-and-writing-mat-files-with.html
While I like the hdf5 based answer, I still think text files and CSVs are nice for smaller things (you can open them in text editors, spreadsheets, whatever). In that case I would use MATLAB's fopen/fprintf/fclose rather than dlmwrite, because I like to make things explicit. Then again, dlmwrite might be better for multi-dimensional arrays.
You can simply write your variable to file as binary data, then read it in any language you want, be it MATLAB, Python, C, etc. Example:
MATLAB (write)
X = rand([100 1],'single');
fid = fopen('file.bin', 'wb');
count = fwrite(fid, X, 'single');
fclose(fid);
MATLAB (read)
fid = fopen('file.bin', 'rb');
data = fread(fid, Inf, 'single=>single');
fclose(fid);
Python (read)
import struct
data = []
with open("file.bin", "rb") as f:
    # read 4 bytes at a time (one single-precision float)
    chunk = f.read(4)  # returns a bytes object
    while len(chunk) == 4:
        # 4-byte sequence to float (native byte order)
        num = struct.unpack('f', chunk)[0]
        # append to list
        data.append(num)
        # read next 4 bytes
        chunk = f.read(4)
# print list
print(data)
C (read)
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    FILE *fp = fopen("file.bin", "rb");
    if (fp == NULL) {
        perror("file.bin");
        return 1;
    }
    // Determine size of file
    fseek(fp, 0, SEEK_END);
    long int lsize = ftell(fp);
    rewind(fp);
    // Allocate memory, and read file
    float *numbers = (float*) malloc(lsize);
    size_t count = fread(numbers, 1, lsize, fp);
    fclose(fp);
    // print data (count is the number of bytes actually read)
    size_t i;
    size_t numFloats = count / sizeof(float);
    for (i = 0; i < numFloats; i += 1) {
        printf("%g\n", numbers[i]);
    }
    free(numbers);
    return 0;
}
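For larger files, the Python reader can load everything in one call instead of 4 bytes at a time. This is a sketch using the standard array module; read_floats is a name chosen for the example, and it assumes the file holds native-order 32-bit floats as in the MATLAB writer above.

```python
import array
import os
import struct
import tempfile

def read_floats(path):
    """Read a file of native-order 32-bit floats in a single call."""
    values = array.array('f')
    with open(path, 'rb') as f:
        # frombytes raises ValueError if the length is not a multiple of 4
        values.frombytes(f.read())
    return values.tolist()

# Self-contained demo: write three floats, then read them back.
path = os.path.join(tempfile.mkdtemp(), 'file.bin')
with open(path, 'wb') as f:
    f.write(struct.pack('3f', 0.5, 1.5, -2.0))
print(read_floats(path))  # [0.5, 1.5, -2.0]
```

If the file was written on a machine with a different byte order, array.byteswap() can be applied before converting to a list.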
