Read 16bit byte array into integers without truncation - python

I've got an I2S microphone connected to a microcontroller and have managed to dump 16-bit WAV audio to a Python bytearray object that looks like this (using a MicroPython library):
raw = bytearray(b"\xac\xffW\x00\xfc\xfe\xac\xffs\xfe\xfc\xfe+\xfes\xfe7\xfe+\xfe\x8c\xfe7\xfe\x1f\xff\x8c\xfe\xcf\xff\x1f\xfft\x00\xcf\xff\xfb\x00t\x00?\x01\xfb...")
I have successfully written these bytearray dumps to a file I created like this:
wav = open('16bitaudio.wav','wb')
#....some code to write the wav header
wav.write(raw)
wav.close()
When I open this on my PC it plays the samples I've recorded faithfully, sounds great.
My issue: I want to turn this data into an integer that represents the average intensity of sound in my samples. I first attempted this:
intensity = sum(raw)/len(raw)
However, this results in a number around 128 almost all the time, which suggests to me that it's being read as unrelated single bytes. On further investigation, these array functions seem to assume that each element is a single 8-bit byte (I expected to read the value b'\xffW', which I believe is a little-endian 22527):
>>> int(raw[1])
255
which appears to be just the b'\xff' part.
I can get my expected value by passing just those two bytes manually into int.from_bytes:
>>> int.from_bytes(b'\xffW','little')
22527
However I can't seem to iterate through the bytearray without it truncating to 8-bit.
Finally, I have read about the struct.unpack methods, which look OK, but I'm not sure bytearrays get packed with bytes of consistent length, e.g.:
>>> len(bytearray(b'\xfdo\xfe\x7f\xfd\xd3\xf1d'))
8
Even though I see only 6 bytes represented. The ultimate problem I have with unpacking is that I don't know ahead of time whether each byte is 8 or 16 bit, so I don't know what combination of format characters to use in the format string...
So, given the b-string representation, it seems that Python DOES know how the bytes are encoded; however, the normal array functions I have on hand don't seem to get this info from the bytearray. I'm sure there is a Pythonic way to parse this bytearray into integers, but I just can't find it...
Any help is greatly appreciated.

Thanks @juanpa.arrivillaga for the answer I was looking for. I used the array module, which solved all of my problems:
import array
result = array.array('h', raw)
Graphing these values gives the same waveform as the oscilloscope view of my audio file. Cheers!
[Images: plot of the decoded integers; oscilloscope view of the working .wav]
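A minimal sketch of the array approach applied to the original goal, assuming the data is little-endian signed 16-bit PCM (the usual WAV sample format) and using the first 8 bytes of the dump above as sample data. Signed audio samples swing around zero, so a plain mean of them tends toward zero; this sketch therefore computes mean absolute amplitude and RMS as two common intensity measures (my choice, not from the original). pcm16_to_ints and intensity are hypothetical helper names, and the byteswap step assumes CPython's array module.

```python
import array
import math
import struct
import sys

def pcm16_to_ints(raw) -> list:
    """Decode little-endian signed 16-bit PCM into a list of Python ints."""
    samples = array.array("h", raw)   # 'h' = signed short, native byte order
    if sys.byteorder == "big":
        samples.byteswap()            # WAV sample data is little-endian
    return samples.tolist()

def intensity(samples):
    """Return (mean absolute amplitude, RMS) of the samples."""
    mean_abs = sum(abs(s) for s in samples) / len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return mean_abs, rms

raw = bytearray(b"\xac\xffW\x00\xfc\xfe\xac\xff")   # first 8 bytes of the dump above
samples = pcm16_to_ints(raw)
print(samples)              # [-84, 87, -260, -84]
print(intensity(samples))

# struct gives the same values, with the byte order spelled out explicitly
assert samples == list(struct.unpack("<%dh" % (len(raw) // 2), raw))
```

Note that each sample is the pair (low byte, high byte) starting at an even offset, so b'\xac\xff' is the first sample (-84), not b'\xffW'.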

Related

Python Modbus String decoding issue

I have an issue with the pymodbus decoder with strings. For example, when I try to read 'abcdefg' pymodbus gives me 'badcfehg'. The byteorder and the wordorder don't change the result.
Here is my code:
result=client.read_holding_registers(25000,4)
decoder = BinaryPayloadDecoder.fromRegisters(result.registers,byteorder=Endian.Little,wordorder=Endian.Big)
decoder.decode_string(8)
Can someone explain why the order does not change the result? I tried with the builder and it's the same problem. However, I don't have this problem with 32-bit floats, for example.
I also tried with an older version of pymodbus and it works:
decoder = BinaryPayloadDecoder.fromRegisters(registers,endian=Endian.Little)
Note: I have already read the topic "pymodbus: Issue reading String & multiple type of data from Modbus device", but I don't have any access to the Modbus server.
The problem is that the Modbus spec does not define in what order the two bytes of a char string are sent, or even in what order the 16-bit words of a 32-bit type are sent.
As a result, some Modbus devices send bytes or words in one order and others in the opposite order.
If you are writing a Modbus client, you should add a configuration option to invert both the order of the two bytes and the order of the 16-bit words in 32-bit data types.
https://en.wikipedia.org/wiki/Endianness
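A sketch of what such an option could look like for a 32-bit float, assuming the two registers arrive as plain 16-bit ints (as in result.registers). decode_float32, swap16 and the swap_bytes/swap_words flags are hypothetical names; the point is that byte order and word order are two independent switches.

```python
import struct

def swap16(reg: int) -> int:
    """Swap the two bytes of one 16-bit register."""
    return ((reg & 0xFF) << 8) | (reg >> 8)

def decode_float32(registers, swap_bytes=False, swap_words=False):
    """Decode two 16-bit Modbus registers as an IEEE-754 single-precision float."""
    hi, lo = registers
    if swap_words:                 # device sends the low word first
        hi, lo = lo, hi
    if swap_bytes:                 # device sends each word low byte first
        hi, lo = swap16(hi), swap16(lo)
    return struct.unpack(">f", struct.pack(">2H", hi, lo))[0]

# 1.0 is 0x3F800000 in IEEE-754; three ways a device might send it
print(decode_float32([0x3F80, 0x0000]))                    # 1.0
print(decode_float32([0x0000, 0x3F80], swap_words=True))   # 1.0
print(decode_float32([0x803F, 0x0000], swap_bytes=True))   # 1.0
```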
I've encountered this same issue with byteorder not working (wordorder seems to be fine). The solution I came up with is to use struct:
import struct
Then:
count = 4  # Read 4 16-bit registers
result = client.read_holding_registers(25000, count)
for i in range(count):
    # Swap the two bytes of each register
    result.registers[i] = struct.unpack("<H", struct.pack(">H", result.registers[i]))[0]
decoder = BinaryPayloadDecoder.fromRegisters(result.registers)
print(decoder.decode_string(7))  # Since the string is 7 characters long
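This uses struct to pack each register as a big-endian unsigned short and unpack it as a little-endian one, which swaps its two bytes. Which order you pack with does not matter, as long as you unpack with the opposite one, since all you're doing is swapping bytes. The result overwrites the registers, so you can then use BinaryPayloadDecoder as you normally would.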
I would have preferred to iterate through the responses instead of using range(count), but couldn't find a way to do it and wanted to post this workaround. If I figure it out, I will let you know.
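One way to iterate over the registers directly, sketched below, is a list comprehension. The sketch also shows that for strings you can skip BinaryPayloadDecoder entirely by packing the registers little-endian with struct, assuming the device stores the first character of each pair in the low byte (as the 'badcfehg' result suggests). The register values are made up for illustration, and swap_register_bytes and registers_to_string are hypothetical helper names.

```python
import struct

def swap_register_bytes(registers):
    """Swap the two bytes of every 16-bit register, iterating over the list directly."""
    return [struct.unpack("<H", struct.pack(">H", reg))[0] for reg in registers]

def registers_to_string(registers, byte_swapped=True):
    """Pack the registers into bytes and decode them as an ASCII string."""
    fmt = ("<%dH" if byte_swapped else ">%dH") % len(registers)
    return struct.pack(fmt, *registers).rstrip(b"\x00").decode("ascii")

# Hypothetical registers from a device that stores "abcdefgh" low byte first
regs = [0x6261, 0x6463, 0x6665, 0x6867]
print(registers_to_string(regs, byte_swapped=False))   # badcfehg
print(registers_to_string(regs))                       # abcdefgh
print([hex(r) for r in swap_register_bytes(regs)])     # ['0x6162', '0x6364', '0x6566', '0x6768']
```

With the comprehension, the loop becomes result.registers = swap_register_bytes(result.registers).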

read a .h264 file

I'd appreciate some help with a problem I have.
Goal: read a .h264 file (I extracted the raw bitstream to a file using ffmpeg) with Python and save it in some data structure (probably a list; suggestions are welcome).
I want to read the data as hex; here is an example of what the data looks like:
What I want is to feed each byte (2 hex digits) into a list or some other data structure.
But any step forward will help me.
My Attempts:
First I tried to read it the way I know:
with open(path, 'r') as fp:
    data = fp.read()
Didn't work; I got just ''.
After a lot of changes, I tried something else I saw online:
with open(path, 'r') as fp:
    hex_list = ["{:02}".format(ord(c)) for c in fp.read()]
Still got an empty list.
I'd appreciate your help.
Thanks a lot.
EDIT:
Thanks to the comment below, I tried opening the file with 'rb', but still had no luck.
If you have an h264 mp4 file, you can open it and get a hexadecimal string representation like this using binascii.hexlify():
import binascii
with open('test.mp4', 'rb') as fin:
    hexa = binascii.hexlify(fin.read())
print(hexa[0:1000])
hexa will be a python bytes object, and you can easily get back the binary representation by doing binascii.unhexlify(hexa). This will be much more efficient than storing the hex representation as strings in a list(), both in terms of space and time. You can access the bytes array with indices/slices, so whatever you were intending to do with the list will probably work fine with this (it will just be much faster and use a lot less memory).
One thing to keep in mind, though: to get the first hexadecimal digit from a bytes object, you don't use hexa[0] but rather hexa[0:1]. To get the first pair of hexadecimal digits (one byte), you use hexa[0:2]; the second byte is hexa[2:4], and so on. As explained in the docs for bytes objects:
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)
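For the original goal of one list entry per byte, reading in binary mode is enough; no hex conversion is needed unless you want hex strings for display. A minimal sketch, using a temporary file with a few made-up bytes (an Annex B start code 00 00 00 01 followed by two bytes) in place of a real .h264 file; read_bytes is a hypothetical helper name.

```python
import binascii
import os
import tempfile

def read_bytes(path):
    """Read a binary file (e.g. a raw .h264 bitstream) in binary mode."""
    with open(path, "rb") as fin:
        return fin.read()

# Made-up sample bytes standing in for a real bitstream
sample = b"\x00\x00\x00\x01\x67\x42"
with tempfile.NamedTemporaryFile(delete=False, suffix=".h264") as tmp:
    tmp.write(sample)
    path = tmp.name

data = read_bytes(path)
os.remove(path)

print(data[4])                          # 103: indexing a bytes object gives an int
print(data[4:5])                        # b'g': slicing gives a bytes object
print(data.hex())                       # 000000016742
hex_list = [f"{b:02x}" for b in data]   # one two-digit hex string per byte
print(hex_list)                         # ['00', '00', '00', '01', '67', '42']
assert binascii.unhexlify(binascii.hexlify(data)) == data
```

list(data) gives the byte values as ints, which is usually the more useful form for further parsing.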

Passing a sequence of bits to a file python

As a part of a bigger project, I want to save a sequence of bits in a file so that the file is as small as possible. I'm not talking about compression, I want to save the sequence as it is but using the least amount of characters. The initial idea was to turn mini-sequences of 8 bits into chars using ASCII encoding and saving those chars, but due to some unknown problem with strange characters, the characters retrieved when reading the file are not the same that were originally written. I've tried opening the file with utf-8 encoding, latin-1 but none seems to work. I'm wondering if there's any other way, maybe by turning the sequence into a hexadecimal number?
Technically you cannot write less than a byte, because the OS organizes files in bytes (see "write individual bits to a file in python"), so this is binary file I/O; see https://docs.python.org/2/library/io.html. There are modules such as struct for this.
Open the file with the 'b' flag, which indicates a binary read/write operation, then use e.g. the to_bytes() method (see "Writing bits to a binary file") or struct.pack() (see "How to write individual bits to a text file in python?"):
>>> import struct
>>> struct.pack("h", 824)
b'8\x03'
>>> bits = "10111111111111111011110"
>>> int(bits[::-1], 2).to_bytes(4, 'little')
b'\xfd\xff=\x00'
>>> with open('somefile.bin', 'wb') as f:
...     f.write(int(bits[::-1], 2).to_bytes(4, 'little'))
...
4
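Reading the bits back raises the problem of padding: 23 bits occupy 3 bytes, so the reader has to know how many of the bits are real. A minimal sketch, assuming a made-up file layout of a 4-byte big-endian bit count followed by the bits packed most-significant-bit first; bits_to_bytes and bytes_to_bits are hypothetical helpers, not a standard format.

```python
import os
import struct
import tempfile

def bits_to_bytes(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes, prefixed with its length in bits."""
    n = len(bits)
    padded = bits + "0" * (-n % 8)                 # pad to a whole number of bytes
    body = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return struct.pack(">I", n) + body             # 4-byte big-endian bit count

def bytes_to_bits(blob: bytes) -> str:
    """Inverse of bits_to_bytes: recover the original '0'/'1' string."""
    (n,) = struct.unpack(">I", blob[:4])
    body = blob[4:]
    padded = format(int.from_bytes(body, "big"), "b").zfill(len(body) * 8)
    return padded[:n]                              # drop the padding bits

bits = "10111111111111111011110"
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
with open(path, "wb") as f:                        # binary mode: no text encoding involved
    f.write(bits_to_bytes(bits))
with open(path, "rb") as f:
    blob = f.read()
os.remove(path)

print(len(blob))                                   # 7 (4-byte header + 3 data bytes)
assert bytes_to_bits(blob) == bits
```

Because the file is opened with 'wb'/'rb', no character encoding is applied, which avoids the latin-1/utf-8 mismatch described in the question.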
If you want to get around the 8-bit (byte) structure of memory, you can use bit manipulation and techniques like bitmasks and BitArrays;
see https://wiki.python.org/moin/BitManipulation and https://wiki.python.org/moin/BitArrays.
However, the problem, as you said, is reading the data back if you use BitArrays of differing lengths. For example, to store decimal 7 you need 3 bits (0b111), and to store decimal 2 you need 2 bits (0b10). Now the problem is reading this back.
How can your program know whether it has to read a value back as a 3-bit value or as a 2-bit value? In unorganized memory, the sequence 7, 2 looks like 11110, which translates to 111|10, so how can your program know where the | is?
In normal byte-ordered memory, the sequence 7, 2 is 0000011100000010 -> 00000111|00000010, which has the advantage that it is clear where the | is.
This is why memory at its lowest level is organized in fixed clusters of 8 bits = 1 byte. If you want to access single bits inside a byte (an 8-bit cluster), you can use bitmasks in combination with logical operators (http://www.learncpp.com/cpp-tutorial/3-8a-bit-flags-and-bit-masks/). In Python, the easiest way to do single-bit manipulation is the ctypes module.
If you know that your values are all 6 bits wide, it may be worth the effort; however, this is also tough...
(How do you set, clear, and toggle a single bit?)
(Why can't you do bitwise operations on pointer in C, and is there a way around this?)
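When every value has the same known width (the 6-bit case above), no separators are needed: the reader only needs the width and the count. A sketch of packing unsigned values of a fixed bit width into bytes using plain integer shifts and masks; pack_fixed_width and unpack_fixed_width are hypothetical helpers, and the most-significant-bit-first order is an arbitrary choice.

```python
def pack_fixed_width(values, width=6):
    """Pack unsigned ints of a fixed bit width into bytes, MSB first, zero-padded."""
    acc = 0
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc = (acc << width) | v              # append the value's bits
    nbits = len(values) * width
    pad = -nbits % 8                          # zero bits needed to fill the last byte
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")

def unpack_fixed_width(blob, count, width=6):
    """Recover `count` values of `width` bits from bytes made by pack_fixed_width."""
    acc = int.from_bytes(blob, "big") >> (len(blob) * 8 - count * width)
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]

values = [7, 2, 63, 0]
blob = pack_fixed_width(values)
print(len(blob))                              # 3: four 6-bit values fit in 24 bits
assert unpack_fixed_width(blob, len(values)) == values
```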

python proper way to decode raw udp packet

I'm making a script to get Valve's server information (players online, map, etc.).
The packet I get when I request information is this:
'\xff\xff\xff\xffI\x11Stargate Central CAP SBEP\x00sb_wuwgalaxy_fix\x00garrysmod\x00Spacebuild\x00\xa0\x0f\n\x0c\x00dw\x00\x0114.09.08\x00\xb1\x87i\x06\xb4g\x17.\x15#\x01gm:spacebuild3\x00\xa0\x0f\x00\x00\x00\x00\x00\x00'
This may help you to see what I'm trying to do https://developer.valvesoftware.com/wiki/Server_queries#A2S_INFO
The problem is that I don't know how to decode this properly. It's easy to get the strings, but I have no idea how to get other types like byte and short,
for example '\xa0\x0f'.
For now I'm doing multiple splits, but is there a better way of doing this?
Python has functions for encoding/decoding different data types into bytes. Take a look at the struct package, the functions struct.pack() and struct.unpack() are your friends there.
Taken from https://docs.python.org/2/library/struct.html:
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
The first argument of the unpack function defines the format of the data stored in the second argument. Now you need to translate the description given by valve to a format string. If you wanted to unpack 2 bytes and a short from a data string (that would have a length of 4 bytes, of course), you could do something like this:
(first_byte, second_byte, the_short) = unpack("<BBh", data)
You'll have to make sure yourself that you pass the correct part of the data string (and I don't know whether those numbers are signed, so be sure to take care of that, too: lowercase b/h are signed, uppercase B/H are unsigned).
The strings you'll have to handle differently (they are null-terminated here, so start where you know a string starts and read up to the first "\0" byte).
pack() works the other way around and stores data in a byte string. Take a look at the examples in the Python docs and play around with it a bit to get a feel for it (e.g. when a tuple is returned or needed).
struct also lets you pick the byte order with the first character of the format string ("<" for little-endian, ">" or "!" for big-endian/network order), which may differ from your system's native order. The byte order matters only for multi-byte integers (like short), and the byte-order character must come first in the format string. This packet is little-endian: '\xa0\x0f' unpacks to 4000 (Garry's Mod's app ID) with "<h", but to a large negative number with "!h". So a format string of "<h" should unpack a short correctly.
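Putting this together for the packet in the question: a sketch that walks the buffer with struct.unpack_from and a running offset, reading null-terminated strings with bytes.index. The field order follows my reading of the A2S_INFO section of the linked Valve page (header, protocol, name, map, folder, game, ID, players, max players, bots, server type, environment, visibility, VAC, version, extra data flag, then the port if the flag has bit 0x80 set); the Reader class is a hypothetical helper, and the remaining optional fields are left undecoded.

```python
import struct

# The response from the question, as Python 3 bytes
packet = (b"\xff\xff\xff\xffI\x11Stargate Central CAP SBEP\x00sb_wuwgalaxy_fix\x00"
          b"garrysmod\x00Spacebuild\x00\xa0\x0f\n\x0c\x00dw\x00\x0114.09.08\x00"
          b"\xb1\x87i\x06\xb4g\x17.\x15#\x01gm:spacebuild3\x00\xa0\x0f\x00\x00\x00\x00\x00\x00")

class Reader:
    """Reads values sequentially from a bytes buffer, tracking the offset."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def unpack(self, fmt: str):
        values = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return values[0] if len(values) == 1 else values

    def string(self) -> str:
        end = self.data.index(b"\x00", self.pos)    # strings are null-terminated
        text = self.data[self.pos:end].decode("utf-8", errors="replace")
        self.pos = end + 1                          # skip the terminator
        return text

r = Reader(packet)
header, kind, protocol = r.unpack("<lcB")           # -1, b'I', 17
name, map_name, folder, game = r.string(), r.string(), r.string(), r.string()
app_id, players, max_players, bots = r.unpack("<hBBB")
server_type, environment, visibility, vac = r.unpack("<ccBB")
version = r.string()
edf = r.unpack("<B")                                # extra data flag
port = r.unpack("<H") if edf & 0x80 else None

print(name, map_name, app_id, players, max_players, version, port)
```

Using "<" throughout also disables struct's native alignment padding, so calcsize matches the number of bytes actually consumed.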

Using struct.unpack() without knowing anything about the string

I need to parse a big-endian binary file and convert it to little-endian. But the people who handed the file over to me seem unable to tell me anything about what data types it contains or how it is organized; the only thing they know for certain is that it is a big-endian binary file with some old data. The function struct.unpack(), however, requires a format string as its first argument.
This is the first line of the binary file:
import binascii
path = "BC2003_lr_m32_chab_Im.ised"
with open(path, 'rb') as fd:
    line = fd.readline()
print(binascii.hexlify(line))
a0040000dd0000000000000080e2f54780f1094840c61a4800a92d48c0d9424840a05a48404d7548e09d8948a0689a48e03fad48a063c248c01bda48c0b8f448804a0949100b1a49e0d62c49e0ed41499097594900247449a0a57f4900d98549b0278c49a0c2924990ad9949a0eba049e080a8490072b049c0c2b849d077c1493096ca494022d449a021de49a099e849e08ff349500a
Is it possible to change the endianness of a file without knowing anything about it?
You cannot do this without knowing the datatypes. There is little point in attempting to do so otherwise.
Even if it were a homogeneous sequence of one datatype, you'd still need to know what you are dealing with; flipping the byte order of double values is very different from flipping it for short integers.
Take a look at the formatting characters table; anything with a different byte size in it will result in a different set of bytes being swapped; for double values, you need to reverse the order of every 8 bytes, for example.
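To see why the item size matters, here is a sketch that reverses the byte order in fixed-size chunks and applies it to the first 8 bytes of the dump in the question. The same bytes come out differently for 2-, 4- and 8-byte items, and nothing in the data itself says which one is right. swap_every is a hypothetical helper; the standard library's array.byteswap() does the same job for an array of a single typecode.

```python
def swap_every(raw: bytes, size: int) -> bytes:
    """Reverse the byte order of each consecutive `size`-byte item."""
    if len(raw) % size:
        raise ValueError("data length is not a multiple of the item size")
    return b"".join(raw[i:i + size][::-1] for i in range(0, len(raw), size))

raw = bytes.fromhex("a0040000dd000000")    # first 8 bytes of the dump in the question
for size, label in ((2, "short"), (4, "int/float"), (8, "double")):
    print(f"{label:<9} {swap_every(raw, size).hex()}")
# short     04a0000000dd0000
# int/float 000004a0000000dd
# double    000000dd000004a0
```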
If you know what data should be in the file, then at least you have a starting point; you'd have to puzzle out how those values fit into the bytes given. It'll be a puzzle, but with a target set of values you can build a map of the datatypes contained, then write a byte-order adjustment script. If you don't even have that, best not to start as the task is impossible to achieve.
