First of all, I just started Python, but I have tried hard to find what fits me best. What I am trying to build is a simple file system for Linux, though to tell the truth I'm not even sure it is achievable in Python, so I need a bit of help here.
I tried both a class structure and named tuples (one at a time, to see which fits) and decided classes would be better for me. The problem is that I couldn't read the data byte by byte, because the size of my class was 888 while in C it was 44 (I used sys.getsizeof() to measure it). The code below should make clearer what I want to achieve.
For this structure
struct sb {
    int inode_bitmap;
    int data_bitmap[10];
};
I used
# SUPER BLOCK
class sb(object):
    __slots__ = ['inode_bitmap', 'data_bitmap']  # REDUCE RAM USAGE
    def __init__(bruh, inode_bitmap, data_bitmap):
        bruh.inode_bitmap = inode_bitmap
        bruh.data_bitmap = [None] * 10  # DEFINITION OF ARRAY
Everything was fine until I tried to read it back. In C I have:

FILE *fin = fopen("simplefs.bin", "r");
struct inode slash;
fseek(fin, sizeof(struct sb), SEEK_SET);
fread(&slash, sizeof(slash), 1, fin);

and in Python I tried:

fin = open("simplefs.bin", "rb")
slash = inode
print("pos:", fin.tell())
contents = fin.read(sys.getsizeof(sb))
print(contents)
The actual file size was something like 4800 bytes, but when I read it back the size was approximately 318.
I am well aware that Python is not C, but I am just experimenting to see whether this is achievable.
You cannot define a Python class, read/write it to a file, and expect it to be binary-identical to the C struct. If you want to parse binary data, the struct module lets you interpret the bytes you have read as ints, floats, and a dozen other formats. You still have to write the format strings manually. In your particular case:
import struct

with open('datafile.dat', 'rb') as fin:
    raw_data = fin.read()

data = struct.unpack_from('11I', raw_data)  # 11 unsigned ints: inode_bitmap + 10-element data_bitmap
inode_bitmap = data[0]
data_bitmap = data[1:]
Or something along those lines...
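Note that struct.calcsize('11I') is 44, matching the C sizeof; sys.getsizeof() measures the Python object wrapper, which is why it reported 888. Writing works the same way with struct.pack; a minimal sketch (the field values here are placeholders, not from the question):

import struct

# Hedged sketch: pack the super block with the same '11I' layout as above.
inode_bitmap = 0
data_bitmap = [0] * 10
with open('datafile.dat', 'wb') as fout:
    fout.write(struct.pack('11I', inode_bitmap, *data_bitmap))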
I have tried using the Pydub library; however, it only allows increasing or reducing the volume by a certain number of decibels. How would I proceed if I wanted, for example, to reduce the volume of the wav by a certain percent?
This is simple enough to do with just the tools in the stdlib.
First, you use wave to open the input file and create an output file:
pathout = os.path.splitext(path)[0] + '-quiet.wav'
with wave.open(path, 'rb') as fin, wave.open(pathout, 'wb') as fout:
Now, you have to copy over all the wave params, and hold onto the sample width for later:
fout.setparams(fin.getparams())
sampwidth = fin.getsampwidth()
Then you loop over frames until done:
while True:
    frames = bytearray(fin.readframes(1024))
    if not frames:
        break
You can use audioop to process this data:
frames = audioop.mul(frames, sampwidth, factor)
… but this will only work for 16-bit little-endian signed LPCM wave files (the most common kind, but not the only kind). You could solve that with other functions—most importantly, lin2lin to handle 8-bit unsigned LPCM (the second most common kind). But it's worth understanding how to do it manually:
for i in range(0, len(frames), sampwidth):
    if sampwidth == 1:
        # 8-bit unsigned
        frames[i] = int(round((frames[i] - 128) * factor + 128))
    else:
        # 16-, 24-, or 32-bit signed
        sample = int.from_bytes(frames[i:i+sampwidth], 'little', signed=True)
        quiet = round(sample * factor)
        frames[i:i+sampwidth] = int(quiet).to_bytes(sampwidth, 'little', signed=True)
audioop.mul only handles the else part—but it does more than I've done here. In particular, it has to handle cases with factors over 1—a naive multiply would clip, which will just add weird distortion without adding the desired max energy. (It's worth reading the pure Python implementation from PyPy if you want to learn the basics of this stuff.)
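For example, a clamped multiply for the 16-bit case might look like this (a sketch of the idea, not what audioop actually does internally):

# Hedged sketch: saturate at the 16-bit signed bounds instead of letting
# a factor > 1 overflow the sample range.
lo, hi = -32768, 32767
quiet = max(lo, min(hi, round(sample * factor)))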
If you also want to handle float32 files, you need to look at the format, because they have the same sampwidth as int32, and you'll probably want the struct module or the array module to pack and unpack them. If you want to handle even less common formats, like a-law and µ-law, you'll need to read a more detailed format spec. Notice that audioop has tools for handling most of them, like ulaw2lin to convert µ-law to LPCM so you can process it and convert it back—but again, it might be worth learning how to do it manually. And for some of them, like CoolEdit float24/32, you pretty much have to do it manually.
Anyway, once you've got the quieted frames, you just write them out:
fout.writeframes(frames)
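For reference, here's the whole thing assembled into one function around audioop.mul (a minimal sketch; it inherits the caveats above about sample formats):

import audioop
import os
import wave

def make_quieter(path, factor):
    # Assemble the steps above: copy params, scale each chunk of
    # frames by factor, and write the result out.
    pathout = os.path.splitext(path)[0] + '-quiet.wav'
    with wave.open(path, 'rb') as fin, wave.open(pathout, 'wb') as fout:
        fout.setparams(fin.getparams())
        sampwidth = fin.getsampwidth()
        while True:
            frames = fin.readframes(1024)
            if not frames:
                break
            fout.writeframes(audioop.mul(frames, sampwidth, factor))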
You could use the mul function from the built-in audioop module. This is what pydub uses internally, after converting the decibel value to a multiplication factor.
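For the "by a certain percent" part: a percentage maps directly to a multiplication factor, while a decibel change maps to 10 ** (db / 20). A short sketch (frames and sampwidth come from a wave file as in the previous answer):

import audioop

# Hedged sketch: scale to 50% of the original amplitude.
factor = 50 / 100              # or, for a dB change: 10 ** (db / 20)
quieter = audioop.mul(frames, sampwidth, factor)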
I have a kernel module in C which continuously reads data from a photodiode and writes the values, together with the current time, into a memory-mapped file. From a C program in user space I can access the data from the kernel. I tried to do the same in Python via the mmap functionality. However, when I try to mmap the file I get errors like "mmap length is greater than file size" or "mmap file is empty". It seems that Python cannot access the file mmapped from C, is that correct? In the end, I need a NumPy array of the photodiode data for further processing.
Details about kernel data structure:
The mmap contains a struct with an index to the latest voltage values and a struct array with voltage and time. The kernel has one big struct array and writes the photodiode data in chunks of page size into it. The C user-space program then reads each chunk for further processing.
Python code to read mmaped C file:
num_pages = 103
page_size = 10000
max_buffer_size = num_pages * page_size

class buf_element(ctypes.Structure):
    _fields_ = [("voltage", ctypes.c_int),
                ("time", ctypes.c_uint)]

class data(ctypes.Structure):
    _fields_ = [("latest_page_offset", ctypes.c_int),
                ("buffer", ctypes.POINTER(buf_element))]

length_data = ctypes.sizeof(ctypes.c_int) + max_buffer_size * ctypes.sizeof(buf_element)

fd = os.open(data_file, os.O_RDWR)
buf = mmap.mmap(fd, length_data, mmap.MAP_SHARED, mmap.PROT_READ)
test_data = data.from_buffer(buf)
print test_data.latest_page_offset
os.close(fd)
My idea was to reuse the existing, working C code from Python via a C extension: Python calls C, hands over a NumPy array, and C writes the data into it. Is that the fastest way? Any other recommendations?
To get it working, I ended up calling the C code from Python via Cython.
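If the records really are laid out inline in the mapped file (note that the ctypes definition above declares a pointer field, not an inline array), a pure NumPy view over the mmap is another option. A hedged sketch, with the path and the no-padding layout assumed from the description above:

import mmap
import os

import numpy as np

# Hedged sketch: view the kernel buffer as a NumPy record array, assuming
# an int offset followed immediately (no padding) by an inline array of
# { int voltage; unsigned int time; } records. The path is hypothetical.
num_pages = 103
page_size = 10000
max_buffer_size = num_pages * page_size

rec = np.dtype([("voltage", np.int32), ("time", np.uint32)])
length = 4 + max_buffer_size * rec.itemsize  # 4 bytes for the offset field

fd = os.open("/path/to/mapped_file", os.O_RDONLY)
try:
    buf = mmap.mmap(fd, length, mmap.MAP_SHARED, mmap.PROT_READ)
finally:
    os.close(fd)  # the mapping stays valid after the fd is closed

latest_page_offset = np.frombuffer(buf, np.int32, count=1)[0]
samples = np.frombuffer(buf, rec, count=max_buffer_size, offset=4)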
I am trying to apply _pickle to save data onto disk. But when calling _pickle.dump, I got an error
OverflowError: cannot serialize a bytes object larger than 4 GiB
Is this a hard limit of _pickle? (It is cPickle for Python 2.)
Not anymore as of Python 3.4, which has PEP 3154 and pickle protocol 4:
https://www.python.org/dev/peps/pep-3154/
But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html
pickle.dump(d, open("file", 'wb'), protocol=4)
Yes, this is a hard-coded limit; from the save_bytes function:

else if (size <= 0xffffffffL) {
    // ...
}
else {
    PyErr_SetString(PyExc_OverflowError,
                    "cannot serialize a bytes object larger than 4 GiB");
    return -1;    /* string too large */
}
The protocol uses 4 bytes to write the size of the object to disk, which means you can only track sizes of up to 2**32 bytes == 4 GiB.
If you can break up the bytes object into multiple objects, each smaller than 4GB, you can still save the data to a pickle, of course.
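For example, a hedged sketch of that chunking approach (the helper names are made up for illustration):

import pickle

CHUNK = 2**30  # 1 GiB per piece, comfortably under the 4 GiB limit

def dump_big_bytes(data, f):
    # Split the bytes object so that no single pickled bytes object
    # exceeds the protocol's 4-byte size field.
    pickle.dump([data[i:i + CHUNK] for i in range(0, len(data), CHUNK)], f)

def load_big_bytes(f):
    return b''.join(pickle.load(f))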
There are great answers above for why pickle doesn't work. But it still doesn't work for Python 2.7, which is a problem if you are still on Python 2.7 and want to support large files, especially NumPy (NumPy arrays over 4 GiB fail).
You can use OC serialization, which has been updated to work for data over 4 GiB. There is a Python C extension module available from:
http://www.picklingtools.com/Downloads
Take a look at the Documentation:
http://www.picklingtools.com/html/faq.html#python-c-extension-modules-new-as-of-picklingtools-1-6-0-and-1-3-3
But here's a quick summary: there are ocdumps and ocloads, very much like pickle's dumps and loads:

from pyocser import ocdumps, ocloads

ser = ocdumps(pyobject)    # serialize pyobject into string ser
pyobject = ocloads(ser)    # deserialize string ser back into pyobject
The OC serialization is 1.5-2x faster and also works with C++ (if you are mixing languages). It works with all built-in types, but not with classes (partly because it is cross-language, and it's hard to build C++ classes from Python).
I am a complete beginner in programming, and I use Ubuntu.
Now I am trying to perform sound analysis with Python.
In the following code I used the wave module to open the wav file and struct to convert the information:

from wave import *
from struct import *

fp = open("sound.wav", "rb")  # this open comes from wave, shadowing the builtin
total_num_samps = fp.getnframes()
num_fft = (total_num_samps // 512) - 2  # for an FFT length of 512
for i in range(num_fft):
    tempb = fp.readframes(512)
    tempb2 = unpack('f', tempb)
    print(tempb2)
So in the terminal the message that appears is:
struct.error: unpack requires a string argument of length 4
Please, can someone help me solve this? Does anyone have a suggestion for another strategy to interpret the sound file?
The format string provided to struct has to tell it exactly the format of the second argument. For example, "there are one hundred and three unsigned shorts". The way you've written it, the format string says "there is exactly one float". But then you provide it a string with way more data than that, and it barfs.
So issue one is that you need to specify the exact number of packed C types in your byte string. In this case, 512 (the number of frames) times the number of channels (likely 2, but your code doesn't take this into account).
The second issue is that your .wav file simply doesn't contain floats. If it's 8-bit, it contains unsigned chars, if it's 16 bit it contains signed shorts, etc. You can check the actual sample width for your .wav by doing fp.getsampwidth().
So then: let's assume you have 512 frames of two-channel 16-bit audio; you would write the call to struct as something like:

channels = fp.getnchannels()
...
tempb = fp.readframes(512)
tempb2 = struct.unpack('{}h'.format(512 * channels), tempb)
Using SciPy, you could load the .wav file into a NumPy array using:
import scipy.io.wavfile as wavfile
sample_rate, data = wavfile.read(FILENAME)
NumPy/SciPy will also be useful for computing the FFT.
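For instance, a minimal sketch of that FFT step (the file name is a placeholder, and a mono file is assumed; a stereo file gives one column per channel):

import numpy as np
import scipy.io.wavfile as wavfile

# Hedged sketch: magnitude spectrum of the first 512-sample window.
sample_rate, data = wavfile.read("sound.wav")
window = data[:512]
spectrum = np.abs(np.fft.rfft(window))
freqs = np.fft.rfftfreq(512, d=1.0 / sample_rate)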
Tips:
On Ubuntu, you can install NumPy/SciPy with
sudo apt-get install python-scipy
This will install NumPy as well, since NumPy is a dependency of SciPy.
Avoid using * imports such as from struct import *. This copies names from the struct namespace into the current module's global namespace. Although it saves you a bit of typing, you pay an awful price later when the script becomes more complex and you lose track of where names are coming from (or worse, the imported names mask other variables with the same name).
I'm reading an image file in the DPX format, and I want to extract the "Orientation" field in the image section of the header, and also modify it. I have never tried to interpret binary data, so I'm a bit at a loss. I'm trying to use the struct module, but I really don't know how to use it properly. The file header specification is here:
http://www.fileformat.info/format/dpx/egff.htm
Thanks.
There seems to be a constant offset to the Orientation field, so if this is all you want to change then I wouldn't bother trying to parse the whole header; just work out the offset (which I think is just the size of the GENERICFILEHEADER plus one byte for the high byte of the orientation word) and read/manipulate it directly.
Using a bytearray would be my first choice. The offset varies by one depending on whether the file is in big- or little-endian format, so something like this might work for you:

b = bytearray(your_byte_data)
big_endian = (b[0] == 0x53)        # b'SDPX' magic => big-endian file
offset = 768 + big_endian          # skip the high byte on big-endian files
current_orientation = b[offset]    # get current orientation
b[offset] = new_orientation        # set it to something new
with open('out_file', 'wb') as f:
    f.write(b)
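Since the question mentions the struct module, the same read can also be expressed with it; a small sketch (the file name is a placeholder):

import struct

# Hedged sketch: read the 16-bit Orientation word at offset 768,
# honoring the byte order indicated by the DPX magic number.
with open('in_file.dpx', 'rb') as f:
    header = f.read(770)
endian = '>' if header[:4] == b'SDPX' else '<'
(orientation,) = struct.unpack_from(endian + 'H', header, 768)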
You might want to consider using ImageMagick to do that. It is open source and supports the DPX format.
The Python Imaging Library (PIL) has an attribute .info that might return the relevant data.