I have a script that reads some data from a binary stream of "packets" containing "parameters". The parameters read are stored in a dictionary for each packet, which is appended to an array representing the packet stream.
At the end, this array of dicts is written to an output CSV file.
Among the read data is a CUC7 datetime, stored as coarse/fine integer parts of a GPS time, which I also want to convert to a UTC ISO string.
from astropy.time import Time

def cuc2gps_time(coarse, fine):
    return Time(coarse + fine / (2**24), format='gps')

def gps2utc_time(gps):
    return Time(gps, format='isot', scale='utc')
The issue is that these two time conversions make up 90% of the total run time of my script; the rest of the work (reading the binary file, decoding 15 other parameters, writing the CSV) is done in the remaining 10%.
I improved the situation somewhat by doing the conversions in batches on NumPy arrays instead of per packet. This reduces the total runtime by about half.
import numpy as np

while end_not_reached:
    # Read 1 packet
    # (...)
    nb_packets += 1
    end_not_reached = ...  # boolean

    # Process in batches for better performance
    if not nb_packets % 1000 or not end_not_reached:
        # Convert CUC7 time to GPS and UTC times
        all_coarse = np.array([packet['lobt_coarse'] for packet in packets])
        all_fine = np.array([packet['lobt_fine'] for packet in packets])
        all_gps = cuc2gps_time(all_coarse, all_fine)
        all_utc = gps2utc_time(all_gps)

        # Add times to each packet
        for packet, gps_time, utc_time in zip(packets, all_gps, all_utc):
            packet.update({'gps_time': gps_time, 'utc_time': utc_time})
But my script is still absurdly slow: reading 60000 packets from a 1.2GB file and writing them to a CSV takes 12s, against only 2.5s if I remove the time conversions.
So:
Is it expected that Astropy time conversions are so slow? Am I using it wrong? Is there a better library?
Is there a way to improve my current implementation? I suspect that the remaining "for" loop in there is very costly, but I could not find a good way to replace it.
I think the problem is that you're doing multiple loops over your sequence of packets. In Python I would recommend keeping one array per parameter, instead of a list of objects each holding a bunch of scalar parameters.
If you can read all the packets in at once, I would recommend something like:
num_bytes = ...
num_bytes_per_packet = ...
num_packets = num_bytes // num_bytes_per_packet

param_1 = np.empty(num_packets)
param_2 = np.empty(num_packets)
...
time_coarse = np.empty(num_packets)
time_fine = np.empty(num_packets)
...
param_N = np.empty(num_packets)

for i in range(num_packets):
    param_1[i], param_2[i], ..., time_coarse[i], time_fine[i], ..., param_N[i] = decode_packet(...)

time_gps = Time(time_coarse + time_fine / (2**24), format='gps')
time_utc = time_gps.utc.isot
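As a follow-up on the CSV-writing part of the question, here is a minimal sketch (the column names and the output file name are placeholders) of how the rows could be written straight from those per-parameter arrays, without rebuilding a dict per packet:

import csv

# One CSV row per packet, iterating the column arrays in lockstep.
with open('packets.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['param_1', 'param_2', 'gps_time', 'utc_time'])
    writer.writerows(zip(param_1, param_2, time_gps.value, time_utc))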
I am using the struct module for the first time and my code gives me an error: "unpack requires a buffer of 1486080 bytes"
Here is my code:
import struct
import wave

def speed_up(n):
    source = wave.open('sound.wav', mode='rb')
    dest = wave.open('out.wav', mode='wb')
    dest.setparams(source.getparams())
    frames_count = source.getnframes()
    data = struct.unpack("<" + str(frames_count) + "h", source.readframes(frames_count))
    new_data = []
    for i in range(0, len(data), n):
        new_data.append(data[i])
    newframes = struct.pack('<' + str(len(new_data)) + 'h', new_data)
    dest.writeframes(newframes)
    source.close()
    dest.close()
How do I figure out which format I should use?
The issue in your code is that you're providing struct.unpack with the wrong number of bytes. This is because of your usage of the wave module: Each frame in a wave file has getnchannels() samples, so when calling readframes(n) you will get back n * getnchannels() samples and this is the number you'd have to pass to struct.unpack.
To make your code more robust, you'd also need to look at getsampwidth() and use an appropriate format character, but the vast majority of wave files are 16-bit.
In the comments you also mentioned that the code didn't work after adding print(len(source.readframes(frames_count))). You didn't show the full code but I assume this is because you called readframes twice without calling rewind, so the second call didn't have any more data to return. It would be best to store the result in a variable if you want to use it in multiple lines.
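For illustration, here is a hedged sketch of the read/unpack part with the sample count corrected as described above (it assumes 16-bit samples, which the assertion checks; the variable names are only illustrative):

import struct
import wave

source = wave.open('sound.wav', mode='rb')
frames_count = source.getnframes()
channels = source.getnchannels()
assert source.getsampwidth() == 2, "the 'h' format character assumes 16-bit samples"

raw = source.readframes(frames_count)  # read once and keep the bytes around
samples = struct.unpack("<" + str(frames_count * channels) + "h", raw)
source.close()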
I am generating strings in Julia to use in Python. I would like to use shared memory (InterProcessCommunication.jl in Julia and the multiprocessing module in Python). Currently, Julia generates a string and sends it to Python, which then reads the first number (to determine the string length) before converting the rest into an encoded string.
I thought that shared memory would be much faster, but my method of timing (see below) seems to give 60-65 microseconds to:
Send the string and string length
Detect change in python, read the message and convert to bytes.
Send back an indication for Julia to detect.
I am using Ubuntu. Comparatively, using TCP sockets gives 200 microseconds (so shared memory is only a 3x speedup).
GSTiming comes from here: "How can I get millisecond and microsecond-resolution timestamps in Python?"
Julia:
using InterProcessCommunication
using Random

function copy_string_to_shared_memory(Aptr::Ptr{UInt8}, p::Vector{UInt8})
    for i = 1:length(p)
        unsafe_store!(Aptr, p[i], i + 4)
    end
end

function main()
    A = SharedMemory("myid"; readonly=false)
    Aptr = convert(Ptr{UInt8}, pointer(A))
    Bptr = convert(Ptr{UInt32}, pointer(A))

    u = []
    for i = 1:105
        # p = Vector{UInt8}(randstring(rand(50:511)))
        p = Vector{UInt8}("Hello Stack Exchange" * randstring(rand(1:430)))

        a = time_ns()
        copy_string_to_shared_memory(Aptr, p)
        unsafe_store!(Bptr, length(p), 1)

        # Make sure we can write to it
        while unsafe_load(Aptr, 1) != 1
            nanosleep(10e-7)
        end
        b = time_ns()
        println((b - a) / 1000)  # How I get times

        sleep(0.01)
    end
end

main()
Python Code:
from multiprocessing import shared_memory
import array
import time
import GSTiming

def main():
    shm_a = shared_memory.SharedMemory("myid", create=True, size=512)
    shm_a.buf[0] = 1  # Modify single byte at a time

    u = []
    for i in range(105):
        while shm_a.buf[0] == 1:
            GSTiming.delayMicroseconds(1)
        s = bytes(shm_a.buf[4:shm_a.buf[0]+4])
        shm_a.buf[0] = 1

    shm_a.close()
    shm_a.unlink()  # Call unlink only once to release the shared memory

main()
I'm currently working on a project that involves some very remote data gathering. At the end of each day, a very short summary is sent back to the server via a satellite connection.
Because sending data over the satellite is very expensive, I want the data to be as compact as possible. Additionally, the service I'm using only allows for sending data in the following two formats: ASCII, hexadecimal. Most of the data I will be sending consists of floats, where the precision should be as high as possible without taking up too much space.
Below I have a working version of what I'm currently using, but there should be a more efficient way to store the data. Any help would be much appreciated.
import ctypes

#------------------------------------------------------------------
# This part is known to the client as well as the server
class my_struct(ctypes.Structure):
    _pack_ = 1
    _fields_ = [('someText', ctypes.c_char * 12),
                ('underRange', ctypes.c_float),
                ('overRange', ctypes.c_float),
                ('TXPDO', ctypes.c_float)]

def print_struct(filled_struct):
    d = filled_struct
    for name, typ in d._fields_:
        value = getattr(d, name)
        print('{:20} {:20} {}'.format(name, str(value), str(typ)))

#------------------------------------------------------------------
# This part of the code is performed on the client side

# Filling the struct with some random data, real data will come from sensors
s = my_struct()
s.someText = 'Hello World!'.encode()
s.underRange = 4.01234
s.overRange = 4.012345
s.TXPDO = 1.23456789
# Rounding errors occurred when converting to ctypes.c_float

print('Data sent:')
print_struct(s)

data = bytes(s)        # Total length is 24 bytes (12 for the string + 3x4 for the floats)
data_hex = data.hex()  # Total length is 48 bytes in hexadecimal format

# Now the data is sent over a satellite connection, it should be as small as possible
print('\nLength of sent data: ', len(data_hex), 'bytes\n')

#------------------------------------------------------------------
# This part of the code is performed on the server side
def move_bytes_to_struct(struct_to_fill, bytes_to_move):
    adr = ctypes.addressof(struct_to_fill)
    struct_size = ctypes.sizeof(struct_to_fill)
    bytes_to_move = bytes_to_move[0:struct_size]
    ctypes.memmove(adr, bytes_to_move, struct_size)
    return struct_to_fill

# Data received can be assumed to be the same as data sent
data_hex_received = data_hex
data_bytes = bytes.fromhex(data_hex_received)

data_received = move_bytes_to_struct(my_struct(), data_bytes)
print('Data received:')
print_struct(data_received)
I wonder if you are overcomplicating things a bit.
The struct module will let you do pretty much what you are doing, but simpler:
struct.pack("fffs12", 4.01234, 4.012345, 1.23456789, 'Hello World!'.encode())
This of course depends on you knowing how long your string is, but you could also not care about that:
struct.pack("fff", 4.01234, 4.012345, 1.23456789) + 'Hello World!'.encode()
But as for storing things more efficiently:
The more you know about your data, the more shortcuts you can take.
Is the string ASCII only? You could squeeze each char into maybe 7 bits, or even 6.
That could bring your 12-byte string down to 9.
If you know the range of your floats, you could perhaps trim that as well.
If you can send larger batches, compression might help as well.
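As one concrete example of trimming the floats: if roughly three significant digits are enough, the half-precision format character 'e' (available in struct since Python 3.6) halves the float payload. This is only a sketch of the trade-off, not a recommendation for your exact precision needs:

import struct

full = struct.pack("<fff", 4.01234, 4.012345, 1.23456789)  # 12 bytes
half = struct.pack("<eee", 4.01234, 4.012345, 1.23456789)  # 6 bytes
print(len(full.hex()), len(half.hex()))  # 24 vs 12 hex characters on the wire
print(struct.unpack("<eee", half))       # note the reduced precision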
I'm planning to implement a "DSP-like" signal processor in Python. It should capture small fragments of audio via ALSA, process them, then play them back via ALSA.
To get things started, I wrote the following (very simple) code.
import alsaaudio

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
inp.setchannels(1)
inp.setrate(96000)
inp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
inp.setperiodsize(1920)

outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
outp.setchannels(1)
outp.setrate(96000)
outp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
outp.setperiodsize(1920)

while True:
    l, data = inp.read()
    # TODO: Perform some processing.
    outp.write(data)
The problem is that the audio "stutters" and is not gapless. I tried experimenting with the PCM mode, setting it to either PCM_ASYNC or PCM_NONBLOCK, but the problem remains. I think the problem is that samples "between" two subsequent calls to "inp.read()" are lost.
Is there a way to capture audio "continuously" in Python (preferably without the need for too "specific"/"non-standard" libraries)? I'd like the signal to always get captured "in the background" into some buffer, from which I can read some "momentary state", while audio is further being captured into the buffer even during the time, when I perform my read operations. How can I achieve this?
Even if I use a dedicated process/thread to capture the audio, this process/thread will always at least have to (1) read audio from the source, (2) then put it into some buffer (from which the "signal processing" process/thread then reads). These two operations will therefore still be sequential in time and thus samples will get lost. How do I avoid this?
Thanks a lot for your advice!
EDIT 2: Now I have it running.
import alsaaudio
from multiprocessing import Process, Queue
import numpy as np
import struct


"""
A class implementing buffered audio I/O.
"""
class Audio:

    """
    Initialize the audio buffer.
    """
    def __init__(self):
        #self.__rate = 96000
        self.__rate = 8000
        self.__stride = 4
        self.__pre_post = 4
        self.__read_queue = Queue()
        self.__write_queue = Queue()
"""
Reads audio from an ALSA audio device into the read queue.
Supposed to run in its own process.
"""
def __read(self):
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
inp.setchannels(1)
inp.setrate(self.__rate)
inp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
inp.setperiodsize(self.__rate / 50)
while True:
_, data = inp.read()
self.__read_queue.put(data)
"""
Writes audio to an ALSA audio device from the write queue.
Supposed to run in its own process.
"""
def __write(self):
outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
outp.setchannels(1)
outp.setrate(self.__rate)
outp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
outp.setperiodsize(self.__rate / 50)
while True:
data = self.__write_queue.get()
outp.write(data)
"""
Pre-post data into the output buffer to avoid buffer underrun.
"""
def __pre_post_data(self):
zeros = np.zeros(self.__rate / 50, dtype = np.uint32)
for i in range(0, self.__pre_post):
self.__write_queue.put(zeros)
"""
Runs the read and write processes.
"""
def run(self):
self.__pre_post_data()
read_process = Process(target = self.__read)
write_process = Process(target = self.__write)
read_process.start()
write_process.start()
"""
Reads audio samples from the queue captured from the reading thread.
"""
def read(self):
return self.__read_queue.get()
"""
Writes audio samples to the queue to be played by the writing thread.
"""
def write(self, data):
self.__write_queue.put(data)
"""
Pseudonymize the audio samples from a binary string into an array of integers.
"""
def pseudonymize(self, s):
return struct.unpack(">" + ("I" * (len(s) / self.__stride)), s)
"""
Depseudonymize the audio samples from an array of integers into a binary string.
"""
def depseudonymize(self, a):
s = ""
for elem in a:
s += struct.pack(">I", elem)
return s
"""
Normalize the audio samples from an array of integers into an array of floats with unity level.
"""
def normalize(self, data, max_val):
data = np.array(data)
bias = int(0.5 * max_val)
fac = 1.0 / (0.5 * max_val)
data = fac * (data - bias)
return data
"""
Denormalize the data from an array of floats with unity level into an array of integers.
"""
def denormalize(self, data, max_val):
bias = int(0.5 * max_val)
fac = 0.5 * max_val
data = np.array(data)
data = (fac * data).astype(np.int64) + bias
return data
debug = True
audio = Audio()
audio.run()

while True:
    data = audio.read()
    pdata = audio.pseudonymize(data)

    if debug:
        print "[PRE-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))

    ndata = audio.normalize(pdata, 0xffffffff)

    if debug:
        print "[PRE-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
        print "[PRE-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))

    #ndata += 0.01 # When I comment in this line, it wreaks complete havoc!

    if debug:
        print "[POST-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
        print "[POST-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))

    pdata = audio.denormalize(ndata, 0xffffffff)

    if debug:
        print "[POST-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
        print ""

    data = audio.depseudonymize(pdata)
    audio.write(data)
However, when I perform even the slightest modification to the audio data (e.g. comment that line back in), I get a lot of noise and extreme distortion at the output. It seems like I don't handle the PCM data correctly. The strange thing is that the output of the "level meter", etc. all appears to make sense. However, the output is completely distorted (but continuous) when I offset it just slightly.
EDIT 3: I just found out that my algorithms (not included here) work when I apply them to wave files. So the problem really appears to actually boil down to the ALSA API.
When you read one chunk of data, write one chunk of data, and then wait for the second chunk of data to be read, the buffer of the output device will become empty if the second chunk is not shorter than the first chunk.
You should fill up the output device's buffer with silence before starting the actual processing. Then small delays in either the input or output processing will not matter.
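A minimal sketch of that idea, reusing the pyalsaaudio setup from the question (a signed 32-bit format is assumed here so that zero bytes really are silence, and the four-period head start is an arbitrary choice):

import alsaaudio

PERIOD_FRAMES = 1920   # frames per period, as in the question
BYTES_PER_FRAME = 4    # one channel, 32-bit samples

outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
outp.setchannels(1)
outp.setrate(96000)
outp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
outp.setperiodsize(PERIOD_FRAMES)

silence = b'\x00' * (PERIOD_FRAMES * BYTES_PER_FRAME)
for _ in range(4):     # pre-fill the output buffer before the processing loop starts
    outp.write(silence)

The asker's EDIT 2 code does essentially this with its __pre_post_data method.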
You can do that all manually, as @CL recommends in their answer, but I'd recommend just using GNU Radio instead:
It's a framework that takes care of all the "getting small chunks of samples into and out of your algorithm"; it scales very well, and you can write your signal processing either in Python or C++.
In fact, it comes with an Audio Source and an Audio Sink that directly talk to ALSA and just give/take continuous samples. I'd recommend reading through GNU Radio's Guided Tutorials; they explain exactly what is necessary to do your signal processing for an audio application.
A really minimal flow graph would look like: Audio Source → high-pass filter → Audio Sink.
You can replace the high-pass filter with your own signal processing block, or use any combination of the existing blocks.
There are helpful things like file and WAV file sinks and sources, filters, resamplers, amplifiers (OK, multipliers), …
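For orientation, here is a sketch of such a flow graph in Python (the block names follow the GNU Radio Python API, but the exact constructor signatures vary between versions, and the 300 Hz cutoff is just an illustrative assumption):

from gnuradio import gr, audio, filter as grfilter
from gnuradio.filter import firdes

class MinimalAudioFlowgraph(gr.top_block):
    def __init__(self, samp_rate=48000):
        gr.top_block.__init__(self, "audio in -> high-pass -> audio out")
        src = audio.source(samp_rate, '', True)                 # ALSA capture
        hpf = grfilter.fir_filter_fff(
            1, firdes.high_pass(1.0, samp_rate, 300.0, 100.0))  # swap in your own DSP block here
        snk = audio.sink(samp_rate, '', True)                   # ALSA playback
        self.connect(src, hpf, snk)

tb = MinimalAudioFlowgraph()
tb.start()
input('Running; press Enter to stop...')
tb.stop()
tb.wait()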
I finally found the problems. They were the following.
1st - ALSA quietly "fell back" to PCM_FORMAT_U8_LE upon requesting PCM_FORMAT_U32_LE, thus I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE.
2nd - The ALSA output seems to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So you have to set the period size four times as high for output if you use a 32-bit sample depth.
3rd - Even in Python (where there is a "global interpreter lock"), processes are slow compared to threads. You can get latency down a lot by switching to threads, since the I/O threads basically don't do anything computationally intensive.
Audio is gapless and undistorted now, but latency is far too high.
I am struggling to get to grips with this.
I create a netCDF4 file with the following dimensions and variables (note in particular the unlimited point dimension):
dimensions:
    point = UNLIMITED ; // (275935 currently)
    realization = 24 ;
variables:
    short mod_hs(realization, point) ;
        mod_hs:scale_factor = 0.01 ;
    short mod_ws(realization, point) ;
        mod_ws:scale_factor = 0.01 ;
    short obs_hs(point) ;
        obs_hs:scale_factor = 0.01 ;
    short obs_ws(point) ;
        obs_ws:scale_factor = 0.01 ;
    short fchr(point) ;
    float obs_lat(point) ;
    float obs_lon(point) ;
    double obs_datetime(point) ;
}
I have a Python program that populates this file with data in a loop (hence the unlimited record dimension - I don't know a priori how big the file will be).
After populating the file, it is 103MB in size.
My issue is that reading data from this file is quite slow. I guessed that this has something to do with chunking and the unlimited point dimension.
I ran ncks --fix_rec_dmn on the file and (after a lot of churning) it produced a new netCDF file that is only 32MB in size (which is about the right size for the data it contains).
This is a massive difference in size - why is the original file so bloated? Also - accessing the data in this file is orders of magnitude quicker. For example, in Python, to read in the contents of the hs variable takes 2 seconds on the original file and 40 milliseconds on the fixed record dimension file.
The problem I have is that some of my files contain a lot of points and seem to be too big to run ncks on (my machine runs out of memory, and I have 8GB), so I can't convert all the data to a fixed record dimension.
Can anyone explain why the file sizes are so different and how I can make the original files smaller and more efficient to read?
By the way - I am not using zlib compression (I have opted for scaling floating point values to an integer short).
Chris
EDIT
My Python code is essentially building up one single timeseries file of collocated model and observation data from multiple individual model forecast files over 3 months. My forecast model runs 4 times a day, and I am aggregating 3 months of data, so that is ~120 files.
The program extracts a subset of the forecast period from each file (e.g. T+24h -> T+48h), so it is not a simple matter of concatenating the files.
This is a rough approximation of what my code is doing (it actually reads/writes more variables, but I am just showing 2 here for clarity):
import datetime as dt
import netCDF4 as nc
import numpy as np

# Create output file:
dout = nc.Dataset(fn, mode='w', clobber=True, format="NETCDF4")
dout.createDimension('point', size=None)
dout.createDimension('realization', size=24)

for varname in ['mod_hs', 'mod_ws']:
    v = dout.createVariable(varname, np.short,
                            dimensions=('point', 'realization'), zlib=False)
    v.scale_factor = 0.01

# Cycle over dates
date = <some start date>
end_date = <some end date>

# Keep track of record dimension ('point') size:
n = 0
while date < end_date:
    din = nc.Dataset("<path to input file>", mode='r')
    fchr = din.variables['fchr'][:]

    # get mask for specific forecast hour range
    m = np.logical_and(fchr >= 24, fchr < 48)
    sz = np.count_nonzero(m)

    if sz == 0:
        continue

    dout.variables['mod_hs'][n:n+sz, :] = din.variables['mod_hs'][:][m, :]
    dout.variables['mod_ws'][n:n+sz, :] = din.variables['mod_wspd'][:][m, :]

    # Increment record dimension count:
    n += sz

    din.close()

    # Go to next file
    date += dt.timedelta(hours=6)

dout.close()
Interestingly, if I make the output file format NETCDF3_CLASSIC rather than NETCDF4, the output is the size that I would expect. NETCDF4 output seems to be bloated.
My experience has been that the default chunksize for record dimensions depends on the version of the netCDF library underneath. For 4.3.3.1, it is 524288. 275935 records is about half a record-chunk. ncks automatically chooses (without telling you) more sensible chunksizes than netCDF defaults, so the output is better optimized. I think this is what is happening. See http://nco.sf.net/nco.html#cnk
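If it helps to confirm this, the netCDF4 Python module can report the chunk sizes a given file actually ended up with (the file name below is a placeholder for your aggregated file):

import netCDF4 as nc

with nc.Dataset('timeseries.nc', mode='r') as ds:
    for name, var in ds.variables.items():
        print(name, var.chunking())  # 'contiguous' or a list of per-dimension chunk sizes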
Please try to provide code that works without modification if possible; I had to edit yours to get it working, but it wasn't too difficult.
import netCDF4 as nc
import numpy as np

dout = nc.Dataset('testdset.nc4', mode='w', clobber=True, format="NETCDF4")
dout.createDimension('point', size=None)
dout.createDimension('realization', size=24)

for varname in ['mod_hs', 'mod_ws']:
    v = dout.createVariable(varname, np.short,
                            dimensions=('point', 'realization'), zlib=False,
                            chunksizes=[1000, 24])
    v.scale_factor = 0.01

date = 1
end_date = 5000
n = 0
while date < end_date:
    sz = 100
    dout.variables['mod_hs'][n:n+sz, :] = np.ones((sz, 24))
    dout.variables['mod_ws'][n:n+sz, :] = np.ones((sz, 24))
    n += sz
    date += 1

dout.close()
The main difference is in the createVariable command. For file size: without providing "chunksizes" when creating the variable, I also got a file twice as large compared to when I added it. So for file size it should do the trick.
For reading variables from the file, I did not actually notice any difference; maybe I should add more variables?
Anyway, it should be clear how to add chunk sizes now. You will probably need to test a bit to get a good configuration for your problem. Feel free to ask more if it still does not work for you, and if you want to understand more about chunking, read the HDF5 docs.
I think your problem is that the default chunk size for unlimited dimensions is 1, which creates a huge number of internal HDF5 structures. By setting the chunksize explicitly (obviously ok for unlimited dimensions), the second example does much better in space and time.
Unlimited dimensions require chunking in HDF5/netCDF4, so if you want unlimited dimensions you have to think about chunking performance, as you have discovered.
More here:
https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_perf_chunking.html