Read and Write to McIDAS Area files in Python

I'm working with GOES data from NOAA, and need to save the result off in McIDAS Area format. Currently, I'm using the struct module to pack/unpack the binary data according to the byte format described in the McIDAS Programmer's Guide.
Is there something better suited for dealing with these binary formats (hopefully something that can be used with Python)?
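For reference, here is a hedged sketch of the struct-based approach I'm using now; the 64-word, 32-bit directory block and the endianness handling are my reading of the Area format, so verify the word meanings against the McIDAS Programmer's Guide (file names are placeholders):

```python
import struct

# Hedged sketch: the Area file is assumed to start with a directory block of
# 64 32-bit integers (check the exact layout in the McIDAS Programmer's Guide).
with open('goes13.area', 'rb') as f:        # hypothetical file name
    raw = f.read(64 * 4)

directory = struct.unpack('<64i', raw)      # '<' assumes little-endian words
# If the values look nonsensical, the file is probably big-endian;
# re-unpack with '>64i' and compare.
print(directory[:8])

# Writing works the same way in reverse:
with open('out.area', 'wb') as f:
    f.write(struct.pack('<64i', *directory))
```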

Related

Converting Vector *.blf data to normal readable data for spreadsheet use with python-can or cantools

I have a few files that were created by exporting CAN bus data from CANalyzer or Vector tools. The problem is that the interesting data in the file are encoded and look like this: "40c1 bf1b 490d 34b0 46c5 6ed0 a853 d856".
Is there a way to "convert" this data into normal, human-readable numbers by means of python-can or cantools?
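A hedged sketch of the usual decoding path: you need the DBC file that describes the signals on the bus, and the .blf and .dbc file names below are placeholders.

```python
import can
import cantools

# Load the signal definitions; without a matching DBC the raw payload bytes
# cannot be turned into engineering values.
db = cantools.database.load_file('vehicle.dbc')     # hypothetical file name

for msg in can.BLFReader('export.blf'):             # hypothetical file name
    try:
        decoded = db.decode_message(msg.arbitration_id, msg.data)
        print(msg.timestamp, decoded)                # human-readable signal values
    except KeyError:
        pass  # this frame ID is not described in the DBC
```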

Which data file output format written from C++ and read in Python (or other languages) is most size-efficient?

I want to write output files containing tabular data (float values in lines and columns) from a C++ program.
I need to open those files later on with other languages/software (here Python and ParaView, but that might change).
What would be the most size-efficient output format for tabular files that is also compatible with other languages?
E.g., txt files, CSV, XML, binary or not?
Thanks for any advice.
HDF5 might be a good option for you. It’s a standard format for storing large amounts of data, and there are Python and C++ libraries for reading and writing to it.
See here for an example
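A minimal sketch of the Python side with h5py; the file name and dataset name are placeholders for whatever your C++ writer produces:

```python
import numpy as np
import h5py

# Write a float table (here random data) into an HDF5 dataset.
with h5py.File('table.h5', 'w') as f:
    f.create_dataset('data', data=np.random.rand(1000, 8),
                     compression='gzip')   # optional: trades CPU time for file size

# Read it back as a NumPy array.
with h5py.File('table.h5', 'r') as f:
    table = f['data'][:]                   # shape (1000, 8), dtype float64
```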
1- Your output files contain tabular data (float values in lines and columns), in other words, a kind of matrix.
2- You need to open those files later on with other languages/software.
3- You want the file sizes to be efficient.
That said, you should consider one of the two formats below:
CSV: if your data are very simple (a matrix of floats without any particular structure)
JSON: if you need a minimum of structure for your files
These two formats are standard and supported by almost all widely used languages and actively maintained software.
Last, if your data have a more complex structure, consider a format like XML, but the price to pay is then in the size of your files!
Hope this helps!
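For concreteness, a short sketch of what the two suggested formats look like when written from Python (all names and values here are illustrative):

```python
import csv
import json

matrix = [[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]]    # illustrative data

# CSV: one row per line, values separated by commas.
with open('matrix.csv', 'w', newline='') as f:
    csv.writer(f).writerows(matrix)

# JSON: the same matrix with a minimum of structure around it.
with open('matrix.json', 'w') as f:
    json.dump({'rows': 2, 'cols': 3, 'values': matrix}, f)
```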
First of all, the efficiency of I/O operations is limited by the buffer size, so if you want to achieve higher throughput you may have to tune the input/output buffers. As for how to write the data out to the files, that depends on your data and on which delimiters you want to use to separate the values in the files.
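For example, a hedged sketch of enlarging the write buffer from Python; the 1 MiB size is arbitrary and purely illustrative:

```python
# Illustrative only: a larger user-space buffer means fewer underlying writes.
rows = [[1.0, 2.0, 3.0]] * 100_000

with open('out.csv', 'w', buffering=1024 * 1024) as f:
    for row in rows:
        f.write(','.join(map(str, row)) + '\n')
```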

storing matrices in golang in compressed binary format

I am exploring a comparison between Go and Python, particularly for mathematical computation. I noticed that Go has a matrix package mat64.
1) I wanted to ask someone who uses both Go and Python whether there are comparable functions/tools for Go's matrices that are equivalent to NumPy's savez_compressed, which stores data in the npz format (i.e. "compressed" binary, multiple matrices per file)?
2) Also, can Go's matrices handle string types like NumPy does?
1) .npz is a numpy-specific format. It is unlikely that Go itself would ever support this format in the standard library. I also don't know of any third-party library that exists today, and a (10 second) search didn't pop one up. If you need npz specifically, go with Python + numpy.
If you just want something similar from Go, you can use any format. Binary options include Go's encoding/binary and encoding/gob packages. Depending on what you're trying to do, you could even use a non-binary format like JSON and just compress it on your own.
2) Go doesn't have built-in matrices. The library you found is third party, and it only handles float64s.
However, if you just need to store strings in a matrix (n-dimensional) format, you would use an n-dimensional slice. For two dimensions it looks like this: var myStringMatrix [][]string.
npz files are zip archives. Archiving and compression (optional) are handled by the Python zipfile module. The npz contains one npy file for each variable that you save. Any OS-level archiving tool can decompress and extract the component .npy files.
So the remaining question is: can you simulate the npy format? It isn't trivial, but it isn't difficult either. It consists of a header block that contains shape, strides, dtype, and order information, followed by a data block, which is, effectively, a byte image of the array's data buffer.
So the header information and the data are closely linked to the NumPy array's content. And if the variable isn't a normal array, save falls back on the Python pickle mechanism.
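To make the mechanics concrete, here is a hedged sketch that pokes inside an .npz from Python; note that read_array_header_1_0 is an internal NumPy helper, so treat it as illustrative rather than a stable API:

```python
import io
import zipfile
import numpy as np

# An .npz is just a zip of .npy members; each .npy is a small header
# followed by the raw data buffer.
np.savez_compressed('demo.npz', a=np.arange(6).reshape(2, 3))

with zipfile.ZipFile('demo.npz') as zf:
    with zf.open('a.npy') as member:
        buf = io.BytesIO(member.read())
        version = np.lib.format.read_magic(buf)
        # Internal helper: returns the (shape, fortran_order, dtype) triple
        # that a Go reader would have to parse from the header.
        shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(buf)
        data = np.frombuffer(buf.read(), dtype=dtype).reshape(shape)
        print(version, shape, fortran_order, dtype, data)
```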
For a start I'd suggest using the CSV format. It's not binary, and it's not fast, but everyone and his brother can generate and read it. We constantly get SO questions about reading such files with np.loadtxt or np.genfromtxt. Look at the code for np.savetxt to see how numpy produces such files. It's pretty simple.
Another general-purpose choice would be JSON, using the tolist form of an array. That comes to mind because Go is Google's home-grown alternative to Python for web applications. JSON is a cross-language format based on simplified JavaScript syntax.
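Both suggestions in one short sketch (file names are placeholders):

```python
import json
import numpy as np

a = np.arange(12, dtype=float).reshape(4, 3)

# CSV via savetxt/loadtxt: plain text, readable from almost anywhere.
np.savetxt('a.csv', a, delimiter=',')
b = np.loadtxt('a.csv', delimiter=',')

# JSON via tolist(): nested lists that Go's encoding/json can unmarshal.
with open('a.json', 'w') as f:
    json.dump(a.tolist(), f)
with open('a.json') as f:
    c = np.array(json.load(f))
```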

Defining type of binary file string

Is it possible to check what type of data a binary string represents?
For example:
We have binary data read from an image file (e.g. example.jpg). Can we guess that this binary data represents an image file? If yes, how can we do that?
You could look at the python-magic library. It seems to do what you want (although I've never used it myself).
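A hedged sketch of how that library is typically used; the file name is a placeholder:

```python
import magic  # the python-magic package, a wrapper around libmagic

# Guess the type from the file itself, or from bytes already in memory.
print(magic.from_file('example.jpg'))                 # e.g. "JPEG image data, ..."
with open('example.jpg', 'rb') as f:
    print(magic.from_buffer(f.read(2048), mime=True))  # e.g. "image/jpeg"
```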

HDF5 : storing NumPy data

When I used NumPy I stored its data in the native format, *.npy. It's very fast and gave me some benefits, like this one:
I could read *.npy from C code as simple binary data (I mean the *.npy files are binary-compatible with C structures).
Now I'm dealing with HDF5 (PyTables at this moment). As I read in the tutorial, it uses the NumPy serializer to store NumPy data, so can I read these data from C just as I do with simple *.npy files?
Is HDF5's NumPy data binary-compatible with C structures too?
UPD:
I have a Matlab client reading from HDF5, but I don't want to read HDF5 from C++ through the library, because reading binary data from *.npy is several times faster; so what I really need is a way to read HDF5 from C++ with binary compatibility.
So I'm currently using two ways of transferring data: *.npy (read from C++ as raw bytes, from Python natively) and HDF5 (accessed from Matlab).
If possible, I want to use only one way: HDF5. But to do this I have to find a way to make HDF5 binary-compatible with C++ structures. Please help: if there is some way to turn off compression in HDF5, or anything else that makes HDF5 binary-compatible with C++ structures, tell me where I can read about it.
The proper way to read hdf5 files from C is to use the hdf5 API - see this tutorial. In principle it is possible to directly read the raw data from the hdf5 file as you would with the .npy file, assuming you have not used advanced storage options such as compression in your hdf5 file. However, this essentially defeats the whole point of using the hdf5 format, and I cannot think of any advantage to doing this instead of using the proper hdf5 API. Also note that the API has a simplified high-level version which should make reading from C relatively painless.
I feel your pain. I've been dealing extensively with massive amounts of data stored in HDF5 formatted files, and I've gleaned a few bits of information you may find useful.
If you are in "control" of the file creation (and of writing the data - even if you use an API), you should be able to largely circumvent the HDF5 libraries.
If the output datasets are not chunked, they will be written contiguously. As long as you aren't specifying any byte-order conversion in your datatype definitions (i.e. you are specifying that the data should be written in native float/double/integer format), you should be able to achieve "binary compatibility", as you put it.
To solve my problem I wrote an HDF5 file parser using the file specification http://www.hdfgroup.org/HDF5/doc/H5.format.html
With a fairly simple parser you should be able to identify the offset to (and size of) any dataset. At that point simply fseek and fread (in C, that is, perhaps there is a higher level approach you can take in C++).
If your datasets are chunked, then more parsing is necessary to traverse the b-trees used to organize the chunks.
The only other issue you should be aware of is handling (or eliminating) any system-dependent structure padding.
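To illustrate the same idea from Python rather than a hand-written parser, here is a hedged sketch that asks h5py where a contiguous, unfiltered dataset lives and then reads the bytes directly; file and dataset names are placeholders, and get_offset is assumed to be meaningful only for contiguous storage:

```python
import numpy as np
import h5py

# Create a contiguous, unfiltered dataset so the sketch is self-contained.
with h5py.File('plain.h5', 'w') as f:
    f.create_dataset('records', data=np.arange(12, dtype='<f8').reshape(3, 4))

# Ask the library where the dataset lives in the file.
with h5py.File('plain.h5', 'r') as f:
    dset = f['records']
    offset = dset.id.get_offset()     # assumption: defined for contiguous storage
    shape, dtype = dset.shape, dset.dtype

# Then simply seek and read the raw bytes, as the answer describes.
if offset is not None:
    with open('plain.h5', 'rb') as f:
        f.seek(offset)
        raw = np.fromfile(f, dtype=dtype, count=int(np.prod(shape))).reshape(shape)
        print(raw)
```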
HDF5 takes care of binary compatibility of structures for you. You simply have to tell it what your structs consist of (dtype) and you'll have no problems saving/reading record arrays - this is because the type system is basically 1:1 between NumPy and HDF5. If you use h5py, I'm confident the I/O will be fast enough, provided you use all native types and large batched reads/writes (the entire dataset, if allowable). After that it depends on chunking and on which filters you use (shuffle and compression, for example) - it's also worth noting that those can sometimes speed things up by greatly reducing file size, so always look at benchmarks. Note that the type and filter choices are made on the end creating the HDF5 document.
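A minimal sketch of that record-array round trip with h5py; the field names and types are illustrative and should mirror your C struct (mind padding and byte order):

```python
import numpy as np
import h5py

# Illustrative record layout; adjust fields to match your C struct.
rec_dtype = np.dtype([('time', '<f8'), ('value', '<f4'), ('flag', '<i4')])
records = np.zeros(1000, dtype=rec_dtype)

with h5py.File('records.h5', 'w') as f:
    f.create_dataset('records', data=records)   # contiguous, no filters

with h5py.File('records.h5', 'r') as f:
    back = f['records'][:]                       # comes back as a NumPy record array
    print(back.dtype)
```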
If you're trying to parse HDF5 yourself, you're doing it wrong. Use the C++ and C APIs if you're working in C++/C. There are examples of so-called "compound types" on the HDF Group's website.
