MATLAB output format that Python can consume

I am pretty new to MATLAB but relatively familiar with Python. I am now using an existing MATLAB code base, but I want the program to generate output that Python can consume. The standard output format, as far as I know, is .mat, which is a binary format.
Another function I considered is the built-in csvwrite, but the problem is that the variable I want to output is more like a dictionary, and it can have several levels of subfields (e.g., feature.subfeature.subsubfeature = [1, 2, 3]). Another possibility is to output JSON, but there seems to be no built-in method for writing JSON. There are toolboxes I could use, but I don't have sudo permission on the machine I am using.
Any suggestions on a better way to output a format that Python can consume? Thanks.

The solution with the least effort on the MATLAB side would be to use SciPy, which can read and write mat-files via scipy.io.
The mat format is binary, but (perhaps surprisingly) open and documented by MathWorks.
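For illustration, a minimal sketch of the Python side, assuming the MATLAB code saved a struct like the question's feature.subfeature.subsubfeature = [1, 2, 3] to a file named out.mat (the file name is just an assumption):

import scipy.io

# struct_as_record=False maps MATLAB structs to objects with attribute
# access; squeeze_me=True drops the singleton dimensions MATLAB adds.
data = scipy.io.loadmat("out.mat", squeeze_me=True, struct_as_record=False)
feature = data["feature"]
print(feature.subfeature.subsubfeature)  # -> array([1, 2, 3])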


pyAgrum - printing and plotting

I'm just starting to learn pyAgrum. I am looking for functions to both plot a network and print the tables/potentials within a Python session, i.e., without the need for any sort of HTML-based interpreter (pylab, etc.). I'm coming from the R world, where I'm used to this kind of workflow using R's version of igraph, for example, and where tables can be printed as ordinary R arrays. I know that pyAgrum's Potentials are lower-level C++ classes, but is there a way to achieve the above? I like to stay in my editor :)
I already answered you somewhere else :-) but for the sake of the other readers:
To print an ASCII version of a table, you can just use the __str__() method. Hence print(p), where p is a Potential, will do the job.
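For instance, a minimal self-contained sketch (the network and node names are just for illustration):

import pyAgrum as gum

bn = gum.fastBN("A->B->C")
p = bn.cpt("B")  # a Potential: the conditional probability table of B
print(p)         # ASCII rendering via Potential.__str__()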
To export an image of a BN, you can use pyAgrum.lib.image:
import pyAgrum as gum
import pyAgrum.lib.image as gimg

bn = gum.fastBN("A->B->C")
gimg.export(bn, "test.pdf")
gimg.exportInference(bn, "test.png", evs={"A": 1})
will export test.pdf containing the graph and test.png containing the (graphical) result of an inference.

storing matrices in golang in compressed binary format

I am exploring a comparison between Go and Python, particularly for mathematical computation. I noticed that Go has a matrix package, mat64.
1) I wanted to ask someone who uses both Go and Python: are there comparable functions/tools for Go's matrices that are equivalent to NumPy's savez_compressed, which stores data in the npz format (i.e., "compressed" binary, multiple matrices per file)?
2) Also, can Go's matrices handle string types like NumPy does?
1) .npz is a NumPy-specific format. It is unlikely that Go itself would ever support this format in the standard library. I also don't know of any third-party library that exists today, and a (10-second) search didn't turn one up. If you need npz specifically, go with Python + NumPy.
If you just want something similar from Go, you can use any format. Binary options include Go's encoding/binary and encoding/gob packages. Depending on what you're trying to do, you could even use a non-binary format like JSON and just compress it on your own (see the sketch after this answer).
2) Go doesn't have built-in matrices. The library you found is third party, and it only handles float64s.
However, if you just need to store strings in matrix (n-dimensional) format, you would use an n-dimensional slice. For two dimensions it looks like this: var myStringMatrix [][]string.
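A sketch of the "non-binary format, compressed yourself" idea mentioned above, shown in Python for brevity (the same approach works in Go with encoding/json and compress/gzip):

import gzip
import json

matrix = [["a", "b"], ["c", "d"]]

# Write the matrix as JSON and gzip it in one step.
with gzip.open("matrix.json.gz", "wt") as f:
    json.dump(matrix, f)

# Reading it back recovers the original nested lists.
with gzip.open("matrix.json.gz", "rt") as f:
    assert json.load(f) == matrix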
npz files are zip archives. Archiving and compression (optional) are handled by the Python zipfile module. The npz contains one npy file for each variable that you save. Any OS-based archiving tool can decompress and extract the component .npy files.
So the remaining question is: can you simulate the npy format? It isn't trivial, but it isn't especially difficult either. It consists of a header block that contains shape, strides, dtype, and order information, followed by a data block, which is, effectively, a byte image of the data buffer of the array.
So the buffer information and data are closely tied to the NumPy array internals. And if a variable isn't a normal array, np.save falls back on the Python pickle mechanism.
For a start I'd suggest using the csv format. It's not binary, and not fast, but everyone and his brother can generate and read it. We constantly get SO questions about reading such files using np.loadtxt or np.genfromtxt. Look at the code for np.savetxt to see how NumPy produces such files. It's pretty simple.
Another general-purpose choice would be JSON, using an array's tolist method. That comes to mind because Go is Google's home-grown alternative to Python for web applications. JSON is a cross-language format based on simplified JavaScript syntax.
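To make both suggestions concrete, here is a minimal Python sketch of writing an npz file and a JSON alternative via tolist() (the file names are illustrative):

import json
import zipfile

import numpy as np

a = np.arange(6).reshape(2, 3)

# npz is just a zip archive containing one .npy file per keyword argument.
np.savez_compressed("mats.npz", a=a)
print(zipfile.ZipFile("mats.npz").namelist())  # ['a.npy']

# Cross-language alternative: JSON via tolist().
with open("mats.json", "w") as f:
    json.dump({"a": a.tolist()}, f)
with open("mats.json") as f:
    restored = np.array(json.load(f)["a"])
assert (a == restored).all()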

Is it possible to use Python in SPSS to bypass limitations of the SPSS syntax (e.g., in loops)?

I know you can use Python to automate things in SPSS or to shorten the way, but I need to know if it is possible to replace SPSS syntax with Python, for example to aggregate data in loops.
Another example: I have two datasets with the variables id, begin, end, and type. Is it possible to put them into different arrays/lists and then compare those lists, so that at the end I have one new table/dataset with the non-matching entries and one with the matching entries, in SPSS?
My idea is to extend what matching files can do in SPSS. Normally, programming languages like Python or PHP can handle this.
Excuse me, I hope someone understands what I mean.
There are many ways to do this sort of thing with Python. The spss module's Dataset class allows you to read and write the case data. The spssdata module provides a somewhat simpler way to do this. Both are included when you install the Python Essentials. There are also utility modules available from the SPSS Community website; in particular, the extended Transforms module provides a standard lookup function and an interval-based lookup.
I'm not sure, though, that the standard MATCH FILES command won't do what you need here. Mismatches will generate missing data in the variables, and you can select subsets based on that criterion.
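If you do pull the case data into Python (e.g., with the spssdata module), the matching step itself is plain Python. A minimal sketch, with made-up rows of (id, begin, end, type):

# Two datasets already read into lists of (id, begin, end, type) tuples.
ds1 = [(1, "2020-01", "2020-02", "A"), (2, "2020-03", "2020-04", "B")]
ds2 = [(2, "2020-03", "2020-04", "B"), (3, "2020-05", "2020-06", "C")]

ids2 = {row[0] for row in ds2}
matching = [row for row in ds1 if row[0] in ids2]
non_matching = [row for row in ds1 if row[0] not in ids2]

print(matching)      # [(2, '2020-03', '2020-04', 'B')]
print(non_matching)  # [(1, '2020-01', '2020-02', 'A')]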
This question explains several ways to import an SPSS dataset into Python: Importing SPSS dataset into Python
Afterwards, you can use the standard Python tools to analyze them.
Note: I've had some success with simply formatting the data in a text file. I can then use any diff tool to compare the files.
The advantage of this approach is that it's usually very easy to write text exporters that sort the data, making it easier for the diff tool to see what is similar.
The drawback is that text only works for simple cases. When your data has a recursive structure, text is not ideal; in that case, try an XML diff tool.

Streaming in Python for scientific data analysis

I just began using Hadoop on a single-node cluster on my laptop, and I tried to do it in Python, which I know better than Java. Apparently streaming is the simplest way to do so without installing any other packages.
Well, my question is: when I do a little data analysis with streaming, I had to:
1) Transform my data (matrix, array, ...) into a text file that fits the default input format for streaming.
2) Reconstruct my data in my mapper.py to make explicit (key, value) pairs and print them out.
3) Read the result in text format and transform it back into matrix data so that I could do other things with it.
When you do a word count with a text file as input, everything looks fine. But how do you handle data structures within streaming? The way I did it seems unacceptable...
For Python and Hadoop, look at the mrjob package: http://pythonhosted.org/mrjob/
You can write your own encoding/decoding protocol, streaming each matrix row as a rownum-values pair, or every element as a row:col-value pair, and so on.
Either way, Hadoop is not the best framework for matrix operations, since it is designed for big amounts of non-interrelated data, i.e., for when your key-value processing does not depend on other values, or depends on them only in a very limited way.
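For instance, a minimal mrjob sketch (the job name and its row-sum logic are just illustrative; mrjob serializes intermediate values as JSON by default):

import json

from mrjob.job import MRJob

class MRRowSum(MRJob):
    # Sum each row of a matrix stored as JSON lines like
    # {"row": 1, "values": [1, 0, 0, 0]} (see the next answer).
    def mapper(self, _, line):
        record = json.loads(line)
        yield record["row"], sum(record["values"])

    def reducer(self, row, partial_sums):
        yield row, sum(partial_sums)

if __name__ == "__main__":
    MRRowSum.run()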
Using json as a text format makes for very convenient encoding and decoding.
For example, a 4x4 identity matrix on HDFS could be stored as:
{"row":3, "values":[0,0,1,0]}
{"row":2, "values":[0,1,0,0]}
{"row":4, "values":[0,0,0,1]}
{"row":1, "values":[1,0,0,0]}
In the mapper, use json.loads() from the json library to parse each line into a Python dictionary, which is very easy to manipulate. Then emit a key followed by more JSON (use json.dumps() to encode a Python object as JSON):
1 {"values":[1,0,0,0]}
2 {"values":[0,1,0,0]}
3 {"values":[0,0,1,0]}
4 {"values":[0,0,0,1]}
In the reducer, use json.loads() on the values to create a Python dictionary. These could then easily be converted into a NumPy array, for example.
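A minimal sketch of the corresponding mapper.py (the reducer would mirror it, calling json.loads() on each incoming value):

#!/usr/bin/env python
# mapper.py: parse each JSON line and emit "key<TAB>json" pairs,
# which is the format Hadoop streaming expects on stdout.
import json
import sys

for line in sys.stdin:
    record = json.loads(line)
    print("%s\t%s" % (record["row"], json.dumps({"values": record["values"]})))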

Using matlab matrix as input for a python code

Is it possible to use a matrix generated with MATLAB and saved in a binary file as input to a Python script?
Of course it's possible. Bits are bits; you just need to know how to interpret them :). Fortunately for you, it looks like someone has already done the hard work of figuring out the MATLAB file format and has written a reader for it ... Have a look at the scipy.io module. Specifically, the loadmat function might be useful.
SciPy isn't in the Python standard library, but generally speaking, if you're going to use Python to replicate something done in MATLAB, you'll probably want to have it and its sibling package NumPy installed.
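A minimal sketch, assuming the file was produced in MATLAB with save('matrix.mat', 'M'); the file and variable names are placeholders for illustration:

from scipy.io import loadmat

contents = loadmat("matrix.mat")  # dict mapping variable names to arrays
M = contents["M"]                 # numpy.ndarray holding the MATLAB matrix
print(M.shape, M.dtype)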
