I'm just starting to learn pyAgrum. I am looking for functions to both plot a network and print the tables/potentials within a Python session, i.e., without the need for any sort of HTML-based interpreter (pylab, etc.). I'm coming from the R world, where I'm used to this kind of workflow using, e.g., R's version of igraph, and where tables can be printed as ordinary R arrays. I know that pyAgrum's Potentials are lower-level C++ classes, but is there a way to achieve the above? I like to stay in my editor :)
I already answered you somewhere else :-) but for the sake of the other readers :
To print an ASCII version of a table, you can just use the __str__() method. Hence print(p), where p is a Potential, will do the job.
To export an image of a BN, you can use pyAgrum.lib.image:
import pyAgrum as gum
import pyAgrum.lib.image as gimg
bn=gum.fastBN("A->B->C")
gimg.export(bn,"test.pdf")
gimg.exportInference(bn,"test.png",evs={"A":1})
will export test.pdf containing the graph and test.png containing the (graphical) result of an inference.
Quick aside: I'm a bit of a rookie with Python, so forgive my incorrect ways of describing things, and ask me questions if I don't provide enough information.
As my title indicates, I'm attempting to bring in a data set that is a Lisp data structure. I'm trying to start with a smaller data set (I'm going to be dealing with much larger ones eventually); however, I'm unclear as to how I should set up the separators for pandas.
So, I'm bringing in a .dat file containing a Lisp data structure and reading it with pandas (or attempting to).
My goal is to turn it into a normal data set, where I can separate a given function, say, from its respective outputs.
My Lisp Data set looks like the following:
(setf nameoffile?'
((function-1 output1) (function-2 output2 output3 output4) (function-3 output5 output6 output7...)
(function-4 output)
...
(function-N outputN outputM ... )) )
Hopefully this is not too cryptic. Please, let me know if I'm not providing enough information.
Lastly, my goal is to have each function in its own row, with its outputs read across that row in a pandas dataframe (since I'm used to that); for example:
function-1: output1
function-2: output2 and so on and so forth...
Again, please let me know if I'm a bit confusing, or did not provide enough information.
Thank you so much in advance!
EDIT:
My specific question is: how can I load this somewhat ambiguous Lisp data structure into a pandas dataframe? Additionally, I don't know how to arrange the values into the desired rows or how to separate them (delimiter/sep = ?). When I read the file with pandas, I get a very jumbled dataframe. I think the key issue is how to separate the values appropriately.
As noted by @molbdnilo and @sds, it's probably easier to export data from Lisp in a common format and then import them in Python using an existing parser.
For example you can save them to CSV file from Lisp, using the cl-csv library that is also available on quicklisp.
As you can see from the cl-csv tests, you can get a CSV string from your data using the write-csv function:
(write-csv *your-data-rows* :always-quote t)
Or, if you want to proceed line-by-line, you can use write-csv-row function.
Then it will be easy to save the resulting string into a file and read this CSV from Python.
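On the Python side, reading such a file could look roughly like the sketch below. It assumes each CSV row holds a function name followed by that function's outputs; the sample rows and the `rows_to_table` helper are made up for illustration and are not part of cl-csv or pandas. Only the standard library's csv module is used, so rows of different lengths are not a problem.

```python
import csv
import io

# Hypothetical CSV as it might be written from Lisp: one row per function,
# first field is the function name, remaining fields are its outputs.
SAMPLE_CSV = '''"function-1","output1"
"function-2","output2","output3","output4"
"function-3","output5","output6","output7"
'''


def rows_to_table(csv_file):
    """Map each function name to the list of its outputs."""
    table = {}
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        name, *outputs = row
        table[name] = outputs
    return table


# io.StringIO stands in for an open file such as open("data.csv", newline="").
table = rows_to_table(io.StringIO(SAMPLE_CSV))
for name, outputs in table.items():
    print(f"{name}: {', '.join(outputs)}")
```

A dict of this shape can then be turned into whatever tabular layout you prefer, with each function name as a row label and its outputs across the row.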
If your Lisp program isn't already too large, consider rewriting it in Hy. Hy is a Lisp dialect, so you can continue writing in Lisp. But also,
Hy maintains, over everything else, 100% compatibility in both directions with Python itself.
This means you can use Python libraries when writing Hy, and you can write a module in Hy to use in Python.
I don't know how your project is set up (and I don't know pandas), but perhaps you can use this to communicate directly with pandas?
I am working on some algorithms that create network graphs, and I am finding it really hard to debug the output. The code is written in Python, and I am looking for the simplest way to view the resulting network.
Every node has a reference to its parent elements, but a helper function could be written to format the network in any other way.
What is the simplest way to display a network graph from Python? Even if it's not fully written in Python, i.e. it uses some other programs available on Linux, that would be fine.
It sounds like you want something to help debugging the network you are constructing. For this you might want to consider implementing a function that converts your network to DOT, a graph description language, which can then be rendered to a graph visualization using a number of tools, such as GraphViz. You can then log the output from this function to help debug.
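A minimal sketch of such a converter is shown here. It assumes, as in the question, that every node holds references to its parents; the `Node` class and the `to_dot` helper are hypothetical names chosen for illustration, not part of any library.

```python
class Node:
    """Hypothetical node type: each node keeps references to its parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)


def to_dot(nodes):
    """Return a DOT digraph with one edge from each parent to its child."""
    lines = ["digraph network {"]
    for node in nodes:
        # Declare the node explicitly so isolated nodes are drawn too.
        lines.append(f'    "{node.name}";')
        for parent in node.parents:
            lines.append(f'    "{parent.name}" -> "{node.name}";')
    lines.append("}")
    return "\n".join(lines)


a = Node("a")
b = Node("b", parents=[a])
c = Node("c", parents=[a, b])

dot_text = to_dot([a, b, c])
print(dot_text)
```

If the text is saved to a file such as network.dot, Graphviz can render it with `dot -Tpng network.dot -o network.png`. Logging the DOT text itself is also useful, since it is readable enough to diff between runs.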
Have you tried Netwulf? It takes a networkx.Graph object as input and launches an interactive d3-powered visualization in a separate browser window. The resulting image (and data) can then be posted back to Python for further processing.
Disclaimer: I'm a co-author of Netwulf.
Think about using existing graph libraries for your problem domain, e.g. NetworkX. Drawing can be done from there with matplotlib or pygraphviz.
For bigger projects, you might also want to check out a graph database like Neo4j, with its toolkit (and its own query language, Cypher) for working with Python.
GraphML is also a good interchange format; it can be useful with drawing tools like yEd when you have small graphs that need some manual finishing.
You can use Python to automate things in SPSS or to shorten the work, but I need to know whether it is possible to replace SPSS syntax with Python, for example to aggregate data in loops.
Another example: I have 2 datasets with the following variables: id, begin, end and type. Is it possible to put them into different arrays/lists and then compare those arrays/lists, so that at the end I have one new table/dataset with the non-matching entries and another with the matching entries in SPSS?
My idea is to extend the context of matching files in SPSS.
Normally programming languages like Python or PHP can handle this.
Excuse me. I hope someone will understand what I mean.
There are many ways to do this sort of thing with Python. The SPSS module Dataset class allows you to read and write the case data. The spssdata module provides a somewhat simpler way to do this. These are included when you install the Python Essentials. There are also utility modules available from the SPSS Community website. In particular, the extended Transforms module provides a standard lookup function and an interval-based lookup.
I'm not sure, though, that the standard MATCH FILES won't do what you need here. Mismatches will generate missing data in the variables, and you can select subsets based on that criterion.
This question explains several ways to import an SPSS dataset in Python code: Importing SPSS dataset into Python
Afterwards, you can use the standard Python tools to analyze them.
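As a rough sketch of what "standard Python tools" could mean for the matching problem in the question: once both datasets are available as lists of records, set operations split them into matching and non-matching entries. The `Record` type, the `split_matches` helper and the sample values are assumptions made for illustration; the sketch treats two entries as matching only when all four fields are equal.

```python
from collections import namedtuple

# Hypothetical record layout using the variables named in the question.
Record = namedtuple("Record", ["id", "begin", "end", "type"])


def split_matches(left, right):
    """Return (matching, only_left, only_right), comparing whole records."""
    left_set, right_set = set(left), set(right)
    matching = sorted(left_set & right_set)
    only_left = sorted(left_set - right_set)
    only_right = sorted(right_set - left_set)
    return matching, only_left, only_right


first = [
    Record(1, "2020-01-01", "2020-01-31", "A"),
    Record(2, "2020-02-01", "2020-02-15", "B"),
]
second = [
    Record(1, "2020-01-01", "2020-01-31", "A"),
    Record(3, "2020-03-01", "2020-03-10", "A"),
]

matching, only_first, only_second = split_matches(first, second)
print("matching:", matching)
print("only in first:", only_first)
print("only in second:", only_second)
```

If matching should depend on id alone, or on overlapping begin/end intervals, the comparison key would need to change accordingly.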
Note: I've had some success with simply formatting the data in a text file. I can then use any diff tool to compare the files.
The advantage of this approach is that it's usually very easy to write text exporters which sort the data, making it easier for the diff tool to see what is similar.
The drawback is that text only works for simple cases. When your data has a recursive structure, then text is not ideal. In that case, try an XML diff tool.
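The text-export idea can be tried without leaving Python, since the standard library's difflib produces the same kind of output as a command-line diff tool. The sketch below is only an illustration: the `export_sorted` helper and the two sample records are made up, and the records are deliberately flat.

```python
import difflib


def export_sorted(record):
    """Render a flat record as sorted text lines so equal data lines up."""
    return [f"{key}={value}" for key, value in sorted(record.items())]


old = {"id": 1, "type": "A", "end": "2020-01-31"}
new = {"id": 1, "type": "B", "end": "2020-01-31"}

diff = list(
    difflib.unified_diff(
        export_sorted(old),
        export_sorted(new),
        fromfile="old",
        tofile="new",
        lineterm="",
    )
)
print("\n".join(diff))
```

Sorting the keys before export is what keeps unchanged fields aligned, so only the field that really differs shows up as a removed and an added line.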
I saw this just yesterday, but it's for matplotlib which, as far as I know, is Python only. This functionality would be stupendously useful for my work.
Is anything similar available for R? I've looked around and the closest I've seen mentioned is iPlots/Acinonyx, but the websites for those are a few years out of date. Do those packages work reasonably well? I've not seen any examples of their use.
Alternatively, does mpld3/matplotlib/python play well with R? By that I mean, could I load my dataframes in R, use mpld3/matplotlib/python for exploring my data, then make the final/pretty plots in R?
Full disclosure: I'm a newbie (R is the first programming language that I've really tried to learn since QBASIC as a child...).
While R doesn't seem to have anything quite like this yet, I want to note that mpld3 now has a well-defined JSON layout for figure representations, in some ways similar to Vega (but at a much lower level). I'm not an R/ggplot user, but it seems like the ggvis ggplot-to-vega approach could be rather easily adapted to convert from ggplot to mpld3.
I've forgotten how to do linked plots with brushing in R, but I know the capability is there. I use GGobi for that, however - http://ggobi.org/. It's designed for exploratory data analysis using visualizations, and there are R packages to communicate with it and script it.
There's a pretty good book on GGobi - Interactive and Dynamic Graphics for Data Analysis: With R and GGobi.
The R package ggvis will have similar functionality. It is still in relatively early development, as version 0.1 was just tagged a few days ago. (Although that's also true of mpld3).
To answer your second question, yes they work reasonably well together. The easiest way to do what you suggested would use the R magic function in the IPython notebook.
The package JGR provides a java interface for R. From here, you can call the library iplots. In your R terminal, type
install.packages("JGR");
library(JGR);
JGR()
This will open a new window that you can use just like the standard R terminal.
You should now be able to brush using iplots:
X = matrix(rnorm(900), ncol = 3);
iplot(X[,1], X[,2]);
iplot(X[,1], X[,3]);
ihist(X[,1])
Also take a look at http://cranvas.org/ - it might be somewhat hard to install (especially for a newbie) but it's well worth the effort.
I'm looking into speeding up my python code, which is all matrix math, using some form of CUDA. Currently my code is using Python and Numpy, so it seems like it shouldn't be too difficult to rewrite it using something like either PyCUDA or CudaMat.
However, on my first attempt using CudaMat, I realized I had to rearrange a lot of the equations in order to keep the operations all on the GPU. This included the creation of many temporary variables so I could store the results of the operations.
I understand why this is necessary, but it turns what were once easy-to-read equations into somewhat of a mess that is difficult to inspect for correctness. Additionally, I would like to be able to easily modify the equations later on, which isn't easy in their converted form.
The package Theano manages to do this by first creating a symbolic representation of the operations, then compiling them to CUDA. However, after trying Theano out for a bit, I was frustrated by how opaque everything was. For example, just getting the actual value for myvar.shape[0] is made difficult since the tree doesn't get evaluated until much later. I would also much prefer, rather than a framework to which my code must conform, a library that acts invisibly in place of Numpy.
Thus, what I would really like is something much simpler. I don't want automatic differentiation (there are other packages like OpenOpt that can do that if I require it), or optimization of the tree, but just a conversion from standard Numpy notation to CudaMat/PyCUDA/somethingCUDA. In fact, I want to be able to have it evaluate to just Numpy without any CUDA code for testing.
I'm currently considering writing this myself, but before even considering such a venture, I wanted to see if anyone else knows of similar projects or a good starting place. The only other project I know that might be close to this is SymPy, but I don't know how easy it would be to adapt to this purpose.
My current idea would be to create an array class that looks like a Numpy.array class. Its only function would be to build a tree. At any time, that symbolic array class could be converted to a Numpy array class and be evaluated (there would also be a one-to-one parity). Alternatively, the array class could be traversed and have CudaMat commands be generated. If optimizations are required they can be done at that stage (e.g. re-ordering of operations, creation of temporary variables, etc.) without getting in the way of inspecting what's going on.
Any thoughts/comments/etc. on this would be greatly appreciated!
Update
A usage case may look something like (where sym is the theoretical module), where we might be doing something such as calculating the gradient:
W = sym.array(np.random.random(size=(numVisible, numHidden)))
delta_o = -(x - z)
delta_h = sym.dot(delta_o, W)*h*(1.0-h)
grad_W = sym.dot(x.T, delta_h)
In this case, grad_W would actually just be a tree containing the operations that needed to be done. If you wanted to evaluate the expression normally (i.e. via Numpy) you could do:
npGrad_W = grad_W.asNumpy()
which would just execute the Numpy commands that the tree represents. If on the other hand, you wanted to use CUDA, you would do:
cudaGrad_W = grad_W.asCUDA()
which would convert the tree into expressions that can be executed via CUDA (this could happen in a couple of different ways).
That way it should be trivial to: (1) test grad_W.asNumpy() == grad_W.asCUDA(), and (2) convert your pre-existing code to use CUDA.
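To make the idea concrete, here is a very small sketch of such a tree-building class. Everything in it is hypothetical: the `Sym` class, its `evaluate` method and the `PLAIN_BACKEND` table are illustrative names, and plain Python floats stand in for Numpy arrays so that the sketch has no dependencies. The point is only that operators record nodes instead of computing, and that the backend is chosen at evaluation time.

```python
import operator


class Sym:
    """Hypothetical lazy value: records operations instead of running them."""

    def __init__(self, op, args=(), value=None):
        self.op = op        # name of the operation, or "const" for a leaf
        self.args = args    # child nodes
        self.value = value  # payload for leaves only

    @staticmethod
    def const(value):
        return Sym("const", value=value)

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Sym) else Sym.const(x)

    def __add__(self, other):
        return Sym("add", (self, Sym._wrap(other)))

    def __sub__(self, other):
        return Sym("sub", (self, Sym._wrap(other)))

    def __rsub__(self, other):
        return Sym("sub", (Sym._wrap(other), self))

    def __mul__(self, other):
        return Sym("mul", (self, Sym._wrap(other)))

    def __neg__(self):
        return Sym("neg", (self,))

    def evaluate(self, backend):
        """Walk the tree, applying the backend's function for each node."""
        if self.op == "const":
            return self.value
        args = [arg.evaluate(backend) for arg in self.args]
        return backend[self.op](*args)


# Stand-in for the "asNumpy" path; a CUDA path would supply another table
# (or traverse the tree to emit commands) without touching the equations.
PLAIN_BACKEND = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "neg": operator.neg,
}

x, z, h = Sym.const(3.0), Sym.const(1.0), Sym.const(0.5)
delta = -(x - z) * h * (1.0 - h)  # builds a tree, computes nothing yet
result = delta.evaluate(PLAIN_BACKEND)
print(delta.op, result)
```

Because the equation line itself never mentions a backend, comparing two backends amounts to calling `evaluate` twice on the same tree with different tables.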
Have you looked at the GPUArray portion of PyCUDA?
http://documen.tician.de/pycuda/array.html
While I haven't used it myself, it seems like it would be what you're looking for. In particular, check out the "Single-pass Custom Expression Evaluation" section near the bottom of that page.