Loading a Neo4j subgraph into Networkx

I have been working with Neo4j through Python's Bulbflow and now need a way to save or export subgraphs. I have seen Java and even Ruby approaches for doing this, but a simple Python approach seems to be hiding from me.
So far, I have found two potential paths:
Accessing Geoff through py2neo, but there is surprisingly little documentation on extracting a subgraph from a large local Neo4j database or from a Neo4j server.
Using Networkx:
I found that Networkx can load graphs from many different formats (I am unsure which format Neo4j uses to store its databases), but I haven't found a way to extract only a subgraph into Networkx. I assume this should be done with a Gremlin query, but I'm not sure how to go about it.
I have a preference for the Networkx path, as it also comes with network analysis algorithms I wish to apply to subgraphs. I feel it would also avoid potential clashes between Bulbflow and py2neo, although I'm not sure whether such a clash would exist.
Any advice would be much appreciated!
Thanks in advance

I didn't know the answer until you asked, but it seems you can just export to GML, which Networkx can read. Here are a few answers that might be useful:
Neo4j export Tree
Convert Neo4j DB to XML?
https://github.com/tinkerpop/gremlin/wiki/Gremlin-Methods
Hope that helps.

I know it's an old question, but if you stumbled upon it like I did, know that Networkx has a subgraph method, so you can load the whole graph from Neo4j and use that.
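A minimal sketch of the subgraph approach, assuming the edges have already been pulled out of Neo4j as (source, target) pairs. The edge list and node names below are made up for illustration and stand in for whatever the export produces.

```python
import networkx as nx

# Hypothetical edge list standing in for data exported from Neo4j.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]

G = nx.Graph()
G.add_edges_from(edges)

# subgraph() returns a read-only view induced on the given nodes;
# copy() turns it into an independent graph that can be modified.
sub = G.subgraph(["a", "b", "c"]).copy()

print(sorted(sub.nodes()))
print(sub.number_of_edges())
```

Networkx analysis functions can then be applied to sub directly. Note that this loads the whole graph into memory first, which may not be practical for a very large database.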

Related

QGIS python processing tools - saving/exporting graph

I just tried QGIS processing tools for the first time.
When I use the shortest path search, the algorithm builds a path graph (of course). However, the graph is never re-used: every time the algorithm runs, the graph building starts again. I am not familiar with the exact code used for it, but I guess the graph is network-wide and not related to the specific points I selected. So is there a way to re-use the graph? Or even export it to a file?
My network is large (more than 200k features), so efficiency is important. The network is rarely updated, so calculating a graph could easily be done just once in a while.
I looked up the docs and settings of processing tools, and this option seems to be unavailable (which is surprising). So maybe I am missing something, or maybe someone could suggest a way to serialize the graph and save it using python code? I am using QGIS 3.1 (A Coruna).
As found on anitagraser's GitHub page, processing tools that use graphs look straightforward and use Dijkstra's algorithm.
builder = QgsGraphBuilder( crs )
graph = builder.graph()
from_id = graph.findVertex(from_point)
to_id = graph.findVertex(to_point)
(tree,cost) = QgsGraphAnalyzer.dijkstra(graph,from_id,0)
I guess using that would require building a different tool instead of using the user-friendly shortest path search tool, with its nice GUI, integrated point selection and so on. That would not be acceptable. The goal was to be able to perform the same task on any PC with an unaltered QGIS (no add-ins needed, only a script). But it may be that this is not possible. So the problem leads to:
Is it possible to tweak the existing processing tool to cache the graph or even store it in a file?
Can I somehow duplicate the tool and apply some small changes?
I already asked this question on GIS Stack Exchange. My question probably has a programming answer, so I am reposting it here as an exception.

Networkit graphEvent (python)

Another Networkit question. It seems like this module doesn't get much support (and I certainly don't want to open issues on GitHub just to get help), but you don't get if you don't ask. From reading the docs, it seems there are a lot of functions to perform certain operations in an optimal way, but often I just don't get how to use those functions.
This time I am trying to understand what a GraphEvent is. Let's say that I build a graph, I calculate the connected components and then I remove edges and nodes iteratively, based on some condition; then I want to calculate the connected components again. I thought that I could do something like:
cc=components.DynConnectedComponents(G)
cc.run()
...
#edge removals
...
cc.update()
but components.DynConnectedComponents(Graph).update(GraphEvent), which updates the connected components after an event, requires a GraphEvent object, and I haven't the slightest idea what that might be or how to handle it. There's nothing in the docs that clarifies it, and I would appreciate it a lot if someone could explain this to me.
Thanks!

I received an answer to another question where GraphEvent is explained too.

How to create a UDF for Hive using Python with a 3rd-party package like sklearn?

I know how to create a Hive UDF with TRANSFORM and USING, but I can't use sklearn because not all the nodes in the Hive cluster have sklearn installed.
I have an anaconda2.tar.gz with sklearn. What should I do?

I recently started looking into this approach, and I feel the problem is not about getting sklearn onto all the Hive nodes (as you mentioned above) but is rather a compatibility issue. I think sklearn is not (yet) designed to run as a parallel algorithm that can process large amounts of data in a short time.
What I'm trying, as an approach, is to connect Python to Hive through PyHive (for example) and make the necessary sklearn calls within that code. The rough assumption here is that this sklearn-Hive-Python code will run on each node and deal with the data at the map-reduce level.
I cannot say this is the right solution or the correct approach (yet), but this is what I can conclude after searching for some time.
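For reference, the TRANSFORM ... USING mechanism mentioned in the question streams each row to the script as a tab-separated line on standard input and reads tab-separated lines back from standard output (the default row format). A minimal sketch of such a script follows. The score function is a hypothetical placeholder for wherever a model call would go, and the sample rows are made up.

```python
import sys


def score(fields):
    """Hypothetical placeholder: a real script might call a model here."""
    return str(len(fields))


def transform(lines):
    """Yield each input row with the score appended as an extra column."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields + [score(fields)])


# Inside Hive the input would be sys.stdin; made-up rows are used here
# so the sketch runs on its own.
sample_rows = ["1\tfoo\n", "2\tbar\n"]
for row in transform(sample_rows):
    sys.stdout.write(row + "\n")
```

Whatever the script imports has to be available to the Python interpreter on every node that runs it, which is the problem the question describes.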

create p2p-network with "save-option" in python

I need an implementation of a network with nodes (fewer than 100) in Python. Nodes can send a response to all nodes and to their two neighbor nodes. Nodes can save small amounts of data. Does anyone know of such a library?
I use btpeer http://cs.berry.edu/~nhamid/p2p/framework-python.html, but there is no "save-data"-option.

You may find doozerd an interesting concept. I am not sure about the consistency guarantees it provides, though.

Simple Network Graph Plotting in Python?

I am working on some algorithms that create network graphs, and I am finding it really hard to debug the output. The code is written in Python, and I am looking for the simplest way to view the resulting network.
Every node has a reference to its parent elements, but a helper function could be written to format the network in any other way.
What is the simplest way to display a network graph from Python? Even if it's not fully written in Python, i.e. it uses some other programs available on Linux, it would be fine.

It sounds like you want something to help debugging the network you are constructing. For this you might want to consider implementing a function that converts your network to DOT, a graph description language, which can then be rendered to a graph visualization using a number of tools, such as GraphViz. You can then log the output from this function to help debug.
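A minimal sketch of such a conversion function, assuming each node has a name and a list of references to its parents as described in the question. The Node class and the attribute names are hypothetical stand-ins for the real data structure.

```python
class Node:
    """Hypothetical node type: a name plus references to its parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)


def to_dot(nodes, graph_name="network"):
    """Return DOT text with one edge per parent -> child link.

    Assumes node names contain no double quotes.
    """
    lines = [f"digraph {graph_name} {{"]
    for node in nodes:
        lines.append(f'    "{node.name}";')
        for parent in node.parents:
            lines.append(f'    "{parent.name}" -> "{node.name}";')
    lines.append("}")
    return "\n".join(lines)


root = Node("root")
left = Node("left", [root])
right = Node("right", [root])
leaf = Node("leaf", [left, right])

print(to_dot([root, left, right, leaf]))
```

Saving the output to a file such as network.dot and running dot -Tpng network.dot -o network.png with GraphViz installed renders it as an image.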

Have you tried Netwulf? It takes a networkx.Graph object as input and launches an interactive d3-powered visualization in a separate browser window. The resulting image (and data) can then be posted back to Python for further processing.
Disclaimer: I'm a co-author of Netwulf.

Think about using existing graph libraries for your problem domain, e.g. NetworkX. Drawing can be done from there with matplotlib or pygraphviz.
For bigger projects, you might also want to check out a graph database like Neo4j, which has its own toolkit (and its own query language, Cypher) for working with Python.
GraphML is also a good interchange markup; it can be useful with drawing tools like yEd when you have small graphs that need some manual finishing.
