Retrieving node locations from pydotplus (or any layered graph drawing engine)

Retrieving node locations from pydotplus (or any layered graph drawing engine) - python

I'm preparing a layered graph drawing using a dataframe containing node data:
type label
0 Class Insurance Product
1 Class Person
2 Class Address
3 Class Insurance Policy
And another containing relationship data:
froml tol rel fromcard tocard
0 Insurance Policy Insurance Product ConveysProduct One One
1 Person Insurance Policy hasPolicy One Many
2 Person Address ResidesAt None None
I populate a pydotplus dot graph with the content, which I can then use to generate a rendering:
pdp_graph = pydotplus.graphviz.Dot(graph_name="pdp_graph", graph_type='digraph', prog="dot")
for i,e in b_rels_df.iterrows():
edge = pydotplus.graphviz.Edge(src=e['froml'], dst=e['tol'], label=e['rel'])#, set_fromcard=e['fromcard'], set_tocard=e['tocard'])
pdp_graph.add_edge(edge)
for i,n in ents_df.iterrows():
node = pydotplus.graphviz.Node(name=n['label'], set_type=n['type'], set_label=n['label'])
pdp_graph.add_node(node)
png = pdp_graph.create_png()
display(Image(png))
So far so good - but now I want to retrieve the node positions for use in my own interactive layout (the png is a nice example/diagram, but I want to build upon it), so am attempting to retrieve the node locations calculated via:
[n.get_pos() for n in pdp_graph.get_nodes()]
But this only returns:
> [None, None, None, None]
I've tried lots of different methods, graphviz/dot are installed fine - as proven by the image of the layout - how can I extract the positions of the nodes as data from any type of dot-style layout?
There is a way I can do this via the pygraphviz library via networkx, but the installation-overhead restricts me (pygraphviz needs to be recompiled to cinch with the graphviz install) from being able to use that for the target installations where I've less control over the base environments, hence my attempt to use pydotplus, which appears less demanding in terms of install requirements.
How do I retrieve the layout data from a layered graph drawing using this setup (or one similar), such that I can use it elsewhere? I'm looking for x,y values that I can map back to the nodes that they belong to.

I know nothing about your python setup. ( As usual with python it seems awkward and restrictive )
I suggest using Graphviz directly. In particular the 'attributed DOT format' which is a plain text file containing the layout ( resulting from the layout engine ) produced when the engine is run with the command option -Tdot. The text file is easily parsed to get exactly what you need.
Here is a screenshot of the first paragraph of the relevant documentation
The graphviz.org website contains all the additional details you may need.

You are creating an output file of png format. All position data is lost in this process. Instead, create output format="dot". Then, read that back in & modify as desired.

Related

pyosmium - Build a Geojson linestring based on OSM Relation

I have a python script to analyse OSM data, and the objective is to build a GeoJson with specific data issued from OSM relation.
I'm currently focusing on OSM relation that represents 'hiking' trail like this one.
According to the document:
members
(read-only) Ordered list of relation members. See osmium.osm.RelationMemberList.
the relation object has an attribute members which collects all members of the relation.
Hence The first part of the script manages to extract all relation that have a tag sac_scale=hiking and collects all its ways.
The following script is on purpose focusing only on 1 specific relation : r104369
class HikingWaysfromRelations(osmium.SimpleHandler):
def __init__(self):
super(HikingWaysfromRelations, self).__init__()
self.dict = {}
def _getWays(self, elem, elem_type):
# tag
if 'sac_scale' in elem.tags and elem.tags['sac_scale']=='hiking' and elem.id==104369:
list=[]
for mem in elem.members:
if mem.type=="w":
list.append(str(mem.ref))
self.dict["r"+str(elem.id)]=list
else:
pass
def relation(self,r):
self._getWays(r, "relation")
ml = HikingWaysfromRelations()
ml.apply_file('../pbf/new-zealand.osm.pbf')
The result is a dictionary containing the expected relation as the only key, and its ways:
{"r104369": ["191668175", "765285136", "765285135", "765285138", "765285139", "191668225", "765542429", "765542430", "765542432", "765542431", "765542435", "765542436", "765542434", "765542433", "765542437", "765542438", "765542439", "765542440", "765542441", "765542442", "765548983", "271277386", "765548985", "765548984", "684295241", "684295239", "464603363", "464603364", "464607430", "299788481", "178920047", "155711655", "155711646", "684294192", "259362037", "684294189", "259362038", "259362041", "259362036", "259362043", "259362039", "259362040"]}
Now the question is: How to build a GeoJson containing a single Feature MultiLineString that connects all those ways and rebuild the expected hiking trail?
Based on what I've found on the net, I should re-run a simpleHander on the full .pbf file, and each time I encounter a way I'm looking for - based on the values of the above dictionary - I could reconstruct a LineString with:
import shapely.wkb as wkblib
wkbfab = osmium.geom.WKBFactory()
def getRelationGeometry(elem):
wkb=wkbfab.create_linestring(elem)
return wkb
The issue is that it looks like some ways have only 1 node, hence triggering following error:
RuntimeError: need at least two points for linestring (way_id=155711655)
So what would be the solution to re-build a GeoJson feature - multiLineString - of multiple ways, to be able to plot the result on https://geojson.io/#map=2/20.0/0.0 ?
How for instance openstreetmap manages to re-build the track of a relation when I hit link if not by connecting all nodes (from all ways) issued from the relation ?
Thanks a lot for your help
I know there is way with bash, where you first filter the initial pbf by keeping only relation with the tag sac_scale=hiking, and then transforming this filtered result to GeoJson - but I really want to be able to generate the same with python to understand how OSM data are stored. I just can't figure out an easy way to do so, knowing pyosmium is equivalent (supposedly) to osmium, I believe there should be an easy way there too
osmium export output/output_food-drinks.pbf -f geojson

Looking at the way with the id shown in the error in your post (155711655), it has two nodes, not one. Visible here as of the time of this answer.
Knowing that, I can think of two reasons why you would get that error:
You're not passing in the argument location=True to the apply_file method as suggested by the documentation:
Because of the way that OSM data is structured, osmium needs to internally cache node geometries, when the handler wants to process the geometries of ways and areas. The apply_file() method cannot deduce by itself if this cache is needed. Therefore locations need to be explicitly enabled by setting the locations parameter to True:
h.apply_file("test.osm.pbf", locations=True, idx='flex_mem')
Looking at your code above, the apply_file method only has the input file as an argument, so I think this is likely your problem.
The way may be referencing a node that is missing in your pbf extract. This is simple to verify with the osmium cli tool:
osmium check-refs <your pbf file>
This is the result I get from running that on a valid pbf of my own
There are 6702885 nodes, 431747 ways, and 2474 relations in this file.
Nodes in ways missing: 0
Note the Nodes in ways missing: 0.

Python friends network visualization

I have hundreds of lists (each list corresponds to 1 person). Each list contains 100 strings, which are the 100 friends of that person.
I want to 3D visualize this people network based on the number of common friends they have. Considering any 2 lists, the more same strings they have, the closer they should appear together in this 3D graph. I wanted to show each list as a dot on the 3D graph without nodes/connections between the dots.
For brevity, I have included only 3 people here.
person1 = ['mike', 'alex', 'arker','locke','dave','david','ross','rachel','anna','ann','darl','carl','karle']
person2 = ['mika', 'adlex', 'parker','ocke','ave','david','rosse','rachel','anna','ann','darla','carla','karle']
person3 = ['mika', 'alex', 'parker','ocke','ave','david','rosse','ross','anna','ann','darla','carla','karle', 'sasha', 'daria']

Gephi Setup steps:
Install Gephi and then start it
You probably want to upgrade all the plugins now, see the button in the lower right corner.
Now create a new project.
Make sure the current workspace is Workspace1
Enable Graph Streaming plugin
In Streaming tab that then appears configure server to use http and port 8080
start the server (it will then have a green dot underneath it instead of a red dot).
Python steps:
install gephistreamer package (pip install gephistreamer)
Copy the following python cod to something like friends.py:
from gephistreamer import graph
from gephistreamer import streamer
import random as rn
stream = streamer.Streamer(streamer.GephiWS(hostname="localhost",port=8080,workspace="workspace1"))
szfak = 100 # this scales up everything - somehow it is needed
cdfak = 3000
nodedict = {}
def addfnode(fname):
# grab the node out of the dictionary if it is there, otherwise make a newone
if (fname in nodedict):
nnode = nodedict[fname]
else:
nnode = graph.Node(fname,size=szfak,x=cdfak*rn.random(),y=cdfak*rn.random(),color="#8080ff",type="f")
nodedict[fname] = nnode # new node into the dictionary
return nnode
def addnodes(pname,fnodenamelist):
pnode = graph.Node(pname,size=szfak,x=cdfak*rn.random(),y=cdfak*rn.random(),color="#ff8080",type="p")
stream.add_node(pnode)
for fname in fnodenamelist:
print(pname+"-"+fname)
fnode = addfnode(fname)
stream.add_node(fnode)
pfedge = graph.Edge(pnode,fnode,weight=rn.random())
stream.add_edge(pfedge)
person1friends = ['mike','alex','arker','locke','dave','david','ross','rachel','anna','ann','darl','carl','karle']
person2friends = ['mika','adlex','parker','ocke','ave','david','rosse','rachel','anna','ann','darla','carla','karle']
person3friends = ['mika','alex','parker','ocke','ave','david','rosse','ross','anna','ann','darla','carla','karle','sasha','daria']
addnodes("p1",person1friends)
addnodes("p2",person2friends)
addnodes("p3",person3friends)
Run it with the command python friends.py
You should see all the nodes appear. There are then lots of ways you can lay it out to make it look better, I am using the Force Atlas layouter here and you can see the parameters I am using on the left.
Some notes:
you can get the labels to show or disappear by clicking on the T on the bottom status/control bar.
View the data in the nodes and edges by opening Window/Data Table.
It is a very rich program, there are more options than you can shake a stick at.
You can set more properties on your nodes and edges in the python code and then they will show up in the data table view and can be used to filter, etc.
You want to pay attention to that update button in the bottom right corner of Gephi, there are a lot of bugs to fix.
This will get you started (as you asked), but for your particular problem:
you will also need to calculate weights for your persons (the "p" nodes), and link them to each other with those weights
Then you need to find a layouter and paramters that positions those nodes the way you want them based on the new weights.
So you don't really need to show the type="f" nodes, you need just the "p" nodes.
The weight between to "p" nodes should be based on the intersection of the sets of the friend names.
There are also Gephi plugins that can then display this in 3D, but that is actually a completely separate issue, you probably want to get it working in 2D first.
This is running on Windows 10 using Anaconda 4.4.1 and Python 3.5.2 and Gephi 0.9.1.

Django finding paths between two vertexes in a graph

This is mostly a logical question, but the context is done in Django.
In our Database we have Vertex and Line Classes, these form a (neural)network, but it is unordered and I can't change it, it's a Legacy Database
class Vertex(models.Model)
code = models.AutoField(primary_key=True)
lines = models.ManyToManyField('Line', through='Vertex_Line')
class Line(models.Model)
code = models.AutoField(primary_key=True)
class Vertex_Line(models.Model)
line = models.ForeignKey(Line, on_delete=models.CASCADE)
vertex = models.ForeignKey(Vertex, on_delete=models.CASCADE)
Now, in the application, the user will be able to visually select TWO vertexes (the green circles below)
The javascript will then send the pk of these two Vertexes to Django, and it has to find the Line classes that satisfy a route between them, in this case, the following 4 red Lines :
Business Logic:
A Vertex can have 1-4 Lines related to it
A Line can have 1-2 Vertexes related to it
There will only be one possible route between two Vertexes
What I have so far:
I understand that the answer probably includes recursion
The path must be found by trying every path from one Vertex untill the other is find, it can't be directly found
Since there are four and three-way junctions, all the routes being tried must be saved throughout the recursion(unsure of this one)
I know the basic logic is looping through all the lines of each Vertex, and then get the other Vertex of these lines, and keep walking recursively, but I really don't know where to start on this one.
This is as far as I could get, but it probably does not help (views.py) :
def findRoute(request):
data = json.loads(request.body.decode("utf-8"))
v1 = Vertex.objects.get(pk=data.get('v1_pk'))
v2 = Vertex.objects.get(pk=data.get('v2_pk'))
lines = v1.lines.all()
routes = []
for line in lines:
starting_line = line
#Trying a new route
this_route_index = len(routes)
routes[this_route_index] = [starting_line.pk]
other_vertex = line.vertex__set.all().exclude(pk=v1.pk)
#There are cases with dead-ends
if other_vertex.length > 0:
#Mind block...

As you has pointed, this is not a Django/Python related question, but a logical/algorithmic matter.
To find paths between two vertexes in a graph you can use lot of algorithms: Dijkstra, A*, DFS, BFS, Floyd–Warshall etc.. You can choose depending on what you need: shortest/minimum path, all paths...
How to implement this in Django? I suggest to don't apply the algorithm over the models itself, since this could be expensive (in term of time, db queries, etc...) specially for large graphs; instead, I'd rather to map the graph in an in-memory data structure and execute the algorithm over it.
You can take a look to this Networkx, which is a very complete (data structure + algorithms) and well documented library; python-graph, which provides a suitable data structure and a whole set of important algorithms (including some of the mentioned above). More options at Python Graph Library

how to iterate over all the objects in a PDF page and check which ones are text objects?

I want to iterate over all the objects in a page of a pdf using pypdf.
I also want to check that what is the type of the object, whether it is text or graphics.
A code snippet would be a great help.
Thanks a lot

I think that PyPDF is not the correct tool for the job. What you need is to parse the page itself (for which PyPDF has limited support, see the API documentation), and then to be able to save the results in another PDF object after changing some of the objects.
You might decompress the PDF using pdftk, and this would allow you to use pdfrw.
However, from what you write,
My ultimate goal is to color each text object differently.
a "text object" may be a quite complex object made up of (e.g.) different lines in different paragraphs. This might be, and you might see it as, a single entity. In this entity there might be already be several different text-color commands.
For example you might have a single stream with this sequence of text (this is written in "internal" language):
12.84 0 Td(S)Tj
0.08736 Tc
9 0 Td(e)Tj
0.06816 Tc
0.5 g
7.55999 0 Td(qu)Tj
0.08736 Tc
1 g
16.5599 0 Td(e)Tj
0.06816 Tc
7.55999 0 Td(n)Tj
0.08736 Tc
8.27996 0 Td(c)Tj
-0.03264 Tc
0.13632 Tw
7.55999 0 Td(e )Tj
0.06816 Tc
0 Tw
This might write "Sequence". It's actually made up of seven text subobjects, and there is no library I know of that can "decrypt" the stream into its component subobjects, much less assign to them the proper attributes (which in PDF descend from graphics state, while in any hierarchical structure such as XML would probably be associated to the single node, maybe through inheritance).
More: the stream might include non-text commands (e.g. lines). Then changing the "text" stroking color would actually change also non textual objects' color.
A library should provide you a level of detail access similar to that achieved by directly reading the text stream; so doing this through a library seems unlikely.
Since this is word processing work, you might look into the possibility of converting PDF to OpenOffice (using the PDF Import extension), manipulating it through OOo python, then exporting it back to PDF from within OpenOffice itself.
Beware, however, for there be dragons: the documentation is sketchy, and the interface is sometimes unstable. Accessing "text" might not be practical (the more so, since text will be available to you only on a line by line basis).
Another possibility (again, not for the faint of heart) is to decode the PDF yourself. Start by getting it in uncompressed format through pdftk. This will yield a header followed by a stream of objects in the form
INDEX R obj
<<
COMMANDS OR DATA
>>
[ stream
STREAM OF TEXT
endstream ]
endobj
You can read the stream, and for each object:
If COMMANDS OR DATA is only /Length length, it is likely a text stream. Else GOTO 3.
Parse the object (see below). If length changes, remember to update /Length appropriately.
Note the current output file offset, save it in XREF[i] ("reference offset for the i-th object"), and save it to the output file.
At the end of objects you will find a XREF object, wherein each object is indicated with the file offset at which it resides. These offsets (10-digits numbers) will have to be rewritten according to the new offsets you saved in XREF. The start of this object shall go into the startxref at the end of the PDF file.
(To debug, start by writing a routine that copies all objects without modifications. It must recalculate xrefs and offsets, and still yield a PDF object identical to the original).
The PDF thus obtained can be recompressed by pdftk to save space.
As regards the PDF textual object parsing, you basically check it line by line looking for text output commands (See PDF Reference 5.3.2). Mostly the command you'll see will be Tj:
9.95999 0 Td(Hello, world)Tj
and color changing commands (see #4.5.1; again the most used are g and rg.)
1 g # Sets color to black (1 in colorspace Gray)
1 0 0 rg # Sets color to red (1,0,0 in colorspace RGB)
You will then keep track of whatever color we're using, and might for example include each Tj command between a couple of RG commands of your choosing - one that sets your text color, one that restores the original. This way you will be sure that the graphic state does not "spill" to any nearby objects, lines, etc.; it will increase the object Length and also make the resulting PDF a little bit slower (but not very much. You might not even notice).

PDF structure is very complex. Another way is to export text than parse it if you want only text.
iterate on each page and use extractText on it

ArcMap Data Driven Pages Dynamic Feature Labels

I am trying to work out a way to change between two sets of labels on a map. I have a map with zip codes that are labeled and I want to be able to output two maps: one with the zip code label (ZIP) and one with a value from a field I have joined to the data (called chrlabel). The goal is to have one map showing data for each zip code, and a second map giving the zip code as reference.
My initial attempt which I can't get working looks like this:
1) I added a second data frame to my map and add a new layer that contains two polygons with names "zip" and "chrlabel".
2) I use this frame to enable data driven pages and then I hide it behind the primary frame (I don't want to see those polygons, I just want to use them to control the data driven pages).
3) In the zip code labels I tried to write a VBScript expression like this pseudo-code:
test = "
If test = "zip" then
label = ZIP
else
label = CHRLABEL
endif
This does not work because the dynamic text does not resolve to the page name in the VBScript.
Is there some way to call the page name in VBScript so that I can make this work?
If not, is there another way to do this?
My other thought is to add another field to the layer that gets filled with a one or a zero. Then I could replace the if-then test condition with if NewField = 1.
Then I would just need to write a script that updates all the NewFields for the zipcode features when the data driven page advances to the second page. Is there a way to trigger a script (python or other) when a data driven page changes?
Thanks

8 months too late, but for posterity...
You're making things hard on yourself - it would be much easier to set up a duplicate layer and use different layers, then adjust layer visibility. I'm not familiar with VBScript for this sort of thing, but in Python (using ESRI's library) it would look something like so [python 2.6, ArcMap 10 - sample only, haven't debugged this but I do similar things quite often]:
from arcpy import mapping
## Load the map from disk
mxdFilePath = "C:\\GIS_Maps_Folder\\MyMap.mxd"
mapDoc = mapping.MapDocument(mxdFilePath)
## Load map elements
dataFrame = mapping.ListDataFrames(mapDoc)[0] #assumes you want the first dataframe; you can also search by name
mxdLayers = mapping.ListLayers(dataFrame)
## Adjust layers
for layer in mxdLayers:
if (layer.name == 'zip'):
zip_lyr = layer
elif(layer.name == 'sample_units'):
labels_lyr = layer
## Print zip code map
zip_lyr.visible = True
zip_lyr.showLabels = True
labels_lyr.visible = False
labels_lyr.showLabels = False
zip_path = "C:\\Output_Folder\\Zips.pdf"
mapping.ExportToPDF(mapDoc, zip_path, layers_attributes="NONE", resolution=150)
## Print labels map
zip_lyr.visible = False
zip_lyr.showLabels = False
labels_lyr.visible = True
labels_lyr.showLabels = True
labels_path = "C:\\Output_Folder\\Labels.pdf"
mapping.ExportToPDF(mapDoc, labels_path, layers_attributes="NONE", resolution=150)
## Combine files (if desired)
pdfDoc = mapping.PDFDocumentCreate("C:\\Output_Folder\\Output.pdf"")
pdfDoc.appendPages(zip_path)
pdfDoc.appendPages(labels_path)
pdfDoc.saveAndClose()
As far as the Data Driven Pages go, you can export them all at once or in a loop, and adjust whatever you want, although I'm not sure why you'd need to if you use something similar to the above. The ESRI documentation and examples are actually quite good on this. (You should be able to get to all the other Python documentation pretty easily from that page.)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.