Extracting edge information from network in python

I have the following network:
import networkx as nx
net = nx.Graph()
node_list = ["Gur","Qing","Samantha","Jorge","Lakshmi","Jack","John","Jill"]
edge_list = [("Gur","Qing",{"source":"work"}),
("Gur","Jorge", {"source":"family"}),
("Samantha","Qing", {"source":"family"}),
("Jack","Qing", {"source":"work"}),
("Jorge","Lakshmi", {"source":"work"}),
("Jorge","Samantha",{"source":"family"}),
("Samantha","John", {"source":"family"}),
("Lakshmi","Jack", {"source":"family"}),
("Jack","Jill", {"source":"charity"}),
("Jill","John",{"source":"family"})]
net.add_nodes_from(node_list)
net.add_edges_from(edge_list)
In this network every person is a node, and the nodes are connected to each other by a type of relationship. The relationships that connect the nodes are the edges.
What I need to do is extract the relationship information stored on the edges, so that I can write a function that, given a person's name and a relationship type, returns the other people that person is connected to through that relationship type.
I'm using the networkx package in Python for this task. Since I'm totally new to networks this confuses me a bit, so I would appreciate any suggestions.
Thanks in advance.

What I would do is to create a new graph only containing the edges that match the given "source", e.g. for "family":
family = nx.Graph([(u,v,d) for u,v,d in net.edges(data=True) if d["source"]=="family"])
You can then use
list(nx.bfs_tree(family, "Gur"))
to get the complete family of Gur, i.e. everyone reachable from Gur through family edges (the list starts with Gur).
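If only the direct connections are wanted, rather than everyone reachable through a chain of edges, a small filter over the edge triples may be enough. The following is a minimal sketch, not the answerer's method: the function name connections is made up for illustration, and it assumes the relationship type is stored under the "source" key as in the question. It accepts any iterable of (u, v, data) triples, which is the shape of the question's edge_list and also of what net.edges(data=True) yields.

```python
def connections(edges, person, relation):
    """Return the people linked to `person` by an edge of type `relation`.

    `edges` is any iterable of (u, v, data) triples, where data["source"]
    holds the relationship type (as in the question's edge_list).
    """
    linked = set()
    for u, v, data in edges:
        if data.get("source") != relation:
            continue
        # The graph is undirected, so the person may sit on either end.
        if u == person:
            linked.add(v)
        elif v == person:
            linked.add(u)
    return sorted(linked)


edge_list = [("Gur", "Qing", {"source": "work"}),
             ("Gur", "Jorge", {"source": "family"}),
             ("Samantha", "Qing", {"source": "family"}),
             ("Jack", "Qing", {"source": "work"}),
             ("Jorge", "Lakshmi", {"source": "work"}),
             ("Jorge", "Samantha", {"source": "family"}),
             ("Samantha", "John", {"source": "family"}),
             ("Lakshmi", "Jack", {"source": "family"}),
             ("Jack", "Jill", {"source": "charity"}),
             ("Jill", "John", {"source": "family"})]

print(connections(edge_list, "Samantha", "family"))
```

With a networkx graph, the same call would be connections(net.edges(data=True), "Samantha", "family").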

Related

Retrieving node locations from pydotplus (or any layered graph drawing engine)

I'm preparing a layered graph drawing using a dataframe containing node data:
   type   label
0  Class  Insurance Product
1  Class  Person
2  Class  Address
3  Class  Insurance Policy
And another containing relationship data:
   froml             tol                rel             fromcard  tocard
0  Insurance Policy  Insurance Product  ConveysProduct  One       One
1  Person            Insurance Policy   hasPolicy       One       Many
2  Person            Address            ResidesAt       None      None
I populate a pydotplus dot graph with the content, which I can then use to generate a rendering:
import pydotplus
from IPython.display import Image, display  # assumes a Jupyter/IPython session
pdp_graph = pydotplus.graphviz.Dot(graph_name="pdp_graph", graph_type='digraph', prog="dot")
# One edge per row of the relationship dataframe
for i,e in b_rels_df.iterrows():
    edge = pydotplus.graphviz.Edge(src=e['froml'], dst=e['tol'], label=e['rel'])#, set_fromcard=e['fromcard'], set_tocard=e['tocard'])
    pdp_graph.add_edge(edge)
# One node per row of the node dataframe
for i,n in ents_df.iterrows():
    node = pydotplus.graphviz.Node(name=n['label'], set_type=n['type'], set_label=n['label'])
    pdp_graph.add_node(node)
png = pdp_graph.create_png()
display(Image(png))
So far so good - but now I want to retrieve the node positions for use in my own interactive layout (the png is a nice example/diagram, but I want to build upon it), so am attempting to retrieve the node locations calculated via:
[n.get_pos() for n in pdp_graph.get_nodes()]
But this only returns:
> [None, None, None, None]
I've tried lots of different methods, and graphviz/dot is installed fine, as proven by the rendered layout. How can I extract the positions of the nodes as data from any type of dot-style layout?
There is a way to do this with the pygraphviz library via networkx, but the installation overhead rules it out (pygraphviz needs to be recompiled to match the graphviz install) on target installations where I have less control over the base environment. Hence my attempt to use pydotplus, which appears less demanding in terms of install requirements.
How do I retrieve the layout data from a layered graph drawing using this setup (or a similar one), so that I can use it elsewhere? I'm looking for x,y values that I can map back to the nodes they belong to.

I know nothing about your Python setup. (As usual with Python, it seems awkward and restrictive.)
I suggest using Graphviz directly, in particular the 'attributed DOT format': a plain-text file containing the layout computed by the layout engine, produced when the engine is run with the command option -Tdot. The text file is easily parsed to get exactly what you need.
Here is a screenshot of the first paragraph of the relevant documentation
The graphviz.org website contains all the additional details you may need.

You are creating an output file of png format. All position data is lost in this process. Instead, create output format="dot". Then, read that back in & modify as desired.
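Both answers come down to asking Graphviz for a text output instead of a PNG and parsing it. As a hedged sketch of the parsing side, the example below uses Graphviz's simpler plain format (dot -Tplain) rather than the attributed DOT format, because each node arrives on one line as "node name x y width height ...". The sample text and its coordinates are invented for illustration, and the function name node_positions is hypothetical. With pydotplus the text would presumably come from something like pdp_graph.create(format="plain").decode(); that call is an assumption to check against the installed version.

```python
import shlex

# Hypothetical output of `dot -Tplain`; the numbers are made up.
PLAIN_OUTPUT = '''graph 1 3.5 2.5
node "Insurance Policy" 1.75 1.25 1.9 0.5 "Insurance Policy" solid ellipse black lightgrey
node Person 1.75 2.25 1.0 0.5 Person solid ellipse black lightgrey
edge Person "Insurance Policy" 4 1.75 1.99 1.75 1.88 1.75 1.76 1.75 1.65 hasPolicy 2.1 1.75 solid black
stop
'''


def node_positions(plain_text):
    """Map each node name to its (x, y) centre from plain-format text."""
    positions = {}
    for line in plain_text.splitlines():
        # shlex keeps quoted names such as "Insurance Policy" in one field.
        fields = shlex.split(line)
        if fields and fields[0] == "node":
            positions[fields[1]] = (float(fields[2]), float(fields[3]))
    return positions


positions = node_positions(PLAIN_OUTPUT)
print(positions)
```

The resulting dictionary is keyed by node name, so the coordinates can be joined back onto the node dataframe by label.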

pyosmium - Build a Geojson linestring based on OSM Relation

I have a Python script that analyses OSM data; the objective is to build a GeoJSON with specific data taken from an OSM relation.
I'm currently focusing on OSM relations that represent hiking trails, like this one.
According to the documentation:
members
(read-only) Ordered list of relation members. See osmium.osm.RelationMemberList.
the relation object has an attribute members which collects all members of the relation.
Hence the first part of the script extracts every relation that has the tag sac_scale=hiking and collects all its ways.
The following script deliberately focuses on only one specific relation: r104369
import osmium
class HikingWaysfromRelations(osmium.SimpleHandler):
    def __init__(self):
        super(HikingWaysfromRelations, self).__init__()
        self.dict = {}
    def _getWays(self, elem, elem_type):
        # Keep only the hiking relation under study
        if 'sac_scale' in elem.tags and elem.tags['sac_scale']=='hiking' and elem.id==104369:
            way_ids=[]
            for mem in elem.members:
                # "w" marks members that are ways
                if mem.type=="w":
                    way_ids.append(str(mem.ref))
            self.dict["r"+str(elem.id)]=way_ids
        else:
            pass
    def relation(self,r):
        self._getWays(r, "relation")
ml = HikingWaysfromRelations()
ml.apply_file('../pbf/new-zealand.osm.pbf')
The result is a dictionary containing the expected relation as the only key, and its ways:
{"r104369": ["191668175", "765285136", "765285135", "765285138", "765285139", "191668225", "765542429", "765542430", "765542432", "765542431", "765542435", "765542436", "765542434", "765542433", "765542437", "765542438", "765542439", "765542440", "765542441", "765542442", "765548983", "271277386", "765548985", "765548984", "684295241", "684295239", "464603363", "464603364", "464607430", "299788481", "178920047", "155711655", "155711646", "684294192", "259362037", "684294189", "259362038", "259362041", "259362036", "259362043", "259362039", "259362040"]}
Now the question is: how do I build a GeoJSON containing a single MultiLineString Feature that connects all those ways and rebuilds the expected hiking trail?
Based on what I've found on the net, I should re-run a SimpleHandler on the full .pbf file, and each time I encounter a way I'm looking for (based on the values of the above dictionary) I could reconstruct a LineString with:
import shapely.wkb as wkblib
wkbfab = osmium.geom.WKBFactory()
def getRelationGeometry(elem):
    wkb=wkbfab.create_linestring(elem)
    return wkb
The issue is that some ways appear to have only one node, which triggers the following error:
RuntimeError: need at least two points for linestring (way_id=155711655)
So what would be the solution to rebuild a GeoJSON feature (a MultiLineString) from multiple ways, so that I can plot the result on https://geojson.io/#map=2/20.0/0.0 ?
How, for instance, does OpenStreetMap manage to rebuild the track of a relation when I open the link, if not by connecting all nodes (from all ways) that belong to the relation?
Thanks a lot for your help.
I know there is a way with bash, where you first filter the initial pbf by keeping only relations with the tag sac_scale=hiking and then transform this filtered result to GeoJSON, but I really want to be able to generate the same with Python, to understand how OSM data is stored. I just can't figure out an easy way to do so. Since pyosmium is (supposedly) equivalent to osmium, I believe there should be an easy way there too.
osmium export output/output_food-drinks.pbf -f geojson

Looking at the way with the id shown in the error in your post (155711655), it has two nodes, not one. Visible here as of the time of this answer.
Knowing that, I can think of two reasons why you would get that error:
1. You're not passing the argument locations=True to the apply_file method, as suggested by the documentation:
Because of the way that OSM data is structured, osmium needs to internally cache node geometries, when the handler wants to process the geometries of ways and areas. The apply_file() method cannot deduce by itself if this cache is needed. Therefore locations need to be explicitly enabled by setting the locations parameter to True:
h.apply_file("test.osm.pbf", locations=True, idx='flex_mem')
Looking at your code above, the apply_file method only has the input file as an argument, so I think this is likely your problem.
2. The way may be referencing a node that is missing in your pbf extract. This is simple to verify with the osmium CLI tool:
osmium check-refs <your pbf file>
This is the result I get from running that on a valid pbf of my own:
There are 6702885 nodes, 431747 ways, and 2474 relations in this file.
Nodes in ways missing: 0
Note the Nodes in ways missing: 0.
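Once node locations are available, the remaining step asked about in the question is assembling the ways into one GeoJSON feature. The sketch below is an assumption-laden illustration, not pyosmium's own API: the function name multilinestring_feature, the way ids and the coordinates are all made up. It expects a mapping from way id to an ordered list of (lon, lat) pairs, which in a pyosmium handler would presumably be collected in a way() callback from each node reference's lon and lat while locations=True is set.

```python
import json


def multilinestring_feature(way_coords, properties=None):
    """Build a GeoJSON Feature with a MultiLineString geometry.

    way_coords maps a way id to its ordered list of (lon, lat) pairs.
    Ways with fewer than two points cannot form a line and are skipped.
    """
    lines = [
        [[lon, lat] for lon, lat in coords]
        for coords in way_coords.values()
        if len(coords) >= 2
    ]
    return {
        "type": "Feature",
        "properties": properties or {},
        "geometry": {"type": "MultiLineString", "coordinates": lines},
    }


# Hypothetical way ids and coordinates, for illustration only.
ways = {
    "way_a": [(170.10, -43.70), (170.11, -43.71)],
    "way_b": [(170.11, -43.71), (170.12, -43.72), (170.13, -43.72)],
    "way_c": [(170.13, -43.72)],  # incomplete way, dropped below
}
feature = multilinestring_feature(ways, {"relation": "r104369"})
geojson_text = json.dumps({"type": "FeatureCollection", "features": [feature]})
print(geojson_text)
```

GeoJSON positions are ordered longitude first, then latitude, which is why the pairs are kept as (lon, lat). The printed text can be pasted into geojson.io to check the result.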

How do I pull specific property values from an injected object when adding a batch of edges in a Gremlin/TinkerPop traversal?

I want to add batches of edges to a JanusGraph db that already contains nodes. I want my edges to support setting dynamic/optional properties.
I've cobbled together the following traversal (based on this SO question) that I believe illustrates what I want to do:
1. .inject() a batch of edges
2. Pull to/from vertex ids from the objects in the injected edge batch
3. Set all fields in edge batch objects as edge properties with .sideEffect()
uuid_1 = "89079f8fa3ee849a61a45e0b3e6d28cd"
uuid_2 = "00a9ae430dc812f483b0660212264190"
edge_batch = [
    {
        "from_uuid": uuid_1,
        "to_uuid": uuid_2,
        "posted_at": 1650012568000,
        "test_property_2": "I was here"
    },
    {
        "from_uuid": uuid_2,
        "to_uuid": uuid_1,
        "posted_at": 1650012568888,
        "test_property_3": "I'M STILL HERE"
    }
]
new_edges = (
    g
    .inject(edge_batch)
    .unfold()
    .as_("edge_batch")
    .V()
    .has("uuid", __.select("edge_batch").select("to_uuid"))
    .as_("to_v")
    .V()
    .has("uuid", __.select("edge_batch").select("from_uuid"))
    .addE("MY_EDGE_TYPE")
    .to("to_v")
    .as_("new_edge")
    .sideEffect(
        __.select("edge_batch")
        .unfold()
        .as_("kvp")
        .select("new_edge")
        .property(
            __.select("kvp").by(Column.keys), __.select("kvp").by(Column.values)
        )
    )
    .iterate()
)
As written, the code above results in a traversal timeout when the referenced vertices exist. If I replace the first two __.select("edge_batch")... expressions above with references to the uuid_1 and uuid_2 variables, the code works. I think my problem is I just can't figure out how I'm supposed to reference properties of the injected, unfolded edge batch objects.
I'm using gremlin-python v3.6.0, JanusGraph v0.6.1, TinkerPop v3.5.1.

Your code runs just fine with a small graph. However, the two unfolds on a list with two elements make your code run four times, which I guess is unintentional.
As to why the code does not run on your JanusGraph installation:
Be sure uuid is an indexed property if your graph is large.
Maybe you were confused by vertices with uuid_1 and uuid_2 being present in the JanusGraph cache, because .has("uuid", __.select("edge_batch").select("to_uuid")) and .has("uuid", uuid_1) really do the same thing.

maya component id data

Does anyone know how to obtain the data in Maya called the component ID of the vertices?
I know how to get the vertex number, but the component ID on the vertex is something that changes as the model is changed.
It seems that this data is stored in the vertex, but I can't find any command to extract it. Any help would be appreciated.
I even tried using the Maya API, but it also just seems to give me the vertex index number and not the actual ID (which is not sequential like the vertex indices).
Thanks

Try this:
import maya.OpenMaya as om
# Get the first item of the active selection as a DAG path plus component
sel = om.MSelectionList()
om.MGlobal.getActiveSelectionList(sel)
dag = om.MDagPath()
comp = om.MObject()
sel.getDagPath(0, dag, comp)
# Walk every face-vertex of the selected mesh
itr = om.MItMeshFaceVertex(dag, comp)
print('| %-15s| %-15s| %-15s' % ('Face ID', 'Object VertID', 'Face-relative VertId'))
while not itr.isDone():
    print('| %-15s| %-15s| %-15s' % (itr.faceId(), itr.vertId(), itr.faceVertId()))
    itr.next()
There are many solutions; I found this one. Source: Link

Maya components do not have persistent identities; the 'vertex id' is just the index of one entry in the vertex table (or the tables for faces, normals, etc.). That's why it's so easy to mess up a model with construction history if you go 'upstream' and change things that affect the model's component count or topology.
You can attach persistent data to a vertex using the PolyBlindData system, which attaches arbitrary info to faces, vertices or edges. You could attach data to a particular vertex and the data would probably survive, though the same considerations which can mess up things like vert colors or UVs when construction history changes upstream will also mess with blind data.

Follow-up on iterating over a graph using XML minidom

This is a follow-up to the question (Link)
What I intend to do is use the XML to create a graph with NetworkX. Looking at the DOM structure below, all authors within the same paper should have an edge between them, and all authors who have attended the same conference should have an edge to that conference. To summarize, all authors that worked together on a paper should be connected to each other, and all authors who have attended a particular conference should be connected to that conference.
<conference name="CONF 2009">
  <paper>
    <author>Yih-Chun Hu(UIUC)</author>
    <author>David McGrew(Cisco Systems)</author>
    <author>Adrian Perrig(CMU)</author>
    <author>Brian Weis(Cisco Systems)</author>
    <author>Dan Wendlandt(CMU)</author>
  </paper>
  <paper>
    <author>Dan Wendlandt(CMU)</author>
    <author>Ioannis Avramopoulos(Princeton)</author>
    <author>David G. Andersen(CMU)</author>
    <author>Jennifer Rexford(Princeton)</author>
  </paper>
</conference>
I've figured out how to connect authors to conferences, but I'm unsure about how to connect authors to each other. The thing that I'm having difficulty with is how to iterate over the authors that have worked on the same paper and connect them together.
dom = parse(filepath)
conference=dom.getElementsByTagName('conference')
for node in conference:
    conf_name=node.getAttribute('name')
    print(conf_name)
    G.add_node(conf_name)
    #The nodeValue is split in order to get the name of the author
    #and to exclude the university they are part of
    plist=node.getElementsByTagName('paper')
    for p in plist:
        author=str(p.childNodes[0].nodeValue)
        author= author.split("(")
        #Figure out a way to create edges between authors in the same <paper> </paper>
    alist=node.getElementsByTagName('author')
    for a in alist:
        authortext= str(a.childNodes[0].nodeValue).split("(")
        if authortext[0] in dict:
            edgeQuantity=dict[authortext[0]]
            edgeQuantity+=1
            dict[authortext[0]]=edgeQuantity
            G.add_edge(authortext[0],conf_name)
        #Otherwise, add it to the dictionary and create an edge to the conference.
        else:
            dict[authortext[0]]= 1
            G.add_node(authortext[0])
            G.add_edge(authortext[0],conf_name)
        i+=1

I'm unsure about how to connect authors to each other.
You need to generate (author, otherauthor) pairs so you can add them as edges. The typical way to do that would be a nested iteration:
for thing in things:
    for otherthing in things:
        add_edge(thing, otherthing)
This is a naïve implementation that includes self-loops (giving an author an edge connecting himself to himself), which you may or may not want; it also includes both (1,2) and (2,1), which is redundant if you're building an undirected graph. (The itertools.permutations generator, added in Python 2.6, also yields both orderings, although it leaves out the self-pairs.) Here's a generator that fixes these things:
def pairs(l):
    for i in range(len(l)-1):
        for j in range(i+1, len(l)):
            yield l[i], l[j]
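As a side note, the standard library's itertools.combinations yields the same unordered pairs, so the hand-written generator may not be needed. The short sketch below redefines pairs only so that it runs on its own; the author names are taken from the question's XML.

```python
from itertools import combinations


def pairs(l):
    """Yield each unordered pair of items once, without self-pairs."""
    for i in range(len(l) - 1):
        for j in range(i + 1, len(l)):
            yield l[i], l[j]


authors = ["Dan Wendlandt", "Ioannis Avramopoulos", "David G. Andersen"]

# combinations(authors, 2) walks the list in the same order as pairs()
same = list(combinations(authors, 2)) == list(pairs(authors))
print(same)
```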
I've not used NetworkX, but looking at the doc it seems to say you can call add_node on the same node twice (with nothing happening the second time). If so, you can discard the dict you were using to try to keep track of what nodes you'd inserted. Also, it seems to say that if you add an edge to an unknown node, it'll add that node for you automatically. So it should be possible to make the code much shorter:
for conference in dom.getElementsByTagName('conference'):
    conf_name = conference.getAttribute('name')
    for paper in conference.getElementsByTagName('paper'):
        authors = paper.getElementsByTagName('author')
        auth_names = [author.firstChild.data.split('(')[0] for author in authors]
        # Note author's conference attendance
        #
        for auth_name in auth_names:
            G.add_edge(auth_name, conf_name)
        # Note combinations of authors working on same paper
        #
        for auth_name, other_name in pairs(auth_names):
            G.add_edge(auth_name, other_name)
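To see the whole approach run end to end without a graph library installed, here is a minimal sketch that parses a shortened copy of the question's XML and collects the two kinds of edges into plain sets. The function name collect_edges is hypothetical, and using sets in place of a NetworkX graph is a substitution made for the sake of a self-contained example; each collected pair could be handed to G.add_edge instead.

```python
from itertools import combinations
from xml.dom.minidom import parseString

# Shortened copy of the question's data.
XML = """<conference name="CONF 2009">
  <paper>
    <author>Yih-Chun Hu(UIUC)</author>
    <author>Dan Wendlandt(CMU)</author>
  </paper>
  <paper>
    <author>Dan Wendlandt(CMU)</author>
    <author>Jennifer Rexford(Princeton)</author>
  </paper>
</conference>"""


def collect_edges(xml_text):
    """Return (author-conference edges, author-author edges) as sets."""
    attended, coauthored = set(), set()
    dom = parseString(xml_text)
    for conference in dom.getElementsByTagName("conference"):
        conf_name = conference.getAttribute("name")
        for paper in conference.getElementsByTagName("paper"):
            # Drop the "(affiliation)" suffix to keep only the name.
            names = [
                a.firstChild.data.split("(")[0].strip()
                for a in paper.getElementsByTagName("author")
            ]
            attended.update((name, conf_name) for name in names)
            # frozenset makes (a, b) and (b, a) the same undirected edge.
            coauthored.update(frozenset(p) for p in combinations(names, 2))
    return attended, coauthored


attended, coauthored = collect_edges(XML)
print(len(attended), len(coauthored))
```

Authors who appear on different papers, such as Yih-Chun Hu and Jennifer Rexford here, end up connected only through the conference node and their shared co-author.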

I'm not entirely sure what you're looking for, but based on your description I threw together a graph which I think encapsulates the relationships you describe.
http://imgur.com/o2HvT.png
I used OpenFst to do this. I find it much easier to lay out the graph relationships clearly before plunging into the code for something like this.
Also, do you actually need to generate an explicit edge between authors? This seems like a traversal issue.
