Adjusting the width of edges in a python graphviz graph

I am trying to visualize a transition probability matrix for a finite Markov chain using the python interface to graphviz. I want the states of the Markov chain to be nodes in the graph, and I want the edges of the graph to have width proportional to the conditional probability of a transition between states. I.e. I want thick edges drawn for edges with big weights and skinny ones for edges with small weights.
The discussion at (directed weighted graph from pandas dataframe)
is similar to what I want, but it would present transition probability information as textual labels rather than by edge width, which would lead to an unhelpful and difficult-to-read graph.
I am happy to consider tools other than graphviz for this task.
Here is the class I'm trying to build:
import graphviz
import matplotlib.pyplot as plt
import numpy as np
class MarkovViz:
    """
    Visualize the transition probability matrix of a Markov chain as a directed
    graph, where the width of an edge is proportional to the transition
    probability between two states.
    """
    def __init__(self, transition_probability_matrix=None):
        self._graph = None
        if transition_probability_matrix is not None:
            self.build_from_matrix(transition_probability_matrix)
    def build_from_matrix(self, trans, labels=None):
        """
        Args:
          trans: A pd.DataFrame or 2D np.array.  A square matrix containing the
            conditional probability of a transition from the level
            represented by the row to the level represented by the column.
            Each row sums to 1.
          labels: A list-like sequence of labels to use for the rows and
            columns of 'trans'.  If trans is a pd.DataFrame or similar then
            this entry can be None and labels will be taken from the column
            names of 'trans'.
        Effects:
          self._graph is created as a directed graph, and populated with nodes
          and edges, with edge weights taken from 'trans'.
        """
        if labels is None and hasattr(trans, "columns"):
            labels = list(trans.columns)
            index = list(trans.index)
            if labels != index:
                raise Exception("Mismatch between index and columns of "
                                "the transition probability matrix.")
            trans = trans.values
        trans = np.array(trans)
        self._graph = graphviz.Digraph("MyGraph")
        dim = trans.shape[0]
        if trans.shape[1] != dim:
            raise Exception("Matrix must be square")
        for i in range(dim):
            for j in range(dim):
                if trans[i, j] > 0:
                    self._graph.edge(labels[i], labels[j], weight=trans[i, j])
    def plot(self, ax: plt.Axes):
        self._graph.view()
I would initialize an example object using a data frame that looks something like
     foo  bar  baz
foo  0.5  0.5    0
bar  0.0  0.0    1
baz  1.0  0.0    0
I'm running into the following error
File "<stdin>", line 1, in <module>
File "/.../markov/markovviz.py", line 16, in __init__
self.build_from_matrix(transition_probability_matrix)
File "/.../markov/markovviz.py", line 53, in build_from_matrix
self._graph.edge(labels[i], labels[j], weight=trans[i, j])
File "/.../graphviz/dot.py", line 153, in edge
attr_list = self._attr_list(label, attrs, _attributes)
File "/.../graphviz/lang.py", line 139, in attr_list
content = a_list(label, kwargs, attributes)
File "/.../graphviz/lang.py", line 112, in a_list
for k, v in tools.mapping_items(kwargs) if v is not None]
File "/.../graphviz/lang.py", line 112, in <listcomp>
for k, v in tools.mapping_items(kwargs) if v is not None]
File ".../graphviz/lang.py", line 73, in quote
if is_html_string(identifier) and not isinstance(identifier, NoHtml):
TypeError: cannot use a string pattern on a bytes-like object
which says to me that the only allowable attributes for an edge are strings or bytes. My questions:
Is it even possible to show the graph I'm trying to build in the python interface to graphviz?
If so, how do I associate numeric weights with the edges?
Once I have the weights attached to the edges, how do I draw the graph?

Your problem stems from the line:
self._graph.edge(labels[i], labels[j], weight=trans[i, j])
The problem here is that dot attributes can only be string values, whereas looking at the rest of your code, it looks as if trans[i, j] will probably return a float value.
The simplest solution is probably to just call str():
self._graph.edge(labels[i], labels[j], weight=str(trans[i, j]))
Here's a test that reproduces the problem and the solution:
>>> import graphviz
>>> g = graphviz.Digraph()
>>> g.edge('a', 'b', weight=1.5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/dot.py", line 153, in edge
attr_list = self._attr_list(label, attrs, _attributes)
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 139, in attr_list
content = a_list(label, kwargs, attributes)
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 111, in a_list
items = [f'{quote(k)}={quote(v)}'
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 111, in <listcomp>
items = [f'{quote(k)}={quote(v)}'
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 73, in quote
if is_html_string(identifier) and not isinstance(identifier, NoHtml):
TypeError: expected string or bytes-like object
>>> g.edge('a', 'b', weight=str(1.5))
>>> print(g)
digraph {
a -> b [weight=1.5]
}
>>>
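One point that may matter for the original goal: in DOT, the weight attribute influences the layout (heavier edges tend to be drawn shorter and straighter), while penwidth is the attribute that sets the drawn stroke width. Below is a minimal sketch of the thick-edge idea, not the asker's class. It builds the DOT text with the standard library only; the function name markov_to_dot and the scaling constant max_penwidth are made up for illustration, and the matrix is assumed to be a nested list or NumPy array, not a DataFrame.

```python
def markov_to_dot(trans, labels, max_penwidth=5.0):
    """Return DOT source whose edge thickness scales with transition probability.

    trans: square nested list (or 2D array) of probabilities, indexed [row][col].
    labels: state names, one per row/column.
    max_penwidth: hypothetical scaling constant; probability 1.0 maps to this width.
    """
    lines = ["digraph MarkovChain {"]
    for i, src in enumerate(labels):
        for j, dst in enumerate(labels):
            p = float(trans[i][j])
            if p > 0:
                # penwidth (not weight) controls the drawn stroke width in DOT.
                width = max_penwidth * p
                lines.append(f'    "{src}" -> "{dst}" [penwidth="{width:.2f}"]')
    lines.append("}")
    return "\n".join(lines)


# Example matrix from the question (rows: from-state, columns: to-state).
trans = [[0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
dot_source = markov_to_dot(trans, ["foo", "bar", "baz"])
print(dot_source)
```

The resulting string could be rendered with graphviz.Source(dot_source).view(). The same idea should carry over to a Digraph by passing penwidth=str(...) to edge(), since attribute values have to be strings there as well.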
Once I have the weights attached to the edges, how do I draw the graph?
Take a look at the render and view methods:
>>> help(g.render)
render(filename=None, directory=None, view=False, cleanup=False, format=None, renderer=None, formatter=None, quiet=False, quiet_view=False) method of graphviz.dot.Digraph instance
Save the source to file and render with the Graphviz engine.
[...]
>>> help(g.view)
view(filename=None, directory=None, cleanup=False, quiet=False, quiet_view=False) method of graphviz.dot.Digraph instance
Save the source to file, open the rendered result in a viewer.
[...]

Related

Load a Graph from .osm file using Osmnx/Python

I want to load a graph from an XML (.osm) file using the OSMnx Python library.
The .osm file contains roads that are not connected to each other, for example only the highway=primary and highway=primary_link roads of a country's region.
I use the parameter retain_all to avoid discarding roads, since the documentation says:
retain_all: if True, return the entire graph even if it is not connected. otherwise, retain only the largest weakly connected component.
I use this instruction:
graph = ox.graph_from_xml('temp.osm', retain_all=True)
But I get the following error
AttributeError: 'float' object has no attribute 'deg2rad'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\code.py", line 37, in <module>
graph = ox.graph_from_xml('temp.osm', retain_all=True)
File "D:\Python\lib\site-packages\osmnx\graph.py", line 518, in graph_from_xml
G = _create_graph(response_jsons, bidirectional=bidirectional, retain_all=retain_all)
File "D:\Python\lib\site-packages\osmnx\graph.py", line 587, in _create_graph
G = distance.add_edge_lengths(G)
File "D:\Python\lib\site-packages\osmnx\distance.py", line 154, in add_edge_lengths
dists = great_circle_vec(c[:, 0], c[:, 1], c[:, 2], c[:, 3]).round(precision)
File "D:\Python\lib\site-packages\osmnx\distance.py", line 60, in great_circle_vec
y1 = np.deg2rad(lat1)
TypeError: loop of ufunc does not support argument 0 of type float which has no callable deg2rad method
If I remove the retain_all parameter, the error does not occur, but the graph then contains only one primary road.
How can I keep all the roads even if not connected in the map?
I forgot to post my solution. I solved it by using another Python library, Pyrosm:
from pyrosm import OSM
osm = OSM('temp.pbf')
nodes, edges = osm.get_network(nodes=True, network_type='driving')
graph = osm.to_graph(nodes, edges, graph_type='networkx', retain_all=True)

Having Trouble with numpy.histogramdd

I am trying to create an N-dimensional histogram from a 2D array of complex values. I want to count the occurrences of the real and imaginary parts of the array in the given bins and store the result in a 3D array. It only runs for the first iteration, i.e. when I hard-code i=0 and remove the for loop. I have never used histograms in Python before and I cannot understand the error. The code is given below.
xsoft is defined as a 2D array of complex type. I compute bnd_edges from the max and min values of xsoft and use it to create the edges that are passed as bins.
xsoft = np.empty((M, MAX,), dtype=complex) # e.g has dims 4*100
xsoft[:] = np.nan
edges = np.linspace(-bnd_edges, bnd_edges, numbin) #numbin=10
pSOFT = np.empty((len(edges)-1, M, len(edges)-1)) # len(edges)= 10
pSOFT[:] = np.nan
for i in range(M):
    pSOFT[:, i, :], edges = np.histogramdd((xsoft[i, :].real, xsoft[i, :].imag), bins=(edges, edges))
The code results in the following error
Traceback (most recent call last):
File " ", line 194, in <module>
pSOFT[:, i, :], edges = np.histogramdd((xsoft[i, :].real, xsoft[i, :].imag), bins=(edges, edges))
File "<__array_function__ internals>", line 5, in histogramdd
File " " line 1066, in histogramdd
raise ValueError(
ValueError: `bins[0]` must be a scalar or 1d array
Process finished with exit code 1
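You are getting this error because you are overwriting the original definition of edges with the second return value of histogramdd. That return value is a sequence of bin-edge arrays (one per dimension), not a single 1D array, so on the second iteration bins=(edges, edges) no longer contains 1D arrays.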
Replace the last line in your code with this:
pSOFT[:, i, :], edges_i = np.histogramdd((xsoft[i, :].real, xsoft[i, :].imag), bins=(edges, edges))
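For reference, here is a small self-contained sketch of the corrected loop. The sizes follow the comments in the question, but the random data, the seed, and the way bnd_edges is computed are assumptions made only so the example runs.

```python
import numpy as np

rng = np.random.default_rng(0)
M, MAX, numbin = 4, 100, 10  # sizes taken from the question's comments

# Made-up complex data standing in for the question's xsoft.
xsoft = rng.normal(size=(M, MAX)) + 1j * rng.normal(size=(M, MAX))

# Assumed definition: symmetric bound covering all real and imaginary parts.
bnd_edges = np.abs(np.concatenate([xsoft.real, xsoft.imag])).max()
edges = np.linspace(-bnd_edges, bnd_edges, numbin)

pSOFT = np.empty((numbin - 1, M, numbin - 1))
for i in range(M):
    # Bind the returned edge sequence to a different name so `edges` stays 1D.
    counts, edges_i = np.histogramdd(
        (xsoft[i].real, xsoft[i].imag), bins=(edges, edges)
    )
    pSOFT[:, i, :] = counts

print(pSOFT.shape)
```

Because edges is never reassigned, every iteration receives the same pair of 1D bin arrays.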

OSMNX KeyError: 'x' when trying to get_nearest_nodes()

I currently have a process where I
Download Open Street data using ox.geocode_to_gdf()
Get the GeoPackage edges and nodes and use gpd.overlay() to edit them based on another map
Convert edited edges back to OSMNX as a graph using ox.graph_from_gdfs()
At this stage, I have a graph (sample here)
which I would like to use to estimate the shortest path between some points. I have the Easting and Northing of these points and I am trying to get the nearest nodes to these coordinates using
nodes_flood = ox.distance.get_nearest_nodes(g_post_200_Cur_centre, Easting, Northing)
so that I can run nx.shortest_path() with the graph and destination nodes. But I get an error message
File "<ipython-input-74-ed9774662a83>", line 1, in <module>
nodes_flood = ox.distance.get_nearest_nodes(g_post_200_Cur_centre, Easting, Northing)
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 256, in get_nearest_nodes
nn = [get_nearest_node(G, (y, x), method="haversine") for x, y in zip(X, Y)]
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 256, in <listcomp>
nn = [get_nearest_node(G, (y, x), method="haversine") for x, y in zip(X, Y)]
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 138, in get_nearest_node
df = pd.DataFrame(coords, columns=["node", "x", "y"]).set_index("node")
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/pandas/core/frame.py", line 563, in __init__
data = list(data)
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 137, in <genexpr>
coords = ((n, d["x"], d["y"]) for n, d in G.nodes(data=True))
KeyError: 'x'
Not sure what is causing this. I have OSMNX v1.0.
I fixed the error by passing the method argument to get_nearest_nodes(). If you choose 'kdtree' as the method, the error does not occur.
nodes_flood = ox.distance.get_nearest_nodes(g_post_200_Cur_centre, Easting, Northing, method='kdtree')
However, I still don't know the reason for the error if I was not passing the method name.

Export graph to graphml with node positions using NetworkX

I'm using NetworkX 1.9.1.
I have a graph that I need to organize with positions and I then export to graphml format.
I've tried code in this question. It does not work, here is my example
import networkx as nx
import matplotlib.pyplot as plt
G = nx.read_graphml("colored.graphml")
pos=nx.spring_layout(G) # an example of quick positioning
nx.set_node_attributes(G, 'pos', pos)
nx.write_graphml(G, "g.graphml")
nx.draw_networkx(G, pos)
plt.savefig("g.pdf")
Here are the errors I get. The problem is how positions are saved (GraphML does not accept arrays).
C:\Anaconda\python.exe C:/Users/sturaroa/Documents/PycharmProjects/node_labeling_test.py
Traceback (most recent call last):
File "C:/Users/sturaroa/Documents/PycharmProjects/node_labeling_test.py", line 11, in <module>
nx.write_graphml(G, "g.graphml")
File "<string>", line 2, in write_graphml
File "C:\Anaconda\lib\site-packages\networkx\utils\decorators.py", line 220, in _open_file
result = func(*new_args, **kwargs)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 82, in write_graphml
writer.add_graph_element(G)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 350, in add_graph_element
self.add_nodes(G,graph_element)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 307, in add_nodes
self.add_attributes("node", node_element, data, default)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 300, in add_attributes
scope=scope, default=default_value)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 288, in add_data
'%s as data values.'%element_type)
networkx.exception.NetworkXError: GraphML writer does not support <type 'numpy.ndarray'> as data values.
I'm under the impression that I would be better off defining positions as 2 separate node attributes, x and y, and save them separately, defining a key for each of them in the graphml format, like this.
However, I'm not that familiar with Python, and would like your opinion before I make a mess iterating back and forth.
Thanks.
You are right, GraphML wants simpler attributes (no numpy arrays or lists).
You can set the x and y positions of the nodes as attributes like this
import networkx as nx
G = nx.path_graph(4)
pos = nx.spring_layout(G)
for node,(x,y) in pos.items():
    G.node[node]['x'] = float(x)
    G.node[node]['y'] = float(y)
nx.write_graphml(G, "g.graphml")
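The snippet above matches NetworkX 1.9.1, the version named in the question. In more recent releases (2.4 and later, as far as I know) the G.node attribute no longer exists and G.nodes is used instead. The following is a sketch for those newer versions. The in-memory buffer and the read-back step are additions made only to check the round trip, and seed=0 is there just to make the layout reproducible.

```python
import io

import networkx as nx

G = nx.path_graph(4)
pos = nx.spring_layout(G, seed=0)

# Store each coordinate as a plain float attribute; GraphML rejects arrays.
for node, (x, y) in pos.items():
    G.nodes[node]["x"] = float(x)
    G.nodes[node]["y"] = float(y)

# Write to an in-memory buffer (a file path such as "g.graphml" also works).
buffer = io.BytesIO()
nx.write_graphml(G, buffer)

# Read the graph back to confirm the positions survived the export.
buffer.seek(0)
H = nx.read_graphml(buffer, node_type=int)
print(H.nodes[0])
```

Keeping x and y as two scalar attributes gives each of them its own key in the GraphML file, which is the layout the question was aiming for.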

Python: create multiple boxplots in one panel

I have been using R for long time and I am recently learning Python.
I would like to create multiple box plots in one panel in Python.
My dataset is in vector form, and a label vector indicates which box plot each element of data corresponds to. The example looks like this:
N = 50
data = np.random.lognormal(size=N, mean=1.5, sigma=1.75)
label = np.repeat([1,2,3,4,5], N // 5)
From various websites (e.g., matplotlib: Group boxplots), creating multiple boxplots requires a matrix input whose columns each contain the samples for one boxplot. So I created a list object based on data and label:
savelist = data[ label == 1]
for i in [2,3,4,5]:
    savelist = [savelist, data[ label == i]]
However, the code below gives me an error:
boxplot(savelist)
Traceback (most recent call last):
File "<ipython-input-222-1a55d04981c4>", line 1, in <module>
boxplot(savelist)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2636, in boxplot
meanprops=meanprops, manage_xticks=manage_xticks)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 3045, in boxplot
labels=labels)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/cbook.py", line 1962, in boxplot_stats
stats['mean'] = np.mean(x)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2727, in mean
out=out, keepdims=keepdims)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py", line 66, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
ValueError: operands could not be broadcast together with shapes (2,) (10,)
Can anyone explain what is going on?
You're ending up with a nested list instead of a flat list. Try this instead:
savelist = [data[label == 1]]
for i in [2,3,4,5]:
    savelist.append(data[label == i])
And it should work.
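The same flat list can also be built in one step with a list comprehension. This is only a sketch: the name groups, the seeded generator, and the use of np.unique to find the labels are choices made for the example, not part of the original code.

```python
import numpy as np

N = 50
rng = np.random.default_rng(0)  # seeded only to make the example reproducible
data = rng.lognormal(mean=1.5, sigma=1.75, size=N)
label = np.repeat([1, 2, 3, 4, 5], N // 5)

# One array per label value: a flat list of 1D arrays, not a nested list.
groups = [data[label == k] for k in np.unique(label)]
print([len(g) for g in groups])
```

Passing groups to boxplot should then draw one box per label in a single panel, since boxplot accepts a sequence of 1D arrays.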
