OSMNX KeyError: 'x' when trying to get_nearest_nodes() - python

I currently have a process where I
Download OpenStreetMap data using ox.geocode_to_gdf()
Get the GeoPackage edges and nodes and use gpd.overlay() to edit the edges and nodes based on another map
Convert the edited edges back to an OSMnx graph using ox.graph_from_gdfs()
At this stage, I have a graph (sample here)
that I would like to use to estimate the shortest path among some points. I have the Easting and Northing of these points and I am trying to get the nearest nodes to these coordinates using
nodes_flood = ox.distance.get_nearest_nodes(g_post_200_Cur_centre, Easting, Northing)
so that I can run nx.shortest_path() with the graph and destination nodes. But I get an error message
File "<ipython-input-74-ed9774662a83>", line 1, in <module>
nodes_flood = ox.distance.get_nearest_nodes(g_post_200_Cur_centre, Easting, Northing)
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 256, in get_nearest_nodes
nn = [get_nearest_node(G, (y, x), method="haversine") for x, y in zip(X, Y)]
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 256, in <listcomp>
nn = [get_nearest_node(G, (y, x), method="haversine") for x, y in zip(X, Y)]
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 138, in get_nearest_node
df = pd.DataFrame(coords, columns=["node", "x", "y"]).set_index("node")
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/pandas/core/frame.py", line 563, in __init__
data = list(data)
File "/Users/opt/anaconda3/envs/Kinshasa/lib/python3.7/site-packages/osmnx/distance.py", line 137, in <genexpr>
coords = ((n, d["x"], d["y"]) for n, d in G.nodes(data=True))
KeyError: 'x'
Not sure what is causing this. I have OSMNX v1.0.

I fixed the error by passing the method argument to get_nearest_nodes(). If you choose 'kdtree' as the method, you will not get the error.
nodes_flood = ox.distance.get_nearest_nodes(g_post_200_Cur_centre, Easting, Northing, method='kdtree')
However, I still don't know why the error occurs when I do not pass the method name.
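A possible explanation, read off the traceback rather than confirmed against the graph itself: the default code path evaluates d["x"] and d["y"] for every node in G.nodes(data=True), so a single node without those attributes is enough to raise KeyError: 'x'. Such nodes could appear if the edited edges still refer to node IDs that are no longer in the nodes GeoDataFrame after gpd.overlay(); that part is an assumption. The sketch below lists nodes without coordinates. nodes_missing_coords is a hypothetical helper, not part of OSMnx, and the sample data is made up.

```python
def nodes_missing_coords(node_data):
    """Return IDs of nodes whose attribute dict lacks 'x' or 'y'.

    node_data is any iterable of (node_id, attribute_dict) pairs, which is
    what G.nodes(data=True) yields for a NetworkX graph.
    """
    return [n for n, d in node_data if "x" not in d or "y" not in d]


# Made-up sample standing in for G.nodes(data=True)
sample = [
    (1, {"x": 15.30, "y": -4.32}),
    (2, {"osmid": 2}),  # no coordinates: this node would trigger KeyError: 'x'
    (3, {"x": 15.31, "y": -4.33}),
]
missing = nodes_missing_coords(sample)
print(missing)
```

With the real graph, the call would be nodes_missing_coords(g_post_200_Cur_centre.nodes(data=True)). A non-empty result would point to the nodes to repair or drop before searching for nearest nodes.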

Related

Can't get correct input for DBSCAN clustering

I have a node2vec embedding stored as a .csv file, values are a square symmetric matrix. I have two versions of this, one with node names in the first column and another with node names in the first row. I would like to cluster this data with DBSCAN, but I can't seem to figure out how to get the input right. I tried this:
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn import metrics
input_file = "node2vec-labels-on-columns.emb"
# for tab delimited use:
df = pd.read_csv(input_file, header = 0, delimiter = "\t")
# put the original column names in a python list
original_headers = list(df.columns.values)
emb = df.as_matrix()
db = DBSCAN(eps=0.3, min_samples=10).fit(emb)
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print("Estimated number of clusters: %d" % n_clusters_)
print("Estimated number of noise points: %d" % n_noise_)
This leads to an error:
dbscan.py:14: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
emb = df.as_matrix()
Traceback (most recent call last):
File "dbscan.py", line 15, in <module>
db = DBSCAN(eps=0.3, min_samples=10).fit(emb)
File "C:\Python36\lib\site-packages\sklearn\cluster\_dbscan.py", line 312, in fit
X = self._validate_data(X, accept_sparse='csr')
File "C:\Python36\lib\site-packages\sklearn\base.py", line 420, in _validate_data
X = check_array(X, **check_params)
File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
return f(**kwargs)
File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 646, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 100, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I've tried other input methods that lead to the same error. All the tutorials I can find use datasets imported from sklearn, so they are of no help in figuring out how to read from a file. Can anyone point me in the right direction?
The error does not come from the fact that you are reading the dataset from a file but from the content of the dataset.
DBSCAN is meant to be used on numerical data. As stated in the error, it does not support NaNs.
If you want to cluster strings or labels, you will need a different model.
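To see whether the file content is the problem, one option is to load the labels as an index rather than as data and count the cells that are not numeric. This is only a sketch under assumptions: the in-memory text below is a made-up stand-in for the tab-delimited .emb file, with labels in both the first row and the first column, and dropping incomplete rows is one possible policy, not necessarily the right one for an embedding.

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-in for the tab-delimited file; row n2 has an empty last cell.
raw = "\tn1\tn2\tn3\nn1\t0.0\t0.2\t0.9\nn2\t0.2\t0.0\t\nn3\t0.9\t0.4\t0.0\n"

# index_col=0 keeps the row labels out of the numeric matrix.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=0, index_col=0)

# Anything that cannot be parsed as a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
bad_cells = int(numeric.isna().sum().sum())
print("cells that are missing or not numeric:", bad_cells)

# Drop rows containing NaN so the matrix is fully finite.
clean = numeric.dropna()
emb = clean.to_numpy(dtype=float)
print(emb.shape, np.isfinite(emb).all())
```

An array like emb, containing only finite floats, is the kind of input that DBSCAN(...).fit() accepts.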

Load a Graph from .osm file using Osmnx/Python

I want to load a graph from XML, i.e. .osm file, using Osmnx Python library.
The .osm file contains roads not connected each other, for example only highway=primary and highway=primary_link of a country's region.
I use the parameter retain_all to avoid discarding all the roads, since
retain_all: if True, return the entire graph even if it is not connected. otherwise, retain only the largest weakly connected component.
I use this instruction:
graph = ox.graph_from_xml('temp.osm', retain_all=True)
But I get the following error
AttributeError: 'float' object has no attribute 'deg2rad'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\code.py", line 37, in <module>
graph = ox.graph_from_xml('temp.osm', retain_all=True)
File "D:\Python\lib\site-packages\osmnx\graph.py", line 518, in graph_from_xml
G = _create_graph(response_jsons, bidirectional=bidirectional, retain_all=retain_all)
File "D:\Python\lib\site-packages\osmnx\graph.py", line 587, in _create_graph
G = distance.add_edge_lengths(G)
File "D:\Python\lib\site-packages\osmnx\distance.py", line 154, in add_edge_lengths
dists = great_circle_vec(c[:, 0], c[:, 1], c[:, 2], c[:, 3]).round(precision)
File "D:\Python\lib\site-packages\osmnx\distance.py", line 60, in great_circle_vec
y1 = np.deg2rad(lat1)
TypeError: loop of ufunc does not support argument 0 of type float which has no callable deg2rad method
If I remove the retain_all parameter, the error of course does not occur, but the graph will contain only one primary road.
How can I keep all the roads even if not connected in the map?
I forgot to post my solution. I solved it using another Python library, called Pyrosm:
from pyrosm import OSM
osm = OSM('temp.pbf')
nodes, edges = osm.get_network(nodes=True, network_type='driving')
graph = osm.to_graph(nodes, edges, graph_type='networkx', retain_all=True)

Adjusting the width of edges in a python graphviz graph

I am trying to visualize a transition probability matrix for a finite Markov chain using the python interface to graphviz. I want the states of the Markov chain to be nodes in the graph, and I want the edges of the graph to have width proportional to the conditional probability of a transition between states. I.e. I want thick edges drawn for edges with big weights and skinny ones for edges with small weights.
The discussion at (directed weighted graph from pandas dataframe)
is similar to what I want, but it would present transition probability information as textual labels rather than by edge width, which would lead to an unhelpful and difficult-to-read graph.
I am happy to consider tools other than graphviz for this task.
Here is the class I'm trying to build:
import graphviz
import matplotlib.pyplot as plt
import numpy as np
class MarkovViz:
    """
    Visualize the transition probability matrix of a Markov chain as a directed
    graph, where the width of an edge is proportional to the transition
    probability between two states.
    """
    def __init__(self, transition_probability_matrix=None):
        self._graph = None
        if transition_probability_matrix is not None:
            self.build_from_matrix(transition_probability_matrix)
    def build_from_matrix(self, trans, labels=None):
        """
        Args:
          trans: A pd.DataFrame or 2D np.array. A square matrix containing the
            conditional probability of a transition from the level
            represented by the row to the level represented by the column.
            Each row sums to 1.
          labels: A list-like sequence of labels to use for the rows and
            columns of 'trans'. If trans is a pd.DataFrame or similar then
            this entry can be None and labels will be taken from the column
            names of 'trans'.
        Effects:
          self._graph is created as a directed graph, and populated with nodes
          and edges, with edge weights taken from 'trans'.
        """
        if labels is None and hasattr(trans, "columns"):
            labels = list(trans.columns)
            index = list(trans.index)
            if labels != index:
                raise Exception("Mismatch between index and columns of "
                                "the transition probability matrix.")
            trans = trans.values
        trans = np.array(trans)
        self._graph = graphviz.Digraph("MyGraph")
        dim = trans.shape[0]
        if trans.shape[1] != dim:
            raise Exception("Matrix must be square")
        for i in range(dim):
            for j in range(dim):
                if trans[i, j] > 0:
                    self._graph.edge(labels[i], labels[j], weight=trans[i, j])
    def plot(self, ax: plt.Axes):
        self._graph.view()
I would initialize an example object using a data frame that looks something like
     foo  bar  baz
foo  0.5  0.5    0
bar  0.0  0.0    1
baz  1.0  0.0    0
I'm running into the following error
File "<stdin>", line 1, in <module>
File "/.../markov/markovviz.py", line 16, in __init__
self.build_from_matrix(transition_probability_matrix)
File "/.../markov/markovviz.py", line 53, in build_from_matrix
self._graph.edge(labels[i], labels[j], weight=trans[i, j])
File "/.../graphviz/dot.py", line 153, in edge
attr_list = self._attr_list(label, attrs, _attributes)
File "/.../graphviz/lang.py", line 139, in attr_list
content = a_list(label, kwargs, attributes)
File "/.../graphviz/lang.py", line 112, in a_list
for k, v in tools.mapping_items(kwargs) if v is not None]
File "/.../graphviz/lang.py", line 112, in <listcomp>
for k, v in tools.mapping_items(kwargs) if v is not None]
File ".../graphviz/lang.py", line 73, in quote
if is_html_string(identifier) and not isinstance(identifier, NoHtml):
TypeError: cannot use a string pattern on a bytes-like object
which says to me that the only allowable attributes for an edge are strings or bytes. My questions:
Is it even possible to show the graph I'm trying to build in the python interface to graphviz?
If so, how do I associate numeric weights with the edges?
Once I have the weights attached to the edges, how do I draw the graph?
Your problem stems from the line:
self._graph.edge(labels[i], labels[j], weight=trans[i, j])
The problem here is that dot attributes can only be string values, whereas looking at the rest of your code, it looks as if trans[i, j] will probably return a float value.
The simplest solution is probably to just call str():
self._graph.edge(labels[i], labels[j], weight=str(trans[i, j]))
Here's a test that reproduces the problem and the solution:
>>> import graphviz
>>> g = graphviz.Digraph()
>>> g.edge('a', 'b', weight=1.5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/dot.py", line 153, in edge
attr_list = self._attr_list(label, attrs, _attributes)
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 139, in attr_list
content = a_list(label, kwargs, attributes)
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 111, in a_list
items = [f'{quote(k)}={quote(v)}'
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 111, in <listcomp>
items = [f'{quote(k)}={quote(v)}'
File "/home/lars/.local/share/virtualenvs/python-LD_ZK5QN/lib/python3.9/site-packages/graphviz/lang.py", line 73, in quote
if is_html_string(identifier) and not isinstance(identifier, NoHtml):
TypeError: expected string or bytes-like object
>>> g.edge('a', 'b', weight=str(1.5))
>>> print(g)
digraph {
    a -> b [weight=1.5]
}
>>>
Once I have the weights attached to the edges, how do I draw the graph?
Take a look at the render and view methods:
>>> help(g.render)
render(filename=None, directory=None, view=False, cleanup=False, format=None, renderer=None, formatter=None, quiet=False, quiet_view=False) method of graphviz.dot.Digraph instance
Save the source to file and render with the Graphviz engine.
[...]
>>> help(g.view)
view(filename=None, directory=None, cleanup=False, quiet=False, quiet_view=False) method of graphviz.dot.Digraph instance
Save the source to file, open the rendered result in a viewer.
[...]
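On the original goal of edge width proportional to transition probability: in Graphviz, weight influences the layout (in dot, heavier edges tend to be shorter and straighter), while the drawn line thickness is set by the penwidth attribute. A minimal sketch of that idea follows; it writes the DOT text directly so that it runs without Graphviz installed. markov_dot and max_penwidth are hypothetical names chosen for illustration, and the linear scaling is just one possible choice.

```python
def markov_dot(trans, labels, max_penwidth=5.0):
    """Build DOT source whose edge thickness is proportional to probability."""
    lines = ["digraph MarkovChain {"]
    for i, src in enumerate(labels):
        for j, dst in enumerate(labels):
            p = float(trans[i][j])
            if p > 0:
                # penwidth is given in points; scale linearly with probability
                width = max_penwidth * p
                lines.append(f'    "{src}" -> "{dst}" [penwidth="{width:.2f}"]')
    lines.append("}")
    return "\n".join(lines)


# The example matrix from the question
trans = [[0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
dot_source = markov_dot(trans, ["foo", "bar", "baz"])
print(dot_source)
```

The resulting text can be passed to graphviz.Source(dot_source).view(). Equivalently, inside a class like the one in the question, penwidth=str(width) can be passed to Digraph.edge() in place of, or alongside, weight.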

Export graph to graphml with node positions using NetworkX

I'm using NetworkX 1.9.1.
I have a graph that I need to organize with positions and I then export to graphml format.
I've tried the code in this question. It does not work; here is my example:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.read_graphml("colored.graphml")
pos=nx.spring_layout(G) # an example of quick positioning
nx.set_node_attributes(G, 'pos', pos)
nx.write_graphml(G, "g.graphml")
nx.draw_networkx(G, pos)
plt.savefig("g.pdf")
Here are the errors I get. The problem is how the positions are saved (GraphML does not accept arrays).
C:\Anaconda\python.exe C:/Users/sturaroa/Documents/PycharmProjects/node_labeling_test.py
Traceback (most recent call last):
File "C:/Users/sturaroa/Documents/PycharmProjects/node_labeling_test.py", line 11, in <module>
nx.write_graphml(G, "g.graphml")
File "<string>", line 2, in write_graphml
File "C:\Anaconda\lib\site-packages\networkx\utils\decorators.py", line 220, in _open_file
result = func(*new_args, **kwargs)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 82, in write_graphml
writer.add_graph_element(G)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 350, in add_graph_element
self.add_nodes(G,graph_element)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 307, in add_nodes
self.add_attributes("node", node_element, data, default)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 300, in add_attributes
scope=scope, default=default_value)
File "C:\Anaconda\lib\site-packages\networkx\readwrite\graphml.py", line 288, in add_data
'%s as data values.'%element_type)
networkx.exception.NetworkXError: GraphML writer does not support <type 'numpy.ndarray'> as data values.
I'm under the impression that I would be better off defining positions as 2 separate node attributes, x and y, and save them separately, defining a key for each of them in the graphml format, like this.
However, I'm not that familiar with Python, and would like your opinion before I make a mess iterating back and forth.
Thanks.
You are right, GraphML wants simpler attributes (no numpy arrays or lists).
You can set the x and y positions of the nodes as attributes like this
G = nx.path_graph(4)
pos = nx.spring_layout(G)
for node,(x,y) in pos.items():
    G.node[node]['x'] = float(x)
    G.node[node]['y'] = float(y)
nx.write_graphml(G, "g.graphml")
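The same splitting step can be written as a small helper that does not depend on the NetworkX version. This is a sketch only: positions_to_xy is a hypothetical name, and the layout below is made up to stand in for the result of nx.spring_layout(G).

```python
def positions_to_xy(pos):
    """Convert {node: (x, y)} into {node: {"x": float, "y": float}}.

    Plain Python floats are used because the GraphML writer rejects
    numpy arrays as attribute values.
    """
    return {node: {"x": float(x), "y": float(y)} for node, (x, y) in pos.items()}


# Made-up layout standing in for nx.spring_layout(G)
pos = {0: (0.1, -0.5), 1: (0.4, 0.25)}
attrs = positions_to_xy(pos)
print(attrs)
```

In NetworkX 2.x and later, nx.set_node_attributes(G, attrs) accepts a nested dict of this shape. Note that G.node was removed in NetworkX 2.4, where G.nodes[node] is the replacement; the G.node form above matches the 1.9.1 version used in the question.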

ZeroDivisionError when using scipy.interpolate.griddata

I'm getting a ZeroDivisionError from the following code:
# Stacking the arrays into a complex array allows np.unique to choose
# truly unique points. We also keep a handle on the unique indices
# to allow us to index `self` in the same order.
unique_points,index = np.unique(xdata[mask]+1j*ydata[mask],
                                return_index=True)
#Now we break it into the data structure we need.
points = np.column_stack((unique_points.real,unique_points.imag))
xx1,xx2 = self.meta['rcm_xx1'],self.meta['rcm_xx2']
yy1 = self.meta['rcm_yy2']
gx = np.arange(xx1,xx2+dx,dx)
gy = np.arange(-yy1,yy1+dy,dy)
GX,GY = np.meshgrid(gx,gy)
xi = np.column_stack((GX.ravel(),GY.ravel()))
gdata = griddata(points,self[mask][index],xi,method='linear',
                 fill_value=np.nan)
Here, xdata,ydata and self are all 2D numpy.ndarrays (or subclasses thereof) with the same shape and dtype=np.float32. mask is a 2d ndarray with the same shape and dtype=bool. Here's a link for those wanting to peruse the scipy.interpolate.griddata documentation.
Originally, xdata and ydata are derived from a non-uniform cylindrical grid that has a 4 point stencil -- I thought that the error might be coming from the fact that the same point was defined multiple times, so I made the set of input points unique as suggested in this question. Unfortunately, that hasn't seemed to help. The full traceback is:
Traceback (most recent call last):
File "/xxxxxxx/rcm.py", line 428, in <module>
x[...,1].to_pz0()
File "/xxxxxxx/rcm.py", line 285, in to_pz0
fill_value=fill_value)
File "/usr/local/lib/python2.7/site-packages/scipy/interpolate/ndgriddata.py", line 183, in griddata
ip = LinearNDInterpolator(points, values, fill_value=fill_value)
File "interpnd.pyx", line 192, in scipy.interpolate.interpnd.LinearNDInterpolator.__init__ (scipy/interpolate/interpnd.c:2935)
File "qhull.pyx", line 996, in scipy.spatial.qhull.Delaunay.__init__ (scipy/spatial/qhull.c:6607)
File "qhull.pyx", line 183, in scipy.spatial.qhull._construct_delaunay (scipy/spatial/qhull.c:1919)
ZeroDivisionError: float division
For what it's worth, the code "works" (No exception) if I use the "nearest" method.
