geopandas difference only if a column's value is greater - python

Initialize data:
import geopandas as gpd
from shapely.geometry import Polygon
geoms = gpd.GeoSeries([
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    Polygon([(1, 1), (3, 1), (3, 3), (1, 3)]),
    Polygon([(0, 0), (3, 0), (3, 3), (0, 3)]),
])
gdf = gpd.GeoDataFrame(geometry=geoms)
gdf["value"] = [3, 2, 1]
gdf.plot(cmap='tab10', alpha=0.5)
[image: original]
Then I want to make holes into the polygons where values are greater than the current row.
gdf_list = []
for value in gdf["value"]:
    gdf_equal_value = gdf.loc[gdf["value"] == value, "geometry"]
    gdf_above_value = gdf.loc[gdf["value"] > value, "geometry"]
    gdf_list.append(
        (value, gdf_equal_value.difference(gdf_above_value.unary_union))
    )
import matplotlib.pyplot as plt
for value, geom in gdf_list:
    geom.plot()
    plt.xlim(0, 3)
    plt.ylim(0, 3)
    plt.title(value)
[image: holes]
Since I have many more unique values in my actual dataset, is there a way to optimize this (e.g. without looping through each one)?

As I mentioned in my comment, I'm not 100% sure I understand what you want your final product to look like. Please consider editing your question to make that clearer.
In your original question, your final product was a list of (value, geodataframe) pairs, and the geodataframe contained the rows of the original gdf differenced with respect to a dissolved polygon of the gdf elements whose values were larger than the reference value.
Is that exactly what you want?
Here's a quick solution to get to something similar, but not exactly identical.
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
import matplotlib.pyplot as plt
geoms = gpd.GeoSeries([
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    Polygon([(1, 1), (3, 1), (3, 3), (1, 3)]),
    Polygon([(0, 0), (3, 0), (3, 3), (0, 3)]),
])
gdf = gpd.GeoDataFrame(geometry=geoms)
gdf["value"] = [3, 2, 1]
gdf_list = []
for value in gdf["value"].unique():
    # Label each row relative to the reference value
    gdf['classif'] = np.select(
        condlist=[(gdf['value'] == value), (gdf['value'] > value)],
        choicelist=['Equal','Larger'],
        default=np.nan)
    # Merge all 'Equal' rows into one geometry and all 'Larger' rows into another
    gdf_diss = gdf.dissolve(by='classif',dropna=True).reset_index()
    if gdf_diss['classif'].isin(['Equal','Larger']).sum() == 2:
        gdf_list.append(
            (value, gdf_diss.iloc[0]['geometry'].difference(gdf_diss.iloc[1]['geometry']))
        )
In this case, the gdf_list contains (value,Polygon) pairs. The Polygons are the result of the difference between two other polygons:
A) The dissolved polygon of all the rows that have a value in the value column that is equal to the reference value.
B) The dissolved polygon of all the rows that have a value in the value column that is larger than the reference value.
Note that the result isn't a GeoDataFrame of the differences - for each value, it's a single Polygon.
While this might not be exactly what you were looking for, I hope the tricks I used (dissolving instead of subsetting) might help what you're trying to do.
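One way to avoid rebuilding the "larger values" union for every value is to walk the values in descending order and keep a running union. Here is a minimal, library-free sketch of that idea. It is an illustration only: sets of unit grid cells stand in for the polygons, and the names `cells` and `punch_holes` are made up for this example. With real geometries, `-` and `|` would correspond to shapely's `difference` and `union`.

```python
from itertools import groupby

def cells(x0, y0, x1, y1):
    """Unit grid cells covered by an axis-aligned rectangle (stand-in for a polygon)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

# Same layout as the question: (value, region)
shapes = [
    (3, cells(0, 0, 2, 2)),
    (2, cells(1, 1, 3, 3)),
    (1, cells(0, 0, 3, 3)),
]

def punch_holes(shapes):
    """Subtract from each region everything covered by strictly larger values."""
    ordered = sorted(shapes, key=lambda item: item[0], reverse=True)
    covered = set()   # running union of all regions with a larger value
    result = {}
    for value, group in groupby(ordered, key=lambda item: item[0]):
        regions = [region for _, region in group]
        result[value] = [region - covered for region in regions]
        # Only now add this value's regions, so equal values never cut each other
        for region in regions:
            covered |= region
    return result

holes = punch_holes(shapes)
for value, regions in holes.items():
    print(value, [sorted(r) for r in regions])
```

Each region is visited once and the union grows incrementally, instead of being recomputed from scratch for every unique value.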

Related

Getting the correct max value from a list of tuples

My list of tuples look like this:
[(0, 0), (3, 0), (3, 3), (0, 3), (0, 0), (0, 6), (3, 6), (3, 9), (0, 9), (0, 6), (6, 0), (9, 0), (9, 3), (6, 3), (6, 0), (0, 3), (3, 3), (3, 6), (0, 6), (0, 3)]
It has the format of (X, Y) where I want to get the max and min of all Xs and Ys in this list.
It should be min(X)=0, max(X)=9, min(Y)=0, max(Y)=9
However, when I do this:
min(listoftuples)[0], max(listoftuples)[0]
min(listoftuples)[1], max(listoftuples)[1]
...for the Y values, the maximum value shown is 3 which is incorrect.
Why is that?
for the Y values, the maximum value shown is 3
because max(listoftuples) returns the tuple (9, 3), so max(listoftuples)[0] is 9 and max(listoftuples)[1] is 3.
By default, iterables are sorted/compared based on the values of the first index, then the value of the second index, and so on.
If you want to find the tuple with the maximum value in the second index, you need to use a key function:
from operator import itemgetter
li = [(0, 0), (3, 0), ... ]
print(max(li, key=itemgetter(1)))
# or max(li, key=lambda t: t[1])
outputs
(3, 9)
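To see the comparison rule in action, here is a small self-contained check using the list from the question (the variable names `largest` and `highest_y` are just for this illustration):

```python
from operator import itemgetter

listoftuples = [(0, 0), (3, 0), (3, 3), (0, 3), (0, 0), (0, 6), (3, 6), (3, 9),
                (0, 9), (0, 6), (6, 0), (9, 0), (9, 3), (6, 3), (6, 0), (0, 3),
                (3, 3), (3, 6), (0, 6), (0, 3)]

# Tuples compare element by element: the first elements decide,
# and the second elements are only consulted on a tie.
print((9, 3) > (3, 9))   # True, because 9 > 3 in the first position
print((9, 3) > (9, 0))   # True, first positions tie, so 3 > 0 decides

largest = max(listoftuples)                       # compares whole tuples
highest_y = max(listoftuples, key=itemgetter(1))  # compares only the Y values
print(largest, highest_y)
```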
Here is a simple way to do it using list comprehensions:
min([arr[i][0] for i in range(len(arr))])
max([arr[i][0] for i in range(len(arr))])
min([arr[i][1] for i in range(len(arr))])
max([arr[i][1] for i in range(len(arr))])
In this code, I have used a list comprehension to create a list of all X and all Y values and then found the min/max for each list. This produces your desired answer.
The first two lines are for the X values and the last two lines are for the Y values.
Tuples are ordered by their first value, then in case of a tie, by their second value (and so on). That means max(listoftuples) is (9, 3). See How does tuple comparison work in Python?
So to find the highest y-value, you have to look specifically at the second elements of the tuples. One way you could do that is by splitting the list into x-values and y-values, like this:
xs, ys = zip(*listoftuples)
Or if you find that confusing, you could use this instead, which is roughly equivalent:
xs, ys = ([t[i] for t in listoftuples] for i in range(2))
Then get each of their mins and maxes, like this:
x_min_max, y_min_max = [(min(L), max(L)) for L in (xs, ys)]
print(x_min_max, y_min_max) # -> (0, 9) (0, 9)
Another way is to use NumPy to treat listoftuples as a matrix.
import numpy as np
a = np.array(listoftuples)
x_min_max, y_min_max = [(min(column), max(column)) for column in a.T]
print(x_min_max, y_min_max) # -> (0, 9) (0, 9)
(There's probably a more idiomatic way to do this, but I'm not super familiar with NumPy.)
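For what it's worth, a more direct NumPy version is to reduce along axis 0, which takes the min/max of each column in one call. A short sketch, assuming the same `listoftuples` as in the question:

```python
import numpy as np

listoftuples = [(0, 0), (3, 0), (3, 3), (0, 3), (0, 0), (0, 6), (3, 6), (3, 9),
                (0, 9), (0, 6), (6, 0), (9, 0), (9, 3), (6, 3), (6, 0), (0, 3),
                (3, 3), (3, 6), (0, 6), (0, 3)]

points = np.array(listoftuples)        # shape (20, 2): one row per point
x_min, y_min = points.min(axis=0)      # column-wise minimum
x_max, y_max = points.max(axis=0)      # column-wise maximum
print((x_min, x_max), (y_min, y_max))
```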

How can I add a random binary info into current 'coordinate'? (Python)

This is part of the code I'm working on: (Using Python)
import random
pairs = [
    (0, 1),
    (1, 2),
    (2, 3),
    (3, 0), # I want to treat 0,1,2,3 as some 'coordinate' (or positional information)
]
alphas = [(random.choice([1, -1]) * random.uniform(5, 15), pairs[n]) for n in range(4)]
alphas.sort(reverse=True, key=lambda n: abs(n[0]))
A sample output looks like this:
[(13.747649802587832, (2, 3)),
(13.668274782626717, (1, 2)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (3, 0))]
Now I'm wondering whether there is a way to give each element in 0,1,2,3 a random binary number. For example, if [0,1,2,3] = [0,1,1,0] (by that I mean each 'coordinate' in the left list gets the random binary value at the same position in the right list, so coordinate 0 has the random binary number '0', and so on), then the desired output using the information above looks like:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]
Thanks!!
One way using dict:
d = dict(zip([0,1,2,3], [0,1,1,0]))
[(i, tuple(d[j] for j in c)) for i, c in alphas]
Output:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]
You can create a function to convert your number to the random binary assigned. Using a dictionary within this function would make sense. Something like this should work where output1 is that first sample output you provide and binary_code would be [0, 1, 1, 0] in your example:
import numpy as np
def convert2bin(original, binary_code):
    # Map each coordinate (its position in binary_code) to its assigned bit
    binary_dict = {n: x for n, x in enumerate(binary_code)}
    return tuple(binary_dict[x] for x in original)
binary_code = np.random.randint(2, size=4)
[convert2bin(x[1], binary_code) for x in output1]
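If you'd rather not pull in NumPy, the same thing can be done with the standard library. This is only a sketch: `relabel` is a made-up helper name, and the weights in `alphas` are shortened placeholders rather than the real output.

```python
import random

def relabel(alphas, binary_code):
    """Replace every coordinate in each pair with the bit assigned to it."""
    return [(weight, tuple(binary_code[c] for c in pair)) for weight, pair in alphas]

# Shortened stand-in for the sorted list from the question
alphas = [(13.7, (2, 3)), (13.6, (1, 2)), (-9.1, (0, 1)), (-8.2, (3, 0))]

# One random bit per coordinate 0..3
random_code = {coord: random.randint(0, 1) for coord in range(4)}
randomized = relabel(alphas, random_code)
print(random_code, randomized)

# The fixed assignment [0, 1, 1, 0] from the question, to check the mapping
fixed = relabel(alphas, {0: 0, 1: 1, 2: 1, 3: 0})
print(fixed)
```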

how do I create a program that takes random tuples from a list of tuples and generates a basic plot in python?

I want to create a program that randomly takes a tuple from a list of tuples and plots it as a line plot. If I had a list of tuples dataP = [(1,3), (5,4), (2,2)], I would want it to generate a line plot, taking [0] as the x-axis and [1] as the y-axis.
In fact, I have succeeded in plotting. I've tried this code:
dataP = [(1,3)]
x_val = [x[0] for x in dataP]
y_val = [x[1] for x in dataP]
print(x_val)
plt.plot(x_val,y_val)
plt.plot(x_val,y_val,'or')
plt.show()
and I got this:
[basic plot using tuples, jupyter notebook][1]
I also tried to make a line plot:
import matplotlib.pyplot as plt
data = [(0, 3), (1,2),
(2, 5), (3,1),
(4, 4), (5, 0)]
x_val = [x[0] for x in data]
y_val = [x[1] for x in data]
print(x_val)
plt.plot(x_val,y_val)
plt.plot(x_val,y_val,'or')
plt.show()
a satisfying result:
[line plot of tuples, jupyter notebook][2]
What I am unable to do is write a program that takes a random tuple from the list and plots it as a point (in this case, a dot) on the graph.
How do I do it? Thank you.
The random module has choice and sample functions:
choice returns one random element from a list
sample returns k distinct random elements from a list (sampling without replacement)
Here's an example
import random
data = [(0, 3), (1, 2),
(2, 5), (3, 1),
(4, 4), (5, 0)]
print(random.choice(data))
print(random.sample(data, 2))
Here's my output
(3, 1)
[(5, 0), (2, 5)]
Here's your code that shows 3 random points and 1 random point respectively
import random
import matplotlib.pyplot as plt
data = [(0, 3), (1, 2),
(2, 5), (3, 1),
(4, 4), (5, 0)]
# multiple points
random_points = random.sample(data, 3)
x_val = [i[0] for i in random_points]
y_val = [i[1] for i in random_points]
plt.plot(x_val, y_val)
plt.plot(x_val, y_val, 'or')
plt.show()
# single point
single_random_point = random.choice(data)
plt.plot(*single_random_point, 'or')
plt.show()

Arrange line segments consecutively to make a polygon

I'm trying to arrange line segments to create a closed polygon with Python. At the moment I've managed to solve it, but it is really slow when the number of segments increases (it's like a bubble sort, but for the end points of segments). I'm attaching a sample file of coordinates (the real ones are really complex, but this one is useful for testing purposes). The file contains the coordinates for the segments of two separate closed polygons. The image below is the result of the coordinates I've attached.
This is my code for joining the segments. The file 'Curve' is in the dropbox link above:
from ast import literal_eval as make_tuple
from random import shuffle
from Curve import Point, Curve, Segment
def loadFile():
    print('Loading File')
    file = open('myFiles/coordinates.txt','r')
    for line in file:
        pairs.append(make_tuple(line))
    file.close()
def sortSegment(segPairs):
    polygons = []
    segments = segPairs
    while (len(segments) > 0):
        counter = 0
        closedCurve = Curve(Point(segments[0][0][0], segments[0][0][1]), Point(segments[0][1][0], segments[0][1][1]))
        segments.remove(segments[0])
        still = True
        while (still):
            startpnt = Point(segments[counter][0][0], segments[counter][0][1])
            endpnt = Point(segments[counter][1][0], segments[counter][1][1])
            seg = Segment(startpnt, endpnt)
            val= closedCurve.isAppendable(seg)
            if(closedCurve.isAppendable(seg)):
                if(closedCurve.isClosed(seg)):
                    still =False
                    polygons.append(closedCurve.vertex)
                    segments.remove(segments[counter])
                else:
                    closedCurve.appendSegment(Segment(Point(segments[counter][0][0], segments[counter][0][1]), Point(segments[counter][1][0], segments[counter][1][1])))
                    segments.remove(segments[counter])
                    counter = 0
            else:
                counter+=1
                if(len(segments)<=counter):
                    counter = 0
    return polygons
def toTupleList(list):
    curveList = []
    for curve in list:
        pointList = []
        for point in curve:
            pointList.append((point.x,point.y))
        curveList.append(pointList)
    return curveList
def convertPolyToPath(polyList):
    path = []
    for curves in polyList:
        curves.insert(1, 'L')
        curves.insert(0, 'M')
        curves.append('z')
        path = path + curves
    return path
if __name__ == '__main__':
    pairs =[]
    loadFile()
    polygons = sortSegment(pairs)
    polygons = toTupleList(polygons)
    polygons = convertPolyToPath(polygons)
Assuming that you are only looking for the approach and not the code, here is how I would attempt it.
While you read the segment coordinates from the file, keep adding the coordinates to a dictionary with one coordinate (string form) of the segment as the key and the other coordinate as the value. At the end, it should look like this:
{
'5,-1': '5,-2',
'4,-2': '4,-3',
'5,-2': '4,-2',
...
}
Now pick any key-value pair from this dictionary. Next, pick the key-value pair from the dictionary where the key is same as the value in the previous key-value pair. So if first key-value pair is '5,-1': '5,-2', next look for the key '5,-2' and you will get '5,-2': '4,-2'. Next look for the key '4,-2' and so on.
Keep removing the key-value pairs from the dictionary so that once one polygon is complete, you can check if there are any elements left which means there might be more polygons.
Let me know if you need the code as well.
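For reference, the dictionary approach described above could be sketched roughly like this. It is a minimal illustration, not a drop-in replacement for the question's code: the function name `chain_segments` is made up, tuples are used as keys instead of strings, and it assumes every segment is oriented consistently (each end point is the start point of exactly one other segment). If the segments are not oriented, each point would need to be stored with both of its neighbours instead.

```python
def chain_segments(segments):
    """Group (start, end) segments into closed rings by following start -> end links."""
    next_point = dict(segments)          # start point -> end point
    polygons = []
    while next_point:                    # anything left means another polygon
        start, current = next_point.popitem()
        ring = [start]
        while current != start:          # follow the chain until it closes
            ring.append(current)
            current = next_point.pop(current)
        polygons.append(ring)
    return polygons

# Shuffled segments of a unit square and a triangle
segments = [
    ((0, 0), (1, 0)), ((5, 5), (6, 5)), ((1, 1), (0, 1)), ((6, 5), (5, 6)),
    ((1, 0), (1, 1)), ((5, 6), (5, 5)), ((0, 1), (0, 0)),
]
polygons = chain_segments(segments)
print(polygons)
```

Each segment is looked up and removed exactly once, so the work grows linearly with the number of segments rather than rescanning the list for every append.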
I had to do something similar. I needed to turn coastline segments (that were not ordered properly) into polygons. I used NetworkX to arrange the segments into connected components and order them using this function.
It turns out that my code will work for this example as well. I use geopandas to display the results, but that dependency is optional for the original question here. I also use shapely to turn the lists of segments into polygons, but you could just use CoastLine.rings to get the lists of segments.
I plan to include this code in the next version of PyRiv.
from shapely.geometry import Polygon
import geopandas as gpd
import networkx as nx
class CoastLine(nx.Graph):
    def __init__(self, *args, **kwargs):
        """
        Build a CoastLine object.
        Parameters
        ----------
        Returns
        -------
        A CoastLine object
        """
        super(CoastLine, self).__init__(*args, **kwargs)
    @classmethod
    def read_shp(cls, shp_fn):
        """
        Construct a CoastLine object from a shapefile.
        """
        # Note: nx.read_shp was removed in NetworkX 3.0, so this needs an older version
        dig = nx.read_shp(shp_fn, simplify=False)
        return cls(dig)
    def connected_subgraphs(self):
        """
        Get the connected component subgraphs. See the NetworkX
        documentation for `connected_components` for more
        information.
        """
        # nx.connected_component_subgraphs was removed in NetworkX 2.4;
        # building a subgraph per component is the equivalent
        return [self.subgraph(c) for c in nx.connected_components(self)]
    def rings(self):
        """
        Return a list of rings. Each ring is a list of nodes. Each
        node is a coordinate pair.
        """
        rings = [list(nx.dfs_preorder_nodes(sg)) for sg in self.connected_subgraphs()]
        return rings
    def polygons(self):
        """
        Return a list of `shapely.Polygon`s representing each ring.
        """
        return [Polygon(r) for r in self.rings()]
    def poly_geodataframe(self):
        """
        Return a `geopandas.GeoDataFrame` of polygons.
        """
        return gpd.GeoDataFrame({'geometry': self.polygons()})
With this class, the original question can be solved:
edge_list = [
((5, -1), (5, -2)),
((6, -1), (5, -1)),
((1, 0), (1, 1)),
((4, -3), (2, -3)),
((2, -2), (1, -2)),
((9, 0), (9, 1)),
((2, 1), (2, 2)),
((0, -1), (0, 0)),
((5, 0), (6, 0)),
((2, -3), (2, -2)),
((6, 0), (6, -1)),
((4, 1), (5, 1)),
((10, -1), (8, -1)),
((10, 1), (10, -1)),
((2, 2), (4, 2)),
((5, 1), (5, 0)),
((8, -1), (8, 0)),
((9, 1), (10, 1)),
((8, 0), (9, 0)),
((1, -2), (1, -1)),
((1, 1), (2, 1)),
((5, -2), (4, -2)),
((4, 2), (4, 1)),
((4, -2), (4, -3)),
((1, -1), (0, -1)),
((0, 0), (1, 0)) ]
eG = CoastLine()
for e in edge_list:
    eG.add_edge(*e)
eG.poly_geodataframe().plot()
This will be the result:

python: finding smallest distance between two points in two arrays

I've got two lists containing a series of tuples (x,y), representing different points on a Cartesian plane:
a = [(0, 0), (1, 2), (1, 3), (2, 4)]
b = [(3, 4), (4, 1), (5, 3)]
I'd like to find the two points (one from each list, not both from the same list) with the smallest distance between them, in this specific case:
[((2, 4), (3, 4))]
whose distance is equal to 1. I was using list comprehension, as:
[(Pa, Pb) for Pa in a for Pb in b \
if math.sqrt(math.pow(Pa[0]-Pb[0],2) + math.pow(Pa[1]-Pb[1],2)) <= 2.0]
but this uses a threshold value. Is there a way to append an argmin() somewhere or something like that and get only the pair [((xa, ya), (xb, yb))] smallest distance? Thanks.
import math
import numpy
e = [(Pa, Pb) for Pa in a for Pb in b]
e[numpy.argmin([math.sqrt(math.pow(Pa[0]-Pb[0],2) + math.pow(Pa[1]-Pb[1],2)) for (Pa, Pb) in e])]
This uses argmin as you suggested and returns ((2, 4), (3, 4)).
Just use a list comprehension and min as follows:
dist = [(Pa, Pb, math.sqrt(math.pow(Pa[0]-Pb[0],2) + math.pow(Pa[1]-Pb[1],2)))
        for Pa in a for Pb in b]
print(min(dist, key=lambda x:x[2])[0:2])
Solution similar to DevShark's one with a few optimization tricks:
import math
import itertools
import numpy as np
def distance(p1, p2):
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])
a = [(0, 0), (1, 2), (1, 3), (2, 4)]
b = [(3, 4), (4, 1), (5, 3)]
points = [tup for tup in itertools.product(a, b)]
print(points[np.argmin([distance(Pa, Pb) for (Pa, Pb) in points])])
You could also use the scipy.spatial library as follows:
import scipy.spatial as spspat
import numpy as np
distanceMatrix = spspat.distance_matrix(a,b)
args = np.argwhere(distanceMatrix==distanceMatrix.min())
print(args)
This returns array([[3, 0]]), which is the index of the closest point in each list (a[3] and b[0]).
This should also work in any dimension.
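For completeness, the same result can be had with only the standard library by letting `min` do the argmin over all pairs. This is a small sketch that assumes Python 3.8+ for `math.dist`; `closest` is just an illustrative name.

```python
import math
from itertools import product

a = [(0, 0), (1, 2), (1, 3), (2, 4)]
b = [(3, 4), (4, 1), (5, 3)]

# min() with a key acts as the argmin: it returns the pair itself,
# not the distance, and needs no threshold
closest = min(product(a, b), key=lambda pair: math.dist(*pair))
print(closest, math.dist(*closest))
```

Like the list-comprehension answers, this checks every pair, so it is fine for small lists but scales with len(a) * len(b).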
