Beginner - Find in which "district" a point is located - Python

I'm using the NumPy library in Python and I have an exercise as follows:
Assume a (fictitious) map with both x and y axes going from 0 to 1, representing 1 km² of territory. Each point is an individual, placed on this map using normalized coordinates.
There is also a frontier separating the map into two "districts". It is not a straight line, and it must pass through the points A/B/C/D, which are:
A = (0., 0.3711257)
B = (0.496042,0.62673)
C = (0.781478,0.510147)
D = (1.,0.73035714)
One part of the exercise is to determine in which district any new point is located.
Obviously, I cannot just write a condition comparing an individual's coordinates with A-B-C-D, but I don't know what else I can do.
I'd be glad for any suggestions.
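One simple approach (a sketch of my own; the district names 'north'/'south' are made up for illustration) relies on the frontier being a polyline whose vertices are ordered by increasing x from 0 to 1: interpolate the frontier's height at the point's x with np.interp and compare it with the point's y.

import numpy as np

# Frontier vertices from the question, ordered by increasing x.
fx = np.array([0., 0.496042, 0.781478, 1.])
fy = np.array([0.3711257, 0.62673, 0.510147, 0.73035714])

def district(x, y):
    # Height of the piecewise-linear frontier at this x.
    boundary_y = np.interp(x, fx, fy)
    return 'north' if y > boundary_y else 'south'

print(district(0.5, 0.9))  # 'north': the point lies above the frontier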


Does anyone know a more efficient way to run a pairwise comparison of hundreds of trajectories?

So I have two different files containing multiple trajectories on a square map (512x512 pixels). Each file contains information about the spatial position of each particle within a track/trajectory (X and Y coordinates) and which track/trajectory that spot belongs to (TRACK_ID).
My goal is to find a way to cluster similar trajectories between both files. I found a nice way to do this (distance clustering comparison), but the code is too slow. I was just wondering if someone has suggestions to make it faster.
My files look something like this:
The approach I implemented finds similar trajectories based on the Fréchet distance (maybe not too relevant here). Below you can find the function that I wrote, but briefly this is the rationale:
group all the spots by track using the pandas groupby function for file1 (growth_xml) and file2 (shrinkage_xml)
for each trajectory in growth_xml (loop) I compare it with each trajectory in shrinkage_xml
if they pass the Fréchet distance criterion that I defined (an if statement) I save both tracks in a new table. You can see an additional filter condition that I called delay, but I guess it is not important to explain here.
so really simple:
import numpy as np
import pandas as pd

def distance_clustering(growth_xml, shrinkage_xml):
    # frechetDist is defined elsewhere in the script.
    coords_g = pd.DataFrame()  # empty dataframes to save filtered tracks
    coords_s = pd.DataFrame()
    counter = 0  # initialize counter to count number of filtered tracks
    for track_g, param_g in growth_xml.groupby('TRACK_ID'):
        # define growing track as multi-point line object
        traj1 = [(x, y) for x, y in zip(param_g.POSITION_X.values,
                                        param_g.POSITION_Y.values)]
        for track_s, param_s in shrinkage_xml.groupby('TRACK_ID'):
            # define shrinking track as a second multi-point line object
            traj2 = [(x, y) for x, y in zip(param_s.POSITION_X.values,
                                            param_s.POSITION_Y.values)]
            # compute delay between shrinkage and growing ends to use as an extra filter
            delay = param_s.FRAME.iloc[0] - param_g.FRAME.iloc[0]
            # keep the pair only if the Frechet distance is lower than 0.2 microns
            if frechetDist(traj1, traj2) < 0.2 and delay > 0:
                counter += 1
                param_g = param_g.assign(NEW_ID=np.ones(param_g.shape[0]) * counter)
                coords_g = pd.concat([coords_g, param_g])
                param_s = param_s.assign(NEW_ID=np.ones(param_s.shape[0]) * counter)
                coords_s = pd.concat([coords_s, param_s])
    coords_g.reset_index(drop=True, inplace=True)
    coords_s.reset_index(drop=True, inplace=True)
    return coords_g, coords_s
The main problem is that most of the time I have more than two thousand tracks (!!) and this pairwise combination takes forever. I'm wondering if there's a simple and more efficient way to do this. Perhaps by doing the pairwise comparison in multiple small areas instead of the whole map? Not sure...
Have you tried building a (DeltaX, DeltaY) lookup table (LUT) for the pairwise distances? Computing the LUT once will take a long time, or you can write it to a file and load it when the algorithm starts.
Then you only have to look up the right cell to get the result instead of recomputing it each time.
You could also fit a polynomial regression for the distance calculation; it will be less precise but definitely faster.
Maybe not an outright answer, but it's been a while. Could you not segment the lines and use a minimum bounding box around each segment to assess similarities? I might be thinking of your problem the wrong way around, I'm not sure. Right now I'm trying to work with polygons from two different data sets and want to optimize the processing by first identifying the polygons in both geometries that overlap.
In your case, I think segments would leave you with some edge artifacts. Maybe look at this paper: https://drops.dagstuhl.de/opus/volltexte/2021/14879/pdf/OASIcs-ATMOS-2021-10.pdf or this paper (with python code): https://www.austriaca.at/0xc1aa5576_0x003aba2b.pdf
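Building on the bounding-box idea, here is a sketch (my own, reusing the question's column names) of a cheap prefilter: the Fréchet distance between two tracks can never be smaller than the gap between their bounding boxes, so pairs whose boxes are more than 0.2 apart can be skipped without computing frechetDist at all.

import numpy as np

def bbox(track):
    # Axis-aligned bounding box of a track: (xmin, ymin, xmax, ymax).
    xs, ys = track.POSITION_X.values, track.POSITION_Y.values
    return xs.min(), ys.min(), xs.max(), ys.max()

def bbox_gap(b1, b2):
    # Smallest possible distance between points of two boxes (0 if they overlap).
    dx = max(b1[0] - b2[2], b2[0] - b1[2], 0.0)
    dy = max(b1[1] - b2[3], b2[1] - b1[3], 0.0)
    return np.hypot(dx, dy)

# Inside the double loop, before the expensive call:
# if bbox_gap(bbox(param_g), bbox(param_s)) >= 0.2:
#     continue  # the Frechet distance is at least the box gap, so it can't pass

Precomputing the boxes once per track (outside the inner loop) keeps the prefilter itself cheap.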

Find third coordinate of (right) triangle given 2 coordinates and ray to third

I'll start explaining my problem from quite far back, so you can suggest completely different approaches and understand my custom objects and functions.
Over the years I have recorded many bicycle GPS tracks (.gpx). I decided to merge these (mostly overlapping) tracks into a large graph and merge/remove most of the track points. So far, I have managed to simplify the tracks (a feature of the gpxpy module that removes about 90% of track points while preserving the positions of corners) and load them into my current program.
The current Python 3 program consists of loading gpx tracks and optimising the graph with four scans. Here are the planned steps in my program:
Import points from gpx (working)
Join points located close to each other (working)
Merge edges under small angles (the problem is with this step)
Remove points on straights (angle between both edges is over 170 degrees). Looks like it is working.
Clean up by resetting the unique indexing of points (working)
Final checking of all edges in the graph.
In my program I started counting steps from 0, because the first one is simply opening and parsing the file. Stack Overflow doesn't let me start numbering from 0.
To store the graph, I have a dictionary punktid (points in Estonian), where a punkt (point) object is stored at key uid/ui (unique ID). The unique ID is also stored in the point itself. The weight attribute is used in the 2nd and 3rd steps to find the average of points while taking earlier merges into account.
class punkt:
    def __init__(self, lo, la, idd, edge=set(), ele=0, wei=1):
        self.lng = lo      # Longitude
        self.lat = la      # Latitude
        self.uid = idd     # Unique ID
        self.edges = edge  # Set of neighbour nodes (NB: a mutable default argument is shared between instances)
        self.att = ele     # Elevation
        self.weight = wei  # Used to get weighted average
>>> punktid
{1: <__main__.punkt object at 0x0000006E9A9F7FD0>,
2: <__main__.punkt object at 0x0000006E9AADC470>, 3: ...}
>>> punktid[1].__dict__
{'weight': 90, 'uid': 9000, 'att': 21.09333333333333, 'lat': 59.41757, 'lng': 24.73907, 'edges': {1613, 1218, 1530}}
As you can see, there is a minor bug in the clean-up, where uid was not updated. I have fixed it by now, but I left it in so you can see the scale of the graph. The largest index in punktid was 1699/11787.
Getting to core problem
Let's say I have 3 points: A, B and C (i, lyhem[2] and lyhem[0] respectively in the following code slice). A has a common edge with B and with C, but B and C might not have a common edge. C is closer to A than B is. To reduce the size of the graph, I want to move C closer to the edge AB (while respecting the weights of B and C) and redirect AB through C.
The solution I came up with is to find a temporary point D on AB which is closest to C, then find the weighted average between D and C, save it as E, and redirect all of C's edges and AB to it. Simplified figure - note that E=(C+D)/2 is not completely accurate. I cannot add more than two links, but I have 2 additional images illustrating my problem.
The biggest problem was finding the coordinates of D. I found a possible solution on the Mathematica site, but it contains a ± sign, because solving yields two possible coordinates; I, however, know the line the point must lie on. Anyway, I don't know how to implement it correctly, and my code has become quite messy:
# 2nd run: Merge edges under small angles
for i in set(punktid.keys()):
    try:
        naabrid1 = frozenset(punktid[i].edges)  # naabrid / neighbours
        for e in naabrid1:
            t = set(naabrid1)
            t.remove(e)
            for u in t:
                try:
                    a = nurk_3(punktid[i], punktid[e], punktid[u])  # Returns angle EIU in degrees. 0<=a<=180
                    if a < 10:
                        de = ((punktid[i].lat - punktid[e].lat)**2 +
                              ((punktid[i].lng - punktid[e].lng)*2)**2)  # squared distance i-e
                        du = ((punktid[i].lat - punktid[u].lat)**2 +
                              ((punktid[i].lng - punktid[u].lng)*2)**2)  # squared distance i-u
                        b = radians(a)
                        if du < de:
                            lyhem = [u, du, e]  # lühem in English is shorter,
                        else:                   # but currently it should be lähem/closer
                            lyhem = [e, de, u]
                        if sin(b)*lyhem[1] < r:
                            lr = abs(sin(b)*lyhem[1])
                            ml = tan(nurk_coor(punktid[i], punktid[lyhem[0]]))  # Lühema tõus / Slope of closer (C)
                            mp = tan(nurk_coor(punktid[i], punktid[lyhem[2]]))  # Pikema / ...farther / B
                            mr = -1/ml  # Ristsirge / perpendicular ...BD
                            p1 = (punktid[i].lng + lyhem[1]*(1/(1 + ml**2)**0.5),
                                  punktid[i].lat + lyhem[1]*(ml/(1 + ml**2)**0.5))
                            p2 = (punktid[i].lng - lyhem[1]*(1/(1 + ml**2)**0.5),
                                  punktid[i].lat - lyhem[1]*(ml/(1 + ml**2)**0.5))
                            d1 = ((punktid[lyhem[0]].lat - p1[1])**2 +
                                  ((punktid[lyhem[0]].lng - p1[0])*2)**2)**0.5  # distance from C to p1
                            d2 = ((punktid[lyhem[0]].lat - p2[1])**2 +
                                  ((punktid[lyhem[0]].lng - p2[0])*2)**2)**0.5  # distance from C to p2
                            if d1 < d2:    # I experimented with one idea,
                                x = p1[0]  # but it made things worse.
                                y = p1[1]  # Originally I simply used p1 coordinates
                            else:
                                x = p2[0]
                                y = p2[1]
                            lo = punktid[lyhem[2]].weight * p2[0]  # Finding weighted average
                            la = punktid[lyhem[2]].weight * p2[1]
                            la += punktid[lyhem[0]].weight * punktid[lyhem[0]].lat
                            lo += punktid[lyhem[0]].weight * punktid[lyhem[0]].lng
                            kaal = punktid[lyhem[2]].weight + punktid[lyhem[0]].weight  # kaal = weight
                            c = (la/kaal, lo/kaal)
                            punktid[ui] = punkt(c[1], c[0], ui, punktid[lyhem[0]].edges,
                                                punktid[lyhem[0]].att, kaal)
                            punktid[i].edges.remove(lyhem[2])
                            punktid[lyhem[2]].edges.remove(i)
                            try:
                                for n in punktid[ui].edges:  # In all neighbours,
                                    try:                     # remove the link to the old point
                                        punktid[n].edges.remove(lyhem[0])
                                    except KeyError:
                                        pass  # If it doesn't link to the current one
                                    punktid[n].edges.add(ui)  # and add the new point
                                    if log:
                                        printf(punktid[n].edges, 'naabri ' + str(n) + ' edges')
                            except KeyError:  # If the neighbour itself has been removed
                                pass          # (in the same merge), ignore
                            punktid[ui].edges.add(lyhem[2])
                            punktid[lyhem[2]].edges.add(ui)
                            punktid.pop(lyhem[0])
                            ui += 1
                except KeyError:  # u has been removed
                    pass
    except KeyError:  # i has been removed
        pass
This is a code segment and it will likely not run after copy-pasting because of missing variables/functions. The new point is calculated in the span from the third if-statement (if sin(b)*lyhem[1] < r) down to the punktid[ui] = ... assignment. After that comes the redirecting of old edges to the new node.
Stating the question clearly: how do I find a point on the ray AB, given the two coordinates of the line segment AC and the angles at these points (the angle at D, the foot of the perpendicular from C, should be 90 degrees)? How do I implement it in Python 3.5?
PS. (Meta) If somebody needs the full source, how could I provide it (uploading a single text file without registration)? Pastebin, or pasting (spamming) it here? If I upload it to another site, how do I provide the link, given that new users are limited to two?
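A sketch of the vector-projection approach (my own, assuming plain planar coordinates and ignoring geographic distortion), which avoids the ± ambiguity of the Mathematica solution entirely:

def foot_on_ray(a, b, c):
    # Point D on the ray AB closest to C (the right angle is at D).
    # a, b, c are (lng, lat) tuples; plain planar approximation.
    abx, aby = b[0] - a[0], b[1] - a[1]
    acx, acy = c[0] - a[0], c[1] - a[1]
    t = (acx * abx + acy * aby) / (abx**2 + aby**2)
    t = max(t, 0.0)  # clamp so D stays on the ray starting at A
    return (a[0] + t * abx, a[1] + t * aby)

E is then the weight-proportional average of D and C, as in the weighted-average code above.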

Using python array's column as boolean to change another column's values

I am fairly new to Python, so that may be why I haven't been able to search this site properly for an answer to my problem, in case someone has asked about it already.
I am reading several data files consisting of six columns: XYZ coordinates and 3-vector components around a sphere. I am using the X and Z coordinates to find the angular location of my vectors, and appending the angle results to my array as a new (fourth) column. So I have a 7-column array.
The angles are calculated with np.arctan and they get a (-) or (+) sign depending on the quadrant the XZ coordinates are located in.
Now, I want to make all the angles around the upper half of the sphere (Z positive) positive, spanning from 0° to 180°. On the other hand, I want my angles around the lower half to be negative, going from 0° to -180°.
The operations to change sign and apply the 0°-180° range are easy to implement. But I want to use the third column of my array (Z coordinates) as the boolean that decides how to change the angles (fourth column of the array). I have read quite a lot of info about slicing arrays, or modifying columns/rows based on arbitrary boolean conditions applied to the same columns/rows.
I am trying to avoid combining for-loops and if-statements, and learn how to use the power of Python :)
Thanks in advance!
np.where(cond, if_true, if_false) solves the problem you think you have:
fixed_angle = np.where(z > 0, angle, -angle)
The problem you actually have is that you are not using atan2, or as it's spelt in numpy, np.arctan2
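A short sketch of both options (the column layout is an assumption on my part: X in column 0, Z in column 2, angle in column 3; adjust the indices to your data):

import numpy as np

data = np.random.randn(10, 7)  # stand-in for the real 7-column array

# Option 1: flip the sign of the angle column wherever Z is negative.
data[:, 3] = np.where(data[:, 2] > 0, data[:, 3], -data[:, 3])

# Option 2: compute signed angles directly. arctan2 returns values in
# (-180, 180] degrees: positive for Z > 0, negative for Z < 0.
data[:, 3] = np.degrees(np.arctan2(data[:, 2], data[:, 0]))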

Getting the position of a (real) object

The story: let's say I have a robot with a distance (ultrasound?) sensor. The robot can calculate the distance to any object in front of it, but it cannot know the coordinates of the object. So the robot moves a little to get a different view angle and calculates the distance from that view, knowing how far it is from the first view.
How do I get the coordinates or some kind of position of a real life object in Python 3.4 with the following input.
Distance from object at view A.
Distance from object at view B.
Distance between view A and B.
A and B are always on the same X coordinate.
Input example:
a = 3.5  # Distance from object at point A (in cm)
b = 7    # Distance from object at point B (in cm)
c = 5    # Distance between A and B (in cm)
The output should be some coordinates or something that I can use to find out the position of an object.
How would I calculate where the object is? I know there is some kind of algorithm for this, but I don't know what it's called or how it works.
I guess this is more a math question than a programming question, but I want to implement it programmatically.
Anyway, the input doesn't need to be exactly this. I guess you would also need an angle or something similar, so if extra input is needed, just use it in the answer.
Thanks!
(I am on Win 10, 64bit, Python 3.4)
If you know how to do this or some algorithm name but you don't know Python, please point it out or give an example of how to do it with math, and I will try to implement it in Python.
Draw the triangle ABC with coordinates (0,0), (b,0), (cx,cy) (fix the origin at A for want of anywhere better to put it -- you can always shift your coordinates later; here b denotes the distance AB).
Then you know the quantities AC^2 = cx^2 + cy^2 and BC^2 = (cx-b)^2 + cy^2. These equations you can solve for cx = (AC^2 - BC^2 + b^2)/(2b) and cy = +/- sqrt(AC^2 - cx^2).
Note that you don't provide enough information to deduce the sign of cy (which "side" of the x-axis your object is on).
This is straightforward to code in Python.
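A sketch of that computation (the parameter names are mine: ac and bc are the two measured distances, ab is the baseline between the views):

import math

def locate_object(ac, bc, ab):
    # A is at the origin, B at (ab, 0); returns C with cy >= 0.
    # The mirror point (cx, -cy) is equally consistent with the data.
    cx = (ac**2 - bc**2 + ab**2) / (2 * ab)
    cy_squared = ac**2 - cx**2
    if cy_squared < 0:
        raise ValueError("inconsistent distances: no such triangle")
    return cx, math.sqrt(cy_squared)

print(locate_object(3.5, 7, 5))  # roughly (-1.175, 3.297)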
The object lies on two circles: one centered at position 1 of the robot (known) with radius a (= distance from position 1 to the object), and another centered at position 2 with radius b (= distance from position 2 to the object). Then it is a matter of finding the intersection of these two circles. Since the circles will generally intersect in two points, you still have to determine at which of these points the object is, so you will need some additional information to decide. But I'm sure with this little help you will get started.
Are you using an ultrasonic sensor?
A = (-c/2, 0)
B = (+c/2, 0)
C = (tx, ty), ty >= 0
then
AC^2 = (tx+c/2)^2 + ty^2 = a^2
BC^2 = (tx-c/2)^2 + ty^2 = b^2
Subtracting the second equation from the first gives 2*c*tx = a^2 - b^2, so tx = (a^2 - b^2)/(2c); substituting back yields ty = sqrt(a^2 - (tx + c/2)^2). With the constraint ty >= 0, the answer is uniquely determined.

Modeling a graph in Python

I'm trying to solve a problem related to graphs in Python. Since it's a competitive programming problem, I'm not using any third-party packages.
The problem presents a graph in the form of a 5 X 5 square grid.
A bot is assumed to be at a user-supplied position on the grid. The grid is indexed with (0,0) at the top left and (4,4) at the bottom right. Each cell in the grid is represented by one of the following 3 characters: ‘b’ (ASCII value 98) indicates the bot’s current position, ‘d’ (ASCII value 100) indicates a dirty cell, and ‘-‘ (ASCII value 45) indicates a clean cell in the grid.
For example below is a sample grid where the bot is at 0 0:
b---d
-d--d
--dd-
--d--
----d
The goal is to clean all the cells in the grid in the minimum number of steps.
A step is defined as a task where either
i) the bot changes its position, or
ii) the bot changes the state of a cell (from d to -).
Assume that the position initially marked b need not be cleaned. The bot is allowed to move UP, DOWN, LEFT and RIGHT.
My approach
I've read a couple of tutorials on graphs and decided to model the graph as a 25 x 25 adjacency matrix, with 0 representing no path and 1 representing a path between cells (since we can move only in 4 directions). Next, I decided to apply the Floyd-Warshall all-pairs shortest path algorithm to it, and then sum up the values of the paths.
But I have a feeling that it won't work.
I'm in a dilemma as to whether the problem is one of the following:
i) a Minimum Spanning Tree (which I'm unable to do, as I'm not able to model and store the grid as a graph), or
ii) A* search (again a wild guess, and the same problem here: I'm not able to model the grid as a graph properly).
I'd be thankful if you could suggest a good approach to problems like these. Also, some hints and pseudocode about various forms of graph-based problems (or links to those) would be helpful. Thanks.
I think you're asking two questions here.
1. How do I represent this problem as a graph in Python?
As the robot moves around, he'll be moving from one dirty square to another, sometimes passing through some clean spaces along the way. Your job is to figure out the order in which to visit the dirty squares.
# Code is untested and may contain typos. :-)

# A list of the (x, y) coordinates of all of the dirty squares.
dirty_squares = [(0, 4), (1, 1), etc.]
n = len(dirty_squares)

# Everywhere after here, refer to dirty squares by their index
# into dirty_squares.
def compute_distance(i, j):
    return (abs(dirty_squares[i][0] - dirty_squares[j][0])
            + abs(dirty_squares[i][1] - dirty_squares[j][1]))

# distances[i][j] is the cost to move from dirty square i to
# dirty square j.
distances = []
for i in range(n):
    distances.append([compute_distance(i, j) for j in range(n)])

# The x, y coordinates of where the robot starts.
start_node = (0, 0)

# first_move_distances[i] is the cost to move from the robot's
# start location to dirty square i.
first_move_distances = [
    abs(start_node[0] - dirty_squares[i][0])
    + abs(start_node[1] - dirty_squares[i][1])
    for i in range(n)]

# order is a list of the dirty squares.
def cost(order):
    if not order:
        return 0  # Cleaning 0 dirty squares is free.
    return (first_move_distances[order[0]]
            + sum(distances[order[i]][order[i+1]]
                  for i in range(len(order)-1)))
Your goal is to find a way to reorder list(range(n)) that minimizes the cost.
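For small n, that reordering could even be brute-forced (a sketch reusing the cost function above; n! orderings make this hopeless beyond roughly 10 dirty squares):

import itertools

# Try every visiting order and keep the cheapest one.
best_order = min(itertools.permutations(range(n)), key=cost)
print(best_order, cost(best_order))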
2. How do I find the minimum number of moves to solve this problem?
As others have pointed out, the generalized form of this problem is intractable (NP-Hard). You have two pieces of information that help constrain the problem to make it tractable:
The graph is a grid.
There are at most 24 dirty squares.
I like your instinct to use A* here. It's often good for solving find-the-minimum-number-of-moves problems. However, A* requires a fair amount of code. I think you'd be better off going with a Branch-and-Bound approach (sometimes called Branch-and-Prune), which should be almost as efficient but is much easier to implement.
The idea is to start enumerating all possible solutions using a depth-first-search, like so:
# Each list represents a sequence of dirty nodes.
[]
[1]
[1, 2]
[1, 2, 3]
[1, 3]
[1, 3, 2]
[2]
[2, 1]
[2, 1, 3]
Every time you're about to recurse into a branch, check to see if that branch is more expensive than the cheapest solution found so far. If so, you can skip the whole branch.
If that's not efficient enough, add a function to calculate a lower bound on the remaining cost. Then if cost([2]) + lower_bound(set([1, 3])) is more expensive than the cheapest solution found so far, you can skip the whole branch. The tighter lower_bound() is, the more branches you can skip.
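A minimal sketch of that search (my own, reusing the distances and first_move_distances tables from the code above; no lower bound yet, just the cheapest-so-far cutoff):

best = {'cost': float('inf'), 'order': None}

def search(last, remaining, cost_so_far, order):
    # Prune: this branch already costs at least as much as the best tour found.
    if cost_so_far >= best['cost']:
        return
    if not remaining:
        best['cost'], best['order'] = cost_so_far, order
        return
    for nxt in sorted(remaining):
        step = first_move_distances[nxt] if last is None else distances[last][nxt]
        search(nxt, remaining - {nxt}, cost_so_far + step, order + [nxt])

search(None, frozenset(range(n)), 0, [])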
Let's say V = {v | v = b or v = d}, and build the fully connected graph G(V, E). You can calculate the cost of each edge in E with time complexity O(n^2). Afterwards the problem becomes exactly: start at a specified vertex, and find a shortest path of G which covers V.
This has been called the Travelling Salesman Problem (TSP) since 1832.
The problem can certainly be stored as a graph. The cost between nodes (dirty cells) is their Manhattan distance. Ignore the cost of cleaning cells, because that total cost will be the same no matter what path is taken.
This problem looks to me like the Minimum Rectilinear Steiner Tree problem. Unfortunately, that problem is NP-hard, so you'll need to come up with an approximation (a Minimum Spanning Tree based on Manhattan distance), if I am correct.
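A sketch of such an MST approximation (Prim's algorithm over Manhattan distances; my own code, which could also double as the lower_bound() mentioned in the branch-and-bound answer):

def mst_cost(points):
    # Total Manhattan weight of a minimum spanning tree over the points,
    # built with Prim's algorithm (O(n^2) per call: fine for <= 24 squares).
    points = list(points)
    if not points:
        return 0
    in_tree = [points.pop()]
    total = 0
    while points:
        dist, nearest = min(
            (min(abs(p[0] - q[0]) + abs(p[1] - q[1]) for q in in_tree), p)
            for p in points)
        total += dist
        in_tree.append(nearest)
        points.remove(nearest)
    return total

print(mst_cost([(0, 4), (1, 1), (2, 2)]))  # -> 6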
