Finding unique maximum values in a list using python

I have a list of points as shown below:
points = [[x0, y0, v0], [x1, y1, v1], [x2, y2, v2], ..., [xn, yn, vn]]
Some of the points have duplicate x,y values. For each distinct (x, y) pair, I want to keep only the point with the maximum value v.
For example, if I have the points [1,2,5] [1,1,3] [1,2,7] [1,7,3]
I would like to obtain the list [1,1,3] [1,2,7] [1,7,3]
How can I do this in Python?
Thanks

For example:
import itertools
import operator

def getxy(point):
    return point[:2]

sortedpoints = sorted(points, key=getxy)
results = []
for xy, g in itertools.groupby(sortedpoints, key=getxy):
    # keep the point with the largest third element in each (x, y) group
    results.append(max(g, key=operator.itemgetter(2)))
That is: sort and group the points by (x, y), then for every group with fixed (x, y) pick the point with the maximum z. This is straightforward if you're comfortable with itertools (and you should be, it's a very powerful and useful module!).
Alternatively, you could build a dict with (x, y) tuples as keys and lists of z as values, then do one last pass over it to pick the max z for each (x, y). I think the sort-and-group approach is preferable, unless you have many millions of points and the O(n log n) cost of sorting worries you for scalability.
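The dict-of-lists alternative mentioned above could look roughly like the following. This is only a sketch: the name z_by_xy is made up for illustration, and the sample data is taken from the question.

```python
from collections import defaultdict

points = [[1, 2, 5], [1, 1, 3], [1, 2, 7], [1, 7, 3]]

# Collect every z seen for each (x, y) key.
z_by_xy = defaultdict(list)
for x, y, z in points:
    z_by_xy[(x, y)].append(z)

# Final pass: keep only the largest z per key.
results = [[x, y, max(zs)] for (x, y), zs in z_by_xy.items()]
print(sorted(results))  # [[1, 1, 3], [1, 2, 7], [1, 7, 3]]
```

This makes a single pass over the points plus one pass over the distinct keys, so it avoids the sort at the cost of holding every z in memory until the end.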

You can use a dict to achieve this, using the property that "If a given key is seen more than once, the last value associated with it is retained in the new dictionary." This code sorts the points so that, for each (x, y), the highest value comes last, creates a dictionary whose keys are tuples of the first two values and whose values are the third coordinate, then translates that back into a list:
points = [[1,2,5], [1,1,3], [1,2,7], [1,7,3]]
sp = sorted(points)
d = dict(((a, b), c) for (a, b, c) in sp)
results = [list(k) + [v] for (k, v) in d.items()]  # use d.iteritems() on Python 2
There may be a way to further improve that, but it satisfies all your requirements.

If I understand your question, you could use a dictionary to map (x, y) to the max z.
Something like this (not tested):
best = {}
for x, y, z in points:
    if (x, y) in best:
        best[(x, y)] = max(best[(x, y)], z)
    else:
        best[(x, y)] = z
Though the original ordering of the points may be lost (dicts only preserve insertion order from Python 3.7 on).

Related

Making a Marginal Distribution from Joint Distribution in python

I have 3 arrays of X values, Y values, and probabilities. I'm trying to do two things, but I imagine they're practically the same coding-wise.
I want to find all the X values that are the same, and add up the corresponding probabilities into another array. (So if my X values are [3,7,4,7] and my probabilities are [.2,.3,.1,.4], I would want to add .3 and .4 together.) I'm trying to do this with a loop, but because I only picked up Python two weeks ago I'm struggling.
My thought process that I want to try:
MargX=np.unique(X array)
MargXp=np.zeros(len(MargX))
for Ind in range(len(MargX):
?
(Here I want to take the values in my X array that are equal, grab the corresponding value from my p array, and then add them into my zero array MargXp)
I've tried a couple of different ways to set up my loop so that it would add the values into the zero arrays that I made, but to no avail because I keep getting syntax errors and various other things.

If you try to compact the X's down with unique, then finding the corresponding probabilities would involve searching through the X array for indices and using those to find the corresponding probabilities. I think you'd be happier using python's dictionary concept to associate keys (x-values) with values (probabilities). Using defaultdict allows you to specify a default value for keys which aren't already in the dictionary. In this case, start off with the idea that an arbitrary x-value has zero probability. As you iterate through the x/probability pairs, you can then use increment to add the probability to the stored or default value associated with the x.
The result looks something like this:
from collections import defaultdict

# synchronized arrays from your example.
x = [3, 7, 4, 7]
probs = [0.2, 0.3, 0.1, 0.4]
marginal = defaultdict(lambda: 0.0)  # 0.0 is the default value
for key, value in zip(x, probs):  # zip combines the arrays as pairs
    marginal[key] += value  # increment to accumulate total probability
# The following line is not strictly needed since all values are
# in the dictionary, but by default key values are not ordered.
orderedkeys = sorted(marginal.keys())
for key in orderedkeys:
    print(key, marginal[key])
which produces:
3 0.2
4 0.1
7 0.7
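For comparison, the loop sketched in the question (unique values first, then a zero array to fill) might be completed along these lines. This is a plain-list sketch rather than NumPy code; sorted(set(x)) stands in for np.unique, and the names marg_x and marg_xp are adapted from the question.

```python
import math

x = [3, 7, 4, 7]
probs = [0.2, 0.3, 0.1, 0.4]

# Distinct x values in sorted order (plays the role of np.unique in the question).
marg_x = sorted(set(x))
marg_xp = [0.0] * len(marg_x)

for ind, ux in enumerate(marg_x):
    # Add up the probability of every entry whose x equals this unique value.
    for xi, p in zip(x, probs):
        if xi == ux:
            marg_xp[ind] += p

# Sanity check: the marginal must carry the same total probability as the input.
assert math.isclose(sum(marg_xp), sum(probs))
print(marg_x)  # [3, 4, 7]
```

Note that this rescans the whole input once per unique value, which is the searching cost described above; the dictionary version needs only a single pass.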

check if subarray is in array of arrays

I've got an array of arrays where I store x,y,z coordinates and a measurement at that coordinate like:
measurements = [[x1,y1,z1,val1],[x2,y2,z2,val2],[...]]
Now, before adding a measurement for a certain coordinate, I want to check if there is already a measurement for that coordinate, so that I keep only the measurement with the maximum val.
So the question is:
Is [xn, yn, zn, ...] already in measurements
My approach so far would be to iterate over the array and compare with a sliced entry like
for measurement in measurements:
    if measurement_new[:3] == measurement[:3]:
        measurement[3] = max(measurement[3], measurement_new[3])
But with the measurements array getting bigger this is very inefficient.
Another approach would be two separate arrays coords = [[x1,y1,z1], [x2,y2,z2], [...]] and vals = [val1, val2, ...]
This would allow checking for existing coordinates efficiently with [x,y,z] in coords, but I would have to merge the arrays later on.
Can you suggest a more efficient method for solving this problem?

If you want to stick to built-in types (if not, see the last point in the Notes below), I suggest using a dict for the measurements:
measurements = {(x1,y1,z1): val1,
                (x2,y2,z2): val2}
Then adding a new value (x,y,z,val) can simply be:
measurements[(x,y,z)] = max(measurements.get((x,y,z), 0), val)
Notes:
The value 0 in measurements.get is supposed to be the lower bound of the values you are expecting. If you have values below 0, change it to an appropriate lower bound, so that whenever (x,y,z) is not present in your measurements, get returns the lower bound and max returns val. You can also avoid having to specify the lower bound and write:
measurements[(x,y,z)] = max(measurements.get((x,y,z), val), val)
You need to use a tuple as the type for your keys, hence the (x,y,z). This is because lists cannot be hashed and so are not permitted as keys.
Finally, depending on the complexity of the task you are performing, consider using more complex data types. I would recommend having a look at pandas DataFrames; they are ideal for dealing with this kind of thing.
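As a small illustration of the dict approach, the update could be wrapped in a helper like the one below. The function name add_measurement and the sample coordinates are hypothetical; the update line itself is the lower-bound-free variant from the notes.

```python
def add_measurement(measurements, x, y, z, val):
    """Store val for (x, y, z), keeping the maximum seen so far."""
    key = (x, y, z)  # tuples are hashable, lists are not
    # If the key is missing, get() falls back to val, so max() returns val.
    measurements[key] = max(measurements.get(key, val), val)

measurements = {}
add_measurement(measurements, 0, 0, 1, 5.0)
add_measurement(measurements, 0, 0, 1, 7.5)   # same coordinate, larger value
add_measurement(measurements, 0, 0, 1, 6.0)   # same coordinate, smaller value
add_measurement(measurements, 2, 3, 4, -1.0)  # new coordinate, negative value
print(measurements)  # {(0, 0, 1): 7.5, (2, 3, 4): -1.0}
```

Each insertion is a single hash lookup on average, instead of a scan over all stored measurements.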

looping through complicated nested dictionary

I have a rather complex list of dictionaries with nested dictionaries and arrays. I am trying to figure out a way to either:
make the list of data less complicated and then loop through the raster points, or
find a way to loop through the array of raster points as is.
What I am ultimately trying to do is loop through all raster points within each polygon, perform a simple greater than or less than on the value assigned to that raster point (values are elevation values). If greater than a given value assign 1, if less than given value assign 0. I would then create a separate array of these 1s and 0s of which I can then get an average value.
I have found all these points (allpoints within pts), but they are in arrays within a dictionary within another dictionary within a list (of all polygons). At least I think so; I could be wrong about the organization, as dictionaries are rather new to me.
The following is my code:
import numpy as np
def mystat(x):
    mystat = dict()
    mystat['allpoints'] = x
    return mystat
stats = zonal_stats('acp.shp','myGeoTIFF.tif')
pts = zonal_stats('acp.shp','myGeoTIFF.tif', add_stats={'mystat':mystat})
Link to my documents. Any help or direction would be greatly appreciated!

I assume you are using the rasterstats package. You could try something like this:
threshold_value = 15  # You may change this threshold value to yours
for o_idx in range(len(pts)):
    data = pts[o_idx]['mystat']['allpoints'].data
    for d_idx in range(len(data)):
        for p_idx in range(len(data[d_idx])):
            # You may change the conditions below as you want
            if data[d_idx][p_idx] > threshold_value:
                data[d_idx][p_idx] = 1
            elif data[d_idx][p_idx] <= threshold_value:
                data[d_idx][p_idx] = 0
This updates the data within the pts list in place.
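If the goal is the average of the 1s and 0s rather than the modified array, a separate helper could compute it without touching the original data. The sketch below assumes each polygon's values can be treated as a nested list of numbers; the function name fraction_above and the sample grid are hypothetical, and masked or no-data cells are not handled.

```python
def fraction_above(grid, threshold):
    """Return the mean of the 1/0 flags, i.e. the share of cells above threshold."""
    flags = [1 if value > threshold else 0 for row in grid for value in row]
    return sum(flags) / len(flags)

# Hypothetical elevation values for one polygon (stand-in for the raster array).
elevations = [[10, 20, 30],
              [15, 16, 5]]
print(fraction_above(elevations, 15))  # 0.5
```

An empty grid would raise ZeroDivisionError here, so polygons without any raster points would need a guard.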

Find the closest match of a list in a list containing lists

I have a list with two elements like this:
list_a = [27.666521, 85.437447]
and another list like this:
big_list = [[27.666519, 85.437477], [27.666460, 85.437622], ...]
And I want to find the closest match of list_a within big_list.
For example, here the closest match would be [27.666519, 85.437477].
How would I be able to achieve this?
I found a similar problem here for finding the closest match of a string in an array but was unable to reproduce it similarly for the above mentioned problem.
P.S. The elements in the lists are the coordinates of points on the Earth.

From your question, it's hard to tell how you want to measure the distance, so I simply assume you mean Euclidean distance.
You can use the key parameter to min():
from functools import partial

def distance_squared(x, y):
    return (x[0] - y[0])**2 + (x[1] - y[1])**2

print(min(big_list, key=partial(distance_squared, list_a)))

Assumptions:
You intend to make this type of query more than once on the same list of lists.
Both the query list and the lists in your list of lists represent points in an n-dimensional Euclidean space (here: a 2-dimensional space, unlike GPS positions, which lie on a sphere).
This reads like a nearest neighbor search. You should probably consider a library dedicated to this, like scikits.ann.
Example:
import scikits.ann as ann
import numpy as np
k = ann.kdtree(np.array(big_list))
indices, distances = k.knn(list_a, 1)
This uses Euclidean distance internally. You should make sure that the distance measure you apply matches your idea of proximity.
You might also want to have a look at quadtrees, another data structure that you could apply to avoid the brute-force minimum search through your entire list of lists.
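Since the question says the values are coordinates on the Earth, one distance measure that may match the intended notion of proximity better is the great-circle (haversine) distance. The sketch below assumes each list is [latitude, longitude] in degrees, which the question does not state explicitly; haversine_km is a made-up helper name, and big_list contains only the two entries shown in the question.

```python
import math

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

list_a = [27.666521, 85.437447]
big_list = [[27.666519, 85.437477], [27.666460, 85.437622]]

# Brute-force search: evaluate the distance to every candidate and keep the smallest.
closest = min(big_list, key=lambda point: haversine_km(list_a, point))
print(closest)  # [27.666519, 85.437477]
```

For points this close together the result agrees with the Euclidean version; the two measures can disagree over larger areas or far from the equator, where a degree of longitude covers less ground than a degree of latitude.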

Construct a dictionary merging multiple lists

I have a list of objects (clusters) and each object has an attribute vertices which is a list of numbers. I want to construct a dictionary (using a one liner) such that the key is a vertex number and the value is the index of the corresponding cluster in the actual list.
Ex:
clusters[0].vertices = [1,2]
clusters[1].vertices = [3,4]
Expected Output:
{1:0,2:0,3:1,4:1}
I came up with the following:
dict(reduce(lambda x,y:x.extend(y) or x, [
dict(zip(vertices, [index]*len(vertices))).items()
for index,vertices in enumerate([i.vertices for i in clusters])]))
It works... but is there a better way of doing this?
Also comment on the efficiency of the above piece of code.
PS: The vertex lists are disjoint.

This is a fairly simple solution, using a nested for:
dict((vert, i) for (i, cl) in enumerate(clusters) for vert in cl.vertices)
This is also more efficient than the version in the question, since it doesn't build lots of intermediate lists while collecting the data for the dict.
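A runnable version of the same idea, written as a dict comprehension, might look like this. The Cluster class is a hypothetical stand-in for the question's objects, and the data is the example from the question.

```python
class Cluster:
    """Hypothetical stand-in for the question's cluster objects."""
    def __init__(self, vertices):
        self.vertices = vertices

clusters = [Cluster([1, 2]), Cluster([3, 4])]

# Outer loop over (index, cluster), inner loop over that cluster's vertices.
vertex_to_cluster = {vert: i for i, cl in enumerate(clusters) for vert in cl.vertices}
print(vertex_to_cluster)  # {1: 0, 2: 0, 3: 1, 4: 1}
```

If the vertex lists were not disjoint, a vertex appearing in several clusters would end up mapped to the last cluster that contains it.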
