I need help writing a function that:
takes a set of tuples as input
returns the number of groups of tuples, where tuples that share a number belong to the same group
Example 1:
# input:
{(0, 1), (3, 4), (0, 0), (1, 1), (3, 3), (2, 2), (1, 0)}
# expected output: 3
The expected output is 3, because:
(3,4) and (3,3) contain common numbers, so this counts as 1
(0, 1), (0, 0), (1, 1), and (1, 0) all count as 1
(2, 2) counts as 1
So, 1+1+1 = 3
Example 2:
# input:
{(0, 1), (2, 1), (0, 0), (1, 1), (0, 3), (2, 0), (0, 2), (1, 0), (1, 3)}
# expected output: 1
The expected output is 1, because all tuples are related to other tuples by containing numbers in common.
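This is effectively a connected-components count: treat each number as a graph node and each tuple as linking its numbers together. A minimal union-find sketch of that idea (my own illustration, not code from the question):

```python
def count_number_groups(tuples):
    # Union-find over the numbers appearing in the tuples.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for t in tuples:
        it = iter(t)
        first = next(it)
        find(first)            # register singleton tuples too
        for other in it:
            union(first, other)

    # Each distinct root is one group of linked numbers.
    return len({find(x) for x in parent})

print(count_number_groups({(0, 1), (3, 4), (0, 0), (1, 1), (3, 3), (2, 2), (1, 0)}))  # 3
```

It accepts tuples of any length, like the answer below, since it only iterates each tuple once.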
This may not be the most efficient algorithm for it, but it is simple and looks nice.
from functools import reduce

def unisets(iterables):
    def merge(fsets, fs):
        if not fs: return fsets
        unis = set(filter(fs.intersection, fsets))
        return {reduce(type(fs).union, unis, fs), *fsets - unis}
    return reduce(merge, map(frozenset, iterables), set())
us = unisets({(0,1), (3,4), (0,0), (1,1), (3,3), (2,2), (1,0)})
print(us) # {frozenset({3, 4}), frozenset({0, 1}), frozenset({2})}
print(len(us)) # 3
Features:
Input can be any kind of iterable, whose elements are iterables (any length, mixed types...)
Output is always a well-behaved set of frozensets.
This code works for me, but please check it; there may be edge cases I missed. What do you think of this solution?
def count_groups(marked):
    temp = set(marked)
    save = set()
    for pair in temp:
        if pair[1] in save or pair[0] in save:
            marked.remove(pair)
        else:
            save.add(pair[1])
            save.add(pair[0])
    return len(marked)
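One edge case worth testing: this save-set approach undercounts whenever a later pair bridges two pairs that were each kept earlier. The chain {(0, 1), (1, 2), (2, 3), (3, 4)} is a single group, yet the function reports 2 for every possible iteration order (my own counterexample, not from the post):

```python
def count_groups(marked):
    temp = set(marked)
    save = set()
    for pair in temp:
        if pair[1] in save or pair[0] in save:
            marked.remove(pair)
        else:
            save.add(pair[1])
            save.add(pair[0])
    return len(marked)

# All four pairs form one chain 0-1-2-3-4, so the expected count is 1.
# But whichever pair is seen first, exactly one later disjoint pair is
# also kept, and the pairs bridging them are discarded, giving 2.
print(count_groups({(0, 1), (1, 2), (2, 3), (3, 4)}))  # 2, not 1
```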
This is part of the code I'm working on: (Using Python)
import random

pairs = [
    (0, 1),
    (1, 2),
    (2, 3),
    (3, 0),  # I want to treat 0,1,2,3 as some 'coordinate' (or positional information)
]

alphas = [(random.choice([1, -1]) * random.uniform(5, 15), pairs[n]) for n in range(4)]
alphas.sort(reverse=True, key=lambda n: abs(n[0]))
A sample output looks like this:
[(13.747649802587832, (2, 3)),
(13.668274782626717, (1, 2)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (3, 0))]
Now I'm wondering: is there a way I can give each element in 0,1,2,3 a random binary number? Say [0,1,2,3] maps to [0,1,1,0], i.e. each 'coordinate' in the left list gets the corresponding random binary digit from the right list (so coordinate 0 gets the binary digit 0, and so on). Then the desired output using the information above looks like:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]
Thanks!!
One way using dict:
d = dict(zip([0,1,2,3], [0,1,1,0]))
[(i, tuple(d[j] for j in c)) for i, c in alphas]
Output:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]
You can create a function to convert your numbers to the randomly assigned binary digits. Using a dictionary within this function makes the lookup explicit. Something like this should work, where output1 is that first sample output you provide and binary_code would be [0, 1, 1, 0] in your example:

import numpy as np

def convert2bin(original, binary_code):
    binary_dict = {n: x for n, x in enumerate(binary_code)}
    return tuple(binary_dict[x] for x in original)

binary_code = np.random.randint(2, size=4)
[convert2bin(x[1], binary_code) for x in output1]
On one hand, I have a grid defaultdict that stores the neighboring nodes of each node on a grid and its weight (all 1 in the example below).
# node: [(weight, nbr_node), ...]
grid = { 0: [(1, -5), (1, -4), (1, -3), (1, -1), (1, 1), (1, 3), (1, 4), (1, 5)],
         1: [(1, -4), (1, -3), (1, -2), (1, 0), (1, 2), (1, 4), (1, 5), (1, 6)],
         2: [(1, -3), (1, -2), (1, -1), (1, 1), (1, 3), (1, 5), (1, 6), (1, 7)],
         3: [(1, -2), (1, -1), (1, 0), (1, 2), (1, 4), (1, 6), (1, 7), (1, 8)],
         ...
       }
On the other, I have a Dijkstra function that computes the shortest path between 2 nodes on this grid. The algorithm uses the heapq module and works perfectly fine.
from heapq import heappush, heappop

def Dijkstra(s, e, grid):  # startpoint, endpoint, grid
    visited = set()
    distances = {s: 0}
    p = {}
    queue = [(0, s)]
    while queue:
        weight, node = heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        for n_weight, n_node in grid[node]:
            if n_node in visited:
                continue
            total = weight + n_weight
            if n_node not in distances or distances[n_node] > total:
                distances[n_node] = total
                heappush(queue, (total, n_node))
                p[n_node] = node
Problem: when calling the Dijkstra function multiple times, heappush is... adding new keys to the grid dictionary for no reason!
Here is a MCVE:
import random
from collections import defaultdict

# Creating the dictionary
grid = defaultdict(list)
N = 4
kernel = (-N-1, -N, -N+1, -1, 1, N-1, N, N+1)
for i in range(N*N):
    for n in kernel:
        if i > N and i < (N*N) - 1 - N and (i%N) > 0 and (i%N) < N - 1:
            grid[i].append((1, i+n))

# Calling Dijkstra multiple times
keys = list(range(N*N))
while keys:
    k1, k2 = random.sample(keys, 2)
    Dijkstra(k1, k2, grid)
    keys.remove(k1)
    keys.remove(k2)
The original grid defaultdict:
dict_keys([5, 6, 9, 10])
...and after calling the Dijkstra function multiple times:
dict_keys([5, 6, 9, 10, 4, 0, 1, 2, 8, 3, 7, 11, 12, 13, 14, 15])
When calling the Dijkstra function multiple times without heappush (just commenting out heappush at the end):
dict_keys([5, 6, 9, 10])
Question:
How can I avoid this strange behavior ?
Please note that I'm using Python 2.7 and can't use numpy.
I could reproduce and fix it. The problem is in the way you are building grid: it contains values that are not among its keys (from -4 to 0 and from 16 to 20 in the example). So you push those nonexistent nodes onto the heap, and later pop them.
You then end up executing for n_weight, n_node in grid[node]: where node does not (yet) exist in grid. As grid is a defaultdict, a new key is automatically inserted with an empty list as its value.
The fix is trivial (at least for the example data): it is enough to ensure that every node appearing as a value in grid also exists as a key, with a modulo:
for i in range(N*N):
    for n in kernel:
        grid[i].append((1, (i+n + N + 1) % (N*N)))
But even for real data it should not be very hard to ensure that all nodes appearing in grid values also exist as keys...
BTW, if grid had been a simple dict the error would have been immediate with a KeyError on grid[node].
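A quick self-contained illustration of the defaultdict behavior described above (my own snippet, not from the answer): a plain lookup of a missing key silently inserts it, while the same lookup on a regular dict raises KeyError.

```python
from collections import defaultdict

grid = defaultdict(list)
grid[0].append((1, 1))

_ = grid[5]       # a read-looking access still inserts key 5
assert 5 in grid  # the key now exists, with an empty list as value

plain = dict(grid)
try:
    plain[99]     # a plain dict fails loudly instead
except KeyError:
    print("KeyError, as expected")
```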
I'm trying to arrange line segments to create a closed polygon with Python. At the moment I've managed to solve it, but it gets really slow when the number of segments increases (it's like a bubble sort, but for the endpoints of segments). I'm attaching a sample file of coordinates (the real ones are really complex, but this one is useful for testing purposes). The file contains the coordinates for the segments of two separate closed polygons. The image below is the result of the coordinates I've attached.
This is my code for joining the segments. The file 'Curve' is in the dropbox link above:
from ast import literal_eval as make_tuple
from random import shuffle
from Curve import Point, Curve, Segment

def loadFile():
    print 'Loading File'
    file = open('myFiles/coordinates.txt', 'r')
    for line in file:
        pairs.append(make_tuple(line))
    file.close()
def sortSegment(segPairs):
    polygons = []
    segments = segPairs
    while len(segments) > 0:
        counter = 0
        closedCurve = Curve(Point(segments[0][0][0], segments[0][0][1]),
                            Point(segments[0][1][0], segments[0][1][1]))
        segments.remove(segments[0])
        still = True
        while still:
            startpnt = Point(segments[counter][0][0], segments[counter][0][1])
            endpnt = Point(segments[counter][1][0], segments[counter][1][1])
            seg = Segment(startpnt, endpnt)
            if closedCurve.isAppendable(seg):
                if closedCurve.isClosed(seg):
                    still = False
                    polygons.append(closedCurve.vertex)
                    segments.remove(segments[counter])
                else:
                    closedCurve.appendSegment(seg)
                    segments.remove(segments[counter])
                    counter = 0
            else:
                counter += 1
                if len(segments) <= counter:
                    counter = 0
    return polygons
def toTupleList(list):
    curveList = []
    for curve in list:
        pointList = []
        for point in curve:
            pointList.append((point.x, point.y))
        curveList.append(pointList)
    return curveList

def convertPolyToPath(polyList):
    path = []
    for curves in polyList:
        curves.insert(1, 'L')
        curves.insert(0, 'M')
        curves.append('z')
        path = path + curves
    return path
if __name__ == '__main__':
    pairs = []
    loadFile()
    polygons = sortSegment(pairs)
    polygons = toTupleList(polygons)
    polygons = convertPolyToPath(polygons)
Assuming that you are only looking for the approach and not the code, here is how I would attempt it.
While you read the segment coordinates from the file, keep adding the coordinates to a dictionary with one coordinate (string form) of the segment as the key and the other coordinate as the value. At the end, it should look like this:
{
'5,-1': '5,-2',
'4,-2': '4,-3',
'5,-2': '4,-2',
...
}
Now pick any key-value pair from this dictionary. Next, pick the key-value pair from the dictionary where the key is same as the value in the previous key-value pair. So if first key-value pair is '5,-1': '5,-2', next look for the key '5,-2' and you will get '5,-2': '4,-2'. Next look for the key '4,-2' and so on.
Keep removing the key-value pairs from the dictionary so that once one polygon is complete, you can check if there are any elements left which means there might be more polygons.
Let me know if you need the code as well.
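The walk described above can be sketched as follows (my own sketch, using coordinate tuples directly as dictionary keys instead of strings; it assumes every vertex joins exactly two segments and that segment directions are consistent, so each point occurs exactly once as a start point):

```python
def chain_polygons(segments):
    # Map each segment's start point to its end point.
    nxt = dict(segments)
    polygons = []
    while nxt:
        start, cur = nxt.popitem()  # pick any remaining segment
        ring = [start]
        while cur != start:
            ring.append(cur)
            cur = nxt.pop(cur)      # follow the chain, consuming pairs
        polygons.append(ring)       # chain returned to start: one polygon done
    return polygons

square = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
print(chain_polygons(square))  # one ring with 4 vertices
```

Each dictionary lookup is O(1), so the whole pass is linear in the number of segments, instead of the quadratic rescanning in the question's code.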
I had to do something similar. I needed to turn coastline segments (that were not ordered properly) into polygons. I used NetworkX to arrange the segments into connected components and order them using this function.
It turns out that my code will work for this example as well. I use geopandas to display the results, but that dependency is optional for the original question here. I also use shapely to turn the lists of segments into polygons, but you could just use CoastLine.rings to get the lists of segments.
I plan to include this code in the next version of PyRiv.
from shapely.geometry import Polygon
import geopandas as gpd
import networkx as nx

class CoastLine(nx.Graph):
    def __init__(self, *args, **kwargs):
        """
        Build a CoastLine object.

        Parameters
        ----------

        Returns
        -------
        A CoastLine object
        """
        self = super(CoastLine, self).__init__(*args, **kwargs)

    @classmethod
    def read_shp(cls, shp_fn):
        """
        Construct a CoastLine object from a shapefile.
        """
        dig = nx.read_shp(shp_fn, simplify=False)
        return cls(dig)

    def connected_subgraphs(self):
        """
        Get the connected component subgraphs. See the NetworkX
        documentation for `connected_component_subgraphs` for more
        information.
        """
        return nx.connected_component_subgraphs(self)

    def rings(self):
        """
        Return a list of rings. Each ring is a list of nodes. Each
        node is a coordinate pair.
        """
        return [list(nx.dfs_preorder_nodes(sg)) for sg in self.connected_subgraphs()]

    def polygons(self):
        """
        Return a list of `shapely.Polygon`s representing each ring.
        """
        return [Polygon(r) for r in self.rings()]

    def poly_geodataframe(self):
        """
        Return a `geopandas.GeoDataFrame` of polygons.
        """
        return gpd.GeoDataFrame({'geometry': self.polygons()})
With this class, the original question can be solved:
edge_list = [
((5, -1), (5, -2)),
((6, -1), (5, -1)),
((1, 0), (1, 1)),
((4, -3), (2, -3)),
((2, -2), (1, -2)),
((9, 0), (9, 1)),
((2, 1), (2, 2)),
((0, -1), (0, 0)),
((5, 0), (6, 0)),
((2, -3), (2, -2)),
((6, 0), (6, -1)),
((4, 1), (5, 1)),
((10, -1), (8, -1)),
((10, 1), (10, -1)),
((2, 2), (4, 2)),
((5, 1), (5, 0)),
((8, -1), (8, 0)),
((9, 1), (10, 1)),
((8, 0), (9, 0)),
((1, -2), (1, -1)),
((1, 1), (2, 1)),
((5, -2), (4, -2)),
((4, 2), (4, 1)),
((4, -2), (4, -3)),
((1, -1), (0, -1)),
((0, 0), (1, 0)) ]
eG = CoastLine()
for e in edge_list:
eG.add_edge(*e)
eG.poly_geodataframe().plot()
This will be the result:
My problem is the following: I am parsing user interactions; each time an interaction is detected, I emit ((user1, user2), ((date1, 0), (0, 1))). The zeros are placeholders marking the direction of the interaction.
I cannot figure out why I cannot reduce this output with the following reduce function:
def myFunc2(x1, x2):
    return (min(x1[0][0], x2[0][0]), max(x1[0][0], x2[0][0]),
            min(x1[0][1], x2[0][1]), max(x1[0][1], x2[0][1]),
            x1[1][0] + x2[1][0], x1[1][1] + x2[1][1])
The output of my mapper (flatMap(myFunc)) is correct:
((7401899, 5678002), ((1403185440.0, 0), (1, 0)))
((82628194, 22251869), ((0, 1403185452.0), (0, 1)))
((2162276, 98056200), ((1403185451.0, 0), (1, 0)))
((0509420, 4827510), ((1403185449.0, 0), (1, 0)))
((7974923, 9235930), ((1403185450.0, 0), (1, 0)))
((250259, 6876774), ((0, 1403185450.0), (0, 1)))
((642369, 6876774), ((0, 1403185450.0), (0, 1)))
((82628194, 22251869), ((0, 1403185452.0), (0, 1)))
((2162276, 98056200), ((1403185451.0, 0), (1, 0)))
But running
lines.flatMap(myFunc) \
.map(lambda x: (x[0], x[1])) \
.reduceByKey(myFunc2)
Gives me the error
return (min(x1[0][0],x2[0][0]),max(x1[0][0],x2[0][0]),min(x1[0][1],x2[0][1]),max(x1[0][1],x2[0][1]),x1[1][0]+x2[1][0],x1[1][1]+x2[1][1])
TypeError: 'int' object has no attribute '__getitem__'
I guess I am messing something up in my keys, but I don't know why (I tried to recast the key to a tuple as suggested here, but I get the same error).
Any ideas? Thanks a lot!
Okay, I think the problem here is that your reducer returns a shallower structure than the values it takes in, so on later merges you end up indexing too deep into items that don't go as deep as you think.
Let's examine myFunc2:

def myFunc2(x1, x2):
    return (min(x1[0][0], x2[0][0]), max(x1[0][0], x2[0][0]),
            min(x1[0][1], x2[0][1]), max(x1[0][1], x2[0][1]),
            x1[1][0] + x2[1][0], x1[1][1] + x2[1][1])

In reduceByKey, x1 and x2 are two values belonging to the same key, not a key and a value. On the first merge both are nested tuples straight from your mapper output, for example:

x1 = ((1403185440.0, 0), (1, 0))
x2 = ((0, 1403185452.0), (0, 1))

Here x1[0] is (1403185440.0, 0) and x1[0][0] is 1403185440.0, so all the indexing in myFunc2 works, and the merge returns a flat 6-tuple.
But reduceByKey folds pairwise: when a key has three or more values, that flat 6-tuple is fed back into myFunc2 on the next merge. Now its first element is just a number, you're trying to take the zeroth index of a plain int, and objects of <type 'int'> don't have a __getitem__ method.
To summarize: a reducer must return a value with the same shape as its inputs, so it can be applied again to its own output. Make myFunc2 return a nested pair of pairs instead of a flat 6-tuple.
When I run the following, where every key appears at most twice (so myFunc2 is applied at most once per key), it runs without error, but note the mixed shapes in the result:
a = sc.parallelize([((7401899, 5678002), ((1403185440.0, 0), (1, 0))),
((82628194, 22251869), ((0, 1403185452.0), (0, 1))),
((2162276, 98056200), ((1403185451.0, 0), (1, 0))),
((1509420, 4827510), ((1403185449.0, 0), (1, 0))),
((7974923, 9235930), ((1403185450.0, 0), (1, 0))),
((250259, 6876774), ((0, 1403185450.0), (0, 1))),
((642369, 6876774), ((0, 1403185450.0), (0, 1))),
((82628194, 22251869), ((0, 1403185452.0), (0, 1))),
((2162276, 98056200), ((1403185451.0, 0), (1, 0)))])
b = a.map(lambda x: (x[0], x[1])).reduceByKey(myFunc2)
b.collect()
[((1509420, 4827510), ((1403185449.0, 0), (1, 0))),
((2162276, 98056200), (1403185451.0, 1403185451.0, 0, 0, 2, 0)),
((7974923, 9235930), ((1403185450.0, 0), (1, 0))),
((7401899, 5678002), ((1403185440.0, 0), (1, 0))),
((642369, 6876774), ((0, 1403185450.0), (0, 1))),
((82628194, 22251869), (0, 0, 1403185452.0, 1403185452.0, 0, 2)),
((250259, 6876774), ((0, 1403185450.0), (0, 1)))]
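For completeness, here is a structure-preserving variant of the reducer (my sketch, under a hypothetical name myFunc2_nested, keeping the original min/max/sum intent, placeholder zeros included). Because it returns the same nested shape it receives, it can be folded over any number of values per key:

```python
from functools import reduce

def myFunc2_nested(x1, x2):
    # Each value keeps the shape ((date_out, date_in), (count_out, count_in)),
    # so the result can be merged again with further values of the same key.
    (d1a, d1b), (c1a, c1b) = x1
    (d2a, d2b), (c2a, c2b) = x2
    return ((min(d1a, d2a), max(d1b, d2b)), (c1a + c2a, c1b + c2b))

values = [((1403185440.0, 0), (1, 0)),
          ((0, 1403185452.0), (0, 1)),
          ((1403185451.0, 0), (1, 0))]

# Three values for one key: the flat-tuple reducer raises TypeError here,
# this one does not.
print(reduce(myFunc2_nested, values))  # ((0, 1403185452.0), (2, 1))
```

With Spark, the same function can be passed directly to reduceByKey in place of myFunc2.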