RDD creation and variable binding

I have a very simple code:
def fun(x, n):
    return (x, n)

rdds = []
for i in range(2):
    rdd = sc.parallelize(range(5*i, 5*(i+1)))
    rdd = rdd.map(lambda x: fun(x, i))
    rdds.append(rdd)
a = sc.union(rdds)
print a.collect()
I had expected the output to be the following:
[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
However, the output is the following:
[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
This is bewildering, to say the least.
It seems, due to lazy evaluation of RDDs, the value of i that is being used to create RDDs is the one it bears when collect() is called, which is 1 (from the last run of the for loop).
Now, both elements of the tuple are derived from i.
But it seems that for the first element of the tuple, i bears the values 0 and 1, while for the second element of the tuple, i bears only the value 1.
Can somebody please explain what's happening?
Thanks.

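Just change
rdd = rdd.map(lambda x: fun(x, i))
to
rdd = rdd.map(lambda x, i=i: (x, i))
This is Python behavior rather than anything specific to Spark: a lambda looks up i when it is called, whereas a default argument value is evaluated once, when the lambda is defined. See
https://docs.python.org/2.7/tutorial/controlflow.html#default-argument-values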
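The difference between the two lambdas can be reproduced without Spark. The following is a minimal sketch in plain Python; the function name binding_demo and the argument 10 are purely illustrative and do not come from the question.

```python
def binding_demo():
    late = []   # lambdas that look up i when they are called
    bound = []  # lambdas that capture i's current value as a default argument
    for i in range(2):
        late.append(lambda x: (x, i))
        bound.append(lambda x, i=i: (x, i))
    # The loop has finished, so i is 1 for every late-binding lambda.
    return [f(10) for f in late], [f(10) for f in bound]

late_result, bound_result = binding_demo()
print(late_result)   # [(10, 1), (10, 1)]
print(bound_result)  # [(10, 0), (10, 1)]
```

The default argument works because its value is stored on the function object at definition time, so later changes to the loop variable no longer affect it.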

sc.parallelize() receives its data eagerly: the argument range(5*i, 5*(i+1)) is evaluated as soon as the call is made, so both values of i (0 and 1) are used.
In the case of rdd.map(), however, only the last value of i is used, because the lambda does not run until you call collect() later.
rdd = sc.parallelize(range(5*i, 5*(i+1)))
rdd = rdd.map(lambda x: fun(x, i))
Here rdd.map() won't transform the RDD; it just adds a step to the DAG (directed acyclic graph), i.e. the lambda function is not yet applied to the elements of the RDD.
When you call collect(), the lambda function is finally called, but by that time i has the value 1. If you reassign i = 10 before calling collect(), that value of i will be used instead.
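The same effect can be imitated without a Spark cluster. The sketch below uses Python 3's built-in map(), which is also lazy, as a stand-in for rdd.map(); it is only an analogy and not PySpark code, and the name lazy_map_demo is made up for illustration.

```python
def lazy_map_demo():
    i = 0
    # Python 3's built-in map() is lazy: the lambda is not called here.
    pending = map(lambda x: (x, i), range(3))
    i = 10  # rebind i before the results are consumed
    # The lambda runs only now, and it sees the current value of i.
    return list(pending)

print(lazy_map_demo())  # [(0, 10), (1, 10), (2, 10)]
```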

Related

Find common union groups among tuples in a set

I need help writing a function that:
takes a set of tuples as input
returns the number of groups of tuples, where tuples that share a number belong to the same group
Example 1:
# input:
{(0, 1), (3, 4), (0, 0), (1, 1), (3, 3), (2, 2), (1, 0)}
# expected output: 3
The expected output is 3, because:
(3,4) and (3,3) contain common numbers, so this counts as 1
(0, 1), (0, 0), (1, 1), and (1, 0) all count as 1
(2, 2) counts as 1
So, 1+1+1 = 3
Example 2:
# input:
{(0, 1), (2, 1), (0, 0), (1, 1), (0, 3), (2, 0), (0, 2), (1, 0), (1, 3)}
# expected output: 1
The expected output is 1, because all tuples are related to other tuples by containing numbers in common.

This may not be the most efficient algorithm for it, but it is simple and looks nice.
from functools import reduce

def unisets(iterables):
    def merge(fsets, fs):
        if not fs: return fsets
        unis = set(filter(fs.intersection, fsets))
        return {reduce(type(fs).union, unis, fs), *fsets-unis}
    return reduce(merge, map(frozenset, iterables), set())

us = unisets({(0,1), (3,4), (0,0), (1,1), (3,3), (2,2), (1,0)})
print(us) # {frozenset({3, 4}), frozenset({0, 1}), frozenset({2})}
print(len(us)) # 3
Features:
Input can be any kind of iterable, whose elements are iterables (any length, mixed types...)
Output is always a well-behaved set of frozensets.

This code works for me, but check it; there may be edge cases.
How about this solution?
def count_groups(marked):
    temp = set(marked)
    save = set()
    for pair in temp:
        if pair[1] in save or pair[0] in save:
            marked.remove(pair)
        else:
            save.add(pair[1])
            save.add(pair[0])
    return len(marked)
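A single greedy pass like the one above may, depending on iteration order, count a chain such as (0, 1), (2, 3), (1, 2) as two groups, because the tuple that links the first two is seen last. A disjoint-set (union-find) structure does not depend on the order. The following is a minimal sketch of that technique, assuming every tuple is a pair; the name count_union_groups is hypothetical and does not appear in the answers.

```python
def count_union_groups(pairs):
    """Count groups of pairs that are linked by shared numbers."""
    parent = {}

    def find(x):
        # Register unseen numbers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    # Each distinct root corresponds to one group.
    return len({find(x) for x in list(parent)})

print(count_union_groups({(0, 1), (3, 4), (0, 0), (1, 1), (3, 3), (2, 2), (1, 0)}))  # 3
print(count_union_groups([(0, 1), (2, 3), (1, 2)]))  # 1
```

Unlike the version above, this sketch does not modify its input.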

Paths in Python/Sage

I've been working on this problem (https://imgur.com/a/nJEMfM9) asking me to plot all lattice paths in a nxn grid for the last week, and I have no idea how to proceed.
This is about as far as I've been able to get
def NE_lattice_paths(x,y):
    Vn= vector([0,1])
    Ve= vector([1,0])
    plot(Vn) + plot(Ve, start=Vn)
I know I have to use vectors, and I have to use the "def" command to make a function, but how would I make a function that can plot every path and know to take a different one each time? What I wrote doesn't really make sense, but I could use some guidance on how to proceed. Thank you!

You can enumerate the grid points with a nested for loop (or a list comprehension).
Note that this groups the points by x-coordinate; it does not yet enumerate the individual north/east paths.
def NE_lattice_paths(x,y):
    paths = []
    for i in range(x):
        path = []
        for j in range(y):
            path.append((i,j))
        paths.append(path)
    return paths

result = NE_lattice_paths(5,3)
print(result)
Result:
[[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)], [(3, 0), (3, 1), (3, 2)], [(4, 0), (4, 1), (4, 2)]]
I will leave the animation as an exercise for the OP...
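If a lattice path is taken to mean a sequence of unit north and east steps from (0, 0) to (n, n), the paths can be enumerated by choosing which n of the 2n steps go east. The following is a minimal sketch under that assumption, written in plain Python rather than Sage; the function name ne_lattice_paths is hypothetical, and plotting is left out.

```python
from itertools import combinations

def ne_lattice_paths(n):
    """Return every north/east path from (0, 0) to (n, n) as a list of points."""
    paths = []
    # Each path has 2n steps; pick the positions of the n east steps.
    for east_steps in combinations(range(2 * n), n):
        east = set(east_steps)
        x = y = 0
        path = [(0, 0)]
        for step in range(2 * n):
            if step in east:
                x += 1  # step east
            else:
                y += 1  # step north
            path.append((x, y))
        paths.append(path)
    return paths

paths = ne_lattice_paths(2)
print(len(paths))  # 6
print(paths[0])    # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

Each returned list of points could then be passed to a plotting routine one path at a time.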

Getting the correct max value from a list of tuples

My list of tuples look like this:
[(0, 0), (3, 0), (3, 3), (0, 3), (0, 0), (0, 6), (3, 6), (3, 9), (0, 9), (0, 6), (6, 0), (9, 0), (9, 3), (6, 3), (6, 0), (0, 3), (3, 3), (3, 6), (0, 6), (0, 3)]
It has the format of (X, Y) where I want to get the max and min of all Xs and Ys in this list.
It should be min(X)=0, max(X)=9, min(Y)=0, max(Y)=9
However, when I do this:
min(listoftuples)[0], max(listoftuples)[0]
min(listoftuples)[1], max(listoftuples)[1]
...for the Y values, the maximum value shown is 3 which is incorrect.
Why is that?

"...for the Y values, the maximum value shown is 3"
That is because max(listoftuples) returns the tuple (9, 3), so max(listoftuples)[0] is 9 and max(listoftuples)[1] is 3.
Tuples are compared lexicographically: by the value at the first index, then, in case of a tie, by the value at the second index, and so on.
If you want to find the tuple with the maximum value at the second index, you need to use a key function:
from operator import itemgetter
li = [(0, 0), (3, 0), ... ]
print(max(li, key=itemgetter(1)))
# or max(li, key=lambda t: t[1])
outputs
(3, 9)
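Applied to the full list from the question, the key function gives all four values. This is a small sketch; the variable name points is chosen here for illustration.

```python
from operator import itemgetter

points = [(0, 0), (3, 0), (3, 3), (0, 3), (0, 0), (0, 6), (3, 6), (3, 9),
          (0, 9), (0, 6), (6, 0), (9, 0), (9, 3), (6, 3), (6, 0), (0, 3),
          (3, 3), (3, 6), (0, 6), (0, 3)]

# Without a key, tuples are compared element by element, first index first.
print(max(points))  # (9, 3)

# With a key, only the chosen coordinate is compared.
x_min = min(points, key=itemgetter(0))[0]
x_max = max(points, key=itemgetter(0))[0]
y_min = min(points, key=itemgetter(1))[1]
y_max = max(points, key=itemgetter(1))[1]
print(x_min, x_max, y_min, y_max)  # 0 9 0 9
```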

Here is a simple way to do it using list comprehensions, where arr is your list of tuples:
min([x for x, y in arr])
max([x for x, y in arr])
min([y for x, y in arr])
max([y for x, y in arr])
In this code, I have used list comprehensions to build lists of all X values and all Y values, and then taken the min/max of each list. This produces your desired answer.
The first two lines are for the X values and the last two lines are for the Y values.

Tuples are ordered by their first value, then in case of a tie, by their second value (and so on). That means max(listoftuples) is (9, 3). See How does tuple comparison work in Python?
So to find the highest y-value, you have to look specifically at the second elements of the tuples. One way you could do that is by splitting the list into x-values and y-values, like this:
xs, ys = zip(*listoftuples)
Or if you find that confusing, you could use this instead, which is roughly equivalent:
xs, ys = ([t[i] for t in listoftuples] for i in range(2))
Then get each of their mins and maxes, like this:
x_min_max, y_min_max = [(min(L), max(L)) for L in (xs, ys)]
print(x_min_max, y_min_max) # -> (0, 9) (0, 9)
Another way is to use NumPy to treat listoftuples as a matrix.
import numpy as np
a = np.array(listoftuples)
x_min_max, y_min_max = [(min(column), max(column)) for column in a.T]
print(x_min_max, y_min_max) # -> (0, 9) (0, 9)
(There's probably a more idiomatic way to do this, but I'm not super familiar with NumPy.)

How can I add a random binary info into current 'coordinate'? (Python)

This is part of the code I'm working on: (Using Python)
import random

pairs = [
    (0, 1),
    (1, 2),
    (2, 3),
    (3, 0),  # I want to treat 0,1,2,3 as some 'coordinate' (or positional information)
]
alphas = [(random.choice([1, -1]) * random.uniform(5, 15), pairs[n]) for n in range(4)]
alphas.sort(reverse=True, key=lambda n: abs(n[0]))
A sample output looks like this:
[(13.747649802587832, (2, 3)),
(13.668274782626717, (1, 2)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (3, 0))]
Now I'm wondering whether there is a way to give each of the coordinates 0, 1, 2, 3 a random binary number. For example, if [0,1,2,3] maps to [0,1,1,0] (that is, each coordinate in the left list gets the random binary number at the same position in the right list, so coordinate 0 gets '0', coordinate 1 gets '1', and so on), then the desired output using the information above looks like:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]
Thanks!!

One way using dict:
d = dict(zip([0,1,2,3], [0,1,1,0]))
[(i, tuple(d[j] for j in c)) for i, c in alphas]
Output:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]

You can create a function to convert your numbers to the random binary values assigned to them. Using a dictionary within this function makes sense. Something like this should work, where output1 is the first sample output you provide and binary_code would be [0, 1, 1, 0] in your example:
import numpy as np

def convert2bin(original, binary_code):
    # Map each coordinate (its position in binary_code) to its binary value.
    binary_dict = {n: x for n, x in enumerate(binary_code)}
    return tuple(binary_dict[x] for x in original)

binary_code = np.random.randint(2, size=4)
[convert2bin(x[1], binary_code) for x in output1]
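For a version that needs only the standard library and keeps the float next to each converted pair, something along these lines might work. It is a sketch: assign_random_bits and its parameters are hypothetical names, and the sample alphas list is copied from the question.

```python
import random

def assign_random_bits(alphas, coordinates=(0, 1, 2, 3), rng=random):
    """Give each coordinate a random bit and relabel every pair with those bits."""
    bits = {c: rng.choice([0, 1]) for c in coordinates}
    relabelled = [(value, tuple(bits[c] for c in pair)) for value, pair in alphas]
    return bits, relabelled

alphas = [(13.747649802587832, (2, 3)),
          (13.668274782626717, (1, 2)),
          (-9.105374057105703, (0, 1)),
          (-8.267840318934667, (3, 0))]

bits, relabelled = assign_random_bits(alphas)
print(bits)        # the random bit chosen for each coordinate
print(relabelled)  # same floats, with coordinates replaced by their bits
```

Returning the bits dictionary alongside the result makes it possible to see which coordinate received which value.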

Issue with python recursion

I have the following code, written in Python 2.7, to find the n-fold Cartesian product of a set (AxAxA...xA):
prod=[]
def cartesian_product(set1,set2,n):
    if n>=1:
        for x in set1:
            for y in set2:
                prod.append('%s,%s'%(x,y))
        #prod='[%s]' % ', '.join(map(str, prod))
        #print prod
        cartesian_product(set1,prod,n-1)
    else:
        print prod

n=raw_input("Number of times to roll: ")
events=["1","2","3","4","5","6"]
cartesian_product(events,events,1)
This works properly when n=1. But changing the call from cartesian_product(events,events,1) to cartesian_product(events,events,2) doesn't work. It seems that an infinite loop is running. I can't figure out where exactly I'm making a mistake.

When you pass the reference to the global variable prod to the recursive call, you are modifying the list that set2 also references. This means that set2 is growing as you iterate over it, meaning the iterator never reaches the end.
You don't need a global variable here. Return the computed product instead.
def cartesian_product(set1, n):
    # Return a set of n-tuples
    rv = set()
    if n == 0:
        # Degenerate case: A^0 == the set containing the empty tuple
        rv.add(())
    else:
        for x in set1:
            for y in cartesian_product(set1, n-1):
                rv.add((x,) + y)
    return rv
If you want to preserve the order of the original argument, use rv = [] and rv.append instead.
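If writing the recursion by hand is not a requirement, the standard library's itertools.product with its repeat argument computes the same n-fold product and preserves the input order. A short sketch; cartesian_power is a hypothetical wrapper name.

```python
from itertools import product

def cartesian_power(items, n):
    """Return the n-fold Cartesian product of items as a list of n-tuples."""
    return list(product(items, repeat=n))

rolls = cartesian_power([1, 2, 3, 4, 5, 6], 2)
print(len(rolls))  # 36
print(rolls[:3])   # [(1, 1), (1, 2), (1, 3)]
print(cartesian_power([1, 2, 3], 0))  # [()]
```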

def cartesian_product(*X):
    if len(X) == 1: #special case, only X1
        return [ (x0, ) for x0 in X[0] ]
    else:
        return [ (x0,)+t1 for x0 in X[0] for t1 in cartesian_product(*X[1:]) ]

n=int(raw_input("Number of times to roll: "))
events=[1,2,3,4,5,6]
prod=[]
for arg in range(n+1):
    prod.append(events)
print cartesian_product(*prod)
Output:
Number of times to roll: 1
[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]
You can also pass strings in your events list, in which case the tuples will contain strings as well.

Inside the recursive call cartesian_product(set1,prod,n-1) you are passing the list prod and then appending values to it again, so it keeps growing while you iterate over it and the inner loop never terminates. You will need to change your implementation so that the function does not append to the list it is iterating over.
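The growing-list effect is easy to reproduce in isolation. The sketch below appends to a list while iterating over it and stops after a fixed number of steps so that it terminates; the function name and the max_steps cap are additions for illustration only.

```python
def iterate_while_appending(max_steps=10):
    items = [1, 2, 3]
    steps = 0
    for item in items:
        # Every iteration adds one element, so the iterator never reaches the end.
        items.append(item)
        steps += 1
        if steps >= max_steps:
            break  # without this cap the loop would run forever
    return steps, len(items)

print(iterate_while_appending())  # (10, 13)
```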
