Exclude specific column in rdd map - python

I have a huge dataset with about 20 columns.
I'm working with rdds in pyspark and need to do something like
rdd.map(lambda x: (x[9], x[:] - x[9]))
Basically, I want to create a key-value pair in which one of the columns is the key and the rest of them are the value. I'm unable to slice it in a way that makes sense.
I've tried:
rdd.map(lambda x: (x[9], x[:] - x[9]))
rdd.map(lambda x: (x[9], x[:8] + x[10:]))
rdd.map(lambda x: (x[9], list(x[:8].append(x[10:]))))
None of it seems to work, and I'm not sure what the right way to do it would be.

I would break the problem into steps.
# First we set it up
data = [(1,2,3,4,5,6,7,8,9,10)] # one row
rdd = spark.sparkContext.parallelize(data)
rdd.collect()
#[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)]
Next we need a function that pops a value from a tuple and makes it a key.
def key_elem_to_rest(key_index, tup):
    l = list(tup)
    key = l.pop(key_index)
    return {key: tuple(l)}
Next, we use it in the map:
rdd.map(lambda x: key_elem_to_rest(0, x)).collect() # index = 0
#[{1: (2, 3, 4, 5, 6, 7, 8, 9, 10)}]
rdd.map(lambda x: key_elem_to_rest(5, x)).collect() # index = 5
#[{6: (1, 2, 3, 4, 5, 7, 8, 9, 10)}]
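PySpark's pair-RDD operations such as reduceByKey and groupByKey expect each element to be a (key, value) 2-tuple, so a one-item dict per row will not be treated as a pair. A minimal sketch of the same pop-based idea that returns a tuple instead; the name key_elem_to_pair is hypothetical, and the snippet runs on a plain tuple so it needs no Spark session:

```python
def key_elem_to_pair(key_index, tup):
    """Pop the element at key_index and return it with the remaining elements as a (key, value) pair."""
    rest = list(tup)          # copy, so the original row is not modified
    key = rest.pop(key_index)
    return key, tuple(rest)

row = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
print(key_elem_to_pair(0, row))  # (1, (2, 3, 4, 5, 6, 7, 8, 9, 10))
print(key_elem_to_pair(5, row))  # (6, (1, 2, 3, 4, 5, 7, 8, 9, 10))
```

In Spark it would be applied the same way as above, e.g. rdd.map(lambda x: key_elem_to_pair(9, x)).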

You can try using this:
rdd.filter(lambda x: x[0] != x[9]).map(lambda x: (x[9], [x[:-1]]))
This keeps only the rows where x[0] differs from x[9], then makes x[9] the key and every column except the last one the value.

I finally figured it out myself.
units_rdd1 = units_rdd.map(lambda x: (x[9], list(x[0:9]+x[10:])))
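The earlier attempt x[:8] + x[10:] drops column 8 as well as column 9, because the end index of a slice is exclusive; x[:9] + x[10:] removes only column 9. A small sketch generalising this fix to any key index; split_key is a hypothetical helper name and the 20-column row is made-up data:

```python
def split_key(row, key_index):
    """Return (key, rest): the column at key_index and all other columns, in order."""
    return row[key_index], row[:key_index] + row[key_index + 1:]

row = tuple(range(20))  # hypothetical 20-column row: (0, 1, ..., 19)

# The attempt x[:8] + x[10:] skips index 8 as well as index 9.
print(len(row[:8] + row[10:]))  # 18
key, rest = split_key(row, 9)
print(key, len(rest))  # 9 19
```

In the RDD this would be units_rdd.map(lambda x: split_key(x, 9)); the value stays a tuple unless it is wrapped in list() as in the line above.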

Related

How to remove all zeros from a list

I want to remove all zeros from the list after sorting in descending order.
for x in range (1,count):
exec("col"+str(x) + "=[]")
with open (xvg_input, 'r') as num:
line_to_end = num.readlines()
for line in line_to_end:
if "#" not in line and "#" not in line:
line=list(map(float,line.split()))
for x in range (2,count):
exec("col" +str (x)+ ".append(line["+ str(x-1) + "])")
exec("col" +str(x) + ".sort(reverse = True)")
exec("while (col"+str(x) + ".count(0.000)):")
exec("col" +str(x) +".remove(0.000)")
I am getting a syntax error, and I can't see where I'm going wrong. I just want to sort each column in descending order and delete all the zeros.
Does this make sense?
def remove_values(the_list, val):
    return [value for value in the_list if value != val]
x = [1, 0, 3, 4, 0, 0, 3]
x = remove_values(x, 0)
print(x)
# [1, 3, 4, 3]
Try using the filter function:
a = [9,8,7,6,5,4,3,2,1,0,0,0,0,0,0]
list(filter(lambda x: x != 0, a)) # keeps only the items for which the lambda returns True
# [9, 8, 7, 6, 5, 4, 3, 2, 1]
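The SyntaxError itself comes from exec: each exec call compiles its string as a complete piece of code, so exec("while (col2.count(0.000)):") is a while statement with no body. A minimal sketch that avoids exec by keeping the columns in a dict of lists instead of variables named col2, col3, and so on. The function name is hypothetical, and it assumes you want every column after the first (the question's count limit is left out):

```python
def sorted_nonzero_columns(lines):
    """Group the columns after the first one into lists (replacing col2, col3, ...),
    then sort each list in descending order with the zeros removed."""
    columns = {}  # column number -> values, numbered like the question's colN variables
    for line in lines:
        if "#" in line or not line.strip():
            continue  # skip comment lines (as the question does) and blank lines
        values = [float(v) for v in line.split()]
        # col2 holds line[1], col3 holds line[2], and so on
        for number, value in enumerate(values[1:], start=2):
            columns.setdefault(number, []).append(value)
    return {number: sorted((v for v in vals if v != 0), reverse=True)
            for number, vals in columns.items()}

sample = ["# comment", "0 1.0 0.0", "1 0.0 3.0", "2 2.0 0.5"]
print(sorted_nonzero_columns(sample))  # {2: [2.0, 1.0], 3: [3.0, 0.5]}
```

For the real file, the open file object can be passed directly, since iterating it yields lines: with open(xvg_input) as f: columns = sorted_nonzero_columns(f).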

Filtering where only one condition is true

I'm trying to write code that uses only lambdas, filter, map and reduce (it's a riddle). It accepts a tuple of integers and a tuple of functions, and returns a new tuple of the integers for which exactly one of the functions returns True.
For example, if the tuple is (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) and the tuple of functions is (lambda x: x > 3, lambda x: x % 2 == 0), I should get the new tuple (2, 5, 7, 9), because exactly one of the two rules returns True for each of them. This is my code so far, and I have no idea how to do that:
func = (lambda x: x > 3, lambda x: x % 2 == 0)
data = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
a = lambda func, data: tuple(filter(lambda x: tuple(filter(None, map(lambda f: f(x), func))), data))
print(a(func, data))
This code returns the integers that satisfy at least one of the conditions (including those that satisfy both), but I need the ones that satisfy exactly one.
Here it is:
a = lambda func, data: tuple(filter(lambda d: (sum(map(lambda f: f(d), func)) == 1), data))
Since this really is just a contrived exercise, I won't attempt to explain this horrible expression. But the way to get to it is to first write the code clearly with for loops, if blocks, etc., and then replace each component, one step at a time, with the appropriate map, filter, etc.
The one trick I use here is the automatic conversion of booleans to int: sum(sequence_of_bool) == 1 means exactly one bool is True.
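Following that advice, a sketch of the two stages for this problem: first a plain loop version, then the same logic with the inner loop turned into sum(map(...)) and the outer loop plus if turned into filter(...). The names exactly_one_loop and exactly_one are illustrative:

```python
func = (lambda x: x > 3, lambda x: x % 2 == 0)
data = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Step 1: the clear version with for loops and if blocks
def exactly_one_loop(funcs, values):
    result = []
    for d in values:
        true_count = 0
        for f in funcs:
            if f(d):
                true_count += 1
        if true_count == 1:
            result.append(d)
    return tuple(result)

# Step 2: the inner loop becomes sum(map(...)), the outer loop and if become filter(...)
exactly_one = lambda funcs, values: tuple(
    filter(lambda d: sum(map(lambda f: f(d), funcs)) == 1, values))

print(exactly_one_loop(func, data))  # (2, 5, 7, 9)
print(exactly_one(func, data))       # (2, 5, 7, 9)
```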
My full test code:
func = (lambda x: x > 3, lambda x: x % 2 == 0)
data = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
a = lambda func, data: tuple(filter(lambda d: (sum(map(lambda f: f(d), func)) == 1), data))
print(a(func, data))
(2, 5, 7, 9)
You can use bool.__xor__ to ensure that only one of the two functions in the func tuple is satisfied:
from functools import reduce
tuple(filter(lambda x: reduce(bool.__xor__, map(lambda f: f(x), func)), data))
This returns:
(2, 5, 7, 9)
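One caveat: reducing with XOR computes parity, so it is True whenever an odd number of the functions return True. That matches "exactly one" only when there are two functions. A sketch with a hypothetical third rule (x > 5) shows where the XOR and sum approaches diverge:

```python
from functools import reduce

data = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# hypothetical third rule, added only for illustration
func3 = (lambda x: x > 3, lambda x: x % 2 == 0, lambda x: x > 5)

xor_result = tuple(filter(lambda x: reduce(bool.__xor__, map(lambda f: f(x), func3)), data))
sum_result = tuple(filter(lambda x: sum(map(lambda f: f(x), func3)) == 1, data))

print(xor_result)  # (2, 5, 6, 8, 10): an odd number of rules are True
print(sum_result)  # (2, 5): exactly one rule is True
```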

Create a multiset from a Set X

In a multiset, an element may appear more than once.
For example, if X (a normal set) = {0,2,4,7,10}, then ∆X (the multiset) = {2,2,3,3,4,5,6,7,8,10}.
∆X denotes the multiset of all $\binom{N}{2}$ pairwise distances between points in X, where N is the number of points in X.
How can I write this in Python?
I have created a list X, but I don't know how to put all the differences into another list and sort them.
I hope you can help me.
It is basically just one line.
import itertools
s = {0,2,4,7,10}
sorted([abs(a-b) for (a,b) in itertools.combinations(s,2)])
You can use itertools:
import itertools
s = {0,2,4,7,10}
k = itertools.combinations(s,2)
distance = []
l = list(k)
for p in l:
    distance.append(abs(p[1]-p[0]))
print(sorted(distance))
A simple way is to convert your set to a list, sort it, and then use a double for loop to compute the differences:
X = {0,2,4,7,10} # original set
sorted_X = sorted(list(X))
diffs = []
for i, a in enumerate(sorted_X):
    for j, b in enumerate(sorted_X):
        if j > i:
            diffs.append(b-a)
print(diffs)
#[2, 4, 7, 10, 2, 5, 8, 3, 6, 3]
And if you want the diffs sorted as well:
print(sorted(diffs))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
Another option that would work in this case is to use itertools.product:
from itertools import product
print(sorted([(y-x) for x,y in product(sorted_X, sorted_X) if y>x]))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
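Since the question asks for a multiset, collections.Counter (which maps each value to its multiplicity) is the closest built-in equivalent in Python. A sketch, assuming Python 3.8+ for math.comb, that also checks that the total multiplicity equals N choose 2; distance_multiset is a hypothetical name:

```python
from collections import Counter
from itertools import combinations
from math import comb  # Python 3.8+

def distance_multiset(points):
    """Return the pairwise distances as a Counter mapping distance -> multiplicity."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))

X = {0, 2, 4, 7, 10}
delta_X = distance_multiset(X)

# the multiplicities add up to N choose 2 (here 5 choose 2 = 10)
assert sum(delta_X.values()) == comb(len(X), 2)
print(sorted(delta_X.elements()))  # [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
```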

Different ways to access tuples created with zip built-in function

I want to sort a tuple of integers in decreasing order, and then save the tuple of indices, in their sorted order, in a set. The following piece of code does the job:
my_set = set()
l = (1, 3, 6, 10, 15, 21)
my_set.add(list(zip(*sorted(enumerate(l), key=lambda x: x[1], reverse=True)))[0])
If I evaluate my_set I now have {(5, 4, 3, 2, 1, 0)}
I was trying to do the same with the following code (i for indices, v for values):
my_set.add(i for i, v in zip(*sorted(enumerate(l), key=lambda x: x[1], reverse=True)))
It doesn't work in the same way. The set becomes { <generator object <genexpr> at 0x7f82e49ce360>}
Why is it that if I feed the zip result into a list I can access the tuples inside but I can't use the other syntax?
Is there an alternative way of obtaining some tuple created by zip without having to feed into a list and then indexing into it?
Look at the output that zip returns.
>>> list(zip(*sorted(enumerate(l), key=lambda x: x[1], reverse=True)))
[(5, 4, 3, 2, 1, 0), (21, 15, 10, 6, 3, 1)]
It's always going to be a list of 2 tuples - one tuple being the argsorted indices, and the other being the actual sorted items.
At first you don't notice this, because you're adding the generator object itself to the set (generators are hashable)... but when you decide to exhaust the generator, be prepared for a
ValueError: too many values to unpack (expected 2)
In summary, you're iterating over zip incorrectly. You don't even need to iterate over zip if it's just the first tuple you're interested in.
What you should do instead is use next:
>>> my_set.add(next(zip(*sorted(enumerate(l), key=lambda x: x[1], reverse=True))))
>>> my_set
{(5, 4, 3, 2, 1, 0)}
Which gets you just the first tuple.
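For the second question (getting a tuple out of zip without list(...)[0]), two other options: unpack the two tuples that zip(*pairs) produces, or sort the indices by the value they point to. A sketch, assuming l is non-empty (with an empty l, zip produces nothing and the two-name unpacking raises ValueError):

```python
l = (1, 3, 6, 10, 15, 21)
pairs = sorted(enumerate(l), key=lambda x: x[1], reverse=True)

# Unpacking: zip(*pairs) yields exactly two tuples (indices, values),
# so they can be bound to two names directly.
indices, values = zip(*pairs)

# Or skip enumerate/zip entirely: sort the indices by the value each one points to.
argsorted = tuple(sorted(range(len(l)), key=l.__getitem__, reverse=True))

my_set = {indices}
print(indices)    # (5, 4, 3, 2, 1, 0)
print(values)     # (21, 15, 10, 6, 3, 1)
print(argsorted)  # (5, 4, 3, 2, 1, 0)
```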

In Python how can I change the values in a list to meet certain criteria

In Python, I have several lists that look like variations of:
[X,1,2,3,4,5,6,7,8,9,X,11,12,13,14,15,16,17,18,19,20]
[X,1,2,3,4,5,6,7,8,9,10,X,12,13,14,15,16,17,18,19,20]
[0,X,2,3,4,5,6,7,8,9,10,11,X,13,14,15,16,17,18,19,20]
The X can fall anywhere. There are criteria where I put an X, but it's not important for this example. The numbers are always contiguous around/through the X.
I need to renumber these lists to meet a certain criterion - once there is an X, the numbers need to reset to zero. Each X == a reset. Each X needs to become a zero, and counting resumes from there to the next X. Results I'd want:
[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,10]
[0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9]
Seems like a list comprehension of some type or a generator could help me here, but I can't get it right.
I'm new and learning - your patience and kindness are appreciated. :-)
EDIT: I'm getting pummeled with downvotes, like I've reposted on reddit or something. I want to be a good citizen - what is getting me down arrows? I didn't show code? Unclear question? Help me be better. Thanks!
Assuming the existing values don't matter, this would work:
def fixList(inputList, splitChar='X'):
    outputList = inputList[:]
    x = None
    for i in range(len(outputList)):
        if outputList[i] == splitChar:
            outputList[i] = x = 0
        elif x is None:
            continue  # leave values before the first marker unchanged
        else:
            outputList[i] = x
        x += 1
    return outputList
For example:
>>> a = ['X',1,2,3,4,5,6,7,8,9,'X',11,12,13,14,15,16,17,18,19,20]
>>> fixList(a)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = ['y',1,2,3,4,5,6,7,8,9,10,'y',12,13,14,15,16,17,18,19,20]
>>> fixList(b, splitChar='y')
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
EDIT: fixed to account for the instances where list does not start with either X or 0,1,2,...
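Since the question mentions a generator, here is a minimal sketch of the same logic as fixList written as one; renumber is a hypothetical name, and values before the first marker are passed through unchanged, as in the edited fixList:

```python
def renumber(items, marker="X"):
    """Yield items with each marker turned into 0 and counting resumed after it."""
    count = None  # stays None until the first marker is seen
    for item in items:
        if item == marker:
            count = 0
        elif count is not None:
            count += 1
        # before the first marker, pass the original value through unchanged
        yield item if count is None else count

a = ['X', 1, 2, 3, 4, 5, 6, 7, 8, 9, 'X', 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
print(list(renumber(a)))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```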
Using the string 'X' as X and the_list as the list:
[0 if i == 'X' else i for i in the_list]
This replaces each 'X' with 0 and leaves the other values unchanged.
