Suppose we have the following set, S, and the value v:
S = {(0,1),(2,3),(4,5)}
v = 3
I want to test if v is the second element of any of the pairs within the set. My current approach is:
for _, y in S:
    if y == v:
        return True
return False
I don't really like this, as I have to put it in a separate function and something is telling me there's probably a nicer way to do it. Can anyone shed some light?
The any function is tailor-made for this:
any( y == v for (_, y) in S )
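With the question's data:
>>> S = {(0, 1), (2, 3), (4, 5)}
>>> v = 3
>>> any(y == v for _, y in S)
True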
If you have a large set that doesn't change often, you might want to project the y values onto a set.
yy = set( y for (_, y) in S )
v in yy
Of course, this is only of benefit if you compute yy once after S changes, not before every membership test.
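A minimal sketch of that caching pattern:
>>> S = {(0, 1), (2, 3), (4, 5)}
>>> yy = {y for _, y in S}  # rebuild only when S changes
>>> 3 in yy
True
>>> S.add((6, 7))
>>> yy = {y for _, y in S}  # refresh after the mutation
>>> 7 in yy
True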
You can't do an O(1) lookup on the second element of a tuple, so you don't get much benefit from having a set. You might consider building a second set, especially if you'll be doing lots of lookups.
S = {(0,1), (2,3), (4,5)}
T = {x[1] for x in S}
v = 3
if v in T:
    # do something
Trivial answer is any (see Marcelo's answer).
An alternative is zip:
>>> zip(*S)
[(4, 0, 2), (5, 1, 3)]
>>> v in zip(*S)[1]
True
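Note that in Python 3, zip returns an iterator, so the subscript above would fail; materialize it first:
>>> v in list(zip(*S))[1]
True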
I currently have the code below, which works fine:
Can someone help me solve the collision created by having two keys with the same number in the dictionary?
I tried multiple approaches (not listed here) to create an array to handle it, but so far they have been unsuccessful.
I am using Python 3.7.
def find_key(dic1, n):
    '''
    Return the key for the value n
    from the dict below.
    '''
    d = {}
    for x, y in dic1.items():
        # swap keys and values
        # and update the result to 'd'
        d[y] = x
    try:
        if n in d:
            return d[n]
    except Exception as e:
        return e
dic1 = {'james': 2, 'david': 3}
# Test case that creates the 'collision':
# comment out 'dic1' above and replace it with
# the dic1 below
# dic1 = {'james':2, 'david':3, 'sandra':3}
n = 3
print(find_key(dic1, n))
Any help would be much appreciated.
You know there should be multiple returns, so plan for that in advance.
def find_keys_for_value(d, value):
    for k, v in d.items():
        if v == value:
            yield k

data = {'james': 2, 'david': 3, 'sandra': 3}
for result in find_keys_for_value(data, 3):
    print(result)
You can use a defaultdict:
from collections import defaultdict
def find_key(dct, n):
    dd = defaultdict(list)
    for x, y in dct.items():
        dd[y].append(x)
    return dd[n]
dic1 = {'james':2, 'david':3, 'sandra':3}
print(find_key(dic1, 3))
print(find_key(dic1, 2))
print(find_key(dic1, 1))
Output:
['david', 'sandra']
['james']
[]
Building a defaultdict from all keys and values is only justified if you will repeatedly search for keys of the same dict given different values, though. Otherwise, the approach of Kenny Ostrom is preferable. In any case, the function above builds the full index and then throws it away after a single lookup, which wastes that work.
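If that repeated-lookup case applies, you can build the inverse index once and query it many times (a sketch; the function name is illustrative):
from collections import defaultdict

def build_inverse_index(dct):
    # one pass over the dict; reuse the result for every later lookup
    inv = defaultdict(list)
    for x, y in dct.items():
        inv[y].append(x)
    return inv

inv = build_inverse_index({'james': 2, 'david': 3, 'sandra': 3})
print(inv[3])  # ['david', 'sandra']
print(inv[2])  # ['james']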
If you are not at ease with generators and yield, here is the approach of Kenny Ostrom translated to lists (less efficient than generators, better than the above for one-shot searches):
def find_key(dct, n):
    return [x for x, y in dct.items() if y == n]
The output is the same as above.
Heyo everyone, I have a question.
I have three variables, rF, tF, and dF.
Now these values can range from -100 to +100. I want to check all of them and see if they are less than 1; if they are, set them to 1.
An easy way of doing this is just 3 if statements, like
if rF < 1:
    rF = 1
if tF < 1:
    tF = 1
if dF < 1:
    dF = 1
However, as you can see, this looks bad, and if I had, say, 50 of these values, it could get out of hand quite easily.
I tried to put them in an array like so:
for item in [rF, tF, dF]:
    if item < 1:
        item = 1
However, this doesn't work. I believe that when you do that, you create a completely different object (the list), and reassigning item only rebinds the loop variable; it changes neither the list nor the original variables.
So my question is: What is an elegant way of doing this?
Why not use a dictionary, if you've only got three variables to keep track of?
rF, tF, dF = 100, -100, 1
d = {'rF': rF, 'tF': tF, 'dF': dF}
for k in d:
    if d[k] < 1:
        d[k] = 1
print(d)
{'rF': 100, 'tF': 1, 'dF': 1}
Then if you're referencing any of those values later, you can simply do this (as a trivial example):
def f(var):
    print("'%s' is equal to %d" % (var, d[var]))
>>> f('rF')
'rF' is equal to 100
If you really wanted to use lists, and you knew the order of your list, you could do this (but dictionaries are made for this type of problem):
arr = [rF, tF, dF]
arr = [1 if x < 1 else x for x in arr]
print(arr)
[100, 1, 1]
Note that the list comprehension approach won't actually change the values of rF, tF, and dF.
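If you do need the names themselves rebound, you can unpack the clamped list back into them (the order on both sides must match):
rF, tF, dF = [1 if x < 1 else x for x in (rF, tF, dF)]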
You can simply use a dictionary and then unpack the dict:
d = {'rF': rF, 'tF': tF, 'dF': dF}
for key in d:
    if d[key] < 1:
        d[key] = 1
rF, tF, dF = d['rF'], d['tF'], d['dF']
You can use the following instead of the last line:
rF, tF, dF = map(d.get, ('rF', 'tF', 'dF'))
Here's exactly what you asked for:
rF = -3
tF = 9
dF = -2
myenv = locals()
for k in list(myenv.keys()):
    if len(k) == 2 and k[1] == "F":
        myenv[k] = max(1, myenv[k])
print(rF, tF, dF)
# prints 1 9 1
This may accidentally modify any variables you don't really want to change, so I recommend using a proper data structure instead of hacking the user environment.
Edit: fixed a RuntimeError: dictionary changed size during iteration. A dictionary cannot be modified while it is being iterated over, so the code now copies the keys first (list(myenv.keys())) and iterates over that copy instead of the live dictionary. It now works in both Python 2 and 3; before the fix it was Python 2 only.
Use a list comprehension and the max function.
items = [-32, 0, 43]
items = [max(1, item) for item in items]
rF, tF, dF = items
print(rF, tF, dF)
I'm trying to convert some pySpark over to the Scala equivalent, and I am having issues with the correct syntax for a double list comprehension. The code takes a list of key values and returns a list of values in tuple form that occurred for the same key. Meaning (2, ('user1','user2','user3')) would return (('user1','user2'),('user1','user3'),('user2','user3')).
#source rdd
[(2, ['user1', 'user3']), (1, ['user1', 'user2', 'user1']), (3, ['user2', 'user4', 'user4', 'user3'])]
#current list comprehension in pySpark
rdd2 = rdd.flatMap(lambda kv: [(x, y) for x in kv[1] for y in kv[1] if x < y])
// Scala attempt to make the equivalent; it is currently throwing syntax errors
val rdd2 = rdd.flatMap((x,y) => for (x <- _(1)) yield x for(y <- _(1)) yield y if x < y)
Scala supports multiple iterators in a comprehension.
Try this
val rdd2 = rdd.flatMap {
  case (_, v) => for {
    x <- v
    y <- v if x < y
  } yield (x, y)
}
Notes
The underscore won't work as you used it (twice); in any case, unwrapping the tuple with Scala's pattern matching is clearer (and closer to the Python*). Since you don't use the first tuple item, you can use an underscore there to "throw it away".
*FWIW, you could write the Python slightly more neatly (Python 2 only; tuple parameters in lambdas were removed in Python 3):
lambda (_, v): [(x, y) for x in v for y in v if x < y]
While the answer provided by Nick B translates your code directly, it makes more sense to use combinations here:
rdd.values.flatMap(_.toSeq.distinct.sorted.combinations(2))
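For reference, the same combinations idea expressed back in the original pySpark (a sketch, assuming rdd is the source RDD shown in the question):
from itertools import combinations

rdd2 = rdd.values().flatMap(lambda users: combinations(sorted(set(users)), 2))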
I'm creating a recommendation engine for work and ended up with an 8,000 by 8,000 item-item similarity matrix. The matrix is pretty sparse so I set out to make a dictionary with many keys where each key points to a list which is a sorted array of product recommendations (in the form of tuples). I got this to work, see below.
In [191]: dictionary["15454-M6-ECU2="]
Out[191]:
[('15454-M-TSCE-K9=', 0.8),
('15454-M2-AC=', 0.52),
('15454-M6-DC-RF', 0.45),
('15454-M6-ECU2=', 0.63)]
However, I now have a problem in interpreting the result:
In [204]: sys.getsizeof(dictionary)
Out[204]: 786712
In [205]: sys.getsizeof(similarity_matrix)
Out[205]: 69168
Even though I eliminated a ton of zeros (which were each being represented with either 32 or 64 bits) why did the object size increase even though we eliminated the sparsity in the matrix?
sys.getsizeof only returns the size of the container, not the container plus the size of the items inside. The dict reports the same size regardless of the size of the contained values, and it's still only 98 bytes per key/value pair. It's storing a reference to the key and a reference to the value, plus other overhead for the hash buckets.
>>> sys.getsizeof(dict((i,'a'*10000) for i in range(8000)))
786712
>>> sys.getsizeof(dict((i,'a'*1) for i in range(8000)))
786712
>>> 786712/8000
98
A tuple is much smaller, storing only the references themselves (about 8 bytes per item here).
>>> sys.getsizeof(tuple((i,'a'*10000) for i in range(8000)))
64056
>>> sys.getsizeof(tuple((i,'a'*1) for i in range(8000)))
64056
>>> 64056/8000
8
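If you want the container plus everything it references, you have to walk the structure yourself. A rough sketch (it only handles the builtin containers used here and counts shared objects once):
import sys

def deep_getsizeof(obj, seen=None):
    # track ids so shared/repeated objects are counted only once
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size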
According to the size of your dictionary, it seems that you have one key/value pair for every possible key (even for keys that have no similar keys at all).
I imagine your code looks something like this:
# initialise sparse dict with one empty list of similar nodes for each node
sparse_dict = dict((key, []) for key in range(1000))
sparse_dict[0].append((2, 0.5))  # 0 is similar to 2 by 50%

def get_similarity(d, x, y):
    for key, value in d[x]:
        if key == y:
            return value
    return 0

assert get_similarity(sparse_dict, 0, 1) == 0
assert get_similarity(sparse_dict, 0, 2) == 0.5
However, using the get method of a dict you can implement even sparser dictionaries:
# initialise empty mapping -- literally an empty dict
very_sparse_dict = {}
very_sparse_dict[0] = [(2, 0.5)] # 0 is similar to 2 by 50%
def get_similarity2(d, x, y):
    for key, value in d.get(x, ()):
        if key == y:
            return value
    return 0
# 0 not linked to 1, so 0% similarity
assert get_similarity2(very_sparse_dict, 0, 1) == 0
# 0 and 2 are similar
assert get_similarity2(very_sparse_dict, 0, 2) == 0.5
# 1 not similar to anything as it is not even present in the dict
assert get_similarity2(very_sparse_dict, 1, 2) == 0
And the size of each dict is:
>>> print("sparse_dict:", sys.getsizeof(sparse_dict))
sparse_dict: 49248
>>> print("very_sparse_dict:", sys.getsizeof(very_sparse_dict))
very_sparse_dict: 288
Given a list of items, and a map from a predicate function to the "value" function, the code below applies "value" functions to the items satisfying the corresponding predicates:
import re

my_re0 = re.compile(r'^([a-z]+)$')
my_re1 = re.compile(r'^([0-9]+)$')

my_map = [
    (my_re0.search, lambda x: x),
    (my_re1.search, lambda x: x),
]

for x in ['abc', '123', 'a1']:
    for p, f in my_map:
        v = p(x)
        if v:
            print f(v.groups())
            break
Is there a way to express the same with a single statement?
If I did not have to pass the value returned by the predicate to the "value" function then I could do
for x in ['abc', '123', 'a1']:
    print next((f(x) for p, f in my_map if p(x)), None)
Can something similar be done for the code above? I know, maybe it is better to leave these nested for loops, but I am just curious whether it is possible.
A bit less terse than Nate's ;-)
from itertools import product
comb = product(my_map, ['abc','123','a1'])
mapped = ((p(x),f) for (p,f),x in comb)
groups = (f(v.groups()) for v,f in mapped if v)
print next(groups), list(groups) # first match and the rest of them
[f(v.groups()) for x in ['abc','123','a1'] for p, f in my_map for v in [p(x)] if v]
You said more terse, right? ;^)
Here is my version:
for x in ['abc', '123', 'a1']:
    print next((f(v.groups()) for p, f in my_map for v in [p(x)] if v), None)
This version does not iterate over the whole my_map but stops as soon as the first successful mapping is found.
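The for v in [p(x)] clause is the trick that makes this work: it binds the intermediate result of p(x) to a name inside the comprehension, so it can be tested (if v) and reused (v.groups()) without calling p(x) twice. A minimal illustration of the same pattern:
>>> [v for x in range(5) for v in [x * x] if v > 5]
[9, 16]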