I have the following list of variables and a mastervariable
a = (1,5,7)
b = (1,3,5)
c = (2,2,2)
d = (5,2,8)
e = (5,5,8)
mastervariable = (3,2,5)
I'm trying to check whether at least 2 elements of each variable exist in the master variable, so that the above would show b (3,5) and d (5,2) as having at least 2 elements matching in mastervariable. Also note that using sets would result in c showing up as matching, but I don't want to count c because only one of the elements in c is in mastervariable (i.e. 2 only shows up once in mastervariable, not twice).
I currently have the very inefficient:
if current_variable[0] == mastervariable[0]:
    if current_variable[1] == mastervariable[1]:
        True
    elif current_variable[2] == mastervariable[1]:
        True
    #### I don't use OR here because I need to know which variables match.
elif current_variable[1] == mastervariable[0]: ##<-- I'm now checking 2nd element
    etc. etc.
I then continue to iterate like the above by checking each one at a time which is extremely inefficient. I did the above because using a FOR statement resulted in me checking the first element twice which was incorrect:
for i in a:
    for j in a:
        ### this checked if 1 was in the master variable and not 1,5 or 1,7
Is there a way to use two for statements that lets me check 2 elements in a list at once while skipping any element that has already been used? Alternatively, can you suggest an efficient way to do what I'm trying to do?
Edit: Mastervariable can have duplicates in it.
For the case where matching elements can be duplicated so that set breaks, use Counter as a multiset - the duplicates between a and master are found by:
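from collections import Counter
count_a = Counter(a)
count_master = Counter(master)
count_both = count_a + count_master
# keep only elements that also occur in master; multiplicity is the smaller count
dups = Counter({e : min(count_a[e], count_master[e]) for e in count_a if count_both[e] > count_a[e]})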
The logic is reasonably intuitive: if there's more of an item in the combined count of a and master, then it is duplicated, and the multiplicity is however many of that item are in whichever of a and master has less of them.
It gives a Counter of all the duplicates, where the count is their multiplicity. If you want it back as a tuple, you can do tuple(dups.elements()):
>>> a
(2, 2, 2)
>>> master
(1, 2, 2)
>>> dups = Counter({e : min((count_a[e], count_master[e])) for e in count_a if count_both[e] > count_a[e]})
>>> tuple(dups.elements())
(2, 2)
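As a shorter alternative sketch (not part of the answer above), Counter also supports multiset intersection through the & operator, which keeps the smaller count of each element. The helper name common_elements is hypothetical and only used for illustration:

```python
from collections import Counter

def common_elements(a, master):
    """Return the elements shared by a and master, respecting multiplicity."""
    # & keeps, for each element, the smaller of the two counts
    shared = Counter(a) & Counter(master)
    return tuple(shared.elements())

print(common_elements((2, 2, 2), (1, 2, 2)))  # (2, 2)
```

This should give the same result as the dictionary comprehension above for these inputs.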
Seems like a good job for sets. Edit: sets aren't suitable since mastervariable can contain duplicates. Here is a version using Counters.
>>> a = (1,5,7)
>>>
>>> b = (1,3,5)
>>>
>>> c = (2,2,2)
>>>
>>> d = (5,2,8)
>>>
>>> e = (5,5,8)
>>> D=dict(a=a, b=b, c=c, d=d, e=e)
>>>
>>> from collections import Counter
>>> mastervariable = (5,5,3)
>>> mvc = Counter(mastervariable)
>>> for k,v in D.items():
...     vc = Counter(v)
...     # ">= 2" so that a variable matching all three elements is not skipped
...     if sum(min(count, vc[item]) for item, count in mvc.items()) >= 2:
...         print k
...
b
e
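Since the question also asks which elements matched, here is a minimal sketch that returns them per variable. It assumes "at least 2 matches" is the wanted rule; the function name names_with_matches and its minimum parameter are made up for this example:

```python
from collections import Counter

def names_with_matches(candidates, master, minimum=2):
    """Map each qualifying name to the elements it shares with master."""
    master_count = Counter(master)
    result = {}
    for name, values in candidates.items():
        # multiset intersection: duplicates only count as often as master has them
        matched = Counter(values) & master_count
        if sum(matched.values()) >= minimum:
            result[name] = tuple(sorted(matched.elements()))
    return result

candidates = dict(a=(1, 5, 7), b=(1, 3, 5), c=(2, 2, 2), d=(5, 2, 8), e=(5, 5, 8))
print(names_with_matches(candidates, (3, 2, 5)))  # {'b': (3, 5), 'd': (2, 5)}
```

With the question's mastervariable (3,2,5), c is left out because 2 occurs only once in the master.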
Given three or more variables, I want to find the name of the variable with the min value.
I can get the min value from the list, and I can get the index within the list of the min value. But I want the variable name.
I feel like there's another way to go about this that I'm just not thinking of.
a = 12
b = 9
c = 42
cab = [c,a,b]
# yields 9 (the min value)
min(cab)
# yields 2 (the index of the min value)
cab.index(min(cab))
What code would yield 'b'?
The magic of vars prevents you from having to make a dictionary up front if you want to have things in instance variables:
class Foo():
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def min_name(self, names = None):
        d = vars(self)
        if not names:
            names = d.keys()
        key_min = min(names, key = (lambda k: d[k]))
        return key_min
In action
>>> x = Foo(1,2,3)
>>> x.min_name()
'a'
>>> x.min_name(['b','c'])
'b'
>>> x = Foo(5,1,10)
>>> x.min_name()
'b'
Right now it'll crash if you pass an invalid variable name in the parameter list for min_name, but that's resolvable.
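One possible way to resolve the crash, sketched under the assumption that raising a clear error is preferable to a bare KeyError (the ValueError and its message are choices made for this example, not part of the answer):

```python
class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def min_name(self, names=None):
        d = vars(self)
        if not names:
            names = list(d)
        # Reject names that are not instance attributes instead of failing with KeyError
        unknown = [n for n in names if n not in d]
        if unknown:
            raise ValueError("unknown attribute name(s): %s" % ", ".join(unknown))
        return min(names, key=lambda k: d[k])

print(Foo(5, 1, 10).min_name())  # b
```

Silently ignoring unknown names would also work, but it could hide typos in the caller.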
You can also update the dictionary returned by vars, and the change is reflected in the instance's attributes:
    def increment_min(self):
        key = self.min_name()
        vars(self)[key] += 1
Example:
>>> x = Foo(2,3,4)
>>> x.increment_min()
>>> x.a
3
You cannot get the name of the variable with the minimum/maximum value like this, since, as @jasonharper commented, cab is nothing more than a list containing three integers; there is absolutely no connection to the variables that those integers originally came from.
A simple workaround is to use pairs and compare on the value (a plain min(pairs) would compare the names first and return ('a', 12)):
>>> pairs = [("a", 12), ("b", 9), ("c", 42)]
>>> min(pairs, key=lambda p: p[1])
('b', 9)
>>> min(pairs, key=lambda p: p[1])[0]
'b'
See Green Cloak Guy's answer, but if you want to go for readability, I suggest following a similar approach to mine.
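A small sketch of the pairs approach, assuming the values (not the names) are what should be compared. Tuples compare element by element, so a key on the second field makes the intent explicit:

```python
from operator import itemgetter

pairs = [("a", 12), ("b", 9), ("c", 42)]

# Compare on the value (index 1), not on the name (index 0)
name, value = min(pairs, key=itemgetter(1))
print(name, value)  # b 9

# Without a key, tuples are compared by their first element first
print(min(pairs))  # ('a', 12)
```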
You'd have to get very creative for this to work, and the only solution I can think of is rather inefficient.
You can get the memory address of the data b refers to fairly easily:
>>> hex(id(b))
'0xaadd60'
>>> hex(id(cab[2]))
'0xaadd60'
To actually correspond that with a variable name, though, the only way to do that would be to look through the variables and find the one that points to the right place.
You can do this by using the globals() function:
# get a list of all the variable names in the current namespace that reference your desired value
referent_vars = [k for k,v in globals().items() if id(v) == id(cab[2])]
var_name = referent_vars[0]
There are two big problems with this solution:
Namespaces - you can't put this code in a function, because if you do that and then call it from another function, then it won't work.
Time - this requires searching through the entire global namespace.
The first problem could be alleviated by additionally passing the current namespace in as a variable:
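def get_referent_vars(val, namespace):
    # names in the given namespace that refer to the very same object as val
    return [k for k,v in namespace.items() if id(v) == id(val)]
def main():
    a = 12
    b = 9
    c = 42
    cab = [a, b, c]
    var_name = get_referent_vars(
        cab[cab.index(min(cab))],
        locals()  # a, b and c are local to main(), so globals() would not contain them
    )[0]
    print(var_name)
    # should print 'b'
main()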
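One more caveat worth illustrating: identity lookups return every name bound to the same object, so the result can be ambiguous. The sketch below uses lists rather than integers so that it does not rely on how the interpreter caches small integers; the function and variable names are hypothetical:

```python
def names_referring_to(obj, namespace):
    """Return every name in namespace bound to the very same object as obj."""
    return sorted(k for k, v in namespace.items() if v is obj)

def demo():
    b = [9]        # a list, so identity does not depend on integer caching
    alias = b      # second name for the same object
    other = [9]    # equal value, but a different object
    return names_referring_to(b, locals())

print(demo())  # ['alias', 'b']
```

Taking element [0] of such a result picks one of the matching names arbitrarily, which is another reason to prefer an explicit name-to-value mapping.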
It seems that, for some reason, a dict can end up with duplicate keys when the keys are bitarray() objects.
For example:
data = {}
for _ in xrange(10):
    ba = ...generate repeatable bitarrays ...
    data[ba] = 1
print data
{bitarray('11011'): 1, bitarray('11011'): 1, bitarray('11011'): 1, bitarray('01111'): 1, bitarray('11110'): 1, bitarray('11110'): 1, bitarray('01111'): 1, bitarray('01111'): 1, bitarray('11110'): 1, bitarray('11110'): 1}
You can clearly see that duplicates are stored as different keys (e.g. the first two elements), which is weird. What could be the reason?
My goal is simply to count the number of times a bit pattern shows up, and of course dicts are perfect for this, but it seems that bitarray() is for some reason opaque to the hashing algorithm.
By the way, I have to use bitarray(), because I work with patterns of 10000+ bits.
Is there any other efficient way of counting occurrences of bit patterns?
This answer addresses your first confusion, about duplicate dictionary keys. I assume you're referring to bitarray() from the bitarray module; note that I've not used this module myself.
In your example above, you're not actually getting duplicate dictionary keys. They may look that way, but they're duplicates to the naked eye only. For instance:
>>> class X:
... def __repr__(self):
... return '"X obj"'
...
>>> x1 = X()
>>> x2 = X()
>>> d = {x1:1, x2:2}
>>> d
{"X obj": 2, "X obj": 1}
But x1 isn't equal to x2, and hence they're not duplicates; they're distinct objects of class X:
>>> x1 == x2
False
>>> #same as
... id(x1) == id(x2)
False
>>> #same as
...x1 is x2
False
Moreover, because class X defines __repr__, which returns the string representation of its objects, you might think dictionary d has duplicate keys. Again, there are no duplicated keys, nor are the keys of type str: the key for value 1 is an X object and the key for value 2 is another X object. They are literally two different objects that share a single string representation returned by their class's __repr__ method:
>>> # keys are instance of X not strings
... d
{"X obj": 2, "X obj": 1}
>>> d["X obj"]
KeyError: 'X obj'
>>> d[x1]
1
>>> d[x2]
2
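To make two distinct objects behave as the same dictionary key, the class has to define both __eq__ and __hash__ consistently. A minimal sketch with a made-up Pattern class (it stands in for any key type and is not the bitarray API):

```python
class Pattern:
    """Hypothetical value type: equal bits means equal key."""
    def __init__(self, bits):
        self.bits = bits
    def __repr__(self):
        return "Pattern(%r)" % self.bits
    def __eq__(self, other):
        return isinstance(other, Pattern) and self.bits == other.bits
    def __hash__(self):
        # objects that compare equal must return the same hash
        return hash(self.bits)

counts = {}
for bits in ["11011", "11011", "01111"]:
    key = Pattern(bits)
    counts[key] = counts.get(key, 0) + 1
print(counts)  # {Pattern('11011'): 2, Pattern('01111'): 1}
```

Here the two Pattern('11011') objects collapse into one key because they compare equal and hash alike.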
Up to bitarray 0.8.1 (and possibly later), I believe bitarray does not satisfy the hash invariant, i.e. that objects which compare equal must have equal hashes.
To work around it, you can convert the bit array to byte format as follows.
>>> from bitarray import bitarray
>>> l = [bitarray('11111'), bitarray('11111'), bitarray('11010'), bitarray('11110'), bitarray('11111'), bitarray('11010')]
>>> ht = {}
>>> for x in l: ht[x.tobytes()] = 0
...
>>> for x in l: ht[x.tobytes()] += 1
...
>>> ht
{'\xf8': 3, '\xf0': 1, '\xd0': 2}
Remember that you can get the bitarray back from the byte format with frombytes(byte). In that case, though, you will have to keep track of the size of the bitarray explicitly, because the result's length is always a multiple of 8.
If you want to keep the bitarray in the dictionary also:
>>> from bitarray import bitarray
>>> l = [bitarray('11111'), bitarray('11111'), bitarray('11010'), bitarray('11110'), bitarray('11111'), bitarray('11010')]
>>> ht = {}
>>> for x in l: ht[x.tobytes()] = (0, x)
...
>>> for x in l:
... old_count = ht[x.tobytes()][0]
... ht[x.tobytes()] = (old_count+1, x)
...
>>> ht
{'\xf8': (3, bitarray('11111')), '\xf0': (1, bitarray('11110')), '\xd0': (2, bitarray('11010'))}
>>> for x,y in ht.iteritems(): print(y)
...
(3, bitarray('11111'))
(1, bitarray('11110'))
(2, bitarray('11010'))
I solved it:
desc = bitarray(res).to01()
if desc in data: data[desc] += 1
else: data[desc] = 1
gosh I miss perl no-nonsense autovivification :)
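For what it's worth, the standard library gets close to autovivification for counting. A sketch using plain '0'/'1' strings as keys (such as those returned by to01()); the sample patterns are made up:

```python
from collections import Counter, defaultdict

patterns = ["11011", "11011", "01111", "11110", "11110", "11110"]  # made-up sample data

# defaultdict(int) creates missing entries as 0 on first access
data = defaultdict(int)
for desc in patterns:
    data[desc] += 1

# Counter does the same counting in one call
counted = Counter(patterns)
print(dict(data) == dict(counted))  # True
```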
I have tried this. For some unknown reason, when it prints h, it prints None, so I thought that if I count the number of None values printed and divide by 2, it will give the number of duplicates, but I can't use the count function here.
a= [1,4,"hii",2,4,"hello","hii"]
def duplicate(L):
    li=[]
    lii=[]
    h=""
    for i in L:
        y= L.count(i)
        if y>1:
            h=y
            print h
    print h.count(None)
duplicate(a)
Use the Counter container:
from collections import Counter
c = Counter(['a', 'b', 'a'])
c is now a dictionary-like Counter with the data: Counter({'a': 2, 'b': 1})
If you want to get a list with all duplicated elements (with no repetition), you can do as follows:
duplicates = [k for k, count in c.items() if count > 1]
If you only want to count the duplicates, you can then just set
duplicates_len = len(duplicates)
You can use a set to get the unique elements and then compare the sizes, something like this:
def duplicates(l):
    uniques = set(l)
    return len(l) - len(uniques)
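Note that the set-size difference counts surplus occurrences, which is not always the same as the number of values that are duplicated. A short sketch contrasting the two (both function names are made up for this comparison):

```python
from collections import Counter

def surplus_occurrences(items):
    # how many entries would have to be removed to make all items unique
    return len(items) - len(set(items))

def duplicated_values(items):
    # how many distinct values occur more than once
    return sum(1 for n in Counter(items).values() if n > 1)

a = [1, 4, "hii", 2, 4, "hello", "hii"]
print(surplus_occurrences(a), duplicated_values(a))                  # 2 2
print(surplus_occurrences([1, 1, 1]), duplicated_values([1, 1, 1]))  # 2 1
```

The two agree only when every duplicated value appears exactly twice.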
I found an answer, which is:
a= [1,4,"hii",2,4,"hello",7,"hii"]
def duplicate(L):
    li=[]
    for i in L:
        y= L.count(i)
        if y>1:
            li.append(i)
    print len(li)/2
duplicate(a)
The answer by egualo is much better, but here is another way using a dictionary.
def find_duplicates(arr):
    duplicates = {}
    duplicate_elements = []
    for element in arr:
        if element not in duplicates:
            duplicates[element] = False
        else:
            if duplicates[element] == False:
                duplicate_elements.append(element)
                duplicates[element] = True
    return duplicate_elements
It's pretty simple and doesn't go through the lists twice which is kind of nice.
>>> test = [1,2,3,1,1,2,2,4]
>>> find_duplicates(test)
[1, 2]
I have a dictionary
k = {'a':[7,2,3],'b':[7,2,7], 'c': [8,9,10]}
where each value is a list. I want to delete the ith term (depending on a condition) from every value without going out of range. This is the code for it:
for i in range(len(k['a'])):
    if k['a'][i] == k['b'][i]:
        pass
    else:
        for key in k:
            del [k[key][i]]
This works and returns a dictionary equivalent to this:
{'a':[7,2],'b':[7,2], 'c': [8,9]}
However, if the dictionary was this
k = {'a':[6,2,3],'b':[7,2,7], 'c': [8,9,10]}
I would get this error:
list index out of range
How do I delete the values so that I don't get this error?
The issue is that when you delete one item from each array, the size of these arrays decreases by one. Thus, the main loop becomes one iteration too long.
An illustration of the problem:
>>> a = [1, 2, 3]
>>> i = 2
>>> a[i]
3
>>> len(a)
3
>>> del [a[1]]
>>> a
[1, 3]
>>> len(a)
2
>>> a[i] # used to work
IndexError: list index out of range
In order for the index and the loop duration to work out you need to do something like this:
i = 0
while i < len(k['a']):
    if k['a'][i] == k['b'][i]:
        i += 1
    else:
        for key in k:
            del [k[key][i]]
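An alternative sketch that avoids deleting while iterating: first work out which positions to keep, then build new lists. It assumes all lists have the same length; the function name and its first/second parameters are hypothetical:

```python
def keep_matching_positions(k, first='a', second='b'):
    """Return a new dict keeping only the positions where k[first] and k[second] agree."""
    keep = [i for i, (x, y) in enumerate(zip(k[first], k[second])) if x == y]
    return {key: [values[i] for i in keep] for key, values in k.items()}

k = {'a': [6, 2, 3], 'b': [7, 2, 7], 'c': [8, 9, 10]}
print(keep_matching_positions(k))  # {'a': [2], 'b': [2], 'c': [9]}
```

Unlike the while loop, this leaves the original dictionary untouched and returns a new one.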
If I am computing some math functions for different variables, for example:
a = x - y
b = x**2 - y**2
c = (x-y)**2
d = x + y
How can I find the minimum value out of all the variables? For example:
a = 4
b = 7
c = 3
d = 10
So the minimum value is 3, for c. How can I make my program do this?
What I have thought of so far:
make a list
append a,b,c,d to the list
sort the list
print list[0], as it will be the smallest value.
The problem is that if I append a,b,c,d to a list, I have to do something like:
lst.append((a,b,c,d))
This makes the list:
[(4,7,3,10)]
so all the values belong to one index only (lst[0]).
Is there any substitute for this, or any other way I can find the minimum?
Language: Python
Thank you
You can find the index of the smallest item like this
>>> L = [4,7,3,10]
>>> min(range(len(L)), key=L.__getitem__)
2
Now you know the index, you can get the actual item too. eg: L[2]
Another way, which finds the answer in the form (index, item):
>>> min(enumerate(L), key=lambda x:x[1])
(2, 3)
I think you may be going the wrong way about solving your problem, but it's possible to pull the values of variables from the local namespace if you know their names, e.g.
>>> a = 4
>>> b = 7
>>> c = 3
>>> d = 10
>>> min(enumerate(['a', 'b', 'c', 'd']), key=lambda x, ns=locals(): ns[x[1]])
(2, 'c')
A better way is to use a dict, so you are not filling your working namespace with these "junk" variables:
>>> D = {}
>>> D['a'] = 4
>>> D['b'] = 7
>>> D['c'] = 3
>>> D['d'] = 10
>>> min(D, key=D.get)
'c'
>>> min(D.items(), key=lambda x:x[1])
('c', 3)
You can see that when the correct data structure is used, the amount of code required is much less.
If you store the numbers in a list, you can use reduce, which has O(n) complexity because the list is not sorted.
from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
numbers = [999, 1111, 222, -1111]
minimum = reduce(lambda mn, candidate: candidate if candidate < mn else mn, numbers[1:], numbers[0])
Pack the values into a dictionary, find the minimum value, and then find the keys that have matching values (possibly more than one minimum):
D = dict(a = 4, b = 7, c = 3, d = 10)
min_val = min(D.values())
for k,v in D.items():
    if v == min_val: print(k)
The built-in function min will do the trick. In your example, min(a,b,c,d) will yield 3.
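Putting the pieces together for the formulas in the question, here is a minimal sketch that keeps each result under its name and then asks for the smallest. The inputs x = 3 and y = 1 are hypothetical, chosen only so the example runs:

```python
x, y = 3, 1  # hypothetical inputs

# store each computed value under the name used in the question
results = {
    'a': x - y,
    'b': x**2 - y**2,
    'c': (x - y)**2,
    'd': x + y,
}
smallest_name = min(results, key=results.get)
print(smallest_name, results[smallest_name])  # a 2
```

If only the value is needed and not the name, min(results.values()) is enough.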