I'm having a hard time with python and finding differences between two lists.
CMDB list:
ABC:NL1:SB6
ABC:NL2:SB6
ABC:NL3:SB6
ABC:NL4:SB6
NL9:SB9
NL5:SB4
NL6:SB7
DB list:
NL1:SB6
NL2:SB6
ABC:NL3:SB6
ABC:NL4:SB6
ABC:NL8:SB8
ABC:NL5:SB4
ABC:NL6:SB7
I would like to get output that finds differences:
NL9:SB9
ABC:NL8:SB8
I have tried
cmdb_fin = set(cmdb)
db_fin = set(db)
equal = db_fin.symmetric_difference(cmdb_fin)
but the output is like following because it compares exact strings to each other, not like "patterns"
ABC:NL5:SB4
NL6:SB7
ABC:NL2:SB6
NL2:SB6
ABC:NL8:SB8
NL5:SB4
ABC:NL6:SB7
NL9:SB9
ABC:NL1:SB6
NL1:SB6
Is there any way to get the output I expect?
Criteria:
If any given string (block of characters) in the CMDB list exists in the DB list, even as only part of a longer string, it should not appear in the output, because it effectively exists in both lists. The same applies in the other direction, with DB compared to CMDB.
For example, NL5:SB4 from the CMDB list matches ABC:NL5:SB4 from the DB list.
In order to define a custom equality comparator when using python sets, you need to define a custom class with __eq__, __ne__, & __hash__ defined. Below is an example of how this could be achieved in your case, using the last two elements in each line to define whether two elements are equivalent.
Code:
class Line(object):
    def __init__(self, s):
        self.s = s
        # The last two colon-separated fields decide equality
        self.key = ':'.join(s.split(':')[-2:])

    def __repr__(self):
        return self.s

    def __eq__(self, other):
        if isinstance(other, Line):
            return self.key == other.key
        else:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.key)

cmdb = ['ABC:NL1:SB6', 'ABC:NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'NL9:SB9',
'NL5:SB4', 'NL6:SB7']
db = ['NL1:SB6', 'NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'ABC:NL8:SB8',
'ABC:NL5:SB4', 'ABC:NL6:SB7']
cmdb_fin = set(Line(l) for l in cmdb)
db_fin = set(Line(l) for l in db)
equal = db_fin.symmetric_difference(cmdb_fin)
Output:
>>> equal
{ABC:NL8:SB8, NL9:SB9}
Usage:
>>> Line('NL5:SB4') == Line('ABC:NL5:SB4')
True
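If a wrapper class feels heavy, the same idea can be sketched with plain dictionaries keyed by the last two fields. This is only a sketch under the same assumption as the class above (that the last two colon-separated fields identify an entry, which is narrower than a general substring match); key_of is a hypothetical helper name.

```python
def key_of(line):
    # Hypothetical helper: the last two colon-separated fields identify an entry
    return ':'.join(line.split(':')[-2:])

cmdb = ['ABC:NL1:SB6', 'ABC:NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'NL9:SB9',
        'NL5:SB4', 'NL6:SB7']
db = ['NL1:SB6', 'NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'ABC:NL8:SB8',
      'ABC:NL5:SB4', 'ABC:NL6:SB7']

# Map each key back to the original string it came from
cmdb_by_key = {key_of(s): s for s in cmdb}
db_by_key = {key_of(s): s for s in db}

# Keys that occur in exactly one of the two lists
one_sided = cmdb_by_key.keys() ^ db_by_key.keys()
diff = sorted(cmdb_by_key.get(k) or db_by_key[k] for k in one_sided)
print(diff)  # ['ABC:NL8:SB8', 'NL9:SB9']
```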
First of all, my current code:
class linkedlist(object):
    def __init__(self, value, next = None):
        self.value = value
        self.next = next

    def traverse(self):
        field = self
        while field != None:
            print(field.value)
            field = field.next

    def equal(self, other):
        while self and other and self.value== other.value:
            self = self.next
            other = other.next
        if self and other:
            if self.value!= other.value:
                return False
            else:
                return True
My task is to compare two linked lists. If they are identical, the "equal" function should return True; if not, False. The function signature has to stay as it is.
I spent three hours trying to find a solution on my own and now I'm stuck. Can anyone give me some tips or help? I'm not the best programmer, so I'm sorry :(
You are almost there. You are using the correct strategy to traverse both linked lists and skipping any paired elements that are equal (your first loop).
Where you go wrong is that you then don't handle what comes after for all the possible scenarios. You know now that the two linked lists have a prefix of between 0 and N elements that are equal. You now have one of 4 options to consider:
1. self is exhausted, but other is not; other is longer so not equal, return False.
2. other is exhausted, but self is not; self is longer so not equal, return False.
3. self and other still have more elements, but the next elements for the two linked lists have unequal values. Return False.
4. self and other are both exhausted, so have equal length. Their prefixes were equal, so the linked lists are equal. Return True.
You only handle option 3 right now. Given that 3 out of 4 scenarios lead to return False, it is easier to just test for scenario 4:
if not self and not other:
    return True
return False
or, as a full method:
def equal(self, other):
    while self and other and self.value == other.value:
        self = self.next
        other = other.next
    if not self and not other:
        return True
    return False
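To check the four scenarios, here is a small self-contained sketch. The class name LinkedList and the sample lists are my own choices for illustration; the method is the one above, with the final test written as a single boolean expression.

```python
class LinkedList:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

    def equal(self, other):
        # Skip the common prefix of equal elements
        while self and other and self.value == other.value:
            self = self.next
            other = other.next
        # Equal only if both lists ran out at the same time (scenario 4)
        return self is None and other is None

abc = LinkedList('a', LinkedList('b', LinkedList('c')))
same = LinkedList('a', LinkedList('b', LinkedList('c')))
shorter = LinkedList('a', LinkedList('b'))
different = LinkedList('a', LinkedList('x', LinkedList('c')))

print(abc.equal(same))       # True
print(abc.equal(shorter))    # False, other is exhausted first
print(shorter.equal(abc))    # False, self is exhausted first
print(abc.equal(different))  # False, values differ
```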
I am wondering if there is a way to establish a relation between some special "vectors".
Example:
Suppose these are my vectors (I will only choose 3 special vectors):
a=[1,2,[1]]
b=[1,2,[2]]
c=[1,3,[1]]
and I want to add the following rules when comparing them (lexicographic order):
I wish to say that
a<b
because
a[0]=b[0] and a[1]=b[1] but *a[2]<b[2]*
but I also want to say
a<c
because
a[0]=c[0] and a[1]<c[1] and a[2]<=c[2]
but note that things are a little different for "b" and "c" since these are not comparable because even though
b[0]=c[0] and b[1]<c[1], the last term changes everything since b[2]>c[2]
In other words, the rules first compare the "normal" entries of two vectors x and y. If a certain entry of x is bigger than the corresponding entry of y, we take a look at the last entry: if the last entry of x is greater than or equal to the last entry of y, we say x>y; if not, x and y are not comparable.
In case all "normal" entries of x and y are the same, we compare the last entry. If the last entry of x is bigger than the last entry of y, we also say x>y.
I am thinking that this has to do with a while loop.
You can easily write a function to do what you described.
def special_lt(vec1, vec2):
You've defined the "normal" values as all but the last one, and the "special" values as the last one, so that's easy:
    normal1, normal2 = vec1[:-1], vec2[:-1]
    special1, special2 = vec1[-1], vec2[-1]
Now, if I understand it correctly, you want to do a lexicographical comparison of the normal values, and then defer to the special values if they're equal…
    if normal1 == normal2:
        return special1 < special2
… but otherwise use the special values as a check to make sure they're ordered in the same way as the normal values:
    elif normal1 < normal2:
        if special1 <= special2:
            return True
        raise ValueError('not comparable')
    else:
        if special2 <= special1:
            return False
        raise ValueError('not comparable')
Notice that for comparing the lists of normal values, and the lists of special values, I didn't use a loop; I just compared the lists. That's because lists already compare lexicographically (which, internally, is done with a loop, of course, but I don't have to write it).
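Put together, the pieces above form one function. Here it is assembled and run against the three vectors from the question, as a sketch of how I read the rules rather than a definitive implementation:

```python
def special_lt(vec1, vec2):
    # "Normal" entries are everything but the last; the last one is "special"
    normal1, normal2 = vec1[:-1], vec2[:-1]
    special1, special2 = vec1[-1], vec2[-1]
    if normal1 == normal2:
        return special1 < special2
    elif normal1 < normal2:
        if special1 <= special2:
            return True
        raise ValueError('not comparable')
    else:
        if special2 <= special1:
            return False
        raise ValueError('not comparable')

a = [1, 2, [1]]
b = [1, 2, [2]]
c = [1, 3, [1]]

print(special_lt(a, b))  # True
print(special_lt(a, c))  # True
try:
    special_lt(b, c)
except ValueError as exc:
    print(exc)           # not comparable
```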
You can make vector a subclass of list and overload the __lt__ and __gt__ methods so that the last item is checked before the default behavior. Also overload the __le__ and __ge__ methods for completeness:
class vector(list):
    def __lt__(self, other):
        lt = super().__lt__(other)
        if lt and self[-1] > other[-1]:
            return False
        return lt

    def __gt__(self, other):
        gt = super().__gt__(other)
        if gt and self[-1] < other[-1]:
            return False
        return gt

    def __le__(self, other):
        if self == other:
            return True
        return self < other

    def __ge__(self, other):
        if self == other:
            return True
        return self > other
so that:
a=vector([1,2,[1]])
b=vector([1,2,[2]])
c=vector([1,3,[1]])
print(a<b)
print(a<c)
print(b<c)
print(c<b)
print(b>c)
print(c>b)
print(b<=c)
print(c<=b)
will output:
True
True
False
False
False
False
False
False
EDIT: In light of the comments made below, I would also like to point out that this is a case where functools.total_ordering doesn't work because of the atypical logic required by the OP, where one object can be not less than, not greater than and not equal to the other at the same time.
So if we define only the __lt__ method for the vector class and apply the total_ordering decorator:
from functools import total_ordering

@total_ordering
class vector(list):
    def __lt__(self, other):
        lt = super().__lt__(other)
        if lt and self[-1] > other[-1]:
            return False
        return lt
The test code above would produce the following incorrect output instead:
True
True
False
False
False
True
True
False
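A short sketch of why the last four results change. As far as I can tell, total_ordering only fills in comparison methods that the class does not already have, and list already supplies __gt__, __le__ and __ge__, so the decorator adds nothing here and the plain list comparisons are used for those three operators:

```python
from functools import total_ordering

@total_ordering
class vector(list):
    def __lt__(self, other):
        lt = super().__lt__(other)
        if lt and self[-1] > other[-1]:
            return False
        return lt

b = vector([1, 2, [2]])
c = vector([1, 3, [1]])

# Only __lt__ lives on vector itself; the rest are inherited from list
print('__gt__' in vector.__dict__)  # False
print(b < c)   # False, the custom __lt__ applies the last-entry check
print(c > b)   # True, plain list comparison ignores the custom rule
print(b <= c)  # True, plain list comparison again
```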
Given a basic class Item:
class Item(object):
    def __init__(self, val):
        self.val = val
a list of objects of this class (the number of items can be much larger):
items = [ Item(0), Item(11), Item(25), Item(16), Item(31) ]
and a function compute that processes a value and returns a result.
How can I find two items in this list for which compute returns the same value when applied to the attribute val? If nothing is found, an exception should be raised. If more than two items match, simply return any two of them.
For example, let's define compute:
def compute( x ):
    return x % 10
The expected pair would be: (Item(11), Item(31)).
You can check the length of the set of resulting values:
class Item(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Item({self.val})'

def compute(x):
    return x%10

items = [ Item(0), Item(11), Item(25), Item(16), Item(31)]
c = list(map(lambda x:compute(x.val), items))
if len(set(c)) == len(c): #no two or more equal values exist in the list
    raise Exception("All elements have unique computational results")
To find values with similar computational results, a dictionary can be used:
from collections import Counter
new_d = {i:compute(i.val) for i in items}
d = Counter(new_d.values())
multiple = [a for a, b in new_d.items() if d[b] > 1]
Output:
[Item(11), Item(31)]
A slightly more efficient way to find out whether multiple objects with the same computational value exist is to use all, which needs a single pass over the Counter values and stops at the first count greater than 1, whereas using a set with len requires several iterations:
if all(b == 1 for b in d.values()):
    raise Exception("All elements have unique computational results")
Assuming the values returned by compute are hashable (e.g., float values), you can use a dict to store results.
And you don't need to do anything fancy, like a multidict storing all items that produce a result. As soon as you see a duplicate, you're done. Besides being simpler, this also means we short-circuit the search as soon as we find a match, without even calling compute on the rest of the elements.
def find_pair(items, compute):
    results = {}
    for item in items:
        result = compute(item.val)
        if result in results:
            return results[result], item
        results[result] = item
    raise ValueError('No pair of items')
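To see the short-circuit in action, here is a small sketch using the Item class and compute from the question. The calls list and the extra Item(41) are my own additions, there only to show which values compute was actually called with:

```python
class Item:
    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return f'Item({self.val})'

calls = []  # illustration only: records every value passed to compute

def compute(x):
    calls.append(x)
    return x % 10

def find_pair(items, compute):
    results = {}
    for item in items:
        result = compute(item.val)
        if result in results:
            return results[result], item
        results[result] = item
    raise ValueError('No pair of items')

items = [Item(0), Item(11), Item(25), Item(16), Item(31), Item(41)]
pair = find_pair(items, compute)
print(pair)   # (Item(11), Item(31))
print(calls)  # [0, 11, 25, 16, 31], Item(41) was never computed
```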
A dictionary val_to_it that contains Items keyed by computed val can be used:
val_to_it = {}
for it in items:
    computed_val = compute(it.val)
    # Check if an Item in val_to_it has the same computed val
    dict_it = val_to_it.get(computed_val)
    if dict_it is None:
        # If not, add it to val_to_it so it can be referred to
        val_to_it[computed_val] = it
    else:
        # We found the two elements!
        res = [dict_it, it]
        break
else:
    raise Exception( "Can't find two items" )
The for block can be rewritten to handle n elements:
for it in items:
    computed_val = compute(it.val)
    dict_lit = val_to_it.get(computed_val)
    if dict_lit is None:
        val_to_it[computed_val] = [it]
    else:
        dict_lit.append(it)
        # Check if we have the expected number of elements
        if len(dict_lit) == n:
            # Found n elements!
            res = dict_lit
            break
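Wrapped in a function, the n-element version might look like the sketch below. The name find_group and the use of dict.setdefault are my own choices, not part of the code above, and the sample values are made up for illustration:

```python
class Item:
    def __init__(self, val):
        self.val = val

def compute(x):
    return x % 10

def find_group(items, compute, n):
    # Hypothetical helper: returns the first n items sharing a computed value
    groups = {}
    for it in items:
        group = groups.setdefault(compute(it.val), [])
        group.append(it)
        if len(group) == n:
            return group
    raise ValueError(f"Can't find {n} items with the same computed value")

items = [Item(1), Item(12), Item(11), Item(22), Item(21)]
group = find_group(items, compute, 3)
print([it.val for it in group])  # [1, 11, 21]
```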
Following this question, we know that two different dictionaries, dict_1 and dict_2 for example, use the exact same hash function.
Is there any way to alter the hash function used by the dictionary? Negative answers are also accepted!
You can't change the hash-function - the dict will call hash on the keys it's supposed to insert, and that's that.
However, you can wrap the keys to provide different __hash__ and __eq__-Methods.
class MyHash(object):
    def __init__(self, v):
        self._v = v

    def __hash__(self):
        return hash(self._v) * -1

    def __eq__(self, other):
        return self._v == other._v
I doubt this actually helps with your original problem, though; a custom array/list-based data structure might be the better answer. Or not.
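For what it's worth, here is a sketch of how such a wrapper behaves as a dictionary key. WrappedKey is a hypothetical name, and I've guarded __eq__ with an isinstance check so that comparing against an unwrapped value doesn't raise:

```python
class WrappedKey:
    def __init__(self, v):
        self._v = v

    def __hash__(self):
        # The dict calls hash() on the wrapper, so this is the hash it sees
        return hash(self._v) * -1

    def __eq__(self, other):
        if not isinstance(other, WrappedKey):
            return NotImplemented
        return self._v == other._v

d = {}
d[WrappedKey(5)] = 'five'

print(hash(5), hash(WrappedKey(5)))  # 5 -5
print(d[WrappedKey(5)])              # five
print(5 in d)                        # False, the bare key is a different key
```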
Here is a "hash table" on top of a list of lists, where each hash table object is associated with a particular hashing function.
class HashTable(object):
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [list() for i in range(size)]
        self.size = size

    def __getitem__(self, key):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        for stored_key, stored_value in bucket:
            if stored_key == key:
                return stored_value
        raise KeyError(key)

    def __setitem__(self, key, value):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        i = 0
        found = False
        for stored_key, stored_value in bucket:
            if stored_key == key:
                found = True
                break
            i += 1
        if found:
            bucket[i] = (key, value)
        else:
            bucket.append((key, value))
The rest of your application can still see the underlying list of buckets. Your application might require additional metadata to be associated with each bucket, but that would be as simple as defining a new class for the elements of the bucket list instead of a plain list.
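As a usage sketch, here is a condensed version of the same table driven by a deliberately weak hash function (the length of the key, my own choice for illustration) so that two keys land in the same bucket and the key comparison has to tell them apart:

```python
class HashTable:
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(size)]
        self.size = size

    def _bucket(self, key):
        return self.buckets[self.hash_function(key) % self.size]

    def __getitem__(self, key):
        for stored_key, stored_value in self._bucket(key):
            if stored_key == key:
                return stored_value
        raise KeyError(key)

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)  # overwrite the existing entry
                return
        bucket.append((key, value))

table = HashTable(hash_function=len, size=8)
table['ab'] = 1
table['cd'] = 2   # same length, so same bucket as 'ab'
table['ab'] = 3   # replaces the first entry

print(table['ab'], table['cd'])  # 3 2
print(table.buckets[2])          # [('ab', 3), ('cd', 2)]
```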
I think what you want is a way to create buckets. Based on this I recommend collections.defaultdict with a set initializer as the "bucket" (depends on what you're using it for though).
Here is a sample:
#!/usr/bin/env python
from collections import defaultdict
from itertools import combinations

d = defaultdict(set)
strs = ["str", "abc", "rts"]
for s in strs:
    d[hash(s)].add(s)
    d[hash(''.join(reversed(s)))].add(s)

for combination in combinations(d.values(), r=2):
    matches = combination[0] & combination[1]
    if len(matches) > 1:
        print(matches)
# output (element order may vary): {'str', 'rts'}
Two strings ending up in the same buckets here are very likely the same. I've simulated a hash collision by filing each string under both its own hash and the hash of its reverse.
Note that the set will use full comparison but should do it very fast.
Don't hash too many values without draining the sets.
I have a list with a few hundred objects, and I want to check whether a newcomer object has already been added to my list (not an equal object, but this exact instance).
I have a naive implementation like this:
def is_one_of(test_object, all_objects):
    for elm in all_objects:
        if test_object is elm:
            return True
    return False
Can this be written more elegantly?
use any:
if any(x is test_object for x in all_objects):
The example in the python reference looks remarkably similar to your code already :)
Use the any() function:
def is_one_of(test_object, all_objects):
    return any(test_object is elm for elm in all_objects)
It'll stop iterating over the generator expression as soon as a True result is found.
Eh, I solved it by putting id(element) into a set:
def _unit_list(self):
    """
    Returns all units in the order they should be initialized.
    (Performs a breadth-first search from start_point).
    """
    unit_id_set = set()
    unit_list = []
    # Store the id, not the object itself, so the membership test below matches
    unit_id_set.add(id(self.start_point))
    unit_list.append(self.start_point)
    pos = 0
    while pos < len(unit_list):
        cur_unit = unit_list[pos]
        for child in cur_unit.links_to:
            if id(child) not in unit_id_set:
                unit_list.append(child)
                unit_id_set.add(id(child))
        pos += 1
    return unit_list
You can use
if any(test_object is x for x in all_objects): ...
If you need to do this test often, however, you may be able to keep a set of all object ids instead:
all_ids = set(map(id, all_objects))
then you can check faster with
if id(test_object) in all_ids: ...
Another common solution that may apply is to record in the object itself, in a dedicated field, whether it has already been processed:
# add the object to the list
all_objects.append(x)
x.added = True
...
# Check if already added
if test_object.added: ...
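A sketch of the id-set idea kept in sync with the list. Registry is a hypothetical name. One caveat I'm fairly sure about: an id is only guaranteed unique while the object is alive, so the sketch relies on the list holding a reference to every registered object.

```python
class Registry:
    """Hypothetical container: a list plus a set of ids for fast identity checks."""

    def __init__(self):
        self.objects = []
        self._ids = set()

    def add(self, obj):
        if id(obj) not in self._ids:
            # The list keeps obj alive, so its id cannot be reused meanwhile
            self.objects.append(obj)
            self._ids.add(id(obj))

    def __contains__(self, obj):
        return id(obj) in self._ids

a = [1, 2]
b = [1, 2]        # equal to a, but a different instance

registry = Registry()
registry.add(a)
registry.add(a)   # second add is ignored

print(a in registry)          # True
print(b in registry)          # False, identity rather than equality
print(len(registry.objects))  # 1
```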
I think you're looking for the in operator. The equivalent function would be:
def is_one_of(test_object, all_objects):
return test_object in all_objects
(but you really wouldn't want to write that as a function).
Edit: I'm wrong. According to the Expressions page:
For the list and tuple types, x in y is true if and only if there exists an index i such that x == y[i] is true.
That would work if your class doesn't define __eq__, but that's more fragile than I'd want to rely on. For example:
class ObjWithEq(object):
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val == other.val

a = ObjWithEq(1)
b = ObjWithEq(1)
assert a == b
assert a in [b]

class ObjWithoutEq(object):
    def __init__(self, val):
        self.val = val

a = ObjWithoutEq(1)
b = ObjWithoutEq(1)
assert a != b
assert a not in [b]
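Side by side, the difference between the in operator and an identity test looks like this. This is a sketch with a hypothetical Point class that defines __eq__; the any(... is ...) form is the one suggested in the other answers:

```python
class Point:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

    def __hash__(self):
        return hash(self.x)

def is_one_of(test_object, all_objects):
    # Identity test: True only for the very same instance
    return any(test_object is elm for elm in all_objects)

p = Point(1)
q = Point(1)      # equal to p, but a separate instance
all_objects = [p]

print(q in all_objects)            # True, `in` accepts an equal object
print(is_one_of(q, all_objects))   # False, q itself was never added
print(is_one_of(p, all_objects))   # True
```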