(In relation to this question I posed a few days ago)
I have a dictionary whose keys are strings, and whose values are sets of integers, for example:
db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}
I would like to have a procedure that joins the keys whose values satisfy a certain generic condition given in an external function. The new item will therefore have as its key the union of both keys (the order is not important). The value will be determined by the condition itself.
For example: given this condition function:
def condition(kv1: tuple, kv2: tuple):
    key1, val1 = kv1
    key2, val2 = kv2
    union = val1 | val2  # just needed for the following line
    maxDif = max(union) - min(union)
    newVal = set()
    for i in range(maxDif):
        auxVal1 = {pos - i for pos in val2}
        auxVal2 = {pos + i for pos in val2}
        intersection1 = val1.intersection(auxVal1)
        intersection2 = val1.intersection(auxVal2)
        print(intersection1, intersection2)
        if (len(intersection1) >= 3):
            newVal.update(intersection1)
        if (len(intersection2) >= 3):
            newVal.update({pos - i for pos in intersection2})
    if len(newVal)==0:
        return False
    else:
        newKey = "".join(sorted(key1+key2))
        return newKey, newVal
That is, the satisfying pair of items have at least 3 numbers in their values at the same distance (difference) between them. As said, if satisfied, the resulting key is the union of the two keys. And for this particular example, the value is the (minimum) matching numbers in the original value sets.
How can I smartly apply a function like this to a dictionary like db? Given the aforementioned dictionary, the expected result would be:
result = {"ab":{1,2,3}, "cde":{0,3,2}, "d":{18}}
Your "condition" in this case is more than just a mere condition. It is actually a merging rule that identifies values to keep and values to drop. This may or may not allow a generalized approach, depending on how the patterns and merge rules vary.
Given this, each merge operation could leave values in the original keys that may be merged with some of the remaining keys. Multiple merges can also occur (e.g. key "cde"). In theory the merging process would need to cover a power set of all keys which may be impractical. Alternatively, this can be performed by successive refinements using pairings of (original and/or merged) keys.
The merge condition/function:
db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}
from itertools import product
from collections import Counter
# Apply condition and return a keep-set and a remove-set
# the keep-set will be empty if the matching condition is not met
def merge(A,B,inverted=False):
    minMatch  = 3
    distances = Counter(b-a for a,b in product(A,B) if b>=a)
    delta     = [d for d,count in distances.items() if count>=minMatch]
    keep      = {a for a in A if any(a+d in B for d in delta)}
    remove    = {b for b in B if any(b-d in A for d in delta)}
    if len(keep)>=minMatch: return keep,remove
    return None,None
print( merge(db["a"],db["b"]) ) # ({1, 2, 3}, {5, 6, 7})
print( merge(db["e"],db["d"]) ) # ({0, 2, 3}, {8, 10, 11})
Merge Process:
# combine dictionary keys using a merging function/condition
def combine(D,mergeFunction):
    result  = { k:set(v) for k,v in D.items() }  # start with copy of input
    merging = True
    while merging:    # keep merging until no more merges are performed
        merging = False
        for a,b in product(*2*[list(result.keys())]): # all key pairs
            if a==b: continue
            if a not in result or b not in result: continue # keys still there?
            m,n = mergeFunction(result[a],result[b])        # call merge function
            if not m : continue                             # if merged ...
            mergedKey = "".join(sorted(set(a+b)))             # combine keys
            result[mergedKey] = m                             # add merged set
            if mergedKey != a: result[a] -= m; merging = True # clean/clear
            if not result[a]: del result[a]                   # original sets,
            if mergedKey != b: result[b] -= n; merging = True # do more merges
            if not result[b]: del result[b]
    return result
I want to write code that sorts arr according to the order given in customAl, without using the sorted function.
customAl = [dshbanfmg]
arr = [bba,abb,baa,mggfba,mffgh......]
Pseudocode:
def sortCA(arr, customAl):
    dt = {}
    generate dt order based on customAl
    look up and sort arr
    return result
newArr = [bba,baa,abb,mffgh,mggfba......]
I know there's a similar question, but the answer is wrapped in the sorted function, which I don't wish to use. Does anyone have a better solution than sorted, or than a dictionary, which takes extra space?
Sorting string values according to a custom alphabet in Python
In my opinion, programming is a trade-off, it depends on which part you care most.
Specifically, in this scenario, you can choose to trade time for space by str.index, or you can trade space for time with an extra index dict:
customAl = 'dshbanfmg'
arr = ['bba', 'abb', 'baa', 'mggfba', 'mffgh']
# trade time for space
# no extra space, but O(n) to index
def sortCA1(arr, customAl):
    return sorted(arr, key=lambda x: [customAl.index(c) for c in x])

# trade space for time
# extra space O(n), but O(1) to index
def sortCA2(arr, customAl):
    dt = {c: i for i, c in enumerate(customAl)}
    return sorted(arr, key=lambda x: [dt[c] for c in x])
# output: ['bba', 'baa', 'abb', 'mffgh', 'mggfba']
Here is a version that does not use the sorted function. It uses buckets based on the custom alphabet order: split arr by first character, and if a bucket holds multiple elements, split it recursively by the next character, a kind of radix sort.
One thing to mention: the strings differ in length, so we add an extra bucket for strings that have no character at the current index.
def sortCA3(arr, customAl):
    dt = {c: i + 1 for i, c in enumerate(customAl)}  # keep 0 for none bucket

    def bucket_sort(arr, start):
        new_arr = []
        buckets = [[] for _ in range(len(customAl) + 1)]
        for s in arr:
            if start < len(s):
                buckets[dt[s[start]]].append(s)
            else:
                buckets[0].append(s)
        for bucket in buckets:
            if len(bucket) == 1:
                new_arr += bucket
            elif len(bucket) > 1:
                new_arr += bucket_sort(bucket, start+1)
        return new_arr

    return bucket_sort(arr, 0)
test and output
customAl = 'dshbanfmg'
arr = ['bba', 'bb', 'abb', 'baa', 'mggfba', 'mffgh'] # add `bb` for test
print(sortCA3(arr, customAl))
# output: ['bb', 'bba', 'baa', 'abb', 'mffgh', 'mggfba']
I am new to Python and trying to build a graph from my data. I have a nested list, and I want to separate it into 2 groups based on the relations so that each output graph is specific to one group. I am able to get one complete graph, but I want to simplify it into 2 graphs using Python, since the real data has thousands of objects.
RelationList=[["A","B"],["B","C"],["B","D"],["D","E"],["X","Y"],["Y","Z"],["Z","U"]]
Output :
Graph 1 :
A->B
B->C
B->D
D->E
Graph 2 :
X->Y
Y->Z
Z->U
Please guide me to code.
You can use array indexing to get certain elements. Array indexing works as follows: array[start:end:step]
To get the first node, you use array[0], the last node is array[-1], and to get consecutive groups you can use a range such as array[0:3]
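As a small illustration of the indexing forms just described, here is a sketch on a made-up list (the `nodes` list and the variable names are assumptions, not part of the question's data):

```python
# Hypothetical sample list, chosen only to illustrate indexing and slicing.
nodes = ["A", "B", "C", "D", "E", "X", "Y", "Z", "U"]

first = nodes[0]          # first element
last = nodes[-1]          # last element
head = nodes[0:3]         # indices 0, 1 and 2 (the end index is exclusive)
every_other = nodes[::2]  # whole list, taking every second element

print(first, last)   # A U
print(head)          # ['A', 'B', 'C']
print(every_other)   # ['A', 'C', 'E', 'Y', 'U']
```

Note that slicing only helps when the members of each group sit next to each other in the list and you already know where each group starts and ends.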
You could do something like that:
First of all create a list with all possible nodes for graph1
graph1_nodes = ["A", "B", "C", "D", "E"]
Once you have that list, iterate over RelationList and compare each pair against it to build a sublist for each of the graphs you're asking for. To do that, create empty lists that will be filled with the data of each graph:
Graph1 = []
Graph2 = []
for node in RelationList:
    # node is one [source, target] pair of the list
    if node[0] in graph1_nodes:
        # node[0] takes the first position of the pair
        Graph1.append(node)
    else:
        Graph2.append(node)
And finally the output will be this:
Graph1 = [["A","B"],["B","C"],["B","D"],["D","E"]]
Graph2 = [["X","Y"],["Y","Z"],["Z","U"]]
Hope I've helped!
You can use the Union Find, or Disjoint Set data structure and algorithm for this. This will work for any number of nodes and subgraphs and it does not matter whether subgraphs are stored in closed regions of the list. This is so useful I always have a simple implementation at hand:
from collections import defaultdict
class UnionFind:
    def __init__(self):
        self.leaders = defaultdict(lambda: None)

    def find(self, x):
        l = self.leaders[x]
        if l is not None:
            l = self.find(l)
            self.leaders[x] = l
            return l
        return x

    def union(self, x, y):
        lx, ly = self.find(x), self.find(y)
        if lx != ly:
            self.leaders[lx] = ly

    def get_groups(self):
        groups = defaultdict(set)
        for x in self.leaders:
            groups[self.find(x)].add(x)
        return list(groups.values())
Application to your RelationList:
RelationList=[["A","B"],["B","C"],["B","D"],["D","E"],["X","Y"],["Y","Z"],["Z","U"]]
uf = UnionFind()
for a, b in RelationList:
uf.union(a, b)
print(uf.get_groups()) # [{'C', 'A', 'D', 'B', 'E'}, {'U', 'Z', 'Y', 'X'}]
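The groups above are sets of nodes, while the question asks for the edges of each graph. A minimal sketch of that last step, assuming a compact dictionary-based union-find rather than the class above (the function name `split_edges_by_component` is made up for illustration):

```python
from collections import defaultdict

def split_edges_by_component(edges):
    """Group edges so that each group forms one connected subgraph."""
    parent = {}

    def find(x):
        # Follow parent links up to the representative of x's component.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    # Union the two endpoints of every edge.
    for a, b in edges:
        parent[find(a)] = find(b)

    # Bucket each edge under the representative of its component.
    groups = defaultdict(list)
    for a, b in edges:
        groups[find(a)].append([a, b])
    return list(groups.values())

relation_list = [["A","B"],["B","C"],["B","D"],["D","E"],["X","Y"],["Y","Z"],["Z","U"]]
graphs = split_edges_by_component(relation_list)
for n, graph in enumerate(graphs, start=1):
    print(f"Graph {n}:")
    for a, b in graph:
        print(f"{a}->{b}")
```

The edges inside each group keep the order they had in the input list, so the printout matches the layout requested in the question.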
Given pairs of items of form [(a,b),...] where (a,b) means a > b, for example:
[('best','better'),('best','good'),('better','good')]
I would like to output a list of form:
['best','better','good']
This is very hard for some reason. Any thoughts?
======================== code =============================
I know why it doesn't work.
def to_rank(raw):
    rank = []
    for u,v in raw:
        if u in rank and v in rank:
            pass
        elif u not in rank and v not in rank:
            rank = insert_front (u,v,rank)
            rank = insert_behind(v,u,rank)
        elif u in rank and v not in rank:
            rank = insert_behind(v,u,rank)
        elif u not in rank and v in rank:
            rank = insert_front(u,v,rank)
    return [[r] for r in rank]

# Use: insert word u in front of word v in list of words
def insert_front(u,v,words):
    if words == []: return [u]
    else:
        head = words[0]
        tail = words[1:]
        if head == v: return [u] + words
        else        : return ([head] + insert_front(u,v,tail))

# Use: insert word u behind word v in list of words
def insert_behind(u,v,words):
    words.reverse()
    words = insert_front(u,v,words)
    words.reverse()
    return words
=================== Update ===================
Per suggestion of many, this is a straight forward topological sort setting, I ultimately decided to use the code from this source: algocoding.wordpress.com/2015/04/05/topological-sorting-python/
which solved my problem.
from collections import deque

def go_topsort(graph):
    in_degree = { u : 0 for u in graph }     # determine in-degree
    for u in graph:                          # of each node
        for v in graph[u]:
            in_degree[v] += 1
    Q = deque()                 # collect nodes with zero in-degree
    for u in in_degree:
        if in_degree[u] == 0:
            Q.appendleft(u)
    L = []     # list for order of nodes
    while Q:
        u = Q.pop()          # choose node of zero in-degree
        L.append(u)          # and 'remove' it from graph
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                Q.appendleft(v)
    if len(L) == len(graph):
        return L
    else:                    # if there is a cycle,
        return []
RockBilly's solution also works in my case, because in my setting, for every v < u, we are guaranteed to have a pair (u,v) in our list. So his answer is not very "computer-sciency", but it gets the job done in this case.
If you have a complete grammar specified then you can simply count up the items:
>>> import itertools as it
>>> from collections import Counter
>>> ranks = [('best','better'),('best','good'),('better','good')]
>>> c = Counter(x for x, y in ranks)
>>> sorted(set(it.chain(*ranks)), key=c.__getitem__, reverse=True)
['best', 'better', 'good']
If you have an incomplete grammar then you can build a graph and dfs all paths to find the longest. This isn't very efficient, as I haven't thought about that yet :):
def dfs(graph, start, end):
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for next_state in graph.get(path[-1], []):
            if next_state in path:
                continue
            stack.append(path+[next_state])

def paths(ranks):
    graph = {}
    for n, m in ranks:
        graph.setdefault(n,[]).append(m)
    for start, end in it.product(set(it.chain(*ranks)), repeat=2):
        yield from dfs(graph, start, end)
>>> ranks = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
>>> max(paths(ranks), key=len)
['black', 'dim', 'dark', 'gloomy']
>>> ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
>>> max(paths(ranks), key=len)
['d', 'b', 'a', 'c']
What you're looking for is topological sort. You can do this in linear time using depth-first search (pseudocode included in the wiki I linked)
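For reference, a minimal sketch of a depth-first topological sort applied to the pairs from the question. The function name `topological_order` and its structure are illustrative assumptions, not the pseudocode from the linked page:

```python
def topological_order(pairs):
    """Return items ordered so that u comes before v for every pair (u, v)."""
    # Build an adjacency list: u -> items known to rank below u.
    graph = {}
    for u, v in pairs:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])

    visited = set()   # nodes whose descendants are fully explored
    on_path = set()   # nodes on the current DFS path, used to detect cycles
    postorder = []

    def visit(node):
        if node in on_path:
            raise ValueError("cycle detected; no consistent ranking exists")
        if node in visited:
            return
        on_path.add(node)
        for successor in graph[node]:
            visit(successor)
        on_path.remove(node)
        visited.add(node)
        postorder.append(node)  # appended only after everything below it

    for node in graph:
        visit(node)
    return postorder[::-1]      # reverse postorder is a topological order

print(topological_order([('best', 'better'), ('best', 'good'), ('better', 'good')]))
# ['best', 'better', 'good']
```

Each node and edge is visited once, so the running time is linear in the number of items plus pairs. The sketch is recursive, so a very long chain of items could hit Python's recursion limit; an explicit stack would avoid that.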
Here is one way. It is based on using the complete pairwise rankings to make an old-style (early Python 2) cmp function and then using functools.cmp_to_key to convert it to a key suitable for the Python 3 approach to sorting:
import functools
def sortByRankings(rankings):
    def cmp(x,y):
        if x == y:
            return 0
        elif (x,y) in rankings:
            return -1
        else:
            return 1

    items = list({x for y in rankings for x in y})
    items.sort(key = functools.cmp_to_key(cmp))
    return items
Tested like:
ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
print(sortByRankings(ranks)) #prints ['d', 'b', 'a', 'c']
Note that to work correctly, the parameter rankings must contain an entry for each pair of distinct items. If it doesn't, you would first need to compute the transitive closure of the pairs that you do have before you feed it to this function.
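A minimal sketch of that preprocessing step, assuming the pairs are hashable tuples (the name `transitive_closure` is made up here). It keeps adding implied pairs until nothing new appears:

```python
def transitive_closure(pairs):
    """Return the set of all pairs implied by chaining the given pairs."""
    closure = set(pairs)
    while True:
        # (a, b) and (b, d) together imply (a, d).
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new_pairs <= closure:   # nothing new: the closure is complete
            return closure
        closure |= new_pairs

sparse = [('best', 'better'), ('better', 'good')]
print(sorted(transitive_closure(sparse)))
# [('best', 'better'), ('best', 'good'), ('better', 'good')]
```

The resulting set can be passed as the rankings argument, since the comparison only tests membership with `in`. This simple fixed-point loop is quadratic per pass, which should be fine for small inputs but is not tuned for large ones.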
You can take advantage of the fact that the lowest ranked item in the list will never appear at the start of any tuple. You can extract this lowest item, then remove all elements which contain this lowest item from your list, and repeat to get the next lowest.
This should work even if you have redundant elements, or have a sparser list than some of the examples here. I've broken it up into finding the lowest ranked item, and then the grunt work of using this to create a final ranking.
from copy import copy
def find_lowest_item(s):
    #Iterate over set of all items
    for item in set([item for sublist in s for item in sublist]):
        #If an item does not appear at the start of any tuple, return it
        if item not in [x[0] for x in s]:
            return item

def sort_by_comparison(s):
    final_list = []
    #Make a copy so we don't mutate original list
    new_s = copy(s)
    #Get the set of all items
    item_set = set([item for sublist in s for item in sublist])
    for i in range(len(item_set)):
        lowest = find_lowest_item(new_s)
        if lowest is not None:
            final_list.insert(0, lowest)
        #For the highest ranked item, we just compare our current
        #ranked list with the full set of items
        else:
            final_list.insert(0,set(item_set).difference(set(final_list)).pop())
        #Update list of ranking tuples to remove processed items
        new_s = [x for x in new_s if lowest not in x]
    return final_list
list_to_compare = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
sort_by_comparison(list_to_compare)
['black', 'dim', 'dark', 'gloomy']
list2 = [('best','better'),('best','good'),('better','good')]
sort_by_comparison(list2)
['best', 'better', 'good']
list3 = [('best','better'),('better','good')]
sort_by_comparison(list3)
['best', 'better', 'good']
If you sort the list items or create a dictionary from them, you are going to lose the order, as @Rockybilly mentioned in his answer. I suggest you create a flat list from the tuples of the original list and then remove duplicates.
def remove_duplicates(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
i = [(5,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
i = remove_duplicates(list(x for s in i for x in s))
print(i) # prints [5, 2, 1, 3, 4]
j = [('excellent','good'),('excellent','great'),('great','good')]
j = remove_duplicates(list(x for s in j for x in s))
print(j) # prints ['excellent', 'good', 'great']
See reference: How do you remove duplicates from a list in whilst preserving order?
For explanation on the remove_duplicates() function, see this stackoverflow post.
If the list is complete, meaning it has enough information to do the ranking (and no duplicate or redundant inputs), this will work.
from collections import defaultdict
lst = [('best','better'),('best','good'),('better','good')]
d = defaultdict(int)
for tup in lst:
    d[tup[0]] += 1
    d[tup[1]] += 0 # To create it in defaultdict
print(sorted(d, key = lambda x: d[x], reverse=True))
# ['best', 'better', 'good']
Just give them points, increment the left one each time you encounter it in the list.
Edit: I do think the OP has a determined type of input. Always have tuple count of combination nCr(n, 2). Which makes this a correct solution. No need to complain about the edge cases, which I already knew posting the answer(and mentioned it).
Okay, so this is a little hard to explain, but here goes:
I have a dictionary, which I'm adding content to. The content is a hashed username (key) with an IP address (value).
I was putting the hashes into order by running them against base 16, and then using collections.OrderedDict.
So, the dictionary looked a little like this:
d = {'1234': '8.8.8.8', '2345':'0.0.0.0', '3213':'4.4.4.4', '4523':'1.1.1.1', '7654':'1.3.3.7', '9999':'127.0.0.1'}
What I needed was a mechanism that would allow me to pick one of those keys, and get the key/value item one higher and one lower. So, for example, If I were to pick 2345, the code would return the key:value combinations '1234:8.8.8.8' and '3213:4.4.4.4'
So, something like:
for i in d:
    while i < len(d)
        if i == '2345':
            print i.nextItem
            print i.previousItem
            break()
Edit: OP now states that they are using OrderedDicts but the use case still requires this sort of approach.
Since dicts are not ordered you cannot directly do this. From your example, you are trying to reference the item like you would use a linked list.
A quick solution would be instead to extract the keys and sort them then iterate over that list:
keyList = sorted(d.keys())
for i, v in enumerate(keyList):
    if v == 'eeee':
        print(d[keyList[i+1]])
        print(d[keyList[i-1]])
The keyList holds the order of your items and you have to go back to it to find out what the next/previous key is to get the next/previous value. You also have to check for i+1 being greater than the list length and i-1 being less than 0.
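A minimal sketch of the same idea with the boundary checks included, using the dictionary from the question (the function name `neighbours` is an assumption made for illustration):

```python
def neighbours(d, key):
    """Return the (key, value) pairs just before and after `key` in sorted key order.

    Either side is None when `key` is the first or last key.
    """
    keys = sorted(d)
    i = keys.index(key)  # raises ValueError if key is not in the dict
    prev_item = (keys[i - 1], d[keys[i - 1]]) if i > 0 else None
    next_item = (keys[i + 1], d[keys[i + 1]]) if i + 1 < len(keys) else None
    return prev_item, next_item

d = {'1234': '8.8.8.8', '2345': '0.0.0.0', '3213': '4.4.4.4',
     '4523': '1.1.1.1', '7654': '1.3.3.7', '9999': '127.0.0.1'}
print(neighbours(d, '2345'))  # (('1234', '8.8.8.8'), ('3213', '4.4.4.4'))
print(neighbours(d, '1234'))  # (None, ('2345', '0.0.0.0'))
```

Without the `i > 0` check, `keys[i - 1]` would silently wrap around to the last key for the first element, which is easy to miss.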
You can use an OrderedDict similarly but I believe that you still have to do the above with a separate list as OrderedDict doesn't have next/prev methods.
As seen in the OrderedDict source code,
if you have a key and you want to find the next and prev in O(1) here's how you do that.
>>> from collections import OrderedDict
>>> d = OrderedDict([('aaaa', 'a',), ('bbbb', 'b'), ('cccc', 'c'), ('dddd', 'd'), ('eeee', 'e'), ('ffff', 'f')])
>>> i = 'eeee'
>>> link_prev, link_next, key = d._OrderedDict__map['eeee']
>>> print 'nextKey: ', link_next[2], 'prevKey: ', link_prev[2]
nextKey: ffff prevKey: dddd
This will give you next and prev by insertion order. If you add items in random order then just keep track of your items in sorted order.
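One way to keep track of the keys in sorted order, regardless of insertion order, is a sorted key list maintained with the standard `bisect` module. The sketch below is an assumption about how that could look; the class `SortedKeyDict` and its `neighbours` method are hypothetical names, not part of `collections`:

```python
import bisect

class SortedKeyDict:
    """Hypothetical helper: a plain dict plus a list of its keys kept sorted."""

    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # insert while keeping the list sorted
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def neighbours(self, key):
        """Return (previous_key, next_key) in sorted order; None at either end."""
        i = bisect.bisect_left(self._keys, key)  # binary search, O(log n)
        if i == len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        prev_key = self._keys[i - 1] if i > 0 else None
        next_key = self._keys[i + 1] if i + 1 < len(self._keys) else None
        return prev_key, next_key

sd = SortedKeyDict()
for k, v in [('eeee', 'e'), ('aaaa', 'a'), ('ffff', 'f'), ('dddd', 'd')]:
    sd[k] = v  # inserted in arbitrary order
print(sd.neighbours('eeee'))  # ('dddd', 'ffff')
```

Lookups here are O(log n) rather than O(1), and each insertion shifts list elements, so this trades some speed for not depending on OrderedDict internals.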
You could also use the list.index() method.
This function is more generic (you can check positions +n and -n), it will catch attempts at searching for a key that's not in the dict, and it will also return None if there's nothing before or after the key:
def keyshift(dictionary, key, diff):
    if key in dictionary:
        token = object()
        # pad the sorted keys so that stepping past either end lands on the token
        keys = [token]*(diff*-1) + sorted(dictionary) + [token]*diff
        newkey = keys[keys.index(key)+diff]
        if newkey is token:
            print(None)
        else:
            print({newkey: dictionary[newkey]})
    else:
        print('Key not found')

keyshift(d, 'bbbb', -1)
keyshift(d, 'eeee', +1)
Try:
pos = 0
d = {'aaaa': 'a', 'bbbb':'b', 'cccc':'c', 'dddd':'d', 'eeee':'e', 'ffff':'f'}
for i in d:
    if i == 'eeee':
        listForm = list(d.values())
        print(listForm[pos-1])  # previous value
        print(listForm[pos+1])  # next value
    pos += 1  # advance only after the check, so pos is the current index
As in @AdamKerz's answer, enumerate seems Pythonic, but if you are a beginner this code might help you understand it in an easy way.
And I think it's faster and smaller compared to sorting, followed by building a list and then enumerating.
You could use a generic function, based on iterators, to get a moving window (taken from this question):
import itertools
def window(iterable, n=3):
    it = iter(iterable)
    result = tuple(itertools.islice(it, n))
    if len(result) == n:
        yield result
    for element in it:
        result = result[1:] + (element,)
        yield result

l = range(8)
for i in window(l, 3):
    print(i)
Using the above function with OrderedDict.items() will give you three (key, value) pairs, in order:
d = collections.OrderedDict(...)
for p_item, item, n_item in window(d.items()):
    p_key, p_value = p_item
    key, value = item
    # Or, if you don't care about the next value:
    n_key, _ = n_item
Of course using this function the first and last values will never be in the middle position (although this should not be difficult to do with some adaptation).
I think the biggest advantage is that it does not require table lookups in the previous and next keys, and also that it is generic and works with any iterable.
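One possible adaptation so that the first and last elements also appear in the middle position is to pad both ends with a fill value. This is a sketch under that assumption; the function name `neighbours_window` is made up and it is not taken from the linked question:

```python
import itertools

def neighbours_window(iterable, fill=None):
    """Return an iterator of (previous, current, next) for every element.

    The ends are padded with `fill`, so the first element has previous == fill
    and the last element has next == fill.
    """
    padded = itertools.chain([fill], iterable, [fill])
    prev_it, cur_it, next_it = itertools.tee(padded, 3)
    next(cur_it, None)    # advance "current" by one position
    next(next_it, None)   # advance "next" by two positions
    next(next_it, None)
    return zip(prev_it, cur_it, next_it)

for triple in neighbours_window(['a', 'b', 'c']):
    print(triple)
# (None, 'a', 'b')
# ('a', 'b', 'c')
# ('b', 'c', None)
```

When used on `d.items()`, the previous and next entries are (key, value) tuples, so an `is None` check is enough to detect the two ends.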
Maybe it is overkill, but you can keep track of the inserted keys with a helper class and use that list to retrieve the previous or next key. Just don't forget to check the border conditions, that is, whether the object is already the first or last element. This way, you will not need to re-sort the ordered list or search for the element every time.
from collections import OrderedDict
class Helper(object):
    """Helper Class for Keeping track of Insert Order"""
    def __init__(self, arg):
        super(Helper, self).__init__()

    # class-level containers shared by the static methods below
    dictContainer = dict()
    ordering = list()

    @staticmethod
    def addItem(dictItem):
        for key, value in dictItem.items():
            print(key, value)
            Helper.ordering.append(key)
            Helper.dictContainer[key] = value

    @staticmethod
    def getPrevious(key):
        index = (Helper.ordering.index(key)-1)
        return Helper.dictContainer[Helper.ordering[index]]

#Your unordered dictionary
d = {'aaaa': 'a', 'bbbb':'b', 'cccc':'c', 'dddd':'d', 'eeee':'e', 'ffff':'f'}
#Create Order over keys
ordered = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
#Push your ordered list to your Helper class
Helper.addItem(ordered)
#Get Previous of
print(Helper.getPrevious('eeee'))
# prints: d
You can store the keys and values in temp variable in prior, and can access previous and next key,value pair using index.
It is pretty dynamic, will work for any key you query. Please check this code :
d = {'1234': '8.8.8.8', '2345':'0.0.0.0', '3213':'4.4.4.4', '4523':'1.1.1.1', '7654':'1.3.3.7', '9999':'127.0.0.1'}
ch = input('Please enter your choice: ')
keys = list(d.keys())
values = list(d.values())
#print(keys, values)
for k, v in d.items():
    if k == ch:
        ind = keys.index(k)
        print(keys[ind-1], ':', values[ind-1])
        print(keys[ind+1], ':', values[ind+1])
I think this is a nice Pythonic way of resolving your problem using a lambda and list comprehension, although it may not be optimal in execution time:
import collections
x = collections.OrderedDict([('a','v1'),('b','v2'),('c','v3'),('d','v4')])
previousItem = lambda currentKey, thisOrderedDict : [
list( thisOrderedDict.items() )[ z - 1 ] if (z != 0) else None
for z in range( len( thisOrderedDict.items() ) )
if (list( thisOrderedDict.keys() )[ z ] == currentKey) ][ 0 ]
nextItem = lambda currentKey, thisOrderedDict : [
list( thisOrderedDict.items() )[ z + 1 ] if (z != (len( thisOrderedDict.items() ) - 1)) else None
for z in range( len( thisOrderedDict.items() ) )
if (list( thisOrderedDict.keys() )[ z ] == currentKey) ][ 0 ]
assert previousItem('c', x) == ('b', 'v2')
assert nextItem('c', x) == ('d', 'v4')
assert previousItem('a', x) is None
assert nextItem('d',x) is None
Another way that seems simple and straight forward: this function returns the key which is offset positions away from k
def get_shifted_key(d:dict, k:str, offset:int) -> str:
    l = list(d.keys())
    if k in l:
        i = l.index(k) + offset
        if 0 <= i < len(l):
            return l[i]
    return None
I know how to get the next key:value of a particular key in a dictionary:
flag = 0
for k, v in dic.items():
    if flag == 0:
        code...
        flag += 1
        continue
    code...{next key and value in for}
if correct :
d = { "a": 1, "b":2, "c":3 }
l = list( d.keys() ) # make a list of the keys
k = "b" # the actual key
i = l.index( k ) # get index of the actual key
for the next :
i = i+1 if i+1 < len( l ) else 0 # select next index or restart 0
n = l [ i ]
d [ n ]
for the previous :
i = i-1 if i-1 >= 0 else len( l ) -1 # select previous index or go end
p = l [ i ]
d [ p ]