K-way merge without heapq or any other libraries in Python

Can anyone tell me what's wrong with my code? The input is n iterators. I need to make a generator which yields the values of the merged list on the fly. I don't want to use heapq, queue or deque.
#!/usr/bin/python
def min_(ar):
    l = list()
    for val, array in ar:
        l.append(val)
    return l.index(min(l))
def k_way_merge(*args):
    data = list()
    for array in args:
        data.append((array.next(), array))
    while data:
        index = min_(data)
        key, value = data[index]
        data.remove(data[index])
        yield key
        if value:
            data.append((next(value), value))
l=[[1,3], [2,4], [10,100],[100,101]]
res = k_way_merge(iter(l[0]), iter(l[1]),iter(l[2]),iter(l[3]))
for i in res:
    print i
The result is:
1
2
3
It seems that next(value) raises StopIteration, but how do I fix all that? Help!

Iterators don't evaluate to False when they are empty. You have to use the next builtin with a sentinel value:
END = object()
while data:
    index = min_(data)
    key, value = data.pop(index)
    yield key
    key = next(value, END)
    if key is not END:
        data.append((key, value))
Also, since min_ returns the index, there's no need for data.remove(data[index]); just use pop.
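Putting the sentinel and pop ideas together, a minimal Python 3 sketch of the whole generator might look like this. It still avoids heapq, queue and deque; the minimum is found with min over indices instead of the question's min_ helper, so each step is a linear scan costing O(k) for k inputs. Inputs that are empty from the start are simply skipped, which the question's setup loop does not handle.

```python
def k_way_merge(*iterables):
    """Lazily merge already-sorted iterables without heapq (a sketch)."""
    END = object()  # sentinel that cannot collide with any real value
    heads = []      # list of [current_value, iterator] pairs
    for iterable in iterables:
        it = iter(iterable)
        value = next(it, END)
        if value is not END:  # skip inputs that are empty from the start
            heads.append([value, it])
    while heads:
        # index of the smallest current head (the first one wins on ties)
        i = min(range(len(heads)), key=lambda j: heads[j][0])
        value, it = heads[i]
        yield value
        following = next(it, END)
        if following is END:
            heads.pop(i)              # this input is exhausted
        else:
            heads[i][0] = following   # advance this input in place


print(list(k_way_merge([1, 3], [2, 4], [10, 100], [100, 101])))
# [1, 2, 3, 4, 10, 100, 100, 101]
```

Unlike the question's version, it accepts any iterables (it calls iter itself), so the lists can be passed directly.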

You need to make this operation optional once value is exhausted:
if value:
    data.append((next(value), value))
This change works for me:
if value:  # always true for an iterator; the try/except does the real work
    try:
        data.append((next(value), value))
    except StopIteration:
        pass

Related

Return next key of a given dictionary key, python 3.6+

I am trying to find a way to get the next key of a Python 3.6+ dictionary (which is ordered).
For example:
dict = {'one':'value 1','two':'value 2','three':'value 3'}
What I am trying to achieve is a function that returns the next key, something like:
next_key(dict, current_key='two') # -> should return 'three'
This is what I have so far:
def next_key(dict,key):
    key_iter = iter(dict) # create iterator with keys
    while k := next(key_iter): #(not sure if this is a valid way to iterate over an iterator)
        if k == key:
            #key found! return next key
            try: #added this to handle when key is the last key of the list
                return(next(key_iter))
            except:
                return False
    return False
Well, that is the basic idea. I think I am close, but this code gives a StopIteration error. Please help.
Thank you!

An iterator way...
def next_key(dict, key):
    keys = iter(dict)
    key in keys  # consumes keys up to and including key (or all of them)
    return next(keys, False)  # whatever follows key, else False
Demo:
>>> next_key(dict, 'two')
'three'
>>> next_key(dict, 'three')
False
>>> next_key(dict, 'four')
False
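The trick here is that the in operator on an iterator consumes it: it pulls items until it finds a match (or runs out), so whatever is left afterwards is exactly what follows the key. A small demonstration:

```python
d = {'one': 'value 1', 'two': 'value 2', 'three': 'value 3'}

keys = iter(d)
found = 'two' in keys   # pulls 'one' and 'two' out of the iterator
rest = list(keys)       # only the keys after the match remain
print(found, rest)      # True ['three']

keys = iter(d)
missing = 'four' in keys       # no match: the whole iterator is consumed
print(missing, list(keys))     # False []
```

That is why next(keys, False) returns False both for the last key and for a key that isn't in the dict.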

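Looping with while k := next(key_iter) doesn't stop correctly: next raises StopIteration when the iterator is exhausted instead of returning a falsy value, and a falsy key such as 0 or '' would end the loop early. Iterating manually with iter is done either by catching StopIteration: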
iterator = iter(some_iterable)
while True:
    try:
        value = next(iterator)
    except StopIteration:
        # no more items
        break
    # … use value
or by passing a default value to next and letting it catch StopIteration for you, then checking for that default value (but you need to pick a default value that won’t appear in your iterable!):
iterator = iter(some_iterable)
while (value := next(iterator, None)) is not None:
    ...  # use value
# no more items
but iterators are, themselves, iterable, so you can skip all that and use a plain ol’ for loop:
iterator = iter(some_iterable)
for value in iterator:
    ...  # use value
# no more items
which translates into your example as:
def next_key(d, key):
    key_iter = iter(d)
    for k in key_iter:
        if k == key:
            return next(key_iter, None)
    return None
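A quick way to see both failure modes of the question's walrus loop (Python 3.8+) is to run it on two small dicts. The helper below is a hypothetical demo, not part of any answer: it records the keys the loop visits and notes when StopIteration escapes.

```python
def keys_via_walrus(d):
    """Collect keys with the question's walrus loop (demonstration only)."""
    seen = []
    key_iter = iter(d)
    try:
        while k := next(key_iter):  # also stops on any falsy key, e.g. 0 or ''
            seen.append(k)
    except StopIteration:
        seen.append('<StopIteration>')  # raised once the keys run out
    return seen


print(keys_via_walrus({'one': 1, 'two': 2}))    # ['one', 'two', '<StopIteration>']
print(keys_via_walrus({'a': 1, 0: 2, 'b': 3}))  # ['a']  ('b' is never reached)
```

The for-loop version above has neither problem, because the for statement handles StopIteration itself and never tests the truthiness of the keys.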

You can get the keys of the dictionary as a list and use index() to get the next key. You can also handle the missing cases with a try/except block:
my_dict = {'one':'value 1','two':'value 2','three':'value 3'}
def next_key(d, key):
    dict_keys = list(d.keys())
    try:
        return dict_keys[dict_keys.index(key) + 1]
    except (ValueError, IndexError):  # key not found, or key is the last one
        print('Item index does not exist')
        return -1
nk = next_key(my_dict, key="two")
print(nk)
And you had better not use dict, list, etc. as variable names, since that shadows the built-ins.

# Python3 code to demonstrate working of
# Getting next key in dictionary Using list() + index()
# initializing dictionary
test_dict = {'one':'value 1','two':'value 2','three':'value 3'}
def get_next_key(dic, current_key):
    """Get the next key of a dictionary.
    Parameters
    ----------
    dic: dict
    current_key: string
    Return
    ------
    next_key: string, representing the next key in the dictionary,
    or
    False if the value passed in current_key cannot be found in the dictionary keys,
    or if it is the last key in the dictionary
    """
    l=list(dic) # convert the dict keys to a list
    try:
        next_key=l[l.index(current_key) + 1] # using index method to get next key
    except (ValueError, IndexError):
        return False
    return next_key
>>> get_next_key(test_dict, 'two')
'three'
>>> get_next_key(test_dict, 'three')
False
>>> get_next_key(test_dict, 'one')
'two'
>>> get_next_key(test_dict, 'NOT EXISTS')
False

TypeError: 'dict_items' object is not subscriptable on running if statement to shortlist items

I am using the function below to run the apriori algorithm and calculate support and confidence for all itemsets. The function uses a dictionary object to store all items and their corresponding support and confidence.
After running the if statement to select items with a minimum support of 0.15 and a minimum confidence of 0.6, I get the error below saying that a dict_items object is not subscriptable:
for key, value in largeSet.items()[1:]:
TypeError: 'dict_items' object is not subscriptable
from collections import defaultdict
# getItemSetTransactionList, returnItemsWithMinSupport, joinSet, subsets and
# printResults are defined elsewhere in the script (not shown here)
def runApriori(data_iter, minSupport, minConfidence):
    """
    run the apriori algorithm. data_iter is a record iterator
    Return both:
     - items (tuple, support)
     - rules ((pretuple, posttuple), confidence)
    """
    itemSet, transactionList = getItemSetTransactionList(data_iter)
    freqSet = defaultdict(int)
    largeSet = dict()
    # Global dictionary which stores (key=n-itemSets,value=support)
    # which satisfy minSupport
    assocRules = dict()
    # Dictionary which stores Association Rules
    oneCSet = returnItemsWithMinSupport(itemSet,
                                        transactionList,
                                        minSupport,
                                        freqSet)
    currentLSet = oneCSet
    k = 2
    while(currentLSet != set([])):
        largeSet[k-1] = currentLSet
        currentLSet = joinSet(currentLSet, k)
        currentCSet = returnItemsWithMinSupport(currentLSet,
                                                transactionList,
                                                minSupport,
                                                freqSet)
        currentLSet = currentCSet
        k = k + 1
    def getSupport(item):
        """local function which Returns the support of an item"""
        return float(freqSet[item])/len(transactionList)
    toRetItems = []
    for key, value in largeSet.items():
        toRetItems.extend([(tuple(item), getSupport(item))
                           for item in value])
    toRetRules = []
    for key, value in largeSet.items()[1:]:
        for item in value:
            _subsets = map(frozenset, [x for x in subsets(item)])
            for element in _subsets:
                remain = item.difference(element)
                if len(remain) > 0:
                    confidence = getSupport(item)/getSupport(element)
                    if confidence >= minConfidence:
                        toRetRules.append(((tuple(element), tuple(remain)),
                                           confidence))
    return toRetItems, toRetRules
if __name__ == "__main__":
    inFile = ''
    minSupport = 0.15
    minConfidence = 0.6
    items, rules = runApriori(inFile, minSupport, minConfidence)
    printResults(items, rules)

Prior to CPython 3.6 (and 3.7 for any Python interpreter), dicts have no reliable ordering, so assuming the first item is the one you want to skip is a bad idea.
That said, if you're on 3.6+, and you know you want to skip the first element, you can use itertools.islice to do this safely, changing:
for key, value in largeSet.items()[1:]:
to:
# At top of file
from itertools import islice
for key, value in islice(largeSet.items(), 1, None):

In Python 3, dict.items() returns a view object rather than a list (as it did in Python 2), and views don't support indexing or slicing. Beyond that, you're not supposed to rely on dictionaries having a particular order, so skipping the "first" item is fragile: what counts as "first" depends on there being a particular order. You can cast it to a list: for key, value in list(largeSet.items())[1:], but that relies on the dictionary order being what you expect. Better would be to just do for key, value in largeSet.items(), then check within the loop whether it's the item you don't want to operate on, and continue if it is. Or use a pandas Series.
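The three fixes suggested in these answers can be compared on a small, hypothetical stand-in for largeSet. In runApriori the keys are itemset sizes (largeSet[k-1] with k starting at 2), so [1:] presumably means "skip the 1-itemsets"; filtering on the key says that directly and does not depend on dict order at all.

```python
from itertools import islice

# Hypothetical stand-in for largeSet: keys are itemset sizes,
# values are sets of frozensets, as runApriori builds them.
largeSet = {
    1: {frozenset({'a'}), frozenset({'b'}), frozenset({'c'})},
    2: {frozenset({'a', 'b'}), frozenset({'b', 'c'})},
    3: {frozenset({'a', 'b', 'c'})},
}

# list() copy, then slice: works, but relies on insertion order
sizes_sliced = [size for size, _ in list(largeSet.items())[1:]]
# islice: same idea without building a list, also order-dependent
sizes_islice = [size for size, _ in islice(largeSet.items(), 1, None)]
# filter on the key itself: independent of dict ordering
sizes_filtered = [size for size, _ in largeSet.items() if size != 1]

print(sizes_sliced, sizes_islice, sizes_filtered)  # [2, 3] [2, 3] [2, 3]
```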

How to find two items of a list with the same return value of a function on their attribute?

Given a basic class Item:
class Item(object):
    def __init__(self, val):
        self.val = val
a list of objects of this class (the number of items can be much larger):
items = [ Item(0), Item(11), Item(25), Item(16), Item(31) ]
and a function compute that processes a value and returns a result.
How can I find two items of this list for which the function compute returns the same value when applied to the attribute val? If nothing is found, an exception should be raised. If more than two items match, simply return any two of them.
For example, let's define compute:
def compute( x ):
    return x % 10
The expected pair would be: (Item(11), Item(31)).

You can check the length of the set of resulting values:
class Item(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Item({self.val})'
def compute(x):
    return x%10
items = [ Item(0), Item(11), Item(25), Item(16), Item(31)]
c = list(map(lambda x:compute(x.val), items))
if len(set(c)) == len(c): #no two or more equal values exist in the list
    raise Exception("All elements have unique computational results")
To find values with equal computational results, a dictionary can be used:
from collections import Counter
new_d = {i:compute(i.val) for i in items}
d = Counter(new_d.values())
multiple = [a for a, b in new_d.items() if d[b] > 1]
Output:
[Item(11), Item(31)]
A slightly more efficient way to check whether multiple objects share the same computational value is to use all, which needs only a single pass over the Counter object, whereas using a set with len requires several iterations:
if all(b == 1 for b in d.values()):
    raise Exception("All elements have unique computational results")

Assuming the values returned by compute are hashable (e.g., float values), you can use a dict to store results.
And you don't need to do anything fancy, like a multidict storing all items that produce a result. As soon as you see a duplicate, you're done. Besides being simpler, this also means we short-circuit the search as soon as we find a match, without even calling compute on the rest of the elements.
def find_pair(items, compute):
    results = {}
    for item in items:
        result = compute(item.val)
        if result in results:
            return results[result], item
        results[result] = item
    raise ValueError('No pair of items')
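A self-contained run of find_pair on the question's example. Item and compute are copied from the question, with a __repr__ added only so the result prints readably:

```python
class Item:
    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return f'Item({self.val})'


def compute(x):
    return x % 10


def find_pair(items, compute):
    results = {}  # computed value -> first item that produced it
    for item in items:
        result = compute(item.val)
        if result in results:
            return results[result], item  # stop at the first duplicate
        results[result] = item
    raise ValueError('No pair of items')


items = [Item(0), Item(11), Item(25), Item(16), Item(31)]
first, second = find_pair(items, compute)
print(first, second)  # Item(11) Item(31)

try:
    find_pair([Item(1), Item(2)], compute)
except ValueError as exc:
    print('no match:', exc)  # no match: No pair of items
```

Note that compute is never called on anything after Item(31), which is the short-circuit described above.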

A dictionary val_to_it that contains Items keyed by computed val can be used:
val_to_it = {}
for it in items:
    computed_val = compute(it.val)
    # Check if an Item in val_to_it has the same computed val
    dict_it = val_to_it.get(computed_val)
    if dict_it is None:
        # If not, add it to val_to_it so it can be referred to
        val_to_it[computed_val] = it
    else:
        # We found the two elements!
        res = [dict_it, it]
        break
else:
    raise Exception( "Can't find two items" )
The for block can be rewritten to find n items with the same computed value (n >= 2):
val_to_it = {}
for it in items:
    computed_val = compute(it.val)
    dict_lit = val_to_it.get(computed_val)
    if dict_lit is None:
        val_to_it[computed_val] = [it]
    else:
        dict_lit.append(it)
        # Check if we have the expected number of elements
        if len(dict_lit) == n:
            # Found n elements!
            res = dict_lit
            break
else:
    raise Exception("Can't find %d items" % n)
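The n-item loop can also be wrapped into a function so it returns its result instead of setting res. The name find_n_items is hypothetical, and dict.setdefault replaces the get/None check; with that change the check runs for every item, so n = 1 works too.

```python
class Item:
    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return f'Item({self.val})'


def compute(x):
    return x % 10


def find_n_items(items, compute, n):
    """Return the first n items whose val gives the same compute() result."""
    val_to_it = {}  # computed value -> items seen so far with that value
    for it in items:
        group = val_to_it.setdefault(compute(it.val), [])
        group.append(it)
        if len(group) == n:
            return group
    raise ValueError("Can't find %d items" % n)


items = [Item(v) for v in (0, 11, 25, 16, 31, 41)]
trio = find_n_items(items, compute, 3)
print(trio)  # [Item(11), Item(31), Item(41)]
```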

Increasing values in a dictionary?

Is there a function which can take in a dictionary and modify the dictionary by increasing only the values in it by 1?
i.e.
f({'1':0.3, '11':2, '111':{'a':7, 't':2}})
becomes
{'1':1.3, '11':3, '111':{'a':8, 't':3}}
and
f({'a':{'b':{'c':5}}})
becomes
{'a':{'b':{'c':6}}}
Thanks!

Not the best...
def incr(d):
    try:
        return d + 1
    except TypeError: # better to test the type than to catch the error
        return g_incr(d)
    except:
        return 0
def g_incr(d):
    return {k:incr(v) for k, v in d.items()}
test = {'1':0.3, '11':2, '111':{'a':7, 't':2}}
print(g_incr(test))

I think you should try this:
def increment(d):
    # handles a flat dict only; a nested dict value raises TypeError on v + 1
    return {k: v + 1 for k, v in d.items()}
result = increment({'1': 0.3, '11': 2})
print(result)
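The question asks to modify the dictionary, while both answers build a new one. A minimal in-place sketch, assuming that only int and float values should be incremented and anything else (strings, lists) is left untouched; bool is excluded explicitly because it is a subclass of int. The name increment_in_place and the step parameter are illustrative.

```python
def increment_in_place(d, step=1):
    """Add step to every number in a (possibly nested) dict, in place."""
    for key, value in d.items():
        if isinstance(value, dict):
            increment_in_place(value, step)  # recurse into nested dicts
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # rebinding an existing key does not change the dict's size,
            # so it is safe to do while iterating over it
            d[key] = value + step
    return d  # returned only for convenience; d itself has been changed


data = {'1': 0.3, '11': 2, '111': {'a': 7, 't': 2}}
increment_in_place(data)
print(data)

nested = {'a': {'b': {'c': 5}}}
increment_in_place(nested)
print(nested)  # {'a': {'b': {'c': 6}}}
```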

Yielding from sorted iterators in sorted order in Python?

Is there a better way to merge/collate a bunch of sorted iterators into one so that it yields the items in sorted order? I think the code below works but I feel like there is a cleaner, more concise way of doing it that I'm missing.
def sortIters(*iterables, **kwargs):
    key = kwargs.get('key', lambda x : x)
    nextElems = {}
    currentKey = None
    for g in iterables:
        try:
            nextElems[g] = g.next()
            k = key(nextElems[g])
            if currentKey is None or k < currentKey:
                currentKey = k
        except StopIteration:
            pass #iterator was empty
    while nextElems:
        minKey = None
        stoppedIters = set()
        for g, item in nextElems.iteritems():
            k = key(item)
            if k == currentKey:
                yield item
                try:
                    nextElems[g] = g.next()
                except StopIteration:
                    stoppedIters.add(g)
            minKey = k if minKey is None else min(k, minKey)
        currentKey = minKey
        for g in stoppedIters:
            del nextElems[g]
The use case for this is that I have a bunch of CSV files that I need to merge according to some sorted field. They are big enough that I don't want to just read them all into a list and call sort(). I'm using Python 2.6, but if there's a solution for Python 3 I'd still be interested in seeing it.

Yes, you want heapq.merge(), which does exactly one thing: iterate over sorted iterators in order.
import csv
import heapq
from itertools import imap
def sortkey(row):
    return (row[5], row)
def unwrap(key):
    sortkey, row = key
    return row
FILE_LIST = map(file, ['foo.csv', 'bar.csv'])
# decorate each row of each file with its sort key (the function is csv.reader)
input_iters = [imap(sortkey, csv.reader(f)) for f in FILE_LIST]
output_iter = imap(unwrap, heapq.merge(*input_iters))
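Since Python 3.5, heapq.merge also accepts a key argument (and reverse), so the decorate/undecorate helpers are no longer needed. A small self-contained sketch, using in-memory io.StringIO objects as hypothetical stand-ins for the CSV files, each already sorted on the numeric value in column 1 (not the row[5] used above):

```python
import csv
import heapq
import io

# Hypothetical stand-ins for open('foo.csv', newline='') and open('bar.csv', newline=''):
# each "file" is already sorted by the numeric value in column 1.
foo = io.StringIO("a,1\nc,3\ne,5\n")
bar = io.StringIO("b,2\nd,4\n")

readers = [csv.reader(f) for f in (foo, bar)]
# heapq.merge is lazy: it keeps only one pending row per input at a time
merged = heapq.merge(*readers, key=lambda row: int(row[1]))
rows = list(merged)  # materialized here only to show the result
print(rows)  # [['a', '1'], ['b', '2'], ['c', '3'], ['d', '4'], ['e', '5']]
```

With real files you would iterate over merged directly (for example, writing each row with csv.writer) instead of calling list, so memory use stays proportional to the number of files rather than their size.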
