I tried slicing an OrderedDict like this:
for key in some_dict[:10]:
But I get a TypeError saying "unhashable type: 'slice'". How do I get this dictionary's first 10 key-value pairs?
Try converting the OrderedDict into something that is sliceable:
list_dict = list(some_dict.items())
for i in list_dict[:10]:
    ...  # do something with i
Each element is now a two-item (key, value) tuple: index 0 is the key and index 1 is the value.
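Here is a small runnable sketch of the same idea that unpacks each tuple directly in the loop header. The sample OrderedDict (letters mapped to their positions) is made up for illustration. Note that list(...) copies every pair before slicing, which is fine for small dicts but wasteful for large ones.

```python
from collections import OrderedDict
from string import ascii_lowercase

# Hypothetical sample data: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
some_dict = OrderedDict((ch, i) for i, ch in enumerate(ascii_lowercase))

first_ten = list(some_dict.items())[:10]  # copies all 26 pairs, then slices
for key, value in first_ten:  # unpack each (key, value) tuple
    print(key, value)
```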
An OrderedDict is only designed to maintain order, not to provide efficient lookup by position in that order. (Internally, they maintain order with a doubly-linked list.) OrderedDicts cannot provide efficient general-case slicing, so they don't implement slicing.
For your use case, you can instead use itertools to stop the loop after 10 elements:
import itertools
for key in itertools.islice(your_odict, 0, 10):
    ...
or
for key, value in itertools.islice(your_odict.items(), 0, 10):
    ...
Internally, islice just stops fetching items from the underlying OrderedDict iterator once it reaches the 10th item. Note that while you can tell islice to use a step value or a nonzero start value, it cannot do so efficiently: it has to fetch and discard all the values you want to skip to get to the ones you're interested in.
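A minimal runnable sketch of both cases, using a hypothetical OrderedDict of squares. The first call stops after ten items; the second uses a nonzero start, which works but still walks past the first 90 keys internally.

```python
from collections import OrderedDict
from itertools import islice

your_odict = OrderedDict((n, n * n) for n in range(100))  # hypothetical data

# Stops pulling from the dict's iterator after 10 (key, value) pairs.
first_ten = list(islice(your_odict.items(), 0, 10))

# start=90 is allowed, but islice still fetches and discards keys 0..89
# before it yields anything.
last_ten_keys = list(islice(your_odict, 90, 100))
```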
Is there a more pythonic way of obtaining a sorted list of dictionary keys with one key moved to the head? So far I have this:
# create a unique list of keys headed by 'event' and followed by a sorted list.
# dfs is a dict of dataframes.
for k in dict.fromkeys(['event'] + sorted(dfs)):
    display(k, dfs[k])  # ideally this should be (k, v)
I suppose you would be able to do
for k, v in list(dfs.items()) + [('event', None)]:
.items() returns a view of the dictionary's (key, value) pairs (technically a dict_items object, which is why I have to convert it to a list explicitly before concatenating), to which you can then add a second list of tuples. Iterating through a list of tuples allows automatic unpacking (so you can write for k, v in lst instead of for tup in lst).
What we really want is a lazy iterator, but that's not possible with sorted(), because sorting must see all the keys before it knows what the first item should be.
Using dict.fromkeys to build a dictionary whose keys follow insertion order was pretty clever, but it relies on dicts preserving insertion order, which the language only guarantees since Python 3.7 (in CPython 3.6 it was an implementation detail, and earlier versions did not preserve order at all). I admit, it took me a while to figure out that line.
Since the code you posted is just working with the keys, I suggest you focus on that. Taking up a few more lines for readability is a good thing, especially if we can hide it in a testable function:
from IPython.display import display  # Jupyter/IPython; use print() elsewhere

def display_by_keys(dfs, priority_items=None):
    if priority_items is None:
        priority_items = ['event']
    # A list (not a set) keeps the priority keys in the order given.
    featured = [k for k in priority_items if k in dfs]
    others = sorted(k for k in dfs if k not in featured)
    for key in featured + others:
        display(key, dfs[key])
The potential downside is that you must sort the keys on every call. If you call this much more often than the data changes, and the data set is large, that could become a concern.
Of course you wouldn't be displaying a really large result, but if it becomes a problem, then you'll want to store them in a collections.OrderedDict (https://stackoverflow.com/a/13062357/1766544) or find a sorteddict module.
from collections import OrderedDict
# sort once
ordered_dfs = OrderedDict.fromkeys(sorted(dfs.keys()))
ordered_dfs.move_to_end('event', last=False)  # raises KeyError if 'event' is not in dfs
ordered_dfs.update(dfs)
# display as often as you need
for k, v in ordered_dfs.items():
    print(k, v)
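Since Python 3.7 guarantees insertion order for plain dicts, the same sort-once idea can be sketched without OrderedDict. The sample dfs (strings standing in for DataFrames) is hypothetical, and this version simply skips the move when 'event' is absent instead of raising KeyError.

```python
dfs = {'zeta': 'Z', 'event': 'E', 'alpha': 'A'}  # hypothetical stand-in data

# Sort once, then move 'event' to the front if it exists.
order = sorted(dfs)
if 'event' in order:
    order.remove('event')
    order.insert(0, 'event')

# Python 3.7+: a plain dict keeps this insertion order.
ordered_dfs = {k: dfs[k] for k in order}

for k, v in ordered_dfs.items():
    print(k, v)
```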
If you display different fields first in different views, that's not a problem. Just sort all the fields normally, and use a function like the one above, without the sort.
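A sketch of that per-view step, assuming the keys have already been sorted once: each view pulls its own priority keys to the front and keeps the rest in their existing sorted order. The helper name keys_for_view and the sample keys are hypothetical.

```python
def keys_for_view(sorted_keys, priority_items):
    """Hypothetical helper: priority keys first, then the already-sorted rest.

    sorted_keys is assumed to be sorted once, ahead of time; nothing is
    sorted here.
    """
    present = set(sorted_keys)
    featured = [k for k in priority_items if k in present]
    featured_set = set(featured)
    return featured + [k for k in sorted_keys if k not in featured_set]

all_keys = sorted(['zeta', 'event', 'alpha', 'beta'])  # sort once
print(keys_for_view(all_keys, ['event']))
print(keys_for_view(all_keys, ['zeta', 'alpha']))
```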
Given a dictionary like this: example_dict = {"mark":13, "steve":3, "bill":6, "linus":11}
finding the key with the max value is easy using max(example_dict.items(), key=operator.itemgetter(1)), and the min value using min(example_dict.items(), key=operator.itemgetter(1)) (both after import operator).
What's the easiest way to find the key with the n-th largest value? e.g. the key with the 2nd largest value here is linus
Use nlargest:
import heapq
example_dict ={"mark":13, "steve":3, "bill":6, "linus":11}
*_, res = heapq.nlargest(2, example_dict, key=example_dict.get)
print(res)
Output
linus
From the documentation:
Return a list with the n largest elements from the dataset defined by
iterable. key, if provided, specifies a function of one argument that
is used to extract a comparison key from each element in iterable (for
example, key=str.lower).
A note on performance, also from the documentation:
The latter two functions [nlargest() and nsmallest()] perform best for smaller values of n. For
larger values, it is more efficient to use the sorted() function.
Also, when n==1, it is more efficient to use the built-in min() and
max() functions. If repeated usage of these functions is required,
consider turning the iterable into an actual heap.
Note that nlargest returns a list of the n largest keys, largest first, which is why the starred assignment discards the first n-1 elements and keeps only the last one.
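The same approach can be wrapped in a small function for any n. This is a sketch; the name nth_largest_key_heap and the range check are additions for illustration.

```python
import heapq

def nth_largest_key_heap(d, n):
    """Hypothetical helper: key with the n-th largest value (n=1 is the largest)."""
    if not 1 <= n <= len(d):
        raise ValueError("n must be between 1 and len(d)")
    # nlargest returns the top-n keys ordered by value, largest first
    return heapq.nlargest(n, d, key=d.get)[-1]

example_dict = {"mark": 13, "steve": 3, "bill": 6, "linus": 11}
print(nth_largest_key_heap(example_dict, 2))
```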
Use the QuickSelect algorithm; it runs in O(n) time on average.
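A minimal QuickSelect sketch for this problem. It is a simple out-of-place variant (it builds new lists at each step instead of partitioning in place), with a random pivot; the function name is hypothetical. Average time is O(n), worst case O(n²) with unlucky pivots.

```python
import random

def quickselect_nth_largest_key(d, n):
    """Return the key whose value is the n-th largest (1-based) via QuickSelect."""
    items = list(d.items())
    if not 1 <= n <= len(items):
        raise ValueError("n must be between 1 and len(d)")
    k = n - 1  # 0-based rank in descending order of value
    while True:
        pivot = random.choice(items)[1]
        bigger = [it for it in items if it[1] > pivot]
        equal = [it for it in items if it[1] == pivot]
        if k < len(bigger):
            items = bigger  # the answer is among the larger values
        elif k < len(bigger) + len(equal):
            return equal[0][0]  # any key holding the pivot value has this rank
        else:
            k -= len(bigger) + len(equal)  # skip past larger and equal values
            items = [it for it in items if it[1] < pivot]

example_dict = {"mark": 13, "steve": 3, "bill": 6, "linus": 11}
print(quickselect_nth_largest_key(example_dict, 2))
```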
A simpler, O(n log n) alternative is to sort all the items:
def nth_largest_key(di, n):
    sorted_items = sorted(di.items(), key=lambda item: item[1],
                          reverse=True)
    return sorted_items[n-1][0]

input_di = {"mark":13, "steve":3, "bill":6, "linus":11}
print(nth_largest_key(input_di, int(input().strip())))  # reads n from stdin
I have a dictionary where the keys are integers, and are in sequence. From time to time, I need to remove older entries from the dictionary. However, when I try to do this, I run into a "dict_keys" error.
'<=' not supported between instances of 'dict_keys' and 'int'
When I try to cast the value to an int, I'm told that's not supported.
int() argument must be a string, a bytes-like object or a number, not 'dict_keys'
I see answers here saying to use a list comprehension. However, as there may be a million entries in this dictionary, I'm hoping there is some way to perform the cast without having to perform it on the entire list of keys.
import numpy as np
d = dict()
for i in range(100):
    d[i] = i+10
minId = int(np.min(d.keys()))
while(minId <= 5):
    d.pop(minId)
    minId += 1
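You don't need to convert dict_keys to int; that doesn't make sense anyway. Your problem is that np.min converts its argument to an array, and the return value of d.keys() is not a sequence, so NumPy wraps the whole dict_keys object as a single element instead of building an array of keys. np.min then just hands that dict_keys object back.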
For taking the minimum of an iterable, use the regular Python min, not np.min. However, calling min in a loop is an inefficient way to do things. heapq.nsmallest could help, or you could find a better data structure than a dict.
You need a list if you want to use NumPy:
minId = np.min(list(d))
but you can actually use the built-in min here, which knows how to iterate; for a dict, iteration happens over the keys anyway:
minId = min(d)
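For the "remove older entries" part, here is a sketch that avoids scanning all keys. It assumes, as in the question's loop, that keys were inserted in ascending order, so the old keys form a prefix of the dict's iteration order (guaranteed since Python 3.7). takewhile stops at the first newer key, and the keys are collected into a list first because a dict must not change size while it is being iterated. The name cutoff is hypothetical.

```python
from itertools import takewhile

d = {i: i + 10 for i in range(100)}
cutoff = 5  # hypothetical: remove every key <= cutoff

# Assumes ascending insertion order: stop reading keys at the first key > cutoff.
old_keys = list(takewhile(lambda k: k <= cutoff, d))
for k in old_keys:
    del d[k]
```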
You could use an OrderedDict and pop the oldest key-value pair. An advantage of using an OrderedDict is that it remembers the order in which keys were first inserted. In this code, the first key will always be the minimum in the OrderedDict d. popitem(last=False) simply removes and returns the oldest (first) key-value pair.
from collections import OrderedDict
d = OrderedDict()
for i in range(100):
d[i] = i+10
d.popitem(last=False) #removes the earliest key-value pair from the dict
print(d)
If you'd like to remove the oldest 5 key-value pairs, call popitem(last=False) five times; each call removes and returns the pair at the front, like a queue. Starting from a freshly built d (keys 0 to 99):
for _ in range(5):
    item = d.popitem(last=False)  # (key, value) tuple of the oldest entry
    print("Item {} popped from dictionary.".format(item))
#Output:
Item (0, 10) popped from dictionary.
Item (1, 11) popped from dictionary.
Item (2, 12) popped from dictionary.
Item (3, 13) popped from dictionary.
Item (4, 14) popped from dictionary.
I have a dictionary where each value is a list, like so:
dictA = {1:['a','b','c'],2:['d','e']}
Unfortunately, I cannot change this structure to get around my problem
I want to gather all of the entries of the lists into one single list, as follows:
['a','b','c','d','e']
Additionally, I want to do this only once within an if-block. Since I only want to do it once, I do not want to store it to an intermediate variable, so naturally, a list comprehension is the way to go. But how? My first guess,
[dictA[key] for key in dictA.keys()]
yields,
[['a','b','c'],['d','e']]
which does not work because
'a' in [['a','b','c'],['d','e']]
yields False. Everything else I've tried has used some sort of illegal syntax.
How might I perform such a comprehension?
Loop over each value list too (iterating directly over a dictionary gives you its keys):
[value for key in dictA for value in dictA[key]]
or, more directly, over dictA.values() (dictA.itervalues() in Python 2):
[value for lst in dictA.values() for value in lst]
List comprehensions let you nest loops; read the above loops as if they were nested in the same order:
result = []
for lst in dictA.values():
    for value in lst:
        result.append(value)  # append value to the output list
Or use itertools.chain.from_iterable():
from itertools import chain
list(chain.from_iterable(dictA.values()))
The latter takes an iterable of iterables and lets you loop over them as if they were one big list. dictA.values() gives you the lists, and chain.from_iterable() chains them together for list() to iterate over and build one big list out of them.
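A quick runnable check (Python 3.7+, where dict order follows insertion) that the nested comprehension and chain.from_iterable produce the same flat list for the question's data:

```python
from itertools import chain

dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}

nested = [value for lst in dictA.values() for value in lst]
chained = list(chain.from_iterable(dictA.values()))
print(nested)
print('a' in nested)  # membership now works on the flat list
```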
If all you are doing is testing for membership among all the values, then what you really want is a simple way to loop over all the values and test your value against each until you find a match. The any() function together with a suitable generator expression does just that:
any('a' in lst for lst in dictA.values())
This will return True as soon as any value in dictA has 'a' listed, and stop looping over .values() early.
If you're actually checking for membership (your a in... example), you could rewrite it as:
if any('a' in val for val in dictA.values()):
    ...  # do something
This saves having to flatten the list if that's not actually required.
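To see the early exit, here is a sketch with a small instrumented check. The contains helper and the checked list exist only to record which value lists were examined; they are not part of the technique itself.

```python
dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}
checked = []  # records which value lists were examined (illustration only)

def contains(lst, target):
    checked.append(lst)
    return target in lst

found = any(contains(lst, 'a') for lst in dictA.values())
print(found, len(checked))  # any() stopped after the first list matched
```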
In this particular case, you can just use a nested comprehension:
[value for key in dictA.keys() for value in dictA[key]]
But in general, if you've already figured out how to turn something into a nested list, you can flatten any nested iterable with chain.from_iterable:
import itertools
itertools.chain.from_iterable(dictA[key] for key in dictA.keys())
This returns an iterator, not a list; if you need a list, just do it explicitly:
list(itertools.chain.from_iterable(dictA[key] for key in dictA.keys()))
As a side note, for key in dictA.keys() does the same thing as for key in dictA, except that in Python 2 .keys() wastes time and memory building an extra list of the keys. As the Python 2 documentation says, iter(d) on a dict is a shortcut for d.iterkeys().
So, in all of the versions above, it's better to just use in dictA instead.
For understanding, this simple version might be helpful:
ListA = []
dictA = {1:['a','b','c'],2:['d','e']}
for key in dictA:
    for value in dictA[key]:
        ListA.append(value)
You can do something like this:
output_list = []
for x in {1:['a','b','c'],2:['d','e']}.values():
    output_list.extend(x)  # add every element of each value list
output_list will be ['a', 'b', 'c', 'd', 'e']
I am trying to sort a dict based on its keys and return an iterator over the values from within an overridden __iter__ method in a class. Is there a nicer and more efficient way of doing this than creating a new list and inserting into it as I sort through the keys?
How about something like this:
def itersorted(d):
    for key in sorted(d):
        yield d[key]
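Since the question mentions an overridden __iter__ in a class, here is a sketch of the same generator used as __iter__. The class name ValuesInKeyOrder is hypothetical, and it wraps a dict rather than subclassing one, because overriding __iter__ on a dict subclass would change what plain iteration over the mapping yields.

```python
class ValuesInKeyOrder:
    """Hypothetical container: iterating over it yields values sorted by key."""

    def __init__(self, data):
        self._data = dict(data)

    def __iter__(self):
        # sorted() builds one list of keys; values are then yielded lazily
        for key in sorted(self._data):
            yield self._data[key]

container = ValuesInKeyOrder({3: 'c', 1: 'a', 2: 'b'})
print(list(container))
```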
By far the easiest approach, and almost certainly the fastest, is something along the lines of:
def sorted_dict(d):
    keys = list(d.keys())  # in Python 3, keys() is a view, so copy it into a list
    keys.sort()
    for key in keys:
        yield d[key]
You can't sort without fetching all keys. Fetching all keys into a list and then sorting that list is the most efficient way to do that; list sorting is very fast, and fetching the keys list like that is as fast as it can be. You can then either create a new list of values or yield the values as the example does. Keep in mind that the generator looks up d[key] lazily, so if you modify the dict before you're done with the result of sorted_dict(), deleted keys raise KeyError and newly added keys are never seen; if you need to modify the dict in the meantime, make it return a list.
def sortedDict(dictobj):
    return (value for key, value in sorted(dictobj.items()))
This creates a single intermediate list, since sorted() returns a real list, but at least it's only one.
Assuming you want the default sort order, you can use sorted(lst) or lst.sort(). If you want your own sort logic, Python lists can sort based on a function you pass in; in Python 3, a two-argument comparison function like the one below must be wrapped with functools.cmp_to_key. For example, the following sorts numbers from least to greatest (the default behavior) using a comparison function.
from functools import cmp_to_key

def compareTwo(a, b):
    if a > b:
        return 1
    if a == b:
        return 0
    return -1

a = [3, 1, 2]  # sample list
a.sort(key=cmp_to_key(compareTwo))
print(a)
This approach is conceptually a bit cleaner than manually creating a new list and appending the new values and allows you to control the sort logic.
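As an example of genuinely custom logic, here is a sketch of a comparator that orders strings by length and then alphabetically, next to the equivalent key function, which is usually simpler and faster. The comparator name and the sample words are hypothetical.

```python
from functools import cmp_to_key

def by_length_then_alpha(a, b):
    """Old-style comparator: returns negative, zero, or positive like compareTwo."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)  # alphabetical tie-break

words = ['pear', 'fig', 'apple', 'kiwi']  # hypothetical sample data
with_cmp = sorted(words, key=cmp_to_key(by_length_then_alpha))
with_key = sorted(words, key=lambda w: (len(w), w))  # same order via a key function
print(with_cmp)
```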