Sorting a dict on __iter__ - python

I am trying to sort a dict based on its keys and return an iterator over the values from within an overridden __iter__ method in a class. Is there a nicer and more efficient way of doing this than creating a new list, inserting into the list as I sort through the keys?

How about something like this:
def itersorted(d):
    for key in sorted(d):
        yield d[key]
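Because the question asks about an overridden __iter__ method in a class, here is a minimal sketch of the same idea wrapped in a class. The class name SortedValues is hypothetical, and it wraps a plain dict rather than subclassing dict, so the dict's own iteration behaviour is left alone.

```python
class SortedValues:
    """Hypothetical wrapper whose iteration yields a dict's values in key order."""

    def __init__(self, data):
        # keep a private copy so outside changes to `data` don't affect iteration
        self._data = dict(data)

    def __iter__(self):
        # sorted() builds one list of the keys; each value is looked up lazily
        for key in sorted(self._data):
            yield self._data[key]


sv = SortedValues({"b": 2, "c": 3, "a": 1})
print(list(sv))  # [1, 2, 3]
```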

By far the easiest approach, and almost certainly the fastest, is something along the lines of:
def sorted_dict(d):
    keys = list(d.keys())  # in Python 3, d.keys() is a view without .sort(), so copy it into a list
    keys.sort()
    for key in keys:
        yield d[key]
You can't sort without fetching all keys. Fetching all keys into a list and then sorting that list is the most efficient way to do that; list sorting is very fast, and fetching the keys list like that is as fast as it can be. You can then either create a new list of values or yield the values as the example does. Keep in mind that the generator looks each value up lazily, so if you modify the dict before you're done with the result of sorted_dict() (for example, by deleting a key it has not reached yet), the next iteration can fail with a KeyError. If you want to modify the dict before you're done with the result, make it return a list.
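A small sketch of that caveat, using two hypothetical helper names: the generator version looks each value up only when it reaches it, so deleting a key it has not reached yet raises KeyError, while the list version takes a snapshot of the values up front.

```python
def sorted_values_gen(d):
    # keys are copied and sorted on the first next(); values are looked up lazily
    for key in sorted(d):
        yield d[key]


def sorted_values_list(d):
    # all values are copied up front, so later changes to d cannot affect the result
    return [d[key] for key in sorted(d)]


d = {"a": 1, "b": 2}
gen = sorted_values_gen(d)
first = next(gen)
snapshot = sorted_values_list(d)

del d["b"]  # modify the dict before the generator is exhausted
failed = False
try:
    next(gen)  # tries to look up d["b"], which no longer exists
except KeyError:
    failed = True

print(first, snapshot, failed)  # 1 [1, 2] True
```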

def sortedDict(dictobj):
    return (value for key, value in sorted(dictobj.items()))
This creates a single intermediate list, since the built-in sorted() function returns a real list. But at least it's only one.

Assuming you want the default sort order, you can use sorted(list) or list.sort(). If you want your own sort logic, Python lets you pass in a function that controls the ordering. For example, the following sorts numbers from least to greatest (the default behavior) using a comparison function; in Python 3, sort() no longer accepts a comparison function directly, so it has to be wrapped with functools.cmp_to_key():
from functools import cmp_to_key

def compareTwo(a, b):
    if a > b:
        return 1
    if a == b:
        return 0
    return -1

a = [3, 1, 2]
a.sort(key=cmp_to_key(compareTwo))
print(a)
This approach is conceptually a bit cleaner than manually creating a new list and appending the values to it, and it lets you control the sort logic.
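In Python 3 the usual way to customise the order is a key function rather than a comparison function: sorted() and list.sort() call it once per element and order the elements by the returned values. A few illustrative examples (the word list is made up):

```python
words = ["banana", "apple", "Kiwi"]

# default ordering compares code points, so uppercase letters sort before lowercase
default_order = sorted(words)

# str.lower as the key makes the comparison case-insensitive
case_insensitive = sorted(words, key=str.lower)

# key=len with reverse=True puts the longest words first
longest_first = sorted(words, key=len, reverse=True)

print(default_order)     # ['Kiwi', 'apple', 'banana']
print(case_insensitive)  # ['apple', 'banana', 'Kiwi']
print(longest_first)     # ['banana', 'apple', 'Kiwi']
```

The same key argument works when sorting a dict's keys, e.g. sorted(d, key=str.lower).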

Related

Tuple-key dictionary in python: Accessing a whole block of entries

I am looking for an efficient python method to utilise a hash table that has two keys:
E.g.:
(1,5) --> {a}
(2,3) --> {b,c}
(2,4) --> {d}
Further I need to be able to retrieve whole blocks of entries, for example all entries that have "2" at the 0-th position (here: (2,3) as well as (2,4)).
In another post it was suggested to use list comprehension, i.e.:
sum(val for key, val in dict.items() if key[0] == 'B')
I learned that dictionaries are (probably?) the most efficient way to retrieve a value from a collection of key:value pairs. However, querying with only part of a tuple key is different from querying the whole key, where I either get a value or nothing. Can Python still return the values in time proportional to the number of matching key:value pairs? Or alternatively, is the tuple-key dictionary (plus a list comprehension) better than using pandas.DataFrame.groupby() (which would take up quite a lot of memory)?
The "standard" way would be something like
from random import randint

d = {(randint(1, 10), i): "something" for i in range(200)}
def byfilter(n, d):
    return list(filter(lambda x: x[0] == n, d.keys()))
byfilter(5, d)  # returns a list of the tuple keys where x[0] == 5
In similar situations, when I didn't need the full list, I have often used next() to step through the iterator manually instead.
However, there may be some use cases where we can optimize that. Suppose you need to do a couple or more accesses by the first element of the key, and you know the dict keys are not changing meanwhile. Then you can extract the keys into a list, sort it, and make use of two itertools functions, dropwhile() and takewhile():
from itertools import dropwhile, takewhile

ls = [x for x in d.keys()]
ls.sort()  # I do not know why, but this seems faster than ls = sorted(d.keys())

def bysorted(n, ls):
    # skip keys until x[0] == n, then collect keys while x[0] == n
    return list(takewhile(lambda x: x[0] == n, dropwhile(lambda x: x[0] != n, ls)))

bysorted(5, ls)  # returns the same list as above
This can be up to 10x faster in the best case (n=1 in my example) and take more or less the same time in the worst case (n=10), because takewhile() stops as soon as it passes the last matching key, so smaller values of n need fewer iterations.
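If you need many lookups on the same sorted key list, the standard-library bisect module can locate the matching block with a binary search instead of the linear scan that dropwhile() does. This is a sketch under the assumption that the first tuple element is an integer, as in the example above; bybisect is a hypothetical name.

```python
from bisect import bisect_left
from random import randint

d = {(randint(1, 10), i): "something" for i in range(200)}
ls = sorted(d)


def bybisect(n, ls):
    # (n,) compares less than every (n, i), and (n + 1,) less than every (n + 1, i),
    # so the keys between the two insertion points are exactly those with x[0] == n
    lo = bisect_left(ls, (n,))
    hi = bisect_left(ls, (n + 1,))
    return ls[lo:hi]


print(bybisect(5, ls))
```

Each lookup costs two binary searches plus a slice proportional to the number of matches, but like the dropwhile() version it is only valid while the keys don't change.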
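Of course, you can use the same dropwhile()/takewhile() approach to access keys by x[1]; you just need to add a key parameter to the sort() call (for example, ls.sort(key=lambda x: x[1])) and test x[1] instead of x[0] in bysorted().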
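To get lookups whose cost is proportional to the number of matches, as the question asks, another option is to build a secondary index once: a dict that maps each first element to the list of full keys starting with it. This is a sketch using collections.defaultdict; the index has to be rebuilt or updated whenever d changes, and the sample data is the one from the question.

```python
from collections import defaultdict

d = {(1, 5): {"a"}, (2, 3): {"b", "c"}, (2, 4): {"d"}}

# build the index once: first element -> list of full keys
by_first = defaultdict(list)
for key in d:
    by_first[key[0]].append(key)

# each lookup is one hash lookup plus work proportional to the number of matches;
# .get() avoids inserting an empty list for a missing first element
matching_keys = by_first.get(2, [])
matching_values = [d[k] for k in matching_keys]
print(matching_keys)  # [(2, 3), (2, 4)]
```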

Is it possible to sort a list of strings that represent Filipino numbers?

I have list = ['dalawa', 'tatlo', 'apat', 'siyam', 'isa']. Is there a way to sort this to list = ['isa', 'dalawa', 'tatlo', 'apat', 'siyam']? I am new to Python, so I don't have any idea how to do this.
By default, Python's sort() method sorts a list of strings in alphabetical (lexicographic) order, which is not what you want here.
What you can do is map each Filipino number to its numeric value in a dictionary, and then sort the list by those values:
numbers = ['dalawa', 'tatlo', 'apat', 'siyam', 'isa']
values = {'isa': 1, 'dalawa': 2, 'tatlo': 3, 'apat': 4, 'siyam': 9}
# look up lambda functions in order to better understand the line below.
# In short, the key parameter tells sorted() to call the lambda on each word and sort by the number it returns from the dictionary, rather than by the word itself.
result = sorted(numbers, key=lambda x: values[x])
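If you'd rather not write the numbers out, an alternative sketch is to list the words once in their correct order and sort by each word's position in that list. The ORDER list here is an assumption that only covers one to ten; a word missing from it would make ORDER.index raise ValueError.

```python
# Filipino words for one to ten, in numeric order
ORDER = ["isa", "dalawa", "tatlo", "apat", "lima",
         "anim", "pito", "walo", "siyam", "sampu"]

numbers = ["dalawa", "tatlo", "apat", "siyam", "isa"]

# ORDER.index(word) is the word's position, which is its value minus one
result = sorted(numbers, key=ORDER.index)
print(result)  # ['isa', 'dalawa', 'tatlo', 'apat', 'siyam']
```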

Python: Create sorted list of keys moving one key to the head

Is there a more pythonic way of obtaining a sorted list of dictionary keys with one key moved to the head? So far I have this:
# create a unique list of keys headed by 'event' and followed by a sorted list.
# dfs is a dict of dataframes.
for k in dict.fromkeys(['event'] + sorted(dfs)):
    display(k, dfs[k])  # ideally this should be (k, v)
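I suppose you could build the (key, value) pairs directly:
rest = sorted(((k, v) for k, v in dfs.items() if k != 'event'), key=lambda kv: kv[0])
for k, v in [('event', dfs['event'])] + rest:
    display(k, v)
.items() gives you the dictionary's (key, value) tuples (technically a dict_items view, not a list); sorted() turns them into a real list, which is what lets us prepend the 'event' pair with +. Sorting with key=lambda kv: kv[0] compares only the keys, never the dataframes. Iterating through a list of tuples allows for automatic unpacking (so you can write for k, v in ... instead of for tup in ...).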
What we really want is a lazy iterator rather than a full list, but that's not possible with sorted, because it must see all the keys before it knows what the first item should be.
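Using dict.fromkeys to build a dictionary of the keys in insertion order (which also drops the duplicate 'event') was pretty clever, but it relies on dicts preserving insertion order, which was only an implementation detail in CPython 3.6 and became a language guarantee in Python 3.7 (before that, dicts were unordered). I admit, it took me a while to figure out that line.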
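For anyone else puzzling over that line, here is what it does with a small made-up dfs (plain integers stand in for the dataframes). dict.fromkeys() keeps only the first occurrence of each key, so the second 'event' produced by sorted(dfs) is dropped and 'event' stays at the front; this assumes Python 3.7+ dict ordering.

```python
dfs = {"b": 2, "event": 0, "a": 1}  # made-up stand-in for the dict of dataframes

keys = ["event"] + sorted(dfs)   # ['event', 'a', 'b', 'event']
ordered = dict.fromkeys(keys)     # the duplicate 'event' keeps its first position
print(list(ordered))              # ['event', 'a', 'b']
```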
Since the code you posted is just working with the keys, I suggest you focus on that. Taking up a few more lines for readability is a good thing, especially if we can hide it in a testable function:
def display_by_keys(dfs, priority_items=None):
    if not priority_items:
        priority_items = ['event']
    # a list (not a set) keeps the priority items in the order they were given
    featured = [k for k in priority_items if k in dfs]
    others = [k for k in dfs if k not in featured]
    for key in featured + sorted(others):
        display(key, dfs[key])
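A more compact alternative (a different technique from the function above) is a single sorted() call with a tuple key: False sorts before True, so (k != 'event', k) puts 'event' first and sorts everything else normally. This is a sketch with made-up data; keys_with_event_first is a hypothetical name.

```python
def keys_with_event_first(dfs, first="event"):
    # (False, k) sorts before every (True, k), so `first` leads and the rest sort by key
    return sorted(dfs, key=lambda k: (k != first, k))


dfs = {"b": 2, "event": 0, "a": 1}  # made-up stand-in for the dict of dataframes
for k in keys_with_event_first(dfs):
    print(k, dfs[k])
```

If 'event' is not among the keys, this simply returns the keys in sorted order.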
The potential downside is you must sort the keys every time. If you do this much more often than the data store changes, on a large data set, that's a potential concern.
Of course you wouldn't be displaying a really large result, but if it becomes a problem, then you'll want to store them in a collections.OrderedDict (https://stackoverflow.com/a/13062357/1766544) or find a sorteddict module.
from collections import OrderedDict
# sort once
ordered_dfs = OrderedDict.fromkeys(sorted(dfs.keys()))
ordered_dfs.move_to_end('event', last=False)
ordered_dfs.update(dfs)
# display as often as you need
for k, v in ordered_dfs.items():
    print(k, v)
If you display different fields first in different views, that's not a problem. Just sort all the fields normally, and use a function like the one above, without the sort.

Return single variable (not list) multiple times to function from second function?

I have a first function which cannot be modified or changed! It displays the value of a variable (main_index, which cannot be a list, tuple, dictionary, etc.; it is just a simple variable and must remain as it is).
That function calls a second function, which can return multiple values, so the idea is to somehow display those multiple values one by one, without putting them into a list, dict, etc. The second function can be changed in any way.
Code is the following (please take into account that first function cannot be modified in any way, I am just simplifying it here).
def not_changeable():
    value_to_check='7.1'
    main_index=generate_index(value_to_check)
    print (main_index)

def generate_index(index):
    dictionary={'7.1.1':{'value':'1'},'7.1.2':{'value':'2'},'7.100.3':{'value':'3'}}
    filtered_dict={}
    concatanatedIndex=index+'.'
    for k in dictionary.keys():
        if concatanatedIndex in k:
            filtered_dict[k]=dictionary[k]
    print (filtered_dict)
    for indx in filtered_dict:
        return (filtered_dict.get(indx).get('value'))

not_changeable()
As output I am getting only one value (because return exits the function on the first iteration of the loop):
{'7.1.1': {'value': '1'}, '7.1.2': {'value': '2'}}
1
But I would like to get values
1
2
without any modification to the first function.
I am aware that if I return a list I will be able to display all values, but is it possible to display 1 and 2 without modifying the first function?
Returning inside a for loop is often not what you want; it is usually better to build the data structure in the loop and return it afterwards, or to return the whole data structure as it is built in a comprehension. Here you can return a string with newline characters instead of a single value, like this:
def generate_index(index):
    dictionary={'7.1.1':{'value':'1'},'7.1.2':{'value':'2'},'7.100.3':{'value':'3'}}
    filtered_dict={}
    concatanatedIndex=index+'.'
    for k in dictionary.keys():
        if concatanatedIndex in k:
            filtered_dict[k]=dictionary[k]
    print (filtered_dict)
    return '\n'.join(sorted(filtered_dict.get(indx).get('value') for indx in filtered_dict))
This will print
{'7.1.2': {'value': '2'}, '7.1.1': {'value': '1'}}
1
2
Breakdown of the last statement: '\n'.join(sorted(filtered_dict.get(indx).get('value') for indx in filtered_dict)):
We use a comprehension to generate the data we are interested in: filtered_dict.get(indx).get('value') for indx in filtered_dict. This is actually a generator expression, but you can put [] around it to make it a list comprehension.
Because we are iterating over a dictionary, and dictionaries were not guaranteed to preserve any particular order before Python 3.7 (CPython 3.6 keeps insertion order only as an implementation detail), I have added the sorted call to make sure 1 comes before 2.
To turn an iterable (like a list) into a string, we can use the string method .join(), which creates a string by joining together the elements of the iterable, placing the string it is called on between each pair. So '-hello-'.join(['a', 'b', 'c']) becomes 'a-hello-b-hello-c'.
Actually, a simpler way to build the return string would be to iterate over filtered_dict.values() instead of the dict itself. And if we are using Python 3.7 or later, we can skip the sorted call, because the values then come out in insertion order, so the return simply becomes: return '\n'.join(v.get('value') for v in filtered_dict.values()).
But a better design might be to return the values in a dictionary and print them in a specific format in a separate function that is only responsible for display.
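A minimal sketch of that design, with hypothetical function names: one function only filters and returns the matching values in a dictionary, and another only formats them for display. It keeps the substring test from the question's code.

```python
def filter_values(index, dictionary):
    # return only the values whose key contains "<index>." (same test as the question)
    pattern = index + "."
    return {k: v["value"] for k, v in dictionary.items() if pattern in k}


def format_values(values):
    # display logic lives here: one value per line, ordered by key
    return "\n".join(values[k] for k in sorted(values))


dictionary = {"7.1.1": {"value": "1"}, "7.1.2": {"value": "2"}, "7.100.3": {"value": "3"}}
print(format_values(filter_values("7.1", dictionary)))
```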
You can adapt the generate_index function to return a generator, aka something you can iterate over (note the last line in the code below).
You can read up on generators in the Python documentation.
def generate_index(index):
    dictionary={'7.1.1':{'value':'1'},'7.1.2':{'value':'2'},'7.100.3':{'value':'3'}}
    filtered_dict={}
    concatanatedIndex=index+'.'
    for k in dictionary.keys():
        if concatanatedIndex in k:
            filtered_dict[k]=dictionary[k]
    print (filtered_dict)
    for indx in filtered_dict:
        # change from return to yield here to create a generator
        yield filtered_dict.get(indx).get('value')
Note that by calling this generate_index, you were already kind of shooting for a generator! Then you can iterate over the result in your other function like so:
main_index=generate_index(value_to_check)
for index in main_index:
    print(index)
Hope this does what you want

List Comprehension of Lists Nested in Dictionaries

I have a dictionary where each value is a list, like so:
dictA = {1:['a','b','c'],2:['d','e']}
Unfortunately, I cannot change this structure to get around my problem
I want to gather all of the entries of the lists into one single list, as follows:
['a','b','c','d','e']
Additionally, I want to do this only once within an if-block. Since I only want to do it once, I do not want to store it to an intermediate variable, so naturally, a list comprehension is the way to go. But how? My first guess,
[dictA[key] for key in dictA.keys()]
yields,
[['a','b','c'],['d','e']]
which does not work because
'a' in [['a','b','c'],['d','e']]
yields False. Everything else I've tried has used some sort of illegal syntax.
How might I perform such a comprehension?
Loop over the returned list too (looping directly over a dictionary gives you keys as well):
[value for key in dictA for value in dictA[key]]
or more directly using dictA.values() (dictA.itervalues() in Python 2):
[value for lst in dictA.values() for value in lst]
List comprehensions let you nest loops; read the above loops as if they are nested in the same order:
result = []
for lst in dictA.values():
    for value in lst:
        result.append(value)  # append value to the output list
Or use itertools.chain.from_iterable():
from itertools import chain
list(chain.from_iterable(dictA.values()))
The latter takes a sequence of sequences and lets you loop over them as if they were one big list. dictA.values() gives you a sequence of lists, and chain.from_iterable() puts them together for list() to iterate over and build one big list out of them.
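A quick check, written for Python 3, that the nested comprehension and chain.from_iterable() give the same flat list for the dictionary from the question, and that the flat list then passes the 'a' in ... test that failed on the nested list:

```python
from itertools import chain

dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}

nested = [dictA[key] for key in dictA]                          # list of lists
flat_comp = [value for lst in dictA.values() for value in lst]
flat_chain = list(chain.from_iterable(dictA.values()))

print(nested)                            # [['a', 'b', 'c'], ['d', 'e']]
print(flat_comp)                         # ['a', 'b', 'c', 'd', 'e']
print('a' in nested, 'a' in flat_chain)  # False True
```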
If all you are doing is testing for membership among all the values, then what you really want is a simple way to loop over all the values, testing your value against each until you find a match. The any() function together with a suitable generator expression does just that:
any('a' in lst for lst in dictA.values())
This will return True as soon as any value in dictA has 'a' listed, and stop looping over .values() early.
If you're actually checking for membership (your a in... example), you could rewrite it as:
if any('a' in val for val in dictA.values()):
    pass  # do something
This saves having to flatten the list if that's not actually required.
In this particular case, you can just use a nested comprehension:
[value for key in dictA.keys() for value in dictA[key]]
But in general, if you've already figured out how to turn something into a nested list, you can flatten any nested iterable with chain.from_iterable:
import itertools
itertools.chain.from_iterable(dictA[key] for key in dictA.keys())
This returns an iterator, not a list; if you need a list, just do it explicitly:
list(itertools.chain.from_iterable(dictA[key] for key in dictA.keys()))
As a side note, for key in dictA.keys() does the same thing as for key in dictA, except that in Python 2 dictA.keys() wastes time and memory making an extra list of the keys (in Python 3 it returns a lightweight view instead). As the Python 2 documentation says, iter on a dict is the same as iterkeys.
So, in all of the versions above, it's better to just use in dictA instead.
For a simple version, just for understanding, a plain loop might be helpful:
ListA=[]
dictA = {1:['a','b','c'],2:['d','e']}
for keys in dictA:
    for values in dictA[keys]:
        ListA.append(values)
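You can do something like this (although a plain for loop would be more idiomatic, since this comprehension is used only for its side effect and builds a throwaway list of None values):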
output_list = []
[ output_list.extend(x) for x in {1:['a','b','c'],2:['d','e']}.values()]
output_list will be ['a', 'b', 'c', 'd', 'e']
