How can I keep a Python dictionary sorted? - python

Let's say I have a list like this:
x = ["foo", "foo", "bar", "baz", "foo", "bar"]
and I want to count the occurrences of each item in a dict, but I want the entries kept in order while I am looping through the list, like this:
from collections import defaultdict
ordered_dict = defaultdict(lambda: 0, {})
for line in x:
    ordered_dict[line] += 1
I want the result to be something like this:
{"foo":3, "bar":2, "baz":1}
I wonder if there is some way to keep the dictionary ordered while I am looping. Currently I use heapq after the loop.

You used a defaultdict, but what you are after is an ordered mapping. Note that an OrderedDict does not sort anything either: it keeps keys in the order in which they were first inserted (and since Python 3.7 a plain dict, and therefore a defaultdict, does the same).
https://docs.python.org/dev/library/collections.html#collections.OrderedDict
So if you want the dictionary sorted by value, do the counting with a Counter and then insert its entries, highest count first, into an OrderedDict.

Basically, you just used the wrong collection from collections:
>>> from collections import Counter, OrderedDict
>>> OrderedDict(Counter(["foo", "foo", "bar", "baz", "foo", "bar"]).most_common())
OrderedDict([('foo', 3), ('bar', 2), ('baz', 1)])

Code:
x = ["foo", "foo", "bar", "baz", "foo", "bar"]
d = {}
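for item in x:
    d[item] = d.get(item, 0) + 1
print(d)
Output (Python 3.7+, where dicts keep insertion order):
{'foo': 3, 'bar': 2, 'baz': 1}
Here the insertion order happens to match the order by count; the loop itself does not sort anything.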
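None of the built-in mappings re-sorts itself on every update, so a common approach is to count during the loop and sort once afterwards. This is a minimal sketch assuming Python 3.7+ (plain dicts keep insertion order); the names counts, by_count and top_two are just illustrative. Since the question mentions heapq, it also shows heapq.nlargest for when only the top few entries matter.

```python
import heapq
from collections import Counter

x = ["foo", "foo", "bar", "baz", "foo", "bar"]

# Count in a single pass over the list.
counts = Counter()
for item in x:
    counts[item] += 1

# Sort once at the end, highest count first. A plain dict (3.7+) keeps
# the order in which the sorted pairs are inserted.
by_count = dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
print(by_count)  # {'foo': 3, 'bar': 2, 'baz': 1}

# If only the top n entries matter, heapq.nlargest avoids sorting everything;
# Counter.most_common(n) returns the same pairs.
top_two = heapq.nlargest(2, counts.items(), key=lambda kv: kv[1])
print(top_two)  # [('foo', 3), ('bar', 2)]
```

Sorting after the loop costs O(n log n) once, instead of paying for a re-sort on every increment.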

Related

Python array of tuples group by first, store second

I have a list of tuples, something like this:
query_results = [("foo", "bar"), ("foo", "qux"), ("baz", "foo")]
I would like to achieve something like:
{
"foo": ["bar", "qux"],
"baz": ["foo"]
}
So I tried this:
from itertools import groupby
grouped_results = {}
for key, y in groupby(query_results, lambda x: x[0]):
    grouped_results[key] = [u[1] for u in y]
The issue I have is that although the number of keys is correct, the number of values in each list is dramatically lower than it should be. Can anyone explain why this happens and what I should be doing instead?
You are better off using a defaultdict for this:
from collections import defaultdict
result = defaultdict(list)
for k, v in query_results:
    result[k].append(v)
Which yields:
>>> result
defaultdict(<class 'list'>, {'foo': ['bar', 'qux'], 'baz': ['foo']})
If you wish to turn it into a vanilla dictionary again, you can use this after the for loop:
result = dict(result)
This then results in:
>>> dict(result)
{'foo': ['bar', 'qux'], 'baz': ['foo']}
A defaultdict is constructed with a factory, here list. If a key cannot be found in the dictionary, the factory is called (list() constructs a new empty list) and the result is associated with that key.
So for each key k that is not yet in the dictionary, we will construct a new list first. We then call .append(v) on that list to append values to it.
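A short sketch of the factory behaviour described above. Only indexing with result[k] calls the factory; .get() and the in operator do not, so they never insert keys. The key names here are illustrative.

```python
from collections import defaultdict

result = defaultdict(list)

# Indexing a missing key calls list() and stores the new empty list.
print(result["missing"])    # []
print("missing" in result)  # True: the lookup itself inserted the key

# .get() and "in" do not call the factory.
print(result.get("other"))  # None
print("other" in result)    # False

# result[k] always returns a list, so append works on the first use of a key.
result["foo"].append("bar")
result["foo"].append("qux")
print(dict(result))  # {'missing': [], 'foo': ['bar', 'qux']}
```

The side effect of plain lookups is worth remembering: reading a key you only wanted to check will add it to the dictionary.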
Why not use a simple for loop with dict.setdefault?
grouped_results = {}
for key, value in query_results:
    grouped_results.setdefault(key, []).append(value)
Output:
{'foo': ['bar', 'qux'], 'baz': ['foo']}
How about using a defaultdict?
from collections import defaultdict
d = defaultdict(list)
for pair in query_results:
    d[pair[0]].append(pair[1])
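As for why the groupby version loses values: itertools.groupby only groups consecutive items that share a key. On unsorted input the same key shows up as several separate groups, and each assignment to grouped_results[key] overwrites the list built for the previous group. This sketch uses hypothetical reordered data so that "foo" is not contiguous, and shows that sorting by the same key first fixes it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical data: the two "foo" entries are not adjacent.
query_results = [("foo", "bar"), ("baz", "foo"), ("foo", "qux")]

# Unsorted: "foo" forms two groups, and the second overwrites the first.
broken = {}
for key, group in groupby(query_results, key=itemgetter(0)):
    broken[key] = [pair[1] for pair in group]
print(broken)  # {'foo': ['qux'], 'baz': ['foo']}

# Sorting by the grouping key first makes equal keys adjacent,
# so each key produces exactly one group.
fixed = {}
for key, group in groupby(sorted(query_results, key=itemgetter(0)), key=itemgetter(0)):
    fixed[key] = [pair[1] for pair in group]
print(fixed)  # {'baz': ['foo'], 'foo': ['bar', 'qux']}
```

The sort adds O(n log n) work and reorders the keys, which is why the defaultdict or setdefault loops are usually the simpler choice.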

Optimize creation of dictionary

I have a list of IDs called ids. Every element in ids is a string, and one ID can occur multiple times in this list.
My aim is to create a dictionary in which each key is a number of occurrences and the value is a list of the IDs that appear that often.
My current approach looks like this:
from collections import defaultdict
import numpy as np
ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for id in np.unique(ids):
    counts[ids.count(id)].append(id)
Output:
print counts
--> defaultdict(<type 'list'>, {1: ['bar'], 2: ['foo', 'hi']})
This works nicely if the list of ids is not too long. However, for longer lists the performance is rather bad.
How can I make this faster?
Instead of calling count for each element (every call scans the whole list again), create a single collections.Counter for the entire list:
from collections import Counter, defaultdict
ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for i, c in Counter(ids).items():
    counts[c].append(i)
# counts: defaultdict(<class 'list'>, {1: ['bar'], 2: ['foo', 'hi']})
If you prefer a one-liner, you could also combine Counter.most_common (for a view of the elements sorted by count) and itertools.groupby (though I'd rather not):
>>> from itertools import groupby
>>> {k: [v[0] for v in g] for k, g in groupby(Counter(ids).most_common(), lambda x: x[1])}
{2: ['foo', 'hi'], 1: ['bar']}

Get indexes for first occurrence of each element in a list?

I have a list of about 90k elements (about 670 unique). I would like to get the indexes for the first occurrence of each value. I have just tried a list comprehension like this:
In: [["foo", "bar", "baz", "bar", "foo"].index(x) for x in ["foo", "bar", "baz", "bar", "foo"]]
Out: [0, 1, 2, 1, 0]
This works, but it takes a couple of minutes to run on my machine. What is a better (faster) way to do this?
You can build a dictionary that stores the index of the first occurrence of each word.
That way, you only go through your big list once, and the dictionary lookups are fast: the dictionary contains each value only once, and a lookup takes constant time on average (O(1)).
l = ["foo", "bar", "baz", "bar", "foo"]
v = {}
for i, x in enumerate(l):
    if x not in v:
        v[x] = i
# v is now {'foo': 0, 'bar': 1, 'baz': 2}
Also, if you want to output a 90k-long list containing the index of the first occurrence of each element in the original list, you can get it this way:
output = [v[x] for x in l]
# output is now [0, 1, 2, 1, 0]
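The same single pass can be wrapped in a function. This sketch swaps the explicit "if x not in v" check for dict.setdefault, which only stores a value when the key is not present yet; the helper name first_indexes is hypothetical.

```python
def first_indexes(items):
    """Map each distinct value to the index of its first occurrence, in one pass."""
    first = {}
    for i, value in enumerate(items):
        # setdefault stores i only if value is not already a key,
        # so later duplicates leave the first index untouched.
        first.setdefault(value, i)
    return first


l = ["foo", "bar", "baz", "bar", "foo"]
first = first_indexes(l)
print(first)                  # {'foo': 0, 'bar': 1, 'baz': 2}
print([first[x] for x in l])  # [0, 1, 2, 1, 0]
```

Both versions are O(n) overall, compared with O(n²) for calling list.index on every element.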
I think you just want to use enumerate (unless you want the first occurrence of each item in the list):
strings = ["foo", "bar", "baz", "bar", "foo"]
for index, value in enumerate(strings):
    print(index, value)
This outputs:
0 foo
1 bar
2 baz
3 bar
4 foo
If you wanted, for example, 1 bar instead of 3 bar, you can maintain a dictionary of found strings:
d = {}
for index, value in enumerate(strings):
    if value not in d:
        d[value] = index
for value in strings:
    print(d[value], value)
Your question is somewhat ambiguous, but as I understand it, you have many duplicate values and you just want the index of the first appearance of each. I would leverage sets like this:
my_list = ["foo", "bar", "baz", "bar", "foo"]
my_list_unique = set(my_list)
indexes = [(x, my_list.index(x)) for x in my_list_unique]
print(indexes)  # e.g. [('foo', 0), ('bar', 1), ('baz', 2)]; the order can vary because sets are unordered
Note that set(my_list) removes the duplicates, so every entry in my_list_unique exists only once. This saves time because index is called once per unique value instead of once per element, although each index call still scans my_list from the start. The result is a list of tuples, where each tuple contains a string and the index at which it first appears in my_list.

Get a list of a dictionary's key names

How do I make a function that takes a dictionary as input and outputs the names of its keys? For example:
input_dictionary = {"foo": 1, "bar": 2}
names = get_names(input_dictionary) # returns ["foo", "bar"]
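You should read the Python documentation for dictionaries (see here for Python 3.5). There are multiple ways that this can be done. Note that iterkeys(), iteritems(), viewkeys() and viewitems() below exist only in Python 2; in Python 3, keys() and items() already return views and those variants were removed.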
You could use:
input_dictionary.keys(), obviously the easiest solution. In Python 3 it returns a view, so wrap it in list() if you need an actual list:
def get_names(input_dictionary):
    return list(input_dictionary.keys())
input_dictionary.iterkeys() (Python 2 only), to get an iterator over the dictionary's keys; you can then iterate over those keys to build a list of them:
def get_names(input_dictionary):
    list_of_keys = []
    for key in input_dictionary.iterkeys():
        list_of_keys.append(key)
    return list_of_keys
input_dictionary.iteritems() (Python 2 only; use items() in Python 3), which returns an iterator over the dictionary's (key, value) pairs, which you can iterate over to extract the keys:
def get_names(input_dictionary):
    list_of_keys = []
    for item in input_dictionary.iteritems():
        list_of_keys.append(item[0])
    return list_of_keys
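input_dictionary.popitem(), which removes and returns a (key, value) pair from your dictionary (an arbitrary one in Python 2; the most recently inserted one since Python 3.7), from which you can extract the key. You probably don't want this one, though, since it removes items from your dictionary: collecting every key this way would leave it empty.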
And finally, input_dictionary.viewitems() or input_dictionary.viewkeys() (Python 2.7; in Python 3, items() and keys() return the same kind of object) give you a view of the (key, value) pairs or of the keys, respectively. Any time the dictionary changes, the view reflects that change.
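A Python 3 sketch of the function the question asks for, plus a demonstration of the "live view" behaviour described above. Iterating a dict yields its keys, so list(d) is enough; get_names is the name from the question.

```python
input_dictionary = {"foo": 1, "bar": 2}


def get_names(d):
    """Return the keys of d as a new list (Python 3)."""
    return list(d)  # iterating a dict yields its keys in insertion order


names = get_names(input_dictionary)
print(names)  # ['foo', 'bar']

# keys() returns a view, not a copy: it reflects later changes to the dict.
keys_view = input_dictionary.keys()
input_dictionary["baz"] = 3
print(list(keys_view))  # ['foo', 'bar', 'baz']
print(names)            # ['foo', 'bar']  (the list copy is unaffected)
```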
input_dictionary = {"foo": 1, "bar": 2}
input_dictionary.keys()  # dict_keys(['foo', 'bar']) in Python 3; ['foo', 'bar'] in Python 2
Using keys():
>>> input_dictionary = {"foo": 1, "bar": 2}
>>> print(list(input_dictionary.keys()))
['foo', 'bar']
So a function would be:
def dictkeys(mydictionary):
    return list(mydictionary.keys())
Output:
>>> dictkeys(input_dictionary)
['foo', 'bar']
You don't really need a function for this, though, because it's the same as simply calling dictionaryname.keys() (or list(dictionaryname.keys()) if you need a list in Python 3).

Passing String, integer and tuple information as key for python dictionary

I'm trying to create a Python dictionary, and I would like to use a key that contains strings, numbers and a list/tuple entry. Ideally, the key should look like this:
("stringA", "stringB", "stringC", integer1, (integer2, integer3, integer4))
I tried to create a namedtuple based on this documentation as follows:
from collections import namedtuple
dictKey = namedtuple('dictKey', 'stringA stringB stringC integer1 (integer2 integer3 integer4)')
but it throws a ValueError saying that field names can only contain alphanumeric characters and underscores. So:
How can I create a dictionary key that contains a tuple?
How can I effectively use the dictionary key (especially the tuple it contains) to retrieve information from the dictionary?
The issue here is with your namedtuple definition, not the dictionary key structure itself, which will work just fine, e.g.:
>>> d = {}
>>> d[('1', '2', 3, (4, 5))] = 'foo'
>>> d
{('1', '2', 3, (4, 5)): 'foo'}
When namedtuple reads the field_names parameter, it thinks you're trying to create a field named "(integer2" and doesn't realise that you mean it to be a nested tuple.
To define that structure in a namedtuple, you will instead have to have an attribute that is itself a tuple:
>>> from collections import namedtuple
>>> dictKey = namedtuple("dictKey", "stringA stringB stringC integer1 tuple1")
>>> key = dictKey("foo", "bar", "baz", 1, (2, 3, 4))
>>> d[key] = 'bar'
>>> d
{('1', '2', 3, (4, 5)): 'foo',
 dictKey(stringA='foo', stringB='bar', stringC='baz', integer1=1, tuple1=(2, 3, 4)): 'bar'}
You can retrieve the value stored against the key exactly as you can for any other, either with the original namedtuple:
>>> d[key]
'bar'
or a new one:
>>> d[dictKey("foo", "bar", "baz", 1, (2, 3, 4))]
'bar'
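For the second part of the question (using the key, and especially its nested tuple, to retrieve information), here is a sketch building on the answer above. A namedtuple hashes and compares like a plain tuple with the same values, so a plain tuple works for lookups too; a list cannot appear anywhere inside the key because lists are unhashable; and querying by one component of the key means scanning the keys. The second entry ("qux", 5) is made-up data for illustration.

```python
from collections import namedtuple

dictKey = namedtuple("dictKey", "stringA stringB stringC integer1 tuple1")

d = {}
d[dictKey("foo", "bar", "baz", 1, (2, 3, 4))] = "bar"
d[dictKey("foo", "bar", "qux", 5, (2, 3, 4))] = "qux"

# A plain tuple with equal values finds the same entry.
print(d[("foo", "bar", "baz", 1, (2, 3, 4))])  # bar

# A nested list makes the whole key unhashable, so nothing is stored.
try:
    d[("foo", "bar", "baz", 1, [2, 3, 4])] = "oops"
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: unhashable type: 'list'

# Querying by one component (here the nested tuple) requires a scan of the keys.
matches = {k.stringC: v for k, v in d.items() if k.tuple1 == (2, 3, 4)}
print(matches)  # {'baz': 'bar', 'qux': 'qux'}
```

Direct lookups stay O(1) on average, while the partial-key scan is O(n); if such queries are frequent, a second dictionary indexed by that component is a common workaround.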
