How to make a "seen" hash with a Python dict?

In Perl one can do this:
my %seen;
foreach my $dir ( split /:/, $input ) {
    $seen{$dir}++;
}
This is a way to remove duplicates by keeping track of what has been "seen". In Python, however, you cannot do:
seen = {}
for x in ['one', 'two', 'three', 'one']:
    seen[x] += 1
The Python above raises KeyError: 'one'.
What is the Pythonic way of making a 'seen' hash?

Use a defaultdict to get this behavior. The catch is that you have to pass it a factory for the default value (here int, which returns 0) so that it can supply a value for keys that don't exist yet:
In [29]: from collections import defaultdict
In [30]: seen = defaultdict(int)
In [31]: for x in ['one', 'two', 'three', 'one']:
    ...:     seen[x] += 1
In [32]: seen
Out[32]: defaultdict(int, {'one': 2, 'three': 1, 'two': 1})
You can use a Counter as well:
>>> from collections import Counter
>>> seen = Counter()
>>> for x in ['one', 'two', 'three', 'one']: seen[x] += 1
...
>>> seen
Counter({'one': 2, 'three': 1, 'two': 1})
If all you need are the unique values, just use a set: set(['one', 'two', 'three', 'one'])
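Counter can also tally an iterable directly, which removes the explicit loop. A minimal sketch using only the standard library:

```python
from collections import Counter

words = ['one', 'two', 'three', 'one']

# Counter accepts any iterable and counts each element in one pass.
seen = Counter(words)

print(seen['one'])      # 2
print(seen['missing'])  # 0: missing keys default to zero, no KeyError
print(list(seen))       # ['one', 'two', 'three'] (insertion order, Python 3.7+)
```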

You could use a set:
>>> seen=set(['one', 'two', 'three', 'one'])
>>> seen
{'one', 'two', 'three'}
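A set is also the closest equivalent of the Perl idiom when the goal is to drop duplicates while keeping the original order: test membership before adding. A sketch, assuming hashable items; the function name unique_in_order and the sample path string are made up for illustration:

```python
def unique_in_order(items):
    """Yield each item the first time it is seen, preserving input order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item

# Hypothetical input, split on ':' like the Perl loop does.
dirs = "/usr/bin:/bin:/usr/bin:/sbin".split(":")
print(list(unique_in_order(dirs)))  # ['/usr/bin', '/bin', '/sbin']
```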

If you unroll seen[x] += 1 into seen[x] = seen[x] + 1, the problem with your code is obvious: you're trying to access seen[x] before you've assigned to it. Instead, you need to check if the key exists first:
seen = {}
for x in ['one', 'two', 'three', 'one']:
    if x in seen:
        seen[x] += 1  # we've seen it before, so increment
    else:
        seen[x] = 1  # first time seeing x
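The same check can be folded into a single line with dict.get, which returns a default (here 0) when the key is missing:

```python
seen = {}
for x in ['one', 'two', 'three', 'one']:
    # get() returns 0 for a key not seen yet, so no KeyError is raised
    seen[x] = seen.get(x, 0) + 1

print(seen)  # {'one': 2, 'two': 1, 'three': 1}
```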

Related

Python: select key, values from dictionary corresponding to given list

I have a dictionary like this:
d = {'cat': 'one', 'dog': 'two', 'fish': 'three'}
Given a list of values, can I keep only the key-value pairs whose value appears in that list?
Input:
l = ['one', 'three']
Output:
new_d = {'cat': 'one', 'fish': 'three'}
You can use dictionary comprehension to achieve this easily:
{k: v for k, v in d.items() if v in l}
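If l is long, converting it to a set first turns each `in` test into an average O(1) hash lookup instead of a linear scan of the list. A small variation of the comprehension above, assuming the values are hashable:

```python
d = {'cat': 'one', 'dog': 'two', 'fish': 'three'}
l = ['one', 'three']

wanted = set(l)  # hash-based membership tests
new_d = {k: v for k, v in d.items() if v in wanted}
print(new_d)  # {'cat': 'one', 'fish': 'three'}
```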
The scenario you've described is a perfect use case for the in operator, which tests whether a value is a member of a collection such as a list.
The code below demonstrates the concept; for practical use, prefer a dictionary comprehension.
d = {'cat': 'one', 'dog': 'two', 'fish': 'three'}
l = ['one', 'three']
d_output = {}
for k, v in d.items():  # Loop through input dictionary
    if v in l:  # Check if the value is included in the given list
        d_output[k] = v  # Assign the key: value to the output dictionary
print(d_output)
Output is:
{'cat': 'one', 'fish': 'three'}
You can copy your dictionary and drop unwanted elements:
d = {'cat': 'one', 'dog': 'two', 'fish': 'three'}
l = ['one', 'three']
new_d = d.copy()
for element in d:
    if d[element] not in l:
        new_d.pop(element)
print(d)
print(new_d)
Output is:
{'cat': 'one', 'dog': 'two', 'fish': 'three'}
{'cat': 'one', 'fish': 'three'}

lambda in lambda

Can anybody explain what I am doing wrong here?
multiArray = [
    ['one', 'two', 'three', 'four', 'five'],
    ['one', 'two', 'three', 'four', 'five'],
    ['one', 'two', 'three', 'four', 'five']
]
search = 'four'
p1 = list(filter(lambda outerEle: search == outerEle, multiArray[0]))
p = list(filter(lambda multiArrayEle: list(filter(lambda innerArrayEle: search == innerArrayEle, multiArrayEle)), multiArray))
print(p1)
print(p)
The result I am getting is:
['four']
[['one', 'two', 'three', 'four', 'five'], ['one', 'two', 'three', 'four', 'five'], ['one', 'two', 'three', 'four', 'five']]
whereas I expect:
[['four'],['four'],['four']]
In your second filter, the predicate returns a list rather than a bool (as it does in your first filter). filter then uses the truth value of that list, just as if you had called the built-in bool on it, and for a list l, bool(l) is True exactly when l is non-empty:
In [4]: bool([])
Out[4]: False
In [5]: bool(['a'])
Out[5]: True
This allows you to pick out, for example, all the non-empty lists in a list of lists:
In [6]: ls = [['a'], [], ['b']]
In [7]: list(filter(lambda l: l, ls))
Out[7]: [['a'], ['b']]
So in your case, the outer filter keeps every inner list in which 'four' appears, which here is all of them.
From your given example, it's not immediately obvious what you are trying to achieve as all the inputs are identical, but my guess is that it's something like the following:
In [19]: multiArray = [
...: ['one', 'two', 'three', 'four', 'five', 'four'],
...: ['one', 'two', 'three', 'for', 'five'],
...: ['one', 'two', 'three', 'four', 'five']
...: ]
In [20]: [list(filter(lambda x: x == search, l)) for l in multiArray]
Out[20]: [['four', 'four'], [], ['four']]
While @fuglede's answer explains the behavior, you can achieve the result you want by changing your outer filter to map:
p = list(map(lambda multiArrayEle: list(filter(lambda innerArrayEle: search == innerArrayEle, multiArrayEle)), multiArray))
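The same map-plus-filter can also be written as a nested list comprehension, which some find easier to read:

```python
multiArray = [
    ['one', 'two', 'three', 'four', 'five'],
    ['one', 'two', 'three', 'four', 'five'],
    ['one', 'two', 'three', 'four', 'five'],
]
search = 'four'

# Outer comprehension maps over the rows; inner one filters each row.
p = [[ele for ele in row if ele == search] for row in multiArray]
print(p)  # [['four'], ['four'], ['four']]
```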

List all elements, but only one of duplicated elements?

Say I have a list of strings such as
words = ['one', 'two', 'one', 'three', 'three']
I want to create a new list in alphabetical order like
newList = ['one', 'three', 'two']
Does anyone have a solution? I have seen suggestions that output the duplicates, but I cannot figure out how to achieve this particular goal (or maybe I just don't know how to google this well).
Throw the contents into a set to remove duplicates and sort:
newList = sorted(set(words))
Or, equivalently, using set-display unpacking (Python 3.5+):
newList=sorted({*words})
If the order of the elements in words matters to you, you can try this:
from collections import OrderedDict
words = ['one', 'two', 'one', 'three', 'three']
w1 = OrderedDict()
for i in words:
    if i in w1:
        w1[i] += 1
    else:
        w1[i] = 1
print(list(w1.keys()))
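On Python 3.7+ a plain dict preserves insertion order, so dict.fromkeys gives the same first-occurrence de-duplication without OrderedDict or a counting loop (assuming the items are hashable):

```python
words = ['one', 'two', 'one', 'three', 'three']

# Keys keep the order in which they were first inserted (Python 3.7+).
unique_words = list(dict.fromkeys(words))
print(unique_words)  # ['one', 'two', 'three']
```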

Strict Comparison of Dictionaries in Python

I'm having a little trouble comparing two similar dictionaries. I would like a stricter comparison of the values (and probably the keys).
Here's the really basic problem:
>>> {'a': True} == {'a': 1}
True
Similarly (and somewhat confusingly):
>>> {1: 'a'} == {True: 'a'}
True
This makes sense because True == 1. What I'm looking for is something that behaves more like is, but compares two possibly nested dictionaries. Obviously you can't use is on the two dictionaries themselves, because that returns False for two distinct dict objects even if all of their elements are identical.
My current solution is to just use json.dumps to get a string representation of both and compare that.
>>> json.dumps({'a': True}, sort_keys=True) == json.dumps({'a': 1}, sort_keys=True)
False
But this only works if everything is JSON-serializable.
I also tried comparing all of the keys and values manually:
>>> l = {'a': True}
>>> r = {'a': 1}
>>> r.keys() == l.keys() and all(l[key] is r[key] for key in l.keys())
False
But this fails if the dictionaries have some nested structure. I figured I could write a recursive version of this to handle the nested case, but it seemed unnecessarily ugly and un-pythonic.
Is there a "standard" or simple way of doing this?
Thanks!
You were pretty close with JSON: use Python's pprint module instead. It is documented to sort dictionaries by key in Python 2.5+ and 3:
Dictionaries are sorted by key before the display is computed.
Let's confirm this. Here's a session in Python 3.6 (which conveniently preserves insertion order even for regular dict objects):
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {2: 'two', 3: 'three', 1: 'one'}
>>> b = {3: 'three', 2: 'two', 1: 'one'}
>>> a
{2: 'two', 3: 'three', 1: 'one'}
>>> b
{3: 'three', 2: 'two', 1: 'one'}
>>> a == b
True
>>> c = {2: 'two', True: 'one', 3: 'three'}
>>> c
{2: 'two', True: 'one', 3: 'three'}
>>> a == b == c
True
>>> from pprint import pformat
>>> pformat(a)
"{1: 'one', 2: 'two', 3: 'three'}"
>>> pformat(b)
"{1: 'one', 2: 'two', 3: 'three'}"
>>> pformat(c)
"{True: 'one', 2: 'two', 3: 'three'}"
>>> pformat(a) == pformat(b)
True
>>> pformat(a) == pformat(c)
False
>>>
And let's quickly confirm that pretty-printing sorts nested dictionaries:
>>> a['b'] = b
>>> a
{2: 'two', 3: 'three', 1: 'one', 'b': {3: 'three', 2: 'two', 1: 'one'}}
>>> pformat(a)
"{1: 'one', 2: 'two', 3: 'three', 'b': {1: 'one', 2: 'two', 3: 'three'}}"
>>>
So, instead of serializing to JSON, serialize using pprint.pformat(). I imagine there may be some corner cases where two objects that you want to consider unequal nevertheless create the same pretty-printed representation. But those cases should be rare, and you wanted something simple and Pythonic, which this is.
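Wrapped in a small helper, the idea looks like this. The name same_repr is hypothetical, and the check inherits the corner-case caveat above, since it compares printed representations rather than the objects themselves:

```python
from pprint import pformat

def same_repr(a, b):
    """Compare two (possibly nested) objects by their pretty-printed form.

    pformat sorts dict keys, so insertion order does not matter, while
    True and 1 (or 1 and 1.0) print differently and therefore compare unequal.
    """
    return pformat(a) == pformat(b)

print(same_repr({'a': True}, {'a': 1}))                             # False
print(same_repr({1: 'a'}, {True: 'a'}))                             # False
print(same_repr({'x': {2: 'b', 1: 'a'}}, {'x': {1: 'a', 2: 'b'}}))  # True
```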
You can test identity of all (key, value) pairs element-wise:
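def equal_dict(d1, d2):
    # Same size, and each (key, value) pair identical, compared in iteration order
    return len(d1) == len(d2) and all(
        (k1 is k2) and (v1 is v2)
        for (k1, v1), (k2, v2) in zip(d1.items(), d2.items()))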
>>> equal_dict({True: 'a'}, {True: 'a'})
True
>>> equal_dict({1: 'a'}, {True: 'a'})
False
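This works reliably for True, False and None; for int, float and str it depends on CPython caching and interning, since equal values are not guaranteed to be the same object. It also requires both dicts to iterate in the same order, and it does not handle nested sequences or more complex objects.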
Anyway, that's a start if you need it.
I think you are looking for something like this. However, since you didn't provide example data, I won't guess what it could be:
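from boltons.iterutils import remap

def compare(A, B):
    return A == B and type(A) == type(B)

dict_to_compare_against = {}  # placeholder: fill in the reference dict
other_dict = {}  # placeholder: the dict being checked

def visit(path, key, value):
    # path holds the keys of value's parents, so walk down to the parent first
    cur = dict_to_compare_against
    for i in path:
        cur = cur[i]
    if not compare(cur[key], value):
        raise Exception("Not equal")
    return True  # tell remap to keep the item unchanged

remap(other_dict, visit=visit)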
You can use isinstance() to distinguish between a plain dictionary entry and a nested dictionary entry. That way you can iterate through, using is to compare strictly, while also detecting when you need to descend a level into a nested dictionary.
https://docs.python.org/3/library/functions.html#isinstance
myDict = {'a': True, 'b': False, 'c': {'a': True}}
for key, value in myDict.items():
    if isinstance(value, dict):
        pass  # nested dict: recurse into value here
    else:
        pass  # plain value: compare strictly here
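A minimal recursive sketch of that approach follows; strict_equal is a hypothetical helper that only descends into dicts, lists and tuples. Instead of is, it swaps in == plus an exact type check, because equal ints, floats or strings are not guaranteed to be the same object:

```python
def strict_equal(a, b):
    """Recursively compare a and b, requiring equal values AND identical types.

    Only dicts, lists and tuples are descended into; anything else is
    compared with == after an exact type check.
    """
    if type(a) is not type(b):
        return False  # e.g. True vs 1, or 1 vs 1.0
    if isinstance(a, dict):
        if len(a) != len(b):
            return False
        # Key each entry by (type, key) so that 1 and True count as different keys.
        b_typed = {(type(k), k): v for k, v in b.items()}
        for k, v in a.items():
            typed_key = (type(k), k)
            if typed_key not in b_typed or not strict_equal(v, b_typed[typed_key]):
                return False
        return True
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(strict_equal(x, y) for x, y in zip(a, b))
    return a == b

print(strict_equal({'a': True}, {'a': 1}))                        # False
print(strict_equal({1: 'a'}, {True: 'a'}))                        # False
print(strict_equal({'c': {'a': [1, 2]}}, {'c': {'a': [1, 2]}}))  # True
```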

Dict merge in a dict comprehension

In Python 3.5+, we can merge dicts using double-splat unpacking:
>>> d1 = {1: 'one', 2: 'two'}
>>> d2 = {3: 'three'}
>>> {**d1, **d2}
{1: 'one', 2: 'two', 3: 'three'}
Cool. It doesn't seem to generalise to dynamic use cases, though:
>>> ds = [d1, d2]
>>> {**d for d in ds}
SyntaxError: dict unpacking cannot be used in dict comprehension
Instead we have to do reduce(lambda x, y: {**x, **y}, ds, {}) (with reduce imported from functools), which seems a lot uglier. Why is the "one obvious way to do it" not allowed by the parser, when there doesn't seem to be any ambiguity in that expression?
It's not exactly an answer to your question but I'd consider using ChainMap to be an idiomatic and elegant way to do what you propose (merging dictionaries in-line):
>>> from collections import ChainMap
>>> d1 = {1: 'one', 2: 'two'}
>>> d2 = {3: 'three'}
>>> ds = [d1, d2]
>>> dict(ChainMap(*ds))
{1: 'one', 2: 'two', 3: 'three'}
It's not a particularly transparent solution, though, since many programmers might not know exactly how a ChainMap works. Note that (as @AnttiHaapala points out) "first found is used", so depending on your intentions you might need to call reversed on your list before passing the dicts into ChainMap.
>>> d2 = {3: 'three', 2: 'LOL'}
>>> ds = [d1, d2]
>>> dict(ChainMap(*ds))
{1: 'one', 2: 'two', 3: 'three'}
>>> dict(ChainMap(*reversed(ds)))
{1: 'one', 2: 'LOL', 3: 'three'}
To me, the obvious way is:
d_out = {}
for d in ds:
    d_out.update(d)
This is quick and probably quite performant. I can't speak for the Python developers, but I'm not sure your expected version is easier to read. For example, your comprehension looks more like a set comprehension to me because it lacks a :. FWIW, I don't think there is any technical reason (e.g. parser ambiguity) that they couldn't add that form of comprehension unpacking.
Apparently, these forms were proposed, but didn't have universal enough support to warrant implementing them (yet).
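For what it's worth, Python 3.9 later added dedicated merge operators for dicts (PEP 584): | builds a new dict and |= updates one in place, with the right-hand dict winning on duplicate keys. A sketch that requires Python 3.9+:

```python
from functools import reduce
import operator

d1 = {1: 'one', 2: 'two'}
d2 = {3: 'three', 2: 'LOL'}
ds = [d1, d2]

# Python 3.9+: | merges two dicts; the right-hand operand wins on conflicts.
merged = reduce(operator.or_, ds, {})
print(merged)  # {1: 'one', 2: 'LOL', 3: 'three'}

# In-place variant of the update() loop above.
d_out = {}
for d in ds:
    d_out |= d
```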
You could use itertools.chain or itertools.chain.from_iterable:
import itertools
ds = [{'a': 1, 'b': 2}, {'c': 30, 'b': 40}]
merged_d = dict(itertools.chain(*(d.items() for d in ds)))
print(merged_d) # {'a': 1, 'b': 40, 'c': 30}
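The chain.from_iterable variant mentioned above takes the generator as a single argument instead of star-unpacking it, so the item views are consumed lazily:

```python
from itertools import chain

ds = [{'a': 1, 'b': 2}, {'c': 30, 'b': 40}]

# from_iterable pulls each dict's items one after another; later values win.
merged_d = dict(chain.from_iterable(d.items() for d in ds))
print(merged_d)  # {'a': 1, 'b': 40, 'c': 30}
```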
This builds on that solution (also mentioned by @ilgia-everilä), but makes it Py2-compatible while still avoiding intermediate structures. Encapsulating it in a function keeps its use quite readable.
def merge_dicts(*dicts, **extra):
    """
    >>> merge_dicts(dict(a=1, b=1), dict(b=2, c=2), dict(c=3, d=3), d=4, e=4)
    {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 4}
    """
    return dict((
        (k, v)
        for d in dicts
        for k, v in d.items()
    ), **extra)
Idiomatic, without ChainMap:
>>> d1 = {1: 'one', 2: 'two'}
>>> d2 = {3: 'three'}
>>> {k: v for d in [d1, d2] for k, v in d.items()}
{1: 'one', 2: 'two', 3: 'three'}
You could define this function:
from collections import ChainMap
def mergeDicts(l):
    return dict(ChainMap(*reversed(list(l))))
You can then use it like this:
>>> d1 = {1: 'one', 2: 'two'}
>>> d2 = {3: 'three'}
>>> ds = [d1, d2]
>>> mergeDicts(ds)
{1: 'one', 2: 'two', 3: 'three'}
