Iterating over dictionaries within dictionaries, dictionary object turning into string? - python

test = {'a':{'aa':'value','ab':'value'},'b':{'aa':'value','ab':'value'}}
#test 1
for x in test:
    print(x['aa'])
#test 2
for x in test:
    print(test[x]['aa'])
Why does test 1 give me a TypeError: string indices must be integers but test 2 pass?
Does the for loop turn the dictionary into a string?

If you iterate over a dictionary, you iterate over its keys. So in the first iteration x = 'a', and in the second x = 'b' (dictionaries preserve insertion order since Python 3.7; in older versions the order was arbitrary). The loop thus simply "ignores" the values. Indexing a string with a string makes no sense (there is no straightforward interpretation for 'a'['aa'], or at least none I can come up with that would be "logical" for a significant number of programmers), hence the TypeError.
Although this may look quite strange, it is consistent with the fact that a membership check, for example, also works on the keys (if we write 'a' in some_dict, it does not look at the values either).
If you want to use the values, you need to iterate over .values(), so:
for x in test.values():
    print(x['aa'])
Your second test works because there x is a key (for example 'a'), and hence test[x] fetches the corresponding value. If you then process test[x] further, you are processing the values of the dictionary.
You can iterate concurrently over keys and values with .items():
for k, x in test.items():
    # ...
    pass
Here in the first iteration k will be 'a' and x will be {'aa':'value','ab':'value'}, in the second iteration k will be 'b' and x will be {'aa':'value','ab':'value'} (this follows insertion order in Python 3.7+; in older versions the iterations could be swapped, since dictionaries were unordered).
If you thus are interested in the outer key, and the value that is associated with the 'aa' key of the corresponding subdictionary, you can use:
for k, x in test.items():
    v = x['aa']
    print(k, v)
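The three iteration styles described above can be compared side by side. This is a minimal sketch reusing the test dictionary from the question; the result names (keys, inner_values, pairs, has_key, has_value) are only illustrative.

```python
test = {'a': {'aa': 'value', 'ab': 'value'}, 'b': {'aa': 'value', 'ab': 'value'}}

# Plain iteration yields the keys, which are strings here.
keys = [x for x in test]

# .values() yields the inner dictionaries, so they can be indexed by 'aa'.
inner_values = [x['aa'] for x in test.values()]

# .items() yields (key, value) pairs.
pairs = [(k, x['aa']) for k, x in test.items()]

# Membership tests also look at the keys only.
has_key = 'a' in test        # 'a' is an outer key
has_value = 'value' in test  # 'value' is not an outer key

print(keys, inner_values, pairs, has_key, has_value)
```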

When you iterate over a dictionary with a for, you're not iterating over the items, but over the keys ('a', 'b'). These are just strings that mean nothing. That's why you have to do it as on test 2. You could also iterate over the items with test.items().

Related

Python pick a random value from hashmap that has a list as value?

so I have a defaultdict(list) hashmap, potential_terms
potential_terms={9: ['leather'], 10: ['type', 'polyester'], 13:['hello','bye']}
What I want to output is the 2 values (words) with the lowest keys, so 'leather' is definitely the first output, but 'type' and 'polyester' both have k=10, when the key is the same, I want a random choice either 'type' or 'polyester'
What I did is:
out=[v for k,v in sorted(potential_terms.items(), key=lambda x:(x[0],random.choice(x[1])))][:2]
but when I print out I get :
[['leather'], ['type', 'polyester']]
My guess is that the problem is, of course, the second part of the lambda function: random.choice(x[1]). Any ideas on how to make it work as expected, by outputting either 'type' or 'polyester'?
Thanks
EDIT: See Karl's answer and comment as to why this solution isn't correct for OP's problem.
I leave it here because it does demonstrate what OP originally got wrong.
key= doesn't transform the data itself, it only tells sorted how to sort,
you want to apply choice on v when selecting it for the comprehension, like so:
out=[random.choice(v) for k,v in sorted(potential_terms.items())[:2]]
(I also moved the [:2] inside, to shorten the list before the comprehension)
Output:
['leather', 'type']
OR
['leather', 'polyester']
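To see that key= only influences the ordering and never the returned elements, here is a small illustration on unrelated sample data (the list words is made up for this sketch):

```python
words = ['bb', 'a', 'ccc']

# key=len decides the order, but the result still contains the original strings.
by_length = sorted(words, key=len)

# To get transformed values, apply the transformation in the comprehension itself.
lengths = [len(w) for w in by_length]

print(by_length, lengths)
```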
You have (with some extra formatting to highlight the structure):
out = [
    v
    for k, v in sorted(
        potential_terms.items(),
        key=lambda x:(x[0], random.choice(x[1]))
    )
][:2]
This means (reading from the inside out): sort the items according to the key, breaking ties using a random choice from the value list. Extract the values (which are lists) from those sorted items into a list (of lists). Finally, get the first two items of that list of lists.
This doesn't match the problem description, and is also somewhat nonsensical: since the keys are, well, keys, there cannot be duplicates, and thus there cannot be ties to break.
What we wanted: sort the items according to the key, then put all the contents of those individual lists next to each other to make a flattened list of strings, but randomizing the order within each sublist (i.e., shuffling those sublists). Then, get the first two items of that list of strings.
Thus, applying the technique from the link, and shuffling the sublists "inline" as they are discovered by the comprehension:
out = [
    term
    for k, v in sorted(
        potential_terms.items(),
        key = lambda x:x[0] # this is not actually necessary now,
        # since the natural sort order of the items will work.
    )
    for term in random.sample(v, len(v))
][:2]
Please also see https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/ to understand how the list flattening and result ordering works in a two-level comprehension like this.
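As a rough sketch, the same idea can be wrapped in a function. The name lowest_terms and its parameter n are hypothetical, not part of the question, and itertools.islice is swapped in for the [:2] slice so that the flattening can stop early.

```python
import random
from itertools import islice

def lowest_terms(terms_by_key, n=2):
    """Return n terms, lowest keys first, with terms under one key in random order."""
    flattened = (
        term
        for key in sorted(terms_by_key)
        # random.sample returns a shuffled copy and leaves the original list untouched
        for term in random.sample(terms_by_key[key], len(terms_by_key[key]))
    )
    return list(islice(flattened, n))

potential_terms = {9: ['leather'], 10: ['type', 'polyester'], 13: ['hello', 'bye']}
out = lowest_terms(potential_terms)
print(out)  # 'leather' first, then either 'type' or 'polyester'
```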
Instead of building out with a comprehension, a simpler approach is:
d = list(potential_terms.values()), which stores all the values.
It will store the values as:
[['leather'], ['type', 'polyester'], ['hello', 'bye']]
You can access ['leather'] as d[0] and the list ['type', 'polyester'] as d[1]. Now we just use random.shuffle(d[1]) and take d[1][0],
which gets us a random word, either 'type' or 'polyester'. (This relies on the dictionary already being in ascending key order and on the tie being in the second list.)
Final code should be like this:
import random
potential_terms={9: ['leather'], 10: ['type', 'polyester'], 13:['hello','bye']}
d = list(potential_terms.values())
random.shuffle(d[1])
c = []
c.append(d[0][0])
c.append(d[1][0])
Which gives the desired output,
either ['leather', 'polyester'] or ['leather', 'type'].

Tuple-key dictionary in python: Accessing a whole block of entries

I am looking for an efficient python method to utilise a hash table that has two keys:
E.g.:
(1,5) --> {a}
(2,3) --> {b,c}
(2,4) --> {d}
Further I need to be able to retrieve whole blocks of entries, for example all entries that have "2" at the 0-th position (here: (2,3) as well as (2,4)).
In another post it was suggested to use list comprehension, i.e.:
sum(val for key, val in dict.items() if key[0] == 'B')
I learned that dictionaries are (probably?) the most efficient way to retrieve a value from an object of key:value-pairs. However, calling only an incomplete tuple-key is a bit different than querying the whole key where I either get a value or nothing. I want to ask if python can still return the values in a time proportional to the number of key:value-pairs that match? Or alternatively, is the tuple-dictionary (plus list comprehension) better than using pandas.df.groupby() (but that would occupy a bit much memory space)?
The "standard" way would be something like
from random import randint
d = {(randint(1,10),i):"something" for i in range(200)}
def byfilter(n,d):
    return list(filter(lambda x:x[0]==n, d.keys()))
byfilter(5,d) ##returns a list of tuples where x[0] == 5
Although in similar situations I often used next() to iterate manually, when I didn't need the full list.
However there may be some use cases where we can optimize that. Suppose you need to do a couple or more accesses by key first element, and you know the dict keys are not changing meanwhile. Then you can extract the keys in a list and sort it, and make use of some itertools functions, namely dropwhile() and takewhile():
from itertools import dropwhile, takewhile
ls = [x for x in d.keys()]
ls.sort() ##I do not know why but this seems faster than ls=sorted(d.keys())
def bysorted(n,ls):
    return list(takewhile(lambda x: x[0]==n, dropwhile(lambda x: x[0]!=n, ls)))
bysorted(5,ls) ##returns the same list as above
This can be up to 10x faster in the best case (n=1 in my example) and take more or less the same time in the worst case (n=10), because we are trimming the number of iterations needed.
Of course you can do the same for accessing keys by x[1]; you just need to add a key parameter to the sort() call.
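If the keys are kept in a sorted list as above, a binary search can jump straight to the block instead of scanning up to it. The following is only a sketch of that variation (it swaps bisect in for dropwhile/takewhile); the function name keys_with_first is hypothetical, and it assumes the first tuple element is an integer so that n + 1 is the next possible value.

```python
from bisect import bisect_left

def keys_with_first(n, sorted_keys):
    """Return all tuple keys whose first element equals n (sorted_keys must be sorted)."""
    # (n,) sorts before every (n, anything); (n + 1,) sorts after all of them.
    lo = bisect_left(sorted_keys, (n,))
    hi = bisect_left(sorted_keys, (n + 1,))
    return sorted_keys[lo:hi]

d = {(1, 5): {'a'}, (2, 3): {'b', 'c'}, (2, 4): {'d'}}
ls = sorted(d)  # must be rebuilt whenever the keys of d change

block = keys_with_first(2, ls)
values = [d[key] for key in block]
print(block, values)
```

The lookup costs two binary searches plus the size of the matching block, at the price of keeping the sorted list in sync with the dictionary.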

Why does "in" work for keys but not for strings?

I came across a very weird thing with strings and dictionaries in Python today. Can someone explain to me why the print statement works in the first for loop but fails in the second for loop?
test = 'ab'
test_dict = {}
test_dict[test] = 1
for x, y in test_dict:
    print('%s %s' % (x,y))
for x,y in test:
    print('%s %s' % (x,y))
Both loops are broken. The first one only happens to work due to the very specific coincidence that test is exactly two characters long, and so can be unpacked into two variables x and y.
To iterate over a dict's keys and values, write:
for k,v in d.items():
    ...
If you just want the keys you can do:
for k in d:
    ...
In detail, when you loop over a dict it iterates over the keys.
for x,y in test_dict
The dict has exactly one key, "ab". So on the first and only iteration, it assigns that string to x and y as if you'd written:
x,y = "ab"
As it happens, this is a valid unpacking. Two variables on the left, a two-item container on the right. x becomes "a" and y becomes "b".
If test were longer or shorter, the first loop would also crash, with either "need more than N values to unpack" or "too many values to unpack" (Python 3 words the former as "not enough values to unpack").
Why is the string unpacked in 1 scenario but not unpacked in the other?
The second loop iterates over the string "ab" directly. When you iterate over a string it breaks the string into single-character strings. The first iteration is "a" and the second is "b". On that first iteration, it tries to do:
x,y = "a"
This assignment fails with "need more than 1 value to unpack" (Python 3: "not enough values to unpack (expected 2, got 1)") because there are two variables on the left and only one character on the right.
For the dictionary case, you are iterating dictionary keys. for x, y in test_dict means "for each key in test_dict take the key and unpack to variables x and y". Since the only key is 'ab', the string is unpacked to x = 'a' and y = 'b'. Of course, this works specifically because your only string key has length 2.
For the string case, you are iterating a string. for x, y in test will fail. You can't say "for each character in test unpack to multiple variables" because a single character contains only one item to unpack. Instead, you will meet:
ValueError: not enough values to unpack (expected 2, got 1)
for x,y in test:
    print('%s %s' % (x,y))
since test is a list-like object (a string in python is list-like), iterating over it takes each character in turn. A character is not a list-like object of length 2, so trying to split it into x and y produces an error.
If you had test = ("ab", "bc") then test would be a tuple containing pairs of characters, which could be split using the expression above.
That's a very short answer, but I hope it clarifies what's going on.
The reason this works in the dict case is a little more complicated, but not very complicated. When you iterate over a dict in python, you actually iterate over its keys. This means that you have a list of one item, which is a string of length 2. As you saw above, a string of length 2 can be unpacked into its first and second characters, which is why the statement works.
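The behaviour of both loops can be reproduced without crashing by catching the exception. This is a small sketch based on the question's variables; unpacked_keys, error and items are names introduced here for illustration.

```python
test = 'ab'
test_dict = {test: 1}

# Iterating the dict yields the key 'ab', which unpacks into 'a' and 'b'.
unpacked_keys = [(x, y) for x, y in test_dict]

# Iterating the string yields 'a' first, which cannot fill two variables.
try:
    for x, y in test:
        pass
    error = None
except ValueError as exc:
    error = str(exc)

# What was probably intended: keys together with their values.
items = [(k, v) for k, v in test_dict.items()]

print(unpacked_keys, error, items)
```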

How to look for elements in a set that satisfy a certain condition on the element itself?

I have a set in which every element is a tuple of 2 tuples, each of the tuples contains itself 3 elements/fields, for example an element of the set would look like:
(('q0','l0','s0'),('q1','l1','s1'))
I need to look for specific fields of the elements of the set.
The way i do this now is:
for set_element in my_set:
    s0 = set_element[0]
    s1 = set_element[1]
    if s0 == ('q0','l0','s0'):
        "add this set_element to another set"
Now this works, but with a really high number of elements of the set I don't find this very efficient because every time I have to iterate through all the elements of the set and I can't exploit the efficiency of sets.
Is there a more efficient way to do this? Consider that I may also need to access just one specific field, like 'q0'.
edit:
I'll make a more detailed example, let's assume I have this set of elements:
x = [(('q0','l0','s0'),('q1','l1','s1')),
     (('q0','l1','s0'),('q1','l2','s2')),
     (('q0','l0','s4'),('q1','l1','s1')),
     (('q2','l2','s2'),('q3','l3','s3')),
     (('q4','l4','s4'),('q5','l5','s5'))]
and that I want to extract all the elements in which the first tuple has q0 as an element, so in this case the result would be
(('q0','l0','s0'),('q1','l1','s1'))
(('q0','l1','s0'),('q1','l2','s2'))
(('q0','l0','s4'),('q1','l1','s1'))
You may see a ~30% speed improvement by using a set comprehension instead of an explicit loop:
x = [(('q0','l0','s0'),('q1','l1','s1')),
     (('q0','l1','s0'),('q1','l2','s2')),
     (('q0','l0','s4'),('q1','l1','s1')),
     (('q2','l2','s2'),('q3','l3','s3')),
     (('q4','l4','s4'),('q5','l5','s5'))]
x = x*100000
def original(x):
    res = set()
    for set_element in x:
        s0 = set_element[0]
        if s0[0] == 'q0':
            res.add(set_element)
    return res
def jp(x):
    return {k for k in x if k[0][0] == 'q0'}
%timeit original(x) # 110ms
%timeit jp(x) # 77ms
I would use a dictionary instead of a tuple of a tuple, since you are only comparing the first item in the tuple pair, we can use that as a key, and use the second element in the tuple as the value. Then you can simply take advantage of the dictionary look up speed.
x = [(('q0','l0','s0'),('q1','l1','s1')),(('q2','l2','s2'),('q3','l3','s3')),(('q4','l4','s4'),('q5','l5','s5'))]
d = {a:b for a,b in x}
if ('q0','l0','s0') in d.keys():
    print(d[('q0','l0','s0')]) # or do something like adding it to another set
Note this only works if the first element of the tuple pairs are unique. If they aren't you have to slightly change how the dictionary is created, by checking if the key exist, and if it does then create a list of values instead, but this is getting off topic and just speculating details.
If you only need to search for q0, you can simply iterate through the keys and check whether q0 belongs to each key. You won't be able to get around this unless you flatten your tuples and use the individual values as keys:
for each in d.keys():
    if 'q0' in each:
        print(d[each])
If this isn't what you are looking for, then a dictionary might not be the solution for you. You could also store only q0 as the key, forgoing the whole outer for each in d.keys() loop, but once again it's up to your requirements, and you said you need to be able to search for both ('q0','l0','s0') and q0.
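When the first tuples (or their first fields) are not unique, one option is to group the elements once and then look them up by key. This is a sketch only, assuming lookups are needed both by the full first tuple and by its first field; by_first and by_field are hypothetical names.

```python
from collections import defaultdict

x = [(('q0','l0','s0'),('q1','l1','s1')),
     (('q0','l1','s0'),('q1','l2','s2')),
     (('q0','l0','s4'),('q1','l1','s1')),
     (('q2','l2','s2'),('q3','l3','s3')),
     (('q4','l4','s4'),('q5','l5','s5'))]

by_first = defaultdict(set)  # full first tuple -> matching elements
by_field = defaultdict(set)  # first field of the first tuple -> matching elements
for element in x:
    by_first[element[0]].add(element)
    by_field[element[0][0]].add(element)

print(by_field['q0'])                # elements whose first tuple starts with 'q0'
print(by_first[('q0', 'l0', 's0')])  # elements whose first tuple is exactly this one
```

Building the two indexes takes one pass over the data; after that each lookup is a plain dictionary access rather than a scan of every element.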
You can try these methods if you want. First, with filter:
x= [(('q0','l0','s0'),('q1','l1','s1')),(('q0','l1','s0'),
('q1','l2','s2')),(('q0','l0','s4'),('q1','l1','s1')),
(('q2','l2','s2'),('q3','l3','s3')),(('q4','l4','s4'),('q5','l5','s5'))]
print(list(filter(lambda x:x[0][0]=='q0',x)))
Second, recursively (note that Python's recursion limit makes this unsuitable for long lists):
def recursive(data):
    if not data:  # base case: nothing left to check
        return
    data1 = data[0]
    if data1[0][0] == 'q0':
        print(data1)
    recursive(data[1:])
recursive(x)

List Comprehension of Lists Nested in Dictionaries

I have a dictionary where each value is a list, like so:
dictA = {1:['a','b','c'],2:['d','e']}
Unfortunately, I cannot change this structure to get around my problem
I want to gather all of the entries of the lists into one single list, as follows:
['a','b','c','d','e']
Additionally, I want to do this only once within an if-block. Since I only want to do it once, I do not want to store it to an intermediate variable, so naturally, a list comprehension is the way to go. But how? My first guess,
[dictA[key] for key in dictA.keys()]
yields,
[['a','b','c'],['d','e']]
which does not work because
'a' in [['a','b','c'],['d','e']]
yields False. Everything else I've tried has used some sort of illegal syntax.
How might I perform such a comprehension?
Loop over the returned list too (looping directly over a dictionary gives you keys as well):
[value for key in dictA for value in dictA[key]]
or more directly using dictA.itervalues() (Python 2 only; in Python 3 the equivalent is dictA.values()):
[value for lst in dictA.itervalues() for value in lst]
List comprehensions let you nest loops; read the above loops as if they are nested in the same order:
for lst in dictA.itervalues():
    for value in lst:
        # append value to the output list
Or use itertools.chain.from_iterable():
from itertools import chain
list(chain.from_iterable(dictA.itervalues()))
The latter takes a sequence of sequences and lets you loop over them as if they were one big list. dictA.itervalues() gives you a sequence of lists, and chain() puts them together for list() to iterate over and build one big list out of them.
If all you are doing is testing for membership among all the values, then what you really want is a simple way to loop over all the values, testing your value against each until you find a match. The any() function together with a suitable generator expression does just that:
any('a' in lst for lst in dictA.itervalues())
This will return True as soon as any value in dictA has 'a' listed, and stop looping over .itervalues() early.
If you're actually checking for membership (your a in... example), you could rewrite it as:
if any('a' in val for val in dictA.itervalues()):
    # do something
This saves having to flatten the list if that's not actually required.
In this particular case, you can just use a nested comprehension:
[value for key in dictA.keys() for value in dictA[key]]
But in general, if you've already figured out how to turn something into a nested list, you can flatten any nested iterable with chain.from_iterable:
itertools.chain.from_iterable(dictA[key] for key in dictA.keys())
This returns an iterator, not a list; if you need a list, just do it explicitly:
list(itertools.chain.from_iterable(dictA[key] for key in dictA.keys()))
As a side note, for key in dictA.keys() does the same thing as for key in dictA, except that in older versions of Python, it will waste time and memory making an extra list of the keys. As the documentation says, iter on a dict is the same as iterkeys.
So, in all of the versions above, it's better to just use in dictA instead.
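For readers on Python 3, where itervalues() no longer exists, the approaches above might look like the following sketch. It uses dictA.values() throughout; the names flat, flat_chain and found are introduced here for illustration.

```python
from itertools import chain

dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}

# Nested comprehension over the values.
flat = [value for lst in dictA.values() for value in lst]

# Same result with chain.from_iterable, which returns an iterator.
flat_chain = list(chain.from_iterable(dictA.values()))

# Membership test without building the flat list at all.
found = any('a' in lst for lst in dictA.values())

print(flat, flat_chain, found)
```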
Written out as plain loops, just for understanding, this might be helpful:
ListA=[]
dictA = {1:['a','b','c'],2:['d','e']}
for keys in dictA:
    for values in dictA[keys]:
        ListA.append(values)
You can do something like:
output_list = []
for x in {1:['a','b','c'],2:['d','e']}.values():
    output_list.extend(x)
output_list will be ['a', 'b', 'c', 'd', 'e']
