I have a dictionary where each value is a list, like so:
dictA = {1:['a','b','c'],2:['d','e']}
Unfortunately, I cannot change this structure to get around my problem
I want to gather all of the entries of the lists into one single list, as follows:
['a','b','c','d','e']
Additionally, I want to do this only once within an if-block. Since I only want to do it once, I do not want to store it to an intermediate variable, so naturally, a list comprehension is the way to go. But how? My first guess,
[dictA[key] for key in dictA.keys()]
yields,
[['a','b','c'],['d','e']]
which does not work because
'a' in [['a','b','c'],['d','e']]
yields False. Everything else I've tried has used some sort of illegal syntax.
How might I perform such a comprehension?
Loop over the returned list too (looping directly over a dictionary gives you keys as well):
[value for key in dictA for value in dictA[key]]
or more directly using dictA.itervalues():
[value for lst in dictA.itervalues() for value in lst]
List comprehensions let you nest loops; read the above loops as if they are nested in the same order:
for lst in dictA.itervalues():
    for value in lst:
        # append value to the output list
Or use itertools.chain.from_iterable():
from itertools import chain
list(chain.from_iterable(dictA.itervalues()))
The latter takes a sequence of sequences and lets you loop over them as if they were one big list. dictA.itervalues() gives you a sequence of lists, and chain() puts them together for list() to iterate over and build one big list out of them.
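As a quick check of the flattening described above, here is a minimal sketch. It assumes Python 3, where `dictA.values()` takes the place of the Python 2 `dictA.itervalues()`; the name `flat` is only illustrative.

```python
from itertools import chain

dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}

# Python 3 spelling: .values() instead of the Python 2 .itervalues()
flat = list(chain.from_iterable(dictA.values()))

print(flat)         # ['a', 'b', 'c', 'd', 'e']
print('a' in flat)  # True
```

Because the expression is self-contained, it can be used directly in an `if` condition without an intermediate variable.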
If all you are doing is testing for membership among all the values, then what you really want is a simple way to loop over all the values, testing your value against each until you find a match. The any() function together with a suitable generator expression does just that:
any('a' in lst for lst in dictA.itervalues())
This will return True as soon as any value in dictA has 'a' listed, and stop looping over .itervalues() early.
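To see the early exit in action, here is a small sketch, assuming Python 3 (so `.values()` rather than `.itervalues()`). The helper `contains` and the list `visited` are hypothetical names added only to record which lists get inspected.

```python
dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}

visited = []  # records every list that any() actually looked at


def contains(item, lst):
    """Membership test that also logs the list being inspected."""
    visited.append(lst)
    return item in lst


found = any(contains('a', lst) for lst in dictA.values())

print(found)    # True
print(visited)  # [['a', 'b', 'c']]
```

The second list is never touched, because the generator expression is consumed lazily and any() stops at the first true result.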
If you're actually checking for membership (your a in... example), you could rewrite it as:
if any('a' in val for val in dictA.itervalues()):
    # do something
This saves having to flatten the list if that's not actually required.
In this particular case, you can just use a nested comprehension:
[value for key in dictA.keys() for value in dictA[key]]
But in general, if you've already figured out how to turn something into a nested list, you can flatten any nested iterable with chain.from_iterable:
itertools.chain.from_iterable(dictA[key] for key in dictA.keys())
This returns an iterator, not a list; if you need a list, just do it explicitly:
list(itertools.chain.from_iterable(dictA[key] for key in dictA.keys()))
As a side note, for key in dictA.keys() does the same thing as for key in dictA, except that in older versions of Python, it will waste time and memory making an extra list of the keys. As the documentation says, iter on a dict is the same as iterkeys.
So, in all of the versions above, it's better to just use in dictA instead.
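A short sketch of that equivalence, using the dictionary from the question; the names `via_keys` and `direct` are illustrative only.

```python
dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}

# Iterating over dictA.keys() and over dictA itself visits the same keys
via_keys = [value for key in dictA.keys() for value in dictA[key]]
direct = [value for key in dictA for value in dictA[key]]

print(via_keys == direct)  # True
print(direct)              # ['a', 'b', 'c', 'd', 'e']
```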
For understanding, the same thing written as a simple explicit loop might be helpful:
ListA=[]
dictA = {1:['a','b','c'],2:['d','e']}
for keys in dictA:
    for values in dictA[keys]:
        ListA.append(values)
You can do something like:
output_list = []
[ output_list.extend(x) for x in {1:['a','b','c'],2:['d','e']}.values()]
output_list will be ['a', 'b', 'c', 'd', 'e']
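One thing worth knowing about this variant: `list.extend()` returns `None`, so the comprehension itself builds a throwaway list of `None` values. A plain loop gives the same result without that by-product. A minimal sketch (the name `discarded` is illustrative):

```python
dictA = {1: ['a', 'b', 'c'], 2: ['d', 'e']}

# Plain loop: extend() is called purely for its side effect
output_list = []
for lst in dictA.values():
    output_list.extend(lst)

# What the comprehension form actually evaluates to
scratch = []
discarded = [scratch.extend(lst) for lst in dictA.values()]

print(output_list)  # ['a', 'b', 'c', 'd', 'e']
print(discarded)    # [None, None]
```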
Related
so I have a defaultdict(list) hashmap, potential_terms
potential_terms={9: ['leather'], 10: ['type', 'polyester'], 13:['hello','bye']}
What I want to output is the 2 values (words) with the lowest keys. 'leather' is definitely the first output, but 'type' and 'polyester' both have k=10; when the key is the same, I want a random choice of either 'type' or 'polyester'.
What I did is:
out=[v for k,v in sorted(potential_terms.items(), key=lambda x:(x[0],random.choice(x[1])))][:2]
but when I print out I get :
[['leather'], ['type', 'polyester']]
My guess is that the problem is, of course, the 2nd part of the lambda function: random.choice(x[1]). Any ideas on how to make it work as expected, outputting either 'type' or 'polyester'?
Thanks
EDIT: See Karl's answer and comment as to why this solution isn't correct for OP's problem.
I leave it here because it does demonstrate what OP originally got wrong.
key= doesn't transform the data itself; it only tells sorted how to sort.
You want to apply choice to v when selecting it for the comprehension, like so:
out=[random.choice(v) for k,v in sorted(potential_terms.items())[:2]]
(I also moved the [:2] inside, to shorten the list before the comprehension)
Output:
['leather', 'type']
OR
['leather', 'polyester']
You have (with some extra formatting to highlight the structure):
out = [
    v
    for k, v in sorted(
        potential_terms.items(),
        key=lambda x:(x[0], random.choice(x[1]))
    )
][:2]
This means (reading from the inside out): sort the items according to the key, breaking ties using a random choice from the value list. Extract the values (which are lists) from those sorted items into a list (of lists). Finally, get the first two items of that list of lists.
This doesn't match the problem description, and is also somewhat nonsensical: since the keys are, well, keys, there cannot be duplicates, and thus there cannot be ties to break.
What we wanted: sort the items according to the key, then put all the contents of those individual lists next to each other to make a flattened list of strings, but randomizing the order within each sublist (i.e., shuffling those sublists). Then, get the first two items of that list of strings.
Thus, applying the technique from the link, and shuffling the sublists "inline" as they are discovered by the comprehension:
out = [
    term
    for k, v in sorted(
        potential_terms.items(),
        key = lambda x:x[0] # this is not actually necessary now,
        # since the natural sort order of the items will work.
    )
    for term in random.sample(v, len(v))
][:2]
Please also see https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/ to understand how the list flattening and result ordering works in a two-level comprehension like this.
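For reuse, the same idea can be wrapped in a function. This is only a sketch: the name `lowest_terms` and its parameters are hypothetical, and `itertools.islice` is swapped in for the `[:2]` slice so that the flattened sequence is not built in full before being cut down.

```python
import random
from itertools import islice


def lowest_terms(terms_by_key, n):
    """Return the n words with the lowest keys; words sharing a key come out in random order."""
    flattened = (
        term
        for key, words in sorted(terms_by_key.items())
        for term in random.sample(words, len(words))  # shuffled copy of each sublist
    )
    return list(islice(flattened, n))


potential_terms = {9: ['leather'], 10: ['type', 'polyester'], 13: ['hello', 'bye']}
out = lowest_terms(potential_terms, 2)
print(out)  # 'leather' first, then either 'type' or 'polyester'
```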
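Instead of building out in a single expression, a simpler approach is:
d = list(potential_terms.values()), which stores all the values.
It will store the values as:
[['leather'], ['type', 'polyester'], ['hello', 'bye']]
You can access 'leather' as d[0][0] and the list ['type', 'polyester'] as d[1]. Now we just call random.shuffle(d[1]) and use d[1][0],
which gets us a random word, either 'type' or 'polyester'. Note that this assumes the dictionary's entries are already in ascending key order, since values() follows insertion order rather than sorted order.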
Final code should be like this:
import random
potential_terms={9: ['leather'], 10: ['type', 'polyester'], 13:['hello','bye']}
d = list(potential_terms.values())
random.shuffle(d[1])
c = []
c.append(d[0][0])
c.append(d[1][0])
Which gives the desired output,
either ['leather', 'polyester'] or ['leather', 'type'].
I am looking for an efficient python method to utilise a hash table that has two keys:
E.g.:
(1,5) --> {a}
(2,3) --> {b,c}
(2,4) --> {d}
Further I need to be able to retrieve whole blocks of entries, for example all entries that have "2" at the 0-th position (here: (2,3) as well as (2,4)).
In another post it was suggested to use list comprehension, i.e.:
sum(val for key, val in dict.items() if key[0] == 'B')
I learned that dictionaries are (probably?) the most efficient way to retrieve a value from an object of key:value-pairs. However, calling only an incomplete tuple-key is a bit different than querying the whole key where I either get a value or nothing. I want to ask if python can still return the values in a time proportional to the number of key:value-pairs that match? Or alternatively, is the tuple-dictionary (plus list comprehension) better than using pandas.df.groupby() (but that would occupy a bit much memory space)?
The "standard" way would be something like
from random import randint
d = {(randint(1,10),i):"something" for i,x in enumerate(range(200))}
def byfilter(n,d):
    return list(filter(lambda x:x[0]==n, d.keys()))
byfilter(5,d) ##returns a list of tuples where x[0] == 5
Although in similar situations I often used next() to iterate manually, when I didn't need the full list.
However there may be some use cases where we can optimize that. Suppose you need to do a couple or more accesses by key first element, and you know the dict keys are not changing meanwhile. Then you can extract the keys in a list and sort it, and make use of some itertools functions, namely dropwhile() and takewhile():
from itertools import dropwhile, takewhile
ls = [x for x in d.keys()]
ls.sort() ##I do not know why but this seems faster than ls=sorted(d.keys())
def bysorted(n,ls):
    return list(takewhile(lambda x: x[0]==n, dropwhile(lambda x: x[0]!=n, ls)))
bysorted(5,ls) ##returns the same list as above
This can be up to 10x faster in the best case (n=1 in my example) and takes more or less the same time in the worst case (n=10), because we are trimming the number of iterations needed.
Of course you can do the same for accessing keys by x[1], you just need to add a key parameter to the sort() call
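If lookup time proportional to the number of matching entries matters, as the question asks, one possible alternative is a secondary index that maps the first key component to the full keys. This is a sketch under assumptions rather than part of the answer above: the names `table`, `by_first`, and `block` are hypothetical, and the index has to be rebuilt or updated whenever the dictionary's keys change.

```python
from collections import defaultdict

table = {(1, 5): {'a'}, (2, 3): {'b', 'c'}, (2, 4): {'d'}}

# Hypothetical secondary index: first key component -> list of full keys
by_first = defaultdict(list)
for key in table:
    by_first[key[0]].append(key)


def block(first):
    """Return {full_key: value} for every key whose 0-th element equals `first`."""
    # .get() avoids creating an empty index entry for unknown values
    return {key: table[key] for key in by_first.get(first, ())}


print(block(2))
print(block(7))
```

Building the index costs one pass over all keys; after that, each call to `block()` only touches the keys that match.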
I've always found dictionaries to be an odd thing in Python. I'm sure it's just me, but I can't work out how to take two lists and add them to a dict. If both lists mapped one-to-one it wouldn't be a problem; something like dictionary = dict(zip(list1, list2)) would suffice. However, during each run list1 will always have one item, and list2 could have one or several items that I'd like as values.
How could I approach adding the key and potentially multiple values to it?
After some deliberation, Kasramvd's second option seems to work well for this scenario:
dictionary.setdefault(list1[0], []).append(list2)
Based on your comment, all you need is to assign the second list as the value for the only item of the first list.
d = {}
d[list1[0]] = list2
And if you want to preserve the values for duplicate keys, you can use dict.setdefault() to build a list of lists for each key.
d = {}
d.setdefault(list1[0], []).append(list2)
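To make the resulting shape concrete, here is a small sketch with made-up sample data (the contents of `list1` and `list2` are assumptions). It also shows `extend()` as a variant, in case one flat list of values per key is preferred over a list of lists.

```python
list1 = ['fruit']            # sample data, always a single key
list2 = ['apple', 'pear']    # sample data, one or more values

# append(): each run adds list2 as one nested list
d = {}
d.setdefault(list1[0], []).append(list2)
d.setdefault(list1[0], []).append(['kiwi'])
print(d)     # {'fruit': [['apple', 'pear'], ['kiwi']]}

# extend(): each run merges the values into one flat list
flat = {}
flat.setdefault(list1[0], []).extend(list2)
flat.setdefault(list1[0], []).extend(['kiwi'])
print(flat)  # {'fruit': ['apple', 'pear', 'kiwi']}
```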
I need to update a list while it is being iterated over.
Basically, I have a list of tuples called some_list. Each tuple contains a bunch of strings, such as name and path. What I want to do is go over every tuple, look at the name, then find all the tuples that contain the string with an identical path and delete them from the list.
The order does not matter, I merely wish to go over the whole list, but whenever I encounter a tuple with a certain path, all tuples (including oneself) should be removed from the list. I can easily construct such a list and assign it to some_list_updated, but the problem seems to be that the original list does not update...
The code has more or less the following structure:
for tup in some_list[:]:
    ...
    ...somecode...
    ...
    some_list = some_list_updated
It seems that the list does update appropriately when I print it out, but python keeps iterating over the old list, it seems. What is the appropriate way to go about it - if there is one? Thanks a lot!
You want to count the paths using a dictionary, then use only those that have a count of 1, then loop using a list comprehension to do the final filter. Using a collections.Counter() object makes the counting part easy:
from collections import Counter
counts = Counter(tup[index_of_path] for tup in some_list)
some_list = [tup for tup in some_list if counts[tup[index_of_path]] == 1]
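A runnable version with sample data may help. The tuples and the value of `index_of_path` below are assumptions chosen for illustration.

```python
from collections import Counter

index_of_path = 1  # assumed position of the path within each tuple

some_list = [
    ('a.txt', '/tmp/x'),
    ('b.txt', '/tmp/x'),  # shares its path with a.txt, so both are dropped
    ('c.txt', '/tmp/y'),
]

# Count how often each path occurs, then keep only tuples whose path is unique
counts = Counter(tup[index_of_path] for tup in some_list)
some_list = [tup for tup in some_list if counts[tup[index_of_path]] == 1]

print(some_list)  # [('c.txt', '/tmp/y')]
```

Since the filtered list is built in a separate pass, nothing is modified while it is being iterated over.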
I am trying to sort a dict based on its key and return an iterator to the values from within an overridden __iter__ method in a class. Is there a nicer and more efficient way of doing this than creating a new list and inserting into it as I sort through the keys?
How about something like this:
def itersorted(d):
    for key in sorted(d):
        yield d[key]
By far the easiest approach, and almost certainly the fastest, is something along the lines of:
def sorted_dict(d):
    keys = list(d.keys())  # explicit list, so .sort() also works on Python 3 key views
    keys.sort()
    for key in keys:
        yield d[key]
You can't sort without fetching all keys. Fetching all keys into a list and then sorting that list is the most efficient way to do that; list sorting is very fast, and fetching the keys list like that is as fast as it can be. You can then either create a new list of values or yield the values as the example does. Keep in mind that you can't modify the dict if you are iterating over it (the next iteration would fail) so if you want to modify the dict before you're done with the result of sorted_dict(), make it return a list.
def sortedDict(dictobj):
    return (value for key, value in sorted(dictobj.iteritems()))
This will create a single intermediate list, the 'sorted()' method returns a real list. But at least it's only one.
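Since the question mentions an overridden `__iter__` method, here is a sketch of how a generator fits there. The class name `SortedValues` and its `_data` attribute are hypothetical, and the code assumes Python 3.

```python
class SortedValues:
    """Hypothetical container whose iteration yields values in key order."""

    def __init__(self, data):
        self._data = dict(data)

    def __iter__(self):
        # sorted() builds one list of keys; values are then yielded lazily
        for key in sorted(self._data):
            yield self._data[key]


sv = SortedValues({3: 'c', 1: 'a', 2: 'b'})
print(list(sv))  # ['a', 'b', 'c']
```

Because `__iter__` is a generator function, every `for` loop over the object gets a fresh iterator based on the keys present at that moment.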
Assuming you want a default sort order, you can use sorted(list) or list.sort(). If you want your own sort logic, Python lists support the ability to sort based on a function you pass in. For example, the following would be a way to sort numbers from least to greatest (the default behavior) using a function.
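from functools import cmp_to_key

def compareTwo(a, b):
    if a > b:
        return 1
    if a == b:
        return 0
    if a < b:
        return -1

numbers = [3, 1, 2]  # sample data
numbers.sort(key=cmp_to_key(compareTwo))  # wrap the comparison function for sort()
print(numbers)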
This approach is conceptually a bit cleaner than manually creating a new list and appending the new values and allows you to control the sort logic.
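For many custom orderings, a key function is enough and a two-argument comparison is not needed. A minimal sketch with sample data (the list `words` is an assumption), showing both `sorted()` and `list.sort()`:

```python
words = ['pear', 'fig', 'banana']

# sorted() returns a new list and leaves the original untouched
by_length = sorted(words, key=len)
print(by_length)  # ['fig', 'pear', 'banana']

# list.sort() reorders the list in place; reverse=True flips the order
words.sort(key=len, reverse=True)
print(words)      # ['banana', 'pear', 'fig']
```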