I have two dictionaries with multiple values:
t={'Musée de Armée': [130102845, 48.8570374, 2.3118779]}
The other, s, is the sorted OrderedDict of t:
s={'193 Gallery': [3359610327, 48.8624495, 2.3652262]}
Both dictionaries have a length of 800, and I would like to merge them while keeping their values, such that the final result is:
t={'Musée de Armée': [130102845, 48.8570374, 2.3118779, '193 Gallery', 3359610327, 48.8624495, 2.3652262]}
This is what I have tried:
s = OrderedDict(sorted(t.items()))
for k,v in t,s:
    t[k].append(s.values())
It gives me an error regarding too many values to unpack.
So I would like to join sorted and unsorted dictionary into one.
In general, you can create a new OrderedDict from two others like this (the keys must be strings, and the call raises a TypeError if s and t share a key):
my_od = OrderedDict(**s, **t)
You can try:
from collections import defaultdict
u = defaultdict(list)
for d in (s, t):
    for k, v in d.items():
        u[k].extend(v)
To those who answered my question and provided guidance, thanks for your help. I came to a solution the day after posting this question; I just hadn't had the time to post it. What worked for me was to iterate through the items of both dictionaries in parallel using zip:
zip(*iterables)
Make an iterator that aggregates elements from each of the iterables. Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator.
The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n). This repeats the same iterator n times so that each output tuple has the result of n calls to the iterator. This has the effect of dividing the input into n-length chunks.
zip() should only be used with unequal length inputs when you don’t care about trailing, unmatched values from the longer iterables. If those values are important, use itertools.zip_longest() instead.
zip() in conjunction with the * operator can be used to unzip a list.
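As a small illustration of the behaviour quoted above, here is a minimal sketch with made-up lists (names and numbers are not from the question). It shows truncation at the shortest input, padding with itertools.zip_longest, and unzipping with the * operator.

```python
from itertools import zip_longest

# Made-up sample data of unequal length
names = ["a", "b", "c"]
numbers = [1, 2]

# zip stops at the shortest input, so "c" is dropped
pairs = list(zip(names, numbers))

# zip_longest pads the shorter input with fillvalue instead
padded = list(zip_longest(names, numbers, fillvalue=None))

# zip(*pairs) "unzips" the pairs back into separate tuples
unzipped_names, unzipped_numbers = zip(*pairs)

print(pairs)
print(padded)
print(unzipped_names, unzipped_numbers)
```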
This was what I did:
d = dict()
for (k, v), (k2, v2) in zip(t.items(), s.items()):
    d[k] = [v[0], v[1], v[2], k2, v2[0], v2[1], v2[2]]
zip is indeed a good tool when you have paired elements, as in your case. Since you did not say in your question that there are always exactly three numbers in the values, here is a more general solution that also modifies t in place, as you intended:
for key, (newkey, newval) in zip(t, s.items()):
    t[key].extend([newkey] + newval)
This appends the key and values from s to the entry in t. Or you could do it in two steps; it's a matter of taste:
for key, (newkey, newval) in zip(t, s.items()):
    t[key].append(newkey)
    t[key].extend(newval)
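One thing to watch for with in-place extension: if s was built as OrderedDict(sorted(t.items())), as in the question, then s and t hold the very same list objects, so extending a list in t also changes the matching value in s part-way through the loop. A minimal sketch that avoids this by building a new dictionary, using two hypothetical entries in the shape described in the question:

```python
from collections import OrderedDict

# Hypothetical sample data in the shape described in the question
t = {
    "Musée de Armée": [130102845, 48.8570374, 2.3118779],
    "193 Gallery": [3359610327, 48.8624495, 2.3652262],
}
s = OrderedDict(sorted(t.items()))  # shares its list objects with t

merged = {}
for (key, val), (newkey, newval) in zip(t.items(), s.items()):
    # Build a fresh list so that neither t nor s is mutated
    merged[key] = [*val, newkey, *newval]

print(merged["Musée de Armée"])
```

Because nothing is modified in place, t and s keep their original three-element values, and merged works for values of any length.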
I have a defaultdict(list) hashmap, potential_terms:
potential_terms={9: ['leather'], 10: ['type', 'polyester'], 13:['hello','bye']}
What I want to output is the 2 values (words) with the lowest keys. 'leather' is definitely the first output, but 'type' and 'polyester' both have k=10; when the key is the same, I want a random choice of either 'type' or 'polyester'.
What I did is:
out=[v for k,v in sorted(potential_terms.items(), key=lambda x:(x[0],random.choice(x[1])))][:2]
but when I print out I get :
[['leather'], ['type', 'polyester']]
My guess is that the problem is the second part of the lambda function: random.choice(x[1]). Any ideas on how to make it work as expected, outputting either 'type' or 'polyester'?
Thanks
EDIT: See Karl's answer and comment as to why this solution isn't correct for OP's problem.
I leave it here because it does demonstrate what OP originally got wrong.
key= doesn't transform the data itself; it only tells sorted how to sort.
You want to apply choice on v when selecting it for the comprehension, like so:
out=[random.choice(v) for k,v in sorted(potential_terms.items())[:2]]
(I also moved the [:2] inside, to shorten the list before the comprehension)
Output:
['leather', 'type']
OR
['leather', 'polyester']
You have (with some extra formatting to highlight the structure):
out = [
    v
    for k, v in sorted(
        potential_terms.items(),
        key=lambda x:(x[0], random.choice(x[1]))
    )
][:2]
This means (reading from the inside out): sort the items according to the key, breaking ties using a random choice from the value list. Extract the values (which are lists) from those sorted items into a list (of lists). Finally, get the first two items of that list of lists.
This doesn't match the problem description, and is also somewhat nonsensical: since the keys are, well, keys, there cannot be duplicates, and thus there cannot be ties to break.
What we wanted: sort the items according to the key, then put all the contents of those individual lists next to each other to make a flattened list of strings, but randomizing the order within each sublist (i.e., shuffling those sublists). Then, get the first two items of that list of strings.
Thus, applying the technique from the link, and shuffling the sublists "inline" as they are discovered by the comprehension:
out = [
    term
    for k, v in sorted(
        potential_terms.items(),
        key = lambda x:x[0] # this is not actually necessary now,
        # since the natural sort order of the items will work.
    )
    for term in random.sample(v, len(v))
][:2]
Please also see https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/ to understand how the list flattening and result ordering works in a two-level comprehension like this.
Instead of out, a simpler approach is:
d = list(potential_terms.values()), which stores all the values.
It will store the values as:
[['leather'], ['type', 'polyester'], ['hello', 'bye']]
You can access 'leather' as d[0][0] and the list ['type', 'polyester'] as d[1]. Now we just call random.shuffle(d[1]) and use d[1][0],
which gets us a random word, either type or polyester.
Final code should be like this:
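import random
potential_terms={9: ['leather'], 10: ['type', 'polyester'], 13:['hello','bye']}
# values() follows insertion order; the keys here were inserted in ascending order
d = list(potential_terms.values())
random.shuffle(d[1])  # shuffles in place, so potential_terms[10] is reordered too
c = []
c.append(d[0][0])
c.append(d[1][0])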
Which gives the desired output,
either ['leather', 'polyester'] or ['leather', 'type'].
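The snippet above hard-codes that the first key holds exactly one word. A more general sketch, in the same spirit as the flattening comprehension earlier in the thread, might look like the following. The function name lowest_terms and its parameters are made up for illustration.

```python
import random
from itertools import islice

def lowest_terms(terms_by_key, n, rng=random):
    """Return n terms, lowest keys first, with ties inside a key in random order."""
    def shuffled_terms():
        for key in sorted(terms_by_key):
            words = terms_by_key[key]
            # sample() returns a new shuffled list; the input list is untouched
            yield from rng.sample(words, len(words))
    # islice stops after n terms, so later keys are never shuffled needlessly
    return list(islice(shuffled_terms(), n))

potential_terms = {9: ['leather'], 10: ['type', 'polyester'], 13: ['hello', 'bye']}
out = lowest_terms(potential_terms, 2)
print(out)
```

Unlike random.shuffle, this leaves potential_terms unchanged, and it does not depend on the insertion order of the keys.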
I am looking for an efficient python method to utilise a hash table that has two keys:
E.g.:
(1,5) --> {a}
(2,3) --> {b,c}
(2,4) --> {d}
Further I need to be able to retrieve whole blocks of entries, for example all entries that have "2" at the 0-th position (here: (2,3) as well as (2,4)).
In another post it was suggested to use list comprehension, i.e.:
sum(val for key, val in dict.items() if key[0] == 'B')
I learned that dictionaries are (probably?) the most efficient way to retrieve a value from a collection of key:value pairs. However, looking up only part of a tuple key is different from querying the whole key, where I either get a value or nothing. Can Python still return the values in a time proportional to the number of key:value pairs that match? Alternatively, is the tuple-keyed dictionary (plus a list comprehension) better than using pandas.df.groupby(), which would take up rather a lot of memory?
The "standard" way would be something like
from random import randint
d = {(randint(1,10),i):"something" for i in range(200)}
def byfilter(n,d):
    return list(filter(lambda x: x[0]==n, d.keys()))
byfilter(5,d) ##returns a list of tuples where x[0] == 5
Although in similar situations I often used next() to iterate manually, when I didn't need the full list.
However there may be some use cases where we can optimize that. Suppose you need to do a couple or more accesses by key first element, and you know the dict keys are not changing meanwhile. Then you can extract the keys in a list and sort it, and make use of some itertools functions, namely dropwhile() and takewhile():
from itertools import dropwhile, takewhile
ls = list(d.keys())
ls.sort() ##I do not know why but this seems faster than ls=sorted(d.keys())
def bysorted(n,ls):
    return list(takewhile(lambda x: x[0]==n, dropwhile(lambda x: x[0]!=n, ls)))
bysorted(5,ls) ##returns the same list as above
This can be up to 10x faster in the best case (n=1 in my example) and take more or less the same time in the worst case (n=10), because we are trimming the number of iterations needed.
Of course you can do the same for accessing keys by x[1]; you just need to add a key parameter to the sort() call.
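For the original question of retrieving a block in time proportional to the number of matches, one option (not taken from the answer above) is a two-level index: an outer dict keyed by the first element and an inner dict keyed by the second. The sketch below assumes the data from the question; the names pairs, index and block are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data matching the example in the question
pairs = {(1, 5): {"a"}, (2, 3): {"b", "c"}, (2, 4): {"d"}}

# index[first][second] -> value; built once in O(number of entries)
index = defaultdict(dict)
for (first, second), value in pairs.items():
    index[first][second] = value

def block(first):
    """Return all entries whose key starts with `first`."""
    # .get avoids creating an empty inner dict for unknown keys
    inner = index.get(first, {})
    return {(first, second): value for second, value in inner.items()}

print(block(2))
```

Looking up index[first] is a single hash lookup, and iterating over the inner dict touches only the matching entries. The trade-off is the extra memory for the index and the need to keep it in sync if the data changes.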
Is there a more pythonic way of obtaining a sorted list of dictionary keys with one key moved to the head? So far I have this:
# create a unique list of keys headed by 'event' and followed by a sorted list.
# dfs is a dict of dataframes.
for k in (dict.fromkeys(['event']+sorted(dfs))):
    display(k,dfs[k]) # ideally this should be (k,v)
I suppose you would be able to do
for k, v in list(dfs.items()) + [('event', None)]:
.items() returns a view of the dictionary's (key, value) pairs (technically a dict_items object, which is why I have to convert it to a list explicitly before concatenating), to which you can then add a second list. Iterating through a list of tuples allows for automatic unpacking (so you can do k,v in list instead of tup in list).
What we really want is an iterable, but that's not possible with sorted, because it must see all the keys before it knows what the first item should be.
Using dict.fromkeys to create a blank dictionary by insertion order was pretty clever, but it relies on dict preserving insertion order. That was an implementation detail in CPython 3.6 and has only been guaranteed by the language since Python 3.7. I admit, it took me a while to figure out that line.
Since the code you posted is just working with the keys, I suggest you focus on that. Taking up a few more lines for readability is a good thing, especially if we can hide it in a testable function:
def display_by_keys(dfs, priority_items=None):
    if not priority_items:
        priority_items = ['event']
    # lists (not sets) keep the priority keys in the order given
    featured = [k for k in priority_items if k in dfs]
    others = [k for k in dfs if k not in featured]
    for key in featured + sorted(others):
        display(key, dfs[key])
The potential downside is you must sort the keys every time. If you do this much more often than the data store changes, on a large data set, that's a potential concern.
Of course you wouldn't be displaying a really large result, but if it becomes a problem, then you'll want to store them in a collections.OrderedDict (https://stackoverflow.com/a/13062357/1766544) or find a sorteddict module.
from collections import OrderedDict
# sort once
ordered_dfs = OrderedDict.fromkeys(sorted(dfs.keys()))
ordered_dfs.move_to_end('event', last=False)
ordered_dfs.update(dfs)
# display as often as you need
for k, v in ordered_dfs.items():
    print(k, v)
If you display different fields first in different views, that's not a problem. Just sort all the fields normally, and use a function like the one above, without the sort.
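Another compact option for the original question is a sort key that ranks the head key before everything else. The sketch below is only an illustration: keys_with_head is a made-up helper, and the dictionary values are plain integers standing in for the dataframes.

```python
def keys_with_head(mapping, head='event'):
    # False sorts before True, so the tuple (k != head, k) puts head first
    # and leaves the remaining keys in their normal sorted order
    return sorted(mapping, key=lambda k: (k != head, k))

dfs = {'trades': 1, 'event': 2, 'accounts': 3}  # stand-in values, not DataFrames
ordered = [(k, dfs[k]) for k in keys_with_head(dfs)]
print(ordered)
```

If the head key is absent from the mapping, the result is simply the sorted keys, so no membership check is needed.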
I have a little task which I solved.
Task: find all PAIRS in a sequence which sum up to a certain number.
For example (1,2,3,4) and target 3 yields one pair (1,2).
I came up with a solution:
def pair(lst, find):
    res = []
    for i in lst:
        if (find - i) in lst:
            res.append([(find - i),i])
    return {x:y for x,y in res}
I'm a bit surprised to see the dictionary comprehension filter out all duplicate solutions.
Which actually forms my question: how and why does a dictionary comprehension remove duplicates?
Because a dict hashes its keys and stores them in a set-like data structure, each key can appear only once. As a result, a newly created key: value pair overrides an older one with the same key, which in your case removes the duplicates. I think this may be a duplicate question.
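To make the overriding behaviour concrete, here is a minimal sketch. The list pairs holds what the loop in pair([1, 2, 1, 2], 3) would collect before the comprehension runs; the other names are made up for illustration.

```python
# What pair([1, 2, 1, 2], 3) collects in res before building the dict
pairs = [[2, 1], [1, 2], [2, 1], [1, 2]]

# Each key can appear only once, so repeated keys collapse into one entry
result = {x: y for x, y in pairs}

# When the same key appears with different values, the last value wins
last_wins = {k: v for k, v in [('a', 1), ('a', 2)]}

# If the goal is only to drop exact duplicate pairs, a set of tuples says so directly
unique_pairs = set(map(tuple, pairs))

print(result, last_wins, unique_pairs)
```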
Given a list of tuples e.g.
[('a','b','4'),('c','d','9'),('e','f','2')]
The third element of each tuple will always be the string value of an integer.
I want to write each tuple as a row of a csv using csv.writerow().
Before I do, I want to reorder the tuples (ideally by overwriting the existing list or creating a new one) such that they get written in descending order of the integer value of that third element of each e.g.
c,d,9
a,b,4
e,f,2
I'm trying to imagine some sort of multiple if/else combo in a list comprehension, but surely there's got to be a simpler way?
The sorted function (or the list method sort) takes optional arguments reverse to allow you to sort in decreasing order, and key to allow you to specify what to sort by.
l = [('a','b','4'),('c','d','9'),('e','f','2')]
l.sort(key=lambda x: int(x[2]), reverse=True)
gives you the list in the order you want.
In my answer I use sys.stdout as an example, but you may use a file instead.
>>> import sys, csv
>>> items = [('a','b','4'),('c','d','9'),('e','f','2')]
>>> w = csv.writer(sys.stdout)
>>> w.writerows(sorted(items, key=lambda x: int(x[2]), reverse=True))
c,d,9
a,b,4
e,f,2
This works in both Python 2 and Python 3:
x = [('a','b','4'),('c','d','9'),('e','f','2')]
x.sort(key=lambda x: int(x[2]), reverse=True)
key is a function applied to each item in the list and returns the key to be used as the basis for sorting.
Another one using itemgetter, which is slightly faster than lambda x: x[2] for small lists and considerably faster for larger lists. Note that it compares the third elements as strings, so the order is only correct when all the numbers have the same number of digits.
from operator import itemgetter
l = [('a','b','4'),('c','d','9'),('e','f','2')]
l.sort(key=itemgetter(2), reverse=True)
The Python Sorting HOW TO has many useful tricks and is worth reading.
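To illustrate the difference between sorting on the string and sorting on its integer value, here is a small sketch with made-up multi-digit data (the rows are not from the question):

```python
from operator import itemgetter

rows = [('a', 'b', '4'), ('c', 'd', '10'), ('e', 'f', '9')]  # made-up data

# Compares '4', '10', '9' as text, so '10' sorts below '4'
as_strings = sorted(rows, key=itemgetter(2), reverse=True)

# Converts to int first, so 10 is correctly the largest
as_ints = sorted(rows, key=lambda row: int(row[2]), reverse=True)

print(as_strings)
print(as_ints)
```

With single-digit values, as in the question, both keys give the same order.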
A slightly different, albeit less efficient, solution:
def sort_tuple_list(l):
    # distinct integer values, largest first (a set, so that tuples
    # sharing a value are not appended more than once)
    sorted_ints = sorted({int(i[2]) for i in l}, reverse=True)
    sorted_tuple_list = []
    for i in sorted_ints:
        for tup in l:
            if int(tup[2]) == i:
                sorted_tuple_list.append(tup)
    return sorted_tuple_list
Returns a list sorted according to the original question's specifications.
You can then simply write each row of this returned list to your csv file.