Combinations of several dicts python


I have 15 dicts like the following 3 (all 15 are of varying lengths).
For example:
HQDict = {'HQ1':10, 'HQ2':3, 'HQ3':5}
BADict = {'BA1':15, 'BA2':4, 'BA3':3}
STDict = {'ST1':5, 'ST2':4, 'ST3':3}
I want to create all the possible combinations of the 15 dicts, with only one element selected from each dict, the values added together, and the keys stored in a list. I have been able to get all the information into the respective dicts, but I am clueless about where to start with the combinations. I have seen itertools.combinations, but I'm not sure how to make it select only 1 element from each dict. If you need any more information, please ask and I will be happy to edit.
Edit1:
I should also add that the values are cumulative, so the value of BA2 is the value of BA1 + BA2, and that a combination could be a list of just 1 element.
list=[HQ1,BA2,ST1]
value=34
next permutation
list=[HQ2]
value=13
Edit2:
Rather than try and create the combinations of the dicts the end goal is to give the function a total and it will return all the possible combinations of buildings (each dict represents a building and each item in the dict a level) that add up to that total. So for example:
combinations(34) would return
[HQ1,BA2,ST1]
and combinations(13) would return
[HQ2]
Pastebin with the file containing all buildings and the code I'm using to create the dicts: link to pastebin

I have seen itertools.combinations but I'm not sure how to make it only select 1 element from each dict.
Use itertools.product(..) instead. It takes a variable number of arguments, each an iterable of options, and picks one option from each argument per iteration:
>>> map(dict, product(HQDict.items(), STDict.items(), BADict.items()))
[{'HQ1': 10, 'BA2': 4, 'ST1': 5}, {'HQ1': 10, 'ST1': 5, 'BA3': 3}, ...... ]
If you have 15 such dicts, I'd suggest putting all of them in a list and calling product on their items, like below:
>>> map(dict, product(*(d.items() for d in list_of_dicts)))
EDIT: In python3, you will get a map object back, and you'll have to iterate over it to get the actual values. You can convert it to a list, but that defeats the purpose of map returning something you can iterate over lazily. If you do want a list, you can write:
>>> [dict(x) for x in product(HQDict.items(), STDict.items(), BADict.items())]
[{'HQ1': 10, 'BA2': 4, 'ST1': 5}, {'HQ1': 10, 'ST1': 5, 'BA1': 15}, ..]
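For the goal described in Edit2 (give a total, get back the level choices that add up to it), the same product idea can be extended. The following is only a rough sketch under some assumptions: the function names cumulative_levels and combinations_for are made up for illustration, each dict is assumed to list its levels in order (Python 3.7+ keeps insertion order), and a building may be skipped entirely, which is how a one-element result such as [HQ2] arises.

```python
from itertools import accumulate, product

HQDict = {'HQ1': 10, 'HQ2': 3, 'HQ3': 5}
BADict = {'BA1': 15, 'BA2': 4, 'BA3': 3}
STDict = {'ST1': 5, 'ST2': 4, 'ST3': 3}

def cumulative_levels(building):
    """Return (level_name, running_total) pairs, plus a (None, 0) 'skip' option."""
    # Assumption: reaching level N costs the sum of levels 1..N (see Edit1).
    totals = accumulate(building.values())
    return [(None, 0)] + list(zip(building, totals))

def combinations_for(total, buildings):
    """Brute-force search: one option per building, keep those summing to total."""
    options = [cumulative_levels(b) for b in buildings]
    results = []
    for choice in product(*options):
        if sum(cost for _, cost in choice) == total:
            picked = [name for name, _ in choice if name is not None]
            if picked:  # ignore the choice that skips every building
                results.append(picked)
    return results

matches_34 = combinations_for(34, [HQDict, BADict, STDict])
matches_13 = combinations_for(13, [HQDict, BADict, STDict])
print(matches_34)
print(matches_13)
```

Note that a total can have more than one match; with the three sample dicts, 34 is reached by [HQ1, BA2, ST1] but also by other selections. The search visits every element of the product, so with 15 large dicts it may become slow and a smarter approach (for example pruning once the running sum exceeds the total) would be worth considering.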

Related

How to find maximum element from a list and its index?

I have a list with ordered dictionaries. These ordered dictionaries have different sizes and can also have the same size(for example, 10 dictionaries can have the length of 30 and 20 dictionaries can have the length of 32). I want to find the maximum number of items a dictionary from the list has. I have tried this, which gets me the correct maximum length:
maximum_len= max(len(dictionary_item) for dictionary_item in item_list)
But how can I find the dictionary that has this maximum_len? Say that the maximum_len is 30: I also want to print the dictionary with the 30 keys. It can be any dictionary of size 30, not a specific one. I just need the keys of that dictionary.

Well, you can always use filter:
output_dics = filter(lambda x: len(x) == maximum_len, item_list)
Then you have all the dictionaries that satisfy the condition; pick a random one or the first one. (In Python 3, filter returns an iterator, so wrap it in list() before indexing.)
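If any one dictionary of maximum size is enough, the built-in max can also return the dictionary itself when given key=len. This is a small sketch with made-up sample data standing in for item_list; it works the same way for OrderedDict objects.

```python
# Hypothetical sample data standing in for item_list
item_list = [{"a": 1}, {"a": 1, "b": 2, "c": 3}, {"x": 1, "y": 2}]

# max() compares the dictionaries by their length and returns the first largest one
largest = max(item_list, key=len)
maximum_len = len(largest)

print(maximum_len, list(largest.keys()))
```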

I don't know if this is the easiest or most elegant way to do it, but you could write a simple function that returns 2 values: the max length you already calculated, and the dict itself, which you can look up with the .index method on the list of lengths.
I'm talking about something like this:
def get_max(list_of_dict):
    # Collect the length of every dictionary, in list order
    plot = []
    for dictionary in list_of_dict:
        plot.append(len(dictionary))
    # index() gives the position of the first dictionary with the maximum length
    return max(plot), list_of_dict[plot.index(max(plot))]
maximum_len, max_dict = get_max(test)
I tested it and it works for my case, although I only made myself a test list with 5 dicts of different lengths.
EDIT:
Changed the variable "dict" to "dictionary" so that it does not shadow the built-in dict.

Find difference between list and set

I am trying to find differences between MongoDB records. After performing my queries, I end up with a set of unique results (by applying set()).
Now, I want to compare a new extraction with the set that I just defined to see if there are any new additions to the record.
What I have done now is the following:
unique_documents = set([str(i) for i in dict_of_uniques[my_key]])
all_documents = [str(i) for i in (dict_of_all_docs[my_key])]
Basically, I am trying to compare the string versions of the dicts in the two variables.
I have tried several approaches, among them unique_documents.difference(all_documents), but it keeps returning an empty set. I know for a fact that the all_documents variable contains two new entries in the record. I would like to know which ones they are.
Thank you,

If all_documents is the set with new elements that you want to get as the result, then you need to reverse the order of the arguments to the difference method.
unique_documents = set([str(i) for i in dict_of_uniques[my_key]])
all_documents = set([str(i) for i in (dict_of_all_docs[my_key])])
all_documents.difference(unique_documents)
See how the order matters:
>>> x = set([1,2,3])
>>> y = set([3,4,5])
>>> x.difference(y)
{1, 2}
>>> y.difference(x)
{4, 5}
difference gives you the elements of the first set that are not present in the second set.
If you want to see things that were either added or removed, you can use symmetric_difference. This function is described as "symmetric" because it gives the same result regardless of argument order.
>>> x.symmetric_difference(y)
{1, 2, 4, 5}
>>> y.symmetric_difference(x)
{1, 2, 4, 5}
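Applied to stringified records, the direction of the difference looks like the sketch below. The sample records and the fingerprint helper are invented for illustration. One assumption worth flagging: str() of a dict depends on key order, so two equal records can produce different strings; serialising with sorted keys is one possible way around that, not necessarily what the original code needs.

```python
import json

def fingerprint(doc):
    """Hypothetical helper: a stable string for a record, independent of key order."""
    return json.dumps(doc, sort_keys=True, default=str)

# Made-up records standing in for the two query results
known = [{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}]
fetched = [{"name": "a", "_id": 1}, {"_id": 2, "name": "b"}, {"_id": 3, "name": "c"}]

unique_documents = {fingerprint(d) for d in known}
all_documents = {fingerprint(d) for d in fetched}

# Elements of the new extraction that are not in the known set
new_documents = all_documents - unique_documents
# The reverse direction is empty, which matches the symptom in the question
missing_documents = unique_documents - all_documents

print(new_documents)
```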

It is hard to tell without a description of the dictionary structure but your code seems to be comparing single keys only. If you want to compare the content of both dictionaries, you need to get all the values:
currentData = set( str(rec) for rec in dict_of_all_docs.values() )
changedKeys = [k for k,value in dict_of_fetched.items() if str(value) not in currentData]
This doesn't seem very efficient, but without more information on the data structure it is hard to make a better suggestion. If your records can already be matched by a dictionary key, you probably don't need to use a set at all. A simple loop should do.
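A simple loop of that kind might look like the following. This is a sketch only: it assumes both dictionaries map the same kind of key to a record, and the sample contents are invented.

```python
# Hypothetical data: both dicts map a record key to a record
dict_of_all_docs = {"a": {"n": 1}, "b": {"n": 2}}
dict_of_fetched = {"a": {"n": 1}, "b": {"n": 3}, "c": {"n": 4}}

changed_keys = []
for key, value in dict_of_fetched.items():
    # A key counts as changed if it is new or its record differs from the stored one
    if key not in dict_of_all_docs or dict_of_all_docs[key] != value:
        changed_keys.append(key)

print(changed_keys)
```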

Rather than unique_documents.difference(all_documents) use all_documents.difference(unique_documents)
More on Python Sets

python randomize preserving every 3 items (list of lists?)

I have a list that looks like this:
importantStuff = [1, 1, '539287_640214358329_682457984_n.jpg',
                  1, 2, '10273290_745672065239_6510327149011099172_o.jpg',
                  1, 3, '196453_640214498049_2103349152_n.jpg',
                  1, 4, '1277816_699439470729_877539164_o.jpg',
                  1, 5, '10682279_777090163119_2043260231272895742_o.jpg',
                  1, 6, '736323_656687181659_1199237329_o.jpg',
                  1, 7, '185184_640214403239_1313590472_n.jpg',
                  1, 8, '1898786_730004488189_837817798_o.jpg']
I need a way to shuffle it keeping the "rows" (aka every 3 values) constant. The two numbers need to stay associated with the same .jpg.
Is the best way to do this to create a list of lists? how would that work? I found answers to creating a flat list from a list of lists, but I need to go in the opposite direction.

A list of lists is probably one of the easier ways of handling this. Assuming your initial list is properly formatted according to your description (two numbers to one string), you could do something like this:
from random import shuffle
listOfLists = []
# Integer division (//) so that range() also gets an int on Python 3
for i in range(0, len(importantStuff) // 3):
    listOfLists.append([importantStuff[i*3+0], importantStuff[i*3+1], importantStuff[i*3+2]])
shuffle(listOfLists)
singleList = listOfLists[0]     # one shuffled row: [number, number, filename]
singleItem = listOfLists[0][2]  # the filename of that row
For more generic cases, use variables instead of hardcoded values.

import random
# list of lists
data = [data[x:x+3] for x in range(0, len(data), 3)]
# shuffle in place
random.shuffle(data)
# OR keep them in a dictionary with the filename as key.
foo = {}
for i in data:
    foo[i[2]] = i[:2]
print(foo)
It's really your choice.
Personally, I would keep it as a dictionary for fast lookup and organization.
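Putting the pieces together, and going back to a flat list afterwards if one is needed, could look like this sketch. The function name shuffle_rows, the ROW_SIZE constant and the shortened filenames are all made up for the example.

```python
import random

# Shortened, hypothetical filenames standing in for the real list
importantStuff = [1, 1, 'a.jpg', 1, 2, 'b.jpg', 1, 3, 'c.jpg', 1, 4, 'd.jpg']
ROW_SIZE = 3  # two numbers plus one filename per row

def shuffle_rows(flat, row_size=ROW_SIZE):
    """Split a flat list into rows of row_size items and shuffle the rows."""
    rows = [flat[i:i + row_size] for i in range(0, len(flat), row_size)]
    random.shuffle(rows)  # shuffles the rows in place; each row stays intact
    return rows

rows = shuffle_rows(importantStuff)
# Flatten again if the original one-dimensional layout is required
flat_again = [item for row in rows for item in row]

print(rows)
```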

Sorting dictionary list-values based on time

I'm pretty new to python (couple weeks into it) and I'm having some trouble wrapping my head around data structures. What I've done so far is extract text line-by-line from a .txt file and store them into a dictionary with the key as animal, for example.
database = {
    'dog': ['apple', 'dog', '2012-06-12-08-12-59'],
    'cat': [
        ['orange', 'cat', '2012-06-11-18-33-12'],
        ['blue', 'cat', '2012-06-13-03-23-48']
    ],
    'frog': ['kiwi', 'frog', '2012-06-12-17-12-44'],
    'cow': [
        ['pear', 'ant', '2012-06-12-14-02-30'],
        ['plum', 'cow', '2012-06-12-23-27-14']
    ]
}
# year-month-day-hour-min-sec
That way, when I print my dictionary out, it prints out by animal types, and the newest dates first.
What's the best way to go about sorting this data by time? I'm on python 2.7. What I'm thinking is
for each key:
grab the list (or list of lists) --> get the 3rd entry --> '-'.split it, --> then maybe try the sorted(parameters)
I'm just not really sure how to go about this...

Walk through the elements of your dictionary. For each value, run sorted on your list of lists, and tell the sorting algorithm to use the third field of the list as the "key" element. This key element is what is used to compare values to other elements in the list in order to ascertain sort order. To tell sorted which element of your lists to sort with, use operator.itemgetter to specify the third element.
Since your timestamps are rigidly structured and each character in the timestamp is more temporally significant than the next one, you can sort them naturally, like strings - you don't need to convert them to times.
# Dictionary stored in d
from operator import itemgetter
# Iterate over the elements of the dictionary; below, by
# calling items(), k gets the key value of an entry and
# v gets the value of that entry
for k, v in d.items():
    if v and isinstance(v[0], list):
        v.sort(key=itemgetter(2))  # Start with 0, so third element is 2

If your dates are all in the format year-month-day-hour-min-sec, e.g. 2012-06-12-23-27-14, I think your split step is not necessary; just compare them as strings.
>>> '2012-06-12-23-27-14' > '2012-06-12-14-02-30'
True
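To convince yourself that plain string ordering agrees with real chronological ordering for this format, you can compare it against a sort that parses the timestamps. This is only a quick check; the FORMAT string is my reading of the year-month-day-hour-min-sec layout in the question.

```python
from datetime import datetime

# Assumed layout of the timestamps: year-month-day-hour-min-sec
FORMAT = "%Y-%m-%d-%H-%M-%S"

stamps = ['2012-06-12-23-27-14', '2012-06-12-14-02-30', '2012-06-11-18-33-12']

by_string = sorted(stamps)
by_datetime = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))

# Both orderings agree because every field is zero-padded and most significant first
print(by_string == by_datetime)
```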

Firstly, you'll probably want each key,value item in the dict to be of a similar type. At the moment some of them (eg: database['dog'] ) are a list of strings (a line) and some (eg: database['cat']) are a list of lines. If you get them all into list of lines format (even if there's only one item in the list of lines) it will be much easier.
Then, one (old) way would be to make a comparison function for those lines. This will be easy since your dates are already in a format that's directly (string) comparable. To compare two lines, you want to compare the 3rd (2nd index) item in them:
def compare_line_by_date(x, y):
    return cmp(x[2], y[2])
Finally you can get the lines for a particular key sorted by telling the sorted builtin to use your compare_line_by_date function:
sorted(database['cat'],compare_line_by_date)
The above is suitable (but slow, and will disappear in python 3) for arbitrarily complex comparison/sorting functions. There are other ways to do your particular sort, for example by using the key parameter of sorted:
def key_for_line(line):
    return line[2]
sorted(database['cat'],key=key_for_line)
Using keys for sorting is much faster than cmp because the key function only needs to be run once per item in the list to be sorted, instead of every time items in the list are compared (which is usually much more often than the number of items in the list). The idea of a key is basically to boil each list item down into something that can be compared naturally, like a string or a number. In the example above we boiled the line down into just the date, which is then compared.
Disclaimer: I haven't tested any of the code in this answer... but it should work!
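The normalising step and the key-based sort described above can be sketched together as follows. The helper name normalise is invented, the data is a shortened copy of the question's dictionary, and reverse=True is my assumption for getting the newest dates first as the question asks.

```python
database = {
    'dog': ['apple', 'dog', '2012-06-12-08-12-59'],
    'cat': [
        ['orange', 'cat', '2012-06-11-18-33-12'],
        ['blue', 'cat', '2012-06-13-03-23-48'],
    ],
}

def normalise(value):
    """Return value as a list of lines, wrapping a single line if necessary."""
    if not value:
        return []
    if isinstance(value[0], list):
        return value      # already a list of lines
    return [value]        # a single line: wrap it

# Sort each animal's lines by the timestamp (index 2), newest first
sorted_db = {
    animal: sorted(normalise(lines), key=lambda line: line[2], reverse=True)
    for animal, lines in database.items()
}

print(sorted_db['cat'])
```

The original database is left untouched; sorted_db holds the normalised, sorted copy.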

What is the most appropriate python data structure to keep the result of pairwise computations over 2 dictionaries?

I have 2 dictionaries with long as the type of the values. I want to apply some computation to every possible pair of values, one from each dictionary, and maintain a data structure that keeps the result together with the input values, i.e. key(a), value(a), key(b), value(b), f(value(a), value(b)). What sort of data structure would you suggest for this operation?

When your computation only depends on the values of the dictionary, you should reformulate your problem statement to take only an iterable of values, and not dictionaries.
You can use tuples as dictionary keys:
import itertools
Adict = {"x": 1, "y": 2, "z":3}
Bdict = {"foo": 4, "bar": 5, "baz":6}
A,B = Adict.values(),Bdict.values()
def comp(a, b):
    return a * b  # Insert complicated computation here
res = {(a,b):comp(a,b) for a,b in itertools.product(A, B)}

Either a list of dicts, or something more complex: define your own custom object to represent the data, with ways of iterating over it by the different keys you have.

I would suggest a dictionary with:
keys being tuples of tuples: ( (key(a), value(a)) , (key(b), value(b)) )
values being long (if it's the type returned by your function f(a,b))
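A minimal sketch of that layout, reusing the sample dictionaries from the earlier answer; f is a placeholder for the real computation, and the name results is made up.

```python
from itertools import product

Adict = {"x": 1, "y": 2, "z": 3}
Bdict = {"foo": 4, "bar": 5, "baz": 6}

def f(a, b):
    return a * b  # placeholder for the real computation

# Key: ((key_a, value_a), (key_b, value_b)); value: f(value_a, value_b)
results = {
    (item_a, item_b): f(item_a[1], item_b[1])
    for item_a, item_b in product(Adict.items(), Bdict.items())
}

print(results[(("x", 1), ("foo", 4))])
```

Keeping the keys in the tuple means two entries with equal values but different keys do not overwrite each other, which can happen when the result dict is keyed by values alone.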
