When I am iterating over an itertools.permutations, I would like to know at which indexes specific combinations of numbers show up, without slowly iterating over the whole thing.
For example:
When I have a list foo equal to list(itertools.permutations(range(100))), I would like to know at which indexes the first element will be a zero and the seventeenth a three. A simple way to do this would be to check every permutation and see whether it fits my requirement.
import itertools

n = 100
foo = list(itertools.permutations(range(n)))
solutions = []
for i, permutation in enumerate(foo):
    if permutation[0] == 0 and permutation[16] == 3:
        solutions.append(i)
However, as n gets larger, this becomes incredibly slow, and very memory inefficient.
Is there some pattern I could use so that, instead of creating a long list, I could simply say that if (a*i + b) % c == 0, then index i fits my pattern?
EDIT: In reality I will have many conditions, some of which involve more than 2 positions; I hope that by combining those conditions I can limit the number of possibilities to the point where this becomes doable. Also, the 100 might have been a bit big; I expect n to get no larger than 20.
You need a mapping between permutations of the non-fixed elements and the corresponding full permutations with the fixed cells inserted. For example, if you take permutations of the list [0, 1, 2, 3, 4] and require the value 1 in cell zero and the value 2 in cell three, the permutation (0, 4, 3) maps to (1, 0, 4, 2, 3). Tuples are not friendly for this case because they are immutable, but lists have an insert method which is pretty useful here; that's why I convert them to lists and then back to tuples.
import itertools

def item_padding(item, cells):
    # returns the padded version of item, e.g. (0, 4, 3) -> (1, 0, 4, 2, 3)
    listed_item = list(item)
    for idx in sorted(cells):
        listed_item.insert(idx, cells[idx])
    return tuple(listed_item)

array = range(5)
cells = {0: 1, 3: 2}  # indexes and their fixed values
remaining_items = set(array) - set(cells.values())

print(list(map(lambda x: item_padding(x, cells), itertools.permutations(remaining_items))))
Output:
[(1, 0, 3, 2, 4), (1, 0, 4, 2, 3), (1, 3, 0, 2, 4), (1, 3, 4, 2, 0), (1, 4, 0, 2, 3), (1, 4, 3, 2, 0)]
To sum up, the list conversions are quite slow, as are the iterations. Despite that, I think this algorithm is a conceptually good example of what can be done here. Use numpy instead if you really need to optimise it.
Update:
It runs in about 6 seconds on my laptop when array is range(12) (3,628,800 permutations), roughly three times longer than returning the unpadded tuples.
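Not part of the answer above, but since the question also asks for indexes: itertools.permutations yields results in lexicographic order when its input is sorted, so the index of any padded permutation can be computed directly via the Lehmer code rather than by searching. A minimal sketch (perm_index is a hypothetical helper, not from the answer):

from math import factorial

def perm_index(perm):
    # lexicographic index of a permutation of 0..n-1 (Lehmer code):
    # each element contributes (count of smaller elements to its right)
    # multiplied by the factorial of the number of positions after it
    n = len(perm)
    index = 0
    for i, x in enumerate(perm):
        smaller = sum(1 for y in perm[i + 1:] if y < x)
        index += smaller * factorial(n - 1 - i)
    return index

# e.g. perm_index((2, 1, 0)) == 5 == list(itertools.permutations(range(3))).index((2, 1, 0))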
Sorry for the long title, but I'm not sure how to shorten it. I'm trying to program an object that targets other objects. Each object is assigned an integer id starting from 0, and that's all that's really relevant here. I can access objects by id, so I just need to get the numbers. Each object should target the same number of other objects, not target itself, and not target any other object more than once. I want each object to randomly choose its targets given these conditions, which is trivial. But this can make things uneven, which I want to avoid: the number of times an object is targeted by other objects is random. So what I want is for each object to not only have the same number of targets, but also be targeted the same number of times, which would end up being the same number.

This leads to my problem. I need to randomly generate groups of numbers from a given range. Each group will be the same size. Within each group, each number is unique. The order is irrelevant; these groups could just be sets. Overall, each number should appear the same number of times. The number of times each number is repeated and the size of the groups are the same, and given. Making sure each object's id isn't in the corresponding group can be done after the groups are generated, so that's not an issue.
Attempt 1: Guess and check
So my first thought was a simple guess-and-check method: randomly generate the groups of numbers and reject invalid ones. Ignoring that this program sometimes ends up in an infinite loop (this was just a quick thing made for this question), it gets the job done. But I feel like it's kind of inefficient, and there's still the whole infinite-loop problem.
import random
from typing import Tuple, Set, Iterator

def randTargets(numObjs: int, numTargets: int) -> Iterator[Tuple[int, Set[int]]]:
    # numObjs is the total number of objects
    # numTargets is the number of other objects each object will target
    # numTargets can be assumed to be in the range [2, numObjs)
    numLeft = [numTargets] * numObjs
    # numLeft represents how many times each object can be targeted
    # i.e., if numLeft[i] is n, i can only be targeted n more time(s)
    for i in range(numObjs):
        targets = set()
        # this can get caught in an infinite loop where the only
        # target left is the object itself, but I'm too lazy to fix
        # that right now. Just assume that doesn't happen lmao
        while len(targets) < numTargets:
            t = random.randrange(numObjs)
            # checks that the target isn't the object itself, hasn't already
            # been targeted too many times, and hasn't already been targeted
            # by this item
            if t != i and numLeft[t] > 0 and t not in targets:
                targets.add(t)
                numLeft[t] -= 1
        yield i, targets
    # check to make sure every object has been targeted exactly numTargets times
    assert all(i == 0 for i in numLeft)
Attempt 2: Not really random
I tried to take a crack at this, but the best thing I could come up with wasn't exactly random.
def randTargetsSlightlyBetter(numObjs: int, numTargets: int) -> Iterator[Tuple[int, Set[int]]]:
    # numObjs is the total number of objects
    # numTargets is the number of other objects each object will target
    # numTargets can be assumed to be in the range [2, numObjs)
    objs = list(range(numObjs))
    targets = []
    for offset in random.sample(range(1, numObjs), k=numTargets):
        # rotates the objs list by offset, wrapping around
        targets.append(objs[offset:] + objs[:offset])
    for obj, *tgts in zip(objs, *targets):
        yield obj, set(tgts)
I feel like it might be kinda hard to tell what that does, so here:
# if numObjs = 4, objs looks like this:
[0, 1, 2, 3]
# let's assume instead of random offsets I just use (1, 2)
# targets would look like this:
[[1, 2, 3, 0],
[2, 3, 0, 1]]
# adding objs to the beginning of targets would get:
[[0, 1, 2, 3],
[1, 2, 3, 0],
[2, 3, 0, 1]]
# each column is a group, with the first row being the object targeting the others
# transposing the list using zip(*targets), we get:
[(0, 1, 2),
(1, 2, 3),
(2, 3, 0),
(3, 0, 1)]
# which would be equivalent to zip(objs, *targets)
Trying to randomize this by shuffling the values of the initial list objs wouldn't do anything, because the individual objects are interchangeable and the ids are arbitrary. So I thought to randomize how much the target lists are offset, which kind of works. But this wouldn't be completely random; there would still be a pattern to things. Looking at the example with offsets of (1, 2), we can see that object 0 would target objects 1 and 2, object 1 would target 2 and 3, and so on. While the pattern would be harder to see with randomized offsets, there would still be one, and that's what I'm trying to avoid.
Sorry if any of this was confusingly explained, I can have a weird way of thinking about things lmao. If anything needs clarifying, let me know.
TL;DR:
I have a range of integer numbers, and I want to randomly generate groups of numbers from this range such that no number appears more than once in any given group. The order of numbers in a group doesn't matter, and groups don't have to contain every number, just a certain amount. Additionally, I want every number in the range to appear the same number of times across all the groups.
Example output:
>>> randGroups(range(4), repeats=2)
[{0, 2}, {1, 3}, {0, 3}, {1, 2}]
>>> randGroups(range(10), repeats=3)
[{1, 8, 9}, {1, 2, 8}, {5, 6, 7}, {2, 3, 6}, {4, 5, 9}, {5, 7, 9}, {0, 8, 9}, {0, 2, 8}, {1, 6, 7}, {0, 3, 4}]
This randomizes via the index rather than the actual values; while it isn't the most random approach, it might still count.
# import library
import random

# input parameters
range_ = 9
repeats = 4

# rotate array elements
def rotate(arr, n):
    return arr[n:] + arr[:n]

# shifts for shuffling; since they are all distinct, no index can get the same element twice
shifts = list(range(range_))
random.shuffle(shifts)

# arrays to rotate for shuffling
to_shuffle = [list(range(range_)) for i in range(repeats)]

# rotate each array by its shift
for i in range(repeats):
    to_shuffle[i] = rotate(to_shuffle[i], shifts[i])

# exchange rows and columns to give the proper output
output_arrays = [[to_shuffle[j][i] for j in range(repeats)] for i in range(range_)]

# random output
print(output_arrays)
Example output:
[[3, 8, 4, 5], [4, 0, 5, 6], [5, 1, 6, 7], [6, 2, 7, 8], [7, 3, 8, 0], [8, 4, 0, 1], [0, 5, 1, 2], [1, 6, 2, 3], [2, 7, 3, 4]]
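For reference, here is the same idea wrapped as a function matching the question's randGroups signature; this is a sketch, and rand_groups and its parameter names are mine, not from the answer above:

import random

def rand_groups(n, repeats):
    # one distinct rotation offset per repeat: distinct offsets guarantee
    # that no group ever contains the same element twice
    shifts = random.sample(range(n), repeats)
    rows = [[(j + s) % n for j in range(n)] for s in shifts]
    # transpose so column i becomes the group for index i; every element
    # then appears exactly `repeats` times across all groups
    return [set(col) for col in zip(*rows)]

print(rand_groups(9, 4))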
I encountered the following example in a tutorial I watched recently.
We want to sort these numbers:
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
prioritising the ones belonging to the following group:
group = {2, 3, 5, 7}
So the helper (sorting key) function implemented by the author was the following:
def helper(x):
    if x in group:
        return (0, x)
    return (1, x)
and it sorts by calling
numbers.sort(key=helper)
I can't seem to get my head around this return (0, x) vs. return (1, x), which is most likely something easy to explain (but perhaps I am missing something about how a sorting key function works).
What that key function does is, instead of comparing
[8, 3, 1, 2, 5, 4, 7, 6]
it compares
[(1, 8), (0, 3), (1, 1), (0, 2), (0, 5), (1, 4), (0, 7), (1, 6)]
Tuples are sorted lexicographically, meaning that the first elements are compared first. If they differ, the comparison stops; if they're the same, the second elements are compared.
This has the effect of bringing all numbers in group to the front (in numerical order) followed by the rest of the numbers (also in numerical order).
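For example, tuple comparison behaves like this:

>>> (0, 8) < (1, 1)  # first elements differ: 0 < 1, so the 8 never matters
True
>>> (0, 3) < (0, 5)  # first elements tie, so the second elements decide
True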
Well, (0, x) is smaller than (1, x). In short, Python first compares the first elements; if they are the same, then the second, then the third...
Is that clear enough? In your example, all elements in the group are considered smaller than the elements that are not in the group.
When the line numbers.sort(key=helper) executes, it iterates over each element of the list numbers.
While iterating, for each element, it makes a call to the helper method with the element.
If this element is a part of the group, it returns (0, element).
If it isn't a part of the group, it returns (1, element).
Now, while sorting, the values compared are [(0, x), (1, x), (0, x), ...] rather than the actual elements.
It compares pairs of tuples and checks whether one is >, < or = the other.
While comparing two tuples, it first compares the values at index 0.
If those are equal, it compares the values at index 1, and so on.
This results in the following output:
>>> numbers
[2, 3, 5, 7, 1, 4, 6, 8]
If there had been characters at the first index in each element, they would have been sorted based on their ASCII values.
I am using scikit-learn to cluster some data, and I want to compare the results of different clustering techniques. I am immediately faced with the issue that the labels for the clusters are different for different runs, so even if they are clustered exactly the same the similarity of the lists is still very low.
Say I have
list1 = [1, 1, 0, 5, 5, 1, 8, 1]
list2 = [3, 3, 1, 2, 2, 3, 8, 3]
I would (ideally) like a function that returns the best mapping in the form of a translation dictionary like this:
findMapping(list1, list2)
>>> {0:1, 1:3, 5:2, 8:8}
And I said "best mapping" because if, say, list3 = [3, 3, 1, 2, 2, 3, 8, 4], then findMapping(list1, list3) should still return the same mapping, even though the final 1 turns into a 4 instead of a 3.
So the best mapping is the one that minimizes the number of differences between the two lists. I think that's a good criterion, but there may be a better one.
I could write a trial-and-error optimization algorithm to do this, but I'm hardly the first person to want to compare the results of clustering algorithms. I expect something like this already exists and I just don't know what it's called. But I searched around and didn't find any answers.
The point is that after applying the best translation I will measure the difference between the lists, so maybe there is a way to measure the difference between lists of numbers labeled differently without finding the translation as an intermediate step; that would be good too.
EDIT:
Based on Pallie's answer I was able to create the findMapping function, and then I took it one step further to create a translation function that returns the second list converted to the labels of the first list.
import munkres
from sklearn.metrics.cluster import contingency_matrix

def translateLabels(masterList, listToConvert):
    contMatrix = contingency_matrix(masterList, listToConvert)
    labelMatcher = munkres.Munkres()
    labelTranslator = labelMatcher.compute(contMatrix.max() - contMatrix)
    # contingency_matrix orders its rows/columns by sorted unique label,
    # so the index pairs from Munkres refer to the sorted labels
    uniqueLabels1 = sorted(set(masterList))
    uniqueLabels2 = sorted(set(listToConvert))
    translatorDict = {}
    for thisPair in labelTranslator:
        translatorDict[uniqueLabels2[thisPair[1]]] = uniqueLabels1[thisPair[0]]
    return [translatorDict[label] for label in listToConvert]
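For example, with the lists from the top of the question (and assuming the imports shown above), the call below returns list2 relabeled to match list1:

list1 = [1, 1, 0, 5, 5, 1, 8, 1]
list2 = [3, 3, 1, 2, 2, 3, 8, 3]
print(translateLabels(list1, list2))  # [1, 1, 0, 5, 5, 1, 8, 1]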
Even with this conversion (which I needed for consistent plotting of cluster colors), using the Rand index and/or normalized mutual information seems like a good way to compare clusterings, since those measures don't require a shared labeling.
I also like the idea of first sorting both lists according to the values in the data, but that may not work when comparing clusterings of very different data.
You could try calculating the adjusted Rand index between two results. This gives a score between -1 and 1, where 1 is a perfect match.
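For example, with the lists from the question (a quick sketch using scikit-learn's adjusted_rand_score):

from sklearn.metrics import adjusted_rand_score

list1 = [1, 1, 0, 5, 5, 1, 8, 1]
list2 = [3, 3, 1, 2, 2, 3, 8, 3]

# the two labelings describe the same partition under different names,
# so the adjusted Rand index is 1.0
print(adjusted_rand_score(list1, list2))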
Or by taking the argmax of the contingency matrix:

import numpy as np
from sklearn.metrics.cluster import contingency_matrix

list1 = ['a', 'a', 'b', 'c', 'c', 'a', 'd', 'a']
list2 = [3, 3, 1, 2, 2, 3, 8, 3]

np.argmax(contingency_matrix(list1, list2), axis=1)
array([2, 0, 1, 3])
The first entry, 2, means that row 0 (the label "a") best matches column 2 (which holds the cluster label 3). Row 1 then matches column 0, and so on.
For the Hungarian method:
from munkres import Munkres

m = Munkres()
contmat = contingency_matrix(list1, list2)
m.compute(contmat.max() - contmat)
[(0, 2), (1, 0), (2, 1), (3, 3)]
using: https://github.com/bmc/munkres
I'm trying to generate a list of all possible 1-dimensional positions for an arbitrary number of identical objects. I want it formatted so each coordinate is the distance from the previous object, so for 3 objects, (0, 5, 2) would mean one object at position 0, another at position 5, and another at position 7.
So the main constraint is that the sum of the coordinates is <= D. Nested for loops work well for this. For example, with 3 objects and maximum coordinate D:
def positions(D):
    output = []
    for i in range(D + 1):
        for j in range(D + 1 - i):
            for k in range(D + 1 - i - j):
                output.append((i, j, k))
    return output
What's the best way to extend this to an arbitrary number of objects? I can't find a good way without explicitly writing a specific number of for loops.
I think you can combine itertools.combinations, which will give you the locations, with taking the difference, which should give you your "distance from the previous object" behaviour. For example, using
def diff(loc):
    return [y - x for x, y in zip((0,) + loc, loc)]
we have
In [114]: list(itertools.combinations(range(4), 3))
Out[114]: [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
for the possible positions, and then
In [115]: [diff(x) for x in itertools.combinations(range(4), 3)]
Out[115]: [[0, 1, 1], [0, 1, 2], [0, 2, 1], [1, 1, 1]]
for your relative-distance version.
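Note that combinations gives strictly increasing positions, whereas the question's nested loops allow zero gaps (coinciding objects). If you need that behaviour for an arbitrary number of objects, here is a sketch using combinations_with_replacement (positions_n is my name, not from the answer):

from itertools import combinations_with_replacement

def positions_n(D, n):
    # non-decreasing position tuples 0 <= p1 <= ... <= pn <= D correspond
    # one-to-one to non-negative gap tuples whose sum is <= D
    return [tuple(y - x for x, y in zip((0,) + loc, loc))
            for loc in combinations_with_replacement(range(D + 1), n)]

# positions_n(D, 3) yields the same tuples as the question's positions(D), up to ordering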
If not, is there a canonical name for such a function? 'Cycle' makes sense to me, but that's taken.
The example in the header is written for clarity and brevity. Real cases I'm working with have a lot of repetition.
(e.g., I want [1, 1, 0, 0, 0, 2, 1] to "match" [0, 0, 2, 1, 1, 1, 0])
This type of thing is obscuring my algorithm and filling my code with otherwise useless repetition.
You can get the cycles of the list with:
def cycles(a):
    return [a[i:] + a[:i] for i in range(len(a))]
You can then check if b is a cycle of a with:
b in cycles(a)
If the list is long, or if you want to make multiple comparisons against the same cycles, it may be beneficial (performance-wise) to put the results in a set.
set_cycles = set(cycles(a))
b in set_cycles
You can avoid constructing all the cycles by embedding the equality check in a generator expression and using any:
any(b == a[i:] + a[:i] for i in range(len(a)))
You could also achieve this effect by turning the cycles function into a generator.
Misunderstood your question earlier. If you want to check whether any cycle of a list l1 matches a list l2, the cleanest/most pythonic method is probably any(l1 == l2[i:] + l2[:i] for i in range(len(l2))). There is also a rotate method on collections.deque that you might find useful.
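For instance, a sketch of the deque approach (is_rotation is my name, not a library function):

from collections import deque

def is_rotation(a, b):
    # rotate a copy of b one step at a time and compare against a
    if len(a) != len(b):
        return False
    d = deque(b)
    for _ in range(len(b)):
        if list(d) == a:
            return True
        d.rotate(1)
    return False

print(is_rotation([1, 1, 0, 0, 0, 2, 1], [0, 0, 2, 1, 1, 1, 0]))  # True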
You could use cycle from itertools, together with islice to cut it up. This basically puts this answer in a list comprehension so the list is shifted once for every element.
>>> from itertools import islice, cycle
>>> l = [0,1,2]
>>> [tuple(islice(cycle(l), i, i + len(l))) for i, _ in enumerate(l)]
[(0, 1, 2), (1, 2, 0), (2, 0, 1)]