Iterative Permutations of Distribution - python

I'm trying to generate all possible combinations of a distribution of sorts.
For example, say you have 5 points to spend on 4 categories, but you can only spend a maximum of 2 points on any given category.
In this instance, all possible solutions would be as follows:
[0, 1, 2, 2]
[0, 2, 1, 2]
[0, 2, 2, 1]
[1, 0, 2, 2]
[1, 1, 1, 2]
[1, 1, 2, 1]
[1, 2, 0, 2]
[1, 2, 1, 1]
[1, 2, 2, 0]
[2, 0, 1, 2]
[2, 0, 2, 1]
[2, 1, 0, 2]
[2, 1, 1, 1]
[2, 1, 2, 0]
[2, 2, 0, 1]
[2, 2, 1, 0]
I have successfully been able to make a recursive function that accomplishes this, but for larger numbers of categories it takes an extremely long time to run. I have attempted to write an iterative function instead in hopes of speeding it up, but I can't seem to get it to account for the category maximums.
Here is my recursive function (count = points, dist = zero-filled array w/ same size as max_allo)
def distribute_recursive(count, max_allo, dist, depth=0):
    for ration in range(max(count - sum(max_allo[depth + 1:]), 0), min(count, max_allo[depth]) + 1):
        dist[depth] = ration
        count -= ration
        if depth + 1 < len(dist):
            distribute_recursive(count, max_allo, dist, depth + 1)
        else:
            print(dist)
        count += ration
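For reference, the same bounded enumeration can also be driven iteratively with an explicit stack of (depth, remaining, prefix) frames; this is only a sketch (the names are mine) mirroring the recursive bounds above, not necessarily a faster program:

```python
def distribute_iterative(count, max_allo):
    # Explicit-stack version of the same enumeration: each frame holds
    # the category index, the points still to spend, and the allocation
    # chosen so far.
    results = []
    stack = [(0, count, [])]
    n = len(max_allo)
    while stack:
        depth, remaining, prefix = stack.pop()
        if depth == n:
            results.append(prefix)
            continue
        # Same bounds as the recursion: spend at least enough that the
        # remaining categories can absorb the rest, and at most this
        # category's maximum (never more than what is left).
        lo = max(remaining - sum(max_allo[depth + 1:]), 0)
        hi = min(remaining, max_allo[depth])
        for ration in range(lo, hi + 1):
            stack.append((depth + 1, remaining - ration, prefix + [ration]))
    return results

print(len(distribute_iterative(5, [2, 2, 2, 2])))  # 16
```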

recursion isn't slow
Recursion isn't what's making it slow; consider a better algorithm
def dist (count, limit, points, acc = []):
  if count == 0:
    if sum (acc) == points:
      yield acc
  else:
    for x in range (limit + 1):
      yield from dist (count - 1, limit, points, acc + [x])
You can collect the generated results in a list
print (list (dist (count = 4, limit = 2, points = 5)))
pruning invalid combinations
Above, we use a fixed range of limit + 1, but watch what happens if we're generating a combination with (e.g.) limit = 2 and points = 5 ...
[ 2, ... ] # 3 points remaining
[ 2, 2, ... ] # 1 point remaining
At this point, using a fixed range of limit + 1 ([ 0, 1, 2 ]) is silly because we know we only have 1 point remaining to spend. The only remaining options here are 0 or 1...
[ 2, 2, 1 ... ] # 0 points remaining
Above we know we can use a range of just [ 0 ] because there are no points left to spend. This will prevent us from attempting to validate combinations like
[ 2, 2, 2, ... ] # -1 points remaining
[ 2, 2, 2, 0, ... ] # -1 points remaining
[ 2, 2, 2, 1, ... ] # -2 points remaining
[ 2, 2, 2, 2, ... ] # -3 points remaining
If count were significantly large, this could rule out a huge number of invalid combinations
[ 2, 2, 2, 2, 2, 2, 2, 2, 2, ... ] # -15 points remaining
To implement this optimization, we could add yet another parameter to our dist function, but at 5 parameters, it would start to look messy. Instead we introduce an auxiliary function to control the loop. Adding our optimization, we trade the fixed range for a dynamic range of min (limit, remaining) + 1. And finally, since we know how many points have been allocated, we no longer need to test the sum of each combination; yet another expensive operation removed from our algorithm
# revision: prune invalid combinations
def dist (count, limit, points):
  def loop (count, remaining, acc):
    if count == 0:
      if remaining == 0:
        yield acc
    else:
      for x in range (min (limit, remaining) + 1):
        yield from loop (count - 1, remaining - x, acc + [x])
  yield from loop (count, points, [])
benchmarks
In the benchmarks below, the first version of our program is renamed dist1, and the faster program using a dynamic range is renamed dist2. We set up three tests: small, medium, and large
def small (prg):
  return list (prg (count = 4, limit = 2, points = 5))

def medium (prg):
  return list (prg (count = 8, limit = 3, points = 7))

def large (prg):
  return list (prg (count = 12, limit = 5, points = 10))
And now we run the tests, passing each program as an argument. Note that for the large test, only 1 pass is done, as dist1 takes a while to generate the result
from timeit import timeit

print (timeit ('small (dist1)', number = 10000, globals = globals ()))
print (timeit ('small (dist2)', number = 10000, globals = globals ()))
print (timeit ('medium (dist1)', number = 100, globals = globals ()))
print (timeit ('medium (dist2)', number = 100, globals = globals ()))
print (timeit ('large (dist1)', number = 1, globals = globals ()))
print (timeit ('large (dist2)', number = 1, globals = globals ()))
The results for the small test show that pruning invalid combinations doesn't make much of a difference. However, in the medium and large cases, the difference is dramatic. Our old program takes over 30 minutes for the large set, but just over 1 second using the new program!
dist1 small 0.8512216459494084
dist2 small 0.8610155049245805 (0.98x speed-up)
dist1 medium 6.142372329952195
dist2 medium 0.9355670949444175 (6.57x speed-up)
dist1 large 1933.0877765258774
dist2 large 1.4107366011012346 (1370.26x speed-up)
For frame of reference, the size of each result is printed below
print (len (small (dist2))) # 16 (this is the example in your question)
print (len (medium (dist2))) # 2472
print (len (large (dist2))) # 336336
checking our understanding
In the large benchmark with count = 12 and limit = 5, using our unoptimized program we were iterating through 5^12, or 244,140,625 possible combinations. Using our optimized program, we skip all invalid combinations, resulting in 336,336 valid answers. By analyzing combination count alone, we see that a staggering 99.86% of possible combinations are invalid. If analyzing each combination costs an equal amount of time, we can expect our optimized program to perform at a minimum of 725.88x better, due to invalid combination pruning.
In the large benchmark, measured at 1370.26x faster, the optimized program meets our expectations and even goes beyond. The additional speed-up is likely owed to the fact we eliminated the call to sum
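Those figures can be reproduced with a few lines of arithmetic, using the pool size and result count quoted above:

```python
pool = 5 ** 12   # possible combinations quoted for the large benchmark
valid = 336336   # valid combinations actually found

print(pool)                                # 244140625
print(round(pool / valid, 2))              # 725.88  (minimum expected speed-up)
print(round(100 * (1 - valid / pool), 2))  # 99.86   (% of combinations pruned)
```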
huuuuge
To show this technique works for extremely large data sets, consider the huge benchmark. Our program finds 17,321,844 valid combinations amongst 7^16, or 33,232,930,569,601 possibilities.
In this test, our optimized program prunes 99.99479% of the invalid combinations. Correlating these numbers to the previous data set, we estimate the optimized program runs 1,918,556.16x faster than the unoptimized version.
The theoretical running time of this benchmark using the unoptimized program is 117.60 years. The optimized program finds the answer in just over 1 minute.
def huge (prg):
  return list (prg (count = 16, limit = 7, points = 12))

print (timeit ('huge (dist2)', number = 1, globals = globals ()))
# 68.06868170504458

print (len (huge (dist2)))
# 17321844

You can use a generator function for the recursion, while applying additional logic to cut down on the number of recursive calls needed:
def listings(_cat, points, _max, current = []):
    if len(current) == _cat:
        yield current
    else:
        for i in range(_max+1):
            if sum(current+[i]) <= points:
                if sum(current+[i]) == points or len(current+[i]) < _cat:
                    yield from listings(_cat, points, _max, current+[i])

print(list(listings(4, 5, 2)))
Output:
[[0, 1, 2, 2], [0, 2, 1, 2], [0, 2, 2, 1], [1, 0, 2, 2], [1, 1, 1, 2], [1, 1, 2, 1], [1, 2, 0, 2], [1, 2, 1, 1], [1, 2, 2, 0], [2, 0, 1, 2], [2, 0, 2, 1], [2, 1, 0, 2], [2, 1, 1, 1], [2, 1, 2, 0], [2, 2, 0, 1], [2, 2, 1, 0]]
While it is unclear at around what category size your solution drastically slows down, this solution runs in under one second for category sizes up to 24 when searching for a total of five points with a maximum slot value of two. Note that for larger point and slot values, the number of category sizes that can be computed in under a second increases:
import time

def timeit(f):
    def wrapper(*args):
        c = time.time()
        _ = f(*args)
        return time.time() - c
    return wrapper

@timeit
def wrap_calls(category_size: int) -> float:
    _ = list(listings(category_size, 5, 2))

benchmark = 0
category_size = 1
while benchmark < 1:
    benchmark = wrap_calls(category_size)
    category_size += 1

print(category_size)
Output:
24


Google foo.bar failing all test cases but working in python IDE

So I'm doing the foo.bar challenge, and I've got code in Python that outputs the required answers. I know for a fact that for at least the first two test cases my output matches their output, but it still fails all of them. I assumed it could be because it's running in Python 2.7.13, so I found an online sandbox that runs that version of Python, but my code still outputs the required output there too. I've tried using the print function to output the results, and I've tried formatting the results as lists and arrays, but none of this worked. The question is below:
Doomsday Fuel
Making fuel for the LAMBCHOP's reactor core is a tricky process
because of the exotic matter involved. It starts as raw ore, then
during processing, begins randomly changing between forms, eventually
reaching a stable form. There may be multiple stable forms that a
sample could ultimately reach, not all of which are useful as fuel.
Commander Lambda has tasked you to help the scientists increase fuel
creation efficiency by predicting the end state of a given ore sample.
You have carefully studied the different structures that the ore can
take and which transitions it undergoes. It appears that, while
random, the probability of each structure transforming is fixed. That
is, each time the ore is in 1 state, it has the same probabilities of
entering the next state (which might be the same state). You have
recorded the observed transitions in a matrix. The others in the lab
have hypothesized more exotic forms that the ore can become, but you
haven't seen all of them.
Write a function solution(m) that takes an array of array of
nonnegative ints representing how many times that state has gone to
the next state and return an array of ints for each terminal state
giving the exact probabilities of each terminal state, represented as
the numerator for each state, then the denominator for all of them at
the end and in simplest form. The matrix is at most 10 by 10. It is
guaranteed that no matter which state the ore is in, there is a path
from that state to a terminal state. That is, the processing will
always eventually end in a stable state. The ore starts in state 0.
The denominator will fit within a signed 32-bit integer during the
calculation, as long as the fraction is simplified regularly.
For example, consider the matrix m:
[
  [0,1,0,0,0,1],  # s0, the initial state, goes to s1 and s5 with equal probability
  [4,0,0,3,2,0],  # s1 can become s0, s3, or s4, but with different probabilities
  [0,0,0,0,0,0],  # s2 is terminal, and unreachable (never observed in practice)
  [0,0,0,0,0,0],  # s3 is terminal
  [0,0,0,0,0,0],  # s4 is terminal
  [0,0,0,0,0,0],  # s5 is terminal
]
So, we can consider different paths to terminal states, such as:
s0 -> s1 -> s3
s0 -> s1 -> s0 -> s1 -> s0 -> s1 -> s4
s0 -> s1 -> s0 -> s5
Tracing the probabilities of each, we find that
s2 has probability 0
s3 has probability 3/14
s4 has probability 1/7
s5 has probability 9/14
So, putting that together, and making a common denominator, gives an answer in the form of [s2.numerator, s3.numerator, s4.numerator, s5.numerator, denominator], which is [0, 3, 2, 9, 14].
Languages
To provide a Java solution, edit Solution.java To provide a Python
solution, edit solution.py
Test cases
Your code should pass the following test cases. Note that it may also be run against hidden test cases not shown here.

-- Java cases --
Input: Solution.solution({{0, 2, 1, 0, 0}, {0, 0, 0, 3, 4}, {0, 0, 0, 0, 0}, {0, 0, 0, 0, 0}, {0, 0, 0, 0, 0}})
Output: [7, 6, 8, 21]

Input: Solution.solution({{0, 1, 0, 0, 0, 1}, {4, 0, 0, 3, 2, 0}, {0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0}})
Output: [0, 3, 2, 9, 14]

-- Python cases --
Input: solution.solution([[0, 2, 1, 0, 0], [0, 0, 0, 3, 4], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
Output: [7, 6, 8, 21]

Input: solution.solution([[0, 1, 0, 0, 0, 1], [4, 0, 0, 3, 2, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]])
Output: [0, 3, 2, 9, 14]
my code is below:
import numpy as np
from fractions import Fraction
from math import gcd

def solution(M):
    height = len(M)
    length = len(M[0])
    M = np.array(M)
    AB = []

    # Find B
    for i in range(0, height):
        # if B = 1
        if sum(M[:, 0]) == 0:
            sumB = 1
        if M[i, 0] != 0:
            B1 = Fraction(M[i, 0], sum(M[i]))
            B2 = Fraction(M[0, i], sum(M[0]))
            B = B1 * B2
            # Find sum(B) to infinity
            sumB = 1 / (1 - B)

    # Find A
    boolean2 = 0
    count = 0
    index = []
    for i in range(0, height):
        if sum(M[i]) == 0:
            if boolean2 == 0:
                terminalstart = i
            boolean = 0
            boolean2 = 1
            for j in range(0, height):
                # if there is no A
                if j == height - 1 and boolean == 0:
                    index.append(i - terminalstart)
                    count += 1
                if M[j, i] != 0:
                    boolean = 1
                    A1 = Fraction(M[j, i], sum(M[j]))
                    A = A1
                    if j != 0:
                        A2 = Fraction(M[0, j], sum(M[0]))
                        A = A1 * A2
                    # Find AB
                    AB.append(A * sumB)

    # Find common denominators
    x = []
    y = []
    for i in range(0, len(AB)):
        x.append(AB[i].denominator)
    lcm = 1
    for i in x:
        lcm = lcm * i // gcd(lcm, i)
    # change numerators to fit
    for i in range(0, len(AB)):
        z = float(lcm / x[i])
        y.append(int(AB[i].numerator * z))
    # insert 0s
    for i in range(0, count):
        y.insert(index[i], 0)
    # insert denominator
    y.append(lcm)
    return y
So the code and the question are basically irrelevant; the main point is, my output (y) is exactly the same as the output in the examples, but when it runs in foo.bar it fails. To test it, I used code that simply returned the desired output in foo.bar, and it worked for the test case that had this output:
def solution(M):
    y = [0, 3, 2, 9, 14]
    return y
So I know that since my code gets to the exact same array and data type for y in the Python IDE, it should work in Google foo.bar, but for some reason it's not. Any help would be greatly appreciated.
edit:
I found a code online that works:
import numpy as np

# Returns indexes of active & terminal states
def detect_states(matrix):
    active, terminal = [], []
    for rowN, row in enumerate(matrix):
        (active if sum(row) else terminal).append(rowN)
    return (active, terminal)

# Convert elements of array in simplest form
def simplest_form(B):
    B = B.round().astype(int).A1  # np.matrix --> np.array
    gcd = np.gcd.reduce(B)
    B = np.append(B, B.sum())  # append the common denom
    return (B / gcd).astype(int)

# Finds solution by calculating Absorbing probabilities
def solution(m):
    active, terminal = detect_states(m)
    if 0 in terminal:  # special case when s0 is terminal
        return [1] + [0] * len(terminal[1:]) + [1]
    m = np.matrix(m, dtype=float)[active, :]  # list --> np.matrix (active states only)
    comm_denom = np.prod(m.sum(1))  # product of sum of all active rows (used later)
    P = m / m.sum(1)  # divide by sum of row to convert to probability matrix
    Q, R = P[:, active], P[:, terminal]  # separate Q & R
    I = np.identity(len(Q))
    N = (I - Q) ** (-1)  # calc fundamental matrix
    B = N[0] * R * comm_denom / np.linalg.det(N)  # get absorbing probs & get them close to some integer
    return simplest_form(B)
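As a quick sanity check, running the question's example matrix through this solution reproduces the expected output (the definitions are repeated here so the snippet stands alone):

```python
import numpy as np

def detect_states(matrix):
    # split row indexes into active (some outgoing transitions) and terminal
    active, terminal = [], []
    for rowN, row in enumerate(matrix):
        (active if sum(row) else terminal).append(rowN)
    return active, terminal

def simplest_form(B):
    B = B.round().astype(int).A1      # np.matrix --> flat np.array
    gcd = np.gcd.reduce(B)
    B = np.append(B, B.sum())         # append the common denominator
    return (B / gcd).astype(int)

def solution(m):
    active, terminal = detect_states(m)
    if 0 in terminal:                 # special case when s0 is terminal
        return [1] + [0] * len(terminal[1:]) + [1]
    m = np.matrix(m, dtype=float)[active, :]
    comm_denom = np.prod(m.sum(1))    # product of the active rows' sums
    P = m / m.sum(1)                  # transition probability matrix
    Q, R = P[:, active], P[:, terminal]
    I = np.identity(len(Q))
    N = (I - Q) ** (-1)               # fundamental matrix
    B = N[0] * R * comm_denom / np.linalg.det(N)
    return simplest_form(B)

m = [[0, 1, 0, 0, 0, 1],
     [4, 0, 0, 3, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
print(solution(m).tolist())  # [0, 3, 2, 9, 14]
```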
When I compared the final answer from this working code to mine by adding the lines:
print(simplest_form(B))
print(type(simplest_form(B)))
this is what I got
[ 0 3 2 9 14]
<class 'numpy.ndarray'>
array([ 0, 3, 2, 9, 14])
When I added the lines
y = np.asarray(y)
print(y)
print(type(y))
to my code this is what I got:
[ 0 3 2 9 14]
<class 'numpy.ndarray'>
array([ 0, 3, 2, 9, 14])
when they were both running the same test input. These are exactly the same, but for some reason mine doesn't work on foo.bar while his does. Am I missing something?
It turns out the
math.gcd(x, y)
function does not exist in Python 2. I just rewrote it as this:
def grcd(x, y):
    if x >= y:
        big = x
        small = y
    else:
        big = y
        small = x
    bool1 = 1
    for i in range(1, big + 1):
        while bool1 == 1:
            if big % small == 0:
                greatest = small
                bool1 = 0
            small -= 1
    return greatest
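For what it's worth, Python 2 already ships gcd as fractions.gcd, and Euclid's algorithm is a much shorter drop-in replacement than the loop above (a sketch; works identically in Python 2 and 3):

```python
def gcd(x, y):
    # Euclid's algorithm: repeatedly replace (x, y) with (y, x mod y)
    # until the remainder is zero; the last nonzero value is the gcd.
    while y:
        x, y = y, x % y
    return x

print(gcd(12, 18))  # 6
print(gcd(7, 5))    # 1
```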

how to make sure that two numbers next to each other in a list are different

I have a simple code that generates a list of random numbers.
x = [random.randrange(0,11) for i in range(10)]
The problem I'm having is that, since it's random, it sometimes produces duplicate numbers right next to each other. How do I change the code so that it never happens? I'm looking for something like this:
[1, 7, 2, 8, 7, 2, 8, 2, 6, 5]
So that every time I run the code, all the numbers that are next to each other are different.
x = []
while len(x) < 10:
    r = random.randrange(0, 11)
    if not x or x[-1] != r:
        x.append(r)
x[-1] contains the last inserted element, which we check not to be the same as the new random number. With not x we check that the array is not empty, as it would otherwise generate an IndexError during the first iteration of the loop.
Here's an approach that doesn't rely on retrying:
>>> import random
>>> x = [random.choice(range(12))]
>>> for _ in range(9):
... x.append(random.choice([*range(x[-1]), *range(x[-1]+1, 12)]))
...
>>> x
[6, 2, 5, 8, 1, 8, 0, 4, 6, 0]
The idea is to choose each new number by picking from a list that excludes the previously picked number.
Note that having to re-generate a new list to pick from each time keeps this from actually being an efficiency improvement. If you were generating a very long list from a relatively short range, though, it might be worthwhile to generate different pools of numbers up front so that you could then select from the appropriate one in constant time:
>>> pool = [[*range(i), *range(i+1, 3)] for i in range(3)]
>>> x = [random.choice(random.choice(pool))]
>>> for _ in range(10000):
... x.append(random.choice(pool[x[-1]]))
...
>>> x
[0, 2, 0, 2, 0, 2, 1, 0, 1, 2, 0, 1, 2, 1, 0, ...]
An O(n) solution: add a random offset from [1, stop) to the last element, modulo stop
import random
x = [random.randrange(0,11)]
x.extend((x[-1]+random.randrange(1,11)) % 11 for i in range(9))
x
Output
[0, 10, 4, 5, 10, 1, 4, 8, 0, 9]
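The reason this works: adding a nonzero offset from [1, 11) and reducing modulo 11 can never map a value back to itself, so adjacent elements always differ. A quick check over a longer run:

```python
import random

# same construction as above, just with many more elements
x = [random.randrange(0, 11)]
x.extend((x[-1] + random.randrange(1, 11)) % 11 for i in range(9999))

# no two neighbouring values are ever equal
print(all(a != b for a, b in zip(x, x[1:])))  # True
```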
from random import randrange
from itertools import islice, groupby
# Make an infinite amount of randrange's results available
pool = iter(lambda: randrange(0, 11), None)
# Use groupby to squash consecutive values into one and islice to at most 10 in total
result = [v for v, _ in islice(groupby(pool), 10)]
A function-based solution that doesn't iterate over the whole list to check for repeats; it just checks each new number against the last one in the list:
import random

def get_random_list_without_neighbors(lower_limit, upper_limit, length):
    res = []
    # add the first number
    res.append(random.randrange(lower_limit, upper_limit))
    while len(res) < length:
        x = random.randrange(lower_limit, upper_limit)
        # check that the new number x doesn't match the last number in the list
        if x != res[-1]:
            res.append(x)
    return res
>>> print(get_random_list_without_neighbors(0, 11, 10))
[10, 1, 2, 3, 1, 8, 6, 5, 6, 2]
import random

def random_sequence_without_same_neighbours(n, min, max):
    x = [random.randrange(min, max + 1)]
    uniq_value_count = max - min + 1
    next_choices_count = uniq_value_count - 1
    for i in range(n - 1):
        circular_shift = random.randrange(0, next_choices_count)
        x.append(min + (x[-1] + circular_shift + 1) % uniq_value_count)
    return x

random_sequence_without_same_neighbours(n=10, min=0, max=10)
It's not very Pythonic, but you can do something like this:
import random

def random_numbers_generator(n):
    "Generate a list of random numbers but without two duplicate numbers in a row"
    result = []
    for _ in range(n):
        number = random.randint(1, n)
        if result and number == result[-1]:
            continue
        result.append(number)
    return result

print(random_numbers_generator(10))
Result:
[3, 6, 2, 4, 2, 6, 2, 1, 4, 7]
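Note that skipping a duplicate with continue means the result can contain fewer than n numbers. A variant that retries instead of skipping always returns exactly n items (a sketch; the function name is mine):

```python
import random

def random_numbers_no_adjacent(n):
    # loop until we have n items, retrying any draw that would
    # duplicate the previous element instead of silently dropping it
    result = []
    while len(result) < n:
        number = random.randint(1, n)
        if result and number == result[-1]:
            continue
        result.append(number)
    return result

xs = random_numbers_no_adjacent(10)
print(len(xs))                                    # 10
print(all(a != b for a, b in zip(xs, xs[1:])))    # True
```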

Function to make a list as unsorted as possible

I am looking for a function to make the list as unsorted as possible. Preferably in Python.
Backstory:
I want to check URLs statuses and see if URLs give a 404 or not. I just use asyncio and requests modules. Nothing fancy.
Now I don't want to overload servers, so I want to minimize checking URLs which are on the same domain at the same time. I have this idea to sort the URLs in a way that items which are close to one another (having the same sort key = domain name) are placed as far apart from each other in the list as possible.
An example with numbers:
a=[1,1,2,3,3] # <== sorted list, sortness score = 2
0,1,2,3,4 # <== positions
could be unsorted as:
b=[1,3,2,1,3] # <== unsorted list, sortness score = 6
0,1,2,3,4 # <== positions
I would say that we can compute a sortness score by summing up the distances between equal items (which have the same key = domain name). Higher sortness means more unsorted. Maybe there is a better way of measuring unsortness.
The sortness score for list a is 2. The sum of distances for 1 is (1-0)=1, for 2 is 0, for 3 is (4-3)=1.
The sortness score for list b is 6. The sum of distances for 1 is (3-0)=3, for 2 is 0, for 3 is (4-1)=3.
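The score described above is straightforward to compute; this sketch (the helper name is mine) sums the gaps between consecutive occurrences of each key:

```python
from collections import defaultdict

def sortness(lst):
    # sum of distances between consecutive occurrences of equal items;
    # a higher score means the duplicates are spread further apart
    positions = defaultdict(list)
    for i, item in enumerate(lst):
        positions[item].append(i)
    return sum(b - a
               for pos in positions.values()
               for a, b in zip(pos, pos[1:]))

print(sortness([1, 1, 2, 3, 3]))  # 2
print(sortness([1, 3, 2, 1, 3]))  # 6
```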
URLs list would look something like a list of (domain, URL) tuples:
[
('example.com', 'http://example.com/404'),
('test.com', 'http://test.com/404'),
('test.com', 'http://test.com/405'),
('example.com', 'http://example.com/405'),
...
]
I am working on a prototype which works OK-ish, but it is not optimal, as I can find some better-unsorted variants by hand.
Anyone wants to give it a go?
This is my code, but it's not great :):
from collections import Counter
from collections import defaultdict
import math

def test_unsortness(lst: list) -> float:
    pos = defaultdict(list)
    score = 0
    # Store positions for each key
    # input = [1,3,2,3,1] => {1: [0, 4], 3: [1, 3], 2: [2]}
    for c, l in enumerate(lst):
        pos[l].append(c)
    for k, poslst in pos.items():
        for i in range(len(poslst) - 1):
            score += math.sqrt(poslst[i + 1] - poslst[i])
    return score

def unsort(lst: list) -> list:
    free_positions = list(range(0, len(lst)))
    output_list = [None] * len(free_positions)
    for val, count in Counter(lst).most_common():
        pos = 0
        step = len(free_positions) / count
        for i in range(count):
            output_list[free_positions[int(pos)]] = val
            free_positions[int(pos)] = None  # Remove position later
            pos = pos + step
        free_positions = [p for p in free_positions if p]
    return output_list

lsts = list()
lsts.append([1, 1, 2, 3, 3])
lsts.append([1, 3, 2, 3, 1])        # This has the worst score after unsort()
lsts.append([1, 2, 3, 0, 1, 2, 3])  # This has the worst score after unsort()
lsts.append([3, 2, 1, 0, 1, 2, 3])  # This has the worst score after unsort()
lsts.append([3, 2, 1, 3, 1, 2, 3])  # This has the worst score after unsort()
lsts.append([1, 2, 3, 4, 5])

for lst in lsts:
    ulst = unsort(lst)
    print((lst, '%.2f' % test_unsortness(lst), '====>', ulst, '%.2f' % test_unsortness(ulst)))
# Original score Unsorted score
# ------- ----- -------- -----
# ([1, 1, 2, 3, 3], '2.00', '====>', [1, 3, 1, 3, 2], '2.83')
# ([1, 3, 2, 3, 1], '3.41', '====>', [1, 3, 1, 3, 2], '2.83')
# ([1, 2, 3, 0, 1, 2, 3], '6.00', '====>', [1, 2, 3, 1, 2, 3, 0], '5.20')
# ([3, 2, 1, 0, 1, 2, 3], '5.86', '====>', [3, 2, 1, 3, 2, 1, 0], '5.20')
# ([3, 2, 1, 3, 1, 2, 3], '6.88', '====>', [3, 2, 3, 1, 3, 2, 1], '6.56')
# ([1, 2, 3, 4, 5], '0.00', '====>', [1, 2, 3, 4, 5], '0.00')
PS. I am not looking just for a randomize function and I know there are crawlers which can manage domain loads, but this is for the sake of exercise.
Instead of unsorting your list of URLs, why not group them by domain, each in a queue, then process them asynchronously with a (randomised?) delay in between?
It looks less complex to me than what you're trying to do to achieve the same thing, and if you have a lot of domains, you can always throttle how many run concurrently at that point.
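A sketch of that idea with asyncio (the names are mine, and fetch_status is a stand-in for a real HTTP check, e.g. via aiohttp): domains run concurrently, while URLs within one domain are checked serially with a pause between requests.

```python
import asyncio
import random
from collections import defaultdict

async def fetch_status(url):
    # stand-in for a real HTTP request; always reports 404 here
    await asyncio.sleep(0)
    return 404

async def check_domain(domain, urls, delay=0.01):
    # one domain's URLs are checked one after another, with a small
    # randomised pause so the server is never hit in bursts
    statuses = []
    for url in urls:
        statuses.append((url, await fetch_status(url)))
        await asyncio.sleep(delay * random.random())
    return statuses

async def check_all(data):
    queues = defaultdict(list)
    for domain, url in data:
        queues[domain].append(url)
    # different domains run concurrently, each with its own queue
    results = await asyncio.gather(
        *(check_domain(d, urls) for d, urls in queues.items()))
    return [item for batch in results for item in batch]

data = [
    ('example.com', 'http://example.com/404'),
    ('test.com', 'http://test.com/404'),
    ('test.com', 'http://test.com/405'),
    ('example.com', 'http://example.com/405'),
]
print(asyncio.run(check_all(data)))
```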
I used Google OR Tools to solve this problem. I framed it as a constraint optimization problem and modeled it that way.
from collections import defaultdict
from itertools import chain, combinations
from ortools.sat.python import cp_model

model = cp_model.CpModel()

data = [
    ('example.com', 'http://example.com/404'),
    ('test.com', 'http://test.com/404'),
    ('test.com', 'http://test.com/405'),
    ('example.com', 'http://example.com/405'),
    ('google.com', 'http://google.com/404'),
    ('example.com', 'http://example.com/406'),
    ('stackoverflow.com', 'http://stackoverflow.com/404'),
    ('test.com', 'http://test.com/406'),
    ('example.com', 'http://example.com/407')
]

tmp = defaultdict(list)
for (domain, url) in sorted(data):
    var = model.NewIntVar(0, len(data) - 1, url)
    tmp[domain].append(var)  # store URLs as model variables where the key is the domain

vals = list(chain.from_iterable(tmp.values()))  # create a single list of all variables
model.AddAllDifferent(vals)  # all variables must occupy a unique spot in the output

constraint = []
for urls in tmp.values():
    if len(urls) == 1:  # a single domain does not need a specific constraint
        constraint.append(urls[0])
        continue
    combos = combinations(urls, 2)
    for (x, y) in combos:  # create combinations between each URL of a specific domain
        constraint.append((x - y))

model.Maximize(sum(constraint))  # maximize the distance between similar URLs from our constraint list

solver = cp_model.CpSolver()
status = solver.Solve(model)

output = [None for _ in range(len(data))]
if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
    for val in vals:
        idx = solver.Value(val)
        output[idx] = val.Name()
print(output)
['http://example.com/407',
'http://test.com/406',
'http://example.com/406',
'http://test.com/405',
'http://example.com/405',
'http://stackoverflow.com/404',
'http://google.com/404',
'http://test.com/404',
'http://example.com/404']
There is no obvious definition of unsortedness that would work best for you, but here's something that at least works well:
1. Sort the list.
2. If the length of the list is not a power of two, spread the items out evenly in a list with the next power-of-two size.
3. Find a new index for each item by reversing the bits in its old index.
4. Remove the gaps to bring the list back to its original size.
In sorted order, the indexes of items that are close together usually differ only in the smallest bits. By reversing the bit order, you make the new indexes for items that are close together differ in the largest bits, so they will end up far apart.
def bitreverse(x, bits):
    # reverse the lower 32 bits
    x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
    x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
    x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
    x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
    x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
    # take only the appropriate length
    return (x >> (32 - bits)) & ((1 << bits) - 1)

def antisort(inlist):
    if len(inlist) < 3:
        return inlist
    inlist = sorted(inlist)
    # get the next power of 2 list length
    p2len = 2
    bits = 1
    while p2len < len(inlist):
        p2len *= 2
        bits += 1
    templist = [None] * p2len
    for i in range(len(inlist)):
        newi = i * p2len // len(inlist)
        newi = bitreverse(newi, bits)
        templist[newi] = inlist[i]
    return [item for item in templist if item != None]
print(antisort(["a","b","c","d","e","f","g",
"h","i","j","k","l","m","n","o","p","q","r",
"s","t","u","v","w","x","y","z"]))
Output:
['a', 'n', 'h', 'u', 'e', 'r', 'k', 'x', 'c', 'p', 'f', 's',
'm', 'z', 'b', 'o', 'i', 'v', 'l', 'y', 'd', 'q', 'j', 'w', 'g', 't']
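The bit-reversal effect is easy to see on a small 3-bit example (a sketch; the helper name is mine): adjacent indexes like 0 and 1, which differ in the lowest bit, get mapped to 0 and 4, which differ in the highest bit.

```python
def bitrev3(x):
    # reverse the 3 low bits of x: bit 0 <-> bit 2, bit 1 stays put
    return ((x & 1) << 2) | (x & 2) | ((x & 4) >> 2)

print([bitrev3(i) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```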
You could implement an inverted binary search.
from typing import Union, List

sorted_int_list = [1, 1, 2, 3, 3]
unsorted_int_list = [1, 3, 2, 1, 3]

sorted_str_list = [
    "example.com",
    "example.com",
    "test.com",
    "stackoverflow.com",
    "stackoverflow.com",
]
unsorted_str_list = [
    "example.com",
    "stackoverflow.com",
    "test.com",
    "example.com",
    "stackoverflow.com",
]

def inverted_binary_search(
    input_list: List[Union[str, int]],
    search_elem: Union[int, str],
    list_selector_start: int,
    list_selector_end: int,
) -> int:
    if list_selector_end - list_selector_start <= 1:
        if search_elem < input_list[list_selector_start]:
            return list_selector_start - 1
        else:
            return list_selector_start

    list_selector_mid = (list_selector_start + list_selector_end) // 2
    if input_list[list_selector_mid] > search_elem:
        return inverted_binary_search(
            input_list=input_list,
            search_elem=search_elem,
            list_selector_start=list_selector_mid,
            list_selector_end=list_selector_end,
        )
    elif input_list[list_selector_mid] < search_elem:
        return inverted_binary_search(
            input_list=input_list,
            search_elem=search_elem,
            list_selector_start=list_selector_start,
            list_selector_end=list_selector_mid,
        )
    else:
        return list_selector_mid

def inverted_binary_insertion_sort(your_list: List[Union[str, int]]):
    for idx in range(1, len(your_list)):
        selected_elem = your_list[idx]
        inverted_binary_search_position = (
            inverted_binary_search(
                input_list=your_list,
                search_elem=selected_elem,
                list_selector_start=0,
                list_selector_end=idx,
            )
            + 1
        )
        for idk in range(idx, inverted_binary_search_position, -1):
            your_list[idk] = your_list[idk - 1]
        your_list[inverted_binary_search_position] = selected_elem
    return your_list
Output
inverted_sorted_int_list = inverted_binary_insertion_sort(sorted_int_list)
print(inverted_sorted_int_list)
>> [1, 3, 3, 2, 1]
inverted_sorted_str_list = inverted_binary_insertion_sort(sorted_str_list)
print(inverted_sorted_str_list)
>> ['example.com', 'stackoverflow.com', 'stackoverflow.com', 'test.com', 'example.com']
Update:
Given the comments, you could also run the function twice. This will untangle duplicates.
inverted_sorted_int_list = inverted_binary_insertion_sort(
inverted_binary_insertion_sort(sorted_int_list)
)
>> [1, 3, 2, 1, 3]
Here's a stab at it, but I am not sure it wouldn't degenerate a bit given particular input sets.
We pick the most frequently found item and append its first occurrence to a list. Then the same with the 2nd most frequent, and so on.
Repeat for half the count of the most frequent item. That's the left half of the list.
Then, moving from least frequent to most frequent, pick the first item and add its values. When an item occurs fewer times than half the maximum count, pick which side you want to put it on.
Essentially, we layer key by key and end up with the more frequent items at the left-most and right-most positions in the unsorted list, leaving the less frequent ones in the middle.
import random
from collections import defaultdict

def unsort(lst: list) -> list:
    """
    build a dictionary by frequency first
    then loop thru the keys and append
    key by key with the other keys in between
    """
    result = []
    # dictionary by keys (this would be domains to urls)
    di = defaultdict(list)
    for v in lst:
        di[v].append(v)
    # sort by decreasing dupes length
    li_len = [(len(val), key) for key, val in di.items()]
    li_len.sort(reverse=True)
    # most found count
    max_c = li_len[0][0]
    # halfway point
    odd = max_c % 2
    num = max_c // 2
    if odd:
        num += 1
    # frequency, high to low
    by_freq = [tu[1] for tu in li_len]
    di_side = {}
    # for the first half, pick from frequent to infrequent
    # alternating by keys
    for c in range(num):
        # frequent to less
        for key in by_freq:
            entries = di[key]
            # first pass: less than half the number of values
            # we don't want to put all the infrequents first
            # and have a more packed second side
            if not c:
                # pick on way in or out?
                if len(entries) <= num:
                    # might be better to alternate L,R,L
                    di_side[key] = random.choice(["L", "R"])
                else:
                    # pick on both
                    di_side[key] = "B"
            # put in the left half
            do_it = di_side[key] in ("L", "B")
            if do_it and entries:
                result.append(entries.pop(0))
    # once you have mid point pick from infrequent to frequent
    for c in range(num):
        # frequent to less
        for key in reversed(by_freq):
            entries = di[key]
            # put in the right half
            do_it = di_side[key] in ("R", "B")
            if entries:
                result.append(entries.pop(0))
    return result
Running this I got:
([1, 1, 2, 3, 3], '2.00', '====>', [3, 1, 2, 1, 3], '3.41')
([1, 3, 2, 3, 1], '3.41', '====>', [3, 1, 2, 1, 3], '3.41')
([1, 2, 3, 0, 1, 2, 3], '6.00', '====>', [3, 2, 1, 0, 1, 2, 3], '5.86')
([3, 2, 1, 0, 1, 2, 3], '5.86', '====>', [3, 2, 1, 0, 1, 2, 3], '5.86')
([3, 2, 1, 3, 1, 2, 3], '6.88', '====>', [3, 2, 3, 2, 1, 3, 1], '5.97')
([1, 2, 3, 4, 5], '0.00', '====>', [5, 1, 2, 3, 4], '0.00')
Oh, and I also added an assert to check nothing had been dropped or altered by the unsorting:
assert(sorted(lst) == sorted(ulst))
alternate approach?
I'll leave it as a footnote for now, but the general idea of not clustering (not the OP's specific application of not overloading domains) seems like it would be a candidate for a force-repulsive approach, where identical domains would try to keep as far from each other as possible. i.e. 1, 1, 2 => 1, 2, 1 because the 1s would repel each other. That's a wholly different algorithmic approach however.
When I faced a similar problem, here's how I solved it:
Define the "distance" between two strings (URLs in this case) as their Levenshtein distance (code to compute this value is readily available)
Adopt your favorite travelling-salesman algorithm to find the (approximate) shortest path through your set of strings (finding the exact shortest path isn't computationally feasible but the approximate algorithms are fairly efficient)
Now modify your "distance" metric to be inverted -- i.e. compute the distance between two strings (s1,s2) as MAX_INT - LevenshteinDistance(s1,s2)
With this modification, the "shortest path" through your set is now really the longest path, i.e. the most un-sorted one.
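A minimal sketch of this recipe, assuming a plain dynamic-programming Levenshtein and substituting a greedy nearest-neighbour pass for "your favorite travelling-salesman algorithm" (all function names here are my own illustration, not an established API):

```python
def levenshtein(s1, s2):
    # classic dynamic-programming edit distance, computed row by row
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # substitution
        prev = curr
    return prev[-1]

MAX_INT = 2**31 - 1

def inverted_distance(s1, s2):
    # inverted metric: similar strings are now "far apart"
    return MAX_INT - levenshtein(s1, s2)

def unsort_by_path(strings):
    # greedy nearest-neighbour tour under the inverted metric,
    # i.e. always hop to the most dissimilar remaining string
    remaining = list(strings)
    path = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda s: inverted_distance(path[-1], s))
        remaining.remove(nxt)
        path.append(nxt)
    return path
```

With this, `unsort_by_path(["aaaa", "aaab", "zzzz", "zzzy"])` hops from `"aaaa"` straight to a `zzz*` string rather than to its near-duplicate.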
An easy way to scramble a list is to maximize its "sortness" score using a genetic algorithm with a permutation chromosome. I was able to hack quickly a version in R using the GA package. I'm not a Python user, but I am sure there are GA libraries for Python that include permutation chromosomes. If not, a general GA library with real-valued vector chromosomes can be adapted. You just use a vector with values in [0, 1] as a chromosome and convert each vector to its sort index.
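I'm not reproducing the R/GA code here, but as a rough Python stand-in, the same objective can be driven by a random-swap hill climb over permutations (effectively a GA with a population of one; the scoring function and names below are my own illustration):

```python
import random

def scatter_score(lst):
    # objective to maximize: total absolute difference between
    # neighbours -- higher means adjacent items are more dissimilar
    return sum(abs(a - b) for a, b in zip(lst, lst[1:]))

def unsort_hill_climb(lst, iterations=2000, seed=0):
    rng = random.Random(seed)
    best = list(lst)
    best_score = scatter_score(best)
    for _ in range(iterations):
        cand = best[:]
        i, j = rng.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]   # propose one swap
        score = scatter_score(cand)
        if score > best_score:                # keep only improvements
            best, best_score = cand, score
    return best
```

A real GA would keep a population and recombine chromosomes; this sketch only shows the fitness-driven search over permutations.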
I hope this algorithm works correctly:
unsorted_list = ['c', 'a', 'a', 'a', 'a', 'b', 'b']

d = {i: unsorted_list.count(i) for i in unsorted_list}
print(d)  # {'c': 1, 'a': 4, 'b': 2}
d = {k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse=True)}
print(d)  # {'a': 4, 'b': 2, 'c': 1}

result = [None] * len(unsorted_list)
border_index_left = 0
border_index_right = len(unsorted_list) - 1

it = iter(d)

def set_recursively(k, nk):
    set_borders(k)
    set_borders(nk)
    if d[k]:
        set_recursively(k, nk)

def set_borders(key):
    global border_index_left, border_index_right
    if key is not None and d[key]:
        result[border_index_left] = key
        d[key] = d[key] - 1
        border_index_left = border_index_left + 1
    if key is not None and d[key]:
        result[border_index_right] = key
        d[key] = d[key] - 1
        border_index_right = border_index_right - 1

next_element = next(it, None)
for k, v in d.items():
    next_element = next(it, None)
    set_recursively(k, next_element)

print(result)  # ['a', 'b', 'a', 'c', 'a', 'b', 'a']
Visually, it looks like walking from the edges toward the middle:
[2, 3, 3, 3, 1, 1, 0]
[None, None, None, None, None, None, None]
[3, None, None, None, None, None, None]
[3, None, None, None, None, None, 3]
[3, 1, None, None, None, None, 3]
[3, 1, None, None, None, 1, 3]
[3, 1, 3, None, None, 1, 3]
[3, 1, 3, 2, None, 1, 3]
[3, 1, 3, 2, 0, 1, 3]
Just saying, putting a short time delay would work just fine. I think someone mentioned this already. It is very simple and very reliable. You could do something like:
from random import uniform
from time import sleep
import requests

error404 = []
connectionError = []

for url in your_URL_list:
    try:
        status_code = requests.get(str(url)).status_code
        if status_code == 404:
            error404.append(url)
    except requests.exceptions.ConnectionError:
        connectionError.append(url)
    sleep(uniform(0.1, 0.5))  # random delay between requests
Cheers

Median of the medians of a list

I need a vector that stores the median values of the medians of the main list "v". I have tried something with the following code but I am only able to write some values in the correct way.
v = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
final = []
nfac = 0
for j in range(0, 4):
    nfac = j + 1
    for k in range(0, nfac):
        if k % 2 == 0:
            final.append(v[10 // 2**nfac - 1])
        else:
            final.append(v[9 - 10 // 2**nfac])
The first median in v=[1,2,3,4,5,6,7,8,9,10] is 5
Then I want the medians of the remaining sublists [1,2,3,4] and [6,7,8,9,10]. I.e. 2 and 8 respectively. And so on.
The list "final" must be in the following form:
final=[5,2,8,1,3,6,9,4,7,10]
Note that the task as you defined it is basically equivalent to constructing a binary heap from an array.
Definitely start by defining a helper function for finding the median:
def split_by_median(l):
    median_ind = (len(l) - 1) // 2
    median = l[median_ind]
    left = l[:median_ind]
    right = l[median_ind + 1:] if len(l) > 1 else []
    return median, left, right
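A quick self-contained check (mine, not part of the answer) of how the helper splits the OP's list around the lower middle element:

```python
def split_by_median(l):
    # same helper as above, condensed into one return
    median_ind = (len(l) - 1) // 2
    right = l[median_ind + 1:] if len(l) > 1 else []
    return l[median_ind], l[:median_ind], right

print(split_by_median([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
# (5, [1, 2, 3, 4], [6, 7, 8, 9, 10])
```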
Following the example you give, you want to process the resulting sublists in a breadth-first manner, so we need a queue to remember the following tasks:
from collections import deque

def construct_heap(v):
    lists_to_process = deque([sorted(v)])
    nodes = []
    while lists_to_process:
        head = lists_to_process.popleft()
        if len(head) == 0:
            continue
        median, left, right = split_by_median(head)
        nodes.append(median)
        lists_to_process.append(left)
        lists_to_process.append(right)
    return nodes
Finally, calling the function:
print(construct_heap([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])) # [5, 2, 8, 1, 3, 6, 9, 4, 7, 10]
print(construct_heap([5, 1, 2])) # [2, 1, 5]
print(construct_heap([1, 0, 0.5, -1])) # [0, -1, 0.5, 1]
print(construct_heap([])) # []

counting up and then down a range in python

I am trying to program a standard snake draft, where team A picks, then team B, team C, team C, team B, team A, ad nauseam.
If pick number 13 (or pick number x) just happened how can I figure which team picks next for n number of teams.
I have something like:
def slot(n, x):
    direction = 'down' if (x // n) & 1 else 'up'
    spot = (x % n) + 1
    slot = spot if direction == 'up' else ((n + 1) - spot)
    return slot
I have a feeling there is a simpler, more pythonic way than this solution. Anyone care to take a hack at it?
So I played around a little more. I am looking for the return of a single value, rather than the best way to count over a looped list. The most literal answer might be:
def slot(n, x):  # 0.15757 sec for 100,000x
    number_range = list(range(1, n + 1)) + list(range(n, 0, -1))
    index = x % (n * 2)
    return number_range[index]
This creates a list [1,2,3,4,4,3,2,1], figures out the index (e.g. 13 % (4*2) = 5), and then returns the value at that index (e.g. number_range[5] = 3). The longer the list, the slower the function.
We can use some logic to cut the list-making in half. If we are counting up (i.e. ((x // n) & 1) is False), we use the obvious index value (x % n); else we mirror it as ((n - 1) - (x % n)):
def slot(n, x):  # 0.11982 sec for 100,000x
    number_range = list(range(1, n + 1))  # only half the list is ever indexed
    index = ((n - 1) - (x % n)) if (x // n) & 1 else (x % n)
    return number_range[index]
Still, avoiding a list altogether is fastest:
def slot(n, x):  # 0.07275 sec for 100,000x
    spot = (x % n) + 1
    slot = ((n + 1) - spot) if (x // n) & 1 else spot
    return slot
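As a sanity check (my own, not part of the answer), the list-free arithmetic version agrees with the list-based one for every pick:

```python
def slot_list(n, x):
    # list-based version from above
    number_range = list(range(1, n + 1)) + list(range(n, 0, -1))
    return number_range[x % (n * 2)]

def slot_arith(n, x):
    # arithmetic version from above
    spot = (x % n) + 1
    return ((n + 1) - spot) if (x // n) & 1 else spot

# the two implementations match for a range of team counts and picks
for n in (2, 3, 4, 12):
    for x in range(100):
        assert slot_list(n, x) == slot_arith(n, x)
```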
And if I hold the list as a variable rather than spawning one each call:
number_list = [1,2,3,4,5,6,7,8,9,10,11,12,12,11,10,9,8,7,6,5,4,3,2,1]

def slot(n, x):  # 0.03638 sec for 100,000x; this list assumes n == 12
    return number_list[x % (n * 2)]
Why not use the itertools cycle function:
from itertools import cycle

li = list(range(1, n + 1)) + list(range(n, 0, -1))  # e.g. [1, 2, 3, 4, 4, 3, 2, 1]
it = cycle(li)
[next(it) for _ in range(10)]  # [1, 2, 3, 4, 4, 3, 2, 1, 1, 2]
Note: previously I had answered how to run up and down without repeating the endpoints, as follows:
it = cycle(list(range(1, n + 1)) + list(range(n - 1, 1, -1)))  # e.g. [1, 2, 3, 4, 3, 2, 1, 2, 3, ...]
Here's a generator that will fulfill what you want.
def draft(n):
    while True:
        for i in range(1, n + 1):
            yield i
        for i in range(n, 0, -1):
            yield i
>>> d = draft(3)
>>> [next(d) for _ in range(12)]
[1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1]
from itertools import chain, cycle

def cycle_up_and_down(first, last):
    up = range(first, last + 1, 1)
    down = range(last, first - 1, -1)
    return cycle(chain(up, down))

turns = cycle_up_and_down(1, 4)
print([next(turns) for n in range(10)])  # [1, 2, 3, 4, 4, 3, 2, 1, 1, 2]
Here is a list of numbers that counts up, then down:
>>> [ -abs(5-i)+5 for i in range(0,10) ]
[0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
Written out:
count_up_to = 5
for i in range(0, count_up_to * 2):
    the_number_you_care_about = -abs(count_up_to - i) + count_up_to
    # do stuff with the_number_you_care_about
Easier to read:
>>> list( range(0,5) ) + list( range( 5, 0, -1 ) )
[0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
Written out:
count_up_to = 5
for i in list(range(0, count_up_to)) + list(range(count_up_to, 0, -1)):
    # i is the number you care about
    pass
Another way:
from itertools import chain

for i in chain(range(0, 5), range(5, 0, -1)):
    # i is the number you care about
    pass
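The same abs trick extends to the snake draft itself, giving a list-free closed form (my own extension of the idea above, not from the answer):

```python
def slot_abs(n, x):
    # 0-based pick x, teams numbered 1..n, order 1..n then n..1, repeating
    r = x % (2 * n)                        # position within one up-down cycle
    return int(n - abs(r - n + 0.5) + 0.5)

print([slot_abs(4, x) for x in range(8)])  # [1, 2, 3, 4, 4, 3, 2, 1]
```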
