The smallest user group to test all providers - python

I have an algorithm problem to solve.
Description: I have a Python dictionary that contains the following:
modules = {"auth_provider_1": [3, 4, 17, 19],
"auth_provider_2": [1, 6, 8, 10, 13, 14, 16, 18],
"auth_provider_3": [0, 7, 11, 12, 15],
"auth_provider_4": [2, 5, 9],
"cont_provider_1": [4, 14],
"cont_provider_2": [8, 9, 13, 15, 16, 17],
"cont_provider_3": [2, 3, 5, 10, 11, 18],
"cont_provider_4": [0, 1, 6, 7, 12, 19]}
There are two types of modules: auth_provider and cont_provider.
Each type has 4 different providers. For example, the 4 auth providers are auth_provider_1, auth_provider_2, auth_provider_3 and auth_provider_4.
Each provider has a list of the users that use that provider.
Each user uses exactly one auth provider and exactly one cont provider.
For example:
user 3 is using auth_provider_1 and cont_provider_3
user 1 is using auth_provider_2 and cont_provider_4, and so on.
Problem: We want to cover all providers with the smallest possible group of users. For example, if we check which providers users 0, 2, 4 and 8 are using, we will have checked every available provider. The same holds for users 0, 2, 4, 16 and for users 0, 2, 4, 13, etc.
What I tried: Making a list of provider names sorted by the length of their user lists. For example:
sorted_list = ['cont_provider_1', 'auth_provider_4', 'auth_provider_1', 'auth_provider_3', 'cont_provider_2', 'cont_provider_3', 'cont_provider_4', 'auth_provider_2']
I iterate over this sorted list and look up each user of the current provider in the modules dictionary; whenever a user is found in a provider's list, I remove that provider's key (provider name) from the sorted list.
For example, the first element of the sorted list is cont_provider_1, which has the shortest list (only users 4 and 14). I wanted to start from the smallest one because I thought it would make more sense. Then I search for the users of cont_provider_1 in the modules dictionary.
But after it finds user 4 in auth_provider_1 and cont_provider_1, the iteration somehow gets stuck and gives me this answer:
sorted_list_after_iteration_is_over = ['auth_provider_4']
min_users = [4, 14, 0, 7, 11, 12, 15]
Question: What would be the best algorithm for this problem? Am I on the right track? Where am I going wrong? Any suggestions or help?
Here is my whole code:
modules = {"auth_provider_1": [3, 4, 17, 19],
           "auth_provider_2": [1, 6, 8, 10, 13, 14, 16, 18],
           "auth_provider_3": [0, 7, 11, 12, 15],
           "auth_provider_4": [2, 5, 9],
           "cont_provider_1": [4, 14],
           "cont_provider_2": [8, 9, 13, 15, 16, 17],
           "cont_provider_3": [2, 3, 5, 10, 11, 18],
           "cont_provider_4": [0, 1, 6, 7, 12, 19]}
providers_sorted_list = sorted(modules, key = lambda key: len(modules[key]))
# ['cont_provider_1', 'auth_provider_4', 'auth_provider_1', 'auth_provider_3', 'cont_provider_2', 'cont_provider_3', 'cont_provider_4', 'auth_provider_2']
test_users = []
for provider in providers_sorted_list:
    search_list = modules[provider]
    for user in search_list:
        for key, val in modules.items():
            if user in val:
                if not user in test_users:
                    test_users.append(user)
                if key in providers_sorted_list:
                    providers_sorted_list.remove(key)
print(providers_sorted_list)
print(test_users)
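One way to read this problem is as a minimum set cover: each user "covers" the two providers they use, and we want the fewest users whose providers together include every key of modules. Below is a minimal sketch, assuming the data is small enough for an exhaustive search. The name smallest_user_groups is made up for illustration. It tries group sizes 1, 2, 3, ... and returns every covering group of the first size that works.

```python
from itertools import combinations

modules = {"auth_provider_1": [3, 4, 17, 19],
           "auth_provider_2": [1, 6, 8, 10, 13, 14, 16, 18],
           "auth_provider_3": [0, 7, 11, 12, 15],
           "auth_provider_4": [2, 5, 9],
           "cont_provider_1": [4, 14],
           "cont_provider_2": [8, 9, 13, 15, 16, 17],
           "cont_provider_3": [2, 3, 5, 10, 11, 18],
           "cont_provider_4": [0, 1, 6, 7, 12, 19]}


def smallest_user_groups(modules):
    """Return all smallest user groups that touch every provider (hypothetical helper)."""
    # Invert the mapping: user -> set of providers that user exercises.
    user_providers = {}
    for provider, users in modules.items():
        for user in users:
            user_providers.setdefault(user, set()).add(provider)

    all_providers = set(modules)
    users = sorted(user_providers)
    # Try ever larger groups; the first size with a full cover is the minimum.
    for size in range(1, len(users) + 1):
        found = [combo for combo in combinations(users, size)
                 if set().union(*(user_providers[u] for u in combo)) == all_providers]
        if found:
            return found
    return []


groups = smallest_user_groups(modules)
print(len(groups[0]))           # size of the smallest group
print((0, 2, 4, 8) in groups)   # the example group from the question
```

Exhaustive search is only practical for small inputs like this one. For many users, a greedy heuristic (repeatedly pick the user covering the most still-uncovered providers) is a common approximation, at the cost of no longer guaranteeing the minimum.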

Related

How to divide a list or array into the correct number of groups

I have tried two methods of creating groups of numbers and then dividing those groups into smaller groups of sequential numbers and selecting one.
The first method used lists, but they wouldn't divide the groups correctly.
# This program prints the wrong number of elements.
# I need the correct number of elements, and I want the list to
# deal with floats.
import numpy as np

begin = 1
end = 22
num_groop = 2
num_in_groop = (begin + end) // num_groop
lis = []
# loop iterates through index making list from beginning to end
end = num_in_groop
for _ in np.arange(num_groop):
    lis.append(list(np.arange(begin, end+1)))
    begin += num_in_groop
    end += num_in_groop
print('lis', lis,)
# a function to choose one group from the lis and print it
x_1 = lis[0]
x_2 = lis[1]
inp = input('Choose group 1 or 2 by entering 1 or 2\n')
intinp = int(inp)
def choosefunc():
    if intinp == 1:
        del x_2[:]
        print('You chose group x_1 = ',x_1[:])
    elif intinp == 2:
        del x_1[:]
        print('You chose group x_2 = ',x_2[:])
choosefunc()
print('lis is now', lis)
The problem is that when this is repeated to narrow down the groups, it divides using integers only. Although the original maximum was 22, after repeating this twice it produces lists of the wrong size. Mathematically, it should work like this:
The first division of the list into two equal groups is fine:
[[1,2,3,4,5,6,7,8,9,10,11], [12,13,14,15,16,17,18,19,20,21,22]].
Then, after choosing one of these groups (say the first) and dividing by two again, the maths no longer works. It should be this:
lis [[1, 2, 3, 4, 5, 5.5], [5.6, 7, 8, 9, 10, 11]]
But because it doesn't seem to handle floats, it is this:
lis [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]].
That's not correct maths: 22/2 = 11, and 11/2 = 5.5.
I want the lists to be of equal size.
# ---------------------------------------------------
When I try to solve the problem using numpy arrays instead of lists, I get an error that stops me from continuing.
# I tried to solve this problem using array but getting an error.
# TypeError: 'tuple' object cannot be interpreted as an integer.
import numpy as np
begin = 1
end = 22
num_groop = 2
num_in_groop = (begin + end) // num_groop
lis = np.array([])
print('lis is now', lis) # prints the new value for lis
end = num_in_groop
for _ in np.arange(num_groop):
    print('line20, lis now', lis)
    lis(np.arange(range((begin, end+1)))) #error
    begin += num_in_groop
    end += num_in_groop
print('lis', lis)
If [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]] is an acceptable split, then
def subdivide_list(a):
    midpoint = len(a) // 2
    return a[:midpoint], a[midpoint:]

lists = [list(range(1, 12))]
for x in range(3):
    print(x, lists)
    new_lists = []
    for a in lists:
        new_lists.extend(subdivide_list(a))
    lists = new_lists
does what you want:
0 [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]
1 [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
2 [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
EDIT
This also works for [list(range(1, 23))] (print adjusted to show the lengths of the lists):
0 [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]] [22]
1 [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]] [11, 11]
2 [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19, 20, 21, 22]] [5, 6, 5, 6]
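If the goal is groups whose sizes differ by at most one, the same idea can be generalised from halving to any number of groups. Here is a minimal sketch using divmod to hand the leftover elements to the first groups. The helper name split_evenly is hypothetical, and unlike the halving above it puts the larger groups first.

```python
def split_evenly(items, num_groups):
    """Split items into num_groups consecutive slices whose lengths differ by at most 1."""
    base, extra = divmod(len(items), num_groups)
    groups = []
    start = 0
    for i in range(num_groups):
        # The first `extra` groups each take one of the leftover elements.
        size = base + (1 if i < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


print(split_evenly(list(range(1, 12)), 2))
# [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11]]
print([len(g) for g in split_evenly(list(range(1, 23)), 4)])
# [6, 6, 5, 5]
```

A list of whole numbers cannot be split into groups of 5.5 elements, so "equal size" here can only mean sizes that differ by at most one.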

Loop selecting one item per list in nested list

list = [[1,2,3,4],[5,6,7,8],[9,10,11,12]]
I would like a loop that randomly selects exactly ONE item from each of the 3 inner lists. So the loop would start and pick 3, then pick 7, then pick 9, for example. Then the loop stops and doesn't pick any more items. I only want 3 repetitions.
I have managed to do this with:
for i in list:
    item = list[0].pop((random.choice(list)[0]))
but it doesn't do it only once; it goes through all of the items of the first inner list (choosing the first one), then moves on to the second one, and so on.
Any help is appreciated!
You seem to be indexing the list on 0 in each iteration, which will only give you random values from the first inner list. Use random.choice iterating over the list, or use map:
list(map(random.choice, my_list))
# [3, 8, 11]
Equivalently:
[random.choice(i) for i in my_list]
Based on the comments, if you want to remove the item you've randomly selected from the list, use instead:
[i.pop(random.randrange(len(i))) for i in my_list]
# [4, 6, 9]
print(my_list)
# [[1, 2, 3], [5, 7, 8], [10, 11, 12]]
This is my code:
print("before random sample",pos)
pos = random.sample(pos, len(pos))
print("after random sample", pos)
test = [i.pop(random.randint(0,len(i))) for i in pos]
This is the output:
#before random sample [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
#after random sample [[6, 7, 8, 9, 10], [1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
#.......
#test = [i.pop(random.randint(0,len(i))) for i in pos]
#IndexError: pop index out of range
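The IndexError here comes from random.randint(0, len(i)): randint includes both endpoints, so it can return len(i), which is one past the last valid index. A minimal sketch of the same pick-and-remove step using random.randrange, which excludes the upper bound (the function name pop_one_from_each is made up for illustration):

```python
import random


def pop_one_from_each(groups):
    """Remove and return one randomly chosen item from every inner list."""
    # randrange(n) yields 0..n-1, so the index is always valid for a non-empty list.
    return [group.pop(random.randrange(len(group))) for group in groups]


pos = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
pos = random.sample(pos, len(pos))  # shuffle the order of the inner lists
picked = pop_one_from_each(pos)
print(picked)
print(pos)
```

This still assumes every inner list is non-empty; randrange(0) raises ValueError.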

Bubble Sort working, but switching back the first 2 numbers, and only with certain numbers

This may be the weirdest problem I've ever seen. When I set the object at index 3 (the 4th object) to 5, the bubble sort algorithm seems to have problems, but when I set it to 12, the problem goes away, even though both 12 and 5 are lower than the numbers at index 2 and 4 (the numbers before and after the 5/12). Here's the code, and here's the output:
array_to_sort = [10, 5, 13, 5, 42]
should_stop = False
print(array_to_sort)
while should_stop == False:
    should_stop = True
    index = 1
    for back_number in array_to_sort:
        print(array_to_sort)
        front_number = array_to_sort[index]
        if back_number > front_number:
            array_to_sort.remove(back_number)
            array_to_sort.remove(front_number)
            array_to_sort.insert(index - 1, front_number)
            array_to_sort.insert(index, back_number)
            should_stop = False
        if index + 1 < len(array_to_sort):
            index += 1
(obviously not all the output):
[10, 5, 13, 5, 42]
[10, 5, 13, 5, 42]
[5, 10, 13, 5, 42]
[5, 10, 13, 5, 42]
[10, 5, 5, 13, 42]
[10, 5, 5, 13, 42]
However, it does eventually get fully sorted
Then with the same code, but if I set the array to this:
array_to_sort = [10, 5, 13, 12, 42]
The output becomes what would be expected:
[10, 5, 13, 12, 42]
[10, 5, 13, 12, 42]
[5, 10, 13, 12, 42]
[5, 10, 13, 12, 42]
[5, 10, 12, 13, 42]
[5, 10, 12, 13, 42]
PS: I know this is definitely not the best way to do bubble sort, I'm just starting with python.
This will happen any time there are duplicates in your list. list.remove removes the first instance of the object being removed. So when you try to remove the second 5 at index 3, you're actually removing the first 5, which by that point sits at index 0.
Solution? Don't use list.remove in this scenario. Just swap the two values. This will be easiest if you loop over a range of indices.
# back_number and front_number are indices here, not values
for back_number in range(len(array_to_sort)-1):
    print(array_to_sort)
    front_number = back_number + 1
    if array_to_sort[back_number] > array_to_sort[front_number]:
        # swap the two neighbours in place
        array_to_sort[front_number], array_to_sort[back_number] = array_to_sort[back_number], array_to_sort[front_number]
        should_stop = False
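Putting the swap-based pass inside the outer "repeat until nothing moved" loop gives a complete sort. A minimal sketch, with the function name bubble_sort and the swapped flag chosen for illustration:

```python
def bubble_sort(values):
    """Return a sorted copy of values using in-place neighbour swaps."""
    items = list(values)  # work on a copy so the caller's list is untouched
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                # Swapping by index is safe with duplicates, unlike list.remove.
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items


print(bubble_sort([10, 5, 13, 5, 42]))   # [5, 5, 10, 13, 42]
print(bubble_sort([10, 5, 13, 12, 42]))  # [5, 10, 12, 13, 42]
```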

Python - Comparing each item of a list to every other item in that list

I need to compare every item in a very long list (12471 items) to every other item in the same list. Below is my list:
[array([3, 4, 5])
array([ 6, 8, 10])
array([ 9, 12, 15])
array([12, 16, 20])
array([15, 20, 25])
...] #12471 items long
I need to compare the second item of each array to the first item of every other array to see if they're equal. And preferably, in a very efficient way. Is there a simple and efficient way to do this in Python 2.x?
I worked up a very crude method here, but it is terribly slow:
ls=len(myList) #12471
l=ls
k=0
for i in myList:
    k+=1
    while l>=0:
        l-=1
        if i[1]==myList[l][0]:
            pass #Do stuff
    l=ls
While this is still theoretically N^2 time (worst case), it should make things a bit better:
import collections

inval = [[3, 4, 5],
         [6, 8, 10],
         [9, 12, 15],
         [12, 14, 15],
         [12, 16, 20],
         [6, 6, 10],
         [8, 8, 10],
         [15, 20, 25]]
by_first = collections.defaultdict(list)
by_second = collections.defaultdict(list)
for item in inval:
    by_first[item[0]].append(item)
    by_second[item[1]].append(item)
for k, vals in by_first.items():
    if k in by_second:
        print "by first:", vals, "by second:", by_second[k]
Output of my simple, short case:
by first: [[6, 8, 10], [6, 6, 10]] by second: [[6, 6, 10]]
by first: [[8, 8, 10]] by second: [[6, 8, 10], [8, 8, 10]]
by first: [[12, 14, 15], [12, 16, 20]] by second: [[9, 12, 15]]
Though this DOES NOT handle duplicates.
We can do this in O(N), assuming that a Python dict takes O(1) time for insert and lookup.
In the first scan, we build a map from each row's first number to the indices of the rows that start with it.
In the second scan, we check whether that map contains the second element of each row. If it does, the map's value is the list of row indices that match the required criterion.
myList = [[3, 4, 5], [6, 8, 10], [9, 12, 15], [12, 16, 20], [15, 20, 25]]
# First scan: map each first element to the indices of the rows that start with it.
first_column = dict()
for idx, row in enumerate(myList):
    if row[0] in first_column:
        first_column[row[0]].append(idx)
    else:
        first_column[row[0]] = [idx]
# Second scan: look up each row's second element among the first elements.
for idx, row in enumerate(myList):
    if row[1] in first_column:
        print('rows matching for element {} from row {} are {}'.format(row[1], idx, first_column[row[1]]))
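The two scans can also be wrapped in a function that returns the matches instead of printing them. This is only a sketch, written for Python 3 rather than the Python 2.x mentioned in the question. The name match_second_to_first is hypothetical, and skipping a row's match with itself is an assumption based on the "every other item" wording.

```python
from collections import defaultdict


def match_second_to_first(rows):
    """Return (i, j) pairs where rows[i][1] == rows[j][0] and i != j."""
    # First scan: index every row by its first element.
    rows_by_first = defaultdict(list)
    for idx, row in enumerate(rows):
        rows_by_first[row[0]].append(idx)

    # Second scan: look up each row's second element in that index.
    matches = []
    for idx, row in enumerate(rows):
        for other in rows_by_first.get(row[1], []):
            if other != idx:  # assumption: a row is not compared with itself
                matches.append((idx, other))
    return matches


rows = [[3, 4, 5], [6, 8, 10], [9, 12, 15], [12, 16, 20], [15, 20, 25]]
print(match_second_to_first(rows))  # [(2, 3)]
```

The running time is linear in the number of rows plus the number of matches reported, so it only degrades toward quadratic when many rows share the same values.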

greedy multiple knapsack (minimize/reduce number of bins)

Actually, I already have a partial answer for this question, but I'm wondering if this small piece of greedy code can be generalized to something closer to the optimal solution.
How I met this problem (not relevant to the problem itself, but maybe interesting):
I receive a large collection of objects (a set of profiles of dykes, where each dyke keeps more or less the same shape along its length), which I can group according to a property (the name of the dyke). The output of my program goes to an external program that we have to invoke by hand (don't ask me why) and which can't recover from failures (one mistake stops the whole batch).
In the application where I'm using this, there's no hard requirement on the number of bins nor on the maximum size of the bins. What I try to do is to:
keep the number of groups low (invoke the program few times);
keep the sets small (reduce the damage if a batch fails);
keep similar things together (a failure in a group is probably a failure in the whole group).
I did not have much time, so I wrote a small greedy function that groups sets together.
A colleague suggested I could add some noise to the data to explore the neighbourhood of the approximate solution I find, and we were wondering how far from optimal the solutions found are.
Not that it is relevant to the original task, which doesn't need a true optimal solution, but I thought I would share the question with the community and see what comments come out of it.
def group_to_similar_sizes(orig, max_size=None, max_factor=None):
    """group orig list in sections that do not overflow max(orig) (or given max_size).

    return list of grouped indices, plus max effective length.

    >>> group_to_similar_sizes([1, 3, 7, 13])
    ([[2, 1, 0], [3]], 13)
    >>> group_to_similar_sizes([2, 9, 9, 11, 12, 19, 19, 22, 22, ])
    ([[3, 1], [4, 2], [5], [6, 0], [7], [8]], 22)

    result is independent of original ordering

    >>> group_to_similar_sizes([9, 19, 22, 12, 19, 9, 2, 22, 11, ])
    ([[3, 1], [4, 2], [5], [6, 0], [7], [8]], 22)

    you can specify a desired max size

    >>> group_to_similar_sizes([2, 9, 9, 11, 12, 19, 19, 22, 22, ], 50)
    ([[3, 2, 1], [6, 5, 4], [8, 7, 0]], 50)

    if the desired max size is too small, it still influences the way we make groups.

    >>> group_to_similar_sizes([1, 3, 7, 13], 8)
    ([[1], [2, 0], [3]], 13)
    >>> group_to_similar_sizes([2, 9, 9, 11, 12, 19, 19, 22, 22, ], 20)
    ([[1], [3, 2], [4, 0], [5], [6], [7], [8]], 22)

    max size can be adjusted by a multiplication factor

    >>> group_to_similar_sizes([9, 19, 22, 12, 5, 9, 2, 22, 11, ], max_factor=0.75)
    ([[2], [3], [4, 1], [5, 0], [6], [7], [8]], 22)
    >>> group_to_similar_sizes([9, 19, 22, 12, 5, 9, 2, 22, 11, ], max_factor=1.5)
    ([[2, 1], [6, 5], [7, 3, 0], [8, 4]], 33)
    """
    ordered = sorted(orig)
    max_size = max_size or ordered[-1]
    if max_factor is not None:
        max_size = int(max_size * max_factor)
    orig_ordered = list(ordered)
    todo = set(range(len(orig)))
    effective_max = 0
    result = []
    ## while we still have unassigned items
    while ordered:
        ## choose the largest item
        ## make it member of a group
        ## check which we can still put in its bin
        candidate_i = len(ordered) - 1
        candidate = ordered.pop()
        if candidate_i not in todo:
            continue
        todo.remove(candidate_i)
        group = [candidate_i]
        group_size = candidate
        for j in sorted(todo, reverse=True):
            if orig_ordered[j] + group_size <= max_size:
                group.append(j)
                group_size += orig_ordered[j]
                todo.remove(j)
        result.insert(0, group)
        effective_max = max(group_size, effective_max)
    return result, effective_max
I like your colleague's idea of adding some noise to the data. But maybe it would be better to make a few swaps in ordered after you call ordered = sorted(orig)?
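The swap idea could be explored roughly as follows. This is only a sketch under stated assumptions: it does not reuse group_to_similar_sizes, but a simpler stand-in greedy (first_fit) that places items in a given order into the first bin with room. Both first_fit and perturbed_search, and the parameters swaps, tries and seed, are hypothetical names. Each try swaps a few random positions in the largest-first order and keeps the result only if it uses fewer bins.

```python
import random


def first_fit(sizes, order, max_size):
    """Place item indices, in the given order, into the first bin with enough room."""
    bins, loads = [], []
    for i in order:
        for b, load in enumerate(loads):
            if load + sizes[i] <= max_size:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:
            # No existing bin had room: open a new one.
            bins.append([i])
            loads.append(sizes[i])
    return bins


def perturbed_search(sizes, max_size=None, swaps=2, tries=200, seed=0):
    """Retry first_fit on slightly shuffled orders; keep the grouping with fewest bins."""
    rng = random.Random(seed)
    max_size = max_size or max(sizes)
    # Baseline: largest item first, as in the greedy function above.
    base = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    best = first_fit(sizes, base, max_size)
    for _ in range(tries):
        order = list(base)
        for _ in range(swaps):
            a, b = rng.sample(range(len(order)), 2)
            order[a], order[b] = order[b], order[a]
        candidate = first_fit(sizes, order, max_size)
        if len(candidate) < len(best):
            best = candidate
    return best


sizes = [2, 9, 9, 11, 12, 19, 19, 22, 22]
best = perturbed_search(sizes)
print(len(best), best)
```

For this input the sizes sum to 125, so with a bin size of 22 no grouping can use fewer than 6 bins, and the largest-first baseline already reaches 6. Perturbation can only help on inputs where the plain greedy order is not optimal.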
