Keeping count of values available from among multiple sets - python

I have the following situation:
I am generating n combinations of size 3, made from n values. The kth combination (k in [0...n]) is drawn from a pool of values stored at the kth index of a list of n sets. Each value can appear at most 3 times. So if I have 10 values, I have a list of size 10, and each index holds a set of values from 0-10.
It seems to me that a good way to do this is to have something keeping count of all the available values across all the sets. If a value is rare (let's say there is only 1 left), a structure that lets me look up the rarest value and tells me which index it is located in would make generating the possible combinations much easier.
How could I do this? What about one structure to keep count of elements, and a dictionary to keep track of list indices that contain the value?
Edit: I should add that the specific problem I'm trying to solve is how to update the set at every index of the list (or whatever other structures I end up using) so that once I have used a value 3 times, it becomes unavailable for every other combination.
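The two-structure idea from the question (one structure counting remaining uses, plus a dictionary from each value to the list indices that still contain it) could be sketched as below. This is a minimal sketch, not a known solution: `ValuePool`, `rarest` and `use` are hypothetical names, and "rarest" is assumed to mean fewest remaining uses, with ties broken by the fewest sets containing the value.

```python
from collections import Counter, defaultdict

MAX_USES = 3  # each value may be used at most 3 times (from the question)


class ValuePool:
    """Hypothetical helper tracking remaining uses per value and which sets hold it."""

    def __init__(self, sets_by_index):
        # copy the sets so the caller's data is not modified
        self.sets_by_index = [set(s) for s in sets_by_index]
        self.remaining = Counter()      # value -> how many more times it may be used
        self.where = defaultdict(set)   # value -> indices whose set still contains it
        for i, s in enumerate(self.sets_by_index):
            for v in s:
                self.remaining[v] = MAX_USES
                self.where[v].add(i)

    def rarest(self):
        """Return (value, indices) for the available value with the fewest remaining
        uses, breaking ties by the fewest sets that contain it.
        Raises ValueError if nothing is left."""
        value = min(self.remaining,
                    key=lambda v: (self.remaining[v], len(self.where[v])))
        return value, set(self.where[value])

    def use(self, value):
        """Record one use of value; once it is used up, remove it from every set."""
        if self.remaining[value] <= 0:  # Counter returns 0 for unknown values
            raise ValueError(f"{value!r} is not available")
        self.remaining[value] -= 1
        if self.remaining[value] == 0:
            # the dictionary says exactly which sets to update, so no full scan is needed
            for i in self.where.pop(value):
                self.sets_by_index[i].discard(value)
            del self.remaining[value]
```

Because `where` records every index a value appears in, making a used-up value unavailable touches only the sets that actually contain it.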
Thank you.
Another edit:
This may be a little too abstract to ask for solutions when it's hard to understand what I'm even asking for. I will come back with some code soon; please check back in 1.5-2 hours if you are interested.

how to update the set for every index of the list (or whatever other structures I end up using), so that when I use a value 3 times, it is made unavailable for every other combination.
I assume you want to sample the values truly randomly, right? What if you put 3 of each value into a list, shuffle it with random.shuffle, and then keep popping values off the end of the list as you build each combination? If I'm understanding your problem correctly, here's example code:
from random import shuffle
valid_values = list(range(10))  # the valid values are 0 through 9 in my example, update accordingly for yours
vals = 3*valid_values  # I have 3 of each valid value
shuffle(vals)  # randomly shuffle them
while len(vals) != 0:
    combination = (vals.pop(), vals.pop(), vals.pop())  # combinations are 3 values?
    print(combination)
EDIT: Updated code based on the added information that you have sets of values (but this still assumes you can use more than one value from a given set):
from random import shuffle
my_sets_of_vals = [......]  # list of sets
valid_values = list()
for i, vals_at_i in enumerate(my_sets_of_vals):  # enumerate yields each index with its set
    for val in vals_at_i:
        valid_values.append((i, val))  # as a list comprehension: [(i, val) for i, s in enumerate(my_sets_of_vals) for val in s]
vals = 3*valid_values  # I have 3 of each valid value
shuffle(vals)  # randomly shuffle them
while len(vals) != 0:
    combination = (vals.pop()[1], vals.pop()[1], vals.pop()[1])  # combinations are 3 values?
    print(combination)

Based on your edit, you could make an object for each value that holds the element itself and the number of times you have used it. When you find you have used an element three times, remove it from the list.
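A minimal sketch of that per-value object, assuming the elements are integers and a limit of 3 uses; `TrackedValue` and `use_value` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: compare by identity, so remove() drops this exact object
class TrackedValue:
    """Hypothetical wrapper holding an element and how many times it has been used."""
    value: int
    uses: int = 0


def use_value(available, tracked, max_uses=3):
    """Record one use of tracked; remove it from available once it reaches max_uses."""
    tracked.uses += 1
    if tracked.uses >= max_uses:
        available.remove(tracked)


available = [TrackedValue(v) for v in range(3)]
picked = available[1]
for _ in range(3):
    use_value(available, picked)
print([tv.value for tv in available])  # → [0, 2]
```

Comparing by identity matters here: with the default dataclass equality, two wrappers holding the same value and count would be treated as equal, and `remove()` could drop the wrong one.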

Related

Find the highest sum combination with conditions from a list

I have a 360-element list and I want to find the 11-number combination with the highest sum, subject to a condition. To make it a bit clearer:
1- Get a list as input
2- Create a combination of 11 numbers from it
3- Check the combination for a specific condition
4- If it passes, return its sum
I tried using itertools.combinations and then checking for the condition, but it took very long because my list is really big. So I'm wondering if there's a way to check for the condition first rather than creating all the combinations and then filtering them out.
EDIT: I think my question was misunderstood. I want to get the list's combinations first (like permutations), not just the highest 11 numbers.
Why not sort the list in descending order and pick the first 11? If you need the indices, you can look those numbers up in the original list.
import random
import pprint
original_items = random.choices(range(999), k=360)
pprint.pprint(original_items)
highest_11 = sorted(original_items, reverse=True)[:11]
pprint.pprint(highest_11)
Generally, sort the list and find the first 11 numbers that satisfy your condition.
Even if your condition is non-deterministic, after sorting you can probably reduce the search itself to a single linear pass, so the overall runtime will mostly depend on the cost of the condition.
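A minimal sketch of "sort, then take the first 11 that satisfy the condition". It assumes the condition can be checked on each number individually; a condition on the whole 11-number combination would need a different approach. `top_k_satisfying` and the even-number condition are hypothetical, for illustration only.

```python
import random


def top_k_satisfying(items, k, condition):
    """Return the k largest items for which condition(item) is true (fewer if not enough)."""
    chosen = []
    for item in sorted(items, reverse=True):  # O(n log n) sort, then one linear pass
        if condition(item):
            chosen.append(item)
            if len(chosen) == k:
                break  # stop as soon as we have k numbers
    return chosen


original_items = random.choices(range(999), k=360)
# hypothetical condition for illustration: keep only even numbers
best_11 = top_k_satisfying(original_items, 11, lambda n: n % 2 == 0)
print(best_11, sum(best_11))
```

Because the scan walks the numbers from largest to smallest, the first 11 that pass are also the 11 passing numbers with the highest sum.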

Re-generate a random index until the indexed element meets the condition n times in Python

I know there have been a few similar questions regarding loop and random numbers, but I can't seem to find a solution for my problem.
Say I have a fixed list of numbers from my dataset and a threshold that the number has to meet:
x = (7,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41)
threshold = 25
I need to randomly pick a number from this list. Unfortunately, I cannot just loop over my original list directly; I'm forced to randomly pick an index of the list first and then look up the number. For example, if I randomly generate index 1, I get x[1], which is 11.
What I need in the end is to find at least 3 numbers that are greater than the threshold and put them all in a list, at which point my loop can stop. (The indices cannot repeat.)
For example, a possible final result would be (27,29,31) (the result can be in any format). I'm thinking maybe something like this to start, but I really need help to proceed:
Unless you are particularly concerned about the memory usage of creating an additional filtered list, the simplest would probably be to start by doing this:
filtered = [i for i in x if i > threshold]
You can then choose three samples from this filtered list (after import random). The following will potentially choose the same item more than once:
random.choices(filtered, k=3)
or if you want to avoid choosing the same item more than once:
random.sample(filtered, k=3)
Each of the above functions will output a list. Use tuple(....) on the output if you need to convert it to a tuple.
First, a clarification. Do you need to pick a random element from the list on each iteration, or do you need to pick a different random element from the list each time? I.e., can the same index be picked twice? You're doing the latter.
Second, you want to use range(len(x)). You don't want to hardwire the length of x into your code, and you want index 0 to be a possibility. random.shuffle() may be a better choice.
Lastly, you want to do something like:
result = []
for ....
    if select >= threshold:
        result.append(select)
        if len(result) >= 3: break
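The skeleton above leaves the loop header open. Following the random.shuffle() suggestion, a complete version could look like this. It is a sketch: shuffling a list of indices is an assumption about how to fill in the loop, and it keeps the answer's >= comparison even though the question says "greater than".

```python
import random

x = (7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41)
threshold = 25

# shuffle the indices (not the numbers) so each index is visited at most once
indices = list(range(len(x)))
random.shuffle(indices)

result = []
for i in indices:
    select = x[i]
    if select >= threshold:
        result.append(select)
        if len(result) >= 3:
            break

print(result)
```

Unlike drawing random indices repeatedly, this loop always terminates: if fewer than 3 numbers meet the threshold, it simply runs out of indices.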
If we assume the following constraints:
We are not allowed to loop over the original list (including list comprehension)
We are only allowed to access one member of the original list at a time through its index
We must pick 3 distinct members of the list that are greater than or equal to the threshold
The following code should satisfy all of them:
import random

x = [ 7,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41 ]
threshold = 25
result_index = []
while len(result_index) < 3:  # note: never ends if fewer than 3 elements meet the threshold
    index = random.randrange(0, len(x))  # random index from 0 to len(x) - 1
    if x[index] >= threshold and index not in result_index:
        result_index.append(index)
result = [ x[a] for a in result_index ]
Here is how this works:
In the loop, we store indices, not the numbers themselves.
For each index we check 2 conditions: the number there is greater than or equal to the threshold, and we haven't seen this index before.
If both conditions are satisfied, we save the index, not the number!
Repeat until we have 3 indices.
Build the new list by getting the numbers from those indices directly.

How to check only two values for existence in a list

I have a list of lists say, for example:
directions = [[-1,0,1],[1,0,4],[1,1,2],[-1,1,2]]
In each nested list, the value at index 2 is irrelevant to the test.
I want to find whether the first two values of any nested list match the inverse of the first two values of any other nested list. By inverse, I mean the negated values. I'd prefer to do this in one line of Python, but if that's not possible, a workaround with the same effect is fine.
If this condition is true, the third values of the two nested lists should be added together and stored in the original list being checked, and the second list (the inverse one) should be deleted.
So:
if nested list's first 2 values == -(another nested list's first 2 values):
    add their third values together
    delete the inverse list
I hope this makes a little more sense.
I have tried the code below, but I still can't get it to skip the third value (index 2):
listNum = 0
while len(directions) > listNum:
    if (-directions[listNum][0], -directions[listNum][1], anything(Idk)) in directions:
        index = index(-directions[listNum][0], -directions[listNum][1], anything(Idk))
        directions[listNum][2] += directions[index][2]
        directions.del(index)
But I don't know what to put where I put anything(Idk)
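One way around the `anything(Idk)` placeholder is to compare only the first two values of each nested list instead of the whole list. The sketch below assumes, as the attempted code does, that the summed value goes into the list being checked and that each list merges with at most one later inverse; `merge_inverse_pairs` is a hypothetical name, and it returns a new list rather than modifying `directions` in place.

```python
def merge_inverse_pairs(directions):
    """Merge each nested list with the first later list whose first two values are
    its negation: the third values are summed into the earlier list and the inverse
    list is removed. Returns a new list; the input is not modified."""
    result = [list(d) for d in directions]
    i = 0
    while i < len(result):
        target = (-result[i][0], -result[i][1])  # index 2 is ignored in the comparison
        for j in range(i + 1, len(result)):
            if (result[j][0], result[j][1]) == target:
                result[i][2] += result[j][2]  # add the third values together
                del result[j]                  # delete the inverse list
                break
        i += 1
    return result


directions = [[-1, 0, 1], [1, 0, 4], [1, 1, 2], [-1, 1, 2]]
print(merge_inverse_pairs(directions))  # → [[-1, 0, 5], [1, 1, 2], [-1, 1, 2]]
```

Only later lists are searched, because "is the inverse of" is symmetric: any earlier match would already have been found when the earlier list was checked.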

Python: Listing the duplicates in a list

I am fairly new to Python and I am interested in listing the duplicates within a list. I know how to remove duplicates from a list (with set()) and how to list the duplicates using collections.Counter; however, for the project I am working on, this wouldn't be the most efficient method to use, since the run time would be n(n-1)/2 --> O(n^2) and n is anywhere from 5k to 50k+ string values.
So my idea is that, since Python lists are linked data structures and are assigned to memory when created, I could begin counting duplicates from the very beginning, as the list is being created.
The list is created and the first index value is the word 'dog'.
The second index value is the word 'cat'.
Now, it would check whether the second index is equal to the first index; if it is, it is appended to another list called Duplicates.
The third index value is assigned 'dog', and it would be checked against 'cat' and then 'dog'; since it matches the first index, it is appended to Duplicates.
The fourth index is assigned 'dog', but it would check only the third index, not the second and first: since the third and second are not duplicates, the fourth does not need to check further back, and since the third and first are equal, the search stops at the third index.
My project gives me these values and appends them to a list, so I would like to implement the algorithm above: I don't care how many duplicates there are, I just want to know whether there are any.
I can't work out how to write the code, but I figured out its basic structure, although I might be completely off (I'm using generated numbers for simplicity):
for x in xrange(0,10):
    list1.append(x)
    for rev, y in enumerate(reversed(list1)):
        while x is not list1(y):
            cond()
            if ???
I really don't think you'll get better than a collections.Counter for this:
from collections import Counter

c = Counter(mylist)
duplicates = [ x for x,y in c.items() if y > 1 ]
Building the Counter should be O(n) (unless you're using keys that are particularly bad for hashing, but in my experience you need to try pretty hard to make that happen), and getting the duplicates list is then also O(n), giving a total complexity of O(2n) == O(n) for typical uses.
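Since the question only needs a yes/no answer, a set of values seen so far is enough: it is still O(n) on average and it can stop at the first repeat, or run while the list is being built as the question describes. This is a swapped-in technique (a set, not a Counter), and `has_duplicates`, `seen` and `found_duplicate` are hypothetical names.

```python
def has_duplicates(values):
    """Return True as soon as any value repeats an earlier one."""
    seen = set()
    for v in values:
        if v in seen:  # set membership is O(1) on average
            return True
        seen.add(v)
    return False


# the same check done while the list is being built
words = []
seen = set()
found_duplicate = False
for word in ['dog', 'cat', 'dog', 'dog']:
    if word in seen:
        found_duplicate = True
    seen.add(word)
    words.append(word)
```

If early stopping doesn't matter, `len(set(values)) != len(values)` gives the same yes/no answer in one line.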

Finding the most similar numbers across multiple lists in Python

In Python, I have 3 lists of floating-point numbers (angles), in the range 0-360, and the lists are not the same length. I need to find the triplet (with 1 number from each list) in which the numbers are the closest. (It's highly unlikely that any of the numbers will be identical, since this is real-world data.) I was thinking of using a simple lowest-standard-deviation method to measure agreement, but I'm not sure of a good way to implement this. I could loop through each list, comparing the standard deviation of every possible combination using nested for loops, and have a temporary variable save the indices of the triplet that agrees the best, but I was wondering if anyone had a better or more elegant way to do something like this. Thanks!
I wouldn't be surprised if there is an established algorithm for doing this, and if so, you should use it. But I don't know of one, so I'm going to speculate a little.
If I had to do it, the first thing I would try would be just to loop through all possible combinations of all the numbers and see how long it takes. If your data set is small enough, it's not worth the time to invent a clever algorithm. To demonstrate the setup, I'll include the sample code:
import itertools
from statistics import variance

# setup
def distance(nplet):
    '''Takes a pair or triplet (an "n-plet") as a list, and returns its distance.
    A smaller return value means better agreement.'''
    # your choice of implementation here. Example:
    return variance(nplet)

# algorithm
def brute_force(*lists):
    return min(itertools.product(*lists), key = distance)
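Because the values are angles in the range 0-360, plain variance treats 359 and 1 as far apart even though they differ by only 2 degrees. One possible wrap-aware choice for `distance`, which is an assumption and not part of the answer above, is the largest pairwise circular separation; `circular_separation` and `circular_distance` are hypothetical names.

```python
import itertools


def circular_separation(a, b):
    """Smallest angle between a and b in degrees, accounting for wrap-around at 360."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def circular_distance(nplet):
    """Hypothetical agreement metric: the largest pairwise circular separation."""
    return max(circular_separation(a, b) for a, b in itertools.combinations(nplet, 2))


def brute_force(*lists):
    # same brute-force search as above, with the wrap-aware metric plugged in
    return min(itertools.product(*lists), key=circular_distance)


print(brute_force([359.0, 100.0], [2.0, 180.0], [0.5, 200.0]))  # → (359.0, 2.0, 0.5)
```

The maximum pairwise gap is used instead of a circular standard deviation only because it is simple to check by hand; any metric that respects the wrap-around would slot in the same way.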
For a large data set, I would try something like this: first create one triplet for each number in the first list, with its first entry set to that number. Then go through this list of partially-filled triplets and for each one, pick the number from the second list that is closest to the number from the first list and set that as the second member of the triplet. Then go through the list of triplets and for each one, pick the number from the third list that is closest to the first two numbers (as measured by your agreement metric). Finally, take the best of the bunch. This sample code demonstrates how you could try to keep the runtime linear in the length of the lists.
def item_selection(listA, listB, listC):
    # assumes listA and listB are sorted in ascending order
    # make the list of partially-filled triplets
    triplets = [[a] for a in listA]
    iT = 0
    iB = 0
    while iT < len(triplets):
        # advance iB to the first value in listB that is >= triplets[iT][0]
        while iB < len(listB) and listB[iB] < triplets[iT][0]:
            iB += 1
        if iB == 0:
            triplets[iT].append(listB[0])
        elif iB == len(listB):
            triplets[iT].append(listB[-1])
        else:
            # listB[iB - 1] is just below triplets[iT][0] and listB[iB] is at or above it;
            # add the closer one as the second member of the triplet
            dist_lower = distance([triplets[iT][0], listB[iB - 1]])
            dist_upper = distance([triplets[iT][0], listB[iB]])
            if dist_lower < dist_upper:
                triplets[iT].append(listB[iB - 1])
            elif dist_lower > dist_upper:
                triplets[iT].append(listB[iB])
            else:
                # if they are equidistant, keep both: one triplet per candidate
                triplets[iT].append(listB[iB - 1])
                iT += 1
                triplets[iT:iT] = [[triplets[iT - 1][0], listB[iB]]]
        iT += 1
    # then another loop while iT < len(triplets) to add in the numbers from listC
    return min(triplets, key = distance)
The thing is, I can imagine situations where this wouldn't actually find the best triplet, for instance if a number from the first list is close to one from the second list but not at all close to anything in the third list. So something you could try is to run this algorithm for all 6 possible orderings of the lists. I can't think of a specific situation where that would fail to find the best triplet, but one might still exist. In any case the algorithm will still be O(N) if you use a clever implementation, assuming the lists are sorted.
import itertools

def symmetrized_item_selection(listA, listB, listC):
    best_results = []
    for ordering in itertools.permutations([listA, listB, listC]):
        best_results.append(item_selection(*ordering))  # keep each ordering's best triplet
    return min(best_results, key = distance)
Another option might be to compute all possible pairs of numbers between list 1 and list 2, between list 1 and list 3, and between list 2 and list 3. Then sort all three lists of pairs together, from best to worst agreement between the two numbers. Starting with the closest pair, go through the list pair by pair and any time you encounter a pair which shares a number with one you've already seen, merge them into a triplet. For a suitable measure of agreement, once you find your first triplet, that will give you a maximum pair distance that you need to iterate up to, and once you get up to it, you just choose the closest triplet of the ones you've found. I think that should consistently find the best possible triplet, but it will be O(N^2 log N) because of the requirement for sorting the lists of pairs.
import collections
import itertools
import math

def pair_sorting(listA, listB, listC):
    # make all possible pairs of values from two lists
    # each pair has the structure ((number, origin_list),(number, origin_list))
    # so we know which lists the numbers came from
    all_pairs = []
    all_pairs += [((nA,0), (nB,1)) for (nA,nB) in itertools.product(listA,listB)]
    all_pairs += [((nA,0), (nC,2)) for (nA,nC) in itertools.product(listA,listC)]
    all_pairs += [((nB,1), (nC,2)) for (nB,nC) in itertools.product(listB,listC)]
    all_pairs.sort(key = lambda p: distance([p[0][0], p[1][0]]))
    # make a dict to track which (number, origin_list)s we've already seen
    pairs_by_number_and_list = collections.defaultdict(list)
    min_distance = math.inf
    min_triplet = None
    # start with the closest pair
    for pair in all_pairs:
        # for the first value of the current pair, see if we've seen that particular
        # (number, origin_list) combination before
        for pair2 in pairs_by_number_and_list[pair[0]]:
            # if so, that means the current pair shares its first value with
            # another pair; skip it if the two other values come from the same list
            other2 = pair2[1] if pair2[0] == pair[0] else pair2[0]
            if other2[1] == pair[1][1]:
                continue
            # put the 3 unique values together to make a triplet
            this_triplet = (pair[1][0], pair2[0][0], pair2[1][0])
            # check if the triplet agrees more than the previous best triplet
            this_distance = distance(this_triplet)
            if this_distance < min_distance:
                min_triplet = this_triplet
                min_distance = this_distance
        # do the same thing but checking the second element of the current pair
        for pair2 in pairs_by_number_and_list[pair[1]]:
            other2 = pair2[1] if pair2[0] == pair[1] else pair2[0]
            if other2[1] == pair[0][1]:
                continue
            this_triplet = (pair[0][0], pair2[0][0], pair2[1][0])
            this_distance = distance(this_triplet)
            if this_distance < min_distance:
                min_triplet = this_triplet
                min_distance = this_distance
        # finally, add the current pair to the list of pairs we've seen
        pairs_by_number_and_list[pair[0]].append(pair)
        pairs_by_number_and_list[pair[1]].append(pair)
    return min_triplet
N.B. I've written all the code samples in this answer out a little more explicitly than you'd do it in practice to help you to understand how they work. But when doing it for real, you'd use more list comprehensions and such things.
N.B.2. No guarantees that the code works :-P but it should get the rough idea across.
