Tuples birthday paradox

Tuples birthday paradox - python

Tuples is n
The birthday problem equation is this:
Question:
For n = 200, write an algorithm (in Python) for enumerating the number of tuples in the sample space that satisfy the condition that at least two people have the same birthday. (Note that your algorithm will need to scan each tuple)
import itertools
print(list(itertools.permutations([0,0,0]))
I am wondering for this question how do I insert a n into this?

"how to get n in there":
n = 200
space = itertools.permutations(bday_pairs, n)
I've left out a couple parts of your code:
itertools returns a list; you don't need to coerce it.
Printing this result is likely not what you want with n = 200; that's a huge list.
Now, all you need to do is to build bday_pairs, the list of all possible pairs of birthdays. For convenience, I suggest that you use the integers 1-365. Since you haven't attacked that part of the problem at all, I'll leave that step up to you.
You still need to do the processing to count the sets with at least one matching birthday, another part of the problem you haven't attacked. However, I trust that the above code solves your stated problem?

Related

Iterate through all list orders

So I have a set of 11 variables that I want to assign the numbers 1 to 11 using python. I want to then iterate through all combinations of these assignments and check they complete a series of tests. The 6 tests are such that 6 different combinations of the variables (including a separate 12th variable that retains a constant value (12)) are tested to see if they add to 26. For a permutation to be successful, all combinations must add to 26.
To simplify, say it was just 3 variables assigned the numbers 1-3 I want the program to output a=1,b=2,c=3 then to check this against the criteria before changing the order to a=1,b=3,c=2. Check then change: a=2,b=1,c=3 check and change etc until all combinations have been checked.
I first considered using a list to store the numbers 1 to 11 then just randomising the list order using the shuffle function. I’m not sure how else to iterate through the combinations. The random nature of shuffle would eventually do the job but it would be very slow and certainly not an elegant solution.
Thanks for any help in advance :)

I have since realised after looking through the answers and links provided that what I actually need here is the itertools.permutation function. Since this particular problem requires all permutations of the list of numbers 1-11 and not just combinations of these numbers in a list of a smaller size and the ordering matters.The following link was useful in realising the solution.
https://www.tutorialspoint.com/generate-all-permutation-of-a-set-in-python
import itertools
numbers = list(range(1,12))
permutations=list(itertools.permutations(numbers,11))
The result of this code provides all possible orders of the numbers 1-11 as a list of tuples which I can then iterate through to check if they pass the tests.

The itertools package includes a combinations function that may be helpful for you. For example:
import itertools
numbers = range(1, 12)
for (a,b,c) in itertools.combinations(numbers, 3):
# do something

Optimization of Unique, non-reversible permutations in Python

I'm currently trying to solve the 'dance recital' kattis challenge in Python 3. See here
After taking input on how many performances there are in the dance recital, you must arrange performances in such a way that sequential performances by a dancer are minimized.
I've seen this challenge completed in C++, but my code kept running out of time and I wanted to optimize it.
Question: As of right now, I generate all possible permutations of performances and run comparisons off of that. A faster way to would be to not generate all permutations, as some of them are simply reversed and would result in the exact same output.
import itertools
print(list(itertools.permutations(range(2)))) --> [(0,1),(1,0)] #They're the same, backwards and forwards
print(magic_algorithm(range(2))) --> [(0,1)] #This is what I want
How might I generate such a list of permutations?
I've tried:
-Generating all permutation, running over them again to reversed() duplicates and saving them. This takes too long and the result cannot be hard coded into the solution as the file becomes too big.
-Only generating permutations up to the half-way mark, then stopping, assuming that after that, no unique permutations are generated (not true, as I found out)
-I've checked out questions here, but no one seems to have the same question as me, ditto on the web
Here's my current code:
from itertools import permutations
number_of_routines = int(input()) #first line is number of routines
dance_routine_list = [0]*10
permutation_list = list(permutations(range(number_of_routines))) #generate permutations
for q in range(number_of_routines):
s = input()
for c in s:
v = ord(c) - 65
dance_routine_list[q] |= (1 << v) #each routine ex.'ABC' is A-Z where each char represents a performer in the routine
def calculate():
least_changes_possible = 1e9 #this will become smaller, as optimizations are found
for j in permutation_list:
tmp = 0
for i in range(1,number_of_routines):
tmp += (bin(dance_routine_list[j[i]] & dance_routine_list[j[i - 1]]).count('1')) #each 1 represents a performer who must complete sequential routines
least_changes_possible = min(least_changes_possible, tmp)
return least_changes_possible
print(calculate())
Edit: Took a shower and decided adding a 2-element-comparison look-up table would speed it up, as many of the operations are repeated. Still doesn't fix iterating over the whole permutations, but it should help.
Edit: Found another thread that answered this pretty well. How to generate permutations of a list without "reverse duplicates" in Python using generators
Thank you all!

There are at most 10 possible dance routines, so at most 3.6M permutations, and even bad algorithms like generate 'em all and test will be done very quickly.
If you wanted a fast solution for up to 24 or so routines, then I would do it like this...
Given the the R dance routines, at any point in the recital, in order to decide which routine you can perform next, you need to know:
Which routines you've already performed, because there you can't do those ones next. There are 2R possible sets of already-performed routines; and
Which routine was performed last, because that helps determine the cost of the next one. There are at most R-1 possible values for that.
So there are at less than (R-2)*2R possible recital states...
Imagine a directed graph that connects each possible state to all the possible following states, by an edge for the routine that you would perform to get to that state. Label each edge with the cost of performing that routine.
For example, if you've performed routines 5 and 6, with 5 last, then you would be in state (5,6):5, and there would be an edge to (3,5,6):3 that you could get to after performing routine 3.
Starting at the initial "nothing performed yet" state ():-, use Dijkstra's algorithm to find the least cost path to a state with all routines performed.
Total complexity is O(R2*2R) or so, depending exactly how you implement it.
For R=10, R2*2R is ~100 000, which won't take very long at all. For R=24 it's about 9 billion, which is going to take under half a minute in pretty good C++.

Using random function to find "cut off" values on python

I need help with this problem and I think the major reason I don't know how to do it is because I don't really understand what it is asking for.
The problem is asking to write a program that will find the curve of a class. It has to find the "cut off" values for A,B,C and D. 40% of the class will receive an A, as well as a B(40%), 10% will receive a C and the remaining 10% will get a D.
The grades have to be generated randomly between 20 - 100 and the number of student has to be an input. a tuple has to be returned with the cut off values of A,B,C and D.
It also gives me the hint to use list.sort() and min(list)
I honestly don't really get what this is asking for, so far I have this but I am not sure what I am doing.
import random
def Cutoffgrade():
students=float(input('How many students are in the class?'))
grades=list(range(20,101))
list.sort(grades)
Astudents=students*0.4
Bstudents=students*0.4
Cstudents=students*0.1
Dstudents=students*0.1
I also have this other function I did which I know is something similar to what I have to do, but I just don't get this problem.
Here is the other function I did for a different problem that is somewhat asking for a random list as well:
def createlist(v1,v2):
random.randint(v1,v2)
random.randrange(v1,v2)
numlist=[]
for i in range(5):
n=random.randrange(v1,v2)
numlist.append(n)
print(numlist)
So I have been trying to transform this function into what I would need for the new problem but don't know how.

Since this is a homework question and I imagine your professor isn't very likely to get back to you during the weekend I will give you some hints at how to tackle this:
First build a list of grades using random.randint(start, stop) and a list comprehension or standard for loop
Sort the list.
Determine how many students should be in each grade bracket.
Slice the list into the appropriate chunks and find the min() of each list.

Speed/structure optimization for a recursive tree

Edit -> short version:
In Python, unlike in C, if I pass a parameter to a function I -say: a dict-, the changes made within the function call will reflect outside (as if I passed a pointer instead of just the value)
I want to avoid this so:
-> I make a copy of my dict and pass the copy to my function
But the values of my dict can be some dict and this goes on until an undefinite depth
-> the recursive copy is very long.
Question: what is a pythonic way to go about this?
Long version:
I'm coding a master-mind playing robot with a n-digit code in Python.
You try to guess the code and for each try you get an answer in terms of how many white/black/none you have, meaning resp. "good digit good position"/"good digit wrong position"/"wrong digit" (but you don't know to which digit the whites/blacks/none refer)
I analyze the answers and build a tree of possibilities with a dictionary storing white/black/none.
I store a map of the possible positions of the numbers 0-9 within the code (a digit can appear more than once) in a list.
Ex: for a 3-digit game I will have [[x,y1,y2,y3][-1,0,1,4][...][...][][][][][][]] with:
x: the total number of times this digit appears in the code (default value being n+1, ie. 4 in the exemple) with positive meaning sure and negative "at least"
y1,y2,..,yn the position within the code: 1 means I know the digit is in this position, 0 I know it's not, and 4 (or anything) as default
In my exemple: I know that '1' appears at least once in the code (-1) that it is present in position 2 and that it is NOT present in position 1 and that position 3 is still hypothetically possible.
While I explore my tree of possibilities, I update this list. Which means that each branch of the tree will have its own copy of the list.
Since I recently discovered that, unlike in C, when I pass my list to a sub-method, any change made to it within the sub will reflect on the list outside, I manually copy my list each time with a small method:
def bak_symb(_s):
_b = [[z for z in _s[i]] for i in xrange(10)]
return _b
Now, I profiled my programm and noticed that 90% of the time is spent either in
append()
(the branches of my tree are nested dictionaries {w:{},b:0,n:{}} to which I append each branch of possibilities that I explore)For each branch : the programm has to find a n-digit code
or
my copying function
So I have three questions.
Is there a way to make this function faster?
Is there a something better adapted than the structures I chose (2-depth list for the symbols and nested dict for the hypothesis)
Is there a more adequate way of doing this than building this huge tree
All comments and remarks are welcome.
I'm self-taught in and might have missed some obvious pythonic way of doing some things.
Last but not least, I tried to find a good compromise between making this short and clear, here again don't hesitate to ask for more details.
Thanks in advance,
Matt

Searching through large data set

how would i search through a list with ~5 mil 128bit (or 256, depending on how you look at it) strings quickly and find the duplicates (in python)? i can turn the strings into numbers, but i don't think that's going to help much. since i haven't learned much information theory, is there anything about this in information theory?
and since these are hashes already, there's no point in hashing them again

If it fits into memeory, use set(). I think it will be faster than sort. O(n log n) for 5 million items is going to cost you.
If it does not fit into memory, say you've lot more than 5 million record, divide and conquer. Break the records at the mid point like 1 x 2^127. Apply any of the above methods. I guess information theory helps by stating that a good hash function will distribute the keys evenly. So the divide by mid point method should work great.
You can also apply divide and conquer even if it fit into memory. Sorting 2 x 2.5 mil records is faster than sorting 5 mil records.

Load them into memory (5M x 64B = 320MB), sort them, and scan through them finding the duplicates.

In Python2.7+ you can use collections.Counter for older Python use collections.deaultdict(int). Either way is O(n).
first make a list with some hashes in it
>>> import hashlib
>>> s=[hashlib.sha1(str(x)).digest() for x in (1,2,3,4,5,1,2)]
>>> s
['5j\x19+y\x13\xb0LTWM\x18\xc2\x8dF\xe69T(\xab', '\xdaK\x927\xba\xcc\xcd\xf1\x9c\x07`\xca\xb7\xae\xc4\xa85\x90\x10\xb0', 'w\xdeh\xda\xec\xd8#\xba\xbb\xb5\x8e\xdb\x1c\x8e\x14\xd7\x10n\x83\xbb', '\x1bdS\x89$s\xa4g\xd0sr\xd4^\xb0Z\xbc 1dz', '\xac4x\xd6\x9a<\x81\xfab\xe6\x0f\\6\x96\x16ZN^j\xc4', '5j\x19+y\x13\xb0LTWM\x18\xc2\x8dF\xe69T(\xab', '\xdaK\x927\xba\xcc\xcd\xf1\x9c\x07`\xca\xb7\xae\xc4\xa85\x90\x10\xb0']
If you are using Python2.7 or later
>>> from collections import Counter
>>> c=Counter(s)
>>> duplicates = [k for k in c if c[k]>1]
>>> print duplicates
['\xdaK\x927\xba\xcc\xcd\xf1\x9c\x07`\xca\xb7\xae\xc4\xa85\x90\x10\xb0', '5j\x19+y\x13\xb0LTWM\x18\xc2\x8dF\xe69T(\xab']
if you are using Python2.6 or earlier
>>> from collections import defaultdict
>>> d=defaultdict(int)
>>> for i in s:
... d[i]+=1
...
>>> duplicates = [k for k in d if d[k]>1]
>>> print duplicates
['\xdaK\x927\xba\xcc\xcd\xf1\x9c\x07`\xca\xb7\xae\xc4\xa85\x90\x10\xb0', '5j\x19+y\x13\xb0LTWM\x18\xc2\x8dF\xe69T(\xab']

Is this array sorted?
I think the fastest solution can be a heap sort or quick sort, and after go through the array, and find the duplicates.

You say you have a list of about 5 million strings, and the list may contain duplicates. You don't say (1) what you want to do with the duplicates (log them, delete all but one occurrence, ...) (2) what you want to do with the non-duplicates (3) whether this list is a stand-alone structure or whether the strings are keys to some other data that you haven't mentioned (4) why you haven't deleted duplicates at input time instead building a list containing duplicates.
As a Data Structures and Algorithms 101 exercise, the answer you have accepted is a nonsense. If you have enough memory, detecting duplicates using a set should be faster than sorting a list and scanning it. Note that deleting M items from a list of size N is O(MN). The code for each of the various alternatives is short and rather obvious; why don't you try writing them, timing them, and reporting back?
If this is a real-world problem that you have, you need to provide much more information if you want a sensible answer.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.