Calculate probability within a loop - Python3 - python

Good evening!
I have just picked up Python last month and I am having a lot of fun with it!
I am writing my first program ever in order to determine the probability of 2 songs from the same album playing back to back in a shuffled Spotify playlist: Sbotify! I have the user input sorted out, but I cannot find a way to apply the probability formula to a value.
The formula multiplies the probabilities of each separate event by one another, as described in this article (method 2-3). I need to multiply an inputted int() value by itself minus 1, repeatedly, until the value reaches 0, and then do the same thing for another value, and so on.
I have tried:
for loops
map() function
range() to bypass the "'int' object is not iterable" error
But nothing seems to work. I've looked for an answer for hours and I cannot find anything that fits my purpose. Any link, resource, or knowledge is very welcome! 😊
Here is my code,
import math
import statistics
print("\nHey user! Sbotify has for purpose to tell you the probability of \n"
      "playing a song of the same album in a shuffled Spotify playlist. \n\n")
total_songs = int(input("\nEnter the total of your songs:"))
lst = []
num = int(input('Enter a number of albums: '))
for n in range(num):
    numbers = int(input('Enter number of song for each album: '))
    lst.append(numbers)
print("Sum of songs belonging to an album is :", sum(lst))
print(lst)
print("\nLet's find out the songs that don't belong to any album. This will be handy later:")
solo_songs = total_songs - sum(lst)
print(int(solo_songs))
# I'm afraid we need to use... **MATH**
print("Nice! Now we need to find out the probabilities of 2 songs of the same album playing back to back in shuffle mode")
#Each album needs to be considered 1 entity and to that, the solo_songs value needs to be multiplied by itself -1 until 0
#to respect the probability formula
Let me know if anything is incorrect, overcomplicated, or should be redone completely!
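One way to read "multiply a value by itself minus 1 until it runs out" is as a factorial. Here is a minimal sketch, assuming the loop should stop at 1 rather than 0 (multiplying by 0 would make every result 0). `descending_product` and `album_sizes` are hypothetical names used only for illustration, and whether a plain factorial is the right formula for the shuffle problem is not something this sketch settles:

```python
import math

def descending_product(n):
    """Return n * (n-1) * ... * 1, stopping at 1 (assumption: not at 0)."""
    result = 1
    for k in range(n, 0, -1):  # n, n-1, ..., 1
        result *= k
    return result

# Apply the same loop to every album size collected in a list such as lst.
album_sizes = [3, 5, 2]  # example values, not real data
products = [descending_product(size) for size in album_sizes]
print(products)

# The standard library gives the same result for a single value.
print(descending_product(5) == math.factorial(5))
```

If the hand-written loop is only there for practice, `math.factorial` can replace it directly.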

Related

Runtime error on my code for CodeChef: Fit To Play (PLAYFIT)

I submitted code for this CodeChef problem:
Rayne Wooney has been one of the top players for his football club for
the last few years. But unfortunately, he got injured during a game a
few months back and has been out of play ever since.
He's got proper treatment and is eager to go out and play for his team
again. Before doing that, he has to prove his fitness to the coach
and manager of the team. Rayne has been playing practice matches for
the past few days. He's played N practice matches in all.
He wants to convince the coach and the manager that he's improved over
time and that his injury no longer affects his game. To increase his
chances of getting back into the team, he's decided to show them stats
of any 2 of his practice games. The coach and manager will look into
the goals scored in both the games and see how much he's improved. If
the number of goals scored in the 2nd game(the game which took place
later) is greater than that in 1st, then he has a chance of getting
in. Tell Rayne what is the maximum improvement in terms of goal
difference that he can show to maximize his chances of getting into
the team. If he hasn't improved over time, he's not fit to play.
Scoring equal number of goals in 2 matches will not be considered an
improvement. Also, he will be declared unfit if he doesn't have enough
matches to show an improvement.
Input:
The first line of the input contains a single integer T, the number of test cases. Each test case begins with a single integer
N, the number of practice matches Rayne has played.
The next line contains N integers. The i th integer,
gi, on this line represents the number of goals Rayne scored in his i th practice match. The matches are given
in chronological order i.e. j > i means match number j took place
after match number i.
Output:
For each test case output a single line containing the maximum goal difference that Rayne can show to his coach and manager.
If he's not fit yet, print "UNFIT".
Constraints:
1 ≤ T ≤ 10
1 ≤ N ≤ 100000
0 ≤ gi ≤ 1000000 (Well, Rayne's a legend! You can expect him to score so many goals!)
My code:
for _ in range(int(input())):
    num = int(input())
    goals = list(map(int,input().split()))
    list1 = []
    for i in range(num-1):
        diff = goals[i+1]-goals[i]
        list1.append(diff)
    if max(list1)>0:
        print(max(list1))
    else:
        print('UNFIT')
Codechef's giving me a Runtime Error. Why is that?
The constraints allow N to be just 1. In that case your list1 is empty, and max([]) will produce the following error:
ValueError: max() arg is an empty sequence
The challenge mentions what you should return in such a case:
Also, he will be declared unfit if he doesn't have enough matches to show an improvement.
To solve this issue, change this line:
if max(list1)>0:
to:
if list1 and max(list1)>0:
Other remarks
Use meaningful variable names. list1 tells us what the data type is, but not what it is for.
Building list1 gives some overhead and will use O(N) extra space. Try to do it without creating a list.
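The last remark can be sketched as a single pass that keeps only the smallest score seen so far. One assumption here goes beyond the fix above: the problem statement allows any two matches (j > i), not only consecutive ones, so this sketch compares each match with the lowest earlier score. `max_improvement` is a hypothetical name.

```python
def max_improvement(goals):
    """Return the largest goals[j] - goals[i] with j > i, or None if there is no improvement."""
    best = 0
    lowest_so_far = None
    for g in goals:
        if lowest_so_far is None or g < lowest_so_far:
            lowest_so_far = g          # new minimum: nothing earlier can beat it as a start
        elif g - lowest_so_far > best:
            best = g - lowest_so_far   # better improvement over the earlier minimum
    return best if best > 0 else None

for sample in ([3, 7, 1, 4, 2, 4], [5, 4, 3, 2, 1], [4]):
    result = max_improvement(sample)
    print(result if result is not None else "UNFIT")
```

This uses O(1) extra space and also handles N = 1 without a special case, because the loop simply never finds an improvement.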

How to create a hill climbing algorithm

I have been following a book to learn Python, and the book poses the following challenge:
Self Check
Here’s a self check that really covers everything so far. You may have
heard of the infinite monkey theorem? The theorem states that a monkey
hitting keys at random on a typewriter keyboard for an infinite amount
of time will almost surely type a given text, such as the complete
works of William Shakespeare. Well, suppose we replace a monkey with a
Python function. How long do you think it would take for a Python
function to generate just one sentence of Shakespeare? The sentence
we’ll shoot for is: “methinks it is like a weasel”
You’re not going to want to run this one in the browser, so fire up
your favorite Python IDE. The way we’ll simulate this is to write a
function that generates a string that is 28 characters long by
choosing random letters from the 26 letters in the alphabet plus the
space. We’ll write another function that will score each generated
string by comparing the randomly generated string to the goal.
A third function will repeatedly call generate and score, then if 100%
of the letters are correct we are done. If the letters are not correct
then we will generate a whole new string. To make it easier to follow
your program’s progress this third function should print out the best
string generated so far and its score every 1000 tries.
I was able to implement this part of the challenge, with the following code:
(I am new to python)
import random
target = 'methinks it is like a weasel'
target_len = 28
def string_generate(strlen):
    alphabet = 'abcdefghijklmnopqrstuvwxyz ' #26 letters of the alphabet + space
    res = ''
    for i in range(strlen):
        res += alphabet[random.randrange(27)]
    return res
def score_check(target,strlen):
    score = 0
    res = string_generate(strlen)
    for i in range(strlen):
        if res[i] == target[i]:
            score += 1
    return score, res
def progress_check():
    counter = 0
    score = 0
    res = ''
    while score != 28:
        score_temp, res_temp = score_check(target, target_len)
        counter += 1
        if score_temp > score:
            score, res = score_temp, res_temp
            print(res, score)
        else:
            score, res = score, res
    return res, score
progress_check()
It then has the following extra challenge:
Self Check Challenge
See if you can improve upon the program in the self check by keeping
letters that are correct and only modifying one character in the best
string so far. This is a type of algorithm in the class of ‘hill
climbing’ algorithms, that is we only keep the result if it is better
than the previous one.
However, I am not able to figure out what this hill climbing algorithm is, or how I would implement it in my existing piece of code.
Please explain how to implement this hill climbing algorithm. Thank you all so much!
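The challenge text describes the idea: keep the best string so far, change exactly one character, and keep the change only if the score goes up. A minimal sketch of that reading follows. The names `score`, `hill_climb`, and the `rng` parameter are hypothetical and not taken from the book, and picking the position to change at random is an assumption.

```python
import random

ALPHABET = 'abcdefghijklmnopqrstuvwxyz '  # 26 letters plus space

def score(candidate, target):
    """Count the positions where candidate and target agree."""
    return sum(1 for c, t in zip(candidate, target) if c == t)

def hill_climb(target, rng=random):
    """Mutate one character at a time and keep only strict improvements."""
    best = ''.join(rng.choice(ALPHABET) for _ in range(len(target)))
    best_score = score(best, target)
    tries = 0
    while best_score < len(target):
        tries += 1
        pos = rng.randrange(len(target))            # position to modify
        candidate = best[:pos] + rng.choice(ALPHABET) + best[pos + 1:]
        candidate_score = score(candidate, target)
        if candidate_score > best_score:            # climb only uphill
            best, best_score = candidate, candidate_score
        if tries % 1000 == 0:
            print(best, best_score)
    return best, tries

result, tries = hill_climb('methinks it is like a weasel', random.Random(0))
print(result, tries)
```

Because a correct letter can only be replaced by a change that lowers the score, and such changes are rejected, correct letters are never lost. That is why this usually finishes in a few thousand tries instead of the astronomically many tries needed when a whole new string is generated each time.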

Optimize permutations search loop (can't use itertools) that is extremely slow. Any suggestions?

This is a game where you have 12 cards and you keep picking until you have chosen 3 from the same group. I am attempting to find the probability of ending on each group. The script that I have created works, but it is extremely slow. My coworker created a similar script in R without the functions, and his script takes 1/100th of the time that mine takes. I am just trying to figure out why. Any ideas would be greatly appreciated.
from collections import Counter
import pandas as pd
from datetime import datetime
weight = pd.read_excel('V01Weights.xlsx')
Weight looks like the following:
Symb Weight
Grand 170000
Grand 170000
Grand 105
Major 170000
Major 170000
Major 215
Minor 150000
Minor 150000
Minor 12000
Bonus 105000
Bonus 105000
Bonus 105000
Max Picks represents the total number of different "cards". Total Picks represents the max number of user choices. This is because after 8 choices, you are guaranteed to have 2 of each type so on the 9th pick, you are guaranteed to have 3 matching.
TotalPicks = 9
MaxPicks = 12
This should have been named PickedProbabilities.
Picks = {0:0,1:0,2:0,3:0}
This is my simple version of the timeit class because I don't like the timeit class
def Time_It(function):
    start =datetime.now()
    x = function()
    finish = datetime.now()
    TotalTime = finish - start
    Minutes = int(TotalTime.seconds/60)
    Seconds = TotalTime.seconds % 60
    print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')
    return(x)
Given x(my picks in order) I find the probability. These picks are done without replacement
def Get_Prob(x,weight):
    prob = 1
    weights = weight.iloc[:,1]
    for index in x:
        num = weights[index]
        denom = sum(weights)
        prob *= num/denom
        weights.drop(index, inplace = True)
        # print(weights)
    return(prob)
This is used to determine if there are duplicates in my loop because that is not allowed
def Is_Allowed(x):
    return(len(x) == len(set(x)))
This determines if a win is present in all of the cards present thus far.
def Is_Win(x):
    global Picks
    WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
    IsWin = False
    for index,item in enumerate(WinTypes):
        # print(index)
        if set(item).issubset(set(x)):
            IsWin = True
            Picks[index] += Get_Prob(x,weight)
            # print(Picks[index])
            print(sum(Picks.values()))
            break
    return(IsWin)
This is my main function that cycles through all of the cards. I attempted to do this using recursion but I eventually gave up. I can't use itertools to create all of the permutations because for example [0,1,2,3,4] will be created by itertools but this is not possible because once you get 3 matching, the game ends.
def Cycle():
    for a in range(MaxPicks):
        x = [a]
        for b in range(MaxPicks):
            x = [a,b]
            if Is_Allowed(x):
                for c in range(MaxPicks):
                    x = [a,b,c]
                    if Is_Allowed(x):
                        if Is_Win(x):
                            # print(x)
                            continue
                        for d in range(MaxPicks):
                            x = [a,b,c,d]
                            if Is_Allowed(x):
                                if Is_Win(x):
                                    # print(x)
                                    continue
                                for e in range(MaxPicks):
                                    x = [a,b,c,d,e]
                                    if Is_Allowed(x):
                                        if Is_Win(x):
                                            continue
                                        for f in range(MaxPicks):
                                            x = [a,b,c,d,e,f]
                                            if Is_Allowed(x):
                                                if Is_Win(x):
                                                    continue
                                                for g in range(MaxPicks):
                                                    x = [a,b,c,d,e,f,g]
                                                    if Is_Allowed(x):
                                                        if Is_Win(x):
                                                            continue
                                                        for h in range(MaxPicks):
                                                            x = [a,b,c,d,e,f,g,h]
                                                            if Is_Allowed(x):
                                                                if Is_Win(x):
                                                                    continue
                                                                for i in range(MaxPicks):
                                                                    if Is_Allowed(x):
                                                                        if Is_Win(x):
                                                                            continue
Calls the main function
x = Time_It(Cycle)
print(x)
writes the probabilities to a text file
with open('result.txt','w') as file:
    # file.write(pickle.dumps(x))
    for item in x:
        file.write(str(item) + ',' + str(x[item]) + '\n')
My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes.
Two easy optimizations:
1) Inline the function calls such as Is_Allowed(), because Python has a lot of function call overhead (such as creating a new stack frame and argument tuples).
2) Run the code using PyPy, which is really good at optimizing functions like this one.
Ok, this time I hope I got your problem right:)
There are two insights (I guess you have them already; I list them just for the sake of completeness) needed in order to speed up your program algorithmically:
The probabilities for the sequences (card_1, card_2) and (card_2, card_1) are not equal, so we cannot use the results from the urn problem, and it looks like we need to try out all permutations.
However, given the set of cards we have picked so far, we don't really need to know in which sequence they were picked: it makes no difference to the future course of the game. So it is enough to use dynamic programming and calculate the probabilities for every subset that can be traversed during the game (thus we need to check 2^N instead of N! states).
For a set of already picked cards, set, the probability of picking card i in the next turn is:
norm := sum of Wj for all j not in set
P(i|set) = Wi/norm if i not in set else 0.0
The recursion for calculating P(set), the probability that a set of picked cards occurred during the game, is:
set_without_i := set \ {i}
P(set) = sum of P(set_without_i)*P(i|set_without_i) for i in set
However, the sum should only include those set_without_i for which the game has not yet ended, i.e. no group has 3 cards picked.
This can be done by means of recursion+memoization or, as my version does, by using bottom-up dynamic programming. It also uses binary representation of integers for representations of sets and (most important part!) returns the result almost instantly [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)]:
#calculates probability to end the game with 3 cards of a type
#(written for Python 3: range instead of xrange, print as a function)
N=12
#set representation int->list
def decode_set(encoded):
    decoded=[False]*N
    for i in range(N):
        if encoded&(1<<i):
            decoded[i]=True
    return decoded
weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]
def get_probs(decoded_set):
    denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
    return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]
def end_group(encoded_set):
    for i in range(4):
        whole_group = 7<<(3*i) #7=..000111, 56=00111000 and so on
        if (encoded_set & whole_group)==whole_group:
            return i
    return None
#MAIN: dynamic program:
MAX=(1<<N)#number of possible sets; the largest set is (1<<N)-1
probs=[0.0]*MAX
#we always start with the empty set:
probs[0]=1.0
#building bottom-up
for current_set in range(MAX):
    if end_group(current_set) is None: #game not ended yet!
        decoded_set=decode_set(current_set)
        trans_probs=get_probs(decoded_set)
        for i, is_set in enumerate(decoded_set):
            if not is_set:
                new_set=current_set | (1<<i)
                probs[new_set]+=probs[current_set]*trans_probs[i]
#filtering wins:
group_probs=[0.0]*4
for current_set in range(MAX):
    group_won=end_group(current_set)
    if group_won is not None:
        group_probs[group_won]+=probs[current_set]
print(list(zip(["Grand", "Major", "Minor", "Bonus"], group_probs)))
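The recursion+memoization alternative mentioned above could look roughly like this in Python 3. This is a sketch and not the answer's own code: `prob_of_set` and `group_probabilities` are hypothetical names, and `functools.lru_cache` is one possible way to do the memoization. It evaluates the same recursion for P(set), only top-down.

```python
from functools import lru_cache

WEIGHTS = [170000, 170000, 105, 170000, 170000, 215,
           150000, 150000, 12000, 105000, 105000, 105000]
N = len(WEIGHTS)
GROUPS = ["Grand", "Major", "Minor", "Bonus"]

def end_group(encoded_set):
    """Return the index of a group with all 3 cards picked, or None."""
    for g in range(4):
        whole_group = 7 << (3 * g)
        if encoded_set & whole_group == whole_group:
            return g
    return None

@lru_cache(maxsize=None)
def prob_of_set(encoded_set):
    """P(set): probability that exactly this set of cards is held at some point."""
    if encoded_set == 0:
        return 1.0  # the game always starts with the empty set
    total = 0.0
    for i in range(N):
        bit = 1 << i
        if encoded_set & bit:
            previous = encoded_set ^ bit          # set_without_i
            if end_group(previous) is None:       # game still running there
                remaining = sum(w for j, w in enumerate(WEIGHTS)
                                if not previous & (1 << j))
                total += prob_of_set(previous) * WEIGHTS[i] / remaining
    return total

def group_probabilities():
    """Sum P(set) over all sets in which some group is complete."""
    result = dict.fromkeys(GROUPS, 0.0)
    for encoded_set in range(1 << N):
        g = end_group(encoded_set)
        if g is not None:
            result[GROUPS[g]] += prob_of_set(encoded_set)
    return result

print(group_probabilities())
```

The recursion depth is at most 12 (one level per card), so the default recursion limit is not a concern here.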
Some explanation of the "tricks" used in code:
A pretty standard trick is to use an integer's binary representation to encode a set. Say we have the objects [a,b,c]; then the set {b,c} can be represented as 110: a (first in the list, corresponding to bit 0, the lowest digit) is not in the set (0), b is in the set (1), and c is in the set (1). Read as an integer, 110 is 6.
The for loop over current_set simulates the game and is best understood by playing through an example. Let's play with two cards [a,b] with weights [2,1].
We start the game with an empty set, 0 as an integer, so the probability vector is (each entry shows the set, its binary representation, and its integer value, mapped onto its probability):
probs=[{}=00=0->1.0, {a}=01=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]
We process current_set=0. There are two possibilities: a 66% chance of taking card a and a 33% chance of taking card b, so after processing the probabilities become:
probs=[{}=00=0->1.0, {a}=01=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]
Now we process current_set=1={a}. The only possibility is to take b, so we end up with the set {a,b}. We update its (3={a,b}) probability via our formula and get:
probs=[{}=00=0->1.0, {a}=01=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]
In the next step we process 2. Given the set {b}, the only possibility is to pick card a, so the probability of the set {a,b} needs to be updated again:
probs=[{}=00=0->1.0, {a}=01=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]
We can get to {a,b} via two different paths, and the algorithm accounts for both. The probability of passing through the set {a,b} at some point in the game is obviously 1.0.
Another important thing: all paths that lead to {a,b} are taken care of before we process this set (it would be the next step).
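The two-card walkthrough can be reproduced with a few lines of Python 3. This is only an illustrative sketch: `trace_two_cards` is a hypothetical name, and the end-of-game check is left out because the toy example has no groups.

```python
def trace_two_cards(weights):
    """Run the bottom-up update for a tiny deck and print probs after each step."""
    n = len(weights)
    probs = [0.0] * (1 << n)
    probs[0] = 1.0  # we always start with the empty set
    for current_set in range(1 << n):
        # total weight of the cards that have not been picked yet
        remaining = sum(w for i, w in enumerate(weights)
                        if not current_set & (1 << i))
        for i, w in enumerate(weights):
            if not current_set & (1 << i):
                new_set = current_set | (1 << i)
                probs[new_set] += probs[current_set] * w / remaining
        print("processed", format(current_set, "0{}b".format(n)), probs)
    return probs

final_probs = trace_two_cards([2, 1])  # card a has weight 2, card b has weight 1
```

Because every set is reached only from sets with a smaller integer value, iterating over the integers in increasing order guarantees that a set's probability is complete before the set itself is processed.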
Edit: I misunderstood the original problem; the solution presented here is for the following problem:
Given 4 groups of 3 different cards, with a different score for every card, we pick cards until we have picked 3 cards from the same group. What is the expected score (the sum of the scores of the picked cards) at the end of the game?
I leave the solution as it is, because it was such a joy to work it out after so many probability-theory-less years and I just cannot delete it:)
See my other answer for handling of the original problem
There are two possibilities to improve the performance: making the code faster (and before starting this, one should profile in order to know which part of the program should be optimized, otherwise the time is spent optimizing things that don't count) or improving the algorithm. I propose to do the second.
Ok, this problem seems to be more complex than it looks at first sight. Let's start with some observations.
All you need to know is the expected number of the picked cards at the end of the game:
If Pi is the probability that card i is picked at some point during the game, then we are looking for the expected value of the score, E(Score)=P1*W1+P2*W2+...+Pn*Wn. However, if we look at the cards of one group, we can state that, because of the symmetry, the probabilities for the cards of this group are the same, e.g. P1=P2=P3=:Pgrand in your case. Thus our expectation can be calculated as:
E(Score)=3*Pgrand*(W1+W2+W3)/3+...+3*Pbonus*(W10+W11+W12)/3
We call averageWgrand:=(W1+W2+W3)/3 and note that E(#grand)=3*Pgrand is the expected number of picked grand cards at the end of the game. With this our formula becomes:
E(Score)=E(#grand)*averageWgrand+...+E(#bonus)*averageWbonus
In your example we can go even further: the number of cards in every group is equal, so because of the symmetry we can claim: E(#grand)=E(#minor)=E(#major)=E(#bonus)=:E(#group). For the sake of simplicity, in the following we consider only this special case (but the outlined solution could also be extended to the general case). This leads to the following simplification:
E(Score)=4*E(#group)*(averageWgrand+...+averageWbonus)/4
We call averageW:=(averageWgrand+...+averageWbonus)/4 and note that E(#cards)=4*E(#group) is the expected number of picked cards at the end of the game.
Thus, E(Score)=E(#cards)*averageW, so our task is reduced to calculating the expected value of the number of cards at the end of the game:
E(#cards)=P(1)*1+P(2)*2+...+P(n)*n
where P(i) denotes the probability that the game ends with exactly i cards. The probabilities P(1), P(2) and P(k) for k>9 are easy to see: they are 0.
Calculation of the probability of ending the game with i picked cards -P(i):
Let's play a slightly different game: we pick exactly i cards and win if and only if:
There is exactly one group with 3 cards picked. We call this group full_group.
The last picked (i-th) card was from the full_group.
It is easy to see that the probability of winning this game, P(win), is exactly the probability we are looking for, P(i). Once again we can use the symmetry, because all groups are equal (P(win, full=grand) means the probability that we win and that the full_group is grand):
P(win)=P(win, grand)+P(win, minor)+P(win, major)+P(win, bonus)
=4*P(win, grand)
P(win, grand) is the probability that:
after picking i-1 cards the number of picked grand cards is 2, i.e. #grand=2, and
after picking i-1 cards, for every group the number of picked cards is less than 3 and
we pick a grand-card in the last round. Given the first two constraints hold, this (conditional) probability is 1/(n-i+1) (there are n-i+1 cards left and only one of them is "right").
From the urn problem we know the probability for
P(#grand=u, #minor=x, #major=y, #bonus=z) = binom(3,u)*binom(3,x)*binom(3,y)*binom(3,z)/binom(12, u+x+y+z)
with binom(n,k)=n!/k!/(n-k)!. Thus P(win, grand) can be calculated as:
P(win, grand) = 1/(n-i+1)*sum P(#grand=2, #minor=x, #major=y, #bonus=z)
where x<=2, y<=2, z<=2 and 2+x+y+z=i-1
And now the code:
import math
def binom(n,k):
    return math.factorial(n)//math.factorial(k)//math.factorial(n-k)
#expected number of cards (Python 3: range instead of xrange, print as a function):
n=12 #there are 12 cards
probs=[0]*n
for minor in range(3):
    for major in range(3):
        for bonus in range(3):
            i = 3 + minor +major +bonus
            P_urn = binom(3,2)*binom(3,minor)*binom(3,major)*binom(3,bonus)/float(binom(n, n-i+1))
            P_right_last_card = 1.0/(n-i+1)
            probs[i]+=4*P_urn*P_right_last_card #factor 4 from symmetry
print("Expected number of cards:", sum((prob*card_cnt for card_cnt, prob in enumerate(probs))))
As a result I get 6.94285714286 as the expected number of cards at the end of the game. And very fast, almost instantly. Not sure whether it is right though...
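One way to sanity-check that number is a quick Monte Carlo simulation. The sketch below assumes, as the derivation above does, that every remaining card is equally likely to be picked (the weights are ignored). The function names, the trial count, and the seed are arbitrary choices for illustration.

```python
import random

def simulate_cards_drawn(rng):
    """Play one game: draw cards in random order until some group has 3 cards."""
    deck = [group for group in range(4) for _ in range(3)]  # 4 groups x 3 cards
    rng.shuffle(deck)
    counts = [0, 0, 0, 0]
    for drawn, group in enumerate(deck, start=1):
        counts[group] += 1
        if counts[group] == 3:
            return drawn  # number of cards picked when the game ends

def estimate_expected_cards(trials=20000, seed=0):
    """Average the game length over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_cards_drawn(rng) for _ in range(trials)) / trials

print("Simulated expected number of cards:", estimate_expected_cards())
```

The estimate should land close to the computed value; a large gap would point to a mistake in either the formula or the simulation.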
Conclusion:
Obviously, if you would like to handle a more general case (more groups, different numbers of cards per group), you have to extend the code (recursion, memoization of binom) and the theory.
But the most crucial part: with this approach you (almost) don't care in which order the cards were picked, and thus the number of states you have to inspect goes down by a factor of (k-1)!, where k is the maximal possible number of cards at the end of the game. In your example k=9, and thus the approach is faster by a factor of about 40000 (I don't even consider the speed-up from the exploited symmetry, because it might not be possible in the general case).

average value of inputs in python

I am in an intro programming class and I have just started learning loops. My question asks for the average value of the inputs, and it is supposed to be done with while-loops. When the user inputs end instead of a number, that signifies the end of the loop, and the average value is calculated. The program should print the average value and return it. Also, if the user does not input any number and directly inputs end, the program should print "No numbers were entered". I tried creating the loop, but I do not know what I am missing for my loop to run correctly.
Also, I am not allowed to use any inbuilt functions like sum, ave, etc.
Here is the code I've written so far
def avgOfTheSum():
    total = 0
    avg = 0
    count = 0
    while True:
        number = input("Enter next number:")
        if number != 'end':
            num = float(number)
            total = total + num
            count = count + 1
            average = total/count
        else:
            print("The Average is:",average)
            break
    return average
A few tips from my not-so-experienced point of view:
When you increment a variable, e.g. 'total = total + num', you can do so in a more compact way: 'total += num' does exactly the same thing and lightens your code. Some people find it ugly though, so use it if you like.
You first declared a variable named 'avg' but later use 'average'. This leads to an error when printing 'average' if the user enters end straight away: the first 'if' branch is bypassed, so 'average' was never defined.
You should use one name for the average. Either 'avg' or 'average' is okay, but remember your code must be easy to understand, so try not to squeeze things too much, especially if someone is reviewing it when you are finished.
Use one name and stick to it. That way you don't get an error when the user inputs something that isn't handled by your code.
You could add safety nets to ensure the user passes a number, but the simplest ways need to use Python built-ins.
You could add something like this (this is pseudocode, not Python, so do not write it like so):
if count = 0
then print 'no numbers entered'
then either :
break if you want to quit the application
or pass if you want to force the user to enter a number (enforcing a new loop)
Hope it helped you a little bit !
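Putting those tips together, a working version might look like the sketch below. Two things are assumptions rather than requirements from the question: the function returns None when no numbers were entered, and the `read` parameter (defaulting to the built-in `input`) exists only so the function can be tried without typing.

```python
def average_of_inputs(read=input):
    """Read numbers until 'end' is entered, then print and return their average."""
    total = 0.0
    count = 0
    while True:
        entry = read("Enter next number:")
        if entry == 'end':
            break                 # stop reading, decide what to print below
        total += float(entry)
        count += 1
    if count == 0:
        print("No numbers were entered")
        return None               # assumption: nothing sensible to return here
    average = total / count       # computed once, after the loop
    print("The Average is:", average)
    return average
```

Calling `average_of_inputs()` with no arguments prompts on the console as in the original. No sum or similar helper is used, which keeps it within the stated restriction.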

Randomly SELECTing rows based on certain criteria

I'm building a media player for the office, and so far so good but I want to add a voting system (kinda like Pandora thumbs up/thumbs down)
To build the playlist, I am currently using the following code, which pulls 100 random tracks that haven't been played recently (we make sure all tracks have around the same play count), and then ensures we don't hear the same artist within 10 songs and builds a playlist of 50 songs.
max_value = Items.select(fn.Max(Items.count_play)).scalar()
query = (Items
         .select()
         .where(Items.count_play < max_value, Items.count_skip_vote < 5)
         .order_by(fn.Rand()).limit(100))
if query.count() < 1:  # count is a method, so it must be called
    max_value = max_value - 1
    query = (Items
             .select()
             .where(Items.count_play < max_value, Items.count_skip_vote < 5)
             .order_by(fn.Rand()).limit(100))
artistList = []
playList = []
for item in query:
    if len(playList) == 50:  # compare values with ==, not identity with "is"
        break
    if item.artist not in artistList:
        playList.append(item.path)
        if len(artistList) < 10:
            artistList.append(item.artist)
        else:
            artistList.pop(0)
            artistList.append(item.artist)
for path in playList:
    client.add(path.replace("/music/Library/",""))
I'm trying to work out the best way to use the up/down votes.
I want to see less with downvotes and more with upvotes.
I'm not after direct code because I'm pretty OK with python, it's more of the logic that I can't quite nut out (that being said, if you feel the need to improve my code, I won't stop you :) )
Initially give each track a weight w, e.g. 10 - a vote up increases this, down reduces it (but never to 0). Then when deciding which track to play next:
Calculate the total of all the weights, generate a random number between 0 and this total, and step through the tracks from 0-49, adding up their w until the running sum exceeds the random number. Play that track.
The exact weighting algorithm (e.g. how much an upvote/downvote changes w) will of course affect how often tracks (re)appear. Wasn't it Apple who had to change the 'random' shuffle of their early iPod because it could randomly play the same track twice (or close enough together for a user to notice) so they had to make it less random, which I presume means also changing the weighting by how recently the track was played - in that case the time since last play would also be taken into account at the time of choosing the next track. Make sure you cover the end cases where everyone downvotes 49 (or all 50 if they want silence) of the tracks. Or maybe that's what you want...
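The selection step described above can be sketched in a few lines. `pick_weighted` is a hypothetical helper and the example weights are made up; how votes translate into weights is left open, as in the answer.

```python
import random

def pick_weighted(weights, rng=random):
    """Return an index chosen with probability proportional to its weight."""
    total = sum(weights)
    threshold = rng.uniform(0, total)   # random point between 0 and the total
    running = 0.0
    for index, w in enumerate(weights):
        running += w
        if threshold < running:         # running sum has passed the random point
            return index
    return len(weights) - 1             # guard against floating-point edge cases

# Example: track 0 was downvoted (weight 2), track 2 was upvoted (weight 25).
example_weights = [2, 10, 25, 10]
rng = random.Random(42)
counts = [0] * len(example_weights)
for _ in range(10000):
    counts[pick_weighted(example_weights, rng)] += 1
print(counts)
```

The standard library's `random.choices(population, weights=...)` performs the same kind of weighted draw, so the manual loop is mainly useful for seeing the logic or for adding extra rules such as a recently-played penalty.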
