Here are 2 functions that do exactly the same thing, but does anyone know why the one using the count() method is much faster than the other? (I mean how does it work? How is it built?)
If possible, I'd like a more understandable answer than what's found here : Algorithm used to implement the Python str.count function
or what's in the source code : https://hg.python.org/cpython/file/tip/Objects/stringlib/fastsearch.h
def scoring1(seq):
    score = 0
    for i in range(len(seq)):
        if seq[i] == '0':
            score += 1
    return score

def scoring2(seq):
    score = 0
    score = seq.count('0')
    return score

seq = 'AATTGGCCGGGGAG0CTTC0CTCC000TTTCCCCGGAAA'
# takes 1min15 when applied to 100 sequences larger than 100 000 characters
score1 = scoring1(seq)
# takes 10 sec when applied to 100 sequences larger than 100 000 characters
score2 = scoring2(seq)
Thanks a lot for your reply
@CodeMonkey has already given the answer, but it is potentially interesting to note that your first function can be improved so that it runs about 20% faster:
import time, random

def scoring1(seq):
    score=0
    for i in range(len(seq)):
        if seq[i]=='0':
            score+=1
    return score

def scoring2(seq):
    score=0
    for x in seq:
        score += (x =='0')
    return score

def scoring3(seq):
    score = 0
    score = seq.count('0')
    return score

def test(n):
    seq = ''.join(random.choice(['0','1']) for i in range(n))
    functions = [scoring1,scoring2,scoring3]
    for i,f in enumerate(functions):
        # time.clock() was removed in Python 3.8; time.perf_counter() replaces it for timing
        start = time.perf_counter()
        s = f(seq)
        elapsed = time.perf_counter() - start
        print('scoring' + str(i+1) + ': ' + str(s) + ' computed in ' + str(elapsed) + ' seconds')

test(10**7)
Typical output:
scoring1: 5000742 computed in 0.9651326495293333 seconds
scoring2: 5000742 computed in 0.7998054195159483 seconds
scoring3: 5000742 computed in 0.03732172598339578 seconds
Both of the first two approaches are blown away by the built-in count().
Moral of the story: when you are not using an already optimized built-in method, you need to optimize your own code.
Because count is executed in the underlying native implementation. The for-loop is executed in slower interpreted code.
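To see the gap for yourself, here is a minimal sketch using timeit. The function names and the test string are made up for illustration, and the timings will vary by machine and Python version, so no numbers are claimed here. The loop version executes interpreter bytecode for every character, while str.count does the whole scan inside CPython's C code.

```python
import timeit

def count_with_index_loop(seq):
    # every iteration is interpreted: index lookup, comparison, addition
    score = 0
    for i in range(len(seq)):
        if seq[i] == '0':
            score += 1
    return score

def count_with_builtin(seq):
    # a single call; the scan itself runs in compiled C code
    return seq.count('0')

# hypothetical test sequence: 100 000 characters, two '0' per 10 characters
seq = 'AATTGG0CC0' * 10_000

# both approaches must agree before comparing their speed
assert count_with_index_loop(seq) == count_with_builtin(seq)

loop_time = timeit.timeit(lambda: count_with_index_loop(seq), number=10)
builtin_time = timeit.timeit(lambda: count_with_builtin(seq), number=10)
print(f"index loop: {loop_time:.4f} s")
print(f"str.count:  {builtin_time:.4f} s")
```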
So, my goal is to change this output from datetime:
time left: -1 day, 23:57:28
To this:
time left: 0:00:30
Now, this needs to be dynamic, as the code is supposed to be changed in the dictionary. I'm trying to figure out why it is outputting with
-1 day, 23:57:28
I've tried moving where it executes and even changing some other code. I just don't understand why it's showing -1 day. It seems like it is executing one too many times.
Also, a side note: the purpose of this program is to figure out how many songs can fit into a playlist given a time constraint. I can't seem to figure out the right if statement for it to work. Could someone also help with this?
This is the current output of the program:
0:02:34
0:06:30
Reached limit of 0:07:00
time left: -1 day, 23:57:28
See code below:
import datetime

#durations and names of songs are inputted here
timeDict = {
    'Song1' : '2:34',
    'Song2' : '3:56',
    'Song3' : '3:02'
}

def timeAdder():
    #assigns sum to the datetime library's timedelta class
    sum = datetime.timedelta()
    #sets the limit, can be whatever
    limit = '0:07:00'
    #calculates the sum
    for i in timeDict.values():
        (m, s) = i.split(':')
        d = datetime.timedelta(minutes=int(m), seconds=int(s))
        sum += d
        #checks whether the limit has been reached
        while str(sum)<limit:
            print(sum)
            break
        #commits the big STOP when limit is reached
        if str(sum)>limit:
            print("Reached limit of " + limit)
            break
    #timeLeft variable created as well as datetime object conversion to a string
    x = '%H:%M:%S'
    timeLeft = datetime.datetime.strptime(limit, x) - datetime.datetime.strptime(str(sum), x)
    for i in timeDict:
        if timeDict[i] <= str(timeLeft):
            print("You can fit " + i + " into your playlist.")
    print("time left: " + str(timeLeft))

def main():
    timeAdder()

main()
Any help with this would be appreciated.
It seems like it is executing one too many times
Bingo. The problem is here:
        sum += d
        ...
        #commits the big STOP when limit is reached
        if str(sum)>limit:
            print("Reached limit of " + limit)
            break
You are adding to your sum right away, and then checking whether it has passed the limit. Instead, you need to check whether adding to the sum will pass the limit before you actually add it.
Two other things: first, sum is a Python built-in function (not a keyword), so you don't want to shadow it with a variable name. And second, you never want to compare data as strings, because you will get weird behavior. Like:
>>> "0:07:30" > "2:34"
False
So all of your times should be timedelta objects.
Here is new code:
def timeAdder():
    #assigns sum to the datetime library's timedelta class
    sum_ = datetime.timedelta()
    #sets the limit, can be whatever
    limit = '0:07:00'
    (h, m, s) = (int(i) for i in limit.split(":"))
    limitDelta = datetime.timedelta(hours=h, minutes=m, seconds=s)
    #calculates the sum
    for i in timeDict.values():
        (m, s) = i.split(':')
        d = datetime.timedelta(minutes=int(m), seconds=int(s))
        if (sum_ + d) > limitDelta:
            print("Reached limit of " + limit)
            break
        # else, loop continues
        sum_ += d
        print(sum_)
    timeLeft = limitDelta - sum_
    for songName, songLength in timeDict.items():
        (m, s) = (int(i) for i in songLength.split(':'))
        d = datetime.timedelta(minutes=m, seconds=s)
        if d < timeLeft:
            print("You can fit " + songName + " into your playlist.")
    print("time left: " + str(timeLeft))
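If you want to try the idea in isolation, here is a self-contained sketch of the same check-before-adding approach. The helpers parse_duration and fill_playlist are names I made up for illustration, and I am assuming durations are written as 'M:SS' or 'H:MM:SS'. Everything stays a timedelta, so no string comparison is involved.

```python
from datetime import timedelta

def parse_duration(text):
    """Turn 'M:SS' or 'H:MM:SS' into a timedelta (hypothetical helper)."""
    parts = [int(p) for p in text.split(':')]
    # pad on the left so 'M:SS' becomes [0, M, SS]
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def fill_playlist(songs, limit):
    """Add songs in order until the next one would exceed the limit."""
    total = timedelta()
    chosen = []
    for name, length in songs.items():
        duration = parse_duration(length)
        # check BEFORE adding, so the total never passes the limit
        if total + duration > limit:
            break
        total += duration
        chosen.append(name)
    return chosen, limit - total

songs = {'Song1': '2:34', 'Song2': '3:56', 'Song3': '3:02'}
chosen, time_left = fill_playlist(songs, parse_duration('0:07:00'))
print(chosen)
print("time left:", time_left)
```

With the three songs from the question, the third song is skipped and the remaining time is never negative.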
I just wrote up code for problem 1.6 String Compression from Cracking the Coding Interview. I am wondering how I can condense this code to make it more efficient. Also, I want to make sure that this code is O(n) because I am not concatenating to a new string.
The problem states:
Implement a method to perform basic string compression using the counts of repeated characters. For example, the string 'aabcccccaaa' would become a2b1c5a3. If the "compressed" string would not become smaller than the original string, your method should return the original string. You can assume the string has only uppercase and lowercase letters (a - z).
My code works. The first if statement inside the else checks whether the count for the character is 1, and if it is, appends just the character. I do this because the length of the end result is compared with the original string to decide which one to return.
import string

def stringcompress(str1):
    res = []
    d = dict.fromkeys(string.ascii_letters, 0)
    main = str1[0]
    for char in range(len(str1)):
        if str1[char] == main:
            d[main] += 1
        else:
            if d[main] == 1:
                res.append(main)
                d[main] = 0
                main = str1[char]
                d[main] += 1
            else:
                res.append(main + str(d[main]))
                d[main] = 0
                main = str1[char]
                d[main] += 1
    res.append(main + str(d[main]))
    return min(''.join(res), str1)
Again, my code works as expected and does what the question asks. I just want to see if there are certain lines of code I can take out to make the program more efficient.
I messed around testing different variations with the timeit module. Your variation worked fantastically when I generated test data that did not repeat often, but for short strings, my stringcompress_using_string was the fastest method. As the strings grow longer everything flips upside down, and your method of doing things becomes the fastest, and stringcompress_using_string is the slowest.
This just goes to show the importance of testing under different circumstances. My initial conclusions were incomplete, and having more test data showed the true story about the effectiveness of these three methods.
import string
import timeit
import random

def stringcompress_original(str1):
    res = []
    d = dict.fromkeys(string.ascii_letters, 0)
    main = str1[0]
    for char in range(len(str1)):
        if str1[char] == main:
            d[main] += 1
        else:
            if d[main] == 1:
                res.append(main)
                d[main] = 0
                main = str1[char]
                d[main] += 1
            else:
                res.append(main + str(d[main]))
                d[main] = 0
                main = str1[char]
                d[main] += 1
    res.append(main + str(d[main]))
    return min(''.join(res), str1, key=len)
def stringcompress_using_list(str1):
    res = []
    count = 0
    for i in range(1, len(str1)):
        count += 1
        # compare by value with ==; "is" tests identity and only happens to work for cached characters
        if str1[i] == str1[i-1]:
            continue
        res.append(str1[i-1])
        res.append(str(count))
        count = 0
    res.append(str1[i] + str(count+1))
    return min(''.join(res), str1, key=len)
def stringcompress_using_string(str1):
    res = ''
    count = 0
    # we can start at 1 because we already know the first letter is not a repetition of any previous letters
    for i in range(1, len(str1)):
        count += 1
        # we keep going through the for loop, until a character does not repeat with the previous one
        # (compare by value with ==; "is" tests identity, not equality)
        if str1[i] == str1[i-1]:
            continue
        # add the character along with the number of times it repeated to the final string
        # reset the count
        # and we start all over with the next character
        res += str1[i-1] + str(count)
        count = 0
    # add the final character + count
    res += str1[i] + str(count+1)
    return min(res, str1, key=len)
def generate_test_data(min_length=3, max_length=300, iterations=3000, repeat_chance=.66):
    assert repeat_chance > 0 and repeat_chance < 1
    data = []
    chr = 'a'
    for i in range(iterations):
        the_str = ''
        # create a random string with a random length between min_length and max_length
        for j in range( random.randrange(min_length, max_length+1) ):
            # if we've decided to not repeat by randomization, then grab a new character,
            # otherwise we will continue to use (repeat) the character that was chosen last time
            if random.random() > repeat_chance:
                chr = random.choice(string.ascii_letters)
            the_str += chr
        data.append(the_str)
    return data
# generate test data beforehand to make sure all of our tests use the same test data
test_data = generate_test_data()
#make sure all of our test functions are doing the algorithm correctly
print('showing that the algorithms all produce the correct output')
print('stringcompress_original: ', stringcompress_original('aabcccccaaa'))
print('stringcompress_using_list: ', stringcompress_using_list('aabcccccaaa'))
print('stringcompress_using_string: ', stringcompress_using_string('aabcccccaaa'))
print()
print('stringcompress_original took', timeit.timeit("[stringcompress_original(x) for x in test_data]", number=10, globals=globals()), ' seconds' )
print('stringcompress_using_list took', timeit.timeit("[stringcompress_using_list(x) for x in test_data]", number=10, globals=globals()), ' seconds' )
print('stringcompress_using_string took', timeit.timeit("[stringcompress_using_string(x) for x in test_data]", number=10, globals=globals()), ' seconds' )
The following results were all taken on an Intel i7-5700HQ CPU @ 2.70GHz, quad core processor. Compare the different functions within each blockquote, but don't try to cross compare results from one blockquote to another because the size of the test data will be different.
Using long strings
Test data generated with generate_test_data(10000, 50000, 100, .66)
stringcompress_original took 7.346990528497378 seconds
stringcompress_using_list took 7.589927956366313 seconds
stringcompress_using_string took 7.713812443264496 seconds
Using short strings
Test data generated with generate_test_data(2, 5, 10000, .66)
stringcompress_original took 0.40272931026355685 seconds
stringcompress_using_list took 0.1525574881739265 seconds
stringcompress_using_string took 0.13842854253813164 seconds
10% chance of repeating characters
Test data generated with generate_test_data(10, 300, 10000, .10)
stringcompress_original took 4.675965586924492 seconds
stringcompress_using_list took 6.081609410376534 seconds
stringcompress_using_string took 5.887430301813865 seconds
90% chance of repeating characters
Test data generated with generate_test_data(10, 300, 10000, .90)
stringcompress_original took 2.6049783549783547 seconds
stringcompress_using_list took 1.9739111725413099 seconds
stringcompress_using_string took 1.9460854974553605 seconds
It's important to create a little framework like this that you can use to test changes to your algorithm. Often changes that don't seem useful will make your code go much faster, so the key when optimizing for performance is to try out different things and time the results. I'm sure there are more discoveries to be made if you play around with different changes, but it really depends on the type of data you want to optimize for -- compressing short strings vs long strings vs strings that don't repeat as often vs those that do.
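As one more variation to try in such a framework, here is a sketch built on itertools.groupby from the standard library. This is a different technique from the three functions timed above, and the name compress_with_groupby is my own. It always writes the count (so 'b' becomes 'b1', as in the book's example), and it is still a single pass over the string. I have not timed it, so treat it as a candidate to measure rather than a known improvement.

```python
from itertools import groupby

def compress_with_groupby(text):
    """Run-length encode text; return the original if that is not shorter."""
    parts = []
    # groupby yields (character, iterator over the consecutive run of that character)
    for char, run in groupby(text):
        run_length = sum(1 for _ in run)
        parts.append(char + str(run_length))
    compressed = ''.join(parts)
    return compressed if len(compressed) < len(text) else text

print(compress_with_groupby('aabcccccaaa'))
print(compress_with_groupby('abc'))
```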
The following Python program flips a coin several times, then reports the longest series of heads and tails. I am trying to convert this program into a program that uses functions so it uses basically less code. I am very new to programming and my teacher requested this of us, but I have no idea how to do it. I know I'm supposed to have the function accept 2 parameters: a string or list, and a character to search for. The function should return, as the value of the function, an integer which is the longest sequence of that character in that string. The function shouldn't accept input or output from the user.
import random
print("This program flips a coin several times, \nthen reports the longest series of heads and tails")
cointoss = int(input("Number of times to flip the coin: "))
varlist = []
i = 0
varstring = ' '
while i < cointoss:
    r = random.choice('HT')
    varlist.append(r)
    varstring = varstring + r
    i += 1
print(varstring)
print(varlist)
print("There's this many heads: ",varstring.count("H"))
print("There's this many tails: ",varstring.count("T"))
print("Processing input...")
i = 0
longest_h = 0
longest_t = 0
inarow = 0
prevIn = 0
while i < cointoss:
    print(varlist[i])
    if varlist[i] == 'H':
        prevIn += 1
        if prevIn > longest_h:
            longest_h = prevIn
        print("",longest_h,"")
        inarow = 0
    if varlist[i] == 'T':
        inarow += 1
        if inarow > longest_t:
            longest_t = inarow
        print("",longest_t,"")
        prevIn = 0
    i += 1
print ("The longest series of heads is: ",longest_h)
print ("The longest series of tails is: ",longest_t)
If this is asking too much, any explanatory help would be really nice instead. All I've got so far is:
def flip (a, b):
    flipValue = random.randint
but it's barely anything.
import random

def Main():
    numOfFlips=getFlips()
    outcome=flipping(numOfFlips)
    print(outcome)

def getFlips():
    Flips=int(input("Enter number of flips:\n"))
    return Flips

def flipping(numOfFlips):
    longHeads=[]
    longTails=[]
    Tails=0
    Heads=0
    for flips in range(0,numOfFlips):
        flipValue=random.randint(1,2)
        print(flipValue)
        if flipValue==1:
            Tails+=1
            longHeads.append(Heads) #recording value of Heads before resetting it
            Heads=0
        else:
            Heads+=1
            longTails.append(Tails)
            Tails=0
    #record the final run as well; otherwise it is lost and max() fails on an empty list
    longHeads.append(Heads)
    longTails.append(Tails)
    longestHeads=max(longHeads) #chooses the greatest length from both lists
    longestTails=max(longTails)
    return "Longest heads:\t"+str(longestHeads)+"\nLongest tails:\t"+str(longestTails)

Main()
I did not quite understand how your code worked, so I rewrote it using functions that do the same job. There are probably ways of improving my code, but the logic has now been moved into functions.
First, you need a function that flips a coin x times. This would be one possible implementation, favoring random.choice over random.randint:
def flip(x):
    result = []
    for _ in range(x):
        result.append(random.choice(("h", "t")))
    return result
Of course, you could also pass the collection of possible outcomes to choose from as a parameter.
Next, you need a function that finds the longest sequence of some value in some list:
def longest_series(some_value, some_list):
    current, longest = 0, 0
    for r in some_list:
        if r == some_value:
            current += 1
            longest = max(current, longest)
        else:
            current = 0
    return longest
And now you can call these in the right order:
# initialize the random number generator, so we get the same result
random.seed(5)
# toss a coin a hundred times
series = flip(100)
# count heads/tails
headflips = longest_series('h', series)
tailflips = longest_series('t', series)
# print the results
print("The longest series of heads is: " + str(headflips))
print("The longest series of tails is: " + str(tailflips))
Output:
>> The longest series of heads is: 8
>> The longest series of tails is: 5
edit: removed the flip implementation with yield, it made the code weird.
Counting the longest run
Let's see what you have asked for
I'm supposed to have the function accept 2 parameters: a string or list,
or, generalizing just a bit, a sequence
and a character
again, we'd speak, generically, of an item
to search for. The function should return, as the value of the
function, an integer which is the longest sequence of that character
in that string.
My implementation of the function you are asking for, complete with
docstring, is
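def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el==i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    # a run that reaches the end of "s" is never closed by the elif, so compare once more
    return m if m >= c else c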
We initialize c (current run) and m (maximum run so far) to zero,
then we loop, looking at every element el of the argument sequence s.
The logic is straightforward but for elif c: whose block is executed at the end of a run (because c is greater than zero and logically True) but not when the previous item (not the current one) was not equal to i. The savings are small but are savings...
Flipping coins (and more...)
How can we simulate flipping n coins? We abstract the problem and recognize that flipping n coins corresponds to choosing from a collection of possible outcomes (for a coin, either head or tail) for n times.
As it happens, the random module of the standard library has the exact answer to this problem
In [52]: random.choices?
Signature: choices(population, weights=None, *, cum_weights=None, k=1)
Docstring:
Return a k sized list of population elements chosen with replacement.
If the relative weights or cumulative weights are not specified,
the selections are made with equal probability.
File: ~/lib/miniconda3/lib/python3.6/random.py
Type: method
Our implementation, aimed at hiding details, could be
def roll(n, l):
    '''Rolls "n" times a die/coin whose face values are listed in "l".
    E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)
Putting this together
def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el==i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    # a run that reaches the end of "s" is never closed by the elif, so compare once more
    return m if m >= c else c

def roll(n, l):
    '''Rolls "n" times a die/coin whose face values are listed in "l".
    E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)

N = 100 # n. of flipped coins
h_or_t = ['h', 't']
# the function defined above is called roll, not flip
random_seq_of_h_or_t = roll(N, h_or_t)
max_h = longest_run('h', random_seq_of_h_or_t)
max_t = longest_run('t', random_seq_of_h_or_t)
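For comparison, the same count can be sketched with itertools.groupby, which splits a sequence into consecutive runs of equal items. This is an alternative technique, not what the answers above use, and the name longest_run_groupby is made up. Because every run is measured as a whole, a run at the very end of the sequence is handled like any other.

```python
from itertools import groupby
import random

def longest_run_groupby(item, seq):
    """Length of the longest consecutive run of item in seq (0 if absent)."""
    run_lengths = (
        sum(1 for _ in run)          # length of this run
        for value, run in groupby(seq)
        if value == item             # only runs of the item we care about
    )
    return max(run_lengths, default=0)

flips = random.choices('ht', k=100)  # 100 simulated coin flips
print("The longest series of heads is:", longest_run_groupby('h', flips))
print("The longest series of tails is:", longest_run_groupby('t', flips))
```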
I am trying to make a program that grabs 5 integers from the user, and then finds the average of them. I have it set up to take in the 5 numbers, but how do I return them all as separate variables so I can use them later on? Thanks!
def main():
    x = 0
    testScoreNumber = 1
    while x < 5:
        getNumber_0_100(testScoreNumber)
        x += 1
        testScoreNumber += 1
    calcAverage(score1, score2, score3, score4, score5)
    print(calculatedAverage)

def getNumber_0_100(testnumber):
    test = int(input("Enter test score " + str(testnumber) + ":"))
    testcount = 0
    while testcount < 1:
        test = int(input("Enter test score " + str(testnumber) + ":"))
        if test > 0 or test < 100:
            testcount += 1
    return test
^Here is the problem: every time this function runs, I want it to return a different value to a different variable, e.g. test1, test2, test3.
def calcAverage(_score1,_score2,_score3,_score4,_score5):
    total = _score1 + _score2 + _score3 + _score4 + _score5
    calculatedAverage = total/5
    return calculatedAverage
You need to store the result somewhere. It is usually (always?) a bad idea to dynamically create variable names (although it is possible using globals). The typical place to store the results is in a list or a dictionary -- in this case, I'd use a list.
Change this portion of the code:
    x = 0
    testScoreNumber = 1
    while x < 5:
        getNumber_0_100(testScoreNumber)
        x += 1
        testScoreNumber += 1
to:
    results = []
    for x in range(5):
        results.append( getNumber_0_100(x+1) )
which can be condensed even further:
results = [ getNumber_0_100(x+1) for x in range(5) ]
You can then pass that results list to your next function:
avg = calcAverage(results[0],results[1],...)
print(avg)
Or, you can use the unpacking operator for shorthand:
avg = calcAverage(*results)
print(avg)
It isn't the responsibility of the returning function to say what the caller does with its return value. In your case, it would be simple to let main have a list where it adds the return values. You could do this:
scores = []
for i in range(5):
    scores.append(getNumber_0_100(i))
calcAverage(*scores)
Note that *scores passes the items of the list as separate arguments to your calcAverage function. It's probably better to have calcAverage be a general function which takes a list of values and calculates their average (i.e. doesn't just work on five numbers):
def calcAverage(numbers):
    return sum(numbers) / len(numbers)
Then you'd call it with just calcAverage(scores)
A more Pythonic way to write the first part might be scores = [getNumber_0_100(i) for i in range(5)]
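Putting the pieces from these answers together, a minimal sketch could look like the following. The names get_number_0_100 and calc_average and the read parameter are my own additions. read only exists so the example can run with canned answers instead of a keyboard, and I am assuming the intended valid range is 0 to 100 inclusive.

```python
def get_number_0_100(test_number, read=input):
    """Keep asking until the user enters an integer from 0 to 100."""
    while True:
        value = int(read("Enter test score " + str(test_number) + ": "))
        if 0 <= value <= 100:
            return value

def calc_average(numbers):
    """Average of any non-empty list of numbers."""
    return sum(numbers) / len(numbers)

# canned answers stand in for keyboard input; "150" is rejected and asked again
canned = iter(["90", "150", "80", "70", "60", "50"])
scores = [get_number_0_100(i + 1, read=lambda prompt: next(canned)) for i in range(5)]
print(scores)
print(calc_average(scores))
```

In a real program you would leave out the read argument, so that the built-in input is used.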
Python allows you to return a tuple, and you can unpack this tuple when you receive the return values. For example:
def return_multiple():
    # do something to calculate test1, test2, and test3
    return (test1, test2, test3)
val1, val2, val3 = return_multiple()
The limitation here though is that you need to know how many variables you're returning. If the number of inputs is variable, you're better off using lists.
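A runnable version of the same idea, with a made-up function min_max_mean as the example. The last line shows star-unpacking, which helps a little when the number of values is not fixed, although a plain list is usually still simpler in that case.

```python
def min_max_mean(numbers):
    """Return three results at once as a tuple."""
    return min(numbers), max(numbers), sum(numbers) / len(numbers)

# the number of names on the left must match the number of returned values
lowest, highest, mean = min_max_mean([70, 80, 90])
print(lowest, highest, mean)

# star-unpacking collects "the rest" into a list when the count varies
first, *rest = [70, 80, 90]
print(first, rest)
```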
I am trying to figure out how to take in a list of numbers and sort them into categories such as 0-10, 10-20, 20-30 and so on up to 90-100. I have started the code, but it isn't reading in all the inputs; it only reads the last one and keeps repeating it. I am stumped. Can anyone help, please?
def eScores(Scores):
    count0 = 0
    count10 = 0
    count20 = 0
    count30 = 0
    count40 = 0
    count50 = 0
    count60 = 0
    count70 = 0
    count80 = 0
    count90 = 0
    if Scores > 90:
        count90 = count90 + 1
    if Scores > 80:
        count80 = count80 + 1
    if Scores > 70:
        count70 = count70 + 1
    if Scores > 60:
        count60 = count60 + 1
    if Scores > 50:
        count50 = count50 + 1
    if Scores > 40:
        count40 = count40 + 1
    if Scores > 30:
        count30 = count30 + 1
    if Scores > 20:
        count20 = count20 + 1
    if Scores > 10:
        count10 = count10 + 1
    if Scores <= 10:
        count0 = count0 + 1
    print count90,'had a score of (90 - 100]'
    print count80,'had a score of (80 - 90]'
    print count70,'had a score of (70 - 80]'
    print count60,'had a score of (60 - 70]'
    print count50,'had a score of (50 - 60]'
    print count40,'had a score of (40 - 50]'
    print count30,'had a score of (30 - 40]'
    print count20,'had a score of (20 - 30]'
    print count10,'had a score of (10 - 20]'
    print count0,'had a score of (0 - 10]'
    return eScores(Scores)
Each time eScores is called it sets all the counters (count10, count20) back to zero. So only the final call has any effect.
You should either declare the counters as global variables, or put the function into a class and make the counters member variables of the class.
Another problem is that the function calls itself in the return statement:
return eScores(Scores)
Since this function is (as I understand it) supposed to update the counter variables only, it does not need to return anything, let alone call itself recursively. You'd better remove the return statement.
One mistake is that you're not breaking out of the whole set of ifs once a condition matches. For example, if your number is 93, it is going to set count90 to 1, then go on to count80 and set that to 1 as well, and so on until it gets to count10.
Your code is repeating because the function is infinitely recursive (it has no stop condition). Here are the relevant bits:
def eScores(Scores):
    # ...
    return eScores(Scores)
I think what you'd want is more like:
def eScores(Scores):
    # same as before, but change the last line:
    return
Since you're printing the results, I assume you don't want to return the values of count10, count20, etc.
Also, the function won't accumulate results since you're creating new local counts each time the function is called.
Why don't you just use each number as a key (after processing) and return a dictionary of values?
def eScores(Scores):
    return_dict = {}
    for score in Scores:
        keyval = int(score/10)*10 # py3k automatically does float division
        if keyval not in return_dict:
            return_dict[keyval] = 1
        else:
            return_dict[keyval] += 1
    return return_dict
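A variation on the dictionary idea, sketched with collections.Counter in Python 3. The name bucket_scores and the width parameter are made up for illustration. I am assuming you want the half-open (lower, upper] buckets that your print statements describe, with a score of 0 counted in the first bucket. Each score lands in exactly one bucket, and the counts cover the whole list in a single call.

```python
from collections import Counter
import math

def bucket_scores(scores, width=10):
    """Count scores per (lower, upper] bucket, keyed by the lower bound."""
    counts = Counter()
    for score in scores:
        # ceil(score / width) - 1 puts 10 into (0, 10] and 11 into (10, 20];
        # max(0, ...) keeps a score of 0 in the first bucket
        lower = max(0, (math.ceil(score / width) - 1) * width)
        counts[lower] += 1
    return counts

counts = bucket_scores([5, 10, 11, 93, 100, 0])
for lower in range(90, -1, -10):
    # a Counter returns 0 for buckets that never occurred
    print(counts[lower], 'had a score of (%d - %d]' % (lower, lower + 10))
```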