My test class produces the required plot, but I have to manually stop execution every time - the console continues to show 'Instantiating tests'; can anyone spot why the execution never halts? Any tips on increasing my code's 'Pythonic-ness' would also be appreciated!
(Python 3.5 running in PyCharm CE 2016.2.1 on Mac OS X 10.11.6)
# A program to test various sorting algorithms. It generates random lists of various sizes,
# sorts them, and then checks that:
# a. the list is in ascending order
# b. the set of elements is the same as in the original list
# It also records the time taken by each sort.
import random
import timeit
from unittest import TestCase
import matplotlib.pyplot as plt
from Sorter import insertionsort, mergesort, quicksort
class TestSort(TestCase):
    def test_sorting(self):
        times_insertionsort = []  # holds the running times for insertion sort
        times_quicksort = []      # holds the running times for quick sort
        times_mergesort = []      # holds the running times for merge sort
        lengths = []              # holds the array lengths
        # determine the number of lists to be created
        for i in range(0, 13):
            # initialise a new empty list
            pre_sort = []
            # determine the list's length, then populate the list with 'random' data
            for j in range(0, i * 100):
                pre_sort.append(random.randint(0, 1000))
            # record the length of the list
            lengths.append(len(pre_sort))
            # record the time taken by quicksort to sort the list
            start_time = timeit.default_timer()
            post_quicksort = quicksort(pre_sort)
            finish_time = timeit.default_timer()
            times_quicksort.append((finish_time - start_time) * 1000)
            # record the time taken by insertionsort to sort the list
            start_time = timeit.default_timer()
            post_insertionsort = insertionsort(pre_sort)
            finish_time = timeit.default_timer()
            times_insertionsort.append((finish_time - start_time) * 1000)
            # record the time taken by mergesort to sort the list
            start_time = timeit.default_timer()
            post_mergesort = mergesort(pre_sort)
            finish_time = timeit.default_timer()
            times_mergesort.append((finish_time - start_time) * 1000)
            # check that:
            # a. the list is in ascending order
            # b. the set of elements is the same as the original list
            for k in range(0, len(pre_sort) - 1):
                self.assertTrue(post_insertionsort[k] in pre_sort)
                self.assertTrue(post_insertionsort[k] <= post_insertionsort[k + 1])
                self.assertTrue(post_mergesort[k] == post_insertionsort[k])
                self.assertTrue(post_mergesort[k] == post_quicksort[k])
        # plot the results
        plt.plot(lengths, times_insertionsort, 'r')
        plt.plot(lengths, times_quicksort, 'g')
        plt.plot(lengths, times_mergesort, 'b')
        plt.xlabel('List size')
        plt.ylabel('Execution time (ms)')
        plt.show()
You need to close the window where the plot is shown in order for the script (your test in this case) to continue/complete/finish. From the docs, emphasis mine:
When you want to view your plots on your display, the user interface backend will need to start the GUI mainloop. This is what show() does. It tells matplotlib to raise all of the figure windows created so far and start the mainloop. Because this mainloop is blocking by default (i.e., script execution is paused), you should only call this once per script, at the end. Script execution is resumed after the last window is closed. Therefore, if you are using matplotlib to generate only images and do not want a user interface window, you do not need to call show (see Generate images without having a window appear and What is a backend?).
Or don't use show() for the tests and just generate an image which you can verify against later.
Assume having a simple application running on an average machine (4vCore ~3GHz) implementing the following code:
import time

for i in range(99):
    print(time.time())
which will generate a series of high-precision timestamps:
1630912866.957683
1630912866.957693
1630912866.957696
1630912866.9576986
1630912866.9577005
1630912866.9577026
1630912866.9577048
1630912866.9577065
1630912866.9577084
1630912866.9577103
What is the probability of these timestamps colliding and what are the factors influencing it?
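One relevant factor, the resolution of the clock itself, can be inspected directly with the standard library (a sketch; the values vary by platform):

```python
import time

# the advertised resolution of the clock behind time.time(), in seconds
info = time.get_clock_info('time')
print(info.resolution)

# time.time_ns() returns integer nanoseconds and avoids float rounding,
# which is another source of apparent collisions at high precision
print(time.time_ns())
```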
You can estimate the frequency of it happening by running this code at the beginning of your program, or at random intervals, preferably in another thread so that it does not interrupt your program:

import time

number = 1000000
l = []
prev = time.time()
for x in range(number):
    if time.time() != prev:
        l.append(time.time())
    prev = time.time()
frequency = len(l) / number

Just keep in mind that the rate will change according to the computer you are using, the application's load, and even the other running applications...
I was wondering if there is simple way to measure the memory used to execute a function similar to the way you can time a function.
For example, I can time a function and append the elapsed time to a list by doing the following:
from timeit import default_timer as timer

timePerTenRuns = []
# Loop through numRuns to generate an average
for i in range(0, len(numRuns)):
    # Generate random data
    randList = getRand(k, randStart, k/randDivisor)
    # Time the function
    start = timer()
    # Execute function to count duplicates in a random list
    countDuplicates(randList)
    # Take the time at the function's end
    end = timer()
    # Append the time for one execution of the function
    timePerTenRuns.append(end - start)
I would like to do the same to calculate and store the memory required to execute that function, then append it to a similar list.
I've read about tools such as memory_profiler, memory_usage, and others, but none have seemed to be able to produce the implementation I am looking for.
Is this possible? Any help would be greatly appreciated.
Best regards,
Ryan
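For what it's worth, the standard-library tracemalloc module can measure the peak memory allocated between two points, mirroring the timing pattern above; a minimal sketch (count_duplicates and rand_list here are hypothetical stand-ins for the question's countDuplicates and randList):

```python
import random
import tracemalloc

def count_duplicates(values):
    # hypothetical stand-in for the question's countDuplicates
    seen = set()
    dupes = 0
    for v in values:
        if v in seen:
            dupes += 1
        seen.add(v)
    return dupes

rand_list = [random.randint(0, 100) for _ in range(10_000)]

memPerTenRuns = []
tracemalloc.start()
count_duplicates(rand_list)
current, peak = tracemalloc.get_traced_memory()  # bytes allocated since start()
tracemalloc.stop()
memPerTenRuns.append(peak)
print(peak)
```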
I have a huge list, all_entries (currently 80k integer items). This list contains items I have already handled in my overall program.
When my program uses the following method, it usually takes around 30 s to reach the return statement. How can I speed this up?
Tip: new_entries is 40k items long, so huge as well.
def get_fresh_entries(self, new_entries, all_entries):
    """
    :param new_entries: Entries from which some might already be in all_entries.
    :param all_entries: Entries already handled and saved.
    """
    fresh = []
    shuffle(new_entries)
    for i in new_entries:
        if i not in all_entries:
            fresh.append(i)
        if len(fresh) > 80000:
            break
    return fresh
The only problem is the line if i not in all_entries:, which is executed for every new entry and tests against up to 80k existing entries.
Here it is important to understand the difference between performing the test on a list and on a set:
testing if an element is in a list is like checking whether someone is at home without knowing the address (just the town) and going from door to door;
testing if an element is in a set is like checking whether someone is at home when you know the exact address and can ring a single doorbell.
So simply converting all_entries to a set once(!) will eliminate the primary speed issue.
...
all_entries_set = set(all_entries)
for i in new_entries:
    if i not in all_entries_set:
        ...
While there are other hints on how to speed up your program, using a set is the crucial one because it reduces the complexity: membership testing is O(n) for a list but O(1) on average for a set.
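The difference is easy to measure; a quick sketch (the absolute numbers will vary by machine):

```python
import timeit

data = list(range(100_000))
as_set = set(data)

# membership test for an element near the end: the list scans almost every
# element, while the set does a single hash lookup
t_list = timeit.timeit(lambda: 99_999 in data, number=100)
t_set = timeit.timeit(lambda: 99_999 in as_set, number=100)

print(t_list, t_set)  # the set lookup is typically orders of magnitude faster
```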
A list comprehension will do:
As @Delgan commented, it is better if all_entries is a set:
all_entries = set(all_entries)
Then:
fresh = [x for x in new_entries if x not in all_entries]
Also take a look at itertools.ifilter (Python 2 only; in Python 3 the built-in filter is lazy in the same way) - it is lazily evaluated:
fresh = itertools.ifilter(lambda x: x not in all_entries, new_entries)
In case you need to keep only the first n items, since itertools is lazy you can just take them like this:
fresh = itertools.islice(itertools.ifilter(lambda x: x not in all_entries,
                                           new_entries),
                         n)
or, like the list comprehension above, but using a generator expression instead:
fresh = itertools.islice((x for x in new_entries if x not in all_entries), n)
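A runnable Python 3 version of the lazy approach, on small sample data (the data and n are made up for illustration):

```python
from itertools import islice

all_entries = set(range(0, 100, 2))  # sample "already handled" entries (evens)
new_entries = list(range(10))
n = 3

# lazily filter out already-handled entries and take only the first n
fresh = list(islice((x for x in new_entries if x not in all_entries), n))
print(fresh)  # → [1, 3, 5]
```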
Using set operations can significantly speed up your code. I have defined a new function, get_fresh_entries_2, which uses set operations, and at the end I have added a small speed comparison. Using set operations speeds up the process by a huge factor.

from random import shuffle
from itertools import compress
from time import time

import numpy as np

def get_fresh_entries_2(new_entries, all_entries):
    shuffle(new_entries)
    diff = set(new_entries) - set(all_entries)
    if len(new_entries) > 80000:
        ind = [i in diff for i in new_entries[:80000]]
    else:
        ind = [i in diff for i in new_entries]
    fresh = compress(new_entries, ind)
    return list(fresh)

def get_fresh_entries(new_entries, all_entries):
    """
    :param new_entries: Entries from which some might already be in all_entries.
    :param all_entries: Entries already handled and saved.
    """
    fresh = []
    shuffle(new_entries)
    for i in new_entries:
        if i not in all_entries:
            fresh.append(i)
        if len(fresh) > 80000:
            break
    return fresh

new_entries = np.random.randint(1, 11, size=40000).tolist()
all_entries = np.random.randint(0, 10, size=80000).tolist()

t0 = time()
a = get_fresh_entries(new_entries, all_entries)
t1 = time()
b = get_fresh_entries_2(new_entries, all_entries)
t2 = time()

t1 - t0  # 4.321316957473755 sec
t2 - t1  # 0.005052804946899414 sec
Still a noob in Python, and I get stuck many times.
The script runs three sequences, one after the other, each for 20 seconds.
Each sequence has a while loop and a timeout statement.
Then it starts the next loop, and so on till the end of the third loop.
Then it quits. I would like to start again from the top.
I probably have too many while loops.
#!/usr/bin/env python
# Import required libraries
import time

# More setup

# Choose a matrix to use
mat = mat1
t_end = time.time() + 20

# Start loop
while time.time() < t_end:
    # code
    # loop timeout
    pass

# 2 more loops follow, just like the first one, except the matrix becomes
mat = mat2
mat = mat3
As others have already commented, you should do any repetitive tasks within a function. In Python, functions are defined using the "def" keyword. Using a function, it could be done as follows:
import time

# Replace these dummy assignments with real code
mat1 = "blah"
mat2 = "rhubarb"
mat3 = "custard"

def processMatrix(matrix, seconds=20):
    t_end = time.time() + seconds
    while time.time() < t_end:
        pass  # 'pass' does nothing - replace with your real code

processMatrix(mat1)
processMatrix(mat2)
processMatrix(mat3)
Note that I've also included the time in seconds as a parameter of the function. This gives you more flexibility, in case you want to run for different times while testing, or for a different time per matrix, etc. However, I've given it a default value of 20 so that you don't need to include it in the function call. If you do want to override the default you could call, e.g.,
processMatrix(mat1, 5)
instead of,
processMatrix(mat1)
I have to time the implementation I did of an algorithm in one of my classes, and I am using the time.time() function to do so. After implementing it, I have to run that algorithm on a number of data files which contains small and bigger data sets in order to formally analyse its complexity.
Unfortunately, on the small data sets I get a runtime of 0 seconds, even though that function shows a precision of 0.000000000000000001 on the runtimes of the bigger data sets, and I cannot believe that the smaller data sets really take less time than that.
My question is: Is there a problem using this function (and if so, is there another function I can use that has a better precision)? Or am I doing something wrong?
Here is my code if ever you need it:
import sys, time
import random
from utility import parseSystemArguments, printResults

...

def main(ville):
    start = time.time()
    solution = dynamique(ville)  # Algorithm implementation
    end = time.time()
    return (end - start, solution)

if __name__ == "__main__":
    sys.argv.insert(1, "-a")
    sys.argv.insert(2, "3")
    (algoNumber, ville, printList) = parseSystemArguments()
    (algoTime, solution) = main(ville)
    printResults(algoTime, solution, printList)
The printResults function:
def printResults(time, solution, printList=True):
    print("Temps d'execution = " + str(time) + "s")
    if printList:
        print(solution)
The solution to my problem was to use the timeit module instead of the time module.
import timeit
...

def main(ville):
    start = timeit.default_timer()
    solution = dynamique(ville)
    end = timeit.default_timer()
    return (end - start, solution)
Don't confuse the resolution of the system time with the resolution of a floating point number. The time resolution on a computer is only as frequent as the system clock is updated. How often the system clock is updated varies from machine to machine, so to ensure that you will see a difference with time, you will need to make sure it executes for a millisecond or more. Try putting it into a loop like this:
start = time.time()
k = 100000
for i in range(k):
    solution = dynamique(ville)
end = time.time()
return ((end - start) / k, solution)
In the final tally, you then need to divide by the number of loop iterations to know how long your code actually runs once through. You may need to increase k to get a good measure of the execution time, or you may need to decrease it if your computer is running in the loop for a very long time.
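This loop-and-divide averaging is exactly what the standard-library timeit module automates; a minimal sketch (dynamique here is a hypothetical stand-in for the question's algorithm):

```python
import timeit

def dynamique(ville):
    # hypothetical stand-in for the algorithm being measured
    return sorted(ville)

ville = list(range(200, 0, -1))

# timeit runs the statement `number` times and returns the total elapsed time;
# divide by `number` to get the average time per call
number = 1000
total = timeit.timeit(lambda: dynamique(ville), number=number)
per_call = total / number
print(per_call)
```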