Obtain %cpu usage: is this script correct? - python

I found this Python script, which seems to do the job, but I don't know whether it is really correct, and I can't explain the "100 -" in the last lines.
The theory behind it is clear: you sum the user, system and I/O time spent by the CPU and divide by that same sum plus the idle time. This gives you the CPU load as a percentage.
I don't need a 100% accurate measurement, just a rough indication of the real CPU usage.
import time
TIMEFORMAT = "%m/%d/%y %H:%M:%S"
INTERVAL = 2
def getTimeList():
    statFile = file("/proc/stat", "r")
    timeList = statFile.readline().split(" ")[2:6]
    statFile.close()
    for i in range(len(timeList)) :
        timeList[i] = int(timeList[i])
    return timeList
def deltaTime(interval) :
    x = getTimeList()
    time.sleep(interval)
    y = getTimeList()
    for i in range(len(x)) :
        y[i] -= x[i]
    return y
if __name__ == "__main__" :
    while True :
        dt = deltaTime(INTERVAL)
        timeStamp = time.strftime(TIMEFORMAT)
        cpuPct = 100 - (dt[len(dt) - 1] * 100.00 / sum(dt)) #why 100 - ?
        print timeStamp + "\t" + str('%.4f' %cpuPct)

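The 100 is 100%. The expression dt[len(dt) - 1] * 100.00 / sum(dt) yields the percentage of time the CPU was idle (the last of the four fields the script reads is the idle counter). You want the percentage of time it was busy, so you subtract the idle percentage from 100.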
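To make the arithmetic concrete, here is a small Python 3 sketch of the same calculation on two samples. The function name busy_percent and the counter values are made up for illustration; the field order (user, nice, system, idle) is what the first line of /proc/stat provides for the slice [2:6] used in the script.

```python
def busy_percent(before, after):
    """Return the CPU busy percentage between two /proc/stat samples.

    Each sample is a list of four counters in the order the script reads
    them: user, nice, system, idle.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0  # no ticks elapsed between the two samples
    idle_percent = deltas[-1] * 100.0 / total
    return 100.0 - idle_percent


# Hypothetical counter values, chosen only to make the arithmetic easy:
# the deltas are 30 user, 0 nice, 20 system and 150 idle ticks.
before = [1000, 0, 500, 8500]
after = [1030, 0, 520, 8650]
print(busy_percent(before, after))  # 150 of 200 ticks idle, so 25.0 busy
```

Without the "100 -", the same numbers would report 75, which is the idle share rather than the load.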


Python multiprocessing: how to create x number of processes and get return value back

I have a program that I created using threads, but then I learned that threads don't run concurrently in python and processes do. As a result, I am trying to rewrite the program using multiprocessing, but I am having a hard time doing so. I have tried following several examples that show how to create the processes and pools, but I don't think it's exactly what I want.
Below is my code with the attempts I have tried. The program tries to estimate the value of pi by randomly placing points on a graph that contains a circle. The program takes two command-line arguments: one is the number of threads/processes I want to create, and the other is the total number of points to try placing on the graph (N).
import math
import sys
from time import time
import concurrent.futures
import random
import multiprocessing as mp
def myThread(arg):
    # Take care of input argument
    n = int(arg)
    print("Thread received. n = ", n)
    # main calculation loop
    count = 0
    for i in range (0, n):
        x = random.uniform(0,1)
        y = random.uniform(0,1)
        d = math.sqrt(x * x + y * y)
        if (d < 1):
            count = count + 1
    print("Thread found ", count, " points inside circle.")
    return count
# end myThread
# receive command line arguments
if (len(sys.argv) == 3):
    N = sys.argv[1] # original ex: 0.01
    N = int(N)
    totalThreads = sys.argv[2]
    totalThreads = int(totalThreads)
    print("N = ", N)
    print("totalThreads = ", totalThreads)
else:
    print("Incorrect number of arguments!")
    sys.exit(1)
if ((totalThreads == 1) or (totalThreads == 2) or (totalThreads == 4) or (totalThreads == 8)):
    print()
else:
    print("Invalid number of threads. Please use 1, 2, 4, or 8 threads.")
    sys.exit(1)
# start experiment
t = int(time() * 1000) # begin run time
total = 0
# ATTEMPT 1
# processes = []
# for i in range(totalThreads):
#     process = mp.Process(target=myThread, args=(N/totalThreads))
#     processes.append(process)
#     process.start()
# for process in processes:
#     process.join()
# ATTEMPT 2
# pool = mp.Pool(mp.cpu_count())
# total = pool.map(myThread, [N/totalThreads])
# ATTEMPT 3
# for i in range(totalThreads):
#     total = total + pool.map(myThread, [N/totalThreads])
#     p = mp.Process(target=myThread, args=(N/totalThreads))
#     p.start()
# ATTEMPT 4
# with concurrent.futures.ThreadPoolExecutor() as executor:
#     for i in range(totalThreads):
#         future = executor.submit(myThread, N/totalThreads) # start thread
#         total = total + future.result() # get result
# analyze results
pi = 4 * total / N
print("pi estimate =", pi)
delta_time = int(time() * 1000) - t # calculate time required
print("Time =", delta_time, " milliseconds")
I thought that creating a loop from 0 to totalThreads that creates a process for each iteration would work. I also wanted to pass in N/totalThreads (to divide the work), but it seems that processes take in an iterable list rather than an argument to pass to the method.
What is it I am missing with multiprocessing? Is it at all possible to even do what I want to do with processes?
Thank you in advance for any help, it is greatly appreciated :)
I have simplified your code and used some hard-coded values which may or may not be reasonable.
import math
import concurrent.futures
import random
from datetime import datetime
def myThread(arg):
    count = 0
    for i in range(0, arg[0]):
        x = random.uniform(0, 1)
        y = random.uniform(0, 1)
        d = math.sqrt(x * x + y * y)
        if (d < 1):
            count += 1
    return count
N = 10_000
T = 8
_start = datetime.now()
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {executor.submit(myThread, (int(N / T),)): _ for _ in range(T)}
    total = 0
    for future in concurrent.futures.as_completed(futures):
        total += future.result()
_end = datetime.now()
print(f'Estimate for PI = {4 * total / N}')
print(f'Run duration = {_end-_start}')
A typical output on my machine looks like this:
Estimate for PI = 3.1472
Run duration = 0:00:00.008895
Bear in mind that the number of threads actually started is managed by the ThreadPoolExecutor (TPE). When constructed with no parameters, it derives its own upper bound on worker threads from the machine's CPU count (for example min(32, os.cpu_count() + 4) in Python 3.8). You could therefore, if you really wanted to, set T to a very high number: the extra tasks simply wait in the executor's queue until a worker thread is free.
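The answer above still uses threads, which in CPython share one interpreter lock for CPU-bound work like this loop. Since the question asked for processes and return values, here is a minimal sketch using multiprocessing.Pool, whose map() returns one result per task. The names count_inside and estimate_pi, and the per-task seed, are assumptions made for this illustration and are not from the original posts.

```python
import multiprocessing as mp
import random


def count_inside(args):
    """Count random points that fall inside the unit quarter circle.

    args is a (n_points, seed) tuple so that Pool.map can hand over one
    item per task; the seed is an addition made for this sketch.
    """
    n_points, seed = args
    rng = random.Random(seed)  # private generator for each task
    count = 0
    for _ in range(n_points):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1:
            count += 1
    return count


def estimate_pi(total_points, workers):
    """Split total_points across worker processes and combine the counts."""
    per_worker = total_points // workers
    tasks = [(per_worker, seed) for seed in range(workers)]
    with mp.Pool(processes=workers) as pool:
        counts = pool.map(count_inside, tasks)  # one return value per task
    return 4 * sum(counts) / (per_worker * workers)


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # this module, unguarded code would run again in every child process.
    print("pi estimate =", estimate_pi(200_000, 4))
```

With mp.Process directly, args must be a tuple, so args=(N/totalThreads) would need to be args=(N // totalThreads,), and a Process does not hand back a return value by itself. That is why a pool is the more convenient fit here.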

python-datetime outputting -1 day instead of desired output

So, my goal is to change this output from datetime:
time left: -1 day, 23:57:28
To this:
time left: 0:00:30
This needs to be dynamic, because the entries in the dictionary are meant to change. I'm trying to figure out why it outputs
-1 day, 23:57:28
I've tried moving where it executes and even changing some other code. I just don't understand why it shows -1 day. It seems like it is executing one too many times.
Also, a side note: the purpose of this program is to figure out how many songs can fit into a playlist given a time constraint. I can't figure out the right if statement to make that work. Could someone help with this too?
This is the current output of the program:
0:02:34
0:06:30
Reached limit of 0:07:00
time left: -1 day, 23:57:28
See code below:
import datetime
#durations and names of songs are inputted here
timeDict = {
    'Song1' : '2:34',
    'Song2' : '3:56',
    'Song3' : '3:02'
}
def timeAdder():
    #assigns sum to the datetime library's timedelta class
    sum = datetime.timedelta()
    #sets the limit, can be whatever
    limit = '0:07:00'
    #calculates the sum
    for i in timeDict.values():
        (m, s) = i.split(':')
        d = datetime.timedelta(minutes=int(m), seconds=int(s))
        sum += d
        #checks whether the limit has been reached
        while str(sum)<limit:
            print(sum)
            break
        #commits the big STOP when limit is reached
        if str(sum)>limit:
            print("Reached limit of " + limit)
            break
    #timeLeft variable created as well as datetime object conversion to a string
    x = '%H:%M:%S'
    timeLeft = datetime.datetime.strptime(limit, x) - datetime.datetime.strptime(str(sum), x)
    for i in timeDict:
        if timeDict[i] <= str(timeLeft):
            print("You can fit " + i + " into your playlist.")
    print("time left: " + str(timeLeft))
def main():
    timeAdder()
main()
Any help with this would be appreciated.
"It seems like it is executing one too many times"
Bingo. The problem is here:
sum += d
...
#commits the big STOP when limit is reached
if str(sum)>limit:
    print("Reached limit of " + limit)
    break
You are adding to your sum right away, and then checking whether it has passed the limit. Instead, you need to check whether adding to the sum will pass the limit before you actually add it.
Two other things. First, sum is a Python built-in function (not a keyword), so using it as a variable name shadows the built-in and is best avoided. Second, never compare times as strings: string comparison goes character by character, so you get surprising results such as:
>>> "0:07:30" > "2:34"
False
So all of your times should be timedelta objects.
Here is new code:
def timeAdder():
    #assigns sum to the datetime library's timedelta class
    sum_ = datetime.timedelta()
    #sets the limit, can be whatever
    limit = '0:07:00'
    (h, m, s) = (int(i) for i in limit.split(":"))
    limitDelta = datetime.timedelta(hours=h, minutes=m, seconds=s)
    #calculates the sum
    for i in timeDict.values():
        (m, s) = i.split(':')
        d = datetime.timedelta(minutes=int(m), seconds=int(s))
        if (sum_ + d) > limitDelta:
            print("Reached limit of " + limit)
            break
        # else, loop continues
        sum_ += d
        print(sum_)
    timeLeft = limitDelta - sum_
    for songName, songLength in timeDict.items():
        (m, s) = (int(i) for i in songLength.split(':'))
        d = datetime.timedelta(minutes=m, seconds=s)
        if d < timeLeft:
            print("You can fit " + songName + " into your playlist.")
    print("time left: " + str(timeLeft))
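As for where the "-1 day" itself comes from: a negative timedelta is stored with a negative days field and a non-negative seconds field, so a small overshoot prints as "-1 day" plus almost 24 hours. The short sketch below reproduces the output from the question with the three song lengths added up. The variable names are chosen for illustration only.

```python
import datetime

limit = datetime.timedelta(minutes=7)
total = datetime.timedelta(minutes=9, seconds=32)  # 2:34 + 3:56 + 3:02

overshoot = limit - total  # the limit was exceeded by 2 min 32 s
print(overshoot)                          # -1 day, 23:57:28
print(overshoot.days, overshoot.seconds)  # -1 86248
print(overshoot.total_seconds())          # -152.0
print(-overshoot)                         # 0:02:32
```

Checking total_seconds() < 0, or comparing timedelta objects directly as the answer does, avoids ever having to interpret that string.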

print random data for each millisecond python

I want to write random values between -1 and 1 to a CSV file once per millisecond using Python. I started by writing one value per second, and that worked, but I am having difficulty doing the same for each millisecond. I want the timestamp to be in UNIX epoch format like "1476449030.55676" (for milliseconds, the decimal point is not required).
import csv
import datetime
import random
import sys
import time
tstep = datetime.timedelta(milliseconds=1)
tnext = datetime.datetime.now() + tstep
NumberOfReadings = 10 # 10 values ( 1 value for 1 millisecond)
i = 0
f = open(sys.argv[1], 'w+')
try:
    writer = csv.writer(f)
    while i < NumberOfReadings:
        writer.writerow((random.uniform(-1, 1), time.time()))
        tdiff = tnext - datetime.datetime.now()
        time.sleep(float(tdiff.total_seconds()/1000))
        tnext = tnext + tstep
        i = i+1
finally:
    f.close()
UPD: time.sleep() accepts argument in seconds, so you don't need to divide it by 1000. After fixing this, my output looks like this:
0.18153176446804853,1476466290.720721
-0.9331178681567136,1476466290.721784
-0.37142653326337327,1476466290.722779
0.1397040393287503,1476466290.723766
0.7126280853504974,1476466290.724768
-0.5367844384018245,1476466290.725762
0.44284645253432786,1476466290.726747
-0.2914685960956531,1476466290.727744
-0.40353712249981943,1476466290.728778
0.035369003158632895,1476466290.729771
Which is as good as it gets, given the precision of time.sleep and other time functions.
Here's a stripped-down version, which prints a timestamp to stdout every millisecond:
import time
tstep = 0.001
tnext = time.time() + tstep
NumberOfReadings = 10 # 10 values ( 1 value for 1 millisecond)
for i in range(NumberOfReadings):
    now = time.time()
    print(now)
    # sleep() rejects negative values, so clamp to 0 when running late
    time.sleep(max(0.0, tnext - now))
    tnext += tstep
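The question also asked for CSV output with the timestamp as whole milliseconds since the epoch. A minimal Python 3 sketch of that variant follows. The function name write_readings is made up for this example, and an in-memory buffer stands in for the output file so the snippet runs on its own.

```python
import csv
import io
import random
import time


def write_readings(out, count, step=0.001):
    """Write `count` rows of (value, epoch_milliseconds) to the file `out`."""
    writer = csv.writer(out)
    next_tick = time.time() + step
    for _ in range(count):
        # Integer milliseconds since the epoch, so no decimal point is written.
        writer.writerow((random.uniform(-1, 1), int(time.time() * 1000)))
        # Clamp to zero in case this iteration is already behind schedule.
        time.sleep(max(0.0, next_tick - time.time()))
        next_tick += step


buffer = io.StringIO()  # stands in for open(sys.argv[1], "w", newline="")
write_readings(buffer, 10)
print(buffer.getvalue())
```

As with the version above, the spacing between rows is only as regular as time.sleep() allows on the machine in question.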
================================================
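This is the problem:
float(tdiff.total_seconds()/1000)
total_seconds() already returns a float number of seconds, and time.sleep() expects seconds, so dividing by 1000 makes every sleep a thousand times too short. (Integer division is not the issue here, because the dividend is already a float.)
Instead, pass the value straight through:
tdiff.total_seconds()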

Track and display percentage of code already executed

I have a very large program that takes some time to run. To make sure the process hasn't stalled somewhere, I print to the screen the percentage of the code that has already been executed, which depends on a for loop and an integer.
To display the percentage of the for loop already processed I use flags to indicate how much of the loop already passed.
The MWE might make it a bit more clear:
import time
N = 100
flag_15, flag_30, flag_45, flag_60, flag_75, flag_90 = False, False,\
    False, False, False, False
for i in range(N):
    # Large block of code.
    time.sleep(0.1)
    if i + 1 >= 0.15 * N and flag_15 is False:
        print '15%'
        flag_15 = True
    elif i + 1 >= 0.3 * N and flag_30 is False:
        print '30%'
        flag_30 = True
    elif i + 1 >= 0.45 * N and flag_45 is False:
        print '45%'
        flag_45 = True
    elif i + 1 >= 0.6 * N and flag_60 is False:
        print '60%'
        flag_60 = True
    elif i + 1 >= 0.75 * N and flag_75 is False:
        print '75%'
        flag_75 = True
    elif i + 1 >= 0.9 * N and flag_90 is False:
        print '90%'
        flag_90 = True
    elif i + 1 == N:
        print '100%'
This works but is quite verbose and truly ugly. I was wondering if there might be a better/prettier way of doing this.
I like to use modulus to periodically print status messages.
import time
N = 100
for i in range(N):
    #do work here
    if i % 15 == 0:
        print "{}% complete".format(int(100 * i / N))
print "100% complete"
Result:
0% complete
15% complete
30% complete
45% complete
60% complete
75% complete
90% complete
100% complete
For values of N other than 100, if you want to print every 15%, you'll have to calculate the stride dynamically instead of using the literal value 15.
import time
import math
N = 300
percentage_step = 15
# integer stride; at least 1 so that i % stride never divides by zero for small N
stride = max(1, N * percentage_step // 100)
for i in range(N):
    #do work
    if i % stride == 0:
        print "{}% complete".format(int(100 * i / N))
(Posting a second answer because this solution uses a completely different technique)
You could create a list of milestone values, and print a message when the percentage complete reaches the lowest value.
milestones = [15, 30, 45, 60, 75, 90, 100]
for i in range(N):
    #do work here
    percentage_complete = (100.0 * (i+1) / N)
    while len(milestones) > 0 and percentage_complete >= milestones[0]:
        print "{}% complete".format(milestones[0])
        #remove that milestone from the list
        milestones = milestones[1:]
Result:
15% complete
30% complete
45% complete
60% complete
75% complete
90% complete
100% complete
Unlike the "stride" method I posted earlier, here you have precise control over which percentages are printed. They don't need to be evenly spaced, they don't need to be divisible by N, they don't even need to be integers! You could do milestones = [math.pi, 4.8, 15.16, 23.42, 99] if you wanted.
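The milestone idea can also be wrapped in a small helper so that the work loop stays uncluttered. The following is a Python 3 sketch under that assumption. The class name MilestoneTracker and its update() method are invented for this example and are not part of any library.

```python
import math


class MilestoneTracker:
    """Report each percentage milestone once, when progress first reaches it."""

    def __init__(self, total, milestones):
        self.total = total
        self.pending = sorted(milestones)

    def update(self, done):
        """Return the milestones newly reached once `done` items are finished."""
        percentage = 100.0 * done / self.total
        reached = []
        while self.pending and percentage >= self.pending[0]:
            reached.append(self.pending.pop(0))
        return reached


N = 300
tracker = MilestoneTracker(N, [math.pi, 50, 100])
for i in range(N):
    # ... large block of code ...
    for milestone in tracker.update(i + 1):
        print("{}% complete".format(milestone))
```

Because update() returns a list, a single slow iteration that jumps past several milestones reports all of them, just as the while loop in the answer does.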
You can use a combination of sys.stdout.write() and sys.stdout.flush() for a nice progress bar:
import sys
import time
for i in range(100):
    row = "="*i + ">"
    sys.stdout.write("%s\r%d%%" %(row, i + 1))
    sys.stdout.flush()
    time.sleep(0.1)
sys.stdout.write("\n")
The progress is displayed like this:
69%====================================================================>
You don't need any flags. You can just print the completion based on the current value of i.
for i in range(N):
    # lots of code
    print '{0}% completed.'.format((i+1)*100.0/N)
Just add a "\r" in Misha's answer:
import sys
import time
for i in range(100):
    row = "="*i + ">"
    sys.stdout.write("%s\r %d%%\r" %(row, i + 1))
    sys.stdout.flush()
    time.sleep(0.1)
sys.stdout.write("\n")
Output:
65%======================================================>
In Google Colab (colab.research.google.com) it works like this:
import sys
import time
for i in range(100):
    row = "="*i + ">"
    sys.stdout.write("\r %d%% %s " %( i + 1,row))
    sys.stdout.flush()
    time.sleep(0.1)
sys.stdout.write("\n")

Algorithm timing in Python

I want to compute how many times my computer can do counter += 1 in one second. A naive approach is the following:
from time import time
counter = 0
startTime = time()
while time() - startTime < 1:
    counter += 1
print counter
The problem is that evaluating time() - startTime < 1 may be considerably more expensive than counter += 1.
Is there a way to make a less "clean" 1 sec sample of my algorithm?
The usual way to time algorithms is the other way around: Use a fixed number of iterations and measure how long it takes to finish them. The best way to do such timings is the timeit module.
import timeit
print timeit.timeit("counter += 1", "counter = 0", number=100000000)
Note that timing counter += 1 seems rather pointless, though. What do you want to achieve?
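To turn such a fixed-iteration timing into the "increments per second" figure the question asks for, divide the iteration count by the elapsed time. A minimal Python 3 sketch follows. The iteration count is an arbitrary choice for the example.

```python
import timeit

iterations = 1_000_000  # assumed sample size; larger values reduce noise
elapsed = timeit.timeit("counter += 1", setup="counter = 0", number=iterations)

increments_per_second = iterations / elapsed
print("about {:.0f} increments per second".format(increments_per_second))
```

The figure varies from run to run and between machines, so it is best read as an order of magnitude.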
Why don't you infer the time instead? You can run something like:
from datetime import datetime
def operation():
    counter = 0
    tbeg = datetime.utcnow()
    for _ in range(10**6):
        counter += 1
    td = datetime.utcnow() - tbeg
    return (td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6)/10.0**6
def timer(n):
    stack = []
    for _ in range(n):
        stack.append(operation()) # units of musec/increment
    print sum(stack) / len(stack)
if __name__ == "__main__":
    timer(10)
and get the average elapsed microseconds per increment; I get about 0.09 (most likely very inaccurate). From there it is a simple step to infer that, if one increment takes about 0.09 microseconds, I can make about 11258992 increments in one second.
I think the measurements are very inaccurate, but maybe it is a sensible approximation?
I have never worked with the time module, but judging from that code I assume it counts seconds. What if you do the per-second calculation after Ctrl+C is pressed? It would be something like:
#! /usr/bin/env python
from time import time
import signal
import sys
#The ctrl+C interruption function:
def signal_handler(signum, frame):
    counts_per_sec = counter/(time()-startTime)
    print counts_per_sec
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
counter = 0
startTime = time()
while 1:
    counter = counter + 1
Of course, it won't be exact because of the time that passes between the last second processed and the interrupt signal, but the longer you leave the script running, the more precise it will be :)
Here is my approach
import time
m = 0
timeout = time.time() + 1
while True:
    if time.time() > timeout:
        break
    m = m + 1
print(m)
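One way to reduce the cost of the clock check that the question worries about is to read the clock only once per batch of increments. The sketch below illustrates the idea in Python 3. The function name count_for and the batch size of 1000 are arbitrary choices for this example.

```python
import time


def count_for(duration, check_every=1000):
    """Count increments for roughly `duration` seconds.

    The clock is read only once per `check_every` increments, so its cost
    is spread over many iterations instead of being paid on every one.
    """
    counter = 0
    deadline = time.perf_counter() + duration
    while True:
        for _ in range(check_every):
            counter += 1
        if time.perf_counter() >= deadline:
            return counter


print(count_for(1.0))
```

The trade-off is that the run can overshoot the deadline by up to one batch, and the inner for loop adds its own per-iteration overhead, so the result is still an approximation.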
