How can I reach 100% CPU utilization when using multiprocessing? - python

I am trying to accelerate my Python 3 script with multiprocessing by running 4 processes simultaneously. However, my processes never reach 100% CPU utilization. The core of my code simply reads an .mp3 recording, does some recognition on it using scikit-learn, and saves the results to a .json file.
Here is my top output:
top - 17:07:07 up 18 days, 3:31, 4 users, load average: 3.73, 3.67, 3.87
Tasks: 137 total, 1 running, 75 sleeping, 18 stopped, 0 zombie
%Cpu(s): 32.8 us, 20.3 sy, 0.0 ni, 46.3 id, 0.0 wa, 0.0 hi, 0.5 si, 0.1 st
KiB Mem : 8167880 total, 2683088 free, 4314756 used, 1170036 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 3564064 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5832 am 20 0 1887644 776736 24076 S 63.0 9.5 201:10.19 python3
5829 am 20 0 1956336 845556 24348 S 55.0 10.4 200:31.20 python3
5830 am 20 0 2000772 890260 23820 S 55.0 10.9 200:39.80 python3
5834 am 20 0 2430932 1.260g 24252 S 50.3 16.2 200:45.52 python3
4657 am 20 0 108116 4460 3424 S 0.3 0.1 1:11.48 sshd
6564 root 20 0 0 0 0 I 0.3 0.0 7:30.08 kworker/2:1
1 root 20 0 225212 6660 4452 S 0.0 0.1 0:26.33 systemd
......
As you can see in the output above, there is no heavy load on memory, so the limited CPU utilization should not be related to I/O or memory.
Is there any way to 'force' Python to use the full 100%? How can I debug my code to figure out what is causing this? And if I am missing something obvious, how can I change my code to reach 100% CPU utilization?
Here is a small overview of my main multiprocessing code:
# -*- coding: utf-8 -*-
import os
import time
import logging
import cProfile
import multiprocessing as mp

from packages.Recognizer import Recognizer
from packages.RecordingFile import RecordingFile
from packages.utils.pickle_utils import pickle_load

_PRINTS = True


class ServerSER:

    def __init__(self, date, model_fname, results_path,
                 nprocs=1, run_type="server"):
        # bunch of inits

    def process_files(self):
        # Setup a list of processes that we want to run
        self.processes = [mp.Process(target=self.recognition_func,
                                     args=("processes/p" + str(p),
                                           self.model_obj, p, self.output))
                          for p in range(self.nprocs)]
        # Run processes
        for p in self.processes:
            p.start()
        # Exit the completed processes
        for p in self.processes:
            p.join()
        # Get process results from the output queue
        self.results = []
        for p in self.processes:
            try:
                r = self.output.get_nowait()
                self.results.append(r)
            except Exception as e:
                print(e)
        return [e[1][0] for e in self.results]

    def recognition_func(self, pfolder, model_obj, pos, output, profile=True):
        # start profiling
        pr = cProfile.Profile()
        pr.enable()
        # start logging
        logger_name = "my-logger" + str(pos)
        logging.basicConfig(format='%(asctime)s %(levelname)5s %(message)s',
                            level=logging.INFO, filename=logger_name + ".txt")
        logging.info("Start logging for process number " + str(pos))
        # start processing until no files available
        while len([f for f in os.listdir(pfolder) if ".mp3" in f]) > 0:
            # get oldest file
            oldest_file = [f for f in self.sorted_ls(pfolder) if ".mp3" in f][0]
            # process
            try:
                recording = RecordingFile(pfolder=pfolder,
                                          base_url=self.base_url,
                                          fpath=oldest_file,
                                          results_path=self.results_path)
                if _PRINTS:
                    msg = "%10s : %50s" % ("PROCESSING", oldest_file)
                    logging.info(msg)
                    # prints
                    print("------------------------------------------------------------------------")
                    print("%10s : %50s" % ("PROCESSING", oldest_file))
                    print("------------------------------------------------------------------------")
                # recognize for file
                _ = Recognizer(recording=recording,
                               duration_step=1,
                               channel=1,
                               model_obj=self.model_obj)
                # clean ups
                recording.delete_files()
                print("%10s." % ("DONE"))
                print("------------------------------------------------------------------------")
            except Exception as e:
                self.errors[oldest_file] = e
                logging.warning(e)
                print(e, " while processing ", oldest_file)
        # put results in queue
        self.output.put((pos, [self.errors, self.durations]))
        # save profiling results
        # pr.print_stats(sort='time')
        pr.disable()
        pr.dump_stats("incode_cprofiler_output" + str(pos) + ".txt")
        return True
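One way to see what the workers are actually doing while they run is to poll their CPU usage and scheduler state from outside. A minimal sketch, assuming the third-party psutil package is installed (it is not used in the code above) and that you pass the worker PIDs from top on the command line; if the workers sit in 'sleeping' or 'disk-sleep' most of the time, they are waiting rather than computing:

import sys
import time
import psutil

def watch(pids, interval=1.0):
    # Poll CPU% and process state for each worker once per interval.
    procs = [psutil.Process(int(p)) for p in pids]
    for p in procs:
        p.cpu_percent(None)          # prime the per-process counters
    while True:
        time.sleep(interval)
        for p in procs:
            print("pid=%d cpu=%5.1f%% status=%s"
                  % (p.pid, p.cpu_percent(None), p.status()))

if __name__ == "__main__":
    watch(sys.argv[1:])              # e.g. python watch_workers.py 5829 5830 5832 5834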
Output of uname -a:
Linux 4.15.0-70-generic #79-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
Output of lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel Core Processor (Skylake, IBRS)
Stepping: 3
CPU MHz: 3696.000
BogoMIPS: 7392.00
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-3
EDIT
When playing a bit with the number of processes, the following happens:
With 1 process, CPU usage is around 110%.
With 2 processes, CPU usage is at 80%.
With 6 processes, each process is at around 50% CPU usage.
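To separate "my code is waiting on something" from "this KVM guest simply cannot give me four full cores", a useful first check is a pure-CPU burner, one process per core: if even this cannot hold roughly 100% per process in top, the limit is the environment (e.g. CPU steal on the hypervisor) rather than the recognition code. A minimal sketch, not part of the original code:

import multiprocessing as mp
import time

def burn(seconds=60):
    # Busy-loop on pure arithmetic for `seconds` of wall time.
    end = time.time() + seconds
    x = 0
    while time.time() < end:
        x = (x * 31 + 7) % 1000003
    return x

if __name__ == "__main__":
    procs = [mp.Process(target=burn) for _ in range(mp.cpu_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()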

Related

Combine using Python multiprocessing ThreadPool + Pool (for process), process does not work sometimes

I'm using a ThreadPool for IO-intensive work (such as loading data) and fork a process within the thread when CPU-intensive calculation is needed. This is my workaround for the GIL problem (anyway, that is not my key problem here).
My problem is: when running my code, sometimes, although a process is forked, it appears to stay in sleeping status forever (it never even runs my calculation). This causes blocking, since the code calls join to wait for the result (or an error). Note that the problem doesn't always happen, just occasionally.
My running environment is Linux CentOS 7.3 with Anaconda 5.1.0 (built-in Python 3.6.4). Note that I failed to reproduce the issue with the same code on Windows.
The following is the simplified code, which can reproduce the issue on Linux:
import logging
import time
import random
import os
import threading
from multiprocessing.pool import Pool, ThreadPool


class Task(object):
    def __init__(self, func, *args) -> None:
        super().__init__()
        self.func = func
        self.args = args


class ConcurrentExecutor(object):
    def __init__(self, spawn_count=1) -> None:
        super().__init__()
        self.spawn_count = spawn_count

    def fork(self, task):
        result = None
        print('Folk started: %s' % task.args)
        pool = Pool(processes=1)
        try:
            handle = pool.apply_async(task.func, task.args)
            pool.close()
            result = handle.get()
            print('Folk completed: %s' % task.args)
        except Exception as err:
            print('Fork failure: FOLK%s' % task.args)
            raise err
        finally:
            pool.terminate()
        return result

    def spawn(self, tasks):
        results = []
        try:
            print('Spawn started')
            handles = []
            pool = ThreadPool(processes=self.spawn_count)
            for task in tasks:
                handles.append(pool.apply_async(task.func, task.args))
            pool.close()
            pool.join()
            print('all done')
            for handle in handles:
                results.append(handle.get(10))
            print('Spawn completed')
        except Exception as err:
            print('Spawn failure')
            raise err
        return results


def foo_proc(i):
    print(i)
    result = i * i
    time.sleep(1)
    return result


def foo(i):
    executor = ConcurrentExecutor(2)
    try:
        result = executor.fork(Task(foo_proc, i))
    except Exception as err:
        result = 'ERROR'
    return result


if __name__ == '__main__':
    executor = ConcurrentExecutor(4)
    tasks = []
    for i in range(1, 10):
        tasks.append(Task(foo, i))

    start = time.time()
    print(executor.spawn(tasks))
    end = time.time()
    print(end - start)
The following shows an example of running result:
[appadmin#168-61-40-47 test]$ python test.py
7312
Spawn started
Folk started: 1
Folk started: 2
Folk started: 3
Folk started: 4
4
2
1
3
Folk completed: 4
Folk completed: 2
Folk completed: 1
Folk completed: 3
Folk started: 5
Folk started: 6
Folk started: 7
5
Folk started: 8
7
8
Folk completed: 5
Folk completed: 7
Folk completed: 8
Folk started: 9
9
Folk completed: 9
You can see the code got stuck: process "6" was forked but never did its work. Meanwhile, I can see two python processes, although there should only be one process left at the end if everything ran correctly:
[user#x-x-x-x x]$ ps -aux | grep python
user 7312 0.1 0.0 1537216 13524 pts/0 Sl+ 22:23 0:02 python test.py
user 7339 0.0 0.0 1545444 10604 pts/0 S+ 22:23 0:00 python test.py
Could anyone help? Thanks in advance!
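(Not an answer from the original thread, just a hedged note.) Mixing fork() with threads is a known source of exactly this kind of occasional hang: the forked child inherits locks that another thread happened to hold at fork time, and in the child those locks are never released. One commonly cited mitigation on Python 3.4+ is to build the one-process pool from the 'spawn' start method, so the child starts from a fresh interpreter instead of inheriting the parent's thread state. A minimal sketch of a fork() helper built that way (the function passed in must be importable at module level, as foo_proc is):

import multiprocessing as mp

_SPAWN_CTX = mp.get_context('spawn')   # fresh interpreter per child, no fork inheritance

class SpawningExecutor(object):
    def fork(self, task):
        pool = _SPAWN_CTX.Pool(processes=1)
        try:
            handle = pool.apply_async(task.func, task.args)
            pool.close()
            return handle.get()
        finally:
            pool.terminate()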

Concurrent Futures and the correct number of threads to use.

I'm developing a web-scraping tool and I am using concurrent futures. I want to know if there is a general rule for the number of threads you should use. Currently I have it set to 10, but I've noticed that I get a lot more missing data values when I push the number of threads higher.
import time
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

URLs = loadit()

start = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    # start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url): url for url in URLs}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
            print(data.values())
            # scraped_data = restaurant_parse(link)
            # time.sleep(random.randrange(3, 5))
            writeit(outName, data.values())
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
end = time.time()
print(end - start)
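There is no universal rule for I/O-bound scraping; the sweet spot depends on the target site and the network far more than on your core count, so it is usually found empirically. A hedged sketch that reuses load_url and loadit from the snippet above (time_pool_size is a hypothetical helper, not part of the original code) and simply times a few candidate pool sizes:

import time
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

def time_pool_size(urls, n_workers):
    # Fetch every URL with n_workers threads; return (elapsed seconds, error count).
    errors = 0
    start = time.time()
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(load_url, url) for url in urls]
        for future in concurrent.futures.as_completed(futures):
            try:
                future.result()
            except Exception:
                errors += 1
    return time.time() - start, errors

URLs = loadit()
for n in (5, 10, 20, 40):
    elapsed, errors = time_pool_size(URLs, n)
    print("workers=%2d  elapsed=%6.1fs  errors=%d" % (n, elapsed, errors))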
lscpu on an Ubuntu Linux box shows:
ubuntu-dev#ubuntu:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 3
On-line CPU(s) list: 0-2
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Stepping: 4
CPU MHz: 3100.000
BogoMIPS: 6200.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-2
Thank you!

Can't see some of the process names

I'm creating a simple program in Python that should save my current processes (using Linux and PyCharm).
My class code:
import os
import errno


class pidSaver:
    __pidDictionary = {}

    def __init__(self):
        pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]
        for pid in pids:
            try:
                os.kill(int(pid), 0)
            except OSError as e:
                if e.errno != errno.EPERM:  # no permission error
                    continue
            try:
                self.__pidDictionary[pid] = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
            except IOError:  # proc has already terminated
                continue

    def getDic(self):
        return self.__pidDictionary
And my main code:
pidsTry = pidSaver()
printList = pidsTry.getDic()
keyList = list(printList.keys())
IntegerKeyList = []
for key in keyList:
    IntegerKeyList.append(int(key))
IntegerKeyList.sort()
for key in IntegerKeyList:
    print "%d : %s" % (key, printList[str(key)])
The output:
1 : /sbin/init
2 :
3 :
5 :
...
7543 : less
...
So for some reason, for some of the processes I can't get a name and I just get blank output.
When I run the command ps -aux | less on my computer, I get this result:
root 1 0.0 0.0 33776 4256 ? Ss אפר24 0:01 /sbin/init
root 2 0.0 0.0 0 0 ? S אפר24 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S אפר24 0:00 [ksoftirqd/0]
myUser 7543 0.0 0.0 13752 1548 pts/9 T אפר24 0:00 less
So basically, the processes that I cannot see from my Python code are the ones surrounded by "[]".
I don't understand why this is. Also, I want to get those processes too. How can I do that, and why is this happening?
Thank you!
The processes you can't see are kernel threads. As the name says, they run in kernel space and are therefore not children of PID 1, i.e. the init system. Their cmdline is empty because there is no corresponding executable being called and no arguments to pass, and this empty cmdline is a pretty safe way to identify them. If you still want to get their name, it is in the file /proc/<pid>/status under the Name: field.
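A minimal sketch of what that answer describes, falling back to the Name: field of /proc/<pid>/status whenever cmdline is empty (the fallback names are shown in brackets, mimicking ps):

import os

def process_name(pid):
    # Return cmdline if the process has one, otherwise the Name: field
    # from /proc/<pid>/status (which covers kernel threads).
    try:
        with open(os.path.join('/proc', pid, 'cmdline'), 'rb') as f:
            cmdline = f.read().replace(b'\x00', b' ').strip()
        if cmdline:
            return cmdline.decode('utf-8', 'replace')
        with open(os.path.join('/proc', pid, 'status')) as f:
            for line in f:
                if line.startswith('Name:'):
                    return '[' + line.split(None, 1)[1].strip() + ']'
    except (IOError, OSError):   # process exited while we were reading
        pass
    return ''

for pid in sorted((p for p in os.listdir('/proc') if p.isdigit()), key=int):
    print("%s : %s" % (pid, process_name(pid)))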

How can Python threads be programmed such that the user can distinguish between them using monitoring tools available in Linux?

For example, I can name threads easily for reference within the Python program:
#!/usr/bin/python
import time
import threading


class threadly(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        while True:
            time.sleep(4)
            print "I am", self.name, "and I am barely awake."


slowthread = threadly("slowthread")
slowthread.start()

anotherthread = threadly("anotherthread")
anotherthread.start()

while True:
    time.sleep(2)
    print "I will never stop running"
    print "Threading enumerate:", threading.enumerate()
    print "Threading active_count:", threading.active_count()
    print
And the output looks like this:
I am slowthread and I am barely awake.
I am anotherthread and I am barely awake.
I will never stop running
Threading enumerate: [<_MainThread(MainThread, started 140121216169728)>, <threadly(slowthread, started 140121107244800)>, <threadly(anotherthread, started 140121026328320)>]
Threading active_count: 3
I will never stop running
Threading enumerate: [<_MainThread(MainThread, started 140121216169728)>, <threadly(slowthread, started 140121107244800)>, <threadly(anotherthread, started 140121026328320)>]
Threading active_count: 3
I can find the PID this way:
$ ps aux | grep test
557 12519 0.0 0.0 141852 3732 pts/1 S+ 03:59 0:01 vim test.py
557 13974 0.0 0.0 275356 6240 pts/2 Sl+ 05:36 0:00 /usr/bin/python ./test.py
root 13987 0.0 0.0 103248 852 pts/3 S+ 05:39 0:00 grep test
I can then invoke top:
# top -p 13974
Pressing 'H' turns on display of threads, and we see they are all displaying as the name of the command or of the main thread:
top - 05:37:08 up 5 days, 4:03, 4 users, load average: 0.02, 0.03, 0.00
Tasks: 3 total, 0 running, 3 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.8%us, 2.7%sy, 0.0%ni, 95.3%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 32812280k total, 27717980k used, 5094300k free, 212884k buffers
Swap: 16474104k total, 4784k used, 16469320k free, 26008752k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13974 justin.h 20 0 268m 6240 1740 S 0.3 0.0 0:00.03 test.py
13975 justin.h 20 0 268m 6240 1740 S 0.0 0.0 0:00.00 test.py
13976 justin.h 20 0 268m 6240 1740 S 0.0 0.0 0:00.00 test.py
Contrast this with software like rsyslog which does name its threads:
# ps aux | grep rsyslog
root 2763 0.0 0.0 255428 1672 ? Sl Mar22 6:53 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
root 2774 47.7 0.0 265424 6276 ? Sl Mar22 3554:26 /sbin/rsyslogd -i /var/run/syslogd-01.pid -c5 -f /etc/rsyslog-01.conf
root 2785 2.7 0.0 263408 3596 ? Sl Mar22 207:46 /sbin/rsyslogd -i /var/run/syslogd-02.pid -c5 -f /etc/rsyslog-02.conf
root 2797 1.7 0.0 263404 3528 ? Sl Mar22 131:39 /sbin/rsyslogd -i /var/run/syslogd-03.pid -c5 -f /etc/rsyslog-03.conf
root 2808 24.3 0.0 265560 3352 ? Sl Mar22 1812:25 /sbin/rsyslogd -i /var/run/syslogd-04.pid -c5 -f /etc/rsyslog-04.conf
root 2819 1.3 0.0 263408 1596 ? Sl Mar22 103:42 /sbin/rsyslogd -i /var/run/syslogd-05.pid -c5 -f /etc/rsyslog-05.conf
root 2830 0.0 0.0 263404 1408 ? Sl Mar22 0:17 /sbin/rsyslogd -i /var/run/syslogd-06.pid -c5 -f /etc/rsyslog-06.conf
root 13994 0.0 0.0 103248 852 pts/3 S+ 05:40 0:00 grep rsyslog
Let's pick '2774' because it looks busy:
$ top -p 2774
Press 'H' and we see a descriptively named thread, showing that the thread dedicated to the 'main' ruleset and the Reg Queue is consuming 55.6% CPU.
top - 05:50:52 up 5 days, 4:17, 4 users, load average: 0.00, 0.00, 0.00
Tasks: 4 total, 1 running, 3 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.7%us, 2.6%sy, 0.0%ni, 95.5%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 32812280k total, 29833152k used, 2979128k free, 214836k buffers
Swap: 16474104k total, 4784k used, 16469320k free, 28123448k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2775 root 20 0 259m 6020 1212 R 55.6 0.0 3152:40 rs:main Q:Reg
2776 root 20 0 259m 6020 1212 S 7.0 0.0 407:57.94 rsyslogd
2774 root 20 0 259m 6020 1212 S 0.0 0.0 0:00.00 rsyslogd
2777 root 20 0 259m 6020 1212 S 0.0 0.0 0:00.00 rsyslogd
Another way to see the names is:
$ grep Name /proc/2775/task/*/status
/proc/2775/task/2774/status:Name: rsyslogd
/proc/2775/task/2775/status:Name: rs:main Q:Reg
/proc/2775/task/2776/status:Name: rsyslogd
/proc/2775/task/2777/status:Name: rsyslogd
So, to restate my question: how can Python threads be programmed such that the user can distinguish between them using the monitoring tools available in Linux?
In my question I've tried to accomplish this by naming the thread within Python. Perhaps there is a better way to expose differently identifiable threads to the OS?
Also, I am preferably looking for a Pythonic and standard way of doing this, such that it would be part of the standard Python distribution (RHEL 6 / Python 2.6.7 specifically, but this shouldn't matter unless the support comes in a later version of Python). Contributed modules are good to know about, but for my intended application they unfortunately would not be allowed, for supportability reasons due to policy.
http://code.google.com/p/procname/
This appears to be your solution:
import procname
from threading import Thread


class worker(Thread):
    def __init__(self, name):
        Thread.__init__(self)
        self.name = name
        self.alive = True
        self.start()

    def run(self):
        procname.setprocname('My super name')
        while self.alive is True:
            ## Do work
            pass


x = worker('Worker')
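If contributed modules such as procname are ruled out by policy, a similar effect can be had from the standard library alone. A hedged sketch using ctypes and the Linux prctl(PR_SET_NAME) call (Linux with glibc assumed; the kernel truncates the name to 15 characters):

import ctypes
import threading
import time

_libc = ctypes.CDLL("libc.so.6", use_errno=True)
PR_SET_NAME = 15                      # constant from <linux/prctl.h>

def set_thread_name(name):
    # Sets the comm name of the *calling* thread, so call it inside run().
    _libc.prctl(PR_SET_NAME, name[:15].encode("ascii", "replace"), 0, 0, 0)

class NamedWorker(threading.Thread):
    def run(self):
        set_thread_name(self.name)
        while True:                   # placeholder workload, as in the examples above
            time.sleep(4)

NamedWorker(name="slowthread").start()
NamedWorker(name="anotherthread").start()

With this in place, the per-thread names show up in top after pressing 'H' and in grep Name /proc/<pid>/task/*/status, the same way they do for rsyslog above.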

Benchmarking tool using twisted

I am trying to write a web benchmarking tool based on Twisted. Twisted is a fantastic asynchronous framework for web applications, but I have only been using it for about two weeks and I have run into a problem:
When I compare this benchmarking tool against ApacheBench, the results differ greatly at the same concurrency. Here is the result of my tool:
python pyab.py 50000 50 http://xx.com/a.txt
speed:1063(q/s), worker:50, interval:7, req_made:7493, req_done:7443, req_error:0
And here is the result of ApacheBench:
ab -c 50 -n 50000 http://xx.com/a.txt
Server Software: nginx/1.4.1
Server Hostname: 203.90.245.26
Server Port: 8080
Document Path: /a.txt
Document Length: 6 bytes
Concurrency Level: 50
Time taken for tests: 6.89937 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Total transferred: 12501750 bytes
HTML transferred: 300042 bytes
Requests per second: 8210.27 [#/sec] (mean)
Time per request: 6.090 [ms] (mean)
Time per request: 0.122 [ms] (mean, across all concurrent requests)
Transfer rate: 2004.62 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.8 0 4
Processing: 1 5 3.4 5 110
Waiting: 0 2 3.6 2 109
Total: 1 5 3.5 5 110
Percentage of the requests served within a certain time (ms)
50% 5
66% 6
75% 6
80% 6
90% 7
95% 7
98% 8
99% 8
100% 110 (longest request)
On the same URL and concurrency, ApacheBench goes up to about 8000 req/sec, while pyab only manages about 1000 req/sec.
Here is my code (pyab.py):
from twisted.internet import reactor, threads
from twisted.internet.protocol import Protocol
from twisted.internet.defer import Deferred
from twisted.web.client import Agent
from twisted.web.client import HTTPConnectionPool
from twisted.web.http_headers import Headers
from twisted.python import log

import time, os, stat, logging, sys
from collections import Counter

logging.basicConfig(
    #filename= "/%s/log/%s.%s" % (RUN_DIR,RUN_MODULE,RUN_TIME),
    format="%(asctime)s [%(levelname)s] %(message)s",
    level=logging.WARNING,
    #level=logging.DEBUG,
    stream=sys.stdout
)

#log.startLogging(sys.stdout)
observer = log.PythonLoggingObserver()
observer.start()


class IgnoreBody(Protocol):
    def __init__(self, deferred, tl):
        self.deferred = deferred
        self.tl = tl

    def dataReceived(self, bytes):
        pass

    def connectionLost(self, reason):
        self.deferred.callback(None)


class Pyab:
    def __init__(self, n=50000, concurrency=100, url='http://203.90.245.26:8080/a.txt'):
        self.n = n
        self.url = url
        self.pool = HTTPConnectionPool(reactor, persistent=True)
        self.pool.maxPersistentPerHost = concurrency
        self.agent = Agent(reactor, connectTimeout=5, pool=self.pool)
        #self.agent = Agent(reactor, connectTimeout=5)
        self.time_start = time.time()
        self.max_worker = concurrency
        self.cnt = Counter({
            'worker': 0,
            'req_made': 0,
            'req_done': 0,
            'req_error': 0,
        })

    def monitor(self):
        interval = int(time.time() - self.time_start)
        speed = 0
        if interval != 0:
            speed = int(self.cnt['req_done'] / interval)
        log.msg("speed:%d(q/s), worker:%d, interval:%d, req_made:%d, req_done:%d, req_error:%d"
                % (speed, self.cnt['worker'], interval, self.cnt['req_made'],
                   self.cnt['req_done'], self.cnt['req_error']),
                logLevel=logging.WARNING)
        reactor.callLater(1, lambda: self.monitor())

    def start(self):
        self.keeprunning = True
        self.monitor()
        self.readMore()

    def stop(self):
        self.keeprunning = False

    def readMore(self):
        while self.cnt['worker'] < self.max_worker and self.cnt['req_done'] < self.n:
            self.make_request()
        if self.keeprunning and self.cnt['req_done'] < self.n:
            reactor.callLater(0.0001, lambda: self.readMore())
        else:
            reactor.stop()

    def make_request(self):
        d = self.agent.request(
            'GET',
            #'http://examplexx.com/',
            #'http://example.com/',
            #'http://xa.xingcloud.com/v4/qvo/WDCXWD7500AADS-00M2B0_WD-WCAV5E38536685366?update0=ref0%2Ccor&update1=nation%2Ccn&action0=visit&_ts=1376397973636',
            #'http://203.90.245.26:8080/a.txt',
            self.url,
            Headers({'User-Agent': ['Twisted Web Client Example']}),
            None)
        self.cnt['worker'] += 1
        self.cnt['req_made'] += 1

        def cbResponse(resp):
            self.cnt['worker'] -= 1
            self.cnt['req_done'] += 1
            log.msg('response received')
            finished = Deferred()
            resp.deliverBody(IgnoreBody(finished, self))
            return finished

        def cbError(error):
            self.cnt['worker'] -= 1
            self.cnt['req_error'] += 1
            log.msg(error, logLevel=logging.ERROR)

        d.addCallback(cbResponse)
        d.addErrback(cbError)


if __name__ == '__main__':
    if len(sys.argv) < 4:
        print "Usage: %s <n> <concurrency> <url>" % (sys.argv[0])
        sys.exit()

    ab = Pyab(n=int(sys.argv[1]), concurrency=int(sys.argv[2]), url=sys.argv[3])
    ab.start()
    reactor.run()
Is there anything wrong with my code? Thanks!
When I last used it, ab was known to have dozens of serious bugs. Sometimes that would cause it to report massively inflated results. Sometimes it would report negative results. Sometimes it would crash. I'd try another tool, like httperf, as a sanity check.
However, if your server is actually that fast, then you might have another issue.
Even if ab has been fixed, you're talking here about a C program versus a Python program running on CPython. 8x slower than C in Python is not actually all that bad, so I don't expect there is actually anything wrong with your program, except that it doesn't make use of spawnProcess and multi-core concurrency.
For starters, see if you get any better results on PyPy.
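To follow up on the spawnProcess / multi-core point: without touching pyab's internals, one hedged way to use every core is simply to launch one copy of the script per CPU and split the request count and concurrency between them (this assumes the pyab.py command line shown above; the wrapper itself is hypothetical):

import multiprocessing
import subprocess
import sys

def main(total_requests, concurrency, url):
    # One pyab.py child per core, each doing an equal share of the load.
    ncpu = multiprocessing.cpu_count()
    n_per_proc = total_requests // ncpu
    c_per_proc = max(1, concurrency // ncpu)
    children = [subprocess.Popen([sys.executable, "pyab.py",
                                  str(n_per_proc), str(c_per_proc), url])
                for _ in range(ncpu)]
    for child in children:
        child.wait()

if __name__ == '__main__':
    main(int(sys.argv[1]), int(sys.argv[2]), sys.argv[3])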
