Python list append timings

cat /proc/meminfo
MemTotal: 3981272 kB
I ran this simple test in Python:
#!/usr/bin/env python
import sys

num = int(sys.argv[1])
li = []
for i in xrange(num):
    li.append(i)
$ time ./listappend.py 1000000
real 0m0.342s
user 0m0.304s
sys 0m0.036s
$ time ./listappend.py 2000000
real 0m0.646s
user 0m0.556s
sys 0m0.084s
$ time ./listappend.py 4000000
real 0m1.254s
user 0m1.136s
sys 0m0.116s
$ time ./listappend.py 8000000
real 0m2.424s
user 0m2.176s
sys 0m0.236s
$ time ./listappend.py 16000000
real 0m4.832s
user 0m4.364s
sys 0m0.452s
$ time ./listappend.py 32000000
real 0m9.737s
user 0m8.637s
sys 0m1.028s
$ time ./listappend.py 64000000
real 0m56.296s
user 0m17.797s
sys 0m3.180s
Question:
The time for 64000000 is six times the time for 32000000, but before that the times were simply doubling. Why is that?

TL;DR: RAM is insufficient, so memory is being swapped out to secondary storage.
I ran the program with different sizes on my box. Here are the results:
/usr/bin/time ./test.py 16000000
2.90user 0.26system 0:03.17elapsed 99%CPU 513480maxresident
0inputs+0outputs (0major+128715minor)pagefaults
/usr/bin/time ./test.py 32000000
6.10user 0.49system 0:06.64elapsed 99%CPU 1022664maxresident
40inputs (2major+255998minor)pagefaults
/usr/bin/time ./test.py 64000000
12.70user 0.98system 0:14.09elapsed 97%CPU 2040132maxresident
4272inputs (22major+510643minor)pagefaults
/usr/bin/time ./test.py 128000000
30.57user 23.29system 27:12.32elapsed 3%CPU 3132276maxresident
19764880inputs (389184major+4129375minor)pagefaults
User time: the time the program spent running user code.
System time: the time the program spent in the kernel (i.e., in system calls).
Elapsed time: the total wall-clock time the program took (includes waiting time).
Elapsed time = User time + System time + time spent waiting
Major page fault: occurs when a page of memory isn't in RAM and has to be fetched from a secondary device such as a hard disk.
16M list size: the list fits mostly in memory, hence no major page faults.
32M list size: parts of the list have to be swapped out of memory, hence the slight bump above an exact two-fold increase in elapsed time.
64M list size: the increase in elapsed time is more than two-fold, due to 22 major page faults.
128M list size: the elapsed time jumps from 14 seconds to over 27 minutes! The waiting time is almost 26 minutes, caused by a huge number of major page faults (389184). Also notice that CPU usage drops from 99% to 3% due to the massive waiting time.
As unutbu pointed out, the allocation overhead the Python interpreter incurs for growing lists (which can push the behaviour towards O(n*n) for huge lists) only worsens the situation.

According to effbot:
The time needed to append an item to the list is “amortized constant”;
whenever the list needs to allocate more memory, it allocates room for
a few items more than it actually needs, to avoid having to reallocate
on each call (this assumes that the memory allocator is fast; for huge
lists, the allocation overhead may push the behaviour towards O(n*n)).
(my emphasis).
As you append more items to the list, the reallocator will try to reserve ever-larger amounts of memory. Once you've consumed all your physical memory (RAM) and your OS starts using swap space, the shuffling of data from disk to RAM or vice versa will make your program very slow.
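You can watch this over-allocation directly with sys.getsizeof; a minimal sketch (the exact growth pattern is a CPython implementation detail and may vary by version):

import sys

li = []
last = sys.getsizeof(li)
for i in range(64):
    li.append(i)
    size = sys.getsizeof(li)
    if size != last:
        # the list grew its capacity: CPython reserved room for future appends
        print("len=%3d size=%d bytes" % (len(li), size))
        last = size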

I strongly suspect your Python process runs out of physical RAM available to it, and starts swapping to disk.
Re-run the last test while keeping an eye on its memory usage and/or the number of page faults.
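One way to do that from inside the script itself is the Unix-only resource module; a sketch (field names are from getrusage(2); ru_maxrss is kilobytes on Linux, bytes on macOS):

import resource
import sys

num = int(sys.argv[1])
li = []
for i in range(num):
    li.append(i)

usage = resource.getrusage(resource.RUSAGE_SELF)
# ru_majflt: page faults that required disk I/O; ru_minflt: ones that did not
print("major faults:", usage.ru_majflt)
print("minor faults:", usage.ru_minflt)
print("peak RSS:", usage.ru_maxrss)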

Related

How to get accurate process CPU and memory usage with python?

I am trying to create a process monitor, but I am unable to get accurate results compared to Windows Task Manager.
I have been using psutil, which seems to work fine for overall CPU and memory usage but doesn't appear to be very accurate for a single process. Memory usage is always higher than in Task Manager, and CPU is always random.
I am setting the process once on initialise with self.process = psutil.Process(self.pid) and then calling the method below once a second. The process in Task Manager runs at a constant 5.4% CPU usage and 130 MB RAM, however the code below produces:
CPU: 12.5375
Memory 156459008
CPU: 0.0
Memory 156459008
CPU: 0.0
Memory 156459008
CPU: 0.0
Memory 156459008
CPU: 12.5375
Memory 156459008
CPU: 0.0
Memory 156459008
CPU: 0.0
Memory 156459008
Example code:
def process_info(self):
    # I am calling this method twice because I read the first time gets ignored?
    ignore_cpu = self.process.cpu_percent(interval=None) / psutil.cpu_count()
    time.sleep(0.1)
    process_cpu = self.process.cpu_percent(interval=None) / psutil.cpu_count()
    # I also tried the below code but it was much worse than above
    # for j in range(10):
    #     if j == 0:
    #         test_list = []
    #     p_cpu = self.process.cpu_percent(interval=0.1) / psutil.cpu_count()
    #     test_list.append(p_cpu)
    # process_cpu = (sum(test_list)) / len(test_list)
    # Memory is about 25mb higher than task manager
    process_memory = self.process.memory_info().rss
    print(f"CPU: {process_cpu}")
    print(f"Memory: {process_memory}")
Am I using psutil incorrectly, or is there a more accurate way to grab the data?
I think you are misinterpreting the output of cpu_percent(). From the psutil documentation for Process.cpu_percent():
When interval is > 0.0 compares process times to system CPU times elapsed before and after the interval (blocking). When interval is 0.0 or None compares process times to system CPU times elapsed since last call, returning immediately. That means that the first time this is called it will return a meaningful 0.0 value on which subsequent calls can be based. In this case is recommended for accuracy that this function be called with at least 0.1 seconds between calls.
So, if you call it with interval=0.1, it returns the percentage of time the process was running during the last 0.1 seconds.
If you call it with interval=None, it returns the percentage of time the process was running since the last time you called it.
If you want the percentage of time the process was running over the last 0.1 seconds, call it with interval=0.1.
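A minimal sketch of that pattern for a single process (the one-second window is an arbitrary choice):

import time
import psutil

proc = psutil.Process()            # defaults to the current process; pass a pid otherwise
proc.cpu_percent(interval=None)    # first call primes the counters and returns 0.0
time.sleep(1.0)                    # the measurement window
cpu = proc.cpu_percent(interval=None) / psutil.cpu_count()  # normalize across all CPUs
rss = proc.memory_info().rss       # resident set size in bytes
print(f"CPU: {cpu:.1f}%  RSS: {rss / 1024**2:.0f} MiB")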
Like @Axeltherabbit mentioned in his comment, Task Manager groups all processes under a given name, whereas psutil reports on an individual process. psutil is correct; Task Manager over-scopes and grabs all processes of that name.
This should be more accurate, I think:
import psutil

for process in [psutil.Process(pid) for pid in psutil.pids()]:
    try:
        process_name = process.name()
        process_mem = process.memory_percent()
        process_cpu = process.cpu_percent(interval=0.5)
    except psutil.NoSuchProcess as e:
        print(e.pid, "killed before analysis")
    else:
        print("Name:", process_name)
        print("CPU%:", process_cpu)
        print("MEM%:", process_mem)

Problems with load RAM on 100%

I am conducting an experiment to load RAM to 100% on macOS. I stumbled upon the method described here: https://unix.stackexchange.com/a/99365
I decided to do the same and wrote the two programs presented below. While executing the first program, the system reports that the process takes 120 GB, but the memory usage graph is still stable. When executing the second program, a warning pops up almost immediately that the system does not have enough resources. The second program creates ten parallel processes, each of which increases memory consumption in approximately the same way.
First program:
import time

def load_ram(vm, timer):
    # build a tuple with vm * 1024**3 * 8 elements, all references to the integer 0
    x = (vm * 1024 * 1024 * 1024 * 8) * (0,)
    begin_time = time.time()
    while time.time() - begin_time < timer:
        pass
    print("end")
[Screenshot: memory occupied by the first program]
Second program:
import os

def load_ram(vm, timer):
    file_sh = open("bash_file.sh", "w")
    str_to_bash = """
VM=%d;
for i in {1..10};
do
    python -c "x=($VM*1024*1024*1024*8)*(0,); import time; time.sleep(%d)" & echo "started" $i ;
done""" % (int(vm), int(timer))
    file_sh.write(str_to_bash)
    file_sh.close()
    os.system("bash bash_file.sh")
[Screenshot: memory occupied by the second program]
[Screenshot: memory occupied by the second program, plus the system warning]
Parameters: vm = 16, timer = 30.
The first program's memory usage reaches about 128 gigabytes (after that, a kill message pops up in the terminal and the process stops). The second takes up more than 160 gigabytes, as shown in the picture, yet none of these ten processes gets killed. The warning that the system is low on resources is displayed even when each process takes up only 10 gigabytes (that is, 100 gigabytes in total).
Given this situation, two questions arise:
Why, with the same memory consumption (120 gigabytes), does the system in the first case pretend the process does not exist, while in the second case it immediately buckles under the same load?
Where does the figure of 120 gigabytes come from if my computer contains only 16 gigabytes of RAM?
Thank you for your attention!

Why does per-process overhead constantly increase for multiprocessing?

I was counting up to really high numbers in a for-loop, several times, on a 6-core CPU with 12 logical CPUs.
To speed things up I used multiprocessing. I was expecting something like:
number of processes <= number of CPUs: time identical
number of processes = number of CPUs + 1: time doubled
What I found instead was a continuous increase in time. I'm confused.
The code was:
#!/usr/bin/python
from multiprocessing import Process, Queue
import random
from timeit import default_timer as timer

def rand_val():
    num = []
    for i in range(200000000):
        num = random.random()
    print('done')

def main():
    for iii in range(15):
        processes = [Process(target=rand_val) for _ in range(iii)]
        start = timer()
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        end = timer()
        print(f'elapsed time: {end - start}')
        print('for ' + str(iii))
        print('')

if __name__ == "__main__":
    main()
    print('done')
result:
elapsed time: 14.9477102 for 1
elapsed time: 15.4961154 for 2
elapsed time: 16.9633134 for 3
elapsed time: 18.723183399999996 for 4
elapsed time: 21.568377299999995 for 5
elapsed time: 24.126758499999994 for 6
elapsed time: 29.142095499999996 for 7
elapsed time: 33.175509300000016 for 8
.
.
.
elapsed time: 44.629786800000005 for 11
elapsed time: 46.22480710000002 for 12
elapsed time: 50.44349420000003 for 13
elapsed time: 54.61919949999998 for 14
There are two wrong assumptions you make:
Processes are not free. Merely adding processes adds overhead to the program.
Processes do not own CPUs. A CPU interleaves the execution of several processes.
The first point is why you see some overhead even when there are fewer processes than CPUs. Note that your system usually has several background processes running, so the notion of "fewer processes than CPUs" is not clear-cut for a single application.
The second point is why you see the execution time increase gradually when there are more processes than CPUs. Any OS running mainline Python does preemptive multitasking of processes; roughly, this means a process does not block a CPU until it is done, but is paused regularly so that other processes can run.
In effect, this means that several processes can run on one CPU at once. Since the CPU can still only do a fixed amount of work per unit time, all processes take longer to complete.
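A quick way to see the first point, the per-process overhead, in isolation is to spawn processes that do nothing (a minimal sketch; the count of 12 is arbitrary):

from multiprocessing import Process
from timeit import default_timer as timer

def noop():
    pass

if __name__ == "__main__":
    start = timer()
    procs = [Process(target=noop) for _ in range(12)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # everything measured here is pure spawn/join overhead, no real work
    print(f"spawn+join of 12 idle processes: {timer() - start:.3f}s")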
I don't understand what you are trying to achieve.
You are taking the same work and running it X times, where X is the number of processes in your loop. You should instead take the work, divide it by X, and send one chunk to each process.
Anyway, regarding what you are observing: you are also seeing the time it takes to spawn and close the separate processes. Python isn't quick at starting new processes.
Your test is faulty.
Imagine this: it takes one day for one farmer to work 10 km² of farmland using a single tractor. If two farmers work 20 km² of farmland using two tractors, why would you expect twice the amount of farmland to take less time?
You have 6 CPU cores; your village has 6 tractors, and nobody has money to buy a private tractor. As the number of workers (processes) in the village increases, the number of tractors remains the same, so everyone has to share the limited number of tractors.
In an ideal world, two farmers working twice the amount of land using two tractors would take exactly the same time as one farmer working one portion of land, but in real computers the machine has other work to do even when it seems idle: there is task switching, the OS kernel has to run and monitor hardware devices, memory caches need to be flushed and invalidated between CPU cores, your browser needs to run, the village elder is holding a meeting to discuss who should get the tractors and when, etc.
As the number of workers grows beyond the number of tractors, the farmers don't just hog the tractors for themselves. Instead they make an arrangement to pass the tractors around every three hours or so. This means the seventh farmer doesn't have to wait two days for their share of tractor time. However, there is a cost to transferring tractors between farmlands, just as there is a cost for a CPU to switch between processes: switch too frequently and the CPU does no actual work; switch too infrequently and you get resource starvation, as some jobs take too long to start being worked on.
A more sensible test would be to keep the size of farmland constant and just increase the number of farmers. In your code, that would correspond to this change:
def rand_val(num_workers):
    num = []
    # integer division, so each worker gets 1/num_workers of the total work
    for i in range(200000000 // num_workers):
        num = random.random()
    print('done')

def main():
    for iii in range(15):
        # pass the worker count via args; lambdas can't be pickled under the spawn start method
        processes = [Process(target=rand_val, args=(iii,)) for _ in range(iii)]
        ...
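For completeness, a runnable version of this constant-total-work benchmark might look like the following (the timing scaffold is assumed from the question's code):

from multiprocessing import Process
import random
from timeit import default_timer as timer

TOTAL = 200000000  # total amount of work, split evenly among the workers

def rand_val(num_workers):
    num = []
    for i in range(TOTAL // num_workers):
        num = random.random()
    print('done')

def main():
    for iii in range(1, 15):  # start at 1: zero workers would mean no work at all
        processes = [Process(target=rand_val, args=(iii,)) for _ in range(iii)]
        start = timer()
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        print(f'elapsed time: {timer() - start} for {iii}')

if __name__ == "__main__":
    main()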

Python most accurate method to measure time (ms)

I need to measure the time certain parts of my code take. While executing my code on a powerful server, I get 10 different results.
I tried comparing times measured with time.time(), time.perf_counter(), time.perf_counter_ns(), time.process_time() and time.process_time_ns().
import time

for _ in range(10):
    start = time.perf_counter()
    i = 0
    while i < 100000:
        i = i + 1
    time.sleep(1)
    end = time.perf_counter()
    print(end - start)
I'm expecting the results of executing the same code 10 times to be the same, with a resolution of at least 1 ms, e.g. 1.041XX every time, not ranging from 1.030 s to 1.046 s.
When executing my code on a server with 16 CPUs and 32 GB of memory, I get this result:
1.045549364
1.030857833
1.0466020120000001
1.0309665050000003
1.0464690349999994
1.046397238
1.0309525370000001
1.0312070380000007
1.0307592159999999
1.046095523
I'm expecting the result to be:
1.041549364
1.041857833
1.0416020120000001
1.0419665050000003
1.0414690349999994
1.041397238
1.0419525370000001
1.0412070380000007
1.0417592159999999
1.041095523
Your expectations are wrong. If you want to measure the average time consumption of code, use the timeit module. It executes your code multiple times and averages over the times.
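A minimal timeit sketch of that approach (the statement and the repeat counts are illustrative):

import timeit

# run the counting loop 10 times, one execution per measurement
times = timeit.repeat(
    stmt="""
i = 0
while i < 100000:
    i = i + 1
""",
    repeat=10,
    number=1,
)
print(min(times))  # the minimum is least affected by scheduler noise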
The reason your code has different runtimes lies in your code:
time.sleep(1) # ensures (3.5+) _at least_ 1000ms are waited, won't be less, might be more
You are calling it in a tight loop, resulting in accumulated differences:
Quote from time.sleep(..) documentation:
Suspend execution of the calling thread for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system.
Changed in version 3.5: The function now sleeps at least secs even if the sleep is interrupted by a signal, except if the signal handler raises an exception (see PEP 475 for the rationale).
Emphasis mine.
Executing the same code does not take the same time on every loop iteration because of the system's scheduling: the system puts your process on hold to run another process, then switches back to it, and so on.

A clock with a microsecond timer

I'm trying to create a microsecond timer in Python. The goal is to have a "tick" every microsecond.
My current approach is:
import time

us = list()
while len(us) <= 1000:
    t = time.time() * 1000000
    if t.is_integer():
        us.append(t)
It shows that there are clear timing limitations I am not aware of.
The first 656 values were 1532960518213592.0: the while loop executes within the "same" microsecond.
Then the value jumps to 1532960518217613.0, a gap of 4021 µs, so the effective resolution seems to be about 4 ms.
How can I overcome those limitations?
EDIT: About these measurements.
Chrome with a YouTube video was running in the background, plus Outlook, Teams, Adobe and some other stuff.
The CPU is an i5-5200U @ 2.20 GHz (2 cores).
The problem is that the current time is functionality provided by your operating system, so it will behave differently on different systems, both in terms of the precision of the clock and in terms of how often it is polled. Also keep in mind that your program can be paused and resumed in its execution by the scheduler of your operating system.
Here is a simplified version of your code:
[time.time() * 10**6 for i in range(1000)]
On my local computer (Windows Subsystem for Linux, Ubuntu), this produces the following (notice roughly one tick per microsecond, with gaps):
[1532961190053186.0,
1532961190053189.0,
1532961190053190.0,
1532961190053191.0,
1532961190053192.0,
1532961190053193.0,
1532961190053194.0,
1532961190053195.0,
1532961190053196.0,
1532961190053198.0, ...]
On a server (Ubuntu), this produces the following (notice the same time occurring multiple times):
[1532961559708196.0,
1532961559708198.0,
1532961559708199.0,
1532961559708199.0,
1532961559708199.0,
1532961559708200.0,
1532961559708200.0,
1532961559708200.0,
1532961559708200.0,
1532961559708201.0, ...]
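One way to see what your platform's clocks can actually deliver is to ask Python directly; a sketch (time.perf_counter_ns needs Python 3.7+):

import time

# report the advertised resolution and backing implementation of the common clocks
for name in ("time", "perf_counter", "monotonic"):
    info = time.get_clock_info(name)
    print(f"{name}: resolution={info.resolution}, implementation={info.implementation}")

# perf_counter_ns is the highest-resolution counter available, in integer nanoseconds
t0 = time.perf_counter_ns()
t1 = time.perf_counter_ns()
print("back-to-back perf_counter_ns delta:", t1 - t0, "ns")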
