Why does running this script freeze my computer? - python

I wrote a script in Python using SciPy to perform a short-time Fourier transform on a signal. When I ran it on a signal with a thousand timepoints, it ran fine. When I ran it on a signal with a million timepoints, it froze my computer (computer doesn't respond, and if audio was playing, the computer outputs a skipping and looping buzz); this has consistently occurred all three times I attempted it. I've written scripts that would take hours, but I've never encountered one that actually froze my computer. Any idea why? The script is posted below:
import scipy as sp
from scipy import fftpack

def STFT(signal, seconds_per_sample, window_seconds, min_Hz):
    window_samples = int(window_seconds/seconds_per_sample) + 1
    signal_samples = len(signal)
    if signal_samples <= window_samples:
        length = max(signal_samples, int(1/(seconds_per_sample*min_Hz)) + 1)
        return sp.array([0]), fftpack.fftshift(fftpack.fftfreq(length, seconds_per_sample)), fftpack.fftshift(fftpack.fft(signal, n = length))
    else:
        length = max(window_samples, int(1/(seconds_per_sample*min_Hz)) + 1)
        frequency = fftpack.fftshift(fftpack.fftfreq(length, seconds_per_sample))
        time = []
        FTs = []
        for i in range(signal_samples - window_samples):
            time.append(seconds_per_sample*i)
            FTs.append(fftpack.fftshift(fftpack.fft(signal[i:i + window_samples], n = length)))
        return sp.array(time), frequency, sp.array(FTs)

The script consumes too much RAM when you run it over too large a number of points; see Why does a simple python script crash my system
The process your program runs in stores the arrays and variables for the calculations in process memory, which is RAM.
You can fix this by forcing the program to keep that data on the hard disk instead.
For workarounds (shelve,...) see the following links
memory usage, how to free memory
Python large variable RAM usage
I need to free up RAM by storing a Python dictionary on the hard drive, not in RAM. Is it possible?
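As a rough illustration of the shelve workaround from those links (a sketch only; the filename and per-window key scheme below are made up for the example), each windowed FFT can be written to a disk-backed shelf as soon as it is computed instead of being accumulated in an in-memory list:

import shelve
from scipy import fftpack

def STFT_to_disk(signal, seconds_per_sample, window_samples, length, path="stft_results"):
    # Write each window's spectrum to a disk-backed shelf keyed by window index,
    # so only one spectrum at a time has to live in RAM.
    with shelve.open(path) as db:
        for i in range(len(signal) - window_samples):
            spectrum = fftpack.fftshift(fftpack.fft(signal[i:i + window_samples], n=length))
            db[str(i)] = spectrum

The trade-off is speed: every window now costs a pickle and a disk write, so this only pays off when the full result genuinely does not fit in memory.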

Related

Problems with load RAM on 100%

I am conducting an experiment to load RAM to 100% on macOS. I stumbled upon the method described here: https://unix.stackexchange.com/a/99365
I decided to do the same and wrote the two programs presented below. While the first program is executing, the system reports that the process takes 120 GB, but the memory usage graph stays stable. When the second program executes, a warning pops up almost immediately that the system does not have enough resources. The second program creates ten parallel processes, each of which increases memory consumption in roughly the same way.
First program:
import time

def load_ram(vm, timer):
    x = (vm * 1024 * 1024 * 1024 * 8) * (0,)
    begin_time = time.time()
    while time.time() - begin_time < timer:
        pass
    print("end")
Memory occupied by the first program
Second program:
import os

def load_ram(vm, timer):
    file_sh = open("bash_file.sh", "w")
    str_to_bash = """
VM=%d;
for i in {1..10};
do
python -c "x=($VM*1024*1024*1024*8)*(0,); import time; time.sleep(%d)" & echo "started" $i ;
done""" % (int(vm), int(timer))
    file_sh.write(str_to_bash)
    file_sh.close()
    os.system("bash bash_file.sh")
Memory occupied by the second program
Memory occupied by the second program + system message
Parameters: vm = 16, timer = 30.
With the first program, memory usage reaches roughly 128 gigabytes (after that, a kill message pops up in the terminal and the process stops). The second takes up more than 160 gigabytes, as shown in the picture, and none of those ten processes gets terminated. The warning that the system is low on resources appears even when each process takes up only 10 gigabytes (that is, 100 gigabytes in total).
According to the situation described, two questions arise:
Why, at the same memory consumption (120 gigabytes), does the system act in the first case as if the process does not exist, while in the second case it immediately buckles under the same load?
Where does the figure of 120 gigabytes come from if my computer has only 16 gigabytes of RAM?
Thank you for your attention!

Multithreaded Python Script but just 20% CPU? How can I speed up my Python Script?

I programmed a simple SHA256 blockchain mining algorithm in Python using hashlib. What I am trying to do here is change a number (the nonce) in my data and then calculate the SHA256 hash. Whenever the hash starts with the predefined number of zeroes, the script prints the hash and its corresponding nonce to the console. The goal is to get 11 zeroes at the beginning.
import hashlib
import threading

# bn = block number, d = input data, ph = previous hash
bn = 1
d = "mydata"
ph = "0000000000000000000000000000000000000000000000000000000000000000"

class MyThread(threading.Thread):
    def __init__(self, threadID, begin, end):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.end = end
        self.begin = begin

    def run(self):
        for val in range(self.begin, self.end):
            mine(self.threadID, val)
        print("done " + str(self.threadID))

# hashing function
def mine(id, x):
    hash = hashlib.sha256((str(bn) + str(x) + str(d) + str(ph)).encode('utf-8')).hexdigest()
    if hash.find("0000000", 0, 7) > -1:
        # print(id)
        print(x)
        print(hash)

# now break it up and distribute it to the threads
# Possible SHA256 combinations: 8388608000000
pc = 8388608000000
# Number of desired threads (do more threads help more?)
nt = 704
# calculating the steps
step = round(pc / nt)
# starting the threads, with each thread calculating a different range of nonces
for x in range(1, nt):
    thread = MyThread(x, (x - 1) * step, x * step)
    thread.start()
How can I speed this thing up?
Currently, I run this on an Intel 8250u quadcore and I can get hashes with 7 zeroes in the beginning comfortably. However, 8 zeroes already take 5 hours.
Funnily, only around 20-30% of my CPU is used. How can I make it run on all cores with threads running in parallel?
Do you have another idea of how I can speed this up? I have better hardware (a Threadripper 1920) available, but I'm afraid that alone will not give me the 300x - 4000x (best case) speed increase I'd need...
My first thought was outsourcing this to the GPU, since I know Bitcoin mining evolved from CPU to GPU. When looking into Numba I saw that it does not support hashlib. I also saw that running this under PyPy might speed it up. What approach would you favour? I'm confused...
Thanks for being patient with me
Python is a bit weird in this regard. What you're doing here is multithreading and not multiprocessing. Because of the Global Interpreter Lock, these threads aren't actually running at the same time. Check out the multiprocessing module for Python if you want to actually run calculations in parallel.
EDIT: As Jared mentioned, Bitcoin mining on your personal computer is no longer profitable due to most mining now being done with specialized hardware. As a project this is cool, but I wouldn't expect it to make you money.
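A minimal sketch of what that could look like here (this is not the original code; it reuses the question's bn, d and ph values and splits the nonce range across one worker process per core with multiprocessing.Pool):

import hashlib
import multiprocessing

bn = 1
d = "mydata"
ph = "0000000000000000000000000000000000000000000000000000000000000000"

def mine_range(bounds):
    # Hash every nonce in [begin, end) and collect the ones that match.
    begin, end = bounds
    matches = []
    for x in range(begin, end):
        h = hashlib.sha256((str(bn) + str(x) + d + ph).encode('utf-8')).hexdigest()
        if h.startswith("0000000"):
            matches.append((x, h))
    return matches

if __name__ == "__main__":
    pc = 8388608000000                      # total nonce range, as in the question
    nproc = multiprocessing.cpu_count()
    step = pc // nproc
    chunks = [(i * step, (i + 1) * step) for i in range(nproc)]
    with multiprocessing.Pool(nproc) as pool:
        # Each chunk runs in its own process, so all cores do real work in parallel.
        for matches in pool.imap_unordered(mine_range, chunks):
            for x, h in matches:
                print(x, h)

Each worker gets one contiguous chunk of nonces and only returns its matches, so the processes barely communicate; on an N-core machine this should scale roughly N-fold, unlike the threaded version.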

Extremely simple Python program taking up 100% of CPU

I have this program that checks the amount of CPU being used by the current Python process.
import os
import psutil

p = psutil.Process(os.getpid())

counter = 0
while True:
    if counter % 1000 == 0:
        print(p.cpu_percent())
    counter += 1
The output is:
98.79987719751766
98.79981257674615
98.79975442677997
98.80031017770553
98.80022615662917
98.80020675841527
98.80027781367056
98.80038116157328
98.80055555555509
98.80054906013777
98.8006523704943
98.80072337402265
98.80081374321833
98.80092993219198
98.80030995738038
98.79962549234794
98.79963842975158
98.79916715088079
98.79930277598402
98.7993480085206
98.79921895171654
98.799154456851
As seen in the output, this program is taking up nearly 100% of my CPU, and I'm having a tough time understanding why. I noticed that adding a time.sleep(0.25) causes the CPU usage to drop to almost zero.
In my actual application, I have a similar while loop and can't afford a sleep inside it, since it is reading video from OpenCV and needs to stay real-time.
import cv2

cap = cv2.VideoCapture("video.mp4")
while True:
    success, frame = cap.retrieve()
This program takes the same amount of CPU as the first program I wrote, but this one decodes video!
If someone could explain a bit more that'd be awesome.
Your original loop is doing something as fast as it can. Any program that does purely CPU-bound work, with no significant blocking operations involved, will happily consume a whole CPU; it just gets more of that something done per second if the CPU is faster.
Your particular something is mostly just incrementing a value and taking its remainder modulo 1000 over and over, which is inefficient, but making it more efficient, e.g. by changing the loop to:
import os
import psutil

p = psutil.Process(os.getpid())

while True:
    for i in range(1000): pass
    print(p.cpu_percent())
removing the modulo work and doing the counting more efficiently (range runs at the C layer) would just mean you finish the 1000 iterations faster and print cpu_percent more often. It might slightly reduce CPU usage, but only because the extra printing can fill the output buffer faster than it drains for display, so your program occasionally blocks on I/O.
Point is, if you tell the computer to do something forever as fast as it can, it will. So don't. Use a sleep; even a small one (time.sleep(0.001)) would make a huge difference.
For your video capture scenario, seems like that's what you want in the first place. If the video source is producing enough output to occupy your whole CPU, so be it; if it isn't, and your code blocks when it gets ahead, that's great, but you don't want to slow processing just for the sake of lower CPU usage if it means falling behind/dropping frames.
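As a sketch of that trade-off for a file source (an assumption about the goal, not code from the question: it plays the file back at its native frame rate, sleeping only for whatever time is left in each frame's slot, so it neither busy-waits nor deliberately falls behind):

import cv2
import time

cap = cv2.VideoCapture("video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0    # fall back if the file reports no FPS
frame_interval = 1.0 / fps

while True:
    start = time.time()
    success, frame = cap.read()            # grab + decode in one call
    if not success:
        break                              # end of file or read error
    # ... process frame here ...
    remaining = frame_interval - (time.time() - start)
    if remaining > 0:
        time.sleep(remaining)              # idle instead of spinning

If processing a frame takes longer than its slot, the sleep is skipped entirely and the loop runs flat out, which is exactly the behaviour argued for above.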
I hope you are doing great!
When you do:
while True:
    if counter % 1000 == 0:
        print(p.cpu_percent())
    counter += 1
You are effectively asking your computer to work non-stop.
It will increment counter as fast as possible and print cpu_percent every time counter is a multiple of 1000.
That means your program feeds the CPU a constant stream of instructions for incrementing that counter.
When you use sleep, you are telling the OS (operating system) that your code should not execute anything new until the sleep time has elapsed, so it no longer floods the CPU with instructions.
sleep suspends execution for the given number of seconds.
Here it is better to use a sleep than the counter.
import os
import psutil
import time

p = psutil.Process(os.getpid())

counter = 0
while True:
    time.sleep(1)
    print(p.cpu_percent())
I hope it is helping.
Have a lovely day,
G
You might want to add in a time.sleep(numberOfSeconds) to your loop if you don't want it to be using 100% CPU all the time, if it's only checking for a certain condition over and over.

Python: kill the application if subprocess.check_output waits too long to receive the output

I have implemented a cache-oblivious algorithm and have shown with the PAPI library that the L1/L2/L3 misses are very low. However, I would also like to see how the algorithm behaves if I reduce the available RAM and force it to start using swap space on disk. Since the algorithm is cache-oblivious, I should expect it to scale to disk much better than non-cache-oblivious algorithms for the same problem.
The problem, however, is that it is very hard to predict how badly the algorithms will perform once they hit the disk; a small increase in the input size might dramatically change the time it takes for the algorithm to finish running. So if I have many algorithms to test and one takes forever to finish, the experiment becomes useless (I could of course sit and monitor the experiment and kill it with ctrl+c, but I really need to sleep).
Let's say the algorithms are A,B and C. I use a different python script, one for each algorithm. For varying input size n I use subprocess.check_output to call the executable of the implementation. This executable returns some statistics that I then process and store in a suitable format that I can then use with R for example to make some nice plots.
This is an example code for algorithm A:
import subprocess
import sys

f1 = open('data.stats', 'w+', 1)
min_n = 200000
max_n = 2000000
step = 200000
iterations = 10
ns = range(min_n, max_n + 1, step)
incr = 0
f1.write('n\tp\talg\ttime\n')
for n in ns:
    i = 0
    for p in ps:    # ps is defined elsewhere in the full script
        for it in range(0, iterations):
            resA = subprocess.check_output(['/usr/bin/time', '-v', './A', str(n)],
                                           stderr=subprocess.STDOUT)
            # do something with resA
            f1.write(resA.decode() + '\n')
            incr = incr + 1
            print(incr/(len(ns)*iterations)*100.0, '%', end="\r")
        i = i + 1
My question is: can I somehow kill the script if subprocess.check_output takes too long to receive an answer? The best thing for me would be to define a cut-off, say 10 minutes, so that if subprocess.check_output hasn't received anything by then, the entire script is killed.
If you're using Python 3 (and the format of your call to print suggests you might be), then check_output actually already has a timeout argument that might be useful to you: https://docs.python.org/3.6/library/subprocess.html#subprocess.check_output
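A minimal sketch of how that could look for one call (the 600-second cut-off, the example n, and the choice to abort the whole script are illustrative assumptions, not part of the original answer):

import subprocess
import sys

n = 200000    # one input size from the loop in the question

try:
    resA = subprocess.check_output(['/usr/bin/time', '-v', './A', str(n)],
                                   stderr=subprocess.STDOUT,
                                   timeout=600)    # give up after 10 minutes
except subprocess.TimeoutExpired:
    # check_output kills the child before raising, so nothing is left running;
    # abort the whole experiment here (or log the failure and continue).
    sys.exit("run exceeded the 10-minute cut-off")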

Reducing system time usage of Python Subprocess

I have a python script that uses multiprocessing's pool.map( ... ) to run a large number of calculations in parallel. Each of these calculations consists of the python script setting up input for a Fortran program, using subprocess.Popen( ..., stdin=PIPE, stdout=PIPE, stderr=PIPE ) to run the program, dumping the input to it, and reading the output. The script then parses the output, gets the needed numbers, and does it all again for the next run.
def main():
    # Read a configuration file
    # do initial setup
    pool = multiprocessing.Pool(processes=maxProc)
    runner = CalcRunner( things that are the same for each run )
    runNumsAndChis = pool.map(runner, xrange(startRunNum, endRunNum))
    # dump the data that makes it past a cut to disk

class CalcRunner(object):
    def __init__(self, stuff):
        # setup member variables
    def __call__(self, runNumber):
        # get parameters for this run
        params = self.getParams(runNumber)
        inFileLines = []
        # write the lines of the new input file to a list
        makeInputFile(inFileLines, ...)
        process = subprocess.Popen(cmdString, bufsize=0, stdin=subprocess.PIPE, ...)
        output = process.communicate("".join(inFileLines))
        # get the needed numbers from stdout
        chi2 = getChiSq(output[0])
        return [runNumber, chi2]
...
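For reference, here is a self-contained toy version of the same pattern (a sketch under assumptions: 'cat' stands in for the ECIS executable, and the returned value is a placeholder for the parsed chi-squared):

import multiprocessing
import subprocess

class CalcRunner(object):
    def __init__(self, cmd):
        self.cmd = cmd
    def __call__(self, run_number):
        # Feed one run's input to a fresh child process and read its output back.
        in_lines = ["input for run %d\n" % run_number]
        proc = subprocess.Popen(self.cmd, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate("".join(in_lines).encode())
        return [run_number, len(out)]    # stand-in for the parsed chi-squared

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    runner = CalcRunner(["cat"])         # 'cat' stands in for the ECIS executable
    print(pool.map(runner, range(8)))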
Anyways, on to the reason for the question. I submit this script to a grid engine system to break this huge parameter-space sweep into 1000 twelve-core tasks (I chose 12 cores since most of the grid machines have 12). When a single task runs on a single 12-core machine, about 1/3 of the machine's time is spent doing system work and the other 2/3 doing the user calculations, presumably setting up inputs to ECIS (the aforementioned FORTRAN code), running ECIS, and parsing its output. However, sometimes 5 tasks get sent to a 64-core machine to use 60 of its cores. On that machine 40% of the time is spent doing system work and only 1-2% doing user work.
First of all, where are all the system calls coming from? I tried writing a version of the program that launches ECIS once per thread and keeps piping new input to it, and it spends FAR more time in system (and is slower overall), so it doesn't seem to be due to all the process creation and deletion.
Second of all, how do I go about decreasing the amount of time spent on system calls?
At a guess, the open-the-process-once-and-keep-sending-input approach was slower because I had to turn off gfortran's output buffering to get anything back from the process; nothing else worked (short of modifying the Fortran code... which isn't happening).
The OS on my home test machines where I developed this is Fedora 14. The OS on the grid machines is a recent version of Red Hat.
I have tried playing around with bufsize, setting it to -1 (system default), 0 (unbuffered), 1 (line buffered), and 64 KB; none of that seems to change things.
