Python memory error when trying to build simple neural net

The below piece of code gives me a memory error when I run it. It is part of a simple neural net I'm building. Keep in mind that all the math is written out for learning purposes:
learning_rate = 0.2
costs = []
for i in range(50000):
    ri = np.random.randint(len(data))
    point = data[ri]
    ## print (point)
    z = point[0] * w1 + point[1] * w2 + b
    pred = sigmoid(z)
    target = point[2]
    cost = np.square(pred - target)
    costs.append(cost)
    dcost_pred = 2 * (pred - target)
    dpred_dz = sigmoid_p(z)
    dz_dw1 = point[0]
    dz_dw2 = point[1]
    dz_db = 1
    dcost_dz = dcost_pred * dpred_dz
    dcost_dw1 = dcost_pred * dpred_dz * dz_dw1
    dcost_dw2 = dcost_pred * dpred_dz
    dcost_db = dcost_dz * dz_db
    w1 = w1 - learning_rate * dcost_dw1
    w2 = w2 - learning_rate * dcost_dw2
    b = b - learning_rate * dcost_db
    plt.plot(costs)
    if i % 100 == 0:
        for j in range(len(data)):
            cost_sum = 0
            point = data[ri]
            z = point[0] * w1 + point[1] * w2 + b
            pred = sigmoid(z)
            target = point[2]
            cost_sum += np.square(pred - target)
        costs.append(cost_sum / len(data))
When the program gets to this part it results in the following error:
Traceback (most recent call last):
File "D:\First Neual Net.py", line 89, in <module>
plt.plot(costs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\pyplot.py", line 3261, in plot
ret = ax.plot(*args, **kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\__init__.py", line 1717, in inner
return func(ax, *args, **kwargs)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\axes\_axes.py", line 1373, in plot
self.add_line(line)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\axes\_base.py", line 1779, in add_line
self._update_line_limits(line)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\axes\_base.py", line 1801, in _update_line_limits
path = line.get_path()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\lines.py", line 957, in get_path
self.recache()
File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\lines.py", line 667, in recache
self._xy = np.column_stack(np.broadcast_arrays(x, y)).astype(float)
File "C:\Program Files (x86)\Python36-32\lib\site-packages\numpy\lib\shape_base.py", line 353, in column_stack
return _nx.concatenate(arrays, 1)
MemoryError
Are there any ways to make this code more efficient? Maybe using generators?

Bugs
You probably meant to move the cost_sum = 0 out of the loop!
Memory Error
You're attempting to plot tens of thousands of points (and re-plotting them every iteration), and matplotlib certainly doesn't take kindly to that, so you might want to reduce the number of points you plot. It appears you tried to do this in the loop at the bottom of the code?
Efficiency
Let's address the efficiency part of your question. Now, I can't comment on how to make the algorithm itself more efficient, but I thought I'd offer my wisdom on making Python code run faster.
I'll start off by saying this: the Python compiler (that sits behind the interpreter) does next to no optimisation, so the common optimisations that compilers usually apply themselves are left to us here.
Common Subexpression Elimination
In your code you make use of stuff like this:
dcost_dz = dcost_pred*dpred_dz
dcost_dw1 = dcost_pred*dpred_dz*dz_dw1
dcost_dw2 = dcost_pred*dpred_dz
This is really inefficient! We are recomputing dcost_pred*dpred_dz three times! It would be much better to just make use of the variable we've already assigned to:
dcost_dz = dcost_pred*dpred_dz
dcost_dw1 = dcost_dz*dz_dw1
dcost_dw2 = dcost_dz
To put this into perspective, this is 4 instructions fewer on each iteration of the loop (14 vs 10).
In a similar vein, you recompute point[0] and point[1] twice per iteration; why not unpack with x, y, target = data[ri] instead?
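As an illustration (x, y and target are my names; the layout is assumed from your use of point[0], point[1] and point[2]):
x, y, target = data[ri]   # unpack once instead of indexing point repeatedly
z = x * w1 + y * w2 + b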
Loop Hoisting
You also do a bit of recomputation during your iterations of the loop. For instance, every iteration you find the size of the data up to 3 different times and this doesn't change. So compute it before the loop:
for i in range(50000):
    ri = np.random.randint(len(data))
becomes
datasz = len(data)
for i in range(50000):
    ri = np.random.randint(datasz)
Also, that np.random.randint is costing you 3 instructions to even access, without even thinking about the call. If you were being really performance sensitive, you'd move it out of the loop too:
datasz = len(data)
randint = np.random.randint
for i in range(50000):
    ri = randint(datasz)
Strength Reduction
Sometimes, some operations are faster than others. For instance,
b = b - learning_rate*dcost_db
uses a BINARY_SUBTRACT instruction, but
b -= learning_rate*dcost_db
uses an INPLACE_SUBTRACT instruction. It's likely the in-place instruction is a bit faster, so it's something to consider (but you'd need to test out that theory).
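If you want to test that, the dis module will show which instruction each spelling compiles to (the two helper functions below are purely illustrative):
import dis

def sub_binary(b, rate, grad):
    b = b - rate * grad      # compiles to BINARY_SUBTRACT on CPython 3.6
    return b

def sub_inplace(b, rate, grad):
    b -= rate * grad         # compiles to INPLACE_SUBTRACT on CPython 3.6
    return b

dis.dis(sub_binary)
dis.dis(sub_inplace)
From there, a quick timeit comparison would tell you whether the difference actually matters for your loop.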
Plot Every Iteration
As stated in the comments (nicely spotted @HFBrowning), you are plotting all the points every iteration, which is going to absolutely cripple performance!
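A minimal sketch of the fix, assuming you only need to see the curve once training is done:
# Inside the training loop: append to costs, but don't call plt.plot().
# After the loop finishes, plot the whole history once:
plt.plot(costs)
plt.show()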
List append
Since you know the size of the list you are inserting into (50500 or something?) you can allocate the list to be that size with something like: costs = [0] * 50500, this saves a lot of time reallocating the size of the list when it gets full. You'd stop using append and start assigning to an index. Bear in mind though, this will cause weirdness in your plot unless you only plot once after the loop has finished!
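A sketch of what that looks like (50000 here assumes you only store the per-iteration costs; adjust the length if you also store the averaged ones):
n_iters = 50000
costs = [0.0] * n_iters          # preallocate once instead of growing with append
for i in range(n_iters):
    cost = 0.0                   # placeholder for the real per-iteration cost
    costs[i] = cost              # assign by index instead of costs.append(cost)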

Related

How can I prevent parallel python from quitting with "OSError: [Errno 35] Resource temporarily unavailable"?

Context
I'm trying to run 1000 simulations that involve (1) damaging a road network and then (2) measuring the traffic delays due to the damage. Both steps (1) and (2) involve creating multiple "maps". In step (1), I create 30 damage maps. In step (2), I measure the traffic delay for each of those 30 damage maps. The function then returns the average traffic delay over the 30 damage maps, and proceeds to run the next simulation. The pseudocode for the setup looks like this:
for i in range(0, 1000):  # for each simulation
    create 30 damage maps using parallel python
    measure the traffic delay of each damage map using parallel python
    compute the average traffic delay for simulation i
Since the maps are independent of each other, I've been using the parallel python package at each step.
Problem -- Error Message
The code has twice thrown the following error around the 72nd simulation (of 1000) and stopped running during step (1), which involves damaging the bridges.
An error has occurred during the function execution
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/ppworker.py", line 90, in run
__result = __f(*__args)
File "<string>", line 4, in compute_damage
File "<string>", line 3, in damage_bridges
File "/Library/Python/2.7/site-packages/scipy/stats/__init__.py", line 345, in <module>
from .stats import *
File "/Library/Python/2.7/site-packages/scipy/stats/stats.py", line 171, in <module>
from . import distributions
File "/Library/Python/2.7/site-packages/scipy/stats/distributions.py", line 10, in <module>
from ._distn_infrastructure import (entropy, rv_discrete, rv_continuous,
File "/Library/Python/2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 16, in <module>
from scipy.misc import doccer
File "/Library/Python/2.7/site-packages/scipy/misc/__init__.py", line 68, in <module>
from scipy.interpolate._pade import pade as _pade
File "/Library/Python/2.7/site-packages/scipy/interpolate/__init__.py", line 175, in <module>
from .interpolate import *
File "/Library/Python/2.7/site-packages/scipy/interpolate/interpolate.py", line 32, in <module>
from .interpnd import _ndim_coords_from_arrays
File "interpnd.pyx", line 1, in init scipy.interpolate.interpnd
File "/Library/Python/2.7/site-packages/scipy/spatial/__init__.py", line 95, in <module>
from .ckdtree import *
File "ckdtree.pyx", line 31, in init scipy.spatial.ckdtree
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/__init__.py", line 123, in cpu_count
with os.popen(comm) as p:
OSError: [Errno 35] Resource temporarily unavailable
Systems and versions
I'm running Python 2.7 in a PyCharm virtual environment with parallel python (pp) 1.6.5. My computer runs Mac OS High Sierra 10.13.3 with 8 GB of 1867 MHz DDR3 memory.
Attempted fixes
I gather that the problem is with the parallel python package or how I've used it, but am otherwise at a loss to understand how to fix this. It's been noted as a bug on the parallel python page -- wkerzendorf posted there:
Q: I get a Socket Error/Memory Error when using jobs that use os.system calls
A: The fix I found is using subprocess.Popen and piping the stdout and stderr into subprocess.PIPE. Here is an example: subprocess.Popen(['ls -rtl'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True). That fixed the error for me.
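In code, that suggestion amounts to something like the following (the command here is just a placeholder):
import subprocess

# Instead of: os.system('ls -rtl')
proc = subprocess.Popen('ls -rtl',
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        shell=True)
out, err = proc.communicate()   # drain the pipes so the child process can exit cleanly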
However, I wasn't at all sure where to make this modification.
I've also read that the problem may be with my system limits, per this Ghost in the Machines blog post. However, when I tried to reconfigure the max number of files and max user processes, I got the following message in terminal:
Could not set resource limits: 1: Operation not permitted
Code using parallel python
The code I'm working with is rather complicated (requires multiple input files to run) so I'm afraid I can't provide a minimal, reproducible example here. You can download and run a version of the code at this link.
Below I've included the code for step (1), in which I use parallel python to create the 30 damage maps. The number of workers is 4.
ppservers = ()    #starting a super cool parallelization
# Creates jobserver with automatically detected number of workers
job_server = pp.Server(ppservers=ppservers)
print "Starting pp with", job_server.get_ncpus(), "workers"

# set up jobs
jobs = []
for i in targets:
    jobs.append(job_server.submit(compute_damage, (lnsas[i%len(lnsas)], napa_dict, targets[i], i%sets, U[i%sets][:] ), modules = ('random', 'math', ), depfuncs = (damage_bridges, )))

# get the results that have already run
bridge_array_new = []
bridge_array_internal = []
indices_array = []
bridge_array_hwy_num = []
for job in jobs:
    (index, damaged_bridges_internal, damaged_bridges_new, num_damaged_bridges_road) = job()
    bridge_array_internal.append(damaged_bridges_internal)
    bridge_array_new.append(damaged_bridges_new)
    indices_array.append(index)
    bridge_array_hwy_num.append(num_damaged_bridges_road)
Additional functions
The compute_damage function looks like this.
def compute_damage(scenario, master_dict, index, scenario_index, U):
    '''goes from ground-motion intensity map to damage map '''
    #figure out component damage for each ground-motion intensity map
    damaged_bridges_internal, damaged_bridges_new, num_damaged_bridges = damage_bridges(scenario, master_dict, scenario_index, U)  #e.g., [1, 89, 598] #num_bridges_out is highway bridges only
    return index, damaged_bridges_internal, damaged_bridges_new, num_damaged_bridges
The damage_bridges function looks like this.
def damage_bridges(scenario, master_dict, scenario_index, u):
    '''This function damages bridges based on the ground shaking values (demand) and the structural capacity (capacity). It returns two lists (could be empty) with damaged bridges (same thing, just different bridge numbering)'''
    from scipy.stats import norm
    damaged_bridges_new = []
    damaged_bridges_internal = []
    #first, highway bridges and overpasses
    beta = 0.6  #you may want to change this by removing this line and making it a dictionary lookup value 3 lines below
    i = 0  # counter for bridge index
    for site in master_dict.keys():  #1-1889 in Matlab indices (start at 1)
        lnSa = scenario[master_dict[site]['new_id'] - 1]
        prob_at_least_ext = norm.cdf((1/float(beta)) * (lnSa - math.log(master_dict[site]['ext_lnSa'])), 0, 1)  #if you want to do the moderate damage state instead of the extensive damage state as we did here, then just change the key name here (see master_dict description)
        #U = random.uniform(0, 1)
        if u[i] <= prob_at_least_ext:
            damaged_bridges_new.append(master_dict[site]['new_id'])  #1-1743
            damaged_bridges_internal.append(site)  #1-1889
        i += 1  # increment bridge index
    # GB ADDITION -- to use with master_dict = napa_dict, since napa_dict only has 40 bridges
    num_damaged_bridges = sum([1 for i in damaged_bridges_new if i <= 1743])
    return damaged_bridges_internal, damaged_bridges_new, num_damaged_bridges
It seems like the issue was that I had neglected to destroy the servers created in steps (1) and (2) -- a simple fix! I simply added job_server.destroy() at the end of each step. I'm currently running the simulations and have reached 250 of the 1000 without incident.
To be completely clear, the code for step (1) is now:
ppservers = ()    #starting a super cool parallelization
# Creates jobserver with automatically detected number of workers
job_server = pp.Server(ppservers=ppservers)

# set up jobs
jobs = []
for i in targets:
    jobs.append(job_server.submit(compute_damage, (lnsas[i%len(lnsas)], napa_dict, targets[i], i%sets, U[i%sets][:] ), modules = ('random', 'math', ), depfuncs = (damage_bridges, )))

# get the results that have already run
bridge_array_new = []
bridge_array_internal = []
indices_array = []
bridge_array_hwy_num = []
for job in jobs:
    (index, damaged_bridges_internal, damaged_bridges_new, num_damaged_bridges_road) = job()
    bridge_array_internal.append(damaged_bridges_internal)
    bridge_array_new.append(damaged_bridges_new)
    indices_array.append(index)
    bridge_array_hwy_num.append(num_damaged_bridges_road)

job_server.destroy()

Issue with the performance of multiprocessing method against numpy.loadtxt()

I usually use numpy.loadtxt(filename) when I want to load files from disk. Recently, I got a node with 36 processors, so I thought I'd use a multiprocessing approach to load files, such that each processor loads a portion of the file and eventually the root processor gathers the portions. I expect the files to be loaded to always be huge (at least 5 GB), so a multiprocessing approach like this seems reasonable.
To do so, I wrote the following method that simply loads any given file from disk using multiple processors. I come from the C world, so I found that the mpi4py library satisfies what I need. Note that jobs is an integer indicating the number of jobs in the file. Each job is a binary value written on a line of the file.
def load_dataset(COM, jobs, rank, size, filepath):
    start = time.time()
    C = None
    r = None
    rank_indices = ()
    job_batch = jobs / size
    for i in range((rank * job_batch), ((rank + 1) * job_batch)):
        rank_indices = rank_indices + (i,)
    C1 = []
    with open(filepath) as fd:
        for n, line in enumerate(fd):
            if n in rank_indices:
                s = line.splitlines()
                W = [int(n) for n in s[0].split()]
                W = np.asarray(W, np.int8)
                C1.append(W)
    C1 = np.asarray(C1)
    gather_C = COM.gather(C1, root=0)
    COM.Barrier()
    if rank == 0:
        print('\t\t>> Rank 0 is now gathering from other processors!! Wait please!')
        C = np.asarray(list(itertools.chain(*gather_C)), dtype=np.int8)
        end = time.time()
        print('Loading time= %s' % (end - start))
    del C1, gather_C
    return C
However, it turns out that numpy.loadtxt(filename) is actually faster than my method, which is surprising! I think I have a bug in my code, so I am sharing it in the hope that someone can spot whatever is causing the performance issue. All ideas and hints are appreciated.
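To make the block selection clearer: every line of the file is tested with if n in rank_indices, which scans a tuple of jobs/size entries per line. The same selection written as a contiguous range check looks roughly like the sketch below (load_block is just an illustrative name; I'm not claiming this resolves the overall slowdown):
import numpy as np

def load_block(filepath, rank, size, jobs):
    """Sketch: read only this rank's contiguous block of lines.
    Assumes jobs is divisible by size, as in the original job_batch logic."""
    job_batch = jobs // size
    lo, hi = rank * job_batch, (rank + 1) * job_batch
    rows = []
    with open(filepath) as fd:
        for n, line in enumerate(fd):
            if n >= hi:
                break                      # past this rank's block, stop reading
            if n >= lo:
                rows.append([int(v) for v in line.split()])
    return np.asarray(rows, dtype=np.int8)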

PyCall Error When Calling Python from Julia

I'm playing around with Julia and I'm using SymPy, which I think uses PyCall to call Python.
When I run the script below, I get a long error. It's too long to post all of it here, but here is the start of it:
LoadError: PyError (ccall(#pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr,
PyPtr), o, arg, C_NULL)) <type 'exceptions.RuntimeError'>
RuntimeError('maximum recursion depth exceeded while calling a Python object',)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\cache.py", line 93, in wrapper
retval = cfunc(*args, **kwargs)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\compatibility.py", line 809, in wrapper
result = user_function(*args, **kwds)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\function.py", line 427, in __new__
result = super(Function, cls).__new__(cls, *args, **options)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\cache.py", line 93, in wrapper
retval = cfunc(*args, **kwargs)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\compatibility.py", line 809, in wrapper
result = user_function(*args, **kwds)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\function.py", line 250, in __new__
evaluated = cls.eval(*args)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\functions\elementary\integers.py", line 25, in eval
if arg.is_imaginary or (S.ImaginaryUnit*arg).is_real:
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\decorators.py", line 91, in __sympifyit_wrapper
return func(a, b)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\decorators.py", line 132, in binary_op_wrapper
return func(self, other)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\expr.py", line 140, in __mul__
return Mul(self, other)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\cache.py", line 93, in wrapper
retval = cfunc(*args, **kwargs)
File "d:\Users\OEM\AppData\Local\JuliaPro-0.6.0.1\pkgs-0.6.0.1\v0.6\Conda\deps\usr\lib\site-packages\sympy\core\compatibility.py", line 809, in wrapper
result = user_function(*args, **kwds)
And as you may be able to see, towards the end it repeats: see line 93 near the end, then line 140, then line 93 again...
Here is my code:
function oddPeriodSquareRoots()
    #=
    Get the length of the continued fraction for the square root of the number i.
    E.g. √7=[2;(1,1,1,4)]
    =#
    irrationalNumber, intPart, fractionalPart = symbols(string("irrationalNumber intPart fractionalPart"))
    for i in [6451]
        # For perfect squares, the period is 0
        irrationalNumber = BigFloat(sqrt(BigFloat(i)))
        if irrationalNumber == floor(irrationalNumber)
            continue
        end
        # Get the continued fraction using symbolic programming
        irrationalNumber = sqrt(Sym(i))
        continuedFractionLength = 0
        while true
            intPart = Sym(BigInt(floor(irrationalNumber)))
            if continuedFractionLength == 0
                firstContinuedFractionTimes2 = intPart*2
            end
            continuedFractionLength += 1
            if intPart == firstContinuedFractionTimes2
                break
            end
            fractionalPart = irrationalNumber - intPart
            irrationalNumber = 1 / fractionalPart
        end
        continuedFractionLength -= 1 # We ignore the first term.
    end
    return continuedFractionLength
end
This routine calculates the length of a continued fraction for the square root of some number. For the number 6451 it gives the error.
So my question is: can this be resolved?
I'm glad the recursionlimit solution was found. This hadn't been seen before. This comment is about how to streamline your SymPy code, as you seem to be confused about that. Basically, you just need to make your initial value symbolic, and then Julia's methods should (in most all cases) take care of the rest. Here is a slight rewrite:
using SymPy
using PyCall
@pyimport sys
sys.setrecursionlimit(10000)

"""
Get the length of the continued fraction for the square root of the number i.
E.g. √7=[2;(1,1,1,4)]
"""
function oddPeriodSquareRoots(n)
    i = Sym(n)
    # For perfect squares, the period is 0
    continuedFractionLength = 0
    irrationalNumber = sqrt(i)
    if is_integer(irrationalNumber)
        return continuedFractionLength
    end
    # Get the continued fraction using symbolic programming
    while true
        intPart = floor(irrationalNumber)
        if continuedFractionLength == 0
            firstContinuedFractionTimes2 = intPart*2
        end
        continuedFractionLength += 1
        if intPart == firstContinuedFractionTimes2
            break
        end
        fractionalPart = irrationalNumber - intPart
        irrationalNumber = 1 / fractionalPart
    end
    continuedFractionLength -= 1 # We ignore the first term.
    return continuedFractionLength
end
Thanks very much for everyone's input. I managed to solve this by putting these lines at the top of the file (in addition to the "Using Sympy" line which I had the whole time):
using SymPy
using PyCall
@pyimport sys
sys.setrecursionlimit(10000)
This sets the recursion limit in Python. Not sure why it has to be so large for this to work.
I did also remove some of my type conversions etc. I thought this might help with the error and/or speed. But it didn't really.
Also, removing the line where I declare the variables to be symbols doesn't stop the code from working.
irrationalNumber, intPart, fractionalPart = symbols(string("irrationalNumber intPart fractionalPart"))
Same in Python. So not sure what the point of it is.
But in Julia, either way I have to have that Sym() wrapper around these 2 lines:
irrationalNumber = sqrt(Sym(i))
...
intPart = Sym(floor(irrationalNumber))
By inspecting these types, with the use of typeof, I can see they are symbolic, not floats. Without them, everything turns into floats and so I'm not doing it symbolically.

Determining Memory Consumption with Python Multiprocessing and Shared Arrays

I've spent some time researching the Python multiprocessing module, its use of os.fork, and shared memory using multiprocessing's Array for a piece of code I'm writing.
The project itself boils down to this: I have several MxN arrays (let's suppose I have 3 arrays called A, B, and C) that I need to process to calculate a new MxN array (called D), where:
Dij = f(Aij, Bij, Cij)
The function f is such that standard vector operations cannot be applied. This task is, I believe, what's called "embarrassingly parallel". Given the overhead involved in multiprocessing, I am going to break the calculation of D into blocks. For example, if D were 8x8 and I had 4 processes, each processor would be responsible for solving a 4x4 "chunk" of D.
Now, the size of the arrays has the potential to be very big (on the order of several GB), so I want all arrays to use shared memory (even array D, which will have sub-processes writing to it). I believe I have a solution to the shared array issue using a modified version of what is presented here.
However, from an implementation perspective it'd be nice to place arrays A, B, and C into a dictionary. What is unclear to me is if doing this will cause the arrays to be copied in memory when the reference counter for the dictionary is incremented within each sub-process.
To try and answer this, I wrote a little test script (see below) and tried running it using valgrind --tool=massif to track memory usage. However, I am not quite clear how to interpret the results from it. Specifically, whether each massif.out file (where the number of files is equal to the number of sub-processes created by my test script + 1) denotes the memory used by that process (i.e. I need to sum them all up to get the total memory usage) or if I just need to consider the massif.out associated with the parent process.
On a side note: one of my shared memory arrays has the sub-processes writing to it. I know that this should be avoided, especially since I am not using locks to ensure only one sub-process writes to the array at any given time. Is this a problem? My thought is that since the order in which the array is filled is irrelevant, the calculation of any index is independent of any other index, and any given sub-process will never write to the same array index as any other process, there will not be any sort of race condition. Is this correct?
#! /usr/bin/env python
import multiprocessing as mp
import ctypes
import numpy as np
import time
import sys
import timeit


def shared_array(shape=None, lock=False):
    """
    Form a shared memory numpy array.
    https://stackoverflow.com/questions/5549190/is-shared-readonly-data-copied-to-different-processes-for-python-multiprocessing
    """
    shared_array_base = mp.Array(ctypes.c_double, shape[0]*shape[1], lock=lock)
    # Create a locked or unlocked array
    if lock:
        shared_array = np.frombuffer(shared_array_base.get_obj())
    else:
        shared_array = np.frombuffer(shared_array_base)
    shared_array = shared_array.reshape(*shape)
    return shared_array


def worker(indices=None, queue=None, data=None):
    # Loop over each indice and "crush" some data
    for i in indices:
        time.sleep(0.01)
        if data is not None:
            data['sink'][i, :] = data['source'][i, :] + i
        # Place ID for completed indice into the queue
        queue.put(i)


if __name__ == '__main__':
    # Set the start time
    begin = timeit.default_timer()

    # Size of arrays (m x n)
    m = 1000
    n = 1000

    # Number of Processors
    N = 2

    # Create a queue to use for tracking progress
    queue = mp.Queue()

    # Create dictionary and shared arrays
    data = dict()
    # Form a shared array without a lock.
    data['source'] = shared_array(shape=(m, n), lock=True)
    data['sink'] = shared_array(shape=(m, n), lock=False)

    # Create a list of the indices associated with the m direction
    indices = range(0, m)

    # Parse the indices list into range blocks; each process will get a block
    indices_blocks = [int(i) for i in np.linspace(0, 1000, N+1)]

    # Initialize a list for storing created sub-processes
    procs = []

    # Print initialization time-stamp
    print 'Time to initialize time: {}'.format(timeit.default_timer() - begin)

    # Create and start each sub-process
    for i in range(1, N+1):
        # Start of the block
        start = indices_blocks[i-1]
        # End of the block
        end = indices_blocks[i]
        # Create the sub-process
        procs.append(mp.Process(target=worker,
                                args=(indices[start:end], queue, data)))
        # Kill the sub-process if/when the parent is killed
        procs[-1].daemon = True
        # Start the sub-process
        procs[-1].start()

    # Initialize a list to store the indices that have been processed
    completed = []

    # Enter a loop dependent on whether any of the sub-processes are still alive
    while any(i.is_alive() for i in procs):
        # Read the queue, append completed indices, and print the progress
        while not queue.empty():
            done = queue.get()
            if done not in completed:
                completed.append(done)
            message = "\rCompleted {:.2%}".format(float(len(completed))/len(indices))
            sys.stdout.write(message)
            sys.stdout.flush()
    print ''

    # Join all the sub-processes
    for p in procs:
        p.join()

    # Print the run time and the modified sink array
    print 'Running time: {}'.format(timeit.default_timer() - begin)
    print data['sink']
Edit: It seems I've run into another issue; specifically, a value of n equal to 3 million results in the kernel killing the process (I assume it's due to a memory issue). This appears to be a problem with how shared_array() works (I can create np.zeros arrays of the same size without an issue). After playing with it a bit, I get the traceback shown below. I'm not entirely sure what is causing the memory allocation error, but a quick Google search turns up discussions about how mmap maps virtual address space, which I'm guessing is smaller than the amount of physical memory a machine has?
Traceback (most recent call last):
File "./shared_array.py", line 66, in <module>
data['source'] = shared_array(shape=(m, n), lock=True)
File "./shared_array.py", line 17, in shared_array
shared_array_base = mp.Array(ctypes.c_double, shape[0]*shape[1], lock=lock)
File "/usr/apps/python/lib/python2.7/multiprocessing/__init__.py", line 260, in Array
return Array(typecode_or_type, size_or_initializer, **kwds)
File "/usr/apps/python/lib/python2.7/multiprocessing/sharedctypes.py", line 120, in Array
obj = RawArray(typecode_or_type, size_or_initializer)
File "/usr/apps/python/lib/python2.7/multiprocessing/sharedctypes.py", line 88, in RawArray
obj = _new_value(type_)
File "/usr/apps/python/lib/python2.7/multiprocessing/sharedctypes.py", line 68, in _new_value
wrapper = heap.BufferWrapper(size)
File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 243, in __init__
block = BufferWrapper._heap.malloc(size)
File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 223, in malloc
(arena, start, stop) = self._malloc(size)
File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 120, in _malloc
arena = Arena(length)
File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 82, in __init__
self.buffer = mmap.mmap(-1, size)
mmap.error: [Errno 12] Cannot allocate memory
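As a rough sanity check (my own arithmetic, assuming m stays at 1000 while n is raised to 3 million), a single shared array of c_double at that size already needs on the order of 22 GiB, which is far beyond what most machines can map:
# Back-of-the-envelope size of one shared array of c_double
# (assuming the script's m = 1000 with n raised to 3,000,000).
m, n = 1000, 3 * 10**6
size_gib = m * n * 8 / float(2**30)             # 8 bytes per double
print 'Approximate size: %.1f GiB' % size_gib   # ~22.4 GiB per array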

Memory Error with Multiprocessing in Python

I'm trying to perform some costly scientific calculations with Python. I have to read a bunch of data stored in csv files and process them. Since each process takes a long time and I have 8 processors to use, I was trying to use the Pool method from multiprocessing.
This is how I structured the multiprocessing call:
pool = Pool()

vector_components = []
for sample in range(samples):
    vector_field_x_i = vector_field_samples_x[sample]
    vector_field_y_i = vector_field_samples_y[sample]

    vector_component = pool.apply_async(vector_field_decomposer, args=(x_dim, y_dim, x_steps, y_steps,
                                                                       vector_field_x_i, vector_field_y_i))
    vector_components.append(vector_component)

pool.close()
pool.join()

vector_components = map(lambda k: k.get(), vector_components)

for vector_component in vector_components:
    CsvH.write_vector_field(vector_component, '../CSV/RotationalFree/rotational_free_x_'+str(sample)+'.csv')
I was running a data set of 500 samples of size equal to 100 (x_dim) by 100 (y_dim).
Until then everything worked fine.
Then I received a data set of 500 samples of 400 x 400.
When running it, I get an error when calling get().
I also tried to run a single sample of 400 x 400 and got the same error.
Traceback (most recent call last):
File "__init__.py", line 33, in <module>
VfD.samples_vector_field_decomposer(samples, x_dim, y_dim, x_steps, y_steps, vector_field_samples_x, vector_field_samples_y)
File "/export/home/pceccon/VectorFieldDecomposer/Sources/Controllers/VectorFieldDecomposerController.py", line 43, in samples_vector_field_decomposer
vector_components = map(lambda k: k.get(), vector_components)
File "/export/home/pceccon/VectorFieldDecomposer/Sources/Controllers/VectorFieldDecomposerController.py", line 43, in <lambda>
vector_components = map(lambda k: k.get(), vector_components)
File "/export/home/pceccon/.pyenv/versions/2.7.5/lib/python2.7/multiprocessing/pool.py", line 554, in get
raise self._value
MemoryError
What should I do?
Thank you in advance.
Right now you're keeping several lists in memory - vector_field_x, vector_field_y, vector_components, and then a separate copy of vector_components during the map call (which is when you actually run out of memory). You can avoid needing either copy of the vector_components list by using pool.imap, instead of pool.apply_async along with a manually created list. imap returns an iterator instead of a complete list, so you never have all the results in memory.
Normally, pool.map breaks the iterable passed to it into chunks, and sends those chunks to the child processes, rather than sending one element at a time. This helps improve performance. Because imap uses an iterator instead of a list, it doesn't know the complete size of the iterable you're passing to it. Without knowing the size of the iterable, it doesn't know how big to make each chunk, so it defaults to a chunksize of 1, which will work, but may not perform all that well. To avoid this, you can provide it with a good chunksize argument, since you know the iterable is samples elements long. It may not make much difference for your 500 element list, but it's worth experimenting with.
Here's some sample code that demonstrates all this:
import multiprocessing
from functools import partial

def vector_field_decomposer(x_dim, y_dim, x_steps, y_steps, vector_fields):
    vector_field_x_i = vector_fields[0]
    vector_field_y_i = vector_fields[1]
    # Do whatever is normally done here.

if __name__ == "__main__":
    num_workers = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(num_workers)

    # Calculate a good chunksize (based on the implementation of pool.map)
    chunksize, extra = divmod(samples, 4 * num_workers)
    if extra:
        chunksize += 1

    # Use partial so many arguments can be passed to vector_field_decomposer
    func = partial(vector_field_decomposer, x_dim, y_dim, x_steps, y_steps)

    # We use a generator expression as an iterable, so we don't create a full list.
    results = pool.imap(func,
                        ((vector_field_samples_x[s], vector_field_samples_y[s]) for s in xrange(samples)),
                        chunksize=chunksize)
    for sample, vector_component in enumerate(results):
        CsvH.write_vector_field(vector_component,
                                '../CSV/RotationalFree/rotational_free_x_' + str(sample) + '.csv')
    pool.close()
    pool.join()
This should allow you to avoid the MemoryError issues, but if not, you could try running imap over smaller chunks of your total sample, and just do multiple passes. I don't think you'll have any issues though, because you're not building any additional lists, other than the vector_field_* lists you start with.
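If you do end up needing multiple passes, a rough sketch of the batching (reusing the names from the snippet above; batch_size is a made-up knob) would be:
# Sketch: process the samples in smaller batches so only one batch of
# results is ever in flight at a time.
batch_size = 100                              # hypothetical value; tune to taste
for start in xrange(0, samples, batch_size):
    batch = xrange(start, min(start + batch_size, samples))
    results = pool.imap(func,
                        ((vector_field_samples_x[s], vector_field_samples_y[s]) for s in batch),
                        chunksize=chunksize)
    for s, vector_component in zip(batch, results):
        CsvH.write_vector_field(vector_component,
                                '../CSV/RotationalFree/rotational_free_x_' + str(s) + '.csv')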
