GEKKO "Memory allocation failed" - python

I'm trying to use GEKKO to solve quite a large optimization problem locally (with remote=False).
When running the code, I get the error:
Error: At line 463 of file custom.f90
Traceback: not available, compile with -ftrace=frame or -ftrace=full
Operating system error: Not enough memory resources are available to process this command.
Memory allocation failed
That suggests the operating system isn't letting GEKKO allocate enough memory.
However, I'm using a machine with 32 GB of RAM, nearly 25 GB of which is free, and the model probably doesn't even need 10 GB.
I've tried setting m.options.MAX_MEMORY = 10, but this doesn't seem to matter.
Any thoughts on how to allow it to allocate more memory?
Here is some (simplified) code that triggers this error:
from gekko import GEKKO
quantiles = [(x+1)*.01 for x in range(300)]
#Initialize Model
m = GEKKO(remote=False)
#Set global options
m.options.IMODE = 3 #steady state optimization
m.options.SOLVER=3
m.options.MAX_ITER=100000
m.options.MAX_MEMORY = 10
m.options.REDUCE=10
#initialize variables
Est_array = m.Array(m.Var,(2, 16),value=1,lb=0,ub=48)
P_ij_t = m.Array(m.Var,(4, 16, 300), lb=0, ub=1)
Exp_ij_t = m.Array(m.Var,(4, 16, 300),value=1,lb=-36,ub=36)
C_t = m.Array(m.Var,300,lb=0,ub=5)
#Equations
for h in range(16):
    for q in range(300):
        m.Equation(m.sum([P_ij_t[i,h,q] for i in range(3)]) == 1)
for (q,t) in enumerate(quantiles):
    m.Equation(C_t[q] == (m.sum([P_ij_t[i+2,h,q]*(Est_array[i,h]-t)**2 for i in range(2) for h in range(16)]) +
                          m.sum([P_ij_t[i,h,q]*(Est_array[1-i,15-h]-t)**2 for i in range(2) for h in range(16)])))
#Objective
m.Minimize(C_t[0])
#Solve simulation
#m.open_folder()
m.solve()
#Results
print('C = ' + str(C_t[0].value[0]))
(All of the m.options.* parameters are settings I tried in order to get the solver to run, but none of them seem to help with the memory allocation problem.)

With remote=False in Gekko v1.0.2, the Windows binary is 32-bit, while the Linux, MacOS, and ARM Linux binaries are 64-bit executables. With remote=True, the problem is solved on a Linux server that has 64 GB of RAM and runs a 64-bit executable. You are hitting the memory limit of the 32-bit Windows executable, which can address at most 4 GB of RAM regardless of how much the machine has. The 64-bit executables have a theoretical capacity of 16 billion GB (effectively no limit). A 64-bit local Windows executable is planned for a future release. For those who need to solve large problems on a local network from a Windows Gekko client, a Linux VM or an APM Linux server (such as a host at IP 10.0.0.10) are options:
m = GEKKO(remote=True, server='https://10.0.0.10')
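Alternatively, since the default public server also runs the 64-bit Linux executable (see the Remote Solve section below), the simplest workaround while staying on Windows is just:

from gekko import GEKKO
# remote=True (the default) sends the model to the public 64-bit Linux
# server, avoiding the 4 GB limit of the local 32-bit Windows executable
m = GEKKO(remote=True)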

Related

Step by step guide on how to run Gekko optimization locally

I am new to programming and my first Python project is nonlinear programming. I am using the Gekko Optimization Suite and I have everything running properly, but need guidance on how exactly to run it locally. Below is the explanation and code provided by the documentation, but I could use some help on how exactly to do it myself and what exactly it all means. Please act as if you are explaining to a small child or a golden retriever.
The run directory m.path contains the model file gk0_model.apm and
other files required to run the optimization problem either remotely
(default) or locally (m=GEKKO(remote=False)). Use m.open_folder() to
open the run directory. The run directory also contains diagnostic
files such as infeasibilities.txt that is produced if the solver fails
to find a solution. The default run directory can be changed:
from gekko import GEKKO
import numpy as np
import os
# create and change run directory
rd=r'.\RunDir'
if not os.path.isdir(os.path.abspath(rd)):
os.mkdir(os.path.abspath(rd))
m = GEKKO(remote=False) # solve locally
m.path = os.path.abspath(rd) # change run directory
Local Solve
m=GEKKO(remote=False)
The only option needed to run Gekko locally is remote=False. With remote=False, no Internet connection is required. There is no need to change the run directory. A default directory at m.path is created in the temporary folder to store the files that are compiled to byte code. This folder can be accessed with m.open_folder().
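A minimal check of the run directory, using only the calls described above:

from gekko import GEKKO
m = GEKKO(remote=False)
print(m.path)    # temporary run directory holding gk0_model.apm and solver files
# m.open_folder()  # optionally open the run directory in the system file browser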
Local Solve with Intranet Server
m=GEKKO(remote=True,server='http://10.0.0.10')
There is an APMonitor server option (see Windows APMonitor Server or Linux APMonitor Server) for remote=True and server=http://10.0.0.10 (change to IP of local Intranet server). This is used as a local compute engine that runs control and optimization problems for microprocessors. This is useful for compute architectures that do not have sufficient memory or CPU power to solve the optimization problems, but when it is desirable to solve locally. This is an Edge compute option to complete the solution in the required cycle time (e.g. for a Model Predictive Controller). Some organizations use this option to have multiple clients that connect to one compute engine. This compute server can be upgraded so that all of the clients automatically use the updated version.
Remote Solve
m=GEKKO(remote=True)
A public server is available as a default server option. With remote=True (default option), Gekko sends the optimization problem to a remote server and then returns the solution. The public server is running a Linux APMonitor Server but with additional solver options that aren't in the local options due to distribution constraints.
Example Gekko and Scipy Optimize Minimize Solutions
GEKKO is a Python package for machine learning and optimization of mixed-integer and differential algebraic equations (see documentation). It is coupled with large-scale solvers for linear, quadratic, nonlinear, and mixed integer programming (LP, QP, NLP, MILP, MINLP). Modes of operation include parameter regression, data reconciliation, real-time optimization, dynamic simulation, and nonlinear predictive control. GEKKO is an object-oriented Python library to facilitate local execution of APMonitor. Below is a simple optimization example with remote=False (local solution mode). There are local options for MacOS, Windows, Linux, and Linux ARM. Other architectures need to use the remote=True option.
Python Gekko
from gekko import GEKKO
m = GEKKO(remote=False)
x = m.Array(m.Var,4,value=1,lb=1,ub=5)
x1,x2,x3,x4 = x
# change initial values
x2.value = 5; x3.value = 5
m.Equation(x1*x2*x3*x4>=25)
m.Equation(x1**2+x2**2+x3**2+x4**2==40)
m.Minimize(x1*x4*(x1+x2+x3)+x3)
m.solve()
print('x: ', x)
print('Objective: ',m.options.OBJFCNVAL)
Scipy Optimize Minimize
import numpy as np
from scipy.optimize import minimize
def objective(x):
    return x[0]*x[3]*(x[0]+x[1]+x[2])+x[2]
def constraint1(x):
    return x[0]*x[1]*x[2]*x[3]-25.0
def constraint2(x):
    sum_eq = 40.0
    for i in range(4):
        sum_eq = sum_eq - x[i]**2
    return sum_eq
# initial guesses
n = 4
x0 = np.zeros(n)
x0[0] = 1.0
x0[1] = 5.0
x0[2] = 5.0
x0[3] = 1.0
# show initial objective
print('Initial Objective: ' + str(objective(x0)))
# optimize
b = (1.0,5.0)
bnds = (b, b, b, b)
con1 = {'type': 'ineq', 'fun': constraint1}
con2 = {'type': 'eq', 'fun': constraint2}
cons = ([con1,con2])
solution = minimize(objective,x0,method='SLSQP',\
bounds=bnds,constraints=cons)
x = solution.x
# show final objective
print('Final Objective: ' + str(objective(x)))
# print solution
print('Solution')
print('x1 = ' + str(x[0]))
print('x2 = ' + str(x[1]))
print('x3 = ' + str(x[2]))
print('x4 = ' + str(x[3]))
Additional Examples
There are many additional optimization packages in Python and additional Gekko tutorials and benchmark problems. One additional example is a Mixed Integer Linear Programming solution.
from gekko import GEKKO
m = GEKKO(remote=False)
x,y = m.Array(m.Var,2,integer=True,lb=0)
m.Maximize(y)
m.Equations([-x+y<=1,
3*x+2*y<=12,
2*x+3*y<=12])
m.options.SOLVER = 1
m.solve()
print('Objective: ', -m.options.OBJFCNVAL)
print('x: ', x.value[0])
print('y: ', y.value[0])
The APOPT solver is a Mixed Integer Nonlinear Programming (MINLP) solver (that also solves MILP problems) and is included as a local solver for MacOS, Linux, and Windows.
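As a small illustration (a hypothetical toy problem, not from the documentation), APOPT can also handle an integer variable combined with a nonlinear constraint:

from gekko import GEKKO
m = GEKKO(remote=False)
x = m.Var(value=1,lb=0,ub=10)               # continuous variable
y = m.Var(value=1,lb=0,ub=10,integer=True)  # integer variable
m.Equation(x**2 + y**2 >= 4)                # nonlinear constraint
m.Minimize(x + 2*y)
m.options.SOLVER = 1  # select APOPT to handle the integer variable
m.solve()
print('x:',x.value[0],'y:',y.value[0])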

Conda Numba Cuda: libNVVM cannot be found

My development environment is Ubuntu 18.04.5 LTS with Python 3.6, and I have installed numba and cudatoolkit via conda. The GPU is an Nvidia GeForce GTX 1050 Ti, which is supported by CUDA.
The installation of conda and numba seems to work as intended, as I can import numba within python3.6 scripts.
The problem seems identical situation to the question asked here: Cuda: library nvvm not found
but none of the proposed solutions seem to work in my case, and I'm not sure how to highlight my situation properly (I can't do it through an answer in the other thread...). If raising a duplicate of the question is inappropriate, then guide me to proper conduct.
When I try to run the code below I get the following error: numba.cuda.cudadrv.error.NvvmSupportError: libNVVM cannot be found. Do conda install cudatoolkit: library nvvm not found
from numba import cuda, float32
#Controls threads per block and shared memory usage.
#The computation will be done on blocks of TPBxTPB elements.
TPB = 16
@cuda.jit
def fast_matmul(A, B, C):
    # Define an array in the shared memory
    # The size and type of the arrays must be known at compile time
    sA = cuda.shared.array(shape=(TPB, TPB), dtype=float32)
    sB = cuda.shared.array(shape=(TPB, TPB), dtype=float32)
    x, y = cuda.grid(2)
    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    bpg = cuda.gridDim.x    # blocks per grid
    if x >= C.shape[0] and y >= C.shape[1]:
        # Quit if (x, y) is outside of valid C boundary
        return
    # Each thread computes one element in the result matrix.
    # The dot product is chunked into dot products of TPB-long vectors.
    tmp = 0.
    for i in range(bpg):
        # Preload data into shared memory
        sA[tx, ty] = A[x, ty + i * TPB]
        sB[tx, ty] = B[tx + i * TPB, y]
        # Wait until all threads finish preloading
        cuda.syncthreads()
        # Computes partial product on the shared memory
        for j in range(TPB):
            tmp += sA[tx, j] * sB[j, ty]
        # Wait until all threads finish computing
        cuda.syncthreads()
    C[x, y] = tmp

import numpy as np
matrix_A = np.array([[0.1,0.2],[0.1,0.2]])
Doing as suggested and running conda install cudatoolkit does not work. I have tried many variations on this install that I've found online to no avail.
In the other post a solution that seems to have worked for many is to add lines about environment variables in the .bashrc file in the home directory. The suggestions however refer to files that exist in the /usr directory, where I have no cuda data since I've installed through conda. I have tried many variations on these exports without success. This is perhaps where the solution lies, but if so then the solution would benefit from being generalized.
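For reference, here is a small helper snippet (standard library only; sys.prefix points at the active conda environment) that I can run to look for the NVVM library wherever conda put it:

import glob, os, sys
# search the active conda environment for libnvvm, the conda analogue of
# the /usr/local/cuda paths mentioned in the linked answers
hits = glob.glob(os.path.join(sys.prefix, '**', '*nvvm*'), recursive=True)
print('\n'.join(hits))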
Does anyone have any up-to-date or generalized solutions to this problem?
EDIT: adding information from terminal outputs (thanks for the hint of editing the question to do so)
> conda list numba
# packages in environment at /home/tobka/anaconda3:
#
# Name Version Build Channel
numba 0.51.2 py38h0573a6f_1
> conda list cudatoolkit
# packages in environment at /home/tobka/anaconda3:
#
# Name Version Build Channel
cudatoolkit 11.0.221 h6bb024c_0
Also adding output from numba -s: https://pastebin.com/raw/6u1MUkxg
Idea of possible cause (not yet confirmed): I noticed in the numba -s output that it reports Python Version: 3.8.3, whereas I've been explicitly using python3.6 in the terminal, since plain python has historically meant python2.7 on my system. I checked, however, and my system now runs Python 3.8.3 with the python command and Python 3.6.9 with python3.6. And when running the code using python I get a different error instead, which is a good sign: raise ValueError(missing_launch_config_msg) (a kernel launch configuration error rather than a missing-library one).
I will try to fix the remaining errors and confirm that the code works, after which I will report back here.
Confirmation of solution: using python instead of python3.6 in the terminal solved the problem. The root cause was the user.
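For anyone hitting the same thing, a quick diagnostic to confirm which interpreter and numba installation a given command resolves to:

import sys
print(sys.executable, sys.version)  # which Python this command actually runs
import numba
print(numba.__version__)            # which numba that interpreter sees
from numba import cuda
print(cuda.is_available())          # True if the CUDA driver/toolkit are found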

Python: Scipy very slow on Windows10 compared to MacOS

I tried to execute a simple Python program on my MacBook Pro 15 on its two partitions: macOS Mojave and Windows 10.
I use the spsolve function to solve a sparse linear system on some matrices, and I see that the same code with the same matrix is much slower on Windows than on macOS.
For example:
matrix 1 -> MacOs: 29 sec / Windows: 377 sec
On macOS, when I perform these calculations, the processor runs at full speed and I can hear the fan spinning hard.
On Windows this does not happen; the processor stays at 20%.
I use 64-bit Python 3 on both systems.
from scipy import array, linalg, dot
import scipy.io as sio
import numpy as np
import time
from scipy.sparse.linalg import spsolve
matrix_names = ['cfd1']
for matrice in matrix_names:
    mat = sio.loadmat('/matrix_path/%s' % matrice)
    A = mat['Problem']['A']
    A = A[0][0]
    matrix_size = np.shape(A)[0]
    xe = np.ones(matrix_size)
    b = A * xe
    start = time.time()
    X = spsolve(A, b)
    end = time.time()
    print("Times %.6f sec" % (end-start))
The slow function is
X = spsolve(A, b)
I found the problem.
On Windows, the default SciPy build is not linked against the MKL library.
I'm not sure whether the macOS build integrates it, but using Anaconda on Windows (which ships SciPy built against MKL) the Python file runs as fast as on macOS.
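A quick way to verify which BLAS/LAPACK backend an installation was built against (MKL-based builds list mkl libraries in the output):

import numpy as np
import scipy
np.show_config()     # look for 'mkl' in the listed libraries
scipy.show_config()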

Anaconda package for cufft keeping arrays in gpu memory between fft / ifft calls

I am using the Anaconda suite with IPython 3.6.1 and their Accelerate package. There is a cufft sub-package with two functions, fft and ifft. As far as I understand, these take a numpy array and output to a numpy array, both in system RAM; all GPU memory and transfers between system and GPU memory are handled automatically, and GPU memory is released when the function returns. This all seems very nice and works for me. However, I would like to run multiple fft/ifft calls on the same array, and each time extract just one number from the array. It would be nice to keep the array in GPU memory to minimize system <-> GPU transfers. Am I correct that this is not possible using this package? If so, is there another package that would do the same? I have noticed the reikna project, but that doesn't seem to be available in Anaconda.
The thing I am doing (and would like to do efficiently on gpu) is in short shown here using numpy.fft
import math as m
import numpy as np
import numpy.fft as dft
nr = 100
nh = 2**16
h = np.random.rand(nh)*1j
H = np.zeros(nh,dtype='complex64')
h[10] = 1
r = np.zeros(nr,dtype='complex64')
fftscale = m.sqrt(nh)
corr = 0.12j
for i in np.arange(nr):
    r[i] = h[10]
    H = dft.fft(h,nh)/fftscale
    h = dft.ifft(h*corr)*fftscale
r[nr-1] = h[10]
print(r)
Thanks in advance!
So I found ArrayFire, which seems rather easy to work with.
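For anyone curious, here is an untested sketch of the loop above using the arrayfire-python bindings, where h stays resident in GPU memory between fft/ifft calls (verify the exact API names against the ArrayFire Python documentation):

import math
import arrayfire as af
nr = 100
nh = 2**16
fftscale = math.sqrt(nh)
corr = 0.12j
h = af.constant(0, nh, dtype=af.Dtype.c32)  # complex array allocated on the GPU
h[10] = 1
r = []
for i in range(nr):
    r.append(h[10].scalar())            # copy a single element back to the host
    H = af.fft(h) / fftscale            # transform runs on the GPU
    h = af.ifft(h * corr) * fftscale    # array stays on the GPU
print(r)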

How to configure Theano on Windows?

I have installed Theano on a Windows machine and followed the configuration instructions.
I placed the following .theanorc.txt file in C:\Users\my_username folder:
#!sh
[global]
device = gpu
floatX = float32
[nvcc]
fastmath = True
# flags=-m32 # we have this hard coded for now
[blas]
ldflags =
# ldflags = -lopenblas # placeholder for openblas support
I tried to run the test, but haven't managed to run it on the GPU. I guess the values from .theanorc.txt are not being read, because I added the line print config.device and it outputs "cpu".
Below is the basic test script and the output:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
print config.device
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'
output:
pydev debugger: starting (pid: 9564)
cpu
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 10.0310001373 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the cpu
I have installed CUDA Toolkit successfully but haven't managed to install pyCUDA. I guess Theano should work without pyCUDA installed anyway.
I would be very thankful if anyone could help out solving this problem. I have followed these instructions but don't know why the configuration values in the program don't match the values in .theanorc.txt file.
Contrary to what has been said on a couple of pages, my installation (Windows 10, Python 2.7, Theano 0.10.0.dev1) would not interpret config instructions within a .theanorc.txt file in my user profile folder, but would read a .theanorc file.
If you are having trouble creating a file with that style of name, use the following commands at a terminal:
cd %USERPROFILE%
type NUL > .theanorc
Sauce: http://ankivil.com/making-theano-faster-with-cudnn-and-cnmem-on-windows-10/
You are right that Theano does not need PyCUDA.
It is strange that Theano does not read your configuration file. The exact path that gets read is this. Just run this in Python and you'll see where to put it:
import os
print(os.path.expanduser('~/.theanorc.txt'))
Try changing the content of .theanorc.txt as indicated on the Theano website (http://deeplearning.net/software/theano/install_windows.html). The paths need to be adjusted to match your installation.
[global]
floatX = float32
device = gpu
[nvcc]
flags=-LC:\Users\cchan\Anaconda3\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin
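If Theano still ignores the configuration file, the same settings can be passed through the THEANO_FLAGS environment variable, which takes precedence over the file; for example (set the variable before theano is imported):

import os
os.environ['THEANO_FLAGS'] = 'device=gpu,floatX=float32'
import theano
print(theano.config.device)  # should report 'gpu' if the flags were applied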
