So I am trying to solve a linear programming problem with around 10,000 binary variables using the PULP python library. It's taking me a lot of time to solve the problem.
I was wondering if there is anyway for the code to use GPUs available in Colab to solve these linear programming issues.
GPUs have little or no advantage for general large, sparse LP and MIP models. Apart from some academic exercises on highly structured problems, there are few or no solvers available that use GPUs. The underlying problem is that GPUs are really good for data-parallel problems (SIMD architecture). Large, sparse LPs are different.
Related
I am interested in the performance of Pyomo to generate an OR model with a huge number of constraints and variables (about 10e6). I am currently using GAMS to launch the optimizations but I would like to use the different python features and therefore use Pyomo to generate the model.
I made some tests and apparently when I write a model, the python methods used to define the constraints are called each time the constraint is instanciated. Before going further in my implementation, I would like to know if there exists a way to create directly a block of constraints based on numpy array data ? From my point of view, constructing constraints by block may be more efficient for large models.
Do you think it is possible to obtain performance comparable to GAMS or other AML languages with pyomo or other python modelling library ?
Thanks in advance for your help !
While you can use NumPy data when creating Pyomo constraints, you cannot currently create blocks of constraints in a single NumPy-style command with Pyomo. Fow what it's worth, I don't believe that you can in languages like AMPL or GAMS, either. While Pyomo may eventually support users defining constraints using matrix and vector operations, it is not likely that that interface would avoid generating the individual constraints, as the solver interfaces (e.g., NL, LP, MPS files) are all "flat" representations that explicit represent individual constraints. This is because Pyomo needs to explicitly generate representations of the algebra (i.e., the expressions) to send out to the solvers. In contrast, NumPy only has to calculate the result: it gets its efficiency by creating the data in a C/C++ backend (i.e., not in Python), relying on low-level BLAS operations to compute the results efficiently, and only bringing the result back to Python.
As far as performance and scalability goes, I have generated raw models with over 13e6 variables and 21e6 constraints. That said, Pyomo was designed for flexibility and extensibility over speed. Runtimes in Pyomo can be an order of magnitude slower than AMPL when using cPython (although that can shrink to within a factor of 4 or 5 using pypy). At least historically, AMPL has been faster than GAMS, so the gap between Pyomo and GAMS should be smaller.
I was also wondering the same when I came across this piece of code from Jonas Hörsch and Tom Brown and it was very useful to me:
https://github.com/FRESNA/PyPSA/blob/master/pypsa/opt.py
They define classes to define constraints more efficiently than the original Pyomo parser do. I did some tests on a large model that I have and it reduced the generation time considerably.
You can build big linear (LP) and mix-integer (MILP) optimization problems in Python with the open-source tool Linopy. Linopy promises a speedup of times 4-6 and a memory reduction of roughly 50% reaching roughly Julia JUMP performance. See the benchmark:
The tool is part of the PyPSA ecosystem. This tool is the next-level version of the PyPSA 'opt.py' developments that Jon Cardodo mentioned. It has roughly the same speed, same performance but better usability -- reported by developers.
I am new to machine learning and Python. I am trying to understand when to use the functions in sklearn.linear_model (linearregression and logisticregression) and when to implement my own code for the same. Any suggestions or references will be highly appreciated.
regards
Souvik
I recommend you use as much as possible the functions given by sklearn or another ML library (I like TensorFlow). That's because it's very difficult to get the performance of any library. They are calculating in a low level of the operating system, meanwhile the common users don't implement the computational actions outside the python environment.
Moreover, python by itself is not very efficient in relation to the data structures. For example,a simple array is implemented as a LinkedList. The ML libraries use Numpy to their computations to get a better performance.
MATLAB has a nice function gpuArray(); for exploiting the GPU to perform operations at lightning speed (especially if a good graphics card is installed). For neural networks, this is necessary, as input matrices are huge and many complex operations are performed.
In matlab this is simply done by this piece of code,
G = gpuArray(ones(100, 'uint32'));
Is there something similar in Python? That is, a library which is open-source and as easy to use as MATLAB's GPU lib.
Thanks in advance!
I think you"ll find what you're looking for in PyCUDA:
http://documen.tician.de/pycuda/array.html
sorry if this all seem nooby and unclear, but I'm currently learning Netlogo to model agent-based collective behavior and would love to hear some advice on alternative software choices. My main thing is that I'd very much like to take advantage of PyCuda since, from what I understand, it enables parallel computation. However, does that mean I still have to write the numerical script in some other environment and implement the visuals in yet another one???
If so, my questions are:
What numerical package should I use? PyEvolve, DEAP, or something else? It appears that PyEvolve is no longer being developed and DEAP is just a wrapper on the outdated(?) EAP.
Graphic-wise, I find mayavi2 and vtk promising. The problem is, none of the numerical package seems to bind to these readily. Is there no better alternative than to save the numerical output to datafile and feed them into, say, mayavi2?
Another option is to generate the data via Netlogo and feed them into a graphing package from (2). Is there any disadvantage to doing this?
Thank you so much for shedding light on this confusion.
You almost certainly do not want to use CUDA unless you are running into a significant performance problem. In general CUDA is best used for solving floating point linear algebra problems. If you are looking for a framework built around parallel computations, I'd look towards OpenCL which can take advantage of GPUs if needed..
In terms of visualization, I'd strongly suggest targeting a a specific data interchange format and then letting some other program do that rendering for you. The only reason I'd use something like VTK is if for some reason you need more control over the visualization process or you are looking for a real time solution.
Probably the best choice for visualization would be to use an intermediate format and do it in another program. But for performance, i'd rather configure a JVM for a cluster and run NetLogo on it. I've not tried it yet but i'm thinking seriously to try NetLogo on a Beowulf style cluster.
BTW, there is an ABM platform called Repast that is said to have Python interface if you're planning to implement your code in Python.
long-time R and Python user here. I use R for my daily data analysis and Python for tasks heavier on text processing and shell-scripting. I am working with increasingly large data sets, and these files are often in binary or text files when I get them. The type of things I do normally is to apply statistical/machine learning algorithms and create statistical graphics in most cases. I use R with SQLite sometimes and write C for iteration-intensive tasks; before looking into Hadoop, I am considering investing some time in NumPy/Scipy because I've heard it has better memory management [and the transition to Numpy/Scipy for one with my background seems not that big] - I wonder if anyone has experience using the two and could comment on the improvements in this area, and if there are idioms in Numpy that deal with this issue. (I'm also aware of Rpy2 but wondering if Numpy/Scipy can handle most of my needs). Thanks -
R's strength when looking for an environment to do machine learning and statistics is most certainly the diversity of its libraries. To my knowledge, SciPy + SciKits cannot be a replacement for CRAN.
Regarding memory usage, R is using a pass-by-value paradigm while Python is using pass-by-reference. Pass-by-value can lead to more "intuitive" code, pass-by-reference can help optimize memory usage. Numpy also allows to have "views" on arrays (kind of subarrays without a copy being made).
Regarding speed, pure Python is faster than pure R for accessing individual elements in an array, but this advantage disappears when dealing with numpy arrays (benchmark). Fortunately, Cython lets one get serious speed improvements easily.
If working with Big Data, I find the support for storage-based arrays better with Python (HDF5).
I am not sure you should ditch one for the other but rpy2 can help you explore your options about a possible transition (arrays can be shuttled between R and Numpy without a copy being made).
I use NumPy daily and R nearly so.
For heavy number crunching, i prefer NumPy to R by a large margin (including R packages, like 'Matrix') I find the syntax cleaner, the function set larger, and computation is quicker (although i don't find R slow by any means). NumPy's Broadcasting functionality for instance, i do not think has an analog in R.
For instance, to read in a data set from a csv file and 'normalize' it for input to an ML algorithm (e.g., mean center then re-scale each dimension) requires just this:
data = NP.loadtxt(data1, delimiter=",") # 'data' is a NumPy array
data -= NP.mean(data, axis=0)
data /= NP.max(data, axis=0)
Also, i find that when coding ML algorithms, i need data structures that i can operate on element-wise and that also understand linear algebra (e.g., matrix multiplication, transpose, etc.). NumPy gets this and allows you to create these hybrid structures easily (no operator overloading or subclassing, etc.).
You won't be disappointed by NumPy/SciPy, more likely you'll be amazed.
So, a few recommendations--in general and in particular, given the facts in your question:
install both NumPy and Scipy. As a rough guide, NumPy provides the
core data structures (in particular
the ndarray) and SciPy (which is
actually several times larger than
NumPy) provides the domain-specific
functions (e.g., statistics, signal
processing, integration).
install the repository versions, particularly w/r/t NumPy because the
dev version is 2.0. Matplotlib and NumPy are tightly integrated, you can use one without the other of course, but both are the best in their respective class among python libraries. You can get all three via easy_install, which i assume you already.
NumPy/SciPy have several modules
specifically directed to Machine
Learning/Statistics, including the Clustering package and the Statistics package.
As well as packages directed to
general computation, but which are
make coding ML algorithms a lot
faster, in particular,
Optimization and Linear Algebra.
There are also the SciKits, not included in the base NumPy or
SciPy libraries; you need to install them separately.
Generally speaking, each SciKit is a
set of convenience wrappers to
streamline coding in a given domain. The SciKits you are likely to find most relevant are: ann (approximate Nearest Neighbor), and learn (a set of ML/Statistics regression and classification algorithms, e.g., Logistic Regression, Multi-Layer Perceptron, Support Vector Machine).