MATLAB has a nice function, gpuArray(), for exploiting the GPU to perform operations at lightning speed (especially if a good graphics card is installed). For neural networks this is necessary, as the input matrices are huge and many complex operations are performed.
In MATLAB this is done with a single line of code:
G = gpuArray(ones(100, 'uint32'));
Is there something similar in Python? That is, a library which is open-source and as easy to use as MATLAB's GPU lib.
Thanks in advance!
I think you'll find what you're looking for in PyCUDA:
http://documen.tician.de/pycuda/array.html
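As a rough sketch of what the MATLAB line from the question might look like with PyCUDA's gpuarray module: this assumes PyCUDA and a CUDA-capable device are available, and the NumPy fallback is only an illustration so that the snippet still runs on a machine without a GPU. Note that ones(100, 'uint32') in MATLAB creates a 100x100 matrix, so the shape is written out explicitly here.

```python
import numpy as np

try:
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.gpuarray as gpuarray
    HAVE_GPU = True
except Exception:
    # PyCUDA is not installed, or no usable CUDA device was found.
    HAVE_GPU = False

# Host-side array, equivalent to MATLAB's ones(100, 'uint32').
host = np.ones((100, 100), dtype=np.uint32)

if HAVE_GPU:
    G = gpuarray.to_gpu(host)   # copy the array to device memory
    doubled = (G + G).get()     # element-wise add on the GPU, copy result back
else:
    doubled = host + host       # same operation on the CPU

print(doubled.shape, doubled.dtype)
```

The explicit to_gpu() and get() calls mark where data crosses between host and device memory; keeping those transfers rare usually matters more for speed than the arithmetic itself.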
I am trying to solve a linear programming problem with around 10,000 binary variables using the PuLP Python library. It is taking a long time to solve.
I was wondering if there is any way for the code to use the GPUs available in Colab to solve these linear programming problems.
GPUs have little or no advantage for general large, sparse LP and MIP models. Apart from some academic exercises on highly structured problems, there are few or no solvers available that use GPUs. The underlying problem is that GPUs are really good for data-parallel problems (SIMD architecture). Large, sparse LPs are different.
I'm trying to perform an evaluation of total floating-point operations (FLOPs) of a neural network.
My problem is the following. I'm using a sigmoid function. My question is how to evaluate the FLOPs of the exponential function. I'm using TensorFlow, which relies on NumPy for the exp function.
I tried to dig into the NumPy code but didn't find the implementation. I saw some threads here about fast implementations of the exponential, but they don't really help.
My guess is that it uses a Taylor series or a Chebyshev approximation.
Do you have any clue about this? And if so, an estimate of the number of FLOPs? I also looked for references on Google but found nothing really standardized.
Thank you a lot for your answers.
I looked into it for a bit, and what I found is that NumPy indeed uses the C implementation, as seen here.
TensorFlow, though, doesn't use the NumPy implementation; instead it uses the scalar_logistic_op function from the C++ library Eigen. The source for that can be found here.
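Since the cost of exp depends on the implementation, one way to keep an estimate honest is to treat that cost as a parameter. Below is a minimal sketch for one fully connected layer with a sigmoid activation. The function names are hypothetical, and the default exp_flops=10 is a placeholder assumption, not a measured figure for Eigen or NumPy.

```python
def dense_flops(n_in, n_out):
    """FLOPs for y = W x + b.

    Each output needs n_in multiplies, n_in - 1 adds for the dot product,
    and 1 add for the bias, i.e. 2 * n_in operations.
    """
    return 2 * n_in * n_out


def sigmoid_flops(n_units, exp_flops):
    """FLOPs for 1 / (1 + exp(-x)) applied to n_units values.

    Counted per unit as: negate (1) + exp (exp_flops) + add (1) + divide (1).
    Whether negation and division count as one FLOP each is a convention.
    """
    return n_units * (exp_flops + 3)


def layer_flops(n_in, n_out, exp_flops=10):
    """Total FLOPs for a dense layer followed by a sigmoid."""
    return dense_flops(n_in, n_out) + sigmoid_flops(n_out, exp_flops)


# Example: 784 inputs, 100 sigmoid units, assumed 10 FLOPs per exp.
print(layer_flops(784, 100))
```

For layers of this size the matrix product dominates, so the total is not very sensitive to the exact value assumed for exp.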
Sorry if this all seems nooby and unclear, but I'm currently learning NetLogo to model agent-based collective behavior and would love to hear some advice on alternative software choices. My main thing is that I'd very much like to take advantage of PyCUDA since, from what I understand, it enables parallel computation. However, does that mean I still have to write the numerical script in some other environment and implement the visuals in yet another one?
If so, my questions are:
1. What numerical package should I use? PyEvolve, DEAP, or something else? It appears that PyEvolve is no longer being developed and DEAP is just a wrapper on the outdated(?) EAP.
2. Graphics-wise, I find mayavi2 and VTK promising. The problem is that none of the numerical packages seems to bind to these readily. Is there no better alternative than to save the numerical output to a data file and feed it into, say, mayavi2?
3. Another option is to generate the data via NetLogo and feed it into a graphing package from (2). Is there any disadvantage to doing this?
Thank you so much for shedding light on this confusion.
You almost certainly do not want to use CUDA unless you are running into a significant performance problem. In general, CUDA is best used for solving floating-point linear algebra problems. If you are looking for a framework built around parallel computation, I'd look towards OpenCL, which can take advantage of GPUs if needed.
In terms of visualization, I'd strongly suggest targeting a specific data interchange format and then letting some other program do the rendering for you. The only reason I'd use something like VTK is if you need more control over the visualization process or you are looking for a real-time solution.
Probably the best choice for visualization would be to use an intermediate format and do it in another program. But for performance, I'd rather configure a JVM for a cluster and run NetLogo on it. I've not tried it yet, but I'm seriously thinking of trying NetLogo on a Beowulf-style cluster.
BTW, there is an ABM platform called Repast that is said to have a Python interface, if you're planning to implement your code in Python.
I want to simulate a propagating wave with absorption and reflection on some bodies in three dimensional space. I want to do it with python. Should I use numpy? Are there some special libraries I should use?
How can I simulate the wave? Can I use the wave equation? But what if I have a reflection?
Is there a better method? Should I do it with vectors (rays)? But when the rays diverge, the intensity gets lower, which makes that difficult.
Thanks in advance.
If you do any computationally intensive numerical simulation in Python, you should definitely use NumPy.
The most general algorithm to simulate an electromagnetic wave in arbitrarily-shaped materials is the finite-difference time domain method (FDTD). It solves the wave equation, one time-step at a time, on a 3-D lattice. It is quite complicated to program yourself, though, and you are probably better off using a dedicated package such as Meep.
There are books on how to write your own FDTD simulations: here's one, here's a document with some code for 1-D FDTD and explanations on more than 1 dimension, and Googling "writing FDTD" will find you more of the same.
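To give a feel for what such a code looks like, here is a minimal 1-D FDTD sketch in NumPy. It uses normalized units and free space only; the grid size, Courant number, and Gaussian source parameters are illustrative assumptions, and the function name is hypothetical. The two end points of the electric field are never updated, so they act as perfectly reflecting walls; absorbing boundaries or lossy materials would need extra terms that are not shown.

```python
import numpy as np


def fdtd_1d(n_cells=200, n_steps=150, source_pos=100, courant=0.5):
    """Leapfrog update of E and H on a staggered 1-D grid (normalized units)."""
    ez = np.zeros(n_cells)       # electric field at integer grid points
    hy = np.zeros(n_cells - 1)   # magnetic field at half-integer grid points
    for t in range(n_steps):
        # Update H from the spatial difference of E.
        hy += courant * (ez[1:] - ez[:-1])
        # Update interior E from the spatial difference of H.
        # ez[0] and ez[-1] stay zero, which reflects the wave at both ends.
        ez[1:-1] += courant * (hy[1:] - hy[:-1])
        # Soft source: add a Gaussian pulse in time at one grid point.
        ez[source_pos] += np.exp(-((t - 30) / 10.0) ** 2)
    return ez


ez = fdtd_1d()
print("peak |Ez| after 150 steps:", np.abs(ez).max())
```

The scheme is stable in 1-D for a Courant number up to 1; the 3-D case has a stricter limit and six field components, which is a large part of why a dedicated package is attractive.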
You could also approach the problem by assuming all your waves are plane waves, then you could use vectors and the Fresnel equations. Or if you want to model Gaussian beams being transmitted and reflected from flat or curved surfaces, you could use the ABCD matrix formalism (also known as ray transfer matrices). This takes into account the divergence of beams.
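For the plane-wave approach, the Fresnel equations give the reflected fraction of power at a flat interface directly. A small sketch, assuming lossless, non-magnetic media with real refractive indices; the function name is hypothetical:

```python
import math


def fresnel_reflectance(n1, n2, theta_i_deg):
    """Return (R_s, R_p), the reflected power fractions for s and p polarization.

    n1, n2: real refractive indices of the incident and transmitting media.
    theta_i_deg: angle of incidence in degrees, measured from the normal.
    """
    theta_i = math.radians(theta_i_deg)
    sin_t = n1 / n2 * math.sin(theta_i)  # Snell's law
    if abs(sin_t) > 1.0:
        return 1.0, 1.0                  # total internal reflection
    theta_t = math.asin(sin_t)
    cos_i, cos_t = math.cos(theta_i), math.cos(theta_t)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    return r_s ** 2, r_p ** 2


# Air to glass (n = 1.5) at normal incidence: 4% of the power is reflected.
print(fresnel_reflectance(1.0, 1.5, 0.0))
```

With no absorption the transmitted fraction is 1 - R for each polarization, so tracking intensity along a ray only requires multiplying by these factors at every surface.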
If you are solving custom 3-D PDEs, I would recommend at least a look at FiPy. It'll save you the trouble of building a lot of your matrix conditioners and solvers from scratch. It uses NumPy and/or Trilinos. Here are some examples.
I recommend you use my project GarlicSim as the framework in which you build the simulation. You will still need to write your algorithm yourself, probably in Numpy, but GarlicSim may save you a bunch of boilerplate and allow you to explore your simulation results in a flexible way, similar to version control systems.
Don't use Python. I've tried using it for computationally expensive things and it just wasn't made for that.
If you need to simulate a wave in a Python program, write the necessary code in C/C++ and export it to Python.
Here's a link to the C API: http://docs.python.org/c-api/
Be warned, it isn't the easiest API in the world :)
Long-time R and Python user here. I use R for my daily data analysis and Python for tasks heavier on text processing and shell scripting. I am working with increasingly large data sets, which usually arrive as binary or text files. What I normally do is apply statistical/machine learning algorithms and, in most cases, create statistical graphics. I sometimes use R with SQLite and write C for iteration-intensive tasks. Before looking into Hadoop, I am considering investing some time in NumPy/SciPy because I've heard it has better memory management (and the transition to NumPy/SciPy for someone with my background seems not that big). I wonder if anyone has experience using the two and could comment on the improvements in this area, and whether there are idioms in NumPy that deal with this issue. (I'm also aware of rpy2, but I am wondering if NumPy/SciPy can handle most of my needs.) Thanks.
R's strength when looking for an environment to do machine learning and statistics is most certainly the diversity of its libraries. To my knowledge, SciPy + SciKits cannot be a replacement for CRAN.
Regarding memory usage, R follows pass-by-value semantics (copy-on-modify in practice), while Python passes references to objects. Pass-by-value can lead to more "intuitive" code; pass-by-reference can help optimize memory usage. NumPy also lets you create "views" on arrays (a kind of subarray without a copy being made).
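A short illustration of both points, using a toy array (the function name is made up for the example): basic slicing returns a view that shares memory with the original, and a function that modifies its argument in place changes the caller's array.

```python
import numpy as np

a = np.arange(10)

view = a[2:5]            # basic slicing: a view, no data is copied
view[0] = 99             # writes through to 'a'

copy = a[2:5].copy()     # explicit copy: independent memory
copy[1] = -1             # 'a' is not affected


def zero_first(arr):
    """Modify the array in place; the caller sees the change."""
    arr[0] = 0


zero_first(a)

print(a)                              # a[0] is 0 and a[2] is 99
print(np.shares_memory(a, view))      # True
print(np.shares_memory(a, copy))      # False
```

Note that indexing with a list or a boolean mask (fancy indexing) returns a copy rather than a view, so only basic slices give the memory saving.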
Regarding speed, pure Python is faster than pure R for accessing individual elements in an array, but this advantage disappears when dealing with numpy arrays (benchmark). Fortunately, Cython lets one get serious speed improvements easily.
If working with Big Data, I find the support for storage-based arrays better with Python (HDF5).
I am not sure you should ditch one for the other but rpy2 can help you explore your options about a possible transition (arrays can be shuttled between R and Numpy without a copy being made).
I use NumPy daily and R nearly so.
For heavy number crunching, I prefer NumPy to R by a large margin (including R packages like 'Matrix'). I find the syntax cleaner, the function set larger, and computation quicker (although I don't find R slow by any means). NumPy's broadcasting functionality, for instance, does not have an analog in R as far as I know.
For instance, to read in a data set from a csv file and 'normalize' it for input to an ML algorithm (e.g., mean center then re-scale each dimension) requires just this:
import numpy as NP
data = NP.loadtxt(data1, delimiter=",") # data1 is the path to the csv file; 'data' is a NumPy array
data -= NP.mean(data, axis=0)
data /= NP.max(data, axis=0)
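A self-contained variant of the snippet above, for anyone who wants to run it without a file on disk. The CSV text is made-up sample data, and dividing by the largest absolute value (rather than the plain maximum) is a small change of mine so that every column ends up in [-1, 1].

```python
import io

import numpy as np

# Made-up sample data standing in for the csv file.
csv_text = "1.0,10.0\n2.0,20.0\n3.0,60.0\n"

data = np.loadtxt(io.StringIO(csv_text), delimiter=",")  # shape (3, 2)

# data.mean(axis=0) has shape (2,); broadcasting subtracts it from every row.
data -= data.mean(axis=0)

# Same broadcasting rule: each column is divided by its own scale factor.
data /= np.abs(data).max(axis=0)

print(data)
```

Both in-place operations rely on broadcasting a length-2 vector against a (3, 2) array, which is the feature mentioned earlier in this answer.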
Also, I find that when coding ML algorithms, I need data structures that I can operate on element-wise and that also understand linear algebra (e.g., matrix multiplication, transpose, etc.). NumPy gets this and allows you to create these hybrid structures easily (no operator overloading or subclassing, etc.).
You won't be disappointed by NumPy/SciPy; more likely you'll be amazed.
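A tiny example of that dual behaviour on plain ndarrays, with toy values. The @ operator assumes Python 3.5 or later; np.dot does the same job on older versions.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

elementwise = A * B          # element-by-element product
matmul = A @ B               # matrix product
transposed = A.T             # transpose (a view, no copy)

# Solve the linear system A x = b.
b = np.array([5.0, 11.0])
x = np.linalg.solve(A, b)

print(elementwise)
print(matmul)
print(x)
```

The same array objects serve both purposes, so there is no conversion step between the element-wise and linear algebra parts of an algorithm.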
So, a few recommendations--in general and in particular, given the facts in your question:
Install both NumPy and SciPy. As a rough guide, NumPy provides the core data structures (in particular the ndarray) and SciPy (which is actually several times larger than NumPy) provides the domain-specific functions (e.g., statistics, signal processing, integration).
Install the repository versions, particularly with respect to NumPy, because the dev version is 2.0. Matplotlib and NumPy are tightly integrated; you can use one without the other, of course, but both are the best in their respective class among Python libraries. You can get all three via easy_install, which I assume you already have.
NumPy/SciPy have several modules specifically directed to Machine Learning/Statistics, including the Clustering package and the Statistics package, as well as packages directed to general computation that make coding ML algorithms a lot faster, in particular Optimization and Linear Algebra.
There are also the SciKits, not included in the base NumPy or SciPy libraries; you need to install them separately. Generally speaking, each SciKit is a set of convenience wrappers to streamline coding in a given domain. The SciKits you are likely to find most relevant are: ann (approximate Nearest Neighbor), and learn (a set of ML/Statistics regression and classification algorithms, e.g., Logistic Regression, Multi-Layer Perceptron, Support Vector Machine).
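As a small taste of the Clustering package mentioned in the recommendations above, here is a sketch using scipy.cluster.vq on made-up 2-D points. Passing explicit starting centroids instead of a cluster count is a choice made here only to keep the result deterministic.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Two obvious groups of made-up points.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# Starting guess for the two centroids (an assumption for reproducibility).
initial = np.array([[0.0, 0.0], [5.0, 5.0]])

centroids, distortion = kmeans(points, initial)  # refine the centroids
labels, distances = vq(points, centroids)        # assign points to centroids

print(centroids)
print(labels)
```

For real data with features on different scales, scipy.cluster.vq.whiten can be applied to the observations first.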