I've been working with spatstat in R for quite a while now and am curious what the big differences between the two packages (PySAL in Python and spatstat in R) are, functionality-wise. Is either more powerful or faster? Does one have more built-in functions?
Thanks loads
This is not really an appropriate question for this forum.
However, the main differences between the two packages can be seen by reading their documentation: spatstat is designed for analysing spatial point patterns, is written by statisticians following statistical principles and conventions, contains current techniques from the statistical literature, and does not handle file input/output directly. PySAL is designed for spatial data in general (with relatively less functionality for spatial point patterns), is written by geographers, and includes capabilities for reading spatial data file formats.
Related
I am using Latent Dirichlet Allocation with a corpus of news data from six different sources. I am interested in topic evolution and emergence, and want to compare how the sources are alike and different from each other over time. I know that there are a number of modified LDA algorithms such as the Author-Topic model, Topics Over Time, and so on.
My issue is that very few of these alternate model specifications are implemented in any standard format. A few are available in Java, but most exist as conference papers only. What is the best way to go about implementing some of these algorithms on my own? I am fairly proficient in R and JAGS, and can stumble around in Python when given long enough. I am willing to write the code, but I don't really know where to start, and I don't know C or Java. Can I build a model in JAGS or Python just from the formulas in the manuscript? If so, can someone point me at an example of doing this? Thanks.
My friend's response is below, pardon the language please.
First I wrote up a Python implementation of the collapsed Gibbs sampler seen here (http://www.pnas.org/content/101/suppl.1/5228.full.pdf+html) and fleshed out here (http://cxwangyi.files.wordpress.com/2012/01/llt.pdf). This was slow as balls.
Then I used a Python wrapping of a C implementation of this paper (http://books.nips.cc/papers/files/nips19/NIPS2006_0511.pdf), which is fast as f*ck, but the results are not as great as one would see with NMF.
But the NMF implementations I've seen, with scikits, and even with the scipy sparse-compatible recently released NIMFA library, all blow the f*ck up on any sizable corpus. My new white whale is a sliced, distributed implementation of the thing. This'll be non-trivial.
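For reference, here is a minimal, unoptimized sketch of the collapsed Gibbs sampler for vanilla LDA described in the first link (Griffiths and Steyvers), written in plain NumPy; the triple Python loop is exactly why a naive implementation is slow. The function name and toy interface are my own, not from any library.

import numpy as np

def lda_collapsed_gibbs(docs, n_topics, n_vocab, n_iter=200, alpha=0.1, beta=0.01):
    # docs is a list of documents, each a list of integer word ids
    n_docs = len(docs)
    ndk = np.zeros((n_docs, n_topics))      # document-topic counts
    nkw = np.zeros((n_topics, n_vocab))     # topic-word counts
    nk = np.zeros(n_topics)                 # total tokens per topic
    z = []                                  # topic assignment for every token

    # random initialization of topic assignments
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = np.random.randint(n_topics)
            zd.append(k)
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1    # remove current token
                # full conditional p(z_i = k | rest), up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = np.random.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    phi = (nkw + beta) / (nkw.sum(axis=1, keepdims=True) + n_vocab * beta)       # topic-word
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)  # doc-topic
    return phi, theta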
In Python, do you know of PyMC? It's flexible in specifying both the model and the fitting algorithm.
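A minimal sketch of what model specification looks like, using the classic PyMC 2.x API on a toy beta-Bernoulli model (the data here are made up purely for illustration):

import numpy as np
import pymc as pm

data = np.array([0, 1, 1, 0, 1, 1, 1, 0, 1, 1])           # toy observations

theta = pm.Beta('theta', alpha=1, beta=1)                  # prior on the success probability
y = pm.Bernoulli('y', p=theta, value=data, observed=True)  # likelihood

mcmc = pm.MCMC([theta, y])
mcmc.sample(iter=10000, burn=1000)                         # fit by MCMC
print(theta.trace().mean())                                # posterior mean of theta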
Also, when starting with R and JAGS, there is this tutorial on "Using JAGS in R with the rjags Package" together with a collection of examples.
I am preparing to build an application in Python that works with a lot of spatial data. I am looking for a Python module that provides a nice set of spatially-enabled classes that I can inherit from. Two things I would like to have baked in are:
Support for both vector and raster data, and conversions between both formats.
Support for projecting coordinates between datums.
The best module I have been able to find so far is shapely but it focuses on vector data and does not include support for datum transformations. An example of the kind of library I am looking for is the sp package for R which provides classes for holding both vector point data and dense or sparse raster data along with datum transformation support.
Are there any Python modules that provide a nice set of spatially enabled classes that I may be overlooking?
Did you try the Geospatial Data Abstraction Library?
I found it on Linux, where the Python bindings are packaged as python-gdal.
From the description of Debian's GDAL package:
GDAL supports 40+ popular data formats, including commonly used ones (GeoTIFF, JPEG, PNG and more) as well as the ones used in GIS and remote sensing software packages (ERDAS Imagine, ESRI Arc/Info, ENVI, PCI Geomatics). Also supported are many remote sensing and scientific data distribution formats such as HDF, EOS FAST, NOAA L1B, NetCDF, FITS.
The OGR library supports popular vector formats like ESRI Shapefile, TIGER data, S57, MapInfo File, DGN, GML and more.
See trac.osgeo.org for more details.
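A minimal sketch of what the Python bindings look like for both the raster (GDAL) and vector (OGR) sides; the file names below are just placeholders:

from osgeo import gdal, ogr

# raster side: read a band into a NumPy array
ds = gdal.Open("elevation.tif")
arr = ds.GetRasterBand(1).ReadAsArray()
print(arr.shape, arr.mean())

# vector side: iterate over features in a shapefile layer
shp = ogr.Open("parcels.shp")
layer = shp.GetLayer(0)
for feature in layer:
    geom = feature.GetGeometryRef()
    print(geom.Centroid().ExportToWkt())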
More precisely: Shapely is about planar computational geometry and nothing more. It's not a vector data library at all. I use it with Pyproj (http://code.google.com/p/pyproj/). I haven't come across any Python foundation classes for geospatial. They abound, of course, in Java projects like GeoTools. Python arrays could be a good starting point: arrays of coordinates can be used by Shapely and raster-like arrays can be used by GDAL. You might also take a look at the GeoJSON-ish interfaces provided by Shapely, ArcPy, and the SimpleGeo APIs.
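A minimal sketch of combining the two for a datum/projection transformation; this uses the newer pyproj Transformer API rather than the interface on the Google Code page, and the EPSG codes are just example choices:

import pyproj
from shapely.geometry import Point
from shapely.ops import transform

# build a coordinate transformation from WGS84 lon/lat to UTM zone 33N
project = pyproj.Transformer.from_crs("EPSG:4326", "EPSG:32633", always_xy=True).transform

pt = Point(15.0, 52.0)            # lon, lat
pt_utm = transform(project, pt)   # a new Shapely geometry in UTM coordinates
print(pt_utm.x, pt_utm.y)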
Almost all Python modules are listed at the CheeseShop so start there. I can't find anything obvious, though.
RSGISLib is a suite of command line tools with python bindings for processing remote sensing/spatial data in both vector and raster format, which you may find useful.
The link to the website is http://www.rsgislib.org and it can easily be installed using Anaconda.
Long-time R and Python user here. I use R for my daily data analysis and Python for tasks that are heavier on text processing and shell scripting. I am working with increasingly large data sets, and these files often arrive as binary or text files. What I normally do is apply statistical/machine learning algorithms and create statistical graphics. I use R with SQLite sometimes and write C for iteration-intensive tasks.
Before looking into Hadoop, I am considering investing some time in NumPy/SciPy because I've heard it has better memory management (and the transition to NumPy/SciPy for someone with my background seems not that big). I wonder if anyone has experience using the two and could comment on the improvements in this area, and whether there are idioms in NumPy that deal with this issue. (I'm also aware of Rpy2 but wondering if NumPy/SciPy can handle most of my needs.) Thanks.
When looking for an environment to do machine learning and statistics, R's strength is most certainly the diversity of its libraries. To my knowledge, SciPy + SciKits cannot be a replacement for CRAN.
Regarding memory usage, R uses a pass-by-value paradigm while Python uses pass-by-reference. Pass-by-value can lead to more "intuitive" code, while pass-by-reference can help optimize memory usage. NumPy also lets you have "views" on arrays (a kind of subarray without a copy being made).
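A quick sketch of the view behaviour, just to make the memory point concrete:

import numpy as np

a = np.arange(10)
v = a[2:6]            # a view: no data is copied
v[:] = 0              # modifying the view also modifies 'a'
print(a)              # [0 1 0 0 0 0 6 7 8 9]
print(v.base is a)    # True: 'v' shares its memory with 'a'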
Regarding speed, pure Python is faster than pure R for accessing individual elements in an array, but this advantage disappears when dealing with numpy arrays (benchmark). Fortunately, Cython lets one get serious speed improvements easily.
If working with Big Data, I find the support for storage-based arrays better with Python (HDF5).
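For example, with h5py (one of several HDF5 bindings; PyTables is another) you can create an on-disk array and read or write slices of it without ever loading the whole thing into memory. A minimal sketch, with made-up sizes and file name:

import numpy as np
import h5py

with h5py.File("big.h5", "w") as f:
    dset = f.create_dataset("x", shape=(10000000, 50), dtype="f8", chunks=True)
    dset[:1000, :] = np.random.randn(1000, 50)   # write one slice

with h5py.File("big.h5", "r") as f:
    block = f["x"][:1000, :10]                   # read back only what you need
    print(block.mean())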
I am not sure you should ditch one for the other but rpy2 can help you explore your options about a possible transition (arrays can be shuttled between R and Numpy without a copy being made).
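A rough sketch of the kind of round-trip rpy2 makes possible (this uses the rpy2 2.x-style numpy2ri converter; the exact conversion API has changed between rpy2 versions, so treat this as illustrative):

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects import numpy2ri

numpy2ri.activate()              # convert NumPy arrays to/from R automatically
x = np.random.randn(1000)
ro.globalenv['x'] = x            # expose the array to R
print(ro.r('mean(x)')[0])        # run R code on it and pull the result back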
I use NumPy daily and R nearly so.
For heavy number crunching, I prefer NumPy to R by a large margin (including R packages like 'Matrix'). I find the syntax cleaner, the function set larger, and computation quicker (although I don't find R slow by any means). NumPy's broadcasting functionality, for instance, I do not think has an analog in R.
For instance, to read in a data set from a csv file and 'normalize' it for input to an ML algorithm (e.g., mean center then re-scale each dimension) requires just this:
import numpy as NP

data = NP.loadtxt(data1, delimiter=",")   # 'data' is a 2-D NumPy array; 'data1' is the csv file
data -= NP.mean(data, axis=0)             # mean-center each column (broadcasting the row of means)
data /= NP.max(data, axis=0)              # re-scale each column by its maximum
Also, I find that when coding ML algorithms, I need data structures that I can operate on element-wise and that also understand linear algebra (e.g., matrix multiplication, transpose, etc.). NumPy gets this and allows you to create these hybrid structures easily (no operator overloading or subclassing, etc.).
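A small sketch of what that looks like in practice, with made-up array shapes:

import numpy as NP

A = NP.random.rand(3, 4)
B = NP.random.rand(4, 2)

C = A * 2 + 1          # element-wise arithmetic on the whole array
D = NP.dot(A, B)       # matrix multiplication: (3x4) . (4x2) -> (3x2)
E = A.T                # transpose: (4x3)
print(C.shape, D.shape, E.shape)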
You won't be disappointed by NumPy/SciPy, more likely you'll be amazed.
So, a few recommendations--in general and in particular, given the facts in your question:
Install both NumPy and SciPy. As a rough guide, NumPy provides the core data structures (in particular the ndarray) and SciPy (which is actually several times larger than NumPy) provides the domain-specific functions (e.g., statistics, signal processing, integration).
Install the repository versions, particularly for NumPy, because the dev version is 2.0. Matplotlib and NumPy are tightly integrated; you can use one without the other, of course, but both are the best in their respective class among Python libraries. You can get all three via easy_install, which I assume you already have.
NumPy/SciPy have several modules specifically directed at machine learning/statistics, including the Clustering package and the Statistics package.
There are also packages directed at general computation which make coding ML algorithms a lot faster, in particular Optimization and Linear Algebra (a short sketch using these modules follows this list).
There are also the SciKits, not included in the base NumPy or SciPy libraries; you need to install them separately. Generally speaking, each SciKit is a set of convenience wrappers to streamline coding in a given domain. The SciKits you are likely to find most relevant are: ann (approximate Nearest Neighbor) and learn (a set of ML/Statistics regression and classification algorithms, e.g., Logistic Regression, Multi-Layer Perceptron, Support Vector Machine).
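To give a flavour of the Clustering, Statistics, and Optimization modules mentioned above, here is a minimal sketch on randomly generated toy data (purely illustrative):

import numpy as NP
from scipy import stats, optimize
from scipy.cluster.vq import kmeans, vq

data = NP.random.rand(200, 2)                    # toy data set

# clustering: k-means with 3 centroids, then assign each point to one
centroids, distortion = kmeans(data, 3)
labels, _ = vq(data, centroids)

# statistics: basic descriptive summary of the first column
print(stats.describe(data[:, 0]))

# optimization: minimize a simple quadratic with its minimum at (1, -2)
xmin = optimize.fmin(lambda x: (x[0] - 1)**2 + (x[1] + 2)**2, x0=[0, 0])
print(xmin)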
Is there a library of data structures and operations for quadratic bezier curves? I need to implement:
Bezier-to-bitmap conversion with arbitrary quality
optimizing Bezier curves
common operations like subtraction, extraction, rendering, etc.
Languages: C, C++, .NET, Python
Algorithms without an implementation (pseudocode, etc.) could be useful too, especially for optimization.
A small Python library is included in NodeBox:
http://nodebox.net/code/index.php/Bezier
There are plenty of algorithms inside Inkscape, but I have not dug into the code yet to find out how easily they could be used outside of Inkscape.
Update: Inkscape uses lib2geom:
lib2geom (2Geom in private life) was initially a library developed for Inkscape but will provide a robust computational geometry framework for any application. It is not a rendering library, instead concentrating on high level algorithms such as computing arc length.
lib2geom is at http://lib2geom.sourceforge.net
You might want to take a look at Cairo. I am not exactly sure if it covers all your requirements but it should be able to handle rendering at least.
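For what it's worth, here is a minimal sketch of rendering a quadratic Bezier to a bitmap with the pycairo bindings; Cairo only exposes cubic curves, so the quadratic control points (made up here) are degree-elevated to a cubic first:

import cairo

p0, p1, p2 = (20, 180), (100, 20), (180, 180)    # quadratic control points

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 200, 200)
ctx = cairo.Context(surface)

# degree elevation: c1 = p0 + 2/3*(p1 - p0), c2 = p2 + 2/3*(p1 - p2)
c1 = (p0[0] + 2.0/3*(p1[0] - p0[0]), p0[1] + 2.0/3*(p1[1] - p0[1]))
c2 = (p2[0] + 2.0/3*(p1[0] - p2[0]), p2[1] + 2.0/3*(p1[1] - p2[1]))

ctx.set_source_rgb(0, 0, 0)
ctx.set_line_width(2)
ctx.move_to(*p0)
ctx.curve_to(c1[0], c1[1], c2[0], c2[1], p2[0], p2[1])
ctx.stroke()

surface.write_to_png("quadratic_bezier.png")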
I would like to perform a few basic machine vision tasks using Python and I'd like to know where I could find tutorials to help me get started.
As far as I know, the only free library for Python that does machine vision is PyCV (which is a wrapper for OpenCV apparently), but I can't find any appropriate tutorials.
My main tasks are to acquire an image over FireWire, segment the image into different regions, and then perform statistics on each region to determine pixel area and center of mass.
Previously, I've used Matlab's Image Processing Toolbox without any problems. The functions I would like to find equivalents for in Python are graythresh, regionprops and gray2ind.
Thanks!
OpenCV is probably your best bet for a library; you have your choice of wrappers for them. I looked at the SWIG wrapper that comes with the standard OpenCV install, but ended up using ctypes-opencv because the memory management seemed cleaner.
They are both very thin wrappers around the C code, so any C references you can find will be applicable to the Python bindings.
OpenCV is huge and not especially well documented, but there are some decent samples included in the samples directory that you can use to get started. A searchable OpenCV API reference is here.
You didn't mention if you were looking for online or print sources, but I have the O'Reilly book and it's quite good (examples in C, but easily translatable).
The FindContours function is a bit similar to regionprops; it will get you a list of the connected components, which you can then inspect to get their info.
For thresholding you can try Threshold. I was sure you could pass a flag to it to use Otsu's method, but it doesn't seem to be listed in the docs there.
I haven't come across specific functions corresponding to gray2ind, but they may be in there.
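If you end up using the newer cv2 bindings rather than the thin wrappers discussed above, a rough sketch of the threshold-then-contours workflow looks like this (the file name is a placeholder; note that cv2.threshold does accept an Otsu flag):

import cv2

img = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method picks the threshold automatically (a rough analog of graythresh)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# connected regions, loosely analogous to regionprops
res = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = res[-2]    # the return signature differs across OpenCV versions

for c in contours:
    area = cv2.contourArea(c)
    m = cv2.moments(c)
    if m["m00"] > 0:
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # center of mass
        print(area, (cx, cy))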
documentation: A few years ago I used OpenCV wrapped for Python quite a lot. OpenCV is extensively documented, ships with many examples, and there's even a book. The Python wrappers I was using were thin enough so that very little wrapper specific documentation was required (and this is typical for many other wrapped libraries). I imagine that a few minutes looking at an example, like the PyCV unit tests would be all you need, and then you could focus on the OpenCV documentation that suited your needs.
analysis: As for whether there's a better library than OpenCV, my somewhat outdated opinion is that OpenCV is great if you want to do fairly advanced stuff (e.g. object tracking), but it is possibly overkill for your needs. It sounds like scipy ndimage combined with some basic numpy array manipulation might be enough.
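For instance, a rough sketch of the segment-then-measure workflow with scipy.ndimage (the image and threshold here are stand-ins for your acquired frame):

import numpy as np
from scipy import ndimage

img = np.random.rand(128, 128)           # stand-in for your acquired image

binary = img > 0.5                        # simple global threshold (cf. graythresh)
labels, n_regions = ndimage.label(binary)
idx = list(range(1, n_regions + 1))

areas = ndimage.sum(binary, labels, idx)                # pixel area per region
centers = ndimage.center_of_mass(binary, labels, idx)   # center of mass per region
print(n_regions, areas, centers)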
acquisition: The options I know of for acquisition are OpenCV, Motmot, or using ctypes to directly interface to the drivers. Of these, I've never used Motmot because I had trouble installing it. The other methods I found fairly straightforward, though I don't remember the details (which is a good thing, since it means it was easy).
I've started a website on this subject: pythonvision.org. It has some tutorials, links to software, and more.
You would probably be well served by SciPy. Here is the introductory tutorial for SciPy. It has a lot of similarities to Matlab, especially the matplotlib package, which is explicitly made to emulate Matlab's plotting functions. I don't believe SciPy has equivalents for the functions you mentioned, but there are some things which are similar. For example, threshold is a very simple version of graythresh; it doesn't implement Otsu's method, it just does a simple threshold, but that might be close enough.
I'm sorry that I don't know of any tutorials which are closer to the task you described. But if you are accustomed to Matlab, and you want to do this in Python, SciPy is a good starting point.
I don't know much about this package Motmot or how it compares to OpenCV, but I have imported and used a class or two from it. Much of the image processing is done via numpy arrays and might be similar enough to how you've used Matlab to meet your needs.
I've acquired images from a FireWire camera using .NET and IronPython. On CPython I would check out the ctypes library, unless you find library support for grabbing.
Foreword: This book is more for people who want a good hands-on introduction to computer or machine vision, even though it covers what the original question asked.
[BOOK]: Programming Computer Vision with Python
At the moment you can download the final draft from the book's website for free as a PDF:
http://programmingcomputervision.com/
From the introduction:
The idea behind this book is to give an easily accessible entry point to hands-on computer vision with enough understanding of the underlying theory and algorithms to be a foundation for students, researchers and enthusiasts.
What you need to know
Basic programming experience. You need to know how to use an editor and run scripts, how to structure code as well as basic data types. Familiarity with Python or other scripting style languages like Ruby or Matlab will help.
Basic mathematics. To make full use of the examples it helps if you know about matrices, vectors, matrix multiplication, the standard mathematical functions and concepts like derivatives and gradients. Some of the more advanced mathematical examples can be easily skipped.
What you will learn
Hands-on programming with images using Python.
Computer vision techniques behind a wide variety of real-world applications.
Many of the fundamental algorithms and how to implement and apply them yourself.