I'm pretty much a beginner when it comes to GIS, but I think I understand the basics; it doesn't seem too hard. But all these acronyms and different libraries (GEOS, GDAL, PROJ, PCL, Shapely, OpenGEO, OGR, OGC, OWS and what not), each seemingly depending on any number of the others, are slightly overwhelming me.
Here's what I would like to do: Given a number of points and a linestring, I want to determine the location on the line closest to a certain point. In other words, what PostGIS's line_locate_point() does:
http://postgis.refractions.net/documentation/manual-1.3/ch06.html#line_locate_point
Except I want to use plain Python. Which library or libraries should I look at generally for doing these kinds of spatial calculations in Python, and is there one that specifically supports a line_locate_point() equivalent?
For posterity:
http://bitbucket.org/miracle2k/pyutils/changeset/156c60ec88f8/
In another forum I suggested reimplementing the (simple) PostGIS algorithm in Python using Shapely.
For posterity, these functions are available in Shapely 1.2
All you need is Shapely: if you have shapefiles for points and linestrings, calling line.distance(point) in a for loop will do the trick. With that you can find the point closest to the line, or vice versa. Make sure you check out GDAL, Fiona, and Shapely in order to complete this.
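For reference, here is a minimal sketch of both approaches, assuming Shapely is installed; the coordinates are made-up example data. Shapely's project() method is the line_locate_point() equivalent, and interpolate() turns the result back into a point:

    from shapely.geometry import LineString, Point

    line = LineString([(0, 0), (2, 0), (2, 2)])
    pt = Point(1, 1)

    # line_locate_point() equivalent: fraction along the line of the
    # point nearest pt (normalized=True gives 0..1, like PostGIS)
    frac = line.project(pt, normalized=True)

    # turn that fraction back into the actual point on the line
    nearest_on_line = line.interpolate(frac, normalized=True)
    print(frac, nearest_on_line)

Among several candidate points, min(points, key=line.distance) then gives the one closest to the line.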
Are there any libraries similar to igraph with which I can create a hypergraph? I am working with hypergraphs now and wanted to use a hypergraph library.
MGtoolkit and its paper
pygraph
halp
PyMETIS
SageMath's implementation, 1, 2. SageMath is not a Python library but more like a Python distribution (it currently ships Python 2.7) with lots of interesting libraries pre-installed.
I hope we see NetworkX and igraph support also soon.
There's also HyperNetX which is able to represent and visualise hypergraphs.
It seems very accessible. They have a number of nice tutorials on their GitHub page.
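A minimal sketch of the usual HyperNetX pattern (a hypergraph built from a dict mapping hyperedge names to node lists); the data here is a made-up example:

    import hypernetx as hnx

    # hyperedge name -> list of member nodes (made-up example data)
    edges = {"e1": ["a", "b", "c"], "e2": ["c", "d"]}
    H = hnx.Hypergraph(edges)

    print(H.nodes, H.edges)
    hnx.draw(H)  # simple built-in visualisation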
However, when working with it I identified some issues:
Performance: The library struggles with graphs that have several thousand nodes. I recommend igraph instead, although it does not have explicit support for hypergraphs. It does offer functionality for bipartite graphs, though, and I believe that if no hyperedge is fully contained in another, you can work with a bipartite graph that is isomorphic to your given hypergraph; see the sketch after this list.
I encountered an issue in which the ordering of nodes would not be deterministic, i.e. if you constructed the same graph several times and iterated over the nodes, they would be given to you in different orders. This can probably be worked around.
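A minimal sketch of that bipartite encoding with python-igraph; the nodes and hyperedges below are made-up example data. Each original node becomes a vertex of one type, each hyperedge a vertex of the other type, and membership becomes an edge:

    import igraph as ig

    nodes = ["a", "b", "c", "d"]
    hyperedges = [("a", "b", "c"), ("c", "d")]

    # False marks original nodes, True marks hyperedge vertices
    types = [False] * len(nodes) + [True] * len(hyperedges)
    node_index = {n: i for i, n in enumerate(nodes)}
    edges = [(node_index[n], len(nodes) + j)
             for j, he in enumerate(hyperedges)
             for n in he]

    g = ig.Graph.Bipartite(types, edges)
    g.vs["name"] = nodes + ["e%d" % j for j in range(len(hyperedges))]
    print(g.summary())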
I am starting a web app in Django which must provide one simple task: get all records from the DB which are close enough to another record.
For example: I am at lat/lng (50, 10), and I need to get all records with a lat/lng closer than 5 km from me.
I found GeoDjango, but it pulls in a lot of other dependencies and libraries like GEOS, PostGIS, and other stuff which I don't really need. I need only this one range functionality.
So should I use GeoDjango, or just write my own range calculation query?
Most definitely do not write your own. As you get more familiar with geographic data you will realize that this particular calculation isn't at all simple; see for example this question for a detailed discussion. However, most of the solutions (answers) given in that question only produce approximate results, partly because the earth is not a perfect sphere.
On the other hand, if you use the geospatial extensions for MySQL (5.7 onwards) or PostgreSQL, you can make use of the ST_DWithin function.
ST_DWithin — Returns true if the geometries are within the specified distance of one another. For geometry, units are those of the spatial reference; for geography, units are meters and measurement defaults to use_spheroid=true (measure around the spheroid); for a faster check, use use_spheroid=false to measure along the sphere.
ST_DWithin makes use of spatial indexes, which home-made solutions are unable to do. When GeoDjango is enabled, ST_DWithin becomes available as a filter on Django querysets in the form of dwithin.
Last but not least, if you write your own code, you will have to write a lot of code to test it too, whereas dwithin is thoroughly tested.
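A minimal sketch of the dwithin lookup; the Place model and its location field are hypothetical names, and a PostGIS backend is assumed:

    from django.contrib.gis.db import models
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D

    class Place(models.Model):  # hypothetical model
        # geography=True stores a PostGIS geography column, so the
        # distance below can be given in real-world units like km
        location = models.PointField(geography=True, srid=4326)

    # everything within 5 km of lat 50, lng 10 (Point takes lng, lat)
    me = Point(10, 50, srid=4326)
    nearby = Place.objects.filter(location__dwithin=(me, D(km=5)))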
I'm trying to do a PCA analysis on a masked array. From what I can tell, matplotlib.mlab.PCA doesn't work if the original 2D matrix has missing values. Does anyone have recommendations for doing a PCA with missing values in Python?
Thanks.
Imputing data will skew the result in ways that might bias the PCA estimates. A better approach is to use a PPCA algorithm, which gives the same result as PCA, but in some implementations can deal with missing data more robustly.
I have found two libraries:
Package PPCA on PyPI, which is called PCA-magic on github
Package PyPPCA, having the same name on PyPI and github
Since the packages are in low maintenance, you might want to implement PPCA yourself instead. The code in both packages builds on the theory presented in the well-cited (and well-written!) paper by Tipping and Bishop (1999). It is available on Tipping's home page if you want guidance on how to implement PPCA properly.
As an aside, the sklearn implementation of PCA is actually a PPCA implementation based on Tipping and Bishop (1999), but they have not chosen to implement it in such a way that it handles missing values.
EDIT: both the libraries above had issues, so I could not use them directly myself. I forked PyPPCA and fixed its bugs; the fork is available on GitHub.
I think you will probably need to do some preprocessing of the data before doing PCA.
You can use:
sklearn.impute.SimpleImputer
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
With this function you can automatically replace the missing values with the mean, median, or most frequent value. Which of these options is best is hard to tell; it depends on many factors, such as what the data looks like.
By the way, you can also use PCA using the same library with:
sklearn.decomposition.PCA
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
And many other statistical functions and machine learning techniques.
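A minimal sketch combining the two in a scikit-learn pipeline; the toy matrix is made-up data with missing entries:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline

    X = np.array([[1.0, 2.0, np.nan],
                  [4.0, np.nan, 6.0],
                  [7.0, 8.0, 9.0]])

    # impute with the column mean (or "median" / "most_frequent"),
    # then project onto two principal components
    pipeline = make_pipeline(SimpleImputer(strategy="mean"),
                             PCA(n_components=2))
    scores = pipeline.fit_transform(X)
    print(scores)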
For my app, I need to determine the points nearest to some other point, and I am looking for a simple but relatively fast (in terms of performance) solution. I was thinking about using PostGIS and GeoDjango, but I think my app is not really that "geographic" (I still don't really know what that means, though). The geographic part (around 5 percent of the whole) is that I need to keep coordinates of objects (people and places), and then there is this task of finding the nearest points. To put it simply, PostGIS and GeoDjango seem to be overkill here.
I was also thinking of django-haystack with Solr or Elasticsearch, because I am going to need strong, strong text search capabilities, and these engines also have "geographic" features. But I am not sure about that either, as I am afraid of core DB <-> search engine DB synchronisation and of the hardware requirements for these engines. At the moment I am more inclined to use PostgreSQL trigrams and some custom way to solve the "find near points" problem. Is there a good one?
To find points or bounding boxes that are near each other, consider using the Rtree Python package. It uses a spatial index technique similar to PostGIS's, except that it is not database software and can be used directly inside your application. I've measured faster speeds than PostGIS when finding near points among millions of objects.
See the examples in the tutorial to get a good feel for finding the nearest objects.
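A minimal sketch of a nearest-point query with the Rtree package; the points are made-up example data:

    from rtree import index

    idx = index.Index()
    points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    for i, (x, y) in enumerate(points):
        idx.insert(i, (x, y, x, y))  # a point is a degenerate bbox

    # ids of the 2 points nearest to (0.9, 0.9)
    print(list(idx.nearest((0.9, 0.9, 0.9, 0.9), 2)))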
You're probably right: PostGIS/GeoDjango is probably overkill, but making your own Django app would not be too much trouble for your simple task. Django offers a lot in terms of templating, etc., and the built-in admin makes it pretty easy to enter single records. And GeoDjango is part of contrib, so you can always use it later if your project needs it.
Check out Shapely. It looks like the object's project() method may be what you're looking for.
I would like to perform a few basic machine vision tasks using Python and I'd like to know where I could find tutorials to help me get started.
As far as I know, the only free library for Python that does machine vision is PyCV (which is apparently a wrapper for OpenCV), but I can't find any appropriate tutorials.
My main tasks are to acquire an image over FireWire, segment the image into different regions, and then perform statistics on each region to determine pixel area and center of mass.
Previously, I've used Matlab's Image Processing Toolbox without any problems. The functions I would like to find equivalents for in Python are graythresh, regionprops and gray2ind.
Thanks!
OpenCV is probably your best bet for a library; you have your choice of wrappers for it. I looked at the SWIG wrapper that comes with the standard OpenCV install, but ended up using ctypes-opencv because the memory management seemed cleaner.
They are both very thin wrappers around the C code, so any C references you can find will be applicable to the Python bindings.
OpenCV is huge and not especially well documented, but there are some decent samples included in the samples directory that you can use to get started. A searchable OpenCV API reference is here.
You didn't mention if you were looking for online or print sources, but I have the O'Reilly book and it's quite good (examples in C, but easily translatable).
The FindContours function is a bit similar to regionprops; it will get you a list of the connected components, which you can then inspect to get their info.
For thresholding you can try Threshold. I was sure you could pass it a flag to use Otsu's method, but it doesn't seem to be listed in the docs there.
I haven't come across specific functions corresponding to gray2ind, but they may be in there.
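For what it's worth, a minimal sketch of that workflow with the modern cv2 bindings (not the old SWIG/ctypes wrappers discussed above): Otsu thresholding plus per-region statistics, roughly graythresh followed by regionprops. The image file name is hypothetical:

    import cv2

    img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

    # Otsu's method picks the threshold automatically (graythresh)
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # label connected components; stats holds areas, centroids the
    # center of mass of each region (label 0 is the background)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    for i in range(1, n):
        print(i, stats[i, cv2.CC_STAT_AREA], centroids[i])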
documentation: A few years ago I used OpenCV wrapped for Python quite a lot. OpenCV is extensively documented, ships with many examples, and there's even a book. The Python wrappers I was using were thin enough that very little wrapper-specific documentation was required (and this is typical for many other wrapped libraries). I imagine that a few minutes looking at an example, like the PyCV unit tests, would be all you need, and then you could focus on the parts of the OpenCV documentation that suit your needs.
analysis: As for whether there's a better library than OpenCV, my somewhat outdated opinion is that OpenCV is great if you want to do fairly advanced stuff (e.g. object tracking), but it is possibly overkill for your needs. It sounds like scipy ndimage combined with some basic numpy array manipulation might be enough.
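A minimal sketch of that scipy.ndimage route: threshold, label the regions, then measure area and center of mass per region. The random array stands in for a real grayscale image:

    import numpy as np
    from scipy import ndimage

    img = np.random.rand(64, 64)

    binary = img > 0.5                 # simple fixed threshold
    labels, n = ndimage.label(binary)  # connected components
    areas = ndimage.sum(binary, labels, index=range(1, n + 1))
    centers = ndimage.center_of_mass(binary, labels, range(1, n + 1))
    print(areas, centers)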
acquisition: The options I know of for acquisition are OpenCV, Motmot, or using ctypes to directly interface to the drivers. Of these, I've never used Motmot because I had trouble installing it. The other methods I found fairly straightforward, though I don't remember the details (which is a good thing, since it means it was easy).
I've started a website on this subject: pythonvision.org. It has some tutorials, &c., and links to software.
You would probably be well served by SciPy. Here is the introductory tutorial for SciPy. It has a lot of similarities to Matlab, especially the matplotlib package, which is explicitly made to emulate Matlab's plotting functions. I don't believe SciPy has equivalents for the functions you mentioned, but there are some things which are similar. For example, threshold is a very simple version of graythresh: it doesn't implement Otsu's method, it just does a simple threshold, but that might be close enough.
I'm sorry that I don't know of any tutorials which are closer to the task you described. But if you are accustomed to Matlab, and you want to do this in Python, SciPy is a good starting point.
I don't know much about this Motmot package or how it compares to OpenCV, but I have imported and used a class or two from it. Much of its image processing is done via numpy arrays, which might be similar enough to how you've used Matlab to meet your needs.
I've acquired images from a FireWire camera using .NET and IronPython. On CPython I would check out the ctypes library, unless you find library support for frame grabbing.
Foreword: This book is more for people who want a good hands-on introduction to computer or machine vision, even though it covers what the original question asked.
[BOOK]: Programming Computer Vision with Python
At the moment you can download the final draft from the book's website for free as a PDF:
http://programmingcomputervision.com/
From the introduction:
The idea behind this book is to give an easily accessible entry point to hands-on computer vision with enough understanding of the underlying theory and algorithms to be a foundation for students, researchers and enthusiasts.
What you need to know
Basic programming experience. You need to know how to use an editor and run scripts, how to structure code, as well as basic data types. Familiarity with Python or other scripting-style languages like Ruby or Matlab will help.
Basic mathematics. To make full use of the examples it helps if you know about matrices, vectors, matrix multiplication, the standard mathematical functions and concepts like derivatives and gradients. Some of the more advanced mathematical examples can be easily skipped.
What you will learn
Hands-on programming with images using Python.
Computer vision techniques behind a wide variety of real-world applications.
Many of the fundamental algorithms and how to implement and apply them yourself.