I just tried QGIS processing tools for the first time.
When using the shortest path search, the algorithm builds a path graph (of course). However, the graph is never re-used: every time the algorithm runs, graph building starts again. I am not familiar with the exact code, but I guess the graph covers the whole network and is not tied to the specific points I selected. So is there a way to re-use the graph? Or even export it to a file?
My network is large (more than 200k features), so efficiency is important. The network is rarely updated, so the graph could easily be recalculated just once in a while.
I looked through the docs and settings of the processing tools, and this option seems to be unavailable (which is surprising). So maybe I am missing something, or maybe someone could suggest a way to serialize the graph and save it using Python code? I am using QGIS 3.10 (A Coruña).
As found on Anita Graser's GitHub page, processing tools that use graphs look straightforward, using the Dijkstra algorithm (here layer is the network layer and crs its coordinate reference system):
from qgis.analysis import (QgsVectorLayerDirector, QgsNetworkDistanceStrategy, QgsGraphBuilder, QgsGraphAnalyzer)
# populate the graph from the network layer, tying the start/end points into it
director = QgsVectorLayerDirector(layer, -1, '', '', '', QgsVectorLayerDirector.DirectionBoth)
director.addStrategy(QgsNetworkDistanceStrategy())
builder = QgsGraphBuilder(crs)
tied_points = director.makeGraph(builder, [from_point, to_point])
graph = builder.graph()
from_id = graph.findVertex(tied_points[0])
to_id = graph.findVertex(tied_points[1])
(tree, cost) = QgsGraphAnalyzer.dijkstra(graph, from_id, 0)
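To then extract the actual route, the usual PyQGIS pattern is to walk the tree array back from the target vertex (a sketch of my own, not part of the original snippet):

# tree[v] holds the inbound edge id on the shortest-path tree, or -1 if v is unreachable
route = [graph.vertex(to_id).point()]
cur = to_id
while cur != from_id:
    edge_id = tree[cur]
    if edge_id == -1:
        route = []  # no path between the two points
        break
    cur = graph.edge(edge_id).fromVertex()
    route.append(graph.vertex(cur).point())
route.reverse()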
I guess using that would require building a different tool, rather than the user-friendly shortest path search tool with its nice GUI and integration with point selection and so on. That would not be acceptable: the goal was to be able to perform the same task on any PC with an unaltered QGIS (no add-ins needed, only a script). But it may be that this is not possible. So the problem leads to:
Is it possible to tweak the existing processing tool to cache the graph, or even store it in a file?
Can I somehow duplicate the tool and apply some small changes?
I already asked this question on GIS Stack Exchange. My question probably has a programming answer, so I am reposting here as an exception.
I have a dynamic point cloud in 3D, and I would like to use nanoflann to dynamically add/subtract points in between queries, without rebuilding the tree (as seen here):
https://github.com/jlblancoc/nanoflann/blob/master/examples/dynamic_pointcloud_example.cpp
I have found a Python wrapper for nanoflann as well (awesome!):
https://github.com/u1234x1234/pynanoflann
This is great! However, speed is extremely important for my application, so I will almost certainly need to parallelize this k-NN implementation. Is there an existing Python implementation or wrapper of such a dynamic k-NN written with OpenCL or CUDA? I wanted to check whether one exists before writing my own. This "dynamic" is different from wanting to specify a new k with each query; I don't mind a single KDTree with a fixed k. I simply need to be able to remove or add points in between queries without rebuilding the KDTree.
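For reference, a static (non-dynamic) query with pynanoflann looks roughly like this, adapted from its README (treat the exact API as an assumption):

import numpy as np
import pynanoflann

points = np.random.uniform(0, 100, (10000, 3))
queries = np.random.uniform(0, 100, (100, 3))

nn = pynanoflann.KDTree(n_neighbors=5, metric='L2')
nn.fit(points)                               # builds the tree once
distances, indices = nn.kneighbors(queries)  # what I need, minus add/remove support

What I am missing is the equivalent of the addPoints/removePoint calls used in the C++ dynamic example above.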
Thank you in advance!
While replicating the implementation of the NGNN ("Dressing as a Whole") paper, I am stuck on one pickle file which is required to progress further, namely fill_in_blank_1000_from_test_score.pkl.
Can someone help by sharing this file, or suggest an alternative?
The GitHub implementation doesn't contain it!
https://github.com/CRIPAC-DIG/NGNN
You're not supposed to use main_score.py (which requires the fill_in_blank_1000_from_test_score.pkl pickle); it's obsolete, the authors just fail to mention this in the README. The problem was raised in this issue. Long story short: use another "main", main_multi_modal.py.
One of the comments explains in detail how to proceed; I will copy it here so that it does not get lost:
1. Download the pre-processed dataset from the authors (the one on Google Drive, around 7 GB).
2. Download the "normal" dataset in order to get the text for the images (the one on GitHub, only a few MBs).
3. Change all the folder paths in the files to your corresponding ones.
4. Run onehot_embedding.py to create the textual features (the rest of the pre-processing was already done by the authors).
5. Run main_multi_modal.py to train. At the end of the file you can adjust the config of the network (Beta, d, T etc.), so the file Config.py is useless here.
6. If you want to train several instances in the for-loop, you need to reset the graph at the beginning of the training. Just add tf.reset_default_graph() at the start of the function cm_ggnn() (see the sketch below).
With this setup, I could reproduce the results fairly well, with the same accuracy as in the paper.
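A minimal sketch of that last step (only the reset line is the addition; the rest of the function body is the repo's existing training code):

import tensorflow as tf

def cm_ggnn():
    # clear the default graph so repeated training runs in the loop
    # don't accumulate stale variables and ops
    tf.reset_default_graph()
    # ... original training code from main_multi_modal.py ...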
I am working on some algorithms that create network graphs, and I am finding it really hard to debug the output. The code is written in Python, and I am looking for the simplest way to view the resulting network.
Every node has a reference to its parent elements, but a helper function could be written to format the network in any other way.
What is the simplest way to display a network graph from Python? Even if it isn't fully done in Python, i.e. it uses some other programs available on Linux, that would be fine.
It sounds like you want something to help with debugging the network you are constructing. For this you might want to consider implementing a function that converts your network to DOT, a graph description language, which can then be rendered into a graph visualization by a number of tools, such as Graphviz. You can then log the output from this function to help debug.
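A hypothetical converter, assuming each node exposes a name and a list of parent references as described in the question:

def to_dot(nodes, graph_name="debug"):
    # one edge per parent reference; quoting keeps unusual names safe
    lines = ["digraph %s {" % graph_name]
    for node in nodes:
        for parent in node.parents:
            lines.append('    "%s" -> "%s";' % (parent.name, node.name))
    lines.append("}")
    return "\n".join(lines)

Write the result to a file and render it with, for example, dot -Tpng graph.dot -o graph.png.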
Have you tried Netwulf? It takes a networkx.Graph object as input and launches an interactive d3-powered visualization in a separate browser window. The resulting image (and data) can then be posted back to Python for further processing.
Disclaimer: I'm a co-author of Netwulf.
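A minimal usage sketch, assuming a networkx graph G:

import networkx as nx
from netwulf import visualize

G = nx.barabasi_albert_graph(100, 2)
# opens the interactive browser view; returns the stylized network and config
stylized_network, config = visualize(G)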
Think about using existing graph libraries for your problem domain, e.g. NetworkX. Drawing can be done from there with matplotlib or pygraphviz.
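For example, a quick matplotlib rendering (edges here point from each node to its parent, mirroring the structure in the question):

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from([("a", "root"), ("b", "root"), ("c", "a")])
nx.draw(G, with_labels=True, node_color="lightblue")
plt.show()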
For bigger projects, you might also want to check out a graph database like Neo4j with its toolkit (and its own query language, Cypher) for working with Python.
GraphML is also a good interchange format; it can be useful with drawing tools like yEd in case you have small graphs and need some manual finishing.
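Exporting GraphML from NetworkX, for instance:

import networkx as nx

G = nx.DiGraph([("a", "root"), ("b", "root")])
nx.write_graphml(G, "network.graphml")  # open in yEd for manual layout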
I'm working on a modeling/reconstruction algorithm for point cloud data. So far I've been developing in Python, and been relatively happy with VPython for my visualization needs.
One problem I have is that VPython becomes quite slow when rendering a great many objects (at least on my non-3D-accelerated Linux laptop), making visual inspection of complicated models quite difficult.
I've been trying to use an external tool for visualization, but the problem is that I'm a bit lost in the sea of possible file formats and available tools. I've tried MeshLab, for instance, which works great for displaying point cloud data in simple ASCII formats, but I couldn't decide which compatible format to export my other types of geometry in, to superimpose on the point cloud layer.
Here are the requirements for my whole pipeline:
The point cloud data may contain millions of points, stored as simple x y z ASCII coordinates
The modeling primitives are primarily lines and cylinders (i.e. no polygons), numbered in the thousands
The visualization tool should ideally be cross-platform (it must run at least on Linux)
There should be a Python module for easy data import/export of the chosen file format (or the format is simple enough to write a simple converter, if not)
I've been googling a lot about this, so I have tentative answers for all of these, but none that is 100% satisfying in my context. Any help or advice would be greatly appreciated. Many thanks in advance!
I finally settled on Geomview: the viewer itself is powerful enough, and the many OOGL file formats it implements answer my needs. I use the .off format for point cloud data and .skel for my other modeling primitives. These file formats are also human-readable, which makes writing import/export functions easy.
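For the point-cloud part, a vertex-only OFF writer is only a few lines (a sketch; real OFF files can also carry faces, left empty here):

def write_off_points(path, points):
    # header, then "n_vertices n_faces n_edges", then one x y z line per point
    with open(path, "w") as f:
        f.write("OFF\n%d 0 0\n" % len(points))
        for x, y, z in points:
            f.write("%g %g %g\n" % (x, y, z))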
How about Panda3D? It's cross-platform, and it should be able to handle rendering millions of points as long as you have a decent graphics card.
I do a lot of statistical work and use Python as my main language. Some of the data sets I work with though can take 20GB of memory, which makes operating on them using in-memory functions in numpy, scipy, and PyIMSL nearly impossible. The statistical analysis language SAS has a big advantage here in that it can operate on data from hard disk as opposed to strictly in-memory processing. But, I want to avoid having to write a lot of code in SAS (for a variety of reasons) and am therefore trying to determine what options I have with Python (besides buying more hardware and memory).
I should clarify that approaches like map-reduce will not help in much of my work because I need to operate on complete sets of data (e.g. computing quantiles or fitting a logistic regression model).
Recently I started playing with h5py and think it is the best option I have found for allowing Python to act like SAS and operate on data from disk (via hdf5 files), while still being able to leverage numpy/scipy/matplotlib, etc. I would like to hear if anyone has experience using Python and h5py in a similar setting and what they have found. Has anyone been able to use Python in "big data" settings heretofore dominated by SAS?
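To illustrate the kind of pattern I mean, here is a rough sketch of chunk-wise processing with h5py (file and dataset names are made up):

import h5py

# stream a dataset far larger than RAM, one slab at a time
with h5py.File("big_data.h5", "r") as f:
    dset = f["measurements"]
    total, count = 0.0, 0
    for start in range(0, dset.shape[0], 1000000):
        chunk = dset[start:start + 1000000]  # only this slab is in memory
        total += chunk.sum()
        count += chunk.size
    print("mean:", total / count)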
EDIT: Buying more hardware/memory certainly can help, but from an IT perspective it is hard for me to sell Python to an organization that needs to analyze huge data sets when Python (or R, or MATLAB, etc.) needs to hold all the data in memory. SAS continues to have a strong selling point here because, while disk-based analytics may be slower, you can confidently deal with huge data sets. So, I am hoping that Stack Overflow-ers can help me figure out how to reduce the perceived risk around using Python as a mainstay big-data analytics language.
We use Python in conjunction with h5py, numpy/scipy and boost::python to do data analysis. Our typical datasets have sizes of up to a few hundred GBs.
HDF5 advantages:
data can be inspected conveniently using the h5view application, h5py/IPython and the h5* command-line tools
APIs are available for different platforms and languages
data can be structured using groups
data can be annotated using attributes
worry-free built-in data compression
I/O on single datasets is fast
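For instance, groups, attributes, and compression together look like this (a small sketch with made-up names):

import h5py
import numpy as np

with h5py.File("analysis.h5", "w") as f:
    run = f.create_group("run_001")                   # structure via groups
    d = run.create_dataset("samples", data=np.random.rand(1000000),
                           compression="gzip")        # built-in compression
    d.attrs["units"] = "volts"                        # annotation via attributes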
HDF5 pitfalls:
Performance breaks down if an h5 file contains too many datasets/groups (> 1,000), because traversing them is very slow; on the other hand, I/O is fast for a few big datasets.
Advanced (SQL-like) data queries are clumsy to implement and slow (consider SQLite in that case).
HDF5 is not thread-safe in all cases: one has to ensure that the library was compiled with the correct options.
Changing h5 datasets (resize, delete, etc.) blows up the file size (in the best case) or is impossible (in the worst case): the whole h5 file has to be copied to flatten it again.
I don't use Python for stats and tend to deal with relatively small datasets, but it might be worth a moment to check out the CRAN Task View for high-performance computing in R, especially the "Large memory and out-of-memory data" section.
Three reasons:
you can mine the source code of any of those packages for ideas that might help you generally
you might find the package names useful in searching for Python equivalents; a lot of R users are Python users, too
under some circumstances, it might prove convenient to just link to R for a particular analysis using one of the above-linked packages and then draw the results back into Python
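For the last point, rpy2 makes the round trip fairly painless (a sketch; the R-side call would be whatever the chosen package provides):

import rpy2.robjects as ro

# run an R snippet and pull the numeric result back into Python
ro.r('x <- rnorm(1000)')
quantiles = ro.r('quantile(x, c(0.25, 0.5, 0.75))')
print(list(quantiles))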
Again, I emphasize that this is all way out of my league, and it's certainly possible that you might already know all of this. But perhaps this will prove useful to you or someone working on the same problems.