How to use NetworkX algorithms with my custom graph data structure? (Python)

I have a graph database with a Gremlin query engine, and I don't want to change that API. The point of the library is to be able to study graphs that cannot fully fit in memory, and to maximize speed by not falling back to virtual memory.
The query engine is lazy: it will not fetch an edge or vertex until it is required or requested by the user. Otherwise it only uses indices to traverse the graph.
NetworkX has a different API. What can I do to reuse the NetworkX graph algorithm implementations with my graph?

You are talking about extending your graph API.
Hopefully the code translates from one implementation to the other, in which case copy-pasting from the algorithms section might work for you (check the licenses first).
If you want to keep using existing code going forward, you could write a middle layer or adapter class to bridge the two APIs, as sketched below.
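As a rough illustration, here is a minimal adapter sketch. It assumes the custom store exposes a lazy neighbors(vertex) call (a hypothetical name; substitute your Gremlin traversal) and materializes only a bounded neighborhood into a NetworkX graph, so the full graph never has to fit in memory at once:

    import networkx as nx

    def to_networkx(store, seed, depth=2):
        """Copy the depth-hop neighborhood around `seed` into an nx.Graph."""
        G = nx.Graph()
        G.add_node(seed)
        frontier = {seed}
        for _ in range(depth):
            nxt = set()
            for v in frontier:
                for u in store.neighbors(v):  # hypothetical lazy traversal call
                    if u not in G:
                        nxt.add(u)
                    G.add_edge(v, u)
            frontier = nxt
        return G

Any NetworkX algorithm can then run on the returned slice, e.g. nx.shortest_path(to_networkx(store, v0, depth=3), v0, target), at the cost of copying that slice into memory.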
If the source code doesn't line up then NetworkX has copious notes about the algorithms used and the underpinning mathematics at the bottom of the help pages and the code itself.
For the future:
Maybe you could open-source it and get some traction with others who see the traversal engine as a good piece of engineering, in which case you would have help maintaining and extending your work. Good luck.

Related

How to Implement a Mass Spring Damper Filter in Python

For context, I am working on a project that involves controlling a cursor with eye-tracking data. To achieve this, I am imitating the techniques described in the paper below:
https://iopscience.iop.org/article/10.1088/1741-2560/11/5/056026
So far I have successfully implemented the Kalman filter calibration process described in sections 2.2.1 and 2.2.2. The focus of my question is section 2.2.3, which briefly describes an additional filter modeled after a mass-spring-damper system:
I would like to implement this filter for my project. However, the paper does not provide any additional details or references, and frankly I'm not sure where to start. I have found some resources for simulating mass-spring-damper systems in Python, particularly the document below:
https://www.halvorsen.blog/documents/programming/python/resources/powerpoints/Mass-Spring-Damper%20System%20with%20Python.pdf
My problem with the examples in this document is that they each take a description of the system and an array of time steps as inputs, and return the state of the system at those time steps as outputs. I need something that behaves like a filter: it takes a single state as input and returns a filtered state as output. I'm not sure how to adapt the examples in this document to behave this way, or whether it even makes sense to try.
Could someone please point me in the right direction? Links to resources, useful python library recommendations, or example code would be much appreciated.
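One common direction (not from the paper, just a standard discretization) is to integrate the mass-spring-damper ODE one time step per incoming sample, keeping the position/velocity state between calls. A minimal sketch, where the constants m, c, k and the step dt are placeholder assumptions to tune:

    class MassSpringDamperFilter:
        """Treat m*x'' + c*x' + k*x = k*u as a low-pass filter on input u."""

        def __init__(self, m=1.0, c=2.0, k=10.0, dt=0.01):
            self.m, self.c, self.k, self.dt = m, c, k, dt
            self.x = 0.0  # filtered position (the output)
            self.v = 0.0  # velocity (internal state kept between calls)

        def step(self, u):
            """Feed one raw sample u, get one filtered sample back."""
            a = (self.k * (u - self.x) - self.c * self.v) / self.m
            self.v += a * self.dt  # semi-implicit Euler update
            self.x += self.v * self.dt
            return self.x

    f = MassSpringDamperFilter()
    smoothed = [f.step(raw) for raw in (0.0, 1.0, 1.0, 1.0, 1.0)]

For a 2D cursor you would run one such filter per axis (or vectorize the state).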

What is the maximum number of nodes and edges with attributes that graphs generated by the NetworkX package can handle?

I am writing code in Python to analyze social networks with node and edge attributes. Currently, I am using the NetworkX package to generate the graphs. Is there any limit to the size (in terms of the number of nodes and edges) of the graph which can be generated using this package?
I am new to coding social network problems in Python and have recently come across another package called NetworKit for large networks, but am not sure at what size NetworKit becomes the better option. Could you please elaborate on the differences in performance and functionality between the two packages?
Thanks in advance for your reply.
My suggestion:
Start with NetworkX, as it has a bigger community and is well maintained and documented... and best of all, you can easily understand what it does, since it's written 100% in Python.
It's true that it's not exactly fast, but it will be fast enough for most calculations. If you are running computations on your laptop, it can be slow for intensive calculations (e.g. the sigma/omega small-worldness metrics) on big networks (> 10k nodes and > 100k edges).
If you need to speed things up, you can easily incorporate NetworKit into your code, as it integrates very easily with NetworkX and pandas, but it has a much more limited library of algorithms; see the conversion sketch below.
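For instance, a minimal hand-off sketch, assuming NetworKit's nx2nk converter and Betweenness class (worth checking against the current NetworKit API):

    import networkx as nx
    import networkit as nk

    G = nx.erdos_renyi_graph(10_000, 0.001, seed=42)

    # Convert to a NetworKit graph; node labels become consecutive integers.
    nkG = nk.nxadapter.nx2nk(G)

    # Run a computation that would be slow in pure Python.
    bc = nk.centrality.Betweenness(nkG)
    bc.run()
    print(bc.ranking()[:5])  # top-5 nodes by betweenness centrality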
Compare yourself:
NetworkX algorithms: https://networkx.github.io/documentation/stable/reference/algorithms/index.html
VS
NetworKit algorithms: https://networkit.github.io/dev-docs/python_api/modules.html
Is there any limit to the size (in terms of the number of nodes, edges) of the graph which can be generated using this package?
No, there is no limit; it all depends on your machine's memory capacity.
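If you want a rough feel for the cost, you can measure it yourself; a small sketch (the graph size here is an arbitrary choice):

    import tracemalloc
    import networkx as nx

    tracemalloc.start()
    G = nx.gnm_random_graph(100_000, 500_000, seed=1)
    current, _peak = tracemalloc.get_traced_memory()
    print(f"~{current / G.number_of_edges():.0f} bytes per edge (rough)")

Because NetworkX stores adjacency as nested Python dicts, expect a substantial per-edge overhead; that overhead, not any hard limit in the library, is what bounds graph size in practice.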
could you please elaborate on difference in performance and functionality between the two packages?
I personally don't have any experience with NetworKit; however, here (by Timothy Lin) you can find a very good benchmarking analysis of different tools, including NetworkX and NetworKit. Check out its conclusion section:
As for recommendations on which package people should learn, I think picking up networkx is still important as it makes network science very accessible with a wide range of tools and capabilities. If analysis starts being too slow (and maybe that’s why you are here) then I will suggest taking a look at graph-tool or networkit to see if they contain the necessary algorithms for your needs.

6 million markers in a Folium/Leaflet map

With the MarkerCluster algorithm it's possible to cluster nearby markers together, so the map remains visually very acceptable.
However, I found that the performance and responsiveness of the Leaflet map decrease as the number of markers inside it grows.
I still don't fully understand it, but I found people talking about a server-side clustering solution instead of client-side clustering.
This durable module project is a solution for big numbers of markers that uses this concept (server-side clustering) in a Leaflet map.
My questions are:
How is it done in a Leaflet map?
How can this solution be implemented in Python with Folium maps?
Server-side clustering can be accomplished with XHR requests.
The simplest approach would be to divide your map into squares, and have it switch between single-feature layers and substitute GeoJSON/JSON layers using a MAP.on('zoomend', function(e){}); event.
For example, if jQuery is available, you can call $.getJSON(SERVER_SIDE_URL, {VARIABLE: 'VALUE'}, function(data){}); on zoomend. Here the anonymous function receives the response data. You can use this data to create a substitute LayerGroup, or a single Layer, while keeping track of and destroying its predecessor.
The server side will need access to the full dataset, and must be able to either provide JSON for a single feature abstracting those nearby, or a set of features within the radius/square radius of a placeholder.
That's the outline of one option; a Python sketch of the server side follows below. Alternatively, there may be market-ready solutions, but writing your own should produce a more efficient solution for such a simple task.
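To make the server side concrete in Python, here is a hypothetical Flask endpoint that bins points into a grid whose cell size shrinks with the zoom level and returns one aggregated GeoJSON feature per occupied cell (the endpoint name, parameters, and random data are all assumptions):

    import random
    from collections import defaultdict
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    # Stand-in dataset: 100k random (lat, lon) points loaded at startup.
    POINTS = [(random.uniform(-60, 60), random.uniform(-180, 180))
              for _ in range(100_000)]

    @app.route("/clusters")
    def clusters():
        zoom = int(request.args.get("zoom", 5))
        cell = 360.0 / (2 ** zoom)  # coarser grid at lower zoom levels
        bins = defaultdict(list)
        for lat, lon in POINTS:
            bins[(round(lat / cell), round(lon / cell))].append((lat, lon))
        features = []
        for members in bins.values():
            lat = sum(p[0] for p in members) / len(members)
            lon = sum(p[1] for p in members) / len(members)
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"count": len(members)},
            })
        return jsonify({"type": "FeatureCollection", "features": features})

The zoomend handler described above would then fetch /clusters?zoom=N and rebuild the marker layer from the returned features.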
I found an open-source solution for Leaflet and Mapbox:
it is the SuperCluster project, created by the creator of Leaflet.
It offers server-side clustering with Node.js and client-side clustering with Mapbox.
The concept behind this algorithm is explained here.

Finding the Path of all Edges on a Graph

I'm trying to get the path on a graph which covers all edges, and traverses them only once.
This means there will be only two "end" points, which will have an odd number of attached edges. These end points would either have one connecting edge, or be part of a loop and have 3 connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks almost like what is needed, but the networkx implementation only works for closed (looped) networks.
I also looked at a Hamiltonian Path - and tried the networkx algorithm - but the graph types were not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for an Eulerian path, which visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity; if you need a more efficient algorithm, check Hierholzer's algorithm, which is O(E) instead.
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().
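A quick check with a recent NetworkX, using the simple 1-2-3-4-5 case from the question:

    import networkx as nx

    G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 5)])

    if nx.has_eulerian_path(G):
        print(list(nx.eulerian_path(G)))
        # e.g. [(1, 2), (2, 3), (3, 4), (4, 5)]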

Should I use advanced GeoDjango libraries for one simple calculation?

I am starting a web app in Django which must provide one simple feature: get all records from the DB which are close enough to another record.
For example: I am at lat/lng (50, 10), and I need to get all records with a lat/lng closer than 5 km to me.
I found Django's geographic extension, GeoDjango, but it comes with a lot of other dependencies and libraries like GEOS, PostGIS, and other stuff which I don't really need. I need only this one range functionality.
So should I use GeoDjango, or just write my own range-calculation query?
Most definitely do not write your own. As you get more familiar with geographic data you will realize that this particular calculation isn't at all simple; see for example this question for a detailed discussion. However, most of the solutions (answers) given in that question only produce approximate results, partly due to the fact that the earth is not a perfect sphere.
On the other hand, if you use the geospatial extensions for MySQL (5.7 onwards) or PostgreSQL, you can make use of the ST_DWithin function.
ST_DWithin — Returns true if the geometries are within the specified distance of one another. For geometry, units are those of the spatial reference; for geography, units are in meters and measurement defaults to use_spheroid=true (measure around the spheroid); for a faster check, use use_spheroid=false to measure along the sphere.
ST_DWithin makes use of spatial indexes, which home-made solutions cannot. When GeoDjango is enabled, ST_DWithin becomes available as a filter on Django querysets in the form of dwithin, as sketched below.
Last but not least, if you write your own code, you will have to write a lot of code to test it too, whereas dwithin is thoroughly tested.
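A minimal sketch of the dwithin lookup, assuming a hypothetical model with a geography PointField named location (model and field names are invented for illustration):

    from django.contrib.gis.db import models
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D

    class Place(models.Model):
        name = models.CharField(max_length=100)
        # geography=True stores a meters-based geography column in PostGIS.
        location = models.PointField(geography=True)

    # All places within 5 km of lat 50, lng 10.
    # Note: GEOS points take (x=lng, y=lat). Distance objects like D(km=5)
    # work with geography fields; plain geometry fields instead expect the
    # field's own units (degrees for WGS84).
    me = Point(10, 50, srid=4326)
    nearby = Place.objects.filter(location__dwithin=(me, D(km=5)))

Behind the scenes this compiles to an ST_DWithin call, so the spatial index is used.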
