Networkx spring layout edge weights - python

I was wondering how spring_layout takes edge weight into account. From wikipedia,
'An alternative model considers a spring-like force for every pair of nodes (i,j) where the ideal length \delta_{ij} of each spring is proportional to the graph-theoretic distance between nodes i and j, without using a separate repulsive force. Minimizing the difference (usually the squared difference) between Euclidean and ideal distances between nodes is then equivalent to a metric multidimensional scaling problem.'
How is edge weight factored in, specifically?

This isn't a great answer, but it gives the basics. Someone else may come by who actually knows the Fruchterman-Reingold algorithm and can describe it. I'm giving an explanation based on what I can find in the code.
From the documentation,
weight : string or None optional (default=’weight’)
The edge attribute that holds the numerical value used for the edge weight. If None, then all edge weights are 1.
But that doesn't tell you what it does with the weight, which is your question.
You can find the source code. If you send in weighted edges, it will create an adjacency matrix A with those weights and pass A to _fruchterman_reingold.
Looking at the code there, the meat of it is in this line
displacement=np.transpose(np.transpose(delta)*\
(k*k/distance**2-A*distance/k)).sum(axis=1)
The A*distance term calculates how strong a spring force is acting on the node. A larger value in the corresponding entry of A means a relatively stronger attractive force between those two nodes (or, if they are very close together, a weaker repulsive force). The algorithm then moves the nodes according to the direction and strength of the forces, and repeats (50 iterations by default). Interestingly, if you look at the source code you'll notice a t and dt: at each iteration the step is scaled by a smaller and smaller factor, so the moves shrink as the layout settles.
Here is a link to the paper describing the algorithm, which unfortunately is behind a paywall. Here is a link to the paper on the author's webpage
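For what it's worth, here is a small sketch showing the weight parameter being picked up; the graph and the weight values are made up purely for illustration.
import networkx as nx

G = nx.Graph()
G.add_edge("a", "b", weight=5.0)   # stronger attraction: a and b tend to end up close
G.add_edge("a", "c", weight=0.1)   # weak attraction: c tends to drift further away
G.add_edge("b", "c", weight=0.1)

# spring_layout reads the 'weight' attribute by default and builds the adjacency
# matrix A from it before running the Fruchterman-Reingold iterations.
pos = nx.spring_layout(G, weight="weight", iterations=50, seed=42)
print(pos)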

Related

Find intersection of 2 layers containing multiple lines (in python)

I have 2 layers with links and nodes: layer A (yellow) and layer B (blue).
I would like to get the places where the lines of layer A intersect with the lines of layer B (red nodes), directly in python.
I have the coordinates for all nodes in both layers (the nodes of layers A and B are hidden in the image below).
I saw this option to find line intersections in Python, but since layer A has approx. 23,000 lines and layer B 50,000, it would be too computationally intensive to use it:
from shapely.geometry import LineString
line1 = LineString([(x1,y1), (x2,y2), (x3,y3)])
line2 = LineString([(x4,y4), (x5,y5)])
output = line1.intersection(line2)
Does anyone know a better (faster) way to get these intersection nodes?
Thanks a lot!
Okay, I can see what you mean. I will try to explain a method for this. You could easily do it by brute force, but as you mentioned that is time consuming, especially when there are thousands of nodes and edges. I can suggest a less time-consuming method.
Say there are N lines in layer A and M lines in layer B. Your method then has time complexity O(N*M).
My method is moderately complex. I can't implement the code here, so I will describe the steps as best I can and you will have to figure out how to implement them. The con: it may miss some intersections.
The idea is localization: we select (relatively) small windows from the layers and perform the same shapely test you already have, only within each window.
First we have to determine the window size to use:
Determine the size of the rectangle that contains all the nodes in both layers.
Divide it into small squares of equal size (like a square grid).
Build a zero matrix with one entry per square.
Count the nodes in each square and assign the count to the corresponding matrix element. This has time complexity O(N+M).
Now you have the node density.
If the density of a tile is high, use a small window (3x3 would be enough), with the selected tile in the middle. Since the nodes are close together, there is little chance of missing intersections. Then perform your method on the lines inside the selected window, which has time complexity O(n*m), where n and m are the numbers of lines inside that window.
If the density of a tile is low, use a large window (5x5, 7x7, 9x9, ... you can decide), again with the selected tile in the middle. Since the nodes are far apart, there is still little chance of missing intersections. Perform your method on the lines inside the selected window; the time complexity is again O(n*m) for the n and m lines inside that window.
How does this reduce the time? Lines far away from the selected window are never compared against each other. This is far more efficient than your previous brute-force approach, although a little less accurate.
ATTENTION: This is my way of selecting fewer candidates for comparison. There may be methods that are both faster and more accurate, but most of them will be quite complex. If you are worried about accuracy, use your original method or look for one of those; otherwise you can use mine.
IN ADDITION: You can determine the window size from the line length instead of the node density, so there is no need to draw a grid: select a large window around a long line and a small window around a short one. This is also fast, and I think it will be more accurate than my grid-based method.
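To make the grid idea more concrete, here is a rough sketch in code, assuming both layers are plain lists of shapely LineStrings. The cell size and the bucketing rule (registering each line in every grid cell its bounding box touches, so no pair is missed at the cost of a few duplicate tests) are my own illustrative choices, not part of the answer above.
from collections import defaultdict
from shapely.geometry import LineString

def grid_intersections(lines_a, lines_b, cell_size=100.0):
    """Test only pairs of lines whose bounding boxes share a grid cell."""
    buckets = defaultdict(lambda: ([], []))  # cell -> (lines from A, lines from B)

    def cells_for(line):
        minx, miny, maxx, maxy = line.bounds
        for cx in range(int(minx // cell_size), int(maxx // cell_size) + 1):
            for cy in range(int(miny // cell_size), int(maxy // cell_size) + 1):
                yield (cx, cy)

    for line in lines_a:
        for cell in cells_for(line):
            buckets[cell][0].append(line)
    for line in lines_b:
        for cell in cells_for(line):
            buckets[cell][1].append(line)

    points = set()
    for group_a, group_b in buckets.values():
        for la in group_a:
            for lb in group_b:
                inter = la.intersection(lb)
                if inter.is_empty:
                    continue
                # the result may be a Point, a MultiPoint, or a collinear overlap
                for geom in getattr(inter, "geoms", [inter]):
                    if geom.geom_type == "Point":
                        points.add((geom.x, geom.y))
    return points

# Tiny usage example with made-up coordinates:
a = [LineString([(0, 0), (10, 10)])]
b = [LineString([(0, 10), (10, 0)])]
print(grid_intersections(a, b))  # {(5.0, 5.0)}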

Add extra cost depending on length of path

I have a graph/network that obviously consists of some nodes and some edges. Each edge has a weight attached to it, or in this case a cost. Each edge also has a distance attached to it AND a type. So basically the weight/cost is pre-calculated from the distance of the edge along with some other metrics, for both types of edges.
However, in my case I would like some additional cost to be added for, let's say, every 100 distance covered, but only for one type of edge. I'm not even certain whether it is possible to add extra cost depending on the sum of the previous steps in the path in algorithms such as Dijkstra's?
I know I could just spread the cost over each distance unit and thus get a rough estimate. The problem there is the edge cases: at distance 199 the cost would be almost double compared to adding the cost exactly at each 100 distance, i.e. adding it at 100 and 200.
But maybe there are other ways to get around this?
I think you cannot implement this using Dijkstra, because you would violate the invariant that is needed for correctness (see e.g. Wikipedia). In each step, Dijkstra builds on this invariant, which more or less states that all "already found paths" are optimal, i.e. shortest. To show that it does not hold in your case of "additional cost by edge type and covered distance", let's have a look at a counterexample:
Counterexample against Usage of Dijkstra
Assume we have two types of edges, the first type (->) and the second type (=>). The second type incurs an additional cost of 10 once more than 10 distance has been covered on edges of that type. Now, take the following graph with the following edges
start -1-> u_1
u_1 -1-> u_2
u_2 -1-> u_3
...
u_6 -1-> u_7
u_7 -1-> v
start =7=> v
v =4=> end
When we play that through with Dijkstra (I skip all intermediate steps), with start as the start node and end as the target, we will first settle the path start =7=> v. This path has a length of 7, which is shorter than the "detour" start -1-> u_1 -1-> ... -1-> u_7 -1-> v, which has a length of 8. However, in the next step we have to take the edge v =4=> end, which brings the first path to a total of 21 (11 original + 10 penalty, since more than 10 distance lies on second-type edges). The detour path, on the other hand, now becomes the shorter one with a length of 12 = 8 + 4 (only 4 of it on second-type edges, so no penalty).
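To make those numbers concrete, here is a tiny sketch that just recomputes the two path costs under the penalty rule assumed above (an extra 10 once more than 10 distance has been covered on second-type edges); the helper function is mine.
def path_cost(edges, threshold=10, penalty=10):
    """edges: list of (distance, edge_type) pairs along a path."""
    total = sum(d for d, _ in edges)
    type2_distance = sum(d for d, t in edges if t == 2)
    if type2_distance > threshold:
        total += penalty            # penalty kicks in once type-2 distance exceeds 10
    return total

direct = [(7, 2), (4, 2)]            # start =7=> v =4=> end
detour = [(1, 1)] * 8 + [(4, 2)]     # start -> u_1 -> ... -> u_7 -> v, then v =4=> end
print(path_cost(direct))             # 21: cheaper up to v (7 < 8), but pays the penalty
print(path_cost(detour))             # 12: the detour wins once the penalty is counted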
In short, Dijkstra is not applicable - even if you modify the algorithm to take the "already found path" into account for retrieving the cost of next edges.
Alternative?
Maybe you can build your algorithm around a variant of Dijkstra that retrieves multiple (suboptimal) solutions. First, you would need to extend Dijkstra so that it takes the already-found path into account (in the relevant function, replace cost = weight(v, u, e) with cost = weight(v, u, e, paths[v]) and write a suitable function that computes the penalty from the previous path and the considered edge). Afterwards, remove edges from your original optimal solution and repeat the procedure to find an alternative shortest path. However, I see no easy way of selecting which edge to remove from the graph (besides those of your penalty type), and the runtime complexity is probably awful.
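A minimal sketch of that path-aware modification, written as a standalone search rather than as a patch to any library internals; as argued above, the usual optimality guarantee no longer holds, so treat it as a heuristic.
import heapq

def path_aware_search(graph, source, target, cost_fn):
    """graph: dict mapping node -> list of (neighbour, edge_data) pairs.
    cost_fn(u, v, edge_data, path_so_far) returns the cost of taking that edge,
    so a penalty depending on the distance already covered can be applied
    (for example, a flat surcharge once the special edge type passes a threshold)."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == target:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, edge_data in graph.get(node, []):
            if neighbour in settled:
                continue
            step = cost_fn(node, neighbour, edge_data, path)
            heapq.heappush(heap, (dist + step, neighbour, path + [neighbour]))
    return None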

Is there a way that gives shortest path using Floyd-Warshall's algorithm where negative weight cycle exists whereas overlapped edges are not allowed?

We know that the result of the Floyd-Warshall algorithm is invalid if a negative-weight cycle appears in the graph; that is because travelling around the negative-weight cycle multiple times makes the weight sum arbitrarily small. However, if we specify that no edge may be travelled more than once, the weight sum is well defined. I want to know a way to produce the least weight sum under this condition. I have tried some modifications of the algorithm (including skipping the loop when the weight sum from some vertex to itself is negative), but the predecessor matrix was still weird and the weight-sum matrix was totally useless (by chance I learned that exponentially increasing values in it would inevitably occur, see link).
An efficient solution to that problem would imply P=NP, so there almost certainly isn't such a solution.
With a polynomial-time solution to your problem, you could solve the longest trail problem by setting all edge weights to -1 and asking for your shortest no-repeated-edge path between two nodes.
As proven by Marzio De Biasi in the linked post, a solution to the longest trail problem can be used to solve the Hamiltonian cycle problem on grid graphs of max degree 3, by connecting two new nodes to the top-left node and asking for the longest trail.
The Hamiltonian cycle problem is still NP-complete when restricted to grid graphs of max degree 3, as proven in Christos H Papadimitriou, Umesh V Vazirani, On two geometric problems related to the travelling salesman problem, Journal of Algorithms, Volume 5, Issue 2, June 1984, Pages 231-246, ISSN 0196-6774.
Thus, your problem is NP-hard.

Appropriate encoding using Particle Swarm Optimization

The Problem
I've been doing a bit of research on Particle Swarm Optimization, so I said I'd put it to the test.
The problem I'm trying to solve is the Balanced Partition Problem, or, reduced, simply the Subset Sum Problem (where the target sum is half of the sum of all the numbers).
It seems the generic formula for updating particle velocities is the usual one, v_i(t+1) = w*v_i(t) + c1*r1*(pbest_i - x_i(t)) + c2*r2*(gbest - x_i(t)), but I won't go into too much detail for this question.
Since there's no PSO attempt online for the Subset Sum Problem, I looked at the Travelling Salesman Problem instead.
Their approach for updating velocities involved taking sets of visited towns, subtracting one from another and doing some manipulation on that.
I saw no relation between that and the formula above.
My Approach
So I scrapped the formula and tried my own approach to the Subset Sum Problem.
I basically used gbest and pbest to determine the probability of removing or adding a particular element to the subset.
i.e. if my problem space is [1,2,3,4,5] (target is 7 or 8), my current particle (subset) is [1,None,3,None,None], and the gbest is [None,2,3,None,None], then there is a higher probability of keeping 3, adding 2 and removing 1, based on gbest.
I can post code but don't think it's necessary, you get the idea (I'm using python btw - hence None).
So basically, this worked to an extent: I got decent solutions out, but it was very slow on larger data sets and values.
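Purely for illustration, here is one way to read the update rule described above as code; the particle representation follows the question, while the mixing probabilities w_g and w_p are hypothetical names and values of mine.
import random

def update_particle(particle, pbest, gbest, w_g=0.5, w_p=0.3):
    """For each slot, keep the current value or, with some probability, copy the
    corresponding slot from gbest or pbest (None meaning 'not in the subset')."""
    new_particle = []
    for current, personal, global_best in zip(particle, pbest, gbest):
        r = random.random()
        if r < w_g:
            new_particle.append(global_best)   # pulled towards the swarm's best
        elif r < w_g + w_p:
            new_particle.append(personal)      # pulled towards this particle's best
        else:
            new_particle.append(current)       # keep the current choice
    return new_particle

# With the example above, 3 tends to stay, 2 tends to be added, and 1 removed:
print(update_particle([1, None, 3, None, None],
                      [1, None, 3, None, None],
                      [None, 2, 3, None, None]))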
My Question
Am I encoding the problem and updating the particle "velocities" in a smart way?
Is there a way to determine if this will converge correctly?
Is there a resource I can use to learn how to create convergent "update" formulas for specific problem spaces?
Thanks a lot in advance!
Encoding
Yes, you're encoding this correctly: each of your bit-maps (that's effectively what your 5-element lists are) is a particle.
Concept
Your conceptual problem with the equation is because your problem space is a discrete lattice graph, which doesn't lend itself immediately to the update step. For instance, if you want to get a finer granularity by adjusting your learning rate, you'd generally reduce it by some small factor (say, 3). In this space, what does it mean to take steps only 1/3 as large? That's why you have problems.
The main possibility I see is to create 3x as many particles, but then have the transition probabilities all divided by 3. This still isn't very satisfying, but it does simulate the process somewhat decently.
Discrete Steps
If you have a very large graph, where a high velocity could give you dozens of transitions in one step, you can utilize a smoother distance (loss or error) function to guide your model. With something this small, where you have no more than 5 steps between any two positions, it's hard to work with such a concept.
Instead, you utilize an error function based on the estimated distance to the solution. The easy one is the absolute difference between the particle's total and the nearer of 7 or 8. A harder one is to estimate distance based on that difference and the particle elements "in play".
Proof of Convergence
Yes, there is a way to do it, but it requires some functional analysis. In general, you want to demonstrate that the error function is convex over the particle space. In other words, you'd have to prove that your error function is a reliable distance metric, at least as far as relative placement goes (i.e. prove that a lower error does imply you're closer to a solution).
Creating update formulae
No, this is a heuristic field, based on shape of the problem space as defined by the particle coordinates, the error function, and the movement characteristics.
Extra recommendation
Your current allowable transitions are "add element" and "delete element".
Add "swap elements" to these: trade one present member for an absent one. This will allow the trivial error function to define a convex space for you, and you'll converge in very little time.

What's the right algorithm for finding isolated subsets

Picture is worth a thousand words, so:
My input is the matrix on the left, and what I need to find is the sets of nodes that are at most one step away from each other (not diagonally). A node that is more than one up/down/left/right step away would be in a separate set.
So, my plan was running a BFS from every node I find, returning the set it traversed through, removing it from the original set, and iterating this process until I'm done. But then I had the wild idea of looking for graph analysis tools, and I found NetworkX. Is there an easy way (algorithm?) to achieve this without manually writing BFS and traversing the whole matrix?
Thanks
What you are trying to do is search for "connected components", and NetworkX has a method for doing exactly that, as can be seen in the first example on the corresponding documentation page and as others have already pointed out in the comments.
Reading your question, it seems that your nodes sit on a discrete grid, and the notion of "connected" you describe is the same one used for the pixels of an image.
Connected-components algorithms are available both for graphs and for images.
If performance matters in your case, I would suggest you go for the image version of connected components. This is because images (grids of pixels) are a specific class of graphs, so the connected-components algorithms that deal with grids of nodes are built knowing the topology of the graph itself (the graph is planar and the maximum vertex degree is four). A general algorithm has to be able to work on arbitrary graphs (which may be non-planar, with multiple edges between some nodes), so it has to spend more work because it cannot assume much about the properties of the input graph.
Since connected components can be found on graphs in linear time, I am not saying the image version would be orders of magnitude faster; there will only be a constant factor between the two.
For this reason you should also take into account which data structure holds your input data and how much time will be spent creating the input structures required by each version of the algorithm.
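As a rough sketch of both routes, assuming the input is a 0/1 NumPy matrix like the one in the question (the example matrix and variable names are mine):
import numpy as np
import networkx as nx
from scipy import ndimage

matrix = np.array([[1, 1, 0, 0],
                   [0, 1, 0, 1],
                   [0, 0, 0, 1],
                   [1, 0, 0, 1]])

# Graph route: one node per non-zero cell, edges between 4-neighbours.
G = nx.Graph()
rows, cols = matrix.shape
for i in range(rows):
    for j in range(cols):
        if matrix[i, j]:
            G.add_node((i, j))
            if i > 0 and matrix[i - 1, j]:
                G.add_edge((i, j), (i - 1, j))
            if j > 0 and matrix[i, j - 1]:
                G.add_edge((i, j), (i, j - 1))
print(list(nx.connected_components(G)))   # 3 connected regions in this example

# Image route: scipy's connected-components labelling (4-connectivity by default).
labels, num = ndimage.label(matrix)
print(num)      # 3 again
print(labels)   # same regions, encoded as integer labels per cell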
