Pruning in Alpha Beta: Ordering - python

I'm trying to implement move ordering in chess to gain the maximum benefit from alpha-beta pruning.
Right now, I'm starting from the root node, expanding it one level, evaluating all of those successors through my evaluation function, sorting them in ascending order (least advantageous to most, figuring the answer is somewhere in the middle), and passing the nodes in that order to the alpha-beta algorithm. However, my results are roughly 60% slower than if I just pass the nodes in the order they appear on the board. I'm only evaluating from the initial state, but at a high level, is this the correct way that move ordering is implemented?
FWIW, the way I've implemented my move ordering is as follows (maybe the inefficiency comes from how the list is being generated before it is even passed to the alpha-beta function?)
scores = []
for state in states:
    scores.append((state, Board(state, player).score()[1]))  # a tuple of (board, score)
scores.sort(key=lambda pair: abs(pair[1]))  # sort the boards by score in ascending order
output = [pair[0] for pair in scores]  # only return the boards, which are then passed to the alpha-beta algorithm
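For context, here is a minimal sketch of how the ordered successors feed into my search; Board, score(), successors(), and order_moves (the sort above wrapped as a function) are stand-ins for my actual implementation:

def alphabeta(board, depth, alpha, beta, maximizing):
    # Sketch only: board.successors() and board.score() stand in for the
    # real move generation and evaluation functions.
    if depth == 0:
        return board.score()[1]
    if maximizing:
        best = float('-inf')
        for child in order_moves(board.successors()):  # the sorting shown above
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff: no remaining sibling can improve the result
        return best
    else:
        best = float('inf')
        for child in order_moves(board.successors()):
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break  # alpha cutoff
        return best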
Thanks very much!

Related

topological sort of a graph with emphasis on depth first search and a given order of out edges

I have a directed acyclic graph and have specific requirements on the topological sort:
Depth first search: I want each branch to reach an end before a new branch is added to the sorting
Several nodes have multiple outgoing edges. For those nodes I have a sorted list of successor nodes that is to be used when choosing which node to continue the sorting with.
Example:
When a node n is reached that has three successors m1, m2, m3, each of which would be a valid option to continue with, I would provide a list such as [m3, m1, m2] to indicate that the sorting should continue with node m3.
I am using networkx. I thought about iterating through the nodes with
sorting = []
for n in dfs_edges(dag, source='root'):
    sorting.append(n[0])
Or I could use the method dfs_preorder_nodes, but I have not found a way to make it use the list.
Any hints?
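In case it helps clarify what I'm after, here is a sketch of the traversal I have in mind, where successor_order is a hypothetical dict mapping a node to the preferred ordering of its successors, e.g. {n: [m3, m1, m2]} (note that a preorder DFS is not guaranteed to yield a valid topological order for every DAG, so the result may need verifying):

import networkx as nx

def ordered_dfs(dag, source, successor_order):
    # Preorder DFS that follows each branch to its end and visits the
    # successors of a node in the order given by successor_order.
    sorting = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        sorting.append(node)
        for nxt in successor_order.get(node, list(dag.successors(node))):
            visit(nxt)

    visit(source)
    return sorting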

How to improve the performance of this algorithm?

I have the following algorithm:
I have a graph and, related to it, a topological sorting (in graph theory, "a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering").
Given a start_position and an end_position (different from the start position), I want to verify whether shifting the element of the list at start_position to end_position preserves the topological order, i.e., whether after the shift I still have a topological order.
There are two cases : left_shift (if start_position > end_position) and right_shift (otherwise).
Here is my attempt:
from typing import List

def verify(from_position: int, to_position: int, node_list: List[str], instance: pb.Problem):
    if from_position < to_position:
        # right shift
        for task_temp in node_list[from_position+1:to_position+1]:
            if (node_list[from_position], task_temp) in instance.all_predecessors:
                return False
        return True
    if to_position < from_position:
        # left shift
        for task_temp in node_list[to_position:from_position]:
            if (task_temp, node_list[from_position]) in instance.all_predecessors:
                return False
        return True
PS: all_predecessors is a set of 2-tuples containing all the edges of the graph.
Is there a way to make it faster?
The naive approach is asymptotically optimal: Just run through the (new) ordering and verify that it satisfies the topological criteria. You can do this by maintaining a bitfield of the nodes encountered so far, and check that each new node’s predecessors are set in the bitfield. This takes linear time in the number of nodes and edges, which any correct algorithm will need in the worst case.
For other variants of the problem (e.g. measuring in the size of the shift, or optimizing per-query time after preprocessing) there might be better approaches.
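As a minimal sketch of that linear-time check, using a Python set in place of the bitfield and the (u, v) edge-set format from the question:

from collections import defaultdict

def is_topological(order, edges):
    # edges: set of (u, v) tuples meaning u must precede v
    preds = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
    seen = set()  # nodes encountered so far (plays the role of the bitfield)
    for node in order:
        if not preds[node] <= seen:  # some predecessor has not been seen yet
            return False
        seen.add(node)
    return True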

Selecting items with highest value in a list with the right order

I am solving a knapsack problem using a branch and bound algorithm I am working on right now. In the algorithm, I wanted to start by selecting the items with the highest density (value/weight). I created a list named "density" and made the necessary calculations. I need to pick the maximum value from that list each time. But every time I try, the order gets mixed up. I need to update the variable "a" because every time I delete an item the list gets one smaller, but I couldn't figure out how to update it. I need help with selecting the items in the right order.
weight, value, density are lists. capacity and room are integer values given in the problem.
This is the density list.
What I want is to get the index of the maximum item in this list, then subtract its "weight" from the "capacity" in order to find how much "room" is left, and add its "value" to "highest" in order to reach the highest value that could be added to the knapsack. After doing this for the first item, I want to iterate until no or very little room is left.
def branch_n_bound(value, weight, capacity):
    global highest, size
    size = 0
    room = capacity
    n = len(value)
    density = [0] * n
    highest = 0
    for i in range(n):
        density[i] = value[i] / weight[i]
    for i in range(n):
        a = density.index(max(density))
        if weight[a] <= room:
            room -= weight[a]
            highest += value[a]
            size += weight[a]
            taken[a] = 1
            del density[a], weight[a], value[a]
        else:
            break
I think the problem you try to solve can be solved easier with a change in data structure. Instead of building the density array, you can build an array of tuples [(density, weight, value)...] and base your solution over that array. If you don't want to use so much extra memory and assuming you are ok with changing the input data, you can mark your indices as deleted - for example, you can set the value, weight and density to something negative to know that data was deleted from that index.
You can also take a look at the heapq module: https://docs.python.org/3/library/heapq.html. You can work with a heap to extract the maximum, and store indices in that heap.
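For illustration, a sketch of the heap-based approach; the names mirror the question's, and heapq's min-heap is turned into a max-heap by negating the density:

import heapq

def select_by_density(value, weight, capacity):
    # Each heap entry carries its original index, so nothing is ever
    # deleted from the input lists and indices never shift.
    heap = [(-float(v) / w, i) for i, (v, w) in enumerate(zip(value, weight))]
    heapq.heapify(heap)
    room, highest = capacity, 0
    taken = [0] * len(value)
    while heap:
        neg_density, a = heapq.heappop(heap)
        if weight[a] <= room:
            room -= weight[a]
            highest += value[a]
            taken[a] = 1
        else:
            break  # mirrors the original: stop at the first item that does not fit
    return highest, taken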

Usage of forEdges iterator in networkit (python)

I carefully read the docs, but it still is unclear to me how to use G.forEdges(), described as an "experimental edge iterator interface".
Let's say that I want to decrease the density of my graph. I have a sorted list of weights, and I want to remove edges based on their weight until the graph splits into two connected components. Then I'll select the minimum number of links that keeps the graph connected. I would do something like this:
cc = components.ConnectedComponents(G).run()
while cc.numberOfComponents() == 1:
    for weight in weightlist:
        for (u, v) in G.edges():
            if G.weight(u, v) == weight:
                G.removeEdge(u, v)
By the way, I know from the docs that there is this edge iterator, which probably does the iteration in a more efficient way. But from the docs I really can't understand how to use forEdges correctly, and I can't find a single example on the internet. Any ideas?
Or maybe an alternative idea to do what I want to do: since it's a huge graph (125 million links), the iteration will take forever, even if I am working on a cluster.
NetworKit iterators accept a callback function so if you want to iterate over edges (or nodes) you have to define a function and then pass it to the iterator as a parameter. You can find more information here. For example a simple function that just prints all edges is:
# Callback function.
# To iterate over edges it must accept 4 parameters
def myFunction(u, v, weight, edgeId):
    print("Edge from {} to {} has weight {} and id {}".format(u, v, weight, edgeId))
# Using iterator with callback function
G.forEdges(myFunction)
Now if you want to keep removing edges whose weight is inside your weightlist until the graph splits into two connected components you also have to update the connected components of the graph since ConnectedComponents will not do that for you automatically (this may be also one of the reasons why the iteration takes forever). To do this efficiently, you can use the DynConnectedComponents class (see my example below). In this case, I think that the edge iterator will not help you much so I would suggest you to keep using the for loop.
from networkit import *

# Efficiently updates connected components after edge updates
cc = components.DynConnectedComponents(G).run()

# Removes edges with weight equal to w until the components split
def removeEdges(w):
    for (u, v) in G.edges():
        if G.weight(u, v) == w:
            G.removeEdge(u, v)
            # Updating connected components
            event = dynamic.GraphEvent(dynamic.GraphEvent.EDGE_REMOVAL, u, v, w)
            cc.update(event)
            if cc.numberOfComponents() > 1:
                # Components did split
                return True
    # Components did not split
    return False

if cc.numberOfComponents() == 1:
    for weight in weightlist:
        if removeEdges(weight):
            break
This should speed your original code up a bit. However, it is still sequential code, so even if you run it on a multi-core machine it will use only one core.

Optimising model of social network evolution

I am writing a piece of code which models the evolution of a social network. The idea is that each person is assigned to a node and relationships between people (edges on the network) are given a weight of +1 or -1 depending on whether the relationship is friendly or unfriendly.
Using this simple model you can say that a triad of three people is either "balanced" or "unbalanced" depending on whether the product of the edges of the triad is positive or negative. For example, two friendly edges and one unfriendly edge give a product of -1, an unbalanced triad.
So finally what I am trying to do is implement an Ising-type model, i.e. random edges are flipped and the new relationship is kept if the new network has more balanced triangles (a lower energy) than the network before the flip; if that is not the case, then the new relationship is only kept with a certain probability.
Ok, so finally onto my question: I have written the following code, however the dataset I have contains ~120k triads; as a result it will take 4 days to run!
Could anyone offer any tips on how I might optimise the code?
Thanks.
# Importing required libraries
try:
    import matplotlib.pyplot as plt
except:
    raise
import networkx as nx
import csv
import random
import math

def prod(iterable):
    p = 1
    for n in iterable:
        p *= n
    return p

def Sum(iterable):
    p = 0
    for n in iterable:
        p += n[3]
    return p

def CalcTriads(n):
    firstgen = G.neighbors(n)
    Edges = []
    Triads = []
    for i in firstgen:
        Edges.append(G.edges(i))
    for i in xrange(len(Edges)):
        for j in range(len(Edges[i])):  # For node n go through the list of edges (j) for the neighboring nodes (i)
            if set([Edges[i][j][1]]).issubset(firstgen):  # If the second node on the edge is also a neighbor of n (it's in firstgen) then keep the edge.
                t = [n, Edges[i][j][0], Edges[i][j][1]]
                t.sort()
                Triads.append(t)  # Add found nodes to Triads.
    new_Triads = []  # Delete duplicate triads.
    for elem in Triads:
        if elem not in new_Triads:
            new_Triads.append(elem)
    Triads = new_Triads
    for i in xrange(len(Triads)):  # Go through the list of all Triads finding the weights of their edges using G[node1][node2]. Multiply the three weights and append the value to each triad.
        a = G[Triads[i][0]][Triads[i][1]].values()
        b = G[Triads[i][1]][Triads[i][2]].values()
        c = G[Triads[i][2]][Triads[i][0]].values()
        Q = prod(a + b + c)
        Triads[i].append(Q)
    return Triads

###### Import sorted edge data ######
li = []
with open('Sorted Data.csv', 'rU') as f:
    reader = csv.reader(f)
    for row in reader:
        li.append([float(row[0]), float(row[1]), float(row[2])])

G = nx.Graph()
G.add_weighted_edges_from(li)

for i in xrange(800000):
    e = random.choice(li)  # Choose random edge
    TriNei = []
    a = CalcTriads(e[0])  # Find triads of first node in the chosen edge
    for i in xrange(0, len(a)):
        if set([e[1]]).issubset(a[i]):  # Keep triads which contain the whole edge (i.e. both nodes on the edge)
            TriNei.append(a[i])
    preH = -Sum(TriNei)  # Save the "energy" of all the triads of which the edge is a member
    e[2] = -1 * e[2]  # Flip the weight of the random edge and create a new graph with the flipped edge
    G.clear()
    G.add_weighted_edges_from(li)
    TriNei = []
    a = CalcTriads(e[0])
    for i in xrange(0, len(a)):
        if set([e[1]]).issubset(a[i]):
            TriNei.append(a[i])
    postH = -Sum(TriNei)  # Calculate the post-flip "energy".
    if postH < preH:  # If the post-flip energy is lower than the pre-flip energy, keep the change
        continue
    elif random.random() < 0.92:  # If the post-flip energy is higher, then only keep the change with some small probability. (0.92 is an approximate placeholder for exp(-DeltaH)/exp(1) at the moment)
        e[2] = -1 * e[2]
The following suggestions won't boost your performance that much because they are not on the algorithmic level, i.e. not very specific to your problem. However, they are generic suggestions for slight performance improvements:
Unless you are using Python 3, change
for i in range(800000):
to
for i in xrange(800000):
The latter one just iterates over the numbers from 0 to 800000, while the first one creates a huge list of numbers and then iterates that list. Do something similar for the other loops using range.
Also, change
j=random.choice(range(len(li)))
e=li[j] # Choose random edge
to
e = random.choice(li)
and use e instead of li[j] subsequently. If you really need an index number, use random.randint(0, len(li)-1).
There are syntactic changes you can make to speed things up, such as replacing your Sum and prod functions with the built-in equivalents sum(x[3] for x in iterable) and reduce(operator.mul, iterable) - it is generally faster to use built-in functions or generator expressions than explicit loops.
As far as I can tell the line:
if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
is testing if a float is in a list of floats. Replacing it with if e[1] in a[i]: will remove the overhead of creating two set objects for each comparison.
Incidentally, you do not need to loop through the index values of an array, if you are only going to use that index to access the elements. e.g. replace
for i in range(0, len(a)):
    if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(a[i])
with
for x in a:
    if set([e[1]]).issubset(x): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(x)
However I suspect that changes like this will not make a big difference to the overall runtime. To do that you either need to use a different algorithm or switch to a faster language. You could try running it in pypy - for some cases it can be significantly faster than CPython. You could also try cython, which will compile your code to C and can sometimes give a big performance gain especially if you annotate your code with cython type information. I think the biggest improvement may come from changing the algorithm to one that does less work, but I don't have any suggestions for that.
BTW, why loop 800000 times? What is the significance of that number?
Also, please use meaningful names for your variables. Using single character names or shrtAbbrv does not speed the code up at all, and makes it very hard to follow what it is doing.
There are quite a few things you can improve here. Start by profiling your program using a tool like cProfile. This will tell you where most of the program's time is being spent and thus where optimization is likely to be most helpful. As a hint, you don't need to generate all the triads at every iteration of the program.
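For example, assuming the simulation loop were wrapped in a main() function, the standard-library profiler could be invoked like this:

import cProfile
cProfile.run('main()', sort='cumulative')  # prints time spent per function, heaviest first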
You also need to fix your indentation before you can expect a decent answer.
Regardless, this question might be better suited to Code Review.
I'm not sure I understand exactly what you are aiming for, but there are at least two changes that might help. You probably don't need to destroy and create the graph every time in the loop since all you are doing is flipping one edge weight sign. And the computation to find the triangles can be improved.
Here is some code that generates a complete graph with random weights, picks a random edge in a loop, finds the triads and flips the edge weight...
import random
import networkx as nx

# complete graph with random 1/-1 as weight
G = nx.complete_graph(5)
for u, v, d in G.edges(data=True):
    d['weight'] = random.randrange(-1, 2, 2)  # -1 or 1

edges = G.edges()
for i in range(10):
    u, v = random.choice(edges)  # random edge
    nbrs = set(G[u]) & set(G[v]) - set([u, v])  # nodes in triads
    triads = [(u, v, n) for n in nbrs]
    print "triads", triads
    for u, v, w in triads:
        print (u, v, G[u][v]['weight']), (u, w, G[u][w]['weight']), (v, w, G[v][w]['weight'])
    G[u][v]['weight'] *= -1
