Python - Dijkstra's Algorithm

I need to implement Dijkstra's algorithm in Python. However, I have to use a 2D array to hold three pieces of information: predecessor, length and unvisited/visited.
I know a struct can be used in C, but I am stuck on how to do something similar in Python. I am told it's possible, but to be honest I have no idea how.

Create a class for it.
class XXX(object):
    def __init__(self, predecessor, length, visited):
        self.predecessor = predecessor
        self.length = length
        self.visited = visited
Or use collections.namedtuple, which is particularly cool for holding struct-like compound types that have named members but no behaviour of their own: XXX = collections.namedtuple('XXX', 'predecessor length visited').
Create one with XXX(predecessor, length, visited).
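A minimal sketch of the namedtuple route, in case it helps. The name Entry, the table of four vertices and the use of float("inf") for "no path yet" are my own assumptions, not something from the question. One thing to keep in mind is that namedtuples are immutable, so an update means building a new tuple with _replace.

```python
from collections import namedtuple

# Hypothetical record type; the field names follow the question.
Entry = namedtuple("Entry", "predecessor length visited")

# One entry per vertex; float("inf") stands in for "no path found yet".
table = [Entry(predecessor=None, length=float("inf"), visited=False)
         for _ in range(4)]

# namedtuples are immutable, so an update builds a new tuple with _replace.
table[0] = table[0]._replace(length=0, visited=True)

print(table[0])         # Entry(predecessor=None, length=0, visited=True)
print(table[0].length)  # 0
```

If you expect to update the fields often, a small class (as above) may be more convenient than replacing tuples.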

As mentioned above, you can use an instance of an object.
This author has a pretty convincing implementation of Dijkstra's algorithm in Python.
#
# This file contains the Python code from Program 16.16 of
# "Data Structures and Algorithms
# with Object-Oriented Design Patterns in Python"
# by Bruno R. Preiss.
#
# Copyright (c) 2003 by Bruno R. Preiss, P.Eng. All rights reserved.
#
# http://www.brpreiss.com/books/opus7/programs/pgm16_16.txt
#
class Algorithms(object):
    def DijkstrasAlgorithm(g, s):
        n = g.numberOfVertices
        table = Array(n)
        for v in xrange(n):
            table[v] = Algorithms.Entry()
        table[s].distance = 0
        queue = BinaryHeap(g.numberOfEdges)
        queue.enqueue(Association(0, g[s]))
        while not queue.isEmpty:
            assoc = queue.dequeueMin()
            v0 = assoc.value
            if not table[v0.number].known:
                table[v0.number].known = True
                for e in v0.emanatingEdges:
                    v1 = e.mateOf(v0)
                    d = table[v0.number].distance + e.weight
                    if table[v1.number].distance > d:
                        table[v1.number].distance = d
                        table[v1.number].predecessor = v0.number
                        queue.enqueue(Association(d, v1))
        result = DigraphAsLists(n)
        for v in xrange(n):
            result.addVertex(v, table[v].distance)
        for v in xrange(n):
            if v != s:
                result.addEdge(v, table[v].predecessor)
        return result
    DijkstrasAlgorithm = staticmethod(DijkstrasAlgorithm)
Notice those pieces of information are 'held' in the object he is constructing by calling Algorithms.Entry(). Entry is a class and is defined like this:
class Entry(object):
    """
    Data structure used in Dijkstra's and Prim's algorithms.
    """
    def __init__(self):
        """
        (Algorithms.Entry) -> None
        Constructor.
        """
        self.known = False
        self.distance = sys.maxint
        self.predecessor = sys.maxint
The self.known, self.distance, ... are those pieces of information. He does not pass them to the constructor (__init__) as arguments; the constructor only sets defaults, and the algorithm updates them later. In Python you can access attributes with dot notation. For example: myObject = Entry(), then myObject.known, myObject.distance, ... They are all public.

Encapsulate that information in a Python object and you should be fine.
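If the 2D array really is a requirement, here is a rough sketch of how it could look, using only the standard library's heapq. Everything in it is my own assumption rather than the book's code: the graph is an adjacency list of (neighbour, weight) pairs indexed by vertex number, and each row of the table is [predecessor, length, visited].

```python
import heapq

# Column indices of the table (hypothetical names).
PRED, LENGTH, VISITED = 0, 1, 2

def dijkstra(graph, source):
    """graph[v] is a list of (neighbour, weight) pairs for vertex v."""
    # One row per vertex: [predecessor, length, visited].
    table = [[None, float("inf"), False] for _ in graph]
    table[source][LENGTH] = 0
    queue = [(0, source)]
    while queue:
        dist, v = heapq.heappop(queue)
        if table[v][VISITED]:
            continue  # stale queue entry, a shorter path was already settled
        table[v][VISITED] = True
        for neighbour, weight in graph[v]:
            candidate = dist + weight
            if candidate < table[neighbour][LENGTH]:
                table[neighbour][PRED] = v
                table[neighbour][LENGTH] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return table

# Small made-up graph with four vertices.
graph = [[(1, 7), (2, 3)], [(3, 1)], [(1, 2), (3, 8)], []]
table = dijkstra(graph, 0)
print(table[3])  # [1, 6, True]
```

Swapping the inner lists for a class or namedtuple later only changes how the three fields are read and written; the loop stays the same.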

Or you can simply use tuples or dictionaries inside your 2d array:
width = 10
height = 10
# build every row up front; assigning to an index of an empty list raises IndexError
my2darray = [[None] * height for x in range(width)]
for x in range(width):
    for y in range(height):
        # here you set the tuple (predecessor, length, visited)
        my2darray[x][y] = (None, 0, False)
        # or you can use a dict
        my2darray[x][y] = dict(node=None, length=12, visited=False)

Python is an object-oriented language. So think of it like moving from structs in C to classes in C++. You can use the same class structure in Python as well.

Related

How do I make a struct point in python?

I'm making a Tetris clone in Pygame based on https://www.youtube.com/watch?v=zH_omFPqMO4 and I need to know how to turn
struct Point
{int x,y;} a[4],b[4];
into Python from C++.

So the C++ code struct Point {int x, y;} a[4], b[4]; defines a new structure, a data type with two ints x and y. a[4] and b[4] declare two arrays of type Point, each of size 4.
To replicate a structure in Python we could use a class.
Example:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
a = []
b = []
# Creating an instance of the Point object
myPoint = Point(5, 6)
# Adding said point to our array
a.append(myPoint)

First, I'm going to echo what was said in the comments: Do not try to do a direct translation. Write your project fresh in Python. Python does things differently. In C++, you deal with memory and justifying what's safe to do with memory. In Python, you deal directly with classes at a high-level. They're at different levels of abstraction, and trying to translate directly is going to result in very awkward and stilted code.
Regardless, the way to write a Point type in Python naively with a class would be
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
But this leaves a lot to be desired. For one, points are not comparable for equality. For another, if you try to print them out, you'll get some sort of awkward Python pointer-like syntax. Instead, for basic structure-like data, you want dataclasses.
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
Now you get equality and stringification for free, as well as a nice constructor and a replace function for creating similar instances. If you're planning to make your Point objects immutable (which is a great idea and I highly recommend), then you can throw a frozen=True as argument to the dataclass decorator and get hashing for free as well. This means that you can use points as keys to dictionaries.
@dataclass(frozen=True)
class Point:
    x: float
    y: float
On top of this, you'll probably want to implement magic methods like __add__ so you can use operators like + on your Point type (reader beware: that article I linked has some outdated bits from Python 2. Definitely read Appendix 2 for the differences, but it's still one of the best summaries of magic methods out there, in spite of its age)
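To make that concrete, here is a small sketch of a frozen Point with an __add__ method. The names moved, shifted and occupied are just mine for illustration, and the list a mirrors the a[4] array from the C++ code.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def __add__(self, other):
        # Return a new Point; frozen instances cannot be modified in place.
        return Point(self.x + other.x, self.y + other.y)

# Four independent points, like `Point a[4]` in the C++ code.
a = [Point(0, 0) for _ in range(4)]

moved = a[0] + Point(1, 2)      # uses __add__
shifted = replace(moved, x=5)   # copy with one field changed
occupied = {moved: "block"}     # hashable because frozen=True

print(moved, shifted, moved == Point(1, 2))
# Point(x=1, y=2) Point(x=5, y=2) True
```

Because the points are immutable, "moving" a Tetris piece means building new Point objects rather than editing the old ones, which tends to make collision checks easier to reason about.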

Here we use a list comprehension to create a list of size 4 with each element initialized to Point(0, 0).
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
# create a list named a of size 4
a = [Point(0, 0) for _ in range(4)]  # equivalent to a = [Point(0, 0), Point(0, 0), Point(0, 0), Point(0, 0)]
# create a list named b of size 4
b = [Point(0, 0) for _ in range(4)]

So a class is just a struct in C++?
struct Point{int x,y;}a[4],b[4];
is
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
a = []
b = []
# Creating an instance of the Point object
myPoint = Point(5, 6)
# Adding said point to our array
a.append(myPoint)
b.append(myPoint)
?

Python multiprocessing pool with shared data

I'm attempting to speed up a multivariate fixed-point iteration algorithm using multiprocessing; however, I'm running into issues dealing with shared data. My solution vector is actually a named dictionary rather than a vector of numbers. Each element of the vector is computed using a different formula. At a high level, I have an algorithm like this:
current_estimate = previous_estimate
while True:
    for state in all_states:
        current_estimate[state] = state.getValue(previous_estimate)
    if norm(current_estimate, previous_estimate) < tolerance:
        break
    else:
        previous_estimate, current_estimate = current_estimate, previous_estimate
I'm trying to parallelize the for-loop part with multiprocessing. The previous_estimate variable is read-only and each process only needs to write to one element of current_estimate. My current attempt at rewriting the for-loop is as follows:
import itertools
from multiprocessing import Manager, Pool

# Class and function definitions
class A(object):
    def __init__(self,val):
        self.val = val
    # representative getValue function
    def getValue(self, est):
        return est[self] + self.val
def worker(state, in_est, out_est):
    out_est[state] = state.getValue(in_est)
def worker_star(a_b_c):
    """ Allow multiple arguments for a pool
    Taken from http://stackoverflow.com/a/5443941/3865495
    """
    return worker(*a_b_c)
# Initialize test environment
manager = Manager()
estimates = manager.dict()
all_states = []
for i in range(5):
    a = A(i)
    all_states.append(a)
    estimates[a] = 0
pool = Pool(processes = 2)
prev_est = estimates
curr_est = estimates
pool.map(worker_star, itertools.izip(all_states, itertools.repeat(prev_est), itertools.repeat(curr_est)))
The issue I'm currently running into is that the elements added to the all_states array are not the same as those added to the manager.dict(). I keep getting KeyErrors when trying to access elements of the dictionary using elements of the array. While debugging, I found that none of the elements are the same.
print map(id, estimates.keys())
>>> [19558864, 19558928, 19558992, 19559056, 19559120]
print map(id, all_states)
>>> [19416144, 19416208, 19416272, 19416336, 19416400]

This is happening because the objects you're putting into the estimates DictProxy aren't actually the same objects as those that live in the regular dict. The manager.dict() call returns a DictProxy, which is proxying access to a dict that actually lives in a completely separate manager process. When you insert things into it, they're really being copied and sent to a remote process, which means they're going to have a different identity.
To work around this, you can define your own __eq__ and __hash__ functions on A, as described in this question:
class A(object):
    def __init__(self,val):
        self.val = val
    # representative getValue function
    def getValue(self, est):
        return est[self] + self.val
    def __hash__(self):
        return hash(self.__key())
    def __key(self):
        return (self.val,)
    def __eq__(x, y):
        return x.__key() == y.__key()
This means the key look ups for items in the estimates will just use the value of the val attribute to establish identity and equality, rather than the id assigned by Python.
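A quick way to convince yourself of the effect without starting any processes is sketched below. The class name State is hypothetical, and copy.copy is only standing in for the pickling round trip that the manager performs; the point is that a distinct object with the same val still finds the dictionary entry.

```python
import copy

class State:
    def __init__(self, val):
        self.val = val

    def _key(self):
        return (self.val,)

    def __eq__(self, other):
        return isinstance(other, State) and self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

original = State(3)
estimates = {original: 0.0}

# copy.copy stands in for the copy made when the object is sent to the manager.
duplicate = copy.copy(original)

print(duplicate is original)   # False: different identity
print(estimates[duplicate])    # 0.0: lookup succeeds by value
```

Note that this only works if val uniquely identifies a state; two states with the same val would collapse into one key.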

Good Style in Python Objects

Most of my programming prior to Python was in C++ or Matlab. I don't have a degree in CS (almost completed a PhD in physics), but have done some courses and a good amount of actual programming. Now, I'm taking an algorithms course on Coursera (excellent course, by the way, with a professor from Stanford). I decided to implement the homeworks in Python. However, sometimes I find myself wanting things the language does not so easily support. I'm very used to creating classes and objects for things in C++ just to group together data (i.e. when there are no methods). In Python however, where you can add fields on the fly, what I basically end up wanting all the time are Matlab structs. I think this is possibly a sign I am not using good style and doing things the "Pythonic" way.
Underneath is my implementation of a union-find data structure (for Kruskal's algorithm). Although the implementation is relatively short and works well (there isn't much error checking), there are a few odd points. For instance, my code assumes that the data originally passed in to the union-find is a list of objects. However, if a list of explicit pieces of data are passed in instead (i.e. a list of ints), the code fails. Is there some much clearer, more Pythonic way to implement this? I have tried to google this, but most examples are very simple and relate more to procedural code (i.e. the "proper" way to do a for loop in python).
class UnionFind:
    def __init__(self,data):
        self.data = data
        for d in self.data:
            d.size = 1
            d.leader = d
            d.next = None
            d.last = d
    def find(self,element):
        return element.leader
    def union(self,leader1,leader2):
        if leader1.size >= leader2.size:
            newleader = leader1
            oldleader = leader2
        else:
            newleader = leader2
            oldleader = leader1
        newleader.size = leader1.size + leader2.size
        d = oldleader
        while d != None:
            d.leader = newleader
            d = d.next
        newleader.last.next = oldleader
        newleader.last = oldleader.last
        del(oldleader.size)
        del(oldleader.last)

Generally speaking, doing this sort of thing Pythonically means that you try to make your code not care what is given to it, at least not any more than it really needs to.
Let's take your particular example of the union-find algorithm. The only thing that the union-find algorithm actually does with the values you pass to it is compare them for equality. So to make a generally useful UnionFind class, your code shouldn't rely on the values it receives having any behavior other than equality testing. In particular, you shouldn't rely on being able to assign arbitrary attributes to the values.
The way I would suggest getting around this is to have UnionFind use wrapper objects which hold the given values and any attributes you need to make the algorithm work. You can use namedtuple as suggested by another answer, or make a small wrapper class. When an element is added to the UnionFind, you first wrap it in one of these objects, and use the wrapper object to store the attributes leader, size, etc. The only time you access the thing being wrapped is to check whether it is equal to another value.
In practice, at least in this case, it should be safe to assume that your values are hashable, so that you can use them as keys in a Python dictionary to find the wrapper object corresponding to a given value. Of course, not all objects in Python are necessarily hashable, but those that are not are relatively rare and it's going to be a lot more work to make a data structure that is able to handle those.
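Here is one way the wrapper idea could look, as a sketch rather than a finished implementation. The _Node class and the add method are names I made up, and I keep a plain list of members per leader instead of the next/last links from the question, which is a simplification on my part.

```python
class _Node:
    """Hypothetical wrapper holding the bookkeeping for one value."""
    def __init__(self, value):
        self.value = value
        self.leader = self
        self.size = 1
        self.members = [self]   # only meaningful on a leader

class UnionFind:
    def __init__(self, values=()):
        self._nodes = {}        # value -> wrapper; values must be hashable
        for value in values:
            self.add(value)

    def add(self, value):
        if value not in self._nodes:
            self._nodes[value] = _Node(value)

    def find(self, value):
        return self._nodes[value].leader.value

    def union(self, value1, value2):
        leader1 = self._nodes[value1].leader
        leader2 = self._nodes[value2].leader
        if leader1 is leader2:
            return
        if leader1.size < leader2.size:
            leader1, leader2 = leader2, leader1
        # Re-point every member of the smaller set at the new leader.
        for node in leader2.members:
            node.leader = leader1
        leader1.members.extend(leader2.members)
        leader1.size += leader2.size
        leader2.members = []

uf = UnionFind(["a", "b", 3])   # plain strings and ints work now
uf.union("a", "b")
uf.union(3, "b")
print(uf.find(3))               # a
```

The caller never sees the wrappers: it passes in ordinary values and gets ordinary values back from find.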

The more pythonic way is to avoid tedious objects if you don't have to.
class UnionFind(object):
    def __init__(self, members=10, data=None):
        """union-find data structure for Kruskal's algorithm
        members are ignored if data is provided
        """
        if not data:
            self.data = [self.default_data() for i in range(members)]
            for d in self.data:
                d.size = 1
                d.leader = d
                d.next = None
                d.last = d
        else:
            self.data = data
    def default_data(self):
        """create a starting point for data"""
        return Data(**{'last': None, 'leader':None, 'next': None, 'size': 1})
    def find(self, element):
        return element.leader
    def union(self, leader1, leader2):
        if leader2.leader is leader1:
            return
        if leader1.size >= leader2.size:
            newleader = leader1
            oldleader = leader2
        else:
            newleader = leader2
            oldleader = leader1
        newleader.size = leader1.size + leader2.size
        d = oldleader
        while d is not None:
            d.leader = newleader
            d = d.next
        newleader.last.next = oldleader
        newleader.last = oldleader.last
        oldleader.size = 0
        oldleader.last = None
class Data(object):
    def __init__(self, **data_dict):
        """convert a data member dict into an object"""
        self.__dict__.update(**data_dict)

One option is to use dictionaries to store the information you need about a data item, rather than attributes on the item directly. For instance, rather than referring to d.size you could refer to size[d] (where size is a dict instance). This requires that your data items be hashable, but they don't need to allow attributes to be assigned on them.
Here's a straightforward translation of your current code to use this style:
class UnionFind:
    def __init__(self,data):
        self.data = data
        self.size = {d:1 for d in data}
        self.leader = {d:d for d in data}
        self.next = {d:None for d in data}
        self.last = {d:d for d in data}
    def find(self,element):
        return self.leader[element]
    def union(self,leader1,leader2):
        if self.size[leader1] >= self.size[leader2]:
            newleader = leader1
            oldleader = leader2
        else:
            newleader = leader2
            oldleader = leader1
        self.size[newleader] = self.size[leader1] + self.size[leader2]
        d = oldleader
        while d != None:
            self.leader[d] = newleader
            d = self.next[d]
        self.next[self.last[newleader]] = oldleader
        self.last[newleader] = self.last[oldleader]
A minimal test case:
>>> uf = UnionFind(list(range(100)))
>>> uf.find(10)
10
>>> uf.find(20)
20
>>> uf.union(10,20)
>>> uf.find(10)
10
>>> uf.find(20)
10
Beyond this, you could also consider changing your implementation a bit to require less initialization. Here's a version that doesn't do any initialization (it doesn't even need to know the set of data it's going to work on). It uses path compression and union-by-rank rather than always maintaining an up-to-date leader value for all members of a set. It should be asymptotically faster than your current code, especially if you're doing a lot of unions:
class UnionFind:
    def __init__(self):
        self.rank = {}
        self.parent = {}
    def find(self, element):
        if element not in self.parent: # leader elements are not in `parent` dict
            return element
        leader = self.find(self.parent[element]) # search recursively
        self.parent[element] = leader # compress path by saving leader as parent
        return leader
    def union(self, leader1, leader2):
        rank1 = self.rank.get(leader1,1)
        rank2 = self.rank.get(leader2,1)
        if rank1 > rank2: # union by rank
            self.parent[leader2] = leader1
        elif rank2 > rank1:
            self.parent[leader1] = leader2
        else: # ranks are equal
            self.parent[leader2] = leader1 # favor leader1 arbitrarily
            self.rank[leader1] = rank1+1 # increment rank
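Since the question mentions Kruskal's algorithm, here is a rough sketch of how a union-find in this style might be used for it. It is a variant, not the exact class above: find is iterative, union looks up the leaders itself and returns whether a merge happened, and the kruskal function and the edge list are made up for the example.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}   # leaders have no entry here
        self.rank = {}

    def find(self, element):
        # First pass: walk up to the leader.
        root = element
        while root in self.parent:
            root = self.parent[root]
        # Second pass: path compression, point everything at the leader.
        while element in self.parent:
            next_element = self.parent[element]
            self.parent[element] = root
            element = next_element
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        leader1, leader2 = self.find(a), self.find(b)
        if leader1 == leader2:
            return False
        rank1 = self.rank.get(leader1, 1)
        rank2 = self.rank.get(leader2, 1)
        if rank1 < rank2:
            leader1, leader2 = leader2, leader1
        self.parent[leader2] = leader1          # union by rank
        if rank1 == rank2:
            self.rank[leader1] = rank1 + 1
        return True

def kruskal(edges):
    """edges: iterable of (weight, u, v); returns the edges of a spanning forest."""
    uf = UnionFind()
    tree = []
    for weight, u, v in sorted(edges):
        if uf.union(u, v):   # skip edges that would close a cycle
            tree.append((weight, u, v))
    return tree

edges = [(4, "a", "b"), (1, "b", "c"), (3, "a", "c"), (2, "c", "d")]
mst = kruskal(edges)
print(mst)  # [(1, 'b', 'c'), (2, 'c', 'd'), (3, 'a', 'c')]
```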

For checking if an argument is of the expected type, use the built-in isinstance() function:
if not isinstance(leader1, UnionFind):
    raise ValueError('leader1 must be a UnionFind instance')
Additionally, it is a good habit to add docstrings to functions, classes and member functions. Such a docstring for a function or method should describe what it does, what arguments are to be passed to it and if applicable what is returned and which exceptions can be raised.

I'm guessing that the indentation issues here are just simple errors with inputting the code into SO. Could you possibly create a subclass of a simple, built-in data type? For instance, you can create a subclass of the list data type by putting the data type in parentheses:
class UnionFind(list):
    '''extends list object'''

How do I copy a class and its list members in Python 2.7 and not copy the references?

I read this about Python classes (link) and it seems to be the issue I am having.
Here is an excerpt from my class and other code:
class s_board:
    def __init__(self):
        self.__board = [[n for n in range(1, 10)] for m in range(81)]
        self.__solved = [False for m in range(81)]
    def copy(self):
        b = s_board()
        b.__board = self.__board[:]
        b.__solved = self.__solved[:]
        return b
if __name__ == '__main__':
    A = s_board()
    B = A.copy()
    B.do_some_operation_on_lists()
When I call B's method that does something to the list, A's lists seem to be affected as well.
So my questions:
Am I not copying the class or the lists correctly?
Is there another issue here?
How do I fix it so that I get a new copy of the class?

self.__board[:] creates a new list containing references to all the same objects that were in self.__board. Since self.__board contains lists, and lists are mutable, you end up with the two s_board instances with partially aliased data, and changing one affects the other.
As Raymond Hettinger suggested, you can use the copy.deepcopy to (mostly) guarantee that you take a true copy of an object and don't share any data. I say mostly, as I believe there are some strange objects that deepcopy will not work on, but for normal things like lists and straightforward classes it will work fine.
I have an additional suggestion though. You call b = s_board(), which goes to all the effort of constructing the lists for the new blank board, and then you throw them away by assigning to b.__board and b.__solved. It seems to me like it would be better to do something like the following:
import copy

class s_board:
    def __init__(self, board=None, solved=None):
        if board is None:
            self.__board = [[n for n in range(1, 10)] for m in range(81)]
        else:
            self.__board = copy.deepcopy(board)
        if solved is None:
            self.__solved = [False for m in range(81)]
        else:
            self.__solved = copy.deepcopy(solved)
    def copy(self):
        b = s_board(self.__board, self.__solved)
        return b
Now if you call A = s_board() you get a new blank board, and if you call A.copy() you get a distinct copy of A, without having had to allocate and then discard a new blank board.

try using deepcopy() instead of copy()
copy() inserts references if it is able, deepcopy() should copy all of the members without using references.

The inner lists are being shared. Here's an article that explains what is happening: http://www.python-course.eu/deep_copy.php
To fix the code, you can use copy.deepcopy to make sure there is no shared data:
def copy(self):
    b = s_board()
    b.__board = copy.deepcopy(self.__board)
    b.__solved = copy.deepcopy(self.__solved)
    return b
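The difference is easy to see on a tiny nested list. This is only an illustration with a made-up two-row board, not the s_board class itself:

```python
import copy

board = [[1, 2, 3], [4, 5, 6]]

shallow = board[:]            # new outer list, same inner lists
deep = copy.deepcopy(board)   # new outer list and new inner lists

shallow[0].remove(1)          # also changes board[0]
deep[1].remove(4)             # leaves board untouched

print(board)  # [[2, 3], [4, 5, 6]]
print(deep)   # [[1, 2, 3], [5, 6]]
```

The slice copy is enough for a flat list of immutable values such as the list of booleans, but not for a list of lists.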

Cythonize a Python function to make it faster

A few weeks ago I asked a question on increasing the speed of a function written in Python. At that time, TryPyPy brought to my attention the possibility of using Cython for doing so. He also kindly gave an example of how I could Cythonize that code snippet. I want to do the same with the code below to see how fast I can make it by declaring variable types. I have seen the tutorial on cython.org, but I still have some closely related questions:
I don't know any C. What parts do I need to learn, to use Cython to declare variable types?
What is the C type corresponding to python lists and tuples? For example, I can use double in Cython for float in Python. What do I do for lists? In general, where do I find the corresponding C type for a given Python type.
Any example of how I could Cythonize the code below would be really helpful. I have inserted comments in the code that give information about the variable type.
class Some_class(object):
    ** Other attributes and functions **
    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
        timePd (type: int)
        Output: None'''
        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects.
        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities,
                                             actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of person objects. I have included the code for the Person class
            # below.
            for p1,p2 in chosen_possibilities:
                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        p2.awareness_status = (price, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status
                else:
                    pass
class Person(object):
    def __init__(self,id, value):
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.
        self.awarenessStatus = (self.max_val, -1)

As a general note, you can see exactly what C code Cython generates for every source line by running the cython command with the -a "annotate" option. See the Cython documentation for examples. This is extremely helpful when trying to find bottlenecks in a function's body.
Also, there's the concept of "early binding for speed" when Cython-ing your code. Python objects (like instances of your Person class) use generic Python code for attribute access, which is slow in an inner loop. I suspect that if you change the Person class to a cdef class, then you will see some speedup. You also need to type the p1 and p2 objects in the inner loop.
Since your code has lots of Python calls (random.sample for example), you likely won't get huge speedups unless you find a way to put those lines into C, which takes a good amount of effort.
You can type things as a tuple or a list, but it doesn't often mean much of a speedup. Better to use C arrays when possible; something you'll have to look up.
I get a factor of 1.6 speedup with the trivial modifications below. Note that I had to change some things here and there to get it to compile.
ctypedef int ITYPE_t
cdef class CyPerson:
    # These attributes are placed in the extension type's C-struct, so C-level
    # access is _much_ faster.
    cdef ITYPE_t value, id, max_val
    cdef tuple awareness_status
    def __init__(self, ITYPE_t id, ITYPE_t value):
        # The __init__ function is much the same as before.
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.
        self.awareness_status = (self.max_val, -1)
NPERSONS = 10000
import math
import random
class Some_class(object):
    def __init__(self):
        ri = lambda: random.randint(0, 10)
        self.possibilities = [(CyPerson(ri(), ri()), CyPerson(ri(), ri())) for i in range(NPERSONS)]
    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
        timePd (type: int)
        Output: None'''
        cdef CyPerson p1, p2
        price = 10
        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects.
        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities,
                                             actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of person objects. I have included the code for the Person class
            # below.
            for persons in chosen_possibilities:
                p1, p2 = persons
                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        p2.awareness_status = (price, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status

C does not directly know the concept of lists.
The basic data types are int (char, short, long), float/double (all of which have pretty straightforward mappings to python) and pointers.
If the concept of pointers is new to you, have a look at: Wikipedia:Pointers
Pointers can then be used as tuple/array replacements in some cases. Pointers of chars are the base for all strings.
Say you have an array of integers. You would store it as a contiguous chunk of memory with a start address; you define the type (int) and mark it as a pointer (*):
cdef int * array;
Now you can access each element of the array like this:
array[0] = 1
However, memory has to be allocated (e.g. using malloc), and advanced indexing will not work (e.g. array[-1] will be random data in memory; this also holds for indexes exceeding the width of the reserved space).
More complex types don't directly map to C, but often there is a C way to do something that might not require the python types (e.g. a for loop does not need a range array/iterator).
As you noticed yourself, writing good cython code requires more detailed knowledge of C, so heading forward to a tutorial is probably the best next step.
