Find path between source and dest

Find path between source and dest - python

I am looking for python code help to find the distance between source to destination. The input to the function will be number of Rows, number of columns and area which is number of Rows X number of columns matrix.
We could traverse one cell at a time up, down, left, or right.
The accessible areas are represented by 1, inaccessible 0 and destination 9.
Sample input
numRows=3
numCols=3
alist=[[1,0,0],[1,0,0],[1,9,1]]
Output: Should be an integer representing total distance to destination or -1 if there is no path
For the sample input, the path traversed will be (0,0)->(1,0)->(2,0)->(2,1) and the function should return and output of 3
Here's the pseudo code I could get to but need help figuring out the complete solution.
def findthepath(numRows,numCols,alist):
visited=[]
if alist[0,0] == 9:
return 0
for i in range(numRows):
for j in range(numCol):
if alist[i][j] == 1
visited.append(alist[i][j])

You want a Shortest Path algorithm. The most commonly used one is Dijkstra's Algorithm. Slightly difficult to understand, but fairly easy to implement. Example: https://gist.github.com/econchick/4666413
I've done this for C#, but not Python. It could be done fairly easily as the above link demonstrates, though.

Related

Using variable names for 2d matrix elements for readability

While solving Leetcode problems I've been trying to make my answers as easily intelligible as possible, so I can quickly glance at them later and make sense of them. Toward that end I assigned variable names to indices of interest in a 2D list. When I see "matrix[i][j+1]" and variations thereof repeatedly, I sometimes lose track of what I'm dealing with.
So, for this problem: https://leetcode.com/problems/maximal-square/
I wrote this code:
class Solution:
def maximalSquare(self, matrix: List[List[str]]) -> int:
maximum = 0
for y in range(len(matrix)):
for x in range(len(matrix[0])):
#convert to integer from string
matrix[y][x] = int(matrix[y][x])
#use variable for readability
current = matrix[y][x]
#build largest square counts by checking neighbors above and to left
#so, skip anything in first row or first column
if y!=0 and x!=0 and current==1:
#assign variables for readability. We're checking adjacent squares
left = matrix[y][x-1]
up = matrix[y-1][x]
upleft = matrix[y-1][x-1]
#have to use matrix directly to set new value
matrix[y][x] = current = 1 + min(left, up, upleft)
#reevaluate maximum
if current > maximum:
maximum = current
#return maximum squared, since we're looking for largest area of square, not largest side
return maximum**2
I don't think I've seen people do this before and I'm wondering if it's a bad idea, since I'm sort of maintaining two versions of a value.
Apologies if this is a "coding style" question and therefore just a matter of opinion, but I thought there might be a clear answer that I just haven't found yet.

It is very hard to give a straightforward answer, because it might vary from person to person. Let me start from your queries:
When I see "matrix[i][j+1]" and variations thereof repeatedly, I sometimes lose track of what I'm dealing with.
It depends. People who have moderate programming knowledge should not be confused by seeing a 2-D matrix in matrix[x-pos][y-pos] shape. Again, if you don't feel comfortable, you can use the way you have shared here. But, you should try to adopt and be familiar with this type of common concepts parallelly.
I don't think I've seen people do this before and I'm wondering if it's a bad idea, since I'm sort of maintaining two versions of a value.
It is not a bad idea at all. It is "Okay" as long as you are considering to do this for your comfort. But, if you like to share your code with others, then it might not be a very good idea to use something that is too obvious. It might reduce the understandability of your code to others. But, you should not worry with the maintaining two versions of a value, as long as the extra memory is constant.
Apologies if this is a "coding style" question and therefore just a matter of opinion, but I thought there might be a clear answer that I just haven't found yet.
You are absolutely fine by asking this question. As you mentioned, it is really just a matter of opinion. You can follow some standard language guideline like Google Python Style Guide. It is always recommended to follow some standards for this type of coding style things. Always keep in mind, a piece of good code is always self-documented and putting unnecessary comments sometimes make it boring. Also,
Here I have shared my version of your code. Feel free to comment if you have any question.
# Time: O(m*n)
# Space: O(1)
class Solution:
def maximalSquare(self, matrix: List[List[str]]) -> int:
"""Given an m x n binary matrix filled with 0's and 1's,
find the largest square containing only 1's and return its area.
Args:
matrix: An (m x n) string matrix.
Returns:
Area of the largest square containing only 1's.
"""
maximum = 0
for x in range(len(matrix)):
for y in range(len(matrix[0])):
# convert current matrix cell value from string to integer
matrix[x][y] = int(matrix[x][y])
# build largest square side by checking neighbors from up-row and left-column
# so, skip the cells from the first-row and first-column
if x != 0 and y != 0 and matrix[x][y] != 0:
# update current matrix cell w.r.t. the left, up and up-left cell values respectively
matrix[x][y] = 1 + min(matrix[x][y-1], matrix[x-1][y], matrix[x-1][y-1])
# re-evaluate maximum square side
if matrix[x][y] > maximum:
maximum = matrix[x][y]
# returning the area of the largest square
return maximum**2

Anyone knows a more efficient way to run a pairwise comparison of hundreds of trajectories?

So I have two different files containing multiple trajectories in a squared map (512x512 pixels). Each file contains information about the spatial position of each particle within a track/trajectory (X and Y coordinates) and to which track/trajectory that spot belongs to (TRACK_ID).
My goal was to find a way to cluster similar trajectories between both files. I found a nice way to do this (distance clustering comparison), but the code it's too slow. I was just wondering if someone has some suggestions to make it faster.
My files look something like this:
The approach that I implemented finds similar trajectories based on something called Fréchet Distance (maybe not to relevant here). Below you can find the function that I wrote, but briefly this is the rationale:
group all the spots by track using pandas.groupby function for file1 (growth_xml) and file2 (shrinkage_xml)
for each trajectories in growth_xml (loop) I compare with each trajectory in growth_xml
if they pass the Fréchet Distance criteria that I defined (an if statement) I save both tracks in a new table. you can see an additional filter condition that I called delay, but I guess that is not important to explain here.
so really simple:
def distance_clustering(growth_xml,shrinkage_xml):
coords_g = pd.DataFrame() # empty dataframes to save filtered tracks
coords_s = pd.DataFrame()
counter = 0 #initalize counter to count number of filtered tracks
for track_g, param_g in growth_xml.groupby('TRACK_ID'):
# define growing track as multi-point line object
traj1 = [(x,y) for x,y in zip(param_g.POSITION_X.values, param_g.POSITION_Y.values)]
for track_s, param_s in shrinkage_xml.groupby('TRACK_ID'):
# define shrinking track as a second multi-point line object
traj2 = [(x,y) for x,y in zip(param_s.POSITION_X.values, param_s.POSITION_Y.values)]
# compute delay between shrinkage and growing ends to use as an extra filter
delay = (param_s.FRAME.iloc[0] - param_g.FRAME.iloc[0])
# keep track only if the frechet Distance is lower than 0.2 microns
if frechetDist(traj1, traj2) < 0.2 and delay > 0:
counter += 1
param_g = param_g.assign(NEW_ID = np.ones(param_g.shape[0]) * counter)
coords_g = pd.concat([coords_g, param_g])
param_s = param_s.assign(NEW_ID = np.ones(param_s.shape[0]) * counter)
coords_s = pd.concat([coords_s, param_s])
coords_g.reset_index(drop = True, inplace = True)
coords_s.reset_index(drop = True, inplace = True)
return coords_g, coords_s
The main problem is that most of the times I have more than 2 thousand tracks (!!) and this pairwise combination takes forever. I'm wondering if there's a simple and more efficient way to do this. Perhaps by doing the pairwise combination in multiple small areas instead of the whole map? not sure...

Have you tried to make a matrix (DeltaX,DeltaY) lookUpTable for the pairwise combination distance. It will take some long time to calc the LUT once, or you can write it in a file and load it when the algo starts.
Then you'll only have to look on correct case to have the result instead of calc each time.
You can too make a polynomial regression for the distance calc, it will be less precise but definitely faster

Maybe not an outright answer, but it's been a while. Could you not segment the lines and use minimum bounding box around each segment to assess similarities? I might be thinking of your problem the wrong way around. I'm not sure. Right now I'm trying to work with polygons from two different data sets and want to optimize the processing by first identifying the polygons in both geometries that overlap.
In your case, I think segments would you leave you with some edge artifacts. Maybe look at this paper: https://drops.dagstuhl.de/opus/volltexte/2021/14879/pdf/OASIcs-ATMOS-2021-10.pdf or this paper (with python code): https://www.austriaca.at/0xc1aa5576_0x003aba2b.pdf

Genetic Algorithm / AI; basically, am I on the right track?

I know Python isn't the best idea to be writing any kind of software of this nature. My reasoning is to use this type of algorithm for a Raspberry Pi 3 in it's decision making (still unsure how that will go), and the libraries and APIs that I'll be using (Adafruit motor HATs, Google services, OpenCV, various sensors, etc) all play nicely for importing in Python, not to mention I'm just more comfortable in this environment for the rPi specifically. Already I've cursed it as object oriented such as Java or C++ just makes more sense to me, but Id rather deal with its inefficiencies and focus on the bigger picture of integration for the rPi.
I won't explain the code here, as it's pretty well documented in the comment sections throughout the script. My questions is as stated above; can this be considered basically a genetic algorithm? If not, what must it have to be a basic AI or genetic code? Am I on the right track for this type of problem solving? I know usually there are weighted variables and functions to promote "survival of the fittest", but that can be popped in as needed, I think.
I've read up quite a bit of forums and articles about this topic. I didn't want to copy someone else's code that I barely understand and start using it as a base for a larger project of mine; I want to know exactly how it works so I'm not confused as to why something isn't working out along the way. So, I just tried to comprehend the basic idea of how it works, and write how I interpreted it. Please remember I'd like to stay in Python for this. I know rPi's have multiple environments for C++, Java, etc, but as stated before, most hardware components I'm using have only Python APIs for implementation. if I'm wrong, explain at the algorithmic level, not just with a block of code (again, I really want to understand the process). Also, please don't nitpick code conventions unless it's pertinent to my problem, everyone has a style and this is just a sketch up for now. Here it is, and thanks for reading!
# Created by X3r0, 7/3/2016
# Basic genetic algorithm utilizing a two dimensional array system.
# the 'DNA' is the larger array, and the 'gene' is a smaller array as an element
# of the DNA. There exists no weighted algorithms, or statistical tracking to
# make the program more efficient yet; it is straightforwardly random and solves
# its problem randomly. At this stage, only the base element is iterated over.
# Basic Idea:
# 1) User inputs constraints onto array
# 2) Gene population is created at random given user constraints
# 3) DNA is created with randomized genes ( will never randomize after )
# a) Target DNA is created with loop control variable as data (basically just for some target structure)
# 4) CheckDNA() starts with base gene from DNA, and will recurse until gene matches the target gene
# a) Randomly select two genes from DNA
# b) Create a candidate gene by splicing both parent genes together
# c) Check candidate gene against the target gene
# d) If there exists a match in gene elements, a child gene is created and inserted into DNA
# e) If the child gene in DNA is not equal to target gene, recurse until it is
import random
DNAsize = 32
geneSize = 5
geneDiversity = 9
geneSplit = 4
numRecursions = 0
DNA = []
targetDNA = []
def init():
global DNAsize, geneSize, geneDiversity, geneSplit, DNA
print("This is a very basic form of genetic software. Input variable constraints below. "
"Good starting points are: DNA strand size (array size): 32, gene size (sub array size: 5, gene diversity (randomized 0 - x): 5"
"gene split (where to split gene array for splicing): 2")
DNAsize = int(input('Enter DNA strand size: '))
geneSize = int(input('Enter gene size: '))
geneDiversity = int(input('Enter gene diversity: '))
geneSplit = int(input('Enter gene split: '))
# initializes the gene population, and kicks off
# checkDNA recursion
initPop()
checkDNA(DNA[0])
def initPop():
# builds an array of smaller arrays
# given DNAsize
for x in range(DNAsize):
buildDNA()
# builds the goal array with a recurring
# numerical pattern, in this case just the loop
# control variable
buildTargetDNA(x)
def buildDNA():
newGene = []
# builds a smaller array (gene) using a given geneSize
# and randomized with vaules 0 - [given geneDiversity]
for x in range(geneSize):
newGene.append(random.randint(0,geneDiversity))
# append the built array to the larger array
DNA.append(newGene)
def buildTargetDNA(x):
# builds the target array, iterating with x as a loop
# control from the call in init()
newGene = []
for y in range(geneSize):
newGene.append(x)
targetDNA.append(newGene)
def checkDNA(childGene):
global numRecursions
numRecursions = numRecursions+1
gene = DNA[0]
targetGene = targetDNA[0]
parentGeneA = DNA[random.randint(0,DNAsize-1)] # randomly selects an array (gene) from larger array (DNA)
parentGeneB = DNA[random.randint(0,DNAsize-1)]
pos = random.randint(geneSplit-1,geneSplit+1) # randomly selects a position to split gene for splicing
candidateGene = parentGeneA[:pos] + parentGeneB[pos:] # spliced gene given split from parentA and parentB
print("DNA Splice Position: " + str(pos))
print("Element A: " + str(parentGeneA))
print("Element B: " + str(parentGeneB))
print("Candidate Element: " + str(candidateGene))
print("Target DNA: " + str(targetDNA))
print("Old DNA: " + str(DNA))
# iterates over the candidate gene and compares each element to the target gene
# if the candidate gene element hits a target gene element, the resulting child
# gene is created
for x in range(geneSize):
#if candidateGene[x] != targetGene[x]:
#print("false ")
if candidateGene[x] == targetGene[x]:
#print("true ")
childGene.pop(x)
childGene.insert(x, candidateGene[x])
# if the child gene isn't quite equal to the target, and recursion hasn't reached
# a max (apparently 900), the child gene is inserted into the DNA. Recursion occurs
# until the child gene equals the target gene, or max recursuion depth is exceeded
if childGene != targetGene and numRecursions < 900:
DNA.pop(0)
DNA.insert(0, childGene)
print("New DNA: " + str(DNA))
print(numRecursions)
checkDNA(childGene)
init()
print("Final DNA: " + str(DNA))
print("Number of generations (recursions): " + str(numRecursions))

I'm working with evolutionary computation right now so I hope my answer will be helpful for you, personally, I work with java, mostly because is one of my main languages, and for the portability, because I tested in linux, windows and mac. In my case I work with permutation encoding, but if you are still learning how GA works, I strongly recommend binary encoding. This is what I called my InitialPopulation. I try to describe my program's workflow:
1-. Set my main variables
This are PopulationSize, IndividualSize, MutationRate, CrossoverRate. Also you need to create an objective function and decide the crossover method you use. For this example lets say that my PopulationSize is equals to 50, the IndividualSize is 4, MutationRate is 0.04%, CrossoverRate is 90% and the crossover method will be roulette wheel.
My objective function only what to check if my Individuals are capable to represent the number 15 in binary, so the best individual must be 1111.
2-. Initialize my Population
For this I create 50 individuals (50 is given by my PopulationSize) with random genes.
3-. Loop starts
For each Individuals in Population you need to:
Evaluate fitness according to the objective function. If an Individual is represented by the next characters: 00100 this mean that his fitness is 1. As you can see this is a simple fitness function. You can create your own while you are learning, like fitness = 1/numberOfOnes. Also you need to assign the sum of all the fitness to a variable called populationFitness, this will be useful in the next step.
Select the best individuals. For this task there's a lot of methods you can use, but we will use the roulette wheel method as we say before. In this method, You assign a value to every individual inside your population. This value is given by the next formula: (fitness/populationFitness) * 100. So, if your population fitness is 10, and a certain individual fitness is 3, this mean that this individual has a 30% chance to be selected to make a crossover with another individual. Also, if another individual have a 4 in his fitness, his value will be 40%.
Apply crossover. Once you have the "best" individuals of your population, you need to create a new population. This new population is formed by others individuals of the previous population. For each individual you create a random number from 0 to 1. If this numbers is in the range of 0.9 (since our crossoverRate = 90%), this individual can reproduce, so you select another individual. Each new individual has this 2 parents who inherit his genes. For example:
Lets say that parentA = 1001 and parentB = 0111. We need to create a new individual with this genes. There's a lot of methods to do this, uniform crossover, single point crossover, two point crossover, etc. We will use the single point crossover. In this method we choose a random point between the first gene and the last gene. Then, we create a new individual according to the first genes of parentA and the last genes of parentB. In a visual form:
parentA = 1001
parentB = 0111
crossoverPoint = 2
newIndividual = 1011
As you can see, the new individual share his parents genes.
Once you have a new population with new individuals, you apply the mutation. In this case, for each individual in the new population generate a random number between 0 and 1. If this number is in the range of 0.04 (since our mutationRate = 0.04), you apply the mutation in a random gene. In binary encoding the mutation is just change the 1's for 0's or viceversa. In a visual form:
individual = 1011
randomPoint = 3
mutatedIndividual = 1010
Get the best individual
If this individual has reached the solution stop. Else, repeat the loop
End
As you can see, my english is not very good, but I hope you understand the basic idea of a genetic algorithm. If you are truly interested in learning this, you can check the following links:
http://www.obitko.com/tutorials/genetic-algorithms/
This link explains in a clearer way the basics of a genetic algorithm
http://natureofcode.com/book/chapter-9-the-evolution-of-code/
This book also explain what a GA is, but also provide some code in Processing, basically java. But I think you can understand.
Also I would recommend the following books:
An Introduction to Genetic Algorithms - Melanie Mitchell
Evolutionary algorithms in theory and practice - Thomas Bäck
Introduction to genetic algorithms - S. N. Sivanandam
If you have no money, you can easily find all this books in PDF.
Also, you can always search for articles in scholar.google.com
Almost all are free to download.

Just to add a bit to Alberto's great answer, you need to watch out for two issues as your solution evolves.
The first one is Over-fitting. This basically means that your solution is complex enough to "learn" all samples, but it is not applicable outside the training set. To avoid this, your need to make sure that the "amount" of information in your training set is a lot larger than the amount of information that can fit in your solution.
The second problem is Plateaus. There are cases where you would arrive at certain mediocre solutions that are nonetheless, good enough to "outcompete" any emerging solution, so your progress stalls (one way to see this is, if you see your fitness "stuck" at a certain, less than optimal number). One method for dealing with this is Extinctions: You could track the rate of improvement of your optimal solution, and if the improvement has been 0 for the last N generations, you just Nuke your population. (That is, delete your population and the list of optimal individuals and start over). Randomness will make it so that eventually the Solutions will surpass the Plateau.
Another thing to keep in mind, is that the default Random class is really bad at Randomness. I have had solutions improve dramatically by simply using something like the Mesernne Twister Random generator or a Hardware Entropy Generator.
I hope this helps. Good luck.

Modifying Dijkstra's algorithm for least changes

I'm using a version of Dijkstra's algorithm written in Python which I found online, and it works great. But because this is for bus routes, changing 10 times might be the shortest route, but probably not the quickest and definitely not the easiest. I need to modify it somehow to return the path with the least number of changes, regardless of distance to be honest (obviously if 2 paths have equal number of changes, choose the shortest one). My current code is as follows:
from priodict import priorityDictionary
def Dijkstra(stops,start,end=None):
D = {} # dictionary of final distances
P = {} # dictionary of predecessors
Q = priorityDictionary() # est.dist. of non-final vert.
Q[start] = 0
for v in Q:
D[v] = Q[v]
print v
if v == end: break
for w in stops[v]:
vwLength = D[v] + stops[v][w]
if w in D:
if vwLength < D[w]:
raise ValueError, "Dijkstra: found better path to already-final vertex"
elif w not in Q or vwLength < Q[w]:
Q[w] = vwLength
P[w] = v
return (D,P)
def shortestPath(stops,start,end):
D,P = Dijkstra(stops,start,end)
Path = []
while 1:
Path.append(end)
if end == start: break
end = P[end]
Path.reverse()
return Path
stops = MASSIVE DICTIONARY WITH VALUES (7800 lines)
print shortestPath(stops,'Airport-2001','Comrie-106')
I must be honest - I aint no mathematician so I don't quite understand the algorithm fully, despite all my research on it.
I have tried changing a few things but I don't get even close.
Any help? Thanks!

Here is a possible solution:
1)Run breadth first search from the start vertex. It will find the path with the least number of changes, but not the shortest among them. Let's assume that after running breadth first search dist[i] is the distance between the start and the i vertex.
2)Now one can run Djikstra algorithm on modified graph(add only those edges from the initial graph which satisfy this condition: dist[from] + 1 == dist[to]). The shortest path in this graph is the one you are looking for.
P.S If you don't want to use breadth first search, you can use Djikstra algorithm after making all edges' weights equal to 1.

What i would do is to add an offset to the actual costs if you have to change the line. For example if your edge weights represent the time needed between 2 stations, i would add the average waiting time between Line1 Line2 at station X (e.g. 0.5*maxWaitingTime) during the search process. Of course this is a heuristic solution for the problem. If your timetables are known, you can calculate a "exact" solution or at least a solution that satisfies the model because in reality you can't assume that every bus is always on time.

The solution is simple: instead of using the distances as weights, use a wright of 1 for each stop. Dijkstra's algorithm will minimize the number of changes as you requested (the total path weight is the number of rides, which is the number of changes +1). If you want to use the distance to break ties, use something like
vwLength = D[v] + 1+ alpha*stops[v][w]
where alpha<<1, e.g. alpha=0.0001
Practically, I think you're approach is exaggerated. You don't want to fly from Boston to Toronto through Paris even if two of flights are the minimum. I would play with alpha to get an approximation of total traveling time, which is what probably matters.

Modeling a graph in Python

I'm trying to solve a problem related to graphs in Python. Since its a comeptitive programming problem, I'm not using any other 3rd party packages.
The problem presents a graph in the form of a 5 X 5 square grid.
A bot is assumed to be at a user supplied position on the grid. The grid is indexed at (0,0) on the top left and (4,4) on the bottom right. Each cell in the grid is represented by any of the following 3 characters. ‘b’ (ascii value 98) indicates the bot’s current position, ‘d’ (ascii value 100) indicates a dirty cell and ‘-‘ (ascii value 45) indicates a clean cell in the grid.
For example below is a sample grid where the bot is at 0 0:
b---d
-d--d
--dd-
--d--
----d
The goal is to clean all the cells in the grid, in minimum number of steps.
A step is defined as a task, where either
i) The bot changes it position
ii) The bot changes the state of the cell (from d to -)
Assume that initially the position marked as b need not be cleaned. The bot is allowed to move UP, DOWN, LEFT and RIGHT.
My approach
I've read a couple of tutorials on graphs,and decided to model the graph as an adjacency matrix of 25 X 25 with 0 representing no paths, and 1 representing paths in the matrix (since we can move only in 4 directions). Next, I decided to apply Floyd Warshell's all pairs shortest path algorithm to it, and then sum up the values of the paths.
But I have a feeling that it won't work.
I'm in a delimma that the problem is either one of the following:
i) A Minimal Spanning Tree (which I'm unable to do, as I'm not able to model and store the grid as a graph).
ii) A* Search (Again a wild guess, but the same problem here, I'm not able to model the grid as a graph properly).
I'd be thankful if you could suggest a good approach at problems like these. Also, some hint and psuedocode about various forms of graph based problems (or links to those) would be helpful. Thank

I think you're asking two questions here.
1. How do I represent this problem as a graph in Python?
As the robot moves around, he'll be moving from one dirty square to another, sometimes passing through some clean spaces along the way. Your job is to figure out the order in which to visit the dirty squares.
# Code is untested and may contain typos. :-)
# A list of the (x, y) coordinates of all of the dirty squares.
dirty_squares = [(0, 4), (1, 1), etc.]
n = len(dirty_squares)
# Everywhere after here, refer to dirty squares by their index
# into dirty_squares.
def compute_distance(i, j):
return (abs(dirty_squares[i][0] - dirty_squares[j][0])
+ abs(dirty_squares[i][1] - dirty_squares[j][1]))
# distances[i][j] is the cost to move from dirty square i to
# dirty square j.
distances = []
for i in range(n):
distances.append([compute_distance(i, j) for j in range(n)])
# The x, y coordinates of where the robot starts.
start_node = (0, 0)
# first_move_distances[i] is the cost to move from the robot's
# start location to dirty square i.
first_move_distances = [
abs(start_node[0] - dirty_squares[i][0])
+ abs(start_node[1] - dirty_squares[i][1]))
for i in range(n)]
# order is a list of the dirty squares.
def cost(order):
if not order:
return 0 # Cleaning 0 dirty squares is free.
return (first_move_distances[order[0]]
+ sum(distances[order[i]][order[i+1]]
for i in range(len(order)-1)))
Your goal is to find a way to reorder list(range(n)) that minimizes the cost.
2. How do I find the minimum number of moves to solve this problem?
As others have pointed out, the generalized form of this problem is intractable (NP-Hard). You have two pieces of information that help constrain the problem to make it tractable:
The graph is a grid.
There are at most 24 dirty squares.
I like your instinct to use A* here. It's often good for solving find-the-minimum-number-of-moves problems. However, A* requires a fair amount of code. I think you'd be better of going with a Branch-and-Bound approach (sometimes called Branch-and-Prune), which should be almost as efficient but is much easier to implement.
The idea is to start enumerating all possible solutions using a depth-first-search, like so:
# Each list represents a sequence of dirty nodes.
[]
[1]
[1, 2]
[1, 2, 3]
[1, 3]
[1, 3, 2]
[2]
[2, 1]
[2, 1, 3]
Every time you're about to recurse into a branch, check to see if that branch is more expensive than the cheapest solution found so far. If so, you can skip the whole branch.
If that's not efficient enough, add a function to calculate a lower bound on the remaining cost. Then if cost([2]) + lower_bound(set([1, 3])) is more expensive than the cheapest solution found so far, you can skip the whole branch. The tighter lower_bound() is, the more branches you can skip.

Let's say V={v|v=b or v=d}, and get a full connected graph G(V,E). You could calculate the cost of each edge in E with a time complexity of O(n^2). Afterwards the problem becomes exactly the same as: Start at a specified vertex, and find a shortest path of G which covers V.
We call this Traveling Salesman Problem(TSP) since 1832.

The problem can certainly be stored as a graph. The cost between nodes (dirty cells) is their Manhattan distance. Ignore the cost of cleaning cells, because that total cost will be the same no matter what path taken.

This problem looks to me like the Minimum Rectilinear Steiner Tree problem. Unfortunately, the problem is NP hard, so you'll need to come up with an approximation (a Minimum Spanning Tree based on Manhattan distance), if I am correct.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.