I have a 4x4 table of letters and I want to find all possible paths in it. They are candidates for being words. I have a problem with the variable "used". It is a list of all the places the path has already visited, so that it doesn't go there again. There should be one used-list for every path, but it doesn't work correctly. For example, I had a test print that printed the current word and the used-list. Sometimes the word had only one letter, but the path had gone through all 16 cells/indices.
The for-loop of size 8 covers all possible directions. The main function executes the chase function 16 times, once for every possible starting point. The move function returns the index after moving in a specific direction, and is_allowed tests whether it is allowed to move in a certain direction.
sample input: oakaoastsniuttot. (4x4 table, where the first 4 letters are the first row etc.)
sample output: all the real words that can be found in a dictionary of some sort
In my case it might output one or two words but not nearly all, because it thinks some cells are used even though they are not.
def chase(current_place, used, word):
    used.append(current_place) #used === list of indices that have been used
    word += letter_list[current_place]
    if len(word)>=11:
        return 0
    for i in range(3,9):
        if len(word) == i and word in right_list[i-3]: #right_list === list of all words
            print word
            break
    for i in range(8):
        if is_allowed(current_place, i) and (move(current_place, i) not in used):
            chase(move(current_place, i), used, word)
The problem is that there's only one used list that gets passed around. You have two options for fixing this in chase():
Make a copy of used and work with that copy.
Before you return from the function, undo the append() that was done at the start.
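As a rough illustration of the second option, here is a minimal self-contained sketch. The helpers `neighbours` and `find_words`, the `found` set, and the flat word set are assumptions made for the example; they stand in for the question's move, is_allowed, and right_list, which are not shown.

```python
SIDE = 4
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbours(place):
    """Yield the indices of all cells adjacent to `place` (hypothetical helper)."""
    row, col = divmod(place, SIDE)
    for d_row, d_col in DIRECTIONS:
        r, c = row + d_row, col + d_col
        if 0 <= r < SIDE and 0 <= c < SIDE:
            yield r * SIDE + c

def chase(letters, words, place, used, word, found):
    used.append(place)                 # mark this cell for the current path
    word += letters[place]
    if word in words:
        found.add(word)
    if len(word) < max(map(len, words), default=0):
        for nxt in neighbours(place):
            if nxt not in used:
                chase(letters, words, nxt, used, word, found)
    used.pop()                         # option 2: undo the append before returning
    # Option 1 would instead drop both the append and the pop and pass
    # a fresh list, used + [place], to each recursive call.

def find_words(letters, words):
    found = set()
    for start in range(SIDE * SIDE):
        chase(letters, words, start, [], "", found)
    return found

print(find_words("oakaoastsniuttot", {"oak", "suit", "snit", "kaka"}))
```

With the pop in place, every path sees only its own cells in used, so sibling branches no longer block each other.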
I know this is not the place to ask for homework answers, but I just want some direction since I'm completely lost. I started a Python course at my college where the professor assumes no one has experience in the language. This is our first assignment:
Build on this code to create a subsequent loop that starts at W and takes a random walk of a particular maximum length specified as a variable. If there are multiple possible next states, choose one uniformly at random. If there are no next states, then you should stop the loop, even if you haven't reached your maximum length.
I understand that this code my professor provided is walking through each character of the string s, but I don't know how it's actually working. I don't get what c in enumerate(s) or next[c]=[] are doing. Any help explaining how this works or how to handle characters and strings in Python would be greatly appreciated. I've been coding in other languages for a few years and I have no idea how to even start this assignment.
next = {} # This will hold the directed graph
s = "Welcome to cs 477"
# This loops through all of the characters in s
# and keeps track of their indices
for i, c in enumerate(s):
    if not c in next:
        # If this is the first time seeing a particular
        # character, we need to make a new key/value pair for it
        next[c] = []
    if i < len(s)-1: # If there is a character after this
        # Record that s[i+1] is one of the following characters
        next[c].append(s[i+1])
enumerate(s) allows you to iterate over a sequence (here, the string s), getting both the ith element and the position it's in. In your case, the variable i holds the index of the current element, starting from 0, and c holds the current character. If you were to print i and c inside your for loop, the result would look like this:
for i, c in enumerate(s):
    print(i, c)
# 0 W
# 1 e
# 2 l
# ...
Regarding next[c] = [], you need to first understand that next is a dictionary, which is basically a hash map from keys to values. What next[c] = [] does is add the character c to the dictionary as a key, with an empty list as its value.
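To make the resulting structure concrete, here is a small sketch that wraps the professor's loop in a function and then takes a random walk over it. The names `build_graph`, `random_walk`, and `following` are made up for this example (`following` is used instead of next only to avoid shadowing the built-in), and picking with random.choice from the raw list is just one reading of "uniformly at random".

```python
import random

def build_graph(s):
    """Map each character of s to the list of characters that follow it."""
    following = {}
    for i, c in enumerate(s):
        if c not in following:
            following[c] = []          # first time we see c: start an empty list
        if i < len(s) - 1:
            following[c].append(s[i + 1])
    return following

def random_walk(following, start, max_length):
    """Walk from start, stopping at max_length or when there is no next state."""
    walk = [start]
    while len(walk) < max_length:
        options = following.get(walk[-1], [])
        if not options:
            break                      # dead end: no next states
        walk.append(random.choice(options))
    return walk

graph = build_graph("Welcome to cs 477")
print(graph["e"])                      # the characters seen right after an 'e'
print("".join(random_walk(graph, "W", 10)))
```

Note that a character followed by the same letter more than once appears several times in its list, so random.choice favours it; deduplicate the list first if each distinct next state should be equally likely.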
Can we say that a word with 2 characters is a palindrome? For example, is "oo" a palindrome while "go" is not?
I am going through a palindrome-detection program from GeeksForGeeks, but it detects "go" as a palindrome as well, though it is not:
# Function to check whether the
# string is palindrome or not
def palindrome(a):
    # finding the mid, start
    # and last index of the string
    mid = (len(a)-1)//2
    start = 0
    last = len(a)-1
    flag = 0
    # A loop till the mid of the
    # string
    while(start<mid):
        # comparing letters from right
        # from the letters from left
        if (a[start]== a[last]):
            start += 1
            last -= 1
        else:
            flag = 1
            break;
    # Checking the flag variable to
    # check if the string is palindrome
    # or not
    if flag == 0:
        print("The entered string is palindrome")
    else:
        print("The entered string is not palindrome")
# ... other code ...
# Driver code
string = 'amaama'
palindrome(string)
Is there any particular length or condition defined for a word to be a palindrome? I read the Wikipedia article, but did not find any particular condition on the length of a palindrome.
The above program detects "go" as a palindrome because the midpoint is 0, which is "g", and the starting point is 0, which is also "g", and so it determines it is a palindrome. But I am still confused about the number of characters. Can a 2-letter word be a palindrome? If yes, then do we need to just add a specific condition for it: if word[0] == word[1]?
Let's take a look at the definition of palindrome, according to Merriam-Webster:
a word, verse, or sentence (such as "Able was I ere I saw Elba") or a number (such as 1881) that reads the same backward or forward
Therefore, two-character words (or any even-numbered character words) can also be palindromes. The example code is simply poorly written and does not work correctly in the case of two-character strings. As you have correctly deduced, it sets the mid variable to 0 if the length of the string is 2. The loop, while (start < mid), is then instantly skipped, as start is also initialised as 0. Therefore, the flag variable (initialised as 0, corresponding to 'is a palindrome') is never changed, so the function incorrectly prints that go is a palindrome.
There are a number of ways in which you can adapt the algorithm; the simplest would be to check up to and including the middle character index, by changing the while condition to start <= mid. Note that this is only the simplest way to adapt the given code; the simplest piece of Python code to check whether a string is palindromic is significantly simpler (as you can easily reverse a string using a[::-1], and compare this to the original string).
(Edit to add: the other answer by trincot actually shows that the provided algorithm is incorrect for all even-numbered character words. The fix suggested in this answer still works.)
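The slicing approach mentioned above could look something like this. The function name `is_palindrome` is just for illustration, and it returns a boolean instead of printing.

```python
def is_palindrome(a):
    """Return True if the string reads the same backward and forward."""
    return a == a[::-1]   # a[::-1] is a reversed copy of a

for word in ("oo", "go", "gang", "amaama"):
    print(word, is_palindrome(word))
```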
Your question is justified. The code from GeeksForGeeks you have referenced is not giving the correct result. In fact it also produces wrong results for longer words, like "gang".
The above program detects "go" as palindrome because the midpoint is 0, which is "g" and the starting point is 0, which is also "g", and so it determines it is a palindrome.
This is indeed where the algorithm goes wrong.
...then do we need to just add a specific condition for it: if word[0] == word[1]?
Given the while condition is start<mid, the midpoint should be the first index after the first half of the string that must be verified, and so in the case of a 2-letter word, the midpoint should be 1, not 0.
It is easy to correct the error in the program. Change:
mid = (len(a)-1)//2
To:
mid = len(a)//2
That fixes the issue. No extra line of code is needed to treat this as a separate case.
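For reference, here is a sketch of the original loop with that one-line change applied. To make it easy to check, this version returns a boolean rather than printing, which is a deviation from the GeeksForGeeks code.

```python
def palindrome(a):
    mid = len(a) // 2          # first index after the half that must be verified
    start = 0
    last = len(a) - 1
    while start < mid:
        if a[start] != a[last]:
            return False       # mismatch found: not a palindrome
        start += 1
        last -= 1
    return True

for word in ("go", "oo", "gang", "amaama", "level"):
    print(word, palindrome(word))
```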
I did not find any particular condition on the length of a palindrome.
And right you are: there is no such condition. The GeeksForGeeks code made you doubt, but you were right from the start, and the code was wrong.
I'm having a little fun with Python 3 by trying to find words in a word search. I know I could easily do this with loops; however, I don't know recursion too well and I really want to know how to do it this way.
I began by creating a 2-D list of the rows in the word search and calling that list "square". I created another list of the individual words I am looking for called "word" (for the sake of simplicity, let's pretend there is only one word).
I am going to use recursive functions for each direction a word can go and run the word in each function, returning True if it is found, and False if it is not.
This is the first function:
def down(word, square):
    if (len(word)==0):
        return True
    elif (len(square)==0):
        print(square)
        return False
    else:
        if word[:1]==square[0][:1]:
            return down(word[1:], square[1:])
        elif (word[:1]!=square[0][:1]):
            print(square)
            return down(word, square[1:][1:])
        else:
            return False
This function will try to find the first letter of the word in the 2-D list and then check that same position where the first letter is found in each subsequent line of the square to see if the rest of the word is found.
I cannot get the function to go past the first letter of each 1-D list within the overall 2-D list, and any assistance would be greatly appreciated.
Thanks!
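One possible way to express the downward check described above is to pass the current row and column along instead of slicing the square, so the column is not lost between calls. This is only a sketch; the extra `row` and `col` parameters and the `find_down` wrapper are assumptions, not part of the original function.

```python
def down(word, square, row, col):
    """True if word reads downward in column col, starting at row."""
    if not word:
        return True                      # every letter matched
    if row >= len(square):
        return False                     # ran off the bottom of the square
    if square[row][col] != word[0]:
        return False                     # this cell does not match
    return down(word[1:], square, row + 1, col)

def find_down(word, square):
    """Try every cell as a possible starting point for the word."""
    return any(down(word, square, r, c)
               for r in range(len(square))
               for c in range(len(square[0])))

square = ["cdx",
          "aoy",
          "tgz"]
print(find_down("cat", square), find_down("cow", square))
```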
I am trying to solve the reverse Boggle problem. Simply put, given a list of words, come up with a 4x4 grid of letters in which as many words in the list can be found in sequences of adjacent letters (letters are adjacent both orthogonally and diagonally).
I DO NOT want to take a known board and solve it. That is an easy TRIE problem and has been discussed/solved to death here for people's CS projects.
Example word list:
margays, jaguars, cougars, tomcats, margay, jaguar, cougar, pumas, puma, toms
Solution:
ATJY
CTSA
OMGS
PUAR
This problem is HARD (for me). Algorithm I have so far:
For each word in the input, make a list of all possible ways it can legally appear on the board by itself.
Try all possible combinations of placing word #2 on those boards and keep the ones that have no conflicts.
Repeat till end of list.
...
Profit!!! (for those that read /.)
Obviously, there are implementation details. Start with the longest word first. Ignore words that are substrings of other words.
I can generate all 68k possible boards for a 7 character word in around 0.4 seconds. Then when I add an additional 7 character board, I need to compare 68k x 68k boards x 7 comparisons. Solve time becomes glacial.
There must be a better way to do this!!!!
Some code:
BOARD_SIDE_LENGTH = 4

class Board:
    def __init__(self):
        pass

    def setup(self, word, start_position):
        self.word = word
        self.indexSequence = [start_position,]
        self.letters_left_over = word[1:]
        self.overlay = []
        # set up template for overlay. When we compare boards, we will add to this if the board fits
        for i in range(BOARD_SIDE_LENGTH*BOARD_SIDE_LENGTH):
            self.overlay.append('')
        self.overlay[start_position] = word[0]
        self.overlay_count = 0

    @classmethod
    def copy(boardClass, board):
        newBoard = boardClass()
        newBoard.word = board.word
        newBoard.indexSequence = board.indexSequence[:]
        newBoard.letters_left_over = board.letters_left_over
        newBoard.overlay = board.overlay[:]
        newBoard.overlay_count = board.overlay_count
        return newBoard

    # need to check if otherboard will fit into existing board (allowed to use blank spaces!)
    # otherBoard will always be just a single word
    @classmethod
    def testOverlay(self, this_board, otherBoard):
        for pos in otherBoard.indexSequence:
            this_board_letter = this_board.overlay[pos]
            other_board_letter = otherBoard.overlay[pos]
            if this_board_letter == '' or other_board_letter == '':
                continue
            elif this_board_letter == other_board_letter:
                continue
            else:
                return False
        return True

    @classmethod
    def doOverlay(self, this_board, otherBoard):
        # otherBoard will always be just a single word
        for pos in otherBoard.indexSequence:
            this_board.overlay[pos] = otherBoard.overlay[pos]
        this_board.overlay_count = this_board.overlay_count + 1

    @classmethod
    def newFromBoard(boardClass, board, next_position):
        newBoard = boardClass()
        newBoard.indexSequence = board.indexSequence + [next_position]
        newBoard.word = board.word
        newBoard.overlay = board.overlay[:]
        newBoard.overlay[next_position] = board.letters_left_over[0]
        newBoard.letters_left_over = board.letters_left_over[1:]
        newBoard.overlay_count = board.overlay_count
        return newBoard

    def getValidCoordinates(self, board, position):
        # integer division, so this also works on Python 3
        row = position // BOARD_SIDE_LENGTH
        column = position % BOARD_SIDE_LENGTH
        for r in range(row - 1, row + 2):
            for c in range(column - 1, column + 2):
                if r >= 0 and r < BOARD_SIDE_LENGTH and c >= 0 and c < BOARD_SIDE_LENGTH:
                    if (r*BOARD_SIDE_LENGTH+c not in board.indexSequence):
                        yield r, c
class boardgen:
    def __init__(self):
        self.boards = []

    def createAll(self, board):
        # get the next letter
        if len(board.letters_left_over) == 0:
            self.boards.append(board)
            return
        next_letter = board.letters_left_over[0]
        last_position = board.indexSequence[-1]
        for row, column in board.getValidCoordinates(board, last_position):
            new_board = Board.newFromBoard(board, row*BOARD_SIDE_LENGTH+column)
            self.createAll(new_board)
And use it like this:
words = ['margays', 'jaguars', 'cougars', 'tomcats', 'margay', 'jaguar', 'cougar', 'pumas', 'puma']
words.sort(key=len)
first_word = words.pop()
# generate all boards for the first word
overlaid_boards = []
for i in range(BOARD_SIDE_LENGTH*BOARD_SIDE_LENGTH):
    test_board = Board()
    test_board.setup(first_word, i)
    generator = boardgen()
    generator.createAll(test_board)
    overlaid_boards += generator.boards
This is an interesting problem. I can't quite come up with a full, optimized solution, but here are some ideas you might try.
The hard part is the requirement to find the optimal subset if you can't fit all the words in. That's going to add a lot to the complexity. Start by eliminating word combinations that obviously aren't going to work. Cut any words with >16 letters. Count the number of unique letters needed. Be sure to take into account letters repeated in the same word. For example, if the list includes "eagle" I don't think you are allowed to use the same 'e' for both ends of the word. If your list of needed letters is >16, you have to drop some words. Figuring out which ones to cut first is an interesting sub-problem... I'd start with the words containing the least used letters. It might help to have all sub-lists sorted by score.
Then you can do the trivial cases where the total of word lengths is <16. After that, you start with the full list of words and see if there's a solution for that. If not, figure out which word(s) to drop and try again.
Given a word list then, the core algorithm is to find a grid (if one exists) that contains all of those words.
The dumb brute-force way would be to iterate over all the grids possible with the letters you need, and test each one to see if all your words fit. It's pretty harsh though: the middle case is 16! = about 2 x 10^13 boards. The exact formula for n unique letters is (16!/(16-n)!) x n^(16-n), which gives a worst case in the range of 3 x 10^16. Not very manageable.
Even if you can avoid rotations and flips, that only shrinks the search space by a factor of 8 at best (4 rotations x 2 reflections).
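The board counts quoted above can be checked with a few lines of Python. This simply evaluates the formula as stated in this answer for each n; the function name `board_count` is made up for the example.

```python
from math import factorial

def board_count(n, cells=16):
    """Evaluate 16!/(16-n)! * n^(16-n) for n unique letters."""
    return factorial(cells) // factorial(cells - n) * n ** (cells - n)

# Print the count for every possible number of unique letters.
for n in range(1, 17):
    print(n, "%.2e" % board_count(n))

worst_n = max(range(1, 17), key=board_count)
print("worst case at n =", worst_n)
```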
A somewhat smarter greedy algorithm would be to sort the words by some criteria, like difficulty or length. A recursive solution would be to take the top word remaining on the list, and attempt to place it on the grid. Then recurse with that grid and the remaining word list. If you fill up the grid before you run out of words, then you have to back track and try another way of placing the word. A greedier approach would be to try placements that re-use the most letters first.
You can do some pruning. If at any point the number of spaces left in the grid is less than the remaining set of unique letters needed, then you can eliminate those sub-trees. There are a few other cases where it's obvious there's no solution that can be cut, especially when the remaining grid spaces are < the length of the last word.
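A rough sketch of the first pruning test follows. It assumes the grid is a flat list of 16 strings with '' for an empty cell, and it uses the per-letter maximum count over the remaining words as a lower bound on the cells needed; both `letters_required` and `can_still_fit` are hypothetical names.

```python
from collections import Counter

def letters_required(words):
    """Lower bound on the letters a grid needs: for each letter, the
    largest number of times it occurs inside any single word."""
    required = Counter()
    for word in words:
        for letter, count in Counter(word).items():
            required[letter] = max(required[letter], count)
    return required

def can_still_fit(grid, remaining_words):
    """False if the empty cells cannot hold the letters still missing."""
    placed = Counter(cell for cell in grid if cell)
    missing = letters_required(remaining_words) - placed  # keeps positive counts only
    empty = sum(1 for cell in grid if not cell)
    return sum(missing.values()) <= empty

words = ["margays", "jaguars", "cougars", "tomcats", "pumas"]
print(sum(letters_required(words).values()))
print(can_still_fit([""] * 16, words))
```

This is only a necessary condition: a grid that passes may still have no valid layout, but a grid that fails can be discarded without searching its sub-tree.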
The search space for this depends on word lengths and how many letters are re-used. I'm sure it's better than brute-force, but I don't know if it's enough to make the problem reasonable.
The smart way would be to use some form of dynamic programming. I can't quite see the complete algorithm for this. One idea is to have a tree or graph of the letters, connecting each letter to "adjacent" letters in the word list. Then you start with the most-connected letter and try to map the tree onto the grid. Always place the letter that completes the most of the word list. There'd have to be some way of handling the case of multiple of the same letter in the grid. And I'm not sure how to order it so you don't have to search every combination.
The best thing would be to have a dynamic algorithm that also included all the sub word lists. So if the list had "fog" and "fox", and fox doesn't fit but fog does, it would be able to handle that without having to run the whole thing on both versions of the list. That's adding complexity because now you have to rank each solution by score as you go. But in the cases where all the words won't fit it would save a lot of time.
Good luck on this.
There are a couple of general ideas for speeding up backtrack search you could try:
1) Early checks. It usually helps to discard partial solutions that can never work as early as possible, even at the cost of more work. Consider all two-character sequences produced by chopping up the words you are trying to fit in - e.g. PUMAS contributes PU, UM, MA, and AS. These must all be present in the final answer. If a partial solution does not have enough overlapped two-character spaces free to contain all of the overlapped two-character sequences it does not yet have, then it cannot be extended to a final answer.
2) Symmetries. I think this is probably most useful if you are trying to prove that there is no solution. Given one way of filling in a board, you can rotate and reflect that solution to find other ways of filling in a board. If you have 68K starting points and one starting point is a rotation or reflection of another starting point, you don't need to try both, because if you can (or could) solve the problem from one starting point you can get the answer from the other starting point by rotating or reflecting the board. So you might be able to divide the number of starting points you need to try by some integer.
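As an illustration of the symmetry idea, one option is to reduce each board to a canonical representative of its 8 rotations and reflections and skip any starting point whose canonical form has already been seen. The sketch below assumes a board is a 16-character row-major string; all function names are invented for the example.

```python
SIDE = 4

def rotate(board):
    """Rotate a row-major board string 90 degrees clockwise."""
    return "".join(board[(SIDE - 1 - c) * SIDE + r]
                   for r in range(SIDE) for c in range(SIDE))

def mirror(board):
    """Reflect the board left to right."""
    return "".join(board[r * SIDE + (SIDE - 1 - c)]
                   for r in range(SIDE) for c in range(SIDE))

def symmetries(board):
    """Return the 8 rotations/reflections of the board."""
    result = []
    for current in (board, mirror(board)):
        for _ in range(4):
            result.append(current)
            current = rotate(current)
    return result

def canonical(board):
    """Pick one fixed representative of the symmetry class."""
    return min(symmetries(board))

board = "ATJYCTSAOMGSPUAR"
print(canonical(board) == canonical(rotate(board)))
```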
This problem is not the only one to have a large number of alternatives at each stage. This also affects the traveling salesman problem. If you can accept not having a guarantee that you will find the absolute best answer, you could try not following up the least promising of these 68k choices. You need some sort of score to decide which to keep - you might wish to keep those which use as many letters already in place as possible. Some programs for the traveling salesman problems discard unpromising links between nodes very early. A more general approach which discards partial solutions rather than doing a full depth first search or branch and bound is Limited Discrepancy Search - see for example http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.2426.
Of course some approaches to the TSP discard tree search completely in favor of some sort of hill-climbing approach. You might start off with a filled boggle square and repeatedly attempt to find your words in it, modifying a few characters in order to force them in, trying to find steps which successively increase the number of words that can be found in the square. The easiest form of hill-climbing is repeated simple hill-climbing from multiple random starts. Another approach is to restart the hill-climbing by randomizing only a portion of the solution so far - since you don't know the best size of portion to randomize you might decide to choose the size of portion to randomize at random, so that at least some fraction of the time you are randomizing the correct size of region to produce a new square to start from. Genetic algorithms and simulated annealing are very popular here. A paper on a new idea, Late Acceptance Hill-Climbing, also describes some of its competitors - http://www.cs.nott.ac.uk/~yxb/LAHC/LAHC-TR.pdf
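A very small sketch of simple hill-climbing for this problem might look like the following. It is not tuned in any way: the score is just the number of words found, a step changes one random cell, and a change is kept when the score does not drop. The names `contains_word`, `score`, and `hill_climb` and the `steps`/`seed` parameters are all assumptions for the example.

```python
import random

SIDE = 4

def neighbours(pos):
    """Yield the indices adjacent to pos, orthogonally and diagonally."""
    row, col = divmod(pos, SIDE)
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if (r, c) != (row, col) and 0 <= r < SIDE and 0 <= c < SIDE:
                yield r * SIDE + c

def contains_word(board, word, pos=None, used=frozenset()):
    """Depth-first search for word as a path of adjacent, unused cells."""
    if not word:
        return True
    candidates = range(SIDE * SIDE) if pos is None else neighbours(pos)
    return any(board[p] == word[0] and p not in used
               and contains_word(board, word[1:], p, used | {p})
               for p in candidates)

def score(board, words):
    return sum(contains_word(board, w) for w in words)

def hill_climb(words, steps=2000, seed=0):
    rng = random.Random(seed)
    letters = sorted(set("".join(words)))
    board = [rng.choice(letters) for _ in range(SIDE * SIDE)]
    best = score(board, words)
    for _ in range(steps):
        pos = rng.randrange(SIDE * SIDE)
        old = board[pos]
        board[pos] = rng.choice(letters)   # mutate one cell
        new = score(board, words)
        if new >= best:
            best = new                     # keep sideways or uphill moves
        else:
            board[pos] = old               # undo downhill moves
    return "".join(board), best

words = ["margays", "jaguars", "cougars", "tomcats", "margay",
         "jaguar", "cougar", "pumas", "puma", "toms"]
print(hill_climb(words, steps=500))
```

Restarts from several seeds, or the partial re-randomization described above, could be layered on top of hill_climb.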
Hey, I'm trying to decode a multilevel Caesar cipher. By that I mean a string of letters could have been shifted several times, so if I say apply_shifts[(2,3),(4,5)], that means I shift everything from the 2nd letter by 3 followed by everything from the 4th letter by 5. Here's my code so far.
def find_best_shifts_rec(wordlist, text, start):
    """
    Given a scrambled string and a starting position from which
    to decode, returns a shift key that will decode the text to
    words in wordlist, or None if there is no such key.
    Hint: You will find this function much easier to implement
    if you use recursion.
    wordlist: list of words
    text: scrambled text to try to find the words for
    start: where to start looking at shifts
    returns: list of tuples. each tuple is (position in text, amount of shift)
    """
    for shift in range(27):
        text=apply_shifts(text, [(start,-shift)])
        #first word is text.split()[0]
        #test if first word is valid. if not, go to next shift
        if is_word(wordlist,text.split()[0])==False:
            continue
        #enter the while loop if word is valid, otherwise never enter and go to the next shift
        i=0
        next_index=0
        shifts={}
        while is_word(wordlist,text.split()[i])==True:
            next_index+= len(text.split()[i])
            i=i+1
            #once a word isn't valid, then try again, starting from the new index.
            if is_word(wordlist,text.split()[i])==False:
                shifts[next_index]=i
                find_best_shifts_rec(wordlist, text, next_index)
    return shifts
My problems are
1) my code isn't running properly and I don't understand why it is messing up (it's not entering my while loop)
and
2) I don't know how to test whether none of my "final shifts" (e.g. the last part of my string) are valid words and I also don't know how to go from there to the very beginning of my loop again.
Help would be much appreciated.
I think the problem is that you always work on the whole text, but apply the (new) shifting at some start inside of the text. So your check is_word(wordlist,text.split()[0]) will always check the first word, which is - of course - a word after your first shift.
What you need to do instead is to get the first word after your new starting point, so check the actually unhandled parts of the text.
edit
Another problem I noticed is the way you are trying to find the correct shift:
for shift in range(27):
    text=apply_shifts(text, [(start,-shift)])
So you basically want to try all shifts from 0 to 26 until the first word is accepted. It is okay to do it like that, but note that after the first tried shifting, the text has changed. As such you are not shifting it by 1, 2, 3, ... but by 1, 3, 6, 10, ... which is of course not what you want, and you will of course miss some shifts while doing some identical ones multiple times.
So you need to temporarily shift your text and check the status of that temporary text, before you continue to work with the text. Or alternatively, you always shift by 1 instead.
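To see the accumulation concretely, here is a small demonstration. The `shift_text` helper and the 27-symbol alphabet (lowercase letters plus space, guessed from range(27)) are assumptions for the example; they are not the apply_shifts function from the question.

```python
import string

ALPHABET = string.ascii_lowercase + " "   # assumed 27-symbol alphabet

def shift_text(text, shift):
    """Shift every character of text by `shift` positions in ALPHABET."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)]
                   for ch in text)

original = "hello"

# Buggy pattern: text is overwritten, so the shifts pile up (0, 1, 3, 6, ...).
text = original
buggy = []
for shift in range(4):
    text = shift_text(text, -shift)
    buggy.append(text)

# Fixed pattern: always shift a temporary copy of the original text.
fixed = [shift_text(original, -shift) for shift in range(4)]

print(buggy)
print(fixed)
```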
edit²
And another problem I noticed is with the way you are trying to use recursion to get your final result. Usually recursion (with a result) works the way that you keep calling the function itself and pass the return values along, or collect the results. In your case, as you want to have multiple values, and not just a single value from somewhere inside, you need to collect each of the shifting results.
But right now, you are throwing away the return values of the recursive calls and just return the last value. So store all the values and make sure you don't lose them.
Pseudo-code for recursive function:
coded_text = text from start-index to end of string
if length of coded_text is 0, return "valid solution (no shifts)"
for shift in possible_shifts:
    decoded_text = apply shift of (-shift) to coded_text
    first_word = split decoded_text and take first piece
    if first_word is a valid word:
        rest_of_solution = recurse on (text preceding start-index)+decoded_text, starting at start+(length of first_word)+1
        if rest_of_solution is a valid solution
            if shift is 0
                return rest_of_solution
            else
                return (start, -shift mod alphabet_size) + rest_of_solution
# no valid solution found
return "not a valid solution"
Note that this is guaranteed to give an answer composed of valid words - not necessarily the original string. One specific example: 'a add hat' can be decoded in place of 'a look at'.
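The pseudo-code could be turned into Python roughly as follows. This is a sketch under assumptions: `shift_from` is a stand-in for the question's apply_shifts, the word list is a plain set used in place of is_word, and the alphabet is taken to be the 26 lowercase letters plus space.

```python
import string

ALPHABET = string.ascii_lowercase + " "   # assumed 27-symbol alphabet

def shift_from(text, start, shift):
    """Shift every character from index start onward (hypothetical helper)."""
    shifted = "".join(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)]
                      for ch in text[start:])
    return text[:start] + shifted

def find_shifts(wordlist, text, start=0):
    """Return a list of (position, shift) tuples that decode text, or None."""
    if start >= len(text):
        return []                          # nothing left: valid, no shifts
    for shift in range(len(ALPHABET)):
        decoded = shift_from(text, start, -shift)   # temporary copy only
        first_word = decoded[start:].split(" ")[0]
        if first_word in wordlist:
            rest = find_shifts(wordlist, decoded, start + len(first_word) + 1)
            if rest is not None:
                if shift == 0:
                    return rest
                return [(start, -shift % len(ALPHABET))] + rest
    return None                            # no valid solution found

encoded = shift_from(shift_from("hello world", 0, 3), 6, 5)
print(find_shifts({"hello", "world"}, encoded))
```

Applying the returned shifts in order to the encoded text, with the same shift_from helper, gives back a string made of words from the list.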