def solver(word):
    #trackerCount = len(word
    convertedKey = sum(bytearray(word,'utf8'))
    if(len(word)>=MIN_WORD_LENGTH):
        countdownLetters = wordmap.get(convertedKey)
        if(convertedKey in wordmap):
            for str in countdownLetters:
                if sorted(word)==sorted(str) and str not in result:
                    result.update({str:len(str)})
        if(len(word)==9 and len(result)>0):
            return result
        tempList = list(word)
        for i in range(len(tempList)):
            charTmp = tempList.pop()
            wordStr =''.join(tempList)
            tempList.insert(0,charTmp)
            solver(wordStr)
    return result
I am writing a Countdown letters solver using a recursive function. I want to completely stop calling the recursive
function once I have found the longest words. For example, say I pass the word "education" to the solver function.
Let us assume there are no anagrams of "education"; I would then like to check whether there are any words of length 8.
If there are words of length 8, I want to return out of the function, but if there are no 8-letter words I want to
check for 7, and so on. I'm only interested in finding the longest words. Of course, if more than one word shares the
longest length, I want to get them all. The code above finds all the words from max length (9) down to min length (5).
Explanation of the above code:
Basically, if there are no 9-letter (max) words, I pop the last element, build a temporary word (wordStr) from the remaining letters, re-insert charTmp (the popped element) at the front of the list, and call solver with one letter removed.
The output of the above code is here: http://postimg.org/image/pgfixbglv/ . Please have a look, it might make more sense. In the image you can see 9-letter words. I want my recursive function to return at that point, but if there are no 9-letter words I want to look for 8-letter words, and if none of those are found, move on to 7, and so on. At the moment, to prevent a stack overflow, I have specified a minimum word length, i.e. 5.
The problem here is in your termination clause:
    if(len(word)==9 and len(result)>0):
        return result
This very explicitly says that you stop only when you've found a result from a 9-letter word. Unless len(word) is 9, you drop to the recursion code.
As the old gag goes, if that's not what you want, then don't do that. Just check that you have some result; if so, return it. I can't tell if this, alone, is sufficient, since you haven't shown how you manage collecting all the results for the various runs through the lower loop (since you need them all), nor how you drop each of the 9 letters in turn, leaving the other 8.
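For what it's worth, here is a minimal sketch of one way to get "longest words only" without relying on the recursion at all. It is not the asker's code: it swaps in itertools.combinations to drop letters, keys the dictionary by sorted letters rather than by byte sum, and the names longest_words, dictionary and min_length are made up for the example. It tries the longest length first and returns as soon as any length produces a hit, so shorter words are never searched.

```python
from itertools import combinations

def longest_words(letters, dictionary, min_length=5):
    """Return every dictionary word of the greatest length buildable from letters."""
    # Index the dictionary by its sorted letters so that anagrams share one key.
    by_key = {}
    for candidate in dictionary:
        by_key.setdefault("".join(sorted(candidate)), set()).add(candidate)

    # Try the longest length first and stop at the first length with any hit.
    for length in range(len(letters), min_length - 1, -1):
        found = set()
        # combinations() of a sorted sequence yields sorted tuples,
        # so each one can be used directly as a lookup key.
        for combo in combinations(sorted(letters), length):
            found |= by_key.get("".join(combo), set())
        if found:
            return found
    return set()

# Hypothetical mini dictionary, just for illustration.
sample_dictionary = {"auction", "caution", "action", "tonic", "duct"}
print(longest_words("education", sample_dictionary))
```

With this sample dictionary there are no 9- or 8-letter matches, so the function returns both 7-letter words and never looks at "action" or "tonic".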
I'm working on a project which extracts keywords from customer reviews. I managed to extract the keywords using a topic modelling technique.
Now I'm looking for a technique or algorithm in Python to rank the reviews based on their similarity to a keyword.
For example:
For the keyword 'delicious food' I would like to get similarity scores for the reviews as below.
review | score
this is place is costly but their food is delicious | 0.7
I would not recommend this place for hangout. | 0.0
This is place is very clean and friendly, perhaps, food is not so great! | 0.2
How can I get the semantic similarity score between a keyword and sentence?
I have a method for doing this, but it's complex, so I'll just show it and go over it after. Here it is:
sentences = ["this is place is costly but their food is delicious", "This is place is very clean and friendly, perhaps, food is not so great!", "I would not recommend this place for hangout."]
search = "food delicious"

count = 0
lst = []

for sentence in sentences:
    if search in sentence:
        lst.append([sentence, 1])
    else:
        for word in search.split():
            if word in sentence:
                count += 1
        lst.append([sentence, max(round(count / len(search.split()) - 0.3, 1), 0)])
        count = 0

for i in lst:
    print(*i)
This will give your desired outputs.
Basically, the first line puts the reviews into a list. The second line creates a variable called search which contains the key phrase.
After that, we need to create 2 variables called count and lst. lst will be the list we use to store our information, and count is a counter we will need later.
In line 7, we start a for loop, which will loop through the sentences one by one.
In line 8, we check if the exact key phrase is in the sentence, so if "food delicious" comes up somewhere in the sentence. If it does, then we add the sentence, and its PMI score of 1, to the list we created earlier.
Note: (The table does not specify that this is needed, so if it is not, then you can just remove it!)
Next, we use else: to show that, if the exact key phrase is not in the sentence, then we need to do something else to get the PMI score. If we didn't have this else: then it could lead to duplicates later on.
In line 11, another for loop starts, but this time it iterates through every word in search.split(). search.split() just produces a list of the search words, separating them by spaces. For example, here, search.split() would be ["food", "delicious"]. So now we are iterating through that list.
In line 12, we check whether the current word we are looping through is in the current sentence we are looping through. If it is, then the count variable we created earlier is increased by 1. This check is repeated for each word in the search.
Note: This means that if one word, e.g. food, came up twenty times, the computer would still act as if it only came up once. To change this, you can replace count += 1 with count += sentence.count(word), which would count every single occurrence of the word in the sentence.
After the search.split() for loop has ended, we need to add our count to the list. Here comes some mathy stuff. Firstly, we divide the count by len(search.split()), to get the fraction (at most 1) of the words from the search variable that come up in the sentence. However, this raises a problem. If 2 words came up, and there were 2 words in the search variable, then we would be doing 2/2, which is 1. We don't want 1, we want 0.7. Therefore, we also subtract 0.3 from our number. I rounded this value because the division can leave a long, messy decimal.
We still have one last problem in the lst.append() row. If 0 words came up in the sentence, but there were 2 words in the search variable, then we would be doing 0/2, which is 0. That's what we want, but then we subtract 0.3, which gives us -0.3. To avoid this, we wrap the value in max(..., 0) so that it can never go below 0.
Finally, right after, we reset count to 0, so that the next sentence starts with a fresh count.
That's all! To print it, I just used a small for loop at the end, but you don't need it.
These are my results:
this is place is costly but their food is delicious 0.7
This is place is very clean and friendly, perhaps, food is not so great! 0.2
I would not recommend this place for hangout. 0
P.S: (The *i in the print() on the last line just removes the brackets and commas from the printed value. It does not change the list itself in any way.)
I know that this was long, but it is important to read everything to understand the point of each line.
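If you would rather have this as a reusable function, here is a rough sketch of the same heuristic. The name keyword_score and the penalty parameter are made up for this example, and there is one deliberate difference from the loop above: it compares whole lowercase words instead of substrings, so "food" would not match inside "seafood". Treat it as word overlap, not a real semantic similarity measure.

```python
import re

def keyword_score(sentence, search, penalty=0.3):
    """Score a sentence by the share of search words it contains (same 0.3 offset as above)."""
    # An exact phrase match scores 1, as in the original loop.
    if search in sentence:
        return 1.0
    # Assumption: split the sentence into lowercase words so matches are whole-word.
    sentence_words = set(re.findall(r"[a-z']+", sentence.lower()))
    search_words = search.lower().split()
    hits = sum(1 for word in search_words if word in sentence_words)
    # Subtract the offset and clamp at 0 so the score is never negative.
    return max(round(hits / len(search_words) - penalty, 1), 0)

reviews = [
    "this is place is costly but their food is delicious",
    "This is place is very clean and friendly, perhaps, food is not so great!",
    "I would not recommend this place for hangout.",
]
for review in reviews:
    print(review, keyword_score(review, "food delicious"))
```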
I'm having a little fun with python3 by trying to find words in a word search. I know I could easily do this with loops however, I don't know recursion too well and I really want to know how to do it this way.
I began by creating a 2-D list of the rows in the word search and calling that list "square". I created another list of the individual words I am looking for called "word" (for the sake of simplicity, let's pretend there is only one word).
I am going to use recursive functions for each direction a word can go and run the word in each function, returning True if it is found, and False if it is not.
This is the first function:
def down(word, square):
    if (len(word)==0):
        return True
    elif (len(square)==0):
        print(square)
        return False
    else:
        if word[:1]==square[0][:1]:
            return down(word[1:], square[1:])
        elif (word[:1]!=square[0][:1]):
            print(square)
            return down(word, square[1:][1:])
        else:
            return False
This function will try to find the first letter of the word in the 2-D list and then check that same position where the first letter is found in each subsequent line of the square to see if the rest of the word is found.
I cannot get the function to go past the first letter of each 1-D list within the overall 2-D list, and any assistance would be greatly appreciated.
Thanks!
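One possible way to let the function see past the first column is to pass the position explicitly instead of slicing the square. This is only a sketch under that assumption: the extra row and col parameters and the helper find_down are not part of the original function, and the square here is a list of strings.

```python
def down(word, square, row, col):
    """Return True if word reads downward starting at square[row][col]."""
    if not word:
        return True  # every letter has been matched
    if row >= len(square) or col >= len(square[row]):
        return False  # ran off the bottom (or a short row) before finishing
    if square[row][col] != word[0]:
        return False  # this cell does not match the next letter
    # Same column, next row, rest of the word.
    return down(word[1:], square, row + 1, col)

def find_down(word, square):
    """Try every cell as a starting point and collect the ones that work."""
    return [
        (row, col)
        for row in range(len(square))
        for col in range(len(square[row]))
        if down(word, square, row, col)
    ]

square = ["cat", "oxo", "wyz"]
print(find_down("cow", square))
```

The recursion only handles "keep going down"; the outer comprehension is what moves across columns, which is the part the sliced version could not do.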
I have a 4x4 table of letters and I want to find all possible paths through it. They are candidates for being words. I have problems with the variable "used". It is a list that holds all the places the path has already visited, so that it doesn't go there again. There should be one used list for every path, but it doesn't work correctly. For example, I had a test print that printed the current word and the used list. Sometimes the word had only one letter, but the path had gone through all 16 cells/indices.
The for loop of size 8 is there for all possible directions. The main function executes the chase function 16 times, once for every possible starting point. The move function returns the index reached after moving in a specific direction, and is_allowed tests whether it is allowed to move in a certain direction.
sample input: oakaoastsniuttot. (4x4 table, where the first 4 letters are the first row etc.)
sample output: all the real words that can be found in some dictionary
In my case it might output one or two words but not nearly all, because it thinks some cells are used even though they are not.
def chase(current_place, used, word):
    used.append(current_place) #used === list of indices that have been used
    word += letter_list[current_place]
    if len(word)>=11:
        return 0
    for i in range(3,9):
        if len(word) == i and word in right_list[i-3]: #right_list === list of all words
            print(word)
            break
    for i in range(8):
        if is_allowed(current_place, i) and (move(current_place, i) not in used):
            chase(move(current_place, i), used, word)
The problem is that there's only one used list that gets passed around. You have two options for fixing this in chase():
1) Make a copy of used and work with that copy.
2) Before you return from the function, undo the append() that was done at the start.
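Both options can be sketched roughly as follows. This is not the asker's program: neighbours() stands in for their move/is_allowed pair, WORDS is a tiny made-up word set, found collects results instead of printing, and MAX_LEN is kept small so the example runs quickly.

```python
LETTERS = "oakaoastsniuttot"              # the sample 4x4 board, row by row
WORDS = {"oak", "ask", "suit", "zebra"}   # hypothetical mini dictionary
MAX_LEN = 4                               # assumption: short cap to keep the demo fast

DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbours(place):
    """Yield the indices of the cells adjacent to place on the 4x4 board."""
    row, col = divmod(place, 4)
    for d_row, d_col in DIRECTIONS:
        r, c = row + d_row, col + d_col
        if 0 <= r < 4 and 0 <= c < 4:
            yield r * 4 + c

def chase_copy(place, used, word, found):
    """Option 1: every path works on its own copy of used."""
    used = used + [place]      # builds a new list; the caller's list is untouched
    word += LETTERS[place]
    if word in WORDS:
        found.add(word)
    if len(word) >= MAX_LEN:
        return
    for nxt in neighbours(place):
        if nxt not in used:
            chase_copy(nxt, used, word, found)

def chase_undo(place, used, word, found):
    """Option 2: share one list, but undo the append before returning."""
    used.append(place)
    word += LETTERS[place]
    if word in WORDS:
        found.add(word)
    if len(word) < MAX_LEN:
        for nxt in neighbours(place):
            if nxt not in used:
                chase_undo(nxt, used, word, found)
    used.pop()                 # undo the append so sibling paths can reuse this cell

def find_words(chase):
    """Run the given chase function from each of the 16 starting cells."""
    found = set()
    for start in range(16):
        chase(start, [], "", found)
    return found

print(sorted(find_words(chase_copy)))
print(sorted(find_words(chase_undo)))
```

The copy version is simpler to reason about; the undo version avoids allocating a new list on every call, but every exit path out of the function has to go through the pop().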
Suppose you open a text file in Python and you'd like to search it for words containing certain letters.
Say you type in 6 different letters (a,b,c,d,e,f) that you want to search for.
You'd like to find words matching at least 3 of those letters.
Each letter can only appear once in a word.
And the letter 'a' always has to be included.
What should the code look like for this specific kind of search?
Let's see...
def parse_document(document):
    return [x for x in document.split()
            if 'a' in x and sum((1 if y in 'abcdef' else 0 for y in x)) >= 3]
split with no parameters acts as a "words" function, splitting on any whitespace and removing words that contain no characters. Then you check if the letter 'a' is in the word. If 'a' is in the word, you use a generator expression that goes over every letter in the word. If the letter is inside of the string of available letters, then it returns a 1 which contributes to the sum. Otherwise, it returns 0. Then if the sum is 3 or greater, it keeps it. A generator is used instead of a list comprehension because sum will accept anything iterable and it stops a temporary list from having to be created (less memory overhead).
It doesn't have the best access times because of the use of in (which on a string should have an O(n) time), but that generally isn't a very big problem unless the data sets are huge. You can optimize that a bit to pack the string into a set and the constant 'abcdef' can easily be a set. I just didn't want to ruin the nice one liner.
EDIT: Oh, and to improve time on the if portion (which is where the inefficiencies are), you could separate it out into a function that iterates over the string once and returns True if the conditions are met. I would have done this, but it ruined my one liner.
EDIT 2: I didn't see the "must have 3 different characters" part. You can't do that in a one liner. You can just take the if portion out into a function.
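def is_valid(word, chars):
    count = 0
    for x in word:
        if x in chars:
            count += 1
            # remove the letter so a repeat of it is not counted again
            chars.remove(x)
    # 'a' has been removed from chars only if it occurred in the word
    return count >= 3 and 'a' not in chars

def parse_document(document):
    return [x for x in document.split() if is_valid(x, set('abcdef'))]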
This one shouldn't have any performance problems on real world data sets.
Here is what I would do if I had to write this:
I'd have a function that, given a word, would check whether it satisfies the criteria and would return a boolean flag.
Then I'd have some code that would iterate over all words in the file, present each of them to the function, and print out those for which the function has returned True.
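A minimal sketch of that two-part plan might look like the following. The names matches and matching_words are made up, and it reads "each letter can only appear once" as applying to the six search letters, which is an assumption about the requirements.

```python
def matches(word, letters="abcdef", required="a", min_match=3):
    """Return True if word satisfies the search criteria."""
    if required not in word:
        return False
    # Assumption: none of the search letters may be repeated in the word.
    if any(word.count(ch) > 1 for ch in letters):
        return False
    # Count how many of the search letters occur in the word.
    return sum(ch in word for ch in letters) >= min_match

def matching_words(lines):
    """Yield every word from an iterable of lines that passes matches()."""
    for line in lines:
        for word in line.split():
            if matches(word):
                yield word

# An open file is an iterable of lines, so this also works as
#     with open("words.txt") as f:   # hypothetical file name
#         print(list(matching_words(f)))
print(list(matching_words(["fubar cadre obsequious xray", "abba face"])))
```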
I agree with aix's general plan, but it's perhaps even more general than a 'design pattern,' and I'm not sure how far it gets you, since it boils down to, "figure out a way to check for what you want to find and then check everything you need to check."
Advice about how to find what you want to find: You've entered into one of the most fundamental areas of algorithm research. Though LCS (longest common substring) is better covered, you'll have no problems finding good examples for containment either. The most rigorous discussion of this topic I've seen is on a Google CS wonk's website: http://neil.fraser.name. He has something called diff-match-patch which is released and optimized in many different languages, including Python, which can be downloaded here:
http://code.google.com/p/google-diff-match-patch/
If you'd like to understand more about Python and algorithms, Magnus Hetland has written a great book about Python algorithms, and his website features some examples of string matching, fuzzy string matching and so on, including the Levenshtein distance in a very simple to grasp format. (Google for Magnus Hetland, I don't remember the address.)
Within the standard library you can look at difflib, which offers many ways to assess the similarity of strings. You are looking for containment, which is not the same but is quite related, and you could potentially make a set of candidate words that you could compare, depending on your needs.
Alternatively you could use the newer addition to Python, collections.Counter: build a Counter from each word you're testing, then make a function that requires counts of 1 or more for each of your tested letters.
Finally, on to the second part of aix's approach, 'then apply it to everything you want to test': I'd suggest you look at itertools. If you have any kind of efficiency constraint, you will want to use generators, and a test like the one aix proposes can be carried out most efficiently with itertools.ifilter (in Python 3 the built-in filter is already lazy and replaces it). You pass it your function that returns True for the values you want to keep, e.g. itertools.ifilter(your_test, test_iterable), and it yields only the values that succeed. If you pass the builtin function bool instead, as in itertools.ifilter(bool, test_iterable), it keeps every truthy value.
Good luck
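As a rough illustration of the Counter idea combined with lazy filtering, here is a small sketch. It uses Python 3, where the built-in filter plays the role of itertools.ifilter; the function name has_all_letters and the sample words are made up.

```python
from collections import Counter

def has_all_letters(word, letters="abc"):
    """Return True if every tested letter occurs at least once in word."""
    counts = Counter(word)  # missing letters simply count as 0
    return all(counts[ch] >= 1 for ch in letters)

words = ["cab", "back", "abba", "cube"]
kept = filter(has_all_letters, words)  # lazy: nothing is tested until iterated
print(list(kept))
```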
words = 'fubar cadre obsequious xray'

def find_words(src, required=(), letters=(), min_match=3):
    required = set(required)
    letters = set(letters)
    # pair each word with the set of its letters
    words = ((word, set(word)) for word in src.split())
    words = (word for word in words if word[1].issuperset(required))
    words = (word for word in words if len(word[1].intersection(letters)) >= min_match)
    words = (word[0] for word in words)
    return words

w = find_words(words, required=['a'], letters=['a', 'b', 'c', 'd', 'e', 'f'])
print(list(w))
EDIT 1: I too didn't read the requirements closely enough. This version ensures a word contains no more than 1 instance of each valid letter.
from collections import Counter

def valid(word, letters, min_match):
    """At least min_match, no more than one of any letter"""
    c = 0
    count = Counter(word)
    for letter in letters:
        char_count = count.get(letter, 0)
        if char_count > 1:
            return False
        elif char_count == 1:
            c += 1
    # decide only after every letter has been checked for repeats
    return c >= min_match

def find_words(srcfile, required=(), letters=(), min_match=3):
    required = set(required)
    words = (word for word in srcfile.split())
    words = (word for word in words if set(word).issuperset(required))
    words = (word for word in words if valid(word, letters, min_match))
    return words
Hey, I'm trying to decode a multilevel Caesar cipher. By that I mean a string of letters could have been shifted several times, so if I say apply_shifts[(2,3),(4,5)], that means I shift everything from the 2nd letter by 3 followed by everything from the 4th letter by 5. Here's my code so far.
def find_best_shifts_rec(wordlist, text, start):
    """
    Given a scrambled string and a starting position from which
    to decode, returns a shift key that will decode the text to
    words in wordlist, or None if there is no such key.

    Hint: You will find this function much easier to implement
    if you use recursion.

    wordlist: list of words
    text: scrambled text to try to find the words for
    start: where to start looking at shifts
    returns: list of tuples. each tuple is (position in text, amount of shift)
    """
    for shift in range(27):
        text=apply_shifts(text, [(start,-shift)])
        #first word is text.split()[0]
        #test if first word is valid. if not, go to next shift
        if is_word(wordlist,text.split()[0])==False:
            continue
        #enter the while loop if word is valid, otherwise never enter and go to the next shift
        i=0
        next_index=0
        shifts={}
        while is_word(wordlist,text.split()[i])==True:
            next_index+= len(text.split()[i])
            i=i+1
            #once a word isn't valid, then try again, starting from the new index.
            if is_word(wordlist,text.split()[i])==False:
                shifts[next_index]=i
                find_best_shifts_rec(wordlist, text, next_index)
    return shifts
My problems are
1) my code isn't running properly and I don't understand why it is messing up (it's not entering my while loop)
and
2) I don't know how to test whether none of my "final shifts" (e.g. the last part of my string) are valid words and I also don't know how to go from there to the very beginning of my loop again.
Help would be much appreciated.
I think the problem is that you always work on the whole text, but apply the (new) shifting at some start inside of the text. So your check is_word(wordlist,text.split()[0]) will always check the first word, which is - of course - a word after your first shift.
What you need to do instead is to get the first word after your new starting point, so check the actually unhandled parts of the text.
edit
Another problem I noticed is the way you are trying out to find the correct shift:
    for shift in range(27):
        text=apply_shifts(text, [(start,-shift)])
So you basically want to try all shifts from 0 to 26 until the first word is accepted. It is okay to do it like that, but note that after the first tried shifting, the text has changed. As such you are not shifting it by 1, 2, 3, ... but by 1, 3, 6, 10, ... which is of course not what you want, and you will of course miss some shifts while doing some identical ones multiple times.
So you need to temporarily shift your text and check the status of that temporary text, before you continue to work with the text. Or alternatively, you always shift by 1 instead.
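To illustrate the "temporary shift" idea, here is a small sketch. It does not use the asker's apply_shifts or is_word; shift_text and try_shifts are hypothetical helpers, and for simplicity it assumes a plain 26-letter lowercase alphabet rather than whatever alphabet the assignment uses. The point is only that every candidate is computed from the untouched text, so the shifts tried really are 0, 1, 2, 3, ...

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase letters only

def shift_text(text, shift):
    """Hypothetical helper: rotate each lowercase letter by shift, leave other characters alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)] if ch in ALPHABET else ch
        for ch in text
    )

def try_shifts(text, is_word):
    """Return (shift, decoded_text) for the first shift whose first word is valid, else None."""
    for shift in range(len(ALPHABET)):
        candidate = shift_text(text, -shift)  # text itself is never overwritten
        if is_word(candidate.split()[0]):
            return shift, candidate
    return None

known_words = {"hello", "world"}   # made-up word list
encoded = shift_text("hello world", 3)
print(try_shifts(encoded, lambda w: w in known_words))
```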
edit²
And another problem I noticed is with the way you are trying to use recursion to get your final result. Usually recursion (with a result) works the way that you keep calling the function itself and pass the return values along, or collect the results. In your case, as you want to have multiple values, and not just a single value from somewhere inside, you need to collect each of the shifting results.
But right now, you are throwing away the return values of the recursive calls and just return the last value. So store all the values and make sure you don't lose them.
Pseudo-code for recursive function:
coded_text = text from start-index to end of string
if length of coded_text is 0, return "valid solution (no shifts)"

for shift in possible_shifts:
    decoded_text = apply shift of (-shift) to coded_text
    first_word = split decoded_text and take first piece
    if first_word is a valid word:
        rest_of_solution = recurse on (text preceding start-index)+decoded_text, starting at start+(length of first_word)+1
        if rest_of_solution is a valid solution
            if shift is 0
                return rest_of_solution
            else
                return (start, -shift mod alphabet_size) + rest_of_solution

# no valid solution found
return "not a valid solution"
Note that this is guaranteed to give an answer composed of valid words - not necessarily the original string. One specific example: 'a add hat' can be decoded in place of 'a look at'.
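For anyone who wants to run it, the pseudo-code translates fairly directly into Python. This is a sketch under a few assumptions rather than the assignment's reference solution: the alphabet is taken to be the 26 lowercase letters plus space (suggested by the range(27) in the question), shift_from is a stand-in for the asker's apply_shifts, the word list is a plain set, and None plays the role of "not a valid solution".

```python
import string

ALPHABET = string.ascii_lowercase + " "  # assumption: 27 symbols, space included

def shift_from(text, start, shift):
    """Stand-in for apply_shifts: shift every character from index start onward."""
    tail = "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)] if ch in ALPHABET else ch
        for ch in text[start:]
    )
    return text[:start] + tail

def find_shifts(words, text, start=0):
    """Return a list of (position, shift) pairs that decode text[start:], or None."""
    if start >= len(text):
        return []  # nothing left to decode: valid solution with no shifts
    for shift in range(len(ALPHABET)):
        decoded = shift_from(text, start, -shift)
        first_word = decoded[start:].split(" ")[0]
        if first_word in words:
            # Skip past the word and the space that follows it.
            rest = find_shifts(words, decoded, start + len(first_word) + 1)
            if rest is not None:
                if shift == 0:
                    return rest
                return [(start, -shift % len(ALPHABET))] + rest
    return None  # no shift produced a valid continuation

words = {"a", "look", "at"}   # made-up word list
encoded = shift_from(shift_from("a look at", 0, 3), 2, 5)
key = find_shifts(words, encoded)
decoded = encoded
for position, shift in key:
    decoded = shift_from(decoded, position, shift)
print(key, decoded)
```

With a larger word list the same caveat as above applies: the first decoding made of valid words is returned, which may not be the original message.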