I need to write a function that will search for words in a matrix. For the moment I'm trying to search line by line to see if the word is there. This is my code:
def search(p):
    w=[]
    for i in p:
        w.append(i)
    s=read_wordsearch() #This is my matrix full of letters
    for line in s:
        l=[]
        for letter in line:
            l.append(letter)
        if w==l:
            return True
        else:
            pass
This code works only if my word begins in the first position of a line.
For example I have this matrix:
[[a,f,l,y],[h,e,r,e],[b,n,o,i]]
I want to find the word "fly" but can't, because my code only finds words like "here" or "her", which begin in the first position of a line...
Any form of help, hint, or advice would be appreciated. (And sorry if my English is bad...)
You can convert each line in the matrix to a string and try to find the search word in it.
def search(p):
    s=read_wordsearch()
    for line in s:
        if p in ''.join(line):
            return True
I'll give you a tip to search within a text for a word. I think you will be able to extrapolate to your data matrix.
s = "xxxxxxxxxhiddenxxxxxxxxxxx"
target = "hidden"
for i in xrange(len(s) - len(target) + 1):  # +1 so a match at the very end is also checked
    if s[i:i+len(target)] == target:
        print "Found it at index", i
        break
If you want to search for words of any length, perhaps because you have a list of possible solutions:
s = "xxxxxxxxxhiddenxxxtreasurexxxxxxxx"
targets = ["hidden","treasure"]
for i in xrange(len(s)):
    for j in xrange(i+1, len(s)+1):  # len(s)+1 so substrings reaching the end are included
        if s[i:j] in targets:
            print "Found", s[i:j], "at index", i
def search(p):
    w = ''.join(p)
    s = read_wordsearch() #This is my matrix full of letters
    for line in s:
        word = ''.join(line)
        if word.find(w) >= 0:
            return True
    return False
Edit: there are already a lot of string functions available in Python. You just need to use strings to be able to use them.
Join the characters in the inner lists to create a string and search it with in.
def search(word, data):
    return any(word in ''.join(characters) for characters in data)

data = [['a','f','l','y'], ['h','e','r','e'], ['b','n','o','i']]

if search('fly', data):
    print('found')
data contains the matrix, characters is the name of each individual inner list. any will stop after it has found the first match (short circuit).
Disclaimer: sorry if I have not explicitly expressed my issue; the terminology is still new to me. Thank you in advance for reading.
Alright, I have a function named
def pluralize(word)
The aim is to pluralize all nouns within a file. The output I desire is: {'plural': word_in_plural, 'status' : x}
word_in_plural is the pluralized version of the input argument (word) and x is a string which can have one of the following values; 'empty_string', 'proper_noun', 'already_in_plural', 'success'.
My code so far looks like:
filepath = '/proper_noun.txt'

def pluralize(word):
    proper_nouns = [line.strip() for line in open(filepath)] ### reads in file as list when function is called
    dictionary = {'plural' : word_in_plural, 'status' : x} ### defined dictionary
    if word == '': ### if word is an empty string, return values word_in_plural = '' and x = 'empty_string'
        dictionary['plural'] = ''
        dictionary['status'] = 'empty_string'
        return dictionary
What you can see above is my attempt at writing a condition that returns the specified values if the word is an empty string.
The next goal is to create a condition so that if word is already in plural (assuming it ends with 's', 'es', 'ies', etc.), the function returns a dictionary with the values word_in_plural = word and x = 'already_in_plural'. So the input word remains untouched, e.g. (input: apartments, output: apartments).
if word ### is already in plural (ending with plural), function returns a dictionary with values; word_in_plural = word and x = 'already_in_plural'
Any ideas on how to read the last characters of the string to implement the rules? I also very much doubt the logic.
Thank you for your input, SOF community.
You can index the word by -1 to get its last character. You can slice a string to get the last two [-2:] or last three [-3:] characters.
last_char = word[-1]
last_three_char = word[-3:]
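For the plural check itself, here is a minimal sketch built on that slicing idea; the suffix list and the helper name ends_in_plural are only illustrative, not part of the original question:

# Illustrative sketch: the suffixes checked here are assumptions,
# not a complete set of English plural endings.
def ends_in_plural(word):
    for suffix in ('ies', 'es', 's'):
        # compare the last characters of the word against the suffix
        if len(word) >= len(suffix) and word[-len(suffix):] == suffix:
            return True
    return False

ends_in_plural('apartments')   # True
ends_in_plural('apartment')    # False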
I am trying to figure out the following function situation from my Python class. I've gotten the code to remove the three letters, but from exactly where they don't want me to, i.e. removing WGU from the first string where it's supposed to stay, but not from WGUJohn.
# Complete the function to remove the word WGU from the given string
# ONLY if it's not the first word and return the new string
def removeWGU(mystring):
    #if mystring[0]!= ('WGU'):
    #return mystring.strip('WGU')
    #if mystring([0]!= 'WGU')
    #return mystring.split('WGU')
    # Student code goes here

# expected output: WGU Rocks
print(removeWGU('WGU Rocks'))
# expected output: Hello, John
print(removeWGU('Hello, WGUJohn'))
Check this one:
def removeWGU(mystring):
    s = mystring.split()
    if s[0] == "WGU":
        return mystring
    else:
        return mystring.replace("WGU","")

print(removeWGU('WGU Rocks'))
print(removeWGU('Hello, WGUJohn'))
def removeWGU(mystring):
    return mystring[0] + mystring[1:].replace("WGU","")
Other responses I've seen wouldn't work on an edge case where there are multiple "WGU" occurrences in the text and one at the beginning, such as
print(removeWGU("WGU, something else, another WGU..."))
Hi, so I have two text files. I have to read the first text file, count the frequency of each word, remove duplicates, and create a list of lists with each word and its count in the file.
My second text file contains keywords. I need to count the frequency of these keywords in the first text file and return the result without using any imports, dict, or zips.
I am stuck on how to go about this second part. I have the file open and punctuation removed etc., but I have no clue how to find the frequency.
I played around with the idea of .find() but no luck as of yet.
Any suggestions would be appreciated. This is my code at the moment; it seems to find the frequency of the keyword in the keyword file, but not in the first text file.
def calculateFrequenciesTest(aString):
    listKeywords = aString
    listSize = len(listKeywords)
    keywordCountList = []
    while listSize > 0:
        targetWord = listKeywords[0]
        count = 0
        for i in range(0, listSize):
            if targetWord == listKeywords[i]:
                count = count + 1
        wordAndCount = []
        wordAndCount.append(targetWord)
        wordAndCount.append(count)
        keywordCountList.append(wordAndCount)
        for i in range(0, count):
            listKeywords.remove(targetWord)
        listSize = len(listKeywords)
    sortedFrequencyList = readKeywords(keywordCountList)
    return keywordCountList
EDIT - Currently toying around with the idea of reopening my first file again, but this time without turning it into a list? I think my errors are somehow coming from counting the frequency of my list of lists. These are the types of results I am getting:
[[['the', 66], 1], [['of', 32], 1], [['and', 27], 1], [['a', 23], 1], [['i', 23], 1]]
You can try something like:
I am taking a list of words as an example.
word_list = ['hello', 'world', 'test', 'hello']
frequency_list = {}
for word in word_list:
    if word not in frequency_list:
        frequency_list[word] = 1
    else:
        frequency_list[word] += 1
print(frequency_list)
RESULT: {'test': 1, 'world': 1, 'hello': 2}
Since you have put a constraint on dicts, I have made use of two lists to do the same task. I am not sure how efficient it is, but it serves the purpose.
word_list = ['hello', 'world', 'test', 'hello']
frequency_list = []
frequency_word = []
for word in word_list:
    if word not in frequency_word:
        frequency_word.append(word)
        frequency_list.append(1)
    else:
        ind = frequency_word.index(word)
        frequency_list[ind] += 1
print(frequency_word)
print(frequency_list)
RESULT : ['hello', 'world', 'test']
[2, 1, 1]
You can change it however you like or re-factor it as you wish.
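For instance, here is one possible refactor of the two-list idea into a helper function; the name count_words and its return shape are just a suggestion, not part of the original answer:

def count_words(word_list):
    # parallel lists: words[i] occurs counts[i] times
    words, counts = [], []
    for word in word_list:
        if word in words:
            counts[words.index(word)] += 1
        else:
            words.append(word)
            counts.append(1)
    return words, counts

words, counts = count_words(['hello', 'world', 'test', 'hello'])
# words  -> ['hello', 'world', 'test']
# counts -> [2, 1, 1]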
I agree with @bereal that you should use Counter for this. I see that you have said that you don't want "imports, dict, or zips", so feel free to disregard this answer. Yet, one of the major advantages of Python is its great standard library, and every time you have list available, you'll also have dict, collections.Counter and re.
From your code I'm getting the impression that you want to use the same style that you would have used with C or Java. I suggest trying to be a little more pythonic. Code written this way may look unfamiliar, and can take time getting used to. Yet, you'll learn way more.
Clarifying what you're trying to achieve would help. Are you learning Python? Are you solving this specific problem? Why can't you use any imports, dict, or zips?
So here's a proposal utilizing built-in functionality (no third party) for what it's worth (tested with Python 2):
#!/usr/bin/python
import re # String matching
import collections # collections.Counter basically solves your problem
def loadwords(s):
    """Find the words in a long string.
    Words are separated by whitespace. Typical signs are ignored.
    """
    return (s
            .replace(".", " ")
            .replace(",", " ")
            .replace("!", " ")
            .replace("?", " ")
            .lower()).split()

def loadwords_re(s):
    """Find the words in a long string.
    Words are separated by whitespace. Only characters and ' are allowed in strings.
    """
    return (re.sub(r"[^a-z']", " ", s.lower())
            .split())
# You may want to read this from a file instead
sourcefile_words = loadwords_re("""this is a sentence. This is another sentence.
Let's write many sentences here.
Here comes another sentence.
And another one.
In English, we use plenty of "a" and "the". A whole lot, actually.
""")
# Sets are really fast for answering the question: "is this element in the set?"
# You may want to read this from a file instead
keywords = set(loadwords_re("""
of and a i the
"""))
# Count for every word in sourcefile_words, ignoring your keywords
wordcount_all = collections.Counter(sourcefile_words)
# Lookup word counts like this (Counter is a dictionary)
count_this = wordcount_all["this"] # returns 2
count_a = wordcount_all["a"] # returns 3
# Only look for words in the keywords-set
wordcount_keywords = collections.Counter(word
                                         for word in sourcefile_words
                                         if word in keywords)
count_and = wordcount_keywords["and"] # Returns 2
all_counted_keywords = wordcount_keywords.keys() # Returns ['a', 'and', 'the', 'of']
Here is a solution with no imports. It uses nested linear searches which are acceptable with a small number of searches over a small input array, but will become unwieldy and slow with larger inputs.
Still, the input here is quite large, but it handles it in reasonable time. I suspect if your keywords file were larger (mine has only 3 words) the slowdown would start to show.
Here we take an input file, iterate over the lines, remove punctuation, then split by spaces and flatten all the words into a single list. The list has dupes, so to remove them we sort the list so the dupes come together, and then iterate over it creating a new list containing the string and a count. We can do this by incrementing the count as long as the same word appears in the list, and moving to a new entry when a new word is seen.
Now you have your word frequency list and you can search it for the required keyword and retrieve the count.
The input text file is here and the keyword file can be cobbled together with just a few words in a file, one per line.
This is Python 3 code; it indicates where applicable how to modify it for Python 2.
# use string.punctuation if you are somehow allowed
# to import the string module.
translator = str.maketrans('', '', '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
words = []
with open('hamlet.txt') as f:
    for line in f:
        if line:
            line = line.translate(translator)
            # py 2 alternative
            #line = line.translate(None, string.punctuation)
            words.extend(line.strip().split())
# sort the word list, so instances of the same word are
# contiguous in the list and can be counted together
words.sort()
thisword = ''
counts = []
# for each word in the list add to the count as long as the
# word does not change
for w in words:
    if w != thisword:
        counts.append([w, 1])
        thisword = w
    else:
        counts[-1][1] += 1
for c in counts:
    print('%s (%d)' % (c[0], c[1]))
# function to prevent need to break out of nested loop
def findword(clist, word):
    for c in clist:
        if c[0] == word:
            return c[1]
    return 0
# open keywords file and search for each word in the
# frequency list.
with open('keywords.txt') as f2:
    for line in f2:
        if line:
            word = line.strip()
            thiscount = findword(counts, word)
            print('keyword %s appears %d times in source' % (word, thiscount))
If you were so inclined you could modify findword to use a binary search, but it's still not going to be anywhere near a dict. collections.Counter is the right solution when there are no restrictions. It's quicker and less code.
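For illustration, here is a minimal sketch of that binary-search variant, assuming counts is the list of [word, count] pairs built above from the already-sorted words list (the name findword_binary is made up here):

def findword_binary(clist, word):
    # classic binary search over a list of [word, count] pairs
    # that is sorted by word
    lo, hi = 0, len(clist) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if clist[mid][0] == word:
            return clist[mid][1]
        elif clist[mid][0] < word:
            lo = mid + 1
        else:
            hi = mid - 1
    return 0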
I am trying to make my own lemmatizer for Spanish in Python 2.7 using a lemmatization dictionary.
I would like to replace all of the words in a certain text with their lemma form. This is the code that I have been working on so far.
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text
my_text = 'Flojo y cargantes. Decepcionantes. Decenté decentó'
my_text_lower= my_text.lower()
lemmatize_list = 'ExampleDictionary'
lemmatize_word_dict = {}
with open(lemmatize_list) as f:
    for line in f:
        depurated_line = line.rstrip()
        (val, key) = depurated_line.split("\t")
        lemmatize_word_dict[key] = val
txt = replace_all(my_text_lower, lemmatize_word_dict)
print txt
Here is an example dictionary file which contains the lemmatized forms used to replace the words in the input, my_text_lower. The example dictionary is a tab-separated 2-column file in which Col. 1 represents the values and Col. 2 represents the keys to match.
ExampleDictionary
flojo floja
flojo flojas
flojo flojos
cargamento cargamentos
cargante cargantes
decepción decepciones
decepcionante decepcionantes
decentar decenté
decentar decentéis
decentar decentemos
decentar decentó
My desired output is as follows:
flojo y cargante. decepcionante. decentar decentar
Using these inputs (and the example phrase, as listed in my_text within the code), my actual output currently is:
felitrojo y cargramarramarrartserargramarramarrunirdo. decepáginacionarrtícolitroargramarramarrunirdo. decentar decentar
Currently, I can't seem to understand what is going wrong with the code.
It seems that it is replacing letters or chunks of each word, instead of recognizing the word, finding it in the lemma dictionary and then replacing that instead.
For instance, this is the result that I am getting when I use the entire dictionary (more than 50,000 entries). This problem does not happen with my small example dictionary, only when I use the complete dictionary, which makes me think that perhaps it is double "replacing" at some point?
Is there a pythonic technique that I am missing and can incorporate into this code to make my search and replace function more precise, to identify the full words for replacement rather than chunks and/or NOT make any double replacements?
Because you use text.replace there's a chance that you'll still be matching a sub-string, and the text will get processed again. It's better to process one input word at a time and build the output string word by word.
I've switched your keys and values the other way around (because you want to look up the right-hand column and find the word on the left), and I mainly changed replace_all:
import re
def replace_all(text, dic):
    result = ""
    input = re.findall(r"[\w']+|[.,!?;]", text)
    for word in input:
        changed = dic.get(word, word)
        result = result + " " + changed
    return result
my_text = 'Flojo y cargantes. Decepcionantes. Decenté decentó'
my_text_lower= my_text.lower()
lemmatize_list = 'ExampleDictionary'
lemmatize_word_dict = {}
with open(lemmatize_list) as f:
    for line in f:
        kv = line.split()
        lemmatize_word_dict[kv[1]] = kv[0]
txt = replace_all(my_text_lower, lemmatize_word_dict)
print txt
I see two problems with your code:
it will also replace words if they appear as part of a bigger word
by replacing words one after the other, you could replace (parts of) words that have already been replaced
Instead of that loop, I suggest using re.sub with word boundaries \b to make sure that you replace complete words only. This way, you can also pass a callable as a replacement function.
import re
def replace_all(text, dic):
    return re.sub(r"\b\w+\b", lambda m: dic.get(m.group(), m.group()), text)
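A quick usage sketch, with a couple of entries taken from the ExampleDictionary in the question (just the accent-free ones, to keep this Python 2 example simple):

# lemmas is a small illustrative slice of the question's dictionary
lemmas = {'cargantes': 'cargante', 'decepcionantes': 'decepcionante'}
print replace_all('flojo y cargantes. decepcionantes.', lemmas)
# -> flojo y cargante. decepcionante.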
I hope this request is legit.
I'm taking a programming course in Python for engineers, so I'm kinda new at this business.
Anyway, in my homework I was asked to write a function which receives two strings and checks if one is a permutation/anagram of the other (which means they both have exactly the same letters with the same number of appearances of each letter).
I've found some great code here while searching, but I still don't get what's wrong with my code (and it's important for me to know for my studying process).
We got a tests file which is supposed to check our functions, and it gave me this error:
Traceback (most recent call last):
  File "C:\Users\Or\Desktop\תכנות\4\hw4\123456789_a4.py", line 110, in <module>
    test_hw4()
  File "C:\Users\Or\Desktop\תכנות\4\hw4\123456789_a4.py", line 97, in test_hw4
    test(is_anagram('Tom Marvolo Riddle','I Am Lord Voldemort'), True)
  File "C:\Users\Or\Desktop\תכנות\4\hw4\123456789_a4.py", line 31, in is_anagram
    s2_list.sort()
NameError: global name 's2_list' is not defined
This is my code:
def is_anagram(string1, string2):
    string1 = string1.lower() #turns Capital letter to small ones
    string2 = string2.lower()
    string1 = string1.replace(" ","") #turns the words inside the string to one word
    string2 = string2.replace(" ","")
    if len(string1)!= len(string2):
        return False
    s1_list = [string1[i] for i in range(len(string1))] #creates a list of string 1 letters
    a2_list = [string1[k] for k in range(len(string1))]
    s1_list.sort() #sorting the list
    s2_list.sort()
    booli=False
    k=0
    for i in s1_list: #for loop which compares each letter in the two lists
        if s1_list[k]==s2_list[k]:
            booli = True
            k=k+1
        else:
            booli=False
            break
    return booli
Does anyone know how to fix it?
Thanks!
It looks like you have a typo with a2_list. That section should read:
s1_list = [string1[i] for i in range(len(string1))] #creates a list of string 1 letters
s2_list = [string2[k] for k in range(len(string2))]
s1_list.sort() #sorting the list
s2_list.sort()
FWIW, here is an interactive prompt example of how to tell if two strings are anagrams of one another:
>>> string1 = 'Logarithm'
>>> string2 = 'algorithm'
>>> sorted(string1.lower()) == sorted(string2.lower()) # see if they are anagrams
True
If you make a listify function and use that to set your s1_list and s2_list, it might be easier to see that there are multiple things that look to be wrong with your code, unless you intended both s1_list and s2_list to be populated from the same string.
def listify(string):
    return [c for c in string]
Then you can simply do s1_list = listify(string1) and s2_list = ... to set the values.
I would probably turn at least the 'check if the two lists are the same' into a function, so I could use an early return to indicate falseness (so instead of starting with booli as true, setting it on each iteration through the loop and breaking out of the loop if false).
If you look at the join method of Python strings, you might find inspiration for another way to check if s1_list and s2_list are the same.
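For example, here is one way to use that join hint; it is only a sketch, and the helper name same_letters is made up:

def same_letters(s1_list, s2_list):
    # join the sorted character lists back into strings and compare them
    return ''.join(s1_list) == ''.join(s2_list)

same_letters(sorted('listen'), sorted('silent'))  # True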
Try this one-liner instead:
sorted(s1.lower().replace(' ', '')) == sorted(s2.lower().replace(' ', ''))
Python strings are sequences, so sorted() accepts them directly and returns a sorted list of their characters. We just need to take care of uppercase and whitespace first. The Python equality operator then takes care of the actual comparison.