Compare two strings, including duplicate letters? - python

I'm trying to write a function that takes two user inputs: a word and a maximum length. The function reads from a text file (opened earlier in the program), looks at all the words that fit within the maximum length given, and returns a list of words from the file that contain all of the letters in the word that the user gave. Here's my code so far:
def comparison():
    otherWord = input("Enter word: ")
    otherWord = list(otherWord)
    maxLength = input("What is the maximum length of the words you want: ")
    listOfWords = []
    for line in file:
        line = line.rstrip()
        letterCount = 0
        if len(line) <= int(maxLength):
            for letter in otherWord:
                if letter in line:
                    letterCount += 1
            if letterCount == len(otherWord):
                listOfWords.append(line)
    return listOfWords
This code works, but my problem is that it does not account for duplicate letters in the words read from the file. For example, if I enter "GREEN" as otherWord, then the function returns a list of words containing the letters G, R, E, and N. I would like it to return a list containing words that have 2 E's. I imagine I'll also have to do some tweaking with the letterCount part, as the duplicates would affect that, but I'm more concerned with recognizing duplicates for now. Any help would be much appreciated.

You could use a Counter for the otherWord, like this:
>>> from collections import Counter
>>> otherWord = 'GREEN'
>>> otherWord = Counter(otherWord)
>>> otherWord
Counter({'E': 2, 'R': 1, 'N': 1, 'G': 1})
And then your check could look like this:
if len(line) <= int(maxLength):
    match = True
    # otherWord is the Counter built above
    for l, c in otherWord.items():
        if line.count(l) < c:
            match = False
            break
    if match:
        listOfWords.append(line)
You can also write this without a match variable using Python’s for..else construct:
if len(line) <= int(maxLength):
    for l, c in otherWord.items():
        if line.count(l) < c:
            break
    else:
        # runs only if the loop finished without hitting break
        listOfWords.append(line)
Edit: If you want to have an exact match on character count, check for equality instead, and further check if there are any extra characters (which is the case if the line length is different).
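As a rough sketch of the exact-match variant mentioned in the edit (the helper name is_exact_match is made up for illustration), comparing two Counter objects for equality covers both conditions at once, because equal counters imply the same letter counts and therefore the same length:

```python
from collections import Counter

def is_exact_match(word, line):
    # Hypothetical helper: True when `line` uses exactly the letters of
    # `word`, each the same number of times, with no extra characters.
    return Counter(word) == Counter(line)

print(is_exact_match("GREEN", "GENRE"))   # True
print(is_exact_match("GREEN", "GREENS"))  # False
```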

You can use collections.Counter that also lets you perform (multi)set operations:
In [1]: from collections import Counter
In [2]: c = Counter('GREEN')
In [3]: l = Counter('GGGRREEEENN')
In [4]: c & l # find intersection
Out[4]: Counter({'E': 2, 'R': 1, 'G': 1, 'N': 1})
In [5]: c & l == c # are all letters in "GREEN" present in "GGGRREEEENN"?
Out[5]: True
In [6]: c == l # Or if you want, test for equality
Out[6]: False
So your function could become something like:
def word_compare(inputword, wordlist, maxlength):
    c = Counter(inputword)
    # keep words that are short enough and contain every letter of inputword
    return [word for word in wordlist if len(word) <= maxlength
            and c & Counter(word) == c]
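For a quick check of the intersection approach, here is a self-contained sketch. The function name words_containing and the sample list are assumptions for illustration; the list stands in for the lines read from the text file.

```python
from collections import Counter

def words_containing(inputword, wordlist, maxlength):
    # Keep words no longer than maxlength that contain every letter of
    # inputword at least as many times as inputword does.
    needed = Counter(inputword)
    return [word for word in wordlist
            if len(word) <= maxlength and needed & Counter(word) == needed]

# Made-up sample data standing in for the lines of the text file.
sample = ["GREEN", "GENRE", "GREENER", "GRN", "RENEGADE"]
print(words_containing("GREEN", sample, 7))
# ['GREEN', 'GENRE', 'GREENER']
```

"GRN" is dropped because it lacks the two E's, and "RENEGADE" is dropped only because it is longer than the maximum length of 7.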

Related

Can't get my head around the problem, list index out of range (inside 3 loops)

I know list index out of range has been covered a million times before, and I know the issue is probably that I am trying to reference an index position that does not exist, but as there are 3 nested for loops I just can't figure out what is going on.
I am trying to calculate the frequency of each letter of the alphabet in a list of words.
import string
alphabet_string = string.ascii_uppercase
g = list(alphabet_string)
a_count = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
y = 0
for word in words:
    for chars in word:
        for letters in chars:
            if letters == g[y]:
                a_count[y] = a_count[y] +1
            y = y + 1
print(a_count[0])
word is in the format of: ['ZYMIC']
chars is in the format of: ZYMIC
letters is in the format of: C
If I substitute the y value for a value between 0 and 25 then it returns as expected.
I have a feeling the issue is as stated above that I am exceeding the index number of 25, so I guess y = y + 1 is in the wrong position. I have however tried it in different positions.
Any help would be appreciated.
Thanks!
Edit: Thanks everyone so much, never had this many responses before, all very helpful!
Storing a_count as a dictionary is the better option for this problem.
a_count = {}
for word in words:
    for chars in word:
        for letters in chars:
            a_count[letters] = a_count.get(letters, 0) + 1
You can also use the Counter() class from the collections library.
from collections import Counter
a_count = Counter()
for word in words:
    for chars in word:
        for letters in chars:
            a_count[letters] += 1
print(a_count.most_common())
Solution via Counter -
from collections import Counter
words = ['TEST','ZYMIC']
print(Counter(''.join(words)))
If you want to stick to your code, then change the if condition.
When y = 0, g[y] is 'A', so you're checking whether 'A' == 'Z' (the first letter of the word). Instead, you need to fetch the index of the letter in the list g and increase the count at that index by 1. That should make it work, if I understood your problem correctly.
import string
words = ['ZYMIC']
alphabet_string = string.ascii_uppercase
g = list(alphabet_string)
a_count = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
for word in words:
    for chars in word:
        for letters in chars:
            if letters in g:
                y = g.index(letters)
                a_count[y] += 1
print(a_count)
And you can very well replace the if condition, and check for the index directly because the letter will always be there in g. Therefore, this particular condition is redundant here.
for word in words:
    for chars in word:
        for letters in chars:
            y = g.index(letters)
            a_count[y] += 1
I think it's because of list a_count.
I would suggest another approach here, based on dictionaries:
listeletters = ['foihroh','ZYMIC','ajnaisui', 'fjindsonosn']
alphabeth = {'a' : 0,
             'b' : 0,
             'c': 0}
for string in listeletters:
    for l in string:
        if l in alphabeth:
            alphabeth[l] = alphabeth[l] + 1
print(alphabeth)
I initialize the alphabeth dictionary first (only three letters are shown here) and then I get the wanted result.
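To avoid typing out all 26 keys by hand, the dictionary could be initialised from string.ascii_uppercase. This is only a sketch: the function name letter_frequencies is an assumption, and it upper-cases each word so that mixed-case input still lands on the uppercase keys.

```python
import string

def letter_frequencies(words):
    # One zero-initialised entry per uppercase letter A-Z.
    counts = dict.fromkeys(string.ascii_uppercase, 0)
    for word in words:
        for letter in word.upper():
            if letter in counts:   # skip anything that is not A-Z
                counts[letter] += 1
    return counts

freq = letter_frequencies(['foihroh', 'ZYMIC', 'ajnaisui'])
print(freq['I'])  # 4
```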

Counting number of occurrences in a string

I need to return a dictionary that counts the number of times each letter in a predetermined list occurs. The problem is that I need to count both Upper and Lower case letters as the same, so I can't use .lower or .upper.
So, for example, "This is a Python string" should return {'t':3} if "t" is the letter being searched for.
Here is what I have so far...
def countLetters(fullText, letters):
    countDict = {i:0 for i in letters}
    lowerString = fullText.lower()
    for i in lowerString:
        if i in letters:
            countDict[i] += 1
    return countDict
Where 'letters' is the condition and fullText is the string I am searching.
The obvious issue here is that if the test is "T" rather than "t", my code won't return anything. Sorry for any errors in my terminology, I am pretty new to this. Any help would be much appreciated!
To ignore capitalization, lowercase the input first:
text = text.lower()
The code below then counts every character of the input text using string operations.
It can also be used as a word counter if you split on the space character.
text = "Batuq batuq BatuQ"
text = text.lower()  # ignore capitalization
text = text.replace('-', ' ')  # replace characters such as - with a space
text = text.replace('.', '')
text = text.replace(',', '')
text = text.replace("`", '')
text = text.replace("'", '')
# text = text.split(' ')  # uncomment to count words instead of characters
dictionary = dict()
for word in text:
    dictionary[word] = text.count(word)
print(dictionary)
# Print the 5 most frequent characters
for k in sorted(dictionary, key=dictionary.get, reverse=True)[:5]:
    print(k, dictionary[k])
Would something like this work that handles both case sensitive letter counts and non case sensitive counts?
from typing import List
def count_letters(
    input_str: str,
    letters: List[str],
    count_case_sensitive: bool=True
):
    """
    count_letters consumes a list of letters and an input string
    and returns a dictionary of counts by letter.
    """
    if count_case_sensitive is False:
        input_str = input_str.lower()
        letters = list(set(map(lambda x: x.lower(), letters)))
    # dict comprehension - build your dict in one line
    # Tutorial on dict comprehensions: https://www.datacamp.com/community/tutorials/python-dictionary-comprehension
    counts = {letter: input_str.count(letter) for letter in letters}
    return counts
# define function inputs
letters = ['t', 'a', 'z', 'T']
string = 'this is an example with sTrings and zebras and Zoos'
# case sensitive
count_letters(
    string,
    letters,
    count_case_sensitive=True
)
# {'t': 2, 'a': 5, 'z': 1, 'T': 1}
# not case sensitive
count_letters(
    string,
    letters,
    count_case_sensitive=False
)
# {'a': 5, 'z': 2, 't': 3} # notice input T is now just t in dictionary of counts
Try it like this:
def count_letters(fullText, letters):
    countDict = {i: 0 for i in letters}
    lowerString = fullText.lower()
    for i in lowerString:
        if i in letters:
            countDict[i] += 1
    return countDict
test = "This is a Python string."
print(count_letters(test, 't')) #Output: {'t': 3}
You're looping over the wrong string. You need to loop over lowerString, not fullText, so you ignore the case when counting.
It's also more efficient to do if i in countDict than if i in letters.
def countLetters(fullText, letters):
    countDict = {i.lower():0 for i in letters}
    lowerString = fullText.lower()
    for i in lowerString:
        if i in countDict:
            countDict[i] += 1
    return countDict
What you can do is simply duplicate the dict with both upper and lowercase like so:
def countLetters(fullText, letters):
    countDict = {}
    for i in letters:
        countDict[i.upper()]=0
        countDict[i.lower()]=0
    lowerString = fullText.lower()
    letters = letters.lower()
    for i in lowerString:
        if i in letters:
            countDict[i] += 1
            if (i!=i.upper()):
                countDict[i.upper()] +=1
    return countDict
print(countLetters("This is a Python string", "TxZY"))
You could also loop over the original string and change countDict[i] += 1 to countDict[i.lower()] += 1.
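A minimal sketch of that last variant, assuming the keys should be the lowercase forms of the requested letters (the function name count_letters_any_case is made up for illustration):

```python
def count_letters_any_case(full_text, letters):
    # Keys are the lowercase forms of the requested letters.
    count_dict = {ch.lower(): 0 for ch in letters}
    for ch in full_text:       # iterate over the original string
        key = ch.lower()       # fold case only for the lookup
        if key in count_dict:
            count_dict[key] += 1
    return count_dict

print(count_letters_any_case("This is a Python string", "T"))  # {'t': 3}
```

Asking for "T" or "t" gives the same result, since both the keys and the lookups are lowercased.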
Use the Counter from the collections module
from collections import Counter
input = "Batuq batuq BatuQ"
bow=input.split(' ')
results=Counter(bow)
print(results)
output:
Counter({'Batuq': 1, 'batuq': 1, 'BatuQ': 1})

To look for words in a dictionary based on random input letters, will this code perform efficiently?

I'm new to coding. I tried to build a simple program that takes a subset of alphabet letters and returns valid words from a text-based dictionary.
In the code below, I ask the user to input a number of characters (e.g. abcdef), then the program makes words out of these letters.
Now my question: is this the best way to do it in terms of performance, code length and the order of the blocks? If not, can you suggest a better way?
#Read the dictionary
fh = open('C:\\english-dict2.txt')
dict = []
while True:
    line = fh.readline()
    dict.append(line.strip())
    if not line:
        break
fh.close()
#Input letters
letters = input("Please enter your letters: ")
letters_list=[]
for l in letters:
    letters_list.append(l)
mini = 2 #default value
maks = len(letters_list)
mini = input("Minimum length of the word (default is 2): ")
if mini == "":
    mini = 2 #default value
mini = int(mini)
#Here I create a new dictionary based on the number of letters input or less than.
newdic=[]
for words1 in dict:
    if len(words1) <= maks and len(words1)>= mini:
        newdic.append(words1)
for words2 in newdic:
    ok = 1
    for i in words2:
        if i in letters_list:
            ok = ok * 1
        else:
            ok = ok * 0
    if ok == 1:
        print(words2)
Lists are inefficient for lookups. You should use a dict of sets instead to index every word with each letter in the word, so that you can simply use set intersection to find the words that contain all of the given letters:
from functools import reduce
d = {}
with open('C:\\english-dict2.txt') as f:
    for l in f:
        w = l.strip()
        for c in set(w):
            d.setdefault(c, set()).add(w)
letters = input("Please enter your letters: ")
print(reduce(lambda a, b: a & d[b], letters[1:], d[letters[0]]))
For example, given a dictionary of the following words:
apple
book
cat
dog
elephant
The index dictionary d would become:
{'p': {'elephant', 'apple'}, 'a': {'cat', 'elephant', 'apple'}, 'l': {'elephant', 'apple'}, 'e': {'elephant', 'apple'}, 'k': {'book'}, 'b': {'book'}, 'o': {'book', 'dog'}, 'c': {'cat'}, 't': {'cat', 'elephant'}, 'd': {'dog'}, 'g': {'dog'}, 'h': {'elephant'}, 'n': {'elephant'}}
Here's a sample input/output of the above code, where both the words apple and elephant are found to contain both of the letters a and e:
Please enter your letters: ae
{'apple', 'elephant'}
From here you can easily filter the resulting set based on a given minimum number of letters if you want.
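Such a filter could look like the sketch below. The function name filter_by_min_length and its default of 2 are assumptions (the default mirrors the question), and the set literal is simply the result from the sample run.

```python
def filter_by_min_length(words, minimum=2):
    # Keep only the words with at least `minimum` characters.
    return {w for w in words if len(w) >= minimum}

matches = {'apple', 'elephant'}          # result from the sample run
print(filter_by_min_length(matches, 6))  # {'elephant'}
```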
Modification 1: You do not need to loop over the characters in letters; just
letters_list=list(letters)
is enough to make a list of letters.
Modification 2: You can make sure any value of mini is handled using:
try:
    mini = int(mini)
except ValueError:
    mini = 2
For your dictionary, you don't need to iterate through using readline(), just do:
with open(path) as fh:
    dict = fh.read().splitlines()  # one word per line, newlines stripped
This will also safely close your file, even if there's an error. If you want to do lookups for words, I'd use a set rather than a list, as the lookups in sets are O(1), whereas lookups in list are not, they are O(n).
d_set = set(dict)
This way if you want to create all combinations of letters, you can look them up like so:
import itertools
letters = input("Input your letters, please ")
def check_for_match(combos):
    for combo in combos:
        word = ''.join(combo)  # permutations() yields tuples of characters
        if word in d_set:
            yield word
i = len(letters)
my_list = []
while i:
    combos = itertools.permutations(letters, i)
    results = list(check_for_match(combos))
    my_list = [*my_list, *results]
    i-=1
This will generate all of the permutations of the letters, check whether each one is in your dictionary, and add it to my_list if it is. I think that's what you are looking for.
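Note that the number of permutations grows factorially with the number of letters. An alternative that is not part of the answer above is to scan the dictionary once and keep every word that can be built from the available letters, using Counter subtraction. The sketch below assumes each input letter may be used at most once, as with permutations; the function name and sample data are made up.

```python
from collections import Counter

def buildable_words(letters, dictionary, minimum=2):
    available = Counter(letters)
    # Counter subtraction drops non-positive counts, so an empty result
    # means the word needs no letter more often than it is available.
    return [w for w in dictionary
            if len(w) >= minimum and not (Counter(w) - available)]

sample_dict = ["cab", "bead", "fee", "face", "a"]  # made-up word list
print(buildable_words("abcdef", sample_dict))  # ['cab', 'bead', 'face']
```

Here "fee" is rejected because only one e is available, and "a" is rejected by the minimum length.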

Counting e's in a string, using for loop [duplicate]

I am trying to write a program to count the occurrences of a specific letter in a string without the count function. I made the string into a list and set up a loop to count, but the count never changes and I can't figure out why. This is what I have right now:
letter = 'a'
myString = 'aardvark'
myList = []
for i in myString:
    myList.append(i)
count = 1
for i in myList:
    if i == letter:
        count == count + 1
    else:
        continue
print (count)
Any help is greatly appreciated.
Be careful: you are using count == count + 1, but you must use count = count + 1.
The operator that assigns a new value is =; the operator == compares two values.
Instead of
count == count + 1
you need to have
count = count + 1
Although someone else has solved your problem, the simplest solution to do what you want to do is to use the Counter data type:
>>> from collections import Counter
>>> letter = 'a'
>>> myString = 'aardvark'
>>> counts = Counter(myString)
>>> print(counts)
Counter({'a': 3, 'r': 2, 'v': 1, 'k': 1, 'd': 1})
>>> count = counts[letter]
>>> print(count)
3
Or, more succinctly (if you don't want to check multiple letters):
>>> from collections import Counter
>>> letter = 'a'
>>> myString = 'aardvark'
>>> count = Counter(myString)[letter]
>>> print(count)
3
The simplest way to do your implementation would be:
count = sum(i == letter for i in myString)
or:
count = sum(1 for i in myString if i == letter)
This works because strings can be iterated just like lists, and False is counted as a 0 and True is counted as a 1 for arithmetic.
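To see why summing the comparisons works, it can help to look at the intermediate booleans. This is just an illustration; the names flags and my_string are not from the answer.

```python
letter = 'a'
my_string = 'aardvark'

# Each comparison yields True or False; sum() treats them as 1 and 0.
flags = [ch == letter for ch in my_string]
print(flags)       # [True, True, False, False, False, True, False, False]
print(sum(flags))  # 3
```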
Use the filter function like this (in Python 3, filter returns an iterator, so wrap it in list before taking the length):
len(list(filter(lambda x: x==letter, myString)))
Your count is never changing because you are using == which is equality testing, where you should be using = to reassign count.
Even better, you can increment with
count += 1
Also note that else: continue doesn't really do anything as you will continue with the next iteration of the loop anyways. If I were to have to come up with an alternative way to count without using the count function, I would lean towards regex:
import re
stringy = "aardvark"
print(len(re.findall("a", stringy)))
Apart from the above methods, another easy way to solve the problem is to use a Python dictionary:
word="purple"
dic={}
for letter in word:
    if letter in dic:
        dic[letter]+=1
    else:
        dic[letter]=1
print(dic)
{'p': 2, 'u': 1, 'r': 1, 'l': 1, 'e': 1}
If you want to count the occurrences of a particular character in the word, look it up in the dictionary:
dic['p']
2
Your code logic is right except for count == count + 1: == compares two values, = assigns, and += increments.
You probably know this and just mixed them up.
Also, you have to initialize count = 0:
letter = 'a'
myString = 'aardvark'
myList = []
for i in myString:
    myList.append(i)
count = 0
for i in myList:
    if i == letter:
        count +=1
    else:
        continue
print(count)

Comparing strings to a dictionary in groups of multiples of 3

I am writing a program which reads in a number of DNA characters (which is always divisible by 3) and checks if they correspond to the same amino acid. For example, AAT and AAC both correspond to N, so my program should print "It's the same". It does this fine, but I just don't know how to compare 6/9/12/any multiple of 3 characters and see if the definitions are the same. For example:
AAAAACAAG
AAAAACAAA
Should return me It's the same as they are both KNK.
This is my code:
sequence = {}
d = 0
for line in open('codon_amino.txt'):
    pattern, character = line.split()
    sequence[pattern] = character
a = input('Enter original DNA: ')
b = input('Enter patient DNA: ')
for i in range(len(a)):
    if sequence[a] == sequence[b]:
        d = d + 0
    else:
        d = d + 1
if d == 0:
    print('It\'s the same')
else:
    print('Mutated!')
And my codon_amino.txt is structured like:
AAA K
AAC N
AAG K
AAT N
ACA T
ACC T
ACG T
ACT T
How do I compare the DNA structures in patterns of 3? I have it working for strings which are 3 letters long, but it returns an error for anything more.
EDIT:
If I knew how to split a and b into a list in intervals of three, that might help, so like:
a2 = a.split(SPLITINTOINTERVALSOFTHREE)
then I could easily use a for loop to iterate through them, but how do I split them in the first place?
EDIT: THE SOLUTION:
sequence = {}
d = 0
for line in open('codon_amino.txt'):
    pattern, character = line.split()
    sequence[pattern] = character
a = input('Enter original DNA: ')
b = input('Enter patient DNA: ')
for i in range(len(a)):
    if all(sequence[a[i:i+3]] == sequence[b[i:i+3]] for i in range(0, len(a), 3)):
        d = d + 1
    else:
        d = d + 0
if d == 0:
    print('The patient\'s amino acid sequence is mutated.')
else:
    print('The patient\'s amino acid sequence is not mutated.')
I think you can replace your second loop and comparisons with:
if all(sequence[a[i:i+3]] == sequence[b[i:i+3]] for i in range(0, len(a), 3)):
    print('It\'s the same')
else:
    print('Mutated!')
The all function iterates over the generator expression, and will be False if any of the values is False. The generator expression compares length-three slices of the strings.
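To make the slicing visible, here is a small sketch that uses a four-entry excerpt of the codon table from the question in place of the file, with the two example strings hard-coded instead of read via input:

```python
# Tiny excerpt of the codon table from the question.
sequence = {'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAT': 'N'}

a = 'AAAAACAAG'
b = 'AAAAACAAA'

# The length-three slices the generator expression walks over.
print([a[i:i+3] for i in range(0, len(a), 3)])  # ['AAA', 'AAC', 'AAG']

# One boolean per codon position; all() is True only if every one is True.
same = all(sequence[a[i:i+3]] == sequence[b[i:i+3]]
           for i in range(0, len(a), 3))
print(same)  # True
```

The last codons differ (AAG versus AAA), but both map to K, so the comparison still succeeds.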
I think what you should do is:
1. Write a function to split a string into chunks of 3 characters.
(Some hints here)
2. Write a function to convert a string into its corresponding amino acid sequence (using the previous function).
3. Compare the sequences.
If this is what you mean:
def DNA(string):
    return [string[i:i+3] for i in range(0,len(string),3)]
amino_1 = DNA("AAAAACAAG")
amino_2 = DNA("AAAAACAAA")
print(amino_1, amino_2)
print(amino_1 == amino_2)
Output: ['AAA', 'AAC', 'AAG'] ['AAA', 'AAC', 'AAA']
False
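Note that the snippet above compares the raw codons, which is why it prints False. The split, translate and compare steps could be sketched as follows. The function names split_codons and translate are hypothetical, and the table is a small excerpt of the codon file from the question rather than the full mapping.

```python
def split_codons(dna):
    # Step 1: cut the string into chunks of three characters.
    return [dna[i:i+3] for i in range(0, len(dna), 3)]

def translate(dna, table):
    # Step 2: map each codon to its amino acid letter.
    return ''.join(table[codon] for codon in split_codons(dna))

# Excerpt of the codon table shown in the question.
table = {'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAT': 'N'}

# Step 3: compare the translated sequences, not the raw codons.
print(translate('AAAAACAAG', table))  # KNK
print(translate('AAAAACAAG', table) == translate('AAAAACAAA', table))  # True
```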
