Detecting if a string is a pangram in Python - python

How does this work? It checks whether a string contains each character from a to z at least once.
import string

def ispangram(str1, alphabet=string.ascii_lowercase):
    alphaset = set(alphabet)
    return alphaset <= set(str1.lower())
This returns True for example:
ispangram("The quick brown fox jumps over the lazy dog")
I can only assume it has something to do with lexicographical ordering, as stated here, but I am still a bit confused.
Comparing two lists using the greater than or less than operator
When I read the link in this SO question:
https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types
It says:
Sequence objects may be compared to other objects with the same
sequence type. The comparison uses lexicographical ordering: first the
first two items are compared, and if they differ this determines the
outcome of the comparison; if they are equal, the next two items are
compared, and so on, until either sequence is exhausted. If two items
to be compared are themselves sequences of the same type, the
lexicographical comparison is carried out recursively. If all items of
two sequences compare equal, the sequences are considered equal. If
one sequence is an initial sub-sequence of the other, the shorter
sequence is the smaller (lesser) one. Lexicographical ordering for
strings uses the Unicode code point number to order individual
characters. Some examples of comparisons between sequences of the same
type.
But this isn't clear to me.

It is a set operation, not a list comparison. It is equivalent to:
alphaset.issubset(set(str1.lower()))
s <= t
s.issubset(t)
Test whether every element in s is in t.
See here:
https://docs.python.org/2/library/sets.html
Edit: See here for the current version of the set documentation, though the old version explains the comparisons more simply.
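To make the difference from list comparison concrete, here is a minimal sketch. The variable names are only illustrative; the point is that `<=` between two sets tests for a subset, while `<=` between two lists is the lexicographic comparison quoted in the question.

```python
import string

letters = set(string.ascii_lowercase)
sentence = "The quick brown fox jumps over the lazy dog"

# On sets, <= means "is a subset of"; element order plays no role.
print(letters <= set(sentence.lower()))       # True
# issubset accepts any iterable, so the explicit set() is optional here.
print(letters.issubset(sentence.lower()))     # True

# A sentence with missing letters is not a pangram.
print(letters <= set("the quick brown fox"))  # False

# On lists, <= is lexicographic, which answers a different question.
print([1, 2] <= [1, 3])                       # True: decided by 2 <= 3
print({1, 3} <= {1, 2, 3})                    # True: every element is present
```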

No. It's comparing two sets. It converts the input string to lower case and then uses Python's set type to compare it with the set of lowercase letters.
This is a very useful (and fast) technique for comparing two lists to see which members they have in common and which differ.

def pangram(s):
    alphabet = set('abcdefghijklmnopqrstuvwxyz')
    s = s.replace(' ', '').lower()
    s = sorted(s)
    count = {}
    # alphabet could be one string within '' later sorted, but I just went straight to the point.
    # After initializing the dictionary as empty, we start the count.
    for letter in s:
        if letter in count:
            count[letter] += 1
        else:
            count[letter] = 1
    # Any alphabet letter that never appeared gets a count of 0.
    for letter in alphabet:
        if letter not in count:
            count[letter] = 0
    # Report each missing letter; the string is a pangram if none is missing.
    is_pangram = True
    for letter in sorted(alphabet):
        if count[letter] == 0:
            print(letter + ' missing!')
            is_pangram = False
    print(is_pangram)

Related

Finding common string in list and displaying them

I am trying to create a function compare(lst1, lst2) which compares the elements of two lists position by position, returns every common element in a new list, and shows what percentage of the elements are common. All the elements in the lists are going to be strings. For example, the function should return:
lst1 = AAAAABBBBBCCCCCDDDD
lst2 = ABCABCABCABCABCABCA
common strand = AxxAxxxBxxxCxxCxxxx
similarity = 25%
The parts of the list which are not similar will simply be returned as x.
I am having trouble in completing this function without the python set and zip method. I am not allowed to use them for this task and I have to achieve this using while and for loops. Kindly guide me as to how I can achieve this.
This is what I came up with.
lst1 = 'AAAAABBBBBCCCCCDDDD'
lst2 = 'ABCABCABCABCABCABCA'
common_strand = ''
score = 0
for i in range(len(lst1)):
    if lst1[i] == lst2[i]:
        common_strand = common_strand + str(lst1[i])
        score += 1
    else:
        common_strand = common_strand + 'x'
print('Common Strand: ', common_strand)
print('Similarity Score: ', score/len(lst1))
Output:
Common Strand: AxxAxxxBxxxCxxCxxxx
Similarity Score: 0.2631578947368421
You have two strings A and B. Strings are ordered sequences of characters.
Suppose both A and B have equal length (the same number of characters). Choose some position i < len(A), len(B) (remember Python sequences are 0-indexed). Your problem statement requires:
1. If character i in A is identical to character i in B, yield that character.
2. Otherwise, yield some placeholder to denote the mismatch.
How do you find the ith character in some string A? Take a look at Python's string methods. Remember: strings are sequences of characters, so Python strings also implement several sequence-specific operations.
If len(A) != len(B), you need to decide what to do when position i exists in one string but lies past the end of the other. You might represent these positions with the same placeholder as in (2).
If you know how to iterate the result of zip, you know how to use for loops. All you need is a way to iterate over the sequence of indices. Check out the language built-in functions.
Finally, for your measure of similarity: if you've compared n characters and found that N <= n are mismatched, you can define 1 - (N / n) as your measure of similarity. This works well for equally-long strings (for two strings with different lengths, you're always going to be calculating the proportion relative to the longer string).
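The steps above can be sketched with a plain index loop, without set or zip. This is only one possible reading of the task: the function name compare comes from the question, while the placeholder parameter and the choice to count positions past the end of the shorter string as mismatches are assumptions.

```python
def compare(a, b, placeholder="x"):
    """Return (common_strand, similarity) using only an index-based loop."""
    longer = max(len(a), len(b))
    shorter = min(len(a), len(b))
    common = ""
    mismatches = 0
    for i in range(longer):
        if i < shorter and a[i] == b[i]:
            common += a[i]
        else:
            # Either the characters differ, or position i is past the end
            # of the shorter string (assumed to count as a mismatch).
            common += placeholder
            mismatches += 1
    # Similarity is 1 - (N / n); two empty strings are treated as identical.
    similarity = 1 - mismatches / longer if longer else 1.0
    return common, similarity


strand, score = compare("AAAAABBBBBCCCCCDDDD", "ABCABCABCABCABCABCA")
print(strand)  # AxxAxxxBxxxCxxCxxxx
print(score)
```

The score is relative to the longer string, so compare("AB", "A") gives a similarity of 0.5.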

Longest Subsequence problem if the lengths are different

Let the input sequences be X[0..m-1] and Y[0..n-1] of lengths m and n respectively. And let L(X[0..m-1], Y[0..n-1]) be the length of LCS of the two sequences X and Y. Following is the recursive definition of L(X[0..m-1], Y[0..n-1]).
If last characters of both sequences match (or X[m-1] == Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = 1 + L(X[0..m-2], Y[0..n-2])
If last characters of both sequences do not match (or X[m-1] != Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = MAX ( L(X[0..m-2], Y[0..n-1]), L(X[0..m-1], Y[0..n-2]) )
How do I solve the problem if the lengths are different, and how do I print the respective sequences?
It doesn't matter whether the input strings have the same length; the base case of the recursion takes care of this.
if (m == 0 || n == 0)
    return 0;
If we reach the end of either string, the recursion stops and unwinds from there.
Also, the example you mentioned in the comments:
ABCEFG and ABXDE
First we compare the last character of both strings. In this case, they are not the same.
So we try two cases:
Remove the last character from the first string and compare it with the second.
Remove the last character from the second string and compare it with the first.
And return the max of both cases.
(As a side note, if the last characters had matched, we would add 1 to our answer and remove the last character from both strings.)
This process continues until either string reaches its end, at which point the base case of the recursion is satisfied and the recursion returns.
So it doesn't matter whether the original strings have the same length.
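As a rough illustration of the recursion and of printing the sequence itself, here is a memoized sketch in Python. The function name lcs and the walk-back step used to recover the characters are assumptions, not part of the question. When several subsequences share the maximum length, this returns just one of them.

```python
from functools import lru_cache


def lcs(x, y):
    """Return one longest common subsequence of x and y."""

    @lru_cache(maxsize=None)
    def length(m, n):
        # LCS length of the prefixes x[:m] and y[:n].
        if m == 0 or n == 0:
            return 0  # base case: one prefix is empty
        if x[m - 1] == y[n - 1]:
            return 1 + length(m - 1, n - 1)
        return max(length(m - 1, n), length(m, n - 1))

    # Walk back from the full strings, collecting matched characters.
    out = []
    m, n = len(x), len(y)
    while m > 0 and n > 0:
        if x[m - 1] == y[n - 1]:
            out.append(x[m - 1])
            m, n = m - 1, n - 1
        elif length(m - 1, n) >= length(m, n - 1):
            m -= 1
        else:
            n -= 1
    return "".join(reversed(out))


print(lcs("ABCEFG", "ABXDE"))  # ABE
```

Because the sketch is recursive, very long inputs could exceed Python's recursion limit; a table-based version avoids that.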

The number of differences between characters in a string in Python 3

Given a string, let's say "TATA__", I need to find the total number of differences between adjacent characters in that string, i.e. there is a difference between T and A, but not a difference between A and A, or _ and _.
My code more or less tells me this. But when a string such as "TTAA__" is given, it doesn't work as planned.
I need to take a character in that string, and check if the character next to it is not equal to the first character. If it is indeed not equal, I need to add 1 to a running count. If it is equal, nothing is added to the count.
This is what I have so far:
def num_diffs(state):
    count = 0
    for char in state:
        if char != state[char2]:
            count += 1
        char2 += 1
    return count
When I run it using num_diffs("TATA__") I get 4 as the response. When I run it with num_diffs("TTAA__") I also get 4. Whereas the answer should be 2.
If any of that makes sense at all, could anyone help in fixing it or pointing out where my error lies? I have a feeling it has to do with state[char2]. Sorry if this seems like a trivial problem; it's just that I'm totally new to the Python language.
import operator

def num_diffs(state):
    return sum(map(operator.ne, state, state[1:]))
To open this up a bit, it maps !=, operator.ne, over state and state beginning at the 2nd character. The map function accepts multiple iterables as arguments and passes elements from those one by one as positional arguments to the given function, until one of the iterables is exhausted (state[1:] in this case will stop first).
The map results in an iterable of boolean values, but since bool in python inherits from int you can treat it as such in some contexts. Here we are interested in the True values, because they represent the points where the adjacent characters differed. Calling sum over that mapping is an obvious next step.
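A small sketch of what the intermediate values look like may help here. The variable names state and flags are only for illustration.

```python
import operator

state = "TTAA__"

# Compare each character with its right-hand neighbour.
flags = list(map(operator.ne, state, state[1:]))
print(flags)       # [False, True, False, True, False]

# bool is a subclass of int, so True counts as 1 when summed.
print(sum(flags))  # 2
print(isinstance(True, int), True + True)  # True 2
```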
Apart from the string slicing the whole thing runs using iterators in python3. It is possible to use iterators over the string state too, if one wants to avoid slicing huge strings:
import operator
from itertools import islice

def num_diffs(state):
    return sum(map(operator.ne,
                   state,
                   islice(state, 1, len(state))))
There are a couple of ways you might do this.
First, you could iterate through the string using an index, and compare each character with the character at the previous index.
Second, you could keep track of the previous character in a separate variable. The second seems closer to your attempt.
def num_diffs(s):
    count = 0
    prev = None
    for ch in s:
        if prev is not None and prev != ch:
            count += 1
        prev = ch
    return count
prev is the character from the previous loop iteration. You assign ch (the current character) to it at the end of each iteration so it will be available in the next.
You might want to investigate Python's groupby function which helps with this kind of analysis.
from itertools import groupby

def num_diffs(seq):
    return len(list(groupby(seq))) - 1

for test in ["TATA__", "TTAA__"]:
    print(test, num_diffs(test))
This would display:
TATA__ 4
TTAA__ 2
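The groupby() function works by grouping consecutive identical entries together. It yields a key and a group, the key being the matching single entry and the group being an iterator over the matching entries. Each new group after the first marks a point where adjacent characters differ, so the number of differences is the number of groups minus one. (For an empty string this gives -1, so guard against that case if it can occur.)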
Trying to make as few modifications to your original code as possible:
def num_diffs(state):
    count = 0
    for char2 in range(1, len(state)):
        if state[char2 - 1] != state[char2]:
            count += 1
    return count
One of the problems with your original code was that the char2 variable was not initialized within the body of the function, so it was impossible to predict the function's behaviour.
However, working with indices is not the most Pythonic way and it is error prone (see comments for a mistake that I made). You may want to rewrite the function in such a way that it does one loop over a pair of strings, a pair of characters at a time:
def num_diffs(state):
    count = 0
    for char1, char2 in zip(state[:-1], state[1:]):
        if char1 != char2:
            count += 1
    return count
Finally, that very logic can be written much more succinctly; see @Ilja's answer.

Caesar Cipher algorithm with strings and for loop Python

The assignment is to write a Caesar Cipher algorithm that receives 2 parameters, the first being a String parameter, the second telling how far to shift the alphabet. The first part is to set up a method and set up two strings, one normal and one shifted. I have done this. Then I need to make a loop to iterate through the original string to build a new string, by finding the original letters and selecting the appropriate new letter from the shifted string. I've spent at least two hours staring at this one, and talked to my teacher so I know I'm doing some things right. But as for what goes in the while loop, I really don't have a clue. Any hints or pushes in the right direction would be very helpful, so that I at least have somewhere to start. Thank you.
def cipher(x, dist):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    shifted = "xyzabcdefghijklmnopqrstuvw"
    stringspot = 0
    shiftspot = (x.find("a"))
    aspot = (x.find("a"))
    while stringspot < 26:
        aspot = shifted(dist)
        shifted =
        stringspot = stringspot + 1
    ans =
    return ans

print(cipher("abcdef", 1))
print(cipher("abcdef", 2))
print(cipher("abcdef", 3))
print(cipher("dogcatpig", 1))
Here are some pushes and hints:
You should validate your inputs. In particular, make sure that the shift distance is "reasonable," where reasonable means something you can handle. I recommend <=25.
If the maximum shift amount is 25, the letter 'a' plus 25 would get 'z'. The letter 'z' plus 25 will go past the end of the alphabet. But it wouldn't go past the end of TWO alphabets. So that's one way to handle wrap-around.
User @zondo, in his solution, handles upper-case letters. You didn't mention if you want to handle them or not. You may want to clarify that with your teacher.
If you know about dictionaries, you might want to build one to make it easy to map the old letters to the new letters.
You need to realize that strings can be indexed just like tuples or lists. I don't see you doing that in your code.
You can get an "ASCII code" number for a letter using ord(). The numbers are arbitrary, but both upper and lower case numbers are packed together tightly in ranges of 26. This means you can do math with them. (For example, ord('a') is 97. Not super useful. But ord('b') - ord('a') is 1, which might be good to know.)
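As a hedged illustration of the ord() arithmetic and the wrap-around mentioned in these hints, here is a sketch that shifts a single lowercase letter. The helper name shift_letter is hypothetical, and the sketch assumes its input is one character in the range a to z.

```python
def shift_letter(ch, dist):
    """Shift one lowercase letter forward by dist places, wrapping past 'z'."""
    offset = ord(ch) - ord("a")  # 0 for 'a', 25 for 'z'
    # The modulo keeps the result inside the 26-letter range.
    return chr(ord("a") + (offset + dist) % 26)


print(shift_letter("a", 25))  # z
print(shift_letter("z", 25))  # y
```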
alphabet and shifted are supposed to be a mapping between the original stream and the ciphertext. The loop's job is to iterate over all letters in the stream and substitute them. More specifically, the letter in alphabet and the substitute letter in shifted reside at the same index, hence the mapping. In pseudocode:
ciphertext = empty
for each letter in x
    i = index of letter in alphabet
    new_letter = shifted[i]
    add new_letter to ciphertext
The whole loop can be simplified to a list comprehension, but this shouldn't be your primary concern.
For more direct mapping than doing as in the pseudocode above, look into dictionaries.
Another thing that stands out in your code is the generation of shifted, which should depend on the argument dist so it can't just be hardcoded. So, if dist is 5, the first letter in shifted should be whatever lies at the 0+5 in alphabet, and so on. Hint: modulo operator.
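Putting the pseudocode and the modulo hint together, one possible sketch looks like this. It keeps the names cipher, alphabet and shifted from the question. It assumes a forward shift, so a dist of 1 turns a into b, and it assumes characters outside a to z are copied through unchanged; the assignment may want something different on both points.

```python
def cipher(x, dist):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Build the shifted alphabet from dist instead of hard-coding it.
    shifted = "".join(alphabet[(i + dist) % 26] for i in range(26))
    ciphertext = ""
    for letter in x:
        i = alphabet.find(letter)
        # Assumption: anything that is not a lowercase letter stays as it is.
        ciphertext += shifted[i] if i != -1 else letter
    return ciphertext


print(cipher("abcdef", 1))     # bcdefg
print(cipher("dogcatpig", 1))  # ephdbuqjh
```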

String Occurrence Counting Algorithm

I am curious what is the most efficient algorithm (or commonly used) to count the number of occurrences of a string in a chunk of text.
From what I read, the Boyer–Moore string search algorithm is the standard for string searches but I am not sure if counting occurrences in an efficient way would be same as searching a string.
In Python this is what I want:
text_chunck = "one two three four one five six one"
occurance_count(text_chunck, "one") # gives 3.
EDIT: It seems like python str.count serves as such a method; however, I am not able to find what algorithm it uses.
For starters, yes, you can accomplish this with Boyer-Moore very efficiently. However, depending on some other parameters of your problem, there might be a better solution.
The Aho-Corasick string matching algorithm will find all occurrences of a set of pattern strings in a target string and does so in time O(m + n + z), where m is the length of the string to search, n is the combined length of all the patterns to match, and z is the total number of matches produced. This is linear in the size of the source and target strings if you just have one string to match. It also will find overlapping occurrences of the same string. Moreover, if you want to check how many times a set of strings appears in some source string, you only need to make one call to the algorithm. On top of this, if the set of strings that you want to search for never changes, you can do the O(n) work as preprocessing time and then find all matches in O(m + z).
If, on the other hand, you have one source string and a rapidly-changing set of substrings to search for, you may want to use a suffix tree. With O(m) preprocessing time on the string that you will be searching in, you can, in O(n) time per substring, check how many times a particular substring of length n appears in the string.
Finally, if you're looking for something you can code up easily and with minimal hassle, you might want to consider looking into the Rabin-Karp algorithm, which uses a rolling hash function to find strings. This can be coded up in roughly ten to fifteen lines of code, has no preprocessing time, and for normal text strings (lots of text with few matches) can find all matches very quickly.
Hope this helps!
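For the Rabin-Karp suggestion, a minimal counting sketch could look like the following. The function name and the base and mod parameters are assumptions chosen for illustration, and this version counts overlapping matches.

```python
def rabin_karp_count(text, pattern, base=256, mod=1_000_000_007):
    """Count (possibly overlapping) occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    count = 0
    for start in range(n - m + 1):
        # Equal hashes can be a collision, so confirm with a direct comparison.
        if p_hash == t_hash and text[start:start + m] == pattern:
            count += 1
        if start + m < n:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return count


print(rabin_karp_count("one two three four one five six one", "one"))  # 3
```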
Boyer-Moore would be a good choice for counting occurrences, since it has some overhead that you would only need to do once. It does better the longer the pattern string is, so for "one" it would not be a good choice.
If you want to count overlaps, start the next search one character after the previous match. If you want to ignore overlaps, start the next search the full pattern string length after the previous match.
If your language has an indexOf or strpos method for finding one string in another, you can use that. If it proves too slow, then choose a better algorithm.
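In Python the equivalent of indexOf is str.find, so the overlap rule described above might be sketched as follows. The function name count_occurrences and the overlapping flag are hypothetical.

```python
def count_occurrences(text, pattern, overlapping=False):
    """Count occurrences of pattern in text using repeated str.find calls."""
    if not pattern:
        return 0
    # Resume one character later to allow overlaps, or a full pattern
    # length later to skip them.
    step = 1 if overlapping else len(pattern)
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + step)
    return count


print(count_occurrences("one two three four one five six one", "one"))  # 3
print(count_occurrences("aaaa", "aa"))                                  # 2
print(count_occurrences("aaaa", "aa", overlapping=True))                # 3
```

With overlapping left as False, the result matches what the built-in str.count returns, since that method also counts non-overlapping occurrences.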
Hellnar,
You can use a simple dictionary to count occurrences in a String. The algorithm is a counting algorithm, here is an example:
"""
The counting algorithm is used to count the occurences of a character
in a string. This allows you to compare anagrams and strings themselves.
ex. animal, lamina a=2,n=1,i=1,m=1
"""
def count_occurences(str):
    occurences = {}
    for char in str:
        if char in occurences:
            occurences[char] = occurences[char] + 1
        else:
            occurences[char] = 1
    return occurences

def is_matched(s1,s2):
    matched = True
    s1_count_table = count_occurences(s1)
    for char in s2:
        if char in s1_count_table and s1_count_table[char]>0:
            s1_count_table[char] -= 1
        else:
            matched = False
            break
    return matched
#counting.is_matched("animal","laminar")
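This example just returns True or False depending on whether every character of s2 can be matched against a character of s1. Keep in mind that this algorithm counts the number of times each character shows up in a string, not the number of times a substring occurs, which makes it a good fit for anagrams. For a strict anagram check, the two strings must also have the same length.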
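The same counting idea can be written with collections.Counter from the standard library. This is only an alternative sketch; the function name is_anagram is hypothetical. Comparing the two full count tables also rejects strings of different lengths.

```python
from collections import Counter


def is_anagram(s1, s2):
    """True when both strings contain exactly the same characters."""
    return Counter(s1) == Counter(s2)


print(Counter("animal")["a"])           # 2
print(is_anagram("animal", "lamina"))   # True
print(is_anagram("animal", "laminar"))  # False
```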
