How do I count up the number of letter pairs in Python?

I am making a program in Python that counts the number of letter pairs.
For example: 'ddogccatppig' will print 3, 'a' will print 0, 'dogcatpig' will print 0, 'aaaa' will print 3, and 'AAAAAAAAAA' will print 9.
My teacher told me to use a for loop to get the i and i+1 index to compare. I do not know how to do this, and I am really confused. My code:
def count_pairs(word):
    pairs = 0
    chars = set(word)
    for char in chars:
        pairs += word.count(char + char)
    return pairs
Please help me!!! Thank you.

The for loop is only there to iterate through the appropriate values of i, not to do the comparison directly. Start i at 0 and stop once i+1 is the last index in the string. Work this out on paper.
Alternately, use i-1 and i; then you want to start i-1 at 0, which takes less typing:
for i in range(1, len(word)):
    if word[i] == word[i-1]:
        ...
Even better, don't use the counter at all -- make a list of equality results and count the True values:
return sum([word[i] == word[i-1] for i in range(1, len(word))])
This is a bit of a "dirty trick" using the fact that True evaluates as 1 and False as 0.
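The same count can also be sketched without any indexing by pairing each character with its successor using zip. This is only an illustration; the name count_adjacent_pairs is made up here and does not come from the question.

```python
def count_adjacent_pairs(word):
    # zip(word, word[1:]) yields (word[0], word[1]), (word[1], word[2]), ...
    # and stops at the shorter input, so no index ever goes out of range.
    return sum(a == b for a, b in zip(word, word[1:]))

for sample in ['ddogccatppig', 'a', 'dogcatpig', 'aaaa', 'AAAAAAAAAA']:
    print(sample, count_adjacent_pairs(sample))
```

As with the list version, this relies on True counting as 1 and False as 0.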

If you want to loop over indices instead of the actual characters, you can do:
for i in range(len(word)):
    # do something with word[i] and/or word[i+1] or word[i-1],
    # keeping i+1 and i-1 inside the valid index range
Converting the string to a set first is counterproductive because it removes the ordering and the duplicates, making the entire problem impossible for two different reasons. :)
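To see concretely where the set-based code from the question goes wrong, here is a small comparison. It is a sketch, and both function names are invented for this illustration. One visible symptom is that str.count only counts non-overlapping occurrences, so a run such as 'aaaa' contains 'aa' only twice.

```python
def count_pairs_with_set(word):
    # The approach from the question: count 'xx' for each distinct character x.
    return sum(word.count(ch + ch) for ch in set(word))

def count_pairs_by_index(word):
    # Compare each character with the one right after it.
    return sum(word[i] == word[i + 1] for i in range(len(word) - 1))

print(count_pairs_with_set('aaaa'))   # 2: 'aa' fits twice without overlapping
print(count_pairs_by_index('aaaa'))   # 3: positions (0,1), (1,2), (2,3)
```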

Here is an answer:
test = "ddogccatppig"
def count_pairs(test):
    counter = 0
    for i in range(0, len(test) - 1):
        if test[i] == test[i+1]:
            counter += 1
    return counter
print(count_pairs(test))
Here you iterate over the indices of the string, stopping one short of the end because test[i+1] would otherwise raise an IndexError. Add to a counter whenever a letter is the same as the one after it, then return the counter.

This is another (similar) way to get the same answer.
def charPairs(word):
    word = list(word)
    count = 0
    for n in range(0, len(word) - 1):
        if word[n] == word[n+1]:
            count += 1
    return count
print(charPairs("ddogccatppig"))
print(charPairs("a"))
print(charPairs("dogcatpig"))
print(charPairs("aaaa"))
print(charPairs("AAAAAAAAAA"))

Related

How to stop over counting of duplicate letters in a list of strings

I'm trying to count the number of times a duplicate letter shows up in each list element.
For example, given
arr = ['capps','hat','haaah']
I want to output the list [1, 0, 1].
def myfunc(words):
    counter = 0  # counts duplicate letters in words
    len_ = len(words) - 1
    for i in range(len_):
        if words[i] == words[i+1]:  # if the next letter is the same, add one
            counter += 1
    return counter
def minimalOperations(arr):
    return [*map(myfunc, arr)]  # map applies myfunc to each element of arr
But my code would output [1,0,2]
I'm not sure why I am over counting.
Can anyone help me resolve this, thank you in advance.
A more efficient solution using a regular expression:
import re
def myfunc(words):
    reg_str = r"(\w)\1{1,}"
    return len(re.findall(reg_str, words))
This function counts the maximal substrings of length 2 or more that consist of one repeated letter. Thus 'aaa' in your example will only be counted once.
For a string like
'hhhhfafaahggaa'
the output will be 4, since there are 4 maximal substrings of the same letter occurring at least twice: 'hhhh', 'aa', 'gg', 'aa'.
You aren't accounting for situations where you have greater than 2 identical characters in succession. To do this, you can look back as well as forward:
if (words[i] == words[i+1]) and (words[i] != words[i-1] if i != 0 else True):
    # as before
The conditional expression handles the first iteration of the loop, where words[i-1] would otherwise wrap around and compare the first letter of the string with the last.
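Written out as a complete function, the look-back idea might look like the sketch below. The name count_repeat_groups and the helper variable starts_run are made up for this example; the logic is the same "equal to the next letter, but not to the previous one" test.

```python
def count_repeat_groups(word):
    counter = 0
    for i in range(len(word) - 1):
        # A run starts at i if there is no previous letter or it differs.
        starts_run = i == 0 or word[i] != word[i - 1]
        if word[i] == word[i + 1] and starts_run:
            counter += 1
    return counter

arr = ['capps', 'hat', 'haaah']
print([count_repeat_groups(w) for w in arr])  # [1, 0, 1]
```

Using `i == 0 or ...` instead of a conditional expression is a stylistic choice; both avoid evaluating word[-1] on the first iteration.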
Another solution is to use itertools.groupby and count the number of instances where a group has a length greater than 1:
arr = ['capps','hat','haaah']
from itertools import groupby
res = [sum(1 for _, j in groupby(el) if sum(1 for _ in j) > 1) for el in arr]
print(res)
[1, 0, 1]
The sum(1 for _ in j) part is used to count the number of items in a generator. It's also possible to use len(list(j)), though this requires list construction.
Well, your code counts the number of duplications, so what you observe is quite logical:
your input is arr = ['capps','hat','haaah']
in 'capps', the letter p is duplicated 1 time => myfunc() returns 1
in 'hat', there is no duplicated letter => myfunc() returns 0
in 'haaah', the letter a is duplicated 2 times => myfunc() returns 2
So finally you get [1,0,2].
For your purpose, I suggest using a regex to match and count the groups of duplicated letters in each word. I also replaced map() with a list comprehension, which I find more readable:
import re
def myfunc(words):
    return len(re.findall(r'(\w)\1+', words))
def minimalOperations(arr):
    return [myfunc(a) for a in arr]
arr = ['capps','hat','haaah']
print(minimalOperations(arr)) # [1,0,1]
arr = ['cappsuul','hatppprrrrtyyy','haaah']
print(minimalOperations(arr)) # [2,3,1]
You need to keep track of a little more state, specifically if you're looking at duplicates now.
def myfunc(words):
    counter = 0  # counts groups of duplicate letters in words
    seen = None
    len_ = len(words) - 1
    for i in range(len_):
        # add one if the next letter is the same and the run did not start earlier
        if words[i] == words[i+1] and words[i+1] != seen:
            counter += 1
        seen = words[i]
    return counter
This gives you the following output
>>> arr = ['capps','hat','haaah']
>>> list(map(myfunc, arr))
[1, 0, 1]
As others have pointed out, you could use a regular expression and trade clarity for performance. The key is to find a regular expression that means "two or more repeated characters", and it may depend on what you consider to be characters (e.g. how do you treat duplicate punctuation?).
Note: the "regex" used for this is technically an extension of regular expressions, because the backreference requires memory.
The form will be len(re.findall(regex, words)).
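The punctuation question can be made concrete with a small comparison of two patterns that appear in this thread. This is a sketch; the helper name count_repeats and the sample string are made up. In Python's re module, \w matches letters, digits and the underscore, while . matches any character except a newline.

```python
import re

def count_repeats(pattern, text):
    # Each match is one maximal run of the captured character.
    return len(re.findall(pattern, text))

sample = 'wow!! haaah'
print(count_repeats(r'(\w)\1+', sample))  # 1: only 'aaa' counts
print(count_repeats(r'(.)\1+', sample))   # 2: '!!' and 'aaa' both count
```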
I would break this kind of problem into smaller chunks. Starting by grouping duplicates.
The documentation for itertools has groupby and recipes for this kind of thing.
A slightly edited version of unique_justseen would look like this:
duplicates = (sum(1 for _ in group) for _key, group in itertools.groupby("haaah"))
and yields values: 1, 3, 1. As soon as any of these values are greater than 1 you have a duplicate. So just count them:
sum(n > 1 for n in duplicates)
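Putting the two steps together, a self-contained sketch could look like this. The function names run_lengths and count_duplicate_groups are invented for the example.

```python
from itertools import groupby

def run_lengths(word):
    # groupby yields one group per run of equal consecutive characters.
    return [sum(1 for _ in group) for _key, group in groupby(word)]

def count_duplicate_groups(word):
    # A run longer than 1 is a group of duplicates.
    return sum(n > 1 for n in run_lengths(word))

print(run_lengths("haaah"))  # [1, 3, 1]
print([count_duplicate_groups(w) for w in ['capps', 'hat', 'haaah']])  # [1, 0, 1]
```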
Use re.findall to match runs of 2 or more identical characters:
>>> import re
>>> arr = ['capps','hat','haaah']
>>> [len(re.findall(r'(.)\1+', w)) for w in arr]
[1, 0, 1]

Comparing characters in a string in Python

I want to compare consecutive characters in a string and, if they're equal, increment a counter.
With my example code I always get TypeErrors related to line 6.
Do you know where the problem is?
Thank you!
def func(text):
    counter = 0
    text_l = text.lower()
    for i in text_l:
        if text_l[i+1] == text_l[i]:
            print(text_l[i+1], text_l[i])
            counter += 1
    return counter
i is not an index. Your for loop iterates over the elements directly, so i is a character at every step, not an integer. Use the range function if you want the indices:
for i in range(len(text_l) - 1):  # i is the current index
    if text_l[i + 1] == text_l[i]:
        counter += 1
You can also use enumerate:
for i, c in enumerate(text_l[:-1]):  # i is the current index, c is the current char
    if text_l[i + 1] == c:
        counter += 1
In either case, iterate only up to the penultimate character; otherwise i + 1 is out of bounds on the last iteration and you'll hit an IndexError.
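For reference, here is the enumerate variant as a complete function, followed by a tiny reproduction of the TypeError. This is a sketch; count_equal_neighbours is a made-up name, and the try block exists only to show which exception the original loop triggers.

```python
def count_equal_neighbours(text):
    text_l = text.lower()
    counter = 0
    # text_l[:-1] stops before the last character, so i + 1 is always valid.
    for i, c in enumerate(text_l[:-1]):
        if text_l[i + 1] == c:
            counter += 1
    return counter

print(count_equal_neighbours("Aabb"))  # 2

# Iterating over a string yields one-character strings, so using the
# loop variable as an index fails before any comparison happens.
try:
    for ch in "abc":
        "abc"[ch + 1]
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name)
```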

Intro to Python - Lists questions

We've started doing lists in our class and I'm a bit confused, so I'm coming here since previous questions/answers have helped me in the past.
The first question was to sum up all negative numbers in a list, I think I got it right but just want to double check.
import random
def sumNegative(lst):
    sum = 0
    for e in lst:
        if e < 0:
            sum = sum + e
    return sum
lst = []
for i in range(100):
    lst.append(random.randrange(-1000, 1000))
print(sumNegative(lst))
For the 2nd question, I'm a bit stuck on how to write it. The question was:
Count how many words occur in a list up to and including the first occurrence of the word “sap”. I'm assuming it's a random list but wasn't given much info so just going off that.
I know the ending would be similar, but I have no idea how the initial part would look since it deals with strings as opposed to numbers.
I wrote code for an in-class problem, which was to count how many odd numbers are in a list (it was a random list there, so I'm assuming it's random for this question as well), and got:
import random
def countOdd(lst):
    odd = 0
    for e in lst:
        if e % 2 = 0:
            odd = odd + 1
    return odd
lst = []
for i in range(100):
    lst.append(random.randint(0, 1000))
print(countOdd(lst))
How exactly would I change this to fit the criteria for the 2nd question? I'm just confused on that part. Thanks.
The code to sum negative numbers looks fine! I might suggest testing it on a list that you can manually check, such as:
print(sumNegative([1, -1, -2]))
The same logic would apply to your random list.
A note about your countOdd function: it appears that you are missing an = (== checks for equality, = is for assignment), and the code seems to count even numbers, not odd ones. The code should be:
def countOdd(lst):
    odd = 0
    for e in lst:
        if e % 2 == 1:  # odd % 2 == 1
            odd = odd + 1
    return odd
As for your second question, you can use a very similar function:
def countWordsBeforeSap(inputList):
    numWords = 0
    for word in inputList:
        if word.lower() != "sap":
            numWords = numWords + 1
        else:
            return numWords
inputList = ["trees", "produce", "sap"]
print(countWordsBeforeSap(inputList))
To explain the above, the countWordsBeforeSap function:
Starts iterating through the words.
If the word is anything other than "sap" it increments the counter and continues
If the word IS "sap" then it returns early from the function
The function could be more general by passing in the word that you wanted to check for:
def countWordsBefore(inputList, wordToCheckFor):
    numWords = 0
    for word in inputList:
        if word.lower() != wordToCheckFor:
            numWords = numWords + 1
        else:
            return numWords
inputList = ["trees", "produce", "sap"]
print(countWordsBefore(inputList, "sap"))
If the words that you are checking come from a single string then you would initially need to split the string into individual words like so:
inputString = "Trees produce sap"
inputList = inputString.split(" ")
Which splits the initial string into words that are separated by spaces.
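One detail worth knowing: split(" ") splits on every single space, so repeated spaces produce empty strings, whereas split() with no argument splits on any run of whitespace. A short sketch (the sample string is made up):

```python
text = "Trees  produce   sap"   # two spaces, then three spaces

by_space = text.split(" ")
by_whitespace = text.split()

print(by_space)       # ['Trees', '', 'produce', '', '', 'sap']
print(by_whitespace)  # ['Trees', 'produce', 'sap']
```

If the empty strings were passed to a word counter they would be counted as words, so split() is usually the safer choice for free-form text.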
Hope this helps!
Tom
def count_words(lst, end="sap"):
    """Note that I added an extra input parameter.
    This input parameter has a default value of "sap" which is the actual question.
    However you can change this input parameter to any other word if you want to by
    just doing count_words(lst, "another_word").
    """
    words = []
    # First we need to loop through each item in the list.
    for item in lst:
        # We append the item to our "words" list first thing in this loop,
        # as this will make sure we will count up to and INCLUDING.
        words.append(item)
        # Now check if we have reached the 'end' word.
        if item == end:
            # Break out of the loop prematurely, as we have reached the end.
            break
    # Our 'words' list now has all the words up to and including the 'end' variable.
    # 'len' will return how many items there are in the list.
    return len(words)
lst = ["something", "another", "woo", "sap", "this_wont_be_counted"]
print(count_words(lst))
Hope this helps you understand lists better!
You can make effective use of list/generator comprehensions. The snippets below are fast and memory efficient.
1. Sum of negatives:
print(sum(i for i in lst if i < 0))
2. Count of words before sap: Like your sample list, this assumes there are no numbers in the list.
print(lst.index('sap'))
If it's a mixed list, filter the strings first, then find the index of sap:
l = ['a','b',1,2,'sap',3,'d']
l = [x for x in l if isinstance(x, str)]
print(l.index('sap'))
3. Count of odd numbers:
print(sum(i%2 != 0 for i in lst))
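For item 2, note that list.index gives the number of words before "sap" and raises ValueError when the word is missing. A sketch that counts up to and including the word, as the original exercise asks, might look like this. The name count_through is invented, and returning the full length when the word is absent is an assumption, not something the exercise specifies.

```python
def count_through(words, target="sap"):
    try:
        # index() is the count of words before target; +1 includes target itself.
        return words.index(target) + 1
    except ValueError:
        # Assumption: if target never occurs, every word is counted.
        return len(words)

print(count_through(["trees", "produce", "sap", "slowly"]))  # 3
print(count_through(["trees", "produce"]))                   # 2
```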

How to append even and odd chars python

I want to convert all the even letters using one function and all the odd letters using another function. Each letter maps to a number 0-25, corresponding to a-z, so a,c,e,g,i,k,m,o,q,s,u,w,y are even characters.
However, only my even letters are converting correctly.
def encrypt(plain):
    charCount = 0
    answer=[]
    for ch in plain:
        if charCount%2==0:
            answer.append(pycipher.Affine(7,6).encipher(ch))
        else:
            answer.append(pycipher.Affine(3,0).encipher(ch))
    return ''.join(answer)
You never change charCount in your loop -- So it starts at 0 and stays at 0 which means that each ch will be treated as "even".
Based on your update, you actually want to check whether the character is odd or even based on its "index" in the English alphabet. Having some sort of mapping of characters to numbers is helpful here. You could build it yourself:
alphabet = 'abcde...' # string.ascii_lowercase?
mapping = {k: i for i, k in enumerate(alphabet)}
Or we can use the builtin ord, noticing that ord('a') produces an odd result, ord('b') is even, etc.
def encrypt(plain):
    answer=[]
    for ch in plain:
        if ord(ch) % 2 == 1:  # 'a', 'c', 'e', ...
            answer.append(pycipher.Affine(7,6).encipher(ch))
        else:  # 'b', 'd', 'f', ...
            answer.append(pycipher.Affine(3,0).encipher(ch))
    return ''.join(answer)
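The alphabet mapping mentioned above can be checked against the ord shortcut without involving any cipher library. The following is a sketch; position and is_even_letter are names made up for the illustration.

```python
import string

# Map each lowercase letter to its 0-based position: a -> 0, b -> 1, ...
position = {ch: i for i, ch in enumerate(string.ascii_lowercase)}

def is_even_letter(ch):
    return position[ch.lower()] % 2 == 0

# 'a' sits at position 0 (even) while ord('a') is 97 (odd), so the two
# parities are always opposite for lowercase letters.
for ch in "abcy":
    print(ch, is_even_letter(ch), ord(ch) % 2 == 1)
```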
Your basic approach is to re-encrypt a letter each time you see it. With only 26 possible characters to encrypt, it is probably worth pre-encrypting them, then just performing a lookup for each character in the plain text. While doing that, you don't need to compute the position of each character, because you know you are alternating between even and odd the entire time.
import string
import pycipher
def encrypt(plain):
    # True == 1, False == 0
    fs = [pycipher.Affine(3,0).encipher,
          pycipher.Affine(7,6).encipher]
    is_even = True  # assuming "a" is even; otherwise, just set this to False
    d = dict()
    for ch in string.ascii_lowercase:
        f = fs[is_even]
        d[ch] = f(ch)
        is_even = not is_even
    return ''.join([d[ch] for ch in plain])
You can also use itertools.cycle to simplify the alternation for you.
import itertools
def encrypt(plain):
    # again, assuming a is even (so it gets the first function). If not, reverse this list
    fs = itertools.cycle([pycipher.Affine(7,6).encipher,
                          pycipher.Affine(3,0).encipher])
    d = dict((ch, f(ch)) for f, ch in zip(fs, string.ascii_lowercase))
    return ''.join([d[ch] for ch in plain])
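To try the precomputed-table idea without installing pycipher, the two ciphers can be swapped for stand-in functions. Everything here is hypothetical scaffolding: build_table, transform and the two stand-ins (upper-casing and appending an apostrophe) only make the alternation visible and have nothing to do with the affine cipher.

```python
import itertools
import string

def build_table(even_fn, odd_fn):
    # cycle() alternates even_fn, odd_fn, even_fn, ... along the alphabet,
    # so 'a' (position 0) gets even_fn, 'b' gets odd_fn, and so on.
    fns = itertools.cycle([even_fn, odd_fn])
    return {ch: fn(ch) for fn, ch in zip(fns, string.ascii_lowercase)}

table = build_table(str.upper, lambda ch: ch + "'")

def transform(plain):
    # Characters outside a-z are passed through unchanged.
    return ''.join(table.get(ch, ch) for ch in plain)

print(transform("abcd"))  # Ab'Cd'
```

The table is built once, so each character of the input costs only a dictionary lookup.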
These are my two cents on that. What @mgilson is proposing also works, of course, but not in the way you specified (in the comments). Try to debug your code in your head after writing it. Go through the for loop and perform 1-2 iterations to see whether the variables take the values you intended. charCount is never reassigned a value; it is always 0. And yes, charCount += 1 would make it change, but not in the way you want it to.
def encrypt(plain):
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    answer = ''
    for letter in plain:
        try:
            if alphabet.index(letter.lower()) % 2 == 0:
                answer += pycipher.Affine(7, 6).encipher(letter)
            else:
                answer += pycipher.Affine(3, 0).encipher(letter)
        except ValueError:  # letter is not in the alphabet
            answer += letter
    return answer
my_text = 'Your question was not very clear OP'
encrypted_text = encrypt(my_text)
Also, I would not use ord(ch) directly, because ord('a') is 97, not 0, and is therefore odd instead of even.
Since your notion of even letter is based on the position of a character in the alphabet, you could use ord(), like this:
if ord(ch)%2==0:
Note that ord('a') and ord('A') are both odd, so that would make a go in the else part. If you want the opposite, then just negate the condition:
if ord(ch)%2!=0:

While loop "list index out of range" error. Strip off double slashes from strings

I was wondering if you could help me debug this code. I'm curious as to why I'm getting a list index out of range error. I'm trying to go through all the items in the list, using a counter as the index for the list. In the end, I want '//' removed from all strings in the list.
word_list = []
i = 0
while i < len(word_list):
    word_list.extend(['He//llo', 'Ho//w are yo//u', 'Be///gone'])
    i += 1
    word_list[i].strip('//')
    print(i)
print(word_list[i])
print(i)
Your condition i < len(word_list) is never True: i is 0 and so is the length of your list, so you never enter the loop. You cannot index an empty list, so print(word_list[i]) with i being 0 gives you an IndexError.
Your next problem is that you add more items to the list inside the loop, so if the loop did start it would be infinite, because the list grows faster than i. For example, with a single string in the list initially:
word_list = ["foo"]
i = 0
# i never catches up with len(word_list), so this loops infinitely
while i < len(word_list):
    print(i)
    word_list.extend(['He//llo', 'Ho//w are yo//u', 'Be///gone'])
    i += 1
    word_list[i].strip('//')
    print(i)
You add 3 elements to your list and increase i by 1 on each pass, which makes the loop infinite. I'm not sure what your goal is, but a while loop does not seem to be what you really want.
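The growth can be watched safely by capping the number of passes. This is a sketch with made-up placeholder strings and a hard limit of five iterations in place of the original while condition.

```python
word_list = ["foo"]
i = 0
gaps = []

for _ in range(5):  # hard cap instead of `while i < len(word_list)`
    word_list.extend(['a', 'b', 'c'])   # the list grows by 3 ...
    i += 1                              # ... while i grows by 1
    gaps.append(len(word_list) - i)

print(gaps)  # [3, 5, 7, 9, 11]
```

The gap between the list length and i widens by 2 on every pass, so the condition i < len(word_list) can never become false.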
If you want to use a loop to remove the / characters and you actually have some strings in your list initially:
word_list = ['He//llo', 'Ho//w are yo//u', 'Be///gone']
for i in range(len(word_list)):
    word_list[i] = word_list[i].replace("/","")
    print(i)
Strings are also immutable, so you need to reassign the value; you cannot change a string in place. The above can also simply become a list comprehension:
word_list = ['He//llo', 'Ho//w are yo//u', 'Be///gone']
word_list[:] = [s.replace("/","") for s in word_list]
I also used str.replace because strip only removes characters from the start and end of a string.
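The difference between the two methods is easy to check directly. A short sketch, with a made-up sample string that has slashes at both ends as well as in the middle:

```python
s = '//He//llo//'

stripped = s.strip('//')        # the argument is a set of characters to trim
replaced = s.replace('/', '')   # removes every slash, wherever it is

print(stripped)  # He//llo
print(replaced)  # Hello

# Replacing the two-character string '//' leaves an odd slash behind.
print('Be///gone'.replace('//', ''))  # Be/gone
```

Both methods return a new string, so the result has to be assigned back to the list element.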
Here is what's happening in the code:
word_list = []  # you initialize an empty list with no elements
i = 0
while i < len(word_list):  # word_list has no elements, so its length is zero
    # so this expression is 'while 0 < 0', *which is false*
    # so, we skip the loop entirely
print(word_list[i])  # print word_list[0], the first element of word_list
    # word_list has not changed, so still has zero elements at this point
    # error! panic! etc
You start with word_list equal to [] which has length 0, and i equal to 0, thus the while loop is never entered (it's false that 0 < 0). When you try to print(word_list[i]) you get an IndexError because there is no ith (0th) element in word_list -- word_list is empty, so index 0 really is out of range.
What you want is presumably:
word_list = ['He//llo', 'Ho//w are yo//u', 'Be///gone']
i = 0
while i < len(word_list):
    word_list[i] = word_list[i].replace('/', '')
    print(i, word_list[i])
    i += 1
I say "presumably" because maybe you do want the list to grow infinitely and faster than i (joke; see @Padraic Cunningham's answer, which I think you should accept as the correct one).
Use a for loop with enumerate; there is no need to keep track of the index yourself:
import re
for i, w in enumerate(word_list):
    word_list[i] = re.sub(r'/+','',w)
print(word_list)
['Hello', 'How are you', 'Begone']
