Python 3 - count letters/words in text document/quick question - python

I'd like to know how to achieve the same result as the code I listed below without using any collections, or for someone to explain what goes on inside the Counter collection (in code or in a way that isn't confusing) since I can't seem to find it anywhere. This code is meant to read a text file called juliet.txt. I am trying to make it count the number of letters and spaces inside the document and then print the result.
Code:
from collections import Counter
text = open('juliet.txt', 'r').read()
letters = 0
counter = Counter(text)
spacesAndNewlines = counter[' '] + counter['\n']
while letters < len(text):
    print (text[letters])
    letters += 1
while letters == len(text):
    print (letters)
    letters += 1
print (spacesAndNewlines)

Sounds like a homework question to me, in which case you won't get any benefit from me answering you.
letters = {}
with open('juliet.txt') as fh:
    data = fh.read()
for char in data:
    if char in letters:
        # seen before: bump the existing count
        letters[char] += 1
    else:
        # first occurrence: start the count at 1
        letters[char] = 1
print(letters)
This uses a standard dictionary - normally I would use a defaultdict but for some weird reason you don't like collections. With the defaultdict you wouldn't need to do the laborious test to see if the char is already in the dictionary.
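To address the "what goes on inside Counter" part of the question, here is a rough sketch rather than the actual implementation: conceptually, Counter is a dictionary that maps each item to the number of times it has been seen. The helper name count_chars is hypothetical, and the sample string is only a stand-in for the contents of juliet.txt.

```python
def count_chars(text):
    """Tally each character of text in a plain dict, roughly what Counter(text) holds."""
    counts = {}
    for char in text:
        # dict.get returns 0 for a character not seen yet,
        # so no explicit membership test is needed
        counts[char] = counts.get(char, 0) + 1
    return counts

sample = "O Romeo,\nRomeo"  # assumption: stand-in for open('juliet.txt').read()
counts = count_chars(sample)
spaces_and_newlines = counts.get(' ', 0) + counts.get('\n', 0)
print(len(sample))            # total number of characters
print(spaces_and_newlines)    # spaces plus newlines
```

Using counts.get(key, 0) when reading also mirrors how a Counter returns 0 for a missing key instead of raising KeyError.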

Related

How to count number of words inside a string using a for loop in python 3

I am trying to write a for loop that finds a specific word inside a string. I know that there is a one-liner to do this in Python, but I am practicing for loops and I want to see how, using a for loop, I can identify specific words the way it identifies specific letters (like vowels). I've been reading some questions, and I think the code should go like this:
s='bananasdasdnansbanana'
b='banana'
def count_words(s,b):
    answer = 0
    for i in range(len(s)):
        if any(s[i:].startswith(b) for b in s):
            answer += 1
    print(answer)
but it is not printing anything. I did something similar when I was looking for vowels in the same string, but now I know I am supposed to "arrange" the characters in the word "banana" and then compare them to the string; that is the purpose of this part:
if any(s[i:].startswith(b) for b in s):
If you could help me I would really appreciate it.
Thank you.
Your code doesn't print because you never call the function (you only define it). Call it by adding a line at the end:
count_words(s,b)
Your function actually counts the number of characters in string s:
s='bananasdasdnansbanana'
b='banana'
def count_words(s,b):
    answer = 0
    # Loop over each character position in s
    for i in range(len(s)):
        # `for b in s` tries every character of s as the prefix, including s[i]
        # itself, so any() is always True
        if any(s[i:].startswith(b) for b in s):
            answer += 1
    print(answer)
Corrected code:
s='bananasdasdnansbanana'
b='banana'
def count_words(s,b):
    answer = 0
    for i in range(len(s)):
        if s[i:].startswith(b):
            answer += 1
    print(answer)
count_words(s,b)
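s='bananasdasdnansbanana'
b='banana'
def count_words(s,b):
    answer = 0
    counter = 0
    if b in s:
        for i in range(len(s)):
            if s[i]!=b[counter]:
                counter=0
            else:
                counter+=1
            if counter == len(b):
                answer+=1
                counter = 0
    print(answer)
count_words(s, b)
The above algorithm first checks whether banana exists in s at least once. Then it loops to find the count. Note that it counts non-overlapping matches only, and because it resets on a mismatch without re-examining the current character, it can miss a match: for s='bbanana' it prints 0.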
If your goal is to use a for loop, you could find the length of the word you're looking for, then check sections of the larger string that are the same length and see if they match. You don't need to use any unless you specifically want to. Something like this:
s='bananasdasdnansbanana'
b='banana'
def count_words(s,b):
    word_length = len(b)
    answer = 0
    for i in range(len(s) - len(b) + 1):
        if s[i:i+word_length] == b:
            answer += 1
    return answer
print(count_words(s,b))
Note I also changed your print to return, so the result has to be printed by the caller.
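One detail worth seeing in code is that the sliding-window loop counts overlapping matches, whereas the built-in str.count does not. A minimal sketch, with count_overlapping as a hypothetical name for the same loop:

```python
def count_overlapping(text, word):
    """Count every position in text where word starts, overlaps included."""
    answer = 0
    # stop early enough that a full-length slice still fits
    for i in range(len(text) - len(word) + 1):
        if text[i:i + len(word)] == word:
            answer += 1
    return answer

print(count_overlapping('bananasdasdnansbanana', 'banana'))  # 2
# 'aa' starts at indexes 0, 1 and 2 of 'aaaa', but str.count skips overlaps
print(count_overlapping('aaaa', 'aa'), 'aaaa'.count('aa'))   # 3 2
```

Which behaviour is wanted depends on the exercise; for the example string in the question both approaches give 2.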

Counting uppercase letters in a list excluding the first capital in a word

So my function must take a list of strings and return the total number of capital letters that appear in positions other than the beginning of a word. Also to break this problem into a sub problem, it needs a second function that takes a single word and returns the number of capital letters that appear in positions other than the beginning of that word. So far I have a function that works, but I have been told it needs to be done better and I am not quite sure how to do that.
def count_strange_caps(words):
    if words[0].isupper():
        count = abs(1 - sum(1 for c in words if c.isupper()))
    elif words[0].islower():
        count = abs(sum(1 for c in words if c.isupper()))
    return count
def total_strange_caps(words):
    total_count = 0
    for word in words:
        if word[0].isupper():
            total_count -= 1
        for letter in word:
            if letter.isupper():
                total_count += 1
    return total_count
My teacher told me to combine the two list comprehensions in count_strange_caps as they are basically the same code and use the code from count_strange_caps in the inner for loop for the second function.
print(total_strange_caps(["Five","FiVe","fIVE"]))
print(total_strange_caps(["fIVE"]))
print(count_strange_caps("fIVE"))
These are the types of tests it needs to pass and if anyone could help me with a solution using more rudimentary concepts it would be much appreciated. I can not use numpy if that makes a difference.
You may use str.isupper() and sum() to achieve this. Using these, the function definition of count_strange_caps() should be like:
def count_strange_caps(word):
    return sum(my_char.isupper() for my_char in word[1:])  # word[1:] to skip the first character
Sample run:
>>> count_strange_caps('HeLlo')
1
>>> count_strange_caps('HeLLo')
2
>>> count_strange_caps('heLLo')
2
Also, your total_strange_caps() can be simplified using sum() as:
def total_strange_caps(words):
    return sum(count_strange_caps(word) for word in words)
Sample run:
>>> total_strange_caps(['HeLlo', 'HeLLo', 'heLLo'])
5
You can also loop over a slice of the string that skips the first character:
def count_strange_caps(word):
    total_count = 0
    for letter in word[1:]:
        if letter.isupper():
            total_count += 1
    return total_count
print(count_strange_caps("AbCdE"))
Output:
2
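As a check against the three test calls from the question, here is a small sketch that combines the two functions the way the teacher suggested: one helper for a single word, reused for the list. It assumes each element of the list is a single word.

```python
def count_strange_caps(word):
    # capitals anywhere except the first character of the word
    return sum(1 for ch in word[1:] if ch.isupper())

def total_strange_caps(words):
    # reuse the single-word helper for every word in the list
    return sum(count_strange_caps(word) for word in words)

print(total_strange_caps(["Five", "FiVe", "fIVE"]))  # 0 + 1 + 3 = 4
print(total_strange_caps(["fIVE"]))                  # 3
print(count_strange_caps("fIVE"))                    # 3
```

Slicing with word[1:] also copes with an empty string, where indexing word[0] would raise IndexError.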

Find Most Frequent Character(s) in any sequence(list, string, tuple) in python 3+ with for loop

question:
10. Most Frequent Character
Write a program that lets the user enter a string and displays the character that appears most frequently in the string.
This is an answer for those who are studying intro to CS with "Starting Out with Python", chapter 9, question 10. It is answered solely with what I have learned in previous chapters of the book. I couldn't find anything similar on this website. This code might be OK for beginners like me, so I want to share it. I know this code looks bad, but it gets the job done. The original code, which I found on YouTube, is written in Java; here is a link: https://www.youtube.com/watch?v=dyWYLXKSPus
Sorry for my broken English!
string = "a11aawww1cccertgft1tzzzzzz1ggg111"
mylist_char = []
mylist_count = []
char = None
count = 0
for ch in string:
    temp_char = ch
    temp_count = 0
    for ch1 in string:
        if temp_char == ch1:
            temp_count += 1
    if temp_count > count:
        count = temp_count
        char = temp_char
mylist_char.append(char)
mylist_count.append(count)
for x in range(len(string)):
    for ch in string:
        temp_char = ch
        temp_count = 0
        for ch1 in string:
            if temp_char == ch1:
                temp_count += 1
        if temp_count == count and not(temp_char in mylist_char):
            mylist_char.append(temp_char)
            mylist_count.append(temp_count)
for x in range(len(mylist_char)):
    print("Character", mylist_char[x], "occurred", mylist_count[x], "times")
My issue with your solution is that it looks like Java rewritten in Python -- if you want to use Java, use Java. If you want to use Python, take advantage of what it has to offer. I've written a "simple" solution below that doesn't use any sophisticated Python functions (e.g. if I were really writing it, I'd use a defaultdict and a comprehension)
string = "a11aawww1cccertgft1tzzzzzz1ggg111"
dictionary = {}
for character in list(string):
    if character in dictionary:
        dictionary[character] += 1
    else:
        dictionary[character] = 1
results = []
for key, value in dictionary.items():
    results.append((value, key))
for value, key in reversed(sorted(results)):
    print("Character", key, "occurred", value, "times")
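For comparison, the defaultdict-and-comprehension variant mentioned in that answer might look roughly like the sketch below. The function name most_frequent_chars and its return shape are assumptions made for illustration, not something from the answer.

```python
from collections import defaultdict

def most_frequent_chars(text):
    """Return (characters tied for the highest count, that count)."""
    counts = defaultdict(int)  # a missing key starts at 0
    for ch in text:
        counts[ch] += 1
    if not counts:
        return [], 0
    highest = max(counts.values())
    # comprehension keeps every character that ties for the top count
    winners = sorted(ch for ch, n in counts.items() if n == highest)
    return winners, highest

winners, highest = most_frequent_chars("a11aawww1cccertgft1tzzzzzz1ggg111")
for ch in winners:
    print("Character", ch, "occurred", highest, "times")
```

Returning all tied characters keeps the behaviour of the original post, which also reports every character that shares the maximum count.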
# Most frequent Character
def most_frequent(a_string):
    # Create a string of symbols to exclude from counting.
    symbols = ' ,.-/?'
    characters = []
    characters_count = []
    # Check each individual character in the string.
    for ch in a_string:
        # Check that the character is not one of the symbols.
        if ch not in symbols:
            # If it's not and we haven't seen it already,
            # append it to the characters list.
            if ch not in characters:
                characters.append(ch)
                # And in the same index in the characters_count list
                characters_count.append(1)
            else:
                # If it is in the characters list, find its index
                # and add 1 to the same index at characters_count
                position = characters.index(ch)
                characters_count[position] = characters_count[position] + 1
    # Find the largest value in the characters_count list, its index,
    # and show the character at the same index in the characters list.
    print(characters[characters_count.index(max(characters_count))])
def main():
    # Get a string from the user.
    text = input('Give me some text and I will find you the most frequent character: ')
    most_frequent(text)
# Call main
main()

Small issue with Palindrome program

I've been working on this Palindrome program and am really close to completing it. Close to the point that it's driving me a bit crazy, haha.
The program is supposed to check each 'phrase' to determine if it is a Palindrome or not and return a lowercase version with white space and punctuation removed if it is in fact a Palindrome. Otherwise, if not, it's supposed to return None.
I'm just having an issue with bringing my test data into the function. I can't seem to think of the correct way of dealing with it. It's probably pretty simple... Any ideas?
Thanks!
import string
def reverse(word):
    newword = ''
    letterflag = -1
    for numoletter in word:
        newword += word[letterflag]
        letterflag -= 1
    return newword
def Palindromize(phrase):
    for punct in string.punctuation:
        phrase= phrase.replace(punct,'')
        phrase = str(phrase.lower())
    firstindex = 0
    secondindex = len(phrase) - 1
    flag = 0
    while firstindex != secondindex and firstindex < secondindex:
        char1 = phrase[firstindex]
        char2 = phrase[secondindex]
        if char1 == char2:
            flag += 1
        else:
            break
        firstindex += 1
        secondindex -= 1
    if flag == len(phrase) // (2):
        print phrase.strip()
    else:
        print None
def Main():
    data = ['Murder for a jar of red rum',12321, 'nope', 'abcbA', 3443, 'what',
            'Never odd or even', 'Rats live on no evil star']
    for word in data:
        word == word.split()
        Palindromize(word)
if __name__ == '__main__':
    Main()
Maybe this line is causing the problems.
for word in data:
    word == word.split() # This line.
    Palindromize(word)
You're testing for equality here, rather than reassigning the variable word, which can be done using word = word.split(). word then becomes a list, and you might want to iterate over the list using
for elem in word:
    Palindromize(elem)
Also, you seem to be calling the split method on ints, which is not possible; try converting them to strings first.
Also, why do you convert the phrase to lower case inside the for loop? Doing it once will suffice.
At the "core" of your program, you could do much better in Python, using filter for example. Here is a quick demonstration (in Python 2, where filter on a str returns a str; in Python 3 filter returns an iterator, so wrap the call in ''.join(...)):
>>> phrase = 'Murder for a jar of red rum!'
>>> normalized = filter(str.isalnum, phrase.lower())
>>> normalized
'murderforajarofredrum'
>>> reversed = normalized[-1::-1]
>>> reversed
'murderforajarofredrum'
# Test if it is a palindrome
>>> reversed == normalized
True
Before you go bananas, let's rethink the problem:
You have already pointed out that Palindromes only make sense in strings without punctuation, whitespace, or mixed case. Thus, you need to convert your input string, either by removing the unwanted characters or by picking the allowed ones. For the latter, one can imagine:
import string
clean_data = [ch for ch in original_data if ch in string.ascii_letters]
clean_data = ''.join(clean_data).lower()
Having the cleaned version of the input, one might consider the third parameter in slicing of strings, particularly when it's -1 ;)
Does a comparison like
if clean_data[::-1] == clean_data:
    ....
ring a bell?
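Putting those hints together for Python 3, a minimal sketch could look like the following. The lowercase name palindromize is hypothetical, the function returns its result instead of printing it, and it uses str.isalnum rather than string.ascii_letters so that the numeric test items are kept; that choice is an assumption about what the exercise expects.

```python
def palindromize(phrase):
    """Return the cleaned phrase if it is a palindrome, otherwise None."""
    text = str(phrase)  # the test data mixes ints and strings
    # keep only letters and digits, lowercased once
    cleaned = ''.join(ch for ch in text.lower() if ch.isalnum())
    if cleaned == cleaned[::-1]:
        return cleaned
    return None

data = ['Murder for a jar of red rum', 12321, 'nope', 'abcbA', 3443, 'what',
        'Never odd or even', 'Rats live on no evil star']
for item in data:
    print(palindromize(item))
```

Each item is passed to the function whole, so no split() call is needed in the loop.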
One of the primary errors that I spotted is here:
for word in data:
    word==word.split()
Here, there are two mistakes:
1. The double equals sign is a comparison, so this line has no effect.
2. If you wish to split the contents of each element of data, doing it this way doesn't change the original list, since you are only working with the loop variable word. To update the list itself, do:
for i in range(len(data)):
    data[i]=str(data[i]).split()  # str() because data also holds ints
This may clear your errors.

Count letters in a word in python debug

I am trying to count the number of times 'e' appears in a word.
def has_no_e(word): #counts 'e's in a word
    letters = len(word)
    count = 0
    while letters >= 0:
        if word[letters-1] == 'e':
            count = count + 1
        letters = letters - 1
    print count
It seems to work fine except when the word ends with an 'e'. It will count that 'e' twice. I have no idea why. Any help?
I know my code may be sloppy, I'm a beginner! I'm just trying to figure out the logic behind what's happening.
>>> word = 'eeeooooohoooooeee'
>>> word.count('e')
6
Why not this?
As others mention, you can implement the test with a simple word.count('e'). Unless you're doing this as a simple exercise, this is far better than trying to reinvent the wheel.
The problem with your code is that it counts the last character twice because you are testing index -1 at the end, which in Python returns the last character in the string. Fix it by changing while letters >= 0 to while letters > 0.
There are other ways you can tidy up your code (assuming this is an exercise in learning):
Python provides a nice way of iterating over a string using a for loop. This is far more concise and easier to read than using a while loop and maintaining your own counter variable. As you've already seen here, adding complexity results in bugs. Keep it simple.
Most languages provide a += operator, which for integers adds the amount to a variable. It's more concise than count = count + 1.
Use a parameter to define which character you're counting to make it more flexible. When there is an obvious default, define a default argument with char='e' in the parameter list.
Choose a more appropriate name for the function. The name has_no_e() makes the reader think the code checks whether the word has no e, but what it actually does is count the occurrences of e.
Putting this all together we get:
def count_letter(word, char='e'):
    count = 0
    for c in word:
        if c == char:
            count += 1
    return count
Some tests:
>>> count_letter('tee')
2
>>> count_letter('tee', 't')
1
>>> count_letter('tee', 'f')
0
>>> count_letter('wh' + 'e'*100)
100
Why not simply
def has_no_e(word):
    return sum(1 for letter in word if letter=="e")
The problem is that the last value of 'letters' in your iteration is '0', and when this happens you look at:
word[letters-1]
meaning you look at word[-1], which in Python means "last letter of the word".
So you're actually counting correctly, and adding a "bonus" one if the last letter is 'e'.
It will count it twice when ending with an e because you decrement letters one time too many (because you loop while letters >= 0 and you should be looping while letters > 0). When letters reaches zero you check word[letters-1] == word[-1] which corresponds to the last character in the word.
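To make the off-by-one concrete, here is a sketch that puts the original loop condition next to the corrected one. The names count_e_buggy and count_e_fixed are hypothetical, and both return the count instead of printing it.

```python
def count_e_buggy(word):
    letters = len(word)
    count = 0
    while letters >= 0:  # runs one extra time, with letters == 0
        # when letters == 0 this reads word[-1], the last character again
        if word[letters - 1] == 'e':
            count += 1
        letters -= 1
    return count

def count_e_fixed(word):
    letters = len(word)
    count = 0
    while letters > 0:  # stops after index 0 has been checked
        if word[letters - 1] == 'e':
            count += 1
        letters -= 1
    return count

print(count_e_buggy('tease'), count_e_fixed('tease'))  # 3 2
print(count_e_buggy('test'), count_e_fixed('test'))    # 1 1
```

The two versions only disagree when the word ends in 'e', which matches the symptom described in the question.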
Many of these suggested solutions will work fine.
Know that, in Python, list[-1] will return the last element of the list.
So, in your original code, when you were referencing word[letters-1] in a while loop constrained by letters >= 0, you would count the 'e' on the end of the word twice (once when letters-1 was the last index and a second time when letters was 0).
For example, if my word was "Pete", your code trace would look like this (if you printed out word[letters-1] on each loop):
e (for word[3])
t (for word[2])
e (for word[1])
P (for word[0])
e (for word[-1])
Hope this helps to clear things up and to reveal an interesting little quirk about Python.
@marcog makes some excellent points;
in the meantime, you can do simple debugging by inserting print statements -
def has_no_e(word):
    letters = len(word)
    count = 0
    while letters >= 0:
        ch = word[letters-1] # what is it looking at?
        if ch == 'e':
            count = count + 1
            print('{0} <-'.format(ch))
        else:
            print('{0}'.format(ch))
        letters = letters - 1
    print count
then
has_no_e('tease')
returns
e <-
s
a
e <-
t
e <-
3
from which you can see that
you are going through the string in reverse order
it is correctly recognizing e's
you are 'wrapping around' to the end of the string - hence the extra e if your string ends in one
If what you really want is 'has_no_e' then the following may be more appropriate than counting 'e's and then later checking for zero,
def has_no_e(word):
    return 'e' not in word
>>> has_no_e('Adrian')
True
>>> has_no_e('test')
False
>>> has_no_e('NYSE')
True
If you want to check there are no 'E's either,
def has_no_e(word):
    return 'e' not in word.lower()
>>> has_no_e('NYSE')
False
You don't have to use a while-loop. Strings can be used in for-loops in Python.
def has_no_e(word):
    count = 0
    for letter in word:
        if letter == "e":
            count += 1
    print count
or something simpler:
def has_no_e(word):
    return sum(1 for letter in word if letter=="e")
