This might be an easy one, but I can't spot where I am making the mistake.
I wrote a simple program to read words from a wordfile (don't have to be dictionary words), sum the characters and print them out from lowest to highest. (PART1)
Then I wrote a small script after this program to keep only those words that contain nothing but alphabetic characters. (PART2)
While the first part works correctly, the second part prints nothing. I think the error is at the line 'print ch', where a character of a list converted to a string is not being printed. Please advise what the error could be.
#!/usr/bin/python
# compares two words and checks if word1 has smaller sum of chars than word2
def cmp_words(word_with_sum1,word_with_sum2):
    (word1_sum,__)=word_with_sum1
    (word2_sum,__)=word_with_sum2
    return word1_sum.__cmp__(word2_sum)
# PART1
word_data=[]
with open('smalllist.txt') as f:
    for l in f:
        word=l.strip()
        word_sum=sum(map(ord,(list(word))))
        word_data.append((word_sum,word))
word_data.sort(cmp_words)
for index,each_word_data in enumerate(word_data):
    (word_sum,word)=each_word_data
#PART2
# we only display words that contain alphabetic characters and numbers
valid_characters=[chr(ord('A')+x) for x in range(0,26)] + [x for x in range(0,10)]
# returns true if only alphabetic characters found
def only_alphabetic(word_with_sum):
    (__,single_word)=word_with_sum
    map(single_word.charAt,range(0,len(single_word)))
    for ch in list(single_word):
        print ch # problem might be in this loop -- can't see ch
        if not ch in valid_characters:
            return False
    return True
valid_words=filter(only_alphabetic,word_data)
for w in valid_words:
    print w
Thanks in advance,
John
The problem is that charAt does not exist in Python.
You can iterate over the string directly: 'for ch in my_word'.
Notes:
You can use the built-in str.isalnum() for your test.
valid_characters contains only the uppercase version of the alphabet.
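To illustrate the notes above, here is a minimal Python 3 sketch of PART2 built on str.isalnum(). The name only_alphanumeric and the sample words are made up for the example; the real data would come from smalllist.txt. Note that the valid_characters list in the question also stores the digits as integers (0 to 9) rather than as the strings '0' to '9', so a digit character read from the file would never match it. Also be aware that isalnum() accepts non-ASCII letters and digits too, which may or may not be what you want.

```python
def only_alphanumeric(word_with_sum):
    """Return True if the word in a (sum, word) pair is non-empty and alphanumeric."""
    _, single_word = word_with_sum
    return single_word.isalnum()

# Hypothetical sample data standing in for the contents of smalllist.txt.
sample_words = ["abc", "Hello42", "don't", "x-ray"]

# PART1: pair each word with the sum of its character codes, sorted by that sum.
word_data = sorted((sum(map(ord, w)), w) for w in sample_words)

# PART2: keep only the pairs whose word is purely alphanumeric.
valid_words = [pair for pair in word_data if only_alphanumeric(pair)]
for word_sum, word in valid_words:
    print(word_sum, word)
```

Sorting tuples compares the sums first, so no custom comparison function is needed here.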
I'm trying to have the user input a string of characters with one asterisk. The asterisk indicates a character that can be subbed out for a vowel (a,e,i,o,u) in order to see what substitutions produce valid words.
Essentially, I want to take an input "l*g" and have it return "lag, leg, log, lug" because "lig" is not a valid English word. Below I have invalid words to be represented as "x".
I've gotten it to properly output each possible combination (e.g., including "lig"), but once I try to compare these words with the text file I'm referencing (for the list of valid words), it'll only return 5 lines of x's. I'm guessing it's that I'm improperly importing or reading the file?
Here's the link to the file I'm looking at so you can see the formatting:
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/words.zip
Using the "en" file ~2.5MB
It's not in a dictionary layout i.e. no corresponding keys/values, just lines (maybe I could use the line number as the index, but I don't know how to do that). What can I change to check the test words to narrow down which are valid words based on the text file?
import os
with open(os.path.expanduser('~/Downloads/words/en')) as f:
    words = f.readlines()
inputted_word = input("Enter a word with ' * ' as the missing letter: ")
letters = []
for l in inputted_word:
    letters.append(l)
### find the index of the blank
asterisk = inputted_word.index('*') # also used a redundant int(), works fine
### sub in vowels
vowels = ['a','e','i','o','u']
list_of_new_words = []
for v in vowels:
    letters[asterisk] = v
    new_word = ''.join(letters)
    list_of_new_words.append(new_word)
for w in list_of_new_words:
    if w in words:
        print(new_word)
    else:
        print('x')
There are probably more efficient ways to do this, but I'm brand new to this. The last two for loops could probably be combined but debugging it was tougher that way.
print(list_of_new_words)
gives
['lag', 'leg', 'lig', 'log', 'lug']
So far, so good.
But this:
for w in list_of_new_words:
    if w in words:
        print(new_word)
    else:
        print('x')
Here you print new_word, which is defined in the previous for loop:
for v in vowels:
    letters[asterisk] = v
    new_word = ''.join(letters) # <----
    list_of_new_words.append(new_word)
So after the loop, new_word still holds the last value assigned to it: "lug" (if the script input was l*g).
You probably meant w instead?
for w in list_of_new_words:
    if w in words:
        print(w)
    else:
        print('x')
But it still prints 5 x's...
So that means that w in words is always False. Why is that?
Looking at words:
print(words[0:10]) # the first 10 will suffice
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']
All the words from the dictionary have a newline character (\n) at the end. I guess you were not aware that this is what readlines() does: it keeps the line ending on every line. So I recommend using:
words = f.read().splitlines()
instead.
With these 2 modifications (w and splitlines):
Enter a word with ' * ' as the missing letter: l*g
lag
leg
x
log
lug
🎉
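Putting the two fixes together, here is a small self-contained sketch. The helper name fill_blank and the tiny in-memory word list are made up for the example; in the real script the words would come from f.read().splitlines() on the "en" file. Storing the words in a set is my own suggestion rather than part of the answer above, since membership tests on a set stay fast even for a large word list.

```python
VOWELS = "aeiou"

def fill_blank(pattern, valid_words):
    """Return (candidate, is_valid) pairs, one per vowel substituted for '*'."""
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    results = []
    for vowel in VOWELS:
        candidate = pattern.replace("*", vowel)
        results.append((candidate, candidate in valid_words))
    return results

# Hypothetical stand-in for the word file. splitlines() drops the trailing
# newlines, which is what makes the membership test succeed.
raw_text = "lag\nleg\nlog\nlug\n"
valid_words = set(raw_text.splitlines())

for candidate, ok in fill_blank("l*g", valid_words):
    print(candidate if ok else "x")
```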
I've got the text file for the Dracula novel and I want to count the number of lower case letters contained within it. The code I've got executes without a problem but prints out 4297. I'm not sure where I went wrong and hoped you guys could point out my issue here. Thank you!
Indentation isn't necessarily reflective of what I see on my text editor
def main():
    book_file = open('dracula.txt', 'r')
    lower_case = sum(map(str.islower, book_file))
    print (lower_case)
    book_file.close()
main()
expected: 621607
results: 4297
When you iterate over a file, you get a line as a value on each iteration. Your current code would be correct if it was running on characters, not lines. When you call islower on a longer string (like a line from a book), it only returns True if all the letters in the string are lowercase.
In your copy of Dracula, there are apparently 4297 lines that contain no capital letters, so that's the result you're getting. The much larger number is the count of characters.
You can fix your code by adding an extra step that reads the file as a single large string, then iterating over that.
def main():
    with open('dracula.txt', 'r') as book_file:
        text = book_file.read()
    lower_case = sum(map(str.islower, text))
    print(lower_case)
I also modified your code slightly by using a with statement to handle closing the file. This is nice because it will always close the file when it exits the indented block, even if something has gone wrong and an exception has been raised.
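To make the difference between lines and characters concrete, here is a tiny sketch that runs on an in-memory string instead of dracula.txt. The sample text is made up; splitlines(keepends=True) is used only to mimic what iterating over a file object yields.

```python
sample = "Jonathan Harker's Journal\nleft munich at night\n"

# Iterating a file gives one string per line, newline included.
lines = sample.splitlines(keepends=True)

# str.islower on a whole line is True only if every cased character in it
# is lowercase, so this counts lines, not letters.
lowercase_lines = sum(map(str.islower, lines))

# Iterating the string itself gives single characters, so this counts letters.
lowercase_chars = sum(map(str.islower, sample))

print(lowercase_lines)
print(lowercase_chars)
```

Only the second line of the sample has no capital letters, so the first count is 1, while the second count covers every lowercase letter in both lines.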
You can use regex to count the lower-case and upper-case characters
import re
text = "sdfsdfdTTsdHSksdsklUHD"
lowercase = len(re.findall("[a-z]", text))
uppercase = len(re.findall("[A-Z]", text))
print(lowercase)
print(uppercase)
Outputs:
15
7
And you will need to change how you read the file to
text = open("dracula.txt").read()
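One caveat worth checking on your own text: the character class [a-z] only matches the 26 unaccented ASCII letters, whereas str.islower() also counts lowercase letters such as 'é'. Whether that matters depends on the file, and I am only assuming the book might contain such characters. A short sketch of the difference, using a made-up sample string:

```python
import re

# "café", written with an escape so the accented letter is a single code point.
text = "caf\u00e9"

ascii_lower = len(re.findall("[a-z]", text))            # ASCII letters only
unicode_lower = sum(1 for ch in text if ch.islower())   # any lowercase letter

print(ascii_lower, unicode_lower)
```

If the two approaches give different totals on the same file, this is a likely reason.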
with open('dracula.txt', 'r') as book_file:
    count = 0
    # for each line in the file, count the number of lower case
    # letters and add it to the variable "count"
    for line in book_file:
        count += sum(map(str.islower, line))
print("number of lower case letters = " + str(count))
Here is a version that uses a list comprehension rather than map()
It iterates over the characters in the text and creates a list of all lowercase characters. The length of this list is the number of lowercase letters in the text.
with open('dracula.txt') as f:
    text = f.read()
lowers = [char for char in text if char.islower()]
print(len(lowers))
I have this code:
print('abcdefg')
input('Arrange word from following letters: ')
I want to return True if the input consists of letters from the printed string but it doesn't have to have all of printed letters.
That's a perfect use case for sets, especially for set.issubset:
print('abcdefg')
given_input = input('Arrange word from following letters: ')
if set(given_input).issubset('abcdefg'):
    print('True')
else:
    print('False')
or directly print (or return) the result of the issubset operation without if and else:
print(set(given_input).issubset('abcdefg'))
This sounds a little like homework...
Basically you would need to do this: Store both strings in variables. e.g. valid_chars and s.
Then loop through s one character at a time. For each character check if it is in valid_chars (using the in operator). If any character is not found in valid_chars then you should return False. If you get to the end of the loop, return True.
If the valid_chars string is very long it would be better to first put them into a set but for short strings this is not necessary.
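In case it helps, the loop described above could look roughly like this. The function name uses_only is made up; valid_chars and s are the names from the description. Converting valid_chars to a set is the optional step mentioned for long strings.

```python
def uses_only(s, valid_chars):
    """Return True if every character of s appears in valid_chars."""
    allowed = set(valid_chars)  # optional for short strings, faster for long ones
    for ch in s:
        if ch not in allowed:
            return False
    return True

print(uses_only("cab", "abcdefg"))
print(uses_only("cat", "abcdefg"))
```

Note that an empty input returns True with this approach, and a letter may be used more often than it was printed. Add extra checks if either of those should count as invalid.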
I am trying to import the alphabet and split it so that each character is a separate element of a list rather than part of one string. Splitting it works, but when I try to use the list to find how many characters are in an inputted word, I get the error 'TypeError: Can't convert 'list' object to str implicitly'. Does anyone know how I could go about solving this? Any help appreciated. The code is below.
import string
alphabet = string.ascii_letters
print (alphabet)
splitalphabet = list(alphabet)
print (splitalphabet)
x = 1
j = year3wordlist[x].find(splitalphabet)
k = year3studentwordlist[x].find(splitalphabet)
print (j)
EDIT: Sorry, my explanation is kinda bad, I was in a rush. What I am wanting to do is count each individual letter of a word because I am coding a spelling bee program. For example, if the correct word is 'because', and the user who is taking part in the spelling bee has entered 'becuase', I want the program to count the characters and location of the characters of the correct word AND the user's inputted word and compare them to give the student a mark - possibly by using some kind of point system. The problem I have is that I can't simply say if it is right or wrong, I have to award 1 mark if the word is close to being right, which is what I am trying to do. What I have tried to do in the code above is split the alphabet and then use this to try and find which characters have been used in the inputted word (the one in year3studentwordlist) versus the correct word (year3wordlist).
There is a much simpler solution if you use the in keyword. You don't even need to split the alphabet in order to check if a given character is in it:
import string
year3wordlist = ['asdf123', 'dsfgsdfg435']
total_sum = 0
for word in year3wordlist:
    word_sum = 0
    for char in word:
        if char in string.ascii_letters:
            word_sum += 1
    total_sum += word_sum
# Number of characters that are ASCII letters:
# total_sum == 12
# Number of characters of any kind in all words:
# sum([len(w) for w in year3wordlist]) == 18
EDIT:
Since the OP comments he is trying to create a spelling bee contest, let me try to answer more specifically. The distance between a correctly spelled word and a similar string can be measured in many different ways. One of the most common ways is called 'edit distance' or 'Levenshtein distance'. This represents the number of insertions, deletions or substitutions that would be needed to rewrite the input string into the 'correct' one.
You can find that distance implemented in the Python-Levenshtein package. You can install it via pip:
$ sudo pip install python-Levenshtein
And then use it like this:
from __future__ import division
import Levenshtein
correct = 'because'
student = 'becuase'
distance = Levenshtein.distance(correct, student) # distance == 2
mark = ( 1 - distance / len(correct)) * 10 # mark == 7.14
The last line is just a suggestion on how you could derive a grade from the distance between the student's input and the correct answer.
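If installing a package is not an option, the same distance can be computed in plain Python. The following is a sketch of the standard dynamic-programming approach, not the code used by the python-Levenshtein package; the function name levenshtein is my own.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

correct = "because"
student = "becuase"
distance = levenshtein(correct, student)
mark = (1 - distance / len(correct)) * 10
print(distance, round(mark, 2))
```

With this marking formula the result can drop below zero when the attempt is much longer than the correct word, so you may want to clamp it with max(0, mark).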
I think what you need is join:
>>> "".join(splitalphabet)
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
join is a method of str, so you can call it on the separator string or through the class itself:
''.join(splitalphabet)
or
str.join('', splitalphabet)
To convert the list splitalphabet to a string, so you can use it with the find() function you can use separator.join(iterable):
"".join(splitalphabet)
Using it in your code:
j = year3wordlist[x].find("".join(splitalphabet))
I don't know why half the answers are telling you how to put the split alphabet back together...
To count the number of characters in a word that appear in the splitalphabet, do it the functional way:
count = len([c for c in word if c in splitalphabet])
import string
# making letters a set makes "ch in letters" very fast
letters = set(string.ascii_letters)
def letters_in_word(word):
return sum(ch in letters for ch in word)
Edit: it sounds like you should look at Levenshtein edit distance:
from Levenshtein import distance
distance("because", "becuase") # => 2
While join creates the string from the split list, you do not have to do that, as you can call find with the original string (alphabet). However, I do not think that is what you are trying to do. Note that the find you are attempting looks for splitalphabet (actually alphabet) as a whole within year3wordlist[x], which will always fail (a result of -1).
If what you are trying to do is get the indices within the alphabet of all the letters of a word from the word list, then you need to determine, for each letter in the word, its index within alphabet:
j = []
for c in word:
    j.append(alphabet.find(c))
print(j)
On the other hand if you are attempting to find the index of each character within the alphabet within the word, then you need to loop over splitalphabet to get an individual character to find within the word. That is
l = []
for c in splitalphabet:
    j = word.find(c)
    if j != -1:
        l.append((c, j))
print(l)
This gives the list of tuples showing those characters found and the index.
I just saw that you talk about counting the number of letters. I am not sure what you mean by this as len(word) gives the number of characters in each word while len(set(word)) gives the number of unique characters. On the other hand, are you saying that your word might have non-ascii characters in it and you want to count the number of ascii characters in that word? I think that you need to be more specific in what you want to determine.
If what you are doing is attempting to determine whether the characters are all alphabetic, then all you need to do is use the isalpha() method on the word. You can either call word.isalpha() and get True or False for the whole word, or check each character of word with isalpha().
I'm writing a function that will take a word as a parameter and will look at each character and if there is a number in the word, it will return the word
This is my string that I will iterate through
'Let us look at pg11.'
and I want to look at each character in each word and if there is a digit in the word, I want to return the word just the way it is.
import string
def containsDigit(word):
for ch in word:
if ch == string.digits
return word
if any(ch.isdigit() for ch in word):
    print word, 'contains a digit'
To make your code work use the in keyword (which will check if an item is in a sequence), add a colon after your if statement, and indent your return statement.
import string
def containsDigit(word):
    for ch in word:
        if ch in string.digits:
            return word
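As a rough sketch of how this could be applied to the sentence from the question (Python 3 syntax): split it into words and keep the ones for which the function returns something. The snake_case name contains_digit and the explicit return None are my own choices, and splitting on whitespace is an assumption about what counts as a word, so the trailing period stays attached.

```python
import string

def contains_digit(word):
    """Return the word unchanged if it contains a digit, otherwise None."""
    for ch in word:
        if ch in string.digits:
            return word
    return None

sentence = 'Let us look at pg11.'
words_with_digits = [w for w in sentence.split() if contains_digit(w) is not None]
print(words_with_digits)
```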
Why not use Regex?
>>> import re
>>> word = "super1"
>>> if re.search(r"\d", word):
...     print("y")
...
y
>>>
So, in your function, just do:
import re
def containsDigit(word):
    if re.search(r"\d", word):
        return word
print(containsDigit("super1"))
output:
super1
You are missing a colon:
for ch in word:
    if ch.isdigit(): #<-- you are missing this colon
        print "%s contains a digit" % word
        return word
Often when you want to know if "something" contains "something_else", sets may be useful.
digits = set('0123456789')
def containsDigit(word):
    if set(word) & digits:
        return word
print containsDigit('hello')
If you desperately want to use the string module, here is the code:
import string
def search(raw_string):
    for raw_array in string.digits:
        for listed_digits in raw_array:
            if listed_digits in raw_string:
                return True
    return False
If I run it in the shell, I get the wanted results (True if the string contains a digit, False if not).
>>> search("Give me 2 eggs")
True
>>> search("Sorry, I don't have any eggs.")
False
Code Breakdown
This is how the code works:
string.digits is the string '0123456789'. Looping over a string yields its characters one at a time, so the outer loop gives us each digit as a one-character string. The inner loop then iterates over that one-character string, which simply yields the same character again. For each digit, we check whether it occurs anywhere in the given string and return True as soon as one does. The body of the loop runs once for every value of the loop variable: first with '0', then with '1', and so on, so on the sixth pass it runs with '5'. If the loops reach the end of string.digits without finding a match, the function returns False.
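Since the inner loop only ever sees one character, a single pass over string.digits is enough. Here is a shorter sketch of the same check using any(), which I believe behaves the same as the function above for ordinary strings:

```python
import string

def search(raw_string):
    """Return True if raw_string contains at least one of the digits 0-9."""
    return any(digit in raw_string for digit in string.digits)

print(search("Give me 2 eggs"))
print(search("Sorry, I don't have any eggs."))
```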