Can't convert 'list' object to str implicitly Python - python

I am trying to import the alphabet and split it so that each character is a separate element of a list rather than one string. Splitting it works, but when I try to use the list to find how many characters are in an inputted word I get the error 'TypeError: Can't convert 'list' object to str implicitly'. Does anyone know how I would go about solving this? Any help appreciated. The code is below.
import string
alphabet = string.ascii_letters
print (alphabet)
splitalphabet = list(alphabet)
print (splitalphabet)
x = 1
j = year3wordlist[x].find(splitalphabet)
k = year3studentwordlist[x].find(splitalphabet)
print (j)
EDIT: Sorry, my explanation is kinda bad, I was in a rush. What I am wanting to do is count each individual letter of a word because I am coding a spelling bee program. For example, if the correct word is 'because', and the user who is taking part in the spelling bee has entered 'becuase', I want the program to count the characters and location of the characters of the correct word AND the user's inputted word and compare them to give the student a mark - possibly by using some kind of point system. The problem I have is that I can't simply say if it is right or wrong, I have to award 1 mark if the word is close to being right, which is what I am trying to do. What I have tried to do in the code above is split the alphabet and then use this to try and find which characters have been used in the inputted word (the one in year3studentwordlist) versus the correct word (year3wordlist).

There is a much simpler solution if you use the in keyword. You don't even need to split the alphabet in order to check if a given character is in it:
import string
year3wordlist = ['asdf123', 'dsfgsdfg435']
total_sum = 0
for word in year3wordlist:
    word_sum = 0
    for char in word:
        # count only characters that are ASCII letters
        if char in string.ascii_letters:
            word_sum += 1
    total_sum += word_sum
# Number of characters that are in the ascii letters alphabet:
# total_sum == 12
# Length of all characters in all words:
# sum([len(w) for w in year3wordlist]) == 18
EDIT:
Since the OP comments he is trying to create a spelling bee contest, let me try to answer more specifically. The distance between a correctly spelled word and a similar string can be measured in many different ways. One of the most common ways is called 'edit distance' or 'Levenshtein distance'. This represents the number of insertions, deletions or substitutions that would be needed to rewrite the input string into the 'correct' one.
You can find that distance implemented in the Python-Levenshtein package. You can install it via pip:
$ sudo pip install python-Levenshtein
And then use it like this:
from __future__ import division
import Levenshtein
correct = 'because'
student = 'becuase'
distance = Levenshtein.distance(correct, student) # distance == 2
mark = (1 - distance / len(correct)) * 10 # mark is about 7.14
The last line is just a suggestion on how you could derive a grade from the distance between the student's input and the correct answer.
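If installing a third-party package is not an option, the same distance can be computed with a short dynamic-programming routine. The following is only a sketch: levenshtein and spelling_mark are names made up for this illustration, and the marking thresholds (2 marks for an exact match, 1 mark for a near miss) are assumptions, not something the question specifies.

```python
def levenshtein(a, b):
    """Return the number of insertions, deletions or substitutions
    needed to turn string a into string b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def spelling_mark(correct, student, close_enough=2):
    # Hypothetical scheme: 2 marks if exact, 1 mark if within
    # close_enough edits of the correct word, otherwise 0
    distance = levenshtein(correct, student)
    if distance == 0:
        return 2
    return 1 if distance <= close_enough else 0


print(levenshtein('because', 'becuase'))
print(spelling_mark('because', 'becuase'))
```

Swapping two adjacent letters counts as two substitutions here, which is why 'becuase' is at distance 2 from 'because'.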

I think what you need is join:
>>> "".join(splitalphabet)
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

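join is a method of str that is called on the separator string; you can also call it through the class and pass the separator as the first argument. So you can do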
''.join(splitalphabet)
or
str.join('', splitalphabet)

To convert the list splitalphabet to a string, so you can use it with the find() function you can use separator.join(iterable):
"".join(splitalphabet)
Using it in your code:
j = year3wordlist[x].find("".join(splitalphabet))

I don't know why half the answers are telling you how to put the split alphabet back together...
To count the number of characters in a word that appear in the splitalphabet, do it the functional way:
count = len([c for c in word if c in splitalphabet])

import string
# making letters a set makes "ch in letters" very fast
letters = set(string.ascii_letters)
def letters_in_word(word):
    return sum(ch in letters for ch in word)
Edit: it sounds like you should look at Levenshtein edit distance:
from Levenshtein import distance
distance("because", "becuase") # => 2

While join creates the string from the split, you would not have to do that, as you can call find with the original string (alphabet). However, I do not think that is what you are trying to do. Note that the find you are attempting looks for splitalphabet (in effect the whole alphabet) within year3wordlist[x], which will fail (a -1 result) unless the word contains the entire alphabet as a substring.
If what you are trying to do is to get the indices of all the letters of the word list within the alphabet, then you would need to handle it as
for each letter in the word of the word list, determine the index within alphabet.
j = []
for c in word:
    j.append(alphabet.find(c))
print(j)
On the other hand if you are attempting to find the index of each character within the alphabet within the word, then you need to loop over splitalphabet to get an individual character to find within the word. That is
l = []
for c in splitalphabet:
    j = word.find(c)
    if j != -1:
        l.append((c, j))
print(l)
This gives the list of tuples showing those characters found and the index.
I just saw that you talk about counting the number of letters. I am not sure what you mean by this as len(word) gives the number of characters in each word while len(set(word)) gives the number of unique characters. On the other hand, are you saying that your word might have non-ascii characters in it and you want to count the number of ascii characters in that word? I think that you need to be more specific in what you want to determine.
If what you are doing is attempting to determine if the characters are all alphabetic, then all you need to do is use the isalpha() method on the word. You can either say word.isalpha() and get True or False or check each character of word to be isalpha()

Related

average character in a given word

I'm new to python and I am trying to write a program that will calculate the average character in a given word.
This is the assignment: Write a program that accepts a word (a sequence of letters) and return the average character.
And the program should flow in this order:
Turn the string into a list of characters.
Use ord() function to convert from character to number.
Calculate the average value of the numbers.
Use chr() to convert from number back to character.
I think I am supposed to take an input and then find the integer values using ord() function, and then find the average of those integers.
This is what I have written so far:
word = input('Enter a word here:')
list(word)
print(list(word))
for char in range(len(word)):
    print(ord(word[char]))
The program can turn the word into a list of characters and also display those integer values. But now I'm lost and not sure where to go to find the average of these numbers. Can someone point me in the right direction, like maybe a website that explains how to do it or something like that?
What I understood is something like this:
word = input('Enter a word here:')
sum_word = 0
for character in word:
    sum_word += ord(character)
average = chr(sum_word//len(word))
print(average)
Just got a doubt about average, but let me know if it isn't right and I'll change it
The Pythonic way is to use either a comprehension (here, a generator expression) or mapping.
word = 'hello, world'
With mapping, you tell function map to apply ord to each character:
chr(sum(map(ord, word)) // len(word))
#'`'
With a generator expression, you apply the function yourself:
chr(sum(ord(x) for x in word) // len(word))
#'`'
In my timing the mapping solution was about twice as fast as the comprehension, though for short words the difference hardly matters.
I think this is what you want:
word = 'hello'
chars = [ord(c) for c in word]
avg = sum(chars)//len(chars)
avg_char = chr(avg)
For 'hello', the average character is 'j'.
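The answers above can be wrapped in a small reusable function. This is just a sketch: average_char is a name chosen for this example, and raising an error on empty input is an assumption about what the assignment would want, since dividing by a length of zero would fail anyway.

```python
def average_char(word):
    """Return the character whose code point is the (floored) mean
    of the code points in word."""
    if not word:
        # an empty word has no average; fail with a clear message
        raise ValueError("word must not be empty")
    codes = [ord(c) for c in word]          # character -> number
    return chr(sum(codes) // len(codes))    # number -> character


print(average_char('hello'))
```

Integer division (//) is used because chr() needs a whole number.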

Printing a String in Reverse After Extracting [duplicate]

I am trying to create a program in which the user inputs a statement containing two '!' surrounding a string. (example: hello all! this is a test! bye.) I am to grab the string within the two exclamation points, and print it in reverse letter by letter. I have been able to find the start and endpoints that contain the statement, however I am having difficulties creating an index that would cycle through my variable userstring in reverse and print.
test = input('Enter a string with two "!" surrounding portion of the string:')
expoint = test.find('!')
#print (expoint)
twoexpoint = test.find('!', expoint+1)
#print (twoexpoint)
userstring = test[expoint+1 : twoexpoint]
#print(userstring)
number = 0
while number < len(userstring) :
    letter = [twoexpoint - 1]
    print (letter)
    number += 1
twoexpoint - 1 is the last index of the string you need relative to the input string. So what you need is to start from that index and reduce. In your while loop:
letter = test[twoexpoint- number - 1]
Each iteration you increase number which will reduce the index and reverse the string.
But this way you don't actually use the userstring you already found (except for the length...). Instead of caring for indexes, just reverse the userstring:
for letter in userstring[::-1]:
    print(letter)
Explanation: we use a regex to find the pattern, then we loop over every occurrence and print it reversed. We can reverse a string in Python with mystring[::-1] (this works for lists too).
The Python re documentation is very useful, and you will need it all the time down the coder road :). Happy coding!
import re # I recommend using regex
def reverse_string(a):
    matches = re.findall(r'\!(.*?)\!', a)
    for match in matches:
        print("Match found", match)
        print("Match reversed", match[::-1])
        for i in match[::-1]:
            print(i)
In [3]: reverse_string('test test !test! !123asd!')
Match found test
Match reversed tset
t
s
e
t
Match found 123asd
Match reversed dsa321
d
s
a
3
2
1
You're overcomplicating it. Don't bother with an index, simply use reversed() on userstring to cycle through the characters themselves:
userstring = test[expoint+1:twoexpoint]
for letter in reversed(userstring):
    print(letter)
Or use a reversed slice:
userstring = test[twoexpoint-1:expoint:-1]
for letter in userstring:
    print(letter)
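None of the snippets above handle input that contains fewer than two '!' characters: str.find returns -1 in that case, and slicing with -1 silently gives a wrong result. A possible guard, assuming you would rather get None back than a bogus substring (between_marks is a hypothetical helper name):

```python
def between_marks(text, mark='!'):
    """Return the text between the first two occurrences of mark,
    or None if mark occurs fewer than two times."""
    first = text.find(mark)
    if first == -1:
        return None
    second = text.find(mark, first + 1)
    if second == -1:
        return None
    return text[first + 1:second]


inner = between_marks('hello all! this is a test! bye.')
if inner is not None:
    # print the extracted part in reverse, one letter per line
    for letter in reversed(inner):
        print(letter)
```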

Python - Bug in code

This might be an easy one, but I can't spot where I am making the mistake.
I wrote a simple program to read words from a wordfile (don't have to be dictionary words), sum the characters and print them out from lowest to highest. (PART1)
Then, I wrote a small script after this program to filter and search for only those words which have only alphabetic, characters in them. (PART2)
While the first part works correctly, the second part prints nothing. I think the error is at the line 'print ch' where a character of a list converted to string is not being printed. Please advise what could be the error
#!/usr/bin/python
# compares two words and checks if word1 has smaller sum of chars than word2
def cmp_words(word_with_sum1,word_with_sum2):
    (word1_sum,__)=word_with_sum1
    (word2_sum,__)=word_with_sum2
    return word1_sum.__cmp__(word2_sum)
# PART1
word_data=[]
with open('smalllist.txt') as f:
    for l in f:
        word=l.strip()
        word_sum=sum(map(ord,(list(word))))
        word_data.append((word_sum,word))
word_data.sort(cmp_words)
for index,each_word_data in enumerate(word_data):
    (word_sum,word)=each_word_data
#PART2
# we only display words that contain alphabetic characters and numbers
valid_characters=[chr(ord('A')+x) for x in range(0,26)] + [x for x in range(0,10)]
# returns true if only alphabetic characters found
def only_alphabetic(word_with_sum):
    (__,single_word)=word_with_sum
    map(single_word.charAt,range(0,len(single_word)))
    for ch in list(single_word):
        print ch # problem might be in this loop -- can't see ch
        if not ch in valid_characters:
            return False
    return True
valid_words=filter(only_alphabetic,word_data)
for w in valid_words:
    print w
Thanks in advance,
John
The problem is that charAt does not exist in Python.
You can iterate over the string directly: 'for ch in my_word'.
Notes:
you can use the builtin str.isalnum() for your test
valid_characters contains only the uppercase version of the alphabet
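A sketch of how the filter might look with str.isalnum(), written for Python 3 (the question's code is Python 2). The sample words are made up for illustration, and only_alphanumeric is a hypothetical replacement for only_alphabetic. Note also that the digits in the question's valid_characters are ints rather than one-character strings, so they would never match a character from a word; isalnum() sidesteps that.

```python
def only_alphanumeric(word_with_sum):
    """True if the word part of a (sum, word) pair is made up
    solely of letters and digits."""
    _, single_word = word_with_sum
    # isalnum() is False for the empty string and for any word
    # containing punctuation or whitespace
    return single_word.isalnum()


# Made-up sample data standing in for the contents of smalllist.txt
words = ["abc", "a1", "no-go", "x y"]
word_data = sorted((sum(map(ord, w)), w) for w in words)

valid_words = [pair for pair in word_data if only_alphanumeric(pair)]
for word_sum, word in valid_words:
    print(word_sum, word)
```

Be aware that isalnum() also accepts non-ASCII letters and digits, which may or may not be what you want.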

How do I calculate the number of times a word occurs in a sentence?

So I've been learning Python for some months now and was wondering how I would go about writing a function that will count the number of times a word occurs in a sentence. I would appreciate if someone could please give me a step-by-step method for doing this.
Quick answer:
def count_occurrences(word, sentence):
    return sentence.lower().split().count(word)
'some string'.split() will split the string on whitespace (spaces, tabs and linefeeds) into a list of word-ish things. Then ['some', 'string'].count(item) returns the number of times item occurs in the list.
That doesn't handle removing punctuation. You could do that using string.maketrans and str.translate.
# Make collection of chars to keep (don't translate them)
import string
keep = string.lowercase + string.digits + string.whitespace
table = string.maketrans(keep, keep)
delete = ''.join(set(string.printable) - set(keep))
def count_occurrences(word, sentence):
    return sentence.lower().translate(table, delete).split().count(word)
The key here is that we've constructed the string delete so that it contains all the ascii characters except letters, numbers and spaces. Then str.translate in this case takes a translation table that doesn't change the string, but also a string of chars to strip out.
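The snippet above relies on Python 2 features (string.lowercase, string.maketrans, and the two-argument form of str.translate). In Python 3 the same idea can be sketched with str.maketrans, whose third argument lists the characters to delete; the variable names simply mirror the ones used above.

```python
import string

# Characters to keep; everything else printable gets deleted
keep = string.ascii_lowercase + string.digits + string.whitespace
delete = ''.join(set(string.printable) - set(keep))

# With two empty strings and a third argument, maketrans builds a
# table that maps each character in delete to None (i.e. removes it)
table = str.maketrans('', '', delete)


def count_occurrences(word, sentence):
    # lower() runs first, so uppercase letters are kept as lowercase
    return sentence.lower().translate(table).split().count(word)


print(count_occurrences('sentence', 'This sentence is a simple sentence.'))
```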
wilberforce has the quick, correct answer, and I'll give the long winded 'how to get to that conclusion' answer.
First, here are some tools to get you started, and some questions you need to ask yourself.
You need to read the section on Sequence Types, in the python docs, because it is your best friend for solving this problem. Seriously, read it. Once you have read that, you should have some ideas. For example you can take a long string and break it up using the split() function. To be explicit:
mystring = "This sentence is a simple sentence."
result = mystring.split()
print result
print "The total number of words is: " + str(len(result))
print "The word 'sentence' occurs: " + str(result.count("sentence"))
This takes the input string, splits it on any whitespace, and gives you:
['This', 'sentence', 'is', 'a', 'simple', 'sentence.']
The total number of words is: 6
The word 'sentence' occurs: 1
Now note here that you do have the period still at the end of the second 'sentence'. This is a problem because 'sentence' is not the same as 'sentence.'. If you are going to go over your list and count words, you need to make sure that the strings are identical. You may need to find and remove some punctuation.
A naive approach to this might be:
no_period_string = mystring.replace(".", "")
print no_period_string
This gets me a period-less sentence:
"This sentence is a simple sentence"
You also need to decide if your input is going to be just a single sentence, or maybe a paragraph of text. If you have many sentences in your input, you might want to find a way to break them up into individual sentences, and find the periods (or question marks, or exclamation marks, or other punctuation that ends a sentence). Once you find out where in the string the 'sentence terminator' is, you could maybe split up the string at that point, or something like that.
You should give this a try yourself - hopefully I've peppered in enough hints to get you to look at some specific functions in the documentation.
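One way to sidestep the punctuation problem described above is to let a regular expression find whole words. This is a sketch rather than the approach the answer is hinting at: count_word is a hypothetical name, and matching case-insensitively is an assumption about what you want.

```python
import re


def count_word(word, text):
    """Count whole-word, case-insensitive occurrences of word in text."""
    # \b matches at a word boundary, so 'sentence.' still counts but
    # 'sentences' does not; re.escape protects any special characters
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


mystring = "This sentence is a simple sentence."
print(count_word("sentence", mystring))
```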
Simplest way:
def count_occurrences(word, sentence):
    return sentence.count(word)
text=input("Enter your sentence:")
print("'the' appears", text.count("the"),"times")
simplest way to do it
The problem with using the count() method is that it does not give the number of occurrences you might expect when matches overlap, for example
print('banana'.count('ana'))
output
1
but 'ana' occurs twice in 'banana'
To solve this issue, I used
def total_occurrence(string,word):
    count = 0
    tempsting = string
    while(word in tempsting):
        count +=1
        tempsting = tempsting[tempsting.index(word)+1:]
    return count
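The same overlapping count can be done without building a new string on every step, by using the start argument of str.find. A sketch, with count_overlapping as a made-up name; rejecting an empty search string is an assumption, added because an empty string is "in" every string and the loop above would never finish for it.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    if not sub:
        raise ValueError("sub must not be empty")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # resume the search one character after the last match start,
        # so overlapping matches are found too
        start = text.find(sub, start + 1)
    return count


print(count_overlapping('banana', 'ana'))
```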
You can do it like this:
def countWord(word):
    numWord = 0
    for i in range(1, len(word)-1):
        if word[i-1:i+3] == 'word':
            numWord += 1
    print 'Number of times "word" occurs is:', numWord
then calling the function:
countWord('wordetcetcetcetcetcetcetcword')
will print: Number of times "word" occurs is: 2
def check_Search_WordCount(mySearchStr, mySentence):
    len_mySentence = len(mySentence)
    len_Sentence_without_Find_Word = len(mySentence.replace(mySearchStr,""))
    len_Remaining_Sentence = len_mySentence - len_Sentence_without_Find_Word
    count = len_Remaining_Sentence/len(mySearchStr)
    return (int(count))
I assume that you only know about Python strings and for loops.
def count_occurences(s,word):
    count = 0
    for i in range(len(s)):
        if s[i:i+len(word)] == word:
            count += 1
    return count
mystring = "This sentence is a simple sentence."
myword = "sentence"
print(count_occurences(mystring,myword))
explanation:
s[i:i+len(word)]: slicing the string s to extract a word having the same length with the word (argument)
count += 1 : increase the counter whenever matched.

Find max length word from arbitrary letters

I have 10 arbitrary letters and need to find the longest matching word in a words file.
I started to learn RE only a short time ago, and can't seem to find a suitable pattern.
The first idea that came to mind was using a set: [10 chars], but it also allows the included chars to repeat, and I don't know how to avoid that.
I started to learn Python recently, but before RE, so maybe RE is not needed and this can be solved without it.
Using a "for this in that:" iterator seems inappropriate, but maybe itertools (with which I'm not familiar) can do it easily.
I guess the solution is known even to novice programmers/scripters, but not to me.
Thanks
I'm guessing this is something like finding possible words given a set of Scrabble tiles, so that a character can be repeated only as many times as it is repeated in the original list.
The trick is to efficiently test each character of each word in your word file against a set containing your source letters. For each character, if found in the test set, remove it from the test set and proceed; otherwise, the word is not a match, and go on to the next word.
Python has a nice function all for testing a set of conditions based on elements in a sequence. all has the added feature that it will "short-circuit", that is, as soon as one item fails the condition, then no more tests are done. So if your first letter of your candidate word is 'z', and there is no 'z' in your source letters, then there is no point in testing any more letters in the candidate word.
My first shot at writing this was simply:
matches = []
for word in wordlist:
    testset = set(letters)
    if all(c in testset for c in word):
        matches.append(word)
Unfortunately, the bug here is that if the source letters contained a single 'm', a word with several 'm's would erroneously match, since each 'm' would separately match the given 'm' in the source testset. So I needed to remove each letter as it was matched.
I took advantage of the fact that set.remove(item) returns None, which Python treats as a Boolean False, and expanded my generator expression used in calling all. For each c in word, if it is found in testset, I want to additionally remove it from testset, something like (pseudo-code, not valid Python):
all(c in testset and "remove c from testset" for c in word)
Since set.remove returns a None, I can replace the quoted bit above with "not testset.remove(c)", and now I have a valid Python expression:
all(c in testset and not testset.remove(c) for c in word)
Now we just need to wrap that in a loop that checks each word in the list (be sure to build a fresh testset before checking each word, since our all test has now become a destructive test):
for word in wordlist:
    testset = set(letters)
    if all(c in testset and not testset.remove(c) for c in word):
        matches.append(word)
The final step is to sort the matches by descending length. We can pass a key function to sort. The builtin len would be good, but that would sort by ascending length. To change it to a descending sort, we use a lambda to give us not len, but -1 * len:
matches.sort(key=lambda wd: -len(wd))
Now you can just print out the longest word, at matches[0], or iterate over all matches and print them out.
(I was surprised that this brute force approach runs so well. I used the 2of12inf.txt word list, containing over 80,000 words, and for a list of 10 characters, I get back the list of matches in about 0.8 seconds on my little 1.99GHz laptop.)
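One caveat with the approach above: set(letters) collapses duplicate source letters, so if the 10 letters contain two 'l's, a word such as 'well' would be rejected. A possible adjustment, assuming repeated source letters should be usable once each, is to keep the pool as a list; list.remove also returns None, so the same trick applies. can_build and longest_matches are names invented for this sketch.

```python
def can_build(word, letters):
    """True if word can be spelled using each source letter at most once."""
    pool = list(letters)  # a list keeps duplicate letters, unlike a set
    # list.remove returns None, so "not pool.remove(c)" is always True
    # and is only evaluated when c is still available in the pool
    return all(c in pool and not pool.remove(c) for c in word)


def longest_matches(wordlist, letters):
    """Return the buildable words, longest first."""
    matches = [w for w in wordlist if can_build(w, letters)]
    matches.sort(key=len, reverse=True)
    return matches


print(longest_matches(['we', 'wall', 'well'], 'ewll'))
```

Passing reverse=True to sort is an alternative to negating the length in a lambda.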
I think this code will do what you are looking for:
>>> words = open('file.txt').read()
>>> max(len(word) for word in set(words.split()))
If you require more sophisticated tokenising, for example if you're not using Latin text, you should use NLTK:
>>> import nltk
>>> words = open('file.txt').read()
>>> max(len(word) for word in set(nltk.word_tokenize(words)))
I assume you are trying to find out what is the longest word that can be made from your 10 arbitrary letters.
You can keep your 10 arbitrary letters in a dict along with the frequency they occur.
e.g., your 4 (using 4 instead of 10 for simplicity) arbitrary letters are: e, w, l, l. This would be in a dict as:
{'e':1, 'w':1, 'l':2}
Then for each word in the text file, see if all of the letters for that word can be found in your dict of arbitrary letters. If so, then that is one of your candidate words.
So, given these words:
we
wall
well
all of the letters in well would be found in your dict of arbitrary letters, so save it and its length for comparison against other words. (we also qualifies but is shorter; wall does not, because there is no a.)
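The dict-of-frequencies idea maps directly onto collections.Counter from the standard library. A minimal sketch, using the 4-letter example from above; fits is a hypothetical helper name and the word list is hard-coded here instead of being read from a file.

```python
from collections import Counter


def fits(word, available):
    """True if every letter of word occurs in available at least
    as often as it occurs in word."""
    needed = Counter(word)
    # a Counter returns 0 for letters it has never seen
    return all(available[ch] >= n for ch, n in needed.items())


available = Counter('ewll')   # same as {'e': 1, 'w': 1, 'l': 2}
wordlist = ['we', 'wall', 'well']

candidates = [w for w in wordlist if fits(w, available)]
longest = max(candidates, key=len)
print(candidates, longest)
```

For a real word file you would build wordlist by reading and stripping its lines, and you would need to handle the case where no word fits before calling max.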
