How to get word by word from a string? - python

For the purpose of sentiment analysis, I want to analyse each word in a sentence. I want to store each word in a variable and then process it. I used the following code and got an error message saying:
AttributeError: 'list' object has no attribute 'split'
line = ' hello this is a test sentence'
while line:
    line=line.split(' ')
    print '\n'
What is the solution to the above problem?

Here is what happens in your code:
line = "..." - line is a string
while line: - start looping, as non-empty string evaluates to True
line = line.split(" ") - split line by spaces, line is now a list
print '\n' - print a newline character
while line: - non-empty list evaluates True, so loop again
line = line.split(" ") - line is a list, hence AttributeError
I am not sure why you are using a while loop here; you probably want:
for word in line.split(" "):
    print word
    # ... process word
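The step-by-step explanation above can be reproduced with a short sketch. It is written in Python 3 syntax (the question uses Python 2 print statements) and replaces the loop with two explicit steps, so the type change is visible:

```python
line = ' hello this is a test sentence'
print(type(line).__name__)  # str

# After the first split, the name "line" refers to a list, not a string.
line = line.split(' ')
print(type(line).__name__)  # list

# A second split on the same name fails, just as in the while loop.
try:
    line.split(' ')
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'split'
```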

The issue is that when the loop reaches its second iteration, line is no longer a string. The while condition only checks whether line is truthy (non-empty) and, if so, the body calls split on it. At that point, however, line is a list.
What you really want is:
line = 'hello this is a sentence'
words = line.split()
for w in words:
    print w
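To see why split() with no argument is usually the safer choice for the sentence in the question, here is a small sketch in Python 3 syntax. The variable names by_space and by_whitespace are only for illustration.

```python
line = ' hello this is a test sentence'

# Splitting on a single space keeps an empty string for the leading space.
by_space = line.split(' ')

# Splitting with no argument splits on runs of whitespace and drops empties.
by_whitespace = line.split()

print(by_space)       # ['', 'hello', 'this', 'is', 'a', 'test', 'sentence']
print(by_whitespace)  # ['hello', 'this', 'is', 'a', 'test', 'sentence']
```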

Here are two ways:
string.split(' ') ?
>>> a="1.MATCHES$$TEXT$$STRING"
>>> a.split("$$TEXT$$")
['1.MATCHES', 'STRING']
>>> a="2.MATCHES $$TEXT$$ STRING"
>>> a.split("$$TEXT$$")
['2.MATCHES ', ' STRING']
and:
>>> [x.strip() for x in "2.MATCHES $$TEXT$$ STRING".split("$$TEXT$$")]
['2.MATCHES', 'STRING']
So what's nice is that you don't have to loop to split; you just have to assign the result and use it.
a="my;string;here"
a = a.split(";")
for w in a:
    print w

Just split the string once: wordList = line.split()
And use wordList to iterate over:
for x in wordList:
    pass  # do the work on x here
p.s.: I don't quite get why you would print a newline character in each iteration of the loop.
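Since the stated goal is sentiment analysis, the "process each word" step might look roughly like the sketch below. This is only an illustration: TOY_LEXICON and score_sentence are hypothetical names invented here, and the word scores are made up, not taken from any real sentiment resource.

```python
# Hypothetical word scores, invented for this example only.
TOY_LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2}

def score_sentence(line, lexicon=TOY_LEXICON):
    """Sum the scores of the words in line; unknown words count as 0."""
    total = 0
    for word in line.split():
        # Normalise case and trim common punctuation before the lookup.
        key = word.lower().strip(".,!?;:")
        total += lexicon.get(key, 0)
    return total

print(score_sentence("This is a good test, not a bad one!"))  # 0
print(score_sentence("Great sentence."))                      # 2
```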

Related

Reading whitespaces inside of a list of strings

I'm having a problem trying to count whitespace characters in a list of strings in Python.
Here's my code
Data = ''
index = 0
num_words = 0
# Open a file for reading.
infile = open('article.txt', 'r')
# Read the contents of the file into a list.
data = infile.readlines()
# Strip the \n from each element.
while index < len(data):
    data[index] = data[index].rstrip('\n')
    index += 1
for ch in data:
    if ch.isspace():
        num_words += 1
# Close the file.
infile.close()
# Print the contents of the list.
print(num_words)
The contents of the article.txt is just a list of sentences so the list is just a list of strings such as:
data = ['this is sentence one.', 'this is sentence two.' , 'this is sentence three.' , 'this is sentence four.' , 'this is sentence five.' , 'this is sentence six.' ]
I think I know what the problem is because I did:
print(ch.isspace())
Which results in False getting printed 6 times. I'm thinking this is because the for loop is checking whether the whole string is whitespace rather than checking for spaces inside the string.
I know I could just do:
data = infile.read()
But I need each line in a list. Is there anything I can change so the for loop searches for spaces in each string in the list or am I out of luck?
Python has a handy method for that on strings, called str.split. When passed no arguments, it will split on whitespace. If you count the items in the resulting list, you will have the number of words.
Handles multiple spaces:
>>> line = "this is some string."
>>> len(line.split())
4
Handles empty lines:
>>> line = " "
>>> len(line.split())
0
Handles extra space before and after:
>>> line = " space before and after. "
>>> len(line.split())
4
Here is some sample code:
lines = 0
words = 0
with open('yourfile', 'rt') as yourfile:
    for line in yourfile:
        lines += 1
        words += len(line.split())
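If the lines have to stay in a list, as the question requires, the same idea applies to each element. The sketch below uses two hypothetical helpers, count_words and count_spaces; the second one checks every character, which is what the original loop was trying to do.

```python
def count_words(lines):
    """Return the total number of whitespace-separated words in lines."""
    return sum(len(line.split()) for line in lines)

def count_spaces(lines):
    """Return the number of whitespace characters across all lines."""
    return sum(1 for line in lines for ch in line if ch.isspace())

data = ['this is sentence one.', 'this is sentence two.']
print(count_words(data))   # 8
print(count_spaces(data))  # 6
```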

Python Pig Latin convertor and line/word counter

I have to create a Python file that prompts the user for a file path to a text document, converts it into Pig Latin, and does a line/word count. The requirements are:
• A function to generate the pig Latin version of a single word
• A function to print line and word counts to standard output
• Correct pig Latin output with identical formatting as the original text file
• Correct line and word counts
I can't figure out why the pig latin is coming out wrong. My teacher said that I need another string.strip("\n") because it is making the words convert wrong but I have no idea where I am supposed to put that.
Also my line counter is broken. It counts but it always says 222 lines.
How can I make it just count the lines with words ?
#Step 1: User enters text file.
#Step 2: Pig Latin function rewrites file and saves as .txt.
#Step 3: Tracks how many lines and words it rewrites.
vowels = ("A", "a", "E", "e", "I", "i", "O", "o", "U", "u")
# Functions
def pig_word(string):
    line = string.strip("\n")
    for word in string.split(" "):
        first_letter = word[0]
        if first_letter in vowels:
            return word + "way"
        else:
            return word[1:] + first_letter + "ay"
def pig_sentence(sentence):
    word_list = sentence.split(" ")
    convert = " "
    for word in word_list:
        convert = convert + pig_word(word)
        convert = convert + " "
    return convert
def line_counter(s):
    line_count = 0
    for line in s:
        line_count += 1
    return line_count
def word_counter(line):
    word_count = 0
    list_of_words = line.split()
    word_count += len(list_of_words)
    return word_count
# File path conversion
text = raw_input("Enter the path of a text file: ")
file_path = open(text, "r")
out_file = open("pig_output.txt", "w")
s = file_path.read()
pig = pig_sentence(s)
out_file.write(pig+" ")
out_file.write("\n")
linecount = line_counter(s)
wordcount = word_counter(s)
file_path.close()
out_file.close()
# Results
print "\n\n\n\nTranslation finished and written to pig_output.txt"
print "A total of {} lines were translated successfully.".format(linecount)
print "A total of {} words were translated successfully.".format(wordcount)
print "\n\n\n\n"
Your first problem is here:
def pig_word(string):
    line = string.strip("\n")  # !!! line is NEVER USED !!!
    for word in string.split(" "):  # you want *line*.split here
The second issue is caused by iterating over a string: it goes through every character instead of every line, as a file does:
>>> for i in "abcd":
...     print(i)
...
a
b
c
d
So in your line_counter, instead of doing:
    for line in s:
        line_count += 1
you just need to do:
    for line in s.split("\n"):
        line_count += 1
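The question also asks how to count only the lines that contain words. A minimal sketch of that, in Python 3 syntax, is shown here. It uses str.splitlines() rather than split("\n"), which is a swapped-in alternative, and the function names are hypothetical.

```python
def count_lines(text):
    """Count all lines in text, including blank ones."""
    return len(text.splitlines())

def count_nonblank_lines(text):
    """Count only the lines that contain at least one word."""
    return sum(1 for line in text.splitlines() if line.split())

sample = "first line\n\nsecond line\n   \nthird line\n"
print(count_lines(sample))           # 5
print(count_nonblank_lines(sample))  # 3
```

Note that sample.split("\n") would yield six items here, because the trailing newline produces one extra empty string at the end.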
The first reason why you're not getting the output you want is that your pig_word(string) function returns after converting only the first word in the string, because the return is inside the for loop. Also, your teacher was talking about taking all the lines into the function and iterating over each line via str.split('\n'). \n represents the "new-line" character.
You can try something like this to correct that.
def pig_sentence(string):
    lines = []
    for line in string.split('\n'):
        new_words = []
        for word in line.split():
            first_letter = word[0]
            if first_letter in vowels:
                new_words.append(word + "way")
            else:
                new_words.append(word[1:] + first_letter + "ay")
        new_string = " ".join(new_words)
        lines.append(new_string)
    return lines
The Changes Made
Initialized a new list lines that we can append to throughout the loops.
Iterate over each line in the passed in string.
For each line, collect the converted words in new_words and join them with spaces into a new string new_string.
Use your code, but instead of returning we add each converted word to new_words, then append new_string to our list of new lines, lines.
Note that this removes the need for two functions. Also note that I renamed pig_word to pig_sentence.
The second error is in your function line_counter(s). You are iterating over each character rather than each line. Here, add that str.split('\n') again to get the output you want by splitting the string into a list of lines, then iterating over the list.
Here is the modified function:
def line_counter(s):
    line_count = 0
    for _ in s.split('\n'):
        line_count += 1
    return line_count
(Since there is nothing erroneous with your file I/O, I'm just going to use a string literal here for the testing.)
Test
paragraph = """\
Hello world
how are you
pig latin\
"""
lines = line_counter(paragraph)
words = sum([word_counter(line) for line in paragraph.split('\n')])
out = pig_sentence(paragraph)
print(lines, words, out)
The output is what we expect!
3 7 ['elloHay orldway', 'owhay areway ouyay', 'igpay atinlay']
You are splitting only on the space character; you need to split on all whitespace, including end-of-line characters. Replace
split(" ")
with
split()
Your sentence list is the equivalent of
sentence = 'Hello there.\nMy name is Roxy.\nHow are you?'
If you print the results of split(" ") and split(), you will see the difference, and you will get the results that you expect.
Additionally, you will get incorrect results because there will be translated to heretay. You need to loop over the leading letters so that it comes out as erethay.
That is, move every leading consonant to the end before adding 'ay' so that the new word starts with a vowel.
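One way to sketch the "move every leading consonant" rule is shown below, in Python 3 syntax. The names VOWELS, pig_word and pig_line are chosen for this example, punctuation is not handled, and the treatment of words with no vowel at all is an assumption, not something the assignment specifies.

```python
VOWELS = "aeiouAEIOU"

def pig_word(word):
    """Convert one word, moving all leading consonants to the end."""
    if not word:
        return word
    if word[0] in VOWELS:
        return word + "way"
    # Find the first vowel; everything before it moves to the end.
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:] + word[:i] + "ay"
    # Assumption: a word with no vowel keeps its letter order.
    return word + "ay"

def pig_line(line):
    """Convert every whitespace-separated word on one line."""
    return " ".join(pig_word(w) for w in line.split())

print(pig_word("there"))        # erethay
print(pig_line("Hello there"))  # elloHay erethay
```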

Python Reading List From File And Printing Back In Idle Error

When I try to read from a file, I'm getting an annoying error. I think it's got something to do with the list format of the variables, but I'm not sure.
If anyone can help with this issue, that would be great.
I think it's also got something to do with \n being printed at the end of the list.
This is my code:
Option 1
def one():
    print ("")
    print ("You have chosen to read the file!")
    print ("")
    file = open("sentence.txt" , "r")
    words = file.readlines(1)
    nums = file.readlines(2)
    #Remove "\n"
    #This bit doesn't work, I'm not sure how to remove "\n"
    #These were me trying to get rid of the "\n"
    #map(str.strip, words)
    #words = words.strip('\n')
    print(words)
    print (nums)
    print ("")
    #Reconstruct sentence here
What file.readlines(1) returns is a single-element list, and the element is a string. What you want to do is get the string itself and strip or replace the '\n', '[', ']', etc.
Try
words[0].strip("\n][").replace("'", "").split(",")
The function file.readlines() does not take an argument for the number of lines; by default it reads all the lines of the file at once. (For the record, if you do pass an argument like file.readlines(n), the argument n is a "hint" about the number of bytes to read. More info at "Python's function readlines(n) behavior".)
def one():
    print ("")
    print ("You have chosen to read the file!")
    print ("")
    file = open("sentence.txt" , "r")
    lines = file.readlines() # read all the lines into a list
    words = lines[0]
    nums = lines[1]
    #Remove "\n"
    #This bit doesn't work, I'm not sure how to remove "\n"
    #These were me trying to get rid of the "\n"
    #map(str.strip, words)
    words = words.strip("\n][").replace("'", "").split(",")
    nums = nums.strip("\n][").replace("'", "").split(",")
    nums = list(map(int, nums)) ## assuming you want to convert the strings to integers, use this
    print(words)
    print (nums)
    print ("")
    #Reconstruct sentence here
    remade_sentence = ' '.join([words[i-1] for i in nums]) ## changed empty string to space to add spaces
    print (remade_sentence)
    file.close() ## also, make sure to close your file!
EDIT: I have updated the code to deal with nums being a list of strings.
EDIT 2: updating code to reflect @notevenwrong's method of removing brackets and quotes
EDIT 3: Resimplifying...when I open my input file in a text editor, I literally see:
['the', 'dog', 'cat']
[1, 1, 1, 2, 1, 3]
If that is not the right input, then this code may not work.
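If the file really does contain Python list literals like the two lines shown above, an alternative to stripping brackets and quotes by hand is ast.literal_eval from the standard library. This is a different technique from the one in the answer, sketched here under that assumption about the file format; rebuild_sentence is a hypothetical helper name.

```python
import ast

def rebuild_sentence(words_line, nums_line):
    """Rebuild a sentence from a word list and 1-based positions."""
    # literal_eval safely parses a Python literal such as a list.
    words = ast.literal_eval(words_line.strip())  # e.g. ['the', 'dog', 'cat']
    nums = ast.literal_eval(nums_line.strip())    # e.g. [1, 1, 1, 2, 1, 3]
    return " ".join(words[i - 1] for i in nums)

print(rebuild_sentence("['the', 'dog', 'cat']\n", "[1, 1, 1, 2, 1, 3]\n"))
# the the the dog the cat
```

Because the numbers are parsed as integers, no separate int() conversion is needed, and the words come back without stray spaces or quotes.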
You are getting the exception at words[i-1] because i is a str, and you can't perform - on 'str' and 'int'.
>>> i = '1'
>>> i -1
Traceback (most recent call last):
  File "<pyshell#101>", line 1, in <module>
    i -1
TypeError: unsupported operand type(s) for -: 'str' and 'int'
A quick fix would be:
remade_sentence = ''.join([words[int(i)-1] for i in nums])

Python Coding for Reversing Sentences

This is the code I am using so far.
translated = []
line = input('Line: ')
while line != '':
    for word in line.split():
        letters = list(word)
        letters.reverse()
        word = ''.join(letters)
        translated.append(word)
    if line == '':
        print(' '.join(translated))
    elif line:
        line = input('Line: ')
It is supposed to read lines of input from the user. An empty line is supposed to signify the end of the input. Then the program is supposed to take all the lines and reproduce them in their original order with each word reversed in place.
For example, if I were to input: Hello how are you
its output should be: olleH woh era uoy
Currently it asks for the inputs, then stops when there is an empty line, but produces nothing: no reversed words, nothing at all.
Can anyone tell me what I am doing wrong and help me out with my code?
The print statement needs to be outside the loop. Your loop condition ensures that line is never '' inside the loop, so the if condition is never satisfied.
For the same reason, you need to rethink the elif.
As @Flav points out, all lines should be read until an empty line ends the input. I have edited the solution as below:
lines = [] # to store all line inputs
while True:
    line = raw_input('Line: ') # input if using python3 or raw_input if python2.6/7
    if line == '':
        break
    lines.append(line)
for line in lines:
    print (' '.join([word[::-1] for word in line.split(' ')]))
You could probably do it like this.
' '.join( [ i[::-1] for i in line.split( ' ' ) ] )
Split the line into words
Reverse each word
Put them back together
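The three steps above can be wrapped in small functions, sketched here in Python 3 syntax. The names reverse_words and reverse_all are chosen for this example, and a list of strings stands in for the interactive input so the sketch can run without a prompt.

```python
def reverse_words(line):
    """Reverse each word in place while keeping the word order."""
    return ' '.join(word[::-1] for word in line.split())

def reverse_all(lines):
    """Process lines in order, stopping at the first empty line."""
    result = []
    for line in lines:
        if line == '':
            break
        result.append(reverse_words(line))
    return result

print(reverse_words("Hello how are you"))        # olleH woh era uoy
print(reverse_all(["Hello how", "are you", ""]))  # ['olleH woh', 'era uoy']
```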
The issue is that when the line is empty, your while loop stops. You should get rid of the if / else which are useless here.
Full script:
translated = []
line = input('Line: ')
while line != '':
    for word in line.split():
        letters = list(word)
        letters.reverse()
        word = ''.join(letters)
        translated.append(word)
    #The above for loop could be done in one line with:
    #translated.extend([word[::-1] for word in line.split()])
    line = input('Line: ')
print(' '.join(translated))
This works perfectly:
import re
a = "Hello how are you"
" ".join([ "".join(reversed(x)) for x in re.findall(r'\w+',a) ])

reading and checking the consecutive words in a file

I want to read the words in a file and, for example, check whether a word is "1"; if it is, I have to check whether the next word is "two". After that I have to do some other task. Can you help me check for the occurrence of "1" and "two" consecutively?
I have used
filne = raw_input("name of existing file to be processed:")
f = open(filne, 'r+')
for word in f.read().split():
    for i in xrange(len(word)):
        print word[i]
        print word[i+1]
but it's not working.
The easiest way to deal with consecutive items is with zip:
with open(filename, 'r') as f: # better way to open file
    for line in f: # for each line
        words = line.strip().split() # all words on the line
        for word1, word2 in zip(words, words[1:]): # iterate through pairs
            if word1 == '1' and word2 == 'two': # test the pair
                pass # do your other task here
At the moment, your indices (i and i+1) are within each word (i.e. characters) not for words within the list.
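To make the pairing concrete, here is a small sketch in Python 3 syntax that works on a list of words instead of a file. The function name find_pairs and the sample text are made up for this example.

```python
def find_pairs(words, first, second):
    """Return the indices where first is immediately followed by second."""
    return [i for i, (w1, w2) in enumerate(zip(words, words[1:]))
            if w1 == first and w2 == second]

words = "1 two three 1 four 1 two".split()
print(find_pairs(words, "1", "two"))  # [0, 5]
```

Note that pairing words line by line, as in the answer above, will not catch a "1" at the end of one line followed by "two" at the start of the next; splitting the whole file contents at once avoids that.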
I think you want to print two consecutive words from the file.
In your code you are iterating over each character instead of each word in the file, which is probably not what you intend.
You can do that in the following way:
f = open('yourFileName')
str1 = f.read().split()
for i in xrange(len(str1)-1): # -1 otherwise it will be index out of range error
    print str1[i]
    print str1[i+1]
If you want to check whether some word is present and then check the word next to it, use
if 'wordYouWantToCheck' in str1:
    index=str1.index('wordYouWantToCheck')
Now you have the index of the word you are looking for, and you can check the word next to it using str1[index+1].
But the index function returns only the first occurrence of the word. To accomplish your intent here, you can use the enumerate function.
indices = [i for i,x in enumerate(str1) if x == "1"]
This returns a list containing the indices of all occurrences of the word '1'.
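Building on the enumerate idea, the word after each occurrence can be collected in one pass. This is a sketch in Python 3 syntax with a hypothetical helper name, next_words; it skips an occurrence that is the last word, since nothing follows it.

```python
def next_words(words, target):
    """Return the word following each occurrence of target."""
    return [words[i + 1] for i, w in enumerate(words)
            if w == target and i + 1 < len(words)]

words = "1 two three 1 four 1".split()
print(next_words(words, "1"))  # ['two', 'four']
```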
