Issue in user input or text file data in sentiment analysis - python

I am new to Python and NLTK. I have written my code using the movie reviews data set.
When I hard-code sample text for sentiment analysis it works fine, but when I take user input or read the data from a text file, the text is split character by character.
For example, when the sample text is hard-coded like
["Music was awesome", "Special effects are awesome"]
the reviews come out as
Review: Music was awesome
Review: Special effects are awesome
But if I ask for user input or read the data from a text file, the reviews come out as
Review: M
Review: u
Review: S
Review: i
Review: c
Review: .
# For a text file, below is my sample code.
t = open("Sample1.txt", "r")
File_input = t.read()
for review in File_input:
    print("\nReview:", review)
    probdist = classifier.prob_classify(extract_features(review.split()))
    pred_sentiment = probdist.max()
    print("Predicted sentiment:", pred_sentiment)
    print("Probability:", round(probdist.prob(pred_sentiment), 5))
# For user input, below is my sample code.
User_input = input("Enter your value: ")
for review in User_input:
    print("\nReview:", review)
    probdist = classifier.prob_classify(extract_features(review.split()))
    pred_sentiment = probdist.max()
    print("Predicted sentiment:", pred_sentiment)
    print("Probability:", round(probdist.prob(pred_sentiment), 3))
plz guide.
Thanks!

The User_input variable is a string, so iterating over it iterates over its characters. Remove the for loop and treat User_input as a single review, assuming it holds one review. The same applies to File_input, since t.read() also returns a single string. Otherwise, define a separator character between reviews and iterate like so:
for review in User_input.split(sep_char):
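A minimal sketch of both cases, assuming one review per line in the text file and a "|" separator for typed input. The helper name split_reviews and the "|" separator are made up for illustration and are not part of NLTK:

```python
def split_reviews(text, sep=None):
    """Split raw text into whole reviews instead of single characters.

    sep=None splits on line breaks (one review per line, as in a text file);
    otherwise sep is the separator string between reviews (an assumption).
    """
    parts = text.splitlines() if sep is None else text.split(sep)
    # Drop surrounding whitespace and skip empty pieces.
    return [part.strip() for part in parts if part.strip()]


# Stands in for t.read() on a file with one review per line.
file_text = "Music was awesome\nSpecial effects are awesome\n"
for review in split_reviews(file_text):
    print("Review:", review)

# Stands in for input() where the user separates reviews with "|".
typed = "Music was awesome | Special effects are awesome"
for review in split_reviews(typed, sep="|"):
    print("Review:", review)
```

Each review is then a full sentence, so review.split() yields words and can be passed to extract_features as in the question.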

Related

Python - Dictionary - If loop variable not changing

The project is to convert short forms into long descriptions read from a CSV file.
Example: the user enters LOL and the program should respond 'Laugh of Laughter'.
Expectation: until the user enters a wrong keyword, the computer keeps asking for a short form and answers with its long description from the CSV file.
I treated each row of the CSV file as a dictionary and broke it down into keys and values.
Logic used: a while loop, so that it keeps asking until the short column is an empty cell. The issue is that after a successful first attempt the comparison in the if statement no longer succeeds, because readitems['short'] is not updated on each cycle.
AlisonList.csv Values are:
short,long
lol,laugh of laughter
u, you
wid, with
import csv
from lib2to3.fixer_util import Newline
from pip._vendor.distlib.util import CSVReader
from _overlapped import NULL
READ = "r"
WRITE = 'w'
APPEND = 'a'
# Reading the CSV file and converted into Dictionary
with open("AlisonList.csv", READ) as csv_file:
    readlist = csv.DictReader(csv_file)
    # Reading the short description and showing results
    for readitems in readlist:
        readitems['short'] == ' '
        while readitems['short'] != '':
            # Taking input of short description
            smsLang = str(input("Enter SMS Language : "))
            if smsLang == readitems['short']:
                print(readitems['short'], ("---Means---"), readitems['long'])
            else:
                break
Try this:
import csv
READ = "r"
WRITE = 'w'
APPEND = 'a'
# Reading the CSV file and converting it into a dictionary
with open("AlisonList.csv", READ) as csv_file:
    readlist = csv.DictReader(csv_file)
    word_lookup = {x['short'].strip(): x['long'].strip() for x in readlist}
while True:
    # Taking input of short description
    smsLang = str(input("Enter SMS Language : ")).lower()
    normalWord = word_lookup.get(smsLang)
    if normalWord is not None:
        print(f"{smsLang} ---Means--- {normalWord}")
    else:
        print(f"Sorry, '{smsLang}' is not in my dictionary.")
Sample output:
Enter SMS Language : lol
lol ---Means--- laugh of laughter
Enter SMS Language : u
u ---Means--- you
Enter SMS Language : wid
wid ---Means--- with
Enter SMS Language : something that won't be in the dictionary
Sorry, 'something that won't be in the dictionary' is not in my dictionary.
Basically, we compile a dictionary from the CSV file, using the short words as the keys and the long words as the values. In the loop we can then just call word_lookup.get(smsLang) to find the longer version. If no such key exists, we get None back, so a simple if statement handles the case where there is no longer version.
Hope this helps.
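If the loop should stop at the first unknown keyword, as the question describes, one possible variant is sketched below. It reads the CSV from an in-memory string so that it runs on its own, and it replaces input() with a fixed list of entries. The names build_lookup and expand are hypothetical helpers, not part of the csv module:

```python
import csv
import io

# Same rows as AlisonList.csv in the question.
CSV_TEXT = """short,long
lol,laugh of laughter
u, you
wid, with
"""


def build_lookup(csv_file):
    """Map each short form to its long description, trimming stray spaces."""
    reader = csv.DictReader(csv_file)
    return {row["short"].strip().lower(): row["long"].strip() for row in reader}


def expand(lookup, short_form):
    """Return the long description, or None when the short form is unknown."""
    return lookup.get(short_form.strip().lower())


word_lookup = build_lookup(io.StringIO(CSV_TEXT))

# Simulated user entries; a real program would call input() here instead.
for entry in ["LOL", "u", "brb", "wid"]:
    meaning = expand(word_lookup, entry)
    if meaning is None:
        print(f"Sorry, '{entry}' is not in my dictionary.")
        break  # stop at the first unknown keyword
    print(f"{entry} ---Means--- {meaning}")
```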

How to extract specific line in text file

I am text mining a large document. I want to extract a specific line.
CONTINUED ON NEXT PAGE CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 4 OF 16 PAGES
SPE2DH-20-T-0133 SECTION B
PR: 0081939954 NSN/MATERIAL: 6530015627381
ITEM DESCRIPTION
BOTTLE, SAFETY CAP
BOTTLE, SAFETY CAP RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT
RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
I want to extract the description immediately under ITEM DESCRIPTION.
I have tried many unsuccessful attempts.
My latest attempt was:
for line in text:
    if 'ITEM' and 'DESCRIPTION'in line:
        print ('Possibe Descript:\n', line)
But it did not find the text.
Is there a way to find ITEM DESCRIPTION and get the line after it or something similar?
The following function finds the description on the line below some given pattern, e.g. "ITEM DESCRIPTION", and also ignores any blank lines that may be present in between. However, beware that the function does not handle the special case when the pattern exists, but the description does not.
txt = '''
CONTINUED ON NEXT PAGE CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 4 OF 16 PAGES
SPE2DH-20-T-0133 SECTION B
PR: 0081939954 NSN/MATERIAL: 6530015627381
ITEM DESCRIPTION
BOTTLE, SAFETY CAP
BOTTLE, SAFETY CAP RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT
RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
'''
I've assumed you have your text as a single string, so the function below splits it into a list of lines.
pattern = "ITEM DESCRIPTION"  # to search for
def find_pattern_in_txt(txt, pattern):
    lines = [line for line in txt.split("\n") if line]  # remove empty lines
    if pattern in lines:
        return lines[lines.index(pattern) + 1]
    return None
print(find_pattern_in_txt(txt, pattern))  # prints: "BOTTLE, SAFETY CAP"
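One way to cover part of the caveat above is to check that a line actually follows the pattern before returning it. The sketch below is a variant, not the original function: find_line_after is a hypothetical name, it matches the pattern as a substring of a line rather than as an exact line, and it only guards against the pattern being the last non-blank line. It cannot tell whether the following line really is a description.

```python
def find_line_after(txt, pattern):
    """Return the first non-blank line after a line containing pattern.

    Returns None when the pattern is absent or nothing follows it.
    """
    lines = [line.strip() for line in txt.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        if pattern in line:
            # Guard against the pattern being the last non-blank line.
            return lines[index + 1] if index + 1 < len(lines) else None
    return None


sample = "PR: 0081939954\nITEM DESCRIPTION\n\nBOTTLE, SAFETY CAP\nMORE TEXT\n"
print(find_line_after(sample, "ITEM DESCRIPTION"))
print(find_line_after("ITEM DESCRIPTION\n", "ITEM DESCRIPTION"))
```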
Try it like this:
description = False
for line in text:
    if 'ITEM DESCRIPTION' in line:
        description = True
    if description:
        print(line)
This will work, but you need something to stop reading the description, such as another title, like this:
description = False
for line in text:
    if 'ITEM DESCRIPTION' in line:
        description = True
    if description:
        print(line)
    if "END OF SOMETHING" in line:
        description = False
Use the string method find, as in the following. find returns the index of the string you are looking for, or -1 if it is absent, so any result other than -1 shows that you have found it.
Code:
txt = "Hello, welcome to my world."
x = txt.find("welcome")
if x != -1:
    print(x)
Output:
7
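A short illustration of the return values of str.find, using the same sample string. Note that a match at the very start of the string gives 0 and a missing substring gives -1, so the safe test is against -1 (or the in operator) rather than "greater than zero":

```python
txt = "Hello, welcome to my world."

print(txt.find("Hello"))    # 0: found at the very start
print(txt.find("welcome"))  # 7
print(txt.find("goodbye"))  # -1: not found
print("welcome" in txt)     # True: simplest test when the index is not needed
```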
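with open("aa.txt", "r") as f:
    a = [line.split() for line in f]  # each line as a list of words
t1 = 0
for j in range(len(a)):
    for i in range(len(a[j]) - 1):  # stop one short so a[j][i+1] stays in range
        if a[j][i] == "ITEM" and a[j][i+1] == "DESCRIPTION":
            t1 = j  # remember the line that holds ITEM DESCRIPTION
# print every word on the lines after ITEM DESCRIPTION
for i in range(t1 + 1, len(a)):
    for j in range(len(a[i])):
        print(a[i][j], end=" ")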
Use regex. Read the whole file first, because the description is on a different line from the header:
import re
# \n\s* skips the line break and any blank lines after the header;
# (.+) captures the next non-empty line
pattern = re.compile(r"ITEM DESCRIPTION\n\s*(.+)")
with open('file.txt') as f:
    text = f.read()
match = pattern.search(text)
if match:
    print('Found: %s' % match.group(1))

find key in dictionary and print value

Hello everyone. I am stuck on a class assignment and not sure where to go at this point, as my college does not offer tutors for the programming field; this is the first semester that it has been offered. The assignment is:
Write a program that:
Prints out the toy name for that code in a useful message such as, ‘The toy for that code is a Baseball’
The program exits when instead of a toy code, the user enters ‘quit’
Below is a sample of the text file that the dict is supposed to be populated from:
D1,Tyrannasaurous
D2,Apatasauros
D3,Velociraptor
D4,Tricerotops
D5,Pterodactyl
T1,Diesel-Electric
T2,Steam Engine
T3,Box Car
and what I have gotten so far is:
fin = open('C:/Python34/Lib/toys.txt', 'r')
print(fin)
toylookup = dict()  # creates a dictionary named toy lookup
def get_line():  # get a single line from the file
    newline = fin.readline()  # get the line
    newline = newline.strip()  # strip away extra characters
    return newline
print('please enter toy code here>>>')
search_toy_code = input()
for toy_code in toylookup.keys():
    if toy_code == search_toy_code:
        print('The toy for that code is a', 'value')
    else:
        print("toy code not found")
To be honest, I am not even sure I am right with what I have. Any help at all would be greatly appreciated, thank you.
There are two issues.
Your dictionary isn't getting populated; however there currently isn't enough info in your question to help with that problem. Need to know what the file looks like, etc.
Your lookup loop won't display the values for keys that match. Below is the solution for that.
Try iterating over key:value pairs like this:
for code, toy in toylookup.items():
    if code == search_toy_code:
        print('The toy for that code ({}) is a {}'.format(code, toy))
    else:
        print("Toy code ({}) not found".format(code))
Take a look at the docs for dict.items():
items():
Return a new view of the dictionary’s items ((key, value) pairs).
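Since a dictionary is built for lookups by key, the loop can also be skipped entirely. A minimal sketch, assuming toylookup has already been filled (the two entries below are copied from the sample file in the question, and describe_toy is a hypothetical helper name):

```python
# Two entries copied from the sample file, standing in for the full dictionary.
toylookup = {"D1": "Tyrannasaurous", "T2": "Steam Engine"}


def describe_toy(lookup, code):
    """Look the code up directly instead of looping over every key."""
    toy = lookup.get(code)  # None when the code is not a key
    if toy is None:
        return "Toy code ({}) not found".format(code)
    return "The toy for that code ({}) is a {}".format(code, toy)


print(describe_toy(toylookup, "T2"))
print(describe_toy(toylookup, "X9"))
```

This also avoids printing a "not found" message once for every key that does not match.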
You should make yourself familiar with basic python programming. In order to solve such tasks you need to know about basic data structures and loops.
# define path and name of file
filepath = "test.txt"  # content like: B1,Baseball B2,Basketball B3,Football
# read file data
with open(filepath) as f:
    fdata = f.readlines()  # reads every line into fdata
# fdata is now a list containing each line
# prompt the user
print("please enter toy code here: ")
user_toy_code = input()
# dict container
toys_dict = {}
# get the items with toy codes and names
for line in fdata:  # iterate over every line
    line = line.strip()  # remove extra whitespaces and stuff
    line = line.split(" ")  # splits "B1,Baseball B2,Basketball"
                            # to ["B1,Baseball", "B2,Basketball"]
    for item in line:  # iterate over the items of a line
        item = item.split(",")  # splits "B1,Baseball"
                                # to ["B1", "Baseball"]
        toys_dict[item[0]] = item[1]  # saves {"B1": "Baseball"} to the dict
# check if the user toy code is in our dict
if user_toy_code in toys_dict:
    print("The toy for toy code {} is: {}".format(user_toy_code, toys_dict[user_toy_code]))
else:
    print("Toy code {} not found".format(user_toy_code))
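The sample file in the question has one "code,name" pair per line, and some names contain spaces ("Steam Engine", "Box Car"), so splitting on spaces would cut those names apart. A hedged sketch for that layout follows. load_toys is a hypothetical helper, the lines are given as a list so the example runs without a file, and it assumes every non-blank line contains a comma. The fixed list of codes stands in for repeated input() calls, with 'quit' ending the loop as the assignment asks:

```python
def load_toys(lines):
    """Build {code: name} from lines such as 'T2,Steam Engine'."""
    toys = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        code, name = line.split(",", 1)  # split on the first comma only
        toys[code.strip()] = name.strip()
    return toys


# Stands in for the lines read from the toy file.
sample_lines = ["D1,Tyrannasaurous\n", "T2,Steam Engine\n", "T3,Box Car\n", "\n"]
toys_dict = load_toys(sample_lines)

# Stands in for asking the user again and again until they type 'quit'.
for user_toy_code in ["T2", "Z9", "quit", "D1"]:
    if user_toy_code == "quit":
        break
    if user_toy_code in toys_dict:
        print("The toy for that code is a {}".format(toys_dict[user_toy_code]))
    else:
        print("Toy code {} not found".format(user_toy_code))
```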

How to read and label line by line a text file using nltk.corpus in Python

My problem is to classify documents given two training data good_reviews.txt and bad_reviews.txt. So to start I need to load and label my training data where every line is a document itself which corresponds to a review. So my main task is to classify reviews (lines) from a given testing data.
I found a way how to load and label names data as follow:
from nltk.corpus import names
names = ([(name, 'male') for name in names.words('male.txt')] +
[(name, 'female') for name in names.words('female.txt')])
So what I want to have is a similar thing which labels lines and not words.
I am expecting that the code would be something like this which of course doesn't work since .lines is an invalid syntax:
reviews = ([(review, 'good_review') for review in reviews.lines('good_reviews.txt')] +
[(review, 'bad_review') for review in reviews.lines('bad_reviews.txt')])
and I would like to have a result like this:
>>> reviews[0]
('This shampoo is very good blablabla...', 'good_review')
If you're reading your own textfile, then there's nothing much to do with NLTK, you can simply use file.readlines():
good_reviews = """This is great!
Wow, it amazes me...
An hour of show, a lifetime of enlightment
"""
bad_reviews = """Comme si, Comme sa.
I just wasted my foo bar on this.
An hour of s**t, ****.
"""
with open('/tmp/good_reviews.txt', 'w') as fout:
    fout.write(good_reviews)
with open('/tmp/bad_reviews.txt', 'w') as fout:
    fout.write(bad_reviews)
reviews = []
with open('/tmp/good_reviews.txt', 'r') as fingood, open('/tmp/bad_reviews.txt', 'r') as finbad:
    reviews = ([(review, 'good_review') for review in fingood.readlines()] +
               [(review, 'bad_review') for review in finbad.readlines()])
print(reviews)
[out]:
[('This is great!\n', 'good_review'), ('Wow, it amazes me...\n', 'good_review'), ('An hour of show, a lifetime of enlightment\n', 'good_review'), ('Comme si, Comme sa.\n', 'bad_review'), ('I just wasted my foo bar on this.\n', 'bad_review'), ('An hour of s**t, ****.\n', 'bad_review')]
If you're going to use the NLTK movie review corpus, see Classification using movie review corpus in NLTK/Python
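To get tuples without the trailing newline, as in the result the question asks for, each line can be stripped while it is labelled. A minimal sketch, assuming one review per line. label_lines is a hypothetical helper, and the files are written to a temporary directory only so that the example runs on its own:

```python
import os
import tempfile


def label_lines(path, label):
    """Return (line, label) pairs for every non-blank line in the file."""
    with open(path, "r") as fin:
        return [(line.strip(), label) for line in fin if line.strip()]


# Write two small sample files so the example is self-contained.
tmpdir = tempfile.mkdtemp()
good_path = os.path.join(tmpdir, "good_reviews.txt")
bad_path = os.path.join(tmpdir, "bad_reviews.txt")
with open(good_path, "w") as fout:
    fout.write("This is great!\nWow, it amazes me...\n")
with open(bad_path, "w") as fout:
    fout.write("I just wasted my foo bar on this.\n")

reviews = label_lines(good_path, "good_review") + label_lines(bad_path, "bad_review")
print(reviews[0])
```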

Python - changing the output when querying my CSV file

I have been tasked to create a program in Python which searches through a CSV file; a list of academic papers (Author, Year, Title, Journal - it's actually TSV).
With my current code, I can achieve correct output (as in the information is correct), but it is not formatted correctly.
What I'm getting is;
['Albers;Bergman', '1995', 'The audible Web', 'Proc. ACM CHI']
Where as what I need is this format;
Author/s. (Year). Title. Journal.
So the commas are changed for full stops (periods).
Also the ; between authors should be changed for an & sign if there are two authors, or there should be a comma followed by an & for three or more authors.
I.E
Glenn & Freg. (1995). Cool book title. Epic journal title.
or
Perry, Smith # Jones. (1998). Cooler book title. Boring journal name.
I'm not entirely sure how to do this. I have searched the python reference, google and here at Stackoverflow, but couldn't come across anything (that I understood at least). There is a LOT on here about completely removing punctuation, but that isn't what I'm after.
I first thought the replace function would work, but it gives me this error. (I'll leave the code in to show what I was attempting, but commented out)
str.replace(',', '.')
TypeError: replace() takes at least 2 arguments (1 given)
It wouldn't have totally solved my problem, but I figured it's something to move on from. I assume str.replace() won't take punctuation?
Anyway, below is my code. Anybody have any other ideas?
import csv
def TitleSearch():
    titleSearch = input("Please enter the Title (or part of the title). \n")
    for row in everything:
        title = row[2]
        if title.find(titleSearch) != -1:
            print (row)
def AuthorSearch():
    authorSearch = input("Please type Author name (or part of the author name). \n")
    for row in everything:
        author = row[0]
        if author.find(authorSearch) != -1:
            #str.replace(',', '.')
            print (row)
def JournalSearch():
    journalSearch = input("Please type in a Journal (or part of the journal name). \n")
    for row in everything:
        journal = row[3]
        if journal.find(journalSearch) != -1:
            print (row)
def YearSearch():
    yearSearch = input("Please type in the Year you wish to search. If you wish to search a decade, simply enter the first three numbers of the decade; i.e entering '199' will search for papers released in the 1990's.\n")
    for row in everything:
        year = row[1]
        if year.find(yearSearch) != -1:
            print (row)
data = csv.reader (open('List.txt', 'rt'), delimiter='\t')
everything = []
for row in data:
    everything.append(row)
while True:
    searchOption = input("Enter A to search by Author. \nEnter J to search by Journal name.\nEnter T to search by Title name.\nEnter Y to search by Year.\nOr enter any other letter to exit.\nIf there are no matches, or you made a mistake at any point, you will simply be prompted to search again. \n" )
    if searchOption == 'A' or searchOption =='a':
        AuthorSearch()
        print('\n')
    elif searchOption == 'J' or searchOption =='j':
        JournalSearch()
        print('\n')
    elif searchOption == 'T' or searchOption =='t':
        TitleSearch()
        print('\n')
    elif searchOption == 'Y' or searchOption =='y':
        YearSearch()
        print('\n')
    else:
        exit()
Thanks in advance to anybody who can help, it's really appreciated!
What you've got so far is a great start; you just need to process it a little further. Replace print(row) with PrettyPrintCitation(row), and add the function below.
Basically, it looks like you need to format the authors with a switch, which would best be implemented as a function. Then, you can handle the rest with just a nice format string. Suppose your reference rows look like the following:
references = [
['Albers', '1994', 'The audible Internet', 'Proc. ACM CHI'],
['Albers;Bergman', '1995', 'The audible Web', 'Proc. ACM CHI'],
['Glenn;Freg', '1995', 'Cool book title', 'Epic journal title'],
['Perry;Smith;Jones', '1998', 'Cooler book title', 'Boring journal name']
]
Then the following will give you what I believe you're looking for:
def PrettyPrintCitation(row):
    def adjustauthors(s):
        authorlist = s[0].split(';')
        if len(authorlist) < 2:
            s[0] = authorlist[0]
        elif len(authorlist) == 2:
            s[0] = '{0} & {1}'.format(*authorlist)
        else:
            s[0] = ', '.join(authorlist[:-1]) + ', & ' + authorlist[-1]
        return s
    print('{0}. ({1}). {2}. {3}.'.format(*adjustauthors(row)))
applied to the citations above, this gives you
Albers. (1994). The audible Internet. Proc. ACM CHI.
Albers & Bergman. (1995). The audible Web. Proc. ACM CHI.
Glenn & Freg. (1995). Cool book title. Epic journal title.
Perry, Smith, & Jones. (1998). Cooler book title. Boring journal name.
(I'm assuming that "#" in your proposed output was a mistake...)
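Note that the function above rewrites row[0] in place, so the stored row no longer holds the original semicolon-separated authors after it has been printed once. If that matters, a variant that builds the string without touching the row could look like the sketch below. format_authors and format_citation are hypothetical names, and the row layout (author, year, title, journal) is taken from the question:

```python
def format_authors(raw):
    """Turn 'Perry;Smith;Jones' into 'Perry, Smith, & Jones'."""
    authors = [name.strip() for name in raw.split(";") if name.strip()]
    if len(authors) <= 1:
        return "".join(authors)
    if len(authors) == 2:
        return " & ".join(authors)
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def format_citation(row):
    """Return 'Author/s. (Year). Title. Journal.' without modifying row."""
    author, year, title, journal = row
    return "{}. ({}). {}. {}.".format(format_authors(author), year, title, journal)


row = ['Perry;Smith;Jones', '1998', 'Cooler book title', 'Boring journal name']
print(format_citation(row))
print(row[0])  # the stored row still holds the original author field
```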
You need to work on your Python syntax.
Try something along these lines:
authorlist = row[0].split(';')  # split the multiple authors on semicolon
authors = " & ".join(authorlist)  # now join them together with ampersand
print("""%s. (%s) %s.""" % (authors, row[1], row[2]))  # print with pretty brackets etc.
