Python - changing the output when querying my CSV file

I have been tasked with creating a program in Python that searches through a CSV file containing a list of academic papers (Author, Year, Title, Journal; it's actually TSV).
With my current code, I can get the correct output (as in, the information is correct), but it is not formatted correctly.
What I'm getting is:
['Albers;Bergman', '1995', 'The audible Web', 'Proc. ACM CHI']
Whereas what I need is this format:
Author/s. (Year). Title. Journal.
So the commas are replaced with full stops (periods).
Also, the ; between authors should be replaced with an & sign if there are two authors, or with a comma followed by an & for three or more authors.
i.e.
Glenn & Freg. (1995). Cool book title. Epic journal title.
or
Perry, Smith # Jones. (1998). Cooler book title. Boring journal name.
I'm not entirely sure how to do this. I have searched the Python reference, Google and here at Stack Overflow, but couldn't find anything (that I understood, at least). There is a LOT on here about completely removing punctuation, but that isn't what I'm after.
I first thought the replace function would work, but it gives me this error (I'll leave the code in, commented out, to show what I was attempting):
str.replace(',', '.')
TypeError: replace() takes at least 2 arguments (1 given)
It wouldn't have totally solved my problem, but I figured it's something to start from. I assume str.replace() won't take punctuation?
Anyway, below is my code. Anybody have any other ideas?
import csv
def TitleSearch():
    titleSearch = input("Please enter the Title (or part of the title). \n")
    for row in everything:
        title = row[2]
        if title.find(titleSearch) != -1:
            print (row)
def AuthorSearch():
    authorSearch = input("Please type Author name (or part of the author name). \n")
    for row in everything:
        author = row[0]
        if author.find(authorSearch) != -1:
            #str.replace(',', '.')
            print (row)
def JournalSearch():
    journalSearch = input("Please type in a Journal (or part of the journal name). \n")
    for row in everything:
        journal = row[3]
        if journal.find(journalSearch) != -1:
            print (row)
def YearSearch():
    yearSearch = input("Please type in the Year you wish to search. If you wish to search a decade, simply enter the first three numbers of the decade; i.e entering '199' will search for papers released in the 1990's.\n")
    for row in everything:
        year = row[1]
        if year.find(yearSearch) != -1:
            print (row)
# Read the tab-separated file into a list of rows: [Author, Year, Title, Journal]
with open('List.txt', 'rt', newline='') as f:
    everything = list(csv.reader(f, delimiter='\t'))
while True:
    searchOption = input("Enter A to search by Author. \nEnter J to search by Journal name.\nEnter T to search by Title name.\nEnter Y to search by Year.\nOr enter any other letter to exit.\nIf there are no matches, or you made a mistake at any point, you will simply be prompted to search again. \n" )
    if searchOption == 'A' or searchOption =='a':
        AuthorSearch()
        print('\n')
    elif searchOption == 'J' or searchOption =='j':
        JournalSearch()
        print('\n')
    elif searchOption == 'T' or searchOption =='t':
        TitleSearch()
        print('\n')
    elif searchOption == 'Y' or searchOption =='y':
        YearSearch()
        print('\n')
    else:
        exit()
Thanks in advance to anybody who can help, it's really appreciated!

What you've got so far is a great start; you just need to process it a little further. Replace print(row) with PrettyPrintCitation(row), and add the function below.
Basically, it looks like you need to format the authors differently depending on how many there are, which is best implemented as a function. Then you can handle the rest with a simple format string. Suppose your reference rows look like the following:
references = [
['Albers', '1994', 'The audible Internet', 'Proc. ACM CHI'],
['Albers;Bergman', '1995', 'The audible Web', 'Proc. ACM CHI'],
['Glenn;Freg', '1995', 'Cool book title', 'Epic journal title'],
['Perry;Smith;Jones', '1998', 'Cooler book title', 'Boring journal name']
]
Then the following will give you what I believe you're looking for:
def PrettyPrintCitation(row):
    def adjustauthors(s):
        s = list(s)  # work on a copy so the row stored in `everything` is not modified
        authorlist = s[0].split(';')
        if len(authorlist) < 2:
            s[0] = authorlist[0]
        elif len(authorlist) == 2:
            s[0] = '{0} & {1}'.format(*authorlist)
        else:
            s[0] = ', '.join(authorlist[:-1]) + ', & ' + authorlist[-1]
        return s
    print('{0}. ({1}). {2}. {3}.'.format(*adjustauthors(row)))
Applied to the citations above, this gives you:
Albers. (1994). The audible Internet. Proc. ACM CHI.
Albers & Bergman. (1995). The audible Web. Proc. ACM CHI.
Glenn & Freg. (1995). Cool book title. Epic journal title.
Perry, Smith, & Jones. (1998). Cooler book title. Boring journal name.
(I'm assuming that "#" in your proposed output was a mistake...)
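About the TypeError in the question: str.replace(',', '.') calls the method on the str type itself, so ',' is taken as the string to operate on and '.' as the substring to find, leaving the replacement argument missing. It has to be called on a string instance, and since each row is a list, it can't be applied to the row directly. Below is a minimal sketch of a variant of the function above that returns the formatted string instead of printing it, which may be easier to reuse or test; format_authors and format_citation are hypothetical names, not part of the original answer.

```python
def format_authors(field):
    """Turn 'A;B;C' into 'A, B, & C' (and 'A;B' into 'A & B')."""
    names = field.split(';')
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return names[0] + ' & ' + names[1]
    return ', '.join(names[:-1]) + ', & ' + names[-1]


def format_citation(row):
    """Format an [Author, Year, Title, Journal] row without modifying it."""
    author, year, title, journal = row
    return '{0}. ({1}). {2}. {3}.'.format(format_authors(author), year, title, journal)


# replace works when called on a string instance (here, a single field of the row):
assert 'Albers;Bergman'.replace(';', ' & ') == 'Albers & Bergman'

if __name__ == '__main__':
    print(format_citation(['Albers;Bergman', '1995', 'The audible Web', 'Proc. ACM CHI']))
```

In the search functions, print(format_citation(row)) would then take the place of print(row).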

You need to work on your Python syntax.
Try something along these lines:
authorlist = row[0].split(';')  # split the multiple authors on semicolon
authors = " & ".join(authorlist)  # now join them together with ampersand
print("%s. (%s). %s. %s." % (authors, row[1], row[2], row[3]))  # print with pretty brackets etc.

Related

Python statement merge

I have a standard form letter (in txt format), and within it I want to replace some words (keywords) with the words I have created in a separate word list.
To visualise, my letter is:
Dear [name],
I have b-day party on [date] at [venue]
please be there # [time]
Sincerely,
The following is the file containing the replacement values:
Mary
12.02.2022
Soho House
08.00pm
So Mary should appear in place of [name], 12.02.2022 in place of [date], and so on, like this:
Dear Mary,
I have a b-day party on 12.02.2022 xxxx
The following code did not work. Could you please help me solve this issue?
import keyword
placeholder1="[name]"
placeholder2="[date]"
placeholder3="[venue]"
placeholder4="[time]"
with open("occurence.txt") as letter:
    keyref=letter.readlines()
with open("sample_letter.txt") as target:
    target_contents=target.read()
    for name in keyref:
        word1=target_contents.replace(placeholder1,name)
    for date in keyref:
        word2 = target_contents.replace(placeholder2, date)
    print(word1)
    print(word2)
You don't need to import keyword; use variables instead. I don't think your approach will work as written. You could also read the values with input(), but this is what I wrote:
name = 'name'
date = 'date'
venue = 'venue'
time = 'time'
print("Dear " + name + ",\nI have a b-day party on " + date + " at " + venue +
      "\nplease be there # " + time + "\nSincerely,")
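The question's code has two separate problems: each replace starts again from target_contents, so word1 only has [name] filled in and word2 only [date], and readlines() keeps the trailing newline on every value. A minimal sketch that chains the replacements, assuming occurence.txt holds one value per line in the order name, date, venue, time (fill_letter is a hypothetical helper name):

```python
PLACEHOLDERS = ["[name]", "[date]", "[venue]", "[time]"]


def fill_letter(template, values):
    """Replace each placeholder with the value on the matching line.

    Assumes `values` lists the replacements in the same order as PLACEHOLDERS.
    """
    letter = template
    for placeholder, value in zip(PLACEHOLDERS, values):
        # Apply each replace to the result of the previous one,
        # and strip the newline that readlines() leaves on each value.
        letter = letter.replace(placeholder, value.strip())
    return letter


if __name__ == "__main__":
    with open("sample_letter.txt") as target, open("occurence.txt") as letter_values:
        print(fill_letter(target.read(), letter_values.readlines()))
```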

How do I ask for a set list in Python, ask for a number (x), remove x number of inputs, and re-list those?

I am new to Python, and I am currently learning about lists. This is the question that I am trying to solve:
Your favourite band is in town, and tickets are selling fast! Alas,
you were too late to snag one, so you put your name in the waitlist,
in case any extra tickets are released.
Write a program to manage the waitlist for the concert.
Your program should read in a list of the names in the waitlist, and
the number of extra tickets released.
Then, it should announce the names of people who score the extra
tickets.
Here's an example of how your program should work:
People in line: Dave, Lin, Toni, Markhela, Ravi
Number of extra tickets: 3
Tickets released for: Dave, Lin, Toni
Note: The names are separated
by a comma and a space (', ').
If there are no more tickets released, your program should work like
this:
People in line: Mali, Micha, Mary, Monica
Number of extra tickets: 0
Fully Booked!
This band is so popular that there will always be at least as many
people as extra tickets. You won't have to worry about index errors.
I have tried the following, but it always prints the entire list, not just a subset.
ppl = []
sep = ', '
ppl_in_line = input('People in line: ')
ppl.append(ppl_in_line)
x = int(input('Number of extra tickets: '))
if x == 0:
    print('Fully Booked!')
else:
    y = ppl[:x]
    print('Tickets released for: ' + (sep.join(y)))
ppl_in_line is a string. So when you append to ppl, you are appending a single string.
To enter a comma-separated list of people on a single line, do this:
ppl_in_line = input('People in line: ').split(sep)
You forgot to split your people in line into multiple elements:
ppl_in_line = input('People in line: ')
ppl = ppl_in_line.split(sep)
This is assuming that your input for People in line: is something like
Dave, Lin, Toni, Markhela, Ravi
If you want to use ppl.append, you have to enter the names one by one in a loop:
while True:
    ppl_in_line = input('People in line: ')
    if not ppl_in_line:
        break
    ppl.append(ppl_in_line)
You can then enter the names like this:
Dave
Lin
Toni
Markhela
Ravi
An empty input will finish the list.
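Putting the split-based answer together, here is a minimal sketch of the whole exercise with the logic in a function so it can be checked without typing input; release_tickets is a hypothetical name, and the ', ' separator comes from the task description.

```python
def release_tickets(line, extra):
    """Return the announcement for a waitlist like 'Dave, Lin, Toni'."""
    people = line.split(', ')  # one list element per name
    if extra == 0:
        return 'Fully Booked!'
    # The task guarantees at least `extra` people, so slicing is safe.
    return 'Tickets released for: ' + ', '.join(people[:extra])


if __name__ == '__main__':
    line = input('People in line: ')
    extra = int(input('Number of extra tickets: '))
    print(release_tickets(line, extra))
```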

Convert output received to a DataFrame in Python

I have selected some fields from a JSON file and saved each name along with its respective comments in order to do some preprocessing.
Below is the code:
import re
import json
import string  # needed for string.punctuation below
with open('C:/Users/User/Desktop/Coding/parsehubjsonfileeg/all.json', encoding='utf8') as f:
    data = json.load(f)
# dictionary for element which you want to keep
new_data = {'selection1': []}
print(new_data)
# copy item from old data to new data if it has 'reviews'
for item in data['selection1']:
    if 'reviews' in item:
        new_data['selection1'].append(item)
        print(item['reviews'])
        print('--')
# save in file
with open('output.json', 'w') as f:
    json.dump(new_data, f)
selection1 = new_data['selection1']
for item in selection1:
    name = item['name']
    print('>>>>>>>.', name)
    CommentID = item['reviews']
    for com in CommentID:
        comment = com['review'].lower()  # converting all to lowercase
        result = re.sub(r'\d+', '', comment)  # remove numbers
        results = (result.translate(
            str.maketrans('', '', string.punctuation))).strip()  # remove punctuations and white spaces
        comments = (results)
        print(comment)
My output is:
>>>>>>>. Heritage The Villas
we booked at villa valriche through mari deal for 2 nights and check-in was too lengthy (almost 2 hours) and we were requested to make a deposit of rs 10,000 or credit card which we were never informed about it upon booking.
lovely place to recharge.
one word: suoerb
definitely not a 5 star. extremely poor staff service.
>>>>>>>. Oasis Villas by Evaco Holidays
excellent
spent 3 days with my family and really enjoyed my stay. the advantage of oasis is its privacy - with 3 children under 6 years, going to dinner/breakfast at hotels is often a burden rather than an enjoyable experience.
staff were very friendly and welcoming. artee and menni made sure everything was fine and brought breakfast - warm croissants - every morning. atish made the check-in arrangements - and was fast and hassle free.
will definitely go again!
What should I do to convert this output into a DataFrame with the columns name and comment?
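One way to approach this, sketched under the assumption that you want one row per review: instead of printing inside the loop, collect a (name, comment) tuple for each review and hand the list to pandas at the end. The cleaning steps are the same ones used in the question; clean_review and reviews_to_records are hypothetical helper names.

```python
import re
import string


def clean_review(text):
    """Lowercase, drop digits and punctuation, and trim whitespace (as in the question)."""
    text = text.lower()
    text = re.sub(r'\d+', '', text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    return text.strip()


def reviews_to_records(selection1):
    """Build one (name, comment) tuple per review instead of printing them."""
    records = []
    for item in selection1:
        for com in item.get('reviews', []):
            records.append((item['name'], clean_review(com['review'])))
    return records
```

With pandas installed, records = reviews_to_records(new_data['selection1']) followed by pd.DataFrame(records, columns=['name', 'comment']) should give a two-column frame where each hotel name is repeated once per review.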

Issue in user input or text file data in sentiment analysis

I am new to Python and NLTK. I have written my code using the movie reviews data set.
When I use hard-coded sample text for sentiment analysis, it works fine, but when I take user input or read the data from a text file, it splits the text character by character.
For example,
when the sample text is hard-coded like
["Music was awesome", "Special effects are awesome"]
then the split looks like:
Review : Music was awesome
Review : Special effects are awesome.
But if I ask for user input or read the data from a text file, it shows each review as:
Review: M
Review: u
Review: S
Review: i
Review: c
Review: .
# Sample code for the text file:
t = open ("Sample1.txt", "r")
File_input = (t.read())
for review in File_input:
    print ("\nReview:", review)
    probdist = classifier.prob_classify(extract_features(review.split()))
    pred_sentiment = probdist.max()
    print ("Predicted sentiment:", pred_sentiment)
    print ("Probability:", round(probdist.prob(pred_sentiment), 5))
# Sample code for user input:
User_input = input("Enter your value: ")
for review in User_input:
    print ("\nReview:", review)
    probdist = classifier.prob_classify(extract_features(review.split()))
    pred_sentiment = probdist.max()
    print ("Predicted sentiment:", pred_sentiment)
    print ("Probability:", round(probdist.prob(pred_sentiment), 3))
Please advise.
Thanks!
The User_input variable is a string, so iterating over it iterates over its characters (the same applies to File_input, since t.read() also returns a single string). If it holds a single review, remove the for loop and treat User_input as the review. Otherwise, you could define a separator character between reviews and iterate like so:
for review in User_input.split(sep_char):
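A minimal sketch of that idea for both cases, assuming the text file stores one review per line; split_reviews is a hypothetical helper, and classifier and extract_features are the question's own objects, which are not shown here.

```python
from typing import List, Optional


def split_reviews(text: str, sep: Optional[str] = None) -> List[str]:
    """Split raw text into a list of review strings.

    With sep=None, each non-empty line counts as one review (assumed file layout);
    otherwise the text is split on the given separator.
    """
    parts = text.splitlines() if sep is None else text.split(sep)
    # Strip surrounding whitespace and skip empty pieces such as a trailing newline
    return [part.strip() for part in parts if part.strip()]


# Usage with the question's classifier and extract_features:
# with open("Sample1.txt") as t:
#     for review in split_reviews(t.read()):
#         probdist = classifier.prob_classify(extract_features(review.split()))
```

Iterating over the returned list yields whole reviews, so review.split() then produces words rather than single characters.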

How can I test for different strings in a string and act differently upon some of them? Python

What I want to do is look for different strings in a string and act differently upon some of them. This is what I have now:
import re
book = raw_input("What book do you want to read from today? ")
keywords = ["Genesis", "genesis", "Gen", "Gen.", "gen", "gen.", "Matthew", "matthew", "Matt", "Matt.", "matt", "matt." ]
if any(keyword in book for keyword in keywords):
    print("You chose the book of: " + book)
I plan to change the "print" at the end to another action later on. Basically, if the user inputs the string "Genesis", the program should take action #1, and it should also take action #1 for "Gen." and all the other variations of "Genesis". If the user inputs the string "Matthew", I want it to take action #2, and likewise for all the other variations of "Matthew".
I considered something like this:
book = raw_input("What book do you want to read from today? ")
if book == "Genesis":
    print("Genesis")
but that would require a lot of lines to cover all the variations of "genesis" I have listed.
I hope someone can help!
book = raw_input("What book do you want to read from today? ").lower().strip('.')
# keywords = ["Genesis", "genesis", "Gen", "Gen.", "gen", "gen.", "Matthew", "matthew", "Matt", "Matt.", "matt", "matt." ]
if book in ('genesis', 'gen'):
    # action1
    pass
elif book in ('matthew', 'matt'):
    # action2
    pass
else:
    print('Book not found!')
Using slices would still require you to write an if statement, but it would reduce the amount of code needed:
if book in keywords[:6]:
    print("Genesis")
You can use a for loop to test whether the input is contained in any of a small set of canonical keywords. Because the input is lowercased, whatever variation of the book the user types (e.g. "Gen" or "MATT") can be found inside its keyword, and you can take the action stored for that keyword:
actions = {...} # dictionary of functions
keywords = ['genesis', 'matthew', ...]
book = raw_input("What book do you want to read from today? ").lower().strip('.')  # strip('.') so "Gen." also matches
for kw in keywords:
    if book and book in kw:  # skip empty input, which would match every keyword
        actions[kw]() # take action!
        break # stop iteration
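A variation on the dictionary approach, sketched as an alternative rather than as any answerer's code: map every accepted spelling to one canonical book, then map each book to a function. An exact lookup avoids substring surprises (for example, "e" would be "in" both keywords). The alias table and the read_genesis/read_matthew functions are hypothetical placeholders for the real actions; the sketch is written for Python 3, so under Python 2 you would pass it the result of raw_input.

```python
# Hypothetical alias table: every accepted spelling maps to one canonical book name.
ALIASES = {
    'genesis': 'genesis',
    'gen': 'genesis',
    'matthew': 'matthew',
    'matt': 'matthew',
}


def read_genesis():
    return 'Reading from Genesis'  # placeholder for action #1


def read_matthew():
    return 'Reading from Matthew'  # placeholder for action #2


ACTIONS = {'genesis': read_genesis, 'matthew': read_matthew}


def choose_book(raw):
    """Normalise case, whitespace and a trailing '.', then dispatch."""
    key = raw.strip().lower().rstrip('.')
    book = ALIASES.get(key)
    if book is None:
        return 'Book not found!'
    return ACTIONS[book]()


if __name__ == '__main__':
    print(choose_book(input('What book do you want to read from today? ')))
```

Adding another book then means adding its spellings to ALIASES and one entry to ACTIONS, with no new if branches.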
