Parsing and grouping text in a string using Python


I need to parse a series of short strings that consist of three parts: a question and two possible answers. The strings follow a consistent format:
This is the question "answer_option_1 is in quotes" "answer_option_2 is in quotes"
I need to identify the question part and the two possible answers, which are in single or double quotes.
Examples:
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'
How do I do this in python?

>>> import re
>>> s = "Who will win the game 'Michigan' 'Ohio State'"
>>> re.match(r'(.+)\s+([\'"])(.+?)\2\s+([\'"])(.+?)\4', s).groups()
('Who will win the game', "'", 'Michigan', "'", 'Ohio State')
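A variant of the regex above, sketched under the assumption that an optional literal "or" may sit between the two answers, uses named groups so the quote characters don't clutter the result. The function name parse_question is hypothetical, and a question that itself contains a quote character (such as an apostrophe) can still be mis-split.

```python
import re

# Groups 2 and 4 capture the opening quote of each answer, so the
# backreferences \2 and \4 require the matching closing quote.
QA_PATTERN = re.compile(
    r'^(?P<question>.+?)\s*'
    r'(["\'])(?P<a1>.+?)\2'      # first answer, closed by the same quote (\2)
    r'\s+(?:or\s+)?'             # optional literal "or" between the answers
    r'(["\'])(?P<a2>.+?)\4\s*$'  # second answer, closed by the same quote (\4)
)


def parse_question(line):
    """Return (question, answer1, answer2), or None if the line doesn't fit."""
    m = QA_PATTERN.match(line.strip())
    if m is None:
        return None
    return m.group('question', 'a1', 'a2')
```

Because the question part is matched lazily, it stops just before the whitespace preceding the first quoted answer, so no trailing space ends up in the question.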

If your format is as simple as you say (i.e. not as in your examples), you don't need a regex. Just split the line:
>>> line = 'What color is the sky today? "blue" "grey"'.strip('"')
>>> questions, answers = line.split('"', 1)
>>> answer1, answer2 = answers.split('" "')
>>> questions
'What color is the sky today? '
>>> answer1
'blue'
>>> answer2
'grey'
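The same split idea can be extended to either quote character. This is a sketch that assumes the strict format (no "or", and the line ends with the closing quote of the second answer); split_question is a hypothetical name, and lines that break the assumption raise ValueError.

```python
def split_question(line):
    """Split 'question "a1" "a2"' (or the single-quoted form) into three parts."""
    line = line.strip()
    # Assumption: the last character is the closing quote of answer 2.
    if not line or line[-1] not in '"\'':
        raise ValueError('line does not end with a quote: %r' % line)
    quote = line[-1]
    # Drop the final quote, then split off the question at the first quote.
    question, _, answers = line[:-1].partition(quote)
    # The answers are separated by closing quote, space, opening quote;
    # anything else makes this unpacking raise ValueError.
    answer1, answer2 = answers.split(quote + ' ' + quote)
    return question.strip(), answer1, answer2
```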

One possibility is to use a regex:
import re
robj = re.compile(r'^(.*) [\"\'](.*)[\"\'].*[\"\'](.*)[\"\']')
str1 = "Who will win the game 'Michigan' 'Ohio State'"
r1 = robj.match(str1)
print(r1.groups())
str2 = 'What color is the sky today? "blue" or "grey"'
r2 = robj.match(str2)
print(r2.groups())
Output:
('Who will win the game', 'Michigan', 'Ohio State')
('What color is the sky today?', 'blue', 'grey')

Pyparsing will give you a solution that will adapt to some variability in the input text:
questions = """\
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'""".splitlines()
from pyparsing import *
# strip the enclosing quotes from every matched quoted string
quotedString.setParseAction(removeQuotes)
# question = everything up to the first quoted string;
# answers = quoted strings, optionally separated by "or"
q_and_a = SkipTo(quotedString)("Q") + delimitedList(quotedString, Optional("or"))("A")
for qn in questions:
    print(qn)
    qa = q_and_a.parseString(qn)
    print("qa.Q", qa.Q)
    print("qa.A", qa.A)
    print()
Will print:
What color is the sky today? "blue" or "grey"
qa.Q What color is the sky today?
qa.A ['blue', 'grey']
Who will win the game 'Michigan' 'Ohio State'
qa.Q Who will win the game
qa.A ['Michigan', 'Ohio State']

Related

Picking a restaurant based off mood

I'm trying to have Python look at a list of restaurants I've made, each one in a category (e.g. McDonalds, Burger King, Canes are all in the american category). I would then like to input my mood (american, asian, chicken, etc.) and have Python choose a place at random from that category. But so far it just chooses a place at random, no matter what I enter.
import random
# create list of food choices
american = ['Mcdonalds', 'Burger King', 'Culvers', 'Wendys', 'KFC']
asian = ['Panda Express', 'Miso Japan', 'Sakura']
burger = ['Mcdonalds', 'Burger King', 'Culvers', 'Wendys']
chicken = ['Canes', 'KFC']
healthy = ['Panera']
rest = [american, asian, burger, chicken, healthy]
# pick random restaurant based off mood
print('What Mood Are You In?')
mood = input()
mood_choice = random.choice(rest)
final_choice = random.choice(mood_choice)
print(final_choice)

You should use a dict, storing the different categories as the keys and the lists of restaurants as the values. Then you can ask for user input and use it as the key to get the list of restaurants, which you can then pass to random.choice to get the desired result.
import random
food = {
"american" : ['Mcdonalds', 'Burger King', 'Culvers', 'Wendys', 'KFC'],
"asian" : ['Panda Express', 'Miso Japan', 'Sakura'],
"burger" : ['Mcdonalds', 'Burger King', 'Culvers', 'Wendys'],
"chicken" : ['Canes', 'KFC'],
"healthy" : ['Panera']
}
mood = input('What Mood Are You In? ')
choices = food[mood.lower()] # use str.lower in case input is 'Healthy'
mood_choice = random.choice(choices)
print(mood_choice)
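As written, food[mood.lower()] raises KeyError if the user types a mood that isn't a key. A minimal sketch, assuming you'd rather get None back (pick_restaurant is a hypothetical name), uses dict.get; the optional rng parameter only exists to make the choice easy to test.

```python
import random

food = {
    "american": ['Mcdonalds', 'Burger King', 'Culvers', 'Wendys', 'KFC'],
    "asian": ['Panda Express', 'Miso Japan', 'Sakura'],
    "burger": ['Mcdonalds', 'Burger King', 'Culvers', 'Wendys'],
    "chicken": ['Canes', 'KFC'],
    "healthy": ['Panera'],
}


def pick_restaurant(mood, food=food, rng=random):
    """Return a random restaurant for the mood, or None if the mood is unknown."""
    # strip() and lower() so ' Healthy ' still matches the 'healthy' key
    choices = food.get(mood.strip().lower())
    if not choices:
        return None
    return rng.choice(choices)
```

In the interactive script you would call pick_restaurant(input('What Mood Are You In? ')) and, when it returns None, print a hint such as ', '.join(sorted(food)) listing the valid moods.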

if/else: get condition met - Python

animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
if text1 in animals or text2 in animals or text3 in animals:
    print(text2)  # because it was met in the if/else statement!
I simplified the example, but the animals string will be updated every time.
What is the best and easiest way to achieve this without so many if/else statements in my code?

You can use regex.
import re
pattern = '|'.join([text1, text2, text3])
# pattern -> 'brown dog|white cat|fat cow'
res = re.findall(pattern, animals)
print(res)
# ['white cat']
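If the search texts might contain regex metacharacters such as . or +, escaping each one with re.escape before joining keeps them literal. A small sketch; find_texts is a hypothetical helper name.

```python
import re


def find_texts(texts, haystack):
    """Return every occurrence of any of the literal strings in texts."""
    if not texts:
        return []  # an empty pattern would match the empty string everywhere
    # Try longer strings first so 'white cat' wins over 'white' at the same spot;
    # re.escape makes characters like '.' match only themselves.
    ordered = sorted(texts, key=len, reverse=True)
    pattern = '|'.join(re.escape(t) for t in ordered)
    return re.findall(pattern, haystack)


animals = 'silly monkey small bee white cat'
print(find_texts(['brown dog', 'white cat', 'fat cow'], animals))  # ['white cat']
```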

ANY time you have a set of variables of the form xxx1, xxx2, and xxx3, you need to convert that to a list.
animals = 'silly monkey small bee white cat'
text = [
    'brown dog',
    'white cat',
    'fat cow'
]
for t in text:
    if t in animals:
        print("Found", t)

Use a loop to check each case:
animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
for text in [text1, text2, text3]:
    if text in animals:
        print(text)
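Building on the list-based answers, a list comprehension collects every matching text in one expression, and next() with a default returns just the first match (or None). The names texts, matches and first_match are illustrative.

```python
animals = 'silly monkey small bee white cat'
texts = ['brown dog', 'white cat', 'fat cow']

# every text that occurs somewhere in animals
matches = [t for t in texts if t in animals]
print(matches)  # ['white cat']

# only the first matching text, or None if nothing matches
first_match = next((t for t in texts if t in animals), None)
print(first_match)  # white cat
```

Note that in on two strings is a substring test, so a search text like 'cat' would also match inside 'white cat'.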

Poem-writing and getting error for re-using the same function - Python

I'm doing this for a project for which I need to do some web scraping, specifically from Wikipedia. This is the second phase of the project: I need to create a poem about a person that the user enters (the person has to have a Wikipedia page). I am using the Datamuse API from Python to get some rhyming words, which works really well.
Function ->
import requests
def get_10_rhyme_words(word):
    key = 'https://api.datamuse.com/words?rel_rhy='
    rhyme_words = []
    rhymes = requests.get(key + word)
    for i in rhymes.json()[0:10]:
        rhyme_words.append(i['word'])
    return rhyme_words
The criteria for the poem are that it needs to be at least 50 words long and make sense, so I came up with something like this:
“firstName” is nothing like “nameWord1”,
but it sounds a lot like “nameWord2”.
“genderPronoun” is a “professionFinal”,
Which sounds a lot like “professionWord1”.
“genderPronoun”’s favourite food might be waffles,
But it might also be “foodWord1”.
I now close this poem about the gorgeous “firstName”,
By saying “genderPronoun”’s name sounds a lot like “nameWord3”.
professionFinal is the variable that holds their profession.
It works well for the name, but I get an IndexError every time I run it for the profession.
Name ->
The name poem
Here is a short poem on Serena:
Serena is nothing like hyena, but it sounds a lot like marina.
Profession ->
The Profession Poem (Error)
Here is a short poem on Serena:
Traceback (most recent call last):
  File "main.py", line 153, in <module>
    line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
  File "/usr/lib/python3.8/random.py", line 290, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
Here is the code I am using to make the poem ->
import random  # needed for random.choice

#Writing a poem about the person
firstName = person.split()[0]
foodWord = 'waffles'
print('\nHere is a short poem on {}:\n'.format(firstName))
nameRhymes = get_10_rhyme_words(firstName)
professionRhymes = get_10_rhyme_words(professionFinal)
foodRhymes = get_10_rhyme_words(foodWord)
if gender == 'Male':
    heOrShe = 'He'
else:
    heOrShe = 'She'
if gender == 'Male':
    himOrHer = 'Him'
else:
    himOrHer = 'Her'
line1 = firstName + ' is nothing like ' + random.choice(nameRhymes) + ','
line2 = 'but it sounds a lot like ' + random.choice(nameRhymes) + '.'
line3 = heOrShe + ' is a ' + professionFinal + ','
line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
line5 = heOrShe + '\'s favourite food might be ' + foodWord + ','  # use the variable, not the literal text 'foodWord'
line6 = 'but it might also be ' + random.choice(foodRhymes) + '.'
line7 = 'I now close this poem about the gorgeous {},'.format(firstName)
line8 = 'By saying {0}\'s name sounds a lot like {1}'.format(firstName, random.choice(nameRhymes))
print(line1)
print(line2)
print(line3)
print(line4)
print(line5)
print(line6)
print(line7)
print(line8)
(Please ignore the inconsistency and the lack of a loop for printing each line.)
How do I avoid this error? Frankly, I don't even know why I'm getting it...
Thanks!
(P.S.) Sorry for making it this long. Bye!

You should add a check on what the request returns. If it returns an empty list, that list cannot be passed to random.choice(), which requires a sequence with at least one item.
This part of the error:
line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
File "/usr/lib/python3.8/random.py", line 290, in choice
raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
means that professionRhymes is probably an empty list, i.e. the request returned no rhymes for professionFinal.
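One way to add that check, sketched here with a hypothetical helper choice_or_default, is to fall back to a fixed word when the rhyme list is empty; the fallback phrase itself is an arbitrary assumption.

```python
import random


def choice_or_default(seq, default, rng=random):
    """Like random.choice, but return default instead of raising on an empty sequence."""
    if not seq:
        return default
    return rng.choice(seq)


# For example, if the rhyme request came back empty:
professionRhymes = []
line4 = 'which sounds a lot like ' + choice_or_default(professionRhymes, 'nothing at all') + '.'
print(line4)  # which sounds a lot like nothing at all.
```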

Thanks to everyone who responded. The consensus was enough to make me print the list, and I saw that it comes up empty. Sadly, I am using repl and it doesn't have a debugger. But thanks guys, I found the problem and will alter my poem to suit the needs of the program. As for the people asking about the code: I only needed to check whether their profession was scientist, sportsperson, or politician. So I made a list of keywords for each profession, used a for loop to check the words of the summary against those keywords, and then picked the matching profession. That is what professionFinal was.
Code:
#Finding their profession
#Declaring keywords for each profession
sportspersonKeywords = ['Sportsperson', 'Sportsman', 'Sportsman', 'Sports', 'Sport', 'Coach', 'Game', 'Olympics', 'Paralympics', 'Medal', 'Bronze', 'Silver', 'Gold', 'Player', 'sportsperson', 'sportsman', 'sportsman', 'sports', 'sport', 'coach', 'game', 'olympics', 'paralympics', 'medal', 'bronze', 'silver', 'gold', 'player', 'footballer', 'Footballer']
scientistKeywords = ['Scientist', 'Mathematician', 'Chemistry', 'Biology', 'Physics', 'Nobel Prize', 'Invention', 'Discovery', 'Invented', 'Discovered', 'science', 'scientist', 'mathematician', 'chemistry', 'biology', 'physics', 'nobel prize', 'invention', 'discovery', 'invented', 'discovered', 'science', 'Physicist', 'physicist', 'chemist', 'Chemist', 'Biologist', 'biologist']
politicianKeywords = ['Politician', 'Politics', 'Election', 'President', 'Vice-President', 'Vice President', 'Senate', 'Senator', 'Representative', 'Democracy', 'politician', 'politics', 'election', 'president', 'vice-president', 'vice president', 'senate', 'senator', 'representative', 'democracy']
#Declaring the first sentence (from the summary)
firstSentence = summary.split('.')[0]
profession = ['Scientist', 'Sportsperson', 'Politician']
professionFinal = ''
#Splitting the first sentence of the summary into separate words
firstSentenceList = firstSentence.split()
#Checking each word in the first sentence against the keywords in each profession to try to get a match
for i in firstSentenceList:
    if i in sportspersonKeywords:
        professionFinal = profession[1]
        break
    elif i in scientistKeywords:
        professionFinal = profession[0]
        break
    elif i in politicianKeywords:
        professionFinal = profession[2]
        break
#if a match is found, then that person has that profession, if not, then their profession is not in our parameters
if professionFinal == '':
    print('[PROFESSION]: NOT A SPORTPERSON, SCIENTIST, OR POLITICIAN')
else:
    print('[PROFESSION]: ' + professionFinal)
Thanks guys!
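The loop above only matches words exactly as written, so a word with trailing punctuation (for example "footballer,") never matches, and two-word keywords such as 'Nobel Prize' can never equal a single word. A possible refinement, sketched with the question's keywords lower-cased and de-duplicated, lower-cases the sentence, strips punctuation from each word, and then falls back to a substring check for multi-word keywords. find_profession and PROFESSION_KEYWORDS are illustrative names.

```python
import string

# The question's keywords, lower-cased and de-duplicated. Dict order
# (Sportsperson, Scientist, Politician) mirrors the original if/elif priority.
PROFESSION_KEYWORDS = {
    'Sportsperson': {
        'sportsperson', 'sportsman', 'sports', 'sport', 'coach', 'game',
        'olympics', 'paralympics', 'medal', 'bronze', 'silver', 'gold',
        'player', 'footballer',
    },
    'Scientist': {
        'scientist', 'mathematician', 'chemistry', 'biology', 'physics',
        'nobel prize', 'invention', 'discovery', 'invented', 'discovered',
        'science', 'physicist', 'chemist', 'biologist',
    },
    'Politician': {
        'politician', 'politics', 'election', 'president', 'vice-president',
        'vice president', 'senate', 'senator', 'representative', 'democracy',
    },
}


def find_profession(summary):
    """Return 'Sportsperson', 'Scientist', 'Politician', or '' if none match."""
    first_sentence = summary.split('.')[0].lower()
    # strip leading/trailing punctuation so 'footballer,' matches 'footballer'
    words = [w.strip(string.punctuation) for w in first_sentence.split()]
    for word in words:
        for profession, keywords in PROFESSION_KEYWORDS.items():
            if word in keywords:
                return profession
    # multi-word keywords such as 'nobel prize' can never equal a single word,
    # so check them as substrings of the sentence instead
    for profession, keywords in PROFESSION_KEYWORDS.items():
        if any(' ' in k and k in first_sentence for k in keywords):
            return profession
    return ''
```

As in the original, an abbreviation such as "Dr." still cuts the first sentence short, because the split is on every period.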

Why isn't this if statement returning True?

I'm making a program that counts how many times a band has played a song, using a webpage of all their setlists. I have grabbed the webpage and converted all the songs played into one big list, so all I wanted to do was check whether the song name is in the list and add to a counter, but it isn't working and I can't figure out why.
I've tried using the count method instead, and that didn't work either.
sugaree_counter = 0
link = 'https://www.cs.cmu.edu/~mleone/gdead/dead-sets/' + year + '/' + month+ '-' + day + '-' + year + '.txt'
page = requests.get(link)
page_text = page.text
page_list = [page_text.split('\n')]
print(page_list)
This code returns the list:
[['Winterland Arena, San Francisco, CA (1/2/72)', '', "Truckin'", 'Sugaree',
'Mr. Charlie', 'Beat it on Down the Line', 'Loser', 'Jack Straw',
'Chinatown Shuffle', 'Your Love At Home', 'Tennessee Jed', 'El Paso',
'You Win Again', 'Big Railroad Blues', 'Mexicali Blues',
'Playing in the Band', 'Next Time You See Me', 'Brown Eyed Women',
'Casey Jones', '', "Good Lovin'", 'China Cat Sunflower', 'I Know You Rider',
"Good Lovin'", 'Ramble On Rose', 'Sugar Magnolia', 'Not Fade Away',
"Goin' Down the Road Feeling Bad", 'Not Fade Away', '',
'One More Saturday Night', '', '']]
But when I do:
sugaree_counter = int(sugaree_counter)
if 'Sugaree' in page_list:
sugaree_counter += 1
print(str(sugaree_counter))
It is always zero.
It should be 1, because 'Sugaree' is in that list.

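Your page_list is a list of lists: its only element is the list of lines, so 'Sugaree' in page_list compares the string against that inner list and is always False. You need two for loops, one over the pages and one over the items in each page:
for page in page_list:
    for item in page:
        if item == 'Sugaree':
            sugaree_counter += 1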

Use sum() and a list comprehension:
sugaree_counter = sum([page.count('Sugaree') for page in page_list])
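The root cause is the extra brackets in page_list = [page_text.split('\n')]. Without them the result is a flat list of lines, and list.count works directly. A sketch, assuming the page text looks like the listing above; count_song is a hypothetical name, and strip() guards against stray spaces or carriage returns at line ends.

```python
def count_song(page_text, song):
    """Count how many lines of a setlist page are exactly the given song title."""
    # splitlines() gives a flat list of lines; strip() removes stray whitespace
    songs = [line.strip() for line in page_text.splitlines()]
    return songs.count(song)


sample = "Winterland Arena, San Francisco, CA (1/2/72)\n\nTruckin'\nSugaree\nMr. Charlie\n"
print(count_song(sample, 'Sugaree'))  # 1
```

To tally every song at once, collections.Counter over the same stripped lines gives a mapping from title to count.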

IndexError: list index out of range with tkinter and procedures

I've recently started using tkinter alongside Python, and as part of an assignment I've been asked to produce a tkinter-based program.
I decided to attempt a quiz which has three difficulties and chooses a question at random from the list available for that difficulty.
When I run the program and select the difficulty (easy is all I've done up to now) it continues running but I get an error within the IDE:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\Tkinter.py", line 1536, in __call__
return self.func(*args)
File "E:\Unit 6 - Software Development\randomtkintergame.py", line 98, in easy_mode
question, answer, wrong_1, wrong_2, wrong_3 = easy[question_choice]
IndexError: list index out of range
This is the code I have:
import Tkinter as tk
from tkSimpleDialog import *
import math
import string
import time
from random import randint
global question_title, button_1, button_2, button_3, button_4, question_number, score_count, easy, question_choice, question, answer, wrong_1, wrong_2, wrong_3
question_number = 1
score_count = 0
question_choice = 0
question = 0
answer = 0
wrong_1 = 0
wrong_2 = 0
wrong_3 = 0
easy = [("What is the longest river in the world?", "The River Nile", "The Amazon River", "The River Humber", "The River Trent"),
("What is Doctor Who`s time box called?", "TARDIS", "TIMEY-WIMEY", "TASDIR", "WIMEY-TIMEY"),
("How many faces are on a die?", "6", "5", "7", "4"),
("How many wives did Henry VIII have?", "6", "8", "4", "9"),
("What is the square root of 169?", "13", "11", "17", "19"),
("In a game of chess, what is the only piece able to jump over other pieces?", "Knight", "Pawn", "Bishop", "All"),
("Who is the author of the `Harry Potter` books?", "J.K. Rowling", "J.R.R. Tolkien", "George R.R. Martin", "Julius Caesar"),
("What is the name of the clockwork device used by musicians to measure time?", "Metronome", "Tuner", "Amplifier", "Time Measurer"),
("Which two colours are Dennis the Menace`s jumper?", "Red and black", "Blue and black", "White and gold", "Red and blue"),
("An octagon has how many sides?", "8", "6", "10", "7"),
("Which sign of the zodiac is represented by the Ram?", "Aries", "Scorpio", "Ophiuchus", "Aquarius"),
("Which animal is associated with the beginning of an MGM film?", "A lion", "An alpaca", "A very small albatross", "A tiger"),
("What was the hunchback of Notre Dame`s name?", "Quasimodo", "Esmerelda", "Frollo", "Not Re Dame"),
("Who is the animated star of the computer game Tomb Raider?", "Lara Croft", "Sara Craft", "Tom Cruise", "Bill Gates"),
("What is the name of the city in which The Simpsons live?", "Springfield", "Quahog", "South Park", "Boston"),
("In which film would you first have come across the character of Marty McFly?", "Back to the Future", "Lord of the Rings", "The IT Crowd", "Harry Potter"),
("How many years are there in a millennium?", "1,000", "10,000", "100", "1,000,000"),
("In Greek mythology, what was Medusa`s hair made of?", "Snakes", "Threads of silk", "Stone", "Leeches"),
("What is the first letter of the Greek alphabet?", "A - Alpha", "B - Beta", "G - Gamma", "E - Epsilon"),
("What type of animal was Stuart, in the 1999 film `Stuart Little`?", "Mouse", "Frog", "Guinea pig", "Porcupine"),
("What creature appears on the flag of Wales?", "Dragon", "Alligator", "Crocodile", "Lizard"),
("On what part of the body would you wear a `sombrero`?", "Head", "Feet", "Hands", "Chest"),
("How many wheels are on a tricycle?", "3", "2", "6", "8"),
("Oxygen and which other element makes up water?", "Hydrogen", "Helium", "Ytterbium", "Einsteinium"),
("How many inches are in a yard?", "36", "12", "8", "24"),
("What colour is an emerald?", "Green", "Black", "Orange", "White")]
def easy_mode():
    global button_1, button_2, button_3, button_4, question_number, question_title, score_count, easy, question_choice, question, answer, wrong_1, wrong_2, wrong_3
    repeat = True
    while repeat == True:
        for i in range (0, len(easy)): #the code works through all the questions
            question_choice = randint(0, 25)#generate random int
            question, answer, wrong_1, wrong_2, wrong_3 = easy[question_choice]
            easy.pop(question_choice)
root = tk.Tk()
root.geometry("700x500")
root.title("Educational Quiz")
root.configure(background="#f2e5ff")
question_title = tk.Label(root, text="Please select a difficulty.", relief=GROOVE, bd=5, bg="#66b2ff", fg="black", font=("Calibri Light", 20))
question_title.place(x = 50, y = 25, width=625, height=80)
button_1 = tk.Button(root, text = "EASY", relief=GROOVE, bd=5, command = easy_mode, bg="#0055ff", fg="black", font=("Calibri Light", 20))
button_1.place (x = 50, y = 180 , width=300, height=80)
Of course, it's not the entire code as I didn't want to paste things that aren't actually related to the problem, but I can provide the rest if needed.
(Also, I might not need all the global variables set, but I'll deal with that when it's needed.)

The problem is that you keep choosing indices from the original range (0 to 25) while the list shrinks with every pop, so eventually you pick an index that is too large. You could replace the while loop with
while easy:
    question_choice = randint(0, len(easy)-1) #generate random int
    question, answer, wrong_1, wrong_2, wrong_3 = easy[question_choice]
    easy.pop(question_choice)
but even better, I think, is to present all questions in random order:
import random  # the question's code only imports randint
random.shuffle(easy)
for q_and_a in easy:
    question, answer, wrong_1, wrong_2, wrong_3 = q_and_a
Neither replacement is tested, so there may be a typo.
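A related option, sketched in Python 3 with a hypothetical generator iter_questions, uses random.sample instead of shuffle so the original question list is never modified; the rng parameter only exists so a seeded random.Random can be passed in for testing.

```python
import random


def iter_questions(questions, rng=random):
    """Yield each question tuple exactly once, in random order.

    rng.sample builds a new shuffled list, so the caller's list is left
    untouched (unlike easy.pop(), which empties the original list).
    """
    for q_and_a in rng.sample(questions, k=len(questions)):
        yield q_and_a
```

Each yielded tuple unpacks as question, answer, wrong_1, wrong_2, wrong_3 = q_and_a. In a tkinter program you would probably call next() on such an iterator from a button callback to show one question at a time, rather than looping through every question inside a single callback; that GUI wiring isn't shown here.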
