Comparing multiple strings - python

Hey, I am new and need some help with comparing strings.
My assignment is to make a chatbot that reads rules from a text file. Each rule lists a possible input and the output it should produce.
My problem is that the bot has to choose the best-matching rule from the text file (easy, yeah?), but it must also save parts of the input to variables at the same time.
For example, one of the lines in the rules file is:
you <w1> <w2> <w3> <w4> me | What makes you think I <w1> <w2> <w3> <w4> you?
You must save <w1> and so on to variables.
The input can also be something like "did you know that you are really nice to me", so the code has to handle extra words around the pattern as well.
We also can't write the code just for this text file; it is supposed to adjust to anything that is put into the text file.
Can someone help me?
This is what I have so far:
import string
import sys
import difflib
#File path:
rules = open("rules.txt", "rU")
#Set some var's:
currentField = 0
fieldEnd = 0
questions = []
responses = []
Input = ""
run = True
#Check if we are not at the end of the file:
for line in rules:
    linem = line.split(" | ")
    question = linem[0]
    response = linem[1]
    questions.append(question.replace("\n", ""))
    responses.append(response.replace("\n", ""))
print questions
print responses
for q in questions:
    qwords.appendq.split()
while run = True:
    Input = raw_input('> ').capitalize()
    for char in Input:
        for quest in questions:
            if char in quest:
                n += 1
            else:
                if "<" in i:
                    n += 1
    closestQuestion = questions.index(q)
    print response

I would prefer pyparsing over any regex-based approach to tackle this task. It's easier to construct a readable parser even for more involved and complex grammars.

As a quick-and-dirty solution, parse the input file and store the entries in a list. Each entry should contain a dynamically compiled "matching regex" (e.g. r'(?i)you (\w+) (\w+) (\w+) (\w+) me') and a "replacement string" (e.g. r'What makes you think I \1 \2 \3 \4 you?'). For each incoming request, the chatbot should match the text against the regex list, find the appropriate entry, and then call regex.sub() with the "replacement string".
But first of all, read a beginner's tutorial on Python. Your code is un-pythonic and wrong in many ways.
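A minimal sketch of the regex idea above, written for Python 3. The names compile_rule, respond and RULE_LINES are made up for illustration, and the rule is given inline instead of being read from rules.txt. It assumes every placeholder has the form <wN> and stands for exactly one word.

```python
import re

# Matches placeholders such as <w1>, <w2>, ... in a rule line (assumed format).
PLACEHOLDER_RE = re.compile(r"<w(\d+)>")

def compile_rule(line):
    """Split 'pattern | response' and compile the pattern side to a regex."""
    pattern_text, response_text = (part.strip() for part in line.split("|", 1))
    regex_parts = []
    # Splitting on a capturing group alternates literal text and placeholder numbers.
    for index, part in enumerate(PLACEHOLDER_RE.split(pattern_text)):
        if index % 2 == 0:
            regex_parts.append(re.escape(part))       # literal text from the rule
        else:
            regex_parts.append(rf"(?P<w{part}>\w+)")  # one word, saved as group wN
    return re.compile("".join(regex_parts), re.IGNORECASE), response_text

def respond(rules, user_input):
    """Return the response of the first rule whose pattern occurs in the input."""
    for regex, response_text in rules:
        match = regex.search(user_input)
        if match:
            # Fill each <wN> in the response with the word captured for it.
            return PLACEHOLDER_RE.sub(
                lambda m: match.group("w" + m.group(1)), response_text
            )
    return None

# Hypothetical stand-in for the lines of rules.txt.
RULE_LINES = ["you <w1> <w2> <w3> <w4> me | What makes you think I <w1> <w2> <w3> <w4> you?"]
rules = [compile_rule(line) for line in RULE_LINES]
print(respond(rules, "did you know that you are really nice to me"))
```

Because search() is used instead of match(), extra words before the pattern ("did you know that ...") are skipped. This sketch simply takes the first rule that matches; picking the "most suited" rule would need some scoring on top.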

Related

how to make input in python non-punctuation sensitive

I'm trying to make a French practice chatbot, but users may not type the punctuation that the code expects. How do I fix this? For example, instead of "bonjour!" they should be able to type just "bonjour", with no punctuation sensitivity. This may not be possible without a module, but I don't mind if it requires one. This is my code:
#menu starts first
line1 = ("=-----------=")
line2 = ("-------------")
bots = ["Adam", "Marie", "James", "Joy"]
def conversation():
    print("For your conversation, Choose a Bot")
    print(line1)
    for bot in bots:
        print(bot)
    print(line2)
    botopt = input("option: ")
    if botopt==("Adam").lower():
        print("adam: Bonjour!")
        msg = input("Type a message: ")
        if msg==("Bonjour! Comment ca va?").lower():
            print("adam: Très bien. et toi?")
            msg = input("Type a message: ")
            if msg==("Tres bien. Comment tu t'appelle?").lower():
                print("adam: Je m'appelle Adam. Et toi?")
                input("Name: ")
                print("adam: Enchante. ")
def menu():
    print("Bonjour! Please choose an area to practice on.")
    wordoption = ["Verbs"]
    placeoption = ["Restaurant", "Store", "Conversation"]
    #pour le optionnes
    print("Vocabulary")
    print("=----------=")
    for wopt in wordoption:
        print(wopt)
    print("=---------=")
    print("Locations")
    print("-----------")
    for popt in placeoption:
        print(popt)
    print("--------------")
    opt = input("option: ")
    if opt==("Conversation").lower():
        conversation()
menu()
I can't find info online, so I am not sure what to do. Please try to help me, and point me to modules if needed.
I'm also new to python, so if my suggestion(s) do not seem adequate to you, then don't use them. I use Python v3.10.2, and tested my solution in IDLE shell v3.10.2.
In the header portion of your code, you typed:
line1 = ("=-----------=")
and line2 = ("-------------")
I believe these are global variables, because these appear outside a defined function. You used these effectively in conversation(), your first function:
print(line1)
for bot in bots:
print(bot)
print(line2)
But, you did not use these in menu(), the second function. Why not? If for no other reason, I suggest you use it in the second function as well for consistency. This should improve your programming style, and make it easier for others to understand your code.
For the punctuation problem, I think you can probably work this out without the need of an additional module. If I understand the problem correctly, you don't want users to enter punctuation of any kind (! , . : ; - " ') in response to your code. Is this right? If so, you want users to be confined to using letters A-Z or a-z and nothing else in their response.
You could create another function that takes a single argument, such as:
def Response(msg):
    # build up the processed text with punctuation removed
    rsp = ''
    #msg = "Bonjour!"
    for c in range(len(msg)):
        ch = msg[c] # index of 1st character in msg string = 0, 2nd = 1, 3rd = 2, etc. . .
        # add more punctuation characters to this string as needed
        if ch in '!,.:;-"\'':
            continue # skip punctuation instead of copying it
        rsp = rsp + ch
    return rsp
Spaces are not in the skipped set, so this works for full sentences as well as single words. Compare the returned string against expected answers that have been stripped the same way.
Also, the French comment in the menu() definition should read:
pour les options
I'll try my best to clearly answer your question.
The Problem
First, your code compares strings in such a way that the capitalization and punctuation of the input have to match those of the strings you're checking it against.
To fix that, I recommend making both the input string AND the strings you check it against lowercase and unpunctuated.
To do that, you can either choose to manually adjust all your checking strings, or you can just use str.lower().
To accept only unpunctuated, lowercase text from the user, you can write your own function customInput() that works like input() but removes all punctuation and lowercases the text before returning it.
Normal code
Here is the code, feel free to use/modify:
# you NEED to import this for the unpunctuation to work
from string import punctuation
# this is a function that works like input(), but removes all punctuation and makes the input lowercase
def customInput(text:str):
    inputText = input(text).lower()
    for character in punctuation:
        inputText = inputText.replace(character,'')
    return inputText
# do remember to use str.lower() on the strings you check the input with
Code for also removing numbers
If you want to remove numbers too from the input, then you can use this code:
# you NEED to import these for the unpunctuation to work
from string import punctuation, digits
# this is a function that works like input(), but removes all punctuation and numbers, and makes the input lowercase
def customInput(text:str):
    inputText = input(text).lower()
    for character in punctuation + digits:
        inputText = inputText.replace(character,'')
    return inputText
# do remember to use str.lower() on the strings you check the input with
Example usage
# an example of using customInput()
example = customInput('Saisissez une réponse: ')
# printing answer out
print(example)
# There will be no punctuation/capitalization in the printed output
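The expected answers need the same treatment as the input, otherwise "bonjour comment ca va" will never equal "Bonjour! Comment ca va?". One way to do that, as a rough sketch, is to push both sides through the same helper. The names normalize and matches below are made up for illustration. Note that string.punctuation only covers ASCII punctuation and does not touch accents, so "Très" and "Tres" still count as different.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TABLE).lower().split())

def matches(user_text, expected):
    """Compare what the user typed with an expected answer, ignoring case and punctuation."""
    return normalize(user_text) == normalize(expected)

print(matches("bonjour comment ca va", "Bonjour! Comment ca va?"))
```

With this, a check such as if matches(msg, "Bonjour! Comment ca va?"): can replace the == comparisons in conversation().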
Hope this helps! If it doesn't, comment and I will update the answer.

How to have the .isupper() and .islower() methods in one line of code?

Working on some code that reads a csv file and gets the count of a searchWord the user looks for. The function shouldn't be case sensitive, so if a user wants to find the word: trampoline, it will catch TRAMPOLINE, Trampoline, etc.
I was wondering if it was possible to add in .isupper() and .islower() in the same argument, to simplify the code? I get the sense that I'm doing something else wrong though; just having trouble finding out what that is.
Ex. from the .csv file
I have 12 trampolines. The TRAMPOLINES are round and have netting
surrounding them.
Trampolines are my favorite activity.
I've tried adding in both methods on separate lines of code, but am met with unexpected outputs.
def countingWords(word):
    openFile = open('file.csv', 'r')
    contents = openFile.read()
    openFile.close()
    counter = 0
    for separateLines in contents.split():
        if str(word) in separateLines:
            counter += 1
        elif str(word).isupper():
            counter += 1
        elif str(word).islower():
            counter += 1
    return counter
Currently if a user inputs: countingWords('Trampoline') the output will only be 1, when it should be 3
Convert both the target word and each piece of text from the file to lowercase before comparing:
for separateLines in contents.split():
    if word.lower() in separateLines.lower():
        counter += 1
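Put together as a self-contained sketch (count_word is a made-up name, and the text is passed in as a string here instead of being read from file.csv):

```python
def count_word(text, word):
    """Count whitespace-separated tokens that contain `word`, ignoring case."""
    target = word.casefold()  # casefold() is a slightly more aggressive lower()
    return sum(1 for token in text.split() if target in token.casefold())

sample = (
    "I have 12 trampolines. The TRAMPOLINES are round and have netting\n"
    "surrounding them.\n"
    "Trampolines are my favorite activity.\n"
)
print(count_word(sample, "Trampoline"))
```

This counts tokens that contain the word, so "trampolines." is counted for "trampoline". If only exact whole-word matches should count, the tokens would also need their punctuation stripped before comparing with ==.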
Maybe it's overkill, but you could use a regular expression. Why? It's probably a little faster than splitting and looping (not that it isn't happening behind the scenes anyway). Enjoy - I wrote a little snippet below!
import re
data = """
I have 12 trampolines. The TRAMPOLINES are round and have netting
surrounding them.
Trampolines are my favorite activity.
"""
def count_instances(data, word):
    # re.escape keeps any special characters in the word literal
    res = re.findall(re.escape(word), data, flags=re.IGNORECASE)
    print(res)
    return len(res)
print(count_instances(data, 'trampolines'))
islower() and isupper() only return a Boolean that tells you whether a string is already all lowercase or all uppercase. You want a method that converts the string's case, not one that just inspects it. Use str.lower() or str.upper().

Using a keyword to print a sentence in Python

Hello I am writing a Python program that reads through a given .txt file and looks for keywords. In this program once I have found my keyword (for example 'data') I would like to print out the entire sentence the word is associated with.
I have read in my input file and used the split() method to get rid of spaces, tabs and newlines and put all the words into a list.
Here is the code I have thus far.
text_file = open("file.txt", "r")
lines = []
lines = text_file.read().split()
keyword = 'data'
for token in lines:
    if token == keyword:
        # I have found my keyword, what methods can I use to
        # print out the words before and after the keyword
        # I have a feeling I want to use '.' as a marker for sentences
        print(sentence) # prints the entire sentence
file.txt Reads as follows
Welcome to SOF! This website securely stores data for the user.
desired output:
This website securely stores data for the user.
We can split the text on the characters that mark sentence endings, then loop through the resulting pieces and print those that contain our keyword.
To split text on multiple characters (for example, a sentence can end with !, ? or .), we can use a regex:
import re
keyword = "data"
line_end_chars = "!", "?", "."
example = "Welcome to SOF! This website securely stores data for the user?"
regexPattern = '|'.join(map(re.escape, line_end_chars))
line_list = re.split(regexPattern, example)
# line_list looks like this:
# ['Welcome to SOF', ' This website securely stores data for the user', '']
# Now we just need to see which lines have our keyword
for line in line_list:
    if keyword in line:
        print(line)
But keep in mind that "if keyword in line:" matches a sequence of characters, not necessarily a whole word. For example, 'data' in 'datamine' is True. If you only want to match whole words, you ought to use regular expressions with word boundaries (\b).
source explanation with example
Source for regex delimiters
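The whole-word caveat can be handled with \b word boundaries. This is only a sketch: sentences_with_word is a made-up name, and it assumes a sentence ends at ., ! or ? followed by whitespace, which will misfire on abbreviations such as "e.g.".

```python
import re

def sentences_with_word(text, keyword):
    """Return the sentences that contain `keyword` as a whole word."""
    # \b on both sides stops 'data' from matching inside 'datamine'.
    word_re = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # Split after ., ! or ? so each sentence keeps its own terminator.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [sentence for sentence in sentences if word_re.search(sentence)]

example = "Welcome to SOF! This website securely stores data for the user. We run a datamine."
for sentence in sentences_with_word(example, "data"):
    print(sentence)
```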
My approach is similar to Alberto Poljak but a little more explicit.
The motivation is to realise that splitting on words is unnecessary - Python's in operator will happily find a word in a sentence. What is necessary is the splitting of sentences. Unfortunately, sentences can end with ., ? or ! and Python's split function does not allow multiple separators. So we have to get a little complicated and use re.
re requires us to put a | between each delimiter and escape some of them, because both . and ? have special meanings by default. Alberto's solution used re itself to do all this, which is definitely the way to go. But if you're new to re, my hard-coded version might be clearer.
The other addition I made was to put each sentence's trailing delimiter back on the sentence it belongs to. To do this I wrapped the delimiters in (), which captures them in the output. I then used zip to put them back on the sentence they came from. The 0::2 and 1::2 slices will take every even index (the sentences) and concatenate them with every odd index (the delimiters). Uncomment the print statement to see what's happening.
import re
lines = "Welcome to SOF! This website securely stores data for the user. Another sentence."
keyword = "data"
sentences = re.split(r'(\.|!|\?)', lines)
sentences_terminated = [a + b for a,b in zip(sentences[0::2], sentences[1::2])]
# print(sentences_terminated)
for sentence in sentences_terminated:
    if keyword in sentence:
        print(sentence.strip()) # strip the space left over from the split
        break
Output:
This website securely stores data for the user.
This solution uses a fairly simple regex in order to find your keyword in a sentence, with words that may or may not be before and after it, and a final period character. It works well with spaces and it's only one execution of re.search().
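import re
with open("file.txt", "r") as text_file:
    text = text_file.read()
keyword = 'data'
# raw strings keep the backslashes intact; re.escape guards against special characters in the keyword
match = re.search(r"\s?(\w+\s)*" + re.escape(keyword) + r"\s?(\w+\s?)*.", text)
if match:
    print(match.group().strip())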
Another Solution:
def check_for_stop_punctuation(token):
    stop_punctuation = ['.', '?', '!']
    for i in range(len(stop_punctuation)):
        if token.find(stop_punctuation[i]) > -1:
            return True
    return False
text_file = open("file.txt", "r")
lines = []
lines = text_file.read().split()
keyword = 'data'
sentence = []
stop_punctuation = ['.', '?', '!']
i = 0
while i < len(lines):
    token = lines[i]
    sentence.append(token)
    if token == keyword:
        found_stop_punctuation = check_for_stop_punctuation(token)
        while not found_stop_punctuation:
            i += 1
            token = lines[i]
            sentence.append(token)
            found_stop_punctuation = check_for_stop_punctuation(token)
        print(sentence)
        sentence = []
    elif check_for_stop_punctuation(token):
        sentence = []
    i += 1

Python - how to separate paragraphs from text?

I need to separate texts into paragraphs and be able to work with each of them. How can I do that? Between every 2 paragraphs can be at least 1 empty line. Like this:
Hello world,
this is an example.
Let´s program something.
Creating new program.
Thanks in advance.
This should work:
text.split('\n\n')
Try
result = list(filter(lambda x : x != '', text.split('\n\n')))
Not an entirely trivial problem, and the standard library doesn't seem to have any ready solutions.
Paragraphs in your example are split by at least two newlines, which unfortunately makes text.split("\n\n") invalid. I think that instead, splitting by regular expressions is a workable strategy:
import fileinput
import re
NEWLINES_RE = re.compile(r"\n{2,}") # two or more "\n" characters
def split_paragraphs(input_text=""):
    no_newlines = input_text.strip("\n") # remove leading and trailing "\n"
    split_text = NEWLINES_RE.split(no_newlines) # regex splitting
    paragraphs = [p + "\n" for p in split_text if p.strip()]
    # p + "\n" ensures that all lines in the paragraph end with a newline
    # p.strip() is truthy if the paragraph has characters other than whitespace
    return paragraphs
# sample code, to split all script input files into paragraphs
text = "".join(fileinput.input())
for paragraph in split_paragraphs(text):
    print(f"<<{paragraph}>>\n")
Edited to add:
It is probably cleaner to use a state machine approach. Here's a fairly simple example using a generator function, which has the added benefit of streaming through the input one line at a time, and not storing complete copies of the input in memory:
import fileinput
def split_paragraph2(input_lines):
    paragraph = [] # store current paragraph as a list
    for line in input_lines:
        if line.strip(): # True if line is non-empty (apart from whitespace)
            paragraph.append(line)
        elif paragraph: # If we see an empty line, return paragraph (if any)
            yield "".join(paragraph)
            paragraph = []
    if paragraph: # After end of input, return final paragraph (if any)
        yield "".join(paragraph)
# sample code, to split all script input files into paragraphs
for paragraph in split_paragraph2(fileinput.input()):
    print(f"<<{paragraph}>>\n")
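For a quick check without input files, the regex idea can also be tried on an in-memory string. This is a sketch, not a replacement for the code above: the function name paragraphs is made up, and unlike the \n{2,} pattern it also treats lines that contain only spaces or tabs as blank.

```python
import re

def paragraphs(text):
    """Split text on one or more blank (or whitespace-only) lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]

sample = (
    "Hello world,\n"
    "this is an example.\n"
    "\n"
    "\n"
    "Let's program something.\n"
    " \n"
    "Creating new program.\n"
)
for number, paragraph in enumerate(paragraphs(sample), start=1):
    print(number, repr(paragraph))
```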
I usually split then filter out the '' and strip. ;)
a =\
'''
Hello world,
this is an example.
Let´s program something.
Creating new program.
'''
data = [content.strip() for content in a.splitlines() if content]
print(data)
This worked for me:
text = "".join(text.splitlines())
text.split('something that is almost always used to separate sentences (i.e. a period, question mark, etc.)')
Easier. I had the same problem.
Just replace the double \n\n entry by a term that you seldom see in the text (here ¾):
a ='''
Hello world,
this is an example.
Let´s program something.
Creating new program.'''
a = a.replace("\n\n" , "¾")
splitted_text = a.split('¾')
print(splitted_text)

Parsing a huge dictionary file with python. Simple task I cant get my head around

I just got a giant 1.4m-line dictionary for other programming uses, and I'm sad to see that Notepad++ is not powerful enough for the parsing job. The dictionary contains three types of lines:
<ar><k>-aaltoiseen</k>
yks.ill..ks. <kref>-aaltoinen</kref></ar>
yks.nom. -aaltoinen; yks.gen. -aaltoisen; yks.part. -aaltoista; yks.ill. -aaltoiseen; mon.gen. -aaltoisten -aaltoisien; mon.part. -aaltoisia; mon.ill. -aaltoisiinesim. Lyhyt-, pitkäaaltoinen.</ar>
and I want to extract every word from it into a list of words without duplicates. Let's start with my code.
f = open('dic.txt')
p = open('parsed_dic.txt', 'r+')
lines = f.readlines()
for line in lines:
    #<ar><k> lines
    #<kref> lines
    #ending to ";" - lines
    for word in listofwordsfromaline:
        p.write(word + "\n")
f.close()
p.close()
I'm not particularly asking you how to do this whole thing, but anything would be helpful. A link to a tutorial or a parsing method for one type of line would be highly appreciated.
For the first two cases, you can see that every word starts and ends with a specific tag. Looking closely, every word has a ">-" string preceding it and a "</" string following it:
# First and second cases
start = line.find(">-")+2
end = line.find("</")
required_word = line[start:end]
In the last case you can use the split method:
word_list = line.split(";")
ans = []
for word in word_list:
    start = word.find("-")
    ans.append(word[start:])
ans = set(ans)
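The same idea can be sketched with regular expressions and a set for de-duplication. Everything here is an assumption about the file format based on the three sample lines: words in the tagged lines sit inside <k> or <kref>, and words in the ";" lines are the tokens that start with a hyphen. The name extract_words is made up for illustration.

```python
import re

# Word inside <k>...</k> or <kref>...</kref> (assumed tag names from the samples).
TAGGED_WORD_RE = re.compile(r"<(k|kref)>(.*?)</\1>")
# A token that starts with "-" at the start of the line or after whitespace.
HYPHEN_WORD_RE = re.compile(r"(?<!\S)-\w+")

def extract_words(lines):
    """Collect the unique words from an iterable of dictionary lines."""
    words = set()
    for line in lines:
        tagged = [match.group(2) for match in TAGGED_WORD_RE.finditer(line)]
        if tagged:
            words.update(tagged)  # first and second line types
        else:
            words.update(HYPHEN_WORD_RE.findall(line))  # ";"-separated forms
    return words

sample_lines = [
    "<ar><k>-aaltoiseen</k>",
    "yks.ill..ks. <kref>-aaltoinen</kref></ar>",
    "yks.nom. -aaltoinen; yks.gen. -aaltoisen;",
]
print(sorted(extract_words(sample_lines)))
```

Since a file object yields its lines one at a time, passing open('dic.txt') to extract_words would avoid loading all 1.4m lines with readlines(). Unlike the find()-based snippet, this keeps the leading hyphen in every case.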
First find what defines a word for you.
Make a regular expression to capture those matches. For example - word break '\b' will match word boundaries (non word characters).
https://docs.python.org/2/howto/regex.html
If the word definition in each type of line is different - then if statements to match the line first, then corresponding regular expression match for the word, and so on.
Match groups in Python
