I am trying to write a loop in Python to extract information from a sentence per row. The input sentences look like this:
[t] troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
##repost from january 13 , 2004 with a better fit title .
i/p button[+2]##im a more happier person after discovering the i/p button !
dvd player[+1][p]##it practically plays almost everything you give it .
player[+2],sound[-1]##i 've had the player for about 2 years now and it still performs nicely with the exception of an occasional wwhhhrrr sound from the motor .
I would like to extract the sentences only and use the information before the ## as tags and write this all to a variable that then contains all the information. The expected output:
Variable: title
troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
So the variable should be maintained until a new [t] is in the row.
Variable: sentence_only
repost from january 13 , 2004 with a better fit title .
im a more happier person after discovering the i/p button !
it practically plays almost everything you give it .
i 've had the player for about 2 years now and it still performs nicely with the exception of an occasional wwhhhrrr sound from the motor .
Variable: tag
i/p button[+2]
dvd player[+1][p]
player[+2],sound[-1]
The current output only maintains the last row and not the full list in the variable.
Here is my attempt at solving this:
import re
import nltk
from nltk.corpus import PlaintextCorpusReader
corpus_root = "Data/Customer_review_data"
filelists = PlaintextCorpusReader(corpus_root, '.*')
filelists.fileids()
rawlist = filelists.raw('Apex AD2600 Progressive-scan DVD player.txt')
sentence = rawlist.split("\n")[:]
a_line = ""
sentence_only = ""
content = ""
title = ""
tag = ""
for b_line in sentence:
    if title != '' or content != '' or sentence_only != '':
        content = title, tag, sentence_only
    if re.match(r"^\*", b_line):
        continue
    if re.match(r"^\[t\][ ]", b_line):
        title = b_line[4:]
        continue
    if re.match(r"^\[t\]", b_line):
        title = b_line[3:]
        continue
    if re.match(r"^##", b_line):
        sentence_only = b_line[2:]
        continue
    if re.match(r".*##", b_line):
        i = len(b_line.split('##')[0])+2
        sentence_only = b_line[i:]
        tag = b_line[:i-2]
        continue
    if re.match(r".*#", b_line):
        sentence_only = b_line[2:]
        continue
print(test)
Actually, I reread your question, and it seems each file only contains one item. If that is the case, you can do this much more easily.
with open("somefile.txt") as infile:
    data = infile.read().splitlines() # this seems to work OS agnostic
item = {
    "title": data[0][4:],
    "contents": [{"tag": line.split("##")[0], "sentence": line.split("##")[1]} for line in data[1:]]
}
This will result in a dict item that is the same as the ones in the old answer below...
OLD ANSWER
I would use a list of dict items to contain the data, but you can easily adjust what variables you put the resulting data in.
from pprint import pprint
with open("somefile.txt") as infile:
    data = infile.read().splitlines() # this seems to work OS agnostic
result = []
current_item = None
for line in data:
    if line.startswith('[t]'):
        # add everything stored sofar to result
        # check is needed for the first loop
        if current_item:
            result.append(current_item)
        current_item = {
            "title": line[4:], # strip the [t] part
            "contents": [] # reset the contents list
        }
    else:
        current_item["contents"].append({
            "tag": line.split("##")[0], # the first element of the split
            "sentence": line.split("##")[1] # the second element of the split
        })
# finally, add last item
result.append(current_item)
# usage:
for item in result:
    print(f"\nTITLE: {item['title']}")
    print("Variable: sentence_only")
    for content in item["contents"]:
        print(content["sentence"])
for item in result:
    print(f"\nTITLE: {item['title']}")
    print("Variable: tag")
    for content in item["contents"]:
        print(content["tag"])
# pprint:
pprint(result)
Output below.
Note that I just duplicated the example input and added the very imaginative NR2 to the lines to differentiate between the two "items" in the source file...
TITLE: troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
Variable: sentence_only
repost from january 13 , 2004 with a better fit title .
im a more happier person after discovering the i/p button !
it practically plays almost everything you give it .
i 've had the player for about 2 years now and it still performs nicely with the exception of an occasional wwhhhrrr sound from the motor .
TITLE: troubleshooting ad-2500 and ad-2600 no picture scrolling b/w . NR2
Variable: sentence_only
repost from january 13 , 2004 with a better fit title . NR2
im a more happier person after discovering the i/p button ! NR2
it practically plays almost everything you give it . NR2
i 've had the player for about 2 years now and it still performs nicely with the exception of an occasional wwhhhrrr sound from the motor . NR2
TITLE: troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .
Variable: tag
i/p button[+2]
dvd player[+1][p]
player[+2],sound[-1]
TITLE: troubleshooting ad-2500 and ad-2600 no picture scrolling b/w . NR2
Variable: tag
i/p button[+2] NR2
dvd player[+1][p] NR2
player[+2],sound[-1] NR2
[{'contents': [{'sentence': 'repost from january 13 , 2004 with a better fit '
'title .',
'tag': ''},
{'sentence': 'im a more happier person after discovering the '
'i/p button ! ',
'tag': 'i/p button[+2]'},
{'sentence': 'it practically plays almost everything you give '
'it . ',
'tag': 'dvd player[+1][p]'},
{'sentence': "i 've had the player for about 2 years now and it "
'still performs nicely with the exception of an '
'occasional wwhhhrrr sound from the motor . ',
'tag': 'player[+2],sound[-1]'}],
'title': 'troubleshooting ad-2500 and ad-2600 no picture scrolling b/w . '},
{'contents': [{'sentence': 'repost from january 13 , 2004 with a better fit '
'title . NR2',
'tag': ''},
{'sentence': 'im a more happier person after discovering the '
'i/p button ! NR2',
'tag': 'i/p button[+2] NR2'},
{'sentence': 'it practically plays almost everything you give '
'it . NR2',
'tag': 'dvd player[+1][p] NR2'},
{'sentence': "i 've had the player for about 2 years now and it "
'still performs nicely with the exception of an '
'occasional wwhhhrrr sound from the motor . NR2',
'tag': 'player[+2],sound[-1] NR2'}],
'title': 'troubleshooting ad-2500 and ad-2600 no picture scrolling b/w . '
'NR2'}]
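As a side note, line.split("##")[1] raises an IndexError on any row that has no "##" separator. Below is a minimal sketch of the same grouping idea using str.partition instead, which never raises. The function name parse_reviews and the choice to skip lines starting with "*" (mirroring the regex in the question) are my own assumptions, not part of the answer above.

```python
def parse_reviews(lines):
    """Group review lines under the most recent [t] title line."""
    result = []
    current = None
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '*' comment lines carry no data.
        if not line or line.startswith("*"):
            continue
        if line.startswith("[t]"):
            current = {"title": line[3:].strip(), "contents": []}
            result.append(current)
            continue
        if current is None:
            # Sentences that appear before any [t] line get an empty title.
            current = {"title": "", "contents": []}
            result.append(current)
        tag, sep, sentence = line.partition("##")
        if not sep:
            # No '##' on this row: treat the whole row as the sentence.
            tag, sentence = "", line
        current["contents"].append({"tag": tag, "sentence": sentence})
    return result


sample = [
    "[t] troubleshooting ad-2500 and ad-2600 no picture scrolling b/w .",
    "##repost from january 13 , 2004 with a better fit title .",
    "i/p button[+2]##im a more happier person after discovering the i/p button !",
]
parsed = parse_reviews(sample)
for entry in parsed[0]["contents"]:
    print(repr(entry["tag"]), "->", entry["sentence"])
```

Because each item is appended to result as soon as its title is seen, there is no need for a final "add last item" step after the loop.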
I can open this file directly from the net, and I want to add a row number to each line based on a rule: if the header row should be numbered, numbering starts at 1 on the header; if not, numbering starts on the next line. This is my code. I have tried a lot, but it doesn't work; the output is shown below. Does anyone know how to solve this problem? Thanks in advance!
import sys
class Main:
    def task1(self):
        print('*' * 30, 'Task')
        import urllib.request
        # url
        url = 'http://www.born.nhely.hu/group_list.txt'
        # Initiate a request to get a response
        while True:
            try:
                response = urllib.request.urlopen(url)
            except Exception as e:
                print('An error has occurred, the request is being made again, the error message is as follows:', e)
            else:
                break
        # Print all student information
        content = response.read().decode('utf-8')
        #add row number
        header_row = input("Do you want to know header_row numbers? Y OR N?")
        if header_row == 'Y':
            for i, line in enumerate(content, start=1):
                print(f'{i},{line}')
        else:
            for i, line in enumerate(content, start=0):
                print('{},{}'.format(i, line.strip()))
    def start(self):
        self.task1()
Main().start()
Have a look at the data you are downloading:
Name;Short name;Email;Country;Other spoken languages
ABOUELHASSAN Shehab Ibrahim Adbelazin;?;dwedar909#gmail.com;?;?
AGHAEI HOSSEIN ABADI Mohammad Mehdi;Matt;mahdiaghaei355#gmail.com;Iran;English
...
Now look at the results you are getting:
1,N
2,a
3,m
4,e
5,;
6,S
7,h
8,o
...
It should be apparent that you are looping character by character; not line by line.
When you have:
for i, line in enumerate(content, start=1):
    print(f'{i},{line}')
content is a string -- not a list of lines -- so you will loop over the string character by character with the for loop.
So to fix, do:
for i, line in enumerate(content.splitlines(), start=1):
    print(f'{i},{line}')
Or, you can change the method of reading from the server to reading lines instead of characters:
content = response.readlines()
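One caveat with readlines() on an HTTP response: it returns bytes objects that still carry their line endings, so each line needs decoding and stripping before printing. The sketch below shows one way to do that and to handle the "number the header or not" choice. The helper name number_lines, the include_header flag, and the reading that "no header number" means the header is printed unnumbered and counting starts at the first data row are all my assumptions. An io.BytesIO object stands in for the real response so the example runs without a network connection.

```python
import io


def number_lines(response, include_header=True, encoding="utf-8"):
    """Return 'N,line' strings for a binary file-like object."""
    lines = [raw.decode(encoding).rstrip("\r\n") for raw in response.readlines()]
    if not lines:
        return []
    if include_header:
        # Number every line, the header included, starting from 1.
        return [f"{i},{line}" for i, line in enumerate(lines, start=1)]
    # Keep the header unnumbered and start counting at the first data row.
    header, rows = lines[0], lines[1:]
    return [header] + [f"{i},{row}" for i, row in enumerate(rows, start=1)]


# Stand-in for urllib.request.urlopen(url); the rows are made up.
fake_response = io.BytesIO(b"Name;Country\r\nAlice;Iran\r\nBob;China\r\n")
numbered = number_lines(fake_response)
print("\n".join(numbered))
```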
You're reading the whole .txt content into one big string. If you use .readlines() instead of .read(), you get a list of lines, which is what you want.
You should modify this:
# Print all student information
content = response.read().decode('utf-8')
To this:
# Print all student information
content = response.readlines()
You can use the repr() function to take a look at your data:
print(repr(content))
'Name;Short name;Email;Country;Other spoken languages\r\nABOUELHASSAN Shehab Ibrahim Adbelazin;?;dwedar909#gmail.com;?;?\r\nAGHAEI HOSSEIN ABADI Mohammad Mehdi;Matt;mahdiaghaei355#gmail.com;Iran;English\r\nAMIN Asjad;?;;?;?\r\nATILA Arda Burak;Arda;arda_atila#hotmail.com;Turkey;English\r\nBELTRAN CASTRO Carlos Ricardo;Ricardo;crbeltrancas#gmail.com;Colombia;English, Chinese\r\nBhatti Muhammad Hasan;?;;?;?\r\nCAKIR Alp Hazar;Alp;alphazarc#gmail.com;Turkey;English\r\nDENG Zhihui;Deng;dzhfalcon0727#gmail.com;China;English\r\nDURUER Ahmet Enes;Ahmet / kahverengi;hello#ahmetduruer.com;Turkey;English\r\nENKHZAYA Jagar;Jager;japman2400#gmail.com;Mongolia;English\r\nGHAIBAH Sanaa;Sanaa;sanaagheibeh12#gmail.com;Syria;English\r\nGUO Ruizheng;?;ruizhengguo#gmail.com;China;English\r\nGURBANZADE Gurban;Qurban;gurbanzade01#gmail.com;Azeribaijan;English, Russian, Turkish\r\nHASNAIN Syed Muhammad;Hasnain;syedhasnainhijazy313#gmail.com;Pakistan;?\r\nISMAYILOV Firdovsi;Firi;firiisi#gmail.com;Azeribaijan ?;English,Russian,Turkish\r\nKINGRANI Muskan;Muskan;muskankingrani4#gmail.com;India;English\r\nKOKO Susan Kekeli Ruth;Susan;susankoko3#gmail.com;Ghana;N/A\r\nKOLA-OLALEYE Adeola Damilola;Adeola;inboxadeola#gmail.com;Nigeria;French\r\nLEWIS Madison Buse;?;madisonbuse#yahoo.com;Turkey;Turkish\r\nLI Ting;Ting;514053044#qq.com;China;English\r\nMARUSENKO Svetlana;Svetlana;svetlana.maru#gmail.com;Russia;English, German\r\nMOHANTY Cyrus;cyrus;cyrusmohanty5261#gmail.com;India;English\r\nMOTHOBI Thabo Emmanuel;thabo;thabomothobi#icloud.com;South Africa;English\r\nNayudu Yashmit Vinay;?;;?;?\r\nPurevsuren Davaadorj;?;Purevsuren.davaadorj99#gmail.com;Mongolia ?;English\r\nSAJID Anoosha;Anoosha;anooshasajid12#gmail.com;Pakistan;English\r\nSHANG Rongxiang;Xiang;1074482757#qq.com;China;English\r\nSU Haobo;Su;2483851740#qq.com;China;English\r\nTAKEUCHI ROSSMAN Elly;Elly;elliebanana10th#gmail.com;Japan;English\r\nULUSOY Nedim Can;Nedim;nedimcanulusoy#gmail.com;Turkey;English, Hungarian\r\nXuan 
Qijian;Xuan;xjwjadon#gmail.com;China ?;?\r\nYUAN Gaopeng;Yuan;1277237374#qq.com;China;English\r\n'
vs
print(repr(content))
[b'Name;Short name;Email;Country;Other spoken languages\r\n', b'ABOUELHASSAN Shehab Ibrahim Adbelazin;?;dwedar909#gmail.com;?;?\r\n', b'AGHAEI HOSSEIN ABADI Mohammad Mehdi;Matt;mahdiaghaei355#gmail.com;Iran;English\r\n', b'AMIN Asjad;?;;?;?\r\n', b'ATILA Arda Burak;Arda;arda_atila#hotmail.com;Turkey;English\r\n', b'BELTRAN CASTRO Carlos Ricardo;Ricardo;crbeltrancas#gmail.com;Colombia;English, Chinese\r\n', b'Bhatti Muhammad Hasan;?;;?;?\r\n', b'CAKIR Alp Hazar;Alp;alphazarc#gmail.com;Turkey;English\r\n', b'DENG Zhihui;Deng;dzhfalcon0727#gmail.com;China;English\r\n', b'DURUER Ahmet Enes;Ahmet / kahverengi;hello#ahmetduruer.com;Turkey;English\r\n', b'ENKHZAYA Jagar;Jager;japman2400#gmail.com;Mongolia;English\r\n', b'GHAIBAH Sanaa;Sanaa;sanaagheibeh12#gmail.com;Syria;English\r\n', b'GUO Ruizheng;?;ruizhengguo#gmail.com;China;English\r\n', b'GURBANZADE Gurban;Qurban;gurbanzade01#gmail.com;Azeribaijan;English, Russian, Turkish\r\n', b'HASNAIN Syed Muhammad;Hasnain;syedhasnainhijazy313#gmail.com;Pakistan;?\r\n', b'ISMAYILOV Firdovsi;Firi;firiisi#gmail.com;Azeribaijan ?;English,Russian,Turkish\r\n', b'KINGRANI Muskan;Muskan;muskankingrani4#gmail.com;India;English\r\n', b'KOKO Susan Kekeli Ruth;Susan;susankoko3#gmail.com;Ghana;N/A\r\n', b'KOLA-OLALEYE Adeola Damilola;Adeola;inboxadeola#gmail.com;Nigeria;French\r\n', b'LEWIS Madison Buse;?;madisonbuse#yahoo.com;Turkey;Turkish\r\n', b'LI Ting;Ting;514053044#qq.com;China;English\r\n', b'MARUSENKO Svetlana;Svetlana;svetlana.maru#gmail.com;Russia;English, German\r\n', b'MOHANTY Cyrus;cyrus;cyrusmohanty5261#gmail.com;India;English\r\n', b'MOTHOBI Thabo Emmanuel;thabo;thabomothobi#icloud.com;South Africa;English\r\n', b'Nayudu Yashmit Vinay;?;;?;?\r\n', b'Purevsuren Davaadorj;?;Purevsuren.davaadorj99#gmail.com;Mongolia ?;English\r\n', b'SAJID Anoosha;Anoosha;anooshasajid12#gmail.com;Pakistan;English\r\n', b'SHANG Rongxiang;Xiang;1074482757#qq.com;China;English\r\n', b'SU Haobo;Su;2483851740#qq.com;China;English\r\n', b'TAKEUCHI 
ROSSMAN Elly;Elly;elliebanana10th#gmail.com;Japan;English\r\n', b'ULUSOY Nedim Can;Nedim;nedimcanulusoy#gmail.com;Turkey;English, Hungarian\r\n', b'Xuan Qijian;Xuan;xjwjadon#gmail.com;China ?;?\r\n', b'YUAN Gaopeng;Yuan;1277237374#qq.com;China;English\r\n']
Also, instead of hard-coding the charset as utf-8, you can use response.headers.get_content_charset(), which returns None when the server does not declare a charset.
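A small sketch of that idea, with a fallback for the case where no charset is declared. The helper name decode_body and the utf-8 default are assumptions of mine. The headers object returned by urllib is a subclass of email.message.Message, so a plain Message is used here as a stand-in to keep the example offline.

```python
from email.message import Message


def decode_body(raw_bytes, headers, default="utf-8"):
    """Decode a response body using the charset declared in its headers."""
    # get_content_charset() returns None if Content-Type has no charset.
    charset = headers.get_content_charset() or default
    return raw_bytes.decode(charset)


# Stand-in for response.headers; the header value is made up.
headers = Message()
headers["Content-Type"] = "text/plain; charset=iso-8859-1"
text = decode_body("café".encode("iso-8859-1"), headers)
print(text)
```

With a real response this would be called as decode_body(response.read(), response.headers).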
I'm using bibtexparser to parse a bibtex file.
import bibtexparser
with open('MetaGJK12842.bib','r') as bibfile:
    bibdata = bibtexparser.load(bibfile)
While parsing I get the error message:
Could not parse properly, starting at
@article{Frenn:EvidenceBasedNursing:1999,
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pyparsing.py", line 3183, in parseImpl
raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected end of text (at char 5773750), (line:47478, col:1)
The line refers to the following bibtex entry:
@article{Frenn:EvidenceBasedNursing:1999,
author = {Frenn, M.},
title = {A Mediterranean type diet reduced all cause and cardiac mortality after a first myocardial infarction [commentary on de Lorgeril M, Salen P, Martin JL, et al. Mediterranean dietary pattern in a randomized trial: prolonged survival and possible reduced cancer rate. ARCH INTERN MED 1998;158:1181-7]},
journal = {Evidence Based Nursing},
uuid = {15A66A61-0343-475A-8700-F311B08BB2BC},
volume = {2},
number = {2},
pages = {48-48},
address = {College of Nursing, Marquette University, Milwaukee, WI},
year = {1999},
ISSN = {1367-6539},
url = {},
keywords = {Treatment Outcomes;Mediterranean Diet;Mortality;France;Neoplasms -- Prevention and Control;Phase One Excluded - No Assessment of Vegetable as DV;Female;Phase One - Reviewed by Hao;Myocardial Infarction -- Diet Therapy;Diet, Fat-Restricted;Phase One Excluded - No Fruit or Vegetable Study;Phase One Excluded - No Assessment of Fruit as DV;Male;Clinical Trials},
tags = {Phase One Excluded - No Assessment of Vegetable as DV;Phase One Excluded - No Fruit or Vegetable Study;Phase One - Reviewed by Hao;Phase One Excluded - No Assessment of Fruit as DV},
accession_num = {2000008864. Language: English. Entry Date: 20000201. Revision Date: 20130524. Publication Type: journal article},
remote_database_name = {rzh},
source_app = {EndNote},
EndNote_reference_number = {4413},
Secondary_title = {Evidence Based Nursing},
Citation_identifier = {Frenn 1999a},
remote_database_provider = {EBSCOhost},
publicationStatus = {Unknown},
abstract = {Question: text.},
notes = {(0) abstract; commentary. Journal Subset: Core Nursing; Europe; Nursing; Peer Reviewed; UK \& Ireland. No. of Refs: 1 ref. NLM UID: 9815947.}
}
What is wrong with this entry?
It seems that the issue has been addressed and resolved in the project repository (see Issue 147).
Until the next release, installing the library from the git repository can serve as a temporary fix.
pip install --upgrade git+https://github.com/sciunto-org/python-bibtexparser.git@master
I had this same error and found an entry near the line mentioned in the error that had a line like this
...
year = {1959},
month =
}
When I removed the null month item it parsed for me.
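If the file is too large to inspect by hand, a quick scan for fields with nothing after the equals sign can narrow things down. This is only a rough sketch under my own assumptions: the helper name find_empty_fields is hypothetical, it uses a plain regular expression rather than bibtexparser, and it only catches fields that sit alone on their own line.

```python
import re

# A field name, an equals sign, then nothing (or just a trailing comma).
EMPTY_FIELD = re.compile(r"^\s*[\w-]+\s*=\s*,?\s*$")


def find_empty_fields(bibtex_text):
    """Return (line_number, stripped_line) for fields with no value."""
    return [
        (number, line.strip())
        for number, line in enumerate(bibtex_text.splitlines(), start=1)
        if EMPTY_FIELD.match(line)
    ]


sample = "@article{key,\n  year = {1959},\n  month =\n}\n"
suspects = find_empty_fields(sample)
print(suspects)
```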
Good Afternoon all,
I've been working on a contact-book program for a school project. I've got all of the underlying code complete. However I've decided to take it one step further and implement a basic interface. I am trying to display all of the contacts using the code snippet below:
elif x==2:
phonebook_data= open(data_path,mode='r',encoding = 'utf8')
if os.stat(data_path)[6]==0:
print("Your contact book is empty.")
else:
for line in phonebook_data:
data= eval(line)
for k,v in sorted(data.items()):
x= (k + ": " + v)
from tkinter import *
root = Tk()
root.title("Contacts")
text = Text(root)
text.insert('1.0', x)
text.pack()
text.update()
root.mainloop()
phonebook_data.close()
The program works, however every contact opens in a new window. I would like to display all of the same information in a single loop. I'm fairly new to tkinter and I apologize if the code is confusing at all. Any help would be greatly appreciated!!
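First of all, the top of the snippet could be simpler, since 'r' is already the default mode (keep encoding='utf8' if the file is not in your platform's default encoding):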
phonebook_data= open(data_path,mode='r',encoding = 'utf8') should be changed to
phonebook_data = open(data_path).
Afterwards, just use:
contents = phonebook_data.read()
if contents == "": # Can be shortened to `if not contents:`
    print("Your contact book is empty.")
And by the way, it's good practice to close the file as soon as you're done using it.
phonebook_data = open(data_path)
contents = phonebook_data.read()
phonebook_data.close()
if contents == "":
    print("Your contact book is empty.")
Now for your graphics issue. Firstly, you should consider whether or not you really need a graphical interface for this application. If so:
from tkinter import *
# Assuming that each line of the contact book is formatted `Name Number` (split by a space)
name_number = []
for line in contents.splitlines(): # Get each line
    if not line.strip():
        continue # Skip blank lines, which cannot be unpacked below
    name, number = line.split()
    name_number.append(name + ": " + number) # Append a string of `Name: Number` to the list
name_number.sort() # Sort by name
root = Tk()
root.title("Contact Book")
text = Text(root)
text.pack(fill=BOTH)
text.insert(END, "\n".join(name_number)) # insert() needs an index as its first argument
root.mainloop()
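The snippet above assumes a `Name Number` layout, whereas the code in the question reads one dict literal per line. A sketch for that format follows, under stated assumptions: the function names format_contacts and show_contacts are hypothetical, and ast.literal_eval is swapped in for eval because it only accepts literals. The key point is the same: build the complete text first, then create a single window once, outside every loop.

```python
import ast


def format_contacts(lines):
    """Turn lines holding dict literals into one block of display text."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        contact = ast.literal_eval(line)  # accepts literals only, unlike eval
        fields = [f"{key}: {value}" for key, value in sorted(contact.items())]
        entries.append("\n".join(fields))
    # One blank line between contacts.
    return "\n\n".join(entries)


def show_contacts(text_block):
    """Open ONE window containing all contacts (needs a display)."""
    import tkinter as tk
    root = tk.Tk()
    root.title("Contacts")
    text = tk.Text(root)
    text.insert(tk.END, text_block)
    text.pack(fill=tk.BOTH, expand=True)
    root.mainloop()


# Made-up sample data in the assumed one-dict-per-line format.
sample_lines = ["{'name': 'Ann', 'phone': '123'}\n", "\n", "{'name': 'Bob', 'phone': '456'}"]
block = format_contacts(sample_lines)
print(block)
# show_contacts(block)  # uncomment when running with a graphical display
```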
Considering how much I have shown you, it would probably be considered cheating for you to use it. Do some more research into the code though, it didn't seem like the algorithm would work in the first place.
I am running Python 2.7.5 and using the built-in html parser for what I am about to describe.
The task I am trying to accomplish is to take a chunk of html that is essentially a recipe. Here is an example.
html_chunk = "<h1>Miniature Potato Knishes</h1><p>Posted by bettyboop50 at recipegoldmine.com May 10, 2001</p><p>Makes about 42 miniature knishes</p><p>These are just yummy for your tummy!</p><p>3 cups mashed potatoes (about<br> 2 very large potatoes)<br>2 eggs, slightly beaten<br>1 large onion, diced<br>2 tablespoons margarine<br>1 teaspoon salt (or to taste)<br>1/8 teaspoon black pepper<br>3/8 cup Matzoh meal<br>1 egg yolk, beaten with 1 tablespoon water</p><p>Preheat oven to 400 degrees F.</p><p>Sauté diced onion in a small amount of butter or margarine until golden brown.</p><p>In medium bowl, combine mashed potatoes, sautéed onion, eggs, margarine, salt, pepper, and Matzoh meal.</p><p>Form mixture into small balls about the size of a walnut. Brush with egg yolk mixture and place on a well-greased baking sheet and bake for 20 minutes or until well browned.</p>"
The goal is to separate out the header, junk, ingredients, instructions, serving, and number of ingredients.
Here is my code that accomplishes that
from bs4 import BeautifulSoup
def list_to_string(list):
    joined = ""
    for item in list:
        joined += str(item)
    return joined
def get_ingredients(soup):
    for p in soup.find_all('p'):
        if p.find('br'):
            return p
def get_instructions(p_list, ingredient_index):
    instructions = []
    instructions += p_list[ingredient_index+1:]
    return instructions
def get_junk(p_list, ingredient_index):
    junk = []
    junk += p_list[:ingredient_index]
    return junk
def get_serving(p_list):
    for item in p_list:
        item_str = str(item).lower()
        # ("yield" or "make" or ...) evaluates to just "yield", so test each keyword
        if any(word in item_str for word in ("yield", "make", "serve", "serving")):
            yield_index = p_list.index(item)
            del p_list[yield_index]
            return item
def ingredients_count(ingredients):
    ingredients_list = ingredients.find_all(text=True)
    return len(ingredients_list)
def get_header(soup):
    return soup.find('h1')
def html_chunk_splitter(soup):
    ingredients = get_ingredients(soup)
    if ingredients == None:
        error = 1
        header = ""
        junk_string = ""
        instructions_string = ""
        serving = ""
        count = ""
    else:
        p_list = soup.find_all('p')
        serving = get_serving(p_list)
        ingredient_index = p_list.index(ingredients)
        junk_list = get_junk(p_list, ingredient_index)
        instructions_list = get_instructions(p_list, ingredient_index)
        junk_string = list_to_string(junk_list)
        instructions_string = list_to_string(instructions_list)
        header = get_header(soup)
        error = ""
        count = ingredients_count(ingredients)
    return (header, junk_string, ingredients, instructions_string,
            serving, count, error)
It works well except in situations where I have chunks that contain strings like "Sauté", because soup = BeautifulSoup(html_chunk) causes Sauté to turn into SautÃ©. This is a problem because I have a huge CSV file of recipes like the html_chunk, and I'm trying to structure all of them nicely and then get the output back into a database. I tried checking whether Sauté comes out right using this HTML previewer, and it still comes out as SautÃ©. I don't know what to do about this.
What's stranger is that when I do what BeautifulSoup's documentation shows
BeautifulSoup("Sacré bleu!")
# <html><head></head><body>Sacré bleu!</body></html>
I get
# SacrÃ© bleu!
But my colleague tried that on his Mac, running from terminal, and he got exactly what the documentation shows.
I really appreciate all your help. Thank you.
This is not a parsing problem; it is about encoding, rather.
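To make the encoding point concrete, here is a minimal sketch of one common way this kind of garbling arises: UTF-8 bytes being read back as Latin-1. It is written for Python 3 (the question uses Python 2.7.5, where the same mismatch happens implicitly between byte strings and unicode strings), and Latin-1 as the wrong codec is an assumption on my part; the actual codec on your system may differ.

```python
original = "Sauté"

# The text is stored as UTF-8, where 'é' takes two bytes (0xC3 0xA9).
utf8_bytes = original.encode("utf-8")

# Reading those bytes with the wrong codec turns each byte into its own
# character, which produces the familiar two-character garbage.
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# The damage is reversible as long as no bytes were lost along the way.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```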
Whenever working with text which might contain non-ASCII characters (or in Python programs which contain such characters, e.g. in comments or docstrings), you should put a coding cookie in the first or - after the shebang line - second line:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
... and make sure this matches your file encoding (with vim: :set fenc=utf-8).
BeautifulSoup tries to guess the encoding and sometimes gets it wrong. You can specify the encoding explicitly with the from_encoding parameter, for example:
soup = BeautifulSoup(html_text, from_encoding="UTF-8")
The encoding is usually available in the headers of the web page.
I am writing a script that works like Google Suggest. The problem is that I am trying to get suggestions for the next two most likely words.
The example uses a text file, working_bee.txt. When typing "mis" I should get suggestions like "Miss Mary", "Miss Taylor", ... but I only get "Miss", ... I suspect the Ajax responseText property returns only a single word?
Any ideas what is wrong?
# Something that looks like Google suggest
def count_words(xFile):
    frequency = {}
    words=[]
    for l in open(xFile, "rt"):
        l = l.strip().lower()
        for r in [',', '.', "'", '"', "!", "?", ":", ";"]:
            l = l.replace(r, " ")
        words += l.split()
    for i in range(len(words)-1):
        frequency[words[i]+" "+words[i+1]] = frequency.get(words[i]+" "+words[i+1], 0) + 1
    return frequency
# read valid words from file
ws = count_words("c:/mod_python/working_bee.txt").keys()
def index(req):
    req.content_type = "text/html"
    return '''
<script>
function complete(q) {
    var xhr, ws, e
    e = document.getElementById("suggestions")
    if (q.length == 0) {
        e.innerHTML = ''
        return
    }
    xhr = XMLHttpRequest()
    xhr.open('GET', 'suggest_from_file.py/complete?q=' + q, true)
    xhr.onreadystatechange = function() {
        if (xhr.readyState == 4) {
            ws = eval(xhr.responseText)
            e.innerHTML = ""
            for (i = 0; i < ws.length; i++)
                e.innerHTML += ws[i] + "<br>"
        }
    }
    xhr.send(null)
}
</script>
<input type="text" onkeyup="complete(this.value)">
<div id="suggestions"></div>
'''
def complete(req, q):
    req.content_type = "text"
    return [w for w in ws if w.startswith(q)]
txt file:
IV. Miss Taylor's Working Bee
"So you must. Well, then, here goes!" Mr. Dyce swung her up to his shoulder and went, two steps at a time, in through the crowd of girls, so that he arrived there first when the door was opened. There in the hall stood Miss Mary Taylor, as pretty as a pink.
"I heard there was to be a bee here this afternoon, and I've brought Phronsie; that's my welcome," he announced.
"See, I've got a bag," announced Phronsie from her perch, and holding it forth.
So the bag was admired, and the girls trooped in, going up into Miss Mary's pretty room to take off their things. And presently the big library, with the music-room adjoining, was filled with the gay young people, and the bustle and chatter began at once.
"I should think you'd be driven wild by them all wanting you at the same minute." Mr. Dyce, having that desire at this identical time, naturally felt a bit impatient, as Miss Mary went about inspecting the work, helping to pick out a stitch here and to set a new one there, admiring everyone's special bit of prettiness, and tossing a smile and a gay word in every chance moment between.
"Oh, no," said Miss Mary, with a little laugh, "they're most of them my Sunday- school scholars, you know."
Looking at your code, I believe you are not sending the correct thing to Apache. You are returning a list, and Apache expects a string. I would suggest serializing the return value as JSON:
import json
def complete(req, q):
    req.content_type = "text"
    return json.dumps([w for w in ws if w.startswith(q)])
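For reference, here is a self-contained sketch of the same idea that runs without mod_python. It counts word pairs, filters them by prefix, and returns the matches as a JSON string. The names bigram_counts and complete, the limit parameter, the ranking by frequency, and the lowercasing of the prefix (the word pairs are stored lowercased, so a query such as "Mis" would otherwise match nothing) are my assumptions rather than part of the original code.

```python
import json


def bigram_counts(text):
    """Count adjacent word pairs, ignoring case and basic punctuation."""
    for mark in [",", ".", "'", '"', "!", "?", ":", ";"]:
        text = text.replace(mark, " ")
    words = text.lower().split()
    counts = {}
    for first, second in zip(words, words[1:]):
        pair = first + " " + second
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def complete(prefix, counts, limit=5):
    """Return a JSON array (as a string) of the most frequent matching pairs."""
    prefix = prefix.lower()
    matches = [pair for pair in counts if pair.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically.
    matches.sort(key=lambda pair: (-counts[pair], pair))
    return json.dumps(matches[:limit])


counts = bigram_counts("Miss Mary went. Miss Taylor stayed, and Miss Mary laughed.")
payload = complete("mis", counts)
print(payload)
```

On the browser side, a JSON string like this can be read with JSON.parse(xhr.responseText) rather than eval.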