I'm creating a simple text editor in Python 3.4 and Tkinter. At the moment I'm stuck on the find feature.
I can find characters successfully, but I'm not sure how to highlight them. I tried the tag method without success and got this error:
str object has no attribute 'tag_add'.
Here's my code for the find function:
def find(): # is called when the user clicks a menu item
    findin = tksd.askstring('Find', 'String to find:')
    contentsfind = editArea.get(1.0, 'end-1c') # editArea is a scrolledtext area
    findcount = 0
    for x in contentsfind:
        if x == findin:
            findcount += 1
            print('find - found ' + str(findcount) + ' of ' + findin)
    if findcount == 0:
        nonefound = ('No matches for ' + findin)
        tkmb.showinfo('No matches found', nonefound)
        print('find - found 0 of ' + findin)
The user inputs text into a scrolledtext field, and I want to highlight the matching strings on that scrolledtext area.
How would I go about doing this?
Use tag_add to add a tag to a region. Also, instead of getting all the text and searching it yourself, you can use the search method of the widget. It returns the index where the match starts, and it can also report how many characters matched. You can then use that information to add the tag.
It would look something like this:
...
editArea.tag_configure("find", background="yellow")
...
def find():
    findin = tksd.askstring('Find', 'String to find:')
    if not findin:
        # dialog was cancelled or left empty: nothing to search for
        return
    countVar = tk.IntVar()
    index = "1.0"
    matches = 0
    while True:
        index = editArea.search(findin, index, "end", count=countVar)
        if index == "": break
        matches += 1
        start = index
        # end of the match = start of the match + number of matched characters
        end = editArea.index("%s + %s c" % (index, countVar.get()))
        editArea.tag_add("find", start, end)
        index = end
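One reason the loop in the question never matched: iterating over a string yields single characters, so x == findin can only be true for one-character searches. The following is a minimal sketch in plain Python that finds every occurrence with str.find and expresses each span with the same "+ N chars" index arithmetic used above. The helper name find_spans is made up for illustration, and the sketch assumes the widget contains only plain text.

```python
def find_spans(content, needle):
    """Return (start, end) Tk index expressions for each occurrence of needle."""
    spans = []
    if not needle:
        # An empty search string would match everywhere; treat it as no match.
        return spans
    offset = content.find(needle)
    while offset != -1:
        end = offset + len(needle)
        # Indices are expressed relative to the start of the text ("1.0").
        spans.append(("1.0 + %d chars" % offset, "1.0 + %d chars" % end))
        # Continue searching after the current match.
        offset = content.find(needle, end)
    return spans


spans = find_spans("hello this is", "is")
print(spans)
```

Each pair could then be passed to editArea.tag_add("find", start, end). The widget's own search method remains the simpler choice inside a running application.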
I am trying to scrape data from a Word document available at:
https://dl.dropbox.com/s/pj82qrctzkw9137/HE%20Distributors.docx
I need to scrape the Name, Address, City, State, and Email ID. I am able to scrape the email using the code below.
import docx

content = docx.Document('HE Distributors.docx')
location = []
for i in range(len(content.paragraphs)):
    stat = content.paragraphs[i].text
    if 'Email' in stat:
        location.append(i)
for i in location:
    print(content.paragraphs[i].text)
I tried the steps mentioned in:
How to read data from .docx file in python pandas?
I need to convert this into a data frame with all the columns mentioned above.
I'm still facing issues with this.
There are some inconsistencies in the document: phone numbers start with Tel: in some places, Tel.: in others, and even Te: once; one of the emails sits on the last line for its distributor without the Email: prefix; and the State isn't always on the last line. Still, most of the data can be extracted with regex and/or splits.
The distributors are separated by empty lines, and the names are in a different color - so I defined this function to get the font color of any paragraph from its xml:
# from bs4 import BeautifulSoup
def getParaColor(para):
    # returns the font color (hex string) of the paragraph, or '' if none is found
    try:
        return BeautifulSoup(
            para.paragraph_format.element.xml, 'xml'
        ).find('color').get('w:val')
    except Exception:
        return ''
The try...except hasn't been necessary yet, but just in case...
(The xml is actually also helpful for double-checking that .text hasn't missed anything - in my case, I noticed that the email for Shri Adhya Educational Books wasn't getting extracted.)
Then, you can process the paragraphs from docx.Document with a function like:
# import re
def splitParas(paras):
    ptc = [(
        p.text, getParaColor(p), p.paragraph_format.element.xml
    ) for p in paras]
    curSectn = 'UNKNOWN'
    splitBlox = [{}]
    for pt, pc, px in ptc:
        # double-check for missing text
        xmlText = BeautifulSoup(px, 'xml').text
        xmlText = ' '.join([s for s in xmlText.split() if s != ''])
        if len(xmlText) > len(pt): pt = xmlText
        # initiate
        if not pt:
            # empty paragraph: start a new distributor block
            if splitBlox[-1] != {}:
                splitBlox.append({})
            continue
        if pc == '20752E':
            # section heading color
            curSectn = pt.strip()
            continue
        if splitBlox[-1] == {}:
            splitBlox[-1]['section'] = curSectn
            splitBlox[-1]['raw'] = []
            splitBlox[-1]['Name'] = []
            splitBlox[-1]['address_raw'] = []
        # collect
        splitBlox[-1]['raw'].append(pt)
        if pc == 'D12229':
            # distributor name color
            splitBlox[-1]['Name'].append(pt)
        elif re.search("^Te.*:.*", pt):
            splitBlox[-1]['tel_raw'] = re.sub("^Te.*:", '', pt).strip()
        elif re.search("^Mob.*:.*", pt):
            splitBlox[-1]['mobile_raw'] = re.sub("^Mob.*:", '', pt).strip()
        elif pt.startswith('Email:') or re.search(".*[@].*[.].*", pt):
            splitBlox[-1]['Email'] = pt.replace('Email:', '').strip()
        else:
            splitBlox[-1]['address_raw'].append(pt)
    # some cleanup
    if splitBlox[-1] == {}: splitBlox = splitBlox[:-1]
    for i in range(len(splitBlox)):
        addrsParas = splitBlox[i]['address_raw'] # for later
        # join lists into strings
        splitBlox[i]['Name'] = ' '.join(splitBlox[i]['Name'])
        for k in ['raw', 'address_raw']:
            splitBlox[i][k] = '\n'.join(splitBlox[i][k])
        # search address for City, State and PostCode
        apLast = addrsParas[-1].split(',')[-1]
        maybeCity = [ap for ap in addrsParas if '–' in ap]
        if '–' not in apLast:
            splitBlox[i]['State'] = apLast.strip()
        if maybeCity:
            maybePIN = maybeCity[-1].split('–')[-1].split(',')[0]
            maybeCity = maybeCity[-1].split('–')[0].split(',')[-1]
            splitBlox[i]['City'] = maybeCity.strip()
            splitBlox[i]['PostCode'] = maybePIN.strip()
        # add mobile to tel
        if 'mobile_raw' in splitBlox[i]:
            if 'tel_raw' not in splitBlox[i]:
                splitBlox[i]['tel_raw'] = splitBlox[i]['mobile_raw']
            else:
                splitBlox[i]['tel_raw'] += (', ' + splitBlox[i]['mobile_raw'])
            del splitBlox[i]['mobile_raw']
        # split tel [as needed]
        if 'tel_raw' in splitBlox[i]:
            tel_i = [t.strip() for t in splitBlox[i]['tel_raw'].split(',')]
            telNum = []
            for t in range(len(tel_i)):
                if '/' in tel_i[t]:
                    tns = [t.strip() for t in tel_i[t].split('/')]
                    tel1 = tns[0]
                    telNum.append(tel1)
                    for tn in tns[1:]:
                        telNum.append(tel1[:-1*len(tn)]+tn)
                else:
                    telNum.append(tel_i[t])
            splitBlox[i]['Tel_1'] = telNum[0]
            splitBlox[i]['Tel'] = telNum[0] if len(telNum) == 1 else telNum
    return splitBlox
(Since I was getting font color anyway, I decided to add another column called "section" to put East/West/etc in. And I added "PostCode" too, since it seems to be on the other side of "City"...)
Since "raw" is saved, any other value can be double checked manually at least.
The function combines "Mobile" into "Tel" even though they're extracted with separate regex.
I'd say "Tel_1" is fairly reliable, but some of the inconsistent patterns mean that other numbers in "Tel" might come out incorrect if they were separated with '/'.
Also, "Tel" is either a string or a list of strings depending on how many numbers there were in "tel_raw".
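To make the '/' handling concrete, here is a small standalone sketch of the same idea: the part after each slash is treated as a replacement for the last digits of the first number. The function name expand_slash_numbers and the phone number below are made up for illustration and do not come from the document.

```python
def expand_slash_numbers(entry):
    """Expand '1234567/68' style shorthand into a list of full numbers."""
    parts = [p.strip() for p in entry.split("/")]
    first = parts[0]
    numbers = [first]
    for suffix in parts[1:]:
        if not suffix:
            # Skip empty pieces such as a trailing slash.
            continue
        # Replace the last len(suffix) characters of the first number.
        numbers.append(first[:-len(suffix)] + suffix)
    return numbers


print(expand_slash_numbers("022-1234567/68"))
```

This also shows where the approach can go wrong: if the part after the slash is actually a complete, unrelated number of a different length, it still gets spliced onto the prefix of the first one.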
After this, you can view the result as a DataFrame with:
#import docx
#import pandas
content = docx.Document('HE Distributors.docx')
# pandas.DataFrame(splitParas(content.paragraphs)) # <--all Columns
pandas.DataFrame(splitParas(content.paragraphs))[[
    'section', 'Name', 'address_raw', 'City',
    'PostCode', 'State', 'Email', 'Tel_1', 'tel_raw'
]]
import keyword
from tkinter import END

def highlight(text):
    keywords = keyword.kwlist
    for kw in keywords:
        text.tag_remove(kw, 1.0, END)
        first = 1.0
        while True:
            first = text.search(
                kw, first,
                nocase=False,
                stopindex=END
            )
            if first is None or first == "":
                break
            first_splitted = first.split(".")
            if len(first_splitted) == 1:
                break
            last = f"{first_splitted[0]}.{int(first_splitted[1]) + len(kw)}"
            character_position_before_first = f"{first_splitted[0]}.{int(first_splitted[1]) - 1}"
            character_before_first = text.get(character_position_before_first)
            last_splitted = last.split(".")
            character_position_after_last = f"{last_splitted[0]}.{int(last_splitted[1])}"
            character_after_last = text.get(character_position_after_last)
            if not character_before_first.isspace() and not character_after_last.isspace():
                break
            text.tag_add(kw, first, last)
            first = last
        text.tag_config(
            kw,
            foreground="#aa71eb"
        )
With the code above, I'm trying to highlight keywords in a text widget. The issue is that substrings are being marked as well.
Example:
hello this is a test open works too lmao lol lol lol
This marks the "is" inside "this" as well as the standalone "is".
I only want it to mark the second "is", because the first one is a substring of "this".
I have no clue why the code above is not working. Help would be appreciated.
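A likely cause is the boundary check: it only rejects a match when both neighbouring characters are non-space, so "is" inside "this" (followed by a space) still passes, and when the check does fire, break abandons the rest of the text for that keyword. One way to sidestep the manual neighbour checks is to match on word boundaries. Below is a minimal sketch in plain Python using the re module; the helper name whole_word_spans is made up for illustration, and it returns "line.column" pairs in the form that tag_add accepts.

```python
import re


def whole_word_spans(content, word):
    """Return (start, end) 'line.column' pairs for whole-word matches only."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    spans = []
    # Tk text lines are numbered from 1, columns from 0.
    for line_no, line in enumerate(content.split("\n"), start=1):
        for match in pattern.finditer(line):
            start = "%d.%d" % (line_no, match.start())
            end = "%d.%d" % (line_no, match.end())
            spans.append((start, end))
    return spans


sample = "hello this is a test open works too lmao lol lol lol"
print(whole_word_spans(sample, "is"))
```

Each pair could be passed to text.tag_add(kw, start, end), with content taken from text.get("1.0", "end-1c"). Alternatively, the widget's search method accepts regexp=True, and Tcl regular expressions use \y for a word boundary, so a pattern such as r"\yis\y" should give the same effect without leaving Tk.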
I have a batch of .doc documents, and the first line of each document contains the name of a person. I would like to add that person's email address to each document, based on a list I have. How can I use Python or VBA to write something that does the job for me?
I tried the VBA code below, which finds the name of the person and then writes the email; I was planning to loop it over all the names. However, even this minimal working example does not actually work. What am I doing wrong?
Sub email()
    Selection.find.ClearFormatting
    Selection.find.Replacement.ClearFormatting
    If Selection.find.Text = "Chiara Gatta" Then
        With Selection.find
            .Text = "E-mail:"
            .Replacement.Text = "E-mail: chiara.gatta#gmail.com"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchByte = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.find.Execute replace:=wdReplaceAll
    End If
End Sub
The question lacks the minimum details and code needed to help. However, here is code that picks up names and email addresses from a table in the document that contains the macro. The table should have 3 columns: the 1st contains the name of the person, the 2nd contains the email address, and the 3rd is left blank for remarks written by the code.
When you run the code, you are prompted to select the Word files in which the email address should be inserted. For a trial, work only on copies, and try only a handful of files at a time (if the files are large). It is assumed that each file contains the name and the text "E-mail:" (if "E-mail:" is not in the file, modify the code as described in the comments).
Code:
Sub test2()
    Dim Fldg As FileDialog, Fl As Variant
    Dim Thdoc As Document, Edoc As Document
    Dim Tbl As Table, Rw As Long, Fnd As Boolean
    Dim xName As String, xEmail As String
    Set Thdoc = ThisDocument
    Set Tbl = Thdoc.Tables(1)
    Set Fldg = Application.FileDialog(msoFileDialogFilePicker)
    With Fldg
        .Filters.Clear
        'extensions in a filter are separated by semicolons
        .Filters.Add "Word Documents ", "*.doc;*.dot;*.docx;*.docm;*.dotm", 1
        .AllowMultiSelect = True
        .InitialFileName = "C:\users\user\desktop\folder1\*.doc*" 'use your choice of folder
        If .Show <> -1 Then Exit Sub
    End With
    'Search for each Name in Table 1 column 1
    For Rw = 1 To Tbl.Rows.Count
        xName = Tbl.Cell(Rw, 1).Range.Text
        xEmail = Tbl.Cell(Rw, 2).Range.Text
        If Len(xName) > 2 And Len(xEmail) > 2 Then
            xName = Left(xName, Len(xName) - 2) 'Clean special characters in word cell text
            xEmail = Left(xEmail, Len(xEmail) - 2) 'Clean special characters in word cell text
            'open each Document selected & search for names
            For Each Fl In Fldg.SelectedItems
                Set Edoc = Documents.Open(Fl)
                Fnd = False
                With Edoc.Content.Find
                    .ClearFormatting
                    .Text = xName
                    .Replacement.Text = xName & vbCrLf & "E-mail: " & xEmail
                    .Wrap = wdFindContinue
                    .Execute Replace:=wdReplaceNone
                    '.Execute Replace:=wdReplaceOne
                    Fnd = .Found
                End With
                'If the text "E-mail:" is not already in the file, delete the next "If Fnd" branch
                'and use .Execute Replace:=wdReplaceOne instead of .Execute Replace:=wdReplaceNone
                If Fnd Then ' If Name is found then Search for "E-Mail:"
                    Fnd = False
                    With Edoc.Content.Find
                        .ClearFormatting
                        .Text = "E-mail:"
                        .Replacement.Text = "E-mail: " & xEmail
                        .Wrap = wdFindContinue
                        .Execute Replace:=wdReplaceOne
                        Fnd = .Found
                    End With
                End If
                If Fnd Then
                    Edoc.Save
                    Tbl.Cell(Rw, 3).Range.Text = "Found & Replaced in " & Fl
                    Exit For
                Else
                    Tbl.Cell(Rw, 3).Range.Text = "Not found in any selected document"
                End If
                Edoc.Close False
            Next Fl
        End If
    Next Rw
End Sub
Try to understand each action in the code and modify it to your requirements.
So, I have a function whose aim is to highlight words if they are surrounded by commas.
def __init__(...something):
    ...something
    self.user_input = QtGui.QTextEdit(self)
    self.user_input.textChanged.connect(self.check_text)
    ...something

def check_text(self):
    text = self.user_input.toPlainText().strip()
    comma = ","
    if comma in text:
        elements_quantity = text.count(comma)
        sites = text.split(comma)
        sites_quantity = len(sites)
        done_sites = []
        if sites_quantity > elements_quantity:
            done_sites = sites[:elements_quantity]
        else:
            done_sites = sites
    else:
        done_sites = [""]
    for site in done_sites:
        new_site = "<strong>{site}</strong>"
        text = text.replace(site, new_site.format(site=site))
    self.user_input.setHtml(text)
    self.user_input.moveCursor(QtGui.QTextCursor.End)
When I start writing, I get RecursionError: maximum recursion depth exceeded while calling a Python object each time I type a character.
What should I do to fix it?
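Calling setHtml inside check_text emits textChanged again, which calls check_text again, and so on. Block the widget's signals while you change the text:
self.user_input.blockSignals(True)
self.user_input.setHtml(text)
self.user_input.moveCursor(QtGui.QTextCursor.End)
self.user_input.blockSignals(False)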
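The mechanism can be shown without Qt. The sketch below uses two made-up stand-in classes, TinySignal and TinyEdit (neither is part of PyQt), to show a handler that modifies the text it is listening to, and how blocking the signal around that modification keeps the handler from calling itself.

```python
class TinySignal:
    """Hypothetical stand-in for a Qt signal; not a PyQt class."""

    def __init__(self):
        self._slots = []
        self.blocked = False

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self):
        # A blocked signal silently drops emissions.
        if self.blocked:
            return
        for slot in self._slots:
            slot()


class TinyEdit:
    """Hypothetical stand-in for a text widget with a change signal."""

    def __init__(self):
        self.text = ""
        self.handler_calls = 0
        self.text_changed = TinySignal()
        self.text_changed.connect(self.on_change)

    def set_text(self, value):
        self.text = value
        self.text_changed.emit()

    def on_change(self):
        self.handler_calls += 1
        # Without this block, set_text below would emit the signal again
        # and on_change would recurse until Python raises RecursionError.
        self.text_changed.blocked = True
        try:
            self.set_text(self.text.upper())
        finally:
            self.text_changed.blocked = False


edit = TinyEdit()
edit.set_text("a,b")
print(edit.text, edit.handler_calls)
```

The try/finally is a design choice worth copying into the real handler: if the formatting code raises, the signal is still unblocked afterwards.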
I wrote some code that grabs the numbers I need from this website, but I don't know what to do next.
It grabs the numbers from the table at the bottom: the ones under calving ease, birth weight, weaning weight, yearling weight, milk and total maternal.
#!/usr/bin/python
import urllib2
from bs4 import BeautifulSoup
import pyperclip

def getPageData(url):
    if not ('abri.une.edu.au' in url):
        return -1
    webpage = urllib2.urlopen(url).read()
    soup = BeautifulSoup(webpage, "html.parser")
    # This finds the epd tree and saves it as a searchable list
    pedTreeTable = soup.find('table', {'class':'TablesEBVBox'})
    # This puts all of the epds into a list.
    # it looks for anything in pedTreeTable with an td tag.
    pageData = pedTreeTable.findAll('td')
    pageData.pop(7)
    return pageData

def createPedigree(animalPageData):
    ''' make animalPageData much more useful. Strip the text out and put it in a dict.'''
    animals = []
    for animal in animalPageData:
        animals.append(animal.text)
    prettyPedigree = {
        'calving_ease' : animals[18],
        'birth_weight' : animals[19],
        'wean_weight' : animals[20],
        'year_weight' : animals[21],
        'milk' : animals[22],
        'total_mat' : animals[23]
    }
    for animalKey in prettyPedigree:
        if animalKey != 'year_weight' and animalKey != 'dam':
            prettyPedigree[animalKey] = stripRegNumber(prettyPedigree[animalKey])
    return prettyPedigree

def stripRegNumber(animal):
    '''returns the animal with its registration number stripped'''
    lAnimal = animal.split()
    strippedAnimal = ""
    for word in lAnimal:
        if not word.isdigit():
            strippedAnimal += word + " "
    return strippedAnimal

def prettify(pedigree):
    ''' Takes the pedigree and prints it out in a usable format '''
    s = ''
    pedString = ""
    # this is also ugly, but it was the only way I found to format with a variable
    cFormat = '{{:^{}}}'
    rFormat = '{{:>{}}}'
    #row 1 of string
    s += rFormat.format(len(pedigree['calving_ease'])).format(
        pedigree['calving_ease']) + '\n'
    #row 2 of string
    s += rFormat.format(len(pedigree['birth_weight'])).format(
        pedigree['birth_weight']) + '\n'
    #row 3 of string
    s += rFormat.format(len(pedigree['wean_weight'])).format(
        pedigree['wean_weight']) + '\n'
    #row 4 of string
    s += rFormat.format(len(pedigree['year_weight'])).format(
        pedigree['year_weight']) + '\n'
    #row 5 of string
    s += rFormat.format(len(pedigree['milk'])).format(
        pedigree['milk']) + '\n'
    #row 6 of string
    s += rFormat.format(len(pedigree['total_mat'])).format(
        pedigree['total_mat']) + '\n'
    return s

if __name__ == '__main__':
    while True:
        url = raw_input('Input a url you want to use to make life easier: \n')
        pageData = getPageData(url)
        s = prettify(createPedigree(pageData))
        pyperclip.copy(s)
        if len(s) > 0:
            print 'the easy string has been copied to your clipboard'
I've just been using this code for easy copying and pasting. All I have to do is insert the URL, and it saves the numbers to my clipboard.
Now I want to use this code on my website; I want to be able to insert a URL in my HTML code, and it displays these numbers on my page in a table.
My questions are as follows:
How do I use the python code on the website?
How do I insert collected data into a table with HTML?
It sounds like you want a web framework such as Django. The learning curve is a bit steep, but it is worth it, and it (of course) supports Python.
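For the second question, whichever framework serves the page, the data still has to be turned into table markup. Here is a minimal, framework-independent sketch using only the standard library (Python 3, unlike the Python 2 script above). The function name render_table and the sample values are made up for illustration; in Django this job would normally be done by a template rather than by string building.

```python
from html import escape


def render_table(pedigree):
    """Build a two-column HTML table (label, value) from a dict."""
    rows = "\n".join(
        # escape() keeps scraped text from being interpreted as markup.
        "  <tr><th>{}</th><td>{}</td></tr>".format(escape(str(key)), escape(str(value)))
        for key, value in pedigree.items()
    )
    return "<table>\n{}\n</table>".format(rows)


# Made-up sample values standing in for the scraped numbers.
example = {"calving_ease": "+2.1", "milk": "18"}
html_table = render_table(example)
print(html_table)
```

The dict returned by createPedigree has the same shape as example, so it could be passed in directly once the scraping code has been ported to the Python version the web application runs on.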