How to modify Text widget in Tkinter for interlinear annotation - python

I was wondering if it is possible to create a textbox with tkinter that can handle interlinear input. I need to be able to have different lines "connected" to each other, and the behaviour of each line can be independent. Here is an example. It is from a linguistic annotation program. The idea is that if I have a line, say:
this is an example
x.Det be.V a.Det example.N
The spacing of the first line is automatically adjusted whenever either line is modified, so that each word in the second line has enough space and does not overlap with the words in the first line.
Is there any way to do this?

A simple way to do this would be to use a fixed-width font (such as Courier), in which all characters are the same width, then format it as pure text by padding with spaces.
try:
    from Tkinter import *  # Python 2
except ImportError:
    from tkinter import *  # Python 3

sentence = [ 'this', 'is', 'an', 'example' ]
result = [ 'x.Det', 'be.V', 'a.Det', 'example.N' ]
line_start = [0, len(sentence)] # Used to split the sentence into lines no longer than line_len
line_len = 20 # Max characters in each line, including extra spaces
segment_len = 0
for i in range(len(sentence)):
    s_len = len(sentence[i])
    r_len = len(result[i])
    # Pad words (or word groups) so the segments of sentence and result have the same width
    if s_len > r_len:
        result[i] += ' ' * (s_len - r_len)
    elif s_len < r_len:
        sentence[i] += ' ' * (r_len - s_len)
    segment_len += max(r_len, s_len) + 1
    # Check the line length
    if segment_len > line_len:
        # Word i starts the new line, so it already counts towards that line
        segment_len = max(r_len, s_len) + 1
        # Insert before the final entry so the break positions stay in order
        line_start.insert(-1, i)
root = Tk()
for i in range(len(line_start)-1):
    sentence_segment = ' '.join( sentence[line_start[i]:line_start[i+1]] )
    ts = Text(root, font='TkFixedFont', width = line_len, height = 1)
    ts.insert(END, sentence_segment)
    ts.pack()
    result_segment = ' '.join( result[line_start[i]:line_start[i+1]] )
    tr = Text(root, font='TkFixedFont', width = line_len, height = 1, foreground='grey')
    tr.insert(END, result_segment)
    tr.pack()
root.mainloop()
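The padding step can also be tried without any GUI. The following is only a sketch of the same idea, assuming a fixed-width font; the function name align_interlinear is made up for this example and is not part of Tkinter.

```python
def align_interlinear(words, glosses, sep=" "):
    """Pad each word/gloss pair to a common width and join each tier.

    Hypothetical helper for illustration only.
    """
    if len(words) != len(glosses):
        raise ValueError("words and glosses must have the same length")
    top, bottom = [], []
    for word, gloss in zip(words, glosses):
        # Both cells of a column get the width of the longer entry
        width = max(len(word), len(gloss))
        top.append(word.ljust(width))
        bottom.append(gloss.ljust(width))
    # Trailing padding on the last column is not needed
    return sep.join(top).rstrip(), sep.join(bottom).rstrip()


top, bottom = align_interlinear(
    ["this", "is", "an", "example"],
    ["x.Det", "be.V", "a.Det", "example.N"],
)
print(top)
print(bottom)
```

Re-running such a function on the current tokens after every edit, and writing the two strings back into the widgets, would keep the columns lined up; how to hook that into widget events is left open here.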

Related

I got stuck trying to print the first list in this function. I don't understand why the white space is printed together with the items

The code is below; the print problem I need to solve is at the very bottom.
I have searched a lot for this problem but still couldn't find an answer.
How can I delete the white space located next to the list items? For example, I want to change the output like this:
['11', '22', '33'] => ['11','22','33']
def arithmetic_arranger(problems,display=False):
    operators = []
    list1 = []
    list2 = []
    new_list1=[]
    new_list2=[]
    for problem in range(len(problems)):
        if '+' in problems[problem] or '-' in problems[problem]:
            problems[problem] = problems[problem].split(' ')
            list1.append(problems[problem][0])
            operators.append(problems[problem][1])
            list2.append(problems[problem][2])
    for problem in range(len(problems)):
        width = max(len(list1[problem]),len(list2[problem]))+2
        new_list1.append(' '*(width-len(list1[problem]))+list1[problem])
    print(list1)
    print(new_list1)
print(arithmetic_arranger(["32 + 698", "3801 - 2", "45 + 43", "123 + 49"]))
['   32', '  3801', '  45', '  123']
new_list1.append(' ' * (width - len(list1[problem])) + list1[problem])
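With this part of the code, append(' ' * (...), you are telling the program to put in front of each list element a number of white spaces equal to the width minus the length of list1[problem], so the spaces inside the quotes come from your own padding.
The space after each comma, on the other hand, is not part of any element. It is simply how Python prints a list, so I don't think you can remove it from the list itself.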
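To illustrate the difference between the list and its printed form, here is a small sketch (the variable names are chosen for this example). It builds the compact text by hand with str.join instead of relying on the default list formatting.

```python
items = ['11', '22', '33']

# print() shows the list's repr, which always puts ", " between elements
print(items)

# Build the text manually when a different separator is wanted
compact = '[' + ','.join(repr(s) for s in items) + ']'
print(compact)
```

The list itself is unchanged; only the string used for display differs.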

Analyze the time and space complexity of the following code

Problem from leetcode:
https://leetcode.com/problems/text-justification/description/
Given an array of words and a width maxWidth, format the text such that each line has exactly maxWidth characters and is fully (left and right) justified.
You should pack your words in a greedy approach; that is, pack as many words as you can in each line. Pad extra spaces ' ' when necessary so that each line has exactly maxWidth characters.
Extra spaces between words should be distributed as evenly as possible. If the number of spaces on a line do not divide evenly between words, the empty slots on the left will be assigned more spaces than the slots on the right.
For the last line of text, it should be left justified and no extra space is inserted between words.
Original code:
class Solution:
    def fullJustify(self, words, maxWidth):
        ans, curr, word_length = [], [], 0
        words.append(' ' * maxWidth)
        for w in words:
            if word_length + len(w) + len(curr) > maxWidth:
                space = maxWidth-word_length
                if w != words[-1]:
                    for i in range(space):
                        curr[i%(len(curr)-1 or 1)] += ' '
                    ans.append(''.join(curr))
                else:
                    ans.append(' '.join(curr) + ' ' * (space - (len(curr) - 1)))
                curr = []
                word_length = 0
            curr += [w]
            word_length += len(w)
        return ans
So there are two for-loops, one nested inside the other.
The number of iterations of the inner loop is determined by space, which changes every time but is always smaller than maxWidth. The outer loop has a time complexity of O(n); what is the overall time complexity?
If you call n = |words| and m = maxWidth then you'll notice that you have an outer loop that does n iterations, inside of that there are different conditions but if they happen to be true you have another loop that in the worst case scenario is executed m times.
Therefore you can say time complexity is: T(n, m) = O(n * m)
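One way to check this bound empirically is to count the inner-loop iterations. The sketch below is an instrumented copy of the code from the question written as a plain function; the name full_justify_counted and the counter inner_steps are additions for this example. It counts iterations only, not the cost of each string concatenation.

```python
def full_justify_counted(words, max_width):
    """Same algorithm as the question's code, plus an inner-loop counter."""
    words = list(words) + [' ' * max_width]  # sentinel, as in the original
    ans, curr, word_length = [], [], 0
    inner_steps = 0
    for idx, w in enumerate(words):
        if word_length + len(w) + len(curr) > max_width:
            space = max_width - word_length
            if idx != len(words) - 1:  # not the sentinel: fully justify
                for i in range(space):
                    curr[i % (len(curr) - 1 or 1)] += ' '
                    inner_steps += 1
                ans.append(''.join(curr))
            else:  # last line: left justify
                ans.append(' '.join(curr) + ' ' * (space - (len(curr) - 1)))
            curr = []
            word_length = 0
        curr += [w]
        word_length += len(w)
    return ans, inner_steps


sample = ["This", "is", "an", "example", "of", "text", "justification."]
lines, steps = full_justify_counted(sample, 16)
print(lines)
print(steps, "inner iterations; bound n * m =", len(sample) * 16)
```

For this input the inner loop runs far fewer than n * m times, because it only runs once per completed line rather than once per word, so O(n * m) is an upper bound rather than a tight estimate here.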

Python: replace string, matched from a list

Trying to match and mark character based n-grams. The string
txt = "how does this work"
is to be matched with n-grams from the list
ngrams = ["ow ", "his", "s w"]
and marked with <>, however, only if there is no preceding opened quote. The output I am seeking for this string is h<ow >does t<his w>ork (notice the double match in the second part, but within just one pair of expected quotes).
The for loop I've tried for this doesn't, however, produce the wanted output at all:
switch = False
for i in txt:
    if i in "".join(ngrams) and switch == False:
        txt = txt.replace(i, "<" + i)
        switch = True
    if i not in "".join(ngrams) and switch == True:
        txt = txt.replace(i, ">" + i)
        switch = False
print(txt)
Any help would be greatly appreciated.
This solution uses the str.find method to find all copies of an ngram within the txt string, saving the indices of each copy to the indices set so we can easily handle overlapping matches.
We then copy txt, char by char, to the result list, inserting angle brackets where required. This strategy is more efficient than inserting the angle brackets using multiple .replace calls, because each .replace call needs to rebuild the whole string.
I've extended your data slightly to illustrate that my code handles multiple copies of an ngram.
txt = "how does this work now chisolm"
ngrams = ["ow ", "his", "s w"]
print(txt)
print(ngrams)
# Search for all copies of each ngram in txt
# saving the indices where the ngrams occur
indices = set()
for s in ngrams:
    slen = len(s)
    lo = 0
    while True:
        i = txt.find(s, lo)
        if i == -1:
            break
        lo = i + slen
        print(s, i)
        indices.update(range(i, lo-1))
print(indices)
# Copy the txt to result, inserting angle brackets
# to show matches
switch = True
result = []
for i, u in enumerate(txt):
    if switch:
        if i in indices:
            result.append('<')
            switch = False
        result.append(u)
    else:
        result.append(u)
        if i not in indices:
            result.append('>')
            switch = True
print(''.join(result))
output
how does this work now chisolm
['ow ', 'his', 's w']
ow 1
ow 20
his 10
his 24
s w 12
{1, 2, 10, 11, 12, 13, 20, 21, 24, 25}
h<ow >does t<his w>ork n<ow >c<his>olm
If you want adjacent groups to be merged, we can easily do that using the str.replace method. But to make that work properly we need to pre-process the original data, converting all runs of whitespace to single spaces. A simple way to do that is to split the data and re-join it.
txt = "how does this\nwork now chisolm hisow"
ngrams = ["ow", "his", "work"]
# Convert all whitespace to single spaces
txt = ' '.join(txt.split())
print(txt)
print(ngrams)
# Search for all copies of each ngram in txt
# saving the indices where the ngrams occur
indices = set()
for s in ngrams:
    slen = len(s)
    lo = 0
    while True:
        i = txt.find(s, lo)
        if i == -1:
            break
        lo = i + slen
        print(s, i)
        indices.update(range(i, lo-1))
print(indices)
# Copy the txt to result, inserting angle brackets
# to show matches
switch = True
result = []
for i, u in enumerate(txt):
    if switch:
        if i in indices:
            result.append('<')
            switch = False
        result.append(u)
    else:
        result.append(u)
        if i not in indices:
            result.append('>')
            switch = True
# Convert the list to a single string
output = ''.join(result)
# Merge adjacent groups
output = output.replace('> <', ' ').replace('><', '')
print(output)
output
how does this work now chisolm hisow
['ow', 'his', 'work']
ow 1
ow 20
ow 34
his 10
his 24
his 31
work 14
{32, 1, 34, 10, 11, 14, 15, 16, 20, 24, 25, 31}
h<ow> does t<his work> n<ow> c<his>olm <hisow>
This should work:
txt = "how does this work"
ngrams = ["ow ", "his", "s w"]
# first find where letters match ngrams
L = len(txt)
match = [False]*L
for ng in ngrams:
    l = len(ng)
    for i in range(L-l+1):  # +1 so an ngram at the very end of txt is also found
        if txt[i:i+l] == ng:
            for j in range(l):
                match[i+j] = True
# then sandwich matches with quotes
out = []
switch = False
for i in range(L):
    if not switch and match[i]:
        out.append('<')
        switch = True
    if switch and not match[i]:
        out.append('>')
        switch = False
    out.append(txt[i])
if switch:  # close a match that runs to the end of txt
    out.append('>')
print("".join(out))
Here's a method with only one for loop. I timed it and it's about as fast as the other answers to this question. I think it's a bit more clear, although that might be because I wrote it.
I iterate over the index of the first character in the n-gram, then if it matches, I use a bunch of if-else clauses to see whether I should add a < or > in this situation. I add to the end of the string output from the original txt, so I'm not really inserting in the middle of a string.
txt = "how does this work"
ngrams = set(["ow ", "his", "s w"])
n = 3
prev = -n
output = ''
shift = 0
is_open = False  # renamed from open, which shadows the built-in function
for i in range(len(txt) - n + 1):
    ngram = txt[i:i + n]
    if ngram in ngrams:
        if i - prev > n:
            if is_open:
                output += txt[prev:prev + n] + '>' + txt[prev + n:i] + '<'
            elif not is_open:
                if prev > 0:
                    output += txt[prev + n:i] + '<'
                else:
                    output += txt[:i] + '<'
                is_open = True
        else:
            output += txt[prev:i]
        prev = i
if is_open:
    output += txt[prev:prev + n] + '>' + txt[prev + n:]
else:
    output = txt  # no ngram matched, so return the text unchanged
print(output)
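Another way to look at the problem, not taken from any of the answers above, is to treat every match as a (start, end) span and merge the spans that overlap before inserting the brackets. The sketch below assumes that only overlapping matches should share one pair of brackets; the function name mark_ngrams and its parameters are invented for this example.

```python
def mark_ngrams(txt, ngrams, left="<", right=">"):
    """Wrap every (merged) n-gram match in txt with left/right markers."""
    # Collect [start, end) spans for every occurrence, overlapping ones too
    spans = []
    for ng in ngrams:
        if not ng:
            continue  # an empty n-gram would match everywhere
        start = txt.find(ng)
        while start != -1:
            spans.append((start, start + len(ng)))
            start = txt.find(ng, start + 1)
    spans.sort()

    # Merge spans that overlap the previous one
    merged = []
    for start, end in spans:
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Rebuild the string once, copying unmatched text between the spans
    out, pos = [], 0
    for start, end in merged:
        out.append(txt[pos:start])
        out.append(left + txt[start:end] + right)
        pos = end
    out.append(txt[pos:])
    return "".join(out)


print(mark_ngrams("how does this work", ["ow ", "his", "s w"]))
```

Changing the comparison start < merged[-1][1] to <= would also merge matches that merely touch each other.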

I want to extract a certain number of words surrounding a given word in a long string(paragraph) in Python 2.7

I am trying to extract a selected number of words surrounding a given word. I will give example to make it clear:
string = "Education shall be directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms."
1) The selected word is development and I need to get the 6 words surrounding it, and get : [to, the, full, of, the, human]
2) But if the selected word is in the beginning or in second position I still need to get 6 words, e.g:
The selected word is shall , I should get: [Education, be, directed, to , the , full]
I should use 're' module. What I managed to find until now is :
def search(text,n):
    '''Searches for text, and retrieves n words either side of the text, which are returned separately'''
    word = r"\W*([\w]+)"
    groups = re.search(r'{}\W*{}{}'.format(word*n,'place',word*n), text).groups()
    return groups[:n],groups[n:]
but it helps me only with the first case. Can someone help me out with this, I will be really grateful. Thank you in advance!
This will extract all occurrences of the target word in your text, with context:
import re
text = ("Education shall be directed to the full development of the human personality "
        "and to the strengthening of respect for human rights and fundamental freedoms.")
def search(target, text, context=6):
    # It's easier to use re.findall to split the string,
    # as we get rid of the punctuation
    words = re.findall(r'\w+', text)
    matches = (i for (i,w) in enumerate(words) if w.lower() == target)
    for index in matches:
        if index < context //2:
            yield words[0:context+1]
        elif index > len(words) - context//2 - 1:
            yield words[-(context+1):]
        else:
            yield words[index - context//2:index + context//2 + 1]
print(list(search('the', text)))
# [['be', 'directed', 'to', 'the', 'full', 'development', 'of'],
# ['full', 'development', 'of', 'the', 'human', 'personality', 'and'],
# ['personality', 'and', 'to', 'the', 'strengthening', 'of', 'respect']]
print(list(search('shall', text)))
# [['Education', 'shall', 'be', 'directed', 'to', 'the', 'full']]
print(list(search('freedoms', text)))
# [['respect', 'for', 'human', 'rights', 'and', 'fundamental', 'freedoms']]
This is tricky, with potential for off-by-one errors, but I think it meets your spec. I have left out removal of punctuation; it is probably best to remove it before sending the string for analysis. I assumed case was not important.
test_str = "Education shall be directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms."
def get_surrounding_words(search_word, s, n_words):
    words = s.lower().split(' ')
    try:
        i = words.index(search_word)
    except ValueError:
        return []
    # Word is near start
    # (// keeps the slice indices integers on Python 3 as well)
    if i < n_words//2:
        words.pop(i)
        return words[:n_words]
    # Word is near end
    elif i >= len(words) - n_words//2:
        words.pop(i)
        return words[-n_words:]
    # Word is in middle
    else:
        words.pop(i)
        return words[i-n_words//2:i+n_words//2]
def test(word):
    print('{}: {}'.format(word, get_surrounding_words(word, test_str, 6)))
test('notfound')
test('development')
test('shall')
test('education')
test('fundamental')
test('for')
test('freedoms')
import sys
args = sys.argv[1:]
if len(args) != 2:
    # sys.exit prints the message and stops; os has no exit() function
    sys.exit("Use with <string> <query>")
text = args[0]
query = args[1]
words = text.split()
op = []
left = 3
right = 3
try:
    index = words.index(query)
    if index <= left:
        start = 0
    else:
        start = index - left
    if start + left + right + 1 > len(words):
        start = len(words) - left - right - 1
        if start < 0:
            start = 0
    while len(op) < left + right and start < len(words):
        if start != index:
            op.append(words[start])
        start += 1
except ValueError:
    pass
print(op)
How does this work?
Find the word in the string.
See if we can make left+right words from the index.
Take left+right number of words and save them in op.
Print op.
A simple approach to your problem. First separates all the words and then selects words from left and right.
def custom_search(sentence, word, n):
    given_string = sentence
    given_word = word
    total_required = n
    word_list = given_string.strip().split(" ")
    length_of_words = len(word_list)
    output_list = []
    given_word_position = word_list.index(given_word)
    word_from_left = 0
    word_from_right = 0
    # // (integer division) keeps the counts usable in range() on Python 3
    if given_word_position + 1 > total_required // 2:
        word_from_left = total_required // 2
        if given_word_position + 1 + (total_required // 2) <= length_of_words:
            word_from_right = total_required // 2
        else:
            word_from_right = length_of_words - (given_word_position + 1)
            remaining_words = (total_required // 2) - word_from_right
            word_from_left += remaining_words
    else:
        word_from_right = total_required // 2
        word_from_left = given_word_position
        if word_from_left + word_from_right < total_required:
            remaining_words = (total_required // 2) - word_from_left
            word_from_right += remaining_words
    required_words = []
    for i in range(given_word_position - word_from_left, word_from_right +
                   given_word_position + 1):
        if i != given_word_position:
            required_words.append(word_list[i])
    return required_words
sentence = "Education shall be directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms."
custom_search(sentence, "shall", 6)
>>['Education', 'be', 'directed', 'to', 'the', 'full']
custom_search(sentence, "development", 6)
>>['to', 'the', 'full', 'of', 'the', 'human']
I don't think regular expressions are necessary here. Assuming the text is well-constructed, just split it up into an array of words, and write a couple if-else statements to make sure it retrieves the necessary amount of surrounding words:
def search(text, word, n):
    # text is the string you are searching
    # word is the word you are looking for
    # n is the TOTAL number of words you want surrounding the word
    words = text.split(" ") # Create an array of words from the string
    position = words.index(word) # Find the position of the desired word
    distance_from_end = len(words) - position # How many words are after the word in the text
    if position < n // 2 + n % 2: # If there aren't enough words before...
        return words[:position], words[position + 1:n + 1]
    elif distance_from_end < n // 2 + n % 2: # If there aren't enough words after...
        return words[position - n + distance_from_end:position], words[position + 1:]
    else: # Otherwise, extract an equal number of words from both sides (take from the right if odd)
        return words[position - n // 2 - n % 2:position], words[position + 1:position + 1 + n//2]
string = ("Education shall be directed to the full development of the human personality and to the "
          "strengthening of respect for human rights and fundamental freedoms.")
print(search(string, "shall", 6))
# >> (['Education'], ['be', 'directed', 'to', 'the', 'full'])
print(search(string, "human", 5))
# >> (['development', 'of', 'the'], ['personality', 'and'])
In your example you didn't have the target word included in the output, so I kept it out as well. If you'd like the target word included simply combine the two arrays the function returns (join them at position).
Hope this helped!
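The boundary cases in the answers above can also be handled by clamping one fixed-size window to the ends of the word list. This is only a sketch under a few assumptions: the first occurrence of the word is wanted, matching is case-insensitive, and punctuation is dropped with re.findall. The name surrounding_words is made up for this example.

```python
import re


def surrounding_words(text, target, n=6):
    """Return n words around the first occurrence of target (target excluded)."""
    words = re.findall(r"\w+", text)
    lowered = [w.lower() for w in words]
    try:
        i = lowered.index(target.lower())
    except ValueError:
        return []
    # The window holds the target plus n neighbours (fewer if the text is short)
    size = min(n + 1, len(words))
    # Centre the window on the target, then push it back inside the list
    start = max(0, min(i - n // 2, len(words) - size))
    window = words[start:start + size]
    del window[i - start]  # drop the target itself
    return window


text = ("Education shall be directed to the full development of the human "
        "personality and to the strengthening of respect for human rights "
        "and fundamental freedoms.")
print(surrounding_words(text, "development"))
print(surrounding_words(text, "shall"))
print(surrounding_words(text, "freedoms"))
```

Because the window is shifted rather than truncated, a word near either end still gets n neighbours as long as the text has at least n + 1 words.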

Python Logic - Centering text using periods

I'm taking this intro to python course online
The problem reads:
For this program, the first line of input is an integer width. Then, there are some lines of text; the line "END" indicates the end of the text. For each line of text, you need to print out a centered version of it, by adding periods .. to the left and right, so that the total length of each line of text is width. (All input lines will have length at most width.) Centering means that the number of periods added to the left and added to the right should be equal if possible; if needed we allow one more period on the left than the right. For example, for input
13
Text
in
the
middle!
END
the correct output would be
.....Text....
......in.....
.....the.....
...middle!...
the Hint given is:
For input line length of L, you should add (width-L)//2 periods to the right side
Here is my code so far:
width = int(input())
s1 = input()
periods_remain = width - len(s1)
L = periods_remain
periods_rtside = (width-L)//2
periods_leftside = width - periods_rtside
periods_rt_str = '.' * periods_rtside
periods_left_str = '.' * periods_leftside
line1 = periods_left_str + s1 + periods_rt_str
My line1 result looks like "...........Text.." Instead of .....Text....
It can be run here
My problem seems to be the L. I'm not sure how to define L. Thanks!
You can use str.center for this:
>>> lis = ['Text', 'in', 'the', 'middle!', 'END']
>>> for item in lis:
...     print(item.center(13, '.'))
...
.....Text....
......in.....
.....the.....
...middle!...
.....END.....
or format (note that here the extra period ends up on the right):
>>> for item in lis:
...     print(format(item, '.^13'))
...
....Text.....
.....in......
.....the.....
...middle!...
.....END.....
Working version of your code:
lis = ['Text', 'in', 'the', 'middle!', 'END']
width = 13
for s1 in lis:
    L = len(s1) #length of line
    periods_rtside = (width - L)//2 #periods on the RHS
    periods_leftside = width - periods_rtside - L #periods on the LHS
    periods_rt_str = '.' * periods_rtside
    periods_left_str = '.' * periods_leftside
    line1 = periods_left_str + s1 + periods_rt_str
    print(line1)
output:
.....Text....
......in.....
.....the.....
...middle!...
.....END.....
For those still struggling with this tough question, here is my code that works in Python 3 shell, even though it still fails in http://cscircles.cemc.uwaterloo.ca/8-remix/
First_line = input("First input: ")
width = int(First_line)
while True:
    s1 = input("Second input: ")
    if s1 != 'END':
        L = len(s1) #length of line
        periods_rtside = (width - L)//2 #periods on the RHS
        periods_leftside = width - periods_rtside - L #periods on the LHS
        periods_rt_str = '.' * periods_rtside
        periods_left_str = '.' * periods_leftside
        line1 = periods_left_str + s1 + periods_rt_str
        print(line1)
    else:
        break
To get it to work in the http://cscircles.cemc.uwaterloo.ca/8-remix/ console, you need to replace the first two lines with
width = int(input())
and change the s1 line to
s1 = input()
and also provide your own test input by clicking on the "Enter test input" button.
Just some small changes to the above code and it worked in the grader.
width = int(input())
s1 = input()
while s1 != "END":
    L = len(s1)
    periods_rtside = (width - L)//2
    periods_leftside = width - periods_rtside - L
    periods_rt_str = '.' * periods_rtside
    periods_left_str = '.' * periods_leftside
    line1 = periods_left_str + s1 + periods_rt_str
    print(line1)
    s1 = input()
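The arithmetic from the hint can be isolated in a small function, which makes it easy to check against the sample output without typing input. This is a sketch; the function name center_with_periods is chosen for this example, and the ValueError for over-long lines is an added assumption (the exercise says it cannot happen).

```python
def center_with_periods(line, width):
    """Pad line with periods to width; an odd period goes on the left."""
    total = width - len(line)
    if total < 0:
        raise ValueError("line is longer than width")
    right = total // 2      # (width - L) // 2, as in the hint
    left = total - right    # whatever remains, so left >= right
    return "." * left + line + "." * right


for sample in ["Text", "in", "the", "middle!"]:
    print(center_with_periods(sample, 13))
```

Computing the left side as the remainder guarantees that the two sides always add up to width - L and that any extra period lands on the left, as the exercise requires.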
