Sublime Text - A more selective "match selection" tool? - python

I'm currently using the "match_selection" feature that Sublime Text enables by default. If I highlight a word, Sublime highlights "other occurrences of the currently selected text." That works when I only care about matching spelling. However, I'd like a more selective 'match selector' that only matches the word if it's the exact same data type.
I'll include an image. Here I select the word first in the parameters on line 3. If it were truly grabbing exact matches, it should highlight only the two other occurrences of first to the right of the equals sign, yet it also grabs the first in self.first and the first in the string, even though they aren't the same data type/variable.
It does not do this in PyCharm. Is there a package that would solve this?

Related

Search and comment all matches

Is there a way to comment out all the matches when doing CTRL+F or CTRL+R?
I have tried a quick fix, but it does not work properly when the print call spans several lines:
# print("Hello"
"World")
I am using Python 3.7 and PyCharm 2021.3.1
Yes, that's a PyCharm (or any JetBrains IDE) feature.
After searching, click the Select All Occurrences button (the 4th button from the right of the 33/33 in the picture you uploaded); it will mark all occurrences of your search.
Then simply press Cmd+slash (or Control+slash on Windows) and all the occurrences will be commented out.
For the multi-line case, you can use a regex search to match your search term, something like: ^print\(.*(\n*[^\)]*)*\)$
I guess you want to select the print call and comment out all of it, including the "Hello" "World" part.
You can do it step by step:
Press CTRL+F and then click the regex button.
Enter the regex you need (for this question, ^print\(.*(\n*.*)*\)$). Once you do that, the whole call is already selected.
To comment out every print call, click Select All Occurrences.
Then comment them out with your multi-line comment shortcut.
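For anyone who prefers to script this instead of using the IDE, here is a minimal Python sketch of the same idea. It is not the PyCharm feature itself: the function name comment_out_prints is made up for illustration, and the pattern is a simplified variant of the regexes above that stops at the first closing parenthesis, so a call whose arguments contain ")" is left untouched.

```python
import re

# Simplified pattern (an assumption, not the regex from the answers):
# "print(" at the start of a line, anything up to the first ")",
# and that ")" must end its line. [^)] also matches newlines.
PRINT_CALL = re.compile(r"^print\([^)]*\)$", re.MULTILINE)


def comment_out_prints(source):
    """Prefix every line of each matched print call with '# '."""
    def comment(match):
        lines = match.group(0).split("\n")
        return "\n".join("# " + line for line in lines)

    return PRINT_CALL.sub(comment, source)


example = 'print("Hello"\n"World")\nx = 1\nprint("done")\n'
commented = comment_out_prints(example)
print(commented)
```

Every line of a multi-line call gets its own comment marker, which is the part the quick fix in the question missed.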

Join runs from python-docx for the purpose of applying regex to group of runs

I am using Python-Docx to read through docx files, find a particular string (e.g. a date), and replace it with another string (e.g. a new date).
Here are the two functions I am using:
import os
import re
from glob import glob

from docx import Document


def docx_replace_regex(doc_obj, regex, replace):
    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if regex.search(inline[i].text):
                    text = regex.sub(replace, inline[i].text)
                    inline[i].text = text
    # Table cells hold their own paragraphs, so recurse into them
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex, replace)


def replace_date(folder, replaceDate, *date):
    # Collect every .docx file under folder, recursively
    docs = [y for x in os.walk(folder) for y in glob(os.path.join(x[0], '*.docx'))]
    for doc in docs:
        if date:  # Date is optional date to replace
            # *date arrives as a tuple; the first item is matched as literal
            # text (assumed intent of the original r+date)
            regex = re.compile(re.escape(date[0]))
        else:  # If no date provided, replace all dates
            regex = re.compile(r"(\w{3,12}\s\d{1,2}\,?\s?[0-9]{4})|((the\s)?\d{1,2}[th]{0,2}\sday\sof\s\w{3,12}\,\s?\d{4})")
        docObj = Document(doc)
        docx_replace_regex(docObj, regex, replaceDate)
        docObj.save(doc)
The first function is essentially a find-and-replace for docx files in Python. The second function recursively walks a folder to find the docx files to process. The details of the regex aren't relevant (I think). It essentially searches for different date formats. It works as I want it to and shouldn't affect my issue.
When a document is passed to docx_replace_regex that function iterates through paragraphs, then runs and searches the runs for my regex. The issue is that the runs sometimes break up a single line of text so that if the doc were in plaintext the regex would capture the text, but because the runs break up the text, the text isn't captured.
For example, if my paragraph is "10th day of May, 2020", the inline array may be ['1','0th day of May,',' 2020'].
Initially, I joined the inline array so that it would be equal to "10th day of May, 2020" but then I can't replace the run with the new text because my inline variable is a string, not a run object. Even if I kept inline as a run object it would still replace only one part of the text I'm looking for.
I'm looking for any ideas on how to properly replace the portion of text captured by my regex. Alternatively, I'd like to know why the sentence is being broken up into separate runs as it is.
This is not a simple problem, as it looks like you're starting to realize :)
The simplest possible approach is to search and replace in paragraph.text, like:
paragraph.text = my_replace_function(paragraph.text, ...)
This works, but all character formatting is lost. A more sophisticated approach finds the offset of the search phrase, maps that to runs, and then splits and rejoins runs as necessary to change only those runs containing the search phrase.
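To make the offset-mapping idea concrete, here is a minimal sketch. It is not the linked solution and not part of python-docx: replace_across_runs is a hypothetical helper that only assumes each run has a writable .text attribute, so it is demonstrated with stand-in objects. With python-docx you would presumably pass paragraph.runs. The replacement goes into the first run the match touches (and so takes that run's formatting), and the matched characters are removed from the other runs.

```python
import re
from types import SimpleNamespace


def replace_across_runs(runs, regex, replacement):
    """Replace regex matches in the joined run text without rebuilding runs."""
    full_text = "".join(run.text for run in runs)
    # Work right to left so offsets of earlier matches stay valid.
    for match in reversed(list(regex.finditer(full_text))):
        start, end = match.span()
        new_text = match.expand(replacement)
        pos = 0
        inserted = False
        for run in runs:
            run_start, run_end = pos, pos + len(run.text)
            pos = run_end
            if run_end <= start or run_start >= end:
                continue  # this run does not overlap the match
            lo = max(start, run_start) - run_start
            hi = min(end, run_end) - run_start
            middle = "" if inserted else new_text
            run.text = run.text[:lo] + middle + run.text[hi:]
            inserted = True


# Stand-ins for the runs from the example in the question.
runs = [SimpleNamespace(text=t) for t in ["1", "0th day of May,", " 2020"]]
date_re = re.compile(r"\d{1,2}(?:st|nd|rd|th)?\sday\sof\s\w{3,12},\s?\d{4}")
replace_across_runs(runs, date_re, "1st day of June, 2021")
print([run.text for run in runs])
```

The emptied runs stay in the list; whether to delete them afterwards is a separate decision.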
It looks like there's a working solution here: https://stackoverflow.com/a/55733040/1902513, which shows by its length just how much is involved.
It's come up quite a few times before, so if you search here in SO on [python-docx] replace you'll find more on the nature of the problem.

Tag everything in a row after a certain character [tkinter Text widget]

As a coding challenge, I've been building a rich-text editor. So far, I've made working save/save as/load systems and working headings. But when you save as .txt, all the heading data is lost. So I've been thinking about a system that relies on '#' to mark headings, basically syntax highlighting (# for H1, ## for H2, ### for H3...). I've looked around and haven't found anything of the sort. So far, I use this as my system of headings:
editor.tag_configure('heading7', font=heading7_font)
removeTags()
editor.tag_add('heading7', SEL_FIRST, SEL_LAST)
Here heading7_font = ("Consolas Bold", 16), and removeTags() loops through all tags and removes them.
Basically, you just pick from an OptionMenu if you wish to change the font size (or use a certain key binding). This question is probably too vague, but I would very much like some direction or an answer.
Here's the code of my entire project (YES, I know I'm not using classes, and it's a jittery mess, but, I'm going to work on that later): https://pastebin.com/wthVT6q4 (Here's the stylesheet variables: https://pastebin.com/WrX4EDKM)
You can use the text widget search method to search for a string or a pattern. Then it's just a matter of applying the tag to the results.
This is how it is documented in the Text class:
search(self, pattern, index, stopindex=None, forwards=None, backwards=None, exact=None, regexp=None, nocase=None, count=None, elide=None)
Search PATTERN beginning from INDEX until STOPINDEX.
Return the index of the first character of a match or an
empty string.
As you can see, you can use the search method to search on a regular expression. Since you won't always know the length of the matched text, you can specify an IntVar to be given the count of the matching characters.
For example, to search for a line that begins with ## you can do something like this:
count_var = tk.IntVar()
index = editor.search(r'## .*', "1.0", "end", count=count_var, regexp=True)
With that, you can use index as the start of the range, and "{} +{} chars".format(index, count_var.get()) for the end of the range. Or, use "{} lineend".format(index) to extend the highlight to the end of the line.
If you only want to highlight the characters after ##, you can adjust index in a similar way: "{}+{}chars".format(index, 3)
Note: the regular expression syntax must follow the rules of Tcl regular expressions. Conceptually they are the same, but they differ from Python's rules in some of the special character classes.
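Putting those pieces together, a loop over all matches might look like the sketch below. The function name tag_headings and its default marker and tag name are made up for illustration; it tags each matching line from the marker to the end of the line, so it does not need the count variable.

```python
import tkinter as tk


def tag_headings(editor, marker="## ", tag="heading2"):
    """Tag every line of a Text widget that starts with `marker`."""
    editor.tag_remove(tag, "1.0", "end")
    start = "1.0"
    while editor.compare(start, "<", "end"):
        # Tcl regular expression; "^" anchors the match at a line start.
        index = editor.search("^" + marker, start, stopindex="end", regexp=True)
        if not index:
            break
        line_end = "{} lineend".format(index)
        editor.tag_add(tag, index, line_end)
        # Continue from the first character of the following line.
        start = "{} +1c".format(line_end)
```

You would call it after loading a file, and probably again from a key binding, after configuring the tag with something like editor.tag_configure('heading2', font=...).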

How to perform a tag-agnostic text string search in an html file?

I'm using LanguageTool (LT) with the --xmlfilter option enabled to spell-check HTML files. This forces LanguageTool to strip all tags before running the spell check.
This also means that all reported character positions are off because LT doesn't "see" the tags.
For example, if I check the following HTML fragment:
<p>This is kin<b>d</b> o<i>f</i> a <b>stupid</b> question.</p>
LanguageTool will treat it as a plain text sentence:
This is kind of a stupid question.
and returns the following message:
<error category="Grammar" categoryid="GRAMMAR" context=" This is kind of a stupid question. " contextoffset="24" errorlength="9" fromx="8" fromy="8" locqualityissuetype="grammar" msg="Don't include 'a' after a classification term. Use simply 'kind of'." offset="24" replacements="kind of" ruleId="KIND_OF_A" shortmsg="Grammatical problem" subId="1" tox="17" toy="8"/>
(In this particular example, LT has flagged "kind of a.")
Since the search string might be wrapped in tags and might occur multiple times I can't do a simple index search.
What would be the most efficient Python solution to reliably locate any given text string in an HTML file? (LT returns an approximate character position, which might be off by 10-30% depending on the number of tags, as well as the words before and after the flagged word(s).)
I.e. I'd need to do a search that ignores all tags, but includes them in the character position count.
In this particular example, I'd have to locate "kind of a" and find the location of the letter k in:
kin<b>d</b> o<i>f</i> a
This may not be the speediest way to go, but pyparsing will recognize HTML tags in most forms. The following code inverts the typical scan, creating a scanner that will match any single character, and then configuring the scanner to skip over HTML open and close tags, and also common HTML '&xxx;' entities. pyparsing's scanString method returns a generator that yields the matched tokens, the starting, and the ending location of each match, so it is easy to build a list that maps every character outside of a tag to its original location. From there, the rest is pretty much just ''.join and indexing into the list. See the comments in the code below:
test = "<p>This is kin<b>d</b> o<i>f</i> a <b>stupid</b> question.</p>"
from pyparsing import Word, printables, anyOpenTag, anyCloseTag, commonHTMLEntity
non_tag_text = Word(printables+' ', exact=1).leaveWhitespace()
non_tag_text.ignore(anyOpenTag | anyCloseTag | commonHTMLEntity)
# use scanString to get all characters outside of tags, and build list
# of (char,loc) tuples
char_locs = [(t[0], loc) for t,loc,endloc in non_tag_text.scanString(test)]
# imagine a world without HTML tags...
untagged = ''.join(ch for ch, loc in char_locs)
# look for our string in the untagged text, then index into the char,loc list
# to find the original location
search_str = 'kind of a'
orig_loc = char_locs[untagged.find(search_str)][1]
# print the test string, and mark where we found the matching text
print(test)
print(' '*orig_loc + '^')
"""
Should look like this:
<p>This is kin<b>d</b> o<i>f</i> a <b>stupid</b> question.</p>
           ^
"""
The --xmlfilter option is deprecated because of issues like this. The proper solution is to remove the tags yourself but keep the positions so you have a mapping to correct the results that come back from LT. When using LT from Java, this is supported by AnnotatedText, but the algorithm should be simple enough to port it. (full disclosure: I'm the maintainer of LT)
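As a rough illustration of "remove the tags yourself but keep the positions", here is a small standard-library sketch. It is not a port of AnnotatedText: the names strip_tags_with_map and locate are hypothetical, the tag pattern is naive (it assumes no ">" inside attribute values), and entities are not decoded.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")  # naive tag matcher, see assumptions above


def strip_tags_with_map(html):
    """Return (plain_text, offsets); offsets[i] is the index in html of plain_text[i]."""
    chars, offsets = [], []
    pos = 0
    for tag in TAG_RE.finditer(html):
        for i in range(pos, tag.start()):
            chars.append(html[i])
            offsets.append(i)
        pos = tag.end()
    for i in range(pos, len(html)):
        chars.append(html[i])
        offsets.append(i)
    return "".join(chars), offsets


def locate(offsets, plain_offset, length):
    """Map a (offset, length) span in the plain text back to html indices."""
    start = offsets[plain_offset]
    end = offsets[plain_offset + length - 1] + 1
    return start, end


html = "<p>This is kin<b>d</b> o<i>f</i> a <b>stupid</b> question.</p>"
plain, offsets = strip_tags_with_map(html)
# In practice the offset and length would come from the LanguageTool report
# for the plain text; here the flagged phrase is looked up directly.
start, end = locate(offsets, plain.find("kind of a"), len("kind of a"))
print(plain)
print(html[start:end])
```

The span it returns includes any tags that sit inside the flagged phrase, which is what you need to highlight or replace it in the original file.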

Can I change text in MS Word using python-docx, without losing characteristics?

I have an English document in MS Word and I want to change its text into Chinese using Python. I've been using Python 3.4 and installed python-docx. Here's my code:
from docx import Document
document = Document(*some MS Word file*)
# I only change the texts of the first two paragraphs
document.paragraphs[0].text = '带有消毒模式的地板清洁机'
document.paragraphs[1].text = '背景'
document.save(*save_file_path*)
The first two lines did turn into Chinese characters, but characteristics like font and bold are all gone.
Is there any way I could alter the text without losing the original characteristics?
It depends on how the characteristics are applied. There is a thing called the style hierarchy, and text characteristics can be applied anywhere from directly to a run of text, a style, or a document default, and levels in-between.
There are two main classes of characteristic: paragraph properties and run properties. Paragraph properties are things like justification, space before and after, etc. Everything having to do with character-level formatting, like size, typeface, color, subscript, italic, bold, etc. is a run property, also loosely known as a font.
So if you want to preserve the font of a run of text, you need to operate at the run level. An operation like this will preserve font formatting:
run.text = "New text"
An operation like this will preserve paragraph formatting, but remove any character level formatting not applied by the paragraph style:
paragraph.text = "New paragraph text"
You'll need to decide for your application whether you modify individual runs (which may be tricky to identify) or whether you work perhaps with distinct paragraphs and apply different styles to each. I recommend the latter. So in your example, "FLOOR CLEANING MACHINE ...", "BACKGROUND", and "[0001]..." would each become distinct paragraphs. In your screenshot they appear as separate runs in a single paragraph, separated by a line break.
You can get the style of the existing paragraphs and apply it to your new paragraphs - beware that the existing paragraphs might specify a font that does not support Chinese.
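If you do go the run-level route, one possible shape is sketched below. The helper set_text_keep_first_run_format is hypothetical, not a python-docx function; it only relies on paragraph.runs and run.text, so stand-in objects are used here to keep the example runnable without a document. It assumes the first run carries the formatting you want for the whole new text.

```python
from types import SimpleNamespace


def set_text_keep_first_run_format(paragraph, new_text):
    """Write new_text into the first run and empty the remaining runs."""
    runs = paragraph.runs
    if not runs:
        # Nothing to reuse; fall back to paragraph-level replacement.
        paragraph.text = new_text
        return
    runs[0].text = new_text  # run-level assignment keeps that run's font
    for run in runs[1:]:
        run.text = ""


# Stand-in for a paragraph whose first run is bold.
paragraph = SimpleNamespace(
    runs=[
        SimpleNamespace(text="BACK", bold=True),
        SimpleNamespace(text="GROUND", bold=False),
    ]
)
set_text_keep_first_run_format(paragraph, "背景")
print([(run.text, run.bold) for run in paragraph.runs])
```

Any formatting that differed between the original runs is lost with this approach, since only the first run keeps visible text.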
