PyQt5 How to save textEdit text as rich text - python

Hi, I'm coding a rich text editor and I want to save the text of my textEdit field as a rich text file. Saving itself works. However, when I write rich text with different font colors, sizes, and bold, and then save it as an .rtf file, all the formatting is gone. (I write the result of toPlainText, so I assume I need a different method.)
How can I save my text with its formatting (fonts, sizes, colors)?
def savefl(self):
    try:
        filey = QtWidgets.QFileDialog.getSaveFileName(self,"Save","","Rich Text File (*.rtf);;Text File(*.txt);;All Files (*.*)")
        with open(filey[0], "w", encoding="utf-8") as file2:
            file2.write(self.textEdit.toPlainText())
    except (FileNotFoundError,FileExistsError):
        pass

Rich text and the Rich Text Format, RTF, are not necessarily the same thing. Microsoft Word documents (.doc), Markdown (.md), and LibreOffice documents (.odt) are all rich text file formats.
So is HTML, which is how Qt lets you get the rich text, using the toHtml method. There's no way to get an RTF out of Qt; you'll have to convert the HTML to RTF.
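A minimal sketch of saving through toHtml instead of toPlainText. The helper name save_rich_text is made up for illustration, and text_edit is assumed to be a QTextEdit (anything with toHtml and toPlainText methods will do):

```python
def save_rich_text(text_edit, path):
    """Write the widget's content to path: plain text for .txt, HTML otherwise.

    save_rich_text is a hypothetical helper; text_edit is assumed to be a
    QTextEdit or any object exposing toHtml() and toPlainText().
    """
    if path.lower().endswith(".txt"):
        content = text_edit.toPlainText()  # formatting is dropped here
    else:
        content = text_edit.toHtml()       # keeps fonts, sizes, colors as HTML
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(content)
    return content
```

In the question's savefl method this would be called as save_rich_text(self.textEdit, filey[0]) once a file name has been chosen. Note that a file written this way contains HTML even if its name ends in .rtf.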
If HTML can suffice for you, use it. As has been written before, RTF is an ancient format and its age is showing more and more. If you absolutely need RTF, you'll need to do a conversion. I'd recommend pandoc if you can call an external program; if not, you'll have to use a library like PyRTF and manually parse the HTML and create a document with PyRTF.
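If calling an external program is acceptable, the pandoc route could look roughly like this. It assumes pandoc is installed and on the PATH; the function names and file names are placeholders, and how much of the styling (colors, font sizes) survives the conversion should be checked on real documents:

```python
import subprocess


def pandoc_rtf_command(html_path, rtf_path):
    """Build the pandoc command line for an HTML-to-RTF conversion."""
    # -s (standalone) asks pandoc for a complete RTF document, not a fragment
    return ["pandoc", "-f", "html", "-t", "rtf", "-s", "-o", rtf_path, html_path]


def html_file_to_rtf(html_path, rtf_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_rtf_command(html_path, rtf_path), check=True)
```

A typical flow would be to write textEdit.toHtml() to a temporary .html file first and then call html_file_to_rtf on it.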

Related

Python 2.7 reads encoded text file as code rather than text. (Fixed with io module)

I have a text file (*.txt) which displays as plain text when opened in Notepad. When I attempt to read the file into Python:
with open(Working_File,'r') as WorkTXT:
    WorkTXT_Lines = WorkTXT.readlines()
# no explicit close() needed: the with block closes the file
My script then fails because the text is being converted into something else. I can manually test what's in the list using the console:
In[51]: WorkTXT_Lines[4]
Out[51]: "\x00T\x00h\x00e\x00 \x00A\x00c\x00q\x00.\x00 \x00M\x00e\x00t\x00h\x00o\x00d\x00'\x00s\x00 \x00I\x00n\x00s\x00t\x00r\x00u\x00m\x00e\x00n\x00t\x00 \x00P\x00a\x00r\x00a\x00m\x00e\x00t\x00e\x00r\x00s\x00 \x00f\x00o\x00r\x00 \x00t\x00h\x00e\x00 \x00R\x00u\x00n\x00 \x00w\x00e\x00r\x00e\x00 \x00:\x00 \x00\r\x00\n"
If I open the original text file, copy-paste the text into a new text file, and run the script on that, it picks up the actual text and works correctly. That does not help, though, as I am parsing hundreds of text files generated by a lab instrument.
Any help is appreciated, even something like an OS command to alter the text file.
Edit - I was able to solve the issue after being pointed in the right direction. The io module is able to decode the text file and "read as text (rt)":
import io
with io.open(Working_File,'rt') as WorkTXT:
    WorkTXT_Lines = WorkTXT.readlines()
# no explicit close() needed: the with block closes the file
The file content is encoded; judging by your output, it is UTF-16.
If you decode the file while reading it, everything comes out as plain text:
import io
with io.open(Working_File, 'r', encoding='utf-16-le') as WorkTXT:
    # io.open decodes the bytes as they are read; readlines() splits the text into lines
    # from here on you are working with plain text :)
    WorkTXT_Lines = WorkTXT.readlines()
    for line in WorkTXT_Lines:
        print(line)
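A small self-contained check of the same idea, as a sketch: it writes a UTF-16 sample file and reads it back with io.open. The helper name and the sample file are made up for illustration. It uses the generic "utf-16" codec, which reads a leading byte order mark to pick the byte order; if the instrument files carry no byte order mark, naming 'utf-16-le' explicitly, as above, is the safer choice.

```python
import io
import os
import tempfile


def read_utf16_lines(path):
    """Return the lines of a UTF-16 text file without their line endings."""
    # "utf-16" consumes a leading byte order mark and uses it to pick the byte order
    with io.open(path, "r", encoding="utf-16") as handle:
        return handle.read().splitlines()


# Build a sample file similar to the instrument output (hypothetical content).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(sample_path, "w", encoding="utf-16", newline="") as handle:
    handle.write(u"The Acq. Method's Instrument Parameters\r\nfor the Run were :\r\n")

lines = read_utf16_lines(sample_path)
for line in lines:
    print(line)
```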

Html rich text to Microsoft Word

Right now I have text in this format:
"This is a bold word, this is in italic, this is regular"
Which translates to:
<p>This is a bold <strong>word</strong>, <em>this is in italic</em>, this is regular.</p>
Is there any Python library that turns the above markup into Microsoft Word format? I couldn't find one. I found only pandoc and its wrapper pypandoc, which read HTML but can only produce Word output by saving a .docx file, and that isn't helpful here.
I figured I would ask here before putting work into writing a parser to do this.

Python outputs a .txt file whose formatting differs depending on the text editor I use to open it

So I have some Python code that outputs some data to a .txt file like this:
f3 = codecs.open(r'C:\Users\dimrizo\Desktop\PythonData\GTFS\routes.txt','w+',"UTF-8")
f3.write('route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color\n')
f3.write('blah,blah,blah,blah,blah,blah,blah,blah,blah\n')
The problem is that if I open the produced file with the simple Windows text editor (Notepad), the text is not properly formatted: the "\n" line breaks are ignored. If I open the file with Sublime Text, everything is formatted as it should be. What should I do to see the text properly formatted in both editors?
That is a problem with Notepad itself. It can't handle "Linux newlines" (\n); it only recognizes "Windows newlines" (\r\n). So you have to write \r\n, and then you will see the line breaks in Notepad.
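One way to get \r\n without changing every string is to let the file object translate newlines on write. This is a sketch using io.open with newline="\r\n"; the helper name and the rows are placeholders. (codecs.open, as used in the question, opens the file in binary mode and performs no newline translation, which is why the bare "\n" ended up in the file.)

```python
import io
import os
import tempfile


def write_rows_crlf(path, rows):
    """Write comma-separated rows, ending every line with \\r\\n."""
    # newline="\r\n" makes the text layer turn each "\n" into "\r\n" on write
    with io.open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        for row in rows:
            handle.write(u",".join(row) + u"\n")


# Hypothetical usage with a temporary file instead of the path from the question.
routes_path = os.path.join(tempfile.mkdtemp(), "routes.txt")
write_rows_crlf(routes_path, [[u"route_id", u"agency_id"], [u"blah", u"blah"]])
```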

Get text from Gtk3 TextView/TextBuffer with formatting tags included in Python

I'm working on a Python 3 project that uses the Gtk3 TextView/TextBuffer to get a user's input, and I've got it working to where I can have the user typing in rich text and able to format it as Bold/Italic/Underline/Combination of these.
However, I'm stuck on trying to figure out how to get the text from the TextBuffer with those flags included so I can use the formatting flags to convert the text to properly formatted HTML when I need to.
Calling textbuffer.get_text(start, end, True) simply returns the text without any flags.
Here's the code and the editor.glade file. Save them both in the same directory.
How can I get the text with the flags included? Or, alternatively, is there a way to get the user's input formatted as HTML automatically in another variable?
That's not very easy. Here is a link to some code that I once wrote to do the same thing for RTF output. You can probably adapt it to produce HTML output. If you manage to do so, I'd possibly integrate it into that library's successor.
Alternatively, if you prefer text processing to the above, you can export the rich text in GtkTextBuffer's internal serialization format and convert it to HTML yourself later:
format = textbuffer.register_serialize_tagset('my-tagset')
exported = textbuffer.serialize(textbuffer, format, start, end)

Docx content and formatting extraction in python

I am trying to parse a docx file and extract specific elements based on whether or not a certain word is bolded. If this is the text in the document:
Foo: Hello
Boo:
Blah Blah
•Blah
•Blah
Choo: Hello
I would want to scan, line by line, and take all the text after the bolded word until the next bolded word.
Right now I am using an XML parser that splits on newline characters. I cannot find anything in the zip file or in the individual lines that would give me metadata like that.
Is it possible to do this?
I'd use a higher-level library that supports reading docx files rather than parsing the XML document.
One library that looks up to the task is python-docx.
If you're using Jython, Apache POI is another option (its XWPF component reads .docx; HWPF handles the older .doc format).
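A rough sketch of how this could look with python-docx. The function names are made up for illustration, and the code assumes bold is set directly on the runs: python-docx reports run.bold as None when boldness is inherited from a style, and such runs are treated as not bold here.

```python
def group_by_bold(runs):
    """Group text under the bold label that precedes it.

    runs is an iterable of (text, is_bold) pairs in document order.
    Returns a list of (label, body) tuples.
    """
    sections = []       # [label, body] pairs, in document order
    prev_bold = False
    for text, is_bold in runs:
        if is_bold and text.strip():
            if prev_bold:
                sections[-1][0] += text      # label split across several runs
            else:
                sections.append([text, ""])  # a new bold label starts a section
            prev_bold = True
        else:
            if sections:
                sections[-1][1] += text      # text before the first label is skipped
            prev_bold = False
    return [(label.strip(), body.strip()) for label, body in sections]


def runs_from_docx(path):
    """Yield (text, is_bold) pairs from a .docx file via python-docx."""
    from docx import Document  # third-party: pip install python-docx
    for paragraph in Document(path).paragraphs:
        for run in paragraph.runs:
            yield run.text, bool(run.bold)
        yield "\n", False  # keep paragraph breaks in the collected text
```

Usage would be group_by_bold(runs_from_docx("report.docx")), where report.docx is a placeholder file name. For the sample document above this should give pairs such as ("Foo:", "Hello").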
