I wrote code that writes a string to a text file. The string contains both a right-to-left language (such as Hebrew) and a left-to-right language (such as English).
I used Unicode control characters to set the direction, surrounding the Hebrew part like this:
u'\u2067' + hebrew + u'\u2069'
but it is not working.
After running it, the printed output looks correct, as you can see in the picture,
but when I open the text file, the positions of the fields are changed.
I want the text file to look the same as the printed output.
How can I make the text file match the printed output?
.txt is just plain text; the file stores no formatting at all. Unlike an MS Word document, which saves style formatting and restores it when you load the document, a .txt file cannot carry any styling.
You also cannot style it the way you style HTML with CSS.
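Because U+2067 (RIGHT-TO-LEFT ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE) are ordinary Unicode characters rather than formatting, they are saved in the file like any other text. A minimal sketch, assuming UTF-8 output and a made-up one-word Hebrew sample (the helper name and file name are hypothetical), writes such a line and reads it back to confirm nothing is lost. How the line is then laid out is decided entirely by the program used to view the file; an editor that does not implement the Unicode bidirectional isolates may show the parts in a different order than the console did.

```python
import os
import tempfile

# U+2067 RIGHT-TO-LEFT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE
RLI = "\u2067"
PDI = "\u2069"


def mixed_line(english: str, hebrew: str) -> str:
    """Return English text followed by a direction-isolated Hebrew part."""
    return f"{english} {RLI}{hebrew}{PDI}"


line = mixed_line("Name:", "\u05e9\u05dc\u05d5\u05dd")  # Hebrew "shalom"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")  # hypothetical file name
    # An explicit encoding avoids depending on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(line + "\n")
    with open(path, encoding="utf-8") as f:
        saved = f.read().rstrip("\n")

# The file holds exactly the same code points, control characters included.
print(saved == line)  # True
print([f"U+{ord(c):04X}" for c in saved if c in (RLI, PDI)])  # ['U+2067', 'U+2069']
```

Writing without encoding="utf-8" can raise UnicodeEncodeError on systems whose default encoding cannot represent Hebrew, for example Windows code page 1252.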
I have a line of formatted text that uses a certain style; let's say it is "Test sentence". This formatted text is paragraph 32. I now want to copy this exact paragraph, changing nothing, and insert it somewhere else.
I can get the content of the paragraph with:
document.paragraphs[32].text  # 'Test sentence'
but I can't figure out how to get its exact formatting. How do I copy the paragraph so I can paste it at another point in the document?
I have two issues with how PyQt formats my QLabels.
Issue 1:
When hyperlinks are added, the label displays the text as if the string contained no newlines.
For the input text:
https://www.google.co.uk/
https://www.google.co.uk/
https://www.google.co.uk/
It is displayed on a single line, without the newlines.
Issue 2: Sometimes PyQt doesn't detect the 'a' tags at all. This happens when the string does not start with a hyperlink but continues with newline-separated hyperlinks, e.g. this input:
test
https://www.google.co.uk/
https://www.google.co.uk/
https://www.google.co.uk/
Here the newlines are shown correctly, but PyQt no longer detects the hyperlinks.
From the text property documentation of QLabel:
The text will be interpreted either as plain text or as rich text, depending on the text format setting; see setTextFormat(). The default setting is Qt::AutoText; i.e. QLabel will try to auto-detect the format of the text set.
The AutoText flag can only make a guess using simple tag syntax checks (basic tags without arguments, such as <b>, or document type declaration headers, like <html>).
This is obviously done for performance reasons.
If you are sure that you're always setting rich text content, use the appropriate Qt.TextFormat enum:
label.setTextFormat(QtCore.Qt.RichText)
Rich text uses HTML-like syntax, so it follows the same basic rule HTML has had since its birth almost 30 years ago: a line break between words or tags is treated as ordinary whitespace, and runs of whitespace collapse into a single space.
So, if you want line breaks, you have to use the <br> tag (or <br/> in XHTML).
Also remember that Qt's rich text engine supports only a limited subset of HTML, as described in the Supported HTML Subset documentation.
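Putting the two points together, a minimal sketch in plain Python (no Qt calls; the helper name is hypothetical, and it assumes each hyperlink sits on its own line as a bare URL) builds the label text with explicit <br> tags and <a> elements:

```python
import html


def links_to_rich_text(text: str) -> str:
    """Convert newline-separated text into rich text for a QLabel.

    Lines that start with http:// or https:// become <a> links. Each line
    break becomes an explicit <br>, because rich text treats a raw newline
    as ordinary whitespace.
    """
    parts = []
    for line in text.splitlines():
        escaped = html.escape(line)  # escapes &, <, > and quotes
        if line.startswith(("http://", "https://")):
            parts.append(f'<a href="{escaped}">{escaped}</a>')
        else:
            parts.append(escaped)
    return "<br>".join(parts)


sample = "test\nhttps://www.google.co.uk/\nhttps://www.google.co.uk/"
print(links_to_rich_text(sample))
```

With PyQt5 the result would be used as label.setTextFormat(QtCore.Qt.RichText) followed by label.setText(links_to_rich_text(sample)); label.setOpenExternalLinks(True) additionally lets the links open in the default browser. Setting the format explicitly means the second input, which starts with plain text, is not left to AutoText's guess.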
I'm pretty new to programming and I've been having a problem: I have a .txt file containing a single long string (30,000+ characters) made of 4 letters (a DNA sequence). I need to search it for certain repeats (for example 'TTAGGG'), highlight them, and save the result as a simple readable file. Obviously I can't save it as a .txt file, because plain text has no highlighting.
I tried HTML as well as DOCX, but every new search I run removes the previous highlights.
Does anyone have any suggestions?
Open it in MS Word and do the following Find and Replace:
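As an alternative to Word, a minimal Python sketch can produce an HTML file with all repeats highlighted at once. The key idea is to combine every repeat into one regular expression and substitute in a single pass, so marking one repeat never undoes another. The helper name is hypothetical, and it assumes the sequence contains only the letters A, C, G and T, so no HTML escaping is needed.

```python
import re


def highlight_repeats(sequence: str, repeats: list[str]) -> str:
    """Return an HTML page with every occurrence of each repeat in <mark>.

    All repeats are combined into one regular expression and substituted
    in a single pass. Assumes the sequence contains only letters.
    """
    # Try longer repeats first where two would match at the same position.
    alternatives = sorted((re.escape(r) for r in repeats), key=len, reverse=True)
    pattern = re.compile("|".join(alternatives))
    body = pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", sequence)
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        '<p style="font-family: monospace; word-break: break-all">'
        f"{body}</p>\n</body></html>\n"
    )


page = highlight_repeats("ACTTAGGGCATTAGGG", ["TTAGGG"])
print(page)
```

To produce the file, read the sequence, strip surrounding whitespace, and write the result with open("highlighted.html", "w", encoding="utf-8") (file names hypothetical). re.sub finds non-overlapping matches, so occurrences that overlap each other are only partly highlighted. The HTML file opens in any browser and can also be opened in Word.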
I'm trying to work with Urdu text but am unable to get the right output.
name = b'\xd9\x87\xd9\x84\xd9\x84\xd8\xa7 \xd8\xa7\xd9\x85\xd8\xa7\xd9\x86'.decode('utf-8')
print(name)
OUTPUT
هللا امان
DESIRED OUTPUT
امان اللہ
Please advise.
I see two main issues with your snippet.
The first is that Arabic has special code points for some entire words, and the word you are trying to print, اللہ, corresponds to ARABIC LIGATURE ALLAH ISOLATED FORM: U+FDF2, encoded in UTF-8 as 0xEF 0xB7 0xB2.
If you write it as individual characters, you will not get the correct representation.
Second, the font in your terminal (or whatever application renders the text) must support the glyph, and you should make sure the text direction is set to right-to-left.
Here is an example from the online Python shell:
>>> print(u"\uFDF2")
ﷲ
Since this shell is not configured for right-to-left text, it prints the output left to right.
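To check what a string actually contains, rather than how a terminal happens to render it, a minimal sketch using Python 3's standard unicodedata module decodes the UTF-8 bytes shown in the question and prints the name of every code point, then confirms the name and UTF-8 bytes of U+FDF2.

```python
import unicodedata

# The UTF-8 bytes from the question, decoded to text.
raw = b"\xd9\x87\xd9\x84\xd9\x84\xd8\xa7 \xd8\xa7\xd9\x85\xd8\xa7\xd9\x86"
name = raw.decode("utf-8")

# List each code point in the order it is stored (logical order).
for ch in name:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The single-code-point ligature mentioned in the answer.
ligature = "\uFDF2"
print(unicodedata.name(ligature))  # ARABIC LIGATURE ALLAH ISOLATED FORM
print(ligature.encode("utf-8"))  # b'\xef\xb7\xb2'
```

The loop prints the characters in the order they are stored; a right-to-left renderer lays that order out from right to left, so comparing the list with the word you want shows whether the stored order is the one you intended.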
I am attempting to read text from a PDF file, and then later on, write that same text back to another PDF using Python. After the text is read in, the representation of the string when I print it to the console is:
Officially, it’s called
However, when I print the repr() of this text string, I see:
O\xef\xac\x83cially, it\xe2\x80\x99s called
This makes plenty of sense to me: these are ligature characters from the PDF; for example, \xef\xac\x83 is the UTF-8 encoding of the 'ffi' ligature (U+FB03). The problem is that when I write this string to a PDF using the reportlab library, the PDF shows black symbols in their place.
This only happens with certain ligatures. I am wondering what I can do so that the string I write to the PDF does not contain these ligatures or if there is an efficient way to replace all of them.
It appears your input is correct, but to see the ffi character in your output you need a font that contains that glyph.
The font you are using here is bog standard Arial, which does not contain it.
Some suggestions (mainly depending on your platform, but some of these are Open Source):
Arial Unicode MS
Lucida Grande
Calibri
Cambria
Corbel
Droid Sans/Droid Serif
Helvetica Neue
Ubuntu
If you don't want to, or can't, change the font, replace the sequence \xef\xac\x83 with the plain characters ffi in your program before writing the text to the PDF. (Do the same for the other ligatures you mentioned.)
What I ended up doing was copying the ligature characters out of my text file and calling .replace on them, e.g. text.replace('\ufb00', 'ff'), where '\ufb00' is the single ff-ligature character (pasted literally, it looks just like two f's) and 'ff' is two ordinary f's. If you paste the ligature character literally into Python 2 source, also don't forget the # -*- coding: utf-8 -*- declaration.
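The replacement can be done for all the common f-ligatures at once with str.translate. This is a minimal sketch, assuming the text has been extracted as UTF-8 bytes like those in the question; the mapping covers only U+FB00 to U+FB04, so other ligatures would need their own entries, and the helper names are hypothetical. Unicode NFKC normalization is shown as a swapped-in alternative that expands these ligatures without a hand-written table, at the cost of also rewriting other compatibility characters (superscript digits, full-width letters and so on).

```python
import unicodedata

# Latin f-ligatures from the Alphabetic Presentation Forms block.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})


def expand_ligatures(text: str) -> str:
    """Replace the common Latin f-ligatures with plain letters."""
    return text.translate(LIGATURES)


def expand_with_nfkc(text: str) -> str:
    """Alternative: NFKC also expands these ligatures, but it changes
    other compatibility characters too, so check that is acceptable."""
    return unicodedata.normalize("NFKC", text)


# Bytes as shown in the question's repr(), decoded to text first.
raw = b"O\xef\xac\x83cially, it\xe2\x80\x99s called"
text = raw.decode("utf-8")
print(expand_ligatures(text))  # Officially, it’s called
```

The curly apostrophe (U+2019) is left untouched by both functions; it has no compatibility decomposition.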