Stylesheet does not work with special characters pyqt - python

Could not parse stylesheet of object 0x2d956381ba0
def change_btn_image(self, btn):
imagem = self.atv_imagens_bd[self.atividades.get_contador()]
nome_imagem = imagem[:-4]
print(nome_imagem)
btn.setStyleSheet(f"border-image: url(src/main/resources/base/{imagem});")
return nome_imagem

As explained in a slightly related answer, HTML (and, therefore, stylesheet) parsing capabilities of Qt are a bit limited if compared to web browser parsers.
While those syntax are somehow pretty "liberal", one shouldn't push the boundaries of their freedom too much: it's always good practice to use quotes for string based values (besides keywords, obviously), most importantly for URLs.
In this case, the problem is the utf character, which creates a problem as the parser is unable to correctly identify where the url ends.
Just add quotes around the url path:
btn.setStyleSheet(f"border-image: url('src/main/resources/base/{imagem}');")

Related

PyQt5 incorrect label formatting with links

I have two issues with how PyQt is formatting my QLabels
Issue 1:
When hyperlinks are added it displays as if there were no newlines in the string.
For the input text:
https://www.google.co.uk/
https://www.google.co.uk/
https://www.google.co.uk/
It's shown like this without newlines
Issue 2: Sometimes PyQt just doesn't even detect the 'a' tag this happens when the start of string is not a hyperlink but it is then followed by newlines with hyperlinks e.g. this input:
test
https://www.google.co.uk/
https://www.google.co.uk/
https://www.google.co.uk/
As you can see the newlines are properly shown but PyQt has no longer detected the hyperlinks
From the text property documentation of QLabel:
The text will be interpreted either as plain text or as rich text, depending on the text format setting; see setTextFormat(). The default setting is Qt::AutoText; i.e. QLabel will try to auto-detect the format of the text set.
The AutoText flag can only make a guess using simple tag syntax checks (basic tags without arguments, such as <b>, or document type declaration headers, like <html>).
This is obviously done for performance reasons.
If you are sure that you're always setting rich text content, use the appropriate Qt.TextFormat enum:
label.setTextFormat(QtCore.Qt.RichText)
Using the HTML-like syntax of rich text will obviously use the same basic concept HTML had since its birth, almost 30 years ago: line breaks between any word in the document (text or tag) are ignored, as much as multiple spaces are always considered as one.
So, if you want to add line breaks, you have to use the appropriate <br> (or <br/> for xhtml) tag.
Also remember that Qt rich text engine has a limited support, as described in the documentation about the Supported HTML Subset.

Urls.py unable to pass #(pound) character to a view in Django,

I used to pass data through django URL while passing #character is not able to pass through urls.py, I am using pattern as
url(r'^pass/(?P<sentence>[\w|\W]*)/$',pass)
I tried with these pattern also
url(r'^pass/(?P<sentence>[a-zA-Z0-9-/:-?##{-~!^_\'\[\]*]*)/$',pass)
Thanks in advance.
The "#" character marks inline anchors (links within the same page) in a URL, so the browser will never send it to Django.
For example, if the URL is /something/pass/#test/something-else/ the browser will sent only /something/pass/ to the server. You can try /something/pass/%23test/something-else/ instead, 23 is the hexadecimal ascii code for # - not pretty (ugly by ugly just pass it as a get variable instead).
There is nothing you can do on the Django side - you better avoid characters with special meanings in the URL path when designing your routes - of course it is a matter of taste, but I really think that strings passed in the URL path should be "slugfied" in order to remove any funny character.
Browsers won't send the url fragment part (ends with "#") to servers. Why not converting your data to base64 first, then pass the data via url.
RFC 1808 (Relative Uniform Resource Locators) : Note that the fragment identifier (and the "#" that precedes it) is
not considered part of the URL. However, since it is commonly used
within the same string context as a URL, a parser must be able to
recognize the fragment when it is present and set it aside as part of
the parsing process.

Python XML: write " instead of &quot

I am using Python's xml minidom and all works well except that in text sequences it writes out &quot escape characters instead of ". This of course makes sense if a quote appears in a tag, but it bugs me in the text. How do I change this?
looking at the source (Python 3.2 if it matters), this is hardcoded in the _write_data() function. you would need to modify the writexml() method of TextNode - either by subclassing it or simply editing it - so that it didn't call that method, but instead did something similar to escape only < and >.
if you created a subclass outside of the package (instead of copying and hacking the package to make your own custom xmlminidom) then it looks like, with a little care, you could make things work. so you would create your own (subclass of) TextNode, modified as above and then, to add text to the DOM, you would add an instance of your new class (or replace existing text nodes with instances of that class). you would need to set the ownerDocument attribute. perhaps simplest would be to also subclass Document and fix the createTextNode() method.
but i don't see a simpler way of doing what you want. it might be best to use a better dom implementation.
ps i have no idea whether this behaviour is required by the xml spec, or not. update: a quick scan of http://www.w3.org/TR/2008/REC-xml-20081126/#syntax suggests that only < and & must be encoded.

regex regarding symbols in urls

I want to replace consecutive symbols just one such as;
this is a dog???
to
this is a dog?
I'm using
str = re.sub("([^\s\w])(\s*\1)+", "\\1",str)
however I notice that this might replace symbols in urls that might happen in my text.
like http://example.com/this--is-a-page.html
Can someone give me some advice how to alter my regex?
So you want to unleash the power of regular expressions on an irregular language like HTML. First of all, search SO for "parse HTML with regex" to find out why that might not be such a good idea.
Then consider the following: You want to replace duplicate symbols in (probably user-entered) text. You don't want to replace them inside a URL. How can you tell what a URL is? They don't always start with http – let's say ars.userfriendly.org might be a URL that is followed by a longer path that contains duplicate symbols.
Furthermore, you'll find lots of duplicate symbols that you definitely don't want to replace (think of nested parentheses (like this)), some of them maybe inside a <script> on the page you're working on (||, && etc. come to mind.
So you might come up with something like
(?<!\b(?:ftp|http|mailto)\S+)([^\\|&/=()"'\w\s])(?:\s*\1)+
which happens to work on the source code of this very page but will surely fail in other cases (for example if URLs don't start with ftp, http or mailto). Plus, it won't work in Python since it uses variable repetition inside lookbehind.
All in all, you probably won't get around parsing your HTML with a real parser, locating the body text, applying a regex to it and writing it back.
EDIT:
OK, you're already working on the parsed text, but it still might contain URLs.
Then try the following:
result = re.sub(
r"""(?ix) # case-insensitive, verbose regex
# Either match a URL
# (protocol optional (if so, URL needs to start with www or ftp))
(?P<URL>\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&##/%=~_|$?!:,.]*[A-Z0-9+&##/%=~_|$])
# or
|
# match repeated non-word characters
(?P<rpt>[^\s\w])(?:\s{0,100}(?P=rpt))+""",
# and replace with both captured groups (one will always be empty)
r"\g<URL>\g<rpt>", subject)
Re-EDIT: Hm, Python chokes on the (?:\s*(?P=rpt))+ part, saying the + has nothing to repeat. Looks like a bug in Python (reproducible with (.)(\s*\1)+ whereas (.)(\s?\1)+ works)...
Re-Re-EDIT: If I replace the * with {0,100}, then the regex compiles. But now Python complains about an unmatched group. Obviously you can't reference a group in a replacement if it hasn't participated in the match. I give up... :(

need to selectively escape html entities (&)

I'm scraping a html page, then using xml.dom.minidom.parseString() to create a dom object.
however, the html page has a '&'. I can use cgi.escape to convert this into & but it also converts all my html <> tags into <> which makes parseString() unhappy.
how do i go about this? i would rather not just hack it and straight replace the "&"s
thanks
For scraping, try to use a library that can handle such html "tag soup", like lxml, which has a html parser (as well as a dedicated html package in lxml.html), or BeautifulSoup (you will also find that these libraries also contain other stuff that makes scraping/working with html easier, aside from being able to handle ill-formed documents: getting information out of forms, making hyperlinks absolute, using css selectors...)
i would rather not just hack it and
straight replace the "&"s
Er, why? That's what cgi.escape is doing - effectively just a search and replace operation for certain characters that have to be escaped.
If you only want to replace a single character, just replace the single character:
yourstring.replace('&', '&')
Don't beat around the bush.
If you want to make sure that you don't accidentally re-escape an already escaped & (i. e. not transform & into &amp; or ß into &szlig;), you could
import re
newstring = re.sub(r"&(?![A-Za-z])", "&", oldstring)
This will leave &s alone when they are followed by a letter.
You shouldn't use an XML parser to parse data that isn't XML. Find an HTML parser instead, you'll be happier in the long run. The standard library has a few (HTMLParser and htmllib), and BeautifulSoup is a well-loved third-party package.

Categories