Python query for code examples - python

I want to create something like a dictionary for Python code examples. My problem is that I have to escape all the code examples, and r'some string' is not useful either. Would you recommend another solution for querying these entries?
import easygui
lex = {"dict": "woerter = {\"house\" : \"Haus\"}\nwoerter[\"house\"]",\
"for": "for x in range(0, 3):\n print \"We are on time %d\" % (x)",\
"while": "while expression:\n statement(s)"}
input_ = easygui.enterbox("Python-lex","")
output = lex[input_]
b = easygui.textbox("","",output)

Use triple quoting:
lex = {"dict": '''\
woerter = {"house" : "Haus"}
woerter["house"]
''',
       "for": '''\
for x in range(0, 3):
    print "We are on time %d" % (x)
''',
       "while": '''\
while expression:
    statement(s)
'''}
Triple-quoted strings (using ''' or """ delimiters) preserve newlines, and any embedded single quotes do not need to be escaped.
The \ escape after the opening ''' triple quote escapes the newline at the start, making the value a little easier to read. The alternative would be to put the first line directly after the opening quotes.
You can make these raw as well; r'''\n''' would contain the literal characters \ and n, but literal newlines still remain literal newlines. Triple-quoting works with double-quote characters too: """This is a triple-quoted string too""". The only thing you'd have to escape is another triple quote in the same style; you only need to escape one quote character in that case:
triple_quote_with_embedded_triple = '''Triple quotes use \''' and """ delimiters'''
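To try the triple-quoted dictionary without easygui, here is a minimal sketch assuming Python 3 (so the stored example uses print()). The lookup function is a hypothetical helper, and dict.get stands in for lex[input_] so that an unknown keyword returns a message instead of raising KeyError.

```python
# Example snippets stored as triple-quoted strings: the quotes inside need no escaping
lex = {
    "dict": '''\
woerter = {"house": "Haus"}
woerter["house"]
''',
    "for": '''\
for x in range(0, 3):
    print("We are on time %d" % x)
''',
}


def lookup(keyword):
    """Return the stored example for keyword, or a short message if it is unknown."""
    # dict.get avoids a KeyError when the user types a keyword that isn't stored
    return lex.get(keyword, "No example for %r" % keyword)
```

For example, lookup("dict") returns the two lines of the dict example, ready to be passed to easygui.textbox.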

You could use json.dumps(data, indent=1) to convert the data and pass the result to easygui.textbox, like this:
import json
import easygui
resp = dict(...)
easygui.textbox(text=json.dumps(resp, indent=1))
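Note that indent only sets the number of spaces per nesting level, and that JSON escapes double quotes inside string values, so stored code examples will show up with \" in the text box. This sketch uses made-up sample data in place of resp and only builds the string that would go to easygui.textbox:

```python
import json

# Hypothetical sample data standing in for resp
resp = {"dict": 'woerter = {"house": "Haus"}', "count": 2}

# indent=1 puts each key on its own line, indented by one space
pretty = json.dumps(resp, indent=1)
print(pretty)
```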

Related

How can I change string encoding?

I have a string where special characters like ' or " or & (...) can appear. In the string:
string = """ Hello "XYZ" this 'is' a test & so on """
how can I automatically escape every special character, so that I get this:
string = " Hello &quot;XYZ&quot; this &#39;is&#39; a test &amp; so on "
In Python 3.2 and later, you can use the html.escape function, e.g.
>>> string = """ Hello "XYZ" this 'is' a test & so on """
>>> import html
>>> html.escape(string)
' Hello &quot;XYZ&quot; this &#x27;is&#x27; a test &amp; so on '
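The quote parameter of html.escape controls whether quotes are escaped as well; a short sketch comparing both settings:

```python
import html

s = """ Hello "XYZ" this 'is' a test & so on """

# quote=True (the default) escapes &, <, > and both kinds of quotes
full = html.escape(s)
# quote=False only escapes &, < and >
partial = html.escape(s, quote=False)

print(full)     #  Hello &quot;XYZ&quot; this &#x27;is&#x27; a test &amp; so on
print(partial)  #  Hello "XYZ" this 'is' a test &amp; so on
```

html.unescape reverses either result back to the original string.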
For earlier versions of Python, check http://wiki.python.org/moin/EscapingHtml:
The cgi module that comes with Python has an escape() function:
import cgi
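s = cgi.escape( """& < >""" ) # s = "&amp; &lt; &gt;"
However, it doesn't escape characters beyond &, <, and >. If it is used as cgi.escape(string_to_escape, quote=True), it also escapes ". (Note that cgi.escape was deprecated in Python 3.2 and removed in Python 3.8; use html.escape there.)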
Here's a small snippet that will let you escape quotes and apostrophes as well:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
    }
def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c,c) for c in text)
You can also use escape() from xml.sax.saxutils to escape html. This function should execute faster. The unescape() function of the same module can be passed the same arguments to decode a string.
from xml.sax.saxutils import escape, unescape
# escape() and unescape() take care of &, < and >.
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;"
}
html_unescape_table = {v:k for k, v in html_escape_table.items()}
def html_escape(text):
    return escape(text, html_escape_table)
def html_unescape(text):
    return unescape(text, html_unescape_table)
The cgi.escape method will convert special characters to valid HTML entities:
import cgi
original_string = 'Hello "XYZ" this \'is\' a test & so on '
escaped_string = cgi.escape(original_string, True)
print(original_string)
print(escaped_string)
will result in
Hello "XYZ" this 'is' a test & so on
Hello &quot;XYZ&quot; this 'is' a test &amp; so on
The optional second parameter of cgi.escape escapes double quotes; by default, they are not escaped. Single quotes are never escaped.
A simple string function will do it:
def escape(t):
    """HTML-escape the text in `t`."""
    return (t
        .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
        .replace("'", "&#39;").replace('"', "&quot;")
        )
Other answers in this thread have minor problems: the cgi.escape method for some reason ignores single quotes, and you need to explicitly ask it to escape double quotes. The linked wiki page handles all five characters, but uses the XML entity &apos;, which isn't an HTML 4 entity.
This function escapes all five characters every time, using HTML-standard entities.
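For Python 3, the same five substitutions can also be sketched in a single pass with str.translate. The _HTML_ESCAPES table name is made up for this example, and the entity choices simply mirror the function above.

```python
# Map each special character's code point to its HTML-standard entity
_HTML_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})


def escape(t):
    """HTML-escape the text in `t` in a single pass."""
    # translate() looks at each input character once, so "&" does not have to
    # be handled first the way it must be in a chain of replace() calls
    return t.translate(_HTML_ESCAPES)
```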
The other answers here will help with characters such as the ones you listed and a few others. However, if you also want to convert everything else to entity names, you'll have to do something else. For instance, if á needs to be converted to &aacute;, neither cgi.escape nor html.escape will help you there. You'll want to do something like this, which uses html.entities.entitydefs (just a dictionary). The following code is made for Python 3.x, but there's a partial attempt at making it compatible with 2.x to give you an idea:
# -*- coding: utf-8 -*-
import sys
if sys.version_info[0]>2:
    from html.entities import entitydefs
else:
    from htmlentitydefs import entitydefs
text=";\"áèïøæỳ" #This is your string variable containing the stuff you want to convert
text=text.replace(";", "$ஸ$") #$ஸ$ is just something random the user isn't likely to have in the document. We're converting it so it doesn't convert the semi-colons in the entity name into entity names.
text=text.replace("$ஸ$", "&semi;") #Converting semi-colons to entity names
if sys.version_info[0]>2: #Using appropriate code for each Python version.
    for k,v in entitydefs.items():
        if k not in {"semi", "amp"}:
            text=text.replace(v, "&"+k+";") #You have to add the & and ; manually.
else:
    for k,v in entitydefs.iteritems():
        if k not in {"semi", "amp"}:
            text=text.replace(v, "&"+k+";") #You have to add the & and ; manually.
#The above code doesn't cover every single entity name, although I believe it covers everything in the Latin-1 character set. So, I'm manually doing some common ones I like hereafter:
text=text.replace("ŷ", "&ycirc;")
text=text.replace("Ŷ", "&Ycirc;")
text=text.replace("ŵ", "&wcirc;")
text=text.replace("Ŵ", "&Wcirc;")
text=text.replace("ỳ", "&ygrave;")
text=text.replace("Ỳ", "&Ygrave;")
text=text.replace("ẃ", "&wacute;")
text=text.replace("Ẃ", "&Wacute;")
text=text.replace("ẁ", "&wgrave;")
text=text.replace("Ẁ", "&Wgrave;")
print(text)
#Python 3.x outputs: &semi;&quot;&aacute;&egrave;&iuml;&oslash;&aelig;&ygrave;
#The Python 2.x version outputs the wrong stuff. So, clearly you'll have to adjust the code somehow for it.
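In Python 3, a single-pass alternative is html.entities.codepoint2name, which maps code points to HTML 4 entity names. This sketch (to_named_entities is a hypothetical name) also escapes & correctly, which the replace loop above skips. Like entitydefs, it only knows HTML 4 names, so characters such as ỳ are left unchanged and would still need the manual replacements.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace every character that has an HTML 4 entity name with &name;."""
    # codepoint2name maps code points (e.g. 225) to names (e.g. "aacute");
    # a single pass means the "&" of an entity we just added is never re-escaped
    return "".join(
        "&%s;" % codepoint2name[ord(ch)] if ord(ch) in codepoint2name else ch
        for ch in text
    )
```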

I want to replace single quotes with double quotes in a list

So I am making a program that takes a text file, breaks it into words, then writes the list to a new text file.
The issue I am having is I need the strings in the list to be with double quotes not single quotes.
For example
I get this ['dog','cat','fish'] when I want this ["dog","cat","fish"]
Here is my code
with open('input.txt') as f:
    file = f.readlines()
nonewline = []
for x in file:
    nonewline.append(x[:-1])
words = []
for x in nonewline:
    words = words + x.split()
textfile = open('output.txt','w')
textfile.write(str(words))
I am new to Python and haven't found anything about this.
Anyone know how to solve this?
[Edit: I forgot to mention that I was using the output in an Arduino project that required the list to have double quotes.]
You cannot change how str works for a list.
How about using JSON format, which uses double quotes for strings?
>>> animals = ['dog','cat','fish']
>>> print(str(animals))
['dog', 'cat', 'fish']
>>> import json
>>> print(json.dumps(animals))
["dog", "cat", "fish"]
import json
...
textfile.write(json.dumps(words))
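The whole program can be reduced to a small function; this is a sketch, and words_to_json is a hypothetical name:

```python
import json


def words_to_json(text):
    """Split text into words and return them as a JSON array of strings."""
    # str.split() with no argument splits on any run of whitespace, including
    # newlines, so the trailing newline of each line never needs stripping
    return json.dumps(text.split())
```

With the question's files, this would be used as textfile.write(words_to_json(f.read())). Unlike str(words).replace("'", '"'), json.dumps also stays valid when a word contains an apostrophe: don't is written as "don't", whereas the replace approach produces "don"t".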
Alternatively, you can simply replace the single quotes with double quotes in your output:
str(words).replace("'", '"')
You could also extend Python's str type and wrap your strings with the new type changing the __repr__() method to use double quotes instead of single. It's better to be simpler and more explicit with the code above, though.
class str2(str):
    def __repr__(self):
        # Allow str.__repr__() to do the hard work, then
        # remove the outer two characters, single quotes,
        # and replace them with double quotes.
        return ''.join(('"', super().__repr__()[1:-1], '"'))
>>> "apple"
'apple'
>>> class str2(str):
...     def __repr__(self):
...         return ''.join(('"', super().__repr__()[1:-1], '"'))
...
>>> str2("apple")
"apple"
>>> str2('apple')
"apple"
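Applied to the original question, str2 only has to wrap each list element, because printing a list calls repr() on its items. A sketch, assuming Python 3 (for the zero-argument super()); note that it would produce invalid output for a word that itself contains a double quote.

```python
class str2(str):
    def __repr__(self):
        # Reuse str.__repr__ for escaping, then swap the outer quotes for double quotes
        return ''.join(('"', super().__repr__()[1:-1], '"'))


words = ['dog', 'cat', 'fish']
# list.__repr__ calls repr() on each element, so wrapping the items is enough
print([str2(w) for w in words])  # ["dog", "cat", "fish"]
```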
In Python, single and double quotes are equivalent for string literals; there's no difference between them, so there's no point in replacing single quotes with double quotes or vice versa:
2.4.1. String and Bytes literals
...In plain English: Both types of literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character...
"The issue I am having is I need the strings in the list to be with double quotes not single quotes." - Then you need to make your program accept single quotes, rather than trying to replace single quotes with double quotes.

Convert escaped utf-8 string to utf in python 3

I have a py3 string that includes escaped UTF-8 sequences, such as "Company\\ffffffc2\\ffffffae", which I would like to convert to the correct UTF-8 string (which in the example would be "Company®", since the escaped sequence is c2 ae). I've tried
print (bytes("Company\\\\ffffffc2\\\\ffffffae".replace(
"\\\\ffffff", "\\x"), "ascii").decode("utf-8"))
result: Company\xc2\xae
print (bytes("Company\\\\ffffffc2\\\\ffffffae".replace (
"\\\\ffffff", "\\x"), "ascii").decode("unicode_escape"))
result: CompanyÂ®
(wrong, since the characters are treated separately, but they should be treated together.)
If I do
print (b"Company\xc2\xae".decode("utf-8"))
It gives the correct result.
Company®
How can I achieve that programmatically (i.e. starting from a py3 str)?
A simple solution is:
import ast
test_in = "Company\\\\ffffffc2\\\\ffffffae"
test_out = ast.literal_eval("b'''" + test_in.replace('\\\\ffffff','\\x') + "'''").decode('utf-8')
print(test_out)
However, it will fail if there is a triple quote ''' in the input string itself.
Following code does not have this problem, but it is not as simple as the first one.
In the first step the string is split on a regular expression. The items at even indices (0, 2, ...) are ASCII parts, e.g. "Company"; each item at an odd index corresponds to one escaped UTF-8 code, e.g. "\\\\ffffffc2". Each substring is converted to bytes according to its meaning in the input string. Finally, all parts are joined together and decoded from bytes to a string.
import re
REGEXP = re.compile(r'(\\\\ffffff[0-9a-f]{2})', flags=re.I)
def convert(estr):
    def split(estr):
        # The capture group makes re.split keep the escapes at odd indices
        for i, substr in enumerate(REGEXP.split(estr)):
            if i % 2:
                # Escaped code: the last two characters are the hex byte
                yield bytes.fromhex(substr[-2:])
            elif substr:
                yield bytes(substr, 'ascii')
    return b''.join(split(estr)).decode('utf-8')
test_in = "Company\\\\ffffffc2\\\\ffffffae"
print(convert(test_in))
The code could be optimized: the ASCII parts do not need an encode/decode, and consecutive hex codes could be concatenated.
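Along the lines of that optimization, here is a shorter sketch that assumes the unescaped parts are plain ASCII: each escape is replaced by the Latin-1 character with the same byte value, and the whole string is then encoded as Latin-1 and decoded as UTF-8. ESCAPED_BYTE is a made-up name; convert mirrors the function above.

```python
import re

# Matches two literal backslashes plus "ffffff", and captures the hex byte after them
ESCAPED_BYTE = re.compile(r'\\\\ffffff([0-9a-f]{2})', flags=re.I)


def convert(estr):
    """Decode escaped UTF-8 bytes (backslashes + ffffff + hex byte) in an ASCII string."""
    # Latin-1 maps code points 0-255 one-to-one onto bytes 0-255, so turning each
    # escape into chr(byte) and encoding as Latin-1 recovers the original bytes
    as_latin1 = ESCAPED_BYTE.sub(lambda m: chr(int(m.group(1), 16)), estr)
    return as_latin1.encode('latin-1').decode('utf-8')
```

Consecutive escapes end up next to each other in the Latin-1 string, so multi-byte UTF-8 sequences are decoded together.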

Python: Caesar shift - inputting string that contains multiple "#'{]$ values

I created a caesar shift program in python, see below:
from string import maketrans
originalChar = (raw_input("Enter a letter: "))
numToInc = int(raw_input("Enter a number: "))
code = ""
for x in originalChar:
    newChar = (chr(ord(x) + numToInc))
    code = code + newChar
transtab = maketrans(chr(32+numToInc), " ")
print code.translate(transtab)
I have to shift the following code:
%%$#_$^__#)^)&!_+]!*#&^}#[#%]()%+$&[(_#%+%$*^#$^!+]!&_#)_*}{}}!}_]$[%}#[{_##_^{*
###&{#&{&)*%(]{{([*}#[#&]+!!*{)!}{%+{))])[!^})+)$]#{*+^((#^#}$[**$&^{$!##$%)!#(&
+^!{%_$&#^!}$_${)$_#)!({#!)(^}!*^&!$%_&&}&_#&#{)]{+)%*{&*%*&#%$+]!*__(#!*){%&#++
!_)^$&&%#+)}!#!)&^}**#!_$([$!$}#*^}$+&#[{*{}{((#$]{[$[$$()_#}!#}^#_&%^*!){*^^_$^
As you can see, it contains # characters, which Python treats as the start of a comment.
How do I make that code into a string? The comments are stopping my python script from giving any output.
Thanks
Triple-quote it:
"""%%$#_$^__#)^)&!_+]!*#&^}#[#%]()%+$&[(_#%+%$*^#$^!+]!&_#)_*}{}}!}_]$[%}#[{_##_^{*
###&{#&{&)*%(]{{([*}#[#&]+!!*{)!}{%+{))])[!^})+)$]#{*+^((#^#}$[**$&^{$!##$%)!#(&
+^!{%_$&#^!}$_${)$_#)!({#!)(^}!*^&!$%_&&}&_#&#{)]{+)%*{&*%*&#%$+]!*__(#!*){%&#++
!_)^$&&%#+)}!#!)&^}**#!_$([$!$}#*^}$+&#[{*{}{((#$]{[$[$$()_#}!#}^#_&%^*!){*^^_$^"""
Triple-quoted strings can span multiple lines, and a # inside them is part of the string, not a comment.
Surround the input text between triple quotes: """your code"""
The triple-quote approach won't preserve your input exactly if it contains backslash escape sequences. For example, the sequence \n means newline and is interpreted (correctly) as one character, not two.
If you want a general solution in which all the characters provided in your input are captured as-is and without escaping / interpretation, use the raw string approach via a leading r outside the quotes:
>>> s = '\n\n\n'
>>> print(len(s))
3
vs.
>>> r = r'\n\n\n'
>>> print(len(r))
6
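There are only two special cases to worry about: a raw string cannot end in an odd number of backslashes, and a backslash in front of the quote character stays in the string.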
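Putting the pieces together for Python 3: a sketch that stores the text (first line only, for brevity) in a raw triple-quoted string and shifts it. caesar_shift is a hypothetical name, and leaving whitespace unshifted is an assumption that differs from the original code, which shifts every character and maps only the shifted space back with maketrans.

```python
# Raw triple-quoted string: "#" is just a character here, not a comment,
# and the r prefix would keep any backslashes literal
ENCODED = r"""%%$#_$^__#)^)&!_+]!*#&^}#[#%]()%+$&[(_#%+%$*^#$^!+]!&_#)_*}{}}!}_]$[%}#[{_##_^{*"""


def caesar_shift(text, shift):
    """Shift the code point of every non-whitespace character by `shift`."""
    # Whitespace is left alone so spaces and line breaks survive the shift
    return "".join(ch if ch.isspace() else chr(ord(ch) + shift) for ch in text)


print(caesar_shift(ENCODED, 1))
```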

python regex for repeating string

I am wanting to verify and then parse this string (in quotes):
string = "start: c12354, c3456, 34526; other stuff that I don't care about"
//Note that some codes begin with 'c'
I would like to verify that the string starts with 'start:' and ends with ';'
Afterward, I would like to have a regex parse out the strings. I tried the following python re code:
regx = r"start: (c?[0-9]+,?)+;"
reg = re.compile(regx)
matched = reg.search(string)
print ' matched.groups()', matched.groups()
I have tried different variations but I can either get the first or the last code but not a list of all three.
Or should I abandon using a regex?
EDIT: updated to reflect part of the problem space I neglected and fixed string difference.
Thanks for all the suggestions - in such a short time.
In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).
Your easiest solution is to first extract the part between start: and ;, and then use a regular expression that returns all matches, not just a single match, using re.findall('c?[0-9]+', text).
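The two steps can be sketched like this, using the sample line from the question:

```python
import re

line = "start: c12354, c3456, 34526; other stuff that I don't care about"

# Step 1: verify the "start:" prefix and ";" terminator, capturing what lies between
m = re.match(r'start:([^;]*);', line)
# Step 2: findall returns every match, not only the last capture of a repeated group
codes = re.findall(r'c?[0-9]+', m.group(1)) if m else []
print(codes)  # ['c12354', 'c3456', '34526']
```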
You could use the standard string tools, which are pretty much always more readable.
s = "start: c12354, c3456, 34526;"
s.startswith("start:") # returns True if s starts with this string
s.endswith(";") # returns True if s ends with this string
s[len("start:"):-1].strip().split(', ') # list of tokens separated by ", " (strip() drops the leading space)
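If the line also carries trailing text after the ';', as in the question, str.partition can split that part off first. A sketch with a hypothetical parse_codes helper that returns None when the line doesn't match:

```python
def parse_codes(line, prefix="start:"):
    """Return the comma-separated codes between `prefix` and the first ';', or None."""
    head, sep, _rest = line.partition(";")
    # Reject lines without a ";" terminator or without the expected prefix
    if not sep or not head.startswith(prefix):
        return None
    # Strip surrounding spaces so both ", " and "," separators work
    return [token.strip() for token in head[len(prefix):].split(",")]
```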
This can be done (pretty elegantly) with a tool like Pyparsing:
from pyparsing import Group, Literal, OneOrMore, Optional, ParseException, Word
import string
code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:
with open('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
            codes = [c[1] for c in result[1:-1]]
            # Do something with teh codez...
        except ParseException:
            # Oh noes: string doesn't match!
            continue
Cleaner than a regular expression, returns a list of codes (no need to string.split), and ignores any extra characters in the line, just like your example.
import re
sstr = re.compile(r'start:([^;]*);')
slst = re.compile(r'(?:c?)(\d+)')
mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = re.match(sstr, mystr)
if match:
    res = re.findall(slst, match.group(1))
results in
['12354', '3456', '34526']
