I'm looking for a way to convert an arbitrary sympy symbol to a string such that it can later be parsed back into the same symbol. For example, I would like to be able to do something like this:
from sympy.parsing.sympy_parser import parse_expr
from sympy import Symbol
A = Symbol("A")
B = Symbol("B")
pathological = Symbol("A B")
parsed = parse_expr(str(pathological)) # this raises an error
assert parsed == pathological
Instead of parsing str(pathological) as representing the pathological symbol, the parser parses A and B separately and we get the following error:
File "<string>", line 1
Symbol ('A' )Symbol ('B' )
^
SyntaxError: invalid syntax
Is there a way to create an escaped string from pathological that is guaranteed to be parsed back to pathological?
The reason I am trying to do this is so that I can store SymPy expressions as JSON and reconstruct them. If there is a completely different way to do that, I would be happy to hear about it.
For such symbols I would store the srepr() form, which parses back to the same symbol. Storing that for the whole expression may be too verbose, so it might be useful to write a custom printer that does what StrPrinter does but falls back to ReprPrinter where necessary. See https://docs.sympy.org/latest/modules/printing.html
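A minimal sketch of such a printer, assuming a plain-identifier name is "safe" only if it is not a Python keyword and does not collide with anything in the sympy or builtins namespaces (names like E, I or max already mean something to parse_expr). The class SafeStrPrinter and the to_json/from_json helpers are hypothetical, not part of SymPy; they only override StrPrinter's _print_Symbol hook and otherwise reuse the normal string printer.

```python
import builtins
import json
import keyword

import sympy
from sympy import Symbol, srepr
from sympy.parsing.sympy_parser import parse_expr
from sympy.printing.str import StrPrinter


class SafeStrPrinter(StrPrinter):
    """StrPrinter that falls back to srepr() for risky symbol names.

    Hypothetical helper. Names printed plainly lose any assumptions
    (positive=True, ...); names printed via srepr() keep them.
    """

    def _print_Symbol(self, expr):
        name = expr.name
        plain_ok = (
            name.isidentifier()
            and not keyword.iskeyword(name)
            # Names such as E, I, S or max already mean something to parse_expr.
            and not hasattr(sympy, name)
            and not hasattr(builtins, name)
        )
        return name if plain_ok else srepr(expr)


def to_json(expr):
    return json.dumps({"expr": SafeStrPrinter().doprint(expr)})


def from_json(text):
    return parse_expr(json.loads(text)["expr"])


pathological = Symbol("A B")
stored = to_json(pathological + 2 * Symbol("x_1"))
print(stored)  # "A B" is written as Symbol('A B'), x_1 stays plain
assert from_json(stored) == pathological + 2 * Symbol("x_1")
```

Most of the expression stays human-readable; only the names that would not survive parse_expr pay the srepr() verbosity.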
If you have access to the pathological symbols before they are converted to strings, you can create a "kerned" version of each name that parses cleanly, and pass a local_dict that maps the kerned name to the desired symbol. re.escape puts a backslash in front of the space, and the replace call then turns that backslash-plus-space pair into a unique marker:
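>>> import re
>>> from sympy import Symbol
>>> from sympy.parsing.sympy_parser import parse_expr
>>> pathological = Symbol("A B")
>>> kerned = re.escape(str(pathological)).replace('\\ ','_kern_')
>>> d = {kerned: pathological}
>>> parse_expr(kerned, d)
A B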
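The same idea extends to a whole expression when storing it as JSON: swap every free symbol for a placeholder identifier before calling str(), keep a placeholder-to-name map next to the string, and hand that map back to parse_expr as local_dict. This is a sketch under assumptions: the _kern0, _kern1, ... placeholder scheme and the dump_expr/load_expr names are hypothetical, and symbol assumptions (positive=True, etc.) are not preserved.

```python
import json

from sympy import Symbol
from sympy.parsing.sympy_parser import parse_expr


def dump_expr(expr):
    """Serialize expr to JSON, swapping every symbol for a safe placeholder."""
    symbols = sorted(expr.free_symbols, key=lambda s: s.name)
    # Hypothetical placeholder scheme: _kern0, _kern1, ...
    placeholders = [f"_kern{i}" for i in range(len(symbols))]
    safe_expr = expr.xreplace({s: Symbol(p) for s, p in zip(symbols, placeholders)})
    return json.dumps({
        "expr": str(safe_expr),
        "symbols": {p: s.name for s, p in zip(symbols, placeholders)},
    })


def load_expr(text):
    """Rebuild the expression, mapping each placeholder back via local_dict."""
    data = json.loads(text)
    local_dict = {p: Symbol(name) for p, name in data["symbols"].items()}
    return parse_expr(data["expr"], local_dict)


expr = Symbol("A B") + 2 * Symbol("A")
print(dump_expr(expr))
assert load_expr(dump_expr(expr)) == expr
```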
My project captures a log number from a Google Sheet using the gspread module. The problem is that the captured log number comes in the form of the string ".\1300". I only want the number in the string, but I could not remove the rest using the code below.
I tried using the .replace() function to replace "\" with "", but it failed:
a='.\1362'
a.replace('\\',"")
I should get the string "1362" without the symbols.
But the result I get is ".^2".
The problem is that \136 has a special meaning (similar to \n for newline, \t for tab, etc.). It is an octal escape: octal 136 is decimal 94, the code point of ^.
Check out the following example:
a = '.\1362'
a = a.replace('\\',"")
print(a)
b = r'.\1362'
b = b.replace('\\',"")
print(b)
Produces
.^2
.\1362
Now, if your Google Sheets module sends .\1362 instead of .\\1362, it is very likely because you are in fact supposed to receive .^2. Or, there's a problem with your character encoding somewhere along the way.
The r modifier I put on the b variable means raw string: Python will not interpret backslashes and will leave your string alone. This is only really useful when typing the strings in manually, but you could perhaps try:
a = r'{}'.format(yourStringFromGoogle)
Edit: As pointed out in the comments, the original code did in fact discard the result of the .replace() method. I've updated the code, but please note that the escape-sequence issue remains the same.
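To see the octal escape at work, the snippet below compares the two literals character by character. It uses only built-ins; the variable names simply mirror the example above.

```python
a = '.\1362'     # '\136' is an octal escape: 0o136 == 94 == ord('^')
b = r'.\1362'    # raw literal: the backslash and the digits are kept as typed

print(oct(ord('^')))   # 0o136
print(list(a))         # ['.', '^', '2']
print(list(b))         # ['.', '\\', '1', '3', '6', '2']
print(b.replace('\\', ''))  # .1362
```

Note that escape processing only happens when Python compiles a literal in source code, so text that arrives at runtime is never reinterpreted this way.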
When you do a='.\1362', a will only have three characters:
a = '.\1362'
print(len(a)) # => 3
That is because \136 represents a single character. If you want to create a six-character string with a dot, a backslash, and the digits 1362, you either need to escape the backslash or create a raw string:
a = r'.\1362'
print(len(a)) # => 6
In either case, calling replace on a string does not change that string: Python strings are immutable, so a is still what it was before calling replace. Instead, replace returns a new string:
a = r'.\1362'
b = a.replace('\\', '')
print(a) # => .\1362
print(b) # => .1362
So, if you want to replace characters, calling replace is the way to do it, but you have to save the result in a new variable or overwrite the old one.
See String and Bytes literals in the official Python documentation for more information.
Your string should contain two backslashes, like .\\1362, or be written as the raw literal r'.\1362' (the r prefix tells Python to keep backslashes as typed when it compiles the literal). With only one backslash, Python interprets \136 as ^, as you can see.
What's happening here is that \1362 is being interpreted as ^2 because of the backslash, so you need to make the string raw before you're able to use it. You can do this with:
a = r'{}'.format(rawInputString)
or, if you're on Python 3.6+, you can do
a = rf'{rawInputString}'
I am trying to sympify a string like this:
from sympy import Symbol, pprint, sympify
str1="a^0_0"
ns={}
ns['a^0_0']=Symbol('a^0_0')
pprint(sympify(str1,locals=ns))
But I get the following error
Traceback (most recent call last):
File "cuaterniones_basic.py", line 114, in <module>
pprint(sympify(str1,locals=ns))
File "/usr/local/lib/python2.7/dist-packages/sympy/core/sympify.py", line 356, in sympify
raise SympifyError('could not parse %r' % a, exc)
sympy.core.sympify.SympifyError: Sympify of expression 'could not parse u'a^0_0'' failed, because of exception being raised:
SyntaxError: invalid syntax (<string>, line 1
How can I get the symbol I want?
sympify can only parse expressions if they are valid Python (with a few minor exceptions). That means that symbol names can only be parsed if they are valid Python variable names. The solution depends on the exact nature of what you are trying to parse.
If the whole string is the symbol name, just use Symbol instead of sympify.
If you are constructing the Symbol objects from known strings, wrap them in Symbol('...') in your string, like sympify("Symbol('a^0') + 1").
If you know what characters you will see, you can try swapping them before parsing, then swapping them back in the expression with replace.
>>> from sympy import Symbol, sympify
>>> sympify('a^0 + 1'.replace('^', '__')).replace(lambda a: isinstance(a, Symbol), lambda a: Symbol(a.name.replace('__', '^')))
a^0 + 1
(don't confuse str.replace and SymPy's expr.replace here).
This will not work if the characters in your symbol names are also used to represent math outside of the symbol names (like if you use ^ to represent actual exponentiation).
In general, you may need to write your own parsing tool. SymPy's parsing utilities in sympy.parsing can help here.
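The second and third options above can be sketched as follows. CARET and sympify_caret_names are hypothetical names, and the sketch assumes the placeholder "__caret__" never occurs in a real symbol name; like the one-liner above, it breaks if ^ is also meant as exponentiation.

```python
from sympy import Symbol, sympify

# Option 2: write the Symbol constructor into the string itself.
wrapped = sympify("Symbol('a^0_0') + 1")

# Option 3: swap '^' for a placeholder, parse, then swap it back with expr.replace.
CARET = "__caret__"  # hypothetical placeholder, assumed absent from real names


def sympify_caret_names(text):
    expr = sympify(text.replace("^", CARET))
    # SymPy's expr.replace (not str.replace): rebuild every Symbol with '^' restored.
    return expr.replace(
        lambda a: isinstance(a, Symbol),
        lambda a: Symbol(a.name.replace(CARET, "^")),
    )


print(wrapped)                            # a^0_0 + 1
print(sympify_caret_names("a^0_0 + 1"))   # a^0_0 + 1
```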
Indeed, the parser makes a decision about the structure of your input string before it gets to converting pieces to SymPy atoms.
There are a bunch of knobs one can twist by using parse_expr instead of sympify, but I haven't found one that works for this string. Instead, it may be easiest to preprocess the input with string replacement, replacing the troublesome characters with something else. This preprocessing doesn't affect the final result, because the dictionary ns maps the replacement name back to the original Symbol.
from sympy import Symbol, sympify
str1 = "a^0_0"
new_str1 = str1.replace("^", "up")
ns = {new_str1: Symbol(str1)}
print(sympify(new_str1, locals=ns))
This prints a^0_0, which is the name of the created symbol.
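If the full symbol names are known in advance, a variation is to replace each whole name rather than the ^ character, so that ^ can still mean exponentiation elsewhere in the string. A sketch, assuming the hypothetical helper sympify_with_names and placeholders _sym0, _sym1, ... that do not already appear in the input; a single re.sub pass with longest names first keeps a short name such as a^0 from matching inside a^0_0.

```python
import re

from sympy import Symbol, sympify


def sympify_with_names(text, names):
    """Parse text, treating each string in names as one atomic symbol name."""
    ordered = sorted(set(names), key=len, reverse=True)  # longest names first
    if not ordered:
        return sympify(text)
    # Hypothetical placeholder identifiers, assumed not to occur in text.
    placeholders = {name: f"_sym{i}" for i, name in enumerate(ordered)}
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    # One pass, so inserted placeholders are never rescanned.
    safe_text = pattern.sub(lambda m: placeholders[m.group(0)], text)
    ns = {ph: Symbol(name) for name, ph in placeholders.items()}
    return sympify(safe_text, locals=ns)


# The trailing ^2 is still read as a power (sympify converts ^ to **).
print(sympify_with_names("a^0_0^2 + 1", ["a^0_0"]))
```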
I get
ElementTree.ParseError: reference to invalid character number
when parsing XML that contains the following as a tag value: loca&#1;t
My code looks like:
respXML = httpResponse.content
#also possible respXML = httpResponse.content.decode("utf-8")
#but both get the same error
#this line throws the error
respRoot = ET.fromstring(respXML)
How can I bulletproof my parser against seemingly invalid character numbers?
That looks like HTML. See whether running the input string through the html package, before anything else, helps.
https://pypi.python.org/pypi/html
>>> import html
>>> test = "locat"
>>> html.unescape(test)
'local'
Then convert some known Unicode characters to their ASCII equivalents, e.g.:
“ => "
’ => '
...
Finally, replace double spaces with a single space.
Since it'll be pretty cumbersome to handle everything correctly up front, I recommend catching specific exceptions and writing the offending line to a file.
Then address each error in that output file, one by one, by adding more rules.
Good luck.
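One caution: running html.unescape over a whole XML document also turns &lt; and &amp; into raw markup characters, which can itself break parsing. A narrower alternative (a swapped-in technique, not what the answer above describes) is to remove only the numeric character references whose code points XML 1.0 does not allow, and leave everything else for ElementTree. The helper names below are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Decimal (&#1;) and hexadecimal (&#x1F;) numeric character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def _is_xml_char(cp):
    # Code points allowed by the XML 1.0 "Char" production.
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def drop_invalid_char_refs(text):
    """Remove references like &#1; that ElementTree rejects; keep valid ones."""
    def fix(m):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        return m.group(0) if _is_xml_char(cp) else ""
    return _CHAR_REF.sub(fix, text)


s = "<Tag>loca&#1;t</Tag>"
print(ET.fromstring(drop_invalid_char_refs(s)).text)  # locat
```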
I sometimes find it useful to preserve the original input characters with a regex pattern, such as re.sub(r'&#([a-zA-Z0-9]+);?', r'[#\1;]', s). For example, with
import html
import re
from xml.etree import ElementTree as ET
s = "<Tag>loca&#1;t</Tag>"
using html.unescape produces
ET.fromstring(html.unescape(s)).text
#Out: 'locat'
but the regex pattern mentioned produces
ET.fromstring(re.sub(r'&#([a-zA-Z0-9]+);?', r'[#\1;]', s)).text
#Out: 'loca[#1;]t'
which preserves the "bad characters".
I'm trying to find a way to print a string in raw form from a variable. For instance, if I add an environment variable to Windows for a path, which might look like 'C:\\Windows\Users\alexb\', I know I can do:
print(r'C:\\Windows\Users\alexb\')
But I cant put an r in front of a variable.... for instance:
test = 'C:\\Windows\Users\alexb\'
print(rtest)
Clearly would just try to print rtest.
I also know there's
test = 'C:\\Windows\Users\alexb\'
print(repr(test))
But this returns 'C:\\Windows\\Users\x07lexb'
as does
test = 'C:\\Windows\Users\alexb\'
print(test.encode('string-escape'))
So I'm wondering if there's any elegant way to make a variable holding that path print RAW, still using test? It would be nice if it was just
print(raw(test))
But it's not.
I had a similar problem and stumbled upon this question, and now know, thanks to Nick Olson-Harris' answer, that the solution lies in changing the string.
Two ways of solving it:
Get the path you want using native python functions, e.g.:
import os
test = os.getcwd() # In case the path in question is your current directory
print(repr(test))
This makes it platform independent and it now works with .encode. If this is an option for you, it's the more elegant solution.
If your string is not a path, define it in a way compatible with python strings, in this case by escaping your backslashes:
test = 'C:\\Windows\\Users\\alexb\\'
print(repr(test))
In general, to make a raw string out of a string variable, I use this:
string = "C:\\Windows\Users\alexb"
raw_string = r"{}".format(string)
output:
'C:\\\\Windows\\Users\\alexb'
You can't turn an existing string "raw". The r prefix on literals is understood by the parser; it tells it to ignore escape sequences in the string. However, once a string literal has been parsed, there's no difference between a raw string and a "regular" one. If you have a string that contains a newline, for instance, there's no way to tell at runtime whether that newline came from the escape sequence \n, from a literal newline in a triple-quoted string (perhaps even a raw one!), from calling chr(10), by reading it from a file, or whatever else you might be able to come up with. The actual string object constructed from any of those methods looks the same.
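A small check of the point above: two literals spelled differently produce equal strings, and formatting through a raw template does not reinterpret anything, because the r prefix only applies to the template literal itself. What is usually wanted is the escaped spelling, which repr() gives. The variable names are illustrative.

```python
s = "C:\\Windows\\Users\\alexb"   # normal literal with escaped backslashes
t = r"C:\Windows\Users\alexb"      # the same characters written as a raw literal

# Both literals build equal strings; the value keeps no trace of the spelling.
print(s == t)                      # True

# A raw template passes the value through unchanged.
print(r"{}".format(s) == s)        # True
print(rf"{s}" == s)                # True

# The escaped spelling, e.g. for display or for pasting back into source code:
print(repr(s))                     # 'C:\\Windows\\Users\\alexb'
```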
I know I'm late to answer, but for people reading this, I found a much easier way to do it:
myVariable = 'This string is supposed to be raw \'
print(r'%s' %myVariable)
Try this, depending on what type of output you want; sometimes you may not need the single quotes around the printed string.
test = "qweqwe\n1212as\t121\\2asas"
print(repr(test)) # output: 'qweqwe\n1212as\t121\\2asas'
print( repr(test).strip("'")) # output: qweqwe\n1212as\t121\\2asas
Get rid of the escape characters before storing or manipulating the raw string:
You could change any backslashes in the path ('\') to forward slashes ('/') before storing it in a variable. Forward slashes don't need to be escaped:
>>> mypath = os.getcwd().replace('\\','/')
>>> os.path.exists(mypath)
True
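For a path that is already held in a string, the standard library's pathlib can do the same conversion on any operating system. This swaps in pathlib.PureWindowsPath instead of str.replace; to_forward_slashes is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    # PureWindowsPath understands '\' separators on any OS; as_posix() rejoins with '/'.
    return PureWindowsPath(path).as_posix()


print(to_forward_slashes(r"C:\Windows\Users\alexb"))   # C:/Windows/Users/alexb
```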
Just use r'string'. I hope this helps, since I see you haven't got the answer you expected yet:
test = 'C:\\Windows\Users\alexb\'
rawtest = r'%s' %test
I have a variable assigned to a big, complex pattern string for use with the re module. It is concatenated with a few other strings, and in the end I want to print it, then copy it and check it on regex101.com.
But when I print it in interactive mode I get a double backslash: '\\w'
As Jimmynoarms said:
The Solution for python 3x:
print(r'%s' % your_variable_pattern_str)
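The doubled backslash comes from the interactive prompt echoing repr() of the string; print() writes the characters themselves, so print(pattern) alone already gives text that can be pasted into regex101.com (the r'%s' template does not change the value). The pattern below is a hypothetical example built by concatenation.

```python
import re

pattern = r"\w+" + "@" + r"\w+\.com"   # hypothetical pattern built from pieces

print(repr(pattern))   # '\\w+@\\w+\\.com'  (what the >>> prompt echoes)
print(pattern)         # \w+@\w+\.com      (the actual characters)
print(bool(re.fullmatch(pattern, "bob@example.com")))   # True
```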
Your particular string won't work as typed, because the backslash at the end escapes the closing quotation mark, so the literal is never closed.
Maybe I'm wrong on that one, since I'm still very new to Python, so please correct me if so. But with the string adjusted slightly to avoid that, the repr() function will reproduce any string stored in a variable as a raw string.
You can do it two ways:
>>> print("C:\\Windows\\Users\\alexb\\")
C:\Windows\Users\alexb\
>>> print(r"C:\\Windows\Users\alexb\\")
C:\\Windows\Users\alexb\\
Store it in a variable:
test = "C:\\Windows\\Users\\alexb\\"
Use repr():
>>> print(repr(test))
'C:\\Windows\\Users\\alexb\\'
or string formatting with %r:
>>> print("%r" % test)
'C:\\Windows\\Users\\alexb\\'
The string will be reproduced with single quotes though so you would need to strip those off afterwards.
To turn a variable to raw str, just use
rf"{var}"
r is raw and f is f-str; put them together and boom it works.
Replace backslashes with forward slashes using one of the following:
import re
re.sub(r"\\", "/", x)
re.sub("\\\\", "/", x)
This does the trick
>>> repr(string)[1:-1]
Here is the proof
>>> repr("\n")[1:-1] == r"\n"
True
And it can easily be turned into a function if need be:
>>> raw = lambda string: repr(string)[1:-1]
>>> raw("\n")
'\\n'
I wrote a small function. It's basic, but it works for me:
def conv(strng):
k=strng
k=k.replace('\a','\\a')
k=k.replace('\b','\\b')
k=k.replace('\f','\\f')
k=k.replace('\n','\\n')
k=k.replace('\r','\\r')
k=k.replace('\t','\\t')
k=k.replace('\v','\\v')
return k
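In Python 3, the built-in "unicode_escape" codec (the closest counterpart to the Python 2 'string-escape' codec tried in the question) covers more cases than the hand-written list above: it also doubles existing backslashes and escapes other control and non-ASCII characters, though it writes \a as \x07 rather than \a. conv_unicode_escape is a hypothetical helper name.

```python
def conv_unicode_escape(s):
    # "unicode_escape" is a standard Python 3 codec; decode back to a str.
    return s.encode("unicode_escape").decode("ascii")


print(conv_unicode_escape("tab:\t newline:\n"))   # tab:\t newline:\n
print(conv_unicode_escape("C:\\Users"))           # C:\\Users
print(conv_unicode_escape("bell:\a"))             # bell:\x07
```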
Here is a straightforward solution.
address = 'C:\Windows\Users\local'
directory ="r'"+ address +"'"
print(directory)
"r'C:\\Windows\\Users\\local'"
I have an XML in which I'd like to rename one of the tag groups like this:
<string>ABC</string>
<string>unknown string</string>
should be
<xyz>ABC</xyz>
<xyz>unknown string</xyz>
ABC is always the same, so that's no issue. However, "unknown string" is always different, but since I need this information extracted, I also want to keep the same string in the replacement.
Here's what I got so far:
import re
#open the xml file for reading:
file = open('path/file','r+')
#convert to string:
data = file.read()
file.write(re.sub("<string>ABC</string>(\s+)<string>(.*)</string>","<xyz>ABC</xyz>[\1]<xyz>[\2]</xyz>",data))
print (data)
file.close()
I tried to use capture groups, but didn't do it correctly. The string is replaced with weird symbols in my XML. Plus, it's printed twice. I have both the unchanged and the changed version in my XML, which I don't want.
The problem you're experiencing is not due to your regex pattern. The backslashes (\) in your strings escape the characters that follow them: in a normal literal, "\1" and "\2" become the control characters chr(1) and chr(2), which are the weird symbols you see.
>>> print "hello\1world"
helloworld
>>> print r"hello\1world"
hello\1world
Always use the raw string notation to define your re patterns.
>>> data = """
... <string>ABC</string>
... <string>unknown string</string>
... """
>>> print re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data)
<xyz>ABC</xyz>
<xyz>unknown string</xyz>
Why are you including the content in your replacement operation? All you need to do is:
Replace <string> by <xyz>.
Replace </string> by </xyz>.
It would take two operations, but the intent of your code would be clear and you wouldn't need to know what the unknown string is.
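A sketch of the two-replacement approach that also writes the file back correctly. In the question, file.write() runs after file.read() on an 'r+' file, so the new text is appended after the old content, which is why both versions end up in the file; seeking to the start and truncating avoids that. rename_string_tags and rewrite_file are hypothetical names, and this renames every <string> tag, not only the pair that starts with ABC.

```python
def rename_string_tags(data):
    # Two plain replacements; no regex or capture groups needed.
    return data.replace("<string>", "<xyz>").replace("</string>", "</xyz>")


def rewrite_file(path):
    # Rewrite the file in place instead of appending to it.
    with open(path, "r+", encoding="utf-8") as f:
        data = f.read()
        f.seek(0)        # back to the start; after read() a write would append
        f.write(rename_string_tags(data))
        f.truncate()     # drop leftover text, since the new content is shorter


print(rename_string_tags("<string>ABC</string>\n<string>unknown string</string>"))
```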