Sympifying strings containing expressions with superscripts - python

I am trying to sympify a string like these
str1="a^0_0"
ns={}
ns['a^0_0']=Symbol('a^0_0')
pprint(sympify(str1,locals=ns))
But I get the following error
Traceback (most recent call last):
File "cuaterniones_basic.py", line 114, in <module>
pprint(sympify(str1,locals=ns))
File "/usr/local/lib/python2.7/dist-packages/sympy/core/sympify.py", line 356, in sympify
raise SympifyError('could not parse %r' % a, exc)
sympy.core.sympify.SympifyError: Sympify of expression 'could not parse u'a^0_0'' failed, because of exception being raised:
SyntaxError: invalid syntax (<string>, line 1)
How can I get the symbol I want?

sympify can only parse expressions if they are valid Python (with a few minor exceptions). That means that symbol names can only be parsed if they are valid Python variable names. The solution depends on the exact nature of what you are trying to parse.
If the whole string is the symbol name, just use Symbol instead of sympify.
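For the string in the question that is all you need; a minimal sketch:
from sympy import Symbol

# The whole string is the symbol name, so no parsing is involved at all.
a00 = Symbol("a^0_0")
print(a00)  # a^0_0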
If you are constructing the Symbol objects from known strings, wrap them in Symbol('...') in your string, like sympify("Symbol('a^0') + 1").
If you know what characters you will see, you can try swapping them before parsing, then swapping them back in the expression with replace.
>>> sympify('a^0 + 1'.replace('^', '__')).replace(lambda a: isinstance(a, Symbol), lambda a: Symbol(a.name.replace('__', '^')))
a^0 + 1
(don't confuse str.replace and SymPy's expr.replace here).
This will not work if the characters in your symbol names are also used to represent math outside of the symbol names (like if you use ^ to represent actual exponentiation).
In general, you may need to write your own parsing tool. SymPy's parsing utilities in sympy.parsing can help here.

Indeed, the parser makes a decision about the structure of your input string before it comes to converting pieces to SymPy atoms.
There are a bunch of knobs one can twist by using parse_expr instead of sympify but I haven't found one that works for this string. Instead, it may be easiest to preprocess the input with string replacement, replacing the troublesome characters with something else. This preprocessing doesn't affect the final outcome because the dictionary ns will make things right again.
str1 = "a^0_0"
new_str1 = str1.replace("^", "up")
ns = {new_str1: Symbol(str1)}
print(sympify(new_str1, locals=ns))
Prints a^0_0 which is the name of the created symbol.

Related

Which one? f-string or using format in Python 3

Which one is better in Python 3?
They have the same output, but most code I see uses format instead of f-strings.
a = "Test"
print(f"this is for {a}")
or format?
print("This is for {}".format(a))
Sometimes when I used an f-string for a directory or file path I got errors, but there was no problem when using format.
As pointed out in the comments, a good comparison of string formatting methods is provided on the Real Python website.
In many situations f-strings are more readable and less error-prone to write than the other variants, but they require Python >= 3.6, so they may have to be avoided if backwards compatibility is required. In general they are a good choice, apart from a few gotchas that come up from time to time (several of the restrictions shown below were lifted in Python 3.12, but they still apply on older versions).
When nesting f-strings you have to be careful with quotation marks. This fails:
>>> f"Hello {"there"}"
File "<stdin>", line 1
f"Hello {"there"}"
^
SyntaxError: invalid syntax
But using other quotation marks on the inside lets you get around this:
>>> f"Hello {'there'}"
'Hello there'
You cannot nest f-strings containing string literals deeper than that though, because you have no further different quotation marks to use.
Another gotcha I stumble across regularly is the restriction on not allowing backslashes in the f-string expression part, even if they are inside a string literal:
>>> f"Path: {'C:\Windows'}"
File "<stdin>", line 1
SyntaxError: f-string expression part cannot include a backslash
You can get around that using an intermediate variable, or format():
>>> path = 'C:\Windows'
>>> f"Path: {path}"
'Path: C:\\Windows'
>>> "Path: {0}".format('C:\Windows')
'Path: C:\\Windows'
This is probably the issue you had with using f-strings to format paths. I personally tend to encounter this limitation when working with string literals that have newlines '\n' in them inside the f-string expression part.
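For example, here is a minimal sketch of that newline case (the list is made up; the commented-out line is the one that fails where backslashes are disallowed):
lines = ["first", "second"]
# f"Items: {'\n'.join(lines)}"  # SyntaxError: f-string expression part cannot include a backslash
sep = "\n"  # workaround 1: hoist the literal into a variable
print(f"Items: {sep.join(lines)}")
print("Items: {}".format("\n".join(lines)))  # workaround 2: str.format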

Convert sympy Symbol to string such that it can always be parsed?

I'm looking for a way to convert an arbitrary sympy symbol to a string such that it can later be parsed back into the same symbol. For example, I would like to be able to do something like this:
from sympy.parsing.sympy_parser import parse_expr
from sympy import Symbol
A = Symbol("A")
B = Symbol("B")
pathological = Symbol("A B")
parsed = parse_expr(str(pathological)) # this raises an error
assert parsed == pathological
Instead of parsing str(pathological) as representing the pathological symbol, the parser parses A and B separately and we get the following error:
File "<string>", line 1
Symbol ('A' )Symbol ('B' )
^
SyntaxError: invalid syntax
Is there a way to create an escaped string from pathological that is guaranteed to be parsed back to pathological?
The reason I am trying to do this is so that I can store sympy expressions as JSON and reconstruct them. If there is a completely different way to do that, I would be happy to hear.
For such symbols I would store the srepr() form. It may be too much to store that for the whole expression so perhaps it would be useful to make a custom printer that does what StrPrinter does except when necessary falls back to ReprPrinter. See https://docs.sympy.org/latest/modules/printing.html
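For instance, a small sketch of the srepr round trip for the symbol from the question:
from sympy import Symbol, srepr, sympify

pathological = Symbol("A B")
s = srepr(pathological)  # the string "Symbol('A B')"
assert sympify(s) == pathological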
If you have access to the pathological symbols before they are converted to strings then you can create a "kerned" version that will parse ok and pass the kerned version with the desired version in a local_dict. re.escape will put a backslash in front of the space and the kerned version will replace that escaped space with something unique:
>>> import re
>>> kerned = re.escape(str(pathological)).replace('\\ ','_kern_')
>>> d = {kerned: pathological}
>>> parse_expr(kerned, d)
A B

Having some issues with re.sub

In my program I'm parsing Japanese definitions, and I need to take a few things out. There are three kinds of brackets I need to remove text between: 「text」, (text) and 《text》.
To take out things between 「」 I've been doing sentence = re.sub('「[^)]*」','', sentence) The problem with this is, for some reason if there are parentheses within 「」 it will not replace anything. Also, I've tried using the same code for the other two things like sentence = re.sub('([^)]*)','', sentence)
sentence = re.sub('《[^)]*》','', sentence) but it doesn't work for some reason. There isn't an error or anything, it just doesn't replace anything.
How can I make this work, or is there some better way of doing this?
EDIT:
I'm having a slight problem with another part of this though. Before I replace anything I check the length to make sure it's over a certain length.
parse = re.findall(r'「[^」]*」','', match.text)
if len(str(parse)) > 8:
    sentence = re.sub(r'「[^」]*」','', match.text)
This seems to be causing an error now:
Traceback (most recent call last):
File "C:/Users/Dominic/PycharmProjects/untitled9/main.py", line 48, in <module>
parse = re.findall(r'「[^」]*」','', match.text)
File "C:\Python34\lib\re.py", line 206, in findall
return _compile(pattern, flags).findall(string)
File "C:\Python34\lib\re.py", line 275, in _compile
bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'
I sort of understand what's causing this, but I don't understand why it's not working after just that slight change. I know the re.sub part is fine; it's just the first two lines that are causing the problems.
You should read a tutorial on regular expressions so you understand what your regexps do.
The regexp '「[^)]*」' matches anything between the corner brackets that is not a closing parenthesis, so it fails to match as soon as the text inside contains a ). You need this:
sentence = re.sub(r'「[^」]*」','', sentence)
The second regexp has an additional problem: Parentheses have a special meaning (when they are not inside square brackets), so to match parentheses you need to write \( and \). So you need this:
'\([^)]*\)'
Finally: You should always use raw strings for your python regexps. It doesn't happen to make a difference in this case, but it very often does, and the bugs are maddening to spot. E.g., use:
r'\([^)]*\)'
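Putting those pieces together, a sketch for all three bracket types (the sample sentence is made up):
import re

sentence = "食べる「(たべる)」(to eat)《動詞》"
sentence = re.sub(r'「[^」]*」', '', sentence)  # remove 「…」 even when it contains ( )
sentence = re.sub(r'\([^)]*\)', '', sentence)   # remove (…)
sentence = re.sub(r'《[^》]*》', '', sentence)   # remove 《…》
print(sentence)  # 食べる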
sentence = re.sub(r'「[^」]*」','', sentence)
You need to change the negation-based quantifier to stop at 」 instead of ). With 「[^)]*」 you have instructed the regex to stop when it finds ), so it will fail whenever a ) appears inside the brackets. You should also pass the re.UNICODE flag if you are dealing with Unicode patterns.

Python TypeError when using variable in re.sub

I'm new to Python and I keep getting an error doing the simplest thing.
I'm trying to use a variable in a regular expression and replace the match with an *.
The following gets me the error "TypeError: not all arguments converted during string formatting" and I can't tell why. This should be so simple.
import re
file = "my123filename.zip"
pattern = "123"
re.sub(r'%s', "*", file) % pattern
Error:
Traceback (most recent call last):
File "", line 1, in ?
TypeError: not all arguments converted during string formatting
Any tips?
Your problem is on this line:
re.sub(r'%s', "*", file) % pattern
What you're doing is replacing every occurrence of %s with * in the text from the string file (in this case, I'd recommend renaming the variable to filename to avoid shadowing the builtin file object and to make it more explicit what you're working with). Then you're trying to replace the %s in the (already replaced) text with pattern. However, file doesn't have any format modifiers in it, which leads to the TypeError you see. It's basically the same as:
'this is a string' % ("foobar!")
which will give you the same error.
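To make that concrete, a small sketch with the question's values:
import re

filename = "my123filename.zip"
text = re.sub(r'%s', "*", filename)  # no literal "%s" in filename, so nothing is replaced
try:
    text % "123"  # no format specifier in text either, so % has nothing to convert
except TypeError as e:
    print(e)  # not all arguments converted during string formatting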
What you probably want is something more like:
re.sub(str(pattern),'*',file)
which is exactly equivalent to:
re.sub(r'%s' % pattern,'*',file)
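For example, with the question's values:
import re

filename = "my123filename.zip"  # renamed from `file` as suggested above
pattern = "123"
print(re.sub(pattern, "*", filename))  # my*filename.zip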
Try re.sub(pattern, "*", file)? Or maybe skip re altogether and just do file.replace("123", "*").

Problem with Boolean expression with a string value from a list

I have the following problem:
# line is a line from a file that contains ["baa","beee","0"]
line = TcsLine.split(",")
NumPFCs = eval(line[2])
if NumPFCs == 0:
    print line
I want to print all the lines from the file if the second position of the list has a value == 0.
I print the lines but after that the following happens:
Traceback (most recent call last):
['baaa', 'beee', '0', '\n']
BUT after I have the next ERROR
ilation.py", line 141, in ?
getZeroPFcs()
ilation.py", line 110, in getZeroPFcs
NumPFCs = eval(line[2])
File "<string>", line 0
Can you please help me?
thanks
What0s
Let me explain a little what you do here.
If you write:
NumPFCs = eval(line[2])
the order of evaluation is:
take the third element of the list line; after splitting the raw line on commas, that element still carries the surrounding quote characters and probably the closing bracket (something like '"0"]');
eval that fragment as a Python expression, which fails with a syntax error.
If you instead eval the whole original line:
NumPFCs = eval(TcsLine)[2]
then the order of evaluation is:
eval TcsLine, producing a Python list;
take the element at index 2 of that list, which is the one-character string: "0";
a string never compares equal to a number, so the test silently fails.
In your terms, you want to do the following:
NumPFCs = eval(eval(TcsLine)[2])
or, slightly better, compare NumPFCs to a string:
if NumPFCs == "0":
but the ways this could go wrong are almost innumerable. You should forget about eval and try to use other methods: string splitting, regular expressions etc. Others have already provided some suggestions, and I'm sure more will follow.
Your question is kind of hard to read, but using eval there is definitely not a good idea. Either just do a direct string comparison:
line = TcsLine.split(",")
if line[2] == "0":
    print line
or use int
line = TcsLine.split(",")
if int(line[2]) == 0:
    print line
Either way, your bad data will fail you.
I'd also recommend reading PEP 8.
There are a few issues I see in your code segment:
you assume the list always has at least 3 elements
eval will raise an exception if the contained string isn't valid Python
you say you want the second element, but you access the 3rd element.
This is a safer way to do this
line = TcsLine.split(",")
if len(line) >= 3 and line[2].rfind("0") != -1:
    print line
I'd recommend using a regular expression to capture all of the variants of how 0 can be specified: with double-quotes, without any quotes, with single quotes, with extra whitespace outside the quotes, with whitespace inside the quotes, how you want the square brackets handled, etc.
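A sketch of that idea; the exact pattern is an assumption rather than something from the answer, and it accepts a third field of 0 with single quotes, double quotes or no quotes, optional whitespace, and an optional closing bracket:
import re

zero_third_field = re.compile(r''',\s*["']?\s*0\s*["']?\s*\]?\s*$''')

TcsLine = '["baa","beee","0"]\n'
if zero_third_field.search(TcsLine):
    print(TcsLine)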
There are many ways of skinning a cat, as it were :)
Before we begin though: don't use eval on strings that are not yours. If the string has ever left your program, i.e. it has been stored in a file or sent over a network, someone can send in something nasty. And if someone can, you can be sure someone will.
And you might want to look over your data format. Putting strings like ["baa","beee","0", "\n"] in a file does not make much sense to me.
The first and simplest way would be to just strip away the stuff you don't need and to a string comparison. This would work as long as the '0'-string always looks the same and you're not really after the integer value 0, only the character pattern:
TcsLine = '["baa","beee","0"]'
line = TcsLine.strip('[]').split(",")
if line[2] == '"0"':
    print line
The second way would be similar to the first, except that we cast the numeric string to an integer, yielding the integer value you were looking for (but printing line without all the quotation marks):
TcsLine = '["baa","beee","0"]'
line = [e.strip('"') for e in TcsLine.strip('[]').split(",")]
NumPFCs = int(line[2])
if NumPFCs == 0:
    print line
Could it be that the string is actually a json array? Then I would probably go get simplejson to parse it properly if I were running Python<2.6 or just import json on Python>=2.6. Then cast the resulting '0'-string to an integer as in the previous example.
TcsLine = '["baa","beee","0"]'
#import json  # for Python >= 2.6
import simplejson as json  # for Python < 2.6
line = json.loads(TcsLine)
NumPFCs = int(line[2])
if NumPFCs == 0:
    print line
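To tie that back to the original goal of printing every matching line from the file, a small sketch (the filename is made up):
import json

with open("tcs_lines.txt") as f:  # hypothetical file of lines like ["baa","beee","0"]
    for TcsLine in f:
        line = json.loads(TcsLine)
        if int(line[2]) == 0:
            print(line)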
