How to evaluate a variable as an f-string? - python

I would like to have a mechanism which evaluates an f-string where the contents to be evaluated are provided inside a variable. For example,
x=7
s='{x+x}'
fstr_eval(s)
For the usage case I have in mind, the string s may arise from user input (where the user is trusted with eval).
While using eval in production is generally very bad practice, there are notable exceptions. For instance, the user may be a Python developer, working on a local machine, who would like to use full Python syntax to develop SQL queries.
Note on duplication: There are similar questions here and here. The first question was asked in the limited context of templates. The second question, although very similar to this one, has been marked as a duplicate. Because the context of this question is significantly different from the first, I decided to ask this third question based on the automatically-generated advice following the second question:
This question has been asked before and already has an answer. If
those answers do not fully address your question, please ask a new
question.

Even with a trusted user, using eval should only be a very last resort.
If you are willing to sacrifice flexibility of your syntax for a bit more security and control, then you could use str.format and provide it your whole scope.
This disallows evaluation of arbitrary expressions, but single variables (along with attribute and index lookups on them) will be formatted into the output.
Code
x = 3
y = 'foo'
s = input('> ')
print(s.format(**vars()))
Example
> {x} and {y}
3 and foo
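A small sketch of the behaviour described above, assuming a hand-built dictionary (the name namespace is illustrative; the answer itself passes vars()). Plain names are substituted, while a field such as {x+x} is treated as a lookup key rather than evaluated, so it fails.

```python
# Illustrative namespace; the answer above passes vars() instead.
namespace = {"x": 3, "y": "foo"}

# Plain variable names are looked up and substituted.
plain = "{x} and {y}".format(**namespace)
print(plain)

# An expression is not evaluated: the whole field "x+x" is used as a key.
try:
    "{x+x}".format(**namespace)
except KeyError as exc:
    failed_key = exc.args[0]
    print("KeyError for field:", failed_key)
```

Passing a dedicated dictionary instead of vars() also limits which names the user-supplied template can reach.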

Here is my attempt at more robust evaluation of f-strings, inspired by kadee's elegant answer to a similar question.
I would however like to avoid some basic pitfalls of the eval approach. For instance, eval(f"f'{template}'") fails whenever the template contains an apostrophe, e.g. the string's evaluation becomes f'the string's evaluation' which evaluates with a syntax error. The first improvement is to use triple-apostrophes:
eval(f"f'''{template}'''")
Now it is (mostly) safe to use apostrophes in the template, as long as they are not triple-apostrophes. (Triple-quotes are however fine.) A notable exception is an apostrophe at the end of the string: whatcha doin' becomes f'''whatcha doin'''' which evaluates with a syntax error at the fourth consecutive apostrophe. The following code avoids this particular issue by stripping apostrophes at the end of the string and putting them back after evaluation.
import builtins
def fstr_eval(_s: str, raw_string=False, eval=builtins.eval):
    r"""str: Evaluate a string as an f-string literal.
    Args:
        _s (str): The string to evaluate.
        raw_string (bool, optional): Evaluate as a raw literal
            (don't escape \). Defaults to False.
        eval (callable, optional): Evaluation function. Defaults
            to Python's builtin eval.
    Raises:
        ValueError: Triple-apostrophes ''' are forbidden.
    """
    # Prefix all local variables with _ to reduce collisions in case
    # eval is called in the local namespace.
    _TA = "'''"  # triple-apostrophes constant, for readability
    if _TA in _s:
        raise ValueError("Triple-apostrophes ''' are forbidden. " + \
                         'Consider using """ instead.')
    # Strip apostrophes from the end of _s and store them in _ra.
    # There are at most two since triple-apostrophes are forbidden.
    if _s.endswith("''"):
        _ra = "''"
        _s = _s[:-2]
    elif _s.endswith("'"):
        _ra = "'"
        _s = _s[:-1]
    else:
        _ra = ""
    # Now the last character of s (if it exists) is guaranteed
    # not to be an apostrophe.
    _prefix = 'rf' if raw_string else 'f'
    return eval(_prefix + _TA + _s + _TA) + _ra
Without specifying an evaluation function, this function's local variables are accessible, so
print(fstr_eval(r"raw_string: {raw_string}\neval: {eval}\n_s: {_s}"))
prints
raw_string: False
eval: <built-in function eval>
_s: raw_string: {raw_string}\neval: {eval}\n_s: {_s}
While the prefix _ reduces the likelihood of unintentional collisions, the issue can be avoided by passing an appropriate evaluation function. For instance, one could pass the current global namespace by means of lambda:
fstr_eval('{_s}', eval=lambda expr: eval(expr))  # NameError: name '_s' is not defined
or more generally by passing suitable globals and locals arguments to eval, for instance
fstr_eval('{x+x}', eval=lambda expr: eval(expr, {}, {'x': 7})) # 14
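To illustrate the point about namespaces in isolation, here is a minimal sketch. The helper name render_fstring is hypothetical; it is a stripped-down stand-in for fstr_eval without the apostrophe handling, and it evaluates the template only against an explicit dictionary. Like any eval-based approach, it is only suitable for trusted input.

```python
def render_fstring(template, namespace):
    """Evaluate template as an f-string using only the names in namespace."""
    source = "f'''" + template + "'''"
    # Empty globals plus an explicit locals mapping: the helper's own
    # variables (template, source, namespace) are not visible to the template.
    return eval(source, {}, dict(namespace))

print(render_fstring("{x+x}", {"x": 7}))

try:
    render_fstring("{source}", {"x": 7})
except NameError as exc:
    hidden = str(exc)
    print("NameError:", hidden)
```

Built-in functions remain reachable from the template, so this narrows name collisions; it is not a sandbox.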
I have also included a mechanism to select whether or not \ should be treated as an escape character via the "raw string literal" mechanism. For example,
print(fstr_eval(r'x\ny'))
yields
x
y
while
print(fstr_eval(r'x\ny', raw_string=True))
yields
x\ny
There are likely other pitfalls which I have not noticed, but for many purposes I think this will suffice.
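The quoting pitfalls discussed in this answer can be reproduced with a short sketch. The helper names eval_single, eval_triple and raises_syntax_error are made up for illustration, and an explicit namespace dictionary is assumed so that the example does not depend on surrounding variables.

```python
def eval_single(template, namespace):
    # Naive wrapping in single apostrophes.
    return eval("f'" + template + "'", {}, dict(namespace))

def eval_triple(template, namespace):
    # Wrapping in triple apostrophes, as in the first improvement.
    return eval("f'''" + template + "'''", {}, dict(namespace))

def raises_syntax_error(func, template, namespace):
    # Report whether evaluating the wrapped template fails to parse.
    try:
        func(template, namespace)
    except SyntaxError:
        return True
    return False

ns = {"x": 7}
# An apostrophe inside the template breaks the single-quoted wrapper.
print(raises_syntax_error(eval_single, "the string's evaluation", ns))
# The triple-quoted wrapper tolerates it.
print(eval_triple("the string's value is {x+x}", ns))
# A trailing apostrophe still breaks the triple-quoted wrapper.
print(raises_syntax_error(eval_triple, "whatcha doin'", ns))
```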

Related

Is there a blank identifier in Python as there is in Golang? [duplicate]

What is the meaning of _ after for in this code?
if tbh.bag:
    n = 0
    for _ in tbh.bag.atom_set():
        n += 1
_ has 3 main conventional uses in Python:
1. To hold the result of the last executed expression in an interactive interpreter session (see docs). This precedent was set by the standard CPython interpreter, and other interpreters have followed suit.
2. For translation lookup in i18n (see the gettext documentation for example), as in code like
   raise forms.ValidationError(_("Please enter a correct username"))
3. As a general purpose "throwaway" variable name:
   - To indicate that part of a function result is being deliberately ignored (conceptually, it is being discarded), as in code like:
     label, has_label, _ = text.partition(':')
   - As part of a function definition (using either def or lambda), where the signature is fixed (e.g. by a callback or parent class API), but this particular function implementation doesn't need all of the parameters, as in code like:
     def callback(_):
         return True
[For a long time this answer didn't list this use case, but it came up often enough, as noted here, to be worth listing explicitly.]
This use case can conflict with the translation lookup use case, so it is necessary to avoid using _ as a throwaway variable in any code block that also uses it for i18n translation (many folks prefer a double-underscore, __, as their throwaway variable for exactly this reason).
Linters often recognize this use case. For example year, month, day = date() will raise a lint warning if day is not used later in the code. The fix, if day is truly not needed, is to write year, month, _ = date(). Same with lambda functions, lambda arg: 1.0 creates a function requiring one argument but not using it, which will be caught by lint. The fix is to write lambda _: 1.0. An unused variable is often hiding a bug/typo (e.g. set day but use dya in the next line).
The pattern matching feature added in Python 3.10 elevated this usage from "convention" to "language syntax" where match statements are concerned: in match cases, _ is a wildcard pattern, and the runtime doesn't even bind a value to the symbol in that case.
For other use cases, remember that _ is still a valid variable name, and hence will still keep objects alive. In cases where this is undesirable (e.g. to release memory or external resources) an explicit del name call will both satisfy linters that the name is being used, and promptly clear the reference to the object.
It's just a variable name, and it's conventional in Python to use _ for throwaway variables. It just indicates that the loop variable isn't actually used.
The underscore _ is treated as an "I don't care" or "throwaway" variable in Python.
The interactive Python interpreter stores the value of the last expression in the special variable _.
>>> 10
10
>>> _
10
>>> _ * 3
30
The underscore _ is also used to ignore specific values. If you don’t need certain values, or they are never used, just assign them to the underscore.
Ignore a value when unpacking
>>> x, _, y = (1, 2, 3)
>>> x
1
>>> y
3
Ignore the index
for _ in range(10):
    do_something()
There are 5 cases for using the underscore in Python:
1. Storing the value of the last expression in the interpreter.
2. Ignoring specific values (the so-called “I don’t care” case).
3. Giving special meanings and functions to the names of variables or functions.
4. Serving as the ‘internationalization (i18n)’ or ‘localization (l10n)’ function.
5. Separating the digits of a numeric literal.
Here is a nice article with examples by mingrammer.
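A few of the cases listed in this answer can be checked directly. The sketch below is illustrative only; the class name Account and its attributes are invented for the example.

```python
# Digit separators: underscores in numeric literals are ignored by the parser.
million = 1_000_000
mask = 0xFF_FF

# Ignoring a value while unpacking.
first, _, last = (1, 2, 3)

# Naming conventions: a leading double underscore inside a class body
# triggers name mangling.
class Account:
    def __init__(self):
        self._balance = 0      # single underscore: "internal" by convention only
        self.__token = "abc"   # double underscore: stored as _Account__token

acct = Account()
print(million, mask, first, last)
print(sorted(vars(acct)))
```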
As far as the Python language is concerned, _ generally has no special meaning. It is a valid identifier just like _foo, foo_ or _f_o_o_.
The only exception is the match statement, since Python 3.10:
In a case pattern within a match statement, _ is a soft keyword that denotes a wildcard. source
Otherwise, any special meaning of _ is purely by convention. Several cases are common:
A dummy name when a variable is not intended to be used, but a name is required by syntax/semantics.
# iteration disregarding content
sum(1 for _ in some_iterable)
# unpacking disregarding specific elements
head, *_ = values
# function disregarding its argument
def callback(_): return True
Many REPLs/shells store the result of the last top-level expression to builtins._.
The special identifier _ is used in the interactive interpreter to store the result of the last evaluation; it is stored in the builtins module. When not in interactive mode, _ has no special meaning and is not defined. [source]
Due to the way names are looked up, unless shadowed by a global or local _ definition the bare _ refers to builtins._ .
>>> 42
42
>>> f'the last answer is {_}'
'the last answer is 42'
>>> _
'the last answer is 42'
>>> _ = 4 # shadow ``builtins._`` with global ``_``
>>> 23
23
>>> _
4
Note: Some shells such as ipython do not assign to builtins._ but special-case _.
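Outside an interactive session, the mechanism can be observed by calling the display hook by hand. This is a sketch that assumes CPython's standard hook, sys.__displayhook__, which the default REPL invokes for each top-level expression; other shells may behave differently, as noted above.

```python
import builtins
import sys

# sys.__displayhook__ is the original hook the standard REPL calls with the
# value of each top-level expression statement.
sys.__displayhook__(42)          # prints 42, as the REPL would
stored = builtins._              # the hook saved the value as builtins._
print("builtins._ is", stored)

# Clean up so the demonstration leaves nothing behind in builtins.
del builtins._
```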
In the context of internationalization and localization, _ is used as an alias for the primary translation function.
gettext.gettext(message)
Return the localized translation of message, based on the current global domain, language, and locale directory. This function is usually aliased as _() in the local namespace (see examples below).

Python function parameter as introspectable string

I'm currently building up a library of personal debugging functions, and wanted to make a debugging print function that would provide more quality-of-life functionality than adding print(thing) lines everywhere. My current goal is to have this pseudocode function translated to Python:
def p(*args, **kwargs):
    for arg in args:
        print(uneval(arg), ": ", arg)
    return args
The types of **kwargs and how they'd affect the output aren't as important to this question, which is why I don't use them here.
My main problem comes from trying to have some method of uneval-ing the arguments, such that I can get the string of code that actually produced them. As a use case:
>>> p(3 + 5) + 1
3 + 5: 8 # printed by p
9 # Return value of the entire expression
I can see a few ways to do this, but I'm not sure any of them work, and I'm also concerned about how Pythonic a solution built on them could be.
Pass the argument as a string, eval it using the previous context's local identifier dictionary (how could I get that without passing it as another argument, which I definitely don't want to do?)
Figure out which line it's being run from in the .py file to extract the uneval'd strings that way (is that even possible?)
Find some metadata magic that has the information I need already in it (if it exists, which is unlikely at best).
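The first option above can be sketched as follows, under some assumptions: the expression is passed as a string, sys._getframe (a CPython implementation detail) is used to reach the caller's scope, and p returns the single evaluated value rather than a tuple of arguments. Since it relies on eval, it is only appropriate for trusted debugging code.

```python
import sys

def p(expr: str):
    """Evaluate expr in the caller's scope, print it with its value, return the value."""
    caller = sys._getframe(1)  # CPython-specific: the frame that called p
    value = eval(expr, caller.f_globals, caller.f_locals)
    print(f"{expr}: {value}")
    return value

def demo():
    a = 3
    # The string is evaluated with demo's local variable a in scope.
    return p("a + 5") + 1

result = demo()
print(result)
```

The caller does not have to pass its namespace explicitly, at the cost of writing the expression as a string.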

Is there a string `s` such that eval(repr(s)) leads to arbitrary code execution?

I found similar code somewhere:
USER_CONTROLLED = 'a'
open("settings.py", "w").write("USER_CONTROLLED = %s" % eval(repr(a)))
And in another file:
import settings
x = settings.USER_CONTROLLED * [0]
Is this a security vulnerability?
In contrast to what you were told on IRC, there definitely is an x that makes eval(repr(x)) dangerous, so saying it just like that without any restrictions is wrong too.
Imagine a custom object that implements __repr__ differently. The documentation says on __repr__ that it “should look like a valid Python expression that could be used to recreate an object with the same value”. But there is simply nothing that can possibly enforce this guideline.
So instead, we could create a class that has a custom __repr__ that returns a string which when evaluated runs arbitrary code. For example:
class MyObj:
    def __repr__(self):
        return "__import__('urllib.request').request.urlopen('http://example.com').read()"
Calling repr() on an object of that type shows that it returns a string that can surely be evaluated:
>>> repr(MyObj())
"__import__('urllib.request').request.urlopen('http://example.com').read()"
Here, that would just involve making a request to example.com. But as you can see, we can import arbitrary modules here and run code with them. And that code can have any kind of side effects. So it’s definitely dangerous.
If we however limit that x to known types of which we know what calling repr() on them will do, then we can indeed say when it’s impossible to run arbitrary code with it. For example, if that x is a string, then the implementation of unicode_repr makes sure that everything is properly escaped and that evaluating the repr() of that object will always return a proper string (which even equals x) without any side effects.
So we should check for the type before evaluating it:
if type(a) is not str:
    raise Exception('Only strings are allowed!')
something = eval(repr(a))
Note that we do not use isinstance here to do an inheritance-aware type check, because I could absolutely make MyObj above inherit from str:
>>> x = MyObj()
>>> isinstance(x, str)
True
>>> type(x)
<class '__main__.MyObj'>
So you should really test against concrete types here.
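A self-contained sketch of why the exact type check matters. The class name Sneaky is made up, and its __repr__ returns a harmless arithmetic expression as a stand-in for arbitrary code.

```python
class Sneaky(str):
    # A str subclass is free to override __repr__ with any expression text.
    def __repr__(self):
        return "1 + 1"  # harmless stand-in for arbitrary code

x = Sneaky("hello")
print(isinstance(x, str))   # the inheritance-aware check accepts it
print(type(x) is str)       # the exact type check rejects it
print(eval(repr(x)))        # evaluates the overridden repr, not the string
```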
Note that for strings, there is actually no reason to call eval(repr(x)) because as mentioned above, this will result in x itself. So you could just assign x directly.
Coming to your actual use case however, you do have a very big security problem here. You want to create a variable assignment and store that code in a Python file to be later run by an actual Python interpreter. So you should absolutely make sure that the right side of the assignment is not arbitrary code but actually the repr of a string:
>>> a = 'runMaliciousCode()'
>>> "USER_CONTROLLED = %s" % eval(repr(a))
'USER_CONTROLLED = runMaliciousCode()'
>>> "USER_CONTROLLED = %s" % repr(a)
"USER_CONTROLLED = 'runMaliciousCode()'"
As you can see, evaluating the repr() will put the actual content on the right side of the assignment (since it’s equivalent to "…" % a). But that can then lead to malicious code running when you import that file. So you should really just insert the repr of the string there, and completely forget about using eval altogether.
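As a rough sketch of that advice, the settings line can be built from the repr of a checked string. The helper name settings_line is hypothetical, and reading the value back with ast.literal_eval is an addition not mentioned in the answer; it accepts Python literals only, so it serves here to confirm the right-hand side is plain data.

```python
import ast

def settings_line(name: str, value) -> str:
    """Return an assignment line whose right-hand side is a string literal."""
    if type(value) is not str:
        raise TypeError("Only strings are allowed!")
    return f"{name} = {value!r}"

line = settings_line("USER_CONTROLLED", "runMaliciousCode()")
print(line)

# ast.literal_eval accepts literals only, so the text is recovered as data
# and never executed.
recovered = ast.literal_eval(line.split(" = ", 1)[1])
print(recovered)
```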

Why does python disallow usage of hyphens within function and variable names?

I have always wondered why we can't use hyphens in function and variable names in Python.
Functional programming languages I have tried, like Lisp and Clojure, allow hyphens. Why doesn't Python?
# This won't work -- SyntaxError
def is-even(num):
    return num % 2 == 0
# This will work
def is_even(num):
    return num % 2 == 0
I am sure Guido must have had his reasons. I googled but couldn't find an answer. Can anyone shed some light on this?
Because hyphen is used as the subtraction operator. Imagine that you could have an is-even function, and then you had code like this:
my_var = is-even(another_var)
Is is-even(another_var) a call to the function is-even, or is it subtracting the result of the function even from a variable named is?
Lisp dialects don't have this problem, since they use prefix notation. For example, there's clear difference between
(is-even 4)
and
(- is (even 4))
in Lisps.
Because Python uses infix notation to represent calculations, and a hyphen and a minus sign have the exact same ASCII code. You can have ambiguous cases such as:
a-b = 10
a = 1
b = 1
c = a-b
What is the answer? 0 or 10?
Because it would make the parser even more complicated. It would also be confusing for programmers.
Consider def is-even(num): now, if is is a global variable, what happens?
Also note that the - is the subtraction operator in Python, hence would further complicate parsing.
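How Python actually reads a hyphen between names can be checked with the standard library. This is only an illustrative sketch using str.isidentifier, ast.parse and compile; the names a and b are arbitrary.

```python
import ast

# "is_even" is a valid identifier; "is-even" is not.
print("is_even".isidentifier(), "is-even".isidentifier())

# With ordinary names, the parser reads a hyphen as the subtraction operator.
tree = ast.parse("a-b", mode="eval")
node = tree.body
print(type(node).__name__, type(node.op).__name__)

# So an assignment to a hyphenated "name" is rejected at compile time.
try:
    compile("a-b = 10", "<example>", "exec")
except SyntaxError as exc:
    error = exc
    print("SyntaxError:", error.msg)
```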
is-even(num)
contains a hyphen? I thought it was the subtraction of the value returned by the function even, called with argument num, from the value of is.
As @jdupont says, parsing can be tricky.
Oddly enough, it is possible to have attribute names with hyphens using setattr(), not that you would want to. Here is an example:
class testclass:
    pass
x = testclass()
setattr(x, "is-even", True)
getattr(x, "is-even")
True
This still fails:
x.is-even
File "<stdin>", line 1
x.is-even
^
SyntaxError: invalid syntax
