For a project, I am trying to read through a Python file and keep a list of all the variables used within a certain function. I read the lines of the file as strings and then focus on any line starting with "def". For the purpose of this example, assume we have identified the following line:
def func(int_var:int,float_var=12.1,string_var=foo()):
I want to use regex or any other method to grab the values within this function declaration.
I want to grab the string "int_var:int,float_var=12.1,string_var=foo()", and later split it based on the commas to get ["int_var:int","float_var=12.1","string_var=foo()"]
I am having a lot of trouble isolating the items between the parentheses corresponding to 'func'.
Any help creating a regex pattern would be greatly appreciated!
Instead of regex, it is much easier and far more robust to use the ast module:
import ast
s = """
def func(int_var:int,float_var=12.1,string_var=foo()):
    pass
"""
def form_sig(sig):
    a = sig.args
    # defaults line up with the last positional arguments, so pair them from the end
    d = [f'{ast.unparse(a.pop())}={ast.unparse(j)}' for j in sig.defaults[::-1]][::-1]
    v_arg = [] if sig.vararg is None else [f'*{sig.vararg.arg}']
    kwarg = [] if sig.kwarg is None else [f'**{sig.kwarg.arg}']
    return [*map(ast.unparse, a), *d, *v_arg, *kwarg]
f = [{'name':i.name, 'sig':form_sig(i.args)} for i in ast.walk(ast.parse(s))
     if isinstance(i, ast.FunctionDef)]
print(f)
Output:
[{'name': 'func', 'sig': ['int_var: int', 'float_var=12.1', 'string_var=foo()']}]
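As a shorter variation on the same idea, ast.unparse can also be applied to the whole arguments node, which keeps *args, keyword-only parameters and **kwargs without handling them one by one. This is a sketch that assumes Python 3.9 or later (where ast.unparse exists); the sample source and the name signatures are made up for illustration.

```python
import ast

# Hypothetical sample source; any module text could be parsed the same way.
source = '''
def func(int_var:int, float_var=12.1, *rest, flag=False, **extra):
    pass
'''

# Map each function name to its parameter list rendered back to source text.
signatures = {
    node.name: ast.unparse(node.args)
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.FunctionDef)
}
print(signatures)
```

The result is one string per function rather than a list, so it suits display better than further processing.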
import re
func_pattern = re.compile(r'^\s*def\s+(?P<name>[A-Za-z_][A-Za-z0-9_]*)\((?P<args>.*)\):$')
match = func_pattern.match('def my_func(arg1, arg2):')
func_name = match.group('name') # my_func
func_args = [arg.strip() for arg in match.group('args').split(',')] # ['arg1', 'arg2']
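One caveat with splitting the captured text on every comma: a default such as foo(1, 2) contains commas of its own. A minimal sketch of a splitter that only cuts at commas outside brackets is shown here; split_top_level is a hypothetical helper name, and it does not try to handle commas or brackets inside string literals.

```python
def split_top_level(arg_text):
    """Split a parameter list on commas that are not nested in brackets."""
    parts, current, depth = [], [], 0
    for ch in arg_text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma at depth 0 ends the current parameter.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts

args = split_top_level("int_var:int, float_var=12.1, string_var=foo(1, 2)")
print(args)  # ['int_var:int', 'float_var=12.1', 'string_var=foo(1, 2)']
```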
I would like to pass a variable inside the format() parentheses so that I can parameterize the call within a function. An example:
sample_str = 'sample_str_{nvars}'
nvars_test = 'apple'
sample_str.format(nvars = nvars_test) # Successful result: 'sample_str_apple'
But the following does not work -
sample_str = 'sample_str_{nvars}'
nvars_test_2 = 'nvars = apple'
sample_str.format(nvars_test_2) # KeyError: 'nvars'
Would anyone know how to do this? Thanks.
Many thanks for guidance. I did a bit more searching. For anyone who may run into the same problem, please see examples here: https://pyformat.info
sample_str = 'sample_str_{nvars}'
nvars_test_2 = {'nvars':'apple'}
sample_str.format(**nvars_test_2) # Successful result: 'sample_str_apple'
First, I'd recommend checking out the string format examples.
Your first example works as expected. From the documentation, you can name a replacement field inside {} and then pass a keyword argument of the same name to str.format():
'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
# returns 'Coordinates: 37.24N, -115.81W'
Your second example doesn't work because you are not passing a keyword argument named nvars to str.format(); you are passing a single positional string: 'nvars = apple'.
sample_str = 'sample_str_{nvars}'
nvars_test_2 = 'nvars = apple'
sample_str.format(nvars_test_2) # KeyError: 'nvars'
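If the parameter really does arrive as a single string such as 'nvars = apple', one option is to split it into a name and a value first and then unpack the resulting dictionary. This is only a sketch; it assumes the string contains exactly one name, an equals sign, and a value, and the name params is made up for illustration.

```python
sample_str = 'sample_str_{nvars}'
nvars_test_2 = 'nvars = apple'

# Split once on '=' and strip whitespace to recover the name and the value.
key, _, value = nvars_test_2.partition('=')
params = {key.strip(): value.strip()}

# ** unpacks the dict into keyword arguments: format(nvars='apple').
result = sample_str.format(**params)
print(result)  # sample_str_apple
```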
It's a little more common (I think) to not name those curly-braced parameters - easier to read at least.
print('sample_str_{}'.format("apple")) prints sample_str_apple.
If you're using Python 3.6 or later, you also have access to Python's formatted string literals (f-strings).
>>> greeting = 'hello'
>>> name = 'Jane'
>>> f'{greeting} {name}'
'hello Jane'
Note that the literal expects the variables to be already present. Otherwise you get an error.
>>> f'the time is now {time}'
NameError: name 'time' is not defined
In Python, I'd like to test for the existence of a keyword in the output of a Linux command. The keywords to test for would be passed as a list as shown below. I've not spent a lot of time with Python, so my brute-force approach is below. Is there a cleaner way to write this?
def test_result(result, mykeys):
    hit = 0
    for keyword in mykeys:
        if keyword in result:
            hit = 1
            print("found a match for " + keyword)
    if hit == 1:
        return True
result = "grep says awk"
mykeys = ['sed', 'foo', 'awk']
result = test_result(result, mykeys)
The any built-in will do it.
def test_result(result, mykeys):
    return any(key in result for key in mykeys)
You can use a regular expression to accomplish this. A regular expression of the form a|b|c matches any of a, b or c. So, you'd want something of the form:
import re
p = re.compile('|'.join(mykeys))
return bool(p.search(result))
p.search(result) searches the entire string for a match of the regular expression; it returns a match (which is truth-y) if present and returns None (which is false-y) otherwise. Converting the result to bool gives True if it matches and False otherwise.
Putting this together, you'd have:
import re
def test_result(result, mykeys):
    p = re.compile('|'.join(mykeys))
    return bool(p.search(result))
You can also make this more concise by not pre-compiling the regular expression; this should be fine if it's a one-time use:
def test_result(result, mykeys):
    return bool(re.search('|'.join(mykeys), result))
For reference, read about Python's re library.
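One caveat with joining keywords directly into a pattern: a keyword containing regex metacharacters (for example "1.0" or "c++") is interpreted as a pattern rather than as literal text. A possible guard is re.escape, sketched below; contains_any is a hypothetical name, and returning False for an empty keyword list is an assumption about the desired behaviour.

```python
import re

def contains_any(text, keywords):
    # An empty pattern would match every string, so handle that case first.
    if not keywords:
        return False
    # re.escape makes each keyword match literally, so '.' or '+' in a
    # keyword is not treated as a regex operator.
    pattern = '|'.join(re.escape(k) for k in keywords)
    return bool(re.search(pattern, text))

print(contains_any("version 1x0", ["1.0"]))   # False: '.' is literal
print(contains_any("c++ compiler", ["c++"]))  # True
```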
Your function does two things, printing and returning the result. You could break them up like so:
def test_result(result, mykeys):
    return [k for k in mykeys if k in result]
def print_results(results):
    for result in results:
        print("found a match for " + result)
test_result will return a list with all the found keys, or an empty list. The empty list is falsey, so you can use it for whatever tests you want. The print_results is only needed if you actually want to print, otherwise you can use the result in some other function.
If you only want to check for the presence and don't care about which key you found, you can do something like:
def test_result(result, mykeys):
    return any(map(lambda k: k in result, mykeys))
In Python 3 (which you should be using), map is lazy and any stops at the first true value, so this evaluates only as much of the list as necessary.
See A more concise way to write this Python code for a more concise version of this last function.
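The short-circuit behaviour of any over a lazy map can be checked by recording which keywords actually get tested. This is an illustrative sketch; make_checker and seen are made-up names.

```python
def make_checker(text, seen):
    """Return a predicate that records every keyword it is asked about."""
    def check(key):
        seen.append(key)
        return key in text
    return check

seen = []
found = any(map(make_checker("grep says awk", seen), ['awk', 'sed', 'foo']))
print(found, seen)  # True ['awk']
```

Only the first keyword is tested because it already matches; the remaining items are never pulled from the map iterator.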
To search for an element in a list, you can use a for-else statement. In particular, this lets you return the element that was found.
def test_result(result, mykeys):
    for keyword in mykeys:
        if keyword in result:
            break
    else:
        return None
    return keyword
print(test_result("grep says awk", ['sed', 'foo', 'awk'])) # 'awk'
print(test_result("grep says awk", ['bar', 'foo'])) # None
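The same "first match or None" behaviour can also be written with the built-in next and a default value. This is an alternative sketch, not a replacement for the for-else version; first_match is a hypothetical name.

```python
def first_match(result, mykeys):
    # next() returns the first keyword found in result, or None if the
    # generator is exhausted without a hit.
    return next((k for k in mykeys if k in result), None)

print(first_match("grep says awk", ['sed', 'foo', 'awk']))  # awk
print(first_match("grep says awk", ['bar', 'foo']))         # None
```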
I have two similar codes that need to be parsed and I'm not sure of the most pythonic way to accomplish this.
Suppose I have two similar "codes"
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
Both codes end with $$otherthing and contain a number of values separated by -.
At first I thought of using functools.wraps to separate some of the common logic from the logic specific to each type of code, something like this:
from functools import wraps
def parse_secret(f):
    @wraps(f)
    def wrapper(code, *args):
        _code = code.split('$$')[0]
        return f(code, *_code.split('-'))
    return wrapper
@parse_secret
def parse_code_1b(code, a, b, c):
    a = a.split('|')[0]
    return (a,b,c)
@parse_secret
def parse_code_2b(code, a, b):
    b = b.split('|')[1]
    return (a,b)
However, doing it this way makes it confusing which parameters you should actually pass to the parse_code_* functions, i.e.
parse_code_1b(secret_code_1)
parse_code_2b(secret_code_2)
So to keep the formal parameters of the function easier to reason about I changed the logic to something like this:
def _parse_secret(parse_func, code):
    _code = code.split('$$')[0]
    return parse_func(code, *_code.split('-'))
def _parse_code_1(code, a, b, c):
    """
    a, b, and c are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    a = a.split('|')[0]
    return (a,b,c)
def _parse_code_2(code, a, b):
    """
    a and b are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    b = b.split('|')[1]
    return (a,b)
def parse_code_1(code):
    return _parse_secret(_parse_code_1, code)
def parse_code_2(code):
    return _parse_secret(_parse_code_2, code)
Now it's easier to reason about what you pass to the functions:
parse_code_1(secret_code_1)
parse_code_2(secret_code_2)
However this code is significantly more verbose.
Is there a better way to do this? Would an object-oriented approach with classes make more sense here?
Functional approaches are more concise and make more sense.
We can start from expressing concepts in pure functions, the form that is easiest to compose.
Strip $$otherthing and split values:
parse_secret = lambda code: code.split('$$')[0].split('-')
Take one of inner values:
take = lambda value, index: value.split('|')[index]
Replace one of the values with its inner value:
parse_code = lambda values, p, q: \
    [take(v, q) if p == i else v for (i, v) in enumerate(values)]
These 2 types of codes have 3 differences:
Number of values
Position to parse "inner" values
Position of "inner" values to take
And we can compose parse functions by describing these differences. The split values are kept packed in a list so that things are easier to compose.
compose = lambda length, p, q: \
    lambda code: parse_code(parse_secret(code)[:length], p, q)
parse_code_1 = compose(3, 0, 0)
parse_code_2 = compose(2, 1, 1)
And use composed functions:
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
results = [parse_code_1(secret_code_1), parse_code_2(secret_code_2)]
print(results)
I believe something like this could work:
secret_codes = ['asdf|qwer-sdfg-wert$$otherthing', 'qwersdfg-qw|er$$otherthing']
def parse_code(code):
    _code = code.split('$$')
    if '-' in _code[0]:
        return _parse_secrets(_code[1], *_code[0].split('-'))
    return _parse_secrets(_code[0], *_code[1].split('-'))
def _parse_secrets(code, a, b, c=None):
    """
    a, b, and c are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    if c is not None:
        return a.split('|')[0], b, c
    return a, b.split('|')[1]
for secret_code in secret_codes:
    print(parse_code(secret_code))
Output:
('asdf', 'sdfg', 'wert')
('qwersdfg', 'er')
I'm not sure about your secret data structure, but if the position of each element containing | also tells you which of its inner values to take, and the number of elements is consistent, you could do something like this and support an (almost) unlimited number of secrets:
def _parse_secrets(code, *data):
    """
    data is descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    i = 0
    decoded_secrets = []
    for secret in data:
        if '|' in secret:
            decoded_secrets.append(secret.split('|')[i])
        else:
            decoded_secrets.append(secret)
        i += 1
    return tuple(decoded_secrets)
I'm not sure exactly what you mean, but I came up with an idea that might be what you are looking for.
What about using a simple function like this:
def split_secret_code(code):
    return [code] + code[:code.find("$$")].split("-")
And then just use:
parse_code_1(*split_secret_code(secret_code_1))
I'm not sure exactly what constraints you're working with, but it looks like:
There are different types of codes with different rules
The number of dash separated args can vary
Which arg has a pipe can vary
Straightforward Example
This is not too hard to solve, and you don't need fancy wrappers, so I would just drop them because they add reading complexity.
def pre_parse(code):
    dash_code, otherthing = code.split('$$')
    return dash_code.split('-')
def parse_type_1(code):
    dash_args = pre_parse(code)
    dash_args[0], toss = dash_args[0].split('|')
    return dash_args
def parse_type_2(code):
    dash_args = pre_parse(code)
    toss, dash_args[1] = dash_args[1].split('|')
    return dash_args
# Example call
parse_type_1(secret_code_1)
Trying to answer question as stated
You can supply arguments in this way by using python's native decorator pattern combined with *, which rolls/unrolls positional arguments into a tuple, so you don't need to know exactly how many there are.
def dash_args(code):
    dash_code, otherthing = code.split('$$')
    return dash_code.split('-')
def pre_parse(f):
    def wrapper(code):
        # HERE is where the outer function, the wrapper,
        # supplies arguments to the inner function.
        return f(code, *dash_args(code))
    return wrapper
@pre_parse
def parse_type_1(code, *args):
    new_args = list(args)
    new_args[0], toss = args[0].split('|')
    return new_args
@pre_parse
def parse_type_2(code, *args):
    new_args = list(args)
    toss, new_args[1] = args[1].split('|')
    return new_args
# Example call:
parse_type_1(secret_code_1)
More Extendable Example
If for some reason you needed to support many variations on this kind of parsing, you could use a simple OOP setup, like
class BaseParser(object):
    def get_dash_args(self, code):
        dash_code, otherthing = code.split('$$')
        return dash_code.split('-')
class PipeParser(BaseParser):
    def __init__(self, arg_index, split_index):
        self.arg_index = arg_index
        self.split_index = split_index
    def parse(self, code):
        args = self.get_dash_args(code)
        pipe_arg = args[self.arg_index]
        args[self.arg_index] = pipe_arg.split('|')[self.split_index]
        return args
# Example call
pipe_parser_1 = PipeParser(0, 0)
pipe_parser_1.parse(secret_code_1)
pipe_parser_2 = PipeParser(1, 1)
pipe_parser_2.parse(secret_code_2)
My suggestion attempts the following:
to be non-verbose enough
to separate common and specific logic in a clear way
to be sufficiently extensible
Basically, it separates common and specific logic into different functions (you could do the same using OOP). The thing is that it uses a mapper variable that contains the logic to select a specific parser, according to each code's content. Here it goes:
def parse_common(code):
    """
    Provides common parsing logic.
    """
    encoded_components = code.split('$$')[0].split('-')
    return encoded_components
def parse_code_1(code, components):
    """
    Specific parsing for type-1 codes.
    """
    components[0] = components[0].split('|')[0] # decoding some type-1 component
    return tuple(components)
def parse_code_2(code, components):
    """
    Specific parsing for type-2 codes.
    """
    components[1] = components[1].split('|')[1] # decoding some type-2 component
    return tuple(components)
def parse_code_3(code, components):
    """
    Specific parsing for type-3 codes.
    """
    components[2] = components[2].split('||')[0] # decoding some type-3 component
    return tuple(components)
# ... and so on, if more codes need to be added ...
# Maps specific parser, according to the number of components
CODE_PARSER_SELECTOR = [
    (3, parse_code_1),
    (2, parse_code_2),
    (4, parse_code_3)
]
def parse_code(code):
    # executes common parsing
    components = parse_common(code)
    # selects specific parser
    parser_info = [s for s in CODE_PARSER_SELECTOR if len(components) == s[0]]
    if parser_info:
        parse_func = parser_info[0][1]
        return parse_func(code, components)
    else:
        raise RuntimeError('No parser found for code: %s' % code)
secret_codes = [
    'asdf|qwer-sdfg-wert$$otherthing', # type 1
    'qwersdfg-qw|er$$otherthing', # type 2
    'qwersdfg-hjkl-yui||poiuy-rtyu$$otherthing' # type 3
]
print([parse_code(c) for c in secret_codes])
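Because the selector above is keyed on the number of components, a dictionary lookup is one possible alternative to scanning a list of tuples. The sketch below covers only the first two code types, and the names decode_type_1, decode_type_2 and PARSERS_BY_LENGTH are made up for illustration.

```python
def parse_common(code):
    """Drop the '$$' trailer and split the remaining values on '-'."""
    return code.split('$$')[0].split('-')

def decode_type_1(parts):
    parts[0] = parts[0].split('|')[0]  # keep the text before '|'
    return tuple(parts)

def decode_type_2(parts):
    parts[1] = parts[1].split('|')[1]  # keep the text after '|'
    return tuple(parts)

# Hypothetical mapping: number of components -> specific decoder.
PARSERS_BY_LENGTH = {3: decode_type_1, 2: decode_type_2}

def parse_code(code):
    parts = parse_common(code)
    try:
        decode = PARSERS_BY_LENGTH[len(parts)]
    except KeyError:
        raise ValueError('No parser found for code: %s' % code) from None
    return decode(parts)

print(parse_code('asdf|qwer-sdfg-wert$$otherthing'))  # ('asdf', 'sdfg', 'wert')
print(parse_code('qwersdfg-qw|er$$otherthing'))       # ('qwersdfg', 'er')
```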
Are you married to string parsing? If you are passing variables with values and do not need the variable names, you can "pack" them into an integer.
If you are working with cryptography, you can build a long hexadecimal number out of the characters and pass it as an int with "stop" bytes (0000, for example, since the character "0" is actually 48; try chr(48)). If you are married to a string, I would suggest a low character byte as the identifier (for example 1; try chr(1)), so you can scan the integer and shift it 8 bits at a time to get bytes with an 8-bit mask; this would look like (secret_code>>8)&0xff.
Hashing works in a similar manner: a variable with some name and some value can be parsed as an integer, joined with a stop marker, and then retrieved when needed.
Let me give you an example for hashing
# let's say
a = 1
# a sort of hashing would be
hash = ord('a')+(0b00<<8)+(1<<16)
# where the hashed a would be 65633 as an integer value on a 64-bit computer
# and then you just need to find a 0b00, i.e. the separator
If you want to use only the values (names don't matter), then you need to hash only the variable's value, so the parsed value is a lot smaller (no name part and no need for the 0b00 separator), and you can use separators cleverly to divide the data onefold (0b00), twofold (0b00, 0b00<<8), etc.
a = 1
hash = a<<8 #if you want to shift it 1 byte
But if you want to hide it and need a cryptography example, you can apply the above methods and then scramble, shift (a->b), or just convert the result to another type later. You just need to keep track of the order of the operations you are doing, since a-STOP-b-PASS-c is not equal to a-PASS-b-STOP-c.
You can read about the bitwise operators under "binary operators".
But keep in mind that 65 is a number and 65 is a character as well; it only matters where those bytes are sent. If they are sent to the graphics card they are pixels, if they are sent to the audio card they are sounds, and if they are sent to mathematical processing they are numbers. As programmers, that is our playground.
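To make the shift-and-mask idea concrete, here is a small sketch that packs a few byte-sized values into one integer and reads them back with >> and & 0xFF. The helper names pack_bytes and unpack_bytes are hypothetical, and placing the lowest byte first is an assumption chosen to match the ord('a')+(0b00<<8)+(1<<16) example.

```python
def pack_bytes(values):
    """Pack small integers (0-255) into one int, lowest byte first."""
    packed = 0
    for position, value in enumerate(values):
        packed |= (value & 0xFF) << (8 * position)
    return packed

def unpack_bytes(packed, count):
    """Recover `count` bytes by shifting right and masking with 0xFF."""
    return [(packed >> (8 * position)) & 0xFF for position in range(count)]

packed = pack_bytes([ord('a'), 0, 1])
print(packed)                   # 65633
print(unpack_bytes(packed, 3))  # [97, 0, 1]
```

The count has to be passed separately here because a zero byte is a valid value as well as a possible separator.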
But if this is not answering your problem, you can always use map.
def mapProcces(proccesList,listToMap):
    currentProcces = proccesList.pop(0)
    listToMap = map( currentProcces, listToMap )
    if proccesList != []:
        return mapProcces( proccesList, listToMap )
    else:
        return list( listToMap )
Then you could map it:
mapProcces([str.lower,str.upper,str.title],"stackowerflow")
Or you can simply replace every separator with a space and then split on spaces.
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
separ = "|,-,$".split(",")
secret_code_1 = [x if x not in separ else " " for x in secret_code_1] # replace separators with spaces
secret_code_1 = "".join(secret_code_1) # convert the list back to a string
secret_code_1 = secret_code_1.split(" ") # split it into a list
secret_code_1 = filter(None,secret_code_1) # filter out empty strings ''
first,second,third,fourth,other = secret_code_1
And there you have it: your secret_code_1 is split and assigned to a fixed number of variables. Of course " " is only used as a placeholder; you can replace every separator with "someseparator" if you want and then split on "someseparator". You can also use the str.replace function to make it clearer.
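The replace-then-split steps can also be collapsed into a single re.split call that treats any run of separator characters as one delimiter. This is a sketch of that variation, not part of the approach above; the name parts is made up.

```python
import re

secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'

# One or more of '|', '$' or '-' in a row counts as a single separator.
parts = re.split(r'[|$-]+', secret_code_1)
first, second, third, fourth, other = parts
print(parts)  # ['asdf', 'qwer', 'sdfg', 'wert', 'otherthing']
```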
I hope this helps
I think you need to provide more information about exactly what you're trying to achieve and what the constraints are. For instance, how many times can $$ occur? Will there always be a | divider? That kind of thing.
To answer your question broadly, an elegant, Pythonic way to do this is to use Python's unpacking feature combined with split. For example:
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
first_part, last_part = secret_code_1.split('$$')
By using this technique, in addition to simple if blocks, you should be able to write an elegant parser.
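A minimal sketch of such a parser is given below. It uses str.partition instead of split for the '$$' step, on the assumption that only the first '$$' should act as the separator; split_code is a hypothetical name.

```python
def split_code(code):
    # partition splits on the first '$$' only and always returns three
    # parts, so unpacking cannot fail when '$$' occurs more than once.
    values_part, separator, trailer = code.partition('$$')
    if not separator:
        raise ValueError("code has no '$$' separator: %r" % code)
    return values_part.split('-'), trailer

values, trailer = split_code('asdf|qwer-sdfg-wert$$otherthing')
print(values)   # ['asdf|qwer', 'sdfg', 'wert']
print(trailer)  # otherthing
```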
If I understand it correctly, you want to be able to define your functions as if the parsed arguments are passed, but want to pass the unparsed code to the functions instead.
You can do that very similarly to the first solution you presented.
from functools import wraps
def parse_secret(f):
    @wraps(f)
    def wrapper(code):
        args = code.split('$$')[0].split('-')
        return f(*args)
    return wrapper
@parse_secret
def parse_code_1(a, b, c):
    a = a.split('|')[0]
    return (a,b,c)
@parse_secret
def parse_code_2(a, b):
    b = b.split('|')[1]
    return (a,b)
For the secret codes mentioned in the examples,
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
print (parse_code_1(secret_code_1))
>> ('asdf', 'sdfg', 'wert')
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
print (parse_code_2(secret_code_2))
>> ('qwersdfg', 'er')
I haven't fully understood your question or your code, but maybe a simple way to do it is with a regular expression?
import re
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
def parse_code(code):
    regex = re.search(r'([\w-]+)\|([\w-]+)\$\$([\w]+)', code) # regular expression
    return regex.group(3), regex.group(1).split("-"), regex.group(2).split("-")
otherthing, first_group, second_group = parse_code(secret_code_2)
print(otherthing) # otherthing, string
print(first_group) # first group, list
print(second_group) # second group, list
The output:
otherthing
['qwersdfg', 'qw']
['er']
Is there a way to know, during run-time, a variable's name (from the code)?
Or are variable names forgotten during compilation (to byte-code or not)?
e.g.:
>>> vari = 15
>>> print vari.~~name~~()
'vari'
Note: I'm talking about plain data-type variables (int, str, list etc.)
Variable names don't get forgotten, you can access variables (and look which variables you have) by introspection, e.g.
>>> i = 1
>>> locals()["i"]
1
However, because there are no pointers in Python, there's no way to reference a variable without actually writing its name. So if you wanted to print a variable name and its value, you could go via locals() or a similar function. ([i] becomes [1] and there's no way to retrieve the information that the 1 actually came from i.)
Variable names persist in the compiled code (that's how e.g. the dir built-in can work), but the mapping that's there goes from name to value, not vice versa. So if there are several variables that all hold, for example, 23, there's no way to tell them apart based only on the value 23.
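The one-way nature of that mapping can be illustrated by searching a namespace for every name bound to a given object: the search works, but it may return several names or none. This is a sketch; names_of is a hypothetical helper, and the explicit dictionary stands in for something like globals() or locals().

```python
def names_of(obj, namespace):
    """Return every name in `namespace` bound to exactly this object."""
    return sorted(name for name, value in namespace.items() if value is obj)

first = [1, 2]
second = first      # a second name for the same list
third = [1, 2]      # an equal but distinct list

namespace = {'first': first, 'second': second, 'third': third}
print(names_of(first, namespace))  # ['first', 'second']
print(names_of(third, namespace))  # ['third']
```

The comparison uses identity (is) rather than equality, so equal but separate objects are not confused with each other.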
Here is a function I use to print the value of variables; it works for locals as well as globals:
import sys
def print_var(var_name):
    calling_frame = sys._getframe().f_back
    var_val = calling_frame.f_locals.get(var_name, calling_frame.f_globals.get(var_name, None))
    print (var_name+':', str(var_val))
So the following code:
global_var = 123
def some_func():
    local_var = 456
    print_var("global_var")
    print_var("local_var")
    print_var("some_func")
some_func()
produces:
global_var: 123
local_var: 456
some_func: <function some_func at 0x10065b488>
Here is a basic (maybe weird) function that shows the name of its argument.
The idea is to analyze the calling code and search for the call to the function (added to the init method, it could help find the instance name, although that needs a more complex code analysis).
def display(var):
    import inspect, re
    callingframe = inspect.currentframe().f_back
    cntext = "".join(inspect.getframeinfo(callingframe, 5)[3]) # gets 5 lines of context
    # \s* (not \s+) so that display(x) with no extra whitespace also matches
    m = re.search(r"display\s*\(\s*(\w+)\s*\)", cntext, re.MULTILINE)
    print(m.group(1), type(var), var)
Please note:
getting multiple lines from the calling code helps in case the call was split, as in the example below:
display(
    my_var
)
but it will produce an unexpected result on this:
display(first_var)
display(second_var)
If you don't have control over the format of your project, you can still improve the code to detect and manage different situations.
Overall, I guess a static code analysis could produce a more reliable result, but I'm too lazy to check it now.
This will work for simple data types (str, int, float, list etc.)
def my_print(var_str):
    print(var_str+':', globals()[var_str])
You can do it, it's just not pretty.
import inspect, sys
def addVarToDict(d, variable):
    lineNumber = inspect.currentframe().f_back.f_lineno
    with open(sys.argv[0]) as f:
        lines = f.read().split("\n")
    line = lines[lineNumber-1]
    varName = line.split("addVarToDict")[1].split("(")[1].split(",")[1].split(")")[0].strip()
    d[varName] = variable
d = {}
a=1
print(d) # {}
addVarToDict(d,a)
print(d) # {'a': 1}
I tried the following link from the post above with no success:
Googling returned this one.
http://pythonic.pocoo.org/2009/5/30/finding-objects-names
Just yesterday I saw a blog post with working code that does just this. Here's the link:
http://pyside.blogspot.com/2009/05/finding-objects-names.html
Nice easy solution using f-string formatting (the = specifier used here requires Python 3.8 or later):
vari = 15
vari_name = f"{vari=}".split("=")[0]
print(vari_name)
Produces:
vari