Python: splitting a function and arguments - python

Here are some simple function calls in python:
foo(arg1, arg2, arg3)
func1()
Assume each line is a valid function call.
Suppose I read these lines while parsing a file.
What is the cleanest way to separate the function name and the args into a list with two elements, the first a string for the function name, and the second a string for the arguments?
Desired results:
["foo", "arg1, arg2, arg3"]
["func1", ""]
I'm currently using string searches to find the first "(" from the left and the first ")" from the right, then slicing the string at those indices, but I don't like this approach.
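For reference, the string-search approach described above might look like the sketch below. The name split_call_simple is made up for illustration, and the code assumes the line holds exactly one call whose outermost parentheses are the first "(" and the last ")".

```python
def split_call_simple(line):
    """Split 'name(args)' at the first '(' and the last ')'."""
    # Everything before the first '(' is taken as the function name.
    name, _, rest = line.strip().partition("(")
    # Everything before the last ')' is taken as the argument string.
    args, _, _ = rest.rpartition(")")
    return [name.strip(), args.strip()]


print(split_call_simple("foo(arg1, arg2, arg3)"))
print(split_call_simple("func1()"))
```

It does no validation at all, so anything after the closing parenthesis is silently dropped.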

I'm currently doing something similar using regular expressions. Adapting my code to your case, the following works with the examples you provide.
import re
def explode(s):
    # group 1: the function name; group 2: everything between the outer parentheses
    pattern = r'(\w[\w\d_]*)\((.*)\)$'
    match = re.match(pattern, s)
    if match:
        return list(match.groups())
    else:
        return []

If you're parsing a Python file in Python, consider using Python's parser: ast (specifically the ast.parse() call).
That said, your current approach isn't terrible (though it will break on function calls that span multiple lines). There are few completely correct approaches short of the aforementioned full parser. For instance, you could count matching parens, so that a((b,c)) would return the correct value even if there was a line break in the middle, but then that code would probably do the wrong thing when faced with a((b, "c)")), and so on.
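A minimal sketch of the ast route, assuming Python 3.8 or later (for ast.get_source_segment) and assuming each line is a single call expression. The helper name split_call is hypothetical, and the arguments are re-joined with ", ", so the original spacing is not preserved.

```python
import ast


def split_call(line):
    """Return [function_name, argument_source] for one call expression."""
    source = line.strip()
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call: %r" % source)
    name = ast.get_source_segment(source, node.func)
    # Positional arguments (including *args) keep their original source text.
    parts = [ast.get_source_segment(source, arg) for arg in node.args]
    # Keyword arguments are rebuilt as name=value (or **value).
    for kw in node.keywords:
        value = ast.get_source_segment(source, kw.value)
        parts.append("**" + value if kw.arg is None else kw.arg + "=" + value)
    return [name, ", ".join(parts)]


print(split_call("foo(arg1, arg2, arg3)"))
print(split_call('f("a)", b)'))
```

Because the parser understands string literals, a ")" inside a quoted argument does not confuse it, which is exactly the case where paren counting goes wrong.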

Related

Python formatted string using list range

I have looked through many questions about this on Stack Overflow and Google, but none of them solved my problem.
Let's say, I have an irregular length string (like a phone number or specific and territorial documents) for example:
docA="A123B456"
docB="A123B456CD"
docC="6CD"
I'm writing a function to print documents. Since they don't follow a fixed pattern, my approach was to default to the most common pattern and leave the corner cases to the programmer.
e.g:
def printDoc(text, pattern="{}{}{}{}#{}{}{}-{}"):
    print(pattern.format(*text))
But it would be much cleaner and more explicit if there were a way to simplify the pattern, like:
def printDoc(text, pattern="{0:3}#{4:-1}-{-1}"):
    print(pattern.format(*text))
Then I could use it like:
printDoc(docA)
printDoc(docB)
printDoc(docC, "{0:1}-{2}")
But that is not valid syntax. Is there a way of doing this properly?
If my approach is wrong, is there a better way of doing this?
You could use a regular expression to parse the indexes/slices from the format string and use those to index the given text. You'd also have to remove the indices from the format string before using it with str.format. The only tricky part is actually getting the format parameters out of the text, but if you consider eval acceptable you could do the following:
import re
def printDoc(text, pattern="{0:3}#{4:-1}-{-1}"):
    params = []
    # Find occurrences of '{...}' in the format string and extract each format parameter
    for m in re.finditer(r'\{([-:\d]+)\}', pattern):
        params.append(eval('text[{}]'.format(m.group(1))))
    # Remove indices from the format string
    pattern = re.sub(r'\{([-:\d]+)\}', '{}', pattern)
    print(pattern.format(*params))

printDoc('A123B456')
Output:
A12#B45-6
Note that using eval is generally considered bad and unsafe practice. Although the potential risks are limited here because of the restricted character set given to eval, you might want to consider other alternatives unless you're the one who controls the format strings.
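One such alternative, sketched under a few assumptions: fields are limited to "{i}", "{i:j}", "{i:}" and "{:j}" with optional minus signs, slice steps are not supported, and the function returns the string instead of printing it. The names format_doc and _FIELD are made up for this example.

```python
import re

# start index, optional ':' and optional stop index, e.g. {0:3}, {4:-1}, {-1}
_FIELD = re.compile(r"\{(-?\d+)?(?:(:)(-?\d+)?)?\}")


def format_doc(text, pattern="{0:3}#{4:-1}-{-1}"):
    """Fill each {index} or {start:stop} field with the matching part of text."""
    def fill(match):
        start, colon, stop = match.groups()
        if colon:
            # A slice: missing bounds become None, as in text[:stop].
            return text[slice(int(start) if start else None,
                              int(stop) if stop else None)]
        if start is None:
            return match.group(0)  # a bare '{}' is left untouched
        return text[int(start)]    # a single index
    return _FIELD.sub(fill, pattern)


print(format_doc("A123B456"))
print(format_doc("6CD", "{0:1}-{2}"))
```

Since the indices are converted with int() and never evaluated as code, a hostile format string can at worst raise an IndexError or leave a field unfilled.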

Python pandas: use of DataFrame.replace function with a function as a value

Using Python pandas, I have been attempting to use a function, as one of a few replacement values for a pandas.DataFrame (i.e. one of the replacements should itself be the result of a function call). My understanding is that pandas.DataFrame.replace delegates internally to re.sub and that anything that works with it should also work with pandas.DataFrame.replace, provided that the regex parameter is set to True.
Accordingly, I followed the guidance provided elsewhere on stackoverflow, but pertaining to re.sub, and attempted to apply it to pandas.DataFrame.replace (using replace with regex=True, inplace=True and with to_replace set as either a nested dictionary, if specifying a specific column, or otherwise as two lists, per its documentation). My code works fine without using a function call, but fails if I try to provide a function as one of the replacement values, despite doing so in the same manner as re.sub (which was tested, and worked correctly). I realize that the function is expected to accept a match object as its only required parameter and return a string.
Instead of the resultant DataFrame having the result of the function call, it contains the function itself (i.e. as a first-class, unparameterized, object).
Why is this occurring and how can I get this to work correctly (return and store the function's result)? If this is not possible, I would appreciate if a viable and "Pandasonic" alternative could be suggested.
I provide an example of this below:
def fn(match):
    id = match.group(1)
    result = None
    with open(file_name, 'r') as file:
        for line in file:
            if 'string' in line:
                result = line.split()[-1]
    return (result or id)

data.replace(to_replace={'col1': {'string': fn}},
             regex=True, inplace=True)
The above does not work, in that it replaces the right search string, but replaces it with:
<function fn at 0x3ad4398>
For the above (contrived) example, the expected output would be that all values of "string" in col1 are substituted for the string returned from fn.
However, import re; print(re.sub('string', fn, 'test string')), works as expected (and as previously depicted).
My current solution (which seems sub-optimal and ad hoc to me) is as follows (ellipses indicate irrelevant additional code, which has been omitted; specific data used are contrived):
import re

def _fn(match):
    ...
    return ...

def _multiple_replace(text, repl_dictionary):
    """Adapted from: http://stackoverflow.com/a/15175239
    Returns the result for the first regex that matches
    the provided text."""
    for pattern in repl_dictionary.keys():
        regex = re.compile(pattern)
        res, num_subs = regex.subn(repl_dictionary[pattern], text)
        if num_subs > 0:
            break
    return res

repl_dict = {r'ABC.*(\w\w\w)': _fn, 'XYZ': 'replacement_string'}
data['col1'] = data['col1'].apply(_multiple_replace,
                                  repl_dictionary=repl_dict)

Regular expression dictionary in python

Is it possible to implement a dictionary with keys as regular expressions and actions (with parameters) as values?
For example:
key = "actionname 1 2", value = "method(1, 2)"
key = "differentaction par1 par2", value = "appropriate_method(par1, par2)"
The user types in the key, and I need to execute the matching method with the parameters provided as part of the user input.
It would be great if the lookup could be done in O(1) time; even if that's not possible, I am still looking for solutions to this problem.
I will have a few hundred regular expressions (say 300), each with a matching parameterized action to execute.
I can write a loop to achieve this, but is there any elegant way to do this without using a for loop?
Related question: Hashtable/dictionary/map lookup with regular expressions
Yes, it's perfectly possible:
import re
dict = {}
# method and appropriate_method are the callables to dispatch to, defined elsewhere
dict[re.compile(r'actionname (\d+) (\d+)')] = method
dict[re.compile(r'differentaction (\w+) (\w+)')] = appropriate_method

def execute_method_for(str):
    # Match each regex on the string
    matches = (
        (regex.match(str), f) for regex, f in dict.iteritems()  # Python 2; use dict.items() on Python 3
    )
    # Filter out empty matches, and extract groups
    matches = (
        (match.groups(), f) for match, f in matches if match is not None
    )
    # Apply all the functions
    for args, f in matches:
        f(*args)
Of course, the values of your dictionary can be python functions.
Your matching function can try to match your string to each key and execute appropriate function if there is a match. This will be linear in time in the best case, but I don't think you can get anything better if you want to use regular expressions.
But looking at your example data, I think you should reconsider whether you need regular expressions at all. Perhaps you can just parse your input string into, e.g., <procedure-name> <parameter>+ and then look up the appropriate procedure by its name (a simple string); that can be O(1).
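A minimal sketch of that name-based lookup, assuming the action name is always the first whitespace-separated word and the parameters are passed on as strings. The two handler functions and the ACTIONS table are placeholders invented for this example.

```python
def method(a, b):
    # Placeholder handler, for illustration only.
    return ("method", a, b)


def appropriate_method(a, b):
    # Placeholder handler, for illustration only.
    return ("appropriate_method", a, b)


# A plain dict keyed by action name: one hash lookup, no regex scan.
ACTIONS = {
    "actionname": method,
    "differentaction": appropriate_method,
}


def execute(line):
    """Split the input into name and parameters, then dispatch by name."""
    parts = line.split()
    if not parts:
        raise ValueError("empty input")
    name, params = parts[0], parts[1:]
    try:
        func = ACTIONS[name]
    except KeyError:
        raise ValueError("unknown action: %r" % name)
    return func(*params)


print(execute("actionname 1 2"))
```

If some parameters need validation (digits only, say), that check can live inside the handler or in a small per-action regex applied after the O(1) lookup.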
Unfortunately this is not possible. You will need to iterate over the regular expressions in order to find out if they match. The lookup in the dictionary will be O(1) though (but that doesn't solve your problem).
IMHO, you are asking the WRONG QUESTION.
You ask if there's an elegant way to do this. Answer: The most elegant way is the most OBVIOUS way. Code will be read 10x to 20x as often as it's modified. Therefore, if you write something 'elegant' that's hard to read and quickly understand, you've just sabotaged the guy after you who has to modify it somehow.
BETTER CODE:
Another answer here reads like this:
matches = ( (regex.match(str), f) for regex, f in dict.iteritems() )
This does essentially the same job as an explicit loop (it is not the same bytecode, since the original builds a lazy generator, but the result is the same once the matches are consumed):
# IMHO the 'regex' var should probably be named 'pattern', since its type is <sre.SRE_Pattern>
for pattern, func in dictname.items():
    match = pattern.match(str)
    if match:
        func(*match.groups())
But the explicit loop is hugely easier to read and understand at a glance.
I apologize (a little) if you're one of those people who is offended by code that is even slightly more wordy than you think it could be. My criterion, and Guido's as mentioned in PEP-8, is that the clearest code is the best code.

Python: Effective replacing of substring

I have code like this:
def escape_query(query):
special_chars = ['\\','+','-','&&','||','!','(',')','{','}','[',']',
'^','"','~','*','?',':']
for character in special_chars:
query = query.replace(character, '\\%s' % character)
return query
This function should escape every occurrence of each substring in special_chars (notice && and ||) with a backslash.
I think my approach is pretty ugly, and I can't stop wondering whether there is a better way to do this. Answers should be limited to the standard library.
Using reduce:
from functools import reduce  # reduce is a builtin on Python 2; the import is needed on Python 3

def escape_query(query):
    special_chars = ['\\','+','-','&&','||','!','(',')','{','}','[',']',
                     '^','"','~','*','?',':']
    return reduce(lambda q, c: q.replace(c, '\\%s' % c), special_chars, query)
The following code works on exactly the same principle as steveha's.
But I think it fulfills your requirement of clarity and maintainability since the special chars are still listed in the same list as yours.
import re

special_chars = ['\\','+','-','&&','||','!','(',')','{','}','[',']',
                 '^','"','~','*','?',':']
escaped_special_chars = map(re.escape, special_chars)
special_chars_pattern = '|'.join(escaped_special_chars).join('()')

def escape_query(query, reg = re.compile(special_chars_pattern) ):
    return reg.sub(r'\\\1',query)
With this code:
When the function definition is executed, the default value (the compiled regex re.compile(special_chars_pattern)) is created and bound to the parameter reg.
This happens only once, at the moment the def statement is executed, not on every call.
That means later calls to the function do not repeat this creation and assignment: the regex object already exists and is permanently stored in the tuple func_defaults (__defaults__ on Python 3), which is an attribute of the function.
That's useful if the function is called many times, because Python does not have to look the regex up as a global if it was defined outside the function, or receive it as an explicit argument on every call.
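A quick way to see this for yourself, assuming Python 3 (where the tuple is called __defaults__). The pattern is shortened to two characters here purely to keep the example small.

```python
import re


def escape_query(query, reg=re.compile(r'([+\-])')):  # shortened pattern, illustration only
    return reg.sub(r'\\\1', query)


# The compiled pattern is stored once, in the function's defaults tuple.
stored = escape_query.__defaults__[0]

escape_query('a+b')
escape_query('c-d')

# Still the very same object after several calls: nothing was recompiled.
print(stored is escape_query.__defaults__[0])
print(stored.pattern)
```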
If I understand your requirements correctly, some of the special "chars" are two-character strings (specifically: "&&" and "||"). The best way to do such an odd collection is with a regular expression. You can use a character class to match anything that is one character long, then use vertical bars to separate some alternative patterns, and these can be multi-character. The trickiest part is the backslash-escaping of chars; for example, to match "||" you need to put r'\|\|' because the vertical bar is special in a regular expression. In a character class, backslash is special and so are '-' and ']'. The code:
import re
_s_pat = r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)'
_pat = re.compile(_s_pat)
def escape_query(query):
    return re.sub(_pat, r'\\\1', query)
I suspect the above is the fastest solution to your problem possible in Python, because it pushes the work down to the regular expression machinery, which is written in C.
If you don't like the regular expression, you can make it easier to look at by using the verbose format, and compile using the re.VERBOSE flag. Then you can sprawl the regular expression across multiple lines, and put comments after any parts you find confusing.
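As a sketch of what the verbose form could look like, here is the same pattern spread over several lines with comments. Whitespace is ignored outside the character class when re.VERBOSE is set, so the layout is free.

```python
import re

_pat = re.compile(r'''
    (                           # capture the match so it can be re-inserted
        [\\+\-!(){}[\]^"~*?:]   # any one of the single special characters
      | &&                      # or the two-character string &&
      | \|\|                    # or the two-character string ||
    )
''', re.VERBOSE)


def escape_query(query):
    # Put a backslash in front of whatever the pattern captured.
    return _pat.sub(r'\\\1', query)


print(escape_query('a+b && c'))
```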
Or, you can build your list of special characters, just like you already did, and run it through this function which will automatically compile a regular expression pattern that matches any alternative in the list. I made sure it will match nothing if the list is empty.
import re
def make_pattern(lst_alternatives):
    if lst_alternatives:
        temp = '|'.join(re.escape(s) for s in lst_alternatives)
        s_pat = '(' + temp + ')'
    else:
        # an empty negative lookahead never matches; '$^' would still match an empty string
        s_pat = '((?!))'
    return re.compile(s_pat)
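Putting it together might look like the sketch below. It repeats the helper so the snippet runs on its own, and it uses '(?!)' as the never-matching fallback for an empty list, which is my choice for this example. The name SPECIAL_CHARS is also made up.

```python
import re


def make_pattern(alternatives):
    """Build one capturing group that matches any string in the list."""
    if alternatives:
        body = '|'.join(re.escape(s) for s in alternatives)
    else:
        body = '(?!)'  # empty negative lookahead: can never match
    return re.compile('(' + body + ')')


SPECIAL_CHARS = ['\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}', '[', ']',
                 '^', '"', '~', '*', '?', ':']
_pat = make_pattern(SPECIAL_CHARS)


def escape_query(query, pat=_pat):
    return pat.sub(r'\\\1', query)


print(escape_query('a&&b'))
```

If the list ever contains one entry that is a prefix of another, put the longer one first, because alternation takes the first alternative that matches.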
By the way, I recommend you put the string and the pre-compiled pattern outside the function, as I showed above. In your code, Python will run code on each function invocation to build the list and bind it to the name special_chars.
If you want to not put anything but the function into the namespace, here's a way to do it without any run-time overhead:
import re
def escape_query(query):
    return re.sub(escape_query.pat, r'\\\1', query)

escape_query.pat = re.compile(r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)')
The above uses the function's name to look up the attribute, which won't work if you rebind the function's name later. There is a discussion of this and a good solution here: how can python function access its own attributes?
(Note: The above paragraph replaces some stuff including a question that was discussed in the discussion comments below.)
Actually, upon further thought, I think this is cleaner and more Pythonic:
import re
_pat = re.compile(r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)')
def escape_query(query, pat=_pat):
    return re.sub(pat, r'\\\1', query)

del(_pat) # not required but you can do it
At the time escape_query() is defined, the object bound to the name _pat is also bound to a name inside the function's namespace (that name is pat). Then you can call del() to unbind the name _pat if you like. This nicely encapsulates the pattern inside the function, does not depend at all on the function's name, and allows you to pass in an alternate pattern if you wish.
P.S. If your special characters were always a single character long, I would use the code below:
_special = set(['[', ']', '\\', '+']) # add other characters as desired, but only single chars
def escape_query(query):
    return ''.join('\\' + ch if (ch in _special) else ch for ch in query)
Not sure if this is any better, but it works and is probably faster.
def escape_query(query):
    special_chars = ['\\','+','-','&&','||','!','(',')','{','}','[',']', '^','"','~','*','?',':']
    query = "".join(map(lambda x: "\\%s" % x if x in special_chars else x, query))
    for sc in filter(lambda x: len(x) > 1, special_chars):
        query = query.replace(sc, "\\%s" % sc)
    return query

regular expression help with converting exp1^exp2 to pow(exp1, exp2)

I am converting some MATLAB code to C. Currently I have some lines that raise to a power using ^, which is rather easy to handle with something along the lines of \(?(\w*)\)?\^\(?(\w*)\)?
This works fine for converting (glambda)^(galpha), using the sub routine in Python: pattern.sub(r'pow(\g<1>,\g<2>)', '(glambda)^(galpha)')
My problem comes with nested parentheses.
So I have a string like:
glambdastar^(1-(1-gphi)*galpha)*(glambdaq)^(-(1-gphi)*galpha);
And I can not figure out how to convert that line to:
pow(glambdastar,(1-(1-gphi)*galpha))*pow(glambdaq,(-(1-gphi)*galpha));
Unfortunately, regular expressions aren't the right tool for handling nested structures. There are some regular expressions engines (such as .NET) which have some support for recursion, but most — including the Python engine — do not, and can only handle as many levels of nesting as you build into the expression (which gets ugly fast).
What you really need for this is a simple parser. For example, iterate over the string counting parentheses and storing their locations in a list. When you find a ^ character, put the most recently closed parenthesis group into a "left" variable, then watch the group formed by the next opening parenthesis. When it closes, use it as the "right" value and print the pow(left, right) expression.
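Here is a rough sketch of that kind of scanner in Python. It assumes there is no whitespace around ^, that operands are identifiers, numbers, parenthesized groups or function calls, and that ^ groups left to right as in MATLAB. The function names are made up, and redundant parentheses around an operand are kept as they are, which is harmless in C.

```python
def _left_start(s, caret):
    """Index where the operand ending just before s[caret] begins."""
    i = caret - 1
    if i >= 0 and s[i] == ')':
        depth = 0
        while i >= 0:  # walk back to the matching '('
            if s[i] == ')':
                depth += 1
            elif s[i] == '(':
                depth -= 1
                if depth == 0:
                    break
            i -= 1
        if i < 0:
            raise ValueError("unbalanced parentheses")
        i -= 1  # step before '(' to pick up a function name, if any
    while i >= 0 and (s[i].isalnum() or s[i] in '_.'):
        i -= 1
    return i + 1


def _right_end(s, caret):
    """Index just past the operand that starts right after s[caret]."""
    i, n = caret + 1, len(s)
    if i < n and s[i] in '+-':  # optional sign
        i += 1
    while i < n and (s[i].isalnum() or s[i] in '_.'):
        i += 1
    if i < n and s[i] == '(':
        depth = 0
        while i < n:  # walk forward to the matching ')'
            if s[i] == '(':
                depth += 1
            elif s[i] == ')':
                depth -= 1
                if depth == 0:
                    break
            i += 1
        if i >= n:
            raise ValueError("unbalanced parentheses")
        i += 1
    return i


def caret_to_pow(expr):
    """Rewrite every a^b in expr as pow(a,b), one caret at a time."""
    caret = expr.find('^')
    while caret != -1:
        start, end = _left_start(expr, caret), _right_end(expr, caret)
        left, right = expr[start:caret], expr[caret + 1:end]
        if not left or not right:
            raise ValueError("missing operand around '^'")
        expr = expr[:start] + 'pow(' + left + ',' + right + ')' + expr[end:]
        caret = expr.find('^')
    return expr


print(caret_to_pow('glambdastar^(1-(1-gphi)*galpha)*(glambdaq)^(-(1-gphi)*galpha);'))
```

Each pass removes exactly one ^, so the loop always terminates, and a ^ nested inside an operand is simply picked up on a later pass.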
I think you can use recursion here.
Once you figure out the Left and Right parts, pass each of those to your function again.
The base case would be that no ^ operator is found, so you will not need to add the pow() function to your result string.
The function will return a string with all the correct pow()'s in place.
I'll come up with an example of this if you want.
Nested parentheses cannot be described by a regexp and require a full parser (able to understand a grammar, which is something more powerful than a regexp). I do not think there is a regexp-only solution.
See recent discussion function-parser-with-regex-in-python (one of many similar discussions). Then follow the suggestion to pyparsing.
An alternative would be to iterate until all ^ have been exhausted, no?
Ruby code:
# assuming str contains the string of data with the expressions you wish to convert
# gsub! returns nil once nothing matches, so the loop cannot spin forever on a ^ next to a parenthesis
while str.gsub!(/(\w+)\^(\w+)/, 'pow(\1,\2)')
end
