I have a small Python script that I use every day. It reads a file, and for each line I apply different string functions like strip(), replace(), etc. I'm constantly editing the file and commenting lines in and out to change the functions. Depending on the file I'm dealing with, I use different functions. For example, I have a file where for each line I need to use line.replace(' ','') and line.strip()...
What's the best way to make all of these part of my script? I'd like to assign a number to each function and then just say "apply functions 1 and 4 to each line".
First of all, many functions in the string module, including string.strip and string.replace, are deprecated (and were removed in Python 3). The following answer uses string methods instead: rather than string.strip(" Hello "), I use the equivalent " Hello ".strip().
Here's some code that will simplify the job for you. It assumes that every method you call on your string returns another string.
class O(object):
    # short aliases for the str methods you use most
    c = str.capitalize
    r = str.replace
    s = str.strip
def process_line(line, *ops):
    i = iter(ops)
    while True:
        try:
            op = next(i)    # the string method
            args = next(i)  # the list of extra arguments for it
        except StopIteration:
            break
        line = op(line, *args)
    return line
The O class exists so that your highly abbreviated method names don't pollute your namespace. When you want to add more string methods, you add them to O in the same format as those given.
The process_line function is where all the interesting things happen. First, here is a description of the argument format:
The first argument is the string to be processed.
The remaining arguments must be given in pairs.
The first argument of the pair is a string method. Use the shortened method names here.
The second argument of the pair is a list representing the arguments to that particular string method.
The process_line function returns the string that emerges after all these operations have been performed.
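To try process_line without a file, here is a minimal sketch that runs it on a single string. It is a compact variant of my own (not the code above) that pairs the arguments with the zip(it, it) idiom instead of calling next() twice; like the original loop, it silently ignores a trailing method that has no argument list.

```python
class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip

def process_line(line, *ops):
    it = iter(ops)
    # zip(it, it) pulls two items per step from the same iterator: (method, args)
    for op, args in zip(it, it):
        line = op(line, *args)
    return line

# strip, then replace spaces with underscores, then capitalize
print(process_line("  hello world  ", O.s, [], O.r, [" ", "_"], O.c, []))  # Hello_world
```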
Here is some example code showing how you would use the above code in your own scripts. I've separated the arguments of process_line across multiple lines to show the grouping of the arguments. Of course, if you're just hacking away and using this code in day-to-day scripts, you can compress all the arguments onto one line; this actually makes it a little easier to read.
with open("parrot_sketch.txt") as f:
    for line in f:
        p = process_line(
            line,
            O.r, ["He's resting...", "This is an ex-parrot!"],
            O.c, [],
            O.s, []
        )
        print(p)
Of course, if you very specifically wanted to use numerals, you could name your functions O.f1, O.f2, O.f3… but I'm assuming that wasn't the spirit of your question.
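Since the question literally asks to "apply functions 1 and 4", here is a minimal sketch of the numbered approach using a plain dict. The names NUMBERED_OPS and apply_numbered, and the particular numbering, are hypothetical choices for illustration.

```python
# Hypothetical numbering: pick whatever operations you use most often
NUMBERED_OPS = {
    1: str.strip,
    2: str.lower,
    3: str.title,
    4: lambda s: s.replace(' ', ''),  # remove all spaces
}

def apply_numbered(line, *numbers):
    # Apply the operations in the order the numbers are given
    for n in numbers:
        line = NUMBERED_OPS[n](line)
    return line

# "apply functions 1 and 4"
print(apply_numbered("  Hello World  ", 1, 4))  # HelloWorld
```

In a daily script, the numbers could come from the command line, e.g. [int(n) for n in sys.argv[1:]].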
If you insist on numbers, you can't do much better than a dict (as gimel suggests) or list of functions (with indices zero and up). With names, though, you don't necessarily need an auxiliary data structure (such as gimel's suggested dict), since you can simply use getattr to retrieve the method to call from the object itself or its type. E.g.:
def all_lines(somefile, methods):
    """Apply a sequence of methods to all lines of some file and yield the results.
    Args:
      somefile: an open file or other iterable yielding lines
      methods: a string that's a whitespace-separated sequence of method names
        (note that the methods must be callable without arguments beyond the
        str to which they're being applied)
    """
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled:
            line = tocall(line)
        yield line
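A short usage sketch for all_lines; io.StringIO stands in for a real open file so the example runs on its own.

```python
import io

def all_lines(somefile, methods):
    """Apply each named str method, in order, to every line and yield the results."""
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled:
            line = tocall(line)
        yield line

# io.StringIO behaves like an open text file here
fake_file = io.StringIO("  Hello World  \n  FOO bar\n")
result = list(all_lines(fake_file, "strip lower"))
print(result)  # ['hello world', 'foo bar']
```

An unknown name such as "stirp" fails early with AttributeError from getattr, before any line is processed.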
It is possible to map string operations to numbers:
>>> import string
>>> ops = {1:string.split, 2:string.replace}
>>> my = "a,b,c"
>>> ops[1](",", my)
[',']
>>> ops[1](my, ",")
['a', 'b', 'c']
>>> ops[2](my, ",", "-")
'a-b-c'
>>>
But maybe string descriptions of the operations will be more readable.
>>> ops2={"split":string.split, "replace":string.replace}
>>> ops2["split"](my, ",")
['a', 'b', 'c']
>>>
Note:
Instead of using the string module (whose split and replace functions no longer exist in Python 3), you can use the str type for the same effect:
>>> ops={1:str.split, 2:str.replace}
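A sketch of the same session rewritten with the str type, so it also runs on Python 3. Note that the string to operate on is always the first argument.

```python
ops = {1: str.split, 2: str.replace}
my = "a,b,c"
print(ops[1](my, ","))       # ['a', 'b', 'c']
print(ops[2](my, ",", "-"))  # a-b-c

# the same idea with descriptive keys instead of numbers
ops2 = {"split": str.split, "replace": str.replace}
print(ops2["split"](my, ","))  # ['a', 'b', 'c']
```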
To map names (or numbers) to different string operations, I'd do something like:
OPERATIONS = dict(
    strip=str.strip,
    lower=str.lower,
    removespaces=lambda s: s.replace(' ', ''),
    maketitle=lambda s: s.title().center(80, '-'),
    # etc
)
def process(myfile, ops):
    for line in myfile:
        for op in ops:
            line = OPERATIONS[op](line)
        yield line
which you use like this:
for line in process(afile, ['strip', 'removespaces']):
    ...
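A runnable usage sketch of process. It uses a trimmed-down OPERATIONS dict for brevity and io.StringIO as a stand-in for afile; both are assumptions for illustration.

```python
import io

OPERATIONS = dict(
    strip=str.strip,
    lower=str.lower,
    removespaces=lambda s: s.replace(' ', ''),
)

def process(myfile, ops):
    for line in myfile:
        for op in ops:
            # look each operation up by name and apply it to the current line
            line = OPERATIONS[op](line)
        yield line

afile = io.StringIO(" a b c \n d e \n")
print(list(process(afile, ['strip', 'removespaces'])))  # ['abc', 'de']
```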
Problem Statement
I would like to apply a list of functions fs = [ f, g, h ] sequentially to a string text=' abCdEf '
Something like h(g(f(text))), i.e. f first, then g, then h.
This could easily be accomplished with the following code:
# initial text
text = ' abCDef '
# list of functions to apply sequentially
fs = [str.rstrip, str.lstrip, str.lower]
for f in fs:
    text = f(text)
# expected result is 'abcdef' with spaces stripped, and all lowercase
print(text)
Using functools.reduce
It seems that functools.reduce should do the job here, since it "consumes" the list of functions at each iteration.
from functools import reduce
# I know `reduce` requires two arguments, but I don't even know
# which one to choose as the text or the function from the list
reduce(f(text), fs)
# the first iteration should call
y = str.rstrip(' abCDef ')  # --> ' abCDef'
# the next iteration fails, because it tries to call ' abCDef'() as a function
Unfortunately, this code doesn't work: each iteration returns a string instead of a function, so it fails with TypeError: 'str' object is not callable.
QUESTION: Is there any solution using map, reduce or list comprehension to this problem?
reduce can take three arguments:
reduce(function, iterable, initializer)
What are these three arguments in general?
function is a function of two arguments. Let's call these two arguments t and f.
the first argument, t, will start as initializer; then will continue as the return value of the previous call of function.
the second argument, f, is taken from iterable.
What are these three arguments in our case?
the iterable is your list of function;
the second argument f is going to be one of the functions;
the first argument t must be the text;
the initializer must be the initial text;
the return of function must be the resulting text;
function(t, f) must be f(t).
Finally:
from functools import reduce
# initial text
text = ' abCDef '
# list of functions to apply sequentially
fs = [str.rstrip, str.lstrip, str.lower]
result = reduce(lambda t,f: f(t), fs, text)
print(repr(result))
# 'abcdef'
Here's an alternative solution, which allows you to compose any number of functions and save the composed function for reuse:
import functools as ft
def compose(*funcs):
    return ft.reduce(lambda f, g: lambda x: f(g(x)), funcs)
Usage:
In [4]: strip_and_lower = compose(str.rstrip, str.lstrip, str.lower)
In [5]: strip_and_lower(' abCDef ')
Out[5]: 'abcdef'
In [6]: strip_and_lower(" AJWEGIAJWGIAWJWGIWAJ ")
Out[6]: 'ajwegiajwgiawjwgiwaj'
In [7]: strip_lower_title = compose(str.title, str.lower, str.strip)
In [8]: strip_lower_title(" hello world ")
Out[8]: 'Hello World'
Note that the order of functions matters; this works just like mathematical function composition, i.e., (f . g . h)(x) = f(g(h(x))), so the functions are applied from right to left.
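If you would rather list the functions in the order they run (as in the question's fs list), a left-to-right variant only swaps the call order inside the lambda. The name pipe and the helper add_x are my own, chosen to make the order visible; this is a sketch, not part of functools.

```python
import functools as ft

def compose(*funcs):
    # Right to left: compose(f, g)(x) == f(g(x))
    return ft.reduce(lambda f, g: lambda x: f(g(x)), funcs)

def pipe(*funcs):
    # Left to right: pipe(f, g)(x) == g(f(x)), i.e. list order
    return ft.reduce(lambda f, g: lambda x: g(f(x)), funcs)

def add_x(s):
    return s + "x"

print(compose(str.upper, add_x)("a"))  # AX  (add_x runs first)
print(pipe(str.upper, add_x)("a"))     # Ax  (upper runs first)
```

pipe(*fs) behaves like compose(*reversed(fs)).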
You can try this:
import functools
text = ' abCDef '
fs = [str.rstrip, str.lstrip, str.lower]
text = functools.reduce(lambda store, func: func(store), fs, text)
print(text)
I think you have misunderstood how reduce works. reduce reduces an iterable to a single value. The callback function takes two arguments: a store (the accumulated value) and an element.
reduce first creates a store variable. Then, looping through the iterable, it calls the function with the store and the current element, and updates the store to the returned value. Finally, it returns the store. The final argument is the value the store starts with.
So in the snippet, it loops through the list of functions and calls each function on the store. The lambda returns the processed value, which becomes the new store.
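The description above can be written out as an explicit loop. my_reduce is a hypothetical, simplified model for illustration: the real functools.reduce also accepts a call without an initializer, in which case the first element becomes the initial store.

```python
def my_reduce(function, iterable, initializer):
    # Simplified model of functools.reduce with an initializer
    store = initializer
    for element in iterable:
        store = function(store, element)  # the return value becomes the new store
    return store

fs = [str.rstrip, str.lstrip, str.lower]
print(repr(my_reduce(lambda store, func: func(store), fs, ' abCDef ')))  # 'abcdef'
```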
Since you also asked for a map solution, here is one. My values contains single-element iterables, values[0] has the original value and values[i] has the value after applying the first i functions.
text = ' abCDef '
fs = [str.rstrip, str.lstrip, str.lower]
values = [[text]]
values += map(map, fs, values)
result = next(values[-1])
print(repr(result)) # prints 'abcdef'
But I wouldn't recommend this. I was mostly curious whether I can do it. And now I'll try to think of how to avoid building that auxiliary list.
For a project, I am trying to read through a Python file and keep a list of all the variables being used within a certain function. I am reading the lines of the Python file as strings and focusing on lines starting with "def". For the purpose of this example, pretend we have identified the following line:
def func(int_var:int,float_var=12.1,string_var=foo()):
I want to use regex or any other method to grab the values within this function declaration.
I want to grab the string "int_var:int,float_var=12.1,string_var=foo()", and later split it based on the commas to get ["int_var:int","float_var=12.1","string_var=foo()"]
I am having a lot of trouble isolating the items between the parentheses belonging to 'func'.
Any help creating a regex pattern would be greatly appreciated!
Instead of regex, it is much easier and far more robust to use the ast module:
import ast
s = """
def func(int_var:int,float_var=12.1,string_var=foo()):
    pass
"""
def form_sig(sig):
    # sig is an ast.arguments node; copy args so the AST itself is not mutated
    a = list(sig.args)
    # defaults belong to the *last* len(sig.defaults) positional args
    d = [f'{ast.unparse(a.pop())}={ast.unparse(j)}' for j in sig.defaults[::-1]][::-1]
    v_arg = [] if sig.vararg is None else [f'*{sig.vararg.arg}']
    kwarg = [] if sig.kwarg is None else [f'**{sig.kwarg.arg}']
    return [*map(ast.unparse, a), *d, *v_arg, *kwarg]
# ast.unparse requires Python 3.9+
f = [{'name': i.name, 'sig': form_sig(i.args)} for i in ast.walk(ast.parse(s))
     if isinstance(i, ast.FunctionDef)]
Output:
[{'name': 'func', 'sig': ['int_var: int', 'float_var=12.1', 'string_var=foo()']}]
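import re
# [A-Za-z_]\w* allows one-character names; [A-z] would also match [\]^` characters
func_pattern = re.compile(r'^\s*def\s+(?P<name>[A-Za-z_]\w*)\s*\((?P<args>.*)\)\s*(?:->[^:]*)?:\s*$')
match = func_pattern.match('def my_func(arg1, arg2):')
func_name = match.group('name')  # my_func
func_args = [a.strip() for a in match.group('args').split(',')]  # ['arg1', 'arg2']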
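One caveat with split(','): a default such as foo(1, 2) contains a comma of its own. Below is a minimal sketch of a bracket-aware split; split_top_level is a hypothetical helper, not part of re, and it assumes there are no commas or brackets inside string literals (the ast approach handles those properly).

```python
def split_top_level(args):
    """Split on commas that are not nested inside (), [] or {}."""
    parts, current, depth = [], [], 0
    for ch in args:
        if ch in '([{':
            depth += 1
        elif ch in ')]}':
            depth -= 1
        if ch == ',' and depth == 0:
            # a top-level comma ends the current argument
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts

print(split_top_level('int_var:int,float_var=12.1,string_var=foo(1, 2)'))
# ['int_var:int', 'float_var=12.1', 'string_var=foo(1, 2)']
```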
def mysum(L):
    return 0 if not L else L[0] + mysum(L[1:])
def mysum(L):
    return L[0] if len(L) == 1 else L[0] + mysum(L[1:])
def mysum(L):
    first, *rest = L
    return first if not rest else first + mysum(rest)
The latter two also work on a single string argument, e.g. mysum('spam'), because strings are sequences of one-character strings.
The third variant works on arbitrary iterables, including open input files (mysum(open(name))), but the others do not, because they use indexing.
The function header def mysum(first, *rest), although similar to the third variant, would not work here, because it expects individual arguments, not a single iterable.
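A quick check of the third variant, using io.StringIO as a stand-in for an open file so the example runs without a real script1.py:

```python
import io

def mysum(L):
    # star-unpacking works on any iterable, not only sequences
    first, *rest = L
    return first if not rest else first + mysum(rest)

print(mysum([1, 2, 3]))                    # 6
print(mysum('spam'))                       # spam
print(repr(mysum(io.StringIO("a\nb\n"))))  # 'a\nb\n' (the lines concatenated)
```

An empty iterable raises ValueError, because there is nothing to assign to first.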
The author seems to be implying that the variant with (first, *rest) as the input arguments wouldn't work with files but after experimenting with it, I found that it does work.
# Code I tried:
def mysum(first, *rest):
    return first if not rest else first + mysum(*rest)
mysum(*open("script1.py")) works fine.
I think mysum(open("script1.py")) won't work, because Python would then see first = open("script1.py") and rest = (), which means it just gives me back <_io.TextIOWrapper name='script1.py' mode='r' encoding='cp1252'>, because not () is True.
The author wants a function that takes an iterable (e.g. a list, tuple, etc) as input and returns the sum, e.g. like this:
mysum(open("script1.py"))
When you write
mysum(*open("script1.py"))
This is roughly equivalent to
f = open("script1.py").readlines()
mysum(f[0], f[1], ..., f[n])
Note that here your code does not take an iterable as input; instead it takes several separate arguments, which is not what the author wanted.
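A small sketch of the difference, again using io.StringIO as a stand-in for open("script1.py"):

```python
import io

def mysum(first, *rest):
    return first if not rest else first + mysum(*rest)

# *f unpacks the lines, so each line becomes a separate argument
f = io.StringIO("a\nb\n")
print(repr(mysum(*f)))  # 'a\nb\n'

# without *, the whole file object is `first` and rest == ()
g = io.StringIO("a\nb\n")
print(mysum(g) is g)    # True
```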
Using a tuple to explain what happens. The *sequence syntax is used for unpacking.
numbers = (1, 2, 3)
mysum(*numbers) # this happens: mysum(1, 2, 3)
is equivalent to mysum(1, 2, 3). The members are taken from the iterable and fed into the function as separate arguments. Using *open('path/to/file') opens the file and passes each of its lines to mysum as a separate argument. This is equivalent to mysum(*open('path/to/file').readlines()), not to mysum(open('path/to/file').read()), which would pass the whole contents as one string.
I wanted to remove a substring from a string, for example "a" from "a,b,c", and get "b,c" back. The position of "a" in the string doesn't matter (like "a,b,c", "b,a,c", and so on).
DELIMITER = ","
def remove(member, members_string):
    """removes target from string"""
    members = members_string.split(DELIMITER)
    members.remove(member)
    return DELIMITER.join(members)
print(remove("a", "b,a,c"))
output: b,c
The above function is working as it is expected.
My question is this: I accidentally modified my code, and it now looks like this:
def remove_2(member, members_string):
    """removes target from string"""
    members = members_string.split(DELIMITER).remove(member)
    return DELIMITER.join(members)
You can see that I modified
members = members_string.split(DELIMITER)
members.remove(member)
to
members = members_string.split(DELIMITER).remove(member)
after that the method is broken, it throws
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    remove_2("a","b,a,c")
  File "test.py", line 11, in remove_2
    return DELIMITER.join(members)
TypeError
Based on my understanding, members_string.split(DELIMITER) is a list, so calling remove() on it is allowed, and it should return the new list and store it in members. But when I print members_string.split(DELIMITER).remove(member), it prints None, which explains the TypeError. My question is: why does it return None rather than a list with the elements "b" and "c"?
remove() does not return anything (it returns None). It modifies the list it's called on in place: lists are mutable, so creating a new list would be a major waste of CPU time and memory, and returning the same list would be somewhat pointless.
This was already answered here.
Quote from the Python docs:
You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.
Mutable objects like lists (and sets) are changed in place by their data-manipulation methods, such as list.remove(), list.insert() or set.add(), which return None.
Immutable objects like strings cannot be changed in place, so their methods, such as replace() or upper(), always return a new string.
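A minimal sketch of both behaviours side by side:

```python
members = "b,a,c".split(",")
result = members.remove("a")  # changes members in place
print(result)   # None
print(members)  # ['b', 'c']

s = "a,b"
upper = s.upper()  # returns a new string; s itself is unchanged
print(upper, s)    # A,B a,b
```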
Method chaining
The next sample shows that your intended method-chaining works with strings:
# Every replace() call is catching a different case from
# member_string like
# a,b,member
# member,b,c
# a,member,c
DELIMITER = ","
def remove(member, member_string):
    members = (member_string
               .replace(DELIMITER + member, '')
               .replace(member + DELIMITER, '')
               .replace(DELIMITER + member + DELIMITER, '')
               .upper())
    return members
# puts out B,C
print(remove("a", "b,a,c"))
List comprehension
For this kind of list manipulation, Python offers a feature called list comprehensions (often somewhat faster than an equivalent for loop that calls append()). You can read about them in the Python documentation.
DELIMITER = ","
def remove(member, members_string):
    members = [m.upper() for m in members_string.split(DELIMITER) if m != member]
    return DELIMITER.join(members)
# puts out B,C
print(remove("a", "b,a,c"))
You could also look up generators in the Python docs, but I don't know much about them.
BTW, flame me as a noob, but I hate it when people call Python a beginner language: the list comprehension above looks easy, but it could be intimidating for a beginner, couldn't it?
I find that in lots of different projects I'm writing a lot of code where I need to evaluate a (moderately complex, possibly costly-to-evaluate) expression and then do something with it (e.g. use it for string formatting), but only if the expression is True/non-None.
For example in lots of places I end up doing something like the following:
result += '%s '%( <complexExpressionForGettingX> ) if <complexExpressionForGettingX> else ''
... which I guess is basically a special-case of the more general problem of wanting to return some function of an expression, but only if that expression is True, i.e.:
f( e() ) if e() else somedefault
but without re-typing the expression (or re-evaluating it, in case it's a costly function call).
Obviously the required logic can be achieved easily enough in various long-winded ways (e.g. by splitting the expression into multiple statements and assigning the expression to a temporary variable), but that's a bit grungy and since this seems like quite a generic problem, and since python is pretty cool (especially for functional stuff) I wondered if there's a nice, elegant, concise way to do it?
My current best options are either defining a short-lived lambda to take care of it (better than multiple statements, but a bit hard to read):
(lambda e: '%s ' % e if e else '')( <complexExpressionForGettingX> )
or writing my own utility function like:
def conditional(expr, formatStringIfTrue, default=''):
... but since I'm doing this in lots of different code-bases I'd much rather use a built-in library function or some clever python syntax if such a thing exists
I like one-liners, definitely. But sometimes they are the wrong solution.
In professional software development, if the team size is > 2, you spend more time understanding code someone else wrote than writing new code. The one-liners presented here are definitely confusing, so just use two lines (even though you mentioned multiple statements in your post):
X = <complexExpressionForGettingX>
result += '%s '% X if X else ''
This is clear, concise, and everybody immediately understands what's going on here.
Python doesn't have expression scope (Is there a Python equivalent of the Haskell 'let'), presumably because the abuses and confusion of the syntax outweigh the advantages.
If you absolutely have to use an expression scope, the least worst option is to abuse a generator comprehension:
result += next('%s '%(e) if e else '' for e in (<complexExpressionForGettingX>,))
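Since Python 3.8 there is also the assignment expression (:=, PEP 572), which covers this case without a generator or a helper. It binds the name in the enclosing scope rather than creating a new one, so it is still not a true expression scope. A sketch, where costly() is a hypothetical stand-in for <complexExpressionForGettingX>:

```python
calls = []

def costly(value):
    # Hypothetical stand-in for <complexExpressionForGettingX>;
    # records each call so we can see it is evaluated only once
    calls.append(value)
    return value

result = ''
# the condition runs first and binds x, then '%s ' % x reuses it
result += '%s ' % x if (x := costly('foo')) else ''
result += '%s ' % x if (x := costly(None)) else ''
print(repr(result))  # 'foo '
print(calls)         # ['foo', None]
```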
You could define a conditional formatting function once, and use it repeatedly:
def cond_format(expr, form, alt):
    if expr:
        return form % expr
    else:
        return alt
Usage:
result += cond_format(<costly_expression>, '%s ', '')
After hearing the responses (thanks guys!) I'm now convinced there's no way to achieve what I want in Python without defining a new function (or lambda function) since that's the only way to introduce a new scope.
For best clarity I decided this needed to be implemented as a reusable function (not a lambda), so for the benefit of others, here is the function I finally came up with. It is flexible enough to cope with multiple additional format string arguments (in addition to the main argument used to decide whether to do the formatting at all), and its docstring includes doctests that show correctness and illustrate usage. If you're not sure how the **kwargs part works, just ignore it: it's an implementation detail, and it was the only way I could see to implement an optional defaultValue= keyword argument following the variable list of format string arguments.
def condFormat(formatIfTrue, expr, *otherFormatArgs, **kwargs):
    """ Helper returning the result of str.format() on a specified
    expression if bool(expr) is True (i.e. it's not None, an empty list,
    an empty string or the number zero), or a default string (typically '')
    if not.
    For more complicated cases where the operation on expr is more complicated
    than a format string, or where a different condition is required, use:
    (lambda e=myexpr: '' if not e else '%s ' % e)()
    formatIfTrue -- a format string suitable for use with str.format(), e.g.
    "{}, {}" or "{1}, {0:d}".
    expr -- the expression to evaluate. May be of any type.
    defaultValue -- set this keyword arg to override the value returned
    when bool(expr) is False (default: '').
    >>> 'x' + condFormat(', {}.', 'foobar')
    'x, foobar.'
    >>> 'x' + condFormat(', {}.', [])
    'x'
    >>> condFormat('{}; {}', 123, 456, defaultValue=None)
    '123; 456'
    >>> condFormat('{0:,d}; {2:d}; {1:d}', 12345, 678, 9, defaultValue=None)
    '12,345; 9; 678'
    >>> condFormat('{}; {}; {}', 0, 678, 9, defaultValue=None) is None
    True
    """
    defaultValue = kwargs.pop('defaultValue', '')
    assert not kwargs, 'unexpected kwargs: %s' % kwargs
    if not bool(expr):
        return defaultValue
    if otherFormatArgs:
        return formatIfTrue.format(*((expr,) + otherFormatArgs))
    else:
        return formatIfTrue.format(expr)
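On Python 3, the **kwargs workaround is no longer needed: any parameter declared after *otherFormatArgs is keyword-only. A sketch of that variant (same behaviour as the doctests above, but an unexpected keyword now raises TypeError instead of AssertionError):

```python
def condFormat(formatIfTrue, expr, *otherFormatArgs, defaultValue=''):
    # defaultValue is keyword-only because it follows *otherFormatArgs
    if not expr:
        return defaultValue
    return formatIfTrue.format(expr, *otherFormatArgs)

print('x' + condFormat(', {}.', 'foobar'))  # x, foobar.
print(condFormat('{0:,d}; {2:d}; {1:d}', 12345, 678, 9, defaultValue=None))  # 12,345; 9; 678
```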
Presumably, you want to do this repeatedly to build up a string. With a more global view, you might find that filter (itertools.ifilter in Python 2) does what you want to the collection of values.
You'll wind up with something like this:
' '.join(map(str, filter(None, <iterable of <complexExpressionForGettingX>>)))
Using None as the first argument for filter indicates to accept any true value. As a concrete example with a simple expression:
>>> ' '.join(map(str, filter(None, range(-3, 3))))
'-3 -2 -1 1 2'
Depending on how you're calculating the values, it may be that an equivalent list or generator comprehension would be more readable.
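For comparison, a sketch of the filter version next to an equivalent generator expression; the range is only a stand-in for the real expressions.

```python
values = range(-3, 3)  # stand-in for the real expressions

# filter(None, ...) keeps only truthy values
via_filter = ' '.join(map(str, filter(None, values)))

# the same thing as a generator expression
via_genexp = ' '.join(str(v) for v in values if v)

print(via_genexp)  # -3 -2 -1 1 2
```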