`map`-like function preserving sequence-type - python

I would like to implement a map-like function which preserves the type of input sequence. map does not preserve it:
map(str, (8, 9)) # input is a tuple
=> ['8', '9'] # output is a list
One way I came up with is this:
def map2(f, seq):
    return type(seq)(f(x) for x in seq)
map2(str, (1,2))
=> ('1', '2')
map2(str, [3,4])
=> ['3', '4']
map2(str, deque([5,6]))
=> deque(['5', '6'])
However, this does not work if seq is an iterator/generator. imap works in this case.
So my questions are:
Is there a better way to implement map2, which supports list, tuple, and many others?
Is there an elegant way to extend map2 to also support generators (like imap does)? Clearly, I'd like to avoid: try: return map2(...) except TypeError: return imap(...)
The reason I'm looking for something like that is that I'm writing a function-decorator which converts the return value, from type X to Y. If the original function returns a sequence (let's assume a sequence can only be a list, a tuple, or a generator), I assume it is a sequence of X's, and I want to convert it to the corresponding sequence of Y's (while preserving the type of the sequence).
As you probably realize, I'm using python 2.7, but python 3 is also of interest.

Your approach doesn't work for map(str, '12') either.
Ultimately, you don't know what arguments the type of the iterable will actually take in the constructor/initializer, so there's no way to do this in general. Also note that imap doesn't give you the same type as a generator:
>>> type(x for x in range(10))
<type 'generator'>
>>> type(imap(str,range(10)))
<type 'itertools.imap'>
>>> isinstance((x for x in range(10)),type(imap(str,range(10))))
False
You might be thinking to yourself "surely with python's introspection, I could inspect the arguments to the initializer" -- And you'd be right! However, even if you know how many arguments go to the initializer, and what their names are, you still can't get any information on what you're actually supposed to pass to them. I suppose you could write some sort of machine learning algorithm to figure it out from the docstrings ... but I think that's well beyond the scope of this question (and it assumes the author was behaving nicely and creating good docstrings to begin with).
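To make the string case concrete, here is a minimal Python 3 sketch (the question targets 2.7, where the outcome is analogous). It reuses the map2 from the question and assumes nothing beyond the built-ins: calling str on a generator does not consume it, it just returns the generator's repr.

```python
def map2(f, seq):
    # The construction from the question: rebuild via the input's own type.
    return type(seq)(f(x) for x in seq)

print(map2(str, (1, 2)))   # fine for a tuple

broken = map2(str, "12")   # str(<generator>) gives the generator's repr
print(broken)              # not the mapped string one might hope for
```

So even for a built-in sequence type like str, the constructor does not mean "build me from these elements".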

First, type(seq)( f(x) for x in seq ) is really just type(seq)(imap(f, seq)). Why not just use that?
Second, what you're trying to do doesn't make sense in general. map takes any iterable, not just a sequence. The difference is, basically, that a sequence has a len and is randomly-accessible.
There is no rule that an iterable of type X can be constructed from values of type Y by calling type(X)(y_iter). In fact, while it's generally true for sequences, there are very few other examples for which it is true.
If what you want is to handle a few special types specially, you can do that:
def map2(f, seq):
    it = imap(f, seq)
    if isinstance(seq, (tuple, list)):
        return type(seq)(it)
    else:
        return it
Or, if you want to assume that all sequences can be constructed this way (which is true for most built-in sequences, but consider, e.g. xrange—which wasn't designed as a sequence but does meet the protocol—and of course there are no guarantees beyond what's built in):
def map2(f, seq):
    it = imap(f, seq)
    try:
        len(seq)
    except TypeError:  # no len(), so not a sequence
        return it
    else:
        return type(seq)(it)
You could assume that any iterable type that can be constructed from an iterable is a sequence (as you suggested in your question)… but this is likely to lead to more false positives than benefits, so I wouldn't. Again, remember that len is part of the definition of being a sequence, while "constructible from an iterator" is not, and there are perfectly reasonable iterable types that will do something completely different when given an iterator.
Whatever you do is going to be a hack, because the very intention is a hack, and goes against the explicit design wishes of the Python developers. The whole point of the iterator/iterable protocol is that you should care about the type of the iterable as rarely as possible. That's why Python 3.x has gone further and replaced the list-based functions like map and filter with iterator-based functions instead.
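For Python 3, where the built-in map is already lazy (so it plays the role of imap), the special-casing approach could be sketched as follows. Adding deque to the handled types is my assumption, taken from the question's examples; it is not part of the code above.

```python
from collections import deque

def map2(f, seq):
    """Return the same container type for a few known types, else a lazy iterator."""
    it = map(f, seq)  # Python 3: map returns an iterator, like imap in 2.x
    if isinstance(seq, (list, tuple, deque)):
        return type(seq)(it)
    return it

print(map2(str, (1, 2)))
print(map2(str, deque([5, 6])))
print(list(map2(str, (x for x in [7, 8]))))  # generators stay lazy
```

Note that isinstance also matches subclasses, and a subclass such as a namedtuple does not accept a single iterable in its constructor, so this is still only a heuristic.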
So, how do we turn one of these transformations into a decorator?
Well, first, let's skip the decorator bit and just write a higher-order function that takes an imap-like function and returns an equivalent function with this transformation applied to it:
def sequify(func):
    def wrapped(f, seq):
        it = func(f, seq)
        try:
            len(seq)
        except TypeError:  # no len(), so not a sequence
            return it
        else:
            return type(seq)(it)
    return wrapped
So:
>>> seqmap = sequify(itertools.imap)
>>> seqmap(int, (1.2, 2.3))
(1, 2)
>>> sequify(itertools.ifilter)(lambda x: x>0, (-2, -1, 0, 1, 2))
(1, 2)
Now, how do we turn that into a decorator? Well, a function that returns a function already is a decorator. You probably want to add in functools.wraps (although you may want that even in the non-decorator case), but that's the only change. For example, I can write a generator that acts like imap, or a function that returns an iterator, and automatically transform either into a seqmap-like function:
@sequify
def map_and_discard_none(func, it):
    for elem in imap(func, it):
        if elem is not None:
            yield elem
Now:
>>> map_and_discard_none(lambda x: x*2 if x else x, (1, 2, None))
(2, 4)
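Here is the same idea as a self-contained Python 3 sketch with functools.wraps added, as suggested above. The only changes I am assuming are map in place of imap and catching TypeError specifically.

```python
import functools

def sequify(func):
    """Wrap a map-like callable so sized inputs get their own type back."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapped(f, seq):
        it = func(f, seq)
        try:
            len(seq)
        except TypeError:   # no len(): treat the input as a plain iterator
            return it
        return type(seq)(it)
    return wrapped

@sequify
def map_and_discard_none(func, it):
    """Map func over it, skipping None results."""
    for elem in map(func, it):
        if elem is not None:
            yield elem

print(map_and_discard_none(lambda x: x * 2 if x else x, (1, 2, None)))
print(map_and_discard_none.__name__)
```

Without functools.wraps, the decorated function would report its name as "wrapped".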
This, of course, only works for functions with map-like syntax; that is, they take a function and an iterable. (Well, it will accidentally work for functions that take various kinds of wrong types: e.g., you can call sequify(itertools.count)(10, 5) and it will successfully detect that 5 isn't a sequence and therefore just pass the iterator back untouched.) To make it more general, you could do something like:
def sequify(func, type_arg=1):
    def wrapped(*args, **kwargs):
        it = func(*args, **kwargs)
        try:
            len(args[type_arg])
        except (IndexError, TypeError):
            return it
        else:
            return type(args[type_arg])(it)
    return wrapped
And now, you can go crazy with sequify(itertools.combinations, 0) or whatever you prefer. In this case, to make it a useful decorator, you probably want to go a step further:
def sequify(type_arg=1):
    def wrapper(func):
        def wrapped(*args, **kwargs):
            it = func(*args, **kwargs)
            try:
                len(args[type_arg])
            except (IndexError, TypeError):
                return it
            else:
                return type(args[type_arg])(it)
        return wrapped
    return wrapper
So you can do this:
@sequify(3)
def my_silly_function(pred, defval, extrastuff, main_iterable, other_iterable):

Your question boils down to:
Given a sequence (by which you seem to mean any python object which supports iteration, not the same sequence python docs lay down) and a transformation, is there a general way to apply the transformation to each element and create a new sequence of the exact same type?
The answer is no. There is no guarantee that the iterable type will support creating a new instance from an iterable. Some objects support this inherently in their constructors; some do not. An iterable type makes no guarantees about supporting the opposite operation. You would need to special case all types you were aware of that would not work with the simple iterable as an argument to the initialization case.
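Special-casing the types you know about might look like the sketch below (Python 3). The REBUILDERS table and the name map_same_type are hypothetical, introduced only for illustration; each entry says how to rebuild that type from an iterator.

```python
# Hypothetical registry: how to rebuild each known type from an iterator.
REBUILDERS = {
    list: list,
    tuple: tuple,
    set: set,
    str: "".join,  # str(iterator) would return a repr, so join instead
}

def map_same_type(f, seq):
    """Apply f to each element; rebuild the input type when we know how."""
    rebuild = REBUILDERS.get(type(seq))  # exact type match, subclasses ignored
    if rebuild is None:
        return map(f, seq)               # unknown type: return a lazy iterator
    return rebuild(map(f, seq))

print(map_same_type(str.upper, "ab"))
print(map_same_type(str, (1, 2)))
```

The table has to be maintained by hand, which is exactly the limitation described above: nothing in the iterable protocol tells you how to build a new instance.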

Related

Map or apply a function to single value - the proper way to do in Python 3

In my Python project, functionality strictly equivalent to the following appears in many places:
def map_or_single(item, func, iterative_classes, *args):
    if isinstance(item, iterative_classes):
        for k in item:
            func(k, *args)
        return
    func(item, *args)
But I am sure there must be some Python (3?) built-in function for this trivial task. I don't want to reinvent the wheel and use this crap in the project. Does anyone know the proper built-in function for that?
Don't try to write this at all. Instead of
map_or_single(item, func, ...)
just write
def mapper(func, items, *args):
    for k in items:
        func(k, *args)
mapper always takes a list (or some other iterable); it's up to the caller to provide one if necessary.
mapper(f, [1,2,3,4])
mapper(f, [1]) # Not mapper(f, 1)
mapper(f, some_tree_object.depth_first_iterator())
mapper(f, some_tree_object.breadth_first_iterator())
This is a clear antipattern, therefore there is no "proper" or "correct" way to do this in Python (or most, if not all other languages). You most certainly can come up with a wide variety of brittle solutions, but this is the type of thing that leads to bugs in the long term for very little benefit.
This problem is solved by avoiding this pattern altogether and requiring user input to conform to one pattern instead of two or more.
Instead of this interface:
map_or_single(single, func, ...)
map_or_single(iterable, func, ...)
You have this interface:
map_or_single([single], func, ...)
map_or_single(iterable, func, ...)
Requiring single values to be wrapped is a small price to pay to avoid all the potential headaches that can easily result from this pattern.
And obviously if the situation permits:
func(single)
map(func, iterable)
I'm pretty sure that there's no one-liner builtin for this. I have this helper function in my project;
def any2list(val):
    if isinstance(val, (str, bytes)):
        val = [val]
    try:
        iter(val)
    except TypeError:
        val = [val]
    return list(val)
The goal of this function is to convert any input to a list. If a single element is given, it returns a list of this one element.
I use this in other functions like so,
def map_or_single(container, func, *args):
    container = any2list(container)
    for k in container:
        func(k, *args)
and now the container argument may be a proper container type (like list) or it might be a single element (scalar) like an int. I find this strategy pretty clean as I do not explicitly worry about the single element case.
Note: I choose to treat strs and bytes specially, so that any2list does not split a word or sentence into individual letters, which is almost always what I want. Your use cases might be different.
You can use functions map() and partial():
from functools import partial
from operator import add
l = [1, 2, 3]
result = map(partial(add, 2), l)
print(list(result))
# [3, 4, 5]

What does the keyword "in" really do when it is used to test whether a sequence (list, tuple, string, etc.) contains a value? Does it loop until it finds the value?

given the following code:
numbers= [1,2,3,1,2,4,5]
if 5 in numbers:
.................
The list numbers has 7 items, and I want to know whether the in keyword loops behind the scenes to find a match in the list.
It depends. It will call __contains__() on the container class (right side) - that can be implemented as a loop, and for some classes it can be calculated by some other faster method if possible.
You can even define it on your own class, like this illustrative example:
class ContainsEverything:
    def __contains__(self, item):
        return True
>>> c = ContainsEverything()
>>> None in c
True
>>> 4 in c
True
For container types in general, this is documented under __contains__ in the data model chapter and Membership test operations in the expressions chapter.
When you write this:
x in s
… what Python does is effectively (slightly oversimplified):
try:
    return s.__contains__(x)
except AttributeError:
    for value in s:
        if value == x:
            return True
    return False
So, any type that defines a __contains__ method can do whatever it wants; any type that doesn't, Python automatically loops over it as an iterable. (Which in turn calls s.__iter__ if present, the old-style sequence API with s.__getitem__ if not.)
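The fallback chain can be observed directly. In this Python 3 sketch the two classes are hypothetical; neither defines __contains__, so in has to iterate, once through __iter__ and once through the old-style __getitem__ protocol.

```python
class OnlyIter:
    """Defines __iter__ but not __contains__."""
    def __init__(self, *items):
        self._items = items

    def __iter__(self):
        return iter(self._items)

class OnlyGetitem:
    """Defines only __getitem__: the old-style sequence protocol."""
    def __init__(self, *items):
        self._items = items

    def __getitem__(self, index):
        return self._items[index]  # IndexError past the end stops the iteration

print(2 in OnlyIter(1, 2, 3))
print(2 in OnlyGetitem(1, 2, 3))
print(9 in OnlyGetitem(1, 2, 3))
```

In both cases the membership test is a linear scan, because Python has nothing smarter to call.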
For the builtin sequence types list and tuple, the behavior is defined under Sequence Types — list, tuple, range and (again) Membership test operations:
True if an item of s is equal to x, else False
… which is exactly the same as the fallback behavior.
Semantically:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
For list and tuple, this is in fact implemented by looping over all of the elements, and it's hard to imagine how it could be implemented otherwise. (In CPython, it's slightly faster because it can do the loop directly over the underlying array, instead of using an iterator, but it's still linear time.)
However, some other builtin types do something smarter. For example:
range does the closed-form arithmetic to check whether the value is in the range in constant time.
set, frozenset, and dict use the underlying hash table to look up the value in constant time.
There are some third-party types, like sorted-dict collections based on trees or skiplists or similar, that can't search in constant time but can search in logarithmic time by walking the tree. So, they'll (hopefully) implement __contains__ in logarithmic time.
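A quick way to convince yourself of the non-looping cases, assuming Python 3 (where range is a lazy object with its own __contains__): the range below has 2 * 10**17 elements, so a linear scan would never finish in practice, yet the membership tests return immediately.

```python
big = range(0, 10**18, 5)      # multiples of 5, never materialized

print(10**15 in big)           # answered by arithmetic, not by looping
print(7 in big)                # 7 is not a multiple of 5

lookup = {"a": 1, "b": 2}
print("a" in lookup)           # dict membership tests the keys via hashing
print(1 in lookup)             # values are not searched
```

The dict lines also show that for mappings, in is about keys only.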
Also note that if you use the ABC/mixin helpers in collections.abc to define your own Sequence or Mapping type, you get a __contains__ implementation for free. For sequences, this works by iterating over all the elements; for mappings, it works by trying self[key] and returning False if that raises KeyError.
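As a small Python 3 illustration of the mixin behaviour, the hypothetical Squares class below defines only __len__ and __getitem__; membership, iteration, index and count all come from collections.abc.Sequence.

```python
from collections.abc import Sequence

class Squares(Sequence):
    """Hypothetical read-only sequence of the first n squares."""
    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index

sq = Squares(10)
print(16 in sq)      # inherited __contains__, a linear scan
print(15 in sq)
print(sq.index(49))  # index() is also provided by the mixin
```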

using exception with for loop in python

Hi, I'm new to Python app development. I have been trying to fetch only the numbers from a list using a for loop, but I'm confused about the correct syntax. The code I have been using is below.
babe = [10,11,13,'vv']
int(honey) [for honey in babe]:
print honey
When I run this I get a syntax error. I have tried many variations, but nothing helped. Sorry for the silly question.
Do I need to add square brackets or something on the second line?
I'm really stuck. I hope you can help me out. Thanks in advance.
You seem to be conflating the syntax for for loops (a statement followed by a suite of statements ... otherwise known as a "block of code") and a list comprehension (an expression).
Here's a list comprehension:
#!/usr/bin/python
# Given:
b = [1,2,3,'vv']
a = [int(x) for x in b]
... that's syntactically valid. However, the semantics of that example will raise an exception because 'vv' is not a valid literal (string). It cannot be interpreted as a decimal integer.
Here's a for loop:
#!/usr/bin/python
# Given:
b = [1,2,3,'vv']
a = list()
for x in b:
    try:
        a.append(int(x))
    except ValueError:
        pass
In this case we explicitly loop over the given list (b) and ignore any ValueError exceptions raised when we try to convert each of those entries into an integer.
There is no reasonable way to handle exceptions from within a list comprehension. You could write a function which returned some sentinel value (from the expression) for any invalid input value. That would look something like this:
#!/usr/bin/python
# Given:
b = [1, 2, 3, 'vv']
def mk_integer_if_possible(n):
    '''Returns an integer or the sentinel value None
    '''
    results = None
    try:
        results = int(n)
    except ValueError:
        pass
    return results
# Use that function:
a = [mk_integer_if_possible(x) for x in b if mk_integer_if_possible(x) is not None]
Note: the absurd function name is deliberate. This is an ugly way to do this and the awkwardness of having to call this putative function TWICE for each element of b is an indication that you should NOT use a list comprehension for this situation. (You have to call it once to make the conversion but again for the conditional. Saving the results from one call would, of course, be a STATEMENT, which we can't have embedded within an EXPRESSION).
Statements contain one or more expressions. Expressions are components of statements. Python strictly delineates between statements and expressions. Assignments are statements in Python. These distinctions can be nuanced and there are other programming languages where assignments are expressions rather than being strictly defined, by the language's syntax, as statements.
So, use the for loop whenever you have to handle possible exceptions while iterating over any sort of data set and usually when you need to filter on the results generated by mapping a function over a list comprehension.
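If you do want to stay with a comprehension, one workaround that avoids the double call is to feed it from a generator expression, so each element is converted exactly once and then filtered. This is only a sketch in Python 3; to_int_or_none is a hypothetical helper along the lines of the function above.

```python
def to_int_or_none(value):
    """Return int(value), or None when the conversion fails."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

b = [1, 2, 3, 'vv', '7', 0]

converted = (to_int_or_none(x) for x in b)   # lazy: one call per element
a = [n for n in converted if n is not None]  # 0 survives, None is dropped

print(a)
```

Whether this reads better than the explicit for loop is a matter of taste; the loop remains the more direct way to handle exceptions.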
Incidentally the explicit use of the expression is not None is necessary in this example. If I attempted to shorten that test to simply be if mk_integer_if_possible(x) using Python's implicit boolean handling then we'd be inadvertently filtering out any entries from b that evaluated to integer 0 as well as any that were returned as the None sentinel by my ill-advised function.
In Python it's often fine to use implicit boolean values as conditions. None and False as well as any numerically zero value, any empty string or any sort of empty list, tuple or dictionary, are all treated as "false" in a boolean context. However, when dealing with sentinel values it's best to use the is operator and explicitly test for object identity. Otherwise you'll have corner cases where your condition might be matched by values other than your sentinel.
(Handy trick: if you ever come across the need to allow None through some sort of filter or pass it along, but you need some other sentinel ... just use sentinel = object() ... you can create (instantiate) a generic Python object and use is to match it for your sentinel handling. That will be unique to your code and no other Python object or type will match it. Guaranteed).
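A minimal sketch of that sentinel trick in Python 3. The function first_match and the name _MISSING are hypothetical, chosen for illustration: None must remain a legal value both in the data and as a default, so a private object() marks "no default given".

```python
_MISSING = object()   # unique sentinel; no other object is identical to it

def first_match(items, predicate, default=_MISSING):
    """Return the first item satisfying predicate, else default (or raise)."""
    for item in items:
        if predicate(item):
            return item
    if default is _MISSING:          # identity test, never ==
        raise ValueError("no matching item")
    return default

print(first_match([3, None, 5], lambda x: x is None))  # None is a real result
print(first_match([3, 5], lambda x: x > 9, default=None))
```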
By the way ... I should note that this code is technically not "fetching only numbers from a list." It is returning integers for all entries in the list which can be converted thereto. This is a nitpick; but it's a distinction that any good engineer will notice. Do you want to return all integers from the input list? Or do you want to return all entries as integers if they can be so converted? Your code suggested that you're trying to accomplish the latter; so that's how I implemented my working examples for you. However, to implement the former semantics you'd probably want to use either the (mathematical) additive or multiplicative identity property like so:
# ... from within some function:
    try:
        results = x == x + 0  # Additive identity
    except (TypeError, ValueError):
        results = None
    return results
babe = [10,11,13,'vv']
a = [honey for honey in babe if isinstance(honey, int)]
print a
See more here about list comprehension: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
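One caveat with the isinstance filter, shown as a Python 3 sketch: bool is a subclass of int, so True and False pass an isinstance(x, int) test. If that matters for your data (the extra list entries here are my own additions for illustration), exclude bool explicitly.

```python
babe = [10, 11, 13, 'vv', True, 2.5]

with_bools = [x for x in babe if isinstance(x, int)]
ints_only = [x for x in babe
             if isinstance(x, int) and not isinstance(x, bool)]

print(with_bools)   # True sneaks in because bool subclasses int
print(ints_only)
```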

Python lists/arrays: disable negative indexing wrap-around in slices

While I find the negative number wraparound (i.e. A[-2] indexing the second-to-last element) extremely useful in many cases, when it happens inside a slice it is usually more of an annoyance than a helpful feature, and I often wish for a way to disable that particular behaviour.
Here is a canned 2D example below, but I have had the same peeve a few times with other data structures and in other numbers of dimensions.
import numpy as np
A = np.random.randint(0, 2, (5, 10))
def foo(i, j, r=2):
    '''sum of neighbours within r steps of A[i,j]'''
    return A[i-r:i+r+1, j-r:j+r+1].sum()
In the slice above I would rather that any negative number to the slice would be treated the same as None is, rather than wrapping to the other end of the array.
Because of the wrapping, the otherwise nice implementation above gives incorrect results at boundary conditions and requires some sort of patch like:
def ugly_foo(i, j, r=2):
    def thing(n):
        return None if n < 0 else n
    return A[thing(i-r):i+r+1, thing(j-r):j+r+1].sum()
I have also tried zero-padding the array or list, but it is still inelegant (requires adjusting the lookup locations indices accordingly) and inefficient (requires copying the array).
Am I missing some standard trick or elegant solution for slicing like this? I noticed that python and numpy already handle the case where you specify too large a number nicely - that is, if the index is greater than the shape of the array it behaves the same as if it were None.
My guess is that you would have to create your own subclass wrapper around the desired objects and re-implement __getitem__() to convert negative keys to None, and then call the superclass __getitem__
Note, what I am suggesting is to subclass existing custom classes, but NOT builtins like list or dict. This is simply to make a utility around another class, not to confuse the normal expected operations of a list type. It would be something you would want to use within a certain context for a period of time until your operations are complete. It is best to avoid making a globally different change that will confuse users of your code.
Datamodel
object.__getitem__(self, key)
Called to implement evaluation of self[key]. For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.
You could even create a wrapper that simply takes an instance as an arg, and just defers all __getitem__() calls to that private member, while converting the key, for cases where you can't or don't want to subclass a type, and instead just want a utility wrapper for any sequence object.
Quick example of the latter suggestion:
class NoWrap(object):
    def __init__(self, obj, default=None):
        self._obj = obj
        self._default = default
    def __getitem__(self, key):
        if isinstance(key, int):
            if key < 0:
                return self._default
        return self._obj.__getitem__(key)
In [12]: x = range(-10,10)
In [13]: x_wrapped = NoWrap(x)
In [14]: print x_wrapped[5]
-5
In [15]: print x_wrapped[-1]
None
In [16]: x_wrapped = NoWrap(x, 'FOO')
In [17]: print x_wrapped[-1]
FOO
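The NoWrap example only intercepts integer keys, while the question is about slices. A slice-aware variant could look like this Python 3 sketch. ClampedSlices is a hypothetical name, and the design choice is mine: negative start or stop values are clamped to 0 rather than mapped to None, and a positive step is assumed. Tuple keys are handled element-wise so that the same wrapper should also work for multi-dimensional indexing such as A[a:b, c:d], though it is only exercised on a list here.

```python
class ClampedSlices:
    """Wrapper whose slices clamp negative bounds to 0 instead of wrapping."""
    def __init__(self, obj):
        self._obj = obj

    @staticmethod
    def _clamp(key):
        if isinstance(key, slice):
            start = key.start if key.start is None else max(key.start, 0)
            stop = key.stop if key.stop is None else max(key.stop, 0)
            return slice(start, stop, key.step)
        return key  # plain integer indexes keep their normal meaning

    def __getitem__(self, key):
        if isinstance(key, tuple):   # e.g. a 2D index like [a:b, c:d]
            key = tuple(self._clamp(k) for k in key)
        else:
            key = self._clamp(key)
        return self._obj[key]

data = list(range(10))
wrapped = ClampedSlices(data)
print(wrapped[-2:3])   # starts at 0 instead of wrapping to the end
print(wrapped[8:13])   # too-large stop is already handled by Python
print(wrapped[-1])     # integer indexing still wraps
```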
While you could subclass e.g. list as suggested by jdi, Python's slicing behaviour is not something anyone's going to expect you to muck about with.
Changing it is likely to lead to some serious head-scratching by other people working with your code when it doesn't behave as expected - and it could take a while before they go looking at the special methods of your subclass to see what's actually going on.
See: Action at a distance
I think this isn't ugly enough to justify new classes and wrapping things.
Then again it's your code.
def foo(i, j, r=2):
    '''sum of neighbours within r steps of A[i,j]'''
    return A[i-r:abs(i+r+1), j-r:abs(j+r+1)].sum() # ugly, but works?
(Downvoting is fun, so I've added some more options)
I found out something quite unexpected (for me): The __getslice__(i,j) does not wrap! Instead, negative indices are just ignored, so:
lst[1:3] == lst.__getslice__(1,3)
lst[-3:-1] == 2 next to last items but lst.__getslice__(-3,-1) == []
and finally:
lst[-2:1] == [], but lst.__getslice__(-2,1) == lst[0:1]
Surprising, interesting, and completely useless.
If this only needs to apply in a few specific operations, a simple & straightforward if index>=0: do_something(array[i]) / if index<0: raise IndexError would do.
If this needs to apply wider, it's still the same logic, just being wrapped in this manner or another.

Create list by repeated application of function

I want this:
[foo() for _ in xrange (100)]
but beautifuller. ?
You can write a generator repeat like this:
def repeat(times, func, *args, **kwargs):
    for _ in xrange(times):
        yield func(*args, **kwargs)
Then:
list(repeat(100, foo))
It also accepts arguments to be passed on to the function, so you can:
from random import randint
list(repeat(100, randint, 1, 100)) # 100 random ints between 1 and 100
Since it's a generator, you can pipe it into any kind of iterable, be it a list (as here) or a tuple or a set, or use it in a comprehension or a loop.
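For Python 3 readers, the same generator only needs range in place of xrange. In this sketch it is renamed repeat_call (my choice, to avoid confusion with itertools.repeat), and itertools.count stands in for a foo with visible side effects.

```python
import itertools

def repeat_call(times, func, *args, **kwargs):
    """Yield the result of func(*args, **kwargs), `times` times."""
    for _ in range(times):
        yield func(*args, **kwargs)

counter = itertools.count(1)
values = list(repeat_call(3, next, counter))   # calls next(counter) 3 times
print(values)

as_tuple = tuple(repeat_call(2, str.upper, "ab"))
print(as_tuple)
```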
I'm afraid you're not gonna get it any prettier than that in Python, except that some people would advise against _ for an "anonymous" variable. This is the Pythonic idiom for doing what you want.
(The _ can be considered confusing to novices because it can be mistaken for special syntax. I use it, but only in the "expert parts" of my code. I also encounter it more and more often, but opinion still seems a bit divided on this one.)
Depending on your definition of "beautifuller", you may prefer this:
map(lambda x: foo(), xrange(100))
Although what you have already is much nicer IMO.
Depending on what it does, you can make foo() a generator.
Your list comprehension is already beautiful and effective, but if you want other ways to do the same thing, you can use map here. If you need to call a certain function a specified number of times, use:
# in case your func looks like
def func():
    pass  # do something
#then
map(lambda _: func(), xrange(numberOfTimes))
In case your function need value from range then you can use map with lambda:
# in case your func looks like
def func(value):
    pass  # do something with value
#then
map(lambda val: func(val), xrange(numberOfTimes))
Or in case you need to use data from several lists of the same length:
# in case your func looks like
def func(value1, value2):
    pass  # do something with values
#then
map(lambda val: func(*val), zip(xrange(10), xrange(10,20)))
And so on...
In case foo() always returns the same result, you could use
[foo()]*100
This has the advantage that foo() is only called once.
Edit: As @larsmans points out, this only makes sense if foo() returns an immutable result.
In all other cases, your solution is fine!
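The immutability caveat is easy to demonstrate. In this Python 3 sketch an empty list stands in for a mutable result of foo(): multiplying the list copies the reference, not the object, so every slot is the same list.

```python
shared = [[]] * 3             # three references to ONE list
shared[0].append('x')
print(shared)                 # the change shows up in every slot

separate = [[] for _ in range(3)]   # three distinct lists
separate[0].append('x')
print(separate)
```

With an immutable result such as a number, string or tuple, the sharing is harmless.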
