While I find the negative number wraparound (i.e. A[-2] indexing the second-to-last element) extremely useful in many cases, when it happens inside a slice it is usually more of an annoyance than a helpful feature, and I often wish for a way to disable that particular behaviour.
Below is a canned 2D example, but I have had the same peeve a few times with other data structures and in other numbers of dimensions.
import numpy as np
A = np.random.randint(0, 2, (5, 10))
def foo(i, j, r=2):
    '''sum of neighbours within r steps of A[i,j]'''
    return A[i-r:i+r+1, j-r:j+r+1].sum()
In the slice above I would rather have any negative number passed to the slice treated the same way as None, rather than wrapping to the other end of the array.
Because of the wrapping, the otherwise nice implementation above gives incorrect results at boundary conditions and requires some sort of patch like:
def ugly_foo(i, j, r=2):
    def thing(n):
        return None if n < 0 else n
    return A[thing(i-r):i+r+1, thing(j-r):j+r+1].sum()
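To see why the patch is needed, consider the corner i = j = 0: the start index i - r is -2, so the slice wraps around and selects nothing, while the patched version correctly grabs the corner block. A quick session demonstrating this (shapes rather than sums, since A is random):

>>> A.shape
(5, 10)
>>> A[0-2:0+2+1].shape   # foo(0, 0): the start wraps to row 3, slice is empty
(0, 10)
>>> A[None:0+2+1].shape  # ugly_foo(0, 0): None clamps to the array edge
(3, 10)
>>> A[3:100].shape       # an overly large index, by contrast, is already clamped
(2, 10)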
I have also tried zero-padding the array or list, but it is still inelegant (requires adjusting the lookup indices accordingly) and inefficient (requires copying the array).
Am I missing some standard trick or elegant solution for slicing like this? I notice that Python and NumPy already handle the case where you specify too large a number nicely: if the index is greater than the shape of the array, it behaves the same as if it were None.
My guess is that you would have to create your own subclass wrapper around the desired objects and re-implement __getitem__() to convert negative keys to None, then call the superclass __getitem__().
Note that I am suggesting subclassing your own existing classes, NOT builtins like list or dict. This is simply to make a utility around another class, not to confuse the normal expected operations of a list type. It is something you would use within a certain context for a period of time, until your operations are complete. It is best to avoid making a globally visible change that will confuse users of your code.
From the data model documentation:

object.__getitem__(self, key)

Called to implement evaluation of self[key]. For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.
You could even create a wrapper that simply takes an instance as an arg, and just defers all __getitem__() calls to that private member, while converting the key, for cases where you can't or don't want to subclass a type, and instead just want a utility wrapper for any sequence object.
Quick example of the latter suggestion:
class NoWrap(object):
    def __init__(self, obj, default=None):
        self._obj = obj
        self._default = default

    def __getitem__(self, key):
        if isinstance(key, int):
            if key < 0:
                return self._default
        return self._obj.__getitem__(key)
In [12]: x = range(-10,10)
In [13]: x_wrapped = NoWrap(x)
In [14]: print x_wrapped[5]
-5
In [15]: print x_wrapped[-1]
None
In [16]: x_wrapped = NoWrap(x, 'FOO')
In [17]: print x_wrapped[-1]
FOO
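Since the question is really about slices, here is a hypothetical sketch extending the same wrapper idea to slice keys, including the tuple-of-slices keys that NumPy uses (illustrative only; integer keys still wrap, which matches the questioner's stated preference):

class NoWrapSlice(object):
    '''Wrapper whose negative slice bounds act like None instead of wrapping.'''
    def __init__(self, obj):
        self._obj = obj

    def __getitem__(self, key):
        if isinstance(key, tuple):   # multi-axis key, e.g. A[i:j, k:l]
            key = tuple(self._fix(k) for k in key)
        else:
            key = self._fix(key)
        return self._obj[key]

    @staticmethod
    def _fix(k):
        if isinstance(k, slice):
            def clamp(n):
                return None if (n is not None and n < 0) else n
            return slice(clamp(k.start), clamp(k.stop), k.step)
        return k

With this, the original function could return NoWrapSlice(A)[i-r:i+r+1, j-r:j+r+1].sum() and behave correctly at the boundaries.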
While you could subclass e.g. list as suggested by jdi, Python's slicing behaviour is not something anyone's going to expect you to muck about with.
Changing it is likely to lead to some serious head-scratching by other people working with your code when it doesn't behave as expected - and it could take a while before they go looking at the special methods of your subclass to see what's actually going on.
See: Action at a distance
I think this isn't ugly enough to justify new classes and wrapping things.
Then again it's your code.
def foo(i, j, r=2):
    '''sum of neighbours within r steps of A[i,j]'''
    return A[i-r:abs(i+r+1), j-r:abs(j+r+1)].sum()  # ugly, but works?
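In the same spirit but arguably less cryptic, you could clamp the start indices with max (a sketch, assuming i and j are valid non-negative indices into A):

def clamped_foo(i, j, r=2):
    '''sum of neighbours within r steps of A[i,j], clamped at the edges'''
    return A[max(0, i-r):i+r+1, max(0, j-r):j+r+1].sum()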
(Downvoting is fun, so I've added some more options)
I found out something quite unexpected (for me): __getslice__(i, j) does not wrap! Instead, negative indices are simply clamped to zero, so:

lst[1:3] == lst.__getslice__(1, 3)
lst[-3:-1] gives the two items before the last, but lst.__getslice__(-3, -1) == []

and finally:

lst[-2:1] == [], but lst.__getslice__(-2, 1) == lst[0:1]
Surprising, interesting, and completely useless.
If this only needs to apply in a few specific operations, a simple and straightforward check like if i >= 0: do_something(array[i]) else: raise IndexError would do.
If it needs to apply more widely, it is still the same logic, just wrapped up in one way or another.
I have a custom Python class which essentially encapsulates a list of some kind of object, and I'm wondering how I should implement its __repr__ function. I'm tempted to go with the following:
class MyCollection:
    def __init__(self, objects=[]):
        self._objects = []
        self._objects.extend(objects)

    def __repr__(self):
        return f"MyCollection({self._objects})"
This has the advantage of producing valid Python output which fully describes the class instance. However, in my real-world case, the object list can be rather large and each object may have a large repr by itself (they are arrays themselves).
What are the best practices in such situations? Accept that the repr might often be a very long string? Are there potential issues related to this (debugger UI, etc.)? Should I implement some kind of shortening scheme using an ellipsis? If so, is there a good/standard way to achieve this? Or should I skip listing the collection's content altogether?
The official documentation outlines how you should handle __repr__:
Called by the repr() built-in function to compute the “official”
string representation of an object. If at all possible, this should
look like a valid Python expression that could be used to recreate an
object with the same value (given an appropriate environment). If this
is not possible, a string of the form <...some useful description...>
should be returned. The return value must be a string object. If a
class defines __repr__() but not __str__(), then __repr__() is also
used when an “informal” string representation of instances of that
class is required.
This is typically used for debugging, so it is important that the
representation is information-rich and unambiguous.
Python 3 __repr__ Docs
Lists, strings, sets, tuples and dictionaries all print out the entirety of their collection in their __repr__ method.
Your current code follows the documentation's suggestion perfectly. However, I would suggest changing your __init__ method so it looks more like this:
class MyCollection:
    def __init__(self, objects=None):
        if objects is None:
            objects = []
        self._objects = objects

    def __repr__(self):
        return f"MyCollection({self._objects})"
You generally want to avoid using mutable objects as default arguments. Technically, because your __init__ creates a fresh list and merely extends it with the argument's contents (rather than storing the default list itself), your version still works perfectly fine, but Python's documentation suggests you avoid the pattern anyway.
It is good programming practice to not use mutable objects as default
values. Instead, use None as the default value and inside the
function, check if the parameter is None and create a new
list/dictionary/whatever if it is.
https://docs.python.org/3/faq/programming.html#why-are-default-values-shared-between-objects
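A two-line demonstration of the pitfall the FAQ warns about (not triggered by your exact code, as noted, but easy to trip over after a refactor):

>>> def append_to(item, target=[]):
...     target.append(item)
...     return target
...
>>> append_to(1)
[1]
>>> append_to(2)   # the same default list object is reused between calls
[1, 2]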
If you're interested in how another library handles this differently, the repr for NumPy arrays shows only the first three and the last three items when the array length is greater than 1,000. It also formats the items so they all use the same amount of space (in the example below, 1000 takes up four spaces, so 0 has to be padded with three more spaces to match).
>>> repr(np.array([i for i in range(1001)]))
'array([ 0, 1, 2, ..., 998, 999, 1000])'
To mimic this numpy array style you could implement a __repr__ method like this in your class:
class MyCollection:
    def __init__(self, objects=None):
        if objects is None:
            objects = []
        self._objects = objects

    def __repr__(self):
        # If length is less than 1,000 return the full list.
        if len(self._objects) < 1000:
            return f"MyCollection({self._objects})"
        else:
            # Get the first and last three items
            items_to_display = self._objects[:3] + self._objects[-3:]
            # Find which item has the longest repr
            max_length_repr = max(items_to_display, key=lambda x: len(repr(x)))
            # Get the length of the item with the longest repr
            padding = len(repr(max_length_repr))
            # Create a list of the reprs of each item and apply the padding
            values = [repr(item).rjust(padding) for item in items_to_display]
            # Insert the '...' in between the 3rd and 4th item
            values.insert(3, '...')
            # Convert the list to a string joined by commas
            array_as_string = ', '.join(values)
            return f"MyCollection([{array_as_string}])"
>>> repr(MyCollection([1,2,3,4]))
'MyCollection([1, 2, 3, 4])'
>>> repr(MyCollection([i for i in range(1001)]))
'MyCollection([ 0, 1, 2, ..., 998, 999, 1000])'
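If you would rather not hand-roll the truncation, the standard library's reprlib module produces size-limited reprs out of the box, though it truncates with a trailing '...' rather than showing head and tail the way NumPy does. A minimal sketch:

import reprlib

class MyCollection:
    def __init__(self, objects=None):
        self._objects = objects if objects is not None else []

    def __repr__(self):
        # reprlib.repr caps how many items are rendered,
        # e.g. '[0, 1, 2, 3, 4, 5, ...]' for a long list
        return f"MyCollection({reprlib.repr(self._objects)})"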
Given the following code:
numbers= [1,2,3,1,2,4,5]
if 5 in numbers:
.................
Notice we have a list (numbers) with 7 items. I want to know whether the in keyword does a loop behind the scenes to find a match in the list.
It depends. It will call __contains__() on the container class (the right-hand side). That can be implemented as a loop, but for some classes it can be computed by some faster method where possible.
You can even define it on your own class, as in this illustrative example:
class ContainsEverything:
    def __contains__(self, item):
        return True

c = ContainsEverything()

>>> None in c
True
>>> 4 in c
True
For container types in general, this is documented under __contains__ in the data model chapter and Membership test operations in the expressions chapter.
When you write this:
x in s
… what Python does is effectively (slightly oversimplified):
try:
    return s.__contains__(x)
except AttributeError:
    for value in s:
        if value == x:
            return True
    return False
So, any type that defines a __contains__ method can do whatever it wants; any type that doesn't, Python automatically loops over it as an iterable. (Which in turn calls s.__iter__ if present, the old-style sequence API with s.__getitem__ if not.)
For the builtin sequence types list and tuple, the behavior is defined under Sequence Types — list, tuple, range and (again) Membership test operations:
True if an item of s is equal to x, else False
… which is exactly the same as the fallback behavior.
Semantically:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
For list and tuple, this is in fact implemented by looping over all of the elements, and it's hard to imagine how it could be implemented otherwise. (In CPython, it's slightly faster because it can do the loop directly over the underlying array, instead of using an iterator, but it's still linear time.)
However, some other builtin types do something smarter. For example:
range does the closed-form arithmetic to check whether the value is in the range in constant time.
set, frozenset, and dict use the underlying hash table to look up the value in constant time.
There are some third-party types, like sorted-dict collections based on trees or skiplists or similar, that can't search in constant time but can search in logarithmic time by walking the tree. So, they'll (hopefully) implement __contains__ in logarithmic time.
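A toy sketch of that idea over a plain sorted list, using the standard bisect module for the logarithmic search (purely illustrative; the real third-party containers are more involved):

import bisect

class SortedMembership:
    '''Keeps items sorted so that membership tests cost O(log n).'''
    def __init__(self, items):
        self._items = sorted(items)

    def __contains__(self, value):
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

>>> s = SortedMembership([5, 1, 4, 2])
>>> 4 in s
True
>>> 3 in s
False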
Also note that if you use the ABC/mixin helpers in collections.abc to define your own Sequence or Mapping type, you get a __contains__ implementation for free. For sequences, it works by iterating over all the elements; for mappings, it works by trying self[key] and catching KeyError.
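For example, a minimal Sequence subclass (Python 3 spelling) that defines only __getitem__ and __len__ still supports in, via the mixin's element-by-element scan:

from collections.abc import Sequence

class Doubled(Sequence):
    '''Read-only view of a list with every element doubled.'''
    def __init__(self, data):
        self._data = data

    def __getitem__(self, i):
        return self._data[i] * 2

    def __len__(self):
        return len(self._data)

>>> 6 in Doubled([1, 2, 3])   # __contains__ inherited from the Sequence mixin
True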
Suppose I have an array of some elements, and each element has some number of properties.
I need to filter this list against some subsets of values determined by predicates; these subsets can of course intersect.
I also need to determine the number of values in each such subset.
Using an imperative approach I could write code like the following, and it would have a running time of 2*n: one pass to copy the array and another to filter it and count the subset sizes.
from itertools import groupby
a = [{'some_number': i, 'some_time': str(i) + '0:00:00'} for i in range(10)]

# imperative style
wrong_number_count = 0
wrong_time_count = 0

for item in a[:]:
    if predicate1(item):
        delete_original(item, a)
        wrong_number_count += 1
    if predicate2(item):
        delete_original(item, a)
        wrong_time_count += 1
    update_some_data(item)

do_something_with_filtered(a, wrong_number_count, wrong_time_count)

def do_something_with_filtered(a, c1, c2):
    print('filtered a {}'.format(a))
    print('{} items had wrong number'.format(c1))
    print('{} items had wrong time'.format(c2))

def predicate1(x):
    return x['some_number'] < 3

def predicate2(x):
    return x['some_time'] < '50:00:00'
Somehow I can't think of a way to do that in Python in a functional style with the same running time.
In functional style I could probably use groupby multiple times, or write a comprehension for each predicate, but that would obviously be slower than the imperative approach.
I think such a thing is possible in Haskell using stream fusion (am I right?).
But how do I do that in Python?
Python has strong support for "stream processing" in the form of its iterators, and what you ask seems trivial to do. You just have to have a way to associate each predicate with its counter; it could be a dictionary where the predicate itself is the key.
That said, a simple iterator function that takes in your predicate data structure, along with the data to be processed, can do what you want. The iterator has the side effect of updating your data structure with the predicate counts. If you want "pure functions" you would have to duplicate the predicate information beforehand, and perhaps pass and retrieve all predicate and counter values to and from the iterator (through the send method) for each element; I don't think it would be worth that level of purism.
That said, your code could look something like this:
from collections import OrderedDict

def predicate1(...):
    ...

...

def predicateN(...):
    ...

def do_something_with_filtered(item):
    ...

def multifilter(data, predicates):
    for item in data:
        for predicate in predicates:
            if predicate(item):
                predicates[predicate] += 1
                break
        else:
            yield item

def do_it(data):
    predicates = OrderedDict([(predicate1, 0), ..., (predicateN, 0)])
    for item in multifilter(data, predicates):
        do_something_with_filtered(item)
    for predicate, value in predicates.items():
        print("{} filtered out {} items".format(predicate.__name__, value))

a = ...
do_it(a)
(If you have to count an item under every predicate that it fails, an obvious change from the break statement to a state flag variable is enough.)
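To make the sketch concrete, here is a hypothetical filled-in version using the two predicates from the question (the names are invented to match the question's data):

from collections import OrderedDict

def wrong_number(item):
    return item['some_number'] < 3

def wrong_time(item):
    return item['some_time'] < '50:00:00'

a = [{'some_number': i, 'some_time': str(i) + '0:00:00'} for i in range(10)]
predicates = OrderedDict([(wrong_number, 0), (wrong_time, 0)])
kept = list(multifilter(a, predicates))   # single pass over the data
for predicate, count in predicates.items():
    print("{} filtered out {} items".format(predicate.__name__, count))
# wrong_number filtered out 3 items, wrong_time filtered out 2, and 5 items are kept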
Yes, fusion in Haskell will often turn something written as two passes into a single pass. Though in the case of lists, it's actually foldr/build fusion rather than stream fusion.
That's not generally possible in languages that don't enforce purity, though. When side effects are involved, it's no longer correct to fuse multiple passes into one. What if each pass performed output? Unfused, you get all the output from each pass separately. Fused, you get the output from both passes interleaved.
It's possible to write a fusion-style framework in Python that will work correctly if you promise to only ever use it with pure functions. But I'm doubtful such a thing exists at the moment. (I'd love to be proven wrong, though.)
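For flavour, here is the no-framework version of "fusion" that Python gives you for free: composing generators so each element flows through both stages in a single pass. The stage names are invented for illustration, and the caveat above applies: this is only straightforwardly equivalent to two separate passes when the stages are pure.

def doubled(items):
    for x in items:        # first "pass"
        yield x * 2

def only_large(items):
    for x in items:        # second "pass", consuming the first lazily
        if x > 4:
            yield x

>>> list(only_large(doubled([1, 2, 3, 4])))
[6, 8]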
Hey guys, I'm new to Python app development, and I have been trying to fetch only the numbers from a list using a for loop, but I'm confused about the correct syntax. The code I have been using is below.
babe = [10,11,13,'vv']
int(honey) [for honey in babe]:
print honey
When I run this I get a syntax error. I have tried many variations, but nothing helped. Sorry for the silly question:
do I need to add square brackets or something on the second line?
I'm really stuck; hope you guys can help me out. Thanks in advance.
You seem to be conflating the syntax for for loops (a statement followed by a suite of statements ... otherwise known as a "block of code") and a list comprehension (an expression).
Here's a list comprehension:
#!/usr/bin/python
# Given:
b = [1,2,3,'vv']
a = [int(x) for x in b]
... that's syntactically valid. However, the semantics of that example will raise an exception, because 'vv' is not a valid integer literal: it cannot be interpreted as a decimal integer.
Here's a for loop:
#!/usr/bin/python
# Given:
b = [1, 2, 3, 'vv']

a = list()
for x in b:
    try:
        a.append(int(x))
    except ValueError:
        pass
In this case we explicitly loop over the given list (b) and ignore any ValueError exceptions raised when we try to convert each of those entries into an integer.
There is no reasonable way to handle exceptions from within a list comprehension. You could write a function which returned some sentinel value (from the expression) for any invalid input value. That would look something like this:
#!/usr/bin/python
# Given:
b = [1, 2, 3, 'vv']

def mk_integer_if_possible(n):
    '''Returns an integer or the sentinel value None'''
    results = None
    try:
        results = int(n)
    except ValueError:
        pass
    return results

# Use that function:
a = [mk_integer_if_possible(x) for x in b if mk_integer_if_possible(x) is not None]
Note: the absurd function name is deliberate. This is an ugly way to do this and the awkwardness of having to call this putative function TWICE for each element of b is an indication that you should NOT use a list comprehension for this situation. (You have to call it once to make the conversion but again for the conditional. Saving the results from one call would, of course, be a STATEMENT, which we can't have embedded within an EXPRESSION).
Statements contain one or more expressions. Expressions are components of statements. Python strictly delineates between statements and expressions. Assignments are statements in Python. These distinctions can be nuanced and there are other programming languages where assignments are expressions rather than being strictly defined, by the language's syntax, as statements.
So, use the for loop whenever you have to handle possible exceptions while iterating over any sort of data set, and usually when you need to filter the results generated by mapping a function over a list.
Incidentally the explicit use of the expression is not None is necessary in this example. If I attempted to shorten that test to simply be if mk_integer_if_possible(x) using Python's implicit boolean handling then we'd be inadvertently filtering out any entries from b that evaluated to integer 0 as well as any that were returned as the None sentinel by my ill-advised function.
In Python it's often fine to use implicit boolean values as conditions. None and False as well as any numerically zero value, any empty string or any sort of empty list, tuple or dictionary, are all treated as "false" in a boolean context. However, when dealing with sentinel values it's best to use the is operator and explicitly test for object identity. Otherwise you'll have corner cases where your condition might be matched by values other than your sentinel.
(Handy trick: if you ever need to allow None through some sort of filter or pass it along, but need some other sentinel, just use sentinel = object(). You can create (instantiate) a generic Python object and use is to match it in your sentinel handling. It will be unique to your code; no other Python object or type will match it. Guaranteed.)
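A compact sketch of that sentinel trick, with invented names (the point is that a caller-supplied None is distinguishable from "no argument given"):

_MISSING = object()   # unique; no other object will ever be identical to it

def first_match(items, predicate, default=_MISSING):
    for item in items:
        if predicate(item):
            return item
    if default is _MISSING:
        raise LookupError("no matching item")
    return default    # None is a perfectly valid caller-supplied default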
By the way, I should note that this code is technically not "fetching only numbers from a list." It returns integers for all entries in the list which can be converted thereto. This is a nitpick, but it's a distinction any good engineer will notice. Do you want to return all integers from the input list? Or do you want to return all entries as integers, where they can be so converted? Your code suggested you're trying to accomplish the latter, so that's how I implemented my working examples for you. However, to implement the former semantics you'd probably want to use the (mathematical) additive or multiplicative identity property, like so:
# ... from within some function:
try:
    results = x == x + 0  # Additive identity
except (TypeError, ValueError):
    results = None
return results
babe = [10,11,13,'vv']
a = [honey for honey in babe if isinstance(honey, int)]
print a
See more here about list comprehension: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
I would like to implement a map-like function which preserves the type of input sequence. map does not preserve it:
map(str, (8, 9)) # input is a tuple
=> ['8', '9'] # output is a list
One way I came up with is this:
def map2(f, seq):
    return type(seq)(f(x) for x in seq)

map2(str, (1, 2))
=> ('1', '2')
map2(str, [3, 4])
=> ['3', '4']
map2(str, deque([5, 6]))
=> deque(['5', '6'])
However, this does not work if seq is an iterator/generator. imap works in this case.
So my questions are:
Is there a better way to implement map2, which supports list, tuple, and many others?
Is there an elegant way to extend map2 to also support generators (like imap does)? Clearly, I'd like to avoid: try: return map2(...) except TypeError: return imap(...)
The reason I'm looking for something like that is that I'm writing a function-decorator which converts the return value, from type X to Y. If the original function returns a sequence (let's assume a sequence can only be a list, a tuple, or a generator), I assume it is a sequence of X's, and I want to convert it to the corresponding sequence of Y's (while preserving the type of the sequence).
As you probably realize, I'm using python 2.7, but python 3 is also of interest.
Your formalism doesn't work for map(str, '12') either.
Ultimately, you don't know what arguments the type of the iterable will actually take in the constructor/initializer, so there's no way to do this in general. Also note that imap doesn't give you the same type as a generator:
>>> type(x for x in range(10))
<type 'generator'>
>>> type(imap(str,range(10)))
<type 'itertools.imap'>
>>> isinstance((x for x in range(10)),type(imap(str,range(10))))
False
You might be thinking to yourself "surely with python's introspection, I could inspect the arguments to the initializer" -- And you'd be right! However, even if you know how many arguments go to the initializer, and what their names are, you still can't get any information on what you're actually supposed to pass to them. I suppose you could write some sort of machine learning algorithm to figure it out from the docstrings ... but I think that's well beyond the scope of this question (and it assumes the author was behaving nicely and creating good docstrings to begin with).
First, type(seq)( f(x) for x in seq ) is really just type(seq)(imap(f, seq)). Why not just use that?
Second, what you're trying to do doesn't make sense in general. map takes any iterable, not just a sequence. The difference is, basically, that a sequence has a len and is randomly-accessible.
There is no rule that an iterable of type X can be constructed from values of type Y by calling type(X)(y_iter). In fact, while it's generally true for sequences, there are very few other examples for which it is true.
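Two quick counterexamples of that point, as an interpreter transcript (Python 2, to match the imap usage above):

>>> s = '12'
>>> type(s)(imap(str, s))     # str() of an iterator is its repr, not a new string
'<itertools.imap object at 0x...>'
>>> d = {1: 'one'}
>>> type(d)(imap(str, d))     # dict() wants key/value pairs, not bare keys
Traceback (most recent call last):
  ...
ValueError: dictionary update sequence element #0 has length 1; 2 is required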
If what you want is to handle a few special types specially, you can do that:
def map2(f, seq):
    it = imap(f, seq)
    if isinstance(seq, (tuple, list)):
        return type(seq)(it)
    else:
        return it
Or, if you want to assume that all sequences can be constructed this way (which is true for most built-in sequences, but consider, e.g. xrange—which wasn't designed as a sequence but does meet the protocol—and of course there are no guarantees beyond what's built in):
def map2(f, seq):
    it = imap(f, seq)
    try:
        len(seq)
    except TypeError:
        return it
    else:
        return type(seq)(it)
You could assume that any iterable type that can be constructed from an iterable is a sequence (as you suggested in your question)… but this is likely to lead to more false positives than benefits, so I wouldn't. Again, remember that len is part of the definition of being a sequence, while "constructible from an iterator" is not, and there are perfectly reasonable iterable types that will do something completely different when given an iterator.
Whatever you do is going to be a hack, because the very intention is a hack, and goes against the explicit design wishes of the Python developers. The whole point of the iterator/iterable protocol is that you should care about the type of the iterable as rarely as possible. That's why Python 3.x has gone further and replaced the list-based functions like map and filter with iterator-based functions instead.
So, how do we turn one of these transformations into a decorator?
Well, first, let's skip the decorator bit and just write a higher-order function that takes an imap-like function and returns an equivalent function with this transformation applied to it:
def sequify(func):
    def wrapped(f, seq):
        it = func(f, seq)
        try:
            len(seq)
        except TypeError:
            return it
        else:
            return type(seq)(it)
    return wrapped
So:
>>> seqmap = sequify(itertools.imap)
>>> seqmap(int, (1.2, 2.3))
(1, 2)
>>> sequify(itertools.ifilter)(lambda x: x>0, (-2, -1, 0, 1, 2))
(1, 2)
Now, how do we turn that into a decorator? Well, a function that returns a function already is a decorator. You probably want to add in functools.wraps (although you may want that even in the non-decorator case), but that's the only change. For example, I can write a generator that acts like imap, or a function that returns an iterator, and automatically transform either into a seqmap-like function:
@sequify
def map_and_discard_none(func, it):
    for elem in imap(func, it):
        if elem is not None:
            yield elem
Now:
>>> map_and_discard_none(lambda x: x*2 if x else x, (1, 2, None))
(2, 4)
This, of course, only works for functions with map-like syntax—that is, they take a function and an iterable. (Well, it will accidentally work for functions that take various kinds of wrong types—e.g., you can call sequify(itertools.count)(10, 5) and it will successfully detect that 5 isn't a sequence and therefore just pass the iterator back untouched.) To make it more general, you could do something like:
def sequify(func, type_arg=1):
    def wrapped(*args, **kwargs):
        it = func(*args, **kwargs)
        try:
            seq = args[type_arg]
            len(seq)
        except (TypeError, IndexError):
            return it
        else:
            return type(seq)(it)
    return wrapped
And now, you can go crazy with sequify(itertools.combinations, 0) or whatever you prefer. In this case, to make it a useful decorator, you probably want to go a step further:
def sequify(type_arg=1):
    def wrapper(func):
        def wrapped(*args, **kwargs):
            it = func(*args, **kwargs)
            try:
                seq = args[type_arg]
                len(seq)
            except (TypeError, IndexError):
                return it
            else:
                return type(seq)(it)
        return wrapped
    return wrapper
So you can do this:
@sequify(3)
def my_silly_function(pred, defval, extrastuff, main_iterable, other_iterable):
    ...
Your question boils down to:
Given a sequence (by which you seem to mean any Python object which supports iteration, not the narrower sequence the Python docs define) and a transformation, is there a general way to apply the transformation to each element and create a new sequence of the exact same type?
The answer is no. There is no guarantee that an iterable type can construct a new instance of itself from an iterable. Some constructors support this inherently; some do not, and an iterable type makes no guarantees about supporting the opposite operation. You would need to special-case every type you care about for which the simple pass-an-iterable-to-the-constructor approach does not work.