What method slices a string in Python? [duplicate] - python

I am trying to implement slice functionality for a class I am making that creates a vector representation.
I have this code so far, which I believe will properly implement the slice, but whenever I do something like v[4], where v is a vector, Python raises an error about not having enough arguments. So I am trying to figure out how to define the __getitem__ special method in my class to handle both plain indexes and slicing.
def __getitem__(self, start, stop, step):
    index = start
    if stop == None:
        end = start + 1
    else:
        end = stop
    if step == None:
        stride = 1
    else:
        stride = step
    return self.__data[index:end:stride]

The __getitem__() method will receive a slice object when the object is sliced. Simply look at the start, stop, and step members of the slice object in order to get the components for the slice.
>>> class C(object):
...     def __getitem__(self, val):
...         print(val)
...
>>> c = C()
>>> c[3]
3
>>> c[3:4]
slice(3, 4, None)
>>> c[3:4:-2]
slice(3, 4, -2)
>>> c[():1j:'a']
slice((), 1j, 'a')
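Applied to the vector class from the question, the idea can be sketched as follows. This is a minimal illustration, not the asker's actual class: the name Vector, the _data attribute, and the choice to return a new Vector for slices are all assumptions.

```python
class Vector:
    """Hypothetical minimal vector that wraps a list (names are assumptions)."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Let the underlying list do the slicing, then wrap the result
            # so that slicing a Vector yields a Vector.
            return Vector(self._data[key])
        if isinstance(key, int):
            # The list already handles negative indexes and raises IndexError.
            return self._data[key]
        raise TypeError("indices must be integers or slices, not %s"
                        % type(key).__name__)

    def __repr__(self):
        return "Vector(%r)" % self._data
```

With this, v[4] receives the int 4 and v[2:8:2] receives slice(2, 8, 2), both through the same single parameter.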

I have a "synthetic" list (one where the data is larger than you would want to create in memory) and my __getitem__ looks like this:
def __getitem__(self, key):
    if isinstance(key, slice):
        # Get the start, stop, and step from the slice
        return [self[ii] for ii in range(*key.indices(len(self)))]
    elif isinstance(key, int):
        if key < 0:  # Handle negative indices
            key += len(self)
        if key < 0 or key >= len(self):
            raise IndexError("The index (%d) is out of range." % key)
        return self.getData(key)  # Get the data from elsewhere
    else:
        raise TypeError("Invalid argument type.")
The slice doesn't return the same type, which is a no-no, but it works for me.
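The key call in that answer is slice.indices(length), which clamps the slice to a sequence of the given length and resolves None and negative values into concrete start, stop, and step numbers. A small sketch of what it produces (the helper name resolve is made up for illustration):

```python
def resolve(key, length):
    """Return the concrete indexes that slice `key` selects from `length` items."""
    start, stop, step = key.indices(length)
    return list(range(start, stop, step))


# A negative start is counted from the end of the sequence.
print(resolve(slice(-3, None), 10))       # [7, 8, 9]
# A stop beyond the end is clamped to the length.
print(resolve(slice(1, 100, 2), 10))      # [1, 3, 5, 7, 9]
# A reversed slice gets a stop of -1 so that index 0 is included.
print(resolve(slice(None, None, -1), 5))  # [4, 3, 2, 1, 0]
```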

How to define the getitem class to handle both plain indexes and slicing?
Slice objects get created automatically when you use a colon in the subscript notation, and that is what is passed to __getitem__. Use isinstance to check whether you have a slice object:
from __future__ import print_function
class Sliceable(object):
    def __getitem__(self, subscript):
        if isinstance(subscript, slice):
            # do your handling for a slice object:
            print(subscript.start, subscript.stop, subscript.step)
        else:
            # Do your handling for a plain index
            print(subscript)
Say we were using a range object, but we want slices to return lists instead of new range objects (which is what range itself does):
>>> range(1,100, 4)[::-1]
range(97, -3, -4)
We can't subclass range because of internal limitations, but we can delegate to it:
class Range:
    """like builtin range, but when sliced gives a list"""
    __slots__ = "_range"
    def __init__(self, *args):
        self._range = range(*args) # takes no keyword arguments.
    def __getattr__(self, name):
        return getattr(self._range, name)
    def __getitem__(self, subscript):
        result = self._range.__getitem__(subscript)
        if isinstance(subscript, slice):
            return list(result)
        else:
            return result
r = Range(100)
We don't have a perfectly replaceable Range object, but it's fairly close:
>>> r[1:3]
[1, 2]
>>> r[1]
1
>>> 2 in r
True
>>> r.count(3)
1
To better understand the slice notation, here's example usage of Sliceable:
>>> sliceme = Sliceable()
>>> sliceme[1]
1
>>> sliceme[2]
2
>>> sliceme[:]
None None None
>>> sliceme[1:]
1 None None
>>> sliceme[1:2]
1 2 None
>>> sliceme[1:2:3]
1 2 3
>>> sliceme[:2:3]
None 2 3
>>> sliceme[::3]
None None 3
>>> sliceme[::]
None None None
>>> sliceme[:]
None None None
Python 2, be aware:
In Python 2, there's a deprecated method that you may need to override when subclassing some builtin types.
From the datamodel documentation:
object.__getslice__(self, i, j)
Deprecated since version 2.0: Support slice objects as parameters to the __getitem__() method. (However, built-in types in CPython currently still implement __getslice__(). Therefore, you have to override it in derived classes when implementing slicing.)
This is gone in Python 3.

To extend Aaron's answer, for things like numpy, you can do multi-dimensional slicing by checking to see if given is a tuple:
class Sliceable(object):
    def __getitem__(self, given):
        if isinstance(given, slice):
            # do your handling for a slice object:
            print("slice", given.start, given.stop, given.step)
        elif isinstance(given, tuple):
            print("multidim", given)
        else:
            # Do your handling for a plain index
            print("plain", given)
sliceme = Sliceable()
sliceme[1]
sliceme[::]
sliceme[1:, ::2]
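Output (as printed by Python 2, where print with parentheses prints a tuple; Python 3 prints the same values separated by spaces):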
('plain', 1)
('slice', None, None, None)
('multidim', (slice(1, None, None), slice(None, None, 2)))
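To show the tuple case doing real work, here is a rough sketch of a two-dimensional container that accepts grid[row, col] with either ints or slices in each position. The Grid class and its _rows attribute are invented for this example and are not how numpy implements indexing.

```python
class Grid:
    """Hypothetical 2-D container stored as a list of rows."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def __getitem__(self, given):
        if isinstance(given, tuple):
            if len(given) != 2:
                raise IndexError("expected 2 indices, got %d" % len(given))
            row_key, col_key = given
            rows = self._rows[row_key]
            if isinstance(row_key, slice):
                # Several rows were selected: apply the column key to each one.
                return [row[col_key] for row in rows]
            # A single row was selected: index (or slice) into it directly.
            return rows[col_key]
        # A plain index or a single slice selects whole rows.
        return self._rows[given]


g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(g[1, 2])       # 6
print(g[1:, ::2])    # [[4, 6], [7, 9]]
print(g[:, 0])       # [1, 4, 7]
```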

The correct way to do this is to have __getitem__ take one parameter, which can either be a number or a slice object.

Related

python how to make "for-iteration" stop at "len" using dunder methods

I have several Python classes I use for calling C code, using ctypes. The returned struct looks something like the example below.
import ctypes
class MyCClass(ctypes.Structure):
    _fields_ = [('n_values', ctypes.c_int),
                ('values', ctypes.c_double * 5)]
    def __repr__(self):
        return """n_values : {0}, values : {1}""".format(self.n_values,
                                                         self.values)
    def __len__(self):
        return self.n_values
    def __getitem__(self, key):
        return self.values[key]
The values array is fixed size to ease the call to C (using a variable size array here is not an option). The "actual" length of the array is controlled by the n_values variable.
For instance, if values holds three numbers, say 1, 2 and 3, then values=[1, 2, 3, 0, 0] and n_values=3.
This is all fine. The problem arises when I implement __len__ and __getitem__.
I want to be able to write code like this
for value in my_class:
    # do something
But the iterator does not seem to "get" that the values-array is only n_values long. I.e. it does not seem to use MyCClass.__len__ to halt the iteration. Instead it seems to iterate over the full length of values.
my_class = MyCClass()
my_class.n_values = 3
sample_values = [1, 2, 3]
for i in range(3):
    my_class.values[i] = sample_values[i]
i = 0
for value in my_class:
    print(i)
    i += 1
0
1
2
3
4
I want
i = 0
for value in my_class:
    print(i)
    i += 1
0
1
2
I know I can code
for i in range(len(my_class)):
    # do something with my_class[i]
but that is not what I want.
Does anybody know how to fix this?
With the old-style sequence protocol (where iteration calls __getitem__ with 0, 1, 2, ... and ignores __len__), the only way to stop is to raise an IndexError:
def __getitem__(self, key):
    if key >= len(self):
        raise IndexError
    return self.values[key]
For a cleaner solution, consider using the more modern iteration protocol, i.e. returning an iterator instance from an __iter__ method defined on your iterable. That's covered in the Python documentation on iterator types.
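A minimal sketch of the __iter__ approach, reusing the structure from the question. Writing __iter__ as a generator is one possible choice among several; the field names come from the question, everything else is an assumption.

```python
import ctypes


class MyCClass(ctypes.Structure):
    _fields_ = [('n_values', ctypes.c_int),
                ('values', ctypes.c_double * 5)]

    def __len__(self):
        return self.n_values

    def __getitem__(self, key):
        return self.values[key]

    def __iter__(self):
        # A for loop prefers __iter__ over __getitem__, so iteration now
        # stops after the first n_values entries of the fixed-size array.
        for i in range(self.n_values):
            yield self.values[i]


my_class = MyCClass()
my_class.n_values = 3
for i, v in enumerate([1, 2, 3]):
    my_class.values[i] = v

print(list(my_class))  # [1.0, 2.0, 3.0]
```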

Does Python have a shorthand notation for passing part of a list without copy?

Suppose I convert the pseudocode below to Python. Regarding specifically the parameter indicated as 1st half of A, does Python have a mechanism like A[1..n/2] (another pseudocode shortcut I see from time to time) that does not require a copy when passing part of a list as a parameter?
Count(array A, length n)
    if n = 1 return 0
    else
        x = Count(1st half of A, n/2)
        y = Count(2nd half of A, n/2)
        return x + y
Without such a mechanism I will pass indices as necessary.
The answer is no. You'll have to pass indices (or slice objects).
You could also write a list subclass that handles slices by returning "views" into the original list rather than copies. I've actually tackled this a few times and found it tricky to get completely right, but it's made much easier by the fact that your application doesn't need negative indexing, slice assignment, or the step parameter. Here's a quick try:
class ListWithAView(list):
    class ListView(object):
        def __init__(self, list, start, stop, step):
            self.list = list
            self.start = start
            self.stop = stop
            self.step = step
        def __iter__(self):
            for i in range(self.start, self.stop, self.step):
                yield self.list[i]
        def __len__(self):
            # Number of items covered by the view (always an int)
            return len(range(self.start, self.stop, self.step))
        def __getitem__(self, i):
            if isinstance(i, slice):
                # Translate view-relative bounds into positions in the underlying list
                start = self.start + (i.start or 0) * self.step
                stop = (self.stop if i.stop is None
                        else min(self.start + i.stop * self.step, self.stop))
                return type(self)(self.list, start, stop, (i.step or 1) * self.step)
            if isinstance(i, int) and 0 <= i < len(self):
                return self.list[self.start + i * self.step]
            raise IndexError("invalid index: %r" % i)
        def __setitem__(self, i, v):
            if isinstance(i, int) and 0 <= i < len(self):
                self.list[self.start + i * self.step] = v
            else:
                raise IndexError("invalid index: %r" % i)
        def __repr__(self):
            return "<slice [%s:%s:%s] of list id 0x%08x>: %s" % (
                self.start, self.stop, self.step, id(self.list), list(self))
        def __str__(self):
            return str(list(self))
    @property
    def view(self):
        return self.ListView(self, 0, len(self), 1)
The view property of this list subclass returns a ListView object that acts much like a list, but gets and sets the data in the underlying list rather than storing any items itself. The returned object initially refers to the entire list but can be sliced further if desired. For simplicity, negative steps aren't handled, and you can't do slice assignment, just single items.
Quick demo:
seq = ListWithAView(range(100))
view = seq.view[10:20][5:7]
view[0] = 1337
print(seq[15]) # 1337
You can sort of use slice objects here, but unfortunately they have no __len__ method, so you have to compute the length yourself as s.stop - s.start and the midpoint as (s.start + s.stop) // 2. Any time you wish to "materialise" the subarray (which of course creates a copy), you can use A[s].
def count(A, s=None):
    if s is None:
        s = slice(0, len(A))
    if s.start + 1 == s.stop:
        return 1
    else:
        # Integer division keeps the slice bounds as ints on Python 3
        x = count(A, slice(s.start, (s.start + s.stop) // 2))
        y = count(A, slice((s.start + s.stop) // 2, s.stop))
        return x + y
print(count([1, 2, 3, 4, 5]))
In your example, the best solution is to just pass the list and the indices as you suggested.
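As a rough sketch of the index-passing style, the recursion below sums a list by halving an index range rather than slicing. The function name total and the summing step are stand-ins chosen for illustration, since the pseudocode does not show what Count actually combines.

```python
def total(A, lo=0, hi=None):
    """Sum A[lo:hi] by divide and conquer, recursing on index bounds only."""
    if hi is None:
        hi = len(A)
    if hi - lo == 0:
        return 0
    if hi - lo == 1:
        return A[lo]
    mid = (lo + hi) // 2
    # Both halves refer to the same list object; no sublist is created.
    return total(A, lo, mid) + total(A, mid, hi)


print(total([1, 2, 3, 4, 5]))  # 15
```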
If you didn't need to index into the slices (for example, if just having iterators over the first and second halves of the list was sufficient), you could use the islice function from itertools. E.g.
from itertools import islice
half = (len(sequence) + 1) // 2
first_half = islice(sequence, half)
second_half = islice(sequence, half, len(sequence))

Calling a class for testing - Python

First off this is a homework assignment I'm working on, but I really just need help on an error.
So the project is to implement a vector (a list in all but name for this project), using the Array class. The array class I'm using can be found here.
My error is that every time I try to call my code to test it, specifically the getitem and setitem functions, I wind up with an error stating:
builtins.TypeError: 'type' object does not support item assignment
Below is the class I'm currently building, (so far it seems that only len and contains are working).
class Vector:
    """Vector ADT
    Creates a mutable sequence type that is similar to Python's list type."""
    def __init__(self):
        """Constructs a new empty vector with initial capacity of two elements"""
        self._vector = Array(2)
        self._capacity = 2
        self._len = 0
    def __len__(self):
        """Returns the number of items in the vector"""
        return self._len
    def __contains__(self, item):
        """Determines if the given item is stored in the vector"""
        if item in self._vector:
            return True
        else:
            return False
    def __getitem__(self, ndx):
        """Returns the item in the index element of the list, must be within the
        valid range"""
        assert ndx >= 0 and ndx <= self._capacity - 1, "Array subscript out of range"
        return self._vector[ndx]
    def __setitem__(self, ndx, item):
        """Sets the elements at position index to contain the given item. The
        value of index must be within a valid range"""
        assert ndx >= 0 and ndx <= self._capacity - 1, "Array subscript out of range"
        self._vector[ndx] = item
    def append(self, item):
        """Adds the given item to the list"""
        if self._len < self._capacity:
            self._vector[self._len] = item
            self._len += 1
I'm trying to call the code by either typing:
Vector()[i] = item
or
Vector[i] = item
However, trying:
Vector[i] = item
Gives me the error, and:
Vector()[i] = item
Doesn't really seem to do anything other than not cause an error.
You need to create an instance of your Vector class. Try:
vector = Vector()
vector[0] = 42
The error means that you are trying erroneously to assign to the Vector class itself, which does not make much sense.
Try using the replace method instead of assigning a value.
Vector is a class; Vector() creates an instance of that class.
So
Vector[i] = item
gives an error: Vector.__setitem__ is an instance method (runs against an instance of a class, ie an object), not a classmethod (runs against a class). (You could in theory make it a classmethod, but I have trouble picturing a use case where that would make sense.)
On the other hand,
Vector()[i] = item
# 1. creates a Vector() object
# 2. calls {new_object}.__setitem__(i, item)
# 3. doesn't keep any reference to {new_object}, so
# (a) you have no way to interact with it any more and
# (b) it will be garbage-collected shortly.
Try
v = Vector()
v[i] = item
print(item in v) # => True

Python set intersections, any way to return elements from the larger set?

When Python takes the intersection of two sets, it always returns elements from the smaller one, which is reasonable in nearly all cases, but I'm trying to do the opposite.
In the piece of code below, note that the intersection yields an integer, not a float.
[in] >>> x = {1.0,2.0,3.0}
[in] >>> y = {1}
[in] >>> x.intersection(y)
[out] >>> {1}
[in] >>> y.intersection(x)
[out] >>> {1}
If I want to get a float back, I have to use some heavy copying.
[in] >>> x - y
[out] >>> {2.0,3.0}
[in] >>> x - (x - y)
[out] >>> {1.0}
I'm dealing with much larger sets than the example above. My question is whether there's any way to trick Python's set.intersection method into returning elements from the larger set, or whether there is another method that can return the float 1.0 besides what I've done here.
The reason why I'm doing this in the first place is I'm trying to implement a frozen dictionary in pure python by sub-classing frozenset. I'm storing the key-value pairs using a subclass of tuple I call "Item" where hash returns the hash of the key only. Using the code below, I'm able to create a set with a single key-value pair inside of it. Then I extract the attribute "value" and return it.
def __getitem__(self, key):
    wrapped = Item((key,),flag=False)
    if not frozenset.__contains__(self, wrapped):
        raise KeyError(key)
    matches = self - (self - {wrapped})
    for pair in frozenset.__iter__(matches):
        return pair.value
I know that the copying is the reason for the slowness because when I try to return an item whose key is not in the dictionary, I get a KeyError immediately, even for sets with 10 million items.
At the risk of answering something different than what you actually asked for (but maybe helping with the end-goal)... a Frozen dict is actually really easy to implement in python:
from collections.abc import Mapping
class FrozenDict(Mapping):
    def __init__(self, *args, **kwargs):
        self._hash = None # defer calculating hash until needed.
        self._data = dict(*args, **kwargs)
    def __getitem__(self, item):
        return self._data[item]
    def __len__(self):
        return len(self._data)
    def __iter__(self):
        return iter(self._data)
    def __repr__(self):
        return '{}({!r})'.format(type(self).__name__, self._data)
    def __hash__(self):
        if self._hash is None:
            # Only hashable if the items are hashable. A frozenset makes the
            # hash independent of insertion order, matching Mapping equality.
            self._hash = hash(frozenset(self.items()))
        return self._hash
x = FrozenDict({'a': 'b'})
print(x)
x['c'] = 'Bad Bad Bad' # raises TypeError: item assignment is not supported
Of course, this isn't truly frozen (in the same sense that a frozenset is frozen). A user could reach in and modify the data on the frozendict -- But then they deserve any code breakages that they cause.
To answer your actual question, the only alternative that I can think of is to define your own intersection function:
>>> s1 = {1}
>>> s2 = {1., 2.}
>>> def intersection(s1, s2):
...     return set(x for x in s1 if x in s2)
...
>>> intersection(s1, s2)
{1}
>>> intersection(s2, s1)
{1.0}
This one always returns sets, but you could easily modify to return frozenset or the type of the input if you make the assumption that the type of the first input has a constructor that accepts only an iterable:
def intersection(s1, s2):
    output_type = type(s1)
    return output_type(x for x in s1 if x in s2)
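The comprehension above walks every element of the first set, which is slow when that set is the large one. One possible workaround, not taken from the answers here, is to keep a dict that maps each element of the large set to itself; a lookup with any equal key then returns the stored object in constant time. The helper names below are made up, and the extra dict costs memory proportional to the large set.

```python
def build_lookup(elements):
    """Map each element to itself so the stored object can be fetched by an equal key."""
    return {x: x for x in elements}


def intersection_keep(lookup, small):
    """Iterate over the small set, but return the objects stored in `lookup`."""
    return {lookup[k] for k in small if k in lookup}


large = {1.0, 2.0, 3.0}
lookup = build_lookup(large)

print(lookup[1])                          # 1.0 (1 == 1.0 and they hash alike)
print(intersection_keep(lookup, {1, 5}))  # {1.0}
```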

Possible to use more than one argument on __getitem__?

I am trying to use
__getitem__(self, x, y):
on my Matrix class, but it seems to me it doesn't work (I still don't know very well to use python).
I'm calling it like this:
print matrix[0,0]
Is it possible at all to use more than one argument? Thanks. Maybe I can use only one argument but pass it as a tuple?
__getitem__ only accepts one argument (other than self), so you get passed a tuple.
You can do this:
class matrix:
    def __getitem__(self, pos):
        x, y = pos
        return "fetching %s, %s" % (x, y)
m = matrix()
print(m[1, 2])
outputs
fetching 1, 2
See the documentation for object.__getitem__ for more information.
Indeed, when you execute bla[x,y], you're calling type(bla).__getitem__(bla, (x, y)) -- Python automatically forms the tuple for you and passes it on to __getitem__ as the second argument (the first one being its self). There's no good way[1] to express that __getitem__ wants more arguments, but also no need to.
[1] In Python 2.* you can actually give __getitem__ an auto-unpacking signature which will raise ValueError or TypeError when you're indexing with too many or too few indices...:
>>> class X(object):
...     def __getitem__(self, (x, y)): return x, y
...
>>> x = X()
>>> x[23, 45]
(23, 45)
Whether that's "a good way" is moot... tuple parameter unpacking has been removed in Python 3, so you can infer that Guido didn't consider it good upon long reflection;-). Doing your own unpacking (of a single argument in the signature) is no big deal and lets you provide clearer errors (and uniform ones, rather than ones of different types for the very similar error of indexing such an instance with 1 vs, say, 3 indices;-).
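A small sketch of doing the unpacking by hand, with one uniform error for any wrong number of indices. The Matrix class, its _rows storage, and the error message are assumptions made for the example.

```python
class Matrix:
    """Hypothetical 2-D matrix stored as a list of rows."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def __getitem__(self, pos):
        # m[x, y] arrives here as the single tuple (x, y).
        if not isinstance(pos, tuple) or len(pos) != 2:
            raise TypeError("Matrix indices must be a pair: m[row, col]")
        row, col = pos
        return self._rows[row][col]


m = Matrix([[1, 2], [3, 4]])
print(m[1, 0])  # 3
```

Indexing with m[0] or m[0, 1, 1] raises the same TypeError, instead of different exceptions depending on how many indices were given.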
No, __getitem__ just takes one argument (in addition to self). In the case of matrix[0, 0], the argument is the tuple (0, 0).
You can directly call __getitem__ instead of using brackets.
Example:
class Foo():
    def __init__(self):
        self.a = [5, 7, 9]
    def __getitem__(self, i, plus_one=False):
        if plus_one:
            i += 1
        return self.a[i]
foo = Foo()
foo[0] # 5
foo.__getitem__(0) # 5
foo.__getitem__(0, True) # 7
I learned today that you can chain two indexes on an object that implements __getitem__: the first index returns the inner list, which is then indexed again, as the following snippet illustrates:
class MyClass:
    def __init__(self):
        self.data = [[1]]
    def __getitem__(self, index):
        return self.data[index]
c = MyClass()
print(c[0][0])
