I have a question about how to create a sublist (I hope this is the right term) from a given list without copying.
Slicing can create sublists, but it does so by copying. Here is an example.
In [1]: a = [1,2,3]
In [2]: id(a)
Out[2]: 4354651128
In [3]: b = a[0:2]
In [4]: b
Out[4]: [1, 2]
In [5]: id(b)
Out[5]: 4354621312
In [6]: id(a[0:2])
Out[6]: 4354620880
Here the ids of b and a[0:2] are different, although their values are the same. To double-check, change a value in a; the value in b does not change.
In [7]: a[1] = 4
In [8]: a
Out[8]: [1, 4, 3]
In [9]: b
Out[9]: [1, 2]
So, to get back to my question: how can I create sublists without copying? That is, when a[1] is set to 4, b should become [1, 4].
I searched around and did not find much help (maybe I am not using the right keywords). Thank you!
Edits:
Thank you all for your comments and answers! Here is what I have learned.
There is no built-in way in Python to create a view of a list (or to create a sublist without copying).
The easiest way to do this is to use a NumPy array.
Although a NumPy array has limitations on data types compared with a list, it does serve my purpose (implementing quicksort with no extra memory).
Here is the same process with a NumPy array.
In [1]: import numpy as np
In [2]: a = np.arange(1,4)
In [3]: a
Out[3]: array([1, 2, 3])
In [4]: b = a[0:2]
In [5]: b
Out[5]: array([1, 2])
In [6]: id(b)
Out[6]: 4361253952
In [7]: id(a[0:2])
Out[7]: 4361254032
In [8]: a[1] = 4
In [9]: a
Out[9]: array([1, 4, 3])
In [10]: b
Out[10]: array([1, 4])
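For the quicksort use case mentioned above, views are not strictly required: a plain list can be partitioned in place by passing index bounds around instead of sublists. The following is only a minimal sketch, assuming a Lomuto partition scheme; the names quicksort and _partition are illustrative and not from this thread, and the recursion stack still uses some memory.

```python
def _partition(items, lo, hi):
    """Lomuto partition of items[lo..hi] (inclusive); returns the pivot's final index."""
    pivot = items[hi]
    i = lo
    for j in range(lo, hi):
        if items[j] < pivot:
            items[i], items[j] = items[j], items[i]
            i += 1
    items[i], items[hi] = items[hi], items[i]
    return i


def quicksort(items, lo=0, hi=None):
    """Sort items in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(items) - 1
    if lo < hi:
        p = _partition(items, lo, hi)
        quicksort(items, lo, p - 1)   # no slicing: only the bounds change
        quicksort(items, p + 1, hi)


data = [3, 6, 1, 8, 2, 9, 2]
quicksort(data)
print(data)  # [1, 2, 2, 3, 6, 8, 9]
```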
numpy's array objects support this notion of creating interdependent sub-lists, by having slicing return views rather than copies of the data.
Altering the original numpy array will alter the views created from the array, and changes to any of the views will also be reflected in the original array. Especially for large data sets, views are a great way of cutting data in different ways, while saving on memory.
>>> import numpy as np
>>> array1 = np.array([1, 2, 3, 4])
>>> view1 = array1[1:]
>>> view1
array([2, 3, 4])
>>> view1[1] = 5
>>> view1
array([2, 5, 4])
>>> array1
array([1, 2, 5, 4]) # Notice that the change to view1 has been reflected in array1
For further reference, see the numpy documentation on views as well as this SO post.
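If it is unclear whether a given array is a view or a copy, one way to check is np.shares_memory together with the .base attribute. A small sketch (the variable names are illustrative only):

```python
import numpy as np

array1 = np.array([1, 2, 3, 4])
view1 = array1[1:]           # basic slicing gives a view
copy1 = array1[1:].copy()    # an explicit copy owns its own data

print(np.shares_memory(array1, view1))  # True
print(np.shares_memory(array1, copy1))  # False
print(view1.base is array1)             # True

view1[0] = 20                # write through the view
print(array1.tolist())       # [1, 20, 3, 4]
print(copy1.tolist())        # [2, 3, 4]
```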
There is no way to do this with built in Python data structures. However, I created a class that does what you need. I don't guarantee it to be bug-free, but it should get you started.
from itertools import islice


class SubLister(object):
    """A window onto a shared base list; writes go through to the base."""

    def __init__(self, base=None, start=0, end=None):
        # Avoid a mutable default argument: each instance gets its own list.
        self._base = base if base is not None else []
        self._start = start
        self._end = end

    def __len__(self):
        if self._end is None:
            return len(self._base) - self._start
        return self._end - self._start

    def __getitem__(self, index):
        self._check_end_range(index)
        return self._base[index + self._start]

    def __setitem__(self, index, value):
        self._check_end_range(index, "list assignment index out of range")
        self._base[index + self._start] = value

    def __delitem__(self, index):
        self._check_end_range(index, "list assignment index out of range")
        del self._base[index + self._start]

    def __iter__(self):
        return islice(self._base, self._start, self._end)

    def __str__(self):
        return str(self._base[self._start:self._end])

    def __repr__(self):
        return repr(self._base[self._start:self._end])

    # ...etc...

    def get_sublist(self, start=0, end=None):
        # start and end are indices into the shared base list.
        return SubLister(base=self._base, start=start, end=end)

    def _check_end_range(self, index, msg="list index out of range"):
        if self._end is not None and index >= self._end - self._start:
            raise IndexError(msg)
Example:
>>> from sublister import SubLister
>>> base = SubLister([1, 2, 3, 4, 5])
>>> a = base.get_sublist(0, 2)
>>> b = base.get_sublist(1)
>>> base
[1, 2, 3, 4, 5]
>>> a
[1, 2]
>>> b
[2, 3, 4, 5]
>>> len(base)
5
>>> len(a)
2
>>> len(b)
4
>>> base[1] = 'ref'
>>> base
[1, 'ref', 3, 4, 5]
>>> a
[1, 'ref']
>>> b
['ref', 3, 4, 5]
You can't, if you slice a to get b. From the Python tutorial [1]:
"All slice operations return a new list containing the requested elements. This means that the following slice returns a new (shallow) copy of the list."
[1] https://docs.python.org/2/tutorial/introduction.html
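The word "shallow" in the quoted passage matters when the list holds mutable objects: the slice is a new outer list, but its elements are the same objects. A small illustration (the nested data is made up for this example):

```python
a = [[1, 2], [3, 4], [5, 6]]
b = a[0:2]            # new outer list, same inner list objects

a[0] = ["replaced"]   # rebinding a slot of a does not affect b
print(b)              # [[1, 2], [3, 4]]

a[1].append(99)       # mutating a shared inner list is visible through b
print(b)              # [[1, 2], [3, 4, 99]]
print(b[1] is a[1])   # True
```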
There is no built-in way to do this. You could create your own list-like class that takes a reference to a list and reimplements all of the list accessor methods to operate on it.
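One possible shape for such a list-like class is sketched below, assuming Python 3 and collections.abc.MutableSequence, which supplies iteration, append, index and similar methods once the five methods shown are defined. The name ListView is hypothetical, only integer indices are supported, and the window is not adjusted if the base list is resized from outside the view.

```python
from collections.abc import MutableSequence


class ListView(MutableSequence):
    """A window onto base[start:stop] that shares storage with base."""

    def __init__(self, base, start=0, stop=None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def _real_index(self, index):
        # Translate a view index (negative allowed) into an index into the base list.
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indices only")
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("view index out of range")
        return self._start + index

    def __getitem__(self, index):
        return self._base[self._real_index(index)]

    def __setitem__(self, index, value):
        self._base[self._real_index(index)] = value

    def __delitem__(self, index):
        del self._base[self._real_index(index)]
        self._stop -= 1

    def insert(self, index, value):
        # Clamp like list.insert, then grow the window by one.
        if index < 0:
            index += len(self)
        index = max(0, min(len(self), index))
        self._base.insert(self._start + index, value)
        self._stop += 1

    def __repr__(self):
        return repr(self._base[self._start:self._stop])


a = [1, 2, 3]
b = ListView(a, 0, 2)
a[1] = 4
print(b)   # [1, 4]
b[0] = 10
print(a)   # [10, 4, 3]
```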
Is there any way to get the indices of several elements in a NumPy array at once?
E.g.
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
I would like to find the index of each element of a in b, namely: [0,1,4].
I find the solution I am using a bit verbose:
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
c = np.zeros_like(a)
for i, aa in np.ndenumerate(a):
    c[i] = np.where(b == aa)[0][0]  # first matching index in b
print('c: {0}'.format(c))
Output:
c: [0 1 4]
You could use in1d and nonzero (or where for that matter):
>>> np.in1d(b, a).nonzero()[0]
array([0, 1, 4])
This works fine for your example arrays, but in general the array of returned indices does not honour the order of the values in a. This may be a problem depending on what you want to do next.
In that case, a much better answer is the one @Jaime gives here, using searchsorted:
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([0, 1, 4])
This returns the indices for values as they appear in a. For instance:
>>> a = np.array([1, 2, 4])
>>> b = np.array([4, 2, 3, 1])
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([3, 1, 0]) # the other method would return [0, 1, 3]
This is a simple one-liner using the numpy-indexed package (disclaimer: I am its author):
import numpy_indexed as npi
idx = npi.indices(b, a)
The implementation is fully vectorized, and it gives you control over the handling of missing values. Moreover, it works for nd-arrays as well (for instance, finding the indices of rows of a in b).
All of the solutions here recommend using a linear search. You can use np.argsort and np.searchsorted to speed things up dramatically for large arrays:
sorter = b.argsort()
i = sorter[np.searchsorted(b, a, sorter=sorter)]
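One caveat with searchsorted is that it returns an insertion position even for values that do not occur in b, so the result needs a check if missing values are possible. A hedged sketch follows; the helper name indices_of and the -1 sentinel are my own choices, and it assumes b is non-empty.

```python
import numpy as np


def indices_of(values, target):
    """For each element of values, return its index in target, or -1 if absent."""
    values = np.asarray(values)
    target = np.asarray(target)
    sorter = np.argsort(target, kind="stable")
    # Position of each value within the sorted order of target.
    pos = np.searchsorted(target, values, sorter=sorter)
    # Values larger than everything in target land at len(target); clip before indexing.
    pos = np.clip(pos, 0, len(target) - 1)
    candidates = sorter[pos]
    found = target[candidates] == values
    return np.where(found, candidates, -1)


a = np.array([1, 2, 4, 7])
b = np.array([4, 2, 3, 1])
print(indices_of(a, b).tolist())  # [3, 1, 0, -1]
```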
For an order-agnostic solution, you can use np.flatnonzero with np.isin (v 1.13+).
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
res = np.flatnonzero(np.isin(b, a))  # NumPy v1.13+
res = np.flatnonzero(np.in1d(b, a))  # earlier versions
# array([0, 1, 4], dtype=int64)
There are a bunch of approaches for getting the index of multiple items at once mentioned in passing in answers to this related question: Is there a NumPy function to return the first index of something in an array?. The wide variety and creativity of the answers suggests there is no single best practice, so if your code above works and is easy to understand, I'd say keep it.
I personally found this approach to be both performant and easy to read: https://stackoverflow.com/a/23994923/3823857
Adapting it for your example:
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
indices = [b_list.index(x) for x in a]
vals_at_indices = b_array[indices]
I personally like adding a little bit of error handling in case a value in a does not exist in b.
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
b_set = set(b_list)
indices = [b_list.index(x) if x in b_set else np.nan for x in a]
vals_at_indices = b_array[indices]  # fails with IndexError if indices contains np.nan (a missing value)
For my use case, it's pretty fast, since it relies on parts of Python that are fast (list comprehensions, .index(), sets, numpy indexing). Would still love to see something that's a NumPy equivalent to VLOOKUP, or even a Pandas merge. But this seems to work for now.
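Something closer to a VLOOKUP can be approximated with a plain dictionary, which avoids the repeated linear scans of list.index. This is only a sketch under my own assumptions: the first occurrence wins for duplicates, and -1 marks values that are not found.

```python
import numpy as np

a = np.array([1, 2, 4, 7])
b_list = [1, 2, 3, 10, 4]

# Map each value to the index of its first occurrence in b_list.
first_index = {}
for i, value in enumerate(b_list):
    first_index.setdefault(value, i)

# -1 marks values of a that do not occur in b_list.
indices = np.array([first_index.get(x, -1) for x in a.tolist()])
print(indices.tolist())  # [0, 1, 4, -1]

# Keep only the hits before using the result to index an array.
found = indices >= 0
print(np.array(b_list)[indices[found]].tolist())  # [1, 2, 4]
```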
I have array2D = [[1,2,3],[4,5,6]]. What I want is a function which takes an index and returns the elements in 1D array.
Example: fn(0) -> returns [1,4]
fn(1) -> returns [2,5]
I need a fast way to do this.
You can use a lambda and a list comprehension:
array2D = [[1,2,3],[4,5,6]]
fn = lambda x: [item[x] for item in array2D]
print(fn(0)) # [1, 4]
print(fn(1)) # [2, 5]
print(fn(2)) # [3, 6]
As suggested in the comments, you can apply the same concept with a function definition:
def fn(x): return [item[x] for item in array2D]
print(fn(0)) # [1, 4]
print(fn(1)) # [2, 5]
print(fn(2)) # [3, 6]
Lambda functions are pretty useful and let you define an operation concisely.
In our example, the lambda accepts a variable x, which represents the index we want from each item in array2D.
List comprehensions, like lambda functions, are a really powerful tool and a must in Python.
In this situation you should prefer the function definition, as suggested by PEP 8.
The following list comprehension will work:
def fn(i, lst):
    return [sublst[i] for sublst in lst]
>>> array2D = [[1, 2, 3], [4, 5, 6]]
>>> fn(0, array2D)
[1, 4]
>>> fn(1, array2D)
[2, 5]
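If many columns are requested from the same data, it may be cheaper to transpose once with zip and then look columns up by index. A minimal sketch; note that columns is a snapshot that will not follow later changes to array2D, and that zip truncates to the shortest row.

```python
array2D = [[1, 2, 3], [4, 5, 6]]

# zip(*rows) pairs up the i-th elements of every row, i.e. it yields the columns.
columns = [list(col) for col in zip(*array2D)]


def fn(i):
    return columns[i]


print(fn(0))  # [1, 4]
print(fn(1))  # [2, 5]
print(fn(2))  # [3, 6]
```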
You can use operator.itemgetter:
array2D = [[1,2,3],[4,5,6]]
from operator import itemgetter
def fn(x, k):
    return list(map(itemgetter(k), x))
fn(array2D, 0) # [1, 4]
If you want to define new functions for retrieving a specific index, you can do so via functools.partial:
from functools import partial
def fn(x, k):
    return list(map(itemgetter(k), x))
get_zero_index = partial(fn, k=0)
get_zero_index(array2D) # [1, 4]
Here are my two cents using slicing (I have to use additional np.array() for this because your original data was a list):
import numpy as np
array2D = np.array([[1,2,3],[4,5,6]])
def fn(n): return (list(array2D[:,n]))
print (fn(0), fn(1), fn(2))
How about a generator?
We could use zip to pack them, then create an empty list to store the generated data:
class myZip(object):
    __slots__ = ('zipData', 'interList')

    def __init__(self, *args):
        self.zipData = zip(*args)
        self.interList = []

    def __call__(self, index):
        try:
            return self.interList[index]
        except IndexError:
            try:
                if index == 0:
                    self.interList.append(next(self.zipData))
                    return self.interList[index]
                for i in range(index-(len(self.interList)-1)):
                    self.interList.append(next(self.zipData))
                return self.interList[index]
            except StopIteration:
                raise IndexError("index out of range")

    def __iter__(self):
        for i in self.interList:
            yield i
        for i in self.zipData:
            yield i
array2D = [[1,2,3],[4,5,6]]
a = myZip(*array2D)
print(a(2))
print(a(1))
print(a(0))
---
(3, 6)
(2, 5)
(1, 4)
The benefit of this is that we do not need to produce all the data at once.
I have an original list whose contents are determined in another function, and I wish to add the numbers 0 and 5 to the list to make an extended list, without corrupting the original. In this application, I know that 0 and 5 will never be part of the original list, so I am not concerned with duplication. And I am not concerned with the order of the elements either.
For reasons discussed in another question, the following does not work because it corrupts the original list:
>>> orig = [1,6]
>>> extended = orig
>>> extended.extend([0,5])
>>> extended
[1, 6, 0, 5]
>>> orig
[1, 6, 0, 5]
One of the solutions proposed is to use the built-in list() function. This produces the desired result:
>>> orig = [1,6]
>>> extended = list(orig)
>>> extended.extend([0,5])
>>> extended
[1, 6, 0, 5]
>>> orig
[1, 6]
Then I attempted to combine the 2nd and 3rd lines of example 2. This produces None, which is shown only if you print it.
>>> orig = [1,6]
>>> extended = list(orig).extend([0,5])
>>> extended
>>> print extended
None
What I eventually coded, which is neater than any of the previous attempts, is this, using concatenation.
>>> orig = [1,6]
>>> extended = orig + [0,5]
>>> extended
[1, 6, 0, 5]
>>> orig
[1, 6]
But my question is, why won't example 3 work? It looks reasonable (to me), and it doesn't return an error. It just produces 'None'.
I am using Python 2.7.8.
extend is an in-place operation: like list.sort and list.append, it modifies the list it is called on. Because these methods have no value to return, they return None, so with extended = list(orig).extend([0,5]) you are simply seeing the return value of extend.
In [6]: l = [1,2,3]
In [7]: e = l.extend([4,5])
In [8]: print e
None
In [9]: l = [1,2,3]
In [10]: a = l.append(6)
In [11]: print a
None
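To put the failing and the working forms from the question side by side, here is a short sketch (written with print() so that it runs on Python 3 as well):

```python
orig = [1, 6]

# Chaining: the temporary copy is extended, then discarded; only None is kept.
result = list(orig).extend([0, 5])
print(result)      # None

# Two steps: keep a name for the copy, then mutate it.
extended = list(orig)
extended.extend([0, 5])
print(extended)    # [1, 6, 0, 5]

# One expression: concatenation builds and returns a new list.
extended = orig + [0, 5]
print(extended)    # [1, 6, 0, 5]
print(orig)        # [1, 6]
```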
I can perform
a = [1,2,3]
b = [4,5,6]
a.extend(b)
# a is now [1,2,3,4,5,6]
Is there a way to extend a list by adding the new items at the beginning?
Like this
a = [1,2,3]
b = [4,5,6]
a.someaction(b)
# a is now [4,5,6,1,2,3]
I use version 2.7.5, if it is important.
You can assign to a slice:
a[:0] = b
Demo:
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a[:0] = b
>>> a
[4, 5, 6, 1, 2, 3]
Essentially, list.extend() is an assignment to the list[len(list):] slice.
You can 'insert' another list at any position, just address the empty slice at that location:
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a[1:1] = b
>>> a
[1, 4, 5, 6, 2, 3]
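The equivalence with extend, and the difference between an empty and a non-empty target slice, can be checked directly. A small sketch with made-up values:

```python
a = [1, 2, 3]
b = [1, 2, 3]

a.extend([7, 8])
b[len(b):] = [7, 8]      # assignment to the empty slice at the end
print(a == b)            # True

# Assigning to an empty slice inserts without removing anything...
b[2:2] = ["x", "y"]
print(b)                 # [1, 2, 'x', 'y', 3, 7, 8]

# ...while a non-empty slice replaces the elements it covers.
b[0:2] = ["z"]
print(b)                 # ['z', 'x', 'y', 3, 7, 8]
```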
This is what you need ;-)
a = b + a
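Note that, unlike slice assignment, this builds a new list and rebinds the name a, so any other reference to the original list is left unchanged. A short sketch of the difference (alias is a name introduced for illustration):

```python
a = [1, 2, 3]
b = [4, 5, 6]
alias = a            # another reference to the original list

a = b + a            # builds a new list and rebinds the name a
print(a)             # [4, 5, 6, 1, 2, 3]
print(alias)         # [1, 2, 3]
print(a is alias)    # False

alias[:0] = b        # slice assignment changes the existing list object
print(alias)         # [4, 5, 6, 1, 2, 3]
```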
You could use collections.deque:
import collections
a = collections.deque([1, 2, 3])
b = [4, 5, 6]
a.extendleft(b[::-1])
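The reversal is needed because extendleft pushes items onto the left one at a time, which reverses their order. A quick sketch showing both behaviours:

```python
from collections import deque

b = [4, 5, 6]

d = deque([1, 2, 3])
d.extendleft(b)              # items are pushed one by one, so they end up reversed
print(list(d))               # [6, 5, 4, 1, 2, 3]

d = deque([1, 2, 3])
d.extendleft(reversed(b))    # reverse first to preserve the order of b
print(list(d))               # [4, 5, 6, 1, 2, 3]
```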
If you need fast operations and you need to be able to access arbitrary elements, try a treap or red-black tree.
>>> import treap as treap_mod
>>> treap = treap_mod.treap()
>>> for i in range(100000):
...     treap[i] = i
...
>>> treap[treap.find_min() - 1] = -1
>>> treap[100]
100
Most operations on treaps and red-black trees can be done in O(log(n)). Treaps are purportedly faster on average, but red-black trees give a lower variance in operation times.
I am trying to use array slicing to reverse part of a NumPy array. If my array is, for example,
a = np.array([1,2,3,4,5,6])
then I can get a slice b
b = a[::-1]
Which is a view on the original array. What I would like is a view that is partially reversed, for example
1,4,3,2,5,6
I have encountered performance problems with NumPy if you don't play along exactly with how it is designed, so I would like to avoid "fancy" indexing if it is possible.
If you don't like the off-by-one indices (as in the second variant below), reverse the slice itself:
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[1:4][::-1]
>>> a
array([1, 4, 3, 2, 5, 6])
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[3:0:-1]
>>> a
array([1, 4, 3, 2, 5, 6])
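Both variants modify a in place rather than producing a partially reversed view. A single view cannot express the order 1,4,3,2,5,6, because a one-dimensional view walks its data with one constant stride. The in-place form can be wrapped in a small helper; this is a sketch, the name reverse_segment is made up, and it relies on NumPy handling the overlapping assignment correctly (as I understand it, guaranteed from version 1.13 on; adding .copy() on the right-hand side is the conservative alternative).

```python
import numpy as np


def reverse_segment(arr, start, stop):
    """Reverse arr[start:stop] in place and return arr."""
    # arr[start:stop][::-1] is a view on the same memory as the target slice.
    arr[start:stop] = arr[start:stop][::-1]
    return arr


a = np.array([1, 2, 3, 4, 5, 6])
tail = a[3:]                      # a view created before the reversal
reverse_segment(a, 1, 4)
print(a.tolist())     # [1, 4, 3, 2, 5, 6]
print(tail.tolist())  # [2, 5, 6]
```

Existing views such as tail see the change, since no new array is allocated for the result.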
You can use a permutation matrix (that's the numpiest way to partially reverse an array).
import numpy as np
a = np.array([1,2,3,4,5,6])
new_order_for_index = [1,4,3,2,5,6] # Careful: indices run from 1 to n!
# Permutation matrix
m = np.zeros((len(a), len(a)))
for index, new_index in enumerate(new_order_for_index):
    m[index, new_index - 1] = 1
print(np.dot(m, a))
# values 1, 4, 3, 2, 5, 6 (as floats, because m is a float array)
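The same permutation can be expressed as an index array, which needs O(n) rather than O(n^2) memory and keeps the integer dtype. This is the "fancy" indexing the question hoped to avoid, and it returns a copy rather than a view; the sketch below only illustrates the trade-off.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Build the permutation as an index array: identity, with positions 1..3 reversed.
order = np.arange(len(a))
order[1:4] = np.arange(3, 0, -1)
print(order.tolist())      # [0, 3, 2, 1, 4, 5]

result = a[order]          # integer-array indexing returns a copy, not a view
print(result.tolist())     # [1, 4, 3, 2, 5, 6]
print(np.shares_memory(a, result))  # False
```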