I have an array that I want to split into two halves. Because of symmetry, I am only interested in keeping the left half of the array.
I can split the array in half by saying:
[a,b] = numpy.split(c,2)
where c is also an array.
Is there a way to only return the 'a' array, or alternatively removing the 'b' array from memory immediately after splitting the array?
You can copy the first half with
a = x[:len(x)//2].copy()
This needs to allocate the copy and move the contents, so it temporarily requires about 1.5 times the memory.
Otherwise you can just say
a = x[:len(x)//2]
to get a view of the first half, but then the other part will not be removed from memory.
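A minimal sketch of the two options (the array name x is just for illustration):
import numpy as np

x = np.arange(10)

# Option 1: copy the left half, then drop the original;
# the full array can then be garbage-collected.
a = x[:len(x)//2].copy()
del x

# Option 2: take a view of the left half (no data is copied),
# but the view keeps the whole original buffer alive.
x = np.arange(10)
a_view = x[:len(x)//2]
print(a_view.base is x)    # True: a_view still references x's memory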
I'm not sure, but I think this might be best because it relies on list's implementation (docs) and I'm confident it was done right:
>>> r = list(range(10))
>>> r
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> del r[5:]
>>> r
[0, 1, 2, 3, 4]
See also del statement for lists.
You can simply use the np.delete function for this. Here is an example:
array = np.array([1, 2, 3, 4])
x = len(array) // 2
first_h = np.delete(array, np.arange(x, len(array)))  # remove the second half
Demo:
>>> print(first_h)
[1 2]
Suppose I have a huge list (say ten million elements) and I want to reverse all of the elements except the last. The easiest approach is:
a[:-1] = a[-2::-1]
But the problem is that I think a temporary list is created. If so, how can I avoid it?
[Edit]
For a more general case consider reversing a middle part of the list:
The only reason you may want to avoid copying the list is if you think it is too big to afford possibly repeated copies.
I think there is no other way of doing it (without copies) than going manual:
a = [1, 2, 3, 4, 5, 6]
def revert_slice(first, last, a_list):
    # Swap elements in place, working inwards from both ends of the slice.
    while first < last:
        a_list[first], a_list[last] = a_list[last], a_list[first]
        first += 1
        last -= 1
revert_slice(0, 2, a)
print(a)
Outputs:
[3, 2, 1, 4, 5, 6]
In the call, only a temporary copy of the reference is made, not a list copy.
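The same function also handles the more general case from the edit, reversing a middle part of the list in place; a quick sketch reusing revert_slice from above:
b = [1, 2, 3, 4, 5, 6]
revert_slice(2, 4, b)   # reverse the slice covering indices 2..4 in place
print(b)                # [1, 2, 5, 4, 3, 6]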
To reverse the list except for the last item, I would just do:
>>> a=[1,2,3,4,5]
>>> temp=a.pop()
>>> a.reverse()
>>> a.append(temp)
>>> print(a)
[4, 3, 2, 1, 5]
I came across a code snippet where I could not understand two of the statements, though I could see the end result of each.
I will create a variable before giving the statements:
train = np.random.random((10,100))
One of them reads as:
train = train[:-1, 1:-1]
What does this slicing mean? How do I read it? I know that -1 in slicing counts from the back, but I still cannot parse this expression.
Another statement reads as follows:
la = [0.2**(7-j) for j in range(1,t+1)]
np.array(la)[:,None]
What does slicing with None as in [:,None] mean?
For both statements, along with how each is read, it would be helpful to see an alternative way of writing them, so that I can understand them better.
One of Python's strengths is its uniform application of straightforward principles. Numpy indexing, like all indexing in Python, passes a single argument to the indexed object's (i.e., the array's) __getitem__ method, and numpy arrays were one of the primary justifications for the slicing mechanism (or at least one of its very early uses).
When I'm trying to understand new behaviours I like to start with a concrete and comprehensible example, so rather than 10x100 random values I'll start with a one-dimensional 4-element vector and work up to 3x4, which should be big enough to understand what's going on.
import numpy as np

simple = np.array([1, 2, 3, 4])
train = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
The interpreter shows these as
array([1, 2, 3, 4])
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
The expression simple[x] is equivalent to (which is to say the interpreter ends up executing) simple.__getitem__(x) under the hood - note this call takes a single argument.
The numpy array's __getitem__ method implements indexing with an integer very simply: it selects a single element from the first dimension. So simple[1] is 2, and train[1] is array([5, 6, 7, 8]).
When __getitem__ receives a tuple as an argument (which is how Python's syntax interprets expressions like array[x, y, z]) it applies each element of the tuple as an index to successive dimensions of the indexed object. So result = train[1, 2] is equivalent (conceptually - the code is more complex in implementation) to
temp = train[1] # i.e. train.__getitem__(1)
result = temp[2] # i.e. temp.__getitem__(2)
and sure enough we find that result comes out at 7. You could think of array[x, y, z] as equivalent to array[x][y][z].
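To make the equivalence concrete, here is a quick check (redefining train so the snippet stands on its own):
import numpy as np

train = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

print(train[1, 2])     # 7 -- a single tuple subscript
print(train[1][2])     # 7 -- two chained subscripts, conceptually the same steps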
Now we can add slicing to the mix. Expressions containing a colon can be regarded as slice literals (I haven't seen a better name for them), and the interpreter creates slice objects for them. As the documentation notes, a slice object is mostly a container for three values, start, stop and step, and it's up to each object's __getitem__ method how it interprets them. You might find this question helpful to understand slicing further.
With what you now know, you should be able to understand the answer to your first question.
result = train[:-1, 1:-1]
will call train.__getitem__ with a two-element tuple of slices. This is equivalent to
temp = train[:-1]
result = temp[..., 1:-1]
The first statement can be read as "set temp to all but the last row of train", and the second as "set result to all but the first and last columns of temp". train[:-1] is
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
and applying the [1:-1] subscripting to the second dimension of that array gives
array([[2, 3],
       [6, 7]])
The ellipsis on the first dimension of the temp subscript says "pass everything", so the subscript expression [...] can be considered equivalent to [:]. As far as the None values are concerned, a slice has a maximum of three data points: start, stop and step. A None value for any of these gives the default value, which is 0 for start, the length of the indexed object for stop, and 1 for step. So x[None:None:None] is equivalent to x[0:len(x):1], which is equivalent to x[::].
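One related point, since the question's second statement uses None as its own index element rather than inside a colon expression: numpy treats a bare None subscript as np.newaxis and inserts a new axis of length 1. A small sketch (the value of t is hypothetical, chosen only to make the example concrete):
import numpy as np

t = 3   # hypothetical value, just for illustration
la = [0.2**(7-j) for j in range(1, t+1)]

col = np.array(la)[:, None]   # None here acts as np.newaxis
print(np.array(la).shape)     # (3,)
print(col.shape)              # (3, 1) -- a column vector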
With this knowledge under your belt you should stand a bit more chance of understanding what's going on.
Fancy Indexing vs Views in Numpy
In an answer to this question it is explained that different idioms will produce different results.
Using the idiom where fancy indexing is used to choose the values, and those values are assigned a new value in the same statement, the values in the original object are changed in place.
However the final example below:
https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
"A final exercise"
The example appears to use the same idiom:
a[x, :][:, y] = 100
but it still produces a different result depending on whether x is a slice or a fancy index (see below):
a = np.arange(12).reshape(3,4)
ifancy = [0,2]
islice = slice(0,3,2)
a[islice, :][:, ifancy] = 100
a
# array([[100,   1, 100,   3],
#        [  4,   5,   6,   7],
#        [100,   9, 100,  11]])
a = np.arange(12).reshape(3,4)
ifancy = [0,2]
islice = slice(0,3,2)
a[ifancy, :][:, islice] = 100 # note that ifancy and islice are interchanged here
a
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])
My intuition is that when the first index is a slice, numpy returns a view of the original object, and therefore the values in the original object are changed.
Whereas in the second case the first index is itself a fancy index, so numpy creates a copy of the original object; the original object is then left unchanged when the values of the copy are changed.
Is my intuition correct?
The example hints that one should think in terms of the sequence of getitem and setitem calls; can someone explain it to me properly in this way?
Python evaluates each set of [] separately. a[x, :][:, y] = 100 is 2 operations.
temp = a[x,:] # getitem step
temp[:,y] = 100 # setitem step
Whether the 2nd line ends up modifying a depends on whether temp is a view or copy.
Remember, numpy is an addon to Python. It does not modify basic Python syntax or interpretation.
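One way to check which case you are in, sketched here with numpy's shares_memory helper: the intermediate produced by basic slicing is a view into a, while the intermediate produced by fancy indexing is an independent copy.
import numpy as np

a = np.arange(12).reshape(3, 4)
ifancy = [0, 2]
islice = slice(0, 3, 2)

temp_view = a[islice, :]    # basic slicing -> view of a
temp_copy = a[ifancy, :]    # fancy (advanced) indexing -> copy

print(np.shares_memory(a, temp_view))   # True: assigning into it modifies a
print(np.shares_memory(a, temp_copy))   # False: assigning into it is lost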
I recently started coding in Python 2.7. I'm a molecular biologist.
I'm writing a script that involves creating lists like this one:
mylist = [[0, 4, 6, 1], 102]
These lists grow by adding items to mylist[0] and adding a value to mylist[1].
To do this, I use the code:
def addres(oldpep, res):
    return [oldpep[0] + res[0], oldpep[1] + res[1]]
Which works well. Since mylist[0] can become a bit long, and I have millions of these lists to take care of, I thought that using append or extend might make my code faster, so I tried:
def addres(pep, res):
    pep[0].extend(res[0])
    pep[1] += res[1]
    return pep
Which in my mind should give the same result. It does give the same result when I try it on an arbitrary list, but when I feed it the millions of lists, it gives me a very different result. So... what's the difference between the two? All the rest of the script is exactly the same.
Thank you!
Roberto
The difference is that the second version of addres modifies the list that you passed in as pep, where the first version returns a new one.
>>> mylist = [[0, 4, 6, 1], 102]
>>> list2 = [[3, 1, 2], 205]
>>> addres(mylist, list2)
[[0, 4, 6, 1, 3, 1, 2], 307]
>>> mylist
[[0, 4, 6, 1, 3, 1, 2], 307]
If you need to not modify the original lists, I don't think you're really going to get a faster pure-Python implementation of addres than the first one you wrote. You might be able to live with the modification, though, or come up with a somewhat different approach to speed up your code if that's the problem you're facing.
Lists are objects in Python and are passed by reference.
a=list()
This doesn't mean that a is the list itself; rather, a points to the list that was just created.
In the first example, you use the list's elements to create a new list (another object), while in the second you modify the contents of the existing list in place.
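A minimal sketch of that aliasing behaviour (the names are just for illustration):
a = [[0, 4, 6, 1], 102]
b = a            # b is another name for the same list object, not a copy

b[0].extend([3, 1, 2])
b[1] += 205

print(a)         # [[0, 4, 6, 1, 3, 1, 2], 307] -- a sees the changes too
print(a is b)    # True: both names refer to one object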
I have a sorted list of integers, L, and I have a value X that I wish to insert into the list such that L's order is maintained. Similarly, I wish to quickly find and remove the first instance of X.
Questions:
How do I use the bisect module to do the first part, if possible?
Is L.remove(X) going to be the most efficient way to do the second part? Does Python detect that the list has been sorted and automatically use a logarithmic removal process?
Example code attempts:
i = bisect_left(L, y)
L.pop(i) #works
del L[bisect_left(L, i)] #doesn't work if I use this instead of pop
You use the bisect.insort() function:
bisect.insort(L, X)
L.remove(X) will scan the whole list until it finds X. Use del L[bisect.bisect_left(L, X)] instead (provided that X is indeed in L).
Note that removing from the middle of a list is still going to incur a cost as the elements from that position onwards all have to be shifted left one step. A binary tree might be a better solution if that is going to be a performance bottleneck.
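Putting the two pieces together, a small sketch (L and X are the names from the question; the membership check is just a safety measure added here):
import bisect

L = [1, 3, 4, 4, 7, 9]
X = 4

# Insert X while keeping L sorted (logarithmic search, linear shift).
bisect.insort(L, X)

# Remove the first occurrence of X, found with a binary search.
i = bisect.bisect_left(L, X)
if i < len(L) and L[i] == X:
    del L[i]

print(L)   # [1, 3, 4, 4, 7, 9]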
You could use Raymond Hettinger's IndexableSkiplist. It performs 3 operations in O(log n) time:
insert value
remove value
lookup value by rank
import skiplist
import random
random.seed(2013)
N = 10
skip = skiplist.IndexableSkiplist(N)
data = list(range(N))
random.shuffle(data)
for num in data:
    skip.insert(num)
print(list(skip))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

for num in data[:N//2]:
    skip.remove(num)
print(list(skip))
# [0, 3, 4, 6, 9]