Python: list vs. np.array: switching to use certain attributes - python

I know, there are plenty of threads about list vs. array but I've got a slightly different problem.
Using Python, I find myself converting between np.array and list quite often, because I want to use list methods like
remove, append, extend, sort, index, …
and, on the other hand, modify the content with operations like
*, /, +, -, np.exp(), np.sqrt(), …, which only work on arrays.
It must be pretty messy to switch between data types with list(array) and np.asarray(list), I assume. But I just can't think of a proper solution. I don't really want to write a loop every time I want to find and remove something from my array.
Any suggestions?

A numpy array:
>>> A=np.array([1,4,9,2,7])
delete:
>>> A=np.delete(A, [2,3])
>>> A
array([1, 4, 7])
append (beware: np.append copies the whole array, so it's O(n), unlike list.append, which is amortized O(1)):
>>> A=np.append(A, [5,0])
>>> A
array([1, 4, 7, 5, 0])
sort:
>>> np.sort(A)
array([0, 1, 4, 5, 7])
index:
>>> A
array([1, 4, 7, 5, 0])
>>> np.where(A==7)
(array([2]),)
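To tie this back to the question's wish to "find and remove something" without a loop, here is a minimal sketch. It assumes the goal is removal by value rather than by position, and the variable names are only illustrative:

```python
import numpy as np

a = np.array([1, 4, 9, 2, 7])

# Remove by value with a boolean mask (no explicit loop).
a = a[a != 9]

# Element-wise math works directly on the result.
roots = np.sqrt(a)

# Rough equivalent of list.index: position of the first match.
first_idx = int(np.flatnonzero(a == 2)[0])

# If several pieces have to be appended, collect them in a list
# and concatenate once instead of calling np.append repeatedly.
chunks = [a, [5, 0]]
a = np.concatenate(chunks)
```

The mask and the concatenation both create new arrays, so this is still O(n) per operation; the gain is that no conversion to a list and back is needed.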

Related

What is the 'a' in numpy.arange?

What does the 'a' in numpy's numpy.arange method stand for, and how does it differ from a simple range produced by Python's builtin range method (definitionally, not in terms of performance and whatnot)?
I tried looking online for an answer to this, but all I find is tutorials for how to use numpy.arange by GeeksForGeeks and co.
You can inspect the return types and reason about what it could mean that way:
print(type(range(0,5)))
import numpy as np
print(type(np.arange(0,5)))
Which prints:
<class 'range'>
<class 'numpy.ndarray'>
Here's a related question: Why was the name "arange" chosen for the numpy function?
Some people do from numpy import * which would shadow range which causes problems.
Naming the function arrayrange was not chosen because it's too long to type.
From the previous SO we learn that the 'a' stands, in some sense, for 'array'. arange is a function that returns a numpy array that is similar, at least in simple cases, to the list produced by list(range(...)). From the official arange docs:
For integer arguments the function is roughly equivalent to the Python built-in range, but returns an ndarray rather than a range instance.
In [104]: list(range(-3,10,2))
Out[104]: [-3, -1, 1, 3, 5, 7, 9]
In [105]: np.arange(-3,10,2)
Out[105]: array([-3, -1, 1, 3, 5, 7, 9])
In py3, range by itself is lazy: it is a sequence object that computes its values on demand instead of building a list. It's the equivalent of the py2 xrange.
The best "definition" is the official documentation page:
https://numpy.org/doc/stable/reference/generated/numpy.arange.html
But maybe you are wondering when to use one or the other. The simple answer is - if you are doing python level iteration, range is usually better. If you need an array, use arange (or np.linspace as suggested by the docs).
In [106]: [x**2 for x in range(5)]
Out[106]: [0, 1, 4, 9, 16]
In [107]: np.arange(5)**2
Out[107]: array([ 0, 1, 4, 9, 16])
I often use arange to create an example array, as in:
In [108]: np.arange(12).reshape(3,4)
Out[108]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
While it is possible to make an array from a range, e.g. np.array(range(5)), that is relatively slow. np.fromiter(range(5),int) is faster, but still not as good as the direct np.arange.
The 'a' stands for 'array' in numpy.arange. numpy.arange is a function that produces an array of evenly spaced numbers within a given interval. It differs from Python's builtin range() function in that it also accepts floating-point start, stop, and step values, whereas range() only accepts integers. Also, the output of numpy.arange is an ndarray instead of a range object.
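A small sketch of that difference, assuming nothing beyond NumPy itself (the variable names are only for illustration):

```python
import numpy as np

# range only accepts integers; a float step raises TypeError.
try:
    range(0, 1, 0.25)
    range_accepts_float = True
except TypeError:
    range_accepts_float = False

# np.arange accepts a float step and returns an ndarray.
steps = np.arange(0, 1, 0.25)

# With non-integer steps, np.linspace lets you fix the number of points
# (and include the endpoint) instead of relying on the step size.
points = np.linspace(0, 1, 5)
```

Here `steps` holds 0, 0.25, 0.5, 0.75 and `points` additionally includes the endpoint 1.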

Rationale for numpy.split returning a list and not an array

I was surprised that numpy.split yields a list and not an array. I would have thought it would be better to return an array, since numpy has put a lot of work into making arrays more useful than lists. Can anyone justify numpy returning a list instead of an array? Why would that be a better programming decision for the numpy developers to have made?
A comment pointed out that if the split is uneven, the result can't be an array, at least not one that has the same dtype. At best it would be an object dtype array.
But let's consider the case of equal-length subarrays:
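The uneven case can be sketched as follows. np.array_split is used here because np.split itself refuses an unequal division when given a section count:

```python
import numpy as np

x = np.arange(10)

# np.split insists on equal sections when given a count.
try:
    np.split(x, 3)
    equal_split_ok = True
except ValueError:
    equal_split_ok = False

# np.array_split allows uneven sections; the pieces have different
# shapes, so they cannot form one regular 2-D array.
parts = np.array_split(x, 3)
shapes = [p.shape for p in parts]
```

For ten elements and three sections the shapes come out as (4,), (3,), (3,), which is why a plain list is the natural container.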
In [124]: x = np.arange(10)
In [125]: np.split(x,2)
Out[125]: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
In [126]: np.array(_) # make an array from that
Out[126]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
But we can get the same array without split - just reshape:
In [127]: x.reshape(2,-1)
Out[127]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
Now look at the code for split. It just passes the task to array_split. Ignoring the details about alternative axes, it just does
sub_arys = []
for i in range(Nsections):
    # st and end are taken from div_points
    st = div_points[i]
    end = div_points[i + 1]
    sub_arys.append(sary[st:end])
return sub_arys
In other words, it just steps through array and returns successive slices. Those (often) are views of the original.
So split is not that sophisticated a function. You could generate such a list of subarrays yourself without a lot of numpy expertise.
Another point: the documentation notes that split can be reversed with an appropriate stack. concatenate (and family) takes a list of arrays. If given an array of arrays, or a higher-dimensional array, it effectively iterates on the first dimension, e.g. concatenate(arr) => concatenate(list(arr)).
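Both observations (the pieces are views, and concatenate undoes the split) can be checked with a short sketch. This assumes a split along the first axis of a 1-D array:

```python
import numpy as np

x = np.arange(10)
halves = np.split(x, 2)

# Each piece is a slice of x, so it shares memory with the original.
is_view = all(np.shares_memory(h, x) for h in halves)

# Writing through a piece is therefore visible in x.
halves[0][0] = 99

# concatenate takes the list of pieces back to a single array.
rebuilt = np.concatenate(halves)
```

Note that `rebuilt` is a fresh array, not a view: concatenate always copies.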
Actually, you are right: it returns a list.
import numpy as np
a=np.random.randint(1,30,(2,2))
b=np.hsplit(a,2)
type(b)
This reports type(b) as list, so there is nothing wrong in the documentation. I also first thought that the documentation was wrong because it doesn't return an array, but when I checked
type(b[0])
type(b[1])
each returned ndarray.
That means it returns a list of ndarrays.

Convert a list to np.arrays efficiently

I am loading a dataset in my Python code which contains two matrices. The matrices are named train_dataset_face and train_dataset_audio, and I read them as lists of np.arrays. Finally, I convert them to np.arrays of np.arrays. Initially, during debugging, my matrices look like this:
and
Then I convert them into np.arrays using the following code:
train_dataset_face = np.array(train_dataset_face)
train_dataset_audio = np.array(train_dataset_audio)
And in the end my matrices look like:
For some weird reason, in the case of train_dataset_face I get this "array" indication before each vector of my array, while in the case of train_dataset_audio I don't. Is it possible to remove it? This "array" indication causes problems when I try to apply several algorithms to train_dataset_face. Any idea what happened here?
You can only create a single regular array if all arrays in your list have the same shape, which is true for train_dataset_audio but not for train_dataset_face.
>>> a = [numpy.array([1,2,3,4]), numpy.array([1,2,3,4])]
>>> numpy.array(a)
array([[1, 2, 3, 4],
       [1, 2, 3, 4]])
>>> b = [numpy.array([1,2,3]), numpy.array([1,2,3,4])]
>>> numpy.array(b)
array([array([1, 2, 3]), array([1, 2, 3, 4])], dtype=object)
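One way to catch this early is to check the shapes before converting. The helper below is a hypothetical sketch (to_2d_array is not a NumPy function). As far as I know, recent NumPy releases raise a ValueError for ragged input instead of silently building an object array unless dtype=object is passed, so an explicit check also gives a clearer message there:

```python
import numpy as np

def to_2d_array(arrays):
    """Stack equally shaped arrays; report the shapes if they differ.

    Hypothetical helper for illustration, not part of NumPy.
    """
    shapes = {np.shape(a) for a in arrays}
    if len(shapes) != 1:
        raise ValueError(f"cannot stack, found shapes: {sorted(shapes)}")
    return np.stack(arrays)

same = [np.array([1, 2, 3, 4]), np.array([1, 2, 3, 4])]
ragged = [np.array([1, 2, 3]), np.array([1, 2, 3, 4])]

stacked = to_2d_array(same)      # regular 2-D array

try:
    to_2d_array(ragged)
    message = ""
except ValueError as err:
    message = str(err)           # names the offending shapes
```

If the vectors genuinely have different lengths, they need to be padded or truncated to a common length before most algorithms can use them as one matrix.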

Best practice to reduce memory usage when splitting array

I have an array that I want to split up in two halves. Because of symmetry I am only interested in keeping the left half of the array.
I can split the array in half by saying:
[a,b] = numpy.split(c,2)
where c is also an array.
Is there a way to only return the 'a' array, or alternatively removing the 'b' array from memory immediately after splitting the array?
You can copy the first half with
a = x[:len(x)//2].copy()
This needs to allocate the copy and move the content (thus temporarily needing 1.5 times the memory), but the full array can be freed once no other reference to x remains.
Otherwise you can just say
a = x[:len(x)//2]
to get a view of the first half, but the other part will not be removed from memory, because the view keeps the whole original buffer alive.
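The difference between the two options can be inspected through the .base attribute. A minimal sketch, with an arbitrary array size chosen for illustration:

```python
import numpy as np

x = np.arange(1_000_000)

view = x[:len(x) // 2]           # no data copied
copy = x[:len(x) // 2].copy()    # owns its own, smaller buffer

# A view keeps a reference to the array it was sliced from,
# so the full buffer stays in memory as long as the view lives.
view_keeps_x_alive = view.base is x

# A copy has no base; after `del x` only the half remains allocated.
copy_is_independent = copy.base is None
```

So the copy costs extra memory briefly but is the variant that lets the second half actually be released.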
I'm not sure, but I think this might be best because it relies on list's implementation (docs) and I'm confident it was done right:
>>> r = list(range(10))  # in Python 3, range must be converted to a list first
>>> r
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> del r[5:]
>>> r
[0, 1, 2, 3, 4]
See also del statement for lists.
You can simply use the np.delete function for this. Here is an example:
import numpy as np
array = np.array([1, 2, 3, 4])
x = len(array) // 2  # integer midpoint
first_h = np.delete(array, slice(x, None))  # delete the second half by index; returns a new array
Demo:
>>> print(first_h)
[1 2]

Why are slices in Python 3 still copies and not views?

As I only now noticed after commenting on this answer, slices in Python 3 return shallow copies of whatever they're slicing rather than views. Why is this still the case? Even leaving aside numpy's usage of views rather than copies for slicing, the fact that dict.keys, dict.values, and dict.items all return views in Python 3, and that there are many other aspects of Python 3 geared towards greater use of iterators, makes it seem that there would have been a movement towards slices becoming similar. itertools does have an islice function that makes iterative slices, but that's more limited than normal slicing and does not provide view functionality along the lines of dict.keys or dict.values.
As well, the fact that you can use assignment to slices to modify the original list, but slices are themselves copies and not views, is a contradictory aspect of the language and seems like it violates several of the principles illustrated in the Zen of Python.
That is, the fact you can do
>>> a = [1, 2, 3, 4, 5]
>>> a[::2] = [0, 0, 0]
>>> a
[0, 2, 0, 4, 0]
But not
>>> a = [1, 2, 3, 4, 5]
>>> a[::2][0] = 0
>>> a
[0, 2, 3, 4, 5]
or something like
>>> a = [1, 2, 3, 4, 5]
>>> b = a[::2]
>>> b
view(a[::2] -> [1, 3, 5]) # numpy doesn't explicitly state that its slices are views, but it would probably be a good idea to do it in some way for regular Python
>>> b[0] = 0
>>> b
view(a[::2] -> [0, 3, 5])
>>> a
[0, 2, 3, 4, 5]
Seems somewhat arbitrary/undesirable.
I'm aware of http://www.python.org/dev/peps/pep-3099/ and the part where it says "Slices and extended slices won't go away (even if the __getslice__ and __setslice__ APIs may be replaced) nor will they return views for the standard object types.", but the linked discussion provides no mention of why the decision about slicing with views was made; in fact, the majority of the comments on that specific suggestion out of the suggestions listed in the original post seemed to be positive.
What prevented something like this from being implemented in Python 3.0, which was specifically designed to not be strictly backwards-compatible with Python 2.x and thus would have been the best time to implement such a change in design, and is there anything that may prevent it in future versions of Python?
As well, the fact that you can use assignment to slices to modify the original list, but slices are themselves copies and not views.
Hmm.. that's not quite right; although I can see how you might think that. In other languages, a slice assignment, something like:
a[b:c] = d
is equivalent to
tmp = a.operator[](slice(b, c)) # which returns some sort of reference
tmp.operator=(d) # which has a special meaning for the reference type.
But in python, the first statement is actually converted to this:
a.__setitem__(slice(b, c), d)
That is, item assignment is recognized specially in Python and has its own meaning, separate from item lookup followed by assignment; the two may be unrelated. This is consistent with Python as a whole, because Python doesn't have concepts like the "lvalues" found in C/C++. There's no way to overload the assignment operator itself, only specific cases where the left side of the assignment is not a plain identifier.
Suppose lists did have views, and you tried to use one:
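The dispatch described above can be observed with a small list subclass that records which special method each statement triggers. Traced is a hypothetical class written only for this illustration:

```python
class Traced(list):
    """A list subclass that records which special method each operation calls."""

    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        super().__setitem__(key, value)

a = Traced([1, 2, 3, 4, 5])

a[::2] = [0, 0, 0]   # a single __setitem__ call, no lookup involved
a[::2][0] = 9        # __getitem__ builds a new list; the write goes to that copy
```

After both statements, `a.calls` is `["__setitem__", "__getitem__"]` and `a` is still `[0, 2, 0, 4, 0]`: the second statement never reached `a.__setitem__`.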
myView = myList[1:10]
yourList = [1, 2, 3, 4]
myView = yourList
In languages besides Python, there might be a way to shove yourList into myList, but in Python, since the name myView appears as a bare identifier, it can only mean a variable assignment; the view is lost.
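NumPy, which does have views, shows the same distinction. In this sketch the names my_list and my_view mirror the hypothetical example above, but they refer to an ndarray and a NumPy view, not to a builtin list:

```python
import numpy as np

my_list = np.arange(10)
my_view = my_list[1:4]     # a real view onto my_list

# Subscripted assignment calls __setitem__ on the view and writes through.
my_view[:] = [7, 8, 9]

# A bare name on the left is plain rebinding: my_list is untouched
# and the view object is simply dropped.
my_view = [1, 2, 3, 4]
```

Only the subscripted form can be intercepted by the object; the bare assignment never involves the old value of my_view at all.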
Well it seems I found a lot of the reasoning behind the views decision, going by the thread starting with http://mail.python.org/pipermail/python-3000/2006-August/003224.html (it's primarily about slicing strings, but at least one e-mail in the thread mentions mutable objects like lists), and also some things from:
http://mail.python.org/pipermail/python-3000/2007-February/005739.html
http://mail.python.org/pipermail/python-dev/2008-May/079692.html and following e-mails in the thread
Looks like the advantages of switching to this style for base Python would be vastly outweighed by the induced complexity and various undesirable edge cases. Oh well.
...And as I then started wondering about the possibility of just replacing the current handling of slices with an iterable form a la itertools.islice, just as zip, map, etc. all return iterables instead of lists in Python 3, I started realizing all the unexpected behavior and possible problems that could come out of that. Looks like this might be a dead end for now.
On the plus side, numpy's arrays are fairly flexible, so in situations where this sort of thing might be necessary, it wouldn't be too hard to use one-dimensional ndarrays instead of lists. However, it seems ndarrays don't support using slicing to insert additional items within arrays, as happens with Python lists:
>>> a = [0, 0]
>>> a[:1] = [2, 3]
>>> a
[2, 3, 0]
I think the numpy equivalent would instead be something like this:
>>> a = np.array([0, 0]) # or a = np.zeros([2]), but that's not important here
>>> a = np.hstack(([2, 3], a[1:]))
>>> a
array([2, 3, 0])
A slightly more complicated case:
>>> a = [1, 2, 3, 4]
>>> a[1:3] = [0, 0, 0]
>>> a
[1, 0, 0, 0, 4]
versus
>>> a = np.array([1, 2, 3, 4])
>>> a = np.hstack((a[:1], [0, 0, 0], a[3:]))
>>> a
array([1, 0, 0, 0, 4])
And, of course, the above numpy examples don't store the result in the original array as happens with the regular Python list expansion.
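The two hstack examples follow one pattern, which could be wrapped in a small helper. replace_slice is a hypothetical name, and np.concatenate is used in place of np.hstack (for 1-D input they behave the same):

```python
import numpy as np

def replace_slice(arr, start, stop, values):
    """Return a new array with arr[start:stop] replaced by values of any length.

    Hypothetical helper for 1-D arrays; the input array is not modified.
    """
    return np.concatenate((arr[:start], values, arr[stop:]))

a = np.array([1, 2, 3, 4])
b = replace_slice(a, 1, 3, [0, 0, 0])   # mirrors a[1:3] = [0, 0, 0] on a list
```

As with the examples above, the result is a new array; an ndarray has a fixed size, so the original cannot grow in place.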
