min, max and mean over large NumPy arrays in Python - python

I have a very large NumPy array: a = np.array. From this array I want to get the min, max and average which can be easily done with np.min(a), np.max(a) and np.mean(a).
However, I want also to have the min, max and average of a portion (begin part or end part) of this array. Are there some functions for this without creating a new array/list (because that would really result in a bad performance penalty)?

All arrays generated by basic slicing are always views of the original array.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
So, yes, just use slices.

If the chunk you're working on is contiguous (i.e. no fancy indexing, in that case the part will get copied), you can use usual slicing syntax to get a view on the part of the array in question, without copying:
>>> import numpy as np
>>> arr = np.array([1,2,3,4,5])
>>> part = arr[1:3] # no copies here
>>> part[:] = 22,33
>>> print arr
[ 1 22 33 4 5]

Related

How to get arrays that ouput result in brackets like [1][2][3] to [1 2 3]

The title kind of says it all. I have this (excerpt):
import numpy as np
import matplotlib.pyplot as plt
number_of_particles=1000
phi = np.arccos(1-2*np.random.uniform(0.0,1.,(number_of_particles,1)))
vc=2*pi
mux=-vc*np.sin(phi)
and I get out
[[-4.91272413]
[-5.30620302]
[-5.22400513]
[-5.5243784 ]
[-5.65050497]...]
which is correct, but I want it to be in the format
[-4.91272413 -5.30620302 -5.22400513 -5.5243784 -5.65050497....]
Feel like there should be a simple solution, but I couldn't find it.
Suppose your array is represented by the variable arr.
You can do,
l = ''
for i in arr:
l = l+i+' '
arr = [l]
Use this command:
new_mux = [i[0] for i in mux]
But I need it in an array, so then I add this
new_mux=np.array(new_mux)
and I get the desired output.
There's a method transpose in numpy's array object
mux.transpose()[0]
(I just noticed that this is a very old question, but since I have typed up this answer, and I believe it is simpler and more efficient than the existing ones, I'll post it...)
Notice that when you do
np.random.uniform(0.0,1.,(number_of_particles, 1))
you are creating a two-dimensional array with number_of_particles rows and one column. If you want a one-dimensional array throughout, you could do
np.random.uniform(0.0,1.,(number_of_particles,))
instead.
If you want to keep things 2d, but reshape mux for some reason, you can... well, reshape it:
mux_1d = mux.reshape(-1)
-1 here means "reshape it to one axis (because there’s just one number) and figure out automatically home many elements there should be along that axis (because the number is -1)."

Python: return the row index of the minimum in a matrix

I wanna print the index of the row containing the minimum element of the matrix
my matrix is matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
and the code
matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
a = np.array(matrix)
buff_min = matrix.argmin(axis = 0)
print(buff_min) #index of the row containing the minimum element
min = np.array(matrix[buff_min])
print(str(min.min(axis=0))) #print the minium of that row
print(min.argmin(axis = 0)) #index of the minimum
print(matrix[buff_min]) # print all row containing the minimum
after running, my result is
1
3
1
[22, 3, 4, 12]
the first number should be 2, because the minimum is 2 in the third list ([34,6,4,5,8,2]), but it returns 1. It returns 3 as minimum of the matrix.
What's the error?
I am not sure which version of Python you are using, i tested it for Python 2.7 and 3.2 as mentioned your syntax for argmin is not correct, its should be in the format
import numpy as np
np.argmin(array_name,axis)
Next, Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
np.array([[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
Also, to mention if you can resize your numpy array thing might work, i haven't tested it, but by the concept that should be an easy solution. But i will prefer use a nested list in this case of input matrix
Does this work?
np.where(a == a.min())[0][0]
Note that all rows of the matrix need to contain the same number of elements.

numpy.ndarray sent as argument doesn't need loop for iteration?

In this code np.linspace() assigns to inputs 200 evenly spaced numbers from -20 to 20.
This function works. What I am not understanding is how could it work. How can inputs be sent as an argument to output_function() without needing a loop to iterate over the numpy.ndarray?
def output_function(x):
return 100 - x ** 2
inputs = np.linspace(-20, 20, 200)
plt.plot(inputs, output_function(inputs), 'b-')
plt.show()
numpy works by defining operations on vectors the way that you really want to work with them mathematically. So, I can do something like:
a = np.arange(10)
b = np.arange(10)
c = a + b
And it works as you might hope -- each element of a is added to the corresponding element of b and the result is stored in a new array c. If you want to know how numpy accomplishes this, it's all done via the magic methods in the python data model. Specifically in my example case, the __add__ method of numpy's ndarray would be overridden to provide the desired behavior.
What you want to use is numpy.vectorize which behaves similarly to the python builtin map.
Here is one way you can use numpy.vectorize:
outputs = (np.vectorize(output_function))(inputs)
You asked why it worked, it works because numpy arrays can perform operations on its array elements en masse, for example:
a = np.array([1,2,3,4]) # gives you a numpy array of 4 elements [1,2,3,4]
b = a - 1 # this operation on a numpy array will subtract 1 from every element resulting in the array [0,1,2,3]
Because of this property of numpy arrays you can perform certain operations on every element of a numpy array very quickly without using a loop (like what you would do if it were a regular python array).

Numpy sum between pairs of indices in 2d array

I have a 2-d numpy array (MxN) and two more 1-d arrays (Mx1) that represent starting and ending indices for each row of the 2-d array that I'd like to sum over. I'm looking for the most efficient way to do this in a large array (preferably without having to use a loop, which is what I'm currently doing). An example of what i'd like to do is the following.
>>> random.seed(1234)
>>> a = random.rand(4,4)
>>> print a
[[ 0.19151945 0.62210877 0.43772774 0.78535858]
[ 0.77997581 0.27259261 0.27646426 0.80187218]
[ 0.95813935 0.87593263 0.35781727 0.50099513]
[ 0.68346294 0.71270203 0.37025075 0.56119619]]
>>> b = array([1,0,2,1])
>>> c = array([3,2,4,4])
>>> d = empty(4)
>>> for i in xrange(4):
d[i] = sum(a[i, b[i]:c[i]])
>>> print d
[ 1.05983651 1.05256841 0.8588124 1.64414897]
My problem is similar to the following question, however, I don't think the solution presented there would be very efficient. Numpy sum of values in subarrays between pairs of indices In that question, they are wanting to find the sum of multiple subsets for the same row, so cumsum() can be used. However, I will only be finding one sum per row, so I don't think this would be the most efficient means of computing the sum.
Edit: I'm sorry, I made a mistake in my code. The line inside the loop previously read d[i] = sum(a[b[i]:c[i]]). I forgot the index for the first dimension. Each set of starting and ending indices corresponds to a new row in the 2-d array.
You could do something like this:
from numpy import array, random, zeros
random.seed(1234)
a = random.rand(4,4)
b = array([1,0,2,1])
c = array([3,2,4,4])
lookup = zeros(len(a) + 1, a.dtype)
lookup[1:] = a.sum(1).cumsum()
d = lookup[c] - lookup[b]
print d
This might help if your b/c arrays are large and the windows you're summing over are large. Because your windows might overlap, for example 2:4 and 1:4 are mostly the same, you're essentially repeating operations. By taking the cumsum as a per-processing step you reduce the number of repeated operations and you may save time. This won't help much if your windows are small and b/c are small, mostly because you'll be summing parts of the a matrix that you don't much care about. Hope that helps.

2D arrays in Python

What's the best way to create 2D arrays in Python?
What I want is want is to store values like this:
X , Y , Z
so that I access data like X[2],Y[2],Z[2] or X[n],Y[n],Z[n] where n is variable.
I don't know in the beginning how big n would be so I would like to append values at the end.
>>> a = []
>>> for i in xrange(3):
... a.append([])
... for j in xrange(3):
... a[i].append(i+j)
...
>>> a
[[0, 1, 2], [1, 2, 3], [2, 3, 4]]
>>>
Depending what you're doing, you may not really have a 2-D array.
80% of the time you have simple list of "row-like objects", which might be proper sequences.
myArray = [ ('pi',3.14159,'r',2), ('e',2.71828,'theta',.5) ]
myArray[0][1] == 3.14159
myArray[1][1] == 2.71828
More often, they're instances of a class or a dictionary or a set or something more interesting that you didn't have in your previous languages.
myArray = [ {'pi':3.1415925,'r':2}, {'e':2.71828,'theta':.5} ]
20% of the time you have a dictionary, keyed by a pair
myArray = { (2009,'aug'):(some,tuple,of,values), (2009,'sep'):(some,other,tuple) }
Rarely, will you actually need a matrix.
You have a large, large number of collection classes in Python. Odds are good that you have something more interesting than a matrix.
In Python one would usually use lists for this purpose. Lists can be nested arbitrarily, thus allowing the creation of a 2D array. Not every sublist needs to be the same size, so that solves your other problem. Have a look at the examples I linked to.
If you want to do some serious work with arrays then you should use the numpy library. This will allow you for example to do vector addition and matrix multiplication, and for large arrays it is much faster than Python lists.
However, numpy requires that the size is predefined. Of course you can also store numpy arrays in a list, like:
import numpy as np
vec_list = [np.zeros((3,)) for _ in range(10)]
vec_list.append(np.array([1,2,3]))
vec_sum = vec_list[0] + vec_list[1] # possible because we use numpy
print vec_list[10][2] # prints 3
But since your numpy arrays are pretty small I guess there is some overhead compared to using a tuple. It all depends on your priorities.
See also this other question, which is pretty similar (apart from the variable size).
I would suggest that you use a dictionary like so:
arr = {}
arr[1] = (1, 2, 4)
arr[18] = (3, 4, 5)
print(arr[1])
>>> (1, 2, 4)
If you're not sure an entry is defined in the dictionary, you'll need a validation mechanism when calling "arr[x]", e.g. try-except.
If you are concerned about memory footprint, the Python standard library contains the array module; these arrays contain elements of the same type.
Please consider the follwing codes:
from numpy import zeros
scores = zeros((len(chain1),len(chain2)), float)
x=list()
def enter(n):
y=list()
for i in range(0,n):
y.append(int(input("Enter ")))
return y
for i in range(0,2):
x.insert(i,enter(2))
print (x)
here i made function to create 1-D array and inserted into another array as a array member. multiple 1-d array inside a an array, as the value of n and i changes u create multi dimensional arrays

Categories