Element-wise median and percentiles of arrays with Numeric Python


I am using Numeric Python. Unfortunately, NumPy is not an option. If I have multiple arrays, such as:
a=Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b=Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c=Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
How do I return an array that represents the element-wise median of arrays a, b and c? For this example it should be:
array(([5,8,3],[5,5,6],[5,2,3]))
Then, more generally: given n arrays, how do I find the percentiles of each element? For example, how do I return an array that represents the 30th percentile of 10 arrays? Thank you very much for your help!

Combine your stack of 2-D arrays into one 3-D array, d = Numeric.array([a, b, c]), and then sort along the new stacking axis (axis 0, the one that indexes a, b and c). Afterwards, the successive 2-D planes d[0], d[1], ... will be in rank order, so you can extract the planes for the low, the high, the quartiles, percentiles, or the median.
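Numeric is a legacy, Python 2-era package. The sketch below therefore uses NumPy, Numeric's successor, where array construction, sorting along an axis, and slicing carry over closely. It shows the stack-sort-slice idea for an arbitrary percentile. The nearest-rank plane index k = min(int(n * p), n - 1) and the name percentile_plane are illustrative assumptions. Other percentile definitions interpolate between neighbouring planes instead.

```python
import numpy as np

def percentile_plane(arrays, p):
    """Element-wise p-th quantile (0 <= p <= 1) of equally shaped 2-D arrays,
    using a simple nearest-rank rule (an assumption, not the only definition)."""
    d = np.sort(np.array(arrays), axis=0)   # stack on a new axis 0, then sort along it
    n = d.shape[0]                          # number of input arrays
    k = min(int(n * p), n - 1)              # clamp so p = 1.0 selects the last plane
    return d[k]

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])
c = np.array([[5, 9, 1], [5, 4, 7], [5, 2, 3]])

print(percentile_plane([a, b, c], 0.5))   # median plane
print(percentile_plane([a, b, c], 0.0))   # element-wise minimum
print(percentile_plane([a, b, c], 1.0))   # element-wise maximum
```

For the 30th percentile of 10 arrays, the same call with p = 0.3 selects plane int(10 * 0.3) = 3 of the sorted stack.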

Well, I'm not versed in Numeric, but I'll just start with a naive solution and see if we can make it any better.
To get the 30th percentile of a list foo, let x = 0.3, sort the list, and pick the element at foo[int(len(foo)*x)] (this index is valid for 0 <= x < 1).
For your data, you want to put it in a matrix, transpose it, sort each row, and get the median of each row.
A matrix in Numeric (just like numpy) is an array with two dimensions.
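I think that bar = Numeric.array([a, b, c]) would make the array you want. Note the square brackets: Numeric.array(a, b, c) would pass b and c as extra arguments instead of data. Since a, b and c are 2-D, bar is 3-D. If Numeric has the same slicing techniques as NumPy, you could get all the values at position (i, j) with bar[:, i, j]:
foo = sorted(bar[:, i, j])
foo[int(len(foo)*x)]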
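If neither Numeric nor NumPy is available, the same sort-and-index rule works with plain Python lists. This sketch assumes all inputs are equally shaped 2-D lists of numbers, and the name elementwise_percentile is illustrative. The index is clamped to the last element so that x = 1.0 does not run off the end of the list.

```python
def elementwise_percentile(arrays, x):
    """Return the x-th quantile (0 <= x <= 1) of each element position across
    equally shaped 2-D lists, using foo[int(len(foo) * x)] on the sorted values."""
    rows, cols = len(arrays[0]), len(arrays[0][0])
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            foo = sorted(arr[i][j] for arr in arrays)   # all values at position (i, j)
            k = min(int(len(foo) * x), len(foo) - 1)     # clamp so x = 1.0 stays in range
            row.append(foo[k])
        result.append(row)
    return result

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
c = [[5, 9, 1], [5, 4, 7], [5, 2, 3]]
print(elementwise_percentile([a, b, c], 0.5))
```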
I hope that helps you.

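Putting Raymond Hettinger's description into Python:
import Numeric
a = Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b = Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c = Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
d = Numeric.array([a, b, c])   # shape (3, 3, 3): one plane per input array
d = Numeric.sort(d, 0)         # Numeric arrays have no .sort() method; sort along axis 0
n = len(d)                     # number of input arrays
Since there are n = 3 input matrices, the median is the middle plane, the one at index 1 (that is, n//2):
print(d[n//2])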
[[5 8 3]
[5 5 6]
[5 2 3]]
If you had 4 input matrices (an even number), there would be no single middle plane, so you would take the element-wise mean of d[1] and d[2].
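The odd and even cases can be folded into one helper. Here is a sketch in NumPy, again standing in for Numeric, assuming equally shaped inputs. The name elementwise_median is illustrative. Averaging the two middle planes returns floats even for integer inputs.

```python
import numpy as np

def elementwise_median(arrays):
    """Element-wise median of equally shaped arrays (illustrative helper)."""
    d = np.sort(np.array(arrays), axis=0)   # planes sorted along the stacking axis
    n = d.shape[0]
    if n % 2 == 1:
        return d[n // 2]                     # odd count: the single middle plane
    # even count: mean of the two middle planes, e.g. d[1] and d[2] when n == 4
    return (d[n // 2 - 1] + d[n // 2]) / 2.0

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])
c = np.array([[5, 9, 1], [5, 4, 7], [5, 2, 3]])
print(elementwise_median([a, b, c]))
print(elementwise_median([a, b, c, a]))   # four inputs: averages d[1] and d[2]
```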

Related

indexing in python ml [duplicate]

How can I index the last axis of a Numpy array if I don't know its rank in advance?
Here is what I want to do: Let a be a Numpy array of unknown rank. I want the slice of the last k elements of the last axis.
If a is 1D, I want
b = a[-k:]
If a is 2D, I want
b = a[:, -k:]
If a is 3D, I want
b = a[:, :, -k:]
and so on.
I want this to work regardless of the rank of a (as long as the rank is at least 1).
The fact that I want the last k elements in the example is irrelevant of course, the point is that I want to specify indices for whatever the last axis is when I don't know the rank of an array in advance.

b = a[..., -k:]
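The Ellipsis (...) expands to as many full slices (:) as are needed to fill the remaining axes, so this works for any rank of at least 1. It is described in the NumPy indexing documentation.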
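A short check that one expression handles arrays of different rank. The helper name last_k and the sample shapes are illustrative.

```python
import numpy as np

def last_k(a, k):
    """Slice the last k entries of the last axis of an array of any rank >= 1."""
    return a[..., -k:]   # '...' stands in for as many ':' as the other axes need

x1 = np.arange(6)                    # shape (6,)
x2 = np.arange(12).reshape(2, 6)     # shape (2, 6)
x3 = np.arange(24).reshape(2, 2, 6)  # shape (2, 2, 6)
for x in (x1, x2, x3):
    print(x.shape, '->', last_k(x, 2).shape)
```

One caveat: with k = 0, -0: is the same as 0:, so the whole axis is returned rather than an empty slice.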

Dealing with three 2-D numpy arrays without using loops

I have three 2-D numpy arrays with shape as (3,7).
I want to take the (0,0) element from each of the array, pass these values in a function and store the returned value at the (0,0) index in a new 2-D array.
Then I want to take (0,1) element from each of the array, pass these values to the same function and store the returned value at the (0,1) index of the same new array.
I want to run this for all the columns and then move on to the next row and continue till the end of the array.
The catch here is that I don't want to use loops, just the numpy methods. Been struggling a lot on this lately. Any ideas would be of great help.
Thanks!
I am running a loop like this for now. It gives me back the result for each element in the 1st row only. Here a, b and c are the three 2-D arrays that I mentioned earlier.
def row_score(a, b, c):   # the original omitted the name; row_score is illustrative
    count = 0
    for i in range(0, 7):
        # (i)-th element of the first row of each array
        count += -c[0, i] - ((a[0, i] - b[0, i]) / c[0, i])**2
    return count

Since all three arrays are the same shape, and you're operating on each element in the same way, you can easily translate this to vectorised NumPy functions like so:
# res is a 2-D array of the same shape as a, b and c
res = -c - ((a - b) / c)**2
It looks like in your example code you're trying to sum each row, so you can do this after performing the operations:
count = np.sum(res, axis=1)
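Here is a quick check that the vectorised expression matches an explicit element-by-element loop. The random (3, 7) inputs are made up for illustration. c is drawn from a strictly positive range to avoid division by zero.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(1, 5, size=(3, 7))
b = rng.uniform(1, 5, size=(3, 7))
c = rng.uniform(1, 5, size=(3, 7))   # strictly positive, so no division by zero

# Vectorised: the formula is applied to every (i, j) position at once
res = -c - ((a - b) / c) ** 2
count = np.sum(res, axis=1)          # one sum per row, shape (3,)

# Reference loop over every element, for comparison only
expected = np.zeros(3)
for i in range(3):
    for j in range(7):
        expected[i] += -c[i, j] - ((a[i, j] - b[i, j]) / c[i, j]) ** 2

print(np.allclose(count, expected))
```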

Minimum value in each row and each column of a matrix - Python

Can someone please help me out? I am trying to get the minimum value of each row and of each column of this matrix
matrix =[[12,34,28,16],
[13,32,36,12],
[15,32,32,14],
[11,33,36,10]]
So for example: I would want my program to print out that 12 is the minimum value of row 1 and so on.

Let's repeat the task statement: "get the minimum value of each row and of each column of this matrix".
Okay, so, if the matrix has n rows, you should get n minimum values, one for each row. Sounds interesting, doesn't it? So, the code'll look like this:
result1 = [<something> for row in matrix]
Well, what do you need to do with each row? Right, find the minimum value, which is super easy:
result1 = [min(row) for row in matrix]
As a result, you'll get a list of n values, just as expected.
Wait, by now we've only found the minimums for each row, but not for each column, so let's do this as well!
Python also lets you loop over columns easily:
result2 = [min(column) for column in zip(*matrix)] # notice the asterisk!
The asterisk in zip(*matrix) passes each row of matrix to zip as a separate argument, like this:
zip(matrix[0], matrix[1], matrix[2], matrix[3])
Written out like that, the call is less readable and depends on the number of rows in matrix (you would have to hard-code them); the asterisk lets you write much cleaner code.
zip returns tuples, and the ith tuple contains the ith values of all the rows, so these tuples are actually the columns of the given matrix.
Now, you may find this code a bit ugly and want to write the same thing more concisely. Sure enough, you can use some functional programming magic:
result1 = list(map(min, matrix))
result2 = list(map(min, zip(*matrix)))
These two approaches are absolutely equivalent.
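Putting the pieces together with the message format from the question, here is a sketch in plain Python 3. It uses 1-based row and column numbers in the output, as the question's "row 1" suggests.

```python
matrix = [[12, 34, 28, 16],
          [13, 32, 36, 12],
          [15, 32, 32, 14],
          [11, 33, 36, 10]]

row_mins = [min(row) for row in matrix]
col_mins = [min(column) for column in zip(*matrix)]

# 1-based numbering in the messages, as in the question ("row 1")
for number, value in enumerate(row_mins, start=1):
    print(f"{value} is the minimum value of row {number}")
for number, value in enumerate(col_mins, start=1):
    print(f"{value} is the minimum value of column {number}")
```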

Use numpy.
>>> import numpy as np
>>> matrix =[[12,34,28,16],
... [13,32,36,12],
... [15,32,32,14],
... [11,33,36,10]]
>>> np.min(matrix, axis=1) # computes minimum in each row
array([12, 12, 14, 10])
>>> np.min(matrix, axis=0) # computes minimum in each column
array([11, 32, 28, 10])

Fastest way to calculate "cosine" metrics with scipy

I am given a matrix of ones and zeros. I need to find the 20 rows that have the highest cosine similarity to one specific row of the matrix.
If I have 10 rows and the 5th is the specific one, I want to choose the highest values among these:
cosine(1row,5row),cosine(2row,5row),...,cosine(8row,5row),cosine(9row,5row)
First, I tried to compute the metrics.
This didn't work:
A = ratings[:,100]
A = A.reshape(1,A.shape[0])
B = ratings.transpose()
similarity = -cosine(A,B)+1
A.shape = (1L, 71869L)
B.shape = (10000L, 71869L)
The error is: Input vector should be 1-D. I'd like to know how to implement this cleanly and without errors, but most importantly, which solution will be the fastest?
In my opinion, the fastest way does not use scipy at all.
We just take the indices of all the ones in the specific row and look at those indices in all the other rows. The rows with the most coinciding ones will have the highest metric.
Are there any faster ways?

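The fastest way is to use matrix operations. For example, np.dot(B, A.T) computes the dot product of every row with the specific row in a single call. Dividing by the vector norms then turns those dot products into cosines.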
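Here is a minimal sketch of the matrix-operation approach. It assumes the data fit in a dense NumPy array with one item per row. If the items are columns, as with ratings[:, 100] in the question, transpose first. Cosine similarity is the dot product divided by the product of the norms, so one matrix-vector product scores every row at once. np.argpartition then picks the top k without a full sort. The function name top_k_cosine and the choice to exclude the query row are assumptions. For 0/1 rows the dot product is exactly the overlap count described in the question. The overlap only ranks rows by cosine after dividing by the norms: in the example, rows 1 and 4 share the same overlap with row 0 but have different cosines.

```python
import numpy as np

def top_k_cosine(M, target, k):
    """Indices of the k rows of M most cosine-similar to row `target`
    (excluding the target row itself), most similar first. Assumes k >= 1."""
    M = np.asarray(M, dtype=float)
    norms = np.linalg.norm(M, axis=1)        # Euclidean norm of every row
    dots = M @ M[target]                     # one dot product per row
    with np.errstate(divide='ignore', invalid='ignore'):
        sims = dots / (norms * norms[target])
    sims[np.isnan(sims)] = -np.inf           # all-zero rows have no defined cosine
    sims[target] = -np.inf                   # do not return the query row
    k = min(k, len(sims) - 1)
    idx = np.argpartition(-sims, k - 1)[:k]  # unordered top k
    return idx[np.argsort(-sims[idx])]       # order them by similarity

M = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 0]])
print(top_k_cosine(M, 0, 3))
```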

How to sum over columns with some weight in a csr matrix in python

If I have a large csr_matrix A and want to sum over its columns, simply
A.sum(axis=0)
does this for me, right? Are the corresponding axis values: 1->rows, 0->columns?
I get stuck when I want to sum over columns with weights specified in a list, e.g. [1 2 3 4 5 4 3 ... 4 2 5], with the same length as the number of rows in the csr_matrix A. To be more precise, I want the inner product of each column vector with this weight vector. How can I achieve this in Python?
This is a part of my code:
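import numpy as np
from scipy.sparse import csr_matrix

uniFeature = csr_matrix(uniFeature)
[I,J] = uniFeature.shape
sumfreq = uniFeature.sum(axis=0)
sumratings = []
for j in range(J):
    column = uniFeature.getcol(j)
    column = column.toarray()            # densify one column: shape (I, 1)
    sumtemp = np.dot(ratings, column)    # shape (1,)
    sumratings.append(sumtemp[0])
sumfreq = np.asarray(sumfreq).ravel()    # sum(axis=0) returns a 1 x J np.matrix, which has no toarray()
average = np.true_divide(sumratings, sumfreq)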
(NumPy is imported as np.) ratings is a weight vector, and the program is supposed to output the average rating for each column of the matrix uniFeature.
I tried to dot column = uniFeature.getcol(j) directly with ratings (which is a list), but got an error saying the formats do not agree. It works after column.toarray(), then dotting with ratings. But doesn't converting each column back to dense form defeat the point of having a sparse matrix, and wouldn't it be very slow? I ran the above code and it ran too long to show any results. I guess there should be a way to dot the vector ratings with each column of the sparse matrix efficiently.
Thanks in advance!
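Here is a sketch of a loop-free version. It assumes A is a scipy.sparse CSR matrix of shape (rows, columns) and that weights has one entry per row. Transposing A and multiplying by the weight vector computes the inner product of every column with the weights in one sparse matrix-vector product, so no column is converted to dense form. The name weighted_column_average and the small example matrix are illustrative. A column whose entries sum to zero would produce a division by zero (nan or inf).

```python
import numpy as np
from scipy.sparse import csr_matrix

def weighted_column_average(A, weights):
    """For each column of sparse A: (column . weights) / (sum of the column)."""
    weights = np.asarray(weights, dtype=float)
    # Inner product of every column with the weights, computed as A^T w: shape (n_cols,)
    weighted_sums = A.T.dot(weights)
    # Plain column sums; np.asarray(...).ravel() flattens the 1 x n_cols result
    col_sums = np.asarray(A.sum(axis=0)).ravel()
    return weighted_sums / col_sums

A = csr_matrix(np.array([[1, 0, 2],
                         [0, 1, 1],
                         [1, 1, 0]]))
ratings = [1, 2, 3]
print(weighted_column_average(A, ratings))
```

A.sum(axis=0) collapses the rows and leaves one total per column. So axis=0 does correspond to summing over each column, as the question guesses.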
