Rescaling 2D numpy array as Dense representation - python

I have a numpy array. I want to rescale its elements so that the smallest number in the array is represented by 1 and the largest number is represented by the number of unique elements in the array.
For example
A=[ [2,8,8],[3,4,5] ]
would become
[ [1,5,5],[2,3,4] ]

Use np.unique with its return_inverse param -
np.unique(A, return_inverse=1)[1].reshape(A.shape)+1
Sample run -
In [10]: A
Out[10]:
array([[2, 8, 8],
[3, 4, 5]])
In [11]: np.unique(A, return_inverse=1)[1].reshape(A.shape)+1
Out[11]:
array([[1, 5, 5],
[2, 3, 4]])
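The same idea can be wrapped in a small helper. This is only a sketch: the name dense_rescale is made up for illustration and is not part of NumPy.

```python
import numpy as np

def dense_rescale(a):
    """Map each element to its 1-based position among the sorted unique values.

    dense_rescale is a hypothetical helper name, not a NumPy function.
    """
    a = np.asarray(a)
    # inverse holds, for every element, the index of its value in the
    # sorted array of unique values
    _, inverse = np.unique(a, return_inverse=True)
    return inverse.reshape(a.shape) + 1

A = np.array([[2, 8, 8], [3, 4, 5]])
result = dense_rescale(A)
print(result)

# Works the same way for floats and negative values
other = dense_rescale([[0.5, -1.0], [0.5, 7.0]])
print(other)
```

The largest entry of the result always equals the number of unique values in the input.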

If you're not opposed to using scipy (judging by the tags on your question), you could use rankdata with method='dense':
from scipy.stats import rankdata
rankdata(A, 'dense').reshape(A.shape)
array([[1, 5, 5],
[2, 3, 4]])
Note that in your case method='min' would give the same result, but only because the tied values are also the largest ones. See the linked documentation for more details.
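To see where the two ranking schemes differ, here is a NumPy-only sketch of what "dense" and "min" ranks compute. It does not call scipy; the searchsorted expression is my own stand-in for the 'min' method, not scipy's implementation.

```python
import numpy as np

values = np.array([2, 2, 8])

# Dense rank: position among the sorted unique values, so ranks have no gaps
dense = np.unique(values, return_inverse=True)[1].reshape(values.shape) + 1

# 'min'-style rank: 1 + the number of elements strictly smaller than the value
min_rank = np.searchsorted(np.sort(values), values, side="left") + 1

print(dense)     # [1 1 2]
print(min_rank)  # [1 1 3]
```

With ties below the maximum, the 'min' ranks skip a value, so the largest rank no longer equals the number of unique elements.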

Related

Numpy linalg.norm with ufunc.reduceat functionality

Solution: @QuangHoang's first comment, namely np.linalg.norm(arr, axis=1).
I would like to apply Numpy's linalg.norm function column wise to sub-arrays of a 3D array by using ranges (or indices?), similar in functionality to what ufunc.reduceat does.
Given the following array:
import numpy as np
In []: arr = np.array([[0,1,2,3], [2,2,3,4], [3,2,5,6],
[1,7,1,9], [1,4,8,6], [2,3,5,8],
[2,5,7,3], [2,3,4,6], [2,5,3,2]]).reshape(3,3,4)
Out []: array([[[0, 1, 2, 3],
[2, 2, 3, 4],
[3, 2, 5, 6]],
[[1, 7, 1, 9],
[1, 4, 8, 6],
[2, 3, 5, 8]],
[[2, 5, 7, 3],
[2, 3, 4, 6],
[2, 5, 3, 2]]])
I would like to apply linalg.norm column-wise to the three sub-arrays separately, i.e. for the first column it would be linalg.norm([0, 2, 3]), linalg.norm([1, 1, 2]) and linalg.norm([2, 2, 2]), for the second linalg.norm([1, 2, 2]), linalg.norm([7, 4, 3]) and linalg.norm([5, 3, 5]), etc., resulting in a 2D array with shape (3,4) containing the results of the linalg.norm calls.
Doing this with a 2D array is straightforward by specifying the axis:
import numpy.linalg as npla
In []: npla.norm(np.array([[0,1,2,3], [2,2,3,4], [3,2,5,6]]), axis=0)
Out []: array([3.60555128, 3. , 6.164414 , 7.81024968])
But I don't understand how to do that for each sub-array separately. I believe that reduceat with a ufunc like add allows setting indices and ranges. Would something similar be possible here, but with linalg.norm?
Edit 1:
I followed @hpaulj's advice to look at the code used for add.reduce. With a better understanding of the method I was able to search more precisely, and I found np.apply_along_axis, which is exactly what I was looking for:
In []: np.apply_along_axis(npla.norm, 1, arr)
Out []: array([[ 3.60555128, 3. , 6.164414 , 7.81024968],
[ 2.44948974, 8.60232527, 9.48683298, 13.45362405],
[ 3.46410162, 7.68114575, 8.60232527, 7. ]])
However, this method is very slow. Is there a way to use linalg.norm in a vectorized manner instead?
Edit 2:
@QuangHoang's first comment is actually the correct answer I was looking for. I misunderstood the method, which is why I misunderstood their comment. Specifying the axis in the linalg.norm call is what is required here:
np.linalg.norm(arr,axis=1)
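As a quick check, the vectorized call can be compared against the apply_along_axis version and against the norm written out by hand. A small sketch using the array from the question:

```python
import numpy as np

arr = np.array([[0, 1, 2, 3], [2, 2, 3, 4], [3, 2, 5, 6],
                [1, 7, 1, 9], [1, 4, 8, 6], [2, 3, 5, 8],
                [2, 5, 7, 3], [2, 3, 4, 6], [2, 5, 3, 2]]).reshape(3, 3, 4)

# Vectorized: reduce over axis 1 (the rows inside each sub-array)
fast = np.linalg.norm(arr, axis=1)

# Slow reference: one Python-level call per 1-D slice along axis 1
slow = np.apply_along_axis(np.linalg.norm, 1, arr)

# The 2-norm written out by hand: square, sum over axis 1, square root
manual = np.sqrt((arr ** 2).sum(axis=1))

print(fast.shape)                 # (3, 4)
print(np.allclose(fast, slow))    # True
print(np.allclose(fast, manual))  # True
```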

How to index a numpy array of dimension N with a 1-dimensional array of shape (N,)

I would like to index an array of dimension N using an array of size (N,).
For example, let us consider a case where N is 2.
import numpy as np
foo = np.arange(9).reshape(3,3)
bar = np.array((2,1))
>>> foo
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> bar
array([2, 1])
>>> foo[bar[0],bar[1]]
7
This works fine. However, with this method, I would need to write N times bar[i], which is not a nice solution if N is high.
The following command does not give the result that I need:
>>> foo[bar]
array([[6, 7, 8],
[3, 4, 5]])
What could I do to get the result that I want in a nice and concise way?
I think you can turn bar into tuple:
foo[tuple(bar)]
# 7
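The difference between the two forms is easier to see with N = 3. A short sketch, with the array shapes chosen only for illustration:

```python
import numpy as np

foo = np.arange(64).reshape(4, 4, 4)
bar = np.array((1, 2, 3))

# A tuple is treated as one index per axis: same as foo[1, 2, 3]
element = foo[tuple(bar)]

# An array is treated as a list of indices along the first axis only,
# so this selects three 4x4 blocks instead of a single element
blocks = foo[bar]

print(element)        # 27
print(blocks.shape)   # (3, 4, 4)
```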

Binary Tree search an unsorted matrix?

Is there a way to binary tree search an unsorted matrix? If yes, could you explain it, as I am new to programming? I have tried implementing it using nested for i, for j loops but was wondering if there is a faster way.
import numpy as np
matrix = [[3, 6, 7], [9, 1, 2], [8, 4, 5]]
matrix = np.array(matrix)
matrix
array([[3, 6, 7],
[9, 1, 2],
[8, 4, 5]]) # how does one perform a binary tree search on an unsorted matrix?
You can use np.where for this.
matrix = np.array([[3,6,7],[9,1,2],[8, 8, 8]])
dim_1, dim_2 = np.where(matrix == 8)
#dim_1 = array([2, 2, 2], dtype=int64)
#dim_2 = array([0, 1, 2], dtype=int64)
#dim_1, dim_2, dim_3 = np.where(matrix == 8) if matrix were a 3D array
num_8 = len(dim_1) #total number of 8's
np.where returns a tuple of index arrays, one per dimension of your array. If you have a 3D array you will get 3 arrays in your tuple. Here the returned tuple is:
ret = (array([2, 2, 2], dtype=int64), array([0, 1, 2], dtype=int64))
ret[0] holds the row indices, and ret[1] holds the column indices.
So this means that the element 8 is present at matrix[2][0], matrix[2][1] and matrix[2][2].
Does that help? You won't have to write your own routine for this. Pretty sure this will be faster than any search routine you will implement in pure python because NumPy built-in functions are highly optimized. You should consider using NumPy methods for NumPy arrays wherever possible.
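A binary search itself only works on sorted data, so on an unsorted matrix a full scan such as np.where is needed at least once. If many lookups are expected, one option is to sort a flattened copy once and then binary-search it with np.searchsorted. The following is a sketch under that assumption; find_value is a made-up helper name and it reports only one position per value.

```python
import numpy as np

matrix = np.array([[3, 6, 7], [9, 1, 2], [8, 4, 5]])

# One-time preparation: remember where each value came from, then sort
flat = matrix.ravel()
order = np.argsort(flat)      # order[k] is the flat index of the k-th smallest value
sorted_vals = flat[order]

def find_value(value):
    """Return (row, col) of one occurrence of value, or None if it is absent.

    find_value is a hypothetical helper, not a NumPy function.
    """
    pos = np.searchsorted(sorted_vals, value)   # binary search on sorted data
    if pos < len(sorted_vals) and sorted_vals[pos] == value:
        row, col = np.unravel_index(order[pos], matrix.shape)
        return int(row), int(col)
    return None

print(find_value(8))    # (2, 0)
print(find_value(10))   # None
```

The sort costs more than a single np.where scan, so this only pays off when the same matrix is searched repeatedly.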

NumPy using the reshape function to reshape an array [duplicate]

I have an array: [1, 2, 3, 4, 5, 6]. I would like to use the numpy.reshape() function so that I end up with this array:
[[1, 4],
[2, 5],
[3, 6]
]
I'm not sure how to do this. I keep ending up with this, which is not what I want:
[[1, 2],
[3, 4],
[5, 6]
]
These do the same thing:
In [57]: np.reshape([1,2,3,4,5,6], (3,2), order='F')
Out[57]:
array([[1, 4],
[2, 5],
[3, 6]])
In [58]: np.reshape([1,2,3,4,5,6], (2,3)).T
Out[58]:
array([[1, 4],
[2, 5],
[3, 6]])
Normally values are 'read' across the rows in Python/numpy. This is called row-major or 'C' order. Reading down the columns is 'F' order, for FORTRAN, and is common in MATLAB, which has Fortran roots.
If you take the 'F' order, make a new copy and string it out, you'll get a different order:
In [59]: np.reshape([1,2,3,4,5,6], (3,2), order='F').copy().ravel()
Out[59]: array([1, 4, 2, 5, 3, 6])
You can set the order in np.reshape, in your case you can use 'F'. See docs for details
>>> arr
array([1, 2, 3, 4, 5, 6])
>>> arr.reshape(-1, 2, order = 'F')
array([[1, 4],
[2, 5],
[3, 6]])
The reason that you are getting that particular result is that arrays are normally allocated in C order. That means that reshaping by itself is not sufficient. You have to tell numpy to change the order of the axes when it steps along the array. Any of several operations will allow you to do that:
Set the axis order to F. F is for Fortran, which, like MATLAB, conventionally uses column-major order:
a.reshape(3, 2, order='F')
Swap the axes after reshaping:
np.swapaxes(a.reshape(2, 3), 0, 1)
Transpose the result:
a.reshape(2, 3).T
Roll the second axis forward:
np.rollaxis(a.reshape(2, 3), 1)
Notice that all but the first case require you to reshape to the transpose.
You can even manually arrange the data
np.stack((a[:3], a[3:]), axis=1)
Note that this will make many unnecessary copies. If you want the data copied, just do
a.reshape(3, 2, order='F').copy()
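A quick way to convince yourself that these variants agree is to compare each one against the desired result. A small sketch, with a taken to be the array from the question:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
target = np.array([[1, 4],
                   [2, 5],
                   [3, 6]])

candidates = {
    "order='F'": a.reshape(3, 2, order='F'),           # target shape directly
    "swapaxes":  np.swapaxes(a.reshape(2, 3), 0, 1),   # reshape to the transpose first
    "transpose": a.reshape(2, 3).T,
    "stack":     np.stack((a[:3], a[3:]), axis=1),
}

for name, result in candidates.items():
    print(name, np.array_equal(result, target))   # True for every variant
```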

numpy: how to construct a matrix of vectors from vector of matrix

I'm new to numpy.
With numpy, is it possible to use a vector of matrices to get a matrix of vectors?
for example:
matrix1(
[
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]
])
matrix2(
[
[2, 4, 6],
[2, 4, 6],
[2, 4, 6]
])
-->
matrix(
[
[array('1 2'), array('2 4'), array('3 6')],
[array('1 2'), array('2 4'), array('3 6')],
[array('1 2'), array('2 4'), array('3 6')]
])
I'm new to numpy, so I'm not sure whether it is allowed to put anything in a numpy matrix or just numbers.
And it's not easy to get an answer from google with descriptions like "matrix of vectors and vectors of matrix".
numpy doesn't have a concept of "vector" separate from "matrix." It does have distinct concepts of "matrix" and "array," but most people avoid the matrix representation entirely. If you use arrays, the concepts of "vector," "matrix," and "tensor" are all subsumed under the general concept of an array's "shape" attribute.
In this worldview, vectors and matrices are both 2-dimensional arrays, distinguished only by their shape. Row vectors are arrays with the shape (1, n), while column vectors are arrays with the shape (n, 1). Matrices are arrays with the shape (n, m). 1-dimensional arrays can behave like vectors sometimes, depending on context, but often you'll find that you won't get what you want unless you "upgrade" them.
With all that in mind, here's one possible answer to your question. First, we create a 1-d array:
>>> a1d = numpy.array([1, 2, 3])
>>> a1d
array([1, 2, 3])
Now we reshape it to create a column vector. The -1 here tells numpy to figure out the right size given the input.
>>> vcol = a1d.reshape((-1, 1))
>>> vcol
array([[1],
[2],
[3]])
Observe the doubled brackets at the beginning and ending of this. That's a subtle cue that this is a 2-d array, even though one dimension has a size of just 1.
We can do the same thing, swapping the dimensions, to get a row. Note again the doubled brackets.
>>> vrow = a1d.reshape((1, -1))
>>> vrow
array([[1, 2, 3]])
You can tell that these are 2-d arrays, because a 1-d array would have only one value in its shape tuple:
>>> a1d.shape
(3,)
>>> vcol.shape
(3, 1)
>>> vrow.shape
(1, 3)
To build a matrix from column vectors we can use hstack. There are lots of other methods that may be faster, but this is a good starting point. Here, note that [vcol] is not a numpy object, but an ordinary python list, so [vcol] * 3 means the same thing as [vcol, vcol, vcol].
>>> mat = numpy.hstack([vcol] * 3)
>>> mat
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
And vstack gives us the same thing from row vectors.
>>> mat2 = numpy.vstack([vrow] * 3)
>>> mat2
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
It's unlikely that any other interpretation of "construct a matrix of vectors from vector of matrix" will generate something you actually want in numpy!
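That said, if the layout sketched in the question is taken literally, one possible reading is a 3-d array whose last axis pairs up the corresponding entries of the two input matrices. The following is only a sketch of that reading, using np.stack; whether it is what the asker needs is an assumption.

```python
import numpy as np

matrix1 = np.array([[1, 2, 3]] * 3)
matrix2 = np.array([[2, 4, 6]] * 3)

# Stack along a new last axis: combined[i, j] is the length-2 vector
# [matrix1[i, j], matrix2[i, j]]
combined = np.stack((matrix1, matrix2), axis=-1)

print(combined.shape)   # (3, 3, 2)
print(combined[0, 1])   # [2 4]
```

The result is still an ordinary numeric array, so no object dtype or nested array elements are needed.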
Since you mention wanting to do linear algebra, here are a couple of operations that are possible. This assumes you're using a recent-enough version of python to use the new @ operator, which provides an unambiguous inline notation for matrix multiplication of arrays. [1]
For arrays, multiplication is always element-wise. But sometimes there is broadcasting. For values with the same shape, it's plain element-wise multiplication:
>>> vrow * vrow
array([[1, 4, 9]])
>>> vcol * vcol
array([[1],
[4],
[9]])
When values have different shapes, they are broadcast together if possible to produce a sensible result:
>>> vrow * vcol
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
>>> vcol * vrow
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
Broadcasting works in the way you'd expect for other shapes as well:
>>> vrow * mat
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
>>> vcol * mat
array([[1, 1, 1],
[4, 4, 4],
[9, 9, 9]])
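One way to picture what broadcasting does is to stretch both operands to the common shape explicitly and multiply those. A sketch using np.broadcast_to, which returns read-only views rather than copies:

```python
import numpy as np

vrow = np.array([[1, 2, 3]])   # shape (1, 3)
vcol = vrow.T                  # shape (3, 1)

# Each size-1 axis is conceptually repeated to match the other operand
row_full = np.broadcast_to(vrow, (3, 3))
col_full = np.broadcast_to(vcol, (3, 3))

product = vrow * vcol
print(product.shape)                                  # (3, 3)
print(np.array_equal(product, row_full * col_full))   # True
```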
If you want a dot product, you have to use the @ operator:
>>> vrow @ vcol
array([[14]])
Note that unlike the * operator, this is not symmetric:
>>> vcol @ vrow
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
This can be a bit confusing at first, because this looks the same as vrow * vcol, but don't be fooled. * will produce the same result regardless of argument order. Finally, for a matrix-vector product:
>>> mat @ vcol
array([[ 6],
[12],
[18]])
Observe again the difference between @ and *:
>>> mat * vcol
array([[1, 1, 1],
[4, 4, 4],
[9, 9, 9]])
[1] Sadly, this only exists as of Python 3.5. If you need to work with an earlier version, all the same advice applies, except that instead of using inline notation for a @ b, you have to use np.dot(a, b). numpy's matrix type overrides * to behave like @... but then you can't do element-wise multiplication or broadcasting the same way! So even if you have an earlier version, I don't recommend using the matrix type.
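For 2-d arrays like the ones above, the inline operator and np.dot give the same results, which can be checked directly. A short sketch reusing the names from this answer:

```python
import numpy as np

a1d = np.array([1, 2, 3])
vrow = a1d.reshape(1, -1)
vcol = a1d.reshape(-1, 1)
mat = np.hstack([vcol] * 3)

inner = vrow @ vcol     # shape (1, 1): row times column
outer = vcol @ vrow     # shape (3, 3): column times row
matvec = mat @ vcol     # shape (3, 1): matrix-vector product

print(np.array_equal(inner, np.dot(vrow, vcol)))    # True
print(np.array_equal(matvec, np.dot(mat, vcol)))    # True

# For two 1-d arrays, @ returns a plain scalar instead of a (1, 1) array
print(a1d @ a1d)        # 14
```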
