For example, I have a function:
f1 = lambda x: x % 2
If I want to modify array = np.linspace(0, 5, 6) I can do f1(array). Everything works as expected:
[0. 1. 0. 1. 0. 1.]
If I change the function to:
f2 = lambda x: 0
print(f2(array))
gives me 0, while I expected [0. 0. 0. 0. 0. 0.]. How can I get consistent behavior?
You can use the code below to get the desired output:
import numpy as np
array = np.linspace(0, 5, 6)
f2 = lambda x: x-x
print(f2(array))
Slightly more explicit than the previous answer:
import numpy as np
array = np.linspace(0, 5, 6)
f2 = lambda x: np.zeros_like(x)
print(f2(array))
Documentation for numpy.zeros_like: Return an array of zeros with the same shape and type as a given array.
To iterate over an array, evaluate the function for every element, and store the results, a list comprehension works consistently (note that it returns a Python list, not a NumPy array):
import numpy as np
array = np.linspace(0, 5, 6)
f1 = lambda x: x % 2
f2 = lambda x: 0
print ([f1(x) for x in array])
[0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print ([f2(x) for x in array])
[0, 0, 0, 0, 0, 0]
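If the result should stay a NumPy array rather than a list, one option is np.vectorize. The following is a minimal sketch, assuming that element-by-element evaluation is acceptable; np.vectorize is a convenience wrapper rather than a performance tool:

```python
import numpy as np

array = np.linspace(0, 5, 6)

# np.vectorize applies the Python function element by element
# and collects the results into an array of the same shape.
f1 = np.vectorize(lambda x: x % 2)
f2 = np.vectorize(lambda x: 0.0)

r1 = f1(array)
r2 = f2(array)
print(r1)
print(r2)
```

Both calls return an array with the same shape as the input, so a constant function and an element-wise function behave alike.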
I'm looking to compute the ECDF and am using this statsmodels function:
from statsmodels.distributions.empirical_distribution import ECDF
Looks good at first:
ECDF(np.array([0,1,2,3, 3, 3]))(np.array([0,1,2,3, 3,3]))
array([0.16666667, 0.33333333, 0.5 , 1. , 1. ,
1. ])
However, nan seems to be treated as infinity:
>>> x = np.array([0,1,2,3, np.nan, np.nan])
>>> ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5 , 0.66666667, 1. ,
1. ])
Same as:
x = np.array([0,1,2,3, np.inf, np.inf])
ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5 , 0.66666667, 1. ,
1. ])
Comparing with R:
> x <- c(0,1,2,3,NA,NA)
> x
[1] 0 1 2 3 NA NA
> ecdf(x)(x)
[1] 0.25 0.50 0.75 1.00 NA NA
What's the standard Python function for an ECDF that is NaN-aware?
Hot-wiring like so does not seem to work:
def ecdf(x):
    return np.where(~np.isfinite(x),
                    np.full_like(x, np.nan),
                    ECDF(x[np.isfinite(x)])(x[np.isfinite(x)]))
ecdf(x)
ECDF(x[np.isfinite(x)])(x[np.isfinite(x)]))
File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (7,) (7,) (4,)
The source code of statsmodels' ECDF is pleasantly brief (after stripping comments):
class ECDF(StepFunction):
    def __init__(self, x, side='right'):
        x = np.array(x, copy=True)
        x.sort()
        nobs = len(x)
        y = np.linspace(1./nobs,1,nobs)
        super(ECDF, self).__init__(x, y, side=side, sorted=True)
Sorting the input samples via x.sort() moves all np.nan elements to the end, even after np.inf, which is why they appear to be treated as infinity:
bar=np.array([1, np.nan, 2, np.inf, 3])
bar.sort()
# bar is now array([ 1., 2., 3., inf, nan])
np.nan isn't propagated because ECDF's parent class uses np.searchsorted to find the index for each query value and then looks that index up in y. For np.nan the index is simply that of the last element of the array, so the subsequent lookup in self.y returns 1.
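The two steps described above can be reproduced with plain NumPy. This is a simplified imitation of the lookup, not the statsmodels source; the names sorted_x, idx and nan_value are only for illustration:

```python
import numpy as np

x = np.array([0, 1, 2, 3, np.nan, np.nan])
y = np.linspace(1 / len(x), 1, len(x))  # the step heights ECDF builds

sorted_x = np.sort(x)  # NaNs are placed at the end of the sorted array
# With side="right", a NaN query lands past the last element ...
idx = np.searchsorted(sorted_x, np.nan, side="right")
# ... so the lookup picks the last step height.
nan_value = y[idx - 1]

print(sorted_x)
print(idx, nan_value)
```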
You can make it propagate np.nan with a simple change, which you can implement as a subclass or as a sibling class:
from statsmodels.distributions.empirical_distribution import StepFunction
import numpy as np
class MyECDF(StepFunction):
    def __init__(self, x, side='right'):
        # work on a sorted float copy so that y can hold np.nan
        x = np.sort(np.asarray(x, dtype=float))
        # count number of non-nan's instead of length
        nobs = np.count_nonzero(~np.isnan(x))
        # fill the y values corresponding to np.nan with np.nan
        y = np.full_like(x, np.nan)
        y[:nobs] = np.linspace(1./nobs,1,nobs)
        super(MyECDF, self).__init__(x, y, side=side, sorted=True)
This small change will make the function behave in a way similar to R:
>>> from foobar import MyECDF
>>> from statsmodels.distributions.empirical_distribution import ECDF
>>> import numpy as np
>>> x = np.array([0,1,2,3, np.nan, np.nan])
>>> ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5 , 0.66666667, 1. ,
1. ])
>>> MyECDF(x)(x)
array([0.25, 0.5 , 0.75, 1. , nan, nan])
You can use a masked array:
import numpy as np
import numpy.ma as ma
from statsmodels.distributions.empirical_distribution import ECDF
def ecdf(x):
    return np.where(np.isnan(x),
                    np.full_like(x, np.nan),
                    ECDF(ma.array(x, mask=np.isnan(x)).compressed(), "right")(ma.array(x, mask=np.isnan(x))),
                    )
>>> ecdf(x)
array([0.25, 0.5 , 0.75, 1. , nan, nan])
Matches what R does natively.
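For comparison, the same R-like behavior can be sketched without statsmodels. The function name nan_ecdf is hypothetical; the idea is to rank only the non-NaN values and write the results back into a NaN-filled array of the original shape, which also avoids the shape mismatch from the np.where attempt in the question:

```python
import numpy as np

def nan_ecdf(x):
    """Evaluate the ECDF of x at x itself, leaving NaN entries as NaN."""
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    sorted_valid = np.sort(x[valid])
    out = np.full(x.shape, np.nan)
    # For each valid value, count the valid samples <= that value,
    # then divide by the number of valid samples.
    counts = np.searchsorted(sorted_valid, x[valid], side="right")
    out[valid] = counts / sorted_valid.size
    return out

x = np.array([0, 1, 2, 3, np.nan, np.nan])
result = nan_ecdf(x)
print(result)
```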
I have come across some code (which may answer this question of mine). Here is the code (from Vivek Maskara's solution to my issue):
import cv2 as cv
import numpy as np
def read(image_path, label):
    image = cv.imread(image_path)
    image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
    image_h, image_w = image.shape[0:2]
    image = cv.resize(image, (448, 448))
    image = image / 255.
    label_matrix = np.zeros([7, 7, 30])
    for l in label:
        l = l.split(',')
        # np.int was removed from recent NumPy releases; the built-in int is equivalent here
        l = np.array(l, dtype=int)
        xmin = l[0]
        ymin = l[1]
        xmax = l[2]
        ymax = l[3]
        cls = l[4]
        x = (xmin + xmax) / 2 / image_w
        y = (ymin + ymax) / 2 / image_h
        w = (xmax - xmin) / image_w
        h = (ymax - ymin) / image_h
        loc = [7 * x, 7 * y]
        loc_i = int(loc[1])
        loc_j = int(loc[0])
        y = loc[1] - loc_i
        x = loc[0] - loc_j
        if label_matrix[loc_i, loc_j, 24] == 0:
            label_matrix[loc_i, loc_j, cls] = 1
            label_matrix[loc_i, loc_j, 20:24] = [x, y, w, h]
            label_matrix[loc_i, loc_j, 24] = 1 # response
    return image, label_matrix
Would it be possible for you to explain how this part of the code works and what it specifically does:
if label_matrix[loc_i, loc_j, 24] == 0:
    label_matrix[loc_i, loc_j, cls] = 1
    label_matrix[loc_i, loc_j, 20:24] = [x, y, w, h]
    label_matrix[loc_i, loc_j, 24] = 1 # response
I will first create and explain a simplified example, and then explain the part you pointed out.
First, we create the ndarray named label_matrix:
import numpy as np
label_matrix = np.ones([2, 3, 4])
print(label_matrix)
This code means that you will get an array containing 2 arrays; each of these 2 arrays contains 3 arrays, and each of these 3 arrays contains 4 elements.
And because we used np.ones, all these elements have the value 1.
So, printing label_matrix will output this:
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
Now, we will change the values of the first 4 elements of the first array contained by the first array of label_matrix.
To access the first array of label_matrix, we do: label_matrix[0]
To access the first array contained by the first array of label_matrix, we do: label_matrix[0, 0]
To access the first element of the first array contained by the first array of label_matrix, we do: label_matrix[0, 0, 0]
To access the second element of the first array contained by the first array of label_matrix, we do: label_matrix[0, 0, 1]
etc.
So, now, we change the values of the first 4 elements of the first array contained by the first array of label_matrix:
label_matrix[0, 0, 0] = 100
label_matrix[0, 0, 1] = 200
label_matrix[0, 0, 2] = 300
label_matrix[0, 0, 3] = 400
Output of label_matrix:
[[[100. 200. 300. 400.]
  [  1.   1.   1.   1.]
  [  1.   1.   1.   1.]]

 [[  1.   1.   1.   1.]
  [  1.   1.   1.   1.]
  [  1.   1.   1.   1.]]]
But instead of writing 4 lines of code, we could have written it like this:
label_matrix[0, 0, 0:4] = [100,200,300,400]
Writing label_matrix[0, 0, 0:4] means:
in the first array contained by the first array of label_matrix, select the first 4 elements (from index 0 up to, but not including, index 4).
So now you know the meaning of each line.
Here is the part of the code you pointed out, line by line:
if label_matrix[loc_i, loc_j, 24] == 0:
Test whether the element at index 24 (the 25th element) of the array at label_matrix[loc_i, loc_j] has the value 0.
If it does, then:
label_matrix[loc_i, loc_j, cls] = 1
Assign the value 1 to the element at index cls of that array. (If the variable cls has the value 4, the element at index 4 of label_matrix[loc_i, loc_j] is set to 1.)
label_matrix[loc_i, loc_j, 20:24] = [x, y, w, h]
Say x == 100, y == 200, w == 300 and h == 400. Then, in the array at label_matrix[loc_i, loc_j], the value 100 is assigned to the element at index 20, 200 to index 21, 300 to index 22 and 400 to index 23.
label_matrix[loc_i, loc_j, 24] = 1
In the array at label_matrix[loc_i, loc_j], assign the value 1 to the element at index 24. Because of the test on the first line, a later label that falls into the same cell (loc_i, loc_j) is skipped.
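To see the effect with concrete numbers, here is a small sketch that repeats the same arithmetic for a single box, without OpenCV. The helper name encode_box, the constant GRID and the sample box values are made up for illustration; the index layout (class slots first, box at 20:24, flag at 24) follows the code in the question:

```python
import numpy as np

GRID = 7  # number of grid cells per side, as in the question's code

def encode_box(label_matrix, box, image_w, image_h):
    """Write one (xmin, ymin, xmax, ymax, cls) box into label_matrix in place."""
    xmin, ymin, xmax, ymax, cls = box
    # Box centre and size as fractions of the image size.
    x = (xmin + xmax) / 2 / image_w
    y = (ymin + ymax) / 2 / image_h
    w = (xmax - xmin) / image_w
    h = (ymax - ymin) / image_h
    # Grid cell containing the centre: row from y, column from x.
    loc_i, loc_j = int(GRID * y), int(GRID * x)
    # Centre position relative to the top-left corner of that cell.
    x_cell, y_cell = GRID * x - loc_j, GRID * y - loc_i
    if label_matrix[loc_i, loc_j, 24] == 0:  # cell not used yet
        label_matrix[loc_i, loc_j, cls] = 1
        label_matrix[loc_i, loc_j, 20:24] = [x_cell, y_cell, w, h]
        label_matrix[loc_i, loc_j, 24] = 1
    return loc_i, loc_j

label_matrix = np.zeros([GRID, GRID, 30])
cell = encode_box(label_matrix, (100, 150, 200, 250, 3), image_w=448, image_h=448)
print(cell)
print(label_matrix[cell][20:25])
```

For this sample box the centre falls into row 3, column 2 of the grid, so only label_matrix[3, 2] changes; every other cell stays zero.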
I'm using NumPy in Python to create an n x 1 matrix. I want the 1st element of the matrix to be 3, the 2nd to be -1, the (n-1)th to be -1 again, and the nth to be 3. All the elements in between, i.e. from element 3 to element n-2, should be 0. Written out as a column, the matrix is (3, -1, 0, ..., 0, -1, 3).
I'm fairly new to Python and NumPy, but it seems like a great tool for managing matrices. What I've tried so far is creating the n x 1 array (giving n some value) and initializing it to 0.
import numpy as np
n = 100
I = np.arange(n)
matrix = np.row_stack(0*I)
print("Matrix is\n", matrix)
Any clues on how to proceed, or which routine to use?
Probably the simplest way is to just do the following:
import numpy as np
n = 10
a = np.zeros(n)
a[0] = 3
a[1] = -1
a[len(a)-1] = 3
a[len(a)-2] = -1
print(a)
output: [ 3. -1. 0. 0. 0. 0. 0. 0. -1. 3.]
Hope this helps ;)
In [97]: n=10
In [98]: arr = np.zeros(n,int)
In [99]: arr[[0,-1]]=3; arr[[1,-2]]=-1
In [100]: arr
Out[100]: array([ 3, -1, 0, 0, 0, 0, 0, 0, -1, 3])
Easily changed to (n,1):
In [101]: arr[:,None]
Out[101]:
array([[ 3],
[-1],
[ 0],
[ 0],
[ 0],
[ 0],
[ 0],
[ 0],
[-1],
[ 3]])
I guess something that works is:
import numpy as np
n = 100
I = np.arange(n)
matrix = np.row_stack(0*I)
matrix[0]=3
matrix[1]=-1
matrix[n-2]=-1
matrix[n-1]=3
print("Matrix is\n", matrix)
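The approaches above can be combined into one small function that builds the (n, 1) column directly. This is only a sketch; the name end_column and the n >= 4 check are my own additions:

```python
import numpy as np

def end_column(n):
    """Return an (n, 1) column: 3, -1, zeros, -1, 3 (hypothetical helper)."""
    if n < 4:
        raise ValueError("n must be at least 4 so the four end values do not overlap")
    col = np.zeros((n, 1))
    col[[0, -1]] = 3    # first and last rows
    col[[1, -2]] = -1   # second and second-to-last rows
    return col

matrix = end_column(6)
print(matrix.shape)
print(matrix.ravel())
```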
Here is some code I'm struggling with.
My goal is to create an array (db) from an existing one (t), where each row of db represents one value of t. db will have 3 columns: one for the row index in t, one for the column index in t, and one for the value in t.
In my case, t was a distance matrix, so the diagonal was 0 and it was symmetric; I replaced the lower triangular values with 0. I don't need the 0 values in the new array, but I can delete them in a separate step.
import numpy as np
t = np.array([[0, 2.5],
[0, 0]])
My goal is to obtain a new array such as :
db = np.array([[0, 0, 0],
[0, 1, 2.5],
[1, 0, 0],
[1, 1, 0]])
Thanks for your time.
You can create a meshgrid of 2D coordinates for the rows and columns, then unroll these into 1D arrays. You can then concatenate these two arrays as well as the unrolled version of t into one final matrix:
import numpy as np
(Y, X) = np.meshgrid(np.arange(t.shape[1]), np.arange(t.shape[0]))
db = np.column_stack((X.ravel(), Y.ravel(), t.ravel()))
Example run
In [9]: import numpy as np
In [10]: t = np.array([[0, 2.5],
...: [0, 0]])
In [11]: (Y, X) = np.meshgrid(np.arange(t.shape[1]), np.arange(t.shape[0]))
In [12]: db = np.column_stack((X.ravel(), Y.ravel(), t.ravel()))
In [13]: db
Out[13]:
array([[ 0. , 0. , 0. ],
[ 0. , 1. , 2.5],
[ 1. , 0. , 0. ],
[ 1. , 1. , 0. ]])
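A possible variant uses np.indices to get the row and column indices in one call, and then drops the zero rows that the question says are not needed. The name db_nonzero is only for illustration:

```python
import numpy as np

t = np.array([[0, 2.5],
              [0, 0]])

# np.indices returns one index array per axis, each with the shape of t.
rows, cols = np.indices(t.shape)
db = np.column_stack((rows.ravel(), cols.ravel(), t.ravel()))

# Keep only the rows whose value (third column) is non-zero.
db_nonzero = db[db[:, 2] != 0]

print(db)
print(db_nonzero)
```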
Given a threshold alpha and a numpy array a, there are multiple possibilities for finding the first index i such that a[i] > alpha; see Numpy first occurrence of value greater than existing value:
numpy.searchsorted(a, alpha)+1
numpy.argmax(a > alpha)
In my case, alpha can be either a scalar or an array of arbitrary shape. I'd like to have a function get_lowest that works in both cases:
alpha = 1.12
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # 2
alpha = numpy.array([1.12, -0.5, 2.7])
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # [2, 0, 3]
Any hints?
You can use broadcasting:
In [9]: arr = np.array([ 0. , 1.1, 1.2, 3. ])
In [10]: alpha = np.array([ 1.12, -0.5 , 2.7 ])
In [11]: np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
Out[11]: array([2, 0, 3])
To collapse the extra dimension, you can use np.squeeze, but you might have to do something special if you want a plain Python int in your first case:
def get_lowest(arr, alpha):
    b = np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
    b = np.squeeze(b)
    if np.size(b) == 1:
        return int(b)
    return b
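If alpha can have more than one dimension, the same broadcasting idea can be written by appending a trailing axis to alpha instead of transposing it. This is a sketch with a hypothetical function name, first_index_above:

```python
import numpy as np

def first_index_above(arr, alpha):
    """For every entry of alpha, return the first i with arr[i] > alpha."""
    alpha = np.asarray(alpha)
    # alpha[..., np.newaxis] has shape alpha.shape + (1,), which broadcasts
    # against arr (shape (n,)) to a boolean array of shape alpha.shape + (n,).
    return np.argmax(arr > alpha[..., np.newaxis], axis=-1)

arr = np.array([0.0, 1.1, 1.2, 3.0])
scalar_result = first_index_above(arr, 1.12)
vector_result = first_index_above(arr, np.array([1.12, -0.5, 2.7]))
matrix_result = first_index_above(arr, np.array([[1.12, -0.5], [2.7, 0.0]]))
print(scalar_result)
print(vector_result)
print(matrix_result)
```

As with any argmax-based approach, an entry of alpha that no element of arr exceeds yields index 0, so that case needs a separate check if it can occur.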
searchsorted actually does the trick:
np.searchsorted(a, alpha)
The axis argument to argmax helps out; this
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1)
does the trick. Indeed
import numpy as np
a = np.array([0.0, 1.1, 1.2, 3.0])
alpha = 1.12
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # 2
np.searchsorted(a, alpha) # 2
alpha = np.array([1.12, -0.5, 2.7])
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # [2 0 3]
np.searchsorted(a, alpha) # [2 0 3]
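Two caveats about the searchsorted approach are worth a short sketch. It assumes that a is sorted in ascending order, and with the default side="left" it returns the first index with a[i] >= alpha, so side="right" is needed for a strict a[i] > alpha. The function name get_lowest is taken from the question:

```python
import numpy as np

def get_lowest(a, alpha):
    # Assumes a is sorted in ascending order.
    # side="right" skips elements equal to alpha, giving a strict a[i] > alpha.
    return np.searchsorted(a, alpha, side="right")

a = np.array([0.0, 1.1, 1.2, 3.0])
print(get_lowest(a, 1.12))
print(get_lowest(a, np.array([1.12, -0.5, 2.7])))
# When alpha equals an element of a, the two sides differ:
print(np.searchsorted(a, 1.1), get_lowest(a, 1.1))
# When no element exceeds alpha, searchsorted returns len(a), argmax returns 0:
print(get_lowest(a, 5.0), np.argmax(a > 5.0))
```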