Harmonic average over slices of three-dimensional numpy array - python

Given a three-dimensional array, I want to compute both the arithmetic and harmonic average over two-dimensional slices.
This can easily be done using numpy's arithmetic average:
import numpy as np
a = np.arange(5*3*3).reshape(5,3,3)
np.mean(a,axis=(1,2))
For the harmonic average, I have to slice the three-dimensional array myself.
I can do so along the first (0th) axis, for example:
from scipy import stats
b = a.reshape(np.shape(a)[0], -1)
stats.hmean(b,axis=1)
How do I have to reshape/slice my three-dimensional array to compute the average perpendicular to the other axes (that is, average over axes 0 and 2 or over axes 0 and 1)?
To clarify, the corresponding arithmetic averages are simply given by:
np.mean(a,axis=(0,2))
np.mean(a,axis=(0,1))

You can stick to numpy and adapt your code to compute the harmonic mean as follows:
1/np.mean(1/a, axis=(0,2))
1/np.mean(1/a, axis=(0,1))
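As a quick check of the reciprocal-of-mean-of-reciprocals approach, here is a minimal sketch that wraps it in a helper and compares it with flattening the slices by hand. The name harmonic_mean is introduced here for illustration and is not a numpy function. The sketch assumes strictly positive entries, so the example array starts at 1 instead of 0 to avoid dividing by zero.

```python
import numpy as np

def harmonic_mean(a, axis):
    """Harmonic mean over the given axis or axes (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    return 1.0 / np.mean(1.0 / a, axis=axis)

# Example data shifted to start at 1 so every entry is strictly positive
a = np.arange(1, 5 * 3 * 3 + 1).reshape(5, 3, 3)

hm_over_0_2 = harmonic_mean(a, axis=(0, 2))  # one value per index along axis 1
hm_over_0_1 = harmonic_mean(a, axis=(0, 1))  # one value per index along axis 2

# Cross-check: bring the kept axis to the front and flatten the others.
# This is also the 2D layout that stats.hmean(b, axis=1) would accept.
b = np.moveaxis(a, 1, 0).reshape(a.shape[1], -1)
check = b.shape[1] / np.sum(1.0 / b, axis=1)
```

The np.moveaxis plus reshape step is the answer to the reshaping part of the question if you prefer to keep using scipy: move the axis you want to keep to the front, flatten the rest, and reduce along axis 1.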

Related

scipy.stats.multivariate_normal error: input matrix must be symmetric positive definite

I'm trying to compute the cumulative distribution function of a multivariate normal using scipy.
I'm having trouble with the "input matrix must be symmetric positive definite" error.
To my knowledge, a diagonal matrix with positive diagonal entries is positive definite (see page 1 problem 2).
However, when I try different (relatively) small values for the diagonal entries, the error shows up for the smaller ones.
For example, this code:
import numpy as np
from scipy.stats import multivariate_normal
std = np.array([0.001, 2])
mean = np.array([1.23, 3])
multivariate_normal(mean=mean, cov=np.diag(std**2)).cdf([2,1])
returns 0.15865525393145702
while replacing the third line with:
std = np.array([0.00001, 2])
causes the error to show up.
I'm guessing that it has something to do with floating-point error.
The problem is that as the dimension of the covariance matrix grows, the positive values accepted on the diagonal have to be bigger and bigger.
I tried multiple values on the diagonal of the covariance matrix of dimension 9x9. It seems that when other diagonal values are very large, small values cause the error.
Examining the stack trace, you will see that it takes the condition number threshold to be
1e6*np.finfo('d').eps ~ 2.2e-10 in _eigvalsh_to_eps
In your example the smallest eigenvalue is (5e-6)**2 = 2.5e-11 times the largest eigenvalue, which is below that threshold, so it is treated as zero.
You can pass allow_singular=True to get it working:
import numpy as np
from scipy.stats import multivariate_normal
std = np.array([0.000001, 2])
mean = np.array([1.23, 3])
multivariate_normal(mean=mean, cov=np.diag(std**2), allow_singular=True).cdf([2,1])
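The threshold argument can be checked numerically without calling scipy at all. The sketch below only mirrors the comparison described in this answer (ratio of smallest to largest eigenvalue against 1e6 times machine epsilon). It is not scipy's actual code, and the exact check inside scipy may differ between versions. The helper name treated_as_zero is made up for this example.

```python
import numpy as np

def treated_as_zero(std):
    """Return (ratio, flag) for a diagonal covariance built from std.

    ratio is smallest eigenvalue / largest eigenvalue; flag is True when
    the ratio falls below the threshold quoted in the answer.
    """
    cov = np.diag(np.asarray(std, dtype=float) ** 2)
    eigvals = np.linalg.eigvalsh(cov)
    ratio = eigvals.min() / np.abs(eigvals).max()
    threshold = 1e6 * np.finfo('d').eps  # roughly 2.2e-10
    return ratio, bool(ratio < threshold)

ratio_ok, flag_ok = treated_as_zero([0.001, 2])      # ratio 2.5e-7, above threshold
ratio_bad, flag_bad = treated_as_zero([0.00001, 2])  # ratio 2.5e-11, below threshold
```

This also explains the observation in the question that small entries fail when other diagonal entries are very large: the comparison is relative to the largest eigenvalue, not absolute.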

Histogram of 2D arrays and determine array which contains highest and lowest values

I have a 2D array of shape 5 and 10. So 5 different arrays with 10 values. I am hoping to get a histogram and see which array is on the lower end versus higher end of a histogram. Hope that makes sense. I am attaching an image of an example of what I mean (labeled example).
Looking for one histogram but the histogram is organized by the distribution of the highest and lowest of each array.
I'm having trouble doing this with Python. I tried a few ways of doing this:
# setting up 2d array
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
np.random.seed(1234)
array_2d = np.random.random((5,20))
I thought you could maybe just plot all the histograms of each array (5 of them) like this:
for i in range(5):
    plt.hist(signal.detrend(array_2d[i,:],type='constant'),bins=20)
plt.show()
And then looking to see which array's histogram is furthest to the right or left, but not sure if that makes too much sense...
Then also considered using .ravel to make the 2D array into a 1D array which makes a nice histogram. But all the values within each array are being shifted around so it's difficult to tell which array is on the lower or higher end of the histogram:
plt.hist(signal.detrend(array_2d.ravel(),type='constant'),bins=20)
plt.xticks(np.linspace(-1,1,10));
How might I get a histogram of the 5 arrays (shape 5, 10) and get the range of the arrays with the lowest values versus array with highest values?
Also please let me know if this is unclear or not possible at all too haha. Thanks!
Maybe you could use a kdeplot? This would replace each input value with a small Gaussian curve and sum them.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(1234)
array_2d = np.random.random((5, 20))
sns.kdeplot(data=pd.DataFrame(array_2d.T, columns=range(1, 6)), palette='Set1', multiple='layer')
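To make the "small Gaussian curve per value, then sum" idea concrete, here is a rough numpy-only sketch of a kernel density estimate for each of the five arrays. It is an illustration of the idea, not what seaborn does internally: seaborn picks the bandwidth automatically, while the fixed bandwidth of 0.1, the grid, and the helper name gaussian_kde_1d are assumptions made for this example.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of one Gaussian bump per sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)  # averaging keeps the total area at 1

np.random.seed(1234)
array_2d = np.random.random((5, 20))

grid = np.linspace(-1.0, 2.0, 601)  # wide enough to hold the tails
bandwidth = 0.1                     # fixed value chosen for illustration
densities = np.array([gaussian_kde_1d(row, grid, bandwidth) for row in array_2d])

# Each curve should integrate to about 1 (simple Riemann sum)
dx = grid[1] - grid[0]
areas = densities.sum(axis=1) * dx
```

Plotting each row of densities against grid gives one smooth curve per array, so you can see which array sits further left or right.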

Is there a way to select a subset of a Numpy 2D array using the Manhattan distance?

Say for example, I have a Numpy 2D array (7 rows, 7 columns) filled with zeros:
import numpy
my_array = numpy.zeros((7, 7))
Then for sake of argument say that I want to select the element in the middle and set its value to 1:
my_array[3,3] = 1
Now say that I have been given a Manhattan distance of 3, how do I subset my array to only select the elements that are less than or equal to the Manhattan distance (from the middle element) and set those elements to 1? The end result should be:
I could iterate through each element in the 2D array, but I do not want to do this because it is too expensive, especially if my matrix is very large and the Manhattan distance is very small (for example a 70x70 matrix with a Manhattan distance of 10).
I would create an auxiliary array of shape (2, n, n) with meshgrid to store the indices, then subtract the desired center index, sum the absolute values of the differences, and compare the result against a threshold. Here is an example:
import numpy as np
import matplotlib.pyplot as plt #to draw result
n=70 #size of matrix
distance = 10 #distance
centerCoord = [35,35]
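# build a mesh of indices with shape [2, n, n]; indexing='ij' keeps (row, column) order
xarr = np.arange(n)
idxMat = np.array(np.meshgrid(xarr, xarr, indexing='ij'))
pt = np.array(centerCoord).reshape(-1, 1, 1)  # center point with shape [2, 1, 1]
# Manhattan distance of every cell to the center, compared against the threshold
elems = np.abs(idxMat - pt).sum(axis=0) <= distance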
plt.matshow(elems)
The result is a boolean mask that is True for every element within the given distance of the center.
If you need the indices, call np.where(elems), which returns two arrays (row indices, column indices).
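Applied to the 7x7 example from the question, the same idea can be sketched with np.indices. The helper name manhattan_mask is introduced here for illustration only.

```python
import numpy as np

def manhattan_mask(shape, center, distance):
    """Boolean mask of cells within a Manhattan distance of center (hypothetical helper)."""
    rows, cols = np.indices(shape)
    return np.abs(rows - center[0]) + np.abs(cols - center[1]) <= distance

my_array = np.zeros((7, 7))
mask = manhattan_mask(my_array.shape, center=(3, 3), distance=3)
my_array[mask] = 1  # diamond of ones around the middle element

print(my_array.astype(int))
```

The corners of the array stay 0 while the middle of each edge becomes 1, which gives the diamond shape.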

Generate pixel density map (heatmap) from image with numpy array manipulation

The specific problem I try to solve is:
I have a binary image (binary map) for which I want to generate a heatmap (density map). My idea is to take the 2D array of this image, let's say it is 12x12,
a = np.random.randint(20, size=(12, 12))
and then index and process it with a fixed-size submatrix (let's say 3x3), so that for every submatrix a pixel percentage value is calculated (nonzero pixels / total pixels).
submatrix = a[0:3, 0:3]
pixel_density = np.count_nonzero(submatrix) / submatrix.size
In the end, all the percentage values will make up a new 2D array (a smaller, 4x4 density array) that represents the density estimate of the original image. Lower resolution is fine because the data it will be compared to has a lower resolution as well.
I am not sure how to do that through numpy, especially for the indexing part. Also if there is a better way for generating heatmap like that, please let me know as well.
Thank you!
Maybe a 2-D convolution? Basically this will sweep the b matrix (which is all ones below) across the a matrix, so it performs the summation you are looking for. This link has a visual demo of convolution near the bottom.
import numpy as np
from scipy import signal
a = np.random.randint(2, size=(12, 12))
b = np.ones((4,4))
signal.convolve2d(a,b, 'valid') / b.sum()
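Note that the convolution with 'valid' slides the window one pixel at a time, so the windows overlap and the output is larger than the 4x4 array described in the question. If non-overlapping 3x3 blocks are what you want, a reshape is one possible alternative. This is a sketch under the assumption that both image dimensions are exact multiples of the block size; the random binary image is a stand-in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(12, 12))  # stand-in for a binary image

block = 3
rows, cols = a.shape
# Assumes rows and cols are divisible by block.
# New shape: (block row, row inside block, block column, column inside block)
blocks = a.reshape(rows // block, block, cols // block, block)

# Fraction of nonzero pixels in each 3x3 block, giving a 4x4 density array
density = (blocks != 0).mean(axis=(1, 3))
```

Here density[i, j] equals np.count_nonzero(a[3*i:3*i+3, 3*j:3*j+3]) / 9, the same quantity as pixel_density in the question.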

Sum a 3d numpy array for the third dimension only

I need the code in Python.
For example, I have a numpy array of shape (x, y, z).
I want to sum it into an array of shape (x, y), summing over z only.
Along z there is an array of numbers; after the sum it becomes a single number, so that I finally get a 2D matrix.
You can specify the axis on which the sum will be performed for the numpy function sum:
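import numpy as np
arr = np.arange(24).reshape(2, 3, 4)  # example array of shape (x, y, z)
res = np.sum(arr, axis=2)  # res has shape (x, y), here (2, 3)
# np.sum(arr, axis=-1) is equivalent in this case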
