I have a (very large) number of data points, each consisting of an x and y coordinate and a sigma uncertainty (sigma is the same in both the x and y directions; all three variables are floats). For each data point I want to generate a 2D array on a standard grid, with the probability that the actual value is in each cell.
For instance if x=5.0, y=5.0, sigma=1.0, on a (0,0)->(9,9) grid, I expect to generate:
[[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.01, 0.02, 0.01, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0.01, 0.06, 0.1 , 0.06, 0.01, 0. , 0. ],
[ 0. , 0. , 0. , 0.02, 0.1 , 0.16, 0.1 , 0.02, 0. , 0. ],
[ 0. , 0. , 0. , 0.01, 0.06, 0.1 , 0.06, 0.01, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.01, 0.02, 0.01, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]]
The above was generated by creating a numpy array of zeros, setting [5, 5] = 1, and then applying ndimage.filters.gaussian_filter with a sigma of 1. I feel I can deal with non-integer x and y by distributing over nearby integer values and still get a good approximation.
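A minimal sketch of that baseline approach, assuming scipy is available (scipy.ndimage.gaussian_filter is the modern spelling of ndimage.filters.gaussian_filter):

import numpy as np
from scipy import ndimage

grid = np.zeros((10, 10))
grid[5, 5] = 1.0                               # unit impulse at the data point
g = ndimage.gaussian_filter(grid, sigma=1.0)   # smeared into a 2D Gaussian
print(g.round(2))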
It feels like extreme overkill to get my resulting array this way, though, since scipy has to take all the values into account, not just the 1 at location [5, 5], even though the rest are all 0. It only takes 300 µs for a 64x64 grid, but still, I would like to know whether there is a more efficient way to get an X*Y numpy array with a Gaussian kernel for arbitrary x, y and sigma.
A reasonably fast approach is to note that the Gaussian is separable, so you can calculate the 1D Gaussian for x and y and then take the outer product:
import numpy as np
import matplotlib.pyplot as plt
x0, y0, sigma = 5.5, 4.2, 1.4
x, y = np.arange(9), np.arange(9)
gx = np.exp(-(x-x0)**2/(2*sigma**2))
gy = np.exp(-(y-y0)**2/(2*sigma**2))
g = np.outer(gx, gy)
g /= np.sum(g) # normalize, if you want that
plt.imshow(g, interpolation="nearest", origin="lower")
plt.show()
#tom10's outer product answer is probably the best for this particular case. If you want to make a kernel out of an arbitrary function in two (or more) dimensions, you may want to look at np.indices or np.meshgrid.
For example:
def gaussian(x, mu=0, sigma=1):
    # x has shape (ndim, ...), as produced by np.indices or a stacked meshgrid
    n = np.prod(sigma) * np.sqrt(2*np.pi)**len(x)   # normalization constant
    return np.exp(-0.5*(((x - mu)/sigma)**2).sum(0)) / n
gaussian(np.indices((10,10)), mu=5, sigma=1)
Which yields:
array([[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.001, 0.002, 0.001, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0.003, 0.013, 0.022, 0.013, 0.003, 0. , 0. ],
[ 0. , 0. , 0.001, 0.013, 0.059, 0.097, 0.059, 0.013, 0.001, 0. ],
[ 0. , 0. , 0.002, 0.022, 0.097, 0.159, 0.097, 0.022, 0.002, 0. ],
[ 0. , 0. , 0.001, 0.013, 0.059, 0.097, 0.059, 0.013, 0.001, 0. ],
[ 0. , 0. , 0. , 0.003, 0.013, 0.022, 0.013, 0.003, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.001, 0.002, 0.001, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
For more flexibility, you can use np.meshgrid to control the scale and extent of your domain (stacking the list it returns into one array so the function above can sum over the first axis):
kern = gaussian(np.asarray(np.meshgrid(np.linspace(-10, 5), np.linspace(-2, 2))))
For this, kern.shape will be (50, 50) because 50 is the default length of np.linspace, and meshgrid defines the x and y axes by the arrays passed to it. An equivalent way of doing this is np.mgrid[-10:5:50j, -2:2:50j].
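A quick sketch of that equivalence, reusing gaussian and kern from above (note that mgrid uses 'ij' indexing while meshgrid defaults to 'xy', so the result comes out transposed):

import numpy as np

grid = np.mgrid[-10:5:50j, -2:2:50j]   # shape (2, 50, 50)
kern2 = gaussian(grid)
assert np.allclose(kern2, kern.T)      # same values, axes swapped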
Related
I have a matrix like this:
profile=np.array([[0,0,0.5,0.1],
[0.3,0,0,0],
[0,0,0.1,0.9],
[0,0,0,0.1],
[0,0.5,0,0]])
And I want to add a row filled with zeros before and after it. How can I do that?
I thought of using np.pad but I am not sure how.
Output should be:
np.array([[0,0,0,0],
[0,0,0.5,0.1],
[0.3,0,0,0],
[0,0,0.1,0.9],
[0,0,0,0.1],
[0,0.5,0,0],
[0,0,0,0]])
The np.pad function allows you to specify the axes you want to pad:
In [3]: np.pad(profile, ((1, 1), (0, 0)))
Out[3]:
array([[0. , 0. , 0. , 0. ],
[0. , 0. , 0.5, 0.1],
[0.3, 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9],
[0. , 0. , 0. , 0.1],
[0. , 0.5, 0. , 0. ],
[0. , 0. , 0. , 0. ]])
The nested tuple can be read as: pad 1 row "before" and 1 row "after" along axis 0, and pad 0 columns "before" and 0 columns "after" along axis 1.
Another example, which pads five columns "after" on axis 1:
In [4]: np.pad(profile, ((0, 0), (0, 5)))
Out[4]:
array([[0. , 0. , 0.5, 0.1, 0. , 0. , 0. , 0. , 0. ],
[0.3, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9, 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.1, 0. , 0. , 0. , 0. , 0. ],
[0. , 0.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
You can use np.pad:
out = np.pad(profile, 1)[:, 1:-1]
Output:
>>> out
array([[0. , 0. , 0. , 0. ],
[0. , 0. , 0.5, 0.1],
[0.3, 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9],
[0. , 0. , 0. , 0.1],
[0. , 0.5, 0. , 0. ],
[0. , 0. , 0. , 0. ]])
Because np.pad pads it on all sides (left and right, in addition to top and bottom), [:, 1:-1] slices off the first and last columns.
I am trying to create a graph from a numpy array using networkx but I get this error: networkx.exception.NetworkXError: ('Adjacency matrix is not square.', 'nx,ny=(10, 11)')
Does anyone know how to solve it?
My_Diz = {'X120213_1_0013_2_000004': np.array([[ 0. , 23.40378234, 30.29631001, 49.45217086,
53.47727757, 74.32949293, 73.27188558, 93.85556785,
132.31971186, 118.04532327, 88.1557181 ],
[ 0. , 0. , 34.41617904, 39.54024761,
34.25713329, 51.79037103, 51.33810652, 70.9900316 ,
109.76561471, 98.51724406, 69.76728919],
[ 0. , 0. , 0. , 26.66788605,
42.7133817 , 79.11779461, 65.88325262, 89.68664703,
125.91837789, 102.22926865, 71.58316322],
[ 0. , 0. , 0. , 0. ,
22.98401022, 65.5730092 , 44.53195174, 68.64071584,
102.34029705, 75.76571351, 45.22368742],
[ 0. , 0. , 0. , 0. ,
0. , 43.0377496 , 23.19245567, 47.19664886,
83.42653241, 65.0762151 , 35.66216118],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 30.28626571, 29.1448064 ,
64.72235299, 72.76481721, 56.93798086],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 24.18622881,
60.591058 , 49.69530936, 27.61846738],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
39.02763348, 46.26701103, 40.06206332],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 44.72240673, 62.0541588 ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 30.69921172]])}
for k, v in My_Diz.items():
    G = nx.from_numpy_matrix(v)
    nx.draw(G)
Your matrix is not square; networkx needs a square adjacency matrix.
Since the matrix is n x (n+1) and upper-triangular, you can do this:
for k, v in My_Diz.items():
    r, c = v.shape
    M = np.zeros((c, c))    # square (n+1, n+1) matrix
    M[:r, :c] = v           # embed the original (n, n+1) block
    M[:c, :r] += v.T        # symmetrize with the transpose
    G = nx.from_numpy_matrix(M)
    nx.draw(G)
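Side note: if you are on networkx 3.0 or later, from_numpy_matrix has been removed, and the equivalent call is:

G = nx.from_numpy_array(M)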
I am trying to create anti-aliased (weighted and not boolean) circular masks for making circular kernels for use in convolution.
radius = 3  # no. of pixels to be 1 on either side of the center pixel;
            # may be fractional; not the true radius
kernel_size = 9
kernel_radius = (kernel_size - 1) // 2
x, y = np.ogrid[-kernel_radius:kernel_radius+1, -kernel_radius:kernel_radius+1]
dist = ((x**2+y**2)**0.5)
mask = (dist-radius).clip(0,1)
print(mask)
and the output is
array([[1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ],
[1. , 1. , 0.61, 0.16, 0. , 0.16, 0.61, 1. , 1. ],
[1. , 0.61, 0. , 0. , 0. , 0. , 0. , 0.61, 1. ],
[1. , 0.16, 0. , 0. , 0. , 0. , 0. , 0.16, 1. ],
[1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. ],
[1. , 0.16, 0. , 0. , 0. , 0. , 0. , 0.16, 1. ],
[1. , 0.61, 0. , 0. , 0. , 0. , 0. , 0.61, 1. ],
[1. , 1. , 0.61, 0.16, 0. , 0.16, 0.61, 1. , 1. ],
[1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ]])
Then we can do
mask = 1 - mask
print(mask)
to get
array([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.39, 0.84, 1. , 0.84, 0.39, 0. , 0. ],
[0. , 0.39, 1. , 1. , 1. , 1. , 1. , 0.39, 0. ],
[0. , 0.84, 1. , 1. , 1. , 1. , 1. , 0.84, 0. ],
[0. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 0. ],
[0. , 0.84, 1. , 1. , 1. , 1. , 1. , 0.84, 0. ],
[0. , 0.39, 1. , 1. , 1. , 1. , 1. , 0.39, 0. ],
[0. , 0. , 0.39, 0.84, 1. , 0.84, 0.39, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
I can now normalize and use this as my circular filter (kernel) in convolution operations.
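For instance, a minimal sketch of that normalize-and-convolve step, reusing mask from above (the random image is just a stand-in, and scipy.ndimage.convolve is one option among several):

import numpy as np
from scipy import ndimage

kernel = mask / mask.sum()       # normalize so the weights sum to 1
image = np.random.rand(64, 64)   # stand-in image
smoothed = ndimage.convolve(image, kernel, mode="nearest")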
Note: the radius can be fractional. E.g. get_circular_kernel(0.5, (5,5)) should give
array([[0. , 0. , 0. , 0. , 0. ],
[0. , 0.08578644, 0.5 , 0.08578644, 0. ],
[0. , 0.5 , 1. , 0.5 , 0. ],
[0. , 0.08578644, 0.5 , 0.08578644, 0. ],
[0. , 0. , 0. , 0. , 0. ]])
I want to generate a million of these at the very least, with kernel_size fixed and the radius changing, so is there a better or more efficient way to do this? (Perhaps without costly operations like sqrt, while staying accurate enough with respect to arc integrals, i.e., the area covered by the curve within each pixel?)
Since you want to generate a large number of kernels with the same size, you can greatly improve performance by constructing all the kernels in one step rather than one after another in a loop. You can create a single array of shape (num_radii, kernel_size, kernel_size) given num_radii radius values. The price of this vectorization is memory: you have to fit all these values in RAM; otherwise, chunk your millions of radii into a handful of smaller batches and generate each batch separately.
The only thing you need to change is to take an array of radii (rather than a scalar radius), and inject two trailing singleton dimensions so that your mask creation triggers broadcasting:
import numpy as np
kernel_size = 9
kernel_radius = (kernel_size - 1) // 2
x, y = np.ogrid[-kernel_radius:kernel_radius+1, -kernel_radius:kernel_radius+1]
dist = (x**2 + y**2)**0.5 # shape (kernel_size, kernel_size)
# let's create three kernels for the sake of example
radii = np.array([3, 3.5, 4])[...,None,None] # shape (num_radii, 1, 1)
# using ... allows compatibility with arbitrarily-shaped radius arrays
masks = 1 - (dist - radii).clip(0,1) # shape (num_radii, kernel_size, kernel_size)
Now masks[0,...] (or masks[0] for short, but I prefer the explicit version) contains the example mask in your question, and masks[1,...] and masks[2,...] contain the kernels for radii 3.5 and 4, respectively.
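If the full array does not fit in RAM, here is a rough sketch of the batching idea mentioned above (batch_size and the random radii are assumptions for illustration, not part of the original setup):

import numpy as np

kernel_size = 9
kernel_radius = (kernel_size - 1) // 2
x, y = np.ogrid[-kernel_radius:kernel_radius+1, -kernel_radius:kernel_radius+1]
dist = (x**2 + y**2)**0.5                  # precomputed once, reused per batch

all_radii = np.random.uniform(1, 4, size=1_000_000)   # stand-in radii
batch_size = 10_000                                   # tune to available RAM

for start in range(0, all_radii.size, batch_size):
    radii = all_radii[start:start + batch_size, None, None]
    masks = 1 - (dist - radii).clip(0, 1)
    # ... use masks here before the next batch replaces them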
If you want to build millions of masks, you should precompute once what never changes, and compute only what is strictly necessary for each radius.
You can try something like this:
import numpy as np

class Circle:
    def __init__(self, kernel_size):
        self._kernel_size = kernel_size
        self._kernel_radius = (self._kernel_size - 1) // 2
        x, y = np.ogrid[
            -self._kernel_radius:self._kernel_radius+1,
            -self._kernel_radius:self._kernel_radius+1]
        self._dist = np.sqrt(x**2 + y**2)   # computed once per kernel size

    def __call__(self, radius):
        # in-place operations avoid allocating extra temporaries
        mask = self._dist - radius
        mask = np.clip(mask, 0, 1, out=mask)
        mask *= -1
        mask += 1
        return mask

circle = Circle(kernel_size=9)
for radius in np.arange(1, 4, 0.2):   # range() does not accept float steps
    mask = circle(radius)
    print(mask)
I did the operations in place as much as possible to optimize for speed and memory, but for small arrays it won't matter much.
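A rough way to check the payoff, comparing the precomputing class above against a naive helper that rebuilds the distance grid on every call (timings will vary by machine):

import timeit
import numpy as np

def naive(radius, kernel_size=9):
    kernel_radius = (kernel_size - 1) // 2
    x, y = np.ogrid[-kernel_radius:kernel_radius+1,
                    -kernel_radius:kernel_radius+1]
    dist = np.sqrt(x**2 + y**2)             # recomputed on every call
    return 1 - (dist - radius).clip(0, 1)

circle = Circle(kernel_size=9)
print(timeit.timeit(lambda: circle(2.5), number=10_000))
print(timeit.timeit(lambda: naive(2.5), number=10_000))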
Suppose we have an array with numbers between 0 and 1:
arr=np.array([ 0. , 0. , 0. , 0. , 0.6934264 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.6934264 , 0. , 0.6934264 ,
0. , 0. , 0. , 0. , 0.251463 ,
0. , 0. , 0. , 0.87104906, 0.251463 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.48419626,
0. , 0. , 0. , 0. , 0. ,
0.87104906, 0. , 0. , 0.251463 , 0.48419626,
0. , 0.251463 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.251463 , 0. , 0.35524532, 0. ,
0. , 0. , 0. , 0. , 0.251463 ,
0.251463 , 0. , 0.74209813, 0. , 0. ])
Using seaborn, I want to plot a distribution plot:
sns.distplot(arr, hist=False)
Which will give us the following figure:
As you can see, the KDE estimate ranges from somewhere near -0.20 to 1.10. Is it possible to force the estimation to be between 0 and 1? I have tried the following with no luck:
sns.distplot(arr, hist=False, hist_kws={'range': (0.0, 1.0)})
sns.distplot(arr, hist=False, kde_kws={'range': (0.0, 1.0)})
The second line raises an exception: range is not a valid keyword for kde_kws.
The correct way to do this is to use the clip keyword instead of range:
sns.distplot(arr, hist=False, kde_kws={'clip': (0.0, 1.0)})
which will produce:
Indeed, if you only care about the KDE and not the histogram, you can use the kdeplot function, which will produce the same result:
sns.kdeplot(arr, clip=(0.0, 1.0))
Setting plt.xlim(0, 1) beforehand should also help (note that this only restricts the plotted range; unlike clip, it does not change the underlying estimate):
import matplotlib.pyplot as plt
plt.xlim(0, 1)
sns.distplot(arr, hist=False)
Let W be some matrix of dimension (x, nP) [see end of question]
Right now, I'm using the following code:
uUpperDraw = np.zeros(W.shape)
for p in np.arange(0, nP):
    uUpperDraw[s, p] = W[s+1, :(p+1)].sum()
I want to vectorize this for efficiency gains. Given pGrid = [0, 1, ...], how can I reproduce the following?
uUpperDraw = np.array([W[x, 0], W[x, 0] + W[x, 1], W[x, 0] + W[x, 1] + W[x, 2], ...])
Here is some reproducible example.
>>> s, nP
(3, 10)
>>> W
array([[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 2. , 1.63636364, 1.38461538, 1.2 , 1.05882353,
0.94736842, 0.85714286, 0.7826087 , 0.72 , 0.66666667]])
>>> uUpperDraw
array([[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 2. , 3.63636364, 5.02097902, 6.22097902,
7.27980255, 8.22717097, 9.08431383, 9.86692252,
10.58692252, 11.25358919],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ]])
This looks like the cumulative sum. When you want the cumulative sum of each row separately, this works:
uUpperDraw = np.cumsum(W, axis=1)
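A minimal check of that identity against the question's loop, using a small stand-in W (note the row offset: the loop writes the running sums of W[s+1] into row s):

import numpy as np

rng = np.random.default_rng(0)
W = rng.random((5, 10))       # stand-in for the question's W
s, nP = 3, 10

uUpperDraw = np.zeros(W.shape)
for p in range(nP):
    uUpperDraw[s, p] = W[s+1, :p+1].sum()

cum = np.cumsum(W, axis=1)
# row s of the loop output equals row s+1 of the row-wise cumulative sums
assert np.allclose(uUpperDraw[s], cum[s+1])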