Is there a better way to do this? Not necessarily prettier, although it would be nice.
# P is an [N, 3] array: a cloud of points
P -= np.sum(P, axis=0) / P.shape[0]   # centre the cloud on its mean
Map = np.arange(P.shape[0])           # an array (not a list), so the boolean indexing below works
p_0 = Map[P[:,0] <= 0]
p_1 = Map[P[:,0] > 0]
p_0_0 = p_0[P[p_0,1] <= 0]
p_0_1 = p_0[P[p_0,1] > 0]
p_1_0 = p_1[P[p_1,1] <= 0]
p_1_1 = p_1[P[p_1,1] > 0]
p_0_0_0 = p_0_0[P[p_0_0,2] <= 0]
p_0_0_1 = p_0_0[P[p_0_0,2] > 0]
p_0_1_0 = p_0_1[P[p_0_1,2] <= 0]
p_0_1_1 = p_0_1[P[p_0_1,2] > 0]
p_1_0_0 = p_1_0[P[p_1_0,2] <= 0]
p_1_0_1 = p_1_0[P[p_1_0,2] > 0]
p_1_1_0 = p_1_1[P[p_1_1,2] <= 0]
p_1_1_1 = p_1_1[P[p_1_1,2] > 0]
Or in other words, is there a way to compound conditions like,
Oct_0_0_0 = Map[P[:,0] <= 0 and P[:,1] <= 0 and P[:,2] <= 0]
I’m assuming a loop won’t be better than this… not sure.
Thanks in advance.
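For reference, the compounded condition written out in the question does work once Python's and is replaced by NumPy's element-wise & (with parentheses around each comparison) and Map is a NumPy array; a quick sketch with a made-up cloud:

import numpy as np

# hypothetical point cloud, already centred as in the question
P = np.random.rand(1000, 3) * 2 - 1
Map = np.arange(P.shape[0])

# & is element-wise "and"; each comparison needs its own parentheses
Oct_0_0_0 = Map[(P[:, 0] <= 0) & (P[:, 1] <= 0) & (P[:, 2] <= 0)]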
Instead of repeatedly slicing and keeping lists of the indices, I'd recommend creating a single array that maps the index of each point to the octant it belongs to. I'd argue that this is a more natural way of doing it in numpy. So for instance with
octants = (P > 0) @ 2**np.arange(P.shape[1])
the n-th entry of octants is the index (here in the range 0, 1, 2, ..., 7) of the octant that the n-th point of P belongs to. This works by checking, for each coordinate, whether it is positive or not. This gives three boolean values per point, which we can interpret as the binary expansion of that index. (In fact, the line above works for indexing the 2^d-ants in any number of dimensions d.)
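As a quick sanity check of the binary-expansion idea, here is a single made-up point run through that line:

import numpy as np

# a point with x <= 0, y > 0, z > 0 -> booleans (0, 1, 1) -> 0*1 + 1*2 + 1*4 = 6
p = np.array([[-0.5, 0.3, 0.7]])
print((p > 0) @ 2**np.arange(3))   # prints [6]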
To demonstrate this solution, the following snippet makes a point cloud and colours the points according to their octant:
import numpy as np
P = np.random.rand(10000, 3)*2-1
octants = (P > 0) @ 2**np.arange(3)
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(*P.T, c=octants, cmap='jet')
plt.show()
If you still need to extract an array of the indices of a specific octant, say octant (0, 1, 1), this corresponds to finding the corresponding decimal number, which is 0*2^0 + 1*2^1 + 1*2^2 = 6. You can do this with e.g.
p_0_1_1 = np.where(octants == np.array([0,1,1]) @ 2**np.arange(3))
All in all, I’m going with an even messier version,
p_x = P[:,0] < 0
p_0 = np.nonzero(p_x)[0]
p_1 = np.nonzero(~p_x)[0]
p_0_y = P[p_0,1] < 0
p_1_y = P[p_1,1] < 0
p_0_0 = p_0[p_0_y]
p_0_1 = p_0[~p_0_y]
p_1_0 = p_1[p_1_y]
p_1_1 = p_1[~p_1_y]
p_0_0_z = P[p_0_0,2] < 0
p_0_1_z = P[p_0_1,2] < 0
p_1_0_z = P[p_1_0,2] < 0
p_1_1_z = P[p_1_1,2] < 0
p_0_0_0 = p_0_0[p_0_0_z]
p_0_0_1 = p_0_0[~p_0_0_z]
p_0_1_0 = p_0_1[p_0_1_z]
p_0_1_1 = p_0_1[~p_0_1_z]
p_1_0_0 = p_1_0[p_1_0_z]
p_1_0_1 = p_1_0[~p_1_0_z]
p_1_1_0 = p_1_1[p_1_1_z]
p_1_1_1 = p_1_1[~p_1_1_z]
I have a religious belief (I mean based on thin air) that a comparison is fundamentally cheaper than an arithmetic operation.
I am trying to plot a matrix in Python. So, my initial thought was to use matshow.
However, this particular matrix develops over time via an algorithm (the function sandpile below), so I need to show how the matrix develops over time, but in the same plot. The end result is sort of an animation. Any ideas as to how that is done? The code below only produces one graph, and that is a picture of the very last updated matrix (the matrix called abba, below).
Thank you in advance.
import numpy as np
import matplotlib.pyplot as plt
dimension = 3
abba = np.matrix( [ [2,5,2], [1,1000,4], [2,1,2] ] )
def sandpile(field):
    # coordinates of all cells that will topple (value > 3)
    greater3 = np.where(field > 3)
    left = (greater3[0], greater3[1]-1)
    right = (greater3[0], greater3[1]+1)
    top = (greater3[0]-1, greater3[1])
    bottom = (greater3[0]+1, greater3[1])
    # keep only the neighbours that fall inside the grid
    bleft = left[0][np.where(left[1] >= 0)], left[1][np.where(left[1] >= 0)]
    bright = right[0][np.where(right[1] < dimension)], right[1][np.where(right[1] < dimension)]
    btop = top[0][np.where(top[0] >= 0)], top[1][np.where(top[0] >= 0)]
    bbottom = bottom[0][np.where(bottom[0] < dimension)], bottom[1][np.where(bottom[0] < dimension)]
    # topple: each overloaded cell loses 4 grains, each valid neighbour gains 1
    field[greater3] -= 4
    field[bleft] += 1
    field[bright] += 1
    field[btop] += 1
    field[bbottom] += 1
    return field
print(abba)
matfig = plt.figure(figsize=(3,3))
plt.matshow(abba, fignum=matfig.number)
n = 0
while (abba < 4).all() == False:
    abba = sandpile(abba)
    plt.matshow(abba, fignum=matfig.number)
    n += 1
print('Exit with',n,'steps')
print(abba)
This is one way you can see an updating plot in a loop:
...
matfig = plt.figure(figsize=(3,3))
ax1 = matfig.add_subplot(1, 1, 1)
ax_image = ax1.imshow(abba)
plt.show(block=False)
n = 0
while (abba < 4).all() == False:
    abba = sandpile(abba)
    ax_image.remove()
    ax_image = ax1.imshow(abba)
    matfig.canvas.draw()
    matfig.canvas.flush_events()
    n += 1
    print(n)
print('Exit with',n,'steps')
print(abba)
plt.show(block=True)
In your example, the changes happen at the very end of the loop.
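Another option (not what the snippet above does, just a sketch assuming abba and sandpile from the question) is to let matplotlib.animation.FuncAnimation drive the loop:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots(figsize=(3, 3))
image = ax.imshow(abba)

def frames():
    # yield successive states until no cell is 4 or more
    global abba
    while not (abba < 4).all():
        abba = sandpile(abba)
        yield abba

def update(field):
    # redraw the image with the new matrix
    image.set_data(field)
    image.set_clim(field.min(), field.max())
    return [image]

anim = FuncAnimation(fig, update, frames=frames, interval=200,
                     cache_frame_data=False, repeat=False)
plt.show()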
I am trying to teach my students about Chi-Square while trapped here at home. I have made a video that should be mostly helpful; however, I have been having trouble making a graph with the specific properties of the Chi-Square distribution. The shape is right, but there is a lot of noise. This is simulation data, so it will never be perfectly smooth, but this is a bit much.
I have been trying to smooth the data. I have gone as far as to round the data to the nearest tenth and perform a moving average (k = 3) in order to get a graph as presentable as this:
[Figure: Chi-Squared simulation, df = 3, sample size = 100, samples = 100000, rounded and smoothed]
[Figure: Chi-Squared simulation, df = 3, sample size = 100, samples = 100000, not rounded, smoothed]
A few things I have noticed while working on this problem. First, the spikes and dips seem to occur at predictable locations. Second, without the rounding, the graph seems to alternate back and forth between a spike and a dip regularly. I think it may be possible that this is due to some sort of binary precision problem. I have tried to account for this by switching to numpy for my operations and forcing the data to be float64. This had no effect.
What I would like to know is either:
If this problem is caused by binary precision, how can I properly mitigate that?
If this cannot be solved in that way, is there a better smoothing operation I could use?
Thank you for the assistance. Code is below.
import random
import numpy as np

# redLim, greenLim, yellowLim and redBalls, greenBalls, yellowBalls, blueBalls
# (the colour probabilities / cumulative limits) are assumed to be defined earlier.

# Draw n samples of size sampleSize and get the Chi-Square list
chiSqrList = []
n = 100000
sampleSize = 100
j = 0
while j < n:
    redTotal = 0
    greenTotal = 0
    yellowTotal = 0
    blueTotal = 0
    i = 0
    while i < sampleSize:
        x = random.random()
        if x < redLim:
            redTotal += 1
        elif x < greenLim:
            greenTotal += 1
        elif x < yellowLim:
            yellowTotal += 1
        else:
            blueTotal += 1
        i += 1
    observedBalls = np.array([redTotal, greenTotal, yellowTotal, blueTotal], dtype=np.float64)
    expectedBalls = np.array([sampleSize*redBalls, sampleSize*greenBalls, sampleSize*yellowBalls, sampleSize*blueBalls], dtype=np.float64)
    chiSqr = 0
    chiSqr = np.power((observedBalls - expectedBalls), 2)/expectedBalls
    chiSqr = np.sum(chiSqr)
    chiSqr = round(chiSqr, 1)
    chiSqrList.append(chiSqr)
    j += 1
# Make count data
avgSqrDist = []
count = []
i = 0
for value in chiSqrList:
    if len(avgSqrDist) == 0:
        avgSqrDist.append(value)
        count.append(1)
    elif avgSqrDist[i] != value:
        avgSqrDist.append(value)
        count.append(1)
        i += 1
    else:
        count[i] += 1
# Smooth curve
i = 0
smoothAvgSqrDist = []
smoothCount = []
while i < len(avgSqrDist)-2:
    smoothCount.append((count[i]+count[i+1]+count[i+2])/3)
    smoothAvgSqrDist.append(avgSqrDist[i+1])
    i += 1
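As an aside, the same k = 3 moving average over the counts can be written more compactly with np.convolve; the sketch below should be equivalent to the loop above, assuming avgSqrDist and count have already been built:

import numpy as np

# centred moving average with window k = 3 over the bin counts
k = 3
kernel = np.ones(k) / k
smoothCount = np.convolve(count, kernel, mode='valid')   # len(count) - 2 values
smoothAvgSqrDist = avgSqrDist[1:-1]                      # centre value of each window

A wider window (larger k) or a Gaussian kernel would smooth more aggressively.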
I am looking for a method with which I can smooth a scattered dataset. The scattered dataset comes from sampling a very large array that represents a raster. I have to vectorize this array in order to downsample it. I have done so using the matplotlib.pyplot.contour() function, and I get a reasonable set of point-value pairs.
The problem is that this signal is noisy, and I need to smooth it. Smoothing the original array is no good, I need to smooth the scattered data. The best I could find is the function below, which I rewrote from a Matlab counterpart. While this function does the job, it is very slow. I am looking either for alternative functions to smooth this data or a way to make the function below faster.
def limgrad(self, triangulation, values, dfdx, imax=100):
    """
    See https://github.com/dengwirda/mesh2d/blob/master/hjac-util/limgrad.m
    for original source code.
    """
    # triangulation is a matplotlib.tri.Triangulation instance
    edge = triangulation.edges
    dx = np.subtract(
        triangulation.x[edge[:, 0]], triangulation.x[edge[:, 1]])
    dy = np.subtract(
        triangulation.y[edge[:, 0]], triangulation.y[edge[:, 1]])
    elen = np.sqrt(dx**2+dy**2)
    aset = np.zeros(values.shape)
    ftol = np.min(values) * np.sqrt(np.finfo(float).eps)
    for i in range(1, imax + 1):
        aidx = np.where(aset == i-1)[0]
        if len(aidx) == 0.:
            break
        active_idxs = np.argsort(values[aidx])
        for active_idx in active_idxs:
            adj_edges_idxs = np.where(
                np.any(edge == active_idx, axis=1))[0]
            adjacent_edges = edge[adj_edges_idxs]
            for nod1, nod2 in adjacent_edges:
                if values[nod1] > values[nod2]:
                    fun1 = values[nod2] + elen[active_idx] * dfdx
                    if values[nod1] > fun1+ftol:
                        values[nod1] = fun1
                        aset[nod1] = i
                else:
                    fun2 = values[nod1] + elen[active_idx] * dfdx
                    if values[nod2] > fun2+ftol:
                        values[nod2] = fun2
                        aset[nod2] = i
    return values
I found the answer to my own question and I am posting it here for reference. The algorithm above is slow because calling np.where() to generate adj_edges_idxs has a heavy overhead. Instead, I precompute the node neighbors, which eliminates that overhead. It went from ~80 iterations per second to 80,000 it/s.
The final version looks like this:
from collections import defaultdict
from itertools import permutations

import numpy as np


def limgrad(tri, values, dfdx=0.2, imax=100):
    """
    See https://github.com/dengwirda/mesh2d/blob/master/hjac-util/limgrad.m
    for original source code.
    """
    xy = np.vstack([tri.x, tri.y]).T
    edge = tri.edges
    dx = np.subtract(xy[edge[:, 0], 0], xy[edge[:, 1], 0])
    dy = np.subtract(xy[edge[:, 0], 1], xy[edge[:, 1], 1])
    elen = np.sqrt(dx**2+dy**2)
    ffun = values.flatten()
    aset = np.zeros(ffun.shape)
    ftol = np.min(ffun) * np.sqrt(np.finfo(float).eps)
    # precompute neighbor table
    point_neighbors = defaultdict(set)
    for simplex in tri.triangles:
        for i, j in permutations(simplex, 2):
            point_neighbors[i].add(j)
    # iterative smoothing
    for _iter in range(1, imax+1):
        aidx = np.where(aset == _iter-1)[0]
        if len(aidx) == 0.:
            break
        active_idxs = np.argsort(ffun[aidx])
        for active_idx in active_idxs:
            adjacent_edges = point_neighbors[active_idx]
            for adj_edge in adjacent_edges:
                if ffun[adj_edge] > ffun[active_idx]:
                    fun1 = ffun[active_idx] + elen[active_idx] * dfdx
                    if ffun[adj_edge] > fun1+ftol:
                        ffun[adj_edge] = fun1
                        aset[adj_edge] = _iter
                else:
                    fun2 = ffun[adj_edge] + elen[active_idx] * dfdx
                    if ffun[active_idx] > fun2+ftol:
                        ffun[active_idx] = fun2
                        aset[active_idx] = _iter
    flag = _iter < imax
    return ffun, flag
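A usage sketch with made-up scattered data, building the Triangulation directly from the x/y coordinates:

import numpy as np
from matplotlib.tri import Triangulation

# hypothetical noisy scattered data
x = np.random.rand(1000) * 100
y = np.random.rand(1000) * 100
values = np.random.rand(1000)

tri = Triangulation(x, y)   # Delaunay triangulation of the points
limited, converged = limgrad(tri, values, dfdx=0.2, imax=100)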
I'm doing aperture photometry on a cluster of stars, and to get easier detection of background signal, I want to only look at stars further apart than n pixels (n=16 in my case).
I have 2 arrays, xs and ys, with the x- and y-values of all the stars' coordinates:
Using np.where, I'm supposed to find the indices of all stars whose distance to all other stars is >= n.
So far, my method has been a for-loop:
import numpy as np
# Lists of coordinates w. values between 0 and 2000 for 5000 stars
xs = np.random.rand(5000)*2000
ys = np.random.rand(5000)*2000
# for-loop, wherein the np.where statement in question is situated
n = 16
for i in range(len(xs)):
    index = np.where(np.sqrt(pow(xs[i] - xs, 2) + pow(ys[i] - ys, 2)) >= n)
Due to the stars being clustered pretty closely together, I expected a severe reduction in data, though even when I tried n = 1000 I still had around 4000 data points left.
Using just numpy (and part of the answer here)
X = np.random.rand(5000, 2) * 2000
XX = np.einsum('ij, ij ->i', X, X)
D_squared = XX[:, None] + XX - 2 * X.dot(X.T)
np.fill_diagonal(D_squared, np.inf)   # a star's distance to itself is 0, so ignore the diagonal
out = np.where(D_squared.min(axis=0) > n**2)
Using scipy.spatial.distance.pdist:
from scipy.spatial.distance import pdist, squareform
D_squared = squareform(pdist(X, metric='sqeuclidean'))
np.fill_diagonal(D_squared, np.inf)   # again ignore self-distances
out = np.where(D_squared.min(axis=0) > n**2)
Using a KDTree for maximum speed:
from scipy.spatial import KDTree
X_tree = KDTree(X)
in_radius = np.array(list(X_tree.query_pairs(n))).flatten()
out = np.where(~np.in1d(np.arange(X.shape[0]), in_radius))
np.random.seed(seed=1)
xs = np.random.rand(5000, 1)*2000
ys = np.random.rand(5000, 1)*2000
n = 16

mask = (xs >= 0)
for i in range(len(xs)):
    if mask[i]:
        # mask out every star within n pixels of star i, then re-enable star i itself
        index = np.where(np.sqrt(pow(xs[i] - xs, 2) + pow(ys[i] - ys, 2)) <= n)
        mask[index] = False
        mask[i] = True
x = xs[mask]
y = ys[mask]
print(len(x))
4220
You can use np.subtract.outer to create the pairwise comparisons. Then you check for each row whether the distance is below 16 for exactly one item (which is the comparison of the particular star with itself):
distances = np.sqrt(
    np.subtract.outer(xs, xs)**2
    + np.subtract.outer(ys, ys)**2
)
indices = np.nonzero(np.sum(distances < 16, axis=1) == 1)
I tried to optimize the code below but I cannot figure out how to improve the computation speed. I tried Cython, but the performance is about the same as in Python.
Is it possible to improve the performance without rewriting everything in C/C++?
Thanks for any help.
import numpy as np
heightSequence = 400
widthSequence = 400
nHeights = 80
DOF = np.zeros((heightSequence, widthSequence), dtype = np.float64)
contrast = np.float64(np.random.rand(heightSequence, widthSequence, nHeights))
initDOF = np.zeros([heightSequence, widthSequence], dtype = np.float64)
initContrast = np.zeros([heightSequence, widthSequence, nHeights], dtype = np.float64)
initHeight = np.float64(np.r_[0:nHeights:1.0])
initPixelContrast = np.array(([0 for ii in range(nHeights)]), dtype = np.float64)
# for each row
for row in range(heightSequence):
    # for each col
    for col in range(widthSequence):
        # initialize variables
        height = initHeight  # array ndim = 1
        c = initPixelContrast  # array ndim = 1

        # for each height
        for indexHeight in range(0, nHeights):
            # get contrast profile for current pixel
            tempC = contrast[:, :, indexHeight]
            c[indexHeight] = tempC[row, col]

        # save original contrast
        # originalC = c
        # originalHeight = height

        # remove profile before maximum and after minimum contrast
        idxMaxContrast = np.argmax(c)
        c = c[idxMaxContrast:]
        height = height[idxMaxContrast:]
        idxMinContrast = np.argmin(c) + 1
        c = c[0:idxMinContrast]
        height = height[0:idxMinContrast]

        # remove some refraction
        if (len(c) <= 1) | (np.max(c) <= 0):
            DOF[row, col] = 0
        else:
            # linear fitting of profile contrast
            P = np.polyfit(height, c, 1)
            m = P[0]
            q = P[1]

            # remove some refraction
            if m >= 0:
                DOF[row, col] = 0
            else:
                DOF[row, col] = -q / m

    print('row=%i/%i' % (row, heightSequence))

# set range of DOF
DOF[DOF < 0] = 0
DOF[DOF > nHeights] = 0
By looking at the code it seems that you can get rid of the two outer loops completely, converting the code to a vectorised form. However, the np.polyfit call must then be replaced by some other expression, but the coefficients for a linear fit are easy to find, also in vectorised form. The last if-else can then be turned into a np.where call.
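A minimal sketch of that idea, fitting every pixel's full contrast profile in one shot with the closed-form least-squares slope and intercept (the argmax/argmin cropping and the refraction checks from the original loop are left out here):

import numpy as np

heightSequence, widthSequence, nHeights = 400, 400, 80
contrast = np.float64(np.random.rand(heightSequence, widthSequence, nHeights))
height = np.arange(nHeights, dtype=np.float64)

# closed-form least-squares fit c ~ m*h + q for every pixel at once
h_mean = height.mean()
c_mean = contrast.mean(axis=2)
cov = ((height - h_mean) * (contrast - c_mean[..., None])).sum(axis=2)
var = ((height - h_mean) ** 2).sum()
m = cov / var                 # slope per pixel
q = c_mean - m * h_mean       # intercept per pixel

# the final if/else as an np.where call: DOF = -q/m where the slope is negative, 0 otherwise
with np.errstate(divide='ignore', invalid='ignore'):
    DOF = np.where(m < 0, -q / m, 0.0)

# set range of DOF, as in the original
DOF[(DOF < 0) | (DOF > nHeights)] = 0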